AI-Assisted POP: Document Ingestion, AI Inference, and Human-Driven Workflow

The design of this system intentionally prioritizes simplicity, determinism, and human authority. We use AI to power the heavy lifting of extraction, but the workflow is triggered and managed by explicit human actions.

This post details the operational workflow, from the moment a PDF hits the disk to the moment data enters the ERP.

1. The “Lifecycle” Folder Structure

We strictly separate Infrastructure State (Where is the file?) from Business State (Is the PO valid?). The filesystem tracks the lifecycle of the file object, while the database tracks the status of the purchase order.

Directory Structure

/po/
├── input/       # Landing Zone. Files here are "Pending Action."
├── inprogress/  # Lock Zone. Files currently being read by the system.
├── archive/     # The Vault. All successfully ingested files live here permanently.
└── error/       # The Morgue. Corrupted or unreadable files (System failures).

Note: No filesystem-based business folders (like processed or failed) are used. The database is the single source of truth for business status.

2. UI-Driven Ingestion

Ingestion is not a background cron job that silently swallows files. It is an explicit, user-initiated ceremony to ensure operators know precisely what is entering the system.

The Workflow:

  1. Review Input: The user visits the /log page on the web UI. This page scans /po/input and displays a list of pending PDF files.
  2. Trigger Processing: The user clicks the “Process All” button.
  3. Backend Execution: For each file in the list, the backend performs an atomic transaction:
    • Move: File moved from input -> inprogress (preventing duplicate clicks).
    • Register: A new row is created in the MariaDB purchase_orders table with status PROCESSING.
    • Archive: Once the DB record is secure, the file is moved from inprogress -> archive.
    • Dispatch: A message INGESTION_EVENT is published to RabbitMQ to trigger the AI.

Note: Even if the AI fails to read the document later, the file remains safely in /archive/. We rely on the Database status (NEEDS_REVIEW) to flag it for the user, rather than shuffling files around on disk.

3. Asynchronous AI Processing

The Order Processor service consumes the INGESTION_EVENT and begins the AI workflow. This is where the architecture differs significantly from traditional OCR approaches.

Step A: Image Conversion (The “Eyes”)

Instead of using OCR to extract text (which loses layout context), we convert the PDF pages into high-resolution images. We are using a Vision-Language Model (VLM), so we need the AI to “see” the document exactly as a human does.

Step B: Visual Inference (The “Brain”)

The system calls the AI Inference Service (hosting Qwen2-VL-7B-Instruct via vLLM).

  • Input: The raw image of the purchase order.
  • Prompt: “You are a specialized parser. Extract the Vendor Name, PO Date, and Line Items from this image into the following JSON schema…”
  • Output: A clean, valid JSON object containing the business data.

Note: There is no separate “OCR” step. The model reads the pixels and outputs the JSON in a single pass.

4. Smart Validation (The “Guardrails”)

Once the AI returns the JSON, it isn’t blindly trusted. It passes through the Anomaly Detection Engine:

  1. Structure Check: Is the JSON valid? Do the required fields exist?
  2. Business Logic:
    • Does Line Item Total = Quantity * Unit Price?
    • Does the sum of line items match the Grand Total?
    • Is the date in the future?
  3. State Transition:
    • Pass: Status update -> READY_FOR_ERP
    • Fail: Status update -> NEEDS_REVIEW (with specific error reason, e.g., “Math Mismatch”).

5. The “Expert Loop” Interface

The user returns to the UI to manage the results. Because we kept the filesystem simple, the UI logic is straightforward:

  • To Review: The UI queries the DB: SELECT * FROM purchase_orders WHERE current_state = 'NEEDS_REVIEW'.
  • To Display: The UI loads the corresponding PDF from /po/archive/.

The Decision Fork:

From this screen, the user makes one of two explicit decisions:

Path A: “Approve for ERP”

  • Scenario: The AI was correct, or the user fixed a minor typo.
  • Action: The validated JSON is posted to the ERP system.
  • State: POSTED_TO_ERP

Path B: “Mark for Retraining”

  • Scenario: The AI fundamentally misinterpreted the document layout (e.g., missed an entire column).
  • Action: The user corrects the data and flags it.
  • State: RETRAIN_REQUIRED
  • Outcome: This specific document (fetched from /archive/) and its corrected data are queued for the Model Retraining Service to improve the next version of the model.

6. MariaDB Schema

The database schema tracks the PO’s detailed status, decoupling it from the file location.

SQL

CREATE TABLE purchase_orders (
    id BIGINT AUTO_INCREMENT PRIMARY KEY,
    doc_id CHAR(64) NOT NULL,            -- SHA-256 Hash
    filename VARCHAR(255) NOT NULL,      -- Points to file in /archive/
    current_state ENUM(
        'INITIAL',
        'PROCESSING',
        'READY_FOR_ERP',                 -- AI High Confidence & Logic Pass
        'NEEDS_REVIEW',                  -- Logic Failure or Low Confidence
        'POSTED_TO_ERP',
        'RETRAIN_REQUIRED',              -- Flagged by Human
        'RETRAINED'
    ) NOT NULL DEFAULT 'INITIAL',
    
    extracted_json JSON,                 -- Raw output from Qwen2-VL
    corrected_json JSON,                 -- Human-corrected version
    validation_errors TEXT,              -- Why did it fail logic checks?
    erp_reference_id VARCHAR(128),
    
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
    
    UNIQUE KEY uq_doc_identity (filename, doc_id)
);

7. State Transition Diagram

To ensure strict auditability, the purchase order moves through a deterministic state machine. The workflow enables automation where possible but enforces human decision-making where necessary.

    [*] --> INITIAL: File in /input
    INITIAL --> PROCESSING: User clicks "Process"
    
    state "AI Inference & Logic Check" as AI_CHECK
    PROCESSING --> AI_CHECK
    
    state join_state <<choice>>
    AI_CHECK --> join_state
    
    join_state --> READY_FOR_ERP: High Confidence & Valid Logic
    join_state --> NEEDS_REVIEW: Anomaly Detected / Validation Failed
    
    READY_FOR_ERP --> POSTED_TO_ERP: Auto-Post / One-Click Approval
    
    NEEDS_REVIEW --> POSTED_TO_ERP: User Corrects Data
    NEEDS_REVIEW --> RETRAIN_REQUIRED: User Flags Model Failure
    
    RETRAIN_REQUIRED --> RETRAINED: Model Retraining Service
    POSTED_TO_ERP --> [*]
    RETRAINED --> [*]

The Logic Flow:

  1. The Happy Path: The document moves from PROCESSING $\to$ READY_FOR_ERP $\to$ POSTED_TO_ERP. In a mature system, this happens without human intervention.
  2. The Safety Net: Any logic failure (e.g., date mismatch) forces the document into NEEDS_REVIEW.
  3. The Learning Path: This is the architecture’s unique value. Instead of just fixing the error, the user can branch to RETRAIN_REQUIRED. This guarantees that the specific edge case that caused the failure is fed back into the model, thereby closing the quality loop.