The design of this system intentionally prioritizes simplicity, determinism, and human authority. We use AI to power the heavy lifting of extraction, but the workflow is triggered and managed by explicit human actions.
This post details the operational workflow, from the moment a PDF hits the disk to the moment data enters the ERP.
1. The “Lifecycle” Folder Structure
We strictly separate Infrastructure State (Where is the file?) from Business State (Is the PO valid?). The filesystem tracks the lifecycle of the file object, while the database tracks the status of the purchase order.
Directory Structure
/po/
├── input/ # Landing Zone. Files here are "Pending Action."
├── inprogress/ # Lock Zone. Files currently being read by the system.
├── archive/ # The Vault. All successfully ingested files live here permanently.
└── error/ # The Morgue. Corrupted or unreadable files (System failures).
Note: No filesystem-based business folders (like processed or failed) are used. The database is the single source of truth for business status.
2. UI-Driven Ingestion
Ingestion is not a background cron job that silently swallows files. It is an explicit, user-initiated ceremony to ensure operators know precisely what is entering the system.
The Workflow:
- Review Input: The user visits the
/logpage on the web UI. This page scans/po/inputand displays a list of pending PDF files. - Trigger Processing: The user clicks the “Process All” button.
- Backend Execution: For each file in the list, the backend performs an atomic transaction:
- Move: File moved from
input->inprogress(preventing duplicate clicks). - Register: A new row is created in the MariaDB
purchase_orderstable with statusPROCESSING. - Archive: Once the DB record is secure, the file is moved from
inprogress->archive. - Dispatch: A message
INGESTION_EVENTis published to RabbitMQ to trigger the AI.
- Move: File moved from
Note: Even if the AI fails to read the document later, the file remains safely in /archive/. We rely on the Database status (NEEDS_REVIEW) to flag it for the user, rather than shuffling files around on disk.
3. Asynchronous AI Processing
The Order Processor service consumes the INGESTION_EVENT and begins the AI workflow. This is where the architecture differs significantly from traditional OCR approaches.
Step A: Image Conversion (The “Eyes”)
Instead of using OCR to extract text (which loses layout context), we convert the PDF pages into high-resolution images. We are using a Vision-Language Model (VLM), so we need the AI to “see” the document exactly as a human does.
Step B: Visual Inference (The “Brain”)
The system calls the AI Inference Service (hosting Qwen2-VL-7B-Instruct via vLLM).
- Input: The raw image of the purchase order.
- Prompt: “You are a specialized parser. Extract the Vendor Name, PO Date, and Line Items from this image into the following JSON schema…”
- Output: A clean, valid JSON object containing the business data.
Note: There is no separate “OCR” step. The model reads the pixels and outputs the JSON in a single pass.
4. Smart Validation (The “Guardrails”)
Once the AI returns the JSON, it isn’t blindly trusted. It passes through the Anomaly Detection Engine:
- Structure Check: Is the JSON valid? Do the required fields exist?
- Business Logic:
- Does
Line Item Total=Quantity*Unit Price? - Does the sum of line items match the
Grand Total? - Is the date in the future?
- Does
- State Transition:
- Pass: Status update ->
READY_FOR_ERP - Fail: Status update ->
NEEDS_REVIEW(with specific error reason, e.g., “Math Mismatch”).
- Pass: Status update ->
5. The “Expert Loop” Interface
The user returns to the UI to manage the results. Because we kept the filesystem simple, the UI logic is straightforward:
- To Review: The UI queries the DB:
SELECT * FROM purchase_orders WHERE current_state = 'NEEDS_REVIEW'. - To Display: The UI loads the corresponding PDF from
/po/archive/.
The Decision Fork:
From this screen, the user makes one of two explicit decisions:
Path A: “Approve for ERP”
- Scenario: The AI was correct, or the user fixed a minor typo.
- Action: The validated JSON is posted to the ERP system.
- State:
POSTED_TO_ERP
Path B: “Mark for Retraining”
- Scenario: The AI fundamentally misinterpreted the document layout (e.g., missed an entire column).
- Action: The user corrects the data and flags it.
- State:
RETRAIN_REQUIRED - Outcome: This specific document (fetched from
/archive/) and its corrected data are queued for the Model Retraining Service to improve the next version of the model.
6. MariaDB Schema
The database schema tracks the PO’s detailed status, decoupling it from the file location.
SQL
CREATE TABLE purchase_orders (
id BIGINT AUTO_INCREMENT PRIMARY KEY,
doc_id CHAR(64) NOT NULL, -- SHA-256 Hash
filename VARCHAR(255) NOT NULL, -- Points to file in /archive/
current_state ENUM(
'INITIAL',
'PROCESSING',
'READY_FOR_ERP', -- AI High Confidence & Logic Pass
'NEEDS_REVIEW', -- Logic Failure or Low Confidence
'POSTED_TO_ERP',
'RETRAIN_REQUIRED', -- Flagged by Human
'RETRAINED'
) NOT NULL DEFAULT 'INITIAL',
extracted_json JSON, -- Raw output from Qwen2-VL
corrected_json JSON, -- Human-corrected version
validation_errors TEXT, -- Why did it fail logic checks?
erp_reference_id VARCHAR(128),
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
UNIQUE KEY uq_doc_identity (filename, doc_id)
);
7. State Transition Diagram
To ensure strict auditability, the purchase order moves through a deterministic state machine. The workflow enables automation where possible but enforces human decision-making where necessary.
[*] --> INITIAL: File in /input
INITIAL --> PROCESSING: User clicks "Process"
state "AI Inference & Logic Check" as AI_CHECK
PROCESSING --> AI_CHECK
state join_state <<choice>>
AI_CHECK --> join_state
join_state --> READY_FOR_ERP: High Confidence & Valid Logic
join_state --> NEEDS_REVIEW: Anomaly Detected / Validation Failed
READY_FOR_ERP --> POSTED_TO_ERP: Auto-Post / One-Click Approval
NEEDS_REVIEW --> POSTED_TO_ERP: User Corrects Data
NEEDS_REVIEW --> RETRAIN_REQUIRED: User Flags Model Failure
RETRAIN_REQUIRED --> RETRAINED: Model Retraining Service
POSTED_TO_ERP --> [*]
RETRAINED --> [*]
The Logic Flow:
- The Happy Path: The document moves from
PROCESSING$\to$READY_FOR_ERP$\to$POSTED_TO_ERP. In a mature system, this happens without human intervention. - The Safety Net: Any logic failure (e.g., date mismatch) forces the document into
NEEDS_REVIEW. - The Learning Path: This is the architecture’s unique value. Instead of just fixing the error, the user can branch to RETRAIN_REQUIRED. This guarantees that the specific edge case that caused the failure is fed back into the model, thereby closing the quality loop.