Invoice ingestion

The other connector patterns (REST API pull, SFTP pull, SFTP push) all live in your customer's domain: they assume Nuntiq already has the invoice, and you're either pulling supporting master data in or pushing the invoice out to an ERP.

This page covers the other direction: putting invoices into Nuntiq from a connector. There are two flows, both wrapped in IngestionLoad.

Flow	Use when
`IngestionLoad.ingest_pdf(...)`	You have a PDF / image / ZIP and want Nuntiq's standard pipeline (OCR → template extraction → enrichment). Same path as email intake.
`IngestionLoad.new_invoice() + .submit()`	You already have parsed invoice data (from an XML / EDI feed) plus optionally the original PDF, and you want to import the structured fields directly without OCR.

The raw context.api.upload_source_documents / upload_invoice_attachments / invoice_import helpers are still available if you need them, but IngestionLoad is the recommended entry point — it handles the upload-then-import token-threading dance for you, validates inputs before the HTTP call, and returns typed results.

Pattern 1 — Drop a PDF into the standard pipeline

The simplest case: you have a file and want Nuntiq to process it the same way it processes emailed invoices. Nuntiq runs OCR, picks a template, extracts fields, runs validation and enrichment, and ends up with a fully processed invoice at status 100 (or at whichever stop you configure).

connectors/pull_supplier_portal_invoices.py
"""Pull invoice PDFs from a supplier portal and ingest them."""
import requests

from lib.objects.ingestion import IngestionLoad


def run(context):
    portal_url = context.get_config('portal_url')
    api_token = context.get_secret('portal_api_token')
    receiving_inbox = context.get_config('receiving_inbox')   # e.g. 'invoices@acme.apreceiving.com'

    ingest = IngestionLoad(context)

    # Fetch the list of new invoices from the supplier portal
    headers = {'Authorization': f'Bearer {api_token}'}
    resp = requests.get(f'{portal_url}/new-invoices', headers=headers, timeout=30)
    resp.raise_for_status()

    uploaded = 0
    for inv in resp.json()['invoices']:
        # Download the PDF
        pdf_resp = requests.get(inv['pdf_url'], headers=headers, timeout=60)
        pdf_resp.raise_for_status()

        # Ingest
        result = ingest.ingest_pdf(
            filename=inv['filename'],
            content=pdf_resp.content,
            receiving_inbox=receiving_inbox,
            supplier_code=inv.get('supplier_code'),
        )

        context.logger.info(
            f"Ingested {inv['filename']} → source_document_id={result.source_document_id}",
            step='ingest',
            detail={'supplier_code': inv.get('supplier_code')},
        )
        uploaded += 1

    return {'uploaded': uploaded}

What `receiving_inbox` does

It's the address Nuntiq uses for template lookup. Different receiving inboxes can map to different templates: for example, xml@acme.apreceiving.com might use a structured-data template while invoices@acme.apreceiving.com uses the standard OCR template. Default to your customer's primary inbox unless you have a specific routing reason.

Hints

company_code and supplier_code are both optional. Pass them when you know them; the worker uses them to skip the AI-match step for those fields.
ZIP files are unzipped automatically. Send up to 20 files per request; for larger batches, use ingest_files() instead and split the files into chunks.
Max file size is 50MB per file.
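
The limits above can be enforced on the connector side before anything is sent. The following is a minimal sketch, not the library's own behaviour: the helper names chunked and ingest_in_batches are hypothetical, the keyword signature of ingest_files() is an assumption (check the IngestionLoad reference), and 50MB is taken to mean 50 * 1024 * 1024 bytes.

```python
from typing import Iterator, List, Sequence, Tuple

MAX_FILES_PER_REQUEST = 20           # per-request limit from the hints above
MAX_FILE_BYTES = 50 * 1024 * 1024    # 50MB per file; exact byte count is an assumption

FileItem = Tuple[str, bytes]         # (filename, content)


def chunked(items: Sequence[FileItem],
            size: int = MAX_FILES_PER_REQUEST) -> Iterator[List[FileItem]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError('size must be at least 1')
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def ingest_in_batches(ingest, files: Sequence[FileItem], receiving_inbox: str) -> list:
    """Reject oversized files locally, then upload the rest in batches."""
    too_big = [name for name, content in files if len(content) > MAX_FILE_BYTES]
    if too_big:
        raise ValueError(f'Files over the 50MB limit: {too_big}')

    results = []
    for batch in chunked(files):
        # Assumed signature: ingest_files(files=[(filename, content), ...], receiving_inbox=...)
        results.append(ingest.ingest_files(files=batch, receiving_inbox=receiving_inbox))
    return results
```

Each call returns one result per batch, so 45 files produce three uploads of 20, 20 and 5 files.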

Pattern 2 — Submit structured invoice + attach the original PDF

When you already have parsed invoice data (an XML feed, an EDI 810, a JSON payload from an upstream system) AND the original PDF, this is the right pattern. The builder handles the upload-then-import dance — you never touch document tokens directly.

connectors/import_structured_invoices.py
from lib.objects.ingestion import IngestionLoad


def run(context):
    receiving_inbox = context.get_config('receiving_inbox')
    ingest = IngestionLoad(context)

    imported = 0
    for raw in fetch_structured_invoices():           # your code
        inv = ingest.new_invoice(
            workflow='payment_portal',
            receiving_inbox=receiving_inbox,
        )

        # Header — set whatever fields you have
        inv.header.invoice_number = raw['number']
        inv.header.invoice_date = raw['date']         # ISO date string
        inv.header.due_date = raw['due_date']
        inv.header.currency_code = raw['currency']
        inv.header.net_amount = raw['net']
        inv.header.tax_amount = raw['tax']
        inv.header.gross_amount = raw['gross']
        inv.header.supplier_name = raw['supplier']
        inv.header.order_number_1 = raw.get('po_number')

        # Lines
        for line_number, line in enumerate(raw['lines'], start=1):
            item = inv.new_line()
            item.invoice_line_number = line_number
            item.product_name = line['desc']
            item.quantity = line['qty']
            item.unit_price = line['price']
            item.net_amount = line['net']
            item.tax_amount = line.get('tax', 0)

        # Addresses — address_type must be REMITTO | SUPPLIER | SHIPTO | BILLTO
        addr = inv.new_address('SUPPLIER')
        addr.street = raw['supplier_addr']['street']
        addr.city = raw['supplier_addr']['city']
        addr.country = raw['supplier_addr']['country']

        # Optional tax breakdown
        tax = inv.new_tax()
        tax.tax_name = 'VAT'
        tax.tax_rate = 21
        tax.tax_amount = raw['tax']

        # Attach the original PDF (and any supporting docs)
        inv.attach_image(filename=raw['pdf_name'], content=raw['pdf_bytes'])
        for att in raw.get('supporting_docs', []):
            inv.attach_file(filename=att['name'], content=att['bytes'])

        # Submit — uploads attachments first, then POSTs the structured invoice
        result = inv.submit()

        context.logger.info(
            f"Imported {raw['number']} → invoice_token={result.invoice_token}",
            step='import',
            detail={
                'invoice_status': result.invoice_status,
                'template_id': result.template_id,
            },
        )
        imported += 1

    return {'imported': imported}

Workflow choice

workflow controls how far Nuntiq processes the invoice on import:

Value	Resulting status	Use when
`payment_portal` (default)	100 (Processed)	The invoice arrives fully verified — skip enrichment entirely
`basic_enrichment`	10	Run light enrichment but no AI
`full_enrichment`	1	Run the full pipeline (AI match, validation)

If you trust the upstream data, payment_portal is faster — it sends the invoice straight to the payable stage without re-running OCR or AI match.
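
One way to use the table above is as a post-submit sanity check. This sketch assumes that result.invoice_status, as returned by submit(), equals the "Resulting status" column for the chosen workflow; the names EXPECTED_STATUS, check_workflow and landed_as_expected are hypothetical.

```python
# Workflow values and resulting statuses, copied from the table above.
EXPECTED_STATUS = {
    'payment_portal': 100,    # Processed
    'basic_enrichment': 10,
    'full_enrichment': 1,
}


def check_workflow(workflow: str) -> str:
    """Return the workflow unchanged, or raise ValueError if it is not a known value."""
    if workflow not in EXPECTED_STATUS:
        raise ValueError(
            f'Unknown workflow {workflow!r}; expected one of {sorted(EXPECTED_STATUS)}'
        )
    return workflow


def landed_as_expected(workflow: str, invoice_status: int) -> bool:
    """True when the status on the import result matches the table for this workflow."""
    return EXPECTED_STATUS[check_workflow(workflow)] == invoice_status
```

A connector could log a warning when landed_as_expected(result.workflow, result.invoice_status) is false, rather than failing the run.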

Attachment rules

inv.attach_image(filename, content) — at most one per invoice. This is the PDF the AP team will view in the Nuntiq viewer. Calling attach_image() twice raises ValueError.
inv.attach_file(filename, content) — 0..N supporting documents (delivery notes, supporting POs, etc.). Visible to the AP team but not the primary view.
If you don't pass an IMAGE, the invoice has no viewable PDF and the AP team sees only the structured fields. That is usually poor UX, so aim to always include one.
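
When the upstream system hands over a mixed bag of documents, the at-most-one-image rule can be applied before calling the builder. This is a minimal sketch under stated assumptions: each document is a dict with 'name' and 'bytes' keys (as in the supporting_docs loop above), the first PDF is treated as the primary image, and split_attachments and attach_all are hypothetical helper names.

```python
from typing import List, Optional, Tuple


def split_attachments(docs: List[dict]) -> Tuple[Optional[dict], List[dict]]:
    """Pick the first PDF as the viewable image; everything else is a supporting file."""
    image = None
    files = []
    for doc in docs:
        if image is None and doc['name'].lower().endswith('.pdf'):
            image = doc
        else:
            files.append(doc)
    return image, files


def attach_all(inv, docs: List[dict]) -> bool:
    """Attach at most one image plus any supporting files. Returns True if an image was attached."""
    image, files = split_attachments(docs)
    if image is not None:
        inv.attach_image(filename=image['name'], content=image['bytes'])
    for doc in files:
        inv.attach_file(filename=doc['name'], content=doc['bytes'])
    return image is not None
```

The boolean return value makes it easy to log or skip invoices that would otherwise arrive without a viewable PDF.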

Pattern 3 — Structured invoice only, no PDF

Same as Pattern 2, just skip the attach_image() and attach_file() calls. The invoice gets created with no viewable image — make sure the AP team is OK with that before you build a pipeline that does it at scale.
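
A minimal sketch of this pattern, using only the builder calls shown in Pattern 2. The function name import_without_pdf is hypothetical, the raw dict keys mirror the Pattern 2 example, and which header fields are actually required is not covered here.

```python
def import_without_pdf(ingest, raw: dict, receiving_inbox: str,
                       workflow: str = 'payment_portal'):
    """Submit structured fields only; the resulting invoice has no viewable image."""
    inv = ingest.new_invoice(workflow=workflow, receiving_inbox=receiving_inbox)

    # Header: set whatever fields the upstream feed provides
    inv.header.invoice_number = raw['number']
    inv.header.invoice_date = raw['date']          # ISO date string
    inv.header.currency_code = raw['currency']
    inv.header.gross_amount = raw['gross']
    inv.header.supplier_name = raw['supplier']

    # Deliberately no attach_image() / attach_file() calls
    return inv.submit()
```

Inside run(context), this would be called with ingest = IngestionLoad(context), exactly as in Pattern 2.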

What `submit()` returns

IngestedInvoice.submit() returns an InvoiceImportResult:

Field	Type	Notes
`invoice_token`	str (UUID)	Pass to `InvoiceLoad.get_by_token()`
`invoice_id`	int	Internal numeric id
`source_document_id`	int	Underlying source-document id
`workflow`	str	The workflow used
`invoice_status`	int	Final numeric invoice status (e.g. 100 = Processed)
`source_document_status`	int	Final source-document status
`template_id` / `template_type`	int / str	Template Nuntiq picked + whether it's `'global'` or `'customer'`
`attachments_linked`	int	Count of attachments tied to the invoice
`receiving_inbox`	str	The inbox used for template lookup

IngestionLoad.ingest_pdf() / ingest_files() return a smaller SourceDocumentResult:

Field	Type
`source_document_id`	int
`files_uploaded`	int
`pdf_files_queued`	int
`upload_session_id`	str (UUID)

Both objects carry a _raw attribute with the full HTTP response if you need to read fields the wrappers haven't surfaced.

Error handling

Local validation (workflow value, address_type value, max-one-image) raises ValueError before any HTTP call, so these mistakes surface as soon as the code runs instead of as a server-side rejection.

ingest_pdf() and submit() raise ApiError on transport / HTTP failure. See ApiClient → Error handling.

Common server-side rejections:

Status	Body	Cause
400	`{message: 'workflow is required'}`	Missing `workflow` on `new_invoice`
400	`{message: 'Unknown document_token: ...'}`	Attachment token doesn't exist or was already linked (shouldn't happen via `IngestionLoad` — it does the linking for you)
404	`{message: 'No template found for receiving_inbox ...'}`	The `receiving_inbox` you passed isn't configured on the tenant
500	varies	Server-side; usually transient — retry
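
Since 500 responses are usually transient, a small retry wrapper with exponential backoff is one way to handle them. This sketch is not part of IngestionLoad: call_with_retry is a hypothetical helper, and it assumes ApiError exposes the HTTP status as a status_code attribute (check ApiClient → Error handling for the real name). Because there is no idempotency key, a retry after a 500 can create a duplicate if the server had already stored the record, so keep the retry count low.

```python
import time


def call_with_retry(fn, attempts: int = 3, base_delay: float = 1.0, sleep=time.sleep):
    """Call fn(); retry only on errors that carry a 5xx status, doubling the delay each time."""
    if attempts < 1:
        raise ValueError('attempts must be at least 1')
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception as exc:
            # Assumption: the API error carries the HTTP status as `status_code`.
            status = getattr(exc, 'status_code', None)
            transient = isinstance(status, int) and status >= 500
            if not transient or attempt == attempts:
                raise                      # 4xx, ValueError, or out of attempts
            sleep(base_delay * 2 ** (attempt - 1))
```

Usage would look like result = call_with_retry(inv.submit). ValueError from local validation and 4xx rejections are re-raised immediately, since retrying them cannot succeed.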

Choosing between Pattern 1 and Pattern 2

The simplest filter:

Have a PDF only → Pattern 1.
Have structured data + PDF → Pattern 2.
Have structured data only, or don't care about a viewable PDF → Pattern 3.

If you have both options available (structured data AND a PDF), Pattern 2 is strictly better than Pattern 1 — Nuntiq skips OCR + template extraction (faster and more deterministic) and you still get the PDF for the AP team to view.
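
The same decision can be written as a small function, which is handy when one connector handles several upstream feeds. The name choose_pattern is hypothetical; the return value is simply the pattern number used on this page.

```python
def choose_pattern(has_pdf: bool, has_structured: bool) -> int:
    """Return the pattern number (1, 2 or 3) for the inputs that are available."""
    if has_structured:
        # Structured data wins: skip OCR, and attach the PDF when there is one.
        return 2 if has_pdf else 3
    if has_pdf:
        return 1
    raise ValueError('Nothing to ingest: need a PDF, structured data, or both')
```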

Idempotency

Neither endpoint has an idempotency-key concept today. If you call ingest_pdf or inv.submit() twice with the same data, Nuntiq creates two records. Track which upstream records you've already submitted in your own state — typically a delta_run cursor — and don't re-submit.
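
A minimal sketch of connector-side de-duplication. It uses a local JSON file as a stand-in for whatever state store you actually have (the delta_run cursor API is not shown on this page, so it is not used here); SubmittedLedger and submit_once are hypothetical names, and upstream ids are assumed to be strings.

```python
import json
from pathlib import Path


class SubmittedLedger:
    """Remembers which upstream ids have already been submitted."""

    def __init__(self, path):
        self.path = Path(path)
        self.seen = set(json.loads(self.path.read_text())) if self.path.exists() else set()

    def already_submitted(self, upstream_id: str) -> bool:
        return upstream_id in self.seen

    def mark_submitted(self, upstream_id: str) -> None:
        self.seen.add(upstream_id)
        self.path.write_text(json.dumps(sorted(self.seen)))


def submit_once(ledger: SubmittedLedger, upstream_id: str, submit):
    """Call submit() unless this id was submitted before. Returns None when skipped."""
    if ledger.already_submitted(upstream_id):
        return None
    result = submit()
    ledger.mark_submitted(upstream_id)   # recorded only after submit() succeeded
    return result
```

In Pattern 2 this would wrap the final call, for example submit_once(ledger, raw['number'], inv.submit). If the process dies between submit() and mark_submitted(), the record is re-sent on the next run, so the ledger narrows the duplicate window without closing it completely.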

Raw helpers (escape hatch)

If you need finer control than IngestionLoad offers, the raw helpers on context.api are still available:

context.api.upload_source_documents(files, ...) — multipart upload to /v1/source-document. Returns the raw {data: {sourceDocumentId, ...}} envelope.
context.api.upload_invoice_attachments(files) — multipart upload to /v1/invoice-attachments. Returns {data: {documents: [{document_token, type}, ...]}}.
context.api.invoice_import(body) — POST to /v1/invoice-import with the body shape documented in the Customer API reference.

The IngestionLoad implementation in lib/objects/ingestion.py is the canonical example of how to thread tokens between them.
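
If you do use the raw helpers, the first step of the token threading is pulling the document tokens out of the upload envelope. This sketch relies only on the response shape quoted above; tokens_by_type is a hypothetical helper, and the 'IMAGE' / 'FILE' type values in the usage note are assumptions. How the tokens are placed in the invoice_import body is defined in the Customer API reference and is not shown here.

```python
def tokens_by_type(envelope: dict) -> dict:
    """Group document tokens from an upload_invoice_attachments() response by their type."""
    grouped = {}
    for doc in envelope['data']['documents']:
        grouped.setdefault(doc['type'], []).append(doc['document_token'])
    return grouped
```

For a response containing one document of type 'IMAGE' and two of type 'FILE', this returns a dict with one token under 'IMAGE' and two under 'FILE', in upload order.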

What's next

IngestionLoad reference — field-level reference for the builder.
Invoice claim flow — once an invoice is ingested, this is how downstream connectors push it onward to an ERP.
Lifecycle messages — once your upstream system knows the invoice's Nuntiq token, write lifecycle events back so the Nuntiq timeline reflects what's happening in your system.
