Invoice ingestion

The pull-side patterns (REST API pull, SFTP pull, SFTP push) all live in your customer's domain — they assume Nuntiq already has the invoice and you're either pulling supporting master data in or pushing the invoice out to an ERP.

This page covers the other direction: putting invoices into Nuntiq from a connector. Two flows, both wrapped in IngestionLoad.

Flow | Use when
--- | ---
IngestionLoad.ingest_pdf(...) | You have a PDF / image / ZIP and want Nuntiq's standard pipeline (OCR → template extraction → enrichment). Same path as email intake.
IngestionLoad.new_invoice() + .submit() | You already have parsed invoice data (from an XML / EDI feed) plus optionally the original PDF, and you want to import the structured fields directly without OCR.

The raw context.api.upload_source_documents / upload_invoice_attachments / invoice_import helpers are still available if you need them, but IngestionLoad is the recommended entry point — it handles the upload-then-import token-threading dance for you, validates inputs before the HTTP call, and returns typed results.

Pattern 1 — Drop a PDF into the standard pipeline

This is the simplest case: you have a file and want Nuntiq to process it the same way it processes emailed invoices. Nuntiq runs OCR, picks a template, extracts fields, and runs validation and enrichment, leaving a fully processed invoice at status 100 (or whichever stop you configure).

connectors/pull_supplier_portal_invoices.py
"""Pull invoice PDFs from a supplier portal and ingest them."""
import requests

from lib.objects.ingestion import IngestionLoad


def run(context):
    portal_url = context.get_config('portal_url')
    api_token = context.get_secret('portal_api_token')
    receiving_inbox = context.get_config('receiving_inbox')  # e.g. 'invoices@acme.apreceiving.com'

    ingest = IngestionLoad(context)

    # Fetch the list of new invoices from the supplier portal
    headers = {'Authorization': f'Bearer {api_token}'}
    resp = requests.get(f'{portal_url}/new-invoices', headers=headers, timeout=30)
    resp.raise_for_status()

    uploaded = 0
    for inv in resp.json()['invoices']:
        # Download the PDF
        pdf_resp = requests.get(inv['pdf_url'], headers=headers, timeout=60)
        pdf_resp.raise_for_status()

        # Ingest
        result = ingest.ingest_pdf(
            filename=inv['filename'],
            content=pdf_resp.content,
            receiving_inbox=receiving_inbox,
            supplier_code=inv.get('supplier_code'),
        )

        context.logger.info(
            f"Ingested {inv['filename']} → source_document_id={result.source_document_id}",
            step='ingest',
            detail={'supplier_code': inv.get('supplier_code')},
        )
        uploaded += 1

    return {'uploaded': uploaded}

What receiving_inbox does

It's the address Nuntiq uses for template lookup. Different receiving inboxes can map to different templates — e.g. xml@acme.apreceiving.com might use a structured-data template, invoices@acme.apreceiving.com uses the standard OCR template. Default to your customer's primary inbox unless you have a specific routing reason.
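
If you do have a routing reason, one way to keep it explicit is a small lookup with the primary inbox as the fallback. This is only a sketch: pick_receiving_inbox and the extension-based overrides map are hypothetical helpers, not part of IngestionLoad, and routing by file extension is just one possible rule.

```python
from pathlib import PurePath


def pick_receiving_inbox(filename, default_inbox, overrides=None):
    """Return the inbox for a file, falling back to the default inbox.

    `overrides` maps a lowercase file extension (e.g. '.xml') to an inbox
    address. Both the helper and the mapping are illustrative assumptions.
    """
    overrides = overrides or {}
    suffix = PurePath(filename).suffix.lower()
    return overrides.get(suffix, default_inbox)


# Example addresses taken from the text above.
ROUTES = {'.xml': 'xml@acme.apreceiving.com'}
inbox = pick_receiving_inbox('feed.XML', 'invoices@acme.apreceiving.com', ROUTES)
```

The result can then be passed as the receiving_inbox argument to ingest_pdf() or new_invoice().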

Hints

  • company_code and supplier_code are both optional. Pass them when you know them: the worker uses them to skip the AI-match step for those fields.
  • ZIP files are unzipped automatically. Send up to 20 files per request; for larger batches, use ingest_files() instead and chunk.
  • Max file size is 50MB per file.
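
The two limits above can be enforced before any upload. The sketch below splits a list of (filename, content) pairs into batches of at most 20 and sets oversized files aside. chunk_files is a hypothetical helper, the 50MB limit is assumed to mean 50 × 1024 × 1024 bytes, and the exact signature of ingest_files() is not shown here because this page does not document it.

```python
MAX_FILES_PER_REQUEST = 20          # per-request limit from the hints above
MAX_FILE_BYTES = 50 * 1024 * 1024   # 50MB, assuming binary megabytes


def chunk_files(files, batch_size=MAX_FILES_PER_REQUEST, max_bytes=MAX_FILE_BYTES):
    """Split (filename, content) pairs into upload-sized batches.

    Returns (batches, rejected): batches is a list of lists of pairs,
    rejected is a list of filenames whose content exceeds max_bytes.
    """
    accepted, rejected = [], []
    for filename, content in files:
        if len(content) > max_bytes:
            rejected.append(filename)
        else:
            accepted.append((filename, content))
    batches = [accepted[i:i + batch_size] for i in range(0, len(accepted), batch_size)]
    return batches, rejected
```

Each batch can then be handed to one ingest_files() call. Files inside a ZIP are not counted by this helper; whether they count toward the 20-file limit is not stated on this page.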

Pattern 2 — Submit structured invoice + attach the original PDF

When you already have parsed invoice data (an XML feed, an EDI 810, a JSON payload from an upstream system) AND the original PDF, this is the right pattern. The builder handles the upload-then-import dance — you never touch document tokens directly.

connectors/import_structured_invoices.py
from lib.objects.ingestion import IngestionLoad


def run(context):
    receiving_inbox = context.get_config('receiving_inbox')
    ingest = IngestionLoad(context)

    imported = 0
    for raw in fetch_structured_invoices():  # your code
        inv = ingest.new_invoice(
            workflow='payment_portal',
            receiving_inbox=receiving_inbox,
        )

        # Header: set whatever fields you have
        inv.header.invoice_number = raw['number']
        inv.header.invoice_date = raw['date']  # ISO date string
        inv.header.due_date = raw['due_date']
        inv.header.currency_code = raw['currency']
        inv.header.net_amount = raw['net']
        inv.header.tax_amount = raw['tax']
        inv.header.gross_amount = raw['gross']
        inv.header.supplier_name = raw['supplier']
        inv.header.order_number_1 = raw.get('po_number')

        # Lines (numbered from 1)
        for line_number, line in enumerate(raw['lines'], start=1):
            item = inv.new_line()
            item.invoice_line_number = line_number
            item.product_name = line['desc']
            item.quantity = line['qty']
            item.unit_price = line['price']
            item.net_amount = line['net']
            item.tax_amount = line.get('tax', 0)

        # Addresses: address_type must be REMITTO | SUPPLIER | SHIPTO | BILLTO
        addr = inv.new_address('SUPPLIER')
        addr.street = raw['supplier_addr']['street']
        addr.city = raw['supplier_addr']['city']
        addr.country = raw['supplier_addr']['country']

        # Optional tax breakdown
        tax = inv.new_tax()
        tax.tax_name = 'VAT'
        tax.tax_rate = 21
        tax.tax_amount = raw['tax']

        # Attach the original PDF (and any supporting docs)
        inv.attach_image(filename=raw['pdf_name'], content=raw['pdf_bytes'])
        for att in raw.get('supporting_docs', []):
            inv.attach_file(filename=att['name'], content=att['bytes'])

        # Submit: uploads attachments first, then POSTs the structured invoice
        result = inv.submit()

        context.logger.info(
            f"Imported {raw['number']} → invoice_token={result.invoice_token}",
            step='import',
            detail={
                'invoice_status': result.invoice_status,
                'template_id': result.template_id,
            },
        )
        imported += 1

    return {'imported': imported}

Workflow choice

workflow controls how far Nuntiq processes the invoice on import:

Value | Resulting status | Use when
--- | --- | ---
payment_portal (default) | 100 (Processed) | The invoice arrives fully verified; skip enrichment entirely
basic_enrichment | 10 | Run light enrichment but no AI
full_enrichment | 1 | Run the full pipeline (AI match, validation)

If you trust the upstream data, payment_portal is faster — it sends the invoice straight to the payable stage without re-running OCR or AI match.
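
A connector can sanity-check the status it gets back against the table above. The following is a minimal sketch: check_import_status is a hypothetical helper, and it assumes the invoice_status reported by submit() equals the table's resulting status, which may not hold if later processing has already moved the invoice on.

```python
# Status values copied from the workflow table above.
EXPECTED_STATUS = {
    'payment_portal': 100,
    'basic_enrichment': 10,
    'full_enrichment': 1,
}


def check_import_status(workflow, invoice_status):
    """Return True if the status matches what the workflow table predicts.

    Raises ValueError for a workflow name that is not in the table.
    """
    if workflow not in EXPECTED_STATUS:
        raise ValueError(f'Unknown workflow: {workflow!r}')
    return invoice_status == EXPECTED_STATUS[workflow]
```

A mismatch is worth logging rather than failing on, since the assumption above is not confirmed by this page.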

Attachment rules

  • inv.attach_image(filename, content) — at most one per invoice. This is the PDF the AP team will view in the Nuntiq viewer. Calling attach_image() twice raises ValueError.
  • inv.attach_file(filename, content) — 0..N supporting documents (delivery notes, supporting POs, etc.). Visible to the AP team but not the primary view.
  • If you don't attach an image, the invoice has no viewable PDF and the AP team sees the structured fields only. That is usually a poor experience, so aim to always include the image.
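
When the upstream system hands over a mixed bag of documents, it helps to decide up front which one becomes the image so attach_image() is never called twice. The sketch below is one possible rule, not Nuntiq's: split_attachments is a hypothetical helper that treats the first PDF (or an explicitly named file) as the primary document and everything else as supporting files.

```python
def split_attachments(docs, primary_name=None):
    """Pick one primary document and treat the rest as supporting files.

    `docs` is a list of (filename, content) pairs. Returns (image, files),
    where image is a single pair or None and files is a list of pairs.
    The "first PDF wins" rule is an illustrative assumption.
    """
    image = None
    files = []
    for doc in docs:
        name = doc[0]
        if primary_name is not None:
            is_primary = name == primary_name
        else:
            is_primary = name.lower().endswith('.pdf')
        if image is None and is_primary:
            image = doc
        else:
            files.append(doc)
    return image, files
```

With the result in hand, call inv.attach_image() once if image is not None, then inv.attach_file() for each entry in files.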

Pattern 3 — Structured invoice only, no PDF

Same as Pattern 2, just skip the attach_image() and attach_file() calls. The invoice gets created with no viewable image — make sure the AP team is OK with that before you build a pipeline that does it at scale.

What submit() returns

IngestedInvoice.submit() returns an InvoiceImportResult:

Field | Type | Notes
--- | --- | ---
invoice_token | str (UUID) | Pass to InvoiceLoad.get_by_token()
invoice_id | int | Internal numeric id
source_document_id | int | Underlying source-document id
workflow | str | The workflow used
invoice_status | int | Final numeric invoice status (e.g. 100 = Processed)
source_document_status | int | Final source-document status
template_id / template_type | int / str | Template Nuntiq picked + whether it's 'global' or 'customer'
attachments_linked | int | Count of attachments tied to the invoice
receiving_inbox | str | The inbox used for template lookup

IngestionLoad.ingest_pdf() / ingest_files() return a smaller SourceDocumentResult:

Field | Type
--- | ---
source_document_id | int
files_uploaded | int
pdf_files_queued | int
upload_session_id | str (UUID)

Both objects carry a _raw attribute with the full HTTP response if you need to read fields the wrappers haven't surfaced.
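
A small accessor can hide the difference between a typed field and one that only exists in _raw. This sketch assumes _raw is the parsed JSON body as nested dicts, which this page does not confirm; read_field, the key path, and the example field names are hypothetical.

```python
from types import SimpleNamespace


def read_field(result, attr, raw_path, default=None):
    """Prefer the typed attribute; otherwise walk raw_path through result._raw.

    Assumes result._raw is a nested dict (parsed JSON). Returns `default`
    when neither the attribute nor the raw path is present.
    """
    value = getattr(result, attr, None)
    if value is not None:
        return value
    node = getattr(result, '_raw', None)
    for key in raw_path:
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


# Stand-in object for illustration only; real results come from IngestionLoad.
fake = SimpleNamespace(source_document_id=7, _raw={'data': {'someNewField': 'x'}})
new_value = read_field(fake, 'some_new_field', ['data', 'someNewField'])
```

Check the actual shape of _raw in your environment before relying on a key path like the one above.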

Error handling

Local validation (workflow value, address_type value, max-one-image) raises ValueError before any HTTP call, so a bad value fails immediately when the code runs rather than after a round trip to the server.

ingest_pdf() and submit() raise ApiError on transport / HTTP failure. See ApiClient → Error handling.

Common server-side rejections:

Status | Body | Cause
--- | --- | ---
400 | {message: 'workflow is required'} | Missing workflow on new_invoice
400 | {message: 'Unknown document_token: ...'} | Attachment token doesn't exist or was already linked (shouldn't happen via IngestionLoad; it does the linking for you)
404 | {message: 'No template found for receiving_inbox ...'} | The receiving_inbox you passed isn't configured on the tenant
500 | varies | Server-side; usually transient, so retry
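
For the transient 500 case, a bounded retry with exponential backoff is usually enough. The helper below is a generic sketch, not part of the Nuntiq library: retry_transient is a hypothetical name, and the caller supplies the test for "transient" because the attributes ApiError exposes are not documented on this page.

```python
import time


def retry_transient(call, is_transient, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Run call(); on a transient failure, wait and try again.

    Delays grow as base_delay, 2 * base_delay, 4 * base_delay, ...
    The last failure, or any non-transient failure, is re-raised unchanged.
    """
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except Exception as exc:
            if attempt == attempts or not is_transient(exc):
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

A call might look like retry_transient(lambda: inv.submit(), lambda e: getattr(e, 'status_code', None) == 500), where status_code is an assumed attribute name. Because there is no idempotency key, a retry after a request that actually succeeded server-side can create a duplicate, so keep the attempt count low and record what was submitted.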

Choosing between Pattern 1 and Pattern 2

The simplest filter:

  • Have a PDF only → Pattern 1.
  • Have structured data + PDF → Pattern 2.
  • Have structured data only, or don't need a viewable PDF → Pattern 3.

If you have both options available (structured data AND a PDF), Pattern 2 is strictly better than Pattern 1 — Nuntiq skips OCR + template extraction (faster and more deterministic) and you still get the PDF for the AP team to view.

Idempotency

Neither endpoint has an idempotency-key concept today. If you call ingest_pdf or inv.submit() twice with the same data, Nuntiq creates two records. Track which upstream records you've already submitted in your own state — typically a delta_run cursor — and don't re-submit.
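
As an illustration of tracking submitted records yourself, the sketch below keeps a set of upstream record ids in a JSON file. SubmittedLedger is a hypothetical stand-in: inside a connector, the platform's own cursor or state mechanism is likely the better home for this, and its API is not covered here.

```python
import json
from pathlib import Path


class SubmittedLedger:
    """Remember which upstream record ids were already submitted.

    State is persisted as a JSON list in `path` so it survives restarts.
    """

    def __init__(self, path):
        self.path = Path(path)
        if self.path.exists():
            self.seen = set(json.loads(self.path.read_text()))
        else:
            self.seen = set()

    def already_submitted(self, record_id):
        return record_id in self.seen

    def mark_submitted(self, record_id):
        # Persist after every mark so a crash loses at most the current record.
        self.seen.add(record_id)
        self.path.write_text(json.dumps(sorted(self.seen)))
```

In the import loop, skip a record when already_submitted() is true, and call mark_submitted() only after submit() or ingest_pdf() has returned successfully.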

Raw helpers (escape hatch)

If you need finer control than IngestionLoad offers, the raw helpers on context.api are still available:

  • context.api.upload_source_documents(files, ...) — multipart upload to /v1/source-document. Returns the raw {data: {sourceDocumentId, ...}} envelope.
  • context.api.upload_invoice_attachments(files) — multipart upload to /v1/invoice-attachments. Returns {data: {documents: [{document_token, type}, ...]}}.
  • context.api.invoice_import(body) — POST to /v1/invoice-import with the body shape documented in the Customer API reference.

The IngestionLoad implementation in lib/objects/ingestion.py is the canonical example of how to thread tokens between them.
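
If you do go the raw route, the first step of token threading is pulling the document tokens out of the upload response. The sketch below relies only on the {data: {documents: [{document_token, type}, ...]}} envelope described above. tokens_by_type is a hypothetical helper, and how the tokens are placed in the invoice_import body is left to the Customer API reference.

```python
def tokens_by_type(upload_response):
    """Group document tokens by their type.

    Expects the envelope returned by upload_invoice_attachments:
    {'data': {'documents': [{'document_token': ..., 'type': ...}, ...]}}
    Returns a dict mapping each type to a list of tokens, in upload order.
    """
    grouped = {}
    for doc in upload_response.get('data', {}).get('documents', []):
        grouped.setdefault(doc['type'], []).append(doc['document_token'])
    return grouped
```

The type values themselves (for example IMAGE) come from the server; treat any specific value other than those named on this page as something to verify.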

What's next

  • IngestionLoad reference — field-level reference for the builder.
  • Invoice claim flow — once an invoice is ingested, this is how downstream connectors push it onward to an ERP.
  • Lifecycle messages — once your upstream system knows the invoice's Nuntiq token, write lifecycle events back so the Nuntiq timeline reflects what's happening in your system.