Invoice ingestion
The pull-side patterns (REST API pull, SFTP pull, SFTP push) all live in your customer's domain — they assume Nuntiq already has the invoice and you're either pulling supporting master data in or pushing the invoice out to an ERP.
This page covers the other direction: putting invoices into Nuntiq from a
connector. There are two flows, both wrapped in IngestionLoad.
| Flow | Use when |
|---|---|
| IngestionLoad.ingest_pdf(...) | You have a PDF / image / ZIP and want Nuntiq's standard pipeline (OCR → template extraction → enrichment). Same path as email intake. |
| IngestionLoad.new_invoice() + .submit() | You already have parsed invoice data (from an XML / EDI feed) plus optionally the original PDF, and you want to import the structured fields directly without OCR. |
The raw context.api.upload_source_documents / upload_invoice_attachments /
invoice_import helpers are still available if you need them, but
IngestionLoad is the recommended entry point — it handles the
upload-then-import token-threading dance for you, validates inputs before
the HTTP call, and returns typed results.
Pattern 1 — Drop a PDF into the standard pipeline
The simplest case: you have a file and want Nuntiq to process it the same way it processes emailed invoices. Nuntiq runs OCR, picks a template, extracts fields, runs validation and enrichment, and ends with a fully processed invoice at status 100 (or whichever stop you configure).
"""Pull invoice PDFs from a supplier portal and ingest them."""
import requests

from lib.objects.ingestion import IngestionLoad


def run(context):
    portal_url = context.get_config('portal_url')
    api_token = context.get_secret('portal_api_token')
    receiving_inbox = context.get_config('receiving_inbox')  # e.g. 'invoices@acme.apreceiving.com'
    ingest = IngestionLoad(context)

    # Fetch the list of new invoices from the supplier portal
    headers = {'Authorization': f'Bearer {api_token}'}
    resp = requests.get(f'{portal_url}/new-invoices', headers=headers, timeout=30)
    resp.raise_for_status()

    uploaded = 0
    for inv in resp.json()['invoices']:
        # Download the PDF
        pdf_resp = requests.get(inv['pdf_url'], headers=headers, timeout=60)
        pdf_resp.raise_for_status()

        # Ingest
        result = ingest.ingest_pdf(
            filename=inv['filename'],
            content=pdf_resp.content,
            receiving_inbox=receiving_inbox,
            supplier_code=inv.get('supplier_code'),
        )
        context.logger.info(
            f"Ingested {inv['filename']} → source_document_id={result.source_document_id}",
            step='ingest',
            detail={'supplier_code': inv.get('supplier_code')},
        )
        uploaded += 1

    return {'uploaded': uploaded}
What receiving_inbox does
It's the address Nuntiq uses for template lookup. Different receiving
inboxes can map to different templates: for example, xml@acme.apreceiving.com
might use a structured-data template while invoices@acme.apreceiving.com uses
the standard OCR template. Default to your customer's primary inbox unless
you have a specific routing reason.
Hints
- company_code and supplier_code are both optional. Pass them when you know them; the worker uses them to skip the AI-match step for those fields.
- ZIP files are unzipped automatically. Send up to 20 files per request; for larger batches, use ingest_files() instead and chunk.
- Max file size is 50MB per file.
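The two limits above (20 files per request, 50MB per file) can be checked before anything is uploaded. The following is a minimal sketch, assuming files are held in memory as (filename, bytes) pairs. The helper name chunk_files and the reading of "50MB" as 50 * 1024 * 1024 bytes are assumptions, and because the parameters of ingest_files() are not shown on this page, the call appears only as a commented-out, hypothetical usage.

```python
from typing import Iterator, List, Tuple

MAX_FILES_PER_REQUEST = 20           # limit stated in the hints above
MAX_FILE_BYTES = 50 * 1024 * 1024    # assumption: "50MB" read as 50 MiB

FilePair = Tuple[str, bytes]


def chunk_files(files: List[FilePair],
                size: int = MAX_FILES_PER_REQUEST) -> Iterator[List[FilePair]]:
    """Yield batches of at most `size` files, rejecting oversized files first."""
    for name, content in files:
        if len(content) > MAX_FILE_BYTES:
            raise ValueError(f"{name} is {len(content)} bytes, over the per-file limit")
    for start in range(0, len(files), size):
        yield files[start:start + size]


# Hypothetical usage; check the IngestionLoad reference for the real
# ingest_files() parameters before relying on this call shape:
# for batch in chunk_files(all_files):
#     ingest.ingest_files(files=batch, receiving_inbox=receiving_inbox)
```

Validating sizes up front means a single oversized file fails the run before any batch has been sent, which keeps partial uploads out of the picture.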
Pattern 2 — Submit structured invoice + attach the original PDF
When you already have parsed invoice data (an XML feed, an EDI 810, a JSON payload from an upstream system) AND the original PDF, this is the right pattern. The builder handles the upload-then-import dance — you never touch document tokens directly.
from lib.objects.ingestion import IngestionLoad


def run(context):
    receiving_inbox = context.get_config('receiving_inbox')
    ingest = IngestionLoad(context)

    imported = 0
    for raw in fetch_structured_invoices():  # your code
        inv = ingest.new_invoice(
            workflow='payment_portal',
            receiving_inbox=receiving_inbox,
        )

        # Header: set whatever fields you have
        inv.header.invoice_number = raw['number']
        inv.header.invoice_date = raw['date']  # ISO date string
        inv.header.due_date = raw['due_date']
        inv.header.currency_code = raw['currency']
        inv.header.net_amount = raw['net']
        inv.header.tax_amount = raw['tax']
        inv.header.gross_amount = raw['gross']
        inv.header.supplier_name = raw['supplier']
        inv.header.order_number_1 = raw.get('po_number')

        # Lines (numbered from 1)
        for line_number, line in enumerate(raw['lines'], start=1):
            item = inv.new_line()
            item.invoice_line_number = line_number
            item.product_name = line['desc']
            item.quantity = line['qty']
            item.unit_price = line['price']
            item.net_amount = line['net']
            item.tax_amount = line.get('tax', 0)

        # Addresses: address_type must be REMITTO | SUPPLIER | SHIPTO | BILLTO
        addr = inv.new_address('SUPPLIER')
        addr.street = raw['supplier_addr']['street']
        addr.city = raw['supplier_addr']['city']
        addr.country = raw['supplier_addr']['country']

        # Optional tax breakdown
        tax = inv.new_tax()
        tax.tax_name = 'VAT'
        tax.tax_rate = 21
        tax.tax_amount = raw['tax']

        # Attach the original PDF (and any supporting docs)
        inv.attach_image(filename=raw['pdf_name'], content=raw['pdf_bytes'])
        for att in raw.get('supporting_docs', []):
            inv.attach_file(filename=att['name'], content=att['bytes'])

        # Submit: uploads attachments first, then POSTs the structured invoice
        result = inv.submit()
        context.logger.info(
            f"Imported {raw['number']} → invoice_token={result.invoice_token}",
            step='import',
            detail={
                'invoice_status': result.invoice_status,
                'template_id': result.template_id,
            },
        )
        imported += 1

    return {'imported': imported}
Workflow choice
workflow controls how far Nuntiq processes the invoice on import:
| Value | Resulting status | Use when |
|---|---|---|
| payment_portal (default) | 100 (Processed) | The invoice arrives fully verified; skip enrichment entirely |
| basic_enrichment | 10 | Run light enrichment but no AI |
| full_enrichment | 1 | Run the full pipeline (AI match, validation) |
If you trust the upstream data, payment_portal is faster — it sends the
invoice straight to the payable stage without re-running OCR or AI match.
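Since each workflow lands on a known status, a connector can compare the result of an import against the table above. This is a minimal sketch: the constant and function names are illustrative, and the status numbers are copied from the table rather than read from the library, so they would need updating if a tenant is configured to stop elsewhere.

```python
# Status each workflow is expected to reach, copied from the table above.
EXPECTED_STATUS = {
    'payment_portal': 100,
    'basic_enrichment': 10,
    'full_enrichment': 1,
}


def reached_expected_status(workflow: str, invoice_status: int) -> bool:
    """Return True when an import ended at the status its workflow implies."""
    if workflow not in EXPECTED_STATUS:
        raise ValueError(f"unknown workflow: {workflow!r}")
    return invoice_status == EXPECTED_STATUS[workflow]
```

After inv.submit(), a check such as reached_expected_status(result.workflow, result.invoice_status) could feed a warning log line when an invoice stops short of where it was meant to go.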
Attachment rules
- inv.attach_image(filename, content): at most one per invoice. This is the PDF the AP team will view in the Nuntiq viewer. Calling attach_image() twice raises ValueError.
- inv.attach_file(filename, content): 0..N supporting documents (delivery notes, supporting POs, etc.). Visible to the AP team but not the primary view.
- If you don't pass any IMAGE, the invoice has no viewable PDF and the AP team sees the structured fields only. That is usually a poor UX, so aim to always include an IMAGE.
Pattern 3 — Structured invoice only, no PDF
Same as Pattern 2, just skip the attach_image() and attach_file() calls.
The invoice gets created with no viewable image — make sure the AP team is
OK with that before you build a pipeline that does it at scale.
What submit() returns
IngestedInvoice.submit() returns an InvoiceImportResult:
| Field | Type | Notes |
|---|---|---|
| invoice_token | str (UUID) | Pass to InvoiceLoad.get_by_token() |
| invoice_id | int | Internal numeric id |
| source_document_id | int | Underlying source-document id |
| workflow | str | The workflow used |
| invoice_status | int | Final numeric invoice status (e.g. 100 = Processed) |
| source_document_status | int | Final source-document status |
| template_id / template_type | int / str | Template Nuntiq picked, and whether it's 'global' or 'customer' |
| attachments_linked | int | Count of attachments tied to the invoice |
| receiving_inbox | str | The inbox used for template lookup |
IngestionLoad.ingest_pdf() / ingest_files() return a smaller
SourceDocumentResult:
| Field | Type |
|---|---|
| source_document_id | int |
| files_uploaded | int |
| pdf_files_queued | int |
| upload_session_id | str (UUID) |
Both objects carry a _raw attribute with the full HTTP response if you
need to read fields the wrappers haven't surfaced.
Error handling
Local validation (workflow value, address_type value, max-one-image) raises
ValueError before any HTTP call, so these mistakes surface as soon as the
code runs rather than as a server-side rejection.
ingest_pdf() and submit() raise ApiError on transport / HTTP failure.
See ApiClient → Error handling.
Common server-side rejections:
| Status | Body | Cause |
|---|---|---|
| 400 | {message: 'workflow is required'} | Missing workflow on new_invoice |
| 400 | {message: 'Unknown document_token: ...'} | Attachment token doesn't exist or was already linked (shouldn't happen via IngestionLoad — it does the linking for you) |
| 404 | {message: 'No template found for receiving_inbox ...'} | The receiving_inbox you passed isn't configured on the tenant |
| 500 | varies | Server-side; usually transient — retry |
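The table suggests treating 500s as retryable, while the 400 and 404 rows point at input or configuration problems that a retry will not fix. The sketch below shows one way to encode that split. The shape of ApiError is not described on this page, so the stand-in class and its status_code attribute are assumptions; swap in the real exception and whatever attribute it exposes. Because neither endpoint is idempotent, a retry after a 500 may create a duplicate if the first request was in fact processed.

```python
import time
from typing import Callable, TypeVar

T = TypeVar('T')


class ApiError(Exception):
    """Stand-in for the library's ApiError; status_code is an assumed attribute."""

    def __init__(self, status_code: int, message: str = ''):
        super().__init__(message)
        self.status_code = status_code


def call_with_retry(call: Callable[[], T], attempts: int = 3,
                    base_delay: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `call`, retrying only on 5xx errors with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except ApiError as exc:
            retryable = exc.status_code >= 500
            if not retryable or attempt == attempts:
                raise  # 4xx, or out of attempts: let the caller see it
            sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError('attempts must be at least 1')
```

With this in place, a submit could be wrapped as call_with_retry(inv.submit). The sleep parameter exists only so the backoff can be replaced in tests.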
Choosing between Pattern 1 and Pattern 2
The simplest filter:
- Have a PDF only → Pattern 1.
- Have structured data + PDF → Pattern 2.
- Have structured data only, or don't care about a viewable PDF → Pattern 3.
If you have both options available (structured data AND a PDF), Pattern 2 is strictly better than Pattern 1 — Nuntiq skips OCR + template extraction (faster and more deterministic) and you still get the PDF for the AP team to view.
Idempotency
Neither endpoint has an idempotency-key concept today. If you call
ingest_pdf or inv.submit() twice with the same data, Nuntiq creates
two records. Track which upstream records you've already submitted in your
own state — typically a delta_run cursor — and
don't re-submit.
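One way to track what has already been submitted is sketched below, under two assumptions: upstream records carry a stable id, and a small JSON file is an acceptable place to keep state. The SubmittedLedger class and its file format are hypothetical; in a real connector the delta_run cursor mentioned above would likely take the file's place.

```python
import json
from pathlib import Path
from typing import Set


class SubmittedLedger:
    """Remember which upstream ids were already submitted (hypothetical helper)."""

    def __init__(self, path: Path):
        self.path = path
        self.seen: Set[str] = set()
        if path.exists():
            self.seen = set(json.loads(path.read_text()))

    def already_submitted(self, upstream_id: str) -> bool:
        return upstream_id in self.seen

    def mark_submitted(self, upstream_id: str) -> None:
        # Persist right after each successful submit so a crash mid-run
        # does not cause the same record to be sent again on restart.
        self.seen.add(upstream_id)
        self.path.write_text(json.dumps(sorted(self.seen)))


# Hypothetical usage inside run(context):
# ledger = SubmittedLedger(Path('submitted.json'))
# for raw in fetch_structured_invoices():
#     if ledger.already_submitted(raw['number']):
#         continue
#     ...build the invoice and call inv.submit()...
#     ledger.mark_submitted(raw['number'])
```

Marking a record only after submit() returns means a failure between the two steps can still produce one duplicate on the next run; marking before would instead risk dropping a record. Which side to err on depends on the customer.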
Raw helpers (escape hatch)
If you need finer control than IngestionLoad offers, the raw helpers on
context.api are still available:
- context.api.upload_source_documents(files, ...): multipart upload to /v1/source-document. Returns the raw {data: {sourceDocumentId, ...}} envelope.
- context.api.upload_invoice_attachments(files): multipart upload to /v1/invoice-attachments. Returns {data: {documents: [{document_token, type}, ...]}}.
- context.api.invoice_import(body): POST to /v1/invoice-import with the body shape documented in the Customer API reference.
The IngestionLoad implementation in lib/objects/ingestion.py is the
canonical example of how to thread tokens between them.
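For orientation, the token-threading step amounts to pulling each document_token out of the attachment-upload response so it can be referenced in the import body. The sketch below works only on the envelope shape quoted above and assumes the helper returns it as a parsed dict. The function name is illustrative, and the import body itself is left out because its shape is defined in the Customer API reference, not on this page.

```python
from typing import Dict, List, Tuple


def document_tokens(upload_response: Dict) -> List[Tuple[str, str]]:
    """Return (document_token, type) pairs from an upload_invoice_attachments response."""
    # Assumed envelope: {'data': {'documents': [{'document_token': ..., 'type': ...}]}}
    documents = upload_response.get('data', {}).get('documents', [])
    return [(doc['document_token'], doc['type']) for doc in documents]
```

An empty or unexpected envelope yields an empty list here rather than an exception, which is a design choice for the sketch; a stricter connector might prefer to fail loudly.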
What's next
- IngestionLoad reference — field-level reference for the builder.
- Invoice claim flow — once an invoice is ingested, this is how downstream connectors push it onward to an ERP.
- Lifecycle messages — once your upstream system knows the invoice's Nuntiq token, write lifecycle events back so the Nuntiq timeline reflects what's happening in your system.