500 pages to test the API, one-time. No credit card. Join the waitlist
Document Intelligence API

Any PDF goes in, typed JSON comes out.

One REST call turns any invoice, receipt, or statement into typed JSON: fields, tables, and relations with per-field confidence and pixel-level traceability, read by our own proprietary extraction engine.

  • Proprietary engine
  • Enterprise encryption
  • US data residency
[Figure: Parsift reads a PDF invoice field by field and converts it into typed JSON. On the left, an invoice with four marked fields: vendor (A), invoice id (B), the 3×3 line-items table (C), and the total (D). Each field is wired to its matching key in the typed JSON on the right. Sample output: 200 OK, 4 fields, 1 table, 1,184 ms.]
Precision · 30d
98.4%
Latency · p50
1.2s/page
SLA observed · 12m
99.95%
Region
US · Azure

// measurement
Precision = exact field match on an internal benchmark of 12.4k labeled documents (invoices, contracts, forms); p50 latency per page under mixed loads. SLA observed · 12m is uptime measured across all tiers over the last 12 months; contractual SLA is set per plan (99% to 99.99%). Request the full methodology.
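As a rough illustration of what "exact field match" can mean, the sketch below scores one document by comparing each predicted field to its labeled value. The function name, the sample labels, and the per-document formula are assumptions made for this example; the actual benchmark methodology is only available on request.

```python
def exact_match_precision(predicted, expected):
    """Share of predicted fields whose value exactly equals the labeled value."""
    if not predicted:
        return 0.0
    correct = sum(
        1
        for name, value in predicted.items()
        if name in expected and expected[name] == value
    )
    return correct / len(predicted)


# Hypothetical prediction and ground-truth label for a single invoice.
predicted = {"vendor": "AWS Inc.", "id": "INV-2024", "total": 4500.00, "date": "2026-05-21"}
expected = {"vendor": "AWS Inc.", "id": "INV-2024", "total": 4500.00, "date": "2026-05-22"}

precision = exact_match_precision(predicted, expected)
print(f"{precision:.2%}")  # 3 of 4 fields match exactly
```

Exact matching is strict by design: a date that is off by one day counts as a miss, with no partial credit.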

How it works

Three steps, one REST call.

Send a document, the engine reads it, you get typed data back. No templates to maintain, no manual review in the loop.

STEP 01

Upload PDF

Drag & drop via dashboard or POST via API. We support PDFs, images, and Word docs with automatic format detection.

STEP 02

Engine reads it

Our proprietary engine identifies tables, key-value pairs, and entities with 98.4% field-level precision, measured on a 12.4k-document benchmark.

STEP 03

Structured Output

Receive clean JSON, Markdown, or CSV, ready to drop straight into your database or LLM context.

Extraction

Three operations cover most of the document work.

Parsift exposes a small, well-defined set of semantic primitives that you compose into your product's flow. There are no models to train, no prompts to write, and no brittle templates to maintain.

KIE · Key Information Extraction

Identifies fields by meaning, not by position. Issuer, dates, amounts, identifiers, and clauses, returned as typed JSON with a per-field confidence score. Works on unseen layouts, mixed languages, and noisy scans.

coverage: 60+ fields
ontology: customizable
confidence: per field
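One way to use per-field confidence is to accept high-confidence values automatically and flag the rest for a second check. The sketch below assumes each field arrives as an object with `value` and `confidence` keys; that shape, the threshold, and the sample scores are hypothetical, since the page does not show the full response format.

```python
def split_by_confidence(fields, threshold=0.9):
    """Separate field values into accepted and flagged by confidence score."""
    accepted, flagged = {}, {}
    for name, field in fields.items():
        target = accepted if field["confidence"] >= threshold else flagged
        target[name] = field["value"]
    return accepted, flagged


# Hypothetical per-field payload; the real response shape may differ.
fields = {
    "vendor": {"value": "AWS Inc.", "confidence": 0.99},
    "id": {"value": "INV-2024", "confidence": 0.97},
    "total": {"value": 4500.00, "confidence": 0.62},
}

accepted, flagged = split_by_confidence(fields)
print("accepted:", accepted)
print("flagged:", flagged)
```

The threshold is a product decision: a payments flow might demand a higher bar for `total` than a search index needs for `vendor`.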

Tables · Table Structure Recognition

Reconstructs tables with merged cells, hierarchical headers, and multi-page breaks. Output in JSON, CSV, or Markdown with traceability back to the original pixels, ready for validation or direct ingestion into your database.

formats: JSON · CSV · MD
multi-page: continuous
merged cells: supported
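If you request a table as JSON and still need CSV for a database load, the conversion is short. The sketch below assumes a table is returned as `headers` plus `rows`; that shape is a guess for illustration, and the line items mirror the sample invoice shown at the top of the page.

```python
import csv
import io


def table_to_csv(table):
    """Serialize a {"headers": [...], "rows": [[...], ...]} table to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(table["headers"])
    writer.writerows(table["rows"])
    return buffer.getvalue()


# Hypothetical table payload built from the sample invoice's line items.
table = {
    "headers": ["ITEM", "QTY", "TOTAL"],
    "rows": [
        ["EC2 compute", "1", "2100.00"],
        ["S3 storage", "1", "1400.00"],
        ["RDS instance", "1", "1000.00"],
    ],
}

csv_text = table_to_csv(table)
line_total = sum(float(row[2]) for row in table["rows"])
print(csv_text)
print("line items sum to", line_total)
```

Summing the line items and comparing the result to the extracted grand total is a cheap consistency check before ingestion.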

Layout · Layout metadata

Detects paragraphs, lists, signatures, and stamps while preserving reading order. Every element carries coordinates, page, and original order, ideal for reprocessing, auditing, and composing into downstream LLMs.

bounding box: per element
reading order: preserved
trail: auditable
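Because every element carries its page and original order, rebuilding a clean text stream for a downstream LLM can be a simple sort. The element keys used here (`type`, `page`, `order`, `text`) and the sample content are assumptions for illustration only.

```python
def in_reading_order(elements):
    """Sort layout elements by page, then by their order within the page."""
    return sorted(elements, key=lambda e: (e["page"], e["order"]))


def to_llm_context(elements, skip_types=("signature", "stamp")):
    """Join the text of ordered elements, skipping non-textual element types."""
    ordered = in_reading_order(elements)
    return "\n".join(e["text"] for e in ordered if e["type"] not in skip_types)


# Hypothetical, deliberately shuffled layout elements with made-up text.
elements = [
    {"type": "paragraph", "page": 2, "order": 0, "text": "Payment is due in 30 days."},
    {"type": "signature", "page": 2, "order": 1, "text": "[signature]"},
    {"type": "paragraph", "page": 1, "order": 1, "text": "Cloud Infrastructure Services"},
    {"type": "paragraph", "page": 1, "order": 0, "text": "AWS Inc."},
]

context = to_llm_context(elements)
print(context)
```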
Developer Experience

From PDF to typed JSON in one call.

No custom regex parsers to maintain. A documented REST API you can call from any language. Rotation, skew, and low-quality scans are handled for you.

Request · POST /v1/extract
# Turn an invoice PDF into typed JSON
curl -X POST https://api.parsift.com/v1/extract \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: multipart/form-data" \
  -F "[email protected]" \
  -F "schema_type=invoice"
Response · 200 OK
{
  "status": "success",
  "data": {
    "vendor": "AWS Inc.",
    "total": 4500.00
  }
}

Typed JSON, every call

Fields, tables, and relations come back with per-field confidence, ready to validate against your own schema.
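A minimal sketch of validating a response against your own schema, using only the keys visible in the sample response above (`status`, `data.vendor`, `data.total`). The `EXPECTED_TYPES` mapping and the `validate` helper are names invented for this example, not part of the API.

```python
import json

# Your own schema: field name -> expected Python type after JSON decoding.
EXPECTED_TYPES = {"vendor": str, "total": float}


def validate(payload, expected_types):
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    if payload.get("status") != "success":
        errors.append(f"unexpected status: {payload.get('status')!r}")
    data = payload.get("data", {})
    for name, expected in expected_types.items():
        if name not in data:
            errors.append(f"missing field: {name}")
        elif not isinstance(data[name], expected):
            errors.append(
                f"{name}: expected {expected.__name__}, got {type(data[name]).__name__}"
            )
    return errors


response_body = '{"status": "success", "data": {"vendor": "AWS Inc.", "total": 4500.00}}'
errors = validate(json.loads(response_body), EXPECTED_TYPES)
print(errors or "payload matches schema")
```

Note that a JSON number without a decimal point decodes to `int` in Python, so a stricter or looser numeric check may suit your data better.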

Webhooks for batch jobs

Submit large documents and get a callback when results are ready, so a long batch never blocks a request.
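On the receiving side, a callback handler mostly needs to parse the body and decide what to do next. The sketch below is framework-agnostic and assumes a callback body with `job_id`, `status`, and `data` keys; the page does not document the webhook payload, so all three are hypothetical.

```python
import json


def handle_callback(raw_body):
    """Parse a webhook body and return the action your system should take."""
    event = json.loads(raw_body)
    job_id = event["job_id"]  # hypothetical field name
    if event["status"] == "success":  # mirrors "status" in the sample response
        return {"job_id": job_id, "action": "store", "data": event["data"]}
    # Anything else is treated as a failure to be retried or inspected.
    return {"job_id": job_id, "action": "retry", "data": None}


# Hypothetical callback body for a finished batch job.
body = b'{"job_id": "job_123", "status": "success", "data": {"vendor": "AWS Inc.", "total": 4500.00}}'
outcome = handle_callback(body)
print(outcome["action"], outcome["job_id"])
```

Keeping the handler a pure function makes it easy to unit test and to mount behind whichever web framework you already use.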

Pixel-level traceability

Every value carries its coordinates on the page, so you can show users exactly where a number came from.
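To draw a highlight over the source value in your own viewer, the page coordinates have to be scaled to the size at which the page is displayed. This sketch assumes a bounding box given as `(x0, y0, x1, y1)` in page pixels; the coordinate format, field shape, and page dimensions are assumptions for illustration.

```python
def scale_bbox(bbox, page_size, display_size):
    """Scale an (x0, y0, x1, y1) box from page pixels to display pixels."""
    x0, y0, x1, y1 = bbox
    scale_x = display_size[0] / page_size[0]
    scale_y = display_size[1] / page_size[1]
    return (x0 * scale_x, y0 * scale_y, x1 * scale_x, y1 * scale_y)


# Hypothetical extracted field with its location on a 1000x1400 px page.
total_field = {"value": 4500.00, "page": 1, "bbox": (400, 900, 520, 930)}

# The viewer renders the page at half size.
highlight = scale_bbox(total_field["bbox"], page_size=(1000, 1400), display_size=(500, 700))
print(highlight)
```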

Pricing

Pick the plan that matches your document volume.

Monthly · Yearly (-20%)
Starter

For developers and small projects shipping their first extractions

$10/mo

  • 1,000 pages/month
  • Proprietary extraction engine
  • Built-in templates for common documents
  • REST API access
  • JSON export
  • Proprietary engine, processing you control
  • Email support
Join the waitlist
Enterprise

For high-volume operations with compliance and deployment requirements

Contact sales

  • Unlimited pages/month
  • Everything in Pro
  • Custom templates
  • Guaranteed SLA
  • SSO / SAML
  • On-premise deployment
  • Custom integrations
  • Dedicated 24/7 support
Contact sales

All plans include SSL, data encryption, and a proprietary engine with processing you control.

Compliance

Built for the teams that have to prove compliance.

We operate with auditable practices and provide the contractual instruments needed for deployment in fintechs, law firms, insurers, and healthcare providers.

LGPD · GDPR

Designed to comply with the LGPD and to be compatible with the GDPR. A data processing agreement (DPA) and standard contractual clauses are available on request.

Private processing

Extraction runs on our own engine, and you control how your documents are processed: they are never sent to external model providers unless you choose to send them.

Encryption

Encrypted in transit with TLS and at rest in our storage and database layers. Access to production data is restricted, authenticated, and logged.

Residency

Processing in US regions on Azure. Contractual guarantee that we never train on your documents.

Every control above is documented and verifiable. Read the full posture, or request the report and agreement your review needs.

Access · 500 pages to test

Five hundred pages to test the API.

Parsift is in early access. Join the waitlist and we will send your invite when account creation opens. You get a one-time grant of 500 pages to test the API, separate from any plan, with no credit card required to start.

Precision · 30d: 98.4%
Latency · p50: 1.2s/page
SLA observed · 12m: 99.95%
Trial pages · one-time: 500