Schema-driven Document Extraction API

Extract structured JSON from any document, without building parsing logic

Upload documents and receive structured JSON with fields, tables, and entities ready to use in your app.

  • Process invoices, receipts, contracts, and forms into structured JSON
  • Extract fields, tables, and entities in a single JSON response
  • No regex, no layout-specific logic, no maintenance

Try with your document

Start in minutes

50 pages per month free
No credit card required
REST API, SDK, webhooks

Built for

  • Invoice ingestion pipelines
  • Expense automation tools
  • Backend systems needing structured data

Why document parsing pipelines break

OCR systems return text, not structured data.

To make it usable, developers build fragile parsing logic:

  • Invoice formats change across vendors
  • Receipts vary by country and by store
  • Tables break traditional parsing logic

These pipelines work at first, then start failing as document formats change.

Every new document format increases complexity, edge cases, and maintenance.

The real limitation of traditional approaches

Document parsing is not just about extracting text; it is about understanding structure.

Tables, line items, entities, and relationships cannot be reliably reconstructed with regex or static rules alone.

This is why rule-based pipelines tend to break when documents change or vary even slightly.
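
To make that fragility concrete, here is a small illustrative sketch of a regex written for a single vendor's layout. It is not taken from any real pipeline: the function name `extractTotal` and the three sample OCR strings are invented for the example.

```typescript
// Illustrative only: a pattern tuned to one layout, "Total: $1,500.00".
const TOTAL_PATTERN = /^Total:\s*\$([\d,]+\.\d{2})$/m;

// Returns the total as a number, or null when the layout does not match.
function extractTotal(ocrText: string): number | null {
  const raw = TOTAL_PATTERN.exec(ocrText)?.[1];
  if (raw === undefined) return null;
  return Number(raw.replace(/,/g, ""));
}

// Hypothetical OCR output from three vendors describing the same amount.
const vendorA = "Invoice FCT-000342\nTotal: $1,500.00";
const vendorB = "Invoice FCT-000342\nAmount due 1500.00 USD";
const vendorC = "Invoice FCT-000342\nTOTAL  1.500,00 $";

const results = [vendorA, vendorB, vendorC].map(extractTotal);
console.log(results); // only the first layout yields a number
```

Covering the second and third layouts means more patterns, plus locale handling for decimal separators, which is the maintenance cost described above.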

Extract structured data directly

Parselyze converts documents into structured JSON using a schema-driven extraction layer.

Fields, tables, and entities are mapped into your predefined JSON schema.

No regex. No layout-specific logic. No maintenance.

Replace post-processing

Stop turning raw OCR blocks into brittle regex pipelines just to get fields your app can use.

Capture fields and tables

Extract scalar fields and repeating rows like line items in the same JSON response.

Ship faster

Define your schema once, then reuse it across OCR-heavy workflows, uploads, and background jobs.

Example: Invoice → structured JSON

Upload a PDF, image, or OCR text. The API returns structured JSON like this:

extraction_result.json
{
  "document_type": "invoice",
  "invoice_number": "FCT-000342",
  "invoice_date": "2024-05-28",
  "vendor_name": "ACME Corporation",
  "currency": "USD",
  "total_amount": 1500.00,
  "line_items": [
    {
      "description": "Consulting services",
      "qty": 8,
      "unit_price": 125.00,
      "total": 1000.00
    },
    {
      "description": "Design mockups",
      "qty": 1,
      "unit_price": 500.00,
      "total": 500.00
    }
  ]
}
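
Once a response like this reaches your code, it can be typed and sanity-checked before it goes anywhere else. The sketch below is one possible way to do that, not part of any SDK: the interface names and the `checkInvoiceTotals` helper are assumptions, and only the field names come from the example above.

```typescript
// Types mirroring the field names in the example response above.
interface LineItem {
  description: string;
  qty: number;
  unit_price: number;
  total: number;
}

interface InvoiceResult {
  document_type: string;
  invoice_number: string;
  invoice_date: string; // formatted like "2024-05-28" in the example
  vendor_name: string;
  currency: string;
  total_amount: number;
  line_items: LineItem[];
}

// Compare amounts in integer cents to avoid floating-point drift.
const toCents = (amount: number): number => Math.round(amount * 100);

// Returns a list of problems; an empty list means the amounts are consistent.
function checkInvoiceTotals(invoice: InvoiceResult): string[] {
  const problems: string[] = [];
  let sumCents = 0;
  invoice.line_items.forEach((item, index) => {
    if (toCents(item.qty * item.unit_price) !== toCents(item.total)) {
      problems.push(`line ${index + 1}: qty * unit_price does not equal total`);
    }
    sumCents += toCents(item.total);
  });
  if (sumCents !== toCents(invoice.total_amount)) {
    problems.push("line item totals do not add up to total_amount");
  }
  return problems;
}

const sampleInvoice: InvoiceResult = {
  document_type: "invoice",
  invoice_number: "FCT-000342",
  invoice_date: "2024-05-28",
  vendor_name: "ACME Corporation",
  currency: "USD",
  total_amount: 1500.0,
  line_items: [
    { description: "Consulting services", qty: 8, unit_price: 125.0, total: 1000.0 },
    { description: "Design mockups", qty: 1, unit_price: 500.0, total: 500.0 },
  ],
};

const totalProblems = checkInvoiceTotals(sampleInvoice);
console.log(totalProblems); // [] for the example invoice
```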

Need this in your app?

Define your template once, then extract structured JSON automatically via SDK, REST API, or webhook workflows.

How schema-driven extraction works

A simple flow for teams that need structured data from OCR-heavy documents without extra parser maintenance.

01. Upload a document (PDF or image)

Send a PDF or image through the dashboard, REST API, or Node.js SDK.

02. Define your JSON schema

Map invoice numbers, totals, dates, entities, line items, or any custom field names to your own JSON schema.
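
As a rough idea of what "defining a schema" can look like in code, the sketch below describes fields and a table in a plain object and checks a result against it. The `FieldSpec` shape, `invoiceSchema`, and `missingRequiredFields` are hypothetical illustrations; the actual Parselyze template format is configured in the dashboard and may look quite different.

```typescript
// Hypothetical schema description. NOT Parselyze's actual template format.
type FieldKind = "string" | "number" | "date" | "table";

interface FieldSpec {
  kind: FieldKind;
  required: boolean;
  columns?: Record<string, FieldKind>; // only meaningful when kind is "table"
}

type Schema = Record<string, FieldSpec>;

const invoiceSchema: Schema = {
  invoice_number: { kind: "string", required: true },
  invoice_date: { kind: "date", required: true },
  total_amount: { kind: "number", required: true },
  vendor_name: { kind: "string", required: false },
  line_items: {
    kind: "table",
    required: true,
    columns: { description: "string", qty: "number", unit_price: "number", total: "number" },
  },
};

// Lists required fields that are absent or null in an extraction result.
function missingRequiredFields(schema: Schema, result: Record<string, unknown>): string[] {
  return Object.keys(schema).filter((name) => {
    const spec = schema[name];
    const value = result[name];
    return spec !== undefined && spec.required && (value === undefined || value === null);
  });
}

const partialResult: Record<string, unknown> = {
  invoice_number: "FCT-000342",
  total_amount: 1500.0,
  line_items: [],
};

const missing = missingRequiredFields(invoiceSchema, partialResult);
console.log(missing); // ["invoice_date"]
```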

03. Receive structured JSON via API or webhook

Use the parsed result directly in your database, ERP, product workflow, support tooling, or automation pipeline.
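
For a database target, the nested result usually needs flattening first. The following sketch shows one way to turn line items into flat rows. The `invoice_lines` table, the row shape, and the function name are assumptions made for the example; only the input field names come from the invoice example above.

```typescript
// Minimal shape of the extraction result used here.
interface ExtractedLine {
  description: string;
  qty: number;
  unit_price: number;
  total: number;
}

interface ExtractedInvoice {
  invoice_number: string;
  currency: string;
  line_items: ExtractedLine[];
}

// One flat row per line item, shaped for a hypothetical `invoice_lines` table.
interface InvoiceLineRow {
  invoice_number: string;
  line_no: number;
  description: string;
  qty: number;
  unit_price_cents: number;
  total_cents: number;
  currency: string;
}

// Flattens the nested result; amounts are stored as integer cents.
function toInvoiceLineRows(doc: ExtractedInvoice): InvoiceLineRow[] {
  return doc.line_items.map((item, index) => ({
    invoice_number: doc.invoice_number,
    line_no: index + 1,
    description: item.description,
    qty: item.qty,
    unit_price_cents: Math.round(item.unit_price * 100),
    total_cents: Math.round(item.total * 100),
    currency: doc.currency,
  }));
}

const extracted: ExtractedInvoice = {
  invoice_number: "FCT-000342",
  currency: "USD",
  line_items: [
    { description: "Consulting services", qty: 8, unit_price: 125.0, total: 1000.0 },
    { description: "Design mockups", qty: 1, unit_price: 500.0, total: 500.0 },
  ],
};

const rows = toInvoiceLineRows(extracted);
console.log(rows);
```

Storing integer cents rather than floats is a design choice of this sketch, not something the API prescribes.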

Document parsing for any input format

Use the same extraction approach whether you start from scanned PDFs, mobile photos, or screenshots.

Scanned PDFs

Process scanned PDF files, including multi-page archives, supplier invoices, and uploaded paperwork.

Images and photos

Convert JPG, PNG, WEBP, TIFF, and smartphone photos into structured JSON with named fields.

Mixed document batches

Handle receipts, contracts, IDs, and custom forms in the same ingestion pipeline using templates.

What you can build with document parsing

Use structured data to automate real downstream operations instead of stopping at text extraction.

Invoice processing automation

Convert scanned invoices into structured JSON to automatically import totals, dates, and line items into accounting systems.

Receipt data extraction

Extract merchant names, amounts, and dates from receipts to automate expense tracking and reimbursements.

Contract data ingestion

Parse contracts and agreements to extract key information like parties, dates, and clauses for internal systems.

Document ingestion pipelines

Convert large volumes of PDFs and scanned documents into structured JSON to feed data warehouses or automation workflows.

Common document types

Define the schema once, then reuse it across these categories.

  • Invoices
  • Receipts
  • Contracts & NDAs
  • Medical forms
  • ID documents
  • Any custom form

How to integrate

Integrate document extraction into any app

Use the SDK or call the REST API directly. Submit scanned files and receive structured JSON ready for sync or async workflows.

1. Install: npm install parselyze
2. Create an extraction template in the dashboard
3. Submit documents and handle results via sync API, async jobs, or webhooks
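
For the async-job path in step 3, client code typically polls until the job finishes. The sketch below covers only the decision logic, with no network calls, so it stays independent of the real API. The status values, the `nextStep` function, and the timing constants are all assumptions for illustration; check the API reference for the actual job states.

```typescript
// Hypothetical job states; the real API's status values may differ.
type JobStatus = "queued" | "processing" | "completed" | "failed";

type NextStep =
  | { action: "wait"; delayMs: number }
  | { action: "fetch_result" }
  | { action: "give_up"; reason: string };

// Assumed tuning values for the sketch.
const BASE_DELAY_MS = 1000;
const MAX_DELAY_MS = 30000;
const MAX_ATTEMPTS = 8;

// Decides what a poller should do next, using capped exponential backoff.
// `attempt` is the number of status checks already made (0-based).
function nextStep(status: JobStatus, attempt: number): NextStep {
  if (status === "completed") return { action: "fetch_result" };
  if (status === "failed") return { action: "give_up", reason: "job failed" };
  if (attempt >= MAX_ATTEMPTS) return { action: "give_up", reason: "timed out" };
  const delayMs = Math.min(BASE_DELAY_MS * Math.pow(2, attempt), MAX_DELAY_MS);
  return { action: "wait", delayMs };
}

console.log(nextStep("processing", 3)); // { action: "wait", delayMs: 8000 }
console.log(nextStep("queued", 5));     // delay capped at 30000 ms
console.log(nextStep("completed", 1));  // { action: "fetch_result" }
```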

Ready to integrate?

SDK examples, REST API reference, webhook handling, and template-driven extraction make it easier to launch a reliable structured data extraction workflow.

REST API and Node.js SDK

Ship quickly with API docs, SDK examples, cURL samples, and production-friendly authentication.

Built for automation

Send files and receive JSON in the same request, or use webhooks for async workflows.
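
On the webhook side, it helps to validate the incoming body before trusting it. This sketch assumes a payload with `event`, `job_id`, and either `result` or `error`. Those field names and event values are invented for the example and are not the documented webhook format.

```typescript
// Hypothetical webhook payload; field names are assumptions for illustration.
interface ExtractionEvent {
  event: "extraction.completed" | "extraction.failed";
  job_id: string;
  result?: Record<string, unknown>;
  error?: string;
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Parses a raw request body; returns null for anything that does not fit.
function parseEvent(body: string): ExtractionEvent | null {
  let data: unknown;
  try {
    data = JSON.parse(body);
  } catch {
    return null; // not valid JSON
  }
  if (!isRecord(data)) return null;

  const jobId = data["job_id"];
  const event = data["event"];
  const result = data["result"];
  const error = data["error"];
  if (typeof jobId !== "string") return null;

  if (event === "extraction.completed" && isRecord(result)) {
    return { event: "extraction.completed", job_id: jobId, result };
  }
  if (event === "extraction.failed" && typeof error === "string") {
    return { event: "extraction.failed", job_id: jobId, error };
  }
  return null;
}

const okBody = '{"event":"extraction.completed","job_id":"j1","result":{"total_amount":1500}}';
console.log(parseEvent(okBody));
console.log(parseEvent("not json")); // null
```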

Stable schemas over raw text

Replace brittle regex and post-processing with predictable fields that match your template.


Frequently asked questions

Everything you need to know about document parsing.

What is document data extraction?

Document data extraction is the process of turning unstructured files like PDFs, invoices, receipts, or forms into structured, machine-readable JSON with defined fields such as dates, totals, entities, and line items.

How is document parsing different from standard OCR?

Standard OCR returns raw text without structure. Document parsing adds a structured extraction layer that maps content into predefined fields and tables, producing clean JSON ready to use in applications, databases, or workflows.

What document types does Parselyze support?

Parselyze supports invoices, receipts, contracts, medical forms, ID documents, and any custom document type. You define the schema once, and the same extraction logic works across different layouts and formats.

What file formats are accepted?

Parselyze accepts PDF files (native and scanned) and PNG, JPG/JPEG, WEBP, TIFF, and BMP images. It also supports multi-page documents and smartphone photos of varying quality.

How do I get started with the API?

Sign up for a free account, create an extraction template in the dashboard, then send documents via the REST API or Node.js SDK. You can start receiving structured JSON in minutes, with 50 free pages per month.

Can Parselyze extract structured data without custom regex?

Yes. Parselyze uses AI-based document parsing to extract structured fields and tables without requiring regex, manual parsing rules, or layout-specific code.

Start turning documents into structured JSON today

50 pages/month free · No credit card required