Document Classification

New in 11.8!

Document Classification is an SDK API, backed by a trained AI model, that identifies each file on upload by assigning it to one of 19 predefined categories, so you can:

  • Validate intake.
  • Route to the right workflow.
  • Add metadata for processing later.

The output is structured JSON containing the predicted label and its confidence score, for easy integration into your solution.

With this feature, you can:

  • Automatically identify document types from a predefined set of 19 categories, such as invoices, receipts, IDs, and budgets.
  • Set your own confidence thresholds for automated routing or manual review.
  • Integrate the results easily into downstream workflows.
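
As an illustration of threshold-based routing, here is a minimal sketch. It assumes the label and confidence score have already been read from the JSON output, and that the score is on a 0.0 to 1.0 scale (this page does not state the scale, so adjust as needed). The names `Route`, `ClassificationRouter`, and `Decide` are hypothetical helpers and are not part of the SDK.

```csharp
using System;

// Hypothetical routing outcomes; not part of the SDK.
public enum Route { AutoRoute, ManualReview }

public static class ClassificationRouter
{
    // Sends a prediction to automated routing only when its confidence
    // meets the caller-chosen threshold. Assumes confidence is in [0.0, 1.0].
    public static Route Decide(string label, double confidence, double autoThreshold)
    {
        // A missing label cannot be routed automatically.
        if (string.IsNullOrWhiteSpace(label))
        {
            return Route.ManualReview;
        }

        // A NaN confidence fails this comparison and falls through to manual review.
        return confidence >= autoThreshold ? Route.AutoRoute : Route.ManualReview;
    }
}
```

For example, `ClassificationRouter.Decide("invoice", 0.93, 0.80)` yields `Route.AutoRoute`, while the same label at a confidence of 0.55 yields `Route.ManualReview`.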

The 19 categories are:

  • "advertisement"
  • "budget"
  • "email"
  • "file_folder"
  • "form"
  • "handwritten"
  • "id"
  • "invoice"
  • "letter"
  • "memo"
  • "news_article"
  • "passport"
  • "presentation"
  • "questionnaire"
  • "receipt"
  • "resume"
  • "scientific_publication"
  • "scientific_report"
  • "specification"
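
To validate intake, it can help to check a returned label against the category list above before acting on it. The following is a minimal sketch; `DocumentCategories` and `IsKnown` are hypothetical helper names, not SDK types, and the comparison assumes labels are returned exactly as spelled in the list.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; not part of the SDK.
public static class DocumentCategories
{
    // The 19 category labels listed above, compared case-sensitively.
    public static readonly HashSet<string> All = new HashSet<string>(StringComparer.Ordinal)
    {
        "advertisement", "budget", "email", "file_folder", "form",
        "handwritten", "id", "invoice", "letter", "memo",
        "news_article", "passport", "presentation", "questionnaire",
        "receipt", "resume", "scientific_publication",
        "scientific_report", "specification"
    };

    // Returns true only for a label that appears in the list.
    public static bool IsKnown(string label)
    {
        return label != null && All.Contains(label);
    }
}
```

A label that fails `DocumentCategories.IsKnown` can be sent to manual review instead of a category-specific workflow.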

Extract document classes as a JSON file

Specify the name of the input PDF file and the name of the output JSON file, then select the Document Classification engine:

DataExtractionModule.ExtractData("Invoice.pdf", "Invoice_Classified.json", DataExtractionModule.DataExtractionEngine.e_doc_classification);

Extract document classes as a JSON string

Specify the name of the input PDF file, then select the Document Classification engine:

string json = DataExtractionModule.ExtractData("Scientific_Publication.pdf", DataExtractionModule.DataExtractionEngine.e_doc_classification);

Optional Configurations

Select OCR Language

Password-Protected PDFs

Page Range

Minimum Confidence Threshold
