Document Classification

New in 11.8!

Document Classification is an AI-powered SDK API that identifies the type of each file on upload, choosing from a predefined set of 19 categories, so you can:

  • Validate intake.
  • Route to the right workflow.
  • Add metadata for processing later.

The output is structured JSON containing the predicted label and its confidence score, for easy integration into your solution.

The benefits of using this feature include:

  • Automatically identify document types from a predefined set of 19 categories, such as invoices, receipts, IDs, budgets, and letters.
  • Set your own confidence thresholds for automated routing or manual review.
  • Integrate easily into downstream workflows.
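
As an illustration of choosing between automated routing and manual review, here is a minimal sketch. It is not part of the SDK: the `ThresholdRouting` class, the `Decision` enum, and the 0.85 default are hypothetical, and the code assumes the confidence score is a number between 0 and 1.

```csharp
using System;

public enum Decision { AutoRoute, ManualReview }

public static class ThresholdRouting
{
    // Hypothetical default cut-off; tune it against your own documents.
    public const double DefaultThreshold = 0.85;

    // Assumes the confidence score lies in [0, 1] (an assumption, not confirmed here).
    public static Decision Decide(double confidence, double threshold = DefaultThreshold)
    {
        if (double.IsNaN(confidence) || confidence < 0.0 || confidence > 1.0)
            throw new ArgumentOutOfRangeException(nameof(confidence), "Expected a score in [0, 1].");

        // At or above the threshold the document is routed automatically;
        // anything below it goes to a person for review.
        return confidence >= threshold ? Decision.AutoRoute : Decision.ManualReview;
    }
}
```

For example, `ThresholdRouting.Decide(0.92)` returns `Decision.AutoRoute`, while `ThresholdRouting.Decide(0.40)` returns `Decision.ManualReview`. Raising the threshold sends more documents to manual review in exchange for fewer misrouted ones.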

The 19 categories are:

  • "advertisement"
  • "budget"
  • "email"
  • "file_folder"
  • "form"
  • "handwritten"
  • "id"
  • "invoice"
  • "letter"
  • "memo"
  • "news_article"
  • "passport"
  • "presentation"
  • "questionnaire"
  • "receipt"
  • "resume"
  • "scientific_publication"
  • "scientific_report"
  • "specification"
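
Because the label set is fixed, downstream code can validate a label before acting on it. The sketch below is one way to do that. The `DocumentLabels` class and the queue names are hypothetical and only illustrate mapping labels to workflows.

```csharp
using System;
using System.Collections.Generic;

public static class DocumentLabels
{
    // The 19 labels listed above.
    public static readonly HashSet<string> Known = new HashSet<string>(StringComparer.Ordinal)
    {
        "advertisement", "budget", "email", "file_folder", "form",
        "handwritten", "id", "invoice", "letter", "memo",
        "news_article", "passport", "presentation", "questionnaire", "receipt",
        "resume", "scientific_publication", "scientific_report", "specification"
    };

    // Hypothetical workflow queues; replace with the queues your system uses.
    private static readonly Dictionary<string, string> Queues = new Dictionary<string, string>
    {
        ["invoice"] = "accounts-payable",
        ["receipt"] = "expenses",
        ["id"] = "identity-verification",
        ["passport"] = "identity-verification",
        ["resume"] = "recruiting"
    };

    public static string QueueFor(string label)
    {
        // Reject anything outside the documented label set.
        if (label == null || !Known.Contains(label))
            throw new ArgumentException($"Unknown label: {label}", nameof(label));

        // Known labels without a dedicated queue fall back to a general one.
        return Queues.TryGetValue(label, out string queue) ? queue : "general-intake";
    }
}
```

Rejecting unknown labels early means that a change to the label set in a later release shows up as a clear error rather than as silently misrouted documents.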

JSON Output Specification

Refer to the following specifications to learn more about the output JSON format:

Extract document classes as JSON file

Specify the name of the input PDF file and the name of the output JSON file, then select the Doc Classification engine:

DataExtractionModule.ExtractData("Invoice.pdf", "Invoice_Classified.json", DataExtractionModule.DataExtractionEngine.e_doc_classification);

Extract document classes as JSON string

Specify the name of the input PDF file, then select the Doc Classification engine:

string json = DataExtractionModule.ExtractData("Scientific_Publication.pdf", DataExtractionModule.DataExtractionEngine.e_doc_classification);
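
Once you have the JSON string, you can read the prediction with `System.Text.Json`. The sketch below is an assumption-heavy illustration: the property names `label` and `confidence`, and their position at the top level of the JSON, are placeholders rather than the documented schema. Check the JSON Output Specification and pass the real property names. The `ClassificationJson` helper is hypothetical.

```csharp
using System.Text.Json;

public static class ClassificationJson
{
    // The default property names are placeholders, NOT the documented schema.
    public static (string Label, double Confidence) ReadPrediction(
        string json,
        string labelProperty = "label",
        string confidenceProperty = "confidence")
    {
        using JsonDocument doc = JsonDocument.Parse(json);
        JsonElement root = doc.RootElement;

        // GetProperty throws KeyNotFoundException if the property is missing,
        // which surfaces a schema mismatch immediately.
        string label = root.GetProperty(labelProperty).GetString()
            ?? throw new JsonException($"Property '{labelProperty}' is null.");
        double confidence = root.GetProperty(confidenceProperty).GetDouble();

        return (label, confidence);
    }
}
```

With the `json` string from the call above, usage would look like `var prediction = ClassificationJson.ReadPrediction(json);`, after which `prediction.Label` and `prediction.Confidence` are available to your routing logic.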

Optional Configurations

The following optional configurations are available:

  • Select OCR Language
  • Password-Protected PDFs
  • Page Range
  • Minimum Confidence Threshold

Apryse Server SDK Document Classification with Smart Data Extraction | Apryse documentation