Intelligent Data Extraction - Packaging

Reducing Package Size

Several engines in the Intelligent Data Extraction module rely on artificial intelligence models, so the module can take up a substantial amount of disk space. This can be a problem in environments with limited storage, such as certain cloud computing environments. This guide describes how to reduce the package size for your Intelligent Data Extraction use case.

The Intelligent Data Extraction module is composed of four engines, each of which has its own file requirements. If you only need a subset of these engines, you can remove the files from the package that none of your required engines depend on.

The table below maps each engine to its file dependencies.

Engine Name | Dependencies
Tabular Data Extraction
  • Lib/Windows/TabluarData/*
  • Lib/Windows/OCRModule.exe
Document Structure Recognition
  • Lib/Windows/StructuredOutput.exe
  • Lib/Windows/tessdata/*

The following files are only required if using Deep Learning Assist:

  • Lib/Windows/AIPageObjectExtractor/AIPageObjectExtractor.dll
  • Lib/Windows/AIPageObjectExtractor/table.cfg
  • Lib/Windows/AIPageObjectExtractor/table.onnx
  • Lib/Windows/AIPageObjectExtractor/table_tabular.onnx
  • Lib/Windows/AIPageObjectExtractor/Licenses
Form Field Detection
  • Lib/Windows/AIPageObjectExtractor/AIPageObjectExtractor.exe
  • Lib/Windows/AIPageObjectExtractor/form.cfg
  • Lib/Windows/AIPageObjectExtractor/form.onnx
  • Lib/Windows/AIPageObjectExtractor/Licenses
Form Field Key-Value Extraction
  • Lib/Windows/AIPageObjectExtractor/AIPageObjectExtractor.exe
  • Lib/Windows/AIPageObjectExtractor/form.cfg
  • Lib/Windows/AIPageObjectExtractor/form.onnx
  • Lib/Windows/AIPageObjectExtractor/kv.onnx
  • Lib/Windows/AIPageObjectExtractor/v.cab
  • Lib/Windows/AIPageObjectExtractor/Licenses
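
One way to work with the table above is to treat it as data and take the union of the dependencies for the engines you need. The following is a minimal sketch, not part of the SDK: the names `ENGINE_DEPENDENCIES`, `DEEP_LEARNING_ASSIST_DEPENDENCIES`, and `required_paths` are hypothetical, and the paths are transcribed as they appear in the table.

```python
# Engine-to-dependency mapping transcribed from the table above.
# Paths are relative to the package root; a trailing "/*" means
# "everything inside that folder". All names here are illustrative.
AI_DIR = "Lib/Windows/AIPageObjectExtractor"

ENGINE_DEPENDENCIES = {
    "Tabular Data Extraction": {
        "Lib/Windows/TabluarData/*",
        "Lib/Windows/OCRModule.exe",
    },
    "Document Structure Recognition": {
        "Lib/Windows/StructuredOutput.exe",
        "Lib/Windows/tessdata/*",
    },
    "Form Field Detection": {
        f"{AI_DIR}/AIPageObjectExtractor.exe",
        f"{AI_DIR}/form.cfg",
        f"{AI_DIR}/form.onnx",
        f"{AI_DIR}/Licenses",
    },
    "Form Field Key-Value Extraction": {
        f"{AI_DIR}/AIPageObjectExtractor.exe",
        f"{AI_DIR}/form.cfg",
        f"{AI_DIR}/form.onnx",
        f"{AI_DIR}/kv.onnx",
        f"{AI_DIR}/v.cab",
        f"{AI_DIR}/Licenses",
    },
}

# Extra files the table lists under Document Structure Recognition,
# needed only when Deep Learning Assist is used.
DEEP_LEARNING_ASSIST_DEPENDENCIES = {
    f"{AI_DIR}/AIPageObjectExtractor.dll",
    f"{AI_DIR}/table.cfg",
    f"{AI_DIR}/table.onnx",
    f"{AI_DIR}/table_tabular.onnx",
    f"{AI_DIR}/Licenses",
}


def required_paths(engines, deep_learning_assist=False):
    """Return the sorted union of dependencies for the given engines.

    Raises KeyError if an engine name is not in the table.
    """
    required = set()
    for engine in engines:
        required |= ENGINE_DEPENDENCIES[engine]
    if deep_learning_assist:
        required |= DEEP_LEARNING_ASSIST_DEPENDENCIES
    return sorted(required)


if __name__ == "__main__":
    for path in required_paths(["Form Field Detection"]):
        print(path)
```

Anything in the package's Intelligent Data Extraction folders that is not in the returned list is a candidate for removal. Files shared by several engines, such as `form.onnx`, appear only once because the result is a set union.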

If the engines you are using do not depend on a given file, you are free to remove that file. For example, if you are using the Form Field Key-Value Extraction engine and the Document Structure Recognition engine (without Deep Learning Assist), then you can remove any files that are only needed for the Tabular Data Extraction engine. In this example, you would be left with the following:

Lib
└── Windows
    ├── AIPageObjectExtractor
    │   ├── AIPageObjectExtractor.dll
    │   ├── form.cfg
    │   ├── form.onnx
    │   ├── kv.onnx
    │   ├── v.cab
    │   └── Licenses
    ├── StructuredOutput.exe
    └── tessdata
        ├── chi_sim.traineddata
        ├── chi_sim_vert.traineddata
        ├── chi_tra.traineddata
        ├── chi_tra_vert.traineddata
        ├── ell.traineddata
        ├── eng.traineddata
        ├── grc.traineddata
        ├── jpn.traineddata
        ├── jpn_vert.traineddata
        ├── kor.traineddata
        └── kor_vert.traineddata
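
To check a package against a layout like the one above before deleting anything, a small dry-run script can list the files that fall outside a keep list. This is a sketch under stated assumptions rather than an official tool: `find_removable`, `is_kept`, and `KEEP_PATTERNS` are hypothetical names, the keep list mirrors the example tree, and the script assumes that everything under `Lib/Windows` belongs to the Intelligent Data Extraction module. It only reports candidates and never deletes files.

```python
from fnmatch import fnmatchcase
from pathlib import Path

# Keep list mirroring the example tree above (illustrative only).
KEEP_PATTERNS = [
    "Lib/Windows/AIPageObjectExtractor/AIPageObjectExtractor.dll",
    "Lib/Windows/AIPageObjectExtractor/form.cfg",
    "Lib/Windows/AIPageObjectExtractor/form.onnx",
    "Lib/Windows/AIPageObjectExtractor/kv.onnx",
    "Lib/Windows/AIPageObjectExtractor/v.cab",
    "Lib/Windows/AIPageObjectExtractor/Licenses",
    "Lib/Windows/StructuredOutput.exe",
    "Lib/Windows/tessdata/*",
]


def is_kept(relative_path, keep_patterns):
    """Return True if the path matches a pattern or lies under a kept folder.

    Comparison is case-insensitive, as Windows file systems usually are.
    """
    rel = relative_path.lower()
    for pattern in keep_patterns:
        pat = pattern.lower()
        # Exact or wildcard match, or a file inside a kept directory.
        if fnmatchcase(rel, pat) or rel.startswith(pat.rstrip("/") + "/"):
            return True
    return False


def find_removable(package_root, keep_patterns, scan_dir="Lib/Windows"):
    """List files under scan_dir that no keep pattern covers (dry run).

    Returned paths are relative to package_root and use forward slashes.
    Files outside scan_dir are never inspected or reported.
    """
    root = Path(package_root)
    removable = []
    for path in (root / scan_dir).rglob("*"):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            if not is_kept(rel, keep_patterns):
                removable.append(rel)
    return sorted(removable)


if __name__ == "__main__":
    # "." is a placeholder for the folder that contains Lib/.
    for candidate in find_removable(".", KEEP_PATTERNS):
        print("not required:", candidate)
```

Review the reported list before removing anything, and keep a copy of the original package so that a file can be restored if an engine turns out to need it.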
