A collection of routines for intelligently extracting data from PDFs: infer document structure from document content, extract data in tabular form, and detect interactive regions on a page.
Extract the underlying document structure in JSON form, yielding the position and content of paragraphs, tables and other structural elements. (DataExtractionModule.ExtractData(), using the e_DocStructure engine)
Use the tabular extraction engine to accurately extract data from your document in the form of tables, producing output in either JSON or XLSX form. (DataExtractionModule.ExtractData(), DataExtractionModule.ExtractToXLSX())
The form extraction engine can be used to recognize interactive elements on a page visually, so that the correct associated PDF annotations can be created. (DataExtractionModule.ExtractData() using the e_Form engine)
SVG to PDF Conversion
Added new built-in conversion from SVG to PDF. (Convert.FromSVG() and as part of Convert.ToPdf())
Other Changes
Added methods to configure the ambient string returned by TextSearch. (TextSearch.SetAmbientLettersBefore, TextSearch.SetAmbientLettersAfter, TextSearch.SetAmbientWordsBefore, TextSearch.SetAmbientWordsAfter)
Added a method that can be used to set the opacity of a stamp annotation. (RubberStamp.SetOpacity())
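As a rough illustration of what the ambient-string settings for TextSearch control, the sketch below returns a match together with a configurable number of letters before and after it. It is plain Python and does not call the SDK; the function name ambient_string and its parameters are hypothetical, and the SDK's own behaviour (including the word-based variants) may differ.

```python
def ambient_string(text, start, end, letters_before=10, letters_after=10):
    """Return text[start:end] plus surrounding context characters.

    Hypothetical helper for illustration only; not part of the SDK.
    """
    if not (0 <= start <= end <= len(text)):
        raise ValueError("match range is outside the text")
    # Clamp the context window to the bounds of the text.
    ctx_start = max(0, start - letters_before)
    ctx_end = min(len(text), end + letters_after)
    return text[ctx_start:ctx_end]


page_text = "The quick brown fox jumps over the lazy dog"
match_start = page_text.index("fox")
match_end = match_start + len("fox")
snippet = ambient_string(page_text, match_start, match_end,
                         letters_before=6, letters_after=6)
print(snippet)
```

A word-based variant would count whole words on either side of the match instead of individual letters.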
Improvements:
[node] Added support for Node 19 and Electron 20-22 to npm.
[python] Added support for Python 3.11 to pip.
[all] Added support for consumption licensing keys with expiry dates.
[pdf] Exposed CreateHideField to Python, PHP, Ruby and Golang.
[pdf] Addressed a number of non-critical static analysis issues.
[pdf] Added support for corrupt PDF fonts that require, but don't have, a widths entry.
[.net] Exposed PDFDraw.Export(Filter) to .Net Core/Standard/5/6.
[pdf] Added support for partial corruptions in compressed object streams that could previously lead to issues loading annotations or other objects.
[pdf] Greatly reduced memory usage when importing PDF pages from a document with many OCG/Layer objects.
[pdf] Improved support for ignoring garbage bytes when repairing corrupt documents.
[xfdf] XFDF export now skips invalid annotations that don't have a proper Rect entry. Previously an exception would be thrown.
[node] Improved incorrect argument type error handling for a number of functions.
[pdf] Adjusted IsFullSaveRequired to return true if the file was just redacted.
[cad] Updated CAD module binaries to use ODA version 23.8_16.
[cad] Included CAD module version information in the Producer section of the output PDF.
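The XFDF export change listed above (skipping annotations without a proper Rect entry instead of throwing) can be pictured with the following sketch. It is not the SDK's implementation: annotations are modelled as plain dictionaries, and the helper names has_valid_rect and exportable_annotations are hypothetical.

```python
def has_valid_rect(annot):
    """True if the annotation carries a Rect made of four numbers."""
    rect = annot.get("Rect")
    return (
        isinstance(rect, (list, tuple))
        and len(rect) == 4
        and all(isinstance(v, (int, float)) and not isinstance(v, bool)
                for v in rect)
    )


def exportable_annotations(annots):
    """Split annotations into (kept, skipped) instead of raising on bad ones."""
    kept, skipped = [], []
    for annot in annots:
        (kept if has_valid_rect(annot) else skipped).append(annot)
    return kept, skipped


annots = [
    {"Subtype": "Square", "Rect": [10, 10, 100, 50]},
    {"Subtype": "Text"},                       # no Rect at all
    {"Subtype": "Circle", "Rect": [0, 0, 5]},  # Rect with too few values
]
kept, skipped = exportable_annotations(annots)
print(len(kept), "exported,", len(skipped), "skipped")
```

Returning the skipped items alongside the kept ones lets a caller log what was dropped rather than losing it silently.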
Bugfixes:
[image] Fixed a potential crash issue when encountering certain types of errors when loading JpegXR images.
[office] Fixed a crash that could occur when converting using Outlook interop.
[pdf] Fixed an issue with nametree creation that would prevent creation of PDF Portfolios with more than 21 documents.
[pdf] Fixed a potential crash when copy-constructing ObjectIdentifier.
[pdf] Fixed out-of-bounds issues in FreeText annotation handling, EMF conversion and CMYK rendering. These were reported by static analysis and were unlikely to cause problems in practice.
[pdf] Fixed loading of FDF trust lists containing a single certificate for digital signature verification.
[pdf] Fixed an issue with PDFView.LoadThumbAsync where, if annotation rendering is disabled, color postprocessing would be skipped.
[.net] Fixed a problem with the TimestampingConfiguration constructor that could cause a crash in .Net4.
[pdf] Added support for String FontName entries when subsetting fonts with Optimizer. Previously this corruption would cause an exception to be thrown.
[cad] Fixed issue where there could be an empty output file after an error occurs during CAD -> PDF conversion.
[xfdf] No longer output appearance references for direct annotations. This is mainly because the reference will always be incorrect and can lead to undesirable handling in WebViewer.
[pdf] Fixed a crash that could occur during some TextDiff use cases.
[node] Fixed a crash when using streaming conversion methods that take a filter as input. (e.g. PDFNet.Convert.createOfficeTemplateWithFilter)
[pdf] Fixed an issue where image masks might not be redacted properly.
[pdf] Fixed a crash that could occur when parsing files with null OCG entries in the content stream.
[cad] Fixed an issue where an invalid layer in a CAD file could cause CAD -> PDF conversion to fail.
[cad] Fixed an issue where off layers could become on after CAD -> PDF conversion.
[cad] Fixed an issue where converting certain DGN files with ZoomToExtents(true) could lead to missing content in the output PDF.
[cad] Fixed an issue where converting DGN files with external references could lead to incorrect colors in the output PDF.
[cad] Fixed an issue where duplicate reference file names could lead to incorrect layers being displayed in the output PDF.
[cad] Fixed an issue where frozen layers could lead to incorrect layers being included in the output PDF.
[cad] Fixed an issue where converting DWF files with non-standard color palette could lead to incorrect colors in the output PDF. Note that this issue still remains if a custom background color is used.
Office Fidelity:
[wmf] Fixed multiple font issues in WMF image conversion.
[xlsx] Fixed a bug with calculating formulas with multiple cell references.
[xlsx] Fixed default text justification for cells with error values.
[xlsx] Added proper handling of the duplicates rule for empty cells.
[xlsx] Removed unnecessary application of column styles.
[xlsx] Added support for reading table data from hidden sheets.
[xlsx] Added implementation of the ISFORMULA, CONCATENATE, N, and EXACT Excel functions.
[xlsx] Fixed issues with INDIRECT and ISERROR Excel functions.
[xlsx] Fixed potential crash when using MATCH or MEDIAN Excel functions.
[office] Updated HarfBuzz to version 2.3.0.
[xlsx] Fixed multiple issues with the elapsed time formatting.
[office] Added support for displaying row and column headings on each page in an Excel file.
[xls] Added ability to open protected XLS files (encrypted with default password).
[pptx] Fixed a bug with paragraph and text run properties inheritance.
[xlsx] Fixed a bug with missing header images in Excel documents.
[office] Fixed an issue with EMF files containing PDF backgrounds.
[docx] Fixed an issue with alternative text for images in PDFs converted from Word.
[docx] Fixed a bug with missing footnote text.
[docx] Added support for doNotExpandShiftReturn compatibility option.
[xlsx] Fixed a bug with skipping Excel sheets without text content.
[pptx] Added support for theme font resolution for CS, EA and symbol font faces.
[xls] Now support reading of alternate content drawings for Excel controls.
[xls] Now set proper default pen and brush objects for EMF rendering.
[office] Added support for alternative Excel page order when page breaks are applied.
[docx] Added support for disabled orphan line placement.
[docx] Implemented classification for symbol characters to improve font substitution.
[docx] Fixed table row splitting with large images.
[xlsx] Implemented proper handling of the _xlfn prefix of Excel functions.
[docx] Fixed a bug with incorrectly positioned paragraph frames.
[docx] Fixed the default header/footer margin value.
[pptx] Fixed an issue with forcing bold and italic font faces for PowerPoint embedded fonts.
[pptx] Fixed an issue with dropping the bold qualifier for extra bold fonts.
[pptx] Added default border for PowerPoint tables without a table style.
[docx] Fixed an issue with the drawing order of shapes in header and footer.
[doc] Fixed issues with multi-encoding DOC documents.
[docx] Fixed an issue with displaying invisible strokes.
[pptx] Fixed an issue with EMF image scaling.
[doc] Added support for floating tables nested in an inline table.
[pptx] Fixed an issue with applying transformations to rotated shapes inside a group.
[pptx] Fixed an issue with PowerPoint text rotation.
[xlsx] Fixed multiple issues with formatting of fractional numbers.
[xlsx] Fixed Excel cell clipping in RTL Excel sheets.
[docx] Implemented decimal tab stops.
[office] Fixed potential crashes caused by invalid PPT files.
[xlsx] Added support for the "shrinkToFit" attribute of the Excel text alignment style.
[office] Improved appearance of text underlines.
[docx] Fixed a bug with tab stop past right indent.
[xlsx] Now use the correct origin for references in Excel functions.
[xlsx] Added forward and backward trend line projections for the missing chart values.
[xlsx] Added support for the "Show a zero in cells that have zero value" Excel option.
[xlsx] Changed conditional formatting rules to be case insensitive.
[office] Fixed a missing header in Excel documents.
[xlsx] Implemented proper combining of manual and automatic Excel page breaks.
[xlsx] Fixed Excel page margins that were too large.
[xlsx] Implemented proper scaling of Excel sheets with page breaking enabled.
[office] Added proper Excel page clipping when page breaks are applied.
[office] Fixed an issue with missing charts when Excel page breaks are applied.
[office] Removed Excel page content size limit when page breaks are applied.
[office] Fixed multiple issues with Excel headers and footers.
[xlsx] Fixed an issue where an incorrect page orientation was used when the SetApplyPageBreaksToSheet option is set.
[docx] Added ability for footnotes to span multiple pages if necessary.
[office] Improved the dotted border dash pattern to better match Word.
[xlsx] Fixed a crash that could occur when requesting a non-embedded image from a drawing without relationships.
[xlsx] Implemented the Excel feature where numbers that cannot fit into cells are replaced with '#' signs.
[xlsx] Added support for Excel accounting underlines.
[xlsx] Added proper support for the asterisk ('*') format code.
[xlsx] Implemented proper handling of tabs in Excel cells (all tabs are printed as two spaces).
[docx] Implemented an option to hide total page numbers in Word documents.
[xlsx] Fixed an issue with conditional formatting formulas containing Excel error values.
[xls] Fixed a crash related to Excel page breaking.
[xlsx] Added support for mixed text and images in Excel headers/footers.
[xlsx] Added processing of date and file name header/footer parts.
[xlsx] Added support for the "Center on page" Excel print options.
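Two of the Excel display rules listed above (numbers that do not fit are replaced with '#' signs, and tabs in cells are printed as two spaces) can be sketched as follows. This is an illustration only, not the converter's code: the function display_text is hypothetical, and it measures cell width in characters, whereas a real renderer would measure text with font metrics.

```python
def display_text(value, width_chars):
    """Return the text shown for a cell that is width_chars characters wide.

    Assumption for illustration: width is counted in characters.
    """
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    if not is_number:
        # Text cells: every tab is printed as two spaces.
        return str(value).replace("\t", "  ")
    text = str(value)
    if len(text) > width_chars:
        # Numbers that cannot fit are replaced with '#' signs.
        return "#" * width_chars
    return text


print(display_text(1234567, 5))
print(display_text(42, 5))
print(repr(display_text("a\tb", 5)))
```

Only numeric values are replaced with '#' signs in this sketch; text that is too long is returned unchanged and left to the renderer to clip or overflow.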
PDF-to-office Conversion:
[docx] Fixed a converter deadlock that could occur with certain corrupt input files.
[docx] Fixed a potential crash on Linux systems caused by a corrupt input.
[docx] Fixed an issue with inconsistent page headers.
[docx] Fixed a bug which could cause additional unwanted shapes to be rendered.
[docx] Fixed an issue where the engine could unexpectedly fail during content detection.
[docx] Fixed an annotation bug that could lead to conversion failure.
[docx] Fixed a bug causing a portion of a graphic to be removed.
[docx] Fixed an issue where empty table columns could be produced.
[docx] Fixed a potential memory leak that could occur when processing a type3 shader.
[xlsx] Fixed a bug causing corrupt xlsx output on particular input files.
[docx] Improved table detection, table border styles, rendering of contents and division into columns.
[docx] Improved detection of non-standard encodings.
[docx] Improved picture placement in the output document.
[docx] Improved conversion of annotations and comments.
[docx] Fixed bookmark/outline structure detection in Table of Contents.
[docx] Fixed a bug causing inconsistent strikethrough styling color in output.
[docx] Improved handling of graphic groups when located inside table cells.
[docx] Fixed a bug where we failed to recognize some bold text.
[docx] Fixed an issue with text location in relation to horizontal lines.
[docx] Fixed character spacing for Thai text.
[docx] Improved detection and distinction between textbox and background regions.
[docx] Upgraded handling of column detection.
[docx] Fixed an issue with counting of hyperlinks on pictures and on other graphic objects within groups.
[docx] Fixed line positioning in the case of Inline Wrapping and DrawingObject type.
[docx] Upgraded list detection routines.
[docx] Fixed a bug preventing a dashed border from appearing around an annotation textbox.
[docx] Fixed a bug preventing the fill color of a text box from being applied.
[docx] Fixed a bug preventing an image from being correctly rendered on a black background.
[docx] Improved the z-order detection of images.
[docx] Fixed a bug eliminating the shadow from a rotated quotation mark.
[docx] Fixed a bug rendering white text as black text.
[docx] Fixed a bug converting underlined text to normal text with a line shape.
[xlsx] Improved conversion of background gradient colors.
[pptx] Fixed a bug that could result in rendering too many spaces between characters.
[docx] Improved the placement of graphic groups inside cells.
[docx] Improved handling of situations where text is placed on top of images.
[docx] Fixed a bug with the positioning of rotated textboxes when placed as OOXML Drawing Objects.
[docx] Improved the placement of text within the page border.