Version 9.5.0 Changelog (February 17th, 2023)

New Features:

Data Extraction Module

  • A collection of routines for intelligently extracting data from PDFs: infer document structure from document content, extract data in tabular form, and detect interactive regions on a page.
  • Extract the underlying document structure in JSON form, yielding the position and content of paragraphs, tables and other structural elements. (DataExtractionModule.ExtractData(), using the e_DocStructure engine)
  • Use the tabular extraction engine to accurately extract data from your document in the form of tables, producing output in either JSON or XLSX form. (DataExtractionModule.ExtractData(), DataExtractionModule.ExtractToXLSX())
  • The form extraction engine can be used to recognize interactive elements on a page visually, so that the correct associated PDF annotations can be created. (DataExtractionModule.ExtractData() using the e_Form engine)
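
As a rough illustration of the tabular engine, the sketch below wraps the extraction call in a small helper. It assumes the SDK namespace is passed in as `sdk` (for example the `PDFNetPython3.PDFNetPython` module), that the optional Data Extraction add-on is installed, and that the tabular engine constant is named `e_Tabular`, which this changelog does not state. The helper name `extract_tables` is hypothetical.

```python
def extract_tables(sdk, pdf_path, json_path):
    """Run tabular extraction on pdf_path and write JSON to json_path.

    `sdk` is the imported PDFNet namespace (assumption: it exposes
    DataExtractionModule with IsModuleAvailable and ExtractData).
    Returns True if extraction ran, False if the add-on module is missing.
    """
    module = sdk.DataExtractionModule
    # The extraction engines ship as a separate add-on, so check first.
    if not module.IsModuleAvailable(module.e_Tabular):
        return False
    module.ExtractData(pdf_path, json_path, module.e_Tabular)
    return True


# Usage (needs the SDK, a license key and the add-on; all paths are placeholders):
#   import PDFNetPython3.PDFNetPython as sdk
#   sdk.PDFNet.Initialize("YOUR_LICENSE_KEY")
#   sdk.PDFNet.AddResourceSearchPath("path/to/DataExtractionModule/Lib")
#   extract_tables(sdk, "table.pdf", "table.json")
```

Passing the namespace in keeps the helper importable on machines where the SDK is not installed; swapping `e_Tabular` for `e_DocStructure` or `e_Form` would select the other engines named above.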

SVG to PDF Conversion

  • Added new built-in conversion from SVG to PDF. (Convert.FromSVG() and as part of Convert.ToPdf())
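
Since SVG input is now handled by `Convert.ToPdf()`, a conversion could look roughly like the sketch below. It assumes the SDK namespace is passed in as `sdk` and exposes `PDFDoc`, `Convert.ToPdf(doc, path)` and `SDFDoc.e_linearized` as in the SDK's other conversion samples; the helper name `svg_to_pdf` is hypothetical, and the dedicated `Convert.FromSVG()` entry point is not shown because its exact signature is not given here.

```python
def svg_to_pdf(sdk, svg_path, pdf_path):
    """Convert an SVG file to PDF and return the resulting page count.

    `sdk` is the imported PDFNet namespace (assumption: PDFNet.Initialize
    has already been called with a valid license key).
    """
    doc = sdk.PDFDoc()                 # empty target document
    sdk.Convert.ToPdf(doc, svg_path)   # appends the converted page(s) to doc
    doc.Save(pdf_path, sdk.SDFDoc.e_linearized)
    return doc.GetPageCount()


# Usage (paths are placeholders):
#   import PDFNetPython3.PDFNetPython as sdk
#   sdk.PDFNet.Initialize("YOUR_LICENSE_KEY")
#   svg_to_pdf(sdk, "logo.svg", "logo.pdf")
```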

Other Changes

  • Added methods to configure the ambient string returned by TextSearch. (TextSearch.SetAmbientLettersBefore, TextSearch.SetAmbientLettersAfter, TextSearch.SetAmbientWordsBefore, TextSearch.SetAmbientWordsAfter)
  • Added a method that can be used to set the opacity of a stamp annotation. (RubberStamp.SetOpacity())
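
The ambient string is the text surrounding a search hit. The plain-Python sketch below illustrates the idea behind the letter-based and word-based settings; it does not call the SDK, the function names are hypothetical, and the SDK's exact trimming rules may differ.

```python
def ambient_by_letters(text, start, end, before, after):
    """Return the hit text[start:end] plus up to `before` characters of
    leading context and `after` characters of trailing context."""
    lo = max(0, start - before)
    hi = min(len(text), end + after)
    return text[lo:hi]


def ambient_by_words(text, start, end, before, after):
    """Return the hit plus up to `before` whitespace-separated words of
    leading context and `after` words of trailing context."""
    left = text[:start].split()
    right = text[end:].split()
    # Slice from an explicit index so that before == 0 yields no words.
    left_ctx = left[max(0, len(left) - before):]
    right_ctx = right[:after]
    return " ".join(left_ctx + [text[start:end]] + right_ctx)


sentence = "the quick brown fox jumps over the lazy dog"
hit_start = sentence.index("fox")
hit_end = hit_start + len("fox")
print(ambient_by_letters(sentence, hit_start, hit_end, 6, 6))
print(ambient_by_words(sentence, hit_start, hit_end, 1, 2))
```

Letter-based context gives a predictable length, while word-based context avoids cutting words in half; which one suits a results list depends on how the snippet is displayed.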

Improvements:

  • [node] Added support for Node 19 and Electron 20-22 to npm.
  • [python] Added support for Python 3.11 to pip.
  • [all] Added support for consumption licensing keys with expiry dates.
  • [pdf] Exposed CreateHideField to Python, PHP, Ruby and Golang.
  • [pdf] Addressed a number of non-critical static analysis issues.
  • [pdf] Added support for corrupt PDF fonts that require, but don't have, a widths entry.
  • [.net] Exposed PDFDraw.Export(Filter) to .Net Core/Standard/5/6.
  • [pdf] Added support for partial corruptions in compressed object streams that could previously lead to issues loading annotations or other objects.
  • [pdf] Greatly reduced memory usage when importing PDF pages from a document with many OCG/Layer objects.
  • [pdf] Improved support for ignoring garbage bytes when repairing corrupt documents.
  • [xfdf] XFDF export now skips invalid annotations that don't have a proper Rect entry. Previously an exception would be thrown.
  • [node] Improved incorrect argument type error handling for a number of functions.
  • [pdf] Adjusted IsFullSaveRequired to return true if the file was just redacted.
  • [cad] Updated CAD module binaries to use ODA version 23.8_16.
  • [cad] Included CAD module version information in the Producer section of the output PDF.

Bugfixes:

  • [image] Fixed a potential crash issue when encountering certain types of errors when loading JpegXR images.
  • [office] Fixed a crash that could occur when converting using Outlook interop.
  • [pdf] Fixed an issue with nametree creation that would prevent creation of PDF Portfolios with more than 21 documents.
  • [pdf] Fixed a potential crash when copy-constructing ObjectIdentifier.
  • [pdf] Fixed out-of-bounds issues in FreeText annotation handling, EMF conversion and CMYK rendering. These were reported by static analysis and were unlikely to cause problems in practice.
  • [pdf] Fixed loading of FDF trust lists containing a single certificate for digital signature verification.
  • [pdf] Fixed an issue with PDFView.LoadThumbAsync where, if annotation rendering is disabled, color postprocessing would be skipped.
  • [.net] Fixed a problem with the TimestampingConfiguration constructor that could cause a crash in .Net4.
  • [pdf] Added support for String FontName entries when subsetting fonts with Optimizer. Previously this corruption would cause an exception to be thrown.
  • [cad] Fixed issue where there could be an empty output file after an error occurs during CAD -> PDF conversion.
  • [xfdf] No longer output appearance references for direct annotations. This is mainly because the reference will always be incorrect and can lead to undesirable handling in WebViewer.
  • [pdf] Fixed a crash that could occur during some TextDiff use cases.
  • [node] Fixed a crash when using streaming conversion methods that take a filter as input. (e.g. PDFNet.Convert.createOfficeTemplateWithFilter)
  • [pdf] Fixed an issue where image masks might not be redacted properly.
  • [pdf] Fixed a crash that could occur when parsing files with null OCG entries in the content stream.
  • [cad] Fixed an issue where an invalid layer in a CAD file could cause CAD -> PDF conversion to fail.
  • [cad] Fixed an issue where off layers could become on after CAD -> PDF conversion.
  • [cad] Fixed an issue where converting certain DGN files with ZoomToExtents(true) could lead to missing content in the output PDF.
  • [cad] Fixed an issue where converting DGN files with external references could lead to incorrect colors in the output PDF.
  • [cad] Fixed an issue where duplicate reference file names could lead to incorrect layers being displayed in the output PDF.
  • [cad] Fixed an issue where frozen layers could lead to incorrect layers being included in the output PDF.
  • [cad] Fixed an issue where converting DWF files with non-standard color palette could lead to incorrect colors in the output PDF. Note that this issue still remains if a custom background color is used.

Office Fidelity:

  • [wmf] Fixed multiple font issues in WMF image conversion.
  • [xlsx] Fixed a bug with calculating formulas with multiple cell references.
  • [xlsx] Fixed default text justification for cells with error values.
  • [xlsx] Added proper handling of the duplicates rule for empty cells.
  • [xlsx] Removed unnecessary application of column styles.
  • [xlsx] Added support for reading table data from hidden sheets.
  • [xlsx] Added implementation of the ISFORMULA, CONCATENATE, N, and EXACT Excel functions.
  • [xlsx] Fixed issues with INDIRECT and ISERROR Excel functions.
  • [xlsx] Fixed potential crash when using MATCH or MEDIAN Excel functions.
  • [office] Updated Harfbuzz to version 2.3.0.
  • [xlsx] Fixed multiple issues with the elapsed time formatting.
  • [office] Added support for displaying row and column headings on each page in an Excel file.
  • [xls] Added ability to open protected XLS files (encrypted with default password).
  • [pptx] Fixed a bug with paragraph and text run properties inheritance.
  • [xlsx] Fixed a bug with missing header images in Excel documents.
  • [office] Fixed an issue with EMF files containing PDF backgrounds.
  • [docx] Fixed an issue with alternative text for images in PDFs converted from Word.
  • [docx] Fixed a bug with missing footnote text.
  • [docx] Added support for doNotExpandShiftReturn compatibility option.
  • [xlsx] Fixed a bug with skipping Excel sheets without text content.
  • [pptx] Added support for theme font resolution for CS, EA and symbol font faces.
  • [xls] Now support reading of alternate content drawings for Excel controls.
  • [xls] Now set proper default pen and brush objects for EMF rendering.
  • [office] Added support for alternative Excel page order when page breaks are applied.
  • [docx] Added support for disabled orphan line placement.
  • [docx] Implemented classification for symbol characters to improve font substitution.
  • [docx] Fixed table row splitting with large images.
  • [xlsx] Implemented proper handling of the _xlfn prefix of Excel functions.
  • [docx] Fixed a bug with incorrectly positioned paragraph frames.
  • [docx] Fixed the default header/footer margin value.
  • [pptx] Fixed an issue with forcing bold and italic font faces for PowerPoint embedded fonts.
  • [pptx] Fixed an issue with dropping the bold qualifier for extra bold fonts.
  • [pptx] Added default border for PowerPoint tables without a table style.
  • [docx] Fixed an issue with the drawing order of shapes in header and footer.
  • [doc] Fixed issues with multi-encoding DOC documents.
  • [docx] Fixed an issue with displaying invisible strokes.
  • [pptx] Fixed an issue with EMF image scaling.
  • [doc] Added support for floating tables nested in an inline table.
  • [pptx] Fixed an issue with applying transformations to rotated shapes inside a group.
  • [pptx] Fixed an issue with PowerPoint text rotation.
  • [xlsx] Fixed multiple issues with formatting of fractional numbers.
  • [xlsx] Fixed Excel cell clipping in RTL Excel sheets.
  • [docx] Implemented decimal tab stops.
  • [office] Fixed potential crashes caused by invalid PPT files.
  • [xlsx] Added support for the "shrinkToFit" attribute of the Excel text alignment style.
  • [office] Improved appearance of text underlines.
  • [docx] Fixed a bug with tab stop past right indent.
  • [xlsx] Now use the correct origin for references in Excel functions.
  • [xlsx] Added forward and backward trend line projections for the missing chart values.
  • [xlsx] Added support for the "Show a zero in cells that have zero value" Excel option.
  • [xlsx] Changed conditional formatting rules to be case insensitive.
  • [office] Fixed a missing header in Excel documents.
  • [xlsx] Implemented proper combining of manual and automatic Excel page breaks.
  • [xlsx] Fixed Excel page margins that were too large.
  • [xlsx] Implemented proper scaling of Excel sheets with page breaking enabled.
  • [office] Added proper Excel page clipping when page breaks are applied.
  • [office] Fixed an issue with missing charts when Excel page breaks are applied.
  • [office] Removed Excel page content size limit when page breaks are applied.
  • [office] Fixed multiple issues with Excel headers and footers.
  • [xlsx] Fixed an issue where incorrect page orientation was used when SetApplyPageBreaksToSheet option is set.
  • [docx] Added ability for footnotes to span multiple pages if necessary.
  • [office] Improved the dotted border dash pattern to better match Word.
  • [xlsx] Fixed a crash that could occur when requesting a non-embedded image from a drawing without relationships.
  • [xlsx] Implemented the Excel feature where numbers that cannot fit into cells are replaced with '#' signs.
  • [xlsx] Added support for Excel accounting underlines.
  • [xlsx] Added proper support for the asterisk ('*') format code.
  • [xlsx] Implemented proper handling of tabs in Excel cells (all tabs are printed as two spaces).
  • [docx] Implemented an option to hide total page numbers in Word documents.
  • [xlsx] Fixed an issue with conditional formatting formulas containing Excel error values.
  • [xls] Fixed a crash related to Excel page breaking.
  • [xlsx] Added support for mixed text and images in Excel headers/footers.
  • [xlsx] Added processing of date and file name header/footer parts.
  • [xlsx] Added support for the "Center on page" Excel print options.
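
Two of the xlsx items above describe simple display rules: numbers that do not fit their cell are replaced with '#' signs, and tabs in cells are printed as two spaces. The sketch below illustrates both in plain Python. It is only an approximation: it measures width in characters, whereas a real renderer works with glyph widths, and the function names are hypothetical.

```python
def fit_number(text, max_chars):
    """Return the formatted number, or a run of '#' filling the cell
    when the number is wider than the cell (width counted in characters)."""
    if len(text) <= max_chars:
        return text
    return "#" * max_chars


def expand_tabs(text):
    """Replace every tab with two spaces, as described for Excel cells."""
    return text.replace("\t", "  ")


print(fit_number("1234.5", 8))
print(fit_number("123456789", 5))
print(repr(expand_tabs("a\tb")))
```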

PDF-to-office Conversion:

  • [docx] Fixed a converter deadlock that could occur with certain corrupt input files.
  • [docx] Fixed a potential crash on Linux systems caused by a corrupt input.
  • [docx] Fixed an issue with inconsistent page headers.
  • [docx] Fixed a bug which could cause additional unwanted shapes to be rendered.
  • [docx] Fixed an issue where the engine could unexpectedly fail during content detection.
  • [docx] Fixed an annotation bug that could lead to conversion failure.
  • [docx] Fixed a bug causing a portion of a graphic to be removed.
  • [docx] Fixed an issue where empty table columns could be produced.
  • [docx] Fixed a potential memory leak that could occur when processing a type3 shader.
  • [xlsx] Fixed a bug causing corrupt xlsx output on particular input files.
  • [docx] Improved table detection, table border styles, rendering of contents and division into columns.
  • [docx] Improved non standard encoding detection.
  • [docx] Improved picture placement in the output document.
  • [docx] Improved conversion of annotations and comments.
  • [docx] Fixed bookmark/outline structure detection in Table of Contents.
  • [docx] Fixed a bug causing inconsistent strikethrough styling color in output.
  • [docx] Improved handling of graphic groups when located inside table cells.
  • [docx] Fixed a bug where we failed to recognize some bold text.
  • [docx] Fixed an issue with text location in relation to horizontal lines.
  • [docx] Fixed character spacing of Thai language.
  • [docx] Improved detection and distinction between textbox and background regions.
  • [docx] Upgraded handling of column detection.
  • [docx] Fixed an issue with counting of hyperlinks on pictures and on other graphic objects within groups.
  • [docx] Fixed line positioning in the case of Inline Wrapping and DrawingObject type.
  • [docx] Upgraded list detection routines.
  • [docx] Fixed a bug preventing a dashed border around an annotation textbox.
  • [docx] Fixed a bug that prevented the fill color of a text box from being preserved.
  • [docx] Fixed a bug preventing an image from being correctly rendered on a black background.
  • [docx] Improved the z order detection of images.
  • [docx] Fixed a bug eliminating the shadow from a rotated quotation mark.
  • [docx] Fixed a bug rendering white text as black text.
  • [docx] Fixed a bug converting underlined text to normal text with a line shape.
  • [xlsx] Improved conversion of background gradient colors.
  • [pptx] Fixed a bug that could result in rendering too many spaces between characters.
  • [docx] Improved the placement of graphic groups inside cells.
  • [docx] Improved handling of situations where text is placed on top of images.
  • [docx] Fixed a bug with the positioning of rotated textboxes when placed as OOXML Drawing Objects.
  • [docx] Improved the placement of text within the page border.
