Version 10.7.0 Changelog (February 7th, 2024)

New In This Release

Data Extraction Features

  • Added a convenient interface to automatically detect and add form fields to a PDF, using the output of our AI form field detection engine. (DataExtractionModule.DetectAndAddFormFieldsToPDF())
  • Added a new DataExtractionModule engine: the form KeyVal engine. This engine extends the existing form detection engine: in addition to the form box itself, it also detects the questions and labels associated with a given form field. Note that this API is in beta: we expect the quality of the output to increase dramatically in subsequent releases, and there may be minor changes to the API. (See e_FormKeyValue in the DataExtractionEngine enum, and DataExtractionModule.SetFormExtractionEngine() for use with DataExtractionModule.DetectAndAddFormFieldsToPDF())
  • Added deep learning powered assistance to table detection within the Doc Structure engine. This option increases the accuracy of detected tables at the expense of higher computing costs. (DataExtractionOptions.SetDeepLearningAssist())

Word Template Generation

  • In addition to the existing PDF generation functionality using docx templates, the template engine can now produce filled-in .docx files as output. (TemplateDocument.FillTemplateJsonToOffice())

New Options

  • Added functions to allow trusted certificates to be retrieved from VerificationOptions. (VerificationOptions.GetTrustedCertificateCount() and VerificationOptions.GetTrustedCertificate())

Improvements:

  • [node.js] Added support for electron versions 23-26.
  • [pdf] Improved handling of incomplete PDF Font ToUnicode maps. Previously they would be ignored, which could lead to invalid Unicode characters when extracting text.
  • [pdfa] Improved PDF/A-1 conversion with respect to transparency in text to better handle transparency very close to zero.
  • [pdf] Adjusted Digital Signature Verification logic to retain a more complete set of revocation data. This produces more compatible results when applying LTV.
  • [pdf] Improved handling of corrupt documents where objects not in the XRef table are referenced as pages. Previously this could lead to the document needing a full repair and, in rare situations, some parts failing to be reconstructed.
  • [pdf] Improved support for repairing corrupt documents that contain invalid references to object 0.
  • [pdf] When new pages are inserted at the start of a document, InsertPages now also inserts their bookmarks at the start where applicable, so that the bookmark order better matches the document order.
  • [image] Added support for .cal files in the Convert API.
  • [pdf] Structure Tree ClassMap, IDTree and RoleMap entries are now retained when inserting pages from one document into another. This ensures that certain groupings and styles of structure elements are maintained.
  • [image] Better partial handling for incomplete/corrupt PNG images. Previously processing these could throw an error.
  • [pdf] Improved support for inserting pages from one document to another when the source document has invalid page references. Previously these cases could lead to an exception being thrown.
  • [pdf] Improved handling of corrupt documents that contain an ICCBased ColorSpace without the required N entry.
  • [pdf] Added missing functions Date.SetUTMinutes, Date.SetUTHour and Date.SetUT for C++, Python, Ruby, PHP and Go. These functions are needed in order to effectively set the timezone of the date object.
  • [image] Adjusted image import to ignore TIFF resolution in cases where it is excessively large. This avoids a problem where a TIFF of this sort converted to PDF would end up with a tiny output file, which is unsupported by PDF viewers.
  • [cad] Adjusted DGN to PDF conversion to allow substitution of symbolic text using unavailable CAD-engine fonts.
  • [cad] Improved default font substitutions for a number of common CAD font resources.
  • [cad] Improved line spacing in multi-line text objects.
  • [cad] Added support for the .cal format as a reference in CAD files.
  • [cad] Improved the behaviour of ZoomToExtents mode by omitting more non-visible layers when calculating extents.
  • [txt] Fixed an issue with incorrect handling of carriage return (CR)-only newlines in TXT files, ensuring proper text formatting.
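The CR-only newline item above concerns text files that end lines with a lone carriage return (as opposed to LF or CRLF). As a rough illustration of the idea only, and not the SDK's actual implementation, the sketch below normalizes all three newline styles to LF in plain Ruby. The helper name normalize_newlines is hypothetical.

```ruby
# Hypothetical helper (not part of the SDK): convert CRLF, lone CR, and LF
# line endings to a single LF so that every line break is treated alike.
def normalize_newlines(text)
  # "\r\n?" matches a CR followed by an optional LF, so CRLF is replaced
  # as one unit and a lone CR is replaced on its own. Plain LF is untouched.
  text.gsub(/\r\n?/, "\n")
end

sample = "first\rsecond\r\nthird\nfourth"
puts normalize_newlines(sample).lines.map(&:chomp).inspect
# prints ["first", "second", "third", "fourth"]
```

Without the lone-CR case, the sample above would be split into three lines instead of four, with "first" and "second" run together.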

Bugfixes:

  • [xfdf] Fixed issues with certain Cloudy bordered FreeText annotations being cut off after importing from XFDF.
  • [ocr] Fixed a problem related to the IRIS OCR Module where certain high-quality images in the PDF could lead to an incorrect resolution estimate and subsequently an allocation error.
  • [pdf] Fixed an issue with generation of Redaction annotation appearances where, in some cases, the text color would be incorrect.
  • [pdf] Fixed an issue with X501DistinguishedName.GetStringValuesForAttribute() where it could return string values not associated with the given attribute.
  • [pdf] Fixed a potential crash in DigitalSignatureField::GetCertsFromCMS() when no certificates can be retrieved.
  • [pdf] Corrected some issues with calculation of internal bookmark counts when items are opened or closed.
  • [html] Updated reflow logic to restore "This page doesn't contain reflowable text" message when there is actually no text on the page.
  • [xfdf] Fixed an issue with XFDF import missing extended ASCII support for button "on" states. On rare occasions this could result in failing to check the correct button.
  • [pdf] Fixed an issue with the handling of an uncommon mix of CMaps that could lead to incorrect encoding usage and rendering of incorrect glyphs.
  • [pdf] Fixed an issue with extracting text from RTL documents utilizing ActualText.
  • [node.js] Fixed an issue where ObjectIdentifier.createFromPredefined() would not function correctly in Node.js.
  • [pdf] Fixed an issue with digital signing that could occur with a signer's certificate having a negative serial number.
  • [xfdf] Fixed an XFDF import issue where some Page references would be left as integers rather than resolved to an object reference. This could lead to poor interoperability with other PDF consumers.
  • [pdf] Fixed a PDF image rendering issue that would occur when the same image is used both as a softmask and a regular image.
  • [pdf] Adjusted the handling of text as a clipping path to apply even clipping paths that consist of empty glyph outlines. Previously, content could be shown that is clipped by some other PDF consumers.
  • [pdf] Fixed a rare encoding issue that could lead to missing characters in specific documents.
  • [pdf] Fixed an issue when inserting pages from another PDF where the source document had Widget annotations with multiple styles, but the same Field name. Previously this could lead to them ending up with the same style in the destination.
  • [html] Fixed an issue with PDF to HTML where specifying a target PDFDoc parameter would throw an exception.
  • [cad] Fixed an issue where some lines with default ending style could be incorrectly rounded (sausage lines) when converting DGN to PDF.
  • [cad] Fixed an issue where 0-width dots with square line endings would not be visible after converting DGN to PDF.
  • [cad] Fixed a specific case where some DWF markup would not appear after conversion to PDF.
  • [cad] Fixed an issue where some specific fonts would be replaced during substitution even when the exact font was available.
  • [cad] Addressed an issue with DWG to PDF conversion where processing Null object ID references could cause the conversion to fail.

Office Fidelity:

  • [xlsx] Fixed an issue with trailing newlines potentially causing infinite loops in Excel documents.
  • [docx] Enhanced XML parsing to support both
  • [xlsx] Added support for multiple new locales in Excel formats.
  • [office] Enhanced handling of problematic floating elements.
  • [docx] Fixed an issue with incorrect end note numbering.
  • [office] Fixed incorrect default margins of text box elements.
  • [office] Fixed alignment issues for the last line in math paragraphs.
  • [office] Adjusted equation font properties for closer alignment with Word standards.
  • [xlsx] Improved handling of cases with an extreme number of empty cells in Excel documents.
  • [docx] Fixed an issue with missing repeating table header rows.
  • [xlsx] Implemented the SUMIFS Excel function.
  • [office] Fixed a small memory leak issue.
  • [xlsx] Added simple handling of dialog sheets, converting them into a sheet that displays the dialog name and an informational message.
  • [docx] Fixed an issue with skipping a document section in a document with multiple sections on the first page.
  • [docx] Fixed an issue with incorrect position of floating images outside the page boundaries.
  • [xls] Improved handling of cell styles with conditional formatting in XLS documents.
  • [docx] Improved handling of malformed tables with incomplete rows.
  • [xlsx] Fixed an issue where incorrect conditional formatting was applied to vertically merged Excel cells.
  • [xlsx] Added support for the HideTotalNumberOfPages option for Excel documents.
  • [office] Fixed issues with error code propagation in the layout. This addressed the problem of identifying the 'incorrect password' error.
  • [docx] Enhanced page numbering to continue from previous sections, addressing cases where the custom first page number is larger than the actual page number.
  • [docx] Added support for Spanish, German and French languages in numbered lists.
  • [docx] Fixed an issue with incorrectly placed floating elements inside vertically merged table cells.
  • [docx] Fixed a rare issue with incorrect alignment of heading paragraphs.
  • [xlsx] Improved theme color resolution for Excel documents.
  • [docx] Corrected the handling of French text in small caps to appropriately drop letter accents.
  • [docx] Fixed an issue with placing unbreakable paragraphs in the first row of a table.
  • [xlsx] Implemented support for table styles for pivot tables in Excel documents.
  • [office] Fixed an issue with missing images when combined with gradient fill background.
  • [office] Fixed an issue where header rows were not repeated in XLS documents.
  • [office] Improved handling of rotated autofit text boxes.
  • [docx] Fixed incorrect layout of tables with center-aligned nested tables.
  • [office] Resolved an issue with gaps in paragraph borders of multiple consecutive paragraphs with the same borders.
  • [docx] Improved the calculation of spacing in justified paragraphs.
  • [office] Fixed incorrect handling of the 'EMR_SCALEVIEWPORTEXTEX' record in EMF images, resolving an issue with incorrect vertical scaling.
  • [office] Fixed the issue of incorrect placement of data labels in charts, specifically addressing labels with manual layout and large size.
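For readers unfamiliar with the SUMIFS function mentioned in the list above: it sums the values in one range for which every paired criteria range satisfies its criterion. The following is a minimal Ruby sketch of that behaviour, not the converter's implementation. The method name sumifs and the use of lambdas in place of Excel criteria strings such as ">=2023" are simplifying assumptions made for illustration.

```ruby
# Hypothetical sketch of SUMIFS semantics (names are illustrative only).
# sum_range: array of numbers to add up.
# criteria:  any number of [criteria_range, predicate] pairs, where each
#            criteria_range has the same length as sum_range.
def sumifs(sum_range, *criteria)
  criteria.each do |range, _predicate|
    raise ArgumentError, "range size mismatch" unless range.size == sum_range.size
  end

  # A value contributes to the total only if ALL predicates accept the
  # entry at the same position in their respective criteria ranges.
  sum_range.each_with_index.sum do |value, i|
    criteria.all? { |range, predicate| predicate.call(range[i]) } ? value : 0
  end
end

amounts = [10, 20, 30, 40]
regions = ["east", "west", "east", "east"]
years   = [2022, 2023, 2023, 2024]

total = sumifs(amounts,
               [regions, ->(r) { r == "east" }],
               [years,   ->(y) { y >= 2023 }])
puts total
# prints 70
```

Only the third and fourth rows satisfy both conditions, so the result is 30 + 40.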

PDF to Office Fidelity:

Fixes and improvements for the Structured Output Module

  • [office] Multiple improvements to PDF rendering clipping algorithms to better identify accurate polygons used in document understanding.
  • [office] A number of header and footer improvements specifically targeting one-page documents.
  • [docx] Resolved an issue with CAD source content where vertical text around architectural details is displaced.
  • [docx] Fixed an issue that caused the tops of characters in one line of text to be clipped.
  • [docx] Resolved an issue causing five rows of a table to be incorrectly merged.
  • [docx] Fixed an issue causing two columns of a table to be merged into one.
  • [docx] Fixed an issue in the clipping engine preventing successful rendering of a specific PDF.
  • [docx] Fixed an issue causing the last line of a right-to-left direction paragraph to have a hanging indent.
  • [docx] Resolved an issue where right-to-left text was incorrectly left aligned.
  • [docx] Fixed an issue preventing the rendering of leader (tabbing) characters in table of contents containing right-to-left text.
  • [docx] Fixed incorrectly wrapped right-to-left text causing a page overflow issue.
  • [docx] Resolved an indentation and alignment issue at list items in a right-to-left document.
  • [docx] Improved table of contents detection by optimizing sections across pages.
  • [office] Improved GNSE detection to independently recognize glyphs and Unicode codepoints in separate stages.
  • [office] Improved support for Arabic diacritical marks using analysis of scale and character spacing.
  • [docx] Improved border line termination in specific table cases.
  • [html] Improved the left margin alignment of a document.
  • [docx] Resolved an issue causing text misplacement when viewed on Office 2016 only.
  • [docx] Fixed a hybrid table detection issue resulting in two additional columns.
  • [docx] Fixed an issue causing line shapes to be rendered as underlines.
  • [pptx] Fixed an issue that caused a block of text in a table to be incorrectly divided into six rows.
  • [docx] Resolved an issue that caused one table to be incorrectly split into two tables.
  • [docx] Resolved an issue causing a textbox to be divided in two parts.
  • [docx] Improved Arabic language character Unicode detection.
  • [docx] Improved alignment and indentation of content with right-to-left text direction.
