Apryse PDF2Text is a command-line application that converts PDF documents to text or XML. This section covers the basic usage of PDF2Text and explains its most commonly used options.
The basic command-line syntax is:
pdf2text [options] file1 file2 folder1 file3 ...
For more options, see Command-Line Summary for PDF2Text.
Notes:
This command relies heavily on defaults. The default output format is plain text.
The '-o' (or '--output') parameter specifies the output folder. If this option is not specified, the extracted text is printed to the console window.
pdf2text -o ex1 test/importantdoc.pdf
Notes:
'-a' (or '--pages') specifies the pages to be converted.
'-f' specifies the output format.
'--xml_output_styles' includes font and styling information in the output.
'--noligatures' keeps ligatures as they appear in the PDF file instead of expanding them.
'--remove_hidden_text' removes hidden text in the PDF file from the output.
'--output' is equivalent to '-o' and specifies the output folder.
pdf2text --output ex2 -a 3-10 -f xml --xml_output_styles --noligatures --remove_hidden_text test/importantdoc.pdf
pdf2text -f textruns -o ex3 --c 0,0,595,842 test/blue_secret.pdf
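When PDF2Text is called from a script, you can assemble the options listed above into an argument list instead of a shell string. The Python sketch below is a minimal illustration under stated assumptions: `build_pdf2text_args` and `run_pdf2text` are hypothetical helper names. The sketch covers only the options explained above and assumes the `pdf2text` executable is on the PATH.

```python
import subprocess
from typing import List, Optional, Sequence


def build_pdf2text_args(
    inputs: Sequence[str],
    output: Optional[str] = None,
    pages: Optional[str] = None,
    fmt: Optional[str] = None,
    xml_output_styles: bool = False,
    noligatures: bool = False,
    remove_hidden_text: bool = False,
    exe: str = "pdf2text",  # assumption: executable name/location on PATH
) -> List[str]:
    """Assemble a pdf2text argument list from the options described above."""
    args = [exe]
    if output is not None:
        args += ["--output", output]         # same as -o: output folder
    if pages is not None:
        args += ["-a", pages]                # pages to convert, e.g. "3-10"
    if fmt is not None:
        args += ["-f", fmt]                  # output format, e.g. "xml"
    if xml_output_styles:
        args.append("--xml_output_styles")   # include font/styling information
    if noligatures:
        args.append("--noligatures")         # keep ligatures unexpanded
    if remove_hidden_text:
        args.append("--remove_hidden_text")  # drop hidden text
    args += list(inputs)                     # input files and/or folders
    return args


def run_pdf2text(args: List[str]) -> int:
    """Run the assembled command without a shell and return its exit code."""
    return subprocess.run(args).returncode
```

For example, `run_pdf2text(build_pdf2text_args(["test/importantdoc.pdf"], output="ex2", pages="3-10", fmt="xml", xml_output_styles=True, noligatures=True, remove_hidden_text=True))` mirrors the '--output ex2' example above. Passing a list rather than a single string means paths containing spaces need no extra quoting.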
PDF2Text can process multiple input documents in a single run. For example, you can specify several folders, and PDF2Text will automatically process all PDF documents in them that match a given file extension. The following command line processes all PDF documents in the folders 'test1' and 'test2':
c:\>pdf2text -o c:/output_folder c:/test1 c:/test2
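PDF2Text performs this folder scan itself. If you want to preview which files a multi-folder run is likely to pick up, the rough Python approximation below may help. The helper names `select_by_extension` and `list_pdfs` are hypothetical. Two behaviors are assumptions not stated in this guide: the scan is non-recursive, and the extension match ignores case.

```python
import os
from typing import Iterable, List


def select_by_extension(names: Iterable[str], extension: str = ".pdf") -> List[str]:
    """Keep names whose final extension matches, ignoring case (an assumption)."""
    ext = extension.lower()
    return sorted(n for n in names if os.path.splitext(n)[1].lower() == ext)


def list_pdfs(folders: Iterable[str]) -> List[str]:
    """Collect matching files directly inside each folder (non-recursive, assumed)."""
    found: List[str] = []
    for folder in folders:
        files = [n for n in os.listdir(folder)
                 if os.path.isfile(os.path.join(folder, n))]
        found += [os.path.join(folder, n) for n in select_by_extension(files)]
    return found
```

Calling `list_pdfs(["c:/test1", "c:/test2"])` returns the paths this approximation would select. Treat the result as a preview rather than a guarantee of what PDF2Text will process.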
Wildcard characters can also be used to process multiple input files.
For example, if a directory contains the following PDF documents:
C:\test1>dir
Directory of C:\test1
01/04/2007 03:35 PM <DIR> .
01/04/2007 03:35 PM <DIR> ..
05/21/2004 02:27 PM A1.pdf
05/03/2005 09:38 AM A2.pdf
05/20/2003 08:46 AM B1.pdf
05/15/2003 12:50 PM B2.pdf
To process all PDF documents in this folder, you could specify:
pdf2text -o c:/output_folder c:/test1/*.pdf
To process all PDF documents starting with 'A', you could specify:
pdf2text -o c:/output_folder c:/test1/A*.pdf
Or to process all PDF documents ending with '1', you could specify:
pdf2text -o c:/output_folder c:/test1/*1.pdf
You can use either of the two standard wildcards, the question mark (?) and the asterisk (*), to specify file name and path arguments on the command line.
Wildcards are expanded in the same way as in operating system commands (refer to your operating system's user guide if you are unfamiliar with wildcards). Enclosing an argument in double quotation marks (" ") suppresses wildcard expansion. Within a quoted argument, you can include a literal quotation mark by preceding it with a backslash (\). If no matches are found for a wildcard argument, the argument is passed literally.
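The matching rules above can be illustrated with Python's standard `fnmatch` module, applied to the A1/A2/B1/B2 listing. This sketch shows the wildcard semantics only. It is not how PDF2Text or the operating system implements expansion, and it has a few caveats:

* `expand_argument` is a hypothetical helper.
* `fnmatchcase` matches case-sensitively, whereas Windows file names are not case-sensitive.
* `fnmatch` also accepts `[...]` character sets, which go beyond the two wildcards described here.

```python
from fnmatch import fnmatchcase
from typing import List, Sequence


def expand_argument(arg: str, names: Sequence[str]) -> List[str]:
    """Expand one wildcard argument against a list of file names.

    In fnmatch, '*' matches any run of characters (including none) and
    '?' matches exactly one character. If nothing matches, the argument
    is returned unchanged, mirroring the "passed literally" rule above.
    """
    matches = [n for n in names if fnmatchcase(n, arg)]
    return matches if matches else [arg]


folder = ["A1.pdf", "A2.pdf", "B1.pdf", "B2.pdf"]
print(expand_argument("A*.pdf", folder))  # ['A1.pdf', 'A2.pdf']
print(expand_argument("*1.pdf", folder))  # ['A1.pdf', 'B1.pdf']
print(expand_argument("C*.pdf", folder))  # ['C*.pdf'] (no match, passed literally)
```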
To report the outcome of a run, PDF2Text returns an exit code when processing completes. You can use exit codes for user feedback, logging, and similar tasks. This is particularly important for applications running in an unattended environment.
The following table lists the possible exit codes and their descriptions:
Exit Code   Description
---------   -----------------------------------------------------------------
0           All files converted successfully.
1           Document is secured. Need a valid password to open the document.
2           Error opening the input file(s).
3           An unknown exception encountered.
All exit codes other than '0' indicate that an error occurred during the conversion process.
The following illustrates a sample Windows batch script that processes exit codes:
@echo off
rem Convert all PDF files in the 'data' folder.
pdf2text ./data
rem "if errorlevel N" is true when the exit code is N or higher,
rem so test the codes from highest to lowest.
if errorlevel 3 goto othererror
if errorlevel 2 goto inputerr
if errorlevel 1 goto passwd
goto exit
:passwd
echo Document is protected. Need a valid password to open the document.
goto exit
:inputerr
echo Error opening the input file(s).
goto exit
:othererror
echo An unknown error was encountered during processing.
goto exit
:exit
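On non-Windows systems, or when PDF2Text is launched from Python, you can handle the same exit codes as sketched below. The descriptions are copied from the exit-code table above. `describe_exit_code` and `run_and_report` are hypothetical helper names, and the sketch assumes the `pdf2text` executable is on the PATH.

```python
import subprocess
import sys
from typing import Sequence

# Exit codes and descriptions as listed in the exit-code table above.
EXIT_CODES = {
    0: "All files converted successfully.",
    1: "Document is secured. Need a valid password to open the document.",
    2: "Error opening the input file(s).",
    3: "An unknown exception encountered.",
}


def describe_exit_code(code: int) -> str:
    """Map a pdf2text exit code to its documented description."""
    return EXIT_CODES.get(code, f"Undocumented exit code {code}.")


def run_and_report(args: Sequence[str]) -> int:
    """Run pdf2text (assumed on PATH), print the outcome, and return the code."""
    result = subprocess.run(["pdf2text", *args])
    # Any non-zero code is an error, so report it on stderr.
    stream = sys.stdout if result.returncode == 0 else sys.stderr
    print(describe_exit_code(result.returncode), file=stream)
    return result.returncode
```

A script can end with `sys.exit(run_and_report(["./data"]))`, so that its own exit code still reflects the conversion result for any scheduler or caller watching it.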