XpdfText can be used in different ways:
- Convert entire PDF files or individual pages to plain text
  - maintaining layout, or
  - converting to "reading order"
- Extract text from a specified rectangle on a page
  - useful for extracting text from forms
- Convert pages into word lists – for each word, you can get:
  - font name and font size
  - text color
  - word position on the page
  - character offset (for highlight files)
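To make the word-list idea concrete, here is a minimal Python sketch of what a per-word record might look like and how such a list could be filtered by a rectangle (for example, a form field). This is not the XpdfText API: the `Word` class, its field names, the coordinate convention, and the `words_in_rect` helper are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Word:
    """Hypothetical per-word record; field names are assumptions, not XpdfText's."""
    text: str
    font_name: str
    font_size: float
    color: Tuple[float, float, float]  # RGB components in the range 0..1
    x_min: float
    y_min: float
    x_max: float
    y_max: float
    char_offset: int  # offset of the word's first character in the page text


def words_in_rect(words: List[Word], x_min: float, y_min: float,
                  x_max: float, y_max: float) -> List[Word]:
    """Return the words whose bounding box lies entirely inside the rectangle."""
    return [
        w for w in words
        if w.x_min >= x_min and w.x_max <= x_max
        and w.y_min >= y_min and w.y_max <= y_max
    ]


# Made-up sample data standing in for one page's word list.
sample_words = [
    Word("Name:", "Helvetica", 10.0, (0.0, 0.0, 0.0), 72, 700, 100, 712, 0),
    Word("Smith", "Helvetica", 10.0, (0.0, 0.0, 0.0), 104, 700, 135, 712, 6),
    Word("Footer", "Times-Roman", 8.0, (0.5, 0.5, 0.5), 72, 30, 100, 38, 12),
]

# Keep only the words inside a hypothetical form-field rectangle.
hits = words_in_rect(sample_words, 70, 690, 200, 720)
print(" ".join(w.text for w in hits))
```

Requiring full containment is one possible design choice; a real application might instead accept any word that overlaps the rectangle.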
The extracted text can be converted to a wide range of standard encodings, including UTF-8 (Unicode), ISO-8859-1 (Latin-1), 7-bit ASCII, and various other language-specific encodings.
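As a rough illustration of what choosing an output encoding means in practice, the following Python sketch encodes the same string as UTF-8, Latin-1, and 7-bit ASCII using only the Python standard library. It does not call XpdfText; the `encode_text` helper is a hypothetical name, and replacing unencodable characters with `?` is an assumption made for this sketch, not a statement about how XpdfText handles them.

```python
def encode_text(text: str, encoding: str) -> bytes:
    """Encode text, substituting '?' for characters the target encoding lacks."""
    return text.encode(encoding, errors="replace")


sample_text = "café"

utf8_bytes = encode_text(sample_text, "utf-8")      # 'é' becomes two bytes
latin1_bytes = encode_text(sample_text, "latin-1")  # 'é' becomes one byte
ascii_bytes = encode_text(sample_text, "ascii")     # 'é' is not representable

print(utf8_bytes, latin1_bytes, ascii_bytes)
```

The same text therefore yields different byte sequences, and narrower encodings such as 7-bit ASCII can lose characters that UTF-8 preserves.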
The XpdfText library also includes all of the functionality of XpdfInfo.
XpdfText is easy to use and is available in several forms:
- Windows: DLL
- Windows: COM component - usable from .NET, Visual Basic, Delphi, etc.
- Mac OS X: shared library
- Linux: shared library
- 32-bit and 64-bit versions available for all platforms
- Other platforms: portable C++ source code for the library is available
See also: For content extraction to XML (instead of plain text), try our PDFdeconstruct tool.
Contact Glyph & Cog for more information, including pricing, documentation, and evaluation copies.