XpdfText Library
The XpdfText® library extracts plain text from PDF files. The PDF
file can be on disk or in memory, and likewise, the text can be
extracted to memory or directly to disk.
XpdfText can be used in different ways:
- Convert entire PDF files or individual pages to plain text
  - maintaining layout, or
  - converting to "reading order"
- Extract text from a specified rectangle on a page
  - useful for extracting text from forms
- Convert pages into word lists - for each word, you can retrieve:
  - font name and font size
  - text color
  - word position on the page
  - character offset (for highlight files)
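To make the rectangle and word-list ideas concrete, here is a minimal sketch of selecting the words that fall inside a form field's rectangle. The `Word` and `Rect` structs and the `selectWordsInRect` function are hypothetical names invented for this illustration; they are not part of the XpdfText API, and the coordinate convention is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical word record for illustration only; NOT the XpdfText API. */
typedef struct {
    const char *text;
    double xMin, yMin, xMax, yMax;  /* bounding box in page coordinates */
    double fontSize;
} Word;

/* Hypothetical rectangle, using the same coordinate system as Word. */
typedef struct {
    double xMin, yMin, xMax, yMax;
} Rect;

/* Returns 1 if the word's bounding box lies entirely inside r, else 0. */
static int wordInRect(const Word *w, const Rect *r) {
    return w->xMin >= r->xMin && w->xMax <= r->xMax &&
           w->yMin >= r->yMin && w->yMax <= r->yMax;
}

/* Stores pointers to the words inside r into out (at most outCap of them)
   and returns how many were stored. */
size_t selectWordsInRect(const Word *words, size_t n, const Rect *r,
                         const Word **out, size_t outCap) {
    size_t count = 0;
    assert(words != NULL && r != NULL && out != NULL);
    for (size_t i = 0; i < n && count < outCap; i++) {
        if (wordInRect(&words[i], r)) {
            out[count++] = &words[i];
        }
    }
    return count;
}
```

Given a page's word list and the rectangle of a form field, the selected words can then be joined in order to recover the field's contents. Requiring full containment is one possible design choice; an overlap test would be more forgiving of slightly misaligned boxes.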
The extracted text can be converted to a wide choice of standard
encodings:
- UTF-8 Unicode
- Latin1 (8-bit ISO-8859-1)
- 7-bit ASCII
- ISO-2022-CN (simplified Chinese)
- EUC-CN (simplified Chinese)
- Big5 (traditional Chinese)
- KOI8-R (Cyrillic)
- ISO-8859-7 (Greek)
- ISO-2022-JP (Japanese)
- EUC-JP (Japanese)
- Shift-JIS (Japanese)
- KSX1001 (Korean)
- TIS-620 (Thai)
- ISO-8859-9 (Turkish)
Additionally, Glyph & Cog can help you define any other encodings
you may need.
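As a rough illustration of what converting between two of these encodings involves, the sketch below maps UTF-8 input to Latin1. It is a standalone example, not XpdfText code: the function name `utf8ToLatin1` is hypothetical, and replacing unmappable characters with `?` is an assumed policy rather than documented library behavior.

```c
#include <assert.h>
#include <stddef.h>

/* Converts UTF-8 bytes to Latin1 (ISO-8859-1).
   Code points above U+00FF and malformed sequences become '?'.
   Writes at most outCap bytes and returns the number written. */
size_t utf8ToLatin1(const unsigned char *in, size_t inLen,
                    unsigned char *out, size_t outCap) {
    size_t i = 0, o = 0;
    assert(in != NULL && out != NULL);
    while (i < inLen && o < outCap) {
        unsigned char b = in[i];
        if (b < 0x80) {
            /* ASCII is identical in both encodings */
            out[o++] = b;
            i++;
        } else if ((b & 0xE0) == 0xC0 && i + 1 < inLen &&
                   (in[i + 1] & 0xC0) == 0x80) {
            /* two-byte sequence: covers U+0080..U+07FF */
            unsigned cp = ((unsigned)(b & 0x1F) << 6) | (in[i + 1] & 0x3F);
            out[o++] = (cp >= 0x80 && cp <= 0xFF) ? (unsigned char)cp : '?';
            i += 2;
        } else {
            /* longer or malformed sequence: skip the lead byte and any
               continuation bytes, then emit a single placeholder */
            i++;
            while (i < inLen && (in[i] & 0xC0) == 0x80) {
                i++;
            }
            out[o++] = '?';
        }
    }
    return o;
}
```

Because every Latin1 character needs at most one output byte, an output buffer as large as the input is always sufficient.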
The XpdfText library also includes all of the functionality of the XpdfInfo library.
XpdfText is easy to use:
    PDFHandle pdf;
    char *buf;
    int length;

    pdfLoadFile(&pdf, "MyFile.pdf");

    // convert to a text file on disk...
    pdfConvertToTextFile(pdf, 1, 5, "MyFile.txt");

    // ... or convert in memory
    buf = pdfConvertToTextString(pdf, 1, 5, &length);
Supported platforms:
- Windows: DLL
- Windows: COM component - usable from Visual Basic, Delphi, etc.
- Linux: shared library
- Other platforms: portable C++ source code for the library is available
Buy XpdfText online at PDF Store
Contact Glyph & Cog for more
information, including pricing, documentation, and evaluation copies.