Glyph & Cog, LLC
XpdfText™ Library

The XpdfText library extracts plain text from PDF files. The PDF file can be on disk or in memory, and likewise, the text can be extracted to memory or directly to disk.

XpdfText can be used in different ways:

  • Convert entire PDF files or individual pages to plain text
    • maintaining layout, or
    • converting to "reading order"
  • Extract text from a specified rectangle on a page
    • useful for extracting text from forms
  • Convert pages into word lists; for each word, you can retrieve:
    • font name and font size
    • text color
    • word position on the page
    • character offset (for highlight files)
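
To make the word-list and rectangle ideas above concrete, here is a minimal sketch in plain C of a per-word record and a filter that keeps only the words inside a rectangle (for example, one field on a form). The Word and Rect types and the words_in_rect function are hypothetical names invented for this illustration. They are not part of the XpdfText API, and the coordinate convention is an assumption.

```c
#include <stddef.h>

// Hypothetical word record. The field names are illustrative only and
// are NOT the XpdfText API.
typedef struct {
    const char *text;        // the word itself
    const char *font_name;   // font name
    double font_size;        // font size, assumed to be in points
    double r, g, b;          // text color, each component 0.0 to 1.0
    double x_min, y_min;     // bounding box on the page (assumed units: points)
    double x_max, y_max;
    int char_offset;         // offset of the word's first character in the text
} Word;

// Hypothetical rectangle in the same coordinate system as the words.
typedef struct {
    double x_min, y_min, x_max, y_max;
} Rect;

// Returns 1 if the word's bounding box lies entirely inside rect, else 0.
static int word_in_rect(const Word *w, const Rect *rect) {
    return w->x_min >= rect->x_min && w->x_max <= rect->x_max &&
           w->y_min >= rect->y_min && w->y_max <= rect->y_max;
}

// Stores pointers to the words that lie inside rect in out (which has room
// for out_cap entries) and returns how many were stored.
size_t words_in_rect(const Word *words, size_t n, const Rect *rect,
                     const Word **out, size_t out_cap) {
    size_t count = 0;
    for (size_t i = 0; i < n && count < out_cap; i++) {
        if (word_in_rect(&words[i], rect)) {
            out[count++] = &words[i];
        }
    }
    return count;
}
```

Requiring the whole bounding box to fall inside the rectangle is one possible design choice; a looser test (any overlap) may suit forms whose fields are drawn tightly around the text.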

The extracted text can be converted to any of a wide range of standard encodings:

  • UTF-8 Unicode
  • Latin1 (8-bit ISO-8859-1)
  • 7-bit ASCII
  • ISO-2022-CN (simplified Chinese)
  • EUC-CN (simplified Chinese)
  • Big5 (traditional Chinese)
  • KOI8-R (Cyrillic)
  • ISO-8859-7 (Greek)
  • ISO-2022-JP (Japanese)
  • EUC-JP (Japanese)
  • Shift-JIS (Japanese)
  • KSX1001 (Korean)
  • TIS-620 (Thai)
  • ISO-8859-9 (Turkish)

Additionally, Glyph & Cog can help you define any other encodings you may need.
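
Choosing a narrower encoding has a practical consequence: characters that the target encoding cannot represent have to be substituted or dropped. The following is a minimal sketch in plain C, independent of XpdfText, that converts UTF-8 to Latin1 and replaces anything outside Latin1 with '?'. The function name utf8_to_latin1 and the '?' substitution policy are assumptions made for this illustration, and the decoder is deliberately simple (it does not reject overlong UTF-8 forms).

```c
#include <stddef.h>

// Converts a NUL-terminated UTF-8 string to Latin1 (ISO-8859-1).
// Code points above U+00FF and malformed sequences are written as '?'.
// Writes at most out_cap bytes including the terminating NUL, and
// returns the number of bytes written, not counting the NUL.
size_t utf8_to_latin1(const char *in, char *out, size_t out_cap) {
    const unsigned char *p = (const unsigned char *)in;
    size_t n = 0;

    if (out_cap == 0) {
        return 0;
    }
    while (*p && n + 1 < out_cap) {
        unsigned long cp;
        int extra;  // number of continuation bytes still expected

        if (*p < 0x80)                { cp = *p;        extra = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; }
        else                          { cp = 0xFFFD;    extra = 0; }  // stray byte
        p++;

        // Fold in continuation bytes (those of the form 10xxxxxx).
        while (extra > 0 && (*p & 0xC0) == 0x80) {
            cp = (cp << 6) | (unsigned long)(*p & 0x3F);
            p++;
            extra--;
        }
        if (extra > 0) {
            cp = 0xFFFD;  // sequence ended early
        }

        out[n++] = (cp <= 0xFF) ? (char)(unsigned char)cp : '?';
    }
    out[n] = '\0';
    return n;
}
```

For example, "café" in UTF-8 (5 bytes) becomes 4 Latin1 bytes, while the euro sign, which Latin1 lacks, becomes '?'.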

The XpdfText library also includes all of the functionality of the XpdfInfo library.

XpdfText is easy to use:

PDFHandle pdf;
char *buf;
int length;

pdfLoadFile(&pdf, "MyFile.pdf");

// convert to a text file on disk...
pdfConvertToTextFile(pdf, 1, 5, "MyFile.txt");

// ... or convert in memory
buf = pdfConvertToTextString(pdf, 1, 5, &length);

Supported platforms:

  • Windows: DLL
  • Windows: COM component - usable from Visual Basic, Delphi, etc.
  • Linux: shared library
  • Solaris: shared library
  • other platforms: portable C++ source code for the library is available

Buy XpdfText online at PDF Store

Contact Glyph & Cog for more information, including pricing, documentation, and evaluation copies.

 
   
    Copyright 2008 Glyph & Cog, LLC