Introduction

Overview

The XpdfText® library extracts plain text from PDF files. It can extract text from one or more pages, or from a rectangular region of a page, and can write the result to a file on disk or store it in a memory buffer.
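As a rough illustration of what extracting text from a rectangular region means, the following Python sketch filters positioned words by a bounding rectangle. It does not use the XpdfText API: the `Word` class, the `words_in_region` function, the sample coordinates, and the rule that a word must lie fully inside the rectangle are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """A word and its bounding box (hypothetical structure, in page units)."""
    text: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def words_in_region(words, x0, y0, x1, y1):
    """Return the text of words whose boxes lie fully inside the rectangle.

    Full containment is an assumed policy; a real extractor might
    instead keep words that merely overlap the rectangle.
    """
    return [
        w.text
        for w in words
        if w.x_min >= x0 and w.x_max <= x1 and w.y_min >= y0 and w.y_max <= y1
    ]


# Made-up sample data for a 612-unit-wide page.
page_words = [
    Word("Invoice", 72, 700, 130, 712),
    Word("Total", 72, 100, 105, 112),
    Word("42.00", 400, 100, 440, 112),
]

# Select only the words in a horizontal band near the bottom of the page.
footer = words_in_region(page_words, 0, 90, 612, 120)
```

Here `footer` holds the two words in the lower band, and the heading word near the top of the page is excluded.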

The XpdfText library uses Unicode internally. It can return text as Unicode, or convert it to an encoding you select.
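To show what converting Unicode text to a user-selected encoding involves, here is a minimal Python sketch. It does not use the XpdfText API: the `convert_text` function and the choice to replace unrepresentable characters with `?` are assumptions for illustration, and the library's own behavior may differ.

```python
def convert_text(text: str, encoding: str = "utf-8") -> bytes:
    """Encode Unicode text into the chosen encoding.

    Characters the target encoding cannot represent are replaced
    with '?' (an assumed policy, not necessarily what XpdfText does).
    """
    return text.encode(encoding, errors="replace")


sample = "Café: 5 €"

# UTF-8 can represent every Unicode character, so nothing is lost.
utf8_bytes = convert_text(sample, "utf-8")

# Latin-1 has 'é' but no euro sign, so the euro sign becomes '?'.
latin1_bytes = convert_text(sample, "latin-1")
```

The point of the sketch is that a narrower target encoding can lose characters, so the choice of output encoding matters when a document contains text outside that encoding's repertoire.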

The XpdfText library also includes all of the XpdfInfo functions for extracting PDF Info dictionary entries.

Supported Platforms

Intellectual Property

The XpdfText library and documentation are Copyright 1996-2024 Glyph & Cog, LLC.

The PDF data structures, operators, and specification are documented in ISO 32000-2:2020.

About Glyph & Cog

Glyph & Cog designs and implements software for manipulating electronic documents. Current offerings include software libraries, components, and consulting services related to reading, viewing, and converting PDF files.

For more information, visit our web site at www.glyphandcog.com.