pdfExtractTextFromRect

Extract text from a rectangular region.
char *pdfExtractTextFromRect(PDFHandle pdf, int page, double x0, double y0, double x1, double y1, int *length)
This function extracts text from a rectangular region on a page, and returns the resulting text in a string.

The rectangle is defined by two opposite corners: (x0, y0) and (x1, y1). The coordinates are in a coordinate space that places (0,0) at the top-left corner of the page and has 72 units per inch.
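Because the coordinate space uses 72 units per inch, a rectangle given in inches can be converted to corner coordinates with simple arithmetic. The following is a minimal sketch; the TextRect type and the rectFromInches helper are hypothetical names used only for illustration and are not part of the library.

```c
/* Hypothetical helper type, not part of the library: the two
 * opposite corners expected by pdfExtractTextFromRect. */
typedef struct {
  double x0, y0, x1, y1;
} TextRect;

/* The coordinate space described above has 72 units per inch. */
#define UNITS_PER_INCH 72.0

/* Build a rectangle from a left/top offset and a width/height, all
 * given in inches and measured from the top-left corner of the page. */
static TextRect rectFromInches(double left, double top,
                               double width, double height) {
  TextRect r;
  r.x0 = left * UNITS_PER_INCH;
  r.y0 = top * UNITS_PER_INCH;
  r.x1 = (left + width) * UNITS_PER_INCH;
  r.y1 = (top + height) * UNITS_PER_INCH;
  return r;
}
```

For a rectangle 4" from the left side, 1" down from the top, 2" wide and 0.5" high, rectFromInches(4, 1, 2, 0.5) yields the corners (288, 72) and (432, 108).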

pdfExtractTextFromRect returns a string if successful, or NULL if text extraction is prohibited by this PDF file.

On success, *length is filled in with the string length. The string is zero-terminated, but depending on the current text encoding (see pdfSetTextEncoding) it may also contain embedded zero bytes, so use *length rather than the terminator to find the end of the text. The caller is responsible for freeing the string with the pdfFreeMemory function.
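Since the returned string may contain zero bytes, functions such as strlen or strcpy can stop before the end of the text. The sketch below copies a buffer by its reported length instead; copyExtractedText is a hypothetical helper, not a library function, and it works on any byte buffer and length pair.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of the library: copy all `length`
 * bytes of an extracted string, including any embedded zero bytes.
 * Returns a malloc'd buffer with one extra terminating zero byte,
 * or NULL on bad arguments or allocation failure. The caller frees
 * the copy with free(); the original buffer is left untouched. */
static char *copyExtractedText(const char *buf, int length) {
  char *copy;

  if (!buf || length < 0) {
    return NULL;
  }
  copy = (char *)malloc((size_t)length + 1);
  if (!copy) {
    return NULL;
  }
  memcpy(copy, buf, (size_t)length);  /* memcpy does not stop at zero bytes */
  copy[length] = '\0';
  return copy;
}
```

The copy is owned by the caller and released with free(); the buffer returned by pdfExtractTextFromRect must still be released with pdfFreeMemory.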

This function is identical to pdfExtractTextFromRect2 except that it takes points in a top-down coordinate space.
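If coordinates are available in a bottom-up space, they need to be flipped vertically before being passed to this function. The sketch below assumes, purely for illustration, a bottom-up space with its origin at the bottom-left corner of the page, the same 72 units per inch, and a known page height; this page does not specify the space used by pdfExtractTextFromRect2, and flipY is a hypothetical helper, not a library function.

```c
/* Hypothetical helper, not part of the library. Converts a y
 * coordinate between a bottom-up space (origin at the bottom-left
 * corner, an assumption) and the top-down space used by
 * pdfExtractTextFromRect. pageHeight is in the same units
 * (72 per inch). The same formula works in both directions. */
static double flipY(double y, double pageHeight) {
  return pageHeight - y;
}
```

On an 11" page (792 units high), a point 1" below the top edge has y = 72 in the top-down space and y = 720 in the assumed bottom-up space. The x coordinates are unchanged under this assumption.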

See the "Setting parameters" section in the function list for settings that affect text extraction.

C:
char *buf;
int length;

/* extract a rectangle 4" from the left side, 1" down from
 * the top, 2" wide, 0.5" high, on page 1 */
if (!(buf = pdfExtractTextFromRect(pdf, 1, 4*72, 1*72, 6*72, 1.5*72, &length))) {
  /* handle the error */
}
...
pdfFreeMemory(buf);
pdfExtractTextFromRect2
pdfConvertToTextFile
pdfConvertToTextString
pdfFreeMemory