Structure Tree

The XML output includes the structure tree when the PDF file contains one and structure tree output is enabled.

<structtree>
  <struct type="struct-node-type"/>
  <struct type="struct-node-type"/>
  <content mcid="mcid"/>
  ...
</structtree>

The struct elements form a tree, with content elements as leaves.

Each struct element has a type attribute. The type values are taken from the PDF file and do not necessarily follow any particular standard, but they typically provide information about the document's logical structure.

A struct element may have an alt attribute specifying alternate text for the corresponding document section.

The mcid attribute in each content element provides a link into the document content. ("MCID" stands for marked content ID, jargon taken from the PDF spec.)
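
The tree described above can be walked with any XML parser. The following is a minimal Python sketch, not part of the tool itself. It assumes that struct elements nest inside one another and that content elements appear as their children. The sample fragment, its type values ("Document", "H1", and so on) and the helper name iter_leaves are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the type values and the nesting are made up for
# illustration; real files carry whatever types the PDF author specified.
SAMPLE_TREE = """
<structtree>
  <struct type="Document">
    <struct type="H1">
      <content mcid="0"/>
    </struct>
    <struct type="Figure" alt="Company logo">
      <content mcid="2"/>
    </struct>
    <struct type="P">
      <content mcid="1"/>
    </struct>
  </struct>
</structtree>
"""


def iter_leaves(node, types=(), alt=None):
    """Yield (type_path, alt, mcid) for each content leaf, depth-first.

    type_path is the tuple of struct types from the root down to the leaf.
    alt is the alt text of the nearest enclosing struct that has one.
    """
    for child in node:
        if child.tag == "struct":
            yield from iter_leaves(
                child,
                types + (child.get("type"),),
                child.get("alt", alt),
            )
        elif child.tag == "content":
            yield types, alt, child.get("mcid")


leaves = list(iter_leaves(ET.fromstring(SAMPLE_TREE)))
for type_path, alt_text, mcid in leaves:
    print("/".join(type_path), "alt=%r" % alt_text, "mcid=%s" % mcid)
```

Each leaf is reported together with the chain of struct types above it, which is usually what is needed to decide whether a piece of content is a heading, a paragraph, a figure, and so on.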

When structure tree output is enabled, the page drawing operators are (optionally) surrounded by structmark elements:

<structmark mcid="mcid">
  <textop ... />
  <fill ... />
  ...
</structmark>

structmark elements are never nested: any given drawing operator is inside at most one structmark element. Some page content may not be inside any structmark element.

The mcid attributes can be used to cross-reference between the structure tree and the page content. Note that page content may be drawn in arbitrary order. In general, the order of page drawing operations, and hence the order of structmark elements, will not be exactly the same as a depth-first traversal of the structure tree.
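
As a rough illustration of this cross-referencing, the Python sketch below indexes the structmark elements by mcid and then visits the content leaves of the structure tree in depth-first order, so the drawing operators come out in logical order rather than drawing order. The doc and page wrapper elements, the sample data and the function names are assumptions made for the example. The sketch also assumes that each mcid value is unique within the fragment being processed. In PDF, marked content IDs are only unique within a content stream, so multi-page output would likely need the page as part of the lookup key.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the <doc> and <page> wrappers are assumptions, and
# the operator elements are shown without their real attributes.
SAMPLE = """
<doc>
  <structtree>
    <struct type="Document">
      <struct type="H1"><content mcid="0"/></struct>
      <struct type="P"><content mcid="1"/></struct>
    </struct>
  </structtree>
  <page>
    <structmark mcid="1"><textop/><fill/></structmark>
    <fill/>
    <structmark mcid="0"><textop/></structmark>
  </page>
</doc>
"""


def ops_by_mcid(root):
    """Map each mcid to the tag names of the operators drawn under it."""
    ops = {}
    for mark in root.iter("structmark"):
        ops.setdefault(mark.get("mcid"), []).extend(op.tag for op in mark)
    return ops


def logical_order(root):
    """Return (mcid, operator tags) pairs in structure-tree order."""
    ops = ops_by_mcid(root)
    tree = next(root.iter("structtree"))
    # Element.iter() walks in document order, i.e. depth-first.
    return [(c.get("mcid"), ops.get(c.get("mcid"), []))
            for c in tree.iter("content")]


root = ET.fromstring(SAMPLE)
drawing_order = [m.get("mcid") for m in root.iter("structmark")]
print("drawing order:", drawing_order)
print("logical order:", logical_order(root))
```

In this sample the paragraph (mcid 1) is drawn before the heading (mcid 0), but the structure tree lists the heading first. The fill operator that sits outside any structmark element is not reachable from the structure tree and is therefore left out.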