Structure Tree
The structure tree will be included in the XML output if the PDF file
includes a structure tree and structure tree output is enabled.
The struct elements form a tree, with mcid elements as leaves.
Each struct element has a type attribute. The type values are
specified in the PDF file and may or may not follow any particular
standard, but they typically provide information about the document's
logical structure.
A struct element may have an alt attribute specifying alternate text
for the corresponding document section.
The mcid attribute in each content element provides a link into the
document content. ("MCID" stands for marked content ID, jargon taken
from the PDF spec.)
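
As a rough illustration of the tree described above, the following
Python sketch walks a small hand-written sample. The struct and mcid
element names and the type and alt attributes come from the text; the
structtree wrapper element, the way each mcid leaf stores its ID (here
as an mcid attribute), and the sample values are all assumptions made
for the example, so the real output may be laid out differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the wrapper element and the way an mcid leaf
# stores its ID are assumptions, not the tool's documented layout.
SAMPLE = """
<structtree>
  <struct type="Document">
    <struct type="H1">
      <mcid mcid="0"/>
    </struct>
    <struct type="Figure" alt="Company logo">
      <mcid mcid="2"/>
    </struct>
    <struct type="P">
      <mcid mcid="1"/>
    </struct>
  </struct>
</structtree>
"""

def walk(elem, depth=0, out=None):
    """Depth-first walk; returns (depth, type, alt, mcids) per struct."""
    if out is None:
        out = []
    for child in elem:
        if child.tag == "struct":
            # mcid leaves that are direct children of this struct
            mcids = [int(m.get("mcid")) for m in child.findall("mcid")]
            out.append((depth, child.get("type"), child.get("alt"), mcids))
            walk(child, depth + 1, out)
    return out

rows = walk(ET.fromstring(SAMPLE))
for depth, type_, alt, mcids in rows:
    print("  " * depth + f"{type_} alt={alt!r} mcids={mcids}")
```

The alt value is None for struct elements that carry no alt attribute,
which matches the "may have" wording above.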
When structure tree output is enabled, the page drawing operators are
(optionally) surrounded by structmark elements.
structmark elements are never nested: any particular content element
is never inside more than one structmark element. Some content may
not be inside any structmark element.
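
The two properties just stated (no nesting, and some content possibly
unmarked) can be checked mechanically. The sketch below does so on a
hand-written page fragment. Only the structmark element name comes from
the text; the page, text and image element names and the mcid attribute
on structmark are hypothetical stand-ins for whatever the real page
content looks like.

```python
import xml.etree.ElementTree as ET

# Hypothetical page fragment; element names other than structmark
# are placeholders chosen for this example.
PAGE = """
<page>
  <structmark mcid="0"><text>Title</text></structmark>
  <text>page number</text>
  <structmark mcid="1"><text>Body</text><image/></structmark>
</page>
"""
page = ET.fromstring(PAGE)
marks = list(page.iter("structmark"))

# A structmark is "nested" if another structmark appears below it.
nested = [m for m in marks
          if any(d is not m for d in m.iter("structmark"))]

# Collect every element that sits inside some structmark.
marked = {id(c) for m in marks for c in m.iter() if c is not m}

# Content elements on the page that are not inside any structmark.
unmarked = [e.tag for e in page.iter()
            if e is not page
            and e.tag != "structmark"
            and id(e) not in marked]

print("nested structmarks:", len(nested))
print("unmarked content:", unmarked)
```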
The mcid attributes can be used to cross-reference between the
structure tree and the page content. Note that page content may be
drawn in arbitrary order. In general, the order of page drawing
operations, and hence the order of structmark elements, will not be
exactly the same as a depth-first traversal of the structure tree.
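
A minimal sketch of that cross-referencing, under stated assumptions:
the doc, structtree, page, text and image element names are invented
for the example, and both the mcid leaves and the structmark elements
are assumed to expose their ID through an mcid attribute. The sample
is a single page drawn in a different order from the tree, to show why
the two orders should not be expected to match.

```python
import xml.etree.ElementTree as ET

# Hypothetical combined sample (single page); layout is an assumption.
SAMPLE = """
<doc>
  <structtree>
    <struct type="Document">
      <struct type="H1"><mcid mcid="0"/></struct>
      <struct type="P"><mcid mcid="1"/></struct>
      <struct type="Figure" alt="Logo"><mcid mcid="2"/></struct>
    </struct>
  </structtree>
  <page>
    <structmark mcid="2"><image/></structmark>
    <text>page number</text>
    <structmark mcid="0"><text>Title</text></structmark>
    <structmark mcid="1"><text>Body</text></structmark>
  </page>
</doc>
"""
root = ET.fromstring(SAMPLE)

# Map each MCID to the type of the struct element that owns it.
owner = {}
for struct in root.iter("struct"):
    for leaf in struct.findall("mcid"):
        owner[int(leaf.get("mcid"))] = struct.get("type")

# iter() visits elements in document order, i.e. depth-first.
tree_order = [int(m.get("mcid")) for m in root.iter("mcid")]
draw_order = [int(s.get("mcid")) for s in root.iter("structmark")]

# Look up the logical role of each marked region as it is drawn.
drawn_types = [owner[m] for m in draw_order]

print("tree order:", tree_order)
print("draw order:", draw_order)
print("drawn types:", drawn_types)
```

To recover logical reading order from such output, one would iterate
over the tree and look up page content by MCID, rather than relying on
the order in which structmark elements appear on the page.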