Structured Document Formats

The Congree Authoring Server can process various structured formats, including XML, HTML, SGML, and structured FrameMaker documents (.fm).

For this to work, it must be possible to validate the document. The structure of the document is defined by an underlying DTD, which must be linked to the document. For structured formats without a DTD (e.g. plain XML), the sentence detection for the Authoring Memory combines all sentences and paragraphs into a single paragraph. This makes it impossible to use the Authoring Memory.
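As a rough illustration of what "linked to the document" means for XML, the following sketch reports the system identifier of the DOCTYPE declaration, if there is one. It uses only the Python standard library; the function name and the sample file name manual.dtd are assumptions made for this example and are not part of Congree.

```python
from xml.dom import minidom


def linked_dtd_system_id(xml_text):
    """Return the system identifier of the DOCTYPE declaration, or None.

    None is returned both when there is no DOCTYPE at all and when the
    DOCTYPE only carries an internal subset without an external DTD.
    """
    document = minidom.parseString(xml_text)
    doctype = document.doctype
    if doctype is None:
        return None
    return doctype.systemId


# Hypothetical sample documents, for illustration only.
with_dtd = '<!DOCTYPE manual SYSTEM "manual.dtd"><manual><p>Hi</p></manual>'
without_dtd = "<manual><p>Hi</p></manual>"

print(linked_dtd_system_id(with_dtd))
print(linked_dtd_system_id(without_dtd))
```

The sketch only inspects the declaration. It does not load the DTD or validate the document against it.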

To use the Language Check, it is also necessary to provide a suitable DTD interpretation, which determines how the Language Check treats the various structure elements.

One advantage of structured formats is that features can be applied at the level of individual elements of a document. One example is the "Flexible settings" feature.

Specifics of using the Congree Authoring Server with structured document formats are described below.

Nested Elements and Congree

Structured documents may contain nested elements. Depending on the underlying DTD, an element may thus contain data (e.g. text) as well as other elements.
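To see which elements of a document combine text with child elements, a small script along the following lines may help. It is a minimal sketch using the Python standard library; the function name and the sample markup are assumptions for illustration, and the script does not reflect how Congree itself analyzes documents.

```python
import xml.etree.ElementTree as ET


def mixed_content_elements(xml_text):
    """List the tags of elements that contain both text and child elements."""
    root = ET.fromstring(xml_text)
    mixed = []
    for element in root.iter():
        children = list(element)
        if not children:
            continue  # no nested elements, so no mixed content
        # Text may appear before the first child (element.text) or
        # after any child (child.tail); whitespace alone is ignored.
        has_text = bool((element.text or "").strip()) or any(
            (child.tail or "").strip() for child in children
        )
        if has_text:
            mixed.append(element.tag)
    return mixed


# Hypothetical sample: <p> holds text and a nested <b> element.
sample = "<doc><p>Press <b>Start</b> now.</p><list><item>One</item></list></doc>"
print(mixed_content_elements(sample))
```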

Currently, nested elements are handled correctly only in the PTC® Arbortext® Editor™.

Well-formedness and Congree

Congree expects structured documents to be well-formed as defined by the XML specification. For example, Congree regards an element that lacks an end tag as broken, so the document cannot be checked in such cases.
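A document can be tested for XML well-formedness before it is passed to a check. The following is a minimal sketch using the Python standard library parser; the function name is an assumption for this example, and the test is independent of Congree.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text parses as well-formed XML, otherwise False."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<doc><p>text</p></doc>"))  # every element is closed
print(is_well_formed("<doc><p>text</doc>"))      # <p> lacks its end tag
```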

SGML and HTML documents may contain elements without an end tag (in SGML, this depends on markup minimization features such as OMITTAG and SHORTTAG). Although such markup is correct SGML or HTML, Congree cannot successfully check these documents.
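For HTML input, elements with omitted end tags can be located before a check so that they can be closed explicitly. The sketch below uses the Python standard library HTML parser and simply compares the number of start tags and end tags per element name. This is a heuristic assumed for illustration: it does not consult a DTD, and the class and function names are not part of Congree.

```python
from collections import Counter
from html.parser import HTMLParser


class EndTagAudit(HTMLParser):
    """Counts start tags minus end tags for each element name."""

    def __init__(self):
        super().__init__()
        self.balance = Counter()

    def handle_starttag(self, tag, attrs):
        self.balance[tag] += 1

    def handle_endtag(self, tag):
        self.balance[tag] -= 1


def elements_missing_end_tags(markup):
    """Return the sorted names of elements opened more often than closed."""
    audit = EndTagAudit()
    audit.feed(markup)
    audit.close()
    return sorted(tag for tag, count in audit.balance.items() if count > 0)


# <li> end tags are omitted, which HTML permits.
print(elements_missing_end_tags("<ul><li>one<li>two</ul>"))
# <br/> is self-closing and therefore counts as closed.
print(elements_missing_end_tags("<p>a<br/>b</p>"))
```

Elements that HTML defines without an end tag, such as a bare `<br>`, are reported as well, since they also break XML well-formedness unless written in the self-closing form.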