Authoring Memory Sentence Segmentation

Important:

The Authoring Memory sentence segmentation must be configured in the third-party software for the sentence database, not in the settings of the Congree Authoring Server. Normally, the sentence segmentation does not need to be adjusted. If possible, any changes should be coordinated with Congree in order to prevent negative side effects.

The sentence segmentation that Congree uses in the Authoring Memory is rule-based: a set of rules determines where one sentence ends and the next one begins. The segmentation has a major impact on which previously saved similar sentences are identified for the text entered in the editor.

Each stored sentence rule consists of three parts:

  1. The first part of the rule (e.g. [!]) indicates which separator the rule covers.
  2. The second part of the rule determines whether the rule defines the end of a sentence (+) or not (-).
  3. The third part of the rule is its core: it describes the character sequence to which the rule applies. In the rule [!]+[!^_], the part [!^_] stands for "an exclamation mark followed by a space" (^_ represents the space), which is interpreted as the end of the sentence.
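The three-part structure described above can be illustrated with a small parser. This is only a sketch: the names `SentenceRule` and `parse_rule` are hypothetical and not part of Congree, and the assumption that each rule has exactly the shape `[separator]`, sign, `[pattern]` is taken from the examples in this section.

```python
import re
from typing import NamedTuple


class SentenceRule(NamedTuple):
    """Hypothetical container for the three parts of a sentence rule."""
    separator: str       # first part, e.g. "!"
    ends_sentence: bool  # second part: True for "+", False for "-"
    pattern: str         # third part, e.g. "!^_"


# Assumed shape: [separator] sign [pattern]; spaces around the sign are tolerated.
_RULE_RE = re.compile(r"^\[([^\]]*)\]\s*([+-])\s*\[([^\]]*)\]$")


def parse_rule(text: str) -> SentenceRule:
    """Split a rule string such as "[!]+[!^_]" into its three parts."""
    match = _RULE_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a three-part sentence rule: {text!r}")
    separator, sign, pattern = match.groups()
    return SentenceRule(separator, sign == "+", pattern)
```

For example, `parse_rule("[!]+[!^_]")` yields the separator `!`, the flag `True` (end of sentence), and the pattern `!^_`.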

Note:

In the event of overlapping sentence rules, the more specific rule will override the more general rule. Example: A period followed by a space is interpreted as the end of the sentence. However, if the next word begins with a lowercase letter, the rule will be overridden, and the period will not be interpreted as the end of the sentence.
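The override behavior from this example can be sketched as follows. This is an illustration, not the way Congree implements it: the names `GENERAL`, `SPECIFIC`, and `split_sentences` are hypothetical, and the lowercase check is assumed to cover only the ASCII letters a to z.

```python
import re

# General rule: a period followed by a space ends the sentence.
GENERAL = re.compile(r"\. ")
# More specific rule: a period, a space, and a lowercase letter do not end it.
# Assumption: "lowercase letter" is approximated by ASCII a-z.
SPECIFIC = re.compile(r"\. [a-z]")


def split_sentences(text):
    """Split text at periods unless the more specific rule overrides the split."""
    sentences, start = [], 0
    for match in GENERAL.finditer(text):
        if SPECIFIC.match(text, match.start()):
            continue  # the more specific rule wins: no end of sentence here
        sentences.append(text[start:match.start() + 1])  # keep the period
        start = match.end()  # skip the space after the period
    if start < len(text):
        sentences.append(text[start:])  # remaining text after the last split
    return sentences
```

With the input `"See approx. five items. Then stop."`, the period after "approx" is skipped because a lowercase letter follows, so the result is `["See approx. five items.", "Then stop."]`.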

The following table contains the most important sentence rules used in Congree by default:

| Sentence rule | Meaning |
| --- | --- |
| [!]+[!^_] | An exclamation mark followed by a space is interpreted as the end of the sentence. |
| [.]+[.^_] | A period followed by a space is interpreted as the end of the sentence. |
| [.]-[.^_^a] | A period followed by a space and a lowercase letter is not interpreted as the end of the sentence. |
| [.]-[^_^n.] | A space followed by a one-digit number and a period is not interpreted as the end of the sentence. |
| [?]+[?^_] | A question mark followed by a space is interpreted as the end of the sentence. |
| [?]-[?^_^a] | A question mark followed by a space and a lowercase letter is not interpreted as the end of the sentence. |
| [n]+[.\n] | A period followed by a backslash and the letter n is interpreted as the end of the sentence. Background of this rule: Normally, the character string \n stands for a line break. With this rule, an end of sentence is identified after \n in a string such as "Unable to load file.\nError: 0x%x". |
| [n]+[!\n] | An exclamation mark followed by a backslash and the letter n is interpreted as the end of the sentence. |
| [n]+[?\n] | A question mark followed by a backslash and the letter n is interpreted as the end of the sentence. |
| [t]+[.\t] | A period followed by a backslash and the letter t is interpreted as the end of the sentence. Background of this rule: Normally, the character string \t stands for a horizontal tab. With this rule, an end of sentence is identified after \t in a string such as "Find...\tCtrl+F". |
| [t]+[!\t] | An exclamation mark followed by a backslash and the letter t is interpreted as the end of the sentence. |
| [t]+[?\t] | A question mark followed by a backslash and the letter t is interpreted as the end of the sentence. |
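The effect of the rules for the literal character strings \n and \t can be sketched as shown below, using the two example strings from the rule descriptions. The names `ESCAPE_RULES`, `BOUNDARY`, and `split_on_escape_rules` are hypothetical, and keeping the \n or \t at the end of the preceding segment is an assumption based on the statement that the end of sentence is identified after these strings.

```python
import re

# Third parts of the six rules that contain a literal backslash + n or t.
# Raw strings keep the backslash as a normal character.
ESCAPE_RULES = [r".\n", r"!\n", r"?\n", r".\t", r"!\t", r"?\t"]

# One regular expression that matches any of these literal sequences.
BOUNDARY = re.compile("|".join(re.escape(rule) for rule in ESCAPE_RULES))


def split_on_escape_rules(text):
    """Split text after each punctuation mark followed by a literal \\n or \\t."""
    segments, start = [], 0
    for match in BOUNDARY.finditer(text):
        segments.append(text[start:match.end()])  # segment ends after \n or \t
        start = match.end()
    if start < len(text):
        segments.append(text[start:])  # remaining text after the last boundary
    return segments
```

Applied to the raw string `r"Unable to load file.\nError: 0x%x"`, the sketch returns the two segments `Unable to load file.\n` and `Error: 0x%x`. Applied to `r"Find...\tCtrl+F"`, it returns `Find...\t` and `Ctrl+F`.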