Editorial Guide - User Dictionary - Training Data

Basic information about the dictionary trainer

In your everyday work with Congree, you will repeatedly come across messages that are unnecessary in your company environment. Typical causes are:

Spelling variants that have become established in your company contrary to the standard
Words that are not stored as terms but are still used as fixed names
Proper names that are not recognized as such

With the dictionary trainer, you can transfer these deviating spellings to your user dictionary more easily, or even automatically. This avoids false-positive messages and reduces noise in the Congree language check.

The dictionary trainer captures ignored messages that are context-independent (e.g. messages for the spelling rule "unknown"). It is part of the user dictionary and is activated separately for each document-specific rule set. The ignored words are recorded in the Training Data area of the user dictionary.

The training data offers three basic functions:

Overview of ignored messages for manual management
Automatic hiding of certain messages for an individual user in the future
Automatic transfer of ignored words to the user dictionary, which then applies to all users

Rule overview

The dictionary trainer records ignored messages for the following rules:

German:

Spelling:

unknown
ff
uh

Grammar:

211de
212de
213de
214de

English:

unknown
unknowndigit
acronymext

French and Spanish:

unknown

Activating the dictionary trainer

The dictionary trainer is activated individually for each document-specific rule set. To do this, open the Settings > Document window in the Congree Control Center and scroll to the "Language check" section. Activate the checkbox for the dictionary trainer there.

From this point on, ignored messages for the rules listed above are recorded.

If you use several document-specific rule sets and want to activate the dictionary trainer everywhere, you must make this setting in all document-specific rule sets. This also applies if you use the same editorial guide in different document-specific rule sets.

Working with the dictionary trainer

Ignoring the message

A message for one of the rules listed above appears, and you ignore it. In this example, it is the message of the "unknown" rule for the word "FastInnoLab".

The word "FastInnoLab" is then displayed in the Congree Control Center Web, in the Editorial Guide > User Dictionary > Training Data window.

Managing the training data

In the Congree Control Center Web, open the Editorial Guide > User Dictionary > Training Data window. All ignored messages are displayed here in chronological order. The entries in this list cannot be filtered or sorted.

The columns and controls in detail:

1. The word for which the message was ignored
2. Category of the ignored rule
3. Code of the ignored rule
4. Indication of how often the message for this word was ignored in total
5. Indication of how many different authors have ignored the message for this word (no list by name)
6. Remove this word from the list
7. Add this word to the user dictionary (e.g. as a new noun)
8. Activate/deactivate multiple entries
9. Remove/accept all activated entries
10. Configuration for automatic processing of the training data (see next chapter)

If you leave a word unedited in the list, or remove it from the list by clicking the button in column 6, messages for this word will continue to be displayed in the future. If you transfer a word to the user dictionary, it is treated as part of the user dictionary after the next compilation of the editorial guide, and no more messages are issued for this word.

Automatic processing of training data

You can define thresholds at which the entries in the training data list are processed automatically. To do this, click the gear icon (point 10 in the screenshot above).

The Configure training data dialog opens.

The points in detail:

1. If an individual user ignores the message for a word as often as specified, the message for this word is no longer displayed for this user.
2. During the next compilation, the words from the list are automatically transferred to the user dictionary (as a noun or abbreviation) if the conditions defined below are met.
3. The word must have been ignored at least as often as specified.
4. At least as many different authors as specified must have ignored the word.
(If 3 and 4 are active at the same time, they are combined with a logical AND, i.e. both conditions must be fulfilled.)

For the words to be transferred automatically during the next compilation (point 2), at least one of points 3 and 4 must also be activated.
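The interplay of points 2, 3 and 4 can be sketched as a small decision function. This is an illustrative model in Python under stated assumptions, not Congree's implementation: the function name, the parameter names, and the use of None for a deactivated setting are all hypothetical.

```python
from typing import Optional


def qualifies_for_transfer(
    total_ignores: int,
    distinct_authors: int,
    auto_transfer_enabled: bool,          # point 2 (hypothetical parameter name)
    min_total_ignores: Optional[int],     # point 3, None means deactivated
    min_distinct_authors: Optional[int],  # point 4, None means deactivated
) -> bool:
    """Decide whether a word would be transferred during the next compilation."""
    if not auto_transfer_enabled:
        return False
    # Point 2 alone has no effect: at least one of points 3 and 4 must be active.
    if min_total_ignores is None and min_distinct_authors is None:
        return False
    # Every active condition must hold (AND if both are active).
    if min_total_ignores is not None and total_ignores < min_total_ignores:
        return False
    if min_distinct_authors is not None and distinct_authors < min_distinct_authors:
        return False
    return True
```

With only point 3 active, the number of authors is irrelevant. With both points active, a word that was ignored very often by a single author still does not qualify.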

Examples:

The numerical values in the screenshot have the following effects:

Point 1: Once an author has ignored the message for "FastInnoLab" three times, the "unknown" rule no longer outputs any messages for this word, for this author only. Messages are still output for other words (e.g. "QuickInnoLab"), and colleagues still see messages for "FastInnoLab". This effect is active immediately, so the author no longer sees messages for this word in the rest of a longer text.
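The per-author behavior of point 1 can be modeled with a counter keyed by author and word. The class below is a hypothetical Python sketch for illustration only: the class name, the method names, and the author names are assumptions, and the threshold of 3 is the value described for the screenshot.

```python
from collections import Counter


class PerAuthorSuppression:
    """Hypothetical model of point 1: hide a message per author and per word."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold
        self._ignores = Counter()  # keyed by (author, word)

    def ignore(self, author: str, word: str) -> None:
        """Record that this author ignored the message for this word."""
        self._ignores[(author, word)] += 1

    def shows_message(self, author: str, word: str) -> bool:
        """The message is shown until the author reaches the threshold."""
        return self._ignores[(author, word)] < self.threshold


trainer = PerAuthorSuppression(threshold=3)
for _ in range(3):
    trainer.ignore("author_a", "FastInnoLab")

print(trainer.shows_message("author_a", "FastInnoLab"))   # False
print(trainer.shows_message("author_a", "QuickInnoLab"))  # True
print(trainer.shows_message("author_b", "FastInnoLab"))   # True
```

The counter is kept per author and per word, which is why neither another word nor another author is affected.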

Points 3 and 4: If the message for the word "FastInnoLab" has been ignored at least five times in total by at least three different authors, the word is transferred to the user dictionary during the next compilation. This can mean that one author has ignored the message three times and two colleagues once each, or that five authors have each ignored the message once. Even if a single author ignores the message ten times, the word is not automatically adopted unless at least two other authors also ignore the message.
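The scenarios for points 3 and 4 can be checked with a few lines of arithmetic. The following Python sketch assumes the example values (at least five ignores in total, at least three different authors). The function name and the author names are hypothetical.

```python
from typing import Dict

MIN_TOTAL_IGNORES = 5     # point 3, example value
MIN_DISTINCT_AUTHORS = 3  # point 4, example value


def meets_example_thresholds(ignores_by_author: Dict[str, int]) -> bool:
    """Check both conditions for a mapping of author -> number of ignores."""
    total = sum(ignores_by_author.values())
    authors = sum(1 for count in ignores_by_author.values() if count > 0)
    return total >= MIN_TOTAL_IGNORES and authors >= MIN_DISTINCT_AUTHORS


# One author three times, two colleagues once each: 5 ignores, 3 authors.
print(meets_example_thresholds({"a": 3, "b": 1, "c": 1}))  # True
# Five authors once each: 5 ignores, 5 authors.
print(meets_example_thresholds({"a": 1, "b": 1, "c": 1, "d": 1, "e": 1}))  # True
# A single author ten times: 10 ignores, but only 1 author.
print(meets_example_thresholds({"a": 10}))  # False
```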

Tips

Make it a habit to check the training data regularly. This shows you which adjustments in the user dictionary can be useful.

If you transfer words to the user dictionary, you should, as usual, enrich them with linguistic data (case variants, information on grammatical gender and semantics). This affects how accurately the word is recognized and handled in future texts. Do not forget to make these adjustments, especially for words that were transferred automatically.

When configuring the training data, we recommend that you do not leave any counter at 1. This prevents the automatic processing from being triggered by a message that was ignored by accident.

Settings 3 and 4 in the Configure training data dialog do not have to be active at the same time. However, we recommend that you always activate at least point 3. This also helps to prevent accidental transfers.

If configured, the training data is applied automatically during the next compilation. This is usually the nightly TermSync. Start a manual compilation if you want the training data to be available in the user dictionary immediately.

Check the training data regularly, even with automatic configuration, so that you can adjust the counters if necessary.