Health Monitoring - Concept and Components

Third-party tools can access metrics about the health state of Congree and its server components. You can download, analyze, and visualize the data in your own environment, according to your needs.

Install Congree Telemetry Collector

To access the health data, install the Congree Telemetry Collector module with at least one exporter.

Currently, one exporter is available: the Prometheus Exporter.

[Figure: Telemetry_10_DA.png]

During the installation process, add this module to the server configuration. Proceed as described in Configuring the Congree Telemetry Collector.

Metrics provided by Congree

Congree lets you collect availability metrics and measurement metrics.

Metric structure

All metrics provided by Congree services have the following structure:

congree_coreserver_uptime_seconds{instance="439e796d-9a7b-4ec6-9909-a3a4ffcaec02",job="CongreeCoreServer"}

Each metric carries the tag “instance”. It identifies a unique service instance and changes after each restart. Each metric also carries the tag “job”, which identifies a specific service.

List of Congree services and their corresponding “job” tags. Services without a “job” tag (shown as -) do not support health monitoring (yet).

Congree service | “job” tag
Core server | CongreeCoreServer
Data Storage wrapper | CongreeDataStorageServer
Identity server | -
WEB API | -
Linguistic server | CongreeLinguisticServer
Linguistic Agent service | Congree.LinguisticAgent
UMMT server | CongreeUmmtServer
Linguistic Compiler service | Congree.Compilationservice
Authoring Memory server | CongreeAuthoringMemory
Term Web back-end | CongreeTermWebConnector
Quickterm back-end | CongreeQuickTermConnector
Content Analysis | -
AI Correction service | -
TermSync service | Congree.TermSync
Linguistic Reporting service | LinguisticReportingService

Some metrics have additional tags that narrow down what was measured. For example, the linguistic connection pool metric:

congree_linguisticserver_connectionpool_request_count{culture="de-de",instance="640bb02b-4c31-451b-82b2-ab8138cc768b",job="CongreeLinguisticServer",server_name="Default"}

The tag “culture” defines the linguistic culture. The tag “server_name” defines the linguistic server name. Mentioning the linguistic server name is only necessary if multiple linguistic servers are installed.
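The tag structure described above can be read programmatically. The following is a minimal sketch in Python, assuming the metrics arrive as plain text lines of the form name{tag="value",...} value, as in the examples on this page. The names parse_sample, SAMPLE_RE and TAG_RE are hypothetical, and the sample value 3 is invented for illustration.

```python
import re

# One sample line: metric name, optional {tags}, then the numeric value.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<tags>.*)\})?\s+(?P<value>\S+)$'
)
# One tag="value" pair inside the braces.
TAG_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_sample(line):
    """Split a sample line into (metric name, tags dict, float value)."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a metric sample: {line!r}")
    tags = dict(TAG_RE.findall(match.group("tags") or ""))
    return match.group("name"), tags, float(match.group("value"))


# The trailing value 3 is a made-up example value.
line = (
    'congree_linguisticserver_connectionpool_request_count{culture="de-de",'
    'instance="640bb02b-4c31-451b-82b2-ab8138cc768b",'
    'job="CongreeLinguisticServer",server_name="Default"} 3'
)
name, tags, value = parse_sample(line)
print(name, tags["job"], tags["culture"], tags["server_name"], value)
```

With the tags available as a dictionary, samples can be grouped by “job” to separate services, or by “culture” and “server_name” to separate linguistic servers.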

Availability metrics

Each web service implements a status endpoint of the form https://<host>/<web_service>/status, for example https://<host>/LinguisticServer/status. This endpoint returns the status of that web service in JSON format.

Example response:

{
  "version": "7.0.23332.02.20250521.Dev",
  "upSince": "2025-05-23T12:57:57.1551312Z",
  "uptime": "00:22:21.8325997"
}
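A monitoring script can turn this response into typed values. The sketch below parses the example response shown above and converts “uptime” into a Python timedelta. The helper name parse_uptime is hypothetical. Support for an optional day prefix (d.hh:mm:ss) is an assumption, since the example only shows the hh:mm:ss.fffffff form. Fetching the response over HTTPS is left out.

```python
import json
import re
from datetime import timedelta

# [d.]hh:mm:ss[.fffffff]; the optional day prefix is an assumption.
UPTIME_RE = re.compile(r'^(?:(\d+)\.)?(\d{1,2}):(\d{2}):(\d{2})(?:\.(\d+))?$')


def parse_uptime(text):
    """Convert an uptime string such as "00:22:21.8325997" to a timedelta."""
    match = UPTIME_RE.match(text)
    if match is None:
        raise ValueError(f"unexpected uptime format: {text!r}")
    days, hours, minutes, seconds, fraction = match.groups()
    # timedelta stores microseconds, so keep the first six fractional digits.
    microseconds = int((fraction or "0")[:6].ljust(6, "0"))
    return timedelta(
        days=int(days or 0),
        hours=int(hours),
        minutes=int(minutes),
        seconds=int(seconds),
        microseconds=microseconds,
    )


# Response body copied from the example response.
body = (
    '{"version": "7.0.23332.02.20250521.Dev", '
    '"upSince": "2025-05-23T12:57:57.1551312Z", '
    '"uptime": "00:22:21.8325997"}'
)
status = json.loads(body)
uptime = parse_uptime(status["uptime"])
print(status["version"], uptime.total_seconds())
```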

List of web services that provide the status endpoint:

Web service

Status is available

Core server

Data Storage wrapper

Identity server

WEB API

Linguistic server

UMMT server

Authoring Memory server

Term Web back end

Quickterm back end

Content Analysis

AI Correction service

Uptime metric

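Each service measures its uptime and provides a metric named in the following form:

<service name>_uptime_seconds

This metric reports the time in seconds that the service has been running since its last start. For the Core server, for example, the metric is congree_coreserver_uptime_seconds.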
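Because the “instance” tag changes after each restart and the uptime starts again from zero, two consecutive readings of the uptime metric are enough to detect a restart. The following is a minimal sketch of that comparison; the function name restarted_between, the second instance identifier and the uptime values are hypothetical.

```python
def restarted_between(previous, current):
    """Return True if the service restarted between two uptime readings.

    Each reading is a tuple (instance tag, uptime in seconds).
    """
    previous_instance, previous_uptime = previous
    current_instance, current_uptime = current
    # A new instance tag or a smaller uptime both indicate a restart.
    return current_instance != previous_instance or current_uptime < previous_uptime


# Hypothetical readings of congree_coreserver_uptime_seconds.
first = ("439e796d-9a7b-4ec6-9909-a3a4ffcaec02", 1341.0)
second = ("439e796d-9a7b-4ec6-9909-a3a4ffcaec02", 1401.0)
third = ("00000000-0000-0000-0000-000000000001", 12.0)  # made-up instance

print(restarted_between(first, second))
print(restarted_between(second, third))
```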

http_check metric

The http_check metrics are produced by the HTTP Check receiver (httpcheckreceiver) of the OpenTelemetry Collector, a component that checks the health or availability of an HTTP endpoint.

  • Purpose: Monitors the health of an HTTP endpoint by sending requests and recording metrics like latency, status, and availability.

  • Typical Use Case: Monitoring microservices, APIs, or web endpoints to ensure they are responsive and returning expected HTTP status codes.

Documentation: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/receiver/httpcheckreceiver/README.md

For Congree, the status endpoint has this structure:
https://<host>/<web_service>/status.
This metric works only for services that provide a status (see table above).

http_check metric | Description | Example
httpcheck_duration_milliseconds | Time taken for the HTTP request, in milliseconds | httpcheck_duration_milliseconds{http_url="https://cltws005.int.congree.com/LinguisticServer/status"} 2
httpcheck_status | HTTP status code of the last request | httpcheck_status{http_method="GET",http_status_class="2xx",http_status_code="200",http_url="https://cltws005.int.congree.com/LinguisticServer/status"} 1

The tag http_url contains the address of the checked HTTP endpoint.

The receiver sends a request to the specified endpoint using the configured method. For each HTTP response status class, it generates a metric sample labeled with that class. The value is 1 if the returned status code falls into the class and 0 otherwise. For example, the following metrics are generated:

  • if the endpoint returned a 200:

httpcheck_status{http_method="GET",http_status_class="1xx",http_status_code="200",http_url="..."} 0
httpcheck_status{http_method="GET",http_status_class="2xx",http_status_code="200",http_url="..."} 1
httpcheck_status{http_method="GET",http_status_class="3xx",http_status_code="200",http_url="..."} 0
httpcheck_status{http_method="GET",http_status_class="4xx",http_status_code="200",http_url="..."} 0
httpcheck_status{http_method="GET",http_status_class="5xx",http_status_code="200",http_url="..."} 0

  • if the endpoint returned a 404:

httpcheck_status{http_method="GET",http_status_class="1xx",http_status_code="404",http_url="..."} 0
httpcheck_status{http_method="GET",http_status_class="2xx",http_status_code="404",http_url="..."} 0
httpcheck_status{http_method="GET",http_status_class="3xx",http_status_code="404",http_url="..."} 0
httpcheck_status{http_method="GET",http_status_class="4xx",http_status_code="404",http_url="..."} 1
httpcheck_status{http_method="GET",http_status_class="5xx",http_status_code="404",http_url="..."} 0
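Since exactly one status class carries the value 1, a consumer can reduce the five samples of one endpoint to a single answer. The sketch below assumes the samples have already been parsed into pairs of (tags, value). The function names matched_status_class and is_available are hypothetical, and treating only the class 2xx as available is an assumption that may not fit every setup.

```python
STATUS_CLASSES = ("1xx", "2xx", "3xx", "4xx", "5xx")


def matched_status_class(samples):
    """Return the http_status_class whose httpcheck_status value is 1, or None."""
    for tags, value in samples:
        if value == 1:
            return tags["http_status_class"]
    return None


def is_available(samples):
    """Assumption: an endpoint counts as available only for a 2xx response."""
    return matched_status_class(samples) == "2xx"


# Samples for an endpoint that answered with a 2xx and a 4xx status.
samples_ok = [
    ({"http_method": "GET", "http_status_class": cls}, 1 if cls == "2xx" else 0)
    for cls in STATUS_CLASSES
]
samples_not_found = [
    ({"http_method": "GET", "http_status_class": cls}, 1 if cls == "4xx" else 0)
    for cls in STATUS_CLASSES
]

print(matched_status_class(samples_ok), is_available(samples_ok))
print(matched_status_class(samples_not_found), is_available(samples_not_found))
```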

Health checks

Congree exposes the health status of its components through health check metrics. These metrics are based on health check probes implemented with the Microsoft health check libraries: https://learn.microsoft.com/en-us/aspnet/core/host-and-deploy/health-checks.

There are three health statuses: Healthy, Degraded, and Unhealthy. The health status is obtained from the metric <server name>_healthcheck_status.

Currently, the health status is only available for the linguistic server.

Status name | Corresponding metric value | Description
Unhealthy | 0 | Indicates that the health check determined that the component was unhealthy, or an unhandled exception was thrown while executing the health check.
Degraded | 0.5 | Indicates that the health check determined that the component was in a degraded state.
Healthy | 1 | Indicates that the health check determined that the component was healthy.

Example of health check metrics on the linguistic server:

congree_linguisticserver_healthcheck_status{culture="de-de",instance="640bb02b-4c31-451b-82b2-ab8138cc768b",job="CongreeLinguisticServer",name="Linguistic de-de",server_name="Default"} 1
congree_linguisticserver_healthcheck_status{culture="en-us",instance="640bb02b-4c31-451b-82b2-ab8138cc768b",job="CongreeLinguisticServer",name="Linguistic en-us",server_name="Default"} 0
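To show a readable status in a dashboard or alert, the numeric metric value can be mapped back to its status name. The following is a minimal sketch using the three values listed above; the names HEALTH_STATUS_BY_VALUE and health_status_name are hypothetical.

```python
# Metric values and status names as listed for <server name>_healthcheck_status.
HEALTH_STATUS_BY_VALUE = {0.0: "Unhealthy", 0.5: "Degraded", 1.0: "Healthy"}


def health_status_name(value):
    """Translate a healthcheck_status metric value into its status name."""
    try:
        return HEALTH_STATUS_BY_VALUE[float(value)]
    except KeyError:
        raise ValueError(f"unexpected health check value: {value!r}") from None


# Values per culture, taken from the linguistic server example.
values_by_culture = {"de-de": 1, "en-us": 0}
for culture, value in values_by_culture.items():
    print(culture, health_status_name(value))
```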

In addition, every service that implements the status endpoint also implements a health endpoint, for example https://<host>/LinguisticServer/health. Example response:

{
  "entries": {
    "Linguistic de-de": {
      "data": { "culture": "de-de", "server_name": "Default" },
      "description": null,
      "duration": "00:00:00.0000059",
      "exception": null,
      "status": "Healthy",
      "tags": []
    },
    "Linguistic en-us": {
      "data": { "culture": "en-us", "server_name": "Default" },
      "description": "Failed to load projects: An error occurred on Congree Linguistic Engine (Englisch) at localhost: Linguistic Engine is not available. Error: Connection refused by server.",
      "duration": "00:00:00.0000057",
      "exception": null,
      "status": "Unhealthy",
      "tags": []
    }
  },
  "status": "Unhealthy",
  "totalDuration": "00:00:00.0012756"
}
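The health response names every check and, for failing checks, carries a description of the problem. The sketch below extracts the entries that are not Healthy from a response of this shape. The function name unhealthy_entries is hypothetical, and the sample body is a shortened copy of the example response that keeps only the fields the code reads.

```python
import json


def unhealthy_entries(health):
    """Return {entry name: description} for every entry that is not Healthy."""
    return {
        name: entry.get("description")
        for name, entry in health["entries"].items()
        if entry["status"] != "Healthy"
    }


# Shortened copy of the example response.
body = """
{
  "entries": {
    "Linguistic de-de": {"description": null, "status": "Healthy"},
    "Linguistic en-us": {
      "description": "Failed to load projects: An error occurred on Congree Linguistic Engine (Englisch) at localhost: Linguistic Engine is not available. Error: Connection refused by server.",
      "status": "Unhealthy"
    }
  },
  "status": "Unhealthy"
}
"""
health = json.loads(body)
problems = unhealthy_entries(health)
print(health["status"])
for name, description in problems.items():
    print(name, "->", description)
```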

Measurements

Currently, the Linguistic server collects only request-related data. The Linguistic server and the Linguistic Agent service offer the following metrics:

Name | Unit | Description
congree_linguisticagent_attempts_retrieve_job_Count_total | job (count) | Total number of attempts to retrieve jobs. It is a counter.
congree_linguisticagent_job_processing_duration_ms_milliseconds_bucket | ms | Time taken to process jobs in milliseconds. It is a histogram.
congree_linguisticagent_job_processing_duration_ms_milliseconds_sum | ms | Total time in milliseconds spent in job processing
congree_linguisticagent_job_processing_duration_ms_milliseconds_count | count | Count of measurements of congree_linguisticagent_job_processing_duration_ms_milliseconds_bucket
congree_linguisticagent_jobs_received_total | job (count) | Total number of jobs received
congree_linguisticagent_jobs_processed_total | job (count) | Total number of jobs processed
congree_linguisticagent_jobs_failed_total | job (count) | Total number of jobs that failed
congree_linguisticserver_connectionpool_request_count | request (count) | Number of requests to the Linguistic Engine that are currently active
congree_linguisticserver_connectionpool_request_time_seconds_bucket | s | Request time to the Linguistic Engine in seconds. It is a histogram.
congree_linguisticserver_connectionpool_request_time_seconds_sum | s | Total time in seconds spent in requests to the Linguistic Engine
congree_linguisticserver_connectionpool_request_time_seconds_count | count | Count of measurements of congree_linguisticserver_connectionpool_request_time. In other words, the value is always the same as in congree_linguisticserver_connectionpool_requests_total.
congree_linguisticserver_connectionpool_requests_total | request (count) | Total number of requests to the Linguistic Engine
congree_linguisticserver_connectionpool_successful_requests_total | request (count) | Total number of successful requests to the Linguistic Engine
congree_linguisticserver_connectionpool_failed_requests_total | request (count) | Total number of failed requests to the Linguistic Engine
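The sum and count metrics of a histogram can be combined into an average, and the request counters into a failure ratio. The following is a minimal sketch of both calculations. The function names are hypothetical and the input numbers are invented for illustration; in practice they would come from congree_linguisticserver_connectionpool_request_time_seconds_sum, congree_linguisticserver_connectionpool_request_time_seconds_count, congree_linguisticserver_connectionpool_failed_requests_total and congree_linguisticserver_connectionpool_requests_total.

```python
def average_request_seconds(time_sum, time_count):
    """Average request time: histogram sum divided by histogram count."""
    return time_sum / time_count if time_count else None


def failure_ratio(failed_total, requests_total):
    """Share of requests to the Linguistic Engine that failed (0.0 to 1.0)."""
    return failed_total / requests_total if requests_total else None


# Made-up readings for one linguistic server instance.
time_sum = 12.5        # ..._request_time_seconds_sum
time_count = 50        # ..._request_time_seconds_count
failed_total = 2       # ..._failed_requests_total
requests_total = 50    # ..._requests_total

print(average_request_seconds(time_sum, time_count))
print(failure_ratio(failed_total, requests_total))
```

Both functions return None when no request has been recorded yet, so that a caller can tell "no data" apart from a genuine value of zero.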