8.2.1 OpenTelemetry Integration
Trial

OpenTelemetry is a framework and toolkit for creating and managing telemetry data such as traces, metrics, and logs. Starting with the 2024.02 release, Smile CDR supports instrumentation with the OpenTelemetry Java agent. When enabled, Smile CDR generates telemetry data that can be used to monitor its performance.

Currently, Smile CDR generates traces and metrics using the OpenTelemetry Java agent. This feature is in a trial phase and consists of auto-instrumentation with a small number of Smile CDR-specific customizations. Auto-instrumentation with the OpenTelemetry Java agent generates telemetry data for the libraries and frameworks that the agent has built-in support for, which include many of the libraries and frameworks used or supported by Smile CDR.

As this integration is still in trial, the details of the generated data are subject to change. These details include names (such as trace span and metric names), the trace structure, and the attributes exposed by trace spans and metrics. Feedback on this feature is welcome.

8.2.2 Observability Backends
Trial

To consume the OpenTelemetry data generated by Smile CDR, you need an observability backend. An observability backend collects and persists telemetry data and makes it available for monitoring (querying, visualizing, and alerting when the system is not performing as desired). There are many open source and commercial backends that support OpenTelemetry. Smile CDR does not recommend any particular observability backend because the choice depends on your needs, preferences, and deployment environment. Since OpenTelemetry is an open standard, you should be able to work with any backend that supports it.

Some example observability backends and environments that support OpenTelemetry are:

Jaeger is a backend for traces
Prometheus is a backend for metrics
OpenTelemetry Collector is a middleware between applications and backends
AWS OpenTelemetry
Azure OpenTelemetry

We also have a very basic otel-backend-starter project for learning purposes. This project provides a docker-compose setup to run Jaeger, Prometheus, and OpenTelemetry Collector locally.

8.2.3 Enabling OpenTelemetry Instrumentation in Smile CDR
Trial

To enable OpenTelemetry instrumentation in Smile CDR, you need to set an environment variable called CDR_OTEL when running Smile CDR.

For example,

CDR_OTEL=y bin/smilecdr start

The value of the CDR_OTEL variable has no significance; instrumentation is enabled as long as the variable is set. When this variable is set, the Smile CDR process is auto-instrumented with the OpenTelemetry Java agent. A version of the OpenTelemetry Java agent is bundled with the Smile CDR release, so there is no need to download it separately.

8.2.4 Agent Configuration
Trial

By default, the Java agent is configured using the following properties file:

otel.service.name=smilecdr
# Enable Smile Extension to Java Agent.
# The extension customizes auto-instrumentation with Smile specific attributes,
# such as adding smilecdr.moduleId to http server metrics.
otel.javaagent.extensions=./otel/bin/cdr-otel-agent-extension.jar
# Send OTEL logging to slf4j so that javaagent logs gets logged in smile.log
otel.javaagent.logging=application
# Capture X-Request-ID as a http span attribute
otel.instrumentation.http.server.capture-response-headers=X-Request-ID
# Use stable semantic conventions for http spans
otel.semconv-stability.opt-in=http
# Disable exporting logs by default, you can set environment variable OTEL_LOGS_EXPORTER to 'otlp' to enable it
otel.logs.exporter=none
# Logback appender related options below take effect only if exporting logs are enabled.
# Enable the capture of experimental log attributes 'thread.name' and 'thread.id'
otel.instrumentation.logback-appender.experimental-log-attributes=true
# add all mdc attributes to exported log record, these include 'requestId' and 'moduleId'
otel.instrumentation.logback-appender.experimental.capture-mdc-attributes=*
# File that contains rules for extracting metrics from JMX MBeans
otel.jmx.config=./otel/etc/jmx_rules.yaml

If you would like to use your own agent configuration file instead of this default configuration, you need to set the OTEL_JAVAAGENT_CONFIGURATION_FILE environment variable to specify the path to your agent configuration file. For example,

CDR_OTEL=y OTEL_JAVAAGENT_CONFIGURATION_FILE=<path_to_your_agent_config_file> bin/smilecdr start
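
As a minimal sketch, a custom configuration file could be created and wired in as follows. The file location (a temporary directory) and the service name smilecdr-staging are hypothetical. The remaining properties are copied from the default configuration shown above, on the assumption that a custom file replaces the defaults rather than being merged with them.

```shell
# Hypothetical location for the custom agent configuration file.
config_dir="$(mktemp -d)"
config_file="$config_dir/my-otel-agent.properties"

# Write the custom configuration. otel.service.name is changed (hypothetical
# value); the other lines are copied from the default file.
cat > "$config_file" <<'EOF'
otel.service.name=smilecdr-staging
otel.javaagent.extensions=./otel/bin/cdr-otel-agent-extension.jar
otel.javaagent.logging=application
otel.logs.exporter=none
otel.jmx.config=./otel/etc/jmx_rules.yaml
EOF

# Enable instrumentation and point the agent at the custom file.
export CDR_OTEL=y
export OTEL_JAVAAGENT_CONFIGURATION_FILE="$config_file"

# Smile CDR would then be started from its installation directory:
#   bin/smilecdr start
```

The relative paths in the copied properties are kept exactly as they appear in the default file, so this sketch assumes Smile CDR is started from the same working directory as before.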

Alternatively, you can override or set individual configuration options for the agent using other OpenTelemetry Java environment variables.
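
The OpenTelemetry Java agent derives the environment variable name for a property by converting the property name to upper case and replacing every dot and dash with an underscore. The helper below is a small sketch of that rule; the function name otel_env_name is hypothetical and is not part of Smile CDR or OpenTelemetry.

```shell
# Convert an OpenTelemetry Java agent property name to the name of the
# corresponding environment variable: upper-case it, then replace '.' and
# '-' with '_'.
otel_env_name() {
  printf '%s\n' "$1" | tr '[:lower:]' '[:upper:]' | tr '.-' '__'
}

# Print the environment variable names for two properties from the default file.
otel_env_name otel.service.name
otel_env_name otel.instrumentation.http.server.capture-response-headers
```

For example, otel.service.name maps to OTEL_SERVICE_NAME, so the service name can be overridden by setting that variable alongside CDR_OTEL when starting Smile CDR.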

8.2.5 Enabling/Disabling Exported Data
Trial

Starting with Smile CDR 2024.11, the OpenTelemetry Java agent bundled with Smile CDR exports logs by default. However, Smile CDR's Java agent configuration disables log export to preserve the agent's earlier behaviour, in which logs were not exported by default. Enabling log export by default could be a breaking change for users who adopted OpenTelemetry before the 2024.11 release and do not have a logging backend set up to consume the exported logs.

To enable exporting logs from Smile CDR directly, set the OTEL_LOGS_EXPORTER environment variable to otlp when running Smile CDR, in addition to setting CDR_OTEL.

For example,

CDR_OTEL=y OTEL_LOGS_EXPORTER=otlp bin/smilecdr start

To disable the trace or metric exporter, both of which are enabled by default, set the OTEL_TRACES_EXPORTER or OTEL_METRICS_EXPORTER environment variable to none, respectively.
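
For instance, metrics export could be switched off while traces remain enabled. This sketch only combines variables named in this section; it uses export rather than the inline form of the command above, which has the same effect for a process started from the same shell.

```shell
# Enable instrumentation, keep traces (on by default), turn metrics export off.
export CDR_OTEL=y
export OTEL_METRICS_EXPORTER=none

# Smile CDR would then be started from its installation directory:
#   bin/smilecdr start
```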

8.2.6 Correlating Logs and Traces
Trial

If exporting logs via the agent is enabled, the agent also exports the current trace id and span id as part of each log record. These ids are also available in the Smile CDR system logs: the current trace_id and span_id appear on system log lines with the T: and S: prefixes, respectively.
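
As a rough sketch of how these prefixes can be used, the helpers below pull the trace id and span id out of a single log line so that they can be looked up in a tracing backend. The sample log line is hypothetical; the actual layout depends on the logging configuration. The sketch assumes that each id directly follows its prefix and uses the standard OpenTelemetry hex encoding (32 characters for a trace id, 16 for a span id).

```shell
# Print the trace id that follows the "T:" prefix, if the line has one.
trace_id_from_line() {
  printf '%s\n' "$1" | sed -n 's/.*T:\([0-9a-f]\{32\}\).*/\1/p'
}

# Print the span id that follows the "S:" prefix, if the line has one.
span_id_from_line() {
  printf '%s\n' "$1" | sed -n 's/.*S:\([0-9a-f]\{16\}\).*/\1/p'
}

# Hypothetical log line, for illustration only.
line='2024-11-05 10:15:30.123 [T:4bf92f3577b34da6a3ce929d0e0e4736 S:00f067aa0ba902b7] INFO Example message'

trace_id_from_line "$line"
span_id_from_line "$line"
```

Both helpers print nothing when a line carries no matching id, so they can be applied to every line of a log file without extra filtering.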

8.2.7 Vendor specific OpenTelemetry tools and agents
Trial

Some cloud vendors, such as AWS and Azure, provide their own distributions of tools for OpenTelemetry. With such cloud vendors, there are two general approaches you can take:

The first approach is to use the OpenTelemetry Java agent bundled with Smile CDR, and to configure the OpenTelemetry Collector to convert the data to a vendor-specific format. If you follow this approach, you need to run Smile CDR with the CDR_OTEL environment variable set, as explained in Enabling OpenTelemetry Instrumentation in Smile CDR, so that Smile CDR is instrumented with the Java agent.

The second approach is to use the OpenTelemetry Java agent distribution provided by the vendor, if there is one. In this approach, you do not set the CDR_OTEL environment variable when running Smile CDR; instead, you set the JAVA_TOOL_OPTIONS environment variable to instrument the Smile CDR process.

Both of these approaches are explained in detail next for AWS and Azure.

8.2.7.1 AWS OpenTelemetry

AWS provides its own distribution of the OpenTelemetry Java agent and collector.

You can use the AWS Distro for OpenTelemetry Collector to export telemetry data in AWS formats that can be consumed by AWS CloudWatch and AWS X-Ray. For this to work, you run Smile CDR with the CDR_OTEL environment variable set and configure AWS Distro for OpenTelemetry Collector to export data in AWS formats. You can see some examples for AWS OpenTelemetry Collector Configurations in the AWS Observability repo.

You may also decide to use the AWS Distro for the Java agent instead of the Java agent bundled with Smile CDR. In that case, when running Smile CDR, do not set the CDR_OTEL environment variable; instead, set the JAVA_TOOL_OPTIONS environment variable to instrument Smile CDR. For example,

JAVA_TOOL_OPTIONS=-javaagent:<path-to-aws-otel-java-agent-jar> OTEL_SERVICE_NAME=smilecdr bin/smilecdr start

You would still need the AWS Distro for OpenTelemetry Collector to convert telemetry data to AWS-specific formats so that it can be consumed in AWS CloudWatch and AWS X-Ray.

8.2.7.2 Azure OpenTelemetry

Azure provides its own Application Insights OpenTelemetry Java agent, and there is a community-provided Azure Monitor Exporter for use with the OpenTelemetry Collector.

If you decide to use the Azure Monitor Exporter for the OpenTelemetry Collector, you need to configure your OpenTelemetry Collector to export to azuremonitor as explained in its readme, and run Smile CDR with the CDR_OTEL environment variable set.

Alternatively, you may decide to use the Application Insights Java agent directly, instead of the Java agent bundled with Smile CDR. In this approach, you do not use the OpenTelemetry Collector. When running Smile CDR, do not set the CDR_OTEL environment variable; instead, set the JAVA_TOOL_OPTIONS environment variable to instrument Smile CDR. You also need to configure the Application Insights agent according to the instructions provided by Azure.

For example,

JAVA_TOOL_OPTIONS=-javaagent:<path-to-azure-application-insights-agent-jar> bin/smilecdr start

while having an applicationinsights.json configuration file in the same directory as the Application Insights agent jar, with content similar to the following:

{
  "connectionString":"<your_connection_string>",
  "role": {
    "name": "smilecdr"
  }
}

8.2.8 Custom Telemetry Data Provided by Smile CDR
Trial

8.2.8.1 FHIR Endpoint HTTP Traces

The following additional attributes are added to the root span in a FHIR Endpoint HTTP trace:

smilecdr.fhir_endpoint.request_id: The request id. The request id is also available through the http.response.header.x-request-id attribute, because the default agent configuration instructs the agent to capture it from the response header. The difference between the two is that smilecdr.fhir_endpoint.request_id is a string-valued attribute, whereas http.response.header.x-request-id is an array-valued attribute. The string-valued version is added because it is easier to search for when using backends that do not yet support searching array-valued attributes.
smilecdr.fhir_endpoint.tenant_id: The id of the tenant. This attribute is present only if partitioning is enabled.
smilecdr.fhir_endpoint.username: The name of the user making the request. This attribute is not present if there is no user involved, for example when using the Client Credentials Authorization flow of OIDC, which is a system flow.
smilecdr.fhir_endpoint.oidc_client_id: The OIDC client id. This attribute is present only when using Smart Auth and the OIDC clients are managed by Smile CDR.
smilecdr.fhir_endpoint.restful_interaction_code: The interaction code for the request, e.g. read, vread, transaction.
smilecdr.fhir_endpoint.request.path.operation_name: The name of the operation, e.g. $everything or $meta. This attribute is present only if the request is a FHIR extended operation.
smilecdr.fhir_endpoint.request.path.resource_type: The resource type from the request path, e.g. for a request path Patient/1234, the value is Patient.
smilecdr.fhir_endpoint.request.path.logical_id: The logical id from the request path, e.g. for a request path Patient/1234, the value is 1234.
smilecdr.fhir_endpoint.response.resource_type: The type of the FHIR resource returned in the response. This attribute is present only if a successful request returns a FHIR resource.
smilecdr.fhir_endpoint.response.resource_logical_id: The logical id of the FHIR resource returned in the response. This attribute is present only if a successful request returns a FHIR resource that has an id. It is useful for requests that create a resource with a server-generated id (such as a POST request that creates a resource).

Note: These additional FHIR span attributes (except for http.response.header.x-request-id) are not available for unauthenticated requests (i.e. requests that result in a 401 HTTP status code), because they are currently added after a request is authenticated.

If you would like to implement an interceptor to add your own span attributes to the root span in a trace, see Accessing the Local Root Span from an Interceptor.

8.2.8.2 HTTP Server Thread Pool Metrics

The following HTTP thread pool metrics are available for modules using an HTTP endpoint:

smilecdr.http.server.thread.count: The current number of threads. The metric has smilecdr.http.server.thread.state as an attribute to identify the state of the thread, which can be busy or idle.
smilecdr.http.server.thread.max: The maximum number of threads in the pool.
smilecdr.http.server.thread.queue_size: The number of jobs in the queue waiting for a thread.

Each of these metrics has smilecdr.module_id as an attribute to identify the module emitting the metric.

8.2.8.3 Web Session Metrics

The following metric is available for modules that support web sessions:

smilecdr.session.count: The current number of sessions.

The metric has smilecdr.module_id as an attribute to identify the module emitting the metric.

8.2.8.4 HL7 v2.x Inbound Message Processing Traces and Metrics

For HL7 v2.x inbound messages ingested by Smile CDR through an HL7 v2.x endpoint, the following additional telemetry data is available.

8.2.8.4.1 Traces

A parent span named smilecdr.hl7v2.inbound_message.process is generated with the following additional span attributes that contain details of the message that is processed:

smilecdr.hl7v2.inbound_message.type (the type of the HL7 v2.x message, e.g. ADT_A01)
smilecdr.hl7v2.inbound_message.version (the version of the HL7 v2.x message, e.g. 2.5)
smilecdr.hl7v2.inbound_message.control_id (the control id of the HL7 v2.x message)

This parent span also captures any conversion issues that are added to the conversion result during HL7 v2.x to FHIR conversion. They are recorded as span events with the following attributes:

smilecdr.hl7v2.inbound_message.conversion_issue.level (the severity of the conversion issue)
smilecdr.hl7v2.inbound_message.conversion_issue.message (the description of the conversion issue)
smilecdr.hl7v2.inbound_message.conversion_issue.path (the location of the conversion issue)

8.2.8.4.2 Metrics

The following two metrics are generated for counting received and failed messages:

smilecdr.hl7v2.inbound_message.count (a counter that is incremented for each HL7 v2.x message received for processing)
smilecdr.hl7v2.inbound_message.error_count (a counter that is incremented for each HL7 v2.x message that fails to be processed)

Both of these metrics have smilecdr.hl7v2.inbound_message.type and smilecdr.hl7v2.inbound_message.version as attributes, so the counts are available per message type and version pair.

The following metric is generated for monitoring the duration of successfully processed HL7 v2.x messages as a histogram:

smilecdr.hl7v2.inbound_message.duration

The following metric is generated for counting the FHIR resources included in the FHIR transaction bundles generated by HL7 v2.x to FHIR conversions:

smilecdr.hl7v2.inbound_message.conversion_resources_count (the number of FHIR resources in the transaction bundles generated by HL7 v2.x to FHIR conversions)

Note that this metric does not reflect the number of resources actually persisted. It is published after the conversion but before the transaction is processed, and a resource in a transaction bundle may not be persisted if the entry is a conditional update/create or if the transaction fails. For actual persisted resource counts, see Storage Metrics. The metric counts the resources that would be created or updated (i.e. bundle entries with the "PATCH", "PUT", or "POST" HTTP verbs are included, whereas "DELETE" operations are ignored). Contained resources are not counted.

This metric has smilecdr.hl7v2.inbound_message.type, smilecdr.hl7v2.inbound_message.version, and smilecdr.hl7v2.inbound_message.conversion_resource_type as attributes, which allows counts to be broken down by message type, version, and FHIR resource type.

8.2.8.5 CDA Exchange+ Traces and Metrics

When importing CDA documents and exporting FHIR resources with the CDA Exchange+ module, the following telemetry data is available.

8.2.8.5.1 Traces

These are the parent spans that are used when processing CDA Exchange+ documents:

smilecdr.cdaexchange.cda_to_fhir.process: This span is used during import of CDA documents.
smilecdr.cdaexchange.fhir_to_cda.process: This is the span used during export of FHIR resources.

8.2.8.5.2 Metrics

During import of CDA documents the following metrics are generated:

smilecdr.cdaexchange.cda_to_fhir.document_count: This counter is incremented every time a CDA document is received.
smilecdr.cdaexchange.cda_to_fhir.document_error_count: This counter is incremented any time an error is encountered during import of a CDA document.
smilecdr.cdaexchange.cda_to_fhir.conversion_resource_count: This is the number of FHIR resources contained in the bundle generated by the CDA import process.
smilecdr.cdaexchange.cda_to_fhir.parse_success_count: This counter is incremented when a CDA document XML is parsed successfully.
smilecdr.cdaexchange.cda_to_fhir.parse_failure_count: This counter increments whenever an XML parse error occurs while processing a CDA document.

During export of FHIR resources the following metrics are generated:

smilecdr.cdaexchange.fhir_to_cda.document_count: This counter increments whenever a CDA document is exported.
smilecdr.cdaexchange.fhir_to_cda.conversion_resource_count: This is the number of FHIR resources that are contained in the bundle used to generate the CDA document.

8.2.8.6 Camel Route Traces

Traces for Camel Routes are generated without requiring any additional configuration. For Smile Component processors, the base URI of the processor is used as the span name. That is, each Camel processor span for the Smile Component is named using the format smile://[moduleId]/[processorName].

8.2.8.7 JavaScript Callback Spans

An OpenTelemetry span is generated for every JavaScript callback execution. These spans are named smilecdr.javascript_callback. The name of the JavaScript callback function is available as a span attribute named smilecdr.javascript_callback.function_name.

8.2.8.8 Interceptor Method Spans

An OpenTelemetry span is generated for every interceptor method execution. These spans are named hapifhir.interceptor. The name of the pointcut, the interceptor class name, and the interceptor method name are available as the following span attributes:

hapifhir.interceptor.pointcut_name
hapifhir.interceptor.class_name
hapifhir.interceptor.method_name

8.2.8.8.1 Accessing the Local Root Span from an Interceptor

If you would like to author an interceptor that updates the local root span in a trace, you can use LocalRootSpan.current() from the opentelemetry-instrumentation-api library to access the local root span. Using Span.current() will not work, because it returns the span that is created for the interceptor method invocation.

8.2.8.9 Batch Job Spans

When processing a batch job, spans named hapifhir.batch_job.execute are generated by the worker threads. These spans have the following span attributes related to the batch job:

hapifhir.batch_job.definition_id: The name of the job, such as BULK_EXPORT or REINDEX.
hapifhir.batch_job.definition_version: The job definition version.
hapifhir.batch_job.instance_id: The job id.
hapifhir.batch_job.step_id: The name of the step being executed.
hapifhir.batch_job.chunk_id: The id of the work chunk being processed. This is not applicable to reduction steps.

8.2.8.10 Storage Metrics

8.2.8.10.1 Duration

The following histogram metric is generated for the duration of storage operations:

smilecdr.storage.duration

The metric measures the overall invocation count and the latency of system-level and resource-level FHIR interactions. The metric has smilecdr.module_id as an attribute that identifies the storage module the measurements are for.

8.2.8.10.2 Resource Creations and Updates

The following metrics are generated for resource creations and updates:

smilecdr.storage.created_resources_count
smilecdr.storage.updated_resources_count

Both metrics have smilecdr.storage.resource_type as an attribute so that the counts are available per FHIR resource type.

8.2.8.10.3 FHIR Searches

The number of current FHIR searches can be obtained from the following metric:

smilecdr.storage.fhir_searches_active.count

It includes both the currently running searches, and the completed searches with results currently available in the database cache. The metric has smilecdr.module_id as an attribute to identify the persistence module emitting the metric.

8.2.8.11 ETL Import Metrics

8.2.8.11.1 CSV Row Import Duration

The following histogram metric captures the duration for each successfully processed entry in a CSV file processed as part of an Extract, Transform and Load (ETL) operation:

smilecdr.etl_import.csv_row_import.duration

The metric can be used to monitor the total number of CSV entries successfully processed along with mapping execution time as a histogram. The metric provides the module id of the ETL import module as an attribute named smilecdr.module_id.

8.2.8.12 Authentication Metrics

The following OpenTelemetry metrics capture the number of successful and failed authentication attempts:

smilecdr.authentication.success_count
smilecdr.authentication.failure_count

The metrics have smilecdr.module_id as an attribute to identify the security module emitting the metric. Note that this is the id of the security module that handled the authentication (such as an inbound security module), rather than the module that the request requiring authentication was submitted to (such as a FHIR endpoint module).
