OpenTelemetry is a framework and toolkit designed to create and manage telemetry data such as traces, metrics, and logs. Starting with the 2024.02 release, Smile CDR supports instrumentation with the OpenTelemetry agent, so that Smile CDR generates telemetry data that can be used to monitor its performance.
Smile CDR currently generates traces and metrics using the OpenTelemetry Java agent. This feature is in a trial phase and is essentially auto-instrumentation with a few Smile CDR-specific customizations. Auto-instrumentation with the OpenTelemetry Java agent generates telemetry data for the libraries and frameworks that the agent supports out of the box, which include many of the libraries and frameworks used or supported by Smile CDR.
Because this integration is still in trial, the details of the generated data are subject to change. These details include names (such as span and metric names), the trace structure, and the attributes exposed by spans and metrics. Feedback on this feature is welcome.
To consume the OpenTelemetry data generated by Smile CDR, you need an observability backend. An observability backend collects and persists telemetry data and makes it available for monitoring (querying, visualizing, and alerting when the system is not performing as desired). Many open source and commercial backends support OpenTelemetry. Smile CDR does not recommend any particular observability backend, because the choice depends on your needs, preferences, and deployment environment. Since OpenTelemetry is an open standard, you should be able to work with any backend that supports OpenTelemetry.
Some example observability backends and environments that support OpenTelemetry are:
We also have a very basic otel-backend-starter project for learning purposes. This project provides a docker-compose setup for running Jaeger, Prometheus, and the OpenTelemetry Collector locally.
To enable OpenTelemetry instrumentation in Smile CDR, set an environment variable called CDR_OTEL when running Smile CDR. For example:
CDR_OTEL=y bin/smilecdr start
The value of the CDR_OTEL variable has no significance; instrumentation is enabled as long as the variable is set.
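If you launch Smile CDR from a Python wrapper script, the following minimal sketch sets CDR_OTEL on a copy of the current environment before running bin/smilecdr start. The helper names otel_env and start_smilecdr are hypothetical; only CDR_OTEL and the bin/smilecdr start command come from this page.

```python
import os
import subprocess


def otel_env(base=None, **overrides):
    """Return a copy of the environment with CDR_OTEL set.

    The value of CDR_OTEL is irrelevant; its presence enables instrumentation.
    Additional OTEL_* variables can be passed as keyword overrides.
    The input mapping (or os.environ) is never modified.
    """
    env = dict(os.environ if base is None else base)
    env["CDR_OTEL"] = "y"
    env.update(overrides)
    return env


def start_smilecdr(smilecdr_home, **overrides):
    """Run `bin/smilecdr start` from the installation directory (hypothetical helper)."""
    return subprocess.run(
        ["bin/smilecdr", "start"],  # resolved relative to cwd on POSIX
        cwd=smilecdr_home,
        env=otel_env(**overrides),
        check=True,
    )
```

For example, start_smilecdr("/opt/smilecdr", OTEL_LOGS_EXPORTER="otlp") would start an installation at the placeholder path /opt/smilecdr with log exporting also enabled.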
When this variable is set, the Smile CDR process is auto-instrumented with the OpenTelemetry Java agent. A version of the OpenTelemetry Java agent is bundled with the Smile CDR release, so there is no need to download it separately.
By default, the Java agent is configured using the following properties file:
otel.service.name=smilecdr
# Enable Smile Extension to Java Agent.
# The extension customizes auto-instrumentation with Smile specific attributes,
# such as adding smilecdr.moduleId to http server metrics.
otel.javaagent.extensions=./otel/bin/cdr-otel-agent-extension.jar
# Send OTEL logging to slf4j so that javaagent logs get logged in smile.log
otel.javaagent.logging=application
# Capture X-Request-ID as an HTTP span attribute
otel.instrumentation.http.server.capture-response-headers=X-Request-ID
# Use stable semantic conventions for http spans
otel.semconv-stability.opt-in=http
# Disable exporting logs by default; set the environment variable OTEL_LOGS_EXPORTER to 'otlp' to enable it
otel.logs.exporter=none
# The Logback appender options below take effect only if log exporting is enabled.
# Enable the capture of experimental log attributes 'thread.name' and 'thread.id'
otel.instrumentation.logback-appender.experimental-log-attributes=true
# Add all MDC attributes to exported log records; these include 'requestId' and 'moduleId'
otel.instrumentation.logback-appender.experimental.capture-mdc-attributes=*
# File that contains rules for extracting metrics from JMX MBeans
otel.jmx.config=./otel/etc/jmx_rules.yaml
If you would like to use your own agent configuration file instead of this default configuration, set the OTEL_JAVAAGENT_CONFIGURATION_FILE environment variable to the path of your agent configuration file. For example:
CDR_OTEL=y OTEL_JAVAAGENT_CONFIGURATION_FILE=<path_to_your_agent_config_file> bin/smilecdr start
Alternatively, you can override or set individual configuration options for the agent using other OpenTelemetry Java environment variables.
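The following sketch illustrates how an environment variable overrides a value from the agent configuration file. It assumes the OpenTelemetry Java agent's naming convention, in which a property such as otel.logs.exporter corresponds to the environment variable OTEL_LOGS_EXPORTER (uppercase, with dots and hyphens replaced by underscores), and that environment variables take precedence over the configuration file. It only models simple key=value lines and is not the agent's actual parser.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def env_var_name(prop):
    """Map an agent property name to its environment-variable form."""
    return prop.upper().replace(".", "_").replace("-", "_")


def effective_config(file_props, env):
    """Overlay matching environment variables on top of file properties.

    Only properties already present in the file are considered here; the real
    agent also accepts settings that are absent from the file.
    """
    result = dict(file_props)
    for prop in file_props:
        name = env_var_name(prop)
        if name in env:
            result[prop] = env[name]
    return result
```

Applied to the default file above with OTEL_LOGS_EXPORTER=otlp in the environment, otel.logs.exporter resolves to otlp while every other setting keeps its file value.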
By default, exporting logs is disabled, whereas exporting traces and metrics is enabled.
To enable exporting logs directly from Smile CDR, set the OTEL_LOGS_EXPORTER environment variable to otlp when running Smile CDR, in addition to setting CDR_OTEL. For example:
CDR_OTEL=y OTEL_LOGS_EXPORTER=otlp bin/smilecdr start
To disable the trace or metric exporter, set the OTEL_TRACES_EXPORTER or OTEL_METRICS_EXPORTER environment variable, respectively, to none.
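The exporter defaults and overrides described above can be captured in a small helper that emits only the variables you need. This is a hypothetical sketch: exporter_env and start_command are illustrative names, and the rendered command performs no shell quoting, which is adequate only for simple values like these.

```python
def exporter_env(logs=False, traces=True, metrics=True):
    """Return only the OTEL_*_EXPORTER variables that differ from the defaults.

    Defaults described on this page: logs off, traces and metrics on.
    """
    env = {}
    if logs:
        env["OTEL_LOGS_EXPORTER"] = "otlp"
    if not traces:
        env["OTEL_TRACES_EXPORTER"] = "none"
    if not metrics:
        env["OTEL_METRICS_EXPORTER"] = "none"
    return env


def start_command(extra_env):
    """Render the start command used on this page, with CDR_OTEL always set.

    Values are not shell-quoted; this is fine for simple values such as otlp/none.
    """
    assignments = ["CDR_OTEL=y"] + [f"{name}={value}" for name, value in extra_env.items()]
    return " ".join(assignments + ["bin/smilecdr", "start"])
```

For example, start_command(exporter_env(logs=True, metrics=False)) returns "CDR_OTEL=y OTEL_LOGS_EXPORTER=otlp OTEL_METRICS_EXPORTER=none bin/smilecdr start".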
If exporting logs via the agent is enabled, the agent also exports the current trace ID and span ID as part of each log record. These IDs are also available in the Smile CDR system logs: the current trace_id and span_id appear on system log lines with the T: and S: prefixes, respectively.
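To correlate a system log line with a trace in your backend, the IDs can be extracted by their prefixes. This sketch assumes the IDs appear as tokens of the form T:<trace_id> and S:<span_id> using the W3C Trace Context lengths (32 and 16 lowercase hex characters); the exact smile.log layout is not specified here, so check the regular expressions against your own log lines.

```python
import re

# Assumed token layout: "T:<32 hex chars>" and "S:<16 hex chars>" on the line.
_TRACE = re.compile(r"\bT:([0-9a-f]{32})\b")
_SPAN = re.compile(r"\bS:([0-9a-f]{16})\b")


def trace_context(log_line):
    """Return (trace_id, span_id) from a system log line; None for a missing ID."""
    trace = _TRACE.search(log_line)
    span = _SPAN.search(log_line)
    return (
        trace.group(1) if trace else None,
        span.group(1) if span else None,
    )
```

The returned trace_id can then be pasted into your backend's trace search to find the full trace for that log line.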
Some cloud vendors, such as AWS and Azure, provide their own distributions of OpenTelemetry tools. With such cloud vendors, there are two general approaches you can take:
The first approach is to use the OpenTelemetry Java agent bundled with Smile CDR, and configure the OpenTelemetry Collector to convert the data to a vendor-specific format. If you follow this approach, run Smile CDR with the CDR_OTEL environment variable set, as explained in the previous section, so that Smile CDR is instrumented with the Java agent.
The second approach is to use the vendor's distribution of the OpenTelemetry Java agent, if there is one. In this approach, you do not set the CDR_OTEL environment variable when running Smile CDR; instead, you set the JAVA_TOOL_OPTIONS environment variable to instrument the Smile CDR process.
Both of these approaches are explained in detail next for AWS and Azure.
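In environment terms, the second approach swaps one variable for another. The sketch below, assuming a Python launcher and a hypothetical agent jar path, removes CDR_OTEL and appends a -javaagent flag to any existing JAVA_TOOL_OPTIONS instead of overwriting it, so JVM options you already pass through that variable are kept. The helper name vendor_agent_env is illustrative.

```python
import os


def vendor_agent_env(agent_jar, base=None, **extra):
    """Build an environment that instruments Smile CDR with a vendor's Java agent.

    CDR_OTEL is removed because this approach requires it to be unset, and
    -javaagent is appended to any JAVA_TOOL_OPTIONS already present.
    """
    env = dict(os.environ if base is None else base)
    env.pop("CDR_OTEL", None)
    flag = f"-javaagent:{agent_jar}"
    existing = env.get("JAVA_TOOL_OPTIONS", "").strip()
    env["JAVA_TOOL_OPTIONS"] = f"{existing} {flag}".strip()
    env.update(extra)
    return env
```

For example, vendor_agent_env("/agents/aws-opentelemetry-agent.jar", OTEL_SERVICE_NAME="smilecdr") mirrors the AWS command line below, with the jar path as a placeholder.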
AWS provides its own distribution of the OpenTelemetry Java agent and collector.
You can use the AWS Distro for OpenTelemetry Collector to export telemetry data in AWS formats that can be consumed by AWS CloudWatch and AWS X-Ray.
For this to work, run Smile CDR with the CDR_OTEL environment variable set, and configure the AWS Distro for OpenTelemetry Collector to export data in AWS formats. Examples of AWS OpenTelemetry Collector configurations are available in the AWS Observability repo.
Alternatively, you may use the AWS Distro for OpenTelemetry Java agent instead of the Java agent bundled with Smile CDR. In this case, when running Smile CDR, do not set the CDR_OTEL environment variable; instead, set the JAVA_TOOL_OPTIONS environment variable to instrument Smile CDR. For example:
JAVA_TOOL_OPTIONS=-javaagent:<path-to-aws-otel-java-agent-jar> OTEL_SERVICE_NAME=smilecdr bin/smilecdr start
You still need the AWS Distro for OpenTelemetry Collector to convert telemetry data to AWS-specific formats for consumption in AWS CloudWatch and AWS X-Ray.
Azure provides its own Application Insights OpenTelemetry Java agent, and there is a community-provided Azure Monitor Exporter for use with the OpenTelemetry Collector.
If you decide to use the Azure Monitor Exporter for the OpenTelemetry Collector, configure your OpenTelemetry Collector to export to azuremonitor as explained in its readme, and run Smile CDR with the CDR_OTEL environment variable set.
Alternatively, you may use the Application Insights Java agent directly, instead of the Java agent bundled with Smile CDR. In this approach you do not use the OpenTelemetry Collector; when running Smile CDR, do not set the CDR_OTEL environment variable, but instead set the JAVA_TOOL_OPTIONS environment variable to instrument Smile CDR.
You also need to configure the Application Insights agent according to the instructions provided by Azure.
For example:
JAVA_TOOL_OPTIONS=-javaagent:<path-to-azure-application-insights-agent-jar> bin/smilecdr start
with an applicationinsights.json configuration file in the same directory as the Application Insights agent jar, with content similar to the following:
{
  "connectionString": "<your_connection_string>",
  "role": {
    "name": "smilecdr"
  }
}
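If you provision hosts with scripts, the configuration file can be generated next to the agent jar. This is a minimal sketch: write_app_insights_config is a hypothetical helper, and it writes only the two settings shown above. Any further Application Insights options should come from Azure's own documentation.

```python
import json
from pathlib import Path


def write_app_insights_config(agent_jar, connection_string, role_name="smilecdr"):
    """Write applicationinsights.json in the same directory as the agent jar."""
    config = {
        "connectionString": connection_string,
        "role": {"name": role_name},
    }
    target = Path(agent_jar).parent / "applicationinsights.json"
    target.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return target
```

The connection string is a secret-like value; in practice you would read it from your deployment's secret store rather than hard-coding it.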
The following additional attributes are added to the root span in a FHIR Endpoint HTTP trace:
smilecdr.fhir_endpoint.request_id: The request ID. The request ID is also available through the http.response.header.x-request-id attribute, because the default agent configuration instructs the agent to capture it from the response header. The difference between the two is that smilecdr.fhir_endpoint.request_id is a string-valued attribute, whereas http.response.header.x-request-id is an array-valued attribute. The string-valued version is added because it is easier to search for in backends that do not yet support searching array-valued attributes.
smilecdr.fhir_endpoint.tenant_id: Present if partitioning is enabled; indicates the ID of the tenant.
smilecdr.fhir_endpoint.username: The name of the user making the request. This attribute is not present if no user is involved, for example when using the Client Credentials authorization flow of OIDC, which is a system flow.
smilecdr.fhir_endpoint.oidc_client_id: The OIDC client ID. Present only when using Smart Auth and OIDC clients are managed by Smile CDR.
smilecdr.fhir_endpoint.restful_interaction_code: The interaction code for the request, e.g. read, vread, transaction, etc.
smilecdr.fhir_endpoint.request.path.operation_name: Present if the request is a FHIR extended operation; the name of the operation, e.g. $everything, $meta, etc.
smilecdr.fhir_endpoint.request.path.resource_type: The resource type from the request path; e.g. for the request path Patient/1234, the value is Patient.
smilecdr.fhir_endpoint.request.path.logical_id: The logical ID from the request path; e.g. for the request path Patient/1234, the value is 1234.
smilecdr.fhir_endpoint.response.resource_type: Present if a successful request returns a FHIR resource as a response; the type of that resource.
smilecdr.fhir_endpoint.response.resource_logical_id: Present if a successful request returns a FHIR resource that has an ID; the logical ID of that resource. This is useful for requests that create a resource with a server-generated ID (such as a POST request that creates a resource).
Note: These additional FHIR span attributes (except for http.response.header.x-request-id) are not available for unauthenticated requests (i.e. requests that result in a 401 HTTP status code), because they are currently added after a request is authenticated.
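To illustrate why the string-valued attribute is easier to search, the sketch below filters span attribute maps by request ID. It assumes spans are available as plain dictionaries of attribute name to value (for example, after exporting them to JSON), which is a simplification; real backends have their own query languages. The function name is hypothetical.

```python
def find_by_request_id(spans, request_id):
    """Return spans whose attributes identify the given FHIR request.

    smilecdr.fhir_endpoint.request_id is a plain string, so equality suffices.
    http.response.header.x-request-id is a list of header values, so matching
    it needs a membership test on an array.
    """
    matches = []
    for attrs in spans:
        if attrs.get("smilecdr.fhir_endpoint.request_id") == request_id:
            matches.append(attrs)
        elif request_id in attrs.get("http.response.header.x-request-id", []):
            matches.append(attrs)
    return matches
```

The fallback to the array-valued header attribute also covers unauthenticated (401) requests, which, per the note above, carry only that attribute.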
If you would like to implement an interceptor to add your own span attributes to the root span in a trace, see accessing the local root span from an interceptor.
The following HTTP thread pool metrics are available for modules using an HTTP endpoint:
smilecdr.http.server.thread.count: The current number of threads. The metric has smilecdr.http.server.thread.state as an attribute identifying the state of the thread, which can be busy or idle.
smilecdr.http.server.thread.max: The maximum number of threads in the pool.
smilecdr.http.server.thread.queue_size: The number of jobs in the queue waiting for a thread.
Each of these metrics has smilecdr.module_id as an attribute identifying the module emitting the metric.
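These three metrics are most useful read together. The following sketch summarizes one module's pool from a single snapshot; pool_status is a hypothetical helper, and treating "queued jobs while all threads are busy" as saturation is an example alerting rule, not a Smile CDR recommendation.

```python
def pool_status(busy, idle, max_threads, queue_size):
    """Summarize one module's HTTP thread pool from a single metrics snapshot.

    busy/idle:   smilecdr.http.server.thread.count split by thread.state
    max_threads: smilecdr.http.server.thread.max
    queue_size:  smilecdr.http.server.thread.queue_size
    """
    utilization = busy / max_threads if max_threads else 0.0
    return {
        "threads": busy + idle,
        "utilization": utilization,
        # Jobs waiting while every allowed thread is busy means requests are queuing.
        "saturated": queue_size > 0 and busy >= max_threads,
    }
```

Evaluate it per smilecdr.module_id value, since each module has its own pool.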
The following metric is available for modules that support web sessions:
smilecdr.session.count: The current number of sessions.
The metric has smilecdr.module_id as an attribute identifying the module emitting the metric.
For HL7 v2.x inbound messages ingested by Smile CDR through an HL7 v2.x endpoint, the following additional telemetry data is available.
A parent span named smilecdr.hl7v2.inbound_message.process is generated with the following additional span attributes containing details of the processed message:
smilecdr.hl7v2.inbound_message.type (the type of the HL7 v2.x message, e.g. ADT_A01)
smilecdr.hl7v2.inbound_message.version (the version of the HL7 v2.x message, e.g. 2.5)
smilecdr.hl7v2.inbound_message.control_id (the control ID of the HL7 v2.x message)
This parent span also captures any conversion issues that are added to the conversion result during HL7 v2.x to FHIR conversion, as span events with the following attributes:
smilecdr.hl7v2.inbound_message.conversion_issue.level (the severity of the conversion issue)
smilecdr.hl7v2.inbound_message.conversion_issue.message (the description of the conversion issue)
smilecdr.hl7v2.inbound_message.conversion_issue.path (the location of the conversion issue)
The following two metrics are generated for counting received and failed messages:
smilecdr.hl7v2.inbound_message.count (a counter that is incremented for each HL7 v2.x message received for processing)
smilecdr.hl7v2.inbound_message.error_count (a counter that is incremented for each HL7 v2.x message that fails to be processed)
Both of these metrics have smilecdr.hl7v2.inbound_message.type and smilecdr.hl7v2.inbound_message.version as attributes, so the counts are available per message type and version pair.
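Because both counters share the type and version attributes, a failure ratio per pair follows directly. This sketch assumes you have already taken each counter's increase over the same time window (the counters themselves are cumulative) as (type, version, count) tuples; error_rates is a hypothetical helper.

```python
from collections import defaultdict


def error_rates(received, failed):
    """Compute the failure ratio per (message type, version) pair.

    received: increases of smilecdr.hl7v2.inbound_message.count
    failed:   increases of smilecdr.hl7v2.inbound_message.error_count
    Both are iterables of (type, version, count) over the same window.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for msg_type, version, count in received:
        totals[(msg_type, version)] += count
    for msg_type, version, count in failed:
        errors[(msg_type, version)] += count
    return {
        key: errors[key] / total
        for key, total in totals.items()
        if total > 0
    }
```

Pairs that received no messages in the window are omitted rather than reported as a division by zero.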
The following metric is generated, as a histogram, for monitoring the duration of successfully processed HL7 v2.x messages:
smilecdr.hl7v2.inbound_message.duration
The following metric is generated for counting the FHIR resources included in the FHIR transaction bundles generated by HL7 v2.x to FHIR conversions:
smilecdr.hl7v2.inbound_message.conversion_resources_count (the number of FHIR resources in the transaction bundles generated by HL7 v2.x to FHIR conversions)
Note that this metric does not reflect the number of resources actually persisted. It is published after the conversion but before the transaction is processed. A resource in a transaction bundle may not be persisted if it is a conditional create or update, or if the transaction fails. For actual persisted resource counts, see Storage Metrics. The metric counts the resources that would be created or updated (i.e. bundle entries with the "PATCH", "PUT", or "POST" HTTP verbs are included, whereas "DELETE" operations are ignored). Contained resources are not counted.
This metric has smilecdr.hl7v2.inbound_message.type, smilecdr.hl7v2.inbound_message.version, and smilecdr.hl7v2.inbound_message.conversion_resource_type as attributes, which make counts available per message type, version, and FHIR resource type.
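The counting rule described above can be sketched as follows, assuming the generated transaction bundle is available as parsed FHIR JSON. This mirrors the documented rule and is not Smile CDR's implementation; in particular, taking the resource type from each entry's request.url (so that PATCH entries, whose body is not the target resource, are still attributed to the right type) is an assumption of this sketch.

```python
import re
from collections import Counter

COUNTED_METHODS = {"POST", "PUT", "PATCH"}


def _target_type(entry):
    """Resource type from request.url, e.g. 'Patient/1' or 'Patient?identifier=x'."""
    url = entry.get("request", {}).get("url", "")
    return re.split(r"[/?]", url, maxsplit=1)[0]


def conversion_resource_counts(bundle):
    """Count per FHIR type the entries a transaction bundle would create or update.

    DELETE entries are skipped. Contained resources are never counted because
    only top-level bundle entries are inspected.
    """
    counts = Counter()
    for entry in bundle.get("entry", []):
        method = entry.get("request", {}).get("method", "").upper()
        resource_type = _target_type(entry)
        if method in COUNTED_METHODS and resource_type:
            counts[resource_type] += 1
    return counts
```

As the note above explains, these counts can exceed what is eventually persisted, for example when a conditional create matches an existing resource.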
When importing CDA documents and exporting FHIR resources with the CDA Exchange+ module, the following telemetry data is available.
These are the parent spans used when processing CDA Exchange+ documents:
smilecdr.cdaexchange.cda_to_fhir.process: The span used during import of CDA documents.
smilecdr.cdaexchange.fhir_to_cda.process: The span used during export of FHIR resources.
During import of CDA documents, the following metrics are generated:
smilecdr.cdaexchange.cda_to_fhir.document_count: This counter is incremented every time a CDA document is received.
smilecdr.cdaexchange.cda_to_fhir.document_error_count: This counter is incremented any time an error is encountered during import of a CDA document.
smilecdr.cdaexchange.cda_to_fhir.conversion_resource_count: The number of FHIR resources contained in the bundle generated by the CDA import process.
smilecdr.cdaexchange.cda_to_fhir.parse_success_count: This counter is incremented when a CDA document's XML is parsed successfully.
smilecdr.cdaexchange.cda_to_fhir.parse_failure_count: This counter is incremented whenever an XML parse error occurs while processing a CDA document.
During export of FHIR resources, the following metrics are generated:
smilecdr.cdaexchange.fhir_to_cda.document_count: This counter is incremented whenever a CDA document is exported.
smilecdr.cdaexchange.fhir_to_cda.conversion_resource_count: The number of FHIR resources contained in the bundle used to generate the CDA document.
Traces for Camel Routes are generated without requiring any additional configuration. For Smile Component processors, the base URI of the processor is used as the span name. That is, each Camel processor span for the Smile Component is named in the format smile://[moduleId]/[processorName].
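Because the span name encodes both the module and the processor, it can be split back into its parts when grouping spans by module. This sketch relies only on the documented smile://[moduleId]/[processorName] format; the function name and the sample module and processor names are hypothetical.

```python
from urllib.parse import urlsplit


def parse_smile_processor_span(span_name):
    """Split a Smile Component processor span name into (module_id, processor_name).

    Expects the form smile://[moduleId]/[processorName]; returns None for span
    names from other Camel components or names missing either part.
    """
    parts = urlsplit(span_name)
    if parts.scheme != "smile" or not parts.netloc:
        return None
    processor = parts.path.lstrip("/")
    return (parts.netloc, processor) if processor else None
```

For example, parse_smile_processor_span("smile://fhir_endpoint/fhirRequest") returns ("fhir_endpoint", "fhirRequest").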
An OpenTelemetry span is generated for every JavaScript callback execution. Such spans are named smilecdr.javascript_callback. The name of the JavaScript callback function is available as a span attribute named smilecdr.javascript_callback.function_name.
An OpenTelemetry span is generated for every interceptor method execution. Such spans are named hapifhir.interceptor. The name of the pointcut, the interceptor class name, and the interceptor method name are available as the following span attributes:
hapifhir.interceptor.pointcut_name
hapifhir.interceptor.class_name
hapifhir.interceptor.method_name
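These three attributes make it possible to see which interceptor methods account for the most time. The sketch below assumes spans have already been fetched from your backend as (attributes, duration_ms) pairs, a hypothetical representation; the class names in any sample data are invented.

```python
from collections import defaultdict


def slowest_interceptors(spans, top=3):
    """Rank interceptor methods by total time spent in hapifhir.interceptor spans.

    spans: iterable of (attributes, duration_ms) pairs, where attributes holds
    the hapifhir.interceptor.* span attributes.
    Returns [((pointcut, class, method), total_ms), ...], largest first.
    """
    totals = defaultdict(float)
    for attrs, duration_ms in spans:
        key = (
            attrs["hapifhir.interceptor.pointcut_name"],
            attrs["hapifhir.interceptor.class_name"],
            attrs["hapifhir.interceptor.method_name"],
        )
        totals[key] += duration_ms
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:top]
```

Ranking by total rather than average time highlights cheap interceptors that run on very frequent pointcuts as well as expensive rare ones.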
If you would like to author an interceptor that updates the local root span in a trace, use LocalRootSpan.current() from the opentelemetry-instrumentation-api library to access the local root span. Using Span.current() will not work, because it returns the span created for the interceptor method invocation.
When processing a batch job, spans named hapifhir.batch_job.execute are generated by the worker threads.
These spans have the following span attributes related to the batch job:
hapifhir.batch_job.definition_id: The name of the job, such as BULK_EXPORT or REINDEX.
hapifhir.batch_job.definition_version: The job definition version.
hapifhir.batch_job.instance_id: The job ID.
hapifhir.batch_job.step_id: The name of the step being executed.
hapifhir.batch_job.chunk_id: The ID of the work chunk being processed. This is not applicable to reduction steps.
The following histogram metric is generated for the duration of storage operations:
smilecdr.storage.duration
The metric measures the overall invocation count and the latency of system-level and resource-level FHIR interactions. The metric has smilecdr.module_id as an attribute identifying the storage module the measurements are for.
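Backends typically derive latency percentiles from histograms such as smilecdr.storage.duration. The sketch below shows the common linear-interpolation estimate over explicit bucket boundaries (the same idea as Prometheus's histogram_quantile), assuming each bucket covers (previous bound, bound], the last bucket is unbounded, and durations are non-negative. The boundaries your agent actually emits depend on its configuration.

```python
def estimate_quantile(q, bounds, counts):
    """Estimate quantile q (0..1) from an explicit-bucket histogram.

    bounds: ascending, non-empty upper boundaries, e.g. [5, 10, 25] (ms)
    counts: per-bucket counts, len(bounds) + 1 (the last bucket is unbounded)
    Assumes observations are spread evenly inside each bucket.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    total = sum(counts)
    if total == 0:
        return None
    rank = q * total
    cumulative = 0
    for i, count in enumerate(counts):
        if count and cumulative + count >= rank:
            if i == len(bounds):
                # Rank falls in the unbounded bucket: report its lower edge.
                return bounds[-1]
            lower = bounds[i - 1] if i > 0 else 0.0
            upper = bounds[i]
            return lower + (upper - lower) * (rank - cumulative) / count
        cumulative += count
    return bounds[-1]
```

For example, with bounds [5, 10, 25, 50] and counts [10, 30, 40, 20, 0], the estimated median is 13.75: the 50th observation falls 10 of 40 observations into the (10, 25] bucket.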
The following metrics are generated for resource creations and updates:
smilecdr.storage.created_resources_count
smilecdr.storage.updated_resources_count
Both metrics have smilecdr.storage.resource_type as an attribute, so the counts are available per FHIR resource type.
The number of current FHIR searches can be obtained from the following metric:
smilecdr.storage.fhir_searches_active.count
It includes both currently running searches and completed searches whose results are currently available in the database cache. The metric has smilecdr.module_id as an attribute identifying the persistence module emitting the metric.
The following histogram metric captures the duration of each successfully processed entry in a CSV file imported as part of an Extract Transform and Load (ETL) operation:
smilecdr.etl_import.csv_row_import.duration
The metric can be used to monitor the total number of successfully processed CSV entries, along with the mapping execution time as a histogram. The metric provides the module ID of the ETL import module as an attribute named smilecdr.module_id.
The following OpenTelemetry metrics capture the number of successful and failed authentication attempts:
smilecdr.authentication.success_count
smilecdr.authentication.failure_count
The metrics have smilecdr.module_id as an attribute identifying the security module emitting the metric. Note that this is the ID of the security module that handled the authentication (such as an inbound security module), rather than the module the request requiring authentication was submitted to (such as a FHIR endpoint module).