Smile CDR uses the Logback logging framework to emit system logs. System logs are generated at runtime by all components of the CDR, and they can be tweaked based on individual requirements.
Note that this section refers only to system logs, which are separate from the transaction logs (message/transaction payloads that have been received or sent) and the audit logs (activity performed by a given user).
By default, the system will produce the following log files:
File | Purpose | Retention | PHI |
---|---|---|---|
log/smile-startup.log |
This file contains information emitted during startup and shutdown of the system | Logs are rotated and compressed when they reach 5 Mb in size, and 9 historical files are kept | These logs do not contain personal health information (PHI). |
log/smile.log |
This file contains general runtime information about what the system is doing, and can be useful in diagnosing runtime issues | Logs are rotated and compressed on a daily basis. By default logs are kept for 30 days, although this can be configured | These logs do not contain PHI in their default configuration. See note below. |
log/smile-error.log |
This file contains any errors generated by the system (these error messages may also appear in other logs but are gathered here for convenience when troubleshooting) | Logs are rotated and compressed on a daily basis. By default logs are kept for 30 days, although this can be configured | Error logs do not contain PHI. |
log/smile-troubleshooting.log |
This file contains all troubleshooting log messages. | Logs are rotated and compressed on a daily basis. By default logs are kept for 30 days, although this can be configured | Troubleshooting logs at DEBUG or finer may contain sensitive information including PHI |
log/smile-dlq-failures.log |
This file contains the message bodies of messages that suffered total delivery failure and could not be delivered to the dead-letter queue (DLQ). The are logged here to support manual recovery from extreme infrastructure failure. This file will normally not exist, or be empty. | Logs are rotated and compressed on a daily basis. By default logs are kept for 30 days, although this can be configured | DLQ failure messages may contain raw clinical resources or other sensitive data, and may contain PHI. |
log/smile-phi-redacted.log |
This file contains the original log message when the Smile-PHI filter redacts a log message sent to `smile.log` or the console. This file is normally be empty, or absent. | Logs are rotated and compressed on a daily basis. By default logs are kept for 30 days, although this can be configured | PHI-redaction messages may contain inadvertent leaks of sensitive information like debug logging from javascript customizations. |
Logging configuration may be modified in the file customerlib/logback-smile-custom.xml
. It will only contain customer configuration.
It starts with no active configuration, but does contain comments with examples for configuring docker or troubleshooting logs.
This file is optional and can be replaced completely to allow configuration via overlay.
The logback debug context includes two properties: moduleId
and requestId
.
The moduleId
is set to the running module, and the requestId
is set during request processing.
See the Logback Documentation for information on advanced settings that are possible.
The moduleId
debug property can be used with the logback MDCFilter or the GEventEvaluator filters to define a new appender which only writes messages from a single module.
Note that the server polls all log configuration files including customerlib/logback-smile-custom.xml
every 30 seconds by default, and will apply any changes; no restart is required.
This can be configured in classes/logback.xml
with the following configuration:
<configuration scan="true" scanPeriod="30 seconds">...</configuration>
Smile does not log PHI to the system logs in the default configuration.
By default, Smile logs messages at the INFO, WARN, or ERROR levels.
Logs at the DEBUG or TRACE levels may contain complete messages (e.g. message queue contents, HTTP requests, etc).
The smile-error.log
is filtered to only include messages at ERROR level or higher, and should never contain PHI.
During troubleshooting, you may need to enable logs at the DEBUG or TRACE levels.
In this case, these debug logs may contain PHI.
In order to help with issue troubleshooting, a utility called JvmStatsWatcher will periodically emit a summary of key JVM/environment statistics to the system log.
This utility emits lines resembling the following:
10:20:36.813 [jvmStatsWatcher] INFO M: R: c.c.clustermgr.util.JvmStatsWatcher - Runtime health:
JvmMaxMemory[6 GB] JvmUsedMemory[1.7%] GcTime[0.0%] AvailCpu[8] CpuSystemAvg[25.6%] CpuJvmAvg[16.4%] FileHandles[5.6% 572/10240] Uptime[00:01:00]
Http2xxAvg[1m=12/min 5m=5/min 15m=0/min] HttpFailAvg[none] FhirStorageSuccessAvg[1m=2/min 5m=266/min 15m=89/min] FhirStorageFailAvg[1m=2/min 5m=1/min 15m=0/min]
The following logic is used to control the logging:
The log contains the following information:
Name | Description | Example | Alarm Threshold |
---|---|---|---|
JvmMaxMemory | This is the maximum amount of memory that the JVM is configured to be allowed to use. See Memory Settings for information on tuning this value. | 6 GB |
|
JvmUsedMemory | This is the percentage of the *JvmMaxMemory* threshold that has actually been consumed by the system. | 1.7% |
> 75.0% |
GcTime | This is the percentage of the time that the JVM Garbage Collector has been actively working to free up memory. If this number is significant, you may need to increase your available memory. | 4.5% |
> 50.0% |
AvailCpu | This is the number of processors that the operating system reports as available to the JVM. | 8 |
|
CpuSystemAvg | This is the percentage of available processor time that is reported as busy, including all processes. | 22.0% |
|
CpuJvmAvg | This is the percentage of available processor time that is reported as busy, including only the Smile CDR JVM process and excluding any other processes. | 22.0% |
|
FileHandles | This is the number of used and available file handles as reported by the operating system, reported both as a percentage and as raw counts. | 5.6 % 572/10240 |
> 75.0% |
Uptime | This is the amount of time elapsed since Smile CDR was started. | 1 day |
|
Http2xxAvg | This is the average number of HTTP request/responses which returned a successful (typically HTTP 200, but also includes codes 100-399) response across the entire process. This includes FHIR requests on any FHIR Endpoint modules, but also includes HTTP requests to other kinds of HTTP modules including JSON Admin API, SMART Outbound Security, etc. The displayed values contain the number of requestes in the last minute, but also the average requests per minute for the last 5 and 15 minutes. |
1m=2/min 5m=1/min 15m=0/min
If no traffic has been reported in the last 15 minutes, the value will be reported as the string "none". |
|
HttpFailAvg | This is the average number of HTTP request/responses which returned a non-successful (HTTP 400-599) response across the entire process. Metric uses the same sources and format as the Http2xxAvg metric above. | This metric uses the same output format as the Http2xxAvg metric above. | |
FhirStorageSuccessAvg | This is the average number of FHIR storage events in any FHIR Storage modules running across the process. This includes user-initiated actions such as FHIR Endpoint actions, but also includes internal action such as SearchParameter seeding and cache refreshes, $reindex and other batch operations, etc. The reported counts should be used as a rough measure of the amount of activity on the process, and should not be relied on for exact counts. | This metric uses the same output format as the Http2xxAvg metric above. | |
FhirStorageFailAvg | This is the average number of unsuccessful FHIR storage events in any FHIR Storage modules running across the process. Unsuccessful events include any action which would not result in a successful HTTP status code. | This metric uses the same output format as the Http2xxAvg metric above. |
This logging can be controlled by modifying the log configuration for the appender ca.cdr.clustermgr.util.JvmStatsWatcher
.