27.3.1System Logging

 

Smile CDR uses the Logback logging framework to emit system logs. System logs are generated at runtime by all components of the CDR, and they can be tweaked based on individual requirements.

Note that this section refers only to system logs, which are separate from the transaction logs (message/transaction payloads that have been received or sent) and the audit logs (activity performed by a given user).

By default, the system will produce the following log files:

FilePurposeRetentionPHI
log/smile-startup.log This file contains information emitted during startup and shutdown of the system Logs are rotated and compressed when they reach 5 Mb in size, and 9 historical files are kept These logs do not contain personal health information (PHI).
log/smile.log This file contains general runtime information about what the system is doing, and can be useful in diagnosing runtime issues Logs are rotated and compressed on a daily basis. By default logs are kept for 30 days, although this can be configured These logs do not contain PHI in their default configuration. See note below.
log/smile-error.log This file contains any errors generated by the system (these error messages may also appear in other logs but are gathered here for convenience when troubleshooting) Logs are rotated and compressed on a daily basis. By default logs are kept for 30 days, although this can be configured Error logs do not contain PHI.
log/smile-troubleshooting.log This file contains all troubleshooting log messages. Logs are rotated and compressed on a daily basis. By default logs are kept for 30 days, although this can be configured Troubleshooting logs at DEBUG or finer may contain sensitive information including PHI
log/smile-dlq-failures.log This file contains the message bodies of messages that suffered total delivery failure and could not be delivered to the dead-letter queue (DLQ). The are logged here to support manual recovery from extreme infrastructure failure. This file will normally not exist, or be empty. Logs are rotated and compressed on a daily basis. By default logs are kept for 30 days, although this can be configured DLQ failure messages may contain raw clinical resources or other sensitive data, and may contain PHI.
log/smile-phi-redacted.log This file contains the original log message when the Smile-PHI filter redacts a log message sent to `smile.log` or the console. This file is normally be empty, or absent. Logs are rotated and compressed on a daily basis. By default logs are kept for 30 days, although this can be configured PHI-redaction messages may contain inadvertent leaks of sensitive information like debug logging from javascript customizations.

27.3.2Changing Logging Settings

 

Logging configuration may be modified in the file customerlib/logback-smile-custom.xml. It will only contain customer configuration. It starts with no active configuration, but does contain comments with examples for configuring docker or troubleshooting logs. This file is optional and can be replaced completely to allow configuration via overlay.

The logback debug context includes two properties: moduleId and requestId. The moduleId is set to the running module, and the requestId is set during request processing. See the Logback Documentation for information on advanced settings that are possible. The moduleId debug property can be used with the logback MDCFilter or the GEventEvaluator filters to define a new appender which only writes messages from a single module.

Note that the server polls all log configuration files including customerlib/logback-smile-custom.xml every 30 seconds by default, and will apply any changes; no restart is required. This can be configured in classes/logback.xml with the following configuration:

<configuration scan="true" scanPeriod="30 seconds">...</configuration>

27.3.3Protected Health Information (PHI) in Logs

 

Smile does not log PHI to the system logs in the default configuration. By default, Smile logs messages at the INFO, WARN, or ERROR levels. Logs at the DEBUG or TRACE levels may contain complete messages (e.g. message queue contents, HTTP requests, etc). The smile-error.log is filtered to only include messages at ERROR level or higher, and should never contain PHI. During troubleshooting, you may need to enable logs at the DEBUG or TRACE levels. In this case, these debug logs may contain PHI.

27.3.4JVM Stats Logging

 

In order to help with issue troubleshooting, a utility called JvmStatsWatcher will periodically emit a summary of key JVM/environment statistics to the system log.

This utility emits lines resembling the following:

10:20:36.813 [jvmStatsWatcher] INFO  M: R: c.c.clustermgr.util.JvmStatsWatcher - Runtime health:
  JvmMaxMemory[6 GB] JvmUsedMemory[1.7%] GcTime[0.0%] AvailCpu[8] CpuSystemAvg[25.6%] CpuJvmAvg[16.4%] FileHandles[5.6% 572/10240] Uptime[00:01:00]
  Http2xxAvg[1m=12/min 5m=5/min 15m=0/min] HttpFailAvg[none] FhirStorageSuccessAvg[1m=2/min 5m=266/min 15m=89/min] FhirStorageFailAvg[1m=2/min 5m=1/min 15m=0/min]

The following logic is used to control the logging:

  • If any metrics are in alarm status, a WARN level log line is written.
  • If the system has been running for less than 10 mins, an INFO level log line is written once per minute.
  • If the system has been running for more than 10 mins, an INFO level log line is written once per hour and at DEBUG level once per minute.

The log contains the following information:

Name Description Example Alarm Threshold
JvmMaxMemory This is the maximum amount of memory that the JVM is configured to be allowed to use. See Memory Settings for information on tuning this value. 6 GB  
JvmUsedMemory This is the percentage of the *JvmMaxMemory* threshold that has actually been consumed by the system. 1.7% > 75.0%
GcTime This is the percentage of the time that the JVM Garbage Collector has been actively working to free up memory. If this number is significant, you may need to increase your available memory. 4.5% > 50.0%
AvailCpu This is the number of processors that the operating system reports as available to the JVM. 8  
CpuSystemAvg This is the percentage of available processor time that is reported as busy, including all processes. 22.0%  
CpuJvmAvg This is the percentage of available processor time that is reported as busy, including only the Smile CDR JVM process and excluding any other processes. 22.0%  
FileHandles This is the number of used and available file handles as reported by the operating system, reported both as a percentage and as raw counts. 5.6 % 572/10240 > 75.0%
Uptime This is the amount of time elapsed since Smile CDR was started. 1 day  
Http2xxAvg This is the average number of HTTP request/responses which returned a successful (typically HTTP 200, but also includes codes 100-399) response across the entire process. This includes FHIR requests on any FHIR Endpoint modules, but also includes HTTP requests to other kinds of HTTP modules including JSON Admin API, SMART Outbound Security, etc. The displayed values contain the number of requestes in the last minute, but also the average requests per minute for the last 5 and 15 minutes. 1m=2/min 5m=1/min 15m=0/min
If no traffic has been reported in the last 15 minutes, the value will be reported as the string "none".
 
HttpFailAvg This is the average number of HTTP request/responses which returned a non-successful (HTTP 400-599) response across the entire process. Metric uses the same sources and format as the Http2xxAvg metric above. This metric uses the same output format as the Http2xxAvg metric above.  
FhirStorageSuccessAvg This is the average number of FHIR storage events in any FHIR Storage modules running across the process. This includes user-initiated actions such as FHIR Endpoint actions, but also includes internal action such as SearchParameter seeding and cache refreshes, $reindex and other batch operations, etc. The reported counts should be used as a rough measure of the amount of activity on the process, and should not be relied on for exact counts. This metric uses the same output format as the Http2xxAvg metric above.  
FhirStorageFailAvg This is the average number of unsuccessful FHIR storage events in any FHIR Storage modules running across the process. Unsuccessful events include any action which would not result in a successful HTTP status code. This metric uses the same output format as the Http2xxAvg metric above.  

This logging can be controlled by modifying the log configuration for the appender ca.cdr.clustermgr.util.JvmStatsWatcher.