Message Broker Failure Management
Several Smile CDR modules make use of message brokers, acting as consumers. For example, the Channel Import module consumes messages from a broker.
In the normal course of operation, a consumer ingests messages from one or more brokers and performs whatever internal processing is necessary. However, if something goes wrong during that processing, messages may be dropped or silently lost.
To mitigate this issue, Smile CDR allows any module that requires an intake channel to also specify a retry channel, a failed channel, and a set of other settings that control retry behaviour:
module.module_name.config.channel.retry.name=my-retry-channel
module.module_name.config.channel.retry.delay_milliseconds=5000
module.module_name.config.channel.retry.maximum_delay_milliseconds=6000
module.module_name.config.channel.retry.maximum_attempts=3
module.module_name.config.channel.retry.strategy=CONSTANT
module.module_name.config.channel.retry.retriable_exceptions=ca.uhn.fhir.rest.server.exceptions.InvalidRequestException
module.module_name.config.channel.failed.name=my-failure-channel
Setting these properties in a module causes Smile CDR to automatically wrap all message handling in a retry mechanism that follows the rules defined by the properties. Note that every setting above must be set (i.e. no blank, null or zero values) for the retry mechanism to be enabled. The retry mechanism works roughly as follows:
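As a rough illustration of the all-or-nothing rule above, the following Java sketch reads the seven properties and treats the retry mechanism as enabled only when every one of them is present and non-blank and each numeric setting is greater than zero. The class RetryConfig and its fromProperties method are hypothetical and not part of the Smile CDR API; treating negative or unparseable numbers the same way as zero is an assumption.

```java
import java.util.Optional;
import java.util.Properties;

/**
 * Hypothetical holder for the retry settings. The property keys match the
 * documented names; the class itself is illustrative, not a Smile CDR API.
 */
final class RetryConfig {
    final String retryChannel;
    final long delayMillis;
    final long maxDelayMillis;
    final long maxAttempts;
    final String strategy;
    final String retriableExceptions;
    final String failedChannel;

    private RetryConfig(String retryChannel, long delayMillis, long maxDelayMillis, long maxAttempts,
                        String strategy, String retriableExceptions, String failedChannel) {
        this.retryChannel = retryChannel;
        this.delayMillis = delayMillis;
        this.maxDelayMillis = maxDelayMillis;
        this.maxAttempts = maxAttempts;
        this.strategy = strategy;
        this.retriableExceptions = retriableExceptions;
        this.failedChannel = failedChannel;
    }

    /**
     * Returns a config only when all seven settings are present and non-blank and the
     * numeric ones are positive; otherwise retry is treated as disabled.
     * `prefix` is e.g. "module.module_name.config.".
     */
    static Optional<RetryConfig> fromProperties(Properties props, String prefix) {
        String retryChannel = text(props, prefix + "channel.retry.name");
        String strategy = text(props, prefix + "channel.retry.strategy");
        String exceptions = text(props, prefix + "channel.retry.retriable_exceptions");
        String failedChannel = text(props, prefix + "channel.failed.name");
        long delay = number(props, prefix + "channel.retry.delay_milliseconds");
        long maxDelay = number(props, prefix + "channel.retry.maximum_delay_milliseconds");
        long attempts = number(props, prefix + "channel.retry.maximum_attempts");

        boolean allSet = retryChannel != null && strategy != null && exceptions != null
                && failedChannel != null && delay > 0 && maxDelay > 0 && attempts > 0;
        return allSet
                ? Optional.of(new RetryConfig(retryChannel, delay, maxDelay, attempts,
                        strategy, exceptions, failedChannel))
                : Optional.empty();
    }

    /** Trimmed value, or null when the key is missing or blank. */
    private static String text(Properties props, String key) {
        String value = props.getProperty(key);
        return (value == null || value.trim().isEmpty()) ? null : value.trim();
    }

    /** Parsed value, or 0 when missing, blank or not a number (an assumption). */
    private static long number(Properties props, String key) {
        String value = text(props, key);
        if (value == null) {
            return 0;
        }
        try {
            return Long.parseLong(value);
        } catch (NumberFormatException e) {
            return 0;
        }
    }
}
```

Clearing any single setting, or setting a numeric one to 0, makes fromProperties return an empty Optional in this sketch, mirroring the rule that a partially configured module gets no retry wrapper.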
1. When processing a message throws an exception listed in channel.retry.retriable_exceptions, the message handler sets headers on the message indicating the retry count, the first failure timestamp, and the last failure timestamp. The message then moves onto the channel defined in channel.retry.name.
2. channel.retry.delay_milliseconds is the minimum number of milliseconds between attempts.
3. channel.retry.strategy determines whether the backoff is exponential or constant. If the delay is 5000 ms, a constant backoff retries every 5 seconds, whereas an exponential backoff retries after 5 seconds, then 10, then 20, and so on.
4. channel.retry.maximum_delay_milliseconds provides an upper bound for the delay in case of exponential growth.
5. When a message has used up the attempts allowed by channel.retry.maximum_attempts, the message handler publishes it to the failed channel defined in channel.failed.name. The message handler also adds the unhandled exception to the headers of the message. Smile CDR does not consume messages from this channel; those messages should be consumed by an external reporting system or a dead-letter consumer.
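To make the header bookkeeping, the backoff arithmetic and the retry-versus-failed routing concrete, here is a Java sketch of the flow described above. It rests on several assumptions: the RetrySketch class, its methods and the header names are hypothetical rather than Smile CDR APIs; exceptions are matched by exact class name; and a message is routed to the failed channel once the retry count recorded in its headers reaches maximum_attempts (the precise boundary Smile CDR uses is not stated here).

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of the retry flow; none of these names are Smile CDR APIs. */
final class RetrySketch {
    enum Strategy { CONSTANT, EXPONENTIAL }

    enum Route { RETRY_CHANNEL, FAILED_CHANNEL, NOT_RETRIED }

    // Hypothetical header names standing in for the real ones.
    static final String RETRY_COUNT = "retryCount";
    static final String FIRST_FAILURE = "firstFailureTimestamp";
    static final String LAST_FAILURE = "lastFailureTimestamp";

    /** Copies the headers, increments the retry count and records failure timestamps. */
    static Map<String, Object> stampFailureHeaders(Map<String, Object> headers, long nowMillis) {
        Map<String, Object> updated = new HashMap<>(headers);
        int retryCount = (Integer) updated.getOrDefault(RETRY_COUNT, 0);
        updated.put(RETRY_COUNT, retryCount + 1);
        updated.putIfAbsent(FIRST_FAILURE, nowMillis); // set only on the first failure
        updated.put(LAST_FAILURE, nowMillis);           // overwritten on every failure
        return updated;
    }

    /**
     * Delay before retry number `attempt` (1-based). CONSTANT always waits delayMillis;
     * EXPONENTIAL doubles the delay per attempt and caps it at maxDelayMillis.
     */
    static long delayBeforeAttempt(int attempt, long delayMillis, long maxDelayMillis, Strategy strategy) {
        if (strategy == Strategy.CONSTANT) {
            return delayMillis;
        }
        long delay = delayMillis;
        // Stop doubling once the cap is reached, which also avoids overflow.
        for (int i = 1; i < attempt && delay < maxDelayMillis; i++) {
            delay *= 2; // 5000, 10000, 20000, ...
        }
        return Math.min(delay, maxDelayMillis);
    }

    /**
     * Decides where a failed message goes, given the retry count already in its headers.
     * Assumption: non-retriable failures are outside this flow and reported as NOT_RETRIED.
     */
    static Route route(Throwable failure, int retryCount, int maxAttempts, List<String> retriableExceptions) {
        if (!retriableExceptions.contains(failure.getClass().getName())) {
            return Route.NOT_RETRIED;
        }
        return retryCount < maxAttempts ? Route.RETRY_CHANNEL : Route.FAILED_CHANNEL;
    }
}
```

With the example configuration (5000 ms delay, 6000 ms maximum) but the strategy switched to EXPONENTIAL, this sketch would wait 5000 ms before the first retry and 6000 ms before every later one, because the first doubling (10000 ms) already exceeds the cap.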