26.0.1 Designing a Cluster

 

Smile CDR is designed to be deployed in horizontal clusters of any size. This means that you can add an arbitrary number of servers to your installation, and they can be used to share the load of incoming requests.

The built-in clustering capability is designed to be flexible. You can build active/active clusters, active/passive clusters, or any combination of the two in order to meet your specific needs.

All components in Smile CDR are designed to operate without keeping any local state on a single server. This means that a deployment can grow to a very large number of servers as needed. It also means that nodes can be added to and removed from the cluster at any time (i.e. without requiring a restart of the entire cluster).

26.0.2 Node and Module Design

 

The general approach in designing a cluster is to create one or more nodes with distinct IDs that will act as "templates" for configuration.

For example, suppose you create a node called FHIR_Support on server HOST1.acme.org, configured to connect to a shared backing data store and to expose a FHIR Endpoint listening on port 8000.

You can now start up as many Smile CDR processes with the same Node ID as you wish (for example, a second process on another server). Each process uses the same settings when it starts: it connects to the same backing data store, and it listens on port 8000.
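As a rough sketch, the shared configuration for the FHIR_Support node might contain entries along the following lines. Only the db.* connection keys follow the property patterns listed later on this page; node.id, the module IDs, and the port key here are illustrative assumptions rather than values taken from this example.

    # Node identity shared by every process started from this configuration (key name assumed)
    node.id                               =FHIR_Support

    # Cluster Manager pointing at the shared backing database (module ID assumed; values are placeholders)
    module.clustermgr.config.db.driver    =POSTGRES_9_4
    module.clustermgr.config.db.url       =jdbc:postgresql://db.acme.org:5432/cdr_cluster
    module.clustermgr.config.db.username  =cdr
    module.clustermgr.config.db.password  =change_me

    # FHIR Endpoint listening on port 8000 (module ID and port key assumed)
    module.fhir_endpoint.config.port      =8000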

The two listeners you have created are both able to handle parallel requests. You can now place a network switch (or load balancer, failover device, reverse proxy, etc.) in front of these two ports, and your requests will be served by both nodes (or by the active node, depending on the configuration).

This same strategy applies to all types of modules that can be created within Smile CDR. Security modules will seamlessly share sessions across all clones, Web and JSON admin APIs will expose their ports and service requests across each node, etc.

26.0.2.1 Key Concepts

The following terms are the key concepts in Smile CDR clustering.

  • Node: A node is a single group of modules with its own configuration, which is inherited by all Node Processes started for the given node.
  • Node ID: A unique identifier for the given node. It should consist only of US ASCII letters, numbers, and the following characters: _-. (i.e. spaces are not allowed). The maximum length is 30 characters.
  • Node Process: An individual Smile CDR process. Each Smile CDR process is designated as being a process for a specific node, using the ID of that node.
  • Process ID: When a process starts up, it will automatically assign itself an ID. See Process IDs below for more information.
  • Module: A module is a single functional unit within a node. Each node in a cluster will have many modules defined, and all processes for that node will share the same list of modules and the same configuration for each of these modules.
  • Module ID: Each module in a node is uniquely identified by its ID. The module ID is user supplied, and follows the same naming rules as the Node ID above.

26.0.2.2 Database Clustering

The clustering capabilities of Smile CDR rely heavily on having access to a clustered underlying database instance. Setting up a cluster on your chosen database platform (PostgreSQL, Oracle, etc.) is beyond the scope of this documentation, but Smile CDR does expect the chosen cluster configuration to be globally consistent.

26.0.2.3 Lucene Clustering

The Smile CDR FHIR Storage modules may be configured to use Apache Lucene to provide indexing, which is used for certain types of queries. See Lucene Indexing for more information on how this is configured.

If Smile CDR will be used in a cluster (i.e. multiple processes will be created for a single node), Elasticsearch-based clustering must be used. Using Lucene in Memory or Disk mode may cause inconsistent results, as indexes are not propagated across the cluster.

26.0.3 Process IDs

 

Individual processes in a Smile CDR cluster will all have a user-assigned Node ID, which will be the same for all processes that share the same Node configuration. Each process will also have a Process ID, which uniquely identifies the process across the cluster.

Process IDs are automatically assigned by Smile CDR and do not need to be explicitly set by the user (nor can they be).

26.0.4 Adding and Removing Processes

 

Smile CDR is able to handle an arbitrary number of processes being added, and these processes can be started or stopped at any time.

The very first time a process is started with a given Node ID, the configuration for that Node ID is saved in the cluster manager database. There is nothing special about this first process, however: once other processes for the same node have been started, it may be shut down without any adverse effects (note that this was not the case in previous versions of Smile CDR).

26.0.4.1 Server Port Offset

  • The node.server_port_offset property indicates an integer value to apply as an offset to server port numbers on the clone node. For example, if the master node has a FHIR Endpoint module listening on port 8000 and this property has a value of 10000, on the clone node the same FHIR Endpoint will listen on port 18000.
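A minimal sketch of how this might look in the clone node's configuration follows; only node.server_port_offset comes from this page, while the FHIR Endpoint module ID and port key are illustrative assumptions.

    # Clone node: shift all configured server ports up by 10000
    node.server_port_offset              =10000

    # The master node's FHIR Endpoint is configured on port 8000, e.g.:
    #   module.fhir_endpoint.config.port =8000
    # With the offset applied, this clone listens on 8000 + 10000 = 18000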

26.0.5 Multi-Node Clusters

 

In many cases it is desirable to have multiple nodes within a cluster, each with its own independent set of modules. This is useful if you are designing a cluster with two distinct roles that you want to scale independently.

For example, suppose you are planning a deployment of Smile CDR that will consist of a Web Admin Console, a FHIR Endpoint module, and a SMART Outbound Security module. If all of these modules are on the same node, then they will all be scaled together as more processes are added to the cluster. This has an impact on startup time, memory consumption, etc.

An alternate design is to place each function on its own node. In the example above, this might look like:

  • Node: admin_node
    • Module: Cluster Manager
    • Module: Local Inbound Security
    • Module: Web Admin Console
  • Node: auth_node
    • Module: Cluster Manager
    • Module: SMART Outbound Security
    • Module: Local Inbound Security
  • Node: fhir_node
    • Module: Cluster Manager
    • Module: FHIR Endpoint
    • Module: FHIR Storage
    • Module: SMART Inbound Security

With this design, clones could be made of any of these master nodes in order to scale the system up accordingly.
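As a sketch, each node would be started from its own configuration, with its own Node ID and module list. The node.id key is assumed here to be the property that carries the Node ID; the module lists are summarized in comments rather than written out as full module definitions.

    # --- Configuration for admin_node processes ---
    node.id =admin_node
    #   modules: Cluster Manager, Local Inbound Security, Web Admin Console

    # --- Configuration for auth_node processes ---
    node.id =auth_node
    #   modules: Cluster Manager, SMART Outbound Security, Local Inbound Security

    # --- Configuration for fhir_node processes ---
    node.id =fhir_node
    #   modules: Cluster Manager, FHIR Endpoint, FHIR Storage, SMART Inbound Security

Scaling a given role up or down is then a matter of starting or stopping processes that use the corresponding configuration.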

26.0.5.1 Multi-Node Clusters and Batch Job Status

Be aware that the web console and the admin JSON modules can only access batch job status for persistence modules defined in the same node. Batch job information from other nodes will not normally be visible.

To support sharing batch job information across different nodes, you should define duplicate persistence modules that share the connection configuration of the original module. This provides access to batch job status information in the web console and via admin JSON. Specifically, you should (see the configuration sketch after this list):

  • Create a persistence module on the same node as admin web/JSON
  • Give it the same module ID as the persistence module on the first node (e.g. if the first module is called persistenceR4, name the new module persistenceR4 as well). This reduces confusion when viewing the information on different consoles.
  • CRITICAL: Ensure the module is set to read-only mode (i.e. read_only_mode.enabled=true); otherwise this module will block Batch2 and scheduler processes.
  • Configure the same database credentials as the persistence module on the other node
    • Database Type (db.driver)
    • Database Connection URL (db.url)
    • Database Username (db.username)
    • Database Password (db.password)
  • You don't need many connections, as this module is read-only; a small pool is sufficient:
    • Max Idle Connections (module.persistence_r4.config.db.connectionpool.maxidle)
    • Max Total Connections (module.persistence_r4.config.db.connectionpool.maxtotal)
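Putting these steps together, the duplicate module on the admin node might be configured roughly as follows. The key pattern mirrors the properties listed above (with persistence_r4 standing in for whatever module ID the original persistence module uses); the driver and connection values are placeholders, and the exact placement of read_only_mode.enabled under the module's config prefix is an assumption.

    # Read-only duplicate of the FHIR node's persistence module, defined on the admin node
    # so that batch job status is visible in the web console and via admin JSON.
    # (Declare this module with the same type as the original persistence module.)
    module.persistence_r4.config.read_only_mode.enabled     =true

    # Same database connection details as the original persistence module (placeholder values)
    module.persistence_r4.config.db.driver                  =POSTGRES_9_4
    module.persistence_r4.config.db.url                     =jdbc:postgresql://db.acme.org:5432/cdr_fhir
    module.persistence_r4.config.db.username                =cdr
    module.persistence_r4.config.db.password                =change_me

    # A small connection pool is enough, since this module is read-only
    module.persistence_r4.config.db.connectionpool.maxidle  =2
    module.persistence_r4.config.db.connectionpool.maxtotal =5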

26.0.6 Sample Architecture

 

Because Smile CDR is built from modules that can be used in various combinations and configurations, a wide variety of cluster designs can be built.

The following diagram shows a sample design that follows a fairly common pattern:

The two nodes in this design (a FHIR node and an Administration node) are scaled independently, so during slow periods each node might be served by only one or two processes. During busy periods the FHIR node might scale up to many more processes, while the Administration node might not need to.

[Diagram: Simple Cluster]