MegaScale
MegaScale is a mechanism for storing virtually unlimited amounts of data in a single FHIR server. It uses multiple database instances to create discrete pools of data which are logically separate, but are managed under a single Smile CDR FHIR Storage (RDBMS) module.
In its simplest terms, a MegaScale-enabled server can be thought of as a partitioned FHIR repository where individual partitions or groups of partitions are stored in separate database schemas, and potentially in separate physical database instances.
Using this strategy can be helpful in cases such as:
In MegaScale mode, one or more FHIR Endpoint modules are combined with a single FHIR Storage (RDBMS) module. Incoming FHIR requests include a tenant identifier which maps to a particular partition, which then specifies the target database. This architecture is shown in the diagram below.
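The routing chain described above (tenant identifier, then partition, then target database) can be sketched in plain Java. This is only an illustration under stated assumptions: `TenantRouter`, its maps, the tenant names, and the JDBC URLs are all hypothetical and are not part of the Smile CDR API. It also assumes the tenant is the first path segment of the request URL.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical illustration only; none of these names are Smile CDR APIs. */
class TenantRouter {
    private final Map<String, Integer> tenantToPartition;   // tenant name -> partition ID
    private final Map<Integer, String> partitionToDatabase; // partition ID -> database URL

    TenantRouter(Map<String, Integer> tenantToPartition, Map<Integer, String> partitionToDatabase) {
        this.tenantToPartition = Map.copyOf(tenantToPartition);
        this.partitionToDatabase = Map.copyOf(partitionToDatabase);
    }

    /** Extracts the tenant from a request path such as "/P1/Patient/123". */
    static String tenantFromPath(String path) {
        String trimmed = path.startsWith("/") ? path.substring(1) : path;
        int slash = trimmed.indexOf('/');
        return slash < 0 ? trimmed : trimmed.substring(0, slash);
    }

    /** Resolves tenant -> partition -> database; empty if either mapping is unknown. */
    Optional<String> databaseFor(String requestPath) {
        Integer partition = tenantToPartition.get(tenantFromPath(requestPath));
        if (partition == null) {
            return Optional.empty();
        }
        return Optional.ofNullable(partitionToDatabase.get(partition));
    }
}
```

In this sketch, a request to `/P2/Observation` resolves to whichever database the `P2` partition was mapped to, and an unknown tenant resolves to nothing rather than falling back to a default.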
In a MegaScale architecture, the dual concepts of Partitions and Shards are used. These two terms mean related but different things.
A Partition is a single grouping of resources. Any individual resource must be assigned to a single partition, and that partition will generally contain multiple resources.
One or more partitions are assigned to a given database schema. This grouping of Partitions to a single database schema is called a Shard.
The following diagram shows a potential mapping of the 15000 partitions defined in Patient ID Partition Mode to 3 shards. This is only one possible mapping; fewer or more shards can be used depending on anticipated storage and scaling requirements.
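As a rough illustration of how partitions might be grouped into shards, the sketch below assigns contiguous ranges of partition IDs to shards. The even range split, the one-based partition IDs, and the `ShardMapper` class itself are assumptions made for this example; the actual mapping is whatever your deployment configures.

```java
/** Hypothetical sketch: assigns contiguous ranges of partition IDs to shards. */
class ShardMapper {
    private final int partitionCount;
    private final int shardCount;

    ShardMapper(int partitionCount, int shardCount) {
        if (partitionCount <= 0 || shardCount <= 0 || shardCount > partitionCount) {
            throw new IllegalArgumentException("Need 1 <= shardCount <= partitionCount");
        }
        this.partitionCount = partitionCount;
        this.shardCount = shardCount;
    }

    /** Returns the zero-based shard index for a one-based partition ID (an assumption). */
    int shardFor(int partitionId) {
        if (partitionId < 1 || partitionId > partitionCount) {
            throw new IllegalArgumentException("Unknown partition: " + partitionId);
        }
        // Ceiling division so that every partition falls inside some shard's range.
        int partitionsPerShard = (partitionCount + shardCount - 1) / shardCount;
        return (partitionId - 1) / partitionsPerShard;
    }
}
```

With 15000 partitions and 3 shards, this places partitions 1 to 5000 on the first shard, 5001 to 10000 on the second, and 10001 to 15000 on the third.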
See MegaScale Patient ID Partition Selection Modes for details on how to use these partition selection modes with MegaScale.
MegaScale creates an architecture where different partitions are stored on different shards (see Partitions and Shards above). This has several implications for the semantics and operation of FHIR Transaction processing, but it does not prevent FHIR transactions from being used, even when they span multiple shards.
When loading data using a FHIR Transaction Bundle, the system will automatically attempt to respect the semantics of the FHIR transaction as much as possible, but will make compromises where necessary if a transaction needs to span multiple shards.
When a transaction Bundle needs to write to multiple shards, it will be automatically split into multiple discrete FHIR Transaction Bundles and executed in sequence. The server will order these bundles according to resource dependencies within the Bundle, and will use the outcome of earlier bundles to inform the processing of later Bundles.
For example, suppose you have configured your Partition Selection Mode to Patient ID Partition Selection Mode, with your Ancillary Resources on a separate MegaScale database from your Patient Resources. In this example, you might have Patient and Encounter resources referencing Organization and Location resources in the same FHIR Transaction Bundle.
In this scenario, the server will automatically process the Ancillary resources first. Any newly assigned resource IDs will be used in references from the subsequent Patient and Encounter resources.
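The ID substitution step can be pictured as a simple lookup applied to the references in later bundles. The sketch below is not the server's implementation: `ReferenceRewriter` is a hypothetical helper, and it assumes references in the original Bundle use placeholder values (such as `urn:uuid:` identifiers) that are swapped for the IDs assigned when the earlier bundle was processed.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical helper; illustrates the idea only, not the server's internals. */
class ReferenceRewriter {
    /**
     * Replaces placeholder references with the IDs assigned by an earlier bundle.
     * References that have no assigned ID are passed through unchanged.
     */
    static List<String> rewrite(List<String> references, Map<String, String> assignedIds) {
        return references.stream()
                .map(ref -> assignedIds.getOrDefault(ref, ref))
                .collect(Collectors.toList());
    }
}
```

For instance, if processing the Ancillary bundle assigned `Organization/501` to a placeholder, a Patient resource that referenced the placeholder would be written with a reference to `Organization/501` instead.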
As a result, it is not possible to have circular dependencies in FHIR Transaction Bundles executed on a MegaScale server where the cycle crosses shard boundaries. For example, if your Patient and Ancillary data are on separate shards, attempting to process a FHIR Transaction Bundle with a reference from a Patient to an Organization where the Organization also holds a reference to the Patient would result in an error.
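One way to see why cross-shard cycles cannot be processed is to model the ordering as a topological sort over shards. The sketch below is a simplified illustration, not the server's algorithm: `ShardOrderPlanner` and the shard names are hypothetical, and the input is assumed to be a map from each shard to the shards holding resources it references.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: orders per-shard bundles so referenced shards run first. */
class ShardOrderPlanner {
    static List<String> executionOrder(Map<String, Set<String>> dependsOn) {
        // Count unresolved dependencies per shard and index the reverse edges.
        Map<String, Integer> remaining = new LinkedHashMap<>();
        Map<String, List<String>> dependents = new HashMap<>();
        for (Map.Entry<String, Set<String>> entry : dependsOn.entrySet()) {
            remaining.putIfAbsent(entry.getKey(), 0);
            for (String target : entry.getValue()) {
                if (target.equals(entry.getKey())) {
                    continue; // references within one shard stay in one bundle
                }
                remaining.putIfAbsent(target, 0);
                remaining.merge(entry.getKey(), 1, Integer::sum);
                dependents.computeIfAbsent(target, k -> new ArrayList<>()).add(entry.getKey());
            }
        }
        // Kahn's algorithm: repeatedly run shards with no unresolved dependencies.
        Deque<String> ready = new ArrayDeque<>();
        remaining.forEach((shard, count) -> {
            if (count == 0) {
                ready.add(shard);
            }
        });
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String shard = ready.poll();
            order.add(shard);
            for (String dependent : dependents.getOrDefault(shard, List.of())) {
                if (remaining.merge(dependent, -1, Integer::sum) == 0) {
                    ready.add(dependent);
                }
            }
        }
        // Any shard left unordered is part of a cycle that crosses shard boundaries.
        if (order.size() != remaining.size()) {
            throw new IllegalStateException("Circular reference across shards");
        }
        return order;
    }
}
```

When the Patient shard depends on the Ancillary shard, the Ancillary bundle is ordered first. When each depends on the other, no shard is ever ready to run, which corresponds to the error case described above.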
This section lists the known limitations of this feature.
The following FHIR interactions have been tested:
/P1/$reindex), or all partitions using _ALL as the tenant name (e.g. POST /_ALL/$reindex).
No other features, operations, or interactions have been tested or are expected to work with MegaScale.
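The two request shapes mentioned above (a named tenant such as `P1`, or `_ALL`) can be illustrated with the JDK's built-in HTTP client types. This sketch only builds the requests and does not send them. The `ReindexRequests` class and the base URL used in the usage note are hypothetical, and sending an empty FHIR `Parameters` resource as the body is an assumption; check the operation's documentation for the parameters your version accepts.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical helper that builds, but does not send, tenant-scoped reindex requests. */
class ReindexRequests {
    /** Builds a POST to [base]/[tenant]/$reindex, e.g. tenant "P1" or "_ALL". */
    static HttpRequest reindex(String baseUrl, String tenant) {
        String base = baseUrl.endsWith("/") ? baseUrl.substring(0, baseUrl.length() - 1) : baseUrl;
        return HttpRequest.newBuilder(URI.create(base + "/" + tenant + "/$reindex"))
                .header("Content-Type", "application/fhir+json")
                // Assumption: an empty Parameters resource is an acceptable request body.
                .POST(HttpRequest.BodyPublishers.ofString("{\"resourceType\":\"Parameters\"}"))
                .build();
    }
}
```

Calling `ReindexRequests.reindex("http://localhost:8000", "_ALL")` yields a POST request whose path is `/_ALL/$reindex`; the host and port here are placeholders for your own FHIR Endpoint module address.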
You must ensure that all updates within a single Bundle target a single MegaScale database. This is true for REQUEST_TENANT partitioning mode, but may not be true for other partition modes like Patient-Id partitioning or custom partitioning solutions.
Search requests will only include results from a single database.
To enable MegaScale mode, the following settings must be set.
On the FHIR Storage (RDBMS) module:
true.
DEFAULT partition.
REQUEST_TENANT or REQUEST_HEADER.
true.
On the FHIR Endpoint module:
When using REQUEST_TENANT partition selection mode, Tenant Identification Strategy must be set to URL_BASED.
MegaScale connection details are supplied by a Java Smile CDR Interceptor using the STORAGE_MEGASCALE_PROVIDE_DB_INFO pointcut.
See Example: MegaScale Connection Provider to see how this pointcut can be used. This example is also available in the Interceptor Starter Project.