
14.6.1 MegaScale Patient ID Partition Selection Modes

Although it is not the only way to use MegaScale, MegaScale is designed to work well with Patient ID Partition Mode and Bucketed Patient ID Partition Mode. These are referred to as Patient ID Partition Modes on this page.

These modes are a great choice if your use case involves needing to store large amounts of data where the majority of your queries will be patient-oriented (i.e. queries about a single patient or a list of patients). This generally means performing FHIR searches with a subject or patient parameter, such as Observation?patient=Patient/37510001&category=vital-signs and DocumentReference?patient=Patient/37510001&date=gt2025.

In these modes, all resources belonging to a single Patient compartment are colocated in the same partition. Reads and writes to data in that compartment can therefore be performed efficiently even within a large MegaScale deployment, since these operations need to access only a single Partition on a single Shard.

14.6.2 Required and Suggested Settings

When using MegaScale in Patient ID Partition Modes, the following settings should be considered:

Cross-Partition Reference Mode should be set to ALLOWED_UNQUALIFIED if you want to allow resources in the Patient compartment to hold references to resources in the default partition (such as Ancillary resources).
Any Patient.identifier.system values that will be used for conditional operations on Patient resources should be declared using Configuring Pre-Assigned Patient Identifier Systems.
Server ID Mode does not need a specific setting to work. However, since MegaScale encodes the partition IDs in the server-assigned resource IDs, if this setting is set to SEQUENTIAL_NUMERIC it will be possible to perform efficient FHIR Read operations on resources with these server-assigned IDs without needing to scan multiple shards. For example, an Observation with ID Observation/362412010 denotes that the resource is stored in partition 12010, and the server will automatically select that partition if this resource is read using a FHIR Read operation.
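
To make the ID example above concrete, the following is a minimal sketch of how a partition ID could be read back out of a server-assigned numeric ID. It assumes that the partition ID occupies the last five decimal digits of the ID, which is inferred only from the single Observation/362412010 example; the actual encoding used by MegaScale is not specified on this page. The class and method names are hypothetical.

```java
final class ServerAssignedIdSketch {

    // ASSUMPTION: the partition ID is the last five decimal digits of the
    // server-assigned ID. This is inferred from one documented example only.
    private static final long PARTITION_MODULUS = 100_000L;

    /**
     * Returns the partition ID implied by the numeric ID part of a resource ID,
     * e.g. the "362412010" in "Observation/362412010".
     */
    static int partitionOf(String numericIdPart) {
        long numericId = Long.parseLong(numericIdPart);
        return (int) (numericId % PARTITION_MODULUS);
    }
}
```

Under that assumption, `ServerAssignedIdSketch.partitionOf("362412010")` evaluates to 12010, matching the example.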

14.6.3 Partition Distribution

The following table outlines the partition distribution for various resource types in MegaScale in Patient ID Partition Modes:

Resource Type	Patient ID Partition Mode	Bucketed Patient ID Partition Mode
Non-Partitionable Resources (e.g. StructureDefinition, ValueSet, etc.) See HAPI FHIR Partitioning Limitations for a complete list of non-partitionable resources.	Always stored in the default partition (0).	Always stored in the default partition (0).
Ancillary Resources (e.g. Practitioner, Organization, etc.). This includes any resources that are partitionable, but not in a Patient compartment (excluding any resource types explicitly called out elsewhere in this table).	Always stored in the default partition (0).	Stored in the base partition for the selected bucket (e.g. 100, 200, 300, etc).
Patient Resources (e.g. Patient, Encounter, etc.). This includes all resources in the FHIR Patient Compartment, excluding resource types explicitly called out elsewhere in this table.	Distributed between partitions 1 through 14999.	Distributed between partitions [offset] + {1 through 99}.
Group and List Resources (these have special handling because they can be in multiple Patient compartments, and therefore cannot be placed in a specific Patient compartment)	Always stored in the default partition (0).	Stored in the base partition for the selected bucket (e.g. 100, 200, 300, etc).
FHIR Documents (Bundle resources with a Bundle.type code of `document`)	Stored in the Patient Resources partition belonging to the Composition.subject if Document Repository Mode is enabled (i.e. distributed between partitions 1 through 14999). Stored in the Ancillary Resources partition otherwise.	Stored in the Patient Resources partition belonging to the Composition.subject if Document Repository Mode is enabled (i.e. distributed between partitions [offset] + {1 through 99}). Stored in the Ancillary Resources partition otherwise.
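
The bucketed layout in the table can be illustrated with a little arithmetic. The sketch below assumes that buckets are 100 partitions wide, which is inferred from the example base partitions (100, 200, 300) and the "[offset] + {1 through 99}" range; the class and method names are hypothetical and do not correspond to any MegaScale API.

```java
final class BucketedPartitionSketch {

    // ASSUMPTION: each bucket spans 100 partition IDs, as suggested by the
    // base partitions 100, 200, 300 and the patient range offset + 1..99.
    static final int BUCKET_WIDTH = 100;

    /** Base partition (where ancillary resources live) for a patient partition. */
    static int basePartitionOf(int patientPartitionId) {
        return patientPartitionId - (patientPartitionId % BUCKET_WIDTH);
    }

    /** Lowest patient partition in the bucket with the given base partition. */
    static int firstPatientPartition(int basePartitionId) {
        return basePartitionId + 1;
    }

    /** Highest patient partition in the bucket with the given base partition. */
    static int lastPatientPartition(int basePartitionId) {
        return basePartitionId + BUCKET_WIDTH - 1;
    }
}
```

For example, under these assumptions a Patient stored in partition 237 belongs to the bucket whose base partition is 200, and that bucket's patient partitions run from 201 through 299.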

14.6.4 Patient Resource ID-Identifier Mapping

Patient ID Partition Modes use the ID of the Patient resource as a means of determining the appropriate shard and partition to access. For example, the resource Patient/A and an Observation with an Observation.subject reference to Patient/A will both be stored in the same shard and partition. Searches for Patient/A or for Observation?subject=Patient/A will both use this ID to determine the appropriate shard and partition to read from.

In large infrastructures involving storage of Patient data from multiple sources, data is often read and written using a Patient identifier as the search key, as opposed to the Patient ID. For example, a common pattern is to perform a conditional update operation on a Patient resource using a conditional URL such as Patient?identifier=http://example.org|123.

In MegaScale Patient ID Partition Modes, the server needs to resolve this search to determine the Patient resource ID, which is then used to determine the actual shard and partition to access. This creates a circular dependency since the server needs to know which partition to search, but needs to search to determine the partition.

To resolve this problem, MegaScale leverages an identifier mapping table located on the default partition. When performing a conditional operation on a patient identifier, the server performs an initial lookup on the default partition to check whether the given identifier is known. If it is, the server uses it to select a partition for the rest of the transaction. This lookup is cached to avoid unnecessary database queries if repeated operations access the same identifier.
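
The lookup-then-cache flow described above can be sketched as follows. This is an illustration only, not the MegaScale implementation: the class name is hypothetical, the `Function` stands in for the identifier mapping table on the default partition, and a plain in-memory map stands in for whatever cache the server actually uses.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

final class IdentifierLookupSketch {

    // Hypothetical stand-in for the mapping table on the default partition:
    // given a "system|value" token, returns the Patient resource ID if known.
    private final Function<String, Optional<String>> defaultPartitionLookup;

    // Simple unbounded cache; a real server would bound and expire entries.
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    IdentifierLookupSketch(Function<String, Optional<String>> defaultPartitionLookup) {
        this.defaultPartitionLookup = defaultPartitionLookup;
    }

    /** Resolves a Patient identifier to a Patient resource ID, if one is mapped. */
    Optional<String> resolvePatientId(String system, String value) {
        String token = system + "|" + value;
        String cached = cache.get(token);
        if (cached != null) {
            return Optional.of(cached); // repeated access: no table lookup
        }
        Optional<String> found = defaultPartitionLookup.apply(token);
        found.ifPresent(id -> cache.put(token, id));
        return found;
    }
}
```

The resolved Patient resource ID would then be used to select the shard and partition for the rest of the transaction, as described above. Unknown identifiers are deliberately not cached in this sketch, so a later lookup can still find a mapping that was created in the meantime.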

14.6.4.1 Configuring Pre-Assigned Patient Identifier Systems

Identifier systems that will be used for conditional operations on Patient resources must be declared in the FHIR Storage module configuration, using the Patient Identifier Systems for Pre-Assignment setting. Any identifier systems that have not been pre-declared in configuration will not be available for use in conditional operations, and trying to use them will result in an error.

This setting accepts multiple identifier systems, each separated by whitespace (space or newline). Values can be a fixed value, e.g. http://example.org/practitioner. Values can also be specified as a regular expression by adding a prefix of ^ and a suffix of $, e.g. ^http://example.org/practitioner/[0-9]+$.

Values should not be added to this list if they have already been used in stored data in the repository. Values may be added to the list at any time, however, as long as this is done before adding any data using the new identifier system.
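
As a rough illustration of the setting format described above, the sketch below parses a whitespace-separated value and checks whether a given identifier system matches either a fixed entry or a regular-expression entry (one that starts with ^ and ends with $). The class name is hypothetical, and the server's actual matching logic may differ in details such as regular-expression flags.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Pattern;

final class PreAssignedSystemMatcherSketch {

    private final List<Predicate<String>> matchers = new ArrayList<>();

    /** Parses a setting value such as "http://a ^http://b/[0-9]+$". */
    PreAssignedSystemMatcherSketch(String settingValue) {
        for (String entry : settingValue.trim().split("\\s+")) {
            if (entry.isEmpty()) {
                continue; // blank setting value
            }
            if (entry.length() > 1 && entry.startsWith("^") && entry.endsWith("$")) {
                // Regular-expression entry: the whole system must match.
                Pattern pattern = Pattern.compile(entry);
                matchers.add(system -> pattern.matcher(system).matches());
            } else {
                // Fixed entry: exact string comparison.
                matchers.add(entry::equals);
            }
        }
    }

    /** Returns true if the identifier system matches any configured entry. */
    boolean matches(String system) {
        return matchers.stream().anyMatch(m -> m.test(system));
    }
}
```

Note that in a regular-expression entry an unescaped `.` matches any character, so `^http://example.org/...$` is slightly more permissive than it looks; escaping the dots (`example\.org`) makes the pattern strict.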

14.6.4.2 Implications of Pre-Assigned Patient Identifier Systems

Pre-assignment creates a permanent 1:1 mapping between the identifier and the resource ID assigned to this identifier. This has several important consequences:

Any resource with an identifier that has a Pre-Assigned Patient Identifier System can never have that identifier removed or changed. Other identifiers may be added and removed as long as they do not also have a Pre-Assigned Patient Identifier System. Any attempt to remove or change the identifier with the Pre-Assigned Patient Identifier System will result in an error.
No resource may have multiple identifiers with system values which are matched by the Pre-Assigned Patient Identifier Systems list.
All identifiers with system values which are matched by the Pre-Assigned Patient Identifier Systems list have uniqueness enforced automatically, meaning that no two resources may have the same identifier with the same Pre-Assigned Patient Identifier System and value.
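
The first two rules above can be expressed as a small client-side check, which may help catch invalid updates before they reach the server. This is a sketch under stated assumptions: identifiers are modelled as plain "system|value" strings, the class and method names are hypothetical, and the predicate stands in for the configured Pre-Assigned Patient Identifier Systems list. Cross-resource uniqueness is not modelled, and whether a pre-assigned identifier may be added to an existing resource that has none is not covered by the text, so it is not checked here.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;
import java.util.stream.Collectors;

final class PreAssignedIdentifierRulesSketch {

    /** Extracts the system from a "system|value" token ("" if there is no '|'). */
    private static String systemOf(String token) {
        int bar = token.indexOf('|');
        return bar < 0 ? "" : token.substring(0, bar);
    }

    /** Keeps only the tokens whose system is a pre-assigned system. */
    private static List<String> preAssignedOnly(List<String> tokens, Predicate<String> isPreAssigned) {
        return tokens.stream()
                .filter(t -> isPreAssigned.test(systemOf(t)))
                .collect(Collectors.toList());
    }

    /** Returns a description of the violated rule, or empty if none is violated. */
    static Optional<String> checkUpdate(
            List<String> before, List<String> after, Predicate<String> isPreAssigned) {
        List<String> preBefore = preAssignedOnly(before, isPreAssigned);
        List<String> preAfter = preAssignedOnly(after, isPreAssigned);
        if (preAfter.size() > 1) {
            return Optional.of("multiple identifiers with a pre-assigned system");
        }
        if (!preBefore.isEmpty() && !preAfter.equals(preBefore)) {
            return Optional.of("pre-assigned identifier removed or changed");
        }
        return Optional.empty();
    }
}
```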

14.6.5 Reading Data

The following search patterns are supported for efficient queries in MegaScale in Patient ID Partition Modes:

Read Pattern	Example URL	Notes
Read ancillary resource by ID.	http://base/Practitioner/ABC http://base/Practitioner/12301000	Always supported.
Search for ancillary resource(s) by identifier.	http://base/Practitioner?identifier=http://pract|0 http://base/Practitioner?identifier=http://pract|0,http://pract|1	Always supported.
Read Patient Resource by resource ID.	http://base/Patient/1231101693 http://base/Patient/PAT-0	Always supported.
Search for Patient Resource by identifier.	http://base/Patient?identifier=http://patient|1	Only a single Patient identifier may be placed in the URL.
Search for resources in the Patient Compartment by patient ID or identifier. This search finds any resources that belong to the patient with the given identifier.	http://base/Encounter?patient=Patient/1037201377 http://base/Encounter?patient.identifier=http://patient|1	Only a single Patient ID or identifier may be placed in the URL. The `_include` parameter may be used to include any referenced Patient compartment or ancillary resources.
Includes (`_include`)	http://base/Encounter?patient=Patient/1037201377&_include=Encounter:patient	Always supported, including across partitions and shards.
Reverse Includes (`_revinclude`)	http://base/Patient?_id=Patient/1037201377&_revinclude=Encounter:patient	Reverse includes will only fetch referencing resources on the same partition. So for example, an Encounter can be reverse included from a Patient search because they will be in the same Partition. A Patient could not be reverse included from a Practitioner resource because they are stored in different partitions.
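
The example URLs in the table are shown unencoded for readability. When building such a URL in client code, the identifier token generally needs to be percent-encoded, since it contains characters such as `:`, `/` and `|`. The following minimal sketch uses only the Java standard library; the class and method names are hypothetical, and `http://base` is the placeholder base URL used in the table.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class PatientSearchUrlSketch {

    /**
     * Builds an Encounter search URL for a single Patient identifier,
     * percent-encoding the "system|value" token for use in a query string.
     */
    static String encounterSearchByPatientIdentifier(String baseUrl, String system, String value) {
        String token = system + "|" + value;
        return baseUrl
                + "/Encounter?patient.identifier="
                + URLEncoder.encode(token, StandardCharsets.UTF_8);
    }
}
```

Calling `encounterSearchByPatientIdentifier("http://base", "http://patient", "1")` returns `http://base/Encounter?patient.identifier=http%3A%2F%2Fpatient%7C1`. Only one identifier is accepted by design, reflecting the single-identifier restriction noted in the table.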

14.6.6 Writing Data

When loading data into a MegaScale repository in Patient ID Partition Modes, it is recommended to always use a FHIR Transaction and to group resources together by Patient as much as possible. In other words, if you are loading Patient resources as well as multiple Observation resources for each Patient, your data will load much faster if you put as many Observation resources referring to the same Patient (as well as other resources referring to that same Patient) in the same transaction Bundle.

MegaScale will split the transaction Bundle into multiple sub-transactions (one for each Shard) and will load these sub-transactions one after another. This means that it is possible for the overall transaction to fail because a later sub-transaction fails after an earlier sub-transaction has already succeeded. If you want to avoid this possibility entirely, ensure that each FHIR Transaction Bundle contains only resources that belong to a single Patient, or only resources that are not in any Patient compartment (such as Ancillary Resources).

A good compromise is to include resources in a single Patient compartment as well as any ancillary resources referenced by these Patient resources in a single Transaction Bundle. All resources should be included in the Bundle as either a Conditional Create, a Conditional Update or a plain Update so that the transaction can be retried if it fails without creating duplicate resources.
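
One way to prepare data for loading along these lines is to group pending resources by the Patient they refer to, and then build one transaction Bundle per group. The sketch below follows the stricter option (ancillary resources go into their own group rather than travelling with a Patient). The `PendingResource` type and all names are hypothetical; building and submitting the actual Bundles is left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class TransactionGroupingSketch {

    /** Hypothetical minimal view of a resource waiting to be loaded. */
    static final class PendingResource {
        final String id;
        final String patientReference; // null if not in any Patient compartment

        PendingResource(String id, String patientReference) {
            this.id = id;
            this.patientReference = patientReference;
        }
    }

    /** Key used for resources that are not in any Patient compartment. */
    static final String NO_PATIENT = "";

    /**
     * Groups resources so that each group holds resources for exactly one
     * Patient, plus one group for resources outside any Patient compartment.
     * Each group is a candidate for a single transaction Bundle.
     */
    static Map<String, List<PendingResource>> groupByPatient(List<PendingResource> resources) {
        Map<String, List<PendingResource>> groups = new LinkedHashMap<>();
        for (PendingResource resource : resources) {
            String key = resource.patientReference == null ? NO_PATIENT : resource.patientReference;
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(resource);
        }
        return groups;
    }
}
```

A `LinkedHashMap` is used so that groups come out in first-seen order, which keeps load order predictable when the input is already sorted by Patient.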

14.6.6.1 Conditionally Creating Patient by Identifier

Patient resources and other resources can be created using a Conditional Create, as shown in the example below. Any resources belonging to the same Patient compartment should reference the Patient using the Patient entry's fullUrl, which must contain a Placeholder ID. For this example to work, the http://patient identifier system must be declared as described in Configuring Pre-Assigned Patient Identifier Systems.

{
  "resourceType": "Bundle",
  "type": "transaction",
  "entry": [ {
    "fullUrl": "urn:uuid:c4592eed-14b7-4a19-9ec0-bff03965d489",
    "resource": {
      "resourceType": "Patient",
      "identifier": [ {
        "system": "http://patient",
        "value": "1"
      } ]
    },
    "request": {
      "method": "POST",
      "url": "Patient",
      "ifNoneExist": "Patient?identifier=http://patient|1"
    }
  }, {
    "fullUrl": "urn:uuid:958c64e5-83c1-4261-8174-b0cc210dddd4",
    "resource": {
      "resourceType": "Encounter",
      "identifier": [ {
        "system": "http://encounter",
        "value": "1"
      } ],
      "subject": {
        "reference": "urn:uuid:c4592eed-14b7-4a19-9ec0-bff03965d489"
      }
    },
    "request": {
      "method": "POST",
      "url": "Encounter",
      "ifNoneExist": "Encounter?identifier=http://encounter|1"
    }
  } ]
}

14.6.6.2 Conditionally Updating Patient by Identifier

Patient resources and other resources can be created or updated using a Conditional Update, as shown in the example below. Any resources belonging to the same Patient compartment should reference the Patient using the Patient entry's fullUrl, which must contain a Placeholder ID. For this example to work, the http://patient identifier system must be declared as described in Configuring Pre-Assigned Patient Identifier Systems.

This example also demonstrates a Conditional Update on an Ancillary Resource (the Practitioner), which will be stored in the default partition but may be referenced by resources in other partitions as long as Cross-Partition Reference Mode is set to ALLOWED_UNQUALIFIED.

{
	"resourceType": "Bundle",
	"type": "transaction",
	"entry": [ {
		"fullUrl": "urn:uuid:71f0cbca-7d53-4ca3-a685-f23b0f455256",
		"resource": {
			"resourceType": "Practitioner",
			"identifier": [ {
				"system": "http://practitioner",
				"value": "1"
			} ]
		},
		"request": {
			"method": "PUT",
			"url": "Practitioner?identifier=http://practitioner|1"
		}
	}, {
		"fullUrl": "urn:uuid:5cdc41d7-f1b4-408e-8719-789797c080eb",
		"resource": {
			"resourceType": "Patient",
			"identifier": [ {
				"system": "http://patient",
				"value": "1"
			} ],
			"generalPractitioner": [ {
				"reference": "urn:uuid:71f0cbca-7d53-4ca3-a685-f23b0f455256"
			} ]
		},
		"request": {
			"method": "PUT",
			"url": "Patient?identifier=http://patient|1"
		}
	}, {
		"fullUrl": "urn:uuid:16b16967-bf3a-4e58-82ed-c3c5d8a0605b",
		"resource": {
			"resourceType": "Encounter",
			"identifier": [ {
				"system": "http://encounter",
				"value": "1"
			} ],
			"subject": {
				"reference": "urn:uuid:5cdc41d7-f1b4-408e-8719-789797c080eb"
			},
			"participant": [ {
				"individual": {
					"reference": "urn:uuid:71f0cbca-7d53-4ca3-a685-f23b0f455256"
				}
			} ]
		},
		"request": {
			"method": "PUT",
			"url": "Encounter?identifier=http://encounter|1"
		}
	} ]
}

14.6.7 Overriding Partition Distribution

By default, all ancillary resources are placed in a specific partition according to the rules of the Patient ID Partition Selection Mode being used (see Partition Distribution above). Using an interceptor which runs with a lower order than the built-in partitioning interceptor, it is possible to force specific ancillary resources to be stored in a different partition.

This can be useful if you have specific resource types which are expected to occupy a large amount of space.

The following example shows an interceptor which forces MessageHeader resources to be stored on partition 20000.

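import ca.uhn.fhir.context.FhirContext;
import ca.uhn.fhir.interceptor.api.Hook;
import ca.uhn.fhir.interceptor.api.Interceptor;
import ca.uhn.fhir.interceptor.api.Pointcut;
import ca.uhn.fhir.interceptor.model.ReadPartitionIdRequestDetails;
import ca.uhn.fhir.interceptor.model.RequestPartitionId;
import org.hl7.fhir.instance.model.api.IBaseResource;

@Interceptor
public class MessageHeaderSeparatePartitionInterceptor {

   /**
    * In this example, we are explicitly forcing partition 20000
    * for the MessageHeader resource. This is arbitrary and could
    * be a different partition.
    */
   public static final int PARTITION_ID = 20000;

   /**
    * We want this interceptor to run before the standard interceptor
    * which runs at order 0, so it can override the partition that
    * would otherwise be used.
    */
   public static final int POINTCUT_ORDER = -1;

   /**
    * Used to look up the resource type name of a resource instance.
    * Supplied through the constructor here; adapt this to however
    * your interceptor obtains its FhirContext.
    */
   private final FhirContext myCtx;

   public MessageHeaderSeparatePartitionInterceptor(FhirContext theCtx) {
      myCtx = theCtx;
   }

   /** Selects the partition when a resource is being created. */
   @Hook(value = Pointcut.STORAGE_PARTITION_IDENTIFY_CREATE, order = POINTCUT_ORDER)
   public RequestPartitionId identifyCreate(IBaseResource theResource) {
      String resourceType = myCtx.getResourceType(theResource);
      if ("MessageHeader".equals(resourceType)) {
         return RequestPartitionId.fromPartitionId(PARTITION_ID);
      }

      // If we return null, the standard partitioning interceptor will be used
      return null;
   }

   /** Selects the partition when a resource is being read or searched. */
   @Hook(value = Pointcut.STORAGE_PARTITION_IDENTIFY_READ, order = POINTCUT_ORDER)
   public RequestPartitionId identifyRead(ReadPartitionIdRequestDetails theDetails) {
      String resourceType = theDetails.getResourceType();
      if ("MessageHeader".equals(resourceType)) {
         return RequestPartitionId.fromPartitionId(PARTITION_ID);
      }

      // If we return null, the standard partitioning interceptor will be used
      return null;
   }

}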
