Smile CDR v2022.08.PRE

26.1 FHIR Bulk Import
Trial

Smile CDR supports the FHIR Bulk Import (by manifest) operation for rapidly importing large amounts of data. The Bulk Import operation is defined by a draft specification at the following URL: https://github.com/smart-on-fhir/bulk-import/blob/master/import-manifest.md

Note that Smile CDR (and HAPI FHIR) implement a modified version of the operation that takes a FHIR Parameters resource as input instead of an arbitrary JSON payload, which makes it easier to invoke from a FHIR client.

Bulk Import uses NDJSON files as input, with the expectation that each NDJSON file holds resources of only one type. Multiple files can hold the same resource type. In other words, it is acceptable for two files to contain Patient resources but not for one file to contain both Patient and Encounter resources.
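
Because a file that mixes resource types violates this expectation, it can be worth checking files before submitting them. The following Python sketch is one way to do that; it is a client-side helper of our own, not part of Smile CDR, and the name `single_resource_type` is hypothetical. It returns the single resource type found in an NDJSON file and raises an error on the first line whose type differs.

```python
import json
from typing import Iterable, Optional


def single_resource_type(lines: Iterable[str]) -> Optional[str]:
    """Return the one resourceType found in NDJSON lines, or raise ValueError.

    Hypothetical pre-upload check. Blank lines are skipped; returns None
    if there are no resources at all.
    """
    found = None
    for line_no, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines, e.g. a trailing newline
        resource_type = json.loads(line).get("resourceType")
        if not resource_type:
            raise ValueError(f"line {line_no}: missing resourceType")
        if found is None:
            found = resource_type
        elif resource_type != found:
            raise ValueError(
                f"line {line_no}: found {resource_type!r}, expected {found!r}"
            )
    return found


if __name__ == "__main__":
    sample = [
        '{"resourceType": "Patient", "id": "pt-1"}',
        '{"resourceType": "Patient", "id": "pt-2"}',
    ]
    print(single_resource_type(sample))  # prints Patient
```

An open file object can be passed directly, since iterating over it yields one line at a time.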

26.1.1 Triggering a Bulk Import
Trial

Initiating a Bulk Import requires the FHIR_OP_INITIATE_BULK_DATA_IMPORT permission.

To initiate a Bulk Import, invoke the $import operation against the server base URL (that is, POST to /$import). The request body is a Parameters resource, and the request must include the Prefer: respond-async header to request asynchronous processing.

An example is shown below:

POST /$import
Prefer: respond-async
Content-Type: application/fhir+json

[payload - described below]
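
The request above can be prepared with the Python standard library, as in the minimal sketch below. The helper name `build_import_request` and the base URL http://localhost:8000 are placeholders of our own; authenticating to Smile CDR itself (which needs a user holding the permission mentioned above) is not shown, and the request is built but not sent.

```python
import json
import urllib.request


def build_import_request(base_url: str, parameters: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST to <base_url>/$import.

    Hypothetical helper; `parameters` is the Parameters resource payload.
    """
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/$import",
        data=json.dumps(parameters).encode("utf-8"),
        method="POST",
        headers={
            # Bulk Import must be requested asynchronously
            "Prefer": "respond-async",
            "Content-Type": "application/fhir+json",
        },
    )
```

Passing the returned object to urllib.request.urlopen() would send it.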

Request Payload

The request payload should resemble the example shown below. Note the following parameters:

  • inputFormat – Must be application/fhir+ndjson.
  • inputSource – (optional) Can be used to indicate the base URL of the FHIR server that the source data came from. This value is described in the Bulk Import specification, but is not currently used by Smile CDR.
  • storageDetail – Holds information about how to access the source data.
    • type – Must be https. Other input mechanisms will be added in the future. Note that Smile CDR will accept non-HTTPS URLs as a data source in order to simplify testing; however, this should not be used for production, PHI, or PII scenarios.
    • credentialHttpBasic – (optional) An HTTP Basic Authorization credential (username:password, as in the example) that Smile CDR will supply when fetching the NDJSON files.
    • maxBatchResourceCount – (optional) The maximum number of resources to process in a single database transaction. Each batch is loaded completely into memory, so there is a practical upper limit for this setting. The default value is 500, which is generally a good value to use.
  • input – (can repeat) Describes a single source data file.
    • type – The resource type contained in this file.
    • url – The URL used to access the file.
{
  "resourceType": "Parameters",
  "parameter": [ {
    "name": "inputFormat",
    "valueCode": "application/fhir+ndjson"
  }, {
    "name": "inputSource",
    "valueUrl": "http://example.com/fhir/"
  }, {
    "name": "storageDetail",
    "part": [ {
      "name": "type",
      "valueCode": "https"
    }, {
      "name": "credentialHttpBasic",
      "valueString": "admin:password"
    }, {
      "name": "maxBatchResourceCount",
      "valueString": "500"
    } ]
  }, {
    "name": "input",
    "part": [ {
      "name": "type",
      "valueCode": "Observation"
    }, {
      "name": "url",
      "valueUrl": "https://example.com/observations.ndjson"
    } ]
  }, {
    "name": "input",
    "part": [ {
      "name": "type",
      "valueCode": "Patient"
    }, {
      "name": "url",
      "valueUrl": "https://example.com/patients.ndjson"
    } ]
  } ]
}
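
When the list of input files is generated dynamically, building this payload by hand is error-prone. The Python sketch below assembles a Parameters resource with the same structure as the example above; the function and argument names are hypothetical, and only the parameter names and values come from this page. Optional parameters are omitted when not supplied.

```python
from typing import Iterable, Optional, Tuple


def build_import_parameters(
    inputs: Iterable[Tuple[str, str]],
    input_source: Optional[str] = None,
    credential_http_basic: Optional[str] = None,
    max_batch_resource_count: Optional[int] = None,
) -> dict:
    """Build a $import Parameters resource shaped like the example above.

    `inputs` is a sequence of (resource type, NDJSON file URL) pairs.
    """
    params = [{"name": "inputFormat", "valueCode": "application/fhir+ndjson"}]
    if input_source is not None:
        params.append({"name": "inputSource", "valueUrl": input_source})

    storage_parts = [{"name": "type", "valueCode": "https"}]
    if credential_http_basic is not None:
        storage_parts.append(
            {"name": "credentialHttpBasic", "valueString": credential_http_basic}
        )
    if max_batch_resource_count is not None:
        # The example sends this number as a string value
        storage_parts.append(
            {"name": "maxBatchResourceCount", "valueString": str(max_batch_resource_count)}
        )
    params.append({"name": "storageDetail", "part": storage_parts})

    # One repeating "input" parameter per source file
    for resource_type, url in inputs:
        params.append({
            "name": "input",
            "part": [
                {"name": "type", "valueCode": resource_type},
                {"name": "url", "valueUrl": url},
            ],
        })
    return {"resourceType": "Parameters", "parameter": params}
```

Calling it with the two files, input source, credential, and batch size from the example and serializing the result with json.dumps(..., indent=2) reproduces the payload above.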

Response Payload

Assuming the server accepts the request, it will respond with a payload resembling the following:

{
  "resourceType": "OperationOutcome",
  "issue": [ {
    "severity": "information",
    "code": "informational",
    "diagnostics": "Bulk import job has been submitted with ID: c9398169-aed3-49b1-ac42-0967564bc38c"
  } ]
}
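
A client that needs to keep track of the submitted job can pull the ID out of the diagnostics text. The Python sketch below assumes the diagnostics message keeps the "submitted with ID: <id>" wording shown in the example; that wording is not documented as a stable format, so treat this as a convenience rather than a contract. The function name is hypothetical.

```python
import re
from typing import Optional

# Assumption: the job ID follows "ID:" in the diagnostics text, as in the example.
_JOB_ID = re.compile(r"ID:\s*(\S+)")


def extract_job_id(operation_outcome: dict) -> Optional[str]:
    """Return the Bulk Import job ID from the OperationOutcome, or None."""
    for issue in operation_outcome.get("issue", []):
        match = _JOB_ID.search(issue.get("diagnostics", ""))
        if match:
            return match.group(1)
    return None
```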

26.1.2 Methodology
Trial

Within the source NDJSON data, each line contains a single resource. Resources are ingested by a pipeline that is automatically parallelized and batched to maximize write performance.
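
To illustrate what "batched" means in terms of the maxBatchResourceCount setting, the Python sketch below splits NDJSON lines into lists of at most a given number of parsed resources. This is a conceptual illustration only, not Smile CDR's actual ingestion pipeline, whose internals are not described here; the function name is hypothetical. As noted above, each batch is held entirely in memory, which is why the batch size has a practical upper limit.

```python
import json
from itertools import islice
from typing import Iterable, Iterator, List


def ndjson_batches(lines: Iterable[str], batch_size: int = 500) -> Iterator[List[dict]]:
    """Yield lists of at most `batch_size` parsed resources (one per NDJSON line).

    Conceptual sketch only; 500 mirrors the documented default batch size.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # Lazily parse non-blank lines so only one batch is materialized at a time
    resources = (json.loads(line) for line in lines if line.strip())
    while True:
        batch = list(islice(resources, batch_size))
        if not batch:
            return
        yield batch
```

With 1,201 resources and the default size, this yields batches of 500, 500, and 201 resources.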

Resources are stored using the resource IDs supplied within the resource body:

  • If a resource already exists with a given ID, the existing resource will be updated to match the contents supplied.
  • If no resource already exists, one will be created using the supplied ID. Note that the supplied ID must be an acceptable value for a Client Assigned ID. This means that unless the Client ID Mode has been changed from its default configuration, resources may not be created with a purely numeric ID.
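
Because purely numeric IDs are rejected for new resources under the default Client ID Mode, a pre-flight scan of the source data can catch problems before a long import runs. The Python sketch below flags resources whose id is purely numeric. It also flags resources with no id at all; that check is our own assumption, since this page describes storage only in terms of IDs supplied in the resource body and does not say how ID-less resources are handled. The function name is hypothetical.

```python
from typing import Iterable, List


def find_id_problems(resources: Iterable[dict]) -> List[str]:
    """Describe resources that may not import cleanly under the default Client ID Mode.

    Flags purely numeric IDs (documented restriction) and missing IDs (assumption).
    """
    problems = []
    for index, resource in enumerate(resources):
        resource_id = resource.get("id")
        label = f"{resource.get('resourceType', '?')} #{index}"
        if not resource_id:
            problems.append(f"{label}: no id supplied")
        elif resource_id.isdigit():
            # e.g. "123" would need a non-default Client ID Mode
            problems.append(f"{label}: purely numeric id {resource_id!r}")
    return problems
```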

26.1.3 Performance
Trial

FHIR Bulk Import uses a background batch job to asynchronously load data as quickly as possible. If a Message Broker has been configured, work will be distributed across all processes in your cluster.

The Batch Job Executor: Maximum Thread Count setting controls how many concurrent threads each process uses for batch processing.
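
Since each batch is loaded completely into memory, the thread count and the batch size together bound how much data may be in flight. The Python sketch below gives a rough cluster-wide upper bound under two assumptions of our own: every thread works on at most one batch at a time, and every process participates (that is, a Message Broker is configured). The function name is hypothetical, and actual memory use also depends on resource size.

```python
def max_resources_in_memory(processes: int, threads_per_process: int, batch_size: int = 500) -> int:
    """Rough upper bound on resources held in memory at once, cluster-wide.

    Assumes one batch per thread at a time and that all processes take part.
    """
    if min(processes, threads_per_process, batch_size) < 1:
        raise ValueError("all arguments must be at least 1")
    return processes * threads_per_process * batch_size


# Example: 3 processes x 4 threads x 500 resources per batch
print(max_resources_in_memory(3, 4))  # prints 6000
```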