FHIR Bulk Import Operation
Smile CDR supports the use of the FHIR Bulk Import (by manifest) operation to rapidly import a large amount of data. The Bulk Import operation is a draft specification defined at the following URL: https://github.com/smart-on-fhir/bulk-import/blob/master/import-manifest.md
Note that Smile CDR (and HAPI FHIR) implement a modified version of the operation which uses a FHIR Parameters resource as input instead of an arbitrary JSON payload, making it easier for a FHIR client to use.
Bulk Import uses NDJSON files as input, with the expectation that each NDJSON file holds resources of only one type. Multiple files can hold the same resource type. In other words, it is acceptable for two files to contain Patient resources but not for one file to contain both Patient and Encounter resources.
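For example, a valid source file containing Patient resources holds one complete resource per line (the IDs and names below are illustrative only):

```ndjson
{"resourceType":"Patient","id":"pat-1","name":[{"family":"Smith"}]}
{"resourceType":"Patient","id":"pat-2","name":[{"family":"Jones"}]}
```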
To initiate a Bulk Import operation, the /$import operation should be executed against the server base URL. The request body is a Parameters resource, and the request must include a directive requesting asynchronous processing.
An example is shown below:
POST /$import
Prefer: respond-async
Content-Type: application/fhir+json
[payload - described below]
The request payload should resemble the example shown below. Note the following parameters:
- inputFormat: must be application/fhir+ndjson.
- storageDetail.type: must be https. Other input mechanisms will be added in the future. Note that Smile CDR will accept non-HTTPS URLs as a data source in order to simplify testing; however, this should not be used for production / PHI / PII scenarios.
{
"resourceType": "Parameters",
"parameter": [ {
"name": "inputFormat",
"valueCode": "application/fhir+ndjson"
}, {
"name": "inputSource",
"valueUri": "http://example.com/fhir/"
}, {
"name": "storageDetail",
"part": [ {
"name": "type",
"valueCode": "https"
}, {
"name": "credentialHttpBasic",
"valueString": "admin:password"
}, {
"name": "maxBatchResourceCount",
"valueString": "500"
} ]
}, {
"name": "input",
"part": [ {
"name": "type",
"valueCode": "Observation"
}, {
"name": "url",
"valueUri": "https://example.com/observations.ndjson"
} ]
}, {
"name": "input",
"part": [ {
"name": "type",
"valueCode": "Patient"
}, {
"name": "url",
"valueUri": "https://example.com/patients.ndjson"
} ]
} ]
}
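The Parameters payload above can also be assembled programmatically. The following is a minimal sketch; the `build_import_parameters` helper and its argument names are our own and not part of Smile CDR or HAPI FHIR:

```python
import json

def build_import_parameters(input_files, http_basic=None, max_batch=None):
    """Build a Parameters resource for a bulk $import request.

    input_files: list of (resource_type, url) tuples, one per source NDJSON file.
    http_basic:  optional "user:password" credential for the storage server.
    max_batch:   optional maximum resource count per ingestion batch.
    """
    storage_parts = [{"name": "type", "valueCode": "https"}]
    if http_basic:
        storage_parts.append({"name": "credentialHttpBasic", "valueString": http_basic})
    if max_batch:
        storage_parts.append({"name": "maxBatchResourceCount", "valueString": str(max_batch)})

    params = [
        {"name": "inputFormat", "valueCode": "application/fhir+ndjson"},
        {"name": "inputSource", "valueUri": "http://example.com/fhir/"},
        {"name": "storageDetail", "part": storage_parts},
    ]
    # One "input" parameter per source file, each naming its resource type and URL
    for rtype, url in input_files:
        params.append({"name": "input", "part": [
            {"name": "type", "valueCode": rtype},
            {"name": "url", "valueUri": url},
        ]})
    return {"resourceType": "Parameters", "parameter": params}

# Headers required for the asynchronous $import request
HEADERS = {
    "Prefer": "respond-async",
    "Content-Type": "application/fhir+json",
}

payload = build_import_parameters(
    [("Observation", "https://example.com/observations.ndjson"),
     ("Patient", "https://example.com/patients.ndjson")],
    http_basic="admin:password",
    max_batch=500,
)
print(json.dumps(payload, indent=2))
```

The resulting dictionary can then be POSTed to the server base URL with the headers shown earlier, using any HTTP client.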
Assuming the request is successfully accepted by the server, the server will respond with a payload resembling the following:
{
"resourceType": "OperationOutcome",
"issue": [ {
"severity": "information",
"code": "informational",
"diagnostics": "Bulk import job has been submitted with ID: c9398169-aed3-49b1-ac42-0967564bc38c"
} ]
}
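Because the job ID is embedded in the human-readable diagnostics text, a client that wants to record it for later tracking can extract it from the response. The parsing approach below is our own sketch and assumes the diagnostic message ends with "ID: &lt;id&gt;" as in the example above:

```python
import re

def extract_job_id(operation_outcome):
    """Pull the job ID out of an OperationOutcome's diagnostics text.

    Assumes the server's message contains 'ID: <job-id>', as in the
    example response shown above.
    """
    for issue in operation_outcome.get("issue", []):
        m = re.search(r"ID:\s*(\S+)", issue.get("diagnostics", ""))
        if m:
            return m.group(1)
    return None

outcome = {
    "resourceType": "OperationOutcome",
    "issue": [{
        "severity": "information",
        "code": "informational",
        "diagnostics": "Bulk import job has been submitted with ID: c9398169-aed3-49b1-ac42-0967564bc38c"
    }]
}
print(extract_job_id(outcome))  # c9398169-aed3-49b1-ac42-0967564bc38c
```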
Within the source NDJSON data, each line will contain a resource. Resources are ingested using an ingestion pipeline that is automatically parallelized and batched for optimal write performance.
Resources are stored using the resource IDs supplied within each resource body.
FHIR Bulk Import uses a background batch job to asynchronously load data as quickly as possible. If a Message Broker has been configured, work will be distributed across all processes in your cluster.
The Batch Job Executor: Maximum Thread Count setting controls how many concurrent processing threads will be used on each process.
Due to the nature of batch jobs, bulk import does not support the use of auto-create placeholder references.
Jobs that expect placeholder resources to be populated will be allowed to continue, but the results may be unpredictable.