FHIR Repositories can end up with duplicate resources in them. This page details tools that Smile provides to work with duplicated data.
Cause of Duplicates | Recommended Approach |
---|---|
The same entities were imported from different source systems. E.g. Jane Smith's Patient record is imported from the Lab system as Patient/123 and from the Pharmacy system as Patient/456. | Preserve the source data as is so that future data continues to be properly associated to the right patients. Use MDM to establish Golden Resources and MDM LINK records outside of the data and use MDM features like Observation?patient=Patient/123&_mdm=true to link the data at search time. |
Data was accidentally duplicated on import. E.g. the same data was accidentally loaded twice (as POST new resources) or Conditional Create directives failed to match the intended target correctly. This kind of unintended duplication can also occur when translating HL7v2, CDA, or CSV data into FHIR resources. | In the case where the data was accidentally duplicated, it may make sense to "clean up" the duplicates. See below for details on Smile CDR tools to merge such duplicates. |
Smile CDR provides two operations to deduplicate data:
- $merge, which is a backport of the FHIR R5 Patient/$merge specification to FHIR R4
- $replace-references, which performs only the update-references part of the $merge operation
See the FHIR R5 Patient/$merge specification page for a description of this operation. Note that Provenance functionality is not yet supported, but most other parts are. See the bottom of this page for details on the current roadmap for enhancing Smile CDR deduplication functionality.
Name | Type | Default | Notes |
---|---|---|---|
source-patient-identifier | Identifier | | List of source patient identifiers |
source-patient | Reference | | Source patient |
target-patient-identifier | Identifier | | List of target patient identifiers |
target-patient | Reference | | Target patient |
result-patient | Patient | | Optional merged patient resource |
preview | Boolean | false | If true, no changes will be made and the response will summarize what would happen were the merge to occur |
delete-source | Boolean | false | If true, delete the source resource |
resource-limit | Integer | 512 | If the request is synchronous and the number of resources to change exceeds this threshold, the operation will fail with 412 Precondition Failed. This parameter has no effect if the Prefer: respond-async header is set |
See the FHIR R5 Patient/$merge specification for a detailed description of these input parameters.
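As an illustration of how the input parameters in the table fit together, here is a minimal Python sketch that assembles the Parameters body for a $merge request. The function name `build_merge_parameters` and its keyword arguments are hypothetical helpers introduced for this example; only the parameter names and types come from the table above.

```python
import json


def build_merge_parameters(source_ref, target_ref, preview=False,
                           delete_source=False, resource_limit=None):
    """Build a FHIR Parameters resource for POST /Patient/$merge.

    Hypothetical helper: only the parameter names and types are taken
    from the input parameter table.
    """
    parameter = [
        {"name": "source-patient", "valueReference": {"reference": source_ref}},
        {"name": "target-patient", "valueReference": {"reference": target_ref}},
    ]
    # Optional parameters are only sent when they differ from the defaults.
    if preview:
        parameter.append({"name": "preview", "valueBoolean": True})
    if delete_source:
        parameter.append({"name": "delete-source", "valueBoolean": True})
    if resource_limit is not None:
        parameter.append({"name": "resource-limit", "valueInteger": resource_limit})
    return {"resourceType": "Parameters", "parameter": parameter}


body = build_merge_parameters("Patient/2", "Patient/3", preview=True)
print(json.dumps(body, indent=2))
```

The resulting JSON can be sent as the request body with a Content-Type of application/fhir+json.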
resource-limit is a Smile CDR addition to protect users from accidentally changing too many resources at once. If resource-limit is larger than 10000, the value 10000 will be used.
If you request that the operation be performed asynchronously by providing the Prefer: respond-async HTTP header, then the resource-limit parameter is ignored.
When performed asynchronously, the operation is performed in batches of 1024 resource patches at a time, via PATCH transaction Bundles.
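The limit and batching rules above amount to a little arithmetic, sketched here in Python. The constant and function names are made up for this example, and the code only restates the documented numbers (default 512, cap 10000, batches of 1024); it is not the server's implementation.

```python
import math

DEFAULT_RESOURCE_LIMIT = 512   # default from the parameter table
MAX_RESOURCE_LIMIT = 10000     # larger requested values are reduced to this
ASYNC_BATCH_SIZE = 1024        # resource patches per batch when asynchronous


def effective_resource_limit(requested=None, respond_async=False):
    """Return the limit that applies to a request, or None if it is ignored."""
    if respond_async:
        return None  # resource-limit has no effect with Prefer: respond-async
    if requested is None:
        return DEFAULT_RESOURCE_LIMIT
    return min(requested, MAX_RESOURCE_LIMIT)


def async_batch_count(resources_to_patch):
    """Number of batches needed to patch the given number of resources."""
    return math.ceil(resources_to_patch / ASYNC_BATCH_SIZE)


print(effective_resource_limit(50000))        # 10000
print(effective_resource_limit(50000, True))  # None
print(async_batch_count(2500))                # 3
```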
Name | Type | Notes |
---|---|---|
input | Parameters | A copy of the input parameters used in the $merge operation |
outcome | OperationOutcome | Details about the result of the merge |
result | Patient | The merged Patient resource |
task | Task | If the merge operation was performed asynchronously, this Task resource provides details about the status of the merge operation |
Input:
POST /Patient/$merge
Content-Type: application/fhir+json
{
"resourceType": "Parameters",
"parameter": [
{
"name": "source-patient",
"valueReference": {
"reference": "Patient/2"
}
},
{
"name": "target-patient",
"valueReference": {
"reference": "Patient/3"
}
}
]
}
Output:
HTTP/1.1 200 OK
{
"resourceType": "Parameters",
"parameter": [
{
"name": "input",
"resource": {
"resourceType": "Parameters",
"parameter": [
{
"name": "source-patient",
"valueReference": {
"reference": "Patient/2"
}
},
{
"name": "target-patient",
"valueReference": {
"reference": "Patient/3"
}
}
]
}
},
{
"name": "outcome",
"resource": {
"resourceType": "OperationOutcome",
"issue": [
{
"severity": "information",
"details": {
"text": "Merge operation completed successfully."
}
}
]
}
},
{
"name": "result",
"resource": {
"resourceType": "Patient",
"id": "3",
"identifier": [
{
"system": "SYS2A",
"value": "VAL2A"
},
{
"system": "SYS2B",
"value": "VAL2B"
},
{
"system": "SYSC",
"value": "VALC"
},
{
"use": "old",
"system": "SYS1A",
"value": "VAL1A"
},
{
"use": "old",
"system": "SYS1B",
"value": "VAL1B"
}
],
"link": [
{
"other": {
"reference": "Patient/2"
},
"type": "replaces"
}
]
}
}
]
}
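A client will usually want to pull the individual output parameters out of a response like the one above. The following Python sketch shows one way to do that; `get_parameter_resource` and `replaced_references` are hypothetical helper names, and the sample data is a trimmed copy of the response shown above.

```python
def get_parameter_resource(parameters, name):
    """Return the resource carried by the first parameter called `name`, or None."""
    for parameter in parameters.get("parameter", []):
        if parameter.get("name") == name:
            return parameter.get("resource")
    return None


def replaced_references(patient):
    """List the references that the merged Patient reports as replaced."""
    return [
        link["other"]["reference"]
        for link in patient.get("link", [])
        if link.get("type") == "replaces"
    ]


# Trimmed copy of the synchronous $merge response.
response = {
    "resourceType": "Parameters",
    "parameter": [
        {"name": "outcome", "resource": {
            "resourceType": "OperationOutcome",
            "issue": [{"severity": "information",
                       "details": {"text": "Merge operation completed successfully."}}],
        }},
        {"name": "result", "resource": {
            "resourceType": "Patient",
            "id": "3",
            "link": [{"other": {"reference": "Patient/2"}, "type": "replaces"}],
        }},
    ],
}

outcome = get_parameter_resource(response, "outcome")
result = get_parameter_resource(response, "result")
print(outcome["issue"][0]["details"]["text"])
print(replaced_references(result))
```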
Input:
POST /Patient/$merge
Content-Type: application/fhir+json
{
"resourceType": "Parameters",
"parameter": [
{
"name": "source-patient",
"valueReference": {
"reference": "Patient/2"
}
},
{
"name": "target-patient",
"valueReference": {
"reference": "Patient/3"
}
},
{
"name": "preview",
"valueBoolean": true
}
]
}
Output:
HTTP/1.1 200 OK
{
"resourceType": "Parameters",
"parameter": [
{
"name": "input",
"resource": {
"resourceType": "Parameters",
...copy of input...
}
},
{
"name": "outcome",
"resource": {
"resourceType": "OperationOutcome",
"issue": [
{
"severity": "information",
"details": {
"text": "Preview only merge operation - no issues detected"
},
"diagnostics": "Merge would update 25 resources"
}
]
}
},
{
"name": "result",
"resource": {
"resourceType": "Patient",
...merged patient...
}
}
]
}
Input:
POST /Patient/$merge
Content-Type: application/fhir+json
Prefer: respond-async
{
"resourceType": "Parameters",
"parameter": [
{
"name": "source-patient",
"valueReference": {
"reference": "Patient/2"
}
},
{
"name": "target-patient",
"valueReference": {
"reference": "Patient/3"
}
}
]
}
Output:
HTTP/1.1 202 Accepted
{
"resourceType": "Parameters",
"parameter": [
{
"name": "input",
"resource": {
"resourceType": "Parameters",
...copy of input...
}
},
{
"name": "task",
"resource": {
"resourceType": "Task",
"id": "352",
"identifier": [
{
"system": "http://hapifhir.io/batch/jobId",
"value": "26738f4d-266c-4ef6-934f-1d13b1b474b9"
}
],
"status": "in-progress"
}
}
]
}
You can poll the status of the returned Task resource to see when it completes. For more detailed status about the background job, you can view the status of the corresponding Smile CDR batch job either through the Web Admin Console or through the Admin JSON API. The id of the Smile CDR batch job is provided as an identifier on the returned Task.
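Polling the Task could look roughly like the Python sketch below. It is written against a caller-supplied `fetch_task` function (for example, one that performs GET /Task/352 and returns the parsed JSON) so that no particular HTTP client is assumed. The set of statuses treated as finished is an assumption based on the standard FHIR Task status codes; this page only shows "in-progress".

```python
import time

# Assumption: these standard FHIR Task statuses mean the job has stopped.
TERMINAL_STATUSES = {"completed", "failed", "cancelled", "rejected", "entered-in-error"}


def wait_for_task(fetch_task, task_id, interval_seconds=5.0, max_polls=60,
                  sleep=time.sleep):
    """Poll a Task until it reaches a terminal status, then return it.

    fetch_task is any callable that takes a Task id and returns the Task
    resource as a dict. Raises TimeoutError after max_polls attempts.
    """
    for attempt in range(max_polls):
        task = fetch_task(task_id)
        if task.get("status") in TERMINAL_STATUSES:
            return task
        if attempt < max_polls - 1:
            sleep(interval_seconds)
    raise TimeoutError(f"Task/{task_id} did not finish after {max_polls} polls")


# Stubbed fetcher standing in for a real GET /Task/{id} call.
_statuses = iter(["in-progress", "in-progress", "completed"])


def fake_fetch(task_id):
    return {"resourceType": "Task", "id": task_id, "status": next(_statuses)}


final = wait_for_task(fake_fetch, "352", sleep=lambda seconds: None)
print(final["status"])
```

Passing the fetcher and the sleep function as arguments keeps the loop easy to test and lets the caller decide how authentication and base URLs are handled.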
The $replace-references operation searches for all resources in the repository that have a reference to the source resource, and updates those references to point to the target resource. It is a simplified form of the $merge operation when all you want to do is update references.
This operation creates a Transaction Bundle of Patch operations to update the references and returns the output of performing that transaction.
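To make the effect on a single resource concrete, the Python sketch below rewrites matching references inside a resource held as a dict. It is a local illustration of the idea only: the server performs the change through a Patch transaction as described above, and the `replace_references` function and the sample CarePlan content are invented for this example.

```python
def replace_references(node, source, target):
    """Recursively replace every "reference" equal to source with target.

    Mutates node in place and returns the number of references changed.
    """
    changed = 0
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "reference" and value == source:
                node[key] = target
                changed += 1
            else:
                changed += replace_references(value, source, target)
    elif isinstance(node, list):
        for item in node:
            changed += replace_references(item, source, target)
    return changed


# Hypothetical resource that points at the source patient.
care_plan = {
    "resourceType": "CarePlan",
    "id": "62",
    "subject": {"reference": "Patient/2"},
    "careTeam": [{"reference": "CareTeam/9"}],
}

count = replace_references(care_plan, "Patient/2", "Patient/3")
print(count, care_plan["subject"]["reference"])
```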
Name | Type | Default | Notes |
---|---|---|---|
source-reference-id | String | | The id of the source resource reference to be replaced |
target-reference-id | String | | The id of the target resource reference that the references will be replaced with |
resource-limit | Integer | 512 | If the request is synchronous and the number of resources to change exceeds this threshold, the operation will fail with 412 Precondition Failed. This parameter has no effect if the Prefer: respond-async header is set |
The resource-limit parameter is available to control how many resources can be changed by this operation. If resource-limit is larger than 10000, the value 10000 will be used.
If you request that the operation be performed asynchronously by providing the Prefer: respond-async HTTP header, then the resource-limit parameter is ignored.
When performed asynchronously, the operation is performed in batches of 1024 resource patches at a time, via PATCH transaction Bundles.
Name | Type | Notes |
---|---|---|
outcome | Bundle | The result of the Bundle patch transaction |
task | Task | If the operation was performed asynchronously, this Task resource provides details about the status of the operation |
See the $merge operation above for details about the returned Task resource in the case when the operation is performed asynchronously.
Input:
POST /$replace-references
Content-Type: application/fhir+json
{
"resourceType": "Parameters",
"parameter": [
{
"name": "source-reference-id",
"valueString": "Patient/2"
},
{
"name": "target-reference-id",
"valueString": "Patient/3"
}
]
}
Output:
HTTP/1.1 200 OK
{
"resourceType": "Parameters",
"parameter": [
{
"name": "outcome",
"resource": {
"resourceType": "Bundle",
"id": "782add05-549c-4a7e-a687-38c22f2f12d0",
"type": "transaction-response",
"entry": [
{
"response": {
"status": "200 OK",
"location": "CarePlan/62/_history/2",
"etag": "2",
"outcome": {
"resourceType": "OperationOutcome",
"issue": [
{
"severity": "information",
"code": "informational",
"details": {
"coding": [
{
"system": "https://hapifhir.io/fhir/CodeSystem/hapi-fhir-storage-response-code",
"code": "SUCCESSFUL_PATCH",
"display": "Patch succeeded."
}
]
},
"diagnostics": "Successfully patched resource \"CarePlan/62/_history/2\"."
}
]
}
}
},
... etc outcome of the rest of the patch operations ...
]
}
}
]
}
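The outcome Bundle contains one entry per patch, so a client may want to reduce it to a short summary. The Python sketch below does this under the assumption that a response status beginning with "2" indicates a successful patch. The helper name `summarize_patch_results` is hypothetical, and the second entry in the sample data is invented to show the failure path.

```python
def summarize_patch_results(bundle):
    """Split transaction-response entries into patched and failed locations."""
    patched, failed = [], []
    for entry in bundle.get("entry", []):
        response = entry.get("response", {})
        status = response.get("status", "")
        # Assumption: any 2xx status means the patch was applied.
        if status.startswith("2"):
            patched.append(response.get("location"))
        else:
            failed.append((response.get("location"), status))
    return patched, failed


bundle = {
    "resourceType": "Bundle",
    "type": "transaction-response",
    "entry": [
        # Taken from the example output above.
        {"response": {"status": "200 OK", "location": "CarePlan/62/_history/2"}},
        # Invented entry, included only to exercise the failure branch.
        {"response": {"status": "412 Precondition Failed", "location": "Encounter/7"}},
    ],
}

patched, failed = summarize_patch_results(bundle)
print(patched)
print(failed)
```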
The Deduplication features of Smile CDR are under active development. Here is a roadmap of new features we are planning to roll out.
- $merge and $replace-references
- $merge to all Resource types
- $undo-merge and $undo-replace-references operations that use the Provenance resources to undo the effects of those operations (assuming that none of the affected resources have subsequently changed).
- MATCH_AND_MERGE which is similar to MATCH_ONLY in that it uses MDM Rules but does not create any links or Golden Resources. When enabled, a MATCH_AND_MERGE MDM module will automatically perform a $merge operation on all inbound resources. Any resources that MATCH a single target will be merged into that resource following the MDM Survivorship rules, and all references will be updated to point to the merged resource.
- $deduplicate operation that takes a FHIR Bundle resource as input and uses MDM Rules to remove duplicates from the inbound Bundle and update references to point to the matched resource. Think of this as a stronger version of Conditional Create where you now have the full power of MDM matching to find matching resources rather than being limited by the FHIR Conditional Create syntax.
- $deduplicate operation as a Camel Processor
- $submit-for-deduplication operation that works like $mdm-submit and performs MATCH_AND_MERGE on all resources that match the criteria in the request. E.g. if Organization resources with _source=ABC were accidentally duplicated in your FHIR Repository, you could call $submit-for-deduplication with the criteria Organization?_source=ABC to submit all of those Organizations for deduplication. Ones that have existing matches would be deleted and all references updated to point to the remaining copy of that organization.