FHIR Repositories can end up with duplicate resources in them. This page details tools that Smile provides to work with duplicated data.
Cause of Duplicates | Recommended Approach |
---|---|
The same entities were imported from different source systems. E.g. Jane Smith's Patient record is imported from the Lab system as Patient/123 and from the Pharmacy system as Patient/456. | Preserve the source data as is so that future data continues to be properly associated to the right patients. Use MDM to establish Golden Resources and MDM LINK records outside of the data and use MDM features like Observation?patient=Patient/123&_mdm=true to link the data at search time. |
Data was accidentally duplicated on import. E.g. the same data was accidentally loaded twice (as POST new resources) or Conditional Create directives failed to match the intended target correctly. This kind of unintended duplication can also occur when translating HL7v2, CDA, or CSV data into FHIR resources. | In the case where the data was accidentally duplicated, it may make sense to "clean up" the duplicates. See below for details on Smile CDR tools to merge such duplicates. |
Smile CDR provides two operations to deduplicate data:
- $merge, which is a backport of the FHIR R5 Patient/$merge specification to FHIR R4
- $hapi.fhir.replace-references, which performs only the update-references part of the $merge operation

See the FHIR R5 Patient/$merge specification page for a description of the $merge operation. See the bottom of this page for details on the current roadmap for enhancing Smile CDR deduplication functionality.
Name | Type | Default | Notes |
---|---|---|---|
source-patient-identifier | Identifier | | List of source patient identifiers |
source-patient | Reference | | Source patient |
target-patient-identifier | Identifier | | List of target patient identifiers |
target-patient | Reference | | Target patient |
result-patient | Patient | | Optional merged patient resource |
preview | Boolean | false | If true, no changes will be made and response will summarize what would happen were the merge to occur |
delete-source | Boolean | false | If true, delete the source resource |
resource-limit | Integer | 512 | If the request is synchronous and the number of resources to change exceeds this threshold, the operation will fail with 412 Precondition Failed. This parameter has no effect if the Prefer: respond-async header is set |
See the FHIR R5 Patient/$merge specification for a detailed description of these input parameters.
resource-limit is a Smile CDR addition to protect users from accidentally changing too many resources at once. If resource-limit is larger than 10000, the value 10000 will be used.
If you request that the operation be performed asynchronously by providing the Prefer: respond-async HTTP header, then the resource-limit parameter is ignored.
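The rules above for resource-limit can be modelled in a few lines. This is a minimal client-side sketch for reasoning about which limit applies, not Smile CDR's implementation; the function names are hypothetical, and the handling of zero or negative values is not described on this page and is not modelled here.

```python
DEFAULT_RESOURCE_LIMIT = 512   # default from the parameter table
MAX_RESOURCE_LIMIT = 10_000    # larger requested values are capped to this


def effective_resource_limit(requested=None, respond_async=False):
    """Return the limit that applies, or None when the limit is ignored."""
    if respond_async:
        # With "Prefer: respond-async" the parameter has no effect.
        return None
    if requested is None:
        return DEFAULT_RESOURCE_LIMIT
    return min(requested, MAX_RESOURCE_LIMIT)


def expected_status(resources_to_change, requested=None, respond_async=False):
    """Return the HTTP status a caller could expect under this model."""
    limit = effective_resource_limit(requested, respond_async)
    if limit is None:
        return 202  # accepted for asynchronous processing
    if resources_to_change > limit:
        return 412  # Precondition Failed
    return 200


print(effective_resource_limit(50_000))             # capped to 10000
print(expected_status(513))                         # over the default of 512
print(expected_status(20_000, respond_async=True))  # limit ignored
```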
When performed asynchronously, the operation is performed in batches of 1024 resource patches at a time, via PATCH transaction Bundles.
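To get a feel for how many transaction Bundles an asynchronous run would produce, the batching can be sketched as plain list slicing. This only illustrates the arithmetic; the helper name and the sample resource ids are made up, and the server's actual batching code is not shown on this page.

```python
BATCH_SIZE = 1024  # patches per transaction Bundle, as stated above


def batches(items, size=BATCH_SIZE):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Hypothetical list of 2500 resources that reference the source patient.
to_patch = [f"Observation/{n}" for n in range(2500)]
sizes = [len(batch) for batch in batches(to_patch)]
print(sizes)  # [1024, 1024, 452]
```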
Name | Type | Notes |
---|---|---|
input | Parameters | A copy of the input parameters used in the $merge operation |
outcome | OperationOutcome | Details about the result of the merge |
result | Patient | The merged Patient resource |
task | Task | If the merge operation was performed asynchronously, this Task resource provides details about the status of the merge operation |
With the 2025.08 release, the $merge operation creates a Provenance resource upon successful completion. This Provenance resource contains, in its target element, the versioned references to the target patient, the source patient (if not deleted during the operation), and all other resources updated as part of the operation. The Provenance.activity is set to http://terminology.hl7.org/CodeSystem/iso-21089-lifecycle|merge, and the Provenance.agent.who is populated with a logical reference to the request user.
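A client that audits merges might inspect such a Provenance resource along the following lines. This is a sketch under assumptions: the helper names are hypothetical, and the sample resource is hand-written and trimmed for illustration (it omits agent and recorded), not captured server output.

```python
LIFECYCLE_SYSTEM = "http://terminology.hl7.org/CodeSystem/iso-21089-lifecycle"


def has_activity(provenance, code, system=LIFECYCLE_SYSTEM):
    """True if Provenance.activity carries the given system|code coding."""
    codings = provenance.get("activity", {}).get("coding", [])
    return any(c.get("system") == system and c.get("code") == code
               for c in codings)


def target_references(provenance):
    """Return the (versioned) references listed in Provenance.target."""
    return [t["reference"] for t in provenance.get("target", [])
            if "reference" in t]


# Hand-written, trimmed sample; ids and versions are invented.
sample = {
    "resourceType": "Provenance",
    "target": [
        {"reference": "Patient/3/_history/2"},
        {"reference": "Patient/2/_history/2"},
        {"reference": "Observation/10/_history/3"},
    ],
    "activity": {"coding": [{"system": LIFECYCLE_SYSTEM, "code": "merge"}]},
}

print(has_activity(sample, "merge"))
print(target_references(sample))
```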
Input:
POST /Patient/$merge
Content-Type: application/fhir+json
{
"resourceType": "Parameters",
"parameter": [
{
"name": "source-patient",
"valueReference": {
"reference": "Patient/2"
}
},
{
"name": "target-patient",
"valueReference": {
"reference": "Patient/3"
}
}
]
}
Output:
HTTP/1.1 200 OK
{
"resourceType": "Parameters",
"parameter": [
{
"name": "input",
"resource": {
"resourceType": "Parameters",
"parameter": [
{
"name": "source-patient",
"valueReference": {
"reference": "Patient/2"
}
},
{
"name": "target-patient",
"valueReference": {
"reference": "Patient/3"
}
}
]
}
},
{
"name": "outcome",
"resource": {
"resourceType": "OperationOutcome",
"issue": [
{
"severity": "information",
"details": {
"text": "Merge operation completed successfully."
}
}
]
}
},
{
"name": "result",
"resource": {
"resourceType": "Patient",
"id": "3",
"identifier": [
{
"system": "SYS2A",
"value": "VAL2A"
},
{
"system": "SYS2B",
"value": "VAL2B"
},
{
"system": "SYSC",
"value": "VALC"
},
{
"use": "old",
"system": "SYS1A",
"value": "VAL1A"
},
{
"use": "old",
"system": "SYS1B",
"value": "VAL1B"
}
],
"link": [
{
"other": {
"reference": "Patient/2"
},
"type": "replaces"
}
]
}
}
]
}
Input:
POST /Patient/$merge
Content-Type: application/fhir+json
{
"resourceType": "Parameters",
"parameter": [
{
"name": "source-patient",
"valueReference": {
"reference": "Patient/2"
}
},
{
"name": "target-patient",
"valueReference": {
"reference": "Patient/3"
}
},
{
"name": "preview",
"valueBoolean": true
}
]
}
Output:
HTTP/1.1 200 OK
{
"resourceType": "Parameters",
"parameter": [
{
"name": "input",
"resource": {
"resourceType": "Parameters",
...copy of input...
}
},
{
"name": "outcome",
"resource": {
"resourceType": "OperationOutcome",
"issue": [
{
"severity": "information",
"details": {
"text": "Preview only merge operation - no issues detected"
},
"diagnostics": "Merge would update 25 resources"
}
]
}
},
{
"name": "result",
"resource": {
"resourceType": "Patient",
...merged patient...
}
}
]
}
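Request bodies like the ones above can be assembled programmatically. The sketch below builds the Parameters resource with the parameter names from the input table; the function name is hypothetical, and only the reference-based source and target parameters are covered. Using Python booleans ensures that valueBoolean is serialized as a JSON boolean rather than a string.

```python
import json


def build_merge_parameters(source_ref, target_ref, preview=False,
                           delete_source=False, resource_limit=None):
    """Build a FHIR Parameters body for POST /Patient/$merge."""
    params = [
        {"name": "source-patient", "valueReference": {"reference": source_ref}},
        {"name": "target-patient", "valueReference": {"reference": target_ref}},
    ]
    # Optional parameters are only sent when they differ from the defaults.
    if preview:
        params.append({"name": "preview", "valueBoolean": True})
    if delete_source:
        params.append({"name": "delete-source", "valueBoolean": True})
    if resource_limit is not None:
        params.append({"name": "resource-limit", "valueInteger": resource_limit})
    return {"resourceType": "Parameters", "parameter": params}


body = json.dumps(build_merge_parameters("Patient/2", "Patient/3", preview=True))
print(body)
```

The resulting string can be sent with any HTTP client using the Content-Type: application/fhir+json header.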
Input:
POST /Patient/$merge
Content-Type: application/fhir+json
Prefer: respond-async
{
"resourceType": "Parameters",
"parameter": [ {
"name": "source-patient-identifier",
"valueIdentifier": {
"system" : "urn:oid:1.2.36.146.595.217.0.1",
"value" : "12345"
}
},
{
"name": "target-patient-identifier",
"valueIdentifier": {
"system" : "urn:oid:1.2.36.146.595.217.0.1",
"value" : "12346"
}
}
]
}
Output:
HTTP/1.1 202 Accepted
{
"resourceType": "Parameters",
"parameter": [
{
"name": "input",
"resource": {
"resourceType": "Parameters",
...copy of input...
}
},
{
"name": "task",
"resource": {
"resourceType": "Task",
"id": "352",
"identifier": [
{
"system": "http://hapifhir.io/batch/jobId",
"value": "26738f4d-266c-4ef6-934f-1d13b1b474b9"
}
],
"status": "in-progress"
}
}
]
}
You can poll the status of the returned Task resource to see when it completes. For more detailed status about the background job, you can view the status of the corresponding Smile CDR batch job either through the Web Admin Console, or through the Admin JSON API. The id of the Smile CDR batch job is provided as an identifier on the returned Task.
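Polling could look roughly like the sketch below. It assumes the caller supplies a fetch_task callable that reads the Task (for example via GET [base]/Task/352) and returns it as a dict; that callable, the function names, and the retry settings are all hypothetical. The set of terminal statuses is taken from the FHIR R4 Task status codes, and this page does not state which of them Smile CDR actually sets.

```python
import time

JOB_ID_SYSTEM = "http://hapifhir.io/batch/jobId"
# Assumption: statuses from the FHIR R4 Task status value set that end a task.
TERMINAL_STATUSES = {"completed", "failed", "cancelled", "rejected",
                     "entered-in-error"}


def batch_job_id(task):
    """Return the batch job id carried as an identifier on the Task."""
    for identifier in task.get("identifier", []):
        if identifier.get("system") == JOB_ID_SYSTEM:
            return identifier.get("value")
    return None


def poll_task(fetch_task, interval_seconds=2.0, max_attempts=30,
              sleep=time.sleep):
    """Call fetch_task() until the Task reaches a terminal status."""
    for attempt in range(max_attempts):
        task = fetch_task()
        if task.get("status") in TERMINAL_STATUSES:
            return task
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    raise TimeoutError("Task did not finish within the allowed attempts")


# Demo with canned responses instead of real HTTP calls.
canned = iter([
    {"resourceType": "Task", "id": "352", "status": "in-progress"},
    {"resourceType": "Task", "id": "352", "status": "completed",
     "identifier": [{"system": JOB_ID_SYSTEM,
                     "value": "26738f4d-266c-4ef6-934f-1d13b1b474b9"}]},
])
final = poll_task(lambda: next(canned), sleep=lambda _seconds: None)
print(final["status"], batch_job_id(final))
```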
The $hapi.fhir.replace-references operation searches for all resources in the repository that have a reference to the source resource, and updates those references to point to the target resource. It is a simplified form of the $merge operation when all you want to do is update references.
This operation creates a Transaction Bundle of Patch operations to update the references and returns the output of performing that transaction.
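The effect of each patch on a single resource can be pictured with a small in-memory model. This is only an illustration of what "replacing references" means for one resource; the server performs the change through Patch operations in a transaction, and this sketch matches plain relative references exactly (versioned references and absolute URLs are not handled). The function name and sample CarePlan are hypothetical.

```python
def replace_references(node, source, target):
    """Rewrite matching `reference` values in place; return how many changed."""
    changed = 0
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "reference" and value == source:
                node[key] = target  # same key, so the dict size is unchanged
                changed += 1
            else:
                changed += replace_references(value, source, target)
    elif isinstance(node, list):
        for item in node:
            changed += replace_references(item, source, target)
    return changed


# Hand-written sample resource with two references to the source patient.
care_plan = {
    "resourceType": "CarePlan",
    "id": "62",
    "subject": {"reference": "Patient/2"},
    "activity": [
        {"detail": {"performer": [{"reference": "Patient/2"},
                                  {"reference": "Practitioner/9"}]}}
    ],
}

count = replace_references(care_plan, "Patient/2", "Patient/3")
print(count, care_plan["subject"]["reference"])
```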
Name | Type | Default | Notes |
---|---|---|---|
source-reference-id | String | | The id of the source resource reference to be replaced |
target-reference-id | String | | The id of the target resource reference that the references will be replaced with |
resource-limit | Integer | 512 | If the request is synchronous and the number of resources to change exceeds this threshold, the operation will fail with 412 Precondition Failed. This parameter has no effect if the Prefer: respond-async header is set |
The resource-limit parameter is available to control how many resources can be changed by this operation. If resource-limit is larger than 10000, the value 10000 will be used.
If you request that the operation be performed asynchronously by providing the Prefer: respond-async HTTP header, then the resource-limit parameter is ignored.
When performed asynchronously, the operation is performed in batches of 1024 resource patches at a time, via PATCH transaction Bundles.
Name | Type | Notes |
---|---|---|
outcome | Bundle | The result of the Bundle patch transaction |
task | Task | If the operation was performed asynchronously, this Task resource provides details about the status of the operation |
See the $merge operation above for details about the returned Task resource in the case when the operation is performed asynchronously.
With the 2025.08 release, the $hapi.fhir.replace-references operation creates a Provenance resource upon successful completion. This Provenance resource contains, in its target element, the versioned references to the target resource, the source resource, and the resources updated as part of the operation. The Provenance.activity is set to http://terminology.hl7.org/CodeSystem/iso-21089-lifecycle|link, and the Provenance.agent.who is populated with a logical reference to the request user. Note that a Provenance resource for this operation is not created if no resources were actually updated because the source resource is not referenced by any resources.
Input:
POST /$hapi.fhir.replace-references
Content-Type: application/fhir+json
{
"resourceType": "Parameters",
"parameter": [
{
"name": "source-reference-id",
"valueString": "Patient/2"
},
{
"name": "target-reference-id",
"valueString": "Patient/3"
}
]
}
Output:
HTTP/1.1 200 OK
{
"resourceType": "Parameters",
"parameter": [
{
"name": "outcome",
"resource": {
"resourceType": "Bundle",
"id": "782add05-549c-4a7e-a687-38c22f2f12d0",
"type": "transaction-response",
"entry": [
{
"response": {
"status": "200 OK",
"location": "CarePlan/62/_history/2",
"etag": "2",
"outcome": {
"resourceType": "OperationOutcome",
"issue": [
{
"severity": "information",
"code": "informational",
"details": {
"coding": [
{
"system": "https://hapifhir.io/fhir/CodeSystem/hapi-fhir-storage-response-code",
"code": "SUCCESSFUL_PATCH",
"display": "Patch succeeded."
}
]
},
"diagnostics": "Successfully patched resource \"CarePlan/62/_history/2\"."
}
]
}
}
},
... etc outcome of the rest of the patch operations ...
]
}
}
]
}
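A caller may want to reduce a response like this to the list of resource versions that were patched. The following sketch walks the outcome Bundle and collects the location of each entry whose status starts with 200; the function name is hypothetical and the sample is a trimmed, hand-written version of the response shape shown above.

```python
def patched_locations(parameters):
    """Return response.location for each successful entry in the outcome Bundle."""
    for param in parameters.get("parameter", []):
        if param.get("name") != "outcome":
            continue
        bundle = param.get("resource", {})
        locations = []
        for entry in bundle.get("entry", []):
            response = entry.get("response", {})
            if response.get("status", "").startswith("200") and "location" in response:
                locations.append(response["location"])
        return locations
    return []


# Trimmed sample following the structure of the example output.
sample_response = {
    "resourceType": "Parameters",
    "parameter": [{
        "name": "outcome",
        "resource": {
            "resourceType": "Bundle",
            "type": "transaction-response",
            "entry": [
                {"response": {"status": "200 OK",
                              "location": "CarePlan/62/_history/2"}},
            ],
        },
    }],
}

print(patched_locations(sample_response))  # ['CarePlan/62/_history/2']
```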
The $hapi.fhir.undo-replace-references operation undoes the effects of the most recent $hapi.fhir.replace-references operation on the given source and target ids. This operation uses the Provenance resource that was created by the $hapi.fhir.replace-references operation, and restores all resources that were updated as part of the operation back to their versions before the operation. This restore is done as an update, so it actually creates a newer version of each restored resource. The operation is performed as a transaction, so it either restores all resources or none.
The $hapi.fhir.undo-replace-references operation currently has the following limitations:
$hapi.fhir.replace-references operation was performed.
Name | Type | Default | Notes |
---|---|---|---|
source-reference-id | String | | The id of the source resource. This must be the same as the source-reference-id that was used in the $hapi.fhir.replace-references operation being undone |
target-reference-id | String | | The id of the target resource. This must be the same as the target-reference-id that was used in the $hapi.fhir.replace-references operation being undone |
Name | Type | Notes |
---|---|---|
outcome | OperationOutcome | The outcome of the operation |
Assuming that the $hapi.fhir.replace-references operation was previously performed to replace references from Patient/2 to Patient/3, you can undo that operation with the following request:
Input:
POST /$hapi.fhir.undo-replace-references
Content-Type: application/fhir+json
{
"resourceType": "Parameters",
"parameter": [
{
"name": "source-reference-id",
"valueString": "Patient/2"
},
{
"name": "target-reference-id",
"valueString": "Patient/3"
}
]
}
Output:
HTTP/1.1 200 OK
{
"resourceType": "Parameters",
"parameter": [
{
"name": "outcome",
"resource": {
"resourceType": "OperationOutcome",
"issue": [
{
"severity": "information",
"diagnostics": "Successfully restored 8 resources to their previous versions based on the Provenance resource: Provenance/1234/_history/1"
}
]
}
}
]
}
The Deduplication features of Smile CDR are under active development. Here is a roadmap of new features we are planning to roll out.
- $merge and $hapi.fhir.replace-references
- $undo-merge and $hapi.fhir.undo-replace-references operations that use the Provenance resources to undo the effects of those operations (assuming that none of the affected resources have subsequently changed).
- $merge to all Resource types
- MATCH_AND_MERGE which is similar to MATCH_ONLY in that it uses MDM Rules but does not create any links or Golden Resources. When enabled, a MATCH_AND_MERGE MDM module will automatically perform a $merge operation on all inbound resources. Any resources that MATCH a single target will be merged into that resource following the MDM Survivorship rules and all references will be updated to point to the merged resource.
- $deduplicate operation that takes a FHIR Bundle resource as input and uses MDM Rules to remove duplicates from the inbound Bundle and update references to point to the matched resource. Think of this as a stronger version of Conditional Create where you now have the full power of MDM matching to find matching resources rather than being limited by the FHIR Conditional Create syntax.
- $deduplicate operation as a Camel Processor
- $submit-for-deduplication operation that works like $mdm-submit and performs MATCH_AND_MERGE on all resources that match the criteria in the request. E.g. if Organization resources with _source=ABC were accidentally duplicated in your FHIR Repository, you could call $submit-for-deduplication with the criteria Organization?_source=ABC to submit all of those Organizations for deduplication. Ones that have existing matches would be deleted and all references updated to point to the remaining copy of that organization.