In many scenarios, resources such as DocumentReference are used to store large files such as scanned PDFs and images. These resources use the Attachment datatype, which ultimately stores a content type and a base 64 encoded representation of the binary content.
In the case of large files, using base 64 encoding can take up a lot of extra space, and placing these binary attachments inline within a FHIR resource can be cumbersome for clients.
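To put a rough number on that overhead: Base64 emits 4 output characters for every 3 input bytes, so an encoded payload is roughly one third larger than the raw content. The following Python sketch measures this; the payload size is an arbitrary value chosen for illustration.

```python
import base64

def base64_overhead(raw_size: int) -> float:
    """Return the fractional size increase from Base64-encoding raw_size bytes."""
    raw = bytes(raw_size)  # zero-filled placeholder payload
    encoded = base64.b64encode(raw)
    return (len(encoded) - raw_size) / raw_size

# 3,000,000 raw bytes become 4,000,000 Base64 characters.
overhead = base64_overhead(3_000_000)
print(f"Base64 adds {overhead:.1%} to the payload size")
```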
This page describes strategies for dealing with large binary content.
Smile CDR can optionally be configured to move binary content into secondary storage that is better suited to large binary payloads than a relational database would be.
Changing this setting from the default is not always required. If your use cases don't involve storing much binary data, or if you will do so only rarely, it may be less hassle to simply store the binary data in your database. On the other hand, if you need to store large amounts of binary data, a relational database is not always the most efficient way to do so, and you should consider alternatives.
Important Note on Scope: This feature is currently only used for Attachment data and Binary resource data submitted and retrieved via the Binary Access Operations described below. Over time we plan to add other storage options as well as expand the data that can be stored with binary storage.
The following sections describe options for binary storage:
By default, all binary content is stored in the database, directly embedded within FHIR resources as base 64 encoded content. This is an easy configuration to use, especially for testing setups.
In this configuration, binary data will be stored in the same relational database as other FHIR resource contents, but it will be stored separately in a BLOB column in the HFJ_BINARY_STORAGE table.
Unlike the default Database mode, binary contents are not stored inline as Base64 encoded contents, and will generally be streamed directly to the database instead of being loaded into memory.
In filesystem mode, individual files are used within a directory structure to store binary content. Each file is assigned a globally unique name upon creation, so it is fine to use a shared directory such as a network share, even if the directory is shared by multiple nodes in a cluster.
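A small Python sketch can illustrate why unique names make a shared directory safe. This is not Smile CDR's actual naming scheme or directory layout, which this page does not specify; the `store_blob` helper and the use of random UUIDs are assumptions made purely for illustration.

```python
import tempfile
import uuid
from pathlib import Path

def store_blob(base_dir: Path, payload: bytes) -> Path:
    """Write payload under base_dir using a randomly generated unique name.

    Hypothetical helper: UUID4 names are an assumption, not Smile CDR's scheme.
    """
    base_dir.mkdir(parents=True, exist_ok=True)
    target = base_dir / uuid.uuid4().hex
    # Mode "xb" raises FileExistsError if the name is already taken, so two
    # writers can never silently overwrite each other's content.
    with open(target, "xb") as handle:
        handle.write(payload)
    return target

with tempfile.TemporaryDirectory() as tmp:
    first = store_blob(Path(tmp), b"scan-1")
    second = store_blob(Path(tmp), b"scan-2")
    print(first.name != second.name)
```

Because each writer picks its own name without coordinating with other writers, several nodes could write to the same directory concurrently under this model.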
When setting up Filesystem based binary storage, the following settings apply:
binary_storage.filesystem.directory
: This specifies the path (either absolute or relative to Smile CDR) that is used as the base path to store binary files. Smile CDR will create and manage a directory structure beneath this path.
In this mode, binary content is stored in an AWS S3 bucket. All binary data will be stored in one bucket, which can be named via the Blob Service Bucket / Container property. You can also configure the region by setting the Blob Service Region property. On boot, Smile CDR will create a bucket if it does not already exist.
Authentication to S3 is done using the DefaultAwsCredentialsProviderChain. This means that credentials can be provided in a variety of ways, including:
However, you also have the option to provide your own credentials via the Blob Service S3 Access Key property and the Blob Service S3 Secret Key property. If credentials are provided in this fashion, they will be used instead of the default credentials.
The account you authenticate with will need permissions to create buckets, as well as to put/head/get/delete objects in the bucket.
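As a starting point, the permissions above might map to an IAM policy along the following lines. This is a sketch under assumptions: the bucket name `example-cdr-binaries` is hypothetical, and the exact set of actions Smile CDR requires is not listed on this page, so verify the policy against your own deployment. In S3, HEAD requests on objects are authorized by the `s3:GetObject` action.

```python
import json

def build_s3_policy(bucket: str) -> dict:
    """Build an IAM policy document for the given (hypothetical) bucket name."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level statement: allows the bucket to be created.
                "Effect": "Allow",
                "Action": ["s3:CreateBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # Object-level statement: put/get/delete; HEAD uses s3:GetObject.
                "Effect": "Allow",
                "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }

print(json.dumps(build_s3_policy("example-cdr-binaries"), indent=2))
```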
In this mode, binary content is stored in a MinIO server. All binary data will be stored in one bucket, which can be named via the Blob Service Bucket / Container property.
Authentication for MinIO must be provided via the Blob Service S3 Access Key and Blob Service S3 Secret Key properties.
Currently MinIO is only recommended for development purposes.
In this mode, binary content is stored in an Azure Blob Storage container. All binary data will be stored in one container, which can be named via the Blob Service Bucket / Container property. You can also configure the account name by setting the Blob Service Azure Account property. On boot, Smile CDR will create a container if it does not already exist.
There are three different supported authentication methods:
The authenticated account will need permissions to create containers, as well as to PUT/HEAD/GET/DELETE objects in the container.
HAPI FHIR provides two custom FHIR operations that can be used to interact directly with binary content contained within resources such as DocumentReference. These operations can be used both to write and read back binary content.
These operations can be enabled/disabled using the Binary Access Operations Enabled property.
Note that these operations are subject to all the same security restrictions as a standard FHIR read/write. In other words, a user needs to have appropriate write permissions to the DocumentReference resource in question in order to be able to write binary content within it, and a user needs to have appropriate read permissions in order to read binary content from it.
The act of writing a binary payload to a FHIR Endpoint using the Binary Access Write Operation is a two step process: First, the container resource must be created on the server, with a placeholder Attachment that will be populated afterward. Second, the Binary Access Write Operation is invoked to directly populate the content.
The following shows a simple example of a creation of a DocumentReference with a placeholder Attachment. Note the almost empty attachment element that must be created in order to create a place for the attachment reference.
POST /DocumentReference
Content-Type: application/fhir+json

{
  "resourceType": "DocumentReference",
  "subject": {
    "reference": "Patient/123"
  },
  "content": [
    {
      "attachment": {
        "contentType": "image/jpeg"
      }
    }
  ]
}
The server will reply with a Location header containing the ID of the newly created resource.
Location: http://localhost:8000/DocumentReference/1623/_history/1
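A client typically needs to pull the resource ID out of that header before it can invoke the write operation. The following Python sketch shows one way to do so; `parse_location` is a hypothetical helper name, and it assumes the header follows the `[base]/[type]/[id]/_history/[version]` shape shown above.

```python
from urllib.parse import urlparse

def parse_location(location: str) -> tuple[str, str]:
    """Return (resource_type, resource_id) from a FHIR Location header value."""
    segments = [s for s in urlparse(location).path.split("/") if s]
    if "_history" in segments:
        # Drop the "_history/<version>" suffix so only type and ID remain.
        segments = segments[: segments.index("_history")]
    if len(segments) < 2:
        raise ValueError(f"Unexpected Location value: {location}")
    return segments[-2], segments[-1]

resource_type, resource_id = parse_location(
    "http://localhost:8000/DocumentReference/1623/_history/1"
)
print(resource_type, resource_id)  # DocumentReference 1623
```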
This ID is then used in the Binary Access Write Operation to set the binary content. Note the path parameter, which specifies a FHIRPath expression to the attachment element within the DocumentReference resource. It is important to provide the appropriate content type via the Content-Type header in the operation HTTP request. Smile CDR does not validate this content type, but it will be faithfully preserved and returned if the payload is requested via the Binary Access Read operation.
POST /DocumentReference/1623/$binary-access-write?path=DocumentReference.content.attachment
Content-Type: image/png
(... binary content ...)
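The same request can be assembled from a client script. The sketch below uses only the Python standard library and builds the request without sending it; the base URL `http://localhost:8000` is taken from the Location example above, and `build_binary_write_request` is a hypothetical helper name.

```python
from urllib.parse import urlencode
from urllib.request import Request

def build_binary_write_request(base_url, resource_type, resource_id,
                               fhir_path, content_type, payload):
    """Build (but do not send) a $binary-access-write request."""
    query = urlencode({"path": fhir_path})
    url = f"{base_url}/{resource_type}/{resource_id}/$binary-access-write?{query}"
    # The Content-Type header describes the raw payload, not a FHIR resource.
    return Request(url, data=payload, method="POST",
                   headers={"Content-Type": content_type})

request = build_binary_write_request(
    "http://localhost:8000", "DocumentReference", "1623",
    "DocumentReference.content.attachment", "image/png",
    b"(... binary content ...)",
)
print(request.get_method(), request.full_url)
# To send it against a live server: urllib.request.urlopen(request)
```

Note that the payload is sent as raw bytes in the request body, so no Base64 encoding is performed by the client.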
The Binary Access Read Operation can be used to read back binary content from Attachment elements in a similar way to the write operation above.
The following example shows a read operation:
GET /DocumentReference/1623/$binary-access-read?path=DocumentReference.content.attachment
Note that the $binary-access-read operation works on any resource which contains base64 data. For example:
GET /DocumentReference/[id]/$binary-access-read?path=DocumentReference.content.attachment
GET /Binary/[id]/$binary-access-read?path=Binary.data
GET /Media/[id]/$binary-access-read?path=Media.content
It is important to note that if a binary is large enough to have been externalized, it will never be shown as Base64 encoded content and must always be streamed from the server via the $binary-access-read operation.
The server will then respond by serving the binary content with the correct Content Type.
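On the client side, a read might be sketched as follows in Python. The helper names `build_binary_read_url` and `stream_to_file`, the base URL, and the output file name are assumptions for illustration. The copy is done in fixed-size chunks so that a large payload is never held in memory all at once.

```python
import io
import shutil
from urllib.parse import urlencode

def build_binary_read_url(base_url, resource_type, resource_id, fhir_path):
    """Build the URL for a $binary-access-read request."""
    query = urlencode({"path": fhir_path})
    return f"{base_url}/{resource_type}/{resource_id}/$binary-access-read?{query}"

def stream_to_file(response, out_file, chunk_size=64 * 1024):
    """Copy a file-like response body to out_file in fixed-size chunks."""
    shutil.copyfileobj(response, out_file, chunk_size)

url = build_binary_read_url(
    "http://localhost:8000", "DocumentReference", "1623",
    "DocumentReference.content.attachment",
)
print(url)

# Demonstrate the chunked copy with an in-memory stand-in for the response.
sink = io.BytesIO()
stream_to_file(io.BytesIO(b"fake image bytes"), sink)

# With a live server (not run here):
#   with urllib.request.urlopen(url) as response, open("scan.png", "wb") as out:
#       stream_to_file(response, out)
```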
The Media resource is used to store media such as photos in the FHIR repository. A Media resource has fields for storing metadata such as the subject of the media and the body site, but also has two primary fields for storing the media itself:
The Media.content.contentType field stores the MIME type of the media, e.g. image/png
The Media.content.data field stores the media itself
When retrieving the resource via a standard FHIR operation (e.g. a read or a search) the data is represented as base64 encoded data.
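A client that receives the resource in this form has to decode the data element itself. A minimal Python sketch, using a tiny made-up text payload rather than a real image (`extract_media_content` is a hypothetical helper name):

```python
import base64
import json

# Made-up Media resource for illustration; "aGVsbG8=" is Base64 for b"hello".
media_json = """
{
  "resourceType": "Media",
  "content": {"contentType": "text/plain", "data": "aGVsbG8="}
}
"""

def extract_media_content(resource: dict) -> tuple[str, bytes]:
    """Return (MIME type, decoded bytes) from a parsed Media resource."""
    content = resource["content"]
    return content["contentType"], base64.b64decode(content["data"])

content_type, raw = extract_media_content(json.loads(media_json))
print(content_type, raw)  # text/plain b'hello'
```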
If the Serve Raw Media Resources property is enabled, clients may request the raw contents of the Media resource.
Enabling this setting causes two things to happen:
For example, consider the following (abbreviated) Media resource:
{
  "resourceType": "Media",
  "id": "example999",
  "subject": { "reference": "Patient/123" },
  "content": {
    "contentType": "image/png",
    "data": "R0lGODlhfgCRAPcAAAAAAIAAAACAAICAAAAAgIAA"
  }
}
This resource will be served as a raw binary image if the following HTTP request is used:
GET /Media/example999
Accept: image/png
The _output parameter may be used with a value of data to indicate to the server that this resource should be served raw.
For example, the following request will request the resource above as raw content.
GET /Media/example999?_output=data
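Both request styles can be built from a client script. The following Python sketch constructs them without sending anything; the base URL `http://localhost:8000` and the helper names are assumptions for illustration.

```python
from urllib.request import Request

def raw_media_request_via_accept(base_url, media_id, content_type):
    """Request raw media by sending the media's MIME type in the Accept header."""
    return Request(f"{base_url}/Media/{media_id}",
                   headers={"Accept": content_type})

def raw_media_request_via_output(base_url, media_id):
    """Request raw media by using the _output=data query parameter."""
    return Request(f"{base_url}/Media/{media_id}?_output=data")

by_accept = raw_media_request_via_accept(
    "http://localhost:8000", "example999", "image/png")
by_output = raw_media_request_via_output("http://localhost:8000", "example999")

print(by_accept.get_method(), by_accept.full_url, by_accept.get_header("Accept"))
print(by_output.get_method(), by_output.full_url)
# To send either request against a live server: urllib.request.urlopen(...)
```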
By default, binary elements that have been externalized are automatically reinflated into the FHIR resource that references them when that resource is requested via a standard outbound FHIR operation (e.g. read/search). For example, consider a DocumentReference with ID 123 that is stored, with a binary attached to it via $binary-access-write. When a user calls GET /DocumentReference/123, Smile CDR will automatically fetch the binary data and inflate it in the data element of the DocumentReference as Base64 encoded data.
This behaviour is useful in most cases, but if you are often querying resources that contain binaries, this can cause request sizes to grow very large. You can control this behaviour via two settings:
This feature can be completely disabled by toggling off auto-inflate binaries. If you wish to never automatically inflate a binary, you can set this to false, and it will prevent any binary substitution from occurring during reads or searches.
If you do wish to auto-inflate binaries, the Maximum auto-inflate size (bytes) setting lets you limit the total size of the binaries that will be included in a response.
For example, if you have 3 Binary resources, each 100 bytes in size, and you set this value to 200, then the response to a search for all binaries will contain the first two binaries inlined, and the server will return a link to the third binary.
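The budget in that example can be modelled with a short Python sketch. This is an illustration of the described behaviour only, not Smile CDR's implementation: `plan_auto_inflate` is a hypothetical function, and it assumes binaries are considered in response order and inlined whenever they still fit in the remaining budget.

```python
def plan_auto_inflate(sizes, max_total_bytes):
    """Return one flag per binary: True to inline it, False to link to it."""
    remaining = max_total_bytes
    decisions = []
    for size in sizes:
        if size <= remaining:
            # The binary fits in what is left of the budget, so inline it.
            decisions.append(True)
            remaining -= size
        else:
            # Over budget: the client must fetch it via $binary-access-read.
            decisions.append(False)
    return decisions

print(plan_auto_inflate([100, 100, 100], 200))  # [True, True, False]
```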