6.17.1 Binary Data

 

In many scenarios, resources such as DocumentReference are used to store large files such as scanned PDFs and images. These resources use the Attachment datatype, which stores a content type and a Base64-encoded representation of the binary content.

For large files, Base64 encoding adds roughly one third to the size of the content, and placing these binary attachments inline within a FHIR resource can be cumbersome for clients.
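To make the size overhead concrete, the following minimal sketch computes how many Base64 characters are needed for a payload of a given size. The function name is illustrative only and is not part of Smile CDR.

```python
import base64


def base64_encoded_size(raw_bytes: int) -> int:
    """Return the number of characters in the padded Base64 form of raw_bytes bytes."""
    # Every group of 3 input bytes (or part of one) becomes 4 output characters.
    return 4 * ((raw_bytes + 2) // 3)


if __name__ == "__main__":
    size = 10 * 1024 * 1024  # a 10 MiB scanned document
    encoded = base64_encoded_size(size)
    print(f"{size} raw bytes -> {encoded} Base64 characters ({encoded / size:.2f}x)")
    # Cross-check the formula against the standard library on a small payload.
    assert len(base64.b64encode(bytes(1000))) == base64_encoded_size(1000)
```

The growth is in addition to any JSON or XML escaping and parsing cost incurred by carrying the payload inline.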

This page describes strategies for dealing with large binary content.

6.17.2 Externalized Binary Storage

 

Smile CDR can optionally be configured to move binary content into secondary storage that is better suited to large binary payloads than a relational database would be.

Changing this setting from the default is not always required. If your use cases don't involve storing much binary data, or if you will do so only rarely, it may be simpler to store the binary data in your database. On the other hand, if you need to store large amounts of binary data, a relational database is not always the most efficient way of doing so, and you should consider the alternatives.

Important Note on Scope: This feature is currently only used for Attachment data and Binary resource data submitted and retrieved via the Binary Access Operations described below. Over time we plan to add other storage options as well as expand the data that can be stored with binary storage.

The following sections describe options for binary storage:

6.17.2.1 Binary Storage Mode: Database (Default)

By default, all binary content is stored in the database, directly embedded within FHIR resources as Base64-encoded content. This is an easy configuration to use, especially for testing setups.

6.17.2.2 Binary Storage Mode: Database Blob

In this configuration, binary data is stored in the same relational database as other FHIR resource contents, but it is stored separately in a BLOB column in the HFJ_BINARY_STORAGE table.

Unlike the default Database mode, binary contents are not stored inline as Base64-encoded content, and will generally be streamed directly to the database instead of being loaded into memory.

6.17.2.3 Binary Storage Mode: Filesystem

In filesystem mode, individual files are used within a directory structure to store binary content. Each file is assigned a globally unique name upon creation, so it is fine to use a shared directory such as a network share, even if the directory is shared by multiple nodes in a cluster.

When setting up Filesystem based binary storage, the following settings apply:

  • binary_storage.filesystem.directory: This specifies the path (either absolute or relative to Smile CDR) that is used as the base path to store binary files. Smile CDR will create and manage a directory structure beneath this path.
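The following sketch illustrates why globally unique file names make a shared directory safe for several writers. It is not Smile CDR's actual on-disk layout: the use of UUIDs, the two-character subdirectory scheme, and the function names are all assumptions made for illustration.

```python
import os
import uuid


def store_blob(base_dir: str, data: bytes) -> str:
    """Write data under base_dir using a random unique name and return that name."""
    blob_id = uuid.uuid4().hex
    # Hypothetical layout: fan files out into subdirectories by name prefix.
    subdir = os.path.join(base_dir, blob_id[:2])
    os.makedirs(subdir, exist_ok=True)
    # Mode "xb" fails if the file already exists, so a blob is never overwritten.
    with open(os.path.join(subdir, blob_id), "xb") as handle:
        handle.write(data)
    return blob_id


def load_blob(base_dir: str, blob_id: str) -> bytes:
    """Read back the blob previously stored under blob_id."""
    with open(os.path.join(base_dir, blob_id[:2], blob_id), "rb") as handle:
        return handle.read()
```

Because no node ever has to coordinate with another to pick a name, two cluster nodes writing to the same network share do not collide.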

6.17.2.4 Binary Storage Mode: AWS S3

In this mode, binary content is stored in an AWS S3 bucket. All binary data will be stored in one bucket, which can be named via the Blob Service Bucket / Container property. You can also configure the region by setting the Blob Service Region property. On boot, Smile CDR will create a bucket if it does not already exist.

Authentication to S3 is done using the DefaultAwsCredentialsProviderChain. This means that credentials can be provided in a variety of ways, including:

  • Environment Variables
  • Java System Properties
  • Credential Profiles File

However, you also have the option to provide your own credentials via the Blob Service S3 Access Key property and the Blob Service S3 Secret Key property. If credentials are provided in this fashion, they will be used instead of the default credentials.

The account you authenticate with will need permissions to create buckets, as well as to put/head/get/delete objects in the bucket.
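As a rough starting point, the permissions above could be expressed as an IAM policy like the one generated below. This is a sketch rather than an official Smile CDR policy: the bucket name is a placeholder, and the inclusion of s3:ListBucket (which S3 uses to authorize bucket-level HEAD requests) is an assumption. Note that S3 authorizes HEAD requests on objects through s3:GetObject.

```python
import json


def s3_blob_policy(bucket: str) -> dict:
    """Build an IAM policy document granting the access described above."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level actions: create the bucket and check that it exists.
                "Effect": "Allow",
                "Action": ["s3:CreateBucket", "s3:ListBucket"],
                "Resource": bucket_arn,
            },
            {
                # Object-level actions: put, get (also covers HEAD), and delete.
                "Effect": "Allow",
                "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject"],
                "Resource": f"{bucket_arn}/*",
            },
        ],
    }


if __name__ == "__main__":
    # "my-smilecdr-binaries" is a hypothetical bucket name.
    print(json.dumps(s3_blob_policy("my-smilecdr-binaries"), indent=2))
```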

6.17.2.5 Binary Storage Mode: MinIO

In this mode, binary content is stored in a MinIO server. All binary data will be stored in one bucket, which can be named via the Blob Service Bucket / Container property.

Authentication for MinIO must be provided via the Blob Service S3 Access Key and Blob Service S3 Secret Key properties.

Currently MinIO is only recommended for development purposes.

6.17.2.6 Binary Storage Mode: Azure Blob Storage

In this mode, binary content is stored in an Azure Blob Storage container. All binary data will be stored in one container, which can be named via the Blob Service Bucket / Container property. You can also configure the account name by setting the Blob Service Azure Account property. On boot, Smile CDR will create a container if it does not already exist.

There are three different supported authentication methods:

  1. Access Key
  2. Account-level Shared Access Signatures (SAS) token
  3. Azure Active Directory

The authenticated account will need permissions to create containers, as well as to PUT/HEAD/GET/DELETE objects in the container.

6.17.3 Binary Access Operations

 

HAPI FHIR provides two custom FHIR operations that can be used to interact directly with binary content contained within resources such as DocumentReference. These operations can be used both to write and read back binary content.

These operations can be enabled/disabled using the Binary Access Operations Enabled property.

Note that these operations are subject to all the same security restrictions as a standard FHIR read/write. In other words, a user needs to have appropriate write permissions to the DocumentReference resource in question in order to be able to write binary content within it, and a user needs to have appropriate read permissions in order to read binary content from it.

6.17.3.1 Binary Access Write Operation

Writing a binary payload to a FHIR Endpoint using the Binary Access Write Operation is a two-step process. First, the container resource must be created on the server, with a placeholder Attachment that will be populated afterward. Second, the Binary Access Write Operation is invoked to directly populate the content.

The following shows a simple example of creating a DocumentReference with a placeholder Attachment. Note the almost empty attachment element, which must be present in order to create a place for the attachment reference.

POST /DocumentReference
Content-Type: application/fhir+json

{
  "resourceType": "DocumentReference",
  "subject": {
    "reference": "Patient/123"
  },
  "content": [
    {
      "attachment": {
        "contentType": "image/jpeg"
      }
    }
  ]
}

The server will reply with a Location header containing the ID of the newly created resource.

Location: http://localhost:8000/DocumentReference/1623/_history/1

This ID is then used in the Binary Access Write Operation to set the binary content. Note the path parameter, which specifies a FHIRPath expression to the attachment element within the DocumentReference resource. It is important to provide the appropriate content type via the Content-Type header in the operation HTTP request. Smile CDR does not validate this content type, but it will be faithfully preserved and returned if the payload is requested via the Binary Access Read operation.

POST /DocumentReference/1623/$binary-access-write?path=DocumentReference.content.attachment
Content-Type: image/png

(... binary content ...)
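The two steps above can be sketched from a client's point of view as follows. This is a minimal illustration using only the Python standard library. The helper names are hypothetical, the base URL is taken from the Location header shown earlier, and any authentication headers your endpoint requires are omitted.

```python
import json
import urllib.request
from urllib.parse import urlencode, urlparse

BASE_URL = "http://localhost:8000"  # assumed FHIR endpoint base URL


def create_placeholder_request(base_url, patient_ref, content_type):
    """Step 1: build the POST that creates a DocumentReference with an empty attachment."""
    body = {
        "resourceType": "DocumentReference",
        "subject": {"reference": patient_ref},
        "content": [{"attachment": {"contentType": content_type}}],
    }
    return urllib.request.Request(
        f"{base_url}/DocumentReference",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/fhir+json"},
        method="POST",
    )


def resource_id_from_location(location):
    """Extract the resource ID from a Location header, ignoring any /_history/N suffix."""
    parts = urlparse(location).path.strip("/").split("/")
    if "_history" in parts:
        parts = parts[: parts.index("_history")]
    return parts[-1]


def binary_write_request(base_url, doc_id, payload, content_type,
                         path="DocumentReference.content.attachment"):
    """Step 2: build the $binary-access-write POST carrying the raw bytes."""
    query = urlencode({"path": path})
    return urllib.request.Request(
        f"{base_url}/DocumentReference/{doc_id}/$binary-access-write?{query}",
        data=payload,
        headers={"Content-Type": content_type},
        method="POST",
    )


def upload(base_url, patient_ref, payload, content_type):
    """Run both steps against a live server and return the new resource ID."""
    with urllib.request.urlopen(
        create_placeholder_request(base_url, patient_ref, content_type)
    ) as resp:
        doc_id = resource_id_from_location(resp.headers["Location"])
    with urllib.request.urlopen(
        binary_write_request(base_url, doc_id, payload, content_type)
    ):
        pass
    return doc_id
```

Keeping request construction separate from sending makes it easy to inspect or log the exact URL and headers before anything is written to the server.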

6.17.3.2 Binary Access Read Operation

The Binary Access Read Operation can be used to read back binary content from Attachment elements in a similar way to the write operation above.

The following example shows a read operation:

GET /DocumentReference/1623/$binary-access-read?path=DocumentReference.content.attachment

Note that the $binary-access-read operation works on any resource which contains base64 data. For example:

GET /DocumentReference/[id]/$binary-access-read?path=DocumentReference.content.attachment
GET /Binary/[id]/$binary-access-read?path=Binary.data
GET /Media/[id]/$binary-access-read?path=Media.content

It is important to note that if a binary is of such a size that it has been externalized, then it will never be shown as encoded base64, and must always be streamed from the server via the $binary-access-read operation.

The server will then respond by serving the binary content with the correct Content Type.
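A client can stream the response of the read operation straight to disk rather than holding it in memory. The sketch below assumes an unauthenticated endpoint. The function names are hypothetical, and only the URL shape comes from the examples above.

```python
import shutil
import urllib.request
from urllib.parse import urlencode


def binary_read_url(base_url, resource_type, resource_id, path):
    """Build a $binary-access-read URL for the given resource and FHIRPath expression."""
    query = urlencode({"path": path})
    return f"{base_url}/{resource_type}/{resource_id}/$binary-access-read?{query}"


def download_binary(url, destination):
    """Stream the binary content to a file and return the Content-Type the server sent."""
    with urllib.request.urlopen(url) as resp, open(destination, "wb") as out:
        # copyfileobj copies in chunks, so the payload is never fully loaded into memory.
        shutil.copyfileobj(resp, out)
        return resp.headers.get("Content-Type")
```

For example, `binary_read_url("http://localhost:8000", "Binary", "42", "Binary.data")` yields a URL of the same form as the Binary example above, with a hypothetical ID of 42.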

6.17.4 Serving Raw Media Resources

 

The Media resource is used to store media such as photos in the FHIR repository. A Media resource has fields for storing metadata such as the subject of the media and the body site, but also has two primary fields for storing the media itself:

  • The Media.content.contentType field stores the mime type of the media, e.g. image/png
  • The Media.content.data field stores the media itself

When retrieving the resource via a standard FHIR operation (e.g. a read or a search) the data is represented as base64 encoded data.

If the Serve Raw Media Resources property is enabled, clients may request the raw contents of the Media resource.

Enabling this setting causes two things to happen:

  1. Raw content is served if the Accept header matches the content type exactly

For example, consider the following (abbreviated) Media resource:

{
  "resourceType": "Media",
  "id": "example999",
  "subject": { "reference": "Patient/123" },
  "content": {
    "contentType": "image/png",
    "data": "R0lGODlhfgCRAPcAAAAAAIAAAACAAICAAAAAgIAA"
  }
}

This resource will be served as a raw binary image if the following HTTP request is used:

GET /Media/example999
Accept: image/png

  2. Raw content is served if the client explicitly requests it

The _output parameter may be used with a value of data to indicate to the server that this resource should be served raw.

For example, the following request will request the resource above as raw content.

GET /Media/example999?_output=data
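The two rules above can be summarized in a few lines of code. This is an illustration of the documented behaviour only, not the server's implementation, and the function names are made up for the example.

```python
import base64
from typing import Optional


def serves_raw(media: dict, accept: Optional[str] = None,
               output_param: Optional[str] = None) -> bool:
    """Return True if a request for this Media resource would be answered with raw content."""
    # Rule 2: the client explicitly asks for raw data with _output=data.
    if output_param == "data":
        return True
    # Rule 1: the Accept header must match the stored content type exactly.
    return accept is not None and accept == media["content"]["contentType"]


def raw_bytes(media: dict) -> bytes:
    """Decode the Base64 payload that would be sent as the raw response body."""
    return base64.b64decode(media["content"]["data"])
```

Under these rules a wildcard such as `Accept: image/*` does not trigger raw output, because the match is exact. A request with `Accept: application/fhir+json` still receives the full FHIR resource.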

6.17.5 Serving binary data within a FHIR Resource

 

By default, binary elements that have been externalized are automatically reinflated into the FHIR resource that references them when that resource is requested via a standard outbound FHIR operation (e.g. read/search). For example, consider a stored DocumentReference with ID 123 that has a binary attached to it via $binary-access-write. When a user calls GET /DocumentReference/123, Smile CDR will automatically fetch the binary data and inflate it into the data element of the DocumentReference as Base64-encoded data.

This behaviour is useful in most cases, but if you often query resources that contain binaries, it can cause response sizes to grow very large. You can control this behaviour via two settings:

6.17.5.1 Disable automatic inflating of Binary Data

This feature can be disabled entirely by turning off the auto-inflate binaries setting. When it is set to false, no binary substitution occurs during reads or searches.

6.17.5.2 Limiting the amount of binary data returned

If you do want externalized binaries to be inflated automatically, the Maximum auto-inflate size(bytes) setting lets you cap the total size of the binaries that will be included in a response.

For example, if you had 3 Binary resources, each with a size of 100 bytes, and you set this value to 200, then the response to a search for all binaries would contain the first two binaries inlined, and the server would return a link to the third binary.
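The budget arithmetic in this example can be sketched as below. The function is purely illustrative. In particular, it assumes binaries are considered in result order against a running total, which the text above implies but does not state outright.

```python
from typing import List


def plan_auto_inflate(sizes: List[int], max_total_bytes: int) -> List[bool]:
    """For each binary size, decide whether it is inlined (True) or returned as a link (False)."""
    decisions = []
    used = 0
    for size in sizes:
        # Inline only while the running total stays within the configured maximum.
        fits = used + size <= max_total_bytes
        if fits:
            used += size
        decisions.append(fits)
    return decisions


if __name__ == "__main__":
    # Three 100-byte binaries with a 200-byte limit, as in the example above.
    print(plan_auto_inflate([100, 100, 100], 200))
```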