Some organizations, jurisdictions, or projects have policies requiring that specific fields within resources be obfuscated when at rest in the database.
For example, consider the following simple Patient resource:
{
  "resourceType": "Patient",
  "id": "123",
  "name": [ {
    "family": "Simpson",
    "given": [ "Homer" ]
  } ],
  "birthDate": "1956-05-12",
  "gender": "male"
}
In the example above, the complete resource text will be stored in a text field within the database. In addition, for FHIR search indexing, specific strings are extracted from the resource and also stored in dedicated database tables designed to support searching.
Database-level encryption and HTTPS transport security should generally already be in place to protect this data, but some fields warrant additional protection. Smile CDR Tokenization extracts specific elements from the FHIR resource body and replaces them with "tokens": opaque strings that serve as placeholders for the original values. Replacing sensitive values with opaque tokens makes it much harder for anyone with access to the database to re-identify the data.
These tokenized strings can take any form and do not need to match the format of the original datatype. For this reason, they are stored in an extension and the original value is removed. The following example shows a Patient.birthDate element with its value replaced by a tokenized string:
{
  "_birthDate": {
    "extension": [
      {
        "url": "https://smilecdr.com/fhir/ns/StructureDefinition/resource-tokenized-value",
        "valueCode": "cce82123-748d-4597-b52b-9200646ab788"
      }
    ]
  }
}
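A minimal sketch of producing a tokenized element like the one shown above. The tokenize method here simply generates a random UUID as a stand-in for a real provider's output, and the JSON is assembled by hand purely for illustration:

```java
import java.util.UUID;

public class TokenizedElementSketch {

    static final String EXT_URL =
            "https://smilecdr.com/fhir/ns/StructureDefinition/resource-tokenized-value";

    // Placeholder tokenization: a random UUID stands in for a real provider's token
    static String tokenize(String plaintext) {
        return UUID.randomUUID().toString();
    }

    // Builds the _birthDate element carrying the token instead of the original value
    static String tokenizedBirthDateJson(String token) {
        return "{ \"_birthDate\": { \"extension\": [ { \"url\": \"" + EXT_URL
                + "\", \"valueCode\": \"" + token + "\" } ] } }";
    }

    public static void main(String[] args) {
        String token = tokenize("1956-05-12");
        System.out.println(tokenizedBirthDateJson(token));
    }
}
```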
Smile CDR relies on a user-supplied algorithm for tokenization; it does not itself provide the algorithm used to convert between a plaintext string and a token. Keeping tokenization external to Smile CDR preserves a clean separation of concerns.
The tokenization algorithm:
The tokenization algorithm is supplied via an implementation of the ITokenizationProvider interface. This interface provides two methods, one for converting a string into a token and the other for performing the reverse.
You can use the General Purpose Interceptor Demo Project as a starting point for creating your own tokenization provider.
The following example shows a tokenization provider:
/**
 * A simple tokenization provider which tokenizes a few PHI fields within the Patient resource.
 * It uses the (completely insecure!) ROT-13 algorithm for tokenization and is intended for
 * demonstration purposes only.
 */
public class ExampleTokenizationProvider implements ITokenizationProvider {

    /**
     * This method is called in order to tokenize one or more source strings. The system tries
     * to provide batches of strings for tokenization so that if an external tokenization
     * service is used, and it supports batching, this capability can be leveraged.
     *
     * @param theRequestDetails The request associated with the tokenization.
     * @param theRequests The requests, each containing the specific rule as well as the object
     *                    being tokenized.
     */
    @Override
    public TokenizationResults tokenize(RequestDetails theRequestDetails, TokenizationRequests theRequests) {
        TokenizationResults retVal = new TokenizationResults();
        for (TokenizationRequest request : theRequests) {
            String source = request.getObjectAsString();
            String token = rot13(source);
            retVal.addResult(request, token);
        }
        return retVal;
    }

    /**
     * This method is called in order to convert one or more tokenized strings back into their
     * original source values. It must return exactly the same value as was originally
     * provided for tokenization. This method is only called if one or more of the configured
     * tokenization rules declare support for de-tokenization.
     */
    @Override
    public DetokenizationResults detokenize(RequestDetails theRequestDetails, DetokenizationRequests theRequests) {
        DetokenizationResults retVal = new DetokenizationResults();
        for (DetokenizationRequest request : theRequests) {
            String token = request.getToken();
            String source = rot13(token);
            retVal.addResult(request, source);
        }
        return retVal;
    }

    /**
     * Implementation of ROT-13 obfuscation, based on a solution found here:
     * https://stackoverflow.com/questions/8981296/rot-13-function-in-java
     * This is not intended to be a suitable production tokenization algorithm;
     * it is simply provided as an easy way to demonstrate the concept!
     */
    public static String rot13(String theInput) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < theInput.length(); i++) {
            char c = theInput.charAt(i);
            if (c >= 'a' && c <= 'm') c += 13;
            else if (c >= 'A' && c <= 'M') c += 13;
            else if (c >= 'n' && c <= 'z') c -= 13;
            else if (c >= 'N' && c <= 'Z') c -= 13;
            else if (c >= '0' && c <= '4') c += 5;
            else if (c >= '5' && c <= '9') c -= 5;
            sb.append(c);
        }
        return sb.toString();
    }
}
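A quick way to sanity-check a provider like this is to verify the round-trip property that detokenize depends on: converting a token back must yield exactly the original value. The standalone sketch below reuses the ROT-13 helper from the example above (demo only, not secure):

```java
public class Rot13RoundTrip {

    // Same ROT-13 helper as in the example provider above (demo only, not secure)
    public static String rot13(String theInput) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < theInput.length(); i++) {
            char c = theInput.charAt(i);
            if (c >= 'a' && c <= 'm') c += 13;
            else if (c >= 'A' && c <= 'M') c += 13;
            else if (c >= 'n' && c <= 'z') c -= 13;
            else if (c >= 'N' && c <= 'Z') c -= 13;
            else if (c >= '0' && c <= '4') c += 5;
            else if (c >= '5' && c <= '9') c -= 5;
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String source = "Simpson";
        String token = rot13(source);    // tokenize
        String restored = rot13(token);  // detokenize: ROT-13 is its own inverse
        System.out.println(source + " -> " + token + " -> " + restored);
        // prints Simpson -> Fvzcfba -> Simpson
    }
}
```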
The configured tokenization rules define the set of FHIR data elements which will be tokenized. Essentially they are a collection of FHIRPath Expressions which will be extracted from resources being stored in the repository, and replaced with equivalent tokens.
The rules are configured using the Tokenization Rules Text or Tokenization Rules File settings. The value is a JSON document using the TokenizationRules model.
Each rule must contain a FHIRPath expression, beginning with a resource type. For example, the expression Patient.name.family instructs the module that when storing a Patient resource, each repetition of the Patient.name element must have its family name extracted and replaced with a token.
If a given expression corresponds to a search parameter which is active on the server, that search parameter must also be declared in the rule. See Searching and Tokenization below.
The following example shows a rules collection with several active rules for the Patient resource.
{
  "rules" : [ {
    "description" : "Rule for a path including a search parameter",
    "path" : "Patient.identifier",
    "searchParameter" : "identifier",
    "searchValueNormalization" : "IDENTIFIER"
  }, {
    "description" : "Another rule for a path including a search parameter",
    "path" : "Patient.name.family",
    "searchParameter" : "family",
    "searchValueNormalization" : "STRING"
  }, {
    "description" : "Rule for a path with no associated search parameter",
    "path" : "Patient.maritalStatus"
  } ]
}
When an element in a resource is tokenized and that element is also used in a search parameter expression, declaring the search parameter as part of the tokenization rule causes the search index to be tokenized as well.
For example, if you have chosen to tokenize the Patient.name.family element (which backs the family search parameter), the tokenized string will be indexed instead of the original value. Suppose the configured tokenization algorithm tokenizes the value "Smith" as the token "ABCDEFG". When performing a search using this parameter, the value being searched for is also tokenized so that matching values can still be found.
To make this work, Smile CDR automatically creates internal SearchParameter resources with the same name as the original SearchParameter but with the suffix -tokenized. Therefore, if a FHIR client performs a search for Patient?family=smith, the search term is automatically tokenized and the search is treated as Patient?family-tokenized=ABCDEFG.
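To illustrate the rewrite described above, the sketch below tokenizes a search term and renames the parameter with the -tokenized suffix. The method names are hypothetical and ROT-13 again stands in for a real tokenization algorithm; Smile CDR performs this rewrite internally:

```java
import java.util.Map;

public class SearchRewriteSketch {

    // Stand-in tokenization: ROT-13, as in the demo provider (not a real algorithm)
    static String tokenize(String source) {
        StringBuilder sb = new StringBuilder();
        for (char c : source.toCharArray()) {
            if (c >= 'a' && c <= 'm') c += 13;
            else if (c >= 'A' && c <= 'M') c += 13;
            else if (c >= 'n' && c <= 'z') c -= 13;
            else if (c >= 'N' && c <= 'Z') c -= 13;
            sb.append(c);
        }
        return sb.toString();
    }

    // Rewrites family=smith into family-tokenized=<token>
    static Map.Entry<String, String> rewrite(String param, String value) {
        return Map.entry(param + "-tokenized", tokenize(value));
    }

    public static void main(String[] args) {
        Map.Entry<String, String> rewritten = rewrite("family", "smith");
        System.out.println("Patient?" + rewritten.getKey() + "=" + rewritten.getValue());
        // prints Patient?family-tokenized=fzvgu
    }
}
```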
If you need to support searching on a tokenized value, you may need to declare a normalization rule in order for the search to behave the way a client would expect. Several normalization modes are available; for example, the STRING mode suits name elements such as Patient.name.family, the IDENTIFIER mode suits identifier elements such as Patient.identifier, and coded elements such as Observation.code can also be normalized before tokenization.
Note the following limitations on searches which use tokenized values:
When using either Repository Validation or Endpoint Validation with this feature, resources are validated prior to tokenization.
This means that tokenization will not cause validation failures when a mandatory data element is removed and tokenized. However, if a non-reversible tokenization algorithm is chosen, the resource may no longer meet the same requirements when it is returned.
The FHIR PATCH operation is not currently supported when Tokenization is enabled.