On this page:

11.1 ETL Import (Extract/Transform/Load)

 

The Smile CDR ETL Import module can be used to load data into the CDR via an external flat file in CSV (or similar) format.

11.1.1 Importing CSV Data

 

Comma-Separated Value (CSV) data is a simple table-based format containing rows of data in named (or sometimes simply numbered) columns. It is a commonly used format for exporting data from databases and spreadsheets in a platform neutral way.

CSV data is broken up into rows, where each new line forms a separate record of data. For example, the following example shows a simple extract of Patients from a system.

ID,FAMILYNAME,GIVENNAME,MIDDLENAME,BIRTHDATE,CITY
001,Smith,John,Edward,1920-01-22,LONDON
002,Fernie,Frances,,1967-12-11,PARIS

CSV data typically separates values using a comma but sometimes a different character is used. Tab-separated and pipe-separated are popular choices, and these are both supported by the ETL Import module. CSV data typically uses the first row to indicate the names of the columns but this is not mandatory – although it does make processing easier.

Importing CSV data involves three steps:

  1. Pushing data into the ETL Import module.
  2. Transforming this data into appropriate FHIR resources.
  3. Storing the data in a persistence module.

Creating a CSV-based ETL Import Module

The import module has several properties that should be set:

  • The import_source property should be set to CSV to specify that the importing data will be in CSV format.
  • A mapping.script should be specified (see below).
  • The csv_delimiter should be set to the appropriate delimiter (typically ,).

Creating a Mapping Script

The ETL Import module uses JavaScript to specify mapping scripts. The mapping script must contain at least one function, with signature function handle(inputMap, context) {}. This method will be called once for each row in the CSV file. This method should convert the row into one or more FHIR resources and then store them in the persistence module (or it may choose to ignore the row completely).

The handle method has two arguments:

  • inputMap (argument 0) – A dictionary containing key-value pairs. Each key represents a column name, and each value represents the value for the given key in the current row. Note that this map is strict: any request for a key (i.e. column) that is not defined in the CSV file results in an error.
  • context (argument 1) – An object containing details about the current processing job.

Mapping Script Example

/*
 * CSV Mapping function - Expects input in the format:
 * ID,FAMILYNAME,GIVENNAME,MIDDLENAME,BIRTHDATE,CITY
 */
function handle(inputMap, context) {
    Log.info("Processing CSV row from file: " + context.filename);

    // Create a patient
    var patient = ResourceBuilder.build('Patient');
    patient.id = Uuid.newPlaceholderId();

    // Identifier
    patient.identifier.system = 'http://example.com/mrns';
    patient.identifier.value = inputMap['ID'];

    // Name
    patient.name.family = inputMap['FAMILYNAME'];
    patient.name.given[0] = inputMap['GIVENNAME'];
    patient.name.given[1] = inputMap['MIDDLENAME'];

    // DOB
    patient.birthDate = inputMap['BIRTHDATE'];

    // Address
    patient.address.city = inputMap['CITY'];

    // Build a transaction and process it
    var transaction = TransactionBuilder.newTransactionBuilder();
    transaction.create(patient);
    Fhir.transaction(transaction);
}

In this example, each row is converted to a Patient resource and then stored in the CDR. The Smile CDR JavaScript execution environment makes it easy to build and manipulate FHIR Resource objects using the FHIR ResourceBuilder.

The created resources are then converted into FHIR transactions that can be used to store and manipulate resources.

For a complete reference of APIs available for JavaScript conversion see the Smile CDR Javascript Execution Environment documentation.

Locking Rows

A common problem when ingesting CSV data in a multithreaded environment is that multiple rows can attempt to create/update the same backing object, which leads to failures because the FHIR server does not permit multiple threads to update the same resource at the same time.

In order to assist with this, the context object has a method called lock(mutexName) that can be used to lock individual rows.

The lock object simply acquires a mutex on a particular string, releasing this mutex only when the row has finished processing.

For example:

function handle(inputMap, context) {
   var patient = ResourceBuilder.build('Patient');
   
   // Lock on the patient ID
   context.lock(inputMap['PATIENT_ID']);
   
   patient.id = inputMap['PATIENT_ID'];
   patient.name.family = inputMap['PATIENT_FAMILY'];
   
   Fhir.update(patient);
}

11.1.2 Ingesting Data

 

Once the module has been created, the ETL Import Endpoint on the JSON Admin API may be used to upload and process data.

See ETL Import Endpoint for details on how to use this endpoint. Note that this action must be invoked by a user with the ETL_IMPORT_PROCESS_FILE permission. This user does not require additional permissions in order to read and write FHIR data.