SQL Syntax: Select
This module uses a proprietary flavour of SQL that is specific to HAPI FHIR. This syntax is referred to as HFQL, the HAPI FHIR Query Language.
HFQL keywords are case-insensitive, although expressions referring to elements within FHIR resources are case-sensitive. All examples here use CAPITALIZED SQL keywords for easier readability but this is not required.
Unlike RDBMS tables, FHIR resources are document based with many repeating elements. Simply selecting columns like in standard SQL can be complicated by this structure. In order to provide flexibility, SELECT expressions can be expressed as FHIRPath Expressions. The FROM keyword indicates which FHIR resource type to search for and return values from.
The following example searches for all Patient resources, and returns their respective ID, family name, and given name in 3 separate columns.
SELECT id, name[0].family, name[0].given[0]
FROM Patient
One of the most common challenges when writing HFQL expressions is in dealing with repeating elements. This challenge arises because FHIR is a document format, and many elements can repeat. This naturally conflicts with the traditional RDBMS/SQL model where each column can hold a single value within a given row.
Consider the Patient.name
element. This element can repeat within a document, and holds values of the FHIR general purpose datatype HumanName. The HumanName datatype in turn holds an element called family
which is non-repeatable, and an element called given
which is repeatable.
Because of this, the simple path Patient.name.given
can potentially return multiple values, either because there are multiple name
repetitions, or multiple given
repetitions, or both. As a result, selecting a path which can potentially return multiple values will result in JSON output.
For example, consider the following example resource:
{
"resourceType": "Patient",
"name": [{
"family": "Simpson",
"given": ["Homer", "Jay"]
}]
}
Selecting the path Patient.name.given
will result in a column with the value ["Homer", "Jay"]
. You can use array index markers if you want to explicitly select a specific repetition:
SELECT id, name[0].family, name[0].given[0]
FROM Patient
FHIRPath also supports various other mechanisms for picking exactly which leaf in a document you want to return. For example, the where operator could be used to pick a specific name regardless of which repetition it was found in.
SELECT id, name.where(use = 'maiden').first().family
FROM Patient
If your FHIRPath expression points to a non-primitive element within a FHIR resource, the element will be encoded using FHIR JSON encoding rules and returned as a STRING column value.
For example, consider the following query:
SELECT name
FROM Patient
The name element in FHIR is a complex object, so column values would resemble the following:
{"family":"Simpson","given":["Homer", "Jay"]}
If you want to control the column name for a given SELECT expression, you can use a standard SQL AS clause. For example:
SELECT id AS ID,
name[0].family AS FamilyName,
name[0].given[0] AS GivenName,
name[0].given[1] AS MiddleName
FROM Patient
Alternately, you can also create an alias by prefixing the expression by an alias and a colon. For example:
SELECT ID: id,
FamilyName: name[0].family,
GivenName: name[0].given[0],
MiddleName: name[0].given[1]
FROM Patient
The HFQL processor will always try to use the most appropriate SQL datatype for column values that it returns. It does this by inspecting the FHIRPath expression and trying to determine the FHIR datatype associated with the leaf node in the path.
In the case of choice[x] elements, where the datatype can be one of several possibilities, use the FHIRPath ofType(...) operator to select the type you want to return. This also acts as a hint to the HFQL parser about which SQL datatype should be returned. In the example below, only Observation values with a Quantity type are returned, and the Quantity.value
element is returned. Because this element has a FHIR decimal datatype, this column is returned by the HFQL processor as a SQL Decimal value.
SELECT
id,
value.ofType(Quantity).system AS Quantity_System,
value.ofType(Quantity).unit AS Quantity_Unit,
value.ofType(Quantity).value AS Quantity_Value
FROM
Observation
If you expect different choice[x] types to be available in different resources being selected (e.g. if some Observations will have a valueQuantity
but others will have a valueCoding
) you may wish to include columns for each. An example is shown below. When using a construct like this one, the Coding_* columns will be NULL if the Quantity columns are not, and vice-versa which may make processing easier.
SELECT
id,
value.ofType(Quantity).system AS Quantity_System,
value.ofType(Quantity).unit AS Quantity_Unit,
value.ofType(Quantity).value AS Quantity_Value,
value.ofType(Coding).system AS Coding_System,
value.ofType(Coding).code AS Coding_Code,
value.ofType(Coding).display AS Coding_Display
FROM
Observation
Of course, you can always simply include the choice element name as shown below. This will result in JSON content being emitted for whatever type is found.
SELECT
id,
value
FROM
Observation
The COUNT operator can be used to return counts of various items.
Be warned that this is potentially a very slow operation, since the HFQL executor needs to examine every single Patient resource in the database.
If you are returning any other SELECT expressions, you must also include those expressions in a corresponding GROUP BY clause.
The following example returns a table containing each distinct family name in the database, and the number of times it appears.
SELECT COUNT(*), name[0].family
FROM Patient
GROUP BY name[0].family
When queries with a GROUP BY expression containing a repeating element, each repetition will be counted once. For example, consider the following resource:
{
"resourceType": "Patient",
"name": [{
"family": "Simpson",
"given": ["Homer"]
}, {
"family": "Simpson",
"given": ["Jay"]
}]
}
Using the following query:
SELECT COUNT(*), name.family
FROM Patient
GROUP BY name.family
Results in the following values:
COUNT(*) | name.given |
---|---|
2 | Simpson |
The SQL ORDER BY keywords can be used to suggest a specific order for the search results.
Note that ordering will currently only work on statements that return 10,000 rows or less. This limitation may be removed in a future release.
The following statement searches for Patients with a family name of Simpson, counts the frequencies of their first names, and orders by the most frequent names.
SELECT name.given, count(*)
FROM Patient
WHERE family = 'Simpson'
GROUP BY name.given
ORDER BY count(*) DESC
FHIRPath contains various helpful keywords that can be used to manipulate the data being returned.
A simple example showing string concatenation is shown below. Note the use of the [0]
indexes on repeatable fields. Without these this expression will fail with a FHIRPath execution error if multiple names are present, as FHIRPath does not allow string concatenation using a list of strings on either side of the + operator.
SELECT
Patient.name[0].given[0] + ' ' + Patient.name[0].family AS FullName
FROM Patient
The SQL LIMIT clause can be used to restrict the number of results that can be fetched. For example:
SELECT
id, name[0].family, name[0].given[0]
FROM
Patient
LIMIT
100