Where health data becomes knowledge

UQL · Unison Query Language

One query. Every biobank. Aggregate-only by design.

UQL is a clinical-research query language that compiles to federated execution, returns aggregate-only results, and leaves a replayable artefact behind. Write the cohort once. Run it everywhere your data lives.

Read the docs · Try it in a sandbox

docker run entsupml/unison-runner:0.4.8

cohort.yaml

SELECT c1.condition_concept_id · gender_concept_id
JOIN condition_occurrence: c1 · drug_exposure: d1
FROM [BIOBANK_A, EHR_B, NATIONAL_C]
WHERE c1.condition_concept_id IN find_concepts("Type 2 diabetes mellitus", with_descendants=true)
      year_of_birth <= 2008
LIMIT 30
# compile → execute federated → return aggregate

READ THE QUERY

A protocol you can actually read.

UQL looks like the protocol, not like SQL. Cohort, exclusions, outcome, comparator, federation, return shape — expressed once, versioned, replayable.

Declarative

YAML cohorts against OMOP concepts. Not raw SQL.

OMOP-native

Concepts, not local codes.

Federation-aware

A single query fans out.

Aggregate-only

Patient-level data never moves.

Versioned

Every execution an artefact.

Agent-ready

MCP surface built-in.

01 · Why UQL exists

SQL wasn't designed for federated clinical research. We wrote something that was.

WHAT HURT BEFORE

SQL per biobank — different schemas, different dialects, different answers.
Cohort logic re-written for every network; reproducibility drifts.
Patient-level movement becomes the bottleneck — legal, ethical, technical.
No artefact a regulator or reviewer can re-run.

WHAT UQL DOES

One query against OMOP concepts. Compiles to the dialect of each target.
Cohort logic is pre-specified, versioned, auditable. Drift becomes observable.
Runners execute in situ at each custodian and return aggregates only.
Every execution is a replayable artefact — reviewer re-runs, not just reads.

02 · Anatomy of a query

Seven primitives. Everything else composes from them.

cohort

Population definition

Named, versioned, reusable. A cohort is a first-class artefact, not a scratchpad.

where

Inclusion criteria

OMOP concepts, temporal windows, lab-value thresholds. Composable with boolean logic.

exclude

Exclusion criteria

Prior exposures, contraindications, washout windows. Same shape as inclusion.

outcome

Event of interest

Clinical events, lab changes, device follow-ups. Time-to-event or count, by window.

compare

Comparator & matching

Pre-specified PS matching, external controls, comparator arms. Methods locked before execution.

across

Federation target

One or more Runners. The query fans out; results fan in as aggregate-only.

return

Shape of the answer

Tables, curves, artefact handles. Never patient-level data unless explicitly pre-approved.
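As a rough illustration of how the seven primitives compose, the sketch below models a query as a single immutable record. The class, field names, and values are assumptions made for this example, not UQL's actual grammar; `returns` stands in for `return`, which is a reserved word in Python.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuerySketch:
    """Hypothetical container with one field per primitive (not UQL's real schema)."""
    cohort: str                # population definition
    where: tuple[str, ...]     # inclusion criteria
    exclude: tuple[str, ...]   # exclusion criteria
    outcome: str               # event of interest
    compare: str               # comparator and matching method
    across: tuple[str, ...]    # federation targets (Runners)
    returns: tuple[str, ...]   # shape of the answer

    def problems(self) -> list[str]:
        """Return readable problems; an empty list means nothing required is missing."""
        found = []
        if not self.cohort:
            found.append("cohort must be named")
        if not self.where:
            found.append("at least one inclusion criterion is required")
        if not self.across:
            found.append("at least one federation target is required")
        if not self.returns:
            found.append("a return shape is required")
        return found


# Illustrative values only, loosely following the examples on this page.
query = QuerySketch(
    cohort="t2d-adults-v1",
    where=("Type 2 diabetes mellitus, with descendants", "year_of_birth <= 2008"),
    exclude=("prior GLP-1 exposure",),
    outcome="MACE within 24 months",
    compare="GLP-1 vs DPP-4, propensity-score matched",
    across=("BIOBANK_A", "EHR_B", "NATIONAL_C"),
    returns=("table", "curve"),
)
print(query.problems())
```

Freezing the record mirrors the idea that a query is locked before execution: any change means creating a new, separately versioned object.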

03 · How a UQL query executes

Compile once. Fan out. Fan in. Aggregate-only.

Author

Write the query as a protocol. Version it. Lock before execution.

Compile

UQL compiles to the dialect of each target — PostgreSQL, MySQL, SQL Server, Spark.
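To make the compile step concrete, here is a minimal sketch of rendering one aggregate count into different SQL dialects. It is not UQL's compiler: the function name, the dialect labels, and the concept ID are assumptions for illustration. The one dialect difference it handles is real, though: SQL Server limits rows with `TOP (n)`, while PostgreSQL, MySQL, and Spark SQL use a trailing `LIMIT n`.

```python
def render_count_query(dialect: str, concept_ids: list[int], limit: int) -> str:
    """Render a per-concept patient count against the OMOP condition_occurrence table."""
    # Coerce to int so only numeric literals are interpolated into the SQL text.
    ids = ", ".join(str(int(c)) for c in concept_ids)
    body = (
        "condition_concept_id, COUNT(DISTINCT person_id) AS n "
        "FROM condition_occurrence "
        f"WHERE condition_concept_id IN ({ids}) "
        "GROUP BY condition_concept_id ORDER BY n DESC"
    )
    if dialect == "sqlserver":
        return f"SELECT TOP ({int(limit)}) {body}"
    if dialect in {"postgresql", "mysql", "spark"}:
        return f"SELECT {body} LIMIT {int(limit)}"
    raise ValueError(f"unsupported dialect: {dialect}")


# 201826 is an assumed stand-in for the IDs a concept lookup would resolve.
for target in ("postgresql", "sqlserver"):
    print(target, "->", render_count_query(target, [201826], 30))
```

The same logical query yields two different strings, which is why per-site hand-written SQL tends to drift while a single compiled source does not.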

Federate

Runners execute in-situ at each custodian. Data never leaves the boundary.
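The fan-out and fan-in pattern can be sketched in a few lines. This is an assumption-laden illustration, not the Runner protocol: each Runner is stubbed as a local function returning made-up aggregate counts, and pooling is a plain sum.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def fan_out(query, runners):
    """Send the same query to every Runner concurrently; collect per-site aggregates."""
    if not runners:
        raise ValueError("at least one Runner is required")
    with ThreadPoolExecutor(max_workers=len(runners)) as pool:
        futures = {site: pool.submit(run, query) for site, run in runners.items()}
        return {site: future.result() for site, future in futures.items()}


def fan_in(per_site):
    """Pool per-site counts by summing them; no patient rows are involved."""
    pooled = Counter()
    for counts in per_site.values():
        pooled.update(counts)
    return dict(pooled)


def stub_runner(counts):
    """Hypothetical Runner: ignores the query and returns fixed, made-up counts."""
    def run(query):
        return dict(counts)
    return run


runners = {
    "BIOBANK_A": stub_runner({"female": 120, "male": 95}),
    "EHR_B": stub_runner({"female": 80, "male": 110}),
    "NATIONAL_C": stub_runner({"female": 300, "male": 280}),
}
per_site = fan_out("cohort.yaml", runners)
pooled = fan_in(per_site)
print(pooled)  # {'female': 500, 'male': 485}
```

Only dictionaries of counts cross the boundary between `fan_out` and `fan_in`, which is the property the aggregate-only design relies on.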

Aggregate

Only aggregate results return. Small-cell suppression applied per custodian policy.
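Small-cell suppression is simple to sketch. The version below assumes one common convention, masking any non-zero count below a threshold, with a threshold of 5 chosen purely for illustration; actual custodian policies vary, and this sketch does not cover complementary suppression of totals.

```python
def suppress_small_cells(counts, min_cell=5):
    """Replace non-zero counts below min_cell with None before results leave the site."""
    if min_cell < 1:
        raise ValueError("min_cell must be at least 1")
    return {key: (None if 0 < n < min_cell else n) for key, n in counts.items()}


# Made-up site-level counts for illustration.
site_counts = {"female": 120, "male": 3, "unknown": 0}
print(suppress_small_cells(site_counts))
# {'female': 120, 'male': None, 'unknown': 0}
```

Applying the mask at the custodian, before anything is returned, means the pooled result can never reveal a cell the site's own policy would have hidden.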

Artefact

Every execution produces a versioned, replayable artefact. Reviewer re-runs, not just reads.
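One way to make an execution checkable on replay is to derive an identifier from its content. The sketch below hashes a canonical JSON form of the query and its aggregate result; it is an assumed scheme for illustration and not how UQL actually derives its artefact handles.

```python
import hashlib
import json


def artefact_id(query: dict, result: dict) -> str:
    """Derive a short, deterministic ID from the query and its aggregate result."""
    # Sorted keys and fixed separators give a canonical byte string for hashing.
    payload = json.dumps(
        {"query": query, "result": result},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


# Made-up query and result for illustration.
query = {"cohort": "t2d-adults-v1", "across": ["BIOBANK_A", "EHR_B"], "limit": 30}
result = {"female": 500, "male": 485}

first_run = artefact_id(query, result)
replay = artefact_id(dict(reversed(list(query.items()))), result)  # same content, new key order
changed = artefact_id({**query, "limit": 10}, result)

print(first_run == replay)   # True: a faithful replay reproduces the ID
print(first_run == changed)  # False: any change to the query shows up
```

A reviewer who re-runs the query and gets a different ID knows that either the protocol or the data has changed, which is what makes drift observable.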

AGENT-READY BY DESIGN

UQL speaks MCP. Point an agent at the federation; it writes protocol-shaped queries, not SQL improv.

UQL exposes its grammar, concept catalogue, and federation topology through an MCP server. An LLM can reason about cohorts without inventing schema — and every generated query lands as a versioned, replayable artefact.

Grammar-aware

LLM authors valid YAML UQL via MCP tools

Concept-grounded

OMOP vocabularies exposed as tools

Aggregate-only

Agent cannot exfiltrate patient rows

Artefact-logged

Every run traceable, replayable

# agent · mcp://unison/uql
> "GLP-1 vs DPP-4, MACE at 24m, T2D adults, across our federation."
→ tool: biobank_list · 3 online
→ tool: biobank_search_mapped_values("%GLP-1%") · 14 concepts
→ tool: cohort_execute_query · 3 biobanks · aggregate-only
Pooled PS-matched · n(GLP-1)=24,812 · n(DPP-4)=24,812
24m MACE · GLP-1 4.2% · DPP-4 5.6%
HR 0.74 (0.67–0.82)
# uql://query/mace-glp-9d42 · versioned · replayable
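Under MCP, tool invocations like the three in the transcript above travel as JSON-RPC 2.0 requests with the method `tools/call`. The sketch below builds such requests for the tool names shown; the argument keys (`pattern`, `query`) and the placeholder query text are assumptions, since the page does not document the tools' input schemas.

```python
import json
from itertools import count

_request_ids = count(1)


def tool_call(name: str, arguments: dict) -> dict:
    """Build an MCP tools/call request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Tool names come from the transcript; argument keys are hypothetical.
requests = [
    tool_call("biobank_list", {}),
    tool_call("biobank_search_mapped_values", {"pattern": "%GLP-1%"}),
    tool_call("cohort_execute_query", {"query": "<UQL text goes here>"}),
]

for request in requests:
    print(json.dumps(request))
```

Because the agent can only reach the federation through named tools, what it may ask for is bounded by what those tools expose.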

04 · What UQL runs against

If it speaks OMOP — or can be mapped to it — UQL can query it.

DATABASES

· PostgreSQL

· MySQL

· SQL Server

· Spark / Hive

NETWORKS

· OHDSI

· DARWIN EU

· National data bodies

· Consortia

EXECUTION

· Docker

· Kubernetes

· Nextflow pipelines

INTEGRATION

· MCP tools

· REST API

Trust model: Aggregate-only returns · Small-cell suppression · Versioned artefacts · Versioned concepts · Custodian-controlled Runners

05 · Who writes UQL

Not just the data team.

BIOSTATISTICIANS

Methods as code

Pre-specify matching, subgroup analyses and sensitivity analyses as versioned UQL — not SAP appendices that drift from the run.

EPIDEMIOLOGISTS

Protocol as query

The same cohort logic that appears in the protocol is the one that executes. Registry drift disappears.

CLINICAL LEADS

Templates, not syntax

Pick a pre-built template — burden-of-illness, persistence, external control — parameterise it, dispatch it.

UQL SDK · sandbox access · onboarding workshop

Write the protocol. Run it everywhere.

Read the docs · Request sandbox