UQL · Unison Query Language

One query. Every biobank. Aggregate-only by design.

UQL is a clinical-research query language that compiles to federated execution, returns aggregate-only results, and leaves a replayable artefact behind. Write the cohort once. Run it everywhere your data lives.

Read the docs · Try it in a sandbox · pip install unison-uql · v1.4
cohort.uql
cohort "type-2-diabetes-adults"
where age >= 18 and hba1c >= 7.0
exclude prior_exposure("insulin", window="12m")
outcome major_adverse_cv_event within 24m
compare glp1_agonist vs dpp4_inhibitor · PS-matched
across federation("biobank-a", "ehr-b", "national-c")
return aggregate_only · replayable_artefact
# compile → execute federated → return aggregate
READ THE QUERY
A protocol you can actually read.

UQL looks like the protocol, not like SQL. Cohort, exclusions, outcome, comparator, federation, return shape — expressed once, versioned, replayable.

Protocol-shaped
Cohort · outcome · comparator, not joins.
OMOP-native
Concepts, not local codes.
Federation-aware
A single query fans out.
Aggregate-only
Patient-level data never moves.
Versioned
Every execution an artefact.
Agent-ready
MCP surface built-in.
01 · Why UQL exists

SQL wasn't designed for federated clinical research. We wrote something that was.

WHAT HURT BEFORE
  • SQL per biobank — different schemas, different dialects, different answers.
  • Cohort logic re-written for every network; reproducibility drifts.
  • Patient-level movement becomes the bottleneck — legal, ethical, technical.
  • No artefact a regulator or reviewer can re-run.
WHAT UQL DOES
  • One query against OMOP concepts. Compiles to the dialect of each target.
  • Cohort logic is pre-specified, versioned, auditable. Drift becomes observable.
  • Runner executes in-situ at each custodian. Aggregate-only returns.
  • Every execution is a replayable artefact — reviewer re-runs, not just reads.
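To make "compiles to the dialect of each target" concrete, here is a minimal Python sketch of how a single 12-month lookback window might be rendered for four engines. It is not the UQL compiler: `render_lookback`, the `TEMPLATES` table, and the dialect keys are hypothetical names chosen for illustration, and `index_date` is an assumed cohort-entry column. The column `drug_exposure_start_date` comes from the OMOP CDM `drug_exposure` table.

```python
# Hypothetical sketch: one lookback window, four SQL dialects.
# Names here are illustrative and are not part of the UQL SDK.

TEMPLATES = {
    "postgres": "{col} >= {anchor} - INTERVAL '{n} months'",
    "bigquery": "{col} >= DATE_SUB({anchor}, INTERVAL {n} MONTH)",
    "sqlserver": "{col} >= DATEADD(month, -{n}, {anchor})",
    "spark": "{col} >= add_months({anchor}, -{n})",
}


def render_lookback(dialect: str, col: str, anchor: str, months: int) -> str:
    """Return a dialect-specific predicate for 'col within N months before anchor'."""
    if months <= 0:
        raise ValueError("months must be positive")
    try:
        template = TEMPLATES[dialect]
    except KeyError:
        raise ValueError(f"unsupported dialect: {dialect}") from None
    return template.format(col=col, anchor=anchor, n=months)


if __name__ == "__main__":
    # Print the same logical window as each engine would see it.
    for name in TEMPLATES:
        print(name, "->", render_lookback(name, "drug_exposure_start_date", "index_date", 12))
```

The logical window is written once; only the date arithmetic changes per engine, which is the kind of difference that otherwise produces "different answers" from hand-written SQL.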
02 · Anatomy of a query

Seven primitives. Everything else composes from them.

cohort
Population definition
Named, versioned, reusable. A cohort is a first-class artefact, not a scratchpad.
where
Inclusion criteria
OMOP concepts, temporal windows, lab-value thresholds. Composable with boolean logic.
exclude
Exclusion criteria
Prior exposures, contraindications, washout windows. Same shape as inclusion.
outcome
Event of interest
Clinical events, lab changes, device follow-ups. Time-to-event or count, by window.
compare
Comparator & matching
Pre-specified PS matching, external controls, comparator arms. Methods locked before execution.
across
Federation target
One or more Runners. The query fans out; results fan in as aggregate-only.
return
Shape of the answer
Tables, curves, artefact handles. Never patient-level data unless explicitly pre-approved.
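One way to picture the seven primitives is as a single immutable record with one field per primitive. The sketch below is an assumption made for illustration, not the UQL SDK's data model: the `Query` class, its field names, and `validate` are hypothetical. The values are taken from the `cohort.uql` example at the top of this page.

```python
from dataclasses import dataclass, FrozenInstanceError
from typing import Tuple


@dataclass(frozen=True)  # frozen: a query cannot be mutated once constructed
class Query:
    """Hypothetical in-memory shape of a UQL query, one field per primitive."""
    cohort: str
    where: Tuple[str, ...]
    exclude: Tuple[str, ...]
    outcome: str
    compare: str
    across: Tuple[str, ...]
    returns: Tuple[str, ...]  # named "returns" because "return" is a Python keyword

    def validate(self) -> None:
        """Reject queries that lack a name or a federation target."""
        if not self.cohort:
            raise ValueError("cohort must be named")
        if not self.across:
            raise ValueError("at least one federation target is required")


EXAMPLE = Query(
    cohort="type-2-diabetes-adults",
    where=("age >= 18", "hba1c >= 7.0"),
    exclude=('prior_exposure("insulin", window="12m")',),
    outcome="major_adverse_cv_event within 24m",
    compare="glp1_agonist vs dpp4_inhibitor, PS-matched",
    across=("biobank-a", "ehr-b", "national-c"),
    returns=("aggregate_only", "replayable_artefact"),
)

if __name__ == "__main__":
    EXAMPLE.validate()
    try:
        EXAMPLE.cohort = "something-else"  # type: ignore[misc]
    except FrozenInstanceError:
        print("query is locked; create a new version instead")
```

Freezing the record mirrors the "lock before execution" idea: changing a criterion means creating a new version rather than editing one in place.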
03 · How a UQL query executes

Compile once. Fan out. Fan in. Aggregate-only.

01
Author
Write the query as a protocol. Version it. Lock before execution.
02
Compile
UQL compiles to the dialect of each target: Spark, PostgreSQL, BigQuery, or a custom Runner backend.
03
Federate
Runners execute in-situ at each custodian. Data never leaves the boundary.
04
Aggregate
Only aggregate results return. Small-cell suppression and differential-privacy hooks are available.
05
Artefact
Every execution produces a signed, replayable artefact. Reviewer re-runs, not just reads.
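The five steps above can be sketched in a few lines of plain Python. Everything here is an assumption made for illustration: the function names, the suppression threshold of 5, the rule that a cell suppressed at any site stays suppressed in the pool, and the made-up site counts. The digest is a SHA-256 content hash that makes a run checkable on replay; it is not a cryptographic signature, which would additionally require a signing key.

```python
import hashlib
import json
from typing import Callable, Dict, Optional

MIN_CELL = 5  # assumed threshold; in practice each custodian sets its own

Counts = Dict[str, int]
SafeCounts = Dict[str, Optional[int]]


def suppress(counts: Counts, min_cell: int = MIN_CELL) -> SafeCounts:
    """Replace counts below the threshold with None before they leave the site."""
    return {k: (v if v >= min_cell else None) for k, v in counts.items()}


def fan_out(query_text: str, runners: Dict[str, Callable[[str], Counts]]) -> Dict[str, SafeCounts]:
    """Run the query at each site; only suppressed aggregates come back."""
    return {site: suppress(run(query_text)) for site, run in runners.items()}


def fan_in(per_site: Dict[str, SafeCounts]) -> SafeCounts:
    """Pool per-site counts; a cell suppressed anywhere stays suppressed (conservative)."""
    pooled: SafeCounts = {}
    for counts in per_site.values():
        for key, value in counts.items():
            if key not in pooled:
                pooled[key] = value
            elif pooled[key] is None or value is None:
                pooled[key] = None
            else:
                pooled[key] += value
    return pooled


def artefact_digest(query_text: str, pooled: SafeCounts) -> str:
    """Content hash over query and result, so a replay can be compared to the original."""
    payload = json.dumps({"query": query_text, "result": pooled}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Made-up runners standing in for custodian-side execution.
    runners = {
        "biobank-a": lambda q: {"events": 120, "at_risk": 2000},
        "ehr-b": lambda q: {"events": 3, "at_risk": 150},
    }
    query = 'cohort "type-2-diabetes-adults"'
    pooled = fan_in(fan_out(query, runners))
    print(pooled)
    print(artefact_digest(query, pooled))
```

Because the payload is serialised with sorted keys, re-running the same query against unchanged data yields the same digest, which is what lets a reviewer re-run rather than just read.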
AGENT-READY BY DESIGN

UQL speaks MCP. Point an agent at the federation; it writes protocol-shaped queries, not SQL improv.

UQL exposes its grammar, concept catalogue, and federation topology through an MCP server. An LLM can reason about cohorts without inventing schema — and every generated query lands as a versioned, replayable artefact.

Grammar-aware
LLM compiles to valid UQL, not near-SQL
Concept-grounded
OMOP vocabularies exposed as tools
Aggregate-only
agent cannot exfiltrate patient rows
Artefact-logged
every run traceable, replayable
# agent · mcp://unison/uql
> "GLP-1 vs DPP-4, MACE at 24m, T2D adults, across our federation."
→ tool: list_concepts("glp1_agonist") · 14 matched
→ tool: compile_uql → cohort.uql v3
→ tool: dispatch_federated · 3 targets · aggregate-only
Pooled PS-matched · n(GLP-1)=24,812 n(DPP-4)=24,812
24m MACE GLP-1 4.2% DPP-4 5.6%
HR 0.74 (0.67–0.82)
# uql://query/mace-glp-9d42 · signed · replayable
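The pooled figures in the trace above can be sanity-checked with simple arithmetic. The sketch below computes crude effect measures from the two 24-month risks shown (4.2% and 5.6%); the function `crude_effects` is a hypothetical helper, not part of UQL. A hazard ratio comes from a time-to-event model, so it need not equal the crude risk ratio, but the two should be close when follow-up is similar in both arms.

```python
from typing import Dict


def crude_effects(risk_treated: float, risk_comparator: float) -> Dict[str, float]:
    """Crude risk ratio, risk difference, and number needed to treat from two risks."""
    if not 0 <= risk_treated <= 1 or not 0 < risk_comparator <= 1:
        raise ValueError("risks must be proportions, and the comparator risk must be > 0")
    ratio = risk_treated / risk_comparator
    difference = risk_treated - risk_comparator
    # NNT is undefined when the risks are equal; report infinity in that case.
    nnt = float("inf") if difference == 0 else 1.0 / abs(difference)
    return {"risk_ratio": ratio, "risk_difference": difference, "nnt": nnt}


if __name__ == "__main__":
    effects = crude_effects(0.042, 0.056)
    print(f"risk ratio      {effects['risk_ratio']:.2f}")
    print(f"risk difference {effects['risk_difference'] * 100:.1f} percentage points")
    print(f"NNT             {effects['nnt']:.0f}")
```

The crude risk ratio is approximately 0.75, in line with the reported HR of 0.74, and the absolute difference of 1.4 percentage points corresponds to roughly 71 patients treated per event avoided over 24 months.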
04 · What UQL runs against

If it speaks OMOP — or can be mapped to it — UQL can query it.

WAREHOUSES
· Spark
· BigQuery
· Snowflake
· Databricks
DATABASES
· PostgreSQL
· SQL Server
· Oracle
· DuckDB
NETWORKS
· OHDSI
· DARWIN EU
· National data bodies
· Consortia
CUSTOM
· Runner SDK
· Python / R bindings
· MCP connectors
· REST & gRPC
Trust model: Aggregate-only returns · Small-cell suppression · Differential-privacy hooks · Signed artefacts · Versioned concepts · Custodian-controlled Runners · 21 CFR Part 11-ready
05 · Who writes UQL

Not just the data team.

BIOSTATISTICIANS
Methods as code
Pre-specify matching, subgroup analyses, and sensitivity analyses as versioned UQL, rather than as statistical analysis plan (SAP) appendices that drift from the run.
EPIDEMIOLOGISTS
Protocol as query
The same cohort logic that appears in the protocol is the one that executes. Registry drift disappears.
CLINICAL LEADS
Templates, not syntax
Pick a pre-built template — burden-of-illness, persistence, external control — parameterise it, dispatch it.
UQL SDK · sandbox access · onboarding workshop

Write the protocol. Run it everywhere.