Semantic Leakage
Semantic leakage is what happens when business logic that should live in the governed semantic layer escapes into other parts of the stack. The definition of a metric like "revenue" starts out in one governed place but gradually gets reimplemented: in dbt models, in dashboard formulas, in spreadsheets, in analyst memory.
Over time, metric definitions fork. Trust erodes. And the semantic layer loses its authority as the single source of truth.
Two directions of leakage
Semantic leakage flows in two directions, each with different causes and consequences.
Upstream leakage pushes logic into the transformation layer. When a semantic layer can't express a nested aggregation or cross-grain ratio natively, teams build derived tables in dbt or LookML to pre-compute the metric. The business logic now lives in a transformation pipeline, outside the governed layer. Changes require engineering cycles. Business users can't modify or inspect the logic. And the metric is no longer portable – it's locked into a specific materialization.
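As a minimal sketch of the upstream pattern, using a hypothetical `orders` schema and SQLite as a stand-in warehouse: the "revenue" rule (completed orders only) gets frozen into a pre-computed table, which is exactly the logic that should have lived in the semantic layer.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE orders (id INTEGER, amount REAL, status TEXT, created_month TEXT);
INSERT INTO orders VALUES
  (1, 100.0, 'complete',  '2024-01'),
  (2,  50.0, 'complete',  '2024-01'),
  (3,  80.0, 'cancelled', '2024-02'),
  (4, 120.0, 'complete',  '2024-02');

-- Upstream leakage: the "revenue" rule (completed orders only) is baked
-- into a pre-computed derived table, outside the governed layer. Changing
-- the rule now means changing and re-running the pipeline.
CREATE TABLE monthly_revenue AS
SELECT created_month, SUM(amount) AS revenue
FROM orders
WHERE status = 'complete'
GROUP BY created_month;
""")

rows = con.execute(
    "SELECT created_month, revenue FROM monthly_revenue ORDER BY created_month"
).fetchall()
print(rows)  # [('2024-01', 150.0), ('2024-02', 120.0)]
```

The BI layer only ever sees `monthly_revenue`; a business user who wants to include cancelled-but-paid orders has no way to inspect or change the filter.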
Downstream leakage pushes logic into the consumption layer. When the semantic layer can't handle a follow-up question, users write table calculations in Looker, DAX overrides in Power BI, or export to a spreadsheet and compute there. The business logic now lives in a dashboard or a Google Sheet. It's ungoverned, undocumented, and invisible to everyone else.
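The downstream pattern can be sketched the same way (again with a hypothetical `orders` table): an analyst exports raw rows and re-sums them spreadsheet-style, silently dropping the status filter that the governed definition applies.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE orders (id INTEGER, amount REAL, status TEXT);
INSERT INTO orders VALUES
  (1, 100.0, 'complete'),
  (2,  50.0, 'complete'),
  (3,  80.0, 'cancelled');
""")

# Governed definition: revenue counts completed orders only.
governed = con.execute(
    "SELECT SUM(amount) FROM orders WHERE status = 'complete'"
).fetchone()[0]

# Downstream leakage: every row is exported and re-summed in a
# spreadsheet, silently dropping the status filter.
exported_rows = con.execute("SELECT amount FROM orders").fetchall()
spreadsheet = sum(amount for (amount,) in exported_rows)

print(governed, spreadsheet)  # 150.0 230.0 -- two "revenues" now exist
```

Nothing in the stack records that the second number exists, which is what makes downstream leakage invisible until the numbers are compared side by side.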
Both directions share the same root cause: the semantic ceiling – the limit of what the semantic layer can express natively.
How to spot semantic leakage
A few diagnostic signals:
Derived table proliferation. If your dbt project or LookML project has a growing number of models that exist solely to pre-compute metrics for the BI layer, that's upstream leakage. The transformation layer is doing the semantic layer's job.
Table calculations in production dashboards. If business-critical dashboards contain calculations defined at the dashboard level rather than in the semantic layer, that's downstream leakage. These calculations bypass governance and often diverge from the "official" metric.
Spreadsheet exports for "real analysis." If analysts routinely export data to Excel or Google Sheets to compute metrics the BI tool can't handle, the semantic layer has lost control of those metrics.
"Which number is right?" conversations. When two dashboards show different values for the same metric, the underlying cause is almost always business logic living in more than one place.
Analyst tickets for repeatable questions. If the data team receives recurring requests for questions that should be self-service – like "compare this quarter to last quarter" – the semantic layer isn't expressive enough to handle them, and users have learned to route around it.
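The "Which number is right?" signal is easy to reproduce in miniature. A hedged sketch with hypothetical `orders` and `refunds` tables: two dashboards both report "revenue", but one is gross and one is net of refunds.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE orders  (id INTEGER, amount REAL);
CREATE TABLE refunds (order_id INTEGER, amount REAL);
INSERT INTO orders  VALUES (1, 100.0), (2, 200.0);
INSERT INTO refunds VALUES (1, 30.0);
""")

# Dashboard A: gross revenue.
rev_a = con.execute("SELECT SUM(amount) FROM orders").fetchone()[0]

# Dashboard B: revenue net of refunds -- same metric name, different logic.
rev_b = con.execute("""
    SELECT (SELECT SUM(amount) FROM orders) -
           (SELECT COALESCE(SUM(amount), 0) FROM refunds)
""").fetchone()[0]

print(rev_a, rev_b)  # 300.0 270.0 -- cue the "Which number is right?" thread
```

Both numbers are internally consistent; the conflict only exists because the definition lives in two places.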
Why AI makes semantic leakage worse
Historically, human analysts absorbed the ambiguity. When a business user asked a question the BI tool couldn't answer natively, an analyst wrote the SQL, applied institutional knowledge, and delivered the result. The leakage existed, but a human buffer contained it.
AI removes that buffer. When an AI interface receives a complex question and the semantic layer can't express it, the AI falls back to raw text-to-SQL – guessing table structures, join paths, and filter logic. The result looks polished. It may be wrong. And there's no analyst in the loop to catch the error.
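To make the failure mode concrete, here is a sketch of one common text-to-SQL mistake, using a hypothetical `orders`/`order_items` schema: a guessed join fans each order out across its line items, so the sum double counts. The query looks entirely plausible.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE orders (id INTEGER, amount REAL);
CREATE TABLE order_items (order_id INTEGER, sku TEXT);
INSERT INTO orders VALUES (1, 100.0), (2, 200.0);
INSERT INTO order_items VALUES (1, 'a'), (1, 'b'), (2, 'c');
""")

# Correct: sum amounts at the grain they are stored at.
correct = con.execute("SELECT SUM(amount) FROM orders").fetchone()[0]

# A guessed join fans order 1 out across its two line items,
# counting its amount twice -- polished-looking SQL, wrong answer.
guessed = con.execute("""
    SELECT SUM(o.amount)
    FROM orders o JOIN order_items i ON i.order_id = o.id
""").fetchall()[0][0]

print(correct, guessed)  # 300.0 400.0
```

A semantic layer that knows the grain of each table generates the first query; a model guessing join paths from column names can easily produce the second.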
The more AI surfaces become available to end users, the more semantic leakage matters. Every ungoverned fallback is a potential source of confidently wrong answers at scale.
Reducing semantic leakage
The primary lever is semantic layer expressiveness. A layer that handles nested aggregations, period-over-period comparisons, cross-grain ratios, and multi-step calculations natively gives business logic fewer reasons to escape.
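As one concrete case of the expressiveness lever, period-over-period comparison is a windowed query underneath. A sketch with a hypothetical `monthly_revenue` table (SQLite 3.25+ for window functions): this is the kind of SQL an expressive layer should generate on demand, so users never reach for a table calculation.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE monthly_revenue (month TEXT, revenue REAL);
INSERT INTO monthly_revenue VALUES
  ('2024-01', 150.0), ('2024-02', 120.0), ('2024-03', 180.0);
""")

# Period-over-period as a windowed query: LAG pulls the prior month's
# value onto each row so the delta is computed in one governed pass.
rows = con.execute("""
    SELECT month,
           revenue,
           revenue - LAG(revenue) OVER (ORDER BY month) AS mom_change
    FROM monthly_revenue
""").fetchall()
print(rows)
# [('2024-01', 150.0, None), ('2024-02', 120.0, -30.0), ('2024-03', 180.0, 60.0)]
```

When the layer can emit this itself, "compare this quarter to last quarter" stops generating analyst tickets and dashboard-level workarounds.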
The secondary lever is governance. Metric certification, audit trails, and version control make it visible when logic moves outside the governed layer – and create organizational pressure to pull it back in.
Neither lever works alone. Expressiveness without governance allows drift. Governance without expressiveness creates workarounds.
The Holistics Perspective
Holistics' AQL is designed to keep complex metric logic inside the semantic layer rather than pushing it into dbt models or dashboard formulas. Nested aggregations, period-over-period comparisons, and cross-grain ratios are expressed natively in AQL – removing the most common triggers for upstream and downstream leakage.
See how Holistics approaches this →