Not all semantic layers are equal.

Every BI vendor demos the same AI chat. The real differentiator is the semantic layer underneath, and most are more primitive than they look.

The demo isn't the product · The layer sets the ceiling · AI inherits whatever you ship

The demo trap

Every BI vendor shows "chat → dashboard." That's the LLM's ability, not theirs.

Any modern LLM will ace a look-up question on a single fact table, whichever vendor wraps it. That says almost nothing about whether the underlying system can answer the questions your business actually asks day to day.

What the demo shows

"Show revenue by region."

"List customers by country."

"Top 10 products by sales."

Simple metric + dimension combinations. Any LLM with a schema gets this.

What you'll actually be asked

"Of customers who signed up in January, what was 30-day conversion vs. April's cohort, and how does ARPU compare for the two groups?"

Cohort · Period comparison · Nested aggregation · Cross-grain ratio

Four operators stacked into one breath.

The actual mechanism

The semantic layer is what makes AI analytics reliable.

The AI's job is to get user intent right and hand it to the semantic layer. The layer then generates the SQL deterministically from shared definitions, without guessing or drift.

Raw text-to-SQL

LLM owns intent and execution.

Question → LLM writes SQL → Warehouse

Joins, GROUP BYs, CTEs invented per query
Metric logic reinvented every prompt, or stitched via SQL string templating. Strings compose; semantics don't.
No deterministic guarantee the SQL is correct
No governance — access control sits outside the prompt

With a real semantic layer

LLM owns intent. Semantic layer owns execution.

Question → LLM expresses intent → Layer compiles SQL → Warehouse

Metrics, dimensions, joins defined once, reused everywhere
SQL is generated deterministically
Same definitions everyone in the company uses
Access control enforced at the semantic layer
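
The split above can be sketched in a few lines. This is a minimal illustration, not Holistics's implementation: the `METRICS` and `DIMENSIONS` registries, the `Intent` type, and the `orders` table are all hypothetical. The point is that the LLM only picks names, and the same intent always compiles to the same SQL.

```python
from __future__ import annotations
from dataclasses import dataclass

# Hypothetical registry: every metric and dimension is defined exactly once.
METRICS = {"revenue": "SUM(orders.amount)", "order_count": "COUNT(orders.id)"}
DIMENSIONS = {"region": "orders.region", "month": "orders.order_month"}


@dataclass(frozen=True)
class Intent:
    """What the LLM hands over: registered names only, never SQL text."""
    metric: str
    dimensions: tuple[str, ...] = ()


def compile_sql(intent: Intent) -> str:
    """Compile an intent into SQL using only the registered definitions."""
    if intent.metric not in METRICS:
        raise KeyError(f"unknown metric: {intent.metric}")
    unknown = [d for d in intent.dimensions if d not in DIMENSIONS]
    if unknown:
        raise KeyError(f"unknown dimensions: {unknown}")
    select = [f"{DIMENSIONS[d]} AS {d}" for d in intent.dimensions]
    select.append(f"{METRICS[intent.metric]} AS {intent.metric}")
    sql = f"SELECT {', '.join(select)} FROM orders"
    if intent.dimensions:
        sql += " GROUP BY " + ", ".join(DIMENSIONS[d] for d in intent.dimensions)
    return sql


intent = Intent(metric="revenue", dimensions=("region",))
sql = compile_sql(intent)
print(sql)
```

An intent that names an unregistered metric fails loudly instead of producing improvised SQL, which is the property raw text-to-SQL lacks.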

The picture

Your semantic layer sets your self-service ceiling.

Conventional SQL-based semantic layers handle the simple half. The questions stakeholders actually ask, day to day, sit above the ceiling.

↑ Above the ceiling · pulls an analyst back in

Nested aggregation

"Average revenue per active user."

Period over period

"MTD vs. previous MTD" / "YoY %."

Cohort analysis

"Of Jan signups, how many bought within 7 days?"

Cross-grain ratios

"Conversion rate," numerator and denominator at different grains.

Cross-model metrics

"Revenue spanning orders + products" via relationships.

Parameterized metrics

"N-day conversion" for N = 7, 14, 30, 90.

Where conventional SQL semantic layers stop.

Every question above this line is a ticket back to the data team — and the same ceiling your AI agent inherits the moment you ship it.

Look-up

"Revenue last month."

Slice & dice

"Revenue by region by month."

Filter / group

"Active users by tier in APAC."

Top-N

"Top 10 products by sales."

↓ Below the ceiling · handled by any semantic layer
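
Why a cross-grain ratio sits above the ceiling is easiest to see in plain SQL. The sketch below uses Python's `sqlite3` with a made-up `users` / `orders` schema and toy rows (all assumptions) to compute "revenue per user" two ways: joining first, and aggregating each side at its own grain.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, amount INTEGER);
    INSERT INTO users  (id) VALUES (1), (2), (3);
    INSERT INTO orders (user_id, amount) VALUES (1, 10), (1, 20), (1, 30), (2, 40);
""")

# Naive: join first, aggregate second. The join repeats user 1 three times
# and drops user 3, so the denominator counts order rows, not users.
naive = conn.execute("""
    SELECT SUM(o.amount) * 1.0 / COUNT(u.id)
    FROM users u JOIN orders o ON o.user_id = u.id
""").fetchone()[0]

# Grain-aware: aggregate each side at its own grain, then divide.
correct = conn.execute("""
    SELECT (SELECT SUM(amount) FROM orders) * 1.0
         / (SELECT COUNT(*) FROM users)
""").fetchone()[0]

print(f"naive={naive:.2f} correct={correct:.2f}")
```

Both queries run without error and both look plausible, but only the second divides total revenue by the actual number of users. A layer that cannot express the two grains separately leaves that choice to whoever, or whatever, writes the SQL.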

A name for the failure mode

Semantic leakage.

The share of real user questions that fall outside what a semantic layer can answer natively, forcing the answer to be generated from scratch elsewhere. When that happens, the system ends up in the exact failure mode the semantic layer was meant to prevent.

Raw text-to-SQL path

LLM owns intent and execution

User question

↓

LLM writes SQL from scratch

↓

Ungoverned, decentralized answer

Semantic layer path

LLM expresses intent; layer compiles SQL

User question

↓

Can the semantic layer answer natively?

YES →

Trusted, governed definitions

NO → leaks back to raw SQL

Semantic leakage
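
The definition translates directly into a number you can track. A minimal sketch, assuming a hypothetical set of natively supported question shapes and an illustrative (invented) question log:

```python
# Hypothetical capabilities of a conventional layer: the "below the ceiling" shapes.
SUPPORTED_SHAPES = {"lookup", "slice", "filter", "top_n"}


def route(question_shape: str) -> str:
    """Return which path answers the question."""
    return "semantic_layer" if question_shape in SUPPORTED_SHAPES else "raw_sql"


def leakage_rate(question_shapes: list[str]) -> float:
    """Share of questions that fall outside the layer and leak to raw SQL."""
    if not question_shapes:
        return 0.0
    leaked = sum(1 for shape in question_shapes if route(shape) == "raw_sql")
    return leaked / len(question_shapes)


# Illustrative log only; a real one would come from tagged user questions.
question_log = [
    "lookup", "cohort", "slice", "period_over_period",
    "top_n", "nested_aggregation", "filter", "cross_grain_ratio",
]
rate = leakage_rate(question_log)
print(f"semantic leakage: {rate:.0%}")
```

Tagging real questions by shape during a PoC gives the same ratio for any vendor's layer.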

After the leak

Vendors hide it. Data teams carry it.

Semantic leakage doesn't show up in the PoC — the vendor's job is to make sure it doesn't. The data team's job, for the next five years, is to keep it invisible to the rest of the org.

What the vendor sells you

Workbook-level calcs

Excel-like formulas the analyst writes inside the workbook, sold as "promotable to the semantic model in one click."

Downstream cost

Cell-address formulas can't actually be promoted. The team re-authors them as semantic metrics.
N analysts ship N variants of the same metric. The team becomes the human reconciliation layer.

What the vendor doesn't say

Pre-built derived tables

The silent default of SQL-based semantic layers. The team writes one denormalized model per question shape because the semantic layer can't.

Downstream cost

One new model per question shape. The warehouse fills with combinatorial table proliferation.
Refactors must traverse N tables that almost-but-not-quite agree. Migrations stall for months.

What the vendor calls "power-user freedom"

Ad-hoc SQL / SQL Runner

Analyst writes raw SQL in a query runner, bypassing the semantic model for anything the layer can't express.

Downstream cost

No lineage to the semantic model. Source schema changes break queries invisibly.
Each query reinvents the metric definition. Drift accumulates across the org.
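
Drift of this kind is easy to reproduce. In the sketch below (hypothetical `orders` table and toy rows), two ad-hoc queries both claim to compute "revenue" and disagree because one silently includes refunds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, amount INTEGER, status TEXT);
    INSERT INTO orders (amount, status)
    VALUES (100, 'paid'), (50, 'refunded'), (25, 'paid');
""")

# Analyst A: "revenue" is every order.
revenue_a = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]

# Analyst B: "revenue" excludes refunds.
revenue_b = conn.execute(
    "SELECT SUM(amount) FROM orders WHERE status <> 'refunded'"
).fetchone()[0]

print(revenue_a, revenue_b)
```

Neither query is wrong on its face, and nothing links either one back to a shared definition, so the disagreement only surfaces when two dashboards are compared.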

The consequence

AI lowers the cost of asking. Your data team pays it back.

AI answers questions faster than your team can ask "is this right?" Every verification gate that used to catch errors — peer review, metric ownership, lineage — collapses under the volume.

When semantic leakage meets AI

Errors ship and stay shipped

Stakeholders ask

machine speed · 24/7 · AI lowers the cost of asking

↓

AI writes SQL

per query, from scratch

↓

∅ Where peer review used to be

No reviewer in the loop

volume too high · no anchor · no trail

↓

Ships to dashboard

Errors ship, and anyone who tries to catch them becomes the new bottleneck.

The question every executive eventually asks

"Who confirms any of this is correct?"

Without a semantic layer that the AI itself uses, there is no verifier. Confident-sounding answers ship with no audit trail.

The semantic layer · how it's built

AQL + AML. Two real languages, not config files.

Holistics's semantic layer is built on two purpose-built languages — a composable query language and a native development language — both compiled down to standard SQL.

AQL · Analytics Query Language

Composable query language for metrics

Metrics as first-class objects

Named, stored, modified, combined like variables. Single source of truth.

metric arpu = revenue / users

High-level analytics functions

Time intelligence, cohort, nested aggregation, and level-of-detail are first-class primitives.

revenue | relative_period(year, -1)

Fully composable operations

A pipe operator chains operations like Lego blocks. Complex logic builds up from simple primitives.

revenue | where(...) | by(country) | of_all()

AML · Analytics Modeling Language

Native development language for analytics

Programmable constructs, like a real language

The same building blocks programmers use to keep code DRY — without YAML config or Jinja-templated SQL.

const · Func · extend · Partial · ${...} · if / else · module

Type-safe IDE with a live feedback loop

Strongly typed, like TypeScript for analytics. Errors are caught as you type, before production. The same type system acts as a guardrail that keeps AI from misreading the semantic layer.

✓ Static type checking ✓ Smart autocomplete ✓ Inline documentation ✓ Go-to-definition ✓ Compile-time error checks ✓ Find-all-references

↓ both compile to ↓

SQL

Native dialect SQL · the shared substrate

One semantic layer. One compiled SQL output. Every warehouse.

Standard SQL, every warehouse

Snowflake · BigQuery · Postgres · Redshift · Databricks. No proprietary execution layer, no lock-in.

Debuggable, auditable, AI-readable

Analysts read, debug, and explain the generated SQL. Existing query tooling just works.

What "composable" means

Composable metrics, in code.

One AML snippet expresses cross-model metrics, multi-step composition, and a nested top-N — avoiding CTEs, derived tables, and string-interpolated SQL altogether.

AML · "Total & average revenue of top 3 countries (by user count) in each region"

metric revenue        = sum(order_items, products.price * order_items.quantity);
metric user_count     = users | count(users.id);

metric top_countries  = top(3, countries.name, by: user_count);
metric top_country_names = top_countries | select(countries.name);

metric total_revenue  = top_countries | sum(revenue);
metric avg_revenue    = top_countries | avg(revenue);

explore {
  dimensions { continent: countries.continent_name }
  measures   {
    top_country_names, avg_revenue, total_revenue
  }
}

Cross-model

revenue reaches across order_items and products via the relationship. Defined once, reused across models.

Metric on top of metric

total_revenue and avg_revenue compose revenue and top_countries. No copy-paste, no duplication drift.

Multi-step / nested

top_countries is itself a metric. "Top countries by user_count, then revenue within those" composes as one named pipeline.

No string interpolation

Every reference is a named metric. No SQL strings mashed together by Jinja or macros. The semantics stay whole — for humans and for AI.

Without this composability, you'd write ~120 lines of SQL across two CTEs and a derived table, re-implemented every time the question changes shape.
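
For a sense of what the hand-written version looks like, here is a deliberately small equivalent on a toy schema, run through Python's `sqlite3`. It is not the SQL Holistics compiles: the tables, rows, and the single `orders.amount` column (standing in for price × quantity) are assumptions, and `ROW_NUMBER()` requires SQLite 3.25 or later.

```python
import sqlite3

# Toy data (illustrative): country, continent, number of users, revenue.
COUNTRIES = [
    ("FR", "Europe", 4, 100), ("DE", "Europe", 3, 200), ("ES", "Europe", 2, 300),
    ("IT", "Europe", 1, 1000), ("JP", "Asia", 2, 50), ("SG", "Asia", 1, 150),
]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE countries (name TEXT PRIMARY KEY, continent TEXT);
    CREATE TABLE users  (id INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, amount INTEGER);
""")
for name, continent, n_users, revenue in COUNTRIES:
    conn.execute("INSERT INTO countries VALUES (?, ?)", (name, continent))
    ids = [conn.execute("INSERT INTO users (country) VALUES (?)", (name,)).lastrowid
           for _ in range(n_users)]
    # One order per country, attached to its first user, keeps the totals easy to check.
    conn.execute("INSERT INTO orders (user_id, amount) VALUES (?, ?)", (ids[0], revenue))

TOP_N_SQL = """
WITH user_counts AS (            -- step 1: user count per country
    SELECT c.name, c.continent, COUNT(u.id) AS user_count
    FROM countries c LEFT JOIN users u ON u.country = c.name
    GROUP BY c.name, c.continent
),
country_revenue AS (             -- step 2: revenue per country
    SELECT u.country AS name, SUM(o.amount) AS revenue
    FROM orders o JOIN users u ON u.id = o.user_id
    GROUP BY u.country
),
ranked AS (                      -- step 3: rank countries within each continent
    SELECT uc.name, uc.continent,
           ROW_NUMBER() OVER (PARTITION BY uc.continent
                              ORDER BY uc.user_count DESC, uc.name) AS rk
    FROM user_counts uc
)
SELECT r.continent, SUM(cr.revenue) AS total_revenue, AVG(cr.revenue) AS avg_revenue
FROM ranked r JOIN country_revenue cr ON cr.name = r.name
WHERE r.rk <= ?                  -- step 4: keep the top N, then aggregate again
GROUP BY r.continent
ORDER BY r.continent
"""

rows = conn.execute(TOP_N_SQL, (3,)).fetchall()
for continent, total, avg in rows:
    print(continent, total, avg)
```

Even at toy scale the question needs three CTEs and a second aggregation pass, and every step restates joins the model already knows. Change "top 3 by user count" to "top 5 by revenue" and most of the query gets rewritten by hand.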

Receipts

Every question above the ceiling has a native primitive.

Each primitive below is native, reusable, and AI-readable. Each card is one less place where semantic leakage can happen.

Nested aggregation

group() | aggregate() | aggregate()

"Average monthly signups across years." Aggregate of aggregate, one composable metric.

Period over period

previous() · relative_period()
period_to_date()

Same period last year, YoY %, MTD vs. previous MTD. Reusable across dashboards.

Cohort analysis

with_relationships()
filter(cohort_dim)

Segment a population by signup window, then track behavior at the cohort grain across time.

Cross-grain & cross-model

relationship()
with_relationships()

Numerator and denominator at different grains. One metric, auto-composed across models.

Parameterized metrics

metric args · dataset fields

N-day conversion for N = 7, 14, 30, 90. One metric, parameterized at query time.

Multi-step composition

pipe | · metric-of-metric

Compose group → filter → aggregate as named, reusable steps. No monolithic CTE.
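
As a point of comparison for the cohort and parameterized cards, here is the same idea in plain SQL via Python's `sqlite3`: one query definition, with N and the cohort window bound at query time. It does not use the AQL functions named above; the schema, rows, and the `n_day_conversion` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, signup_date TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, order_date TEXT);
    INSERT INTO users VALUES (1, '2024-01-05'), (2, '2024-01-10'),
                             (3, '2024-01-20'), (4, '2024-02-01');
    INSERT INTO orders (user_id, order_date) VALUES
        (1, '2024-01-08'), (2, '2024-01-30'), (3, '2024-03-15'), (4, '2024-02-02');
""")

# One definition. The numerator is counted at the user grain, so a user with
# several qualifying orders still converts once.
CONVERSION_SQL = """
    SELECT COUNT(DISTINCT CASE
               WHEN julianday(o.order_date) - julianday(u.signup_date)
                    BETWEEN 0 AND :n_days THEN u.id END) * 1.0
           / COUNT(DISTINCT u.id)
    FROM users u
    LEFT JOIN orders o ON o.user_id = u.id
    WHERE u.signup_date >= :cohort_start AND u.signup_date < :cohort_end
"""


def n_day_conversion(n_days: int, cohort_start: str, cohort_end: str):
    """Share of the signup cohort that ordered within n_days of signing up.

    Returns None when the cohort is empty (SQLite yields NULL on division by zero).
    """
    params = {"n_days": n_days, "cohort_start": cohort_start, "cohort_end": cohort_end}
    return conn.execute(CONVERSION_SQL, params).fetchone()[0]


january = {n: n_day_conversion(n, "2024-01-01", "2024-02-01") for n in (7, 14, 30, 90)}
for n, value in january.items():
    print(f"{n}-day conversion: {value:.2f}")
```

The useful test for any layer is whether N and the cohort window stay parameters of one governed metric, or whether each value of N turns into its own copy of this query.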

The take-home

The ceiling of your semantic layer is the ceiling of your AI analytics.

A PoC that only proves look-up doesn't prove much. The prompts below surface the ceiling early. Ask them of any vendor — including us.

Test the layer, not the chat box

Show me a nested aggregation as one reusable metric, instead of a derived table.
Show me a period-over-period comparison without rewriting SQL.
Show me a cohort metric where the cohort window is a parameter.
Show me a ratio where numerator and denominator live at different grains.

Find the leakage

When a question exceeds the model's reach, does the tool close the gap at the model layer, or hand me an escape hatch (workbook calc, SQL Runner, derived table)?
When AI can't answer from the semantic layer, what does it do — i.e., where does it leak?
Can I read what the AI generated before it hits the warehouse?
Does the AI reuse my metric definitions, or invent new ones each prompt?
Does access control apply to AI-generated queries automatically?

See the semantic layer doing the work.

The fastest way to understand the difference is to put a hard question to it and read the SQL it compiles.