AI-ready data in practice: What dbt Semantic Layer and dbt's MCP server and agent skills do for your team

Last updated on May 19, 2026
When it comes to getting their data AI-ready, many organizations start with cleaning and structuring their data and then simply stop. This is an important first step, but it’s not the last step, because AI-ready data relies heavily on context: the layer of meaning that explains what your data actually represents.
You need to gather as much information as you can about that data: Where are data points coming from? Which team defines the metric? Which team owns inputting this data into a system? Without answers to questions like these, even clean, well-structured data can lead AI astray.
One way to think about AI is as a great teammate that knows SQL and analytics really, really well but knows zero about your organization. An agent doesn't know the different acronyms used in your industry, for example, and it doesn’t understand your business goals. For AI to work effectively and efficiently, you need to give it all that important context to make the data meaningful.
In practice, teams use dbt’s AI capabilities to make data meaningful to AI agents. dbt sits on top of platforms like Snowflake, BigQuery, and Databricks and transforms data there, without relying on stored procedures or other ad hoc transformation techniques. There are three key pieces to dbt’s AI stack: the dbt Semantic Layer, the dbt MCP server, and dbt agent skills. Here’s what they are, how they work together, and how to use them to ensure high-quality, AI-ready data.
The semantic data layer is your lens
The semantic data layer provides all of the context that the AI will need to understand your data: the structure of the data, how you work with the data, and what exists in the data.
I think of it like this: I have very bad eyesight. When I take my glasses off, I can still see things, but they are far from in focus. There will be some things that I miss and other things that are incomplete in my vision because I can't fully see everything. When I put my glasses on, I'm able to see clearly and completely. This is essentially what a semantic layer does for your data.
A generic semantic layer is like buying plain, off-the-rack reading glasses. It makes things somewhat clearer, but you will get good answers only some of the time, and you’re not getting the most detailed vision possible.
A governed, dbt-backed semantic layer gives you prescription lenses that are custom-focused for your business's vision, signed off by someone trusted, and updated through scheduled exams as your vision (your data, your definitions, your business) changes. AI wearing drugstore readers might see something somewhat clearly, but it'll squint and occasionally need to guess. AI wearing your prescription sees exactly what your business means by "revenue," "active customer," or "churn" and keeps seeing correctly as those definitions evolve.
So when we talk about gathering context around data, most of that context is typically handled within the semantic layer. This is especially true when it comes to what certain columns mean, what certain metrics are, and how different values or properties are to be calculated.
You don't need a perfect semantic layer to start
You can get a lot of use out of dbt’s AI tooling even without a semantic data layer in place. The semantic layer is mainly used for conversational AI, letting agents query your actual data and return reliable answers. But if you want to use dbt's AI tooling for development workflows, you don't need it. Diagnosing job failures, tracing column-level lineage, and similar tasks that speed up your workflow all work without one.
Don't let not having your data fully cleaned up, or not yet having your data fully defined in the semantic layer, be what stops you from using dbt’s AI tools. You can absolutely start using them now, and you can even use some of them to help build your semantic layer as you go.
Three pieces of the dbt AI stack
Terms like "agent skills" and "MCP server" can be intimidating when you first hear them. Let's demystify these.
MCP server: the tools. An MCP server exposes a set of tools, much like API calls, that an agent can use to communicate with applications on the backend. It also tells the agent how to make those calls and how to use what it gets back. For example, the dbt MCP server includes a tool called list_metrics for pulling data from the semantic layer and another called get_job_run_error for diagnosing failures. The dbt MCP server grounds those interactions in structured, dbt-native context, so agents are working from what your data actually means, not guessing from static documentation.
Agent skills: the instructions. dbt’s agent skills are workflow instructions that give your agent proven, opinionated guidance for common dbt tasks like writing tests, debugging failures, defining metrics, and handling migrations. They load on demand and only when relevant. An agent skill gives the agent the clear instructions it needs to complete a specific task. Skills also provide the agent with rules and guardrails: never do this; here are common pitfalls you may run into; here are things that you need to look out for.
How the semantic layer, MCP, and agent skills fit together: Each piece has a distinct role, and together they cover everything an agent needs to work effectively with your data. The semantic layer provides the context, MCP provides governed access, and agent skills provide the proven workflows agents need to query the data or to get the tools they need out of the MCP server.
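To make "a set of tools" a little more concrete, here is a minimal Python sketch of what a tool call looks like on the wire. MCP messages are JSON-RPC 2.0, and tools are invoked through the tools/call method. The tool names list_metrics and get_job_run_error come from the description above; the run_id argument and its value are hypothetical placeholders, not the dbt MCP server's documented parameters. In a real setup your agent's MCP client builds and sends these messages for you.

```python
import json
from itertools import count
from typing import Optional

# Simple incrementing request ids, as JSON-RPC expects an id on each request.
_request_ids = count(1)


def build_tool_call(tool_name: str, arguments: Optional[dict] = None) -> dict:
    """Build a JSON-RPC 2.0 request for MCP's `tools/call` method."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments or {}},
    }


# Tool name taken from the article; calling it with no arguments is an assumption.
list_metrics_request = build_tool_call("list_metrics")

# `run_id` and 12345 are hypothetical; check the server's tool schema for real names.
job_error_request = build_tool_call("get_job_run_error", {"run_id": 12345})

print(json.dumps(list_metrics_request, indent=2))
print(json.dumps(job_error_request, indent=2))
```

The point of the sketch is only that a tool is a named, structured call with arguments. The agent picks the tool, the server does the work, and the result comes back as structured data the agent can reason over.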
dbt’s AI tools in production to speed up data development
The best way to understand how these three pieces work together is to see them in action. One of our clients, a very large technology company, used them to feed structured data into a Slack channel where dbt errors are automatically sent. They hooked dbt's MCP server, along with Claude, into that error triage channel to look at those job failures and actually diagnose them.
The integration uses the get_job_run_error tool in the dbt MCP server to look at the error, and then has the agent analyze what happened and why. By the time a developer actually gets to that error, they can see a quick triage that has already been done, along with some possible solutions.
This integration is not fully set up for self-healing just yet. There are definitely controls around the AI, and it doesn't get everything right all of the time, but it's a huge time saver. Instead of having to go into the dbt platform and dig through the logs to find the specific problem, you have it all laid out there by your agent.
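The shape of that triage flow can be sketched in a few lines of Python. This is not the client's implementation: every name here (Triage, triage_failed_run, format_for_channel, and the two stand-in functions) is hypothetical, and the error text is made up for illustration. In a real integration, fetch_error would wrap the MCP tool call and diagnose would wrap a call to the model; here both are simple stubs so the sketch runs on its own.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Triage:
    run_id: int
    error_text: str
    diagnosis: str


def triage_failed_run(
    run_id: int,
    fetch_error: Callable[[int], str],
    diagnose: Callable[[str], str],
) -> Triage:
    """Fetch the error for a failed run, then ask the agent to explain it."""
    error_text = fetch_error(run_id)
    return Triage(run_id, error_text, diagnose(error_text))


def format_for_channel(triage: Triage) -> str:
    """Render a short message; a human still reviews and applies any fix."""
    return (
        f"dbt job run {triage.run_id} failed\n"
        f"Error: {triage.error_text}\n"
        f"Possible cause: {triage.diagnosis}\n"
        "Status: needs human review"
    )


# Stand-ins so the sketch runs offline; the error text is invented.
def fake_fetch_error(run_id: int) -> str:
    return "Database Error in model orders: column 'order_total' does not exist"


def fake_diagnose(error_text: str) -> str:
    return "An upstream column may have been renamed or dropped."


message = format_for_channel(triage_failed_run(101, fake_fetch_error, fake_diagnose))
print(message)
```

Keeping "needs human review" in the message reflects the controls described above: the agent does the first pass, and the developer decides what to do with it.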
That same team is also working on a GitHub action: if somebody creates a model and doesn't include a semantic layer definition, the agent will try to create one and send it back to the developer with a note: here's what I created, add on to it to make your semantic layer. The goal is to encourage that hygiene of getting that context as a natural part of the workflow, rather than an afterthought.
And, notably, neither of these use cases requires a semantic layer to already be in place.
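The first half of that GitHub action idea, spotting models that have no semantic definition yet, might look something like the Python sketch below. It assumes semantic models are declared in YAML with a `model: ref('...')` line, and it uses a simple regex scan rather than a real YAML parser or the dbt manifest, so treat it as an illustration only. The function names and the example model names are hypothetical.

```python
import re

# Matches lines such as `model: ref('orders')` or `model: ref("orders")`.
REF_PATTERN = re.compile(r"model:\s*ref\(\s*['\"]([^'\"]+)['\"]\s*\)")


def models_with_semantic_definitions(yaml_texts):
    """Collect every model name referenced by a semantic model definition."""
    covered = set()
    for text in yaml_texts:
        covered.update(REF_PATTERN.findall(text))
    return covered


def models_missing_definitions(changed_models, yaml_texts):
    """Return changed models that no semantic model definition points at."""
    covered = models_with_semantic_definitions(yaml_texts)
    return sorted(name for name in changed_models if name not in covered)


# Hypothetical example: two models changed in a pull request, one already covered.
example_yaml = """
semantic_models:
  - name: orders
    model: ref('orders')
"""

missing = models_missing_definitions(["orders", "customers"], [example_yaml])
print(missing)  # ['customers']
```

An agent would then take each name in the missing list, draft a starter definition, and send it back to the developer to review and extend.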
Where to start: pilot small and smart
If you're ready to include AI in your data pipelines, the most important advice I can give is to do it in steps. Home in on one business unit that is willing to work with you on a pilot program for AI readiness, and focus on gathering semantics around the data for that small subset.
(Pilots within the data team itself, like the error triage example above, are a great place to start. They can be very useful, and they don't require a well-crafted semantic layer to work effectively. So there's no reason to wait!)
Gathering that semantic information, though, will really allow you to get your feet under you when it comes to building a semantic layer, and it will allow you to iterate very quickly. When you collaborate with one team in a pilot project, you're able to break things and learn from your mistakes before bringing it out to more business units.
So: start small, really focus in on what you're able to do (and what you reasonably can do), and then apply what you learned. Then you can use the momentum you gain by providing something great to that particular team or business unit to expand the semantic layer to more teams across your org.
Why semantic standards matter: Open Semantic Interchange
Once you’re ready to build out your semantic layer, it’s important to understand that, right now, basically every data tool implements semantic definitions in its own proprietary format. Power BI has one, Omni has one, Databricks has one, Snowflake has one, and of course we have one.
That fragmentation creates a portability problem: if your semantic layer definitions (metric names, calculations, business logic) are expressed in a format that’s specific to one tool, you can't move them to another tool without rebuilding everything from scratch. So, for example, if you define "monthly recurring revenue" in dbt's Semantic Layer and then want to also expose that definition in Power BI or Snowflake, you'd have to redefine it natively in each system. Beyond the redundancy, that creates a risk of inconsistency and a lot of maintenance overhead.
This is why dbt, along with Snowflake, Databricks, and a large number of other major organizations in data, has joined an initiative called the Open Semantic Interchange (OSI). Version 0.1 of the vendor-neutral spec is already live and open source. It’s an industry-wide specification that standardizes how we exchange semantic metadata across analytics, AI, and BI platforms. The OSI spec serves as a common language for metrics, dimensions, and relationships, so metrics can be interpreted consistently across tools (e.g., Snowflake, Tableau, dbt) while minimizing vendor lock-in.
The dbt Semantic Layer complements the spec by making those definitions operational: you define and govern your metrics in the dbt Semantic Layer using MetricFlow, and OSI provides the interchange format to move those definitions across other tools like Snowflake and Tableau. Author once, use everywhere.
It's the same reason we needed MCP in the first place: when there's no common standard, every tool reinvents the wheel and nothing moves cleanly between systems. A shared standard changes that.