
Your AI Agents Are Stateless — and That's Why They Keep Failing in Production

AI agents deployed across organizations forget everything between sessions. They have no awareness of prior conversations, no knowledge of the user's context, and no access to the institutional memory your business has spent years building. That is the core reliability problem — and Databricks is now positioning its data platform as the infrastructure layer that solves it.

Author: Roy Gatling, RMG Associates — linkedin.com/in/roygatling

Published: 2026-05-23

Last updated: 2026-05-23

Freshness note: Reflects Databricks Lakebase GA and Agent Bricks Supervisor Agent GA as of May 2026.

What does it mean for an AI agent to have memory?

Agent memory is not a single thing. Researchers and platform architects now recognize four distinct types, each serving a different function in production agent systems.

Working memory holds context within a single session — the conversational thread the model is actively processing. This is the only memory most deployed agents have today.

Episodic memory captures what happened in prior sessions. It's the difference between an agent that starts from zero on every conversation and one that knows you reviewed a vendor contract last Tuesday and asked three follow-up questions about indemnification clauses.

Entity memory stores structured facts about people, accounts, products, and relationships — the equivalent of a CRM record embedded into the agent's reasoning layer.

Procedural memory encodes learned workflows and operating patterns. An agent with procedural memory improves over time; one without it repeats the same mistakes indefinitely.
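One way to make the four types concrete is to give each its own slot in the agent's state. The sketch below is illustrative only: the `AgentMemory` class and its field names are assumptions made for this example, not part of any Databricks API.

```python
from dataclasses import dataclass, field


@dataclass
class AgentMemory:
    """Hypothetical container with one slot per memory type."""
    working: list[str] = field(default_factory=list)        # turns in the current session
    episodic: list[dict] = field(default_factory=list)      # summaries of past sessions
    entity: dict[str, dict] = field(default_factory=dict)   # facts keyed by entity name
    procedural: dict[str, list[str]] = field(default_factory=dict)  # learned workflows

    def end_session(self, summary: str) -> None:
        """Persist a summary as an episode, then clear working memory."""
        self.episodic.append({"summary": summary, "turns": len(self.working)})
        self.working.clear()

    def missing_types(self) -> list[str]:
        """Return the long-lived memory types that are still empty."""
        slots = {"episodic": self.episodic, "entity": self.entity,
                 "procedural": self.procedural}
        return [name for name, store in slots.items() if not store]


memory = AgentMemory()
memory.working += ["user: review the vendor contract", "agent: done"]
memory.end_session("Reviewed vendor contract; questions on indemnification")
print(memory.missing_types())  # ['entity', 'procedural']
```

An agent that never calls something like `end_session` has only the `working` slot, which is the situation described above for most deployed agents.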

A recent Databricks session on stateful agent systems argued that teams building memory through basic LLM prompts and traditional databases miss roughly 75% of what makes an agent truly intelligent — because working memory alone is not a memory architecture.

Why does the stateless problem persist in most deployments?

The architectural explanation is straightforward: large language models have no persistent state by design. Every API call starts with a blank slate. Agents appear "intelligent" within a session because the context window carries the conversation, but that context disappears the moment the session closes.

The common workarounds — storing conversation history in a relational database, dumping it back into the prompt on each turn, or using RAG to retrieve relevant prior content — all work at small scale and break under load. They introduce latency, inflate token costs, and create fragile ETL pipelines connecting the places where data lives (typically an OLAP data warehouse) to the places where agents need low-latency reads (typically an OLTP transactional store).
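The token-cost point can be checked with simple arithmetic. If each turn adds a fixed number of tokens and the full history is resent on every turn, turn k sends k times that number, so the total grows quadratically with conversation length. The figures below (50 turns, 200 tokens per turn, a 5-turn bound) are assumptions chosen for illustration, and the bounded variant stands in for any approach that keeps the prompt at a fixed size.

```python
def replay_tokens(turns: int, tokens_per_turn: int) -> int:
    """Total prompt tokens sent when the whole history is resent on every turn."""
    return sum(k * tokens_per_turn for k in range(1, turns + 1))


def bounded_tokens(turns: int, tokens_per_turn: int, max_turns_in_context: int) -> int:
    """Total prompt tokens when at most `max_turns_in_context` turns are resent."""
    return sum(min(k, max_turns_in_context) * tokens_per_turn
               for k in range(1, turns + 1))


# Assumed figures for illustration only.
print(replay_tokens(50, 200))       # 255000
print(bounded_tokens(50, 200, 5))   # 48000
```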

The infrastructure gap between analytics data and operational agent state is the problem most organizations are currently duct-taping.

What is Databricks Lakebase and how does it close that infrastructure gap?

Lakebase is Databricks' managed serverless Postgres OLTP database, built on the Neon architecture following Databricks' acquisition of Neon. It runs inside the Databricks Lakehouse, so operational transactional data lives alongside Delta Lake analytics data on the same platform, under the same governance model, without ETL.

Lakebase provides a fully managed Postgres backend for storing agent state and memory, integrating natively with Databricks authentication and scaling automatically with workload. Lakebase-backed agent memory is supported on both Databricks Apps and Mosaic AI Model Serving, and supports LangGraph time travel to resume or fork conversations from any checkpoint.

For agent memory specifically, short-term memory captures context in a single conversation session while long-term memory extracts and stores key information across multiple conversations — and teams can build agents with either or both types.
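A minimal sketch of the short-term and long-term split follows. It uses Python's built-in `sqlite3` as a local stand-in for a Postgres database such as Lakebase, and the table names, columns, and helper functions are hypothetical. In a real agent, a model would decide which facts to promote; here that step is a manual call.

```python
import sqlite3

# sqlite3 is a local stand-in here; the schema is hypothetical, not Lakebase's.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE session_messages (session_id TEXT, role TEXT, content TEXT);
    CREATE TABLE long_term_facts (user_id TEXT, fact TEXT);
""")


def remember_turn(session_id: str, role: str, content: str) -> None:
    """Short-term memory: append one message to the current session."""
    conn.execute("INSERT INTO session_messages VALUES (?, ?, ?)",
                 (session_id, role, content))


def promote_fact(user_id: str, fact: str) -> None:
    """Long-term memory: store a fact that should outlive the session."""
    conn.execute("INSERT INTO long_term_facts VALUES (?, ?)", (user_id, fact))


def recall(user_id: str) -> list[str]:
    """Load facts learned in earlier sessions for this user."""
    rows = conn.execute("SELECT fact FROM long_term_facts WHERE user_id = ?",
                        (user_id,))
    return [fact for (fact,) in rows]


remember_turn("s1", "user", "I prefer summaries under 100 words.")
promote_fact("u42", "Prefers summaries under 100 words")
# A later session has no messages of its own but can still recall the fact.
print(recall("u42"))
```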

Lakebase also supports pgvector, which means the same database instance handling transactional agent state can serve semantic similarity search — eliminating a separate vector database from the stack entirely for many use cases.
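To show what a similarity search over agent memories does, here is a pure-Python sketch of ranking by cosine distance, the measure behind pgvector's `<=>` operator. The toy vectors and memory labels are assumptions, and the SQL in the trailing comment uses hypothetical table and column names.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 minus cosine similarity; smaller means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def nearest(query: list[float], rows: dict[str, list[float]], k: int) -> list[str]:
    """Return the k row keys closest to the query vector."""
    return sorted(rows, key=lambda key: cosine_distance(query, rows[key]))[:k]


# Toy 2-dimensional embeddings; real embeddings have hundreds of dimensions.
memories = {
    "contract review": [0.9, 0.1],
    "travel policy": [0.1, 0.9],
    "indemnification question": [0.8, 0.3],
}
print(nearest([1.0, 0.0], memories, k=2))

# Rough SQL equivalent with pgvector (hypothetical table and column names):
#   SELECT content FROM agent_memories ORDER BY embedding <=> %s LIMIT 2;
```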

The branching feature deserves mention. Git-style zero-copy database branching provides isolated environments for testing agent changes against production data without risk. That matters enormously in regulated industries, where testing against real production data is a compliance requirement that teams currently meet by creating expensive copies.
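The idea behind zero-copy branching can be illustrated with a copy-on-write layer: the branch stores only its own changes and reads everything else from the parent. The sketch below uses Python's `collections.ChainMap` as a conceptual analogy. It is not a description of how Lakebase implements branching, and the keys are invented for the example.

```python
from collections import ChainMap

production = {"agent_prompt_version": "v1", "max_tool_calls": 5}

# The branch starts empty; reads fall through to production, so nothing is copied.
branch = ChainMap({}, production)
branch["agent_prompt_version"] = "v2-test"   # the write lands only in the branch layer

print(branch["agent_prompt_version"])      # v2-test
print(branch["max_tool_calls"])            # 5, read through from production
print(production["agent_prompt_version"])  # v1, production is untouched
```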

What is Agent Bricks and how does it address knowledge quality?

Agent Bricks is Databricks' enterprise agent platform, announced and reaching general availability in early-to-mid 2026. It addresses a different layer of the problem: not just where agents store memory, but how accurately they reason over business knowledge.

Standard RAG (Retrieval-Augmented Generation) pulls documents and feeds them to a model. The accuracy ceiling is determined by retrieval quality and the model's ability to interpret raw content without business context. That ceiling tends to be low when agents work with structured operational data — tables with ambiguous column names, metrics without business definitions, or dashboards that mean different things to Finance than to Operations.

Agent Bricks uses Unity Catalog metadata (schema, business definitions, lineage, permissions, and data quality signals) to improve how agents reason and act. This context is embedded directly into retrieval and planning, with a reported 70% higher accuracy than standard RAG and a 30% improvement in multi-step workflows.

The mechanism matters here. Agents aren't just getting documents. They're getting the semantic layer — how your business actually defines revenue, churn, headcount, or any other metric — embedded in every query. That's the difference between an agent that retrieves data and one that understands it.
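As a rough sketch of what "getting the semantic layer" could look like at the prompt level, the example below pairs raw column names with business definitions before the question reaches the model. The metadata dictionary, its definitions, and the helper names are hypothetical; in practice the definitions would come from a governed catalog rather than a hard-coded dict.

```python
# Hypothetical catalog metadata, hard-coded here for illustration.
COLUMN_METADATA = {
    "rev_adj_net_usd": {
        "display_name": "Net Revenue",
        "definition": "Gross revenue minus refunds and adjustments, in USD.",
    },
    "cust_churn_flag": {
        "display_name": "Churned Customer",
        "definition": "1 if the customer cancelled within the billing period.",
    },
}


def describe_columns(columns: list[str]) -> str:
    """Build a context block that pairs raw column names with business meaning."""
    lines = []
    for name in columns:
        meta = COLUMN_METADATA.get(name)
        if meta is None:
            lines.append(f"- {name}: (no business definition registered)")
        else:
            lines.append(f"- {name} = {meta['display_name']}: {meta['definition']}")
    return "\n".join(lines)


def build_prompt(question: str, columns: list[str]) -> str:
    """Prepend definitions so the model is not guessing from column names."""
    return f"Column definitions:\n{describe_columns(columns)}\n\nQuestion: {question}"


print(build_prompt("What was net revenue last quarter?", ["rev_adj_net_usd", "tmp_x1"]))
```

The undefined column `tmp_x1` is flagged rather than silently passed through, which makes gaps in the semantic layer visible.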

Capability	Standard RAG	Agent Bricks on Databricks
Knowledge source	Documents, unstructured	Lakehouse + Unity Catalog semantic layer
Business context	None (raw column names)	Schema, definitions, lineage embedded
Permissions	Separate system or broad service account	On-behalf-of user identity, Unity Catalog enforced
Memory persistence	External database, custom ETL	Lakebase native, no ETL required
Accuracy vs. baseline	Baseline	Reported 70% improvement on structured queries
Governance audit trail	Varies	Every model call, tool call, data access logged

What is agent sprawl and why is it a governance problem now?

Agent sprawl describes what happens when organizations build AI agents faster than they build the infrastructure to manage them. The result is dozens of specialized bots with no shared permissions model, no centralized observability, and no way for users to know which agent holds which information.

Teams are left playing agent roulette, toggling between dozens of niche bots and trying to remember whether the "Travel Policy" lives in the HR agent or the Finance agent. This cognitive load slows productivity and leads teams to search aimlessly, rebuild agents that already exist, or reference out-of-date information.

The security problem runs parallel to the usability problem. Most agent tools require duplicating permissions or using broad service accounts, creating a compliance gap where an agent might access data the requesting user is not authorized to see.
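The compliance gap is easy to reproduce in a few lines. The sketch below contrasts a broad service account with on-behalf-of access, using a hypothetical grants table and principal names. It illustrates the pattern only and does not use any Unity Catalog API.

```python
from dataclasses import dataclass

# Hypothetical grants: which principals may read which tables.
GRANTS = {
    "alice": {"sales.orders"},
    "svc_agent": {"sales.orders", "hr.salaries"},  # broad service account
}
AUDIT_LOG: list[tuple[str, str, bool]] = []


@dataclass
class Request:
    user: str
    table: str


def read_as(principal: str, table: str) -> bool:
    """Check the grant for the given principal and record the decision."""
    allowed = table in GRANTS.get(principal, set())
    AUDIT_LOG.append((principal, table, allowed))
    return allowed


def service_account_read(req: Request) -> bool:
    """Anti-pattern: the agent ignores who asked and uses its own broad grant."""
    return read_as("svc_agent", req.table)


def on_behalf_of_read(req: Request) -> bool:
    """The agent passes the requesting user's identity through to the check."""
    return read_as(req.user, req.table)


req = Request(user="alice", table="hr.salaries")
print(service_account_read(req))  # True: alice reaches data she is not granted
print(on_behalf_of_read(req))     # False: the request is denied
```

Note that the audit log for the service-account path records `svc_agent`, not `alice`, so the trail also loses track of who actually asked.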

Unity Catalog has governed data since 2021. Databricks is now extending that same governance infrastructure to cover every asset an AI system touches — LLMs, MCP servers, skills, and agents. The catalog that already knows who can access your customer data now also governs which agents can call which tools, and under what conditions.

The Agent Bricks Supervisor Agent, now generally available, provides a single orchestration entry point. A user asks a question; the supervisor reasons about intent and routes to the appropriate specialized agent — whether that's a Genie Space for structured data, a Knowledge Assistant for unstructured documents, or an MCP-connected external tool — without the user needing to know which agent to invoke.
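The routing pattern can be sketched as a single entry point that dispatches to specialized handlers. This is a deliberately simplified stand-in: the keyword rules below replace the intent reasoning a real supervisor would perform with a model, and the function names and keyword lists are assumptions.

```python
from typing import Callable


# Hypothetical specialized agents; each is just a function here.
def genie_space(question: str) -> str:
    return f"[structured data] {question}"


def knowledge_assistant(question: str) -> str:
    return f"[documents] {question}"


def external_tool(question: str) -> str:
    return f"[external tool] {question}"


# Keyword rules stand in for model-based intent detection.
ROUTES: list[tuple[tuple[str, ...], Callable[[str], str]]] = [
    (("revenue", "churn", "headcount"), genie_space),
    (("policy", "contract", "handbook"), knowledge_assistant),
]


def supervisor(question: str) -> str:
    """Route to the first agent whose keywords match; fall back to a tool."""
    lowered = question.lower()
    for keywords, agent in ROUTES:
        if any(word in lowered for word in keywords):
            return agent(question)
    return external_tool(question)


print(supervisor("What was churn in Q1?"))
print(supervisor("Where is the travel policy?"))
```

The user calls `supervisor` in both cases and never needs to know which agent answered.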

What is the semantic layer and why does it determine agent accuracy?

The semantic layer is the layer of business meaning that sits between raw data and the systems that consume it. It is where a column named rev_adj_net_usd becomes "Net Revenue," where a metric called "churn" carries a precise definition rather than an assumption, and where relationships between tables are expressed in business terms rather than foreign key constraints.

Without a semantic layer, agents operate on raw schema. They see column names, not concepts. They retrieve data, but have no reliable way to interpret whether they are applying the right business logic — and the errors they make are often invisible until a decision has already been made on top of them.

Databricks introduced Unity Catalog Business Semantics, generally available since April 2026, as the platform's native semantic layer. It has two integrated components:

Metric views are reusable SQL objects that define business KPIs once and make them queryable across any dimension at runtime. Standard SQL views lock in aggregations at creation time. A metric view separates the measure definition from the dimensional grouping, so "Revenue per Customer by Region" and "Revenue per Customer by Product" are two queries against one governed definition — not two separately maintained calculations that can quietly drift apart.
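The separation of measure from dimension can be sketched with plain SQL generation. The example below uses `sqlite3` and an invented `orders` table to show one measure definition serving two groupings. It does not use Databricks metric view syntax, and every name in it is an assumption.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE orders (customer_id INT, region TEXT, product TEXT, revenue REAL);
    INSERT INTO orders VALUES
        (1, 'East', 'A', 100), (2, 'East', 'B', 300),
        (3, 'West', 'A', 200), (3, 'West', 'B', 200);
""")

# The measure is defined once; the dimension is chosen at query time.
MEASURES = {
    "revenue_per_customer": "SUM(revenue) * 1.0 / COUNT(DISTINCT customer_id)",
}
ALLOWED_DIMENSIONS = {"region", "product"}


def query_metric(measure: str, dimension: str) -> list[tuple]:
    """Group one governed measure definition by a caller-chosen dimension."""
    if dimension not in ALLOWED_DIMENSIONS:
        raise ValueError(f"unknown dimension: {dimension}")
    sql = (f"SELECT {dimension}, {MEASURES[measure]} FROM orders "
           f"GROUP BY {dimension} ORDER BY {dimension}")
    return db.execute(sql).fetchall()


print(query_metric("revenue_per_customer", "region"))   # [('East', 200.0), ('West', 400.0)]
print(query_metric("revenue_per_customer", "product"))  # [('A', 150.0), ('B', 250.0)]
```

Because both results come from the same entry in `MEASURES`, a change to the definition propagates to every grouping at once, which is the drift protection described above.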

Agent metadata sits alongside metric views and adds synonyms, display names, and formatting rules specifically for AI consumption. When an agent encounters a question about "top-line performance," agent metadata maps that phrase to the correct metric view. This is what closes the gap between how business users ask questions and how data is actually structured.
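A minimal sketch of synonym resolution, assuming a hypothetical synonym registry keyed by metric name. Plain substring matching is used here to keep the example short; it is a stand-in for the mapping step, not how Agent Bricks performs it.

```python
from typing import Optional

# Hypothetical agent metadata: synonyms registered for each metric.
METRIC_SYNONYMS = {
    "net_revenue": ["net revenue", "top-line performance", "sales"],
    "customer_churn": ["churn", "attrition", "customer loss"],
}


def resolve_metric(question: str) -> Optional[str]:
    """Return the metric whose longest synonym appears in the question."""
    lowered = question.lower()
    best: Optional[tuple[int, str]] = None
    for metric, synonyms in METRIC_SYNONYMS.items():
        for phrase in synonyms:
            if phrase in lowered and (best is None or len(phrase) > best[0]):
                best = (len(phrase), metric)
    return best[1] if best else None


print(resolve_metric("How is top-line performance trending?"))  # net_revenue
print(resolve_metric("What is our office lease cost?"))          # None
```

Returning `None` for an unmapped phrase lets the caller ask a clarifying question instead of guessing at a metric.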

The consequence for agent accuracy is direct. When Agent Bricks routes a query through Genie Spaces, it is not generating SQL against raw tables. It is reasoning over metric definitions, business synonyms, and lineage context that are registered and governed in Unity Catalog. That is the mechanism behind the reported 70% accuracy improvement over standard RAG — the business logic was already resolved before the model was asked to reason.

The practical implication for teams building agents today: accuracy is not primarily a model problem. It is a knowledge representation problem. A well-constructed semantic layer is what allows an agent to answer "What drove the margin compression in Q1?" with a defensible, auditable response rather than a plausible-sounding hallucination.

What should a company do in the next 90 days?

In our work with companies building out agent infrastructure, the most common structural mistake is treating agent deployment as a model selection problem. Teams spend months evaluating which LLM to use and almost no time designing the memory and knowledge architecture the agent will run on top of. The model is the last mile. The data infrastructure is the foundation.

1. Audit your agent's memory architecture, not its output quality.

If your deployed agents have no episodic or entity memory, every conversation starts from zero. Map what you actually have against the four memory types. The gap is usually larger than expected and fixable before it compounds.

2. Resolve the OLAP/OLTP integration before scaling.

If your agents are reading from your analytics warehouse on every query, you are already paying a latency and cost penalty that will worsen as usage grows. Whether that means Lakebase on Databricks or a different OLTP layer depends on your stack — but the architectural gap needs a named solution, not a workaround.

3. Define agent permissions using your existing identity model.

Agents that use broad service accounts are a compliance incident waiting to happen. The enforcement pattern — on-behalf-of user identity passed through to every data and tool call — is available in Agent Bricks and in other frameworks. It needs to be designed in, not retrofitted.

The organizations shipping agents into production with confidence are not the ones with the best models. They are the ones with the cleanest data, the clearest permissions, and the memory architecture to make each agent interaction incrementally smarter than the last.


Executive FAQ

Frequently asked questions about AI agent memory and Databricks.

What is the difference between short-term and long-term memory for AI agents?

Short-term memory captures context within a single conversation session. Long-term memory extracts and stores key information across multiple conversations, so the agent can reference prior interactions, user preferences, and historical context in future sessions. Most production agents need both.

Is Lakebase a replacement for a dedicated vector database?

For many use cases, yes. Lakebase supports pgvector, which handles semantic similarity search alongside transactional state management. Teams with very high-volume vector workloads or specialized retrieval requirements may still evaluate purpose-built options, but eliminating a separate vector database from the architecture reduces complexity and operational overhead for the majority of deployments.

What is agent sprawl and how does Agent Bricks address it?

Agent sprawl occurs when organizations build many specialized AI agents without centralized discovery, governance, or orchestration — leaving users to manually navigate dozens of bots. Agent Bricks Supervisor Agent provides a single entry point that routes user queries to the appropriate agent, governed by Unity Catalog permissions throughout.

What is the difference between a semantic layer and standard RAG for agents?

Standard RAG retrieves documents or table rows and passes them to a model. The model then interprets raw content without knowing what the data means in business terms. A semantic layer pre-resolves that meaning: metrics are defined once, synonyms are mapped, and business logic is registered before any agent query runs. The result is that agents reason over governed definitions rather than raw schema, which is why semantic layer-backed agents produce more consistent and auditable outputs.

How does Unity Catalog governance extend to AI agents?

Unity Catalog now governs not just data tables but models, MCP servers, tools, and agents. Every model call and tool invocation passes through Unity AI Gateway, evaluated against Unity Catalog policies before executing and logged after. Agents inherit the requesting user's identity and permissions, so they access only what that user is authorized to access.

About the author

Roy Gatling is the founder of RMG Associates LLC, an AI strategy and implementation consultancy. RMG is a Databricks and Snowflake partner. Connect on LinkedIn.
