Open source · Apache 2.0

lakehouse-memory

Unity Catalog-native episodic, semantic, and working memory for AI agents on Databricks. Memory belongs where your data already lives — not in a sidecar vector DB.

pip install lakehouse-memory

GitHub · Docs · PyPI

Why

Memory is the missing Databricks layer.

Most enterprise AI initiatives fail in the same place: memory. Models forget. Agents lose state. RAG can’t keep track of a conversation. The usual workaround is a parallel data system, a sidecar vector DB, with its own governance, access control, and lineage to maintain alongside the ones you already run. That’s not a system you can ship. lakehouse-memory makes memory a first-class citizen on the Lakehouse you already have.

Three memory primitives

Episodic

Time-ordered events

Append-only events — chat turns, tool calls, anything that happened. Retrieve by recency or by semantic similarity through a Delta Sync Vector Search index.

Semantic

Durable facts

Upsertable, deduplicated facts about the user or domain. Vector-searched, scoped by identity.

Working

Session state

Short-lived key/value state for the current session. Overwrite semantics, no index.

Backed by Unity Catalog tables + Databricks Vector Search. LangChain adapters included. Your existing access control governs everything.
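
To make the three write semantics concrete, here is a minimal in-memory sketch in plain Python. It is not the lakehouse-memory API: the class names, method names, and the choice to deduplicate facts on a (user_id, key) pair are assumptions made for illustration, and the semantic-similarity retrieval that Vector Search provides is left out entirely.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class EpisodicMemory:
    """Hypothetical stand-in: append-only, time-ordered events."""
    events: List[Dict[str, Any]] = field(default_factory=list)

    def append(self, user_id: str, text: str, ts: float) -> None:
        # Events are only ever added, never updated or deleted.
        self.events.append({"user_id": user_id, "text": text, "ts": ts})

    def recent(self, user_id: str, k: int = 5) -> List[Dict[str, Any]]:
        # Recency retrieval: newest first, scoped to one identity.
        mine = [e for e in self.events if e["user_id"] == user_id]
        return sorted(mine, key=lambda e: e["ts"], reverse=True)[:k]


@dataclass
class SemanticMemory:
    """Hypothetical stand-in: upsertable facts, one row per (user_id, key)."""
    facts: Dict[Tuple[str, str], str] = field(default_factory=dict)

    def upsert(self, user_id: str, key: str, value: str) -> None:
        # Writing the same (user_id, key) again replaces the old value,
        # so a fact never appears twice. The dedup key is an assumption.
        self.facts[(user_id, key)] = value

    def facts_for(self, user_id: str) -> Dict[str, str]:
        return {k: v for (u, k), v in self.facts.items() if u == user_id}


@dataclass
class WorkingMemory:
    """Hypothetical stand-in: short-lived key/value state per session."""
    state: Dict[Tuple[str, str], Any] = field(default_factory=dict)

    def put(self, session_id: str, key: str, value: Any) -> None:
        # Overwrite semantics: the latest write wins, no history is kept.
        self.state[(session_id, key)] = value

    def get(self, session_id: str, key: str) -> Optional[Any]:
        return self.state.get((session_id, key))


episodic = EpisodicMemory()
episodic.append("alice", "What is Delta Lake?", ts=1.0)
episodic.append("alice", "And Unity Catalog?", ts=2.0)

semantic = SemanticMemory()
semantic.upsert("alice", "preferred_language", "Python")
semantic.upsert("alice", "preferred_language", "Scala")  # replaces, no duplicate

working = WorkingMemory()
working.put("session-1", "current_step", "collect_requirements")
working.put("session-1", "current_step", "draft_plan")  # overwrites
```

In the real library these stores are Unity Catalog tables rather than Python objects, so the sketch only shows how the three primitives differ on write: append, upsert, overwrite.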

Quickstart

A few commands to a working agent.

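# Scaffold a demo project from the bundle template in the repo
databricks bundle init https://github.com/travis-burmaster/lakehouse-memory \
  --template-dir templates/lakehouse-memory-bundle \
  --output-dir my-memory-demo
cd my-memory-demo
# Deploy the bundle to your workspace
databricks bundle deploy
# Run the bundle's setup_job
databricks bundle run setup_job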

Provisions Unity Catalog tables, Vector Search indexes, and a working chat-agent notebook in your workspace.

Where the practice comes in

The OSS proves the pattern. Production needs more.

Compaction at scale, multi-tenant row-level security, regression evals, observability, and custom retrieval strategies are deliberately out of scope for the open-source core. That’s the work the practice delivers.

Running lakehouse-memory in production?

A 30-minute call is enough to know whether memory-aware AI on your Databricks workspace is worth pursuing, and what production-grade memory takes.

Book a call →