Capability Deck · Internal

Giving our AI
a memory.

A private, on-prem memory layer that keeps notes during work, respects team boundaries, and turns durable observations into reviewed documentation — automatically.

agent-memory · 2026-04-26

Without memory, every conversation starts from zero.

Our AI assistants re-learn the same context each session: the codebase, prior decisions, who's working on what, how things were done last week. Humans re-explain. The same question gets a different answer the second time it's asked. Knowledge that should compound… doesn't.

0 · Facts carried into a new session

~40% · Session time re-establishing context*

∞ · Variance in answers to the same question

*Observed pattern across recurring sessions; not yet formally measured.

01 · The gap

So we built a memory that knows what you're working on, what your team can see, and when to give up its secrets to the docs.

A private, on-premises memory for our AI agents — that organises itself around the projects we actually work on, respects who should see what, and promotes the durable parts into version-controlled documentation over time. — agent-memory, in one sentence

Three design principles, ranked:

1. Safety first. Isolation is enforced at the database layer, not in code. Application bugs cannot leak data across team boundaries.
2. On-premises by default. No external API calls for memory data. Embeddings run locally on CPU. Nothing leaves the server.
3. Human-auditable. Every automatic decision the system makes is logged, inspectable, and reversible.

02 · The shape

Every memory lives at the intersection of three coordinates.

This is the cognitive scaffolding for everything else in the deck. Each axis answers a different question; each plays a different role in safety and search.

Team boundary

Which group of people can ever see this? Administrators and developers are kept apart by the database itself — neither group can read the other's notes, even by accident.

Project

Which initiative does this belong to? cicd, infinity, or a per-team catch-all for work that spans projects.

Topic

Free-form tags the AI attaches — python, security, incident — so future searches can narrow by subject without touching the boundaries above.
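
A minimal sketch of the three coordinates as a data structure. The names `Memory` and `search` are hypothetical, and the real schema is not shown in this deck. In the running service the team boundary is enforced by row-level security in the database; the `team` check below only models that behaviour.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Memory:
    """One note, addressed by team boundary, project, and topic tags."""
    team: str                 # e.g. "administrators" or "developers"
    project: str              # e.g. "cicd", "infinity", or a catch-all
    topics: frozenset = field(default_factory=frozenset)
    text: str = ""


def search(memories, caller_team, project=None, topic=None):
    """Return notes the caller's team may see, narrowed by project and topic."""
    hits = []
    for m in memories:
        if m.team != caller_team:      # boundary: results never cross teams
            continue
        if project is not None and m.project != project:
            continue
        if topic is not None and topic not in m.topics:
            continue
        hits.append(m)
    return hits


# Illustrative data only.
notes = [
    Memory("administrators", "cicd", frozenset({"security"}), "rotate runner token"),
    Memory("developers", "cicd", frozenset({"python"}), "pin build image"),
    Memory("developers", "infinity", frozenset({"python", "incident"}), "OOM on batch 64"),
]
```

Project and topic only narrow a search; neither can widen it past the team boundary, which is why the team check comes first.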

03 · Mental model

Where agent-memory sits in the platform.

One service, three categories of neighbour. Clients above; storage and identity beneath; promoted memories crystallise into version-controlled docs to the side. Nothing leaves the host — identity, embeddings, and storage are all on-prem.

Clients

Claude Code (server-local)
admin's CLI on linuxserver

Claude Code (laptop)
on LAN or VPN'd in

Codex / Cursor
future MCP clients

↓ MCP

Service

agent-memory

multi-tenant MCP service · single in-process server · HTTPS via nginx sidecar · static Bearer keys (Tier 1: internal admin/developer tooling)

↓ ↓ ↓

Backends

TimescaleDB
memory storage · RLS enforces tenant boundary at DB level

LiteLLM → Infinity
local embeddings (bge-small) · no external API

docs/context/*.md
promoted memories · git'd · canonical truth

Tier 1 service per platform-architecture intent: LAN/VPN-only, simple auth, no Keycloak/Traefik in the data path. Trust boundary: everything lives on linuxserver.lan. The service container holds the DB credentials; clients never do.

04 · System architecture

The lifecycle of a memory — from a single note to a documented fact.

Five stages. All automated except the last (which is a quick read by you).

Write

The AI takes a note while working. Auto-tagged with the right project (or a team catch-all if context is unclear).

→

Recall

Future sessions search the memory store. Each hit is counted — durability accrues.

→

Reconcile

Each night, the most-durable un-promoted notes are compared to the existing documentation.

→

Promote / Supersede

Worth keeping → becomes a bullet in docs/context/*.md. Conflicts with docs → memory loses, marked superseded.

→

Digest

One-page summary lands in your inbox. Two-minute scan. Revert anything dumb.

Documentation is canonical. Memory is the cheap-to-write feeder. The reconciler is the bridge — never the boss of either.
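
The Recall and Reconcile stages can be sketched as a hit counter plus a nightly selection. This is an illustrative model, not the service's code: `recall_count` and the batch size are assumptions, while `promoted_at` and `superseded_by_md` follow the field names used on the reconciler slide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Note:
    id: int
    text: str
    recall_count: int = 0                # hypothetical durability counter
    promoted_at: Optional[str] = None    # set once lifted into the docs
    superseded_by_md: bool = False       # set when the docs contradict it


def record_recall(note: Note) -> None:
    """Recall stage: each search hit increments the counter."""
    note.recall_count += 1


def nightly_candidates(notes, limit=3):
    """Reconcile stage: most-recalled notes not yet promoted or superseded."""
    pending = [n for n in notes if n.promoted_at is None and not n.superseded_by_md]
    # Sort by recall count (descending), then by id for a stable order.
    pending.sort(key=lambda n: (-n.recall_count, n.id))
    return pending[:limit]
```

Promoted and superseded notes drop out of the candidate pool, so each night's run only considers notes that have not yet been decided.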

05 · Lifecycle

What do you want to know more about?

Quick recap: agent-memory is a Postgres-backed memory store with three-axis scoping (team / project / topic), enforced at the database with row-level security, fed by AI agents during work, and reconciled nightly into version-controlled documentation. Embeddings run locally — no data leaves the server. Each link below opens a focused topic.

Critical to understand

Why it's safe

Four independent layers of isolation. What gets enforced in the database, not in code.

→

Critical to use

Operator cheat-sheet

Most days: nothing. Levers for when you do want them. The "what do I actually do" page.

→

Internals

How a request flows through

Tool-context diagram: where the trust boundary actually gets enforced inside the service.

→

How it works (deeper)

Scoping, the reconciler's decision rules, and why memory and docs both have a job.

→

Risks honestly

Three real things we're tracking. Not "managed" — tracked, with mitigations.

→

Where we are

Shipped, running, collecting evidence. What's live versus what's queued.

→

What this unlocks

Continuity, consistency, compounding. Why this matters beyond "AI remembers things."

→

New · April 2026

The serving rebuild

Why the MCP path is being rebuilt: trust boundary, single transport (Tier-1 pattern), TLS day-one. v4 final.

→

Platform pattern

The Tier-1 MCP standard

The platform-wide pattern agent-memory will adopt: SSH-piped-stdio + dispatcher + docker exec. No HTTPS, no Bearer header, no nginx in the data path.

→

Live · daily

Health dashboard

Live operational view — traffic-light tiles for run health, decision quality, and drift. Refreshed nightly.

↗

06 · Index

Inside the box: how a request crosses the trust boundary.

Same business logic regardless of door. Tenant resolution differs by transport — but the core, pools, and database flow are identical.

 Anywhere on LAN/VPN
 (admin on server · admin on laptop · developer on laptop · future Codex/Cursor)
        │
        │ HTTPS  GET/POST /mcp
        │ Authorization: Bearer <static-tenant-key>
        │ (key from ~/projects/secrets/agent-memory-{tenant}.key,
        │  group-gated 0640; same pattern as code-executor)
        ▼
   ┌─────────────────────────────────────────────────────────────────┐
   │ nginx sidecar (separate container, same Compose stack)          │
   │   · TLS termination (public-CA cert for                         │
   │     agent-memory.ai-servicers.com · split DNS to linuxserver.lan)│
   │   · limit_req_zone 10r/s per IP (defends bad-sig flood)         │
   │   · access log → Loki via Promtail                              │
   └────────────────────┬────────────────────────────────────────────┘
                        │ HTTP, private docker network
                        │ host: agent-memory-bridge:9099
                        ▼
   ┌─────────────────────────────────────────────────────────────────┐
   │ agent-memory bridge (single in-process MCP server)              │
   │   1. Bearer middleware: SHA-256(token) → keys.json lookup       │
   │      · miss → 401                                                │
   │      · hit  → tenant resolved (administrators | developers)     │
   │      · subject = "key:administrator" / "key:developer"          │
   │      · request_id minted                                        │
   │   2. tenant + subject passed explicitly into MCP handler        │
   │      (no contextvars · no Mcp-Session-Id state)                 │
   │   3. handler runs MCP tool against per-tenant asyncpg pool      │
   │      (max_size=8 · TTL 30min · client disconnect →              │
   │       asyncio cancel → asyncpg ROLLBACK · healthy conn returned)│
   │   4. audit log emitted per request:                             │
   │      ts · request_id · subject · tenant · source_ip ·           │
   │      tool · args_hash · affected_memory_ids · duration · status │
   └────────────────────┬────────────────────────────────────────────┘
                        │ asyncpg
                        ▼
   ┌─────────────────────────────────────────────────────────────────┐
   │ TimescaleDB                                                     │
   │   connects as agent_memory_app_{tenant}                         │
   │   RLS: physical row-level tenant isolation at DB layer          │
   └─────────────────────────────────────────────────────────────────┘

Three guarantees in one picture: (1) nginx never speaks MCP — it's a TLS terminator + rate limiter. (2) the bridge never sees a DB credential in a client request — DSNs live in the bridge process env. (3) the DB role is bound to the tenant — RLS catches even tenant-routing bugs in app code. Three independent enforcement points: TLS on the wire, Bearer key in app layer, role+RLS at the DB.
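
Step 1 of the bridge (Bearer middleware) can be sketched as below. The layout of keys.json is not given in this deck, so the digest-to-tenant mapping and the demo keys are assumptions. The constant-time comparison is borrowed from the SSH-pattern slide and is likewise an assumption for this path.

```python
import hashlib
import hmac
from typing import Optional

# Hypothetical keys.json content: SHA-256 hex digest of each key -> tenant.
# Only digests are stored, so the file does not contain the keys themselves.
KEYS = {
    hashlib.sha256(b"admin-demo-key").hexdigest(): "administrators",
    hashlib.sha256(b"dev-demo-key").hexdigest(): "developers",
}


def resolve_tenant(authorization: Optional[str], keys=KEYS) -> Optional[str]:
    """Return the tenant for a valid Bearer header, or None (caller sends 401)."""
    if not authorization or not authorization.startswith("Bearer "):
        return None
    token = authorization[len("Bearer "):].strip()
    digest = hashlib.sha256(token.encode("utf-8")).hexdigest()
    tenant = None
    # Check every stored digest with a constant-time comparison and no
    # early exit, so timing does not depend on which entry matched.
    for stored, name in keys.items():
        if hmac.compare_digest(stored, digest):
            tenant = name
    return tenant
```

A `None` result corresponds to the "miss → 401" branch in the diagram; a tenant name corresponds to "hit → tenant resolved", after which the tenant is passed explicitly into the handler.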

topic · tool-context (internals)

The platform pattern: SSH-piped-stdio, not HTTPS-Bearer.

This supersedes the nginx-sidecar shape on the prior slide. v3 of mcpstandard.md is the canonical Tier-1 MCP pattern — agent-memory adopts it; code-executor migrates to it. No nginx, no TLS, no Bearer header. Identity is kernel-attested at the SSH layer; the role keyfile is mounted into the container, never put in argv or environ.

 Claude Code
   ├─ on the server: spawns /usr/local/bin/mcp-<name> directly (stdio child)
   └─ on a laptop:   spawns ssh -T <user>@linuxserver.lan "mcp <name>"
                     (ControlMaster keeps a warm socket; cold handshake paid once)
                                  │ stdio JSON-RPC
                                  ▼
   ┌─────────────────────────────────────────────────────────────────┐
   │ sshd                                                            │
   │   authorized_keys: restrict,command="/usr/local/bin/mcp-dispatcher",no-user-rc
   │   key offered by laptop CAN ONLY invoke the dispatcher          │
   └────────────────────┬────────────────────────────────────────────┘
                        ▼
   ┌─────────────────────────────────────────────────────────────────┐
   │ /usr/local/bin/mcp-dispatcher  (root-owned, 0755)               │
   │   · scrubs MCP_* env (defense in depth)                         │
   │   · reads SSH_ORIGINAL_COMMAND ("mcp <name>")                   │
   │   · validates <name> against literal allow-list                 │
   │   · logger -t mcp-dispatcher  (audit trail)                     │
   │   · exec /usr/local/bin/mcp-<name>                              │
   └────────────────────┬────────────────────────────────────────────┘
                        ▼
   ┌─────────────────────────────────────────────────────────────────┐
   │ /usr/local/bin/mcp-<name>  (root-owned, 0755 · shared wrapper)  │
   │   · resolves role from kernel-attested group membership         │
   │     (getent group · literal compare · admin wins on overlap)    │
   │   · stamps MCP_SENDER_NAME server-side from real identity       │
   │   · logger -t mcp-<name>  (audit trail)                         │
   │   · exec docker exec -i \                                       │
   │       -e MCP_KEY_FILE=/run/secrets/<name>-<role>.key \           │
   │       -e MCP_SENDER_NAME=... -e MCP_ROLE=... \                  │
   │       mcp-<name> <stdio entry-point>                            │
   └────────────────────┬────────────────────────────────────────────┘
                        │ stdio over docker exec (no host ports)
                        ▼
   ┌─────────────────────────────────────────────────────────────────┐
   │ mcp-<name> container                                            │
   │   · /home/administrator/projects/secrets/ mounted RO at         │
   │     /run/secrets/  (keyfile delivered by path, not bytes)       │
   │   · reads MCP_KEY_FILE → SHA-256 + constant-time vs keys.json   │
   │   · binds tool handlers to resolved role                        │
   │   · backend creds in container env (DSNs etc.)                  │
   └────────────────────┬────────────────────────────────────────────┘
                        ▼
                 backend services on internal docker network

Five things this picture buys: (1) identity is kernel-attested — `id -un` at the wrapper, `SSH_CLIENT` from sshd; clients can't forge `MCP_SENDER_NAME`. (2) raw key bytes never appear in argv or `/proc/<pid>/environ` — only the path does. (3) `restrict,command="..."` is the single line that keeps a laptop key from being a shell key — a §4 negative test (`ssh ... ls /` MUST reject) catches regression. (4) SSH multiplexing (ControlPersist 1h, ControlPath under the 108-byte socket-path limit, local-FS only — not NFS) keeps cold handshakes from blowing MCP's `initialize` timeout. (5) the dispatcher and wrapper both write `logger -t` audit lines — `journalctl -t mcp-dispatcher -t mcp-<name>` is the trail.
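
The two checks that carry the most weight in this picture, the dispatcher's allow-list and the wrapper's role resolution, can be sketched as pure functions. The real dispatcher and wrapper are not shown in this deck and may well be shell scripts. The allow-list contents and the group names here are assumptions.

```python
from typing import Iterable, Optional

ALLOWED_SERVERS = ("agent-memory", "code-executor")   # hypothetical allow-list


def parse_original_command(original: Optional[str]) -> Optional[str]:
    """Return the MCP server name from 'mcp <name>', or None to reject."""
    if not original:
        return None
    parts = original.split(" ")
    if len(parts) != 2 or parts[0] != "mcp":
        return None
    name = parts[1]
    # Literal comparison against the allow-list: no globbing, no path handling.
    return name if name in ALLOWED_SERVERS else None


def resolve_role(user_groups: Iterable[str]) -> Optional[str]:
    """Map group membership to a role; admin wins when a user is in both."""
    groups = set(user_groups)
    if "administrators" in groups:
        return "administrator"
    if "developers" in groups:
        return "developer"
    return None   # unknown identity: fail closed
```

In this sketch the value passed to `parse_original_command` stands in for `SSH_ORIGINAL_COMMAND`. Anything other than the exact two-word form with an allow-listed name is rejected, which is the behaviour the negative test (`ssh ... ls /` must be rejected) is meant to protect.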

topic · mcp-standard (v3)

Four layers of isolation. Take any one away — the others still hold.

1. Filesystem. Each user's database credentials live in a 0600-permissioned file in their own home. An admin's credentials cannot be read by a developer account.

2. Linux group at launch. The memory service refuses to start for anyone not in a recognised team group. Unknown identity → fails loud, never fails open.

3. Database role. Each group connects to Postgres as a different role with a different password. Privileges are assigned once at provisioning and never change at runtime.

4. Row-level security. Every query passes through a database policy that physically refuses to return rows belonging to the other team. Bilateral: admins can't see dev memories either.

The load-bearing guarantee is layer 4. The database itself enforces the boundary; the application can't bypass it even if you want it to.

topic · safety · 1/2

Nothing leaves the server.

The "embedding" step — converting a note into the numeric fingerprint that semantic search uses — traditionally goes to a hosted cloud API. Ours doesn't.

0 · Recurring cost for embedding traffic

0 · External API calls for memory data

~30ms · Embedding latency on our CPU

A small open-source model runs in a dedicated container. Same quality as cloud vendors' small-tier offerings, and none of the data leaves the building.

Auditability

Every automatic decision the system makes is recorded. When the AI writes a note and the system isn't confident which scope it belongs to, that note is flagged inferred for human review. No guesswork is silent. Operators can pull the queue with one command and re-tag in bulk if a pattern looks wrong.

topic · safety · 2/2

Memories land where the work happens — not where the session started.

If you launch in one project and drift into another area mid-conversation, notes follow the work. The system resolves the right scope at the moment of writing, not at session start.

Naive

"Session wins" memory

The project you launched in is the project forever. Drift into another area mid-session and every note afterwards gets mis-filed. Search results lie.

What we do

"Work wins" memory

The system resolves at the moment of writing from the file or command context. Ambiguous? The note is flagged inferred for later human review.
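
A minimal sketch of "work wins" resolution, assuming a hypothetical routing map from directory prefix to project. The real map lives in agent-memory-scope.yml and its format is not shown in this deck. The project directories, the catch-all naming, and the choice to flag every catch-all note as inferred are all assumptions.

```python
from pathlib import PurePosixPath

# Hypothetical routing map: directory prefix -> project scope.
SCOPE_MAP = {
    "/home/administrator/projects/cicd": "cicd",
    "/home/administrator/projects/infinity": "infinity",
}


def resolve_scope(path, team, scope_map=SCOPE_MAP):
    """Return (scope, inferred) for the file or directory a note is about."""
    target = PurePosixPath(path)
    # Longest prefix first so nested project directories win over parents.
    for prefix in sorted(scope_map, key=len, reverse=True):
        root = PurePosixPath(prefix)
        if target == root or root in target.parents:
            return scope_map[prefix], False
    # No match: file under the team catch-all and flag for human review.
    return f"{team}-catch-all", True
```

The function takes the path of the work at the moment of writing, not the directory the session was launched in, which is the whole difference between the two cards above.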

topic · how-it-works · 1/3

Four classifications. One rule: documentation wins.

Each night the reconciler picks durable un-promoted memories and compares each against the most-related documentation section. The answer falls into one of four buckets, each mapping to one action.

If the memory…	Action	Why
Agrees with the docs	keep memory · no patch	Both findable. Redundancy is fine.
Complements the docs	append a bullet to the matching section · mark `promoted_at`	The note adds detail the doc didn't carry. Lift it into the durable record.
Contradicts the docs	memory loses · mark `superseded_by_md`	Docs are reviewed, version-controlled, audited. Canon by construction.
Orthogonal (unrelated)	leave alone · revisit next cycle	Not yet ready to promote, but not contradicting anything either.

The classifier is a single small-model call returning JSON. Validated against a hand-labeled fixture before going live; 90% agreement gate.
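
The table and the agreement gate can be sketched as two small functions. The JSON key `classification`, the label spellings, and the action names are hypothetical. Falling back to "defer" on an unparseable reply is an assumption chosen so that a bad classifier response changes nothing.

```python
import json

# Action per classification, following the table above.
ACTIONS = {
    "agrees": "keep",
    "complements": "promote",
    "contradicts": "supersede",
    "orthogonal": "defer",
}


def decide(classifier_output: str) -> str:
    """Parse the classifier's JSON reply and return the action to take."""
    try:
        label = json.loads(classifier_output)["classification"]
    except (ValueError, KeyError, TypeError):
        return "defer"                      # unparseable reply: change nothing
    if not isinstance(label, str):
        return "defer"
    return ACTIONS.get(label, "defer")      # unknown label: change nothing


def passes_gate(predicted, expected, threshold=0.90) -> bool:
    """True when the classifier matches the hand labels often enough."""
    if not expected or len(predicted) != len(expected):
        return False
    agree = sum(p == e for p, e in zip(predicted, expected))
    return agree / len(expected) >= threshold
```

For example, a classifier that matches 9 of 10 hand-labeled cases sits exactly on a 90% gate and passes; 8 of 10 does not.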

topic · how-it-works · 2/3

Memory and documentation, doing different jobs.

Each holds what the other can't. The reconciler keeps them in sync without manual ceremony.

Memory's job

Cheap, fast, unstructured.

Capture observations the moment they happen — too small for a PR, too useful to forget. Surfaces on related searches; not designed for blast-radius survival.

Documentation's job

Reviewed, version-controlled, durable.

The team's shared mental model. PR-reviewable, diff-able, survives DB loss. When memory and docs disagree, docs win — and interesting memories get pulled in over time.

topic · how-it-works · 3/3

Operator cheat-sheet.

Most days, nothing. The system runs itself. These are the levers for when you do want them.

If you want to…

see what it's doing → agent-memory monitor
read yesterday's digest → cat ~/projects/secrets/agent-memory-digest-latest.md
review the auto-flagged queue → agent-memory inferred --days 7
undo a bad auto-promotion → git revert <sha> · find via git log --grep reconciler
kill the auto-process → AGENT_MEMORY_RECONCILER_ENABLED=0 in the env file

When you talk to me…

in a project directory → I write to that project's scope automatically
in /tmp, ~/.claude, etc. → I write to your group's catch-all
it should land somewhere specific → tell me where; I'll pass --scope
a memory shouldn't have been written → tell me; I'll resolve <id> forget
the routing is wrong for a path → edit ~/projects/secrets/agent-memory-scope.yml

topic · operator

Why we're rebuilding how agent-memory is served.

The storage and policy layers are working. The way agents reach them isn't the right shape for a service we'll run for years. Two gaps drove the rebuild:

1. Trust boundary leaked. Developers had a Postgres password in their home directory. The database role + RLS confined damage to their tenant, but inside that tenant they had raw SQL. Bug, mistake, or compromise meant unbounded write access to every developer's notes. Not acceptable for a year-plus horizon.

2. Local-only architecture. The original path required SSH to the server. A developer on a laptop couldn't use agent-memory without first becoming the server. That's fine for solo; wrong for a team and wrong for tomorrow's heterogeneous AI clients.

Both gaps were named by the user, not by the code. The redesign went through three independent reviewer passes (Gemini · Codex · Claude review-board node), then a fourth iteration after recognizing the platform's deliberate Tier-1 pattern (every other MCP on this server uses simple key auth — adding Keycloak only here would make agent-memory the outlier).

topic · serving · 1/3

One brain. One door. Same auth pattern as every other MCP on this server.

The new shape is a single in-process MCP server with a single HTTPS ingress. Static per-tenant Bearer keys, group-gated key files, exactly the pattern code-executor already uses. nginx sidecar terminates TLS with a public-CA cert (split DNS) — no client trust-store changes anywhere. Stateless — Postgres is the only source of truth.

What we considered

Dual-transport + Keycloak

Unix socket local door + HTTPS remote door, with Keycloak JWT. Reviewer-endorsed for "year-plus multi-tenant" framing. Withdrawn after recognizing that no other MCP on this server uses Keycloak, and that admin's git-shared ~/.claude.json wants one config that works on every host.

What we shipped

Single transport, static keys, TLS

One config in git, works on server and laptop. Auth pattern matches the rest of the MCP ecosystem. TLS day-one (no token-on-wire risk). Per-tenant DB role + RLS unchanged — defense in depth holds. Decision criteria for revisiting Keycloak: developer count grows past ~5, OR per-user audit becomes a real need.

Three guarantees survive: (1) clients never see DB credentials; (2) DB role is bound to tenant — RLS catches even tenant-routing bugs in app code; (3) TLS + Bearer + RLS = three independent enforcement points.

topic · serving · 2/3

The path from here to there — without breaking what works.

The current subprocess bridge keeps running until the new path is proven. Each phase ends usable; nothing big-bangs.

R1 · Extract core

Transport-agnostic MCP handler; tests for tenant routing.

R2 · HTTP loopback

In-process server, HTTP on 127.0.0.1, no auth yet, subprocess fork deleted same change.

R3 · Bearer auth + keys

SHA-256 keys.json validation; generate per-tenant key files (group-gated); tests ship in phase.

R4 · nginx + public CA + LAN

Public-CA cert via split DNS; nginx sidecar; rate limiting; private docker network.

R5 onboards the developer team onto the LAN path; R6 finalises integration tests, docs, and rotation procedures. Estimate: roughly 5-6 working days of actual effort. Tests ship per phase, not lumped at the end. The CA bootstrap (an R4 prerequisite) is decided before the phase starts: a public-CA cert via DNS-01 against the existing *.ai-servicers.com infra, with internal DNS pointing the hostname at linuxserver.lan.

topic · serving · 3/3

Where we are.

✓

Storage + policy

Schema, RLS, embeddings, scoping, reconciler — shipped and stable.

✓

Subprocess bridge

Two-tenant Unix-socket bridge (Phase 1+2) running. Transition scaffolding; retired in R2.

✓

Architecture review

v4 final: Tier-1 single-transport + static keys + nginx + TLS day-one. Three reviewer passes plus platform-pattern alignment.

Rebuild R1–R6

Core extraction → HTTP loopback → Bearer auth → nginx + LAN → onboarding → cleanup. ~5-6 working days.

Live today

Storage layer: 128 automated tests green; CI wired
Daily timers: TTL purge, retries, reconciler, health probes
Subprocess bridge (transition scaffolding): per-tenant Unix sockets, kernel-enforced auth
Reconciler in dry-run for its first 7 runs, for confidence-building

In flight

R1: extract transport-agnostic core from _mcp.py
Developer-tenant DB password rotation (paused websurfinmurf onboarding)
Public-CA cert bootstrap for agent-memory.ai-servicers.com (R4 prerequisite)

topic · status

Risks we're tracking honestly.

R1. Within-team privacy. Members of the same team share reads. A team of two is fine; a team of ten may want per-user compartments. We'll extend when that's a real need, not before.

R2. Scope drift. If the path-routing map doesn't keep up with new project directories, notes land in the team catch-all. Weekly fallback-rate digest catches this; tune the map when the rate spikes.

R3. Embedding-model drift. The open-source model we run today is small and fast. If quality becomes a limiter, swap to a larger one: no schema change, no data migration.

Reconciler-introduced risks (mis-promotions) are mitigated by the dry-run default + clear commit footer + one-command revert. See the operator cheat-sheet.
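
For the scope-drift risk (R2), the fallback rate is the share of notes that landed in a team catch-all. A minimal sketch follows, assuming catch-all scopes are recognisable by a hypothetical `-catch-all` suffix and an illustrative alert threshold of 25%. Neither value comes from the service.

```python
def fallback_rate(scopes, catch_all_suffix="-catch-all"):
    """Fraction of notes filed under a team catch-all instead of a project."""
    scopes = list(scopes)
    if not scopes:
        return 0.0
    fallbacks = sum(1 for s in scopes if s.endswith(catch_all_suffix))
    return fallbacks / len(scopes)


def drift_alert(scopes, threshold=0.25):
    """True when the catch-all share exceeds the (assumed) alert threshold."""
    return fallback_rate(scopes) > threshold
```

A rising rate suggests the routing map is missing new project directories, which is the cue to update it.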

topic · risks

What this unlocks.

Continuity

"What did we decide about X last month?" is answerable without a human scrolling chat history.

Consistency

Every agent works from the same shared memory, giving the same answer to the same question.

Compounding

Every decision recorded today makes tomorrow's session better — and every other project's session too.

This isn't a feature. It's a capability. Once memory is there, every AI workflow we build on top of it gets sharper, cheaper, and more auditable by default.

topic · outcomes

A quiet dependency.
A loud improvement.

Memory is now a first-class part of our AI platform. Private by construction, team-aware by policy, and ours.

questions welcome · agent-memory · 2026-04-26
