A private, on-prem memory layer that keeps notes during work, respects team boundaries, and turns durable observations into reviewed documentation — automatically.
Our AI assistants re-learn the same context each session: the codebase, prior decisions, who's working on what, how things were done last week. Humans re-explain. The same question gets a different answer the second time it's asked. Knowledge that should compound… doesn't.
*Observed pattern across recurring sessions; not yet formally measured.
Three design principles, ranked:
This is the cognitive scaffolding for everything else in the deck. Each axis answers a different question; each plays a different role in safety and search.
Which group of people can ever see this? Administrators and developers are kept apart by the database itself — neither group can read the other's notes, even by accident.
Which initiative does this belong to? cicd, infinity, or a per-team catch-all for work that spans projects.
Free-form tags the AI attaches — python, security, incident — so future searches can narrow by subject without touching the boundaries above.
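A minimal sketch of how the three axes might sit on one note record. The field names, the `Tenant` enum, and the `search_filter` helper are hypothetical, not the service's schema. The tenant check here only mirrors the boundary for illustration; in the service, the database enforces it.

```python
from dataclasses import dataclass, field
from enum import Enum


class Tenant(Enum):
    # Hard boundary: in the service this is enforced by the database (RLS).
    ADMINISTRATORS = "administrators"
    DEVELOPERS = "developers"


@dataclass
class MemoryNote:
    tenant: Tenant   # who can ever see this
    project: str     # which initiative, e.g. "cicd", "infinity", or a catch-all
    body: str
    tags: set[str] = field(default_factory=set)  # free-form subject labels


def search_filter(notes, tenant, project=None, tags=frozenset()):
    """Narrow by project and tags without ever widening past the tenant."""
    return [
        n for n in notes
        if n.tenant is tenant                          # boundary: always applied
        and (project is None or n.project == project)  # optional project scope
        and tags <= n.tags                             # every requested tag present
    ]
```

Tags only ever narrow a result set that the tenant has already bounded, which is the "without touching the boundaries above" property.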
One service, three categories of neighbour. Clients above; storage and identity beneath; promoted memories crystallise into version-controlled docs to the side. Nothing leaves the host — identity, embeddings, and storage are all on-prem.
Tier 1 service per platform-architecture intent: LAN/VPN-only, simple auth, no Keycloak/Traefik in the data path. Trust boundary: everything lives on linuxserver.lan. The service container holds the DB credentials; clients never do.
Five stages. All automated except the last (which is a quick read by you).
The AI takes a note while working. Auto-tagged with the right project (or a team catch-all if context is unclear).
Future sessions search the memory store. Each hit is counted — durability accrues.
Each night, the most-durable un-promoted notes are compared to the existing documentation.
Worth keeping → becomes a bullet in docs/context/*.md. Conflicts with docs → memory loses, marked superseded.
One-page summary lands in your inbox. Two-minute scan. Revert anything dumb.
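The durability counting in stages 2 and 3 fits in a few lines. This is a hypothetical in-memory model: the field names, the `min_hits` threshold, and the nightly `limit` are assumptions, not the service's actual schema or query.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class StoredMemory:
    id: int
    body: str
    hit_count: int = 0                      # stage 2: bumped on every search hit
    promoted_at: Optional[datetime] = None  # stage 4: set when lifted into docs
    superseded: bool = False                # stage 4: set when docs win a conflict


def record_hits(memories, hit_ids):
    """Stage 2: each search result counts toward durability."""
    by_id = {m.id: m for m in memories}
    for mid in hit_ids:
        by_id[mid].hit_count += 1


def nightly_candidates(memories, limit=20, min_hits=3):
    """Stage 3: the most-durable notes not yet promoted or superseded."""
    eligible = [
        m for m in memories
        if m.promoted_at is None and not m.superseded and m.hit_count >= min_hits
    ]
    return sorted(eligible, key=lambda m: m.hit_count, reverse=True)[:limit]
```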
Documentation is canonical. Memory is the cheap-to-write feeder. The reconciler is the bridge — never the boss of either.
Four independent layers of isolation. The last one is enforced in the database, not in application code.
Most days: nothing. Levers for when you do want them. The "what do I actually do" page.
Tool-context diagram: where the trust boundary actually gets enforced inside the service.
Scoping, the reconciler's decision rules, and why memory and docs both have a job.
Three real things we're tracking. Not "managed" but tracked, with mitigations.
Shipped, running, collecting evidence. What's live versus what's queued.
Continuity, consistency, compounding. Why this matters beyond "AI remembers things."
Why the MCP path is being rebuilt: trust boundary, single transport (Tier-1 pattern), TLS day-one. v4 final.
The platform-wide pattern agent-memory will adopt: SSH-piped-stdio + dispatcher + docker exec. No HTTPS, no Bearer header, no nginx in the data path.
Live operational view: traffic-light tiles for run health, decision quality, and drift. Refreshed nightly.
Same business logic regardless of door. Tenant resolution differs by transport, but the core, pools, and database flow are identical.
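One way to keep "same business logic regardless of door" honest is to resolve the tenant once per transport and hand it to the core as a plain argument, in line with the transport diagram's "no contextvars" note. A sketch only: `CallContext`, the two adapter functions, and the pool placeholders are hypothetical, and `lookup` stands in for the real digest-based key check.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Dict

SUBJECTS = {"administrators": "key:administrator", "developers": "key:developer"}


@dataclass(frozen=True)
class CallContext:
    tenant: str      # resolved once, at the door
    subject: str     # recorded in the audit log
    request_id: str


def handle_tool(ctx: CallContext, tool: str, pools: Dict[str, str]) -> str:
    """Transport-agnostic core: the tenant is an explicit argument, never ambient state."""
    pool = pools[ctx.tenant]  # per-tenant pool, so a per-tenant DB role underneath
    return f"{ctx.request_id}: {tool} via {pool}"


def ctx_from_http(headers: dict, lookup: Callable[[str], str], rid: str) -> CallContext:
    """HTTPS door: the tenant comes from the Bearer token."""
    tenant = lookup(headers["Authorization"].removeprefix("Bearer ").strip())
    return CallContext(tenant, SUBJECTS[tenant], rid)


def ctx_from_stdio(env: dict, lookup: Callable[[str], str], rid: str,
                   read=lambda p: Path(p).read_text()) -> CallContext:
    """SSH/stdio door: tenant from the mounted key file; sender stamped server-side."""
    tenant = lookup(read(env["MCP_KEY_FILE"]).strip())
    return CallContext(tenant, env["MCP_SENDER_NAME"], rid)
```

Only the two `ctx_from_*` adapters know which door a request came through; `handle_tool` and everything below it are shared.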
Anywhere on LAN/VPN
(admin on server · admin on laptop · developer on laptop · future Codex/Cursor)
│
│ HTTPS GET/POST /mcp
│ Authorization: Bearer <static-tenant-key>
│ (key from ~/projects/secrets/agent-memory-{tenant}.key,
│ group-gated 0640; same pattern as code-executor)
▼
┌─────────────────────────────────────────────────────────────────┐
│ nginx sidecar (separate container, same Compose stack) │
│ · TLS termination (public-CA cert for │
│ agent-memory.ai-servicers.com · split DNS to linuxserver.lan)│
│ · limit_req_zone 10r/s per IP (defends bad-sig flood) │
│ · access log → Loki via Promtail │
└────────────────────┬────────────────────────────────────────────┘
│ HTTP, private docker network
│ host: agent-memory-bridge:9099
▼
┌─────────────────────────────────────────────────────────────────┐
│ agent-memory bridge (single in-process MCP server) │
│ 1. Bearer middleware: SHA-256(token) → keys.json lookup │
│ · miss → 401 │
│ · hit → tenant resolved (administrators | developers) │
│ · subject = "key:administrator" / "key:developer" │
│ · request_id minted │
│ 2. tenant + subject passed explicitly into MCP handler │
│ (no contextvars · no Mcp-Session-Id state) │
│ 3. handler runs MCP tool against per-tenant asyncpg pool │
│ (max_size=8 · TTL 30min · client disconnect → │
│ asyncio cancel → asyncpg ROLLBACK · healthy conn returned)│
│ 4. audit log emitted per request: │
│ ts · request_id · subject · tenant · source_ip · │
│ tool · args_hash · affected_memory_ids · duration · status │
└────────────────────┬────────────────────────────────────────────┘
│ asyncpg
▼
┌─────────────────────────────────────────────────────────────────┐
│ TimescaleDB │
│ connects as agent_memory_app_{tenant} │
│ RLS: row-level tenant isolation enforced at the DB layer │
└─────────────────────────────────────────────────────────────────┘
Three guarantees in one picture: (1) nginx never speaks MCP; it is only a TLS terminator and rate limiter. (2) DB credentials never travel in a client request; the DSNs live in the bridge process environment. (3) The DB role is bound to the tenant, so RLS catches even tenant-routing bugs in app code. Together these give three independent enforcement points: TLS on the wire, the Bearer key at the app layer, and role + RLS at the DB.
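Step 1 of the bridge box (SHA-256 of the token, then a `keys.json` lookup) can be sketched as a small function. The `keys.json` shape assumed here (`{sha256_hex: tenant}`) and the `authenticate` name are hypothetical; the subject strings and tenant names come from the diagram.

```python
import hashlib
import hmac
import uuid
from typing import Optional, Tuple

# Subject strings as shown in the diagram.
SUBJECTS = {"administrators": "key:administrator", "developers": "key:developer"}


def authenticate(auth_header: Optional[str], keys: dict) -> Tuple[int, Optional[dict]]:
    """Resolve a Bearer token to a tenant.

    keys is the parsed keys.json, assumed to map sha256-hex -> tenant.
    Returns (401, None) on a miss, or (200, context) on a hit.
    """
    if not auth_header or not auth_header.startswith("Bearer "):
        return 401, None
    token = auth_header[len("Bearer "):].strip()
    digest = hashlib.sha256(token.encode()).hexdigest()
    tenant = None
    # Check every stored digest with a constant-time compare instead of a
    # dict lookup, so response timing does not hint at partial matches.
    for stored_digest, stored_tenant in keys.items():
        if hmac.compare_digest(digest, stored_digest):
            tenant = stored_tenant
    if tenant is None:
        return 401, None
    return 200, {
        "tenant": tenant,
        "subject": SUBJECTS[tenant],
        "request_id": uuid.uuid4().hex,  # minted per request for the audit log
    }
```

Because only digests are stored, a leaked `keys.json` does not hand out working tokens. That holds only while the tokens are long and random: an unsalted SHA-256 is fast to brute-force for short secrets.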
This supersedes the nginx-sidecar shape on the prior slide. v3 of mcpstandard.md is the canonical Tier-1 MCP pattern — agent-memory adopts it; code-executor migrates to it. No nginx, no TLS, no Bearer header. Identity is kernel-attested at the SSH layer; the role keyfile is mounted into the container, never put in argv or environ.
Claude Code
├─ on the server: spawns /usr/local/bin/mcp-<name> directly (stdio child)
└─ on a laptop: spawns ssh -T <user>@linuxserver.lan "mcp <name>"
(ControlMaster keeps a warm socket; cold handshake paid once)
│ stdio JSON-RPC
▼
┌─────────────────────────────────────────────────────────────────┐
│ sshd │
│ authorized_keys: restrict,command="/usr/local/bin/mcp-dispatcher",no-user-rc
│ key offered by laptop CAN ONLY invoke the dispatcher │
└────────────────────┬────────────────────────────────────────────┘
▼
┌─────────────────────────────────────────────────────────────────┐
│ /usr/local/bin/mcp-dispatcher (root-owned, 0755) │
│ · scrubs MCP_* env (defense in depth) │
│ · reads SSH_ORIGINAL_COMMAND ("mcp <name>") │
│ · validates <name> against literal allow-list │
│ · logger -t mcp-dispatcher (audit trail) │
│ · exec /usr/local/bin/mcp-<name> │
└────────────────────┬────────────────────────────────────────────┘
▼
┌─────────────────────────────────────────────────────────────────┐
│ /usr/local/bin/mcp-<name> (root-owned, 0755 · shared wrapper) │
│ · resolves role from kernel-attested group membership │
│ (getent group · literal compare · admin wins on overlap) │
│ · stamps MCP_SENDER_NAME server-side from real identity │
│ · logger -t mcp-<name> (audit trail) │
│ · exec docker exec -i \ │
│ -e MCP_KEY_FILE=/run/secrets/<name>-<role>.key \ │
│ -e MCP_SENDER_NAME=... -e MCP_ROLE=... \ │
│ mcp-<name> <stdio entry-point> │
└────────────────────┬────────────────────────────────────────────┘
│ stdio over docker exec (no host ports)
▼
┌─────────────────────────────────────────────────────────────────┐
│ mcp-<name> container │
│ · /home/administrator/projects/secrets/ mounted RO at │
│ /run/secrets/ (keyfile delivered by path, not bytes) │
│ · reads MCP_KEY_FILE → SHA-256 + constant-time vs keys.json │
│ · binds tool handlers to resolved role │
│ · backend creds in container env (DSNs etc.) │
└────────────────────┬────────────────────────────────────────────┘
▼
backend services on internal docker network
Five things this picture buys: (1) Identity is kernel-attested (`id -un` at the wrapper, `SSH_CLIENT` from sshd), so clients can't forge `MCP_SENDER_NAME`. (2) Raw key bytes never appear in argv or `/proc/<pid>/environ`; only the path does. (3) `restrict,command="..."` is the single line that keeps a laptop key from being a shell key; a §4 negative test (`ssh ... ls /` MUST reject) catches regressions. (4) SSH multiplexing (ControlPersist 1h, a ControlPath under the 108-byte socket-path limit, on a local filesystem, not NFS) keeps cold handshakes from blowing MCP's `initialize` timeout. (5) The dispatcher and the wrapper both write `logger -t` audit lines, so `journalctl -t mcp-dispatcher -t mcp-<name>` is the trail.
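The dispatcher and wrapper checks are small enough to sketch. This Python version is illustrative only: the real scripts are not shown here, and the allow-list contents, the `administrators`/`developers` group names, and the `administrator`/`developer` role strings are assumptions. The `logger -t` audit calls are omitted.

```python
import os

# Hypothetical literal allow-list: exact SSH_ORIGINAL_COMMAND string -> wrapper path.
ALLOWED = {
    "mcp agent-memory": "/usr/local/bin/mcp-agent-memory",
}


def scrub_env(env: dict) -> dict:
    """Drop any client-influenced MCP_* variables (defense in depth)."""
    return {k: v for k, v in env.items() if not k.startswith("MCP_")}


def dispatch_target(original_command):
    """Dispatcher step: exact-match the requested command against the allow-list."""
    if original_command is None:
        # No command at all means an interactive shell request: refuse.
        raise PermissionError("interactive session refused")
    target = ALLOWED.get(original_command)
    if target is None:
        raise PermissionError(f"rejected: {original_command!r}")
    return target


def resolve_role(user_groups, admin_group="administrators", dev_group="developers"):
    """Wrapper step: role from group membership; admin wins on overlap.

    In the wrapper, user_groups would come from kernel-attested
    membership (e.g. via getent group), never from the client.
    """
    if admin_group in user_groups:
        return "administrator"
    if dev_group in user_groups:
        return "developer"
    raise PermissionError("user is in neither role group")


def dispatcher_main():
    """Entry-point sketch: validate, then replace this process with the wrapper."""
    target = dispatch_target(os.environ.get("SSH_ORIGINAL_COMMAND"))
    os.execve(target, [target], scrub_env(dict(os.environ)))
```

Exact string matching, rather than parsing the command, leaves no shell quoting to get wrong: `mcp agent-memory; sh` is simply not in the table, and neither is the `ls /` from the negative test.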
The load-bearing guarantee is layer 4. The database itself enforces the boundary; the application can't bypass it even if you want it to.
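In PostgreSQL terms (TimescaleDB is PostgreSQL underneath), layer 4 could look like the DDL below, generated from Python for brevity. The role names follow the `agent_memory_app_{tenant}` pattern from the transport diagram. The table name `memories`, the `tenant` column, and the policy names are hypothetical.

```python
TENANTS = ("administrators", "developers")


def rls_ddl(table="memories", tenants=TENANTS):
    """Return DDL that binds each tenant's login role to its own rows.

    Names are interpolated from fixed constants only; never feed
    user input into these f-strings.
    """
    stmts = [
        f"ALTER TABLE {table} ENABLE ROW LEVEL SECURITY;",
        # Without FORCE, the table owner silently bypasses every policy.
        f"ALTER TABLE {table} FORCE ROW LEVEL SECURITY;",
    ]
    for t in tenants:
        role = f"agent_memory_app_{t}"
        stmts += [
            # NOBYPASSRLS is the default; stated to make the intent explicit.
            f"CREATE ROLE {role} LOGIN NOBYPASSRLS;",
            f"GRANT SELECT, INSERT, UPDATE ON {table} TO {role};",
            # USING limits which existing rows the role can see or touch;
            # WITH CHECK rejects new or updated rows tagged for the other tenant.
            f"CREATE POLICY {t}_isolation ON {table} TO {role} "
            f"USING (tenant = '{t}') WITH CHECK (tenant = '{t}');",
        ]
    return stmts
```

A role with no matching policy sees zero rows, so a routing bug that sends an administrators' query through the developers' pool returns nothing rather than leaking. Superusers still bypass RLS, which is why the app must never connect as one.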
The "embedding" step — converting a note into the numeric fingerprint that semantic search uses — traditionally goes to a hosted cloud API. Ours doesn't.
A small open-source model runs in a dedicated container. Quality is on par with cloud vendors' small-tier offerings, and no data leaves the building.
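To make the "numeric fingerprint" concrete, semantic search ranks stored notes by how closely their vectors point in the same direction as the query's vector. Below is a minimal sketch using cosine similarity. The toy 3-dimensional vectors stand in for real embeddings, which have hundreds of dimensions. This shows the idea only and is not the service's search code.

```python
import math


def cosine(a, b):
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def top_k(query_vec, stored, k=3):
    """stored: list of (note_id, vector). Returns note ids, best match first."""
    scored = [(cosine(query_vec, vec), nid) for nid, vec in stored]
    scored.sort(reverse=True)
    return [nid for _, nid in scored[:k]]
```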
Every automatic decision the system makes is recorded. When the AI writes a note and the system isn't confident which scope it belongs to, that note is flagged `inferred` for human review. No guesswork is silent. Operators can pull the queue with one command and re-tag in bulk if a pattern looks wrong.
If you launch in one project and drift into another area mid-conversation, notes follow the work. The system resolves the right scope at the moment of writing, not at session start.
The project you launched in is the project forever. Drift into another area mid-session and every note afterwards gets mis-filed. Search results lie.
The system resolves scope at the moment of writing, from the file or command context. Ambiguous? The note is flagged `inferred` for later human review.
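Write-time resolution can be sketched as a longest-prefix match on the path the agent is touching when it writes, with a flagged catch-all as the fallback. The rule table, its paths, and the catch-all name `general` are hypothetical. The cheat sheet points at `~/projects/secrets/agent-memory-scope.yml`, but that file's format isn't shown.

```python
from pathlib import PurePosixPath

# Hypothetical rules: directory prefix -> project.
SCOPE_RULES = {
    "/home/administrator/projects/cicd": "cicd",
    "/home/administrator/projects/infinity": "infinity",
}
CATCH_ALL = "general"  # hypothetical name for the per-team catch-all


def resolve_scope(context_path, rules=SCOPE_RULES):
    """Resolve from the file being touched now, not from the launch directory.

    Returns (project, inferred). inferred=True means the note lands in the
    catch-all and is queued for human review.
    """
    if context_path:
        p = PurePosixPath(context_path)
        # Longest prefix first, so nested projects resolve to the innermost rule.
        for prefix in sorted(rules, key=len, reverse=True):
            root = PurePosixPath(prefix)
            # Match whole path components: ".../cicd-old" must not match ".../cicd".
            if p == root or root in p.parents:
                return rules[prefix], False
    return CATCH_ALL, True
```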
Each night the reconciler picks durable un-promoted memories and compares each against the most-related documentation section. The answer falls into one of four buckets, each mapping to one action.
| If the memory… | Action | Why |
|---|---|---|
| Agrees with the docs | keep memory · no patch | Both findable. Redundancy is fine. |
| Complements the docs | append a bullet to the matching section · mark promoted_at | The note adds detail the doc didn't carry. Lift it into the durable record. |
| Contradicts the docs | memory loses · mark superseded_by_md | Docs are reviewed, version-controlled, audited. Canon by construction. |
| Orthogonal (unrelated) | leave alone · revisit next cycle | Not yet ready to promote, but not contradicting anything either. |
The classifier is a single small-model call returning JSON. Validated against a hand-labeled fixture before going live; 90% agreement gate.
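A minimal sketch of the plumbing around that call: validating the model's JSON, mapping each bucket from the table to its action, and checking the 90% agreement gate against a hand-labeled fixture. The `verdict` key, the bucket strings, and the fallback to `orthogonal` on malformed output are assumptions.

```python
import json

# Bucket -> action, following the table. Bucket strings are hypothetical.
ACTIONS = {
    "agrees": "keep",            # keep memory, no patch
    "complements": "append",     # append bullet, set promoted_at
    "contradicts": "supersede",  # memory loses, set superseded_by_md
    "orthogonal": "defer",       # leave alone, revisit next cycle
}


def parse_verdict(raw: str) -> str:
    """Validate one model response; malformed output takes the do-nothing bucket."""
    try:
        verdict = json.loads(raw).get("verdict")
    except (json.JSONDecodeError, AttributeError):
        return "orthogonal"
    return verdict if isinstance(verdict, str) and verdict in ACTIONS else "orthogonal"


def passes_gate(predicted, labeled, threshold=0.90) -> bool:
    """Agreement with the hand-labeled fixture must reach the threshold."""
    if not labeled or len(predicted) != len(labeled):
        raise ValueError("predictions and labels must be non-empty and aligned")
    matches = sum(p == l for p, l in zip(predicted, labeled))
    return matches / len(labeled) >= threshold
```

Defaulting to the do-nothing bucket means a garbled model reply can delay a promotion but never produces a wrong patch or a wrong supersede.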
Each holds what the other can't. The reconciler keeps them in sync without manual ceremony.
Capture observations the moment they happen — too small for a PR, too useful to forget. Surfaces on related searches; not designed for blast-radius survival.
The team's shared mental model. PR-reviewable, diff-able, survives DB loss. When memory and docs disagree, docs win — and interesting memories get pulled in over time.
Most days, nothing. The system runs itself. These are the levers for when you do want them.
Reconciler at 03:30 UTC. Digest ready by morning. Backup at 04:00 captures the new state.
Live view: `agent-memory monitor`
Latest digest: `cat ~/projects/secrets/agent-memory-digest-latest.md`
Notes flagged `inferred` in the last week: `agent-memory inferred --days 7`
Undo a reconciler change: `git revert <sha>` · find it via `git log --grep reconciler`
Turn the reconciler off: `AGENT_MEMORY_RECONCILER_ENABLED=0` in the env file
`/tmp`, `~/.claude`, etc.
`--scope`
`resolve <id>`
`forget`
`~/projects/secrets/agent-memory-scope.yml`
Mental model: Memory = cheap notes I take while we work · Documentation wins all conflicts · Reconciler is the bridge between them, not the boss of either.
The storage and policy layers are working. The way agents reach them isn't the right shape for a service we'll run for years. Two gaps drove the rebuild:
Both gaps were named by the user, not by the code. The redesign went through three independent reviewer passes (Gemini · Codex · Claude review-board node), then a fourth iteration after recognizing the platform's deliberate Tier-1 pattern (every other MCP on this server uses simple key auth — adding Keycloak only here would make agent-memory the outlier).
The new shape is a single in-process MCP server with a single HTTPS ingress. Static per-tenant Bearer keys, group-gated key files, exactly the pattern code-executor already uses. nginx sidecar terminates TLS with a public-CA cert (split DNS) — no client trust-store changes anywhere. Stateless — Postgres is the only source of truth.
Unix socket local door + HTTPS remote door, with Keycloak JWT. Reviewer-endorsed for "year-plus multi-tenant" framing. Withdrawn after recognizing that no other MCP on this server uses Keycloak, and that admin's git-shared ~/.claude.json wants one config that works on every host.
One config in git, works on server and laptop. Auth pattern matches the rest of the MCP ecosystem. TLS day-one (no token-on-wire risk). Per-tenant DB role + RLS unchanged — defense in depth holds. Decision criteria for revisiting Keycloak: developer count grows past ~5, OR per-user audit becomes a real need.
Three guarantees survive: (1) clients never see DB credentials; (2) DB role is bound to tenant — RLS catches even tenant-routing bugs in app code; (3) TLS + Bearer + RLS = three independent enforcement points.
The current subprocess bridge keeps running until the new path is proven. Each phase ends usable; nothing big-bangs.
Transport-agnostic MCP handler; tests for tenant routing.
In-process server, HTTP on 127.0.0.1, no auth yet; the subprocess fork is deleted in the same change.
SHA-256 keys.json validation; generate per-tenant key files (group-gated); tests ship in phase.
Public-CA cert via split DNS; nginx sidecar; rate limiting; private docker network.
R5 onboards the developer team onto the LAN path; R6 finalises integration tests, docs, and rotation procedures. Estimate: ~5-6 actual working days. Tests ship per phase, not lumped at the end. The CA bootstrap (R4 prerequisite) is decided before the phase starts: public CA via DNS-01 against the existing *.ai-servicers.com infra, with internal DNS pointing the hostname at linuxserver.lan.
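Phase R3's key generation could look roughly like this: one random token per tenant in a mode-0640 file, with only the SHA-256 digest recorded in `keys.json`. The key-file naming follows the `agent-memory-{tenant}.key` pattern from the transport diagram. The location of `keys.json`, the helper names, and the omitted `chgrp` step are assumptions.

```python
import hashlib
import json
import os
import secrets
from pathlib import Path


def write_key_file(path: Path, token: str) -> None:
    """New files are created 0640 (minus umask), so they are never world-readable."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o640)
    with os.fdopen(fd, "w") as fh:
        fh.write(token + "\n")
    os.chmod(path, 0o640)  # pin the mode: umask or a pre-existing file may differ


def generate_tenant_keys(secrets_dir: Path, tenants=("administrators", "developers")) -> dict:
    """Write one key file per tenant and a keys.json holding digests only."""
    mapping = {}
    for tenant in tenants:
        token = secrets.token_urlsafe(32)  # 32 random bytes as URL-safe text
        write_key_file(secrets_dir / f"agent-memory-{tenant}.key", token)
        mapping[hashlib.sha256(token.encode()).hexdigest()] = tenant
        # Hypothetical follow-up: chgrp the key file to the tenant's group (needs privilege).
    (secrets_dir / "keys.json").write_text(json.dumps(mapping, indent=2))
    return mapping
```

Clients read the key file and strip the trailing newline before sending the token, so the digest computed server-side matches the one recorded here.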
Schema, RLS, embeddings, scoping, reconciler — shipped and stable.
Two-tenant Unix-socket bridge (Phase 1+2) running. Transition scaffolding; retired in R2.
v4 final: Tier-1 single-transport + static keys + nginx + TLS day-one. Three reviewer passes plus platform-pattern alignment.
Core extraction → HTTP loopback → Bearer auth → nginx + LAN → onboarding → cleanup. ~5-6 working days.
`_mcp.py`
`agent-memory.ai-servicers.com` (R4 prerequisite)
Reconciler-introduced risks (mis-promotions) are mitigated by the dry-run default, a clear commit footer, and one-command revert. See the operator cheat-sheet.
"What did we decide about X last month?" is answerable without a human scrolling chat history.
Every agent works from the same shared memory, giving the same answer to the same question.
Every decision recorded today makes tomorrow's session better — and every other project's session too.
This isn't a feature. It's a capability. Once memory is there, every AI workflow we build on top of it gets sharper, cheaper, and more auditable by default.
Memory is now a first-class part of our AI platform. Private by construction, team-aware by policy, and ours.