Most teams using AI today are doing it one person at a time — individual ChatGPT accounts, scattered API experiments, no shared memory. I took a different approach: building a shared, company-wide AI assistant that knows the business, has access to company data sources, and gets smarter over time. Here’s what I built and how it works.
The system runs as a multi-user platform on top of a frontier LLM (Claude by Anthropic). Each conversation runs in an isolated Docker container with its own workspace. Containers are ephemeral — they spin up per session — but persistent state lives in mounted volumes: one private volume per user, one shared volume per team channel.
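The split between ephemeral containers and persistent volumes can be sketched as the arguments the host would pass to `docker run`. This is an illustration, not the harness's actual API — the volume names and image tag are assumptions:

```typescript
// Sketch: per-session container launch with persistent volume mounts.
// Volume names, mount paths, and the image tag are illustrative.
interface Session {
  userId: string;
  channelId: string;
}

// Build `docker run` arguments: the container itself is ephemeral (--rm),
// but state survives in named volumes mounted from the host.
function buildRunArgs(s: Session): string[] {
  return [
    "run", "--rm",
    "-v", `vol-user-${s.userId}:/workspace/user`,        // private per-user volume
    "-v", `vol-channel-${s.channelId}:/workspace/group`, // shared channel volume
    "agent-image:latest",
  ];
}
```

Because the volumes are named and host-managed, a fresh container for the same user and channel reattaches to exactly the same state.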
A host orchestration layer handles container lifecycle, message routing, and scheduled task persistence. Scheduled tasks are stored in a host database, not crontab — so they survive container restarts and are manageable via API. The agent surfaces in Slack, is thread-aware, uses @mentions, and can spawn named sub-agents that appear as distinct Slack bot identities for complex multi-step workflows.
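Storing schedules as database rows rather than crontab entries reduces the scheduler to two pure steps: pick the tasks that are due, then compute each fired task's next stored state. A minimal sketch — the field names and schema are assumptions, not the actual harness types:

```typescript
// Sketch: host-side scheduled task persistence. Tasks are rows in a
// database, not crontab entries, so they survive container restarts
// and can be listed/edited via API. Field names are illustrative.
interface ScheduledTask {
  id: string;
  channelId: string;
  prompt: string;            // what the agent should do when the task fires
  nextRunAt: number;         // epoch milliseconds
  intervalMs: number | null; // null = one-shot task
}

// Given the stored tasks and the current time, return those that should fire.
function dueTasks(tasks: ScheduledTask[], now: number): ScheduledTask[] {
  return tasks.filter((t) => t.nextRunAt <= now);
}

// After a task fires, compute its next stored state (null = delete the row).
function reschedule(t: ScheduledTask, now: number): ScheduledTask | null {
  return t.intervalMs === null ? null : { ...t, nextRunAt: now + t.intervalMs };
}
```

A host-side poll loop would run `dueTasks` against the database, inject each due task's prompt into a fresh container session, and write back the result of `reschedule`.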
The tool interface between the LLM and the host uses the Model Context Protocol (MCP). Every tool call — file reads, bash commands, Slack messages, Notion writes, secret retrieval — goes through MCP servers that the harness exposes to the container over stdio. The LLM requests a tool call; the harness checks the permission tier; if approved, executes it and returns the result. This gives the harness fine-grained control over every action the model takes, with a complete audit trail at the harness layer rather than relying on the model to self-report.
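The permission check and audit trail described above can be sketched as a harness-side gate that runs before any tool executes. The tier names and tool table are assumptions for illustration — the post doesn't specify the actual schema:

```typescript
// Sketch: harness-side gate on MCP tool calls. Every call is checked
// against a permission tier and logged before execution. Tier names
// and the tool-to-tier table are illustrative.
type Tier = "read" | "write" | "admin";

const TOOL_TIERS: Record<string, Tier> = {
  read_file: "read",
  run_bash: "write",
  post_slack_message: "write",
  get_secret: "admin",
};

interface ToolCall { name: string; args: unknown; }

// The harness compares the tool's required tier with the session's granted
// tier, and records the decision -- the audit trail lives here, at the
// harness layer, not in the model's self-reports.
function authorize(call: ToolCall, granted: Tier, auditLog: string[]): boolean {
  const order: Tier[] = ["read", "write", "admin"];
  const required = TOOL_TIERS[call.name];
  const allowed = required !== undefined &&
    order.indexOf(required) <= order.indexOf(granted);
  auditLog.push(`${allowed ? "ALLOW" : "DENY"} ${call.name}`);
  return allowed;
}
```

Only on an `ALLOW` does the harness actually execute the call and return the result over stdio; a `DENY` is returned to the model as a refusal.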
Workspace layout: /workspace/global (read-only system config and skills, synced from host), /workspace/group (shared team channel workspace — memory, daily logs, conversation archives), /workspace/user (private per-user workspace, mounted on every container start — secrets, personal config, scripts). Session state in the user workspace persists across ephemeral containers because the volume is mounted from the host, not the container filesystem. The model is explicitly instructed to write intermediate results and decisions to files rather than holding them in context, which also protects against context rot as conversations grow long.
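The "write it down" convention amounts to appending decisions to files under the mounted workspace and re-reading them in the next session. A minimal sketch, with an assumed `memory/decisions.log` path that the post does not specify:

```typescript
// Sketch: intermediate results and decisions go to files in the mounted
// workspace, so they survive the ephemeral container and don't consume
// context. The memory/decisions.log path is illustrative.
import { appendFileSync, mkdirSync, readFileSync } from "node:fs";
import { join } from "node:path";

// Append one timestamped decision line to the workspace's memory log.
function logDecision(workspace: string, note: string): void {
  mkdirSync(join(workspace, "memory"), { recursive: true });
  appendFileSync(
    join(workspace, "memory", "decisions.log"),
    `${new Date().toISOString()} ${note}\n`,
  );
}

// On the next session -- possibly in a brand-new container -- the agent
// reads the log back from the same mounted volume.
function readDecisions(workspace: string): string {
  return readFileSync(join(workspace, "memory", "decisions.log"), "utf8");
}
```

In production `workspace` would be `/workspace/user` or `/workspace/group`; because both are host-mounted volumes, the log outlives any individual container.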
Not every task needs the same model. A model router directs requests to different tiers based on complexity: lightweight checks and simple summaries go to a fast, cheap model; multi-step reasoning, code generation, and anything requiring judgment goes to a more capable one. The routing logic lives in the harness, not the prompt — so the model can't talk its way around it. In practice this cuts costs significantly without any quality regression, because most of the volume is low-complexity work that a smaller model handles fine.
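The routing decision itself is a small pure function in the harness. The request classification and tier names below are assumptions — the post doesn't describe the actual heuristic:

```typescript
// Sketch: harness-side model routing. Because this runs in the harness,
// not the prompt, no prompt injection can redirect a request to a
// different tier. Kinds and tier names are illustrative.
type ModelTier = "fast" | "capable";

interface RouteRequest {
  kind: "summary" | "status_check" | "code" | "reasoning";
  estimatedSteps: number;
}

function route(req: RouteRequest): ModelTier {
  // Code generation and open-ended reasoning always get the capable model;
  // everything else only escalates when it spans multiple steps.
  if (req.kind === "code" || req.kind === "reasoning") return "capable";
  return req.estimatedSteps > 1 ? "capable" : "fast";
}
```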
The harness layer is built on NanoClaw, a ~500-line TypeScript wrapper around the Anthropic SDK. The closest comparison is OpenClaw, a popular open-source framework for running AI agents. The design differs in one central dimension: surface area.
OpenClaw is full-featured and handles a lot of the plumbing automatically. NanoClaw deliberately stays small. At ~500 lines, you can audit the entire harness in an afternoon, understand every code path, and modify it for your environment without fighting framework conventions. That size constraint is itself a security property: with a minimal harness and a strict egress firewall, the attack surface is enumerable.
The permission model is split into two explicit tiers: admin-level operations (harness process — container lifecycle, volume mounts, network config, MCP server registration) and user-level agent operations (running inside the container with minimal permissions). The agent can only take actions the harness exposes via MCP. There is no path for a model to escalate from user-level to admin through prompt manipulation, because the privilege boundary is enforced by the OS process model, not the application layer.
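One way to see why prompt-level escalation has no target: admin operations are plain functions of the host process that are never registered as MCP tools, so no tool call the model can produce is able to name them. The operation names below are illustrative:

```typescript
// Sketch: the privilege boundary as a registration boundary. Admin
// operations exist only in the host process; the MCP server inside the
// container advertises USER_TOOLS and nothing else. Names are illustrative.
const ADMIN_OPS = ["mount_volume", "restart_container", "set_network_policy"];
const USER_TOOLS = ["read_file", "run_bash", "post_slack_message"];

// Build the tool list the MCP server advertises to the model. The filter
// is a belt-and-braces check: even a misconfigured overlap would be
// stripped before the model ever sees it.
function exposedToModel(): string[] {
  return USER_TOOLS.filter((t) => !ADMIN_OPS.includes(t));
}
```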
The tradeoff: you own more of the plumbing. SDK upgrades, tool permission changes, and new MCP server integrations all require editing the harness. For a team that wants deep customization and a clear security story, that's a feature. For a team that wants to stay on autopilot, OpenClaw is the better starting point.
Security was first-class from day one, not retrofitted: