Architecture · Walkthrough

Have you considered containers for runtime?

Most enterprise agentic stacks are monoliths in disguise — single Lambdas, single notebooks, single managed sessions you cannot operate, version, or swap a model out of. Containers are not the only answer, but they are the answer most production agentic systems converge on. Here is what the runtime decision actually looks like.

The most common architectural shortcut in enterprise agentic AI looks like this: a single AWS Lambda or Azure Function holds the entire agent. The Anthropic SDK is imported, the system prompt is hardcoded, the tool definitions are inline, and the embedding lookups go through one vector database call. The function returns the agent's response, the team ships it, the demo is good, the production reality lasts about a quarter.

It lasts a quarter because the runtime decision and the architecture decision are the same decision, and the team made it by default. By the time the second use case arrives — different cohort, different system prompt, different tool set, different model — the monolith has to be unwound. By the third use case, there is a platform team retroactively building what should have been the runtime foundation in the first place.

This piece walks through the decision honestly, and explains why most production agentic systems converge on containers, so you can learn the lesson here rather than the expensive way.

The runtime question most teams skip

When you build a single agent that handles one workflow, the runtime question is trivial. Pick whatever your team already runs — Lambda, App Service, Cloud Run, a single VM. Done.

The runtime question becomes load-bearing the moment you have:

  • More than one agent. Different system prompts, different tool sets, different models. You need isolation between them so a change to agent A does not destabilize agent B.
  • Skills you want to reuse across agents. A "company governance" skill, a "product catalog" retrieval skill, a custom MCP server you wrote for ServiceNow. They should be defined once and consumed by every agent that needs them.
  • Models you want to swap. Claude Sonnet 4.6 today, Claude Opus 4.7 for one cohort, Haiku 4.5 for cost-sensitive paths, a self-hosted Llama for a regulated workflow. The agent code should not change when the model does.
  • Multi-tenancy. If you serve agents to internal cohorts with different data permissions, you cannot share a single runtime context across tenants without leaking.
  • Observability. Per-turn latency, token consumption broken down by agent and cohort, tool-call success rates, skill-activation rates, refusal rates. The monolith does not give you these by default.

Each of those is solvable in a monolith with enough engineering. None of them is cheaper to solve that way than it would be in a containerized stack.

What a containerized agentic stack actually contains

Strip the Kubernetes vocabulary; here is what is in the boxes:

  • Agent runtime container. Holds the LLM call, the tool dispatch, the conversation state. One image, parameterized by environment variables for system prompt, tool set, model selection. Multiple instances of this image are running for different agents.
  • Skill registry. Markdown skill definitions (Anthropic format or your own), embedded with metadata. Loaded on demand by agents that match a skill's trigger description. Lives in object storage with a thin metadata index.
  • Embedding + retrieval container. Voyage AI for English-heavy enterprise content, OpenAI text-embedding-3 for cost-sensitive workloads at scale, plus a hybrid retriever (vector + keyword + reranker). The agent runtime calls this; it does not embed itself.
  • Model routing layer. A thin proxy that resolves "model: smart" to a specific Anthropic, OpenAI, or self-hosted endpoint based on cohort policy. When you want to test Sonnet 4.6 against Opus 4.7 for a specific cohort, you change a config, not code.
  • MCP servers. Your bespoke connectors — SharePoint, ServiceNow, internal Postgres, GitHub — running as their own containers. Tokens stay on-prem, audit flows through your SIEM, and the agents that use them are decoupled from the integration code.
  • Observability sidecar. OpenTelemetry traces with custom attributes for agent ID, cohort, skill activations, tool calls, model used. Your Grafana, Honeycomb, or Datadog already knows how to consume this.
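The agent runtime bullet above — one image, parameterized by environment variables — is the load-bearing idea. A minimal sketch of what that parameterization might look like (the variable names and prompt-reference scheme are illustrative assumptions, not a real API):

```python
import os

# Hypothetical sketch: one runtime image, configured per agent instance
# via environment variables. All names here are illustrative.

class AgentConfig:
    def __init__(self) -> None:
        # Which agent this instance is (e.g. "dealer-intake", "trader-sourcing")
        self.agent_id = os.environ["AGENT_ID"]
        # System prompt is fetched by reference at startup, never baked
        # into the image, so prompts version independently of code
        self.system_prompt_ref = os.environ["SYSTEM_PROMPT_REF"]
        # Logical model alias; the routing layer resolves it to an endpoint
        self.model_alias = os.environ.get("MODEL_ALIAS", "smart")
        # Comma-separated tool set enabled for this agent
        self.tools = os.environ.get("TOOLS", "").split(",")

def load_config() -> AgentConfig:
    return AgentConfig()
```

Running three agents then means deploying the same image three times with three different environment blocks, which is exactly the isolation property the monolith lacks.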

The shape is not exotic. It is the same shape every production microservices system converges on, applied to agents.
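The model routing layer deserves one concrete illustration, because it is the piece that makes "change a config, not code" true. At its core it is a policy lookup: a logical alias plus a cohort resolves to a concrete endpoint. A sketch, with cohort names and endpoint identifiers invented for illustration (the model names mirror the examples earlier in this piece):

```python
# Minimal sketch of a model routing policy: resolve a logical alias
# like "smart" to a concrete endpoint based on cohort.
# Cohorts and endpoint identifiers are illustrative assumptions.

DEFAULT_ROUTES = {
    "smart": "anthropic/claude-sonnet-4-6",
    "cheap": "anthropic/claude-haiku-4-5",
}

# Per-cohort overrides live in config, not agent code: testing a new
# model for one cohort is a one-line change to this table.
COHORT_OVERRIDES = {
    "finance-pilot": {"smart": "anthropic/claude-opus-4-7"},
    "regulated-eu": {"smart": "self-hosted/llama-70b"},
}

def resolve_model(alias: str, cohort: str) -> str:
    overrides = COHORT_OVERRIDES.get(cohort, {})
    return overrides.get(alias, DEFAULT_ROUTES[alias])
```

In production this table would live behind the thin proxy, but the design choice is the same: the agent asks for "smart", and policy decides what "smart" means for that cohort today.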

Mazalgo as a worked reference

Mazalgo is the Watch Dealer/Trader Lifecycle agentic system Protime built and runs in production. It is a useful reference because it has the constraints most enterprise teams are about to hit:

  • Multiple agent personalities — the dealer-side intake agent, the trader-side sourcing agent, and the back-office reconciliation agent each have different system prompts and tool sets, but share the same runtime image.
  • Real CRM as the system of record — agents act on CRM data, do not hallucinate inventory, and route updates back through proper write paths with idempotency keys.
  • Voyage AI embeddings for semantic search across watch dealers' historical inventory descriptions; the embedding container is shared across all three agents.
  • Live pricing API integrations — the agent runtime calls into a pricing routing layer that abstracts vendor specifics. When a market data provider changes their API, one container updates, not all three agents.
  • Gemma4 content discovery running for content scout workflows where the cost profile of a fine-tuned open-weights model wins over a frontier API.

The architectural lesson Mazalgo enforces: the runtime decisions show up everywhere, but they show up cheapest if you make them at the start. The Voyage embeddings are not in the agent code. The pricing API is not in the agent code. The system prompt for the dealer agent is not in the trader agent's image. Each of those separations costs about a day of engineering up front and saves about a quarter of refactoring later.

What containers solve, specifically

For an enterprise stack, the operational wins compound:

  • Multi-tenancy. Agent images run with cohort-specific environment overrides; data access is per-instance. A finance cohort's agents cannot reach an HR cohort's data because the IAM and the network are container-level, not application-level.
  • Reproducibility. A failing agent in production can be reproduced by pulling its image tag and replaying the inputs. The version of the system prompt, the model, and the skills are all pinned.
  • A/B and canary routing. New agent definition gets 5% of traffic, the proxy collects metrics, the rollout proceeds or reverses. This is impossible in the monolith without a meaningful rebuild.
  • Cost attribution. Tokens-per-cohort and tools-per-cohort fall out of the trace data naturally. Finance can see which cohort's agents are expensive and where.
  • Disaster recovery. Containers are immutable artifacts. Rebuilding the agent stack from scratch in a new region is a registry pull and a config apply, not a recovery project.
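The cost-attribution point is worth making concrete. If every trace span carries cohort and token counts as attributes, per-cohort cost is just a fold over the trace data. A toy sketch, in which the span shape and the per-token prices are invented for illustration:

```python
from collections import defaultdict

# Toy sketch: aggregate token cost per cohort from trace spans.
# The span shape and the prices below are illustrative assumptions.

PRICE_PER_1K_TOKENS = {"input": 0.003, "output": 0.015}

def cost_by_cohort(spans: list[dict]) -> dict[str, float]:
    totals: dict[str, float] = defaultdict(float)
    for span in spans:
        attrs = span["attributes"]
        cost = (
            attrs["input_tokens"] / 1000 * PRICE_PER_1K_TOKENS["input"]
            + attrs["output_tokens"] / 1000 * PRICE_PER_1K_TOKENS["output"]
        )
        totals[attrs["cohort"]] += cost
    return dict(totals)
```

Nothing here is clever; the point is that the monolith never emitted the attributes in the first place, so this query is impossible there regardless of how clever you are downstream.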

Anthropic's own agentic systems guidance lands in similar territory: the cost of running production agents is dominated by the integration and operational work, not the model calls. Containers are the operational abstraction that lets the integration and operational work scale.
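The canary bullet above can also be made concrete. The usual mechanism is a deterministic hash split: a stable fraction of sessions goes to the new agent definition, and a given session always lands on the same arm while the proxy collects metrics. A sketch (the 5% figure mirrors the example above; the version tags are invented):

```python
import hashlib

# Sketch of deterministic canary routing: a stable fraction of sessions
# is pinned to the new agent definition. Version tags are illustrative.

def route_version(session_id: str, canary_pct: int = 5) -> str:
    digest = hashlib.sha256(session_id.encode()).digest()
    bucket = digest[0] * 256 + digest[1]  # 0..65535, roughly uniform
    if bucket % 100 < canary_pct:
        return "agent:v2-canary"
    return "agent:v1-stable"
```

Rolling forward is raising canary_pct; rolling back is setting it to zero. Either way it is a config change against the proxy, not a redeploy of agent code.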

When containers are overkill

The honest answer: not every team needs this on day one.

  • Single-skill, single-cohort, single-vendor agents. If you are running one Copilot Studio agent inside Microsoft's managed runtime for one workflow, you do not need a container fleet. You need to make sure your governance is correct.
  • Pre-product-market-fit prototypes. A notebook is the right answer when you are still figuring out whether the agent is useful.
  • Teams without DevOps capacity. A container fleet without operators is worse than a Lambda with operators. If you do not have someone who can run the cluster, run the Lambda.

The conversation we have most often: "we have one agent today, but we know we are going to have five within a year." For that profile, containers are not overkill. The migration cost from Lambda to containers at agent five is greater than the up-front cost of starting in containers at agent one.

The Protime conversation

When we sit with a team designing their first production agent, the runtime question is on the agenda from week one. Not because containers are the answer for every team — they are not — but because the question itself reveals what kind of system you are actually building. Teams that can say "we are building one workflow forever" confidently pick Lambda. Teams that say "we are building a platform" pick containers. Teams that say "we'll figure it out as we go" almost always end up rebuilding once.

If your roadmap has more than one agent on it, or more than one cohort, or more than one model, or more than one external data source — the runtime is the architecture. Decide it on purpose.