Deployment & scale¶
Spine runs the same kernel across three shapes — only backend config differs.
Because State is fully serializable and the kernel is stateless between steps,
horizontal scale is "add workers."
1. Embedded¶
Imported into an app or script. In-memory or SQLite checkpoint.
from spine_backends import SQLiteCheckpoint
agent = Agent("openai:gpt-4o-mini", checkpoint=SQLiteCheckpoint("runs.db"))
2. Server¶
Behind an async API (e.g. FastAPI). Postgres checkpoint, Redis rate limiter.
from spine_backends import PostgresCheckpoint
from spine_middleware import RateLimit
agent = Agent(
"openai:gpt-4o-mini",
checkpoint=PostgresCheckpoint(DSN),
middleware=[RateLimit(max_calls=10, per_s=1.0)],
)
3. Distributed¶
Stateless workers pull jobs from a queue; all state external (Postgres/Redis). A human-in-the-loop pause or a crash mid-run is recoverable because the resume token points at the durable checkpoint — any worker can pick it up.
Multi-tenancy¶
tenant_id flows through State; budgets and (with a namespacing store) isolation
are tenant-scoped. See TenantBudget.
Observability in production¶
Add the OTel middleware and point it at your collector — spans carry token counts, cost, latency, model, and tool name, so existing dashboards and your SIEM work unchanged.
Cooperative shutdown¶
On SIGTERM, pass a should_cancel predicate to run(); the kernel finishes and
checkpoints the current step, then returns cancelled — the run resumes later
from exactly there.