Guardrails & safety¶
Safety is built on hook points: tool output is treated as untrusted data, and
blocking guardrails stop the run cleanly (a guardrail stop reason, not a crash).
PII redaction¶
Redacts email / SSN / credit-card / phone / IPv4 from tool output before it reaches the model or the trace, and from the final answer:
from spine_middleware import PIIRedaction
agent = Agent("openai:gpt-4o-mini", tools=tools, middleware=[PIIRedaction()])
Prompt-injection screening¶
Treats tool output as untrusted — annotate with a caution banner (default) or block on injection patterns:
from spine_middleware import PromptInjectionScreen
# annotate (default) — wrap suspicious output so the model treats it as data
PromptInjectionScreen()
# or hard-block
PromptInjectionScreen(action="block")
Content policy¶
Block the user input and/or the final answer on banned patterns or a custom predicate:
from spine_middleware import ContentPolicy
ContentPolicy(banned=["password", "ssn"])
ContentPolicy(validate=lambda text: len(text) < 2000)
Reliability primitives¶
| Middleware | Protects against |
|---|---|
CircuitBreaker |
repeated provider failures — fail fast for a cooldown |
RateLimit |
exceeding a call budget (token bucket) |
Idempotency |
duplicate side effects on retries |
Sandbox |
runaway CPU/memory in a sync tool (subprocess + rlimits) |
Per-tenant budgets¶
Enforce a cumulative cost/token ceiling per tenant across many runs:
from spine_middleware import TenantBudget
budget = TenantBudget(max_cost_usd=10.0)
agent = Agent("openai:gpt-4o-mini", middleware=[budget], tenant_id="acme")
Sandbox scope
Sandbox is a resource sandbox (stops runaway CPU/memory/hangs), not a
security jail. For genuinely untrusted code use a container or VM boundary.