OpenAI just published their internal safety practices for running Codex.
This matters because most teams deploying AI coding agents are winging it. OpenAI is saying: here's how we do it at scale, and you should probably copy us.
The Safety Stack
OpenAI's Codex deployment follows a layered safety model:
1. Sandboxed Execution
- All code runs in isolated environments
- Network access is restricted by default
- File system access is scoped to task-specific directories
2. Human-in-the-Loop for Critical Operations
- Database schema changes require approval
- Production deployments need explicit sign-off
- Secrets and credentials are never auto-injected
3. Automated Validation
- All generated code passes through existing test suites
- Linting and type-checking gates block bad code
- Rollback triggers activate on anomaly detection
The Interesting Part
OpenAI admits Codex still hallucinates. The safety practices aren't about preventing errors -- they're about containing them. The model generates code in a sandbox, tests it, and only then considers it for review.
This is a tacit admission that AI agents aren't reliable enough for unsupervised production access yet. The safety layer isn't a feature; it's a workaround.
Why Teams Should Care
If you're using Claude Code, Cursor, or GitHub Copilot Workspace, the same principles apply:
- Scope file access - Don't give agents carte blanche to your repo
- Require tests - Generated code must pass existing test suites before human review
- Gate production - Never auto-deploy agent-generated changes
- Audit everything - Log all agent actions for post-hoc review
The Bottom Line
OpenAI's Codex safety post is essentially a guide to "how to use AI agents without getting fired." The practices are conservative because they have to be. The most interesting signal isn't the technical details -- it's that OpenAI doesn't trust its own agent enough to run it unsupervised in production.
If the people building the models don't fully trust them, neither should you.