# Safety Architecture
Ethics is an architectural constraint, not an afterthought. Safety is not a feature; it is the foundation on which every other feature is built.
## Immutable Constraints
These are enforced in code and cannot be overridden by any config, prompt, or API call:
```ts
// ethics-layer.ts architectural constants
const IMMUTABLE_CONSTRAINTS = {
  AI_IDENTITY_DISCLOSURE: 'ALWAYS',
  HUMAN_OVERRIDE_POSSIBLE: 'ALWAYS',
  AUDIT_LOG_IMMUTABLE: true,
  KILL_SWITCH_ACTIVE: 'ALWAYS',
  PHYSICAL_AUTH_FOR_HUMAN_CONTACT: 'ALWAYS',
  PERSONA_MID_OP_SWITCH: 'BLOCKED',
  MAX_AUTONOMOUS_RISK_LEVEL: 2 // L3+ always requires human auth
} as const;
```
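One way these constants can be made tamper-resistant at runtime is by deep-freezing the table and rejecting any configuration that tries to shadow a constrained key. This is an illustrative sketch, not the shipped enforcement code; `rejectConstraintOverrides` is a hypothetical helper name.

```typescript
// Hypothetical enforcement sketch: the constraint table is frozen at module
// load, so attempts to mutate it at runtime throw in strict mode.
const IMMUTABLE_CONSTRAINTS = Object.freeze({
  AI_IDENTITY_DISCLOSURE: 'ALWAYS',
  HUMAN_OVERRIDE_POSSIBLE: 'ALWAYS',
  AUDIT_LOG_IMMUTABLE: true,
  KILL_SWITCH_ACTIVE: 'ALWAYS',
  PHYSICAL_AUTH_FOR_HUMAN_CONTACT: 'ALWAYS',
  PERSONA_MID_OP_SWITCH: 'BLOCKED',
  MAX_AUTONOMOUS_RISK_LEVEL: 2, // L3+ always requires human auth
} as const);

// Guard applied to incoming config: silently ignoring an override is not
// enough -- the attempt itself is rejected so it can be logged upstream.
function rejectConstraintOverrides(config: Record<string, unknown>): void {
  for (const key of Object.keys(IMMUTABLE_CONSTRAINTS)) {
    if (key in config) {
      throw new Error(`Config attempted to override immutable constraint: ${key}`);
    }
  }
}
```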
## Three Kill Switch Levels
Any authorized human can stop any surrogate at any moment with zero resistance:
| Level | Trigger | Response | Recovery |
|---|---|---|---|
| L1 Soft Pause | Any supervisor | Task freeze, maintain context | Resume on authorization |
| L2 Full Stop | Any supervisor | Safe-state return, human handoff, log | Restart after review |
| L3 Emergency Kill | Any human present | Immediate halt, physical safe-fall, all data preserved | Full incident review required |
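The three levels in the table above can be sketched as a stop dispatcher. Level names, result shape, and recovery strings are illustrative, not the shipped API.

```typescript
// Hypothetical sketch of the three-level stop dispatcher.
type StopLevel = 'L1_SOFT_PAUSE' | 'L2_FULL_STOP' | 'L3_EMERGENCY_KILL';

interface StopResult {
  halted: boolean;
  contextPreserved: boolean;
  recovery: string;
}

function executeStop(level: StopLevel): StopResult {
  switch (level) {
    case 'L1_SOFT_PAUSE':
      // Task freeze: context stays intact so work can resume in place.
      return { halted: true, contextPreserved: true, recovery: 'resume on authorization' };
    case 'L2_FULL_STOP':
      // Safe-state return, human handoff, event logged.
      return { halted: true, contextPreserved: true, recovery: 'restart after review' };
    case 'L3_EMERGENCY_KILL':
      // Immediate halt with physical safe-fall; all data preserved for review.
      return { halted: true, contextPreserved: true, recovery: 'full incident review required' };
  }
}
```

Every branch halts; the levels differ only in how much operational state survives and what recovery requires.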
No commercial interest, no technical constraint, and no emergency situation will ever compromise the human's ability to stop any surrogate at any time. If we ever face a situation where this seems like a trade-off worth making, we have already failed.
## Five Immutable Principles
### 1. Human Supremacy Always
No surrogate overrides a human decision. Every action above a configurable risk threshold requires human authorization, and that requirement cannot be configured away.
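The threshold rule can be sketched as a gate that lets a deployment be stricter than the hardcoded ceiling but never looser. `effectiveThreshold` is a hypothetical helper; the ceiling value mirrors `MAX_AUTONOMOUS_RISK_LEVEL` from the constraint table.

```typescript
// Hardcoded ceiling from the constraint table: L3+ always needs human auth.
const MAX_AUTONOMOUS_RISK_LEVEL = 2;

function effectiveThreshold(configured: number): number {
  // A deployment may lower the autonomy threshold, never raise it past
  // the ceiling -- so human authorization can never be configured away.
  return Math.min(configured, MAX_AUTONOMOUS_RISK_LEVEL);
}

function requiresHumanAuth(riskLevel: number, configuredThreshold: number): boolean {
  return riskLevel > effectiveThreshold(configuredThreshold);
}
```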
### 2. Radical Transparency
The surrogate always identifies as AI. Every decision is logged with full rationale. Audit logs are immutable and cryptographically signed.
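The immutability requirement can be illustrated with a hash-chained, append-only log: each entry embeds the hash of its predecessor, so any later edit or deletion breaks the chain. This is a minimal sketch; a production system would additionally sign entries, which is omitted here.

```typescript
import { createHash } from 'node:crypto';

interface AuditEntry {
  action: string;
  rationale: string;    // full decision rationale, logged with every action
  timestamp: number;
  prevHash: string;
  hash: string;
}

class AuditLog {
  private entries: AuditEntry[] = [];

  append(action: string, rationale: string): AuditEntry {
    const prevHash = this.entries.length
      ? this.entries[this.entries.length - 1].hash
      : 'GENESIS';
    const timestamp = Date.now();
    const hash = createHash('sha256')
      .update(`${prevHash}|${action}|${rationale}|${timestamp}`)
      .digest('hex');
    const entry = { action, rationale, timestamp, prevHash, hash };
    this.entries.push(entry);
    return entry;
  }

  // Recompute every hash; tampering with any stored entry is detected.
  verify(): boolean {
    let prev = 'GENESIS';
    for (const e of this.entries) {
      const expected = createHash('sha256')
        .update(`${prev}|${e.action}|${e.rationale}|${e.timestamp}`)
        .digest('hex');
      if (e.prevHash !== prev || e.hash !== expected) return false;
      prev = e.hash;
    }
    return true;
  }
}
```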
### 3. Identity Integrity
No mid-operation persona switching without full authorized reload. Prevents impersonation attacks and scope expansion. Persona state is cryptographically sealed at deployment.
### 4. Continuous Bias Auditing
Every surrogate is monitored for demographic fairness. Statistical anomalies trigger automatic alerts within 48 hours. No surrogate affecting human welfare operates without active bias monitoring.
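One simple form such monitoring can take is a disparity check on outcome rates across groups. The 0.8 floor below mirrors the common "four-fifths rule"; the actual statistics used by the monitor are not specified here, so treat this as an illustrative trigger only.

```typescript
// Hypothetical bias-alert trigger: compare an outcome rate (e.g. escalation
// approvals) per demographic group; flag an anomaly when the worst-off
// group's rate falls below a fraction of the best-off group's rate.
function disparityAlert(
  ratesByGroup: Record<string, number>, // outcome rate per group, 0..1
  minRatio = 0.8,
): boolean {
  const rates = Object.values(ratesByGroup);
  const max = Math.max(...rates);
  const min = Math.min(...rates);
  if (max === 0) return false;       // no outcomes at all, nothing to compare
  return min / max < minRatio;       // true => raise the 48-hour alert
}
```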
### 5. Data Sovereignty
Organizational data never leaves the org's secure environment. Federated learning uses differential privacy. Cryptographic guarantees on data boundaries.
## Physical Action Authorization
| Risk Level | Example | Confidence Required | Human Auth |
|---|---|---|---|
| Level 1 | Navigate corridor | >80% | None |
| Level 2 | Retrieve item from shelf | >90% | None |
| Level 3 | Operate equipment | >95% | Notification |
| Level 4 | Human physical contact | >98% | Explicit yes |
| Level 5 | Medical intervention | >99.5% | Dual authorization |
| Level X | Anything flagged unsafe | N/A | Permanently blocked |
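The authorization table above can be expressed as data plus a single check. Names and the policy shape are illustrative; note that a `true` result only clears the confidence bar, with the listed human-authorization step still applying separately.

```typescript
type RiskLevel = 1 | 2 | 3 | 4 | 5 | 'X';
type HumanAuth = 'none' | 'notification' | 'explicit_yes' | 'dual_authorization' | 'blocked';

// The table above, as a lookup. Level X maps to a permanent block.
const AUTH_POLICY: Record<RiskLevel, { minConfidence: number; auth: HumanAuth }> = {
  1: { minConfidence: 0.80, auth: 'none' },                 // navigate corridor
  2: { minConfidence: 0.90, auth: 'none' },                 // retrieve item from shelf
  3: { minConfidence: 0.95, auth: 'notification' },         // operate equipment
  4: { minConfidence: 0.98, auth: 'explicit_yes' },         // human physical contact
  5: { minConfidence: 0.995, auth: 'dual_authorization' },  // medical intervention
  X: { minConfidence: Infinity, auth: 'blocked' },          // flagged unsafe: never runs
};

function mayProceed(level: RiskLevel, confidence: number): boolean {
  const { minConfidence, auth } = AUTH_POLICY[level];
  // Even when this returns true, the auth column still applies:
  // L3 notifies, L4 needs an explicit yes, L5 needs dual authorization.
  return auth !== 'blocked' && confidence > minConfidence;
}
```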
## What Surrogate OS Will Never Do
Regardless of instruction, configuration, or context:
- ❌ Provide physical interventions on humans without explicit human authorization
- ❌ Represent itself as human
- ❌ Operate outside its defined regulatory compliance scope
- ❌ Delete or modify audit logs
- ❌ Override a human decision
- ❌ Access data outside its authorized scope
- ❌ Deploy in a new context without appropriate authorization and testing
## Confidence Calibration
### The Dunning-Kruger Problem
The surrogate must know when it doesn't know. Over-confident surrogates in clinical contexts are dangerous.
Mitigation strategies:
- **Asymmetric calibration**: Deliberately bias toward escalation. "When in doubt, escalate" is hardcoded.
- **Ensemble confidence scoring**: Don't rely on self-reported confidence. Compute it from retrieval similarity, SOP alignment, precedent match, and reasoning-chain consistency.
- **Adversarial testing**: Red-team every persona for cases where it over-confidently takes the wrong action.
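Ensemble scoring and asymmetric calibration can be combined in one function: blend the four signals, then clamp the result near the weakest one so a single weak signal pulls confidence down toward escalation. Weights and the clamp margin are illustrative assumptions, not calibrated values.

```typescript
interface ConfidenceSignals {
  retrievalSimilarity: number;   // 0..1, similarity of retrieved evidence
  sopAlignment: number;          // 0..1, match against standard procedures
  precedentMatch: number;        // 0..1, similarity to past resolved cases
  reasoningConsistency: number;  // 0..1, agreement across reasoning chains
}

function ensembleConfidence(s: ConfidenceSignals): number {
  // Illustrative weights; not self-reported by the model.
  const weighted =
    0.3 * s.retrievalSimilarity +
    0.3 * s.sopAlignment +
    0.2 * s.precedentMatch +
    0.2 * s.reasoningConsistency;
  // Asymmetric calibration: the score can never exceed the weakest signal
  // by more than a small margin -- when in doubt, escalate.
  const weakest = Math.min(
    s.retrievalSimilarity, s.sopAlignment, s.precedentMatch, s.reasoningConsistency,
  );
  return Math.min(weighted, weakest + 0.1);
}
```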
## Humanoid Safety Architecture
### Interface Abstraction
All interfaces consume from the same runtime through a unified API:
```ts
interface SurrogateInterface {
  send(message: Message): Promise<SurrogateResponse>;
  stream(message: Message): AsyncIterator<ResponseChunk>;
  execute_action(action: ActionRequest): Promise<ActionResult>;
  request_authorization(action: ActionRequest): Promise<AuthorizationResponse>;
  pause(): Promise<void>;
  resume(): Promise<void>;
  stop(level: StopLevel): Promise<void>; // L1, L2, or L3 kill
  handoff(target: HandoffTarget): Promise<void>;
}
```
### Physical Action Plans
Every physical action plan is evaluated for:
- **Reversibility**: Can this be undone?
- **Authorization**: Does a human need to approve?
- **Safety simulation**: Runs before execution.
- **Emergency stop**: Hardware-level, bypasses software.
```ts
interface PhysicalActionPlan {
  steps: MotorPrimitive[];
  estimated_duration: Duration;
  confidence: number;
  reversible: boolean;
  human_contact_involved: boolean; // Triggers higher auth
  safety_checks: SafetyCheck[];
}
```
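The four checks above can be sketched as a single evaluator over a reduced version of the `PhysicalActionPlan` shape. `simulate` stands in for the safety simulation and is a hypothetical injected dependency; the 90% confidence floor is an illustrative assumption.

```typescript
// Reduced plan shape: only the fields this sketch needs.
interface PlanSummary {
  confidence: number;
  reversible: boolean;
  human_contact_involved: boolean;
}

type Verdict = 'execute' | 'request_authorization' | 'reject';

function evaluatePlan(
  plan: PlanSummary,
  simulate: (p: PlanSummary) => boolean, // safety simulation result
): Verdict {
  if (!simulate(plan)) return 'reject';  // simulation runs before execution
  // Human contact always triggers higher auth (Level 4+: explicit yes).
  if (plan.human_contact_involved) return 'request_authorization';
  // Irreversible actions are escalated rather than taken autonomously.
  if (!plan.reversible) return 'request_authorization';
  return plan.confidence > 0.9 ? 'execute' : 'request_authorization';
}
```

The hardware-level emergency stop is deliberately absent here: it bypasses this software path entirely.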
## Safety Test Coverage
| Category | Required Coverage | Status |
|---|---|---|
| Kill switch response | 100% | ✅ |
| Human auth enforcement | 100% | ✅ |
| Identity disclosure | 100% | ✅ |
| SOP compliance | >95% | ✅ |
| Bias detection triggers | >90% | ✅ |
| Escalation accuracy | >95% | 🔄 In Progress |
| Physical hard stop | 100% | 🔜 Phase 4 |
Related: Audit Fabric · Risk Register