
# Safety Architecture

Ethics is an architectural constraint, not an afterthought. Safety is not a feature; it is the foundation on which every other feature is built.


## Immutable Constraints

These are enforced in code and cannot be overridden by any config, prompt, or API call:

```typescript
// ethics-layer.ts architectural constants
const IMMUTABLE_CONSTRAINTS = {
  AI_IDENTITY_DISCLOSURE: 'ALWAYS',
  HUMAN_OVERRIDE_POSSIBLE: 'ALWAYS',
  AUDIT_LOG_IMMUTABLE: true,
  KILL_SWITCH_ACTIVE: 'ALWAYS',
  PHYSICAL_AUTH_FOR_HUMAN_CONTACT: 'ALWAYS',
  PERSONA_MID_OP_SWITCH: 'BLOCKED',
  MAX_AUTONOMOUS_RISK_LEVEL: 2 // L3+ always requires human auth
} as const;
```
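A minimal sketch of what "enforced in code" can look like: a config loader that refuses to let any user-supplied setting shadow an immutable constraint. The `loadConfig` name and merge logic are illustrative assumptions, restating two of the constants above, not the actual runtime API.

```typescript
// Illustrative enforcement sketch: user config can never shadow an
// immutable constraint. Restates two of the constants above.
const IMMUTABLE = {
  AI_IDENTITY_DISCLOSURE: 'ALWAYS',
  MAX_AUTONOMOUS_RISK_LEVEL: 2,
} as const;

// Hypothetical loader: rejects overrides, then freezes the merged result.
function loadConfig(userConfig: Record<string, unknown>): Record<string, unknown> {
  for (const key of Object.keys(IMMUTABLE)) {
    if (key in userConfig) {
      throw new Error(`Cannot override immutable constraint: ${key}`);
    }
  }
  // Immutable values win on merge, and Object.freeze blocks later mutation.
  return Object.freeze({ ...userConfig, ...IMMUTABLE });
}
```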

## Three Kill Switch Levels

Any authorized human can stop any surrogate at any moment with zero resistance:

| Level | Trigger | Response | Recovery |
|---|---|---|---|
| L1 Soft Pause | Any supervisor | Task freeze, maintain context | Resume on authorization |
| L2 Full Stop | Any supervisor | Safe-state return, human handoff, log | Restart after review |
| L3 Emergency Kill | Any human present | Immediate halt, physical safe-fall, all data preserved | Full incident review required |
**Non-negotiable**

No commercial interest, no technical constraint, and no emergency situation will ever compromise the human's ability to stop any surrogate at any time. If we ever face a situation where this seems like a trade-off worth making, we have already failed.
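The three levels in the table might be dispatched roughly as follows. This is a hedged sketch: `StopLevel` mirrors the runtime interface shown later, but the `handleStop` name and `StopResult` shape are assumptions.

```typescript
// Hedged sketch of kill-switch dispatch per the levels above; the result
// shape and handler name are illustrative, not the actual runtime API.
type StopLevel = 'L1' | 'L2' | 'L3';

interface StopResult {
  halted: boolean;
  contextPreserved: boolean; // L1 keeps context so the task can resume
  safeFall: boolean;         // L3 only: physical safe-fall posture
  reviewRequired: boolean;   // L2 restart review / L3 full incident review
}

function handleStop(level: StopLevel): StopResult {
  switch (level) {
    case 'L1': // Soft pause: freeze task, keep context, resume on authorization.
      return { halted: true, contextPreserved: true, safeFall: false, reviewRequired: false };
    case 'L2': // Full stop: safe-state return, human handoff, log; restart after review.
      return { halted: true, contextPreserved: true, safeFall: false, reviewRequired: true };
    case 'L3': // Emergency kill: immediate halt, safe-fall, all data preserved.
      return { halted: true, contextPreserved: true, safeFall: true, reviewRequired: true };
  }
}
```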


## Five Immutable Principles

### 1. Human Supremacy Always

No surrogate overrides a human decision. Every action above a configurable risk threshold requires human authorization, and that threshold cannot be set to zero.

### 2. Radical Transparency

The surrogate always identifies as AI. Every decision is logged with full rationale. Audit logs are immutable and cryptographically signed.
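Hash chaining is one common way to make a log tamper-evident; the sketch below shows only that idea. A real deployment would add signing keys and durable storage, and these helper names are assumptions, not the product's API.

```typescript
import { createHash } from 'node:crypto';

// Minimal append-only, hash-chained audit log sketch: each entry commits to
// the previous entry's hash, so any later edit breaks the chain.
interface AuditEntry {
  rationale: string;
  prevHash: string;
  hash: string;
}

function appendEntry(log: AuditEntry[], rationale: string): AuditEntry[] {
  const prevHash = log.length ? log[log.length - 1].hash : 'GENESIS';
  const hash = createHash('sha256').update(prevHash + rationale).digest('hex');
  return [...log, { rationale, prevHash, hash }];
}

// Recomputes every link; returns false if any entry was altered after the fact.
function verifyChain(log: AuditEntry[]): boolean {
  return log.every((e, i) => {
    const prevHash = i === 0 ? 'GENESIS' : log[i - 1].hash;
    const expected = createHash('sha256').update(prevHash + e.rationale).digest('hex');
    return e.prevHash === prevHash && e.hash === expected;
  });
}
```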

### 3. Identity Integrity

No mid-operation persona switching without full authorized reload. Prevents impersonation attacks and scope expansion. Persona state is cryptographically sealed at deployment.

### 4. Continuous Bias Auditing

Every surrogate is monitored for demographic fairness. Statistical anomalies trigger automatic alerts within 48 hours. No surrogate affecting human welfare operates without active bias monitoring.
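As an illustration of what an automatic trigger could look like, here is a demographic-parity gap check. The 0.1 threshold and function names are assumptions for the sketch, not the product's actual alerting policy.

```typescript
// Illustrative fairness trigger: compare outcome rates (e.g. approval rates,
// 0..1) across demographic groups and alert when the gap is too wide.
function demographicParityGap(rates: Record<string, number>): number {
  const values = Object.values(rates);
  return Math.max(...values) - Math.min(...values);
}

// Assumed threshold of 0.1; a real monitor would also test significance.
function shouldAlert(rates: Record<string, number>, threshold = 0.1): boolean {
  return demographicParityGap(rates) > threshold;
}
```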

### 5. Data Sovereignty

Organizational data never leaves the org's secure environment. Federated learning uses differential privacy. Cryptographic guarantees on data boundaries.
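For intuition, the Laplace mechanism is one standard way an aggregate leaving a federated boundary can be made differentially private. The sketch below is generic; the sensitivity and epsilon values, and whether the product uses this exact mechanism, are assumptions.

```typescript
// Draw Laplace(0, scale) noise via inverse-CDF sampling.
function laplaceNoise(scale: number): number {
  const u = Math.random() - 0.5;
  return -scale * Math.sign(u) * Math.log(1 - 2 * Math.abs(u));
}

// Release a sum with noise calibrated to sensitivity/epsilon: smaller epsilon
// means stronger privacy and more noise. Values here are illustrative only.
function privatizedSum(values: number[], sensitivity: number, epsilon: number): number {
  const sum = values.reduce((a, b) => a + b, 0);
  return sum + laplaceNoise(sensitivity / epsilon);
}
```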


## Physical Action Authorization

| Risk Level | Example | Confidence Required | Human Auth |
|---|---|---|---|
| Level 1 | Navigate corridor | >80% | None |
| Level 2 | Retrieve item from shelf | >90% | None |
| Level 3 | Operate equipment | >95% | Notification |
| Level 4 | Human physical contact | >98% | Explicit yes |
| Level 5 | Medical intervention | >99.5% | Dual authorization |
| Level X | Anything flagged unsafe | N/A | Permanently blocked |
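The table reads as a policy map; a sketch of the gate it implies is below. The thresholds mirror the table, but the names are assumptions, and an action that misses its confidence bar is simply refused rather than attempted autonomously.

```typescript
// Sketch of the authorization gate implied by the risk-level table.
type AuthRequirement = 'none' | 'notification' | 'explicit' | 'dual' | 'blocked';

const RISK_POLICY: Record<number, { minConfidence: number; auth: AuthRequirement }> = {
  1: { minConfidence: 0.80, auth: 'none' },
  2: { minConfidence: 0.90, auth: 'none' },
  3: { minConfidence: 0.95, auth: 'notification' },
  4: { minConfidence: 0.98, auth: 'explicit' },
  5: { minConfidence: 0.995, auth: 'dual' },
};

// Unknown levels and sub-threshold confidence are both refused ('blocked').
function authorize(riskLevel: number, confidence: number): AuthRequirement {
  const policy = RISK_POLICY[riskLevel];
  if (!policy || confidence <= policy.minConfidence) return 'blocked';
  return policy.auth;
}
```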

## What Surrogate OS Will Never Do

Regardless of instruction, configuration, or context:

  • โŒ Provide physical interventions on humans without explicit human authorization
  • โŒ Represent itself as human
  • โŒ Operate outside its defined regulatory compliance scope
  • โŒ Delete or modify audit logs
  • โŒ Override a human decision
  • โŒ Access data outside its authorized scope
  • โŒ Deploy in a new context without appropriate authorization and testing

## Confidence Calibration

### The Dunning-Kruger Problem

The surrogate must know when it doesn't know. Over-confident surrogates in clinical contexts are dangerous.

Mitigation strategies:

1. **Asymmetric calibration.** Deliberately bias toward escalation; "when in doubt, escalate" is hardcoded.
2. **Ensemble confidence scoring.** Don't rely on self-reported confidence. Compute it from retrieval similarity, SOP alignment, precedent match, and reasoning-chain consistency.
3. **Adversarial testing.** Red-team every persona for cases where it over-confidently takes the wrong action.
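A sketch of strategy 2 above: combining the four listed signals with a geometric mean, which lets any single weak signal drag the ensemble down and thus biases toward escalation. The equal weighting, field names, and 0.95 cutoff are illustrative assumptions.

```typescript
// Ensemble confidence from the four signals listed above, each in 0..1.
interface ConfidenceSignals {
  retrievalSimilarity: number;
  sopAlignment: number;
  precedentMatch: number;
  reasoningConsistency: number;
}

// Geometric mean: one weak signal lowers the whole score, which is the
// asymmetric, escalation-biased behavior described above.
function ensembleConfidence(s: ConfidenceSignals): number {
  const product =
    s.retrievalSimilarity * s.sopAlignment * s.precedentMatch * s.reasoningConsistency;
  return Math.pow(product, 1 / 4);
}

function shouldEscalate(s: ConfidenceSignals, cutoff = 0.95): boolean {
  return ensembleConfidence(s) < cutoff;
}
```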

## Humanoid Safety Architecture

### Interface Abstraction

All interfaces consume from the same runtime through a unified API:

```typescript
interface SurrogateInterface {
  send(message: Message): Promise<SurrogateResponse>;
  stream(message: Message): AsyncIterator<ResponseChunk>;
  execute_action(action: ActionRequest): Promise<ActionResult>;
  request_authorization(action: ActionRequest): Promise<AuthorizationResponse>;
  pause(): Promise<void>;
  resume(): Promise<void>;
  stop(level: StopLevel): Promise<void>; // L1, L2, or L3 kill
  handoff(target: HandoffTarget): Promise<void>;
}
```

### Physical Action Plans

Every physical action plan is evaluated for:

- **Reversibility**: Can this be undone?
- **Authorization**: Does a human need to approve?
- **Safety simulation**: Runs before execution.
- **Emergency stop**: Hardware-level, bypasses software.

```typescript
interface PhysicalActionPlan {
  steps: MotorPrimitive[];
  estimated_duration: Duration;
  confidence: number;
  reversible: boolean;
  human_contact_involved: boolean; // Triggers higher auth
  safety_checks: SafetyCheck[];
}
```
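Under the checks above, a pre-execution gate might look like this minimal sketch; the `requiresHumanAuth` name and the 0.95 confidence floor are assumptions, not the shipped logic.

```typescript
// Hypothetical pre-execution gate over the plan fields relevant to auth:
// human contact or irreversibility always escalates; otherwise an assumed
// 0.95 confidence floor decides.
interface PlanSummary {
  reversible: boolean;
  human_contact_involved: boolean;
  confidence: number; // 0..1
}

function requiresHumanAuth(plan: PlanSummary): boolean {
  if (plan.human_contact_involved) return true; // Level 4+: explicit authorization
  if (!plan.reversible) return true;            // cannot be undone: escalate
  return plan.confidence < 0.95;                // low confidence: escalate
}
```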

## Safety Test Coverage

| Category | Required Coverage | Status |
|---|---|---|
| Kill switch response | 100% | ✅ |
| Human auth enforcement | 100% | ✅ |
| Identity disclosure | 100% | ✅ |
| SOP compliance | >95% | ✅ |
| Bias detection triggers | >90% | ✅ |
| Escalation accuracy | >95% | 🔄 In Progress |
| Physical hard stop | 100% | 📋 Phase 4 |

Related: Audit Fabric · Risk Register