Skip to main content

Introducing Surrogate OS: The Open-Source Platform That Turns Job Descriptions Into AI Employees

· 13 min read
Creator of Surrogate OS

Today we are open-sourcing Surrogate OS, an AI identity engine that transforms job descriptions into fully operational AI professionals — complete with structured personas, standard operating procedures, regulatory compliance, and institutional memory. It is available now on GitHub under the MIT license.

This is not another chatbot framework. Surrogate OS produces AI surrogates that carry professional identities, follow auditable decision-making procedures, comply with real-world regulations, and learn from their own operational history. Think of it as the operating system layer between foundation models and the actual work that regulated industries need AI to perform.

The Problem Nobody Wants to Talk About

The AI agent landscape in 2026 is simultaneously thrilling and deeply broken.

On one side, foundation models have reached a level of capability that makes autonomous task completion genuinely viable. Every week brings a new framework promising to turn GPT or Claude into an agent that can handle customer support, triage medical records, process insurance claims, or manage financial portfolios.

On the other side, the industries that would benefit most from AI agents — healthcare, finance, legal, government — are the ones that cannot deploy them. The reason is not technical capability. The reason is trust, governance, and regulatory compliance.

Consider the state of affairs. A hospital system evaluating an AI triage assistant faces HIPAA requirements that demand a complete audit trail of every clinical decision. A European bank deploying an AI financial advisor must satisfy both GDPR data handling requirements and the incoming EU AI Act obligations for high-risk AI systems. A law firm exploring AI paralegals needs to demonstrate that the system follows established procedures and does not introduce bias into case assessments.

None of the mainstream agent frameworks address these requirements as first-class concerns. Compliance is treated as something to bolt on later — a Phase 2 problem. But anyone who has worked in regulated industries knows that compliance cannot be Phase 2. If the foundation is not auditable, no amount of wrapper code will make a regulator comfortable.

The gap between a compelling AI demo and a production deployment in a regulated environment is enormous. We built Surrogate OS to close that gap.

What Surrogate OS Actually Does

The core premise is deceptively simple: you define a role, and Surrogate OS synthesizes a complete AI professional to fill it.

But "define a role" means something very specific here. A surrogate is not a prompt template with a personality description stapled on. It is a structured professional identity composed of several interconnected layers:

Identity and Persona — Every surrogate has a defined professional background, communication style, areas of expertise, and behavioral boundaries. This is not flavor text. The persona layer constrains the surrogate's behavior in measurable ways, ensuring it operates within its defined scope of practice.

Standard Operating Procedures (SOPs) — This is the core of what makes a surrogate different from a chatbot. Each surrogate comes with a directed acyclic graph of operational procedures. These are not simple instruction lists. They are structured decision trees with defined inputs, outputs, validation criteria, escalation triggers, and compliance checkpoints.

Regulatory Compliance — Every SOP node is tagged with applicable regulatory frameworks. The system enforces compliance at the procedure level, not as an afterthought.

Institutional Memory — Surrogates maintain both short-term and long-term memory, with mechanisms for promoting operational learnings into persistent institutional knowledge.

Interface Adaptability — The same surrogate identity can be deployed across text chat, voice, avatar, and eventually humanoid robotic interfaces without redefining its core logic.

A Concrete Example: The Senior ER Nurse

To make this tangible, walk through what happens when you create a Senior ER Nurse surrogate.

You start with a role definition that specifies the clinical scope: emergency triage, patient assessment, care coordination, medication verification. The system synthesizes a professional identity with appropriate clinical communication patterns and establishes boundaries — this surrogate will escalate to a physician for diagnostic decisions and will not prescribe medications.

The SOP engine then generates a 9-node procedure graph for the triage workflow:

  1. Patient Intake — Collect presenting complaint, vital signs, medical history. Validation: all required fields populated, vital signs within instrument ranges.
  2. Acuity Assessment — Apply the Emergency Severity Index. Decision branches for each level.
  3. Allergy and Medication Check — Cross-reference reported allergies against any pending medications. Compliance checkpoint: HIPAA audit log entry.
  4. Clinical Priority Routing — Based on acuity level, route to appropriate care pathway. Escalation trigger: any ESI Level 1 or 2 immediately flags attending physician.
  5. Care Coordination — Manage handoffs between departments. Each handoff is logged with timestamp, participants, and clinical summary.
  6. Documentation — Generate structured clinical notes. Compliance checkpoint: ensure all required fields for CMS billing compliance.
  7. Follow-up Scheduling — For discharge cases, generate follow-up care plan.
  8. Shift Debrief — At end of operational period, the surrogate generates a self-assessment of its performance, flagging any decision points where confidence was low.
  9. SOP Improvement Proposal — Based on accumulated operational data, propose specific procedure modifications for human review.

Every node in this graph has defined inputs, outputs, and validation criteria. Every transition is logged. Every compliance-relevant action generates an audit trail entry with a cryptographic signature.

This is not a theoretical architecture. It is implemented, tested, and running.

The Compliance Story: Why This Matters More Than Features

We made an early decision that shaped everything about Surrogate OS: compliance is not a feature. It is the foundation.

The platform currently supports six regulatory frameworks:

  • HIPAA — Health Insurance Portability and Accountability Act. Required for any AI system touching protected health information in the United States.
  • GDPR — General Data Protection Regulation. Required for any system processing personal data of EU residents.
  • EU AI Act — The European Union's comprehensive AI regulation, with high-risk AI system requirements taking effect in 2026.
  • SOC 2 — Service Organization Control standards for data security, availability, and confidentiality.
  • FDA SaMD — Software as a Medical Device guidelines, relevant when AI systems contribute to clinical decision-making.
  • FINRA — Financial Industry Regulatory Authority requirements for AI systems in financial services.

Each framework imposes specific requirements on how AI systems make decisions, store data, handle personal information, and maintain audit trails. Surrogate OS implements these requirements at the architectural level.

Cryptographic SOP Signing

Every standard operating procedure in Surrogate OS is cryptographically signed using Ed25519 signatures. When an SOP is created or modified, the system generates a signature that covers the complete procedure definition — every node, every transition, every validation criterion.

This creates an immutable chain of custody for operational procedures. If a regulator asks "what procedure was this AI following when it made that decision at 3:47 AM on Tuesday?", we can provide the exact SOP version, prove it has not been tampered with, and show the complete decision trace through that procedure.

The signing infrastructure uses hash chaining, where each SOP revision includes the hash of the previous version. This produces a tamper-evident history of every procedural change, who authorized it, and when it took effect.

Bias Auditing

The compliance layer includes a bias detection system that monitors surrogate behavior across demographic dimensions. The system tracks decision distributions and flags statistical anomalies that might indicate bias in triage priority, response quality, escalation frequency, or any other measurable operational dimension.

This is not optional. For high-risk AI systems under the EU AI Act, bias monitoring is a legal requirement. We built it in from day one rather than trying to retrofit it later.

Why Open Source Matters for Compliance

There is a philosophical reason we chose to open-source Surrogate OS, and it goes beyond the usual arguments about community and collaboration.

In regulated industries, trust requires transparency. When a hospital deploys an AI triage system, the compliance team needs to be able to audit not just the decisions the system makes, but the code that governs how it makes them. Black-box AI systems face an inherently harder path to regulatory approval than systems whose decision-making logic can be inspected, tested, and verified.

Open source is not just a distribution model for Surrogate OS. It is a compliance strategy.

The Intelligence Layer: Surrogates That Learn

A static set of procedures is useful but limited. Real professionals learn from experience. Surrogate OS includes an intelligence layer designed to give surrogates the same capability — with appropriate guardrails.

Institutional Memory

Every surrogate maintains two tiers of memory:

Short-Term Memory (STM) captures the immediate operational context — the current interaction, recent decisions, active tasks. This functions like a professional's working memory during a shift.

Long-Term Memory (LTM) stores persistent institutional knowledge — patterns observed across many interactions, procedural learnings, domain-specific insights. The system includes a promotion mechanism that identifies high-value short-term observations and elevates them to long-term storage after validation.

The key design decision is that memory promotion is not automatic. The system proposes promotions based on frequency, impact, and novelty. A human operator reviews and approves before any learning becomes part of the surrogate's permanent knowledge base.

Shift Debriefs

At the end of each operational period, a surrogate generates a structured self-assessment. The LLM analyzes its own performance across that period, identifying:

  • Decisions where confidence was below threshold
  • Interactions that required escalation
  • Patterns in user requests that existing SOPs do not cover well
  • Potential procedure improvements based on operational data

These debriefs feed into the SOP improvement pipeline, creating a structured feedback loop between operational experience and procedural refinement.

SOP Self-Improvement With Human Oversight

When accumulated operational data suggests a procedure modification, the system generates a specific, testable proposal. This proposal includes the current procedure, the suggested change, the evidence supporting the change, and the expected impact.

Critically, no SOP modification takes effect without human approval. The system proposes; humans decide. This preserves the auditability and accountability that regulated environments require while still allowing the system to surface genuine operational improvements.

Federated Learning With Differential Privacy

For organizations running multiple surrogates across different facilities or departments, Surrogate OS supports federated learning with differential privacy guarantees. This means surrogates can benefit from collective operational experience without exposing the underlying data from any individual facility.

A hospital network running ER triage surrogates at twelve locations can aggregate learnings about triage patterns without any individual patient data leaving its home facility. The differential privacy layer adds calibrated noise to the aggregated updates, providing mathematical guarantees about individual data protection.

The Interface Vision: One Identity, Many Forms

One of the most consequential design decisions in Surrogate OS is the separation between a surrogate's identity and its interface.

The same Senior ER Nurse surrogate — with the same persona, SOPs, compliance layer, and institutional memory — can be deployed as a text-based chat interface for patient intake, a voice interface for phone triage, an avatar for telemedicine interactions, or eventually a physical presence through humanoid robotic systems.

The interface layer handles modality-specific concerns: speech-to-text conversion, natural language generation tuned for spoken delivery, gesture and expression mapping for avatars. But the decision-making logic, compliance enforcement, and professional identity remain constant across all interfaces.

Fleet Management

Real-world deployments involve not one surrogate but many. A hospital might run triage surrogates in the ER, patient education surrogates in outpatient clinics, and administrative surrogates handling insurance coordination.

Surrogate OS includes fleet management capabilities for monitoring, updating, and coordinating multiple surrogates. Operators can push SOP updates across an entire fleet, monitor compliance metrics in aggregate, and manage handoff protocols between surrogates with different specializations.

Human-AI Handoff Protocol

Not every situation can be handled by a surrogate. The system includes a structured handoff protocol for transferring operational context from a surrogate to a human professional. The handoff package includes a complete interaction summary, relevant clinical or operational data, the surrogate's assessment and confidence level, and any compliance-relevant flags.

This is designed to minimize the information loss that typically occurs when an AI system "escalates to a human." The receiving professional gets a structured briefing, not a raw chat transcript.

Technical Foundation

Surrogate OS is not a prototype. It is a production-grade platform built with the engineering rigor that regulated industries demand.

Test Coverage — The platform includes 571 tests spanning unit, integration, and end-to-end scenarios. Test coverage includes not just happy paths but failure modes, edge cases, and compliance-specific validation scenarios.

API Surface — Over 130 REST endpoints covering surrogate lifecycle management, SOP operations, compliance reporting, memory management, fleet coordination, and administrative functions.

Technology Stack — TypeScript monorepo using Turborepo for build orchestration. The backend runs on Node.js with a PostgreSQL database. The architecture follows a loosely coupled, event-driven design with dependency injection throughout.

Deployment — Docker Compose configuration for single-command deployment. The system is designed for both cloud and on-premises deployment — an important consideration for healthcare organizations with data residency requirements.

Multi-Tenant Architecture — Built from the ground up for multi-tenancy, with tenant isolation at the database, API, and surrogate level. Organizations can run isolated surrogate fleets with independent compliance configurations.

Observability — Full observability stack with structured logging, distributed tracing, and metrics collection. Every decision a surrogate makes is traceable from API request through SOP execution to the final response.

What Comes Next

Surrogate OS is ready for early adopters, contributors, and organizations willing to explore what compliance-first AI agents look like in practice.

The near-term roadmap includes:

  • Production Hardening — Enhanced rate limiting, circuit breakers, and resilience patterns for high-availability deployments.
  • Additional Regulatory Frameworks — Expanding beyond the initial six frameworks to cover industry-specific regulations across more jurisdictions.
  • Voice Interface Reference Implementation — A complete reference deployment of voice-based surrogate interaction.
  • Partner Integrations — Connectors for major EHR systems, financial platforms, and enterprise communication tools.
  • Compliance Certification Support — Tooling to help organizations generate the documentation artifacts needed for regulatory certification processes.

We are particularly interested in collaborators from regulated industries — healthcare systems, financial institutions, legal organizations — who can provide domain expertise and real-world validation scenarios.

Get Involved

Surrogate OS is available now:

We welcome contributions across the entire stack — from core platform development to compliance framework implementations to documentation and testing. If you have domain expertise in healthcare, finance, or legal technology, we especially want to hear from you.

A Closing Thought

The conversation about AI in the workforce tends to oscillate between utopian promises and dystopian fears. We think both framings miss the point.

The question is not whether AI will take on professional roles. It already is. The question is whether those AI professionals will operate with the same standards of accountability, compliance, and governance that we expect from their human counterparts.

Surrogate OS is our answer: build the compliance in from the start, make the decision-making transparent and auditable, give organizations the tools to govern their AI workforce with the same rigor they apply to their human workforce, and open-source the entire thing so that trust can be verified rather than assumed.

The workforce of the future will be a blend of human and AI professionals. Surrogate OS exists to make sure the AI half of that equation is built on a foundation worthy of the trust it will be given.