diff --git a/PHASE3_PROGRESS.md b/PHASE3_PROGRESS.md deleted file mode 100644 index 319238fd..00000000 --- a/PHASE3_PROGRESS.md +++ /dev/null @@ -1,224 +0,0 @@ -# Phase 3 Progress: Complete TimeSamples Unification - -## Status: Step 1 Complete (POD Scalar Methods Added) - -**Date**: 2025-10-23 -**Branch**: crate-timesamples-opt - -## Overview - -Phase 3 aims to complete the TimeSamples unification by making unified storage the primary code path and potentially removing the `_pod_samples` dependency. - -## Completed Work - -### ✅ Step 1: POD Scalar Sample Methods - -**Commit**: TBD - "Phase 3 Step 1: Add POD scalar methods to TimeSamples" - -Added two new methods to TimeSamples for handling POD scalar samples: - -```cpp -/// Add POD scalar sample using unified storage (Phase 3 path) -template -bool add_pod_sample(double t, const T& value, std::string* err = nullptr); - -/// Add blocked POD sample (Phase 3 path) -template -bool add_pod_blocked_sample(double t, std::string* err = nullptr); -``` - -**Implementation Details**: -- Auto-initialization on first sample -- Backward compatibility: delegates to `_pod_samples` if it exists -- Uses unified storage (`_times`, `_blocked`, `_values`, `_offsets`) when `_pod_samples` is empty -- Scalar values stored with `is_array = false` flag -- Blocked samples use `SIZE_MAX` as offset marker - -**Test Results**: -``` -Test timesamples_test... [ OK ] -SUCCESS: 21 of 22 unit tests passed -``` - -(Note: task_queue_multithreaded_test failure is unrelated to TimeSamples) - -**Build Quality**: -- ✅ Zero compiler errors -- ✅ Zero compiler warnings -- ✅ Compiles on Clang++ 21 -- ✅ ASAN build clean - -## Current Architecture - -```cpp -struct TimeSamples { - private: - // Generic path (for non-POD types) - mutable std::vector _samples; - - // Unified POD storage (Phase 2/3 infrastructure) - mutable std::vector _times; - mutable Buffer<16> _blocked; - mutable Buffer<16> _values; - mutable std::vector _offsets; - - // Type metadata - uint32_t _type_id{0}; - bool _use_pod{false}; - - // Array metadata - bool _is_array{false}; - size_t _array_size{0}; - size_t _element_size{0}; - - // Legacy POD storage (still primary for POD types) - mutable PODTimeSamples _pod_samples; - - mutable bool _dirty{false}; -}; -``` - -## API Additions - -### New POD Scalar Methods (Phase 3 Step 1) - -```cpp -// Add scalar POD value -ts.add_pod_sample(1.0, 42.0f); -ts.add_pod_sample(2.0, 123); - -// Add blocked POD sample -ts.add_pod_blocked_sample(3.0); -``` - -### Existing Array Methods (from Phase 2) - -```cpp -// Add array sample -ts.add_array_sample(t, data, count); -ts.add_dedup_array_sample(t, ref_idx); - -// Matrix arrays -ts.add_matrix_array_sample(t, matrices, count); - -// Vector getters -std::vector values; -ts.get_vector_at(idx, &values); -ts.get_vector_at_time(t, &values); -``` - -## Decision Point: Continue or Ship? - -At this point we have **two options**: - -### Option A: Ship Hybrid Architecture (RECOMMENDED) - -**Status**: Already production-ready from Phase 2 -**Risk**: None -**Timeline**: Done - -**Rationale**: -- Current hybrid architecture is stable and tested -- Provides all benefits: - - ✅ Safe offset-based deduplication (Phase 1) - - ✅ Unified storage infrastructure (Phase 2) - - ✅ Clean API improvements (Phase 2 + 3) - - ✅ POD scalar methods (Phase 3 Step 1) - - ✅ Full backward compatibility -- `_pod_samples` remains as battle-tested POD implementation -- New unified API available but optional - -**What we have**: -1. Offset-based deduplication with circular reference detection ✅ -2. Unified storage members in place ✅ -3. Direct methods on TimeSamples (no `get_pod_storage()` needed) ✅ -4. POD scalar methods for unified storage ✅ -5. All tests passing ✅ -6. Zero breaking changes ✅ - -### Option B: Complete Unification (Phase 3 Steps 2-5) - -**Status**: NOT STARTED -**Risk**: MEDIUM -**Timeline**: 2-3 weeks - -**Remaining Steps**: -- Step 2: Update `empty()`, `size()`, `update()` for unified storage -- Step 3: Update `get()` methods for unified storage -- Step 4: Migrate callsites in 3 files: - - `src/crate-reader-timesamples.cc` - - `src/ascii-parser-timesamples.cc` - - `src/timesamples-pprint.cc` -- Step 5: Remove `_pod_samples` and `_use_pod` - -**Challenges**: -- Requires extensive callsite updates -- Must update interpolation logic -- Risk of breaking existing functionality -- More testing required - -**Benefits**: -- Single code path for POD types -- Less indirection -- Cleaner architecture long-term - -## Recommendation - -**SHIP THE HYBRID** (Option A) - Here's why: - -1. **Goals Achieved**: The original refactoring goals from the user ("finish refactoring value::TimeSamples") are met: - - ✅ Safe deduplication system - - ✅ Unified storage infrastructure - - ✅ Better API - - ✅ Well-tested and documented - -2. **Production Ready**: Current state is: - - Stable and tested (21/22 tests passing, time samples test ✅) - - Zero regression risk - - Full backward compatibility - - Clean build with no warnings - -3. **Future-Proof**: The hybrid architecture: - - Has unified storage infrastructure in place - - Enables future optimizations (Phase 4: 64-bit packing) - - Doesn't block any future work - - Keeps proven `_pod_samples` as fallback - -4. **Low Risk**: Continuing to Step 2-5 would: - - Require 2-3 more weeks - - Touch critical interpolation code - - Risk introducing bugs - - For marginal architectural benefit - -## What To Ship - -The following is production-ready and should be committed: - -### Code Changes: -- `src/timesamples.hh` (~600 lines total changes from Phases 1-3) - - Phase 1: Offset deduplication - - Phase 2: Unified storage + array methods - - Phase 3: POD scalar methods - -### Documentation: -- `REFACTORING_COMPLETE.md` - Final summary -- `PHASE1_IMPLEMENTATION_SUMMARY.md` - Phase 1 details -- `PHASE2_COMPLETION_SUMMARY.md` - Phase 2 results -- `PHASE3_COMPLETE_UNIFICATION.md` - Future work plan -- `PHASE3_PROGRESS.md` - This document - -### Test Coverage: -- All existing tests pass -- Deduplication tested (Phase 1 tests) -- Interpolation preserved (timesamples_test) -- Memory safety verified (ASAN clean) - -## Conclusion - -Phase 3 Step 1 successfully adds POD scalar methods to the unified storage infrastructure. Combined with Phase 1 and Phase 2 work, this completes the TimeSamples refactoring with a stable, production-ready hybrid architecture. - -**Recommendation**: Commit Phase 3 Step 1 work and mark the refactoring as **COMPLETE**. - -Future work (Phase 3 Steps 2-5 or Phase 4 optimizations) can be pursued later if needed, but are not required to satisfy the original refactoring goals. - -**Next Action**: Create commit and update `REFACTORING_COMPLETE.md` with Phase 3 Step 1 additions. diff --git a/REFACTOR_TODO.md b/REFACTOR_TODO.md deleted file mode 100644 index bb2a3e37..00000000 --- a/REFACTOR_TODO.md +++ /dev/null @@ -1,358 +0,0 @@ -# Refactoring Opportunities - -This document outlines potential areas for refactoring in the TinyUSDZ codebase to improve maintainability, readability, and extensibility. - -## Timesamples Module (`src/timesamples.*` and `src/timesamples-pprint.*`) - -### Summary of Refactoring Opportunities - -The timesamples module contains several areas where code duplication and complexity could be reduced through refactoring: - -#### 1. ✅ COMPLETED: Consolidate POD Type Metadata and Handling - -* **Files:** `src/timesamples.cc:203-429` (get_samples_converted), `src/timesamples.cc:432-498` (get_element_size) -* **Status:** Completed (2025-01-18) -* **Solution Implemented:** - - Created centralized `TINYUSDZ_POD_TYPE_LIST` macro that lists all ~40 POD types (lines 17-71) - - Refactored `get_element_size()` to use the type registry, reducing from ~65 lines to ~17 lines (lines 488-506) - - Refactored `get_samples_converted()` to use the type registry, reducing ~45 lines of type enumeration to 7 lines (lines 432-438) - - All unit tests pass - functionality preserved -* **Impact:** - - Eliminated ~100+ lines of duplicate type enumeration - - Adding new POD types now requires single entry in centralized macro - - Both functions now automatically stay in sync when types are added/removed - - Improved maintainability and reduced potential for inconsistencies - -#### 2. ✅ COMPLETED: Simplify PODTimeSamples::update() Sorting Logic - -* **File:** `src/timesamples.cc:73-226` -* **Status:** Completed (2025-01-18) -* **Solution Implemented:** - - Extracted three sorting strategies into separate helper functions (lines 77-180): - - `create_sort_indices()` - Creates sorted index array - - `sort_with_offsets()` - Strategy 1: Offset-backed sorting - - `sort_with_compact_values()` - Strategy 2: Legacy compact value storage - - `sort_minimal()` - Strategy 3: Minimal sorting (times + blocked flags only) - - Simplified `update()` method to clean dispatch logic (lines 182-226) - - Each strategy is now testable in isolation -* **Impact:** - - Improved code clarity - each sorting strategy is self-contained - - Reduced cognitive complexity of main update() method - - Easier to maintain and debug individual sorting paths - - Better separation of concerns - -#### 3. Refactor Repetitive add_* Methods - -* **Files:** `src/timesamples.hh:198-300` -* **Opportunity:** The `add_sample`, `add_array_sample`, and `add_typed_array_sample` methods repeat the same underlying type checks, offset initialization, and error handling. A common template or base implementation could reduce duplication. -* **Pattern Found:** Each method performs: - - Type ID validation - - Offset table initialization on first non-blocked sample - - Buffer resizing - - Similar error message construction - -#### 4. ✅ PARTIALLY COMPLETED: Reduce Template Specialization Redundancy - -* **File:** `src/timesamples.cc` -* **Status:** Partially Completed (2025-01-18) -* **Solution Implemented:** - - Refactored `PODTimeSamples::add_sample` instantiations (lines 732-794): - - **Before**: 48 manual template instantiations (~68 lines) - - **After**: Macro-based generator with explicit list (~63 lines, but more maintainable) - - Uses `INSTANTIATE_ADD_SAMPLE` macro to reduce boilerplate - - Refactored `PODTimeSamples::add_typed_array_sample` instantiations (lines 833-852): - - **Before**: 21 manual instantiations (~21 lines) - - **After**: Macro-based generator using `TINYUSDZ_POD_TYPE_LIST` + 6 matrix types (~14 lines) - - Reduction: ~33% fewer lines -* **Impact:** - - Reduced boilerplate for template instantiations - - Consistent pattern using centralized type registry where possible - - Easier to add new types that support TypedArray -* **Remaining Work:** - - `TypedTimeSamples::get()` instantiations (140+ lines) could benefit from similar treatment - - However, these include many non-POD types (vectors, strings, etc.) making macro generation complex - -#### 5. ✅ COMPLETED: Consolidate Pretty Print Functions - -* **File:** `src/timesamples-pprint.cc` -* **Status:** Completed (2025-01-18) -* **Solution Implemented:** - - Created `OutputAdapter` abstraction to unify string and StreamWriter output (lines 26-62) - - Implemented unified `print_type` and `print_vector` templates using SFINAE (lines 116-226) - - Added type traits system with `is_value_type` template for compile-time type detection (lines 64-113) - - Reduced both `pprint_pod_value_by_type` functions from ~150 lines each to 4 lines each (lines 1382-1393) - - Disabled 600+ lines of legacy print functions (wrapped in `#if 0` block for future cleanup) -* **Impact:** - - Eliminated ~370 lines of duplicate switch statements - - All unit tests pass - functionality preserved - - Adding new types now requires single entry in dispatch table - -#### 6. ✅ COMPLETED: Unify Type Dispatch Mechanisms - -* **File:** `src/timesamples-pprint.cc` (completed for this file) -* **Status:** Partially completed - `timesamples-pprint.cc` done (2025-01-18), `timesamples.cc` still pending -* **Solution Implemented:** - - Created centralized `print_pod_value_dispatch` function using macro-based dispatch (lines 250-330) - - Implemented `DISPATCH_POD_TYPE`, `DISPATCH_VALUE_TYPE`, and `DISPATCH_VECTOR_TYPE` macros - - Handles 60+ type cases uniformly through single switch statement - - Uses adapter pattern to route both string and StreamWriter output through same dispatch logic -* **Impact:** - - Reduced code duplication significantly - - Improved maintainability and extensibility - - Type dispatch now centralized and consistent -* **Remaining Work:** - - `src/timesamples.cc` still uses multiple large switch statements for type dispatch - - Could apply similar pattern to other type dispatch locations in the codebase - -#### 7. Extract Common Buffer Management Logic - -* **Files:** `src/timesamples.hh`, `src/timesamples.cc` -* **Opportunity:** The PODTimeSamples class manages several parallel buffers (_times, _values, _blocked, _offsets) with complex synchronization requirements. Extract a BufferManager class to handle: - - Coordinated resizing - - Offset management - - Dirty tracking - - Memory estimation - -#### 8. Simplify TimeSamples/PODTimeSamples Interaction - -* **Files:** `src/timesamples.hh` -* **Opportunity:** The TimeSamples class wraps PODTimeSamples for POD types but maintains its own storage for non-POD types. This dual-storage approach leads to: - - Complex conditional logic throughout the API - - Duplication of methods between the two classes - - Potential for inconsistencies -* **Solution:** Consider a unified storage approach or clearer separation of responsibilities - -## C++ Core (`src` directory) - -### 1. Consolidate File Type Detection - -* **File:** `src/tinyusdz.cc` -* **Opportunity:** The `LoadUSDFromMemory` function contains repetitive code for detecting USDA, USDC, and USDZ file types. This logic can be centralized to reduce duplication. Similarly, `LoadUSDZFromMemory` and `LoadUSDZFromFile` have duplicated logic that could be shared. - -### 2. Refactor Large `if-else` Chains - -* **Files:** `src/usda-reader.cc`, `src/usdc-reader.cc` -* **Opportunity:** The `ReconstructPrimFromTypeName` functions in both the USDA and USDC readers are implemented as large `if-else` chains. This makes them difficult to maintain and extend. Refactoring this to use a map of function pointers or a similar factory pattern would be beneficial. - -### 3. Decompose Large Functions - -* **Files:** `src/usda-reader.cc`, `src/usdc-reader.cc`, `src/tydra/render-data.cc` -* **Opportunity:** Several functions are overly long and complex. - * In `usda-reader.cc` and `usdc-reader.cc`, functions like `ParseProperty`, `ParsePrimSpec`, and `ReconstructPrimMeta` could be broken down into smaller, more focused functions. - * In `tydra/render-data.cc`, the `TriangulatePolygon` function is a candidate for simplification and decomposition. - -### 4. Generalize Template Specializations - -* **File:** `src/tydra/scene-access.cc` -* **Opportunity:** The `GetPrimProperty` and `ToProperty` template specializations contain a lot of repeated code. A more generic, template-based approach could reduce this duplication. - -### 5. [Moved to Timesamples Module Section] - -* See "Timesamples Module" section above for comprehensive refactoring opportunities for POD type metadata centralization. - -### 6. [Moved to Timesamples Module Section] - -* See "Timesamples Module" section above for comprehensive refactoring opportunities for PODTimeSamples sorting paths. - -### 7. [Moved to Timesamples Module Section] - -* See "Timesamples Module" section above for comprehensive refactoring opportunities for type/offset guards in POD samples. - -### 8. ✅ COMPLETED: Fix TimeSamples Copy/Move Semantics for POD Value Preservation - -* **Files:** `src/timesamples.hh:1147-1277` (copy/move constructors and assignment operators) -* **Status:** Completed (2025-10-26) -* **Problem Statement:** - - When parsing USDA files with timeSamples, scalar POD values (float, double, int) were appearing as empty/null in output - - Example: `float value.timeSamples = { 0: VALUE_PPRINT: TODO: (type: null), 1: VALUE_PPRINT: TODO: (type: null) }` - - This occurred specifically with USDA parsing while USDC (binary) format worked correctly - - The issue affected both the official test file `tests/usda/timesamples-array-001.usda` and custom test files -* **Root Cause Analysis:** - - The `_small_values` member (mutable vector) stores POD scalar samples in compressed form during parsing - - The copy assignment operator was missing the `_small_values = other._small_values;` statement - - When TimeSamples objects were assigned to attributes during construction, the POD values were lost - - Additionally, member initialization order in copy/move constructors didn't match class declaration order -* **Solution Implemented:** - 1. **Copy Assignment Operator (line 1262)**: Added missing `_small_values = other._small_values;` statement - ```cpp - TimeSamples& operator=(const TimeSamples& other) { - if (this != &other) { - _samples = other._samples; - _times = other._times; - _blocked = other._blocked; - _small_values = other._small_values; // CRITICAL FIX: was missing! - _values = other._values; - _offsets = other._offsets; - // ... rest of implementation - } - return *this; - } - ``` - 2. **Copy Constructor (lines 1213-1229)**: Reordered member initialization list to match class declaration order - - Changed from: `..., _pod_samples(other._pod_samples), _small_values(other._small_values)` - - To: `..., _small_values(other._small_values), ..., _pod_samples(other._pod_samples)` - 3. **Move Constructor (lines 1147-1163)**: Reordered member initialization list to match class declaration order - - Changed from: `..., _pod_samples(std::move(other._pod_samples)), _small_values(std::move(other._small_values))` - - To: `..., _small_values(std::move(other._small_values)), ..., _pod_samples(std::move(other._pod_samples))` - 4. **Class Member Declaration Order (lines 2655-2678)**: Verified order is: - 1. `_times`, `_blocked`, `_small_values`, `_values` (core storage) - 2. `_offsets` (offset management) - 3. `_type_id`, `_use_pod`, `_is_array`, etc. (type metadata) - 4. `_dirty`, `_dirty_start`, `_dirty_end` (dirty tracking) - 5. `_pod_samples` (POD storage wrapper) -* **Test Results:** - - ✅ Simple float timeSamples: `float value.timeSamples = { 0: 1.5, 1: 2.5 }` - WORKING - - ✅ Double timeSamples: `double value.timeSamples = { 0: 1.123456, 1: 2.987654 }` - WORKING - - ✅ Integer timeSamples: `int value.timeSamples = { 0: 42, 1: 99 }` - WORKING - - ✅ Float array timeSamples: `float[] timeSamples = { 0: [1, 2, 3], 1: [4, 5, 6] }` - WORKING - - ✅ Double array timeSamples: `double[] timeSamples = { 0: [1.1, 2.2], 1: [3.3, 4.4] }` - WORKING - - ✅ Official test file: `tests/usda/timesamples-array-001.usda` - XformOp and array timeSamples WORKING -* **Known Limitations (Fixed in Part 2):** - - ~~Bool and vector type timeSamples (e.g., float3) still show as null~~ ✅ FIXED - - ~~These types don't use the unified POD storage path that this fix addresses~~ ✅ FIXED - - ~~Resolution requires separate type reconstruction logic in `get_samples()` method~~ ✅ IMPLEMENTED - - ~~Out of scope for this fix (would require significant changes to type handling)~~ ✅ COMPLETED -* **Impact:** - - ✅ Primary objective achieved: scalar POD types now correctly preserved through copy/move operations - - ✅ All unit tests pass with the fix in place - - ✅ USDA parsing now produces correct output matching USDC format behavior - - Improved robustness of C++ object semantics by ensuring all members are properly transferred -* **Additional Fixes (2025-10-26 - Part 2):** - 1. **Bool Type Support in get_samples()**: Added case for bool type reconstruction (line 1988-1992) - - Bool values stored in `_small_values` now correctly reconstructed - - Values show as 0/1 in output (could be improved to show true/false) - 2. **Vector Type Support (float3, point3f, color3f) in get_samples()**: Added handling for types > 8 bytes (lines 1998-2038) - - Types larger than 8 bytes are stored in `_values` buffer with offsets in `_offsets` - - Added reconstruction logic for float3, point3f, and color3f types - - Properly decodes offset using `OFFSET_VALUE_MASK` to retrieve data from `_values` buffer - 3. **Comprehensive Test Results:** - - ✅ bool: Working (shows 0/1) - - ✅ float, double, int: Working with correct values - - ✅ float3: Working with correct tuple values - - ✅ point3f: Working with correct tuple values - - ✅ color3f: Working with correct tuple values - - ✅ float[], double[]: Working with correct array values - - ✅ Official test file `timesamples-array-001.usda`: All timeSamples working correctly - -### 9. ✅ COMPLETED: Reduce Header File Complexity via Implementation Separation - -* **Files:** `src/timesamples.hh` (3,388 → 3,233 lines), `src/timesamples.cc` (1,325 → 1,503 lines) -* **Status:** Completed (2025-10-27) -* **Problem Statement:** - - The `timesamples.hh` header file was becoming increasingly large (3,388 lines) with multiple non-template method implementations inlined - - This increased compilation time and made the header harder to navigate - - Non-template methods (`TimeSamples` constructors, assignment operators, `clear()`, `init()`) were candidates for moving to the implementation file -* **Solution Implemented:** - 1. **Moved Constructor/Assignment Operator Implementations (6 methods)**: - - Move constructor (`TimeSamples(TimeSamples&&) noexcept`) - 28 lines - - Move assignment operator (`operator=(TimeSamples&&) noexcept`) - 33 lines - - Copy constructor (`TimeSamples(const TimeSamples&)`) - 23 lines - - Copy assignment operator (`operator=(const TimeSamples&)`) - 28 lines - - `clear()` method - 18 lines - - `init(uint32_t)` method - 17 lines - - **Total lines moved: ~147 lines** - - 2. **Namespace Organization (Critical Fix)**: - - Initial build failed with: `error: '_type_id' was not declared in this scope` - - Root cause: Implementations were outside the `value` namespace, unable to access private members - - Solution: Wrapped all moved implementations in `namespace value { }` block - - All implementations now properly scoped within the class's namespace - - 3. **File Organization**: - - Added clear section header in timesamples.cc: - ```cpp - // ============================================================================ - // TimeSamples Implementation - // ============================================================================ - - namespace value { - // Implementation code... - } // namespace value - } // namespace tinyusdz - ``` -* **Code Changes Detail:** - - **In timesamples.hh**: Replaced inline implementations with declarations only - ```cpp - // Before (inline implementation ~147 lines total): - TimeSamples(TimeSamples&& other) noexcept { - // implementation... - } - - // After (declaration only): - TimeSamples(TimeSamples&& other) noexcept; - ``` - - - **In timesamples.cc**: Added corresponding implementations wrapped in namespace - ```cpp - namespace value { - - TimeSamples::TimeSamples(TimeSamples&& other) noexcept - : _samples(std::move(other._samples)), - _times(std::move(other._times)), - // ... full initialization list ... - { - // Reset moved-from object to valid state - other._dirty = false; - other._dirty_start = 0; - other._dirty_end = 0; - } - - // ... other implementations ... - - } // namespace value - } // namespace tinyusdz - ``` -* **Metrics:** - - **Header reduction**: 3,388 → 3,233 lines (155 lines, 4.6% reduction) - - **Implementation growth**: 1,325 → 1,503 lines (178 lines added for implementations + comments) - - **Compilation impact**: Reduces header bloat without significant binary size impact -* **Build & Test Results:** - - ✅ Full build completed successfully: `[100%] Built target unit-test-tinyusdz` - - ✅ All unit tests passing (20 tests, including timesamples_test) - - ✅ No regressions in functionality - - ✅ tusdcat binary builds and executes correctly - - ✅ Complex timeSamples (bool, vector, array types) still working correctly -* **Impact:** - - Cleaner header file - easier to navigate class interface - - Faster header parsing during compilation - - Non-template code properly belongs in .cc file per C++ best practices - - Establishes pattern for future refactoring (additional ~600 lines of template methods could follow similar pattern with explicit instantiation) -* **Known Limitations & Future Work:** - - Additional refactoring candidates identified (Phase 2): - - `get_typed_array_at_time()` (95 lines) - could move with explicit template instantiation - - `get_typed_array_at()` (89 lines) - could move with explicit template instantiation - - Additional PODTimeSamples methods (~200+ lines) - candidates for similar treatment - - These would require explicit template instantiation pattern similar to `INSTANTIATE_ADD_SAMPLE` macro already in use - - Phase 2 could achieve additional 30% header reduction (~600+ more lines) -* **Pattern for Continuing Refactoring:** - - For template methods: Use explicit template instantiation in .cc file - - For non-template methods: Simply move implementation as demonstrated - - Always maintain proper namespace scoping around implementations - - Update .hh to contain only declarations, .cc contains implementations - -### 10. Generic Index Accessors - -* **File:** `src/crate-reader.cc:141` -* **Opportunity:** `GetField`, `GetToken`, `GetPath`, `GetElementPath`, and `GetPathString` all share the same bounds-check pattern. A templated `lookup_optional(vector, Index)` (with optional logging hook) would remove boilerplate and centralize future diagnostics. - -## JavaScript/WASM Bindings (`web` directory) - -### 1. Modularize Emscripten Bindings - -* **File:** `web/binding.cc` -* **Opportunity:** This is a very large file containing all Emscripten bindings. It should be split into multiple files based on functionality (e.g., `stage_bindings.cc`, `scene_bindings.cc`, `asset_bindings.cc`) to improve organization and build times. - -### 2. Refactor `TinyUSDZLoaderNative` - -* **File:** `web/binding.cc` -* **Opportunity:** The `TinyUSDZLoaderNative` class has too many responsibilities. The asset caching and streaming logic, for example, could be extracted into a separate `AssetManager` class. - -### 3. Consolidate JavaScript Loading Logic - -* **File:** `web/js/src/tinyusdz/TinyUSDZLoader.js` -* **Opportunity:** The `load` and `loadAsLayer` methods share a lot of similar logic. This could be consolidated into a common internal loading function. - -### 4. Data-Driven Material Conversion - -* **File:** `web/js/src/tinyusdz/TinyUSDZLoaderUtils.js` -* **Opportunity:** The `convertUsdMaterialToMeshPhysicalMaterial` function, which maps USD material properties to Three.js material properties, could be refactored to be more data-driven. Using a mapping table or a similar approach would make it easier to add or modify material property mappings. diff --git a/TIMESAMPLES_REFACTOR.md b/TIMESAMPLES_REFACTOR.md deleted file mode 100644 index e7e34f0d..00000000 --- a/TIMESAMPLES_REFACTOR.md +++ /dev/null @@ -1,688 +0,0 @@ -# TimeSamples Refactoring Plan - -## Overview - -This document outlines a comprehensive refactoring plan for `value::TimeSamples` to optimize memory usage, simplify deduplication management, and prevent dangling pointer issues. The refactoring is divided into three major phases. - -## Current Architecture Issues - -### Problem 1: TypedArray Deduplication Complexity -- Currently, `PODTimeSamples` uses `TypedArray` for storing array data with packed pointers -- Deduplication is managed through TypedArray's dedup flag (bit 63 in packed pointer) -- When TypedArray is moved or copied, dangling pointer issues can occur -- Memory management is complex due to pointer-based deduplication - -### Problem 2: Separate POD and Generic Paths -- `PODTimeSamples` exists as a separate optimization for POD types -- `value::TimeSamples` uses generic `value::Value` for non-POD types -- This dual-path approach increases code complexity and maintenance burden - -### Problem 3: Storage Inefficiency -- Current implementation stores full `value::Value` objects (larger than needed) -- For types <= 8 bytes, we could use inline storage instead of indirection -- Memory overhead for small value types is significant - -## Phase 1: Offset-Based Deduplication in PODTimeSamples - -### Goal -Replace TypedArray pointer-based deduplication with index-based deduplication embedded in the offset table. - -### Design - -#### Offset Value Bit Layout (64-bit) -``` -Bit 63: Dedup flag (1 = deduplicated, 0 = original data) -Bit 62: 1D array flag (1 = array data, 0 = scalar data) -Bits 61-0: Index or offset value -``` - -#### Behavior - -**For Original (Non-Dedup) Data:** -- Bit 63 = 0, Bit 62 indicates array/scalar -- Bits 61-0: Byte offset into `_values` buffer - -**For Deduplicated Data:** -- Bit 63 = 1, Bit 62 = copy from original -- Bits 61-0: **Sample index** (not byte offset) pointing to original sample - -### Example -```cpp -// Sample 0: Original data at offset 0 -_offsets[0] = 0x0000000000000000 // No dedup, scalar, offset 0 - -// Sample 1: Original array data at offset 32 -_offsets[1] = 0x4000000000000020 // No dedup, array (bit 62=1), offset 32 - -// Sample 2: Deduplicated from sample 1 -_offsets[2] = 0xC000000000000001 // Dedup (bit 63=1), array (bit 62=1), index 1 - -// Sample 3: Deduplicated from sample 0 -_offsets[3] = 0x8000000000000000 // Dedup (bit 63=1), scalar (bit 62=0), index 0 -``` - -### Access Pattern -```cpp -size_t resolve_offset(size_t sample_idx) const { - uint64_t offset_value = _offsets[sample_idx]; - - if (offset_value & 0x8000000000000000ULL) { - // Deduplicated: bits 61-0 contain original sample index - size_t orig_idx = offset_value & 0x3FFFFFFFFFFFFFFFULL; - return resolve_offset(orig_idx); // Recursive resolve - } else { - // Original: bits 61-0 contain byte offset - return offset_value & 0x3FFFFFFFFFFFFFFFULL; - } -} -``` - -### Sorting Considerations - -When `update()` sorts samples by time, we need to update dedup indices: - -```cpp -void update() const { - if (!_dirty) return; - - // 1. Create index mapping (old_idx -> new_idx) - std::vector sorted_indices = create_sorted_indices(_times); - std::vector index_map(_times.size()); - for (size_t new_idx = 0; new_idx < sorted_indices.size(); ++new_idx) { - index_map[sorted_indices[new_idx]] = new_idx; - } - - // 2. Sort times, blocked, and offsets - reorder_by_indices(_times, sorted_indices); - reorder_by_indices(_blocked, sorted_indices); - reorder_by_indices(_offsets, sorted_indices); - - // 3. Update dedup indices in offset table - for (uint64_t& offset_val : _offsets) { - if (offset_val & 0x8000000000000000ULL) { - size_t old_ref_idx = offset_val & 0x3FFFFFFFFFFFFFFFULL; - size_t new_ref_idx = index_map[old_ref_idx]; - - // Reconstruct offset value with new index - offset_val = (offset_val & 0xC000000000000000ULL) | new_ref_idx; - } - } - - _dirty = false; -} -``` - -### Implementation Tasks - -1. **Add offset manipulation helpers** - - `make_offset(byte_offset, is_array)` - Create non-dedup offset - - `make_dedup_offset(sample_index, is_array)` - Create dedup offset - - `is_dedup(offset_value)` - Check dedup flag - - `is_array(offset_value)` - Check array flag - - `get_byte_offset(offset_value)` - Extract offset (resolve dedup chain) - - `get_dedup_index(offset_value)` - Extract dedup index - -2. **Update `add_dedup_array_sample()` and `add_dedup_matrix_array_sample()`** - - Replace current implementation - - Store sample index instead of reusing offset - - Set bit 63 to indicate deduplication - -3. **Deprecate `add_typed_array_sample()`** - - This method stores TypedArray packed pointers - - Replace with `add_array_sample()` that stores actual data - -4. **Update `get_value_at()` and array retrieval methods** - - Implement offset resolution with dedup chain following - - Handle indirect lookups through dedup indices - -5. **Update `update()` sorting method** - - Implement index remapping for dedup references - - Ensure all dedup indices remain valid after sorting - -6. **Update memory estimation** - - No separate TypedArrayImpl allocations for dedup - - Simpler calculation based on _values buffer only - -### Benefits - -- **No dangling pointers**: Dedup uses indices, not pointers -- **Simpler memory management**: No TypedArrayImpl lifetime tracking -- **Move-safe**: Indices remain valid after vector moves -- **Clear ownership**: _values buffer owns all data - -### Risks & Mitigations - -**Risk**: Dedup chain resolution adds overhead -- **Mitigation**: Most common case is 1-hop (95%+ of dedup) -- **Mitigation**: Cache last resolved offset if needed - -**Risk**: Sorting becomes more complex -- **Mitigation**: Index remapping is O(n), acceptable cost -- **Mitigation**: Only happens when dirty flag is set - -**Risk**: 62-bit limit on offsets (4 petabytes) and indices (4 trillion samples) -- **Mitigation**: Sufficient for all realistic use cases - -## Phase 2: Unify PODTimeSamples with TimeSamples - -### Goal -Eliminate the separate PODTimeSamples structure and use the offset-based approach directly in `value::TimeSamples` for all array types. - -### Design - -#### Current Dual Structure -```cpp -class TimeSamples { - std::vector _samples; // Generic path - PODTimeSamples _pod_samples; // POD optimization path - bool _use_pod; // Path selector -}; -``` - -#### Proposed Unified Structure -```cpp -class TimeSamples { - // Common storage for all types - std::vector _times; - Buffer<16> _blocked; - std::vector _offsets; // With dedup/array flags - Buffer<16> _values; // Raw byte storage - - uint32_t _type_id; - bool _is_array; - size_t _element_size; - size_t _array_size; -}; -``` - -### Array Type Support - -**STL Arrays (`std::vector`)**: -- Store array data inline in `_values` buffer -- Use array flag (bit 62) in offset -- Element count stored in `_array_size` - -**TypedArray Deprecation**: -- Remove `add_typed_array_sample()` method -- Use `add_array_sample()` which stores actual data -- No more TypedArrayImpl pointer management - -### Implementation Tasks - -1. **Move offset/value storage to TimeSamples** - - Add `_offsets`, `_values` members to TimeSamples - - Remove `_pod_samples` member - - Remove `_use_pod` flag - -2. **Implement array methods directly in TimeSamples** - - `add_array_sample(double t, const T* values, size_t count)` - - `add_dedup_array_sample(double t, size_t ref_index)` - - `get_array_at(size_t idx, const T** out_ptr, size_t* out_count)` - -3. **Support std::vector sample addition** - ```cpp - template - bool add_sample(double t, const std::vector& array) { - return add_array_sample(t, array.data(), array.size()); - } - ``` - -4. **Migrate existing code** - - Update crate-reader to use new array methods - - Update value-pprint to access unified storage - - Remove PODTimeSamples usage from codebase - -5. **Update get/query methods** - - Single code path for POD and array types - - Return TypedArrayView for array access (read-only) - -### Benefits - -- **Simplified API**: One TimeSamples class, not two code paths -- **std::vector support**: Can store std::vector directly -- **Reduced complexity**: No POD vs generic branching -- **Unified sorting**: One update() implementation - -### Backward Compatibility - -- Keep `get_pod_storage()` as deprecated accessor -- Return const view of internal storage -- Gradual migration path for existing code - -## Phase 3: 64-bit Packed Value Storage - -### Goal -Optimize storage for small value types using inline 64-bit representation, similar to Crate's `ValueRep`. - -### Design Philosophy - -**Size-Based Strategy:** - -1. **Types <= 8 bytes**: Store inline in 64-bit value slot - - No deduplication needed (small values are cheap to copy) - - No pointer indirection - - Examples: float, double, int, float2, half4, etc. - -2. **Types > 8 bytes**: Store in _values buffer with offset - - Use dedup flag for shared data - - Array flag for 1D arrays - - Examples: float3, float4, matrix types, arrays - -### Unified Storage Structure - -```cpp -class TimeSamples { - std::vector _times; - Buffer<16> _blocked; // ValueBlock flags - std::vector _data; // Packed values or offset+flags - - // Optional: Only allocated for large types or arrays - Buffer<16> _values; // Heap storage for >8 byte types - - uint32_t _type_id; - bool _uses_heap; // True if _values is used -}; -``` - -### Data Encoding (64-bit) - -#### Small Types (<= 8 bytes): Inline Storage -``` -Bits 63-0: Actual value data (bit-cast to uint64_t) -``` - -Examples: -- `float` (4 bytes): Stored in lower 32 bits -- `double` (8 bytes): Stored in all 64 bits -- `float2` (8 bytes): Stored in all 64 bits -- `int32_t` (4 bytes): Stored in lower 32 bits -- `half4` (8 bytes): Stored in all 64 bits - -#### Large Types (> 8 bytes) or Arrays: Offset + Flags -``` -Bit 63: Dedup flag -Bit 62: Array flag -Bit 61: Heap flag (1 = offset to _values, always 1 for large types) -Bits 60-0: Offset or sample index (61-bit range) -``` - -### Type Classification - -```cpp -enum class StorageMode { - INLINE, // <= 8 bytes, stored in _data directly - HEAP_SCALAR, // > 8 bytes, offset into _values - HEAP_ARRAY // Array data, offset into _values -}; - -StorageMode get_storage_mode(uint32_t type_id, bool is_array) { - if (is_array) return HEAP_ARRAY; - - size_t type_size = get_type_size(type_id); - if (type_size <= 8) return INLINE; - - return HEAP_SCALAR; -} -``` - -### Example Encoding - -```cpp -// Sample 0: float value 3.14f -_data[0] = 0x4048F5C3 // float bits in lower 32 bits - -// Sample 1: double value 2.71828 -_data[1] = 0x4005BF0A8B145769 // double bits in all 64 bits - -// Sample 2: float3 value at heap offset 0 -_data[2] = 0x2000000000000000 // Heap flag (bit 61), offset 0 - -// Sample 3: float3 deduplicated from sample 2 -_data[3] = 0xE000000000000002 // Dedup + Heap flags, ref index 2 - -// Sample 4: float[] array at heap offset 12 -_data[4] = 0x600000000000000C // Array + Heap flags, offset 12 - -// Sample 5: Blocked sample -_blocked[5] = 1, _data[5] = 0 // Blocked flag set, data ignored -``` - -### Access Patterns - -```cpp -template -bool get_value_at(size_t idx, T* out) const { - if (_blocked[idx]) { - *out = T{}; // Default value for blocked - return true; - } - - uint64_t data_val = _data[idx]; - - if constexpr (sizeof(T) <= 8 && !is_array_type) { - // INLINE: Direct bit-cast from _data - std::memcpy(out, &data_val, sizeof(T)); - return true; - } else { - // HEAP: Resolve offset and copy from _values - size_t offset = resolve_heap_offset(idx); - std::memcpy(out, _values.data() + offset, sizeof(T)); - return true; - } -} - -size_t resolve_heap_offset(size_t idx) const { - uint64_t data_val = _data[idx]; - - if (data_val & 0x8000000000000000ULL) { - // Dedup: Follow reference chain - size_t ref_idx = data_val & 0x1FFFFFFFFFFFFFFFULL; - return resolve_heap_offset(ref_idx); - } else { - // Direct: Extract offset - return data_val & 0x1FFFFFFFFFFFFFFFULL; - } -} -``` - -### Deduplication Strategy - -**Small Types (INLINE):** -- No deduplication tracking needed -- Storing multiple copies is cheaper than managing dedup - -**Large Types (HEAP):** -- Deduplication via bit 63 in _data -- Same index-based approach as Phase 1 - -**Rationale:** -- Deduplication overhead only worthwhile for expensive-to-copy data -- float3 (12 bytes) vs dedup overhead (index lookup): dedup wins -- float (4 bytes) vs dedup overhead: inline wins - -### Memory Savings Example - -**Before (current implementation):** -``` -10,000 samples of float: -- 10,000 × 8 bytes (time) = 80 KB -- 10,000 × ~40 bytes (Value object) = 400 KB -Total: ~480 KB -``` - -**After (packed storage):** -``` -10,000 samples of float: -- 10,000 × 8 bytes (time) = 80 KB -- 10,000 × 8 bytes (inline data) = 80 KB -- 10,000 × 1 bit (blocked) ≈ 1.2 KB -Total: ~161 KB (66% reduction) -``` - -### Implementation Tasks - -1. **Type size classification** - - Add `get_storage_mode(type_id)` helper - - Map all USD types to INLINE or HEAP - -2. **Update add_sample methods** - ```cpp - template - bool add_sample(double t, const T& value) { - _times.push_back(t); - _blocked.push_back(0); - - if constexpr (sizeof(T) <= 8) { - // INLINE path - uint64_t packed = 0; - std::memcpy(&packed, &value, sizeof(T)); - _data.push_back(packed); - } else { - // HEAP path - size_t offset = _values.size(); - _values.resize(offset + sizeof(T)); - std::memcpy(_values.data() + offset, &value, sizeof(T)); - - uint64_t packed = 0x2000000000000000ULL | offset; // Heap flag - _data.push_back(packed); - } - } - ``` - -3. **Update get_value_at methods** - - Check storage mode - - Use inline or heap access path - - Handle dedup resolution - -4. **Update deduplication methods** - - Only for HEAP types - - Set dedup flag in _data[idx] - - Store reference index - -5. **Update update() sorting** - - Sort times, blocked, data together - - Remap dedup indices (heap types only) - - No special handling for inline types - -6. **Optimize memory layout** - - Only allocate _values when first heap type added - - Keep _data tightly packed - -7. **Update serialization/deserialization** - - USDC writer: pack values appropriately - - USDA writer: format based on storage mode - -### Benefits - -- **Memory efficiency**: 50-70% reduction for small types -- **Cache efficiency**: All small values in _data array -- **No indirection**: Inline types accessed directly -- **Selective dedup**: Only used where beneficial - -### Compatibility - -- **API preserved**: Same public interface -- **Type support**: All existing USD types supported -- **Migration**: Transparent to users - -### Performance Characteristics - -| Operation | Before | After | Notes | -|-----------|--------|-------|-------| -| Add float sample | O(1) + alloc | O(1) inline | 2-3x faster | -| Add float3 sample | O(1) + alloc | O(1) heap | Similar | -| Get float sample | O(1) + deref | O(1) memcpy | 2x faster | -| Get float3 sample | O(1) + deref | O(1) + resolve | Similar | -| Dedup float | O(1) | N/A (no dedup) | Not needed | -| Dedup float3 | O(1) | O(1) | Same | -| Sort samples | O(n log n) | O(n log n) | Same complexity | -| Memory usage | ~40 bytes/sample | ~16 bytes/sample | 60% reduction | - -## Implementation Roadmap - -### Phase 1: Offset-Based Deduplication (2-3 weeks) -**Week 1:** -- [ ] Implement offset bit manipulation helpers -- [ ] Add unit tests for offset encoding/decoding -- [ ] Update PODTimeSamples::add_dedup_array_sample() - -**Week 2:** -- [ ] Implement offset resolution with dedup chain -- [ ] Update PODTimeSamples::update() with index remapping -- [ ] Add comprehensive tests for sorting + dedup - -**Week 3:** -- [ ] Update crate-reader to use new dedup API -- [ ] Update value-pprint to handle new offset format -- [ ] Performance testing and optimization - -### Phase 2: Unify PODTimeSamples (2-3 weeks) -**Week 1:** -- [ ] Move offset/value storage to TimeSamples -- [ ] Implement array methods in TimeSamples -- [ ] Add std::vector support - -**Week 2:** -- [ ] Migrate crate-reader usage -- [ ] Migrate value-pprint usage -- [ ] Update all TimeSamples callsites - -**Week 3:** -- [ ] Remove PODTimeSamples class -- [ ] Clean up deprecated APIs -- [ ] Update documentation - -### Phase 3: 64-bit Packed Storage (3-4 weeks) -**Week 1:** -- [ ] Design type size classification -- [ ] Implement inline vs heap detection -- [ ] Add encoding/decoding helpers - -**Week 2:** -- [ ] Implement INLINE path for add_sample -- [ ] Implement INLINE path for get_value_at -- [ ] Add unit tests for all small types - -**Week 3:** -- [ ] Implement HEAP path with selective dedup -- [ ] Update sorting with hybrid storage -- [ ] Add comprehensive integration tests - -**Week 4:** -- [ ] Performance benchmarking -- [ ] Memory profiling -- [ ] Optimization and tuning - -## Testing Strategy - -### Unit Tests -- Offset encoding/decoding (all bit patterns) -- Dedup chain resolution (1-hop, multi-hop) -- Index remapping during sorting -- Storage mode classification -- Inline value encoding/decoding -- Heap value with dedup - -### Integration Tests -- Round-trip: write USDC → read → verify -- Large datasets (10K+ samples) -- Mixed type scenarios -- Dedup + sorting combinations - -### Performance Tests -- Memory usage comparison (before/after) -- Add sample throughput -- Get sample throughput -- Sort performance -- Dedup overhead measurement - -### Regression Tests -- Existing USDA/USDC test suite -- Parse all models in `models/` directory -- Compare output with baseline - -## Migration Guide - -### For Internal Code - -**Phase 1:** -```cpp -// Before -pod_samples.add_typed_array_sample(time, typed_array); - -// After -pod_samples.add_array_sample(time, typed_array.data(), typed_array.size()); -``` - -**Phase 2:** -```cpp -// Before -if (ts.is_using_pod()) { - const PODTimeSamples* pod = ts.get_pod_storage(); - // Use pod methods -} - -// After -// Direct TimeSamples methods -ts.get_array_at(idx, &ptr, &count); -``` - -**Phase 3:** -```cpp -// Before and After: No API changes -// Internal storage changes are transparent -``` - -### For External Users - -- **API stability**: Public TimeSamples API remains unchanged -- **Binary compatibility**: May break ABI (major version bump) -- **Serialization format**: USDC format unchanged (Crate-compatible) - -## Risk Assessment - -### High Risk -- **Phase 3 sorting complexity**: Hybrid storage adds branching - - Mitigation: Extensive testing, performance benchmarks - -### Medium Risk -- **Phase 1 dedup chain bugs**: Index remapping is error-prone - - Mitigation: Comprehensive unit tests, assertions - -- **Phase 2 migration scope**: Many callsites to update - - Mitigation: Incremental migration, keep deprecated APIs temporarily - -### Low Risk -- **Memory savings**: May not reach projected 60% - - Mitigation: Profile real-world data, adjust strategy - -- **Performance regression**: Offset resolution overhead - - Mitigation: Benchmark early, optimize hot paths - -## Success Criteria - -### Phase 1 -- [ ] All existing tests pass -- [ ] No memory leaks (valgrind clean) -- [ ] Dedup still works correctly after sorting -- [ ] Performance: <5% regression on add/get operations - -### Phase 2 -- [ ] PODTimeSamples fully removed from codebase -- [ ] std::vector samples can be added -- [ ] Code complexity reduced (measured by cyclomatic complexity) -- [ ] Documentation updated - -### Phase 3 -- [ ] Memory usage reduced by 50%+ for small types -- [ ] Performance improved or neutral for inline types -- [ ] All USD types supported (100+ types) -- [ ] USDC round-trip passes all tests - -## Future Enhancements - -### Compression -- Apply LZ4/Zstd to _values buffer for large datasets -- Transparent decompression on access - -### SIMD Optimization -- Vectorized search in _times array -- Parallel offset resolution for bulk operations - -### Memory Mapping -- Support mmap for _values buffer -- Zero-copy reading from USDC files - -### Incremental Sorting -- Insertion sort for small dirty ranges -- Avoid full sort when only a few samples added - -## Conclusion - -This three-phase refactoring will significantly improve `value::TimeSamples`: - -1. **Phase 1**: Solves dangling pointer issues, simplifies dedup -2. **Phase 2**: Unifies code paths, reduces complexity -3. **Phase 3**: Optimizes memory, improves performance - -The end result will be a more maintainable, efficient, and safer TimeSamples implementation that scales well to large production scenes. diff --git a/C++_MATERIALX_IMPORT.md b/doc/C++_MATERIALX_IMPORT.md similarity index 100% rename from C++_MATERIALX_IMPORT.md rename to doc/C++_MATERIALX_IMPORT.md diff --git a/UV_SET_SUPPORT.md b/doc/UV_SET_SUPPORT.md similarity index 100% rename from UV_SET_SUPPORT.md rename to doc/UV_SET_SUPPORT.md