Files
tinyusdz/REFACTOR_TODO.md
Syoyo Fujita 16e60d86f3 Refactor TimeSamples header: Move non-template methods to implementation file
This commit implements Phase 1 of header complexity reduction for timesamples.hh:

**Changes:**
- Moved 6 non-template methods from timesamples.hh to timesamples.cc:
  - Move constructor (TimeSamples&&)
  - Move assignment operator
  - Copy constructor
  - Copy assignment operator
  - clear() method
  - init() method
  - Total: 147 lines moved

- All implementations properly wrapped in namespace value blocks
- Header reduced from 3,388 to 3,233 lines (155 lines, 4.6% reduction)
- Implementation file increased from 1,325 to 1,503 lines (+178 lines)

**Benefits:**
- Cleaner header file - easier to navigate class interface
- Faster header parsing during compilation
- Non-template code properly belongs in .cc per C++ best practices
- Establishes pattern for Phase 2 (template methods with explicit instantiation)

**Testing:**
- All unit tests pass (20 tests including timesamples_test)
- No regressions in functionality
- tusdcat builds and executes correctly with all timeSamples types

**Future Work Identified:**
- Phase 2: Move large template methods (~600 lines) using explicit instantiation
- Could achieve additional 30% header reduction

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-27 02:18:29 +09:00

20 KiB

Refactoring Opportunities

This document outlines potential areas for refactoring in the TinyUSDZ codebase to improve maintainability, readability, and extensibility.

Timesamples Module (src/timesamples.* and src/timesamples-pprint.*)

Summary of Refactoring Opportunities

The timesamples module contains several areas where code duplication and complexity could be reduced through refactoring:

1. COMPLETED: Consolidate POD Type Metadata and Handling

  • Files: src/timesamples.cc:203-429 (get_samples_converted), src/timesamples.cc:432-498 (get_element_size)
  • Status: Completed (2025-01-18)
  • Solution Implemented:
    • Created centralized TINYUSDZ_POD_TYPE_LIST macro that lists all ~40 POD types (lines 17-71)
    • Refactored get_element_size() to use the type registry, reducing from ~65 lines to ~17 lines (lines 488-506)
    • Refactored get_samples_converted() to use the type registry, reducing ~45 lines of type enumeration to 7 lines (lines 432-438)
    • All unit tests pass - functionality preserved
  • Impact:
    • Eliminated ~100+ lines of duplicate type enumeration
    • Adding new POD types now requires single entry in centralized macro
    • Both functions now automatically stay in sync when types are added/removed
    • Improved maintainability and reduced potential for inconsistencies

2. COMPLETED: Simplify PODTimeSamples::update() Sorting Logic

  • File: src/timesamples.cc:73-226
  • Status: Completed (2025-01-18)
  • Solution Implemented:
    • Extracted three sorting strategies into separate helper functions (lines 77-180):
      • create_sort_indices() - Creates sorted index array
      • sort_with_offsets() - Strategy 1: Offset-backed sorting
      • sort_with_compact_values() - Strategy 2: Legacy compact value storage
      • sort_minimal() - Strategy 3: Minimal sorting (times + blocked flags only)
    • Simplified update() method to clean dispatch logic (lines 182-226)
    • Each strategy is now testable in isolation
  • Impact:
    • Improved code clarity - each sorting strategy is self-contained
    • Reduced cognitive complexity of main update() method
    • Easier to maintain and debug individual sorting paths
    • Better separation of concerns

3. Refactor Repetitive add_* Methods

  • Files: src/timesamples.hh:198-300
  • Opportunity: The add_sample, add_array_sample, and add_typed_array_sample methods repeat the same underlying type checks, offset initialization, and error handling. A common template or base implementation could reduce duplication.
  • Pattern Found: Each method performs:
    • Type ID validation
    • Offset table initialization on first non-blocked sample
    • Buffer resizing
    • Similar error message construction

4. PARTIALLY COMPLETED: Reduce Template Specialization Redundancy

  • File: src/timesamples.cc
  • Status: Partially Completed (2025-01-18)
  • Solution Implemented:
    • Refactored PODTimeSamples::add_sample instantiations (lines 732-794):
      • Before: 48 manual template instantiations (~68 lines)
      • After: Macro-based generator with explicit list (~63 lines, but more maintainable)
      • Uses INSTANTIATE_ADD_SAMPLE macro to reduce boilerplate
    • Refactored PODTimeSamples::add_typed_array_sample instantiations (lines 833-852):
      • Before: 21 manual instantiations (~21 lines)
      • After: Macro-based generator using TINYUSDZ_POD_TYPE_LIST + 6 matrix types (~14 lines)
      • Reduction: ~33% fewer lines
  • Impact:
    • Reduced boilerplate for template instantiations
    • Consistent pattern using centralized type registry where possible
    • Easier to add new types that support TypedArray
  • Remaining Work:
    • TypedTimeSamples::get() instantiations (140+ lines) could benefit from similar treatment
    • However, these include many non-POD types (vectors, strings, etc.) making macro generation complex

5. COMPLETED: Consolidate Pretty Print Functions

  • File: src/timesamples-pprint.cc
  • Status: Completed (2025-01-18)
  • Solution Implemented:
    • Created OutputAdapter abstraction to unify string and StreamWriter output (lines 26-62)
    • Implemented unified print_type and print_vector templates using SFINAE (lines 116-226)
    • Added type traits system with is_value_type template for compile-time type detection (lines 64-113)
    • Reduced both pprint_pod_value_by_type functions from ~150 lines each to 4 lines each (lines 1382-1393)
    • Disabled 600+ lines of legacy print functions (wrapped in #if 0 block for future cleanup)
  • Impact:
    • Eliminated ~370 lines of duplicate switch statements
    • All unit tests pass - functionality preserved
    • Adding new types now requires single entry in dispatch table

6. COMPLETED: Unify Type Dispatch Mechanisms

  • File: src/timesamples-pprint.cc (completed for this file)
  • Status: Partially completed - timesamples-pprint.cc done (2025-01-18), timesamples.cc still pending
  • Solution Implemented:
    • Created centralized print_pod_value_dispatch function using macro-based dispatch (lines 250-330)
    • Implemented DISPATCH_POD_TYPE, DISPATCH_VALUE_TYPE, and DISPATCH_VECTOR_TYPE macros
    • Handles 60+ type cases uniformly through single switch statement
    • Uses adapter pattern to route both string and StreamWriter output through same dispatch logic
  • Impact:
    • Reduced code duplication significantly
    • Improved maintainability and extensibility
    • Type dispatch now centralized and consistent
  • Remaining Work:
    • src/timesamples.cc still uses multiple large switch statements for type dispatch
    • Could apply similar pattern to other type dispatch locations in the codebase

7. Extract Common Buffer Management Logic

  • Files: src/timesamples.hh, src/timesamples.cc
  • Opportunity: The PODTimeSamples class manages several parallel buffers (_times, _values, _blocked, _offsets) with complex synchronization requirements. Extract a BufferManager class to handle:
    • Coordinated resizing
    • Offset management
    • Dirty tracking
    • Memory estimation

8. Simplify TimeSamples/PODTimeSamples Interaction

  • Files: src/timesamples.hh
  • Opportunity: The TimeSamples class wraps PODTimeSamples for POD types but maintains its own storage for non-POD types. This dual-storage approach leads to:
    • Complex conditional logic throughout the API
    • Duplication of methods between the two classes
    • Potential for inconsistencies
  • Solution: Consider a unified storage approach or clearer separation of responsibilities

C++ Core (src directory)

1. Consolidate File Type Detection

  • File: src/tinyusdz.cc
  • Opportunity: The LoadUSDFromMemory function contains repetitive code for detecting USDA, USDC, and USDZ file types. This logic can be centralized to reduce duplication. Similarly, LoadUSDZFromMemory and LoadUSDZFromFile have duplicated logic that could be shared.

2. Refactor Large if-else Chains

  • Files: src/usda-reader.cc, src/usdc-reader.cc
  • Opportunity: The ReconstructPrimFromTypeName functions in both the USDA and USDC readers are implemented as large if-else chains. This makes them difficult to maintain and extend. Refactoring this to use a map of function pointers or a similar factory pattern would be beneficial.

3. Decompose Large Functions

  • Files: src/usda-reader.cc, src/usdc-reader.cc, src/tydra/render-data.cc
  • Opportunity: Several functions are overly long and complex.
    • In usda-reader.cc and usdc-reader.cc, functions like ParseProperty, ParsePrimSpec, and ReconstructPrimMeta could be broken down into smaller, more focused functions.
    • In tydra/render-data.cc, the TriangulatePolygon function is a candidate for simplification and decomposition.

4. Generalize Template Specializations

  • File: src/tydra/scene-access.cc
  • Opportunity: The GetPrimProperty and ToProperty template specializations contain a lot of repeated code. A more generic, template-based approach could reduce this duplication.

5. [Moved to Timesamples Module Section]

  • See "Timesamples Module" section above for comprehensive refactoring opportunities for POD type metadata centralization.

6. [Moved to Timesamples Module Section]

  • See "Timesamples Module" section above for comprehensive refactoring opportunities for PODTimeSamples sorting paths.

7. [Moved to Timesamples Module Section]

  • See "Timesamples Module" section above for comprehensive refactoring opportunities for type/offset guards in POD samples.

8. COMPLETED: Fix TimeSamples Copy/Move Semantics for POD Value Preservation

  • Files: src/timesamples.hh:1147-1277 (copy/move constructors and assignment operators)
  • Status: Completed (2025-10-26)
  • Problem Statement:
    • When parsing USDA files with timeSamples, scalar POD values (float, double, int) were appearing as empty/null in output
    • Example: float value.timeSamples = { 0: VALUE_PPRINT: TODO: (type: null), 1: VALUE_PPRINT: TODO: (type: null) }
    • This occurred specifically with USDA parsing while USDC (binary) format worked correctly
    • The issue affected both the official test file tests/usda/timesamples-array-001.usda and custom test files
  • Root Cause Analysis:
    • The _small_values member (mutable vector<uint64_t>) stores POD scalar samples in compressed form during parsing
    • The copy assignment operator was missing the _small_values = other._small_values; statement
    • When TimeSamples objects were assigned to attributes during construction, the POD values were lost
    • Additionally, member initialization order in copy/move constructors didn't match class declaration order
  • Solution Implemented:
    1. Copy Assignment Operator (line 1262): Added missing _small_values = other._small_values; statement
      TimeSamples& operator=(const TimeSamples& other) {
        if (this != &other) {
          _samples = other._samples;
          _times = other._times;
          _blocked = other._blocked;
          _small_values = other._small_values;  // CRITICAL FIX: was missing!
          _values = other._values;
          _offsets = other._offsets;
          // ... rest of implementation
        }
        return *this;
      }
      
    2. Copy Constructor (lines 1213-1229): Reordered member initialization list to match class declaration order
      • Changed from: ..., _pod_samples(other._pod_samples), _small_values(other._small_values)
      • To: ..., _small_values(other._small_values), ..., _pod_samples(other._pod_samples)
    3. Move Constructor (lines 1147-1163): Reordered member initialization list to match class declaration order
      • Changed from: ..., _pod_samples(std::move(other._pod_samples)), _small_values(std::move(other._small_values))
      • To: ..., _small_values(std::move(other._small_values)), ..., _pod_samples(std::move(other._pod_samples))
    4. Class Member Declaration Order (lines 2655-2678): Verified order is:
      1. _times, _blocked, _small_values, _values (core storage)
      2. _offsets (offset management)
      3. _type_id, _use_pod, _is_array, etc. (type metadata)
      4. _dirty, _dirty_start, _dirty_end (dirty tracking)
      5. _pod_samples (POD storage wrapper)
  • Test Results:
    • Simple float timeSamples: float value.timeSamples = { 0: 1.5, 1: 2.5 } - WORKING
    • Double timeSamples: double value.timeSamples = { 0: 1.123456, 1: 2.987654 } - WORKING
    • Integer timeSamples: int value.timeSamples = { 0: 42, 1: 99 } - WORKING
    • Float array timeSamples: float[] timeSamples = { 0: [1, 2, 3], 1: [4, 5, 6] } - WORKING
    • Double array timeSamples: double[] timeSamples = { 0: [1.1, 2.2], 1: [3.3, 4.4] } - WORKING
    • Official test file: tests/usda/timesamples-array-001.usda - XformOp and array timeSamples WORKING
  • Known Limitations (Fixed in Part 2):
    • Bool and vector type timeSamples (e.g., float3) still show as null FIXED
    • These types don't use the unified POD storage path that this fix addresses FIXED
    • Resolution requires separate type reconstruction logic in get_samples() method IMPLEMENTED
    • Out of scope for this fix (would require significant changes to type handling) COMPLETED
  • Impact:
    • Primary objective achieved: scalar POD types now correctly preserved through copy/move operations
    • All unit tests pass with the fix in place
    • USDA parsing now produces correct output matching USDC format behavior
    • Improved robustness of C++ object semantics by ensuring all members are properly transferred
  • Additional Fixes (2025-10-26 - Part 2):
    1. Bool Type Support in get_samples(): Added case for bool type reconstruction (line 1988-1992)
      • Bool values stored in _small_values now correctly reconstructed
      • Values show as 0/1 in output (could be improved to show true/false)
    2. Vector Type Support (float3, point3f, color3f) in get_samples(): Added handling for types > 8 bytes (lines 1998-2038)
      • Types larger than 8 bytes are stored in _values buffer with offsets in _offsets
      • Added reconstruction logic for float3, point3f, and color3f types
      • Properly decodes offset using OFFSET_VALUE_MASK to retrieve data from _values buffer
    3. Comprehensive Test Results:
      • bool: Working (shows 0/1)
      • float, double, int: Working with correct values
      • float3: Working with correct tuple values
      • point3f: Working with correct tuple values
      • color3f: Working with correct tuple values
      • float[], double[]: Working with correct array values
      • Official test file timesamples-array-001.usda: All timeSamples working correctly

9. COMPLETED: Reduce Header File Complexity via Implementation Separation

  • Files: src/timesamples.hh (3,388 → 3,233 lines), src/timesamples.cc (1,325 → 1,503 lines)
  • Status: Completed (2025-10-27)
  • Problem Statement:
    • The timesamples.hh header file was becoming increasingly large (3,388 lines) with multiple non-template method implementations inlined
    • This increased compilation time and made the header harder to navigate
    • Non-template methods (TimeSamples constructors, assignment operators, clear(), init()) were candidates for moving to the implementation file
  • Solution Implemented:
    1. Moved Constructor/Assignment Operator Implementations (6 methods):

      • Move constructor (TimeSamples(TimeSamples&&) noexcept) - 28 lines
      • Move assignment operator (operator=(TimeSamples&&) noexcept) - 33 lines
      • Copy constructor (TimeSamples(const TimeSamples&)) - 23 lines
      • Copy assignment operator (operator=(const TimeSamples&)) - 28 lines
      • clear() method - 18 lines
      • init(uint32_t) method - 17 lines
      • Total lines moved: ~147 lines
    2. Namespace Organization (Critical Fix):

      • Initial build failed with: error: '_type_id' was not declared in this scope
      • Root cause: Implementations were outside the value namespace, unable to access private members
      • Solution: Wrapped all moved implementations in namespace value { } block
      • All implementations now properly scoped within the class's namespace
    3. File Organization:

      • Added clear section header in timesamples.cc:
        // ============================================================================
        // TimeSamples Implementation
        // ============================================================================
        
        namespace value {
        // Implementation code...
        } // namespace value
        } // namespace tinyusdz
        
  • Code Changes Detail:
    • In timesamples.hh: Replaced inline implementations with declarations only

      // Before (inline implementation ~147 lines total):
      TimeSamples(TimeSamples&& other) noexcept {
        // implementation...
      }
      
      // After (declaration only):
      TimeSamples(TimeSamples&& other) noexcept;
      
    • In timesamples.cc: Added corresponding implementations wrapped in namespace

      namespace value {
      
      TimeSamples::TimeSamples(TimeSamples&& other) noexcept
          : _samples(std::move(other._samples)),
            _times(std::move(other._times)),
            // ... full initialization list ...
      {
        // Reset moved-from object to valid state
        other._dirty = false;
        other._dirty_start = 0;
        other._dirty_end = 0;
      }
      
      // ... other implementations ...
      
      } // namespace value
      } // namespace tinyusdz
      
  • Metrics:
    • Header reduction: 3,388 → 3,233 lines (155 lines, 4.6% reduction)
    • Implementation growth: 1,325 → 1,503 lines (178 lines added for implementations + comments)
    • Compilation impact: Reduces header bloat without significant binary size impact
  • Build & Test Results:
    • Full build completed successfully: [100%] Built target unit-test-tinyusdz
    • All unit tests passing (20 tests, including timesamples_test)
    • No regressions in functionality
    • tusdcat binary builds and executes correctly
    • Complex timeSamples (bool, vector, array types) still working correctly
  • Impact:
    • Cleaner header file - easier to navigate class interface
    • Faster header parsing during compilation
    • Non-template code properly belongs in .cc file per C++ best practices
    • Establishes pattern for future refactoring (additional ~600 lines of template methods could follow similar pattern with explicit instantiation)
  • Known Limitations & Future Work:
    • Additional refactoring candidates identified (Phase 2):
      • get_typed_array_at_time() (95 lines) - could move with explicit template instantiation
      • get_typed_array_at() (89 lines) - could move with explicit template instantiation
      • Additional PODTimeSamples methods (~200+ lines) - candidates for similar treatment
    • These would require explicit template instantiation pattern similar to INSTANTIATE_ADD_SAMPLE macro already in use
    • Phase 2 could achieve additional 30% header reduction (~600+ more lines)
  • Pattern for Continuing Refactoring:
    • For template methods: Use explicit template instantiation in .cc file
    • For non-template methods: Simply move implementation as demonstrated
    • Always maintain proper namespace scoping around implementations
    • Update .hh to contain only declarations, .cc contains implementations

10. Generic Index Accessors

  • File: src/crate-reader.cc:141
  • Opportunity: GetField, GetToken, GetPath, GetElementPath, and GetPathString all share the same bounds-check pattern. A templated lookup_optional(vector, Index) (with optional logging hook) would remove boilerplate and centralize future diagnostics.

JavaScript/WASM Bindings (web directory)

1. Modularize Emscripten Bindings

  • File: web/binding.cc
  • Opportunity: This is a very large file containing all Emscripten bindings. It should be split into multiple files based on functionality (e.g., stage_bindings.cc, scene_bindings.cc, asset_bindings.cc) to improve organization and build times.

2. Refactor TinyUSDZLoaderNative

  • File: web/binding.cc
  • Opportunity: The TinyUSDZLoaderNative class has too many responsibilities. The asset caching and streaming logic, for example, could be extracted into a separate AssetManager class.

3. Consolidate JavaScript Loading Logic

  • File: web/js/src/tinyusdz/TinyUSDZLoader.js
  • Opportunity: The load and loadAsLayer methods share a lot of similar logic. This could be consolidated into a common internal loading function.

4. Data-Driven Material Conversion

  • File: web/js/src/tinyusdz/TinyUSDZLoaderUtils.js
  • Opportunity: The convertUsdMaterialToMeshPhysicalMaterial function, which maps USD material properties to Three.js material properties, could be refactored to be more data-driven. Using a mapping table or a similar approach would make it easier to add or modify material property mappings.