/* Copyright (c) 2025, AMD Inc. All rights reserved.

 Redistribution and use in source and binary forms, with or without
 modification, are permitted provided that the following conditions are met:

 * Redistributions of source code must retain the above copyright notice, this
   list of conditions and the following disclaimer.
 * Redistributions in binary form must reproduce the above copyright notice,
   this list of conditions and the following disclaimer in the documentation
   and/or other materials provided with the distribution.
 * Neither the name of AMD nor the names of its contributors may be used to
   endorse or promote products derived from this software without specific
   prior written permission.

 THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
 AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
 LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
 CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
 SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
 INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
 CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
 ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
 POSSIBILITY OF SUCH DAMAGE.

 ********************************************************************************
 *   Content : Documentation on the use of AMD AOCL through Eigen
 ********************************************************************************
*/

namespace Eigen {

/** \page TopicUsingAOCL Using AMD® AOCL from %Eigen

Since %Eigen version 3.4, users can benefit from built-in AMD® Optimizing CPU Libraries (AOCL)
optimizations with an installed copy of AOCL 5.0 or later.
AMD AOCL provides highly optimized, multi-threaded mathematical routines for x86-64 processors,
with a focus on AMD "Zen"-based architectures. AOCL is available on Linux and Windows for x86-64 architectures.

\note AMD® AOCL is freely available software, but it is the user's responsibility to download and install it,
and to ensure that their product's license allows linking to the AOCL libraries.
AOCL is distributed under a permissive license that allows commercial use.

Using AMD AOCL through %Eigen is straightforward:
-# export \c AOCL_ROOT into your environment
-# define one of the AOCL macros before including any %Eigen headers (see the table below and the minimal sketch that follows it)
-# link your program to the AOCL libraries (BLIS, FLAME, LibM)
-# ensure your system supports the target architecture optimizations

When doing so, a number of %Eigen's algorithms are silently substituted with calls to AMD AOCL routines.
These substitutions apply only for \b Dynamic \b or \b large \b enough objects with one of the following
standard scalar types: \c float, \c double, \c std::complex<float>, and \c std::complex<double>.
Operations on other scalar types or mixing reals and complexes will continue to use the built-in algorithms.

The AOCL integration targets three core components:
- **BLIS**: High-performance BLAS implementation optimized for modern cache hierarchies
- **FLAME**: Dense linear algebra algorithms providing LAPACK functionality
- **LibM**: Optimized standard math routines with vectorized implementations

\section TopicUsingAOCL_Macros Configuration Macros

You can choose which parts will be substituted by defining one or multiple of the following macros:
<table class="manual">
<tr><th>Macro</th><th>Effect</th></tr>
<tr><td>\c EIGEN_USE_BLAS </td><td>Enables the use of external BLAS level 2 and 3 routines (AOCL-BLIS)</td></tr>
<tr class="alt"><td>\c EIGEN_USE_LAPACKE </td><td>Enables the use of external LAPACK routines via the LAPACKE C interface (AOCL-FLAME)</td></tr>
<tr><td>\c EIGEN_USE_LAPACKE_STRICT </td><td>Same as \c EIGEN_USE_LAPACKE, but algorithms of lower robustness are disabled. \n This currently concerns only JacobiSVD, which would otherwise be replaced by \c gesvd.</td></tr>
<tr class="alt"><td>\c EIGEN_USE_AOCL_VML </td><td>Enables the use of AOCL LibM vector math operations for coefficient-wise functions</td></tr>
<tr><td>\c EIGEN_USE_AOCL_ALL </td><td>Defines \c EIGEN_USE_BLAS, \c EIGEN_USE_LAPACKE, and \c EIGEN_USE_AOCL_VML</td></tr>
<tr class="alt"><td>\c EIGEN_USE_AOCL_MT </td><td>Equivalent to \c EIGEN_USE_AOCL_ALL, but ensures the multi-threaded BLIS library (\c libblis-mt) is used. \n \b Recommended for most applications.</td></tr>
</table>
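As a minimal sketch of the macro-before-include pattern, the following translation unit enables only the
BLAS substitutions; the product of two dynamic-size matrices is then forwarded to AOCL-BLIS, assuming the
program is linked against \c libblis as described in \ref TopicUsingAOCL_Building (the matrix size of 512 is
an arbitrary illustrative choice above the default threshold):

\code
// Minimal sketch: enable only the BLAS substitutions (AOCL-BLIS).
// The macro must be defined before the first Eigen include.
#define EIGEN_USE_BLAS

#include <Eigen/Dense>
#include <iostream>

int main() {
  // Dynamic-size matrices that are large enough are dispatched to the
  // external BLAS backend (here, AOCL-BLIS dgemm).
  Eigen::MatrixXd A = Eigen::MatrixXd::Random(512, 512);
  Eigen::MatrixXd B = Eigen::MatrixXd::Random(512, 512);
  Eigen::MatrixXd C = A * B;
  std::cout << C.norm() << std::endl;
  return 0;
}
\endcode

The same pattern applies to the other macros in the table; the complete example in
\ref TopicUsingAOCL_Example combines them through \c EIGEN_USE_AOCL_MT.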
\note The AOCL integration automatically enables these optimizations when the matrix/vector size exceeds
\c EIGEN_AOCL_VML_THRESHOLD (default: 128 elements). For smaller operations, Eigen's built-in vectorization
may be faster due to function call overhead.

\section TopicUsingAOCL_Performance Performance Considerations

The \c EIGEN_USE_BLAS and \c EIGEN_USE_LAPACKE macros can be combined with AOCL-specific optimizations:
- **Multi-threading**: Use \c EIGEN_USE_AOCL_MT to automatically select the multi-threaded BLIS library
- **Architecture targeting**: AOCL libraries are optimized for AMD Zen architectures (Zen, Zen2, Zen3, Zen4, Zen5)
- **Vector Math Library**: AOCL LibM provides vectorized implementations that can operate on entire arrays at once
- **Memory layout**: Eigen's column-major storage directly matches AOCL's expected data layout for zero-copy operation

\section TopicUsingAOCL_Types Supported Data Types and Sizes

AOCL acceleration is applied to:
- **Scalar types**: \c float, \c double, \c std::complex<float>, \c std::complex<double>
- **Matrix/vector sizes**: Dynamic size, or compile-time size ≥ \c EIGEN_AOCL_VML_THRESHOLD
- **Storage order**: Both column-major (default) and row-major layouts
- **Memory alignment**: Eigen's data pointers are directly compatible with AOCL function signatures

The current AOCL Vector Math Library integration is specialized for \c double precision, with automatic
fallback to scalar implementations for \c float.

\section TopicUsingAOCL_Functions Vector Math Functions

The following table summarizes the coefficient-wise operations accelerated by \c EIGEN_USE_AOCL_VML:
<table class="manual">
<tr><th>Code example</th><th>AOCL routines</th></tr>
<tr><td>\code
v2 = v1.array().exp();
v2 = v1.array().sin();
v2 = v1.array().cos();
v2 = v1.array().tan();
v2 = v1.array().log();
v2 = v1.array().log10();
v2 = v1.array().log2();
v2 = v1.array().sqrt();
v2 = v1.array().pow(1.5);
v2 = v1.array() + v2.array();
\endcode</td><td>\code
amd_vrda_exp
amd_vrda_sin
amd_vrda_cos
amd_vrda_tan
amd_vrda_log
amd_vrda_log10
amd_vrda_log2
amd_vrda_sqrt
amd_vrda_pow
amd_vrda_add
\endcode</td></tr>
</table>
In the examples, \c v1 and \c v2 are dense vectors of type \c VectorXd with size ≥ \c EIGEN_AOCL_VML_THRESHOLD.

\section TopicUsingAOCL_Example Complete Example

\code
#define EIGEN_USE_AOCL_MT
#include <Eigen/Dense>
#include <iostream>

int main() {
  const int n = 2048;

  // Large matrices automatically use AOCL-BLIS for multiplication
  Eigen::MatrixXd A = Eigen::MatrixXd::Random(n, n);
  Eigen::MatrixXd B = Eigen::MatrixXd::Random(n, n);
  Eigen::MatrixXd C = A * B;                    // Dispatched to dgemm

  // Large vectors automatically use AOCL LibM for math functions
  Eigen::VectorXd v = Eigen::VectorXd::LinSpaced(10000, 0, 10);
  Eigen::VectorXd result = v.array().sin();     // Dispatched to amd_vrda_sin

  // LAPACK decompositions use AOCL-FLAME
  Eigen::LLT<Eigen::MatrixXd> llt(A);           // Dispatched to dpotrf

  std::cout << "Matrix norm: " << C.norm() << std::endl;
  std::cout << "Vector result norm: " << result.norm() << std::endl;
  return 0;
}
\endcode

\section TopicUsingAOCL_Building Building and Linking

To compile with AOCL support, set the \c AOCL_ROOT environment variable and link against the required libraries:

\code
export AOCL_ROOT=/path/to/aocl
clang++ -O3 -g -DEIGEN_USE_AOCL_ALL \
  -I./install/include -I${AOCL_ROOT}/include \
  -Wno-parentheses my_app.cpp \
  -L${AOCL_ROOT} -lamdlibm -lflame -lblis \
  -lpthread -lrt -lm -lomp \
  -o eigen_aocl_example
\endcode

For multi-threaded performance, use the multi-threaded BLIS library:

\code
clang++ -O3 -g -DEIGEN_USE_AOCL_MT \
  -I./install/include -I${AOCL_ROOT}/include \
  -Wno-parentheses my_app.cpp \
  -L${AOCL_ROOT} -lamdlibm -lflame -lblis-mt \
  -lpthread -lrt -lm -lomp \
  -o eigen_aocl_example
\endcode

Key compiler and linker flags:
- \c -DEIGEN_USE_AOCL_ALL: Enable all AOCL accelerations (BLAS, LAPACK, VML)
- \c -DEIGEN_USE_AOCL_MT: Enable the multi-threaded version (uses \c -lblis-mt)
- \c -lblis: Single-threaded BLIS library
- \c -lblis-mt: Multi-threaded BLIS library (recommended for performance)
- \c -lflame: FLAME LAPACK implementation
- \c -lamdlibm: AMD LibM vector math library
- \c -lomp: OpenMP runtime for multi-threading support
- \c -lpthread -lrt: System threading and real-time libraries
- \c -Wno-parentheses: Suppress common warnings when using AOCL headers

\subsection TopicUsingAOCL_EigenBuild Building Eigen with AOCL Support

To build Eigen with AOCL support, use the following CMake configuration:

\code
cmake .. -DCMAKE_BUILD_TYPE=Release \
  -DCMAKE_C_COMPILER=clang \
  -DCMAKE_CXX_COMPILER=clang++ \
  -DCMAKE_INSTALL_PREFIX=$PWD/install \
  -DINCLUDE_INSTALL_DIR=$PWD/install/include \
  && make install -j$(nproc)
\endcode

To build Eigen with AOCL integration and benchmarking capabilities, use the following CMake configuration:

\code
cmake .. -DEIGEN_BUILD_AOCL_BENCH=ON \
  -DEIGEN_AOCL_BENCH_FLAGS="-O3 -mavx512f -fveclib=AMDLIBM" \
  -DEIGEN_AOCL_BENCH_USE_MT=OFF \
  -DEIGEN_AOCL_BENCH_ARCH=znver5 \
  -DCMAKE_BUILD_TYPE=Debug \
  -DCMAKE_C_COMPILER=clang \
  -DCMAKE_CXX_COMPILER=clang++ \
  -DCMAKE_INSTALL_PREFIX=$PWD/install \
  -DINCLUDE_INSTALL_DIR=$PWD/install/include \
  && make install -j$(nproc)
\endcode

**CMake Configuration Parameters:**
<table class="manual">
<tr><th>Parameter</th><th>Expected Values</th><th>Description</th></tr>
<tr><td>\c EIGEN_BUILD_AOCL_BENCH </td><td>\c ON, \c OFF</td><td>Enable/disable AOCL benchmark compilation</td></tr>
<tr class="alt"><td>\c EIGEN_AOCL_BENCH_FLAGS </td><td>Compiler flags string</td><td>Additional compiler optimizations, e.g. \c "-O3 -mavx512f -fveclib=AMDLIBM"</td></tr>
<tr><td>\c EIGEN_AOCL_BENCH_USE_MT </td><td>\c ON, \c OFF</td><td>Use multi-threaded AOCL libraries (\c ON recommended for performance)</td></tr>
<tr class="alt"><td>\c EIGEN_AOCL_BENCH_ARCH </td><td>\c znver3, \c znver4, \c znver5, \c native, \c generic</td><td>Target AMD architecture (match your CPU generation)</td></tr>
<tr><td>\c CMAKE_BUILD_TYPE </td><td>\c Release, \c Debug, \c RelWithDebInfo</td><td>Build configuration (\c Release recommended for benchmarks)</td></tr>
<tr class="alt"><td>\c CMAKE_C_COMPILER </td><td>\c clang, \c gcc</td><td>C compiler (clang recommended for AOCL)</td></tr>
<tr><td>\c CMAKE_CXX_COMPILER </td><td>\c clang++, \c g++</td><td>C++ compiler (clang++ recommended for AOCL)</td></tr>
<tr class="alt"><td>\c CMAKE_INSTALL_PREFIX </td><td>Installation path</td><td>Where to install the Eigen headers</td></tr>
<tr><td>\c INCLUDE_INSTALL_DIR </td><td>Header path</td><td>Specific path for the Eigen headers</td></tr>
</table>
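As a usage sketch of the parameters above, one possible configuration builds the benchmark in \c Release mode
with the multi-threaded AOCL libraries enabled; the \c znver4 target and the install paths are illustrative
choices, not requirements:

\code
cmake .. -DEIGEN_BUILD_AOCL_BENCH=ON \
  -DEIGEN_AOCL_BENCH_USE_MT=ON \
  -DEIGEN_AOCL_BENCH_ARCH=znver4 \
  -DCMAKE_BUILD_TYPE=Release \
  -DCMAKE_C_COMPILER=clang \
  -DCMAKE_CXX_COMPILER=clang++ \
  -DCMAKE_INSTALL_PREFIX=$PWD/install \
  -DINCLUDE_INSTALL_DIR=$PWD/install/include \
  && make install -j$(nproc)
\endcode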
**Architecture Selection Guide:**
- \c znver3: AMD Zen 3 (EPYC 7003, Ryzen 5000 series)
- \c znver4: AMD Zen 4 (EPYC 9004, Ryzen 7000 series)
- \c znver5: AMD Zen 5 (EPYC 9005, Ryzen 9000 series)
- \c native: Auto-detect the current CPU architecture
- \c generic: Generic x86-64 without architecture-specific optimizations

**Custom Compiler Flags Explanation:**
- \c -O3: Maximum optimization level
- \c -mavx512f: Enable the AVX-512 instruction set (if supported by the CPU)
- \c -fveclib=AMDLIBM: Use AMD LibM for vectorized math functions

\subsection TopicUsingAOCL_Benchmark Building the AOCL Benchmark

After configuring Eigen, build the AOCL benchmark executable:

\code
cmake --build . --target benchmark_aocl -j$(nproc)
\endcode

This creates the \c benchmark_aocl executable, which demonstrates AOCL acceleration with various matrix sizes and operations.

**Running the Benchmark:**

\code
./benchmark_aocl
\endcode

The benchmark automatically compares:
- Eigen's native performance vs. AOCL-accelerated operations
- Matrix multiplication performance (BLIS vs. Eigen)
- Vector math function performance (LibM vs. Eigen)
- Memory bandwidth utilization and cache efficiency

\section TopicUsingAOCL_CMake CMake Integration

When using CMake, you can use a FindAOCL module:

\code
find_package(AOCL REQUIRED)
target_compile_definitions(my_target PRIVATE EIGEN_USE_AOCL_MT)
target_link_libraries(my_target PRIVATE AOCL::BLIS_MT AOCL::FLAME AOCL::LIBM)
\endcode

\section TopicUsingAOCL_Troubleshooting Troubleshooting

Common issues and solutions:
- **Link errors**: Ensure \c AOCL_ROOT is set and the AOCL libraries are in \c LD_LIBRARY_PATH
- **Performance not improved**: Verify that you are using matrices/vectors larger than the threshold
- **Thread contention**: Set \c OMP_NUM_THREADS to match your CPU core count
- **Architecture mismatch**: Use the appropriate \c -march flag for your AMD processor

\section TopicUsingAOCL_Links Links

- AMD AOCL can be downloaded for free from the AMD Developer Portal
- The AOCL User Guide and documentation are available on the AMD Developer Portal
- AOCL is also available through package managers and containerized environments

*/

}