mirror of
https://gitlab.com/libeigen/eigen.git
synced 2026-01-18 17:31:19 +01:00
/*
 Copyright (c) 2025, AMD Inc. All rights reserved.
 Redistribution and use in source and binary forms, with or without modification,
 are permitted provided that the following conditions are met:
 * Redistributions of source code must retain the above copyright notice, this
   list of conditions and the following disclaimer.
 * Redistributions in binary form must reproduce the above copyright notice,
   this list of conditions and the following disclaimer in the documentation
   and/or other materials provided with the distribution.
 * Neither the name of AMD nor the names of its contributors may
   be used to endorse or promote products derived from this software without
   specific prior written permission.
 THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
 ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
 WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
 DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
 ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
 (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
 LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
 ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
 (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
 SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 ********************************************************************************
 *   Content : Documentation on the use of AMD AOCL through Eigen
 ********************************************************************************
*/

namespace Eigen {

/** \page TopicUsingAOCL Using AMD® AOCL from %Eigen

With %Eigen 3.4 or later, users can benefit from built-in AMD® Optimizing CPU Libraries (AOCL) optimizations, provided that AOCL 5.0 (or later) is installed.

<a href="https://www.amd.com/en/developer/aocl.html">AMD AOCL</a> provides highly optimized, multi-threaded mathematical routines for x86-64 processors, with a focus on AMD "Zen"-based architectures. AOCL is available on Linux and Windows.

\note
AMD® AOCL is freely available software, but it is the responsibility of users to download and install it, and to ensure that their product's license allows linking to the AOCL libraries. AOCL is distributed under a permissive license that allows commercial use.

Using AMD AOCL through %Eigen is straightforward:
-# export \c AOCL_ROOT into your environment
-# define one of the AOCL macros before including any %Eigen headers (see the table below)
-# link your program to the AOCL libraries (BLIS, FLAME, LibM)
-# ensure your system supports the target architecture optimizations

When doing so, a number of %Eigen's algorithms are silently substituted with calls to AMD AOCL routines.
These substitutions apply only to \b Dynamic \b or \b large \b enough objects with one of the following standard scalar types: \c float, \c double, \c complex<float>, and \c complex<double>.
Operations on other scalar types, or operations mixing real and complex scalars, continue to use the built-in algorithms.

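The scalar-type rule can be pictured as a compile-time check. The following is only a sketch of the rule as stated, not %Eigen's actual dispatch code: the names \c is_aocl_scalar and \c can_use_aocl are hypothetical and exist only for this illustration.

```cpp
#include <complex>
#include <type_traits>

// Hypothetical trait, not part of Eigen: true only for the four scalar
// types listed in the paragraph above.
template <typename T>
struct is_aocl_scalar : std::false_type {};
template <> struct is_aocl_scalar<float> : std::true_type {};
template <> struct is_aocl_scalar<double> : std::true_type {};
template <> struct is_aocl_scalar<std::complex<float>> : std::true_type {};
template <> struct is_aocl_scalar<std::complex<double>> : std::true_type {};

// Hypothetical helper: a binary operation qualifies only if both operands
// share one supported scalar type (no mixing of real and complex scalars).
template <typename Lhs, typename Rhs>
constexpr bool can_use_aocl() {
  return std::is_same<Lhs, Rhs>::value && is_aocl_scalar<Lhs>::value;
}

static_assert(can_use_aocl<double, double>(), "double with double qualifies");
static_assert(!can_use_aocl<double, std::complex<double>>(), "mixed real/complex does not");
static_assert(!can_use_aocl<long double, long double>(), "long double is not in the list");
```

Under this reading, a product of two \c MatrixXd objects is a candidate for substitution, while a product of a \c MatrixXd and a \c MatrixXcd stays on the built-in path.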
The AOCL integration targets three core components:
- **BLIS**: High-performance BLAS implementation optimized for modern cache hierarchies
- **FLAME**: Dense linear algebra algorithms providing LAPACK functionality
- **LibM**: Optimized standard math routines with vectorized implementations

\section TopicUsingAOCL_Macros Configuration Macros

You can choose which parts will be substituted by defining one or more of the following macros:

<table class="manual">
<tr><td>\c EIGEN_USE_BLAS </td><td>Enables the use of external BLAS level 2 and 3 routines (AOCL-BLIS)</td></tr>
<tr class="alt"><td>\c EIGEN_USE_LAPACKE </td><td>Enables the use of external LAPACK routines via the LAPACKE C interface (AOCL-FLAME)</td></tr>
<tr><td>\c EIGEN_USE_LAPACKE_STRICT </td><td>Same as \c EIGEN_USE_LAPACKE, but algorithms of lower robustness are disabled. \n This currently concerns only JacobiSVD, which would be replaced by \c gesvd.</td></tr>
<tr class="alt"><td>\c EIGEN_USE_AOCL_VML </td><td>Enables the use of AOCL LibM vector math operations for coefficient-wise functions</td></tr>
<tr><td>\c EIGEN_USE_AOCL_ALL </td><td>Defines \c EIGEN_USE_BLAS, \c EIGEN_USE_LAPACKE, and \c EIGEN_USE_AOCL_VML</td></tr>
<tr class="alt"><td>\c EIGEN_USE_AOCL_MT </td><td>Equivalent to \c EIGEN_USE_AOCL_ALL, but ensures multi-threaded BLIS (\c libblis-mt) is used. \n \b Recommended for most applications.</td></tr>
</table>

\note The AOCL integration enables its vector math optimizations automatically once the matrix/vector size reaches \c EIGEN_AOCL_VML_THRESHOLD (default: 128 elements). For smaller operations, %Eigen's built-in vectorization may be faster because of the function call overhead.

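The size rule in the note amounts to a simple dispatch decision. The sketch below illustrates that idea in plain C++ and is not the code %Eigen uses: \c kVmlThreshold, \c choose_path, \c array_exp and \c exp_dispatch are hypothetical names, and the "library" routine is an ordinary loop so that the example has no external dependency.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Stand-in for EIGEN_AOCL_VML_THRESHOLD; 128 is the default quoted in the note.
constexpr std::size_t kVmlThreshold = 128;

enum class Path { BuiltIn, VectorLibrary };

// Hypothetical helper: the path an array of n coefficients would take, assuming
// that arrays of at least kVmlThreshold elements are handed to the library.
constexpr Path choose_path(std::size_t n) {
  return n >= kVmlThreshold ? Path::VectorLibrary : Path::BuiltIn;
}

// Placeholder for a library routine that processes a whole array per call.
inline void array_exp(const double* src, double* dst, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) dst[i] = std::exp(src[i]);
}

// Coefficient-wise exp with size-based dispatch; returns the path taken.
inline Path exp_dispatch(const std::vector<double>& in, std::vector<double>& out) {
  out.resize(in.size());
  const Path path = choose_path(in.size());
  if (path == Path::VectorLibrary) {
    array_exp(in.data(), out.data(), in.size());  // one call for the whole array
  } else {
    for (std::size_t i = 0; i < in.size(); ++i) out[i] = std::exp(in[i]);  // inline loop
  }
  return path;
}
```

Both branches compute the same values. The threshold only decides whether paying for one external call is worthwhile.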
\section TopicUsingAOCL_Performance Performance Considerations

The \c EIGEN_USE_BLAS and \c EIGEN_USE_LAPACKE macros can be combined with AOCL-specific optimizations:

- **Multi-threading**: Use \c EIGEN_USE_AOCL_MT to automatically select the multi-threaded BLIS library
- **Architecture targeting**: AOCL libraries are optimized for AMD Zen architectures (Zen, Zen2, Zen3, Zen4, Zen5)
- **Vector Math Library**: AOCL LibM provides vectorized implementations that process an entire array in a single call
- **Memory layout**: %Eigen's default column-major storage matches the data layout that AOCL expects, so data is passed without copying

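To see why no copy is needed, recall how a BLAS-style interface addresses a column-major matrix: a base pointer plus a leading dimension, which is the distance in elements between consecutive columns. The sketch below uses a plain buffer and hypothetical names (\c ColMajorView, \c at, \c block) rather than %Eigen or AOCL types. For a plain %Eigen matrix, the corresponding values are given by \c data() and \c outerStride().

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical non-owning view in the form a BLAS-style routine expects:
// base pointer, dimensions, and leading dimension (column stride in elements).
struct ColMajorView {
  const double* data;
  std::size_t rows;
  std::size_t cols;
  std::size_t lda;  // lda >= rows; lda > rows for a block of a larger matrix
};

// Element (i, j) of a column-major matrix lives at data[i + j * lda].
inline double at(const ColMajorView& m, std::size_t i, std::size_t j) {
  assert(i < m.rows && j < m.cols && m.lda >= m.rows);
  return m.data[i + j * m.lda];
}

// View of the rows x cols block whose top-left corner is (i0, j0).
// Only the pointer and the sizes change; the storage is shared, not copied.
inline ColMajorView block(const ColMajorView& m, std::size_t i0, std::size_t j0,
                          std::size_t rows, std::size_t cols) {
  assert(i0 + rows <= m.rows && j0 + cols <= m.cols);
  return ColMajorView{m.data + i0 + j0 * m.lda, rows, cols, m.lda};
}
```

A sub-block keeps the leading dimension of its parent, which is why a block of a larger matrix can also be passed by pointer.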
\section TopicUsingAOCL_Types Supported Data Types and Sizes

AOCL acceleration is applied to:
- **Scalar types**: \c float, \c double, \c complex<float>, \c complex<double>
- **Matrix/Vector sizes**: Dynamic size or compile-time size ≥ \c EIGEN_AOCL_VML_THRESHOLD
- **Storage order**: Both column-major (default) and row-major layouts
- **Memory alignment**: %Eigen's data pointers are directly compatible with AOCL function signatures

The current AOCL Vector Math Library integration is specialized for \c double precision; \c float operations automatically fall back to scalar implementations.

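Row-major data can be handed to a column-major interface without copying, because a row-major m x n buffer is element for element the column-major buffer of the n x m transpose. Whether and where the integration relies on this identity is not stated here. The sketch below only demonstrates the identity itself, with hypothetical helper names and plain buffers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major addressing: element (i, j) of a matrix with `cols` columns.
inline double row_major_at(const std::vector<double>& buf, std::size_t cols,
                           std::size_t i, std::size_t j) {
  assert(i * cols + j < buf.size());
  return buf[i * cols + j];
}

// Column-major addressing: element (i, j) of a matrix with `rows` rows.
inline double col_major_at(const std::vector<double>& buf, std::size_t rows,
                           std::size_t i, std::size_t j) {
  assert(i + j * rows < buf.size());
  return buf[i + j * rows];
}

// Checks that reading a row-major m x n buffer as a column-major n x m matrix
// yields the transpose: A(i, j) == At(j, i) for every i and j.
inline bool is_transpose_alias(const std::vector<double>& buf, std::size_t m, std::size_t n) {
  for (std::size_t i = 0; i < m; ++i)
    for (std::size_t j = 0; j < n; ++j)
      if (row_major_at(buf, n, i, j) != col_major_at(buf, n, j, i)) return false;
  return true;
}
```

Both accessors resolve to the same offset, <tt>i * n + j</tt>, so the check succeeds for any buffer of size m * n.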
\section TopicUsingAOCL_Functions Vector Math Functions

The following table summarizes coefficient-wise operations accelerated by \c EIGEN_USE_AOCL_VML:

<table class="manual">
<tr><th>Code example</th><th>AOCL routines</th></tr>
<tr><td>\code
v2 = v1.array().exp();
v2 = v1.array().sin();
v2 = v1.array().cos();
v2 = v1.array().tan();
v2 = v1.array().log();
v2 = v1.array().log10();
v2 = v1.array().log2();
v2 = v1.array().sqrt();
v2 = v1.array().pow(1.5);
v2 = v1.array() + v2.array();
\endcode</td><td>\code
amd_vrda_exp
amd_vrda_sin
amd_vrda_cos
amd_vrda_tan
amd_vrda_log
amd_vrda_log10
amd_vrda_log2
amd_vrda_sqrt
amd_vrda_pow
amd_vrda_add
\endcode</td></tr>
</table>

In the examples, v1 and v2 are dense vectors of type \c VectorXd with size ≥ \c EIGEN_AOCL_VML_THRESHOLD.

\section TopicUsingAOCL_Example Complete Example

\code
#define EIGEN_USE_AOCL_MT
#include <iostream>
#include <Eigen/Dense>

int main() {
  const int n = 2048;

  // Large matrices automatically use AOCL-BLIS for multiplication
  Eigen::MatrixXd A = Eigen::MatrixXd::Random(n, n);
  Eigen::MatrixXd B = Eigen::MatrixXd::Random(n, n);
  Eigen::MatrixXd C = A * B;  // Dispatched to dgemm

  // Large vectors automatically use AOCL LibM for math functions
  Eigen::VectorXd v = Eigen::VectorXd::LinSpaced(10000, 0, 10);
  Eigen::VectorXd result = v.array().sin();  // Dispatched to amd_vrda_sin

  // LAPACK decompositions use AOCL-FLAME.
  // Cholesky requires a symmetric positive-definite input, so build one from A.
  Eigen::MatrixXd S = A * A.transpose() + static_cast<double>(n) * Eigen::MatrixXd::Identity(n, n);
  Eigen::LLT<Eigen::MatrixXd> llt(S);  // Dispatched to dpotrf

  std::cout << "Matrix norm: " << C.norm() << std::endl;
  std::cout << "Vector result norm: " << result.norm() << std::endl;
  std::cout << "Cholesky succeeded: " << (llt.info() == Eigen::Success) << std::endl;

  return 0;
}
\endcode

\section TopicUsingAOCL_Building Building and Linking

To compile with AOCL support, set the \c AOCL_ROOT environment variable and link against the required libraries:

\code
export AOCL_ROOT=/path/to/aocl
clang++ -O3 -g -DEIGEN_USE_AOCL_ALL \
    -I./install/include -I${AOCL_ROOT}/include \
    -Wno-parentheses my_app.cpp \
    -L${AOCL_ROOT} -lamdlibm -lflame -lblis \
    -lpthread -lrt -lm -lomp \
    -o eigen_aocl_example
\endcode

For multi-threaded performance, use the multi-threaded BLIS library:
\code
clang++ -O3 -g -DEIGEN_USE_AOCL_MT \
    -I./install/include -I${AOCL_ROOT}/include \
    -Wno-parentheses my_app.cpp \
    -L${AOCL_ROOT} -lamdlibm -lflame -lblis-mt \
    -lpthread -lrt -lm -lomp \
    -o eigen_aocl_example
\endcode

Key compiler and linker flags:
- \c -DEIGEN_USE_AOCL_ALL: Enable all AOCL accelerations (BLAS, LAPACK, VML)
- \c -DEIGEN_USE_AOCL_MT: Enable multi-threaded version (uses \c -lblis-mt)
- \c -lblis: Single-threaded BLIS library
- \c -lblis-mt: Multi-threaded BLIS library (recommended for performance)
- \c -lflame: FLAME LAPACK implementation
- \c -lamdlibm: AMD LibM vector math library
- \c -lomp: OpenMP runtime for multi-threading support
- \c -lpthread -lrt: System threading and real-time libraries
- \c -Wno-parentheses: Suppress common warnings when using AOCL headers

\subsection TopicUsingAOCL_EigenBuild Building %Eigen with AOCL Support

To build %Eigen with AOCL support, use the following CMake configuration:

\code
cmake .. -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_C_COMPILER=clang \
    -DCMAKE_CXX_COMPILER=clang++ \
    -DCMAKE_INSTALL_PREFIX=$PWD/install \
    -DINCLUDE_INSTALL_DIR=$PWD/install/include \
    && make install -j$(nproc)
\endcode

To build %Eigen with AOCL integration and benchmarking capabilities, use the following CMake configuration:

\code
cmake .. -DEIGEN_BUILD_AOCL_BENCH=ON \
    -DEIGEN_AOCL_BENCH_FLAGS="-O3 -mavx512f -fveclib=AMDLIBM" \
    -DEIGEN_AOCL_BENCH_USE_MT=OFF \
    -DEIGEN_AOCL_BENCH_ARCH=znver5 \
    -DCMAKE_BUILD_TYPE=Debug \
    -DCMAKE_C_COMPILER=clang \
    -DCMAKE_CXX_COMPILER=clang++ \
    -DCMAKE_INSTALL_PREFIX=$PWD/install \
    -DINCLUDE_INSTALL_DIR=$PWD/install/include \
    && make install -j$(nproc)
\endcode

**CMake Configuration Parameters:**

<table class="manual">
<tr><th>Parameter</th><th>Expected Values</th><th>Description</th></tr>
<tr><td>\c EIGEN_BUILD_AOCL_BENCH</td><td>\c ON, \c OFF</td><td>Enable/disable AOCL benchmark compilation</td></tr>
<tr class="alt"><td>\c EIGEN_AOCL_BENCH_FLAGS</td><td>Compiler flags string</td><td>Additional compiler flags, for example \c "-O3 -mavx512f -fveclib=AMDLIBM"</td></tr>
<tr><td>\c EIGEN_AOCL_BENCH_USE_MT</td><td>\c ON, \c OFF</td><td>Use multi-threaded AOCL libraries (\c ON recommended for performance)</td></tr>
<tr class="alt"><td>\c EIGEN_AOCL_BENCH_ARCH</td><td>\c znver3, \c znver4, \c znver5, \c native, \c generic</td><td>Target AMD architecture (match your CPU generation)</td></tr>
<tr><td>\c CMAKE_BUILD_TYPE</td><td>\c Release, \c Debug, \c RelWithDebInfo</td><td>Build configuration (\c Release recommended for benchmarks)</td></tr>
<tr class="alt"><td>\c CMAKE_C_COMPILER</td><td>\c clang, \c gcc</td><td>C compiler (clang recommended for AOCL)</td></tr>
<tr><td>\c CMAKE_CXX_COMPILER</td><td>\c clang++, \c g++</td><td>C++ compiler (clang++ recommended for AOCL)</td></tr>
<tr class="alt"><td>\c CMAKE_INSTALL_PREFIX</td><td>Installation path</td><td>Where to install %Eigen headers</td></tr>
<tr><td>\c INCLUDE_INSTALL_DIR</td><td>Header path</td><td>Specific path for %Eigen headers</td></tr>
</table>

**Architecture Selection Guide:**
- \c znver3: AMD Zen 3 (EPYC 7003, Ryzen 5000 series)
- \c znver4: AMD Zen 4 (EPYC 9004, Ryzen 7000 series)
- \c znver5: AMD Zen 5 (EPYC 9005, Ryzen 9000 series)
- \c native: Auto-detect current CPU architecture
- \c generic: Generic x86-64 without specific optimizations

**Custom Compiler Flags Explanation:**
- \c -O3: Maximum optimization level
- \c -mavx512f: Enable the AVX-512 Foundation instruction set (only on CPUs that support it, such as Zen 4 and Zen 5)
- \c -fveclib=AMDLIBM: Use AMD LibM for vectorized math functions

\subsection TopicUsingAOCL_Benchmark Building the AOCL Benchmark

After configuring %Eigen, build the AOCL benchmark executable:

\code
cmake --build . --target benchmark_aocl -j$(nproc)
\endcode

This creates the \c benchmark_aocl executable, which demonstrates AOCL acceleration with various matrix sizes and operations.

**Running the Benchmark:**
\code
./benchmark_aocl
\endcode

The benchmark automatically compares:
- %Eigen's native performance vs AOCL-accelerated operations
- Matrix multiplication performance (BLIS vs %Eigen)
- Vector math functions performance (LibM vs %Eigen)
- Memory bandwidth utilization and cache efficiency

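For a quick comparison of your own, independent of \c benchmark_aocl, a small timing helper is often enough. The sketch below is not taken from the benchmark sources: \c best_time_seconds and \c gemm_gflops are hypothetical names, and the approach (best of several runs, 2n³ flops for an n x n x n product) is a common convention rather than the benchmark's documented method.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <limits>

// Hypothetical helper: runs `work` `repeats` times and returns the fastest
// wall-clock time in seconds. Taking the minimum reduces noise from other processes.
template <typename Work>
double best_time_seconds(Work&& work, int repeats) {
  assert(repeats > 0);
  double best = std::numeric_limits<double>::infinity();
  for (int r = 0; r < repeats; ++r) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    best = std::min(best, std::chrono::duration<double>(stop - start).count());
  }
  return best;
}

// Throughput of an n x n x n matrix product, which costs about 2 n^3 flops.
inline double gemm_gflops(int n, double seconds) {
  assert(n > 0 && seconds > 0.0);
  return 2.0 * n * n * n / (seconds * 1e9);
}
```

With %Eigen available, the same helper could time a statement such as <tt>C.noalias() = A * B;</tt> in two builds of one program, one compiled with \c EIGEN_USE_AOCL_MT defined and one without.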
\section TopicUsingAOCL_CMake CMake Integration

When using CMake, you can use a FindAOCL module:

\code
find_package(AOCL REQUIRED)
target_compile_definitions(my_target PRIVATE EIGEN_USE_AOCL_MT)
target_link_libraries(my_target PRIVATE AOCL::BLIS_MT AOCL::FLAME AOCL::LIBM)
\endcode

\section TopicUsingAOCL_Troubleshooting Troubleshooting

Common issues and solutions:

- **Link errors**: Ensure \c AOCL_ROOT is set and points to your AOCL installation; at run time, the shared libraries must also be found through \c LD_LIBRARY_PATH
- **Performance not improved**: Verify that your matrices/vectors are larger than the size threshold (\c EIGEN_AOCL_VML_THRESHOLD for vector math)
- **Thread contention**: Set \c OMP_NUM_THREADS to match your CPU core count
- **Architecture mismatch**: Use the appropriate \c -march flag for your AMD processor (for example \c -march=znver4)

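When thread contention is suspected, it helps to print what the process actually sees. The following diagnostic is a sketch with hypothetical helper names (\c parse_thread_count, \c thread_report) and uses only the C++ standard library; it does not query BLIS or OpenMP. Note that \c std::thread::hardware_concurrency() counts logical CPUs, which can exceed the number of physical cores when SMT is enabled.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <thread>

// Hypothetical helper: parses a thread-count string such as the value of
// OMP_NUM_THREADS. Returns 0 if the text is missing or is not a single
// positive integer (for example a nested list like "4,2").
inline unsigned parse_thread_count(const char* text) {
  if (text == nullptr || *text == '\0') return 0;
  char* end = nullptr;
  const long value = std::strtol(text, &end, 10);
  assert(end != nullptr);  // strtol always sets the end pointer
  if (*end != '\0' || value <= 0) return 0;
  return static_cast<unsigned>(value);
}

// Builds a one-line report comparing the requested and available thread counts.
inline std::string thread_report() {
  const unsigned requested = parse_thread_count(std::getenv("OMP_NUM_THREADS"));
  // hardware_concurrency() may return 0 if the value cannot be determined.
  const unsigned logical = std::thread::hardware_concurrency();
  std::string msg = "OMP_NUM_THREADS=";
  msg += requested ? std::to_string(requested) : std::string("(not a single positive number)");
  msg += ", logical CPUs=" + std::to_string(logical);
  if (requested != 0 && logical != 0 && requested > logical) msg += " [oversubscribed]";
  return msg;
}
```

Printing \c thread_report() at program start makes it easy to spot a setting that requests more threads than the machine provides.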
\section TopicUsingAOCL_Links Links

- AMD AOCL can be downloaded free of charge from the <a href="https://www.amd.com/en/developer/aocl.html">AMD AOCL page</a>
- The AOCL User Guide and documentation are available on the AMD Developer Portal
- AOCL is also available through package managers and containerized environments

*/

}