Add a simd library for SSE / NEON / WASM_SIMD

Adds a "simd" namespace with vector utilities (SSE, NEON, WASM_SIMD), and implements RawPath::bounds in SIMD.

This namespace relies exclusively on the latest clang vector extensions, so from here on, Rive needs to be built with a recent clang. On our own bots, we had to bump the Android builder to NDK r25b, the Ubuntu builder to 22.04, and the MacOS builder to macos-12.

RawPathBounds bench result:

MacOS: 2.37ms -> .579ms (4.1x)
Windows: 3.53ms -> 1.68ms (2.1x)

Windows SSE can be optimized down to .927ms (3.8x) by forcing the SSE min/max instructions, but they have different behavior with NaN, which is why clang doesn't use them directly, so it seems like an over-optimization at this point.

Diffs=
986b49674 Add a simd library for SSE / NEON / WASM_SIMD (#4199)
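
The NaN caveat in the message can be illustrated with a scalar sketch. It assumes the documented SSE MINPS/MINSS rule (the second operand is returned whenever either input is NaN) and uses `std::fmin` as a stand-in for the "return whichever is not NaN" behavior that this change documents for `simd::min`. Both helper names are hypothetical and are not part of this commit.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical helper: mirrors the SSE MINPS/MINSS rule. When either operand
// is NaN the comparison is false, so the second operand is returned.
inline float sse_style_min(float a, float b) { return a < b ? a : b; }

// Hypothetical helper: ignores a single NaN operand, like std::fmin.
inline float nan_ignoring_min(float a, float b) { return std::fmin(a, b); }

// Returns true when the two rules disagree for the given operands.
inline bool rules_disagree(float a, float b) {
    const float s = sse_style_min(a, b);
    const float n = nan_ignoring_min(a, b);
    return std::isnan(s) != std::isnan(n) || (!std::isnan(s) && s != n);
}
```

Under these assumptions the two rules only diverge when the second operand is NaN, which is the case a compiler has to patch up with extra instructions if it wants NaN-ignoring semantics on SSE.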
2026-01-18 21:21:17 +01:00 · 2022-08-30 22:54:10 +00:00
parent f7ac0f9ed1
commit fe9112bbe8
6 changed files with 299 additions and 22 deletions
--- a/.rive_head
+++ b/.rive_head
@@ -1 +1 @@
-ba0406a89050a0bd7833bcbf8805daca711de625
+986b4967488b06539f41dad2cb1e7711eb51ab08
--- a/README.md
+++ b/README.md
@@ -12,13 +12,15 @@ C++ runtime for [Rive](https://rive.app). Provides these runtime features:
 - Example concrete renderer written in C++ with [Skia](https://skia.org/). Skia renderer code is in [skia/renderer/src/skia_renderer.cpp](skia/renderer/src/skia_renderer.cpp).

 ## Build System
-We use [premake5](https://premake.github.io/). The Rive dev team primarily works on MacOS. There is some work done by the community to also support Windows and Linux. PRs welcomed for specific platforms you with to support! We encourage you to use premake as it's highly extensible and configurable for a variety of platforms.
+We use [premake5](https://premake.github.io/). The Rive dev team primarily works on MacOS. There is some work done by the community to also support Windows and Linux. PRs welcomed for specific platforms you wish to support! We encourage you to use premake as it's highly extensible and configurable for a variety of platforms.

 ## Build
 In the ```rive-cpp``` directory, run ```build.sh``` to debug build and ```build.sh release``` for a release build.

 If you've put the `premake5` executable in the `rive-cpp/build` folder, you can run it with `PATH=.:$PATH ./build.sh`

+Rive makes use of clang [vector builtins](https://reviews.llvm.org/D111529), which are, as of 2022, still a work in progress. Please use clang and ensure you have the latest version.
+
 ## Building skia projects
 ```
 cd skia/dependencies
--- a/include/rive/math/simd.hpp
+++ b/include/rive/math/simd.hpp
@@ -0,0 +1,111 @@
+/*
+ * Copyright 2022 Rive
+ */
+
+// An SSE / NEON / WASM_SIMD library based on clang vector types.
+//
+// This header makes use of the clang vector builtins specified in https://reviews.llvm.org/D111529.
+// This effort in clang is still a work in progress, and compiling this header requires an
+// extremely recent version of clang.
+//
+// To explore the codegen from this header, paste it into https://godbolt.org/, select a recent
+// clang compiler, and add an -O3 flag.
+
+#ifndef _RIVE_SIMD_HPP_
+#define _RIVE_SIMD_HPP_
+
+#include <stdint.h>
+
+#define SIMD_ALWAYS_INLINE inline __attribute__((always_inline))
+
+namespace rive {
+namespace simd {
+
+// The GLSL spec uses "gvec" to denote a vector of unspecified type.
+template <typename T, int N>
+using gvec = T __attribute__((ext_vector_type(N))) __attribute__((aligned(sizeof(T) * N)));
+
+////// Math //////
+
+// Similar to std::min(), with a noteworthy difference:
+// If a[i] or b[i] is NaN and the other is not, returns whichever is _not_ NaN.
+template <typename T, int N> SIMD_ALWAYS_INLINE gvec<T, N> min(gvec<T, N> a, gvec<T, N> b) {
+    return __builtin_elementwise_min(a, b);
+}
+
+// Similar to std::max(), with a noteworthy difference:
+// If a[i] or b[i] is NaN and the other is not, returns whichever is _not_ NaN.
+template <typename T, int N> SIMD_ALWAYS_INLINE gvec<T, N> max(gvec<T, N> a, gvec<T, N> b) {
+    return __builtin_elementwise_max(a, b);
+}
+
+// Returns the absolute value of x per element, with one exception:
+// If x[i] is an integer type and equal to the minimum representable value, returns x[i].
+template <typename T, int N> SIMD_ALWAYS_INLINE gvec<T, N> abs(gvec<T, N> x) {
+    return __builtin_elementwise_abs(x);
+}
+
+////// Boolean logic //////
+//
+// Vector booleans are of type int32_t, where true is ~0 and false is 0. Vector booleans can be
+// generated using the builtin boolean operators: ==, !=, <=, >=, <, >
+//
+
+// Returns true if any element in x is nonzero (i.e., true).
+template <int N> SIMD_ALWAYS_INLINE bool any(gvec<int32_t, N> x) {
+    // This particular logic structure gets decent codegen in clang.
+    // TODO: __builtin_reduce_or(x) once it's implemented in the compiler.
+    for (int i = 0; i < N; ++i) {
+        if (x[i])
+            return true;
+    }
+    return false;
+}
+
+// Returns true if all elements in x are equal to ~0.
+template <int N> SIMD_ALWAYS_INLINE bool all(gvec<int32_t, N> x) {
+    // In vector, true is represented by -1 exactly, so we use ~x for "not".
+    // TODO: __builtin_reduce_and(x) once it's implemented in the compiler.
+    return !any(~x);
+}
+
+////// Loading and storing //////
+
+template <typename T, int N> SIMD_ALWAYS_INLINE gvec<T, N> load(const T* ptr) {
+    gvec<T, N> vec;
+    __builtin_memcpy(&vec, ptr, sizeof(vec));
+    return vec;
+}
+SIMD_ALWAYS_INLINE gvec<float, 2> load2f(const float* ptr) { return load<float, 2>(ptr); }
+SIMD_ALWAYS_INLINE gvec<float, 4> load4f(const float* ptr) { return load<float, 4>(ptr); }
+SIMD_ALWAYS_INLINE gvec<int32_t, 2> load2i(const int32_t* ptr) { return load<int32_t, 2>(ptr); }
+SIMD_ALWAYS_INLINE gvec<int32_t, 4> load4i(const int32_t* ptr) { return load<int32_t, 4>(ptr); }
+SIMD_ALWAYS_INLINE gvec<uint32_t, 2> load2ui(const uint32_t* ptr) { return load<uint32_t, 2>(ptr); }
+SIMD_ALWAYS_INLINE gvec<uint32_t, 4> load4ui(const uint32_t* ptr) { return load<uint32_t, 4>(ptr); }
+
+template <typename T, int N> SIMD_ALWAYS_INLINE void store(T* ptr, gvec<T, N> vec) {
+    __builtin_memcpy(ptr, &vec, sizeof(vec));
+}
+
+} // namespace simd
+} // namespace rive
+
+#undef SIMD_ALWAYS_INLINE
+
+namespace rive {
+
+template <int N> using vec = simd::gvec<float, N>;
+using float2 = vec<2>;
+using float4 = vec<4>;
+
+template <int N> using ivec = simd::gvec<int32_t, N>;
+using int2 = ivec<2>;
+using int4 = ivec<4>;
+
+template <int N> using uvec = simd::gvec<uint32_t, N>;
+using uint2 = uvec<2>;
+using uint4 = uvec<4>;
+
+} // namespace rive
+
+#endif
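
The boolean convention in simd.hpp (true is ~0, false is 0) and the `!any(~x)` formulation of `all` can be sketched without clang vector types. This is a portable approximation that uses `std::array` in place of `gvec<int32_t, N>`; the `sketch` namespace and the `Mask` alias are hypothetical and exist only for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

namespace sketch {

// Stand-in for gvec<int32_t, N>: one int32_t per lane, true == ~0, false == 0.
template <std::size_t N> using Mask = std::array<int32_t, N>;

// True if at least one lane is nonzero.
template <std::size_t N> bool any(const Mask<N>& m) {
    for (int32_t lane : m) {
        if (lane)
            return true;
    }
    return false;
}

// True only if every lane is exactly ~0: inverting such a lane gives 0,
// while inverting any other value leaves at least one bit set.
template <std::size_t N> bool all(const Mask<N>& m) {
    Mask<N> inverted;
    for (std::size_t i = 0; i < N; ++i) {
        inverted[i] = ~m[i];
    }
    return !any(inverted);
}

} // namespace sketch
```

One consequence of this formulation is that a lane holding 1 counts as true for `any` but not for `all`, so masks are best produced by the vector comparison operators rather than built by hand.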
--- a/src/math/raw_path.cpp
+++ b/src/math/raw_path.cpp
@@ -3,11 +3,14 @@
 */

 #include "rive/math/raw_path.hpp"
+
+#include "rive/command_path.hpp"
+#include "rive/math/simd.hpp"
 #include <cmath>
 #include <cstring>
 #include <algorithm>

-using namespace rive;
+namespace rive {

 RawPath::RawPath(Span<const Vec2D> points, Span<const PathVerb> verbs) :
    m_Points(points.begin(), points.end()), m_Verbs(verbs.begin(), verbs.end()) {}
@@ -17,22 +20,24 @@ bool RawPath::operator==(const RawPath& o) const {
 }

 AABB RawPath::bounds() const {
-    if (this->empty()) {
-        return {0, 0, 0, 0};
+    float4 mins, maxes;
+    size_t i;
+    if (m_Points.size() & 1) {
+        mins = maxes = simd::load2f(&m_Points[0].x).xyxy;
+        i = 1;
+    } else {
+        mins = maxes = m_Points.empty() ? float4{0, 0, 0, 0} : simd::load4f(&m_Points[0].x);
+        i = 2;
    }
-
-    float l, t, r, b;
-    l = r = m_Points[0].x;
-    t = b = m_Points[0].y;
-    for (size_t i = 1; i < m_Points.size(); ++i) {
-        const float x = m_Points[i].x;
-        const float y = m_Points[i].y;
-        l = std::min(l, x);
-        r = std::max(r, x);
-        t = std::min(t, y);
-        b = std::max(b, y);
+    for (; i < m_Points.size(); i += 2) {
+        float4 pts = simd::load4f(&m_Points[i].x);
+        mins = simd::min(mins, pts);
+        maxes = simd::max(maxes, pts);
    }
-    return {l, t, r, b};
+    AABB bounds;
+    simd::store(&bounds.minX, simd::min(mins.xy, mins.zw));
+    simd::store(&bounds.maxX, simd::max(maxes.xy, maxes.zw));
+    return bounds;
 }

 void RawPath::move(Vec2D a) {
@@ -187,7 +192,6 @@ void RawPath::addPath(const RawPath& src, const Mat2D* mat) {

 //////////////////////////////////////////////////////////////////////////

-namespace rive {
 int path_verb_to_point_count(PathVerb v) {
    static uint8_t ptCounts[] = {
        1, // move
@@ -201,7 +205,6 @@ int path_verb_to_point_count(PathVerb v) {
    assert(index < sizeof(ptCounts));
    return ptCounts[index];
 }
-} // namespace rive

 RawPath::Iter::Rec RawPath::Iter::next() {
    // initialize with "false"
@@ -237,8 +240,6 @@ void RawPath::rewind() {

 ///////////////////////////////////

-#include "rive/command_path.hpp"
-
 void RawPath::addTo(CommandPath* result) const {
    RawPath::Iter iter(*this);
    while (auto rec = iter.next()) {
@@ -251,3 +252,5 @@ void RawPath::addTo(CommandPath* result) const {
        }
    }
 }
+
+} // namespace rive
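
The pairwise structure of the new `RawPath::bounds` can be followed in a scalar sketch that emulates the four float lanes with `std::array`. The `Point`, `Box`, and `bounds_pairwise` names are hypothetical stand-ins for `Vec2D`, `AABB`, and the member function; the lane layout {x0, y0, x1, y1} is assumed from the way the diff loads two consecutive points at once.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

struct Point { float x, y; };
struct Box { float minX, minY, maxX, maxY; };

// Scalar emulation of the two-points-per-iteration bounds loop.
inline Box bounds_pairwise(const std::vector<Point>& pts) {
    using Lanes = std::array<float, 4>; // {x0, y0, x1, y1}
    Lanes mins{0, 0, 0, 0};
    Lanes maxes{0, 0, 0, 0};
    std::size_t i = 0;
    if (pts.size() & 1) {
        // Odd count: seed both halves with the first point (the ".xyxy" swizzle).
        mins = maxes = Lanes{pts[0].x, pts[0].y, pts[0].x, pts[0].y};
        i = 1;
    } else if (!pts.empty()) {
        // Even count: seed with the first two points.
        mins = maxes = Lanes{pts[0].x, pts[0].y, pts[1].x, pts[1].y};
        i = 2;
    }
    // The remaining count is always even, so pts[i + 1] is in range.
    for (; i < pts.size(); i += 2) {
        const Lanes lanes{pts[i].x, pts[i].y, pts[i + 1].x, pts[i + 1].y};
        for (std::size_t k = 0; k < 4; ++k) {
            mins[k] = std::min(mins[k], lanes[k]);
            maxes[k] = std::max(maxes[k], lanes[k]);
        }
    }
    // Fold the two halves together (the ".xy" versus ".zw" reduction).
    return {std::min(mins[0], mins[2]), std::min(mins[1], mins[3]),
            std::max(maxes[0], maxes[2]), std::max(maxes[1], maxes[3])};
}
```

Seeding from the first one or two points, rather than from infinities, keeps the empty path at {0, 0, 0, 0} and leaves an even number of points for the main loop.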
--- a/test/raw_path_test.cpp
+++ b/test/raw_path_test.cpp
@@ -7,6 +7,7 @@

 #include <catch.hpp>
 #include <cstdio>
+#include <limits>

 using namespace rive;

@@ -319,4 +320,37 @@ TEST_CASE("factory", "[rawpath]") {
    path1.close();

    REQUIRE(path0 == path1);
-}
+}
+
+TEST_CASE("bounds", "[rawpath]") {
+    RawPath path;
+    AABB bounds;
+    srand(0);
+    const auto randPt = [&] {
+        Vec2D pt = Vec2D(float(rand()), float(rand())) / (float(RAND_MAX) * .5f) - Vec2D(1, 1);
+        bounds.minX = std::min(bounds.minX, pt.x);
+        bounds.minY = std::min(bounds.minY, pt.y);
+        bounds.maxX = std::max(bounds.maxX, pt.x);
+        bounds.maxY = std::max(bounds.maxY, pt.y);
+        return pt;
+    };
+    for (int numVerbs = 1; numVerbs < 1 << 16; numVerbs <<= 1) {
+        path.rewind();
+        bounds.minX = bounds.minY = std::numeric_limits<float>::infinity();
+        bounds.maxX = bounds.maxY = -std::numeric_limits<float>::infinity();
+        for (int i = 0; i < numVerbs; ++i) {
+            switch (rand() % 5) {
+                case 0: path.move(randPt()); break;
+                case 1: path.line(randPt()); break;
+                case 2: path.quad(randPt(), randPt()); break;
+                case 3: path.cubic(randPt(), randPt(), randPt()); break;
+                case 4: path.close(); break;
+            }
+        }
+        AABB pathBounds = path.bounds();
+        REQUIRE(pathBounds.minX == bounds.minX);
+        REQUIRE(pathBounds.minY == bounds.minY);
+        REQUIRE(pathBounds.maxX == bounds.maxX);
+        REQUIRE(pathBounds.maxY == bounds.maxY);
+    }
+}
--- a/test/simd_test.cpp
+++ b/test/simd_test.cpp
@@ -0,0 +1,127 @@
+/*
+ * Copyright 2022 Rive
+ */
+
+#include <catch.hpp>
+
+#include "rive/math/simd.hpp"
+#include <limits>
+
+namespace rive {
+
+constexpr float kInf = std::numeric_limits<float>::infinity();
+constexpr float kNaN = std::numeric_limits<float>::quiet_NaN();
+
+// Verify the simd float types are IEEE 754 compliant for infinity and NaN.
+TEST_CASE("ieee-compliance", "[simd]") {
+    float4 test = float4{1, -kInf, 1, 4} / float4{0, 2, kInf, 4};
+    CHECK(simd::all(test == float4{kInf, -kInf, 0, 1}));
+
+    // Inf * Inf == Inf
+    test = float4{kInf, -kInf, kInf, -kInf} * float4{kInf, kInf, -kInf, -kInf};
+    CHECK(simd::all(test == float4{kInf, -kInf, -kInf, kInf}));
+
+    // Inf/0 == Inf, 0/Inf == 0
+    test = float4{kInf, -kInf, 0, 0} / float4{0, 0, kInf, -kInf};
+    CHECK(simd::all(test == float4{kInf, -kInf, 0, 0}));
+
+    // Inf/Inf, 0/0, 0 * Inf, Inf - Inf == NaN
+    test = {kInf, 0, 0, kInf};
+    test.xy /= float2{kInf, 0};
+    test.z *= kInf;
+    test.w -= kInf;
+    for (int i = 0; i < 4; ++i) {
+        CHECK(std::isnan(test[i]));
+    }
+    // NaN always fails comparisons.
+    CHECK(!simd::any(test == test));
+    CHECK(simd::all(test != test));
+    CHECK(!simd::any(test <= test));
+    CHECK(!simd::any(test >= test));
+    CHECK(!simd::any(test < test));
+    CHECK(!simd::any(test > test));
+
+    // Inf + Inf == Inf, Inf + -Inf == NaN
+    test = float4{kInf, -kInf, kInf, -kInf} + float4{kInf, -kInf, -kInf, kInf};
+    CHECK(simd::all(test.xy == float2{kInf, -kInf}));
+    CHECK(!simd::any(test.zw == test.zw)); // NaN
+}
+
+// Check that ?: works on vector and scalar conditions.
+TEST_CASE("ternary-operator", "[simd]") {
+    // Vector condition.
+    float4 f4 = int4{1, 2, 3, 4} < int4{4, 3, 2, 1} ? float4(-1) : 1.f;
+    CHECK(simd::all(f4 == float4{-1, -1, 1, 1}));
+
+    // In vector, -1 is true, 0 is false.
+    uint2 u2 = int2{0, -1} ? uint2{1, 2} : uint2{3, 4};
+    CHECK(simd::all(u2 == uint2{3, 2}));
+
+    // Scalar condition.
+    f4 = u2.x == u2.y ? float4{1, 2, 3, 4} : float4{5, 6, 7, 8};
+    CHECK(simd::all(f4 == float4{5, 6, 7, 8}));
+}
+
+// Check simd::min/max compliance.
+TEST_CASE("min-max", "[simd]") {
+    float4 f4 = simd::min(float4{1, 2, 3, 4}, float4{4, 3, 2});
+    CHECK(simd::all(f4 == float4{1, 2, 2, 0}));
+    f4 = simd::max(float4{1, 2, 3, 4}, float4{4, 3, 2});
+    CHECK(simd::all(f4 == float4{4, 3, 3, 4}));
+
+    int2 i2 = simd::max(int2(-1), int2{-2});
+    CHECK(simd::all(i2 == int2{-1, 0}));
+    i2 = simd::min(int2(-1), int2{-2});
+    CHECK(simd::all(i2 == int2{-2, -1}));
+
+    // Infinity works as expected.
+    f4 = simd::min(float4{100, -kInf, -kInf, kInf}, float4{kInf, 100, kInf, -kInf});
+    CHECK(simd::all(f4 == float4{100, -kInf, -kInf, -kInf}));
+    f4 = simd::max(float4{100, -kInf, -kInf, kInf}, float4{kInf, 100, kInf, -kInf});
+    CHECK(simd::all(f4 == float4{kInf, 100, kInf, kInf}));
+
+    // If a or b is NaN, min returns whichever is not NaN.
+    f4 = simd::min(float4{1, kNaN, 2, kNaN}, float4{kNaN, 1, 1, kNaN});
+    CHECK(simd::all(f4.xyz == 1));
+    CHECK(std::isnan(f4.w));
+    f4 = simd::max(float4{1, kNaN, 2, kNaN}, float4{kNaN, 1, 1, kNaN});
+    CHECK(simd::all(f4.xyz == vec<3>{1, 1, 2}));
+    CHECK(std::isnan(f4.w));
+
+    // simd::min/max differs from std::min/max when the first argument is NaN.
+    CHECK(simd::min<float, 1>(kNaN, 1).x == 1);
+    CHECK(std::isnan(std::min<float>(kNaN, 1)));
+    CHECK(simd::max<float, 1>(kNaN, 1).x == 1);
+    CHECK(std::isnan(std::max<float>(kNaN, 1)));
+
+    // simd::min/max is equivalent to std::min/max when the second argument is NaN.
+    CHECK(simd::min<float, 1>(1, kNaN).x == std::min<float>(1, kNaN));
+    CHECK(simd::max<float, 1>(1, kNaN).x == std::max<float>(1, kNaN));
+}
+
+// Check simd::abs.
+TEST_CASE("abs", "[simd]") {
+
+    CHECK(simd::all(simd::abs(float4{-1, 2, -3, 4}) == float4{1, 2, 3, 4}));
+    CHECK(simd::all(simd::abs(float2{-5, 6}) == float2{5, 6}));
+    CHECK(simd::all(simd::abs(float4{-std::numeric_limits<float>::epsilon(),
+                                     -std::numeric_limits<float>::denorm_min(),
+                                     -std::numeric_limits<float>::max(),
+                                     -kInf}) == float4{std::numeric_limits<float>::epsilon(),
+                                                       std::numeric_limits<float>::denorm_min(),
+                                                       std::numeric_limits<float>::max(),
+                                                       kInf}
+
+                    ));
+    float2 nan2 = simd::abs(float2{kNaN, -kNaN});
+    CHECK(std::isnan(nan2.x));
+    CHECK(std::isnan(nan2.y));
+    CHECK(simd::all(simd::abs(int4{7, -8, 9, -10}) == int4{7, 8, 9, 10}));
+    // abs(INT_MIN) returns INT_MIN.
+    CHECK(
+        simd::all(simd::abs(int2{-std::numeric_limits<int32_t>::max(),
+                                 std::numeric_limits<int32_t>::min()}) ==
+                  int2{std::numeric_limits<int32_t>::max(), std::numeric_limits<int32_t>::min()}));
+}
+
+} // namespace rive
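
The `abs(INT_MIN) returns INT_MIN` case checked in the last test follows from two's-complement negation wrapping around. A scalar sketch of that behavior, written with unsigned arithmetic so that no signed overflow occurs; `wrapping_abs` is a hypothetical helper, not part of `rive::simd`, and the final conversion back to `int32_t` is assumed to be modular (guaranteed from C++20, and the behavior of mainstream compilers before that).

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: absolute value that wraps instead of overflowing.
inline int32_t wrapping_abs(int32_t x) {
    const uint32_t bits = static_cast<uint32_t>(x);
    // Unsigned negation is well defined and wraps modulo 2^32.
    const uint32_t magnitude = x < 0 ? 0u - bits : bits;
    // For INT32_MIN the magnitude is 2^31, which converts back to INT32_MIN.
    return static_cast<int32_t>(magnitude);
}
```

Every other input has a representable magnitude, so the minimum value is the only one that comes back unchanged and still negative.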