* Initial commit
* Update macros
* Add GPU Markers
* Fix builds without microprofile
* minor updates
* clang format
* Update profiler.cpp
* clang format
* Name Main Thread
* Update profiler_macros.h
* Fix end flip
* Update fiddle_context_gl.cpp
* clang format
* Update rive_build_config.lua
* Update rive_build_config.lua
* forked microprofile so I can use a tag
* Update render_context_d3d_impl.cpp
* clang
Co-authored-by: John White <aliasbinman@gmail.com>
Adreno 308 had a few issues:
* Crash from drawing too many instances, which we work around by
breaking them up with glFlush.
* Compiler failure caused by the compiler declaring a 3.1 builtin in ESSL
3.0, which we work around with a #define.
* The advertised max texture size is 8192, but textures larger than
2048 seem to not work with EXT_multisampled_render_to_texture.
Either way, we shouldn't have any gms larger than 2048 since that's
the bare minimum per the spec. Shrink the larger gms down to 2048.
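Here is a rough sketch of the instance-splitting workaround. It is illustrative only: `kMaxInstancesPerChunk` and `drawInstancesInChunks` are hypothetical names, and the actual chunk size used for Adreno 308 isn't given here. In real code, the callbacks would issue the GL draw and `glFlush()`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical cap on instances per draw. The real limit for Adreno 308 would
// have to be found empirically.
constexpr uint32_t kMaxInstancesPerChunk = 1024;

// Splits one large instanced draw into several smaller ones, calling `flush`
// (e.g. glFlush) between them. How the base instance reaches the shader (a
// uniform, a base-instance extension, etc.) is left to `drawChunk`.
// Returns the number of chunks drawn.
uint32_t drawInstancesInChunks(
    uint32_t instanceCount,
    const std::function<void(uint32_t baseInstance, uint32_t count)>& drawChunk,
    const std::function<void()>& flush)
{
    assert(drawChunk && flush);
    uint32_t chunks = 0;
    for (uint32_t base = 0; base < instanceCount; base += kMaxInstancesPerChunk)
    {
        if (chunks > 0)
        {
            flush(); // Break the work up before starting the next chunk.
        }
        drawChunk(base, std::min(kMaxInstancesPerChunk, instanceCount - base));
        ++chunks;
    }
    return chunks;
}
```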
Rive had an issue as well:
* With EXT_multisampled_render_to_texture but not
KHR_blend_equation_advanced, we were trying to use the same texture
for both msaa and the dstRead. Separate these into their own
textures.
Co-authored-by: Chris Dalton <99840794+csmartdalton@users.noreply.github.com>
With clockwise mode, we introduced ".vert" and ".frag" files and started
sharing the main vertex shaders with multiple fragment shaders. This PR
is a cleanup that removes redundant code and moves the clockwiseAtomic
shaders to that same system. clockwiseAtomic shaders also work out paint
colors via varyings now instead of storage buffers, which seems better
but doesn't register a difference in performance.
Co-authored-by: Chris Dalton <99840794+csmartdalton@users.noreply.github.com>
When the renderTarget doesn't support
VK_IMAGE_USAGE_INPUT_ATTACHMENT_BIT, we have to use an offscreen color
texture instead. Previously, we would copy this offscreen texture back
into the renderTarget after the render pass, which incurred a
substantial amount of memory bandwidth. This PR instead transfers the
offscreen texture to the renderTarget as part of the render pass, and
then discards the offscreen texture, saving a fullscreen copy on TBDR
architectures.
Co-authored-by: Chris Dalton <99840794+csmartdalton@users.noreply.github.com>
Color ramps are the final resource texture we need to make interruptible
for old Android GPUs that don't support complex render passes.
Also fix lots_of_tess_spans to look the same on MSAA and not.
Co-authored-by: Chris Dalton <99840794+csmartdalton@users.noreply.github.com>
The Mali Vulkan driver/device occasionally ran out of memory internally. This adds a hook to the testing window that runs after each GM finishes, which TestingWindowAndroidVulkan now uses to tear the device down completely. The device then gets rebuilt as needed.
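Below is a minimal sketch of the "tear down after each GM, rebuild on demand" pattern. `TestingWindow::onceAfterGM`, `TestingWindowAndroidVulkanSketch`, and `Device` are hypothetical stand-ins, not the actual testing-window API.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a Vulkan device and everything allocated from it.
struct Device
{
    static inline int liveCount = 0;
    Device() { ++liveCount; }
    ~Device() { --liveCount; }
};

// Hypothetical testing-window interface with a hook that runs after each GM.
class TestingWindow
{
public:
    virtual ~TestingWindow() = default;
    virtual void onceAfterGM() {} // Default: keep everything alive.
};

class TestingWindowAndroidVulkanSketch : public TestingWindow
{
public:
    // Lazily (re)builds the device the first time something needs it.
    Device* device()
    {
        if (!m_device)
        {
            m_device = std::make_unique<Device>();
            ++m_buildCount;
        }
        return m_device.get();
    }

    // Tear the device down completely so driver-internal allocations are released.
    void onceAfterGM() override { m_device.reset(); }

    int buildCount() const { return m_buildCount; }

private:
    std::unique_ptr<Device> m_device;
    int m_buildCount = 0;
};
```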
Co-authored-by: Josh Jersild <joshua@rive.app>
chore(vk): Make the tessellation pass interruptible
We recently worked around some driver crashes on Vulkan by breaking up
atlas & draw render passes that were too complex. This PR makes the
exact same workaround for tessellation passes, and adds a GM to catch
this case.
Co-authored-by: Chris Dalton <99840794+csmartdalton@users.noreply.github.com>
* Clean up some assumptions in WebGPU that support for PLS meant it was
actually being used. Now that we support MSAA, it might not be in use.
* Clean up a refactoring typo that caused an input attachment to be
deleted before it was used.
Co-authored-by: Chris Dalton <99840794+csmartdalton@users.noreply.github.com>
Vulkan has always been using minimum-sized 2048x2048 atlases because
we forgot to set PlatformFeatures::maxTextureSize. Start grabbing this
value from physicalDeviceProps so we can flush less often.
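In rough terms, the fix derives the cap from the device limits instead of using the default. `DeviceLimitsSketch` mirrors the `maxImageDimension2D` field of `VkPhysicalDeviceLimits` (reached via `VkPhysicalDeviceProperties::limits`). Two things here are assumptions: that Rive reads exactly that field, and the optional upper `cap`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Stand-in for VkPhysicalDeviceLimits, keeping only the field we need.
struct DeviceLimitsSketch
{
    uint32_t maxImageDimension2D;
};

// The 2048 size the atlases were previously stuck at.
constexpr uint32_t kMinAtlasSize = 2048;

// Hypothetical helper: choose the value reported as maxTextureSize from the
// device limits, never going below the old default and never above `cap`
// (an assumed knob to bound atlas memory).
uint32_t chooseMaxTextureSize(const DeviceLimitsSketch& limits, uint32_t cap)
{
    assert(cap >= kMinAtlasSize);
    return std::clamp(limits.maxImageDimension2D, kMinAtlasSize, cap);
}
```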
Co-authored-by: Chris Dalton <99840794+csmartdalton@users.noreply.github.com>
Vulkan's LOAD_OP_LOAD cannot be specified on a render pass attachment whose layout is IMAGE_LAYOUT_UNDEFINED. This conditionally makes the image layout GENERAL in those cases, and otherwise allows it to stay UNDEFINED, since the texture may not be initialized yet.
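The layout selection might look something like this. The enums stand in for the Vulkan ones, and `chooseInitialLayout` is a hypothetical helper, not the renderer's actual code.

```cpp
#include <cassert>

// Stand-ins mirroring VK_ATTACHMENT_LOAD_OP_* and VK_IMAGE_LAYOUT_*.
enum class LoadOp { load, clear, dontCare };
enum class ImageLayout { undefined, general, colorAttachmentOptimal };

// Picks the attachment's initialLayout. LOAD_OP_LOAD needs existing contents,
// which an UNDEFINED layout does not guarantee, so promote UNDEFINED to
// GENERAL in that case. Otherwise keep the tracked layout, which may stay
// UNDEFINED for a texture that has never been written.
ImageLayout chooseInitialLayout(LoadOp loadOp, ImageLayout currentLayout)
{
    if (loadOp == LoadOp::load && currentLayout == ImageLayout::undefined)
    {
        return ImageLayout::general;
    }
    return currentLayout;
}
```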
Co-authored-by: Josh Jersild <joshua@rive.app>
Adreno 730, 740, and 830 don't seem to appreciate binding buffers and
updating descriptor sets before beginning the render pass, even though
this appears to be valid usage of the Vulkan API. Re-update atlas
rendering to begin the render pass first.
Co-authored-by: Chris Dalton <99840794+csmartdalton@users.noreply.github.com>
Some early Android tilers are known to crash when a render pass is too complex. This PR is a first stab at adding an "interrupt" mechanism to atlas and draw render passes, which allows us to break up our rendering into smaller chunks that don't crash. Currently, only rasterOrdering mode is supported for interrupting draw passes. We will need to investigate whether this workaround is also needed for msaa, tessellation, and gradient textures.
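Here is one way to picture the interrupt mechanism. Walk the draws, and when the next one would push the current pass past some complexity budget, end the pass and begin a new one. The cost metric and `PassBudget` are hypothetical, because the commit doesn't say how "too complex" is measured.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-pass complexity budget.
struct PassBudget
{
    uint64_t maxWorkPerPass;
};

// Groups draws into render passes, "interrupting" (ending the current pass and
// beginning a new one) whenever the next draw would exceed the budget. A draw
// that is over budget on its own still gets a pass to itself. Returns the
// number of draws in each pass.
std::vector<std::size_t> splitIntoPasses(const std::vector<uint64_t>& drawCosts,
                                         PassBudget budget)
{
    std::vector<std::size_t> passes;
    uint64_t workInPass = 0;
    for (uint64_t cost : drawCosts)
    {
        if (passes.empty() || workInPass + cost > budget.maxWorkPerPass)
        {
            passes.push_back(0); // Interrupt: end the pass, begin a new one.
            workInPass = 0;
        }
        ++passes.back();
        workInPass += cost;
    }
    return passes;
}
```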
Co-authored-by: Chris Dalton <99840794+csmartdalton@users.noreply.github.com>
This change primarily works around a driver issue that was causing visual corruption on some newer Adreno-based devices.
There are other minor changes as well (displaying the driver version from the bootstrapping code and setting a minimum requirement of Vulkan 1.1 in the renderer).
Co-authored-by: Josh Jersild <joshua@rive.app>
Some tooling changed today-ish that is causing the 310 es shaders to fail to build with:
```
runner/_work/rive/rive/packages/runtime/tests/unit_tests/out/android_arm64_debug/include/generated/shaders/glsl.minified.glsl:450: 'token pasting (##)' : not supported with this profile: es
ERROR: spirv/blit_texture_as_draw_filtered.main:6: 'stringify (#)' : not supported with this profile: es
ERROR: spirv/blit_texture_as_draw_filtered.main:6: '#' : '#' is not followed by a macro parameter.
ERROR: spirv/blit_texture_as_draw_filtered.main:6: '' : missing #endif
ERROR: spirv/blit_texture_as_draw_filtered.main:6: '' : compilation terminated
ERROR: 5 compilation errors. No code generated.
```
This only affects "310 es" shaders, and since most SPIR-V shaders are already 460, this moves the rest to 460 as well.
Co-authored-by: Josh Jersild <joshua@rive.app>
* fix(vk): Implement manual MSAA resolves
Some Android devices have issues with MSAA resolves when the MSAA color
buffer is also read as an input attachment. In the past we've worked
around this by adding an empty subpass at the end of the render pass.
This PR implements fully manual resolves instead, which we now use when
there are blend modes and partial updates. This is hopefully a more
stable workaround than a mystery subpass, and will ideally get better
performance as well when we don't need to resolve the entire render
target.
* Fix synchronization validation (had a write/write hazard between the image state transition and the load op)
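For reference, a plain MSAA resolve averages each pixel's samples (a box filter). The CPU sketch below resolves only the pixels inside the updated bounds. That is the partial-update case, where a manual resolve can avoid touching the whole target. The sample layout, `Rect`, and `resolveRegion` are assumptions for illustration. The real resolve runs on the GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Update bounds in pixels; right/bottom are exclusive.
struct Rect { int left, top, right, bottom; };

// Single-channel samples stored per pixel: samples[pixel * sampleCount + s].
// Averages the samples of every pixel inside `bounds` into `dst` and leaves
// pixels outside the bounds untouched.
void resolveRegion(const std::vector<float>& samples, int width, int sampleCount,
                   const Rect& bounds, std::vector<float>& dst)
{
    assert(sampleCount > 0);
    for (int y = bounds.top; y < bounds.bottom; ++y)
    {
        for (int x = bounds.left; x < bounds.right; ++x)
        {
            std::size_t pixel = static_cast<std::size_t>(y) * width + x;
            float sum = 0;
            for (int s = 0; s < sampleCount; ++s)
            {
                sum += samples[pixel * sampleCount + s];
            }
            dst[pixel] = sum / sampleCount;
        }
    }
}
```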
Co-authored-by: Chris Dalton <99840794+csmartdalton@users.noreply.github.com>
Co-authored-by: JoshJRive <joshua@rive.app>
After testing a statistically significant number of devices (7,
specifically!) I have concluded that implicit PowerVR raster ordering
only works with Rive on Vulkan 1.3 contexts. Update our renderer
accordingly.
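Here is a sketch of that gate. It assumes that the vendor is identified by PCI vendor ID (0x1010 for Imagination), and that the context version is packed the same way as Vulkan's `VK_MAKE_API_VERSION`. `allowImplicitPowerVRRasterOrdering` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Same bit layout as VK_MAKE_API_VERSION with variant 0:
// major in bits 22..28, minor in bits 12..21, patch in bits 0..11.
constexpr uint32_t makeApiVersion(uint32_t major, uint32_t minor, uint32_t patch)
{
    return (major << 22) | (minor << 12) | patch;
}

// PCI vendor ID for Imagination Technologies (PowerVR).
constexpr uint32_t kVendorImagination = 0x1010;

// Only rely on implicit PowerVR raster ordering on Vulkan 1.3+ contexts.
bool allowImplicitPowerVRRasterOrdering(uint32_t vendorID, uint32_t apiVersion)
{
    return vendorID == kVendorImagination && apiVersion >= makeApiVersion(1, 3, 0);
}
```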
Co-authored-by: Chris Dalton <99840794+csmartdalton@users.noreply.github.com>
When Vulkan expands the renderTarget into the MSAA color buffer for
LoadAction::preserveRenderTarget, we've been reading the resolve texture
as an input attachment. But it's debatable whether a texture can be an
input attachment AND a resolve attachment in the same render pass, and
some early Qualcomm devices have struggled with this even if we
implement the MSAA resolve manually. For now, always copy out the render
target to a separate texture when there's a preserve.
Co-authored-by: Chris Dalton <99840794+csmartdalton@users.noreply.github.com>
We already have a codepath that enables rasterOrdering on ARM, even
if the extension isn't present, because we know that's how these GPUs
work. If we enable this codepath on Imagination, it appears to work as
well. This works around an MSAA crash on Pixel 10.
feat(wgpu): Add core support for MSAA (#11040) cb9968caef
Implement an MSAA mode in WebGPU that doesn't rely on any optional
(non-core) features besides SPIR-V. This will be the catch-all fallback
that works everywhere.
Co-authored-by: Chris Dalton <99840794+csmartdalton@users.noreply.github.com>
Co-authored-by: blakdragan7 <jcopela4@gmail.com>
Fix a couple of shader flags in WebGPU (including some refactoring to
handle them at the RenderContext level instead of the backends).
Fix renderTarget{Width,Height} in TestingWindowWGPU.
Add a "--gms" option to check_golds.sh so we can skip goldens in local
runs and make them faster.
Add a "-r" option to check_golds.sh for release runs.
Co-authored-by: Chris Dalton <99840794+csmartdalton@users.noreply.github.com>
* fixed intel arc
* don't use raster ordering with intel arc
* better comment
* pulled out intel checks as requested on PR
Co-authored-by: Jonathon Copeland <jcopela4@gmail.com>
The version that was being passed to the rive vulkan context was the version from querying the instance, which can differ from the API version that the actual device supports. This now correctly passes the device's vulkan version, which eliminates the need for a weird workaround in the VMA initialization (where we thought we had a Vulkan 1.3 device but in fact it was only Vulkan 1.0).
Additionally, mark the VKDBGUTILWARN003 message that happens on the S23s as non-aborting. It appears to be an incorrect warning ("Renderpass is not qualified for multipass due to a given subpass") for a render pass that is not set up as multipass.
Co-authored-by: Josh Jersild <joshua@rive.app>
Clockwise mode finally gives us a really good use case for
EXT_shader_pixel_local_storage2. On scenes that don't use advanced
blend, this is showing speedups around 1.35x.
This PR also does some cleanup around consolidating the logic for
ShaderMiscFlags::fixedFunctionColorOutput, so we don't have to
re-implement the same logic in every backend.
Co-authored-by: Chris Dalton <99840794+csmartdalton@users.noreply.github.com>
* proper states based on warnings. Support for gpunamefilter
* fix for raster order mode
* now works on AMD
* factored out wstring conversion
* clang format
* no need for pop back if using len -1
* removed unneeded forward declare
Co-authored-by: Jonathon Copeland <jcopela4@gmail.com>
* updated unreal msaa shaders to use new includes
* moved gamma convert to be at end instead of beginning
Co-authored-by: Jonathon Copeland <jcopela4@gmail.com>
* Started adding msaa mode to rhi
* initial state msaa working on d3d12
* Now set stencil ref via set pipeline state
* set stencil in correct location
* vulkan msaa working for non advanced blend in rhi
* fixed missing check for msaa
* proper merge msaa and fix advanced blend for atomics
* added COALESCED_PLS_RESOLVE_AND_TRANSFER permutation and added better shader compile exclusions
* Now uses coalesce resolve
* base msaa working in d3d12 rhi
* mac support for non advanced blend msaa
* finally working on both dx12 and vulkan
* separated out splitting render passes by a platform feature flag
* better name
* merge master
* some odd typo
* better matrix compare
* better comment
* glsl syntax fix
* removed commented code
* removed is matrix function in favor of all() as per PR comment
* factored out modify shader as a subclass
* updated cvar variable
* fix typo
* followed PR comments
* more pr comments
* clip planes actually do work now
* removed FixedFunctionColorOutput variable in favor of the flush desc member var
* fixed type restrictions and clang format
* improved comment and added missed file from last commit
* revert factoring common code because it was causing several errors
* factored out common shader code
* check clip plane support
* changed back to vulkan only
Co-authored-by: Jonathon Copeland <jcopela4@gmail.com>
Pretty straightforward implementation.
On Intel this is showing speedups in the 1.2-1.6x range (over atomic
mode, and depending on content).
Co-authored-by: Chris Dalton <99840794+csmartdalton@users.noreply.github.com>
Device fixes:
- Break the MSAA resolve off into a separate render subpass to resolve corruption on some Adreno devices
- Work around an issue where some early Mali drivers report Vulkan 1.1 support but are missing the allocation functions that VMA is looking for
- Some of those Mali devices spew incorrect validation errors about the inability to create, say, an RGB8 texture because the maximum mipmap count is 0 (and other similar "oops we queried and got 0" errors) even though the images actually create fine. Added them to an ignore list.
- The MSAA color and depth/stencil textures are now being created with the TRANSIENT bit (this doesn't fix any known device issues but it does seem more correct)
Other improvements:
- vk_check now displays a string representation of the vulkan error instead of just its numeric value
- the abort handler on Android now also prints the stack trace (to make debugging easier)
- fiddle_context_vulkan now requests the correct width and height for image capture
- The queueImageCopy function in swapchain was not properly setting the pixel read bounds to the whole swapchain if passed an empty AABB
Co-authored-by: Josh Jersild <joshua@rive.app>
Add a new InterlockMode that overwrites all fill rules as clockwise and
implements the clockwise path rendering algorithm using raster ordered
PLS. The only backend to support this so far is GL with shader images,
but more will come.
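For readers unfamiliar with the clockwise fill rule, here is how it differs from nonZero and evenOdd when deciding coverage from a winding number. This sketch only illustrates the fill-rule semantics, not the raster-ordered PLS implementation. It assumes that clockwise contours contribute positive winding.

```cpp
#include <cassert>
#include <cstdlib>

enum class FillRule { nonZero, evenOdd, clockwise };

// Returns whether a pixel with the given winding number is covered.
// Assumed sign convention: clockwise contours add +1, counterclockwise add -1.
bool isInside(FillRule rule, int winding)
{
    switch (rule)
    {
        case FillRule::nonZero:
            return winding != 0;
        case FillRule::evenOdd:
            return (std::abs(winding) & 1) != 0;
        case FillRule::clockwise:
            return winding > 0; // Only net-clockwise coverage fills.
    }
    return false;
}
```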
Notably, when there is no advanced blend, we can set
"fixedFunctionColorOutput" and render directly to the target
framebuffer.
Performance so far looks promising, especially on Intel, but more
in-depth perf optimizations are yet to come.
This is implemented as its own InterlockMode only to give it soak time.
Once this rendering mode is stable, we can merge it back into
"rasterOrdering" and just select draw shaders based on fill rule.
Co-authored-by: Chris Dalton <99840794+csmartdalton@users.noreply.github.com>
On some Adreno models, gradients were not rendering properly because the texture/gradient coordinates were always 0.
This was caused by what appears to be a driver bug, in which doing vkUpdateDescriptorSets with multiple descriptors (in our case, updating paintData and paintAuxData in a single call) would fail to properly commit the data from the second one. We're only doing this in the one place. Separating this into two calls causes gradients and images to now display as expected.
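The workaround's shape is sketched below with a stand-in for `VkWriteDescriptorSet`. Instead of a single `vkUpdateDescriptorSets` call with two writes, it issues one call per write. `WriteSketch` and `updateDescriptorsOneAtATime` are hypothetical, and `update` stands in for the real call with `descriptorCopyCount = 0`.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Stand-in for VkWriteDescriptorSet, keeping only a binding index.
struct WriteSketch { uint32_t binding; };

// Issues one update per write instead of a single batched update, which
// some Adreno drivers appear to mis-handle.
template <typename UpdateFn>
void updateDescriptorsOneAtATime(const std::vector<WriteSketch>& writes, UpdateFn update)
{
    for (const WriteSketch& write : writes)
    {
        update(1u, &write); // writeCount = 1, pointer to a single write.
    }
}
```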
Co-authored-by: Josh Jersild <joshua@rive.app>
The shader key and render pass key were both including the interlock
mode, and the pipeline key was so full that adding a new interlock mode
would overflow it. Only include the interlock mode once.
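To show why a duplicated field matters when the key is packed into a fixed number of bits, here is a packing sketch. The field widths are made up for illustration, since the real layout isn't described. The point is that the interlock mode occupies bits exactly once, and a `static_assert` guards the total.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical field widths.
constexpr int kShaderKeyBits = 48;
constexpr int kInterlockModeBits = 3;
constexpr int kRenderPassKeyBits = 12; // No longer carries its own interlock mode.

static_assert(kShaderKeyBits + kInterlockModeBits + kRenderPassKeyBits <= 64,
              "pipeline key must fit in 64 bits");

// Packs the interlock mode once, rather than inside both sub-keys.
uint64_t makePipelineKey(uint64_t shaderKey, uint32_t interlockMode, uint64_t renderPassKey)
{
    assert(shaderKey < (uint64_t{1} << kShaderKeyBits));
    assert(interlockMode < (1u << kInterlockModeBits));
    assert(renderPassKey < (uint64_t{1} << kRenderPassKeyBits));
    return (shaderKey << (kInterlockModeBits + kRenderPassKeyBits)) |
           (uint64_t{interlockMode} << kRenderPassKeyBits) | renderPassKey;
}
```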
Co-authored-by: Chris Dalton <99840794+csmartdalton@users.noreply.github.com>
A recent refactor introduced an issue on Vivo Y21 and Oppo Reno 3 where
writes to pixel local storage got disabled when the color mask was off.
Just leave the color mask enabled in EXT_shader_pixel_local_storage
mode.
Co-authored-by: Chris Dalton <99840794+csmartdalton@users.noreply.github.com>
Previously we were trying to shoehorn atlas blits into
draw_raster_order_path.frag, which led to a lot of hard-to-read #ifdefs
since they're so different from paths. In reality, atlas blits are more
similar to image meshes -- both are just triangle meshes that render
directly without having to work out winding numbers -- so handle them in
draw_raster_order_mesh.frag instead (which we renamed from
draw_raster_order_image_mesh.frag).
Additionally, for MSAA, the draws are all so similar that we can just
merge them all into a common "draw_msaa_object.frag" shader.
Co-authored-by: Chris Dalton <99840794+csmartdalton@users.noreply.github.com>
We've been testing _TARGET_OS in a lot of places in our build, but that is not actually updated in Android/Emscripten builds (it would continue to report, say, Windows). Among other things, this meant that we would build D3D12 shaders for every Android build, even though they were not needed.
This adds a `rive_target_os` value to `rive_build_config.lua` which is updated manually for those two projects, so that scripts won't do the Windows/mac things when building for android/emscripten on those platforms.
Co-authored-by: Josh Jersild <joshua@rive.app>
* Workaround for Android 9/10 Adreno 5- and 6-series Vulkan driver bug
There's a bug in some of the early Adreno drivers where some of the Rive shaders will fail to compile due to hitting an internal limit. The workaround is to run a shader compilation pre-pass to inline functions before embedding the compiled shaders into the runtime.
Co-authored-by: Josh Jersild <joshua@rive.app>
* refactor(renderer): Generalize fixedFunctionColorOutput
Previously, fixedFunctionColorOutput was called
"atomicFixedFunctionColorOutput" and only applied to atomic mode.
Generalize it so it also applies to msaa, and will be compatible with
clockwise rendering in the near future.
* made build in unreal
Co-authored-by: Chris Dalton <99840794+csmartdalton@users.noreply.github.com>
Co-authored-by: blakdragan7 <jcopela4@Gmail.com>
Before this change, renderer backends were responsible for allocating
any textures or buffers required to back pixel local storage data. They
all did this by allocating separate backings for each render target,
which is wasteful on systems with multiple render targets.
This change adds PLS backings to the RenderContext's resource tracking
system, and allocates one global PLS backing with the dimensions of the
largest recent render targets. For now, only GL is updated to take
advantage of this resource, but other backends should follow soon.
It also merges the transient PLS backings into a single logical
TEXTURE_2D_ARRAY, as opposed to separate TEXTURE_2Ds. Allocating the PLS
backings in a 3D layout appears to get better cache performance on Intel
Arc GPUs.
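The grow-only sharing idea can be sketched as follows. `PLSBackingSketch` is hypothetical and only tracks dimensions. It leaves out how "recent" is decided (i.e., when the backing might shrink again) and the actual `TEXTURE_2D_ARRAY` allocation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Tracks one shared PLS backing with one array layer per PLS plane, sized to
// the largest render target seen so far.
class PLSBackingSketch
{
public:
    explicit PLSBackingSketch(uint32_t planeCount) : m_planeCount(planeCount) {}

    // Returns true if the backing had to be (re)allocated to fit the target.
    bool ensureFits(uint32_t targetWidth, uint32_t targetHeight)
    {
        if (targetWidth <= m_width && targetHeight <= m_height)
        {
            return false; // Smaller targets share the existing backing.
        }
        m_width = std::max(m_width, targetWidth);
        m_height = std::max(m_height, targetHeight);
        ++m_allocationCount; // Real code would allocate width x height x planeCount.
        return true;
    }

    uint32_t width() const { return m_width; }
    uint32_t height() const { return m_height; }
    uint32_t layers() const { return m_planeCount; }
    int allocationCount() const { return m_allocationCount; }

private:
    uint32_t m_planeCount;
    uint32_t m_width = 0;
    uint32_t m_height = 0;
    int m_allocationCount = 0;
};
```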
Co-authored-by: Chris Dalton <99840794+csmartdalton@users.noreply.github.com>