| Age | Commit message (Collapse) | Author |
|
Reimplement the buffer cache using cached bindings and page level
granularity for modification tracking. This also drops the usage of
shared pointers and virtual functions from the cache.
- Bindings are cached, allowing to skip work when the game changes few
bits between draws.
- OpenGL Assembly shaders no longer copy when a region has been modified
from the GPU to emulate constant buffers, instead GL_EXT_memory_object
is used to alias sub-buffers within the same allocation.
- OpenGL Assembly shaders stream constant buffer data using
glProgramBufferParametersIuivNV, from NV_parameter_buffer_object. In
theory this should save one hash table resolve inside the driver
compared to glBufferSubData.
- A new OpenGL stream buffer is implemented based on fences for drivers
that are not Nvidia's proprietary, due to their low performance on
partial glBufferSubData calls synchronized with 3D rendering (that
some games use a lot).
- Most optimizations are shared between APIs now, allowing Vulkan to
cache more bindings than before, skipping unnecesarry work.
This commit adds the necessary infrastructure to use Vulkan object from
OpenGL. Overall, it improves performance and fixes some bugs present on
the old cache. There are still some edge cases hit by some games that
harm performance on some vendors, this are planned to be fixed in later
commits.
|
|
Instead of using a two step initialization to report errors, initialize
the GPU renderer and rasterizer on the constructor and report errors
through std::runtime_error.
|
|
|
|
INSERT_PADDING_BYTES_NOINIT is more descriptive of the underlying behavior.
|
|
The current texture cache has several points that hurt maintainability
and performance. It's easy to break unrelated parts of the cache
when doing minor changes. The cache can easily forget valuable
information about the cached textures by CPU writes or simply by its
normal usage.The current texture cache has several points that hurt
maintainability and performance. It's easy to break unrelated parts
of the cache when doing minor changes. The cache can easily forget
valuable information about the cached textures by CPU writes or simply
by its normal usage.
This commit aims to address those issues.
|
|
maxwell_3d: Remove unused dirty_pointer array
|
|
fmt now automatically prints the numeric value of an enum class member
by default, so we don't need to use casts any more.
Reduces the line noise a bit.
|
|
Follows our established coding style.
|
|
Removes a documentation comment for a non-existent member.
|
|
This is unused and removing it shrinks the structure by 3584 bytes.
|
|
On Apple platforms, FALSE and TRUE are defined as macros by
<mach/boolean.h>, which is included by various system headers.
Note that there appear to be no actual users of the names to fix up.
|
|
Resolves variable shadowing scenarios up to the end of the OpenGL code
to make it nicer to review. The rest will be resolved in a following
commit.
|
|
Force early fragment tests when the 3D method is enabled.
The established pipeline cache takes care of recompiling if needed.
This is implemented only on Vulkan to avoid invalidating the shader
cache on OpenGL.
|
|
shader_bytecode: Eliminate variable shadowing
|
|
Ensures that all queried values are made use of.
|
|
|
|
This reduces the overhead of bounds checking on each element.
It won't reduce the cost of allocation because usually this vector's
capacity is usually large enough to hold whatever we push to it.
|
|
Deduplicate some code and put it in separate functions so it's easier to
understand and profile.
|
|
Trivially add the encoding for this.
|
|
|
|
|
|
Allows some implementations to avoid completely zeroing out the internal
buffer of the optional, and instead only set the validity byte within
the structure.
This also makes it consistent how we return empty optionals.
|
|
Same behavior, less repetition. We can also ensure all members of Config
are initialized.
|
|
Add an extra step in GPU initialization to be able to initialize render
backends with a valid GPU instance.
|
|
maxwell_3d: Resolve -Wextra-semi warning
|
|
Semicolons after a function definition aren't necessary.
|
|
There were two issues with block linear copies. First the swizzling was
wrong and this commit reimplements them.
The other issue was that these copies are generally used to download
render targets from the GPU and yuzu was not downloading them from
host GPU memory unless the extreme GPU accuracy setting was selected.
This commit enables cached memory reads for all accuracy levels.
- Fixes level thumbnails in Super Mario Maker 2.
|
|
Change GOB sizes from free-functions to constexpr constants.
Add SwizzleSliceToVoxel, a function that swizzles a 2D array of pixels
into a 3D texture and use it for 3D copies.
|
|
Rename registers in the MaxwellDMA class to match Nvidia's official
documentation. This one can be found here:
https://github.com/NVIDIA/open-gpu-doc/blob/master/classes/dma-copy/clb0b5.h
While we are at it, reorganize the code in MaxwellDMA to be separated in
different functions.
|
|
shader/half_set: Implement HSET2_IMM
|
|
|
|
|
|
Add HSET2_IMM. Due to the complexity of the encoding avoid using
BitField unions and read the relevant bits from the code itself.
This is less error prone.
|
|
shader/texture: Join separate image and sampler pairs offline
|
|
This allows rendering to 3D textures with more than one slice.
Applications are allowed to render to more than one slice of a texture
using gl_Layer from a VTG shader.
This also requires reworking how 3D texture collisions are handled, for
now, this commit allows rendering to slices but not to miplevels. When a
render target attempts to write to a mipmap, we fallback to the previous
implementation (copying or flushing as needed).
- Fixes color correction 3D textures on UE4 games (rainbow effects).
- Allows Xenoblade games to render to 3D textures directly.
|
|
Games using D3D idioms can join images and samplers when a shader
executes, instead of baking them into a combined sampler image. This is
also possible on Vulkan.
One approach to this solution would be to use separate samplers on
Vulkan and leave this unimplemented on OpenGL, but we can't do this
because there's no consistent way of determining which constant buffer
holds a sampler and which one an image. We could in theory find the
first bit and if it's in the TIC area, it's an image; but this falls
apart when an image or sampler handle use an index of zero.
The used approach is to track for a LOP.OR operation (this is done at an
IR level, not at an ISA level), track again the constant buffers used as
source and store this pair. Then, outside of shader execution, join
the sample and image pair with a bitwise or operation.
This approach won't work on games that truly use separate samplers in a
meaningful way. For example, pooling textures in a 2D array and
determining at runtime what sampler to use.
This invalidates OpenGL's disk shader cache :)
- Used mostly by D3D ports to Switch
|
|
video_core: Implement Macro JIT
|
|
|
|
|
|
|
|
maxwell_3d: Initialize more registers to their expected value
|
|
|
|
These logs were killing performance on some games when they were
spammed. Reduce them to Debug severity.
|
|
Initialize line widths to avoid setting a line width of zero.
|
|
NVN expects this to be initialized as Fill, otherwise games that never
bind a rasterizer state will log an invalid polygon mode.
|
|
shader_ir: Add separate instructions for ordered and unordered comparisons and fix NE on GLSL
|
|
This allows us to use native SPIR-V instructions without having to
manually check for NAN.
|
|
video_core: Implement viewport swizzles with NV_viewport_swizzle
|
|
GPU: More optimizations to GPU Command List Processing and DMA Copy Optimizations
|
|
|