Trivially add the encoding for this.
|
Allows some implementations to avoid completely zeroing out the internal
buffer of the optional, and instead only set the validity byte within
the structure.
This also makes the way we return empty optionals consistent.
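A minimal sketch of the return style, assuming the change amounts to returning `std::nullopt` instead of a value-initialized `{}`. `FindIndex` is a hypothetical function used only for illustration, and whether the payload actually gets zeroed depends on the standard library implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical lookup used only to illustrate how empty optionals are returned.
std::optional<std::size_t> FindIndex(const std::vector<int>& values, int needle) {
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (values[i] == needle) {
            return i;
        }
    }
    // std::nullopt builds a disengaged optional directly, so an implementation
    // only has to write the "has value" flag, not the whole payload buffer.
    return std::nullopt;
}
```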
|
Same behavior, less repetition. We can also ensure all members of Config
are initialized.
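One common C++ way to get both properties is default member initializers. The sketch below assumes that approach; the fields and `MakeDockedConfig` are hypothetical and are not yuzu's actual `Config` members.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical configuration struct: every member has a default member
// initializer, so no member can be left uninitialized.
struct Config {
    std::uint32_t width = 1280;
    std::uint32_t height = 720;
    bool use_vsync = true;
    float scale = 1.0f;
};

// Overrides only the fields that differ; the rest keep their defaults
// instead of being repeated at every construction site.
Config MakeDockedConfig() {
    Config config;
    config.width = 1920;
    config.height = 1080;
    return config;
}
```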
|
Add an extra step in GPU initialization to be able to initialize render
backends with a valid GPU instance.
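A sketch of the two-step pattern, assuming the extra step means "construct the GPU first, then build the renderer from it and attach it". `RendererBase`, `BindRenderer` and `CreateGPU` are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <memory>
#include <utility>

class GPU;

// Hypothetical backend that needs a live GPU reference at construction time.
class RendererBase {
public:
    explicit RendererBase(GPU& gpu) : gpu_{gpu} {}
    GPU& Gpu() const { return gpu_; }

private:
    GPU& gpu_;
};

class GPU {
public:
    // Second step: attach a backend once the GPU object already exists.
    void BindRenderer(std::unique_ptr<RendererBase> renderer) {
        renderer_ = std::move(renderer);
    }
    bool HasRenderer() const { return renderer_ != nullptr; }
    const RendererBase& Renderer() const { return *renderer_; }

private:
    std::unique_ptr<RendererBase> renderer_;
};

// First step constructs the GPU; the renderer is then built from a valid instance.
std::unique_ptr<GPU> CreateGPU() {
    auto gpu = std::make_unique<GPU>();
    gpu->BindRenderer(std::make_unique<RendererBase>(*gpu));
    return gpu;
}
```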
|
maxwell_3d: Resolve -Wextra-semi warning
|
Semicolons after a function definition aren't necessary.
|
There were two issues with block linear copies. First, the swizzling was
wrong; this commit reimplements it.
The other issue was that these copies are generally used to download
render targets from the GPU and yuzu was not downloading them from
host GPU memory unless the extreme GPU accuracy setting was selected.
This commit enables cached memory reads for all accuracy levels.
- Fixes level thumbnails in Super Mario Maker 2.
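For reference, the byte layout inside a single GOB (64 bytes wide, 8 rows tall) is commonly described by the formula below, as used by open-source Switch emulators. This sketch covers only the intra-GOB offset, not block height/depth tiling, and is not necessarily the code from this commit.

```cpp
#include <cstdint>

// Byte offset inside a 512-byte GOB for column x (0-63) and row y (0-7).
constexpr std::uint32_t GobOffset(std::uint32_t x, std::uint32_t y) {
    return ((x % 64) / 32) * 256 + ((y % 8) / 2) * 64 +
           ((x % 32) / 16) * 32 + (y % 2) * 16 + (x % 16);
}

// Checks that the 64x8 coordinates map onto the 512 offsets exactly once.
constexpr bool GobOffsetsAreUnique() {
    bool seen[512] = {};
    for (std::uint32_t y = 0; y < 8; ++y) {
        for (std::uint32_t x = 0; x < 64; ++x) {
            const std::uint32_t offset = GobOffset(x, y);
            if (offset >= 512 || seen[offset]) {
                return false;
            }
            seen[offset] = true;
        }
    }
    return true;
}

static_assert(GobOffsetsAreUnique(), "GOB swizzle must be a permutation");
```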
|
Change GOB sizes from free-functions to constexpr constants.
Add SwizzleSliceToVoxel, a function that swizzles a 2D array of pixels
into a 3D texture, and use it for 3D copies.
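A sketch of the free-function-to-constant change, assuming GOB dimensions of 64 bytes by 8 rows by 1 slice. The constant names and `GobsPerRow` are illustrative, not necessarily the identifiers used in yuzu.

```cpp
#include <cstdint>

// Before (free functions), e.g.:
//   constexpr std::uint32_t GetGOBSizeX() { return 64; }
// After: plain constexpr constants.
constexpr std::uint32_t GOB_SIZE_X = 64; // bytes per GOB row
constexpr std::uint32_t GOB_SIZE_Y = 8;  // rows per GOB
constexpr std::uint32_t GOB_SIZE_Z = 1;  // slices per GOB
constexpr std::uint32_t GOB_SIZE = GOB_SIZE_X * GOB_SIZE_Y * GOB_SIZE_Z;

static_assert(GOB_SIZE == 512, "a GOB holds 512 bytes");

// Number of GOBs needed to cover a row of the given width in bytes.
constexpr std::uint32_t GobsPerRow(std::uint32_t width_bytes) {
    return (width_bytes + GOB_SIZE_X - 1) / GOB_SIZE_X;
}
```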
|
Rename registers in the MaxwellDMA class to match Nvidia's official
documentation, which can be found here:
https://github.com/NVIDIA/open-gpu-doc/blob/master/classes/dma-copy/clb0b5.h
While we are at it, reorganize the code in MaxwellDMA into separate
functions.
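In clb0b5.h, 64-bit addresses are written through separate UPPER/LOWER methods (for example `OFFSET_IN_UPPER` and `OFFSET_IN_LOWER`). The sketch below only illustrates that split; `PackedGPUVAddr`, `DmaRegs` and `LinearCopySize` are hypothetical, and the exact width of the upper field (defined in the header) is not masked here.

```cpp
#include <cstdint>

// A 64-bit GPU virtual address written as two 32-bit halves.
struct PackedGPUVAddr {
    std::uint32_t upper;
    std::uint32_t lower;

    constexpr std::uint64_t Address() const {
        return (static_cast<std::uint64_t>(upper) << 32) | lower;
    }
};

// Hypothetical subset of copy-engine state, named after the header's methods.
struct DmaRegs {
    PackedGPUVAddr offset_in;
    PackedGPUVAddr offset_out;
    std::uint32_t pitch_in;
    std::uint32_t pitch_out;
    std::uint32_t line_length_in;
    std::uint32_t line_count;
};

// One small function per concern, e.g. the byte count of a pitch copy.
constexpr std::uint64_t LinearCopySize(const DmaRegs& regs) {
    return static_cast<std::uint64_t>(regs.line_length_in) * regs.line_count;
}
```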
|
shader/half_set: Implement HSET2_IMM
|
Add HSET2_IMM. Due to the complexity of the encoding, avoid using
BitField unions and read the relevant bits directly from the instruction
word. This is less error-prone.
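Reading fields straight from the 64-bit instruction word can be done with two small helpers like the ones below. The helper names are hypothetical, and the actual HSET2_IMM bit positions are not given here, so the checks use arbitrary values.

```cpp
#include <cstdint>

// Extracts `count` bits (count < 64) starting at bit `offset` of an
// instruction word, without declaring a BitField union per encoding.
constexpr std::uint64_t ExtractBits(std::uint64_t insn, unsigned offset, unsigned count) {
    return (insn >> offset) & ((std::uint64_t{1} << count) - 1);
}

// Reads a single flag bit.
constexpr bool ExtractBit(std::uint64_t insn, unsigned offset) {
    return ((insn >> offset) & 1) != 0;
}
```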
|
shader/texture: Join separate image and sampler pairs offline
|
This allows rendering to 3D textures with more than one slice.
Applications are allowed to render to more than one slice of a texture
using gl_Layer from a VTG shader.
This also requires reworking how 3D texture collisions are handled. For
now, this commit allows rendering to slices but not to mip levels. When a
render target attempts to write to a mipmap, we fall back to the previous
implementation (copying or flushing as needed).
- Fixes color correction 3D textures on UE4 games (rainbow effects).
- Allows Xenoblade games to render to 3D textures directly.
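The slice-versus-mip decision described above can be sketched as follows. `RenderTargetView`, `TargetPath` and `Choose3DTargetPath` are hypothetical names, and the sketch assumes "slices" means layers of mip level 0.

```cpp
#include <cstdint>

// Hypothetical description of a render target view into a 3D texture.
struct RenderTargetView {
    std::uint32_t mip_level;
    std::uint32_t first_slice;
    std::uint32_t num_slices;
};

enum class TargetPath {
    DirectSlices,   // bind the 3D texture; the shader selects slices via gl_Layer
    LegacyFallback, // copy or flush, as the previous implementation did
};

// Slices of the base level are rendered directly; mip levels fall back.
constexpr TargetPath Choose3DTargetPath(const RenderTargetView& view) {
    return view.mip_level == 0 ? TargetPath::DirectSlices : TargetPath::LegacyFallback;
}
```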
|
Games using D3D idioms can join images and samplers when a shader
executes, instead of baking them into a combined image sampler. This is
also possible on Vulkan.
One approach would be to use separate samplers on Vulkan and leave this
unimplemented on OpenGL, but we can't do this because there's no
consistent way of determining which constant buffer holds a sampler and
which one holds an image. In theory we could find the first set bit and,
if it's in the TIC area, treat the handle as an image; but this falls
apart when an image or sampler handle uses an index of zero.
The approach used here is to look for a LOP.OR operation (this is done at
the IR level, not at the ISA level), track the constant buffers used as
its sources, and store this pair. Then, outside of shader execution, join
the sampler and image pair with a bitwise OR.
This approach won't work on games that truly use separate samplers in a
meaningful way, for example by pooling textures in a 2D array and
determining at runtime which sampler to use.
This invalidates OpenGL's disk shader cache :)
- Used mostly by D3D ports to Switch
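A sketch of the join, assuming the usual Maxwell handle layout of a TIC (image) index in bits 0-19 and a TSC (sampler) index in bits 20-31. The helper names are hypothetical; the last check shows why first-bit detection fails for index zero.

```cpp
#include <cstdint>

// Maxwell texture handle: TIC index in bits 0-19, TSC index in bits 20-31.
constexpr std::uint32_t TIC_MASK = 0x000FFFFF;

constexpr std::uint32_t TicIndex(std::uint32_t handle) { return handle & TIC_MASK; }
constexpr std::uint32_t TscIndex(std::uint32_t handle) { return handle >> 20; }

// The shader combines the two halves with LOP.OR; doing the same OR
// outside the shader reproduces the combined handle.
constexpr std::uint32_t JoinHandles(std::uint32_t image_handle, std::uint32_t sampler_handle) {
    return image_handle | sampler_handle;
}

// A handle with index zero has no set bits, so looking for its first set
// bit cannot tell whether it belongs to the TIC or the TSC area.
static_assert(JoinHandles(0, 0) == 0, "zero handles are ambiguous");
```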
|
video_core: Implement Macro JIT
|
maxwell_3d: Initialize more registers to their expected value
|
These logs were killing performance on some games when they were
spammed. Reduce them to Debug severity.
|
Initialize line widths to avoid setting a line width of zero.
|
NVN expects this to be initialized as Fill, otherwise games that never
bind a rasterizer state will log an invalid polygon mode.
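A sketch of register defaults covering the two commits above. The register names are hypothetical. The polygon mode values match OpenGL's `GL_POINT`/`GL_LINE`/`GL_FILL` constants; treating them as the register encoding is an assumption of this sketch.

```cpp
#include <cstdint>

// Assumed register encoding: the OpenGL enum values.
enum class PolygonMode : std::uint32_t {
    Point = 0x1B00,
    Line = 0x1B01,
    Fill = 0x1B02,
};

// Hypothetical subset of the 3D engine registers touched by these commits.
struct Regs {
    float line_width_smooth;
    float line_width_aliased;
    PolygonMode polygon_mode_front;
    PolygonMode polygon_mode_back;
};

// Values applied once at engine construction, before any game writes.
constexpr Regs MakeDefaultRegs() {
    Regs regs{};
    regs.line_width_smooth = 1.0f;  // never expose a zero line width
    regs.line_width_aliased = 1.0f;
    regs.polygon_mode_front = PolygonMode::Fill;
    regs.polygon_mode_back = PolygonMode::Fill;
    return regs;
}
```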
|
shader_ir: Add separate instructions for ordered and unordered comparisons and fix NE on GLSL
|
This allows us to use native SPIR-V instructions without having to
manually check for NaN.
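The difference between the two comparison families shows up with plain C++ floats. The helpers below are hypothetical and only model the semantics of the SPIR-V `OpFOrd*` and `OpFUnord*` instructions, not yuzu's shader IR.

```cpp
#include <cassert>
#include <cmath>

// Ordered comparisons (OpFOrd*) are false when either operand is NaN.
bool OrderedLessThan(float a, float b) {
    return a < b; // C++ relational operators are already ordered
}

// Unordered comparisons (OpFUnord*) are true when either operand is NaN.
bool UnorderedLessThan(float a, float b) {
    return std::isunordered(a, b) || a < b;
}

// Ordered not-equal must be false for NaN, so `!=` alone is not enough.
bool OrderedNotEqual(float a, float b) {
    return a < b || a > b;
}

// Unordered not-equal: C++ `!=` already returns true when either side is NaN.
bool UnorderedNotEqual(float a, float b) {
    return a != b;
}
```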
|
video_core: Implement viewport swizzles with NV_viewport_swizzle
|
GPU: More optimizations to GPU Command List Processing and DMA Copy Optimizations
|
{maxwell_3d,buffer_cache}: Implement memory barriers using 3D registers
|
maxwell_3d: Fix depth clamping register
|
shader: Implement P2R CC, IADD Rd.CC and IADD.X
|
Drop MemoryBarrier from the buffer cache and use Maxwell3D's register
WaitForIdle.
To implement this on OpenGL we just call glMemoryBarrier with the
necessary bits.
Vulkan lacks this synchronization primitive, so we set an event and
immediately wait for it. This is not a pretty solution, but it's what
Vulkan can do without submitting the current command buffer to the queue
(which ends up being more expensive on the CPU).
|
Using deko3d as reference:
https://github.com/devkitPro/deko3d/blob/4e47ba0013552e592a86ab7a2510d1e7dadf236a/source/maxwell/gpu_3d_state.cpp#L42
We were using bits 3 and 4 to determine depth clamping, but these bits
are set both when it is enabled and when it is disabled:
state->depthClampEnable ? 0x101A : 0x181D
The same happens on Nvidia's OpenGL driver, where they do something like
this (default capabilities, GL 4.5 compatibility):
(state & DEPTH_CLAMP) != 0 ? 0x201a : 0x281c
There's always a difference in the low bits of this register, but bit 11
is consistently cleared when depth clamping is enabled (and set when it
is disabled) on both deko3d/NVN and OpenGL. This commit changes yuzu's
behaviour to use bit 11 to determine depth clamping.
- Fixes depth issues on Super Mario Odyssey's intro.
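The bit 11 check can be verified against the four reference values quoted above. `IsDepthClampEnabled` is a hypothetical helper name. The register value is passed in as a plain integer.

```cpp
#include <cstdint>

// Bit 11 is clear when depth clamping is enabled and set when it is disabled.
constexpr bool IsDepthClampEnabled(std::uint32_t reg) {
    return ((reg >> 11) & 1) == 0;
}

// deko3d: depthClampEnable ? 0x101A : 0x181D
static_assert(IsDepthClampEnabled(0x101A), "deko3d, clamp enabled");
static_assert(!IsDepthClampEnabled(0x181D), "deko3d, clamp disabled");
// Nvidia OpenGL driver: (state & DEPTH_CLAMP) != 0 ? 0x201a : 0x281c
static_assert(IsDepthClampEnabled(0x201A), "GL, clamp enabled");
static_assert(!IsDepthClampEnabled(0x281C), "GL, clamp disabled");
```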
|
Optimize GPU Command Lists and Introduce Fast GPU Time Option
|
{gl,vk}_rasterizer: Add lazy default buffer maker and use it for empty buffers
|
IADD.X takes the carry flag and adds it to the result. This is generally
used to emulate 64-bit operations with 32-bit registers.
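A sketch of the 64-bit idiom: an IADD with `.CC` produces the low word and a carry-out, and IADD.X adds that carry into the high word. The helper names are hypothetical, and this does not model IADD.X writing the condition code itself.

```cpp
#include <cstdint>

struct AddResult {
    std::uint32_t value;
    bool carry;
};

// IADD with .CC: 32-bit add that also reports the carry-out.
constexpr AddResult AddWithCarryOut(std::uint32_t a, std::uint32_t b) {
    const std::uint32_t sum = a + b; // unsigned arithmetic wraps
    return AddResult{sum, sum < a};
}

// IADD.X: add the incoming carry flag to the result.
constexpr std::uint32_t AddExtended(std::uint32_t a, std::uint32_t b, bool carry_in) {
    return a + b + (carry_in ? 1u : 0u);
}

// 64-bit add emulated with two 32-bit halves per operand.
constexpr std::uint64_t Add64(std::uint64_t x, std::uint64_t y) {
    const AddResult low = AddWithCarryOut(static_cast<std::uint32_t>(x),
                                          static_cast<std::uint32_t>(y));
    const std::uint32_t high = AddExtended(static_cast<std::uint32_t>(x >> 32),
                                           static_cast<std::uint32_t>(y >> 32), low.carry);
    return (static_cast<std::uint64_t>(high) << 32) | low.value;
}
```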
|
decode/arithmetic_half: Fix HADD2 and HMUL2 absolute and negation bits
|
The encoding for negation and absolute value was wrong.
Extraction is now done manually. Similar instructions having different
encodings is the rule, not the exception. To keep the code sane and
readable, I preferred to extract the desired bits manually.
This is implemented against nxas:
https://github.com/ReinUsesLisp/nxas/blob/8dbc38995711cc12206aa370145a3a02665fd989/table.h#L68
That is itself tested against nvdisasm (Nvidia's official disassembler).