|
shader/half_set: Implement HSET2_IMM
|
|
Add HSET2_IMM. Due to the complexity of the encoding, avoid using
BitField unions and read the relevant bits directly from the instruction
word. This is less error-prone.
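A minimal sketch of reading fields straight from the 64-bit instruction word, in the spirit of the change above. The helper names `ExtractBits` and `ExtractBit` are hypothetical and not taken from the shader decoder; the bit positions passed to them would come from a reference such as nvdisasm.

```cpp
#include <cstdint>

// Hypothetical helper: read `count` bits starting at bit `offset` of a 64-bit
// instruction word. `offset` must be below 64 and offset + count at most 64.
// Reading bits directly avoids defining a BitField union per encoding variant.
constexpr std::uint64_t ExtractBits(std::uint64_t insn, unsigned offset, unsigned count) {
    const std::uint64_t mask =
        count >= 64 ? ~std::uint64_t{0} : (std::uint64_t{1} << count) - 1;
    return (insn >> offset) & mask;
}

// Convenience wrapper for single-bit flags such as negation or absolute value.
constexpr bool ExtractBit(std::uint64_t insn, unsigned offset) {
    return ExtractBits(insn, offset, 1) != 0;
}
```

Keeping the bit positions next to the call site makes it easy to compare each field against the disassembler's table when similar instructions use different layouts.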
|
|
- Used by Kirby Star Allies
|
|
Games using D3D idioms can join images and samplers when a shader
executes, instead of baking them into a combined image sampler. This is
also possible on Vulkan.
One approach would be to use separate samplers on Vulkan and leave this
unimplemented on OpenGL, but we can't do this because there's no
consistent way of determining which constant buffer holds a sampler and
which one holds an image. In theory we could find the first set bit and,
if it falls in the TIC area, treat the value as an image; but this falls
apart when an image or sampler handle uses an index of zero.
The approach used here is to track a LOP.OR operation (at the IR level,
not the ISA level), then track the constant buffers used as its sources
and store this pair. Then, outside of shader execution, join the sampler
and image handles with a bitwise OR.
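A rough sketch of the join step performed outside the shader, assuming the handle layout of yuzu's `Tegra::Texture::TextureHandle` (TIC index in bits 0 to 19, TSC index in bits 20 to 31). `JoinSeparateHandles`, `DecodeHandle` and this `TextureHandle` struct are illustrative names, not the emulator's API.

```cpp
#include <cstdint>

// Assumed layout: TIC (image) index in bits [0, 20), TSC (sampler) index in bits [20, 32).
struct TextureHandle {
    std::uint32_t tic_id;
    std::uint32_t tsc_id;
};

// The shader ORs two constant buffer values together at runtime. Outside of
// shader execution, read both tracked values and perform the same OR.
constexpr std::uint32_t JoinSeparateHandles(std::uint32_t image_cbuf_value,
                                            std::uint32_t sampler_cbuf_value) {
    return image_cbuf_value | sampler_cbuf_value;
}

// Split the joined 32-bit value back into its image and sampler indices.
constexpr TextureHandle DecodeHandle(std::uint32_t raw) {
    return TextureHandle{raw & 0xFFFFFu, raw >> 20};
}
```

A value whose index is zero contributes no set bits to the OR, so nothing in the raw value identifies which buffer it came from. That is why the pairing has to be recovered by tracking the LOP.OR sources rather than by inspecting bits.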
This approach won't work on games that truly use separate samplers in a
meaningful way, for example, pooling textures in a 2D array and
choosing at runtime which sampler to use.
This invalidates OpenGL's disk shader cache :)
- Used mostly by D3D ports to Switch
|
|
shader/other: Fix hardcoded value in S2R INVOCATION_INFO
|
|
Geometry shaders built with Nvidia's compiler use VSETP to check whether
bits[16:23] are less than or equal to 0 and, if so, default to a "safe"
value of 0x8000'0000 (safe from the hardware's perspective). To avoid
hitting this path in the shader, return 0x00ff'0000 from S2R
INVOCATION_INFO. This seems to be the maximum number of vertices a
geometry shader can emit in a primitive.
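A small sketch of why 0x00ff'0000 avoids the fallback: the tested field, bits [16:23], evaluates to 255. It assumes the VSETP comparison treats the field as unsigned; `InvocationInfoField` and `TakesSafePath` are hypothetical names.

```cpp
#include <cstdint>

// Value returned for S2R INVOCATION_INFO by this change.
constexpr std::uint32_t kInvocationInfo = 0x00ff'0000;

// Extract bits [16:23], the field the geometry shader tests with VSETP.
constexpr std::uint32_t InvocationInfoField(std::uint32_t info) {
    return (info >> 16) & 0xffu;
}

// The shader falls back to 0x8000'0000 when the field is <= 0. Assuming an
// unsigned comparison, "<= 0" is the same as "== 0".
constexpr bool TakesSafePath(std::uint32_t info) {
    return InvocationInfoField(info) == 0;
}
```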
|
|
This silences an assertion we were hitting and uses workgroup memory
barriers when the game requests it.
|
|
shader/other: Implement BAR.SYNC 0x0
|
|
shader/memory: Implement non-addition operations in RED
|
|
Trivially implement this particular case of BAR. Unless games use OpenCL
or CUDA barriers, we shouldn't hit any other case here.
|
|
Trivially implement these instructions. They are used in Astral Chain.
|
|
Hardware S2R special registers match gl_Thread*MaskNV. We can trivially
implement these using Nvidia's extension on OpenGL, or naively emulate
them with the equivalent ARB instructions. This might cause issues if the
host device's warp size doesn't match Nvidia's. That said, this is
unlikely to be a problem with proper shaders.
Refer to the URL below for more documentation about these flags.
https://www.khronos.org/registry/OpenGL/extensions/NV/NV_shader_thread_group.txt
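A sketch of the mask values these registers hold, following the definitions in NV_shader_thread_group and assuming a 32-thread warp. `ComputeThreadMasks` is a hypothetical helper, not part of any shader API.

```cpp
#include <cstdint>

// Masks for the invocation at `lane` (0..31) in a 32-thread warp.
struct ThreadMasks {
    std::uint32_t eq, ge, gt, le, lt;
};

// Computing in 64 bits avoids an undefined shift by 32 when lane == 31.
constexpr ThreadMasks ComputeThreadMasks(unsigned lane) {
    const std::uint64_t bit = std::uint64_t{1} << lane;
    const std::uint64_t all = 0xFFFF'FFFFu;
    return ThreadMasks{
        static_cast<std::uint32_t>(bit),                      // eq: only this lane
        static_cast<std::uint32_t>(all & ~(bit - 1)),         // ge: this lane and above
        static_cast<std::uint32_t>(all & ~((bit << 1) - 1)),  // gt: lanes above
        static_cast<std::uint32_t>((bit << 1) - 1),           // le: this lane and below
        static_cast<std::uint32_t>(bit - 1),                  // lt: lanes below
    };
}
```

On a host whose subgroup is wider than 32, the same formulas would produce 64-bit masks, which is where the warp size mismatch mentioned above could show up.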
|
|
This allows us to use native SPIR-V instructions without having to
manually check for NaN.
|
|
shader/texture: Support multiple unknown sampler properties
|
|
This temporary is not needed as we mark Rd.CC + IADD.X as unimplemented.
It caused issues when tracking global buffers.
|
|
IADD.X Rd.CC requires some extra logic that is not currently
implemented. Abort when this is hit.
|
|
Signed integer addition overflow might be undefined behavior. Since the
resulting bits are identical, changing these operations to UAdd on
unsigned integers costs nothing and avoids potential bugs.
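As an illustration in C++, where signed overflow is undefined behavior: an add on unsigned operands wraps modulo 2^32 and yields the same bit pattern a two's complement signed add would. `WrappingAdd` is a hypothetical name.

```cpp
#include <cstdint>

// Unsigned arithmetic wraps modulo 2^32, so this is well-defined for all inputs.
// Reinterpreting the result as signed gives the two's complement sum.
constexpr std::uint32_t WrappingAdd(std::uint32_t a, std::uint32_t b) {
    return a + b;
}
```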
|
|
IADD.X takes the carry flag and adds it to the result. This is generally
used to emulate 64-bit operations with 32-bit registers.
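A sketch of that 64-bit idiom, assuming the low halves are added with a carry-out (as a CC-setting IADD would record it) and the high halves with IADD.X semantics. The function names are hypothetical.

```cpp
#include <cstdint>

struct AddResult {
    std::uint32_t value;
    bool carry; // carry-out of the 32-bit add
};

// Low half: plain 32-bit add that records the carry.
constexpr AddResult AddWithCarryOut(std::uint32_t a, std::uint32_t b) {
    const std::uint32_t sum = a + b;
    return AddResult{sum, sum < a}; // wrapped result is smaller than an operand on carry
}

// High half: add the incoming carry flag to the result, as IADD.X does.
constexpr std::uint32_t AddExtended(std::uint32_t a, std::uint32_t b, bool carry_in) {
    return a + b + (carry_in ? 1u : 0u);
}

// 64-bit addition built from two 32-bit adds.
constexpr std::uint64_t Add64(std::uint64_t x, std::uint64_t y) {
    const AddResult low = AddWithCarryOut(static_cast<std::uint32_t>(x),
                                          static_cast<std::uint32_t>(y));
    const std::uint32_t high = AddExtended(static_cast<std::uint32_t>(x >> 32),
                                           static_cast<std::uint32_t>(y >> 32), low.carry);
    return (std::uint64_t{high} << 32) | low.value;
}
```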
|
|
P2R CC takes the state of the condition codes and puts it into a register.
We already have this implemented for PR (predicates). This commit
implements the CC variant on top of that.
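A hedged sketch of P2R CC modeled on the predicate variant: pack the flags into bits, keep those selected by the mask, and insert the result into one byte of the destination. The flag order (zero, sign, carry, overflow in bits 0 to 3), the 8-bit mask and the byte selector are assumptions for illustration, not a verified description of the encoding; `ConditionCodes` and `P2RCC` are hypothetical names.

```cpp
#include <cstdint>

// Hypothetical model of the four condition code flags.
struct ConditionCodes {
    bool zero, sign, carry, overflow;
};

// `byte` (0..3) selects which byte of Rd receives the packed flags.
constexpr std::uint32_t P2RCC(std::uint32_t rd, ConditionCodes cc, std::uint32_t mask,
                              unsigned byte) {
    // Assumed order: zero = bit 0, sign = bit 1, carry = bit 2, overflow = bit 3.
    std::uint32_t packed = (cc.zero ? 1u : 0u) | (cc.sign ? 2u : 0u) |
                           (cc.carry ? 4u : 0u) | (cc.overflow ? 8u : 0u);
    packed &= mask & 0xffu;
    const unsigned shift = byte * 8;
    // Replace only the selected byte of the destination register.
    return (rd & ~(0xffu << shift)) | (packed << shift);
}
```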
|
|
Avoid atomic counters used by shared pointers.
|
|
decode/arithmetic_half: Fix HADD2 and HMUL2 absolute and negation bits
|
|
shader/arithmetic_integer: Fix LEA_IMM encoding
|
|
The encoding for negation and absolute value was wrong.
These bits are now extracted manually. Similar instructions having
different encodings is the rule, not the exception, so to keep sanity
and readability I preferred to extract the desired bits by hand.
This is implemented against nxas:
https://github.com/ReinUsesLisp/nxas/blob/8dbc38995711cc12206aa370145a3a02665fd989/table.h#L68
That is itself tested against nvdisasm (Nvidia's official disassembler).
|
|
This allows deducing some properties from the texture instruction before
asking the runtime. By doing this we can handle type mismatches in some
instructions from the renderer instead of the shader decoder.
Fixes texelFetch issues with games using 2D texture instructions on a 1D
sampler.
|
|
The operand order in LEA_IMM was flipped compared to nvdisasm. Fix that
using nxas as reference:
https://github.com/ReinUsesLisp/nxas/blob/8dbc38995711cc12206aa370145a3a02665fd989/table.h#L122
|
|
Only the first element of the returned pair is ever used.
|
|
Some variables are unused, so we can remove them.
Unfortunately, diagnostics are still reported on structured bindings
even when they are annotated with [[maybe_unused]], so we need to
manually unpack the elements that we want to use.
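A minimal illustration of the workaround: rather than binding both elements of a pair and leaving one unused, read only the member that is needed. `FindSlot` and `SlotFor` are hypothetical functions.

```cpp
#include <utility>

// Hypothetical lookup returning {index, found}; callers only need the index.
constexpr std::pair<int, bool> FindSlot(int key) {
    return {key * 2, key >= 0};
}

constexpr int SlotFor(int key) {
    // Instead of `const auto [index, found] = FindSlot(key);` with `found`
    // unused, unpack only the element that is actually used.
    const int index = FindSlot(key).first;
    return index;
}
```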
|
|
Same behavior, less code.
|
|
We can just specify the initializers.
|
|
CMakeLists: Specify -Wextra on linux builds
|
|
Removes a redundant variable whose value is already provided by the
IsFull() utility function.
|
|
Allows reporting more cases where logic errors may exist, such as
implicit fallthrough cases.
We currently ignore unused parameters, since we have many cases where
this is intentional (virtual interfaces).
While we're at it, we can also tidy up any existing code that causes
warnings. This also uncovered a few bugs.
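As an example of the fallthrough case: GCC enables `-Wimplicit-fallthrough` as part of `-Wextra`, and marking an intentional fallthrough with C++17's `[[fallthrough]]` silences the warning while documenting intent. `CountComponents` is a made-up function for illustration.

```cpp
// Counts down from `kind` to 1 by falling through each case on purpose.
constexpr int CountComponents(int kind) {
    int count = 0;
    switch (kind) {
    case 4:
        ++count;
        [[fallthrough]];
    case 3:
        ++count;
        [[fallthrough]];
    case 2:
        ++count;
        [[fallthrough]];
    case 1:
        ++count;
        break;
    default:
        break;
    }
    return count;
}
```

Without the attributes, the same code compiles identically but GCC reports each unannotated fallthrough under `-Wextra`.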
|
|
shader/memory: Implement RED.E.ADD and minor changes to ATOM
|
|
Adds another variant of FCMP.
|
|
shader/conversion: Implement I2I sign extension, saturation and selection
|
|
shader/texture: Remove type mismatches management from shader decoder
|
|
shader/video: Partially implement VMNMX
|
|
Implements the common usages for VMNMX. Inputs narrower than 32 bits are
not supported, and neither are sign mismatches.
VMNMX works as follows:
It takes Ra and Rb and computes their maximum or minimum (selected by
.MX), taking the input sign into account. This result can then be
saturated. After the intermediate result is calculated, it applies
another operation on it using Rc. These operations are merges,
accumulations or another min/max pass.
This instruction allows implementing GCN's min3 and max3 instructions
(for instance) in a more flexible way.
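A sketch of the 32-bit path described above, assuming `.MX` selects the maximum and that Ra, Rb and Rc share one signedness. Saturation and the merge modes are left out, and `VmnmxOp`, `MinMax` and `Vmnmx` are hypothetical names.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical secondary operations applied with Rc (merges omitted).
enum class VmnmxOp { None, Acc, Min, Max };

// Min or max of two 32-bit values, interpreted as signed or unsigned.
constexpr std::uint32_t MinMax(std::uint32_t a, std::uint32_t b, bool is_signed,
                               bool take_max) {
    if (is_signed) {
        const auto sa = static_cast<std::int32_t>(a);
        const auto sb = static_cast<std::int32_t>(b);
        return static_cast<std::uint32_t>(take_max ? std::max(sa, sb) : std::min(sa, sb));
    }
    return take_max ? std::max(a, b) : std::min(a, b);
}

constexpr std::uint32_t Vmnmx(std::uint32_t ra, std::uint32_t rb, std::uint32_t rc,
                              bool is_signed, bool mx, VmnmxOp op) {
    // First pass: min or max of Ra and Rb, selected by .MX.
    const std::uint32_t intermediate = MinMax(ra, rb, is_signed, mx);
    // Second pass: combine the intermediate result with Rc.
    switch (op) {
    case VmnmxOp::Acc:
        return intermediate + rc; // wraps modulo 2^32
    case VmnmxOp::Min:
        return MinMax(intermediate, rc, is_signed, false);
    case VmnmxOp::Max:
        return MinMax(intermediate, rc, is_signed, true);
    case VmnmxOp::None:
    default:
        return intermediate;
    }
}
```

For example, `Vmnmx(a, b, c, false, false, VmnmxOp::Min)` computes the unsigned minimum of three values, the same result as a min3.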
|
|
Since commit e22816a5bb we handle type mismatches from the CPU.
We no longer need hacks in our shader decoder to work around game bugs,
so they are removed in this commit.
|
|
video_core/shader: Add some instruction and S2R encodings
|
|
shader: Implement SULD.D bits32/64
|
|
Reimplements I2I, adding sign extension, saturation (clamping the source
value to the destination's range), selection, and destination sizes
other than 32 bits.
It doesn't implement CC yet.
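A sketch of the selection, sign extension and saturation steps for a byte-sized source, returning a 32-bit register value. All names are hypothetical, narrower destinations are only modeled through saturation, and CC is ignored, as in the change above.

```cpp
#include <algorithm>
#include <cstdint>

// Pick byte `selector` (0..3) of the source and sign- or zero-extend it.
constexpr std::int64_t SelectSourceByte(std::uint32_t src, unsigned selector, bool src_signed) {
    const std::uint32_t byte = (src >> (selector * 8)) & 0xffu;
    if (src_signed) {
        // Flip the sign bit and subtract 0x80: interprets the byte as two's complement.
        return static_cast<std::int64_t>(byte ^ 0x80u) - 0x80;
    }
    return byte;
}

// Clamp a value to the range of a `dst_bits`-wide destination (8, 16 or 32).
constexpr std::int64_t SaturateTo(std::int64_t value, unsigned dst_bits, bool dst_signed) {
    const std::int64_t lo = dst_signed ? -(std::int64_t{1} << (dst_bits - 1)) : 0;
    const std::int64_t hi = dst_signed ? (std::int64_t{1} << (dst_bits - 1)) - 1
                                       : (std::int64_t{1} << dst_bits) - 1;
    return std::clamp(value, lo, hi);
}

// Selection + extension, then optional saturation. Negative results come back
// sign-extended to 32 bits.
constexpr std::uint32_t I2ISketch(std::uint32_t src, unsigned selector, bool src_signed,
                                  unsigned dst_bits, bool dst_signed, bool saturate) {
    std::int64_t value = SelectSourceByte(src, selector, src_signed);
    if (saturate) {
        value = SaturateTo(value, dst_bits, dst_signed);
    }
    return static_cast<std::uint32_t>(value);
}
```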
|
|
Co-Authored-By: Rodrigo Locatti <reinuseslisp@airmail.cc>
|
|
removed shader stage.
|