| Age | Commit message (Collapse) | Author |
|
|
|
|
|
offsets for Tlds
Formatting
|
|
Reimplement the buffer cache using cached bindings and page level
granularity for modification tracking. This also drops the usage of
shared pointers and virtual functions from the cache.
- Bindings are cached, allowing to skip work when the game changes few
bits between draws.
- OpenGL Assembly shaders no longer copy when a region has been modified
from the GPU to emulate constant buffers, instead GL_EXT_memory_object
is used to alias sub-buffers within the same allocation.
- OpenGL Assembly shaders stream constant buffer data using
glProgramBufferParametersIuivNV, from NV_parameter_buffer_object. In
theory this should save one hash table resolve inside the driver
compared to glBufferSubData.
- A new OpenGL stream buffer is implemented based on fences for drivers
that are not Nvidia's proprietary, due to their low performance on
partial glBufferSubData calls synchronized with 3D rendering (that
some games use a lot).
- Most optimizations are shared between APIs now, allowing Vulkan to
cache more bindings than before, skipping unnecesarry work.
This commit adds the necessary infrastructure to use Vulkan object from
OpenGL. Overall, it improves performance and fixes some bugs present on
the old cache. There are still some edge cases hit by some games that
harm performance on some vendors, this are planned to be fixed in later
commits.
|
|
|
|
The current texture cache has several points that hurt maintainability
and performance. It's easy to break unrelated parts of the cache
when doing minor changes. The cache can easily forget valuable
information about the cached textures by CPU writes or simply by its
normal usage.The current texture cache has several points that hurt
maintainability and performance. It's easy to break unrelated parts
of the cache when doing minor changes. The cache can easily forget
valuable information about the cached textures by CPU writes or simply
by its normal usage.
This commit aims to address those issues.
|
|
fmt now automatically prints the numeric value of an enum class member
by default, so we don't need to use casts any more.
Reduces the line noise a bit.
|
|
Cleans out the rest of the occurrences of variable shadowing and makes
any further occurrences of shadowing compiler errors.
|
|
Migrates the video core code closer to enabling variable shadowing
warnings as errors.
This primarily sorts out shadowing occurrences within the Vulkan code.
|
|
decoder/image: Fix incorrect G24R8 component sizes in GetComponentSize()
|
|
shader: Partially implement texture cube array shadow
|
|
This implements texture cube arrays with shadow comparisons but doesn't
fix the asserts related to it.
Fixes out of bounds reads on swizzle constructors and makes them use
bounds checked ::at instead of the unsafe operator[].
|
|
Trivially add the encoding for this.
|
|
TMML takes an array argument that has no known meaning, this one appears
as the first component in gpr8 followed by s, t and r. Skip this
component when arrays are being used. Also implement CUBE texture types.
- Used by Pikmin 3: Deluxe Demo.
|
|
Same behavior, minus any redundant atomic reference count increments and
decrements.
|
|
decoder/texture: Eliminate narrowing conversion in GetTldCode()
|
|
Fortunately this didn't result in any issues, given the block that code
was falling through to would immediately break.
|
|
The assignment was previously truncating a u64 value to a bool.
|
|
This forces us to fix all -Wswitch warnings in video_core.
|
|
We need to provide a message for this variant of the macro, so we can
simply log out the type being used.
|
|
|
|
video_core: Allow copy elision to take place where applicable
|
|
decode/other: Implement S2R.LaneId
|
|
Removes const from some variables that are returned from functions, as
this allows the move assignment/constructors to execute for them.
|
|
This maps to host's thread id.
- Fixes graphical issues on Paper Mario.
|
|
Normalizes pixel format names to match Vulkan names. Previous to this
commit pixel formats had no convention, leading to confusion and
potential bugs.
|
|
shader/half_set: Implement HSET2_IMM
|
|
Add HSET2_IMM. Due to the complexity of the encoding avoid using
BitField unions and read the relevant bits from the code itself.
This is less error prone.
|
|
- Used by Kirby Star Allies
|
|
Games using D3D idioms can join images and samplers when a shader
executes, instead of baking them into a combined sampler image. This is
also possible on Vulkan.
One approach to this solution would be to use separate samplers on
Vulkan and leave this unimplemented on OpenGL, but we can't do this
because there's no consistent way of determining which constant buffer
holds a sampler and which one an image. We could in theory find the
first bit and if it's in the TIC area, it's an image; but this falls
apart when an image or sampler handle use an index of zero.
The used approach is to track for a LOP.OR operation (this is done at an
IR level, not at an ISA level), track again the constant buffers used as
source and store this pair. Then, outside of shader execution, join
the sample and image pair with a bitwise or operation.
This approach won't work on games that truly use separate samplers in a
meaningful way. For example, pooling textures in a 2D array and
determining at runtime what sampler to use.
This invalidates OpenGL's disk shader cache :)
- Used mostly by D3D ports to Switch
|
|
shader/other: Fix hardcoded value in S2R INVOCATION_INFO
|
|
Geometry shaders built from Nvidia's compiler check for bits[16:23] to
be less than or equal to 0 with VSETP to default to a "safe" value of
0x8000'0000 (safe from hardware's perspective). To avoid hitting this
path in the shader, return 0x00ff'0000 from S2R INVOCATION_INFO.
This seems to be the maximum number of vertices a geometry shader can
emit in a primitive.
|
|
This silences an assertion we were hitting and uses workgroup memory
barriers when the game requests it.
|
|
shader/other: Implement BAR.SYNC 0x0
|
|
shader/memory: Implement non-addition operations in RED
|
|
Trivially implement this particular case of BAR. Unless games use OpenCL
or CUDA barriers, we shouldn't hit any other case here.
|
|
Trivially implement these instructions. They are used in Astral Chain.
|
|
Hardware S2R special registers match gl_Thread*MaskNV. We can trivially
implement these using Nvidia's extension on OpenGL or naively stubbing
them with the ARB instructions to match. This might cause issues if the
host device warp size doesn't match Nvidia's. That said, this is
unlikely on proper shaders.
Refer to the attached url for more documentation about these flags.
https://www.khronos.org/registry/OpenGL/extensions/NV/NV_shader_thread_group.txt
|
|
This allows us to use native SPIR-V instructions without having to
manually check for NAN.
|
|
shader/texture: Support multiple unknown sampler properties
|
|
This temporary is not needed as we mark Rd.CC + IADD.X as unimplemented.
It caused issues when tracking global buffers.
|
|
IADD.X Rd.CC requires some extra logic that is not currently
implemented. Abort when this is hit.
|
|
Signed integer addition overflow might be undefined behavior. It's free
to change operations to UAdd and use unsigned integers to avoid
potential bugs.
|
|
IADD.X takes the carry flag and adds it to the result. This is generally
used to emulate 64-bit operations with 32-bit registers.
|
|
|
|
P2R CC takes the state of condition codes and puts them into a register.
We already have this implemented for PR (predicates). This commit
implements CC over that.
|
|
Avoid atomic counters used by shared pointers.
|
|
decode/arithmetic_half: Fix HADD2 and HMUL2 absolute and negation bits
|
|
shader/arithmetic_integer: Fix LEA_IMM encoding
|
|
The encoding for negation and absolute value was wrong.
Extracting is now done manually. Similar instructions having different
encodings is the rule, not the exception. To keep sanity and readability
I preferred to extract the desired bit manually.
This is implemented against nxas:
https://github.com/ReinUsesLisp/nxas/blob/8dbc38995711cc12206aa370145a3a02665fd989/table.h#L68
That is itself tested against nvdisasm (Nvidia's official disassembler).
|