aboutsummaryrefslogtreecommitdiff
path: root/src/video_core/engines
AgeCommit message (Collapse)Author
2020-10-28shader/arithmetic: Implement FCMP immediate + register variantReinUsesLisp
Trivially add the encoding for this.
2020-10-09video_core: Enforce -Wclass-memaccessReinUsesLisp
2020-10-02video_core: Enforce -Wunused-variable and -Wunused-but-set-variableReinUsesLisp
2020-09-22General: Make use of std::nullopt where applicableLioncash
Allows some implementations to avoid completely zeroing out the internal buffer of the optional, and instead only set the validity byte within the structure. This also makes it consistent how we return empty optionals.
2020-09-18fermi_2d: Make use of designated initializersLioncash
Same behavior, less repetition. We can also ensure all members of Config are initialized.
2020-08-22video_core: Initialize renderer with a GPUReinUsesLisp
Add an extra step in GPU initialization to be able to initialize render backends with a valid GPU instance.
2020-08-16Merge pull request #4519 from lioncash/semibunnei
maxwell_3d: Resolve -Wextra-semi warning
2020-08-14maxwell_3d: Resolve -Wextra-semi warningLioncash
Semicolons after a function definition aren't necessary.
2020-08-10textures/decoders: Fix block linear to pitch copiesReinUsesLisp
There were two issues with block linear copies. First the swizzling was wrong and this commit reimplements them. The other issue was that these copies are generally used to download render targets from the GPU and yuzu was not downloading them from host GPU memory unless the extreme GPU accuracy setting was selected. This commit enables cached memory reads for all accuracy levels. - Fixes level thumbnails in Super Mario Maker 2.
2020-07-10video_core/textures: Add and use SwizzleSliceToVoxel, and minor style changesReinUsesLisp
Change GOB sizes from free-functions to constexpr constants. Add SwizzleSliceToVoxel, a function that swizzles a 2D array of pixels into a 3D texture and use it for 3D copies.
2020-07-07maxwell_dma: Rename registers to match official docs and reorderReinUsesLisp
Rename registers in the MaxwellDMA class to match Nvidia's official documentation. This one can be found here: https://github.com/NVIDIA/open-gpu-doc/blob/master/classes/dma-copy/clb0b5.h While we are at it, reorganize the code in MaxwellDMA to be separated in different functions.
2020-06-26Merge pull request #4147 from ReinUsesLisp/hset2-immbunnei
shader/half_set: Implement HSET2_IMM
2020-06-24Addressed issuesDavid Marcec
2020-06-24Macro HLE supportDavid Marcec
2020-06-22shader/half_set: Implement HSET2_IMMReinUsesLisp
Add HSET2_IMM. Due to the complexity of the encoding avoid using BitField unions and read the relevant bits from the code itself. This is less error prone.
2020-06-13Merge pull request #4049 from ReinUsesLisp/separate-samplersbunnei
shader/texture: Join separate image and sampler pairs offline
2020-06-08texture_cache: Implement rendering to 3D texturesReinUsesLisp
This allows rendering to 3D textures with more than one slice. Applications are allowed to render to more than one slice of a texture using gl_Layer from a VTG shader. This also requires reworking how 3D texture collisions are handled, for now, this commit allows rendering to slices but not to miplevels. When a render target attempts to write to a mipmap, we fallback to the previous implementation (copying or flushing as needed). - Fixes color correction 3D textures on UE4 games (rainbow effects). - Allows Xenoblade games to render to 3D textures directly.
2020-06-05shader/texture: Join separate image and sampler pairs offlineReinUsesLisp
Games using D3D idioms can join images and samplers when a shader executes, instead of baking them into a combined sampler image. This is also possible on Vulkan. One approach to this solution would be to use separate samplers on Vulkan and leave this unimplemented on OpenGL, but we can't do this because there's no consistent way of determining which constant buffer holds a sampler and which one an image. We could in theory find the first bit and if it's in the TIC area, it's an image; but this falls apart when an image or sampler handle use an index of zero. The used approach is to track for a LOP.OR operation (this is done at an IR level, not at an ISA level), track again the constant buffers used as source and store this pair. Then, outside of shader execution, join the sample and image pair with a bitwise or operation. This approach won't work on games that truly use separate samplers in a meaningful way. For example, pooling textures in a 2D array and determining at runtime what sampler to use. This invalidates OpenGL's disk shader cache :) - Used mostly by D3D ports to Switch
2020-06-04Merge pull request #4009 from ogniK5377/macro-jit-prodbunnei
video_core: Implement Macro JIT
2020-06-04Default init labels and use initializer list for macro engineDavid Marcec
2020-06-03Mark parameters as constDavid Marcec
2020-06-02Pass by reference instead of copying parametersDavid Marcec
2020-06-01Merge pull request #3998 from ReinUsesLisp/init-3dbunnei
maxwell_3d: Initialize more registers to their expected value
2020-05-30Implement macro JITDavid Marcec
2020-05-28maxwell_3d: Reduce severity of logs that can be spammedReinUsesLisp
These logs were killing performance on some games when they were spammed. Reduce them to Debug severity.
2020-05-27maxwell_3d: Initialize line widthsReinUsesLisp
Initialize line widths to avoid setting a line width of zero.
2020-05-27maxwell_3d: Initialize polygon modesReinUsesLisp
NVN expects this to be initialized as Fill, otherwise games that never bind a rasterizer state will log an invalid polygon mode.
2020-05-13Merge pull request #3899 from ReinUsesLisp/float-comparisonsbunnei
shader_ir: Add separate instructions for ordered and unordered comparisons and fix NE on GLSL
2020-05-09shader_ir: Separate float-point comparisons in ordered and unorderedReinUsesLisp
This allows us to use native SPIR-V instructions without having to manually check for NAN.
2020-05-08Merge pull request #3885 from ReinUsesLisp/viewport-swizzlesbunnei
video_core: Implement viewport swizzles with NV_viewport_swizzle
2020-05-05Merge pull request #3815 from FernandoS27/command-list-2bunnei
GPU: More optimizations to GPU Command List Processing and DMA Copy Optimizations
2020-05-04vk_graphics_pipeline: Implement viewport swizzles with NV_viewport_swizzleReinUsesLisp
2020-05-04maxwell_3d: Add viewport swizzlesReinUsesLisp
2020-05-03Merge pull request #3808 from ReinUsesLisp/wait-for-idlebunnei
{maxwell_3d,buffer_cache}: Implement memory barriers using 3D registers
2020-04-30Merge pull request #3807 from ReinUsesLisp/fix-depth-clampbunnei
maxwell_3d: Fix depth clamping register
2020-04-30Merge pull request #3799 from ReinUsesLisp/iadd-ccbunnei
shader: Implement P2R CC, IADD Rd.CC and IADD.X
2020-04-28Clang Format and Documentation.Fernando Sahmkow
2020-04-28MaxwellDMA: Optimize micro copies.Fernando Sahmkow
2020-04-28{maxwell_3d,buffer_cache}: Implement memory barriers using 3D registersReinUsesLisp
Drop MemoryBarrier from the buffer cache and use Maxwell3D's register WaitForIdle. To implement this on OpenGL we just call glMemoryBarrier with the necessary bits. Vulkan lacks this synchronization primitive, so we set an event and immediately wait for it. This is not a pretty solution, but it's what Vulkan can do without submitting the current command buffer to the queue (which ends up being more expensive on the CPU).
2020-04-27VideoCore/Engines: Refactor Engines CallMethod.Fernando Sahmkow
2020-04-27maxwell_3d: Fix depth clamping registerReinUsesLisp
Using deko3d as reference: https://github.com/devkitPro/deko3d/blob/4e47ba0013552e592a86ab7a2510d1e7dadf236a/source/maxwell/gpu_3d_state.cpp#L42 We were using bits 3 and 4 to determine depth clamping, but these are the same both enabled and disabled: state->depthClampEnable ? 0x101A : 0x181D The same happens on Nvidia's OpenGL driver, where they do something like this (default capabilities, GL 4.5 compatibility): (state & DEPTH_CLAMP) != 0 ? 0x201a : 0x281c There's always a difference between the first bits in this register, but bit 11 is consistently disabled on both deko3d/NVN and OpenGL. This commit changes yuzu's behaviour to use bit 11 to determine depth clamping. - Fixes depth issues on Super Mario Odyssey's intro.
2020-04-27Merge pull request #3742 from FernandoS27/command-listbunnei
Optimize GPU Command Lists and Introduce Fast GPU Time Option
2020-04-26Merge pull request #3753 from ReinUsesLisp/ac-vulkanRodrigo Locatti
{gl,vk}_rasterizer: Add lazy default buffer maker and use it for empty buffers
2020-04-25shader/arithmetic_integer: Implement IADD.XReinUsesLisp
IADD.X takes the carry flag and adds it to the result. This is generally used to emulate 64-bit operations with 32-bit registers.
2020-04-25Merge pull request #3734 from ReinUsesLisp/half-float-modsbunnei
decode/arithmetic_half: Fix HADD2 and HMUL2 absolute and negation bits
2020-04-24Fix -Wdeprecated-copy warning.Markus Wick
2020-04-23decode/arithmetic_half: Fix HADD2 and HMUL2 absolute and negation bitsReinUsesLisp
The encoding for negation and absolute value was wrong. Extracting is now done manually. Similar instructions having different encodings is the rule, not the exception. To keep sanity and readability I preferred to extract the desired bit manually. This is implemented against nxas: https://github.com/ReinUsesLisp/nxas/blob/8dbc38995711cc12206aa370145a3a02665fd989/table.h#L68 That is itself tested against nvdisasm (Nvidia's official disassembler).
2020-04-23Clang Format.Fernando Sahmkow
2020-04-23Maxwell3D: Process Macros on MultiMethod.Fernando Sahmkow
2020-04-23DMAPusher: Propagate multimethod writes into the engines.Fernando Sahmkow