<feed xmlns='http://www.w3.org/2005/Atom'>
<title>Ryujinx/Ryujinx.Graphics.Gpu/GpuContext.cs, branch master</title>
<subtitle>A backup of the Ryujinx master git branch.
</subtitle>
<link rel='alternate' type='text/html' href='https://git.benis.co.uk/Ryujinx/'/>
<entry>
<title>Move solution and projects to src</title>
<updated>2023-04-27T21:51:14+00:00</updated>
<author>
<name>TSR Berry</name>
<email>20988865+TSRBerry@users.noreply.github.com</email>
</author>
<published>2023-04-07T23:22:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.benis.co.uk/Ryujinx/commit/?id=cee712105850ac3385cd0091a923438167433f9f'/>
<id>cee712105850ac3385cd0091a923438167433f9f</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>Vulkan: Don't flush commands when creating most sync (#4087)</title>
<updated>2022-12-29T14:39:04+00:00</updated>
<author>
<name>riperiperi</name>
<email>rhy3756547@hotmail.com</email>
</author>
<published>2022-12-29T14:39:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.benis.co.uk/Ryujinx/commit/?id=e20abbf9cc93167aa073a81a65c7e8fd60f259b1'/>
<id>e20abbf9cc93167aa073a81a65c7e8fd60f259b1</id>
<content type='text'>
* Vulkan: Don't flush commands when creating most sync

When the WaitForIdle method is called, we create sync as some internal GPU method may read back written buffer data. Some games randomly intersperse compute dispatch into their render passes, which result in this happening an unbounded number of times depending on how many times they run compute.

Creating sync in Vulkan is expensive, as we need to flush the current command buffer so that it can be waited on. We have a limited number of active command buffers due to how we track resource usage, so submitting too many command buffers will force us to wait for them to return to the pool.

This PR allows less "important" sync (things which are less likely to be waited on) to wait on a command buffer's result without submitting it, instead relying on AutoFlush or another, more important sync to flush it later on.

Because of the possibility of us waiting for a command buffer that hasn't submitted yet, any thread needs to be able to force the active command buffer to submit. The ability to do this has been added to the backend multithreading via an "Interrupt", though it is not supported without multithreading.

OpenGL drivers should already be doing something similar so they don't blow up when creating lots of sync, which is why this hasn't been a problem for these games over there.

Improves Vulkan performance on Xenoblade DE, Pokemon Scarlet/Violet, and Zelda BOTW (still another large issue here)

* Add strict argument

This is technically a separate concern from whether the sync is a host syncpoint.

* Remove _interrupted variable

* Actually wait for the invoke

This is required by AMD GPUs, and also may have caused some issues on other GPUs.

* Remove unused using.

* I don't know why it added these ones.

* Address Feedback

* Fix typo</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* Vulkan: Don't flush commands when creating most sync

When the WaitForIdle method is called, we create sync as some internal GPU method may read back written buffer data. Some games randomly intersperse compute dispatch into their render passes, which result in this happening an unbounded number of times depending on how many times they run compute.

Creating sync in Vulkan is expensive, as we need to flush the current command buffer so that it can be waited on. We have a limited number of active command buffers due to how we track resource usage, so submitting too many command buffers will force us to wait for them to return to the pool.

This PR allows less "important" sync (things which are less likely to be waited on) to wait on a command buffer's result without submitting it, instead relying on AutoFlush or another, more important sync to flush it later on.

Because of the possibility of us waiting for a command buffer that hasn't submitted yet, any thread needs to be able to force the active command buffer to submit. The ability to do this has been added to the backend multithreading via an "Interrupt", though it is not supported without multithreading.

OpenGL drivers should already be doing something similar so they don't blow up when creating lots of sync, which is why this hasn't been a problem for these games over there.

Improves Vulkan performance on Xenoblade DE, Pokemon Scarlet/Violet, and Zelda BOTW (still another large issue here)

* Add strict argument

This is technically a separate concern from whether the sync is a host syncpoint.

* Remove _interrupted variable

* Actually wait for the invoke

This is required by AMD GPUs, and also may have caused some issues on other GPUs.

* Remove unused using.

* I don't know why it added these ones.

* Address Feedback

* Fix typo</pre>
</div>
</content>
</entry>
<entry>
<title>GPU: Track buffer migrations and flush source on incomplete copy (#3952)</title>
<updated>2022-12-01T15:30:13+00:00</updated>
<author>
<name>riperiperi</name>
<email>rhy3756547@hotmail.com</email>
</author>
<published>2022-12-01T15:30:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.benis.co.uk/Ryujinx/commit/?id=458452279cee03bfe1bbf2c3daf3fc9722b03a74'/>
<id>458452279cee03bfe1bbf2c3daf3fc9722b03a74</id>
<content type='text'>
* Track buffer migrations and flush source on incomplete copy

Makes sure that the modified range list is always from the latest iteration of the buffer, and flushes earlier iterations of a buffer if the data has not been migrated yet.

* Cleanup 1

* Reduce cost for redundant signal checks on Vulkan

* Only inherit the range list if there are pending ranges.

* Fix OpenGL

* Address Feedback

* Whoops</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* Track buffer migrations and flush source on incomplete copy

Makes sure that the modified range list is always from the latest iteration of the buffer, and flushes earlier iterations of a buffer if the data has not been migrated yet.

* Cleanup 1

* Reduce cost for redundant signal checks on Vulkan

* Only inherit the range list if there are pending ranges.

* Fix OpenGL

* Address Feedback

* Whoops</pre>
</div>
</content>
</entry>
<entry>
<title>Use RGBA16 vertex format if RGB16 is not supported on Vulkan (#3552)</title>
<updated>2022-08-20T19:20:27+00:00</updated>
<author>
<name>gdkchan</name>
<email>gab.dark.100@gmail.com</email>
</author>
<published>2022-08-20T19:20:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.benis.co.uk/Ryujinx/commit/?id=88a0e720cbe567aff67b6124aa9e6cfc17599739'/>
<id>88a0e720cbe567aff67b6124aa9e6cfc17599739</id>
<content type='text'>
* Use RGBA16 vertex format if RGB16 is not supported on Vulkan

* Catch all shader compilation exceptions</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* Use RGBA16 vertex format if RGB16 is not supported on Vulkan

* Catch all shader compilation exceptions</pre>
</div>
</content>
</entry>
<entry>
<title>Prefetch capabilities before spawning translation threads. (#3338)</title>
<updated>2022-05-14T14:58:33+00:00</updated>
<author>
<name>riperiperi</name>
<email>rhy3756547@hotmail.com</email>
</author>
<published>2022-05-14T14:58:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.benis.co.uk/Ryujinx/commit/?id=9ba73ffbe5f78c0403cf102b95768f388da05122'/>
<id>9ba73ffbe5f78c0403cf102b95768f388da05122</id>
<content type='text'>
* Prefetch capabilities before spawning translation threads.

The Backend Multithreading only expects one thread to submit commands at a time. When compiling shaders, the translator may request the host GPU capabilities from the backend. It's possible for a bunch of translators to do this at the same time.

There's a caching mechanism in place so that the capabilities are only fetched once. By triggering this before spawning the thread, the async translation threads no longer try to queue onto the backend queue all at the same time.

The Capabilities do need to be checked from the GPU thread, due to OpenGL needing a context to check them, so it's not possible to call the underlying backend directly.

* Initialize the capabilities when setting the GPU thread + missing call in headless

* Remove private variables</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* Prefetch capabilities before spawning translation threads.

The Backend Multithreading only expects one thread to submit commands at a time. When compiling shaders, the translator may request the host GPU capabilities from the backend. It's possible for a bunch of translators to do this at the same time.

There's a caching mechanism in place so that the capabilities are only fetched once. By triggering this before spawning the thread, the async translation threads no longer try to queue onto the backend queue all at the same time.

The Capabilities do need to be checked from the GPU thread, due to OpenGL needing a context to check them, so it's not possible to call the underlying backend directly.

* Initialize the capabilities when setting the GPU thread + missing call in headless

* Remove private variables</pre>
</div>
</content>
</entry>
<entry>
<title>New shader cache implementation (#3194)</title>
<updated>2022-04-10T13:49:44+00:00</updated>
<author>
<name>gdkchan</name>
<email>gab.dark.100@gmail.com</email>
</author>
<published>2022-04-10T13:49:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.benis.co.uk/Ryujinx/commit/?id=43ebd7a9bbba0c1290a9e98b9224f0752627c400'/>
<id>43ebd7a9bbba0c1290a9e98b9224f0752627c400</id>
<content type='text'>
* New shader cache implementation

* Remove some debug code

* Take transform feedback varying count into account

* Create shader cache directory if it does not exist + fragment output map related fixes

* Remove debug code

* Only check texture descriptors if the constant buffer is bound

* Also check CPU VA on GetSpanMapped

* Remove more unused code and move cache related code

* XML docs + remove more unused methods

* Better codegen for TransformFeedbackDescriptor.AsSpan

* Support migration from old cache format, remove more unused code

Shader cache rebuild now also rewrites the shared toc and data files

* Fix migration error with BRX shaders

* Add a limit to the async translation queue

 Avoid async translation threads not being able to keep up and the queue growing very large

* Re-create specialization state on recompile

This might be required if a new version of the shader translator requires more or less state, or if there is a bug related to the GPU state access

* Make shader cache more error resilient

* Add some missing XML docs and move GpuAccessor docs to the interface/use inheritdoc

* Address early PR feedback

* Fix rebase

* Remove IRenderer.CompileShader and IShader interface, replace with new ShaderSource struct passed to CreateProgram directly

* Handle some missing exceptions

* Make shader cache purge delete both old and new shader caches

* Register textures on new specialization state

* Translate and compile shaders in forward order (eliminates diffs due to different binding numbers)

* Limit in-flight shader compilation to the maximum number of compilation threads

* Replace ParallelDiskCacheLoader state changed event with a callback function

* Better handling for invalid constant buffer 1 data length

* Do not create the old cache directory structure if the old cache does not exist

* Constant buffer use should be per-stage. This change will invalidate existing new caches (file format version was incremented)

* Replace rectangle texture with just coordinate normalization

* Skip incompatible shaders that are missing texture information, instead of crashing

This is required if we, for example, support new texture instruction to the shader translator, and then they allow access to textures that were not accessed before. In this scenario, the old cache entry is no longer usable

* Fix coordinates normalization on cubemap textures

* Check if title ID is null before combining shader cache path

* More robust constant buffer address validation on spec state

* More robust constant buffer address validation on spec state (2)

* Regenerate shader cache with one stream, rather than one per shader.

* Only create shader cache directory during initialization

* Logging improvements

* Proper shader program disposal

* PR feedback, and add a comment on serialized structs

* XML docs for RegisterTexture

Co-authored-by: riperiperi &lt;rhy3756547@hotmail.com&gt;</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* New shader cache implementation

* Remove some debug code

* Take transform feedback varying count into account

* Create shader cache directory if it does not exist + fragment output map related fixes

* Remove debug code

* Only check texture descriptors if the constant buffer is bound

* Also check CPU VA on GetSpanMapped

* Remove more unused code and move cache related code

* XML docs + remove more unused methods

* Better codegen for TransformFeedbackDescriptor.AsSpan

* Support migration from old cache format, remove more unused code

Shader cache rebuild now also rewrites the shared toc and data files

* Fix migration error with BRX shaders

* Add a limit to the async translation queue

 Avoid async translation threads not being able to keep up and the queue growing very large

* Re-create specialization state on recompile

This might be required if a new version of the shader translator requires more or less state, or if there is a bug related to the GPU state access

* Make shader cache more error resilient

* Add some missing XML docs and move GpuAccessor docs to the interface/use inheritdoc

* Address early PR feedback

* Fix rebase

* Remove IRenderer.CompileShader and IShader interface, replace with new ShaderSource struct passed to CreateProgram directly

* Handle some missing exceptions

* Make shader cache purge delete both old and new shader caches

* Register textures on new specialization state

* Translate and compile shaders in forward order (eliminates diffs due to different binding numbers)

* Limit in-flight shader compilation to the maximum number of compilation threads

* Replace ParallelDiskCacheLoader state changed event with a callback function

* Better handling for invalid constant buffer 1 data length

* Do not create the old cache directory structure if the old cache does not exist

* Constant buffer use should be per-stage. This change will invalidate existing new caches (file format version was incremented)

* Replace rectangle texture with just coordinate normalization

* Skip incompatible shaders that are missing texture information, instead of crashing

This is required if we, for example, support new texture instruction to the shader translator, and then they allow access to textures that were not accessed before. In this scenario, the old cache entry is no longer usable

* Fix coordinates normalization on cubemap textures

* Check if title ID is null before combining shader cache path

* More robust constant buffer address validation on spec state

* More robust constant buffer address validation on spec state (2)

* Regenerate shader cache with one stream, rather than one per shader.

* Only create shader cache directory during initialization

* Logging improvements

* Proper shader program disposal

* PR feedback, and add a comment on serialized structs

* XML docs for RegisterTexture

Co-authored-by: riperiperi &lt;rhy3756547@hotmail.com&gt;</pre>
</div>
</content>
</entry>
<entry>
<title>misc: Make PID unsigned long instead of long (#3043)</title>
<updated>2022-02-09T20:18:07+00:00</updated>
<author>
<name>Mary</name>
<email>mary@mary.zone</email>
</author>
<published>2022-02-09T20:18:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.benis.co.uk/Ryujinx/commit/?id=6dffe0fad4bc8dee0e25ce038639d890b29d56a0'/>
<id>6dffe0fad4bc8dee0e25ce038639d890b29d56a0</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>Add timestamp to 16-byte/4-word semaphore releases. (#3049)</title>
<updated>2022-01-27T21:50:32+00:00</updated>
<author>
<name>riperiperi</name>
<email>rhy3756547@hotmail.com</email>
</author>
<published>2022-01-27T21:50:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.benis.co.uk/Ryujinx/commit/?id=c52158b73361bd25364c23c2b39780d2e626c858'/>
<id>c52158b73361bd25364c23c2b39780d2e626c858</id>
<content type='text'>
* Add timestamp to 16-byte semaphore releases.

BOTW was reading a ulong 8 bytes after a semaphore return. Turns out this is the timestamp it was trying to do performance calculation with, so I've made it write when necessary.

This mode was also added to the DMA semaphore I added recently, as it is required by a few games. (i think quake?)

The timestamp code has been moved to GPU context. Check other games with an unusually low framerate cap or dynamic resolution to see if they have improved.

* Cast dma semaphore payload to ulong to fill the space

* Write timestamp first

Might be just worrying too much, but we don't want the applcation reading timestamp if it sees the payload before timestamp is written.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* Add timestamp to 16-byte semaphore releases.

BOTW was reading a ulong 8 bytes after a semaphore return. Turns out this is the timestamp it was trying to do performance calculation with, so I've made it write when necessary.

This mode was also added to the DMA semaphore I added recently, as it is required by a few games. (i think quake?)

The timestamp code has been moved to GPU context. Check other games with an unusually low framerate cap or dynamic resolution to see if they have improved.

* Cast dma semaphore payload to ulong to fill the space

* Write timestamp first

Might be just worrying too much, but we don't want the applcation reading timestamp if it sees the payload before timestamp is written.</pre>
</div>
</content>
</entry>
<entry>
<title>Add support for BC1/2/3 decompression (for 3D textures) (#2987)</title>
<updated>2022-01-22T18:23:00+00:00</updated>
<author>
<name>gdkchan</name>
<email>gab.dark.100@gmail.com</email>
</author>
<published>2022-01-22T18:23:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.benis.co.uk/Ryujinx/commit/?id=42c75dbb8f9472f434d0324a37a87e91ee7b50f3'/>
<id>42c75dbb8f9472f434d0324a37a87e91ee7b50f3</id>
<content type='text'>
* Add support for BC1/2/3 decompression (for 3D textures)

* Optimize and clean up

* Unsafe not needed here

* Fix alpha value interpolation when a0 &lt;= a1</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* Add support for BC1/2/3 decompression (for 3D textures)

* Optimize and clean up

* Unsafe not needed here

* Fix alpha value interpolation when a0 &lt;= a1</pre>
</div>
</content>
</entry>
<entry>
<title>Texture Sync, incompatible overlap handling, data flush improvements. (#2971)</title>
<updated>2022-01-09T16:28:48+00:00</updated>
<author>
<name>riperiperi</name>
<email>rhy3756547@hotmail.com</email>
</author>
<published>2022-01-09T16:28:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.benis.co.uk/Ryujinx/commit/?id=cda659955ced1b16839cdd1e7fea1ef6f8d99041'/>
<id>cda659955ced1b16839cdd1e7fea1ef6f8d99041</id>
<content type='text'>
* Initial test for texture sync

* WIP new texture flushing setup

* Improve rules for incompatible overlaps

Fixes a lot of issues with Unreal Engine games. Still a few minor issues (some caused by dma fast path?) Needs docs and cleanup.

* Cleanup, improvements

Improve rules for fast DMA

* Small tweak to group together flushes of overlapping handles.

* Fixes, flush overlapping texture data for ASTC and BC4/5 compressed textures.

Fixes the new Life is Strange game.

* Flush overlaps before init data, fix 3d texture size/overlap stuff

* Fix 3D Textures, faster single layer flush

Note: nosy people can no longer merge this with Vulkan. (unless they are nosy enough to implement the new backend methods)

* Remove unused method

* Minor cleanup

* More cleanup

* Use the More Fun and Hopefully No Driver Bugs method for getting compressed tex too

This one's for metro

* Address feedback, ASTC+ETC to FormatClass

* Change offset to use Span slice rather than IntPtr Add

* Fix this too</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* Initial test for texture sync

* WIP new texture flushing setup

* Improve rules for incompatible overlaps

Fixes a lot of issues with Unreal Engine games. Still a few minor issues (some caused by dma fast path?) Needs docs and cleanup.

* Cleanup, improvements

Improve rules for fast DMA

* Small tweak to group together flushes of overlapping handles.

* Fixes, flush overlapping texture data for ASTC and BC4/5 compressed textures.

Fixes the new Life is Strange game.

* Flush overlaps before init data, fix 3d texture size/overlap stuff

* Fix 3D Textures, faster single layer flush

Note: nosy people can no longer merge this with Vulkan. (unless they are nosy enough to implement the new backend methods)

* Remove unused method

* Minor cleanup

* More cleanup

* Use the More Fun and Hopefully No Driver Bugs method for getting compressed tex too

This one's for metro

* Address feedback, ASTC+ETC to FormatClass

* Change offset to use Span slice rather than IntPtr Add

* Fix this too</pre>
</div>
</content>
</entry>
</feed>
