<feed xmlns='http://www.w3.org/2005/Atom'>
<title>Ryujinx/ARMeilleure/State, branch master</title>
<subtitle>A backup of the Ryujinx master git branch.
</subtitle>
<link rel='alternate' type='text/html' href='https://git.benis.co.uk/Ryujinx/'/>
<entry>
<title>Move solution and projects to src</title>
<updated>2023-04-27T21:51:14+00:00</updated>
<author>
<name>TSR Berry</name>
<email>20988865+TSRBerry@users.noreply.github.com</email>
</author>
<published>2023-04-07T23:22:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.benis.co.uk/Ryujinx/commit/?id=cee712105850ac3385cd0091a923438167433f9f'/>
<id>cee712105850ac3385cd0091a923438167433f9f</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>ARMeilleure: Move TPIDR_EL0 and TPIDRRO_EL0 to NativeContext (#4661)</title>
<updated>2023-04-11T06:55:04+00:00</updated>
<author>
<name>riperiperi</name>
<email>rhy3756547@hotmail.com</email>
</author>
<published>2023-04-11T06:55:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.benis.co.uk/Ryujinx/commit/?id=9ef94c8292beda825fa76e05ad2e561c6d571c95'/>
<id>9ef94c8292beda825fa76e05ad2e561c6d571c95</id>
<content type='text'>
* ARMeilleure: Move TPIDR_EL0 and TPIDRRO_EL0 to NativeContext

Some games access these system registers several tens of thousands of times in a second from many different threads. While this isn't really crippling, it is a lot of wasted time spent in a reverse pinvoke transition.

Example games are Pokemon Scarlet/Violet and BOTW. These games have a lot of different potential bottlenecks so it's unlikely you will see a consistent improvement, but it definitely disappears from the cpu profile.

* Remove unreachable code.

* Add ulong conversion for offsets

* Nit</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* ARMeilleure: Move TPIDR_EL0 and TPIDRRO_EL0 to NativeContext

Some games access these system registers several tens of thousands of times in a second from many different threads. While this isn't really crippling, it is a lot of wasted time spent in a reverse pinvoke transition.

Example games are Pokemon Scarlet/Violet and BOTW. These games have a lot of different potential bottlenecks so it's unlikely you will see a consistent improvement, but it definitely disappears from the cpu profile.

* Remove unreachable code.

* Add ulong conversion for offsets

* Nit</pre>
</div>
</content>
</entry>
<entry>
<title>A64: Add fast path for Fcvtas_Gp/S/V, Fcvtau_Gp/S/V and Frinta_S/V in… (#3712)</title>
<updated>2022-10-19T00:21:33+00:00</updated>
<author>
<name>LDj3SNuD</name>
<email>35856442+LDj3SNuD@users.noreply.github.com</email>
</author>
<published>2022-10-19T00:21:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.benis.co.uk/Ryujinx/commit/?id=5af8ce7c38d0b5c910a271ff4a43313850b49a59'/>
<id>5af8ce7c38d0b5c910a271ff4a43313850b49a59</id>
<content type='text'>
* A64: Add fast path for Fcvtas_Gp/S/V, Fcvtau_Gp/S/V and Frinta_S/V instructions;

they use "Round to Nearest with Ties to Away" rounding mode not supported in x86.

All instructions involved have been tested locally in both release and debug modes, in both lowcq and highcq.

The titles Mario Strikers and Super Smash Bros. U. use these instructions intensively.

* Update Ptc.cs

* A32: Add fast path for Vcvta_RM, Vrinta_RM and Vrinta_V instructions aswell.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* A64: Add fast path for Fcvtas_Gp/S/V, Fcvtau_Gp/S/V and Frinta_S/V instructions;

they use "Round to Nearest with Ties to Away" rounding mode not supported in x86.

All instructions involved have been tested locally in both release and debug modes, in both lowcq and highcq.

The titles Mario Strikers and Super Smash Bros. U. use these instructions intensively.

* Update Ptc.cs

* A32: Add fast path for Vcvta_RM, Vrinta_RM and Vrinta_V instructions aswell.</pre>
</div>
</content>
</entry>
<entry>
<title>Fpsr and Fpcr freed. (#3701)</title>
<updated>2022-09-20T21:55:13+00:00</updated>
<author>
<name>LDj3SNuD</name>
<email>35856442+LDj3SNuD@users.noreply.github.com</email>
</author>
<published>2022-09-20T21:55:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.benis.co.uk/Ryujinx/commit/?id=814f75142e8e4e81741a859db8189ca60535f3e6'/>
<id>814f75142e8e4e81741a859db8189ca60535f3e6</id>
<content type='text'>
* Implemented in IR the managed methods of the Saturating region ...

... of the SoftFallback class (the SatQ ones).

The need to natively manage the Fpcr and Fpsr system registers is still a fact.

Contributes to https://github.com/Ryujinx/Ryujinx/issues/2917 ; I will open another PR to implement in Intrinsics-branchless the methods of the Saturation region as well (the SatXXXToXXX ones).

All instructions involved have been tested locally in both release and debug modes, in both lowcq and highcq.

* Ptc.InternalVersion = 3665

* Addressed PR feedback.

* Implemented in IR the managed methods of the ShlReg region of the SoftFallback class.

It also includes the last two SatQ ones (following up on https://github.com/Ryujinx/Ryujinx/pull/3665).

All instructions involved have been tested locally in both release and debug modes, in both lowcq and highcq.

* Fpsr and Fpcr freed.

Handling/isolation of Fpsr and Fpcr via register for IR and via memory for Tests and Threads, with synchronization to context exchanges (explicit for SoftFloat); without having to call managed methods. Thanks to the inlining work of the previous two PRs and others in this.

Tests performed locally in both release and debug modes, in both lowcq and highcq, with FastFP to true and false (explicit FP tests included). Tested with the title Tony Hawk's PS.

Depends on shlreg.

* Update InstEmitSimdHelper.cs

* De-magic Masks.

Remove the Stride and Len flags; Fpsr.NZCV are A32 only, then moved to Fpscr: this leads to emitting less IR in reference to Get/Set Fpsr/Fpcr/Fpscr methods in reference to Mrs/Msr (A64) and Vmrs/Vmsr (A32) instructions.

* Addressed PR feedback.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* Implemented in IR the managed methods of the Saturating region ...

... of the SoftFallback class (the SatQ ones).

The need to natively manage the Fpcr and Fpsr system registers is still a fact.

Contributes to https://github.com/Ryujinx/Ryujinx/issues/2917 ; I will open another PR to implement in Intrinsics-branchless the methods of the Saturation region as well (the SatXXXToXXX ones).

All instructions involved have been tested locally in both release and debug modes, in both lowcq and highcq.

* Ptc.InternalVersion = 3665

* Addressed PR feedback.

* Implemented in IR the managed methods of the ShlReg region of the SoftFallback class.

It also includes the last two SatQ ones (following up on https://github.com/Ryujinx/Ryujinx/pull/3665).

All instructions involved have been tested locally in both release and debug modes, in both lowcq and highcq.

* Fpsr and Fpcr freed.

Handling/isolation of Fpsr and Fpcr via register for IR and via memory for Tests and Threads, with synchronization to context exchanges (explicit for SoftFloat); without having to call managed methods. Thanks to the inlining work of the previous two PRs and others in this.

Tests performed locally in both release and debug modes, in both lowcq and highcq, with FastFP to true and false (explicit FP tests included). Tested with the title Tony Hawk's PS.

Depends on shlreg.

* Update InstEmitSimdHelper.cs

* De-magic Masks.

Remove the Stride and Len flags; Fpsr.NZCV are A32 only, then moved to Fpscr: this leads to emitting less IR in reference to Get/Set Fpsr/Fpcr/Fpscr methods in reference to Mrs/Msr (A64) and Vmrs/Vmsr (A32) instructions.

* Addressed PR feedback.</pre>
</div>
</content>
</entry>
<entry>
<title>Implement PLD and SUB (imm16) on T32, plus UADD8, SADD8, USUB8 and SSUB8 on both A32 and T32 (#3693)</title>
<updated>2022-09-13T22:51:40+00:00</updated>
<author>
<name>gdkchan</name>
<email>gab.dark.100@gmail.com</email>
</author>
<published>2022-09-13T22:51:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.benis.co.uk/Ryujinx/commit/?id=8e119a1e9694416289c4c4c7ed38c408c096f9a2'/>
<id>8e119a1e9694416289c4c4c7ed38c408c096f9a2</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>Refactor CPU interface to allow the implementation of other CPU emulators (#3362)</title>
<updated>2022-05-31T19:29:35+00:00</updated>
<author>
<name>gdkchan</name>
<email>gab.dark.100@gmail.com</email>
</author>
<published>2022-05-31T19:29:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.benis.co.uk/Ryujinx/commit/?id=0c87bf9ea4b0655fc6b90b4ab20e93f237e7549b'/>
<id>0c87bf9ea4b0655fc6b90b4ab20e93f237e7549b</id>
<content type='text'>
* Refactor CPU interface

* Use IExecutionContext interface on SVC handler, change how CPU interrupts invokes the handlers

* Make CpuEngine take a ITickSource rather than returning one

The previous implementation had the scenario where the CPU engine had to implement the tick source in mind, like for example, when we have a hypervisor and the game can read CNTPCT on the host directly. However given that we need to do conversion due to different frequencies anyway, it's not worth it. It's better to just let the user pass the tick source and redirect any reads to CNTPCT to the user tick source

* XML docs for the public interfaces

* PPTC invalidation due to NativeInterface function name changes

* Fix build of the CPU tests

* PR feedback</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* Refactor CPU interface

* Use IExecutionContext interface on SVC handler, change how CPU interrupts invokes the handlers

* Make CpuEngine take a ITickSource rather than returning one

The previous implementation had the scenario where the CPU engine had to implement the tick source in mind, like for example, when we have a hypervisor and the game can read CNTPCT on the host directly. However given that we need to do conversion due to different frequencies anyway, it's not worth it. It's better to just let the user pass the tick source and redirect any reads to CNTPCT to the user tick source

* XML docs for the public interfaces

* PPTC invalidation due to NativeInterface function name changes

* Fix build of the CPU tests

* PR feedback</pre>
</div>
</content>
</entry>
<entry>
<title>KThread: Fix GetPsr mask (#3180)</title>
<updated>2022-03-11T02:16:32+00:00</updated>
<author>
<name>merry</name>
<email>git@mary.rs</email>
</author>
<published>2022-03-11T02:16:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.benis.co.uk/Ryujinx/commit/?id=bb2f9df0a1d5e7cbd333c39cd485a42a19a772dc'/>
<id>bb2f9df0a1d5e7cbd333c39cd485a42a19a772dc</id>
<content type='text'>
* ExecutionContext: GetPstate / SetPstate

* Put it in NativeContext

* KThread: Fix GetPsr mask

* ExecutionContext: Turn methods into Pstate property

* Address nit</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* ExecutionContext: GetPstate / SetPstate

* Put it in NativeContext

* KThread: Fix GetPsr mask

* ExecutionContext: Turn methods into Pstate property

* Address nit</pre>
</div>
</content>
</entry>
<entry>
<title>Implement a "Pause Emulation" option &amp; hotkey (#2428)</title>
<updated>2021-09-11T20:08:25+00:00</updated>
<author>
<name>mpnico</name>
<email>mpnico@gmail.com</email>
</author>
<published>2021-09-11T20:08:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.benis.co.uk/Ryujinx/commit/?id=117e32a6fffc30cdb895aa98483af7df353a8dd1'/>
<id>117e32a6fffc30cdb895aa98483af7df353a8dd1</id>
<content type='text'>
* Add a "Pause Emulation" option and hotkey

Closes Ryujinx#1604

* Refactoring how pause is handled

* Applied suggested changes from review

* Applied suggested fixes

* Pass correct suspend type to threads for suspend/resume

* Fix NRE after stoping emulation

* Removing SimulateWakeUpMessage call after resuming emulation

* Skip suspending non game process

* Pause the tickCounter in the ExecutionContext

* Refactoring tickCounter pause/resume as suggested

* Fix Config migration to add pause hotkey

* Fixed pausing only application threads

* Fix exiting emulator while paused

* Avoid pause/resume while already paused/resumed

* Cleanup unused code

* Avoid restarting audio if stopping emulation while in pause.

* Added suggested changes

* Fix ConfigurationState</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* Add a "Pause Emulation" option and hotkey

Closes Ryujinx#1604

* Refactoring how pause is handled

* Applied suggested changes from review

* Applied suggested fixes

* Pass correct suspend type to threads for suspend/resume

* Fix NRE after stoping emulation

* Removing SimulateWakeUpMessage call after resuming emulation

* Skip suspending non game process

* Pause the tickCounter in the ExecutionContext

* Refactoring tickCounter pause/resume as suggested

* Fix Config migration to add pause hotkey

* Fixed pausing only application threads

* Fix exiting emulator while paused

* Avoid pause/resume while already paused/resumed

* Cleanup unused code

* Avoid restarting audio if stopping emulation while in pause.

* Added suggested changes

* Fix ConfigurationState</pre>
</div>
</content>
</entry>
<entry>
<title>Add multi-level function table (#2228)</title>
<updated>2021-05-29T21:06:28+00:00</updated>
<author>
<name>FICTURE7</name>
<email>FICTURE7@gmail.com</email>
</author>
<published>2021-05-29T21:06:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.benis.co.uk/Ryujinx/commit/?id=9d7627af6484e090ebbc3209bc7301f0bdf47d24'/>
<id>9d7627af6484e090ebbc3209bc7301f0bdf47d24</id>
<content type='text'>
* Add AddressTable&lt;T&gt;

* Use AddressTable&lt;T&gt; for dispatch

* Remove JumpTable &amp; co.

* Add fallback for out of range addresses

* Add PPTC support

* Add documentation to `AddressTable&lt;T&gt;`

* Make AddressTable&lt;T&gt; configurable

* Fix table walk

* Fix IsMapped check

* Remove CountTableCapacity

* Add PPTC support for fast path

* Rename IsMapped to IsValid

* Remove stale comment

* Change format of address in exception message

* Add TranslatorStubs

* Split DispatchStub

Avoids recompilation of stubs during tests.

* Add hint for 64bit or 32bit

* Add documentation to `Symbol`

* Add documentation to `TranslatorStubs`

Make `TranslatorStubs` disposable as well.

* Add documentation to `SymbolType`

* Add `AddressTableEventSource` to monitor function table size

Add an EventSource which measures the amount of unmanaged bytes
allocated by AddressTable&lt;T&gt; instances.

 dotnet-counters monitor -n Ryujinx --counters ARMeilleure

* Add `AllowLcqInFunctionTable` optimization toggle

This is to reduce the impact this change has on the test duration.
Before everytime a test was ran, the FunctionTable would be initialized
and populated so that the newly compiled test would get registered to
it.

* Implement unmanaged dispatcher

Uses the DispatchStub to dispatch into the next translation, which
allows execution to stay in unmanaged for longer and skips a
ConcurrentDictionary look up when the target translation has been
registered to the FunctionTable.

* Remove redundant null check

* Tune levels of FunctionTable

Uses 5 levels instead of 4 and change unit of AddressTableEventSource
from KB to MB.

* Use 64-bit function table

Improves codegen for direct branches:

    mov qword [rax+0x408],0x10603560
 -  mov rcx,sub_10603560_OFFSET
 -  mov ecx,[rcx]
 -  mov ecx,ecx
 -  mov rdx,JIT_CACHE_BASE
 -  add rdx,rcx
 +  mov rcx,sub_10603560
 +  mov rdx,[rcx]
    mov rcx,rax

Improves codegen for dispatch stub:

    and rax,byte +0x1f
 -  mov eax,[rcx+rax*4]
 -  mov eax,eax
 -  mov rcx,JIT_CACHE_BASE
 -  lea rax,[rcx+rax]
 +  mov rax,[rcx+rax*8]
    mov rcx,rbx

* Remove `JitCacheSymbol` &amp; `JitCache.Offset`

* Turn `Translator.Translate` into an instance method

We do not have to add more parameter to this method and related ones as
new structures are added &amp; needed for translation.

* Add symbol only when PTC is enabled

Address LDj3SNuD's feedback

* Change `NativeContext.Running` to a 32-bit integer

* Fix PageTable symbol for host mapped</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* Add AddressTable&lt;T&gt;

* Use AddressTable&lt;T&gt; for dispatch

* Remove JumpTable &amp; co.

* Add fallback for out of range addresses

* Add PPTC support

* Add documentation to `AddressTable&lt;T&gt;`

* Make AddressTable&lt;T&gt; configurable

* Fix table walk

* Fix IsMapped check

* Remove CountTableCapacity

* Add PPTC support for fast path

* Rename IsMapped to IsValid

* Remove stale comment

* Change format of address in exception message

* Add TranslatorStubs

* Split DispatchStub

Avoids recompilation of stubs during tests.

* Add hint for 64bit or 32bit

* Add documentation to `Symbol`

* Add documentation to `TranslatorStubs`

Make `TranslatorStubs` disposable as well.

* Add documentation to `SymbolType`

* Add `AddressTableEventSource` to monitor function table size

Add an EventSource which measures the amount of unmanaged bytes
allocated by AddressTable&lt;T&gt; instances.

 dotnet-counters monitor -n Ryujinx --counters ARMeilleure

* Add `AllowLcqInFunctionTable` optimization toggle

This is to reduce the impact this change has on the test duration.
Before everytime a test was ran, the FunctionTable would be initialized
and populated so that the newly compiled test would get registered to
it.

* Implement unmanaged dispatcher

Uses the DispatchStub to dispatch into the next translation, which
allows execution to stay in unmanaged for longer and skips a
ConcurrentDictionary look up when the target translation has been
registered to the FunctionTable.

* Remove redundant null check

* Tune levels of FunctionTable

Uses 5 levels instead of 4 and change unit of AddressTableEventSource
from KB to MB.

* Use 64-bit function table

Improves codegen for direct branches:

    mov qword [rax+0x408],0x10603560
 -  mov rcx,sub_10603560_OFFSET
 -  mov ecx,[rcx]
 -  mov ecx,ecx
 -  mov rdx,JIT_CACHE_BASE
 -  add rdx,rcx
 +  mov rcx,sub_10603560
 +  mov rdx,[rcx]
    mov rcx,rax

Improves codegen for dispatch stub:

    and rax,byte +0x1f
 -  mov eax,[rcx+rax*4]
 -  mov eax,eax
 -  mov rcx,JIT_CACHE_BASE
 -  lea rax,[rcx+rax]
 +  mov rax,[rcx+rax*8]
    mov rcx,rbx

* Remove `JitCacheSymbol` &amp; `JitCache.Offset`

* Turn `Translator.Translate` into an instance method

We do not have to add more parameter to this method and related ones as
new structures are added &amp; needed for translation.

* Add symbol only when PTC is enabled

Address LDj3SNuD's feedback

* Change `NativeContext.Running` to a 32-bit integer

* Fix PageTable symbol for host mapped</pre>
</div>
</content>
</entry>
<entry>
<title>PPTC &amp; Pool Enhancements. (#1968)</title>
<updated>2021-02-22T02:23:48+00:00</updated>
<author>
<name>LDj3SNuD</name>
<email>35856442+LDj3SNuD@users.noreply.github.com</email>
</author>
<published>2021-02-22T02:23:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.benis.co.uk/Ryujinx/commit/?id=dc0adb533dc15a007e9ca2dc0533ef6a61f13393'/>
<id>dc0adb533dc15a007e9ca2dc0533ef6a61f13393</id>
<content type='text'>
* PPTC &amp; Pool Enhancements.

* Avoid buffer allocations in CodeGenContext.GetCode(). Avoid stream allocations in PTC.PtcInfo.

Refactoring/nits.

* Use XXHash128, for Ptc.Load &amp; Ptc.Save, x10 faster than Md5.

* Why not a nice Span.

* Added a simple PtcFormatter library for deserialization/serialization, which does not require reflection, in use at PtcJumpTable and PtcProfiler; improves maintainability and simplicity/readability of affected code.

* Nits.

* Revert #1987.

* Revert "Revert #1987."

This reverts commit 998be765cf7f7da5ff0c1c08de704c9012b0f49c.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* PPTC &amp; Pool Enhancements.

* Avoid buffer allocations in CodeGenContext.GetCode(). Avoid stream allocations in PTC.PtcInfo.

Refactoring/nits.

* Use XXHash128, for Ptc.Load &amp; Ptc.Save, x10 faster than Md5.

* Why not a nice Span.

* Added a simple PtcFormatter library for deserialization/serialization, which does not require reflection, in use at PtcJumpTable and PtcProfiler; improves maintainability and simplicity/readability of affected code.

* Nits.

* Revert #1987.

* Revert "Revert #1987."

This reverts commit 998be765cf7f7da5ff0c1c08de704c9012b0f49c.</pre>
</div>
</content>
</entry>
</feed>
