<feed xmlns='http://www.w3.org/2005/Atom'>
<title>Ryujinx/ARMeilleure/IntermediateRepresentation, branch master</title>
<subtitle>A backup of the Ryujinx master git branch.
</subtitle>
<link rel='alternate' type='text/html' href='https://git.benis.co.uk/Ryujinx/'/>
<entry>
<title>Move solution and projects to src</title>
<updated>2023-04-27T21:51:14+00:00</updated>
<author>
<name>TSR Berry</name>
<email>20988865+TSRBerry@users.noreply.github.com</email>
</author>
<published>2023-04-07T23:22:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.benis.co.uk/Ryujinx/commit/?id=cee712105850ac3385cd0091a923438167433f9f'/>
<id>cee712105850ac3385cd0091a923438167433f9f</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>ARMeilleure: Respect FZ/RM flags for all floating point operations (#4618)</title>
<updated>2023-04-10T10:22:58+00:00</updated>
<author>
<name>riperiperi</name>
<email>rhy3756547@hotmail.com</email>
</author>
<published>2023-04-10T10:22:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.benis.co.uk/Ryujinx/commit/?id=9db73f74cf77484c4d8b34af54c563c68cabb41e'/>
<id>9db73f74cf77484c4d8b34af54c563c68cabb41e</id>
<content type='text'>
* ARMeilleure: Respect Fz flag for all floating point operations.

This is a change in strategy for emulating the Fz FPCR flag. Before, it was set before instructions that "needed it" and reset after. However, this missed a few hot instructions like the multiplication instruction, and the entirety of A32.

The new strategy is to set the Fz flag only in the following circumstances:

- Set to match FPCR before translated functions/loop are executed.
- Reset when calling SoftFloat methods, set when returning.
- Reset when exiting execution.

This allows us to remove the code around the existing Fz aware instructions, and get the accuracy benefits on all floating point instructions executed while in translated code.

Single step executions now need to be called with a context wrapper - right now it just contains the Fz flag initialization, and won't actually do anything on ARM.

This fixes a bug in Breath of the Wild where some physics interactions could randomly crash the game due to subnormal values not flushing to zero.

This is draft right now because I need to answer the questions:
- Does dotnet avoid changing the value of Mxcsr?
- Is it a good idea to assume that? Or should the flag set/restore be done on every managed method call, not just softfloat?
- If we assume that, do we want a unit test to verify the behaviour?

I recommend testing a bunch of games, especially games affected when this was originally added, such as #1611.

* Remove unused method

* Use FMA for Fmadd, Fmsub, Fnmadd, Fnmsub, Fmla, Fmls

...when available.

Similar implementation to A32

* Use FMA for Frecps, Frsqrts

* Don't set DAZ.

* Add round mode to ARM FP mode

* Fix mistakes

* Add test for FP state when calling managed methods

* Add explanatory comment to test.

* Cleanup

* Add A64 FPCR flags

* Vrintx_S A32 fast path on A64 backend

* Address feedback 1, re-enable DAZ

* Fix FMA instructions By Elem

* Address feedback</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* ARMeilleure: Respect Fz flag for all floating point operations.

This is a change in strategy for emulating the Fz FPCR flag. Before, it was set before instructions that "needed it" and reset after. However, this missed a few hot instructions like the multiplication instruction, and the entirety of A32.

The new strategy is to set the Fz flag only in the following circumstances:

- Set to match FPCR before translated functions/loop are executed.
- Reset when calling SoftFloat methods, set when returning.
- Reset when exiting execution.

This allows us to remove the code around the existing Fz aware instructions, and get the accuracy benefits on all floating point instructions executed while in translated code.

Single step executions now need to be called with a context wrapper - right now it just contains the Fz flag initialization, and won't actually do anything on ARM.

This fixes a bug in Breath of the Wild where some physics interactions could randomly crash the game due to subnormal values not flushing to zero.

This is draft right now because I need to answer the questions:
- Does dotnet avoid changing the value of Mxcsr?
- Is it a good idea to assume that? Or should the flag set/restore be done on every managed method call, not just softfloat?
- If we assume that, do we want a unit test to verify the behaviour?

I recommend testing a bunch of games, especially games affected when this was originally added, such as #1611.

* Remove unused method

* Use FMA for Fmadd, Fmsub, Fnmadd, Fnmsub, Fmla, Fmls

...when available.

Similar implementation to A32

* Use FMA for Frecps, Frsqrts

* Don't set DAZ.

* Add round mode to ARM FP mode

* Fix mistakes

* Add test for FP state when calling managed methods

* Add explanatory comment to test.

* Cleanup

* Add A64 FPCR flags

* Vrintx_S A32 fast path on A64 backend

* Address feedback 1, re-enable DAZ

* Fix FMA instructions By Elem

* Address feedback</pre>
</div>
</content>
</entry>
<entry>
<title>ARMeilleure: Add initial support for AVX512 (EVEX encoding) (cont) (#4147)</title>
<updated>2023-03-20T19:09:24+00:00</updated>
<author>
<name>Wunk</name>
<email>wunkolo@gmail.com</email>
</author>
<published>2023-03-20T19:09:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.benis.co.uk/Ryujinx/commit/?id=17620d18db8d4a67e4b917596c760107d26fadc5'/>
<id>17620d18db8d4a67e4b917596c760107d26fadc5</id>
<content type='text'>
* ARMeilleure: Add AVX512{F,VL,DQ,BW} detection

Add `UseAvx512Ortho` and `UseAvx512OrthoFloat` optimization flags as
short-hands for `F+VL` and `F+VL+DQ`.

* ARMeilleure: Add initial support for EVEX instruction encoding

Does not implement rounding, or exception controls.

* ARMeilleure: Add `X86Vpternlogd`

Accelerates the vector-`Not` instruction.

* ARMeilleure: Add check for `OSXSAVE` for AVX{2,512}

* ARMeilleure: Add check for `XCR0` flags

Add XCR0 register checks for AVX and AVX512F, following the guidelines
from section 14.3 and 15.2 from the Intel Architecture Software
Developer's Manual.

* ARMeilleure: Remove redundant `ReProtect` and `Dispose`, formatting

* ARMeilleure: Move XCR0 procedure to GetXcr0Eax

* ARMeilleure: Add `XCR0` to `FeatureInfo` structure

* ARMeilleure: Utilize `ReadOnlySpan` for Xcr0 assembly

Avoids an additional allocation

* ARMeilleure: Formatting fixes

* ARMeilleure: Fix EVEX encoding src2 register index

&gt; Just like in VEX prefix, vvvv is provided in inverted form.

* ARMeilleure: Add `X86Vpternlogd` acceleration to `Vmvn_I`

Passes unit tests, verified instruction utilization

* ARMeilleure: Fix EVEX register operand designations

Operand 2 was being sourced improperly.

EVEX encoded instructions source their operands like so:
Operand 1: ModRM:reg
Operand 2: EVEX.vvvvv
Operand 3: ModRM:r/m
Operand 4: Imm

This fixes the improper register designations when emitting vpternlog.
Now "dest", "src1", "src2" arguments emit in the proper order in EVEX instructions.

* ARMeilleure: Add `X86Vpternlogd` acceleration to `Orn_V`

* ARMeilleure: PTC version bump

* ARMeilleure: Update EVEX encoding Debug.Assert to Debug.Fail

* ARMeilleure: Update EVEX encoding comment capitalization</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* ARMeilleure: Add AVX512{F,VL,DQ,BW} detection

Add `UseAvx512Ortho` and `UseAvx512OrthoFloat` optimization flags as
short-hands for `F+VL` and `F+VL+DQ`.

* ARMeilleure: Add initial support for EVEX instruction encoding

Does not implement rounding, or exception controls.

* ARMeilleure: Add `X86Vpternlogd`

Accelerates the vector-`Not` instruction.

* ARMeilleure: Add check for `OSXSAVE` for AVX{2,512}

* ARMeilleure: Add check for `XCR0` flags

Add XCR0 register checks for AVX and AVX512F, following the guidelines
from section 14.3 and 15.2 from the Intel Architecture Software
Developer's Manual.

* ARMeilleure: Remove redundant `ReProtect` and `Dispose`, formatting

* ARMeilleure: Move XCR0 procedure to GetXcr0Eax

* ARMeilleure: Add `XCR0` to `FeatureInfo` structure

* ARMeilleure: Utilize `ReadOnlySpan` for Xcr0 assembly

Avoids an additional allocation

* ARMeilleure: Formatting fixes

* ARMeilleure: Fix EVEX encoding src2 register index

&gt; Just like in VEX prefix, vvvv is provided in inverted form.

* ARMeilleure: Add `X86Vpternlogd` acceleration to `Vmvn_I`

Passes unit tests, verified instruction utilization

* ARMeilleure: Fix EVEX register operand designations

Operand 2 was being sourced improperly.

EVEX encoded instructions source their operands like so:
Operand 1: ModRM:reg
Operand 2: EVEX.vvvvv
Operand 3: ModRM:r/m
Operand 4: Imm

This fixes the improper register designations when emitting vpternlog.
Now "dest", "src1", "src2" arguments emit in the proper order in EVEX instructions.

* ARMeilleure: Add `X86Vpternlogd` acceleration to `Orn_V`

* ARMeilleure: PTC version bump

* ARMeilleure: Update EVEX encoding Debug.Assert to Debug.Fail

* ARMeilleure: Update EVEX encoding comment capitalization</pre>
</div>
</content>
</entry>
<entry>
<title>Implement JIT Arm64 backend (#4114)</title>
<updated>2023-01-10T22:16:59+00:00</updated>
<author>
<name>gdkchan</name>
<email>gab.dark.100@gmail.com</email>
</author>
<published>2023-01-10T22:16:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.benis.co.uk/Ryujinx/commit/?id=5e0f8e873857ce3ca3f532aff0936beb28e412c8'/>
<id>5e0f8e873857ce3ca3f532aff0936beb28e412c8</id>
<content type='text'>
* Implement JIT Arm64 backend

* PPTC version bump

* Address some feedback from Arm64 JIT PR

* Address even more PR feedback

* Remove unused IsPageAligned function

* Sync Qc flag before calls

* Fix comment and remove unused enum

* Address riperiperi PR feedback

* Delete Breakpoint IR instruction that was only implemented for Arm64</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* Implement JIT Arm64 backend

* PPTC version bump

* Address some feedback from Arm64 JIT PR

* Address even more PR feedback

* Remove unused IsPageAligned function

* Sync Qc flag before calls

* Fix comment and remove unused enum

* Address riperiperi PR feedback

* Delete Breakpoint IR instruction that was only implemented for Arm64</pre>
</div>
</content>
</entry>
<entry>
<title>Use new ArgumentNullException and ObjectDisposedException throw-helper API (#4163)</title>
<updated>2022-12-27T19:27:11+00:00</updated>
<author>
<name>Berkan Diler</name>
<email>berkan.diler1@ingka.ikea.com</email>
</author>
<published>2022-12-27T19:27:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.benis.co.uk/Ryujinx/commit/?id=0d3b82477ecbf7128340b6725a79413427c68748'/>
<id>0d3b82477ecbf7128340b6725a79413427c68748</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>ARMeilleure: Hash _data pointer instead of value for Operand (#4156)</title>
<updated>2022-12-21T01:28:18+00:00</updated>
<author>
<name>riperiperi</name>
<email>rhy3756547@hotmail.com</email>
</author>
<published>2022-12-21T01:28:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.benis.co.uk/Ryujinx/commit/?id=c200a7b7c668acc733a0c20df865662b86859746'/>
<id>c200a7b7c668acc733a0c20df865662b86859746</id>
<content type='text'>
I noticed a weirdly high cost for dictionary accesses from MarkLabel etc. Turns out that the hash code was always the same for labels, so the whole point of having a dictionary was missed and it was putting everything in the same bucket. I made it always hash the _data pointer as that's a good source of identifiable and "random" data.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
I noticed a weirdly high cost for dictionary accesses from MarkLabel etc. Turns out that the hash code was always the same for labels, so the whole point of having a dictionary was missed and it was putting everything in the same bucket. I made it always hash the _data pointer as that's a good source of identifiable and "random" data.</pre>
</div>
</content>
</entry>
<entry>
<title>Revert "ARMeilleure: Add initial support for AVX512(EVEX encoding) (#3663)" (#4145)</title>
<updated>2022-12-18T23:21:10+00:00</updated>
<author>
<name>gdkchan</name>
<email>gab.dark.100@gmail.com</email>
</author>
<published>2022-12-18T23:21:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.benis.co.uk/Ryujinx/commit/?id=f93c5f006a44fb4b32ad6e7afdaaaf737b115ace'/>
<id>f93c5f006a44fb4b32ad6e7afdaaaf737b115ace</id>
<content type='text'>
This reverts commit 295fbd0542a93ac50e558054a3f0c8c64286b764.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This reverts commit 295fbd0542a93ac50e558054a3f0c8c64286b764.</pre>
</div>
</content>
</entry>
<entry>
<title>ARMeilleure: Add initial support for AVX512(EVEX encoding) (#3663)</title>
<updated>2022-12-18T19:46:13+00:00</updated>
<author>
<name>Wunk</name>
<email>wunkolo@gmail.com</email>
</author>
<published>2022-12-18T19:46:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.benis.co.uk/Ryujinx/commit/?id=295fbd0542a93ac50e558054a3f0c8c64286b764'/>
<id>295fbd0542a93ac50e558054a3f0c8c64286b764</id>
<content type='text'>
* ARMeilleure: Add AVX512{F,VL,DQ,BW} detection

Add `UseAvx512Ortho` and `UseAvx512OrthoFloat` optimization flags as
short-hands for `F+VL` and `F+VL+DQ`.

* ARMeilleure: Add initial support for EVEX instruction encoding

Does not implement rounding, or exception controls.

* ARMeilleure: Add `X86Vpternlogd`

Accelerates the vector-`Not` instruction.

* ARMeilleure: Add check for `OSXSAVE` for AVX{2,512}

* ARMeilleure: Add check for `XCR0` flags

Add XCR0 register checks for AVX and AVX512F, following the guidelines
from section 14.3 and 15.2 from the Intel Architecture Software
Developer's Manual.

* ARMeilleure: Increment InternalVersion

* ARMeilleure: Remove redundant `ReProtect` and `Dispose`, formatting

* ARMeilleure: Move XCR0 procedure to GetXcr0Eax

* ARMeilleure: Add `XCR0` to `FeatureInfo` structure

* ARMeilleure: Utilize `ReadOnlySpan` for Xcr0 assembly

Avoids an additional allocation

* ARMeilleure: Formatting fixes</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* ARMeilleure: Add AVX512{F,VL,DQ,BW} detection

Add `UseAvx512Ortho` and `UseAvx512OrthoFloat` optimization flags as
short-hands for `F+VL` and `F+VL+DQ`.

* ARMeilleure: Add initial support for EVEX instruction encoding

Does not implement rounding, or exception controls.

* ARMeilleure: Add `X86Vpternlogd`

Accelerates the vector-`Not` instruction.

* ARMeilleure: Add check for `OSXSAVE` for AVX{2,512}

* ARMeilleure: Add check for `XCR0` flags

Add XCR0 register checks for AVX and AVX512F, following the guidelines
from section 14.3 and 15.2 from the Intel Architecture Software
Developer's Manual.

* ARMeilleure: Increment InternalVersion

* ARMeilleure: Remove redundant `ReProtect` and `Dispose`, formatting

* ARMeilleure: Move XCR0 procedure to GetXcr0Eax

* ARMeilleure: Add `XCR0` to `FeatureInfo` structure

* ARMeilleure: Utilize `ReadOnlySpan` for Xcr0 assembly

Avoids an additional allocation

* ARMeilleure: Formatting fixes</pre>
</div>
</content>
</entry>
<entry>
<title>Make structs readonly when applicable (#4002)</title>
<updated>2022-12-05T13:47:39+00:00</updated>
<author>
<name>Andrey Sukharev</name>
<email>SukharevAndrey@users.noreply.github.com</email>
</author>
<published>2022-12-05T13:47:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.benis.co.uk/Ryujinx/commit/?id=4da44e09cb2a32f69b4a6b47221117b78e4618dc'/>
<id>4da44e09cb2a32f69b4a6b47221117b78e4618dc</id>
<content type='text'>
* Make all structs readonly when applicable. It should reduce amount of needless defensive copies

* Make structs with trivial boilerplate equality code record structs

* Remove unnecessary readonly modifiers from TextureCreateInfo

* Make BitMap structs readonly too</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* Make all structs readonly when applicable. It should reduce amount of needless defensive copies

* Make structs with trivial boilerplate equality code record structs

* Remove unnecessary readonly modifiers from TextureCreateInfo

* Make BitMap structs readonly too</pre>
</div>
</content>
</entry>
<entry>
<title>ARMeilleure: Add `gfni` acceleration (#3669)</title>
<updated>2022-10-02T09:17:19+00:00</updated>
<author>
<name>Wunk</name>
<email>wunkolo@gmail.com</email>
</author>
<published>2022-10-02T09:17:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.benis.co.uk/Ryujinx/commit/?id=45ce540b9b756f372840e923b73cfd7e3edd85f8'/>
<id>45ce540b9b756f372840e923b73cfd7e3edd85f8</id>
<content type='text'>
* ARMeilleure: Add `GFNI` detection

This is intended for utilizing the `gf2p8affineqb` instruction

* ARMeilleure: Add `gf2p8affineqb`

Not using the VEX or EVEX-form of this instruction is intentional. There
are `GFNI`-chips that do not support AVX(so no VEX encoding) such as
Tremont(Lakefield) chips as well as Jasper Lake.

https://github.com/InstLatx64/InstLatx64/blob/13df339fe7150b114929f71b19a6b2fe72fc751e/GenuineIntel/GenuineIntel00806A1_Lakefield_LC_InstLatX64.txt#L1297-L1299

https://github.com/InstLatx64/InstLatx64/blob/13df339fe7150b114929f71b19a6b2fe72fc751e/GenuineIntel/GenuineIntel00906C0_JasperLake_InstLatX64.txt#L1252-L1254

* ARMeilleure: Add `gfni` acceleration of `Rbit_V`

Passes all `Rbit_V*` unit tests on my `i9-11900k`

* ARMeilleure: Add `gfni` acceleration of `S{l,r}i_V`

Also added a fast-path for when the shift amount is greater than the
size of the element.

* ARMeilleure: Add `gfni` acceleration of `Shl_V` and `Sshr_V`

* ARMeilleure: Increment InternalVersion

* ARMeilleure: Fix Intrinsic and Assembler Table alignment

`gf2p8affineqb` is the longest instruction name I know of. It shouldn't
get any wider than this.

* ARMeilleure: Remove SSE2+SHA requirement for GFNI

* ARMeilleure Add `X86GetGf2p8LogicalShiftLeft`

Used to generate GF(2^8) 8x8 bit-matrices for bit-shifting for the `gf2p8affineqb` instruction.

* ARMeilleure: Append `FeatureInfo7Ecx` to `FeatureInfo`</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* ARMeilleure: Add `GFNI` detection

This is intended for utilizing the `gf2p8affineqb` instruction

* ARMeilleure: Add `gf2p8affineqb`

Not using the VEX or EVEX-form of this instruction is intentional. There
are `GFNI`-chips that do not support AVX(so no VEX encoding) such as
Tremont(Lakefield) chips as well as Jasper Lake.

https://github.com/InstLatx64/InstLatx64/blob/13df339fe7150b114929f71b19a6b2fe72fc751e/GenuineIntel/GenuineIntel00806A1_Lakefield_LC_InstLatX64.txt#L1297-L1299

https://github.com/InstLatx64/InstLatx64/blob/13df339fe7150b114929f71b19a6b2fe72fc751e/GenuineIntel/GenuineIntel00906C0_JasperLake_InstLatX64.txt#L1252-L1254

* ARMeilleure: Add `gfni` acceleration of `Rbit_V`

Passes all `Rbit_V*` unit tests on my `i9-11900k`

* ARMeilleure: Add `gfni` acceleration of `S{l,r}i_V`

Also added a fast-path for when the shift amount is greater than the
size of the element.

* ARMeilleure: Add `gfni` acceleration of `Shl_V` and `Sshr_V`

* ARMeilleure: Increment InternalVersion

* ARMeilleure: Fix Intrinsic and Assembler Table alignment

`gf2p8affineqb` is the longest instruction name I know of. It shouldn't
get any wider than this.

* ARMeilleure: Remove SSE2+SHA requirement for GFNI

* ARMeilleure Add `X86GetGf2p8LogicalShiftLeft`

Used to generate GF(2^8) 8x8 bit-matrices for bit-shifting for the `gf2p8affineqb` instruction.

* ARMeilleure: Append `FeatureInfo7Ecx` to `FeatureInfo`</pre>
</div>
</content>
</entry>
</feed>
