<feed xmlns='http://www.w3.org/2005/Atom'>
<title>Ryujinx/ARMeilleure/Translation, branch master</title>
<subtitle>A backup of the Ryujinx master git branch.
</subtitle>
<link rel='alternate' type='text/html' href='https://git.benis.co.uk/Ryujinx/'/>
<entry>
<title>Move solution and projects to src</title>
<updated>2023-04-27T21:51:14+00:00</updated>
<author>
<name>TSR Berry</name>
<email>20988865+TSRBerry@users.noreply.github.com</email>
</author>
<published>2023-04-07T23:22:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.benis.co.uk/Ryujinx/commit/?id=cee712105850ac3385cd0091a923438167433f9f'/>
<id>cee712105850ac3385cd0091a923438167433f9f</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>ARMeilleure: Move TPIDR_EL0 and TPIDRRO_EL0 to NativeContext (#4661)</title>
<updated>2023-04-11T06:55:04+00:00</updated>
<author>
<name>riperiperi</name>
<email>rhy3756547@hotmail.com</email>
</author>
<published>2023-04-11T06:55:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.benis.co.uk/Ryujinx/commit/?id=9ef94c8292beda825fa76e05ad2e561c6d571c95'/>
<id>9ef94c8292beda825fa76e05ad2e561c6d571c95</id>
<content type='text'>
* ARMeilleure: Move TPIDR_EL0 and TPIDRRO_EL0 to NativeContext

Some games access these system registers several tens of thousands of times in a second from many different threads. While this isn't really crippling, it is a lot of wasted time spent in a reverse pinvoke transition.

Example games are Pokemon Scarlet/Violet and BOTW. These games have a lot of different potential bottlenecks so it's unlikely you will see a consistent improvement, but it definitely disappears from the cpu profile.

* Remove unreachable code.

* Add ulong conversion for offsets

* Nit</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* ARMeilleure: Move TPIDR_EL0 and TPIDRRO_EL0 to NativeContext

Some games access these system registers several tens of thousands of times in a second from many different threads. While this isn't really crippling, it is a lot of wasted time spent in a reverse pinvoke transition.

Example games are Pokemon Scarlet/Violet and BOTW. These games have a lot of different potential bottlenecks so it's unlikely you will see a consistent improvement, but it definitely disappears from the cpu profile.

* Remove unreachable code.

* Add ulong conversion for offsets

* Nit</pre>
</div>
</content>
</entry>
<entry>
<title>ARMeilleure: Respect FZ/RM flags for all floating point operations (#4618)</title>
<updated>2023-04-10T10:22:58+00:00</updated>
<author>
<name>riperiperi</name>
<email>rhy3756547@hotmail.com</email>
</author>
<published>2023-04-10T10:22:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.benis.co.uk/Ryujinx/commit/?id=9db73f74cf77484c4d8b34af54c563c68cabb41e'/>
<id>9db73f74cf77484c4d8b34af54c563c68cabb41e</id>
<content type='text'>
* ARMeilleure: Respect Fz flag for all floating point operations.

This is a change in strategy for emulating the Fz FPCR flag. Before, it was set before instructions that "needed it" and reset after. However, this missed a few hot instructions like the multiplication instruction, and the entirety of A32.

The new strategy is to set the Fz flag only in the following circumstances:

- Set to match FPCR before translated functions/loop are executed.
- Reset when calling SoftFloat methods, set when returning.
- Reset when exiting execution.

This allows us to remove the code around the existing Fz aware instructions, and get the accuracy benefits on all floating point instructions executed while in translated code.

Single step executions now need to be called with a context wrapper - right now it just contains the Fz flag initialization, and won't actually do anything on ARM.

This fixes a bug in Breath of the Wild where some physics interactions could randomly crash the game due to subnormal values not flushing to zero.

This is draft right now because I need to answer the questions:
- Does dotnet avoid changing the value of Mxcsr?
- Is it a good idea to assume that? Or should the flag set/restore be done on every managed method call, not just softfloat?
- If we assume that, do we want a unit test to verify the behaviour?

I recommend testing a bunch of games, especially games affected when this was originally added, such as #1611.

* Remove unused method

* Use FMA for Fmadd, Fmsub, Fnmadd, Fnmsub, Fmla, Fmls

...when available.

Similar implementation to A32

* Use FMA for Frecps, Frsqrts

* Don't set DAZ.

* Add round mode to ARM FP mode

* Fix mistakes

* Add test for FP state when calling managed methods

* Add explanatory comment to test.

* Cleanup

* Add A64 FPCR flags

* Vrintx_S A32 fast path on A64 backend

* Address feedback 1, re-enable DAZ

* Fix FMA instructions By Elem

* Address feedback</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* ARMeilleure: Respect Fz flag for all floating point operations.

This is a change in strategy for emulating the Fz FPCR flag. Before, it was set before instructions that "needed it" and reset after. However, this missed a few hot instructions like the multiplication instruction, and the entirety of A32.

The new strategy is to set the Fz flag only in the following circumstances:

- Set to match FPCR before translated functions/loop are executed.
- Reset when calling SoftFloat methods, set when returning.
- Reset when exiting execution.

This allows us to remove the code around the existing Fz aware instructions, and get the accuracy benefits on all floating point instructions executed while in translated code.

Single step executions now need to be called with a context wrapper - right now it just contains the Fz flag initialization, and won't actually do anything on ARM.

This fixes a bug in Breath of the Wild where some physics interactions could randomly crash the game due to subnormal values not flushing to zero.

This is draft right now because I need to answer the questions:
- Does dotnet avoid changing the value of Mxcsr?
- Is it a good idea to assume that? Or should the flag set/restore be done on every managed method call, not just softfloat?
- If we assume that, do we want a unit test to verify the behaviour?

I recommend testing a bunch of games, especially games affected when this was originally added, such as #1611.

* Remove unused method

* Use FMA for Fmadd, Fmsub, Fnmadd, Fnmsub, Fmla, Fmls

...when available.

Similar implementation to A32

* Use FMA for Frecps, Frsqrts

* Don't set DAZ.

* Add round mode to ARM FP mode

* Fix mistakes

* Add test for FP state when calling managed methods

* Add explanatory comment to test.

* Cleanup

* Add A64 FPCR flags

* Vrintx_S A32 fast path on A64 backend

* Address feedback 1, re-enable DAZ

* Fix FMA instructions By Elem

* Address feedback</pre>
</div>
</content>
</entry>
<entry>
<title>ARMeilleure: Add initial support for AVX512 (EVEX encoding) (cont) (#4147)</title>
<updated>2023-03-20T19:09:24+00:00</updated>
<author>
<name>Wunk</name>
<email>wunkolo@gmail.com</email>
</author>
<published>2023-03-20T19:09:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.benis.co.uk/Ryujinx/commit/?id=17620d18db8d4a67e4b917596c760107d26fadc5'/>
<id>17620d18db8d4a67e4b917596c760107d26fadc5</id>
<content type='text'>
* ARMeilleure: Add AVX512{F,VL,DQ,BW} detection

Add `UseAvx512Ortho` and `UseAvx512OrthoFloat` optimization flags as
short-hands for `F+VL` and `F+VL+DQ`.

* ARMeilleure: Add initial support for EVEX instruction encoding

Does not implement rounding, or exception controls.

* ARMeilleure: Add `X86Vpternlogd`

Accelerates the vector-`Not` instruction.

* ARMeilleure: Add check for `OSXSAVE` for AVX{2,512}

* ARMeilleure: Add check for `XCR0` flags

Add XCR0 register checks for AVX and AVX512F, following the guidelines
from section 14.3 and 15.2 from the Intel Architecture Software
Developer's Manual.

* ARMeilleure: Remove redundant `ReProtect` and `Dispose`, formatting

* ARMeilleure: Move XCR0 procedure to GetXcr0Eax

* ARMeilleure: Add `XCR0` to `FeatureInfo` structure

* ARMeilleure: Utilize `ReadOnlySpan` for Xcr0 assembly

Avoids an additional allocation

* ARMeilleure: Formatting fixes

* ARMeilleure: Fix EVEX encoding src2 register index

&gt; Just like in VEX prefix, vvvv is provided in inverted form.

* ARMeilleure: Add `X86Vpternlogd` acceleration to `Vmvn_I`

Passes unit tests, verified instruction utilization

* ARMeilleure: Fix EVEX register operand designations

Operand 2 was being sourced improperly.

EVEX encoded instructions source their operands like so:
Operand 1: ModRM:reg
Operand 2: EVEX.vvvvv
Operand 3: ModRM:r/m
Operand 4: Imm

This fixes the improper register designations when emitting vpternlog.
Now "dest", "src1", "src2" arguments emit in the proper order in EVEX instructions.

* ARMeilleure: Add `X86Vpternlogd` acceleration to `Orn_V`

* ARMeilleure: PTC version bump

* ARMeilleure: Update EVEX encoding Debug.Assert to Debug.Fail

* ARMeilleure: Update EVEX encoding comment capitalization</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* ARMeilleure: Add AVX512{F,VL,DQ,BW} detection

Add `UseAvx512Ortho` and `UseAvx512OrthoFloat` optimization flags as
short-hands for `F+VL` and `F+VL+DQ`.

* ARMeilleure: Add initial support for EVEX instruction encoding

Does not implement rounding, or exception controls.

* ARMeilleure: Add `X86Vpternlogd`

Accelerates the vector-`Not` instruction.

* ARMeilleure: Add check for `OSXSAVE` for AVX{2,512}

* ARMeilleure: Add check for `XCR0` flags

Add XCR0 register checks for AVX and AVX512F, following the guidelines
from section 14.3 and 15.2 from the Intel Architecture Software
Developer's Manual.

* ARMeilleure: Remove redundant `ReProtect` and `Dispose`, formatting

* ARMeilleure: Move XCR0 procedure to GetXcr0Eax

* ARMeilleure: Add `XCR0` to `FeatureInfo` structure

* ARMeilleure: Utilize `ReadOnlySpan` for Xcr0 assembly

Avoids an additional allocation

* ARMeilleure: Formatting fixes

* ARMeilleure: Fix EVEX encoding src2 register index

&gt; Just like in VEX prefix, vvvv is provided in inverted form.

* ARMeilleure: Add `X86Vpternlogd` acceleration to `Vmvn_I`

Passes unit tests, verified instruction utilization

* ARMeilleure: Fix EVEX register operand designations

Operand 2 was being sourced improperly.

EVEX encoded instructions source their operands like so:
Operand 1: ModRM:reg
Operand 2: EVEX.vvvvv
Operand 3: ModRM:r/m
Operand 4: Imm

This fixes the improper register designations when emitting vpternlog.
Now "dest", "src1", "src2" arguments emit in the proper order in EVEX instructions.

* ARMeilleure: Add `X86Vpternlogd` acceleration to `Orn_V`

* ARMeilleure: PTC version bump

* ARMeilleure: Update EVEX encoding Debug.Assert to Debug.Fail

* ARMeilleure: Update EVEX encoding comment capitalization</pre>
</div>
</content>
</entry>
<entry>
<title>Reducing memory allocations (#4537)</title>
<updated>2023-03-17T12:14:50+00:00</updated>
<author>
<name>jhorv</name>
<email>38920027+jhorv@users.noreply.github.com</email>
</author>
<published>2023-03-17T12:14:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.benis.co.uk/Ryujinx/commit/?id=5131b71437b36f9b876046a2fddd5465652e0f10'/>
<id>5131b71437b36f9b876046a2fddd5465652e0f10</id>
<content type='text'>
* add RecyclableMemoryStream dependency and MemoryStreamManager

* organize BinaryReader/BinaryWriter extensions

* add StreamExtensions to reduce need for BinaryWriter

* simple replacments of MemoryStream with RecyclableMemoryStream

* add write ReadOnlySequence&lt;byte&gt; support to IVirtualMemoryManager

* avoid 0-length array creation

* rework IpcMessage and related types to greatly reduce memory allocation by using RecylableMemoryStream, keeping streams around longer, avoiding their creation when possible, and avoiding creation of BinaryReader and BinaryWriter when possible

* reduce LINQ-induced memory allocations with custom methods to query KPriorityQueue

* use RecyclableMemoryStream in StreamUtils, and use StreamUtils in EmbeddedResources

* add constants for nanosecond/millisecond conversions

* code formatting

* XML doc adjustments

* fix: StreamExtension.WriteByte not writing non-zero values for lengths &lt;= 16

* XML Doc improvements. Implement StreamExtensions.WriteByte() block writes for large-enough count values.

* add copyless path for StreamExtension.Write(ReadOnlySpan&lt;int&gt;)

* add default implementation of IVirtualMemoryManager.Write(ulong, ReadOnlySequence&lt;byte&gt;); remove previous explicit implementations

* code style fixes

* remove LINQ completely from KScheduler/KPriorityQueue by implementing a custom struct-based enumerator</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* add RecyclableMemoryStream dependency and MemoryStreamManager

* organize BinaryReader/BinaryWriter extensions

* add StreamExtensions to reduce need for BinaryWriter

* simple replacments of MemoryStream with RecyclableMemoryStream

* add write ReadOnlySequence&lt;byte&gt; support to IVirtualMemoryManager

* avoid 0-length array creation

* rework IpcMessage and related types to greatly reduce memory allocation by using RecylableMemoryStream, keeping streams around longer, avoiding their creation when possible, and avoiding creation of BinaryReader and BinaryWriter when possible

* reduce LINQ-induced memory allocations with custom methods to query KPriorityQueue

* use RecyclableMemoryStream in StreamUtils, and use StreamUtils in EmbeddedResources

* add constants for nanosecond/millisecond conversions

* code formatting

* XML doc adjustments

* fix: StreamExtension.WriteByte not writing non-zero values for lengths &lt;= 16

* XML Doc improvements. Implement StreamExtensions.WriteByte() block writes for large-enough count values.

* add copyless path for StreamExtension.Write(ReadOnlySpan&lt;int&gt;)

* add default implementation of IVirtualMemoryManager.Write(ulong, ReadOnlySequence&lt;byte&gt;); remove previous explicit implementations

* code style fixes

* remove LINQ completely from KScheduler/KPriorityQueue by implementing a custom struct-based enumerator</pre>
</div>
</content>
</entry>
<entry>
<title>CPU: Avoid argument value copies on the JIT (#4484)</title>
<updated>2023-03-08T22:25:35+00:00</updated>
<author>
<name>gdkchan</name>
<email>gab.dark.100@gmail.com</email>
</author>
<published>2023-03-08T22:25:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.benis.co.uk/Ryujinx/commit/?id=f0562b9c75308c8cfcaa2458dfd37ac42751a374'/>
<id>f0562b9c75308c8cfcaa2458dfd37ac42751a374</id>
<content type='text'>
* Minor refactoring of the pre-allocator

* Avoid LoadArgument copies

* PPTC version bump</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* Minor refactoring of the pre-allocator

* Avoid LoadArgument copies

* PPTC version bump</pre>
</div>
</content>
</entry>
<entry>
<title>Remove use of GetFunctionPointerForDelegate to get JIT cache function pointer (#4337)</title>
<updated>2023-01-23T22:37:53+00:00</updated>
<author>
<name>gdkchan</name>
<email>gab.dark.100@gmail.com</email>
</author>
<published>2023-01-23T22:37:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.benis.co.uk/Ryujinx/commit/?id=a1a4771ac1de95f2410c7fb8dfaf4a5986e5ebc6'/>
<id>a1a4771ac1de95f2410c7fb8dfaf4a5986e5ebc6</id>
<content type='text'>
* Remove use of GetFunctionPointerForDelegate to get JIT cache function pointer

* Rename FuncPtr to FuncPointer</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* Remove use of GetFunctionPointerForDelegate to get JIT cache function pointer

* Rename FuncPtr to FuncPointer</pre>
</div>
</content>
</entry>
<entry>
<title>Arm64: Simplify TryEncodeBitMask and use for constants (#4328)</title>
<updated>2023-01-22T14:15:49+00:00</updated>
<author>
<name>merry</name>
<email>git@mary.rs</email>
</author>
<published>2023-01-22T14:15:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.benis.co.uk/Ryujinx/commit/?id=4f293f8cbec33e8edce81ad4980bd532a2464c05'/>
<id>4f293f8cbec33e8edce81ad4980bd532a2464c05</id>
<content type='text'>
* Arm64: Simplify TryEncodeBitMask

* CodeGenerator: Use TryEncodeBitMask in GenerateConstantCopy

* Ptc: Bump version</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* Arm64: Simplify TryEncodeBitMask

* CodeGenerator: Use TryEncodeBitMask in GenerateConstantCopy

* Ptc: Bump version</pre>
</div>
</content>
</entry>
<entry>
<title>Optimize string memory usage. Use Spans and StringBuilders where possible (#3933)</title>
<updated>2023-01-18T22:25:16+00:00</updated>
<author>
<name>Andrey Sukharev</name>
<email>SukharevAndrey@users.noreply.github.com</email>
</author>
<published>2023-01-18T22:25:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.benis.co.uk/Ryujinx/commit/?id=ae4324032a48ee08a808354673f47536e76759d0'/>
<id>ae4324032a48ee08a808354673f47536e76759d0</id>
<content type='text'>
* Optimize string memory usage. Use ReadOnlySpan&lt;char&gt; and StringBuilder where possible.

* Fix copypaste error

* Code generator review fixes

* Use if statement instead of switch

* Code style fixes

Co-authored-by: TSRBerry &lt;20988865+TSRBerry@users.noreply.github.com&gt;

* Another code style fix

* Styling fix

Co-authored-by: Mary-nyan &lt;thog@protonmail.com&gt;

* Styling fix

Co-authored-by: gdkchan &lt;gab.dark.100@gmail.com&gt;

Co-authored-by: TSRBerry &lt;20988865+TSRBerry@users.noreply.github.com&gt;
Co-authored-by: Mary-nyan &lt;thog@protonmail.com&gt;
Co-authored-by: gdkchan &lt;gab.dark.100@gmail.com&gt;</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* Optimize string memory usage. Use ReadOnlySpan&lt;char&gt; and StringBuilder where possible.

* Fix copypaste error

* Code generator review fixes

* Use if statement instead of switch

* Code style fixes

Co-authored-by: TSRBerry &lt;20988865+TSRBerry@users.noreply.github.com&gt;

* Another code style fix

* Styling fix

Co-authored-by: Mary-nyan &lt;thog@protonmail.com&gt;

* Styling fix

Co-authored-by: gdkchan &lt;gab.dark.100@gmail.com&gt;

Co-authored-by: TSRBerry &lt;20988865+TSRBerry@users.noreply.github.com&gt;
Co-authored-by: Mary-nyan &lt;thog@protonmail.com&gt;
Co-authored-by: gdkchan &lt;gab.dark.100@gmail.com&gt;</pre>
</div>
</content>
</entry>
<entry>
<title>Implement support for page sizes &gt; 4KB (#4252)</title>
<updated>2023-01-17T04:13:24+00:00</updated>
<author>
<name>gdkchan</name>
<email>gab.dark.100@gmail.com</email>
</author>
<published>2023-01-17T04:13:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.benis.co.uk/Ryujinx/commit/?id=86fd0643c26433362a25acceb4fa1fcee07dd0b2'/>
<id>86fd0643c26433362a25acceb4fa1fcee07dd0b2</id>
<content type='text'>
* Implement support for page sizes &gt; 4KB

* Check and work around more alignment issues

* Was not meant to change this

* Use MemoryBlock.GetPageSize() value for signal handler code

* Do not take the path for private allocations if host supports 4KB pages

* Add Flags attribute on MemoryMapFlags

* Fix dirty region size with 16kb pages

Would accidentally report a size that was too high (generally 16k instead of 4k, uploading 4x as much data)

Co-authored-by: riperiperi &lt;rhy3756547@hotmail.com&gt;</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* Implement support for page sizes &gt; 4KB

* Check and work around more alignment issues

* Was not meant to change this

* Use MemoryBlock.GetPageSize() value for signal handler code

* Do not take the path for private allocations if host supports 4KB pages

* Add Flags attribute on MemoryMapFlags

* Fix dirty region size with 16kb pages

Would accidentally report a size that was too high (generally 16k instead of 4k, uploading 4x as much data)

Co-authored-by: riperiperi &lt;rhy3756547@hotmail.com&gt;</pre>
</div>
</content>
</entry>
</feed>
