<feed xmlns='http://www.w3.org/2005/Atom'>
<title>Ryujinx/ARMeilleure/CodeGen/Optimizations, branch master</title>
<subtitle>A backup of the Ryujinx master git branch.
</subtitle>
<link rel='alternate' type='text/html' href='https://git.benis.co.uk/Ryujinx/'/>
<entry>
<title>Move solution and projects to src</title>
<updated>2023-04-27T21:51:14+00:00</updated>
<author>
<name>TSR Berry</name>
<email>20988865+TSRBerry@users.noreply.github.com</email>
</author>
<published>2023-04-07T23:22:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.benis.co.uk/Ryujinx/commit/?id=cee712105850ac3385cd0091a923438167433f9f'/>
<id>cee712105850ac3385cd0091a923438167433f9f</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>Implement JIT Arm64 backend (#4114)</title>
<updated>2023-01-10T22:16:59+00:00</updated>
<author>
<name>gdkchan</name>
<email>gab.dark.100@gmail.com</email>
</author>
<published>2023-01-10T22:16:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.benis.co.uk/Ryujinx/commit/?id=5e0f8e873857ce3ca3f532aff0936beb28e412c8'/>
<id>5e0f8e873857ce3ca3f532aff0936beb28e412c8</id>
<content type='text'>
* Implement JIT Arm64 backend

* PPTC version bump

* Address some feedback from Arm64 JIT PR

* Address even more PR feedback

* Remove unused IsPageAligned function

* Sync Qc flag before calls

* Fix comment and remove unused enum

* Address riperiperi PR feedback

* Delete Breakpoint IR instruction that was only implemented for Arm64</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* Implement JIT Arm64 backend

* PPTC version bump

* Address some feedback from Arm64 JIT PR

* Address even more PR feedback

* Remove unused IsPageAligned function

* Sync Qc flag before calls

* Fix comment and remove unused enum

* Address riperiperi PR feedback

* Delete Breakpoint IR instruction that was only implemented for Arm64</pre>
</div>
</content>
</entry>
<entry>
<title>Fix tail merge from block with conditional jump to multiple returns (#3267)</title>
<updated>2022-04-09T14:56:50+00:00</updated>
<author>
<name>gdkchan</name>
<email>gab.dark.100@gmail.com</email>
</author>
<published>2022-04-09T14:56:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.benis.co.uk/Ryujinx/commit/?id=26a881176eb6513a98889648e0d5b7fe647cd0e3'/>
<id>26a881176eb6513a98889648e0d5b7fe647cd0e3</id>
<content type='text'>
* Fix tail merge from block with conditional jump to multiple returns

* PPTC version bump</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* Fix tail merge from block with conditional jump to multiple returns

* PPTC version bump</pre>
</div>
</content>
</entry>
<entry>
<title>Add an early `TailMerge` pass (#2721)</title>
<updated>2021-10-18T22:51:22+00:00</updated>
<author>
<name>FICTURE7</name>
<email>FICTURE7@gmail.com</email>
</author>
<published>2021-10-18T22:51:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.benis.co.uk/Ryujinx/commit/?id=fbf40424f4d3a9aecc7fe528d7503619738ce542'/>
<id>fbf40424f4d3a9aecc7fe528d7503619738ce542</id>
<content type='text'>
* Add an early `TailMerge` pass

Some translations can have a lot of guest calls and since for each guest
call there is a call guard which may return. This can produce a lot of
epilogue code for returns. This pass merges the epilogue into a single
block.

```
Using filter 'hcq'.
Using metric 'code size'.

Total diff: -1648111 (-7.19 %) (bytes):
  Base: 22913847
  Diff: 21265736

Improved: 4567, regressed: 14, unchanged: 144
```

* Set PTC version

* Address feedback

* Handle `void` returning functions

* Actually handle `void` returning functions

* Fix `RegisterToLocal` logging</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* Add an early `TailMerge` pass

Some translations can have a lot of guest calls and since for each guest
call there is a call guard which may return. This can produce a lot of
epilogue code for returns. This pass merges the epilogue into a single
block.

```
Using filter 'hcq'.
Using metric 'code size'.

Total diff: -1648111 (-7.19 %) (bytes):
  Base: 22913847
  Diff: 21265736

Improved: 4567, regressed: 14, unchanged: 144
```

* Set PTC version

* Address feedback

* Handle `void` returning functions

* Actually handle `void` returning functions

* Fix `RegisterToLocal` logging</pre>
</div>
</content>
</entry>
<entry>
<title>Fix type mismatch in `BitwiseAnd` simplification (#2571)</title>
<updated>2021-08-20T17:42:00+00:00</updated>
<author>
<name>FICTURE7</name>
<email>FICTURE7@gmail.com</email>
</author>
<published>2021-08-20T17:42:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.benis.co.uk/Ryujinx/commit/?id=f2a7b300c471ee7ad9a925f1085ad86537f68154'/>
<id>f2a7b300c471ee7ad9a925f1085ad86537f68154</id>
<content type='text'>
* Fix type mismatch in `BitwiseAnd` simplification

`TryEliminateBitwiseAnd` would turn the `BitwiseAnd` operation into a
copy of the wrong type. E.g:

Before `Simplification`:
```llvm
i64 %0 = BitwiseAnd i64 0x0, %1
```

After `Simplication`:
```llvm
i64 %0 = Copy i32 0x0
```

Since the with the changes in #2515, we iterate in reverse order and
`Simplication`, `ConstantFolding` does not indicate if it modified
the CFG, the second pass to "retype" the copy into the proper
destination type does not happen.

This also blocked copy propagation since its destination type did not
match with its source type. But in the cases I've seen, the
`PreAllocator` would insert a copy for the propagated constant, which
results in no diffs.

Since the copy remained as is, asserts are fired when generating it.

* Set PPTC version</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* Fix type mismatch in `BitwiseAnd` simplification

`TryEliminateBitwiseAnd` would turn the `BitwiseAnd` operation into a
copy of the wrong type. E.g:

Before `Simplification`:
```llvm
i64 %0 = BitwiseAnd i64 0x0, %1
```

After `Simplication`:
```llvm
i64 %0 = Copy i32 0x0
```

Since the with the changes in #2515, we iterate in reverse order and
`Simplication`, `ConstantFolding` does not indicate if it modified
the CFG, the second pass to "retype" the copy into the proper
destination type does not happen.

This also blocked copy propagation since its destination type did not
match with its source type. But in the cases I've seen, the
`PreAllocator` would insert a copy for the propagated constant, which
results in no diffs.

Since the copy remained as is, asserts are fired when generating it.

* Set PPTC version</pre>
</div>
</content>
</entry>
<entry>
<title>Reduce JIT GC allocations (#2515)</title>
<updated>2021-08-17T18:08:34+00:00</updated>
<author>
<name>FICTURE7</name>
<email>FICTURE7@gmail.com</email>
</author>
<published>2021-08-17T18:08:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.benis.co.uk/Ryujinx/commit/?id=22b2cb39af00fb8881e908fd671fbf57a6e2db2a'/>
<id>22b2cb39af00fb8881e908fd671fbf57a6e2db2a</id>
<content type='text'>
* Turn `MemoryOperand` into a struct

* Remove `IntrinsicOperation`

* Remove `PhiNode`

* Remove `Node`

* Turn `Operand` into a struct

* Turn `Operation` into a struct

* Clean up pool management methods

* Add `Arena` allocator

* Move `OperationHelper` to `Operation.Factory`

* Move `OperandHelper` to `Operand.Factory`

* Optimize `Operation` a bit

* Fix `Arena` initialization

* Rename `NativeList&lt;T&gt;` to `ArenaList&lt;T&gt;`

* Reduce `Operand` size from 88 to 56 bytes

* Reduce `Operation` size from 56 to 40 bytes

* Add optimistic interning of Register &amp; Constant operands

* Optimize `RegisterUsage` pass a bit

* Optimize `RemoveUnusedNodes` pass a bit

Iterating in reverse-order allows killing dependency chains in a single
pass.

* Fix PPTC symbols

* Optimize `BasicBlock` a bit

Reduce allocations from `_successor` &amp; `DominanceFrontiers`

* Fix `Operation` resize

* Make `Arena` expandable

Change the arena allocator to be expandable by allocating in pages, with
some of them being pooled. Currently 32 pages are pooled. An LRU removal
mechanism should probably be added to it.

Apparently MHR can allocate bitmaps large enough to exceed the 16MB
limit for the type.

* Move `Arena` &amp; `ArenaList` to `Common`

* Remove `ThreadStaticPool` &amp; co

* Add `PhiOperation`

* Reduce `Operand` size from 56 from 48 bytes

* Add linear-probing to `Operand` intern table

* Optimize `HybridAllocator` a bit

* Add `Allocators` class

* Tune `ArenaAllocator` sizes

* Add page removal mechanism to `ArenaAllocator`

Remove pages which have not been used for more than 5s after each reset.

I am on fence if this would be better using a Gen2 callback object like
the one in System.Buffers.ArrayPool&lt;T&gt;, to trim the pool. Because right
now if a large translation happens, the pages will be freed only after a
reset. This reset may not happen for a while because no new translation
is hit, but the arena base sizes are rather small.

* Fix `OOM` when allocating larger than page size in `ArenaAllocator`

Tweak resizing mechanism for Operand.Uses and Assignemnts.

* Optimize `Optimizer` a bit

* Optimize `Operand.Add&lt;T&gt;/Remove&lt;T&gt;` a bit

* Clean up `PreAllocator`

* Fix phi insertion order

Reduce codegen diffs.

* Fix code alignment

* Use new heuristics for degree of parallelism

* Suppress warnings

* Address gdkchan's feedback

Renamed `GetValue()` to `GetValueUnsafe()` to make it more clear that
`Operand.Value` should usually not be modified directly.

* Add fast path to `ArenaAllocator`

* Assembly for `ArenaAllocator.Allocate(ulong)`:

  .L0:
    mov rax, [rcx+0x18]
    lea r8, [rax+rdx]
    cmp r8, [rcx+0x10]
    ja short .L2
  .L1:
    mov rdx, [rcx+8]
    add rax, [rdx+8]
    mov [rcx+0x18], r8
    ret
  .L2:
    jmp ArenaAllocator.AllocateSlow(UInt64)

  A few variable/field had to be changed to ulong so that RyuJIT avoids
  emitting zero-extends.

* Implement a new heuristic to free pooled pages.

  If an arena is used often, it is more likely that its pages will be
  needed, so the pages are kept for longer (e.g: during PPTC rebuild or
  burst sof compilations). If is not used often, then it is more likely
  that its pages will not be needed (e.g: after PPTC rebuild or bursts
  of compilations).

* Address riperiperi's feedback

* Use `EqualityComparer&lt;T&gt;` in `IntrusiveList&lt;T&gt;`

Avoids a potential GC hole in `Equals(T, T)`.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* Turn `MemoryOperand` into a struct

* Remove `IntrinsicOperation`

* Remove `PhiNode`

* Remove `Node`

* Turn `Operand` into a struct

* Turn `Operation` into a struct

* Clean up pool management methods

* Add `Arena` allocator

* Move `OperationHelper` to `Operation.Factory`

* Move `OperandHelper` to `Operand.Factory`

* Optimize `Operation` a bit

* Fix `Arena` initialization

* Rename `NativeList&lt;T&gt;` to `ArenaList&lt;T&gt;`

* Reduce `Operand` size from 88 to 56 bytes

* Reduce `Operation` size from 56 to 40 bytes

* Add optimistic interning of Register &amp; Constant operands

* Optimize `RegisterUsage` pass a bit

* Optimize `RemoveUnusedNodes` pass a bit

Iterating in reverse-order allows killing dependency chains in a single
pass.

* Fix PPTC symbols

* Optimize `BasicBlock` a bit

Reduce allocations from `_successor` &amp; `DominanceFrontiers`

* Fix `Operation` resize

* Make `Arena` expandable

Change the arena allocator to be expandable by allocating in pages, with
some of them being pooled. Currently 32 pages are pooled. An LRU removal
mechanism should probably be added to it.

Apparently MHR can allocate bitmaps large enough to exceed the 16MB
limit for the type.

* Move `Arena` &amp; `ArenaList` to `Common`

* Remove `ThreadStaticPool` &amp; co

* Add `PhiOperation`

* Reduce `Operand` size from 56 from 48 bytes

* Add linear-probing to `Operand` intern table

* Optimize `HybridAllocator` a bit

* Add `Allocators` class

* Tune `ArenaAllocator` sizes

* Add page removal mechanism to `ArenaAllocator`

Remove pages which have not been used for more than 5s after each reset.

I am on fence if this would be better using a Gen2 callback object like
the one in System.Buffers.ArrayPool&lt;T&gt;, to trim the pool. Because right
now if a large translation happens, the pages will be freed only after a
reset. This reset may not happen for a while because no new translation
is hit, but the arena base sizes are rather small.

* Fix `OOM` when allocating larger than page size in `ArenaAllocator`

Tweak resizing mechanism for Operand.Uses and Assignemnts.

* Optimize `Optimizer` a bit

* Optimize `Operand.Add&lt;T&gt;/Remove&lt;T&gt;` a bit

* Clean up `PreAllocator`

* Fix phi insertion order

Reduce codegen diffs.

* Fix code alignment

* Use new heuristics for degree of parallelism

* Suppress warnings

* Address gdkchan's feedback

Renamed `GetValue()` to `GetValueUnsafe()` to make it more clear that
`Operand.Value` should usually not be modified directly.

* Add fast path to `ArenaAllocator`

* Assembly for `ArenaAllocator.Allocate(ulong)`:

  .L0:
    mov rax, [rcx+0x18]
    lea r8, [rax+rdx]
    cmp r8, [rcx+0x10]
    ja short .L2
  .L1:
    mov rdx, [rcx+8]
    add rax, [rdx+8]
    mov [rcx+0x18], r8
    ret
  .L2:
    jmp ArenaAllocator.AllocateSlow(UInt64)

  A few variable/field had to be changed to ulong so that RyuJIT avoids
  emitting zero-extends.

* Implement a new heuristic to free pooled pages.

  If an arena is used often, it is more likely that its pages will be
  needed, so the pages are kept for longer (e.g: during PPTC rebuild or
  burst sof compilations). If is not used often, then it is more likely
  that its pages will not be needed (e.g: after PPTC rebuild or bursts
  of compilations).

* Address riperiperi's feedback

* Use `EqualityComparer&lt;T&gt;` in `IntrusiveList&lt;T&gt;`

Avoids a potential GC hole in `Equals(T, T)`.</pre>
</div>
</content>
</entry>
<entry>
<title>PPTC: Fix unwanted propagation of a relocatable constant in a specific case. (#1990)</title>
<updated>2021-02-23T12:15:45+00:00</updated>
<author>
<name>LDj3SNuD</name>
<email>35856442+LDj3SNuD@users.noreply.github.com</email>
</author>
<published>2021-02-23T12:15:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.benis.co.uk/Ryujinx/commit/?id=bcbf240d2eab2a2794224487d87519ac31016c96'/>
<id>bcbf240d2eab2a2794224487d87519ac31016c96</id>
<content type='text'>
* Fix unwanted propagation of a relocatable constant in a specific case.

* Ptc.InternalVersion = 1990

* Nit to retrigger the Checks.</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* Fix unwanted propagation of a relocatable constant in a specific case.

* Ptc.InternalVersion = 1990

* Nit to retrigger the Checks.</pre>
</div>
</content>
</entry>
<entry>
<title>Implement block placement (#1549)</title>
<updated>2020-09-19T23:00:24+00:00</updated>
<author>
<name>FICTURE7</name>
<email>FICTURE7@gmail.com</email>
</author>
<published>2020-09-19T23:00:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.benis.co.uk/Ryujinx/commit/?id=f60033e0aaf546d7f56a4925b5aeec76709fb851'/>
<id>f60033e0aaf546d7f56a4925b5aeec76709fb851</id>
<content type='text'>
* Implement block placement

Implement a simple pass which re-orders cold blocks at the end of the
list of blocks in the CFG.

* Set PPTC version

* Use Array.Resize

Address gdkchan's feedback</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* Implement block placement

Implement a simple pass which re-orders cold blocks at the end of the
list of blocks in the CFG.

* Set PPTC version

* Use Array.Resize

Address gdkchan's feedback</pre>
</div>
</content>
</entry>
<entry>
<title>Improve branch operations (#1442)</title>
<updated>2020-08-04T22:52:33+00:00</updated>
<author>
<name>Ficture Seven</name>
<email>FICTURE7@gmail.com</email>
</author>
<published>2020-08-04T22:52:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.benis.co.uk/Ryujinx/commit/?id=ee22517d92c48eab9643b6fc8ce4dac2b7e95f57'/>
<id>ee22517d92c48eab9643b6fc8ce4dac2b7e95f57</id>
<content type='text'>
* Add Compare instruction

* Add BranchIf instruction

* Use test when BranchIf &amp; Compare against 0

* Propagate Compare into BranchIfTrue/False use

- Propagate Compare operations into their BranchIfTrue/False use and
  turn these into a BranchIf.

- Clean up Comparison enum.

* Replace BranchIfTrue/False with BranchIf

* Use BranchIf in EmitPtPointerLoad

- Using BranchIf early instead of BranchIfTrue/False improves LCQ and
  reduces the amount of work needed by the Optimizer.

  EmitPtPointerLoader was a/the big producer of BranchIfTrue/False.

- Fix asserts firing when assembling BitwiseAnd because of type
  mismatch in EmitStoreExclusive. This is harmless and should not
  cause any diffs.

* Increment PPTC interval version

* Improve IRDumper for BranchIf &amp; Compare

* Use BranchIf in EmitNativeCall

* Clean up

* Do not emit test when immediately preceded by and</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* Add Compare instruction

* Add BranchIf instruction

* Use test when BranchIf &amp; Compare against 0

* Propagate Compare into BranchIfTrue/False use

- Propagate Compare operations into their BranchIfTrue/False use and
  turn these into a BranchIf.

- Clean up Comparison enum.

* Replace BranchIfTrue/False with BranchIf

* Use BranchIf in EmitPtPointerLoad

- Using BranchIf early instead of BranchIfTrue/False improves LCQ and
  reduces the amount of work needed by the Optimizer.

  EmitPtPointerLoader was a/the big producer of BranchIfTrue/False.

- Fix asserts firing when assembling BitwiseAnd because of type
  mismatch in EmitStoreExclusive. This is harmless and should not
  cause any diffs.

* Increment PPTC interval version

* Improve IRDumper for BranchIf &amp; Compare

* Use BranchIf in EmitNativeCall

* Clean up

* Do not emit test when immediately preceded by and</pre>
</div>
</content>
</entry>
<entry>
<title> Implement inline memory load/store exclusive and ordered (#1413)</title>
<updated>2020-07-30T14:29:28+00:00</updated>
<author>
<name>gdkchan</name>
<email>gab.dark.100@gmail.com</email>
</author>
<published>2020-07-30T14:29:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.benis.co.uk/Ryujinx/commit/?id=9878fc2d3cf4c64f56c44c2a5de013acb6bcbade'/>
<id>9878fc2d3cf4c64f56c44c2a5de013acb6bcbade</id>
<content type='text'>
* Implement inline memory load/store exclusive

* Fix missing REX prefix on 8-bits CMPXCHG

* Increment PTC version due to bugfix

* Remove redundant memory checks

* Address PR feedback

* Increment PPTC version</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* Implement inline memory load/store exclusive

* Fix missing REX prefix on 8-bits CMPXCHG

* Increment PTC version due to bugfix

* Remove redundant memory checks

* Address PR feedback

* Increment PPTC version</pre>
</div>
</content>
</entry>
</feed>
