| author | FICTURE7 <FICTURE7@gmail.com> | 2021-10-09 01:15:44 +0400 |
|---|---|---|
| committer | GitHub <noreply@github.com> | 2021-10-08 18:15:44 -0300 |
| commit | 69093cf2d69490862aff974f170cee63a0016fd0 (patch) | |
| tree | 24507a2d3da862416d3c2d3ca228c89cb40d5437 /ARMeilleure/CodeGen/RegisterAllocators/UseList.cs | |
| parent | c54a14d0b8d445d9d0074861dca816cc801e4008 (diff) | |
Optimize LSRA (#2563)
* Optimize `TryAllocateRegWithoutSpill` a bit
* Add a fast path for when all registers are live.
* Do not query `GetOverlapPosition` if the register is already in use
(i.e. its free position is 0), as sketched below.
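A minimal sketch of these two fast paths, with illustrative names throughout (`FastPathSketch`, `AllRegistersLive`, `LimitByInactive` and the delegate are assumptions, not the actual `TryAllocateRegWithoutSpill` code):

```csharp
using System;

static class FastPathSketch
{
    public const int NotFound = -1;

    // Fast path: if every free position is 0, no register can be allocated
    // without a spill, so the caller can bail out before any overlap queries.
    public static bool AllRegistersLive(ReadOnlySpan<int> freePositions)
    {
        foreach (int position in freePositions)
        {
            if (position != 0)
            {
                return false;
            }
        }

        return true;
    }

    // For each inactive interval, the free position of its register is capped
    // by the first position where it overlaps the current interval, but the
    // query is skipped entirely when the register is already in use.
    public static void LimitByInactive(
        Span<int> freePositions,
        ReadOnlySpan<int> inactiveRegisters,
        Func<int, int> overlapWithCurrent) // maps inactive-interval index to overlap position
    {
        for (int i = 0; i < inactiveRegisters.Length; i++)
        {
            int reg = inactiveRegisters[i];

            if (freePositions[reg] == 0)
            {
                continue; // already in use: the overlap query result is irrelevant
            }

            int overlap = overlapWithCurrent(i);

            if (overlap != NotFound)
            {
                freePositions[reg] = Math.Min(freePositions[reg], overlap);
            }
        }
    }
}
```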
* Do not allocate the child split list if the interval is not a parent
* Turn `LiveRange` into a reference struct
`LiveRange` is now a reference wrapping struct like `Operand` and
`Operation`.
It has also been changed into a singly linked list. In micro-benchmarks,
traversing the linked list was faster than binary search on `List<T>`,
surprisingly even for quite large inputs (e.g. 1,000,000 elements). This
could be because the generated code for traversing the linked list is
much cleaner, and no virtual dispatch happens when checking whether
intervals overlap.
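A minimal sketch of the layout, assuming an arena allocator as in `Operand`; `LiveRangeRef` and its members are illustrative, not the actual definition:

```csharp
// A "reference wrapping" struct is a single pointer into arena-allocated
// memory, so it is unmanaged and copies like a reference, and the next-range
// link makes the ranges a singly linked list.
unsafe readonly struct LiveRangeRef
{
    private struct Data
    {
        public int Start;
        public int End;
        public Data* Next; // next range in the interval, or null
    }

    private readonly Data* _data;

    private LiveRangeRef(Data* data) => _data = data; // arena allocation elided

    public int Start => _data->Start;
    public int End => _data->End;

    // Overlap test by walking the list: a tight pointer-chasing loop with no
    // virtual dispatch, which plausibly explains why it beat binary search
    // over List<T> in the micro-benchmarks.
    public bool Overlaps(int start, int end)
    {
        for (Data* range = _data; range != null; range = range->Next)
        {
            if (range->Start < end && start < range->End)
            {
                return true;
            }
        }

        return false;
    }
}
```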
* Turn `LiveInterval` into an iterator
The LSRA allocates in forward order and never inspects previous
`LiveInterval`s once they have expired. Something similar can be done
for the `LiveRange`s within the `LiveInterval`s themselves.
`LiveInterval` is therefore turned into an iterator which expires the
`LiveRange`s within it. The iterator is moved forward along with the
interval walking code, i.e. `AllocateInterval(context, interval, cIndex)`.
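A rough sketch of the iterator idea under the same assumptions (illustrative names, arena allocation elided):

```csharp
// Instead of scanning all ranges on every query, keep a cursor that only
// moves forward, dropping ranges that end before the current allocation
// position; expired ranges are never visited again.
unsafe struct LiveIntervalCursor
{
    private struct Range { public int Start; public int End; public Range* Next; }

    private Range* _cursor;

    // Advanced along with the interval walking code.
    public void Forward(int position)
    {
        while (_cursor != null && _cursor->End <= position)
        {
            _cursor = _cursor->Next;
        }
    }

    // After Forward(position), the cursor is the first range ending after
    // position, so one comparison answers the overlap query.
    public bool Overlaps(int position)
    {
        return _cursor != null && _cursor->Start <= position && position < _cursor->End;
    }
}
```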
* Remove `LinearScanAllocator.Sources`
Local methods are less prone to allocations than lambdas.
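A generic illustration of the difference (not the removed code):

```csharp
using System.Collections.Generic;

static class Example
{
    public static int CountAbove(List<int> values, int threshold)
    {
        // A lambda that captures `threshold` forces a closure object and a
        // delegate allocation on every call:
        //     values.FindAll(x => x > threshold);

        // A local function capturing the same state is compiled to a method
        // taking a closure struct by ref; called directly, it allocates
        // nothing.
        bool IsAbove(int x) => x > threshold;

        int count = 0;

        foreach (int value in values)
        {
            if (IsAbove(value))
            {
                count++;
            }
        }

        return count;
    }
}
```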
* Optimize `GetOverlapPosition(interval)` a bit
Time complexity should now be O(n+m) instead of O(nm).
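A sketch of how an O(n+m) walk over two sorted range lists can work, assuming ranges sorted by start position; this illustrates the technique, not the actual implementation:

```csharp
using System;

static class OverlapSketch
{
    public const int NotFound = -1;

    // Both range lists are sorted by start position, so a single merge-style
    // pass finds the first overlap instead of testing every pair (O(nm)).
    public static int GetOverlapPosition(
        ReadOnlySpan<(int Start, int End)> a,
        ReadOnlySpan<(int Start, int End)> b)
    {
        int i = 0, j = 0;

        while (i < a.Length && j < b.Length)
        {
            if (a[i].Start < b[j].End && b[j].Start < a[i].End)
            {
                // First position covered by both intervals.
                return Math.Max(a[i].Start, b[j].Start);
            }

            // Advance whichever range ends first; it cannot overlap any
            // later range of the other list.
            if (a[i].End < b[j].End)
            {
                i++;
            }
            else
            {
                j++;
            }
        }

        return NotFound;
    }
}
```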
* Optimize `NumberLocals` a bit
Use the same idea as in `HybridAllocator` to store the visited state
in the MSB of the `Operand`'s value instead of using a `HashSet<T>`.
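The idea, sketched with a plain `ulong` standing in for the operand's value field:

```csharp
static class VisitedBitSketch
{
    // The top bit of the value carries the visited flag; the remaining bits
    // keep their original meaning, so no HashSet<T> allocation is needed.
    private const ulong VisitedFlag = 1ul << 63;

    public static bool IsVisited(ulong value) => (value & VisitedFlag) != 0;
    public static ulong SetVisited(ulong value) => value | VisitedFlag;
    public static ulong ClearVisited(ulong value) => value & ~VisitedFlag;
}
```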
* Optimize `InsertSplitCopies` a bit
Avoid allocating a redundant `CopyResolver`.
* Optimize `InsertSplitCopiesAtEdges` a bit
Avoid redundant allocations of `CopyResolver`.
* Use stack allocation for `freePositions`
Avoid redundant computations.
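A sketch of the stack-allocation pattern; `BestRegister` and the selection logic are illustrative only:

```csharp
using System;

static class StackAllocSketch
{
    public static int BestRegister(int registersCount)
    {
        // The register count is small and bounded, so the scratch array
        // lives on the stack instead of being heap-allocated per attempt.
        Span<int> freePositions = stackalloc int[registersCount];
        freePositions.Fill(int.MaxValue);

        // ... cap positions from active/inactive intervals here ...

        // Pick the register that stays free the longest.
        int best = 0;

        for (int reg = 1; reg < freePositions.Length; reg++)
        {
            if (freePositions[reg] > freePositions[best])
            {
                best = reg;
            }
        }

        return best;
    }
}
```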
* Add `UseList`
Replace `SortedIntegerList` with an even more specialized data
structure. It allocates memory on the arena allocators and does not
require copying use positions when splitting it.
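A hypothetical usage of the `UseList` shown in the diff at the bottom of this page, illustrating its descending-order storage:

```csharp
var uses = new UseList();

uses.Add(30);
uses.Add(10);
uses.Add(20);                   // stored as [30, 20, 10]

int first = uses.FirstUse;      // 10: the earliest use position
int next = uses.NextUse(15);    // 20: first use at or after position 15

UseList child = uses.Split(15); // child takes the uses >= 15: [30, 20];
                                // `uses` keeps the earlier ones: [10].
                                // No use positions are copied; both lists
                                // point into the same arena memory.
```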
* Turn `LiveInterval` into a reference struct
`LiveInterval` is now a reference wrapping struct like `Operand` and
`Operation`.
The rationale for turning this into a reference wrapping struct is that
a `LiveInterval` is associated with each local variable, and these
intervals may themselves be split further. I've seen translations with
up to 8,000 local variables.
To make the `LiveInterval` unmanaged, a new data structure called
`LiveIntervalList` was added to store child splits. This differs from
`SortedList<,>` because it can contain intervals with the same start
position.
Really wish we had more of C++'s templates in C#. :^(
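A sketch of the ordered insert that `SortedList<,>` cannot provide, since it rejects duplicate keys; names are illustrative:

```csharp
using System.Collections.Generic;

static class SplitListSketch
{
    // Child splits may share a start position, so all that is needed is an
    // ordered insert that tolerates duplicates.
    public static void InsertOrdered(
        List<(int Start, int Id)> splits, (int Start, int Id) child)
    {
        int i = splits.Count;

        // Walk backwards to keep the list ordered by Start; equal starts are
        // inserted after existing entries, preserving insertion order.
        while (i > 0 && splits[i - 1].Start > child.Start)
        {
            i--;
        }

        splits.Insert(i, child);
    }
}
```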
* Optimize `GetChildSplit` a bit
No need to inspect the remaining ranges if we've reached a range which
starts after `position`, since the split list is ordered.
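A sketch of the early exit, using tuples in place of the actual `LiveInterval` type:

```csharp
using System;

static class ChildSplitSketch
{
    // The split list is ordered by start position, so once a child starts
    // after `position`, no later child can contain it.
    public static int GetChildSplit(
        ReadOnlySpan<(int Start, int End)> splits, int position)
    {
        for (int i = 0; i < splits.Length; i++)
        {
            if (splits[i].Start > position)
            {
                break; // everything after this starts even later
            }

            if (position <= splits[i].End)
            {
                return i;
            }
        }

        return -1;
    }
}
```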
* Optimize `CopyResolver` a bit
Lazily allocate the fill, spill and parallel copy structures since most
of the time only one of them is needed.
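The lazy-initialization pattern, sketched with a hypothetical parallel-copy member only:

```csharp
using System.Collections.Generic;

class LazyResolverSketch
{
    private List<(int From, int To)> _parallelCopies;

    public bool HasCopy => _parallelCopies != null && _parallelCopies.Count != 0;

    public void AddParallelCopy(int from, int to)
    {
        // Allocate the list only when the first copy is actually recorded;
        // resolutions that never need it pay nothing.
        _parallelCopies ??= new List<(int From, int To)>();
        _parallelCopies.Add((from, to));
    }
}
```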
* Optimize `BitMap.Enumerator` a bit
Marking `MoveNext` as `AggressiveInlining` allows RyuJIT to promote the
`Enumerator` struct into registers completely, reducing load/store code
a lot since it does not have to store the struct on the stack for ABI
purposes.
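A minimal illustration of the pattern (a bit enumerator over a single `ulong`, not the actual `BitMap.Enumerator`):

```csharp
using System.Numerics;
using System.Runtime.CompilerServices;

struct BitEnumerator
{
    private ulong _bits;

    public int Current { get; private set; }

    public BitEnumerator(ulong bits)
    {
        _bits = bits;
        Current = -1;
    }

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public bool MoveNext()
    {
        if (_bits == 0)
        {
            return false;
        }

        // Inlining the body lets RyuJIT keep `_bits` and `Current` in
        // registers across the loop instead of spilling the struct to the
        // stack for the call's ABI.
        Current = BitOperations.TrailingZeroCount(_bits);
        _bits &= _bits - 1; // clear the lowest set bit

        return true;
    }
}
```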
* Use stack allocation for `use/blockedPositions`
* Optimize `AllocateWithSpill` a bit
* Address feedback
* Make `LiveInterval.AddRange(,)` more conservative
Produces no diff against master, but just for good measure.
Diffstat (limited to 'ARMeilleure/CodeGen/RegisterAllocators/UseList.cs')
| -rw-r--r-- | ARMeilleure/CodeGen/RegisterAllocators/UseList.cs | 84 |
1 file changed, 84 insertions, 0 deletions
diff --git a/ARMeilleure/CodeGen/RegisterAllocators/UseList.cs b/ARMeilleure/CodeGen/RegisterAllocators/UseList.cs
new file mode 100644
index 00000000..c89f0854
--- /dev/null
+++ b/ARMeilleure/CodeGen/RegisterAllocators/UseList.cs
@@ -0,0 +1,84 @@
+using System;
+
+namespace ARMeilleure.CodeGen.RegisterAllocators
+{
+    unsafe struct UseList
+    {
+        private int* _items;
+        private int _capacity;
+        private int _count;
+
+        public int Count => _count;
+        public int FirstUse => _count > 0 ? _items[_count - 1] : LiveInterval.NotFound;
+        public Span<int> Span => new(_items, _count);
+
+        public void Add(int position)
+        {
+            if (_count + 1 > _capacity)
+            {
+                var oldSpan = Span;
+
+                _capacity = Math.Max(4, _capacity * 2);
+                _items = Allocators.Default.Allocate<int>((uint)_capacity);
+
+                var newSpan = Span;
+
+                oldSpan.CopyTo(newSpan);
+            }
+
+            // Use positions are usually inserted in descending order, so inserting in descending order is faster,
+            // since the number of half exchanges is reduced.
+            int i = _count - 1;
+
+            while (i >= 0 && _items[i] < position)
+            {
+                _items[i + 1] = _items[i--];
+            }
+
+            _items[i + 1] = position;
+            _count++;
+        }
+
+        public int NextUse(int position)
+        {
+            int index = NextUseIndex(position);
+
+            return index != LiveInterval.NotFound ? _items[index] : LiveInterval.NotFound;
+        }
+
+        public int NextUseIndex(int position)
+        {
+            int i = _count - 1;
+
+            if (i == -1 || position > _items[0])
+            {
+                return LiveInterval.NotFound;
+            }
+
+            while (i >= 0 && _items[i] < position)
+            {
+                i--;
+            }
+
+            return i;
+        }
+
+        public UseList Split(int position)
+        {
+            int index = NextUseIndex(position);
+
+            // Since the list is in descending order, the new split list takes the front of the list and the current
+            // list takes the back of the list.
+            UseList result = new();
+            result._count = index + 1;
+            result._capacity = result._count;
+            result._items = _items;
+
+            _count = _count - result._count;
+            _capacity = _count;
+            _items = _items + result._count;
+
+            return result;
+        }
+    }
+}
\ No newline at end of file
