| author | FICTURE7 <FICTURE7@gmail.com> | 2021-10-09 01:15:44 +0400 |
|---|---|---|
| committer | GitHub <noreply@github.com> | 2021-10-08 18:15:44 -0300 |
| commit | 69093cf2d69490862aff974f170cee63a0016fd0 (patch) | |
| tree | 24507a2d3da862416d3c2d3ca228c89cb40d5437 /ARMeilleure/CodeGen/RegisterAllocators/UseList.cs | |
| parent | c54a14d0b8d445d9d0074861dca816cc801e4008 (diff) | |
Optimize LSRA (#2563)
* Optimize `TryAllocateRegWithoutSpill` a bit
* Add a fast path for when all registers are live.
* Do not query `GetOverlapPosition` if the register is already in use
(i.e. its free position is 0), as sketched below.
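A minimal sketch of these two fast paths, with illustrative names throughout (`FastPathSketch`, `AllRegistersLive`, `LimitByInactive` and the delegate are assumptions, not the actual `TryAllocateRegWithoutSpill` code):

```csharp
using System;

static class FastPathSketch
{
    public const int NotFound = -1;

    // Fast path: if every free position is 0, no register can be allocated
    // without a spill, so the caller can bail out before any overlap queries.
    public static bool AllRegistersLive(ReadOnlySpan<int> freePositions)
    {
        foreach (int position in freePositions)
        {
            if (position != 0)
            {
                return false;
            }
        }

        return true;
    }

    // For each inactive interval, the free position of its register is capped
    // by the first position where it overlaps the current interval, but the
    // query is skipped entirely when the register is already in use.
    public static void LimitByInactive(
        Span<int> freePositions,
        ReadOnlySpan<int> inactiveRegisters,
        Func<int, int> overlapWithCurrent) // maps inactive-interval index to overlap position
    {
        for (int i = 0; i < inactiveRegisters.Length; i++)
        {
            int reg = inactiveRegisters[i];

            if (freePositions[reg] == 0)
            {
                continue; // already in use: the overlap query result is irrelevant
            }

            int overlap = overlapWithCurrent(i);

            if (overlap != NotFound)
            {
                freePositions[reg] = Math.Min(freePositions[reg], overlap);
            }
        }
    }
}
```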
* Do not allocate the child split list if the interval is not a parent
* Turn `LiveRange` into a reference struct
`LiveRange` is now a reference wrapping struct like `Operand` and
`Operation`.
It has also been changed into a singly linked list. In micro-benchmarks,
traversing the linked list was faster than binary search on `List<T>`,
surprisingly even for quite large inputs (e.g. 1,000,000 elements). This
could be because the generated code for traversing the linked list is
much cleaner, and no virtual dispatch happens when checking whether
intervals overlap.
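A minimal sketch of the layout, assuming an arena allocator as in `Operand`; `LiveRangeRef` and its members are illustrative, not the actual definition:

```csharp
// A "reference wrapping" struct is a single pointer into arena-allocated
// memory, so it is unmanaged and copies like a reference, and the next-range
// link makes the ranges a singly linked list.
unsafe readonly struct LiveRangeRef
{
    private struct Data
    {
        public int Start;
        public int End;
        public Data* Next; // next range in the interval, or null
    }

    private readonly Data* _data;

    private LiveRangeRef(Data* data) => _data = data; // arena allocation elided

    public int Start => _data->Start;
    public int End => _data->End;

    // Overlap test by walking the list: a tight pointer-chasing loop with no
    // virtual dispatch, which plausibly explains why it beat binary search
    // over List<T> in the micro-benchmarks.
    public bool Overlaps(int start, int end)
    {
        for (Data* range = _data; range != null; range = range->Next)
        {
            if (range->Start < end && start < range->End)
            {
                return true;
            }
        }

        return false;
    }
}
```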
* Turn `LiveInterval` into an iterator
The LSRA allocates in forward order and never inspects previous
`LiveInterval`s once they have expired. Something similar can be done
for the `LiveRange`s within the `LiveInterval`s themselves.
`LiveInterval` is therefore turned into an iterator which expires the
`LiveRange`s within it. The iterator is moved forward along with the
interval walking code, i.e. `AllocateInterval(context, interval, cIndex)`.
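A rough sketch of the iterator idea under the same assumptions (illustrative names, arena allocation elided):

```csharp
// Instead of scanning all ranges on every query, keep a cursor that only
// moves forward, dropping ranges that end before the current allocation
// position; expired ranges are never visited again.
unsafe struct LiveIntervalCursor
{
    private struct Range { public int Start; public int End; public Range* Next; }

    private Range* _cursor;

    // Advanced along with the interval walking code.
    public void Forward(int position)
    {
        while (_cursor != null && _cursor->End <= position)
        {
            _cursor = _cursor->Next;
        }
    }

    // After Forward(position), the cursor is the first range ending after
    // position, so one comparison answers the overlap query.
    public bool Overlaps(int position)
    {
        return _cursor != null && _cursor->Start <= position && position < _cursor->End;
    }
}
```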
* Remove `LinearScanAllocator.Sources`
Local methods are less prone to allocations than lambdas.
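A generic illustration of the difference (not the removed code):

```csharp
using System.Collections.Generic;

static class Example
{
    public static int CountAbove(List<int> values, int threshold)
    {
        // A lambda that captures `threshold` forces a closure object and a
        // delegate allocation on every call:
        //     values.FindAll(x => x > threshold);

        // A local function capturing the same state is compiled to a method
        // taking a closure struct by ref; called directly, it allocates
        // nothing.
        bool IsAbove(int x) => x > threshold;

        int count = 0;

        foreach (int value in values)
        {
            if (IsAbove(value))
            {
                count++;
            }
        }

        return count;
    }
}
```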
* Optimize `GetOverlapPosition(interval)` a bit
Time complexity should now be O(n+m) instead of O(nm).
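A sketch of how an O(n+m) walk over two sorted range lists can work, assuming ranges sorted by start position; this illustrates the technique, not the actual implementation:

```csharp
using System;

static class OverlapSketch
{
    public const int NotFound = -1;

    // Both range lists are sorted by start position, so a single merge-style
    // pass finds the first overlap instead of testing every pair (O(nm)).
    public static int GetOverlapPosition(
        ReadOnlySpan<(int Start, int End)> a,
        ReadOnlySpan<(int Start, int End)> b)
    {
        int i = 0, j = 0;

        while (i < a.Length && j < b.Length)
        {
            if (a[i].Start < b[j].End && b[j].Start < a[i].End)
            {
                // First position covered by both intervals.
                return Math.Max(a[i].Start, b[j].Start);
            }

            // Advance whichever range ends first; it cannot overlap any
            // later range of the other list.
            if (a[i].End < b[j].End)
            {
                i++;
            }
            else
            {
                j++;
            }
        }

        return NotFound;
    }
}
```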
* Optimize `NumberLocals` a bit
Use the same idea as in `HybridAllocator` to store the visited state
in the MSB of the `Operand`'s value instead of using a `HashSet<T>`.
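The idea, sketched with a plain `ulong` standing in for the operand's value field:

```csharp
static class VisitedBitSketch
{
    // The top bit of the value carries the visited flag; the remaining bits
    // keep their original meaning, so no HashSet<T> allocation is needed.
    private const ulong VisitedFlag = 1ul << 63;

    public static bool IsVisited(ulong value) => (value & VisitedFlag) != 0;
    public static ulong SetVisited(ulong value) => value | VisitedFlag;
    public static ulong ClearVisited(ulong value) => value & ~VisitedFlag;
}
```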
* Optimize `InsertSplitCopies` a bit
Avoid allocating a redundant `CopyResolver`.
* Optimize `InsertSplitCopiesAtEdges` a bit
Avoid redundant allocations of `CopyResolver`.
* Use stack allocation for `freePositions`
Avoid redundant computations.
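A sketch of the stack-allocation pattern; `BestRegister` and the selection logic are illustrative only:

```csharp
using System;

static class StackAllocSketch
{
    public static int BestRegister(int registersCount)
    {
        // The register count is small and bounded, so the scratch array
        // lives on the stack instead of being heap-allocated per attempt.
        Span<int> freePositions = stackalloc int[registersCount];
        freePositions.Fill(int.MaxValue);

        // ... cap positions from active/inactive intervals here ...

        // Pick the register that stays free the longest.
        int best = 0;

        for (int reg = 1; reg < freePositions.Length; reg++)
        {
            if (freePositions[reg] > freePositions[best])
            {
                best = reg;
            }
        }

        return best;
    }
}
```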
* Add `UseList`
Replace `SortedIntegerList` with an even more specialized data
structure. It allocates memory on the arena allocators and does not
require copying use positions when splitting it.
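A hypothetical usage of the `UseList` shown in the diff at the bottom of this page, illustrating its descending-order storage:

```csharp
var uses = new UseList();

uses.Add(30);
uses.Add(10);
uses.Add(20);                   // stored as [30, 20, 10]

int first = uses.FirstUse;      // 10: the earliest use position
int next = uses.NextUse(15);    // 20: first use at or after position 15

UseList child = uses.Split(15); // child takes the uses >= 15: [30, 20];
                                // `uses` keeps the earlier ones: [10].
                                // No use positions are copied; both lists
                                // point into the same arena memory.
```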
* Turn `LiveInterval` into a reference struct
`LiveInterval` is now a reference wrapping struct like `Operand` and
`Operation`.
The rationale for turning this into a reference wrapping struct is that
a `LiveInterval` is associated with each local variable, and these
intervals may themselves be split further. I've seen translations with
up to 8,000 local variables.
To make the `LiveInterval` unmanaged, a new data structure called
`LiveIntervalList` was added to store child splits. This differs from
`SortedList<,>` because it can contain intervals with the same start
position.
Really wish we had more of C++'s templates in C#. :^(
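A sketch of the ordered insert that `SortedList<,>` cannot provide, since it rejects duplicate keys; names are illustrative:

```csharp
using System.Collections.Generic;

static class SplitListSketch
{
    // Child splits may share a start position, so all that is needed is an
    // ordered insert that tolerates duplicates.
    public static void InsertOrdered(
        List<(int Start, int Id)> splits, (int Start, int Id) child)
    {
        int i = splits.Count;

        // Walk backwards to keep the list ordered by Start; equal starts are
        // inserted after existing entries, preserving insertion order.
        while (i > 0 && splits[i - 1].Start > child.Start)
        {
            i--;
        }

        splits.Insert(i, child);
    }
}
```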
* Optimize `GetChildSplit` a bit
No need to inspect the remaining ranges if we've reached a range which
starts after `position`, since the split list is ordered.
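A sketch of the early exit, using tuples in place of the actual `LiveInterval` type:

```csharp
using System;

static class ChildSplitSketch
{
    // The split list is ordered by start position, so once a child starts
    // after `position`, no later child can contain it.
    public static int GetChildSplit(
        ReadOnlySpan<(int Start, int End)> splits, int position)
    {
        for (int i = 0; i < splits.Length; i++)
        {
            if (splits[i].Start > position)
            {
                break; // everything after this starts even later
            }

            if (position <= splits[i].End)
            {
                return i;
            }
        }

        return -1;
    }
}
```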
* Optimize `CopyResolver` a bit
Lazily allocate the fill, spill and parallel copy structures since most
of the time only one of them is needed.
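The lazy-initialization pattern, sketched with a hypothetical parallel-copy member only:

```csharp
using System.Collections.Generic;

class LazyResolverSketch
{
    private List<(int From, int To)> _parallelCopies;

    public bool HasCopy => _parallelCopies != null && _parallelCopies.Count != 0;

    public void AddParallelCopy(int from, int to)
    {
        // Allocate the list only when the first copy is actually recorded;
        // resolutions that never need it pay nothing.
        _parallelCopies ??= new List<(int From, int To)>();
        _parallelCopies.Add((from, to));
    }
}
```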
* Optimize `BitMap.Enumerator` a bit
Marking `MoveNext` as `AggressiveInlining` allows RyuJIT to promote the
`Enumerator` struct into registers completely, reducing load/store code
a lot since it does not have to store the struct on the stack for ABI
purposes.
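A minimal illustration of the pattern (a bit enumerator over a single `ulong`, not the actual `BitMap.Enumerator`):

```csharp
using System.Numerics;
using System.Runtime.CompilerServices;

struct BitEnumerator
{
    private ulong _bits;

    public int Current { get; private set; }

    public BitEnumerator(ulong bits)
    {
        _bits = bits;
        Current = -1;
    }

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public bool MoveNext()
    {
        if (_bits == 0)
        {
            return false;
        }

        // Inlining the body lets RyuJIT keep `_bits` and `Current` in
        // registers across the loop instead of spilling the struct to the
        // stack for the call's ABI.
        Current = BitOperations.TrailingZeroCount(_bits);
        _bits &= _bits - 1; // clear the lowest set bit

        return true;
    }
}
```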
* Use stack allocation for `use/blockedPositions`
* Optimize `AllocateWithSpill` a bit
* Address feedback
* Make `LiveInterval.AddRange(,)` more conservative
Produces no diff against master, but just for good measure.
Diffstat (limited to 'ARMeilleure/CodeGen/RegisterAllocators/UseList.cs')
| -rw-r--r-- | ARMeilleure/CodeGen/RegisterAllocators/UseList.cs | 84 |
1 file changed, 84 insertions, 0 deletions
diff --git a/ARMeilleure/CodeGen/RegisterAllocators/UseList.cs b/ARMeilleure/CodeGen/RegisterAllocators/UseList.cs
new file mode 100644
index 00000000..c89f0854
--- /dev/null
+++ b/ARMeilleure/CodeGen/RegisterAllocators/UseList.cs
@@ -0,0 +1,84 @@
+using System;
+
+namespace ARMeilleure.CodeGen.RegisterAllocators
+{
+    unsafe struct UseList
+    {
+        private int* _items;
+        private int _capacity;
+        private int _count;
+
+        public int Count => _count;
+        public int FirstUse => _count > 0 ? _items[_count - 1] : LiveInterval.NotFound;
+        public Span<int> Span => new(_items, _count);
+
+        public void Add(int position)
+        {
+            if (_count + 1 > _capacity)
+            {
+                var oldSpan = Span;
+
+                _capacity = Math.Max(4, _capacity * 2);
+                _items = Allocators.Default.Allocate<int>((uint)_capacity);
+
+                var newSpan = Span;
+
+                oldSpan.CopyTo(newSpan);
+            }
+
+            // Use positions are usually inserted in descending order, so inserting in descending order is faster,
+            // since the number of half exchanges is reduced.
+            int i = _count - 1;
+
+            while (i >= 0 && _items[i] < position)
+            {
+                _items[i + 1] = _items[i--];
+            }
+
+            _items[i + 1] = position;
+            _count++;
+        }
+
+        public int NextUse(int position)
+        {
+            int index = NextUseIndex(position);
+
+            return index != LiveInterval.NotFound ? _items[index] : LiveInterval.NotFound;
+        }
+
+        public int NextUseIndex(int position)
+        {
+            int i = _count - 1;
+
+            if (i == -1 || position > _items[0])
+            {
+                return LiveInterval.NotFound;
+            }
+
+            while (i >= 0 && _items[i] < position)
+            {
+                i--;
+            }
+
+            return i;
+        }
+
+        public UseList Split(int position)
+        {
+            int index = NextUseIndex(position);
+
+            // Since the list is in descending order, the new split list takes the front of the list and the current
+            // list takes the back of the list.
+            UseList result = new();
+            result._count = index + 1;
+            result._capacity = result._count;
+            result._items = _items;
+
+            _count = _count - result._count;
+            _capacity = _count;
+            _items = _items + result._count;
+
+            return result;
+        }
+    }
+}
\ No newline at end of file
