| Age | Commit message (Collapse) | Author |
|
* ARMeilleure/HardwareCapabilities: Add Sha
* ARMeilleure/Intrinsic: Add X86Sha256Rnds2
* ARmeilleure: Hardware accelerate SHA256H/SHA256H2
* ARMeilleure/Intrinsic: Add X86Sha256Msg1, X86Sha256Msg2
* ARMeilleure/Intrinsic: Add X86Palignr
* ARMeilleure: Hardware accelerate SHA256SU0, SHA256SU1
* PTC: Bump InternalVersion
|
|
* Implement some 32-bit Thumb instructions
* Optimize OpCode32MemMult using PopCount
|
|
* A few minor documentation fixes.
* Removed more invalid inheritdoc instances.
|
|
* Removed unused usings.
* Added back using, now that it's used.
* Removed extra whitespace.
|
|
IsVexSameOperandDestSrc1 (#3587)
|
|
* Implement Arm32 Sha256 and MRS Rd, CPSR instructions
* Add tests using Arm64 outputs
|
|
* Initial commit with a lot of testing stuff.
* Partial Unmap Cleanup Part 1
* Fix some minor issues, hopefully windows tests.
* Disable partial unmap tests on macos for now
Weird issue.
* Goodbye magic number
* Add COMPlus_EnableAlternateStackCheck for tests
`COMPlus_EnableAlternateStackCheck` is needed for NullReferenceException handling to work on linux after registering the signal handler, due to how dotnet registers its own signal handler.
* Address some feedback
* Force retry when memory is mapped in memory tracking
This case existed before, but returning `false` no longer retries, so it would crash immediately after unprotecting the memory... Now, we return `true` to deliberately retry.
This case existed before (was just broken by this change) and I don't really want to look into fixing the issue right now. Technically, this means that on guest code partial unmaps will retry _due to this_ rather than hitting the handler. I don't expect this to cause any issues.
This should fix random crashes in Xenoblade Chronicles 2.
* Use IsRangeMapped
* Suppress MockMemoryManager.UnmapEvent warning
This event is not signalled by the mock memory manager.
* Remove 4kb mapping
|
|
* Half <-> Double conversion support
* Add tests, fast path and deduplicate SoftFloat code
* PPTC version
|
|
|
|
(#3362)
* Refactor CPU interface
* Use IExecutionContext interface on SVC handler, change how CPU interrupts invokes the handlers
* Make CpuEngine take a ITickSource rather than returning one
The previous implementation had the scenario where the CPU engine had to implement the tick source in mind, like for example, when we have a hypervisor and the game can read CNTPCT on the host directly. However given that we need to do conversion due to different frequencies anyway, it's not worth it. It's better to just let the user pass the tick source and redirect any reads to CNTPCT to the user tick source
* XML docs for the public interfaces
* PPTC invalidation due to NativeInterface function name changes
* Fix build of the CPU tests
* PR feedback
|
|
* Back to the origins: Make memory manager take guest PA rather than host address once again
* Direct mapping with alias support on Windows
* Fixes and remove more of the emulated shared memory
* Linux support
* Make shared and transfer memory not depend on SharedMemoryStorage
* More efficient view mapping on Windows (no more restricted to 4KB pages at a time)
* Handle potential access violations caused by partial unmap
* Implement host mapping using shared memory on Linux
* Add new GetPhysicalAddressChecked method, used to ensure the virtual address is mapped before address translation
Also align GetRef behaviour with software memory manager
* We don't need a mirrorable memory block for software memory manager mode
* Disable memory aliasing tests while we don't have shared memory support on Mac
* Shared memory & SIGBUS handler for macOS
* Fix typo + nits + re-enable memory tests
* Set MAP_JIT_DARWIN on x86 Mac too
* Add back the address space mirror
* Only set MAP_JIT_DARWIN if we are mapping as executable
* Disable aliasing tests again (still fails on Mac)
* Fix UnmapView4KB (by not casting size to int)
* Use ref counting on memory blocks to delay closing the shared memory handle until all blocks using it are disposed
* Address PR feedback
* Make RO hold a reference to the guest process memory manager to avoid early disposal
Co-authored-by: nastys <nastys@users.noreply.github.com>
|
|
* T32: Implement load/store single (immediate)
* tests
* tidy formatting
* address comments
|
|
* Fix tail merge from block with conditional jump to multiple returns
* PPTC version bump
|
|
* InstEmitMemoryEx: Barrier after write on ordered store
* increment ptc version
* 32
|
|
* ExecutionContext: GetPstate / SetPstate
* Put it in NativeContext
* KThread: Fix GetPsr mask
* ExecutionContext: Turn methods into Pstate property
* Address nit
|
|
* T32: Implement Data Processing (Modified Immediate) instructions
* Update tests
* switch -> lookup table
|
|
* Tests: Add A32 tests for immediate ADC/ADCS/RSC/RSCS/SBC/SBCS
* A32: Fix bug in ADC/ADCS/RSC/RSCS/SBC/SBCS
* CpuTestAluImm32: Add more opcodes
* Increment PTC version
|
|
|
|
instruction (#3153)
* Decoder: Exit on trapping instructions, and resume execution at trapping instruction
* Resume at trapping address
* remove mustExit
|
|
* Decoders: Make IsThumb a function of OpCode32
* OpCode32: Fix GetPc
* T32: Implement B, B.cond, BL, BLX
* rm usings
|
|
* T32: Implement ADC, ADD, AND, BIC, CMN, CMP, EOR, MOV, MVN, ORN, ORR, RSB, SBC, SUB, TEQ, TST (shifted register)
* OpCodeTable: Sort T32 list
* Tests: Rename RandomTestCase to PrecomputedThumbTestCase
* T32: Tests for AluRsImm instructions
* fix nit
* fix nit 2
|
|
* Decoder: Implement SingleInstruction decoder mode
* Translator: Implement Step
* DecoderMode: Rename Normal to MultipleBlocks
|
|
|
|
* Collapse AsSpan().Slice(..) calls into AsSpan(..)
Less code and a bit faster
* Collapse an Array.Clear(array, 0, array.Length) call to Array.Clear(array)
|
|
|
|
* Enable CPU JIT cache invalidation
* Invalidate cache on IC IVAU
|
|
|
|
* Decoders: Add InITBlock argument
* OpCodeTable: Minor cleanup
* OpCodeTable: Remove existing thumb instruction implementations
* OpCodeTable: Prepare for thumb instructions
* OpCodeTables: Improve thumb fast lookup
* Tests: Prepare for thumb tests
* T16: Implement BX
* T16: Implement LSL/LSR/ASR (imm)
* T16: Implement ADDS, SUBS (reg)
* T16: Implement ADDS, SUBS (3-bit immediate)
* T16: Implement MOVS, CMP, ADDS, SUBS (8-bit immediate)
* T16: Implement ANDS, EORS, LSLS, LSRS, ASRS, ADCS, SBCS, RORS, TST, NEGS, CMP, CMN, ORRS, MULS, BICS, MVNS (low registers)
* T16: Implement ADD, CMP, MOV (high reg)
* T16: Implement BLX (reg)
* T16: Implement LDR (literal)
* T16: Implement {LDR,STR}{,H,B,SB,SH} (register)
* T16: Implement {LDR,STR}{,B,H} (immediate)
* T16: Implement LDR/STR (SP)
* T16: Implement ADR
* T16: Implement Add to SP (immediate)
* T16: Implement ADD/SUB (SP)
* T16: Implement SXTH, SXTB, UXTH, UTXB
* T16: Implement CBZ, CBNZ
* T16: Implement PUSH, POP
* T16: Implement REV, REV16, REVSH
* T16: Implement NOP
* T16: Implement LDM, STM
* T16: Implement SVC
* T16: Implement B (conditional)
* T16: Implement B (unconditional)
* T16: Implement IT
* fixup! T16: Implement ADD/SUB (SP)
* fixup! T16: Implement Add to SP (immediate)
* fixup! T16: Implement IT
* CpuTestThumb: Add randomized tests
* Remove inITBlock argument
* Address nits
* Use index to handle IfThenBlockState
* Reduce line noise
* fixup
* nit
|
|
|
|
|
|
|
|
* ARMeilleure: A32: Implement UHSUB8
* ARMeilleure: A32: Implement SHSUB8
|
|
|
|
|
|
* Fix small precision error on CPU reciprocal estimate instructions
* PPTC version bump
|
|
* Fix calls passing V128 values on Linux
* PPTC version bump
|
|
* Add host CPU memory barriers for DMB/DSB and ordered load/store
* PPTC version bump
* Revert to old barrier order
|
|
* Implement FCVTNS (Scalar GP)
* Update Ptc Version
|
|
|
|
* Add FCVTMS_V Implementation to Armeilleure
* Fix opcode designation
* Add tests
* Amend Ptc version
* Fix OpCode / Tests
* Create Math.Floor helper method + Update implementation
* Address gdk comments
* Re-address gdk comments
* Update ARMeilleure/Decoders/OpCodeTable.cs
Co-authored-by: gdkchan <gab.dark.100@gmail.com>
* Update Tests to use 2S (4S) and 2D
Co-authored-by: gdkchan <gab.dark.100@gmail.com>
|
|
|
|
* Remove usage of Mono.Posix.NETStandard in Ryujinx project
* Remove usage of Mono.Posix.NETStandard in ARMeilleure project
* Remove usage of Mono.Posix.NETStandard in Ryujinx.Memory project
* Address gdkchan's comments
|
|
* Implement UHADD8 instruction along with a test unit
* Update PTC revision number
|
|
Very basic migration across the codebase.
|
|
* infra: Migrate to .NET 6
* Rollback version naming change
* Workaround .NET 6 ZipArchive API issues
* ci: Switch to VS 2022 for AppVeyor
CI is now ready for .NET 6
* Suppress WebClient warning in DoUpdateWithMultipleThreads
* Attempt to workaround System.Drawing.Common changes on 6.0.0
* Change keyboard rendering from System.Drawing to ImageSharp
* Make the software keyboard renderer multithreaded
* Bump ImageSharp version to 1.0.4 to fix a bug in Image.Load
* Add fallback fonts to the keyboard renderer
* Fix warnings
* Address caian's comment
* Clean up linux workaround as it's uneeded now
* Update readme
Co-authored-by: Caian Benedicto <caianbene@gmail.com>
|
|
* Add an early `TailMerge` pass
Some translations can have a lot of guest calls and since for each guest
call there is a call guard which may return. This can produce a lot of
epilogue code for returns. This pass merges the epilogue into a single
block.
```
Using filter 'hcq'.
Using metric 'code size'.
Total diff: -1648111 (-7.19 %) (bytes):
Base: 22913847
Diff: 21265736
Improved: 4567, regressed: 14, unchanged: 144
```
* Set PTC version
* Address feedback
* Handle `void` returning functions
* Actually handle `void` returning functions
* Fix `RegisterToLocal` logging
|
|
* Optimize `TryAllocateRegWithtoutSpill` a bit
* Add a fast path for when all registers are live.
* Do not query `GetOverlapPosition` if the register is already in use
(i.e: free position is 0).
* Do not allocate child split list if not parent
* Turn `LiveRange` into a reference struct
`LiveRange` is now a reference wrapping struct like `Operand` and
`Operation`.
It has also been changed into a singly linked-list. In micro-benchmarks
traversing the linked-list was faster than binary search on `List<T>`.
Even for quite large input sizes (e.g: 1,000,000), surprisingly.
Could be because the code gen for traversing the linked-list is much
much cleaner and there is no virtual dispatch happening when checking if
intervals overlaps.
* Turn `LiveInterval` into an iterator
The LSRA allocates in forward order and never inspect previous
`LiveInterval` once they are expired. Something similar can be done for
the `LiveRange`s within the `LiveInterval`s themselves.
The `LiveInterval` is turned into a iterator which expires `LiveRange`
within it. The iterator is moved forward along with interval walking
code, i.e: AllocateInterval(context, interval, cIndex).
* Remove `LinearScanAllocator.Sources`
Local methods are less susceptible to do allocations than lambdas.
* Optimize `GetOverlapPosition(interval)` a bit
Time complexity should be in O(n+m) instead of O(nm) now.
* Optimize `NumberLocals` a bit
Use the same idea as in `HybridAllocator` to store the visited state
in the MSB of the Operand's value instead of using a `HashSet<T>`.
* Optimize `InsertSplitCopies` a bit
Avoid allocating a redundant `CopyResolver`.
* Optimize `InsertSplitCopiesAtEdges` a bit
Avoid redundant allocations of `CopyResolver`.
* Use stack allocation for `freePositions`
Avoid redundant computations.
* Add `UseList`
Replace `SortedIntegerList` with an even more specialized data
structure. It allocates memory on the arena allocators and does not
require copying use positions when splitting it.
* Turn `LiveInterval` into a reference struct
`LiveInterval` is now a reference wrapping struct like `Operand` and
`Operation`.
The rationale behind turning this in a reference wrapping struct is
because a `LiveInterval` is associated with each local variable, and
these intervals may themselves be split further. I've seen translations
having up to 8000 local variables.
To make the `LiveInterval` unmanaged, a new data structure called
`LiveIntervalList` was added to store child splits. This differs from
`SortedList<,>` because it can contain intervals with the same start
position.
Really wished we got some more of C++ template in C#. :^(
* Optimize `GetChildSplit` a bit
No need to inspect the remaining ranges if we've reached a range which
starts after position, since the split list is ordered.
* Optimize `CopyResolver` a bit
Lazily allocate the fill, spill and parallel copy structures since most
of the time only one of them is needed.
* Optimize `BitMap.Enumerator` a bit
Marking `MoveNext` as `AggressiveInlining` allows RyuJIT to promote the
`Enumerator` struct into registers completely, reducing load/store code
a lot since it does not have to store the struct on the stack for ABI
purposes.
* Use stack allocation for `use/blockedPositions`
* Optimize `AllocateWithSpill` a bit
* Address feedback
* Make `LiveInterval.AddRange(,)` more conservative
Produces no diff against master, but just for good measure.
|
|
* Add `Operand.Label` support to `Assembler`
This adds label support to `Assembler` and enables branch tightening
when compiling with relocatables. Jump management and patching has been
moved to the `Assembler`.
* Move instruction table to `Assembler.Table`
* Set PTC internal version
* Rename `Assembler.Table` to `AssemblerTable`
|
|
* Replace CacheResourceWrite with more general "precise" write
The goal of CacheResourceWrite was to notify GPU resources when they were modified directly, by looking up the modified address/size in a structure and calling a method on each resource. The downside of this is that each resource cache has to be queried individually, they all have to implement their own way to do this, and it can only signal to resources using the same PhysicalMemory instance.
This PR adds the ability to signal a write as "precise" on the tracking, which signals a special handler (if present) which can be used to avoid unnecessary flush actions, or maybe even more. For buffers, precise writes specifically do not flush, and instead punch a hole in the modified range list to indicate that the data on GPU has been replaced.
The downside is that precise actions must ignore the page protection bits and always signal - as they need to notify the target resource to ignore the sequence number optimization.
I had to reintroduce the sequence number increment after I2M, as removing it was causing issues in rabbids kingdom battle. However - all resources modified by I2M are notified directly to lower their sequence number, so the problem is likely that another unrelated resource is not being properly updated. Thankfully, doing this does not affect performance in the games I tested.
This should fix regressions from #2624. Test any games that were broken by that. (RF4, rabbids kingdom battle)
I've also added a sequence number increment to ThreedClass.IncrementSyncpoint, as it seems to fix buffer corruption in OpenGL homebrew. (this was a regression from removing sequence number increment from constant buffer update - another unrelated resource thing)
* Add tests.
* Add XML docs for GpuRegionHandle
* Skip UpdateProtection if only precise actions were called
This allows precise actions to skip reprotection costs.
|
|
* Store constant `Operand`s in the `LocalInfo`
Since the spill slot and register assigned is fixed, we can just store
the `Operand` reference in the `LocalInfo` struct. This allows skipping
hitting the intern-table for a look up.
* Skip `Uses`/`Assignments` management
Since the `HybridAllocator` is the last pass and we do not care about
uses/assignments we can skip managing that when setting destinations or
sources.
* Make `GetLocalInfo` inlineable
Also fix a possible issue where with numbered locals. See or-assignment
operator in `SetVisited(local)` before patch.
* Do not run `BlockPlacement` in LCQ
With the host mapped memory manager, there is a lot less cold code to
split from hot code. So disabling this in LCQ gives some extra
throughput - where we need it.
* Address Mou-Ikkai's feedback
* Apply suggestions from code review
Co-authored-by: VocalFan <45863583+Mou-Ikkai@users.noreply.github.com>
* Move check to an assert
Co-authored-by: VocalFan <45863583+Mou-Ikkai@users.noreply.github.com>
|