aboutsummaryrefslogtreecommitdiff
path: root/ARMeilleure/Instructions
AgeCommit message (Collapse)Author
2021-09-29Use normal memory store path for DC ZVA (#2693)riperiperi
Seems like this is used as an optimized way to clear memory in homebrew applications. Unfortunately, calling the software fallback method every 8 bytes was not very optimal. The existing EmitStore is used by passing in ZR as the register to get a 0 write.
2021-09-14Refactor `PtcInfo` (#2625)FICTURE7
* Refactor `PtcInfo` This change reduces the coupling of `PtcInfo` by moving relocation tracking to the backend. `RelocEntry`s remains as `RelocEntry`s through out the pipeline until it actually needs to be written to the PTC streams. Keeping this representation makes inspecting and manipulating relocations after compilations less painful. This is something I needed to do to patch relocations to 0 to diff dumps. Contributes to #1125. * Turn `Symbol` & `RelocInfo` into readonly structs * Add documentation to `CompiledFunction` * Remove `Compiler.Compile<T>` Remove `Compiler.Compile<T>` and replace it by `Map<T>` of the `CompiledFunction` returned.
2021-08-27Implement MSR instruction for A32 (#2585)Mary
* Implement MSR instruction Fix #1342. Now Pocket Rumble is playable. * Address gdkchan's comments * Address gdkchan's comments * Address gdkchan's comment
2021-08-17Reduce JIT GC allocations (#2515)FICTURE7
* Turn `MemoryOperand` into a struct * Remove `IntrinsicOperation` * Remove `PhiNode` * Remove `Node` * Turn `Operand` into a struct * Turn `Operation` into a struct * Clean up pool management methods * Add `Arena` allocator * Move `OperationHelper` to `Operation.Factory` * Move `OperandHelper` to `Operand.Factory` * Optimize `Operation` a bit * Fix `Arena` initialization * Rename `NativeList<T>` to `ArenaList<T>` * Reduce `Operand` size from 88 to 56 bytes * Reduce `Operation` size from 56 to 40 bytes * Add optimistic interning of Register & Constant operands * Optimize `RegisterUsage` pass a bit * Optimize `RemoveUnusedNodes` pass a bit Iterating in reverse-order allows killing dependency chains in a single pass. * Fix PPTC symbols * Optimize `BasicBlock` a bit Reduce allocations from `_successor` & `DominanceFrontiers` * Fix `Operation` resize * Make `Arena` expandable Change the arena allocator to be expandable by allocating in pages, with some of them being pooled. Currently 32 pages are pooled. An LRU removal mechanism should probably be added to it. Apparently MHR can allocate bitmaps large enough to exceed the 16MB limit for the type. * Move `Arena` & `ArenaList` to `Common` * Remove `ThreadStaticPool` & co * Add `PhiOperation` * Reduce `Operand` size from 56 from 48 bytes * Add linear-probing to `Operand` intern table * Optimize `HybridAllocator` a bit * Add `Allocators` class * Tune `ArenaAllocator` sizes * Add page removal mechanism to `ArenaAllocator` Remove pages which have not been used for more than 5s after each reset. I am on fence if this would be better using a Gen2 callback object like the one in System.Buffers.ArrayPool<T>, to trim the pool. Because right now if a large translation happens, the pages will be freed only after a reset. This reset may not happen for a while because no new translation is hit, but the arena base sizes are rather small. * Fix `OOM` when allocating larger than page size in `ArenaAllocator` Tweak resizing mechanism for Operand.Uses and Assignemnts. * Optimize `Optimizer` a bit * Optimize `Operand.Add<T>/Remove<T>` a bit * Clean up `PreAllocator` * Fix phi insertion order Reduce codegen diffs. * Fix code alignment * Use new heuristics for degree of parallelism * Suppress warnings * Address gdkchan's feedback Renamed `GetValue()` to `GetValueUnsafe()` to make it more clear that `Operand.Value` should usually not be modified directly. * Add fast path to `ArenaAllocator` * Assembly for `ArenaAllocator.Allocate(ulong)`: .L0: mov rax, [rcx+0x18] lea r8, [rax+rdx] cmp r8, [rcx+0x10] ja short .L2 .L1: mov rdx, [rcx+8] add rax, [rdx+8] mov [rcx+0x18], r8 ret .L2: jmp ArenaAllocator.AllocateSlow(UInt64) A few variable/field had to be changed to ulong so that RyuJIT avoids emitting zero-extends. * Implement a new heuristic to free pooled pages. If an arena is used often, it is more likely that its pages will be needed, so the pages are kept for longer (e.g: during PPTC rebuild or burst sof compilations). If is not used often, then it is more likely that its pages will not be needed (e.g: after PPTC rebuild or bursts of compilations). * Address riperiperi's feedback * Use `EqualityComparer<T>` in `IntrusiveList<T>` Avoids a potential GC hole in `Equals(T, T)`.
2021-06-23Implement VORN (register) Arm32 instruction (#2396)gdkchan
2021-05-29Add multi-level function table (#2228)FICTURE7
* Add AddressTable<T> * Use AddressTable<T> for dispatch * Remove JumpTable & co. * Add fallback for out of range addresses * Add PPTC support * Add documentation to `AddressTable<T>` * Make AddressTable<T> configurable * Fix table walk * Fix IsMapped check * Remove CountTableCapacity * Add PPTC support for fast path * Rename IsMapped to IsValid * Remove stale comment * Change format of address in exception message * Add TranslatorStubs * Split DispatchStub Avoids recompilation of stubs during tests. * Add hint for 64bit or 32bit * Add documentation to `Symbol` * Add documentation to `TranslatorStubs` Make `TranslatorStubs` disposable as well. * Add documentation to `SymbolType` * Add `AddressTableEventSource` to monitor function table size Add an EventSource which measures the amount of unmanaged bytes allocated by AddressTable<T> instances. dotnet-counters monitor -n Ryujinx --counters ARMeilleure * Add `AllowLcqInFunctionTable` optimization toggle This is to reduce the impact this change has on the test duration. Before everytime a test was ran, the FunctionTable would be initialized and populated so that the newly compiled test would get registered to it. * Implement unmanaged dispatcher Uses the DispatchStub to dispatch into the next translation, which allows execution to stay in unmanaged for longer and skips a ConcurrentDictionary look up when the target translation has been registered to the FunctionTable. * Remove redundant null check * Tune levels of FunctionTable Uses 5 levels instead of 4 and change unit of AddressTableEventSource from KB to MB. * Use 64-bit function table Improves codegen for direct branches: mov qword [rax+0x408],0x10603560 - mov rcx,sub_10603560_OFFSET - mov ecx,[rcx] - mov ecx,ecx - mov rdx,JIT_CACHE_BASE - add rdx,rcx + mov rcx,sub_10603560 + mov rdx,[rcx] mov rcx,rax Improves codegen for dispatch stub: and rax,byte +0x1f - mov eax,[rcx+rax*4] - mov eax,eax - mov rcx,JIT_CACHE_BASE - lea rax,[rcx+rax] + mov rax,[rcx+rax*8] mov rcx,rbx * Remove `JitCacheSymbol` & `JitCache.Offset` * Turn `Translator.Translate` into an instance method We do not have to add more parameter to this method and related ones as new structures are added & needed for translation. * Add symbol only when PTC is enabled Address LDj3SNuD's feedback * Change `NativeContext.Running` to a 32-bit integer * Fix PageTable symbol for host mapped
2021-05-24POWER - Performance Optimizations With Extensive Ramifications (#2286)riperiperi
* Refactoring of KMemoryManager class * Replace some trivial uses of DRAM address with VA * Get rid of GetDramAddressFromVa * Abstracting more operations on derived page table class * Run auto-format on KPageTableBase * Managed to make TryConvertVaToPa private, few uses remains now * Implement guest physical pages ref counting, remove manual freeing * Make DoMmuOperation private and call new abstract methods only from the base class * Pass pages count rather than size on Map/UnmapMemory * Change memory managers to take host pointers * Fix a guest memory leak and simplify KPageTable * Expose new methods for host range query and mapping * Some refactoring of MapPagesFromClientProcess to allow proper page ref counting and mapping without KPageLists * Remove more uses of AddVaRangeToPageList, now only one remains (shared memory page checking) * Add a SharedMemoryStorage class, will be useful for host mapping * Sayonara AddVaRangeToPageList, you served us well * Start to implement host memory mapping (WIP) * Support memory tracking through host exception handling * Fix some access violations from HLE service guest memory access and CPU * Fix memory tracking * Fix mapping list bugs, including a race and a error adding mapping ranges * Simple page table for memory tracking * Simple "volatile" region handle mode * Update UBOs directly (experimental, rough) * Fix the overlap check * Only set non-modified buffers as volatile * Fix some memory tracking issues * Fix possible race in MapBufferFromClientProcess (block list updates were not locked) * Write uniform update to memory immediately, only defer the buffer set. * Fix some memory tracking issues * Pass correct pages count on shared memory unmap * Armeilleure Signal Handler v1 + Unix changes Unix currently behaves like windows, rather than remapping physical * Actually check if the host platform is unix * Fix decommit on linux. * Implement windows 10 placeholder shared memory, fix a buffer issue. * Make PTC version something that will never match with master * Remove testing variable for block count * Add reference count for memory manager, fix dispose Can still deadlock with OpenAL * Add address validation, use page table for mapped check, add docs Might clean up the page table traversing routines. * Implement batched mapping/tracking. * Move documentation, fix tests. * Cleanup uniform buffer update stuff. * Remove unnecessary assignment. * Add unsafe host mapped memory switch On by default. Would be good to turn this off for untrusted code (homebrew, exefs mods) and give the user the option to turn it on manually, though that requires some UI work. * Remove C# exception handlers They have issues due to current .NET limitations, so the meilleure one fully replaces them for now. * Fix MapPhysicalMemory on the software MemoryManager. * Null check for GetHostAddress, docs * Add configuration for setting memory manager mode (not in UI yet) * Add config to UI * Fix type mismatch on Unix signal handler code emit * Fix 6GB DRAM mode. The size can be greater than `uint.MaxValue` when the DRAM is >4GB. * Address some feedback. * More detailed error if backing memory cannot be mapped. * SetLastError on all OS functions for consistency * Force pages dirty with UBO update instead of setting them directly. Seems to be much faster across a few games. Need retesting. * Rebase, configuration rework, fix mem tracking regression * Fix race in FreePages * Set memory managers null after decrementing ref count * Remove readonly keyword, as this is now modified. * Use a local variable for the signal handler rather than a register. * Fix bug with buffer resize, and index/uniform buffer binding. Should fix flickering in games. * Add InvalidAccessHandler to MemoryTracking Doesn't do anything yet * Call invalid access handler on unmapped read/write. Same rules as the regular memory manager. * Make unsafe mapped memory its own MemoryManagerType * Move FlushUboDirty into UpdateState. * Buffer dirty cache, rather than ubo cache Much cleaner, may be reusable for Inline2Memory updates. * This doesn't return anything anymore. * Add sigaction remove methods, correct a few function signatures. * Return empty list of physical regions for size 0. * Also on AddressSpaceManager Co-authored-by: gdkchan <gab.dark.100@gmail.com>
2021-05-24Improve accuracy of reciprocal step instructions (#2305)gdkchan
* Improve accuracy of reciprocal step instructions * Fix small mistake on RECPE rounding, nits, PTC version bump
2021-05-20Use branch instead of tailcall for recursive calls (#2282)FICTURE7
* Use branch instead of tailcall for recursive calls Use a branch instead of doing a tailcall for recursive calls. This avoids having to store the dispatch address, setting up the epilogue and keeps guest registers in host registers for longer. The rejit check is moved down into the entry block so that the rejit behaviour remains the same as before. * Set PTC version Co-authored-by: gdkchan <gab.dark.100@gmail.com>
2021-05-20Add `BIC/ORR Vd.T, #imm` fast path (#2279)FICTURE7
* Add fast path for BIC Vd.T, #imm * Add fast path for ORR Vd.T, #imm * Set PTC version * Fixup Exception to InvalidOperationException
2021-05-13Fold constant offsets and group constant addresses (#2285)gdkchan
* Fold constant offsets and group constant addresses * PPTC version bump
2021-04-18Add inlined on translation call counting (#2190)FICTURE7
* Add EntryTable<TEntry> * Add on translation call counting * Add Counter * Add PPTC support * Make Counter a generic & use a 32-bit counter instead * Return false on overflow * Set PPTC version * Print more information about the rejit queue * Make Counter<T> disposable * Remove Block.TailCall since it is not used anymore * Apply suggestions from code review Address gdkchan's feedback Co-authored-by: gdkchan <gab.dark.100@gmail.com> * Fix more stale docs * Remove rejit requests queue logging * Make Counter<T> finalizable Most certainly quite an odd use case. * Make EntryTable<T>.TryAllocate set entry to default * Re-trigger CI * Dispose Counters before they hit the finalizer queue * Re-trigger CI Just for good measure... * Make EntryTable<T> expandable * EntryTable is now expandable instead of being a fixed slab. * Remove EntryTable<T>.TryAllocate * Remove Counter<T>.TryCreate Address LDj3SNuD's feedback * Apply suggestions from code review Address LDj3SNuD's feedback Co-authored-by: LDj3SNuD <35856442+LDj3SNuD@users.noreply.github.com> * Remove useless return * POH approach, but the sequel * Revert "POH approach, but the sequel" This reverts commit 5f5abaa24735726ff2db367dc74f98055d4f4cba. The sequel got shelved * Add extra documentation Co-authored-by: gdkchan <gab.dark.100@gmail.com> Co-authored-by: LDj3SNuD <35856442+LDj3SNuD@users.noreply.github.com>
2021-04-02Improve `StoreToContext` emission (#2155)FICTURE7
* Improve StoreToContext emission Hoist StoreToContext in dynamic branch fast & slow paths out into their predecessor. Reduces register pressure, code size and compile time because we're throwing less stuff down the pipeline. * Set PTC internal version * Turn EmitDynamicTableCall private * Re-trigger CI
2021-03-25Add Sqdmulh_Ve & Sqrdmulh_Ve Inst.s with Tests. (#2139)LDj3SNuD
2021-02-22Implement VCNT instruction (#1963)mageven
* Implement VCNT based on AArch64 CNT Add tests * Update PTC version * Address LDj's comments * Explicit size in encoding * Tighter tests * Replace SoftFallback with IR helper Co-authored-by: LDj3SNuD <35856442+LDj3SNuD@users.noreply.github.com> * Reduce one BitwiseAnd from IR fallback Based on popcount64b from https://en.wikipedia.org/wiki/Hamming_weight#Efficient_implementation * Rename parameter and add assert Co-authored-by: LDj3SNuD <35856442+LDj3SNuD@users.noreply.github.com> Co-authored-by: LDj3SNuD <35856442+LDj3SNuD@users.noreply.github.com>
2021-02-17Fix memory tracking performance regression (#2026)gdkchan
* Fix memory tracking performance regression * Set PTC version
2021-02-16Validate CPU virtual addresses on access (#1987)gdkchan
* Enable PTE null checks again * Do address validation on EmitPtPointerLoad, and make it branchless * PTC version increment * Mask of pointer tag for exclusive access * Move mask to the correct place Co-authored-by: LDj3SNuD <35856442+LDj3SNuD@users.noreply.github.com>
2021-01-28Lower precision of estimate instruction results to match Arm behavior (#1943)gdkchan
* Lower precision of estimate instruction results to match Arm behavior * PTC version update * Nits
2021-01-26Implement PRFM (register variant) as NOP (#1956)mageven
* Implement PRFM (register variant) as NOP Fix typo pfrm -> prfm Add comments to distinguish variants * Increment PTC version
2021-01-25Add VCLZ.* fast path (#1917)FICTURE7
* Add VCLZ fast path * Add VCLZ.8B/16B SSSE3 fast path * Add VCLZ.4H/8H SSSE3 fast path * Add VCLZ.2S/4S SSE2 fast path * Improve CLZ.4H/8H fast path * Improve CLZ.2S/4S fast path * Set PPTC version
2021-01-20CPU (A64): Add Fmaxnmp & Fminnmp Scalar Inst.s, Fast & Slow Paths; with ↵LDj3SNuD
Tests. (#1894)
2021-01-04CPU (A64): Add Pmull_V Inst. with Clmul fast path for the "1/2D -> 1Q" ↵LDj3SNuD
variant & Sse fast path and slow path for both the "8/16B -> 8H" and "1/2D -> 1Q" variants; with Test. (#1817) * Add Pmull_V Sse fast path only, both "8/16B -> 8H" and "1/2D -> 1Q" variants; with Test. * Add Clmul fast path for the 128 bits variant. * Small optimisation (save 60 instructions) for the Sse fast path about the 128 bits variant. * Add slow path, both variants. Fix V128 Shl/Shr when shift = 0. * A32: Add Vmull_I P64 variant (slow path); not tested. * A32: Add Vmull_I_P8_P64 Test and fix P64 variant.
2020-12-17Fix Vnmls_S fast path (F64: losing input d value). Fix Vnmla_S & Vnmls_S ↵LDj3SNuD
slow paths (using fused inst.s). Fix Vfma_V slow path not using StandardFPSCRValue(). (#1775) * Fix Vnmls_S fast path (F64: losing input d value). Fix Vnmla_S & Vnmls_S slow paths (using fused inst.s). Add Vfma_S & Vfms_S Fma fast paths. Add Vfnma_S inst. with Fma/Sse fast paths and slow path. Add Vfnms_S Sse fast path. Add Tests for affected inst.s. Nits. * InternalVersion = 1775 * Nits. * Fix Vfma_V slow path not using StandardFPSCRValue(). * Nit: Fix Vfma_V order. * Add Vfms_V Sse fast path and slow path. * Add Vfma_V and Vfms_V Test.
2020-12-17PPTC Follow-up. (#1712)LDj3SNuD
* Added support for offline invalidation, via PPTC, of low cq translations replaced by high cq translations; both on a single run and between runs. Added invalidation of .cache files in the event of reuse on a different user operating system. Added .info and .cache files invalidation in case of a failed stream decompression. Nits. * InternalVersion = 1712; * Nits. * Address comment. * Get rid of BinaryFormatter. Nits. * Move Ptc.LoadTranslations(). Nits. * Nits. * Fixed corner cases (in case backup copies have to be used). Added save logs. * Not core fixes. * Complement to the previous commit. Added load logs. Removed BinaryFormatter leftovers. * Add LoadTranslations log. * Nits. * Removed the search and management of LowCq overlapping functions. * Final increment of .info and .cache flags. * Nit. * GetIndirectFunctionAddress(): Validate that writing actually takes place in dynamic table memory range (and not elsewhere). * Fix Ptc.UpdateInfo() due to rebase. * Nit for retrigger Checks. * Nit for retrigger Checks.
2020-12-16CPU: Implement VRINTX.F32 | VRINTX.F64 (#1776)sharmander
* Start implementation * Draft * Updated opcode. Needs verification. * Clean up code. * Update implementation and tests. * Update implemenation + tests * Get RM from FPSCR + Do not use emit/addintrinsic * Remove "fast" path, as recommended by gdk. * Variable DELETED. * Update ARMeilleure/Decoders/OpCodeTable.cs Co-authored-by: LDj3SNuD <35856442+LDj3SNuD@users.noreply.github.com> * Update ARMeilleure/Instructions/InstEmitSimdCvt32.cs Co-authored-by: LDj3SNuD <35856442+LDj3SNuD@users.noreply.github.com> * Update ARMeilleure/Instructions/InstEmitSimdCvt32.cs Co-authored-by: LDj3SNuD <35856442+LDj3SNuD@users.noreply.github.com> * Update ARMeilleure/Instructions/InstEmitSimdCvt32.cs Co-authored-by: LDj3SNuD <35856442+LDj3SNuD@users.noreply.github.com> * Move method * stringing things together :) Co-authored-by: LDj3SNuD <35856442+LDj3SNuD@users.noreply.github.com>
2020-12-16Clear JIT cache on exit (#1518)gdkchan
* Initial cache memory allocator implementation * Get rid of CallFlag * Perform cache cleanup on exit * Basic cache invalidation * Thats not how conditionals works in C# it seems * Set PTC version to PR number * Address PR feedback * Update InstEmitFlowHelper.cs * Flag clear on address is no longer needed * Do not include exit block in function size calculation * Dispose jump table * For future use * InternalVersion = 1519 (force retest). Co-authored-by: LDj3SNuD <35856442+LDj3SNuD@users.noreply.github.com>
2020-12-15CPU: Implement VFMA (Vector) (#1762)sharmander
* Implement VFMA.F64 * Simplify switch * Simplify FMA Instructions into their own IntrinsicType. * Remove whitespace * Fix indentation * Change tests for Vfnms -- disable inf / nan * Move args up, not description ;) * Implementation Complete. All Tests Pass (Slow / Fast Path) * Move location of function in assembler + test updates. * Shift params upwards * Remove unused function * Update PTC version. * Add comments / re-oreder opcode table. * Remove whitespace * Fix nit * Fix nit. * Fix whitespace * Wrong opcode was used by a bad merge. * Addressed rip's comments.
2020-12-13Fix register read after write on STREX implementation (#1801)gdkchan
* Fix register read after write on STREX implementation * PTC version update
2020-12-07CPU: Implement VFNMA.F32 | F.64 (#1783)sharmander
* Implement VFNMA.F<32/64> * Update PTC Version * Update Implementation & Renames & Correct Order * Fix alignment * Update implementation to not trigger assert * Actually use the intrinsic that makes sense :)
2020-12-07Add support for guest Fz (Fpcr) mode through host Ftz and Daz (Mxcsr) modes ↵LDj3SNuD
(fast paths). (#1630) * Add support for guest Fz (Fpcr) mode through host Ftz and Daz (Mxcsr) modes (fast paths). * Ptc.InternalVersion = 1630 * Nits. * Address comments. * Update Ptc.cs * Address comment.
2020-12-03CPU: Implement VFNMS.F32/64 (#1758)sharmander
* Add necessary methods / op-code * Enable Support for FMA Instruction Set * Add Intrinsics / Assembly Opcodes for VFMSUB231XX. * Add X86 Instructions for VFMSUB231XX * Implement VFNMS * Implement VFNMS Tests * Add special cases for FMA instructions. * Update PPTC Version * Remove unused Op * Move Check into Assert / Cleanup * Rename and cleanup * Whitespace * Whitespace / Rename * Re-sort * Address final requests * Implement VFMA.F64 * Simplify switch * Simplify FMA Instructions into their own IntrinsicType. * Remove whitespace * Fix indentation * Change tests for Vfnms -- disable inf / nan * Move args up, not description ;) * Undo vfma * Completely remove vfms code., * Fix order of instruction in assembler
2020-11-18CPU (A64): Add FP16/FP32 fast paths (F16C Intrinsics) for Fcvt_S, Fcvtl_V & ↵LDj3SNuD
Fcvtn_V Instructions. Now HardwareCapabilities uses CpuId. (#1650) * net5.0 * CPU (A64): Add FP16/FP32 fast paths (F16C Intrinsics) for Fcvt_S, Fcvtl_V & Fcvtn_V Instructions. Switch to .NET 5.0. Nits. Tests performed successfully in both debug and release mode (for all instructions involved). * Address comment. * Update appveyor.yml * Revert "Update appveyor.yml" This reverts commit 27cdd59e8b90e227e6924d9c162af26c00a89013. * Remove Assembler CpuId. * Update appveyor.yml * Address comment.
2020-10-16Memory Read/Write Tracking using Region Handles (#1272)riperiperi
* WIP Range Tracking - Texture invalidation seems to have large problems - Buffer/Pool invalidation may have problems - Mirror memory tracking puts an additional `add` in compiled code, we likely just want to make HLE access slower if this is the final solution. - Native project is in the messiest possible location. - [HACK] JIT memory access always uses native "fast" path - [HACK] Trying some things with texture invalidation and views. It works :) Still a few hacks, messy things, slow things More work in progress stuff (also move to memory project) Quite a bit faster now. - Unmapping GPU VA and CPU VA will now correctly update write tracking regions, and invalidate textures for the former. - The Virtual range list is now non-overlapping like the physical one. - Fixed some bugs where regions could leak. - Introduced a weird bug that I still need to track down (consistent invalid buffer in MK8 ribbon road) Move some stuff. I think we'll eventually just put the dll and so for this in a nuget package. Fix rebase. [WIP] MultiRegionHandle variable size ranges - Avoid reprotecting regions that change often (needs some tweaking) - There's still a bug in buffers, somehow. - Might want different api for minimum granularity Fix rebase issue Commit everything needed for software only tracking. Remove native components. Remove more native stuff. Cleanup Use a separate window for the background context, update opentk. (fixes linux) Some experimental changes Should get things working up to scratch - still need to try some things with flush/modification and res scale. Include address with the region action. Initial work to make range tracking work Still a ton of bugs Fix some issues with the new stuff. * Fix texture flush instability There's still some weird behaviour, but it's much improved without this. (textures with cpu modified data were flushing over it) * Find the destination texture for Buffer->Texture full copy Greatly improves performance for nvdec videos (with range tracking) * Further improve texture tracking * Disable Memory Tracking for view parents This is a temporary approach to better match behaviour on master (where invalidations would be soaked up by views, rather than trigger twice) The assumption is that when views are created to a texture, they will cover all of its data anyways. Of course, this can easily be improved in future. * Introduce some tracking tests. WIP * Complete base tests. * Add more tests for multiregion, fix existing test. * Cleanup Part 1 * Remove unnecessary code from memory tracking * Fix some inconsistencies with 3D texture rule. * Add dispose tests. * Use a background thread for the background context. Rather than setting and unsetting a context as current, doing the work on a dedicated thread with signals seems to be a bit faster. Also nerf the multithreading test a bit. * Copy to texture with matching alignment This extends the copy to work for some videos with unusual size, such as tutorial videos in SMO. It will only occur if the destination texture already exists at XCount size. * Track reads for buffer copies. Synchronize new buffers before copying overlaps. * Remove old texture flushing mechanisms. Range tracking all the way, baby. * Wake the background thread when disposing. Avoids a deadlock when games are closed. * Address Feedback 1 * Separate TextureCopy instance for background thread Also `BackgroundContextWorker.InBackground` for a more sensible idenfifier for if we're in a background thread. * Add missing XML docs. * Address Feedback * Maybe I should start drinking coffee. * Some more feedback. * Remove flush warning, Refocus window after making background context
2020-10-13Add Umaal & Vabd_I, Vabdl_I, Vaddl_I, Vhadd, Vqshrn, Vshll inst.s (slow ↵LDj3SNuD
paths). (#1577) * Add Umaal & Vabd_I, Vabdl_I, Vaddl_I, Vhadd, Vqshrn, Vshll inst.s (slow paths). No test provided (i.e. draft). * Ptc InternalVersion = 1577
2020-09-22IPC refactor part 1: Use explicit separate threads to process requests (#1447)gdkchan
* Changes to allow explicit management of service threads * Remove now unused code * Remove ThreadCounter, its no longer needed * Allow and use separate server per service, also fix exit issues * New policy change: PTC version now uses PR number
2020-09-19Fix host stack overflow caused by some recursive guest methods. (#1528)LDj3SNuD
* Fix host stack overflow caused by some recursive guest methods. * PPTC flag up. * Address comments. Co-authored-by: gdkchan <gab.dark.100@gmail.com>
2020-09-19Implement block placement (#1549)FICTURE7
* Implement block placement Implement a simple pass which re-orders cold blocks at the end of the list of blocks in the CFG. * Set PPTC version * Use Array.Resize Address gdkchan's feedback
2020-09-07Do not emit StoreToContext before Return (#1537)FICTURE7
* Do not emit StoreToContext before Return * Set PPTC version
2020-08-31Improve static branch prediction along fast path for memory accesses (#1484)FICTURE7
* Improve static branch prediction along fast path for memory accesses * Set PPTC interval version
2020-08-31CPU (A64): Add Scvtf_S_Fixed & Ucvtf_S_Fixed with Tests. (#1492)LDj3SNuD
2020-08-13Fix Vcvt_FI & Vcvt_RM; Add Vfma_S & Vfms_S. Add Tests. (#1471)LDj3SNuD
* Fix Vcvt_FI & Vcvt_RM; Add Vfma_S & Vfms_S. Add Tests. * Address PR feedback & Nit.
2020-08-08CPU: This PR fixes Fpscr, among other things. (#1433)LDj3SNuD
* CPU: This PR fixes Fpscr, among other things. * Add Fpscr.Qc = 1 if sat. for Vqrshrn & Vqrshrun. * Fix Vcmp & Vcmpe opcode table. * Revert "Fix Vcmp & Vcmpe opcode table." This reverts commit c117d9410d693185ff5f8ee8e457ffbfb2027dd5. * Address PR feedbacks.
2020-08-05Improve branch operations (#1442)Ficture Seven
* Add Compare instruction * Add BranchIf instruction * Use test when BranchIf & Compare against 0 * Propagate Compare into BranchIfTrue/False use - Propagate Compare operations into their BranchIfTrue/False use and turn these into a BranchIf. - Clean up Comparison enum. * Replace BranchIfTrue/False with BranchIf * Use BranchIf in EmitPtPointerLoad - Using BranchIf early instead of BranchIfTrue/False improves LCQ and reduces the amount of work needed by the Optimizer. EmitPtPointerLoader was a/the big producer of BranchIfTrue/False. - Fix asserts firing when assembling BitwiseAnd because of type mismatch in EmitStoreExclusive. This is harmless and should not cause any diffs. * Increment PPTC interval version * Improve IRDumper for BranchIf & Compare * Use BranchIf in EmitNativeCall * Clean up * Do not emit test when immediately preceded by and
2020-07-30 Implement inline memory load/store exclusive and ordered (#1413)gdkchan
* Implement inline memory load/store exclusive * Fix missing REX prefix on 8-bits CMPXCHG * Increment PTC version due to bugfix * Remove redundant memory checks * Address PR feedback * Increment PPTC version
2020-07-30 Print guest stack trace on invalid memory access (#1407)gdkchan
* Print guest stack trace on invalid memory access * Improve XML docs
2020-07-19Implements some 32-bit instructions (VBIC, VTST, VSRA) (#1192)Valentin PONS
* Added some 32 bits instructions: * VBIC * VTST * VSRA * Incremented the PTC * Add tests and fix implementation * Fixed VBIC immediate opcode mapping * Hey hey! * Nit. Co-authored-by: gdkchan <gab.dark.100@gmail.com> Co-authored-by: LDj3SNuD <dvitiello@gmail.com> Co-authored-by: LDj3SNuD <35856442+LDj3SNuD@users.noreply.github.com>
2020-07-17CPU: A32: Fix Vabs_V & Vneg_V (S8, S16, S32 & F32); add Tests. (#1394)LDj3SNuD
* Fix Vabs_V & Vneg_V (S8, S16, S32 & F32); add Tests. * Update Ptc.cs
2020-07-17CPU: A32: Add Vadd & Vsub Wide (S/U_8/16/32) Inst.s with Test. (#1390)LDj3SNuD
2020-07-13Add Fmax/minv_V & S/Ushl_S Inst.s with Tests. Fix Maxps/d & Minps/d d… (#1335)LDj3SNuD
* Add Fmax/minv_V & S/Ushl_S Inst.s with Tests. Fix Maxps/d & Minps/d double zero sign handling. Allows better handling of NaNs. * Optimized EmitSse2VectorIsNaNOpF() for multiple uses per opF.
2020-07-13Add SSE4.2 Path for CRC32, add A32 variant, add tests for non-castagnoli ↵riperiperi
variants. (#1328) * Add CRC32 A32 instructions. * Fix CRC32 instructions. * Add CRC intrinsic and fast path. Loop is currently unrolled, will look into adding temp vars after tests are added. * Begin work on Crc tests * Fix SSE4.2 path for CRC32C, finialize tests. * Remove unused IR path. * Fix spacing between prefix checks. * This should be Src. * PTC Version * OpCodeTable Order * Integer check improvement. Value and Crc can be either 32 or 64 size. * This wasn't necessary... * If size is 3, value type must be I64. * Fix same src+dest handling for non crc intrinsics. * Pre-fix (ha) issue with vex encodings