From 68e15c1a7471e4b2844fc0d3c7385523e595521d Mon Sep 17 00:00:00 2001
From: jduncanator <1518948+jduncanator@users.noreply.github.com>
Date: Thu, 5 Mar 2020 11:41:33 +1100
Subject: Implement Fast Paths for most A32 SIMD instructions (#952)

* Begin work on A32 SIMD Intrinsics

* More instructions, some cleanup.

* Intrinsics for Move instructions (zip etc)

These pass the existing tests.

* Intrinsics for some of Cvt

While doing this I noticed that the conversion for int/fp was incorrect
in the slow path. I'll fix this in the original repo.

* Intrinsics for more Arithmetic instructions.

* Intrinsics for Vext

* Fix VEXT Intrinsic for double words.

* Use InsertPs to move scalar values.

* Cleanup, fix VPADD.f32 and VMIN signed integer.

* Cleanup, add SSE2 support for scalar insert.

Works similarly to the IR scalar insert, but obviously this one works
directly on V128.

* Minor cleanup.

* Enable intrinsic for FP64 to integer conversion.

* Address feedback apart from splitting out intrinsic float abs

Also: treat bad VREV encodings as undefined rather than throwing in translation.

* Move float abs to helper, fix bug with cvt

* Rename opc2 & 3 to match A32 docs, use ArgumentOutOfRangeException appropriately.

* Get the variable name at compile time rather than using a string literal.

* Use correct double sign mask.
---
 ARMeilleure/IntermediateRepresentation/Intrinsic.cs | 2 ++
 1 file changed, 2 insertions(+)

(limited to 'ARMeilleure/IntermediateRepresentation')

diff --git a/ARMeilleure/IntermediateRepresentation/Intrinsic.cs b/ARMeilleure/IntermediateRepresentation/Intrinsic.cs
index c3f375c4..c60e80cf 100644
--- a/ARMeilleure/IntermediateRepresentation/Intrinsic.cs
+++ b/ARMeilleure/IntermediateRepresentation/Intrinsic.cs
@@ -41,6 +41,7 @@ namespace ARMeilleure.IntermediateRepresentation
         X86Divss,
         X86Haddpd,
         X86Haddps,
+        X86Insertps,
         X86Maxpd,
         X86Maxps,
         X86Maxsd,
@@ -51,6 +52,7 @@ namespace ARMeilleure.IntermediateRepresentation
         X86Minss,
         X86Movhlps,
         X86Movlhps,
+        X86Movss,
         X86Mulpd,
         X86Mulps,
         X86Mulsd,
-- 
cgit v1.2.3