EgorBo (Member) commented Apr 8, 2023

Adds SIMD to the unrolled comparison for lengths [16..64] (can be extended to [64..128] with AVX-512), and for lengths [16..32] on arm64.

bool Test(Span<byte> s) => s.SequenceEqual(
    "THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND"u8);

Old codegen:

; Method Prog:Test(System.Span`1[ubyte]):bool:this
G_M52730_IG01:              
       4883EC28             sub      rsp, 40
G_M52730_IG02:              
       49B8882A908BDA010000 mov      r8, 0x1DA8B902A88
       488B0A               mov      rcx, bword ptr [rdx]
       8B5208               mov      edx, dword ptr [rdx+08H]
       4C89442420           mov      bword ptr [rsp+20H], r8
       83FA3E               cmp      edx, 62
       7513                 jne      SHORT G_M52730_IG04
G_M52730_IG03:              
       41B83E000000         mov      r8d, 62
       488B542420           mov      rdx, bword ptr [rsp+20H]
       FF1591FE1600         call     [System.SpanHelpers:SequenceEqual(byref,byref,ulong):bool]
       EB02                 jmp      SHORT G_M52730_IG05
G_M52730_IG04:              
       33C0                 xor      eax, eax
G_M52730_IG05:              
       4883C428             add      rsp, 40
       C3                   ret      
; Total bytes of code: 56

New codegen:

; Method Prog:Test(System.Span`1[ubyte]):bool:this
G_M52730_IG01:              
       C5F877               vzeroupper 
G_M52730_IG02:              
       48B8882A7D01B3020000 mov      rax, 0x2B3017D2A88
       488B0A               mov      rcx, bword ptr [rdx]
       8B5208               mov      edx, dword ptr [rdx+08H]
       4883FA3E             cmp      rdx, 62
       752B                 jne      SHORT G_M52730_IG04
G_M52730_IG03:              
       C5FC1001             vmovups  ymm0, ymmword ptr[rcx]
       C5FC1008             vmovups  ymm1, ymmword ptr[rax]
       C5FC10511E           vmovups  ymm2, ymmword ptr[rcx+1EH]
       C5FC10581E           vmovups  ymm3, ymmword ptr[rax+1EH]
       C5FDEFC1             vpxor    ymm0, ymm0, ymm1
       C5EDEFCB             vpxor    ymm1, ymm2, ymm3
       C5FDEBC1             vpor     ymm0, ymm0, ymm1
       C4E27D17C0           vptest   ymm0, ymm0
       0F94C0               sete     al
       0FB6C0               movzx    rax, al
       EB02                 jmp      SHORT G_M52730_IG05
G_M52730_IG04:              
       33C0                 xor      eax, eax
G_M52730_IG05:              
       C5F877               vzeroupper 
       C3                   ret      
; Total bytes of code: 74
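
The new sequence compares all 62 bytes with two 32-byte loads per side: one at offset 0 and one at offset 0x1E (62 - 32 = 30), so the two loads overlap by two bytes. Each pair is XORed, the two results are ORed, and vptest checks that the combined vector is zero. A minimal portable sketch of that idea follows, in C++. It is not the JIT code: `Vec256`, `load256`, `xor256`, `or256`, `is_zero256`, and `equal_two_loads` are hypothetical names, and four 64-bit lanes stand in for one ymm register.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for one 32-byte ymm register (four 64-bit lanes).
using Vec256 = std::array<std::uint64_t, 4>;

// Unaligned 32-byte load (the role vmovups plays in the listing).
static Vec256 load256(const unsigned char* p) {
    Vec256 v{};
    std::memcpy(v.data(), p, 32);
    return v;
}

// Lane-wise XOR: a result bit is zero only where the inputs agree.
static Vec256 xor256(const Vec256& a, const Vec256& b) {
    Vec256 r{};
    for (std::size_t i = 0; i < r.size(); ++i) r[i] = a[i] ^ b[i];
    return r;
}

// Lane-wise OR: merges both difference masks into one.
static Vec256 or256(const Vec256& a, const Vec256& b) {
    Vec256 r{};
    for (std::size_t i = 0; i < r.size(); ++i) r[i] = a[i] | b[i];
    return r;
}

// True when every bit is zero (the role of vptest + sete).
static bool is_zero256(const Vec256& v) {
    return (v[0] | v[1] | v[2] | v[3]) == 0;
}

// Compares len bytes, 32 <= len <= 64, with two (possibly overlapping)
// 32-byte loads per side instead of a call to a general memcmp routine.
bool equal_two_loads(const unsigned char* a, const unsigned char* b,
                     std::size_t len) {
    assert(len >= 32 && len <= 64);
    const std::size_t tail = len - 32;  // 30 (0x1E) when len == 62
    const Vec256 diffHead = xor256(load256(a), load256(b));
    const Vec256 diffTail = xor256(load256(a + tail), load256(b + tail));
    return is_zero256(or256(diffHead, diffTail));
}
```

The overlap is harmless because bytes covered twice are simply compared twice. The second load always ends exactly at the last byte, so no read goes past `len`.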

@ghost ghost assigned EgorBo Apr 8, 2023
@ghost ghost added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Apr 8, 2023
ghost commented Apr 8, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak
See info in area-owners.md if you want to be subscribed.

Author: EgorBo
Assignees: EgorBo
Labels: area-CodeGen-coreclr
Milestone: -

Comment on lines +2044 to +2045
GenTree* rXor = newBinaryOp(comp, GT_XOR, actualLoadType, l2Indir, r2Indir);
GenTree* resultOr = newBinaryOp(comp, GT_OR, actualLoadType, lXor, rXor);
Member

Can you log an issue to track changing this to opportunistically use vpternlog on AVX-512 hardware?

Member Author

Good idea, done: #84534
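
The suggestion relies on vpternlog evaluating any three-input bitwise function from an 8-bit truth table, so the second XOR and the OR could in principle be folded into a single instruction. The sketch below is an illustration under stated assumptions, not what #84534 specifies: `ternlog64` is a hypothetical scalar emulation of the per-bit table lookup, and `kXorThenOr` is derived by evaluating `(a ^ b) | c` on the conventional constants 0xF0, 0xCC, and 0xAA.

```cpp
#include <cstdint>

// Hypothetical scalar emulation of a three-input truth-table lookup:
// bit i of the result is bit ((a_i << 2) | (b_i << 1) | c_i) of imm8.
std::uint64_t ternlog64(std::uint64_t a, std::uint64_t b, std::uint64_t c,
                        std::uint8_t imm8) {
    std::uint64_t r = 0;
    for (int i = 0; i < 64; ++i) {
        const unsigned idx = static_cast<unsigned>(
            (((a >> i) & 1u) << 2) | (((b >> i) & 1u) << 1) | ((c >> i) & 1u));
        r |= static_cast<std::uint64_t>((imm8 >> idx) & 1u) << i;
    }
    return r;
}

// Truth table for f(a, b, c) = (a ^ b) | c. Evaluating f on the three
// constants enumerates all eight input combinations, one per bit.
constexpr std::uint8_t kXorThenOr =
    static_cast<std::uint8_t>((0xF0 ^ 0xCC) | 0xAA);
```

With `a` and `b` holding the second pair of loads and `c` holding the XOR of the first pair, one such operation would yield the same combined difference mask that the vpxor and vpor pair produces today.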

@tannergooding (Member)

#84536 is the SPMI replay failure

EgorBo (Member, Author) commented Apr 10, 2023

PTAL @jakobbotsch, since you reviewed the previous implementation of LowerCallMemcmp.

@EgorBo EgorBo requested a review from jakobbotsch April 10, 2023 09:18
@EgorBo EgorBo merged commit eda1c3a into dotnet:main Apr 10, 2023
@EgorBo EgorBo deleted the SIMD-LowerCallMemcmp branch April 10, 2023 22:56
sbomer added a commit that referenced this pull request Apr 11, 2023
sbomer added a commit that referenced this pull request Apr 11, 2023
EgorBo added a commit to EgorBo/runtime-1 that referenced this pull request Apr 11, 2023
@ghost ghost locked as resolved and limited conversation to collaborators May 11, 2023