JIT: Stack zeroed with rep stosd #10744

@Zhentar

C# source in this gist: https://gist.github.com/Zhentar/4ffb0a5d597c4c1e788d6007f1602b21

According to VTune, 5% of my execution time is spent in my function's prologue. This was unexpected, because the prologue had not shown up in previous iterations (and my function body had unfortunately not improved at all).
Looking at the disassembly, I see:

LineEnumerator.MoveNext()
	push    rdi
	push    rsi
	sub     rsp,48h
	mov     rsi,rcx
	lea     rdi,[rsp+28h]
	mov     ecx,8
	xor     eax,eax
	rep     stos dword ptr [rdi]
	mov     rcx,rsi
	mov     rax,0F1CD0434ED23h
	mov     qword ptr [rsp+40h],rax

The rep stos dword in there seems rather odd. At the very least, it should be a rep stos qword with half as many iterations (although I'm not sure that would be any faster on my Skylake). Beyond that, I don't think there is any x86 microarchitecture on which a 32-byte rep stos is faster than a reasonable unrolled version, and the unrolled version wouldn't even be particularly large. Some of the comments in the JIT code also seem to suggest that rep stos shouldn't ever be emitted.
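
To make the size arithmetic concrete, here is a minimal C sketch of the two strategies. It is only a model of the semantics, not JIT code: the function names are hypothetical, and the instruction sequences in the comments are my guess at what an unrolled prologue could look like, not something the JIT is known to emit. The first function mirrors the `mov ecx,8` / `rep stos dword ptr [rdi]` pair above (8 stores of 4 bytes = 32 bytes); the second covers the same 32 bytes with 4 stores of 8 bytes.

```c
#include <stdint.h>
#include <string.h>

/* Models `mov ecx,8 / xor eax,eax / rep stos dword ptr [rdi]`:
   eight 4-byte stores of zero, 32 bytes in total. */
static void zero_rep_stosd_model(void *dst)
{
    unsigned char *p = dst;
    const uint32_t zero = 0;
    for (int i = 0; i < 8; i++) {
        memcpy(p + 4 * i, &zero, sizeof zero);
    }
}

/* Hypothetical unrolled alternative: four 8-byte stores, roughly
   `xor eax,eax` followed by
   `mov qword ptr [rsp+28h],rax` ... `mov qword ptr [rsp+40h],rax`. */
static void zero_unrolled_qword(void *dst)
{
    unsigned char *p = dst;
    const uint64_t zero = 0;
    memcpy(p + 0,  &zero, sizeof zero);
    memcpy(p + 8,  &zero, sizeof zero);
    memcpy(p + 16, &zero, sizeof zero);
    memcpy(p + 24, &zero, sizeof zero);
}
```

Both functions leave the same 32 bytes zeroed. A C compiler is free to merge or vectorize these stores, so this illustrates what the prologue has to accomplish rather than how any particular code generator would do it.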

category:cq
theme:optimization
skill-level:intermediate
cost:medium

Labels: area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI), optimization
