Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: suboptimal block layout #76872

Closed
EgorBo opened this issue Oct 11, 2022 · 10 comments · Fixed by #77103
Closed

JIT: suboptimal block layout #76872

EgorBo opened this issue Oct 11, 2022 · 10 comments · Fixed by #77103
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI help wanted [up-for-grabs] Good issue for external contributors
Milestone

Comments

@EgorBo
Copy link
Member

EgorBo commented Oct 11, 2022

bool M(string a) => "b" == a;

Emits:

; Method P:M(System.String):bool:this
G_M31338_IG01:
G_M31338_IG02:
       test     rdx, rdx
       je       SHORT G_M31338_IG06
G_M31338_IG03:
       cmp      dword ptr [rdx+08H], 1
       jne      SHORT G_M31338_IG05
G_M31338_IG04:
       xor      eax, eax
       cmp      word  ptr [rdx+0CH], 98
       sete     al
       jmp      SHORT G_M31338_IG07
G_M31338_IG05:
       xor      eax, eax
       jmp      SHORT G_M31338_IG07
G_M31338_IG06:
       xor      eax, eax
G_M31338_IG07:
       ret      
; Total bytes of code: 30

It seems like IG05 (BB05) block is redundant:

image

Trees after Optimize layout

-----------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds           weight    lp [IL range]     [jump]      [EH region]         [flags]
-----------------------------------------------------------------------------------------------------------------------------------------
BB01 [0000]  1                             1       [000..000)-> BB05 ( cond )                     i 
BB02 [0005]  1       BB01                  0.25    [???..???)-> BB04 ( cond )                     i 
BB03 [0009]  1       BB02                  0.12    [???..???)-> BB06 (always)                     i 
BB04 [0008]  1       BB02                  0.25    [???..???)-> BB06 (always)                     i 
BB05 [0004]  1       BB01                  0.25    [???..???)                                     i 
BB06 [0002]  3       BB03,BB04,BB05        1       [000..00C)        (return)                     i 
-----------------------------------------------------------------------------------------------------------------------------------------

------------ BB01 [000..000) -> BB05 (cond), preds={} succs={BB02,BB05}

***** BB01
STMT00004 ( INL01 @ ??? ... ??? ) <- INLRT @ 0x000[E-]
N004 (  5,  5) [000034] -----------                         *  JTRUE     void   $VN.Void
N003 (  3,  3) [000025] J------N---                         \--*  EQ        int    $100
N001 (  1,  1) [000010] -----------                            +--*  LCL_VAR   ref    V01 arg1         u:1 $80
N002 (  1,  1) [000024] -----------                            \--*  CNS_INT   ref    null $VN.Null

------------ BB02 [???..???) -> BB04 (cond), preds={BB01} succs={BB03,BB04}

***** BB02
STMT00007 ( INL01 @ ??? ... ??? ) <- INLRT @ 0x000[E-]
N007 (  8,  8) [000039] -----O-----                         *  JTRUE     void   $241
N006 (  6,  6) [000020] J----O?N---                         \--*  NE        int    <l:$106, c:$105>
N004 (  4,  4) [000009] n----O?----                            +--*  IND       int    <l:$102, c:$101>
N003 (  2,  2) [000008] ------?N---                            |  \--*  ADD       byref  $180
N001 (  1,  1) [000006] ------?----                            |     +--*  LCL_VAR   ref    V01 arg1         u:1 $80
N002 (  1,  1) [000007] ------?----                            |     \--*  CNS_INT   long   8 $140
N005 (  1,  1) [000011] ------?----                            \--*  CNS_INT   int    1 $40

------------ BB03 [???..???) -> BB06 (always), preds={BB02} succs={BB06}

***** BB03
STMT00008 ( INL01 @ ??? ... ??? ) <- INLRT @ 0x000[E-]
N008 ( 10,  7) [000041] -A---O--R--                         *  ASG       int    $241
N007 (  1,  1) [000040] D------N---                         +--*  LCL_VAR   int    V04 tmp2         d:4 $VN.Void
N006 ( 10,  7) [000017] -----O?----                         \--*  EQ        int    <l:$10a, c:$109>
N004 (  5,  5) [000015] n----O?----                            +--*  IND       short  <l:$301, c:$300>
N003 (  2,  2) [000014] ------?N---                            |  \--*  ADD       byref  $181
N001 (  1,  1) [000012] ------?----                            |     +--*  LCL_VAR   ref    V01 arg1         u:1 (last use) $80
N002 (  1,  1) [000013] ------?----                            |     \--*  CNS_INT   long   12 $141
N005 (  1,  1) [000016] ------?----                            \--*  CNS_INT   int    98 $45

------------ BB04 [???..???) -> BB06 (always), preds={BB02} succs={BB06}

***** BB04
STMT00009 ( INL01 @ ??? ... ??? ) <- INLRT @ 0x000[E-]
N003 (  1,  3) [000043] -A------R--                         *  ASG       int    $VN.Void
N002 (  1,  1) [000042] D------N---                         +--*  LCL_VAR   int    V04 tmp2         d:3 $VN.Void
N001 (  1,  1) [000018] ------?----                         \--*  CNS_INT   int    0 $42

------------ BB05 [???..???), preds={BB01} succs={BB06}

***** BB05
STMT00006 ( INL01 @ ??? ... ??? ) <- INLRT @ 0x000[E-]
N003 (  1,  3) [000038] -A------R--                         *  ASG       int    $VN.Void
N002 (  1,  1) [000037] D------N---                         +--*  LCL_VAR   int    V04 tmp2         d:2 $VN.Void
N001 (  1,  1) [000022] ------?----                         \--*  CNS_INT   int    0 $42

------------ BB06 [000..00C) (return), preds={BB03,BB04,BB05} succs={}

***** BB06
STMT00010 ( ??? ... ??? )
N006 (  0,  0) [000046] -A------R--                         *  ASG       int    $VN.Void
N005 (  0,  0) [000044] D------N---                         +--*  LCL_VAR   int    V04 tmp2         d:1 $VN.Void
N004 (  0,  0) [000045] -----------                         \--*  PHI       int    $1c1
N001 (  0,  0) [000049] ----------- pred BB03                  +--*  PHI_ARG   int    V04 tmp2         u:4 <l:$107, c:$108>
N002 (  0,  0) [000048] ----------- pred BB04                  +--*  PHI_ARG   int    V04 tmp2         u:3 $42
N003 (  0,  0) [000047] ----------- pred BB05                  \--*  PHI_ARG   int    V04 tmp2         u:2 $42

***** BB06
STMT00001 ( 0x000[E-] ... ??? )
N002 (  2,  2) [000004] -----------                         *  RETURN    int    $VN.Void
N001 (  1,  1) [000031] -----------                         \--*  LCL_VAR   int    V04 tmp2         u:1 (last use) $1c1

BB04 and BB05 are exactly the same and have the same successor

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Oct 11, 2022
@ghost ghost added the untriaged New issue has not been triaged by the area owner label Oct 11, 2022
@ghost
Copy link

ghost commented Oct 11, 2022

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details
bool M(string a) => "b" == a;

Emits:

; Method P:M(System.String):bool:this
G_M31338_IG01:
G_M31338_IG02:
       test     rdx, rdx
       je       SHORT G_M31338_IG06
G_M31338_IG03:
       cmp      dword ptr [rdx+08H], 1
       jne      SHORT G_M31338_IG05
G_M31338_IG04:
       xor      eax, eax
       cmp      word  ptr [rdx+0CH], 98
       sete     al
       jmp      SHORT G_M31338_IG07
G_M31338_IG05:
       xor      eax, eax
       jmp      SHORT G_M31338_IG07
G_M31338_IG06:
       xor      eax, eax
G_M31338_IG07:
       ret      
; Total bytes of code: 30

It seems like IG05 (BB05) block is redundant and could use IG06 (BB06) instead:
image

Trees after Optimize layout

-----------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds           weight    lp [IL range]     [jump]      [EH region]         [flags]
-----------------------------------------------------------------------------------------------------------------------------------------
BB01 [0000]  1                             1       [000..000)-> BB05 ( cond )                     i 
BB02 [0005]  1       BB01                  0.25    [???..???)-> BB04 ( cond )                     i 
BB03 [0009]  1       BB02                  0.12    [???..???)-> BB06 (always)                     i 
BB04 [0008]  1       BB02                  0.25    [???..???)-> BB06 (always)                     i 
BB05 [0004]  1       BB01                  0.25    [???..???)                                     i 
BB06 [0002]  3       BB03,BB04,BB05        1       [000..00C)        (return)                     i 
-----------------------------------------------------------------------------------------------------------------------------------------

------------ BB01 [000..000) -> BB05 (cond), preds={} succs={BB02,BB05}

***** BB01
STMT00004 ( INL01 @ ??? ... ??? ) <- INLRT @ 0x000[E-]
N004 (  5,  5) [000034] -----------                         *  JTRUE     void   $VN.Void
N003 (  3,  3) [000025] J------N---                         \--*  EQ        int    $100
N001 (  1,  1) [000010] -----------                            +--*  LCL_VAR   ref    V01 arg1         u:1 $80
N002 (  1,  1) [000024] -----------                            \--*  CNS_INT   ref    null $VN.Null

------------ BB02 [???..???) -> BB04 (cond), preds={BB01} succs={BB03,BB04}

***** BB02
STMT00007 ( INL01 @ ??? ... ??? ) <- INLRT @ 0x000[E-]
N007 (  8,  8) [000039] -----O-----                         *  JTRUE     void   $241
N006 (  6,  6) [000020] J----O?N---                         \--*  NE        int    <l:$106, c:$105>
N004 (  4,  4) [000009] n----O?----                            +--*  IND       int    <l:$102, c:$101>
N003 (  2,  2) [000008] ------?N---                            |  \--*  ADD       byref  $180
N001 (  1,  1) [000006] ------?----                            |     +--*  LCL_VAR   ref    V01 arg1         u:1 $80
N002 (  1,  1) [000007] ------?----                            |     \--*  CNS_INT   long   8 $140
N005 (  1,  1) [000011] ------?----                            \--*  CNS_INT   int    1 $40

------------ BB03 [???..???) -> BB06 (always), preds={BB02} succs={BB06}

***** BB03
STMT00008 ( INL01 @ ??? ... ??? ) <- INLRT @ 0x000[E-]
N008 ( 10,  7) [000041] -A---O--R--                         *  ASG       int    $241
N007 (  1,  1) [000040] D------N---                         +--*  LCL_VAR   int    V04 tmp2         d:4 $VN.Void
N006 ( 10,  7) [000017] -----O?----                         \--*  EQ        int    <l:$10a, c:$109>
N004 (  5,  5) [000015] n----O?----                            +--*  IND       short  <l:$301, c:$300>
N003 (  2,  2) [000014] ------?N---                            |  \--*  ADD       byref  $181
N001 (  1,  1) [000012] ------?----                            |     +--*  LCL_VAR   ref    V01 arg1         u:1 (last use) $80
N002 (  1,  1) [000013] ------?----                            |     \--*  CNS_INT   long   12 $141
N005 (  1,  1) [000016] ------?----                            \--*  CNS_INT   int    98 $45

------------ BB04 [???..???) -> BB06 (always), preds={BB02} succs={BB06}

***** BB04
STMT00009 ( INL01 @ ??? ... ??? ) <- INLRT @ 0x000[E-]
N003 (  1,  3) [000043] -A------R--                         *  ASG       int    $VN.Void
N002 (  1,  1) [000042] D------N---                         +--*  LCL_VAR   int    V04 tmp2         d:3 $VN.Void
N001 (  1,  1) [000018] ------?----                         \--*  CNS_INT   int    0 $42

------------ BB05 [???..???), preds={BB01} succs={BB06}

***** BB05
STMT00006 ( INL01 @ ??? ... ??? ) <- INLRT @ 0x000[E-]
N003 (  1,  3) [000038] -A------R--                         *  ASG       int    $VN.Void
N002 (  1,  1) [000037] D------N---                         +--*  LCL_VAR   int    V04 tmp2         d:2 $VN.Void
N001 (  1,  1) [000022] ------?----                         \--*  CNS_INT   int    0 $42

------------ BB06 [000..00C) (return), preds={BB03,BB04,BB05} succs={}

***** BB06
STMT00010 ( ??? ... ??? )
N006 (  0,  0) [000046] -A------R--                         *  ASG       int    $VN.Void
N005 (  0,  0) [000044] D------N---                         +--*  LCL_VAR   int    V04 tmp2         d:1 $VN.Void
N004 (  0,  0) [000045] -----------                         \--*  PHI       int    $1c1
N001 (  0,  0) [000049] ----------- pred BB03                  +--*  PHI_ARG   int    V04 tmp2         u:4 <l:$107, c:$108>
N002 (  0,  0) [000048] ----------- pred BB04                  +--*  PHI_ARG   int    V04 tmp2         u:3 $42
N003 (  0,  0) [000047] ----------- pred BB05                  \--*  PHI_ARG   int    V04 tmp2         u:2 $42

***** BB06
STMT00001 ( 0x000[E-] ... ??? )
N002 (  2,  2) [000004] -----------                         *  RETURN    int    $VN.Void
N001 (  1,  1) [000031] -----------                         \--*  LCL_VAR   int    V04 tmp2         u:1 (last use) $1c1
Author: EgorBo
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

@EgorBo EgorBo added this to the Future milestone Oct 11, 2022
@EgorBo EgorBo added help wanted [up-for-grabs] Good issue for external contributors and removed untriaged New issue has not been triaged by the area owner labels Oct 11, 2022
@EgorBo
Copy link
Member Author

EgorBo commented Oct 11, 2022

cc @AndyAyersMS

@jakobbotsch
Copy link
Member

Related: #8795

@EgorBo
Copy link
Member Author

EgorBo commented Oct 11, 2022

Related: #8795

is it? here we don't merge tails (multiple returns) but just try to merge equal blocks with the same successor, it doesn't have to be a tail 🙂

@jakobbotsch
Copy link
Member

jakobbotsch commented Oct 11, 2022

is it? here we don't merge tails (multiple returns) but just try to merge equal blocks with the same successor, it doesn't have to be a tail 🙂

I don't think tail merging necessarily has to mean the tail of functions, it can cover merging the tails of two basic blocks in general (i.e. same instructions, same terminator)

#8795 seems like it uses the more general description, doesn't it? I guess its particular wording still doesn't cover this case completely though.

@EgorBo
Copy link
Member Author

EgorBo commented Oct 11, 2022

#8795 seems like it uses the more general description, doesn't it? I guess its particular wording still doesn't cover this case completely though.

IMO it's too abstract to ever be closed while this one represents a concrete (popular) case that we can handle

@jakobbotsch
Copy link
Member

Yeah, I think considering the special cases is valuable. But we should just be mindful that we get the mileage out of the transformation we can, for example it might not be too difficult to generalize some parts of "fold identical BBs" to be more like general tail merging.

@AndyAyersMS
Copy link
Member

It is good to keep the more general optimizations in mind. Tail merging (and its dual, head merging) require developing quick and reliable ways to compare trees, something we currently lack.

Generalizations of tail merging that consider reorderable or semantically equivalent but syntactically different computations are probably too costly for us to consider.

@AndyAyersMS
Copy link
Member

It is good to keep the more general optimizations in mind. Tail merging (and its dual, head merging) require developing quick and reliable ways to compare trees, something we currently lack.

Note in debug we have gtHashValue which if made available in release could be used as a preliminary screen for head or tail merge opportunities.

AndyAyersMS added a commit to AndyAyersMS/runtime that referenced this issue Oct 16, 2022
Add a phase that looks for common tail statements in a block's
predecessors and merges them.

Run it both before and after morph.

Closes dotnet#8795.
Closes dotnet#76872.
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Oct 16, 2022
@AndyAyersMS
Copy link
Member

Codegen with #77103:

G_M64862_IG01:              ;; offset=0000H
                                                ;; size=0 bbWeight=1    PerfScore 0.00
G_M64862_IG02:              ;; offset=0000H
       4885C9               test     rcx, rcx
       7412                 je       SHORT G_M64862_IG05
                                                ;; size=5 bbWeight=1    PerfScore 1.25
G_M64862_IG03:              ;; offset=0005H
       83790801             cmp      dword ptr [rcx+08H], 1
       750C                 jne      SHORT G_M64862_IG05
                                                ;; size=6 bbWeight=0.25 PerfScore 1.00
G_M64862_IG04:              ;; offset=000BH
       33C0                 xor      eax, eax
       6683790C62           cmp      word  ptr [rcx+0CH], 98
       0F94C0               sete     al
       EB02                 jmp      SHORT G_M64862_IG06
                                                ;; size=12 bbWeight=0.12 PerfScore 0.78
G_M64862_IG05:              ;; offset=0017H
       33C0                 xor      eax, eax
                                                ;; size=2 bbWeight=0.25 PerfScore 0.06
G_M64862_IG06:              ;; offset=0019H
       C3                   ret
                                                ;; size=1 bbWeight=1    PerfScore 1.00

; Total bytes of code 26

AndyAyersMS added a commit that referenced this issue Oct 25, 2022
Add a phase that looks for common tail statements in a block's
predecessors and merges them.

Run it both before and after morph.

Also
* add range enable config and overall `JitEnableTailMerge` config
* add indir flag checking to `GenTree::Compare`
* remove an apparently unnecessary assert from loop recognition.

Closes #8795.
Closes #76872.
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Oct 25, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Nov 25, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI help wanted [up-for-grabs] Good issue for external contributors
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants