Paste P104

SPIR-V Loop less-than

Authored by pmoreau on Sep 10 2016, 8:34 PM.
% NV50_PROG_DEBUG=255 ../mesa_build/src/gallium/drivers/nouveau/nouveau_compiler -a e7 ./loop_lt.spv
Compiling for NVE7
translating program of type 5
Parsing SPIR-V generated by 6 (version 14)
Version 1.0
ID bound: 33
Compiling for nve7
Found var with id 23: 0x1386630
Found BB with id 13: 0x1399560
Couldn't find variable with id 24, keeping looking for it
Couldn't find BB with id 16, keeping looking for it
Node 25:
Var: 0x1386630, BB: 0x1399560
Var: (nil), BB: (nil)
Found var with id 24: 0x138cc50
Found bb with id 16: 0x13a7cd0
Adding pair
Adding pair
Node 25:
Var: 0x1386630, BB: 0x1399560
Var: 0x138cc50, BB: 0x13a7cd0
loop_lt:10 ()
---
BB:0 (3 instructions) - df = { }
-> BB:2 (tree)
0: ld u64 %r0d c0[0x0] (0)
1: ld u32 %r1 c0[0x8] (0)
2: bra BB:2 (0)
---
<- BB:0 (tree)
BB:2 (23 instructions) - df = { }
-> BB:3 (tree)
3: rdsv u32 %r2 sv[TID:0] (0)
4: rdsv u32 %r3 sv[NTID:0] (0)
5: rdsv u32 %r4 sv[CTAID:0] (0)
6: add u32 %r5 %r2 0x00000000 (0)
7: mad u32 %r6 %r3 %r4 %r5 (0)
8: cvt u64 %r7d u32 %r6 (0)
9: rdsv u32 %r8 sv[TID:1] (0)
10: rdsv u32 %r9 sv[NTID:1] (0)
11: rdsv u32 %r10 sv[CTAID:1] (0)
12: add u32 %r11 %r8 0x00000000 (0)
13: mad u32 %r12 %r9 %r10 %r11 (0)
14: cvt u64 %r13d u32 %r12 (0)
15: rdsv u32 %r14 sv[TID:2] (0)
16: rdsv u32 %r15 sv[NTID:2] (0)
17: rdsv u32 %r16 sv[CTAID:2] (0)
18: add u32 %r17 %r14 0x00000000 (0)
19: mad u32 %r18 %r15 %r16 %r17 (0)
20: cvt u64 %r19d u32 %r18 (0)
21: mov u64 %r20d %r7d (0)
22: cvt u64 %r21d u32 %r1 (0)
23: mul u64 %r22d %r20d %r21d (0)
24: cvt u32 %r23 u64 %r22d (0)
25: bra BB:3 (0)
---
<- BB:6 (back)
<- BB:2 (tree)
BB:3 (3 instructions) - df = { }
-> BB:5 (tree)
-> BB:4 (tree)
26: phi u32 %r24 %r31 (0)
27: set u8 %p25 lt u32 %r24 %r1 (0)
28: %p25 bra BB:5 (0)
---
<- BB:3 (tree)
BB:4 (1 instructions) - df = { }
-> BB:7 (tree)
29: bra BB:7 (0)
---
<- BB:4 (tree)
BB:7 (1 instructions) - df = { }
-> BB:1 (tree)
30: bra BB:1 (0)
---
<- BB:7 (tree)
BB:1 (1 instructions) - df = { }
31: exit - # (0)
---
<- BB:3 (tree)
BB:5 (8 instructions) - df = { }
-> BB:6 (tree)
32: add u32 %r26 %r23 %r24 (0)
33: cvt u64 %r27d u32 %r26 (0)
34: mov u64 %r28d 0x0000000000000000 (0)
35: mov u64 %r29d 0x0000000000000004 (0)
36: mad u64 %r28d %r29d %r27d %r28d (0)
37: add u64 %r30d %r0d %r28d (0)
38: st u32 # g[%r30d+0x0] %r26 (0)
39: bra BB:6 (0)
---
<- BB:5 (tree)
BB:6 (2 instructions) - df = { }
-> BB:3 (back)
40: add u32 %r31 %r24 0x00000001 (0)
41: bra BB:3 (0)
MAIN:-1 ()
---
BB:0 (0 instructions) - df = { }
Phi node to process!
loop_lt:10 ()
---
BB:0 (3 instructions) - df = { }
-> BB:2 (tree)
0: ld u64 %r0d c0[0x0] (0)
1: ld u32 %r1 c0[0x8] (0)
2: bra BB:2 (0)
---
<- BB:0 (tree)
BB:2 (24 instructions) - df = { }
-> BB:3 (tree)
3: rdsv u32 %r2 sv[TID:0] (0)
4: rdsv u32 %r3 sv[NTID:0] (0)
5: rdsv u32 %r4 sv[CTAID:0] (0)
6: add u32 %r5 %r2 0x00000000 (0)
7: mad u32 %r6 %r3 %r4 %r5 (0)
8: cvt u64 %r7d u32 %r6 (0)
9: rdsv u32 %r8 sv[TID:1] (0)
10: rdsv u32 %r9 sv[NTID:1] (0)
11: rdsv u32 %r10 sv[CTAID:1] (0)
12: add u32 %r11 %r8 0x00000000 (0)
13: mad u32 %r12 %r9 %r10 %r11 (0)
14: cvt u64 %r13d u32 %r12 (0)
15: rdsv u32 %r14 sv[TID:2] (0)
16: rdsv u32 %r15 sv[NTID:2] (0)
17: rdsv u32 %r16 sv[CTAID:2] (0)
18: add u32 %r17 %r14 0x00000000 (0)
19: mad u32 %r18 %r15 %r16 %r17 (0)
20: cvt u64 %r19d u32 %r18 (0)
21: mov u64 %r20d %r7d (0)
22: cvt u64 %r21d u32 %r1 (0)
23: mul u64 %r22d %r20d %r21d (0)
24: cvt u32 %r23 u64 %r22d (0)
25: mov u32 %r24 0x00000000 (0)
26: bra BB:3 (0)
---
<- BB:6 (back)
<- BB:2 (tree)
BB:3 (2 instructions) - df = { }
-> BB:5 (tree)
-> BB:4 (tree)
27: set u8 %p25 lt u32 %r24 %r1 (0)
28: %p25 bra BB:5 (0)
---
<- BB:3 (tree)
BB:4 (1 instructions) - df = { }
-> BB:7 (tree)
29: bra BB:7 (0)
---
<- BB:4 (tree)
BB:7 (1 instructions) - df = { }
-> BB:1 (tree)
30: bra BB:1 (0)
---
<- BB:7 (tree)
BB:1 (1 instructions) - df = { }
31: exit - # (0)
---
<- BB:3 (tree)
BB:5 (8 instructions) - df = { }
-> BB:6 (tree)
32: add u32 %r26 %r23 %r24 (0)
33: cvt u64 %r27d u32 %r26 (0)
34: mov u64 %r28d 0x0000000000000000 (0)
35: mov u64 %r29d 0x0000000000000004 (0)
36: mad u64 %r28d %r29d %r27d %r28d (0)
37: add u64 %r30d %r0d %r28d (0)
38: st u32 # g[%r30d+0x0] %r26 (0)
39: bra BB:6 (0)
---
<- BB:5 (tree)
BB:6 (3 instructions) - df = { }
-> BB:3 (back)
40: add u32 %r31 %r24 0x00000001 (0)
41: mov u32 %r24 %r31 (0)
42: bra BB:3 (0)
MAIN:-1 ()
---
BB:0 (0 instructions) - df = { }
loop_lt:10 ()
---
BB:0 (3 instructions) - df = { }
-> BB:2 (tree)
0: ld u64 %r0d c0[0x0] (0)
1: ld u32 %r1 c0[0x8] (0)
2: bra BB:2 (0)
---
<- BB:0 (tree)
BB:2 (24 instructions) - df = { }
-> BB:3 (tree)
3: rdsv u32 %r2 sv[TID:0] (0)
4: rdsv u32 %r3 sv[NTID:0] (0)
5: rdsv u32 %r4 sv[CTAID:0] (0)
6: add u32 %r5 %r2 0x00000000 (0)
7: mad u32 %r6 %r3 %r4 %r5 (0)
8: cvt u64 %r7d u32 %r6 (0)
9: rdsv u32 %r8 sv[TID:1] (0)
10: rdsv u32 %r9 sv[NTID:1] (0)
11: rdsv u32 %r10 sv[CTAID:1] (0)
12: add u32 %r11 %r8 0x00000000 (0)
13: mad u32 %r12 %r9 %r10 %r11 (0)
14: cvt u64 %r13d u32 %r12 (0)
15: rdsv u32 %r14 sv[TID:2] (0)
16: rdsv u32 %r15 sv[NTID:2] (0)
17: rdsv u32 %r16 sv[CTAID:2] (0)
18: add u32 %r17 %r14 0x00000000 (0)
19: mad u32 %r18 %r15 %r16 %r17 (0)
20: cvt u64 %r19d u32 %r18 (0)
21: mov u64 %r20d %r7d (0)
22: cvt u64 %r21d u32 %r1 (0)
23: mul u64 %r22d %r20d %r21d (0)
24: cvt u32 %r23 u64 %r22d (0)
25: mov u32 %r24 0x00000000 (0)
26: bra BB:3 (0)
---
<- BB:6 (back)
<- BB:2 (tree)
BB:3 (2 instructions) - df = { }
-> BB:5 (tree)
-> BB:4 (tree)
27: set u8 %p25 lt u32 %r24 %r1 (0)
28: %p25 bra BB:5 (0)
---
<- BB:3 (tree)
BB:4 (1 instructions) - df = { }
-> BB:7 (tree)
29: bra BB:7 (0)
---
<- BB:4 (tree)
BB:7 (1 instructions) - df = { }
-> BB:1 (tree)
30: bra BB:1 (0)
---
<- BB:7 (tree)
BB:1 (1 instructions) - df = { }
31: exit - # (0)
---
<- BB:3 (tree)
BB:5 (8 instructions) - df = { }
-> BB:6 (tree)
32: add u32 %r26 %r23 %r24 (0)
33: cvt u64 %r27d u32 %r26 (0)
34: mov u64 %r28d 0x0000000000000000 (0)
35: mov u64 %r29d 0x0000000000000004 (0)
36: mad u64 %r28d %r29d %r27d %r28d (0)
37: add u64 %r30d %r0d %r28d (0)
38: st u32 # g[%r30d+0x0] %r26 (0)
39: bra BB:6 (0)
---
<- BB:5 (tree)
BB:6 (3 instructions) - df = { }
-> BB:3 (back)
40: add u32 %r31 %r24 0x00000001 (0)
41: mov u32 %r24 %r31 (0)
42: bra BB:3 (0)
MAIN:-1 ()
---
BB:0 (0 instructions) - df = { }
legalizePreSSA
loop_lt:10 ()
---
BB:0 (3 instructions) - df = { }
-> BB:2 (tree)
0: ld u64 %r0d c0[0x0] (0)
1: ld u32 %r1 c0[0x8] (0)
2: bra BB:2 (0)
---
<- BB:0 (tree)
BB:2 (29 instructions) - df = { }
-> BB:3 (tree)
3: rdsv u32 %r2 sv[TID:0] (0)
4: ld u32 %r3 c7[0x100] (0)
5: rdsv u32 %r4 sv[CTAID:0] (0)
6: add u32 %r5 %r2 0x00000000 (0)
7: mad u32 %r6 %r3 %r4 %r5 (0)
8: mov u32 %r32 0x00000000 (0)
9: merge u64 %r7d %r6 %r32 (0)
10: rdsv u32 %r8 sv[TID:1] (0)
11: ld u32 %r9 c7[0x104] (0)
12: rdsv u32 %r10 sv[CTAID:1] (0)
13: add u32 %r11 %r8 0x00000000 (0)
14: mad u32 %r12 %r9 %r10 %r11 (0)
15: mov u32 %r33 0x00000000 (0)
16: merge u64 %r13d %r12 %r33 (0)
17: rdsv u32 %r14 sv[TID:2] (0)
18: ld u32 %r15 c7[0x108] (0)
19: rdsv u32 %r16 sv[CTAID:2] (0)
20: add u32 %r17 %r14 0x00000000 (0)
21: mad u32 %r18 %r15 %r16 %r17 (0)
22: mov u32 %r34 0x00000000 (0)
23: merge u64 %r19d %r18 %r34 (0)
24: mov u64 %r20d %r7d (0)
25: mov u32 %r35 0x00000000 (0)
26: merge u64 %r21d %r1 %r35 (0)
27: mul u64 %r22d %r20d %r21d (0)
28: split u64 { %r38 %r39 } %r22d (0)
29: mov u32 %r23 %r38 (0)
30: mov u32 %r24 0x00000000 (0)
31: bra BB:3 (0)
---
<- BB:6 (back)
<- BB:2 (tree)
BB:3 (2 instructions) - df = { }
-> BB:5 (tree)
-> BB:4 (tree)
32: set u8 %p25 lt u32 %r24 %r1 (0)
33: %p25 bra BB:5 (0)
---
<- BB:3 (tree)
BB:4 (1 instructions) - df = { }
-> BB:7 (tree)
34: bra BB:7 (0)
---
<- BB:4 (tree)
BB:7 (1 instructions) - df = { }
-> BB:1 (tree)
35: bra BB:1 (0)
---
<- BB:7 (tree)
BB:1 (1 instructions) - df = { }
36: exit - # (0)
---
<- BB:3 (tree)
BB:5 (9 instructions) - df = { }
-> BB:6 (tree)
37: add u32 %r26 %r23 %r24 (0)
38: mov u32 %r40 0x00000000 (0)
39: merge u64 %r27d %r26 %r40 (0)
40: mov u64 %r28d 0x0000000000000000 (0)
41: mov u64 %r29d 0x0000000000000004 (0)
42: mad u64 %r28d %r29d %r27d %r28d (0)
43: add u64 %r30d %r0d %r28d (0)
44: st u32 # g[%r30d+0x0] %r26 (0)
45: bra BB:6 (0)
---
<- BB:5 (tree)
BB:6 (3 instructions) - df = { }
-> BB:3 (back)
46: add u32 %r31 %r24 0x00000001 (0)
47: mov u32 %r24 %r31 (0)
48: bra BB:3 (0)
MAIN:-1 ()
---
BB:0 (0 instructions) - df = { }
convertToSSA
loop_lt:10 ()
---
BB:0 (3 instructions) - df = { }
-> BB:2 (tree)
0: ld u64 %r41d c0[0x0] (0)
1: ld u32 %r42 c0[0x8] (0)
2: bra BB:2 (0)
---
<- BB:0 (tree)
BB:2 (29 instructions) - idom = BB:0, df = { }
-> BB:3 (tree)
3: rdsv u32 %r43 sv[TID:0] (0)
4: ld u32 %r44 c7[0x100] (0)
5: rdsv u32 %r45 sv[CTAID:0] (0)
6: add u32 %r46 %r43 0x00000000 (0)
7: mad u32 %r47 %r44 %r45 %r46 (0)
8: mov u32 %r48 0x00000000 (0)
9: merge u64 %r49d %r47 %r48 (0)
10: rdsv u32 %r50 sv[TID:1] (0)
11: ld u32 %r51 c7[0x104] (0)
12: rdsv u32 %r52 sv[CTAID:1] (0)
13: add u32 %r53 %r50 0x00000000 (0)
14: mad u32 %r54 %r51 %r52 %r53 (0)
15: mov u32 %r55 0x00000000 (0)
16: merge u64 %r56d %r54 %r55 (0)
17: rdsv u32 %r57 sv[TID:2] (0)
18: ld u32 %r58 c7[0x108] (0)
19: rdsv u32 %r59 sv[CTAID:2] (0)
20: add u32 %r60 %r57 0x00000000 (0)
21: mad u32 %r61 %r58 %r59 %r60 (0)
22: mov u32 %r62 0x00000000 (0)
23: merge u64 %r63d %r61 %r62 (0)
24: mov u64 %r64d %r49d (0)
25: mov u32 %r65 0x00000000 (0)
26: merge u64 %r66d %r42 %r65 (0)
27: mul u64 %r67d %r64d %r66d (0)
28: split u64 { %r68 %r69 } %r67d (0)
29: mov u32 %r70 %r68 (0)
30: mov u32 %r71 0x00000000 (0)
31: bra BB:3 (0)
---
<- BB:6 (back)
<- BB:2 (tree)
BB:3 (3 instructions) - idom = BB:2, df = { BB:3 }
-> BB:5 (tree)
-> BB:4 (tree)
32: phi u32 %r72 %r82 %r71 (0)
33: set u8 %p73 lt u32 %r72 %r42 (0)
34: %p73 bra BB:5 (0)
---
<- BB:3 (tree)
BB:4 (1 instructions) - idom = BB:3, df = { }
-> BB:7 (tree)
35: bra BB:7 (0)
---
<- BB:4 (tree)
BB:7 (1 instructions) - idom = BB:4, df = { }
-> BB:1 (tree)
36: bra BB:1 (0)
---
<- BB:7 (tree)
BB:1 (1 instructions) - idom = BB:7, df = { }
37: exit - # (0)
---
<- BB:3 (tree)
BB:5 (9 instructions) - idom = BB:3, df = { BB:3 }
-> BB:6 (tree)
38: add u32 %r74 %r70 %r72 (0)
39: mov u32 %r75 0x00000000 (0)
40: merge u64 %r76d %r74 %r75 (0)
41: mov u64 %r77d 0x0000000000000000 (0)
42: mov u64 %r78d 0x0000000000000004 (0)
43: mad u64 %r79d %r78d %r76d %r77d (0)
44: add u64 %r80d %r41d %r79d (0)
45: st u32 # g[%r80d+0x0] %r74 (0)
46: bra BB:6 (0)
---
<- BB:5 (tree)
BB:6 (3 instructions) - idom = BB:5, df = { BB:3 }
-> BB:3 (back)
47: add u32 %r81 %r72 0x00000001 (0)
48: mov u32 %r82 %r81 (0)
49: bra BB:3 (0)
MAIN:-1 ()
---
BB:0 (0 instructions) - df = { }
PEEPHOLE: DeadCodeElim
PEEPHOLE: CopyPropagation
PEEPHOLE: MergeSplits
PEEPHOLE: GlobalCSE
PEEPHOLE: LocalCSE
PEEPHOLE: AlgebraicOpt
PEEPHOLE: ModifierFolding
PEEPHOLE: ConstantFolding
PEEPHOLE: Split64BitOpPreRA
PEEPHOLE: LoadPropagation
PEEPHOLE: IndirectPropagation
PEEPHOLE: MemoryOpt
PEEPHOLE: LocalCSE
PEEPHOLE: DeadCodeElim
optimizeSSA
loop_lt:10 ()
---
BB:0 (2 instructions) - df = { }
-> BB:2 (tree)
0: ld u32 %r42 c0[0x8] (0)
1: bra BB:2 (0)
---
<- BB:0 (tree)
BB:2 (16 instructions) - idom = BB:0, df = { }
-> BB:3 (tree)
2: rdsv u32 %r43 sv[TID:0] (0)
3: rdsv u32 %r45 sv[CTAID:0] (0)
4: mov u32 %r46 %r43 (0)
5: mad u32 %r47 %r45 c7[0x100] %r46 (0)
6: mov u32 %r48 0x00000000 (0)
7: merge u64 %r49d %r47 %r48 (0)
8: merge u64 %r66d %r42 %r48 (0)
9: split u64 { %r84 %r85 } %r49d (0)
10: split u64 { %r86 %r87 } %r66d (0)
11: mul u32 %r88 %r85 %r86 (0)
12: mad u32 %r89 %r84 %r87 %r88 (0)
13: mad (SUBOP:1) u32 %r91 %r84 %r86 %r89 (0)
14: mul u32 %r90 %r84 %r86 (0)
15: merge u64 %r67d %r90 %r91 (0)
16: split u64 { %r68 %r69 } %r67d (0)
17: bra BB:3 (0)
---
<- BB:6 (back)
<- BB:2 (tree)
BB:3 (3 instructions) - idom = BB:2, df = { BB:3 }
-> BB:5 (tree)
-> BB:4 (tree)
18: phi u32 %r72 %r81 %r48 (0)
19: set u8 %p73 lt u32 %r72 c0[0x8] (0)
20: %p73 bra BB:5 (0)
---
<- BB:3 (tree)
BB:4 (1 instructions) - idom = BB:3, df = { }
-> BB:7 (tree)
21: bra BB:7 (0)
---
<- BB:4 (tree)
BB:7 (1 instructions) - idom = BB:4, df = { }
-> BB:1 (tree)
22: bra BB:1 (0)
---
<- BB:7 (tree)
BB:1 (1 instructions) - idom = BB:7, df = { }
23: exit - # (0)
---
<- BB:3 (tree)
BB:5 (14 instructions) - idom = BB:3, df = { BB:3 }
-> BB:6 (tree)
24: add u32 %r74 %r68 %r72 (0)
25: mov u32 %r75 0x00000000 (0)
26: merge u64 %r76d %r74 %r75 (0)
27: mov u64 %r78d 0x0000000000000004 (0)
28: split u64 { %r93 %r94 } %r78d (0)
29: split u64 { %r95 %r96 } %r76d (0)
30: mul u32 %r97 %r94 %r95 (0)
31: mad u32 %r98 %r93 %r96 %r97 (0)
32: mad (SUBOP:1) u32 %r100 %r93 %r95 %r98 (0)
33: mul u32 %r99 %r93 %r95 (0)
34: merge u64 %r79d %r99 %r100 (0)
35: add u64 %r80d %r79d c0[0x0] (0)
36: st u32 # g[%r80d+0x0] %r74 (0)
37: bra BB:6 (0)
---
<- BB:5 (tree)
BB:6 (2 instructions) - idom = BB:5, df = { BB:3 }
-> BB:3 (back)
38: add u32 %r81 %r72 0x00000001 (0)
39: bra BB:3 (0)
MAIN:-1 ()
---
BB:0 (0 instructions) - df = { }
legalizeSSA
loop_lt:10 ()
---
BB:0 (2 instructions) - df = { }
-> BB:2 (tree)
0: ld u32 %r42 c0[0x8] (0)
1: bra BB:2 (0)
---
<- BB:0 (tree)
BB:2 (16 instructions) - idom = BB:0, df = { }
-> BB:3 (tree)
2: rdsv u32 %r43 sv[TID:0] (0)
3: rdsv u32 %r45 sv[CTAID:0] (0)
4: mov u32 %r46 %r43 (0)
5: mad u32 %r47 %r45 c7[0x100] %r46 (0)
6: mov u32 %r48 0x00000000 (0)
7: merge u64 %r49d %r47 %r48 (0)
8: merge u64 %r66d %r42 %r48 (0)
9: split u64 { %r84 %r85 } %r49d (0)
10: split u64 { %r86 %r87 } %r66d (0)
11: mul u32 %r88 %r85 %r86 (0)
12: mad u32 %r89 %r84 %r87 %r88 (0)
13: mad (SUBOP:1) u32 %r91 %r84 %r86 %r89 (0)
14: mul u32 %r90 %r84 %r86 (0)
15: merge u64 %r67d %r90 %r91 (0)
16: split u64 { %r68 %r69 } %r67d (0)
17: bra BB:3 (0)
---
<- BB:6 (back)
<- BB:2 (tree)
BB:3 (3 instructions) - idom = BB:2, df = { BB:3 }
-> BB:5 (tree)
-> BB:4 (tree)
18: phi u32 %r72 %r81 %r48 (0)
19: set u8 %p73 lt u32 %r72 c0[0x8] (0)
20: %p73 bra BB:5 (0)
---
<- BB:3 (tree)
BB:4 (1 instructions) - idom = BB:3, df = { }
-> BB:7 (tree)
21: bra BB:7 (0)
---
<- BB:4 (tree)
BB:7 (1 instructions) - idom = BB:4, df = { }
-> BB:1 (tree)
22: bra BB:1 (0)
---
<- BB:7 (tree)
BB:1 (1 instructions) - idom = BB:7, df = { }
23: exit - # (0)
---
<- BB:3 (tree)
BB:5 (14 instructions) - idom = BB:3, df = { BB:3 }
-> BB:6 (tree)
24: add u32 %r74 %r68 %r72 (0)
25: mov u32 %r75 0x00000000 (0)
26: merge u64 %r76d %r74 %r75 (0)
27: mov u64 %r78d 0x0000000000000004 (0)
28: split u64 { %r93 %r94 } %r78d (0)
29: split u64 { %r95 %r96 } %r76d (0)
30: mul u32 %r97 %r94 %r95 (0)
31: mad u32 %r98 %r93 %r96 %r97 (0)
32: mad (SUBOP:1) u32 %r100 %r93 %r95 %r98 (0)
33: mul u32 %r99 %r93 %r95 (0)
34: merge u64 %r79d %r99 %r100 (0)
35: add u64 %r80d %r79d c0[0x0] (0)
36: st u32 # g[%r80d+0x0] %r74 (0)
37: bra BB:6 (0)
---
<- BB:5 (tree)
BB:6 (2 instructions) - idom = BB:5, df = { BB:3 }
-> BB:3 (back)
38: add u32 %r81 %r72 0x00000001 (0)
39: bra BB:3 (0)
MAIN:-1 ()
---
BB:0 (0 instructions) - df = { }
loop_lt:10 ()
---
BB:0 (2 instructions) - df = { }
-> BB:2 (tree)
0: ld u32 %r42 c0[0x8] (0)
1: bra BB:2 (0)
---
<- BB:0 (tree)
BB:2 (19 instructions) - idom = BB:0, df = { }
-> BB:3 (tree)
2: rdsv u32 %r43 sv[TID:0] (0)
3: rdsv u32 %r45 sv[CTAID:0] (0)
4: mov u32 %r46 %r43 (0)
5: mad u32 %r47 %r45 c7[0x100] %r46 (0)
6: mov u32 %r48 0x00000000 (0)
7: mov u32 %r101 %r48 (0)
8: merge u64 %r49d %r47 %r101 (0)
9: mov u32 %r102 %r48 (0)
10: merge u64 %r66d %r42 %r102 (0)
11: split u64 { %r84 %r85 } %r49d (0)
12: split u64 { %r86 %r87 } %r66d (0)
13: mul u32 %r88 %r85 %r86 (0)
14: mad u32 %r89 %r84 %r87 %r88 (0)
15: mad (SUBOP:1) u32 %r91 %r84 %r86 %r89 (0)
16: mul u32 %r90 %r84 %r86 (0)
17: merge u64 %r67d %r90 %r91 (0)
18: split u64 { %r68 %r69 } %r67d (0)
19: mov u32 %r105 %r48 (0)
20: bra BB:3 (0)
---
<- BB:6 (back)
<- BB:2 (tree)
BB:3 (3 instructions) - idom = BB:2, df = { BB:3 }
-> BB:5 (tree)
-> BB:4 (tree)
21: phi u32 %r72 %r104 %r105 (0)
22: set u8 %p73 lt u32 %r72 c0[0x8] (0)
23: %p73 bra BB:5 (0)
---
<- BB:3 (tree)
BB:4 (1 instructions) - idom = BB:3, df = { }
-> BB:7 (tree)
24: bra BB:7 (0)
---
<- BB:4 (tree)
BB:7 (1 instructions) - idom = BB:4, df = { }
-> BB:1 (tree)
25: bra BB:1 (0)
---
<- BB:7 (tree)
BB:1 (1 instructions) - idom = BB:7, df = { }
26: exit - # (0)
---
<- BB:3 (tree)
BB:5 (15 instructions) - idom = BB:3, df = { BB:3 }
-> BB:6 (tree)
27: add u32 %r74 %r68 %r72 (0)
28: mov u32 %r75 0x00000000 (0)
29: mov u32 %r103 %r74 (0)
30: merge u64 %r76d %r103 %r75 (0)
31: mov u64 %r78d 0x0000000000000004 (0)
32: split u64 { %r93 %r94 } %r78d (0)
33: split u64 { %r95 %r96 } %r76d (0)
34: mul u32 %r97 %r94 %r95 (0)
35: mad u32 %r98 %r93 %r96 %r97 (0)
36: mad (SUBOP:1) u32 %r100 %r93 %r95 %r98 (0)
37: mul u32 %r99 %r93 %r95 (0)
38: merge u64 %r79d %r99 %r100 (0)
39: add u64 %r80d %r79d c0[0x0] (0)
40: st u32 # g[%r80d+0x0] %r74 (0)
41: bra BB:6 (0)
---
<- BB:5 (tree)
BB:6 (3 instructions) - idom = BB:5, df = { BB:3 }
-> BB:3 (back)
42: add u32 %r81 %r72 0x00000001 (0)
43: mov u32 %r104 %r81 (0)
44: bra BB:3 (0)
buildLiveSets(BB:0)
buildLiveSets(BB:2)
buildLiveSets(BB:3)
buildLiveSets(BB:5)
buildLiveSets(BB:6)
BB:6 live set of out blocks:
BitSet of size 106:
BB:6 live set after propagation:
BitSet of size 106:
72
BB:5 live set of out blocks:
BitSet of size 106:
72
BB:5 live set after propagation:
BitSet of size 106:
68 72
buildLiveSets(BB:4)
buildLiveSets(BB:7)
buildLiveSets(BB:1)
BB:1 live set of out blocks:
BitSet of size 106:
BB:1 live set after propagation:
BitSet of size 106:
BB:7 live set of out blocks:
BitSet of size 106:
BB:7 live set after propagation:
BitSet of size 106:
BB:4 live set of out blocks:
BitSet of size 106:
BB:4 live set after propagation:
BitSet of size 106:
BB:3 live set of out blocks:
BitSet of size 106:
68 72
BB:3 live set after propagation:
BitSet of size 106:
68
BB:2 live set of out blocks:
BitSet of size 106:
68
BB:2 live set after propagation:
BitSet of size 106:
42
BB:0 live set of out blocks:
BitSet of size 106:
42
BB:0 live set after propagation:
BitSet of size 106:
BuildIntervals(BB:0)
%42 <- live range [0(0), 2)
BuildIntervals(BB:2)
%68 <- live range [18(18), 21)
%105 <- live range [19(19), 21)
%48 <- live range [6(6), 19)
%67 <- live range [17(17), 18)
%90 <- live range [16(16), 17)
%91 <- live range [15(15), 17)
%84 <- live range [11(11), 16)
%86 <- live range [12(12), 16)
%89 <- live range [14(14), 15)
%87 <- live range [12(12), 14)
%88 <- live range [13(13), 14)
%85 <- live range [11(11), 13)
%66 <- live range [10(10), 12)
%49 <- live range [8(8), 11)
%42 <- live range [2(0), 10)
%102 <- live range [9(9), 10)
%47 <- live range [5(5), 8)
%101 <- live range [7(7), 8)
%45 <- live range [3(3), 5)
%46 <- live range [4(4), 5)
%43 <- live range [2(2), 4)
BuildIntervals(BB:3)
%68 <- live range [22(18), 24)
%72 <- live range [22(21), 24)
%73 <- live range [22(22), 23)
BuildIntervals(BB:5)
%72 <- live range [27(21), 42)
%74 <- live range [27(27), 40)
%80 <- live range [39(39), 40)
%79 <- live range [38(38), 39)
%99 <- live range [37(37), 38)
%100 <- live range [36(36), 38)
%93 <- live range [32(32), 37)
%95 <- live range [33(33), 37)
%98 <- live range [35(35), 36)
%96 <- live range [33(33), 35)
%97 <- live range [34(34), 35)
%94 <- live range [32(32), 34)
%76 <- live range [30(30), 33)
%78 <- live range [31(31), 32)
%103 <- live range [29(29), 30)
%75 <- live range [28(28), 30)
%68 <- live range [27(18), 27)
BuildIntervals(BB:6)
%68 <- live range [42(18), 45)
%104 <- live range [43(43), 45)
%81 <- live range [42(42), 43)
%72 <- live range [42(21), 42)
BuildIntervals(BB:4)
BuildIntervals(BB:7)
BuildIntervals(BB:1)
allocateRegisters to 45 instructions
joining %72($-1) <- %104
joining %72($-1) <- %105
joining %49($-1) <- %47
joining %49($-1) <- %101
makeCompound(split = 0): merge u64 %r49d %r47 %r101 (0)
compound: %49:ff <- %47:55
compound: %49:ff <- %101:aa
joining %66($-1) <- %42
joining %66($-1) <- %102
makeCompound(split = 0): merge u64 %r66d %r42 %r102 (0)
compound: %66:ff <- %42:55
compound: %66:ff <- %102:aa
joining %49($-1) <- %84
joining %49($-1) <- %85
makeCompound(split = 1): split u64 { %r84 %r85 } %r49d (0)
compound: %49:ff <- %84:55
compound: %49:ff <- %85:aa
joining %66($-1) <- %86
joining %66($-1) <- %87
makeCompound(split = 1): split u64 { %r86 %r87 } %r66d (0)
compound: %66:ff <- %86:55
compound: %66:ff <- %87:aa
joining %67($-1) <- %90
joining %67($-1) <- %91
makeCompound(split = 0): merge u64 %r67d %r90 %r91 (0)
compound: %67:ff <- %90:55
compound: %67:ff <- %91:aa
joining %67($-1) <- %68
joining %67($-1) <- %69
makeCompound(split = 1): split u64 { %r68 %r69 } %r67d (0)
compound: %67:ff <- %68:55
compound: %67:ff <- %69:aa
joining %76($-1) <- %103
joining %76($-1) <- %75
makeCompound(split = 0): merge u64 %r76d %r103 %r75 (0)
compound: %76:ff <- %103:55
compound: %76:ff <- %75:aa
joining %78($-1) <- %93
joining %78($-1) <- %94
makeCompound(split = 1): split u64 { %r93 %r94 } %r78d (0)
compound: %78:ff <- %93:55
compound: %78:ff <- %94:aa
joining %76($-1) <- %95
joining %76($-1) <- %96
makeCompound(split = 1): split u64 { %r95 %r96 } %r76d (0)
compound: %76:ff <- %95:55
compound: %76:ff <- %96:aa
joining %79($-1) <- %99
joining %79($-1) <- %100
makeCompound(split = 0): merge u64 %r79d %r99 %r100 (0)
compound: %79:ff <- %99:55
compound: %79:ff <- %100:aa
joining %46($-1) <- %43
joining %72($-1) <- %48
joining %72($-1) <- %81
printing live intervals ...
livei(%42): [0 10)
livei(%43): [2 4)
livei(%45): [3 5)
livei(%46): [4 5)
livei(%47): [5 8)
livei(%48): [6 19)
livei(%49): [8 11)
livei(%66): [10 12)
livei(%67): [17 18)
livei(%68): [18 21) [22 24) [42 45)
livei(%72): [22 24) [27 42)
livei(%73): [22 23)
livei(%74): [27 40)
livei(%75): [28 30)
livei(%76): [30 33)
livei(%78): [31 32)
livei(%79): [38 39)
livei(%80): [39 40)
livei(%81): [42 43)
livei(%84): [11 16)
livei(%85): [11 13)
livei(%86): [12 16)
livei(%87): [12 14)
livei(%88): [13 14)
livei(%89): [14 15)
livei(%90): [16 17)
livei(%91): [15 17)
livei(%93): [32 37)
livei(%94): [32 34)
livei(%95): [33 37)
livei(%96): [33 35)
livei(%97): [34 35)
livei(%98): [35 36)
livei(%99): [37 38)
livei(%100): [36 38)
livei(%101): [7 8)
livei(%102): [9 10)
livei(%103): [29 30)
livei(%104): [43 45)
livei(%105): [19 21)
RIG_Node[%0]($[1]-1): 2 colors, weight inf, deg 0/62
X
RIG_Node[%1]($[1]-1): 1 colors, weight inf, deg 0/63
X
RIG_Node[%2]($[1]-1): 1 colors, weight inf, deg 0/63
X
RIG_Node[%3]($[1]-1): 1 colors, weight inf, deg 0/63
X
RIG_Node[%4]($[1]-1): 1 colors, weight inf, deg 0/63
X
RIG_Node[%5]($[1]-1): 1 colors, weight inf, deg 0/63
X
RIG_Node[%6]($[1]-1): 1 colors, weight inf, deg 0/63
X
RIG_Node[%7]($[1]-1): 2 colors, weight inf, deg 0/62
X
RIG_Node[%8]($[1]-1): 1 colors, weight inf, deg 0/63
X
RIG_Node[%9]($[1]-1): 1 colors, weight inf, deg 0/63
X
RIG_Node[%10]($[1]-1): 1 colors, weight inf, deg 0/63
X
RIG_Node[%11]($[1]-1): 1 colors, weight inf, deg 0/63
X
RIG_Node[%12]($[1]-1): 1 colors, weight inf, deg 0/63
X
RIG_Node[%13]($[1]-1): 2 colors, weight inf, deg 0/62
X
RIG_Node[%14]($[1]-1): 1 colors, weight inf, deg 0/63
X
RIG_Node[%15]($[1]-1): 1 colors, weight inf, deg 0/63
X
RIG_Node[%16]($[1]-1): 1 colors, weight inf, deg 0/63
X
RIG_Node[%17]($[1]-1): 1 colors, weight inf, deg 0/63
X
RIG_Node[%18]($[1]-1): 1 colors, weight inf, deg 0/63
X
RIG_Node[%19]($[1]-1): 2 colors, weight inf, deg 0/62
X
RIG_Node[%20]($[1]-1): 2 colors, weight inf, deg 0/62
X
RIG_Node[%21]($[1]-1): 2 colors, weight inf, deg 0/62
X
RIG_Node[%22]($[1]-1): 2 colors, weight inf, deg 0/62
X
RIG_Node[%23]($[1]-1): 1 colors, weight inf, deg 0/63
X
RIG_Node[%24]($[1]-1): 1 colors, weight inf, deg 0/63
X
RIG_Node[%25]($[2]-1): 1 colors, weight inf, deg 0/7
X
RIG_Node[%26]($[1]-1): 1 colors, weight inf, deg 0/63
X
RIG_Node[%27]($[1]-1): 2 colors, weight inf, deg 0/62
X
RIG_Node[%28]($[1]-1): 2 colors, weight inf, deg 0/62
X
RIG_Node[%29]($[1]-1): 2 colors, weight inf, deg 0/62
X
RIG_Node[%30]($[1]-1): 2 colors, weight inf, deg 0/62
X
RIG_Node[%31]($[1]-1): 1 colors, weight inf, deg 0/63
X
RIG_Node[%32]($[1]-1): 1 colors, weight inf, deg 0/63
X
RIG_Node[%33]($[1]-1): 1 colors, weight inf, deg 0/63
X
RIG_Node[%34]($[1]-1): 1 colors, weight inf, deg 0/63
X
RIG_Node[%35]($[1]-1): 1 colors, weight inf, deg 0/63
X
RIG_Node[%36]($[1]-1): 1 colors, weight inf, deg 0/63
X
RIG_Node[%37]($[1]-1): 1 colors, weight inf, deg 0/63
X
RIG_Node[%38]($[1]-1): 1 colors, weight inf, deg 0/63
X
RIG_Node[%39]($[1]-1): 1 colors, weight inf, deg 0/63
X
RIG_Node[%40]($[1]-1): 1 colors, weight inf, deg 0/63
X
RIG_Node[%41]($[1]-1): 2 colors, weight inf, deg 0/62
X
RIG_Node[%42]($[1]-1): 1 colors, weight inf, deg 0/63
X
RIG_Node[%43]($[1]-1): 1 colors, weight inf, deg 0/63
X
RIG_Node[%44]($[1]-1): 1 colors, weight inf, deg 0/63
X
RIG_Node[%45]($[1]-1): 1 colors, weight 0.500000, deg 3/63
X %46 %66
RIG_Node[%46]($[1]-1): 1 colors, weight 1.333333, deg 3/63
X %66 %45
RIG_Node[%47]($[1]-1): 1 colors, weight inf, deg 0/63
X
RIG_Node[%48]($[1]-1): 1 colors, weight inf, deg 0/63
X
RIG_Node[%49]($[1]-1): 2 colors, weight inf, deg 10/62
X %66 %67 %89 %88 %72
RIG_Node[%50]($[1]-1): 1 colors, weight inf, deg 0/63
X
RIG_Node[%51]($[1]-1): 1 colors, weight inf, deg 0/63
X
RIG_Node[%52]($[1]-1): 1 colors, weight inf, deg 0/63
X
RIG_Node[%53]($[1]-1): 1 colors, weight inf, deg 0/63
X
RIG_Node[%54]($[1]-1): 1 colors, weight inf, deg 0/63
X
RIG_Node[%55]($[1]-1): 1 colors, weight inf, deg 0/63
X
RIG_Node[%56]($[1]-1): 2 colors, weight inf, deg 0/62
X
RIG_Node[%57]($[1]-1): 1 colors, weight inf, deg 0/63
X
RIG_Node[%58]($[1]-1): 1 colors, weight inf, deg 0/63
X
RIG_Node[%59]($[1]-1): 1 colors, weight inf, deg 0/63
X
RIG_Node[%60]($[1]-1): 1 colors, weight inf, deg 0/63
X
RIG_Node[%61]($[1]-1): 1 colors, weight inf, deg 0/63
X
RIG_Node[%62]($[1]-1): 1 colors, weight inf, deg 0/63
X
RIG_Node[%63]($[1]-1): 2 colors, weight inf, deg 0/62
X
RIG_Node[%64]($[1]-1): 2 colors, weight inf, deg 0/62
X
RIG_Node[%65]($[1]-1): 1 colors, weight inf, deg 0/63
X
RIG_Node[%66]($[1]-1): 2 colors, weight inf, deg 14/62
X %67 %89 %88 %72 %49 %45 %46
RIG_Node[%67]($[1]-1): 2 colors, weight 0.533333, deg 6/62
X %72 %49 %66
RIG_Node[%68]($[1]-1): 1 colors, weight inf, deg 0/63
X
RIG_Node[%69]($[1]-1): 1 colors, weight inf, deg 0/63
X
RIG_Node[%70]($[1]-1): 1 colors, weight inf, deg 0/63
X
RIG_Node[%71]($[1]-1): 1 colors, weight inf, deg 0/63
X
RIG_Node[%72]($[1]-1): 1 colors, weight 2.076923, deg 19/63
X %49 %66 %80 %79 %98 %97 %78 %76 %74 %67 %89 %88
RIG_Node[%73]($[2]-1): 1 colors, weight 1.000000, deg 0/7
X
RIG_Node[%74]($[1]-1): 1 colors, weight 0.307692, deg 11/63
X %72 %80 %79 %98 %97 %78 %76
RIG_Node[%75]($[1]-1): 1 colors, weight inf, deg 0/63
X
RIG_Node[%76]($[1]-1): 2 colors, weight inf, deg 12/62
X %74 %72 %79 %98 %97 %78
RIG_Node[%77]($[1]-1): 2 colors, weight inf, deg 0/62
X
RIG_Node[%78]($[1]-1): 2 colors, weight 4.166667, deg 12/62
X %76 %74 %72 %79 %98 %97
RIG_Node[%79]($[1]-1): 2 colors, weight 3.000000, deg 8/62
X %78 %76 %74 %72
RIG_Node[%80]($[1]-1): 2 colors, weight 1.000000, deg 4/62
X %74 %72
RIG_Node[%81]($[1]-1): 1 colors, weight inf, deg 0/63
X
RIG_Node[%82]($[1]-1): 1 colors, weight inf, deg 0/63
X
RIG_Node[%83]($[3]-1): 1 colors, weight inf, deg 0/1
X
RIG_Node[%84]($[1]-1): 1 colors, weight inf, deg 0/63
X
RIG_Node[%85]($[1]-1): 1 colors, weight inf, deg 0/63
X
RIG_Node[%86]($[1]-1): 1 colors, weight inf, deg 0/63
X
RIG_Node[%87]($[1]-1): 1 colors, weight inf, deg 0/63
X
RIG_Node[%88]($[1]-1): 1 colors, weight 1.000000, deg 5/63
X %72 %49 %66
RIG_Node[%89]($[1]-1): 1 colors, weight 1.000000, deg 5/63
X %72 %49 %66
RIG_Node[%90]($[1]-1): 1 colors, weight inf, deg 0/63
X
RIG_Node[%91]($[1]-1): 1 colors, weight inf, deg 0/63
X
RIG_Node[%92]($[3]-1): 1 colors, weight inf, deg 0/1
X
RIG_Node[%93]($[1]-1): 1 colors, weight inf, deg 0/63
X
RIG_Node[%94]($[1]-1): 1 colors, weight inf, deg 0/63
X
RIG_Node[%95]($[1]-1): 1 colors, weight inf, deg 0/63
X
RIG_Node[%96]($[1]-1): 1 colors, weight inf, deg 0/63
X
RIG_Node[%97]($[1]-1): 1 colors, weight 1.000000, deg 6/63
X %78 %76 %74 %72
RIG_Node[%98]($[1]-1): 1 colors, weight 1.000000, deg 6/63
X %78 %76 %74 %72
RIG_Node[%99]($[1]-1): 1 colors, weight inf, deg 0/63
X
RIG_Node[%100]($[1]-1): 1 colors, weight inf, deg 0/63
X
RIG_Node[%101]($[1]-1): 1 colors, weight inf, deg 0/63
X
RIG_Node[%102]($[1]-1): 1 colors, weight inf, deg 0/63
X
RIG_Node[%103]($[1]-1): 1 colors, weight inf, deg 0/63
X
RIG_Node[%104]($[1]-1): 1 colors, weight inf, deg 0/63
X
RIG_Node[%105]($[1]-1): 1 colors, weight inf, deg 0/63
X
edge: (%98, deg 6/63) >-< (%78, deg 12/62)
edge: (%98, deg 6/63) >-< (%76, deg 12/62)
edge: (%98, deg 6/63) >-< (%74, deg 11/63)
edge: (%98, deg 6/63) >-< (%72, deg 19/63)
SIMPLIFY: pushed %98
edge: (%97, deg 6/63) >-< (%78, deg 10/62)
edge: (%97, deg 6/63) >-< (%76, deg 10/62)
edge: (%97, deg 6/63) >-< (%74, deg 10/63)
edge: (%97, deg 6/63) >-< (%72, deg 18/63)
SIMPLIFY: pushed %97
edge: (%89, deg 5/63) >-< (%72, deg 17/63)
edge: (%89, deg 5/63) >-< (%49, deg 10/62)
edge: (%89, deg 5/63) >-< (%66, deg 14/62)
SIMPLIFY: pushed %89
edge: (%88, deg 5/63) >-< (%72, deg 16/63)
edge: (%88, deg 5/63) >-< (%49, deg 8/62)
edge: (%88, deg 5/63) >-< (%66, deg 12/62)
SIMPLIFY: pushed %88
edge: (%74, deg 9/63) >-< (%72, deg 15/63)
edge: (%74, deg 9/63) >-< (%80, deg 4/62)
edge: (%74, deg 9/63) >-< (%79, deg 8/62)
edge: (%74, deg 9/63) >-< (%98, deg 6/63)
edge: (%74, deg 9/63) >-< (%97, deg 6/63)
edge: (%74, deg 9/63) >-< (%78, deg 8/62)
edge: (%74, deg 9/63) >-< (%76, deg 8/62)
SIMPLIFY: pushed %74
SIMPLIFY: pushed %73
edge: (%72, deg 14/63) >-< (%49, deg 6/62)
edge: (%72, deg 14/63) >-< (%66, deg 10/62)
edge: (%72, deg 14/63) >-< (%80, deg 2/62)
edge: (%72, deg 14/63) >-< (%79, deg 6/62)
edge: (%72, deg 14/63) >-< (%98, deg 5/63)
edge: (%72, deg 14/63) >-< (%97, deg 5/63)
edge: (%72, deg 14/63) >-< (%78, deg 6/62)
edge: (%72, deg 14/63) >-< (%76, deg 6/62)
edge: (%72, deg 14/63) >-< (%74, deg 9/63)
edge: (%72, deg 14/63) >-< (%67, deg 6/62)
edge: (%72, deg 14/63) >-< (%89, deg 5/63)
edge: (%72, deg 14/63) >-< (%88, deg 5/63)
SIMPLIFY: pushed %72
edge: (%46, deg 3/63) >-< (%66, deg 8/62)
edge: (%46, deg 3/63) >-< (%45, deg 3/63)
SIMPLIFY: pushed %46
edge: (%45, deg 2/63) >-< (%46, deg 3/63)
edge: (%45, deg 2/63) >-< (%66, deg 6/62)
SIMPLIFY: pushed %45
edge: (%80, deg 0/62) >-< (%74, deg 8/63)
edge: (%80, deg 0/62) >-< (%72, deg 14/63)
SIMPLIFY: pushed %80
edge: (%79, deg 4/62) >-< (%78, deg 4/62)
edge: (%79, deg 4/62) >-< (%76, deg 4/62)
edge: (%79, deg 4/62) >-< (%74, deg 6/63)
edge: (%79, deg 4/62) >-< (%72, deg 12/63)
SIMPLIFY: pushed %79
edge: (%78, deg 2/62) >-< (%76, deg 2/62)
edge: (%78, deg 2/62) >-< (%74, deg 4/63)
edge: (%78, deg 2/62) >-< (%72, deg 10/63)
edge: (%78, deg 2/62) >-< (%79, deg 4/62)
edge: (%78, deg 2/62) >-< (%98, deg 4/63)
edge: (%78, deg 2/62) >-< (%97, deg 4/63)
SIMPLIFY: pushed %78
edge: (%76, deg 0/62) >-< (%74, deg 2/63)
edge: (%76, deg 0/62) >-< (%72, deg 8/63)
edge: (%76, deg 0/62) >-< (%79, deg 2/62)
edge: (%76, deg 0/62) >-< (%98, deg 2/63)
edge: (%76, deg 0/62) >-< (%97, deg 2/63)
edge: (%76, deg 0/62) >-< (%78, deg 2/62)
SIMPLIFY: pushed %76
edge: (%67, deg 4/62) >-< (%72, deg 6/63)
edge: (%67, deg 4/62) >-< (%49, deg 4/62)
edge: (%67, deg 4/62) >-< (%66, deg 4/62)
SIMPLIFY: pushed %67
edge: (%66, deg 2/62) >-< (%67, deg 4/62)
edge: (%66, deg 2/62) >-< (%89, deg 4/63)
edge: (%66, deg 2/62) >-< (%88, deg 4/63)
edge: (%66, deg 2/62) >-< (%72, deg 4/63)
edge: (%66, deg 2/62) >-< (%49, deg 2/62)
edge: (%66, deg 2/62) >-< (%45, deg 2/63)
edge: (%66, deg 2/62) >-< (%46, deg 2/63)
SIMPLIFY: pushed %66
edge: (%49, deg 0/62) >-< (%66, deg 2/62)
edge: (%49, deg 0/62) >-< (%67, deg 2/62)
edge: (%49, deg 0/62) >-< (%89, deg 2/63)
edge: (%49, deg 0/62) >-< (%88, deg 2/63)
edge: (%49, deg 0/62) >-< (%72, deg 2/63)
SIMPLIFY: pushed %49
SELECT phase
NODE[%49, 2 colors]
GPR:BitSet of size 63:
assigned reg 0
NODE[%66, 2 colors]
(%66)ff X (%49)03 & 03: $r0.03
(%66) X (%47): no overlap
(%66) X (%101): no overlap
(%66)ff X (%84)55 & 03: $r0.01
(%66)ff X (%85)aa & 03: $r0.02
(%42)55 X (%49)03 & 03: $r0.01
(%42)55 X (%47)55 & 03: $r0.01
(%42)55 X (%101)aa & 03: $r0.00
(%42) X (%84): no overlap
(%42) X (%85): no overlap
(%102)aa X (%49)03 & 03: $r0.02
(%102) X (%47): no overlap
(%102) X (%101): no overlap
(%102) X (%84): no overlap
(%102) X (%85): no overlap
(%86) X (%49): no overlap
(%86) X (%47): no overlap
(%86) X (%101): no overlap
(%86)55 X (%84)55 & 03: $r0.01
(%86)55 X (%85)aa & 03: $r0.00
(%87) X (%49): no overlap
(%87) X (%47): no overlap
(%87) X (%101): no overlap
(%87)aa X (%84)55 & 03: $r0.00
(%87)aa X (%85)aa & 03: $r0.02
GPR:BitSet of size 63:
0 1
assigned reg 2
NODE[%67, 2 colors]
(%67) X (%49): no overlap
(%67) X (%47): no overlap
(%67) X (%101): no overlap
(%67) X (%84): no overlap
(%67) X (%85): no overlap
(%90) X (%49): no overlap
(%90) X (%47): no overlap
(%90) X (%101): no overlap
(%90) X (%84): no overlap
(%90) X (%85): no overlap
(%91) X (%49): no overlap
(%91) X (%47): no overlap
(%91) X (%101): no overlap
(%91)aa X (%84)55 & 03: $r0.00
(%91) X (%85): no overlap
(%68) X (%49): no overlap
(%68) X (%47): no overlap
(%68) X (%101): no overlap
(%68) X (%84): no overlap
(%68) X (%85): no overlap
(%69) X (%49): no overlap
(%69) X (%47): no overlap
(%69) X (%101): no overlap
(%69) X (%84): no overlap
(%69) X (%85): no overlap
(%67) X (%66): no overlap
(%67) X (%42): no overlap
(%67) X (%102): no overlap
(%67) X (%86): no overlap
(%67) X (%87): no overlap
(%90) X (%66): no overlap
(%90) X (%42): no overlap
(%90) X (%102): no overlap
(%90) X (%86): no overlap
(%90) X (%87): no overlap
(%91) X (%66): no overlap
(%91) X (%42): no overlap
(%91) X (%102): no overlap
(%91)aa X (%86)55 & 0c: $r0.00
(%91) X (%87): no overlap
(%68) X (%66): no overlap
(%68) X (%42): no overlap
(%68) X (%102): no overlap
(%68) X (%86): no overlap
(%68) X (%87): no overlap
(%69) X (%66): no overlap
(%69) X (%42): no overlap
(%69) X (%102): no overlap
(%69) X (%86): no overlap
(%69) X (%87): no overlap
GPR:BitSet of size 63:
assigned reg 0
NODE[%76, 2 colors]
GPR:BitSet of size 63:
assigned reg 0
NODE[%78, 2 colors]
(%78)ff X (%76)03 & 03: $r0.03
(%78) X (%103): no overlap
(%78) X (%75): no overlap
(%78) X (%95): no overlap
(%78) X (%96): no overlap
(%93)55 X (%76)03 & 03: $r0.01
(%93) X (%103): no overlap
(%93) X (%75): no overlap
(%93)55 X (%95)55 & 03: $r0.01
(%93)55 X (%96)aa & 03: $r0.00
(%94)aa X (%76)03 & 03: $r0.02
(%94) X (%103): no overlap
(%94) X (%75): no overlap
(%94)aa X (%95)55 & 03: $r0.00
(%94)aa X (%96)aa & 03: $r0.02
GPR:BitSet of size 63:
0 1
assigned reg 2
NODE[%79, 2 colors]
(%79) X (%78): no overlap
(%79) X (%93): no overlap
(%79) X (%94): no overlap
(%99) X (%78): no overlap
(%99) X (%93): no overlap
(%99) X (%94): no overlap
(%100) X (%78): no overlap
(%100)aa X (%93)55 & 0c: $r0.00
(%100) X (%94): no overlap
(%79) X (%76): no overlap
(%79) X (%103): no overlap
(%79) X (%75): no overlap
(%79) X (%95): no overlap
(%79) X (%96): no overlap
(%99) X (%76): no overlap
(%99) X (%103): no overlap
(%99) X (%75): no overlap
(%99) X (%95): no overlap
(%99) X (%96): no overlap
(%100) X (%76): no overlap
(%100) X (%103): no overlap
(%100) X (%75): no overlap
(%100)aa X (%95)55 & 03: $r0.00
(%100) X (%96): no overlap
GPR:BitSet of size 63:
assigned reg 0
NODE[%80, 2 colors]
GPR:BitSet of size 63:
assigned reg 0
NODE[%45, 1 colors]
(%45) X (%66): no overlap
(%45)ff X (%42)55 & 0c: $r0.04
(%45) X (%102): no overlap
(%45) X (%86): no overlap
(%45) X (%87): no overlap
GPR:BitSet of size 63:
2
assigned reg 0
NODE[%46, 1 colors]
(%46) X (%66): no overlap
(%46)ff X (%42)55 & 0c: $r0.04
(%46) X (%102): no overlap
(%46) X (%86): no overlap
(%46) X (%87): no overlap
(%43) X (%66): no overlap
(%43)ff X (%42)55 & 0c: $r0.04
(%43) X (%102): no overlap
(%43) X (%86): no overlap
(%43) X (%87): no overlap
(%46) X (%45): $r0 + 1
GPR:BitSet of size 63:
0 2
assigned reg 1
NODE[%72, 1 colors]
(%72) X (%49): no overlap
(%72) X (%47): no overlap
(%72) X (%101): no overlap
(%72) X (%84): no overlap
(%72) X (%85): no overlap
(%104) X (%49): no overlap
(%104) X (%47): no overlap
(%104) X (%101): no overlap
(%104) X (%84): no overlap
(%104) X (%85): no overlap
(%105) X (%49): no overlap
(%105) X (%47): no overlap
(%105) X (%101): no overlap
(%105) X (%84): no overlap
(%105) X (%85): no overlap
(%48)ff X (%49)03 & 03: $r0.03
(%48)ff X (%47)55 & 03: $r0.01
(%48)ff X (%101)aa & 03: $r0.02
(%48)ff X (%84)55 & 03: $r0.01
(%48)ff X (%85)aa & 03: $r0.02
(%81) X (%49): no overlap
(%81) X (%47): no overlap
(%81) X (%101): no overlap
(%81) X (%84): no overlap
(%81) X (%85): no overlap
(%72) X (%66): no overlap
(%72) X (%42): no overlap
(%72) X (%102): no overlap
(%72) X (%86): no overlap
(%72) X (%87): no overlap
(%104) X (%66): no overlap
(%104) X (%42): no overlap
(%104) X (%102): no overlap
(%104) X (%86): no overlap
(%104) X (%87): no overlap
(%105) X (%66): no overlap
(%105) X (%42): no overlap
(%105) X (%102): no overlap
(%105) X (%86): no overlap
(%105) X (%87): no overlap
(%48)ff X (%66)0c & 0c: $r0.0c
(%48)ff X (%42)55 & 0c: $r0.04
(%48)ff X (%102)aa & 0c: $r0.08
(%48)ff X (%86)55 & 0c: $r0.04
(%48)ff X (%87)aa & 0c: $r0.08
(%81) X (%66): no overlap
(%81) X (%42): no overlap
(%81) X (%102): no overlap
(%81) X (%86): no overlap
(%81) X (%87): no overlap
(%72) X (%80): $r0 + 2
(%72)ff X (%79)03 & 03: $r0.03
(%72)ff X (%99)55 & 03: $r0.01
(%72)ff X (%100)aa & 03: $r0.02
(%104) X (%79): no overlap
(%104) X (%99): no overlap
(%104) X (%100): no overlap
(%105) X (%79): no overlap
(%105) X (%99): no overlap
(%105) X (%100): no overlap
(%48) X (%79): no overlap
(%48) X (%99): no overlap
(%48) X (%100): no overlap
(%81) X (%79): no overlap
(%81) X (%99): no overlap
(%81) X (%100): no overlap
(%72)ff X (%78)0c & 0c: $r0.0c
(%72)ff X (%93)55 & 0c: $r0.04
(%72)ff X (%94)aa & 0c: $r0.08
(%104) X (%78): no overlap
(%104) X (%93): no overlap
(%104) X (%94): no overlap
(%105) X (%78): no overlap
(%105) X (%93): no overlap
(%105) X (%94): no overlap
(%48) X (%78): no overlap
(%48) X (%93): no overlap
(%48) X (%94): no overlap
(%81) X (%78): no overlap
(%81) X (%93): no overlap
(%81) X (%94): no overlap
(%72)ff X (%76)03 & 03: $r0.03
(%72)ff X (%103)55 & 03: $r0.01
(%72)ff X (%75)aa & 03: $r0.02
(%72)ff X (%95)55 & 03: $r0.01
(%72)ff X (%96)aa & 03: $r0.02
(%104) X (%76): no overlap
(%104) X (%103): no overlap
(%104) X (%75): no overlap
(%104) X (%95): no overlap
(%104) X (%96): no overlap
(%105) X (%76): no overlap
(%105) X (%103): no overlap
(%105) X (%75): no overlap
(%105) X (%95): no overlap
(%105) X (%96): no overlap
(%48) X (%76): no overlap
(%48) X (%103): no overlap
(%48) X (%75): no overlap
(%48) X (%95): no overlap
(%48) X (%96): no overlap
(%81) X (%76): no overlap
(%81) X (%103): no overlap
(%81) X (%75): no overlap
(%81) X (%95): no overlap
(%81) X (%96): no overlap
(%72) X (%67): no overlap
(%72) X (%90): no overlap
(%72) X (%91): no overlap
(%72)ff X (%68)55 & 03: $r0.01
(%72) X (%69): no overlap
(%104) X (%67): no overlap
(%104) X (%90): no overlap
(%104) X (%91): no overlap
(%104)ff X (%68)55 & 03: $r0.01
(%104) X (%69): no overlap
(%105) X (%67): no overlap
(%105) X (%90): no overlap
(%105) X (%91): no overlap
(%105)ff X (%68)55 & 03: $r0.01
(%105) X (%69): no overlap
(%48)ff X (%67)03 & 03: $r0.03
(%48)ff X (%90)55 & 03: $r0.01
(%48)ff X (%91)aa & 03: $r0.02
(%48)ff X (%68)55 & 03: $r0.01
(%48) X (%69): no overlap
(%81) X (%67): no overlap
(%81) X (%90): no overlap
(%81) X (%91): no overlap
(%81)ff X (%68)55 & 03: $r0.01
(%81) X (%69): no overlap
GPR:BitSet of size 63:
0 1 2 3
assigned reg 4
NODE[%73, 1 colors]
GPR:BitSet of size 7:
assigned reg 0
NODE[%74, 1 colors]
(%74) X (%72): $r4 + 1
(%74) X (%80): $r0 + 2
(%74)ff X (%79)03 & 03: $r0.03
(%74)ff X (%99)55 & 03: $r0.01
(%74)ff X (%100)aa & 03: $r0.02
(%74)ff X (%78)0c & 0c: $r0.0c
(%74)ff X (%93)55 & 0c: $r0.04
(%74)ff X (%94)aa & 0c: $r0.08
(%74)ff X (%76)03 & 03: $r0.03
(%74)ff X (%103)55 & 03: $r0.01
(%74)ff X (%75)aa & 03: $r0.02
(%74)ff X (%95)55 & 03: $r0.01
(%74)ff X (%96)aa & 03: $r0.02
GPR:BitSet of size 63:
0 1 2 3 4
assigned reg 5
NODE[%88, 1 colors]
(%88) X (%72): $r4 + 1
(%88) X (%49): no overlap
(%88) X (%47): no overlap
(%88) X (%101): no overlap
(%88)ff X (%84)55 & 03: $r0.01
(%88) X (%85): no overlap
(%88) X (%66): no overlap
(%88) X (%42): no overlap
(%88) X (%102): no overlap
(%88)ff X (%86)55 & 0c: $r0.04
(%88)ff X (%87)aa & 0c: $r0.08
GPR:BitSet of size 63:
0 2 3 4
assigned reg 1
NODE[%89, 1 colors]
(%89) X (%72): $r4 + 1
(%89) X (%49): no overlap
(%89) X (%47): no overlap
(%89) X (%101): no overlap
(%89)ff X (%84)55 & 03: $r0.01
(%89) X (%85): no overlap
(%89) X (%66): no overlap
(%89) X (%42): no overlap
(%89) X (%102): no overlap
(%89)ff X (%86)55 & 0c: $r0.04
(%89) X (%87): no overlap
GPR:BitSet of size 63:
0 2 4
assigned reg 1
NODE[%97, 1 colors]
(%97) X (%78): no overlap
(%97)ff X (%93)55 & 0c: $r0.04
(%97) X (%94): no overlap
(%97) X (%76): no overlap
(%97) X (%103): no overlap
(%97) X (%75): no overlap
(%97)ff X (%95)55 & 03: $r0.01
(%97)ff X (%96)aa & 03: $r0.02
(%97) X (%74): $r5 + 1
(%97) X (%72): $r4 + 1
GPR:BitSet of size 63:
0 1 2 4 5
assigned reg 3
NODE[%98, 1 colors]
(%98) X (%78): no overlap
(%98)ff X (%93)55 & 0c: $r0.04
(%98) X (%94): no overlap
(%98) X (%76): no overlap
(%98) X (%103): no overlap
(%98) X (%75): no overlap
(%98)ff X (%95)55 & 03: $r0.01
(%98) X (%96): no overlap
(%98) X (%74): $r5 + 1
(%98) X (%72): $r4 + 1
GPR:BitSet of size 63:
0 2 4 5
assigned reg 1
RegAlloc done: 1
MAIN:-1 ()
---
BB:0 (0 instructions) - df = { }
buildLiveSets(BB:0)
BB:0 live set of out blocks:
BitSet of size 0:
BB:0 live set after propagation:
BitSet of size 0:
BuildIntervals(BB:0)
allocateRegisters to 0 instructions
printing live intervals ...
SELECT phase
RegAlloc done: 1
RA
loop_lt:10 ()
---
BB:0 (2 instructions) - df = { }
-> BB:2 (tree)
0: ld u32 $r2 c0[0x8] (0)
1: bra BB:2 (0)
---
<- BB:0 (tree)
BB:2 (19 instructions) - idom = BB:0, df = { }
-> BB:3 (tree)
2: rdsv u32 $r1 sv[TID:0] (0)
3: rdsv u32 $r0 sv[CTAID:0] (0)
4: mov u32 $r1 $r1 (0)
5: mad u32 $r0 $r0 c7[0x100] $r1 (0)
6: mov u32 $r4 0x00000000 (0)
7: mov u32 $r1 $r4 (0)
8: merge u64 $r0d $r0 $r1 (0)
9: mov u32 $r3 $r4 (0)
10: merge u64 $r2d $r2 $r3 (0)
11: split u64 { $r0 $r1 } $r0d (0)
12: split u64 { $r2 $r3 } $r2d (0)
13: mul u32 $r1 $r1 $r2 (0)
14: mad u32 $r1 $r0 $r3 $r1 (0)
15: mad (SUBOP:1) u32 $r1 $r0 $r2 $r1 (0)
16: mul u32 $r0 $r0 $r2 (0)
17: merge u64 $r0d $r0 $r1 (0)
18: split u64 { $r0 $r1 } $r0d (0)
19: mov u32 $r4 $r4 (0)
20: bra BB:3 (0)
---
<- BB:6 (back)
<- BB:2 (tree)
BB:3 (3 instructions) - idom = BB:2, df = { BB:3 }
-> BB:5 (tree)
-> BB:4 (tree)
21: phi u32 $r4 $r4 $r4 (0)
22: set u8 $p0 lt u32 $r4 c0[0x8] (0)
23: $p0 bra BB:5 (0)
---
<- BB:3 (tree)
BB:4 (1 instructions) - idom = BB:3, df = { }
-> BB:7 (tree)
24: bra BB:7 (0)
---
<- BB:4 (tree)
BB:7 (1 instructions) - idom = BB:4, df = { }
-> BB:1 (tree)
25: bra BB:1 (0)
---
<- BB:7 (tree)
BB:1 (1 instructions) - idom = BB:7, df = { }
26: exit - # (0)
---
<- BB:3 (tree)
BB:5 (15 instructions) - idom = BB:3, df = { BB:3 }
-> BB:6 (tree)
27: add u32 $r5 $r0 $r4 (0)
28: mov u32 $r1 0x00000000 (0)
29: mov u32 $r0 $r5 (0)
30: merge u64 $r0d $r0 $r1 (0)
31: mov u64 $r2d 0x0000000000000004 (0)
32: split u64 { $r2 $r3 } $r2d (0)
33: split u64 { $r0 $r1 } $r0d (0)
34: mul u32 $r3 $r3 $r0 (0)
35: mad u32 $r1 $r2 $r1 $r3 (0)
36: mad (SUBOP:1) u32 $r1 $r2 $r0 $r1 (0)
37: mul u32 $r0 $r2 $r0 (0)
38: merge u64 $r0d $r0 $r1 (0)
39: add u64 $r0d $r0d c0[0x0] (0)
40: st u32 # g[$r0d+0x0] $r5 (0)
41: bra BB:6 (0)
---
<- BB:5 (tree)
BB:6 (3 instructions) - idom = BB:5, df = { BB:3 }
-> BB:3 (back)
42: add u32 $r4 $r4 0x00000001 (0)
43: mov u32 $r4 $r4 (0)
44: bra BB:3 (0)
MAIN:-1 ()
---
BB:0 (0 instructions) - df = { }
legalizePostRA
loop_lt:10 ()
---
BB:0 (2 instructions) - df = { }
-> BB:2 (tree)
0: ld u32 $r2 c0[0x8] (0)
1: bra BB:2 (0)
---
<- BB:0 (tree)
BB:2 (11 instructions) - idom = BB:0, df = { }
-> BB:3 (tree)
2: rdsv u32 $r1 sv[TID:0] (0)
3: rdsv u32 $r0 sv[CTAID:0] (0)
4: mad u32 $r0 $r0 c7[0x100] $r1 (0)
5: mov u32 $r4 0x00000000 (0)
6: mov u32 $r1 $r4 (0)
7: mov u32 $r3 $r4 (0)
8: mul u32 $r1 $r1 $r2 (0)
9: mad u32 $r1 $r0 $r3 $r1 (0)
10: mad (SUBOP:1) u32 $r1 $r0 $r2 $r1 (0)
11: mul u32 $r0 $r0 $r2 (0)
12: bra BB:3 (0)
---
<- BB:6 (back)
<- BB:2 (tree)
BB:3 (2 instructions) - idom = BB:2, df = { BB:3 }
-> BB:5 (tree)
-> BB:4 (tree)
13: set u8 $p0 lt u32 $r4 c0[0x8] (0)
14: $p0 bra BB:5 (0)
---
<- BB:3 (tree)
BB:4 (1 instructions) - idom = BB:3, df = { }
-> BB:7 (tree)
15: bra BB:7 (0)
---
<- BB:4 (tree)
BB:7 (1 instructions) - idom = BB:4, df = { }
-> BB:1 (tree)
16: bra BB:1 (0)
---
<- BB:7 (tree)
BB:1 (1 instructions) - idom = BB:7, df = { }
17: exit - # (0)
---
<- BB:3 (tree)
BB:5 (13 instructions) - idom = BB:3, df = { BB:3 }
-> BB:6 (tree)
18: add u32 $r5 $r0 $r4 (0)
19: mov u32 $r1 0x00000000 (0)
20: mov u32 $r0 $r5 (0)
21: mov u32 $r2 0x00000004 (0)
22: mov u32 $r3 0x00000000 (0)
23: mul u32 $r3 $r3 $r0 (0)
24: mad u32 $r1 $r2 $r1 $r3 (0)
25: mad (SUBOP:1) u32 $r1 $r2 $r0 $r1 (0)
26: mul u32 $r0 $r2 $r0 (0)
27: add u32 { $r0 $c0 } $r0 c0[0x0] (0)
28: add u32 $r1 $r1 c0[0x4] $c0 (0)
29: st u32 # g[$r0d+0x0] $r5 (0)
30: bra BB:6 (0)
---
<- BB:5 (tree)
BB:6 (2 instructions) - idom = BB:5, df = { BB:3 }
-> BB:3 (back)
31: add u32 $r4 $r4 0x00000001 (0)
32: bra BB:3 (0)
MAIN:-1 ()
---
BB:0 (0 instructions) - df = { }
PEEPHOLE: FlatteningPass
optimizePostRA
loop_lt:10 ()
---
BB:0 (2 instructions) - df = { }
-> BB:2 (tree)
0: ld u32 $r2 c0[0x8] (0)
1: bra BB:2 (0)
---
<- BB:0 (tree)
BB:2 (11 instructions) - idom = BB:0, df = { }
-> BB:3 (tree)
2: rdsv u32 $r1 sv[TID:0] (0)
3: rdsv u32 $r0 sv[CTAID:0] (0)
4: mad u32 $r0 $r0 c7[0x100] $r1 (0)
5: mov u32 $r4 0x00000000 (0)
6: mov u32 $r1 $r4 (0)
7: mov u32 $r3 $r4 (0)
8: mul u32 $r1 $r1 $r2 (0)
9: mad u32 $r1 $r0 $r3 $r1 (0)
10: mad (SUBOP:1) u32 $r1 $r0 $r2 $r1 (0)
11: mul u32 $r0 $r0 $r2 (0)
12: bra BB:3 (0)
---
<- BB:6 (back)
<- BB:2 (tree)
BB:3 (2 instructions) - idom = BB:2, df = { BB:3 }
-> BB:5 (tree)
-> BB:4 (tree)
13: set u8 $p0 lt u32 $r4 c0[0x8] (0)
14: $p0 bra BB:5 (0)
---
<- BB:3 (tree)
BB:4 (1 instructions) - idom = BB:3, df = { }
-> BB:7 (tree)
15: bra BB:1 (0)
---
<- BB:4 (tree)
BB:7 (0 instructions) - idom = BB:4, df = { }
-> BB:1 (tree)
---
<- BB:7 (tree)
BB:1 (1 instructions) - idom = BB:7, df = { }
16: exit - # (0)
---
<- BB:3 (tree)
BB:5 (13 instructions) - idom = BB:3, df = { BB:3 }
-> BB:6 (tree)
17: add u32 $r5 $r0 $r4 (0)
18: mov u32 $r1 0x00000000 (0)
19: mov u32 $r0 $r5 (0)
20: mov u32 $r2 0x00000004 (0)
21: mov u32 $r3 0x00000000 (0)
22: mul u32 $r3 $r3 $r0 (0)
23: mad u32 $r1 $r2 $r1 $r3 (0)
24: mad (SUBOP:1) u32 $r1 $r2 $r0 $r1 (0)
25: mul u32 $r0 $r2 $r0 (0)
26: add u32 { $r0 $c0 } $r0 c0[0x0] (0)
27: add u32 $r1 $r1 c0[0x4] $c0 (0)
28: st u32 # g[$r0d+0x0] $r5 (0)
29: bra BB:6 (0)
---
<- BB:5 (tree)
BB:6 (2 instructions) - idom = BB:5, df = { BB:3 }
-> BB:3 (back)
30: add u32 $r4 $r4 0x00000001 (0)
31: bra BB:3 (0)
MAIN:-1 ()
---
BB:0 (0 instructions) - df = { }
loop_lt:10 ()
---
BB:0 (1 instructions) - df = { }
-> BB:2 (tree)
0: ld u32 $r2 c0[0x8] (8)
---
<- BB:0 (tree)
BB:2 (10 instructions) - idom = BB:0, df = { }
-> BB:3 (tree)
1: rdsv u32 $r1 sv[TID:0] (8)
2: rdsv u32 $r0 sv[CTAID:0] (8)
3: mad u32 $r0 $r0 c7[0x100] $r1 (8)
4: mov u32 $r4 0x00000000 (8)
5: mov u32 $r1 $r4 (8)
6: mov u32 $r3 $r4 (8)
7: mul u32 $r1 $r1 $r2 (8)
8: mad u32 $r1 $r0 $r3 $r1 (8)
9: mad (SUBOP:1) u32 $r1 $r0 $r2 $r1 (8)
10: mul u32 $r0 $r0 $r2 (8)
---
<- BB:6 (back)
<- BB:2 (tree)
BB:3 (2 instructions) - idom = BB:2, df = { BB:3 }
-> BB:5 (tree)
-> BB:4 (tree)
11: set u8 $p0 lt u32 $r4 c0[0x8] (8)
12: $p0 bra BB:5 (8)
---
<- BB:3 (tree)
BB:4 (0 instructions) - idom = BB:3, df = { }
-> BB:7 (tree)
---
<- BB:4 (tree)
BB:7 (0 instructions) - idom = BB:4, df = { }
-> BB:1 (tree)
---
<- BB:7 (tree)
BB:1 (1 instructions) - idom = BB:7, df = { }
13: exit - # (8)
---
<- BB:3 (tree)
BB:5 (12 instructions) - idom = BB:3, df = { BB:3 }
-> BB:6 (tree)
14: add u32 $r5 $r0 $r4 (8)
15: mov u32 $r1 0x00000000 (8)
16: mov u32 $r0 $r5 (8)
17: mov u32 $r2 0x00000004 (8)
18: mov u32 $r3 0x00000000 (8)
19: mul u32 $r3 $r3 $r0 (8)
20: mad u32 $r1 $r2 $r1 $r3 (8)
21: mad (SUBOP:1) u32 $r1 $r2 $r0 $r1 (8)
22: mul u32 $r0 $r2 $r0 (8)
23: add u32 { $r0 $c0 } $r0 c0[0x0] (8)
24: add u32 $r1 $r1 c0[0x4] $c0 (8)
25: st u32 # g[$r0d+0x0] $r5 (8)
---
<- BB:5 (tree)
BB:6 (2 instructions) - idom = BB:5, df = { BB:3 }
-> BB:3 (back)
26: add u32 $r4 $r4 0x00000001 (8)
27: bra BB:3 (8)
MAIN:-1 ()
---
BB:0 (0 instructions) - df = { }
nv50_ir_generate_code: ret = 0
program binary (256 bytes)
42820047 22804280 20009de4 28004000 84005c04 2c000000 94001c04 2c000000
00001c03 20025c04 00011de2 18000000 10005de4 28000000 1000dde4 28000000
420282e7 22e212c0 08105c03 50000000 0c005c03 20020000 08005c43 20020000
08001c03 50000000 2041dc03 188e4000 400001e7 40000000 00001de7 80000000
00428047 2282e282 10015c03 48000000 00005de2 18000000 14001de4 28000000
10009de2 18000000 0000dde2 18000000 0030dc03 50000000 04205c03 20060000
82c2e207 22720042 00205c43 20020000 00201c03 50000000 00001c03 48014000
10105c43 48004000 00015c85 94000000 04411c03 4800c000 a0001de7 4003fffd

Event Timeline

pmoreau created this paste. Sep 10 2016, 8:34 PM

% spirv-dis loop_lt.spv 1 jobs
; SPIR-V
; Version: 1.0
; Generator: Khronos LLVM/SPIR-V Translator; 14
; Bound: 33
; Schema: 0

      OpCapability Addresses
      OpCapability Linkage
      OpCapability Kernel
      OpCapability Int64
 %1 = OpExtInstImport "OpenCL.std"
      OpMemoryModel Physical64 OpenCL
      OpEntryPoint Kernel %10 "loop_lt"
      OpSource OpenCL_C 102000
      OpName %5 "__spirv_BuiltInGlobalInvocationId"
      OpName %11 "out"
      OpName %12 "iterations"
      OpName %13 "entry"
      OpName %14 "for.cond"
      OpName %15 "for.body"
      OpName %16 "for.inc"
      OpName %17 "for.end"
      OpName %19 "call"
      OpName %20 "conv"
      OpName %21 "mul"
      OpName %22 "conv1"
      OpName %24 "inc"
      OpName %25 "i.0"
      OpName %27 "cmp"
      OpName %28 "add"
      OpName %29 "idxprom"
      OpName %30 "arrayidx"
      OpDecorate %5 BuiltIn GlobalInvocationId
      OpDecorate %5 Constant
      OpDecorate %5 LinkageAttributes "__spirv_BuiltInGlobalInvocationId" Import
 %2 = OpTypeInt 64 0
 %7 = OpTypeInt 32 0
%23 = OpConstant %7 0
%31 = OpConstant %7 1
 %3 = OpTypeVector %2 3
 %4 = OpTypePointer UniformConstant %3
 %6 = OpTypeVoid
 %8 = OpTypePointer CrossWorkgroup %7
 %9 = OpTypeFunction %6 %8 %7
%26 = OpTypeBool
 %5 = OpVariable %4 UniformConstant
%10 = OpFunction %6 None %9
%11 = OpFunctionParameter %8
%12 = OpFunctionParameter %7
%13 = OpLabel
%18 = OpLoad %3 %5
%19 = OpCompositeExtract %2 %18 0
%20 = OpUConvert %2 %12
%21 = OpIMul %2 %19 %20
%22 = OpUConvert %7 %21
      OpBranch %14
%14 = OpLabel
%25 = OpPhi %7 %23 %13 %24 %16
%27 = OpULessThan %26 %25 %12
      OpBranchConditional %27 %15 %17
%15 = OpLabel
%28 = OpIAdd %7 %22 %25
%29 = OpUConvert %2 %28
%30 = OpInBoundsPtrAccessChain %8 %11 %29
      OpStore %30 %28 Aligned 4
      OpBranch %16
%16 = OpLabel
%24 = OpIAdd %7 %25 %31
      OpBranch %14
%17 = OpLabel
      OpReturn
      OpFunctionEnd
pmoreau edited the content of this paste. Sep 11 2016, 12:19 PM
MAIN:-1 ()
---
BB:0 (0 instructions) - df = { }

loop_lt:10 ()
---
BB:0 (2 instructions) - df = { }
 -> BB:2 (tree)
  0: ld  u32 %r42 c0[0x8] (0)
  1: bra BB:2 (0)
---
 <- BB:0 (tree)
BB:2 (19 instructions) - idom = BB:0, df = { }
 -> BB:3 (tree)
  2: rdsv u32 %r43 sv[TID:0] (0)
  3: rdsv u32 %r45 sv[CTAID:0] (0)
  4: mov u32 %r46 %r43 (0)
  5: mad u32 %r47 %r45 c7[0x100] %r46 (0)
  6: mov u32 %r48 0x00000000 (0)
  7: mov u32 %r101 %r48 (0)
  8: merge u64 %r49d %r47 %r101 (0)
  9: mov u32 %r102 %r48 (0)
 10: merge u64 %r66d %r42 %r102 (0)
 11: split u64 { %r84 %r85 } %r49d (0)
 12: split u64 { %r86 %r87 } %r66d (0)
 13: mul u32 %r88 %r85 %r86 (0)
 14: mad u32 %r89 %r84 %r87 %r88 (0)
 15: mad (SUBOP:1) u32 %r91 %r84 %r86 %r89 (0)
 16: mul u32 %r90 %r84 %r86 (0)
 17: merge u64 %r67d %r90 %r91 (0)
 18: split u64 { %r68 %r69 } %r67d (0)
 19: mov u32 %r105 %r48 (0)
 20: bra BB:3 (0)
---
 <- BB:6 (back)
 <- BB:2 (tree)
BB:3 (3 instructions) - idom = BB:2, df = { BB:3 }
 -> BB:5 (tree)
 -> BB:4 (tree)
 21: phi u32 %r72 %r104 %r105 (0)
 22: set u8 %p73 lt u32 %r72 c0[0x8] (0)
 23: %p73 bra BB:5 (0)
---
 <- BB:3 (tree)
BB:4 (1 instructions) - idom = BB:3, df = { }
 -> BB:7 (tree)
 24: bra BB:7 (0)
---
 <- BB:4 (tree)
BB:7 (1 instructions) - idom = BB:4, df = { }
 -> BB:1 (tree)
 25: bra BB:1 (0)
---
 <- BB:7 (tree)
BB:1 (1 instructions) - idom = BB:7, df = { }
 26: exit - # (0)
---
 <- BB:3 (tree)
BB:5 (15 instructions) - idom = BB:3, df = { BB:3 }
 -> BB:6 (tree)
 27: add u32 %r74 %r68 %r72 (0)
 28: mov u32 %r75 0x00000000 (0)
 29: mov u32 %r103 %r74 (0)
 30: merge u64 %r76d %r103 %r75 (0)
 31: mov u64 %r78d 0x0000000000000004 (0)
 32: split u64 { %r93 %r94 } %r78d (0)
 33: split u64 { %r95 %r96 } %r76d (0)
 34: mul u32 %r97 %r94 %r95 (0)
 35: mad u32 %r98 %r93 %r96 %r97 (0)
 36: mad (SUBOP:1) u32 %r100 %r93 %r95 %r98 (0)
 37: mul u32 %r99 %r93 %r95 (0)
 38: merge u64 %r79d %r99 %r100 (0)
 39: add u64 %r80d %r79d c0[0x0] (0)
 40: st  u32 # g[%r80d+0x0] %r74 (0)
 41: bra BB:6 (0)
---
 <- BB:5 (tree)
BB:6 (3 instructions) - idom = BB:5, df = { BB:3 }
 -> BB:3 (back)
 42: add u32 %r81 %r72 0x00000001 (0)
 43: mov u32 %r104 %r81 (0)
 44: bra BB:3 (0)
}
BB:2:
    %68 <- live range [18(18), 21)
    [18 21)

BB:3:
    %68 <- live range [22(18), 24)
    [18 21) [22 24)

BB:5:
    %68 <- live range [27(18), 27)
    [18 21) [22 24)

BB:6:
    %68 <- live range [42(18), 45)
    [18 21) [22 24) [42 45)

After buildLiveSets() (liveSet seems to be the set of live-in values here):

BB:0: {}
BB:2: {42}
BB:3: {68}
BB:4: {}
BB:7: {}
BB:1: {}
BB:5: {68,72}
BB:6: {72}

Technically, since BB:3 is a successor of BB:6, BB:6's live set should also include 68. However, BB:6 is the first block to be filled in, and at that point BB:3's live set is still empty, so BB:6 starts out empty (see nv50_ir_ra.cpp:577). In this case, should buildLiveSets() be called twice, or will this be sorted out later on?
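
To make the question concrete, here is a minimal sketch of textbook backward liveness on the CFG from the dump. It is not nouveau's buildLiveSets(); the names SUCCS, USE, DEF, ORDER, one_pass and solve are made up for illustration, and as a simplifying assumption only values 42, 68 and 72 are tracked (phi operands and short-lived temporaries are ignored). A single successors-first pass reproduces the live-in sets listed above, and repeating the pass until nothing changes is what adds 68 to BB:6.

```python
# Textbook backward liveness on the CFG from the dump (illustrative only,
# not nouveau code). Assumption: only %42, %68 and %72 are tracked.
SUCCS = {0: [2], 2: [3], 3: [5, 4], 4: [7], 7: [1], 1: [], 5: [6], 6: [3]}
# Values read in a block before any definition in that block.
USE = {0: set(), 2: {42}, 3: set(), 4: set(), 7: set(), 1: set(),
       5: {68, 72}, 6: {72}}
# Values defined in a block.
DEF = {0: {42}, 2: {68}, 3: {72}, 4: set(), 7: set(), 1: set(),
       5: set(), 6: set()}

# Successors-first order of a depth-first walk from BB:0 that follows
# BB:5 before BB:4, so BB:6 is the first block to be filled in.
ORDER = [6, 5, 1, 7, 4, 3, 2, 0]


def one_pass(live_in):
    """Update every block once; return True if any live-in set changed."""
    changed = False
    for bb in ORDER:
        live_out = set()
        for succ in SUCCS[bb]:
            live_out |= live_in[succ]
        new_in = USE[bb] | (live_out - DEF[bb])
        if new_in != live_in[bb]:
            live_in[bb] = new_in
            changed = True
    return changed


def solve():
    """Repeat one_pass() until a fixed point is reached."""
    live_in = {bb: set() for bb in SUCCS}
    passes = 0
    while one_pass(live_in):
        passes += 1
    return live_in, passes


if __name__ == "__main__":
    first = {bb: set() for bb in SUCCS}
    one_pass(first)
    print("after one pass:", first)
    print("fixed point:   ", solve()[0])
```

In this model the first pass leaves BB:6 with only 72, because BB:3 (reached through the back edge) has not been computed yet; the second pass picks up 68 from BB:3, and no other set changes.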

After collectLiveValues() (now it looks like liveSet is the set of live-out values):

BB:0: {42}
BB:2: {68}
BB:3: {68,72}
BB:4: {}
BB:7: {}
BB:1: {}
BB:5: {72}
BB:6: {68,72}

As BB:5 is now processed before BB:6, 68 is not reported as part of BB:5's live-out set, even though it should be. Should the blocks be processed in reverse order instead, or will this be fixed in the rest of BuildIntervalsPass::visit()? 68 does get added to the live set when the sources of the instructions in BB:5 are processed, but in that case it is only live up to the instruction that uses it, and since that instruction is the first of the block, the resulting range is empty (the [27, 27) range above). Since 68 is live in BB:6, the code is in SSA form, and 68 is defined in BB:2, 68 should be a live-out of BB:5 once collectLiveValues() has been called. Given that BuildIntervalsPass::visit() builds on the live sets of the outgoing BBs, should it process the BBs in reverse order, starting with the outgoing BBs and ending with the incoming ones?
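
For comparison, here is a rough sketch of what I would expect for 68 if live-out were derived from converged live-in sets. Again this is textbook dataflow, not what BuildIntervalsPass does; the names SPAN, DEF_SERIAL, live_out_sets and block_ranges are hypothetical, only 42, 68 and 72 are tracked, ranges are block-granular, and the phi at serial 21 is not special-cased. The serial numbers are taken from the dump above.

```python
# Block-granular live ranges from converged liveness (illustrative only,
# not nouveau code). Assumption: only %42, %68 and %72 are tracked.
SUCCS = {0: [2], 2: [3], 3: [5, 4], 4: [7], 7: [1], 1: [], 5: [6], 6: [3]}
USE = {0: set(), 2: {42}, 3: set(), 4: set(), 7: set(), 1: set(),
       5: {68, 72}, 6: {72}}
DEF = {0: {42}, 2: {68}, 3: {72}, 4: set(), 7: set(), 1: set(),
       5: set(), 6: set()}
# Serial of the first and last instruction of each block, from the dump.
SPAN = {0: (0, 1), 2: (2, 20), 3: (21, 23), 4: (24, 24), 7: (25, 25),
        1: (26, 26), 5: (27, 41), 6: (42, 44)}
# Serial of the defining instruction of each tracked value.
DEF_SERIAL = {42: 0, 68: 18, 72: 21}


def live_out_sets():
    """Iterate live-in to a fixed point, then derive live-out per block."""
    live_in = {bb: set() for bb in SUCCS}
    changed = True
    while changed:
        changed = False
        for bb in SUCCS:
            out = set()
            for succ in SUCCS[bb]:
                out |= live_in[succ]
            new_in = USE[bb] | (out - DEF[bb])
            if new_in != live_in[bb]:
                live_in[bb] = new_in
                changed = True
    return {bb: set().union(*(live_in[s] for s in SUCCS[bb])) for bb in SUCCS}


def block_ranges(value):
    """Half-open [start, end) range for each block `value` is live out of."""
    live_out = live_out_sets()
    ranges = []
    for bb, (first, last) in SPAN.items():
        if value in live_out[bb]:
            # Start at the definition if it is in this block, else at block entry.
            start = DEF_SERIAL[value] if value in DEF[bb] else first
            ranges.append((start, last + 1))
    return sorted(ranges)


if __name__ == "__main__":
    print(block_ranges(68))
```

Because the live-in sets are iterated to a fixed point first, the order in which blocks are visited afterwards no longer matters in this model: 68 is in BB:5's live-out, and BB:5 contributes the range [27, 42) instead of the empty [27, 27).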