Remove old-style arithmetic primitives #292

daemanos · 2019-03-18T19:18:13Z

In #273, MLton was extended to support overflow-checking primitives for arithmetic operations. This allowed checked arithmetic operations to be encoded as normal if-statements that raise an Overflow exception when appropriate, obviating the need to have a special PrimOverflow exception and associated supporting infrastructure in the various IR datatypes. However, until now the old Arith transfer-style primitives have remained in place. This pull request removes the old primitives entirely, along with the special-case code required to support them, simplifying a number of datatypes and optimizations which now no longer need to keep track of arithmetic overflows and can instead rely on the normal exception infrastructure.

With the `Prim.ApplyResult.Overflow` constructor removed, there is no longer any confusion with the pervasive `Overflow` exception.
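The description above says checked arithmetic becomes an ordinary if-statement over an overflow-checking primitive. The following is a minimal C sketch of that shape, not MLton's actual generated code: the function names are hypothetical stand-ins for a `WordS32_addCheckP`-style predicate and a `Word32_add`-style wrapping operation, and it assumes the GCC/Clang `__builtin_add_overflow` builtin.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an overflow-checking predicate:
   true when signed 32-bit x + y does not fit in int32_t. */
static bool adds32_check_p(int32_t x, int32_t y) {
    int32_t ignored;
    return __builtin_add_overflow(x, y, &ignored);
}

/* Hypothetical stand-in for the plain wrapping addition.
   The unsigned-to-signed conversion is modular on GCC and Clang. */
static int32_t add32_wrap(int32_t x, int32_t y) {
    return (int32_t)((uint32_t)x + (uint32_t)y);
}

/* Checked addition as ordinary control flow: test the predicate,
   then either take the "raise Overflow" path (modelled here as
   returning false) or perform the unchecked operation. */
static bool checked_add32(int32_t x, int32_t y, int32_t *result) {
    if (adds32_check_p(x, y)) {
        return false; /* caller would raise Overflow here */
    }
    *result = add32_wrap(x, y);
    return true;
}
```

With this shape the overflow case is just another branch, so no dedicated transfer kind is needed to represent it.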

MatthewFluet · 2019-03-21T01:49:37Z

Performance

MLton0 -- /home/mtf/devel/mlton/builds/20190217.152511-g9ba427a/bin/mlton -codegen amd64
MLton1 -- /home/mtf/devel/mlton/builds/20190319.152439-g9d251db/bin/mlton -codegen amd64
MLton2 -- /home/mtf/devel/mlton/builds/20190217.152511-g9ba427a/bin/mlton -codegen c
MLton3 -- /home/mtf/devel/mlton/builds/20190319.152439-g9d251db/bin/mlton -codegen c
MLton4 -- /home/mtf/devel/mlton/builds/20190217.152511-g9ba427a/bin/mlton -codegen llvm
MLton5 -- /home/mtf/devel/mlton/builds/20190319.152439-g9d251db/bin/mlton -codegen llvm
run time ratio
benchmark         MLton0 MLton1 MLton2 MLton3 MLton4 MLton5
barnes-hut          1.00   0.98   1.00   1.01   0.98   0.97
boyer               1.00   1.02   0.99   0.99   1.00   1.00
checksum            1.00   1.00   1.12   0.95   0.76   0.70
count-graphs        1.00   1.00   0.91   0.81   0.96   0.88
DLXSimulator        1.00   1.00   1.01   1.01   0.96   0.97
even-odd            1.00   1.00   1.11   1.20   1.00   1.00
fft                 1.00   0.98   0.90   0.90   0.84   0.84
fib                 1.00   1.00   1.19   1.18   1.08   1.22
flat-array          1.00   1.00   2.29   2.29   0.00   0.00
hamlet              1.00   1.01   2.17   2.08   1.99   2.21
imp-for             1.00   1.00   1.35   0.97   0.45   0.45
knuth-bendix        1.00   1.00   1.06   1.52   1.18   1.45
lexgen              1.00   0.99   0.94   1.01   0.90   0.96
life                1.00   1.00   1.05   1.04   1.09   1.04
logic               1.00   1.03   1.11   1.11   1.09   1.10
mandelbrot          1.00   1.00   0.38   0.42   0.29   0.29
matrix-multiply     1.00   1.00   0.75   0.57   0.48   0.48
md5                 1.00   1.00   1.31   1.28   1.03   1.03
merge               1.00   0.99   1.00   1.00   0.99   0.99
mlyacc              1.00   0.95   1.11   1.08   1.13   1.06
model-elimination   1.00   1.02   1.54   1.59   1.43   1.47
mpuz                1.00   1.00   0.85   0.83   0.48   0.50
nucleic             1.00   0.99   0.86   0.85   0.87   0.87
output1             1.00   1.00   1.19   1.12   1.05   1.09
peek                1.00   1.00   1.05   1.07   0.17   0.16
psdes-random        1.00   1.00   0.73   0.75   0.80   0.80
ratio-regions       1.00   1.00   1.01   1.01   0.96   0.96
ray                 1.00   0.96   0.90   0.91   0.95   1.01
raytrace            1.00   0.99   0.99   0.97   0.97   0.99
simple              1.00   1.02   1.15   1.17   1.29   1.16
smith-normal-form   1.00   0.99   0.99   0.99   0.98   0.99
string-concat       1.00   1.00   1.01   1.02   0.30   0.27
tailfib             1.00   1.01   0.64   0.54   0.42   0.37
tak                 1.00   1.00   1.20   1.11   1.12   1.11
tensor              1.00   0.99   1.07   0.60   0.31   0.31
tsp                 1.00   1.00   0.70   0.70   0.69   0.68
tyan                1.00   1.00   1.10   1.12   1.06   1.05
vector32-concat     1.00   1.00   1.03   1.03   0.28   0.28
vector64-concat     1.00   1.00   1.03   1.03   0.38   0.37
vector-rev          1.00   1.00   1.02   1.01   0.66   0.67
vliw                1.00   0.97   1.17   1.14   0.98   1.04
wc-input1           1.00   1.00   1.45   0.99   0.99   0.98
wc-scanStream       1.00   1.00   1.90   1.17   1.04   1.02
zebra               1.00   1.00   0.98   0.98   1.03   1.02
zern                1.00   1.00   0.97   0.98   0.79   0.80
size
benchmark            MLton0    MLton1     MLton2    MLton3    MLton4    MLton5
barnes-hut          176,143   176,159    174,200   172,888   166,791   166,791
boyer               243,313   243,313    236,417   236,593   219,249   219,233
checksum            117,505   117,617    123,841   124,113   116,513   116,593
count-graphs        145,009   142,609    148,161   146,753   138,129   137,329
DLXSimulator        209,020   209,020    210,676   210,588   199,284   200,060
even-odd            117,473   117,473    123,905   124,065   116,625   116,625
fft                 142,251   142,251    147,030   146,806   131,990   132,134
fib                 117,393   117,393    123,777   123,937   116,561   116,577
flat-array          117,121   117,121    123,553   123,617   116,065   116,065
hamlet            1,434,172 1,368,524  1,467,332 1,409,796 1,534,804 1,501,716
imp-for             117,185   117,185    123,377   123,521   116,225   116,225
knuth-bendix        186,060   186,060    189,044   189,124   179,188   179,380
lexgen              290,875   290,891    305,459   305,491   292,483   293,683
life                141,057   141,057    144,961   145,249   135,953   135,953
logic               197,361   197,361    197,481   197,721   179,705   179,641
mandelbrot          117,217   117,233    127,185   127,329   116,225   116,225
matrix-multiply     119,521   119,521    128,785   128,833   117,249   117,249
md5                 144,620   144,620    148,124   148,460   139,404   139,452
merge               118,897   118,897    124,977   125,121   117,681   117,681
mlyacc              643,499   640,811    653,563   648,379   643,259   639,499
model-elimination   795,998   793,646    834,998   819,422   894,246   894,206
mpuz                123,489   123,489    129,441   129,489   121,553   121,649
nucleic             297,193   297,193    272,554   273,114   268,826   268,826
output1             151,712   151,712    155,312   155,440   146,528   147,472
peek                150,108   150,108    153,740   153,836   145,532   145,548
psdes-random        121,489   121,489    127,905   127,825   119,665   119,649
ratio-regions       144,081   144,081    150,953   150,937   141,353   141,401
ray                 250,002   250,578    250,395   250,843   236,562   236,050
raytrace            368,932   368,516    352,186   351,290   323,636   324,020
simple              345,149   347,261    366,423   369,391   352,702   354,726
smith-normal-form   279,781   279,781    257,005   257,573   246,317   246,821
string-concat       119,073   119,073    125,521   125,633   118,033   118,129
tailfib             117,217   117,217    123,617   123,729   116,209   116,209
tak                 117,393   117,393    123,809   124,001   116,625   116,625
tensor              179,236   178,452    178,236   179,148   166,780   166,860
tsp                 158,804   158,132    161,435   161,355   145,516   145,292
tyan                223,532   223,516    224,684   225,964   212,908   215,036
vector32-concat     118,241   118,241    124,657   124,753   117,377   117,361
vector64-concat     118,273   118,273    124,721   124,833   117,377   117,361
vector-rev          118,049   118,049    124,561   124,673   116,929   116,929
vliw                505,453   506,013    545,189   537,021   553,829   542,685
wc-input1           178,995   178,963    182,347   184,075   172,251   173,931
wc-scanStream       188,099   188,067    192,203   189,995   183,243   183,419
zebra               225,308   225,308    224,812   226,740   211,756   214,532
zern                153,185   153,217    154,183   154,231   142,695   142,695
compile time
benchmark         MLton0 MLton1 MLton2 MLton3 MLton4 MLton5
barnes-hut          2.92   2.92   3.51   3.33  4.00   4.11
boyer               3.30   3.39   5.37   5.42  6.55   6.49
checksum            2.47   2.46   2.66   2.59  2.70   2.72
count-graphs        2.63   2.48   3.06   3.00  3.47   3.48
DLXSimulator        3.15   3.19   4.12   4.13  5.20   5.32
even-odd            2.47   2.45   2.60   2.64  2.72   2.73
fft                 2.55   2.60   2.84   2.85  3.07   3.06
fib                 2.46   2.48   2.63   2.63  2.60   2.69
flat-array          2.47   2.46   2.62   2.64  2.46   2.59
hamlet             14.06  14.00  24.68  23.27 45.79  44.97
imp-for             2.47   2.49   2.60   2.61  2.67   2.65
knuth-bendix        2.89   2.88   3.50   3.65  5.00   4.80
lexgen              3.67   3.53   4.86   5.04  7.34   7.09
life                2.58   2.60   2.90   2.97  3.36   3.36
logic               3.00   3.09   3.73   3.69  5.49   5.32
mandelbrot          2.48   2.49   2.63   2.61  2.52   2.72
matrix-multiply     2.48   2.51   2.65   2.67  2.76   2.76
md5                 2.64   2.66   3.00   3.04  3.32   3.10
merge               2.48   2.48   2.64   2.65  2.62   2.77
mlyacc              7.66   7.78  10.65  10.46 17.45  17.84
model-elimination   7.77   7.78  12.68  12.71 22.98  23.23
mpuz                2.51   2.50   2.75   2.73  2.96   2.80
nucleic             4.14   4.14   6.18   6.20  7.22   6.95
output1             2.65   2.66   2.88   3.09  3.42   3.49
peek                2.45   2.66   3.00   3.04  3.44   3.34
psdes-random        2.50   2.51   2.70   2.66  2.63   2.84
ratio-regions       2.76   2.78   3.30   3.33  3.67   3.68
ray                 3.44   3.54   4.42   4.45  5.58   6.00
raytrace            4.49   4.61   6.40   6.49  9.14   9.64
simple              4.03   3.84   5.11   5.31  7.96   8.48
smith-normal-form   3.82   3.79   6.90   7.01  9.13   9.50
string-concat       2.47   2.49   2.61   2.47  2.84   2.76
tailfib             2.25   2.49   2.58   2.67  2.67   2.70
tak                 2.44   2.45   2.57   2.63  2.63   2.72
tensor              3.05   2.99   3.47   3.77  4.39   4.40
tsp                 2.72   2.72   3.46   3.19  3.55   3.56
tyan                3.28   3.24   4.38   4.33  6.05   5.83
vector32-concat     2.46   2.52   2.62   2.63  2.72   2.72
vector64-concat     2.46   2.50   2.60   2.64  2.70   2.71
vector-rev          2.46   2.46   2.71   2.60  2.74   2.64
vliw                5.93   5.99   8.57   8.63 13.29  14.32
wc-input1           2.92   2.87   3.51   3.59  4.27   4.29
wc-scanStream       2.86   2.93   3.48   3.68  4.35   4.11
zebra               3.31   3.30   4.07   4.32  5.34   5.61
zern                2.62   2.65   2.84   2.73  3.20   3.30
run time
benchmark         MLton0 MLton1 MLton2 MLton3 MLton4 MLton5
barnes-hut         28.58  27.94  28.59  28.74  28.02  27.58
boyer              57.50  58.68  56.85  56.99  57.31  57.31
checksum           25.35  25.34  28.31  24.18  19.28  17.69
count-graphs       39.55  39.70  36.00  31.97  38.10  34.79
DLXSimulator       32.32  32.48  32.50  32.50  31.13  31.33
even-odd           39.08  39.09  43.50  46.85  39.02  39.01
fft                31.41  30.90  28.37  28.22  26.52  26.28
fib                17.99  17.97  21.49  21.24  19.40  21.92
flat-array         23.58  23.56  53.99  54.10   0.00   0.00
hamlet             39.62  39.99  86.12  82.22  79.01  87.44
imp-for            24.46  24.49  32.91  23.66  10.95  11.01
knuth-bendix       34.00  34.15  36.05  51.55  40.03  49.42
lexgen             33.56  33.26  31.44  34.04  30.31  32.23
life               38.83  38.92  40.64  40.37  42.47  40.36
logic              34.66  35.79  38.63  38.52  37.73  38.21
mandelbrot         35.80  35.81  13.45  14.93  10.40  10.45
matrix-multiply    29.72  29.69  22.15  16.98  14.16  14.22
md5                28.11  28.06  36.79  36.04  28.93  29.08
merge              32.36  32.17  32.27  32.39  31.96  32.14
mlyacc             32.44  30.98  35.93  35.08  36.68  34.43
model-elimination  38.01  38.71  58.68  60.36  54.40  55.79
mpuz               29.94  29.90  25.50  24.99  14.30  14.89
nucleic            33.74  33.54  28.86  28.79  29.28  29.33
output1            29.97  30.03  35.76  33.59  31.62  32.58
peek               34.24  34.16  35.93  36.58   5.74   5.64
psdes-random       34.00  33.90  24.85  25.37  27.36  27.31
ratio-regions      48.24  48.24  48.60  48.92  46.30  46.25
ray                39.55  37.91  35.55  35.90  37.47  39.88
raytrace           36.60  36.38  36.19  35.47  35.48  36.38
simple             29.33  29.88  33.70  34.45  37.98  34.01
smith-normal-form  39.47  39.23  39.19  39.24  38.79  38.92
string-concat      91.32  91.34  92.27  93.13  27.62  24.68
tailfib            38.04  38.34  24.32  20.60  16.00  13.90
tak                30.83  30.83  36.90  34.35  34.61  34.36
tensor             39.63  39.24  42.37  23.90  12.32  12.29
tsp                37.61  37.75  26.19  26.31  25.88  25.68
tyan               30.51  30.47  33.59  34.32  32.20  31.96
vector32-concat    82.46  82.37  84.75  84.74  23.36  23.23
vector64-concat    91.46  91.88  93.85  93.77  34.95  33.90
vector-rev         26.53  26.54  27.19  26.76  17.64  17.68
vliw               28.16  27.22  32.98  32.20  27.67  29.22
wc-input1          43.86  43.85  63.82  43.25  43.22  43.20
wc-scanStream      21.84  21.85  41.54  25.59  22.65  22.27
zebra              30.29  30.24  29.79  29.68  31.18  30.93
zern               31.87  31.96  31.02  31.18  25.23  25.36
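The "run time ratio" table appears to be each configuration's run time divided by the MLton0 run time for the same benchmark. The `checksum` row is consistent with that reading (17.69 / 25.35 ≈ 0.70 for MLton5). A small sketch of that normalization, with a hypothetical function name:

```c
#include <stddef.h>

/* Normalize a row of run times against its first entry (the baseline
   configuration, MLton0 in the tables above). Assumes n >= 1 and a
   nonzero baseline. */
static void run_time_ratios(const double *times, size_t n, double *ratios) {
    for (size_t i = 0; i < n; i++) {
        ratios[i] = times[i] / times[0];
    }
}
```

Under this reading, values below 1.00 mean the configuration ran faster than the baseline amd64 codegen build, and values above 1.00 mean it ran slower.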

It appears that GCC (and, to a lesser extent, Clang/LLVM) do not always successfully fuse adjacent `Word<N>_<op>` and `Word{S,U}<N>_<op>CheckP` primitives. The performance results reported at MLton#273 and MLton#292 suggest that this does not always have a significant impact, but a close look at the `md5` benchmark shows that the native codegen significantly outperforms the C codegen with gcc-9 due to redundant arithmetic computations (one for `Word{S,U}<N>_<op>CheckP` and another for `Word<N>_<op>`). This flag will be used to enable explicit fusing of adjacent `Word<N>_<op>` and `Word{S,U}<N>_<op>CheckP` primitives in the codegens.

It appears that GCC (and, to a lesser extent, Clang/LLVM) do not always successfully fuse adjacent `Word<N>_<op>` and `Word{S,U}<N>_<op>CheckP` primitives. The performance results reported at MLton#273 and MLton#292 suggest that this does not always have a significant impact, but a close look at the `md5` benchmark shows that the native codegen significantly outperforms the C codegen with gcc-9 due to redundant arithmetic computations (one for `Word{S,U}<N>_<op>CheckP` and another for `Word<N>_<op>`). These functions compute both the arithmetic result and a boolean indicating overflow (using `__builtin_<op>_overflow`). They will be used for explicit fusing of adjacent `Word<N>_<op>` and `Word{S,U}<N>_<op>CheckP` primitives in the C codegen for `-codegen-fuse-op-and-check true`.
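A function that computes both the arithmetic result and an overflow flag, as described above, might look like the sketch below. It assumes the GCC/Clang `__builtin_sub_overflow` and `__builtin_mul_overflow` builtins. The function names are hypothetical and are not the names used in MLton's runtime headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Signed 32-bit subtraction: stores the wrapped result in *res and
   returns true when the exact result does not fit in int32_t. */
static inline bool sub_s32_and_check(int32_t x, int32_t y, int32_t *res) {
    return __builtin_sub_overflow(x, y, res);
}

/* Unsigned 32-bit multiplication: stores the result modulo 2^32 in
   *res and returns true when the exact product exceeds UINT32_MAX. */
static inline bool mul_u32_and_check(uint32_t x, uint32_t y, uint32_t *res) {
    return __builtin_mul_overflow(x, y, res);
}
```

Because the builtin yields the wrapped result and the flag together, a caller needs only one arithmetic operation where the unfused form needs two.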

It appears that GCC (and, to a lesser extent, Clang/LLVM) do not always successfully fuse adjacent `Word<N>_<op>` and `Word{S,U}<N>_<op>CheckP` primitives. The performance results reported at MLton#273 and MLton#292 suggest that this does not always have a significant impact, but a close look at the `md5` benchmark shows that the native codegen significantly outperforms the C codegen with gcc-9 due to redundant arithmetic computations (one for `Word{S,U}<N>_<op>CheckP` and another for `Word<N>_<op>`).

(Note: Because the final md5 state is not used by the `md5` benchmark program, MLton actually optimizes out most of the md5 computation. What is left is a lot of arithmetic from `PackWord32Little.subVec` to check for indices that should raise `Subscript`.)

For example, with `-codegen-fuse-op-and-check false` and gcc-9, the `transform` function of `md5` has the following assembly:

    movl    %r9d, %r10d
    subl    $1, %r10d
    jo      .L650
    leal    -1(%r8), %r10d
    movl    %r10d, %r12d
    addl    %r10d, %edx
    jo      .L650
    addl    %r10d, %r11d
    cmpl    %eax, %r11d
    jnb     .L656
    movl    %ebp, %edx
    addl    $1, %edx
    jo      .L659
    leal    1(%rcx), %edx
    movl    %edx, %r11d
    imull   %r9d, %r11d
    jo      .L650
    imull   %r8d, %edx
    movl    %edx, %r11d
    addl    %r10d, %r11d
    jo      .L650
    leal    (%rdx,%r10), %r11d
    cmpl    %eax, %r11d
    jnb     .L665

What seems to have happened is that gcc has arranged for equivalent values to be in `%r8` and `%r9`. In the first three lines, there is an implementation of `WordS32_subCheckP (X, 1)` using `subl/jo`, while in the fourth line, there is an implementation of `Word32_sub (X, 1)` using `lea` with an offset of `-1`. Notice that `%r10` is used for the result of both, so the fourth line is redundant (the value is already in `%r10`).

On the other hand, with `-codegen-fuse-op-and-check true` and gcc-9, the `transform` function of `md5` has the following assembly:

    movl    %r8d, %r9d
    subl    $1, %r9d
    jo      .L645
    addl    %r9d, %ecx
    jo      .L645
    cmpl    %edx, %ecx
    jnb     .L651
    movl    %eax, %ecx
    addl    $1, %ecx
    jo      .L654
    imull   %r8d, %ecx
    jo      .L645
    addl    %r9d, %ecx
    jo      .L645
    cmpl    %edx, %ecx
    jnb     .L660
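At the C source level, the two shapes behind the assembly above can be sketched roughly as follows. This is an illustrative reconstruction with hypothetical function names, not the code MLton emits. The unfused form evaluates a `WordS32_subCheckP (X, 1)`-style predicate and then a separate `Word32_sub (X, 1)`-style operation. The fused form takes both from a single `__builtin_sub_overflow` call.

```c
#include <stdbool.h>
#include <stdint.h>

/* Unfused shape: the overflow predicate and the result are two
   separate computations, and the compiler must notice that they
   can share one subtraction. */
static bool dec_unfused(int32_t x, int32_t *out) {
    int32_t scratch;
    if (__builtin_sub_overflow(x, 1, &scratch)) { /* check only */
        return false;                             /* Overflow path */
    }
    *out = (int32_t)((uint32_t)x - 1u);           /* second subtraction */
    return true;
}

/* Fused shape: one builtin call yields both the result and the flag. */
static bool dec_fused(int32_t x, int32_t *out) {
    int32_t r;
    if (__builtin_sub_overflow(x, 1, &r)) {
        return false;                             /* Overflow path */
    }
    *out = r;
    return true;
}
```

Both functions compute the same values. The difference is only in how much redundancy the C compiler has to remove on its own.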

Updates to C and LLVM codegens. Highlights:

* Add `Machine.Program.rflow` to compute `{returns,raises}To` control flow (654c557) and use in `functor Chunkify` (1b3b7b8) and in Machine IR `Raise/Return` transfers (cf8e487).
* Add `-chunk-jump-table {false|true}` compile-time option to force generation of a jump table for the chunk switch (8e0dd2d, 5b6439b, 087a5b1).
* Add `-chunk-{{must,may}-rto-self,must-rto-sing,must-rto-other}-opt` compile-time options to optimize return/raise transfers (7c10c70, 4d5abde, 4b7c649, c3b9905, 473808f).
* Experiment using LLVM's `cc10` (aka, `ghccc`) calling convention (2e26ebd).
* Experiment with a new `simple` chunkify strategy (3330cbe, 3d9c499, 138512f, faef164, d1df0de); generally performs about the same as `coalesce4096`, significantly improves `fib` and `tak` (for GCC), slightly improves `hamlet`, but slightly worsens `raytrace`:

  config  command
  C04     /home/mtf/devel/mlton/builds/g098009d49/bin/mlton -codegen c -cc gcc-9
  C05     /home/mtf/devel/mlton/builds/g098009d49/bin/mlton -codegen c -cc gcc-9 -chunkify simple
  C09     /home/mtf/devel/mlton/builds/g098009d49/bin/mlton -codegen llvm -cc clang
  C10     /home/mtf/devel/mlton/builds/g098009d49/bin/mlton -codegen llvm -cc clang -chunkify simple

  task_clock ratio_means.fieller@0.95 (2-level)
  program            C05/C04  C10/C09
  barnes-hut         0.9978   0.9589
  boyer              1.064    1.076
  checksum           1.051    0.9775
  count-graphs       1.005    0.9876
  DLXSimulator       1.000    0.9905
  even-odd           1.037    0.9989
  fft                0.9616   0.9537
  fib                0.6689   0.6260
  flat-array         1.000    0.9645
  hamlet             0.9547   0.9322
  imp-for            1.067    1.014
  knuth-bendix       1.092    1.031
  lexgen             1.031    1.078
  life               1.002    0.9911
  logic              1.016    1.015
  mandelbrot         0.9776   1.030
  matrix-multiply    0.9903   0.9844
  md5                1.008    0.9940
  merge              0.9927   1.062
  mlyacc             0.9810   1.024
  model-elimination  0.9877   0.9743
  mpuz               1.011    1.010
  nucleic            1.036    1.030
  output1            0.9943   1.021
  peek               1.036    1.027
  pidigits           1.000    0.9653
  psdes-random       1.009    1.014
  ratio-regions      0.9985   0.9881
  ray                0.9738   0.9601
  raytrace           1.101    1.100
  simple             0.9620   0.9272
  smith-normal-form  0.9690   0.9806
  string-concat      0.9610   0.9772
  tailfib            1.006    0.9292
  tailmerge          0.9847   1.023
  tak                0.8264   1.013
  tensor             1.010    0.9998
  tsp                0.9981   1.010
  tyan               1.045    1.027
  vector-rev         1.012    0.9891
  vector32-concat    0.9495   1.030
  vector64-concat    1.098    0.9744
  vliw               0.9413   1.019
  wc-input1          0.9301   1.098
  wc-scanStream      1.114    0.9234
  zebra              1.008    1.001
  zern               0.9819   1.014
  MIN                0.6689   0.6260
  GMEAN              0.9940   0.9912
  MAX                1.114    1.100

  The `simple` chunkify strategy is not (yet) suitable for a self-compile; it can generate excessively large chunks, including one for a self-compile that requires 8 min to compile by `gcc`.
* Add `expect: WordX.t option` to RSSA and Machine `Switch.T` (911b5d4, e2b27ab, 695320d) and add `-gc-expect {none|false|true}` compile-time option, where `-gc-expect false` should indicate that performing a GC is cold path (823815a); no notable performance impact.
* Lots of tweaks to C codegen, ultimately eliminating almost all `c-chunk.h` macros.
* Eliminate unused `Machine.Operand.Contents` constructor (006269b).
* Make a major refactoring of LLVM codegen (cec30c5).
* Implement `Real<N>_qequal` for C codegen (9b7b2bd) and use `is{less,lessequal}` for `Real<N>_l{t,e}` for C codegen (7b55819).
* Generalize LLVM type-based alias-analysis (27709ef).
* Add `-llvm-aamd scope` for simple `noalias`/`alias.scope` alias-analysis metadata in LLVM codegen (b825f56); no notable performance impact.
* Use C99/C11 `inline` for primitive and Basis Library functions (311331c, c864492, 4f2d213).
* Add `-codegen-fuse-op-and-chk {false|true}` compile-time option to explicitly fuse adjacent `Word<N>_<op>` and `Word{S,U}<N>_<op>CheckP` primitives in the C and LLVM codegens (6b738b8, 3d1e89c, 68f8512, 82c019f, 61de560, 5363199, 0d46a85). It appears that GCC (and, to a lesser extent, Clang/LLVM) do not always successfully fuse adjacent `Word<N>_<op>` and `Word{S,U}<N>_<op>CheckP` primitives. The performance results reported at #273 and #292 suggest that this does not always have a significant impact, but sometimes `-codegen-fuse-op-and-chk true` can have a positive impact. Unfortunately, it can also have a (significant) negative impact. In `matrix-multiply` and `vector-rev`, fusing can cause GCC to not recognize that an explicit sequence index can be replaced by a stride length; in these benchmarks, it would be nice if MLton eliminated the overflow checks.

  config  command
  C04     /home/mtf/devel/mlton/builds/g098009d49/bin/mlton -codegen c -cc gcc-9
  C09     /home/mtf/devel/mlton/builds/g098009d49/bin/mlton -codegen llvm -cc clang
  C11     /home/mtf/devel/mlton/builds/g098009d49/bin/mlton -codegen c -cc gcc-9 -codegen-fuse-op-and-chk true
  C15     /home/mtf/devel/mlton/builds/g098009d49/bin/mlton -codegen llvm -cc clang -codegen-fuse-op-and-chk true

  task_clock ratio_means.fieller@0.95 (2-level)
  program            C11/C04  C15/C09
  barnes-hut         1.005    0.9925
  boyer              1.052    1.013
  checksum           1.022    1.028
  count-graphs       0.9722   1.002
  DLXSimulator       1.004    0.9959
  even-odd           0.8768   1.003
  fft                0.9592   1.016
  fib                0.9732   0.9798
  flat-array         0.8148   1.019
  hamlet             0.9966   1.030
  imp-for            0.8993   0.7985
  knuth-bendix       1.008    1.013
  lexgen             0.9851   1.043
  life               0.9954   1.006
  logic              0.9994   1.014
  mandelbrot         0.9440   1.013
  matrix-multiply    1.336    1.009
  md5                0.9604   1.007
  merge              0.9675   1.037
  mlyacc             1.032    1.029
  model-elimination  1.010    1.004
  mpuz               1.035    0.9599
  nucleic            0.9938   0.9983
  output1            0.9278   0.9709
  peek               0.9850   1.035
  pidigits           0.9702   0.9538
  psdes-random       1.017    0.9986
  ratio-regions      0.9801   0.9887
  ray                0.9795   1.009
  raytrace           0.9959   1.026
  simple             0.9764   1.010
  smith-normal-form  1.002    1.049
  string-concat      0.7919   0.9035
  tailfib            1.030    1.227
  tailmerge          1.017    0.9980
  tak                0.9790   0.9988
  tensor             0.5258   1.000
  tsp                0.9845   1.013
  tyan               1.019    0.9739
  vector-rev         1.178    1.253
  vector32-concat    0.8703   0.9230
  vector64-concat    0.8906   0.9038
  vliw               0.9921   1.044
  wc-input1          1.060    0.9809
  wc-scanStream      0.9166   1.040
  zebra              1.008    1.020
  zern               1.051    1.089
  MIN                0.5258   0.7985
  GMEAN              0.9720   1.007
  MAX                1.336    1.253

  Note: the issue with `md5` mentioned in the commit messages is with respect to the `md5` benchmark before 2daaebf.

Overall, this simplifies the C and LLVM codegen slightly, although there is little significant performance change:

config  command
C02     /home/mtf/devel/mlton/builds/g89891a411/bin/mlton -codegen c -cc gcc-9
C04     /home/mtf/devel/mlton/builds/g098009d49/bin/mlton -codegen c -cc gcc-9
C08     /home/mtf/devel/mlton/builds/g89891a411/bin/mlton -codegen llvm -cc clang
C09     /home/mtf/devel/mlton/builds/g098009d49/bin/mlton -codegen llvm -cc clang

task_clock ratio_means.fieller@0.95 (2-level)
program            C04/C02  C09/C08
barnes-hut         1.036    1.025
boyer              0.9731   1.006
checksum           0.9652   1.002
count-graphs       0.9988   0.9964
DLXSimulator       0.9970   1.023
even-odd           1.002    0.9881
fft                1.026    0.9674
fib                0.9034   0.7846
flat-array         1.014    1.021
hamlet             0.9740   1.010
imp-for            0.9707   0.9908
knuth-bendix       0.9077   0.9777
lexgen             1.048    0.8985
life               1.002    0.9827
logic              1.006    0.9867
mandelbrot         1.000    1.011
matrix-multiply    1.020    0.9957
md5                0.9700   0.9960
merge              0.9974   0.9818
mlyacc             1.003    0.9824
model-elimination  0.9936   0.9817
mpuz               0.9815   0.9466
nucleic            0.9946   1.002
output1            1.007    1.026
peek               0.9832   0.9898
pidigits           0.9950   1.047
psdes-random       1.009    0.9869
ratio-regions      0.9978   0.9725
ray                0.9938   0.9663
raytrace           0.9975   1.032
simple             0.9936   1.000
smith-normal-form  1.038    0.9941
string-concat      1.041    1.014
tailfib            0.9865   0.9741
tailmerge          1.010    1.020
tak                0.9331   0.9041
tensor             0.9938   0.9941
tsp                0.9825   1.004
tyan               0.9960   0.9879
vector-rev         1.014    0.9091
vector32-concat    1.090    0.9016
vector64-concat    0.9994   0.9800
vliw               0.9995   0.9876
wc-input1          0.9685   0.8634
wc-scanStream      1.178    1.105
zebra              0.9857   0.9900
zern               0.9733   0.9890
MIN                0.9034   0.7846
GMEAN              0.9982   0.9815
MAX                1.178    1.105
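The highlights above mention using `is{less,lessequal}` for `Real<N>_l{t,e}` in the C codegen. `isless` and `islessequal` are the standard C99 `<math.h>` comparison macros, which return false for unordered (NaN) operands without raising the invalid floating-point exception. A minimal sketch for the 64-bit case follows. The function names are hypothetical, and how MLton's own headers define these primitives is not shown in this thread.

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical 64-bit "less than": false if either operand is NaN. */
static inline bool real64_lt_sketch(double x, double y) {
    return isless(x, y);
}

/* Hypothetical 64-bit "less than or equal": false if either operand is NaN. */
static inline bool real64_le_sketch(double x, double y) {
    return islessequal(x, y);
}
```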

daemanos and others added 19 commits February 25, 2019 11:27

Remove old overflow definitions

0e92585

Export *CheckP primitive entries

95ab863

Refactor limit-check to use new overflow checks

c6320de

Deprecate internal references to old overflow prims

289a42a

Correct new implementation of limit-check

54bdda5

Remove newOverflow controls from basis library

2fe9079

Add equality for Word_*CheckP primitives

58f9e18

Deprecate SSA(2) Arith transfers

9c5082f

Deprecate backend/codegen Arith transfers

ab1dd1e

Clean up unused warnings

13f7eac

Correct type error in limit-check

5ff131d

Remove old special-case overflow handling

ed090de

Remove deprecated code sections

6bc666c

Remove superfluous macros from C backend

db12ffa

Remove redundant arith bookkeeping from loop-unroll

5d88ac0

Remove ApplyResult.Overflow constructor

11c33d8

Remove deprecated code sections

4d22c60

Use Overflow rather than Exn.Overflow

ddc04c3

With the `Prim.ApplyResult.Overflow` constructor removed, there is no longer any confusion with the pervasive `Overflow` exception.

Consolidate Word<N>_<op> and Word<N>_<op>CheckP in Word-ops.h

9d251db

MatthewFluet merged commit e7b6276 into MLton:master Mar 21, 2019

MatthewFluet mentioned this pull request Nov 22, 2019

C and LLVM codegen updates #351

Merged
