Fix all regressions relative to v1.1 #116
Conversation
Note that I tried to keep this PR minimal, applying only changes that had a significant impact on the benchmarks. That means, for example, that I didn't copy over the
Ready for review.
Great work :)
* Make bitmask-with-rejection non-recursive
* INLINE some uniformRM implementations
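The bitmask-with-rejection technique named in the first bullet can be sketched roughly as follows. This is an illustrative sketch, not the PR's actual code: `nextWord64`, `toMask`, and `uniformBounded` are made-up names, and the toy LCG only stands in for the library's real generator so the example is self-contained.

```haskell
import Data.Bits ((.&.), (.|.), shiftR)
import Data.Word (Word64)

-- Toy LCG standing in for the library's real generator step;
-- only here so the sketch is self-contained and runnable.
nextWord64 :: Word64 -> (Word64, Word64)
nextWord64 s =
  let s' = s * 6364136223846793005 + 1442695040888963407
  in (s', s')

-- Smallest mask of the form 2^k - 1 that covers the bound.
toMask :: Word64 -> Word64
toMask w0 =
  let a = w0 .|. (w0 `shiftR` 1)
      b = a  .|. (a  `shiftR` 2)
      c = b  .|. (b  `shiftR` 4)
      d = c  .|. (c  `shiftR` 8)
      e = d  .|. (d  `shiftR` 16)
  in e .|. (e `shiftR` 32)

-- Bitmask-with-rejection: draw a word, mask it down to the
-- covering power-of-two range, retry on overshoot. The retry
-- loop lives in the local 'go' worker, so the top-level binding
-- itself is non-recursive and GHC remains free to inline it.
uniformBounded :: Word64 -> Word64 -> (Word64, Word64)
uniformBounded bound s0 = go s0
  where
    m = toMask bound
    go s =
      let (w, s') = nextWord64 s
          w' = w .&. m
      in if w' > bound then go s' else (w', s')
{-# INLINE uniformBounded #-}

main :: IO ()
main = do
  let (v, _) = uniformBounded 10 42
  print v  -- some value in [0, 10]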
Improve uniform `ShortByteString`
This combines changes to the bitmask-rejection methods from #103 with a few benchmark-driven INLINE pragmas.

Context
In interface-to-performance right now, there are performance regressions relative to v1.1.

This PR: no more regressions relative to v1.1

This PR speeds up the slower generators such that every benchmarked function runs faster than on v1.1 by 1400% or more.
Comparison between v1.1 (reference) and this branch:
Some regressions relative to interface-to-performance

Strangely, while speeding up many generator functions significantly, this PR introduces some mild regressions relative to interface-to-performance.

Comparison between interface-to-performance (reference) and this branch:

As an additional observation, adding one more INLINE, e.g. for Int8's uniformRM, actually makes nextWord32 and nextWord64 slower! My guess is that GHC has some sort of "inlining budget", and that adding another INLINE pragma somewhere leads to less inlining elsewhere. For this reason, I've only added INLINE pragmas where they improved the benchmark results significantly, and removed them where they led to slower generated code.

Conclusion
While I can't fully explain the non-local inlining effects I observed, this PR does objectively remove all regressions we previously had relative to v1.1, so I suggest we merge.
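For concreteness, the kind of INLINE placement discussed above looks like this. Everything here is illustrative and simplified, not the library's actual uniformRM instance: `step` is a toy generator, `uniformInt8R` is a made-up name, and the modulo-based sampling is slightly biased; the point is only where the pragma goes.

```haskell
import Data.Int (Int8)
import Data.Word (Word64)

-- Toy generator step; a stand-in for the real PRNG.
step :: Word64 -> (Word64, Word64)
step s =
  let s' = s * 6364136223846793005 + 1442695040888963407
  in (s', s')
{-# INLINE step #-}

-- Simplified range sampler for Int8. The INLINE pragma lets
-- GHC specialise and unbox the sampler at each call site --
-- but, per the observation above, each pragma also consumes
-- inlining headroom, so placement should be benchmark-driven.
uniformInt8R :: (Int8, Int8) -> Word64 -> (Int8, Word64)
uniformInt8R (lo, hi) s =
  let (w, s')  = step s
      range    = fromIntegral hi - fromIntegral lo + 1 :: Word64
      offset   = fromIntegral (w `mod` range) :: Int8
  in (lo + offset, s')
{-# INLINE uniformInt8R #-}

main :: IO ()
main = do
  let (v, _) = uniformInt8R (-5, 5) 12345
  print v  -- some value in [-5, 5]
```

The wraparound arithmetic on `range` and `offset` works out modularly even for the full Int8 range, which keeps the sketch small enough that inlining it is plausible in the first place.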