Fix all regressions relative to v1.1 #116

Merged
merged 2 commits into from
Apr 29, 2020

curiousleo
Collaborator

@curiousleo curiousleo commented Apr 29, 2020

This combines changes to the bitmask-rejection methods from #103 with a few benchmark-driven INLINE pragmas.
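
For readers unfamiliar with the technique: bitmask with rejection draws a random word, masks it down to the smallest power-of-two range that covers the upper bound, and retries until the value falls within bounds. The following is a minimal Python sketch of the loop-based (non-recursive) form. It is not the library's Haskell implementation, and the names `bitmask_with_rejection` and `next_word64` are hypothetical.

```python
import random


def bitmask_with_rejection(upper, next_word64):
    """Return a uniform integer in [0, upper] (inclusive), for 0 <= upper < 2**64.

    `next_word64` is an assumed callable that returns a uniform 64-bit integer.
    """
    # Smallest mask of the form 2**k - 1 that covers `upper`.
    mask = (1 << upper.bit_length()) - 1
    # A loop instead of recursion: draw, mask, and retry until the value fits.
    while True:
        candidate = next_word64() & mask
        if candidate <= upper:
            return candidate


rng = random.Random(42)
samples = [bitmask_with_rejection(5, lambda: rng.getrandbits(64)) for _ in range(1000)]
```

Because the mask is at most one bit wider than needed, each draw is accepted with probability greater than one half, so the expected number of draws per sample is below two.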

Context

The interface-to-performance branch currently has performance regressions relative to v1.1, all in uniformR for 8- and 16-bit integral types:

$ ./scripts/compare.py benchmarks-backport-9f3e7f6.csv interface-to-performance-0d51fa4.csv 
SLOWER
                              Name  Mean_ref  Mean_res Diff_rel
           pure/uniformR/full/Int8  0.018059  0.029977     -40%
          pure/uniformR/full/Int16  0.019081  0.030798     -38%
          pure/uniformR/full/CChar  0.019026  0.029913     -36%
         pure/uniformR/full/CSChar  0.018037  0.031495     -43%
         pure/uniformR/full/CShort  0.019681  0.031254     -37%
        pure/uniformR/full/CUShort  0.017602  0.026783     -34%
     pure/uniformR/excludeMax/Int8  0.019803  0.030902     -36%
    pure/uniformR/excludeMax/Int16  0.019031  0.031986     -41%
    pure/uniformR/excludeMax/CChar  0.019136  0.031197     -39%
   pure/uniformR/excludeMax/CSChar  0.018025  0.030840     -42%
   pure/uniformR/excludeMax/CShort  0.018187  0.031697     -43%
  pure/uniformR/excludeMax/CUShort  0.017693  0.026705     -34%
    pure/uniformR/includeHalf/Int8  0.019642  0.033753     -42%
   pure/uniformR/includeHalf/Int16  0.018021  0.034185     -47%
   pure/uniformR/includeHalf/CChar  0.018114  0.033438     -46%
  pure/uniformR/includeHalf/CSChar  0.018453  0.034654     -47%
  pure/uniformR/includeHalf/CShort  0.019080  0.033611     -43%
 pure/uniformR/includeHalf/CUShort  0.018539  0.026329     -30%


FASTER
[all other generator functions are faster]
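
The Diff_rel column is not defined in this thread. The figures above are consistent with `Mean_ref / Mean_res - 1`, shown as a percentage. The sketch below encodes that reading; it is an assumption inferred from the table rows, not the actual code of `scripts/compare.py`, and `diff_rel` is a hypothetical name.

```python
def diff_rel(mean_ref: float, mean_res: float) -> float:
    """Relative difference of the reference mean over the result mean.

    Assumed formula (inferred from the tables, not taken from compare.py):
    a positive value means the result is faster than the reference.
    """
    return mean_ref / mean_res - 1.0


# pure/uniformR/full/Int8, v1.1 (reference) vs interface-to-performance (result)
print(f"{diff_rel(0.018059, 0.029977):.0%}")  # prints -40%
```

Under this reading, -40% means the reference needs about 40% less time than the result, so the result takes roughly 1.66 times as long.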

This PR: no more regressions relative to v1.1

This PR speeds up the slower generators so that every benchmarked function now runs at least 1,400% faster than on v1.1. The smallest improvement is 1,406%, for pure/uniformR/includeHalf/Int8.

Comparison between v1.1 (reference) and this branch:

$ ./scripts/compare.py benchmarks-backport-9f3e7f6.csv fix-signed-regression-8c9c14c.csv 
SLOWER: none


FASTER
                                 Name  Mean_ref  Mean_res Diff_rel
                    pure/random/Float  0.027694  0.000368   7,416%
                   pure/random/Double  0.050627  0.000370  13,581%
                  pure/random/Integer  0.042463  0.000410  10,262%
             pure/uniformR/full/Word8  0.018167  0.000028  64,198%
            pure/uniformR/full/Word16  0.017675  0.000028  62,869%
            pure/uniformR/full/Word32  0.027825  0.000028  98,267%
            pure/uniformR/full/Word64  0.051293  0.000029 178,944%
              pure/uniformR/full/Word  0.054110  0.000029 187,886%
              pure/uniformR/full/Int8  0.018059  0.000776   2,228%
             pure/uniformR/full/Int16  0.019081  0.000466   3,992%
             pure/uniformR/full/Int32  0.030808  0.000451   6,738%
             pure/uniformR/full/Int64  0.053720  0.000029 182,988%
               pure/uniformR/full/Int  0.054235  0.000029 188,930%
              pure/uniformR/full/Char  0.018840  0.000172  10,882%
              pure/uniformR/full/Bool  0.018606  0.000029  63,078%
             pure/uniformR/full/CChar  0.019026  0.000654   2,811%
            pure/uniformR/full/CSChar  0.018037  0.000693   2,502%
            pure/uniformR/full/CUChar  0.018277  0.000029  63,573%
            pure/uniformR/full/CShort  0.019681  0.000398   4,843%
           pure/uniformR/full/CUShort  0.017602  0.000028  63,312%
              pure/uniformR/full/CInt  0.030277  0.000421   7,085%
             pure/uniformR/full/CUInt  0.028652  0.000028 102,453%
             pure/uniformR/full/CLong  0.053663  0.000028 191,645%
            pure/uniformR/full/CULong  0.050913  0.000028 181,127%
          pure/uniformR/full/CPtrdiff  0.056398  0.000029 194,373%
             pure/uniformR/full/CSize  0.053483  0.000029 186,379%
            pure/uniformR/full/CWchar  0.030157  0.000440   6,757%
        pure/uniformR/full/CSigAtomic  0.029044  0.000423   6,771%
            pure/uniformR/full/CLLong  0.051786  0.000057  90,101%
           pure/uniformR/full/CULLong  0.052079  0.000029 180,202%
           pure/uniformR/full/CIntPtr  0.055634  0.000029 193,886%
          pure/uniformR/full/CUIntPtr  0.057014  0.000029 195,080%
           pure/uniformR/full/CIntMax  0.060248  0.000029 205,908%
          pure/uniformR/full/CUIntMax  0.056299  0.000029 193,388%
       pure/uniformR/excludeMax/Word8  0.018251  0.000138  13,125%
      pure/uniformR/excludeMax/Word16  0.019688  0.000179  10,924%
      pure/uniformR/excludeMax/Word32  0.032734  0.000172  18,973%
      pure/uniformR/excludeMax/Word64  0.052261  0.000406  12,785%
        pure/uniformR/excludeMax/Word  0.055847  0.000358  15,510%
        pure/uniformR/excludeMax/Int8  0.019803  0.000732   2,605%
       pure/uniformR/excludeMax/Int16  0.019031  0.000486   3,817%
       pure/uniformR/excludeMax/Int32  0.030451  0.000431   6,957%
       pure/uniformR/excludeMax/Int64  0.051712  0.000393  13,043%
         pure/uniformR/excludeMax/Int  0.049465  0.000348  14,099%
        pure/uniformR/excludeMax/Char  0.017772  0.000181   9,745%
        pure/uniformR/excludeMax/Bool  0.016241  0.000062  26,091%
       pure/uniformR/excludeMax/CChar  0.019136  0.000747   2,461%
      pure/uniformR/excludeMax/CSChar  0.018025  0.000780   2,211%
      pure/uniformR/excludeMax/CUChar  0.018654  0.000147  12,629%
      pure/uniformR/excludeMax/CShort  0.018187  0.000407   4,369%
     pure/uniformR/excludeMax/CUShort  0.017693  0.000140  12,535%
        pure/uniformR/excludeMax/CInt  0.028158  0.000433   6,400%
       pure/uniformR/excludeMax/CUInt  0.027917  0.000171  16,265%
       pure/uniformR/excludeMax/CLong  0.050065  0.000421  11,788%
      pure/uniformR/excludeMax/CULong  0.047202  0.000347  13,509%
    pure/uniformR/excludeMax/CPtrdiff  0.051724  0.000426  12,033%
       pure/uniformR/excludeMax/CSize  0.048485  0.000322  14,974%
      pure/uniformR/excludeMax/CWchar  0.028712  0.000475   5,951%
  pure/uniformR/excludeMax/CSigAtomic  0.028415  0.000383   7,325%
      pure/uniformR/excludeMax/CLLong  0.051382  0.000380  13,427%
     pure/uniformR/excludeMax/CULLong  0.051246  0.000324  15,706%
     pure/uniformR/excludeMax/CIntPtr  0.050008  0.000377  13,163%
    pure/uniformR/excludeMax/CUIntPtr  0.050295  0.000354  14,104%
     pure/uniformR/excludeMax/CIntMax  0.052455  0.000412  12,641%
    pure/uniformR/excludeMax/CUIntMax  0.047624  0.000364  12,999%
      pure/uniformR/includeHalf/Word8  0.018976  0.000149  12,659%
     pure/uniformR/includeHalf/Word16  0.018292  0.000250   7,216%
     pure/uniformR/includeHalf/Word32  0.030351  0.001203   2,423%
     pure/uniformR/includeHalf/Word64  0.046397  0.001419   3,170%
       pure/uniformR/includeHalf/Word  0.050661  0.001341   3,679%
       pure/uniformR/includeHalf/Int8  0.019642  0.001304   1,406%
      pure/uniformR/includeHalf/Int16  0.018021  0.000778   2,216%
      pure/uniformR/includeHalf/Int32  0.028413  0.000639   4,345%
      pure/uniformR/includeHalf/Int64  0.048155  0.000686   6,925%
        pure/uniformR/includeHalf/Int  0.050257  0.000694   7,140%
       pure/uniformR/includeHalf/Char  0.019314  0.000177  10,819%
       pure/uniformR/includeHalf/Bool  0.017146  0.000056  30,693%
      pure/uniformR/includeHalf/CChar  0.018114  0.001044   1,635%
     pure/uniformR/includeHalf/CSChar  0.018453  0.001134   1,527%
     pure/uniformR/includeHalf/CUChar  0.017750  0.000155  11,353%
     pure/uniformR/includeHalf/CShort  0.019080  0.000715   2,568%
    pure/uniformR/includeHalf/CUShort  0.018539  0.000152  12,098%
       pure/uniformR/includeHalf/CInt  0.029964  0.000671   4,362%
      pure/uniformR/includeHalf/CUInt  0.030343  0.001210   2,407%
      pure/uniformR/includeHalf/CLong  0.047799  0.000651   7,243%
     pure/uniformR/includeHalf/CULong  0.046741  0.001368   3,317%
   pure/uniformR/includeHalf/CPtrdiff  0.051961  0.000586   8,774%
      pure/uniformR/includeHalf/CSize  0.048618  0.001409   3,350%
     pure/uniformR/includeHalf/CWchar  0.030430  0.000679   4,380%
 pure/uniformR/includeHalf/CSigAtomic  0.029380  0.000607   4,743%
     pure/uniformR/includeHalf/CLLong  0.051833  0.000593   8,637%
    pure/uniformR/includeHalf/CULLong  0.048065  0.001353   3,453%
    pure/uniformR/includeHalf/CIntPtr  0.048133  0.000724   6,544%
   pure/uniformR/includeHalf/CUIntPtr  0.048777  0.001378   3,441%
    pure/uniformR/includeHalf/CIntMax  0.051089  0.000584   8,652%
   pure/uniformR/includeHalf/CUIntMax  0.048438  0.001354   3,478%
        pure/uniformR/unbounded/Float  0.057696  0.000428  13,372%
       pure/uniformR/unbounded/Double  0.079544  0.000383  20,671%

Some regressions relative to interface-to-performance

Strangely, while speeding up many generator functions significantly, this PR introduces some mild regressions relative to interface-to-performance.

Comparison between interface-to-performance (reference) and this branch:

$ ./scripts/compare.py interface-to-performance-0d51fa4.csv fix-signed-regression-8c9c14c.csv 
SLOWER
                             Name  Mean_ref  Mean_res Diff_rel
   pure/uniformR/excludeMax/CBool  0.000140  0.000180     -22%
 pure/uniformR/includeHalf/Word16  0.000134  0.000250     -47%
  pure/uniformR/includeHalf/CBool  0.000145  0.000242     -40%
    pure/uniformR/unbounded/Float  0.000341  0.000428     -20%


FASTER
                                 Name  Mean_ref  Mean_res Diff_rel
            pure/uniformR/full/Word64  0.001532  0.000029   5,248%
              pure/uniformR/full/Word  0.001574  0.000029   5,369%
              pure/uniformR/full/Int8  0.029977  0.000776   3,765%
             pure/uniformR/full/Int16  0.030798  0.000466   6,505%
             pure/uniformR/full/Int32  0.030131  0.000451   6,588%
             pure/uniformR/full/Int64  0.030030  0.000029 102,248%
               pure/uniformR/full/Int  0.001336  0.000029   4,555%
              pure/uniformR/full/Char  0.000220  0.000172      28%
             pure/uniformR/full/CChar  0.029913  0.000654   4,476%
            pure/uniformR/full/CSChar  0.031495  0.000693   4,444%
            pure/uniformR/full/CShort  0.031254  0.000398   7,750%
           pure/uniformR/full/CUShort  0.026783  0.000028  96,385%
              pure/uniformR/full/CInt  0.031104  0.000421   7,281%
             pure/uniformR/full/CUInt  0.019249  0.000028  68,797%
             pure/uniformR/full/CLong  0.029671  0.000028 105,917%
            pure/uniformR/full/CULong  0.001618  0.000028   5,659%
          pure/uniformR/full/CPtrdiff  0.030656  0.000029 105,606%
             pure/uniformR/full/CSize  0.001671  0.000029   5,727%
            pure/uniformR/full/CWchar  0.029691  0.000440   6,651%
        pure/uniformR/full/CSigAtomic  0.029769  0.000423   6,943%
            pure/uniformR/full/CLLong  0.028358  0.000057  49,295%
           pure/uniformR/full/CULLong  0.001726  0.000029   5,874%
           pure/uniformR/full/CIntPtr  0.029031  0.000029 101,126%
          pure/uniformR/full/CUIntPtr  0.001635  0.000029   5,497%
           pure/uniformR/full/CIntMax  0.030994  0.000029 105,878%
          pure/uniformR/full/CUIntMax  0.001664  0.000029   5,620%
      pure/uniformR/excludeMax/Word64  0.001602  0.000406     295%
        pure/uniformR/excludeMax/Word  0.001748  0.000358     389%
        pure/uniformR/excludeMax/Int8  0.030902  0.000732   4,121%
       pure/uniformR/excludeMax/Int16  0.031986  0.000486   6,483%
       pure/uniformR/excludeMax/Int32  0.030413  0.000431   6,948%
       pure/uniformR/excludeMax/Int64  0.028208  0.000393   7,069%
         pure/uniformR/excludeMax/Int  0.001446  0.000348     315%
       pure/uniformR/excludeMax/CChar  0.031197  0.000747   4,075%
      pure/uniformR/excludeMax/CSChar  0.030840  0.000780   3,855%
      pure/uniformR/excludeMax/CShort  0.031697  0.000407   7,688%
     pure/uniformR/excludeMax/CUShort  0.026705  0.000140  18,971%
        pure/uniformR/excludeMax/CInt  0.029532  0.000433   6,718%
       pure/uniformR/excludeMax/CUInt  0.024994  0.000171  14,552%
       pure/uniformR/excludeMax/CLong  0.030473  0.000421   7,136%
      pure/uniformR/excludeMax/CULong  0.001598  0.000347     361%
    pure/uniformR/excludeMax/CPtrdiff  0.027797  0.000426   6,420%
       pure/uniformR/excludeMax/CSize  0.001746  0.000322     443%
      pure/uniformR/excludeMax/CWchar  0.029053  0.000475   6,023%
  pure/uniformR/excludeMax/CSigAtomic  0.031058  0.000383   8,016%
      pure/uniformR/excludeMax/CLLong  0.030359  0.000380   7,892%
     pure/uniformR/excludeMax/CULLong  0.001687  0.000324     420%
     pure/uniformR/excludeMax/CIntPtr  0.029798  0.000377   7,803%
    pure/uniformR/excludeMax/CUIntPtr  0.001693  0.000354     378%
     pure/uniformR/excludeMax/CIntMax  0.030259  0.000412   7,250%
    pure/uniformR/excludeMax/CUIntMax  0.001704  0.000364     369%
     pure/uniformR/includeHalf/Word64  0.002790  0.001419      97%
       pure/uniformR/includeHalf/Word  0.002990  0.001341     123%
       pure/uniformR/includeHalf/Int8  0.033753  0.001304   2,488%
      pure/uniformR/includeHalf/Int16  0.034185  0.000778   4,294%
      pure/uniformR/includeHalf/Int32  0.034772  0.000639   5,340%
      pure/uniformR/includeHalf/Int64  0.031111  0.000686   4,438%
        pure/uniformR/includeHalf/Int  0.001745  0.000694     151%
      pure/uniformR/includeHalf/CChar  0.033438  0.001044   3,103%
     pure/uniformR/includeHalf/CSChar  0.034654  0.001134   2,955%
     pure/uniformR/includeHalf/CShort  0.033611  0.000715   4,599%
    pure/uniformR/includeHalf/CUShort  0.026329  0.000152  17,224%
       pure/uniformR/includeHalf/CInt  0.035288  0.000671   5,155%
      pure/uniformR/includeHalf/CUInt  0.032928  0.001210   2,620%
      pure/uniformR/includeHalf/CLong  0.031242  0.000651   4,699%
     pure/uniformR/includeHalf/CULong  0.002857  0.001368     109%
   pure/uniformR/includeHalf/CPtrdiff  0.031445  0.000586   5,270%
      pure/uniformR/includeHalf/CSize  0.002738  0.001409      94%
     pure/uniformR/includeHalf/CWchar  0.033279  0.000679   4,799%
 pure/uniformR/includeHalf/CSigAtomic  0.034464  0.000607   5,581%
     pure/uniformR/includeHalf/CLLong  0.031628  0.000593   5,231%
    pure/uniformR/includeHalf/CULLong  0.002786  0.001353     106%
    pure/uniformR/includeHalf/CIntPtr  0.033423  0.000724   4,514%
   pure/uniformR/includeHalf/CUIntPtr  0.002784  0.001378     102%
    pure/uniformR/includeHalf/CIntMax  0.031373  0.000584   5,274%
   pure/uniformR/includeHalf/CUIntMax  0.002866  0.001354     112%

As an additional observation, adding one more INLINE, e.g. for Int8's uniformRM, actually makes nextWord32 and nextWord64 slower! My guess is that GHC has some sort of "inlining budget", and that adding another INLINE pragma somewhere leads to less inlining elsewhere. For this reason I've only added INLINE pragmas where they improved the benchmark results significantly, and removed them if they led to slower generated code.

Conclusion

While I can't fully explain the non-local effects I observed with respect to inlining, this PR does objectively remove all regressions we previously had relative to v1.1, so I suggest we merge.

@curiousleo
Collaborator Author

Note that I tried to keep this PR minimal and only apply changes that had a significant impact in the benchmarks. That means, for example, that I didn't copy over the coerce changes from #103 because they did not appear to have a significant performance impact. We can of course still make those changes later on, but this PR is meant to be laser-focused on removing any regressions relative to v1.1, and nothing more.

@curiousleo curiousleo changed the title from "Fix regressions relative to v1.1" to "Fix all regressions relative to v1.1" Apr 29, 2020
@curiousleo
Collaborator Author

Ready for review.

Owner

@idontgetoutmuch idontgetoutmuch left a comment

Great work :)

@curiousleo curiousleo merged commit 7f91c2f into interface-to-performance Apr 29, 2020
@curiousleo curiousleo deleted the fix-signed-regression branch April 29, 2020 17:16
@curiousleo curiousleo mentioned this pull request May 5, 2020
curiousleo added a commit that referenced this pull request May 13, 2020
* Make bitmask-with-rejection non-recursive
* INLINE some uniformRM implementations
curiousleo added a commit that referenced this pull request May 13, 2020
* Make bitmask-with-rejection non-recursive
* INLINE some uniformRM implementations
lehins pushed a commit that referenced this pull request May 18, 2020
* Make bitmask-with-rejection non-recursive
* INLINE some uniformRM implementations
curiousleo added a commit that referenced this pull request May 19, 2020
* Make bitmask-with-rejection non-recursive
* INLINE some uniformRM implementations
Shimuuar pushed a commit to Shimuuar/random that referenced this pull request Jan 6, 2025
…hortbytestring

Improve uniform `ShortByteString`