chore: add new compat entry for StableRNGs at version 1 for package docs, (keep existing compat) (#881)

Co-authored-by: CompatHelper Julia <compathelper_noreply@julialang.org>
github-actions[bot] and CompatHelper Julia committed Sep 8, 2024
1 parent abc7057 commit cf99f8b
Showing 1 changed file with 1 addition and 0 deletions.
1 change: 1 addition & 0 deletions docs/Project.toml
@@ -49,6 +49,7 @@ Optimisers = "0.3.3"
 Pkg = "1.10"
 Printf = "1.10"
 Random = "1.10"
+StableRNGs = "1"
 StaticArrays = "1"
 WeightInitializers = "1"
 Zygote = "0.6.70"
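
For readers unfamiliar with `[compat]` entries: under Julia's Pkg rules, a bare entry such as `"1"` is a caret specifier, so `StableRNGs = "1"` admits any version in [1.0.0, 2.0.0), while `Zygote = "0.6.70"` admits [0.6.70, 0.7.0). The sketch below illustrates that rule in Python. The helpers `caret_range` and `allows` are hypothetical names made up for this illustration, not part of Pkg or CompatHelper, and the sketch covers only plain `X`, `X.Y`, and `X.Y.Z` entries.

```python
def caret_range(spec):
    """Return (lower, upper) version tuples for a bare compat entry.

    Illustrative only: handles "X", "X.Y" and "X.Y.Z" (caret semantics).
    Tilde, inequality and hyphen-range specifiers are out of scope.
    """
    parts = [int(p) for p in spec.strip().lstrip("^").split(".")]
    lower = tuple(parts + [0] * (3 - len(parts)))
    # The upper bound bumps the leftmost non-zero component; if every
    # written component is zero, it bumps the last one that was written.
    idx = next((i for i, p in enumerate(parts) if p != 0), len(parts) - 1)
    upper = tuple(
        lower[i] + 1 if i == idx else (lower[i] if i < idx else 0)
        for i in range(3)
    )
    return lower, upper


def allows(spec, version):
    """True if the (major, minor, patch) tuple falls in [lower, upper)."""
    lower, upper = caret_range(spec)
    return lower <= tuple(version) < upper


# Entries taken from the diff above.
for name, spec in [("StableRNGs", "1"), ("Random", "1.10"), ("Zygote", "0.6.70")]:
    lo, hi = caret_range(spec)
    print(f"{name}: {lo} <= v < {hi}")
```

Because the entry is `"1"` rather than a full version, any future 1.x release of StableRNGs satisfies it and only a 2.0 release would call for another compat update.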

1 comment on commit cf99f8b

@github-actions

Lux Benchmarks

Benchmark suite Current: cf99f8b Previous: abc7057 Ratio
Dense(512 => 512, identity)(512 x 128)/forward/CPU/2 thread(s) 409792 ns 412833 ns 0.99
Dense(512 => 512, identity)(512 x 128)/forward/CPU/4 thread(s) 322250 ns 324917 ns 0.99
Dense(512 => 512, identity)(512 x 128)/forward/CPU/8 thread(s) 243583 ns 322791 ns 0.75
Dense(512 => 512, identity)(512 x 128)/forward/CPU/1 thread(s) 739625 ns 741270.5 ns 1.00
Dense(512 => 512, identity)(512 x 128)/forward/GPU/CUDA 44053 ns 44918 ns 0.98
Dense(512 => 512, identity)(512 x 128)/zygote/CPU/2 thread(s) 1353834 ns 1358250 ns 1.00
Dense(512 => 512, identity)(512 x 128)/zygote/CPU/4 thread(s) 2426458 ns 2444062.5 ns 0.99
Dense(512 => 512, identity)(512 x 128)/zygote/CPU/8 thread(s) 16512459 ns 14162791 ns 1.17
Dense(512 => 512, identity)(512 x 128)/zygote/CPU/1 thread(s) 2191083.5 ns 2277500 ns 0.96
Dense(512 => 512, identity)(512 x 128)/zygote/GPU/CUDA 209370 ns 212604 ns 0.98
Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/2 thread(s) 1454375 ns 1450562.5 ns 1.00
Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/4 thread(s) 908458 ns 960958.5 ns 0.95
Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/8 thread(s) 1834875 ns 1778125 ns 1.03
Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/1 thread(s) 2240458.5 ns 2274000 ns 0.99
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) 1748562.5 ns 1767833.5 ns 0.99
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) 1089395.5 ns 1083978.5 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) 1512729 ns 1529021 ns 0.99
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) 3013750 ns 2954750 ns 1.02
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/GPU/CUDA 208817.5 ns 209644 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) 12152041.5 ns 12148854.5 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) 8814875 ns 8834958.5 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) 9198917 ns 9230875 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) 18613479 ns 18631937.5 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/GPU/CUDA 1488013.5 ns 1509941 ns 0.99
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) 17304750 ns 17314333 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) 13952770.5 ns 13961542 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) 14533958 ns 14514291 ns 1.00
Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) 21843833.5 ns 21865437.5 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) 250399541.5 ns 249016958.5 ns 1.01
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) 148350083 ns 148521291 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) 117130083 ns 116073791 ns 1.01
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) 450838083 ns 447568292 ns 1.01
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/GPU/CUDA 5478039 ns 5499808 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) 1223340875 ns 1227795916 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) 931640292 ns 931180042 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) 831594354.5 ns 831332521 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) 1647325416 ns 1629694167 ns 1.01
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/GPU/CUDA 31506744.5 ns 31376705.5 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) 1144335875 ns 1167771625 ns 0.98
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) 995382583.5 ns 1003953563 ns 0.99
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) 1322398292 ns 1322017146 ns 1.00
Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) 1739450208 ns 1730835103.5 ns 1.00
lenet(28, 28, 1, 32)/forward/CPU/2 thread(s) 1068417 ns 1100791 ns 0.97
lenet(28, 28, 1, 32)/forward/CPU/4 thread(s) 1603458.5 ns 1624625 ns 0.99
lenet(28, 28, 1, 32)/forward/CPU/8 thread(s) 3760063 ns 3431229 ns 1.10
lenet(28, 28, 1, 32)/forward/CPU/1 thread(s) 782062 ns 781521 ns 1.00
lenet(28, 28, 1, 32)/forward/GPU/CUDA 261189.5 ns 272287.5 ns 0.96
lenet(28, 28, 1, 32)/zygote/CPU/2 thread(s) 3001979 ns 3015146 ns 1.00
lenet(28, 28, 1, 32)/zygote/CPU/4 thread(s) 4127958 ns 4087333.5 ns 1.01
lenet(28, 28, 1, 32)/zygote/CPU/8 thread(s) 10894833 ns 10933000 ns 1.00
lenet(28, 28, 1, 32)/zygote/CPU/1 thread(s) 3233270.5 ns 3238167 ns 1.00
lenet(28, 28, 1, 32)/zygote/GPU/CUDA 1128601 ns 1132885 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) 2312312.5 ns 2306750 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) 1427541.5 ns 1433208.5 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) 1552396 ns 1678625.5 ns 0.92
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) 4205417 ns 4201375 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/GPU/CUDA 207575 ns 209995 ns 0.99
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) 19386792 ns 19417729 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) 16057458 ns 16114625 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) 17256291 ns 17220375 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) 25860208 ns 25992250 ns 0.99
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/GPU/CUDA 1590086 ns 1600144 ns 0.99
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) 34375666 ns 34149500 ns 1.01
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) 30899458.5 ns 30894937.5 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) 31158000 ns 31140666 ns 1.00
Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) 36246917 ns 36754250 ns 0.99
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) 4546167 ns 4526959 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) 2772584 ns 2746459 ns 1.01
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) 2682438 ns 2911584 ns 0.92
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) 8378667 ns 8399583 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/GPU/CUDA 420456 ns 373956 ns 1.12
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) 38885979.5 ns 38745459 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) 32074313 ns 32111709 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) 32239667 ns 32268625 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) 51823708 ns 52066792 ns 1.00
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/GPU/CUDA 2618884 ns 2635152.5 ns 0.99
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) 82643500 ns 88780729 ns 0.93
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) 112560458 ns 84997250 ns 1.32
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) 185039874.5 ns 218329542 ns 0.85
Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) 73747708 ns 74358917 ns 0.99
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) 268204791.5 ns 267246875 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) 159374708 ns 158965875 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) 123950416.5 ns 126688521 ns 0.98
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) 485039833 ns 485596792 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/GPU/CUDA 7043693 ns 7022210 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) 1468109979 ns 1468898146 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) 1174089583 ns 1171204459 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) 1065212458.5 ns 1068921333.5 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) 2013851104.5 ns 2001229479 ns 1.01
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/GPU/CUDA 34531403 ns 34725068.5 ns 0.99
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) 1695591000 ns 1692415625 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) 1493306146 ns 1500720958.5 ns 1.00
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) 1801755584 ns 1766379833 ns 1.02
Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) 2201440812.5 ns 2224153125 ns 0.99
lenet(28, 28, 1, 128)/forward/CPU/2 thread(s) 1806792 ns 1760875 ns 1.03
lenet(28, 28, 1, 128)/forward/CPU/4 thread(s) 2531562 ns 2595167 ns 0.98
lenet(28, 28, 1, 128)/forward/CPU/8 thread(s) 7672666 ns 7433916.5 ns 1.03
lenet(28, 28, 1, 128)/forward/CPU/1 thread(s) 2462833 ns 2426041.5 ns 1.02
lenet(28, 28, 1, 128)/forward/GPU/CUDA 266951 ns 273792 ns 0.98
lenet(28, 28, 1, 128)/zygote/CPU/2 thread(s) 9343333 ns 9254417 ns 1.01
lenet(28, 28, 1, 128)/zygote/CPU/4 thread(s) 11495750 ns 11474333 ns 1.00
lenet(28, 28, 1, 128)/zygote/CPU/8 thread(s) 26058854.5 ns 25126166 ns 1.04
lenet(28, 28, 1, 128)/zygote/CPU/1 thread(s) 11770625 ns 11780750 ns 1.00
lenet(28, 28, 1, 128)/zygote/GPU/CUDA 1165407 ns 1194908 ns 0.98
vgg16(32, 32, 3, 32)/forward/CPU/2 thread(s) 379821291 ns 381207125 ns 1.00
vgg16(32, 32, 3, 32)/forward/CPU/4 thread(s) 284431333.5 ns 285815709 ns 1.00
vgg16(32, 32, 3, 32)/forward/CPU/8 thread(s) 276993833.5 ns 233745708 ns 1.19
vgg16(32, 32, 3, 32)/forward/CPU/1 thread(s) 453499125 ns 453344667 ns 1.00
vgg16(32, 32, 3, 32)/forward/GPU/CUDA 4933427 ns 4852271 ns 1.02
vgg16(32, 32, 3, 32)/zygote/CPU/2 thread(s) 1154735042 ns 1157427583 ns 1.00
vgg16(32, 32, 3, 32)/zygote/CPU/4 thread(s) 934566458 ns 931406250 ns 1.00
vgg16(32, 32, 3, 32)/zygote/CPU/8 thread(s) 1022641417 ns 929761209 ns 1.10
vgg16(32, 32, 3, 32)/zygote/CPU/1 thread(s) 1392634541 ns 1403593291 ns 0.99
vgg16(32, 32, 3, 32)/zygote/GPU/CUDA 18839648 ns 19807136 ns 0.95
lenet(28, 28, 1, 64)/forward/CPU/2 thread(s) 1047667 ns 1051042 ns 1.00
lenet(28, 28, 1, 64)/forward/CPU/4 thread(s) 1906208 ns 1930834 ns 0.99
lenet(28, 28, 1, 64)/forward/CPU/8 thread(s) 6506020.5 ns 4821271 ns 1.35
lenet(28, 28, 1, 64)/forward/CPU/1 thread(s) 1385270.5 ns 1297541 ns 1.07
lenet(28, 28, 1, 64)/forward/GPU/CUDA 268224 ns 269906 ns 0.99
lenet(28, 28, 1, 64)/zygote/CPU/2 thread(s) 6461437 ns 6495729 ns 0.99
lenet(28, 28, 1, 64)/zygote/CPU/4 thread(s) 13802959 ns 12306583.5 ns 1.12
lenet(28, 28, 1, 64)/zygote/CPU/8 thread(s) 21722625 ns 18165416.5 ns 1.20
lenet(28, 28, 1, 64)/zygote/CPU/1 thread(s) 6091083 ns 6025750 ns 1.01
lenet(28, 28, 1, 64)/zygote/GPU/CUDA 1208321 ns 1207681.5 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) 70468396 ns 70586437.5 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) 43613625 ns 43556333.5 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) 39889875 ns 39526083 ns 1.01
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) 132854895.5 ns 132710667 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/GPU/CUDA 1872456 ns 1944845 ns 0.96
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) 355307875 ns 356816354 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) 270273125 ns 270253083 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) 254197770.5 ns 254146791.5 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) 534390229.5 ns 534914958.5 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/GPU/CUDA 12309296.5 ns 12308008 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) 395284167 ns 396010084 ns 1.00
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) 394804354.5 ns 407805500 ns 0.97
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) 701196333.5 ns 706921292 ns 0.99
Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) 711179875 ns 711811750 ns 1.00
vgg16(32, 32, 3, 128)/forward/CPU/2 thread(s) 1186639833 ns 1187507791 ns 1.00
vgg16(32, 32, 3, 128)/forward/CPU/4 thread(s) 689274542 ns 764568937.5 ns 0.90
vgg16(32, 32, 3, 128)/forward/CPU/8 thread(s) 640237249.5 ns 631341166 ns 1.01
vgg16(32, 32, 3, 128)/forward/CPU/1 thread(s) 1775678646 ns 1772828250 ns 1.00
vgg16(32, 32, 3, 128)/forward/GPU/CUDA 12314528 ns 12544942.5 ns 0.98
vgg16(32, 32, 3, 128)/zygote/CPU/2 thread(s) 3680556646 ns 3767262229 ns 0.98
vgg16(32, 32, 3, 128)/zygote/CPU/4 thread(s) 2857162417 ns 2869944333 ns 1.00
vgg16(32, 32, 3, 128)/zygote/CPU/8 thread(s) 2854405625 ns 2705287250 ns 1.06
vgg16(32, 32, 3, 128)/zygote/CPU/1 thread(s) 5145784083 ns 5058993459 ns 1.02
vgg16(32, 32, 3, 128)/zygote/GPU/CUDA 49808957 ns 49891272 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) 3409479 ns 3429042 ns 0.99
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) 2065084 ns 2081583 ns 0.99
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) 2479917 ns 2543583 ns 0.97
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) 6015479 ns 6024375 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/GPU/CUDA 341120 ns 338827 ns 1.01
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) 25925021 ns 26104562.5 ns 0.99
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) 18915667 ns 19078958.5 ns 0.99
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) 19134125.5 ns 19625020.5 ns 0.97
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) 39216437.5 ns 39317959 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/GPU/CUDA 2468869 ns 2462668 ns 1.00
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) 55378250 ns 54777416 ns 1.01
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) 81111916 ns 80697167 ns 1.01
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) 174313958.5 ns 170440292 ns 1.02
Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) 45500125 ns 45420250 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) 1779417 ns 1787458 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) 1092250 ns 1101875 ns 0.99
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) 1547583 ns 1569708 ns 0.99
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) 3037625 ns 3035500 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/GPU/CUDA 212275 ns 215425 ns 0.99
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) 12533437.5 ns 12537208 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) 9199000 ns 9283500 ns 0.99
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) 9578167 ns 9641937.5 ns 0.99
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) 18975812.5 ns 18984166.5 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/GPU/CUDA 1533549 ns 1531405 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) 17619125 ns 17668583 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) 14239459 ns 14332291.5 ns 0.99
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) 14500521 ns 14569250 ns 1.00
Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) 22180250 ns 22181083.5 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) 70496583.5 ns 70579000.5 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) 43594834 ns 43509167 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) 39807625 ns 39545292 ns 1.01
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) 132718979 ns 132823604.5 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/GPU/CUDA 1947710 ns 1947535 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) 360073791 ns 361581166 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) 345868042 ns 345861541.5 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) 302741792 ns 303584333 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) 725319167 ns 724116959 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/GPU/CUDA 13371028 ns 13351785.5 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) 419555417 ns 419705187.5 ns 1.00
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) 418148437.5 ns 420514459 ns 0.99
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) 710077458.5 ns 697427687 ns 1.02
Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) 715636334 ns 717027625 ns 1.00
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/2 thread(s) 1661042 ns 1700896 ns 0.98
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/4 thread(s) 1277792 ns 1344562.5 ns 0.95
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/8 thread(s) 1134813 ns 1353750 ns 0.84
mlp7layer_bn(gelu)(32 x 256)/forward/CPU/1 thread(s) 2433292 ns 2400417 ns 1.01
mlp7layer_bn(gelu)(32 x 256)/forward/GPU/CUDA 584506.5 ns 590707 ns 0.99
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/2 thread(s) 9020542 ns 8924250 ns 1.01
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/4 thread(s) 12869000 ns 12992208 ns 0.99
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/8 thread(s) 32651417 ns 30772062.5 ns 1.06
mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/1 thread(s) 9805792 ns 9884229.5 ns 0.99
mlp7layer_bn(gelu)(32 x 256)/zygote/GPU/CUDA 1428291 ns 1479651 ns 0.97
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/2 thread(s) 18111583 ns 17441145.5 ns 1.04
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/4 thread(s) 17253354 ns 16807333 ns 1.03
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/8 thread(s) 26535354 ns 30461791.5 ns 0.87
mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/1 thread(s) 14356792 ns 14317375 ns 1.00
Dense(512 => 512, relu)(512 x 128)/forward/CPU/2 thread(s) 710208 ns 789375 ns 0.90
Dense(512 => 512, relu)(512 x 128)/forward/CPU/4 thread(s) 599312.5 ns 595083.5 ns 1.01
Dense(512 => 512, relu)(512 x 128)/forward/CPU/8 thread(s) 912395.5 ns 1038125 ns 0.88
Dense(512 => 512, relu)(512 x 128)/forward/CPU/1 thread(s) 725791 ns 725167 ns 1.00
Dense(512 => 512, relu)(512 x 128)/forward/GPU/CUDA 47816 ns 48555.5 ns 0.98
Dense(512 => 512, relu)(512 x 128)/zygote/CPU/2 thread(s) 1582187.5 ns 1507084 ns 1.05
Dense(512 => 512, relu)(512 x 128)/zygote/CPU/4 thread(s) 973833 ns 1043292 ns 0.93
Dense(512 => 512, relu)(512 x 128)/zygote/CPU/8 thread(s) 1835187.5 ns 1413583 ns 1.30
Dense(512 => 512, relu)(512 x 128)/zygote/CPU/1 thread(s) 2183125 ns 2256583 ns 0.97
Dense(512 => 512, relu)(512 x 128)/zygote/GPU/CUDA 236731.5 ns 241345.5 ns 0.98
Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/2 thread(s) 1600083 ns 1541063 ns 1.04
Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/4 thread(s) 1053041.5 ns 1073583.5 ns 0.98
Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/8 thread(s) 1388771 ns 1495667 ns 0.93
Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/1 thread(s) 2256062 ns 2216500 ns 1.02
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) 3409541.5 ns 3407458.5 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) 2060229 ns 2060208 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) 2482875 ns 2504792 ns 0.99
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) 5998167 ns 6019500 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/GPU/CUDA 286197 ns 283414 ns 1.01
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) 24038625 ns 24068584 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) 17258666.5 ns 17256458.5 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) 17123396 ns 17166250 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) 37487104 ns 37584937.5 ns 1.00
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/GPU/CUDA 2409477.5 ns 2397302 ns 1.01
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) 54679729 ns 52933521 ns 1.03
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) 84538542 ns 83805875 ns 1.01
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) 157339000 ns 168151312.5 ns 0.94
Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) 44498708 ns 44568645.5 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) 250028813 ns 250376958 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) 147930708 ns 148122999.5 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) 116617291 ns 115699917 ns 1.01
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) 454228375 ns 448012646 ns 1.01
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/GPU/CUDA 5443404 ns 5442645 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) 1101896208 ns 1105356584 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) 855324125.5 ns 854303812.5 ns 1.00
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) 839930250.5 ns 826724000 ns 1.02
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) 1774005666 ns 1752988167 ns 1.01
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/GPU/CUDA 29278014 ns 28762466 ns 1.02
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) 1013677520.5 ns 1031896104 ns 0.98
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) 922761000 ns 962579167 ns 0.96
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) 1320593542 ns 1179808792 ns 1.12
Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) 1744904771 ns 1752419187.5 ns 1.00
mlp7layer_bn(relu)(32 x 256)/forward/CPU/2 thread(s) 1230812.5 ns 1246312 ns 0.99
mlp7layer_bn(relu)(32 x 256)/forward/CPU/4 thread(s) 967417 ns 981667 ns 0.99
mlp7layer_bn(relu)(32 x 256)/forward/CPU/8 thread(s) 669125 ns 924938 ns 0.72
mlp7layer_bn(relu)(32 x 256)/forward/CPU/1 thread(s) 2028541 ns 1952875 ns 1.04
mlp7layer_bn(relu)(32 x 256)/forward/GPU/CUDA 558507.5 ns 559173.5 ns 1.00
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/2 thread(s) 6006292 ns 5968250 ns 1.01
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/4 thread(s) 6899417 ns 6725083 ns 1.03
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/8 thread(s) 25958937 ns 24147709 ns 1.08
mlp7layer_bn(relu)(32 x 256)/zygote/CPU/1 thread(s) 7102312 ns 7125208 ns 1.00
mlp7layer_bn(relu)(32 x 256)/zygote/GPU/CUDA 1368625 ns 1363102 ns 1.00
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/2 thread(s) 10886750 ns 10592083.5 ns 1.03
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/4 thread(s) 9389042 ns 9872770.5 ns 0.95
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/8 thread(s) 17293854.5 ns 16891792 ns 1.02
mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/1 thread(s) 7443459 ns 8542250.5 ns 0.87
Dense(128 => 128, gelu)(128 x 128)/forward/CPU/2 thread(s) 352104 ns 490083 ns 0.72
Dense(128 => 128, gelu)(128 x 128)/forward/CPU/4 thread(s) 409416.5 ns 414250 ns 0.99
Dense(128 => 128, gelu)(128 x 128)/forward/CPU/8 thread(s) 3455917 ns 1848916.5 ns 1.87
Dense(128 => 128, gelu)(128 x 128)/forward/CPU/1 thread(s) 88750 ns 89417 ns 0.99
Dense(128 => 128, gelu)(128 x 128)/forward/GPU/CUDA 27682 ns 27713 ns 1.00
Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/2 thread(s) 392604 ns 381875 ns 1.03
Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/4 thread(s) 399000 ns 447500 ns 0.89
Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/8 thread(s) 4557125 ns 4415146 ns 1.03
Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/1 thread(s) 258875 ns 259083.5 ns 1.00
Dense(128 => 128, gelu)(128 x 128)/zygote/GPU/CUDA 221053 ns 221456.5 ns 1.00
Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/2 thread(s) 422125 ns 412875 ns 1.02
Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/4 thread(s) 429208 ns 474250 ns 0.91
Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/8 thread(s) 4755354 ns 4220333 ns 1.13
Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/1 thread(s) 270916 ns 271166 ns 1.00
Dense(128 => 128, relu)(128 x 128)/forward/CPU/2 thread(s) 305104 ns 434854 ns 0.70
Dense(128 => 128, relu)(128 x 128)/forward/CPU/4 thread(s) 348458 ns 353250 ns 0.99
Dense(128 => 128, relu)(128 x 128)/forward/CPU/8 thread(s) 635625 ns 650792 ns 0.98
Dense(128 => 128, relu)(128 x 128)/forward/CPU/1 thread(s) 54250 ns 54375 ns 1.00
Dense(128 => 128, relu)(128 x 128)/forward/GPU/CUDA 27950 ns 27922 ns 1.00
Dense(128 => 128, relu)(128 x 128)/zygote/CPU/2 thread(s) 355958 ns 339896.5 ns 1.05
Dense(128 => 128, relu)(128 x 128)/zygote/CPU/4 thread(s) 274500 ns 340500 ns 0.81
Dense(128 => 128, relu)(128 x 128)/zygote/CPU/8 thread(s) 753208.5 ns 611187.5 ns 1.23
Dense(128 => 128, relu)(128 x 128)/zygote/CPU/1 thread(s) 151667 ns 152292 ns 1.00
Dense(128 => 128, relu)(128 x 128)/zygote/GPU/CUDA 205458.5 ns 206825 ns 0.99
Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/2 thread(s) 372292 ns 356792 ns 1.04
Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/4 thread(s) 288521 ns 355875 ns 0.81
Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/8 thread(s) 798979 ns 420542 ns 1.90
Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/1 thread(s) 150792 ns 151000 ns 1.00
vgg16(32, 32, 3, 64)/forward/CPU/2 thread(s) 602253459 ns 603607250 ns 1.00
vgg16(32, 32, 3, 64)/forward/CPU/4 thread(s) 430857604 ns 425272979 ns 1.01
vgg16(32, 32, 3, 64)/forward/CPU/8 thread(s) 392009125 ns 372455458 ns 1.05
vgg16(32, 32, 3, 64)/forward/CPU/1 thread(s) 877215958 ns 873099458 ns 1.00
vgg16(32, 32, 3, 64)/forward/GPU/CUDA 7028016 ns 7619709 ns 0.92
vgg16(32, 32, 3, 64)/zygote/CPU/2 thread(s) 1996302145.5 ns 2006739833.5 ns 0.99
vgg16(32, 32, 3, 64)/zygote/CPU/4 thread(s) 1609994521 ns 1613467771 ns 1.00
vgg16(32, 32, 3, 64)/zygote/CPU/8 thread(s) 1565616166.5 ns 1601604000 ns 0.98
vgg16(32, 32, 3, 64)/zygote/CPU/1 thread(s) 2641861333 ns 2628483083 ns 1.01
vgg16(32, 32, 3, 64)/zygote/GPU/CUDA 25992958 ns 26335134 ns 0.99
Dense(512 => 512, gelu)(512 x 128)/forward/CPU/2 thread(s) 536791.5 ns 520146 ns 1.03
Dense(512 => 512, gelu)(512 x 128)/forward/CPU/4 thread(s) 435250 ns 434479 ns 1.00
Dense(512 => 512, gelu)(512 x 128)/forward/CPU/8 thread(s) 2792250 ns 1898520.5 ns 1.47
Dense(512 => 512, gelu)(512 x 128)/forward/CPU/1 thread(s) 865125 ns 866625 ns 1.00
Dense(512 => 512, gelu)(512 x 128)/forward/GPU/CUDA 47701 ns 47286 ns 1.01
Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/2 thread(s) 1900167 ns 1848208.5 ns 1.03
Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/4 thread(s) 2798208 ns 2786229 ns 1.00
Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/8 thread(s) 16325500 ns 14679500 ns 1.11
Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/1 thread(s) 2771604 ns 2771958 ns 1.00
Dense(512 => 512, gelu)(512 x 128)/zygote/GPU/CUDA 248374 ns 249296.5 ns 1.00
Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/2 thread(s) 1976729 ns 1937125 ns 1.02
Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/4 thread(s) 5051583 ns 5035312.5 ns 1.00
Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/8 thread(s) 16501146 ns 14724291.5 ns 1.12
Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/1 thread(s) 2698083.5 ns 2768167 ns 0.97
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/2 thread(s) 1614854 ns 1574791.5 ns 1.03
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/4 thread(s) 1236833 ns 1257666 ns 0.98
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/8 thread(s) 1069583 ns 1200500 ns 0.89
mlp7layer_bn(tanh)(32 x 256)/forward/CPU/1 thread(s) 2226209 ns 2226083 ns 1.00
mlp7layer_bn(tanh)(32 x 256)/forward/GPU/CUDA 577670 ns 584985.5 ns 0.99
mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/2 thread(s) 5930562.5 ns 5976500 ns 0.99
mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/4 thread(s) 6880833 ns 4604667 ns 1.49
mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/8 thread(s) 26135520.5 ns 25216125 ns 1.04
mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/1 thread(s) 7284792 ns 7317042 ns 1.00
mlp7layer_bn(tanh)(32 x 256)/zygote/GPU/CUDA 1356112 ns 1363255 ns 0.99
mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/2 thread(s) 12782291 ns 12710625 ns 1.01
mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/4 thread(s) 11955834 ns 11988958 ns 1.00
mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/8 thread(s) 21105833.5 ns 21409084 ns 0.99
mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/1 thread(s) 10667312.5 ns 10882083 ns 0.98
Dense(16 => 16, relu)(16 x 128)/forward/CPU/2 thread(s) 2334 ns 2291 ns 1.02
Dense(16 => 16, relu)(16 x 128)/forward/CPU/4 thread(s) 4792 ns 2708 ns 1.77
Dense(16 => 16, relu)(16 x 128)/forward/CPU/8 thread(s) 3625 ns 2959 ns 1.23
Dense(16 => 16, relu)(16 x 128)/forward/CPU/1 thread(s) 2375 ns 2375 ns 1.00
Dense(16 => 16, relu)(16 x 128)/forward/GPU/CUDA 24681 ns 24451.5 ns 1.01
Dense(16 => 16, relu)(16 x 128)/zygote/CPU/2 thread(s) 7333 ns 7042 ns 1.04
Dense(16 => 16, relu)(16 x 128)/zygote/CPU/4 thread(s) 7250 ns 7084 ns 1.02
Dense(16 => 16, relu)(16 x 128)/zygote/CPU/8 thread(s) 7167 ns 7209 ns 0.99
Dense(16 => 16, relu)(16 x 128)/zygote/CPU/1 thread(s) 7291 ns 7166 ns 1.02
Dense(16 => 16, relu)(16 x 128)/zygote/GPU/CUDA 209372.5 ns 210193.5 ns 1.00
Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/2 thread(s) 8333 ns 8125 ns 1.03
Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/4 thread(s) 8292 ns 8292 ns 1.00
Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/8 thread(s) 8500 ns 8208 ns 1.04
Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/1 thread(s) 6000 ns 5917 ns 1.01
Dense(16 => 16, gelu)(16 x 128)/forward/CPU/2 thread(s) 10312.5 ns 11000.5 ns 0.94
Dense(16 => 16, gelu)(16 x 128)/forward/CPU/4 thread(s) 14125 ns 16166 ns 0.87
Dense(16 => 16, gelu)(16 x 128)/forward/CPU/8 thread(s) 10687.5 ns 11146 ns 0.96
Dense(16 => 16, gelu)(16 x 128)/forward/CPU/1 thread(s) 7167 ns 7125 ns 1.01
Dense(16 => 16, gelu)(16 x 128)/forward/GPU/CUDA 24485 ns 24717 ns 0.99
Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/2 thread(s) 19958 ns 20000 ns 1.00
Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/4 thread(s) 20041.5 ns 20000 ns 1.00
Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/8 thread(s) 19833 ns 20125 ns 0.99
Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/1 thread(s) 20000 ns 20250 ns 0.99
Dense(16 => 16, gelu)(16 x 128)/zygote/GPU/CUDA 229359 ns 230632.5 ns 0.99
Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/2 thread(s) 23395.5 ns 23375 ns 1.00
Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/4 thread(s) 23750 ns 23417 ns 1.01
Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/8 thread(s) 23542 ns 23645.5 ns 1.00
Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/1 thread(s) 21333 ns 21375 ns 1.00
Dense(128 => 128, identity)(128 x 128)/forward/CPU/2 thread(s) 28875 ns 29458 ns 0.98
Dense(128 => 128, identity)(128 x 128)/forward/CPU/4 thread(s) 28750 ns 28834 ns 1.00
Dense(128 => 128, identity)(128 x 128)/forward/CPU/8 thread(s) 29083 ns 28625 ns 1.02
Dense(128 => 128, identity)(128 x 128)/forward/CPU/1 thread(s) 46041 ns 46333 ns 0.99
Dense(128 => 128, identity)(128 x 128)/forward/GPU/CUDA 25546 ns 25821.5 ns 0.99
Dense(128 => 128, identity)(128 x 128)/zygote/CPU/2 thread(s) 221812.5 ns 226542 ns 0.98
Dense(128 => 128, identity)(128 x 128)/zygote/CPU/4 thread(s) 279708 ns 274167 ns 1.02
Dense(128 => 128, identity)(128 x 128)/zygote/CPU/8 thread(s) 4417417 ns 4023229.5 ns 1.10
Dense(128 => 128, identity)(128 x 128)/zygote/CPU/1 thread(s) 145625 ns 145708 ns 1.00
Dense(128 => 128, identity)(128 x 128)/zygote/GPU/CUDA 211875.5 ns 205677 ns 1.03
Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/2 thread(s) 332875 ns 339625 ns 0.98
Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/4 thread(s) 321125 ns 311625 ns 1.03
Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/8 thread(s) 562312.5 ns 520417 ns 1.08
Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/1 thread(s) 161625 ns 161292 ns 1.00
Dense(16 => 16, identity)(16 x 128)/forward/CPU/2 thread(s) 2083 ns 1875 ns 1.11
Dense(16 => 16, identity)(16 x 128)/forward/CPU/4 thread(s) 2125 ns 1833 ns 1.16
Dense(16 => 16, identity)(16 x 128)/forward/CPU/8 thread(s) 3875 ns 2104.5 ns 1.84
Dense(16 => 16, identity)(16 x 128)/forward/CPU/1 thread(s) 1709 ns 1625 ns 1.05
Dense(16 => 16, identity)(16 x 128)/forward/GPU/CUDA 22559 ns 22965 ns 0.98
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/2 thread(s) 5334 ns 5250 ns 1.02
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/4 thread(s) 5437.5 ns 5250 ns 1.04
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/8 thread(s) 5458 ns 5292 ns 1.03
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/1 thread(s) 5417 ns 5208 ns 1.04
Dense(16 => 16, identity)(16 x 128)/zygote/GPU/CUDA 254509.5 ns 261526 ns 0.97
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/2 thread(s) 11708 ns 11208 ns 1.04
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/4 thread(s) 11416 ns 11333 ns 1.01
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/8 thread(s) 11416 ns 11459 ns 1.00
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/1 thread(s) 6750 ns 6708 ns 1.01
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) 79881458 ns 79891416 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) 49107667 ns 49038584 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) 43180145.5 ns 44836791 ns 0.96
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) 151771375 ns 151572917 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/GPU/CUDA 2680326.5 ns 2695899 ns 0.99
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) 662703292 ns 665802334 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) 414205958 ns 410890125 ns 1.01
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) 397227958 ns 399102167 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) 688889667 ns 681784916 ns 1.01
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/GPU/CUDA 14602708 ns 14619713 ns 1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) 715248166.5 ns 710708249.5 ns 1.01
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) 686640708 ns 671159083 ns 1.02
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) 1044047896 ns 978285458 ns 1.07
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) 994524042 ns 996959708 ns 1.00
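
The Ratio column in the rows above appears to be the Current time divided by the Previous time, rounded to two decimals, so values above 1 indicate a slowdown. The following minimal Python sketch checks that reading against three rows copied from the table. The helper names `ratio` and `regressions` and the 1.5 threshold are hypothetical and used only for illustration; they are not taken from the benchmark workflow.

```python
# Three rows copied from the table above: (name, current ns, previous ns, reported ratio).
ROWS = [
    ("Dense(16 => 16, relu)(16 x 128)/forward/CPU/2 thread(s)", 2334.0, 2291.0, 1.02),
    ("Dense(16 => 16, relu)(16 x 128)/forward/CPU/4 thread(s)", 4792.0, 2708.0, 1.77),
    ("Dense(16 => 16, gelu)(16 x 128)/forward/CPU/2 thread(s)", 10312.5, 11000.5, 0.94),
]


def ratio(current_ns, previous_ns):
    """Current / Previous, rounded to two decimals (assumed meaning of the Ratio column)."""
    return round(current_ns / previous_ns, 2)


def regressions(rows, threshold):
    """Names of rows whose unrounded ratio exceeds a caller-chosen (hypothetical) threshold."""
    return [name for name, cur, prev, _ in rows if cur / prev > threshold]


if __name__ == "__main__":
    for name, cur, prev, reported in ROWS:
        print(f"{name}: computed {ratio(cur, prev):.2f}, reported {reported:.2f}")
    print(regressions(ROWS, threshold=1.5))
```

With the illustrative threshold of 1.5, only the `Dense(16 => 16, relu)` forward benchmark on 4 threads is flagged. Timings this small (a few microseconds) tend to be noisy, so a single large ratio is not necessarily a real regression.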

This comment was automatically generated by workflow using github-action-benchmark.
