docs: remove old mention of LuxDeviceUtils
Showing 1 changed file with 1 addition and 8 deletions.
a808aa8
Lux Benchmarks
| Benchmark suite | Current | Previous | Ratio |
|-|-|-|-|
| Dense(512 => 512, identity)(512 x 128)/forward/CPU/2 thread(s) | 414875 ns | 411270.5 ns | 1.01 |
| Dense(512 => 512, identity)(512 x 128)/forward/CPU/4 thread(s) | 321479 ns | 321459 ns | 1.00 |
| Dense(512 => 512, identity)(512 x 128)/forward/CPU/8 thread(s) | 322521 ns | 243229 ns | 1.33 |
| Dense(512 => 512, identity)(512 x 128)/forward/CPU/1 thread(s) | 740000 ns | 739583 ns | 1.00 |
| Dense(512 => 512, identity)(512 x 128)/forward/GPU/CUDA | 40861 ns | 41187 ns | 0.99 |
| Dense(512 => 512, identity)(512 x 128)/zygote/CPU/2 thread(s) | 1343250 ns | 1293854.5 ns | 1.04 |
| Dense(512 => 512, identity)(512 x 128)/zygote/CPU/4 thread(s) | 2434250 ns | 2409166.5 ns | 1.01 |
| Dense(512 => 512, identity)(512 x 128)/zygote/CPU/8 thread(s) | 474937.5 ns | 16158416 ns | 0.029392577836837474 |
| Dense(512 => 512, identity)(512 x 128)/zygote/CPU/1 thread(s) | 2252271 ns | 2244124.5 ns | 1.00 |
| Dense(512 => 512, identity)(512 x 128)/zygote/GPU/CUDA | 182562 ns | 186717.5 ns | 0.98 |
| Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/2 thread(s) | 1328292 ns | 1386416 ns | 0.96 |
| Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/4 thread(s) | 2620521 ns | 2592167 ns | 1.01 |
| Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/8 thread(s) | 610500 ns | 16442917 ns | 0.03712844868097309 |
| Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/1 thread(s) | 2229562.5 ns | 2224250 ns | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) | 1765917 ns | 1760437.5 ns | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) | 1031334 ns | 1084209 ns | 0.95 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) | 1365416 ns | 1521520.5 ns | 0.90 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) | 2818125 ns | 2926125 ns | 0.96 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/GPU/CUDA | 204521 ns | 205511.5 ns | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) | 12152917 ns | 12138333 ns | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) | 8828833 ns | 8825083.5 ns | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) | 9300834 ns | 9173333 ns | 1.01 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) | 18599875 ns | 18597250 ns | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/GPU/CUDA | 1492272 ns | 1482675 ns | 1.01 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) | 17275187 ns | 17291625 ns | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) | 13914875 ns | 13944583.5 ns | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) | 14281833 ns | 14521250 ns | 0.98 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) | 21819042 ns | 21811145.5 ns | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) | 250296521 ns | 249976666.5 ns | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) | 148101750 ns | 148133250 ns | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) | 148130792 ns | 116316853.5 ns | 1.27 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) | 448565625 ns | 449124167 ns | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/GPU/CUDA | 5496241 ns | 5482800 ns | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) | 1226292708 ns | 1230523917 ns | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) | 930446334 ns | 928555292 ns | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) | 443990041 ns | 832972542 ns | 0.53 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) | 1653613542 ns | 1627591875 ns | 1.02 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/GPU/CUDA | 35420264 ns | 35593150.5 ns | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) | 1147479875 ns | 1137967916 ns | 1.01 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) | 996058750 ns | 992382854.5 ns | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) | 629339646 ns | 1336473333 ns | 0.47 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) | 1740843604 ns | 1743275104 ns | 1.00 |
| lenet(28, 28, 1, 32)/forward/CPU/2 thread(s) | 1116250 ns | 1092042 ns | 1.02 |
| lenet(28, 28, 1, 32)/forward/CPU/4 thread(s) | 1624229.5 ns | 1598250 ns | 1.02 |
| lenet(28, 28, 1, 32)/forward/CPU/8 thread(s) | 1206375.5 ns | 3369875 ns | 0.36 |
| lenet(28, 28, 1, 32)/forward/CPU/1 thread(s) | 782041 ns | 781500 ns | 1.00 |
| lenet(28, 28, 1, 32)/forward/GPU/CUDA | 260633 ns | 252534 ns | 1.03 |
| lenet(28, 28, 1, 32)/zygote/CPU/2 thread(s) | 2984374.5 ns | 2977458 ns | 1.00 |
| lenet(28, 28, 1, 32)/zygote/CPU/4 thread(s) | 4127166 ns | 4116250 ns | 1.00 |
| lenet(28, 28, 1, 32)/zygote/CPU/8 thread(s) | 3295208.5 ns | 9642292 ns | 0.34 |
| lenet(28, 28, 1, 32)/zygote/CPU/1 thread(s) | 3137625 ns | 3142958 ns | 1.00 |
| lenet(28, 28, 1, 32)/zygote/GPU/CUDA | 1049614 ns | 1029320 ns | 1.02 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) | 2315396 ns | 2326083 ns | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) | 1424437 ns | 1424667 ns | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) | 1685208 ns | 1560084 ns | 1.08 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) | 4196250 ns | 4056750 ns | 1.03 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/GPU/CUDA | 208669.5 ns | 208748 ns | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) | 19413145.5 ns | 19404250 ns | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) | 16084375 ns | 16063062.5 ns | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) | 17133041.5 ns | 17250542 ns | 0.99 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) | 25866542 ns | 25830521 ns | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/GPU/CUDA | 1576194 ns | 1568775 ns | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) | 34217167 ns | 34513208 ns | 0.99 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) | 30754459 ns | 30762479.5 ns | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) | 31341542 ns | 31225958 ns | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) | 37132709 ns | 36930167 ns | 1.01 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) | 4525792 ns | 4524979.5 ns | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) | 2744125 ns | 2763937.5 ns | 0.99 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) | 2881375 ns | 2673854.5 ns | 1.08 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) | 8371458 ns | 8377291.5 ns | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/GPU/CUDA | 423036 ns | 421945 ns | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) | 38892667 ns | 39044209 ns | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) | 32085104.5 ns | 32104791.5 ns | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) | 32057770.5 ns | 32250208 ns | 0.99 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) | 52159979.5 ns | 51814146 ns | 1.01 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/GPU/CUDA | 2618584 ns | 2619508 ns | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) | 89172458 ns | 88649667 ns | 1.01 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) | 113776875 ns | 113984812.5 ns | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) | 62985709 ns | 225519896 ns | 0.28 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) | 74986500 ns | 74452916.5 ns | 1.01 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) | 268884125 ns | 268719375 ns | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) | 159000000 ns | 159201750 ns | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) | 158925750 ns | 123545124.5 ns | 1.29 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) | 486715083 ns | 491504917 ns | 0.99 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/GPU/CUDA | 6941165 ns | 6875777.5 ns | 1.01 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) | 1474467645.5 ns | 1477656896 ns | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) | 1134657750 ns | 1178240000 ns | 0.96 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) | 687890791.5 ns | 1062550813 ns | 0.65 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) | 2033574500 ns | 2026628729 ns | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/GPU/CUDA | 33495275 ns | 33057912 ns | 1.01 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) | 1720167208 ns | 1716757416 ns | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) | 1551435312.5 ns | 1531392041.5 ns | 1.01 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) | 1147814729 ns | 1872548250 ns | 0.61 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) | 2245015792 ns | 2230422791 ns | 1.01 |
| lenet(28, 28, 1, 128)/forward/CPU/2 thread(s) | 2039500 ns | 2018375 ns | 1.01 |
| lenet(28, 28, 1, 128)/forward/CPU/4 thread(s) | 3006583 ns | 3022624.5 ns | 0.99 |
| lenet(28, 28, 1, 128)/forward/CPU/8 thread(s) | 1618791.5 ns | 7958917 ns | 0.20 |
| lenet(28, 28, 1, 128)/forward/CPU/1 thread(s) | 2424854.5 ns | 2489958 ns | 0.97 |
| lenet(28, 28, 1, 128)/forward/GPU/CUDA | 258194 ns | 253828.5 ns | 1.02 |
| lenet(28, 28, 1, 128)/zygote/CPU/2 thread(s) | 9325667 ns | 9615917 ns | 0.97 |
| lenet(28, 28, 1, 128)/zygote/CPU/4 thread(s) | 11994291.5 ns | 11905479 ns | 1.01 |
| lenet(28, 28, 1, 128)/zygote/CPU/8 thread(s) | 7128604 ns | 24859333 ns | 0.29 |
| lenet(28, 28, 1, 128)/zygote/CPU/1 thread(s) | 11753792 ns | 11285437.5 ns | 1.04 |
| lenet(28, 28, 1, 128)/zygote/GPU/CUDA | 1096609.5 ns | 1089672 ns | 1.01 |
| vgg16(32, 32, 3, 32)/forward/CPU/2 thread(s) | 380363500.5 ns | 382104291.5 ns | 1.00 |
| vgg16(32, 32, 3, 32)/forward/CPU/4 thread(s) | 286893354 ns | 288697958.5 ns | 0.99 |
| vgg16(32, 32, 3, 32)/forward/CPU/8 thread(s) | 129833291 ns | 263937500 ns | 0.49 |
| vgg16(32, 32, 3, 32)/forward/CPU/1 thread(s) | 456069146 ns | 453025562.5 ns | 1.01 |
| vgg16(32, 32, 3, 32)/forward/GPU/CUDA | 5018425 ns | 4955655.5 ns | 1.01 |
| vgg16(32, 32, 3, 32)/zygote/CPU/2 thread(s) | 1154815958 ns | 1159602875 ns | 1.00 |
| vgg16(32, 32, 3, 32)/zygote/CPU/4 thread(s) | 934037667 ns | 937068625 ns | 1.00 |
| vgg16(32, 32, 3, 32)/zygote/CPU/8 thread(s) | 609039458 ns | 1116269958 ns | 0.55 |
| vgg16(32, 32, 3, 32)/zygote/CPU/1 thread(s) | 1585642292 ns | 1586148458 ns | 1.00 |
| vgg16(32, 32, 3, 32)/zygote/GPU/CUDA | 19065478 ns | 18262229 ns | 1.04 |
| lenet(28, 28, 1, 64)/forward/CPU/2 thread(s) | 1049833.5 ns | 1053583 ns | 1.00 |
| lenet(28, 28, 1, 64)/forward/CPU/4 thread(s) | 2073542 ns | 2074479.5 ns | 1.00 |
| lenet(28, 28, 1, 64)/forward/CPU/8 thread(s) | 1348479.5 ns | 6685541 ns | 0.20 |
| lenet(28, 28, 1, 64)/forward/CPU/1 thread(s) | 1287021 ns | 1294959 ns | 0.99 |
| lenet(28, 28, 1, 64)/forward/GPU/CUDA | 259724.5 ns | 256727 ns | 1.01 |
| lenet(28, 28, 1, 64)/zygote/CPU/2 thread(s) | 6258792 ns | 6501125 ns | 0.96 |
| lenet(28, 28, 1, 64)/zygote/CPU/4 thread(s) | 12411416 ns | 12392708 ns | 1.00 |
| lenet(28, 28, 1, 64)/zygote/CPU/8 thread(s) | 4953146 ns | 19126833.5 ns | 0.26 |
| lenet(28, 28, 1, 64)/zygote/CPU/1 thread(s) | 6086709 ns | 6062209 ns | 1.00 |
| lenet(28, 28, 1, 64)/zygote/GPU/CUDA | 1149352.5 ns | 1151411.5 ns | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) | 70546083 ns | 70475667 ns | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) | 43491792 ns | 43577354.5 ns | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) | 37811479.5 ns | 39785333 ns | 0.95 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) | 134717229.5 ns | 132525000 ns | 1.02 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/GPU/CUDA | 1859024 ns | 1859935 ns | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) | 355554354.5 ns | 356597312.5 ns | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) | 270317625 ns | 270033292 ns | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) | 146113896 ns | 254165791.5 ns | 0.57 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) | 537066979.5 ns | 541858104.5 ns | 0.99 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/GPU/CUDA | 12142155.5 ns | 12225780 ns | 0.99 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) | 396257791 ns | 395590417 ns | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) | 404428375.5 ns | 407040083 ns | 0.99 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) | 302176729 ns | 686687708 ns | 0.44 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) | 712116709 ns | 711343459 ns | 1.00 |
| vgg16(32, 32, 3, 128)/forward/CPU/2 thread(s) | 1190477625 ns | 1189721417 ns | 1.00 |
| vgg16(32, 32, 3, 128)/forward/CPU/4 thread(s) | 689814958.5 ns | 694763792 ns | 0.99 |
| vgg16(32, 32, 3, 128)/forward/CPU/8 thread(s) | 404795334 ns | 639415416.5 ns | 0.63 |
| vgg16(32, 32, 3, 128)/forward/CPU/1 thread(s) | 1876404250 ns | 1863138250 ns | 1.01 |
| vgg16(32, 32, 3, 128)/forward/GPU/CUDA | 12324333 ns | 12305225 ns | 1.00 |
| vgg16(32, 32, 3, 128)/zygote/CPU/2 thread(s) | 3610008479.5 ns | 3693716416.5 ns | 0.98 |
| vgg16(32, 32, 3, 128)/zygote/CPU/4 thread(s) | 2831662833 ns | 2822411042 ns | 1.00 |
| vgg16(32, 32, 3, 128)/zygote/CPU/8 thread(s) | 1516977229.5 ns | 2715785292 ns | 0.56 |
| vgg16(32, 32, 3, 128)/zygote/CPU/1 thread(s) | 5143819000 ns | 5075752792 ns | 1.01 |
| vgg16(32, 32, 3, 128)/zygote/GPU/CUDA | 50066391.5 ns | 50096815 ns | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) | 3345708.5 ns | 3427000 ns | 0.98 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) | 2078625 ns | 2064167 ns | 1.01 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) | 2287083 ns | 2526875 ns | 0.91 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) | 6026917 ns | 6023271 ns | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/GPU/CUDA | 330146 ns | 343351 ns | 0.96 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) | 25733291.5 ns | 26112416 ns | 0.99 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) | 18989125 ns | 19044125 ns | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) | 19553792 ns | 19200250 ns | 1.02 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) | 39739583.5 ns | 39317542 ns | 1.01 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/GPU/CUDA | 2459398 ns | 2472756 ns | 0.99 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) | 54593479 ns | 53212375 ns | 1.03 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) | 78905375 ns | 86527562.5 ns | 0.91 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) | 29660083.5 ns | 174565833 ns | 0.17 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) | 45812146 ns | 45612312 ns | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) | 1660583.5 ns | 1781334 ns | 0.93 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) | 1105770.5 ns | 1100708 ns | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) | 1392229 ns | 1559041 ns | 0.89 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) | 3035959 ns | 3031000 ns | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/GPU/CUDA | 210818 ns | 209992.5 ns | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) | 12525958.5 ns | 12526708.5 ns | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) | 9221375 ns | 9202062.5 ns | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) | 9699583 ns | 9594042 ns | 1.01 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) | 19002416.5 ns | 18995083.5 ns | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/GPU/CUDA | 1509113 ns | 1520295 ns | 0.99 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) | 17662604.5 ns | 17654187.5 ns | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) | 14311479 ns | 14319770.5 ns | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) | 14590875 ns | 14542209 ns | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) | 22225541 ns | 22178916 ns | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) | 70524583 ns | 70529958 ns | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) | 43452458 ns | 43518812.5 ns | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) | 37882479.5 ns | 39678750 ns | 0.95 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) | 132685187 ns | 132567854.5 ns | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/GPU/CUDA | 1859436 ns | 1864543.5 ns | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) | 359287667 ns | 362025167 ns | 0.99 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) | 347693812.5 ns | 346626458 ns | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) | 197401167 ns | 304297792 ns | 0.65 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) | 730607333 ns | 726309042 ns | 1.01 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/GPU/CUDA | 13254127 ns | 13257700 ns | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) | 420436292 ns | 417934646 ns | 1.01 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) | 419235583 ns | 420406333 ns | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) | 310533750 ns | 710093000 ns | 0.44 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) | 718184500 ns | 717186500 ns | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/forward/CPU/2 thread(s) | 1442542 ns | 1662000 ns | 0.87 |
| mlp7layer_bn(gelu)(32 x 256)/forward/CPU/4 thread(s) | 1346416.5 ns | 1348708 ns | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/forward/CPU/8 thread(s) | 1331812.5 ns | 1039209 ns | 1.28 |
| mlp7layer_bn(gelu)(32 x 256)/forward/CPU/1 thread(s) | 2403021 ns | 2446333 ns | 0.98 |
| mlp7layer_bn(gelu)(32 x 256)/forward/GPU/CUDA | 549048 ns | 549327.5 ns | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/2 thread(s) | 8851250 ns | 8831979 ns | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/4 thread(s) | 12939667 ns | 12827437 ns | 1.01 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/8 thread(s) | 5552708 ns | 32684875 ns | 0.17 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/1 thread(s) | 9880416.5 ns | 9840916 ns | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/GPU/CUDA | 1258951 ns | 1223627 ns | 1.03 |
| mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/2 thread(s) | 16575062 ns | 16482375 ns | 1.01 |
| mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/4 thread(s) | 20954208 ns | 22352396 ns | 0.94 |
| mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/8 thread(s) | 13338833 ns | 48470938 ns | 0.28 |
| mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/1 thread(s) | 13092416 ns | 13131854 ns | 1.00 |
| Dense(512 => 512, relu)(512 x 128)/forward/CPU/2 thread(s) | 822708 ns | 786625 ns | 1.05 |
| Dense(512 => 512, relu)(512 x 128)/forward/CPU/4 thread(s) | 528084 ns | 549645.5 ns | 0.96 |
| Dense(512 => 512, relu)(512 x 128)/forward/CPU/8 thread(s) | 71146 ns | 1064854.5 ns | 0.06681288382591237 |
| Dense(512 => 512, relu)(512 x 128)/forward/CPU/1 thread(s) | 725750 ns | 725104.5 ns | 1.00 |
| Dense(512 => 512, relu)(512 x 128)/forward/GPU/CUDA | 46414.5 ns | 45187 ns | 1.03 |
| Dense(512 => 512, relu)(512 x 128)/zygote/CPU/2 thread(s) | 1506500 ns | 1494083 ns | 1.01 |
| Dense(512 => 512, relu)(512 x 128)/zygote/CPU/4 thread(s) | 1020854 ns | 1045666.5 ns | 0.98 |
| Dense(512 => 512, relu)(512 x 128)/zygote/CPU/8 thread(s) | 323833 ns | 1424458 ns | 0.23 |
| Dense(512 => 512, relu)(512 x 128)/zygote/CPU/1 thread(s) | 2281417 ns | 2291291 ns | 1.00 |
| Dense(512 => 512, relu)(512 x 128)/zygote/GPU/CUDA | 211160.5 ns | 207948.5 ns | 1.02 |
| Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/2 thread(s) | 1512416 ns | 1496875 ns | 1.01 |
| Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/4 thread(s) | 1090125 ns | 1011416 ns | 1.08 |
| Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/8 thread(s) | 446562.5 ns | 1769209 ns | 0.25 |
| Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/1 thread(s) | 2259375 ns | 2257000 ns | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) | 3176750 ns | 3409312.5 ns | 0.93 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) | 2053979 ns | 2052875 ns | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) | 2268708 ns | 2516667 ns | 0.90 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) | 6008875 ns | 5998083 ns | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/GPU/CUDA | 282441.5 ns | 281525 ns | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) | 24059292 ns | 24109729 ns | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) | 17235458 ns | 17182959 ns | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) | 16956292 ns | 17113146 ns | 0.99 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) | 37778228.5 ns | 37468750.5 ns | 1.01 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/GPU/CUDA | 2390107 ns | 2398593 ns | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) | 52955708.5 ns | 52554812 ns | 1.01 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) | 84900333 ns | 84392000 ns | 1.01 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) | 27496312.5 ns | 170326000 ns | 0.16 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) | 44513375.5 ns | 44580125 ns | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) | 250307750 ns | 250367042 ns | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) | 148084625 ns | 147848500 ns | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) | 148444250 ns | 116105042 ns | 1.28 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) | 455285000 ns | 454852833 ns | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/GPU/CUDA | 5327018 ns | 5326699.5 ns | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) | 1102117541 ns | 1103578584 ns | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) | 856978792 ns | 855911416.5 ns | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) | 437778208 ns | 831258666.5 ns | 0.53 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) | 1768146583 ns | 1772502666 ns | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/GPU/CUDA | 33525724 ns | 33341175 ns | 1.01 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) | 1027855937 ns | 1010431750 ns | 1.02 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) | 965570792 ns | 965660000 ns | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) | 584455270.5 ns | 1276218416 ns | 0.46 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) | 1726926104.5 ns | 1718974833.5 ns | 1.00 |
| mlp7layer_bn(relu)(32 x 256)/forward/CPU/2 thread(s) | 1135584 ns | 1245354 ns | 0.91 |
| mlp7layer_bn(relu)(32 x 256)/forward/CPU/4 thread(s) | 989209 ns | 938208 ns | 1.05 |
| mlp7layer_bn(relu)(32 x 256)/forward/CPU/8 thread(s) | 923667 ns | 685500 ns | 1.35 |
| mlp7layer_bn(relu)(32 x 256)/forward/CPU/1 thread(s) | 2052500 ns | 2004042 ns | 1.02 |
| mlp7layer_bn(relu)(32 x 256)/forward/GPU/CUDA | 548882.5 ns | 548493.5 ns | 1.00 |
| mlp7layer_bn(relu)(32 x 256)/zygote/CPU/2 thread(s) | 5867833 ns | 5771604 ns | 1.02 |
| mlp7layer_bn(relu)(32 x 256)/zygote/CPU/4 thread(s) | 6531896 ns | 6597750 ns | 0.99 |
| mlp7layer_bn(relu)(32 x 256)/zygote/CPU/8 thread(s) | 2613541.5 ns | 25936583 ns | 0.10 |
| mlp7layer_bn(relu)(32 x 256)/zygote/CPU/1 thread(s) | 7097417 ns | 7098375 ns | 1.00 |
| mlp7layer_bn(relu)(32 x 256)/zygote/GPU/CUDA | 1222578 ns | 1220210 ns | 1.00 |
| mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/2 thread(s) | 9683896 ns | 9431791 ns | 1.03 |
| mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/4 thread(s) | 13118666 ns | 13114104.5 ns | 1.00 |
| mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/8 thread(s) | 6497583 ns | 33204521 ns | 0.20 |
| mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/1 thread(s) | 7614083.5 ns | 7606042 ns | 1.00 |
| Dense(128 => 128, gelu)(128 x 128)/forward/CPU/2 thread(s) | 512667 ns | 430541 ns | 1.19 |
| Dense(128 => 128, gelu)(128 x 128)/forward/CPU/4 thread(s) | 391292 ns | 381021 ns | 1.03 |
| Dense(128 => 128, gelu)(128 x 128)/forward/CPU/8 thread(s) | 32750 ns | 3043792 ns | 0.010759605124134632 |
| Dense(128 => 128, gelu)(128 x 128)/forward/CPU/1 thread(s) | 87812.5 ns | 89542 ns | 0.98 |
| Dense(128 => 128, gelu)(128 x 128)/forward/GPU/CUDA | 25759 ns | 25679 ns | 1.00 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/2 thread(s) | 382125 ns | 354583 ns | 1.08 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/4 thread(s) | 444875 ns | 443833 ns | 1.00 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/8 thread(s) | 160875 ns | 4158750 ns | 0.03868349864743012 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/1 thread(s) | 258750 ns | 258750 ns | 1 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/GPU/CUDA | 188723 ns | 188553 ns | 1.00 |
| Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/2 thread(s) | 420291.5 ns | 385584 ns | 1.09 |
| Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/4 thread(s) | 475750 ns | 474625 ns | 1.00 |
| Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/8 thread(s) | 194375 ns | 4412750 ns | 0.04404849583592998 |
| Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/1 thread(s) | 270958 ns | 271208 ns | 1.00 |
| Dense(128 => 128, relu)(128 x 128)/forward/CPU/2 thread(s) | 461312.5 ns | 376687.5 ns | 1.22 |
| Dense(128 => 128, relu)(128 x 128)/forward/CPU/4 thread(s) | 326666.5 ns | 325000 ns | 1.01 |
| Dense(128 => 128, relu)(128 x 128)/forward/CPU/8 thread(s) | 14792 ns | 771479 ns | 0.019173561432002686 |
| Dense(128 => 128, relu)(128 x 128)/forward/CPU/1 thread(s) | 54145.5 ns | 54583 ns | 0.99 |
| Dense(128 => 128, relu)(128 x 128)/forward/GPU/CUDA | 26082 ns | 26029 ns | 1.00 |
| Dense(128 => 128, relu)(128 x 128)/zygote/CPU/2 thread(s) | 340312 ns | 303687.5 ns | 1.12 |
| Dense(128 => 128, relu)(128 x 128)/zygote/CPU/4 thread(s) | 342500 ns | 341166.5 ns | 1.00 |
| Dense(128 => 128, relu)(128 x 128)/zygote/CPU/8 thread(s) | 25958.5 ns | 893375 ns | 0.029056667133062822 |
| Dense(128 => 128, relu)(128 x 128)/zygote/CPU/1 thread(s) | 151625 ns | 151583.5 ns | 1.00 |
| Dense(128 => 128, relu)(128 x 128)/zygote/GPU/CUDA | 181930 ns | 180458.5 ns | 1.01 |
| Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/2 thread(s) | 357792 ns | 316479 ns | 1.13 |
| Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/4 thread(s) | 357833 ns | 355958.5 ns | 1.01 |
| Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/8 thread(s) | 46437.5 ns | 833687.5 ns | 0.055701326936052176 |
| Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/1 thread(s) | 151209 ns | 150917 ns | 1.00 |
| vgg16(32, 32, 3, 64)/forward/CPU/2 thread(s) | 602226667 ns | 603139000 ns | 1.00 |
| vgg16(32, 32, 3, 64)/forward/CPU/4 thread(s) | 427648645.5 ns | 430379625 ns | 0.99 |
| vgg16(32, 32, 3, 64)/forward/CPU/8 thread(s) | 207084708 ns | 380417500 ns | 0.54 |
| vgg16(32, 32, 3, 64)/forward/CPU/1 thread(s) | 882976625 ns | 876424750 ns | 1.01 |
| vgg16(32, 32, 3, 64)/forward/GPU/CUDA | 6984740 ns | 7025391.5 ns | 0.99 |
| vgg16(32, 32, 3, 64)/zygote/CPU/2 thread(s) | 1997486771 ns | 2008872896 ns | 0.99 |
| vgg16(32, 32, 3, 64)/zygote/CPU/4 thread(s) | 1621644791.5 ns | 1619900021 ns | 1.00 |
| vgg16(32, 32, 3, 64)/zygote/CPU/8 thread(s) | 856167166 ns | 1577697458 ns | 0.54 |
| vgg16(32, 32, 3, 64)/zygote/CPU/1 thread(s) | 2637178042 ns | 2622523542 ns | 1.01 |
| vgg16(32, 32, 3, 64)/zygote/GPU/CUDA | 26468421.5 ns | 26902392.5 ns | 0.98 |
| Dense(512 => 512, gelu)(512 x 128)/forward/CPU/2 thread(s) | 520062.5 ns | 535729 ns | 0.97 |
| Dense(512 => 512, gelu)(512 x 128)/forward/CPU/4 thread(s) | 429271 ns | 431416.5 ns | 1.00 |
| Dense(512 => 512, gelu)(512 x 128)/forward/CPU/8 thread(s) | 166000 ns | 2478833.5 ns | 0.06696698265534978 |
| Dense(512 => 512, gelu)(512 x 128)/forward/CPU/1 thread(s) | 866083 ns | 866124.5 ns | 1.00 |
| Dense(512 => 512, gelu)(512 x 128)/forward/GPU/CUDA | 46206 ns | 44614 ns | 1.04 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/2 thread(s) | 1874625 ns | 1911229 ns | 0.98 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/4 thread(s) | 2508792 ns | 2468667 ns | 1.02 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/8 thread(s) | 1021958 ns | 16401666 ns | 0.062308182595597304 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/1 thread(s) | 2650063 ns | 2768395.5 ns | 0.96 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/GPU/CUDA | 217141.5 ns | 210772.5 ns | 1.03 |
| Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/2 thread(s) | 1862417 ns | 1986854 ns | 0.94 |
| Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/4 thread(s) | 5033959 ns | 5052500 ns | 1.00 |
| Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/8 thread(s) | 1161917 ns | 16457750 ns | 0.07059999088575292 |
| Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/1 thread(s) | 2752500 ns | 2773062.5 ns | 0.99 |
| mlp7layer_bn(tanh)(32 x 256)/forward/CPU/2 thread(s) | 1462229 ns | 1594542 ns | 0.92 |
| mlp7layer_bn(tanh)(32 x 256)/forward/CPU/4 thread(s) | 1192834 ns | 1175979 ns | 1.01 |
| mlp7layer_bn(tanh)(32 x 256)/forward/CPU/8 thread(s) | 1192667 ns | 932458.5 ns | 1.28 |
| mlp7layer_bn(tanh)(32 x 256)/forward/CPU/1 thread(s) | 2221791 ns | 2307417 ns | 0.96 |
| mlp7layer_bn(tanh)(32 x 256)/forward/GPU/CUDA | 550464 ns | 543745 ns | 1.01 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/2 thread(s) | 5883792 ns | 5990542 ns | 0.98 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/4 thread(s) | 4676563 ns | 5767687 ns | 0.81 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/8 thread(s) | 2871000 ns | 25963208 ns | 0.11 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/1 thread(s) | 7325000.5 ns | 7322042 ns | 1.00 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/GPU/CUDA | 1196239.5 ns | 1137388.5 ns | 1.05 |
| mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/2 thread(s) | 11670958.5 ns | 11669000 ns | 1.00 |
| mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/4 thread(s) | 16372334 ns | 16638833.5 ns | 0.98 |
| mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/8 thread(s) | 8780584 ns | 38505541 ns | 0.23 |
| mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/1 thread(s) | 9544250 ns | 9523103.5 ns | 1.00 |
| Dense(16 => 16, relu)(16 x 128)/forward/CPU/2 thread(s) | 2458 ns | 2770.5 ns | 0.89 |
| Dense(16 => 16, relu)(16 x 128)/forward/CPU/4 thread(s) | 2542 ns | 2541 ns | 1.00 |
| Dense(16 => 16, relu)(16 x 128)/forward/CPU/8 thread(s) | 2875 ns | 3542 ns | 0.81 |
| Dense(16 => 16, relu)(16 x 128)/forward/CPU/1 thread(s) | 4625 ns | 2167 ns | 2.13 |
| Dense(16 => 16, relu)(16 x 128)/forward/GPU/CUDA | 22670 ns | 21552 ns | 1.05 |
| Dense(16 => 16, relu)(16 x 128)/zygote/CPU/2 thread(s) | 6916 ns | 7167 ns | 0.96 |
| Dense(16 => 16, relu)(16 x 128)/zygote/CPU/4 thread(s) | 7083 ns | 7083 ns | 1 |
| Dense(16 => 16, relu)(16 x 128)/zygote/CPU/8 thread(s) | 7250 ns | 7250 ns | 1 |
| Dense(16 => 16, relu)(16 x 128)/zygote/CPU/1 thread(s) | 7333 ns | 7250 ns | 1.01 |
| Dense(16 => 16, relu)(16 x 128)/zygote/GPU/CUDA | 180475.5 ns | 173171.5 ns | 1.04 |
| Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/2 thread(s) | 8250 ns | 8167 ns | 1.01 |
| Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/4 thread(s) | 8292 ns | 8208 ns | 1.01 |
| Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/8 thread(s) | 8542 ns | 8584 ns | 1.00 |
| Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/1 thread(s) | 6125 ns | 5979.5 ns | 1.02 |
| Dense(16 => 16, gelu)(16 x 128)/forward/CPU/2 thread(s) | 10916.5 ns | 10959 ns | 1.00 |
| Dense(16 => 16, gelu)(16 x 128)/forward/CPU/4 thread(s) | 12625 ns | 13437.5 ns | 0.94 |
| Dense(16 => 16, gelu)(16 x 128)/forward/CPU/8 thread(s) | 10459 ns | 10250 ns | 1.02 |
| Dense(16 => 16, gelu)(16 x 128)/forward/CPU/1 thread(s) | 9729 ns | 7208 ns | 1.35 |
| Dense(16 => 16, gelu)(16 x 128)/forward/GPU/CUDA | 22420 ns | 21706 ns | 1.03 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/2 thread(s) | 19916 ns | 19916 ns | 1 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/4 thread(s) | 19875 ns | 20000 ns | 0.99 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/8 thread(s) | 19958 ns | 20209 ns | 0.99 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/1 thread(s) | 20000 ns | 19854.5 ns | 1.01 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/GPU/CUDA | 195313 ns | 188017 ns | 1.04 |
| Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/2 thread(s) | 23542 ns | 23666 ns | 0.99 |
| Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/4 thread(s) | 23541 ns | 23625 ns | 1.00 |
| Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/8 thread(s) | 27125 ns | 23708 ns | 1.14 |
| Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/1 thread(s) | 21334 ns | 21334 ns | 1 |
| Dense(128 => 128, identity)(128 x 128)/forward/CPU/2 thread(s) | 28834 ns | 28625 ns | 1.01 |
| Dense(128 => 128, identity)(128 x 128)/forward/CPU/4 thread(s) | 28708 ns | 28875 ns | 0.99 |
| Dense(128 => 128, identity)(128 x 128)/forward/CPU/8 thread(s) | 29042 ns | 28208 ns | 1.03 |
| Dense(128 => 128, identity)(128 x 128)/forward/CPU/1 thread(s) | 46291 ns | 45875 ns | 1.01 |
| Dense(128 => 128, identity)(128 x 128)/forward/GPU/CUDA | 23925 ns | 23317 ns | 1.03 |
| Dense(128 => 128, identity)(128 x 128)/zygote/CPU/2 thread(s) | 224750 ns | 233812.5 ns | 0.96 |
| Dense(128 => 128, identity)(128 x 128)/zygote/CPU/4 thread(s) | 276542 ns | 277666 ns | 1.00 |
| Dense(128 => 128, identity)(128 x 128)/zygote/CPU/8 thread(s) | 44250 ns | 3990583 ns | 0.011088605349143221 |
| Dense(128 => 128, identity)(128 x 128)/zygote/CPU/1 thread(s) | 145000 ns | 145083 ns | 1.00 |
| Dense(128 => 128, identity)(128 x 128)/zygote/GPU/CUDA | 197967 ns | 191945 ns | 1.03 |
| Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/2 thread(s) | 242125 ns | 250666.5 ns | 0.97 |
| Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/4 thread(s) | 293916 ns | 295459 ns | 0.99 |
| Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/8 thread(s) | 68604.5 ns | 4148750 ns | 0.01653618559807171 |
| Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/1 thread(s) | 145584 ns | 145562.5 ns | 1.00 |
| Dense(16 => 16, identity)(16 x 128)/forward/CPU/2 thread(s) | 1583 ns | 2041 ns | 0.78 |
| Dense(16 => 16, identity)(16 x 128)/forward/CPU/4 thread(s) | 2166 ns | 1916 ns | 1.13 |
| Dense(16 => 16, identity)(16 x 128)/forward/CPU/8 thread(s) | 2166.5 ns | 2584 ns | 0.84 |
| Dense(16 => 16, identity)(16 x 128)/forward/CPU/1 thread(s) | 4333.5 ns | 1625 ns | 2.67 |
| Dense(16 => 16, identity)(16 x 128)/forward/GPU/CUDA | 20975.5 ns | 20024 ns | 1.05 |
| Dense(16 => 16, identity)(16 x 128)/zygote/CPU/2 thread(s) | 5084 ns | 5375 ns | 0.95 |
| Dense(16 => 16, identity)(16 x 128)/zygote/CPU/4 thread(s) | 5125 ns | 5125 ns | 1 |
| Dense(16 => 16, identity)(16 x 128)/zygote/CPU/8 thread(s) | 5209 ns | 5250 ns | 0.99 |
| Dense(16 => 16, identity)(16 x 128)/zygote/CPU/1 thread(s) | 5500 ns | 5084 ns | 1.08 |
| Dense(16 => 16, identity)(16 x 128)/zygote/GPU/CUDA | 234449.5 ns | 238397 ns | 0.98 |
| Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/2 thread(s) | 7375 ns | 7541 ns | 0.98 |
| Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/4 thread(s) | 7458 ns | 7416 ns | 1.01 |
| Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/8 thread(s) | 8125 ns | 7750 ns | 1.05 |
| Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/1 thread(s) | 5459 ns | 5250 ns | 1.04 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) | 80045708 ns | 79842084 ns | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) | 49037958.5 ns | 49100250 ns | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) | 42791749.5 ns | 43191750 ns | 0.99 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) | 151490583 ns | 151456000 ns | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/GPU/CUDA | 2680013 ns | 2712652 ns | 0.99 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) | 606632959 ns | 472190041 ns | 1.28 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) | 411440583 ns | 413693042 ns | 0.99 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) | 292411917 ns | 397758813 ns | 0.74 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) | 737907354 ns | 737522187.5 ns | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/GPU/CUDA | 16971190.5 ns | 16943151 ns | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) | 714524875 ns | 710270771 ns | 1.01 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) | 672104708 ns | 668321833 ns | 1.01 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) | 580514646 ns | 1002011792 ns | 0.58 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) | 1012152875 ns | 997156208 ns | 1.02 |
This comment was automatically generated by workflow using github-action-benchmark.
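
For readers checking the numbers by hand: in every benchmark entry above, the third value appears to be the current timing divided by the previous one, so values below 1 mean the current commit is faster. The Python sketch below reproduces that arithmetic. The function name `format_ratio` is hypothetical, and the formatting rule (integers printed bare, values above 0.1 rounded to two decimals, smaller values printed in full) is inferred from the figures in this comment, not taken from the github-action-benchmark source.

```python
def format_ratio(current_ns: float, previous_ns: float) -> str:
    """Return current/previous formatted the way the entries above appear to be.

    Assumption: the formatting rule is inferred from the reported values only.
    """
    ratio = current_ns / previous_ns
    if ratio.is_integer():
        # Identical timings show up as a bare "1".
        return str(int(ratio))
    if ratio > 0.1:
        # Most entries are rounded to two decimal places.
        return f"{ratio:.2f}"
    # Very small ratios are printed at full float precision.
    return repr(ratio)


if __name__ == "__main__":
    # Dense(512 => 512, identity)(512 x 128)/forward/CPU/2 thread(s)
    print(format_ratio(414875, 411270.5))
    # Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/1 thread(s)
    print(format_ratio(258750, 258750))
    # Dense(128 => 128, gelu)(128 x 128)/forward/CPU/8 thread(s)
    print(format_ratio(32750, 3043792))
```

Many of the 8-thread CPU entries have ratios far below 1, which under this reading means a much lower time for the current commit than for the previous one.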