enable arm neon intrinsics for msvc build #5151

Merged (30 commits) on Nov 21, 2023

nihui
Member

@nihui nihui commented Nov 15, 2023

  • compiler check
  • guard all gnu inline assembly
  • enable windows arm ci
  • implement runtime cpu feature
  • neon intrinsics coverage
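
A minimal sketch of what the first two items can look like in practice, not the code from this PR. It assumes the usual compiler predefines (`__GNUC__`, `__clang__`, `__ARM_NEON`, `__aarch64__`, `_M_ARM64`, `_M_ARM`); the macro names `SKETCH_GNU_INLINE_ASM` and `SKETCH_HAVE_NEON` and the function `add_floats` are hypothetical and only serve the illustration. The idea: GNU-style inline assembly is kept behind a compiler check, and an intrinsics path covers compilers such as MSVC that do not accept it, with a scalar tail for everything else.

```cpp
#include <cassert>
#include <cstddef>

// Compiler check: GNU-style inline assembly is accepted by GCC and Clang,
// but not by MSVC (cl.exe). SKETCH_GNU_INLINE_ASM is a hypothetical name.
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_GNU_INLINE_ASM 1
#else
#define SKETCH_GNU_INLINE_ASM 0
#endif

// NEON check: GCC/Clang define __ARM_NEON; MSVC defines _M_ARM64 or _M_ARM
// when targeting ARM. Assumption: <arm_neon.h> is usable on all of them.
#if defined(__ARM_NEON) || defined(_M_ARM64) || defined(_M_ARM)
#include <arm_neon.h>
#define SKETCH_HAVE_NEON 1
#else
#define SKETCH_HAVE_NEON 0
#endif

// Element-wise out[i] = a[i] + b[i] (hypothetical example kernel).
static void add_floats(const float* a, const float* b, float* out, std::size_t n)
{
    assert(a && b && out);
    std::size_t i = 0;

#if SKETCH_HAVE_NEON
    // Process four floats per iteration.
    for (; i + 4 <= n; i += 4)
    {
#if SKETCH_GNU_INLINE_ASM && defined(__aarch64__)
        // Hand-written path, only compiled where GNU inline asm exists.
        asm volatile(
            "ld1    {v0.4s}, [%1]        \n"
            "ld1    {v1.4s}, [%2]        \n"
            "fadd   v0.4s, v0.4s, v1.4s  \n"
            "st1    {v0.4s}, [%0]        \n"
            :
            : "r"(out + i), "r"(a + i), "r"(b + i)
            : "memory", "v0", "v1");
#else
        // Intrinsics path, used by MSVC and any other NEON target.
        float32x4_t va = vld1q_f32(a + i);
        float32x4_t vb = vld1q_f32(b + i);
        vst1q_f32(out + i, vaddq_f32(va, vb));
#endif
    }
#endif

    // Scalar tail; also the only path on targets without NEON.
    for (; i < n; i++)
        out[i] = a[i] + b[i];
}
```

With this layout the same source builds with GCC, Clang and MSVC: the assembly block disappears wherever the compiler check fails, and the intrinsics or scalar code takes over.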

@github-actions github-actions bot added the arm label Nov 15, 2023
@codecov-commenter

codecov-commenter commented Nov 15, 2023

Codecov Report

Attention: 7 lines in your changes are missing coverage. Please review.

Comparison is base (465debe) 94.72% compared to head (f505902) 94.41%.
Report is 3 commits behind head on master.

Files          Patch %   Lines
src/mat.cpp    0.00%     6 Missing ⚠️
src/cpu.cpp    50.00%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5151      +/-   ##
==========================================
- Coverage   94.72%   94.41%   -0.31%     
==========================================
  Files         774      774              
  Lines      242342   243079     +737     
==========================================
- Hits       229551   229504      -47     
- Misses      12791    13575     +784     


@nihui nihui force-pushed the arm-neon-intrinsics branch from 3d9899f to 23ecdc3 Compare November 17, 2023 02:52
@nihui
Member Author

nihui commented Nov 17, 2023

pi@raspberrypi:~/ncnn/build/benchmark $ ./benchncnn 10 4 0 -1 1
loop_count = 10
num_threads = 4
powersave = 0
gpu_device = -1
cooling_down = 1
          squeezenet  min =    8.49  max =    8.78  avg =    8.69
     squeezenet_int8  min =    8.61  max =    9.05  avg =    8.81
           mobilenet  min =   10.99  max =   11.14  avg =   11.06
      mobilenet_int8  min =   11.22  max =   11.48  avg =   11.31
        mobilenet_v2  min =   13.40  max =   13.75  avg =   13.61
        mobilenet_v3  min =    9.05  max =   12.14  avg =    9.50
          shufflenet  min =    4.43  max =    4.46  avg =    4.44
       shufflenet_v2  min =    3.63  max =    3.71  avg =    3.66
             mnasnet  min =    8.02  max =    8.14  avg =    8.08
     proxylessnasnet  min =    9.43  max =    9.68  avg =    9.52
     efficientnet_b0  min =   14.89  max =   15.14  avg =   15.03
   efficientnetv2_b0  min =   16.56  max =   20.13  avg =   17.12
        regnety_400m  min =   12.22  max =   12.38  avg =   12.30
           blazeface  min =    1.61  max =    1.66  avg =    1.64
           googlenet  min =   31.02  max =   31.48  avg =   31.21
      googlenet_int8  min =   29.68  max =   30.22  avg =   29.96
            resnet18  min =   23.85  max =   24.17  avg =   24.03
       resnet18_int8  min =   21.39  max =   21.60  avg =   21.47
             alexnet  min =   24.61  max =   24.84  avg =   24.74
               vgg16  min =  166.10  max =  168.17  avg =  166.83
          vgg16_int8  min =  129.48  max =  131.56  avg =  130.40
            resnet50  min =   55.17  max =   55.79  avg =   55.60
       resnet50_int8  min =   50.97  max =   53.51  avg =   51.45
      squeezenet_ssd  min =   39.42  max =   39.99  avg =   39.72
 squeezenet_ssd_int8  min =   36.80  max =   38.57  avg =   37.31
       mobilenet_ssd  min =   30.53  max =   30.94  avg =   30.74
  mobilenet_ssd_int8  min =   28.72  max =   29.63  avg =   29.25
      mobilenet_yolo  min =   71.02  max =   71.81  avg =   71.34
  mobilenetv2_yolov3  min =   48.18  max =   48.69  avg =   48.36
         yolov4-tiny  min =   55.63  max =   56.20  avg =   55.85
           nanodet_m  min =   13.68  max =   14.00  avg =   13.81
    yolo-fastest-1.1  min =    6.44  max =    6.57  avg =    6.50
      yolo-fastestv2  min =    5.73  max =    5.82  avg =    5.76
  vision_transformer  min =  616.99  max =  640.29  avg =  627.76
          FastestDet  min =    5.60  max =    5.80  avg =    5.70
pi@raspberrypi:~/ncnn/build/benchmark $ ./benchncnn 10 1 0 -1 1
loop_count = 10
num_threads = 1
powersave = 0
gpu_device = -1
cooling_down = 1
          squeezenet  min =   13.22  max =   13.38  avg =   13.31
     squeezenet_int8  min =   12.89  max =   13.17  avg =   13.05
           mobilenet  min =   21.46  max =   21.60  avg =   21.51
      mobilenet_int8  min =   15.41  max =   15.72  avg =   15.58
        mobilenet_v2  min =   18.76  max =   19.56  avg =   19.01
        mobilenet_v3  min =   12.88  max =   13.25  avg =   13.07
          shufflenet  min =    7.26  max =    7.30  avg =    7.28
       shufflenet_v2  min =    7.22  max =    7.32  avg =    7.26
             mnasnet  min =   13.42  max =   13.71  avg =   13.57
     proxylessnasnet  min =   16.61  max =   16.69  avg =   16.66
     efficientnet_b0  min =   25.88  max =   26.04  avg =   25.95
   efficientnetv2_b0  min =   28.97  max =   29.17  avg =   29.06
        regnety_400m  min =   17.47  max =   17.54  avg =   17.51
           blazeface  min =    3.08  max =    3.13  avg =    3.11
           googlenet  min =   51.52  max =   52.01  avg =   51.71
      googlenet_int8  min =   52.89  max =   53.08  avg =   52.98
            resnet18  min =   31.67  max =   31.99  avg =   31.83
       resnet18_int8  min =   39.00  max =   39.21  avg =   39.05
             alexnet  min =   36.66  max =   37.07  avg =   36.88
               vgg16  min =  205.90  max =  209.63  avg =  207.62
          vgg16_int8  min =  287.79  max =  289.39  avg =  288.78
            resnet50  min =   95.31  max =   96.43  avg =   95.96
       resnet50_int8  min =   85.52  max =   85.89  avg =   85.76
      squeezenet_ssd  min =   43.67  max =   44.13  avg =   43.88
 squeezenet_ssd_int8  min =   47.20  max =   48.27  avg =   47.77
       mobilenet_ssd  min =   50.50  max =   51.04  avg =   50.81
  mobilenet_ssd_int8  min =   40.89  max =   42.05  avg =   41.54
      mobilenet_yolo  min =  114.10  max =  116.32  avg =  115.16
  mobilenetv2_yolov3  min =   68.92  max =   69.05  avg =   68.98
         yolov4-tiny  min =   73.24  max =   73.61  avg =   73.45
           nanodet_m  min =   21.32  max =   21.59  avg =   21.45
    yolo-fastest-1.1  min =    8.72  max =    8.84  avg =    8.78
      yolo-fastestv2  min =    7.89  max =    8.00  avg =    7.92
  vision_transformer  min = 1267.07  max = 1268.49  avg = 1267.67
          FastestDet  min =    7.88  max =    8.06  avg =    7.97

@nihui
Member Author

nihui commented Nov 17, 2023

After apt upgrade and reboot:

pi@raspberrypi:~/ncnn/build/benchmark $ ./benchncnn 10 4 0 -1 0
loop_count = 10
num_threads = 4
powersave = 0
gpu_device = -1
cooling_down = 0
          squeezenet  min =    8.31  max =    8.54  avg =    8.45
     squeezenet_int8  min =    8.46  max =    8.93  avg =    8.70
           mobilenet  min =   10.98  max =   11.19  avg =   11.07
      mobilenet_int8  min =   10.98  max =   11.26  avg =   11.14
        mobilenet_v2  min =   13.50  max =   13.80  avg =   13.67
        mobilenet_v3  min =    8.72  max =    8.92  avg =    8.85
          shufflenet  min =    4.43  max =    4.52  avg =    4.47
       shufflenet_v2  min =    3.58  max =    3.71  avg =    3.63
             mnasnet  min =    7.98  max =    8.16  avg =    8.05
     proxylessnasnet  min =    9.43  max =    9.52  avg =    9.48
     efficientnet_b0  min =   14.78  max =   15.04  avg =   14.93
   efficientnetv2_b0  min =   16.38  max =   19.24  avg =   16.90
        regnety_400m  min =   11.72  max =   11.86  avg =   11.80
           blazeface  min =    1.60  max =    1.64  avg =    1.62
           googlenet  min =   30.47  max =   31.14  avg =   30.84
      googlenet_int8  min =   29.60  max =   30.04  avg =   29.82
            resnet18  min =   23.24  max =   26.26  avg =   23.78
       resnet18_int8  min =   21.06  max =   21.58  avg =   21.31
             alexnet  min =   24.64  max =   28.61  avg =   25.22
               vgg16  min =  164.78  max =  168.18  avg =  166.34
          vgg16_int8  min =  129.48  max =  130.91  avg =  130.19
            resnet50  min =   54.83  max =   55.58  avg =   55.31
       resnet50_int8  min =   50.47  max =   50.78  avg =   50.64
      squeezenet_ssd  min =   39.51  max =   40.68  avg =   39.78
 squeezenet_ssd_int8  min =   36.46  max =   38.04  avg =   37.17
       mobilenet_ssd  min =   30.16  max =   30.82  avg =   30.49
  mobilenet_ssd_int8  min =   28.51  max =   29.46  avg =   29.08
      mobilenet_yolo  min =   70.41  max =   71.12  avg =   70.79
  mobilenetv2_yolov3  min =   47.81  max =   48.28  avg =   48.01
         yolov4-tiny  min =   55.16  max =   55.58  avg =   55.39
           nanodet_m  min =   13.31  max =   13.75  avg =   13.57
    yolo-fastest-1.1  min =    6.32  max =    6.45  avg =    6.38
      yolo-fastestv2  min =    5.94  max =    6.05  avg =    6.01
  vision_transformer  min =  584.05  max =  610.38  avg =  596.09
          FastestDet  min =    5.79  max =    5.95  avg =    5.89
pi@raspberrypi:~/ncnn/build/benchmark $ ./benchncnn 10 1 0 -1 0
loop_count = 10
num_threads = 1
powersave = 0
gpu_device = -1
cooling_down = 0
          squeezenet  min =   12.79  max =   13.10  avg =   12.91
     squeezenet_int8  min =   12.71  max =   13.06  avg =   12.91
           mobilenet  min =   21.39  max =   21.50  avg =   21.43
      mobilenet_int8  min =   15.40  max =   15.68  avg =   15.52
        mobilenet_v2  min =   18.66  max =   18.96  avg =   18.80
        mobilenet_v3  min =   12.82  max =   13.24  avg =   13.02
          shufflenet  min =    7.27  max =    7.33  avg =    7.30
       shufflenet_v2  min =    7.20  max =    7.27  avg =    7.24
             mnasnet  min =   13.42  max =   13.67  avg =   13.54
     proxylessnasnet  min =   16.52  max =   16.61  avg =   16.57
     efficientnet_b0  min =   25.25  max =   25.74  avg =   25.56
   efficientnetv2_b0  min =   28.75  max =   28.90  avg =   28.82
        regnety_400m  min =   17.38  max =   17.52  avg =   17.46
           blazeface  min =    3.09  max =    3.12  avg =    3.10
           googlenet  min =   50.80  max =   51.47  avg =   51.15
      googlenet_int8  min =   52.25  max =   52.74  avg =   52.52
            resnet18  min =   31.69  max =   32.07  avg =   31.89
       resnet18_int8  min =   38.26  max =   38.60  avg =   38.44
             alexnet  min =   36.83  max =   37.42  avg =   37.07
               vgg16  min =  206.50  max =  209.37  avg =  207.51
          vgg16_int8  min =  286.72  max =  287.99  avg =  287.15
            resnet50  min =   95.03  max =   96.05  avg =   95.60
       resnet50_int8  min =   84.42  max =   86.22  avg =   85.23
      squeezenet_ssd  min =   43.46  max =   43.96  avg =   43.65
 squeezenet_ssd_int8  min =   47.01  max =   47.90  avg =   47.49
       mobilenet_ssd  min =   50.54  max =   51.04  avg =   50.75
  mobilenet_ssd_int8  min =   41.10  max =   42.67  avg =   41.86
      mobilenet_yolo  min =  112.72  max =  114.50  avg =  113.65
  mobilenetv2_yolov3  min =   68.36  max =   68.94  avg =   68.56
         yolov4-tiny  min =   72.89  max =   73.27  avg =   73.07
           nanodet_m  min =   21.08  max =   21.39  avg =   21.25
    yolo-fastest-1.1  min =    8.64  max =    8.79  avg =    8.68
      yolo-fastestv2  min =    7.78  max =    7.90  avg =    7.82
  vision_transformer  min = 1263.35  max = 1265.39  avg = 1264.57
          FastestDet  min =    7.85  max =    7.96  avg =    7.91

@nihui nihui force-pushed the arm-neon-intrinsics branch from 2e4bace to 7b9b43d Compare November 21, 2023 06:04
@nihui nihui closed this Nov 21, 2023
@nihui nihui reopened this Nov 21, 2023
@nihui nihui closed this Nov 21, 2023
@nihui nihui reopened this Nov 21, 2023
@nihui nihui changed the title from "[WIP] enable arm neon intrinsics for msvc build" to "enable arm neon intrinsics for msvc build" Nov 21, 2023
@nihui nihui merged commit 058aa0a into Tencent:master Nov 21, 2023