arm neon optimization for layernorm fp32/bf16s/fp16s #6263
linux-x64-cpu-clang.yml
on: pull_request
linux-clang: 20m 54s
linux-clang-simplestl: 5m 24s