process.report.getReport slower from 20.2.0 to 20.3.0 on Linux #48831

Open
H4ad opened this issue Jul 19, 2023 · 13 comments
Comments

@H4ad
Member

H4ad commented Jul 19, 2023

Script:

const now = performance.now();
process.report.getReport()
console.log(`Took ${performance.now() - now}ms`);

On Node 20.2.0:

> node get-report.js
Took 26.467950999736786ms

On node 20.3.0

> node get-report.js
Took 609.6318630054593ms

I did a bisect and the commit that introduced this slowdown was bfcb3d1, which upgraded libuv.

But the actual problem was introduced (I think) at libuv/libuv@434eb4b: instead of assuming the same information for all CPUs, libuv now reads /sys/devices/system/cpu/cpu%u/cpufreq/scaling_cur_freq for every CPU.
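
For reference, a minimal sketch that reproduces those per-CPU reads from Node (the sysfs path is the one from the libuv commit; cpufreq entries may not exist in containers or VMs):

const fs = require('node:fs');
const os = require('node:os');

const start = performance.now();
for (let i = 0; i < os.cpus().length; i++) {
  try {
    // one blocking read per CPU, mirroring what uv_cpu_info now does
    fs.readFileSync(`/sys/devices/system/cpu/cpu${i}/cpufreq/scaling_cur_freq`, 'utf8');
  } catch {
    // no cpufreq entry for this CPU (common in containers); skip it
  }
}
console.log(`Reading scaling_cur_freq for all CPUs took ${performance.now() - start}ms`);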

I'll also open an issue on libuv, but I want to keep this one open to track the slowdown.

My machine:

  • Ryzen 9 5950X 3.4GHz, 16-Cores 32-Threads
  • 32GB of Memory.

cc @nodejs/performance

@H4ad H4ad changed the title process.report.getReport 24x slower from 20.2.0 to 20.3.0 process.report.getReport slower from 20.2.0 to 20.3.0 on Linux Jul 19, 2023
@H4ad
Member Author

H4ad commented Jul 19, 2023

As discussed at libuv/libuv#4098, the slowdown is not in libuv itself: for some reason, reading the current frequency from sysfs is slow, and even cat can be slow:

time sudo cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_cur_freq

On my machine it took 600ms (32 CPUs), on another machine it took less than 1ms (12 CPUs), and on another it took 1.6s (16 CPUs), so this problem is very strange. If you call cat several times in a row it returns quickly, but if you then wait ~2s before calling again, it gets slow again.
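
To see whether the cost is spread evenly or concentrated on a few CPUs, here's a sketch that times each read individually:

const fs = require('node:fs');
const os = require('node:os');

for (let i = 0; i < os.cpus().length; i++) {
  const t = performance.now();
  try {
    fs.readFileSync(`/sys/devices/system/cpu/cpu${i}/cpufreq/scaling_cur_freq`, 'utf8');
  } catch {
    continue; // this CPU has no cpufreq entry
  }
  const ms = performance.now() - t;
  if (ms > 1) console.log(`cpu${i}: ${ms.toFixed(2)}ms`); // report only the slow reads
}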

I proposed adding a flag to uv_cpu_info to choose how the CPU frequency data is obtained; in theory we could go back to the old behavior of reporting the approximate frequency (and maybe introduce a new API to get the real one), but unless one of the libuv maintainers reopens the issue, it will not be implemented.

In this case, unless we remove the call to uv_cpu_info, Linux machines can hit this slowdown, and whether they do depends on many factors, not just the number of CPUs.

This mainly affects native bindings for Node.js, because they use getReport to detect whether the system runs musl, but I already created PRs to avoid using it in detect-libc and napi-rs, so at least for the most popular packages it will not be an issue.
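
For illustration, a sketch of one way to detect musl without touching getReport (not necessarily what those PRs do): look for the musl dynamic loader on disk.

const fs = require('node:fs');

// musl systems ship their loader as /lib/ld-musl-<arch>.so.1
function isMuslSketch() {
  try {
    return fs.readdirSync('/lib').some((name) => name.startsWith('ld-musl-'));
  } catch {
    return false; // /lib missing or unreadable: assume not musl
  }
}

console.log(isMuslSketch() ? 'musl' : 'glibc (or unknown)');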

@feisuzhu

Just ran npm install xxx on a 384-core machine and it got stuck forever, and then I found this issue.

uv_cpu_info keeps getting called and I don't know when it will stop.

[screenshot: process stuck repeatedly calling uv_cpu_info]

@feisuzhu

root@6c92459bf81a:/# node a.js
Took 3862.274124ms
root@6c92459bf81a:/#

@H4ad
Member Author

H4ad commented Nov 12, 2024

Oh, that's why I've been seeing random slowdowns/hangs on my machine, probably introduced by npm/npm-install-checks@ebf9b9f

/cc @nodejs/npm

@H4ad
Member Author

H4ad commented Nov 12, 2024

We can use the implementation that I created for lovell/detect-libc#19; it is much faster and solves the slowdown caused by getReport.

I will probably open a PR soon (once I have more time to investigate and confirm the issue).
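
(For package authors: if I remember the detect-libc v2 API correctly, the call is just familySync(), so switching away from getReport-based detection is a small change.)

const { familySync, MUSL } = require('detect-libc');

const family = familySync(); // 'glibc', 'musl', or null on non-Linux systems
console.log(family === MUSL ? 'musl' : 'glibc (or non-Linux)');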

@feisuzhu

feisuzhu commented Nov 12, 2024

For anyone who runs into this, here's a workaround. The snippet below hardcodes glibc (e.g. for Ubuntu); change it to return 'musl' if you are on Alpine:

cat <<EOF >> /usr/lib/node_modules/npm/node_modules/npm-install-checks/lib/current-env.js
function libc(osname) {
    return 'glibc';
}
EOF
npm install -g your-favorite-app

@Tofandel
Contributor

Tofandel commented Nov 12, 2024

Related #55576 with fix #55602

Are you 100% sure this issue is with uv_cpu_info?

We really do need to add options to getReport so we can choose exactly which info we are looking for and avoid these slowdowns.
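
Something like this, purely hypothetical (no such option exists in Node.js today), is what I mean:

// Hypothetical API sketch, not an implemented Node.js option:
const report = process.report.getReport({
  exclude: ['cpus', 'networkInterfaces'], // skip the slow collectors
});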

@H4ad
Member Author

H4ad commented Nov 12, 2024

@Tofandel the network could have an effect, but the main issue is probably uv_cpu_info, which we can't avoid without introducing a new flag; I think that would be the best option, along with updating the npm code.

But I can try building Node.js from your branch later to confirm.

@Tofandel
Contributor

Tofandel commented Nov 12, 2024

Is that 384 cores or 384 logical processors? I'm guessing if you have 384 cores you have double that in logical processors; uv_cpu_info runs for each of them synchronously, and the function might not be very fast.

4 seconds for a debug function would be long but acceptable; since it's not used only as a debug function, though, instead of adding more and more flags to it we should add options and, if possible, make it async.
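
Until something like that lands, a userland sketch: run getReport in a worker thread so the blocking sysfs reads don't stall the main thread (the call itself stays synchronous, just elsewhere):

const { Worker, isMainThread, parentPort } = require('node:worker_threads');

if (isMainThread) {
  const worker = new Worker(__filename);
  worker.once('message', (report) => {
    console.log('got report for', report.header.nodejsVersion); // main thread was never blocked
    worker.terminate();
  });
} else {
  parentPort.postMessage(process.report.getReport());
}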

@feisuzhu

feisuzhu commented Nov 12, 2024

@Tofandel My bad, I said SMT was off with 192 real cores, but it's actually a 1-way node with SMT on:

$ lscpu
Architecture:             x86_64
  CPU op-mode(s):         32-bit, 64-bit
  Address sizes:          52 bits physical, 57 bits virtual
  Byte Order:             Little Endian
CPU(s):                   192
  On-line CPU(s) list:    0-191
Vendor ID:                AuthenticAMD
  Model name:             AMD EPYC 9654 96-Core Processor
    CPU family:           25
    Model:                17
    Thread(s) per core:   2
    Core(s) per socket:   96
    Socket(s):            1
    Stepping:             1
    Frequency boost:      enabled
    CPU max MHz:          3707.8120
    CPU min MHz:          1500.0000
    BogoMIPS:             4800.18
    Flags:                fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 invpcid_single hw_pstate ssbd mba ibrs ibpb stibp ibrs_enhanced vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local avx512_bf16 clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin cppc arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq la57 rdpid overflow_recov succor smca fsrm flush_l1d
Virtualization features:
  Virtualization:         AMD-V
Caches (sum of all):
  L1d:                    3 MiB (96 instances)
  L1i:                    3 MiB (96 instances)
  L2:                     96 MiB (96 instances)
  L3:                     384 MiB (12 instances)
NUMA:
  NUMA node(s):           1
  NUMA node0 CPU(s):      0-191
Vulnerabilities:
  Gather data sampling:   Not affected
  Itlb multihit:          Not affected
  L1tf:                   Not affected
  Mds:                    Not affected
  Meltdown:               Not affected
  Mmio stale data:        Not affected
  Reg file data sampling: Not affected
  Retbleed:               Not affected
  Spec rstack overflow:   Mitigation; safe RET
  Spec store bypass:      Mitigation; Speculative Store Bypass disabled via prctl and seccomp
  Spectre v1:             Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:             Mitigation; Enhanced / Automatic IBRS; IBPB conditional; STIBP always-on; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
  Srbds:                  Not affected
  Tsx async abort:        Not affected

@Tofandel
Contributor

Tofandel commented Nov 12, 2024

It seems it's actually 96 real cores and 192 logical cores (1 socket with 96 cores per socket * 2 threads per core).

Is it maybe an AMD-specific issue? It would be nice to see if there's a common factor, to know where to look.

@feisuzhu

Hmmm, you are right, I thought it was a 2-way machine.

@Tofandel
Contributor

It seems it's just randomly slow for me as well (8 real cores, Intel), so it's not just AMD:

 time sudo cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_cur_freq
4000009
3999975
4000000
4000000
4000031
3999999
4000040
4000113

real    0m0.010s
user    0m0.007s
sys     0m0.003s
time sudo cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_cur_freq
4083445
4027384
4078306
4092856
4023428
4094010
4097951
4153294

real    0m0.166s
user    0m0.009s
sys     0m0.001s

It seems to depend on what the CPU is doing at the moment: if the CPU is at rest with a lot of idle time, the read becomes slow.
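
A rough sketch to test that idle hypothesis from Node (it only busies one core, so it's a partial test at best):

const fs = require('node:fs');
const os = require('node:os');

function timeReads() {
  const start = performance.now();
  for (let i = 0; i < os.cpus().length; i++) {
    try {
      fs.readFileSync(`/sys/devices/system/cpu/cpu${i}/cpufreq/scaling_cur_freq`);
    } catch {} // no cpufreq entry; ignore
  }
  return (performance.now() - start).toFixed(2);
}

console.log(`reads while idle: ${timeReads()}ms`);
const stop = Date.now() + 500;
while (Date.now() < stop); // busy-spin for 500ms to pull the CPU out of idle
console.log(`reads while busy: ${timeReads()}ms`);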
