Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

musl update to 1.2.1 causes slowdown. #11298

Closed
ghuls opened this issue Mar 25, 2022 · 3 comments
Closed

musl update to 1.2.1 causes slowdown. #11298

ghuls opened this issue Mar 25, 2022 · 3 comments
Labels
optimization os-linux regression It worked in a previous version of Zig, but stopped working.
Milestone

Comments

@ghuls
Copy link

ghuls commented Mar 25, 2022

Zig Version

0.10.0-dev.750+cf5009f9a

Steps to Reproduce

# Clone Cluster-Buster repo.
git clone https://github.com/weng-lab/cluster-buster

cd cluster-buster

# Compile Cluster-Buster with different versions of zig.
for zig_version in 0.10.0-dev.750+cf5009f9a 0.6.0+e5fab3b 0.6.0+030f003 ; do
    for architecture in x86_64 ; do
        ../zig/zig-linux-x86_64-${zig_version}/zig c++ \
            -v -Wall \
            -std=c++0x \
            -O3 \
            -march="${architecture}" \
            -mtune="${architecture}" \
            -D NDEBUG \
            -target x86_64-linux-musl \
            -D GIT_COMMIT_INFO="\"2020-05-07 10:30:42 -0400  commit: ac1d33c\"" \
            -ffast-math \
            -o "cbust_fast_math_${architecture}_zig${zig_version//./}" \
             *.cpp
    done
done

Create input files:

# Create motif file.
echo '>motif1
0,0,100,0
0,100,0,0
0,0,0,100
0,50,50,0
0,100,0,0
0,50,0,50
0,0,50,50
0,100,0,0
0,50,0,50
0,0,100,0
0,50,0,50
33,33,0,34
' | tr ',' '\t' > /tmp/motif.cb


# Create FASTA file.
awk '
BEGIN {
    choose_nuc[0] = "A";
    choose_nuc[1] = "C";
    choose_nuc[2] = "G";
    choose_nuc[3] = "T";

    for (i = 1; i <= 250000; i++) {
        print ">seqeunce_" i;

        for (j = 0; j < 500; j++) {
            printf "%s", choose_nuc[int(rand() * 4)];
        }
        print;
    }
}' > /tmp/fasta.fa

Run Cluster-Buster binaries for each compiled version (with different zig versions) and check execution time.

time cbust_compiled_with_zig_version_x -f 4 /tmp/motif.cb /tmp/fasta.fa > /dev/null

Expected Behavior

I expect that newer zig versions are faster (or equal in speed) than older ones.

I had a compiled Cluster-Buster version that I compiled 2020-11-05 and this is the fastest version of Cluster-Buster I ever managed to compile. I didn't know with which version and settings ( it was compiled (march, tune) with) but I remembered it was a development version of zig at the time. I never managed to create a faster or same speed binary with any zig version afterwards (0.7.0 => current).

Today I tried to find which zig version it could have been and I managed to find it.

$ strings cbust_fastest | grep zig
/software/zig/zig-linux-x86_64-0.6.0+17837affd/lib/zig/libcxxabi/src/demangle/ItaniumDemangle.h
/software/zig/zig-linux-x86_64-0.6.0+17837affd/lib/zig/libcxxabi/src/cxa_demangle.cpp

I tracked down the commit that introduced the slowdown:
https://github.com/ziglang/zig/commits/master?after=7f91be9c80f8d79e4a2c6cf0b15197cdd454cef6+6335&branch=master

Slow
update musl sources to 1.2.1
@andrewrk
andrewrk committed on Nov 6, 2020

update musl headers to 1.2.1
@andrewrk
andrewrk committed on Nov 6, 2020

update process_headers tool
@andrewrk
andrewrk committed on Nov 6, 2020

As fast as zig 0.6.0+17837affd
std: Introduce SemanticVersion data structure
@jayschwa
@andrewrk
jayschwa authored and andrewrk committed on Nov 6, 2020

Something in the musl lib update seems to cause the slow down.

Actual Behavior

When running all binaries with perf, it seems like the musl update causes a huge number of pagefaults and the sytem time goes to 1.1 second vs to less than 0.1 for the fast binary (zig0.6.0+030f003).

$ perf stat ./cbust_fast_math_x86_64_zig0100-dev750+cf5009f9a -f 4 /tmp/motif.cb /tmp/fasta.fa > /dev/null

 Performance counter stats for './cbust_fast_math_x86_64_zig0100-dev750+cf5009f9a -f 4 m/tmp/motif.cb /tmp/fasta.fa':

          16152.22 msec task-clock:u              #    0.990 CPUs utilized
                 0      context-switches:u        #    0.000 K/sec
                 0      cpu-migrations:u          #    0.000 K/sec
            282135      page-faults:u             #    0.017 M/sec
       51491768159      cycles:u                  #    3.188 GHz
       78484700109      instructions:u            #    1.52  insn per cycle
       10115988739      branches:u                #  626.291 M/sec
         125795653      branch-misses:u           #    1.24% of all branches

      16.321717146 seconds time elapsed

      15.031987000 seconds user
       1.123085000 seconds sys

$ perf stat ./cbust_fast_math_x86_64_zig060+e5fab3b -f 4 /tmp/motif.cb /tmp/fasta.fa > /dev/null

 Performance counter stats for './cbust_fast_math_x86_64_zig060+e5fab3b -f 4 /tmp/motif.cb /tmp/fasta.fa':

          17711.34 msec task-clock:u              #    0.998 CPUs utilized
                 0      context-switches:u        #    0.000 K/sec
                 0      cpu-migrations:u          #    0.000 K/sec
            280616      page-faults:u             #    0.016 M/sec
       52003067748      cycles:u                  #    2.936 GHz
       85803272856      instructions:u            #    1.65  insn per cycle
       11490153668      branches:u                #  648.745 M/sec
         132034190      branch-misses:u           #    1.15% of all branches

      17.754947524 seconds time elapsed

      16.541077000 seconds user
       1.171173000 seconds sys



$ perf stat ./cbust_fast_math_x86_64_zig060+030f003 -f 4 /tmp/motif.cb /tmp/fasta.fa > /dev/null

 Performance counter stats for './cbust_fast_math_x86_64_zig060+030f003 -f 4 /tmp/motif.cb /tmp/fasta.fa':

          15094.25 msec task-clock:u              #    0.998 CPUs utilized
                 0      context-switches:u        #    0.000 K/sec
                 0      cpu-migrations:u          #    0.000 K/sec
             15851      page-faults:u             #    0.001 M/sec
       51491411258      cycles:u                  #    3.411 GHz
       85712078427      instructions:u            #    1.66  insn per cycle
       11470618728      branches:u                #  759.933 M/sec
         123801113      branch-misses:u           #    1.08% of all branches

      15.131651655 seconds time elapsed

      15.001611000 seconds user
       0.093772000 seconds sys
@ghuls ghuls added the bug Observed behavior contradicts documented or intended behavior label Mar 25, 2022
@12101111
Copy link

musl introduced a new malloc implementation called mallocng in 1.2.0.
Both the oldmalloc and mallocng are designed for embedded usage, and have bad performance for multi-thread computing.

You can build and link those malloc implementation as a replacement:
https://github.com/microsoft/mimalloc
https://github.com/microsoft/snmalloc
https://github.com/mjansson/rpmalloc
https://github.com/jemalloc/jemalloc

@ghuls
Copy link
Author

ghuls commented Mar 25, 2022

Cluster-Buster is a single threaded program.

@andrewrk andrewrk added os-linux optimization regression It worked in a previous version of Zig, but stopped working. and removed bug Observed behavior contradicts documented or intended behavior labels Aug 19, 2022
@andrewrk andrewrk added this to the 0.11.0 milestone Aug 19, 2022
@andrewrk
Copy link
Member

andrewrk commented Aug 19, 2022

Here's the plan to address this:

@andrewrk andrewrk modified the milestones: 0.11.0, 0.12.0 Jun 19, 2023
@andrewrk andrewrk closed this as not planned Won't fix, can't repro, duplicate, stale Aug 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
optimization os-linux regression It worked in a previous version of Zig, but stopped working.
Projects
None yet
Development

No branches or pull requests

3 participants