
patchelf-0.17.2 seems to corrupt emacs on staging-next #482

Closed
trofi opened this issue Mar 18, 2023 · 24 comments · Fixed by #544

@trofi commented Mar 18, 2023

On the current staging-next iteration, quite a few emacs-dependent packages are failing. The failures seem to stem from emacs being incorrectly modified by patchelf-0.17.2 (0.15.0 works; bisected in nixpkgs by @mweinelt).

$ nix run https://github.com/NixOS/nixpkgs/archive/staging-next.tar.gz#emacs
Segmentation fault (core dumped)
$ patchelf --version
patchelf 0.17.2

It seems to have something to do with the modified library list:

$ nix shell https://github.com/NixOS/nixpkgs/archive/staging-next.tar.gz#emacs
$ gdb emacs
Reading symbols from emacs...

warning: Loadable section ".dynstr" outside of ELF segments
  in /nix/store/lzahvwakhghr8b3ri40s935bwhn7nf0x-emacs-28.2/bin/emacs-28.2

warning: Loadable section ".dynamic" outside of ELF segments
  in /nix/store/lzahvwakhghr8b3ri40s935bwhn7nf0x-emacs-28.2/bin/emacs-28.2
(No debugging symbols found in emacs)
(gdb) run
Starting program: /nix/store/lzahvwakhghr8b3ri40s935bwhn7nf0x-emacs-28.2/bin/emacs

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7fe6597 in dl_main ()
   from /nix/store/8xk4yl1r3n6kbyn05qhan7nbag7npymx-glibc-2.35-224/lib/ld-linux-x86-64.so.2
(gdb) bt
#0  0x00007ffff7fe6597 in dl_main ()
   from /nix/store/8xk4yl1r3n6kbyn05qhan7nbag7npymx-glibc-2.35-224/lib/ld-linux-x86-64.so.2
#1  0x00007ffff7fe2a06 in _dl_sysdep_start ()
   from /nix/store/8xk4yl1r3n6kbyn05qhan7nbag7npymx-glibc-2.35-224/lib/ld-linux-x86-64.so.2
#2  0x00007ffff7fe45ad in _dl_start ()
   from /nix/store/8xk4yl1r3n6kbyn05qhan7nbag7npymx-glibc-2.35-224/lib/ld-linux-x86-64.so.2
#3  0x00007ffff7fe33a8 in _start ()
   from /nix/store/8xk4yl1r3n6kbyn05qhan7nbag7npymx-glibc-2.35-224/lib/ld-linux-x86-64.so.2
#4  0x0000000000000001 in ?? ()
#5  0x00007fffffffd1f3 in ?? ()
#6  0x0000000000000000 in ?? ()

If gdb is to be believed, the loadable program headers that are supposed to cover ".dynstr" and ".dynamic" are wrong.

LD_DEBUG also suggests that ld.so manages to load very little before crashing:

$ LD_DEBUG=all emacs
   2608983:     symbol=__vdso_clock_gettime;  lookup in file=linux-vdso.so.1 [0]
   2608983:     binding file linux-vdso.so.1 [0] to linux-vdso.so.1 [0]: normal symbol `__vdso_clock_gettime' [LINUX_2.6]
   2608983:     symbol=__vdso_gettimeofday;  lookup in file=linux-vdso.so.1 [0]
   2608983:     binding file linux-vdso.so.1 [0] to linux-vdso.so.1 [0]: normal symbol `__vdso_gettimeofday' [LINUX_2.6]
   2608983:     symbol=__vdso_time;  lookup in file=linux-vdso.so.1 [0]
   2608983:     binding file linux-vdso.so.1 [0] to linux-vdso.so.1 [0]: normal symbol `__vdso_time' [LINUX_2.6]
   2608983:     symbol=__vdso_getcpu;  lookup in file=linux-vdso.so.1 [0]
   2608983:     binding file linux-vdso.so.1 [0] to linux-vdso.so.1 [0]: normal symbol `__vdso_getcpu' [LINUX_2.6]
   2608983:     symbol=__vdso_clock_getres;  lookup in file=linux-vdso.so.1 [0]
   2608983:     binding file linux-vdso.so.1 [0] to linux-vdso.so.1 [0]: normal symbol `__vdso_clock_getres' [LINUX_2.6]
Segmentation fault (core dumped)

eu-elflint is also unhappy:

$ eu-elflint /nix/store/lzahvwakhghr8b3ri40s935bwhn7nf0x-emacs-28.2/bin/emacs
section [ 7] '.dynstr' not fully contained in segment of program header entry 2
section [ 8] '.dynamic': alloc flag set but section not in any loaded segment
section [29] '.symtab': symbol 1 (__abi_tag): st_value out of bounds
section [29] '.symtab': _GLOBAL_OFFSET_TABLE_ symbol size 0 does not match .got section size 9736
section [29] '.symtab': symbol 5599 (_DYNAMIC): st_value out of bounds
section [29] '.symtab': _DYNAMIC_ symbol value 0x6ab560 does not match dynamic segment address 0x40ee60
section [29] '.symtab': _DYNAMIC symbol size 0 does not match dynamic segment size 1184
section [29] '.symtab': symbol 6650 (__bss_start): st_value out of bounds
loadable segment [2] is writable but contains no writable sections

@trofi added the bug label Mar 18, 2023
@brenoguim self-assigned this Mar 18, 2023
@brenoguim (Collaborator)

Thanks for reporting the issue.
I'll investigate when I get home.

I think I'll be able to reproduce the issue easily using Nix, but attaching the output of readelf -a -W for the binary always helps.

@mweinelt (Member)

@trofi (Author) commented Mar 18, 2023

Using emacs as an example, I bisected patchelf down to 42394e8 "write out replace sections in original order".

Its gist is a change of traversal from

    for (auto & i : replacedSections) {
        const std::string & sectionName = i.first;
        auto & shdr = findSectionHeader(sectionName);

to

    /* We iterate over the sorted section headers here, so that the relative
       position between replaced sections stays the same.  */
    for (auto & shdr : shdrs) {
        std::string sectionName = getSectionName(shdr);
        auto i = replacedSections.find(sectionName);
        if (i == replacedSections.end())
            continue;

I suspect it may miss newly added sections, if patchelf ever adds any, though maybe not in the emacs case: readelf (output attached below, "before" and "after") says both files have 31 sections. But I'm not sure I believe it.

Attaching readelf -a -W:

It looks like one of the program headers got lost (or merged into an existing one):

diff -u readelf-aw-good.txt  readelf-aw-bad.txt | cat
--- readelf-aw-good.txt 2023-03-18 18:30:42.604009844 +0000
+++ readelf-aw-bad.txt  2023-03-18 18:31:55.812435119 +0000
@@ -10,11 +10,11 @@
   Version:                           0x1
   Entry point address:               0x427ea0
   Start of program headers:          64 (bytes into file)
-  Start of section headers:          6757680 (bytes into file)
+  Start of section headers:          6753584 (bytes into file)
   Flags:                             0x0
   Size of this header:               64 (bytes)
   Size of program headers:           56 (bytes)
-  Number of program headers:         15
+  Number of program headers:         14

@brenoguim (Collaborator)

I have debugged something like this before. It had to do with LOAD segments overlapping once they are rounded to page boundaries.
Thanks for this info!

@brenoguim (Collaborator)

I think this is the same scenario I saw in #446.

We can see the following LOAD segments with different read/write permissions:

  LOAD           0x000000 0x00000000003ff000 0x00000000003ff000 0x01057c 0x01057c RW  0x1000
  LOAD           0x01057c 0x000000000040f57c 0x000000000040f57c 0x0085cc 0x0085cc R   0x1000

The first one goes from 0x3ff000 to 0x40f57c
And the second goes from 0x40f57c to 0x417b48

Notice that the alignment is 0x1000, so the OS will have to map the pages:

  1. 0x3ff000 to 0x410000
  2. 0x40f000 to 0x418000

The two ranges overlap: the page at 0x40f000 is claimed by both segments, which have different permissions (RW vs. R). So the fix in #469 should hopefully resolve this issue.
I'll try the "master" commit tomorrow and see if it fixes the issue.

@trofi (Author) commented Mar 19, 2023

I tried master as well and it did not fix the issue for me. Tested as:

--- a/pkgs/applications/editors/emacs/generic.nix
+++ b/pkgs/applications/editors/emacs/generic.nix
@@ -46,6 +46,7 @@
   else "lucid")
 , withSystemd ? lib.meta.availableOn stdenv.hostPlatform systemd, systemd
 , withTreeSitter ? lib.versionAtLeast version "29", tree-sitter ? null
+, patchelfUnstable
 }:
 
 assert (libXft != null) -> libpng != null;      # probably a bug
@@ -135,7 +136,7 @@ assert withTreeSitter -> tree-sitter != null;
     ""
   ];
 
-  nativeBuildInputs = [ pkg-config makeWrapper ]
+  nativeBuildInputs = [ pkg-config makeWrapper patchelfUnstable ]
     ++ lib.optionals (srcRepo || withMacport) [ texinfo ]
     ++ lib.optionals srcRepo [ autoreconfHook ]
     ++ lib.optional (withX && (withGTK3 || withXwidgets)) wrapGAppsHook;
diff --git a/pkgs/development/tools/misc/patchelf/unstable.nix b/pkgs/development/tools/misc/patchelf/unstable.nix
index 66c14bd07e0..3f20cb7834f 100644
--- a/pkgs/development/tools/misc/patchelf/unstable.nix
+++ b/pkgs/development/tools/misc/patchelf/unstable.nix
@@ -2,13 +2,13 @@
 
 stdenv.mkDerivation rec {
   pname = "patchelf";
-  version = "unstable-2023-03-07";
+  version = "unstable-2023-03-18";
 
   src = fetchFromGitHub {
     owner = "NixOS";
     repo = "patchelf";
-    rev = "ea2fca765c440fff1ff74e1463444dea7b819db2";
-    sha256 = "sha256-IH80NcLhwjGpIXEjHuV+NgaSC+Y/PXquxZ/C8Bl+CLk=";
+    rev = "265b31ae22c6e1d20b01295aaa7bcf28fd31a5cf";
+    sha256 = "sha256-+iGvdjXvhk5mN8jp3u+M9fICKFqbtyZCx+WjQszaB1o=";
   };
 
   # Drop test that fails on musl (?)

@trofi (Author) commented Mar 19, 2023

NixOS/nixpkgs#221900 was merged into staging-next to fix emacs. You might need to revert the change locally to reproduce it on staging-next.

@brenoguim (Collaborator) commented Mar 19, 2023

Weird, I can't reproduce the crash using the commit before the merge.

# before merge
nix run https://github.com/NixOS/nixpkgs/archive/6c70dbc.gz#emacs

# after merge
nix run https://github.com/NixOS/nixpkgs/archive/ce7e136.gz#emacs

(I'm new to Nix, so I might be doing something wrong)

I can reproduce the messages from gdb and eu-elflint in both hashes.

@brenoguim (Collaborator)

I reverted it locally and now I can reproduce it. Not sure what the difference is, but let me get to it.

@trofi (Author) commented Mar 19, 2023

This commit fails for me (it's the one directly preceding the patchelf-0.15.0 pin):

$ nix run https://github.com/NixOS/nixpkgs/archive/403b148aa51073bc343febbbfd041ecd495dbe3e.tar.gz#emacs
Segmentation fault (core dumped)

This should allow extracting the exact binary:

$ nix build https://github.com/NixOS/nixpkgs/archive/403b148aa51073bc343febbbfd041ecd495dbe3e.tar.gz#emacs
$ result/bin/emacs
Segmentation fault (core dumped)

@brenoguim (Collaborator) commented Mar 19, 2023

Notes so far:

  1. I generated an unpatched emacs by removing the patchelf invocation in emacs/generic.nix, and I'm running the patchelf steps manually. This seems to be a good way to debug these things.
  2. Patching emacs with the new layout engine from "Implement an alternative layout engine" (#468, #477) generates good binaries.
  3. emacs is patched to change the rpath and then to add a needed .so. The issue happens only after the "add needed" step.

Still investigating. It looks like the string table (.dynstr) is falling outside the LOAD segment that is supposed to map it into memory.

@brenoguim (Collaborator) commented Mar 19, 2023

I'm quite sure what's missing is an else branch for this if:

if (neededSpace > startOffset) {

When we enter that if, we split a LOAD segment in two: one that loads the replaced sections and one that keeps loading the sections that stay in place.
However, in the "else" case, when there is enough space before the first non-replaced section, we don't check whether the LOAD segment is large enough to map all the rewritten sections, so some of them may end up dangling outside of it.

It's a bit hard, though, to nail down the exact fix.

@brenoguim (Collaborator)

#485

@vcunat (Member) commented Mar 20, 2023

I think we should revert the patchelf default on staging-next for now, or switch to any other reliable version. I confirmed that ldc is broken by it as well, and I've seen some other build regressions that look like they are caused by it. Spraying weird failures all over nixpkgs is just bad, and I fear not all of them will even show up on Hydra (and I'm not counting use cases outside the official repo).

@vcunat (Member) commented Mar 20, 2023

I wonder if you'd want to get a jobset on Hydra to verify a full nixpkgs rebuild before a (stable?) patchelf release is made. (or possibly even on a PR/branch if it's considered risky) This certainly isn't the first time we had to revert the default, e.g. I found NixOS/nixpkgs#69213

@brenoguim (Collaborator) commented Mar 20, 2023

I wonder if you'd want to get a jobset on Hydra to verify a full nixpkgs rebuild before a (stable?) patchelf release is made. (or possibly even on a PR/branch if it's considered risky)

That would be lovely. I can see that patchelf has accumulated fixes without associated tests. It's a bit painful to recover from that, and being able to verify all Nix packages would certainly help a lot. Perhaps we should add an "ldd" test after patching anything, because it catches several issues.

bors bot added a commit that referenced this issue Apr 23, 2023:
485: Resize segment mapping rewritten sections if needed #482 r=Mic92 a=brenoguim

Co-authored-by: Breno Rodrigues Guimaraes <brenorg@gmail.com>

@Mic92 (Member) commented Apr 24, 2023

I have now bumped patchelf to 0.18.0 in a branch based on nixpkgs staging, and my emacs no longer seems to be corrupted.

@Patryk27 (Member)

pcloud seems to be affected by this as well. Curiously, when built with patchelfUnstable, it crashes inside ld-linux-x86-64.so.2!

That is, applying:

diff --git a/pkgs/applications/networking/pcloud/default.nix b/pkgs/applications/networking/pcloud/default.nix
index 403d1e0cf34..93e9eb9b1d1 100644
--- a/pkgs/applications/networking/pcloud/default.nix
+++ b/pkgs/applications/networking/pcloud/default.nix
@@ -34,6 +34,7 @@
 , libXdamage
 , nss
 , udev
+, patchelfUnstable
 }:
 
 let
@@ -62,6 +63,7 @@ stdenv.mkDerivation {
 
   nativeBuildInputs = [
     autoPatchelfHook
+    patchelfUnstable
   ];
 
   buildInputs = [

... and then:

$ NIXPKGS_ALLOW_UNFREE=1 nix build --impure .#pcloud
$ gdb --args bash ./result/bin/pcloud
(gdb) b main
Breakpoint 1 at 0x31340
(gdb) r
Breakpoint 1, 0x0000555555585340 in main ()
(gdb) c

... results in:

warning: Loadable section ".interp" outside of ELF segments
  in /nix/store/5iarv8nx5i0q30d879i9v6wkbapxjvqb-pcloud-1.12.0/app/pcloud
warning: Loadable section ".note.ABI-tag" outside of ELF segments
  in /nix/store/5iarv8nx5i0q30d879i9v6wkbapxjvqb-pcloud-1.12.0/app/pcloud
warning: Loadable section ".dynstr" outside of ELF segments
  in /nix/store/5iarv8nx5i0q30d879i9v6wkbapxjvqb-pcloud-1.12.0/app/pcloud

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7fece58 in strcmp () from /nix/store/yaz7pyf0ah88g2v505l38n0f3wg2vzdj-glibc-2.37-8/lib/ld-linux-x86-64.so.2
(gdb) bt
#0  0x00007ffff7fece58 in strcmp () from /nix/store/yaz7pyf0ah88g2v505l38n0f3wg2vzdj-glibc-2.37-8/lib/ld-linux-x86-64.so.2
#1  0x00007ffff7fdcaa4 in _dl_check_map_versions () from /nix/store/yaz7pyf0ah88g2v505l38n0f3wg2vzdj-glibc-2.37-8/lib/ld-linux-x86-64.so.2
#2  0x00007ffff7fdd0f0 in _dl_check_all_versions () from /nix/store/yaz7pyf0ah88g2v505l38n0f3wg2vzdj-glibc-2.37-8/lib/ld-linux-x86-64.so.2
#3  0x00007ffff7fe54bc in version_check_doit () from /nix/store/yaz7pyf0ah88g2v505l38n0f3wg2vzdj-glibc-2.37-8/lib/ld-linux-x86-64.so.2
#4  0x00007ffff7fcb3fb in _dl_receive_error () from /nix/store/yaz7pyf0ah88g2v505l38n0f3wg2vzdj-glibc-2.37-8/lib/ld-linux-x86-64.so.2
#5  0x00007ffff7fe7ae9 in dl_main () from /nix/store/yaz7pyf0ah88g2v505l38n0f3wg2vzdj-glibc-2.37-8/lib/ld-linux-x86-64.so.2
#6  0x00007ffff7fe4483 in _dl_sysdep_start () from /nix/store/yaz7pyf0ah88g2v505l38n0f3wg2vzdj-glibc-2.37-8/lib/ld-linux-x86-64.so.2
#7  0x00007ffff7fe5bac in _dl_start () from /nix/store/yaz7pyf0ah88g2v505l38n0f3wg2vzdj-glibc-2.37-8/lib/ld-linux-x86-64.so.2
#8  0x00007ffff7fe4a58 in _start () from /nix/store/yaz7pyf0ah88g2v505l38n0f3wg2vzdj-glibc-2.37-8/lib/ld-linux-x86-64.so.2
#9  0x0000000000000001 in ?? ()
#10 0x00007fffffffe184 in ?? ()
#11 0x0000000000000000 in ?? ()

eu-elflint also seems to have an issue with one of the files there:

$ eu-elflint /nix/store/5iarv8nx5i0q30d879i9v6wkbapxjvqb-pcloud-1.12.0/app/libnode.so
section [ 2] '.dynsym': symbol 594 (_ZTSN6icu_6025CollationFastLatinBuilderE): st_value out of bounds
section [ 2] '.dynsym': symbol 603 (_ZTSN6icu_608CalendarE): st_value out of bounds
section [ 2] '.dynsym': symbol 608 (_ZTSN6icu_609LocaleKeyE): st_value out of bounds
section [ 2] '.dynsym': symbol 694 (_ZN6icu_6011StringPiece4nposE): st_value out of bounds
section [ 2] '.dynsym': symbol 704 (_ZN6icu_6016CollationBuilder11HAS_BEFORE2E): st_value out of bounds
section [ 2] '.dynsym': symbol 705 (_ZN6icu_609Collation20LEVEL_SEPARATOR_BYTEE): st_value out of bounds
section [ 2] '.dynsym': symbol 714 (_ZTSN6icu_6013ResourceValueE): st_value out of bounds
section [ 2] '.dynsym': symbol 734 (_ZN2v88internal17GCIdleTimeHandler22kConservativeTimeRatioE): st_value out of bounds
section [ 2] '.dynsym': symbol 751 (_ZTSN6icu_6010GenderInfoE): st_value out of bounds
section [ 2] '.dynsym': symbol 769 (_ZTSN6icu_6022UIterCollationIteratorE): st_value out of bounds
section [ 2] '.dynsym': symbol 828 (_ZTSN6icu_6014HebrewCalendarE): st_value out of bounds
section [ 2] '.dynsym': symbol 840 (_ZN6icu_6016CollationBuilder11HAS_BEFORE3E): st_value out of bounds
section [ 2] '.dynsym': symbol 935 (_ZTSN6icu_6014SimpleTimeZoneE): st_value out of bounds
section [ 2] '.dynsym': symbol 959 (_ZN6icu_6018CalendarAstronomer2PIE): st_value out of bounds
section [ 2] '.dynsym': symbol 989 (_ZN2v88internal11interpreter20ConstantArrayBuilder14k16BitCapacityE): st_value out of bounds
(+ like 100 more of those)

This issue seems to exist in all patchelf versions currently available in nixpkgs: patchelf 0.13, 0.15 and unstable-2023-04-25 all generate invalid libnode.so files. It seems to have been somewhat exacerbated by NixOS/nixpkgs#209870, since that change additionally links libgcc_s.so.1.

@Patryk27 (Member) commented May 21, 2023

FWIW, it looks like pcloud (x86_64-linux) got (more) broken by #469. Applying:

diff --git a/pkgs/applications/networking/pcloud/default.nix b/pkgs/applications/networking/pcloud/default.nix
index 403d1e0cf34..93e9eb9b1d1 100644
--- a/pkgs/applications/networking/pcloud/default.nix
+++ b/pkgs/applications/networking/pcloud/default.nix
@@ -34,6 +34,7 @@
 , libXdamage
 , nss
 , udev
+, patchelfUnstable
 }:
 
 let
@@ -62,6 +63,7 @@ stdenv.mkDerivation {
 
   nativeBuildInputs = [
     autoPatchelfHook
+    patchelfUnstable
   ];
 
   buildInputs = [
diff --git a/pkgs/development/tools/misc/patchelf/unstable.nix b/pkgs/development/tools/misc/patchelf/unstable.nix
index 7d340cf547b..987f6bb8860 100644
--- a/pkgs/development/tools/misc/patchelf/unstable.nix
+++ b/pkgs/development/tools/misc/patchelf/unstable.nix
@@ -2,13 +2,13 @@
 
 stdenv.mkDerivation rec {
   pname = "patchelf";
-  version = "unstable-2023-04-25";
+  version = "unstable";
 
   src = fetchFromGitHub {
     owner = "NixOS";
     repo = "patchelf";
-    rev = "008a582741617e2d7d5aa4aab1e8ddfdec0067d9";
-    sha256 = "sha256-SC9zZbHN1p5BD6YHr+/ZNelmmZDozEO/vDwuCdJJCcs=";
+    rev = "27cbc89d4830d5ae1fe3a2396f2a6042266895bc";
+    sha256 = "sha256-FxwKznM/xcYZAmeKMAKYA2qkED4Zfayr62R7cg8AORA=";
   };
 
   # Drop test that fails on musl (?)

... generates a file that crashes inside ld-linux-x86-64.so.2 (as I mentioned above), but going back a single commit:

diff --git a/pkgs/applications/networking/pcloud/default.nix b/pkgs/applications/networking/pcloud/default.nix
index 403d1e0cf34..93e9eb9b1d1 100644
--- a/pkgs/applications/networking/pcloud/default.nix
+++ b/pkgs/applications/networking/pcloud/default.nix
@@ -34,6 +34,7 @@
 , libXdamage
 , nss
 , udev
+, patchelfUnstable
 }:
 
 let
@@ -62,6 +63,7 @@ stdenv.mkDerivation {
 
   nativeBuildInputs = [
     autoPatchelfHook
+    patchelfUnstable
   ];
 
   buildInputs = [
diff --git a/pkgs/development/tools/misc/patchelf/unstable.nix b/pkgs/development/tools/misc/patchelf/unstable.nix
index 7d340cf547b..cd986b539a4 100644
--- a/pkgs/development/tools/misc/patchelf/unstable.nix
+++ b/pkgs/development/tools/misc/patchelf/unstable.nix
@@ -2,13 +2,13 @@
 
 stdenv.mkDerivation rec {
   pname = "patchelf";
-  version = "unstable-2023-04-25";
+  version = "unstable";
 
   src = fetchFromGitHub {
     owner = "NixOS";
     repo = "patchelf";
-    rev = "008a582741617e2d7d5aa4aab1e8ddfdec0067d9";
-    sha256 = "sha256-SC9zZbHN1p5BD6YHr+/ZNelmmZDozEO/vDwuCdJJCcs=";
+    rev = "ac212d0e6fb8b741e5a5e9ea61091149103f401c";
+    sha256 = "sha256-JtobCiZEl3KeXT5CAhXTRhjAPgTVx2upVAUTJNCb/a0=";
   };
 
   # Drop test that fails on musl (?)

... yields a binary/library that can at least be loaded. Some symbols in libnode.so still seem to be linked incorrectly, but at least it doesn't crash ld.so 😄 (NixOS/nixpkgs#226339 (comment)).

Edit: also, in all the invalid cases libnode.so has funky memory mappings:

      0x7ffff6600000     0x7ffff6dc8000   0x7c8000        0x0  r--p   /nix/store/5iarv8nx5i0q30d879i9v6wkbapxjvqb-pcloud-1.12.0/app/libnode.so
      0x7ffff6dc8000     0x7ffff7a7d000   0xcb5000   0x7c8000  r-xp   /nix/store/5iarv8nx5i0q30d879i9v6wkbapxjvqb-pcloud-1.12.0/app/libnode.so
      0x7ffff7a7d000     0x7ffff7b2c000    0xaf000  0x147d000  rw-p   /nix/store/5iarv8nx5i0q30d879i9v6wkbapxjvqb-pcloud-1.12.0/app/libnode.so
      0x7ffff7b2c000     0x7ffff7b43000    0x17000        0x0  rw-p   
      0x7ffff7b43000     0x7ffff7fbd000   0x47a000  0x152d000  rw-p   /nix/store/5iarv8nx5i0q30d879i9v6wkbapxjvqb-pcloud-1.12.0/app/libnode.so

... where the fourth, uhm, part (segment? I'm not sure about the terminology) has an offset of zero and no file name. I don't know much about ELF files or how they get mapped into RAM, but it feels sus.

For comparison, here's a correct libnode.so (taken from pcloud built from nixpkgs:fdd49f1bcd8a7f0b5e29f550d698b2abe5c540cd):

      0x7ffff6a00000     0x7ffff71c8000   0x7c8000        0x0  r--p   /nix/store/dp16s7cfwslam9rd5l0mkj9skrvy49aq-pcloud-1.12.0/app/libnode.so
      0x7ffff71c8000     0x7ffff7e7d000   0xcb5000   0x7c8000  r-xp   /nix/store/dp16s7cfwslam9rd5l0mkj9skrvy49aq-pcloud-1.12.0/app/libnode.so
      0x7ffff7e7d000     0x7ffff7f2c000    0xaf000  0x147d000  rw-p   /nix/store/dp16s7cfwslam9rd5l0mkj9skrvy49aq-pcloud-1.12.0/app/libnode.so

Edit 2: here's patchelf's log when building an invalid libnode.so:

searching for dependencies of /nix/store/rjw66ywwkd9r85559fnsiyw37idjhbxh-pcloud-1.12.0/app/libnode.so
    libgcc_s.so.1 -> found: /nix/store/5gk8zqasr9hdhm9nhl0y7g0g7bf5lvbc-gcc-12.2.0-libgcc/lib
setting RPATH to: /nix/store/5gk8zqasr9hdhm9nhl0y7g0g7bf5lvbc-gcc-12.2.0-libgcc/lib
patching ELF file '/nix/store/rjw66ywwkd9r85559fnsiyw37idjhbxh-pcloud-1.12.0/app/libnode.so'
new rpath is '/nix/store/5gk8zqasr9hdhm9nhl0y7g0g7bf5lvbc-gcc-12.2.0-libgcc/lib'
rpath is too long or shared, resizing...
DT_NULL index is 30
replacing section '.dynamic' with size 512
replacing section '.dynstr' with size 1046753
this is a dynamic library
last page is 0x1543000
first page is 0x0
needed space is 4690320
rewriting section '.rodata' from offset 0x240 (size 3643048) to offset 0x152d000 (size 3643048)
rewriting section '.dynstr' from offset 0x40ee54 (size 1046687) to offset 0x18a66a8 (size 1046753)
rewriting section '.dynamic' from offset 0x1528250 (size 496) to offset 0x19a5f90 (size 512)
rewriting symbol table section 2
writing /nix/store/rjw66ywwkd9r85559fnsiyw37idjhbxh-pcloud-1.12.0/app/libnode.so

The needed space here is kinda suspicious as well, considering how large it is 🤔

@attila-lendvai commented Sep 27, 2024

I think I'm seeing the same thing, or something very similar, on a recent Guix (i.e. patchelf 0.18.0).

I'm downloading the go-ethereum binary release and patching it with patchelf so that it runs on Guix (the package source is available here).

If I read the git log correctly, what broke it for me was a patchelf update from 0.11 to 0.18.0.

The moving parts are:

  • the patchelf update
  • go-ethereum also switched to golang 1.23.0 in the release that broke, I think.

$ gdb ./geth
GNU gdb (GDB) 14.2
Reading symbols from ./geth...
(No debugging symbols found in ./geth)
(gdb) r
Starting program: /tmp/guix-build-geth-binary-1.14.10.drv-0/geth 

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7fe7c0a in dl_main () from /gnu/store/zvlp3n8iwa1svxmwv4q22pv1pb1c9pjq-glibc-2.39/lib/ld-linux-x86-64.so.2
(gdb) back
#0  0x00007ffff7fe7c0a in dl_main () from /gnu/store/zvlp3n8iwa1svxmwv4q22pv1pb1c9pjq-glibc-2.39/lib/ld-linux-x86-64.so.2
#1  0x00007ffff7fe45ef in _dl_sysdep_start () from /gnu/store/zvlp3n8iwa1svxmwv4q22pv1pb1c9pjq-glibc-2.39/lib/ld-linux-x86-64.so.2
#2  0x00007ffff7fe5d9c in _dl_start () from /gnu/store/zvlp3n8iwa1svxmwv4q22pv1pb1c9pjq-glibc-2.39/lib/ld-linux-x86-64.so.2
#3  0x00007ffff7fe4ba8 in _start () from /gnu/store/zvlp3n8iwa1svxmwv4q22pv1pb1c9pjq-glibc-2.39/lib/ld-linux-x86-64.so.2
#4  0x0000000000000001 in ?? ()
#5  0x00007fffffffc2fb in ?? ()
#6  0x0000000000000000 in ?? ()
(gdb)

$ LD_DEBUG=all ./geth
     10265:	symbol=__vdso_clock_gettime;  lookup in file=linux-vdso.so.1 [0]
     10265:	binding file linux-vdso.so.1 [0] to linux-vdso.so.1 [0]: normal symbol `__vdso_clock_gettime' [LINUX_2.6]
     10265:	symbol=__vdso_gettimeofday;  lookup in file=linux-vdso.so.1 [0]
     10265:	binding file linux-vdso.so.1 [0] to linux-vdso.so.1 [0]: normal symbol `__vdso_gettimeofday' [LINUX_2.6]
     10265:	symbol=__vdso_time;  lookup in file=linux-vdso.so.1 [0]
     10265:	binding file linux-vdso.so.1 [0] to linux-vdso.so.1 [0]: normal symbol `__vdso_time' [LINUX_2.6]
     10265:	symbol=__vdso_getcpu;  lookup in file=linux-vdso.so.1 [0]
     10265:	binding file linux-vdso.so.1 [0] to linux-vdso.so.1 [0]: normal symbol `__vdso_getcpu' [LINUX_2.6]
     10265:	symbol=__vdso_clock_getres;  lookup in file=linux-vdso.so.1 [0]
     10265:	binding file linux-vdso.so.1 [0] to linux-vdso.so.1 [0]: normal symbol `__vdso_clock_getres' [LINUX_2.6]
Segmentation fault
$ 

@attila-lendvai

A nonguix issue that is probably related: https://gitlab.com/nonguix/nonguix/-/issues/350

nvidia-smi is also written in golang.

@Patryk27 (Member)

@attila-lendvai could you try using #544?

@attila-lendvai

@attila-lendvai could you try using #544?

Sadly, I don't notice any difference. It looks like it produces the same output for my binary.

But I've set myself up to experiment relatively easily with different patchelf versions, so let me know if I should try anything else!

@attila-lendvai

Someone at @nonguix identified v0.16.1 as the latest version that still works.

I tried go-ethereum with that version and can confirm that it works with v0.16.1.

@Mic92 closed this as completed in #544 on Jan 7, 2025