Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

release/18.x: [libcxx] Align __recommend() + 1 by __endian_factor (#90292) #95264

Merged
merged 2 commits into from
Jun 15, 2024

Conversation

llvmbot
Copy link
Collaborator

@llvmbot llvmbot commented Jun 12, 2024

Backport 382f70a d129ea8

Requested by: @DimitryAndric

@llvmbot llvmbot requested a review from a team as a code owner June 12, 2024 15:43
@llvmbot llvmbot added this to the LLVM 18.X Release milestone Jun 12, 2024
@llvmbot
Copy link
Collaborator Author

llvmbot commented Jun 12, 2024

@ldionne @philnik777 What do you think about merging this PR to the release branch?

@llvmbot llvmbot added the libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi. label Jun 12, 2024
@llvmbot
Copy link
Collaborator Author

llvmbot commented Jun 12, 2024

@llvm/pr-subscribers-libcxx

Author: None (llvmbot)

Changes

Backport 382f70a d129ea8

Requested by: @DimitryAndric


Full diff: https://github.com/llvm/llvm-project/pull/95264.diff

1 Files Affected:

  • (modified) libcxx/include/string (+3-3)
diff --git a/libcxx/include/string b/libcxx/include/string
index ba169c3dbfc9e..56e2ef09947f4 100644
--- a/libcxx/include/string
+++ b/libcxx/include/string
@@ -1943,10 +1943,10 @@ private:
     if (__s < __min_cap) {
       return static_cast<size_type>(__min_cap) - 1;
     }
-    size_type __guess =
-        __align_it < sizeof(value_type) < __alignment ? __alignment / sizeof(value_type) : 1 > (__s + 1) - 1;
+    const size_type __boundary = sizeof(value_type) < __alignment ? __alignment / sizeof(value_type) : __endian_factor;
+    size_type __guess          = __align_it<__boundary>(__s + 1) - 1;
     if (__guess == __min_cap)
-      ++__guess;
+      __guess += __endian_factor;
     return __guess;
   }
 

@ldionne
Copy link
Member

ldionne commented Jun 12, 2024

Approved per discussion in #90292.

@vitalybuka It would be amazing if you could run this PR through the same tests that made you find the bug that you fixed with #90292

@vitalybuka vitalybuka self-requested a review June 12, 2024 19:37
@vitalybuka
Copy link
Collaborator

@vitalybuka It would be amazing if you could run this PR through the same tests that made you find the bug that you fixed with #90292

Looking, I will approve after validation.

@vitalybuka
Copy link
Collaborator

release/18.x https://lab.llvm.org/buildbot/#/builders/168/builds/20916 OK
release/18.x+#95264 https://lab.llvm.org/buildbot/#/builders/168/builds/20917 OK

Problem is that release/18.x suppose to fail, but it does not fail, because sized new/delete is OFF there. So:

release/18.x+#90373 https://lab.llvm.org/buildbot/#/builders/168/builds/20919 BAD
release/18.x+#90373+#95264 https://lab.llvm.org/buildbot/#/builders/168/builds/20920 OK

LGTM

@DimitryAndric
Copy link
Collaborator

Do we need to merge this manually? GitHub's UI does not allow to merge the PR. :)

@mordante
Copy link
Member

Do we need to merge this manually? GitHub's UI does not allow to merge the PR. :)

I expect this requires by @tstellar (the release manager).

ldionne and others added 2 commits June 15, 2024 10:21
)

Previously, there was a ternary conditional with a less-than comparison
appearing inside a template argument, which was really confusing because
of the <...> of the function template. This patch rewrites the same
statement on two lines for clarity.

(cherry picked from commit 382f70a)
This is detected by asan after llvm#83774

Allocation size will be divided by `__endian_factor` before storing. If
it's not aligned,
we will not be able to recover allocation size to pass into
`__alloc_traits::deallocate`.

we have code like this
```
 auto __allocation = std::__allocate_at_least(__alloc(), __recommend(__sz) + 1);
    __p               = __allocation.ptr;
    __set_long_cap(__allocation.count);

void __set_long_cap(size_type __s) _NOEXCEPT {
    __r_.first().__l.__cap_     = __s / __endian_factor;
    __r_.first().__l.__is_long_ = true;
  }

size_type __get_long_cap() const _NOEXCEPT {
    return __r_.first().__l.__cap_ * __endian_factor;
  }

inline ~basic_string() {
    __annotate_delete();
    if (__is_long())
      __alloc_traits::deallocate(__alloc(), __get_long_pointer(), __get_long_cap());
  }
```
1. __recommend() -> even size
2. `std::__allocate_at_least(__alloc(), __recommend(__sz) + 1)` - > not
even size
3. ` __set_long_cap() `- > lose one bit of size for __endian_factor == 2
(see `/ __endian_factor`)
4. `__alloc_traits::deallocate(__alloc(), __get_long_pointer(),
__get_long_cap())` -> uses even size (see `__get_long_cap`)

(cherry picked from commit d129ea8)
@tstellar tstellar merged commit 3b5b5c1 into llvm:release/18.x Jun 15, 2024
39 of 46 checks passed
@tstellar
Copy link
Collaborator

Could someone write a release note for this?

ldionne added a commit to ldionne/llvm-project that referenced this pull request Jun 19, 2024
@ldionne
Copy link
Member

ldionne commented Jun 19, 2024

Could someone write a release note for this?

#96116

@tickerguy
Copy link

This appears to blow up compilation on the Pi3 for certain source files on FreeBSD 14.1 with a SEGV in 'AArch64O0PreLegalizerCombiner'.

Specifically I have filed this bug thread for FreeBSD: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=280038

I have bisected the source and reverting this commit resolves the crash. I have a reproducing source file that is only ~400 lines with it being nearly all OpenSSL calls -- and in an attempt to further isolate it, as the FreeBSD bugtracker will not take the reproducer file (too large) as that source file was already split off and very small I took individual functions out into their own source files -- and got it to dump on two of them but not consistently; often, if I simply re-type "make" it would complete the second time!

The blow-up only happens with this commit in, and only on the Pi3 -- and I have used both a very old one (no HAT header for POE, for example) and a brand new unit (has the header) with identical results, so a specific hardware fault is excluded. The same boot media inserted into a Pi4 and booted compiles the code without complaint. Further, the same code, on the same revision of OS and clang/llvm but built and running on an AMD64 system with the commit in also builds without complaint.

The reproducer files are still too large to upload into FreeBSD's bugtracker, but since I now have a routine that is ~100 lines that does reproduce it, I simply posted that function's source.

@ldionne
Copy link
Member

ldionne commented Jul 2, 2024

@tickerguy Thanks for the heads up.

@tstellar I read the thread in the linked FreeBSD tracker up to 2024-07-02 18:59:18 UTC, and based on what I am seeing they are still investigating the cause of the issue. I don't think we have grounds to do anything at this point: we know for certain we fixed a bug with this patch, and while there is a report of something fairly obscure happening on a single architecture on the FreeBSD tracker, at this point it seems more likely that we exposed an existing bug than anything else. Depending on how that thread develops we might learn new information that would change that, but not at this point (IMO).

@tickerguy
Copy link

tickerguy commented Jul 2, 2024

Its definitely architecture-specific. The media in question where I first saw this will boot and run on both a Pi3 and Pi4 but on the "4" it never blows up, whether via the original "make" or if I execute the reproducer shell script after it blows on the "3", I shut it down, stick the same card in the "4", boot it and then run the reproducer script. If I take the reproducer over to my much-larger AMD64 build system it completes every time as well.

On the "3" it does blow up, and not just there -- although its intermittent now that I've broken out the routines in the one file that was crashing reliably into three. This started when those routines, all of which are heavy with OpenSSL function calls (and little else; they perform AES encryption, decryption, and MAC computation) were part of a much larger file and it failed reliably in those routines; I split them out in an attempt to isolate the problem. Obviously separating a function call from one file with a lot of functions into its own separate ".c" file shouldn't change anything but now it crashes a good part of the time - but not every time - in that if I run the reproducer script it will will crash perhaps half or 3/4 of the time -- but the other times completes and produces the expected .o file. That its intermittent is even worse, obviously, since you'd expect a compilation of the same file on the same CPU to act in a reasonably-deterministic fashion.

@tstellar
Copy link
Collaborator

tstellar commented Jul 2, 2024

@tickerguy From the bugzilla thread, do I understand correctly that the last good version of clang was 18.1.6 ?

@tickerguy
Copy link

tickerguy commented Jul 2, 2024

Reverting the one commit (three lines) fixes it on the Pi3 and does not break it on any of the other architectures I have. releng/14.1 does not have that commit in it, which is why I never saw it until I built on stable/14, which does. I thought it was elsewhere as there are other commits to llvm but bisection narrowed the blow-up down to this which is why I commented on this thread here specifically.

Of note I built the stable/14 media for the Pis on my AMD64 system that is running stable/14 and thus has this commit in it -- and it has no negative impact on that architecture (and of course a full OS build does a lot of compiling) which reinforces that the problem is architecture-specific.

root@NewFS:/usr/src.14-STABLE` # git diff 55c5dad2f305f74d1ff5ca85c453635511aab9b2~ 55c5dad2f305f74d1ff5ca85c453635511aab9b2
diff --git a/contrib/llvm-project/libcxx/include/string b/contrib/llvm-project/libcxx/include/string
index ba169c3dbfc9..56e2ef09947f 100644
--- a/contrib/llvm-project/libcxx/include/string
+++ b/contrib/llvm-project/libcxx/include/string
@@ -1943,10 +1943,10 @@ private:
     if (__s < __min_cap) {
       return static_cast<size_type>(__min_cap) - 1;
     }
-    size_type __guess =
-        __align_it < sizeof(value_type) < __alignment ? __alignment / sizeof(value_type) : 1 > (__s + 1) - 1;
+    const size_type __boundary = sizeof(value_type) < __alignment ? __alignment / sizeof(value_type) : __endian_factor;
+    size_type __guess          = __align_it<__boundary>(__s + 1) - 1;
     if (__guess == __min_cap)
-      ++__guess;
+      __guess += __endian_factor;
     return __guess;
   }

With that out it does not blow up but when it comes to why I lack sufficient understanding of the impacted code to know

@tickerguy
Copy link

My thread over on FreeBSD's bugtracker has more context after plenty of additional attempts to isolate this; specifically, simply changing whether or not assertions are enabled in llvm and a couple other places, or turning on debug file building,, even though such output files are not then in use when compilation is done, makes it disappear.

This looks like a rather obscure problem somewhere related specifically to the Pi3's CPU architecture (likely not the llvm project itself) but without being able to include symbols and such (I can get a backtrace from lldb when it happens but without symbols its machine code without much context) figuring out exactly why it occurs is looking rather problematic.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi.
Projects
Development

Successfully merging this pull request may close these issues.

7 participants