Allow for non-ASCII decoding in legacy demangling #65

addisoncrump · 2023-09-28T01:32:59Z

There are some rare situations (I'm currently in one) where one needs non-ASCII demangling. This change implements non-ASCII support for legacy demangling.
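Here is a minimal sketch of why byte-oriented parsing can handle non-ASCII identifiers. In a legacy symbol, each path segment is `<len><ident>`. If `<len>` counts UTF-8 bytes (an assumption made for this illustration), a segment such as `4🐇` decodes cleanly. `parse_legacy_path` is a hypothetical helper that ignores hashes, `$..$` escapes and suffixes. It is not rustc-demangle's parser.

```rust
/// Hypothetical, simplified parser for the path of a legacy symbol:
/// `_ZN` followed by `<len><ident>` pairs and a closing `E`.
/// Assumption: `<len>` counts UTF-8 bytes, so non-ASCII identifiers work.
/// Hashes, `$..$` escapes and `.llvm.` suffixes are deliberately ignored.
fn parse_legacy_path(sym: &str) -> Option<Vec<&str>> {
    let mut rest = sym.strip_prefix("_ZN")?;
    let mut parts = Vec::new();
    loop {
        if let Some(after) = rest.strip_prefix('E') {
            // Accept only if at least one segment was read and nothing trails `E`.
            return if after.is_empty() && !parts.is_empty() {
                Some(parts)
            } else {
                None
            };
        }
        // Decimal length prefix (ASCII digits, so slicing at `digits` is safe).
        let digits = rest.bytes().take_while(|b| b.is_ascii_digit()).count();
        if digits == 0 {
            return None;
        }
        let len: usize = rest[..digits].parse().ok()?;
        rest = &rest[digits..];
        // `get` yields None if `len` runs past the end or splits a UTF-8 character.
        let ident = rest.get(..len)?;
        parts.push(ident);
        rest = &rest[len..];
    }
}
```

For example, `parse_legacy_path("_ZN4🐇3fooE")` yields `["🐇", "foo"]`. By contrast, `_ZN3🐇E` is rejected, because a length of 3 would split the 4-byte rabbit.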

bjorn3 · 2023-09-28T07:41:42Z

🐇 is not a valid character inside symbols. Some linkers will reject it, and some tools besides rustc-demangle don't handle it either. rustc always encodes such characters into something like $81$$81$$81$ (this is not the actual encoding, just an illustration).
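To make this concrete: my understanding is that rustc's legacy mangler rewrites characters that assemblers don't accept as `$u<hex>$`, so 🐇 (U+1F407) would become `$u1f407$`. Treat that exact format as an assumption. Both helpers below are hypothetical and omit the other legacy escapes such as `$LT$`.

```rust
/// Hypothetical sketch of a legacy-style escape: characters outside the set
/// that assemblers accept in symbols (`a-z`, `A-Z`, `0-9`, `_`, `.`, `$`)
/// become `$u<lowercase hex code point>$`. Other escapes (e.g. `$LT$`) are omitted.
fn escape_symbol_chars(ident: &str) -> String {
    let mut out = String::with_capacity(ident.len());
    for c in ident.chars() {
        if c.is_ascii_alphanumeric() || matches!(c, '_' | '.' | '$') {
            out.push(c);
        } else {
            // e.g. U+1F407 -> "$u1f407$"
            out.push_str(&format!("$u{:x}$", c as u32));
        }
    }
    out
}

/// Inverse of a single `$u<hex>$` escape; returns None for anything else.
fn unescape_u(esc: &str) -> Option<char> {
    let hex = esc.strip_prefix("$u")?.strip_suffix('$')?;
    char::from_u32(u32::from_str_radix(hex, 16).ok()?)
}
```

Under this scheme, the mangled name contains only ASCII, and the original character is recovered only by a demangler that understands the escape.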

addisoncrump · 2023-09-28T19:43:48Z

From what I've observed, it is handled inconsistently: c++filt and llvm-cxxfilt don't demangle it, but GDB and LLDB do. The cost of this change is effectively negligible¹, and it adds no real maintenance burden (it is very small). Given that, I don't see why we shouldn't support the broadest set of inputs here.

In my case, this is an intentional post-processing redefinition that flags the symbols in a visually recognisable way. I've tested with mold, lld, and ld, and each accepts these symbols. (It would be surprising if they could not. The symbol names are effectively copied verbatim at link time, and linkers must already handle even stranger cases, such as linkage with invalid UTF-8.)

Here's a screenshot from GDB with an example from one of the binaries built using our library:

¹It might actually be more performant, since we no longer scan the whole string ahead of time. In addition, the existing check does not account for special ASCII characters like CR, TAB, etc. anyway.
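An up-front check of the kind the footnote describes might look like the following. Its exact form in rustc-demangle is an assumption here, and the function name is hypothetical. It needs a full pass over the string before any parsing begins. It also accepts control characters such as TAB and CR, because they are ASCII.

```rust
/// Assumed shape of an up-front "ASCII only" guard: one full pass over the
/// input that rejects any byte with the high bit set (i.e. non-ASCII UTF-8).
fn upfront_ascii_guard(s: &str) -> bool {
    !s.bytes().any(|b| (b & 0x80) != 0)
}
```

This guard is equivalent to `str::is_ascii`, so it rejects 🐇 while letting `\t` and `\r` through.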

addisoncrump · 2023-09-28T21:32:57Z

Unfortunately, this PR is incomplete, and perhaps more invasive than I originally thought. The existing strategy in demangle_line aggressively filters input based on assumptions about its shape. I am refactoring this now.
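This sketch illustrates the kind of shape filter involved. As I understand the pre-PR code, a candidate symbol ends at the first character that is not `$`, `.`, `_` or ASCII alphanumeric, so a non-ASCII character cuts the symbol short. The permissive predicate is a hypothetical alternative for comparison, not what this PR implements. All function names are illustrative.

```rust
/// Byte length of the symbol-like run at the start of `s`,
/// i.e. the index of the first character rejected by `allowed`.
fn symbol_end(s: &str, allowed: impl Fn(char) -> bool) -> usize {
    s.find(|c: char| !allowed(c)).unwrap_or(s.len())
}

/// ASCII-only shape filter: `$`, `.`, `_` and ASCII alphanumerics.
fn ascii_symbol_char(c: char) -> bool {
    c == '$' || c == '.' || c == '_' || c.is_ascii_alphanumeric()
}

/// Hypothetical permissive alternative: anything except whitespace.
fn permissive_symbol_char(c: char) -> bool {
    !c.is_whitespace()
}
```

On the line `_ZN4🐇3fooE rest`, the ASCII filter stops at `_ZN4`, while the permissive one keeps the whole symbol `_ZN4🐇3fooE`.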

addisoncrump · 2023-09-28T22:36:04Z

src/lib.rs

-            s = &s[..i];
+            thinlto_stripped = &s[..i];


Note a subtle change here. Previously, this stripped the ThinLTO data from s before assigning s to original, so symbol data could be removed before it was stored in original. This seemed erroneous (e.g. if style is None, I would expect original to be unchanged). When the symbol was left unchanged, it would also truncate the rest of the line, even though the remainder must be kept. I can restore the original behaviour if this change is unwanted.
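The point of the rename is that the stripped view and the original string become separate bindings. The sketch below assumes the suffix check accepts only uppercase hex digits and `@` after `.llvm.`. That is my reading of rustc-demangle and has not been verified against this PR. `strip_thinlto_suffix` is a hypothetical name.

```rust
/// Hypothetical helper: if `s` contains `.llvm.` followed only by `A-F`,
/// `0-9` or `@`, return the part before it; otherwise return `s` unchanged.
/// It returns a sub-slice, so the caller's original string is never modified.
fn strip_thinlto_suffix(s: &str) -> &str {
    const LLVM: &str = ".llvm.";
    if let Some(i) = s.find(LLVM) {
        let candidate = &s[i + LLVM.len()..];
        if candidate.chars().all(|c| matches!(c, 'A'..='F' | '0'..='9' | '@')) {
            return &s[..i];
        }
    }
    s
}
```

With `let thinlto_stripped = strip_thinlto_suffix(original);`, demangling can proceed on `thinlto_stripped` while `original` still holds the full text for the unchanged or remainder case.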

addisoncrump · 2023-09-28T22:38:09Z

src/lib.rs

-            output.write_all(mangled.as_bytes())?;
+            // there are maybe valid symbols inside this fake one
+            output.write_all(&line.as_bytes()[..1])?;
+            line = &line[1..];


Note again a subtle difference: this can now identify and write symbols inside what appear to be other symbols.

bjorn3 · 2023-10-02

Wouldn't that accidentally demangle parts of C++ symbols as Rust symbols? If so, that would break C++ demangling for programs that try Rust demangling first and then C++ demangling to handle both. Trying C++ demangling first is guaranteed to work for legacy Rust symbols, but doesn't demangle the $LT$, $C$, ... parts.
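The following illustrates the concern. Take the C++ symbol `_ZN11MY_RESOURCE3getEv`, an illustrative example not taken from the PR. A scanner that retries at every possible start would also try `_RESOURCE3getEv` as a v0 symbol. The hypothetical helper below only lists those start offsets. Whether a given candidate actually demangles depends on the demangler.

```rust
/// Hypothetical helper: every byte offset in `line` where a Rust symbol could
/// begin (`_ZN` for legacy, `_R` for v0). Retrying at each of these, including
/// offsets inside an earlier candidate, is what finds "symbols inside symbols".
fn candidate_offsets(line: &str) -> Vec<usize> {
    let mut offsets: Vec<usize> = line
        .match_indices("_ZN")
        .chain(line.match_indices("_R"))
        .map(|(i, _)| i)
        .collect();
    // `_ZN` and `_R` can never start at the same offset, so sorting suffices.
    offsets.sort_unstable();
    offsets
}
```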

addisoncrump · 2023-09-28T22:39:04Z

src/lib.rs

-        let next_head = match (line[head..].find("_ZN"), line[head..].find("_R")) {
-            (Some(idx), None) | (None, Some(idx)) => head + idx,
-            (Some(idx1), Some(idx2)) => head + idx1.min(idx2),
+        let next_head = match (line.find("_ZN"), line.find("_R")) {


Is this valid for OSX? This conflicts with the tests.
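For reference, the marker search in the diff can be expressed as a small standalone function. The name is hypothetical. On macOS (Mach-O), symbol names typically carry an extra leading underscore, e.g. `__ZN...`. The search then lands on the second underscore and leaves the first as ordinary text. This sketch does not settle whether that interacts badly with the existing tests.

```rust
/// Hypothetical helper mirroring the `match` in the diff: the byte offset of
/// the earliest `_ZN` (legacy) or `_R` (v0) marker in `line`, if any.
fn next_symbol_start(line: &str) -> Option<usize> {
    match (line.find("_ZN"), line.find("_R")) {
        (Some(a), Some(b)) => Some(a.min(b)),
        (a, b) => a.or(b),
    }
}
```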

workingjubilee · 2023-09-29T04:08:51Z

Does the problem as such occur with v0 demangling?

addisoncrump · 2023-09-29T15:32:26Z

I haven't tested with v0, as I didn't have any samples of Unicode in v0 symbols. I'll try this later.

workingjubilee · 2023-09-29T19:41:26Z

We would like to enable v0 mangling by default rather than keep focusing on "improving" legacy mangling/demangling. That improvement is somewhat questionable if other tooling that supports legacy mangling/demangling does not support or expect it. So it is important to know whether v0 is affected by the same concern.

addisoncrump · 2023-10-02T13:31:39Z

I'll be honest, I don't entirely follow this logic 😕. Legacy mangling will be around for some time yet (even once the RFC/stabilisation lands), and GDB/LLDB already support this. Moreover, I'm fairly sure this patch is faster than the previous behaviour. To test this, I wrote a benchmark:

Benchmark Code

// benches/demangle_large.rs (criterion needs `harness = false` for this bench in Cargo.toml)
extern crate criterion;
extern crate rustc_demangle;

use criterion::{black_box, criterion_group, criterion_main, Criterion};
use rustc_demangle::demangle_stream;
use std::io::{self, BufReader};

// Corpus of mangled symbols, embedded at compile time.
const SOURCE: &str = include_str!("sample-mangled.txt");

fn demangle(source: &str) {
    let mut reader = BufReader::new(source.as_bytes());
    // Discard the output; only the demangling work is measured.
    let mut writer = io::sink();

    // `false`: write demangled names without the trailing hash.
    demangle_stream(&mut reader, &mut writer, false).unwrap();
}

fn criterion_benchmark(c: &mut Criterion) {
    c.bench_function("demangle", |b| b.iter(|| demangle(black_box(SOURCE))));
}

criterion_group!(benches, criterion_benchmark);
criterion_main!(benches);

Here is the sample I used (from an ongoing project in which we will soon want the changes in this PR): sample-mangled.txt

Then tested with this PR:

$ cargo bench --bench demangle_large --features std
   Compiling rustc-demangle v0.1.23 (/home/addisoncrump/git/rustc-demangle)
    Finished bench [optimized] target(s) in 16.58s
     Running benches/demangle_large.rs (target/release/deps/demangle_large-0592d8a86e001704)
Gnuplot not found, using plotters backend
demangle                time:   [12.683 ms 12.701 ms 12.721 ms]
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) high mild
  3 (3.00%) high severe

With the original:

$ cargo bench --bench demangle_large --features std
   Compiling rustc-demangle v0.1.23 (/home/addisoncrump/git/rustc-demangle)
    Finished bench [optimized] target(s) in 16.62s
     Running benches/demangle_large.rs (target/release/deps/demangle_large-0592d8a86e001704)
Gnuplot not found, using plotters backend
demangle                time:   [19.847 ms 19.879 ms 19.917 ms]
                        change: [+56.172% +56.517% +56.900%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) high mild
  4 (4.00%) high severe

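In this run, the PR is roughly 36% faster than the original. The point estimates are 12.70 ms and 19.88 ms per iteration, and criterion reports the original as +56.5% relative to the PR baseline. Being more permissive here therefore brings a significant performance improvement without loss of generality or correctness.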

allow for non-ASCII decoding

c9faa0b

addisoncrump marked this pull request as draft September 28, 2023 01:38

addisoncrump added 2 commits September 28, 2023 03:44

fixup and test

3cc40e7

separate tests for clarity

8fdefc2

addisoncrump marked this pull request as ready for review September 28, 2023 01:45

addisoncrump mentioned this pull request Sep 28, 2023

libafl_libfuzzer: rename all symbols AFLplusplus/LibAFL#1565

Merged

addisoncrump marked this pull request as draft September 28, 2023 21:28

demangle_line support for non-ASCII symbols

e9f84c1

addisoncrump marked this pull request as ready for review September 28, 2023 22:34

addisoncrump commented Sep 28, 2023


test for demangle_str with emoji

3ed564f
