flt2dec: replace for loop by iter_mut #144205
Conversation
rustbot has assigned @workingjubilee.
Performance related, so isolating it for the usual reasons, though it's unclear how much it matters. @bors r+ rollup=never
library/core/src/num/flt2dec/mod.rs
Outdated
- for j in i + 1..d.len() {
-     d[j] = b'0';
- }
+ d.iter_mut().skip(i + 1).for_each(|c| *c = b'0');
I don’t think this is clearly better or worse than before. But how about d[i+1..].fill(b'0')?
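For context, here is a sketch (my reconstruction, not the actual library code) of the three candidate spellings side by side. All of them zero the tail of a digit buffer after a carry; the names and the standalone-function framing are illustrative:

```rust
/// Zero the digits after position `i` -- the original `for` loop.
fn tail_for(d: &mut [u8], i: usize) {
    for j in i + 1..d.len() {
        d[j] = b'0';
    }
}

/// The same operation via the iterator adapter proposed in the PR.
fn tail_iter(d: &mut [u8], i: usize) {
    d.iter_mut().skip(i + 1).for_each(|c| *c = b'0');
}

/// The same operation via `slice::fill`, as suggested in the review.
fn tail_fill(d: &mut [u8], i: usize) {
    d[i + 1..].fill(b'0');
}
```

All three have identical behavior; `fill` is arguably the clearest, since it names the intent directly and performs the slice bounds check once, up front.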
That would be even better. I added it as an option to my bench crate, and performance on my dev machines looks like this:
For aarch64:
test bench_round_up_fill ... bench: 974,458.62 ns/iter (+/- 17,236.78)
test bench_round_up_for ... bench: 1,055,622.70 ns/iter (+/- 2,128.75)
test bench_round_up_iter ... bench: 855,721.80 ns/iter (+/- 2,159.27)
For x86_64:
test bench_round_up_fill ... bench: 730,473.60 ns/iter (+/- 393.90)
test bench_round_up_for ... bench: 730,497.57 ns/iter (+/- 894.02)
test bench_round_up_iter ... bench: 740,172.60 ns/iter (+/- 954.45)
On aarch64 the (+/-) is always strangely large.
I suspect your benchmarks are reading tea leaves, not nailing down meaningful differences, because at least the iter_mut().skip() version should also be readily identifiable as equivalent to memset. Consider checking whether one variant calls memset and the other doesn't. If memset vs. an inline loop makes a real difference, then any changes here will be extremely fragile, as they depend on loop-idiom recognition working or not working for this loop (which may also differ between an isolated benchmark of this one function and a benchmark of the whole flt2dec rabbit hole).
Oh. I looked at the codegen in your godbolt link and both variants you tried already get turned into memsets. The reason they benchmark differently is that your benchmark runs on a 100k-character buffer filled with '9's, so the cost is dominated by the initial "scan backwards for the first non-9" part instead. I don't know why the phrasing of the memset makes a difference for the codegen in that part of the function, but now I really don't trust the numbers, nor do I have faith that this will reproduce at all in the context of the flt2dec routines. The actual buffer size is orders of magnitude smaller, effects on the other arms of this match aren't benchmarked at all, and if it's weird spooky action at a distance that affects the codegen for the rposition loop, then inlining it into the callers may have similarly unpredictable effects.
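To make the "scan backwards for the first non-9, then fill" structure concrete, here is a hedged sketch of the rounding pattern under discussion. This is my reconstruction of the shape of the code, not the actual flt2dec routine; the name, signature, and return value are assumptions:

```rust
/// Round a buffer of decimal digits up by one ulp (sketch only).
/// Returns true if the carry propagated past every digit.
fn round_up(d: &mut [u8]) -> bool {
    // Scan backwards for the last digit that isn't '9'.
    match d.iter().rposition(|&c| c != b'9') {
        // Found one: bump it and zero everything after it.
        Some(i) => {
            d[i] += 1;
            d[i + 1..].fill(b'0');
            false
        }
        // All digits were '9': the result is a leading '1' followed by
        // zeros, and the caller must account for the shifted magnitude.
        None => {
            if let Some((first, rest)) = d.split_first_mut() {
                *first = b'1';
                rest.fill(b'0');
            }
            true
        }
    }
}
```

A benchmark input of all '9's makes the rposition scan, not the fill, the hot part, which is the distortion described above.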
All of these concerns could be avoided by benchmarking a full trip through the formatting machinery. I appreciate that it's difficult to find an input that hits the path you're interested in and makes that piece of the code hot enough to give a measurable signal. But the "easier" alternative of reducing to the smallest possible benchmark and putting it under a microscope can waste your time in other ways!
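One way to benchmark the full trip (a sketch under my own assumptions, not the benchmark used in this thread) is to time `write!` on floats whose fixed-precision expansions force digits to round up, so the whole formatting pipeline, not just the helper, is on the clock:

```rust
use std::fmt::Write;
use std::time::Instant;

fn main() {
    // Inputs chosen (an assumption) so that fixed-precision formatting
    // has to round digits up, exercising the path under discussion.
    let inputs = [1.9999_f64, 0.29999_f64, 123.999_f64];
    let mut sink = String::new();
    let start = Instant::now();
    for _ in 0..100_000 {
        for &x in &inputs {
            write!(sink, "{x:.2}").unwrap();
            sink.clear();
        }
    }
    println!("elapsed: {:?} for 300_000 formats", start.elapsed());
}
```

This crude `Instant` timer is only for illustration; a real run would use criterion or the nightly `#[bench]` harness to get variance estimates.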
If I reduce the length to 10, then I get:
For aarch64:
test bench_round_up_fill ... bench: 14.86 ns/iter (+/- 0.01)
test bench_round_up_for ... bench: 16.07 ns/iter (+/- 0.01)
test bench_round_up_iter ... bench: 16.07 ns/iter (+/- 0.06)
For x86_64:
test bench_round_up_fill ... bench: 10.47 ns/iter (+/- 0.01)
test bench_round_up_for ... bench: 12.85 ns/iter (+/- 0.02)
test bench_round_up_iter ... bench: 9.84 ns/iter (+/- 0.07)
I still don't know if those numbers are at all meaningful, but at least they no longer give a reason not to use the fill version for the readability win, I guess?
Fill it is!
@bors r-
Force-pushed from 67b272a to f147716
assuming this doesn't magically fail tidy @bors r+
flt2dec: replace for loop by iter_mut

Perf is explored in #144118, which initially showed small losses, but then also showed significant gains. Both are real, but given the smallness of the losses, this seems a good change.
I think the queue is very borked
@bors retry r- (manual status refresh, maybe GitHub outage yesterday?)
@bors r=workingjubilee
☀️ Test successful - checks-actions
What is this? This is an experimental post-merge analysis report that shows differences in test outcomes between the merged PR and its parent PR.

Comparing 9748d87 (parent) -> c0b282f (this PR)

Test differences: 3 doctest diffs were found. These are ignored, as they are noisy.

Test dashboard: run

cargo run --manifest-path src/ci/citool/Cargo.toml -- \
    test-dashboard c0b282f0ccdab7523cdb8dfa41b23bed5573da76 --output-dir test-dashboard

and then open the generated dashboard.

Job duration changes: job durations can vary a lot, based on the actual runner instance
Finished benchmarking commit (c0b282f): comparison URL.

Overall result: no relevant changes - no action needed

@rustbot label: -perf-regression

Instruction count: this benchmark run did not return any relevant results for this metric.

Max RSS (memory usage): results (secondary -1.0%). A less reliable metric. May be of interest, but not used to determine the overall result above.

Cycles: results (secondary -2.5%). A less reliable metric. May be of interest, but not used to determine the overall result above.

Binary size: this benchmark run did not return any relevant results for this metric.

Bootstrap: 465.536s -> 463.984s (-0.33%)
Please remember to update PR descriptions and titles when the PR contents change. Too late for this one, though; that's now permanently recorded in the git history.