Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

repeat method for (byte) slices? #48784

Closed
alexreg opened this issue Mar 6, 2018 · 23 comments · Fixed by #64877
Closed

repeat method for (byte) slices? #48784

alexreg opened this issue Mar 6, 2018 · 23 comments · Fixed by #64877
Labels
C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@alexreg
Copy link
Contributor

alexreg commented Mar 6, 2018

There exists a str::repeat method, but no equivalent for (byte) slices. This would seem like a useful feature. Should we consider adding it to the stdlib?

@TimNN TimNN added C-enhancement Category: An issue proposing an enhancement or a PR with one. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Mar 6, 2018
@Centril
Copy link
Contributor

Centril commented Mar 7, 2018

Possibly generalized (or not, but let's entertain the notion...)

pub trait Repeat {
    type Repeated;
    fn repeat(&self, times: usize) -> Self::Repeated;
}

@sfackler
Copy link
Member

sfackler commented Mar 7, 2018

Seems reasonable to add this to [T] where T: Clone.

@alexreg
Copy link
Contributor Author

alexreg commented Mar 7, 2018

@Centril Heh, I should expect something abstract from you! Could definitely do that, and implement it even for things like iterators, though it's a bit more work...

@sfackler Yep, that certainly wouldn't be too liberal

@scottmcm
Copy link
Member

scottmcm commented Mar 7, 2018

Doesn't the general one exist already? iter::repeat(my_into_iter).take(n).flatten().collect()

That said, #48657 could easily be moved to [T] where T: Copy, and that way the str/String version would just wrap the slice/Vec version. That seems like a more interesting option to me, since it's taking advantage of the "slice-ness", rather than just being a wrapper around iterator methods.

@alexreg
Copy link
Contributor Author

alexreg commented Mar 7, 2018

@scottmcm What's the efficiency of that one like? I bet it's nothing close to memcpy, which could effectively be done by specialising on Repeat for [T] where T: Copy.

@scottmcm
Copy link
Member

scottmcm commented Mar 7, 2018

@alexreg That's what I meant by my second paragraph. slice::repeat<T:Copy>(x: &[T], n: usize) -> Vec<T> sounds great, but I don't see a need for it for iterators or T: Clone.

@alexreg
Copy link
Contributor Author

alexreg commented Mar 7, 2018

@scottmcm Yeah, that's certainly a fair idea. Do you know if iter::repeat(my_into_iter).take(n).flatten().collect() is as efficient as pre-allocating a Vec of the requisite size and repeatedly calling extend on it? If so, I agree with you fully. Otherwise, there may be a need for a Repeat trait.

@sfackler
Copy link
Member

sfackler commented Mar 8, 2018

We can implement it for T: Clone and specialize to #48657's implementation for T: Clone. None of that needs to affect the method's public API though.

@alexreg
Copy link
Contributor Author

alexreg commented Mar 8, 2018

@sfackler Yep, that sounds like a good solution to be honest. Even if it's no more efficient that the iterator approach for T: Clone, it would at least provide a consistent interface!

@scottmcm
Copy link
Member

scottmcm commented Mar 8, 2018

@alexreg Unfortunately I know for sure that literally .flatten().collect() is unlikely to be great (see #48544 for a discussion of why, though that PR wouldn't help this case). But I would expect that using #48597 .collect_into(Vec::with_capacity(my_slice.len() * n)) instead would let the iterator method do just as well as something more manual.

@alexreg
Copy link
Contributor Author

alexreg commented Mar 8, 2018

@scottmcm Fair enough, though I think in that case why might as well provide that as a default fn on the Repeat trait, and allow for specialisations when T: Copy (and potentially other cases too).

bors added a commit that referenced this issue Apr 24, 2018
@dtolnay dtolnay reopened this Apr 24, 2018
@dtolnay dtolnay added C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC and removed C-enhancement Category: An issue proposing an enhancement or a PR with one. labels Apr 24, 2018
@kennytm
Copy link
Member

kennytm commented Oct 27, 2018

@rfcbot poll @rust-lang/libs I'd like to stabilize [T]::repeat.

The signature in #48999 is

fn repeat(&self, n: usize) -> Vec<T> where T: Copy;

and the implementation uses on copy_nonoverlapping which relies on the property of Copy. There's suggestion to generalize the bound to T: Clone, however I don't think this is a stabilization blocker:

  1. T: Copy is what makes slice::repeat special. If T: Clone + !Copy, the performance is probably just similar to iter::repeat(_).take(_).flatten().collect() and can be used instead.

  2. Since Copy is a subtrait of Clone, we can upgrade the bound to T: Clone without breaking backward compatibility.

OTOH it is really simple to extend the bound to T: Clone today as we could use specialization internally (as explained in #48784 (comment)):

// note: private trait 
trait RepeatSlice: Sized {
    fn repeat_slice(slice: &[Self], n: usize) -> Vec<Self>;
}

impl<T: Clone> RepeatSlice for T {
    default fn repeat_slice(slice: &[Self], n: usize) -> Vec<Self> {
        let mut res = Vec::with_capacity(n * slice.len());
        for _ in 0..n {
            res.extend_from_slice(slice);
        }
        res
    }
}

impl<T: Copy> RepeatSlice for T {
    fn repeat_slice(slice: &[Self], n: usize) -> Vec<Self> {
        // the original copy_nonoverlapping implementation
    }
}

impl<T> [T] {
    pub fn repeat(&self, n: usize) -> Vec<T> 
    where 
        T: Clone,
    {
        T::repeat_slice(self, n)
    }
}

@rfcbot
Copy link

rfcbot commented Oct 27, 2018

Team member @kennytm has asked teams: T-libs, for consensus on:

I'd like to stabilize [T]::repeat.

@SimonSapin
Copy link
Contributor

From the implementation:

        // If `n` is larger than zero, it can be split as
        // `n = 2^expn + rem (2^expn > rem, expn >= 0, rem >= 0)`.
        // `2^expn` is the number represented by the leftmost '1' bit of `n`,
        // and `rem` is the remaining part of `n`.
        // `2^expn` repetition is done by doubling `buf` `expn`-times.

I guess that it’s this way because fewer calls to copy_nonoverlapping with larger slices are assumed to be faster. This may be the case up to "medium" sizes compared to e.g. copying a single byte at a time. However at some point, going back "far away" in memory might hurt cache locality.

Do we have evidence one way or another to justify such complex code?

@BurntSushi
Copy link
Member

Is there anything blocking stabilization here? It looks like @SimonSapin's comment above here is "just" an implementation nit.

@scottmcm
Copy link
Member

@SimonSapin The code came from str::repeat, which has some performance numbers in the PR that added this implementation method: #48657 (comment) #48657 (comment)

@scottmcm scottmcm added the disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. label Jun 19, 2019
@SimonSapin
Copy link
Contributor

Right, my comment should not be taken as blocking stabilization. It’s purely an implementation detail and can change later.

@dtolnay @withoutboats @sfackler This pFCP awaits your checkboxes above.

@norcalli
Copy link
Contributor

@sfackler @withoutboats I came across this again and read through the comments to find nothing blocking stabilization other than your votes which have been absent for 9 months. It would seem prudent to comment before it is further forgotten.

@SimonSapin
Copy link
Contributor

#48784 (comment) is a poll, not a pFCP. These days rfcbot starts FCP when it has not more than two check boxes missing, so I think this one should be considered accepted. Feel free to send a stabilization PR.

@sinkuu
Copy link
Contributor

sinkuu commented Oct 11, 2019

I guess that it’s this way because fewer calls to copy_nonoverlapping with larger slices are assumed to be faster. This may be the case up to "medium" sizes compared to e.g. copying a single byte at a time. However at some point, going back "far away" in memory might hurt cache locality.

I confirmed that run-time of m.repeat(n) with larger m or n gets worse.

m: Vec<u8>

len(m) n ns / op Ratio to n = 2 Increase rate
16 2 19 1.0
16 4 24 1.3 1.3
16 8 26 1.4 1.1
16 16 29 1.5 1.1
16 32 32 1.7 1.1
16 64 37 1.9 1.2
16 128 70 3.7 1.9
16 256 91 4.8 1.3
16 512 134 7.1 1.5
16 1024 270 14.2 2.0
16 2048 548 28.8 2.0
16 4096 1482 78.0 2.7
16 8192 3137 165.1 2.1
16 16384 6592 346.9 2.1
16 32768 14912 784.8 2.3
16 65536 33446 1760.3 2.2
16 131072 69184 3641.3 2.1
16 262144 195495 10289.2 2.8
16 524288 420965 22156.1 2.2
16 1048576 1099518 57869.4 2.6
16 2097152 5504275 289698.7 5.0
16 4194304 10739572 565240.6 2.0
16 8388608 21194699 1115510.5 2.0
16 16777216 42588466 2241498.2 2.0
16 33554432 85833026 4517527.7 2.0
16 67108864 173012397 9105915.6 2.0
16 134217728 344958449 18155707.8 2.0
16 268435456 702324815 36964463.9 2.0
16 536870912 1426503968 75079156.2 2.0
128 2 21 1.0
128 4 24 1.1 1.1
128 8 31 1.5 1.3
128 16 63 3.0 2.0
128 32 72 3.4 1.1
128 64 128 6.1 1.8
128 128 194 9.2 1.5
128 256 536 25.5 2.8
128 512 1487 70.8 2.8
128 1024 3144 149.7 2.1
128 2048 6602 314.4 2.1
128 4096 15307 728.9 2.3
128 8192 34233 1630.1 2.2
128 16384 69279 3299.0 2.0
128 32768 200694 9556.9 2.9
128 65536 420488 20023.2 2.1
128 131072 1096523 52215.4 2.6
128 262144 5527507 263214.6 5.0
128 524288 10733631 511125.3 1.9
128 1048576 21219285 1010442.1 2.0
128 2097152 42514794 2024514.0 2.0
128 4194304 85295812 4061705.3 2.0
128 8388608 170770953 8131950.1 2.0
128 16777216 342405831 16305039.6 2.0
128 33554432 698031506 33239595.5 2.0
128 67108864 1420857974 67659903.5 2.0
256 2 22 1.0
256 4 31 1.4 1.4
256 8 68 3.1 2.2
256 16 69 3.1 1.0
256 32 126 5.7 1.8
256 64 191 8.7 1.5
256 128 536 24.4 2.8
256 256 1251 56.9 2.3
256 512 2655 120.7 2.1
256 1024 5754 261.5 2.2
256 2048 13505 613.9 2.3
256 4096 31159 1416.3 2.3
256 8192 64759 2943.6 2.1
256 16384 179837 8174.4 2.8
256 32768 404020 18364.5 2.2
256 65536 1083392 49245.1 2.7
256 131072 5523028 251046.7 5.1
256 262144 10702046 486456.6 1.9
256 524288 21279790 967263.2 2.0
256 1048576 42513553 1932434.2 2.0
256 2097152 85352125 3879642.0 2.0
256 4194304 171699467 7804521.2 2.0
256 8388608 342988128 15590369.5 2.0
256 16777216 696892564 31676934.7 2.0
256 33554432 1422522241 64660101.9 2.0
1024 2 31 1.0
1024 4 64 2.1 2.1
1024 8 79 2.5 1.2
1024 16 128 4.1 1.6
1024 32 203 6.5 1.6
1024 64 419 13.5 2.1
1024 128 1229 39.6 2.9
1024 256 2688 86.7 2.2
1024 512 5720 184.5 2.1
1024 1024 12629 407.4 2.2
1024 2048 28684 925.3 2.3
1024 4096 59888 1931.9 2.1
1024 8192 149524 4823.4 2.5
1024 16384 323578 10438.0 2.2
1024 32768 886780 28605.8 2.7
1024 65536 2105535 67920.5 2.4
1024 131072 8995326 290171.8 4.3
1024 262144 17833419 575271.6 2.0
1024 524288 35786350 1154398.4 2.0
1024 1048576 70596109 2277293.8 2.0
1024 2097152 140652832 4537188.1 2.0
1024 4194304 282462549 9111695.1 2.0
1024 8388608 599121203 19326490.4 2.1

@SimonSapin
Copy link
Contributor

@sinkuu Would you be interested in working on a PR to improve the implementation?

@bors bors closed this as completed in 6767d9b Oct 11, 2019
@sinkuu
Copy link
Contributor

sinkuu commented Oct 11, 2019

@SimonSapin Yes, I'll try to add the fallback to the old impl for large parameters.

@SimonSapin
Copy link
Contributor

It may be worth looking at the number of bytes being copied per loop iteration, rather than simply the number of repetitions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.