
std: rename MemWriter to SeekableMemWriter, add seekless MemWriter #15915

Merged

4 commits merged into rust-lang:master on Jul 30, 2014

Conversation

@erickt (Contributor) commented Jul 23, 2014

std: rename MemWriter to SeekableMemWriter, add seekless MemWriter

Not all users of MemWriter need to seek, but making MemWriter seekable adds between 3% and 29% overhead in certain circumstances. This closes that performance gap by making MemWriter non-seekable and adding a new SeekableMemWriter for the cases where seeking is actually needed.
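
The hot path difference is small: without a seek position to maintain, a write is just an append. Roughly, the seekless writer's write looks like this (a minimal sketch, not necessarily the exact code in the patch):

```
impl Writer for MemWriter {
    #[inline]
    fn write(&mut self, buf: &[u8]) -> IoResult<()> {
        // No `pos` to maintain, so every write is a straight append.
        self.buf.push_all(buf);
        Ok(())
    }
}
```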

test io::mem::test::bench_buf_reader                        ... bench:       682 ns/iter (+/- 85)
test io::mem::test::bench_buf_writer                        ... bench:       580 ns/iter (+/- 57)
test io::mem::test::bench_mem_reader                        ... bench:       793 ns/iter (+/- 99)
test io::mem::test::bench_mem_writer_001_0000               ... bench:        48 ns/iter (+/- 27)
test io::mem::test::bench_mem_writer_001_0010               ... bench:        65 ns/iter (+/- 27) = 153 MB/s
test io::mem::test::bench_mem_writer_001_0100               ... bench:       132 ns/iter (+/- 12) = 757 MB/s
test io::mem::test::bench_mem_writer_001_1000               ... bench:       802 ns/iter (+/- 151) = 1246 MB/s
test io::mem::test::bench_mem_writer_100_0000               ... bench:       481 ns/iter (+/- 28)
test io::mem::test::bench_mem_writer_100_0010               ... bench:      1957 ns/iter (+/- 126) = 510 MB/s
test io::mem::test::bench_mem_writer_100_0100               ... bench:      8222 ns/iter (+/- 434) = 1216 MB/s
test io::mem::test::bench_mem_writer_100_1000               ... bench:     82496 ns/iter (+/- 11191) = 1212 MB/s
test io::mem::test::bench_seekable_mem_writer_001_0000      ... bench:        48 ns/iter (+/- 2)
test io::mem::test::bench_seekable_mem_writer_001_0010      ... bench:        64 ns/iter (+/- 2) = 156 MB/s
test io::mem::test::bench_seekable_mem_writer_001_0100      ... bench:       129 ns/iter (+/- 7) = 775 MB/s
test io::mem::test::bench_seekable_mem_writer_001_1000      ... bench:       801 ns/iter (+/- 159) = 1248 MB/s
test io::mem::test::bench_seekable_mem_writer_100_0000      ... bench:       711 ns/iter (+/- 51)
test io::mem::test::bench_seekable_mem_writer_100_0010      ... bench:      2532 ns/iter (+/- 227) = 394 MB/s
test io::mem::test::bench_seekable_mem_writer_100_0100      ... bench:      8962 ns/iter (+/- 947) = 1115 MB/s
test io::mem::test::bench_seekable_mem_writer_100_1000      ... bench:     85086 ns/iter (+/- 11555) = 1175 MB/s

@erickt (Contributor, Author) commented Jul 23, 2014

cc @cmr, who originally proposed this idea.

@alexcrichton (Member)

Before we split the types, have you looked into optimizing MemWriter as-is? For example, was this too slow?

fn write(&mut self, buf: &[u8]) -> IoResult<()> {
    if self.ever_seeked { self.write_slow(buf) /* slower, seek-aware write() */ }
    else { self.buf.push_all(buf); Ok(()) }
}

@erickt (Contributor, Author) commented Jul 23, 2014

@alexcrichton: I forgot to mention I did some benchmarking of adding a hot path to the original MemWriter, but it didn't seem to do much:

impl Writer for SeekableMemWriter {
    #[inline]
    fn write(&mut self, buf: &[u8]) -> IoResult<()> {
        if self.pos == self.buf.len() {
            self.buf.push_all(buf)
        } else {
            // Make sure the internal buffer is at least as big as where we
            // currently are
            let difference = self.pos as i64 - self.buf.len() as i64;
            if difference > 0 {
                self.buf.grow(difference as uint, &0);
            }

            // Figure out what bytes will be used to overwrite what's currently
            // there (left), and what will be appended on the end (right)
            let cap = self.buf.len() - self.pos;
            let (left, right) = if cap <= buf.len() {
                (buf.slice_to(cap), buf.slice_from(cap))
            } else {
                (buf, &[])
            };

            // Do the necessary writes
            if left.len() > 0 {
                slice::bytes::copy_memory(self.buf.mut_slice_from(self.pos), left);
            }
            if right.len() > 0 {
                self.buf.push_all(right);
            }
        }

        // Bump us forward
        self.pos += buf.len();
        Ok(())
    }
}

Here are the results from the benchmarks with it:

test io::mem::test::bench_mem_writer_001_0000               ... bench:        50 ns/iter (+/- 4)
test io::mem::test::bench_mem_writer_001_0010               ... bench:        63 ns/iter (+/- 6) = 158 MB/s
test io::mem::test::bench_mem_writer_001_0100               ... bench:       127 ns/iter (+/- 12) = 787 MB/s
test io::mem::test::bench_mem_writer_001_1000               ... bench:       772 ns/iter (+/- 23) = 1295 MB/s
test io::mem::test::bench_mem_writer_100_0000               ... bench:       513 ns/iter (+/- 33)
test io::mem::test::bench_mem_writer_100_0010               ... bench:      2045 ns/iter (+/- 132) = 488 MB/s
test io::mem::test::bench_mem_writer_100_0100               ... bench:      8409 ns/iter (+/- 553) = 1189 MB/s
test io::mem::test::bench_mem_writer_100_1000               ... bench:     84410 ns/iter (+/- 6155) = 1184 MB/s
test io::mem::test::bench_seekable_mem_writer_001_0000      ... bench:        51 ns/iter (+/- 3)
test io::mem::test::bench_seekable_mem_writer_001_0010      ... bench:        66 ns/iter (+/- 5) = 151 MB/s
test io::mem::test::bench_seekable_mem_writer_001_0100      ... bench:       137 ns/iter (+/- 12) = 729 MB/s
test io::mem::test::bench_seekable_mem_writer_001_1000      ... bench:       798 ns/iter (+/- 54) = 1253 MB/s
test io::mem::test::bench_seekable_mem_writer_100_0000      ... bench:       872 ns/iter (+/- 108)
test io::mem::test::bench_seekable_mem_writer_100_0010      ... bench:      2456 ns/iter (+/- 243) = 407 MB/s
test io::mem::test::bench_seekable_mem_writer_100_0100      ... bench:      9020 ns/iter (+/- 939) = 1108 MB/s
test io::mem::test::bench_seekable_mem_writer_100_1000      ... bench:     87705 ns/iter (+/- 12400) = 1140 MB/s

The biggest wins I've found have been in my https://github.com/erickt/rust-serde project. I've been benchmarking my new JSON serializer against Go's built-in serializer, Go's ffjson, and RapidJSON. Without my PR (or with just this hot path), I go from:

...
test bench_log::bench_serializer_mem_writer           ... bench:      3876 ns/iter (+/- 145) = 156 MB/s
...

to this:

...
test bench_log::bench_serializer_mem_writer           ... bench:      2687 ns/iter (+/- 171) = 225 MB/s
...

So it's a pretty substantial win for me. In comparison, ffjson is about 120 MB/s and RapidJSON is at 230 MB/s, so we're basically at parity with them without having to codegen serialization-specific code.

@alexcrichton (Member)

I would prefer to reproduce those results before resorting to duplicating APIs; this does not seem like a pattern we wish to encourage in the standard library. I checked out rust-serde and tweaked it to build on master, but there doesn't seem to be a benchmark called bench_serializer_mem_writer.

Is that benchmark part of a tweak which hasn't been uploaded yet?

@erickt (Contributor, Author) commented Jul 23, 2014

@alexcrichton: oops, I haven't pushed up the version with bench_serializer_mem_writer yet. I'll get it uploaded tonight. In the meantime, the code is essentially:

#[bench]
fn bench_serializer_mem_writer(b: &mut Bencher) {
    let log = Log::new();
    let json = json::to_vec(&log);
    b.bytes = json.len() as u64;

    b.iter(|| {
        //let _json = json::to_str(&log).unwrap();
        let mut wr = MemWriter0::with_capacity(1024);
        {
            let mut serializer = json::Serializer::new(wr.by_ref());
            log.serialize(&mut serializer).unwrap();
        }
        let _json = wr.unwrap();
    });
}

Or you could look at bench_serializer2, which has a local copy of my MemWriter rewrite.

@sfackler (Member)

Another alternative is to just remove the Seek impl from MemWriter. It's only used by the ebml serialization, and that can be restructured to not require Seek, which would also help shrink metadata a bit.
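
For illustration: the reason the ebml code wants Seek is presumably length back-patching (reserve space for a length, write the payload, then seek back and fill the length in). One way to restructure that without Seek is to serialize the child into a scratch buffer and then emit the length and bytes strictly in order. A rough sketch only, not the actual metadata change; write_sized is a hypothetical helper:

```
fn write_sized<W: Writer>(wr: &mut W,
                          emit: |&mut MemWriter| -> IoResult<()>) -> IoResult<()> {
    // Serialize the child document into a scratch buffer first...
    let mut scratch = MemWriter::new();
    try!(emit(&mut scratch));
    let bytes = scratch.unwrap();

    // ...then write the length prefix and the payload in order,
    // so no seek-back patching is needed.
    try!(wr.write_be_u32(bytes.len() as u32));
    wr.write(bytes.as_slice())
}
```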

@erickt (Contributor, Author) commented Jul 23, 2014

@sfackler: that's a good point. I'll also look into that tonight.

@alexcrichton (Member)

Ok, I was able to reproduce your results:

test bench_log::bench_serializer0 ... bench:      1618 ns/iter (+/- 30) = 375 MB/s
test bench_log::bench_serializer1 ... bench:      2557 ns/iter (+/- 37) = 237 MB/s
test bench_log::bench_serializer2 ... bench:      1780 ns/iter (+/- 29) = 341 MB/s

The serializer0 benchmark is MemWriter with no seek (write is just push_all), serializer1 is MemWriter in the stdlib as-is today (with seek), and serializer2 is a writer that looks like:

#[inline]
fn write(&mut self, buf: &[u8]) -> io::IoResult<()> {
    if self.pos == uint::MAX {
        self.buf.push_all(buf);
        return Ok(())
    }
    self.write_slow(buf)
}

#[inline(never)] fn write_slow(...) { ... }

Here `pos` is `uint::MAX` on initialization, and if it's ever changed via seek then we fall back to the slower write impl (so the fast path is taken most of the time).

This tells me a few things:

  1. It's very important that the push_all is inlined for the speedup to occur
  2. Just an if condition is losing 100ns in the "inner loop"
  3. It looks like write is invoked very frequently

I'm not sure whether this is a case for hamstringing MemWriter (removing Seek altogether), whether the fast path/slow path version is sufficient, or whether the serializer you're using should use an internal buffer for performance.

I suppose the least painful route would be the fast/slow path, but it wouldn't get to the speed you're looking for; perhaps an internal buffer would get there? (unknown)
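
If the serializer went the internal-buffer route, it could batch bytes itself and hand larger chunks to the underlying Writer. A rough sketch of what that could look like (names such as BufferedSerializer and emit_bytes are illustrative, not taken from rust-serde):

```
struct BufferedSerializer<W> {
    wr: W,
    buf: Vec<u8>,
}

impl<W: Writer> BufferedSerializer<W> {
    fn with_capacity(wr: W, cap: uint) -> BufferedSerializer<W> {
        BufferedSerializer { wr: wr, buf: Vec::with_capacity(cap) }
    }

    fn emit_bytes(&mut self, bytes: &[u8]) -> IoResult<()> {
        // Flush to the inner writer only when the local buffer would overflow,
        // so the common case stays a cheap in-memory append.
        if self.buf.len() + bytes.len() > self.buf.capacity() {
            try!(self.wr.write(self.buf.as_slice()));
            self.buf.clear();
        }
        self.buf.push_all(bytes);
        Ok(())
    }
}
```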

@huonw (Member) commented Jul 24, 2014

I wonder if you can eke out a little more speed in the fast case with a #[cold] on write_slow? (Just glancing at the playpen now, making that change seems to reorder the branches slightly more nicely.)
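
i.e. roughly this (illustrative; write_slow is the out-of-line helper from the previous comment, with its body elided here):

```
#[cold]
#[inline(never)]
fn write_slow(&mut self, buf: &[u8]) -> io::IoResult<()> {
    // Seek-aware path: grow the buffer up to `pos`, overwrite, then append
    // (body as in the SeekableMemWriter impl quoted earlier in the thread).
    fail!("elided in this sketch")
}
```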

@alexcrichton (Member)

It may have bought a bit, but it doesn't seem to have made a noticeable impact.

@sfackler (Member)

@alexcrichton does any code outside of ebml use the Seek impl?

@alexcrichton (Member)

Not that I'm aware of, but one may logically expect memory-based readers/writers to be seekable.

@sfackler (Member)

FWIW, Go and Java's equivalents don't seek, and I honestly can't say I've ever wanted to.

@alexcrichton (Member)

I was chatting with @erickt on IRC last night, and if there's precedent for not having seek, then perhaps it won't be so bad to remove the Seek impls. Instead of adding SeekableMemWriter to std, though, I'd prefer it were added to rustc, the only place it's needed.

@lilyball (Contributor)

I'm inclined to agree: memory writers don't need to be seekable; their primary job is to buffer writes. If I need to seek, I probably already know the total size of the buffer, at which point I would just use BufWriter. Seekable + growable is perhaps too niche to bother having in libstd.

@alexcrichton (Member)

Could you move metadata::util to rustc::util? May as well try to cut down on the number of util modules. Additionally, this will need a [breaking-change] annotation on one of the commits with a clear explanation as to why Seek was removed and how to migrate code that was using Seek.

erickt added 4 commits July 29, 2014 15:50
std: rename MemWriter to SeekableMemWriter, add seekless MemWriter

[breaking-change]
@erickt (Contributor, Author) commented Jul 30, 2014

@bors: retry

bors added a commit that referenced this pull request Jul 30, 2014
std: rename MemWriter to SeekableMemWriter, add seekless MemWriter

@bors closed this Jul 30, 2014
@bors merged commit 2bcb4bd into rust-lang:master Jul 30, 2014
chris-morgan added a commit to chris-morgan/rust-http that referenced this pull request Aug 1, 2014
Its seekability wasn’t being used, and so in light of the changes made
in rust-lang/rust#15915 it should be removed, rather than switching from
the now-unseekable ``MemWriter`` to the new ``SeekableMemWriter``.

This necessitated adding the ``get_ref`` method, which ``MemWriter`` itself had gained at some point.