
std: rename MemWriter to SeekableMemWriter, add seekless MemWriter #15915

Merged

4 commits merged into rust-lang:master on Jul 30, 2014

Conversation

@erickt (Contributor) commented Jul 23, 2014

std: rename MemWriter to SeekableMemWriter, add seekless MemWriter

Not all users of MemWriter need to seek, but making MemWriter seekable adds between 3% and 29% overhead in certain circumstances. This closes that performance gap by making MemWriter non-seekable and adding a new SeekableMemWriter for the cases where seeking is actually needed.
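
The hot path difference is small: without a seek position to maintain, a write is just an append. Roughly, the seekless writer's write looks like this (a minimal sketch, not necessarily the exact code in the patch):

```
impl Writer for MemWriter {
    #[inline]
    fn write(&mut self, buf: &[u8]) -> IoResult<()> {
        // No `pos` to maintain, so every write is a straight append.
        self.buf.push_all(buf);
        Ok(())
    }
}
```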

test io::mem::test::bench_buf_reader                        ... bench:       682 ns/iter (+/- 85)
test io::mem::test::bench_buf_writer                        ... bench:       580 ns/iter (+/- 57)
test io::mem::test::bench_mem_reader                        ... bench:       793 ns/iter (+/- 99)
test io::mem::test::bench_mem_writer_001_0000               ... bench:        48 ns/iter (+/- 27)
test io::mem::test::bench_mem_writer_001_0010               ... bench:        65 ns/iter (+/- 27) = 153 MB/s
test io::mem::test::bench_mem_writer_001_0100               ... bench:       132 ns/iter (+/- 12) = 757 MB/s
test io::mem::test::bench_mem_writer_001_1000               ... bench:       802 ns/iter (+/- 151) = 1246 MB/s
test io::mem::test::bench_mem_writer_100_0000               ... bench:       481 ns/iter (+/- 28)
test io::mem::test::bench_mem_writer_100_0010               ... bench:      1957 ns/iter (+/- 126) = 510 MB/s
test io::mem::test::bench_mem_writer_100_0100               ... bench:      8222 ns/iter (+/- 434) = 1216 MB/s
test io::mem::test::bench_mem_writer_100_1000               ... bench:     82496 ns/iter (+/- 11191) = 1212 MB/s
test io::mem::test::bench_seekable_mem_writer_001_0000      ... bench:        48 ns/iter (+/- 2)
test io::mem::test::bench_seekable_mem_writer_001_0010      ... bench:        64 ns/iter (+/- 2) = 156 MB/s
test io::mem::test::bench_seekable_mem_writer_001_0100      ... bench:       129 ns/iter (+/- 7) = 775 MB/s
test io::mem::test::bench_seekable_mem_writer_001_1000      ... bench:       801 ns/iter (+/- 159) = 1248 MB/s
test io::mem::test::bench_seekable_mem_writer_100_0000      ... bench:       711 ns/iter (+/- 51)
test io::mem::test::bench_seekable_mem_writer_100_0010      ... bench:      2532 ns/iter (+/- 227) = 394 MB/s
test io::mem::test::bench_seekable_mem_writer_100_0100      ... bench:      8962 ns/iter (+/- 947) = 1115 MB/s
test io::mem::test::bench_seekable_mem_writer_100_1000      ... bench:     85086 ns/iter (+/- 11555) = 1175 MB/s

@erickt (Contributor, Author) commented Jul 23, 2014

cc @cmr, who originally proposed this idea.

@alexcrichton (Member)

Before we split the types, have you looked into optimizing MemWriter as-is? For example, was this too slow?

fn write(&mut self, buf: &[u8]) -> IoResult<()> {
    if self.ever_seeked { self.write_slow(buf) /* slower, seek-aware write() */ }
    else { self.buf.push_all(buf); Ok(()) }
}

@erickt (Contributor, Author) commented Jul 23, 2014

@alexcrichton: I forgot to mention I did some benchmarking of adding a hot path to the original MemWriter, but it didn't seem to do much:

impl Writer for SeekableMemWriter {
    #[inline]
    fn write(&mut self, buf: &[u8]) -> IoResult<()> {
        if self.pos == self.buf.len() {
            self.buf.push_all(buf)
        } else {
            // Make sure the internal buffer is at least as big as where we
            // currently are
            let difference = self.pos as i64 - self.buf.len() as i64;
            if difference > 0 {
                self.buf.grow(difference as uint, &0);
            }

            // Figure out what bytes will be used to overwrite what's currently
            // there (left), and what will be appended on the end (right)
            let cap = self.buf.len() - self.pos;
            let (left, right) = if cap <= buf.len() {
                (buf.slice_to(cap), buf.slice_from(cap))
            } else {
                (buf, &[])
            };

            // Do the necessary writes
            if left.len() > 0 {
                slice::bytes::copy_memory(self.buf.mut_slice_from(self.pos), left);
            }
            if right.len() > 0 {
                self.buf.push_all(right);
            }
        }

        // Bump us forward
        self.pos += buf.len();
        Ok(())
    }
}

Here are the results from the benchmarks with it:

test io::mem::test::bench_mem_writer_001_0000               ... bench:        50 ns/iter (+/- 4)
test io::mem::test::bench_mem_writer_001_0010               ... bench:        63 ns/iter (+/- 6) = 158 MB/s
test io::mem::test::bench_mem_writer_001_0100               ... bench:       127 ns/iter (+/- 12) = 787 MB/s
test io::mem::test::bench_mem_writer_001_1000               ... bench:       772 ns/iter (+/- 23) = 1295 MB/s
test io::mem::test::bench_mem_writer_100_0000               ... bench:       513 ns/iter (+/- 33)
test io::mem::test::bench_mem_writer_100_0010               ... bench:      2045 ns/iter (+/- 132) = 488 MB/s
test io::mem::test::bench_mem_writer_100_0100               ... bench:      8409 ns/iter (+/- 553) = 1189 MB/s
test io::mem::test::bench_mem_writer_100_1000               ... bench:     84410 ns/iter (+/- 6155) = 1184 MB/s
test io::mem::test::bench_seekable_mem_writer_001_0000      ... bench:        51 ns/iter (+/- 3)
test io::mem::test::bench_seekable_mem_writer_001_0010      ... bench:        66 ns/iter (+/- 5) = 151 MB/s
test io::mem::test::bench_seekable_mem_writer_001_0100      ... bench:       137 ns/iter (+/- 12) = 729 MB/s
test io::mem::test::bench_seekable_mem_writer_001_1000      ... bench:       798 ns/iter (+/- 54) = 1253 MB/s
test io::mem::test::bench_seekable_mem_writer_100_0000      ... bench:       872 ns/iter (+/- 108)
test io::mem::test::bench_seekable_mem_writer_100_0010      ... bench:      2456 ns/iter (+/- 243) = 407 MB/s
test io::mem::test::bench_seekable_mem_writer_100_0100      ... bench:      9020 ns/iter (+/- 939) = 1108 MB/s
test io::mem::test::bench_seekable_mem_writer_100_1000      ... bench:     87705 ns/iter (+/- 12400) = 1140 MB/s

The biggest wins I've found have been in my https://github.com/erickt/rust-serde project. I've been benchmarking my new JSON serializer against Go's built-in serializer, Go's ffjson, and RapidJSON. Without my PR (or with just this hot path), I go from:

...
test bench_log::bench_serializer_mem_writer           ... bench:      3876 ns/iter (+/- 145) = 156 MB/s
...

to this:

...
test bench_log::bench_serializer_mem_writer           ... bench:      2687 ns/iter (+/- 171) = 225 MB/s
...

So it's a pretty substantial win for me. In comparison, ffjson is about 120 MB/s and RapidJSON is at 230 MB/s, so we're basically at parity with them without having to codegen serialization-specific code.

@alexcrichton (Member)

I would prefer to reproduce those results before resorting to duplicating APIs; this does not seem like a pattern we wish to encourage in the standard library. I checked out rust-serde and tweaked it to build on master, but there doesn't seem to be a benchmark called bench_serializer_mem_writer.

Is that benchmark part of a tweak which hasn't been uploaded yet?

@erickt (Contributor, Author) commented Jul 23, 2014

@alexcrichton: oops, I haven't pushed up the version with bench_serializer_mem_writer yet. I'll get it uploaded tonight. In the meantime, the code is essentially:

#[bench]
fn bench_serializer_mem_writer(b: &mut Bencher) {
    let log = Log::new();
    let json = json::to_vec(&log);
    b.bytes = json.len() as u64;

    b.iter(|| {
        //let _json = json::to_str(&log).unwrap();
        let mut wr = MemWriter0::with_capacity(1024);
        {
            let mut serializer = json::Serializer::new(wr.by_ref());
            log.serialize(&mut serializer).unwrap();
        }
        let _json = wr.unwrap();
    });
}

Or you could look at bench_serializer2, which has a local copy of my MemWriter rewrite.

@sfackler (Member)

Another alternative is to just remove the Seek impl from MemWriter. It's only used by the ebml serialization, and that can be restructured to not require Seek, which would also help shrink metadata a bit.
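
For illustration: the reason the ebml code wants Seek is presumably length back-patching (reserve space for a length, write the payload, then seek back and fill the length in). One way to restructure that without Seek is to serialize the child into a scratch buffer and then emit the length and bytes strictly in order. A rough sketch only, not the actual metadata change; write_sized is a hypothetical helper:

```
fn write_sized<W: Writer>(wr: &mut W,
                          emit: |&mut MemWriter| -> IoResult<()>) -> IoResult<()> {
    // Serialize the child document into a scratch buffer first...
    let mut scratch = MemWriter::new();
    try!(emit(&mut scratch));
    let bytes = scratch.unwrap();

    // ...then write the length prefix and the payload in order,
    // so no seek-back patching is needed.
    try!(wr.write_be_u32(bytes.len() as u32));
    wr.write(bytes.as_slice())
}
```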

@erickt (Contributor, Author) commented Jul 23, 2014

@sfackler: that's a good point. I'll also look into that tonight.

@alexcrichton (Member)

Ok, I was able to reproduce your results:

test bench_log::bench_serializer0 ... bench:      1618 ns/iter (+/- 30) = 375 MB/s
test bench_log::bench_serializer1 ... bench:      2557 ns/iter (+/- 37) = 237 MB/s
test bench_log::bench_serializer2 ... bench:      1780 ns/iter (+/- 29) = 341 MB/s

The serializer0 benchmark is MemWriter with no seek (write is just push_all), serializer1 is MemWriter in the stdlib as-is today (with seek), and serializer2 is a writer that looks like:

#[inline]
fn write(&mut self, buf: &[u8]) -> io::IoResult<()> {
    if self.pos == uint::MAX {
        self.buf.push_all(buf);
        return Ok(())
    }
    self.write_slow(buf)
}

#[inline(never)] fn write_slow(...) { ... }

Here `pos` is `uint::MAX` on initialization, and if it's ever changed via seek then we fall back to the slower write impl (so the fast path is taken most of the time).

This tells me a few things:

  1. It's very important that the push_all is inlined for the speedup to occur
  2. Just an if condition is losing 100ns in the "inner loop"
  3. It looks like write is invoked very frequently

I'm not sure whether this is a case for hamstringing MemWriter (removing Seek altogether), whether the fast path/slow path version is sufficient, or whether the serializer you're using should use an internal buffer for performance.

I suppose the least painful route would be the fast/slow path, but it wouldn't get to the speed you're looking for; perhaps an internal buffer would get there? (unknown)
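
If the serializer went the internal-buffer route, it could batch bytes itself and hand larger chunks to the underlying Writer. A rough sketch of what that could look like (names such as BufferedSerializer and emit_bytes are illustrative, not taken from rust-serde):

```
struct BufferedSerializer<W> {
    wr: W,
    buf: Vec<u8>,
}

impl<W: Writer> BufferedSerializer<W> {
    fn with_capacity(wr: W, cap: uint) -> BufferedSerializer<W> {
        BufferedSerializer { wr: wr, buf: Vec::with_capacity(cap) }
    }

    fn emit_bytes(&mut self, bytes: &[u8]) -> IoResult<()> {
        // Flush to the inner writer only when the local buffer would overflow,
        // so the common case stays a cheap in-memory append.
        if self.buf.len() + bytes.len() > self.buf.capacity() {
            try!(self.wr.write(self.buf.as_slice()));
            self.buf.clear();
        }
        self.buf.push_all(bytes);
        Ok(())
    }
}
```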

@huonw (Member) commented Jul 24, 2014

I wonder if you can eke out a little more speed in the fast case with a #[cold] on write_slow? (Just glancing at the playpen now, making that change seems to reorder the branches slightly more nicely.)
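
i.e. roughly this (illustrative; write_slow is the out-of-line helper from the previous comment, with its body elided here):

```
#[cold]
#[inline(never)]
fn write_slow(&mut self, buf: &[u8]) -> io::IoResult<()> {
    // Seek-aware path: grow the buffer up to `pos`, overwrite, then append
    // (body as in the SeekableMemWriter impl quoted earlier in the thread).
    fail!("elided in this sketch")
}
```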

@alexcrichton (Member)

It may have bought a bit, but it doesn't seem to have made a noticeable impact.

@sfackler (Member)

@alexcrichton does any code outside of ebml use the Seek impl?

@alexcrichton (Member)

Not that I'm aware of, but one may logically expect memory-based readers/writers to be seekable.

@sfackler (Member)

FWIW, Go and Java's equivalents don't seek, and I honestly can't say I've ever wanted to.

@alexcrichton (Member)

I was chatting with @erickt on IRC last night, and if there's precedent for not having seek, then perhaps it won't be so bad to remove the Seek impls. Instead of adding SeekableMemWriter to std, though, I'd prefer it were added to rustc, the only place it's needed.

@lilyball (Contributor)

I'm inclined to agree: memory writers don't need to be seekable; their primary job is to buffer writes. If I need to seek, I probably already know the total size of the buffer, at which point I would just use BufWriter. Seekable + growable is perhaps too niche to bother having in libstd.

@alexcrichton (Member)

Could you move metadata::util to rustc::util? May as well try to cut down on the number of util modules. Additionally, this will need a [breaking-change] annotation on one of the commits with a clear explanation as to why Seek was removed and how to migrate code that was using Seek.

erickt added 4 commits July 29, 2014 15:50
std: rename MemWriter to SeekableMemWriter, add seekless MemWriter

[breaking-change]
@erickt (Contributor, Author) commented Jul 30, 2014

@bors: retry

bors added a commit that referenced this pull request Jul 30, 2014
std: rename MemWriter to SeekableMemWriter, add seekless MemWriter

@bors closed this Jul 30, 2014
@bors merged commit 2bcb4bd into rust-lang:master Jul 30, 2014
chris-morgan added a commit to chris-morgan/rust-http that referenced this pull request Aug 1, 2014
Its seekability wasn’t being used, and so in light of the changes made
in rust-lang/rust#15915 it should be removed, rather than switching from
the now-unseekable ``MemWriter`` to the new ``SeekableMemWriter``.

This necessitated adding the ``get_ref`` method, which ``MemWriter`` itself had gained at some point.