
deserialization perf improvement #81

Closed
wants to merge 1 commit

Conversation

@zigzago (Contributor) commented Jun 13, 2018

I've run a deserialization performance benchmark with JMH and found that most of the time is spent in LittleEndianInputStream#readUTF. With this PR I see a ~15% improvement.

  • Instead of creating a new decoder each time, cache it in StaticBuffers and reuse it (the main improvement; see the sketch below)
  • There is always an underflow at the end of the decode. If len == 0, skip the compact step, as it is not useful there (this avoids a System.arraycopy and a byte array allocation)
  • LittleEndianInputStream is only used by BsonParser, which is not thread-safe, so it seems reasonable to cache the StaticBuffers instance in the stream and avoid a ThreadLocal lookup for every string deserialization (except the first one)
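
As a rough illustration of the first point, here is a minimal sketch of the decoder-caching idea; the class and method names are illustrative, not the actual bson4jackson/StaticBuffers internals:

import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.StandardCharsets;

// Illustrative sketch only; not the real StaticBuffers/LittleEndianInputStream code
class CachedUtf8Decoder {
  // One decoder created once and reused per (single-threaded) parser,
  // instead of calling StandardCharsets.UTF_8.newDecoder() for every string
  private final CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();

  String decode(byte[] raw, int len) {
    decoder.reset();
    ByteBuffer in = ByteBuffer.wrap(raw, 0, len);
    // UTF-8 yields at most one char per byte, so len chars is always enough
    CharBuffer out = CharBuffer.allocate(len);
    decoder.decode(in, out, true);
    decoder.flush(out);
    out.flip();
    return out.toString();
  }
}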

HTH

@michel-kraemer (Owner)

Thanks for the PR. It looks very promising. I would like to run some experiments myself before merging it. I'll get back to you as soon as possible.

@michel-kraemer (Owner)

By the way, a 15% improvement is a lot. Do you have a test script or a test dataset so I can try to reproduce the benchmark?

@zigzago (Contributor, Author) commented Jun 15, 2018

My benchmark targets KMongo (CodecRegistryBenchmark and KMongoBenchmark) and is copied from the Jongo benchmark; a rough sketch of the harness shape is below.

You can see in this first result that the "jacksonFriend" score is 123775 with the current bson4jackson version. With the patch I get ~142000.
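
For reference, here is a minimal sketch of what such a JMH harness roughly looks like; this is not the actual KMongo/Jongo benchmark code, and the sample document is made up for illustration:

import java.util.Collections;
import java.util.Map;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

import com.fasterxml.jackson.databind.ObjectMapper;
import de.undercouch.bson4jackson.BsonFactory;

@State(Scope.Benchmark)
public class BsonDeserBenchmark {
  private ObjectMapper mapper;
  private byte[] friendAsBytes;

  @Setup
  public void setup() throws Exception {
    BsonFactory fac = new BsonFactory();
    mapper = new ObjectMapper(fac);
    fac.setCodec(mapper);
    // serialize a small sample document once, so the benchmark only measures reads
    friendAsBytes = mapper.writeValueAsBytes(Collections.singletonMap("name", "John"));
  }

  @Benchmark
  public Map<?, ?> parseFriend() throws Exception {
    return mapper.readValue(friendAsBytes, Map.class);
  }
}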

@michel-kraemer (Owner)

Thanks for the test data. I ran my own benchmarks with your test data and saw an improvement of at most 2-3%. Sometimes I didn't see any improvement at all, or it was within the error margin. This is probably due to the way we tested: I didn't use a benchmark framework, I just used a for loop and measured the elapsed milliseconds.

Anyway, I'm still thinking about the implications of the change. Since you're caching StaticBuffers in LittleEndianInputStream, it may behave differently in a multi-threaded environment. In practice, this should not make a difference, since it's an internal class, but you never know who uses it where and for what purpose. So I'm wondering whether this is a breaking change and whether it's worth the improvement (if there is any at all).

Here's my benchmark code so you can reproduce my tests:

@Test
public void parseFriend() throws Exception {
  long start = System.currentTimeMillis();
  for (int i = 0; i < 100000; i++) {
    ByteArrayInputStream bais = new ByteArrayInputStream(FRIEND_AS_BYTE);
    BsonFactory fac = new BsonFactory();
    ObjectMapper mapper = new ObjectMapper(fac);
    fac.setCodec(mapper);
    Map<?, ?> m = mapper.readValue(bais, Map.class);
  }
  System.out.println((System.currentTimeMillis() - start));

  start = System.currentTimeMillis();
  for (int i = 0; i < 100000; i++) {
    ByteArrayInputStream bais = new ByteArrayInputStream(FRIEND_AS_BYTE);
    BsonFactory fac = new BsonFactory();
    ObjectMapper mapper = new ObjectMapper(fac);
    fac.setCodec(mapper);
    Map<?, ?> m = mapper.readValue(bais, Map.class);
  }
  System.out.println((System.currentTimeMillis() - start));

  start = System.currentTimeMillis();
  for (int i = 0; i < 100000; i++) {
    ByteArrayInputStream bais = new ByteArrayInputStream(FRIEND_AS_BYTE);
    BsonFactory fac = new BsonFactory();
    ObjectMapper mapper = new ObjectMapper(fac);
    fac.setCodec(mapper);
    Map<?, ?> m = mapper.readValue(bais, Map.class);
  }
  System.out.println((System.currentTimeMillis() - start));

  start = System.currentTimeMillis();
  for (int i = 0; i < 100000; i++) {
    ByteArrayInputStream bais = new ByteArrayInputStream(FRIEND_AS_BYTE);
    BsonFactory fac = new BsonFactory();
    ObjectMapper mapper = new ObjectMapper(fac);
    fac.setCodec(mapper);
    Map<?, ?> m = mapper.readValue(bais, Map.class);
  }
  System.out.println((System.currentTimeMillis() - start));
}

Maybe I did something wrong and my tests are useless? :) Let me know what you think.

@zigzago (Contributor, Author) commented Jun 21, 2018

I think there is an issue with your test. In a real app you usually have only one ObjectMapper for the whole application, since it is thread-safe and costly to instantiate. Here you are measuring the ObjectMapper instantiation cost more than the BSON deserialization performance.

From my point of view, a more accurate test would look like this:

   @Test
    public void parseFriend() throws Exception {
        BsonFactory fac = new BsonFactory();
        ObjectMapper mapper = new ObjectMapper(fac);
        fac.setCodec(mapper);

        long start = System.currentTimeMillis();
        for (int i = 0; i < 1000000; i++) {
            ByteArrayInputStream bais = new ByteArrayInputStream(FRIEND_AS_BYTE);

            Map<?, ?> m = mapper.readValue(bais, Map.class);
        }
        System.out.println((System.currentTimeMillis() - start));

        start = System.currentTimeMillis();
        for (int i = 0; i < 1000000; i++) {
            ByteArrayInputStream bais = new ByteArrayInputStream(FRIEND_AS_BYTE);
            Map<?, ?> m = mapper.readValue(bais, Map.class);
        }
        System.out.println((System.currentTimeMillis() - start));

        start = System.currentTimeMillis();
        for (int i = 0; i < 1000000; i++) {
            ByteArrayInputStream bais = new ByteArrayInputStream(FRIEND_AS_BYTE);
            Map<?, ?> m = mapper.readValue(bais, Map.class);
        }
        System.out.println((System.currentTimeMillis() - start));

        start = System.currentTimeMillis();
        for (int i = 0; i < 1000000; i++) {
            ByteArrayInputStream bais = new ByteArrayInputStream(FRIEND_AS_BYTE);
            Map<?, ?> m = mapper.readValue(bais, Map.class);
        }
        System.out.println((System.currentTimeMillis() - start));
    }

And with this fixed test I get better results :)

Without the patch (times in ms):
6745
5836
5792
5688

With the patch (times in ms):
5506
4804
4847
4871

Let me know if you get the same results.

@zigzago (Contributor, Author) commented Jul 3, 2018

@michel-kraemer were you able to reproduce the perf improvement with the updated test? I plan to publish a benchmark soon, and it would be great to have this improvement included (if you think it makes sense, of course).

michel-kraemer added a commit that referenced this pull request Jul 28, 2018
@michel-kraemer (Owner)

Yes, I was able to reproduce it, but as I said, I'm still worried about the possible implications of caching StaticBuffers in LittleEndianInputStream.

I did some experiments and noticed that caching the CharsetDecoder matters the most; the other changes improve performance almost unnoticeably (if at all). So I just pushed 32d2ab3. Please have a look at it.

stale bot commented Sep 26, 2018

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in two weeks if no further activity occurs. Thank you for your contributions.

stale bot added the stale label Sep 26, 2018
stale bot commented Oct 10, 2018

This issue has been automatically closed due to inactivity. Please feel free to reopen it if necessary. We apologize for any inconvenience caused.

stale bot closed this Oct 10, 2018