Test lucene FixedBitSet - Experiment #4384

t2gran · 2022-08-17T13:43:58Z

Summary

Lucene

Norway - Vincent´s machine
 Worker: 
 ==> mc_destination : [  911,  826,  884,  836 ] Avg: 864,3

Total:  
 ==> mc_destination : [ 1397, 1340, 1361, 1319 ] Avg: 1354,3

Baden-wuerttemberg, Performance test 
 Worker: 
 ==> mc_destination : [  614,  607,  604,  603 ] Avg: 607.0

Total:  
 ==> mc_destination : [  775,  765,  762,  760 ] Avg: 765.5

JDK

Norway - Vincent´s machine
 Worker: 
 ==> mc_destination : [  844,  775,  827,  856 ] Avg: 825,5

Total:  
 ==> mc_destination : [ 1321, 1225, 1300, 1292 ] Avg: 1284,5
 
 Baden-wuerttemberg, Performance test 
Worker: 
 ==> mc_destination : [  612,  602,  602,  603 ] Avg: 604.8

Total:  
 ==> mc_destination : [  769,  756,  755,  757 ] Avg: 759.3

Not as good as we thought ...

https://stackoverflow.com/questions/37080045/java-jdk-bitset-vs-lucene-openbitset

hannesj · 2022-08-17T13:55:07Z

src/main/java/org/opentripplanner/transit/raptor/util/LuceneBitSetIterator.java

+  public int next() {
+    int index = nextIndex;
+    nextIndex =
+      index == set.length() - 1 ? DocIdSetIterator.NO_MORE_DOCS : set.nextSetBit(index + 1);


Could you check the performance with SparseFixedBitSet, In that case the nextSetBit could be faster, since it checks each 4096-bit block only if at least one value is set, and it knows directly in which 64-bit segment. the regular FixedBitSet needs to check each block. It might be that linear scanning is still faster, as most of the data is already in the cache.

Seems to be better than org.apache.lucene.util.FixedBitSet, but still worse than java.util.BitSet

java.util.BitSet Worker: ==> mc_destination : [ 464, 441, 440, 444 ] Avg: 447.3 Total: ==> mc_destination : [ 772, 705, 709, 711 ] Avg: 724.3 org.apache.lucene.util.FixedBitSet Worker: ==> mc_destination : [ 480, 459, 465, 466 ] Avg: 467.5 Total: ==> mc_destination : [ 779, 741, 750, 750 ] Avg: 755.0 org.apache.lucene.util.SparseFixedBitSet Worker: ==> mc_destination : [ 480, 447, 449, 452 ] Avg: 457.0 Total: ==> mc_destination : [ 788, 726, 722, 720 ] Avg: 739.0

leonardehrenfried · 2022-08-18T08:55:51Z

Right now, I don't see a noticeable speed up in the speed test in Germany either: https://tinyurl.com/2nhldzku

t2gran · 2022-08-18T10:01:55Z

src/main/java/org/opentripplanner/transit/raptor/rangeraptor/standard/besttimes/BestTimes.java

@@ -66,21 +67,21 @@ public int transitArrivalTime(int stop) {
   * @return true if at least one stop arrival was reached last round (best overall).
   */
  public boolean isCurrentRoundUpdated() {
-    return !reachedCurrentRound.isEmpty();
+    return reachedCurrentRound.cardinality() > 0;


This is slow, but should not have a huge impact. We can look for the first bit set or keep a flag in the BestTimes to improve this.

t2gran · 2022-08-23T11:27:02Z

In the profiling @leonardehrenfried did on the German data a while a go the BestTime.setTime() -> java.util.BitSet.expandTo(int) showed up, but it does not show up in the profiling we have done on the Entur case. Is this something we should investigate more?

leonardehrenfried · 2022-08-23T11:47:50Z

Did I say that? In Grafana I could not see a difference.

t2gran · 2022-08-23T12:27:14Z

I got a profiling jfr file from you a while back witch showed this - at least I though I got it from you.
jar_2022_07_18_144732.jfr.zip

leonardehrenfried · 2022-08-23T12:56:59Z

Ah, I misunderstood. I thought that you referred to the speed test on CI which doesn't run on the data set for entire Germany. I will check that.

t2gran · 2022-08-23T14:31:10Z

It might be that the whole of Germany exceeds a threshold, hence trigger the expandTo. This can probably be avoided with a small change of the java BitSet and improve the performance for very large data sets.

leonardehrenfried · 2022-08-23T14:32:39Z

I'm currently setting up a test scenario for CI where I load all of Germany's transit data.

leonardehrenfried · 2022-08-24T08:55:52Z

I'm adding the Germany-wide data set to the CI speed test in this PR: #4408

Once that is merged, we can run the numbers with this branch and check the performance.

t2gran · 2022-08-24T09:59:30Z

I striped the Java BitSet for all checks and made a simpler version, it would be interesting to test that as well. For Norway it performs exactly as the Java BitSet. The expandTo(int) method is gone in FBitSet (my striped version). I branched of the same commit as the otp2_use_lucene_fixedbitset.

The code is in the otp2_use_stripped_java_bitset branch.

leonardehrenfried · 2022-08-24T10:12:01Z

Ok, if you could give #4408 a review then we have some baseline data to compare it to. The PR also contains the generation of JFR files which IntelliJ can use to show flame graphs.

leonardehrenfried · 2022-08-24T12:02:12Z

The results for all of Germany are in.

dev-2.x

CI run: https://github.com/opentripplanner/OpenTripPlanner/actions/runs/2917645778

 Worker: 
 ==> mc_destination : [ 7331 ] Avg: 7331.0

Total:  
 ==> mc_destination : [ 7962 ] Avg: 7962.0

Stripped BitSet

CI run: https://github.com/opentripplanner/OpenTripPlanner/actions/runs/2918381468

Worker: 
 ==> mc_destination : [ 7124 ] Avg: 7124.0

Total:  
 ==> mc_destination : [ 7767 ] Avg: 7767.0

You can indeed see that it's gone from the flame graph:

It's definitely an improvement but not a huge one.

hannesj · 2022-08-25T07:02:52Z

In the Norwegian dataset, the BitSet in routeIndexIterator is the one consuming most time, especially the BitSetIterator coupled to it. It might make sense to test the different BitSet implementations in it

t2gran · 2022-11-04T12:03:07Z

I will close this one now, thanks to everyone for testing this.

vpaturet added 2 commits August 17, 2022 15:06

Use Lucene BitSet

6c41b1a

Fix SpeedTest

372adcd

t2gran added the Optimization The feature is to improve performance. label Aug 17, 2022

t2gran added this to the 2.2 milestone Aug 17, 2022

t2gran requested a review from a team as a code owner August 17, 2022 13:43

t2gran changed the title ~~Ese lucene FixedBitSet - Experiment~~ Test lucene FixedBitSet - Experiment Aug 17, 2022

t2gran marked this pull request as draft August 17, 2022 13:47

hannesj reviewed Aug 17, 2022

View reviewed changes

t2gran commented Aug 18, 2022

View reviewed changes

leonardehrenfried mentioned this pull request Aug 24, 2022

Raptor Heuristics performance #3748

Closed

t2gran modified the milestones: 2.2, 2.3, Rejected Nov 1, 2022

t2gran closed this Nov 4, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Test lucene FixedBitSet - Experiment #4384

Test lucene FixedBitSet - Experiment #4384

t2gran commented Aug 17, 2022 •

edited

Loading

hannesj Aug 17, 2022

hannesj Aug 18, 2022

leonardehrenfried commented Aug 18, 2022

t2gran Aug 18, 2022

t2gran commented Aug 23, 2022

leonardehrenfried commented Aug 23, 2022

t2gran commented Aug 23, 2022

leonardehrenfried commented Aug 23, 2022

t2gran commented Aug 23, 2022

leonardehrenfried commented Aug 23, 2022

leonardehrenfried commented Aug 24, 2022

t2gran commented Aug 24, 2022 •

edited

Loading

leonardehrenfried commented Aug 24, 2022

leonardehrenfried commented Aug 24, 2022

hannesj commented Aug 25, 2022

t2gran commented Nov 4, 2022

Test lucene FixedBitSet - Experiment #4384

Test lucene FixedBitSet - Experiment #4384

Conversation

t2gran commented Aug 17, 2022 • edited Loading

Summary

hannesj Aug 17, 2022

Choose a reason for hiding this comment

hannesj Aug 18, 2022

Choose a reason for hiding this comment

leonardehrenfried commented Aug 18, 2022

t2gran Aug 18, 2022

Choose a reason for hiding this comment

t2gran commented Aug 23, 2022

leonardehrenfried commented Aug 23, 2022

t2gran commented Aug 23, 2022

leonardehrenfried commented Aug 23, 2022

t2gran commented Aug 23, 2022

leonardehrenfried commented Aug 23, 2022

leonardehrenfried commented Aug 24, 2022

t2gran commented Aug 24, 2022 • edited Loading

leonardehrenfried commented Aug 24, 2022

leonardehrenfried commented Aug 24, 2022

dev-2.x

Stripped BitSet

hannesj commented Aug 25, 2022

t2gran commented Nov 4, 2022

t2gran commented Aug 17, 2022 •

edited

Loading

t2gran commented Aug 24, 2022 •

edited

Loading