Raptor Heuristics performance #3748
Might be related to #3664
After adding the speed test for all of Germany today, I noticed that this is a critical issue for very large data sets. You can see that in this data set (500k stops, 200k patterns) the heuristics request takes up about 60% of the routing time. More reading: #4384 (comment)
For huge networks such as this one, it might not be possible to have a heuristic that reaches most stops and patterns (and the min-transfer + 5 exit condition of the existing heuristic is likely to reach almost all stops). I think an additional precomputed heuristic that prunes the graph down to the relevant patterns would help. For instance, clustering connecting patterns by operator and computing the best generalised score between these clusters could help. In many cases you don't need much more than a handful of operators out of the hundreds that operate at a national level: the city/regional operators around the origin and destination, the national operators in between, plus a few connecting ones (ferries, regional trains, etc.). The challenge is to precompute this list properly by taking into account all transit times, days of operation and RT information.
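A minimal sketch of what such a precomputed operator-cluster heuristic could look like, assuming a hypothetical `generalized_cost(a, b)` estimate between two patterns and an arbitrary `max_cost` threshold (none of this is existing OTP code):

```python
from collections import defaultdict

def build_cluster_costs(patterns, generalized_cost):
    """Precompute, for every ordered pair of operator clusters, the best
    (lowest) generalized cost between any two of their patterns.
    `patterns` is an iterable of objects with `.operator` attributes."""
    clusters = defaultdict(list)
    for p in patterns:
        clusters[p.operator].append(p)

    best_costs = {}
    for op_a, pats_a in clusters.items():
        for op_b, pats_b in clusters.items():
            if op_a == op_b:
                continue
            best_costs[(op_a, op_b)] = min(
                generalized_cost(a, b) for a in pats_a for b in pats_b
            )
    return best_costs

def relevant_operators(best_costs, origin_ops, destination_ops, max_cost):
    """Keep the origin/destination operators plus any operator whose best
    cluster-to-cluster cost to one of them stays below max_cost
    (one connecting hop only, for simplicity of the sketch)."""
    anchors = set(origin_ops) | set(destination_ops)
    keep = set(anchors)
    for (op_a, op_b), cost in best_costs.items():
        if cost <= max_cost and op_a in anchors:
            keep.add(op_b)
    return keep
```

This is only an illustration of the clustering idea; a real implementation would also have to account for transit times, days of operation and real-time data as noted above.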
Based on your observation that reducing the input data set for Raptor leads to a speed-up, I wrote an experimental filter that takes the bounding box of the start and destination coordinates and checks whether a route is inside it. The result of that filtering is then fed to Raptor. The code is at leonardehrenfried@c973c04. If you have a huge network but only need to route a few km, it leads to a dramatic speed-up. Of course there are lots of ways this could go wrong. We discussed this in a dev meeting, and @t2gran and @abyrd said that instead of this filter they think improving the heuristics themselves would lead to better results.
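The linked commit does this inside OTP itself; as a standalone illustration of the idea (independent of OTP's internal types, with a made-up padding value and helper names), the filtering could look roughly like this:

```python
def bounding_box(from_coord, to_coord, padding_deg=0.1):
    """Bounding box around origin and destination, padded a little so routes
    that briefly leave the box are not cut off. Coordinates are (lat, lon)
    tuples; the padding value here is an arbitrary example."""
    lats = (from_coord[0], to_coord[0])
    lons = (from_coord[1], to_coord[1])
    return (min(lats) - padding_deg, min(lons) - padding_deg,
            max(lats) + padding_deg, max(lons) + padding_deg)

def route_intersects_box(route_stops, box):
    """True if at least one stop of the route lies inside the box."""
    min_lat, min_lon, max_lat, max_lon = box
    return any(min_lat <= lat <= max_lat and min_lon <= lon <= max_lon
               for lat, lon in route_stops)

def filter_routes(routes_to_stops, from_coord, to_coord):
    """routes_to_stops maps route id -> list of (lat, lon) stop coordinates.
    Returns only the routes that touch the padded bounding box; this reduced
    set is what would be fed to Raptor."""
    box = bounding_box(from_coord, to_coord)
    return {rid: stops for rid, stops in routes_to_stops.items()
            if route_intersects_box(stops, box)}
```

The obvious failure mode, as noted above, is that the optimal journey may use a route that detours outside the box, which is why a fixed padding is only a crude workaround compared to better heuristics.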
I'm having the same problem with country-wide slowness in Colombia. I've got around 120 GTFS feeds with a lot of frequency-based trips.
If you want to tackle this in a systematic manner, then you could contribute a speed test data set for Colombia, like I have done for Germany: #4408
@leonardehrenfried How did you generate the CSV? Cut and paste?
I wrote a little Python script that parses stops.txt and selects random-ish stop ids for start and destination.
Can you share it?
https://gist.github.com/leonardehrenfried/f9ba0f585676be35fb9a3fbee5d0241a You need to remove the Germany-specific parts. This one does stop-to-stop routing requests only, which is actually not a very good simulation of real workloads, where you potentially have many start/end stops. So you should remove the fromPlace and toPlace columns if you want something more realistic. Also, maybe move the coordinates a little bit away from the stop coordinates.
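This is not the linked gist, but a rough sketch of such a generator under the assumptions above: the output column names and the coordinate jitter amount are illustrative and would need to match whatever CSV format the speed test expects.

```python
import csv
import random

def generate_testcases(stops_txt, out_csv, n=100, jitter_deg=0.002):
    """Pick random stop pairs from a GTFS stops.txt and write them out as
    coordinate-to-coordinate test cases."""
    with open(stops_txt, newline="", encoding="utf-8") as f:
        stops = [row for row in csv.DictReader(f)
                 if row.get("stop_lat") and row.get("stop_lon")]

    with open(out_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["testCaseId", "fromLat", "fromLon", "toLat", "toLon"])
        for i in range(n):
            a, b = random.sample(stops, 2)
            # Move the coordinates slightly away from the stop itself so the
            # request behaves more like a real door-to-door search.
            writer.writerow([
                i,
                float(a["stop_lat"]) + random.uniform(-jitter_deg, jitter_deg),
                float(a["stop_lon"]) + random.uniform(-jitter_deg, jitter_deg),
                float(b["stop_lat"]) + random.uniform(-jitter_deg, jitter_deg),
                float(b["stop_lon"]) + random.uniform(-jitter_deg, jitter_deg),
            ])
```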
Looks good. If you want to run this on every merge, you can submit a PR. However, there is no guarantee that a core developer will do anything about it. Nevertheless, it's good to track the problematic performance cases.
This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 30 days |
In the past 6 months the focus has been on functionality, not performance. I ran the SpeedTest today, and the Raptor heuristics in particular seem to be much slower than they used to be. My benchmark uses the same 27 routing requests and runs them all 8 times to get good average results:
This chart shows some of the key numbers compared with older versions of OTP2:
[Chart: Average Response Times]
[Chart: Raptor request distribution]
Version of OTP used (exact commit hash or JAR name)
Update all timers to micrometer instances #3744
Data sets in use (links to GTFS and OSM PBF files)
An old Entur dataset used as a benchmark - I use the same dataset every time.
Steps to reproduce the problem
Run SpeedTest with the same dataset and config with different versions of OTP.
Parameters
-p md -n 8