Add tests for big feeds #123
Comments
If I try to run the Dutch feed with
So it looks like it's getting hung up in the static GTFS validation using the Conveyal gtfs-validator. If I run the Dutch GTFS-rt feed with random GTFS data (I used HART in Tampa), then it processes each GTFS-rt iteration in about 1.1 seconds. Using MBTA data, it processes each GTFS-rt iteration in about 1.1 seconds as well.
Here's a good list of GTFS-rt feeds from Transitfeeds.com:
Transitland issue for adding support for GTFS-rt feeds - https://github.com/transitland/transitland/issues/77.
We could use the batch processor for benchmarking feed processing times - see README "Configuration options -> Batch processing":
@barbeau did you try running the out-of-memory dataset using a profiler?
No, not yet.
A good approach for this might be to graph performance on each PR instead of imposing hard limits via a unit test - that's what OpenTripPlanner is doing here:
DELFI e.V. is a non-profit that aggregates the transit datasets of all the local transit authorities/providers to create a unified feed for Germany. Its official role is to publish NeTEx, as mandated by the EU regulation. But it also publishes a GTFS feed generated from the merged data, which is currently 333 MB in size. Its official site doesn't provide a direct & script-friendly URL for it (🙄), but @juliuste kindly mirrors it to Currently, it is not much larger than the Dutch feed, but since over the coming months & years, missing regions as well as lots of stop/station & Edit: Unfortunately, to my knowledge, there are no realtime feeds available right now.
Summary:
We need to make sure that as we add new rules, the validator can continue to run in real-time on production-sized feeds for major cities.
I posted a question on the GTFS-rt list asking for examples of very large feeds:
https://groups.google.com/forum/#!topic/gtfs-realtime/mM8cQIIV_-Y
These have been suggested to me so far, with the largest coming first:
We should add some unit tests that do basic benchmarking to ensure we're not exceeding a given duration when processing feeds. I think 2 seconds may be reasonable, but we'll need to test. We'll also need to figure out how this works for CI, as Travis is significantly underpowered when compared to a typical desktop.
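As a rough sketch of what such a benchmark check could look like (written in Python for brevity and not tied to the validator's actual code or test framework): time several iterations, compare the median against a budget, and scale the budget on slower machines such as CI. The function names and the `BENCHMARK_SLOWDOWN` environment variable are hypothetical, and the 2-second figure is just the untested guess from above.

```python
import os
import time
from typing import Callable, List, Mapping


def time_iterations(process: Callable[[], None], iterations: int = 5) -> List[float]:
    """Run `process` several times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(iterations):
        start = time.perf_counter()
        process()
        durations.append(time.perf_counter() - start)
    return durations


def effective_budget(base_seconds: float, env: Mapping[str, str] = os.environ) -> float:
    """Scale the budget by a slowdown factor for underpowered machines.

    BENCHMARK_SLOWDOWN is a hypothetical variable that a CI job could set,
    e.g. to 3 if CI is roughly three times slower than a desktop.
    """
    factor = float(env.get("BENCHMARK_SLOWDOWN", "1.0"))
    return base_seconds * factor


def check_within_budget(durations: List[float], budget_seconds: float) -> bool:
    """Compare the (upper) median against the budget.

    Using the median means a single slow iteration, such as a warm-up run,
    does not fail the whole check.
    """
    ordered = sorted(durations)
    median = ordered[len(ordered) // 2]
    return median <= budget_seconds


if __name__ == "__main__":
    # Placeholder workload; a real test would process one GTFS-rt iteration here.
    def fake_iteration() -> None:
        time.sleep(0.01)

    durations = time_iterations(fake_iteration, iterations=3)
    budget = effective_budget(2.0)  # 2 seconds is the untested guess from this issue
    print("within budget:", check_within_budget(durations, budget))
```

Keeping the slowdown factor outside the test itself would let the same check run on a desktop and on CI without editing the threshold.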