This repository has been archived by the owner on May 27, 2024. It is now read-only.

Add batch command line option to disable shapes.txt-based metadata #284

Closed
barbeau opened this issue Sep 28, 2017 · 2 comments

barbeau commented Sep 28, 2017

Summary:

When validating the massive Netherlands feed using the BatchProcessor, it stalls out and dies with the below error:

[main] INFO edu.usf.cutr.gtfsrtvalidator.batch.BatchProcessor - gtfs.zip read in 216.409 seconds
[main] INFO edu.usf.cutr.gtfsrtvalidator.background.GtfsMetadata - Building GtfsMetadata for E:\Git Projects\transit-feed-quality-calculator\feeds\194-The Netherlands\gtfs.zip...
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
	at com.vividsolutions.jts.geom.impl.CoordinateArraySequence.<init>(CoordinateArraySequence.java:113)
	at com.vividsolutions.jts.geom.impl.CoordinateArraySequenceFactory.create(CoordinateArraySequenceFactory.java:91)
	at com.vividsolutions.jts.geom.GeometryFactory.createMultiPoint(GeometryFactory.java:382)
	at com.vividsolutions.jts.geom.GeometryFactory.createMultiPoint(GeometryFactory.java:363)
	at org.locationtech.spatial4j.shape.jts.JtsShapeFactory$JtsMultiPointBuilder.build(JtsShapeFactory.java:351)
	at edu.usf.cutr.gtfsrtvalidator.background.GtfsMetadata.<init>(GtfsMetadata.java:135)
	at edu.usf.cutr.gtfsrtvalidator.batch.BatchProcessor.processFeeds(BatchProcessor.java:133)
	at edu.usf.cutr.transitfeedqualitycalculator.BulkFeedValidator.validateFeeds(BulkFeedValidator.java:60)
	at edu.usf.cutr.transitfeedqualitycalculator.TransitFeedQualityCalculator.calculate(TransitFeedQualityCalculator.java:74)
	at edu.usf.cutr.transitfeedqualitycalculator.Main.main(Main.java:32)

We should add a command line parameter for the batch processor to force it to use stops instead of shape points for the agency bounding box (the same logic is currently used if a GTFS feed doesn't have a shapes.txt file).
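
As a rough illustration of the fallback described above, here is a minimal sketch that computes the bounding box from stops when shapes are missing or when a flag asks for it. The class, the method names, and the `-ignoreShapes` flag are all hypothetical and are not taken from the validator's code; the actual metadata code builds spatial4j/JTS shapes rather than a plain array.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch; none of these names come from the validator's code base. */
final class BoundingBoxSketch {

    /** Minimal latitude/longitude pair standing in for a GTFS stop or shape point. */
    static final class LatLon {
        final double lat;
        final double lon;

        LatLon(double lat, double lon) {
            this.lat = lat;
            this.lon = lon;
        }
    }

    /** Returns {minLat, minLon, maxLat, maxLon}, computed in one pass over the points. */
    static double[] boundingBox(List<LatLon> points) {
        if (points.isEmpty()) {
            throw new IllegalArgumentException("at least one point is required");
        }
        double minLat = Double.POSITIVE_INFINITY;
        double minLon = Double.POSITIVE_INFINITY;
        double maxLat = Double.NEGATIVE_INFINITY;
        double maxLon = Double.NEGATIVE_INFINITY;
        for (LatLon p : points) {
            minLat = Math.min(minLat, p.lat);
            minLon = Math.min(minLon, p.lon);
            maxLat = Math.max(maxLat, p.lat);
            maxLon = Math.max(maxLon, p.lon);
        }
        return new double[] {minLat, minLon, maxLat, maxLon};
    }

    /** Uses stops when the (assumed) flag is set or when there are no shape points. */
    static double[] agencyBoundingBox(List<LatLon> stops, List<LatLon> shapePoints,
                                      boolean ignoreShapes) {
        boolean useStops = ignoreShapes || shapePoints == null || shapePoints.isEmpty();
        return boundingBox(useStops ? stops : shapePoints);
    }

    public static void main(String[] args) {
        // "-ignoreShapes" is an assumed flag name used only for this illustration.
        boolean ignoreShapes = Arrays.asList(args).contains("-ignoreShapes");
        List<LatLon> stops = Arrays.asList(new LatLon(52.0, 4.0), new LatLon(53.0, 6.0));
        List<LatLon> shapePoints = Arrays.asList(new LatLon(51.5, 3.5), new LatLon(53.5, 6.5));
        System.out.println(Arrays.toString(agencyBoundingBox(stops, shapePoints, ignoreShapes)));
    }
}
```

The single pass keeps only four doubles in memory, which is the point of the fallback: the box no longer depends on holding every shape point in one geometry.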

We might be able to extend this to the normal server mode too. Originally the server crashed when trying to run static GTFS validation on the data:
#123 (comment)

...but now that we allow the user to uncheck the GTFS validation box in the web UI, we could add a similar option to (hopefully) allow the GTFS-rt feed to be validated.

See also CUTR-at-USF/transit-feed-quality-calculator#1 (comment).

Steps to reproduce:

Run the batch validator on the Netherlands feed:

Expected behavior:

Correctly finish validation

Observed behavior:

OutOfMemoryError: Java heap space when building GtfsMetadata

Platform:

Windows 7 Enterprise SP1 w/ jdk1.8.0_73

@barbeau barbeau added this to the v1.0 milestone Sep 28, 2017

barbeau commented Sep 28, 2017

Well, if I comment out the bounding box code that uses shapes.txt, I still get another OutOfMemoryError when it's trying to build the trip shapes in memory from shapes.txt:

[main] INFO edu.usf.cutr.gtfsrtvalidator.background.GtfsMetadata - Building GtfsMetadata for C:\Users\barbeau\Dropbox\CUTR\Projects\NITC - PSU - GTFS-realtime validation tool\Development\Archived data\Netherlands\gtfs.zip...
[main] INFO edu.usf.cutr.gtfsrtvalidator.background.GtfsMetadata - Processing trips and building trip shapes for C:\Users\barbeau\Dropbox\CUTR\Projects\NITC - PSU - GTFS-realtime validation tool\Development\Archived data\Netherlands\gtfs.zip...
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
	at org.locationtech.spatial4j.shape.jts.JtsShapeFactory$CoordinatesAccumulator.pointXYZ(JtsShapeFactory.java:316)
	at org.locationtech.spatial4j.shape.jts.JtsShapeFactory$CoordinatesAccumulator.pointXY(JtsShapeFactory.java:310)
	at org.locationtech.spatial4j.shape.jts.JtsShapeFactory$JtsLineStringBuilder.pointXY(JtsShapeFactory.java:228)
	at edu.usf.cutr.gtfsrtvalidator.background.GtfsMetadata.<init>(GtfsMetadata.java:181)
	at edu.usf.cutr.gtfsrtvalidator.batch.BatchProcessor.processFeeds(BatchProcessor.java:133)
	at edu.usf.cutr.gtfsrtvalidator.Main.main(Main.java:76)

So maybe the best solution is just to turn off the shapes.txt processing altogether? This would prevent a few of the rules from executing, including:

  • E029 - Vehicle position outside trip shape buffer

But, right now we can't validate these feeds at all, so it would be an improvement...
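
One way the rule skipping could look, as a minimal sketch: each rule declares whether it needs shapes.txt data, and rules that do are filtered out when shape processing is turned off. The `Rule` class, the `requiresShapes` field, and the `EXAMPLE` rule ID are hypothetical placeholders; only E029 comes from this thread, and the validator's real rule classes are not shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch; the validator's real rule classes are structured differently. */
final class ShapeRuleFilterSketch {

    /** A rule ID plus a marker for whether the rule depends on shapes.txt data. */
    static final class Rule {
        final String id;
        final boolean requiresShapes;

        Rule(String id, boolean requiresShapes) {
            this.id = id;
            this.requiresShapes = requiresShapes;
        }
    }

    /** Keeps every rule when shapes are loaded; otherwise drops the shape-dependent ones. */
    static List<Rule> rulesToRun(List<Rule> allRules, boolean shapesLoaded) {
        List<Rule> runnable = new ArrayList<>();
        for (Rule rule : allRules) {
            if (shapesLoaded || !rule.requiresShapes) {
                runnable.add(rule);
            }
        }
        return runnable;
    }

    public static void main(String[] args) {
        List<Rule> allRules = Arrays.asList(
                new Rule("E029", true),      // vehicle position outside trip shape buffer
                new Rule("EXAMPLE", false)); // placeholder for a rule that needs no shapes
        for (Rule rule : rulesToRun(allRules, false)) {
            System.out.println("Would run " + rule.id);
        }
    }
}
```

Filtering up front, rather than letting each rule fail on missing data, also makes it easy to report which rules were skipped for a given run.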

@barbeau barbeau changed the title Add command line option to use stops instead of shape points for bounding box Add command line option to disable shapes.txt-based metadata Sep 28, 2017

barbeau commented Sep 29, 2017

Ok, looks like eliminating all shapes.txt processing works - if I comment out that code, the validator can validate the Netherlands feed in batch mode:

[main] INFO edu.usf.cutr.gtfsrtvalidator.batch.BatchProcessor - Starting batch processor...
[main] INFO edu.usf.cutr.gtfsrtvalidator.batch.BatchProcessor - Reading GTFS data from C:\Users\barbeau\Dropbox\CUTR\Projects\NITC - PSU - GTFS-realtime validation tool\Development\Archived data\Netherlands\gtfs.zip...
[main] INFO edu.usf.cutr.gtfsrtvalidator.batch.BatchProcessor - gtfs.zip read in 247.259 seconds
[main] INFO edu.usf.cutr.gtfsrtvalidator.background.GtfsMetadata - Building GtfsMetadata for C:\Users\barbeau\Dropbox\CUTR\Projects\NITC - PSU - GTFS-realtime validation tool\Development\Archived data\Netherlands\gtfs.zip...
[main] INFO edu.usf.cutr.gtfsrtvalidator.background.GtfsMetadata - Processing trips and building trip shapes for C:\Users\barbeau\Dropbox\CUTR\Projects\NITC - PSU - GTFS-realtime validation tool\Development\Archived data\Netherlands\gtfs.zip...
[main] INFO edu.usf.cutr.gtfsrtvalidator.background.GtfsMetadata - Trips polylines processed for C:\Users\barbeau\Dropbox\CUTR\Projects\NITC - PSU - GTFS-realtime validation tool\Development\Archived data\Netherlands\gtfs.zip in 7.485 seconds
[main] INFO edu.usf.cutr.gtfsrtvalidator.background.GtfsMetadata - Repeated stop_ids for trips in stop_times.txt processed for C:\Users\barbeau\Dropbox\CUTR\Projects\NITC - PSU - GTFS-realtime validation tool\Development\Archived data\Netherlands\gtfs.zip in 17.19 seconds
[main] INFO edu.usf.cutr.gtfsrtvalidator.background.GtfsMetadata - Built GtfsMetadata for C:\Users\barbeau\Dropbox\CUTR\Projects\NITC - PSU - GTFS-realtime validation tool\Development\Archived data\Netherlands\gtfs.zip in 37.25 seconds
[main] INFO edu.usf.cutr.gtfsrtvalidator.batch.BatchProcessor - Sorting GTFS-rt files by DATE_MODIFIED...
[main] INFO edu.usf.cutr.gtfsrtvalidator.batch.BatchProcessor - Read OVapi Vehicle Positions-1506548048426.pb to byte array in 0.018 seconds
[main] INFO edu.usf.cutr.gtfsrtvalidator.batch.BatchProcessor - Decoded OVapi Vehicle Positions-1506548048426.pb protobuf in 0.177 seconds
...

@barbeau barbeau self-assigned this Sep 29, 2017
@barbeau barbeau changed the title Add command line option to disable shapes.txt-based metadata Add batch command line option to disable shapes.txt-based metadata Oct 10, 2017