-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/adapt for restart #2
base: master
Are you sure you want to change the base?
Conversation
avoid crashing on errors extracting data from s3; show s3 path that caused error
state(agencyId: "trimet" | ||
, startTime: 1642867201 | ||
, endTime: 1642867500 | ||
, routes: ["100"]) { | ||
agencyId | ||
startTime | ||
routes { | ||
routeId |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be consistent, can we either change agencyId and routeId to agencyID and routeID, or change vehicleID and tripID to vehicleId and tripId?
* @param agencyId - String | ||
* @param currentTime - Number | ||
* @return prefix - String | ||
*/ | ||
function getBucketMinutePrefix(agencyId, currentTime) { | ||
function getBucketHourPrefix(agencyId, currentTime) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The previous S3 path format with the minute was used because the S3 API allows searching for keys by prefix, and including the minute in the prefix of the path makes it possible to fetch data with a granularity of one minute. opentransit-metrics fetches data from opentransit-state-api in chunks that are not necessarily multiples of 1 hour.
If the minute was removed from the bucket prefix, it would require searching the S3 for keys with a granularity of 1 hour and then filtering within those keys to find keys with the requested minutes. I'm not sure that there is a drawback to keeping the minute in the prefix so it's probably easier that way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After reviewing the existing code in getVehiclePaths, I see that it makes separate s3.listObject requests to get the list of keys for every minute within the time range, which is a lot more API requests than necessary and might reduce performance. Now I think it probably would be faster to search for a keys with a granularity of 1 hour and then filter within those keys to find the requested minutes.
const hour = currentDateTime.getUTCHours(); | ||
const minute = currentDateTime.getUTCMinutes(); | ||
return `${agencyId}/${year}/${month}/${day}/${hour}/${minute}/`; | ||
const pacificDateTime = convertTZ(currentDateTime, 'America/Los_Angeles'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using America/Los_Angeles as the time zone could cause issues related to daylight savings since when switching from PDT to PST in the fall the there would be two hours with the same key prefix.
Using America/Los_Angeles would also be confusing when deploying opentransit-state-api for transit agencies that are not in Pacific time. It is also possible that a single instance of opentransit-state-api could support transit agencies in multiple time zones. It would probably be simpler to keep the S3 keys in UTC.
Updating TrynAPI schema, resolver, and s3helper code to align with Trimet API and new S3 bucket configuration.
Main Updates
${agencyId}/${year}/${month}/${day}/${hour}/${minute}/
${agencyId}/${year}/${month}/${day}/${hour}/