Enable providers to bucket data by fixed time windows #354

Closed

@babldev
Contributor

babldev commented Aug 22, 2019

Background

We've seen requests from Providers for a Provider API that is easier to implement; see #268.

According to @johnpena:

We've seen the latency across our API endpoints creep up on us as more agencies have adopted MDS and as more trips have been taken and added to our trips and status changes datasets.

In particular, if we could present a way for a user to ask for a specific day or hour of data, it would allow us to resolve the query ahead of time and return the result to them much faster.

If Providers are able to bucket data into files of fixed time windows, and Provider API queries only touch a single bucket at a time, we can make our response times super fast! 💨
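
As a rough illustration of the idea (not part of the spec), the sketch below groups records into UTC-hour buckets so that a query for one hour reads exactly one bucket. The record shape, the millisecond `end_time` field used as the bucketing key, and the helper names `hour_bucket_key`, `bucket_trips`, and `lookup` are all assumptions made for this example.

```python
from collections import defaultdict
from datetime import datetime, timezone


def hour_bucket_key(ts_ms):
    """Map a millisecond Unix timestamp to its UTC hour bucket, e.g. '2019-08-22T17'."""
    dt = datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc)
    return dt.strftime("%Y-%m-%dT%H")


def bucket_trips(trips):
    """Group trip records by the UTC hour of their (assumed) end_time field."""
    buckets = defaultdict(list)
    for trip in trips:
        buckets[hour_bucket_key(trip["end_time"])].append(trip)
    return dict(buckets)


def lookup(buckets, key):
    """Single-bucket lookup: an hour with no data yields an empty list."""
    return buckets.get(key, [])
```

In a real deployment each bucket would more likely be a precomputed file or object keyed by the same string, so answering a query never requires scanning across windows.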

Proposal

This PR adds some restrictions and clarifications to MDS querying behavior so that Providers can bucket data into fixed time windows and serve each query with an efficient "single file" lookup under the hood.

Remove the ability to query by vehicle_id or device_id
It's difficult to search on these fields if our data is indexed solely by time window. From the discussion in #268, it sounds like these filters aren't really used.

Add clarifying language that the number of results per page may vary, or even be zero.
This isn't technically a breaking change, but I thought it would be valuable to state that when Providers bucket data by time window, some buckets may contain no data and the amount of data per bucket may vary.
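
For consumers, the practical consequence is that an empty page no longer signals the end of the data. A minimal sketch of a paging loop that tolerates this is below. It assumes a response shaped like `{"data": {"trips": [...]}, "links": {"next": ...}}`, and `fetch_page` is a hypothetical callable standing in for the HTTP request.

```python
def fetch_all(first_url, fetch_page, payload_key="trips"):
    """Collect records across pages, continuing through empty pages.

    fetch_page(url) is a hypothetical callable that returns the decoded
    JSON body of one page as a dict.
    """
    records = []
    url = first_url
    while url:
        page = fetch_page(url)
        records.extend(page.get("data", {}).get(payload_key, []))
        # An empty page is not the end; only a missing "next" link is.
        url = (page.get("links") or {}).get("next")
    return records
```

The loop stops only when there is no `next` link, so a bucket with zero results in the middle of the range does not cut the download short.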

Is this a breaking change

Yes, but hopefully with limited impact given my understanding of current Provider API usage.

Provider or agency

Which API(s) will this pull request impact:

  • provider

@babldev
Contributor Author

babldev commented Aug 22, 2019

@hunterowens here's the proposal I promised from the last MDS call!

A heads up: I'm going to be on extended leave starting next week, so I may be unresponsive on this thread (and will unfortunately miss tomorrow's call). @shivtools can speak on Lyft's behalf.

I should be able to answer questions Thurs afternoon and Fri if any of this doesn't make sense.

@hyperknot
Contributor

I think we'd need to clarify static files (buckets) vs. API queries.

Static files can only be pre-processed if the time periods are known in advance, so the periods cannot be part of the query parameters.

I believe these endpoints should be named "historical_..." and their fixed time windows should be defined in the spec.

I believe there are two choices:

  • UTC hours
  • local timezone days split by midnight

We should discuss which is the better choice for providers and then make that part of the spec. For example:

  • The "historical_trips" endpoint provides JSON-in-ZIP files split into days at local-timezone midnight.
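
To make the difference between the two choices concrete, here is a small sketch that computes both bucket keys for the same event. The key formats and function names are illustrative only, and a fixed UTC-7 offset stands in for a real local zone; real code would use a proper time zone database (for example Python's `zoneinfo`), since local days can be 23 or 25 hours long around daylight saving changes.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC-7 offset stands in for the agency's local time zone.
LOCAL_TZ = timezone(timedelta(hours=-7))


def utc_hour_key(dt):
    """Bucket key for the 'UTC hours' option, e.g. '2019-08-23T03'."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H")


def local_day_key(dt, tz=LOCAL_TZ):
    """Bucket key for the 'local days split at midnight' option, e.g. '2019-08-22'."""
    return dt.astimezone(tz).strftime("%Y-%m-%d")


# One event, two bucketing schemes: the UTC date and the local date differ.
event = datetime(2019, 8, 23, 3, 15, tzinfo=timezone.utc)
print(utc_hour_key(event), local_day_key(event))
```

Either scheme gives every event exactly one bucket; the trade-off is that UTC hours need no time zone data, while local days line up with how agencies usually report.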

@hunterowens
Collaborator

Closing this in favor of #357, which includes @babldev's work. Thanks!
