Flat file implementations of provider API #58

Closed
jfh01 opened this issue Sep 12, 2018 · 14 comments

Comments

@jfh01
Contributor

jfh01 commented Sep 12, 2018

I am concerned that many cities may not have the technical capacity to consume a dynamic, query-based API.

The /trips and /events endpoints both lend themselves to a flat file format. While LA will clearly want to require a dynamic API, it may be worth broadening MDS to allow for other cities to request the same data in flat file format (e.g. monthly/weekly CSVs).

Are LADOT folks amenable to including some guidance on "flat file MDS implementations" in the spec? I suspect this would make MDS more appealing to other cities.

Happy to put in a pull request if people think this is a good idea.

@ezheidtmann

I can't speak to the relevance of a flat file implementation specifically, but I strongly agree that there is value to having a well-defined object format (i.e. events, trips, routes) independent of any specific interface. Maybe we can start by breaking things down into object types and interfaces?

@jfh01
Contributor Author

jfh01 commented Sep 12, 2018

This makes a ton of sense to me.

@hunterowens: Thoughts on this? Related discussion in #60 as well.

Idea would be to:

  • Define data types that can be used as return objects from API calls OR put into static/flat files
  • Define API endpoints as query mechanisms which return typed data

LA would require that all providers implement the API, but other cities could adopt the data component of the spec while using something other than an API to move data.
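As a rough illustration of the split described above, here is a minimal Python sketch in which one typed record is serialized either as a JSON API response or as a CSV flat file. The `Trip` class, its field names, and the `to_json`/`to_csv` helpers are hypothetical and are not taken from the MDS spec.

```python
import csv
import io
import json
from dataclasses import asdict, dataclass, fields


@dataclass
class Trip:
    # Hypothetical, trimmed-down trip record; field names are illustrative only.
    provider_id: str
    device_id: str
    trip_id: str
    start_time: int
    end_time: int


def to_json(trips):
    """Serialize trips the way a query-based API endpoint might return them."""
    return json.dumps({"trips": [asdict(t) for t in trips]})


def to_csv(trips):
    """Serialize the same trips as a flat file with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=[f.name for f in fields(Trip)], lineterminator="\n"
    )
    writer.writeheader()
    for t in trips:
        writer.writerow(asdict(t))
    return buf.getvalue()
```

Both serializers read the same type definition, so the API and a flat file export could not drift apart in their column names.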

@bhandzo
Contributor

bhandzo commented Sep 13, 2018

I think a flat file standard would be 💯, as many cities won't have the ability to consume API endpoints but will still want access to the data. My main concern is the need for standardized data chunking, or some other way to break what will certainly be very large data sets into manageable pieces.

@migurski

Strongly agree with this. I’d like to see simplifications to the Provider API that allow it to be expressed fully as a flat file interface, instead of adding a separate component to the spec. This would position the Provider API as an easily consumable data feed, accessible to smaller cities that have limited IT or engineering staff but do have data analysts familiar with spreadsheet tools.

A sample trips request and response might look like this:

> GET /trips
> Accept: text/csv

< HTTP/1.1 200 OK
< Content-Type: text/csv
< 
< provider_id, provider_name, device_id,…

Parameters for the /trips endpoint are unspecified, but might include date and bounding box arguments that can be manipulated in a web browser.

This change would conflict with #46 which pulls the API away from common skillsets present in cities and toward those typically found in software engineering orgs.

Challenges would include representing routes and pagination. Pagination could be addressed out of band with Web Linking headers.
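To make the Web Linking idea concrete, here is a minimal Python sketch of how a client could find the next page of a CSV response from a `Link` header. The `next_page_url` helper and the example URLs are hypothetical. The regex is a simplification that only handles links whose `rel` is the first parameter, not the full header grammar.

```python
import re

# Matches entries such as: <https://example.com/trips?page=2>; rel="next"
# Simplified: assumes rel is the first parameter after the URL.
LINK_RE = re.compile(r'<([^>]*)>\s*;\s*rel="?([^";,]+)"?')


def next_page_url(link_header):
    """Return the URL tagged rel="next" in a Link header, or None."""
    for url, rel in LINK_RE.findall(link_header or ""):
        # A rel value may hold several space-separated relation types.
        if "next" in rel.split():
            return url
    return None
```

A client would keep requesting the returned URL until the function returns `None`, so the CSV body itself never has to carry paging information.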

@thekaveman
Collaborator

I've put together an initial proposal for this at #68

@hunterowens
Collaborator

I think a flat file implementation would be great at improving adoption. Will comment on the PR from here.

hunterowens added this to the 0.2.0 milestone Sep 14, 2018

@migurski

I am a fan of the changes in #68, in particular the addition of trip_points. There’s still some vagueness about query params for GET requests, but this goes a long way toward expressing MDS Provider data in a form accessible to non-programmers.

@mattwigway

I agree on the flat file. As an academic potentially doing research with this data in the future, I would find a dump of all the trips (even if it's 10GB or 1TB) more useful than having to repeatedly query the API.

@jfh01
Contributor Author

jfh01 commented Sep 14, 2018

The changes in #68 look good, with one potential change:

My assumption is that the start and end location will be used far more often than the full collection of points that make up the route. What do people think about leaving start and end location in the basic trip data structure (CSV-friendly), while offering the optional trip_points for those use cases that require the detailed data?

As proposed in #68, I still need a database (or ability to process and join two datasets) to get start/end location for a trip. Leaving it in the trip data structure would remove this barrier.
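For a sense of what that join involves, here is a minimal Python sketch that recovers start and end locations from a separate points file. The two CSV layouts and every column name (`trip_id`, `timestamp`, `lat`, `lng`) are assumptions for illustration, not the structures proposed in #68.

```python
import csv
import io

# Hypothetical layouts: the column names are assumptions for illustration.
TRIPS_CSV = """trip_id,device_id
t1,d1
t2,d2
"""

TRIP_POINTS_CSV = """trip_id,timestamp,lat,lng
t1,100,34.05,-118.25
t1,160,34.06,-118.24
t2,300,34.10,-118.30
t1,130,34.055,-118.245
"""


def start_end_by_trip(points_file):
    """Map each trip_id to its (earliest, latest) point rows."""
    result = {}
    for row in csv.DictReader(points_file):
        ts = int(row["timestamp"])
        first, last = result.get(row["trip_id"], (row, row))
        if ts < int(first["timestamp"]):
            first = row
        if ts > int(last["timestamp"]):
            last = row
        result[row["trip_id"]] = (first, last)
    return result


def trips_with_endpoints(trips_file, points_file):
    """Join trips to their start and end coordinates."""
    endpoints = start_end_by_trip(points_file)
    joined = []
    for trip in csv.DictReader(trips_file):
        # Raises KeyError if a trip has no points; a real tool would handle that.
        first, last = endpoints[trip["trip_id"]]
        joined.append({
            **trip,
            "start_lat": first["lat"], "start_lng": first["lng"],
            "end_lat": last["lat"], "end_lng": last["lng"],
        })
    return joined


rows = trips_with_endpoints(io.StringIO(TRIPS_CSV), io.StringIO(TRIP_POINTS_CSV))
```

This is simple for a programmer but out of reach in a plain spreadsheet workflow, which is the barrier described above.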

@ian-r-rose
Contributor

I agree that for many (most?) users a flat file, even a potentially large one, would be the preferred way of analyzing the data. As @migurski points out, this would pull away from #53. My concern is the same as in #46: CSVs are emphatically not a well-specified format, and I can easily see an $N^2$ problem in implementations between the multiple agencies and providers. Sooner or later I would worry about significant differences in interpretations and implementations (especially as the version starts getting bumped).

In #68 there is some discussion of a quasi-official client-side tool for transforming JSON-data to a CSV (or other flat file). I think that could be a good compromise.
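A client-side converter of that kind could be quite small. The Python sketch below flattens nested JSON records into CSV columns. The function names, the dotted column naming, and the payload shape are all hypothetical, and it is not the tool discussed in #68.

```python
import csv
import io
import json


def flatten(record, prefix=""):
    """Flatten nested dicts into one level, joining keys with dots."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            # Lists and other values are left as-is and written via str().
            flat[name] = value
    return flat


def json_to_csv(payload, key):
    """Convert the list stored under `key` in a JSON document to CSV text."""
    rows = [flatten(r) for r in json.loads(payload)[key]]
    # Union of columns in first-seen order, so sparse records still fit.
    columns = list(dict.fromkeys(col for row in rows for col in row))
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=columns, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

With the conversion rules in one shared tool, providers would only need to agree on the JSON, and CSV interpretation differences would be confined to that tool.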

@hunterowens
Collaborator

hunterowens commented Sep 18, 2018

My preference would be the client-side tool, as mentioned in #68. My worry about supporting CSVs in the same spec is losing some of the ability to have robust standardization, for example via JSON Schema. A tool like mds-downloader.github.io could allow city staff, researchers, etc. to plug in tokens and get data.

Additionally, supporting some sort of token-based auth means that the barrier to entry is already high. I wonder if we are creating a bunch of work for ourselves if we add CSV support, only to require that people use a complex request to get that CSV.

In order to support that, I have opened issue #79 for CORS. Can anybody think of additional changes to the spec needed for that tool?

@mattwigway

mattwigway commented Sep 18, 2018 via email

@thekaveman
Collaborator

Also throwing my 👍 towards the conversion tool idea, and a more robust JSON-based spec. And since it seems the crowd is leaning this direction...

Does anyone want to take a peek over at #46 / #53 for a discussion around JSON Schema and tightening up this spec for machine validation? @ian-r-rose made some additions to #53 that haven't made their way over yet, but I'll merge them ASAP.

thekaveman modified the milestones: 0.2.0, Future Sep 26, 2018

@hunterowens
Collaborator

Since we have JSON Schema inclusion, I'm gonna close this issue.

Hoping that we can come up with a toolset to serve as an mds-provider-bulk-downloader

schnuerle removed this from the Future milestone Sep 2, 2021


9 participants