Flat file implementations of provider API #58

jfh01 · 2018-09-12T17:33:19Z

I am concerned that many cities may not have the technical capacity to consume a dynamic, query-based API.

The /trips and /events endpoints both lend themselves to a flat file format. While LA will clearly want to require a dynamic API, it may be worth broadening MDS to allow for other cities to request the same data in flat file format (e.g. monthly/weekly CSVs).

Are LADOT folks amenable to including some guidance on "flat file MDS implementations" in the spec? I suspect this would make MDS more appealing to other cities.

Happy to put in a pull request if people think this is a good idea.

ezheidtmann · 2018-09-12T20:32:02Z

I can't speak to the relevance of a flat file implementation specifically, but I strongly agree that there is value to having a well-defined object format (i.e. events, trips, routes) independent of any specific interface. Maybe we can start by breaking things down into object types and interfaces?

jfh01 · 2018-09-12T23:50:55Z

This makes a ton of sense to me.

@hunterowens: Thoughts on this? Related discussion in #60 as well.

Idea would be to:

- Define data types that can be used as return objects from API calls OR put into static/flat files
- Define API endpoints as query mechanisms which return typed data

LA would require that all providers implement the API, but other cities could adopt the data component of the spec while using something other than an API to move data.
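A rough sketch of what that split could look like, assuming a hypothetical `Trip` type. Only `provider_id`, `provider_name`, and `device_id` appear in this thread; `trip_id`, `start_time`, and `end_time` are placeholders of mine, not the spec. The point is that one typed record can be returned from an API call as JSON or written to a flat file as CSV:

```python
import csv
import io
import json
from dataclasses import asdict, dataclass, fields


@dataclass
class Trip:
    # Hypothetical subset of fields; the real spec defines the full list.
    provider_id: str
    provider_name: str
    device_id: str
    trip_id: str      # assumed name
    start_time: int   # assumed: Unix timestamp
    end_time: int     # assumed: Unix timestamp


def to_json(trips):
    """Serialize trips the way an API response body might carry them."""
    return json.dumps({"trips": [asdict(t) for t in trips]})


def to_csv(trips):
    """Serialize the same trips as a flat file with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(Trip)])
    writer.writeheader()
    for trip in trips:
        writer.writerow(asdict(trip))
    return buf.getvalue()
```

Both functions read from the same type definition, so a city that only takes monthly CSVs and a city that queries the API would be looking at the same fields.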

bhandzo · 2018-09-13T05:34:29Z

I think a flat file standard would be 💯 as many cities won't have the ability to consume API endpoints but will want to access the data. Main concern here is some standardized data chunking or ability to break what will certainly be very large data sets up to be manageable.

migurski · 2018-09-14T00:06:45Z

Strongly agree with this. I’d like to see simplifications to the Provider API that allow it to be expressed fully as a flat file interface instead of an additional component to the spec. This would position the provider API as an easily consumable data feed, accessible to smaller cities that have limited IT or engineering staff but do have data analysts familiar with spreadsheet tools.

A sample trips request and response might look like this:

> GET /trips
> Accept: text/csv

< HTTP/1.1 200 OK
< Content-Type: text/csv
< 
< provider_id, provider_name, device_id,…

Parameters for the /trips endpoint are unspecified, but might include date and bounding box arguments that can be manipulated in a web browser.

This change would conflict with #46 which pulls the API away from common skillsets present in cities and toward those typically found in software engineering orgs.

Challenges would include representing routes and pagination. Pagination could be addressed out of band with Web Linking headers.
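For the pagination piece, here is a minimal sketch of how a client might follow Web Linking (`Link`) headers from page to page. It assumes the provider sends a `rel="next"` link on every page except the last. The `fetch` callable is a hypothetical stand-in for whatever HTTP client is used, and the parser is simplified (it does not handle commas inside quoted parameter values):

```python
import re


def parse_link_header(value):
    """Return {rel: url} for a Link header value (simplified parser)."""
    links = {}
    # Each link looks like: <url>; rel="next"
    for match in re.finditer(r'<([^>]*)>([^,]*)', value):
        url, params = match.group(1), match.group(2)
        rel = re.search(r'\brel\s*=\s*"?([^";]+)"?', params)
        if rel:
            # A single rel value may list several space-separated relations.
            for name in rel.group(1).split():
                links[name] = url
    return links


def iter_pages(first_url, fetch):
    """Yield each page body, following rel="next" until it runs out.

    `fetch(url)` is assumed to return (body, link_header_or_None).
    """
    url = first_url
    while url:
        body, link = fetch(url)
        yield body
        url = parse_link_header(link or "").get("next")
```

Because the paging information lives in headers, the CSV body itself stays a plain table that opens directly in a spreadsheet.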

thekaveman · 2018-09-14T03:30:20Z

I've put together an initial proposal for this at #68

hunterowens · 2018-09-14T04:10:39Z

I think a flat file implementation would be great at improving adoption. Will comment on the PR from here.

migurski · 2018-09-14T17:41:41Z

I am a fan of the changes in #68, in particular the addition of trip_points. There’s still some vagueness about query params for GET requests, but this goes a long way toward expressing MDS Provider data in a form accessible to non-programmers.

mattwigway · 2018-09-14T19:16:23Z

I agree on the flat file. As an academic potentially doing research with this data in the future, it's going to be more useful to have a dump of all the trips (even if it's 10GB or 1TB) than to have to repeatedly query the API.

jfh01 · 2018-09-14T20:30:51Z

The changes in #68 look good, with one potential change:

My assumption is that the start and end location will be used far more often than the full collection of points that make up the route. What do people think about leaving start and end location in the basic trip data structure (CSV-friendly), while offering the optional trip_points for those use cases that require the detailed data?

As proposed in #68, I still need a database (or ability to process and join two datasets) to get start/end location for a trip. Leaving it in the trip data structure would remove this barrier.
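To illustrate the extra step, this is roughly what a consumer would have to do to recover start/end locations from a separate trip_points file. The column names (`trip_id`, `timestamp`, `lat`, `lon`) are my assumptions for the sketch, not the layout proposed in #68:

```python
import csv
import io


def start_end_by_trip(trip_points_csv):
    """Map trip_id -> ((start_lat, start_lon), (end_lat, end_lon)).

    Picks the earliest and latest point per trip by timestamp, so the
    rows do not need to arrive in order.
    """
    first, last = {}, {}
    for row in csv.DictReader(io.StringIO(trip_points_csv)):
        tid = row["trip_id"]
        point = (int(row["timestamp"]), float(row["lat"]), float(row["lon"]))
        if tid not in first or point[0] < first[tid][0]:
            first[tid] = point
        if tid not in last or point[0] > last[tid][0]:
            last[tid] = point
    # Drop the timestamp and keep only (lat, lon) for each end of the trip.
    return {tid: (first[tid][1:], last[tid][1:]) for tid in first}
```

This is easy in code but awkward in a spreadsheet, which is why keeping start and end location on the trip row itself matters for the non-programmer use case.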

ian-r-rose · 2018-09-18T00:50:43Z

I agree that for many (most?) users a flat file, even a potentially large one, would be the preferred way of analyzing the data. As @migurski points out, this would pull away from #53. My concern is the same as in #46: CSVs are emphatically not a well-specified format, and I can easily see an $N^2$ problem in implementations between the multiple agencies and providers. Sooner or later I would worry about significant differences in interpretations and implementations (especially as the version starts getting bumped).

In #68 there is some discussion of a quasi-official client-side tool for transforming JSON-data to a CSV (or other flat file). I think that could be a good compromise.
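As a sketch of how small such a conversion tool could be, assuming a JSON payload with a top-level `trips` array of objects (the payload shape and field names here are hypothetical, not taken from the spec):

```python
import csv
import io
import json


def flatten(obj, prefix=""):
    """Flatten nested objects into dotted column names, e.g. start.lat."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, name + "."))
        else:
            # Lists are left as-is and end up as their string form in the CSV.
            flat[name] = value
    return flat


def json_to_csv(payload, key="trips"):
    """Convert a JSON document to CSV text with one row per record."""
    rows = [flatten(record) for record in json.loads(payload)[key]]
    # Collect columns in first-seen order so the output is stable.
    columns = []
    for row in rows:
        for name in row:
            if name not in columns:
                columns.append(name)
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

The JSON stays the single well-specified format, and the CSV is derived from it on the client, which avoids each provider interpreting a CSV spec differently.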

hunterowens · 2018-09-18T19:07:54Z

My preference would be the client-side tool, as mentioned in #68. My worry about supporting CSVs in the same spec is losing some of the ability to have robust standardization, for example, JSON Schema. A tool like mds-downloader.github.io could allow city staff, researchers, etc. to plug in tokens and get data.

Additionally, supporting some sort of token-based auth means that the barrier to entry is already high. I wonder if we are creating a bunch of work for ourselves if we add CSV support only to require a complex request to get that CSV.

In order to support that, I have opened issue #79 for CORS. Can anybody think of additional changes to the spec needed for that tool?

mattwigway · 2018-09-18T19:45:26Z

I'm not particularly tied to CSV, although I can see it being valuable for smaller cities (they can analyze it in Excel). But I think the important piece here from a research perspective is that we need a way to get all of the trips out in a single file so we can analyze them en masse, without needing to make piecemeal requests to an API.

thekaveman · 2018-09-19T00:31:32Z

Also throwing my 👍 towards the conversion tool idea, and a more robust JSON-based spec. And since it seems the crowd is leaning this direction...

Does anyone want to take a peek over at #46 / #53 for a discussion around JSON Schema and tightening up this spec for machine validation? @ian-r-rose made some additions to #53 that haven't made their way over yet, but I'll merge them ASAP.

hunterowens · 2018-10-03T23:47:44Z

Since we have JSON Schema inclusion, I'm gonna close this issue.

Hoping that we can come up with a toolset to serve as an mds-provider-bulk-downloader

jfh01 mentioned this issue Sep 12, 2018

How should access to the 'provider' API be authenticated? #60

Closed

thekaveman mentioned this issue Sep 14, 2018

Discussion: provider flat-file representation #68

Closed

hunterowens added this to the 0.2.0 milestone Sep 14, 2018

hunterowens mentioned this issue Sep 18, 2018

CORS support #79

Closed

thekaveman modified the milestones: 0.2.0, Future Sep 26, 2018

hunterowens closed this as completed Oct 6, 2018

schnuerle removed this from the Future milestone Sep 2, 2021
