Clarify "data does not exist" vs empty data for trips and status changes #437

jiffyclub · 2020-01-30T23:04:22Z

Explain pull request

The trips and status changes endpoints are expected to return HTTP 404 responses when data "does not exist". I'm trying to clarify that that means the future and hours when the provider was not in operation in the municipality. For past hours during which the provider was operating but nothing happened they should return empty arrays of trips or status changes.

I'm guessing, but not totally sure, that this was the original intent.

Is this a breaking change

A breaking change would require consumers or implementors of the API to modify their code for it to continue to function (ex: renaming of a required field or the change in data type of an existing field). A non-breaking change would allow existing code to continue to function (ex: addition of an optional field or the creation of a new optional endpoint).

No, not breaking

Impacted Spec

Which spec(s) will this pull request impact?

provider

Additional context

See #357 for the PR in which /trips and /status_changes were switched to this hourly format.

jiffyclub · 2020-01-30T23:05:21Z

cc @hyperknot

hyperknot · 2020-01-30T23:13:03Z

I agree, this is a much needed PR for clarifying some parts around the 404. I'm not sure "operating" is the right word to use here, as some provider might understand 404 can be used during night hours when they are "not operating".

I think it's clear that there are only two cases when 404 should be used:

for dates which have not yet been processed or are in the future
for dates before the provider's first operating day in a given city

Everything else should be returned with a 200 response and a list (possibly empty).

jiffyclub · 2020-01-30T23:24:57Z

What about seasonal operations, where a provider doesn't have operations in a city for like 4 months?

hyperknot · 2020-01-30T23:35:45Z

I'd say that 404 as well. But a bank holiday, or even a 2-3 day long holiday should definitely count as "operating" and be returned as empty list.

thekaveman · 2020-01-31T00:24:41Z

I'm not sure I totally agree with this assessment. If we consider the history of #268 and the final implementation in #357, where the intent was that /trips and /status_changes endpoints could be implemented with static files - then 404 makes a lot of sense if there is no data for a requested hour. Why create the file only for it to be empty?

What about seasonal operations, where a provider doesn't have operations in a city for like 4 months?

I'd say that 404 as well. But a bank holiday, or even a 2-3 day long holiday should definitely count as "operating" and be returned as empty list.

To me these scenarios bring up more ambiguity between using 404 sometimes and using [] other times. At what point do we cut off the "non operation" and say 404 is more appropriate than []? What if a provider has equipment issues for a week? What if intense rain or another unforeseen natural disaster makes operations infeasible for some extended period of time?

Seems easier to say, 404 if there is no data, each and every time. This is my interpretation of the current language.

hyperknot · 2020-01-31T01:49:55Z

The current implementation does not make a difference between an hour with no trips and an hour where the operator has not yet processed the data, or was not operating yet.

However this is a very important differentiation! When the operator has processed the data, and has data about an hour where there were no trips, this is a very valueable information! It should be acknoledged by returning an empty list[]. Think like NULL vs. 0 in a database, that's a very important difference.

This is information required for all vehicle count calculations, which is the most important use case for MDS currently.

Technically speaking, storing those empty hour files take a few byes, which literally is costing nothing today.

I think the solution is very simple:

if the operator has processed data -> return 200 + results
if the operator has not processed data for any reason -> return 404

This way we clarify that it doesn't matter why the operator hasn't processed the data, it naturally includes all uses cases mentioned in this discussion.

hyperknot · 2020-01-31T01:53:46Z

Even the original comment which introduced 404 had this meaning in mind:
#268 (comment)

when a day's dataset is complete (polling for HTTP 200 vs 404 not found)

From a provider's point of view "complete" means exactly that a given time range :

has been processed -> 200
not processed -> 404

thekaveman · 2020-01-31T02:38:18Z

I think the solution is very simple:

if the operator has processed data -> return 200 + results

if the operator has not processed data for any reason -> return 404

This seems like a great simplification, thank you for the explanation.

thekaveman

Just want to ensure it is clear what is required here.

provider/README.md

billdirks · 2020-01-31T20:15:46Z

There seems to be 3 cases:

No data has been processed but data will be processed
No data has been processed and no data will never be processed.
Data has been processed

With the model:

if the operator has processed data -> return 200 + results
if the operator has not processed data for any reason -> return 404

and the discussion above it seems cases 1 and 2 results in a 404 and case 3 results in a 200. Do we want to distinguish cases 1 and 2? The use case would be, I'm polling an operators feed over some time window. I don't know the operator stopped operating and pulled out all vehicles during some period of time (say a winter month). If we don't distinguish between case 1 and 2, I won't know programmatically if I'm not getting data because there will never be data or if the data is actually missing.

One way to distinguish this is to return a 200 with an empty list if there will never be any data for that time interval. That is, there is no data, so in a sense it has been processed and the result is empty. This would change the example above for requests made before the operator has started service. Instead of returning a 404 in this case the endpoint would return a 200 with a [].

The proposed model would be:

No data exists for this time window but data will appear in the future -> 404
No data will ever exist for time window -> 200 with a []
Data exists for this time window -> 200 with data (could be empty)

hyperknot · 2020-01-31T21:31:40Z

@billdirks I agree with your points, however I believe it's out of the scope of this PR and it would also be a breaking change.

The most important point is that there are multiple ways of "processing" stage. Providers usually make a "quick" result, then after a few day - up to about a week, they make a "finalized" results with lost vehicles included, data properly cleaned, etc.

MDS needs to handle those cases, but I believe it should be a separate discisson. I believe the right solution is to introduce processing levels, like:

"not_operating"
"not_yet_processed"
"quick_processing"
"final_processing"

or something similar. But this is not an easy task and I think it should be outside this PR. For this, I believe we can simplify into 404 or 200, to make it non-breaking.

jiffyclub · 2020-02-03T18:07:30Z

Another clarifying option that gets a little esoteric is to use some of the less common HTTP status codes, for example:

200: there is data and here it is
204: we were operating and have processed this hour and have nothing to report
404: we haven't processed this data but check back later (processing is in progress or the hour is in the future)
410: there will never be data related to this hour because there were no operation at the time

Or, as @hyperknot, suggests, add a metadata field to the MDS response that describes the status of the data.

jiffyclub · 2021-03-09T18:07:33Z

This is over a year old now. What's the status of getting some additional clarity on this in MDS? We at Populus continue to deal with this ambiguity with operators choosing different interpretations of what to return when there's no data. Most return responses with empty arrays, but some choose to return 404 responses, which we cannot differentiate from a need to try to get the data again later.

schnuerle · 2021-03-18T01:17:47Z

Hey @jiffyclub you could present this tomorrow on the WG call.

jiffyclub · 2021-03-18T02:38:51Z

Great, I'll be there.

…

On Wed, Mar 17, 2021 at 18:18 Michael Schnuerle ***@***.***> wrote: Hey @jiffyclub <https://github.com/jiffyclub> you could present this tomorrow on the WG call <https://github.com/openmobilityfoundation/mobility-data-specification/wiki/Web-conference-notes,-2021.03.18-(Joint-Working-Group)> . — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#437 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAHAXLEHNOOKVD3C6M2EHXTTEFIEZANCNFSM4KN6QVBQ> .

jiffyclub · 2021-03-18T16:26:01Z

Posting a note from today's working group call: in my experience at Populus most operators are already handling the case of "operational, but no trips or status changes to report" by sending 200 responses with empty arrays of trips or status changes. There are only a couple that are taking the 404 route that we have to handle with a special case.

quicklywilliam · 2021-03-18T16:27:04Z

In the conversation on this issue today, it sounds like we have consensus on moving forward with this PR but we want to see if it would be feasible for operators to also differentiate between cases where no data is available yet (server still processing) vs no data will ever be available (outside service period etc). If operators can feasibly expose this, we would amend this PR to include it as follows.

Building on @jiffyclub's proposal above, we discussed using the following HTTP codes:

102: we haven't processed this data but check back later (processing is in progress or the hour is in the future)
404: there will never be data related to this hour because there were no operation at the time

schnuerle · 2021-03-19T20:15:20Z

From the WG Call yesterday:

Would be good to add more things.
Adding new codes would be a non breaking change for 1.2.
Will tag providers on the issue to see if this is possible/know. Spin and Superpedestrian on the call and looking into it.
Most operators are sending back 200 response.
Suggestion to add a 102 response - server is working on this - more appropriate and clear than anything in the 400s
400 class errors don’t capture this. It’s not an error, it's that the data/processing hasn't happened yet.

@dirkdk @bhandzo and other providers: could you weigh in on the question of if and when and how you might know that you are waiting on data or data processing so you could send a 102 response that the data is not missing, it's just not ready yet.

dirkdk · 2021-03-22T05:37:19Z

102 would be a good idea

With 0.4.0 we started using hour as a parameter. We decided that future hours would return a 404. So the 102 code would be used when the hour is in the past, server just hasn't caught up yet. A few seconds/minutes of delay if the provider is using a batched approach to save the whole response for 1 hour to file is not unheard of

So this would be the sequence:

event_time=2021-4-01T07, time is 2021-4-0 7:59 ==> HTTP 404 (hour has not finished yet)
event_time=2021-4-01T07, time is 2021-4-0 8:00 ==> HTTP 102 (server busy processing data)
event_time=2021-4-01T07, time is 2021-4-0 8:05 ==> HTTP 200 (server done processing data and response served)

schnuerle · 2021-06-25T20:58:20Z

HI @jiffyclub could make some tweaks/additions to this PR based on the last 3 comments?

The trips and status changes endpoints are expected to return HTTP 404 responses when data "does not exist". I'm trying to clarify that that means the future and hours when the provider was not in operation in the municipality. For past hours during which the provider was operating but nothing happened they should return "200 OK" responses with empty arrays of trips or status changes. Additionally, for hours that are not available in the API because the data is still be processed the API shall return a "102 Processing" response to indicate the requester should check back later.

jiffyclub · 2021-06-28T19:38:09Z

@schnuerle Updated!

thekaveman

This looks great!

schnuerle

I think this is ready for merge. Please reply if you have any issues with this.

bhargav-lime · 2023-03-24T00:42:53Z

Instead of using 102 status code, can we use status code 202?

102 status code requests the client to keep the connection open anticipating server will process the data. However, this may not be always feasible. There are times when due to underlying nature of the system, it can take longer for hourly data to be available for compute as a whole. Under such circumstances, clients will keep the connection open and uses resources on both ends which may not be ideal for either parties.

I have raised an issue regarding usage of 102 here.

jiffyclub requested a review from a team as a code owner January 30, 2020 23:04

jiffyclub requested a review from a team January 30, 2020 23:04

jiffyclub changed the title ~~Clarify "data does not exist" vs empty data~~ Clarify "data does not exist" vs empty data for trips and status changes Jan 30, 2020

thekaveman added documentation documentation change can be for code and/or markdown pages Provider Specific to the Provider API labels Jan 31, 2020

thekaveman suggested changes Jan 31, 2020

View reviewed changes

provider/README.md Outdated Show resolved Hide resolved

provider/README.md Outdated Show resolved Hide resolved

sarob added this to the Future milestone Feb 1, 2020

schnuerle modified the milestones: Future, 1.2.0 Mar 29, 2021

jiffyclub force-pushed the clarify-404 branch from d5083d4 to f5d698f Compare June 28, 2021 19:36

jiffyclub requested a review from thekaveman June 28, 2021 19:38

thekaveman approved these changes Jun 29, 2021

View reviewed changes

schnuerle requested a review from a team June 29, 2021 19:43

schnuerle approved these changes Jun 29, 2021

View reviewed changes

schnuerle merged commit b52d3f7 into openmobilityfoundation:dev Jul 21, 2021

jiffyclub mentioned this pull request Sep 1, 2021

Clarify language for Trips responses #686

Closed

jiffyclub deleted the clarify-404 branch February 17, 2022 20:52

jiffyclub mentioned this pull request Feb 17, 2022

Provider response outside of operating hours #746

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Clarify "data does not exist" vs empty data for trips and status changes #437

Clarify "data does not exist" vs empty data for trips and status changes #437

jiffyclub commented Jan 30, 2020

jiffyclub commented Jan 30, 2020

hyperknot commented Jan 30, 2020 •

edited

Loading

jiffyclub commented Jan 30, 2020

hyperknot commented Jan 30, 2020

thekaveman commented Jan 31, 2020

hyperknot commented Jan 31, 2020

hyperknot commented Jan 31, 2020 •

edited

Loading

thekaveman commented Jan 31, 2020

thekaveman left a comment

billdirks commented Jan 31, 2020

hyperknot commented Jan 31, 2020

jiffyclub commented Feb 3, 2020

jiffyclub commented Mar 9, 2021

schnuerle commented Mar 18, 2021

jiffyclub commented Mar 18, 2021 via email

jiffyclub commented Mar 18, 2021 •

edited

Loading

quicklywilliam commented Mar 18, 2021

schnuerle commented Mar 19, 2021

dirkdk commented Mar 22, 2021

schnuerle commented Jun 25, 2021

jiffyclub commented Jun 28, 2021

thekaveman left a comment

schnuerle left a comment

bhargav-lime commented Mar 24, 2023

Clarify "data does not exist" vs empty data for trips and status changes #437

Clarify "data does not exist" vs empty data for trips and status changes #437

Conversation

jiffyclub commented Jan 30, 2020

Explain pull request

Is this a breaking change

Impacted Spec

Additional context

jiffyclub commented Jan 30, 2020

hyperknot commented Jan 30, 2020 • edited Loading

jiffyclub commented Jan 30, 2020

hyperknot commented Jan 30, 2020

thekaveman commented Jan 31, 2020

hyperknot commented Jan 31, 2020

hyperknot commented Jan 31, 2020 • edited Loading

thekaveman commented Jan 31, 2020

thekaveman left a comment

Choose a reason for hiding this comment

billdirks commented Jan 31, 2020

hyperknot commented Jan 31, 2020

jiffyclub commented Feb 3, 2020

jiffyclub commented Mar 9, 2021

schnuerle commented Mar 18, 2021

jiffyclub commented Mar 18, 2021 via email

jiffyclub commented Mar 18, 2021 • edited Loading

quicklywilliam commented Mar 18, 2021

schnuerle commented Mar 19, 2021

dirkdk commented Mar 22, 2021

schnuerle commented Jun 25, 2021

jiffyclub commented Jun 28, 2021

thekaveman left a comment

Choose a reason for hiding this comment

schnuerle left a comment

Choose a reason for hiding this comment

bhargav-lime commented Mar 24, 2023

hyperknot commented Jan 30, 2020 •

edited

Loading

hyperknot commented Jan 31, 2020 •

edited

Loading

jiffyclub commented Mar 18, 2021 •

edited

Loading