-
Notifications
You must be signed in to change notification settings - Fork 232
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clarify "data does not exist" vs empty data for trips and status changes #437
Conversation
cc @hyperknot |
I agree, this is a much needed PR for clarifying some parts around the 404. I'm not sure "operating" is the right word to use here, as some provider might understand 404 can be used during night hours when they are "not operating". I think it's clear that there are only two cases when 404 should be used:
Everything else should be returned with a 200 response and a list (possibly empty). |
What about seasonal operations, where a provider doesn't have operations in a city for like 4 months? |
I'd say that 404 as well. But a bank holiday, or even a 2-3 day long holiday should definitely count as "operating" and be returned as empty list. |
I'm not sure I totally agree with this assessment. If we consider the history of #268 and the final implementation in #357, where the intent was that
To me these scenarios bring up more ambiguity between using 404 sometimes and using Seems easier to say, 404 if there is no data, each and every time. This is my interpretation of the current language. |
The current implementation does not make a difference between an hour with no trips and an hour where the operator has not yet processed the data, or was not operating yet. However this is a very important differentiation! When the operator has processed the data, and has data about an hour where there were no trips, this is a very valueable information! It should be acknoledged by returning an empty list This is information required for all vehicle count calculations, which is the most important use case for MDS currently. Technically speaking, storing those empty hour files take a few byes, which literally is costing nothing today. I think the solution is very simple:
This way we clarify that it doesn't matter why the operator hasn't processed the data, it naturally includes all uses cases mentioned in this discussion. |
Even the original comment which introduced 404 had this meaning in mind:
From a provider's point of view "complete" means exactly that a given time range :
|
This seems like a great simplification, thank you for the explanation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just want to ensure it is clear what is required here.
There seems to be 3 cases:
With the model:
and the discussion above it seems cases 1 and 2 results in a 404 and case 3 results in a 200. Do we want to distinguish cases 1 and 2? The use case would be, I'm polling an operators feed over some time window. I don't know the operator stopped operating and pulled out all vehicles during some period of time (say a winter month). If we don't distinguish between case 1 and 2, I won't know programmatically if I'm not getting data because there will never be data or if the data is actually missing. One way to distinguish this is to return a 200 with an empty list if there will never be any data for that time interval. That is, there is no data, so in a sense it has been processed and the result is empty. This would change the example above for requests made before the operator has started service. Instead of returning a 404 in this case the endpoint would return a 200 with a []. The proposed model would be: No data exists for this time window but data will appear in the future -> 404 |
@billdirks I agree with your points, however I believe it's out of the scope of this PR and it would also be a breaking change. The most important point is that there are multiple ways of "processing" stage. Providers usually make a "quick" result, then after a few day - up to about a week, they make a "finalized" results with lost vehicles included, data properly cleaned, etc. MDS needs to handle those cases, but I believe it should be a separate discisson. I believe the right solution is to introduce processing levels, like:
or something similar. But this is not an easy task and I think it should be outside this PR. For this, I believe we can simplify into 404 or 200, to make it non-breaking. |
Another clarifying option that gets a little esoteric is to use some of the less common HTTP status codes, for example:
Or, as @hyperknot, suggests, add a metadata field to the MDS response that describes the status of the data. |
This is over a year old now. What's the status of getting some additional clarity on this in MDS? We at Populus continue to deal with this ambiguity with operators choosing different interpretations of what to return when there's no data. Most return responses with empty arrays, but some choose to return 404 responses, which we cannot differentiate from a need to try to get the data again later. |
Hey @jiffyclub you could present this tomorrow on the WG call. |
Great, I'll be there.
…On Wed, Mar 17, 2021 at 18:18 Michael Schnuerle ***@***.***> wrote:
Hey @jiffyclub <https://github.com/jiffyclub> you could present this
tomorrow on the WG call
<https://github.com/openmobilityfoundation/mobility-data-specification/wiki/Web-conference-notes,-2021.03.18-(Joint-Working-Group)>
.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#437 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAHAXLEHNOOKVD3C6M2EHXTTEFIEZANCNFSM4KN6QVBQ>
.
|
Posting a note from today's working group call: in my experience at Populus most operators are already handling the case of "operational, but no trips or status changes to report" by sending 200 responses with empty arrays of trips or status changes. There are only a couple that are taking the 404 route that we have to handle with a special case. |
In the conversation on this issue today, it sounds like we have consensus on moving forward with this PR but we want to see if it would be feasible for operators to also differentiate between cases where no data is available yet (server still processing) vs no data will ever be available (outside service period etc). If operators can feasibly expose this, we would amend this PR to include it as follows. Building on @jiffyclub's proposal above, we discussed using the following HTTP codes:
|
From the WG Call yesterday:
@dirkdk @bhandzo and other providers: could you weigh in on the question of if and when and how you might know that you are waiting on data or data processing so you could send a 102 response that the data is not missing, it's just not ready yet. |
102 would be a good idea With 0.4.0 we started using hour as a parameter. We decided that future hours would return a 404. So the 102 code would be used when the hour is in the past, server just hasn't caught up yet. A few seconds/minutes of delay if the provider is using a batched approach to save the whole response for 1 hour to file is not unheard of So this would be the sequence:
|
HI @jiffyclub could make some tweaks/additions to this PR based on the last 3 comments? |
The trips and status changes endpoints are expected to return HTTP 404 responses when data "does not exist". I'm trying to clarify that that means the future and hours when the provider was not in operation in the municipality. For past hours during which the provider was operating but nothing happened they should return "200 OK" responses with empty arrays of trips or status changes. Additionally, for hours that are not available in the API because the data is still be processed the API shall return a "102 Processing" response to indicate the requester should check back later.
@schnuerle Updated! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is ready for merge. Please reply if you have any issues with this.
Instead of using 102 status code, can we use status code 202? 102 status code requests the client to keep the connection open anticipating server will process the data. However, this may not be always feasible. There are times when due to underlying nature of the system, it can take longer for hourly data to be available for compute as a whole. Under such circumstances, clients will keep the connection open and uses resources on both ends which may not be ideal for either parties. I have raised an issue regarding usage of 102 here. |
Explain pull request
The trips and status changes endpoints are expected to return HTTP 404 responses when data "does not exist". I'm trying to clarify that that means the future and hours when the provider was not in operation in the municipality. For past hours during which the provider was operating but nothing happened they should return empty arrays of trips or status changes.
I'm guessing, but not totally sure, that this was the original intent.
Is this a breaking change
A breaking change would require consumers or implementors of the API to modify their code for it to continue to function (ex: renaming of a required field or the change in data type of an existing field). A non-breaking change would allow existing code to continue to function (ex: addition of an optional field or the creation of a new optional endpoint).
Impacted Spec
Which spec(s) will this pull request impact?
provider
Additional context
See #357 for the PR in which
/trips
and/status_changes
were switched to this hourly format.