Add file linked_datasets.txt containing the Trip Updates, Vehicle Positions and Service Alerts URLs #93
Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please visit https://cla.developers.google.com/ to sign. Once you've signed (or fixed any issues), please reply here (e.g. …).
What to do if you already signed the CLA
Individual signers
Corporate signers
Hi @LeoFrachet, I remember there being a significant amount of discussion about this a few years ago on one of the mailing lists, but I don't remember the conclusions. It would be helpful to track down that conversation and see what issues have already been raised on this subject. A few initial observations:
I agree that inconsistent use of the word "feed" is problematic and it may make sense to prefer other terms. However, I would argue that the word "feed" itself is not ambiguous: it evokes the image of an endless stream of new data being regularly updated, as in "RSS feed". I believe the original intended meaning of "feed" in GTFS is that there's a stable URL where new data appears from time to time, with the feed being the ongoing series of files published at that URL.
Hi Andrew, thanks for your answer!
This is the goal of the "Background" section of my first message. You'll find in it the two links to the two conversations in the Google Group that discuss it. As for the conclusions of those discussions, there seemed to be a broad consensus on the solution that I'm describing above, which is used by TriMet. The only difference is the naming of the file, as I pointed out.
Just to be sure we are on the same page, please note I'm not working at Transit anymore. I'm working for RMI on the @MobilityData program, which is the continuation of the ITD program you knew, which created the Best Practices.
Using …
I agree that this is needed. One example that bit us earlier this year was MBTA's move of their GTFS-realtime URLs. MBTA announced this on their Google Group: … but we're archiving data from several different agencies and haven't been manually monitoring all the developer Google Groups for those agencies. So MBTA made a best effort to announce this change, but we missed it. If there were a programmatic way to communicate this change (i.e., this proposal), it would scale easily and we could have caught the change. In the MBTA case, from a consumer's perspective it would have been nice to also have a …
On 9 Aug 2018, at 23:03, Leo Frachet ***@***.***> wrote:
This is the goal of the "Background" section of my first message. You'll find in it the two links to the two conversations from the Google Group speaking about it.
Oh, sorry I didn’t catch that! Thanks for the links.
Using linked_datasets.txt would also help consumers to discover which producers have real-time, since there is currently no simple way to know it. It would also allow consumers to be informed when a producer adds real-time. (I'm gonna add this in my first message, since it's another important half of the issue).
I do agree that overall the idea makes sense. The main point of discussion would be the new file’s format, and the potential for widespread adoption.
-Andrew
What would happen if multiple URLs for the same type of data (tripUpdates/...) are provided? Should the consumer choose one, or should it merge them?
AFAIK, this is not supported by the current specification; it would add a new feature to the current GTFS behavior. I'm not against it (in fact, I'm even in favor of it), but it's outside the scope of this proposal IMHO. It could be the next step once this proposal has been adopted. (Otherwise, I agree, we could scope by agency_id, for example.)
We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for all the commit author(s) or Co-authors. If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google.
@LeoFrachet if that is not in scope, then I fail to see why an extra file is needed, as opposed to adding extra columns in feed_info.txt.
@skinkie: Because according to the current specification, agencies have the choice to either provide a single URL with the three feeds in the same protobuf, or one URL per dataset (a total of 3 URLs).
I think @skinkie is trying to say that this format works when you have a single agency in the dataset: …
...but it wouldn't work if you have more than one agency in the feed, each with its own GTFS-realtime feed. (@skinkie is that right?) I'd say if we have a producer that's willing to represent a mapping of agencies to GTFS-realtime feeds, then we can include … I do agree that there is some overlap with fields in …
+1 to the overall goal of adding GTFS-RT URLs to static GTFS feeds. This would be useful for @transitland to associate existing static and real-time feeds, to discover new real-time feeds, and to identify real-time feeds with changed URLs.
@barbeau what @LeoFrachet is saying: "The spec is currently not drafted to allow multiple realtime feeds that could be merged on the client." To me this means that, per GTFS, there can be only one GTFS-RT feed of a single type; a per-agency split is not supported. So I wonder why not put the realtime URLs in feed_info.txt as three columns (one per type), since only one feed per type is supported anyway. The Dutch situation is as follows: we have three GTFS-RT feeds for trains and three GTFS-RT feeds for the rest of public transport. Hence it would be in our own interest to have the option to specify multiple GTFS-RT feeds for a single GTFS file. The current proposal does not say anything about integrating the three feeds as one, nor anything about merging. So in its current state it would just make things more difficult rather than more explicit.
I would like to add something else that may be interesting for Google/Bing people. Software like OpenTripPlanner et al. can build graphs in hours, and other software (no spam) in minutes, but at Google Transit, Bing Maps and others it takes days. During this period, trip_ids might change from the operator's standpoint. The need to synchronise the timetable version used by the GTFS-RT producing application with the GTFS version active on the client suggests that feed_info.txt might be a good place to set up multiple GTFS-RT URLs for specific usage. Hence, if Google Transit had version "Monday" loaded and the producer's new timetable was at "Wednesday", feed_info.txt might specify which older versions of the GTFS-RT URLs are still supported.
Is more than one of these GTFS-RT feeds for the same agency_id in the single GTFS zip file? Or is each GTFS-RT feed tied to a single agency_id from GTFS?
The current proposal is silent on how more than one TripUpdate feed in linked_datasets.txt should be handled, but I don't see why we couldn't add text saying that if more than one URL for the same entity type is provided, the client should consume all of the entities from all of the URLs (for active periods, if we define start and end dates), if you're willing to produce this data and this addresses your use case.
Both GTFS-RT tripUpdate.pb files (train vs non-train) contain multiple agencies.
Thanks @skinkie and @barbeau for your thoughts! If I try to list the concerns that you have with the current proposal, I count four of them:
…
The 4th (adding the real-time URLs in the … Putting them in another file keeps the door open to future improvements. And if you both tell me that there is a need today for such improvements, then we can extend the proposal with extra fields to address the needs you've described:
Concern 3: Need to scope the real-time URLs in time by adding a start date and an end date
It's pretty straightforward; as you said earlier, we can add in …
=> @barbeau, would you have a potential producer for this?
Concern 1: Need to be allowed to specify multiple Trip Updates feeds (respectively Vehicle Positions feeds, Service Alerts feeds) for one GTFS dataset
The simplest solution would be to add in …
The two limitations I see are:
…
=> So far, in the GTFS format, simplicity has always prevailed over reducing redundancy. I would therefore advocate for just adding an …
=> @skinkie, could OpenOV potentially be a producer for this?
Concern 2: Need to synchronize the « static » GTFS dataset and the real-time feeds
I agree this is a huge need. I would like to see it solved, but we need a unique identifier for the GTFS dataset for that. Therefore this is a distinct issue from this one IMHO.
This doesn't work for OpenTripPlanner, hence two GTFS files are exported, and each GTFS-RT file is matched to a single dataset. Why the latter is not a sustainable solution for me: more and more features in GTFS, such as pathways.txt, expect a GTFS dataset to integrate an entire network, at least geographically. We happen to have a mode split; others may have an agency split. I am not against enforcing the option to only have a single GTFS feed paired with tripUpdates / serviceAlerts / vehiclePositions. But given that you agree that this is a huge need, I would rather have the semantics available to use feeds side by side without specifying how they are split.
Indeed. So for now, we could stick to the current proposal and just add the following warning: Warning: Multiple URLs can provide the same type of data (e.g. two URLs can both provide data for Trip Updates). They will be merged by data consumers and used as if they were a single feed. The merging will be done simply by combining the entities of the different feeds, without selection or pruning.
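A minimal Python sketch of the merge rule in that warning, assuming each fetched feed has already been decoded into a list of entities. Plain dicts stand in for the GTFS-realtime protobuf classes to keep it self-contained, and `merge_feeds` is a hypothetical name, not part of any library:

```python
from typing import Any, Dict, Iterable, List

# Stand-in for a decoded GTFS-realtime FeedEntity (assumption: plain dicts
# instead of the protobuf classes, to keep the sketch self-contained).
Entity = Dict[str, Any]


def merge_feeds(feeds: Iterable[List[Entity]]) -> List[Entity]:
    """Concatenate the entities of several feeds that provide the same data type.

    Following the proposed warning, nothing is selected or pruned: every
    entity from every URL is kept, in the order the feeds are given.
    """
    merged: List[Entity] = []
    for entities in feeds:
        merged.extend(entities)
    return merged


# Hypothetical example: a train feed and a feed for the rest of public transport.
train = [{"id": "t1", "trip_update": {"trip": {"trip_id": "IC-101"}}}]
other = [{"id": "b7", "trip_update": {"trip": {"trip_id": "BUS-7"}}}]
print([e["id"] for e in merge_feeds([train, other])])  # ['t1', 'b7']
```

Because nothing is pruned, two feeds that reuse the same entity `id` would both be kept; the warning as worded does not say how such collisions should be handled.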
This one might be tricky to get a producer and consumer for. We can add it to our USF campus shuttle feed, but the fields would just be blank, as we don't anticipate changing the URL any time soon. So for this to be meaningfully tested with a real feed, we'd need to find someone who is planning to change their GTFS-realtime URLs. I'm fine with this proposal continuing without the start and end dates; we could add these in a future change when a producer needs to change their GTFS-realtime URLs. The existing proposal would still support an immediate changeover: the producer would just need to make sure that the old and new GTFS-realtime URLs worked in parallel for a while during the transition.
gtfs/spec/en/reference.md
| Field Name | Required | Details |
| ------ | ------ | ------ |
| url | **Required** | The **url** field contains the URL to the linked dataset. The value must be a fully qualified URL that includes **http**:// or **https**://, and any special characters in the URL must be correctly escaped. See http://www.w3.org/Addressing/URL/4_URI_Recommentations.html for a description of how to create fully qualified URL values. |
We probably shouldn't limit this to http and https, so it should read: "The value must be a fully qualified URL that includes a scheme such as **http**:// or **https**://, ..."
I would also add: "The value must be the exact URL a client could use to make a request that returns GTFS-realtime data in the protocol buffer format."
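A rough consumer-side check for the suggested wording, as a Python sketch: it only verifies that a scheme and a host are present (any scheme, in line with not limiting the field to http/https) and rejects raw whitespace as a crude proxy for the escaping requirement. Confirming that the URL actually returns GTFS-realtime protobuf data would need a live request and is left out. `is_fully_qualified` is a hypothetical helper name:

```python
from urllib.parse import urlsplit


def is_fully_qualified(url: str) -> bool:
    """Return True if `url` has a scheme (http, https, ...) and a host.

    Which scheme is used is deliberately not checked. Unescaped whitespace
    is rejected as a crude check of the "correctly escaped" requirement.
    """
    parts = urlsplit(url)
    has_whitespace = any(c.isspace() for c in url)
    return bool(parts.scheme) and bool(parts.netloc) and not has_whitespace


print(is_fully_qualified("https://example.com/gtfs-rt/trip-updates"))  # True
print(is_fully_qualified("example.com/gtfs-rt/trip-updates"))  # False
```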
Please include websocket as well?
I don't think we should exhaustively list protocols here; the purpose of this sentence and of including "http/https" is just to give an example of what we mean by a fully qualified URL.
To my knowledge, all websocket implementations of GTFS-realtime are currently for `DIFFERENTIAL` feeds, which aren't officially supported in GTFS-realtime yet (see #84), so I'd suggest omitting `ws://` examples to avoid confusion. When `DIFFERENTIAL` feeds are officially supported, we can add more text here that says something like "such as http for FULL_DATASET and ws for DIFFERENTIAL feeds".
As requested, I've added the following example to the first post of this thread:
Example
For example, the …
So here's a wrinkle that we haven't discussed: TriMet's GTFS-realtime feeds require an API key (i.e., AppID). If you make a request to the URL listed in TriMet's GTFS data: … without an API key, you get a "403 Forbidden" response. You also get the same response if you use an invalid URL: … To clarify to consumers whether or not a given URL requires an API key for a valid response, perhaps we should add a …
@LeoFrachet it is good you gave this example. My interpretation was different.
@barbeau: Very good point. If the API key can always be passed in the URL as an argument (which is really common), it would be even better to give the field name, like … Another option, which is less clean IMHO, would be to put it directly in the URL with a placeholder, like …
This would be great. It would definitely be useful for the transit-feed-quality-calculator that I've been working with. Right now I just have a bunch of API parameter names and keys hard-coded in URLs in a CSV file. If an agency required authentication with something other than an API key, like HTTP basic authentication (see CUTR-at-USF/transit-feed-quality-calculator#31 for Metra in Chicago), how would we handle this? Maybe we need two fields: … So TriMet's data would be:
…
...and a feed with basic HTTP authentication would be:
…
In other words, …
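A Python sketch of how a consumer might turn a row into a request URL when the key goes in the query string, assuming the `api_key_parameter_name` field from the proposal (e.g. `appID` for TriMet) and a key the consumer has obtained separately. The function name, URL and key below are hypothetical. This only covers the API-key-as-URL-parameter case; basic auth or other ad-hoc schemes would still need the human-readable instructions page:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_api_key(url: str, param_name: str, api_key: str) -> str:
    """Return `url` with the API key set as the query parameter `param_name`.

    An existing value for the same parameter is replaced. Note that the rest
    of the query string is re-encoded by urlencode().
    """
    parts = urlsplit(url)
    query = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k != param_name
    ]
    query.append((param_name, api_key))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical URL and key; "appID" is the parameter name TriMet uses.
print(with_api_key("https://example.com/gtfs-rt/trip-updates?format=pb", "appID", "MY_KEY"))
# https://example.com/gtfs-rt/trip-updates?format=pb&appID=MY_KEY
```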
@mbta should also be able to produce this before the end of the year.
Hey @LeoFrachet, do you know of anyone who was using the older realtime_feeds.txt file? If Mike makes the change, should realtime_feeds.txt continue to be included in the feed for a period of time? (e.g., is there a policy or best practice on how to deprecate something like realtime_feeds.txt?)
Good question! I'm not aware of anybody using it. But the real answer is that I don't know. You can try to inform your main consumers and the ones with whom you have an official / legal relationship. For the other ones... well... just pull the plug (aka remove it) and wait for anybody to start screaming (🙏 please don't do that around Xmas or NYE though, developers have families too #BeenThere).
Ping @juanborre & @gcamp. We should soon have at least 3 producers (MBTA, TriMet & Trillium's datasets).
Thanks @LeoFrachet ... this brings up an interesting point: we really don't know the users of our feed. I'm guessing other producers are also in the dark about who the consumers are (beyond the gang of four: Google, Apple, Bing and Transit App)...
This is out of the scope of this conversation, but it is still a very important conversation to have. One practical approach may be to survey your riders. They'll tell you which app(s) they are using. This won't give you an exhaustive list of who's using your data, but at least you'll be able to reach out to the most important ones (rider-wise).
@LeoFrachet, I'll add it to our GTFS by the end of the day.
TriMet GTFS contains … @TransitApp (@gcamp & @juanborre), you can start the engine!
We can start to consume this in @transitland as part of our GTFS Realtime cataloging and validation efforts. We're not sure if storing this information inside static GTFS feeds is the best solution in the long run (we want to continue a wider discussion about formats for linking static GTFS, GTFS Realtime, GBFS, MDS, etc.), but it's worth experimenting with this along the way.
The one type of "ad-hoc" authentication scheme that has been mentioned in this thread is basic auth. There are also some endpoints that include API keys in their URL paths. For example:
…
These are so rare it's probably not necessary to handle them; I just mention it for future readers wondering what "ad-hoc" auth might cover.
@drewda Sounds good! Please let us know when it is implemented in Transitland. Thanks!
Like many people aggregating GTFS feeds, I want an easy way to maintain a collection of GTFS schedules and their associated GTFS-realtime feeds, the "Current Issue" Leo mentioned when opening this PR. I saw the transitfeeds.com provider as a possible solution and found this discussion when digging a bit further. Similar to what @drewda wrote, perhaps we have an easy short-term solution to the current issue which doesn't require agencies to maintain a new file: modify the transitfeeds.com API to publish feed URLs by provider. The common case is a single schedule zip and 0-3 realtime feeds referring to it. Some providers have multiple zips and realtime feeds, for example mta. The good news is it looks like mta uses the same GTFS ids across zips, for example MTA Bronx GTFS/100646 and MTA Queens GTFS/100646. We'd need to clean up things like provider NJ Transit, which has zips containing conflicting ids, because it is unclear which one is associated with the realtime feed. The idea of a global registry of feeds, grouped by providers that use the same GTFS identifier namespace, was discussed quite a bit in the thread "Feed identification and naming" that Leo linked to. Back in 2013 nobody was volunteering to maintain it, but now it looks like Transitfeeds aka OpenMobilityData is already doing it. Perhaps this can be the beginning of https://github.com/transitland/distributed-mobility-feed-registry. I haven't thought about how this could be connected to the transitland feed_id and Onestop ID scheme. Does transitland have a list of realtime feeds per provider/operator? I can't find them in the API.
@TomGoBravo No GTFS-rt feeds in Transit.land as of today, but @drewda is working on that as part of this project: https://www.interline.io/blog/transportation-research-board-funds-gtfs-realtime/.
Trillium has started producing this file, but it won't appear for many of our real-time-enabled feeds for a few more weeks, pending some decisions on public API keys. We have published this data in one of our GTFS feeds, for Marin Transit: see …
Findings from @transitland by @irees: these feeds contain at least one feed version with …
… mostly Trillium feeds, which appear to now include … only. TriMet includes …
That's right, we're working on including content for a number of other feeds for which we'll need to include API key information. http://www.marintransit.org/data/google_transit.zip is the one feed that has entries according to the spec. In a week or so, a service alerts feed will be added for Marin as well.
The GTFS for @mbta also has linked_datasets.txt: https://cdn.mbta.com/MBTA_GTFS.zip
For …
We (Swiftly) generally prefer for consumers of real-time feeds to use this instead of a URL parameter, to help protect the value of the API key.
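The mechanism Swiftly refers to is cut off above, so this is only a hedged Python sketch of the header-based alternative to a URL parameter: it assumes a hypothetical `Authorization` header carrying the key and builds, without sending, a request with the standard library's `urllib.request.Request`. A common reason to prefer a header is that full URLs tend to end up in logs and shared links, taking any embedded key with them:

```python
from urllib.request import Request


def build_keyed_request(url: str, header_name: str, api_key: str) -> Request:
    """Build (but do not send) a GET request that carries the API key in a header.

    Note: urllib normalizes header names with str.capitalize(), so a name such
    as "X-Api-Key" is stored as "X-api-key"; HTTP header names are
    case-insensitive on the wire.
    """
    return Request(url, headers={header_name: api_key}, method="GET")


# Hypothetical header name, URL and key.
req = build_keyed_request("https://example.com/gtfs-rt/trip-updates", "Authorization", "MY_KEY")
print(req.full_url)  # https://example.com/gtfs-rt/trip-updates
print(req.get_header("Authorization"))  # MY_KEY
```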
Checking in on the status of this PR.
Trillium is a producer of linked_datasets for ~50 feeds.
On Wed, Apr 14, 2021 at 12:47 PM Elizabeth Sall ***@***.***> wrote:
Checkin in on the status of this PR.
1. It sounds like there are producers (***@***.*** <https://github.com/mgilligan> TriMet, @paulswartz <https://github.com/paulswartz> MBTA)
2. Are there consumers?
3. Is there a reason why this hasn't been put to a vote?
--
Lillian Karabaic (pronouns - she/her)
Data Technician
Trillium <http://trilliumtransit.com/> - We make transit easier to use.
...(second set of questions)... We are exploring if/how to link various GTFS Schedule datasets, specifically for the use case of specifying inter-dataset fare rules and pathways. To the best of my knowledge, this is the most relevant, (sorta) current, related discussion on that matter? cc: TransitApp @gcamp … and Cal-ITP colleagues @mcplanner @antrim
I disagree. The most important part for inter-dataset stuff is unique, consistent and predictable identifiers. This issue is about how the different parts of the same producer can be discovered, if it is open data.
I am certainly not arguing that this is an optimal way to implement relationships between GTFS producers, just asking whether this is the most relevant discussion on references beyond a dataset (if not, I would appreciate a link to the appropriate place).
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This pull request has been closed due to inactivity. Pull requests can always be reopened after they have been closed. See the Specification Amendment Process.
Current Issue
As a data consumer, knowing which producers have real-time data and tracking which producers add it is time-consuming and requires manual searches. Keeping track of changes to the different real-time URLs (Trip Updates, Vehicle Positions and Service Alerts URLs) is also time-consuming and could lead to silent bugs or downtime.
[Updated on 2018-08-09 to add discovery]
[Updated on 2018-08-15 14:55 UTC to add the example]
[Updated on 2018-08-27 12:37 UTC to add the fields `authentication_type`, `authentication_info_url` & `api_key_parameter_name`]
Proposal
We can add those real-time URLs to an extra file of the GTFS.
This proposal adds a new file called `linked_datasets.txt`, containing the fields:
- `url` (String, required): A full URL, linking to the other dataset.
- `trip_updates` (Boolean, required): Whether the dataset at this URL may contain a `TripUpdate` entity.
- `vehicle_positions` (Boolean, required): Whether the dataset at this URL may contain a `VehiclePosition` entity.
- `service_alerts` (Boolean, required): Whether the dataset at this URL may contain an `Alert` entity.
- `authentication_type` (Integer, required): Defines the type of authentication required to access the URL. The allowed values are: … `api_key_parameter_name` in the URL.
- `authentication_info_url` (String, optional): If authentication is required, this field contains a URL to a human-readable page describing how the authentication should be done and how potential credentials can be created. Required if `authentication_type` is 1 or greater.
- `api_key_parameter_name` (String, optional): Name of the parameter to pass in the URL to provide the API key. Required if `authentication_type` is 2.

Please note: those datasets are only linked to the core GTFS; they are not owned by it, and therefore the data in `feed_info.txt` doesn't apply to them (e.g. start & end date).
Discussion around the naming of the file
TriMet (ping @mgilligan & @fpurcell) currently uses the extra file `realtime_feeds.txt` with the fields `url`, `trip_updates`, `service_alerts` and `vehicle_positions`.
My current proposal differs only in the naming of the file (`linked_datasets.txt` vs `realtime_feeds.txt`), for two reasons:
… `feed_info.feed_start_date`) or the set of the different versions of the dataset (see `feed_version` definition). Some open proposals even define “feed” as being the unique and constant source of the different datasets (see feed_id proposal).
For those reasons, I think `linked_datasets.txt` unambiguously encapsulates the content of the new file, without adding any useless limitations for the future.
Background
This was already proposed in 2014 in the GTFS-changes Google Group (see here and there). TriMet is currently using the 2014 proposal in their production feed.
Example
For example, the `linked_datasets.txt` file for Madison Metro Transit (which does not require authentication) would be:
…
...and for TriMet (which requires an API key as a URL parameter called `appID` for authentication) it would be:
…
...and for Metra in Chicago (which requires ad-hoc authentication not enumerated in our options, in this case via HTTP basic authentication) it would be:
…