
Contributor Meetings

John Levin edited this page Jul 12, 2023 · 11 revisions

2023-07-12

Agenda

  • Review status of spec PRs that are (almost) ready to go. Can we get them moving?

    • Add trip_id_scheduled to event tables PR #142
    • Add trip_stop_sequence to event tables and stop_visits PR #112
    • Add fare_action enums PR #122
  • Discuss recent spec edits and issues

    • Review proposed values for trips_performed.schedule_relationship (issue #138)
    • Review proposed new field trips_performed.service_type (issue #138)
    • Review proposed values for stop_visits.schedule_relationship (issue #139)
    • Discuss use of ODS along with GTFS (issue #151)
  • Check on status of documentation work

    • Add documentation of the documentation process (PR #152). Is this complete? The markdown isn't rendering for me, but maybe I am not doing it right…
    • Outstanding: Continue edits to draft Best Practices section
  • Update on status of Continuous Integration fixes

  • Next steps on adding sample data structure and documentation

    • Add samples structure and documentation PR #100
    • GitHub workflow to validate samples, PR #104
  • New topics for discussion

    • What is the specific meaning of dwell in stops_visited? (Discussion #153)
    • Do we want to have a way to define a shape for a trip that has been added (in advance of this being defined in GTFS)?
    • Have we addressed all the specific needs for using TIDES with frequency based service?
    • What is the meaning of the amount field on fare_activities when the fare_action is ACTIVATE?
    • If fare collection records location (lat/long), should we add the location to fare_transactions, or do we create a separate vehicle_locations record for this information? (I feel like we discussed this…)

2023-05-17

Agenda

  • Issue 137 and new Frictionless document
  • Issue 91 and new Best practices document
  • Issue 138 and relationship between reference schedule and trips_performed
  • Issue 139 and relationship between reference schedule and stops_visited

2023-05-05

Agenda

  • Any outstanding issues or pull requests that you would like to bring up. Are there new items that need discussion, or old items that need unsticking?

  • Go back to the conversation we started about different trip “scenarios” and how they are represented in TIDES. I think much of this has been implicit, and we need to make it explicit for our conversations and eventually for documentation. I have started a document here

  • Brief mention of an upcoming TRB Webinar on Improving Access, Quality, and Management of Public Transit ITS Data. The webinar will cover content from TCRP Report 235 as well as TCRP Synthesis 153 (The Transit Analyst Toolbox: Analysis and Approaches for Reporting, Communicating, and Examining Transit Data)

2023-04-19

2023-04-05

2023-03-22

2023-02-22

2023-01-25

Agenda

  • Review outstanding PRs and Issues

    • Any final discussion of PR #95 add optional count field from approvers
    • Any final discussion of PR #96 add pattern_id from approvers
    • Any final discussion of PR #97 support more route types and modes from approvers
    • Any final discussion of PR #100 sample data from approvers
    • Any final discussion of PR #72 make trip_id_performed optional and Issue #88 Relax constraints as much as possible (I’d like to finish this off)
    • Discuss Issue #99 Accommodate other types of instantaneous events (Jay could you kick this discussion off)
  • Discuss details of proposed Vendor meeting

    • At the TIDES Update meeting at TRB it was recommended that we hold a meeting for system vendors to introduce them to the TIDES project and spec. I’d like to discuss this and get any feedback from the Contributor group before we schedule it.
  • Continue discussion of procurement requirements based on TIDES spec

    • See Issue #94 and Issue #98

Notes

  • PR #95 add optional count field from approvers

    • Some discussion of how we would use a 0 value, perhaps for a known operator movement. The group agreed to keep this as a valid value.
    • Okay to approve.
  • PR #96 add pattern_id from approvers

    • Okay to approve
  • PR #97 support more route types and modes from approvers

    • Okay to approve
  • PR #100 sample data from approvers

    • Okay to approve
  • PR #72 make trip_id_performed optional and Issue #88 Relax constraints as much as possible (I’d like to finish this off)

    • Okay to approve PR #72
  • Issue #99 Accommodate other types of instantaneous events (Jay could you kick this discussion off)

    • There was general discussion of the types of events that we might want to capture: passenger events that don't involve fare payment or don't occur at a stop, e.g., cord pull, vehicle events recorded by stationary equipment such as train tracking systems or track circuits.

    • We could create new record types and split location information into a separate record type. This is more normalized, but adds a lot of complexity.

    • One goal is to make it easier for vendors to store information. The closer the data structure is to how the vendors manage information natively, the easier it will be for them to provide data correctly. But we also don't want to overload fields which is how some vendors store different types of data in common sets of fields. This would suggest more specialized tables depending on the type of information being transferred.

    • At the same time, the more different types of records there are, the harder it may be to predetermine the correct tables and fields to have.

    • Our next step should be to define all the different types of data that folks think we would want to capture and identify the fields that would be associated with each type of data. That will allow us to see which types of data are similar enough to go into a single table vs. which should be split out into separate tables.

  • Discuss details of proposed Vendor meeting

    • Agreed that we should have this meeting. John will continue with planning.
  • Continue discussion of procurement requirements based on TIDES spec (See Issue #94 and Issue #98)

  • Question came up of whether TIDES will work with European data structures

    • There is no reason it shouldn't. It will be up to European system vendors and transit agencies to review the specification and determine whether it can be used to store the data they have.

2023-01-11

Notes

  • What is the difference between GTFS-ride and TIDES?
    • There is some overlap in the content and purpose, but there are important differences
    • GTFS-ride is an extension of GTFS; TIDES can use GTFS or other sources for schedule data
    • GTFS-ride is a more complete expression of ridership data and is intended to reflect complete ridership for reporting, especially at the system and route levels. TIDES has stop and trip ridership, but is not inherently designed to include route-, system-, or day/month/year-level summarization.
  • What is the killer app for TIDES?
  • Need to include vendors in the conversation
    • Need to create language to ask vendors for data (and to include in RFPs)
    • We should have a workshop for vendors
    • Concern that vendors will require expensive work orders from each customer in order to provide data in TIDES format
  • We should write a research statement for implementation support
    • TRB, FHWA, FTA, AASHTO, Transit IDEA, etc.
  • Need to make sure we are planning for self-sustaining community for the TIDES specification
    • Plan for maintainers conference
  • What do we plan for executive level communication?
    • They need something that more clearly demonstrates the value of TIDES
      • For example: a dashboard-in-a-box tool based on TIDES

2022-12-14

Agenda

  • Discuss open issues
    • Any remaining discussion needed to finalize the sample data structure
    • Final resolution of Issue #90 Add pattern and pattern_stop tables
    • Final resolution of Issue #89 Drop operators table
    • Final resolution of Issue #85 Support extended route types, NTD modes
    • Further discussion of Issue #73 Add optional count field to passenger_events table (I can’t recall if we resolved this)
    • Other open issues for discussion?
  • Discuss plans for presentation and meetings at TRB
    • Any thoughts on what and how to share about the status of the work
    • Opportunities to recruit more participation in the spec development and data samples
  • Plans for future contributor meetings into 2023
    • Do we want to meet on 12/28?
    • Do bi-weekly meetings work? Other suggestions.

Notes

  • Any remaining discussion needed to finalize the sample data structure

    • Agreed to split the PR into pieces so we can proceed with the elements that are agreed on and ready to go
    • Elizabeth and Joey will work this week to get as much resolved as possible and leave any remaining issues open
  • Final resolution of Issue #90 Add pattern and pattern_stop tables

    • Many find that pattern stops are an important part of transit information and useful for analysis and reporting
    • If the spec doesn't include details on patterns, and if there is no other place for this, then tool providers will need to provide this on their own
    • In the short run, it will be easier to develop tools from TIDES data if there is detail about patterns and pattern stops with the data
    • On the other hand, pattern information is really part of the schedule and should be referenced there. Adding it to the spec sets in stone something that really shouldn't be there and that presumably will need to be supported
  • Final resolution of Issue #89 Drop operators table

    • Similar to pattern information, the operator information should be coming from somewhere else, e.g., the schedule
    • This should be optional, but it could be useful information to have easy access to when working with TIDES data
  • Final resolution of Issue #85 Support extended route types, NTD modes

    • Do we really need the GTFS types if the trip is linked to the GTFS schedule data? Maybe not, but still useful to have repeated here.
    • Do we need both GTFS fields (regular and extended) if one is an extension of the other?
    • Agreement to keep NTD mode and custom agency string fields
  • Further discussion of Issue #73 Add optional count field to passenger_events table

    • Agreement to add the count field for flexibility.
    • This works with systems that have a separate record for each event and with those that aggregate events that occur at about the same time
    • One downside of this more flexible approach is that when there are different ways to do something, it is harder to understand the spec and there is more variation in implementations
    • still need to discuss concept of "device" events vs. "passenger" events. Jay and Gabriel to write this issue up
  • Other open issues for discussion?

    • Elizabeth brought up requirements for an upcoming RFP.
      • Have an approved, versioned spec

      • Have explicit requirements in the spec for fields that are wanted

      • Options

        • Previously discussed that this could be part of the documentation/specs of the tools that use that data, not of the data spec. Downside is that requirements for various functions/features are not managed in one place.
        • Point to best practices for the fields required. Downsides: harder to indicate that this is an "official" part of the spec; and not possible to validate data
        • Fork TIDES for each specific requirement. Downside: harder to manage the overall spec as a single community
        • Tag fields when they are needed for a certain requirement, i.e., feature tags. -- This was the preferred approach of the group. Elizabeth will document in an issue.
  • Discussed future Contributor meetings. Agreed to cancel 12/28/22 meeting. 1/11/23 meeting will be at TRB. Will schedule bi-weekly meetings starting 1/25/23
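To illustrate the Issue #73 discussion above, here is a minimal Python sketch (not part of the spec) of how an optional count field lets per-passenger records and aggregated records describe the same activity. The PassengerEvent class, its fields, and the "board" event value are hypothetical simplifications, and treating a missing count as 1 is an assumption rather than a decision recorded in these notes.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class PassengerEvent:
    # Hypothetical, simplified record; only the optional count idea comes from Issue #73.
    stop_id: str
    event_type: str  # illustrative values: "board" or "alight"
    count: Optional[int] = None  # assumption: None is read as a single passenger event


def boardings_by_stop(events: Iterable[PassengerEvent]) -> Counter:
    """Total boardings per stop, treating a missing count as 1 and keeping 0 as 0."""
    totals: Counter = Counter()
    for event in events:
        if event.event_type == "board":
            totals[event.stop_id] += 1 if event.count is None else event.count
    return totals


# A system that writes one record per passenger...
per_event = [PassengerEvent("S1", "board") for _ in range(3)]
# ...and a system that aggregates events occurring at about the same time.
aggregated = [PassengerEvent("S1", "board", count=3)]

print(boardings_by_stop(per_event) == boardings_by_stop(aggregated))  # True
```

The sketch also shows the downside noted above: any consumer has to handle both shapes, so two valid files describing the same trips can look quite different.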

2022-11-16

Agenda

I’d like to focus most of our time on how we can start trying to convert the data we already have into the TIDES structure, even though the spec isn’t complete yet. I am hopeful that working with data from more agencies will help to identify new issues and help guide the decisions on some of the existing issues.

  • Review proposed structure for sample data on GitHub
  • Discuss options for starting to work with your agency data
  • Tour of discussion board (stay tuned…)
  • Brief review of new Issues with spec
    • #80 Eliminate unnecessary inconsistencies with GTFS - Elizabeth
    • #82 Some fields in the TIDES spec clash with Protected Keywords in SQL Databases - Erik
    • #85 support extended route types, NTD modes - Joey
    • #88 Relax constraints as much as possible - Joey
    • #89 Drop operators table - Gabriel
    • #90 Add pattern and pattern_stop tables - Gabriel
    • #91 Expand documentation’s architecture section - Gabriel

Notes

  • John provided a high-level overview of the plans to add a structure for sample data sets to the GitHub repository

  • Joey summarized the structure of the sample data sets

    • More information is in Issue #40 and PR #75
    • Documentation about the sample data structure is already available on the Documentation site
    • Each agency would have its own folder; an agency can have multiple samples
    • Each sample has three folders: raw, TIDES, scripts
    • Each sample has a datapackage.json file to define the contents of the sample (source of data, processing applied, etc.)
    • There are different ways to reflect the iterative nature of TIDES, perhaps with multiple copies of the same file in the TIDES directory to reflect successive steps in improving the data
  • John encouraged transit agencies to try to convert their data sets into TIDES format, and to push samples of data and scripts to the repository if they can

  • There are some resources available to help contributors work through the details of converting their data into TIDES format (Jameelah has some time set aside for this and others are willing to provide assistance and advice)

  • John briefly mentioned that we will be using the Discussion feature of GitHub in particular to discuss questions, challenges, and needs for assistance with converting data into TIDES

  • We discussed the following issues in more detail

  • There was significant discussion about the datapackage.json file and exactly what this should look like.

    • One option is a standard approach, using mostly the existing elements. This would be faster and simpler to set up
    • Another option is a customized structure, with a defined profile, that would allow the datapackage.json to be validated. This would potentially allow for more clarity.
    • Agreed we would work to set up a profile for datapackage.json (Issue #93)
  • Discussed some of the issues from the agenda:

  • #80 Eliminate unnecessary inconsistencies with GTFS - Elizabeth

    • This is partly an issue with GTFS (especially the issue with the definition of “heading”)
    • Contributors are encouraged to comment on the GTFS issue
  • #85 support extended route types, NTD modes - Joey

    • Discussed the balance between data producers being able to represent the details of their service (unique classifications, etc.) and the value of having common shared definitions
    • It is preferable to build guidance on how to use the spec into the spec itself, rather than rely on separate requirements and documentation
    • Several commented on the benefits of having both the GTFS and NTD types
    • And several commented on the importance of having separate fields that the agency can define
    • In the end it seems that a “both/and” approach is appropriate here: have several different fields that can be used to define route types, including a more flexible string field
  • #89 Drop operators table - Gabriel

    • Discussed what other external information we would want to associate with operators. There was no general agreement on universally needed fields
    • This discussion continues on the GitHub issue…
  • #90 Add pattern and pattern_stop tables - Gabriel

    • Discussed if this information could be inferred from stop_times in GTFS
    • Discussed whether this is something that might reasonably be added to GTFS (probably not) or Transit Operational Data Standard (TODS) (more likely)
    • Most agreed that patterns are useful, but that requiring them might make things more difficult. General agreement that they should be included somehow, but should be optional
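As a rough illustration of the sample layout summarized above (an agency folder holding one or more samples, each with raw, TIDES, and scripts folders plus a descriptor file), a contributor could check a local checkout with a Python sketch like the one below. The function name, the assumption that the root directory contains only agency folders, and the descriptor name datapackage.json are assumptions; the sketch does not validate the descriptor contents against the profile proposed in Issue #93.

```python
from pathlib import Path
from typing import List

# Folder names come from the 2022-11-16 notes; the descriptor file name
# and the <root>/<agency>/<sample>/ nesting are assumptions.
REQUIRED_DIRS = ("raw", "TIDES", "scripts")
DESCRIPTOR = "datapackage.json"


def check_samples(root: Path) -> List[str]:
    """Return one message per missing folder or descriptor under root."""
    problems: List[str] = []
    for agency in sorted(p for p in root.iterdir() if p.is_dir()):
        for sample in sorted(p for p in agency.iterdir() if p.is_dir()):
            label = f"{agency.name}/{sample.name}"
            for name in REQUIRED_DIRS:
                if not (sample / name).is_dir():
                    problems.append(f"{label}: missing {name}/")
            if not (sample / DESCRIPTOR).is_file():
                problems.append(f"{label}: missing {DESCRIPTOR}")
    return problems
```

An empty list means every sample has the expected pieces; a check like this could also run in a GitHub workflow alongside the descriptor validation discussed for the samples.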

2022-11-02

Agenda

Picking up from our conversation last week, there are two main topics I would like to discuss when we meet:

Topic #1 - Does the TIDES specification reflect only a complete, internally consistent and referenced data structure, or does it define one or more incomplete/partial data structures?

To be clear, we all agreed that there is a need to have data that is incomplete and that agencies will use the TIDES spec to represent this data. The question is whether we are more or less formal in how we define and represent those less than complete data sets.

Related Issues: #70, #71

Discussion

  • We know that many users of TIDES have incomplete data sets and they will want to store that data in the TIDES structure.
    • This may include agencies that have multiple streams of the same type of data from different systems
  • There will be tools that take in this incomplete data and transform it into more complete data. We want to define both the required minimum data for input to these tools and the data that will be output from them.
  • Agencies will require data from vendors in TIDES structure but it is likely vendors will not be able to provide completely formed TIDES data, so we want to make sure agencies can be clear on their requirements.
  • If the TIDES spec only documents a complete data set, how do we define and reference incomplete datasets (for example, for tool requirements or RFPs)?
    • Do we build this into the data specification somehow? Do we rely on separate documents, best practices, etc.?
    • Where do we put that information so that it meets various user needs/use cases?
  • If the TIDES spec allows for both complete and incomplete data, do we define the different options explicitly, or do we just make more fields optional so that incomplete data still conforms to the spec?
    • Since there are many possible ways data could be incomplete, how would we keep this option at a manageable scale?

Implications to consider

  • Usefulness of the spec for transit agencies
  • Ability of tool creators to do their work effectively
  • Ability of agencies to write effective RFP/contract language requiring data in "TIDES format"
  • Complexity of defining and managing the specification moving forward

Topic #2 - What are the nature of and relationship between trip_id_scheduled and trip_id_performed, and what is the relationship between the event tables and the trips_performed table?

Related issues #51, #70, #71

I think we are close to consensus on this:

  • trip_id_scheduled is the GTFS (or other?) external reference to the scheduled trip. It is/will be optional on both the event tables and the trips_performed table. This identifier is not unique to a specific date.
  • trip_id_performed is a unique identifier for each trip that is operated. The same scheduled trip operating on two dates has different values.
  • trip_id_performed may be generated by an operational system or it may need to be generated by a transform process on raw data.
  • In a completely formed data set, the event tables and trips_performed table are linked by the trip_id_performed field.
  • From an incomplete data set, there are several different paths by which event and trips_performed tables could be managed. trips_performed might be inferred from the events and/or from the GTFS schedule. Or possibly trips_performed could be generated directly from an operational system.
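One of the paths listed above, inferring trips_performed from event records, can be sketched in Python as follows. This is a hypothetical illustration, not a TIDES-defined algorithm: it assumes each event carries a service date, a vehicle ID, and an optional trip_id_scheduled but no trip_id_performed yet, and it builds trip_id_performed as a composite of those three values so that the same scheduled trip on two dates, or served by two vehicles, gets distinct identifiers.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Event:
    # Hypothetical minimal event; real TIDES event tables carry many more fields.
    service_date: str  # e.g. "2022-11-02"
    vehicle_id: str
    trip_id_scheduled: Optional[str]
    trip_id_performed: Optional[str] = None


def infer_trips_performed(events: List[Event]) -> Tuple[List[Event], List[dict]]:
    """Link events to inferred performed trips and build trips_performed rows.

    Assumption: one performed trip per (service_date, trip_id_scheduled,
    vehicle_id). Events without a scheduled trip (e.g. unscheduled or
    deadhead movements) are passed through unlinked.
    """
    trip_ids: Dict[Tuple[str, str, str], str] = {}
    linked: List[Event] = []
    for event in events:
        if event.trip_id_scheduled is None:
            linked.append(event)
            continue
        key = (event.service_date, event.trip_id_scheduled, event.vehicle_id)
        trip_id = trip_ids.setdefault(key, "_".join(key))
        linked.append(replace(event, trip_id_performed=trip_id))
    trips_performed = [
        {"trip_id_performed": trip_id, "service_date": date,
         "trip_id_scheduled": scheduled, "vehicle_id": vehicle}
        for (date, scheduled, vehicle), trip_id in trip_ids.items()
    ]
    return linked, trips_performed
```

A composite identifier like this is only one option; an operational system or a transform process could instead assign surrogate keys, and unscheduled or deadhead trips would need their own rule.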

Implications to consider

  • Unscheduled trips and deadhead trips are important to consider and address
  • Other service disruptions such as multiple vehicles operating the same scheduled trip, canceled trips, etc.

Other topics

If we have time, we can also discuss the following issues from GitHub:

Issue #73 - add optional count field to passenger_events table

  • I think we have achieved consensus to add the field.

Issue #35 - Wide vs long format for stop_visits (transaction_revenue_x, transaction_count_x)

  • Did we reach consensus in the discussion that the two formats are each appropriate in their own context, or should we discuss more?
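For Issue #35, the difference between the two formats can be shown with a small Python sketch that reshapes one wide row into long rows. The suffix values ("cash", "card"), the stop_id key, and the category column name are hypothetical; the notes only establish the transaction_revenue_x / transaction_count_x naming pattern.

```python
import re
from typing import Dict, List

# Matches wide columns such as "transaction_count_cash"; the suffixes are hypothetical.
WIDE_COLUMN = re.compile(r"^transaction_(revenue|count)_(\w+)$")


def wide_to_long(row: Dict[str, object]) -> List[Dict[str, object]]:
    """Split one wide stop_visits-style row into one row per suffix."""
    ids = {k: v for k, v in row.items() if not WIDE_COLUMN.match(k)}
    by_suffix: Dict[str, Dict[str, object]] = {}
    for key, value in row.items():
        match = WIDE_COLUMN.match(key)
        if match:
            measure, suffix = match.groups()
            by_suffix.setdefault(suffix, {"category": suffix})[f"transaction_{measure}"] = value
    return [{**ids, **values} for values in by_suffix.values()]
```

The long shape adds rows rather than columns when a new category appears, while the wide shape keeps one row per stop visit; which fits better depends on the consumer.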

Notes

Topic #1 - Does the TIDES specification reflect only a complete, internally consistent and referenced data structure, or does it define one or more incomplete/partial data structures?

Discussion

  • One goal of TIDES is to handle imperfect data. Data quality is one of the biggest issues with ITS data.
  • We need to think about the “process” of working with TIDES. Cleaning data and transferring it into a standard structure is one step of this process. Does it occur inside or outside TIDES?
  • Do we want to define different “phases” of data explicitly in the spec or do we cover this in Best Practices and/or leave it up to the tools to define what data they need in order to work correctly (and what data they produce as output)
  • One option is to have few required fields and many optional fields. This is the approach of GTFS.
  • The downside of this is that the minimum data required can’t do much. You ultimately need the “optional” data to do anything useful
  • This again points to the Best Practices to define the bare minimum data that is needed to do something useful. In other words, define the data that is required for any given feature. (This is similar to what Report 235 does in defining the fields that are required to calculate the different KPIs)
  • On the other hand, analysis of operational data is complex, varied, and open-ended. Trying to define ahead of time what fields are needed for any given analysis is tough to do. Better to leave it to the tools to define what data they need.
  • What if we have a strict definition, but know that not everyone will follow it? The downside of this is that it makes it harder to validate data.
  • Could we define a way to describe not only the data but also the errors that are present in the data?
  • Could we create a metadata table that accompanies the data files? It could be created by the data provider and/or by data validators, and would be read by tools to know what the data is. On the other hand, the tools could do this task on their own…
  • Defining a spec/status for raw data would help with the development and operation of cleaning tools.
  • Need to be able to provide a flexible spec that has a standardized data structure but does not require data to be complete and consistent.

Topic #2 - What are the nature of and relationship between trip_id_scheduled and trip_id_performed, and what is the relationship between the event tables and the trips_performed table?

Discussion

  • Some agencies have trip IDs that are unique and serial over time
  • This is not currently required by the TIDES spec
  • There was discussion of whether trip identifiers should be:
    • A surrogate key vs. a natural key
    • A primary key vs. a composite key (including date)

2022-10-26

Agenda

There are a few fundamental questions that have come up in work on a number of issues over the past few weeks.

  • Issue #71 asks an important question about how we intend to use the TIDES data specification. In theory, vehicle location and passenger events in the events data files are presumed to be associated with a specific trip (trip_id_performed). But sometimes the data that is available does not have trip information or that information is incorrect. So this issue also begs the question of whether data is in "TIDES format" only when it is complete and consistent, or if TIDES is also used to represent raw data that is still on its way to becoming a better version of itself. And if we do want to use TIDES to represent "incomplete" data, how do we communicate that?

  • Issue #51 (and Issue #70) address some details about the use of trip ids in the event data files and the trips_performed file. But they also introduce a deeper question about the relationship between trips_performed and the other files. Is trips_performed intended to truly be a summary file, only pulling information that is available in other files? Or is trips_performed also a source data file that contains new information that is needed to manage and move data?

  • A related issue I would like to discuss is whether we are thinking of TIDES as more a data table specification, in that it defines a structure of tables and fields that will always be present (even if they are blank) or is TIDES more of a data interchange specification that identifies a range of tables and fields that may or may not be present? This issue also has particular impact on how we think about the various tools that will be developed to support the use of TIDES and to leverage data in TIDES format.

My request is that if you have been involved in the discussion of one of these issues, please come prepared to provide a brief summary of the issue from your perspective and the options you see for resolution. We will start with those comments and then open the floor for questions and discussion.

In the end, I would like to try to come to consensus on the resolution of these issues. For me consensus is defined as everyone being able to "live with" the outcome. It doesn't mean everyone has to love it. We just all have to agree it is viable.

If we are not able to come to consensus, then I will most likely either a) make a decision so we can keep moving forward, or b) put the issue on hold for a future discussion.

For our next meeting, in just a week, I am hoping we will continue to work our way through any open issues that require discussion (see the Discuss flag in the issues list). Two specific issues I want to address are the limitations and challenges with the frictionless schema as the basis for TIDES (Issue #77) and how we want to structure the sample data files (Issue #76). If there are any other topics you want to add to that agenda, let me know.

Notes

Issue #71 asks an important question about how we intend to use the TIDES data specification. In theory, vehicle location and passenger events in the events data files are presumed to be associated with a specific trip (trip_id_performed). But sometimes the data that is available does not have trip information or that information is incorrect. So this issue also begs the question of whether data is in "TIDES format" only when it is complete and consistent, or if TIDES is also used to represent raw data that is still on its way to becoming a better version of itself. And if we do want to use TIDES to represent "incomplete" data, how do we communicate that?

Discussion

  • Some agencies just have breadcrumb records from GPS/AVL. They then use tools to match the records to the scheduled and/or operated trips.
  • Sometimes there are multiple AVL/GPS systems working in tandem both providing data
  • If TIDES is for upstream vendor data, then we need to define TIDES as working with incomplete data
  • Deadhead trips aren’t in schedules, but we still get data from them and want to manage and match that data
  • We need the spec to be able to handle these and other cases of incomplete data
  • On the one hand, defining a complete TIDES spec provides value and makes guarantees about what a system/tool can find in that data. It is okay to have incomplete data, but we can’t call that “TIDES”. Compliance with the spec means the data is all there.
  • On the other hand, calling partial data TIDES makes it clearer what other data is “still to come”
  • Do we define different levels of completeness? We could add this to the metadata for a file. Or is this too complex? Too many different possibilities; it wouldn’t be practical to try to define and maintain them all.
  • What effect will this have on governance and our ability to easily manage the spec?
  • Where to draw the line in defining TIDES is important in terms of how others work with it.

Issue #51 (and Issue #70) address some details about the use of trip ids in the event data files and the trips_performed file. But they also introduce a deeper question about the relationship between trips_performed and the other files. Is trips_performed intended to truly be a summary file, only pulling information that is available in other files? Or is trips_performed also a source data file that contains new information that is needed to manage and move data?

Discussion

  • In part this is a semantics issue regarding the terms “event” files and “summary” files.
  • Some summary files represent events and sometimes summary files will be generated directly from operational systems.
  • Some summary files are not really “summary” in that they may contain unique data (e.g., if trips_performed is generated from other sources)
  • In the end, group agreed that there are different ways for data to flow through TIDES, so we don’t want to be too strict in our definition of “summary” to allow for that.

2022-09-28 - Kickoff Meeting

Agenda

  • Brief “tour” of the TIDES-transit GitHub site and best practices for contributing
  • Review the current status of the specification, open issues, and opportunities to contribute
  • Current governance approach and plans to evolve governance over time
  • Plans for regular contributor meetings and communications

Notes

  • John provided a brief update on the progress of the TIDES project and introduced members of the Project Management Team
  • John explained the role of Cal-ITP and their contractors in providing assistance with setting up and managing the GitHub repository.
  • Jameelah introduced the repository and file structure. She is a primary point of contact for any issues using the site and can be reached at jameelah.y@jarv.us
  • Joey reviewed the spec files and described the Frictionless schema that we are using to define the files and fields.
    • Frictionless Schema Overview: https://specs.frictionlessdata.io/table-schema/#language
  • John briefly reviewed plans for future discussion of governance for the project

Questions from the chat:

• Michael Paine: Is there a minimum time windowing for the historical data? 1 second, 1 hour, 1 day?

• Nisar Ahmed: How is this spec different from the ODS that Cal-ITP has been working on?

• Eichler, Michael: Where is this effort compared to GTFS-ride? Is there any thought of finding some common ground or merging the efforts?

• https://docs.calitp.org/operational-data-standard/spec/

• https://github.com/ODOT-PTS/GTFS-ride/blob/master/spec/en/reference.md

• There was a brief conversation about ODS, GTFS-ride, and TIDES

• Brendon Hemily: Were you planning to have an in-person meeting at TRB?

• John: Yes. We are planning to have an in-person meeting. Details tbd.