Simplify region configurations #107

Open
AltNico opened this issue Dec 17, 2017 · 8 comments

@AltNico
Collaborator

AltNico commented Dec 17, 2017

The configuration of regions is already quite simple, but I think we can simplify it even more. My intention is to make it really easy for new regions to use osm2gtfs without having to configure a lot of things that could simply use sane defaults.

The result of this process should be a revision of the existing wiki article about the configuration, stating whether each configuration field is mandatory and, if not, which default is used.

For example, this is the current configuration of fenix:

{
    "query": {
        "bbox": {
            "n": "-27.2155",
            "s": "-27.9410",
            "e": "-48.2711",
            "w": "-49.0155"
        },
        "tags": {
            "route": "bus"
        }
    },
    "stops": {
        "name_without": "Ponto sem nome",
        "name_auto": "yes"
    },
    "agency": {
        "agency_id": "BR-Floripa",
        "agency_name": "Consórcio Fênix",
        "agency_url": "http://www.consorciofenix.com.br/",
        "agency_timezone": "America/Sao_Paulo",
        "agency_lang": "pt",
        "agency_phone": "+55 (48) 3025-6868",
        "agency_fare_url": ""
    },
    "feed_info": {
        "publisher_name": "Torsten Grote",
        "publisher_url": "https://transportr.grobox.de",
        "version":  "0.1"
    },
    "schedule_source": "http://www.consorciofenix.com.br/api2/linhas.json",
    "output_file": "data/br-floripa.zip",
    "selector": "fenix"
}

Here are some questions:

  • Do we need to specify a bbox if a given network name only exists once in the world? How would the performance be?
  • If a bbox is given, do we need to specify the route type or are the defaults enough?
  • Does every region need to define its own stops_without_name name or could we use an (internationalized) default name?
  • What is the default of stops->name_auto?
  • Do we need to define an agency or is it enough for testing purposes to just use some osm2gtfs default agency information?
  • Do we need to give information about the publisher or is osm2gtfs and the link to the repo enough, at least for testing purposes?
  • Do we need to specify a version or could this be automatically generated?
  • Do we need to specify the path of the output file or could some automatically generated default be used?
  • Do we need to specify a selector or could it be generated from the file name of the configuration?

For example, it would be really cool if we could use configurations like this:

{
    "query": {
        "tags": {
            "network": "NI-Estelí"
        }
    },
    "schedule_source": "https://github.com/mapanica/pt-data-esteli/blob/master/timetable.json"
}

Sure, the ability to configure everything is cool, but in my opinion we should not overwhelm users with it.
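A minimal sketch of how missing fields could be filled in from defaults before the configuration is used. The `DEFAULTS` values and the `apply_defaults` helper are assumptions for illustration only, not what osm2gtfs ships:

```python
import copy

# Hypothetical defaults; the values are placeholders, not osm2gtfs behaviour.
DEFAULTS = {
    "stops": {"name_without": "Unnamed stop", "name_auto": "no"},
}


def apply_defaults(config, defaults=DEFAULTS):
    """Return a new dict in which keys missing from config come from defaults.

    Nested dicts are merged recursively; values given in config always win.
    """
    merged = copy.deepcopy(defaults)
    for key, value in config.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = apply_defaults(value, merged[key])
        else:
            merged[key] = copy.deepcopy(value)
    return merged


minimal = {
    "query": {"tags": {"network": "NI-Estelí"}},
    "stops": {"name_auto": "yes"},
}
full = apply_defaults(minimal)
```

With this approach a region only writes the fields it wants to override, and the wiki could document the defaults table directly.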

@pantierra
Contributor

pantierra commented Dec 24, 2017

Hey, cool idea, I like it a lot. Will try to answer some of your questions:

Do we need to specify a bbox if a given network name only exists once in the world? How would the performance be?

If there were a clear way in OSM to unite networks and avoid name clashes (so, in an ideal world), no problem. In reality, we can't expect the data to be formally correct. Going for a bbox combined with freely applicable tags was the most flexible approach we thought we could take, and to cover more cases we probably want to keep offering this combination. For this issue, I would like to make the bbox optional. This should be possible, and performance then becomes the responsibility of whoever writes the config file that produces the query.
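As a rough illustration of an optional bbox, the following sketch builds an Overpass QL statement from the config's tags and adds the bounding box only when one is given. The function name `build_overpass_query` is hypothetical, and the way osm2gtfs actually assembles its query may differ:

```python
def build_overpass_query(tags, bbox=None):
    """Build an Overpass QL statement selecting relations by tag.

    tags: dict of OSM key/value filters, e.g. {"network": "NI-Estelí"}.
    bbox: optional dict with the keys "n", "s", "e", "w" as in the config.
    Values are inserted as-is; quoting of special characters is not handled.
    """
    filters = "".join(
        '["{}"="{}"]'.format(key, value) for key, value in sorted(tags.items())
    )
    area = ""
    if bbox is not None:
        # Overpass expects the order south, west, north, east.
        area = "({s},{w},{n},{e})".format(**bbox)
    return "relation{}{};".format(filters, area)


worldwide = build_overpass_query({"network": "NI-Estelí"})
bounded = build_overpass_query(
    {"route": "bus"},
    {"n": "-27.2155", "s": "-27.9410", "e": "-48.2711", "w": "-49.0155"},
)
```

Without a bbox the statement matches relations worldwide, which is where the performance cost mentioned above would show up.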

If a bbox is given, do we need to specify the route type or are the defaults enough?

Here too, we cannot expect a completely consistent schema on the OSM side. I think (optionally) using a bbox and combining it with tags is really the best way to allow this script to be used flexibly. But yes, we should make this optional and stick to simple defaults. In the past, public_transport:version=2 has been the tag used to select routes generically within a bbox.

Does every region need to define its own stops_without_name name or could we use an (internationalized) default name?

We could use a default name and surely make this optional!

What is the default of stops->name_auto?

It is a nice piece of logic that dates back to the first city this script was made for. It queries OSM for relevant places close to a stop that has no name set and, if one is found, assigns its name to the stop.

The default behaviour: not executing it, until opt-in:

if self.auto_stop_names:
    self._get_names_for_unnamed_stops()

Do we need to define an agency or is it enough for testing purposes to just use some osm2gtfs default agency information?

The GTFS specification should guide us here, and it seems to require some of this data. We probably cannot make this optional; it needs to be entered by a human.

Do we need to give information about the publisher or is osm2gtfs and the link to the repo enough, at least for testing purposes?

In the GTFS specs there are also required values for publisher name and url, etc. If it is required in GTFS I think we should not fill it in with dummy content.

Do we need to specify a version or could this be automatically generated?

No idea. Very good question.

Do we need to specify the path of the output file or could some automatically generated default be used?

I think we should provide a default of data/<SELECTOR>.zip and make this field optional.

Do we need to specify a selector or could it be generated from the file name of the configuration?

This is a tricky question. By default we support a file named config.json living in the osm2gtfs root; since it specifies no selector, it can only use the standard creators. If we decide that all creators should live in the creators directory and follow the naming convention from #83, we could derive the selector from that. But then we would also have to get rid of config.json, which would be a pity, because it is a very immediate entry point for using the script.
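A small sketch of how both defaults could fit together while keeping config.json as an entry point. The helper names, the idea of treating `config.json` as "no selector", and the `output` fallback name are assumptions for illustration:

```python
import os


def derive_selector(config_path):
    """Derive a selector from the config file name.

    Returns None for the generic config.json, so that the standard
    creators are used in that case (an assumption of this sketch).
    """
    name = os.path.splitext(os.path.basename(config_path))[0]
    return None if name == "config" else name


def default_output_file(selector):
    """Return data/<SELECTOR>.zip, falling back to a placeholder name."""
    return "data/{}.zip".format(selector or "output")


selector = derive_selector("fenix.json")
output_file = default_output_file(selector)
```

An explicit `selector` or `output_file` in the configuration would still take precedence over these derived values.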

@pantierra
Contributor

In the wiki, I added a general overview of the GTFS values, where they come from, and how they may be overridden. Maybe this list is also useful for thinking through the optimizations discussed in this issue.

@nlehuby
Collaborator

nlehuby commented Dec 26, 2017

GTFS agency matches pretty well with what OSM calls operator.
We may consider using the operator tag on route_master as the default instead of providing the agency in the config file. (The network tag could be a realistic fallback too.)
The main difficulty will be with the agency URL (which is a required field), as there is none in OSM.
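The fallback chain could look roughly like the sketch below. The function name `agency_name_from_tags` and its parameters are hypothetical; the `order` argument is there because the preferred tag is still open to discussion:

```python
def agency_name_from_tags(tags, configured=None, order=("operator", "network")):
    """Pick an agency name for the feed.

    An explicit value from the config file wins; otherwise the first
    non-empty OSM tag listed in `order` is used. Returns None if nothing fits.
    """
    candidates = [configured] + [tags.get(key) for key in order]
    for candidate in candidates:
        if candidate:
            return candidate
    return None


route_master_tags = {"operator": "Consórcio Fênix", "network": "BR-Floripa"}
from_operator = agency_name_from_tags(route_master_tags)
from_network = agency_name_from_tags(route_master_tags, order=("network", "operator"))
```

The agency URL has no counterpart here, so it would still have to come from the config file or from some agreed default.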

@prhod
Collaborator

prhod commented Dec 26, 2017

I also think the default behavior should be to use OSM data. But I prefer the network over the operator :p
For the URL, we could use the osm2gtfs GitHub URL (as the source of the feed, even if that is not what is expected). And for the TimeZone (also required), there may be a way to find the local one?

@ialokim
Contributor

ialokim commented Dec 27, 2017

And for the TimeZone (also required), there may be a way to find the local one?

Or even better, the timezone inside the bbox?

@nlehuby
Collaborator

nlehuby commented Dec 28, 2017

Here is an open API to find the right timezone: https://timezones-api.now.sh/timezones-4fbc08f/by_point.json?longitude=-0.1406632&latitude=50.8246776
The source data for timezones is derived from OSM.

@Skippern

Skippern commented Dec 28, 2017 via email

@prhod
Collaborator

prhod commented Dec 30, 2017

I looked closely at the GTFS specification on this. The reference time zone is the one of the agency's location. Stops may be in a different time zone, but stop_times are specified in the agency time zone. (Be careful: if the feed contains several agencies, they must all use the same time zone.)
When looking at @nlehuby's API, there is a Spatialite database in the source with the shapes of the time zones. I can think of these methods:

Use the API:

  • calculate the centroid of all stops and ask the external API for its time zone
  • call the external API for every stop (not very efficient!)

Use the underlying DB:

  • select the time zone with the largest intersection with the bbox
  • select the time zone of the centroid (using the API is better for this, I think)
  • get the time zone of every stop and use the most frequent one for the agencies

What do you think?
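Two of the options above, the centroid and the most frequent time zone, can be sketched independently of where the lookup comes from. Here `lookup_timezone` is a hypothetical callable standing in for either the external API or the underlying database, and the stand-in `fake_lookup` exists only to make the example runnable:

```python
from collections import Counter


def centroid(stops):
    """Return the arithmetic mean (lat, lon) of a list of (lat, lon) stops.

    This simple mean ignores wrap-around at the antimeridian.
    """
    lats = [lat for lat, _ in stops]
    lons = [lon for _, lon in stops]
    return (sum(lats) / len(lats), sum(lons) / len(lons))


def most_common_timezone(stops, lookup_timezone):
    """Look up the time zone of every stop and return the most frequent one."""
    counts = Counter(lookup_timezone(lat, lon) for lat, lon in stops)
    return counts.most_common(1)[0][0]


def fake_lookup(lat, lon):
    # Stand-in for a real lookup; the -50 boundary is made up for the demo.
    return "America/Sao_Paulo" if lon > -50 else "America/Campo_Grande"


stops = [(-27.5, -48.5), (-27.6, -48.6), (-27.0, -51.0)]
agency_timezone = most_common_timezone(stops, fake_lookup)
centre_timezone = fake_lookup(*centroid(stops))
```

The centroid variant needs a single lookup, while the majority vote needs one per stop, which matches the efficiency concern raised above.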

6 participants