GV Scraper gathers bus information for Grande Vitória from different companies and sources, in different formats, for use with osm2gtfs.
The scraper script downloads the timetables published as PDF files on the websites of Expresso Lorenzutti and Sanremo, extracts the timetable data, and stores it as a JSON file.
For Transcol and Seletivo, it uses the same JSON interface that the Ceturb site uses.
For Planeta, timetables are posted as HTML tables. Each variation is treated as a separate route, with the page index used as the ref tag on the route.
The JSON format is developed in collaboration with the developers of osm2gtfs for full functionality.
The script requires the pdfminer, requests, overpass, logging, json, workalendar and datetime Python modules and runs under Python 2.7.
Install dependencies by running
pip install -r requirements.txt
- osrm needs to be installed manually. If it is not installed, or if the installed module fails to import, the script falls back to YOURS over requests.
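The fallback described above can be sketched as an import check. This is only an illustration, not the project's actual code: the function name `pick_routing_backend` and the `"yours"` label are hypothetical, and the sketch assumes the osrm module is imported under the name `osrm`.

```python
import importlib


def pick_routing_backend(preferred="osrm", fallback="yours"):
    """Return the name of the routing backend to use.

    `preferred` is the module name to try importing. `fallback` is only a
    label for the requests-based YOURS path (both names are illustrative).
    """
    try:
        importlib.import_module(preferred)
    except Exception:
        # A missing module raises ImportError; a broken install may raise
        # something else, so this sketch treats any failure as "unavailable".
        return fallback
    return preferred


backend = pick_routing_backend()
print("Routing with: {}".format(backend))
```

Catching every exception is a deliberate choice here, since the text treats "not installed" and "installed but not importing" the same way.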
In each folder, to obtain the duration of the routes, just run get_durations.py. To generate a times.json file for osm2gtfs once durations.json is up to date, just run get_times.py.
There is a separate script, get_durations.py, that tests the route relations against OSRM to generate a list of durations. This script only needs to be run when significant changes have been made to the itinerary, or when new routes have been added. Note that it will not erase the durations of routes that have been discontinued.
Routing is done by selecting the route relation in question with an overpass query and creating a list of waypoints that is passed to the selected routing engine.
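As a rough sketch of the waypoint step, the function below turns an Overpass JSON response (as returned with `[out:json]`) into an ordered list of coordinates. It assumes that only node members with the role `stop` become waypoints. That choice, the function name, and the sample data are assumptions for illustration, not taken from the script.

```python
def waypoints_from_overpass(response, stop_roles=("stop",)):
    """Return [(lat, lon), ...] for stop nodes, in relation member order."""
    elements = response.get("elements", [])
    # Index node coordinates by node id.
    nodes = {}
    for el in elements:
        if el["type"] == "node" and "lat" in el and "lon" in el:
            nodes[el["id"]] = (el["lat"], el["lon"])

    waypoints = []
    for el in elements:
        if el["type"] != "relation":
            continue
        for member in el.get("members", []):
            is_stop = (member["type"] == "node"
                       and member.get("role") in stop_roles)
            # Skip stops whose coordinates are missing from the response.
            if is_stop and member["ref"] in nodes:
                waypoints.append(nodes[member["ref"]])
    return waypoints


# Hand-written sample in the Overpass JSON shape (ids and coordinates are made up).
sample = {"elements": [
    {"type": "relation", "id": 1, "members": [
        {"type": "node", "ref": 10, "role": "stop"},
        {"type": "way", "ref": 20, "role": ""},
        {"type": "node", "ref": 11, "role": "stop"},
    ]},
    {"type": "node", "id": 10, "lat": -20.31, "lon": -40.29},
    {"type": "node", "id": 11, "lat": -20.33, "lon": -40.30},
]}
print(waypoints_from_overpass(sample))
```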
If there is no route relation for a specific route, it returns a duration of -1, which signals the scraper to test against the default value (60). Note that routes without a relation will not be handled by osm2gtfs either. Other negative values have different meanings; in short, they mean that no relation was found or that the route could not be calculated due to missing waypoints.
- -1: Route has no valid stop positions.
- -2: Route has only one valid stop position, and it is neither the start nor the end.
- -3: Route doesn't start with a valid stop position.
- -4: Route doesn't end with a valid stop position.
- -5: No valid routes found.
- -6: Circular route (same start and end position) with no additional stops.
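One way a consumer of durations.json might interpret these codes is sketched below. The names `DURATION_ERRORS` and `resolve_duration` are hypothetical, and falling back to the default for every negative code (not only -1) is an assumption made for the sketch.

```python
DEFAULT_DURATION = 60  # the default value named in the text above

# Meanings of the negative codes, as listed above.
DURATION_ERRORS = {
    -1: "Route has no valid stop positions",
    -2: "Route has only one valid stop position, neither start nor end",
    -3: "Route does not start with a valid stop position",
    -4: "Route does not end with a valid stop position",
    -5: "No valid routes found",
    -6: "Circular route with no additional stops",
}


def resolve_duration(value, default=DEFAULT_DURATION):
    """Return (duration, error); error is None when the duration is usable."""
    if value < 0:
        # Assumption: any negative code falls back to the default duration.
        return default, DURATION_ERRORS.get(value, "Unknown error code")
    return value, None


print(resolve_duration(-3))
print(resolve_duration(42))
```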
get_durations.py depends, in addition to the modules mentioned above, on the overpass and osrm Python modules.
If osrm is not installed, or the installation doesn't work, routing can be handled by a YOURS web interface using requests calls. This is meant only as a fallback: YOURS routes between two nodes at a time, so a long route requires a series of calls, whereas osrm can take the entire waypoint list in one call.
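The difference between the two engines comes down to how the waypoint list is consumed. A minimal sketch of the leg-by-leg approach follows. `route_leg` is a hypothetical callable standing in for one YOURS request between two points, and the stub below returns a fixed value instead of calling any web service.

```python
def total_duration(waypoints, route_leg):
    """Sum leg durations over consecutive waypoint pairs.

    `route_leg(start, end)` is a placeholder for a single two-point
    routing call; it must return the duration of that leg.
    """
    if len(waypoints) < 2:
        raise ValueError("need at least two waypoints")
    # Pair each waypoint with its successor: (w0, w1), (w1, w2), ...
    legs = zip(waypoints[:-1], waypoints[1:])
    return sum(route_leg(start, end) for start, end in legs)


def fake_leg(start, end):
    # Stub for illustration only: every leg takes 5 units.
    return 5


stops = [(-20.31, -40.29), (-20.32, -40.30), (-20.33, -40.31)]
print(total_duration(stops, fake_leg))
```

With N waypoints this makes N - 1 calls, which is why the text treats it as a fallback.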
For routes such as Transcol, I have added feriados.py, which requires the workalendar Python module. workalendar provides a system for handling holidays, and feriados.py uses it to create different lists of holidays within a given year. This way, exceptions can be handled in an intelligent manner. workalendar handles fixed holidays as well as moving holidays.
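To show how a list of holidays can drive exception handling, the sketch below classifies a date into a service day type. It uses only the standard library and a hand-written holiday list in the `(date, label)` shape that workalendar's `holidays(year)` returns. The function name and the three category labels are assumptions, not what feriados.py does.

```python
from datetime import date


def service_day(day, holidays):
    """Classify `day` given a list of (date, label) holiday tuples."""
    holiday_dates = set(d for d, _label in holidays)
    # weekday(): Monday is 0, Saturday is 5, Sunday is 6.
    if day in holiday_dates or day.weekday() == 6:
        return "sunday_or_holiday"
    if day.weekday() == 5:
        return "saturday"
    return "weekday"


# Hand-written sample; in practice the list would come from workalendar.
holidays_2018 = [
    (date(2018, 1, 1), "New year"),
    (date(2018, 4, 21), "Tiradentes' Day"),
]

print(service_day(date(2018, 1, 1), holidays_2018))
print(service_day(date(2018, 1, 2), holidays_2018))
```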
- Lorenzutti (Guarapari - PDF)
- Sanremo (Vila Velha - PDF)
- Seletivo (Grande Vitória/Ceturb - JSON)
- Transcol (Grande Vitória/Ceturb - JSON)
- Viação Grande Vitória (Vitória - HTML)
- Flecha Branca (Cachoeiro de Itapemirim - format not identified)
- NovoTrans (Cachoeiro de Itapemirim - format not identified)
- Cartão Melhor (Cachoeiro de Itapemirim - format not identified)
- Planeta (HTML)
- Santa Luzia (format not identified)
- Alvorada: The site does not publish times except for the airport express, and has no ref numbers.
- Águia Branca: The site contains no useful API, but lets you buy tickets for destinations; it can possibly be used to verify ref tags.
- Sudeste
- Real Ita
- Viação Joana d'Arc
- Viação Pretti
- Viação São Gabriel
- Mutum Preto
- Viação Marilândia
- Cordial
- Lírio dos Vales
This will not be pursued. If a proper API can be found, it can be done per company, but mapping such routes can also be challenging, as some of them span the entire territory of Brazil. It would be preferable for these companies to supply their own GTFS sources.
- EFVM (Estrada Ferroviaria Vitoria Minas). Static times.json file.
- DER-ES: Quadro de Horários. List of all intercity concessions given by DER-ES with contracted timetables. This shows the contracts with the state, not necessarily the reality. Tables updated infrequently. Moved to CETURB-ES.
- CETURB-ES: Quadro de Horários. List of all intercity concessions given by CETURB-ES with contracted timetables. This shows the contracts with the state, not necessarily the reality. Tables updated infrequently.
- ANTT: Consulta Linhas que Fazem Ligação entre Duas Localidades. Look up companies and lines connecting two locations. Useful for looking up interstate bus connections.
- ANTT: Informações de empresas, linhas, veículos e seguro. Lists different pages for looking up companies that serve certain points, link locations, or similar. Useful for looking up interstate bus connections.