feat: add GTFS Loader (#150)
* chore: add gtfs_kit dependency

* chore: add trips count to GTFSLoader

* chore: load directions from gtfs feed

* fix: make gtfs_kit import optional, fix imports in __init__.py, use WGS84_CRS constant

* chore: add example notebook for GTFSLoader

* chore: add GTFS feed validation

* fix: use warnings for gtfs validation error

* test: add pytest-mock dependency and gtfs loader validation tests

* test: add GTFSLoader tests

* chore: update CHANGELOG

* test: add gtfs_kit to optional dependencies tests

* fix(pre-commit.ci): auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
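
Two of the commits above make the `gtfs_kit` import optional and report feed validation problems as warnings rather than errors. A minimal, stdlib-only sketch of that general pattern follows. The helper names `is_available` and `warn_on_problems` are hypothetical and are not taken from the srai code base; how GTFSLoader actually implements these steps is not shown in this excerpt.

```python
import importlib.util
import warnings
from typing import List


def is_available(module_name: str) -> bool:
    """Return True if a top-level module can be found, without importing it."""
    # find_spec returns None when a top-level module is not installed.
    return importlib.util.find_spec(module_name) is not None


def warn_on_problems(problems: List[str]) -> bool:
    """Emit one warning per problem; return True only when there are none."""
    for problem in problems:
        # stacklevel=2 attributes the warning to the caller of this helper.
        warnings.warn(f"GTFS feed validation: {problem}", RuntimeWarning, stacklevel=2)
    return not problems


if __name__ == "__main__":
    # Hypothetical usage: guard an optional dependency before using it.
    if not is_available("gtfs_kit"):
        print("gtfs_kit is not installed; GTFS loading would be unavailable.")
    warn_on_problems(["example problem message"])
```

Warnings let the caller decide how strict to be: the default behaviour only prints a message, while `warnings.simplefilter("error")` turns the same problems into exceptions.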
piotrgramacki and pre-commit-ci[bot] authored Feb 13, 2023
1 parent 2bd6fac commit 199adea
Showing 14 changed files with 1,606 additions and 919 deletions.
2 changes: 1 addition & 1 deletion .flake8
@@ -1,7 +1,7 @@
[flake8]
max-line-length = 100
max-doc-length = 100
-extend-ignore = E203
+extend-ignore = E203,B028
exclude =
.git,
.venv,
1 change: 1 addition & 0 deletions .pre-commit-config.yaml
@@ -37,6 +37,7 @@ repos:
rev: v0.991
hooks:
- id: mypy
additional_dependencies: ["types-requests"]
- repo: https://github.com/pdm-project/pdm
rev: 2.4.3
hooks:
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -7,6 +7,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
## [Unreleased] - 2022-MM-DD

### Added
- GTFS Loader from gtfs2vec paper

### Changed
- Change embedders and joiners interface to have `.transform` method
1 change: 1 addition & 0 deletions examples/loaders/README.md
@@ -3,3 +3,4 @@
Examples illustrating the usage of every Loader.

- [GeoparquetLoader](geoparquet_loader.ipynb)
- [GTFSLoader](gtfs_loader.ipynb)
2 changes: 2 additions & 0 deletions examples/loaders/files/.gitignore
@@ -0,0 +1,2 @@
# example GTFS used in notebook
example.zip
127 changes: 127 additions & 0 deletions examples/loaders/gtfs_loader.ipynb
@@ -0,0 +1,127 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# GTFS Loader Example"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from pathlib import Path\n",
"from srai.loaders import GTFSLoader\n",
"import gtfs_kit as gk\n",
"import geopandas as gpd\n",
"import numpy as np\n",
"from shapely.geometry import Point\n",
"from srai.utils.constants import WGS84_CRS\n",
"from utils import download"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Download an example GTFS feed from Wroclaw, Poland\n",
"\n",
"This notebook uses the GTFS feed for Wroclaw, Poland as an example. The feed is published in Wroclaw's open data repository[1]. The next cell fetches it from transitfeeds.com[2], but you can also download the feed directly from the open data repository.\n",
"\n",
"1. https://www.wroclaw.pl/open-data/dataset/rozkladjazdytransportupublicznegoplik_data\n",
"2. https://transitfeeds.com/p/mpk-wroc-aw/663"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"wroclaw_gtfs = Path().resolve() / \"files\" / \"example.zip\"\n",
"gtfs_url = \"https://transitfeeds.com/p/mpk-wroc-aw/663/20221221/download\"\n",
"\n",
"download(gtfs_url, wroclaw_gtfs.as_posix())"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Peek at the feed using `gtfs_kit` directly"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"feed = gk.read_feed(wroclaw_gtfs, dist_units=\"km\")\n",
"\n",
"stops_df = feed.stops[[\"stop_id\", \"stop_lat\", \"stop_lon\"]].set_index(\"stop_id\")\n",
"stops_df[\"geometry\"] = stops_df.apply(lambda row: Point(row[\"stop_lon\"], row[\"stop_lat\"]), axis=1)\n",
"\n",
"stops_gdf = gpd.GeoDataFrame(\n",
" stops_df,\n",
" geometry=\"geometry\",\n",
" crs=WGS84_CRS,\n",
")\n",
"\n",
"stops_gdf.plot(markersize=1)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Use GTFSLoader to load stops statistics from the feed"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"gtfs_loader = GTFSLoader()\n",
"trips_gdf = gtfs_loader.load(wroclaw_gtfs)\n",
"\n",
"print(trips_gdf.columns)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.14"
},
"vscode": {
"interpreter": {
"hash": "f39c7279c85c8be5d827e53eddb5011e966102d239fe8b81ca4bd9f0123eda8f"
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}
28 changes: 28 additions & 0 deletions examples/loaders/utils.py
@@ -0,0 +1,28 @@
"""Utility functions for loaders examples."""
import requests
from tqdm import tqdm


def download(url: str, fname: str, chunk_size: int = 1024) -> None:
    """
    Download a file with progress bar.

    Args:
        url (str): URL to download.
        fname (str): File name.
        chunk_size (int): Chunk size.

    Source: https://gist.github.com/yanqd0/c13ed29e29432e3cf3e7c38467f42f51
    """
    resp = requests.get(url, stream=True)
    total = int(resp.headers.get("content-length", 0))
    with open(fname, "wb") as file, tqdm(
        desc=fname.split("/")[-1],
        total=total,
        unit="iB",
        unit_scale=True,
        unit_divisor=1024,
    ) as bar:
        for data in resp.iter_content(chunk_size=chunk_size):
            size = file.write(data)
            bar.update(size)
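
The `download` helper above streams the response body in fixed-size chunks and advances the progress bar by the number of bytes written on each iteration. The same pattern in isolation, as a stdlib-only sketch: `copy_in_chunks` and its `on_progress` callback are hypothetical names used for illustration and are not part of this commit.

```python
import io
from typing import BinaryIO, Callable, Optional


def copy_in_chunks(
    src: BinaryIO,
    dst: BinaryIO,
    chunk_size: int = 1024,
    on_progress: Optional[Callable[[int], None]] = None,
) -> int:
    """Copy src to dst chunk by chunk; return the total number of bytes written."""
    total = 0
    while True:
        data = src.read(chunk_size)
        if not data:  # an empty bytes object signals end of stream
            break
        written = dst.write(data)
        total += written
        if on_progress is not None:
            on_progress(written)  # plays the role of bar.update(size)
    return total


if __name__ == "__main__":
    source = io.BytesIO(b"x" * 2500)
    target = io.BytesIO()
    copied = copy_in_chunks(source, target, chunk_size=1024, on_progress=print)
    print(f"copied {copied} bytes")
```

Reporting the byte count returned by `write` rather than `chunk_size` keeps the progress total accurate when the last chunk is shorter than the others.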