feat: add GTFS Loader (#150)

* chore: add gtfs_kit dependency
* chore: add trips count to GTFSLoader
* chore: load directions from gtfs feed
* fix: make gtfs_kit import optional, fix imports in __init__.py, use WGS84_CRS constant
* chore: add example notebook for GTFSLoader
* chore: add GTFS feed validation
* fix: use warnings for gtfs validation error
* test: add pytest-mock dependency and gtfs loader validation tests
* test: add GTFSLoader tests
* chore: update CHANGELOG
* test: add gtfs_kit to optional dependencies tests
* fix(pre-commit.ci): auto fixes from pre-commit.com hooks

  for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
kraina-ai · Feb 13, 2023 · 199adea
1 parent 2bd6fac
commit 199adea
Showing 14 changed files with 1,606 additions and 919 deletions.
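
One item in the commit message, making the `gtfs_kit` import optional, follows a common pattern for optional dependencies. The block below is a minimal sketch of that pattern, not the code from this commit; `import_optional` and its `extra_hint` parameter are hypothetical names, and the actual srai implementation may differ.

```python
import importlib
from types import ModuleType


def import_optional(module_name: str, extra_hint: str = "") -> ModuleType:
    """Import a module, raising a readable error if it is not installed.

    Hypothetical helper for illustration; not part of srai.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        message = f"Optional dependency '{module_name}' is not installed."
        if extra_hint:
            message += f" {extra_hint}"
        raise ImportError(message) from exc


# A module that exists is returned as usual.
json_module = import_optional("json")

# A missing module produces an ImportError with an actionable message.
try:
    import_optional("surely_not_installed_module_xyz", "Try: pip install <package>.")
except ImportError as err:
    missing_message = str(err)
```

Deferring the import to the point of use means the rest of the package still imports cleanly when the optional dependency is absent, and the user only sees the error when they reach for the feature that needs it.
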
diff --git a/.flake8 b/.flake8
@@ -1,7 +1,7 @@
 [flake8]
 max-line-length = 100
 max-doc-length = 100
-extend-ignore = E203
+extend-ignore = E203,B028
 exclude =
     .git,
     .venv,
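
As far as I can tell, `B028` is the flake8-bugbear check that flags `warnings.warn` calls without an explicit `stacklevel`, which would fit this commit's switch to warnings for GTFS validation errors. The block below is a minimal sketch of what passing `stacklevel` looks like. `validate_feed` and the problem strings are made up for illustration and are not the loader's API.

```python
import warnings
from typing import List


def validate_feed(problems: List[str]) -> None:
    """Warn once per validation problem instead of raising (hypothetical helper)."""
    for problem in problems:
        # stacklevel=2 attributes the warning to the caller of validate_feed,
        # not to this line; omitting the argument is what B028 reports.
        warnings.warn(f"GTFS validation: {problem}", RuntimeWarning, stacklevel=2)


# Made-up problem strings, used only to exercise the helper.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    validate_feed(["missing stop_times.txt", "invalid route_type"])

messages = [str(w.message) for w in caught]
```

Ignoring `B028` project-wide keeps the default `stacklevel=1`, so warnings point at the line inside the library that emitted them.
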

diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
@@ -37,6 +37,7 @@ repos:
     rev: v0.991
     hooks:
       - id: mypy
+        additional_dependencies: ["types-requests"]
   - repo: https://github.com/pdm-project/pdm
     rev: 2.4.3
     hooks:

diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -7,6 +7,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ## [Unreleased] - 2022-MM-DD
 
 ### Added
+- GTFS Loader from gtfs2vec paper
 
 ### Changed
 - Change embedders and joiners interface to have `.transform` method

diff --git a/examples/loaders/README.md b/examples/loaders/README.md
@@ -3,3 +3,4 @@
 Examples illustrating the usage of every Loader.
 
 - [GeoparquetLoader](geoparquet_loader.ipynb)
+- [GTFSLoader](gtfs_loader.ipynb)
diff --git a/examples/loaders/files/.gitignore b/examples/loaders/files/.gitignore
@@ -0,0 +1,2 @@
+# example GTFS used in notebook
+example.zip
diff --git a/examples/loaders/gtfs_loader.ipynb b/examples/loaders/gtfs_loader.ipynb
@@ -0,0 +1,127 @@
+{
+ "cells": [
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# GTFS Loader Example"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from pathlib import Path\n",
+    "from srai.loaders import GTFSLoader\n",
+    "import gtfs_kit as gk\n",
+    "import geopandas as gpd\n",
+    "import numpy as np\n",
+    "from shapely.geometry import Point\n",
+    "from srai.utils.constants import WGS84_CRS\n",
+    "from utils import download"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Download an example GTFS feed from Wroclaw, Poland\n",
+    "\n",
+    "This notebook uses the GTFS feed for Wroclaw, Poland, which is published in Wroclaw's open data repository[1]. The next cell fetches a snapshot of the feed from transitfeeds.com[2], but you can also download it directly from the Wroclaw open data repository.\n",
+    "\n",
+    "1. https://www.wroclaw.pl/open-data/dataset/rozkladjazdytransportupublicznegoplik_data\n",
+    "2. https://transitfeeds.com/p/mpk-wroc-aw/663"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "wroclaw_gtfs = Path().resolve() / \"files\" / \"example.zip\"\n",
+    "gtfs_url = \"https://transitfeeds.com/p/mpk-wroc-aw/663/20221221/download\"\n",
+    "\n",
+    "download(gtfs_url, wroclaw_gtfs.as_posix())"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Peek at the feed using `gtfs_kit` directly"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "feed = gk.read_feed(wroclaw_gtfs, dist_units=\"km\")\n",
+    "\n",
+    "stops_df = feed.stops[[\"stop_id\", \"stop_lat\", \"stop_lon\"]].set_index(\"stop_id\")\n",
+    "stops_df[\"geometry\"] = stops_df.apply(lambda row: Point(row[\"stop_lon\"], row[\"stop_lat\"]), axis=1)\n",
+    "\n",
+    "stops_gdf = gpd.GeoDataFrame(\n",
+    "    stops_df,\n",
+    "    geometry=\"geometry\",\n",
+    "    crs=WGS84_CRS,\n",
+    ")\n",
+    "\n",
+    "stops_gdf.plot(markersize=1)"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Use GTFSLoader to load stops statistics from the feed"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "gtfs_loader = GTFSLoader()\n",
+    "trips_gdf = gtfs_loader.load(wroclaw_gtfs)\n",
+    "\n",
+    "print(trips_gdf.columns)"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": ".venv",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.8.14"
+  },
+  "vscode": {
+   "interpreter": {
+    "hash": "f39c7279c85c8be5d827e53eddb5011e966102d239fe8b81ca4bd9f0123eda8f"
+   }
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/examples/loaders/utils.py b/examples/loaders/utils.py
@@ -0,0 +1,28 @@
+"""Utility functions for loaders examples."""
+import requests
+from tqdm import tqdm
+
+
+def download(url: str, fname: str, chunk_size: int = 1024) -> None:
+    """
+    Download a file with progress bar.
+
+    Args:
+        url (str): URL to download.
+        fname (str): File name.
+        chunk_size (int): Chunk size in bytes.
+
+    Source: https://gist.github.com/yanqd0/c13ed29e29432e3cf3e7c38467f42f51
+    """
+    resp = requests.get(url, stream=True)
+    total = int(resp.headers.get("content-length", 0))
+    with open(fname, "wb") as file, tqdm(
+        desc=fname.split("/")[-1],
+        total=total,
+        unit="iB",
+        unit_scale=True,
+        unit_divisor=1024,
+    ) as bar:
+        for data in resp.iter_content(chunk_size=chunk_size):
+            size = file.write(data)
+            bar.update(size)
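
The core of `download` is the loop that writes each chunk and advances the progress bar by the number of bytes written. The block below is a minimal sketch of that loop with the network request and `tqdm` removed, so it can be run on its own. `write_chunks` is a hypothetical name, and the byte strings stand in for what `resp.iter_content(...)` would yield.

```python
import io
from typing import BinaryIO, Iterable


def write_chunks(chunks: Iterable[bytes], file: BinaryIO) -> int:
    """Write chunks to file and return the total number of bytes written."""
    total = 0
    for data in chunks:
        # write() returns the number of bytes written, which is the value
        # the original loop passes to bar.update().
        total += file.write(data)
    return total


# In-memory stand-ins for the HTTP response body and the output file.
buffer = io.BytesIO()
written = write_chunks([b"abc", b"", b"defg"], buffer)
```

Because the progress bar is advanced by the return value of `write()` and not by `chunk_size`, a short final chunk or an empty chunk is counted correctly.
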