RunPandas - Python Package for handling running data, from GPS-enabled devices to worldwide race results.
RunPandas is a project to add support for data collected by GPS-enabled tracking devices and heart rate monitors to [pandas](http://pandas.pydata.org) objects. It is a Python package that provides infrastructure for importing tracking data from such devices, enabling statistical and visual analysis for running enthusiasts. Its goal is to fill the gap between the routine collection of data and its manual analysis in pandas and Python.
Since release 0.6.0, it also supports race event results, so we can analyze race split times, finish times, demographics, etc. The goal is to support results from many races and make them available to anyone interested in running race results analytics.
Stable documentation is available on github.io. A second copy of the stable documentation is hosted on Read the Docs for more details.
Development documentation is available for the latest changes in master.
==> Check out this Blog post for the reasoning and philosophy behind Runpandas, as well as a detailed tutorial with code examples.
==> Follow this Runpandas live book in Jupyter notebook format based on Jupyter Books.
RunPandas depends on the following packages:
- pandas
- fitparse
- stravalib
- pydantic
- pyaml
- haversine
- thefuzz
Runpandas was tested to work on *nix-like systems, including macOS.
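Install the latest release from PyPI:
$ pip install runpandas
or from conda:
$ conda install -c marcelcaraciolo runpandas
The development version can be installed directly from GitHub:
$ pip install git+https://github.com/corriporai/runpandas.git
or from a local clone:
$ git clone https://github.com/corriporai/runpandas.git
$ cd runpandas
$ python setup.py install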
Install using pip and then import and use one of the tracking readers. This example loads a local TCX file (sample.tcx). From the data file we get time, altitude, distance, heart rate and geo position (lat/long).
# !pip install runpandas
import runpandas as rpd
activity = rpd.read_file('./sample.tcx')
activity.head(5)
time | alt | dist | hr | lon | lat
---|---|---|---|---|---
00:00:00 | 178.942627 | 0.000000 | 62.0 | -79.093187 | 35.951880 |
00:00:01 | 178.942627 | 0.000000 | 62.0 | -79.093184 | 35.951880 |
00:00:06 | 178.942627 | 1.106947 | 62.0 | -79.093172 | 35.951868 |
00:00:12 | 177.500610 | 13.003035 | 62.0 | -79.093228 | 35.951774 |
00:00:16 | 177.500610 | 22.405027 | 60.0 | -79.093141 | 35.951732 |
The data frames returned by runpandas when loading files are similar across file types. The dataframe in the example above is a subclass of pandas.DataFrame and provides some additional features. Certain columns also return specific pandas.Series subclasses, which provide useful methods:
print(type(activity))
print(type(activity.alt))
<class 'runpandas.types.frame.Activity'>
<class 'runpandas.types.columns.Altitude'>
For instance, to get the base unit of the altitude (alt) or distance (dist) data:
print(activity.alt.base_unit)
print(activity.alt.sum())
m
65883.68151855901
print(activity.dist.base_unit)
print(activity.dist.iloc[-1])
m
4686.31103516
The Activity dataframe also contains special properties that present some statistics from the workout, such as elapsed time, mean heart rate, moving time and the workout distance in meters.
#total time elapsed for the activity
print(activity.ellapsed_time)
#distance of workout in meters
print(activity.distance)
#mean heartrate
print(activity.mean_heart_rate())
0 days 00:33:11
4686.31103516
156.65274151436032
Occasionally, some observations such as speed and distance must be calculated from the data available in the given activity. runpandas provides special accessors (runpandas.acessors) that compute some of these metrics. We will compute the speed and the distance per position from the latitude and longitude of each record: the haversine distance in meters and the speed in meters per second.
#compute the distance using haversine formula between two consecutive latitude, longitudes observations.
activity['distpos'] = activity.compute.distance()
activity['distpos'].head()
time
00:00:00          NaN
00:00:01     0.333146
00:00:06     1.678792
00:00:12    11.639901
00:00:16     9.183847
Name: distpos, dtype: float64
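runpandas lists the haversine package among its dependencies for this step. To make the formula concrete, here is a minimal standard-library sketch of the haversine distance between consecutive points. It is not runpandas' code: the helper names are hypothetical and the mean Earth radius of 6,371,008.8 m is an assumption.

```python
import math

# Assumption: mean Earth radius in meters (the value used by the `haversine` package).
EARTH_RADIUS_M = 6371008.8


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def distances_per_position(points):
    """Distance from each (lat, lon) point to the previous one; None for the first point."""
    return [None] + [haversine_m(*prev, *curr) for prev, curr in zip(points, points[1:])]


# (lat, lon) pairs as rounded in the sample table; because of that rounding,
# the results only approximate the distpos column.
points = [(35.951880, -79.093187), (35.951880, -79.093184), (35.951868, -79.093172)]
print(distances_per_position(points))
```

The first entry is None because the first observation has no predecessor, mirroring the NaN in the distpos output.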
#compute the speed (m/s) from the per-position distances and the time elapsed between consecutive observations.
activity['speed'] = activity.compute.speed(from_distances=True)
activity['speed'].head()
time
00:00:00         NaN
00:00:01    0.333146
00:00:06    0.335758
00:00:12    1.939984
00:00:16    2.295962
Name: speed, dtype: float64
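The values above are consistent with dividing each distpos value by the seconds elapsed since the previous observation (for example, 1.678792 m / 5 s ≈ 0.335758 m/s). A minimal sketch of that calculation, using a hypothetical helper and the sample values copied from the outputs above:

```python
def speeds_from_distances(seconds, distpos):
    """Speed in m/s: distance since the previous point divided by the time since it."""
    speeds = [None]  # the first record has no previous observation
    for i in range(1, len(seconds)):
        dt = seconds[i] - seconds[i - 1]
        d = distpos[i]
        speeds.append(d / dt if d is not None and dt > 0 else None)
    return speeds


# Elapsed seconds (00:00:00, 00:00:01, 00:00:06, ...) and distpos values from the output above.
seconds = [0, 1, 6, 12, 16]
distpos = [None, 0.333146, 1.678792, 11.639901, 9.183847]
speeds = speeds_from_distances(seconds, distpos)
print([None if s is None else round(s, 6) for s in speeds])
```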
Popular running metrics such as gradient, pace and vertical speed are also available through the runpandas accessors.
activity['vam'] = activity.compute.vertical_speed()
activity['vam'].head()
time
00:00:00         NaN
00:00:01    0.000000
00:00:06    0.000000
00:00:12   -0.240336
00:00:16    0.000000
Name: vam, dtype: float64
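Likewise, the vam values match the change in altitude divided by the elapsed seconds (177.500610 - 178.942627 = -1.442017 m over 6 s ≈ -0.240336 m/s). A sketch of that rate-of-change computation under the same alignment as the speed example; the helper name is hypothetical:

```python
def vertical_speeds(seconds, altitudes):
    """Vertical speed in m/s: altitude change since the previous point / time since it."""
    result = [None]  # the first record has no previous observation
    for i in range(1, len(seconds)):
        dt = seconds[i] - seconds[i - 1]
        result.append((altitudes[i] - altitudes[i - 1]) / dt if dt > 0 else None)
    return result


# Elapsed seconds and alt values from the first rows of the sample activity.
seconds = [0, 1, 6, 12, 16]
altitudes = [178.942627, 178.942627, 178.942627, 177.500610, 177.500610]
vam = vertical_speeds(seconds, altitudes)
print(vam)
```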
Sporadically, there will be a large time difference between consecutive observations in the same workout. This can happen when the device is paused by the athlete, or when proprietary algorithms controlling the device's sampling rate auto-pause it because no significant change in position is detected. runpandas includes an algorithm that attempts to calculate the moving time based on the GPS locations, distances, and speed of the activity.
To compute the moving time, there is a special method, only_moving(), that detects the periods of inactivity and adds a boolean moving column flagging each observation as moving (True) or stopped (False).
activity_only_moving = activity.only_moving()
print(activity_only_moving['moving'].head())
time
00:00:00    False
00:00:01    False
00:00:06    False
00:00:12     True
00:00:16     True
Name: moving, dtype: bool
Now we can compute the moving time, i.e. how long the athlete was actually moving.
activity_only_moving.moving_time
Timedelta('0 days 00:33:05')
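runpandas' own algorithm takes GPS positions, distances and speed into account. The sketch below is a deliberately simplified, speed-only illustration, not runpandas' method: it flags an observation as moving when its speed reaches a hypothetical threshold of 1 m/s and sums the time gaps that end in a moving observation. With the sample speeds above it reproduces the moving flags shown, but both the threshold and the summation rule are assumptions.

```python
# Hypothetical threshold (m/s) below which an observation is treated as stopped.
MIN_MOVING_SPEED = 1.0


def moving_flags(speeds, min_speed=MIN_MOVING_SPEED):
    """True for observations whose speed reaches the threshold; the first one has no speed."""
    return [s is not None and s >= min_speed for s in speeds]


def moving_time_seconds(seconds, flags):
    """Sum the time gaps that end in an observation flagged as moving."""
    return sum(seconds[i] - seconds[i - 1] for i in range(1, len(seconds)) if flags[i])


seconds = [0, 1, 6, 12, 16]
speeds = [None, 0.333146, 0.335758, 1.939984, 2.295962]
flags = moving_flags(speeds)
print(flags)
print(moving_time_seconds(seconds, flags), 'seconds moving')
```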
Runpandas also provides a summary method for summarising the activity through common statistics. Such a session summary includes estimates of several of the metrics computed above in a single call.
activity_only_moving.summary()
Session                           Running: 26-12-2012 21:29:53
Total distance (meters)                                4686.31
Total ellapsed time                            0 days 00:33:11
Total moving time                              0 days 00:33:05
Average speed (km/h)                                   8.47656
Average moving speed (km/h)                            8.49853
Average pace (per 1 km)                        0 days 00:07:04
Average pace moving (per 1 km)                 0 days 00:07:03
Average cadence                                            NaN
Average moving cadence                                     NaN
Average heart rate                                     156.653
Average moving heart rate                                157.4
Average temperature                                        NaN
dtype: object
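The pace figures follow from the speeds: pace per km in seconds is 3600 divided by the speed in km/h (and km/h is m/s × 3.6). The summary above appears to truncate to whole seconds (3600 / 8.47656 ≈ 424.7 s is shown as 00:07:04), which this sketch assumes; the helper names are hypothetical:

```python
import datetime


def pace_per_km(speed_kmh):
    """Time needed to cover 1 km at speed_kmh, truncated to whole seconds."""
    return datetime.timedelta(seconds=int(3600 / speed_kmh))


def kmh_from_ms(speed_ms):
    """Convert meters per second (as in the speed column) to kilometers per hour."""
    return speed_ms * 3.6


print(pace_per_km(8.47656))  # 0:07:04 (average speed from the summary above)
print(pace_per_km(8.49853))  # 0:07:03 (average moving speed from the summary above)
```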
Now, let’s play with the data. Let’s show distance vs time as an example of what visualizations we can create and how. In this example, we will use the built-in, matplotlib-based plot function.
activity[['dist']].plot()
<AxesSubplot:xlabel='time'>
And here is altitude versus time.
activity[['alt']].plot()
<AxesSubplot:xlabel='time'>
Next, let’s show the altitude vs distance profile. Here is a scatterplot that shows altitude vs distance as recorded.
activity.plot.scatter(x='dist', y='alt', c='DarkBlue')
<AxesSubplot:xlabel='dist', ylabel='alt'>
Finally, let’s get a glimpse of the route by plotting a 2D map of longitude vs latitude.
activity.plot(x='lon', y='lat')
<AxesSubplot:xlabel='lon'>
The runpandas package also comes with extra batteries, such as our runpandas.datasets package, which includes a range of example data for testing purposes. There is a dedicated repository with all the data available. An index of the data is kept here.
You can use the example data available:
example_fit = rpd.activity_examples(path='Garmin_Fenix_6S_Pro-Running.fit')
print(example_fit.summary)
print('Included metrics:', example_fit.included_data)
Synced from watch Garmin Fenix 6S
Included metrics: [<MetricsEnum.latitude: 'latitude'>, <MetricsEnum.longitude: 'longitude'>, <MetricsEnum.elevation: 'elevation'>, <MetricsEnum.heartrate: 'heartrate'>, <MetricsEnum.cadence: 'cadence'>, <MetricsEnum.distance: 'distance'>, <MetricsEnum.temperature: 'temperature'>]
rpd.read_file(example_fit.path).head()
time | enhanced_speed | enhanced_altitude | unknown_87 | fractional_cadence | lap | session | unknown_108 | dist | cad | hr | lon | lat | temp
---|---|---|---|---|---|---|---|---|---|---|---|---|---
00:00:00 | 0.000 | 254.0 | 0 | 0.0 | 0 | 0 | NaN | 0.00 | 0 | 101 | 13.843376 | 51.066280 | 8 |
00:00:01 | 0.000 | 254.0 | 0 | 0.0 | 0 | 0 | NaN | 0.00 | 0 | 101 | 13.843374 | 51.066274 | 8 |
00:00:10 | 1.698 | 254.0 | 0 | 0.0 | 0 | 1 | 2362.0 | 0.00 | 83 | 97 | 13.843176 | 51.066249 | 8 |
00:00:12 | 2.267 | 254.0 | 0 | 0.0 | 0 | 1 | 2362.0 | 3.95 | 84 | 99 | 13.843118 | 51.066250 | 8 |
00:00:21 | 2.127 | 254.6 | 0 | 0.5 | 0 | 1 | 2552.0 | 16.67 | 87 | 100 | 13.842940 | 51.066231 | 8 |
If you only want to see the activities of a specific file type, you can filter runpandas.activity_examples, which returns a filter iterable that you can iterate over:
fit_examples = rpd.activity_examples(file_type=rpd.FileTypeEnum.FIT)
for example in fit_examples:
    #Download and play with the filtered examples
    print(example.path)
https://raw.githubusercontent.com/corriporai/runpandas-data/master/activities/Garmin_Fenix_6S_Pro-Running.fit
https://raw.githubusercontent.com/corriporai/runpandas-data/master/activities/Garmin_Fenix2_running_with_hrm.fit
https://raw.githubusercontent.com/corriporai/runpandas-data/master/activities/Garmin_Forerunner_910XT-Running.fit
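activity_examples filters on runpandas' own FileTypeEnum metadata. If you instead have a plain list of file paths or URLs, a standard-library filter by file extension gives a similar result. This is a sketch only: filter_by_extension is a hypothetical helper and the sample.tcx URL is made up for the example.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def filter_by_extension(urls, extension):
    """Keep the URLs whose path ends with the given extension (case-insensitive)."""
    wanted = extension.lower().lstrip('.')
    return [url for url in urls
            if PurePosixPath(urlparse(url).path).suffix.lower().lstrip('.') == wanted]


base = 'https://raw.githubusercontent.com/corriporai/runpandas-data/master/activities/'
urls = [base + 'Garmin_Fenix_6S_Pro-Running.fit', base + 'sample.tcx']  # sample.tcx is hypothetical
print(filter_by_extension(urls, 'fit'))
```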
The runpandas package provides utilities to import a group of activities and, after careful processing, organise them into a MultiIndex DataFrame.
A pandas.MultiIndex lets a DataFrame use multiple levels as the row identifier (and, on the columns, multiple header levels). In our scenario, the first identifier (level) is the timestamp at which the workout started, and the second is the timedelta of each consecutive observation relative to that start.
The MultiIndex dataframe is the result of the function runpandas.read_dir_aggregate, which takes as input a directory of tracking data files and uses the read*() functions to build runpandas.Activity objects. Then the resulting dataframes are sorted by their start timestamps and combined into a single runpandas.Activity indexed by the two-level pandas.MultiIndex.
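The stacking step can be pictured with plain pandas: pd.concat over a dict of per-activity frames yields the same (start, time) layout. This is a minimal sketch, not read_dir_aggregate's implementation; it assumes each activity has already been read into a DataFrame indexed by elapsed time, and aggregate_activities plus the two tiny activities are hypothetical.

```python
import pandas as pd


def aggregate_activities(activities):
    """Stack activities (each indexed by elapsed time) under a 'start' level and sort them."""
    session = pd.concat(activities, names=['start', 'time'])
    return session.sort_index()


# Two tiny hypothetical activities, keyed by their start timestamps.
run_a = pd.DataFrame({'hr': [150.0, 155.0]}, index=pd.to_timedelta([0, 1], unit='s'))
run_b = pd.DataFrame({'hr': [140.0, 142.0, 145.0]}, index=pd.to_timedelta([0, 1, 2], unit='s'))
session = aggregate_activities({
    pd.Timestamp('2021-07-04 11:23:19'): run_b,
    pd.Timestamp('2020-08-30 09:08:51'): run_a,
})
print(session)
```

Sorting the index puts the earliest workout first regardless of the order in which the files were read.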
Let’s illustrate this by loading 68 running activities recorded by a female runner between 2020 and 2021.
import warnings
warnings.filterwarnings('ignore')
import runpandas
session = runpandas.read_dir_aggregate(dirname='session/')
session
| start | time | alt | hr | lon | lat |
|---|---|---|---|---|---|
| 2020-08-30 09:08:51.012 | 00:00:00 | NaN | NaN | -34.893609 | -8.045055 |
| | 00:00:01.091000 | NaN | NaN | -34.893624 | -8.045054 |
| | 00:00:02.091000 | NaN | NaN | -34.893641 | -8.045061 |
| | 00:00:03.098000 | NaN | NaN | -34.893655 | -8.045063 |
| | 00:00:04.098000 | NaN | NaN | -34.893655 | -8.045065 |
| ... | ... | ... | ... | ... | ... |
| 2021-07-04 11:23:19.418 | 00:52:39.582000 | 0.050001 | 189.0 | -34.894534 | -8.046602 |
| | 00:52:43.582000 | NaN | NaN | -34.894465 | -8.046533 |
| | 00:52:44.582000 | NaN | NaN | -34.894443 | -8.046515 |
| | 00:52:45.582000 | NaN | NaN | -34.894429 | -8.046494 |
| | 00:52:49.582000 | NaN | 190.0 | -34.894395 | -8.046398 |
48794 rows × 4 columns
Now let’s see how many activities are available for analysis. For this, we have an accessor, runpandas.types.acessors.session._SessionAcessor, that holds several methods for computing the basic running metrics across all the activities in this kind of frame, as well as some summary statistics.
#count the number of activities in the session
print ('Total Activities:', session.session.count())
Total Activities: 68
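Assuming session.session.count() counts the distinct values of the start level, a plain-pandas equivalent looks like this (the two-activity frame is hypothetical). get_level_values(...).nunique() is used rather than len(index.levels[0]) because MultiIndex levels can keep values that no longer appear in the rows after filtering.

```python
import pandas as pd

# A hypothetical two-activity session with the same (start, time) index layout.
index = pd.MultiIndex.from_tuples(
    [(pd.Timestamp('2020-08-30 09:08:51'), pd.Timedelta(seconds=s)) for s in (0, 1, 2)]
    + [(pd.Timestamp('2021-07-04 11:23:19'), pd.Timedelta(seconds=s)) for s in (0, 1)],
    names=['start', 'time'],
)
session = pd.DataFrame({'hr': [150.0, 152.0, 155.0, 140.0, 142.0]}, index=index)

# Each activity contributes exactly one distinct value to the 'start' level.
n_activities = session.index.get_level_values('start').nunique()
print('Total Activities:', n_activities)  # Total Activities: 2
```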
We can compute the main running metrics (speed, pace, moving, etc.) using the session accessor methods, which mirror the ones available in runpandas.types.metrics.MetricsAcessor. Under the hood, each session method applies the corresponding metric method to each activity separately.
#In this example we compute the distance and the distance per position across all workouts
session = session.session.distance()
session
| start | time | alt | hr | lon | lat | distpos | dist |
|---|---|---|---|---|---|---|---|
| 2020-08-30 09:08:51.012 | 00:00:00 | NaN | NaN | -34.893609 | -8.045055 | NaN | NaN |
| | 00:00:01.091000 | NaN | NaN | -34.893624 | -8.045054 | 1.690587 | 1.690587 |
| | 00:00:02.091000 | NaN | NaN | -34.893641 | -8.045061 | 2.095596 | 3.786183 |
| | 00:00:03.098000 | NaN | NaN | -34.893655 | -8.045063 | 1.594298 | 5.380481 |
| | 00:00:04.098000 | NaN | NaN | -34.893655 | -8.045065 | 0.163334 | 5.543815 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 2021-07-04 11:23:19.418 | 00:52:39.582000 | 0.050001 | 189.0 | -34.894534 | -8.046602 | 12.015437 | 8220.018885 |
| | 00:52:43.582000 | NaN | NaN | -34.894465 | -8.046533 | 10.749779 | 8230.768664 |
| | 00:52:44.582000 | NaN | NaN | -34.894443 | -8.046515 | 3.163638 | 8233.932302 |
| | 00:52:45.582000 | NaN | NaN | -34.894429 | -8.046494 | 2.851535 | 8236.783837 |
| | 00:52:49.582000 | NaN | 190.0 | -34.894395 | -8.046398 | 11.300740 | 8248.084577 |
48794 rows × 6 columns
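The dist column restarts for every workout: within each activity it is the running sum of distpos (1.690587 + 2.095596 = 3.786183 in the rows above). Grouping by the start level expresses this in plain pandas. A sketch with partly hypothetical data, not the session accessor's internals:

```python
import pandas as pd

nan = float('nan')
index = pd.MultiIndex.from_tuples(
    [(pd.Timestamp('2020-08-30 09:08:51'), pd.Timedelta(seconds=s)) for s in (0, 1, 2, 3)]
    + [(pd.Timestamp('2021-07-04 11:23:19'), pd.Timedelta(seconds=s)) for s in (0, 1, 2)],
    names=['start', 'time'],
)
# The first activity uses the distpos values shown above; the second one is made up.
session = pd.DataFrame({'distpos': [nan, 1.690587, 2.095596, 1.594298, nan, 10.0, 5.0]}, index=index)

# Cumulative distance computed separately for each activity (NaN rows stay NaN).
session['dist'] = session.groupby(level='start')['distpos'].cumsum()
print(session)
```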
#compute the speed for each activity
session = session.session.speed(from_distances=True)
#compute the pace for each activity
session = session.session.pace()
#compute the inactivity periods for each activity
session = session.session.only_moving()
With the loading and the metrics computation done for all the activities, let’s move on to exploration and look at basic summaries of the session: time spent, total distance, mean speed and other insightful statistics for each running activity. We can accomplish this by calling runpandas.types.acessors.session._SessionAcessor.summarize, which returns a basic DataFrame with all the aggregated statistics per activity from the session frame.
summary = session.session.summarize()
summary
start | moving_time | mean_speed | max_speed | mean_pace | max_pace | mean_moving_speed | mean_moving_pace | mean_cadence | max_cadence | mean_moving_cadence | mean_heart_rate | max_heart_rate | mean_moving_heart_rate | mean_temperature | min_temperature | max_temperature | total_distance | ellapsed_time
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
2020-07-03 09:50:53.162 | 00:25:29.838000 | 2.642051 | 4.879655 | 00:06:18 | 00:03:24 | 2.665008 | 00:06:15 | NaN | NaN | NaN | 178.819923 | 188.0 | 178.872587 | NaN | NaN | NaN | 4089.467333 | 00:25:47.838000 |
2020-07-05 09:33:20.999 | 00:05:04.999000 | 2.227637 | 6.998021 | 00:07:28 | 00:02:22 | 3.072098 | 00:05:25 | NaN | NaN | NaN | 168.345455 | 176.0 | 168.900000 | NaN | NaN | NaN | 980.162640 | 00:07:20.001000 |
2020-07-05 09:41:59.999 | 00:18:19 | 1.918949 | 6.563570 | 00:08:41 | 00:02:32 | 2.729788 | 00:06:06 | NaN | NaN | NaN | 173.894180 | 185.0 | 174.577143 | NaN | NaN | NaN | 3139.401118 | 00:27:16 |
2020-07-13 09:13:58.718 | 00:40:21.281000 | 2.509703 | 8.520387 | 00:06:38 | 00:01:57 | 2.573151 | 00:06:28 | NaN | NaN | NaN | 170.808176 | 185.0 | 170.795527 | NaN | NaN | NaN | 6282.491059 | 00:41:43.281000 |
2020-07-17 09:33:02.308 | 00:32:07.691000 | 2.643278 | 8.365431 | 00:06:18 | 00:01:59 | 2.643278 | 00:06:18 | NaN | NaN | NaN | 176.436242 | 186.0 | 176.436242 | NaN | NaN | NaN | 5095.423045 | 00:32:07.691000 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2021-06-13 09:22:30.985 | 01:32:33.018000 | 2.612872 | 23.583956 | 00:06:22 | 00:00:42 | 2.810855 | 00:05:55 | NaN | NaN | NaN | 169.340812 | 183.0 | 169.655879 | NaN | NaN | NaN | 15706.017295 | 01:40:11.016000 |
2021-06-20 09:16:55.163 | 00:59:44.512000 | 2.492640 | 6.065895 | 00:06:41 | 00:02:44 | 2.749453 | 00:06:03 | NaN | NaN | NaN | 170.539809 | 190.0 | 171.231392 | NaN | NaN | NaN | 9965.168311 | 01:06:37.837000 |
2021-06-23 09:37:44.000 | 00:26:49.001000 | 2.501796 | 5.641343 | 00:06:39 | 00:02:57 | 2.568947 | 00:06:29 | NaN | NaN | NaN | 156.864865 | 171.0 | 156.957031 | NaN | NaN | NaN | 4165.492241 | 00:27:45.001000 |
2021-06-27 09:50:08.664 | 00:31:42.336000 | 2.646493 | 32.734124 | 00:06:17 | 00:00:30 | 2.661853 | 00:06:15 | NaN | NaN | NaN | 166.642857 | 176.0 | 166.721116 | NaN | NaN | NaN | 5074.217061 | 00:31:57.336000 |
2021-07-04 11:23:19.418 | 00:47:47.583000 | 2.602263 | 4.212320 | 00:06:24 | 00:03:57 | 2.856801 | 00:05:50 | NaN | NaN | NaN | 177.821862 | 192.0 | 177.956967 | NaN | NaN | NaN | 8248.084577 | 00:52:49.582000 |
68 rows × 18 columns
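summarize() computes many more columns (moving time, paces, cadence, temperature). A few of them can be reproduced with a grouped, named aggregation in plain pandas. This reduced sketch uses a hypothetical two-activity session and only mimics the column names; it is not the accessor's implementation.

```python
import pandas as pd

nan = float('nan')
index = pd.MultiIndex.from_tuples(
    [(pd.Timestamp('2020-08-30 09:08:51'), pd.Timedelta(seconds=s)) for s in (0, 1, 2)]
    + [(pd.Timestamp('2021-07-04 11:23:19'), pd.Timedelta(seconds=s)) for s in (0, 4)],
    names=['start', 'time'],
)
session = pd.DataFrame(
    {'distpos': [nan, 1.5, 2.5, nan, 10.0], 'hr': [150.0, 160.0, 170.0, nan, 180.0]},
    index=index,
)

# Move the elapsed time into a column so it can be aggregated too.
flat = session.reset_index(level='time')
summary = flat.groupby(level='start').agg(
    total_distance=('distpos', 'sum'),   # NaN values are skipped
    mean_heart_rate=('hr', 'mean'),
    max_heart_rate=('hr', 'max'),
    ellapsed_time=('time', 'max'),      # last elapsed timestamp of each workout
)
print(summary)
```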
print('Session Interval:', (summary.index.to_series().max() - summary.index.to_series().min()).days, 'days')
print('Total Workouts:', len(summary), 'runs')
print('Total KM Distance:', summary['total_distance'].sum() / 1000)
print('Average Pace (all runs):', summary.mean_pace.mean())
print('Average Moving Pace (all runs):', summary.mean_moving_pace.mean())
print('Average KM Distance (all runs):', round(summary.total_distance.mean() / 1000, 2))
Session Interval: 366 days
Total Workouts: 68 runs
Total KM Distance: 491.77377537338896
Average Pace (all runs): 0 days 00:07:18.411764
Average Moving Pace (all runs): 0 days 00:06:02.147058
Average KM Distance (all runs): 7.23
At this point, we have the summary data needed for some powerful visualization and analysis. The charts below illustrate her pace and distance evolution over time.
import matplotlib.pyplot as plt
import datetime
#let's convert the pace to float number in minutes
summary['mean_moving_pace_float'] = summary['mean_moving_pace'] / datetime.timedelta(minutes=1)
summary['pace_moving_all_mean'] = summary.mean_moving_pace.mean()
summary['pace_moving_all_mean_float'] = summary['pace_moving_all_mean'] / datetime.timedelta(minutes=1)
plt.subplots(figsize=(8, 5))
plt.plot(summary.index, summary.mean_moving_pace_float, color='silver')
plt.plot(summary.index, summary.pace_moving_all_mean_float, color='purple', linestyle='dashed', label='average')
plt.title("Pace Evolution")
plt.xlabel("Runs")
plt.ylabel("Pace (min/km)")
plt.legend()
plt.show()
plt.subplots(figsize=(8, 5))
summary['distance_all_mean'] = round(summary.total_distance.mean()/1000,2)
plt.plot(summary.index, summary.total_distance / 1000, color='silver')
plt.plot(summary.index, summary.distance_all_mean, color='purple', linestyle='dashed', label='average')
plt.title("Distance Evolution")
plt.xlabel("Runs")
plt.ylabel("Distance (km)")
plt.legend()
plt.show()
One of the great features of Runpandas is the capability of accessing race result datasets across several races around the world, from majors to local ones (if they are available in our data repository). In this example we will analyze the 2022 Berlin Marathon using runpandas methods specially tailored for handling race results data.
First, let’s load the Berlin Marathon data using the runpandas method runpandas.get_events. This function provides a way of accessing race data and visualizing the results from several marathons available in our datasets repository. Given the year and the marathon identifier, you can filter any marathon dataset you want to analyze. The result is a list of runpandas.EventData instances with the race result and its metadata. Let’s look for the Berlin Marathon results.
import pandas as pd
import runpandas as rpd
import warnings
warnings.filterwarnings('ignore')
results = rpd.get_events('Berlin')
results
[<Event: name=Berlin Marathon Results from 2022., country=DE, edition=2022>]
The result is the Berlin Marathon result from 2022. Let’s take a look inside the race event, which comes with a handful of methods to describe its attributes and a special method to load the race result data into a runpandas.datasets.schema.RaceData instance.
berlin_result = results[0]
print('Event type', berlin_result.run_type)
print('Country', berlin_result.country)
print('Year', berlin_result.edition)
print('Name', berlin_result.summary)
Event type RunTypeEnum.MARATHON
Country DE
Year 2022
Name Berlin Marathon Results from 2022.
Now that we have confirmed that we requested the corresponding marathon dataset, we will load it into a DataFrame so we can explore it further.
#loading the race data into a RaceData Dataframe
race_result = berlin_result.load()
race_result
| | position | position_gender | country | sex | division | bib | firstname | lastname | club | starttime | ... | 10k | 15k | 20k | 25k | 30k | 35k | 40k | grosstime | nettime | category |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 1 | KEN | M | 1 | 1 | Eliud | Kipchoge | – | 09:15:00 | ... | 0 days 00:28:23 | 0 days 00:42:33 | 0 days 00:56:45 | 0 days 01:11:08 | 0 days 01:25:40 | 0 days 01:40:10 | 0 days 01:54:53 | 0 days 02:01:09 | 0 days 02:01:09 | M35 |
1 | 2 | 2 | KEN | M | 1 | 5 | Mark | Korir | – | 09:15:00 | ... | 0 days 00:28:56 | 0 days 00:43:35 | 0 days 00:58:14 | 0 days 01:13:07 | 0 days 01:28:06 | 0 days 01:43:25 | 0 days 01:59:05 | 0 days 02:05:58 | 0 days 02:05:58 | M30 |
2 | 3 | 3 | ETH | M | 1 | 8 | Tadu | Abate | – | 09:15:00 | ... | 0 days 00:29:46 | 0 days 00:44:40 | 0 days 00:59:40 | 0 days 01:14:44 | 0 days 01:30:01 | 0 days 01:44:55 | 0 days 02:00:03 | 0 days 02:06:28 | 0 days 02:06:28 | MH |
3 | 4 | 4 | ETH | M | 2 | 26 | Andamlak | Belihu | – | 09:15:00 | ... | 0 days 00:28:23 | 0 days 00:42:33 | 0 days 00:56:45 | 0 days 01:11:09 | 0 days 01:26:11 | 0 days 01:42:14 | 0 days 01:59:14 | 0 days 02:06:40 | 0 days 02:06:40 | MH |
4 | 5 | 5 | KEN | M | 3 | 25 | Abel | Kipchumba | – | 09:15:00 | ... | 0 days 00:28:55 | 0 days 00:43:35 | 0 days 00:58:14 | 0 days 01:13:07 | 0 days 01:28:03 | 0 days 01:43:08 | 0 days 01:59:14 | 0 days 02:06:49 | 0 days 02:06:49 | MH |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
35566 | DNF | – | USA | M | – | 65079 | michael | perkowski | – | – | ... | NaT | NaT | NaT | NaT | NaT | NaT | NaT | NaT | NaT | M65 |
35567 | DNF | – | USA | M | – | 62027 | Karl | Mann | – | – | ... | NaT | NaT | NaT | NaT | NaT | NaT | NaT | NaT | NaT | M55 |
35568 | DNF | – | THA | F | – | 27196 | oraluck | pichaiwongse | STATE to BERLIN 2022 | – | ... | NaT | NaT | NaT | NaT | NaT | NaT | NaT | NaT | NaT | W55 |
35569 | DNF | – | SUI | M | – | 56544 | Gerardo | GARCIA CALZADA | – | – | ... | NaT | NaT | NaT | NaT | NaT | NaT | NaT | NaT | NaT | M50 |
35570 | DNF | – | AUT | M | – | 63348 | Harald | Mori | Albatros | – | ... | NaT | NaT | NaT | NaT | NaT | NaT | NaT | NaT | NaT | M60 |
35571 rows × 23 columns
Now you can get some insights about the Berlin Marathon 2022, by using its tailored methods for getting basic and quick insights. For example, the number of finishers, number of participants and the winner info.
print('Total participants', race_result.total_participants)
print('Total finishers', race_result.total_finishers)
print('Total Non-Finishers', race_result.total_nonfinishers)
Total participants 35571
Total finishers 34844
Total Non-Finishers 727
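Assuming a finisher is anyone with a recorded net time (the DNF rows in the table above show NaT there), these counts can be reproduced with plain pandas; note that participants = finishers + non-finishers (34844 + 727 = 35571). The frame below is a hypothetical four-row stand-in for the race results, and the finisher rule is an assumption rather than RaceData's documented definition.

```python
import pandas as pd

# Hypothetical stand-in for the race results frame.
race = pd.DataFrame({
    'position': [1, 2, 3, 'DNF'],
    'sex': ['M', 'F', 'M', 'M'],
    'nettime': [pd.Timedelta('02:01:09'), pd.Timedelta('02:15:37'),
                pd.Timedelta('02:20:00'), pd.NaT],
})

finished = race['nettime'].notna()  # assumption: a finisher has a net time
total_participants = len(race)
total_finishers = int(finished.sum())
total_nonfinishers = total_participants - total_finishers
finishers_by_sex = race.loc[finished, 'sex'].value_counts()

print('Total participants', total_participants)
print('Total finishers', total_finishers)
print('Total Non-Finishers', total_nonfinishers)
print(finishers_by_sex)
```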
race_result.winner
position                        1
position_gender                 1
country                       KEN
sex                             M
division                        1
bib                             1
firstname                   Eliud
lastname                 Kipchoge
club                            –
starttime                09:15:00
start_raw_time           09:15:00
half              0 days 00:59:51
5k                0 days 00:14:14
10k               0 days 00:28:23
15k               0 days 00:42:33
20k               0 days 00:56:45
25k               0 days 01:11:08
30k               0 days 01:25:40
35k               0 days 01:40:10
40k               0 days 01:54:53
grosstime         0 days 02:01:09
nettime           0 days 02:01:09
category                      M35
Name: 0, dtype: object
Eliud Kipchoge of Kenya won the 2022 Berlin Marathon in 2:01:09. Kipchoge’s victory was his fourth in Berlin and 17th overall in a career of 19 marathon starts. And who was the women’s race winner?
race_result[(race_result['position_gender'] == 1) & (race_result['sex'] == 'F')].T
| | 32 |
---|---|
position | 33 |
position_gender | 1 |
country | ETH |
sex | F |
division | 1 |
bib | F24 |
firstname | Tigist |
lastname | Assefa |
club | – |
starttime | 09:15:00 |
start_raw_time | 09:15:00 |
half | 0 days 01:08:13 |
5k | 0 days 00:16:22 |
10k | 0 days 00:32:36 |
15k | 0 days 00:48:44 |
20k | 0 days 01:04:43 |
25k | 0 days 01:20:48 |
30k | 0 days 01:36:41 |
35k | 0 days 01:52:27 |
40k | 0 days 02:08:42 |
grosstime | 0 days 02:15:37 |
nettime | 0 days 02:15:37 |
category | WH |
Tigist Assefa of Ethiopia won the women’s race in a stunning time of 2:15:37 to set a new course record in Berlin.
Runpandas also provides a race summary method that compiles some general insights, such as the number of finishers and participants (by gender and overall).
race_result.summary()
Event name                   berlin marathon
Event type                               42k
Event country                             DE
Event date                        25-09-2022
Number of participants                 35571
Number of finishers                    34844
Number of non-finishers                  727
Number of male finishers               23314
Number of female finishers             11523
Winner Nettime               0 days 02:01:09
dtype: object
For some races, runpandas results also include the splits for the partial distances of the race. We can fetch the splits of any runner with the method runpandas.acessors.splits.pick_athlete. So, if we need direct access to all splits from a specific runner, we use the splits accessor.
race_result.splits.pick_athlete(identifier='1')
split | time | distance_meters | distance_miles
---|---|---|---
0k | 0 days 00:00:00 | 0 | 0.0000 |
5k | 0 days 00:14:14 | 5000 | 3.1069 |
10k | 0 days 00:28:23 | 10000 | 6.2137 |
15k | 0 days 00:42:33 | 15000 | 9.3206 |
20k | 0 days 00:56:45 | 20000 | 12.4274 |
half | 0 days 00:59:51 | 21097 | 13.1091 |
25k | 0 days 01:11:08 | 25000 | 15.5343 |
30k | 0 days 01:25:40 | 30000 | 18.6411 |
35k | 0 days 01:40:10 | 35000 | 21.7480 |
40k | 0 days 01:54:53 | 40000 | 24.8548 |
nettime | 0 days 02:01:09 | 42195 | 26.2187 |
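Two things can be derived from a splits table like this one. The distance_miles values are reproduced by multiplying kilometers by 0.621371 and rounding to four decimals (inferred from the numbers, not from runpandas' source), and the pace of each segment is the difference between consecutive cumulative times divided by the segment length. A standard-library sketch using Kipchoge's splits; the helper names are hypothetical, and the half row is skipped so that consecutive checkpoints form contiguous segments.

```python
import datetime

# Assumption: km -> miles factor that reproduces the distance_miles column above.
KM_TO_MILES = 0.621371

# Cumulative checkpoints (name, H:MM:SS, meters) from the splits table, without 'half'.
checkpoints = [
    ('5k', '0:14:14', 5000), ('10k', '0:28:23', 10000), ('15k', '0:42:33', 15000),
    ('20k', '0:56:45', 20000), ('25k', '1:11:08', 25000), ('30k', '1:25:40', 30000),
    ('35k', '1:40:10', 35000), ('40k', '1:54:53', 40000), ('nettime', '2:01:09', 42195),
]


def to_seconds(hms):
    hours, minutes, seconds = map(int, hms.split(':'))
    return hours * 3600 + minutes * 60 + seconds


def to_miles(meters):
    return round(meters / 1000 * KM_TO_MILES, 4)


def segment_paces(checkpoints):
    """Pace in seconds per km for each segment, starting from 0 km at 0 s."""
    paces, prev_t, prev_d = [], 0, 0
    for name, hms, meters in checkpoints:
        t = to_seconds(hms)
        paces.append((name, (t - prev_t) / ((meters - prev_d) / 1000)))
        prev_t, prev_d = t, meters
    return paces


for name, pace in segment_paces(checkpoints):
    print(name, datetime.timedelta(seconds=round(pace)), 'per km')
print('miles at finish:', to_miles(42195))
```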
With plotting libraries such as matplotlib, you can analyze the splits data through an impressive visualization!
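import datetime
import matplotlib.pyplot as plt
import matplotlib.ticker
eliud_kipchoge_splits = race_result.splits.pick_athlete(identifier='1')
#drop the 'half' row so that consecutive rows form contiguous segments (0k-5k, ..., 40k-finish)
splits = eliud_kipchoge_splits.drop(index='half')
#time and distance covered in each segment
segment_time = splits['time'].diff()
segment_km = splits['distance_meters'].diff() / 1000
#pace per segment and overall mean pace, both in seconds per km; the 0k row has no segment and is dropped
eliud_kipchoge_splits_filtered = splits.assign(
    split_time=segment_time,
    pace=segment_time.dt.total_seconds() / segment_km,
    mean_pace=splits['time'].iloc[-1].total_seconds() / (splits['distance_meters'].iloc[-1] / 1000),
).iloc[1:]
def timeTicks(x, pos):
    #format a y-axis value given in seconds as H:MM:SS
    return str(datetime.timedelta(seconds=int(x)))
fig, ax2 = plt.subplots(figsize=(10, 6))
positions = list(range(len(eliud_kipchoge_splits_filtered)))
#format the y-axis to show the labels as timedelta.
formatter = matplotlib.ticker.FuncFormatter(timeTicks)
#plot the paces per segment
line2, = ax2.plot(positions, eliud_kipchoge_splits_filtered['pace'], linestyle='dashed', color='cyan', lw=5, alpha=0.8)
#plot the overall mean pace
line3, = ax2.plot(positions, eliud_kipchoge_splits_filtered['mean_pace'], color='#1b9e77', linestyle='dashed', lw=5, alpha=0.8)
#annotate the pace line with the time of each segment
for x, y, split_time in zip(positions, eliud_kipchoge_splits_filtered['pace'], eliud_kipchoge_splits_filtered['split_time']):
    formatted_time = datetime.timedelta(seconds=int(split_time.total_seconds()))
    ax2.text(x, y, str(formatted_time), weight="bold", size=12)
ax2.set_xticks(positions)
ax2.set_xticklabels(eliud_kipchoge_splits_filtered.index)
ax2.yaxis.set_major_formatter(formatter)
ax2.grid(False)
ax2.legend(
    (line2, line3),
    ('Splits Pace', 'Mean Pace'),
    loc='lower right',
    frameon=False
)
ax2.set_title("Eliud Kipchoge splits time and pace in Berlin Marathon 2022")
ax2.set_xlabel("Splits in kms")
ax2.set_ylabel("Pace (per km)")
plt.show()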
- Report bugs, suggest features or view the source code [on GitHub](https://github.com/corriporai/runpandas).
I'm very interested in your experience with runpandas. Please drop me a note with any feedback you have.
Contributions welcome!
- Marcel Caraciolo
Runpandas is licensed under the MIT License. A copy of the license is included in LICENSE.