RunPandas - Python Package for handling running data from GPS-enabled devices to worldwide race results.


Introduction

RunPandas is a project that adds support for data collected by GPS-enabled tracking devices and heart rate monitors to [pandas](http://pandas.pydata.org) objects. It is a Python package that provides infrastructure for importing tracking data from such devices, enabling statistical and visual analysis for running enthusiasts. Its goal is to fill the gap between the routine collection of data and its manual analysis in pandas and Python.

Since release 0.6.0 it also supports race event results, so we can analyze race split times, finish times, demographics, etc. The goal is to make the results of many races available to anyone interested in running race analytics.

Documentation

Stable documentation is available on github.io. A second copy of the stable documentation is hosted on Read the Docs.

Development documentation is available for the latest changes in master.

==> Check out this Blog post for the reasoning and philosophy behind Runpandas, as well as a detailed tutorial with code examples.

==> Follow this Runpandas live book in Jupyter notebook format based on Jupyter Books.

Install

RunPandas depends on the following packages:
  • pandas
  • fitparse
  • stravalib
  • pydantic
  • pyaml
  • haversine
  • thefuzz

Runpandas was tested to work on *nix-like systems, including macOS.


Install latest release version via pip

$ pip install runpandas

Install latest release version via conda

$ conda install -c marcelcaraciolo runpandas

Install latest development version

$ pip install git+https://github.com/corriporai/runpandas.git

or

$ git clone https://github.com/corriporai/runpandas.git
$ cd runpandas
$ python setup.py install
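
After installing, it can be handy to confirm that the dependencies listed above were actually resolved. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical, and the distribution names are simply copied from the list above.

```python
from importlib import metadata

# Distribution names copied from the dependency list above.
DEPENDENCIES = ["pandas", "fitparse", "stravalib", "pydantic",
                "pyaml", "haversine", "thefuzz"]


def installed_versions(names):
    """Return {name: version string, or None when the distribution is missing}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Hypothetical helper usage: report runpandas and each dependency.
report = installed_versions(DEPENDENCIES + ["runpandas"])
for name, version in report.items():
    print(f"{name:10s} {version or 'MISSING'}")
```

Any entry reported as MISSING points at a package that pip or conda did not install in the active environment.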

Examples

Install using pip, then import runpandas and use one of the tracking readers. This example loads a local file, sample.tcx. From the data file we get time, altitude, distance, heart rate and geo position (lat/long).

# !pip install runpandas
import runpandas as rpd
activity = rpd.read_file('./sample.tcx')
activity.head(5)
alt dist hr lon lat
time
00:00:00 178.942627 0.000000 62.0 -79.093187 35.951880
00:00:01 178.942627 0.000000 62.0 -79.093184 35.951880
00:00:06 178.942627 1.106947 62.0 -79.093172 35.951868
00:00:12 177.500610 13.003035 62.0 -79.093228 35.951774
00:00:16 177.500610 22.405027 60.0 -79.093141 35.951732

The data frames returned by runpandas when loading files are similar across file types. The dataframe in the above example is a subclass of pandas.DataFrame and provides some additional features. Certain columns are also returned as specific pandas.Series subclasses, which provide useful methods:

print(type(activity))
print(type(activity.alt))
<class 'runpandas.types.frame.Activity'>
<class 'runpandas.types.columns.Altitude'>

For instance, if you want to get the base unit for the altitude alt data or the distance dist data:

print(activity.alt.base_unit)
print(activity.alt.sum())
m
65883.68151855901
print(activity.dist.base_unit)
print(activity.dist.iloc[-1])
m
4686.31103516

The Activity dataframe also has special properties that present some statistics from the workout, such as the elapsed time, the mean heart rate, the moving time and the distance of the workout in meters.

#total time elapsed for the activity
print(activity.ellapsed_time)
#distance of workout in meters
print(activity.distance)
#mean heartrate
print(activity.mean_heart_rate())
0 days 00:33:11
4686.31103516
156.65274151436032

Some observations, such as speed and distance, are not always recorded and must be calculated from the data available in the activity. runpandas provides special accessors (runpandas.acessors) that compute some of these metrics. Below we compute the distance per position, that is, the haversine distance in meters between consecutive latitude/longitude records, and the speed in meters per second.

#compute the distance using haversine formula between two consecutive latitude, longitudes observations.
activity['distpos']  = activity.compute.distance()
activity['distpos'].head()
time
00:00:00          NaN
00:00:01     0.333146
00:00:06     1.678792
00:00:12    11.639901
00:00:16     9.183847
Name: distpos, dtype: float64
#compute the speed from the distance per position and the time elapsed between consecutive observations.
activity['speed']  = activity.compute.speed(from_distances=True)
activity['speed'].head()
time
00:00:00         NaN
00:00:01    0.333146
00:00:06    0.335758
00:00:12    1.939984
00:00:16    2.295962
Name: speed, dtype: float64
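
To make the two computations above concrete, here is a minimal standard-library sketch of the haversine formula and of a speed derived from per-position distances. It is not the runpandas implementation: the function names and the Earth radius constant are assumptions, and the sample values are copied from the output above.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in meters (assumed constant)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def speeds_from_distances(times_s, distances_m):
    """Speed in m/s per observation: distance covered / seconds since the previous one."""
    speeds = [float("nan")]  # the first observation has no predecessor
    for i in range(1, len(times_s)):
        speeds.append(distances_m[i] / (times_s[i] - times_s[i - 1]))
    return speeds


# Elapsed seconds and per-position distances copied from the output above.
times_s = [0, 1, 6, 12, 16]
distpos = [float("nan"), 0.333146, 1.678792, 11.639901, 9.183847]
speeds = speeds_from_distances(times_s, distpos)
print(speeds)

# One degree of latitude along a meridian, as a sanity check of the formula.
print(haversine_m(0.0, 0.0, 1.0, 0.0))
```

Dividing each per-position distance by the seconds elapsed since the previous record reproduces the speed column shown above to within rounding.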

Popular running metrics such as gradient, pace and vertical speed are also available through the runpandas accessors.

activity['vam'] = activity.compute.vertical_speed()
activity['vam'].head()
time
00:00:00         NaN
00:00:01    0.000000
00:00:06    0.000000
00:00:12   -0.240336
00:00:16    0.000000
Name: vam, dtype: float64
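
The vertical speed values above are consistent with the altitude change divided by the time elapsed between consecutive observations. The sketch below illustrates that reading with the altitudes from the first table; the function name is hypothetical and this is not necessarily how runpandas computes it internally.

```python
def vertical_speed(times_s, altitudes_m):
    """Altitude change per second (m/s) between consecutive observations."""
    result = [float("nan")]  # undefined for the first observation
    for i in range(1, len(times_s)):
        climb = altitudes_m[i] - altitudes_m[i - 1]
        result.append(climb / (times_s[i] - times_s[i - 1]))
    return result


# Elapsed seconds and altitudes copied from the first five rows of the activity.
times_s = [0, 1, 6, 12, 16]
alt = [178.942627, 178.942627, 178.942627, 177.500610, 177.500610]
vam = vertical_speed(times_s, alt)
print(vam)
```

A negative value means the runner was descending during that interval.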

Sporadically, there will be a large time difference between consecutive observations in the same workout. This can happen when the device is paused by the athlete, or when proprietary algorithms control the sampling rate of the device and auto-pause it when no significant change in position is detected. runpandas includes an algorithm that attempts to calculate the moving time based on the GPS locations, distances, and speed of the activity.

To compute the moving time, a special accessor detects the periods of inactivity and adds a boolean moving column, which is False for the observations considered to be stopped.

activity_only_moving = activity.only_moving()
print(activity_only_moving['moving'].head())
time
00:00:00    False
00:00:01    False
00:00:06    False
00:00:12     True
00:00:16     True
Name: moving, dtype: bool

Now we can compute the moving time, that is, how long the athlete was active.

activity_only_moving.moving_time
Timedelta('0 days 00:33:05')
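
The idea behind a moving-time calculation can be sketched with a simple speed threshold. This is only an illustration: the 0.5 m/s threshold and the function names are assumptions, and the actual runpandas algorithm also takes GPS locations and distances into account.

```python
import datetime


def moving_flags(speeds_ms, min_speed=0.5):
    """True where the speed exceeds min_speed (m/s); NaN compares as not moving."""
    return [s > min_speed for s in speeds_ms]


def moving_time(times_s, speeds_ms, min_speed=0.5):
    """Sum the intervals that end at an observation flagged as moving."""
    flags = moving_flags(speeds_ms, min_speed)
    total = 0
    for i in range(1, len(times_s)):
        if flags[i]:
            total += times_s[i] - times_s[i - 1]
    return datetime.timedelta(seconds=total)


# Elapsed seconds and speeds copied from the outputs above.
times_s = [0, 1, 6, 12, 16]
speeds = [float("nan"), 0.333146, 0.335758, 1.939984, 2.295962]
print(moving_flags(speeds))
print(moving_time(times_s, speeds))
```

With this assumed threshold, the flags for the five sample rows happen to match the moving column printed above.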

Runpandas also provides a method summary for summarising the activity through common statistics. Such a session summary includes estimates of several metrics computed above with a single call.

activity_only_moving.summary()
Session                           Running: 26-12-2012 21:29:53
Total distance (meters)                                4686.31
Total ellapsed time                            0 days 00:33:11
Total moving time                              0 days 00:33:05
Average speed (km/h)                                   8.47656
Average moving speed (km/h)                            8.49853
Average pace (per 1 km)                        0 days 00:07:04
Average pace moving (per 1 km)                 0 days 00:07:03
Average cadence                                            NaN
Average moving cadence                                     NaN
Average heart rate                                     156.653
Average moving heart rate                                157.4
Average temperature                                        NaN
dtype: object

Now, let’s play with the data. Let’s plot distance vs. time as an example of how we can create visualizations. In this example, we use the built-in, matplotlib-based plot function.

activity[['dist']].plot()
<AxesSubplot:xlabel='time'>

And here is altitude versus time.

activity[['alt']].plot()
<AxesSubplot:xlabel='time'>

Next, let’s show the altitude vs. distance profile. Here is a scatterplot that shows altitude vs. distance as recorded.

activity.plot.scatter(x='dist', y='alt', c='DarkBlue')
<AxesSubplot:xlabel='dist', ylabel='alt'>

Finally, let’s get a glimpse of the route by plotting a 2D map of longitude vs. latitude.

activity.plot(x='lon', y='lat')
<AxesSubplot:xlabel='lon'>

The runpandas package also comes with extra batteries, such as our runpandas.datasets package, which includes a range of example data for testing purposes. There is a dedicated repository with all the data available. An index of the data is kept here.

You can use the example data available:

example_fit = rpd.activity_examples(path='Garmin_Fenix_6S_Pro-Running.fit')
print(example_fit.summary)
print('Included metrics:', example_fit.included_data)
Synced from watch Garmin Fenix 6S

Included metrics: [<MetricsEnum.latitude: 'latitude'>, <MetricsEnum.longitude: 'longitude'>, <MetricsEnum.elevation: 'elevation'>, <MetricsEnum.heartrate: 'heartrate'>, <MetricsEnum.cadence: 'cadence'>, <MetricsEnum.distance: 'distance'>, <MetricsEnum.temperature: 'temperature'>]
rpd.read_file(example_fit.path).head()
enhanced_speed enhanced_altitude unknown_87 fractional_cadence lap session unknown_108 dist cad hr lon lat temp
time
00:00:00 0.000 254.0 0 0.0 0 0 NaN 0.00 0 101 13.843376 51.066280 8
00:00:01 0.000 254.0 0 0.0 0 0 NaN 0.00 0 101 13.843374 51.066274 8
00:00:10 1.698 254.0 0 0.0 0 1 2362.0 0.00 83 97 13.843176 51.066249 8
00:00:12 2.267 254.0 0 0.0 0 1 2362.0 3.95 84 99 13.843118 51.066250 8
00:00:21 2.127 254.6 0 0.5 0 1 2552.0 16.67 87 100 13.842940 51.066231 8

If you only want to see the activities of a specific file type, you can filter runpandas.activity_examples by file type; it returns an iterable of the matching examples:

fit_examples = rpd.activity_examples(file_type=rpd.FileTypeEnum.FIT)
for example in fit_examples:
    #Download and play with the filtered examples
    print(example.path)
https://raw.githubusercontent.com/corriporai/runpandas-data/master/activities/Garmin_Fenix_6S_Pro-Running.fit
https://raw.githubusercontent.com/corriporai/runpandas-data/master/activities/Garmin_Fenix2_running_with_hrm.fit
https://raw.githubusercontent.com/corriporai/runpandas-data/master/activities/Garmin_Forerunner_910XT-Running.fit

Exploring sessions

The runpandas package provides utilities to import the data of a group of activities and, after careful processing, organise them into a MultiIndex Dataframe.

The pandas.MultiIndex allows multiple columns to act as a row identifier and multiple rows to act as a header identifier. In our scenario, the first identifier (index level) is the timestamp at which the workout started, and the second is the timedelta of the consecutive observations within the workout.

Figure: the MultiIndex Runpandas Activity Dataframe

The MultiIndex dataframe results from the function runpandas.read_dir_aggregate, which takes as input a directory of tracking data files and uses the read*() functions to build runpandas.Activity objects. Then, the resulting dataframes are sorted by their timestamps and combined into a single runpandas.Activity indexed by the two-level pandas.MultiIndex.
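
The two-level layout can be illustrated with plain pandas. This is a minimal sketch with made-up heart rate values and a hypothetical `toy_activity` helper; it only mimics the index structure and does not use runpandas.

```python
import pandas as pd


def toy_activity(offsets_s, heart_rates):
    """Build a small frame indexed by the time elapsed since the workout started."""
    index = pd.to_timedelta(offsets_s, unit="s").rename("time")
    return pd.DataFrame({"hr": heart_rates}, index=index)


# Hypothetical workouts keyed by their start timestamp, deliberately out of order.
activities = {
    pd.Timestamp("2020-08-30 09:08:51"): toy_activity([0, 1, 2], [101, 103, 104]),
    pd.Timestamp("2020-07-03 09:50:53"): toy_activity([0, 5], [98, 110]),
}

# The dict keys become the outer level, each activity's own index the inner level.
session = pd.concat(activities, names=["start", "time"]).sort_index()

# Counting the distinct start timestamps gives the number of activities.
n_activities = session.index.get_level_values("start").nunique()
print(session)
print("Total activities:", n_activities)
```

Selecting one workout is then a matter of indexing the outer level, for example `session.loc[pd.Timestamp("2020-07-03 09:50:53")]`.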

Let’s illustrate this by loading a set of 68 running activities recorded by a female runner during 2020 and 2021.

import warnings
warnings.filterwarnings('ignore')
import runpandas
session = runpandas.read_dir_aggregate(dirname='session/')
session
alt hr lon lat
start time
2020-08-30 09:08:51.012 00:00:00 NaN NaN -34.893609 -8.045055
00:00:01.091000 NaN NaN -34.893624 -8.045054
00:00:02.091000 NaN NaN -34.893641 -8.045061
00:00:03.098000 NaN NaN -34.893655 -8.045063
00:00:04.098000 NaN NaN -34.893655 -8.045065
... ... ... ... ... ...
2021-07-04 11:23:19.418 00:52:39.582000 0.050001 189.0 -34.894534 -8.046602
00:52:43.582000 NaN NaN -34.894465 -8.046533
00:52:44.582000 NaN NaN -34.894443 -8.046515
00:52:45.582000 NaN NaN -34.894429 -8.046494
00:52:49.582000 NaN 190.0 -34.894395 -8.046398

48794 rows × 4 columns

Now let’s see how many activities are available for analysis. For this question, we have the accessor runpandas.types.acessors.session._SessionAcessor, which holds several methods for computing the basic running metrics and some summary statistics across all the activities in this kind of frame.

#count the number of activities in the session
print('Total Activities:', session.session.count())
Total Activities: 68

We can compute the main running metrics (speed, pace, moving, etc.) using the session accessor methods, which mirror the ones available in runpandas.types.metrics.MetricsAcessor. Internally, each session method calls the corresponding metric method on each activity separately.

#In this example we compute the distance and the distance per position across all workouts
session = session.session.distance()
session
alt hr lon lat distpos dist
start time
2020-08-30 09:08:51.012 00:00:00 NaN NaN -34.893609 -8.045055 NaN NaN
00:00:01.091000 NaN NaN -34.893624 -8.045054 1.690587 1.690587
00:00:02.091000 NaN NaN -34.893641 -8.045061 2.095596 3.786183
00:00:03.098000 NaN NaN -34.893655 -8.045063 1.594298 5.380481
00:00:04.098000 NaN NaN -34.893655 -8.045065 0.163334 5.543815
... ... ... ... ... ... ... ...
2021-07-04 11:23:19.418 00:52:39.582000 0.050001 189.0 -34.894534 -8.046602 12.015437 8220.018885
00:52:43.582000 NaN NaN -34.894465 -8.046533 10.749779 8230.768664
00:52:44.582000 NaN NaN -34.894443 -8.046515 3.163638 8233.932302
00:52:45.582000 NaN NaN -34.894429 -8.046494 2.851535 8236.783837
00:52:49.582000 NaN 190.0 -34.894395 -8.046398 11.300740 8248.084577

48794 rows × 6 columns
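
In the table above, dist is the running total of distpos within each activity. The sketch below reproduces the first rows with the standard library; treating the undefined first distance as zero is an assumption of this sketch (the frame above shows NaN there).

```python
import math
from itertools import accumulate

# Per-position distances copied from the first activity in the table above.
distpos = [float("nan"), 1.690587, 2.095596, 1.594298, 0.163334]

# Treat the undefined first distance as zero when accumulating (assumption).
dist = list(accumulate(0.0 if math.isnan(d) else d for d in distpos))
print(dist)
```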

#compute the speed for each activity
session = session.session.speed(from_distances=True)
#compute the pace for each activity
session = session.session.pace()
#compute the inactivity periods for each activity
session = session.session.only_moving()

With all the computation done, let’s move on to the next step: exploring the data and getting some descriptive statistics.

With the activities loaded and the metrics computed, let’s look further into the data and get the basic summaries of the session: time spent, total distance, mean speed and other insightful statistics for each running activity. We can do this by calling the method runpandas.types.session._SessionAcessor.summarize. It returns a basic Dataframe with all the aggregated statistics per activity from the session frame.

summary = session.session.summarize()
summary
moving_time mean_speed max_speed mean_pace max_pace mean_moving_speed mean_moving_pace mean_cadence max_cadence mean_moving_cadence mean_heart_rate max_heart_rate mean_moving_heart_rate mean_temperature min_temperature max_temperature total_distance ellapsed_time
start
2020-07-03 09:50:53.162 00:25:29.838000 2.642051 4.879655 00:06:18 00:03:24 2.665008 00:06:15 NaN NaN NaN 178.819923 188.0 178.872587 NaN NaN NaN 4089.467333 00:25:47.838000
2020-07-05 09:33:20.999 00:05:04.999000 2.227637 6.998021 00:07:28 00:02:22 3.072098 00:05:25 NaN NaN NaN 168.345455 176.0 168.900000 NaN NaN NaN 980.162640 00:07:20.001000
2020-07-05 09:41:59.999 00:18:19 1.918949 6.563570 00:08:41 00:02:32 2.729788 00:06:06 NaN NaN NaN 173.894180 185.0 174.577143 NaN NaN NaN 3139.401118 00:27:16
2020-07-13 09:13:58.718 00:40:21.281000 2.509703 8.520387 00:06:38 00:01:57 2.573151 00:06:28 NaN NaN NaN 170.808176 185.0 170.795527 NaN NaN NaN 6282.491059 00:41:43.281000
2020-07-17 09:33:02.308 00:32:07.691000 2.643278 8.365431 00:06:18 00:01:59 2.643278 00:06:18 NaN NaN NaN 176.436242 186.0 176.436242 NaN NaN NaN 5095.423045 00:32:07.691000
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2021-06-13 09:22:30.985 01:32:33.018000 2.612872 23.583956 00:06:22 00:00:42 2.810855 00:05:55 NaN NaN NaN 169.340812 183.0 169.655879 NaN NaN NaN 15706.017295 01:40:11.016000
2021-06-20 09:16:55.163 00:59:44.512000 2.492640 6.065895 00:06:41 00:02:44 2.749453 00:06:03 NaN NaN NaN 170.539809 190.0 171.231392 NaN NaN NaN 9965.168311 01:06:37.837000
2021-06-23 09:37:44.000 00:26:49.001000 2.501796 5.641343 00:06:39 00:02:57 2.568947 00:06:29 NaN NaN NaN 156.864865 171.0 156.957031 NaN NaN NaN 4165.492241 00:27:45.001000
2021-06-27 09:50:08.664 00:31:42.336000 2.646493 32.734124 00:06:17 00:00:30 2.661853 00:06:15 NaN NaN NaN 166.642857 176.0 166.721116 NaN NaN NaN 5074.217061 00:31:57.336000
2021-07-04 11:23:19.418 00:47:47.583000 2.602263 4.212320 00:06:24 00:03:57 2.856801 00:05:50 NaN NaN NaN 177.821862 192.0 177.956967 NaN NaN NaN 8248.084577 00:52:49.582000

68 rows × 18 columns
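
The speed and pace columns in this summary are two views of the same quantity: a pace per kilometer is 1000 divided by the speed in meters per second. The sketch below checks this against a few rows of the table; truncating to whole seconds is an assumption that happens to reproduce the values shown, and the function name is hypothetical.

```python
import datetime


def pace_per_km(speed_ms):
    """Time needed to cover 1 km at speed_ms (m/s), truncated to whole seconds."""
    return datetime.timedelta(seconds=int(1000 / speed_ms))


# mean_speed, max_speed and mean_moving_speed of the first row, plus the
# mean_speed of the second row, copied from the summary table above.
for speed in (2.642051, 4.879655, 2.665008, 2.227637):
    print(speed, "m/s ->", pace_per_km(speed), "per km")
```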

print('Session Interval:', (summary.index.to_series().max() - summary.index.to_series().min()).days, 'days')
print('Total Workouts:', len(summary), 'runs')
print('Total KM Distance:', summary['total_distance'].sum() / 1000)
print('Average Pace (all runs):', summary.mean_pace.mean())
print('Average Moving Pace (all runs):', summary.mean_moving_pace.mean())
print('Average KM Distance (all runs):', round(summary.total_distance.mean()/ 1000,2))
Session Interval: 366 days
Total Workouts: 68 runs
Total KM Distance: 491.77377537338896
Average Pace (all runs): 0 days 00:07:18.411764
Average Moving Pace (all runs): 0 days 00:06:02.147058
Average KM Distance (all runs): 7.23

At this point, we have the summary data needed to start some powerful visualization and analysis. The charts below illustrate her pace and distance evolution over time.

import matplotlib.pyplot as plt
import datetime

#let's convert the pace to float number in minutes
summary['mean_moving_pace_float'] = summary['mean_moving_pace'] / datetime.timedelta(minutes=1)
summary['pace_moving_all_mean'] = summary.mean_moving_pace.mean()
summary['pace_moving_all_mean_float'] = summary['pace_moving_all_mean'] / datetime.timedelta(minutes=1)

plt.subplots(figsize=(8, 5))

plt.plot(summary.index, summary.mean_moving_pace_float, color='silver')
plt.plot(summary.pace_moving_all_mean_float, color='purple', linestyle='dashed', label='average')
plt.title("Pace Evolution")
plt.xlabel("Runs")
plt.ylabel("Pace")
plt.legend()
<matplotlib.legend.Legend at 0x7f82d8d83cd0>

plt.subplots(figsize=(8, 5))

summary['distance_all_mean'] = round(summary.total_distance.mean()/1000,2)

plt.plot(summary.index, summary.total_distance / 1000, color='silver')
plt.plot(summary.distance_all_mean, color='purple', linestyle='dashed', label='average')
plt.title("Distance Evolution")
plt.xlabel("Runs")
plt.ylabel("distance")
plt.legend()


plt.show()
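
The pace plot above relies on dividing a timedelta by a one-minute timedelta to obtain a plain float number of minutes. A minimal standard-library sketch of that conversion, using the 6:18 mean pace of the first run as the sample value:

```python
import datetime

# Mean pace of the first run in the summary table: 6 minutes 18 seconds per km.
pace = datetime.timedelta(minutes=6, seconds=18)

# Dividing two timedeltas yields a float: here, the pace in decimal minutes.
pace_minutes = pace / datetime.timedelta(minutes=1)
print(pace_minutes)
```

The same division works element-wise on a pandas column of timedeltas, which is what the plotting code above does.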

Accessing historical data from running race results

One of the great features of Runpandas is the ability to access race result datasets across several races around the world, from majors to local ones (if they are available in our data repository). In this example we will analyze the 2022 Berlin Marathon using runpandas methods specially tailored for handling race results data.

First, let’s load the Berlin Marathon data using the method runpandas.get_events. This function provides a way of accessing the race data and visualizing the results from several marathons available in our datasets repository. Given the year and the marathon identifier, you can filter for any marathon dataset that you want to analyze. The result is a list of runpandas.EventData instances with the race results and their metadata. Let’s look for Berlin Marathon results.

import pandas as pd
import runpandas as rpd
import warnings
warnings.filterwarnings('ignore')
results = rpd.get_events('Berlin')
results
[<Event: name=Berlin Marathon Results from 2022., country=DE, edition=2022>]

The result contains the Berlin Marathon results from 2022. Let’s take a look inside the race event, which comes with handy attributes that describe it and a special method to load the race result data into a runpandas.datasets.schema.RaceData instance.

berlin_result = results[0]
print('Event type', berlin_result.run_type)
print('Country', berlin_result.country)
print('Year', berlin_result.edition)
print('Name', berlin_result.summary)
Event type RunTypeEnum.MARATHON
Country DE
Year 2022
Name Berlin Marathon Results from 2022.

Now that we have confirmed that we requested the right marathon dataset, we will load it into a DataFrame so we can explore it further.

#loading the race data into a RaceData Dataframe
race_result = berlin_result.load()
race_result
position position_gender country sex division bib firstname lastname club starttime ... 10k 15k 20k 25k 30k 35k 40k grosstime nettime category
0 1 1 KEN M 1 1 Eliud Kipchoge 09:15:00 ... 0 days 00:28:23 0 days 00:42:33 0 days 00:56:45 0 days 01:11:08 0 days 01:25:40 0 days 01:40:10 0 days 01:54:53 0 days 02:01:09 0 days 02:01:09 M35
1 2 2 KEN M 1 5 Mark Korir 09:15:00 ... 0 days 00:28:56 0 days 00:43:35 0 days 00:58:14 0 days 01:13:07 0 days 01:28:06 0 days 01:43:25 0 days 01:59:05 0 days 02:05:58 0 days 02:05:58 M30
2 3 3 ETH M 1 8 Tadu Abate 09:15:00 ... 0 days 00:29:46 0 days 00:44:40 0 days 00:59:40 0 days 01:14:44 0 days 01:30:01 0 days 01:44:55 0 days 02:00:03 0 days 02:06:28 0 days 02:06:28 MH
3 4 4 ETH M 2 26 Andamlak Belihu 09:15:00 ... 0 days 00:28:23 0 days 00:42:33 0 days 00:56:45 0 days 01:11:09 0 days 01:26:11 0 days 01:42:14 0 days 01:59:14 0 days 02:06:40 0 days 02:06:40 MH
4 5 5 KEN M 3 25 Abel Kipchumba 09:15:00 ... 0 days 00:28:55 0 days 00:43:35 0 days 00:58:14 0 days 01:13:07 0 days 01:28:03 0 days 01:43:08 0 days 01:59:14 0 days 02:06:49 0 days 02:06:49 MH
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
35566 DNF USA M 65079 michael perkowski ... NaT NaT NaT NaT NaT NaT NaT NaT NaT M65
35567 DNF USA M 62027 Karl Mann ... NaT NaT NaT NaT NaT NaT NaT NaT NaT M55
35568 DNF THA F 27196 oraluck pichaiwongse STATE to BERLIN 2022 ... NaT NaT NaT NaT NaT NaT NaT NaT NaT W55
35569 DNF SUI M 56544 Gerardo GARCIA CALZADA ... NaT NaT NaT NaT NaT NaT NaT NaT NaT M50
35570 DNF AUT M 63348 Harald Mori Albatros ... NaT NaT NaT NaT NaT NaT NaT NaT NaT M60

35571 rows × 23 columns

Now you can get some quick insights about the 2022 Berlin Marathon using the tailored properties of the race result, for example the number of participants, the number of finishers and the winner info.

print('Total participants', race_result.total_participants)
print('Total finishers', race_result.total_finishers)
print('Total Non-Finishers', race_result.total_nonfinishers)
Total participants 35571
Total finishers 34844
Total Non-Finishers 727
race_result.winner
position                         1
position_gender                  1
country                        KEN
sex                              M
division                         1
bib                              1
firstname                    Eliud
lastname                  Kipchoge
club                             –
starttime                 09:15:00
start_raw_time            09:15:00
half               0 days 00:59:51
5k                 0 days 00:14:14
10k                0 days 00:28:23
15k                0 days 00:42:33
20k                0 days 00:56:45
25k                0 days 01:11:08
30k                0 days 01:25:40
35k                0 days 01:40:10
40k                0 days 01:54:53
grosstime          0 days 02:01:09
nettime            0 days 02:01:09
category                       M35
Name: 0, dtype: object
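
The participant counts can be understood as a simple classification of the position column. The sketch below assumes, as the table above suggests, that finishers carry a numeric position and non-finishers a marker such as DNF; the function name and the sample rows are hypothetical, and this is not the runpandas implementation.

```python
def participation_counts(positions):
    """Count participants, finishers (numeric position) and non-finishers."""
    finishers = sum(1 for p in positions if str(p).isdigit())
    return {
        "participants": len(positions),
        "finishers": finishers,
        "nonfinishers": len(positions) - finishers,
    }


# Hypothetical position column: three finishers and two non-finishers.
positions = [1, 2, 3, "DNF", "DNF"]
counts = participation_counts(positions)
print(counts)
```

The real figures above obey the same relation: 35571 participants minus 34844 finishers leaves 727 non-finishers.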

Eliud Kipchoge of Kenya won the 2022 Berlin Marathon in 2:01:09. Kipchoge’s victory was his fourth in Berlin and 17th overall in a career of 19 marathon starts. And who was the women’s race winner?

race_result[(race_result['position_gender'] == 1) & (race_result['sex'] == 'F')].T
32
position 33
position_gender 1
country ETH
sex F
division 1
bib F24
firstname Tigist
lastname Assefa
club
starttime 09:15:00
start_raw_time 09:15:00
half 0 days 01:08:13
5k 0 days 00:16:22
10k 0 days 00:32:36
15k 0 days 00:48:44
20k 0 days 01:04:43
25k 0 days 01:20:48
30k 0 days 01:36:41
35k 0 days 01:52:27
40k 0 days 02:08:42
grosstime 0 days 02:15:37
nettime 0 days 02:15:37
category WH

Tigist Assefa of Ethiopia won the women’s race in a stunning time of 2:15:37 to set a new course record in Berlin.

Runpandas also provides a summary method for race results, which compiles some general insights such as the number of participants and finishers (overall and by gender).

race_result.summary()
Event name                    berlin marathon
Event type                                42k
Event country                              DE
Event date                         25-09-2022
Number of participants                  35571
Number of finishers                     34844
Number of non-finishers                   727
Number of male finishers                23314
Number of female finishers              11523
Winner Nettime                0 days 02:01:09
dtype: object

For some race results, Runpandas comes with the splits for the partial distances of the race. We can fetch the splits for any runner using the method runpandas.acessors.splits.pick_athlete. So, if we need direct access to all the splits of a specific runner, we use the splits accessor.

race_result.splits.pick_athlete(identifier='1')
time distance_meters distance_miles
split
0k 0 days 00:00:00 0 0.0000
5k 0 days 00:14:14 5000 3.1069
10k 0 days 00:28:23 10000 6.2137
15k 0 days 00:42:33 15000 9.3206
20k 0 days 00:56:45 20000 12.4274
half 0 days 00:59:51 21097 13.1091
25k 0 days 01:11:08 25000 15.5343
30k 0 days 01:25:40 30000 18.6411
35k 0 days 01:40:10 35000 21.7480
40k 0 days 01:54:53 40000 24.8548
nettime 0 days 02:01:09 42195 26.2187
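
From cumulative splits like these, the pace of each segment follows from the differences between consecutive rows. Below is a minimal standard-library sketch using the values from the table above; the helper names are hypothetical and no runpandas call is involved.

```python
import datetime

# (label, cumulative time, cumulative distance in meters) copied from the table above.
splits = [
    ("0k", "0:00:00", 0), ("5k", "0:14:14", 5000), ("10k", "0:28:23", 10000),
    ("15k", "0:42:33", 15000), ("20k", "0:56:45", 20000), ("half", "0:59:51", 21097),
    ("25k", "1:11:08", 25000), ("30k", "1:25:40", 30000), ("35k", "1:40:10", 35000),
    ("40k", "1:54:53", 40000), ("nettime", "2:01:09", 42195),
]


def to_seconds(hms):
    """Convert an h:mm:ss string into a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def segment_paces(rows):
    """Return [(label, seconds per km)] for each segment between consecutive splits."""
    paces = []
    for (_, t0, d0), (label, t1, d1) in zip(rows, rows[1:]):
        elapsed = to_seconds(t1) - to_seconds(t0)
        paces.append((label, elapsed / ((d1 - d0) / 1000)))
    return paces


paces = segment_paces(splits)
# Overall mean pace: total time divided by total distance in km.
mean_pace = to_seconds(splits[-1][1]) / (splits[-1][2] / 1000)

for label, pace in paces:
    print(label, datetime.timedelta(seconds=round(pace)))
print("mean", datetime.timedelta(seconds=round(mean_pace)))
```

Comparing each segment pace with the overall mean shows where the runner was ahead of or behind his average.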

With plotting libraries such as matplotlib you can analyze the splits data through an impressive visualization!

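import datetime
import matplotlib.pyplot as plt
import matplotlib.ticker

eliud_kipchoge_splits = race_result.splits.pick_athlete(identifier='1')

#The columns below are derived from the splits table shown above. This step is an
#assumption made to keep the example runnable; it is not part of the runpandas API.
splits = eliud_kipchoge_splits.copy()
#time spent in each segment
splits['split_time'] = splits['time'].diff()
#pace of each segment in seconds per km
splits['pace'] = splits['split_time'].dt.total_seconds() / (splits['distance_meters'].diff() / 1000)
#overall mean pace in seconds per km
splits['mean_pace'] = splits['time'].iloc[-1].total_seconds() / (splits['distance_meters'].iloc[-1] / 1000)
#drop the starting row (0k), which has no previous split
eliud_kipchoge_splits_filtered = splits.iloc[1:]

def timeTicks(x, pos):
    #format a number of seconds as h:mm:ss
    return str(datetime.timedelta(seconds=int(round(x))))

fig, ax2 = plt.subplots()
#format the y-axis to show the labels as timedelta.
formatter = matplotlib.ticker.FuncFormatter(timeTicks)
#plot the paces per segment
line2, = ax2.plot(eliud_kipchoge_splits_filtered.index, eliud_kipchoge_splits_filtered['pace'],  linestyle='dashed', color='cyan',  lw=5, alpha=0.8)
#plot the overall mean pace
line3, = ax2.plot(eliud_kipchoge_splits_filtered.index, eliud_kipchoge_splits_filtered['mean_pace'], color='#1b9e77', linestyle='dashed',  lw=5, alpha=0.8)

#annotate the pace line with the time of each split
yvalues = line2.get_ydata()
for index, y in zip(eliud_kipchoge_splits_filtered.index, yvalues):
    formated_time = datetime.timedelta(seconds=eliud_kipchoge_splits_filtered.loc[index,'split_time'].total_seconds())
    ax2.text(index, y, str(formated_time), weight="bold", size=12)

ax2.yaxis.set_major_formatter(formatter)

ax2.grid(False)

#one label per plotted line
ax2.legend(
            (line2, line3),
            ('Splits Pace', 'Mean Pace'),
            loc='lower right',
            frameon=False
)

ax2.set_title("Eliud Kipchoge splits time and pace in Berlin Marathon 2022")
ax2.set_xlabel("Splits in kms")
ax2.set_ylabel("Pace min/km")

plt.show()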


Get in touch

I'm very interested in your experience with runpandas. Please drop me a note with any feedback you have.

Contributions welcome!

- Marcel Caraciolo

License

Runpandas is licensed under the MIT License, a copy of which is included in LICENSE.