Pair distribution function #320

Merged
merged 29 commits into from
Jun 4, 2024
Commits
29535b5
udate-version-file
ThoChat Mar 21, 2024
8c3fea1
First implementation pdf computation
ThoChat Mar 21, 2024
7e278ad
First debuged version
ThoChat Mar 22, 2024
ba86377
2d and optimisation
ThoChat Mar 27, 2024
d82530d
Add notebook to test pdf function
chraibi Mar 29, 2024
241f123
Add documentation and vectorised version of dist calculation
chraibi Mar 29, 2024
40a63dd
fix stack of arrays and use Divide
chraibi Mar 30, 2024
ace9e02
Raise warning for division by null values during pdf computation #330
ThoChat Apr 3, 2024
b596877
Merge branch 'PedestrianDynamics:main' into Pair-Distribution-Function
ThoChat Apr 3, 2024
b103e1b
Add a parameter to pdf to control the number of data scrambling loops…
ThoChat Apr 4, 2024
44601d2
Merge remote-tracking branch 'origin/Pair-Distribution-Function' into…
ThoChat Apr 4, 2024
82cfe43
Method error fix
ThoChat Apr 4, 2024
6d5c74b
Change atribute name and documentation
ThoChat Apr 4, 2024
795479c
First modification of the User guide
ThoChat Apr 16, 2024
94138eb
Final Changes User guide and remove test codes
ThoChat Apr 17, 2024
f8c60cf
Minor correction User guide
ThoChat Apr 17, 2024
5616020
del _version for the PR
ThoChat Apr 17, 2024
73c2ee8
Clear User Guide output
ThoChat Apr 22, 2024
15b71c1
Bug fix
ThoChat Apr 22, 2024
aeba30c
Changing name of the file containning the pdf computation
ThoChat Apr 23, 2024
a92690c
Minor correction and formating to pass all continuity tests
ThoChat Apr 29, 2024
a1ca8fd
formating
ThoChat Apr 29, 2024
9163e47
formatting
ThoChat Apr 29, 2024
5ae398a
formating
ThoChat Apr 29, 2024
da9f51b
formating
ThoChat Apr 29, 2024
7bc59cf
Editing the text of PDF
chraibi May 28, 2024
f742424
fix small typos
chraibi May 28, 2024
91901aa
Changing name in the API doc
ThoChat Jun 3, 2024
6e7475c
formating
ThoChat Jun 3, 2024
1 change: 1 addition & 0 deletions .gitignore
@@ -317,3 +317,4 @@ cython_debug/
docs/build
pedpy/_version.py
# End of https://www.toptal.com/developers/gitignore/api/python,visualstudiocode,pycharm,jupyternotebooks
.vscode/launch.json
1,102 changes: 1,102 additions & 0 deletions notebooks/demo-data/single_file/n34_cam2.csv

Large diffs are not rendered by default.

2,392 changes: 2,392 additions & 0 deletions notebooks/demo-data/single_file/n56_cam1.csv

Large diffs are not rendered by default.

172 changes: 171 additions & 1 deletion notebooks/user_guide.ipynb
@@ -2675,6 +2675,176 @@
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Pedestrian Dynamics: Spatial Analysis\n",
"This section covers analysis methods that can be used to characterise different crowds or group formations.\n",
"These methods may include the measurement of the time-to-collision, the pair-distribution function, and the measurement of crowd polarization.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Pair-distribution function (PDF)\n",
"This method is inspired by the description of condensed matter and is used in the work of [Cordes et al. (2023)](https://doi.org/10.1093/pnasnexus/pgae120), following [Karamouzas et al. (2014)](https://doi.org/10.1103/PhysRevLett.113.238701).\n",
"The pair-distribution function (PDF):\n",
"\n",
"$$g(r)=P(r)/P_{Ni}(r)$$\n",
"\n",
"\"Quantifies the probability that two interacting pedestrians are found a given distance r apart, renormalized by the probability $P_{Ni}$ of measuring this distance for pedestrians that do not interact.\"\n",
"\n",
"The probability $P_{Ni}$ is approximated here by randomly shuffling the time frames of the recorded positions.\n",
"\n",
"The pair-distribution function of a given crowd recording can be computed as follows:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pathlib\n",
"from pedpy import load_trajectory_from_txt, TrajectoryUnit\n",
"\n",
"# Load trajectories\n",
"path_to_file = \"./demo-data/bottleneck/040_c_56_h-.txt\"\n",
"traj = load_trajectory_from_txt(\n",
"    trajectory_file=pathlib.Path(path_to_file),\n",
"    default_unit=TrajectoryUnit.METER,\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from pedpy import compute_pair_distibution_function\n",
"\n",
"# Compute pair distribution function\n",
"radius_bins, pair_distribution = compute_pair_distibution_function(\n",
"    traj_data=traj, radius_bin_size=0.1, randomisation_stacking=1\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"\n",
"# Plot the function\n",
"fig, ax1 = plt.subplots(figsize=(5, 5))\n",
"ax1.plot(radius_bins, pair_distribution)\n",
"ax1.set_title(\"Pair Distribution Function\")\n",
"ax1.set_xlabel(\"$r$\", fontsize=16)\n",
"ax1.set_ylabel(\"$g(r)$\", fontsize=16)\n",
"ax1.grid(True, alpha=0.3)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Parameters of the PDF\n",
"The function `compute_pair_distibution_function` has two main parameters:\n",
"- `radius_bin_size` is the width of the radius bins for which the probability is computed. On the one hand, a larger bin size results in a smoother PDF but a coarser description, as more individuals fall into each bin. On the other hand, a smaller bin size gives a finer description but may lead to noisy or `NaN` values, as some bins may be empty (leading to invalid divisions). We suggest using a bin size between 0.1 and 0.3 m, as these values are close to the order of magnitude of a chest depth.\n",
"- `randomisation_stacking` is the number of times the data is stacked before being shuffled in order to compute the probability $P_{Ni}$ of measuring given pair-wise distances for pedestrians that do not interact. Stacking the data multiple times averages over more random shuffles, so that the PDF converges towards a result that does not depend on one particular shuffle, at the price of a longer computation."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"### Variation of the two parameters\n",
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"from matplotlib.cm import twilight\n",
"from time import time\n",
"import pathlib\n",
"from pedpy import (\n",
"    compute_pair_distibution_function,\n",
"    load_trajectory_from_txt,\n",
"    TrajectoryUnit,\n",
")\n",
"\n",
"# Load trajectories\n",
"path_to_file = \"./demo-data/bottleneck/040_c_56_h-.txt\"\n",
"traj = load_trajectory_from_txt(\n",
"    trajectory_file=pathlib.Path(path_to_file),\n",
"    default_unit=TrajectoryUnit.METER,\n",
")\n",
"# Create a colormap\n",
"cmap = twilight\n",
"\n",
"\n",
"# Plot the function on the first axis\n",
"fig, (ax0, ax1) = plt.subplots(\n",
"    1, 2, figsize=(10, 5)\n",
")  # create a figure with 2 axes in one row\n",
"\n",
"## Test the effect of `radius_bin_size`\n",
"param = [0.05, 0.1, 0.25, 0.5]\n",
"N_it = len(param)  # number of curves for this parameter\n",
"\n",
"for i, p in enumerate(param):\n",
"    # compute the pdf with the given parameter\n",
"    radius_bins, pair_distribution = compute_pair_distibution_function(\n",
"        traj_data=traj, radius_bin_size=p, randomisation_stacking=1\n",
"    )\n",
"    ax0.plot(\n",
"        radius_bins,\n",
"        pair_distribution,\n",
"        color=cmap(i / N_it),\n",
"        label=\"$r_{bin}=$\" + str(p),\n",
"    )\n",
"\n",
"ax0.set_title(\"Effect of `radius_bin_size`\")\n",
"ax0.set_ylim((0, 1.3))\n",
"ax0.set_xlabel(\"$r$\", fontsize=16)\n",
"ax0.set_ylabel(\"$g(r)$\", fontsize=16)\n",
"ax0.grid(True, alpha=0.3)\n",
"ax0.legend(title=\"Bin sizes\")\n",
"\n",
"## Test the effect of `randomisation_stacking`\n",
"param = [1, 3, 5]\n",
"N_it = len(param)  # number of curves for this parameter\n",
"\n",
"for i, p in enumerate(param):\n",
"    # compute the pdf and measure the execution time\n",
"    beg_t = time()\n",
"    radius_bins, pair_distribution = compute_pair_distibution_function(\n",
"        traj_data=traj, radius_bin_size=0.15, randomisation_stacking=p\n",
"    )\n",
"    end_t = time()\n",
"    ax1.plot(\n",
"        radius_bins,\n",
"        pair_distribution,\n",
"        color=cmap(0.1 + i / N_it),\n",
"        label=str(p) + \" times: \" + str(np.round(end_t - beg_t, 2)) + \"s\",\n",
"    )\n",
"\n",
"\n",
"ax1.set_ylim((0, 1.3))\n",
"ax1.set_title(\"Effect of `randomisation_stacking`\")\n",
"ax1.set_ylabel(\"$g(r)$\", fontsize=16)\n",
"ax1.set_xlabel(\"$r$\", fontsize=16)\n",
"ax1.grid(True, alpha=0.3)\n",
"\n",
"fig.tight_layout()\n",
"ax1.legend(title=\"Nb of stacks: Execution time\")\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {
@@ -3305,7 +3475,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.8"
"version": "3.12.2"
}
},
"nbformat": 4,
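The user guide text above defines $g(r)=P(r)/P_{Ni}(r)$ and approximates $P_{Ni}$ by shuffling the time frames. The following is a minimal NumPy-only sketch of that idea on synthetic data, not the implementation from this pull request. The function names `pairwise_distances_per_frame` and `sketch_pair_distribution`, the random positions, and the use of `np.histogram` for binning are all assumptions made for illustration.

```python
import numpy as np


def pairwise_distances_per_frame(frames, xy):
    """Collect the Euclidean distances of all position pairs sharing a frame."""
    distances = []
    for frame in np.unique(frames):
        points = xy[frames == frame]
        if len(points) < 2:
            continue  # a single position has no pair
        diff = points[:, None, :] - points[None, :, :]
        dist = np.sqrt((diff**2).sum(axis=-1))
        rows, cols = np.triu_indices(len(points), k=1)  # each pair once
        distances.append(dist[rows, cols])
    return np.concatenate(distances) if distances else np.empty(0)


def sketch_pair_distribution(frames, xy, bin_size, rng):
    """Estimate g(r) = P(r) / P_Ni(r), with P_Ni from shuffled frame labels."""
    observed = pairwise_distances_per_frame(frames, xy)
    # Shuffling the frame labels pairs positions that were never recorded
    # together, which stands in for non-interacting pedestrians.
    scrambled = pairwise_distances_per_frame(rng.permutation(frames), xy)
    edges = np.arange(0.0, observed.max() + bin_size, bin_size)
    p = np.histogram(observed, bins=edges)[0] / len(observed)
    p_ni = np.histogram(scrambled, bins=edges)[0] / len(scrambled)
    # Empty scrambled bins give inf or nan instead of raising an error.
    with np.errstate(divide="ignore", invalid="ignore"):
        g = p / p_ni
    return edges[1:], g


# Synthetic, hypothetical data: 200 frames with 10 independent positions each.
rng = np.random.default_rng(0)
n_frames, n_peds = 200, 10
frames = np.repeat(np.arange(n_frames), n_peds)
xy = rng.uniform(0.0, 5.0, size=(n_frames * n_peds, 2))
radii, g = sketch_pair_distribution(frames, xy, bin_size=0.25, rng=rng)
```

Because the synthetic positions are drawn independently, the observed and scrambled distributions should be statistically similar, so `g` is expected to fluctuate around 1 in well-populated bins. Real trajectories would typically show `g` well below 1 at short distances, where pedestrians avoid each other.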
4 changes: 4 additions & 0 deletions pedpy/__init__.py
@@ -39,6 +39,9 @@
    get_invalid_trajectory,
    is_trajectory_valid,
)
from .methods.pair_distribution_function import (
    compute_pair_distibution_function,
)
from .methods.profile_calculator import (
    DensityMethod,
    SpeedMethod,
Expand Down Expand Up @@ -104,6 +107,7 @@
"compute_time_distance_line",
"get_invalid_trajectory",
"is_trajectory_valid",
"compute_pair_distibution_function",
"DensityMethod",
"SpeedMethod",
"compute_density_profile",
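With `compute_pair_distibution_function` exported from the package root, callers receive two arrays: the bin radii and the values of g(r). As the user guide notes, sparsely populated bins can produce non-finite values. The sketch below shows one possible way to drop such bins before plotting or averaging. The array contents are made-up placeholder values, not output of the function.

```python
import numpy as np

# Hypothetical output of a PDF computation: in the last two bins the
# scrambled distribution is empty, giving inf (x / 0) and nan (0 / 0).
radius_bins = np.array([0.1, 0.2, 0.3, 0.4, 0.5])
pair_distribution = np.array([0.0, 0.4, 1.1, np.inf, np.nan])

# Keep only the bins where g(r) is a finite number.
valid = np.isfinite(pair_distribution)
clean_radius = radius_bins[valid]
clean_g = pair_distribution[valid]
```

Masking keeps the radii and the values aligned, which is safer than replacing non-finite entries with zeros, since a zero would wrongly suggest that pairs at that distance never occur.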
138 changes: 138 additions & 0 deletions pedpy/methods/pair_distribution_function.py
@@ -0,0 +1,138 @@
"""Module containing functions to compute the pair distribution function."""

import warnings
from typing import Tuple

import numpy as np
import numpy.typing as npt
import pandas
from scipy.spatial.distance import cdist

from pedpy.column_identifier import FRAME_COL, ID_COL, X_COL, Y_COL
from pedpy.data.trajectory_data import TrajectoryData


def compute_pair_distibution_function(
    *,
    traj_data: TrajectoryData,
    radius_bin_size: float,
    randomisation_stacking: int = 1,
) -> Tuple[npt.NDArray[np.float16], npt.NDArray[np.float16]]:
    """Computes the pair distribution function g(r) for a given set of trajectory data.

    This function calculates the spatial distribution of positions :math:`g(r)`,
    where :math:`r` refers to the Euclidean distance between agents.
    The pdf is given by the probability that two pedestrians are separated
    by :math:`r` normalized by the probability :math:`P_{Ni}(r)` that two
    non-interacting pedestrians are separated by :math:`r`, specifically

    .. math::
        g(r) = P(r)/P_{Ni}(r)

    Args:
        traj_data: TrajectoryData, an object containing the trajectories.
        radius_bin_size: float, the size of the bins for the radial distribution function in the
            same units as the positions.
        randomisation_stacking: int, number of times the dataset is stacked before being
            randomly shuffled to extract distances of non-interacting pedestrians. A larger
            stacking number leads to a closer approximation of the true pairwise distribution
            of non-interacting pedestrians, but also increases the computation cost.

    Returns:
        Tuple[np.ndarray, np.ndarray]: A tuple of two numpy arrays. The first array contains the
        bin edges (excluding the first bin edge), and the second array contains the values of
        the pair-distribution function :math:`g(r)` for each bin.
    """
    data_df = traj_data.data

    # Create an array with all pairwise distances
    pairwise_dist_array = calculate_data_frame_pair_dist(data_df)

    # Concatenate the working dataframe (data_df) to match the number of randomization cycles
    concatenated_random_df = pandas.concat(
        [data_df] * randomisation_stacking, ignore_index=True
    )
    # Scramble time-information to mitigate finite-size effects and calculate
    # pairwise distances of scrambled dataset
    concatenated_random_df.frame = concatenated_random_df.frame.sample(
        frac=1
    ).reset_index(drop=True)
    pairwise_dist_ni_array = calculate_data_frame_pair_dist(
        concatenated_random_df
    )

    ## Create the bin for data
    radius_bins = np.arange(0, pairwise_dist_array.max(), radius_bin_size)

    # Calculate pair distribution: g(r)
    ## Actual distribution
    pd_bins = pandas.cut(pairwise_dist_array, radius_bins)
    pd_bins_normalised = (pd_bins.value_counts().sort_index().to_numpy()) / len(
        pairwise_dist_array
    )  # Normalising by the number of pairwise distances in the dataframe
    ## Scrambled distribution
    pd_ni_bins = pandas.cut(pairwise_dist_ni_array, radius_bins)
    pd_ni_bins_normalised = (
        pd_ni_bins.value_counts().sort_index().to_numpy()
    ) / len(
        pairwise_dist_ni_array
    )  # Normalising by the number of pairwise distances in the dataframe

    try:
        # Raise on division by zero so that empty scrambled bins can be reported.
        # Other warnings are suppressed only inside this block, so the caller's
        # global warning filters are left untouched.
        with warnings.catch_warnings(), np.errstate(divide="raise"):
            warnings.simplefilter("ignore")
            pair_distribution = pd_bins_normalised / pd_ni_bins_normalised

    except FloatingPointError:
        # Recompute without raising; affected bins contain inf or nan
        with np.errstate(divide="ignore", invalid="ignore"):
            pair_distribution = pd_bins_normalised / pd_ni_bins_normalised
        warning_message = (
            "Random probability distribution contains null values, "
            "try using a larger radius_bin_size or a larger randomisation_stacking."
        )
        warnings.warn(warning_message)

    return radius_bins[1:], pair_distribution


def calculate_data_frame_pair_dist(
    data_df: pandas.DataFrame,
) -> npt.NDArray[np.float16]:
    """Calculates the pairwise distances for a given pandas DataFrame of pedestrian positions.

    This function calculates the pairwise Euclidean distances between all pedestrian positions
    and returns an array containing the Euclidean distances between every possible pair of
    pedestrian positions at every time frame.

    Args:
        data_df: pandas.DataFrame, a DataFrame containing pedestrian positions, where columns are
            identified by FRAME_COL, ID_COL, X_COL, and Y_COL constants.

    Returns:
        npt.NDArray[np.float16]: A 1D numpy array of pairwise distances.
    """
    distances_list = []

    for _, frame_df in data_df.groupby(FRAME_COL):
        number_pedestrians = len(frame_df[ID_COL].unique())
        if number_pedestrians > 1:
            x_values = frame_df[X_COL].values
            y_values = frame_df[Y_COL].values
            coordinates = np.stack((x_values, y_values), axis=-1)
            # Calculate pairwise distances for the current frame using cdist
            frame_distances = cdist(
                coordinates, coordinates, metric="euclidean"
            )

            # Extract the upper triangle without the diagonal
            distances_upper_triangle = frame_distances[
                np.triu_indices_from(frame_distances, k=1)
            ]

            distances_list.extend(distances_upper_triangle)

    return np.array(distances_list)
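The per-frame step above builds the full distance matrix with `cdist` and then keeps its upper triangle. SciPy also provides `scipy.spatial.distance.pdist`, which returns those unique pairwise distances directly in condensed form. The sketch below checks on random, made-up coordinates that both routes give the same values. It is offered as a possible simplification, not as a change that is part of this pull request.

```python
import numpy as np
from scipy.spatial.distance import cdist, pdist

# Hypothetical positions of six pedestrians in a single frame.
rng = np.random.default_rng(1)
coordinates = rng.uniform(0.0, 5.0, size=(6, 2))

# Route used in the diff: full matrix, then upper triangle without diagonal.
full = cdist(coordinates, coordinates, metric="euclidean")
upper_triangle = full[np.triu_indices_from(full, k=1)]

# Alternative: condensed distances, one entry per unordered pair.
condensed = pdist(coordinates, metric="euclidean")

same = np.allclose(upper_triangle, condensed)
```

For n positions, `pdist` computes n(n-1)/2 distances instead of the n² entries of the full matrix, which may reduce memory use and runtime for crowded frames.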