PedestrianDynamics · chraibi · Jun 4, 2024 · Mar 21, 2024 · Mar 21, 2024 · Mar 22, 2024
diff --git a/.gitignore b/.gitignore
@@ -317,3 +317,4 @@ cython_debug/
 docs/build
 pedpy/_version.py
 # End of https://www.toptal.com/developers/gitignore/api/python,visualstudiocode,pycharm,jupyternotebooks
+.vscode/launch.json
diff --git a/notebooks/demo-data/single_file/n34_cam2.csv b/notebooks/demo-data/single_file/n34_cam2.csv
diff --git a/notebooks/demo-data/single_file/n56_cam1.csv b/notebooks/demo-data/single_file/n56_cam1.csv
diff --git a/notebooks/user_guide.ipynb b/notebooks/user_guide.ipynb
@@ -2675,6 +2675,176 @@
     "plt.show()"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Pedestrian Dynamics: Spatial Analysis\n",
+    "This section covers analysis methods that can be used to characterise different crowds or group formations.\n",
+    "These methods may include measuring the time-to-collision, the pair-distribution function, and the crowd polarization.\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Pair-distribution function (PDF)\n",
+    "This method is inspired by descriptions used in condensed matter physics and was used in the work of [Cordes et al. (2023)](https://doi.org/10.1093/pnasnexus/pgae120) following [Karamouzas et al. (2014)](https://doi.org/10.1103/PhysRevLett.113.238701).\n",
+    "The pair-distribution function (PDF): \n",
+    "\n",
+    "$$g(r)=P(r)/P_{Ni}(r)$$\n",
+    "\n",
+    "\"Quantifies the probability that two interacting pedestrians are found a given distance r apart, renormalized by the probability $P_{Ni}$ of measuring this distance for pedestrians that do not interact.\"\n",
+    "\n",
+    "This probability is approximated here by randomising the time information (randomisation of the time frames).\n",
+    "\n",
+    "The pair-distribution function of a given crowd recording can be computed using the following instructions:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import pathlib\n",
+    "from pedpy import load_trajectory_from_txt, TrajectoryUnit\n",
+    "\n",
+    "# Load trajectories\n",
+    "path_to_file = \"./demo-data/bottleneck/040_c_56_h-.txt\"\n",
+    "traj = load_trajectory_from_txt(\n",
+    "    trajectory_file=pathlib.Path(path_to_file),\n",
+    "    default_unit=TrajectoryUnit.METER,\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from pedpy import compute_pair_distibution_function\n",
+    "\n",
+    "# Compute pair distribution function\n",
+    "radius_bins, pair_distribution = compute_pair_distibution_function(\n",
+    "    traj_data=traj, radius_bin_size=0.1, randomisation_stacking=1\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import matplotlib.pyplot as plt\n",
+    "\n",
+    "# Plot the function\n",
+    "fig, ax1 = plt.subplots(figsize=(5, 5))\n",
+    "ax1.plot(radius_bins, pair_distribution)\n",
+    "ax1.set_title(\"Pair Distribution Function\")\n",
+    "ax1.set_xlabel(\"$r$\", fontsize=16)\n",
+    "ax1.set_ylabel(\"$g(r)$\", fontsize=16)\n",
+    "ax1.grid(True, alpha=0.3)\n",
+    "plt.show()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Parameters of the PDF\n",
+    "The function `compute_pair_distibution_function` has two main parameters:\n",
+    "- `radius_bin_size` is the size of the radius bins for which the probability is computed. On the one hand, a larger bin size results in a smoother PDF but decreases the accuracy of the description, as more individuals are counted in each bin. On the other hand, a smaller bin size increases the accuracy of the description but may lead to noisy or `NaN` values, as some bins may not be populated (leading to invalid divisions). We suggest using a bin size between 0.1 and 0.3 m, as these values are close to the order of magnitude of a chest depth.\n",
+    "- `randomisation_stacking` is the number of times the data is stacked before being shuffled in order to compute the probability $P_{Ni}$ of measuring given pair-wise distances for pedestrians that do not interact. Stacking the data multiple times helps harmonize the random positions more effectively, ensuring that the PDF converges to results that are independent of the randomization method."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "### Variation of the two parameters\n",
+    "import matplotlib.pyplot as plt\n",
+    "import numpy as np\n",
+    "from matplotlib.cm import twilight\n",
+    "from time import time\n",
+    "import pathlib\n",
+    "from pedpy import (\n",
+    "    compute_pair_distibution_function,\n",
+    "    load_trajectory_from_txt,\n",
+    "    TrajectoryUnit,\n",
+    ")\n",
+    "\n",
+    "# Load trajectories\n",
+    "path_to_file = \"./demo-data/bottleneck/040_c_56_h-.txt\"\n",
+    "traj = load_trajectory_from_txt(\n",
+    "    trajectory_file=pathlib.Path(path_to_file),\n",
+    "    default_unit=TrajectoryUnit.METER,\n",
+    ")\n",
+    "# Create a colormap\n",
+    "cmap = twilight\n",
+    "\n",
+    "\n",
+    "# Plot the function on the first axis\n",
+    "fig, (ax0, ax1) = plt.subplots(\n",
+    "    1, 2, figsize=(10, 5)\n",
+    ")  # create a figure with 2 axes in one row\n",
+    "\n",
+    "## Test the effect of `radius_bin_size`\n",
+    "param = [0.05, 0.1, 0.25, 0.5]\n",
+    "N_it = len(param)  # number of plots per parameter\n",
+    "\n",
+    "for i, p in enumerate(param):\n",
+    "    # compute the pdf with the given parameter\n",
+    "    radius_bins, pair_distribution = compute_pair_distibution_function(\n",
+    "        traj_data=traj, radius_bin_size=p, randomisation_stacking=1\n",
+    "    )\n",
+    "    ax0.plot(\n",
+    "        radius_bins,\n",
+    "        pair_distribution,\n",
+    "        color=cmap(i / N_it),\n",
+    "        label=\"$r_{bin}=$\" + str(p),\n",
+    "    )\n",
+    "\n",
+    "ax0.set_title(\"Effect of `radius_bin_size`\")\n",
+    "ax0.set_ylim((0, 1.3))\n",
+    "ax0.set_xlabel(\"$r$\", fontsize=16)\n",
+    "ax0.set_ylabel(\"$g(r)$\", fontsize=16)\n",
+    "ax0.grid(True, alpha=0.3)\n",
+    "ax0.legend(title=\"Bin sizes\")\n",
+    "\n",
+    "## Test the effect of `randomisation_stacking`\n",
+    "param = [1, 3, 5]\n",
+    "N_it = len(param)  # number of plots per parameter\n",
+    "\n",
+    "for i, p in enumerate(param):\n",
+    "    beg_t = time()\n",
+    "    radius_bins, pair_distribution = compute_pair_distibution_function(\n",
+    "        traj_data=traj, radius_bin_size=0.15, randomisation_stacking=p\n",
+    "    )\n",
+    "    end_t = time()\n",
+    "    ax1.plot(\n",
+    "        radius_bins,\n",
+    "        pair_distribution,\n",
+    "        color=cmap(0.1 + i / N_it),\n",
+    "        label=str(p) + \" times: \" + str(np.round(end_t - beg_t, 2)) + \"s\",\n",
+    "    )\n",
+    "\n",
+    "\n",
+    "ax1.set_ylim((0, 1.3))\n",
+    "ax1.set_title(\"Effect of 'randomisation_stacking'\")\n",
+    "ax1.set_ylabel(\"$g(r)$\", fontsize=16)\n",
+    "ax1.set_xlabel(\"$r$\", fontsize=16)\n",
+    "ax1.grid(True, alpha=0.3)\n",
+    "\n",
+    "fig.tight_layout()\n",
+    "ax1.legend(title=\"Nb of stacks: Execution time\")\n",
+    "plt.show()"
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {
@@ -3305,7 +3475,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.11.8"
+   "version": "3.12.2"
   }
  },
  "nbformat": 4,

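Note on the notebook text above: the ratio g(r) = P(r)/P_Ni(r) and the frame-shuffling approximation of P_Ni can be illustrated without pedpy. The following is a minimal sketch on synthetic data, assuming NumPy only. The helper names (`pairwise_distances_per_frame`, `toy_pair_distribution`) and the random positions are hypothetical and not part of this PR. It also bins with `np.histogram` rather than `pandas.cut`, so bin-edge inclusion differs slightly from the PR's implementation.

```python
import numpy as np


def pairwise_distances_per_frame(frames, xy):
    """Return all pairwise distances between points that share a frame."""
    distances = []
    for frame in np.unique(frames):
        points = xy[frames == frame]
        if len(points) < 2:
            continue
        diff = points[:, None, :] - points[None, :, :]
        dist = np.sqrt((diff**2).sum(axis=-1))
        # Keep each pair once: upper triangle without the diagonal
        distances.append(dist[np.triu_indices(len(points), k=1)])
    return np.concatenate(distances)


def toy_pair_distribution(frames, xy, bin_size, rng):
    """Hypothetical re-implementation of g(r) = P(r) / P_Ni(r)."""
    observed = pairwise_distances_per_frame(frames, xy)
    # Approximate non-interacting pedestrians by shuffling the frame labels
    shuffled = pairwise_distances_per_frame(rng.permutation(frames), xy)

    edges = np.arange(0, observed.max() + bin_size, bin_size)
    p, _ = np.histogram(observed, bins=edges)
    p_ni, _ = np.histogram(shuffled, bins=edges)
    p = p / len(observed)
    p_ni = p_ni / len(shuffled)

    # Bins that are empty in the shuffled histogram stay NaN
    g = np.full(len(p), np.nan)
    np.divide(p, p_ni, out=g, where=p_ni > 0)
    return edges[1:], g


rng = np.random.default_rng(0)
n_frames, n_peds = 50, 10
frames = np.repeat(np.arange(n_frames), n_peds)
# Synthetic, independent positions in a 5 m x 5 m area (made-up data)
xy = rng.uniform(0, 5, size=(n_frames * n_peds, 2))

radius, g = toy_pair_distribution(frames, xy, bin_size=0.5, rng=rng)
print(np.round(g, 2))
```

Because the synthetic positions are drawn independently, the populated bins should scatter around 1. Lowering `bin_size` in this sketch makes empty bins, and therefore `NaN` values, more likely, which is the trade-off the notebook describes for `radius_bin_size`.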
diff --git a/pedpy/__init__.py b/pedpy/__init__.py
@@ -39,6 +39,9 @@
     get_invalid_trajectory,
     is_trajectory_valid,
 )
+from .methods.pair_distribution_function import (
+    compute_pair_distibution_function,
+)
 from .methods.profile_calculator import (
     DensityMethod,
     SpeedMethod,
@@ -104,6 +107,7 @@
     "compute_time_distance_line",
     "get_invalid_trajectory",
     "is_trajectory_valid",
+    "compute_pair_distibution_function",
     "DensityMethod",
     "SpeedMethod",
     "compute_density_profile",

diff --git a/pedpy/methods/pair_distribution_function.py b/pedpy/methods/pair_distribution_function.py
@@ -0,0 +1,138 @@
+"""Module containing functions to compute pair distribution function."""
+
+import warnings
+from typing import Tuple
+
+import numpy as np
+import numpy.typing as npt
+import pandas
+from scipy.spatial.distance import cdist
+
+from pedpy.column_identifier import FRAME_COL, ID_COL, X_COL, Y_COL
+from pedpy.data.trajectory_data import TrajectoryData
+
+
+def compute_pair_distibution_function(
+    *,
+    traj_data: TrajectoryData,
+    radius_bin_size: float,
+    randomisation_stacking: int = 1,
+) -> Tuple[npt.NDArray[np.float64], npt.NDArray[np.float64]]:
+    """Computes the pair distribution function g(r) for a given set of trajectory data.
+
+    This function calculates the spatial distribution of positions :math:`g(r)`.
+    :math:`r` here refers to the Euclidean distance between agents.
+    The pdf is given by the probability that two pedestrians are separated
+    by :math:`r` normalized by the probability :math:`P_{Ni}(r)` that two non-interacting pedestrians
+    are separated by :math:`r`, specifically
+
+    .. math::
+        g(r) = P(r)/P_{Ni}(r).
+
+    Args:
+        traj_data: TrajectoryData, an object containing the trajectories.
+        radius_bin_size: float, the size of the bins for the radial distribution function in the
+            same units as the positions.
+        randomisation_stacking: int, number of times the dataset will be stacked before being
+            randomly shuffled to extract distances of non-interacting pedestrians. A larger
+            stacking number leads to a closer approximation of the true pairwise distribution of
+            non-interacting pedestrians, but also increases the computation cost.
+
+
+    Returns:
+        Tuple[np.ndarray, np.ndarray]: A tuple of two numpy arrays. The first array contains the
+            bin edges (excluding the first bin edge), and the second array contains the values of
+            the pair-distribution function :math:`g(r)` for each bin.
+    """
+    data_df = traj_data.data
+
+    # Create Dataframe with all pairwise distances
+    pairwise_dist_array = calculate_data_frame_pair_dist(data_df)
+
+    # Concatenate the working dataframe (data_df) to match the number of randomization cycles
+    concatenated_random_df = pandas.concat(
+        [data_df] * randomisation_stacking, ignore_index=True
+    )
+    # Scramble time-information to mitigate finite-size effects and calculate
+    # pairwise distances of scrambled dataset
+    concatenated_random_df.frame = concatenated_random_df.frame.sample(
+        frac=1
+    ).reset_index(drop=True)
+    pairwise_dist_ni_array = calculate_data_frame_pair_dist(
+        concatenated_random_df
+    )
+
+    ## Create the bin for data
+    radius_bins = np.arange(0, pairwise_dist_array.max(), radius_bin_size)
+
+    # Calculate pair distribution: g(r)
+    ## Actual distribution
+    pd_bins = pandas.cut(pairwise_dist_array, radius_bins)
+    pd_bins_normalised = (pd_bins.value_counts().sort_index().to_numpy()) / len(
+        pairwise_dist_array
+    )  # Normalising by the number of pairwise distances in the dataframe
+    ## Scrambled distribution
+    pd_ni_bins = pandas.cut(pairwise_dist_ni_array, radius_bins)
+    pd_ni_bins_normalised = (
+        pd_ni_bins.value_counts().sort_index().to_numpy()
+    ) / len(
+        pairwise_dist_ni_array
+    )  # Normalising by the number of pairwise distances in the dataframe
+
+    # Divide the two distributions. Bins that are empty in the scrambled
+    # distribution yield inf or NaN. NumPy's floating-point warnings are
+    # silenced locally with np.errstate, which leaves the global warnings
+    # filters untouched.
+    with np.errstate(divide="ignore", invalid="ignore"):
+        pair_distribution = pd_bins_normalised / pd_ni_bins_normalised
+
+    # Emit one explicit warning when the scrambled distribution has empty
+    # bins, since g(r) is undefined there.
+    if np.any(pd_ni_bins_normalised == 0):
+        warning_message = (
+            "Random probability distribution contains null values, "
+            "try using a larger radius_bin_size or more randomization "
+            "cycles."
+        )
+        warnings.warn(warning_message)
+
+    return radius_bins[1:], pair_distribution
+
+
+def calculate_data_frame_pair_dist(
+    data_df: pandas.DataFrame,
+) -> npt.NDArray[np.float64]:
+    """Calculates the pairwise distances for a given pandas DataFrame of pedestrian positions.
+
+    This function calculates the pairwise Euclidean distances between all pedestrian positions
+    and returns an array containing Euclidean distances between every possible pair of pedestrian
+    positions at every time frame.
+
+    Args:
+        data_df: pandas.DataFrame, a DataFrame containing pedestrian positions, where columns are
+        identified by FRAME_COL, ID_COL, X_COL, and Y_COL constants.
+
+    Returns:
+        npt.NDArray[np.float64]: A 1D numpy array of pairwise distances.
+    """
+    distances_list = []
+
+    for _, frame_df in data_df.groupby(FRAME_COL):
+        number_pedestrians = len(frame_df[ID_COL].unique())
+        if number_pedestrians > 1:
+            x_values = frame_df[X_COL].values
+            y_values = frame_df[Y_COL].values
+            coordinates = np.stack((x_values, y_values), axis=-1)
+            # Calculate pairwise distances for the current frame using cdist
+            frame_distances = cdist(
+                coordinates, coordinates, metric="euclidean"
+            )
+
+            # Extract the upper triangle without the diagonal
+            distances_upper_triangle = frame_distances[
+                np.triu_indices_from(frame_distances, k=1)
+            ]
+
+            distances_list.extend(distances_upper_triangle)
+
+    return np.array(distances_list)
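
Possible simplification for `calculate_data_frame_pair_dist`: `scipy.spatial.distance.pdist` returns the condensed distance vector, i.e. each unordered pair once, in the same row-major order as the upper triangle extracted from the `cdist` matrix. The sketch below compares both approaches on made-up coordinates. It is an illustration under that assumption, not a change to this PR, and the coordinate values are hypothetical.

```python
import numpy as np
from scipy.spatial.distance import cdist, pdist

# Hypothetical positions of four pedestrians in one frame (metres)
coordinates = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0], [0.0, 1.0]])

# Approach used in the PR: full matrix, then upper triangle without diagonal
full = cdist(coordinates, coordinates, metric="euclidean")
upper = full[np.triu_indices_from(full, k=1)]

# Alternative: pdist computes only the n * (n - 1) / 2 unique pairs
condensed = pdist(coordinates, metric="euclidean")

# Both arrays hold the same distances in the same order
same = np.allclose(upper, condensed)
print(same, condensed.shape)
```

With `pdist` the full n x n matrix is never built, which roughly halves the distance computations per frame and removes the indexing step.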