Merge pull request #859 from 2i2c-org/partnerships-hub-activity
Partnerships: Document how to track hub usage with Grafana
jnywong authored May 13, 2024
2 parents cec062a + b2698bc commit 8284ab7
Showing 8 changed files with 379 additions and 3 deletions.
5 changes: 4 additions & 1 deletion .github/workflows/test-docs.yaml
@@ -21,10 +21,13 @@ jobs:
python-version: "3.11"

- name: Install deps
run: pip install -r requirements.txt
run: |
pip install -r requirements.txt
# linkcheck is done separately from this in doc-links.yml, scheduled to
# run every day and open/update an issue
- name: make dirhtml
env:
GRAFANA_TOKEN: ${{ secrets.GRAFANA_TOKEN }}
run: |
make dirhtml SPHINXOPTS='--color -W --keep-going'
3 changes: 2 additions & 1 deletion .gitignore
@@ -1,9 +1,10 @@
_build
.vscode
*/.ipynb_checkpoints/*
**/.ipynb_checkpoints/*
tmp/
.DS_Store
.nox
.env
__pycache__

# Assets generated at build
11 changes: 10 additions & 1 deletion conf.py
@@ -57,7 +57,10 @@
"use_repository_button": True,
"use_edit_page_button": True,
}

html_js_files = [
"https://cdn.plot.ly/plotly-2.31.1.min.js", # NOTE: load plotly before require
"https://cdnjs.cloudflare.com/ajax/libs/require.js/2.3.4/require.min.js"
]


# -- Options for intersphinx extension ---------------------------------------
@@ -137,3 +140,9 @@ def setup(app):
# The token exists in our RTD environment so it should work there.
path_script = Path(__file__).parent / "_data/support_stewards/gen_support_stewards.py"
run(f"python {path_script}", shell=True)

# -- myst_nb configuration --

nb_execution_timeout = -1 # no timeout
nb_execution_mode = 'auto'
suppress_warnings = ["mystnb.unknown_mime_type"]
279 changes: 279 additions & 0 deletions partnerships/community_success/hub-activity.md
@@ -0,0 +1,279 @@
---
jupytext:
  text_representation:
    extension: .md
    format_name: myst
    format_version: 0.13
    jupytext_version: 1.16.2
kernelspec:
  display_name: Python 3 (ipykernel)
  language: python
  name: python3
---

+++ {"editable": true, "slideshow": {"slide_type": ""}, "user_expressions": []}

# Track Hub Usage with Grafana

+++ {"user_expressions": []}

## Overview

Grafana is an open-source analytics and interactive visualization web application. Prometheus is an open-source monitoring and alerting platform that collects and stores metrics as time-series data; Grafana reads from it as a data source.

Grafana dashboard deployments for 2i2c hubs (k8s+JupyterHub) follow the templates outlined in the GitHub repository https://github.com/jupyterhub/grafana-dashboards. Note that Prometheus data is retained for 1 year on 2i2c hubs.

+++ {"user_expressions": []}

## Prerequisites

### Create a Grafana service account and generate a token

See [Grafana docs – Service Accounts](https://grafana.com/docs/grafana/latest/administration/service-accounts/) for more details.

1. Navigate to the [2i2c Grafana](https://grafana.pilot.2i2c.cloud) website at `https://grafana.pilot.2i2c.cloud`.
1. Open the *{octicon}`three-bars` Menu* and click on *{octicon}`gear` Administration > Users and access > Service accounts*.
1. Click on the {guilabel}`Add service account` button on the top-right.
1. Choose a descriptive *Display name*, e.g. `username-local-prometheus-access` and leave the role as *Viewer*. Click the {guilabel}`Create` button to confirm.
1. You will see a new page with the details of the service account you have created. In the section *Tokens*, click the {guilabel}`Add service account token` button to generate a token to authenticate with the Grafana API.
1. Choose a descriptive name for the token and then set a token expiry date. We recommend 1 month from now.[^token]
1. Click the {guilabel}`Generate token` button to confirm.
1. **Important:** Copy the token and keep a copy somewhere safe. You will not be able to see it again. Losing a token requires creating a new one.

[^token]: After the token expires, you will need to generate a new token and update its value in the local `.env` file, GitHub Actions secret and/or Read the Docs environment variable.

### Configure Grafana Token access

See [](../../reference/documentation/secrets.md) for a general guide to configuring access to the Grafana token in a local development environment, or while deploying documentation with GitHub Actions or Read the Docs.

+++ {"user_expressions": []}

(hub-activity:python-packages)=
### Python packages

We require the following Python packages to run the code in this guide:

- `python-dotenv` – load environment variables defined in `.env` to your notebook session
- `dateparser` – parse human readable dates
- `prometheus-pandas` – query Prometheus and format into Pandas data structures
- `plotly` – visualize interactive plots

+++ {"user_expressions": []}

### Javascript

We require the [`plotly.js`](https://plotly.com/javascript/) JavaScript library to render the interactive Plotly graphs.

In your local development environment, enable the [`jupyter-dash`](https://github.com/plotly/jupyter-dash) extension for JupyterLab.

For Jupyter Book/MyST deployments, add the following JavaScript libraries to your configuration file:

```yaml
- "https://cdn.plot.ly/plotly-2.31.1.min.js"  # NOTE: load plotly before require.js
- "https://cdnjs.cloudflare.com/ajax/libs/require.js/2.3.4/require.min.js"
```

## Import packages and define functions

+++ {"user_expressions": []}

Import packages and set the Pandas plotting backend to the Plotly engine.

```{code-cell} ipython3
import os
import re
import requests
from dotenv import load_dotenv
from urllib.parse import urlencode
from datetime import datetime
from dateparser import parse as dateparser_parse
from prometheus_pandas.query import Prometheus
import pandas as pd
import plotly.graph_objects as go
pd.options.plotting.backend = "plotly"
```

+++ {"user_expressions": []}

Load the Grafana token as an environment variable from the `.env` file or GitHub/Read the Docs secret.

```{code-cell} ipython3
load_dotenv()
GRAFANA_TOKEN = os.environ["GRAFANA_TOKEN"]
```
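
If the variable is missing, `os.environ["GRAFANA_TOKEN"]` raises a bare `KeyError`. A minimal sketch of a friendlier lookup is shown below. The helper name `read_grafana_token` and its error message are illustrative assumptions, not part of this guide's code.

```python
import os
from typing import Mapping, Optional


def read_grafana_token(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the Grafana token, or raise with a hint on where to set it.

    Hypothetical helper for illustration only.
    """
    # Default to the real process environment; a mapping can be passed for testing
    env = os.environ if env is None else env
    token = env.get("GRAFANA_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "GRAFANA_TOKEN is not set. Add it to your local .env file, "
            "GitHub Actions secret or Read the Docs environment variable."
        )
    return token
```

Call it after `load_dotenv()` so that values from a local `.env` file are already in the environment.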

+++ {"user_expressions": []}

Define a `get_prometheus_datasources` function to list the Prometheus data sources configured in Grafana, including the unique id (`uid`) of each one.

```{code-cell} ipython3
def get_prometheus_datasources(grafana_url: str, grafana_token: str) -> pd.DataFrame:
    """
    Get the Prometheus data sources configured for this Grafana.

    Parameters
    ----------
    grafana_url: str
        API URL of Grafana for querying. Do not include a trailing slash.
    grafana_token: str
        Service account token with appropriate rights to make this API call.
    """
    api_url = f"{grafana_url}/api/datasources"
    datasources = requests.get(
        api_url,
        headers={
            "Accept": "application/json",
            "Content-Type": "application/json",
            "Authorization": f"Bearer {grafana_token}",
        },
    )
    # Fail early on authentication or network errors
    datasources.raise_for_status()
    # Convert to a dataframe so that we can manipulate more easily
    df = pd.DataFrame.from_dict(datasources.json())
    # Move "name" to the first column by setting and resetting it as the index
    df = df.set_index("name").reset_index()
    # Filter for sources with type prometheus
    df = df.query("type == 'prometheus'")
    return df
```
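
To make the filtering step concrete, here is a small sketch in plain Python that keeps only the entries of type `prometheus` and collects their `uid` values. The `sample_response` list is made-up illustrative data with only three fields per entry, and `prometheus_uids` is a hypothetical helper; real responses carry more fields.

```python
from typing import Any, Dict, List

# Hypothetical, trimmed-down data source listing (not real 2i2c data)
sample_response: List[Dict[str, Any]] = [
    {"name": "hub-a", "type": "prometheus", "uid": "uid-aaa"},
    {"name": "billing", "type": "postgres", "uid": "uid-bbb"},
    {"name": "hub-b", "type": "prometheus", "uid": "uid-ccc"},
]


def prometheus_uids(datasources: List[Dict[str, Any]]) -> List[str]:
    """Return the uid of every entry whose type is 'prometheus'."""
    return [d["uid"] for d in datasources if d.get("type") == "prometheus"]


uids = prometheus_uids(sample_response)
print(uids)
```

The dataframe version above does the same filtering with `df.query`, and keeps the other columns available for inspection.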

+++ {"user_expressions": []}

Define the `get_pandas_prometheus` function that creates a Prometheus client whose query results are returned as pandas data structures.

```{code-cell} ipython3
def get_pandas_prometheus(grafana_url: str, grafana_token: str, prometheus_uid: str):
    """
    Create a Prometheus client and format the result as a pandas data structure.

    Parameters
    ----------
    grafana_url: str
        URL of Grafana for querying. Do not include a trailing slash.
    grafana_token: str
        Service account token with appropriate rights to make this API call.
    prometheus_uid: str
        uid of Prometheus datasource within Grafana to query.
    """
    session = requests.Session()  # Session to use for requests
    session.headers = {"Authorization": f"Bearer {grafana_token}"}
    # API URL that proxies queries through Grafana to the Prometheus server
    proxy_url = f"{grafana_url}/api/datasources/proxy/uid/{prometheus_uid}/"
    return Prometheus(proxy_url, session)
```
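
Both functions above build URLs by appending a path to `grafana_url`, so a stray trailing slash in the input produces a double slash. A minimal sketch of a normalizing helper follows. The name `build_proxy_url` and the example host `grafana.example.org` are assumptions made for illustration.

```python
def build_proxy_url(grafana_url: str, prometheus_uid: str) -> str:
    """Build the data source proxy URL, tolerating a trailing slash in the input.

    Hypothetical helper; the path mirrors the one used in get_pandas_prometheus.
    """
    base = grafana_url.rstrip("/")
    return f"{base}/api/datasources/proxy/uid/{prometheus_uid}/"


print(build_proxy_url("https://grafana.example.org/", "abc"))
```

With this helper, `https://grafana.example.org` and `https://grafana.example.org/` yield the same proxy URL.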

+++ {"user_expressions": []}

## Execute the main program

+++ {"user_expressions": []}

Fetch all available Prometheus data sources from Grafana.

```{code-cell} ipython3
datasources = get_prometheus_datasources("https://grafana.pilot.2i2c.cloud", GRAFANA_TOKEN)
```

+++ {"user_expressions": []}

Define a query for the data source using [PromQL](https://prometheus.io/docs/prometheus/latest/querying/basics/), formatted as a string. The query below finds the maximum number of unique users over the last 24-hour period and aggregates by hub name.

```{code-cell} ipython3
query = """
    max(
        jupyterhub_active_users{period="24h", namespace=~".*"}
    ) by (namespace)
"""
```

+++ {"user_expressions": []}

:::{note}

Writing efficient PromQL queries is important to make sure that the query actually completes, especially over large periods of time. However, most queries that JupyterHub users are likely to need are fairly simple, and you don't need to be a PromQL expert.

You can borrow a lot of useful queries from the GitHub repository [jupyterhub/grafana-dashboards](https://github.com/jupyterhub/grafana-dashboards), from inside the `jsonnet` files. The main change you may need to make is removing the `$hub` template parameter from the queries.
:::
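
As a rough sketch of that last step, the snippet below swaps the `$hub` template variable for a catch-all regex. It assumes the borrowed query uses `$hub` as the value of a regex matcher such as `namespace=~"$hub"`; the `dashboard_query` string and the `drop_hub_variable` helper are illustrative, not copied from the repository.

```python
def drop_hub_variable(query: str, replacement: str = ".*") -> str:
    """Replace the Grafana `$hub` template variable with a plain regex.

    Hypothetical helper; assumes `$hub` is used as a regex matcher value.
    """
    return query.replace("$hub", replacement)


# Illustrative dashboard-style query (an assumption, not taken from the repo)
dashboard_query = (
    'max(jupyterhub_active_users{period="24h", namespace=~"$hub"}) by (namespace)'
)
standalone_query = drop_hub_variable(dashboard_query)
print(standalone_query)
```

Passing a specific hub name as `replacement` restricts the query to that hub instead of matching all of them.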

+++ {"user_expressions": []}

Loop over each data source and call `get_pandas_prometheus()` to create a Prometheus client that queries the server through the Grafana API. Evaluate the query from one month ago to now with a step size of 1 day, which returns a pandas dataframe. Append each result to the `activity` list, then concatenate the results at the end.

```{code-cell} ipython3
activity = []
for prometheus_uid in datasources['uid']:
    prometheus = get_pandas_prometheus("https://grafana.pilot.2i2c.cloud", GRAFANA_TOKEN, prometheus_uid)
    df = prometheus.query_range(
        query,
        dateparser_parse("1 month ago"),
        dateparser_parse("now"),
        "1d",
    )
    activity.append(df)
df = pd.concat(activity)
```
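
For a sense of how much data a range query returns, the sketch below builds the query window with the standard library and counts the evaluation points, assuming one evaluation at the start and then one every step up to and including the end. It uses 30 days as a stand-in for "1 month ago", which may differ from what `dateparser` computes; `query_window` and `expected_points` are hypothetical helpers.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple


def query_window(now: datetime, days: int = 30) -> Tuple[datetime, datetime]:
    """Return (start, end) covering the last `days` days up to `now`."""
    return now - timedelta(days=days), now


def expected_points(start: datetime, end: datetime, step: timedelta) -> int:
    """Count evaluations at start, start + step, ... up to and including end."""
    return (end - start) // step + 1


# Fixed reference time so the example is reproducible
now = datetime(2024, 5, 13, tzinfo=timezone.utc)
start, end = query_window(now)
print(start.date(), end.date(), expected_points(start, end, timedelta(days=1)))
```

A smaller step or a longer window grows the result proportionally, which is worth keeping in mind when a query is slow to complete.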

+++ {"user_expressions": []}

## Pre-process and visualize the results

+++ {"user_expressions": []}

Round the datetime index down to the start of each calendar day.

```{code-cell} ipython3
df.index = df.index.floor('D')
```
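
Flooring differs from rounding to the nearest day: a timestamp late in the evening stays on the same date rather than moving to the next one. A small standard-library sketch of the same idea is shown below; `floor_to_day` is a hypothetical helper, not part of pandas.

```python
from datetime import datetime


def floor_to_day(ts: datetime) -> datetime:
    """Drop the time-of-day component, keeping the calendar date."""
    return ts.replace(hour=0, minute=0, second=0, microsecond=0)


late_evening = datetime(2024, 5, 12, 23, 59)
print(floor_to_day(late_evening))  # 2024-05-12 00:00:00
```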

+++ {"user_expressions": []}

Rename the hubs from the raw column labels, `{namespace="<hub_name>"}`, to a human-readable format by using a regex to extract the `<hub_name>` between the double quotes.

```{code-cell} ipython3
df.columns = [re.findall(r'[^"]+', col)[1] for col in df.columns]
```
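
To see why index `[1]` picks out the hub name, the sketch below applies the same pattern to a single made-up label. It also shows an alternative with an explicit capture group, which is a suggestion rather than the approach used in this guide; the label value `staging` is illustrative.

```python
import re

label = '{namespace="staging"}'  # illustrative column label

# The pattern matches each run of characters that contains no double quote
parts = re.findall(r'[^"]+', label)
print(parts)  # ['{namespace=', 'staging', '}']
hub = parts[1]

# Alternative: capture the quoted value directly, falling back to the raw label
match = re.search(r'namespace="([^"]+)"', label)
hub_explicit = match.group(1) if match else label
```

The capture-group form does not depend on the position of the quoted value, so it keeps working if the label ever carries more than one key.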

+++ {"user_expressions": []}

Sort the hubs in descending order of their total number of unique users over the period.

```{code-cell} ipython3
df = df.reindex(df.sum().sort_values(ascending=False).index, axis=1)
```

+++ {"user_expressions": []}

### Unique users in the last 24 hours

+++ {"user_expressions": []}

Plot the data! 📊

```{code-cell} ipython3
fig = go.Figure()
for col in df.columns:
    fig.add_trace(
        go.Bar(
            x=df.index,
            y=df[col],
            name=f"{col}",
        )
    )
fig.update_layout(
    xaxis_title="Date",
    width=800,
    height=600,
    legend=dict(groupclick="toggleitem"),
    barmode='stack',
    legend_traceorder="normal",
)
fig.show()
```

1 change: 1 addition & 0 deletions partnerships/index.md
@@ -9,5 +9,6 @@ structure.md
workflow.md
../communication/index.md
community_success/freshdesk.md
community_success/hub-activity.ipynb
stickers.md
```
1 change: 1 addition & 0 deletions reference/documentation/overview.md
@@ -10,6 +10,7 @@ See the sections below for more information about the structure and instructions
edit
content
environment
secrets
structure
sources
```