This code takes news items in the format provided by Taranis AI and clusters them into Stories.
The approach supports the following functionalities:
- Events are detected automatically.
- News items are clustered based on the detected Events.
- Documents belonging to related Events are then clustered into Stories.
The method initial_clustering in clustering.py takes as input a dictionary of news_items_aggregate (see tests/testdata.py for the actual input format) and outputs a dictionary containing two keys:
("event_clusters": list of lists of document IDs) and
("story_clusters": list of lists of document IDs)
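The output structure described above can be explored with a small helper. The sketch below does not call initial_clustering. It works on a hand-written dictionary with made-up document IDs in the documented shape, and the helper name cluster_index_by_document is hypothetical, not part of this repository.

```python
from typing import Dict, Hashable, List


def cluster_index_by_document(clusters: List[List[Hashable]]) -> Dict[Hashable, int]:
    """Map each document ID to the index of the cluster that contains it.

    If an ID appears in more than one cluster, the last cluster wins.
    """
    index: Dict[Hashable, int] = {}
    for cluster_idx, doc_ids in enumerate(clusters):
        for doc_id in doc_ids:
            index[doc_id] = cluster_idx
    return index


# Hypothetical result in the documented shape; the IDs are invented for illustration.
example_result = {
    "event_clusters": [["doc-1", "doc-2"], ["doc-3"], ["doc-4"]],
    "story_clusters": [["doc-1", "doc-2", "doc-3"], ["doc-4"]],
}

event_of = cluster_index_by_document(example_result["event_clusters"])
story_of = cluster_index_by_document(example_result["story_clusters"])

# Number of documents in each story cluster.
story_sizes = [len(cluster) for cluster in example_result["story_clusters"]]
```

In this invented example, doc-3 forms its own event cluster but shares a story cluster with doc-1 and doc-2, which mirrors the idea that related Events are grouped into one Story.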
The method incremental_clustering_v2 takes as input a dictionary of news_items_aggregate, containing new news items to be clustered, and clustered_news_items_aggregate, containing already clustered items. It tries to assign the new documents to the existing clusters and creates new clusters where none fits. See tests/testdata.py for the actual input formats. This method also outputs a dictionary containing two keys:
("event_clusters": list of lists of document IDs) and
("story_clusters": list of lists of document IDs)
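One way to see what an incremental run did is to compare its output with the IDs that were already clustered. The sketch below is an illustration only: it does not call incremental_clustering_v2, the document IDs are made up, the helper name split_story_clusters is hypothetical, and it assumes that the returned clusters list previously clustered IDs alongside new ones, which should be checked against the actual output.

```python
from typing import Hashable, List, Set, Tuple

Cluster = List[Hashable]


def split_story_clusters(
    story_clusters: List[Cluster], known_ids: Set[Hashable]
) -> Tuple[List[Cluster], List[Cluster]]:
    """Separate clusters that contain at least one known ID from clusters of only new IDs."""
    extended: List[Cluster] = []
    brand_new: List[Cluster] = []
    for cluster in story_clusters:
        if any(doc_id in known_ids for doc_id in cluster):
            extended.append(cluster)
        else:
            brand_new.append(cluster)
    return extended, brand_new


# IDs that were clustered in an earlier run (invented for illustration).
known_ids = {"doc-1", "doc-2", "doc-3"}

# Hypothetical result in the documented shape after adding doc-5 and doc-6.
example_result = {
    "event_clusters": [["doc-1", "doc-2", "doc-5"], ["doc-3"], ["doc-6"]],
    "story_clusters": [["doc-1", "doc-2", "doc-3", "doc-5"], ["doc-6"]],
}

extended, brand_new = split_story_clusters(example_result["story_clusters"], known_ids)
```

Here doc-5 joins an existing story, while doc-6 ends up in a story cluster of its own.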
flask run
# or
granian run
# or
docker run -p 5000:5000 ghcr.io/taranis-ai/taranis-story-cluster-bot:latest
uv venv
uv sync --all-extras --dev
See notebook\test_story_clustering.ipynb for examples of how to use the clustering methods.
EUROPEAN UNION PUBLIC LICENCE v. 1.2