Physics Data for Machine Learning (pd4ml)

This repository contains the datasets and models for machine learning from the publication "Shared Data and Algorithms for Deep Learning in Fundamental Physics" (arXiv:2107.00656).

You can install this package as a Python module with pip:

pip install git+https://github.com/erum-data-idt/pd4ml

# or clone the repository and run 'pip install .' in its root folder
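To confirm that the installation worked before loading any data, a quick import check can help. The following is a minimal sketch using only the Python standard library; the helper name is_installed is hypothetical and not part of pd4ml.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


if is_installed("pd4ml"):
    print("pd4ml is available")
else:
    print("pd4ml not found; install it with the pip command above")
```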

The central function is load, which returns the training or testing split of a dataset. The features X are returned as a list of numpy arrays; the labels y are returned directly as a single numpy array.

from pd4ml import Spinodal   # or any other dataset (see below)

# load the training data into RAM (downloads the dataset on first use)
X_train, y_train = Spinodal.load('train', path='./datasets')

# load the test data into RAM (downloads the dataset on first use)
X_test, y_test = Spinodal.load('test', path='./datasets')

These calls create a subfolder ./datasets. All datasets together take up about 2.4 GB of disk space. Loading a training dataset requires at least 5 GB of free RAM (the exact amount depends on the dataset).
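Because X is a list of arrays rather than a single array, it can be useful to inspect each entry and its memory footprint after loading. The sketch below assumes only that the entries are numpy arrays, as stated above; the helper summarize_split and the stand-in shapes are made up for illustration and are not part of pd4ml.

```python
import numpy as np


def summarize_split(X, y):
    """Return a short text report for a feature list X and a label array y."""
    lines = []
    total_bytes = y.nbytes
    for i, features in enumerate(X):
        lines.append(f"X[{i}]: shape={features.shape}, dtype={features.dtype}")
        total_bytes += features.nbytes
    lines.append(f"y: shape={y.shape}, dtype={y.dtype}")
    lines.append(f"in-memory size: {total_bytes / 1e6:.2f} MB")
    return "\n".join(lines)


# Stand-in data with invented shapes. With pd4ml installed, use instead:
#   X_train, y_train = Spinodal.load('train', path='./datasets')
X_demo = [np.zeros((4, 20, 20), dtype=np.float32)]
y_demo = np.array([0, 1, 1, 0], dtype=np.int64)

report = summarize_split(X_demo, y_demo)
print(report)
```

Calling summarize_split(X_train, y_train) on a real split gives a rough idea of how much RAM the loaded arrays occupy.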

The following datasets are currently included, listed with their tags:

1: TopTagging, 2: Spinodal, 3: EOSL, 4: Airshower, 5: Belle

A description of a dataset can be printed with:

Spinodal.print_description()

Show all available datasets:

import pd4ml

for dataset in pd4ml.Dataset.datasets_register:
    print(dataset.name)

An additional load_data function performs some basic preprocessing steps and can also return an adjacency matrix:

from pd4ml import Spinodal   # or any other dataset
x_train, y_train = Spinodal.load_data('train', path='./datasets', graph=True)

x_train is a dictionary with the entries features and adj_matrix. If no adjacency matrix is required, set graph=False.
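As a rough illustration of working with such a dictionary, the sketch below checks one adjacency matrix for symmetry and counts the neighbours of each node. Only the keys features and adj_matrix come from the text above; the helper adjacency_stats, the stand-in data, and the assumption that a single adjacency matrix is a square 2D array are illustrative and may not match the layout of every dataset.

```python
import numpy as np


def adjacency_stats(adj):
    """Return (is_symmetric, degrees) for one square adjacency matrix."""
    adj = np.asarray(adj)
    if adj.ndim != 2 or adj.shape[0] != adj.shape[1]:
        raise ValueError(f"expected a square matrix, got shape {adj.shape}")
    is_symmetric = bool(np.array_equal(adj, adj.T))
    degrees = (adj != 0).sum(axis=1)  # number of neighbours per node
    return is_symmetric, degrees


# Stand-in for the dictionary returned by load_data(..., graph=True).
# The shapes are invented; real datasets may store one matrix per event.
x_demo = {
    "features": np.ones((3, 2)),            # 3 nodes, 2 features each
    "adj_matrix": np.array([[0, 1, 0],
                            [1, 0, 1],
                            [0, 1, 0]]),    # path graph 0-1-2
}

symmetric, degrees = adjacency_stats(x_demo["adj_matrix"])
print(symmetric, degrees.tolist())
```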

Some example plots can be found in the notebooks in the examples folder.

Creating a model:

The models folder contains several model implementations. Each can be imported in the main.py script and run on the specified datasets. If you would like to contribute a model, feel free to implement it using template.py as a starting point.
