add SevenNet to Matbench Discovery leaderboard #112
Conversation
Thanks for the submission! It looks like a great addition, and likely the open-source, open-data SOTA, assuming @janosh replicates your metrics when we look at updating the static figures. I have a few questions about how SevenNet-0 relates to NequIP for the purposes of MBD. Does the proposed algorithm for calculating energies and forces in a spatial-decomposition setup make no difference to the predictions of energies and forces? If so, are there no additional training hyperparameters to be included, and is the only difference this self-interaction embedding?
@YutackPark, could you check the hyperparameters listed before we merge? I took the values from the paper, but it appears the values used for this snapshot might be different: https://github.com/MDIL-SNU/SevenNet/blob/1c97a881bf52f86f985a030028f1044d93a549f4/pretrained_potentials/SevenNet_0__11July2024/pre_train.yaml. We're running some more analysis, so it may be a few more days before we merge, but this looks like a great contribution based on the preliminaries!
Thanks, @CompRhys! It looks like a lot of work has been done while I was sleeping. Here are my answers to your questions:
Yes, that is correct. However, for this benchmark, we did not use our parallel algorithm. Our proposed algorithm is beneficial when the target system has a sufficient number of atoms (typically more than 1000 atoms), which is not the case for this benchmark.
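The claim above, that spatial decomposition leaves energies and forces unchanged, can be illustrated with a toy 1D example. This is only a sketch with a hypothetical pair potential, not SevenNet's actual parallel algorithm: each domain owns its atoms, sees ghost copies of atoms near the boundary, and cross-domain pairs are counted exactly once, so the partitioned sum reproduces the serial sum term by term.

```python
import itertools

def pair_energy(r, cutoff=2.0):
    """Toy short-range pair potential with a hard cutoff (not SevenNet's)."""
    return (1.0 / r**12 - 1.0 / r**6) if r < cutoff else 0.0

def total_energy(xs, cutoff=2.0):
    """Serial reference: sum over all unique atom pairs."""
    return sum(pair_energy(abs(a - b), cutoff)
               for a, b in itertools.combinations(xs, 2))

def decomposed_energy(xs, split, cutoff=2.0):
    """Two spatial domains split at `split`; boundary ("ghost") pairs
    are evaluated once, so the result matches the serial sum exactly."""
    left = [x for x in xs if x < split]
    right = [x for x in xs if x >= split]
    e_left = sum(pair_energy(abs(a - b), cutoff)
                 for a, b in itertools.combinations(left, 2))
    e_right = sum(pair_energy(abs(a - b), cutoff)
                  for a, b in itertools.combinations(right, 2))
    # cross-domain pairs; the cutoff inside pair_energy zeroes distant ones
    e_cross = sum(pair_energy(abs(a - b), cutoff)
                  for a in left for b in right)
    return e_left + e_right + e_cross

xs = [0.0, 0.9, 1.7, 3.1, 4.0, 4.8]
print(total_energy(xs) - decomposed_energy(xs, split=2.5))  # exactly 0.0
```

The same bookkeeping argument is why the parallel scheme only pays off on large systems: the ghost-exchange overhead is fixed per boundary, so it is amortized only when each domain holds many atoms.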
Yes. You can refer to the third paragraph of Section 5.1 in our JCTC paper. This is the only notable change compared to NequIP. If you set
I apologize for the confusion. This is because there are two SevenNet-0 potentials. The first one, SevenNet-0 (22May2024), is the one introduced in the paper. The second one, SevenNet-0 (11July2024), is trained on the MPTrj dataset and differs from the one in our paper; it is the potential we are submitting. We had to modify the scheduler for use with the MPTrj dataset. The pre_train.yaml is what we prepared for reproducibility, as it is not introduced in the paper. I will update the sevennet.yml accordingly. Thank you for your efforts. If you have any further questions or need additional assistance, please let me know. P.S. If there are no concerns, may I link the official journal article in the
Thanks for helping to update the metadata in the
Our strong preference is for the paper field to link to a place where the manuscript or a preprint is open access so anyone can read it; we use the doi field to point to any journal publication, see
re MACE: I believe the readme is out of date and that the snapshot used for these predictions was trained with 2023-12-03-mace-128-L1.sh, which corresponds to the
Thank you for clarifying. I have updated
I agree; there’s no issue with copying
If that’s the case, I’m happy to link to arXiv instead of the journal. I will update the arXiv version to match the peer-reviewed one, as it is currently outdated.
If there's a CLI, a shell script plus the YAML would be great. Don't worry about a Python script or refactoring anything just for the purposes of merging your SevenNet results into the benchmark table. If in the future you make tweaks/refactor and train an improved model, we can always update the training metadata then to capture the best practices.
@YutackPark are you planning to make a PyPI release for the |
rename write_no_bad.py to filter_bad_preds.py
thanks @YutackPark for this excellent addition! 👍
congrats on having trained such a strong model
Dear Developers,
I hope this message finds you well. Thank you for your efforts in maintaining the leaderboard. I am writing to submit a pull request adding our model, SevenNet, to the leaderboard. I have carefully followed your instructions for submitting new models; if anything is missing or unclear, please let me know.
Our test code is very similar to test_mace.py. We found that the relaxation optimizer choices and hyperparameters used by MACE are also suitable for our model.

List of files

- 2024-07-11-sevennet-preds.csv.gz: Result of our model without outlier screening.
- 2024-07-11-sevennet-preds-no-bad.csv.gz: The main result we wish to submit to the leaderboard.
- 2024-07-11-sevennet-preds-bad.csv.gz: Screened outliers, using the same criteria as in join_mace_results.py. We have three missing predictions according to this screening.
- sevennet.yml: Metadata about our model.
- test_sevennet.py: Main benchmark code, similar to test_mace.py.
- join_results.py: Script for joining results into single .csv and .json files, similar to join_mace_results.py.
- write_no_bad.py: Script for screening outliers from model predictions, copied from join_mace_results.py.
Our code, training procedures, and model are all freely available via the SevenNet repository.
Note on our model
Our paper primarily discusses a parallel algorithm for GNN interatomic potentials, including the UIP model SevenNet-0 (22May2024) trained on the MPF dataset. However, the model we are submitting is not the same as the one introduced in the paper. We trained a new model, SevenNet-0 (11July2024), using the MPTrj dataset with the same model hyperparameters and architecture. SevenNet-0 (11July2024) is the model used to produce the results we are submitting.
Detailed training configurations to reproduce the potential SevenNet-0 (11July2024) are available in our repository.
Below are our internal results for the metrics on the leaderboard. Please inform us of any inconsistencies.
Thank you once more for your efforts in developing and maintaining this leaderboard.
Best regards,
Yutack Park
Materials Data & Informatics Laboratory
Seoul National University