Dev pytorch #168

Merged: 296 commits, Mar 21, 2024
55ccaec
Disable all outdated tests, the valuable ones should still be converted
niekdejonge Jan 17, 2024
b6a76ba
move helper functions and add new loss
florian-huber Jan 18, 2024
9fff13d
add test for new loss
florian-huber Jan 18, 2024
0fa83bd
fix function and test
florian-huber Jan 18, 2024
902eadd
update model and tests to include rmse and have custom loss input
florian-huber Jan 18, 2024
45a77df
cleanup
florian-huber Jan 19, 2024
62f77ca
add documentation
florian-huber Jan 19, 2024
c82ebf8
update model code, add relu to dense layer in sequential way
florian-huber Jan 19, 2024
aa4822d
update smart binning layer
florian-huber Jan 22, 2024
eb3e1a6
Store vectorization_settings in model. Including MetadataFeatureGener…
niekdejonge Jan 22, 2024
d645c12
Integrate pytorch in MS2Deepscore class
niekdejonge Jan 22, 2024
73b2eb8
prospector
niekdejonge Jan 22, 2024
4f8b79e
isort
niekdejonge Jan 22, 2024
f2b7a3c
isort
niekdejonge Jan 22, 2024
6611032
Merge branch 'dev_pytorch' into integrate_pytorch
niekdejonge Jan 22, 2024
83bd631
Fix issue after merge
niekdejonge Jan 22, 2024
9dbf5eb
add risk aware mse loss function
florian-huber Jan 22, 2024
66c0c5b
Linting after merge
niekdejonge Jan 22, 2024
f6a9002
update train function and tests
florian-huber Jan 22, 2024
50e5950
fix parameter name
florian-huber Jan 22, 2024
fa54f4d
Merge branch 'dev_pytorch' into integrate_pytorch
florian-huber Jan 22, 2024
e70216c
Merge pull request #169 from matchms/integrate_pytorch
florian-huber Jan 22, 2024
74f3b8c
clean up loss functions
florian-huber Jan 22, 2024
95ea1eb
fix and linting
florian-huber Jan 22, 2024
66275e0
renaming, refactoring, removing old parts
florian-huber Jan 22, 2024
5b5f852
larger restructuring of the classes for default settings
florian-huber Jan 22, 2024
66351e2
switch to use of generator vs. model settings
florian-huber Jan 22, 2024
1d24029
fix random seed handling and linting
florian-huber Jan 22, 2024
645c53d
fix random number rng
florian-huber Jan 22, 2024
f9c465b
add validation data generator handling
florian-huber Jan 22, 2024
b0ae960
expand test
florian-huber Jan 22, 2024
b3ac9e8
linting
florian-huber Jan 22, 2024
fd2d150
Remove some old tests
niekdejonge Jan 23, 2024
9fca54b
Added test loading and saving val generator
niekdejonge Jan 23, 2024
bf6ea7c
Remove redundant tests
niekdejonge Jan 23, 2024
6e80197
Remove BinnedSpectrum.py
niekdejonge Jan 23, 2024
b6ad903
Remove typing and add todo to montecarlodropout
niekdejonge Jan 23, 2024
f817b1e
Isort and linting
niekdejonge Jan 23, 2024
0b82ab9
Merge pull request #173 from matchms/issue172
niekdejonge Jan 23, 2024
c377458
Remove create_peak_dict, since not used anymore
niekdejonge Jan 23, 2024
637ddd3
Move data_generators.py to train_models folder
niekdejonge Jan 23, 2024
a29bd98
Move tensorizationSettings to MS2DeepscoreSettings
niekdejonge Jan 23, 2024
c536c94
Create tensorize spectra file
niekdejonge Jan 23, 2024
bf34570
Merge pull request #174 from matchms/remove_redundant_functions
niekdejonge Jan 23, 2024
93b2be7
Save version in model
niekdejonge Jan 23, 2024
50cef2b
Check version when loading model
niekdejonge Jan 23, 2024
5b53afb
move embedding computation function
florian-huber Jan 23, 2024
de3178b
rename benchmarking module and switch to more general loss computatio…
florian-huber Jan 23, 2024
773426a
linting
florian-huber Jan 23, 2024
7c54ae5
Small bug in checking equal shape
niekdejonge Jan 24, 2024
0de8230
Add commas to make tuples
niekdejonge Jan 24, 2024
edbb486
Implement ValidationLossCalculator
niekdejonge Jan 24, 2024
a72e7d4
Move loss functions together
niekdejonge Jan 24, 2024
af2a0ea
Use LOSS_FUNCTIONS in bin_dependent_losses
niekdejonge Jan 24, 2024
d0e92eb
Change assert to raise
niekdejonge Jan 24, 2024
53cb837
Integrate val_loss_calculator into train
niekdejonge Jan 24, 2024
a2400ce
Move ValidationLossCalculator to loss functions
niekdejonge Jan 24, 2024
2ff5db3
Added test model
niekdejonge Jan 24, 2024
1fe7d3c
Move ValidationLossCalculator to separate file and fix multiple bugs
niekdejonge Jan 24, 2024
5b6cee9
Add test_validation_loss_calculator
niekdejonge Jan 24, 2024
92782c2
Implement in test_siamese_spectra_model.py
niekdejonge Jan 24, 2024
9ff7092
Integrate in train_ms2deepscore.py
niekdejonge Jan 24, 2024
83c5144
Fix test_train_ms2deepscore.py
niekdejonge Jan 24, 2024
2a6223d
Fix bug with switching predictions and true values
niekdejonge Jan 24, 2024
31e1140
Use same_prob_bins from generator settings for plotting as well, to a…
niekdejonge Jan 24, 2024
d272eb5
prospector
niekdejonge Jan 24, 2024
ab7568b
isort
niekdejonge Jan 24, 2024
0b518f8
isort tests
niekdejonge Jan 24, 2024
b852a6c
Remove circular import
niekdejonge Jan 24, 2024
4e7c7b2
Remove comparisons against itself in ValidationLossCalculator.py
niekdejonge Jan 24, 2024
2a97607
Merge pull request #179 from matchms/finalize_issue177
florian-huber Jan 25, 2024
9f4dab6
add documentation
florian-huber Jan 25, 2024
a4e9f69
use smaller models to speed up testing
florian-huber Jan 25, 2024
7e634dd
Merge pull request #178 from matchms/issue177
florian-huber Jan 25, 2024
acf498e
add first changes to log
florian-huber Jan 25, 2024
42da1ad
Merge branch 'dev_pytorch' into save_version
florian-huber Jan 25, 2024
7893ac4
linting
florian-huber Jan 25, 2024
809a46c
Change list to tuple
niekdejonge Jan 25, 2024
e94d4c0
Add version to testmodel
niekdejonge Jan 25, 2024
ed2821f
Merge pull request #176 from matchms/save_version
niekdejonge Jan 25, 2024
50d2113
start working on MC version
florian-huber Jan 25, 2024
125bccf
fix train run for GPU
florian-huber Jan 25, 2024
1f6cc57
Merge branch 'dev_pytorch' of https://github.com/matchms/ms2deepscore…
florian-huber Jan 25, 2024
0bedf36
fix wrong default
florian-huber Jan 25, 2024
5928634
Merge pull request #186 from matchms/fix_issues
florian-huber Jan 25, 2024
a7b79ea
add some imports to __init__ (while avoiding circular imports)
florian-huber Jan 25, 2024
699e102
switch to 0 to 1 inclusive binning
florian-huber Jan 25, 2024
e937111
remove bins outside range 0 to 1
florian-huber Jan 25, 2024
0fb9a1c
remove bins outside range 0 to 1
florian-huber Jan 25, 2024
f8b527a
linting
florian-huber Jan 25, 2024
4eb8c5e
Remove outdated val data generator functions
niekdejonge Jan 25, 2024
67fe646
Update SettingsMS2Deepscore to integrate in train_ms2deepscore.py
niekdejonge Jan 25, 2024
807da75
add tests
florian-huber Jan 25, 2024
41e2d74
Use DataGeneratorSettings directly in DataGenerator
niekdejonge Jan 25, 2024
25015fa
fix number of digits
florian-huber Jan 25, 2024
e277ac3
Merge pull request #187 from matchms/fix_issues
niekdejonge Jan 25, 2024
fac584a
Use modelSettings directly instead of passing on parameters
niekdejonge Jan 26, 2024
a8d025f
Remove model_parameters in SiameseSpectralModel and instead just stor…
niekdejonge Jan 26, 2024
0117c31
Move create model directory name from settings to training_wrapper_fu…
niekdejonge Jan 26, 2024
5045d57
Merge tensorization settings with SettingsMS2Deepscore
niekdejonge Jan 26, 2024
5f2fb1c
Merge GeneratorSettings with SettingsMS2Deepscore
niekdejonge Jan 26, 2024
4590a16
Updated testmodel.py
niekdejonge Jan 26, 2024
4ec7319
isort
niekdejonge Jan 26, 2024
33f80d4
Merge branch 'dev_pytorch' into integrate_settings
niekdejonge Jan 26, 2024
c64df42
Fix issue merge conflict
niekdejonge Jan 26, 2024
fb0f9af
linting
niekdejonge Jan 26, 2024
cd73342
add option to have weights during training
florian-huber Jan 26, 2024
ab540b3
move weighting into loss functions
florian-huber Jan 29, 2024
be2ff47
add sigmoid option
florian-huber Jan 29, 2024
40415b0
remove hard coded model.eval()
florian-huber Jan 29, 2024
3ef8671
change model_settings to settings
niekdejonge Jan 29, 2024
7699585
Merge pull request #189 from matchms/integrate_settings
niekdejonge Jan 29, 2024
255d6ba
switch to Tanh for embedding layer
florian-huber Jan 29, 2024
de9f6f8
add weighting to train function
florian-huber Jan 29, 2024
307e6e2
Merge branch 'dev_pytorch' into test_loss_functions
niekdejonge Jan 29, 2024
2848b97
Merge pull request #188 from matchms/test_loss_functions
niekdejonge Jan 29, 2024
c2c2d08
Use random seed for validation loss calculator
niekdejonge Jan 29, 2024
388353d
Merge branch 'dev_pytorch' into issue175
florian-huber Jan 29, 2024
de7f793
update monte-carlo score variant
florian-huber Jan 29, 2024
02ddeda
add tests for Monte Carlo score
florian-huber Jan 29, 2024
5b5d560
switch to low-high range instead of std or iqr
florian-huber Jan 29, 2024
a4408ad
linting
florian-huber Jan 29, 2024
2072756
add intensity sum functions
florian-huber Jan 30, 2024
95e0457
Update CHANGELOG.md
florian-huber Jan 30, 2024
e642ed4
fix import
florian-huber Jan 30, 2024
5e4cbf5
fix small augmentation bug and adjust defaults
florian-huber Jan 30, 2024
5b13a14
push my current default settings
florian-huber Jan 30, 2024
13cf5a9
add fingerprint_type and nbits to functions
florian-huber Jan 31, 2024
dfefa40
Merge pull request #191 from matchms/issue175
florian-huber Jan 31, 2024
dec8fb4
fix
florian-huber Jan 31, 2024
79835da
consistently add fingerprint_type and nbits to settings and use in fu…
florian-huber Jan 31, 2024
958a956
linting
florian-huber Jan 31, 2024
4fe10ea
Improved error message
niekdejonge Jan 31, 2024
8079ab0
Specify predictions and true values parameters to reduce risk of intr…
niekdejonge Jan 31, 2024
e756843
Improve printed message during tanimoto score calculation
niekdejonge Jan 31, 2024
644ee6f
Adjust the creation of the model dir name to the new way of storing d…
niekdejonge Jan 31, 2024
b903957
Turn metadata harmonization on again when loading spectra, otherwise …
niekdejonge Jan 31, 2024
3a24403
Fix suggestions
niekdejonge Jan 31, 2024
ebda509
Update ms2deepscore/SettingsMS2Deepscore.py
florian-huber Jan 31, 2024
28e1d83
Merge pull request #195 from matchms/small_changes_florian_pr
florian-huber Jan 31, 2024
e30ee56
Merge pull request #192 from matchms/data_and_cnn_exploration
florian-huber Jan 31, 2024
de6169a
replace asserts by raise
florian-huber Jan 31, 2024
bf9ac7b
Merge pull request #194 from matchms/small_fixes
niekdejonge Jan 31, 2024
a980c05
fix tests
florian-huber Jan 31, 2024
c882586
fix tests
florian-huber Jan 31, 2024
7a5d080
fix tests
florian-huber Jan 31, 2024
fe69a0a
try to fix weird error
florian-huber Jan 31, 2024
4a2b781
Merge pull request #197 from matchms/issue196
florian-huber Feb 1, 2024
6814a1a
switch to numpy random generator
florian-huber Feb 1, 2024
b748937
add ridgeline plot
florian-huber Feb 1, 2024
7381f0d
plot edits
florian-huber Feb 1, 2024
78e83c2
add min and max_resolution
florian-huber Feb 1, 2024
968c3ac
fixes
florian-huber Feb 1, 2024
bf17a65
adjust test to new random seed handling
florian-huber Feb 2, 2024
5e6fdac
linting
florian-huber Feb 2, 2024
e2474cb
Added max height as option and doubled it to have better looking plots
niekdejonge Feb 2, 2024
1debe9e
linting
florian-huber Feb 2, 2024
7b20665
Merge pull request #198 from matchms/cleanup_wrappers
niekdejonge Feb 5, 2024
b798ee7
Use recall to determine bin sizes for reversed plot
niekdejonge Feb 5, 2024
78d0d2c
Update data_generators.py
florian-huber Feb 8, 2024
bc44eb9
move function
florian-huber Feb 13, 2024
fee53ae
move function
florian-huber Feb 13, 2024
ab85e2f
add first embedding evaluator
florian-huber Feb 13, 2024
a8ae725
fix
florian-huber Feb 13, 2024
5ba4407
move function
florian-huber Feb 14, 2024
7ead788
add new parameters to settings
florian-huber Feb 14, 2024
391662e
switch to InceptionTime model
florian-huber Feb 14, 2024
e23b2a0
add data generator for evaluator training
florian-huber Feb 14, 2024
3146334
fix
florian-huber Feb 14, 2024
55b51ae
add missing imports
florian-huber Feb 14, 2024
29e8419
add missing imports
florian-huber Feb 14, 2024
050df06
linting & fixes
florian-huber Feb 14, 2024
65fa41e
add tests
florian-huber Feb 14, 2024
c888173
add tests
florian-huber Feb 14, 2024
68aeb9b
minor edits
florian-huber Feb 14, 2024
d514d3e
cosmetic changes
florian-huber Feb 15, 2024
126bad0
add linear model
florian-huber Feb 15, 2024
e975e9d
add tests
florian-huber Feb 15, 2024
97812f0
small updates
florian-huber Feb 15, 2024
f90f27e
add MS2DeepScore variant
florian-huber Feb 15, 2024
312b315
add scikit learn
florian-huber Feb 15, 2024
a4eb440
fix test
florian-huber Feb 15, 2024
438ce1e
add tests and linting
florian-huber Feb 15, 2024
825c734
linting
florian-huber Feb 15, 2024
607e364
update test
florian-huber Feb 15, 2024
27b62b3
fixes
florian-huber Feb 16, 2024
7caa61e
edits and fixes
florian-huber Feb 16, 2024
ffc9540
add tests
florian-huber Feb 16, 2024
f3825d3
linting
florian-huber Feb 16, 2024
408e8f0
fix
florian-huber Feb 16, 2024
a002622
Fix bug when loading models due to not setting device
niekdejonge Feb 16, 2024
a83b6b7
Fix bug with sampling spectra multiple times (sampled one too many ti…
niekdejonge Feb 16, 2024
17db430
linting and more tests
florian-huber Feb 16, 2024
8809e1a
Update CHANGELOG.md
florian-huber Feb 16, 2024
6af0193
add documentation
florian-huber Feb 19, 2024
d97b7c9
add type hints
florian-huber Feb 19, 2024
a7407ec
expand documentation and use InceptionTime class as base class
florian-huber Feb 19, 2024
74efa9a
linting
florian-huber Feb 20, 2024
9475d5b
linting
florian-huber Feb 20, 2024
dfd1ce3
linting
florian-huber Feb 20, 2024
44a3a21
speed up model training tests
florian-huber Feb 20, 2024
91e9808
Merge pull request #201 from matchms/add_2nd_model
florian-huber Feb 20, 2024
bfc518e
Add jupyter notebooks
niekdejonge Feb 26, 2024
b2b09d7
isort
niekdejonge Feb 27, 2024
74ddaa1
linting
niekdejonge Feb 27, 2024
0e0abb9
Merge pull request #202 from matchms/small_bug_fixes
niekdejonge Feb 27, 2024
2f7807d
Separate LinearEmbeddingEvaluation from EmbeddingEvaluatorModel.py
niekdejonge Feb 27, 2024
23d8641
Add train_evaluator to EmbeddingEvaluatorModel.py as method
niekdejonge Feb 27, 2024
154b4cb
Fix import issues
niekdejonge Feb 27, 2024
9194ed0
Remove unused code
niekdejonge Feb 27, 2024
3d4f6bb
Add test train model
niekdejonge Feb 27, 2024
75fa5d7
isort
niekdejonge Feb 27, 2024
d03e70f
linting
niekdejonge Feb 27, 2024
6e17b14
Move inception modules down
niekdejonge Feb 28, 2024
9d0d288
Make compute_embedding_evaluations a method of the class
niekdejonge Feb 28, 2024
d02f55a
Fix test
niekdejonge Feb 28, 2024
f93aad6
Merge TimeInception class into EmbeddingEvaluatorModel
niekdejonge Feb 28, 2024
569a4c0
Split SettingsMS2Deepscore and SettingsEvaluationModel
niekdejonge Feb 28, 2024
8184b98
isort
niekdejonge Feb 28, 2024
cd9a0c9
Use fingerprint settings from the MS2Deepscore model (these should al…
niekdejonge Feb 28, 2024
68eb945
Add training settings to SettingsEmbeddingEvaluator
niekdejonge Feb 28, 2024
dd514e8
Remove redundant in line comment
niekdejonge Feb 28, 2024
ab53792
Create Data_generators in train evaluator, to remove risk of using wr…
niekdejonge Feb 28, 2024
8bd6887
isort
niekdejonge Feb 28, 2024
256edc4
linting
niekdejonge Feb 28, 2024
acce811
Fix mismatch between settings and model_settings
niekdejonge Feb 28, 2024
7c7b4e4
Merge pull request #206 from matchms/integrate_train_in_class
florian-huber Feb 29, 2024
0f705e6
Update ms2deepscore/SettingsMS2Deepscore.py
florian-huber Feb 29, 2024
413aa49
Fix bug with loading model on cpu while trained on gpu
niekdejonge Mar 1, 2024
247a53c
Remove random seed
niekdejonge Mar 1, 2024
6cd0f03
Remove outdated todo
niekdejonge Mar 1, 2024
47b2103
Merge pull request #207 from matchms/split_settings
florian-huber Mar 6, 2024
80b919e
add notebook embedding evaluator
niekdejonge Mar 6, 2024
110e161
Remove outdated todo
niekdejonge Mar 12, 2024
ab54038
Update settings to new default settings
niekdejonge Mar 12, 2024
2754c77
Fix bugs in tests
niekdejonge Mar 12, 2024
7071a18
Fix type testing bug additional metadata
niekdejonge Mar 12, 2024
1513b2a
Merge pull request #183 from matchms/issue182
florian-huber Mar 13, 2024
a303cdd
Merge pull request #210 from matchms/type_check_settings
florian-huber Mar 13, 2024
4dd468c
Add check for known loss function
niekdejonge Mar 13, 2024
b4907d0
Add align structures
niekdejonge Mar 13, 2024
66abfe7
Change python version
niekdejonge Mar 13, 2024
67e8a05
Add new notebooks
niekdejonge Mar 13, 2024
ec7733e
Only different python version
niekdejonge Mar 13, 2024
8f21b64
Added saving linear model
niekdejonge Mar 13, 2024
eefc905
Only line separator changes or different python version
niekdejonge Mar 13, 2024
c868105
Aligned figures positive vs negative examples
niekdejonge Mar 13, 2024
d07ee88
Merge pull request #211 from matchms/type_check_settings
niekdejonge Mar 13, 2024
818b4fa
Merge pull request #212 from matchms/update_notebooks
niekdejonge Mar 13, 2024
dc407e8
Merge branch 'main' into dev_pytorch
niekdejonge Mar 13, 2024
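Several commits above (ValidationLossCalculator, bin_dependent_losses, removing comparisons against itself) move validation to an all-vs-all comparison whose loss is computed per true-score bin. The stdlib sketch below illustrates that idea under stated assumptions: the function name `binned_mse`, the bin edges, and the choice of MSE are hypothetical and are not the ms2deepscore API. Averaging per bin keeps the many low-similarity pairs from dominating the overall number.

```python
from typing import List, Sequence, Tuple


def binned_mse(predictions: Sequence[Sequence[float]],
               true_scores: Sequence[Sequence[float]],
               bins: Sequence[Tuple[float, float]]) -> Tuple[List[float], float]:
    """Hypothetical helper: MSE per true-score bin over an all-vs-all matrix.

    The diagonal (each spectrum compared with itself) is skipped.
    Bins are (low, high) pairs; the last bin also includes its upper edge,
    so a true score of exactly 1.0 is counted.
    """
    per_bin = []
    for i, (low, high) in enumerate(bins):
        is_last = i == len(bins) - 1
        squared_errors = []
        for row, (pred_row, true_row) in enumerate(zip(predictions, true_scores)):
            for col, (pred, true) in enumerate(zip(pred_row, true_row)):
                if row == col:
                    continue  # skip self-comparisons
                if low <= true < high or (is_last and true == high):
                    squared_errors.append((pred - true) ** 2)
        # Empty bins are reported as NaN instead of being silently dropped
        per_bin.append(sum(squared_errors) / len(squared_errors)
                       if squared_errors else float("nan"))
    valid = [loss for loss in per_bin if loss == loss]  # NaN != NaN
    average = sum(valid) / len(valid) if valid else float("nan")
    return per_bin, average


true = [[1.0, 0.2, 0.8], [0.2, 1.0, 0.4], [0.8, 0.4, 1.0]]
pred = [[1.0, 0.3, 0.6], [0.3, 1.0, 0.4], [0.6, 0.4, 1.0]]
per_bin, average = binned_mse(pred, true, [(0.0, 0.5), (0.5, 1.0)])
```

Here the plain MSE over all six off-diagonal pairs would be about 0.0167, while the per-bin average is 0.0225, because the two high-score pairs now carry as much weight as the four low-score pairs.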
37 changes: 1 addition & 36 deletions .github/workflows/CI_build.yml
@@ -44,7 +44,7 @@ jobs:
       fail-fast: false
       matrix:
         os: ['ubuntu-latest', 'macos-latest', 'windows-latest']
-        python-version: ['3.8', '3.9', '3.10']
+        python-version: ['3.9', '3.10', '3.11']
         exclude:
           # already tested in first_check job
           - python-version: 3.9
@@ -69,38 +69,3 @@ jobs:
       - name: Run tests
         run: |
           pytest
-
-  tensorflow_check:
-    name: Tensorflow version check / python-3.8 / ubuntu-latest
-    runs-on: ubuntu-latest
-    steps:
-      - uses: actions/checkout@v2
-      - name: Set up Python
-        uses: actions/setup-python@v1
-        with:
-          python-version: 3.8
-      - name: Python info
-        run: |
-          which python
-          python --version
-      - name: Install Tensorflow version 2.6
-        run: |
-          python -m pip install --upgrade pip
-          pip install "tensorflow>=2.6,<2.7"
-      - name: Install other dependencies
-        run: |
-          pip install -e .[dev,train]
-      - name: Show pip list
-        run: |
-          pip list
-      - name: Run test with tensorflow version 2.6
-        run: pytest
-      - name: Install Tensorflow version 2.8
-        run: |
-          pip install --upgrade "numpy<1.24.0"
-          pip install --upgrade "tensorflow>=2.8,<2.9"
-      - name: Show pip list
-        run: |
-          pip list
-      - name: Run test with tensorflow version 2.8
-        run: pytest
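The build matrix above crosses three operating systems with three Python versions and then drops combinations through `exclude`. In GitHub Actions an exclude entry removes every combination that matches all the keys it names, so a partial entry can remove several jobs. The sketch below reproduces that expansion in Python. The exclude entry's remaining keys are collapsed out of this diff, so pairing Python 3.9 with `ubuntu-latest` is an assumption (consistent with the comment that it is already covered by `first_check`), and `expand_matrix` is a hypothetical helper.

```python
from itertools import product
from typing import Dict, List


def expand_matrix(axes: Dict[str, List[str]],
                  exclude: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Cross all matrix axes, then drop combinations matching an exclude entry.

    An exclude entry matches when every key it names has the same value in
    the combination; keys it does not name are ignored (partial match).
    """
    keys = list(axes)
    combos = [dict(zip(keys, values)) for values in product(*axes.values())]
    return [combo for combo in combos
            if not any(all(combo.get(k) == v for k, v in rule.items())
                       for rule in exclude)]


axes = {"os": ["ubuntu-latest", "macos-latest", "windows-latest"],
        "python-version": ["3.9", "3.10", "3.11"]}
# Assumed exclude entry: only its python-version key is visible in the diff.
exclude = [{"python-version": "3.9", "os": "ubuntu-latest"}]
jobs = expand_matrix(axes, exclude)
```

With this assumed exclude entry the matrix yields 8 jobs; an entry naming only `python-version` would instead drop Python 3.9 on all three operating systems.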
1 change: 1 addition & 0 deletions .gitignore
@@ -18,6 +18,7 @@ xunit-result.xml
 
 docs/_build
 docs/apidocs
+prototyping/
 
 # ide
 .idea
20 changes: 20 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,26 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [2.0.0] - date...
+Large-scale expansion, revision, and restructuring of MS2Deepscore.
+
+### Added
+- Models are now built using PyTorch.
+- Models have built-in GPU support (using PyTorch).
+- New `EmbeddingEvaluatorModel` (Inception Time CNN)
+- New `LinearModel` for absolute error estimates
+- New `MS2DeepScoreEvaluated` matchms-style score --> gives "score" and "predicted_absolute_error"
+- Additional smart binning layer that can handle input of much higher peak resolution (not used as a default!)
+- New validation concept --> all-vs-all scores for the validation spectra are computed, but loss is then computed per score bin. This gives better and more significant statistics of the model performance
+- New loss functions "Risk Aware MAE" and "Risk Aware MSE" which function similarly to MAE or MSE but try to counteract the tendency of a model to predict towards 0.5.
+- Losses can now be weighted with a weighting_factor.
+
+
+### Changed
+- No longer supports Tensorflow/Keras
+- The concept of Spectrum binning has changed and is now implemented differently (i.e. no more "missing peaks" as before)
+- Monte-Carlo Dropout now returns a score (mean or median) together with percentile-based upper and lower bounds (instead of STD or IQR before).
+
 ## [Unreleased]
 
 ## [1.0.0] - 2024-03-12
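The changelog describes "Risk Aware" MAE/MSE losses that counteract the pull towards 0.5, plus optional loss weighting. One way to obtain that behaviour, sketched here as an assumption rather than ms2deepscore's actual formula, is a pinball (quantile) loss whose quantile level equals the target score: for a high target, under-predicting costs more than over-predicting, and the reverse holds for a low target. The function name and the `weights` argument (a stand-in for the changelog's weighting factor) are hypothetical.

```python
from typing import Optional, Sequence


def risk_aware_mae_sketch(predictions: Sequence[float],
                          targets: Sequence[float],
                          weights: Optional[Sequence[float]] = None) -> float:
    """Pinball-style loss with the quantile level set to the target (0..1).

    error = target - prediction
    loss  = max(target * error, (target - 1) * error)
    Under-predicting is scaled by `target`, over-predicting by `1 - target`,
    so errors towards 0.5 are the expensive ones.
    """
    if weights is None:
        weights = [1.0] * len(targets)
    total = 0.0
    for pred, target, weight in zip(predictions, targets, weights):
        error = target - pred
        total += weight * max(target * error, (target - 1.0) * error)
    # Weighted mean over all pairs
    return total / sum(weights)
```

For a target of 0.9, predicting 0.8 costs 0.09 while predicting 1.0 costs only 0.01; a plain MAE would charge 0.1 for both.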
10 changes: 5 additions & 5 deletions README.md
@@ -116,11 +116,13 @@ In that scenario, `scores["score"]` contains the similarity scores (median of th
 Training your own model is only recommended if you have some familiarity with machine learning.
 To train your own model you can run the code below.
 Please first ensure cleaning your spectra. We recommend using the cleaning pipeline in [matchms](https://github.com/matchms/matchms).
+
 ```python
-from ms2deepscore.train_new_model.SettingMS2Deepscore import \
+from ms2deepscore.SettingsMS2Deepscore import
     SettingsMS2Deepscore
-from ms2deepscore.wrapper_functions.training_wrapper_functions import \
+from ms2deepscore.wrapper_functions.training_wrapper_functions import
     train_ms2deepscore_wrapper
+
 settings = SettingsMS2Deepscore({"epochs": 300,
                                  "base_dims": (1000, 1000, 1000),
                                  "embedding_dim": 500,
@@ -129,9 +131,7 @@ settings = SettingsMS2Deepscore({"epochs": 300,
                                  "learning_rate": 0.00025,
                                  "patience": 30,
                                  })
-train_ms2deepscore_wrapper(spectra_file_path=,
-                           settings=settings,
-                           validation_split_fraction=20)
+train_ms2deepscore_wrapper(spectra_file_path=, model_settings=, validation_split_fraction=20)
 ```
 ## Contributing
 We welcome contributions to the development of ms2deepscore! Have a look at the [contribution guidelines](https://github.com/matchms/ms2deepscore/blob/main/CONTRIBUTING.md).
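The README configures training by passing a dictionary of overrides to `SettingsMS2Deepscore`, and later commits add type checks, a check for a known loss function, and replace `assert` with `raise`. The sketch below shows that general pattern with a dataclass. The class name, fields, defaults, and allowed loss names are all hypothetical; this is not the real `SettingsMS2Deepscore`.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Tuple

KNOWN_LOSSES = ("mse", "rmse", "risk_mse")  # hypothetical loss names


@dataclass
class TrainingSettingsSketch:
    """Hypothetical settings object: defaults that a dict can override."""
    epochs: int = 100
    base_dims: Tuple[int, ...] = (500, 500)
    embedding_dim: int = 200
    learning_rate: float = 0.001
    patience: int = 10
    loss_function: str = "mse"

    @classmethod
    def from_dict(cls, overrides: Dict[str, Any]) -> "TrainingSettingsSketch":
        allowed = {f.name for f in fields(cls)}
        for key in overrides:
            if key not in allowed:
                # Catch typos such as "epoch" instead of silently ignoring them
                raise ValueError(f"Unknown setting: {key!r}")
        settings = cls(**overrides)
        settings.validate()
        return settings

    def validate(self) -> None:
        # Raise instead of assert, so the checks survive `python -O`
        if not isinstance(self.epochs, int) or self.epochs <= 0:
            raise TypeError("epochs must be a positive int")
        if not isinstance(self.base_dims, tuple):
            raise TypeError("base_dims must be a tuple")
        if self.loss_function not in KNOWN_LOSSES:
            raise ValueError(f"Unknown loss function: {self.loss_function!r}")
```

Usage mirrors the README call, for example `TrainingSettingsSketch.from_dict({"epochs": 300, "base_dims": (1000, 1000, 1000)})`, with every unspecified field keeping its default.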
48 changes: 0 additions & 48 deletions ms2deepscore/BinnedSpectrum.py

This file was deleted.
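`BinnedSpectrum.py` belonged to the old binning approach, and the changelog states that binning is now implemented differently, with no more "missing peaks". One way to get that property, shown here as an assumption, is to place every peak on a fixed m/z grid defined up front, so any peak inside the range has a bin regardless of which bins occurred in the training data. The function and its parameters (`min_mz`, `max_mz`, `bin_width`, `intensity_power`) are hypothetical, as is the square-root intensity scaling.

```python
import math
from typing import List, Sequence


def bin_spectrum(mz: Sequence[float], intensities: Sequence[float],
                 min_mz: float = 10.0, max_mz: float = 1000.0,
                 bin_width: float = 0.1,
                 intensity_power: float = 0.5) -> List[float]:
    """Hypothetical fixed-grid binning of one spectrum into a dense vector.

    Peaks outside [min_mz, max_mz) are dropped; peaks sharing a bin keep
    the maximum scaled intensity.
    """
    n_bins = math.ceil((max_mz - min_mz) / bin_width)
    vector = [0.0] * n_bins
    for peak_mz, intensity in zip(mz, intensities):
        if not min_mz <= peak_mz < max_mz:
            continue
        index = int((peak_mz - min_mz) / bin_width)
        index = min(index, n_bins - 1)  # guard against float rounding at the edge
        vector[index] = max(vector[index], intensity ** intensity_power)
    return vector
```

Because the grid depends only on the parameters, two spectra binned with the same settings always produce vectors of the same length with aligned positions.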

73 changes: 16 additions & 57 deletions ms2deepscore/MS2DeepScore.py
@@ -2,8 +2,8 @@
 import numpy as np
 from matchms import Spectrum
 from matchms.similarity.BaseSimilarity import BaseSimilarity
-from tqdm import tqdm
-from .typing import BinnedSpectrumType
+from ms2deepscore.models.SiameseSpectralModel import (SiameseSpectralModel,
+                                                      compute_embedding_array)
 from .vector_operations import cosine_similarity, cosine_similarity_matrix
 
 
@@ -29,7 +29,7 @@ class MS2DeepScore(BaseSimilarity):
         queries = load_from_json("xyz.json")
 
         # Load pretrained model
-        model = load_model("model_file_123.hdf5")
+        model = load_model("model_file_123.pt")
 
         similarity_measure = MS2DeepScore(model)
         # Calculate scores and get matchms.Scores object
@@ -38,43 +38,25 @@ class MS2DeepScore(BaseSimilarity):
 
     """
 
-    def __init__(self, model, progress_bar: bool = True):
+    def __init__(self, model: SiameseSpectralModel, progress_bar: bool = True):
         """
 
         Parameters
         ----------
         model:
             Expected input is a SiameseModel that has been trained on
-            the desired set of spectra. The model contains the keras deep neural
-            network (model.model) as well as the used spectrum binner (model.spectrum_binner).
+            the desired set of spectra.
         progress_bar:
             Set to True to monitor the embedding creating with a progress bar.
             Default is False.
         """
         self.model = model
-        self.multi_inputs = (model.nr_of_additional_inputs > 0)
-        if self.multi_inputs:
-            self.input_vector_dim = [self.model.base.input_shape[0][1], self.model.base.input_shape[1][1]]
-        else:
-            self.input_vector_dim = self.model.base.input_shape[1]
-        self.output_vector_dim = self.model.base.output_shape[1]
+        self.model.eval()
+        self.output_vector_dim = self.model.model_settings.embedding_dim
         self.progress_bar = progress_bar
 
-    def _create_input_vector(self, binned_spectrum: BinnedSpectrumType):
-        """Creates input vector for model.base based on binned peaks and intensities"""
-        if self.multi_inputs:
-            X = [np.zeros((1, i[1])) for i in self.model.base.input_shape]
-            idx = np.array([int(x) for x in binned_spectrum.binned_peaks.keys()])
-            values = np.array(list(binned_spectrum.binned_peaks.values()))
-
-            X[0][0, idx] = values
-            X[1] = np.array([[float(value) for key, value in binned_spectrum.metadata.items() if (key != "inchikey")]])
-        else:
-            X = np.zeros((1, self.input_vector_dim))
-            idx = np.array([int(x) for x in binned_spectrum.binned_peaks.keys()])
-            values = np.array(list(binned_spectrum.binned_peaks.values()))
-            X[0, idx] = values
-        return X
+    def get_embedding_array(self, spectrums):
+        return compute_embedding_array(self.model, spectrums)
 
     def pair(self, reference: Spectrum, query: Spectrum) -> float:
         """Calculate the MS2DeepScore similaritiy between a reference and a query spectrum.
@@ -91,12 +73,9 @@ def pair(self, reference: Spectrum, query: Spectrum) -> float:
         ms2ds_similarity
             MS2DeepScore similarity score.
         """
-        binned_reference = self.model.spectrum_binner.transform([reference])[0]
-        binned_query = self.model.spectrum_binner.transform([query])[0]
-        reference_vector = self.model.base.predict(self._create_input_vector(binned_reference))
-        query_vector = self.model.base.predict(self._create_input_vector(binned_query))
-
-        return cosine_similarity(reference_vector[0, :], query_vector[0, :])
+        embedding_reference = self.get_embedding_array([reference])
+        embedding_query = self.get_embedding_array([query])
+        return cosine_similarity(embedding_reference[0, :], embedding_query[0, :])
 
     def matrix(self, references: List[Spectrum], queries: List[Spectrum],
                array_type: str = "numpy",
@@ -122,33 +101,13 @@ def matrix(self, references: List[Spectrum], queries: List[Spectrum],
         ms2ds_similarity
             Array of MS2DeepScore similarity scores.
         """
-        reference_vectors = self.calculate_vectors(references)
+        embeddings_reference = self.get_embedding_array(references)
         if is_symmetric:
             assert np.all(references == queries), \
                 "Expected references to be equal to queries for is_symmetric=True"
-            query_vectors = reference_vectors
+            embeddings_query = embeddings_reference
         else:
-            query_vectors = self.calculate_vectors(queries)
+            embeddings_query = self.get_embedding_array(queries)
 
-        ms2ds_similarity = cosine_similarity_matrix(reference_vectors, query_vectors)
+        ms2ds_similarity = cosine_similarity_matrix(embeddings_reference, embeddings_query)
         return ms2ds_similarity
-
-    def calculate_vectors(self, spectrum_list: List[Spectrum]) -> np.ndarray:
-        """Returns a list of vectors for all spectra
-
-        parameters
-        ----------
-        spectrum_list:
-            List of spectra for which the vector should be calculated
-        """
-        n_rows = len(spectrum_list)
-        reference_vectors = np.empty(
-            (n_rows, self.output_vector_dim), dtype="float")
-        binned_spectrums = self.model.spectrum_binner.transform(spectrum_list, progress_bar=self.progress_bar)
-        for index_reference, reference in enumerate(
-                tqdm(binned_spectrums,
-                     desc='Calculating vectors of reference spectrums',
-                     disable=(not self.progress_bar))):
-            reference_vectors[index_reference, 0:self.output_vector_dim] = \
-                self.model.base.predict(self._create_input_vector(reference), verbose=0)
-        return reference_vectors
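After this change, `pair` and `matrix` both turn spectra into embeddings via `compute_embedding_array` and compare those embeddings with cosine similarity; with `is_symmetric=True` the reference embeddings are reused for the queries instead of being recomputed. The stdlib sketch below mirrors that flow with plain lists. The `embed` callable is a hypothetical stand-in for the trained model, and these helpers are illustrations, not the package's own `cosine_similarity` and `cosine_similarity_matrix`, which work on array inputs.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def similarity_matrix(references: Sequence[object], queries: Sequence[object],
                      embed: Callable[[object], Vector],
                      is_symmetric: bool = False) -> List[List[float]]:
    """Embed each spectrum once, then compare all reference/query pairs."""
    ref_embeddings = [embed(s) for s in references]
    if is_symmetric:
        if list(references) != list(queries):
            raise ValueError("Expected references to equal queries for is_symmetric=True")
        query_embeddings = ref_embeddings  # reuse instead of recomputing
    else:
        query_embeddings = [embed(s) for s in queries]
    return [[cosine_similarity(r, q) for q in query_embeddings]
            for r in ref_embeddings]
```

Embedding once and reusing the result is what makes the all-vs-all case cheap: the expensive model call scales with the number of spectra, and only the inexpensive cosine step scales with the number of pairs.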