Support multiple epitope prediction scores per tool #15

lkuchenb · 2021-01-03T17:00:58Z

Most tools report multiple scores for each prediction. Currently the framework interfaces are affinity-centric, which is actually discouraged by most tool developers.

The epitope prediction method interface should support multiple scores and all scores produced by each method should be extracted from their outputs.

b-schubert · 2021-02-17T15:43:09Z

Agreed! As discussed, we should probably rethink how to store results and perhaps use something similar to anndata (https://anndata.readthedocs.io/en/latest/)?

christopher-mohr · 2021-04-16T09:15:35Z

> agreed! As discussed, we should probably rethink how to store results etc and perhaps use something similar to anndata (https://anndata.readthedocs.io/en/latest/) to store results?

Hi @b-schubert, did you use the anndata package before? Do you think it would be a good fit for this feature? I think especially the rank-based prediction would be a really important thing to add (also based on feature requests I get).

lkuchenb · 2021-04-16T10:23:37Z

If we want to stick to the current pandas dataframe layout as the return value for predictions, we could also implement multiple scores with a multi-level index on the columns, for example:

Allele→      A*01:01-----------------
Method→      NetMHC4------- SYFPEITHI
ScoreType→   PercRank- Aff- SyfScore-
↓Peptide

SYFPEITHI    0.001     123     0.123
EITHYSYFP    0.9       12312   0.904
ITYPEISYF    ...       ...     ...
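A minimal pandas sketch of the layout in the table above. The level names `Allele`, `Method` and `ScoreType` and the variable names are taken from the table or chosen for illustration; this is not the framework's actual implementation.

```python
import pandas as pd

# Three column levels: allele, prediction method, score type.
columns = pd.MultiIndex.from_tuples(
    [
        ("A*01:01", "NetMHC4", "PercRank"),
        ("A*01:01", "NetMHC4", "Aff"),
        ("A*01:01", "SYFPEITHI", "SyfScore"),
    ],
    names=["Allele", "Method", "ScoreType"],
)

# Values copied from the first two rows of the table; peptides form the row index.
result = pd.DataFrame(
    [[0.001, 123.0, 0.123], [0.9, 12312.0, 0.904]],
    index=pd.Index(["SYFPEITHI", "EITHYSYFP"], name="Peptide"),
    columns=columns,
)

# Select one score type across all alleles and methods.
perc_rank = result.xs("PercRank", axis=1, level="ScoreType")

# Select every score reported by one method for one allele.
netmhc_scores = result["A*01:01"]["NetMHC4"]
```

With this layout, affinity-based and rank-based scores of the same tool sit side by side and can be selected by level name.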

b-schubert · 2021-04-16T10:47:56Z

@christopher-mohr

> Hi @b-schubert, did you use the anndata package before? Do you think it would be a good fit for this feature? I think especially the rank-based prediction would be a really important thing to add (also based on feature requests I get).

anndata is essentially a matrix format with additional pandas data frames annotating the rows and columns. I guess it would be quite similar to an additional multi-index dimension. But it would perhaps give us a better way to store the peptide and HLA objects (instead of using them as index and column keys).

Opinions?

christopher-mohr · 2021-04-19T07:35:00Z

Hard to say without any detailed knowledge about anndata. Currently, I would also vote for sticking to pandas as @lkuchenb suggested.

jonasscheid · 2021-05-20T15:30:46Z

Hey! I would like to help implement this functionality :)

If the EpitopePredictionResult class in Result.py should be changed from

Peptide Obj	Method Name	Allele1 Obj	Allele2 Obj
Peptide1	Method 1	0.324	0.56
	Method 2	20	15
Peptide2	Method 1	0.50	0.36
	Method 2	26	10

to something like

Allele	A*01:01
Method	NetMHC4		SYFPEITHI
ScoreType	PercRank	Aff	SyfScore
Peptide
SYFPEITHI	0.001	123	0.123
EITHYSYFP	0.9	12312	0.904
ITYPEISYF	...	...	...

using a multi-index on the columns, I would need to change the functions filter_result and merge_results in the EpitopePredictionResult class, right?
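To make the impact on filtering concrete, here is a rough sketch of how a filter could work on multi-indexed columns. `filter_by_score` is a hypothetical helper, not the existing `filter_result` method, and the `B*07:02` values are made up for illustration.

```python
import operator

import pandas as pd


def filter_by_score(df, method, score_type, comparator, threshold):
    """Keep peptides for which at least one allele passes the threshold.

    Hypothetical helper; assumes the columns are a MultiIndex with the
    levels (Allele, Method, ScoreType).
    """
    # Boolean mask over columns: requested method and score type, any allele.
    wanted = (df.columns.get_level_values("Method") == method) & (
        df.columns.get_level_values("ScoreType") == score_type
    )
    selected = df.loc[:, wanted]
    # Compare element-wise, then keep rows where any allele satisfies the condition.
    keep = comparator(selected, threshold).any(axis=1)
    return df[keep]


columns = pd.MultiIndex.from_tuples(
    [
        ("A*01:01", "NetMHC4", "PercRank"),
        ("A*01:01", "NetMHC4", "Aff"),
        ("B*07:02", "NetMHC4", "PercRank"),
        ("B*07:02", "NetMHC4", "Aff"),
    ],
    names=["Allele", "Method", "ScoreType"],
)
df = pd.DataFrame(
    [[0.001, 123.0, 5.0, 9000.0], [0.9, 12312.0, 3.0, 20000.0]],
    index=pd.Index(["SYFPEITHI", "EITHYSYFP"], name="Peptide"),
    columns=columns,
)

# Keep peptides with a percentile rank of at most 0.5 for any allele.
binders = filter_by_score(df, "NetMHC4", "PercRank", operator.le, 0.5)
```

The caller has to name the score type explicitly, since a threshold that makes sense for a rank does not make sense for an affinity.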

Additionally, parse_external_result, which is called in the predict function of External.py, needs to be changed so that it returns a dictionary with separated metrics. I would suggest changing the output of that function (for all classes) from

defaultdict(dict, {'HLA-Allele1': {'Pep1': -999, 'Pep2': 500}})

to

defaultdict(dict, {'HLA-Allele1': {'Rank': {'Pep1': 15.0, 'Pep2': 16.0}, 'Affinity': {'Pep1': -999, 'Pep2': 500}}, 'HLA-Allele2': {...}})

Also the current multi-indexing in the predict function needs to be changed then.
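As an illustration of how such a nested dictionary could be turned into the multi-indexed frame, consider the sketch below. `nested_scores_to_frame` and the method name `"netmhc"` are assumptions for the example and do not reflect the actual code in External.py.

```python
from collections import defaultdict

import pandas as pd


def nested_scores_to_frame(scores, method):
    """Convert {allele: {score_type: {peptide: value}}} into a DataFrame.

    Rows are peptides; columns are (Allele, Method, ScoreType) tuples.
    Hypothetical helper for illustration only.
    """
    data = {}
    for allele, by_score_type in scores.items():
        for score_type, by_peptide in by_score_type.items():
            data[(allele, method, score_type)] = pd.Series(by_peptide)
    frame = pd.DataFrame(data)
    # Name the column levels explicitly.
    frame.columns = pd.MultiIndex.from_tuples(
        list(frame.columns), names=["Allele", "Method", "ScoreType"]
    )
    frame.index.name = "Peptide"
    return frame


# Parser output in the proposed nested layout.
scores = defaultdict(
    dict,
    {
        "HLA-Allele1": {
            "Rank": {"Pep1": 15.0, "Pep2": 16.0},
            "Affinity": {"Pep1": -999, "Pep2": 500},
        }
    },
)
frame = nested_scores_to_frame(scores, method="netmhc")
```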

What are your opinions on that? Would that be sufficient? Did I miss anything? Please let me know!

Best,
Jonas

christopher-mohr · 2021-05-26T09:02:53Z

Hi @jonasscheid, thanks for starting to work on this! In my opinion, this is already a good starting point for carrying the discussion forward. I think it's also important to check what impact the changes to EpitopePredictionResult would have.

What do you think @lkuchenb and @b-schubert? Should we proceed with this?

b-schubert · 2021-06-08T09:19:32Z

It will also affect all downstream methods (e.g., vaccine design frameworks).

b-schubert · 2021-06-30T13:44:53Z

Multiindexing in all results classes
Introduce an Enum for Rank and Score index
Extend OptiTope and EpitopeAssembly to specify which score type to use and in which direction to optimize; implement an internal routine that transforms scores accordingly if necessary (so that the optimization problems do not change)
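One possible shape for the Enum and the score transformation is sketched below. All names (`ScoreType`, `Direction`, `to_maximization`) are hypothetical, and the sketch assumes the downstream optimization problems maximize their objective, so scores where lower is better (such as ranks) are negated.

```python
from enum import Enum


class ScoreType(Enum):
    """Kinds of values a predictor can report (hypothetical names)."""

    SCORE = "Score"
    RANK = "Rank"


class Direction(Enum):
    """Whether larger or smaller values indicate a better prediction."""

    MAXIMIZE = 1
    MINIMIZE = -1


def to_maximization(values, direction):
    """Orient scores so that larger is always better.

    Values that should be minimized are negated; values that should be
    maximized are returned unchanged.
    """
    return [direction.value * v for v in values]


# Ranks are better when smaller, so they are flipped before optimization.
oriented = to_maximization([0.5, 2.0], Direction.MINIMIZE)
```

Keeping the transformation in one internal routine means the optimization models themselves would not need to know which score type was selected.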

jonasscheid · 2021-07-09T11:21:50Z

> Multiindexing in all results classes

Hey!
I went through the CleavageAndTAPPrediction tutorial and the data structures in Results.py and realized that there might actually be no need to add more multi-index layers to the remaining result classes. For example, I can run Chapter 3 of the tutorial (Consensus prediction for natural ligand prediction) with only the changes in EpitopePredictionResult and obtain meaningful results.

I also have no good idea for how to add another layer of multi-indices to the remaining result classes. Maybe you have one. It would be nice to get some feedback!

Thanks in advance!

christopher-mohr · 2021-07-23T07:18:53Z

@b-schubert, @lkuchenb what's your opinion on this? I guess one reason for changing it would be to have a consistent structure of result objects.

@jonasscheid I think in the case of `CleavageSitePrediction`, the "problem" is that you don't have any allele dependency, right?

jonasscheid · 2021-07-23T08:52:09Z

Exactly, @christopher-mohr. This is also the case for CleavageFragmentPredictionResult and TAPPredictionResult. An allele level would not make sense biologically either.

Additionally, for CleavageSitePredictionResult, CleavageFragmentPredictionResult and TAPPredictionResult (all remaining classes in Results.py) there is, as far as I could tell, no predictor that outputs multiple scores. So there is no need for multi-indices in that regard either.

b-schubert · 2021-07-28T07:05:36Z

Those prediction tools indeed do not offer multiple scores, but I was thinking of future-proofing them given that we could also calculate rank-based scores for them at some point.
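A rough sketch of what calculating a rank-based score could look like for a tool that only reports a raw score. It assumes that higher raw scores are better and that a set of background scores (for example from scoring random peptides) is available; `percentile_rank` is a hypothetical helper and not part of the framework.

```python
import numpy as np


def percentile_rank(score, background_scores):
    """Percentage of background scores that are at least as high as `score`.

    Assumes higher scores are better, so a low percentile rank marks a
    strong prediction. Hypothetical helper for illustration.
    """
    background = np.asarray(background_scores, dtype=float)
    # Fraction of the background that scores at least as well, as a percentage.
    return 100.0 * np.mean(background >= score)


# One of four background scores is at least 0.35.
rank = percentile_rank(0.35, [0.1, 0.2, 0.3, 0.4])
```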

* Push all changes made on fork
* Set Setuptools version also for external yml
* Fixed erroneous variable names in matrix files
* deleted A_2601_9 matrix for now. Caused troubles
* Add A2601_9 syf matrix for debugging
* Fixed bug in test caused by addition of A*26:01 matrix
* Change solver from cbc to glpk to investigate if macOS dependant env problems in github actions can be solved
* Corrected after review
* Adjust tutorials to new structure
* Change filter_result as discussed
* Adjusted filter method and tutorials according to #12
* Fixed a bug occuring for netMHCfamily tools when peptide input has multiple lengths
* remove logging
* Alter filter_result method as discussed

christopher-mohr · 2021-12-22T11:12:07Z

Solved by #42.


lkuchenb added the enhancement label Jan 3, 2021

lkuchenb added this to the 3.1 milestone Jan 3, 2021

jonasscheid mentioned this issue Sep 7, 2021

Rank support of multiple prediction tools #15 #41

Closed

jonasscheid mentioned this issue Oct 26, 2021

Add rank metric #15 #42

Merged

christopher-mohr modified the milestones: 3.1, 3.0 Dec 22, 2021

christopher-mohr closed this as completed Dec 22, 2021
