Update MartsAdapter #69

Merged

christopher-mohr
Collaborator

@christopher-mohr christopher-mohr commented Aug 16, 2022

This PR rewrites parts of the MartsAdapter and extends it with new functionality to make it more flexible to use.
The main changes are the following:

  • Use requests instead of urllib
  • Use ElementTree to create XML queries
  • Add methods, e.g. for listing available Ensembl archives and datasets
  • Use pandas to fetch results
  • Check attribute names against the Ensembl server to make result retrieval more robust
  • Extend tests to cover the most recent Ensembl Mart and the stable GRCh37 release
  • Rename methods whose names were misleading
  • Remove methods get_variant_ids and get_all_variant_ids
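The ElementTree, pandas and attribute-check items above can be sketched roughly as follows. This is a minimal illustration, not the MartsAdapter code from this PR: it assumes the standard BioMart XML query layout (`Query` > `Dataset` > `Filter`/`Attribute`) with header-less TSV output, and the helpers `build_query`, `check_attributes` and `parse_tsv`, as well as the dataset, attribute and gene IDs used with them, are hypothetical examples.

```python
import io
import xml.etree.ElementTree as ET

import pandas as pd

# Prolog commonly used for BioMart XML queries (assumed here)
XML_PROLOG = '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>'


def build_query(dataset, attributes, filters=None):
    """Build a BioMart XML query string with ElementTree (hypothetical helper)."""
    query = ET.Element(
        "Query",
        virtualSchemaName="default",
        formatter="TSV",
        header="0",
        uniqueRows="1",
        datasetConfigVersion="0.6",
    )
    dataset_el = ET.SubElement(query, "Dataset", name=dataset, interface="default")
    # Filters restrict the returned rows, e.g. {"ensembl_gene_id": "ENSG00000141510"}
    for name, value in (filters or {}).items():
        ET.SubElement(dataset_el, "Filter", name=name, value=value)
    # Attributes select the output columns, in the given order
    for name in attributes:
        ET.SubElement(dataset_el, "Attribute", name=name)
    # ElementTree escapes special characters, so the query stays well-formed XML
    return XML_PROLOG + ET.tostring(query, encoding="unicode")


def check_attributes(requested, available):
    """Validate requested attribute names against a list fetched once from the server."""
    known = set(available)
    missing = [name for name in requested if name not in known]
    if missing:
        raise ValueError("Unknown BioMart attributes: " + ", ".join(missing))
    return list(requested)


def parse_tsv(text, attributes):
    """Parse a header-less TSV BioMart response into a DataFrame of strings."""
    if not text.strip():
        # Treat an empty response as "no matching rows"
        return pd.DataFrame(columns=attributes)
    return pd.read_csv(io.StringIO(text), sep="\t", header=None, names=attributes, dtype=str)
```

In this sketch the list of available attributes would be fetched once per dataset and reused for every check, rather than querying the server per attribute.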

Resolves #57.

@christopher-mohr christopher-mohr added this to the 3.2 milestone Aug 16, 2022
@christopher-mohr
Collaborator Author

Not sure why requests to BioMart currently fail occasionally; I did not experience that before.

Query ERROR: caught BioMart::Exception: non-BioMart die(): 
not well-formed (invalid token) at line 2, column 441, byte 480 at /nfs/public/ro/ensweb-software/sharedsw/2022_01_17_ct7/linuxbrew/Cellar/perl/5.34.0/lib/perl5/site_perl/5.34.0/x86_64-linux-thread-multi/XML/Parser.pm line 187.
XML::Simple called at /nfs/public/ro/ensweb/live/mart/www_107/biomart-perl/lib/BioMart/Query.pm line 1935.

I will run it again tomorrow. Maybe it's a problem on their side.
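Because the failure surfaces as a `Query ERROR` message like the one in the log above, a client can inspect the response body before trying to parse it as data. A minimal sketch, assuming the error text arrives in the response body rather than only through the HTTP status code; `BioMartQueryError` and `raise_for_biomart_error` are hypothetical names, not part of epytope.

```python
class BioMartQueryError(RuntimeError):
    """Raised when BioMart reports a query error in the response body (hypothetical)."""


def raise_for_biomart_error(text):
    """Raise if a BioMart response body holds an error message instead of data."""
    # Errors such as "Query ERROR: caught BioMart::Exception: ..." appear as plain text
    first_line = text.lstrip().split("\n", 1)[0]
    if first_line.startswith("Query ERROR"):
        raise BioMartQueryError(first_line)
    # Otherwise pass the body through unchanged for parsing
    return text
```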

@christopher-mohr
Collaborator Author

christopher-mohr commented Aug 25, 2022

I added a retry strategy for the GET requests to BioMart. I still can't tell why they fail from time to time; the server returns an internal server error (500). I guess there is not much we can do besides this workaround. It might occur when too many requests are sent in quick succession.
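A retry strategy of this kind can be set up with `requests` by mounting an `HTTPAdapter` that carries a urllib3 `Retry` policy. The sketch below is an illustration, not the code merged in this PR; the retry count, back-off factor, status codes, the endpoint URL and the helper names `make_session` and `run_query` are all assumptions.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Assumed endpoint; the default URL used by the MartsAdapter may differ
BIOMART_URL = "https://www.ensembl.org/biomart/martservice"


def make_session(total=5, backoff_factor=1.0):
    """Create a requests session that retries GET requests on transient server errors."""
    retry = Retry(
        total=total,
        backoff_factor=backoff_factor,  # exponential back-off between attempts
        status_forcelist=[500, 502, 503, 504],  # retry on these HTTP status codes
        allowed_methods=["GET"],  # urllib3 >= 1.26; older versions use method_whitelist
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session


def run_query(session, xml_query, url=BIOMART_URL, timeout=60):
    """Send an XML query to the BioMart service via GET and return the response text."""
    response = session.get(url, params={"query": xml_query}, timeout=timeout)
    # Non-retried error statuses (e.g. 4xx) raise here; exhausted retries on the
    # status_forcelist raise requests.exceptions.RetryError inside session.get
    response.raise_for_status()
    return response.text
```

Spacing attempts out with a back-off also reduces the number of requests sent in quick succession, which matches the suspected trigger described above.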

epytope/IO/MartsAdapter.py (outdated, resolved)
epytope/IO/MartsAdapter.py (outdated, resolved)
epytope/IO/MartsAdapter.py (resolved)
christopher-mohr and others added 2 commits September 9, 2022 08:57
Co-authored-by: Gisela Gabernet <gisela.gabernet@gmail.com>
Collaborator

@ggabernet ggabernet left a comment

Looks good to me!

@christopher-mohr christopher-mohr merged commit c1da169 into KohlbacherLab:develop Sep 9, 2022
@christopher-mohr christopher-mohr deleted the extend_marts_adapter branch September 9, 2022 14:38
This was referenced Nov 2, 2022
christopher-mohr added a commit that referenced this pull request Nov 9, 2022
* Version bump 3.0.0rc2

* Fix master / main branch naming in GH action

* Fix typos in README file

* Add pypi GH action

* Fix PyPI linting errors

* Reduce version to 3.0.0rc1

* Add a changelog

* Install changelog with package

* Change PyPI CD trigger to published release

* Add rank metric #15 (#42)

* Push all changes made on fork

* Set Setuptools version also for external yml

* Fixed erroneous variable names in matrix files

* deleted A_2601_9 matrix for now. Caused trouble

* Add A2601_9 syf matrix for debugging

* Fixed bug in test caused by addition of A*26:01 matrix

* Change solver from cbc to glpk to investigate whether macOS-dependent env problems in GitHub Actions can be solved

* Corrected after review

* Adjust tutorials to new structure

* Change filter_result as discussed

* Adjusted filter method and tutorials according to #12

* Fixed a bug occurring for netMHCfamily tools when peptide input has multiple lengths

* remove logging

* Alter filter_result method as discussed

* Fixed issues #38, #44 and #45 (#46)

* Fixed issues #44 and #45

* Fix #48, include review suggestions

* Improve/update documentation (#50)

* Update CHANGELOG

* Extend README

* Change framework name in code comment

* Remove logging warning

* Change file ending in tutorial

* Add docstrings, minor formatting

* Update CHANGELOG version and setup.py

* Update date

* Fix #52 (#53)

* add check if transcript sequence available from BioMart, cleanup (#58)

* Add interface for netMHCpan 4.1 (#59)

* add interface for netmhcpan 4.1

* remove duplicate alleles from list

* Update supportedAlleles of syfpeithi (#62)

Co-authored-by: Christopher Mohr <christopher.mohr@qbic.uni-tuebingen.de>

* Fix protobuf version for tests, prepare docs for 3.1.0 release (#64)

* Prepare docs for new release

* minor changes/additions docs

* check if fixing protobuf version resolves testing errors

* check if changing github actions workflow resolves testing issue

* allow lower versions of protobuf

* Update epytope/doc/conf.py

Co-authored-by: Gisela Gabernet <gisela.gabernet@gmail.com>

Co-authored-by: Christopher Mohr <christopher.mohr@uni-tuebingen.de>
Co-authored-by: Jonas Scheid <jonas@u-081-c204.eap.uni-tuebingen.de>
Co-authored-by: Gisela Gabernet <gisela.gabernet@gmail.com>

* Add netMHCIIpan 4.1 interface (#66)

* add netmhciipan 4.1 interface

* remove logging

* remove __name method

* update changelog

* Update CHANGELOG.md

Co-authored-by: Christopher Mohr <christopher.mohr@qbic.uni-tuebingen.de>

Co-authored-by: Christopher Mohr <christopher.mohr@uni-tuebingen.de>
Co-authored-by: Christopher Mohr <christopher.mohr@qbic.uni-tuebingen.de>

* minor doc improvements, cleanup setup.py

* Update MartsAdapter (#69)

* Rewrite, extend, cleanup MartsAdapter, adapt tests

* add requests and beautifulsoup4 dependency

* prevent too long requests, avoid server request for each attribute

* add gene to test object

* fix enum ref

* adapt MartsAdapter in other test

* add function for getting gene names, add tests

* change method name, add test

* add lxml as dependency

* workaround for pandas read_xml, remove dependency

* add missing all()

* fix test

* add retry strategy for GET requests

* Update epytope/IO/MartsAdapter.py

Co-authored-by: Gisela Gabernet <gisela.gabernet@gmail.com>

* add default biomart url

Co-authored-by: Gisela Gabernet <gisela.gabernet@gmail.com>

* Outsource supported alleles (#63)

* Draft for outsourcing supported alleles

* Further outsourcing of netmhc alleles

* Finish outsourcing external alleles

* Outsource alleles from pssm and ann predictors

* Correct minor erroneous hla nomenclatures of smmpmbec

* Change allele imports by importing frozensets

* Add __allele_import_name to classes to increase readability

* Refactor: convert_alleles is now classmethod in pssm

* Incorporate feedback

* Update __init__.py

* Update uniprot adapter (#71)

* remove HLAtyping and distance2self tests, update CHANGELOG

* fix reading sequences in uniprot adapter

* add test for uniprot adapter

* remove HLAtyping and distance2self tests, update CHANGELOG (#70)

* Fix netmhcii4.0 parser (#73)

* fix netmhciipan4.0 issue

* update changelog

* Add function for peptides to check if created by variant (#74)

* remove HLAtyping and distance2self tests, update CHANGELOG

* add Peptide function to determine if peptide originates from a variant

* fix peptide call, update CHANGELOG

* Improve function to check peptide origin (#75)

* remove HLAtyping and distance2self tests, update CHANGELOG

* add Peptide function to determine if peptide originates from a variant

* fix peptide call, update CHANGELOG

* improve method for variant-peptide check

* minor CHANGELOG change

* change peptide to self

* update setup.py and CHANGELOG

* Fix erroneous supported alleles (#78)

* Draft for outsourcing supported alleles

* Further outsourcing of netmhc alleles

* Finish outsourcing external alleles

* Outsource alleles from pssm and ann predictors

* Correct minor erroneous hla nomenclatures of smmpmbec

* Change allele imports by importing frozensets

* Add __allele_import_name to classes to increase readability

* Refactor: convert_alleles is now classmethod in pssm

* Incorporate feedback

* Fix parsing error and sort allele list

* Adjust variable naming

Co-authored-by: Leon Kuchenbecker <leon.kuchenbecker@uni-tuebingen.de>
Co-authored-by: Jonas Scheid <43858870+jonasscheid@users.noreply.github.com>
Co-authored-by: Jonas Scheid <jonas@u-081-c204.eap.uni-tuebingen.de>
Co-authored-by: Gisela Gabernet <gisela.gabernet@gmail.com>
3 participants