-
Notifications
You must be signed in to change notification settings - Fork 667
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
deprecate sequence_alignment() and replace deprecated Bio.pairwise2 with Bio.Align.PairwiseAligner #3951
Conversation
- fix #3950 - engineered return tuple as namedtuple to functionally match the originally returned pairwise2.Alignment; we only documented a tuple as return type anyway - update CHANGELOG
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming the existing tests are sufficient, lgtm! (one missing empty line though :P )
Since biopython 1.80 the output format changed so my hacky way of getting the aligned sequences with gaps is not working anymore, hence the tests fail. |
:/ just 1.80? I.e. do we need to support both return types? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
package/MDAnalysis/analysis/align.py
Outdated
# extract sequences (there's no obvious way to get the character | ||
# representation with gaps by other means from the new | ||
# Bio.Align.PairwiseAlignment instance) | ||
seqA, _, seqB, _ = topalignment.format().split("\n") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This only worked until Bio 1.79. Then they changed the format.
I knew that parsing str was iffy but luckily our tests showed me right away how iffy that was.
I am now trying to find another way to get our old output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Turns out, it's trivial
seqA = topalignment[0]
seqB = topalignment[1]
... testing now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs biopython >= 1.80.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we'll need to offer both options and just do a version check?
@IAlibay please review again. Sorry for the extra work. When I tested it locally it worked because I had biopython 1.79 locally installed. Only when I updated to 1.80 (as in our CI) it broke. The changes should make it more robust because I think I am using the proper biopython API to get the sequence but it requires the most recent one >= 1.80 (which should be available for Python >= 3.8). |
@orbeckst do we have any idea what the minimum dependency versions for Biopython are? I'm apprehensive about having a core dependency that's forced to use the latest release since that can clash with a lot of older deps versions. |
The conda-forge feedstock uses pin_compatible numpy so that might be ok? (that's currently set to nep29-ish) |
(The docs show how one should write code if/when we deprecated this function.)
I tried my code with biopython 1.79 and it failed. For the above to work, it has to be >= 1.80. |
Alternatively, we outright deprecate sequence_alignment() and hope that we remove it before biopython removes pairwise2. |
Looking at biopython's setup.py https://github.com/biopython/biopython/blob/0d763824ab5ab5fd447fe2589b209e7c64a54b50/setup.py#L112 , they only depend on numpy, any version. Their ci-dependencies.txt is also non-specific. |
Codecov ReportBase: 94.38% // Head: 94.38% // Increases project coverage by
Additional details and impacted files@@ Coverage Diff @@
## develop #3951 +/- ##
========================================
Coverage 94.38% 94.38%
========================================
Files 194 194
Lines 25242 25248 +6
Branches 3493 3493
========================================
+ Hits 23825 23831 +6
Misses 1368 1368
Partials 49 49
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report at Codecov. |
Not sure where these failures in window come from mdanalysis-CI (Azure_Tests Win-Python38-32bit-full) fails TestH5MDWriterBaseAPI.test_write_trajectory_universe
I am letting it re-run, maybe that's a transient issue. |
That's a new one, but I've seen similar transient h5py win32 issues lately. It's been a long while since wheels for win32 got released, honestly we should just kill it soon (definitely for v3.0). |
Do you want to deprecate |
If it's likely not being used much, then I'd be happy for this to happen. |
I am pretty sure I added it a long time ago, in the times of "batteries included is better". It would be fitting if I removed it again ;-). I doubt it's used a lot and in any case, it's 3 lines of code. I'll add deprecations and raise an issue for the UG. |
- add deprecation - test for deprecation warning - combined tests for sequence_alignment in a class for easier removal - update CHANGELOG
match_score, mismatch_penalty, gap_penalty, gapextension_penalty) | ||
# choose top alignment | ||
return aln[0] | ||
aligner = Bio.Align.PairwiseAligner( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main argument for pairwisealigner is scoring
do we want/need to pass that in somehow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are deprecating it then no, let's not add features. At this stage I'd just want to avoid the code suddenly breaking because pairwise2 is gone (and getting rid of DeprecationWarning).
Same error on Windows 32bit. Not sure what to do — no idea which version of hdf5 is installed or if it even has anything to do with bumping biopython to 1.80; at least I wouldn't understand how, given the modest requirements of biopython. |
@orbeckst I think I might know where to start looking, do you mind if I push directly to the branch? |
Co-authored-by: Irfan Alibay <IAlibay@users.noreply.github.com>
Please go ahead! (I can see how reviewing this PR motivated you to get PEP8 back in PR #3954 , sorry.) |
Did it... just fix itself? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's just merge this whilst it's green @orbeckst 🤣 (if azure fails elsewhere we'll do a separate PR)
haha, ok, will do |
Fixes #3950
Changes made in this Pull Request:
sequence_alignment()
and schedule for removal in 3.0PR Checklist