-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
suggested enhacement: mask w.r.t. to reference #6
Comments
Hi Rob, Thanks for the suggestions, I already implemented You can test it on the v0.3.3c pre release. Frederic |
Thanks so much for implementing this. I tested it an it works exactly as I wanted. I'm a little confused only because that doesn't seem to quite match what you described (probably just me misunderstanding, but I wanted to clarify in case I'm missing something important). Example, using v0.3.3c: test.fa:
output:
This is exactly what I want - the position is masked in the sequence, and nothing else has changed. My only confusion is that you said "all insertions compared o the reference are also masked", which meant I was expecting bases 4-6 to be masked in seq1. Note that this is not I what I wanted - what I want is exactly what it's doing right now!! Just thought I'd highlight my confusion. Thanks again! |
I was not very clear indeed. If I take your example:
If you mask positions [2-3] in reference coordinates (
It also masks insertions/gaps that are in the interval. |
AH, OK, got it. That makes sense. One way to describe this is something like: "If the range of masked sites incorporates gaps in the reference sequence, these gaps will also be masked in the output alignment" Thanks again - this is really awesome. |
Thanks, I modified the documentation and the help with your suggestion. |
Hi,
A suggested enhancement to
mask
, please feel free to ignore!One useful thing for masking sequences is to mask them with respect to positions in an un-gapped reference. E.g. imagine I had:
I may know that I want to mask based on position 6 of the reference sequence before it was aligned. That position ends up being position 9 in this alignment. This can be a headache in large projects because I can't know in advance if the non-reference sequence will have insertions, so it's a pain to figure out by hand where the position I want to mask has ended up in the final alignment.
In this case my desired output would be:
and I might think of a command line something like
goalign mask -ref 'ref1' -s 6 -l 1
The key difference being that if one specifies some flag like
-ref [ref_seq_name]
then the -s and -l flags would correspond to ungapped positions in the named reference sequence, not positions in the alignment.Rob
The text was updated successfully, but these errors were encountered: