Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: fasta aligned action, and multi-sequence aamutation action #195

Merged
merged 7 commits into from
Aug 10, 2023

Conversation

Taepper
Copy link
Collaborator

@Taepper Taepper commented Aug 9, 2023

This PR introduces an action to retrieve fasta sequences. (Either for one or multiple sequences)

And also includes a fix for AAMutation action to work for multiple sequences as it is a similar logic

@Taepper Taepper changed the title feat: fasta aligned action feat: fasta aligned action, and multi-sequence aamutation action Aug 9, 2023
Copy link
Contributor

@fengelniederhammer fengelniederhammer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unsure about this symbol != position.symbol_whose_bitmap_is_flipped condition, otherwise this looks good.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we test this action in a unit test (now that we can have sequences of arbitrary length)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will maybe introduce those next week if it's okay. As we need a database for the query engine to run and don't have any such tests yet, we could maybe introduce a good framework for such tests, which I would do in a separate PR

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that the fastaAligned_multiple.json endToEndTest already tests this function with sequences of differing lengths

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had the different edge and error cases in mind, e.g. the sequence name does not exist. And also testing that the resulting sequence is actually correct, since this is vital. The reconstruction of the sequence is relatively complex.

for (auto position_id = local.begin(); position_id != local.end(); position_id++) {
const NucPosition& position = sequence_store.positions.at(position_id);
for (const NUCLEOTIDE_SYMBOL symbol : NUC_SYMBOLS) {
if (symbol != position.symbol_whose_bitmap_is_flipped &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it possible to not consider positions with flipped bitmaps?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is indeed wrong if the flipped symbol is unequal to the reference sequence! Good catch!

Comment on lines +216 to +217
"AminoAcidMutations action can have the field sequenceName of type string or an array of "
"strings, but no other type"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"AminoAcidMutations action can have the field sequenceName of type string or an array of "
"strings, but no other type"
"The field sequenceName of AminoAcidMutations action must have type string or an array, if present."

Comment on lines 224 to 226
"AminoAcidMutations action can have the field sequenceName of type string or an array "
"of "
"strings, but no other type; while parsing array encountered the element " +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"AminoAcidMutations action can have the field sequenceName of type string or an array "
"of "
"strings, but no other type; while parsing array encountered the element " +
"AminoAcidMutations action sequenceNames must be of type string, but found " +

@fengelniederhammer
Copy link
Contributor

How final is the implementation of the FastaAligned action? I could imagine that handing it out via JSON is not the most efficient way.

@Taepper
Copy link
Collaborator Author

Taepper commented Aug 10, 2023

Not final at all, but I just wanted to have something working, while writing it out to a buffer will still take some time I suppose. Maybe introduce a answer size limit for this action until we implement the compressed and/or streaming download

@fengelniederhammer
Copy link
Contributor

Ok, for some working solution, this is definitely fine, although maybe not the solution we should build upon in LAPIS.

…ap is different from the reference sequence symbol
@Taepper Taepper merged commit 8ddc878 into main Aug 10, 2023
4 of 5 checks passed
@Taepper Taepper deleted the fastaAction branch August 10, 2023 08:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants