agat_sp_extract_sequences.pl: support for multicistronic transcripts. #451

Open
jdcla opened this issue Apr 8, 2024 · 0 comments

Is your feature request related to a problem? Please describe.
Currently, agat_sp_extract_sequences.pl (and possibly other scripts as well) does not support multicistronic transcripts. Although this feature is often not supported by GTF/GFF tools, studies increasingly indicate the existence of translated ORFs positioned upstream of, downstream of, or elsewhere relative to canonical coding sequences.

Describe the solution you'd like
When running agat_sp_extract_sequences.pl, I would like it to be able to handle multiple CDSs defined per transcript/mRNA feature. To start off, the tool would use CDS IDs rather than transcript IDs as FASTA headers (see this issue). Currently, I think the tool either ignores multicistronic CDSs that share a transcript ID or merges them.
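
To illustrate the requested behavior, here is a minimal Python sketch. It is not AGAT code and does not reflect how AGAT works internally. The GFF3 fragment, the genome string, the IDs (`tx1`, `cds_uORF`, `cds_main`) and the function names are all made up for the example. It assumes plus-strand features only, and it relies on the GFF3 convention that segments of one discontinuous CDS share the same ID. CDS segments are grouped by CDS ID instead of by parent transcript, so two CDSs on the same mRNA yield two separate sequences, and the header still records the shared transcript.

```python
from collections import defaultdict

# Hypothetical GFF3 fragment: one mRNA (tx1) carrying two CDSs,
# a short upstream ORF and a two-segment main ORF.
GFF3 = """\
chr1\tsrc\tgene\t1\t30\t.\t+\t.\tID=gene1
chr1\tsrc\tmRNA\t1\t30\t.\t+\t.\tID=tx1;Parent=gene1
chr1\tsrc\tCDS\t3\t8\t.\t+\t0\tID=cds_uORF;Parent=tx1
chr1\tsrc\tCDS\t13\t18\t.\t+\t0\tID=cds_main;Parent=tx1
chr1\tsrc\tCDS\t22\t27\t.\t+\t0\tID=cds_main;Parent=tx1
"""

# Hypothetical 30 bp reference sequence.
GENOME = {"chr1": "CCATGTAACCCCATGAAACCCGGGTAACCC"}


def parse_attributes(field):
    """Turn a GFF3 attribute column ('ID=a;Parent=b') into a dict."""
    return dict(pair.split("=", 1) for pair in field.strip().split(";") if pair)


def cds_sequences(gff3_text, genome):
    """Return {header: sequence}, one entry per CDS ID (plus strand only)."""
    segments = defaultdict(list)
    for line in gff3_text.splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if cols[2] != "CDS":
            continue
        attrs = parse_attributes(cols[8])
        # Group by CDS ID (and parent), not by parent transcript alone.
        key = (attrs["ID"], attrs["Parent"])
        segments[key].append((cols[0], int(cols[3]), int(cols[4])))

    records = {}
    for (cds_id, parent), parts in segments.items():
        parts.sort(key=lambda p: p[1])  # order segments by start coordinate
        # GFF3 coordinates are 1-based and inclusive.
        seq = "".join(genome[chrom][start - 1:end] for chrom, start, end in parts)
        records[f"{cds_id} transcript={parent}"] = seq
    return records


if __name__ == "__main__":
    for header, seq in cds_sequences(GFF3, GENOME).items():
        print(f">{header}\n{seq}")
```

Grouping by transcript ID alone would instead concatenate all three segments into a single sequence, which is the merging behavior suspected above.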

Describe alternatives you've considered
Today, it's possible to define a unique mRNA feature for each CDS, similar to the solution described here. This is a hacky workaround, and it fails to show that the CDSs belong to the same transcript.
