bcftools csq - GFF Format #1078
GFFs provided by Ensembl use this convention.
That I know, but almost no other annotation (tool) produces such a format. Plus, putting the prefixes there is redundant with the feature column of the GFF format specification. The feature column can probably be considered more stable in its definition than a prefix in the attribute ID field. I can provide a patch if this is considered a better way of identifying GFF_TSCRIPT_LINE and GFF_GENE_LINE.
I am open to this being changed as long as it continues working with Ensembl files. A more general (and also easier) solution might be to provide a new script
The latter might be a short-term fix, but one has to remove
Testing for a GFF_GENE_LINE would need to be TRUE if the third field of an (Ensembl) GFF contains:
Testing for a GFF_TSCRIPT_LINE would need to be TRUE if the third field of an (Ensembl) GFF contains:
Best to stay in sync with SequenceOntology (which Ensembl promotes): Gene --> http://www.sequenceontology.org/browser/current_release/term/SO:0000704
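To make the proposal above concrete, here is a minimal Python sketch of classifying a GFF line by its third column (the feature type) instead of by an ID prefix. It is not how bcftools csq is implemented; the function name and the two type sets are hypothetical, and the transcript types in particular are an illustrative assumption, since a real list would have to be taken from Sequence Ontology.

```python
# Hypothetical type sets, for illustration only.
# "gene" corresponds to SO:0000704; the transcript types are an assumption.
GENE_TYPES = {"gene"}
TRANSCRIPT_TYPES = {"mRNA", "transcript"}


def classify_gff_line(line):
    """Return 'gene', 'transcript', or None, using only GFF column 3."""
    if line.startswith("#"):
        return None  # comment or directive line
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 9:
        return None  # not a complete 9-column GFF record
    feature_type = fields[2]
    if feature_type in GENE_TYPES:
        return "gene"
    if feature_type in TRANSCRIPT_TYPES:
        return "transcript"
    return None
```

With this approach the ID attribute never has to be inspected to decide what kind of record a line is, so files without the "gene:" and "transcript:" prefixes would be classified the same way as Ensembl files.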
Hi, would be nice if the prefix ("transcript:", "gene:", etc) were optional.
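A rough Python sketch of what "optional prefix" could mean in practice: parse the attribute column and drop a "gene:" or "transcript:" prefix only when it is present. The helper names and the IDs in the usage example are made up for illustration; this is not code from bcftools.

```python
# Prefixes named in this thread; treating them as optional is the assumption here.
KNOWN_PREFIXES = ("gene:", "transcript:")


def parse_attributes(column9):
    """Split a GFF3 attribute column (key=value;key=value) into a dict."""
    attrs = {}
    for item in column9.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def strip_id_prefix(value):
    """Return the ID without a known prefix; unprefixed IDs pass through unchanged."""
    for prefix in KNOWN_PREFIXES:
        if value.startswith(prefix):
            return value[len(prefix):]
    return value


# Usage with made-up IDs: prefixed and unprefixed input give the same result.
ensembl_style = parse_attributes("ID=transcript:tx1;Parent=gene:g1")
plain_style = parse_attributes("ID=tx1;Parent=g1")
print(strip_id_prefix(ensembl_style["ID"]), strip_id_prefix(plain_style["ID"]))
```

Because the prefix is removed only if found, Ensembl files would keep working while files from other annotation tools would be accepted as well.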
There are too many possible variations a GFF can have, I don't want to burden
Petr, thanks for the reply. I'll look into making a PR for the GFF,
do you have a recommendation for this? There are no phase annotations in the GFF.
@brentp Hi, any updates on this? I'm experiencing the same issue. Best,
Phase is the 8th column of GFF: https://www.ncbi.nlm.nih.gov/datasets/docs/reference-docs/file-formats/about-ncbi-gff3.
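For reference, a small Python sketch of reading the phase from column 8 of a tab-separated GFF3 line. The function name is hypothetical. In GFF3 the phase is 0, 1, or 2, and "." marks a missing value, which is the case described earlier in the thread as "no phase annotations".

```python
def parse_phase(line):
    """Return the phase (0, 1, or 2) from GFF column 8, or None if it is '.'."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError("expected at least 8 tab-separated columns")
    phase = fields[7]  # column 8, zero-based index 7
    if phase == ".":
        return None  # phase not annotated
    if phase in ("0", "1", "2"):
        return int(phase)
    raise ValueError("invalid phase value: " + phase)
```

A caller could use the None result to detect CDS records that lack phase information before passing the file on.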
Hi,
could someone explain to me why the feature type for a line in a GFF is not taken from the third GFF field, and why bcftools csq instead expects each gene and transcript ID to carry a prefix (e.g., "gene:" or "transcript:")? This inflates GFFs with redundant information and introduces IDs that are longer than they have to be. I guess there is a rationale that I just don't get at the moment.
Thx,
Felix