Error: at least one feature in the annotation file doesn't have a biotype description. ALFA won't be able to work robustly. #7

Open
zztin opened this issue Jan 8, 2021 · 2 comments

zztin commented Jan 8, 2021

Hi,

Thank you for the package! I am keen to use ALFA for functional annotation of cell-free DNA. As a first step, I need to provide an annotation file (GTF format with biotypes) for my reference genome (hg38). However, the GTF files downloaded from NCBI and from UCSC (details below) both trigger this error. Could you suggest where I can get a GTF in the correct format (and which tracks you recommend) for this purpose?

The details of the two sources I tried:

  1. NCBI:
    https://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.39/
    (top-right button: Download Assembly --> RefSeq --> File Type: Genomic GTF)
  2. UCSC: http://genome.ucsc.edu/cgi-bin/hgTables
    [Screenshot: Screen Shot 2021-01-08 at 18 25 36]

Both of these return:
Error: at least one feature in the annotation file doesn't have a biotype description. ALFA won't be able to work robustly.
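
To narrow down which records cause this error, one option is to count the features that lack a biotype attribute. The following is a minimal sketch, assuming ALFA reads the biotype from a `gene_biotype` key in column 9 of the GTF (which is what the follow-up comment in this thread suggests). The function name and the sample lines are illustrative and not part of ALFA.

```python
import re
from collections import Counter

# Assumption: ALFA expects this attribute key in column 9 of the GTF.
BIOTYPE_KEY = "gene_biotype"


def count_missing_biotype(lines, key=BIOTYPE_KEY):
    """Return (number of feature lines, Counter of feature types lacking `key`)."""
    # The key must start an attribute: at the beginning of column 9 or after ';'.
    pattern = re.compile(r'(?:^|;)\s*' + re.escape(key) + r'\s+"')
    total = 0
    missing = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a well-formed GTF record
        total += 1
        if not pattern.search(fields[8]):
            missing[fields[2]] += 1  # column 3 holds the feature type
    return total, missing


if __name__ == "__main__":
    # Made-up records, for illustration only.
    sample = [
        "#!genome-build example\n",
        'chr1\tsrc\tgene\t1\t100\t.\t+\t.\tgene_id "G1"; gene_biotype "lncRNA";\n',
        'chr1\tsrc\tgene\t200\t300\t.\t+\t.\tgene_id "G2"; gene_type "protein_coding";\n',
        'chr1\tsrc\texon\t200\t250\t.\t+\t.\tgene_id "G2"; transcript_id "T2";\n',
    ]
    total, missing = count_missing_biotype(sample)
    print(total, dict(missing))
```

To run it on a real file, pass an open file handle instead of the sample list, for example `count_missing_biotype(open("annotation.gtf"))`, where `annotation.gtf` is a placeholder path.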

Best,
Li-Ting

Author

zztin commented Jan 8, 2021

Hi,
I have solved the problem for now by downloading the GTF from GENCODE and renaming the attribute key "gene_type" in column 9 to "gene_biotype".
After this custom change, the file works with ALFA. I'm not sure this is the correct thing to do, but gene_type does seem to refer to the biotype in this context (based on the data format description).
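
The rename described above can be sketched as follows. This is a minimal example, assuming an uncompressed GTF with the usual `key "value";` attributes in column 9. The function names and file paths are hypothetical and not part of ALFA or GENCODE tooling.

```python
import re


def rename_attribute_key(line, old="gene_type", new="gene_biotype"):
    """Rename one attribute key in column 9 of a GTF line; other lines pass through."""
    if line.startswith("#"):
        return line  # header/comment lines are left untouched
    ending = "\n" if line.endswith("\n") else ""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 9:
        return line  # not a well-formed GTF record
    # Match the key only where an attribute starts, so that e.g.
    # "transcript_type" is not affected when renaming "gene_type".
    pattern = re.compile(r'(^|;\s*)' + re.escape(old) + r'(\s+")')
    fields[8] = pattern.sub(lambda m: m.group(1) + new + m.group(2), fields[8])
    return "\t".join(fields) + ending


def rewrite_gtf(src_path, dst_path):
    """Copy a GTF file, renaming gene_type to gene_biotype on every record."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            dst.write(rename_attribute_key(line))


if __name__ == "__main__":
    # Made-up record, for illustration only.
    example = 'chr1\tHAVANA\tgene\t1\t100\t.\t+\t.\tgene_id "G1"; gene_type "protein_coding";\n'
    print(rename_attribute_key(example), end="")
```

For a real file, call `rewrite_gtf("input.gtf", "output.gtf")` with your own paths. A gzipped download would need to be decompressed first, or opened with `gzip.open(..., "rt")`.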

I would still appreciate your recommendation on where to download GTF tracks when working with human data.

Another related question: is it possible to use ALFA to determine the coverage of DNA fragments on Alu elements, LINEs, LTRs and other repetitive elements? In that case, can I replace "gene_biotype" with a custom tag such as "mobile_element_type" in ALFA? Would the normalization stay intact?

Thank you!

Collaborator

mbahin commented Jan 11, 2021

Hi,

First of all, I must say that the package is no longer maintained (sorry). It should still work as it did when it was developed.

Regarding the annotation file, I used to download it from Ensembl. For example, for Homo sapiens, I was getting the file from here.
However, your trick should be OK, since "gene_type" is described as what we use as the biotype (and I checked that the list of gene_type values is concordant with the list of biotypes we used to work with).

Cheers,
Mathieu
