Commit

Update UCSC annotation file retrieval
nuno-agostinho committed May 17, 2018
1 parent 00e6655 commit 28b2978
Showing 1 changed file with 4 additions and 4 deletions.
8 changes: 4 additions & 4 deletions inst/scripts/make-data.R
@@ -19,10 +19,10 @@ source(system.file("scripts/events.R",
## files from a transcript annotation file in GTF format. The annotation file
## was retrieved from UCSC Table Browser (https://genome.ucsc.edu/cgi-bin/hgTables)
## by selecting the GRCh37/hg19 assembly, "Genes and Gene Predictions" group,
-## "UCSC Genes" track, "knownGene" table for all genome in the GTF format.
+## "Ensembl Genes" track, "ensGene" table for all genome in the GTF format.
## Misleadingly, the "transcript_id" column contains gene identifiers. As such,
## the proper transcript identifiers were retrieved from UCSC Table Browser in
-## TXT format and and the following steps were taken:
+## TXT format and the following steps were taken:

annotationGTF <- "ensembl_hg19.gtf"
annotationTXT <- "ensGene.txt"
@@ -31,8 +31,8 @@ annotationTXT <- "ensGene.txt"
# annotation from UCSC (TXT file)
require(data.table) # faster to load data frames
txt <- fread(annotationTXT, data.table = FALSE)
-idTable <- txt$V13 # Save gene ID
-names(idTable) <- txt$V2 # Save transcript ID
+idTable <- txt[[13]] # Save gene ID
+names(idTable) <- txt[[2]] # Save transcript ID

# Retrieve transcript IDs from the GTF file
gtf <- fread(annotationGTF, data.table = FALSE)
