
update IPAW to hg38 genome #3

Open
yafeng opened this issue Jul 18, 2018 · 11 comments
Labels
enhancement New feature or request

Comments

@yafeng
Collaborator

yafeng commented Jul 18, 2018

The current IPAW pipeline uses hg19-based databases, and the reported coordinates of novel peptides and SAAV peptides are hg19 genomic coordinates. The goal is to make IPAW compatible with the latest genome assembly, hg38.

@yafeng
Collaborator Author

yafeng commented Jul 18, 2018

  1. COSMIC and dbSNP databases for hg38
    Get the dbSNP database (UCSC Table Browser, hg38, table snp150CodingDbSnp):
    https://genome.ucsc.edu/cgi-bin/hgTables?hgsid=682621893_rZAeDI3qkmv2ea9OULNeBo6GjEui&clade=mammal&org=&db=hg38&hgta_group=varRep&hgta_track=snp150Common&hgta_table=snp150CodingDbSnp&hgta_regionType=genome&position=&hgta_outputType=primaryTable&hgta_outFileName=snp150CodingDbSnp.txt

Get the COSMIC database
sftp 'your_email_address@example.com'@sftp-cancer.sanger.ac.uk
Download the data

sftp> get cosmic/grch38/cosmic/v85/CosmicMutantExport.tsv.gz
sftp> exit
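A quick way to sanity-check both downloads is to look at the line count and the header columns before feeding them to the database build. This is a minimal sketch assuming each file's first line is a tab-separated header (UCSC Table Browser output usually prefixes its header with `#`, which is stripped here); `inspect_table` is a hypothetical helper, not part of IPAW.

```shell
# Hypothetical helper: print the number of lines and the numbered header
# columns of a plain or gzipped tab-separated table.
# Assumption: the first line of the file is a header row.
inspect_table() {
  local file="$1"
  local reader="cat"
  case "$file" in
    *.gz) reader="gzip -dc" ;;   # decompress gzipped files on the fly
  esac
  # total number of lines, header included
  echo "lines: $($reader "$file" | wc -l | tr -d ' ')"
  # take the first line, drop a leading '#', put one column name per line, number them
  $reader "$file" | head -n 1 | sed 's/^#//' | tr '\t' '\n' | nl -ba
}

# Usage:
# inspect_table snp150CodingDbSnp.txt
# inspect_table CosmicMutantExport.tsv.gz
```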

@yafeng
Collaborator Author

yafeng commented Jul 18, 2018

  1. Get the hg38 masked genome sequence
wget hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.chromFaMasked.tar.gz
tar -xzf hg38.chromFaMasked.tar.gz
for chr in {1..22} X Y M; do cat chr$chr.fa.masked >> hg38.chr1-22.X.Y.M.fa.masked; done
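The loop above appends with `>>`, so running it a second time duplicates every chromosome in the output. A slightly more defensive variant is sketched below as a hypothetical helper: it empties the output first and aborts (removing the partial file) if any chromosome file is missing. It assumes the `chrN.fa.masked` files sit in the directory you pass in after extraction.

```shell
# Hypothetical helper: concatenate chr1-22, X, Y, M masked FASTA files
# (in that order) into a single output file.
concat_masked_chroms() {
  local dir="$1" out="$2" chr f
  : > "$out"                           # truncate, so re-runs do not duplicate sequences
  for chr in $(seq 1 22) X Y M; do
    f="$dir/chr$chr.fa.masked"
    if [ ! -f "$f" ]; then
      echo "missing: $f" >&2
      rm -f "$out"                     # do not leave an incomplete genome behind
      return 1
    fi
    cat "$f" >> "$out"
  done
}

# Usage (run in the directory where the tarball was extracted):
# concat_masked_chroms . hg38.chr1-22.X.Y.M.fa.masked
```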

@yafeng
Collaborator Author

yafeng commented Jul 18, 2018

  1. varDB 2.0: a database with the latest pseudogenes, lncRNAs, nsSNPs and COSMIC mutations,
    aiming to include:
    a. GENCODE release 28 pseudogenes including consensus pseudogenes predicted by the Yale and UCSC pipelines
    b. lncRNAs from LNCipedia v5.1 (hg38)
    c. mutant peptides derived from somatic mutations in COSMIC v85
    d. mutant peptides derived from nsSNPs in dbSNP150
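To illustrate how the four components listed above could be combined into one search database while keeping each entry traceable to its source, here is a minimal sketch. Everything in it is an assumption: `merge_tagged_fasta`, the `TAG=file` argument convention, the tag names and the file names are hypothetical, and this is not how varDB 2.0 was actually built.

```shell
# Hypothetical helper: concatenate FASTA files into one database, prefixing
# every header with a source tag (e.g. ">COSMIC|original_header").
# Each argument after the output path has the form TAG=path/to/file.fasta.
merge_tagged_fasta() {
  local out="$1"; shift
  local pair tag file
  : > "$out"                           # start from an empty database
  for pair in "$@"; do
    tag="${pair%%=*}"                  # text before the first '='
    file="${pair#*=}"                  # text after the first '='
    sed "s/^>/>${tag}|/" "$file" >> "$out"
  done
}

# Usage (file names are placeholders):
# merge_tagged_fasta varDB.fasta PSEUDO=pseudogenes.fa LNC=lncrna.fa \
#   COSMIC=cosmic_peptides.fa DBSNP=dbsnp_peptides.fa
```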

@yafeng yafeng self-assigned this Jul 19, 2018
@yafeng yafeng added the enhancement New feature or request label Jul 19, 2018
@yafeng
Collaborator Author

yafeng commented Aug 21, 2018

The varDB2.0 database can be downloaded with:
wget http://lehtiolab.se/Supplementary_Files/VarDB2.zip
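After unzipping, a quick record count helps confirm that the archive unpacked completely. This sketch assumes the archive contains FASTA files with `.fa` or `.fasta` extensions at its top level (adjust the globs if the real layout differs); `count_fasta_records` is a hypothetical helper.

```shell
# Hypothetical helper: print "<file name><TAB><number of FASTA records>"
# for every .fa / .fasta file directly inside a directory.
count_fasta_records() {
  local dir="$1" f
  for f in "$dir"/*.fa "$dir"/*.fasta; do
    [ -f "$f" ] || continue            # skip glob patterns that matched nothing
    printf '%s\t%s\n' "$(basename "$f")" "$(grep -c '^>' "$f")"
  done
}

# Usage:
# unzip VarDB2.zip -d VarDB2
# count_fasta_records VarDB2
```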

@yafeng
Collaborator Author

yafeng commented Aug 21, 2018

Add a command-line option, --hg19 or --hg38, so that the workflow can be run with either genome assembly. The following processes need to be modified accordingly:
BLATnovel, phastcons, phyloCSF, annovar
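In the Nextflow script itself such a switch would presumably be a pipeline parameter; the shell sketch below only illustrates the idea of one flag selecting all assembly-dependent inputs for the four processes at once, so they cannot drift out of sync. `select_assembly` and every file or directory name are hypothetical placeholders; only ANNOVAR's `-buildver` option (which accepts `hg19` or `hg38`) is a real setting.

```shell
# Hypothetical sketch: map --hg19/--hg38 to assembly-specific inputs,
# printed as key=value lines. All paths are placeholders.
select_assembly() {
  local build
  case "$1" in
    --hg19) build=hg19 ;;
    --hg38) build=hg38 ;;
    *) echo "usage: select_assembly --hg19|--hg38" >&2; return 1 ;;
  esac
  printf 'blat_target=%s\n' "${build}.genome.fa"        # BLATnovel: genome to align against
  printf 'phastcons_bw=%s\n' "${build}.phastCons.bw"    # phastcons: conservation track
  printf 'phylocsf_dir=%s\n' "PhyloCSF_${build}"        # phyloCSF: per-assembly tracks
  printf 'annovar_buildver=%s\n' "$build"               # annovar: value for -buildver
}

# Usage:
# select_assembly --hg38
```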

@yafeng
Collaborator Author

yafeng commented Nov 26, 2018

@TnakaNY
Copy link

TnakaNY commented Mar 18, 2020

Could you provide a copy of IPAW for the hg38 genome?
Or is the latest version already for hg38?

@TnakaNY

TnakaNY commented Mar 18, 2020

Hi Yafeng,

Could you please also upload varDB2.0 somewhere?
I could not download it using the link above.

Thx.

@yafeng
Collaborator Author

yafeng commented Mar 28, 2020

@TnakaNY
Try this link for VarDB2.0:
https://drive.google.com/open?id=1G20qIF60xdJ5zrSbt8a8sd0RKutYxQMC

You need to use the IPAW hg38 version, which I uploaded to my GitHub repo. You also need to use conda to set up local environments so that all executable commands can be found. It takes some effort to set up; otherwise, I suggest you continue to use hg19, which is better maintained.

https://github.com/yafeng/proteogenomics-analysis-workflow/blob/master/ipaw.local.hg38.nf
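Since the hg38 version relies on a conda environment to put every executable on PATH, a pre-flight check can catch a missing tool before the workflow starts. `check_commands` is a hypothetical helper, and the command names in the usage line are assumptions, not the definitive list required by ipaw.local.hg38.nf; extend it to match the processes in that script.

```shell
# Hypothetical helper: report every command that is not on PATH and
# return non-zero if any is missing.
check_commands() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "not found: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage (after activating your conda environment; the command list is an assumption,
# and the pipeline's own parameters are omitted):
# check_commands nextflow blat table_annovar.pl && nextflow run ipaw.local.hg38.nf
```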

@TnakaNY

TnakaNY commented Mar 28, 2020

Thank you, let me try!

2 participants