Rohit-Satyam/NGS-Data-Analysis

NGS-Data-Analysis

Introduction

This repository collects best practices for NGS data analysis in general. It also arranges the Best Practices for GATK 4.1.4.0 in sequential order and explains how to get started with NGS data analysis.

Starters Book On Genomics in R

  1. Basics on NGS: Understanding the Basics of NGS: From Mechanism to Variant Calling
  2. Illumina Sequencing Technology: Intro, sequencing by synthesis
  3. Eric Chow's explanation of the chemistry behind Illumina sequencers
  4. An interesting TED talk to get you excited: How to read the genome and build a human being | Riccardo Sabatini
  5. Various NGS Platforms here
  6. Analysis pipelines for cancer genome sequencing in mice: a protocol with all the relevant commands, published in 2020

Which Genome to start with: GRCh37 or GRCh38

  1. Genomic Analysis in the Age of Human Genome Sequencing
  2. Improvements and impacts of GRCh38 human reference on high throughput sequencing data analysis
  3. Get to Know Your Reference Genome (GRCh37 vs GRCh38)
  4. GATK Post: Human genome reference builds - GRCh38/hg38 - b37 - hg19
  5. GRCh37 / hg19 / b37 / humanG1Kv37 - Human Reference Discrepancies
  6. Sequence masking
  7. How to...choose a reference genome?
  8. Why Are There Ambiguous (N) Bases (Gaps) In The Human Genome
  9. Which human reference genome to use? Heng Li's Blog

The exome size increased significantly from GRCh37's 75,231,228 to GRCh38's 95,505,476 nucleotides, a gain of 20,274,248 nucleotides or 26.9% (Source No. 2). Percentage-wise, 2.43% of GRCh37 is exome, compared to 3.09% of GRCh38. The increase in exome size can be attributed to several reasons. First, the total number of distinct exons increased from 327,058 to 457,748 in GRCh38, and the median number of exons per gene also increased from 13 to 19, while the median number of nucleotides per exon increased slightly, from 140 to 146. These combined factors explain the increase in the exome percentage in GRCh38.
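
The figures quoted above can be re-derived with a few lines of Python. This is only a sanity check on the arithmetic, using the two exome sizes from Source No. 2; the function name `exome_increase` is mine, not from the paper.

```python
# Exome sizes in nucleotides, as quoted above from Source No. 2.
GRCH37_EXOME = 75_231_228
GRCH38_EXOME = 95_505_476


def exome_increase(old, new):
    """Return (absolute increase, percentage increase relative to old)."""
    delta = new - old
    return delta, 100.0 * delta / old


delta, pct = exome_increase(GRCH37_EXOME, GRCH38_EXOME)
print(f"{delta:,} nt more exome, a {pct:.1f}% increase")
```

Running it reproduces the 20,274,248 nucleotide difference and the 26.9% increase.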

High-throughput sequencing is prone to identifying more duplication CNVs than deletion CNVs.

System Preparation

To make your life easier, read my System Preparation section before embarking on the analysis part.

All about HPC

  1. Start here

Learn Linux

  1. Source 1: Here
  2. Source 2:

Download Data from SRA

  1. How to download data from SRA: Learn basic NGS data analysis from nextgenerationsequencinghq
  2. SRA-Toolkit: one of the worst-documented NCBI tools
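
Since the SRA-Toolkit documentation is hard to navigate, here is a minimal sketch of a typical download, assuming `prefetch` and `fasterq-dump` from the toolkit are installed and on your `PATH`. The accession `SRR000001`, the `fastq` output directory, and the helper name `sra_download_commands` are placeholders for illustration. The code only builds and prints the command lines so you can inspect them first.

```python
import shlex
from typing import List


def sra_download_commands(accession: str, outdir: str = "fastq",
                          threads: int = 4) -> List[List[str]]:
    """Build the two SRA-Toolkit commands for one run accession."""
    return [
        # Step 1: fetch the .sra archive for the accession.
        ["prefetch", accession],
        # Step 2: convert it to FASTQ, writing paired reads to separate files.
        ["fasterq-dump", accession, "--split-files",
         "--threads", str(threads), "--outdir", outdir],
    ]


commands = sra_download_commands("SRR000001")  # placeholder accession
for cmd in commands:
    print(shlex.join(cmd))
```

To actually run them, pass each list to `subprocess.run(cmd, check=True)`. Check `fasterq-dump --help` for the options available in your toolkit version.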

How to get Chromosome size

  1. Go to UCSC Table browser
  2. Select Clade: mammal, Assembly: GRCh38, group: All Tables, Table: chromInfo and hit Submit
  3. This will give you the chromosome lengths
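
Once the table is saved, it can be loaded into a dictionary for later use. The sketch below assumes the output is tab-separated with a `#`-prefixed header line and the chromosome name and size in the first two columns; the helper name `parse_chrom_sizes` and the inline sample text are illustrative only.

```python
from typing import Dict


def parse_chrom_sizes(text: str) -> Dict[str, int]:
    """Map chromosome name to length from tab-separated chromInfo output."""
    sizes = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header line
        fields = line.split("\t")
        sizes[fields[0]] = int(fields[1])
    return sizes


# Small inline sample standing in for the downloaded file.
sample = (
    "#chrom\tsize\tfileName\n"
    "chr1\t248956422\t/gbdb/hg38/hg38.2bit\n"
    "chrM\t16569\t/gbdb/hg38/hg38.2bit\n"
)
sizes = parse_chrom_sizes(sample)
print(sizes["chr1"])
```

For a real file, read it with `open(path).read()` and pass the text to the same function.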

How can I get the human chromosome centromere positions and chromosome lengths in GRCh37/hg19? See here.

Locating centromeres and telomeres

  1. In hg19: This information can be found in the "gap" database table. Use the Table Browser to extract it. To do this, select your assembly and the gap table, then click the filter "create" button. Set the "type" field to centromere telomere (separated by a space). Source
  2. In hg38: See source1 and source2
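
If you download the whole gap table instead of filtering in the browser, the same selection can be done locally. This is a rough sketch, assuming a tab-separated export whose `#`-prefixed header names the columns `chrom`, `chromStart`, `chromEnd` and `type`. The function name `select_gaps`, the chromosome `chrZ` and all coordinates in the sample are made up for illustration.

```python
import csv
import io
from typing import List, Tuple


def select_gaps(text: str, wanted=("centromere", "telomere")) -> List[Tuple[str, int, int, str]]:
    """Return (chrom, start, end, type) for gap rows whose type is wanted."""
    header_line, _, body = text.lstrip().partition("\n")
    names = header_line.lstrip("#").split("\t")  # column names from the header
    reader = csv.DictReader(io.StringIO(body), fieldnames=names, delimiter="\t")
    hits = []
    for row in reader:
        if row["type"] in wanted:
            hits.append((row["chrom"], int(row["chromStart"]),
                         int(row["chromEnd"]), row["type"]))
    return hits


# Made-up rows; real coordinates come from the Table Browser export.
sample = (
    "#bin\tchrom\tchromStart\tchromEnd\tix\tn\tsize\ttype\tbridge\n"
    "1\tchrZ\t0\t10000\t1\tN\t10000\ttelomere\tno\n"
    "2\tchrZ\t500000\t800000\t2\tN\t300000\tcentromere\tno\n"
    "3\tchrZ\t900000\t950000\t3\tN\t50000\tcontig\tno\n"
)
gaps = select_gaps(sample)
for gap in gaps:
    print(gap)
```

Columns are looked up by name from the header, so the sketch does not depend on the exact column order of the export.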

Convert .bcl to fastq

  1. Quick read

Lessons learnt from TCGA 2020 Conference

  1. Use multiple variant callers (MuSE, Mutect2, VarScan2, SomaticSniper and Pindel were used in TCGA). After annotation, germline mutations are usually masked, since leaked germline mutations can be used to trace individuals.
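
One simple way to combine several callers is to keep only variants reported by more than one of them. The sketch below is my own illustration, not the TCGA procedure: the threshold of two callers, the function name `consensus_calls` and the variants in the example are all assumptions. Each variant is represented as a `(chrom, pos, ref, alt)` tuple, which you would normally extract from each caller's VCF.

```python
from collections import Counter
from typing import Dict, Set, Tuple

Variant = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)


def consensus_calls(calls_by_caller: Dict[str, Set[Variant]],
                    min_callers: int = 2) -> Set[Variant]:
    """Keep variants reported by at least min_callers callers."""
    # Each caller contributes a set, so a variant counts once per caller.
    support = Counter(v for calls in calls_by_caller.values() for v in calls)
    return {v for v, n in support.items() if n >= min_callers}


# Made-up calls for illustration only.
calls = {
    "MuSE": {("chr1", 100, "A", "T"), ("chr2", 200, "G", "C")},
    "Mutect2": {("chr1", 100, "A", "T"), ("chr3", 300, "C", "A")},
    "VarScan2": {("chr1", 100, "A", "T"), ("chr2", 200, "G", "C")},
}
kept = consensus_calls(calls)
print(sorted(kept))
```

Raising `min_callers` makes the call set stricter at the cost of sensitivity.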

To download heavy files from the TCGA GDC portal, use a manifest file together with the Data Transfer Tool (DTT).
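
Before starting a large download it helps to check how much data the manifest covers. The sketch below assumes the manifest is a tab-separated file with a header row that includes a `size` column in bytes, and that the DTT is installed as `gdc-client`. The helper names, file ids, file names and paths are placeholders, and the `-m` and `-d` options are used as I understand them, so confirm with `gdc-client download --help`.

```python
import csv
import io
from typing import List, Tuple


def summarize_manifest(text: str) -> Tuple[int, int]:
    """Return (number of files, total size in bytes) for a GDC manifest."""
    rows = list(csv.DictReader(io.StringIO(text), delimiter="\t"))
    return len(rows), sum(int(row["size"]) for row in rows)


def dtt_command(manifest_path: str, outdir: str = "gdc_downloads") -> List[str]:
    """Build the Data Transfer Tool command line for a manifest."""
    return ["gdc-client", "download", "-m", manifest_path, "-d", outdir]


# Made-up manifest content for illustration only.
sample = (
    "id\tfilename\tmd5\tsize\tstate\n"
    "file-id-1\tsample1.bam\tmd5-placeholder-1\t1000\treleased\n"
    "file-id-2\tsample2.bam\tmd5-placeholder-2\t2500\treleased\n"
)
n_files, total_bytes = summarize_manifest(sample)
print(n_files, total_bytes)
print(" ".join(dtt_command("gdc_manifest.txt")))
```

Controlled-access files additionally need an authentication token, which this sketch does not cover.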
