Skip to content

mottensmann/mhc_cluster

 
 

Repository files navigation

Clustering MHC class II alleles

Suite of functions for the clustering of MHC II sequences into putative allele sequences. So far the pipeline has been applied to Antarctic fur seal (Arctocephalus gazella) MHC II DQB & DRB sequences as well as MHC II sequences of the Zebra finch (Taeniopygia guttata)


This repository was forked from https://github.com/nextgenusfs/mhc_cluster.git [1]

The script cluster_mhc2.py allows to cluster high-quality Illumina reads into putative alleles using the Unoise3 approach 2. Input sequences are FASTQ files generated on a Illumina MiSeq run. The following steps must be executed outside the clustering:

Using cluster_mhc2.py

Input files are expected to be in fastq format containing the barcodes for individual samples within the header as shown below. Barcodes are required for demultiplexing reads prior to mapping reads to the generated list of alleles.

@MISEQ:279:000000000-AVVMJ:1:1101:14590:18831:N:0:barcodelabel=CAGAGAGGAAGGAGTA

For further information and a description of parameters see the helpfile:

## first version 
cluster_mhc2.py -h
## latest version introducing vsearch 
cluster_mhc3.py -h 

The subfolder lib contains required functions and may contain hidden Markov models created from multiple sequence alignment of previously characterised MHC genes using HMMER3. Example using MUSCLE and HMMER3:

##  Align sequences
muscle -in input_sequences.fasta -out aligned_sequences.afa
#   Create hidden markov model  
hmmbuild hmm aligned_sequences.afa
#   Create auxiliary files
hmmpress hmm

Dependencies

wget https://github.com/torognes/vsearch/archive/v2.9.1.tar.gz
tar xzf v2.9.1.tar.gz
cd vsearch-2.9.1
./autogen.sh
./configure
make
make install  # as root or sudo make install
tar zxf hmmer-3.1b2.tar.gz
cd hmmer-3.1b2
./configure
make
make check
pip install biopython

*Installing Biophyton can easily cause some problems. See troubleshooting options here: https://askubuntu.com/questions/677566/biopython-installation*

pip install natsort

References

[1] Palmer JM, Berkman LK, Marquardt PE, Donner DM, Jusino MA, Lindner DL. Preliminary characterization of little brown bats (Myotis lucifugus) immune MHC II DRB alleles using next-generation sequencing. PeerJ PrePrints. 2016 Jan 21;4:e1662v1.

[2] Edgar, R.C., 2016. UNOISE2: improved error-correction for Illumina 16S and ITS amplicon sequencing. bioRxiv, p.081257.

About

Clustering MHC sequences

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 100.0%