Tophat tutorial

  • TopHat

TopHat is a spliced read mapper for mRNA-seq reads. It uses the short read aligner Bowtie to map reads against the whole genome and analyzes the mapping results to identify splice junctions between exons.

  • Installation:

Download the source code from http://tophat.cbcb.umd.edu/. Compile and install the program using:

   ./configure --prefix=/path/to/install/directory/ --with-bam=/path/to/bam
   make
   make install

Here /path/to/bam must contain include/bam and lib/libbam.a (the SAMtools headers and library).

  • Example usage:
  • Building the genome index database:
bowtie-build hg18/hg18.fa hg18_bowtie_idx

This builds the Bowtie index for the human genome hg18. You can also download the pre-built index file from ftp://ftp.cbcb.umd.edu/pub/data/bowtie_indexes/hg18.ebwt.zip.
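
Whether you build the index or download it, TopHat needs the complete set of index files behind the index prefix. As a minimal sketch, assuming a Bowtie 1 small index (six files ending in .ebwt), the hypothetical helper check_bowtie_index below reports any file that is missing or empty. It is not part of Bowtie or TopHat.

```shell
# check_bowtie_index PREFIX
# Hypothetical helper: verify that all six Bowtie 1 index files exist
# and are non-empty for the given index prefix.
# Assumes the small-index naming scheme (PREFIX.1.ebwt ... PREFIX.rev.2.ebwt).
check_bowtie_index() {
    prefix=$1
    missing=0
    for suffix in 1 2 3 4 rev.1 rev.2; do
        if [ ! -s "${prefix}.${suffix}.ebwt" ]; then
            echo "missing or empty: ${prefix}.${suffix}.ebwt" >&2
            missing=1
        fi
    done
    # Return 0 when every file was found, 1 otherwise.
    return "$missing"
}
```

For example, check_bowtie_index hg18_bowtie_idx returns a zero exit status only when all six files are present, so it can guard the tophat command in a script.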

  • Aligning to the genome:
tophat -p 4 -o alignments/ hg18_bowtie_idx s_1_sequence.txt

This aligns the sequence file (s_1_sequence.txt) against the human genome index (hg18_bowtie_idx) using 4 CPUs (-p 4) and writes the alignments (alignments/accepted_hits.sam) to the output directory (-o alignments/). The output SAM file does not contain MD tags, so use the MD tag step below to add them.

If the output is in BAM format, you can convert it to SAM text using

samtools view file.bam
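
Once the alignments are available as SAM text, a quick sanity check is to count how many of them are spliced. In SAM, a read that skips over a stretch of the reference (an intron, for mRNA-seq reads) has an N operation in its CIGAR string, which is column 6. The following is a rough sketch using a hypothetical helper named count_spliced. It only counts records and does not report the junctions themselves.

```shell
# count_spliced SAM_FILE
# Hypothetical helper: print the number of alignment records whose CIGAR
# string (SAM column 6) contains an N operation (skipped reference region).
# Header lines, which start with "@", are ignored.
count_spliced() {
    awk -F '\t' '!/^@/ && $6 ~ /N/ { n++ } END { print n + 0 }' "$1"
}
```

For example, count_spliced alignments/accepted_hits.sam prints a single number. A count of zero on a real mRNA-seq run would suggest that something went wrong upstream.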


  • Adding the MD tags to the SAM file:
samtools view -b -t hg18/hg18.fa.fai alignments/accepted_hits.sam | samtools fillmd - hg18/hg18.fa > alignments/accepted_hits.sam.filledmd

Alternatively, to save bandwidth and disk space, compress the output with gzip:

samtools view -b -t hg18/hg18.fa.fai alignments/accepted_hits.sam | samtools fillmd - hg18/hg18.fa | gzip > alignments/accepted_hits.sam.filledmd.gz

where hg18/hg18.fa.fai is the index of the human genome FASTA file (created with samtools faidx hg18/hg18.fa) and hg18/hg18.fa is the FASTA sequence of the human genome.
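
To confirm that the fill step worked, you can check that every mapped record now carries an MD tag. The sketch below uses a hypothetical helper, count_missing_md, and assumes the filled file is uncompressed SAM text. Records with FLAG bit 0x4 set (unmapped) are skipped, since they have no alignment to describe.

```shell
# count_missing_md SAM_FILE
# Hypothetical helper: print the number of mapped alignment records that
# have no MD:Z: tag among their optional fields (SAM columns 12 onward).
count_missing_md() {
    awk -F '\t' '
        /^@/ { next }                    # skip header lines
        int($2 / 4) % 2 == 1 { next }    # skip unmapped reads (FLAG bit 0x4)
        {
            found = 0
            for (i = 12; i <= NF; i++) {
                if ($i ~ /^MD:Z:/) found = 1
            }
            if (!found) n++
        }
        END { print n + 0 }
    ' "$1"
}
```

For example, count_missing_md alignments/accepted_hits.sam.filledmd should print 0 if every mapped read received an MD tag.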

For more detailed information, please go to http://tophat.cbcb.umd.edu/tutorial.html.
