Tophat tutorial

From Icbwiki

Revision as of 03:44, 11 May 2010
  • TopHat

TopHat is a spliced read mapper for mRNA-seq reads. It uses the short read aligner Bowtie to map reads against the whole genome and analyzes the mapping results to identify splice junctions between exons.

  • Installation:

Download the source code, then compile and install the program using:

   ./configure --prefix=/path/to/install/directory/
   make install
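
After installation, it can help to confirm that the tools used in the rest of this tutorial are reachable from the shell. The following is a minimal sketch, not part of TopHat itself: check_tools is a helper name invented here, and the commented PATH line assumes the binaries were installed into a bin/ subdirectory of the --prefix given to ./configure.

```shell
# Hypothetical helper (not part of TopHat): report any named tools missing from PATH.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    # Return 0 when every tool was found, 1 otherwise.
    return "$missing"
}

# Assumption: binaries live under the bin/ directory of the install prefix.
# export PATH="/path/to/install/directory/bin:$PATH"
# check_tools tophat bowtie bowtie-build samtools
```

A non-zero return status means at least one tool was not found, and the missing names are printed to standard error.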
  • Example usage:
  • Building the genome index database:
bowtie-build hg18/hg18.fa hg18_bowtie_idx

This builds the Bowtie index for the human genome hg18. Building the index for a whole genome can take a long time, so downloading a pre-built index is an alternative.
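
Before aligning, one may want to check that the index is complete. The sketch below assumes a Bowtie 1 index, which bowtie-build writes as six files sharing the index prefix (suffixes .1.ebwt to .4.ebwt, .rev.1.ebwt and .rev.2.ebwt). The helper name index_complete is invented here for illustration.

```shell
# Hypothetical helper: succeed only if all six Bowtie 1 index files exist for a prefix.
index_complete() {
    prefix=$1
    for suffix in 1 2 3 4 rev.1 rev.2; do
        if [ ! -f "$prefix.$suffix.ebwt" ]; then
            echo "missing index file: $prefix.$suffix.ebwt" >&2
            return 1
        fi
    done
    return 0
}

# Example (commented out): rebuild the index only when it is incomplete.
# index_complete hg18_bowtie_idx || bowtie-build hg18/hg18.fa hg18_bowtie_idx
```

The check only looks at file names. It does not verify that the index contents are valid.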

  • Aligning to the genome:
tophat -p 4 -o alignments/ hg18_bowtie_idx s_1_sequence.txt

This aligns the reads in the sequence file (s_1_sequence.txt) against the human genome index (hg18_bowtie_idx) using 4 threads (-p 4) and writes the alignments to alignments/accepted_hits.sam in the output directory (-o alignments/). The output SAM file does not contain MD tags, so use the following step to add them.
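
As a quick sanity check on the alignment output, the number of alignment records in the SAM file can be counted. This is a minimal sketch: count_sam_records is a hypothetical helper, and it relies only on the SAM convention that header lines start with '@'.

```shell
# Hypothetical helper: count alignment records in a SAM file, skipping '@' header lines.
count_sam_records() {
    awk '!/^@/ { n++ } END { print n + 0 }' "$1"
}

# Example (commented out):
# count_sam_records alignments/accepted_hits.sam
```

A count of 0 suggests that the run produced no alignments or that the wrong file was inspected.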

  • Adding the MD tags to the SAM file:
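samtools view -b -t hg18/hg18.fa.fai alignments/accepted_hits.sam | samtools fillmd - hg18/hg18.fa > alignments/accepted_hits.md.sam

where hg18/hg18.fa.fai is the index of the human genome FASTA file (created by samtools faidx) and hg18/hg18.fa is the FASTA sequence of the human genome. The result is written to a new file (alignments/accepted_hits.md.sam, a name chosen here for illustration) rather than back to alignments/accepted_hits.sam, because the shell would truncate the input file before samtools reads it.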
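
To see whether the MD tags were actually added, the records that lack one can be counted. This is a minimal sketch under the SAM convention that optional tags start at the 12th tab-separated column. The helper name count_missing_md is invented here.

```shell
# Hypothetical helper: count SAM alignment records that carry no MD:Z: tag.
count_missing_md() {
    awk -F '\t' '
        /^@/ { next }                      # skip header lines
        {
            found = 0
            # Optional tags begin at column 12 in SAM.
            for (i = 12; i <= NF; i++) {
                if ($i ~ /^MD:Z:/) { found = 1; break }
            }
            if (!found) missing++
        }
        END { print missing + 0 }
    ' "$1"
}

# Example (commented out): run on the file produced by the fillmd step.
# count_missing_md path/to/file_with_md_tags.sam
```

Run on TopHat's raw accepted_hits.sam, the count should equal the number of alignment records. Run on the file produced by the fillmd step, it is expected to be 0.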

For more detailed information, see the TopHat documentation.
