TopHat tutorial

From Icbwiki

Revision as of 18:54, 10 January 2013
  • TopHat

TopHat is a spliced read mapper for mRNA-seq reads. It uses the short read aligner Bowtie to map reads against the whole genome and analyzes the mapping results to identify splice junctions between exons.

  • Installation:

Download the TopHat source code, then compile and install it using:

   # /path/to/bam must contain include/bam and lib/libbam.a
   ./configure --prefix=/path/to/install/directory/ --with-bam=/path/to/bam
   make install
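Before you run configure, it can help to confirm that the --with-bam directory has the layout described above. Below is a minimal Python sketch. check_bam_prefix is a hypothetical helper, not part of TopHat, and it only checks that the two required paths exist.

```python
from pathlib import Path


def check_bam_prefix(prefix):
    """Return a list of problems with a --with-bam directory (empty list = looks OK).

    Only the layout is checked: prefix must contain include/bam (the samtools
    headers) and lib/libbam.a (the static samtools library).
    """
    root = Path(prefix)
    problems = []
    if not (root / "include" / "bam").is_dir():
        problems.append(f"missing directory {root / 'include' / 'bam'}")
    if not (root / "lib" / "libbam.a").is_file():
        problems.append(f"missing file {root / 'lib' / 'libbam.a'}")
    return problems
```

For example, check_bam_prefix("/path/to/bam") returns an empty list when both paths are present. Otherwise it returns one message per missing path.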
  • Example usage:
  • Building the genome index database:
   bowtie-build hg18/hg18.fa hg18_bowtie_idx

This builds the Bowtie index for the human genome (hg18) from its FASTA sequence. You can also download a pre-built index instead of building one.
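bowtie-build writes several files that share the basename given as its last argument (hg18_bowtie_idx), and TopHat is then pointed at that basename. Below is a minimal Python sketch that lists any missing index files. It assumes the standard Bowtie 1 file set with the .ebwt extension (Bowtie 2 indexes use .bt2 instead). missing_index_files is a hypothetical helper.

```python
from pathlib import Path

# The six files Bowtie 1's bowtie-build writes for one index basename.
BOWTIE1_SUFFIXES = (
    ".1.ebwt", ".2.ebwt", ".3.ebwt", ".4.ebwt",
    ".rev.1.ebwt", ".rev.2.ebwt",
)


def missing_index_files(basename):
    """Return the expected Bowtie 1 index files that do not exist for basename."""
    return [
        basename + suffix
        for suffix in BOWTIE1_SUFFIXES
        if not Path(basename + suffix).is_file()
    ]
```

For example, missing_index_files("hg18_bowtie_idx") returns an empty list once bowtie-build has finished writing the index in the current directory.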

  • Aligning to the genome:
   tophat -p 4 -o alignments/ hg18_bowtie_idx s_1_sequence.txt

This aligns the reads in s_1_sequence.txt against the human genome index (hg18_bowtie_idx) using 4 threads (-p 4). The alignments (accepted_hits.sam) are written to the output directory given by -o (alignments/). The output SAM file does not contain MD tags, so fill them in with the samtools step below.
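To show what the missing tags contain, below is a minimal Python sketch of how an MD string can be derived from a read, its CIGAR string, and the reference. The MD string is SAM's record of mismatched and deleted reference bases. The sketch follows the SAM specification's MD convention and is not samtools' own code. compute_md is a hypothetical helper, and the sketch assumes the read's SEQ is spelled out in full (no "=" characters).

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def compute_md(read_seq, ref_seq, ref_start, cigar):
    """Build an MD string for one alignment.

    read_seq  -- the SEQ field of the read
    ref_seq   -- the reference sequence (e.g. one chromosome)
    ref_start -- 0-based offset in ref_seq where the alignment starts (SAM POS - 1)
    cigar     -- the CIGAR string, which may contain N (intron) operations
    """
    md = []
    run = 0        # length of the current run of matching bases
    q = 0          # index into read_seq
    r = ref_start  # index into ref_seq
    for length, op in CIGAR_RE.findall(cigar):
        n = int(length)
        if op in "M=X":
            # Aligned bases: count matches, record the reference base at mismatches.
            for _ in range(n):
                if read_seq[q].upper() == ref_seq[r].upper():
                    run += 1
                else:
                    md.append(f"{run}{ref_seq[r].upper()}")
                    run = 0
                q += 1
                r += 1
        elif op == "D":
            # Deletion from the reference: written as ^ followed by the deleted bases.
            md.append(f"{run}^{ref_seq[r:r + n].upper()}")
            run = 0
            r += n
        elif op == "N":
            # Skipped region (intron): advances the reference, not recorded in MD.
            r += n
        elif op in "IS":
            # Insertions and soft clips consume read bases only.
            q += n
        # H and P consume neither sequence.
    md.append(str(run))
    return "".join(md)
```

For example, a read TCGT aligned with CIGAR 4M at the start of the reference ACGT gets MD:Z:0A3. That reads as zero matches, a mismatch against reference A, then three matches. In this sketch an intron (N) only advances the reference position, so matches on either side of a splice junction merge into one run.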

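If the output is in BAM format (newer TopHat releases write alignments/accepted_hits.bam), convert it to SAM text, including the header, using

   samtools view -h alignments/accepted_hits.bam > alignments/accepted_hits.sam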

  • Adding the MD tags to the SAM file:
   # -S marks the input as SAM (required by older samtools, ignored by newer versions)
   samtools view -bS -t hg18/hg18.fa.fai alignments/accepted_hits.sam | samtools fillmd - hg18/hg18.fa > alignments/accepted_hits.sam.filledmd

Or, to save disk space and bandwidth, compress the output with gzip:

   samtools view -bS -t hg18/hg18.fa.fai alignments/accepted_hits.sam | samtools fillmd - hg18/hg18.fa | gzip > alignments/accepted_hits.sam.filledmd.gz

Here hg18/hg18.fa is the FASTA sequence of the human genome, and hg18/hg18.fa.fai is its index, created with "samtools faidx hg18/hg18.fa". samtools view -t reads the reference names and lengths from that index.
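The .fai file is plain tab-separated text with one line per sequence. Each line holds five fields: the name, the length, the byte offset of the first base, the number of bases per line, and the number of bytes per line (including the newline). These five numbers are enough to seek straight to any region of hg18.fa without reading the whole genome. Below is a minimal Python sketch of such a lookup, assuming every line of a sequence except its last has the same length. read_fai and fetch are hypothetical names, not samtools functions.

```python
def read_fai(fai_path):
    """Parse a samtools .fai index into {name: (length, offset, linebases, linewidth)}."""
    index = {}
    with open(fai_path) as fh:
        for line in fh:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 5:
                continue  # skip blank or malformed lines
            length, offset, linebases, linewidth = map(int, fields[1:5])
            index[fields[0]] = (length, offset, linebases, linewidth)
    return index


def fetch(fasta_path, index, name, start, end):
    """Return bases [start, end) (0-based, end-exclusive) of sequence `name`."""
    length, offset, linebases, linewidth = index[name]
    end = min(end, length)
    if start >= end:
        return ""

    def byte_pos(i):
        # Skip the full lines before base i, then move to its column in its line.
        return offset + (i // linebases) * linewidth + i % linebases

    first, last = byte_pos(start), byte_pos(end - 1)
    with open(fasta_path, "rb") as fh:
        fh.seek(first)
        chunk = fh.read(last - first + 1)
    # Drop the line terminators that fall inside the byte range.
    return chunk.replace(b"\n", b"").replace(b"\r", b"").decode("ascii")
```

For example, fetch("hg18/hg18.fa", read_fai("hg18/hg18.fa.fai"), "chr1", 0, 100) would return the first 100 bases of chr1, assuming the sequence is named chr1 in the FASTA header.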

For more detailed information, please go to

Elemento Lab TopHat pipeline

  • To make BigWig tracks and upload them to the lab's web server (track data is also written to the screen):
   perl ~/PROGRAMS/SNPseeqer/SCRIPTS/ --dirs=. --tophat=1 --verbose=1 --updir=AIDiPSC

Optional arguments:
   --normalize=1      normalize read counts
   --trackfile=FILE   write track data to FILE instead of the screen
   --genome=STR       specify the genome, e.g. hg19
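The script's filename is missing from the path above, and its normalization method is not described here. As an illustration only, below is a minimal Python sketch of one common way to build a normalized track: per-base coverage scaled to reads per million (RPM), written as bedGraph text that UCSC's bedGraphToBigWig utility can convert to BigWig. Whether --normalize=1 uses RPM is an assumption. aligned_blocks and rpm_bedgraph are hypothetical names.

```python
import re
from collections import defaultdict

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def aligned_blocks(pos1, cigar):
    """Yield 0-based, end-exclusive reference intervals covered by read bases.

    pos1 is the 1-based SAM POS. Introns (N) and deletions (D) advance the
    reference position but are not counted as covered in this sketch.
    """
    ref = pos1 - 1
    for length, op in CIGAR_RE.findall(cigar):
        n = int(length)
        if op in "M=X":
            yield ref, ref + n
            ref += n
        elif op in "DN":
            ref += n
        # I, S, H and P do not move along the reference.


def rpm_bedgraph(alignments, total_reads):
    """Return bedGraph lines of coverage scaled to reads per million.

    alignments  -- iterable of (chrom, pos1, cigar) for aligned reads
    total_reads -- the read count used for scaling (assumed; the lab script
                   may normalize by a different total)
    """
    # Difference array: +1 where a covered block starts, -1 where it ends.
    events = defaultdict(lambda: defaultdict(int))
    for chrom, pos1, cigar in alignments:
        for start, end in aligned_blocks(pos1, cigar):
            events[chrom][start] += 1
            events[chrom][end] -= 1

    scale = 1e6 / total_reads
    lines = []
    for chrom in sorted(events):
        depth, run_start = 0, 0
        for pos in sorted(events[chrom]):
            delta = events[chrom][pos]
            if delta == 0:
                continue  # one block ends where another begins: depth unchanged
            if depth > 0:
                lines.append(f"{chrom}\t{run_start}\t{pos}\t{depth * scale:.3f}")
            depth += delta
            run_start = pos
    return lines
```

The resulting lines can be saved to a file and converted with "bedGraphToBigWig in.bedGraph chrom.sizes out.bw". chrom.sizes holds the first two columns (name and length) of the genome's .fai file.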
  • To generate basic QC data (e.g., the number of FASTQ reads, aligned reads, and uniquely aligned reads):
   perl ~/PROGRAMS/SNPseeqer/SCRIPTS/ --dirs=.
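The QC script's filename is also missing from the path above. As a rough cross-check, the same three numbers can be computed directly from the files. Below is a minimal Python sketch. It assumes single-end reads, four-line FASTQ records, and that "uniquely aligned" means the read carries the NH:i:1 tag TopHat writes. open_text, count_fastq_reads, and alignment_counts are hypothetical helper names, and the lab script may define these counts differently.

```python
import gzip


def open_text(path):
    """Open a plain or gzip-compressed text file (e.g. *.filledmd.gz) for reading."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path)


def count_fastq_reads(lines):
    """Count reads in a FASTQ stream, assuming exactly 4 lines per record."""
    n = sum(1 for line in lines if line.strip())
    if n % 4:
        raise ValueError(f"{n} non-empty lines is not a multiple of 4")
    return n // 4


def alignment_counts(sam_lines):
    """Return (aligned reads, uniquely aligned reads) from SAM text lines.

    A read is counted once however many alignment lines it has; it is
    "unique" if it carries NH:i:1 (number of reported alignments = 1).
    """
    aligned, unique = set(), set()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a full alignment record
        qname, flag = fields[0], int(fields[1])
        if flag & 0x4:
            continue  # unmapped read
        aligned.add(qname)
        if "NH:i:1" in fields[11:]:
            unique.add(qname)
    return len(aligned), len(unique)
```

For example, count_fastq_reads(open_text("s_1_sequence.txt")) counts the input reads, and alignment_counts(open_text("alignments/accepted_hits.sam")) returns the aligned and uniquely aligned counts. A BAM file would first need converting with samtools view -h.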