TopHat tutorial

From Icbwiki

Revision as of 05:00, 24 November 2010
  • TopHat

TopHat is a spliced read mapper for mRNA-seq reads. It uses the short read aligner Bowtie to map reads against the whole genome and analyzes the mapping results to identify splice junctions between exons.

  • Installation:

Download the source code from http://tophat.cbcb.umd.edu/. Compile and install the program using:

   # /path/to/bam must contain include/bam and lib/libbam.a
   ./configure --prefix=/path/to/install/directory/ --with-bam=/path/to/bam
   make
   make install
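
Before moving on, it can help to confirm that the tools used in the rest of this tutorial are reachable. The following is a minimal sketch, not part of the TopHat distribution: check_tools is a hypothetical helper, and it assumes that make install placed the binaries under bin/ in the chosen prefix (the usual autotools layout).

```shell
# Report which of the given commands are missing from PATH.
# Returns 0 if all are found, 1 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Assumption: binaries were installed into <prefix>/bin; adjust as needed.
PATH="/path/to/install/directory/bin:$PATH"
check_tools tophat bowtie bowtie-build samtools ||
    echo "add the missing tools to PATH before continuing"
```

Any tool reported as missing needs to be installed or added to PATH before the steps that follow.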
  • Example usage:
  • Building the genome index database:
bowtie-build hg18/hg18.fa hg18_bowtie_idx

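This builds the Bowtie index for the human genome assembly hg18 from its FASTA file; the second argument is the basename of the index files that bowtie-build writes. Alternatively, download the pre-built index from ftp://ftp.cbcb.umd.edu/pub/data/bowtie_indexes/hg18.ebwt.zip.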
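
Building the hg18 index takes a long time, so it may be worth skipping the build when the index is already in place. The sketch below assumes the six .ebwt files that Bowtie 1 writes for an index of this size; index_exists and build_index_if_missing are hypothetical helper names, and hg18_bowtie_idx is the basename passed to tophat in the alignment step.

```shell
# Return 0 if all six Bowtie index files exist for the given basename.
index_exists() {
    for suffix in 1 2 3 4 rev.1 rev.2; do
        [ -f "$1.$suffix.ebwt" ] || return 1
    done
    return 0
}

# Build the index only when it is not already present.
# $1 = genome FASTA, $2 = index basename
build_index_if_missing() {
    if index_exists "$2"; then
        echo "index $2 already present, skipping bowtie-build"
    else
        bowtie-build "$1" "$2"
    fi
}

# Usage (not run here):
#   build_index_if_missing hg18/hg18.fa hg18_bowtie_idx
```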

  • Aligning to the genome:
tophat -p 4 -o alignments/ hg18_bowtie_idx s_1_sequence.txt

This aligns the reads in s_1_sequence.txt against the human genome index (hg18_bowtie_idx) using 4 threads (-p 4) and writes the alignments to alignments/accepted_hits.sam in the output directory (-o alignments/). The output SAM file does not contain MD tags; use the following step to add them.
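
When several read files need aligning, each run should get its own output directory, because every run writes a file named accepted_hits.sam. The following sketch prints one tophat command per read file rather than running it. The helper outdir_for, the second file s_2_sequence.txt and the alignments/s_N directory layout are assumptions made for illustration.

```shell
# Derive a per-sample output directory from a read file name,
# e.g. reads/s_1_sequence.txt -> alignments/s_1
outdir_for() {
    name=$(basename "$1" .txt)
    echo "alignments/${name%_sequence}"
}

# Print (rather than run) one tophat command per read file.
# Remove the leading "echo" to run the alignments.
for reads in s_1_sequence.txt s_2_sequence.txt; do
    echo tophat -p 4 -o "$(outdir_for "$reads")" hg18_bowtie_idx "$reads"
done
```

Printing the commands first makes it easy to check the directory names before committing to a long run.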

  • Adding the MD tags to the SAM file:
samtools view -b -t hg18/hg18.fa.fai alignments/accepted_hits.sam | samtools fillmd - hg18/hg18.fa > alignments/accepted_hits.sam.filledmd

Or, to save bandwidth and disk space, compress the output with gzip:

samtools view -b -t hg18/hg18.fa.fai alignments/accepted_hits.sam | samtools fillmd - hg18/hg18.fa | gzip > alignments/accepted_hits.sam.filledmd.gz

where hg18/hg18.fa.fai is the samtools index of the human genome (created with samtools faidx hg18/hg18.fa) and hg18/hg18.fa is the FASTA sequence of the human genome.
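
To confirm that the step above did what was intended, one option is to count how many alignment records now carry an MD tag. This is a minimal sketch that uses awk only; md_summary is a hypothetical helper, and it relies on the SAM convention that optional tags start at column 12.

```shell
# Count alignment records (non-header lines) and how many carry an MD:Z: tag.
# $1 = SAM file
md_summary() {
    awk -F '\t' '
        /^@/ { next }                      # skip SAM header lines
        {
            total++
            for (i = 12; i <= NF; i++)     # optional fields start at column 12
                if ($i ~ /^MD:Z:/) { tagged++; break }
        }
        END { printf "%d of %d records have an MD tag\n", tagged, total }
    ' "$1"
}

# Usage (not run here):
#   md_summary alignments/accepted_hits.sam.filledmd
```

If the counts differ widely, checking that hg18/hg18.fa is the same reference the index was built from would be a sensible first step.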

For more detailed information, please go to http://tophat.cbcb.umd.edu/tutorial.html.
