Tophat tutorial

Revision as of 14:02, 11 May 2010 by Tas2019

  • TopHat

TopHat is a spliced read mapper for mRNA-seq reads. It uses the short read aligner Bowtie to map reads against the whole genome and analyzes the mapping results to identify splice junctions between exons.

  • Installation:

Download the source code from http://tophat.cbcb.umd.edu/. Compile and install the program using:

   ./configure --prefix=/path/to/install/directory/
   make
   make install
  • Example usage:
  • Building the genome index database:
bowtie-build hg18/hg18.fa hg18_bowtie_idx

This builds the Bowtie index for the human genome (hg18) and writes it under the prefix hg18_bowtie_idx, the index name passed to tophat in the next step. Alternatively, download the pre-built index from ftp://ftp.cbcb.umd.edu/pub/data/bowtie_indexes/hg18.ebwt.zip.
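
Before aligning, it can help to confirm that the index is complete. The sketch below assumes a Bowtie 1 index, which consists of six files ending in .1.ebwt, .2.ebwt, .3.ebwt, .4.ebwt, .rev.1.ebwt and .rev.2.ebwt. The function name bowtie_index_complete is hypothetical and not part of Bowtie or TopHat.

```shell
# Return 0 if all six Bowtie 1 index files exist and are non-empty
# for the given index prefix (the second argument of bowtie-build).
# Assumption: a standard Bowtie 1 index with the .ebwt extension.
bowtie_index_complete() {
    prefix=$1
    for suffix in 1 2 3 4 rev.1 rev.2; do
        if [ ! -s "${prefix}.${suffix}.ebwt" ]; then
            echo "missing index file: ${prefix}.${suffix}.ebwt" >&2
            return 1
        fi
    done
    return 0
}
```

For example, `bowtie_index_complete hg18_bowtie_idx || bowtie-build hg18/hg18.fa hg18_bowtie_idx` would rebuild the index only when a file is missing.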

  • Aligning to the genome:
tophat -p 4 -o alignments/ hg18_bowtie_idx s_1_sequence.txt

This aligns the reads in s_1_sequence.txt against the human genome index (hg18_bowtie_idx) using 4 CPU threads (-p 4) and writes the alignments to alignments/accepted_hits.sam in the output directory (-o alignments/). The output SAM file does not contain MD tags, so use the following step to add them.
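
The alignment step can be wrapped in a small function that checks its input and output. This is a minimal sketch: run_tophat and the TOPHAT override variable are hypothetical names introduced here, and the output check assumes the accepted_hits.sam file name used in this tutorial (other TopHat releases may name the output differently).

```shell
# Run TopHat with the options used in this tutorial and verify the result.
# Arguments: index prefix, reads file, output directory, thread count (default 4).
# TOPHAT may be set to an alternative executable path (hypothetical override).
run_tophat() {
    index=$1
    reads=$2
    outdir=$3
    threads=${4:-4}

    # Fail early if the reads file cannot be read.
    [ -r "$reads" ] || { echo "cannot read reads file: $reads" >&2; return 1; }

    "${TOPHAT:-tophat}" -p "$threads" -o "$outdir" "$index" "$reads" || return 1

    # Assumption: this TopHat version writes accepted_hits.sam.
    [ -s "$outdir/accepted_hits.sam" ] || {
        echo "no alignments found in $outdir/accepted_hits.sam" >&2
        return 1
    }
}
```

With the files from this tutorial, the call would be `run_tophat hg18_bowtie_idx s_1_sequence.txt alignments/ 4`.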

  • Adding the MD tags to the SAM file:
samtools view -b -t hg18/hg18.fa.fai alignments/accepted_hits.sam | samtools fillmd - hg18/hg18.fa > alignments/accepted_hits.sam.filledmd

Here hg18/hg18.fa.fai is the index of the human genome FASTA file (generated by samtools faidx) and hg18/hg18.fa is the FASTA sequence of the human genome.
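
If the .fai file does not exist yet, samtools faidx creates it next to the FASTA file. The sketch below builds it only when it is missing; ensure_fai and the SAMTOOLS override variable are hypothetical names introduced for illustration. Note that later samtools releases rename fillmd to calmd.

```shell
# Create the FASTA index (<fasta>.fai) if it is missing or empty,
# then print the path of the index file.
# SAMTOOLS may be set to an alternative executable path (hypothetical override).
ensure_fai() {
    fasta=$1
    if [ ! -s "${fasta}.fai" ]; then
        # samtools faidx writes the index to <fasta>.fai
        "${SAMTOOLS:-samtools}" faidx "$fasta" || return 1
    fi
    echo "${fasta}.fai"
}
```

For example, `ensure_fai hg18/hg18.fa` prints hg18/hg18.fa.fai, which can then be passed to `samtools view -t`.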

For more detailed information, see http://tophat.cbcb.umd.edu/tutorial.html.
