Tophat tutorial

From Icbwiki

Revision as of 14:03, 11 May 2010 by Tas2019; current revision by Ole2001 (Elemento Lab TopHat pipeline).

  • TopHat

TopHat is a spliced read mapper for mRNA-seq reads. It uses the short read aligner Bowtie to map reads against the whole genome and analyzes the mapping results to identify splice junctions between exons.

  • Installation:

Download the source code from http://tophat.cbcb.umd.edu/. Compile and install the program using the commands below, where /path/to/bam is the directory that contains include/bam and lib/libbam.a:

   ./configure --prefix=/path/to/install/directory/ --with-bam=/path/to/bam
   make
   make install
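Because ./configure needs the --with-bam directory to contain include/bam and lib/libbam.a, it can help to check that layout first. The following is a minimal sketch in POSIX sh; the function name check_bam_dir is hypothetical and is not part of TopHat.

```shell
#!/bin/sh
# Hypothetical helper (not part of TopHat): verify that the directory
# intended for --with-bam contains include/bam and lib/libbam.a.
# Returns 0 if both are present, 1 otherwise (with a message on stderr).
check_bam_dir() {
    bam_root=$1
    if [ ! -d "$bam_root/include/bam" ]; then
        echo "missing $bam_root/include/bam" >&2
        return 1
    fi
    if [ ! -f "$bam_root/lib/libbam.a" ]; then
        echo "missing $bam_root/lib/libbam.a" >&2
        return 1
    fi
    return 0
}
```

For example, run configure only when the check passes:

   check_bam_dir /path/to/bam && ./configure --prefix=/path/to/install/directory/ --with-bam=/path/to/bam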
  • Example usage:
  • Building the genome index database:
bowtie-build hg18/hg18.fa hg18_bowtie_idx

This builds the Bowtie index for the human genome hg18 from its FASTA sequence (hg18/hg18.fa), using hg18_bowtie_idx as the prefix of the index files. Alternatively, you can download a pre-built index from ftp://ftp.cbcb.umd.edu/pub/data/bowtie_indexes/hg18.ebwt.zip.
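Building the hg18 index takes a long time, so a script may want to skip bowtie-build when the index already exists. This sketch assumes Bowtie 1.x naming, where an index with prefix P consists of P.1.ebwt to P.4.ebwt plus P.rev.1.ebwt and P.rev.2.ebwt; the function name bowtie_index_exists is hypothetical. If you use the pre-built download instead, pass whatever prefix the unpacked files share.

```shell
#!/bin/sh
# Hypothetical helper: return 0 only if all six Bowtie 1.x index files
# for the given prefix exist (assumes the ".ebwt" file naming).
bowtie_index_exists() {
    prefix=$1
    for suffix in 1.ebwt 2.ebwt 3.ebwt 4.ebwt rev.1.ebwt rev.2.ebwt; do
        [ -f "$prefix.$suffix" ] || return 1
    done
    return 0
}
```

For example:

   bowtie_index_exists hg18_bowtie_idx || bowtie-build hg18/hg18.fa hg18_bowtie_idx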

  • Aligning to the genome:
tophat -p 4 -o alignments/ hg18_bowtie_idx s_1_sequence.txt

This aligns the sequence file (s_1_sequence.txt) against the human genome index (hg18_bowtie_idx) using 4 threads (-p 4) and writes the alignment output (alignments/accepted_hits.sam) to the output directory (-o alignments/). This SAM file does not contain MD tags, so use the step below to fill them in.
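When several lanes are sequenced, each TopHat run needs its own -o directory, otherwise the runs overwrite each other's accepted_hits file. The sketch below only prints one command per read file (a dry run); it assumes lane files follow the s_N_sequence.txt pattern, and both the function name tophat_commands and the alignments_s_N directory naming are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) one TopHat command per read file.
# Usage: tophat_commands INDEX THREADS READS...
# Each lane gets its own output directory, e.g. alignments_s_1/.
tophat_commands() {
    index=$1
    threads=$2
    shift 2
    for reads in "$@"; do
        # s_1_sequence.txt -> s_1
        lane=$(basename "$reads" _sequence.txt)
        echo "tophat -p $threads -o alignments_$lane/ $index $reads"
    done
}
```

Review the printed commands, then pipe them to sh to execute them:

   tophat_commands hg18_bowtie_idx 4 s_1_sequence.txt s_2_sequence.txt | sh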

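If the output is in BAM format (the compressed binary form of SAM), you can convert it to plain-text SAM using

samtools view file.bam

which prints the alignments to standard output; add -h to include the header lines, and redirect the output to save it to a file.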
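To decide automatically whether an output file is BAM or plain SAM, one heuristic is to look at its first two bytes: BAM files are BGZF-compressed, and BGZF is a gzip variant, so they begin with the gzip magic bytes 1f 8b. This is only a sketch: any gzip file passes the check, so it does not validate BAM contents. The function name looks_compressed is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: return 0 if the file starts with the gzip magic
# bytes (1f 8b), as BGZF-compressed BAM files do. Heuristic only: it
# cannot tell a BAM file apart from any other gzip file.
looks_compressed() {
    magic=$(od -An -tx1 -N2 "$1" 2>/dev/null | tr -d ' \n')
    [ "$magic" = "1f8b" ]
}
```

For example:

   looks_compressed file.bam && samtools view -h file.bam > file.sam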


  • Adding the MD tags to the SAM file:
samtools view -b -t hg18/hg18.fa.fai alignments/accepted_hits.sam | samtools fillmd - hg18/hg18.fa > alignments/accepted_hits.sam.filledmd

or to save bandwidth and disk space, use:

samtools view -b -t hg18/hg18.fa.fai alignments/accepted_hits.sam | samtools fillmd - hg18/hg18.fa | gzip > alignments/accepted_hits.sam.filledmd.gz

where hg18/hg18.fa.fai is the FASTA index of the human genome created by samtools (samtools faidx hg18/hg18.fa), and hg18/hg18.fa is the FASTA sequence of the human genome.
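After fillmd, you can check that the alignments really carry MD tags by counting records without one. In SAM, lines starting with @ are headers and optional tags such as MD:Z: begin at column 12. This sketch assumes GNU gzip, whose -dcf options pass uncompressed input through unchanged, so the same function reads both accepted_hits.sam.filledmd and accepted_hits.sam.filledmd.gz. The function name count_missing_md is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the number of SAM records lacking an MD tag.
# Works on plain or gzip-compressed SAM (gzip -dcf copies non-gzip input
# through unchanged, as GNU gzip does).
count_missing_md() {
    gzip -dcf "$1" | awk -F '\t' '
        /^@/ { next }                      # skip header lines
        {
            has_md = 0
            for (i = 12; i <= NF; i++)     # optional tags start at column 12
                if ($i ~ /^MD:Z:/) { has_md = 1; break }
            if (!has_md) missing++
        }
        END { print missing + 0 }'
}
```

A result of 0 means every alignment record has an MD tag:

   count_missing_md alignments/accepted_hits.sam.filledmd.gz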

For more detailed information, please go to http://tophat.cbcb.umd.edu/tutorial.html.
