Elementolab/miRSeeqer

From Icbwiki

Jump to: navigation, search

miRSeeqer is the Elemento Lab small-RNA-seq analysis toolkit. It can do adapter trimming, miRNA level quantification and normalization, etc.

To install

svn checkout https://pbtech-vc.med.cornell.edu/public/svn/elementolab/miRSeeqer/trunk miRSeeqer
svn checkout https://pbtech-vc.med.cornell.edu/public/svn/elementolab/BIO-C/trunk BIO-C  # (if not already installed)
cd miRSeeqer
make

miRSeeqer works from adapter-trimmed sorted and indexes BAM files.

To remove adapters, we provide a Smith-Waterman-based program, which works directly from gzipped FASTQ files

remove_miRNA_adapter s_9_sequence.txt.gz | gzip >  s_9_sequence.txt.gz.noadapt.gz

(other programs, e.g. FAR work well too)

Trimmed small RNA-seq reads can then be aligned using BWA (we typically use default parameters). Feel free to consult our BWA tutorial (section on genomic alignment) for details on this procedure.

miRSeeqer can then quantify levels of known miRNAs based on GFF3 annotation from miRBase, e.g. ftp://mirbase.org/pub/mirbase/CURRENT/genomes/hsa.gff3

miRSeeqer -bamfile s_9_sequence.txt.gz.noadapt.sorted.bam -gff3file hsa.gff3.hg19 -chrdata hg19.chrdata -outfile s_9_sequence.txt.gz.noadapt.sorted.bam.CNTS

The command above will alto produce a .stats file, with some statistics about the alignments, number of miRNA-matching reads, etc.

By default miRSeeqer produces the raw counts. Counts can also be normalized by the total number of miRNA-matching sequences (and multiplied by 1M) using:

-normalize MIR -normto 1000000

One can also specify a header. For example, if the sample label is EXP1, use

-lanedesc EXP1

Also note that miRSeeqer produces .stats files that contain basic statistics about the number of reads, the number and % of miRNA-mapping reads, etc.

To run miRseeqer on multiple samples (1 per lane for example), you need to create a tab-delimited file like this:

.       1       11S             GGCTAC  human
.       2       12S             CTTGTA  human
.       3       13S             AGTCAA  human
.       4       16S             AGTTCC  human
.       5       18S             ATGTCA  human

Col 1 = directory, Col 2 = lane number (used to find the miRSeeqer output file, e.g. s_9_sequence.txt.gz.noadapt.sorted.bam.CNTS), col 3 = sample name.

Then, run the following script:

perl run_miRSeeqer.pl --samples=sample_infos.txt --mir=1 --genome=hg19


You can then agglomerate stats from the samples using:

perl run_miRSeeqer_summarize_stats.pl --samples=sample_infos.txt
Personal tools