Elementolab/RNA-seq pipeline

This page describes our cluster-based pipeline for RNA-seq processing.

The pipeline currently assumes that Summary.xml files exist (in the same format as those produced by CASAVA) and contain information about lane, sample name, species, etc., for example:

<Samples>
 <Lane>
  <laneNumber>1</laneNumber>
  <sampleId>CA10S</sampleId>
  <species>human</species>
  <barcode>GTGGCC</barcode>
 </Lane>
 <Lane>
  <laneNumber>2</laneNumber>
  <sampleId>CA12S</sampleId>
  <species>human</species>
  <barcode>GTTTCG</barcode>
 </Lane>
...
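
To check what a Summary.xml file contains before running the pipeline, the layout above can be read with a few lines of Python. This is only a sketch: it assumes the file is rooted at Samples with one Lane element per sample, as in the excerpt, and that the root element is closed at the end of the file (the excerpt is truncated). The function name read_lanes and the EXAMPLE text are illustrative and not part of the pipeline.

```python
import xml.etree.ElementTree as ET

# Stand-in for a complete Summary.xml. ASSUMPTION: the closing </Samples>
# tag and the absence of other fields are guesses; the real file may differ.
EXAMPLE = """<Samples>
 <Lane>
  <laneNumber>1</laneNumber>
  <sampleId>CA10S</sampleId>
  <species>human</species>
  <barcode>GTGGCC</barcode>
 </Lane>
 <Lane>
  <laneNumber>2</laneNumber>
  <sampleId>CA12S</sampleId>
  <species>human</species>
  <barcode>GTTTCG</barcode>
 </Lane>
</Samples>"""

# The four per-lane fields shown in the excerpt.
FIELDS = ("laneNumber", "sampleId", "species", "barcode")


def read_lanes(xml_text):
    """Return one dict per <Lane> element, keyed by the names in FIELDS."""
    root = ET.fromstring(xml_text)
    lanes = []
    for lane in root.iter("Lane"):
        # findtext returns None when a field is missing from a lane
        lanes.append({field: lane.findtext(field) for field in FIELDS})
    return lanes


if __name__ == "__main__":
    # Print one tab-separated line per lane
    for lane in read_lanes(EXAMPLE):
        print("\t".join(str(lane[field]) for field in FIELDS))
```

For a file on disk, read its text first (for example with open("Summary.xml").read()) and pass it to read_lanes.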
  • to create the Summary.xml file from GCF core read files (Cornell-specific)
perl ~/PROGRAMS/SNPseeqer/SCRIPTS/casava1.8_merge_move_data.pl --dir="Sa*" --tgtdir=. 

Options:

--exec=0  for a dry run
--rnaseqrun=1  to automatically run the full pipeline (a bit unreliable)
--pe=0  for single-read (SR) data
  • to get information about files in directories based on Summary.xml
perl ~/PROGRAMS/SNPseeqer/SCRIPTS/RNAseq_sampleinfo.pl --dirs=.
  • to run TopHat+Cufflinks with RefSeq annotation (default)
perl ~/PROGRAMS/SNPseeqer/SCRIPTS/RNAseq_run.pl  --dirs=. --submit=1 --numcpus=8

The script will autodetect the species, align the reads to the corresponding genome, etc.

If the data is stored on /zenodotus, use

--target=zenodotus (it will autodetect anyway)

The default human genome is hg18 (for historical reasons). To use hg19, add

--genomeversion=hg19
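
When the same run has to be launched for several directories or genome versions, it can help to assemble the command line in one place. The sketch below only builds and prints the command string; it does not submit anything. It uses just the options listed above (--dirs, --submit, --numcpus, --target, --genomeversion). The helper name build_run_command is hypothetical, and values are not shell-quoted, so it assumes simple arguments without spaces.

```python
# Path of the run script as given on this page.
SCRIPT = "~/PROGRAMS/SNPseeqer/SCRIPTS/RNAseq_run.pl"


def build_run_command(dirs=".", submit=1, numcpus=8,
                      genomeversion=None, target=None):
    """Return the RNAseq_run.pl command line as a single string.

    Hypothetical helper: only options documented on this page are emitted,
    and the optional ones are left out unless a value is given.
    """
    args = [
        "perl", SCRIPT,
        f"--dirs={dirs}",
        f"--submit={submit}",
        f"--numcpus={numcpus}",
    ]
    if genomeversion is not None:
        args.append(f"--genomeversion={genomeversion}")
    if target is not None:
        args.append(f"--target={target}")
    # No shell quoting is applied, so "~" is still expanded by the shell
    return " ".join(args)


if __name__ == "__main__":
    print(build_run_command())
    print(build_run_command(genomeversion="hg19", target="zenodotus"))
```

Printing the command first and pasting it into the shell keeps the submission step manual, which is convenient while checking options.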
  • to combine RNA-seq runs into an annotated matrix
perl ~/PROGRAMS/SNPseeqer/SCRIPTS/RNAseq_summarize.pl  --dirs=. --outfile=mat_vivek_RNAseq.txt --species=mouse --genes=1

To change sample names, use

--sampleinfo=tablenames.txt # tab-delimited; first column is the core name, second column is the full name
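
The sample-name table is easy to get wrong by hand (spaces instead of tabs, missing columns), so a small script can write and check it. The sketch below assumes two tab-separated columns and no header line; whether RNAseq_summarize.pl tolerates a header is not stated here. The function names and the full names sample_A and sample_B are made up for illustration; the core names are the sampleId values from the Summary.xml excerpt.

```python
import csv
import io
import sys


def write_sampleinfo(pairs, handle):
    """Write (core name, full name) pairs as tab-delimited lines."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for core_name, full_name in pairs:
        writer.writerow([core_name, full_name])


def read_sampleinfo(handle):
    """Read the table back into a dict mapping core name -> full name."""
    mapping = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        if len(row) < 2:
            raise ValueError(f"expected two tab-separated columns, got: {row!r}")
        mapping[row[0]] = row[1]
    return mapping


if __name__ == "__main__":
    # Hypothetical full names, for illustration only
    pairs = [("CA10S", "sample_A"), ("CA12S", "sample_B")]
    write_sampleinfo(pairs, sys.stdout)
```

To produce the file itself, pass an open file handle (for example open("tablenames.txt", "w", newline="")) instead of sys.stdout.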
  • to make BigWig tracks and upload them to the lab's web server (track data is written to the screen by default)
perl ~/PROGRAMS/SNPseeqer/SCRIPTS/RNAseq_makeBigWigTracks.pl --dirs=. --tophat=1 --verbose=1 --updir=AIDiPSC

Options:

--normalize=1 to normalize read counts
--trackfile=FILE to output track data to file
--genome=STR to specify the genome, e.g. hg19
  • to generate basic QC data, e.g. number of fastq reads, number of aligned reads, number of uniquely aligned reads
perl ~/PROGRAMS/SNPseeqer/SCRIPTS/RNAseq_TopHatQC.pl --dirs=.


  • to look for fusions (using TopHat-Fusion)
perl ~/PROGRAMS/SNPseeqer/SCRIPTS/RNAseq_fusion.pl --dirs=001,002 --submit=1 --genomeversion=hg19