Elementolab/RNA-seq pipeline
From Icbwiki
This page describes our cluster-based pipeline for RNA-seq processing.
The pipeline currently assumes that Summary.xml files exist (in the same format as those produced by CASAVA) and that they contain information about lane, sample name, species, etc.
<Samples>
  <Lane>
    <laneNumber>1</laneNumber>
    <sampleId>CA10S</sampleId>
    <species>human</species>
    <barcode>GTGGCC</barcode>
  </Lane>
  <Lane>
    <laneNumber>2</laneNumber>
    <sampleId>CA12S</sampleId>
    <species>human</species>
    <barcode>GTTTCG</barcode>
  </Lane>
  ...
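As a quick sanity check before running the pipeline, the per-lane fields can be read out of a Summary.xml with a few lines of Python. This is a minimal sketch, not part of the pipeline itself: the XML string below is a hypothetical complete file modeled on the excerpt above (the closing </Samples> tag is an assumption), and `read_lanes` is an illustrative helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical complete Summary.xml content modeled on the excerpt above;
# the closing </Samples> tag is assumed, since the excerpt is truncated.
SUMMARY_XML = """<Samples>
  <Lane>
    <laneNumber>1</laneNumber>
    <sampleId>CA10S</sampleId>
    <species>human</species>
    <barcode>GTGGCC</barcode>
  </Lane>
  <Lane>
    <laneNumber>2</laneNumber>
    <sampleId>CA12S</sampleId>
    <species>human</species>
    <barcode>GTTTCG</barcode>
  </Lane>
</Samples>"""


def read_lanes(xml_text):
    """Return one dict per <Lane> element, keyed by child tag name."""
    root = ET.fromstring(xml_text)
    lanes = []
    # iter() also finds <Lane> elements if <Samples> is nested in a larger file.
    for lane in root.iter("Lane"):
        lanes.append({child.tag: (child.text or "").strip() for child in lane})
    return lanes


lanes = read_lanes(SUMMARY_XML)
for lane in lanes:
    print(lane["laneNumber"], lane["sampleId"], lane["species"], lane["barcode"])
```

For a file on disk, `ET.parse(path).getroot()` can replace `ET.fromstring(xml_text)`.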
- to create the Summary.xml file based on GCF core read files (Cornell specific)
perl ~/PROGRAMS/SNPseeqer/SCRIPTS/casava1.8_merge_move_data.pl --dir="Sa*" --tgtdir=.
Options:
--exec=0 for a dry run
--rnaseqrun=1 to automatically run full pipeline (a bit unreliable)
--pe=0 for single-read (SR) data
- to get information about files in directories based on Summary.xml
perl ~/PROGRAMS/SNPseeqer/SCRIPTS/RNAseq_sampleinfo.pl --dirs=.
- to run TopHat+Cufflinks with RefSeq annotation (default)
perl ~/PROGRAMS/SNPseeqer/SCRIPTS/RNAseq_run.pl --dirs=. --submit=1 --numcpus=8
The script autodetects the species and aligns the reads to the corresponding genome.
If the data is stored on /zenodotus, use
--target=zenodotus (it will autodetect anyway)
The default human genome is hg18 (for historical reasons). To use hg19, use
--genomeversion=hg19
- to combine RNA-seq runs into an annotated matrix
perl ~/PROGRAMS/SNPseeqer/SCRIPTS/RNAseq_summarize.pl --dirs=. --outfile=mat_vivek_RNAseq.txt --species=mouse --genes=1
To change sample names, use
--sampleinfo=tablenames.txt # tab delim, first col is core name, second col is full name
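The sample-name table can be generated rather than typed by hand. The sketch below writes the two-column, tab-delimited layout described above (core name, then full name). The full names are made-up placeholders, `write_sampleinfo` is an illustrative helper, and the absence of a header row is an assumption, since the page does not say whether the script expects one.

```python
import csv
import io


def write_sampleinfo(name_map, handle):
    """Write one 'core name<TAB>full name' row per sample, with no header row."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for core_name, full_name in name_map.items():
        writer.writerow([core_name, full_name])


# Core names are taken from the Summary.xml excerpt; full names are hypothetical.
name_map = {"CA10S": "condition_A_rep1", "CA12S": "condition_A_rep2"}

# An in-memory buffer is used here; to create the real file, pass
# open("tablenames.txt", "w", newline="") instead.
buffer = io.StringIO()
write_sampleinfo(name_map, buffer)
table_text = buffer.getvalue()
print(table_text, end="")
```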
- to make BigWig tracks and upload them to the lab's web server (track data is written to the screen by default)
perl ~/PROGRAMS/SNPseeqer/SCRIPTS/RNAseq_makeBigWigTracks.pl --dirs=. --tophat=1 --verbose=1 --updir=AIDiPSC
Options:
--normalize=1 to normalize read counts
--trackfile=FILE to output track data to file
--genome=STR to specify genome, e.g. hg19
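The page does not say which normalization --normalize=1 applies. One common convention for coverage tracks is scaling to reads per million mapped reads (RPM), which makes tracks from libraries of different depth comparable. The sketch below illustrates that convention only; it is an assumption, not a description of what RNAseq_makeBigWigTracks.pl does, and `scale_to_rpm` is a hypothetical helper.

```python
def scale_to_rpm(coverage, total_mapped_reads):
    """Scale raw per-position coverage to reads per million mapped reads."""
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    factor = 1_000_000 / total_mapped_reads
    return [depth * factor for depth in coverage]


# Illustrative numbers: a library with 2 million mapped reads halves each depth.
rpm = scale_to_rpm([0, 5, 10], 2_000_000)
print(rpm)  # [0.0, 2.5, 5.0]
```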
- to generate basic QC data, e.g. the number of FASTQ reads, the number of aligned reads, and the number of uniquely aligned reads
perl ~/PROGRAMS/SNPseeqer/SCRIPTS/RNAseq_TopHatQC.pl --dirs=.
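The first of these QC numbers, the FASTQ read count, is easy to cross-check independently. A minimal sketch, assuming standard four-line FASTQ records (no wrapped sequence lines); `count_fastq_reads` and `open_fastq` are illustrative helpers, not part of the pipeline scripts.

```python
import gzip
import io


def count_fastq_reads(handle):
    """Count records in a FASTQ stream, assuming exactly four lines per record."""
    # Counting lines is safer than counting '@' lines, because quality
    # strings may themselves begin with '@'.
    n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError("line count %d is not a multiple of 4" % n_lines)
    return n_lines // 4


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


# Tiny in-memory example with two records.
FASTQ_TEXT = "@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\n@III\n"
n_reads = count_fastq_reads(io.StringIO(FASTQ_TEXT))
print(n_reads)  # 2
```

For a real file, use `with open_fastq(path) as handle: count_fastq_reads(handle)`.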
- to look for fusions (using TopHat-Fusion)
perl ~/PROGRAMS/SNPseeqer/SCRIPTS/RNAseq_fusion.pl --dirs=001,002 --submit=1 --genomeversion=hg19