Elementolab/small RNA-seq pipeline
From Icbwiki
- Clean up raw files (merge and move the CASAVA 1.8 output)
perl ~/PROGRAMS/SNPseeqer/SCRIPTS/casava1.8_merge_move_data_old.pl --dir="Sa*" --tgtdir=. --exec=1 --pe=0
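This page does not say exactly what the clean-up script does. As a rough illustration of what "merge and move" could involve, the sketch below assumes the usual CASAVA 1.8 layout, where each `Sample_*` directory holds several gzipped FASTQ chunks (`*_R1_001.fastq.gz`, `*_R1_002.fastq.gz`, ...), and joins them into one file per sample. The output name `s_<sample>.txt.gz` is a guess modeled on the `s_*.txt.gz` files used in the adapter step, and `merge_sample_dirs` is a hypothetical helper, not the Perl script's interface.

```python
import glob
import os
import shutil


def merge_sample_dirs(pattern="Sample_*", tgtdir="."):
    """Concatenate the gzipped R1 FASTQ chunks of each CASAVA 1.8 sample directory.

    Hypothetical sketch: concatenated gzip members form a valid gzip stream,
    so the chunks can be joined byte for byte without decompressing them.
    """
    merged = []
    for sample_dir in sorted(glob.glob(pattern)):
        chunks = sorted(glob.glob(os.path.join(sample_dir, "*_R1_*.fastq.gz")))
        if not chunks:
            continue
        # Assumed naming: Sample_CA10S -> s_CA10S.txt.gz
        name = os.path.basename(sample_dir)
        if name.startswith("Sample_"):
            name = name[len("Sample_"):]
        out_path = os.path.join(tgtdir, f"s_{name}.txt.gz")
        with open(out_path, "wb") as out:
            for chunk in chunks:
                with open(chunk, "rb") as src:
                    shutil.copyfileobj(src, out)
        merged.append(out_path)
    return merged
```

Copying the compressed bytes avoids a decompress/recompress round trip; any gzip reader (including `zcat`) reads the result as one continuous FASTQ.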
- Remove adapters
If the adapters have already been masked (e.g., by the core facility):
for f in s_*.txt.gz; do gr_5n "perl /home/ole2001/PROGRAMS/SNPseeqer/SCRIPTS/fastq_remove_N_adapters.pl $f" --name=rem$f; done
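A minimal sketch of masked-adapter removal, assuming the masking replaced the adapter bases with a run of `N` at the 3' end of each read: the trailing `N` run is cut from the sequence and the quality string is trimmed to the same length. `trim_masked_adapter`, `trim_fastq` and the script name `trim_masked.py` are hypothetical; `fastq_remove_N_adapters.pl` may apply different rules.

```python
import gzip
import re
import sys

# Assumption: masked adapter bases are a run of N at the 3' end of the read
TRAILING_N = re.compile(r"N+$")


def trim_masked_adapter(seq, qual):
    """Remove a trailing run of N from the sequence and the matching quality characters."""
    match = TRAILING_N.search(seq)
    cut = match.start() if match else len(seq)
    return seq[:cut], qual[:cut]


def trim_fastq(lines):
    """Yield trimmed 4-line FASTQ records; an incomplete final record is dropped by zip."""
    it = iter(lines)
    for header, seq, plus, qual in zip(it, it, it, it):
        seq, qual = trim_masked_adapter(seq.rstrip("\n"), qual.rstrip("\n"))
        yield f"{header.rstrip()}\n{seq}\n{plus.rstrip()}\n{qual}\n"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python trim_masked.py s_1.txt.gz | gzip > s_1.txt.gz.noadapt.gz
    with gzip.open(sys.argv[1], "rt") as fh:
        sys.stdout.writelines(trim_fastq(fh))
```

Only a run that reaches the end of the read is removed, so an isolated `N` inside the insert is left alone.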
Otherwise, trim them with Smith-Waterman local alignment:
for f in *.gz; do echo $f; gr_5n "/home/ole2001/PROGRAMS/SMITH-WATER/remove_miRNA_adapter -fastqfile $f | gzip > $f.noadapt.gz" -name S$f ; done
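The scoring used by `remove_miRNA_adapter` is not documented here. The sketch below shows the general idea behind Smith-Waterman adapter removal: locally align the 3' adapter against the read and, if the best local alignment scores high enough, cut the read where that alignment begins. The adapter sequence, the scores (+2 match, -1 mismatch, -2 gap) and the `min_score` threshold are all assumptions, and the function names are hypothetical.

```python
# Assumed 3' adapter; replace with the sequence from your library prep kit
ADAPTER = "TGGAATTCTCGGGTGCCAAGG"


def best_local_alignment(read, adapter, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman: return (best score, 0-based read position where that alignment starts)."""
    rows, cols = len(read) + 1, len(adapter) + 1
    score = [[0] * cols for _ in range(rows)]
    start = [[0] * cols for _ in range(rows)]  # read index where the path through each cell began
    best, best_start = 0, len(read)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if read[i - 1] == adapter[j - 1] else mismatch)
            up = score[i - 1][j] + gap    # read base aligned to a gap
            left = score[i][j - 1] + gap  # adapter base aligned to a gap
            cell = max(0, diag, up, left)
            score[i][j] = cell
            if cell == 0:
                continue  # local alignment: negative paths are reset to zero
            if cell == diag:
                # extend the previous alignment, or start a new one at read[i - 1]
                start[i][j] = start[i - 1][j - 1] if score[i - 1][j - 1] > 0 else i - 1
            elif cell == up:
                start[i][j] = start[i - 1][j]
            else:
                start[i][j] = start[i][j - 1]
            if cell > best:
                best, best_start = cell, start[i][j]
    return best, best_start


def trim_adapter(read, adapter=ADAPTER, min_score=12):
    """Cut the read where the adapter alignment begins if it scores at least min_score."""
    best, start = best_local_alignment(read, adapter)
    return read[:start] if best >= min_score else read
```

Because the alignment is local, a read that contains only the first part of the adapter (the usual case for short miRNA inserts) is still trimmed, as long as that part scores above the threshold. In a FASTQ loop, the quality line would be cut to the same length as the trimmed sequence.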
- Align reads (human, hg19)
for f in *.noadapt.gz; do PBS_quickAlign.pl --fastq=$f --bwaidx=/pbtech_mounts/oelab_scratch002/ole2001/DATA/GENOMES/hg19/hg19bwaidx --submit=1; done
For mouse, use the mm9 index instead:
for f in *.noadapt.gz; do PBS_quickAlign.pl --fastq=$f --bwaidx=/home/ole2001/PROGRAMS/SNPseeqer/REFDATA/mm9/mm9idx --submit=1; done
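The human and mouse runs differ only in the `--bwaidx` path, so the choice can be driven by a genome-to-index table. This sketch only builds the `PBS_quickAlign.pl` command lines shown above and submits nothing; the two index paths are copied from this page, while `BWA_INDEX` and `align_commands` are hypothetical names.

```python
import glob
import shlex

# Index paths copied from the alignment commands above
BWA_INDEX = {
    "hg19": "/pbtech_mounts/oelab_scratch002/ole2001/DATA/GENOMES/hg19/hg19bwaidx",
    "mm9": "/home/ole2001/PROGRAMS/SNPseeqer/REFDATA/mm9/mm9idx",
}


def align_commands(fastq_files, genome):
    """Return one PBS_quickAlign.pl command line per adapter-trimmed FASTQ file."""
    index = BWA_INDEX[genome]  # raises KeyError for a genome without a known index
    return [
        f"PBS_quickAlign.pl --fastq={shlex.quote(f)} --bwaidx={index} --submit=1"
        for f in sorted(fastq_files)
    ]


if __name__ == "__main__":
    # Dry run: print the commands for the trimmed files in the current directory
    for cmd in align_commands(glob.glob("*.noadapt.gz"), "hg19"):
        print(cmd)
```

Printing first gives a dry run to check before piping the lines to `sh`.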
- Get a sample list
perl ~/PROGRAMS/SNPseeqer/SCRIPTS/RNAseq_sampleinfo.pl --dirs=. > sample_infos.txt
Typical output:
. 1 CA10S GTGGCC human
. 2 CA12S GTTTCG human
. 3 CA13S CATGGC human
. 4 CA14S CATTTT human
. 5 CA16S CCAACA human
. 6 CA1S AGTCAA human
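Each row of the sample list appears to hold the run directory, a sample number, the sample name, a 6-nt index sequence and the species. Those column names are inferred from the example above, not taken from the script's documentation. A sketch that reads the file into records (`SampleInfo` and `read_sample_infos` are hypothetical names):

```python
from typing import NamedTuple


class SampleInfo(NamedTuple):
    # Column names are inferred from the example output, not documented
    directory: str
    number: int
    name: str
    barcode: str
    species: str
    extra: tuple = ()  # any further columns, e.g. a group label


def read_sample_infos(lines):
    """Parse whitespace-separated sample list rows, skipping blank lines."""
    samples = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        directory, number, name, barcode, species, *extra = fields
        samples.append(SampleInfo(directory, int(number), name, barcode, species, tuple(extra)))
    return samples
```

Usage: `with open("sample_infos.txt") as fh: samples = read_sample_infos(fh)`.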
- Run miRSeeqer
perl /home/ole2001/PROGRAMS/miRSeeqer/run_miRSeeqer.pl --samples=sample_infos.txt --mir=1 --genome=hg19
- Summarize the statistics
perl ~/PROGRAMS/miRSeeqer/run_miRSeeqer_summarize_stats.pl --samples=sample_infos.txt
- Add sample group information as an extra column, e.g., in a new file called sample_infos.txt.withgroups:
. 1 CA10S GTGGCC human HCC
. 2 CA12S GTTTCG human HCC
. 3 CA13S CATGGC human HCC
. 4 CA14S CATTTT human HCA
. 5 CA16S CCAACA human HCA
. 6 CA1S AGTCAA human HCA
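The grouped file is the sample list with one extra column, so it can be generated from a name-to-group mapping instead of being edited by hand. In this sketch the mapping is supplied directly, `add_groups` is a hypothetical helper, and the tab before the new column is an assumption about what the downstream scripts accept. A sample missing from the mapping raises an error, so none is silently left ungrouped.

```python
def add_groups(lines, groups):
    """Append a group label, looked up by sample name (third column), to each row."""
    out = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        name = fields[2]
        if name not in groups:
            raise KeyError(f"no group for sample {name}")
        # keep the original row as-is and add the group as a new last column
        out.append(f"{line.rstrip()}\t{groups[name]}\n")
    return out
```

Usage, with the labels from the example: read `sample_infos.txt`, call `add_groups(lines, {"CA10S": "HCC", ..., "CA1S": "HCA"})`, and write the result to `sample_infos.txt.withgroups`.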
- Run a pairwise limma analysis on all pairs of groups
perl ~/PERL_SCRIPTS/expression_split_files_based_on_sample_info.pl --expfile=carcinoid_miRNA.txt --sampleinfo=sample_infos.txt.withgroups --mirnas=1
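"Pairwise on all groups" means one comparison per unordered pair of groups, so k groups give k(k-1)/2 comparisons (a single HCC vs HCA comparison for the example above). The sketch below enumerates those pairs and the samples in each group from the grouped sample file. It only illustrates the bookkeeping: it does not reproduce what `expression_split_files_based_on_sample_info.pl` writes, and it does not run limma. Reading the group from the last column and the name `pairwise_comparisons` are assumptions.

```python
from collections import defaultdict
from itertools import combinations


def pairwise_comparisons(lines):
    """Map each unordered pair of groups to the sample names in each of the two groups."""
    members = defaultdict(list)
    for line in lines:
        fields = line.split()
        if fields:
            # Assumed layout: sample name in column 3, group label in the last column
            members[fields[-1]].append(fields[2])
    return {
        (a, b): (members[a], members[b])
        for a, b in combinations(sorted(members), 2)
    }
```

Sorting the group labels makes the pair order, and therefore the direction of each comparison, reproducible between runs.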