Elementolab/RNA-seq RPKMs
Computing RPKM values from RNA-seq data
Steps:
1. Aligning the raw sequence data (FASTQ or FASTA format) to annotated RefSeq mRNAs
- fastq.gz -> bwa
2. Creating the alignment in SAM format
- bwa -> sam
3. Converting RefSeq-based alignments to genome-based alignments
- sam -> genome.sam
4. Converting SAM to BAM format
- genome.sam -> genome.bam
5. Creating a coordinate-sorted BAM file
- genome.bam -> genome.sorted.bam
6. Indexing the sorted BAM file
- genome.sorted.bam -> genome.sorted.bam.bai
7. Computing RPKM (reads per kilobase per million mapped reads) expression values from a sorted BAM file
- genome.sorted.bam (indexed) -> RNAseq_RPKMs
fastq.gz -> bwa -> sam -> genome.sam -> genome.bam -> genome.sorted.bam -> genome.sorted.bam.bai -> RNAseq_RPKMs
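The chain above can be sketched as a small shell helper that lists the file expected after each stage for a given input. This is only an illustration: the function name stage_files is hypothetical, and the suffixes are taken from the file names used in the commands on this page.

```shell
#!/bin/sh
# Print the file expected after each pipeline stage for one input FASTQ file.
# stage_files is a hypothetical helper; suffixes follow the naming on this page.
stage_files() {
    base=${1%.fastq.gz}    # strip the .fastq.gz suffix from the input name
    for suffix in bwa sam genome.sam genome.bam genome.sorted.bam genome.sorted.bam.bai; do
        printf '%s.%s\n' "$base" "$suffix"
    done
    printf '%s_RPKM.txt\n' "$base"    # final table of RPKM values
}

stage_files RnaSeq_data.fastq.gz
```

Running it before starting a long job gives a quick checklist of the intermediate files to expect in the working directory.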
Commands:
1. Aligning the raw sequence data (FASTQ or FASTA format) to annotated RefSeq mRNAs
bwa aln -t 4 ~/bwa/RefSeqbwaidx RnaSeq_data.fastq.gz > RnaSeq_data.bwa
2. Creating alignment in the SAM format (a generic format for storing large nucleotide sequence alignments)
bwa samse ~/bwa/RefSeqbwaidx RnaSeq_data.bwa RnaSeq_data.fastq.gz > RnaSeq_data.sam
3. Converting RefSeq-based alignments to genome-based alignments
./~/geneModel align -cmd ref2g -i RnaSeq_data.sam -o RnaSeq_data.genome.sam
4. Converting SAM to BAM format (binary format)
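samtools import ~/bwa/wg.fa.fai RnaSeq_data.genome.sam RnaSeq_data.genome.bam #legacy samtools (0.1.x) syntax; with samtools 1.x use: samtools view -b -t ~/bwa/wg.fa.fai -o RnaSeq_data.genome.bam RnaSeq_data.genome.sam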
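5. Creating a coordinate-sorted BAM file
samtools sort RnaSeq_data.genome.bam RnaSeq_data.genome.sorted #legacy samtools (0.1.x) takes an output prefix and appends .bam, giving RnaSeq_data.genome.sorted.bam; with samtools 1.x use: samtools sort -o RnaSeq_data.genome.sorted.bam RnaSeq_data.genome.bam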
6. Indexing the sorted BAM file
samtools index RnaSeq_data.genome.sorted.bam #this creates the index file RnaSeq_data.genome.sorted.bam.bai
7. Computing RPKM (reads per kilobase per million mapped reads) expression values from a sorted BAM file
./~/geneModel calcexp -cmd bamrpkm -i RnaSeq_data.genome.sorted.bam --uniq -o RnaSeq_data_RPKM.txt
-i input sorted BAM file
-o output file containing RPKM expression values
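To make the RPKM unit concrete, here is a minimal sketch of the standard definition: reads mapped to a transcript, divided by the transcript length in kilobases and by the total mapped reads in millions. The function name rpkm and the numbers are hypothetical, and how geneModel counts reads internally (for example under --uniq) is not documented on this page, so its output may differ from this plain formula.

```shell
#!/bin/sh
# rpkm READS LENGTH_BP TOTAL_MAPPED
# Standard definition: RPKM = READS * 10^9 / (LENGTH_BP * TOTAL_MAPPED)
# rpkm is a hypothetical helper, not part of geneModel.
rpkm() {
    awk -v r="$1" -v l="$2" -v n="$3" \
        'BEGIN { printf "%.2f\n", (r * 1e9) / (l * n) }'
}

# 500 reads on a 2000 bp transcript, 10 million mapped reads in total
rpkm 500 2000 10000000    # prints 25.00
```

In this example, 500 reads over 2 kb gives 250 reads per kilobase, and dividing by 10 (million mapped reads) gives an RPKM of 25.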