Elementolab/TargetID
From Icbwiki
←Older revision | Newer revision→
This web page describes how to detect and annotate point mutations and indels whose abundance is increased in one cell population (expanded clone) compared to another population (parental cells). This tutorial assumes that RNA-seq was performed on the two cell populations; however, the same analysis can be performed on whole-exome resequencing data.
To perform this analysis, please install the SNVseeqer package (C programs and libraries) as described in
http://icb.med.cornell.edu/wiki/index.php/Elementolab/SNVseeqer_Tutorial
You will also need to install David's geneTool/geneModel
http://icb.med.cornell.edu/wiki/index.php/GeneTools
as well as samtools
The following tutorial assumes that the expanded clone and parental cell RNA-seq datasets consist of 1 lane each, and are contained in the s_1_sequence.txt.gz and s_2_sequence.txt.gz files (resp). The tutorial does not cover QC analysis of RNA-seq datasets (see the SNVseeqer package tools for this).
- Step 1: align the RNA-seq reads to RefSeq mRNAs (reconstituted from hg18) using BWA (more info at http://icb.med.cornell.edu/wiki/index.php/Elementolab/BWA_tutorial)
bwa aln -t 4 refGene.txt.07Jun2010.fa.bwaidx s_1_sequence.txt.gz > s_1_sequence.txt.bwa bwa samse refGene.txt.07Jun2010.fa.bwaidx s_1_sequence.txt.bwa s_1_sequence.txt.gz > s_1_sequence.txt.sam rm s_1_sequence.txt.bwa bwa aln -t 4 refGene.txt.07Jun2010.fa.bwaidx s_2_sequence.txt.gz > s_2_sequence.txt.bwa bwa samse refGene.txt.07Jun2010.fa.bwaidx s_2_sequence.txt.bwa s_2_sequence.txt.gz > s_2_sequence.txt.sam rm s_2_sequence.txt.bwa
- Step 2: remap to genome using David's geneModel/ref2g, rebuild MD tags
geneModel align -cmd ref2g -i s_1_sequence.txt.sam -o s_1_sequence.txt.hg18.sam -ref refGene.txt.07Jun2010 samtools view -t hg18/wg.fa.fai -bS s_1_sequence.txt.hg18.sam > s_1_sequence.txt.hg18.bam samtools sort s_1_sequence.txt.hg18.bam s_1_sequence.txt.hg18.sorted samtools index s_1_sequence.txt.hg18.sorted.bam samtools fillmd s_1_sequence.txt.hg18.sorted.bam hg18/wg.fa > s_1_sequence.txt.hg18.sorted.sam
geneModel align -cmd ref2g -i s_2_sequence.txt.sam -o s_2_sequence.txt.hg18.sam -ref refGene.txt.07Jun2010 samtools view -t hg18/wg.fa.fai -bS s_2_sequence.txt.hg18.sam > s_2_sequence.txt.hg18.bam samtools sort s_2_sequence.txt.hg18.bam s_2_sequence.txt.hg18.sorted samtools index s_2_sequence.txt.hg18.sorted.bam samtools fillmd s_2_sequence.txt.hg18.sorted.bam hg18/wg.fa > s_2_sequence.txt.hg18.sorted.sam
- Step 2: split SAM files to facilitate the analysis
mkdir s_1_sequence.txt.hg18.sorted.sam_SPLIT $SNVSEEQERDIR/split_samfile s_1_sequence.txt.hg18.sorted.sam -outdir s_1_sequence.txt.hg18.sorted.sam_SPLIT mkdir s_2_sequence.txt.hg18.sorted.sam_SPLIT $SNVSEEQERDIR/split_samfile s_2_sequence.txt.hg18.sorted.sam -outdir s_2_sequence.txt.hg18.sorted.sam_SPLIT
- Step 3: look for SNVs in the expanded clone
$SNVSEEQERDIR/SNVseeqerPB -readdir s_1_sequence.txt.hg18.sorted.sam_SPLIT -fastaref $SNVSEEQERDIR/REFDATA/refGene.txt.07Jun2010.fa -format sam \ -readlen 40 -verbose 1 -minnumreads 4 -correction BH -fdr 0.01 -pvtype PB -uniquereads 1 -outfile s_1_sequence.txt.hg18.sorted.sam.SNVs
- Step 4: annotate these SNVs using dbSNP and RefSeq gene annotation
$SNVSEEQERDIR/dbSNPmatchesRefSeq -snvfile s_1_sequence.txt.hg18.sorted.sam.SNVs -refseqsnpfile REFDATA/refGene.txt.07Jun2010.dbSNPmap > s_1_sequence.txt.hg18.sorted.sam.SNVs.dSNP $SNVSEEQERDIR/annotateSNVsRefSeq -snvfile s_1_sequence.txt.hg18.sorted.sam.SNVs.dbSNP -refgene REFDATA/refGene.txt.07Jun2010 \ -fastaref REFDATA/refGene.txt.07Jun2010.fa > s_1_sequence.txt.hg18.sorted.sam.SNVs.dbSNP.annot
- Step 5: determine which SNVs have increased their abundance in the expanded clone compared to the parental cell population
$SNVSEEQERDIR/CompareSNVsAlleleFrequencyWithControl -snvfile s_1_sequence.txt.hg18.sorted.sam.SNVs.dbSNP.annot \ -readdir s_2_sequence.txt.hg18.sorted.sam_SPLIT -chrdata $SNVSEEQERDIR/REFDATA/refGene.txt.07Jun2010.fa.chrdata -fdr 0.05
- Step 6: repeat the same analysis for indels
$SNVSEEQERDIR/INDELseeqer -readdir s_1_sequence.txt.hg18.sorted.sam_SPLIT -chrdata $SNVSEEQERDIR/REFDATA/refGene.txt.07Jun2010.fa.chrdata \ -format sam -verbose 0 -uniquereads 1 -outfile s_1_sequence.txt.hg18.sorted.sam.INDEL $SNVSEEQERDIR/annotateINDELs -snvfile s_1_sequence.txt.hg18.sorted.sam.INDEL -refgene REFDATA/refGene.txt.07Jun2010 > s_1_sequence.txt.hg18.sorted.sam.INDEL.annot $SNVSEEQERDIR/CompareINDELsAlleleFrequencyWithControl -snvfile s_1_sequence.txt.hg18.sorted.sam.INDEL.annot -readdir s_2_sequence.txt.hg18.sorted.sam_SPLIT \ -chrdata $SNVSEEQERDIR/REFDATA/refGene.txt.07Jun2010.fa.chrdata