Elementolab/TargetID

From Icbwiki

Jump to: navigation, search

This web page describes how to detect and annotate point mutations and indels whose abundance is increased in one cell population (expanded clone) compared to another population (parental cells). This tutorial assumes that RNA-seq was performed on the two cell populations; however, the same analysis can be performed on whole-exome resequencing data.

To perform this analysis, please install the SNVseeqer package (C programs and libraries) as described in

http://icb.med.cornell.edu/wiki/index.php/Elementolab/SNVseeqer_Tutorial

You will also need to install David's geneTool/geneModel

http://icb.med.cornell.edu/wiki/index.php/GeneTools

as well as samtools

The following tutorial assumes that the expanded clone and parental cell RNA-seq datasets consist of 1 lane each, and are contained in the s_1_sequence.txt.gz and s_2_sequence.txt.gz files (resp). The tutorial does not cover QC analysis of RNA-seq datasets (see the SNVseeqer package tools for this).

 bwa aln -t 4 refGene.txt.07Jun2010.fa.bwaidx s_1_sequence.txt.gz > s_1_sequence.txt.bwa

 bwa samse refGene.txt.07Jun2010.fa.bwaidx s_1_sequence.txt.bwa s_1_sequence.txt.gz > s_1_sequence.txt.sam

 rm s_1_sequence.txt.bwa

 bwa aln -t 4 refGene.txt.07Jun2010.fa.bwaidx s_2_sequence.txt.gz > s_2_sequence.txt.bwa

 bwa samse refGene.txt.07Jun2010.fa.bwaidx s_2_sequence.txt.bwa s_2_sequence.txt.gz > s_2_sequence.txt.sam

 rm s_2_sequence.txt.bwa
  • Step 2: remap to genome using David's geneModel/ref2g, rebuild MD tags
geneModel align -cmd ref2g -i s_1_sequence.txt.sam -o s_1_sequence.txt.hg18.sam -ref refGene.txt.07Jun2010
samtools view -t hg18/wg.fa.fai -bS s_1_sequence.txt.hg18.sam > s_1_sequence.txt.hg18.bam
samtools sort  s_1_sequence.txt.hg18.bam s_1_sequence.txt.hg18.sorted
samtools index s_1_sequence.txt.hg18.sorted.bam 
samtools fillmd  s_1_sequence.txt.hg18.sorted.bam hg18/wg.fa > s_1_sequence.txt.hg18.sorted.sam
geneModel align -cmd ref2g -i s_2_sequence.txt.sam -o s_2_sequence.txt.hg18.sam -ref refGene.txt.07Jun2010
samtools view -t hg18/wg.fa.fai -bS s_2_sequence.txt.hg18.sam > s_2_sequence.txt.hg18.bam
samtools sort  s_2_sequence.txt.hg18.bam s_2_sequence.txt.hg18.sorted
samtools index s_2_sequence.txt.hg18.sorted.bam 
samtools fillmd  s_2_sequence.txt.hg18.sorted.bam hg18/wg.fa > s_2_sequence.txt.hg18.sorted.sam
  • Step 2: split SAM files to facilitate the analysis
 mkdir s_1_sequence.txt.hg18.sorted.sam_SPLIT

 $SNVSEEQERDIR/split_samfile s_1_sequence.txt.hg18.sorted.sam -outdir s_1_sequence.txt.hg18.sorted.sam_SPLIT

 mkdir s_2_sequence.txt.hg18.sorted.sam_SPLIT

 $SNVSEEQERDIR/split_samfile s_2_sequence.txt.hg18.sorted.sam -outdir s_2_sequence.txt.hg18.sorted.sam_SPLIT
 
  • Step 3: look for SNVs in the expanded clone
$SNVSEEQERDIR/SNVseeqerPB -readdir s_1_sequence.txt.hg18.sorted.sam_SPLIT -fastaref $SNVSEEQERDIR/REFDATA/refGene.txt.07Jun2010.fa -format sam \ 
  -readlen 40 -verbose 1 -minnumreads 4 -correction BH -fdr 0.01 -pvtype PB -uniquereads 1 -outfile s_1_sequence.txt.hg18.sorted.sam.SNVs
  • Step 4: annotate these SNVs using dbSNP and RefSeq gene annotation
$SNVSEEQERDIR/dbSNPmatchesRefSeq -snvfile s_1_sequence.txt.hg18.sorted.sam.SNVs -refseqsnpfile REFDATA/refGene.txt.07Jun2010.dbSNPmap > s_1_sequence.txt.hg18.sorted.sam.SNVs.dSNP

$SNVSEEQERDIR/annotateSNVsRefSeq -snvfile s_1_sequence.txt.hg18.sorted.sam.SNVs.dbSNP -refgene REFDATA/refGene.txt.07Jun2010 \ 
 -fastaref REFDATA/refGene.txt.07Jun2010.fa > s_1_sequence.txt.hg18.sorted.sam.SNVs.dbSNP.annot


  • Step 5: determine which SNVs have increased their abundance in the expanded clone compared to the parental cell population
 $SNVSEEQERDIR/CompareSNVsAlleleFrequencyWithControl -snvfile s_1_sequence.txt.hg18.sorted.sam.SNVs.dbSNP.annot \
  -readdir s_2_sequence.txt.hg18.sorted.sam_SPLIT -chrdata $SNVSEEQERDIR/REFDATA/refGene.txt.07Jun2010.fa.chrdata -fdr 0.05


  • Step 6: repeat the same analysis for indels
 $SNVSEEQERDIR/INDELseeqer -readdir s_1_sequence.txt.hg18.sorted.sam_SPLIT -chrdata $SNVSEEQERDIR/REFDATA/refGene.txt.07Jun2010.fa.chrdata \
  -format sam  -verbose 0 -uniquereads 1 -outfile s_1_sequence.txt.hg18.sorted.sam.INDEL

 $SNVSEEQERDIR/annotateINDELs -snvfile s_1_sequence.txt.hg18.sorted.sam.INDEL -refgene REFDATA/refGene.txt.07Jun2010 > s_1_sequence.txt.hg18.sorted.sam.INDEL.annot

 $SNVSEEQERDIR/CompareINDELsAlleleFrequencyWithControl -snvfile s_1_sequence.txt.hg18.sorted.sam.INDEL.annot -readdir s_2_sequence.txt.hg18.sorted.sam_SPLIT \
   -chrdata $SNVSEEQERDIR/REFDATA/refGene.txt.07Jun2010.fa.chrdata
Personal tools