Elementolab/SNVseeqer Tutorial SHM

From Icbwiki

Jump to: navigation, search

This is a quick tutorial about how to analyze SHM from ChIP-seq data.

  • Apply SNVseeqer as described in tutorial. You only need to run SNPseeqerPB and dbSNPmatches for this analysis.
  • Apply ChIPseeqer to the ChIP-seq data with permissive threshold, peak extension, and close peak merging (see ChIPseeqer pages for details about installation)
$CHIPSEEQERDIR/ChIPseeqer.bin -chipdir . -t 2 -mindist 500 -ext 100 > peaks_t2_m500_e100.txt
  • Annotate SNVs using ChIPseeqer peaks, using annotateSNVs program in SNVseeqer package
$SNVSEEQERDIR/annotateSNVs -snvfile SNVs_dbSNP.txt -peaks peaks_t2_m500_e100.txt | tee SNVs_dbSNP_withPeaks.txt
  • Count non-dbSNP SNVs in each peak, and determine which ones affect Gs in RGYW motifs
perl  $SNVSEEQERDIR/SHManalyzeAnnotatedPeaks.pl --annotfile=SNVs_dbSNP_withPeaks.txt | tee SNVs_dbSNP_withPeaks_withSHM.txt

By default, only the peaks with the top 50 largest numbers of SNVs are analyzed. That can be changed using --stopafter=100000000

This script also accepts an option --annotcol=INT which specifies which column in the input file contains the peak annotation (columns are numbered from 0).

  • Find genes nearby each peak using annotatePeaks
$SNVSEEQERDIR/annotatePeaks -peaks SNVs_dbSNP_withPeaks_withSHM.txt -refgene $SNVSEEQERDIR/REFDATA/refGene.txt.25Nov2009

Instead of -refgene, you can use:

-knowngene $SNVSEEQERDIR/REFDATA/knownGeneWithAliases.txt

which contains more genes (e.g. IG/TR genes). There's also:

-wgrnas $SNVSEEQERDIR/REFDATA/wgRna.txt # small and long ncRNAs
-aceview $SNVSEEQERDIR/REFDATA/acembly.txt # AceView genes

The final output for a refGene analysis will look like this:

chr15	57066538	57068689	10	4.6	0(0.0%)	1/9	Gene:NM_017610,RNF111,1
chr5	148909002	148912034	10	3.3	0(0.0%)	4/6	Gene:NM_001892,CSNK1A1,-1|Gene:NM_001025105,CSNK1A1,-1
chr2	219748903	219752376	10	2.9	0(0.0%)	5/5	Gene:NM_024293,FAM134A,1|Gene:NM_015680,C2orf24,-1
chrX	44750041	44750438	10	25.1	0(0.0%)	0/10	Gene:NM_021140,KDM6A,1
chr21	37659013	37663291	10	2.3	1(10.0%)	1/9	Gene:NM_101395,DYRK1A,1
chr16	4836064	4839248	10	3.1	0(0.0%)	3/7	Gene:NM_032569,GLYR1,-1|Gene:NM_001079514,UBN1,1
chr11	683192	688293	10	2.0	0(0.0%)	3/7	Gene:NM_001042463,TMEM80,1|Gene:NM_021008,DEAF1,-1|Gene:NM_174940,TMEM80,1
chr3	51396462	51399680	10	3.1	0(0.0%)	1/9	Gene:NM_004947,DOCK3,1|Gene:NM_006010,MANF,1

where the first three columns describe the peak, the 4th column indicates how many SNVs are located in that peak, the 5th indicate the SNV density per kb, 6th = # SHM among SNVs (and fraction), 7th = # transitions/#transversions, and 8th = overlapping genes for that peak.

  • alternative gene-based (instead of peak-based analysis - THIS IS WHAT WE DID IN THE PAPER):
$SNVSEEQERDIR/annotateSNVs -snvfile SNVs_dbSNP.txt -refgene $SNVSEEQERDIR/REFDATA/refGene.txt.25Nov2009 | tee SNVs_dbSNP_withRefGene.txt
perl  $SNVSEEQERDIR/SHManalyzeAnnotatedPeaks.pl --annotfile=SNVs_dbSNP_withRefGene.txt --regions=promoters


--snvoutfile=SNVsinSHMgenes.txt to output SNVs associated with each gene, only for genes that have SNV hotspots. Only works with --regions=promoters
  • Annotate a SNV list
SHManalyze.pl --chrdir=/Users/ole2001/PROGRAMS/SNPseeqer/REFDATA/hg18 --snvfile=LY1_H3K4me3May09Feb10_BWA.SNVs.dbSNP_withRefGene.txt > LY1_H3K4me3May09Feb10_BWA.SNVs.dbSNP_withRefGene.txt.SHM
Personal tools