Elementolab/SNVseeqer Tutorial SHM
This is a quick tutorial about how to analyze SHM from ChIP-seq data.
- Apply SNVseeqer as described in tutorial. You only need to run SNPseeqerPB and dbSNPmatches for this analysis.
- Apply ChIPseeqer to the ChIP-seq data with permissive threshold, peak extension, and close peak merging (see ChIPseeqer pages for details about installation)
$CHIPSEEQERDIR/ChIPseeqer.bin -chipdir . -t 2 -mindist 500 -ext 100 > peaks_t2_m500_e100.txt
- Annotate SNVs using ChIPseeqer peaks, using annotateSNVs program in SNVseeqer package
$SNVSEEQERDIR/annotateSNVs -snvfile SNVs_dbSNP.txt -peaks peaks_t2_m500_e100.txt | tee SNVs_dbSNP_withPeaks.txt
- Count non-dbSNP SNVs in each peak, and determine which ones affect Gs in RGYW motifs
perl $SNVSEEQERDIR/SHManalyzeAnnotatedPeaks.pl --annotfile=SNVs_dbSNP_withPeaks.txt | tee SNVs_dbSNP_withPeaks_withSHM.txt
By default, only the peaks with the top 50 largest numbers of SNVs are analyzed. That can be changed using --stopafter=100000000
This script also accepts an option --annotcol=INT which specifies which column in the input file contains the peak annotation (columns are numbered from 0).
- Find genes nearby each peak using annotatePeaks
$SNVSEEQERDIR/annotatePeaks -peaks SNVs_dbSNP_withPeaks_withSHM.txt -refgene $SNVSEEQERDIR/REFDATA/refGene.txt.25Nov2009
Instead of -refgene, you can use:
which contains more genes (e.g. IG/TR genes). There's also:
-wgrnas $SNVSEEQERDIR/REFDATA/wgRna.txt # small and long ncRNAs -aceview $SNVSEEQERDIR/REFDATA/acembly.txt # AceView genes
The final output for a refGene analysis will look like this:
chr15 57066538 57068689 10 4.6 0(0.0%) 1/9 Gene:NM_017610,RNF111,1 chr5 148909002 148912034 10 3.3 0(0.0%) 4/6 Gene:NM_001892,CSNK1A1,-1|Gene:NM_001025105,CSNK1A1,-1 chr2 219748903 219752376 10 2.9 0(0.0%) 5/5 Gene:NM_024293,FAM134A,1|Gene:NM_015680,C2orf24,-1 chrX 44750041 44750438 10 25.1 0(0.0%) 0/10 Gene:NM_021140,KDM6A,1 chr21 37659013 37663291 10 2.3 1(10.0%) 1/9 Gene:NM_101395,DYRK1A,1 chr16 4836064 4839248 10 3.1 0(0.0%) 3/7 Gene:NM_032569,GLYR1,-1|Gene:NM_001079514,UBN1,1 chr11 683192 688293 10 2.0 0(0.0%) 3/7 Gene:NM_001042463,TMEM80,1|Gene:NM_021008,DEAF1,-1|Gene:NM_174940,TMEM80,1 chr3 51396462 51399680 10 3.1 0(0.0%) 1/9 Gene:NM_004947,DOCK3,1|Gene:NM_006010,MANF,1
where the first three columns describe the peak, the 4th column indicates how many SNVs are located in that peak, the 5th indicate the SNV density per kb, 6th = # SHM among SNVs (and fraction), 7th = # transitions/#transversions, and 8th = overlapping genes for that peak.
- alternative gene-based (instead of peak-based analysis - THIS IS WHAT WE DID IN THE PAPER):
$SNVSEEQERDIR/annotateSNVs -snvfile SNVs_dbSNP.txt -refgene $SNVSEEQERDIR/REFDATA/refGene.txt.25Nov2009 | tee SNVs_dbSNP_withRefGene.txt perl $SNVSEEQERDIR/SHManalyzeAnnotatedPeaks.pl --annotfile=SNVs_dbSNP_withRefGene.txt --regions=promoters
--snvoutfile=SNVsinSHMgenes.txt to output SNVs associated with each gene, only for genes that have SNV hotspots. Only works with --regions=promoters
- Annotate a SNV list
SHManalyze.pl --chrdir=/Users/ole2001/PROGRAMS/SNPseeqer/REFDATA/hg18 --snvfile=LY1_H3K4me3May09Feb10_BWA.SNVs.dbSNP_withRefGene.txt > LY1_H3K4me3May09Feb10_BWA.SNVs.dbSNP_withRefGene.txt.SHM