Elementolab/RRBSseeqer Tutorial

From Icbwiki

Jump to: navigation, search

Elementolab/ Elementolab/RRBSseeqer/



This tutorial assumes that you are in the RRBseeqer main directory (cd RRBS/). An even better way to proceed is to create an environment variable called RRBSEEQERDIR

export RRBSEEQERDIR=/path/to/RRBS  # REPLACE /path/to/RBBS by the REAL path

and then add RRBSEEQERDIR to your path:


Ideally, add the above to your .bashrc file so you don't have to retype it everytime you log in.

Transform epicore methycall files into RRBSeeqer input files (.calls)

perl epicore2calls.pl methylCall_Sample_YYY.txt > methylCall_Sample_YYY.txt.calls

Optionally you can gzip the call files as most other RRBSseeqer programs can read from gzipped files):

perl epicore2calls.pl methylCall_Sample_YYY.txt | gzip > methylCall_Sample_YYY.txt.calls.gz

Detect differentially methylated CGs

./RRBSseeqer_CG -rrbs1 sample_control.calls.gz -rrbs2 sample.calls.gz | sort_column.pl 1 | sort_column_alpnum.pl 0 > cgs.txt


-rrbs1		FILE (condition 1) - THIS SHOULD BE THE CONTROL
-rrbs2		FILE (condition 2)
-test		[fisher|chi] (Statistical test to be performed)
-minfold	FLOAT (Def: 1.25)
-fdr		FLOAT (Def: 0.2)

Output has the following columns

numC in file 1
numC+numT in file 1
numC in file 2
numC+numT in file 2
methylation ratio at CpG as methylation file 2 / methylation file 1
Fisher pvalue
adjusted Fisher pvalue

Detect differentially methylated regions (DMR)

perl RRBSidentifyUpDownDMR.pl --metfile=cgs.txt --dmax=250 --minmetdx=0.1 --minnumcg=5 -outfile=cgs_DMR.txt

Input must come from RRBSeeqer_CG. Parameters indicate maximum distance between diff methylated CGs (--dmax), minimum number of differentially methylated CGs for a region to be reported (--minumcg) and min total methylation dx for a region (--minmetdx), with 10% used default.

The DMR output columns are as follows:

pos start DMR
pos end DMR
num CpGs in DMR (num diff methylated CpGs at FDR=0.2 / num non diff methylated CpGs)
methylation difference (file2 - file1)   # min is controlled by -minmetdx; all CpGs in DMR go into this calculation, not only the differentially methylated CpGs

To create a bed file with all DMRs

perl DMR2bed.pl cgs_DMR.txt > cgs_DMR.bed

To create a bed file that can be uploaded to the Genome Browser, with two different tracks for hyper and hypo DMRs

perl DMR2bed2.pl cgs_DMR.txt

To get some stats on the DMRs

perl GetDMRstats.pl cgs_DMR.txt.HYPO
median = 188
average = 229.13698630137
min = 19
max = 1250

To annotate these DMRs, you can use ChIPseeqerAnnotate from our ChIPseeqer package

ChIPseeqerAnnotate --peakfile=cgs_DMR.txt.HYPO --genome=mm9 

Or use a custom Perl script to identify the genes that overlap with each DMR (refSeq is refGene.txt from the UCSC GB web site)

perl findOverlappingGenes.pl --peakfile=cgs_DMR.txt.HYPO --refgene=/Users/eug2002/PROGRAMS/ChIPseeqer/data/mm9/refSeq > cgs_DMR.txt.HYPO.overlappingGenes

Detect all covered regions (default 5 CpGs minimum, separated by less than 250bp between contiguous CpGs)

perl ~/PROGRAMS/RRBS/RRBSfindAllCoveredRegions.pl.pl --metfile=common_CpGs.txt

This script will create a common_CpGs.txt.REGIONS containing the covered regions. These regions can be annotated using ChIPseeqerAnnotate:

ChIPseeqerAnnotate --peakfile=common_CpGs.txt.REGIONS --genome=mm9

Annotate CGs

With CpG islands

./RRBSannotate -peakfile1 cgs.txt -peakfile2 DATA/hg19/CpGislands_annotation -showpeakdesc 1 -hasid1 0 -outfile cgs_CGI.txt

With RRBSannotate, you can just add annotation columns. Here's how to add shore annotation

./RRBSannotate -peakfile1 cgs_CGI.txt -peakfile2 DATA/hg19/CpGislands_annotation -shores 1 -ext2 1000  -hasid2 0  -showpeakdesc 1 -hasid1 0 -outfile cgs_CGI_shores.txt

With genes

./RRBSannotateGenes -rrbsfile cgs.txt -annotation DATA/hg19/refSeq -chrdata DATA/hg19.chrdata 

(you need to download gene annotation files first (e.g., hg18.tar.gz, hg19.tar.gz) from --> http://physiology.med.cornell.edu/faculty/elemento/lab/CS_files/)

Draw methylation across chromosomes

perl DrawRRBSmap2.pl --chrdata=DATA/hg19.chrdata --rrbsdata=cgs.txt

Make BigWig DIFFERENTIAL methylation track

Let's assume ben1_pca1.txt contains a differential methylation analysis generated by RRBSseeqer_CG above. To generate a BigWig track, make sure wigToBigWig is installed, and type something like this

RRBSmakeMethylDiffTrack -methfile ben1_pca1.txt -desc ben1_pca1 -type diff -bigwig 1 -chrdata hg19.chrdata -chromsizes hg19.chroms.sizes -s 25

This command will create a .bw track (binary format), with window size 50bp (2x25) that you can upload to your web server, then make visible to the UCSC browser using:

track type=bigWig name="ben1_pca1" description="ben1_pca1" bigDataUrl=http://physiology.med.cornell.edu/faculty/elemento/lab/files/ben1_pca1.wig.bw \ 
  visibility="full" maxHeightPixels="64:64:11" smoothingWindow="10" viewLimits="0:100" autoScale="off"

A sometimes faster way to visualize a BigWig track is to load it into IGV http://www.broadinstitute.org/igv/

Methylation levels for annotated elements

getMethylationLevelsForAnnotatedElements.pl --callfile=CpG_context_Sample_ln_cap_r.fastq.filtered.noadapt.gz_bismark.txt.calls_NM --headerhasid=0 \
--promlist=refSeq.new.u2000_d2000.PROMOTERS --minnumcg=2 --col=5

Methylation levels from scratch

 ./RRBSfindMethylationLevelForIntervals.pl --callfile=eRRBS_CB012511_50ng_myCpG.txt.calls.gz \ 
--regions=merged016_trim_CB_rep1_rep2_inter_100k.wig --hasid=1

(use hasid=1 if your interval file has identifiers eg for promoters, 0 otherwise eg for CpF islands)

It also works for differential analysis (output of RRBseeqer_CG)

./RRBSfindMethylationLevelForIntervals.pl --regions=refGene.txt.17Jan2012.oneperTSS.u2000_d2000 \
--callfile=empty_vs_sh2_21d.txt.RGYW --hasid=1 --diff=1 > empty_vs_sh2_21d.txt.RGYW.prom.diffmet

Allele-specific methylation

This analysis works from BAM files generated by Bismark. These files

Step 1: derive initial ASM calls. We'll assume that the BAM file is Sample_CB501.sorted.bam. We need a hg19.chrdata file that contains chr names and sizes. We also need a fasta file for hg19 containing all chromosomes. The fasta needs to be indexed using "samtools faidx".

$RRBSEEQERDIR/dist/RRBSalleleSpecificBAM -bamfile Sample_CB501.sorted.bam -chrdata $HOME/PROGRAMS/SNPseeqer/REFDATA/hg19.chrdata -fastafile $HOME/PROGRAMS/SNPseeqer/REFDATA/hg19/hg19.fa  > Sample_CB501.sorted.bam.ASM

Step 2: adjust for multiple testing (BH). Specify column index that contains p-values to correct.

perl $RRBSEEQERDIR/dist/p_adjust.pl Sample_CB501.sorted.bam.ASM 7 > Sample_CB501.sorted.bam.ASM.adj

Step 3: merge so we get a CG-based file

perl $RRBSEEQERDIR/dist/RRBS_ASM_merge.pl --asmfile=Sample_CB501.sorted.bam.ASM.adj --matched=0 

The output is like this

chr1	33236287	chr1/33236289/2/0/23/75/0/6.80e-23/5.91e-20	SS

which means:

chr, CpG pos,        chr/posSNP/offset_bet_SNP_and_CpG/  0/23/75/0 means for genotype 1, 0 5mC /23 non-5mC events, for genotype 2: 75 5mC / 0 non-5mC, pvalue, BH-adjusted, SS if BH<0.2

Step 4: visualize specific ASM locations. Here we will look at 1 CG at position chr1:5836437

perl $RRBSEEQERDIR/dist/RRBSshowCGSNPs_BAM.pl --bamfile=Sample_CB501.sorted.bam  --fastafile=/Users/ole2001/PROGRAMS/SNPseeqer/REFDATA/hg19/hg19.fa  --chrpos=chr1:5836437
Personal tools