Elementolab/RRBSseeqer Tutorial

From Icbwiki

Jump to: navigation, search

Elementolab/ Elementolab/RRBSseeqer/


Contents

Installation

This tutorial assumes that you are in the RRBseeqer main directory (cd RRBS/). An even better way to proceed is to create an environment variable called RRBSEEQERDIR

export RRBSEEQERDIR=/path/to/RRBS  # REPLACE /path/to/RBBS by the REAL path

and then add RRBSEEQERDIR to your path:

export PATH=$RRBSEEQERDIR:$PATH

Ideally, add the above to your .bashrc file so you don't have to retype it everytime you log in.

Transform epicore methycall files into RRBSeeqer input files (.calls)

perl epicore2calls.pl methylCall_Sample_YYY.txt > methylCall_Sample_YYY.txt.calls

Optionally you can gzip the call files as most other RRBSseeqer programs can read from gzipped files):

perl epicore2calls.pl methylCall_Sample_YYY.txt | gzip > methylCall_Sample_YYY.txt.calls.gz

Detect differentially methylated CGs

./RRBSseeqer_CG -rrbs1 sample_control.calls.gz -rrbs2 sample.calls.gz | sort_column.pl 1 | sort_column_alpnum.pl 0 > cgs.txt

Options:

-rrbs1		FILE (condition 1) - THIS SHOULD BE THE CONTROL
-rrbs2		FILE (condition 2)
-test		[fisher|chi] (Statistical test to be performed)
-minfold	FLOAT (Def: 1.25)
-fdr		FLOAT (Def: 0.2)

Output has the following columns

chr
pos
numC in file 1
numC+numT in file 1
numC in file 2
numC+numT in file 2
methylation ratio at CpG as methylation file 2 / methylation file 1
Fisher pvalue
adjusted Fisher pvalue

Detect differentially methylated regions (DMR)

perl RRBSidentifyUpDownDMR.pl --metfile=cgs.txt --dmax=250 --minmetdx=0.1 --minnumcg=5 -outfile=cgs_DMR.txt

Input must come from RRBSeeqer_CG. Parameters indicate maximum distance between diff methylated CGs (--dmax), minimum number of differentially methylated CGs for a region to be reported (--minumcg) and min total methylation dx for a region (--minmetdx), with 10% used default.

The DMR output columns are as follows:

chr
pos start DMR
pos end DMR
size
num CpGs in DMR (num diff methylated CpGs at FDR=0.2 / num non diff methylated CpGs)
methylation difference (file2 - file1)   # min is controlled by -minmetdx; all CpGs in DMR go into this calculation, not only the differentially methylated CpGs

To create a bed file with all DMRs

perl DMR2bed.pl cgs_DMR.txt > cgs_DMR.bed


To create a bed file that can be uploaded to the Genome Browser, with two different tracks for hyper and hypo DMRs

perl DMR2bed2.pl cgs_DMR.txt

To get some stats on the DMRs

perl GetDMRstats.pl cgs_DMR.txt.HYPO
median = 188
average = 229.13698630137
min = 19
max = 1250

To annotate these DMRs, you can use ChIPseeqerAnnotate from our ChIPseeqer package

ChIPseeqerAnnotate --peakfile=cgs_DMR.txt.HYPO --genome=mm9 

Or use a custom Perl script to identify the genes that overlap with each DMR (refSeq is refGene.txt from the UCSC GB web site)

perl findOverlappingGenes.pl --peakfile=cgs_DMR.txt.HYPO --refgene=/Users/eug2002/PROGRAMS/ChIPseeqer/data/mm9/refSeq > cgs_DMR.txt.HYPO.overlappingGenes

Detect all covered regions (default 5 CpGs minimum, separated by less than 250bp between contiguous CpGs)

perl ~/PROGRAMS/RRBS/RRBSfindAllCoveredRegions.pl.pl --metfile=common_CpGs.txt

This script will create a common_CpGs.txt.REGIONS containing the covered regions. These regions can be annotated using ChIPseeqerAnnotate:

ChIPseeqerAnnotate --peakfile=common_CpGs.txt.REGIONS --genome=mm9

Annotate CGs

With CpG islands

./RRBSannotate -peakfile1 cgs.txt -peakfile2 DATA/hg19/CpGislands_annotation -showpeakdesc 1 -hasid1 0 -outfile cgs_CGI.txt

With RRBSannotate, you can just add annotation columns. Here's how to add shore annotation

./RRBSannotate -peakfile1 cgs_CGI.txt -peakfile2 DATA/hg19/CpGislands_annotation -shores 1 -ext2 1000  -hasid2 0  -showpeakdesc 1 -hasid1 0 -outfile cgs_CGI_shores.txt


With genes

./RRBSannotateGenes -rrbsfile cgs.txt -annotation DATA/hg19/refSeq -chrdata DATA/hg19.chrdata 

(you need to download gene annotation files first (e.g., hg18.tar.gz, hg19.tar.gz) from --> http://physiology.med.cornell.edu/faculty/elemento/lab/CS_files/)

Draw methylation across chromosomes

perl DrawRRBSmap2.pl --chrdata=DATA/hg19.chrdata --rrbsdata=cgs.txt

Make BigWig DIFFERENTIAL methylation track

Let's assume ben1_pca1.txt contains a differential methylation analysis generated by RRBSseeqer_CG above. To generate a BigWig track, make sure wigToBigWig is installed, and type something like this

RRBSmakeMethylDiffTrack -methfile ben1_pca1.txt -desc ben1_pca1 -type diff -bigwig 1 -chrdata hg19.chrdata -chromsizes hg19.chroms.sizes -s 25

This command will create a .bw track (binary format), with window size 50bp (2x25) that you can upload to your web server, then make visible to the UCSC browser using:

track type=bigWig name="ben1_pca1" description="ben1_pca1" bigDataUrl=http://physiology.med.cornell.edu/faculty/elemento/lab/files/ben1_pca1.wig.bw \ 
  visibility="full" maxHeightPixels="64:64:11" smoothingWindow="10" viewLimits="0:100" autoScale="off"

A sometimes faster way to visualize a BigWig track is to load it into IGV http://www.broadinstitute.org/igv/

Methylation levels for annotated elements

getMethylationLevelsForAnnotatedElements.pl --callfile=CpG_context_Sample_ln_cap_r.fastq.filtered.noadapt.gz_bismark.txt.calls_NM --headerhasid=0 \
--promlist=refSeq.new.u2000_d2000.PROMOTERS --minnumcg=2 --col=5

Methylation levels from scratch

 ./RRBSfindMethylationLevelForIntervals.pl --callfile=eRRBS_CB012511_50ng_myCpG.txt.calls.gz \ 
--regions=merged016_trim_CB_rep1_rep2_inter_100k.wig --hasid=1

(use hasid=1 if your interval file has identifiers eg for promoters, 0 otherwise eg for CpF islands)

It also works for differential analysis (output of RRBseeqer_CG)

./RRBSfindMethylationLevelForIntervals.pl --regions=refGene.txt.17Jan2012.oneperTSS.u2000_d2000 \
--callfile=empty_vs_sh2_21d.txt.RGYW --hasid=1 --diff=1 > empty_vs_sh2_21d.txt.RGYW.prom.diffmet


Allele-specific methylation

This analysis works from BAM files generated by Bismark. These files

Step 1: derive initial ASM calls. We'll assume that the BAM file is Sample_CB501.sorted.bam. We need a hg19.chrdata file that contains chr names and sizes. We also need a fasta file for hg19 containing all chromosomes. The fasta needs to be indexed using "samtools faidx".

$RRBSEEQERDIR/dist/RRBSalleleSpecificBAM -bamfile Sample_CB501.sorted.bam -chrdata $HOME/PROGRAMS/SNPseeqer/REFDATA/hg19.chrdata -fastafile $HOME/PROGRAMS/SNPseeqer/REFDATA/hg19/hg19.fa  > Sample_CB501.sorted.bam.ASM

Step 2: adjust for multiple testing (BH). Specify column index that contains p-values to correct.

perl $RRBSEEQERDIR/dist/p_adjust.pl Sample_CB501.sorted.bam.ASM 7 > Sample_CB501.sorted.bam.ASM.adj

Step 3: merge so we get a CG-based file

perl $RRBSEEQERDIR/dist/RRBS_ASM_merge.pl --asmfile=Sample_CB501.sorted.bam.ASM.adj --matched=0 

The output is like this

chr1	33236287	chr1/33236289/2/0/23/75/0/6.80e-23/5.91e-20	SS

which means:

chr, CpG pos,        chr/posSNP/offset_bet_SNP_and_CpG/  0/23/75/0 means for genotype 1, 0 5mC /23 non-5mC events, for genotype 2: 75 5mC / 0 non-5mC, pvalue, BH-adjusted, SS if BH<0.2


Step 4: visualize specific ASM locations. Here we will look at 1 CG at position chr1:5836437

perl $RRBSEEQERDIR/dist/RRBSshowCGSNPs_BAM.pl --bamfile=Sample_CB501.sorted.bam  --fastafile=/Users/ole2001/PROGRAMS/SNPseeqer/REFDATA/hg19/hg19.fa  --chrpos=chr1:5836437
Personal tools