Elementolab/ChIPseeqer Evaluating Conservation Of Distal Peaks

From Icbwiki

Jump to: navigation, search

Back to Elementolab/ChIPseeqer_Tutorial

Are distal peaks more conserved than expected by chance ? this is the question that this page is trying to answer.

  • 1. If you haven't done it yet, run CS on ChIP-seq data to find peaks
ChIPseeqer.bin -chipdir BCL6CHIP -inputdir BCL6INPUT -t 15 -fold_t 2 -outfile bcl6peaks.txt
  • 2. Find distal peaks (>5kb away from any transcripts)
ChIPseeqerFindDistalPeaks --targets=bcl6peaks.txt --db=RefGene --mindistaway=5000

This will create a file called bcl6peaks.txt.RefGene.DISTPEAKS, which contain the distal peaks

Important: this is not shown here, but these peaks can be overlapped with H3K4me1 peaks using CompareIntervals (H3K4me3 overlapping-peaks can also be removed using CompareIntervals)

  • 3. Extract 2kb regions centered on the summit of these peaks and store into a new file
extract_regions_around_peak_summits.pl --peakfile=bcl6peaks.RefGene.DISTPEAKS --w=2000
  • 4. Calculate conservation profiles for each of these regions, and at the same time, calculate conservation profiles of regions of the same sizes randomly picked around the peak regions
ChIPseeqer2Cons.bin -regions bcl6peaks.txt.RefGene.DISTPEAKS.2kbaround \
  -consdir DATA/CONSERVATION -format gzscores -category placental -method phastCons\
  -make_rand 1 -randist 50000 -outrandom bcl6peaks.txt.RefGene.DISTPEAKS.2kbaround.randcons \
  -show_profiles 1 -outfile bcl6peaks.txt.RefGene.DISTPEAKS.2kbaround.cons  -outepsmap bcl6peaks.txt.RefGene.DISTPEAKS.2kbaround.cons.eps

Important: this command line assumes that PhastCons conservation scores are present in the DATA/CONSERVATION folder (they can be in a different folder).

Conservation scores should be downloaded from there: http://hgdownload.cse.ucsc.edu/goldenPath/hg18/phastCons44way/placentalMammals/

The easiest way to download the files is to use lftp.

lftp http://hgdownload.cse.ucsc.edu/goldenPath/hg18/phastCons44way/placentalMammals/
mget *.gz

he phastCons score files are very big (~2.6Gb), and get even bigger if you unzip them. DON'T UNZIP THE FILES. ChIPseeqer2Cons lets you access the zipped or unzipped conservation files.

  • 5. See the results.

The output also consists of a .eps file that contains a 2D plot for the conservation score (averaged per column/bin). The plot will look like this:

Average Conservation Profile example


Load conservation score in R, average them, and plot them

To do so, we will use the R software:


Then the following commands in R:

mCP  <- read.table("bcl6peaks.txt.RefGene.DISTPEAKS.2kbaround.cons", row.names=1)
mRP  <- read.table("bcl6peaks.txt.RefGene.DISTPEAKS.2kbaround.randcons", row.names=1)
nCP  <- dim(mCP)[1]
nRP  <- dim(mRP)[1]
nb   <- dim(mCP)[2]
ymin <-  min(colMeans(mCP))
ymax <-  max(colMeans(mCP))
plot(10*(1:nb)-1000, colMeans(mCP), type="l", lwd=5, col="red", ylab="Average Conservation Level (Placental Mammals)",  
  xlab="Distance to binding peak summit (bp)", ylim=c(ymin, ymax)) 
plot(10*(1:nb)-1000, colMeans(mRP), type="l", lwd=5, col="green", ylab="", xlab="", ylim=c(ymin, ymax))
cleg <- sprintf("%d distal peaks", nCP)
rleg <- sprintf("%d random peaks", nRP)
legend("topleft", c(cleg, rleg), col=c("red", "green"), lty=1, lwd=5)

Here is what the plot would look like:

Average Conservation Profile example

Personal tools