Elementolab/ChIPseeqer Evaluating Conservation Of Distal Peaks
Back to Elementolab/ChIPseeqer_Tutorial
Are distal peaks more conserved than expected by chance ? this is the question that this page is trying to answer.
- 1. If you haven't done it yet, run CS on ChIP-seq data to find peaks
ChIPseeqer.bin -chipdir BCL6CHIP -inputdir BCL6INPUT -t 15 -fold_t 2 -outfile bcl6peaks.txt
- 2. Find distal peaks (>5kb away from any transcripts)
ChIPseeqerFindDistalPeaks --targets=bcl6peaks.txt --db=RefGene --mindistaway=5000
This will create a file called bcl6peaks.txt.RefGene.DISTPEAKS, which contain the distal peaks
Important: this is not shown here, but these peaks can be overlapped with H3K4me1 peaks using CompareIntervals (H3K4me3 overlapping-peaks can also be removed using CompareIntervals)
- 3. Extract 2kb regions centered on the summit of these peaks and store into a new file
extract_regions_around_peak_summits.pl --peakfile=bcl6peaks.RefGene.DISTPEAKS --w=2000
- 4. Calculate conservation profiles for each of these regions, and at the same time, calculate conservation profiles of regions of the same sizes randomly picked around the peak regions
ChIPseeqer2Cons.bin -regions bcl6peaks.txt.RefGene.DISTPEAKS.2kbaround \ -consdir DATA/CONSERVATION -format gzscores -category placental -method phastCons\ -make_rand 1 -randist 50000 -outrandom bcl6peaks.txt.RefGene.DISTPEAKS.2kbaround.randcons \ -show_profiles 1 -outfile bcl6peaks.txt.RefGene.DISTPEAKS.2kbaround.cons -outepsmap bcl6peaks.txt.RefGene.DISTPEAKS.2kbaround.cons.eps
Important: this command line assumes that PhastCons conservation scores are present in the DATA/CONSERVATION folder (they can be in a different folder).
Conservation scores should be downloaded from there: http://hgdownload.cse.ucsc.edu/goldenPath/hg18/phastCons44way/placentalMammals/
The easiest way to download the files is to use lftp.
lftp http://hgdownload.cse.ucsc.edu/goldenPath/hg18/phastCons44way/placentalMammals/ mget *.gz quit
he phastCons score files are very big (~2.6Gb), and get even bigger if you unzip them. DON'T UNZIP THE FILES. ChIPseeqer2Cons lets you access the zipped or unzipped conservation files.
- 5. See the results.
The output also consists of a .eps file that contains a 2D plot for the conservation score (averaged per column/bin). The plot will look like this:
Load conservation score in R, average them, and plot them
To do so, we will use the R software:
Then the following commands in R:
mCP <- read.table("bcl6peaks.txt.RefGene.DISTPEAKS.2kbaround.cons", row.names=1) mRP <- read.table("bcl6peaks.txt.RefGene.DISTPEAKS.2kbaround.randcons", row.names=1) nCP <- dim(mCP) nRP <- dim(mRP) nb <- dim(mCP) ymin <- min(colMeans(mCP)) ymax <- max(colMeans(mCP)) par(cex=1.3) plot(10*(1:nb)-1000, colMeans(mCP), type="l", lwd=5, col="red", ylab="Average Conservation Level (Placental Mammals)", xlab="Distance to binding peak summit (bp)", ylim=c(ymin, ymax)) par(new=T) plot(10*(1:nb)-1000, colMeans(mRP), type="l", lwd=5, col="green", ylab="", xlab="", ylim=c(ymin, ymax)) cleg <- sprintf("%d distal peaks", nCP) rleg <- sprintf("%d random peaks", nRP) legend("topleft", c(cleg, rleg), col=c("red", "green"), lty=1, lwd=5)
Here is what the plot would look like: