Elementolab/ChIPseeqerCons



ChIPseeqerCons

This analysis estimates conservation scores for a given set of peaks.

To run the tools directly from any folder, add both $CHIPSEEQERDIR and $CHIPSEEQERDIR/SCRIPTS to your $PATH variable. See How to set the CHIPSEEQERDIR variable.
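As a minimal sketch of the setup described above, the lines below could go in a shell startup file such as ~/.bashrc. The install location $HOME/ChIPseeqer is an assumption for illustration only; substitute the folder where ChIPseeqer actually lives on your system.

```shell
# Assumption: ChIPseeqer was unpacked in $HOME/ChIPseeqer (hypothetical path).
export CHIPSEEQERDIR="$HOME/ChIPseeqer"

# Append the main folder and its SCRIPTS subfolder to the search path.
export PATH="$PATH:$CHIPSEEQERDIR:$CHIPSEEQERDIR/SCRIPTS"
```

After opening a new shell, `command -v ChIPseeqerCons` should print the path of the script if the variable points to the right folder.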

1. The first step of this analysis is to download conservation scores for the human genome.

The ChIPseeqerCons script currently supports phastCons and PhyloP scores, which can be downloaded from the UCSC Genome Browser at http://hgdownload.cse.ucsc.edu/goldenPath/hg18/phastCons44way/ and http://hgdownload.cse.ucsc.edu/goldenPath/hg18/phyloP44way/.

We recommend using conservation scores calculated from placental mammalian genomes (which are not too distant from human). They can be downloaded at: http://hgdownload.cse.ucsc.edu/goldenPath/hg18/phastCons44way/placentalMammals/

The easiest way to download the files is to use lftp.

lftp http://hgdownload.cse.ucsc.edu/goldenPath/hg18/phastCons44way/placentalMammals/
mget *.gz
quit

IMPORTANT: the phastCons score files are very large (~2.6 GB) and get even larger if you unzip them. DON'T UNZIP THE FILES. ChIPseeqerCons can read either the zipped or the unzipped conservation files. Reading directly from the zipped files takes ~10 times longer, but requires much less disk space.
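Because the files should stay compressed, a download can be sanity-checked without unpacking anything to disk. The sketch below is not part of ChIPseeqer; check_cons_files is a hypothetical helper name, and it relies only on the standard `gzip -t` integrity test.

```shell
# Hypothetical helper: test every .gz file in a directory without unzipping it.
# Returns non-zero if no files are found or if any file fails the gzip check.
check_cons_files() {
  dir=$1
  status=0
  for f in "$dir"/*.gz; do
    # If the glob matched nothing, "$f" is the literal pattern and does not exist.
    [ -e "$f" ] || { echo "no .gz files found in $dir" >&2; return 1; }
    if gzip -t "$f" 2>/dev/null; then
      echo "OK       $f"
    else
      echo "CORRUPT  $f"
      status=1
    fi
  done
  return $status
}
```

For example, `check_cons_files placentalMammals/` lists each downloaded file with its status. To look at the first lines of a single file while keeping it compressed on disk, `gzip -dc FILE.gz | head -n 3` streams the decompressed content instead of writing it out.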

2. To run ChIPseeqerCons, type the command:

ChIPseeqerCons --targets=TF_targets.txt --consdir=placentalMammals/ \
  --outfile=TF_targets_conservation.txt \
  --format=gzscores --category=placental --outepsmap=TF_targets_conservation.txt.eps

The following options are available:

--targets=FILE      file containing genomic regions
--consdir=DIR       directory that contains the conservation scores (one file per chromosome, e.g. chr9.phastCons44way.placental.wigFix.gz)
--genome=STR        hg18, hg19, or mm9
--chrdata=FILE      to run for different organisms, point to files:
                              DATA/hg18.chrdata for human hg18,
                              DATA/hg19.chrdata for human hg19, 
                              DATA/mm9.chrdata for mouse
--format=STR        gzscores (default), scores or nucleosomes
--outfile=STR       output file for the conservation of peaks
--outrandom=STR     output file for the conservation of random regions
--outepsmap=STR     output file for the .eps 2D plot
--category=STR      can be either placental or primates. Default is placental.
--method=STR        can be either phastCons or phyloP. Default is phastCons.
--make_rand=INT     if set to 1, extracts random regions and estimates their conservation
--randist=INT       distance of random regions from the peaks. Used when make_rand=1
--show_profiles=INT if set to 0, gives average/min/max conservation per peak, 
                    if set to 1, gives average conservation for each $window_size bp window of the peak
--around_summit=INT if set to 1, extracts regions around peak summit and computes conservation. Used when show_profiles=1
--window_size=INT   Sets the window size for estimating the avg conservation profile. Default is 10bp. Used when show_profiles=1
--distance=INT      distance around the peak summit. Used when around_summit=1
--showalldata=INT   adds conservation info at end of input lines if set to 1
--score_thres=FLOAT Sets the threshold for the conservation score: peaks with score>score_thres are printed in the .filter file.
                    Default is set to 0.5 
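The profile-related options above depend on one another (--window_size and --around_summit apply when show_profiles=1, and --distance applies when around_summit=1), so it can help to see them combined. The sketch below uses only flags from the list above; the wrapper name cons_profiles, the DRY_RUN switch, the output file names, and the values 1000 and 10 are assumptions chosen for illustration.

```shell
# Hypothetical wrapper around ChIPseeqerCons for per-window profiles
# centered on peak summits. With DRY_RUN=1 it only prints the command.
cons_profiles() {
  targets=$1    # file containing genomic regions
  consdir=$2    # directory with the per-chromosome score files
  set -- ChIPseeqerCons \
    --targets="$targets" \
    --consdir="$consdir" \
    --format=gzscores --category=placental \
    --show_profiles=1 --around_summit=1 \
    --distance=1000 --window_size=10 \
    --outfile="${targets%.txt}_profiles.txt" \
    --outepsmap="${targets%.txt}_profiles.txt.eps"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

Setting `DRY_RUN=1` and then calling `cons_profiles TF_targets.txt placentalMammals/` prints the assembled command so it can be inspected before a long run.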

3. See the results.

The --show_profiles=0 output consists of the average, minimum and maximum phastCons conservation score for each region:

chr-start-end	avgcons	mincons	maxcons
chrX 3070597 3070914	0.009	0.000	0.072
chrX 6666849 6667467	0.075	0.000	1.000
chrX 6705746 6706023	0.004	0.000	0.071
chrX 6987397 6987900	0.010	0.000	0.227
chrX 7011802 7012388	0.053	0.000	0.786
chrX 7017002 7017327	0.020	0.000	0.208
chrX 9269622 9270263	0.356	0.000	1.000
...
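The per-region table can be post-processed with standard tools. The sketch below keeps regions whose average conservation exceeds a cutoff; it is separate from the .filter file that ChIPseeqerCons writes via --score_thres, and filter_conserved is a hypothetical helper name. It assumes the last three whitespace-separated fields of each line are avgcons, mincons and maxcons, as in the listing above.

```shell
# Hypothetical helper: print regions whose average conservation is above
# a cutoff (second argument, default 0.5). The header line and any
# non-numeric lines are skipped.
filter_conserved() {
  awk -v thres="${2:-0.5}" '
    # avgcons is the third field from the end of the line.
    NF >= 3 && $(NF-2) ~ /^[0-9.]+$/ && $(NF-2) + 0 > thres + 0
  ' "$1"
}
```

For example, `filter_conserved TF_targets_conservation.txt 0.3` would print only the regions with an average score above 0.3.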

The --show_profiles=1 output shows the average conservation for each window of n nucleotides (n can be set with the --window_size option):

chr7-133847623-133849623	0.0	0.1	0.2	0.3	0.3	0.2	0.3	0.1	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.1	0.2	0.3	0.1	0.0	0.0	0.0	0.0	0.1	0.1	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.1	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.1	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.8	0.8	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.1	0.1	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
chr7-99516386-99518386	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.1	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.1	0.0	0.1	0.4	0.0	0.0	0.0	0.1	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.6	0.5	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.1	0.8	0.3	0.2	0.1	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.1	0.3	0.0	0.1	0.0	0.0	0.3	0.8	0.0	0.8	1.0	0.4	0.1	0.4	0.5	0.8	1.0	0.9	1.0	0.9	0.1	1.0	0.5	0.0	0.0	0.2	0.7	0.8	0.6	0.6	0.5	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.1	0.1	0.0	0.0	0.0	0.1	0.0	0.5	0.9	0.6	0.1	0.0	0.0	0.0	0.0	0.0	0.1	0.1	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.1	0.1	0.1	0.1	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.1	0.2	0.1	0.0	0.0	0.1	0.1	0.1	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.1	0.1	0.1	0.1	0.1	0.0	0.1	0.1

When --show_profiles=1, the output also includes an .eps file that contains a 2D plot of the conservation score (averaged per column/bin). The plot will look like this:

[Figure: Average Conservation Profile example]
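The per-column averaging behind such a plot can be approximated directly from the --show_profiles=1 text output. The sketch below is not the code ChIPseeqerCons uses to draw the .eps file; column_means is a hypothetical helper that assumes tab-separated rows with the region name in the first column and one score per window after it.

```shell
# Hypothetical helper: average each window column across all regions.
# Prints "window_index<TAB>mean" with window indices starting at 1.
column_means() {
  awk -F'\t' '
    {
      # Field 1 is the region name; fields 2..NF are per-window scores.
      for (i = 2; i <= NF; i++) { sum[i] += $i; n[i]++ }
      if (NF > maxnf) maxnf = NF
    }
    END {
      for (i = 2; i <= maxnf; i++)
        printf "%d\t%.3f\n", i - 1, sum[i] / n[i]
    }
  ' "$1"
}
```

Each column is divided by the number of rows that actually contain it, so rows of unequal length do not distort the means. The two-column result can be loaded into any plotting tool.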
