Elementolab/ChIPseeqer use


This step assumes that mapped read files have been split into one file per chromosome (in directories named CHIP/ and INPUT/ - other directory names can be used). You are then ready to run ChIPseeqer and find peaks in your ChIP-seq data.

To run the tools directly from any folder, add $CHIPSEEQERDIR and $CHIPSEEQERDIR/SCRIPTS to your $PATH variable. See How to set the CHIPSEEQERDIR variable.
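As a quick sanity check before running anything, the sketch below prepends the two directories to the search path of the current Python process and reports which tools still cannot be found. The helper names chipseeqer_path_entries and missing_tools are hypothetical, and the sketch assumes CHIPSEEQERDIR is already exported in your environment; it does not modify your shell configuration.

```python
import os
import shutil


def chipseeqer_path_entries(chipseeqer_dir):
    """Return the two directories the tutorial asks you to add to PATH."""
    return [chipseeqer_dir, os.path.join(chipseeqer_dir, "SCRIPTS")]


def missing_tools(tools, path=None):
    """Return the tools that cannot be found on PATH (or on the given path)."""
    return [tool for tool in tools if shutil.which(tool, path=path) is None]


# Assumption: CHIPSEEQERDIR is set as described in the tutorial.
chipseeqer_dir = os.environ.get("CHIPSEEQERDIR")
if chipseeqer_dir:
    entries = chipseeqer_path_entries(chipseeqer_dir)
    # Prepend for this process only; the shell's own PATH is unchanged.
    os.environ["PATH"] = os.pathsep.join(entries + [os.environ.get("PATH", "")])

print("Missing tools:", missing_tools(["ChIPseeqer.bin"]))
```

An empty list means ChIPseeqer.bin can be started from any folder.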

1. Type the command:

ChIPseeqer.bin -chipdir CHIP/ -inputdir INPUT/ -t 15 -fold_t 2 -format eland -chrdata DATA/hg18.chrdata -outfile TF_targets.txt  

The following options are available:

-chipdir DIR      (MANDATORY) Define the directory that contains the ChIP reads (one file per chromosome, named reads.chr1, reads.chr2, etc).
-inputdir DIR     Define the directory that contains the input DNA reads.
-outfile STR      (MANDATORY) Define the output file (provide full path).
-chrdata STR      (MANDATORY) To run for different organisms, point to files:
                  DATA/hg19.chrdata for human hg19,                   
                  DATA/hg18.chrdata for human hg18, 
                  DATA/mm10.chrdata for mouse mm10,                    
                  DATA/mm9.chrdata for mouse mm9,
                  DATA/rn4.chrdata for rat rn4,
                  DATA/dm3.chrdata for Drosophila dm3, or
                  DATA/sacser.chrdata for Saccharomyces cerevisiae
-t FLOAT          Define the significance threshold for peaks, given as a negative log10 p-value [ratio]. Thus, 15 means 10^-15. Default is 15.
-fold_t FLOAT     Define how much higher ChIP peaks should be compared to input DNA peaks. Default is 2.0.
-fraglen INT      Define length of the fragments whose extremities have been sequenced. Default is 170bp.
-readlen INT      Define length of reads (e.g., 36bp, 50bp). Default is 36bp.
-format STR       Define format of the read files: bam, sam, eland, exteland, export, bed. Default is eland.
-minlen INT       Define minimum peak width. Default is 100bp.
-mindist INT      Define minimum distance between peaks (subpeaks closer than this are merged). Default is 100bp.
-countreads INT   When set to 1, creates a file that contains the number of reads per position. Default is 0.
-uniquereads INT  When set to 1, removes clonal reads (i.e., when several identical reads map to the same exact position in the genome). Default is 1.
-minpeakheight INT Define minimum peak height (read count at the peak summit). Default is 0 (no minimum read count - the statistics decide).
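If you launch ChIPseeqer from a script, the options above can be assembled programmatically. The following is a minimal sketch, not part of ChIPseeqer itself: the function build_chipseeqer_command is hypothetical, it only uses the flags listed above, and it builds the argument list without running anything.

```python
import shlex

# Optional flags taken from the option list above.
OPTIONAL_FLAGS = {
    "t", "fold_t", "fraglen", "readlen", "format", "minlen",
    "mindist", "countreads", "uniquereads", "minpeakheight",
}


def build_chipseeqer_command(chipdir, outfile, chrdata, inputdir=None, **options):
    """Build the ChIPseeqer.bin argument list from the documented options."""
    unknown = set(options) - OPTIONAL_FLAGS
    if unknown:
        raise ValueError("Unknown option(s): " + ", ".join(sorted(unknown)))
    cmd = ["ChIPseeqer.bin", "-chipdir", chipdir]
    if inputdir is not None:
        cmd += ["-inputdir", inputdir]
    for name, value in options.items():
        cmd += ["-" + name, str(value)]
    cmd += ["-chrdata", chrdata, "-outfile", outfile]
    return cmd


cmd = build_chipseeqer_command(
    "CHIP/", "TF_targets.txt", "DATA/hg18.chrdata",
    inputdir="INPUT/", t=15, fold_t=2, format="eland",
)
print(shlex.join(cmd))
```

The printed line reproduces the command from step 1. To actually run it, the list could be passed to subprocess.run(cmd, check=True), assuming ChIPseeqer.bin is on your PATH.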

2. See the results.

Open the output file (e.g., TF_targets.txt). The results will look like this:

chr18	341	508	-7.307	7.979	346	12	3	167	424	78
chr4	7899	8032	-6.3998	6.5465	7934	9	26.1	133	7965	31
chr1	10970	11406	-20.9998	17.1502	11121	34	34.6	436	11188	67
chr8	10991	11239	-7.9287	8.0375	11107	12	46.6	248	11115	8
chr1	11712	11937	-11.8597	11.2954	11813	16	44.7	225	11824	11
chr2	18437	18778	-6.3246	7.0375	18437	10	0	341	18607	170

Each row represents a peak location, whereas the columns indicate:

Chromosome		: chromosome name
Start_Position		: the first (genomic) coordinate of the peak
End_Position		: the last (genomic) coordinate of the peak
Avg_p-value		: average log p-value of the nucleotides in the normalized peak region
Score			: score estimated as the average ChIP reads/length of peak, minus the average INPUT reads/length of peak if INPUT is available
Posmaxpeakheight   	: the position of the maximum height of the peak
Maxpeakheight		: the maximum peak height (in reads)
RelPosMaxPeakHeight(%)	: the relative position of the maximum height of the peak, e.g., 50% means the highest point is at the middle of the peak
Peak_Size		: the size of the peak (in bp)
Mid_point		: the middle position of the peak
Summit_dist_from_mid 	: the distance of the maximum height from the middle of the peak
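For downstream scripting, the eleven columns can be read into named records. This is a minimal sketch under a few assumptions: the Peak field names are our own labels for the columns described above, fields are separated by whitespace, and the peak height is parsed as a float in case it is not always a whole number. The sample rows are the ones shown in step 2.

```python
from typing import Iterable, List, NamedTuple


class Peak(NamedTuple):
    chrom: str
    start: int
    end: int
    avg_log_pvalue: float
    score: float
    summit_pos: int
    summit_height: float
    rel_summit_pos_pct: float
    size: int
    midpoint: int
    summit_dist_from_mid: int


def parse_peaks(lines: Iterable[str]) -> List[Peak]:
    """Parse ChIPseeqer output rows; rows without 11 fields are skipped."""
    peaks = []
    for line in lines:
        f = line.split()  # splits on tabs or spaces
        if len(f) != 11:
            continue
        peaks.append(Peak(
            f[0], int(f[1]), int(f[2]), float(f[3]), float(f[4]),
            int(f[5]), float(f[6]), float(f[7]),
            int(f[8]), int(f[9]), int(f[10]),
        ))
    return peaks


sample = """\
chr18 341 508 -7.307 7.979 346 12 3 167 424 78
chr4 7899 8032 -6.3998 6.5465 7934 9 26.1 133 7965 31
chr1 10970 11406 -20.9998 17.1502 11121 34 34.6 436 11188 67
chr8 10991 11239 -7.9287 8.0375 11107 12 46.6 248 11115 8
chr1 11712 11937 -11.8597 11.2954 11813 16 44.7 225 11824 11
chr2 18437 18778 -6.3246 7.0375 18437 10 0 341 18607 170
""".splitlines()

peaks = parse_peaks(sample)
tallest = max(peaks, key=lambda p: p.summit_height)
print(tallest.chrom, tallest.start, tallest.end, tallest.summit_height)
```

To read a real result file instead of the sample, pass an open file handle, e.g. parse_peaks(open("TF_targets.txt")).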

3. What can I do next?

4. Peak detection (-gene extraction) for multiple thresholds
