Elementolab/ChIPseeqer use
ChIPseeqer
This step assumes that the mapped read files have been split into one file per chromosome, for example in directories named CHIP/ and INPUT/ (other directory names can be used). You are then ready to run ChIPseeqer and find peaks in your ChIP-seq data.
To run the tools directly from any folder, add $CHIPSEEQERDIR and $CHIPSEEQERDIR/SCRIPTS to your $PATH variable. See How to set the CHIPSEEQERDIR variable.
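A minimal sketch of that PATH setup for a bash-like shell, assuming ChIPseeqer was unpacked in $HOME/ChIPseeqer (a hypothetical location used only for illustration; substitute your own install directory):

```shell
# Hypothetical install location: replace with the directory where ChIPseeqer lives.
export CHIPSEEQERDIR="$HOME/ChIPseeqer"

# Make the binaries and the helper scripts callable from any folder.
export PATH="$CHIPSEEQERDIR:$CHIPSEEQERDIR/SCRIPTS:$PATH"
```

Placing both lines in your shell startup file (for example ~/.bashrc) should keep the setting across sessions.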
1. Type the command:
ChIPseeqer.bin -chipdir CHIP/ -inputdir INPUT/ -t 15 -fold_t 2 -format eland -chrdata DATA/hg18.chrdata -outfile TF_targets.txt
The following options are available:
-chipdir DIR (MANDATORY) Define the directory that contains the ChIP reads (one file per chromosome, named reads.chr1, reads.chr2, etc).
-inputdir DIR Define the directory that contains the input DNA reads.
-outfile STR (MANDATORY) Define the output file (provide full path).
-chrdata STR (MANDATORY) To run for different organisms, point to files:
DATA/hg19.chrdata for human hg19,
DATA/hg18.chrdata for human hg18,
DATA/mm10.chrdata for mouse mm10,
DATA/mm9.chrdata for mouse mm9,
DATA/rn4.chrdata for rat rn4,
DATA/dm3.chrdata for Drosophila dm3, or
DATA/sacser.chrdata for Saccharomyces cerevisiae.
-t FLOAT Define the significance threshold for peaks, expressed as a negative log p-value [ratio]. Thus, 15 means 10^-15. Default is 15.
-fold_t FLOAT Define how much higher ChIP peaks should be compared to input DNA peaks. Default is 2.0.
-fraglen INT Define length of the fragments whose extremities have been sequenced. Default is 170bp.
-readlen INT Define length of reads (e.g., 36bp, 50bp). Default is 36bp.
-format STR Define format of the read files: bam, sam, eland, exteland, export, bed. Default is eland.
-minlen INT Define minimum peak width. Default is 100bp.
-mindist INT Define minimum distance between peaks (merge subpeaks otherwise). Default is 100bp.
-countreads INT When set to 1, creates a file that contains the number of reads per position. Default is 0.
-uniquereads INT When set to 1, removes clonal reads (i.e., when several identical reads map to the same exact position in the genome). Default is 1.
-minpeakheight INT Define minimum peak height (reads count at the peak summit). Default is 0 (no minimum read count - the statistics decide).
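As an illustration of how these options combine, the sketch below assembles a call for BAM-format mouse (mm9) reads with a stricter threshold and a longer fragment length. The function name run_chipseeqer_mm9, the directory names, the output path, and the parameter values are placeholders chosen for this example, not recommended settings; only the flags themselves come from the list above. The command line is printed first and executed only if ChIPseeqer.bin is found on the PATH.

```shell
# Print the ChIPseeqer command line, then run it if the binary is available.
# Arguments: $1 = ChIP read directory, $2 = input read directory,
#            $3 = output file (full path, as -outfile requires).
run_chipseeqer_mm9() {
    # Replace the positional parameters with the full command,
    # one word per option or value (example values, not recommendations).
    set -- ChIPseeqer.bin \
        -chipdir "$1" -inputdir "$2" \
        -format bam -chrdata DATA/mm9.chrdata \
        -t 20 -fold_t 2 -fraglen 200 -readlen 50 \
        -minpeakheight 5 \
        -outfile "$3"

    # Show the exact command before running it.
    echo "$*"

    if command -v ChIPseeqer.bin >/dev/null 2>&1; then
        "$@"
    else
        echo "ChIPseeqer.bin not found on PATH; command not run" >&2
    fi
}

run_chipseeqer_mm9 CHIP/ INPUT/ "$PWD/TF_targets_mm9.txt"
```

Printing the command before running it makes it easy to check the option values, and to copy the line into a log of the analysis.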
2. See the results.
Open the output file (e.g., TF_targets.txt). The results will look like this:
chr18 341 508 -7.307 7.979 346 12 3 167 424 78
chr4 7899 8032 -6.3998 6.5465 7934 9 26.1 133 7965 31
chr1 10970 11406 -20.9998 17.1502 11121 34 34.6 436 11188 67
chr8 10991 11239 -7.9287 8.0375 11107 12 46.6 248 11115 8
chr1 11712 11937 -11.8597 11.2954 11813 16 44.7 225 11824 11
chr2 18437 18778 -6.3246 7.0375 18437 10 0 341 18607 170
Each row represents a peak location, whereas the columns indicate:
Chromosome : chromosome name
Start_Position : the first (genomic) coordinate of the peak
End_Position : the second (genomic) coordinate of the peak
Avg_p-value : average log p-value of the nucleotides in the normalized peak region
Score : score estimated as the average ChIP reads/length of peak (minus the average INPUT reads/length of peak - if INPUT is available)
Posmaxpeakheight : the position of the maximum height of the peak
Maxpeakheight : the maximum peak height (in reads)
RelPosMaxPeakHeight(%) : the relative position of the maximum height of the peak, e.g., 50% means the highest point is at the middle of the peak
Peak_Size : the size of the peak (in bp)
Mid_point : the middle position of the peak
Summit_dist_from_mid : the distance of the maximum height from the middle of the peak
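Because the output is a plain column-based text file, standard command-line tools can be used to inspect it. The sketch below assumes whitespace-separated columns in the order listed above and no header line (as in the sample shown). The helper names top_peaks and check_peak_sizes, the file name sample_peaks.txt, and the height cutoff of 12 reads are arbitrary choices for illustration and are not part of ChIPseeqer.

```shell
# Recreate the six sample rows shown above in a scratch file.
cat > sample_peaks.txt <<'EOF'
chr18 341 508 -7.307 7.979 346 12 3 167 424 78
chr4 7899 8032 -6.3998 6.5465 7934 9 26.1 133 7965 31
chr1 10970 11406 -20.9998 17.1502 11121 34 34.6 436 11188 67
chr8 10991 11239 -7.9287 8.0375 11107 12 46.6 248 11115 8
chr1 11712 11937 -11.8597 11.2954 11813 16 44.7 225 11824 11
chr2 18437 18778 -6.3246 7.0375 18437 10 0 341 18607 170
EOF

# Keep peaks whose Maxpeakheight (column 7) is at least $2 reads,
# then rank them by Score (column 5), highest first.
top_peaks() {
    awk -v min_height="$2" '$7 >= min_height' "$1" | LC_ALL=C sort -k5,5nr
}

# Count rows where Peak_Size (column 9) differs from End_Position - Start_Position.
check_peak_sizes() {
    awk '$9 != $3 - $2 { bad++ } END { print bad + 0, "inconsistent rows" }' "$1"
}

top_peaks sample_peaks.txt 12
check_peak_sizes sample_peaks.txt
```

On the six sample rows, four peaks pass the height filter, with the chr1 peak at 10970-11406 ranked first, and the size check reports 0 inconsistent rows.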
3. What can I do next?
- Associate the peaks with genes and transcripts. This is explained in Elementolab/ChIPseeqer_Annotate.
- Find pathways associated with these peaks. This is explained in Elementolab/ChIPseeqer_2iPAGE.
- Identify regulatory elements associated with the peaks (and not present in random peaks). This is explained in Elementolab/ChIPseeqer2FIRE.
4. Peak detection (-gene extraction) for multiple thresholds
