Elementolab/CompareIntervals

COMPARE INTERVALS

This analysis compares two ChIPseeqer outputs (i.e., two lists of ChIP-seq peaks) and finds the peaks that overlap. The program uses efficient data structures and algorithms (e.g., interval trees) to speed up the comparison.

By default, for each peak in the first file (peakfile1), the program outputs the number of peaks in the second file (peakfile2) that it overlaps; 0 means the peak overlaps none.

To run the tools directly from any folder, add $CHIPSEEQERDIR and $CHIPSEEQERDIR/SCRIPTS to your $PATH. See How to set the CHIPSEEQERDIR variable.
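As a quick sanity check of the setup described above, the following sketch reports which of the two directories are still missing from $PATH and whether the CompareIntervals executable can be found. The helper names (chipseeqer_path_entries, missing_from_path) are hypothetical and are not part of ChIPseeqer.

```python
import os
import shutil


def chipseeqer_path_entries(chipseeqer_dir):
    """Return the two directories the tutorial asks you to add to PATH."""
    return [chipseeqer_dir, os.path.join(chipseeqer_dir, "SCRIPTS")]


def missing_from_path(chipseeqer_dir, path_value):
    """List the required directories that do not appear in path_value."""
    on_path = path_value.split(os.pathsep)
    return [d for d in chipseeqer_path_entries(chipseeqer_dir) if d not in on_path]


# Hypothetical helper usage: read the real environment and report.
chipseeqer_dir = os.environ.get("CHIPSEEQERDIR", "")
if chipseeqer_dir:
    print("Missing from PATH:", missing_from_path(chipseeqer_dir, os.environ.get("PATH", "")))
else:
    print("CHIPSEEQERDIR is not set")

# shutil.which returns None when the executable is not on PATH.
print("CompareIntervals found at:", shutil.which("CompareIntervals"))
```

The comparison is a plain string match, so a PATH entry written with a trailing slash would be reported as missing.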

1. To run the program, type the command:

CompareIntervals -peakfile1 peaks1.txt -peakfile2 peaks2.txt

peaks1.txt and peaks2.txt should be in the ChIPseeqer output format so that overlapping peaks between ChIP-seq runs can be found. The output will resemble this:

chr14	34660637 	34660840	1
chr14	34695128 	34695345	0
chr14	34878903 	34879044	0
chr14	34908929 	34909071	0
chr14	34942098 	34942405	1
chr14	34942685 	34943077	2
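To illustrate what the last column means, here is a minimal Python sketch that counts, for each peak in one list, how many peaks in a second list it overlaps. It is not the CompareIntervals implementation: instead of an interval tree it uses sorted coordinate lists with binary search, and it assumes that two peaks overlap when they share at least one position (closed intervals). The peakfile2 coordinates are the ones that appear in the -show_ov_int example on this page; the function names are hypothetical.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def build_index(peaks):
    """Group (chrom, start, end) peaks by chromosome into sorted start and end lists."""
    starts, ends = defaultdict(list), defaultdict(list)
    for chrom, start, end in peaks:
        starts[chrom].append(start)
        ends[chrom].append(end)
    return ({c: sorted(v) for c, v in starts.items()},
            {c: sorted(v) for c, v in ends.items()})


def count_overlaps(peak, index):
    """Count indexed peaks that share at least one position with `peak`."""
    chrom, start, end = peak
    starts, ends = index
    if chrom not in starts:
        return 0
    begun = bisect_right(starts[chrom], end)     # peaks starting at or before our end
    finished = bisect_left(ends[chrom], start)   # peaks ending before our start
    return begun - finished


peaks1 = [
    ("chr14", 34660637, 34660840),
    ("chr14", 34695128, 34695345),
    ("chr14", 34878903, 34879044),
    ("chr14", 34908929, 34909071),
    ("chr14", 34942098, 34942405),
    ("chr14", 34942685, 34943077),
]
peaks2 = [
    ("chr14", 34660293, 34662279),
    ("chr14", 34942145, 34942875),
    ("chr14", 34943026, 34943658),
]

index = build_index(peaks2)
counts = [count_overlaps(p, index) for p in peaks1]
print(counts)  # [1, 0, 0, 0, 1, 2]
```

Every peak that ends before the query starts also starts before the query ends, so subtracting the two binary-search results leaves exactly the overlapping peaks.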

Many options are available. To also list the peaks in peakfile2 that overlap each peakfile1 peak, use the -show_ov_int 1 option:

CompareIntervals -peakfile1 peaks1.txt -peakfile2 peaks2.txt -show_ov_int 1
 
chr14	34660637 	34660840	1	chr14-34660293-34662279
chr14	34695128 	34695345	0
chr14	34878903 	34879044	0
chr14	34908929 	34909071	0
chr14	34942098 	34942405	1	chr14-34942145-34942875
chr14	34942685 	34943077	2	chr14-34942145-34942875	chr14-34943026-34943658
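For downstream processing, the output above can be read into simple records. The sketch below assumes the layout shown in the example (whitespace-separated columns, with overlapping peaks written as chromosome-start-end); the function name parse_line is hypothetical.

```python
def parse_line(line):
    """Parse one line of -show_ov_int 1 output into a dictionary."""
    fields = line.split()  # tolerates tabs and stray spaces between columns
    chrom, start, end, count = fields[0], int(fields[1]), int(fields[2]), int(fields[3])
    overlaps = []
    for token in fields[4:]:
        # Split from the right so a chromosome name containing '-' stays intact.
        ov_chrom, ov_start, ov_end = token.rsplit("-", 2)
        overlaps.append((ov_chrom, int(ov_start), int(ov_end)))
    return {"peak": (chrom, start, end), "count": count, "overlaps": overlaps}


example = "chr14\t34942685 \t34943077\t2\tchr14-34942145-34942875\tchr14-34943026-34943658"
record = parse_line(example)
print(record["peak"], record["count"], record["overlaps"])
```

A line with a count of 0 has no extra columns, so its overlaps list is simply empty.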

To output only the peaks in peakfile1 that overlap with any peaks in peakfile2, use -output peaklist:

CompareIntervals -peakfile1 peaks1.txt -peakfile2 peaks2.txt -output peaklist

chr14	34660637 	34660840
chr14	34942098 	34942405
chr14	34942685 	34943077

(Note that this output contains fewer lines than the outputs above.)


The following options are also available:

-chrdata FILE          File containing the required information for the genome used. Can be DATA/hg18.chrdata, DATA/mm9.chrdata or DATA/dm3.chrdata. Default is hg18.chrdata.
-output STR            Output type. Accepts peaklist or profile.
-ovtype STR            Overlap type. Accepts AND or ANDNOT. AND is the default; ANDNOT, combined with -output peaklist, shows the peaks in peakfile1 that DO NOT overlap with any peaks in peakfile2.
-show_ov_int STR       When set to 1, shows the overlapping peaks from peakfile2.
-showpeakdesc STR      When set to 1, shows all of the peak information.
-outfile FILE          File to which the output is written (instead of the screen).
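The difference between AND and ANDNOT with -output peaklist can be sketched as a filter on the overlap counts. This is only an illustration of the selection logic as described above, not the program's code; select_peaks is a hypothetical name, and the rows reuse the peaks and counts from the default output example.

```python
def select_peaks(rows, ovtype="AND"):
    """Keep peaks with at least one overlap (AND) or with none (ANDNOT)."""
    if ovtype == "AND":
        return [peak for peak, count in rows if count > 0]
    if ovtype == "ANDNOT":
        return [peak for peak, count in rows if count == 0]
    raise ValueError("ovtype must be 'AND' or 'ANDNOT'")


# (peak, number of overlapping peakfile2 peaks), taken from the example output
rows = [
    (("chr14", 34660637, 34660840), 1),
    (("chr14", 34695128, 34695345), 0),
    (("chr14", 34878903, 34879044), 0),
    (("chr14", 34908929, 34909071), 0),
    (("chr14", 34942098, 34942405), 1),
    (("chr14", 34942685, 34943077), 2),
]

overlapping = select_peaks(rows, "AND")
non_overlapping = select_peaks(rows, "ANDNOT")
print(len(overlapping), len(non_overlapping))  # 3 3
```

With AND, the three selected peaks match the -output peaklist example; with ANDNOT, the remaining three peaks are returned.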


IMPORTANT: To go a step further and retrieve the genes associated with the peaks obtained from this analysis (e.g., genes whose promoters overlap any peaks), use [ChIPseeqer Summary].
