Elementolab/ChIPseeqerNongenicAnnotate

ChIPseeqerNongenicAnnotate

Ask which peaks overlap with repeats, CpG islands, segmental duplications, or ENCODE data.

IMPORTANT: Note that this script is currently available only for human.

To run the tools directly from any folder, you need to add $CHIPSEEQERDIR and $CHIPSEEQERDIR/SCRIPTS to your $PATH variable. See "How to set the CHIPSEEQERDIR variable".

1. Type the command:

ChIPseeqerNongenicAnnotate --peakfile=TF_targets.txt [ --prefix=TF_targets_ANN --type=RepMasker ]

The following options are available:

--peakfile=FILE File with ChIP-seq peaks.
--type=STR      Define the type of annotation: RepMasker, CpGislands, SegmentalDups, Encode. Default is RepMasker.
--genome=STR    hg19 (human)
                hg18 (human)
                mm10 (mouse)
                mm9 (mouse)
                rn4 (rat)
                dm3 (drosophila)
                sacser (Saccharomyces cerevisiae).
--TFname=STR    (type=Encode) Define the TF name you want to look for (e.g., --TFname=K562CTCF). Default is "all".
--iterations=INT (type=Encode) Define the number of background random regions, used to estimate the z-score of the overlap.
--verbose=INT   Verbose mode. Default is 0.

IMPORTANT: Note that in the --peakfile option you must enter a peak file in ChIPseeqer output format.
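The page does not say how the z-score for --iterations is computed. The following is a minimal sketch, assuming the usual definition: the observed overlap minus the mean overlap over the random background sets, divided by their standard deviation. The function name `overlap_z_score` and the sample numbers are hypothetical, and the tool itself may differ (for example, by using a population rather than a sample standard deviation).

```python
import statistics
from typing import Sequence


def overlap_z_score(observed: float, background: Sequence[float]) -> float:
    """Z-score of an observed overlap count against background overlap counts.

    `background` holds one overlap count per random background set, so its
    length plays the role of the --iterations value (an assumption).
    """
    if len(background) < 2:
        raise ValueError("need at least two background values")
    mean = statistics.mean(background)
    sd = statistics.stdev(background)  # sample standard deviation (assumed)
    if sd == 0:
        raise ValueError("background has zero variance; z-score is undefined")
    return (observed - mean) / sd


# Hypothetical numbers: 20 overlapping peaks observed, 5 background runs.
background_counts = [10, 12, 8, 11, 9]
z = overlap_z_score(20, background_counts)
print(f"z-score: {z:.3f}")
```

More background sets give a more stable estimate of the mean and standard deviation, at the cost of a longer run time.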

2. See the results. For each of the three provided annotation types (RepMasker, CpGislands and SegmentalDups), the output of this process is three files with the extensions:

_ALL.RM, .RM and .RM.stats
_ALL.CpG, .CpG and .CpG.stats
_ALL.DUP, .DUP and .DUP.stats

  • The files that end with _ALL.* will look like this:
chrY	2867287	2867611	0
chrY	2871627	2871971	2	chrY-2871327-2871629	chrY-2871816-2872114
chrY	2944779	2944956	1	chrY-2944529-2945113
chrY	5642923	5643407	2	chrY-5639836-5643224	chrY-5643229-5643472
chrY	6905840	6906263	0
chrY	6917898	6918357	1	chrY-6918198-6918315
chrY	6945356	6945877	0
chrY	7267607	7267819	1	chrY-7267753-7267949
chrY	7381389	7381767	2	chrY-7381323-7381478	chrY-7381493-7381672
chrY	7652223	7652533	0
chrY	7659894	7660062	1	chrY-7659990-7660187

Each row represents a detected ChIPseeqer peak, whereas the columns indicate:

Chromosome	Start_Position	End_Position	Number_of_peaks_found	[peaks_found]

  • The files that end with .RM, .CpG or .DUP will look like this:
chrY	2871627	2871971	2	chrY-2871327-2871629	chrY-2871816-2872114
chrY	2944779	2944956	1	chrY-2944529-2945113
chrY	5642923	5643407	2	chrY-5639836-5643224	chrY-5643229-5643472
chrY	6917898	6918357	1	chrY-6918198-6918315
chrY	7267607	7267819	1	chrY-7267753-7267949
chrY	7381389	7381767	2	chrY-7381323-7381478	chrY-7381493-7381672
chrY	7659894	7660062	1	chrY-7659990-7660187

This file is a filtered version of the previous one: only the peaks that overlap with repeats, CpG islands or segmental duplications are shown, so column 4 is never 0.
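The relationship between the two files can be sketched in a few lines of Python. This is an illustrative reader for the tab-separated rows shown above, not part of ChIPseeqer; the names `AnnotatedPeak`, `parse_row` and `keep_overlapping` are hypothetical.

```python
from typing import Iterable, Iterator, List, NamedTuple


class AnnotatedPeak(NamedTuple):
    chrom: str
    start: int
    end: int
    n_overlaps: int      # column 4
    overlaps: List[str]  # columns 5 onward, one entry per overlapping region


def parse_row(line: str) -> AnnotatedPeak:
    """Split one tab-separated row into its columns."""
    fields = line.rstrip("\n").split("\t")
    chrom, start, end, count = fields[:4]
    return AnnotatedPeak(chrom, int(start), int(end), int(count), fields[4:])


def keep_overlapping(lines: Iterable[str]) -> Iterator[AnnotatedPeak]:
    """Yield only the peaks whose overlap count (column 4) is non-zero."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        peak = parse_row(line)
        if peak.n_overlaps > 0:
            yield peak


# First three rows of the _ALL.* example above.
sample = [
    "chrY\t2867287\t2867611\t0",
    "chrY\t2871627\t2871971\t2\tchrY-2871327-2871629\tchrY-2871816-2872114",
    "chrY\t2944779\t2944956\t1\tchrY-2944529-2945113",
]
filtered = list(keep_overlapping(sample))
for peak in filtered:
    print(peak.chrom, peak.start, peak.end, peak.n_overlaps, *peak.overlaps)
```

To read a real file, pass an open file handle to `keep_overlapping` instead of the `sample` list.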

IMPORTANT: The RepeatMasker output files also include the Repeat Name, Class and Family information for the repeats that overlap with the peaks. For example:

chrY	2871627	2871971	2	chrY-2871327-2871629:AluSx3 SINE SINE	chrY-2871816-2872114:Kanga1a DNA DNA
chrY	2944779	2944956	1	chrY-2944529-2945113:L1P1 LINE LINE
chrY	5642923	5643407	2	chrY-5639836-5643224:MER83B-int LTR LTR	chrY-5643229-5643472:HUERS-P1-int LTR LTR
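Each annotated entry in the example above can be split into its region and its repeat information. A minimal sketch, assuming the layout shown (region, a colon, then name, class and family separated by spaces); `RepeatHit` and `parse_repeat_hit` are hypothetical names, not part of ChIPseeqer.

```python
from typing import NamedTuple


class RepeatHit(NamedTuple):
    chrom: str
    start: int
    end: int
    name: str
    rep_class: str
    family: str


def parse_repeat_hit(entry: str) -> RepeatHit:
    """Parse one entry such as 'chrY-2871327-2871629:AluSx3 SINE SINE'."""
    region, annotation = entry.split(":", 1)
    # Split from the right so that any hyphen in the chromosome name survives.
    chrom, start, end = region.rsplit("-", 2)
    # Assumes exactly three whitespace-separated fields: name, class, family.
    name, rep_class, family = annotation.split()
    return RepeatHit(chrom, int(start), int(end), name, rep_class, family)


# Third row of the RepeatMasker example above.
row = (
    "chrY\t5642923\t5643407\t2"
    "\tchrY-5639836-5643224:MER83B-int LTR LTR"
    "\tchrY-5643229-5643472:HUERS-P1-int LTR LTR"
)
hits = [parse_repeat_hit(entry) for entry in row.split("\t")[4:]]
for hit in hits:
    print(hit.name, hit.rep_class, hit.family)
```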
  • The files that end with .stats summarize statistical information.

For the RepMasker option the .stats file will look like this:

 Number of peaks: 	 18814
 Number of peaks with Repeats: 	 9680
 %Repeats: 	 0.514510470925906

 Name of repeats distribution 
 MIRb: 	 967 	 (% 0.0668556415929204)
 MIR: 	 622 	 (% 0.0430033185840708)
 L2a: 	 548 	 (% 0.0378871681415929)
 L2c: 	 534 	 (% 0.0369192477876106)
 ...

 Class of repeats distribution
 SINE: 	 4814 	 (% 0.332826327433628)
 LINE: 	 4002 	 (% 0.276686946902655)
 LTR: 	 1859 	 (% 0.128525995575221)
 DNA: 	 1745 	 (% 0.12064435840708)
 Simple_repeat: 	 1124 	 (% 0.0777101769911504)
 Low_complexity: 	 679 	 (% 0.0469441371681416)
 tRNA: 	 79 	 (% 0.00546183628318584)
 Satellite: 	 42 	 (% 0.0029037610619469)

 Family of repeats distribution
 SINE: 	 4814 	 (% 0.332826327433628)
 LINE: 	 4002 	 (% 0.276686946902655)
 LTR: 	 1859 	 (% 0.128525995575221)
 DNA: 	 1745 	 (% 0.12064435840708)
 Simple_repeat: 	 1124 	 (% 0.0777101769911504)
 Low_complexity: 	 679 	 (% 0.0469441371681416)
 tRNA: 	 79 	 (% 0.00546183628318584)
 Satellite: 	 42 	 (% 0.0029037610619469)

For the CpGislands and the SegmentalDups options the .stats file will look like this:

Number of peaks			18814 
Number of Duplicates(/CpGs)	272
%Duplicates(/CpGs)		0.0144573190177527
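In the samples above, the values labelled % appear to be fractions rather than percentages: the number of overlapping peaks divided by the total number of peaks. A short check in Python, using only the numbers printed in the two samples (the helper name `overlap_fraction` is hypothetical):

```python
def overlap_fraction(n_overlapping: int, n_peaks: int) -> float:
    """Fraction of peaks that overlap at least one annotated region."""
    if n_peaks <= 0:
        raise ValueError("n_peaks must be positive")
    return n_overlapping / n_peaks


# Counts taken from the sample .stats files above.
repeat_fraction = overlap_fraction(9680, 18814)  # RepMasker sample
dup_fraction = overlap_fraction(272, 18814)      # CpGislands/SegmentalDups sample

# Format as a fraction and as a true percentage.
print(f"Repeats: {repeat_fraction:.4f} ({repeat_fraction:.1%})")
print(f"Duplicates(/CpGs): {dup_fraction:.4f} ({dup_fraction:.1%})")
```

Multiply the reported value by 100 if you need a percentage.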