Elementolab/ChIPseeqer Annotate

ChIPseeqerAnnotate

In this analysis you can search the detected peaks for overlaps with repeats (RepeatMasker), CpG islands or segmental duplications.

1. Go to the ChIPseeqer-1.0 directory:

$ cd ChIPseeqer-1.0/

2. Type the command:

$ ./ChIPseeqerAnnotate --targets=TF_targets.txt --prefix=TF_targets_ANN --type=RepMasker

The following options are available:

--targets=FILE file containing genomic regions
--suffix=STR   suffix for output files
--type=STR     can be RepMasker, CpGislands or SegmentalDups

IMPORTANT: The --targets option must be given the ChIPseeqer output file, i.e. the file of detected peaks.
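
To annotate the same peak file against all three annotation types, the command from step 2 can be repeated once per --type value. The sketch below only prints the commands (a dry run). The helper name annotate_all is hypothetical, and the flags are copied from the step 2 example rather than checked against the program itself.

```shell
# Print one ChIPseeqerAnnotate command per annotation type (dry run).
# annotate_all is a hypothetical helper, not part of ChIPseeqer.
annotate_all() {
    targets=$1   # ChIPseeqer output file with the detected peaks
    prefix=$2    # value passed to --prefix, as in the step 2 example
    for type in RepMasker CpGislands SegmentalDups; do
        printf './ChIPseeqerAnnotate --targets=%s --prefix=%s --type=%s\n' \
            "$targets" "$prefix" "$type"
    done
}

annotate_all TF_targets.txt TF_targets_ANN
```

Piping the printed lines to sh from inside the ChIPseeqer-1.0 directory would run the three commands in turn.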

3. See the results. Each run produces three output files, whose extensions depend on the annotation type given with --type:

RepMasker:      _ALL.RM, .RM and .RM.stats
CpGislands:     _ALL.CpG, .CpG and .CpG.stats
SegmentalDups:  _ALL.DUP, .DUP and .DUP.stats

  • The files that end with _ALL.* will look like this:
chrY	2867287	2867611	0
chrY	2871627	2871971	2	chrY-2871327-2871629	chrY-2871816-2872114
chrY	2944779	2944956	1	chrY-2944529-2945113
chrY	5642923	5643407	2	chrY-5639836-5643224	chrY-5643229-5643472
chrY	6905840	6906263	0
chrY	6917898	6918357	1	chrY-6918198-6918315
chrY	6945356	6945877	0
chrY	7267607	7267819	1	chrY-7267753-7267949
chrY	7381389	7381767	2	chrY-7381323-7381478	chrY-7381493-7381672
chrY	7652223	7652533	0
chrY	7659894	7660062	1	chrY-7659990-7660187

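Each row represents a detected ChIPseeqer peak. The columns are:

Chromosome	Start_Position	End_Position	Number_of_peaks_found	[peaks_found]

Column 4 counts the annotated regions (repeats, CpG islands or duplications) that overlap the peak, and each further column gives one such region as chromosome-start-end.

  • The files that end with .RM, .CpG or .DUP will look like this: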
chrY	2871627	2871971	2	chrY-2871327-2871629	chrY-2871816-2872114
chrY	2944779	2944956	1	chrY-2944529-2945113
chrY	5642923	5643407	2	chrY-5639836-5643224	chrY-5643229-5643472
chrY	6917898	6918357	1	chrY-6918198-6918315
chrY	7267607	7267819	1	chrY-7267753-7267949
chrY	7381389	7381767	2	chrY-7381323-7381478	chrY-7381493-7381672
chrY	7659894	7660062	1	chrY-7659990-7660187

This file is a filtered version of the previous one: only the peaks that overlap with repeats, CpG islands or duplications are kept, so column 4 is never 0.
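
The filtering described here can be reproduced from an _ALL.* file with a one-line awk test on column 4. This is a minimal sketch that assumes the files are tab-separated, as the samples suggest; the helper name filter_annotated is hypothetical.

```shell
# Keep only peaks whose fourth column (number of overlaps) is greater than 0.
# filter_annotated is a hypothetical helper; the input is assumed tab-separated.
filter_annotated() {
    awk -F '\t' '$4 > 0' "$1"
}

# Small sample in the _ALL.* layout.
sample=$(mktemp)
printf 'chrY\t2867287\t2867611\t0\n' > "$sample"
printf 'chrY\t2944779\t2944956\t1\tchrY-2944529-2945113\n' >> "$sample"
printf 'chrY\t6905840\t6906263\t0\n' >> "$sample"

filter_annotated "$sample"   # prints only the chrY 2944779 row
rm -f "$sample"
```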

IMPORTANT: The RepeatMasker output files also include the Repeat Name, Class and Family information for the repeats that overlap with the peaks. For example:

chrY	2871627	2871971	2	chrY-2871327-2871629:AluSx3 SINE SINE	chrY-2871816-2872114:Kanga1a DNA DNA
chrY	2944779	2944956	1	chrY-2944529-2945113:L1P1 LINE LINE
chrY	5642923	5643407	2	chrY-5639836-5643224:MER83B-int LTR LTR	chrY-5643229-5643472:HUERS-P1-int LTR LTR
chrY	6917898	6918357	1	chrY-6918198-6918315:AluJr SINE SINE
chrY	7267607	7267819	1	chrY-7267753-7267949:LTR36 LTR LTR
chrY	7381389	7381767	2	chrY-7381323-7381478:AluY SINE SINE	chrY-7381493-7381672:LTR43 LTR LTR
chrY	7659894	7660062	1	chrY-7659990-7660187:MIRb SINE SINE
chrY	8463843	8464207	2	chrY-8463743-8464149:LTR2 LTR LTR	chrY-8464149-8466429:HERVE_a-int LTR LTR
chrY	9067239	9067357	1	chrY-9065637-9067652:L1M4c LINE LINE
chrY	9082527	9082798	1	chrY-9082496-9082771:AluSz SINE SINE
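
Each annotation field in the RepeatMasker output packs the repeat coordinates together with its name, class and family. One rough way to unpack these into one row per peak/repeat pair is sketched below. It assumes tab-separated fields, with a colon followed by space-separated name, class and family, as in the sample above; the helper name expand_repeats is hypothetical.

```shell
# Emit one tab-separated row per overlapping repeat:
# peak chromosome, start, end, repeat coordinates, name, class, family.
# expand_repeats is a hypothetical helper; the field layout is assumed.
expand_repeats() {
    awk -F '\t' -v OFS='\t' '{
        for (i = 5; i <= NF; i++) {
            # Field looks like "chrY-2871327-2871629:AluSx3 SINE SINE".
            pos = index($i, ":")
            if (pos == 0) continue          # no repeat details in this field
            coords = substr($i, 1, pos - 1)
            split(substr($i, pos + 1), info, " ")
            print $1, $2, $3, coords, info[1], info[2], info[3]
        }
    }' "$1"
}

sample=$(mktemp)
printf 'chrY\t2871627\t2871971\t2\tchrY-2871327-2871629:AluSx3 SINE SINE\tchrY-2871816-2872114:Kanga1a DNA DNA\n' > "$sample"
expand_repeats "$sample"   # two rows: one for AluSx3, one for Kanga1a
rm -f "$sample"
```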

  • The files that end with .stats summarize statistical information. Values labelled % are fractions, not percentages (for example, 9680 / 18814 ≈ 0.5145). For the RepMasker option the .stats file will look like this:
 Number of peaks: 	 18814
 Number of peaks with Repeats: 	 9680
 %Repeats: 	 0.514510470925906

 Name of repeats distribution 
 MIRb: 	 967 	 (% 0.0668556415929204)
 MIR: 	 622 	 (% 0.0430033185840708)
 L2a: 	 548 	 (% 0.0378871681415929)
 L2c: 	 534 	 (% 0.0369192477876106)
 ...

 Class of repeats distribution
 SINE: 	 4814 	 (% 0.332826327433628)
 LINE: 	 4002 	 (% 0.276686946902655)
 LTR: 	 1859 	 (% 0.128525995575221)
 DNA: 	 1745 	 (% 0.12064435840708)
 Simple_repeat: 	 1124 	 (% 0.0777101769911504)
 Low_complexity: 	 679 	 (% 0.0469441371681416)
 tRNA: 	 79 	 (% 0.00546183628318584)
 Satellite: 	 42 	 (% 0.0029037610619469)

 Family of repeats distribution
 SINE: 	 4814 	 (% 0.332826327433628)
 LINE: 	 4002 	 (% 0.276686946902655)
 LTR: 	 1859 	 (% 0.128525995575221)
 DNA: 	 1745 	 (% 0.12064435840708)
 Simple_repeat: 	 1124 	 (% 0.0777101769911504)
 Low_complexity: 	 679 	 (% 0.0469441371681416)
 tRNA: 	 79 	 (% 0.00546183628318584)
 Satellite: 	 42 	 (% 0.0029037610619469)
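
A class tally of this kind can be approximated directly from a .RM file by counting the second word after the colon in every annotation field. This is only a sketch: it assumes the field layout of the RepeatMasker sample shown earlier, and it is not confirmed that ChIPseeqerAnnotate counts repeats in the same way, so the numbers need not match the .stats file exactly. The helper name class_counts is hypothetical.

```shell
# Count how often each repeat class appears in a .RM file,
# most frequent first. class_counts is a hypothetical helper.
class_counts() {
    awk -F '\t' '{
        for (i = 5; i <= NF; i++) {
            pos = index($i, ":")
            if (pos == 0) continue
            # After the colon: name, class, family (space-separated).
            split(substr($i, pos + 1), info, " ")
            count[info[2]]++
        }
    }
    END {
        for (c in count) printf "%s\t%d\n", c, count[c]
    }' "$1" | sort -t "$(printf '\t')" -k2,2nr -k1,1
}

sample=$(mktemp)
printf 'chrY\t2871627\t2871971\t2\tchrY-2871327-2871629:AluSx3 SINE SINE\tchrY-2871816-2872114:Kanga1a DNA DNA\n' > "$sample"
printf 'chrY\t6917898\t6918357\t1\tchrY-6918198-6918315:AluJr SINE SINE\n' >> "$sample"
class_counts "$sample"   # SINE 2, then DNA 1
rm -f "$sample"
```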

For the CpGislands and SegmentalDups options, the .stats file will look like this:

Number of peaks			18814 
Number of Duplicates(/CpGs)	272
%Duplicates(/CpGs)		0.0144573190177527
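
The three summary values can be cross-checked from any _ALL.* file: count all rows, count the rows with a non-zero fourth column, and divide the second count by the first (272 / 18814 ≈ 0.0145 in the sample above). The sketch assumes tab-separated input and that the second value counts peaks with at least one overlap, as in the RepMasker file. The helper name summarize_all and its output labels are hypothetical, not the program's own layout.

```shell
# Recompute peak totals and the annotated fraction from an _ALL.* file.
# summarize_all is a hypothetical helper; its labels are not ChIPseeqer's.
summarize_all() {
    awk -F '\t' '
        NF >= 4 { total++; if ($4 > 0) hit++ }
        END {
            printf "peaks\t%d\n", total
            printf "annotated\t%d\n", hit
            if (total > 0) printf "fraction\t%.4f\n", hit / total
        }' "$1"
}

sample=$(mktemp)
printf 'chrY\t2867287\t2867611\t0\n' > "$sample"
printf 'chrY\t2944779\t2944956\t1\tchrY-2944529-2945113\n' >> "$sample"
printf 'chrY\t6905840\t6906263\t0\n' >> "$sample"
printf 'chrY\t6945356\t6945877\t0\n' >> "$sample"
summarize_all "$sample"   # peaks 4, annotated 1, fraction 0.2500
rm -f "$sample"
```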