This analysis finds, for every peak that does not overlap a promoter, the closest "distant" gene (e.g., a gene that lies between 5kb and 50kb away from the peak). IMPORTANT: the distance is measured from the gene body, that is, from either the TSS or the TES.

In particular, the script:

  • defines the promoter regions (choose an appropriate upstream/downstream length)
  • finds the peaks that do not overlap with the promoters
  • for each of those non-overlapping peaks, finds the closest gene.
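
The three steps above can be sketched in Python as follows. This is only an illustration of the logic, not the ChIPseeqer implementation: the names Gene, Peak, promoter, gap_to_gene_body and closest_distant_genes are hypothetical, the strand is assumed to be inferable from the order of TSS and TES, and the exact distance convention used by the tool may differ from the simple gap computed here.

```python
from typing import List, NamedTuple, Tuple


class Gene(NamedTuple):
    name: str
    chrom: str
    tss: int
    tes: int


class Peak(NamedTuple):
    chrom: str
    start: int
    end: int


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if the closed intervals [a_start, a_end] and [b_start, b_end] intersect."""
    return a_start <= b_end and b_start <= a_end


def promoter(gene: Gene, lenu: int, lend: int) -> Tuple[int, int]:
    """Promoter window around the TSS (assumption: strand inferred from TSS/TES order)."""
    if gene.tss <= gene.tes:
        # plus strand: upstream means smaller coordinates
        return gene.tss - lenu, gene.tss + lend
    # minus strand: upstream means larger coordinates
    return gene.tss - lend, gene.tss + lenu


def gap_to_gene_body(peak: Peak, gene: Gene) -> int:
    """Gap in bp between the peak and the gene body; 0 if they overlap."""
    lo, hi = sorted((gene.tss, gene.tes))
    if peak.end < lo:
        return lo - peak.end
    if peak.start > hi:
        return peak.start - hi
    return 0


def closest_distant_genes(peaks: List[Peak], genes: List[Gene],
                          lenu: int = 2000, lend: int = 2000,
                          ext: int = 5000, distance: int = 50000):
    """Return (gene name, peak, gap) for each non-promoter peak that has a
    gene between `ext` and `distance` bp away from it."""
    results = []
    for peak in peaks:
        same_chrom = [g for g in genes if g.chrom == peak.chrom]
        # Steps 1 and 2: skip peaks that overlap any promoter window.
        if any(overlaps(peak.start, peak.end, *promoter(g, lenu, lend))
               for g in same_chrom):
            continue
        # Step 3: keep genes inside the distance window, then take the closest.
        candidates = [(gap_to_gene_body(peak, g), g) for g in same_chrom]
        candidates = [(d, g) for d, g in candidates if ext <= d <= distance]
        if candidates:
            d, g = min(candidates, key=lambda c: c[0])
            results.append((g.name, peak, d))
    return results
```

The default values mirror the example command (2000 bp promoter flanks, a 5000 to 50000 bp window). A real implementation would index the genes by chromosome and position instead of scanning the whole list for every peak.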

1. Go to the ChIPseeqer-1.0 directory

$ cd ChIPseeqer-1.0/

2. Type the command:

$ ./FindClosestPeak --targets=TF_targets.txt --lenu=2000 --lend=2000 --prefix=CLOSEST_PEAK --db=RefGene --distance=50000 --ext=5000

The following options are available:

--targets=FILE  file containing genomic regions
--lenu=INT      length upstream of TSS
--lend=INT      length downstream of TSS
--suffix=STR    suffix for output files
--db=STR        can be either RefGene or AceView. Default is RefGene
--ext=INT       minimum distance away from the TSS/TES
--distance=INT  maximum distance away from the TSS/TES

IMPORTANT: Note that in the --targets option you must enter the ChIPseeqer output file.

3. See the results. This process produces three output files: two with the extensions .NOV_PEAKS and .DIST_PEAKS, and a third that ends with .CLOSEST_PEAK_GENE.

  • The file that ends with .CLOSEST_PEAK_GENE contains the peaks that were found to be distant from the promoters, within the given distance range, together with their closest gene. This file will look like this:
NM_001145276	chrY	2863111	2910546	29234	chrY:2944779-2944956
NR_001552	chrY	7627397	7629288	17936	chrY:7652223-7652533
NR_001552	chrY	7627397	7629288	25607	chrY:7659894-7660062
NM_001005852	chrY	20188622	20211697	20492	chrY:20162667-20163131
NM_032576	chrY	20217829	20227085	20139	chrY:20192145-20192691
NM_004681	chrY	21146998	21164428	11606	chrY:21129893-21130393
NM_001039567	chrY	21327341	21352306	10065	chrY:21311854-21312277
NR_001537	chrY	22154873	22165940	5037	chrY:22175976-22176166
NM_021109	chrX	12903146	12905267	18108	chrX:12878964-12880039

Each row represents a gene-peak pair (a gene can appear in more than one row), and the columns indicate:

Transcript_name	Chromosome	TSS	TES	Distance_between_gene_and_peak	Peak
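
To use this file in downstream analysis, each row can be split into the six columns listed above. The following is a minimal sketch, assuming the file is tab-separated and that the Peak column always has the form chrom:start-end, as in the example rows. The names ClosestPeakRow and parse_row are hypothetical and are not part of ChIPseeqer.

```python
from typing import NamedTuple


class ClosestPeakRow(NamedTuple):
    transcript: str
    chrom: str
    tss: int
    tes: int
    distance: int
    peak_chrom: str
    peak_start: int
    peak_end: int


def parse_row(line: str) -> ClosestPeakRow:
    """Parse one tab-separated line of a .CLOSEST_PEAK_GENE file."""
    transcript, chrom, tss, tes, dist, peak = line.rstrip("\n").split("\t")
    # The peak is written as chrom:start-end, e.g. chrY:2944779-2944956
    peak_chrom, span = peak.rsplit(":", 1)
    peak_start, peak_end = span.split("-")
    return ClosestPeakRow(transcript, chrom, int(tss), int(tes), int(dist),
                          peak_chrom, int(peak_start), int(peak_end))


# First row of the example output shown above
example = "NM_001145276\tchrY\t2863111\t2910546\t29234\tchrY:2944779-2944956"
row = parse_row(example)
print(row.transcript, row.distance, row.peak_chrom, row.peak_start, row.peak_end)
```

Since a transcript can appear in several rows, grouping the parsed rows by their transcript field gives all distant peaks assigned to each gene.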
