Elementolab/FindClosestPeak
FindClosestPeak
This analysis finds, for every peak that does not overlap a promoter, the closest "distant" gene (e.g., a gene that is between 5kb and 50kb away from the peak). IMPORTANT: the distance is measured from the gene body, that is, from either the TSS or the TES.
In particular, the script:
- defines the promoter regions (choose an appropriate upstream/downstream length)
- finds the peaks that do not overlap with the promoters
- for each and every one of those non-overlapping peaks, finds the closest gene.
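The three steps above can be illustrated with a minimal Python sketch. This is not the ChIPseeqer implementation: the names Gene, Peak, promoter, and closest_distant_gene are hypothetical, and the sketch assumes that the distance is measured from the nearest end of the gene body and that the closest gene is chosen among those whose distance falls between the minimum (ext) and the maximum (distance).

```python
from typing import NamedTuple, Optional, Tuple, List


class Gene(NamedTuple):
    name: str
    chrom: str
    start: int   # leftmost coordinate of the gene body
    end: int     # rightmost coordinate of the gene body
    strand: str  # "+" or "-"


class Peak(NamedTuple):
    chrom: str
    start: int
    end: int


def promoter(gene: Gene, lenu: int, lend: int) -> Tuple[int, int]:
    """Step 1: promoter interval around the TSS (strand-aware)."""
    if gene.strand == "+":
        tss = gene.start
        return (tss - lenu, tss + lend)
    tss = gene.end  # on the minus strand the TSS is the rightmost coordinate
    return (tss - lend, tss + lenu)


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if the two closed intervals share at least one position."""
    return a_start <= b_end and b_start <= a_end


def gap_to_gene_body(peak: Peak, gene: Gene) -> int:
    """Distance from the peak to the nearest gene end (0 if they overlap)."""
    if peak.end < gene.start:
        return gene.start - peak.end
    if peak.start > gene.end:
        return peak.start - gene.end
    return 0


def closest_distant_gene(peak: Peak, genes: List[Gene], lenu: int, lend: int,
                         ext: int, distance: int) -> Optional[Tuple[Gene, int]]:
    same_chrom = [g for g in genes if g.chrom == peak.chrom]
    # Step 2: discard peaks that overlap any promoter.
    for g in same_chrom:
        p_start, p_end = promoter(g, lenu, lend)
        if overlaps(peak.start, peak.end, p_start, p_end):
            return None
    # Step 3 (assumption): closest gene among those within [ext, distance].
    best = None
    for g in same_chrom:
        d = gap_to_gene_body(peak, g)
        if ext <= d <= distance and (best is None or d < best[1]):
            best = (g, d)
    return best


# Made-up coordinates, for illustration only.
genes = [Gene("A", "chr1", 10000, 20000, "+"),
         Gene("B", "chr1", 100000, 110000, "-")]
print(closest_distant_gene(Peak("chr1", 30000, 30200), genes,
                           lenu=2000, lend=2000, ext=5000, distance=50000))
```

With these made-up coordinates the peak lies 10000 bp past the end of gene A and more than 50000 bp from gene B, so only gene A qualifies.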
1. Go to the ChIPseeqer-1.0 directory
$ cd ChIPseeqer-1.0/
2. Type the command:
$ ./FindClosestPeak --targets=TF_targets.txt --lenu=2000 --lend=2000 --prefix=CLOSEST_PEAK --db=RefGene --distance=50000 --ext=5000
The following options are available:
--targets=FILE    file containing genomic regions
--lenu=INT        length upstream of TSS
--lend=INT        length downstream of TSS
--suffix=STR      suffix for output files
--db=STR          can be either RefGene or AceView. Default is RefGene
--ext=INT         minimum distance away from the TSS/TES
--distance=INT    maximum distance away from the TSS/TES
IMPORTANT: Note that in the --targets option you must enter the ChIPseeqer output file.
3. See the results. The output of this process is three files, with the extensions .NOV_PEAKS, .DIST_PEAKS and .CLOSEST_PEAK_GENE.
- The file that ends with .CLOSEST_PEAK_GENE contains the peaks that were found to be distant from the promoters, within the given distance range, together with the closest gene. This file will look like this:
NM_001145276  chrY  2863111   2910546   29234  chrY:2944779-2944956
NR_001552     chrY  7627397   7629288   17936  chrY:7652223-7652533
NR_001552     chrY  7627397   7629288   25607  chrY:7659894-7660062
NM_001005852  chrY  20188622  20211697  20492  chrY:20162667-20163131
NM_032576     chrY  20217829  20227085  20139  chrY:20192145-20192691
NM_004681     chrY  21146998  21164428  11606  chrY:21129893-21130393
NM_001039567  chrY  21327341  21352306  10065  chrY:21311854-21312277
NR_001537     chrY  22154873  22165940  5037   chrY:22175976-22176166
NM_021109     chrX  12903146  12905267  18108  chrX:12878964-12880039
Each row represents a gene, whereas the columns indicate:
Transcript_name Chromosome TSS TES Distance_between_gene_and_peak Peak
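For downstream processing, a row in this six-column layout can be read with a few lines of Python. The sketch below assumes the columns are separated by whitespace (tabs or spaces) and that the peak is written as chrom:start-end, as in the example rows; ClosestPeakRow and parse_row are hypothetical names and not part of ChIPseeqer.

```python
from typing import NamedTuple


class ClosestPeakRow(NamedTuple):
    transcript: str
    chrom: str
    tss: int
    tes: int
    distance: int    # Distance_between_gene_and_peak, as reported in the file
    peak_chrom: str
    peak_start: int
    peak_end: int


def parse_row(line: str) -> ClosestPeakRow:
    """Parse one line of the output (assumes exactly six columns)."""
    transcript, chrom, tss, tes, distance, peak = line.split()
    peak_chrom, span = peak.split(":")      # "chrY:2944779-2944956"
    peak_start, peak_end = span.split("-")
    return ClosestPeakRow(transcript, chrom, int(tss), int(tes),
                          int(distance), peak_chrom,
                          int(peak_start), int(peak_end))


# First example row from the listing above.
row = parse_row("NM_001145276 chrY 2863111 2910546 29234 chrY:2944779-2944956")
print(row.transcript, row.distance, row.peak_start, row.peak_end)
```

Reading the whole file is then a matter of calling parse_row on each non-empty line.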