Elementolab/ChIPseeqerSummaryPromoters
ChIPseeqerSummaryPromoters
In this analysis you can create a file that contains a gene-based annotation of the detected peak locations.
In particular, the script:
- determines which promoter regions of the genes in refSeq overlap with the detected peaks
- extracts the NM transcript names for each of these genes from refSeq
- extracts the ORF name and description from RefLink
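The overlap test in the first step can be illustrated with a minimal Python sketch. It is not the tool's own code: the function names are hypothetical, and it assumes that the promoter window is built from the TSS using the upstream and downstream lengths, oriented by strand, and that a partial overlap counts as a hit (the sample output further down includes peaks that start before the reported window).

```python
def promoter_window(tss, strand, lenu, lend):
    """Return (start, end) of the promoter region around a TSS.

    Assumption: "upstream" and "downstream" are relative to the strand.
    """
    if strand == "+":
        return tss - lenu, tss + lend
    return tss - lend, tss + lenu


def overlapping_peaks(window, peaks):
    """Return the (start, end) peaks that overlap the window at least partially."""
    w_start, w_end = window
    return [(s, e) for s, e in peaks if s <= w_end and e >= w_start]


# Made-up coordinates, for illustration only.
window = promoter_window(tss=10_000, strand="+", lenu=2000, lend=1000)
peaks = [(7_500, 8_100), (9_900, 10_200), (11_500, 12_000)]
hits = overlapping_peaks(window, peaks)
print(window, hits)
```

Here the first two peaks overlap the window and the third lies entirely downstream of it.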
To run the tools directly from any folder, add $CHIPSEEQERDIR and $CHIPSEEQERDIR/SCRIPTS to your $PATH variable (see How to set the CHIPSEEQERDIR variable).
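If the tools are launched from a Python script rather than from an interactive shell, the same PATH setup can be prepared for the child process. The sketch below is only an illustration: the helper name and the install location "/opt/ChIPseeqer" are hypothetical.

```python
import os


def env_with_chipseeqer(chipseeqer_dir, base_env=None):
    """Return a copy of the environment with the ChIPseeqer folders first on PATH."""
    env = dict(os.environ if base_env is None else base_env)
    env["CHIPSEEQERDIR"] = chipseeqer_dir
    parts = [chipseeqer_dir, os.path.join(chipseeqer_dir, "SCRIPTS")]
    # Keep whatever was already on PATH, skipping empty entries.
    parts += [p for p in env.get("PATH", "").split(os.pathsep) if p]
    env["PATH"] = os.pathsep.join(parts)
    return env


# "/opt/ChIPseeqer" is a hypothetical install location.
env = env_with_chipseeqer("/opt/ChIPseeqer", base_env={"PATH": "/usr/bin"})
print(env["PATH"])
```

The returned dictionary can be passed as the env argument of subprocess.run.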
1. Type the command:
ChIPseeqerSummaryPromoters --targets=TF_targets.txt --lenu=2000 --lend=1000 --prefix=TF_targets_SUM --db=refSeq
The following options are available:
--peakfile=FILE   File with ChIP-seq peaks.
--lenu=INT        Define the length upstream of TSS.
--lend=INT        Define the length downstream of TSS.
--suffix=STR      Define a suffix for output files.
--genome=STR      hg19 (human)
                  hg18 (human)
                  mm10 (mouse)
                  mm9 (mouse)
                  rn4 (rat)
                  dm3 (drosophila)
                  sacser (Saccharomyces cerevisiae)
--db=STR          refSeq (available for hg19, hg18, mm10, mm9, rn4, dm3)
                  AceView (for hg19, hg18, mm9)
                  Ensembl (for hg19, hg18, mm10, mm9, rn4, dm3)
                  UCSCGenes (for hg19, hg18, mm10, mm9)
                  Default is refSeq.
--verbose=INT     Verbose mode. Default is 0.
IMPORTANT: Note that in the --targets option you must enter the ChIPseeqer output file.
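When the analysis is repeated for several target files, the command from step 1 can be assembled programmatically. The following is a sketch with a hypothetical helper name. It uses only the flags that appear in the example command of step 1, with the same values as defaults.

```python
import shlex


def build_command(targets, lenu=2000, lend=1000, prefix="TF_targets_SUM", db="refSeq"):
    """Build the argument list for the example command shown in step 1."""
    return [
        "ChIPseeqerSummaryPromoters",
        f"--targets={targets}",
        f"--lenu={lenu}",
        f"--lend={lend}",
        f"--prefix={prefix}",
        f"--db={db}",
    ]


cmd = build_command("TF_targets.txt")
# Print the command as it would be typed in a shell.
print(shlex.join(cmd))
```

The list can be handed to subprocess.run(cmd, check=True) once the tools are on the PATH.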
2. See the results. The output of this process is three files with the extensions _ALL.NM, .NM and .SUM
- The file that ends with _ALL.NM will look like this:
NM_201266 chr2 206254468 206255967 0
NR_027685 chr17 5262684 5264183 2 chr17-5262760-5263300 chr17-5264120-5264368
NM_001145290 chr11 124437222 124438721 1 chr11-124437939-124438230
NM_018087 chr1 54076264 54077763 1 chr1-54076648-54077169
NM_016252 chr2 32434599 32436098 0
NM_001012415 chr9 137730696 137732195 0
NM_181886 chr4 103967744 103969243 1 chr4-103968310-103968772
NM_001097595 chrX 52533130 52534629 0
Each row represents a transcript from refSeq, whereas the columns indicate:
TranscriptID Chromosome Transcription_Start_Position Transcription_End_Position Number_of_peaks_found [peaks_found]
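For downstream analysis, a row in this format can be split into its columns. The sketch below assumes whitespace-separated fields and peaks written as chromosome-start-end, as in the sample above. The class and function names are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class PromoterRow(NamedTuple):
    transcript_id: str
    chrom: str
    start: int
    end: int
    n_peaks: int
    peaks: List[Tuple[str, int, int]]


def parse_nm_line(line):
    """Parse one row of an _ALL.NM or .NM file into a PromoterRow."""
    fields = line.split()
    peaks = []
    for token in fields[5:]:
        # Split from the right so the chromosome name stays in one piece.
        chrom, start, end = token.rsplit("-", 2)
        peaks.append((chrom, int(start), int(end)))
    return PromoterRow(fields[0], fields[1], int(fields[2]),
                       int(fields[3]), int(fields[4]), peaks)


row = parse_nm_line(
    "NR_027685 chr17 5262684 5264183 2 "
    "chr17-5262760-5263300 chr17-5264120-5264368"
)
print(row.transcript_id, row.n_peaks, row.peaks)
```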
- The file that ends with .NM will look like this:
NM_001079559 chr11 62250898 62252397 1 chr11-62252383-62252996
NM_001143965 chr6 13436250 13437749 1 chr6-13436046-13436609
NM_024800 chr3 132227383 132228882 2 chr3-132227552-132227676 chr3-132228056-132228356
NM_003262 chr3 171166273 171167772 3 chr3-171166199-171166351 chr3-171166471-171166714 chr3-171166829-171167327
NM_001098536 chr12 6830545 6832044 1 chr12-6831636-6832665
NM_014712 chr16 30875115 30876614 1 chr16-30876052-30876909
NM_018465 chr9 5427361 5428860 2 chr9-5427196-5427574 chr9-5428733-5429129
NM_001135662 chr1 204010734 204012233 1 chr1-204010920-204011505
This file is a filtered version of the previous one: only the transcripts with at least one detected peak are shown (column 5 is never 0).
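The relationship between the two files can be reproduced with a few lines of Python. This is a sketch of the filtering rule only, assuming whitespace-separated fields with the peak count in column 5. It is not the code the tool itself runs, and the function name is hypothetical.

```python
def keep_rows_with_peaks(lines):
    """Keep only the rows whose fifth column (number of peaks) is above 0."""
    kept = []
    for line in lines:
        fields = line.split()
        if len(fields) >= 5 and int(fields[4]) > 0:
            kept.append(line)
    return kept


# Three rows taken from the _ALL.NM sample above.
all_rows = [
    "NM_201266 chr2 206254468 206255967 0",
    "NM_018087 chr1 54076264 54077763 1 chr1-54076648-54077169",
    "NM_016252 chr2 32434599 32436098 0",
]
with_peaks = keep_rows_with_peaks(all_rows)
print(with_peaks)
```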
- The file that ends with .SUM (only in the refSeq case) will look like this:
FBXO38 F-box protein 38 isoform b chr5 147742738 147744237 1 chr5-147743301-147743418
RPL27 ribosomal protein L27 chr17 38402971 38404470 1 chr17-38403540-38403898
ARID5B AT rich interactive domain 5B (MRF1-like) chr10 63330448 63331947 3 chr10-63329398-63331241 chr10-63331378-63331644 chr10-63331804-63333783
INSIG1 insulin induced gene 1 isoform 1 chr7 154719475 154720974 1 chr7-154719658-154719817
ARID1B AT rich interactive domain 1B (SWI1-like) chr6 157139777 157141276 1 chr6-157140112-157140358
USP5 ubiquitin specific peptidase 5 isoform 2 chr12 6830551 6832050 1 chr12-6831636-6832665
CDCA7 cell division cycle associated 7 isoform 1 chr2 173926806 173928305 1 chr2-173927314-173927479
FZD1 frizzled 1 precursor chr7 90730718 90732217 1 chr7-90731027-90731494
Each row represents a gene from RefLink, whereas the columns indicate:
GeneID GeneDescription Chromosome Transcription_Start_Position Transcription_End_Position Number_of_peaks_found [peaks_found]
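Because the gene description can contain spaces, a .SUM row cannot be split on whitespace alone. One possible approach, sketched below with a hypothetical function name, is to locate the chromosome column as the first token that starts with "chr" and is followed by three integers. Everything between the gene name and that token is treated as the description. This is an assumption based on the sample rows, not a documented format.

```python
def parse_sum_line(line):
    """Parse one .SUM row into a dict, allowing spaces in the description."""
    tokens = line.split()
    # Find the chromosome column: "chr..." followed by start, end and peak count.
    for i in range(1, len(tokens) - 3):
        if tokens[i].startswith("chr") and all(t.isdigit() for t in tokens[i + 1:i + 4]):
            break
    else:
        raise ValueError("no chromosome column found in: " + line)
    return {
        "gene": tokens[0],
        "description": " ".join(tokens[1:i]),
        "chrom": tokens[i],
        "start": int(tokens[i + 1]),
        "end": int(tokens[i + 2]),
        "n_peaks": int(tokens[i + 3]),
        "peaks": tokens[i + 4:],
    }


rec = parse_sum_line(
    "RPL27 ribosomal protein L27 chr17 38402971 38404470 1 chr17-38403540-38403898"
)
print(rec["gene"], "|", rec["description"], "|", rec["n_peaks"])
```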
