Elementolab/ChIPseeqerSummaryPromoters
ChIPseeqerSummaryPromoters
In this analysis you can create a file that contains a gene-based annotation of the detected peak locations.
In particular, the script:
- finds if/which promoter regions of the genes in RefGene overlap with the detected peaks
- extracts the NM transcript names for each of these genes from RefGene
- extracts the ORF name and description from RefLink
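The first step in the list above can be illustrated with a minimal sketch. It is not the ChIPseeqer implementation: the function names, the strand handling, and the "at least one shared base" overlap rule are assumptions made for illustration. The sample output further down does show peaks that only partially overlap a promoter being reported, which is the behaviour sketched here.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end)


def promoter_window(chrom: str, tss: int, strand: str, lenu: int, lend: int) -> Interval:
    """Return a promoter window around a TSS (hypothetical helper).

    lenu/lend mirror the --lenu/--lend options: bases upstream and
    downstream of the TSS. Strand handling is an assumption.
    """
    if strand == "+":
        return (chrom, max(0, tss - lenu), tss + lend)
    # On the minus strand, "upstream" lies at higher coordinates.
    return (chrom, max(0, tss - lend), tss + lenu)


def overlapping_peaks(promoter: Interval, peaks: List[Interval]) -> List[Interval]:
    """Return the peaks sharing at least one base with the promoter."""
    chrom, start, end = promoter
    return [p for p in peaks if p[0] == chrom and p[1] <= end and p[2] >= start]


# Promoter and peaks taken from the NR_027685 sample row shown later on this page.
promoter = ("chr17", 5262684, 5264183)
peaks = [("chr17", 5262760, 5263300), ("chr17", 5264120, 5264368), ("chr2", 5262760, 5263300)]
hits = overlapping_peaks(promoter, peaks)
```

Here `hits` holds the two chr17 peaks; the chr2 peak is skipped because the chromosome differs.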
To run the tools directly from any folder, add $CHIPSEEQERDIR and $CHIPSEEQERDIR/SCRIPTS to your $PATH variable (see How to set the CHIPSEEQERDIR variable).
1. Type the command:
ChIPseeqerSummaryPromoters --targets=TF_targets.txt --lenu=2000 --lend=1000 --prefix=TF_targets_SUM --db=RefGene
The following options are available:
--targets=FILE   file containing genomic regions
--lenu=INT       length upstream of TSS
--lend=INT       length downstream of TSS
--suffix=STR     suffix for output files
--genome=STR     can be hg18 (human), mm9 (mouse), dm3 (drosophila), or sacser (for Saccharomyces cerevisiae)
--db=STR         can be RefGene (available for hg18, mm9, dm3), AceView (for hg18, mm9), Ensembl (for hg18, mm9, dm3), or UCSCGenes (for hg18, mm9). Default is RefGene.
IMPORTANT: Note that in the --targets option you must enter the ChIPseeqer output file.
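When the tool is run for several peak files, it can be convenient to assemble the command in a script. The sketch below only rebuilds the example command from step 1; `build_command` is a hypothetical helper, and it assumes ChIPseeqerSummaryPromoters is on your $PATH.

```python
import shlex
from typing import List


def build_command(targets: str, lenu: int, lend: int, prefix: str, db: str = "RefGene") -> List[str]:
    """Assemble the argument list using only the flags from the example command."""
    return [
        "ChIPseeqerSummaryPromoters",
        f"--targets={targets}",
        f"--lenu={lenu}",
        f"--lend={lend}",
        f"--prefix={prefix}",
        f"--db={db}",
    ]


cmd = build_command("TF_targets.txt", 2000, 1000, "TF_targets_SUM")
print(shlex.join(cmd))
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list (rather than one shell string) avoids quoting problems if a file name contains spaces.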
2. See the results. This process outputs three files, with the extensions _ALL.NM, .NM and .SUM
- The file that ends with _ALL.NM will look like this:
NM_201266 chr2 206254468 206255967 0
NR_027685 chr17 5262684 5264183 2 chr17-5262760-5263300 chr17-5264120-5264368
NM_001145290 chr11 124437222 124438721 1 chr11-124437939-124438230
NM_018087 chr1 54076264 54077763 1 chr1-54076648-54077169
NM_016252 chr2 32434599 32436098 0
NM_001012415 chr9 137730696 137732195 0
NM_181886 chr4 103967744 103969243 1 chr4-103968310-103968772
NM_001097595 chrX 52533130 52534629 0
Each row represents a transcript from RefGene, whereas the columns indicate:
TranscriptID Chromosome Transcription_Start_Position Transcription_End_Position Number_of_peaks_found [peaks_found]
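As an illustration of this column layout, a row can be read into a small record as follows. This is a sketch that assumes the fields are separated by whitespace (tabs or spaces); `PromoterRow` and `parse_nm_line` are hypothetical names, not part of ChIPseeqer.

```python
from typing import List, NamedTuple


class PromoterRow(NamedTuple):
    transcript_id: str
    chrom: str
    start: int
    end: int
    n_peaks: int
    peaks: List[str]  # each peak is written as chrom-start-end


def parse_nm_line(line: str) -> PromoterRow:
    """Split one row into the five fixed columns plus the trailing peak list."""
    fields = line.split()
    return PromoterRow(
        fields[0], fields[1], int(fields[2]), int(fields[3]), int(fields[4]), fields[5:]
    )


row = parse_nm_line(
    "NR_027685 chr17 5262684 5264183 2 chr17-5262760-5263300 chr17-5264120-5264368"
)
```

For this row, `row.n_peaks` is 2 and `row.peaks` holds the two peak identifiers.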
- The file that ends with .NM will look like this:
NM_001079559 chr11 62250898 62252397 1 chr11-62252383-62252996
NM_001143965 chr6 13436250 13437749 1 chr6-13436046-13436609
NM_024800 chr3 132227383 132228882 2 chr3-132227552-132227676 chr3-132228056-132228356
NM_003262 chr3 171166273 171167772 3 chr3-171166199-171166351 chr3-171166471-171166714 chr3-171166829-171167327
NM_001098536 chr12 6830545 6832044 1 chr12-6831636-6832665
NM_014712 chr16 30875115 30876614 1 chr16-30876052-30876909
NM_018465 chr9 5427361 5428860 2 chr9-5427196-5427574 chr9-5428733-5429129
NM_001135662 chr1 204010734 204012233 1 chr1-204010920-204011505
This file is a filtered version of the previous one: Only the transcripts with detected peaks are shown (no 0 in column 5).
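The filtering rule can be reproduced with a few lines of code, for example to check a file by hand. This is a sketch of the rule as described here (keep rows whose fifth column is greater than 0), assuming whitespace-separated fields; `rows_with_peaks` is a hypothetical name.

```python
from typing import Iterable, Iterator


def rows_with_peaks(lines: Iterable[str]) -> Iterator[str]:
    """Yield only the rows whose peak count (column 5) is greater than zero."""
    for line in lines:
        fields = line.split()
        if len(fields) >= 5 and int(fields[4]) > 0:
            yield line.rstrip("\n")


# Three rows from the _ALL.NM sample above.
all_rows = [
    "NM_201266 chr2 206254468 206255967 0",
    "NR_027685 chr17 5262684 5264183 2 chr17-5262760-5263300 chr17-5264120-5264368",
    "NM_018087 chr1 54076264 54077763 1 chr1-54076648-54077169",
]
kept = list(rows_with_peaks(all_rows))
```

`kept` contains the NR_027685 and NM_018087 rows; NM_201266 is dropped because its peak count is 0.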
- The file that ends with .SUM (only in the RefGene case) will look like this:
FBXO38 F-box protein 38 isoform b chr5 147742738 147744237 1 chr5-147743301-147743418
RPL27 ribosomal protein L27 chr17 38402971 38404470 1 chr17-38403540-38403898
ARID5B AT rich interactive domain 5B (MRF1-like) chr10 63330448 63331947 3 chr10-63329398-63331241 chr10-63331378-63331644 chr10-63331804-63333783
INSIG1 insulin induced gene 1 isoform 1 chr7 154719475 154720974 1 chr7-154719658-154719817
ARID1B AT rich interactive domain 1B (SWI1-like) chr6 157139777 157141276 1 chr6-157140112-157140358
USP5 ubiquitin specific peptidase 5 isoform 2 chr12 6830551 6832050 1 chr12-6831636-6832665
CDCA7 cell division cycle associated 7 isoform 1 chr2 173926806 173928305 1 chr2-173927314-173927479
FZD1 frizzled 1 precursor chr7 90730718 90732217 1 chr7-90731027-90731494
Each row represents a gene from RefLink, whereas the columns indicate:
GeneID GeneDescription Chromosome Transcription_Start_Position Transcription_End_Position Number_of_peaks_found [peaks_found]
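Because the gene description can contain spaces, a .SUM row cannot be split on whitespace and indexed by fixed positions. The sketch below is one possible way to read such a row: it assumes chromosome names start with "chr" (as in the samples above) and looks for the first chromosome field followed by three integers. `SummaryRow` and `parse_sum_line` are hypothetical names; if your file is tab-delimited, splitting on tabs would be simpler.

```python
from typing import List, NamedTuple


class SummaryRow(NamedTuple):
    gene: str
    description: str
    chrom: str
    start: int
    end: int
    n_peaks: int
    peaks: List[str]


def parse_sum_line(line: str) -> SummaryRow:
    """Locate the chromosome column, then treat everything before it as the description."""
    fields = line.split()
    for i in range(1, len(fields) - 3):
        # The chromosome is followed by start, end and peak count (all integers).
        if fields[i].startswith("chr") and all(f.isdigit() for f in fields[i + 1:i + 4]):
            return SummaryRow(
                gene=fields[0],
                description=" ".join(fields[1:i]),
                chrom=fields[i],
                start=int(fields[i + 1]),
                end=int(fields[i + 2]),
                n_peaks=int(fields[i + 3]),
                peaks=fields[i + 4:],
            )
    raise ValueError(f"could not locate coordinate columns in: {line!r}")


row = parse_sum_line(
    "INSIG1 insulin induced gene 1 isoform 1 chr7 154719475 154720974 1 chr7-154719658-154719817"
)
```

For this row the description is recovered as "insulin induced gene 1 isoform 1", even though it contains digits that could be mistaken for numeric columns.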
