Elementolab/ChIPseeqerSummaryPromoters
From Icbwiki
Revision as of 18:34, 18 October 2010
ChIPseeqerSummaryPromoters
In this analysis you can create a file that contains a gene-based annotation of the detected peak locations.
In particular, the script:
- determines which promoter regions of the genes in RefGene, if any, overlap with the detected peaks
- extracts the NM transcript names for each of these genes from RefGene
- extracts the ORF name and description from RefLink
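The overlap step in the list above can be illustrated with a minimal Python sketch. This is not ChIPseeqer's actual implementation: the function names, the strand handling, and the closed-interval convention are all assumptions made for illustration. It builds a promoter window around a transcription start site (TSS) from upstream and downstream lengths (as given by --lenu and --lend) and keeps the peaks that share at least one base with it.

```python
from typing import List, Tuple

# (chromosome, start, end); closed intervals are an assumption of this sketch
Interval = Tuple[str, int, int]


def promoter_window(chrom: str, tss: int, strand: str, lenu: int, lend: int) -> Interval:
    """Return the promoter interval around a TSS, oriented by strand."""
    if strand == "+":
        return (chrom, tss - lenu, tss + lend)
    # On the minus strand, "upstream" means higher coordinates.
    return (chrom, tss - lend, tss + lenu)


def overlaps(a: Interval, b: Interval) -> bool:
    """True if both intervals are on the same chromosome and share a base."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def peaks_in_promoter(promoter: Interval, peaks: List[Interval]) -> List[Interval]:
    """Keep only the peaks that overlap the promoter window."""
    return [p for p in peaks if overlaps(promoter, p)]


# Hypothetical TSS and peaks, chosen only to exercise the functions.
promoter = promoter_window("chr1", 10000, "+", lenu=2000, lend=1000)
peaks = [("chr1", 10900, 11500), ("chr1", 12000, 12500), ("chr2", 9000, 9500)]
hits = peaks_in_promoter(promoter, peaks)
```

Here only the first peak is kept: the second lies past the downstream end of the window, and the third is on a different chromosome.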
To run the tools directly from any folder, add $CHIPSEEQERDIR and $CHIPSEEQERDIR/SCRIPTS to your $PATH variable. See "How to set the CHIPSEEQERDIR variable".
1. Type the command:
ChIPseeqerSummaryPromoters --targets=TF_targets.txt --lenu=2000 --lend=1000 --prefix=TF_targets_SUM --db=RefGene
The following options are available:
--targets=FILE file containing genomic regions
--lenu=INT length upstream of TSS
--lend=INT length downstream of TSS
--suffix=STR suffix for output files
--genome=STR can be hg18 for human, mm9 for mouse, dm3 for Drosophila, or sacser for Saccharomyces cerevisiae
--db=STR can be either RefGene or AceView. Default is RefGene (used only with genome=hg18)
IMPORTANT: The file passed to the --targets option must be the ChIPseeqer output file.
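For scripted runs, the command line can be assembled programmatically. The sketch below is only an illustration: the helper `build_command` and the genome check are hypothetical and not part of ChIPseeqer. It uses only the option names and values that appear in the example command and the option list above.

```python
import shlex
from typing import Dict, List

# Genome names taken from the option list above.
KNOWN_GENOMES = {"hg18", "mm9", "dm3", "sacser"}


def build_command(options: Dict[str, object]) -> List[str]:
    """Turn {"name": value} pairs into a --name=value argument list."""
    genome = options.get("genome")
    if genome is not None and genome not in KNOWN_GENOMES:
        raise ValueError(f"unknown genome: {genome}")
    argv = ["ChIPseeqerSummaryPromoters"]
    for name, value in options.items():
        argv.append(f"--{name}={value}")
    return argv


# Reproduces the example command from step 1.
example = build_command({
    "targets": "TF_targets.txt",
    "lenu": 2000,
    "lend": 1000,
    "prefix": "TF_targets_SUM",
    "db": "RefGene",
})
print(shlex.join(example))
# To actually run it (requires ChIPseeqer on your PATH):
#   import subprocess; subprocess.run(example, check=True)
```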
2. See the results. This process produces three output files, with the extensions _ALL.NM, .NM and .SUM
- The file that ends with _ALL.NM will look like this:
NM_201266 chr2 206254468 206255967 0
NR_027685 chr17 5262684 5264183 2 chr17-5262760-5263300 chr17-5264120-5264368
NM_001145290 chr11 124437222 124438721 1 chr11-124437939-124438230
NM_018087 chr1 54076264 54077763 1 chr1-54076648-54077169
NM_016252 chr2 32434599 32436098 0
NM_001012415 chr9 137730696 137732195 0
NM_181886 chr4 103967744 103969243 1 chr4-103968310-103968772
NM_001097595 chrX 52533130 52534629 0
Each row represents a transcript from RefGene, and the columns indicate:
TranscriptID Chromosome Transcription_Start_Position Transcription_End_Position Number_of_peaks_found [peaks_found]
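A minimal Python sketch for reading such rows follows, assuming the fields are separated by whitespace (the actual delimiter is not stated here). The class and function names are hypothetical; the field names mirror the column list above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TranscriptRow:
    transcript_id: str
    chrom: str
    start: int
    end: int
    n_peaks: int
    peaks: List[str]  # zero or more "chrom-start-end" strings


def parse_nm_line(line: str) -> TranscriptRow:
    """Split one row into its five fixed columns plus the trailing peak list."""
    f = line.split()
    return TranscriptRow(f[0], f[1], int(f[2]), int(f[3]), int(f[4]), f[5:])


# Two rows taken from the sample output above.
row = parse_nm_line(
    "NR_027685 chr17 5262684 5264183 2 chr17-5262760-5263300 chr17-5264120-5264368"
)
empty = parse_nm_line("NM_201266 chr2 206254468 206255967 0")
```

In these sample rows the number of entries in `peaks` matches the value in the fifth column, which gives a simple consistency check when reading a file.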
- The file that ends with .NM will look like this:
NM_001079559 chr11 62250898 62252397 1 chr11-62252383-62252996
NM_001143965 chr6 13436250 13437749 1 chr6-13436046-13436609
NM_024800 chr3 132227383 132228882 2 chr3-132227552-132227676 chr3-132228056-132228356
NM_003262 chr3 171166273 171167772 3 chr3-171166199-171166351 chr3-171166471-171166714 chr3-171166829-171167327
NM_001098536 chr12 6830545 6832044 1 chr12-6831636-6832665
NM_014712 chr16 30875115 30876614 1 chr16-30876052-30876909
NM_018465 chr9 5427361 5428860 2 chr9-5427196-5427574 chr9-5428733-5429129
NM_001135662 chr1 204010734 204012233 1 chr1-204010920-204011505
This file is a filtered version of the previous one: only the transcripts with at least one detected peak are shown (column 5 is never 0).
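The filtering rule can be reproduced in a few lines of Python, for example to check an _ALL.NM file by hand. This is a sketch under the assumption that fields are whitespace-separated; the function name `with_peaks` is hypothetical.

```python
from typing import Iterable, Iterator


def with_peaks(lines: Iterable[str]) -> Iterator[str]:
    """Yield only the rows whose fifth column (number of peaks) is above 0."""
    for line in lines:
        fields = line.split()
        if len(fields) >= 5 and int(fields[4]) > 0:
            yield line


# Three rows taken from the _ALL.NM sample shown earlier.
all_rows = [
    "NM_201266 chr2 206254468 206255967 0",
    "NM_018087 chr1 54076264 54077763 1 chr1-54076648-54077169",
    "NM_016252 chr2 32434599 32436098 0",
]
kept = list(with_peaks(all_rows))
```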
- The file that ends with .SUM (only in the RefGene case) will look like this:
FBXO38 F-box protein 38 isoform b chr5 147742738 147744237 1 chr5-147743301-147743418
RPL27 ribosomal protein L27 chr17 38402971 38404470 1 chr17-38403540-38403898
ARID5B AT rich interactive domain 5B (MRF1-like) chr10 63330448 63331947 3 chr10-63329398-63331241 chr10-63331378-63331644 chr10-63331804-63333783
INSIG1 insulin induced gene 1 isoform 1 chr7 154719475 154720974 1 chr7-154719658-154719817
ARID1B AT rich interactive domain 1B (SWI1-like) chr6 157139777 157141276 1 chr6-157140112-157140358
USP5 ubiquitin specific peptidase 5 isoform 2 chr12 6830551 6832050 1 chr12-6831636-6832665
CDCA7 cell division cycle associated 7 isoform 1 chr2 173926806 173928305 1 chr2-173927314-173927479
FZD1 frizzled 1 precursor chr7 90730718 90732217 1 chr7-90731027-90731494
Each row represents a gene from RefLink, and the columns indicate:
GeneID GeneDescription Chromosome Transcription_Start_Position Transcription_End_Position Number_of_peaks_found [peaks_found]
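Because the gene description contains spaces, splitting a .SUM row needs more care than the .NM files. The Python sketch below is one possible approach and is not part of ChIPseeqer. It rests on two assumptions: if the row contains tabs, the columns are tab-separated; otherwise the chromosome column is the first token of the form chrN that is followed by three integers. Genomes whose chromosome names do not start with "chr" would need a different rule. All names in the code are hypothetical.

```python
import re
from typing import List, NamedTuple


class GeneRow(NamedTuple):
    gene_id: str
    description: str
    chrom: str
    start: int
    end: int
    n_peaks: int
    peaks: List[str]


def parse_sum_line(line: str) -> GeneRow:
    """Parse one .SUM row, tolerating spaces inside the description."""
    line = line.rstrip("\n")
    if "\t" in line:
        # Assumed tab-separated layout: columns map directly.
        f = line.split("\t")
        peaks = " ".join(f[6:]).split()
        return GeneRow(f[0], f[1], f[2], int(f[3]), int(f[4]), int(f[5]), peaks)
    t = line.split()
    # Find the chromosome token: "chr..." followed by start, end, peak count.
    for i in range(1, len(t) - 3):
        if re.fullmatch(r"chr\w+", t[i]) and all(x.isdigit() for x in t[i + 1:i + 4]):
            return GeneRow(t[0], " ".join(t[1:i]), t[i],
                           int(t[i + 1]), int(t[i + 2]), int(t[i + 3]), t[i + 4:])
    raise ValueError(f"could not locate chromosome column in: {line!r}")


# A row taken from the sample output above.
gene = parse_sum_line(
    "ARID5B AT rich interactive domain 5B (MRF1-like) chr10 63330448 63331947 3 "
    "chr10-63329398-63331241 chr10-63331378-63331644 chr10-63331804-63333783"
)
```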
