Elementolab/ChIPseeqerSummaryPromoters

Revision as of 20:02, 8 December 2010

ChIPseeqerSummaryPromoters

In this analysis you can create a file that contains a gene-based annotation of the detected peak locations.

In particular, the script:

  • determines which promoter regions of the genes in RefGene, if any, overlap with the detected peaks
  • extracts the NM transcript names for each of these genes from RefGene
  • extracts the ORF name and description from RefLink
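
The overlap step described above can be illustrated with a minimal Python sketch. It is not the ChIPseeqer implementation: the function names promoter_window and overlaps are hypothetical, and the strand-aware handling of the upstream and downstream lengths and the use of closed intervals are assumptions made for illustration only.

```python
def promoter_window(tss, strand, lenu, lend):
    """Return (start, end) of a promoter region around a TSS.

    Assumption: "upstream" follows the strand, so the window is flipped
    for genes on the minus strand.
    """
    if strand == "+":
        return tss - lenu, tss + lend
    return tss - lend, tss + lenu


def overlaps(start_a, end_a, start_b, end_b):
    """True if two closed intervals share at least one position."""
    return start_a <= end_b and start_b <= end_a


# Hypothetical gene with its TSS at position 10000 on the plus strand
window = promoter_window(tss=10000, strand="+", lenu=2000, lend=1000)
peak = (8500, 8900)
print(window, overlaps(*window, *peak))
```

A peak is counted for a transcript whenever this test is true for the transcript's promoter window, which is why one transcript can list several peaks.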

To run the tools directly from any folder, add $CHIPSEEQERDIR and $CHIPSEEQERDIR/SCRIPTS to your $PATH variable. See "How to set the CHIPSEEQERDIR variable".

1. Type the command:

ChIPseeqerSummaryPromoters --targets=TF_targets.txt --lenu=2000 --lend=1000 --prefix=TF_targets_SUM --db=RefGene

The following options are available:

--targets=FILE file containing genomic regions
--lenu=INT     length upstream of TSS
--lend=INT     length downstream of TSS
--suffix=STR   suffix for output files
--genome=STR   can be hg18 (human),
               mm9 (mouse),
               dm3 (drosophila), or
               sacser (for Saccharomyces cerevisiae)
--db=STR       can be RefGene (available for hg18, mm9, dm3),
               AceView (for hg18, mm9),
               Ensembl (for hg18, mm9, dm3), or
               UCSCGenes (for hg18, mm9).
               Default is RefGene.

IMPORTANT: The file passed to the --targets option must be the ChIPseeqer output file.

2. See the results. This process outputs three files, with the extensions _ALL.NM, .NM and .SUM.

  • The file that ends with _ALL.NM will look like this:
NM_201266	chr2	206254468	206255967	0
NR_027685	chr17	5262684		5264183		2	chr17-5262760-5263300	chr17-5264120-5264368
NM_001145290	chr11	124437222	124438721	1	chr11-124437939-124438230
NM_018087	chr1	54076264	54077763	1	chr1-54076648-54077169
NM_016252	chr2	32434599	32436098	0
NM_001012415	chr9	137730696	137732195	0
NM_181886	chr4	103967744	103969243	1	chr4-103968310-103968772
NM_001097595	chrX	52533130	52534629	0

Each row represents a transcript from RefGene, whereas the columns indicate:

TranscriptID	Chromosome	Transcription_Start_Position	Transcription_End_Position	Number_of_peaks_found	[peaks_found]
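
To work with these rows downstream, the columns can be read with a short Python sketch like the one below. The function name parse_nm_line is hypothetical, and the sketch assumes that the file is tab-delimited and that each peak is written as chromosome-start-end, as in the sample rows above.

```python
def parse_nm_line(line):
    """Parse one row of an _ALL.NM or .NM file into a dictionary."""
    # Some rows in the sample output contain doubled tabs, so drop empty fields.
    fields = [f.strip() for f in line.split("\t") if f.strip()]
    transcript, chrom, start, end, n_peaks = fields[:5]
    peaks = []
    for peak in fields[5:]:
        # Split from the right in case the chromosome name contains a hyphen.
        peak_chrom, peak_start, peak_end = peak.rsplit("-", 2)
        peaks.append((peak_chrom, int(peak_start), int(peak_end)))
    return {
        "transcript": transcript,
        "chrom": chrom,
        "start": int(start),
        "end": int(end),
        "n_peaks": int(n_peaks),
        "peaks": peaks,
    }


row = parse_nm_line("NM_018087\tchr1\t54076264\t54077763\t1\tchr1-54076648-54077169")
print(row["transcript"], row["n_peaks"], row["peaks"])
```

Rows with no peaks simply return an empty peaks list.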

  • The file that ends with .NM will look like this:
NM_001079559	chr11	62250898	62252397	1	chr11-62252383-62252996
NM_001143965	chr6	13436250	13437749	1	chr6-13436046-13436609
NM_024800	chr3	132227383	132228882	2	chr3-132227552-132227676	chr3-132228056-132228356
NM_003262	chr3	171166273	171167772	3	chr3-171166199-171166351	chr3-171166471-171166714	chr3-171166829-171167327
NM_001098536	chr12	6830545		6832044		1	chr12-6831636-6832665
NM_014712	chr16	30875115	30876614	1	chr16-30876052-30876909
NM_018465	chr9	5427361		5428860		2	chr9-5427196-5427574	chr9-5428733-5429129
NM_001135662	chr1	204010734	204012233	1	chr1-204010920-204011505

This file is a filtered version of the previous one: only transcripts with at least one detected peak are shown (column 5 is never 0).
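
The relationship between the two files can be sketched in a few lines of Python. This only illustrates the filtering rule and is not how ChIPseeqer itself produces the .NM file; the function names has_peaks and filter_transcripts are hypothetical, and tab-delimited rows are assumed.

```python
def has_peaks(line):
    """True if the fifth non-empty column (number of peaks) is above zero."""
    fields = [f for f in line.split("\t") if f.strip()]
    return int(fields[4]) > 0


def filter_transcripts(lines):
    """Keep only the rows of an _ALL.NM file that report at least one peak."""
    return [line for line in lines if line.strip() and has_peaks(line)]


all_rows = [
    "NM_201266\tchr2\t206254468\t206255967\t0",
    "NM_018087\tchr1\t54076264\t54077763\t1\tchr1-54076648-54077169",
]
for kept in filter_transcripts(all_rows):
    print(kept)
```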

  • The file that ends with .SUM (produced only when the database is RefGene) will look like this:
FBXO38	F-box protein 38 isoform b			chr5	147742738	147744237	1	chr5-147743301-147743418
RPL27	ribosomal protein L27				chr17	38402971	38404470	1	chr17-38403540-38403898
ARID5B	AT rich interactive domain 5B (MRF1-like)	chr10	63330448	63331947	3	chr10-63329398-63331241	chr10-63331378-63331644	 chr10-63331804-63333783
INSIG1	insulin induced gene 1 isoform 1		chr7	154719475	154720974	1	chr7-154719658-154719817
ARID1B	AT rich interactive domain 1B (SWI1-like)	chr6	157139777	157141276	1	chr6-157140112-157140358
USP5	ubiquitin specific peptidase 5 isoform 2	chr12	6830551		6832050		1	chr12-6831636-6832665
CDCA7	cell division cycle associated 7 isoform 1	chr2	173926806	173928305	1	chr2-173927314-173927479
FZD1	frizzled 1 precursor				chr7	90730718	90732217	1	chr7-90731027-90731494

Each row represents a gene from RefLink, whereas the columns indicate:

GeneID	GeneDescription	Chromosome	Transcription_Start_Position	Transcription_End_Position	Number_of_peaks_found	[peaks_found]
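
A .SUM row can be read in much the same way. The sketch below is a minimal example under stated assumptions: the name parse_sum_line and the SummaryRow tuple are hypothetical, the file is assumed to be tab-delimited, and every gene is assumed to have a non-empty description.

```python
from collections import namedtuple

SummaryRow = namedtuple(
    "SummaryRow", "gene description chrom start end n_peaks peaks"
)


def parse_sum_line(line):
    """Parse one row of a .SUM file (assumes a non-empty description)."""
    # The sample rows pad the description with extra tabs, so drop empty fields.
    fields = [f.strip() for f in line.split("\t") if f.strip()]
    gene, description, chrom, start, end, n_peaks = fields[:6]
    return SummaryRow(
        gene, description, chrom, int(start), int(end), int(n_peaks), fields[6:]
    )


row = parse_sum_line(
    "RPL27\tribosomal protein L27\t\t\t\tchr17\t38402971\t38404470\t1\tchr17-38403540-38403898"
)
print(row.gene, row.description, row.n_peaks)
```

Peaks are kept here as plain strings in the chromosome-start-end form used by the output files.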