Elementolab/ChIPseeqerSummaryPromoters

ChIPseeqerSummaryPromoters

This analysis creates files that contain a gene-based annotation of the detected peak locations.

In particular, the script:

  • finds which promoter regions of the genes in refSeq, if any, overlap with the detected peaks
  • extracts the NM transcript names for each of these genes from refSeq
  • extracts the ORF name and description from RefLink
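
The overlap step in the first bullet can be pictured with a minimal Python sketch. It is an illustration only, not ChIPseeqer's own code: the names Interval, promoter_window and overlaps are hypothetical, the coordinates are made up, and the strand handling and inclusive-interval convention are assumptions.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int
    end: int


def promoter_window(chrom, tss, strand, lenu, lend):
    """Return the window from lenu bp upstream to lend bp downstream of the TSS.

    Assumption: "upstream" follows the strand, so the window is mirrored
    for genes on the minus strand.
    """
    if strand == "+":
        return Interval(chrom, max(0, tss - lenu), tss + lend)
    return Interval(chrom, max(0, tss - lend), tss + lenu)


def overlaps(a, b):
    """True if two intervals share at least one position (inclusive ends)."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


# Made-up TSS and peaks, for illustration only.
promoter = promoter_window("chr1", 10_000, "+", lenu=2000, lend=1000)
peaks = [
    Interval("chr1", 7_500, 8_200),    # touches the upstream edge of the window
    Interval("chr1", 12_000, 12_400),  # lies beyond the downstream edge
    Interval("chr2", 9_000, 9_500),    # different chromosome
]
hits = [p for p in peaks if overlaps(promoter, p)]
print(promoter, hits)
```

Here lenu and lend play the same role as the --lenu and --lend options: they set how far the promoter window extends on each side of the TSS.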

To run the tools from any folder, add $CHIPSEEQERDIR and $CHIPSEEQERDIR/SCRIPTS to your $PATH variable. See How to set the CHIPSEEQERDIR variable.

1. Type the command:

ChIPseeqerSummaryPromoters --targets=TF_targets.txt --lenu=2000 --lend=1000 --prefix=TF_targets_SUM --db=refSeq

The following options are available:

--peakfile=FILE File with ChIP-seq peaks.
--lenu=INT      Define the length upstream of TSS.
--lend=INT      Define the length downstream of TSS.
--suffix=STR    Define a suffix for output files.
--genome=STR    hg19 (human)
                hg18 (human)
                mm10 (mouse)
                mm9 (mouse)
                rn4 (rat)
                dm3 (drosophila)
                sacser (Saccharomyces cerevisiae)
--db=STR        refSeq (available for hg19, hg18, mm10, mm9, rn4, dm3)
                AceView (for hg19, hg18, mm9)
                Ensembl (for hg19, hg18, mm10, mm9, rn4, dm3)
                UCSCGenes (for hg19, hg18, mm10, mm9).
                Default is refSeq.
--verbose=INT   Verbose mode. Default is 0.

IMPORTANT: The --targets option must point to the ChIPseeqer output file.

2. See the results. This process outputs three files, with the extensions _ALL.NM, .NM and .SUM.

  • The file that ends with _ALL.NM will look like this:
NM_201266	chr2	206254468	206255967	0
NR_027685	chr17	5262684	5264183	2	chr17-5262760-5263300	chr17-5264120-5264368
NM_001145290	chr11	124437222	124438721	1	chr11-124437939-124438230
NM_018087	chr1	54076264	54077763	1	chr1-54076648-54077169
NM_016252	chr2	32434599	32436098	0
NM_001012415	chr9	137730696	137732195	0
NM_181886	chr4	103967744	103969243	1	chr4-103968310-103968772
NM_001097595	chrX	52533130	52534629	0

Each row represents a transcript from refSeq, whereas the columns indicate:

TranscriptID	Chromosome	Transcription_Start_Position	Transcription_End_Position	Number_of_peaks_found	[peaks_found]
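
For downstream analysis, a row with the column layout above can be read with a short Python sketch. The names PromoterRow, parse_peak and parse_row are hypothetical helpers, not part of ChIPseeqer, and the sketch assumes that fields are whitespace-separated and that each peak label has the form chromosome-start-end, as in the sample rows.

```python
from typing import List, NamedTuple, Tuple


class PromoterRow(NamedTuple):
    transcript: str
    chrom: str
    start: int
    end: int
    n_peaks: int
    peaks: List[Tuple[str, int, int]]


def parse_peak(label):
    """Split a label such as 'chr1-54076648-54077169' into (chrom, start, end)."""
    # rsplit keeps chromosome names that themselves contain '-' intact.
    chrom, start, end = label.rsplit("-", 2)
    return chrom, int(start), int(end)


def parse_row(line):
    """Parse one row: five fixed columns followed by zero or more peak labels."""
    fields = line.split()
    transcript, chrom, start, end, n_peaks = fields[:5]
    peaks = [parse_peak(p) for p in fields[5:]]
    return PromoterRow(transcript, chrom, int(start), int(end), int(n_peaks), peaks)


# Two rows copied from the sample output above.
row = parse_row("NM_018087\tchr1\t54076264\t54077763\t1\tchr1-54076648-54077169")
empty = parse_row("NM_201266\tchr2\t206254468\t206255967\t0")
print(row.transcript, row.n_peaks, row.peaks)
print(empty.transcript, empty.n_peaks, empty.peaks)
```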

  • The file that ends with .NM will look like this:
NM_001079559	chr11	62250898	62252397	1	chr11-62252383-62252996
NM_001143965	chr6	13436250	13437749	1	chr6-13436046-13436609
NM_024800	chr3	132227383	132228882	2	chr3-132227552-132227676	chr3-132228056-132228356
NM_003262	chr3	171166273	171167772	3	chr3-171166199-171166351	chr3-171166471-171166714	chr3-171166829-171167327
NM_001098536	chr12	6830545	6832044	1	chr12-6831636-6832665
NM_014712	chr16	30875115	30876614	1	chr16-30876052-30876909
NM_018465	chr9	5427361	5428860	2	chr9-5427196-5427574	chr9-5428733-5429129
NM_001135662	chr1	204010734	204012233	1	chr1-204010920-204011505

This file is a filtered version of the previous one: only transcripts with at least one detected peak are shown, so column 5 is never 0.
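
The relationship between the two files can be sketched in Python as a filter on column 5. This is only an illustration of the filtering rule described here, not the tool's implementation; rows_with_peaks is a hypothetical helper and the sketch assumes whitespace-separated fields.

```python
def rows_with_peaks(lines):
    """Yield only the rows whose fifth column (number of peaks) is above zero."""
    for line in lines:
        fields = line.split()
        if len(fields) >= 5 and int(fields[4]) > 0:
            yield line.rstrip("\n")


# Three rows copied from the _ALL.NM sample; only one of them has a peak.
all_rows = [
    "NM_201266\tchr2\t206254468\t206255967\t0",
    "NM_001145290\tchr11\t124437222\t124438721\t1\tchr11-124437939-124438230",
    "NM_016252\tchr2\t32434599\t32436098\t0",
]
kept = list(rows_with_peaks(all_rows))
for row in kept:
    print(row)
```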

  • The file that ends with .SUM (only in the refSeq case) will look like this:
FBXO38	F-box protein 38 isoform b	chr5	147742738	147744237	1	chr5-147743301-147743418
RPL27	ribosomal protein L27	chr17	38402971	38404470	1	chr17-38403540-38403898
ARID5B	AT rich interactive domain 5B (MRF1-like)	chr10	63330448	63331947	3	chr10-63329398-63331241	chr10-63331378-63331644	chr10-63331804-63333783
INSIG1	insulin induced gene 1 isoform 1	chr7	154719475	154720974	1	chr7-154719658-154719817
ARID1B	AT rich interactive domain 1B (SWI1-like)	chr6	157139777	157141276	1	chr6-157140112-157140358
USP5	ubiquitin specific peptidase 5 isoform 2	chr12	6830551	6832050	1	chr12-6831636-6832665
CDCA7	cell division cycle associated 7 isoform 1	chr2	173926806	173928305	1	chr2-173927314-173927479
FZD1	frizzled 1 precursor	chr7	90730718	90732217	1	chr7-90731027-90731494

Each row represents a gene from RefLink, whereas the columns indicate:

GeneID	GeneDescription	Chromosome	Transcription_Start_Position	Transcription_End_Position	Number_of_peaks_found	[peaks_found]
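
Because the gene description contains spaces, a .SUM row has to be split on tabs rather than on whitespace. The Python sketch below shows one way to do that; parse_sum_row is a hypothetical helper, and dropping empty fields is an assumption made so that rows padded with extra tabs are also handled.

```python
def parse_sum_row(line):
    """Parse one .SUM row into a dict keyed by the column names listed above."""
    # Split on tabs only, then drop padding: empty fields and stray spaces.
    fields = [f.strip() for f in line.rstrip("\n").split("\t") if f.strip()]
    gene, description, chrom, start, end, n_peaks = fields[:6]
    return {
        "gene": gene,
        "description": description,
        "chrom": chrom,
        "start": int(start),
        "end": int(end),
        "n_peaks": int(n_peaks),
        "peaks": fields[6:],
    }


# Row copied from the sample .SUM output above.
row = parse_sum_row(
    "RPL27\tribosomal protein L27\tchr17\t38402971\t38404470\t1\tchr17-38403540-38403898"
)
print(row["gene"], row["description"], row["n_peaks"], row["peaks"])
```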