Elementolab/ChIPseeqerMakeGenepartsMatrix

From Icbwiki



ChIPseeqerMakeGenepartsMatrix

To run the tools directly from any folder, you need to add $CHIPSEEQERDIR and $CHIPSEEQERDIR/SCRIPTS to your $PATH variable.

Read How to set the CHIPSEEQERDIR variable.
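If the tools are still not found after you edit $PATH, a quick check like the one below can show which directory is missing. This is a minimal sketch, not part of ChIPseeqer: missing_path_entries is a hypothetical helper, and the script assumes $CHIPSEEQERDIR is already exported in your environment.

```python
import os

def missing_path_entries(chipseeqerdir, path):
    """Return the ChIPseeqer directories that do not appear in a PATH string."""
    required = [chipseeqerdir, os.path.join(chipseeqerdir, "SCRIPTS")]
    # PATH entries are separated by os.pathsep (":" on Linux/macOS);
    # normpath makes "/dir/" and "/dir" compare equal
    present = {os.path.normpath(p) for p in path.split(os.pathsep) if p}
    return [d for d in required if os.path.normpath(d) not in present]

if __name__ == "__main__":
    chipseeqerdir = os.environ.get("CHIPSEEQERDIR")
    if not chipseeqerdir:
        print("CHIPSEEQERDIR is not set")
    else:
        for d in missing_path_entries(chipseeqerdir, os.environ.get("PATH", "")):
            print("Not on PATH:", d)
```

In bash, the usual fix is a line such as export PATH=$PATH:$CHIPSEEQERDIR:$CHIPSEEQERDIR/SCRIPTS in your ~/.bashrc.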

With this analysis you can create gene-based matrices (one for gene promoters, one for exons, one for introns, etc.) that summarize, for each gene, the number of peaks falling in that gene part across many different peak files (TFs).

This analysis requires that ChIPseeqerAnnotate has been run once for each ChIPseq dataset, so that one .genes file per dataset has been created.

The script takes as input a file (TFgenesFile) in which each line contains the name of a .genes file (i.e., the typical output of ChIPseeqerAnnotate) followed by a label.

Example of a TFgenesFile:

CBP_TF_targets_15.txt.RefGene.GP.genes	      CBP
ETS1_TF_targets_15.txt.RefGene.GP.genes      ETS1
RUNX_TF_targets_15.txt.RefGene.GP.genes      RUNX
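With many datasets, it can be convenient to generate the TFgenesFile instead of typing it by hand. The sketch below uses hypothetical helpers, not part of ChIPseeqer. It assumes a tab between the file name and the label (the example above mixes tabs and spaces, so check what your version accepts), and it derives each label from the part of the file name before _TF_targets, which matches the example names but is otherwise an arbitrary convention.

```python
import os

def label_for(genes_file):
    """Guess a label from a .genes file name (hypothetical naming convention)."""
    name = os.path.basename(genes_file)
    if "_TF_targets" in name:
        return name.split("_TF_targets")[0]  # "CBP_TF_targets_15..." -> "CBP"
    return name.split(".")[0].split("_")[0]  # fallback: first token of the name

def tfgenes_lines(genes_files):
    """One '<.genes file><TAB><label>' entry per dataset, in the given order."""
    return [f"{path}\t{label_for(path)}" for path in genes_files]

def write_tfgenes_file(genes_files, out_path="TF_genes.txt"):
    """Write the entries to out_path, one per line."""
    with open(out_path, "w") as out:
        for line in tfgenes_lines(genes_files):
            out.write(line + "\n")
```

For example, write_tfgenes_file(sorted(glob.glob("*.genes"))) (after import glob) would write TF_genes.txt from every .genes file in the current folder. Edit the labels by hand if the guessed ones are not what you want.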

A .genes file is a gene-based matrix that shows, for each transcript (TSID), the number of peaks that fall in its promoter (P), in the downstream region around the TES (DW), in its exons (E), introns (I), intron1 (I1) and intron2 (I2), as well as the number of peaks that are distal to the gene (D). It looks like this:

TSID	P	DW	E	I	I1	I2	D
NM_001025603	0	1	5	4	1	0	0
NM_021107	1	0	0	0	0	4	0
NM_173803	4	0	0	0	0	0	0
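If you want to inspect a .genes file yourself (for example, to see in which gene parts a dataset has peaks for a given transcript), it can be loaded with a few lines of Python. This is a sketch assuming the layout shown above: a whitespace-delimited header starting with TSID, then one row of integer counts per transcript. read_genes_file is a hypothetical helper, not part of ChIPseeqer.

```python
import io

def read_genes_file(handle):
    """Parse a .genes table into {TSID: {column: count}}.

    Assumes a whitespace-delimited header starting with TSID, followed by
    one row of integer peak counts per transcript.
    """
    parts = handle.readline().split()[1:]  # e.g. P, DW, E, I, I1, I2, D
    table = {}
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        table[fields[0]] = dict(zip(parts, map(int, fields[1:])))
    return table

if __name__ == "__main__":
    # The example rows shown above
    sample = io.StringIO(
        "TSID\tP\tDW\tE\tI\tI1\tI2\tD\n"
        "NM_001025603\t0\t1\t5\t4\t1\t0\t0\n"
        "NM_021107\t1\t0\t0\t0\t0\t4\t0\n"
        "NM_173803\t4\t0\t0\t0\t0\t0\t0\n"
    )
    genes = read_genes_file(sample)
    print(genes["NM_173803"]["P"])  # prints 4
```

For a real file, pass an open handle instead, e.g. with open("CBP_TF_targets_15.txt.RefGene.GP.genes") as fh: cbp = read_genes_file(fh).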

1. To run the script, type the command:

ChIPseeqerMakeGenepartsMatrix --TFgenesFile=TF_genes.txt

Available options are:

--TFgenesFile=FILE   file containing the names and labels of the .genes files (standard ChIPseeqerAnnotate output)

2. See the results.

The output consists of one matrix per gene part. Specifically, the following files are produced:

TF_genes.txt.distal
TF_genes.txt.dw
TF_genes.txt.exons
TF_genes.txt.introns
TF_genes.txt.promoters
TF_genes.txt.intron1
TF_genes.txt.intron2

Each matrix has one row per gene (TSID) and one column per label from the TFgenesFile, and looks like this:

TSID	CBP	ETS1	RUNX
NM_001025603	1	0	3
NM_021107	0	0	0
NM_173803	0	1	4
NM_139008	0	2	0
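Conceptually, each output matrix takes one column (for example P) from every .genes file and places these columns side by side under their labels. The sketch below illustrates that idea; it is not the script's actual code. Assumptions: a transcript missing from a dataset counts as 0 peaks, rows are sorted by TSID, and the column-to-suffix mapping in PART_SUFFIX is inferred from the output file names listed above. read_genes, build_matrices and write_matrix are hypothetical helpers.

```python
import io

# Column code in a .genes file -> suffix of the output file (inferred from the list above)
PART_SUFFIX = {"P": "promoters", "DW": "dw", "E": "exons", "I": "introns",
               "I1": "intron1", "I2": "intron2", "D": "distal"}

def read_genes(handle):
    """Parse a .genes table into {TSID: {column: count}} (whitespace-delimited)."""
    parts = handle.readline().split()[1:]
    rows = {}
    for line in handle:
        fields = line.split()
        if fields:
            rows[fields[0]] = dict(zip(parts, map(int, fields[1:])))
    return rows

def build_matrices(datasets):
    """Build one matrix per gene part.

    datasets: list of (label, rows) pairs in TFgenesFile order, where rows
    comes from read_genes. Returns {column: (labels, {TSID: [counts]})}.
    A transcript absent from a dataset gets 0 for that label (assumption).
    """
    labels = [label for label, _ in datasets]
    all_ids = sorted(set().union(*(rows for _, rows in datasets)))
    matrices = {}
    for part in PART_SUFFIX:
        matrix = {tsid: [rows.get(tsid, {}).get(part, 0) for _, rows in datasets]
                  for tsid in all_ids}
        matrices[part] = (labels, matrix)
    return matrices

def write_matrix(handle, labels, matrix):
    """Write a matrix as tab-separated text with a TSID header row."""
    handle.write("TSID\t" + "\t".join(labels) + "\n")
    for tsid, counts in matrix.items():
        handle.write(tsid + "\t" + "\t".join(map(str, counts)) + "\n")
```

For instance, reading each .genes file listed in TF_genes.txt with read_genes and passing the (label, rows) pairs to build_matrices gives the promoter matrix under key "P"; writing it to "TF_genes.txt." + PART_SUFFIX["P"] reproduces the TF_genes.txt.promoters naming.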


3. What else can I do next?

You could cluster these matrices to find TFs that tend to bind in the promoters, exons, introns, etc. of the same genes.

For example:

ChIPseeqerCluster --densityfile=TF_genes.txt.introns --suf=TF_genes.txt.introns.cluster --distance=2 --linkage=a

See more on ChIPseeqerCluster.
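ChIPseeqerCluster does the actual clustering. As a quick, complementary look at the same question (this is not what ChIPseeqerCluster computes), you can score each pair of TFs by the Jaccard index of their bound gene sets: the number of genes with at least one peak from both TFs in that gene part, divided by the number of genes with a peak from either. read_matrix and cobinding_jaccard are hypothetical helpers, and the sketch assumes a whitespace-delimited matrix with a TSID header, as shown above.

```python
import io
from itertools import combinations

def read_matrix(handle):
    """Parse a gene-part matrix into (labels, {TSID: [count per label]})."""
    labels = handle.readline().split()[1:]  # drop the TSID header
    matrix = {}
    for line in handle:
        fields = line.split()
        if fields:
            matrix[fields[0]] = [int(v) for v in fields[1:]]
    return labels, matrix

def cobinding_jaccard(labels, matrix):
    """Jaccard index, for every pair of TFs, of the genes with >= 1 peak."""
    bound = {label: {tsid for tsid, counts in matrix.items() if counts[i] > 0}
             for i, label in enumerate(labels)}
    scores = {}
    for a, b in combinations(labels, 2):
        either = bound[a] | bound[b]
        scores[(a, b)] = len(bound[a] & bound[b]) / len(either) if either else 0.0
    return scores

if __name__ == "__main__":
    # The example matrix shown above
    sample = io.StringIO(
        "TSID\tCBP\tETS1\tRUNX\n"
        "NM_001025603\t1\t0\t3\n"
        "NM_021107\t0\t0\t0\n"
        "NM_173803\t0\t1\t4\n"
        "NM_139008\t0\t2\t0\n"
    )
    for pair, score in cobinding_jaccard(*read_matrix(sample)).items():
        print(pair, round(score, 2))
```

A score near 1 means the two TFs have peaks in nearly the same set of genes for that gene part; a score of 0 means they share no genes.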
