Elementolab/ChIPseeqerMakeGenepartsMatrix
ChIPseeqerMakeGenepartsMatrix
To run the tools directly from any folder, add $CHIPSEEQERDIR and $CHIPSEEQERDIR/SCRIPTS to your $PATH variable.
Read How to set the CHIPSEEQERDIR variable.
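As a minimal sketch of the PATH setup described above, the lines below could go in a shell startup file such as ~/.bashrc. The install location $HOME/ChIPseeqer is only an assumption for illustration; use the folder where ChIPseeqer actually lives on your system.

```shell
# Assumption: ChIPseeqer was unpacked in $HOME/ChIPseeqer (hypothetical location).
export CHIPSEEQERDIR="$HOME/ChIPseeqer"

# Put the main folder and the SCRIPTS folder in front of the existing PATH.
export PATH="$CHIPSEEQERDIR:$CHIPSEEQERDIR/SCRIPTS:$PATH"
```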
This analysis creates gene-based matrices (one for gene promoters, one for exons, one for introns, etc.) that summarize the number of peaks falling in each gene part across many different peak files (TFs).
The analysis requires that ChIPseeqerAnnotate has been run once for each ChIP-seq dataset, so that one .genes file exists per dataset.
The script takes as input a file (TFgenesFile) that lists the names of the .genes files (i.e., the typical output of ChIPseeqerAnnotate), each followed by a label, one file per line.
Example of the TFgenesFile.
CBP_TF_targets_15.txt.RefGene.GP.genes   CBP
ETS1_TF_targets_15.txt.RefGene.GP.genes  ETS1
RUNX_TF_targets_15.txt.RefGene.GP.genes  RUNX
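If there are many .genes files, the TFgenesFile can be generated rather than typed by hand. The sketch below is a hypothetical helper, not part of ChIPseeqer. It assumes that all .genes files sit in the current folder, that they follow the naming pattern of the example above, that the label is the text before the first underscore, and that a tab is an acceptable separator between file name and label.

```shell
# Hypothetical helper (not part of ChIPseeqer): write one "<file><TAB><label>"
# line per .genes file found in the current folder.
make_tf_genes_file() {
  out=$1
  : > "$out"                         # start with an empty output file
  for f in *.RefGene.GP.genes; do
    [ -e "$f" ] || continue          # nothing matched: skip the literal pattern
    label=${f%%_*}                   # text before the first underscore, e.g. CBP
    printf '%s\t%s\n' "$f" "$label" >> "$out"
  done
}
```

Calling `make_tf_genes_file TF_genes.txt` in the folder that holds the .genes files would then write the list to TF_genes.txt. Check the labels before running the analysis, since they become the column names of the output matrices.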
A .genes file is a gene-based matrix that shows, for each transcript, the number of peaks that fall in the gene's promoter (P), the downstream region around the TES (DW), exons (E), introns (I), intron 1 (I1), intron 2 (I2), or are distal to the gene (D). It looks like this:
TSID          P  DW  E  I  I1  I2  D
NM_001025603  0  1   5  4  1   0   0
NM_021107     1  0   0  0  0   4   0
NM_173803     4  0   0  0  0   0   0
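To make the column layout concrete, here is a small sketch that lists the transcripts with at least one promoter peak in a .genes file. The function name is hypothetical, and the sketch assumes the file is whitespace-separated with a single header line, with P as the second column as in the example above.

```shell
# Hypothetical helper: print the TSID of every row whose P column (column 2)
# is greater than zero, skipping the header line.
genes_with_promoter_peaks() {
  awk 'NR > 1 && $2 > 0 { print $1 }' "$1"
}
```

For example, `genes_with_promoter_peaks CBP_TF_targets_15.txt.RefGene.GP.genes` would print one transcript ID per line. Changing `$2` to another column number selects a different gene part (3 for DW, 4 for E, and so on).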
1. To run the script, type the command:
ChIPseeqerMakeGenepartsMatrix --TFgenesFile=TF_genes.txt
Available options are:
--TFgenesFile=FILE   file containing the names and labels of the .genes files (regular ChIPseeqerAnnotate output)
2. See the results.
The results will be one matrix per gene part. In particular, the files that will be produced are:
TF_genes.txt.distal
TF_genes.txt.dw
TF_genes.txt.exons
TF_genes.txt.introns
TF_genes.txt.promoters
TF_genes.txt.intron1
TF_genes.txt.intron2
and will look like this:
TSID          CBP  ETS1  RUNX
NM_001025603  1    0     3
NM_021107     0    0     0
NM_173803     0    1     4
NM_139008     0    2     0
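To show how such a matrix relates to the input .genes files, the sketch below assembles one gene part into a transcripts-by-TFs table. It is only a conceptual illustration, not the actual ChIPseeqerMakeGenepartsMatrix implementation. The function name is hypothetical, and the sketch assumes whitespace-separated files with one header line, the column order shown for .genes files (P is column 2, DW is 3, E is 4, I is 5, I1 is 6, I2 is 7, D is 8), and a count of 0 for transcripts missing from a file.

```shell
# Hypothetical helper: build_part_matrix TFGENESFILE COLUMN
# Prints a matrix with one row per transcript and one column per label.
build_part_matrix() {
  awk -v col="$2" '
    {
      file = $1
      labels[++n] = $2               # column name for this dataset
      first = 1
      while ((getline line < file) > 0) {
        if (first) { first = 0; continue }   # skip the header line
        split(line, f)                       # split on whitespace
        if (!(f[1] in seen)) { seen[f[1]] = 1; order[++m] = f[1] }
        count[f[1], n] = f[col]
      }
      close(file)
    }
    END {
      printf "TSID"
      for (j = 1; j <= n; j++) printf "\t%s", labels[j]
      printf "\n"
      for (i = 1; i <= m; i++) {
        printf "%s", order[i]
        # unset entries print as 0 (assumed meaning: no peaks)
        for (j = 1; j <= n; j++) printf "\t%d", count[order[i], j]
        printf "\n"
      }
    }
  ' "$1"
}
```

Under these assumptions, `build_part_matrix TF_genes.txt 2` would print a promoter matrix in the same layout as the example above. Paths in the TFgenesFile are read relative to the current folder.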
3. What else can I do next?
You could run clustering on these matrices, in order to find TFs that tend to bind in the promoters/exons/introns/etc of the same genes.
For example,
ChIPseeqerCluster --densityfile=TF_genes.txt.introns --suf=TF_genes.txt.introns.cluster --distance=2 --linkage=a
See more on ChIPseeqerCluster.
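Before clustering, a quick sanity check on a matrix can be useful. The sketch below lists the transcripts for which every TF column is non-zero, i.e. the genes bound by all TFs in that gene part. The function name is hypothetical, and the sketch assumes a whitespace-separated matrix with one header line, as in the example output above.

```shell
# Hypothetical helper: print the TSID of every row in which all TF columns
# (columns 2..NF) hold a non-zero count.
genes_bound_by_all() {
  awk 'NR > 1 {
         for (i = 2; i <= NF; i++) if ($i + 0 == 0) next
         print $1
       }' "$1"
}
```

For example, `genes_bound_by_all TF_genes.txt.promoters` would print the transcripts whose promoters carry at least one peak from each TF.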
