Elementolab/Peak Finding with Promoter Arrays

From Icbwiki

Jump to: navigation, search


The find_peaks_and_threshold_in_pair_files.pl can be used to find peaks in Nimblegen promoter array ChIP-chip datasets.

It is available with the other script through SVN.

svn checkout --username XXX --password YYY https://pbtech-vc.med.cornell.edu/public/svn/elementolab/PERL_SCRIPTS/trunk PERL_SCRIPTS/

(XXX and YYY are PBtech username and password, resp; XXX can also be 'guest' and YYY any email address)

svn checkout https://pbtech-vc.med.cornell.edu/public/svn/elementolab/PERL_MODULES/trunk PERL_MODULES/

This script requires R to be installed (R packages evd and ismev are required and have to be installed prior to running the script); it also requires the Sets.pm and Table.pm modules.

It assumes that the 532 and 635 pair files have the same prefix, eg 128180_635.pair and 128180_532.pair

The prefix is the only argument to the script:

perl find_peaks_and_threshold_in_pair_files.pl 128180 | tee 128180_peaks.txt

If you have replicate experiments, you can combine them using the combine_multiple_pair_peak_files.pl script.

perl combine_multiple_pair_peak_files.pl 0.1 128180_peaks.txt 123181_peaks.txt | tee 128180_128181_peaks_t0.1.txt

Next step is to annotate the peaks, ie determine which genes are close to these peaks:

translate_column_using_table_column.pl --table=128180_128181_peaks_t0.1.txt --dict=HSAP_annotation_24APR2010.txt  --multicol=1 --add=1  | awk '{ if ($3 == 1) print $1 "\t" $2 }' | sort | uniq > 128180_128181_peaks_t0.1_ORF.txt

The HSAP_annotation_24APR2010.txt comes from a recent RefSeq annotation file; but it can easily be re-generated for the newest annotation using:

perl refseq_annotate_promoter_array_regions.pl HSAP_promoter_regions_unmasked.seq.matched_to_hg18 refGene.txt.24APR2010  > HSAP_annotation_24APR2010.txt
Personal tools