PAGE Tutorial

  • Get PAGE from svn

From our subversion repository:

svn co --username=guest --password=email@email.com https://pbtech-vc.med.cornell.edu/public/svn/elementolab/PAGE/trunk PAGE/

(Elemento lab members should use their PBtech identifiers instead of the guest ones)


  • Or download it (less up-to-date version)
http://physiology.med.cornell.edu/faculty/elemento/lab/files/PAGE.zip
unzip PAGE.zip
  • To install
cd PAGE
make clean
make
export PAGEDIR=`pwd`          # (if that does not work try: setenv PAGEDIR `pwd`)
export PATH=$PATH:$PAGEDIR    #  (here too, try setenv PATH $PATH:$PAGEDIR if export failed)

Important: you just need to type 'make' once (this step compiles iPAGE from source code into an executable program).

To avoid retyping the export commands every time you log in, add these two lines to your ~/.bashrc file:

export PAGEDIR=/path/to/PAGE  # REPLACE /path/to/PAGE with the real PAGE directory, e.g. /home/ole2001/PAGE
export PATH=$PATH:$PAGEDIR    #  
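The edit to the startup file can also be scripted. The sketch below is one possible way to do it and is not part of PAGE: add_page_env is a hypothetical helper name, and it assumes a Bourne-style shell and that the file you pass in is the startup file your shell actually reads.

```shell
# add_page_env RCFILE PAGEDIR
# Hypothetical helper (not shipped with PAGE): appends the two export lines
# to RCFILE unless an "export PAGEDIR=" line is already present.
add_page_env() {
    rcfile=$1
    pagedir=$2
    # Skip if an earlier run already added the PAGEDIR line.
    if [ -f "$rcfile" ] && grep -q '^export PAGEDIR=' "$rcfile"; then
        return 0
    fi
    {
        printf 'export PAGEDIR=%s\n' "$pagedir"
        # Single quotes keep $PATH and $PAGEDIR literal in the file.
        printf 'export PATH=$PATH:$PAGEDIR\n'
    } >> "$rcfile"
}

# Example (replace the path with the real PAGE directory):
# add_page_env "$HOME/.bashrc" /path/to/PAGE
```

Running the helper twice leaves the file unchanged the second time, so it is safe to call from a setup script.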


  • Running
page.pl --expfile=FILE --pathways=STR --exptype=[discrete|continuous] [ --cattypes=P --minr=0.0 ]

where

--expfile=FILE 

is an expression/profile file in the same format as in FIRE, i.e. two columns: one for the gene name and one for the expression measure. Here is an example of a [discrete expression profile], and here is a [continuous one].

--pathways=STR 

is the species and annotation source, most likely human_go_orf (uses gene names like TP53 and curated GO categories, no electronic annotation).

Additional sources of annotations:

--pathways=human_go_orf (all GO categories, including electronic annotation)
--pathways=biocarta
--pathways=kegg
--pathways=HPRD_interactions
--pathways=staudt_genesets (curated B-cell-related gene sets from Lou Staudt's lab)

--exptype=STR 

describes whether the expression profile is discrete or continuous.

--cattypes=STR 

specifies which parts of GO to use: F = molecular function, C = cellular component, P = biological process. The default is P only; to use all three, specify "F,C,P".

--minr=FLOAT 

specifies how independent the reported categories should be. It is set to 0 by default, meaning all informative categories/pathways are reported; --minr=5 removes many redundant categories.

--catmaxcount=INT

specifies the maximum number of genes in a category. The default is 300 (OK for GO, less optimal for Staudt gene sets).
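To make the options above concrete, here is a small sketch that writes a toy discrete profile and prints a matching command line. The gene names, cluster indices, the file name toy_clusters.txt, and the use of a tab separator with no header line are all assumptions made for illustration; check them against the example profiles mentioned under --expfile. The command is only printed, not executed.

```shell
# Write a toy two-column profile: gene name, then cluster index.
# All values are made up; the tab separator is an assumption.
expfile="${TMPDIR:-/tmp}/toy_clusters.txt"
printf 'TP53\t0\nMYC\t0\nBCL2\t1\nCD19\t1\n' > "$expfile"

# Sanity check: count lines that do not have exactly two tab-separated fields.
bad_lines=$(awk -F'\t' 'NF != 2' "$expfile" | wc -l | tr -d ' ')
echo "lines without exactly two columns: $bad_lines"

# Assemble a command from the options described above and print it.
page_cmd="page.pl --expfile=$expfile --pathways=human_go_orf --exptype=discrete --cattypes=F,C,P --minr=5 --catmaxcount=300"
echo "$page_cmd"
```

For a continuous profile, the second column would hold expression measures instead of cluster indices and --exptype would be set to continuous.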

  • Results

PAGE creates an expfile_PAGE directory and puts the results in it. The pdf file is the main result file; pvmatrix.txt contains the data used to draw the pdf.


  • Additional options

To restrict the evaluated pathways to those listed in a file (one pathway per line):

--pathwaylist=FILE

By default, the number of bins for continuous expression values is set to #genes/100 (i.e. 100 genes per bin). This can be changed using

--ebins=INT
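As a quick check of the default rule, the following sketch estimates the default number of bins from a profile file. It assumes one gene per line, integer division, and an optional number of header lines to skip; default_ebins is a hypothetical helper, not a PAGE command, and PAGE's own rounding may differ.

```shell
# default_ebins FILE [HEADER_LINES]
# Hypothetical helper: prints (number of gene lines) / 100 using integer
# division, mirroring the "#genes/100" rule.
default_ebins() {
    file=$1
    header=${2:-0}
    total=$(wc -l < "$file")
    genes=$((total - header))
    echo $((genes / 100))
}

# Example: default_ebins yourexpfile.txt
```

For a profile with 2500 genes this prints 25; passing --ebins=10 instead would place about 250 genes in each bin.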

To estimate the number of false positive pathways in your list

--randomize=INT 

where INT specifies how many random runs to execute (run at least 3).

By default PAGE overwrites everything in $expfile_PAGE/. However, you may want to run PAGE for different annotation categories (e.g. KEGG, BioCarta). To change the name of your output file, use the -suffix option. So you can do something like:

page.pl --expfile=FILE --pathways=human_go_orf -suffix=GO
page.pl --expfile=FILE --pathways=kegg -suffix=KEGG
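When several annotation sources are needed, the two commands above generalize to a loop. The sketch below only prints the commands (a dry run). Using the source name itself as the suffix is a choice made here for illustration, print_page_runs is a hypothetical helper name, and yourexpfile.txt is a placeholder.

```shell
# print_page_runs EXPFILE SOURCE...
# Hypothetical helper: prints one page.pl command per annotation source,
# each with its own suffix so that results do not overwrite each other.
print_page_runs() {
    expfile=$1
    shift
    for pathways in "$@"; do
        echo "page.pl --expfile=$expfile --pathways=$pathways -suffix=$pathways"
    done
}

print_page_runs yourexpfile.txt human_go_orf kegg biocarta
```

Once the printed commands look right, they can be run by piping the output to sh.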
  • Get the list of genes in a bin/cluster that belong to a given pathway (after PAGE analysis)
find_genes_in_bin_and_pathway.pl --expfile=FILE --bin=INT --pathway=STR --species=STR

# example:
find_genes_in_bin_and_pathway.pl --expfile=yourexpfile.txt --bin=1 --pathway=hsa04662 --species=kegg