Piana

Background

The ICB maintains a local installation of the Protein Interactions And Network Analysis (PIANA) database. The MySQL instance resides on piana.med.cornell.edu and holds both a locally populated version and the sample "limited" data distribution.

Configuration

The piana installation uses Python 2.5 from softlib. At present, the Python modules required by piana are available only on machines running the 64-bit version of Red Hat Enterprise Linux 5. descartes.med.cornell.edu, rodin.med.cornell.edu, and piana.med.cornell.edu are known to work with piana.

A bash script is available that defines the environment variables and paths needed to use Python with piana. The script is located at ~piana/bin/piana-setup.sh. When sourced, it adds the following variables to your environment:

 PIANA_DIR - Directory of the piana source distribution
 PIANA_DBHOST - The host of the piana database instance (piana.med.cornell.edu)
 PIANA_DBNAME - The name of the piana database (piana or piana_limited)
 PIANA_DBUSER - Username to use when connecting to the piana database
 PIANA_DBPASS - Password to use when connecting to the piana database

Additionally, PYTHONPATH is set, or extended if it already exists, to include the Python packages required to interface with piana.
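
As a quick sanity check after sourcing the script, the variables listed above can be verified from Python. The following is a minimal sketch and not part of the piana distribution; the helper name missing_piana_vars is hypothetical, and it assumes that an empty value should be treated the same as an unset one.

```python
import os

# Variables that piana-setup.sh is documented to export.
PIANA_VARS = (
    "PIANA_DIR",
    "PIANA_DBHOST",
    "PIANA_DBNAME",
    "PIANA_DBUSER",
    "PIANA_DBPASS",
)


def missing_piana_vars(environ=None):
    """Return the documented PIANA variables that are unset or empty.

    Hypothetical helper for illustration; not part of piana itself.
    """
    if environ is None:
        environ = os.environ
    return [name for name in PIANA_VARS if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_piana_vars()
    if missing:
        print("Not set (was piana-setup.sh sourced?): " + ", ".join(missing))
    else:
        print("All PIANA variables are set.")
```

Running this in a shell where the setup script has not been sourced should list all five names.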

Example

The following sample session shows how to connect to the piana database and print a "reference card" using the piana Python API.

 marko@descartes ~ $ source ~piana/bin/piana-setup.sh
 marko@descartes ~ $ cd /home/piana/piana/code/execs/
 marko@descartes /home/piana/piana/code/execs $ python piana.py --piana-dbhost=${PIANA_DBHOST} --piana-dbname=${PIANA_DBNAME} --piana-dbuser=${PIANA_DBUSER} --piana-dbpass=${PIANA_DBPASS} --print-reference-card
 Opening connection to Piana database piana on server piana.med.cornell.edu (use_secondary_db=no)
 -------------------- PIANA reference card --------------------
 Database:piana Server:piana.med.cornell.edu
 THIRD-PARTY DATABASES THAT WERE USED TO CREATE THIS DATABASE:
 Database biogrid version 2.0.39 was parsed on 2008-4-21 (contains ['protein-protein interactions'])
 Database kog-kyva=gb version June 06 2003 was parsed on 2008-4-21 (contains [])
 Database go version April 14 2008 was parsed on 2008-4-21 (contains ['protein attributes'])
 ...
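
The long command line in the session above can also be assembled programmatically. The sketch below builds the same argument list from the environment variables set by piana-setup.sh. It uses only the flags shown in the session; the function name reference_card_command is hypothetical, and the script prints the command instead of running it.

```python
import os


def reference_card_command(environ=None):
    """Build the argument list for the reference-card call shown in the session.

    Hypothetical helper for illustration; raises KeyError if a required
    PIANA_* variable is missing from the environment.
    """
    if environ is None:
        environ = os.environ
    return [
        "python",
        "piana.py",
        "--piana-dbhost=" + environ["PIANA_DBHOST"],
        "--piana-dbname=" + environ["PIANA_DBNAME"],
        "--piana-dbuser=" + environ["PIANA_DBUSER"],
        "--piana-dbpass=" + environ["PIANA_DBPASS"],
        "--print-reference-card",
    ]


if __name__ == "__main__":
    try:
        # Note: the printed command includes the database password.
        print(" ".join(reference_card_command()))
    except KeyError as err:
        print("Missing environment variable: %s" % err)
```

The returned list could be passed to subprocess.call from within the execs directory; building it as a list avoids shell quoting problems if a value contains spaces.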

Database contents

The following table describes the contents of the piana database as of April 21st, 2008. The "Section" column refers to the corresponding section of the piana documentation on populating the database. Versions of the raw data sets are shown both for the data loaded by the ICB and for the "limited" dataset provided for reference.

Section | Name | Local Version | Limited Version | Description
4.1.1 | ncbi_taxonomy | April 18 2008 | April 03 2007 | NCBI species information
4.1.2 | swissprot | April 08 2008 | April 03 2007 | UniProt manually curated database
4.1.2 | trembl | April 08 2008 | April 03 2007 | UniProt complete, not manually curated database
4.1.3 | genpept | April 18 2008 (release 165) | April 2007 (release 158) | NCBI GenBank database
4.1.4 | nr | April 16 2008 | April 2007 | NCBI non-redundant database
4.1.5 | ncbi2pdb_pdbaa | April 16 2008 | April 09 2007 | Correspondences between pdb and gi identifiers (pdbaa)
4.1.6 | ncbi2uniprot swissprot | April 16 2008 | April 09 2007 | Correspondences between uniprot and gi identifiers (swissprot)
4.1.7 | pdbsprotec | March 12 2008 | January 15 2007 | Correspondences between pdb and uniprot identifiers (mapping.txt)
4.1.8 | gene | April 18 2008 | April 19 2007 | Correspondences between NCBI accession number and geneID identifiers (gene2accession)
4.1.9 | gene_info | April 18 2008 | April 19 2007 | NCBI Gene database (gene_info)
4.1.10 | refseq | March 13 2008 (release 28) | release 22 | NCBI RefSeq database
4.1.11 | cog-myva=gb | September 26 2002 | September 26 2002 | Cluster of orthologous genes (myva=gb)
4.1.11 | cog | March 05 2003 | March 05 2003 | Cluster of orthologous genes (whog)
4.1.11 | kog-kyva=gb | June 06 2003 | June 06 2003 | Eukaryotic cluster of orthologous genes (kyva=gb)
4.1.11 | kog | July 21 2003 | July 21 2003 | Eukaryotic cluster of orthologous genes (kog)
4.1.12 | scop | Release 1.73 | Release 1.71 | Structural Classification of Proteins (SCOP)
4.1.13 | go | April 14 2008 | April 2007 | Gene Ontology (GO)
4.2.1 | dip | April 07 2008 | February 19 2007 | Database of Interacting Proteins (DIP)
4.2.2 | mips | June 01 2004 | n/a | The Mammalian Protein-Protein Interaction Database (MIPS)
4.2.3 | hprd | September 01 2007 (Release_7) | n/a | Human Protein Reference Database (HPRD)
4.2.4 | bind | April 15 2008 | n/a | Biomolecular Interaction Network Database (BOND)
4.2.5 | intact | March 29 2008 | n/a | IntAct protein-protein interaction database
4.2.6 | biogrid | 2.0.39 | n/a | The BioGRID's curated set of physical and genetic interactions
4.2.7 | mint | April 08 2008 | n/a | The Molecular INTeraction database (MINT)
4.2.8 | STRING | TBD | TBD | Predictions of protein interactions and COG (Cluster of Orthologous Genes) information for proteins (http://string.embl.de/)
4.2.9 | ori | January 16 2007 | January 16 2007 | Predicted interactions from distant structure/sequence patterns (interact.dat)

Miscellaneous

It was necessary to modify parts of the original piana source distribution (version 1.47) in order to load the data locally. Most modifications accommodate changes in the formats of the raw data loaded into the database. Others relate to database access (e.g., passwords) and consistency issues. Additionally, scripts were written to automate the data loading process. The updated source code and scripts can be found in the ICB subversion repository at https://pbtech-vc.med.cornell.edu/public/svn/icb/3rdparty/piana (if you do not have a local ICB account, log in as "guest" using your email address as the password).