Piana
From Icbwiki
Current revision by Marko (previous revision as of 23:41, 8 May 2008).
Background
The icb maintains local installations of the Protein Interactions And Network Analysis (PIANA) database. The MySQL database resides on piana.med.cornell.edu and contains a locally populated version along with the sample "limited" data distribution.
Configuration
The piana installation uses Python 2.5 from softlib. At present, the Python modules required by piana are available only on machines running the 64-bit version of Red Hat Enterprise Linux 5. The machines descartes.med.cornell.edu, rodin.med.cornell.edu and piana.med.cornell.edu are known to work with piana.
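As a rough pre-flight check, the 64-bit RHEL 5 requirement can be tested from Python before sourcing the setup script. This is a heuristic sketch, not part of piana: it assumes the host identifies itself through the standard /etc/redhat-release file and through the architecture reported by Python's platform module, and the function name looks_like_supported_host is hypothetical. It is written for Python 3, not the softlib Python 2.5 used by piana. The list of known-good machines above remains the authoritative reference.

```python
import platform
import re


def looks_like_supported_host(machine=None, release_text=None):
    """Heuristic check (hypothetical helper) for a 64-bit RHEL 5 host.

    Both arguments can be supplied explicitly; otherwise the values are read
    from the running machine.
    """
    if machine is None:
        # e.g. "x86_64" on a 64-bit Linux install
        machine = platform.machine()
    if release_text is None:
        try:
            # Assumption: RHEL records its release string in this file.
            with open("/etc/redhat-release") as fh:
                release_text = fh.read()
        except OSError:
            release_text = ""
    is_64bit = machine in ("x86_64", "amd64")
    # Accept "release 5" and "release 5.x", but not "release 50" or "release 6.x".
    is_rhel5 = ("Red Hat Enterprise Linux" in release_text
                and re.search(r"\brelease 5(\.\d+)?\b", release_text) is not None)
    return is_64bit and is_rhel5
```

Calling looks_like_supported_host() with no arguments inspects the current machine; passing the values explicitly makes the logic easy to check on any host.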
A bash script is available that defines the environment variables and paths needed to use Python with piana. The script is located at ~piana/bin/piana-setup.sh. When it is sourced, the following variables are added to your environment:
- PIANA_DIR - Directory of the piana source distribution
- PIANA_DBHOST - Host of the piana database instance (piana.med.cornell.edu)
- PIANA_DBNAME - Name of the piana database (piana or piana_limited)
- PIANA_DBUSER - Username to use when connecting to the piana database
- PIANA_DBPASS - Password to use when connecting to the piana database
Additionally, the value of PYTHONPATH will be set or modified to include the appropriate python packages required to interface with piana.
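The four PIANA_DB* variables map one-to-one onto the --piana-db* options passed to piana.py in the example session. The following minimal Python 3 sketch assumes the setup script has already been sourced; it collects those variables and fails early if any are unset. The names PIANA_ENV_FLAGS and piana_db_args are hypothetical helpers, not part of the piana distribution.

```python
import os

# Environment variable -> piana.py option, as used in the example session.
PIANA_ENV_FLAGS = {
    "PIANA_DBHOST": "--piana-dbhost",
    "PIANA_DBNAME": "--piana-dbname",
    "PIANA_DBUSER": "--piana-dbuser",
    "PIANA_DBPASS": "--piana-dbpass",
}


def piana_db_args(environ=None):
    """Return the database options for piana.py built from PIANA_* variables."""
    if environ is None:
        environ = os.environ
    # Treat unset and empty variables alike: both mean the setup script
    # has probably not been sourced.
    missing = [name for name in PIANA_ENV_FLAGS if not environ.get(name)]
    if missing:
        raise RuntimeError(
            "source ~piana/bin/piana-setup.sh first; missing: " + ", ".join(missing)
        )
    return ["%s=%s" % (flag, environ[name]) for name, flag in PIANA_ENV_FLAGS.items()]
```

For example, ["python", "piana.py"] + piana_db_args() + ["--print-reference-card"] rebuilds the command line used in the example session from the current environment.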
Example
The following is a sample session illustrating how to connect to the piana database and print a "reference card" using the piana python api.
```
marko@descartes ~ $ source ~piana/bin/piana-setup.sh
marko@descartes ~ $ cd /home/piana/piana/code/execs/
marko@descartes /home/piana/piana/code/execs $ python piana.py --piana-dbhost=${PIANA_DBHOST} --piana-dbname=${PIANA_DBNAME} --piana-dbuser=${PIANA_DBUSER} --piana-dbpass=${PIANA_DBPASS} --print-reference-card
Opening connection to Piana database piana on server piana.med.cornell.edu (use_secondary_db=no)
--------------------
PIANA reference card
--------------------
Database:piana
Server:piana.med.cornell.edu

THIRD-PARTY DATABASES THAT WERE USED TO CREATE THIS DATABASE:
Database biogrid version 2.0.39 was parsed on 2008-4-21 (contains ['protein-protein interactions'])
Database kog-kyva=gb version June 06 2003 was parsed on 2008-4-21 (contains [])
Database go version April 14 2008 was parsed on 2008-4-21 (contains ['protein attributes'])
...
```
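The "Database ... was parsed on ..." lines of the reference card record which third-party sources went into the database. If the output is saved to a file (for example by redirecting the piana.py command above), a small Python 3 parser along these lines can turn it into structured records. It assumes every entry follows the exact wording shown in the sample output; CARD_LINE and parse_reference_card are hypothetical names.

```python
import ast
import re

# Matches reference-card entries such as:
#   Database biogrid version 2.0.39 was parsed on 2008-4-21 (contains ['protein-protein interactions'])
# The match is case-sensitive, so "Opening connection to Piana database ..."
# and "Database:piana" are not picked up.
CARD_LINE = re.compile(
    r"Database (?P<name>\S+) version (?P<version>.+?) was parsed on (?P<parsed>\S+) "
    r"\(contains (?P<contains>\[.*?\])\)"
)


def parse_reference_card(text):
    """Return one dict per third-party database listed in the reference card."""
    entries = []
    for match in CARD_LINE.finditer(text):
        entries.append({
            "name": match.group("name"),
            "version": match.group("version"),
            "parsed_on": match.group("parsed"),
            # The "contains" field is printed as a Python list literal.
            "contains": ast.literal_eval(match.group("contains")),
        })
    return entries
```

Because it scans with finditer rather than line by line, the parser also copes with output whose entries have been run together on a single line.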
Database contents
The following table describes the contents of the piana database as of April 21st, 2008. The "Section" column refers to the corresponding section of the populate piana documentation. Versions of the raw data sets are shown both for the data loaded by the icb and for the "limited" dataset provided for reference.
Section | Name | Local Version | Limited Version | Description |
---|---|---|---|---|
4.1.1 | ncbi_taxonomy | April 18 2008 | April 03 2007 | NCBI Species information |
4.1.2 | swissprot | April 08 2008 | April 03 2007 | UniProt manually curated database |
4.1.2 | trembl | April 08 2008 | April 03 2007 | UniProt complete database (not manually curated) |
4.1.3 | genpept | April 18 2008 (release 165) | April 2007 (release 158) | NCBI genbank database |
4.1.4 | nr | April 16 2008 | April 2007 | NCBI non-redundant database |
4.1.5 | ncbi2pdb_pdbaa | April 16 2008 | April 09 2007 | Correspondence between pdb and gi identifiers (pdbaa) |
4.1.6 | ncbi2uniprot swissprot | April 16 2008 | April 09 2007 | Correspondences between uniprot and gi identifiers (swissprot) |
4.1.7 | pdbsprotec | March 12 2008 | January 15 2007 | Correspondences between pdb and uniprot identifiers (mapping.txt) |
4.1.8 | gene | April 18 2008 | April 19 2007 | Correspondences between NCBI accession number and geneID identifiers (gene2accession) |
4.1.9 | gene_info | April 18 2008 | April 19 2007 | Gene NCBI database (gene_info) |
4.1.10 | refseq | March 13 2008 (release 28) | release 22 | NCBI RefSeq Database |
4.1.11 | cog-myva=gb | September 26 2002 | September 26 2002 | Cluster of orthologous genes (myva=gb) |
4.1.11 | cog | March 05 2003 | March 05 2003 | Cluster of orthologous genes (whog) |
4.1.11 | kog-kyva=gb | June 06 2003 | June 06 2003 | Eukaryotic cluster of orthologous genes (kyva=gb) |
4.1.11 | kog | July 21 2003 | July 21 2003 | Eukaryotic cluster of orthologous genes (kog) |
4.1.12 | scop | Release 1.73 | Release 1.71 | Structural Classification of Proteins (SCOP) |
4.1.13 | go | April 14 2008 | April 2007 | Gene Ontology (GO) |
4.2.1 | dip | April 07 2008 | February 19 2007 | Database of Interacting Proteins (DIP) |
4.2.2 | mips | June 01 2004 | n/a | The Mammalian Protein-Protein Interaction Database (MIPS) |
4.2.3 | hprd | September 01 2007 (Release_7) | n/a | Human Protein Reference Database (HPRD) |
4.2.4 | bind | April 15 2008 | n/a | Biomolecular Interaction Network Database (BIND) |
4.2.5 | intact | March 29 2008 | n/a | IntAct protein-protein interaction database |
4.2.6 | biogrid | 2.0.39 | n/a | The BioGRID's curated set of physical and genetic interactions |
4.2.7 | mint | April 08 2008 | n/a | The Molecular INTeraction database (MINT) |
4.2.8 | STRING | TBD | TBD | Predictions of protein interactions and COG (Cluster of Orthologous Genes) information for proteins (http://string.embl.de/) |
4.2.9 | ori | January 16 2007 | January 16 2007 | Predicted interactions from distant structure/sequence patterns (interact.dat) |
Miscellaneous
Some of the original piana source distribution (version 1.47) had to be modified in order to load the data locally. Most of the changes dealt with the formats of the raw data loaded into the database; others were related to database access (e.g., passwords) and consistency issues. Additionally, scripts were written to automate the data loading process. The updated source code and scripts can be found in the ICB Subversion repository at https://pbtech-vc.med.cornell.edu/public/svn/icb/3rdparty/piana (if you do not have a local icb account, log in as "guest" and use your email address as the password).
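For scripted checkouts, the minimal Python 3 sketch below builds the Subversion command for the repository above. It assumes the svn command-line client is installed, and svn_checkout_command is a hypothetical helper. The password is deliberately left off the command line, so svn prompts for it interactively (your email address, when using the guest account) and it does not show up in the process list.

```python
# Repository URL from the paragraph above.
PIANA_SVN_URL = "https://pbtech-vc.med.cornell.edu/public/svn/icb/3rdparty/piana"


def svn_checkout_command(dest="piana", username=None, url=PIANA_SVN_URL):
    """Build an argument list for `svn checkout` (hypothetical helper).

    No password option is added; svn asks for one interactively if the
    server requires authentication.
    """
    cmd = ["svn", "checkout", url, dest]
    if username:
        cmd += ["--username", username]
    return cmd
```

For example, subprocess.run(svn_checkout_command(username="guest"), check=True) checks the code out into ./piana using the guest account.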