Elementolab/PAGE adding new pathway database

From Icbwiki

This tutorial covers only the Gene Ontology (GO) part.

Get the annotation file (usually from the GO web site)

# quote the URL so the shell does not treat '?' as a glob character; -O sets the local file name
wget -O gene_association.mgi.gz 'http://cvsweb.geneontology.org/cgi-bin/cvsweb.cgi/go/gene-associations/gene_association.mgi.gz?rev=HEAD'
gunzip gene_association.mgi.gz

Clean up the annotation file to obtain a two-column file (GENE tab GOID, one pair per line). Depending on which gene IDs you want to use, this means either writing a script or just extracting columns.

# extract all annotations: columns 2 and 4 (0-based) of the GAF file, i.e. gene symbol and GO ID
columns.pl 2 4 < gene_association.mgi > gene_association.mgi.clean.all
# get the full, non-redundant gene list
columns.pl 0 < gene_association.mgi.clean.all | sort -u > gene_association.mgi.clean.all.genes
# use this script to extract only non-IEA annotations (IEA = Inferred from Electronic Annotation)
perl ~/PERL_SCRIPTS/GO_IEA_cleanup_raw_annotation_file.pl gene_association.mgi > gene_association.mgi.clean
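If `columns.pl` or `GO_IEA_cleanup_raw_annotation_file.pl` is not at hand, roughly the same two-column files can be produced with `awk`. This is a minimal sketch, not the original scripts. It assumes the standard tab-separated GAF layout (1-based: column 3 = gene symbol, column 5 = GO ID, column 7 = evidence code) and header lines starting with `!`. It also assumes the non-IEA output uses the same GENE tab GOID layout. The function names are hypothetical.

```shell
# Hypothetical helpers (names are illustrative, not from the original scripts).
# Assumes a tab-separated GAF file: column 3 = gene symbol, column 5 = GO ID,
# column 7 = evidence code (1-based, as awk counts); header lines start with "!".

# All annotations as GENE<tab>GOID (intended to match "columns.pl 2 4").
gaf_pairs_all() {
  awk -F'\t' '!/^!/ && NF >= 7 { print $3 "\t" $5 }' "$1"
}

# Non-IEA annotations only: drop rows whose evidence code is IEA.
gaf_pairs_non_iea() {
  awk -F'\t' '!/^!/ && NF >= 7 && $7 != "IEA" { print $3 "\t" $5 }' "$1"
}

# Sorted, de-duplicated list of genes from a GENE<tab>GOID file.
pair_genes() {
  cut -f1 "$1" | sort -u
}

# Usage:
#   gaf_pairs_all gene_association.mgi > gene_association.mgi.clean.all
#   pair_genes gene_association.mgi.clean.all > gene_association.mgi.clean.all.genes
#   gaf_pairs_non_iea gene_association.mgi > gene_association.mgi.clean
```

Rows whose qualifier (GAF column 4) contains `NOT` are kept here, as a plain column extraction would keep them. You may want to filter them out, since they state that the gene is *not* annotated to that term.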

Get the ontology files

wget http://www.geneontology.org/ontology/gene_ontology.obo
wget http://www.geneontology.org/doc/GO.terms_and_ids

Get the parent categories of each GO term from the ontology

perl ~/PERL_SCRIPTS/GO_OBO_get_parents.pl gene_ontology.obo > gene_ontology.obo.parents
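For reference, direct parent links can be read from the OBO file by scanning `[Term]` stanzas for `id:`, `is_a:` and `relationship: part_of` lines. This is a minimal sketch, not `GO_OBO_get_parents.pl`. The output layout (one CHILD tab PARENT line per link) and the function name are assumptions, and the real `gene_ontology.obo.parents` format may differ.

```shell
# Hypothetical helper: print one CHILD<tab>PARENT line per direct parent link
# found in the [Term] stanzas of an OBO file. The output layout is an assumption;
# the real gene_ontology.obo.parents format may differ.
obo_direct_parents() {
  awk '
    /^\[/                     { in_term = ($0 == "[Term]"); id = ""; next }
    !in_term                  { next }
    /^id: /                   { id = $2; next }
    /^is_a: /                 { print id "\t" $2; next }
    /^relationship: part_of / { print id "\t" $3 }   # remove this rule to follow is_a only
  ' "$1"
}

# Usage (output file name is a placeholder):
#   obo_direct_parents gene_ontology.obo > parents_edges.tsv
```

Including `part_of` links is a design choice. Whether the original script follows them is not stated.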

Make the index file (temporary: it is missing lines for genes that have no non-IEA annotation)

perl ~/PERL_SCRIPTS/GO_gene_association2index.pl gene_association.mgi.clean gene_ontology.obo.parents > mouse_go_orf_index.txt.tmp
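The index step presumably applies GO's true-path rule: a gene annotated to a term is also annotated to all of that term's ancestors. A minimal `awk` sketch of that propagation follows. It assumes the parents file is a CHILD tab PARENT edge list and the index layout is one line per gene (GENE tab GO1 tab GO2 ...). Both are assumptions, not the documented behaviour of `GO_gene_association2index.pl`, and the function name is hypothetical.

```shell
# Hypothetical helper: turn GENE<tab>GOID pairs into one index line per gene,
# GENE<tab>GO1<tab>GO2..., where each gene also receives every ancestor of its
# terms (true-path propagation). PARENTS is assumed to be a CHILD<tab>PARENT list.
# Usage: pairs_to_index PARENTS PAIRS
pairs_to_index() {
  awk -F'\t' '
    FILENAME == ARGV[1] { par[$1] = par[$1] " " $2; next }   # store parent lists
    {
      if (!($1 in seen)) { seen[$1] = 1; order[++n] = $1 }   # keep first-seen gene order
      top = 0; stack[++top] = $2                             # walk up the DAG from the term
      while (top > 0) {
        t = stack[top--]
        if (($1, t) in has) continue                         # term already assigned to gene
        has[$1, t] = 1
        terms[$1] = terms[$1] "\t" t
        k = split(par[t], p, " ")
        for (i = 1; i <= k; i++) stack[++top] = p[i]
      }
    }
    END { for (i = 1; i <= n; i++) print order[i] terms[order[i]] }
  ' "$1" "$2"
}

# Usage (edge file name is a placeholder):
#   pairs_to_index parents_edges.tsv gene_association.mgi.clean > mouse_go_orf_index.txt.tmp
```

Like the temporary file above, this only emits lines for genes that appear in the pairs file. The visited set (`has`) also keeps the walk from looping if the edge list ever contains a cycle.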

Make final index file

GO_get_index_lines_for_all_genes.pl gene_association.mgi.clean.all.genes mouse_go_orf_index.txt.tmp > mouse_go_orf_index.txt 
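A rough equivalent of this step walks the full gene list in order. For each gene it copies that gene's line from the temporary index, or prints the gene name alone when it has no line. Treating a bare gene name as "no GO categories" is an assumption about what `GO_get_index_lines_for_all_genes.pl` writes, and the function name is hypothetical.

```shell
# Hypothetical helper: emit one index line for every gene in GENE_LIST, in list
# order, copying the line from TMP_INDEX when present and otherwise printing the
# gene name alone (assumed to mean: no GO categories).
# Usage: index_for_all_genes GENE_LIST TMP_INDEX
index_for_all_genes() {
  awk -F'\t' '
    FILENAME == ARGV[1] { genes[++n] = $1; next }   # full gene list
    { line[$1] = $0 }                               # temporary index, keyed by gene
    END {
      for (i = 1; i <= n; i++)
        print ((genes[i] in line) ? line[genes[i]] : genes[i])
    }
  ' "$1" "$2"
}

# Usage:
#   index_for_all_genes gene_association.mgi.clean.all.genes mouse_go_orf_index.txt.tmp > mouse_go_orf_index.txt
```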

Make names file

mv GO.terms_and_ids mouse_go_orf_names.txt
# then use a text editor to remove the header comment lines at the top of the file
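Instead of editing by hand, the header can be stripped with `grep`. This sketch assumes, as in the GO flat files, that the header comment lines start with `!`. Check the file first if in doubt. The function name is hypothetical.

```shell
# Hypothetical helper: copy GO.terms_and_ids without its header comment lines,
# assuming those lines start with "!".
go_names_file() {
  grep -v '^!' "$1"
}

# Usage:
#   go_names_file GO.terms_and_ids > mouse_go_orf_names.txt
```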

Make a gene description file

columns.pl 1 2 < ../mouse_genedesc.txt > mouse_go_orf_genedesc.txt
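Without `columns.pl`, the same extraction can probably be done with `cut`. This assumes `columns.pl` counts columns from 0, which fits `columns.pl 2 4` picking the GAF symbol and GO ID columns. It also assumes `mouse_genedesc.txt` is tab-separated, since `cut` splits on tabs by default and counts from 1. The function name is hypothetical.

```shell
# Hypothetical helper: keep the 2nd and 3rd tab-separated fields, i.e. what
# "columns.pl 1 2" selects if columns.pl is 0-based (cut is 1-based).
genedesc_columns() {
  cut -f2,3 "$1"
}

# Usage:
#   genedesc_columns ../mouse_genedesc.txt > mouse_go_orf_genedesc.txt
```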