Elementolab/PAGE adding new pathway database
This tutorial only covers the Gene Ontology part.
Get the annotation (most often from the GO web site)
wget http://cvsweb.geneontology.org/cgi-bin/cvsweb.cgi/go/gene-associations/gene_association.mgi.gz?rev=HEAD
mv gene_association.mgi.gz\?rev\=HEAD gene_association.mgi.gz
gunzip gene_association.mgi.gz
Clean up the annotation to obtain a two-column file (GENE, tab, GO ID, newline). Depending on which IDs you want to use, this involves either writing a script or simply extracting columns.
# extract all annotations
columns.pl 2 4 < gene_association.mgi > gene_association.mgi.clean.all
# get full gene list
columns.pl 0 < gene_association.mgi.clean.all | sort | uniq > gene_association.mgi.clean.all.genes
# use this script to extract only non-IEA annotations
perl ~/PERL_SCRIPTS/GO_IEA_cleanup_raw_annotation_file.pl gene_association.mgi > gene_association.mgi.clean
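The helper scripts columns.pl and GO_IEA_cleanup_raw_annotation_file.pl are local and not shown on this page. As a rough sketch of what these three steps might do, assuming the standard tab-separated GAF layout (gene symbol in field 3, GO ID in field 5, evidence code in field 7, header lines starting with "!"), an awk version could look like the following. The example.* file names and the sample rows are made up for illustration.

```shell
# Build a tiny GAF-like sample file (hypothetical data, tab-separated).
printf '!gaf-version: 2.0\n' > example.gaf
printf 'MGI\tMGI:1\tGeneA\t\tGO:0000001\tPMID:1\tIDA\t\tP\n' >> example.gaf
printf 'MGI\tMGI:2\tGeneB\t\tGO:0000002\tPMID:2\tIEA\t\tP\n' >> example.gaf
printf 'MGI\tMGI:1\tGeneA\t\tGO:0000003\tPMID:3\tIEA\t\tF\n' >> example.gaf

# All annotations: symbol (field 3) and GO ID (field 5), skipping "!" header lines.
awk -F'\t' '!/^!/ { print $3 "\t" $5 }' example.gaf > example.clean.all

# Full gene list: first column, sorted, duplicates removed.
cut -f1 example.clean.all | sort -u > example.clean.all.genes

# Non-IEA annotations only: drop rows whose evidence code (field 7) is IEA.
awk -F'\t' '!/^!/ && $7 != "IEA" { print $3 "\t" $5 }' example.gaf > example.clean
```

In this sample GeneB has only an IEA annotation, so it appears in the full gene list but not in example.clean.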
Get the ontology files
wget http://www.geneontology.org/ontology/gene_ontology.obo
wget http://www.geneontology.org/doc/GO.terms_and_ids
Get parent categories
perl ~/PERL_SCRIPTS/GO_OBO_get_parents.pl gene_ontology.obo > gene_ontology.obo.parents
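GO_OBO_get_parents.pl is not shown here, so its exact output format is unknown. As a minimal sketch of the idea, assuming that "parents" means all ancestors reachable through is_a links in [Term] stanzas and that the output is one line per term followed by its ancestors, tab-separated, something like this awk program could be used. The sample ontology and the example_* file names are hypothetical, and other relationship types such as part_of are ignored.

```shell
# A three-term sample ontology in OBO syntax (hypothetical terms).
cat > example_ontology.obo <<'EOF'
format-version: 1.2

[Term]
id: GO:0000001
name: root

[Term]
id: GO:0000002
name: child
is_a: GO:0000001 ! root

[Term]
id: GO:0000003
name: grandchild
is_a: GO:0000002 ! child
EOF

awk '
  # Walk up the is_a links from term t, appending each new ancestor to line.
  function walk(t,    n, i, p) {
    n = split(parents[t], p, " ")
    for (i = 1; i <= n; i++)
      if (!(p[i] in seen)) { seen[p[i]] = 1; line = line "\t" p[i]; walk(p[i]) }
  }
  # Track whether the current stanza is a [Term] stanza.
  /^\[/                    { in_term = ($0 == "[Term]") }
  in_term && $1 == "id:"   { id = $2; order[++count] = id }
  in_term && $1 == "is_a:" { parents[id] = parents[id] " " $2 }
  END {
    for (k = 1; k <= count; k++) {
      split("", seen)        # reset the visited set for each term
      line = order[k]
      walk(order[k])
      print line
    }
  }
' example_ontology.obo > example_parents.txt
```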
Make a temporary index file (it is still missing lines for some genes)
perl ~/PERL_SCRIPTS/GO_gene_association2index.pl gene_association.mgi.clean gene_ontology.obo.parents > mouse_go_orf_index.txt.tmp
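The index step combines the two-column annotation file with the parents file. The sketch below shows one way this could work, assuming the index has one line per gene followed by its annotated GO terms and all of their ancestors, tab-separated and without duplicates. That layout is an assumption, not a description of GO_gene_association2index.pl, and the example_* files are made up. Note that this sketch reads the parents file first, the reverse of the argument order used above.

```shell
# Hypothetical inputs: gene/GO pairs, and each GO term followed by its ancestors.
printf 'GeneA\tGO:0000003\nGeneA\tGO:0000002\nGeneB\tGO:0000002\n' > example_annot.txt
printf 'GO:0000001\nGO:0000002\tGO:0000001\nGO:0000003\tGO:0000002\tGO:0000001\n' > example_parents.txt

awk -F'\t' '
  # First file: remember each term together with its ancestor list.
  NR == FNR { anc[$1] = $0; next }
  # Second file: add the annotated term and its ancestors to the gene, skipping duplicates.
  {
    if (!($1 in line)) { line[$1] = $1; order[++count] = $1 }
    n = split(($2 in anc) ? anc[$2] : $2, t, "\t")
    for (i = 1; i <= n; i++)
      if (!(($1, t[i]) in seen)) { seen[$1, t[i]] = 1; line[$1] = line[$1] "\t" t[i] }
  }
  # Print genes in order of first appearance.
  END { for (k = 1; k <= count; k++) print line[order[k]] }
' example_parents.txt example_annot.txt > example_index.tmp
```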
Make final index file
GO_get_index_lines_for_all_genes.pl gene_association.mgi.clean.all.genes mouse_go_orf_index.txt.tmp > mouse_go_orf_index.txt
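Judging by its name and arguments, this step makes sure every gene in the full gene list has a line in the index, including genes that lost all their annotations in the non-IEA cleanup. A minimal sketch of that idea, assuming that a gene without annotations should appear on a line by itself (an assumption, since the script is not shown), with hypothetical example_* files:

```shell
# Hypothetical inputs: the full gene list and a temporary index missing GeneB.
printf 'GeneA\nGeneB\nGeneC\n' > example_genes.txt
printf 'GeneA\tGO:0000003\tGO:0000002\nGeneC\tGO:0000001\n' > example_index.tmp

awk -F'\t' '
  # First file: store each index line under its gene name.
  NR == FNR { idx[$1] = $0; next }
  # Second file: print the stored line, or the bare gene name if there is none.
  { print (($1 in idx) ? idx[$1] : $1) }
' example_index.tmp example_genes.txt > example_index.txt
```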
Make names file
mv GO.terms_and_ids mouse_go_orf_names.txt
# use a text editor to remove the header comments
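The header removal can also be scripted instead of done by hand. The sketch below assumes the header comment lines all start with "!"; check the top of the downloaded file first. The sample content and the example_* file names are made up.

```shell
# A sample file with two header comment lines and one term line (hypothetical content).
printf '! version: example\n! GO-ID<TAB>term<TAB>aspect\nGO:0000001\tmitochondrion inheritance\tP\n' > example_terms_and_ids

# Keep only the lines that do not start with "!".
grep -v '^!' example_terms_and_ids > example_names.txt
```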
Make a gene description file
columns.pl 1 2 < ../mouse_genedesc.txt > mouse_go_orf_genedesc.txt