Homology searching

One of the most widely used bioinformatics techniques is searching a database for sequences similar to a query sequence. If two sequences are similar, they are likely to have a similar structure, and therefore perhaps a similar function. Thus, if you have an as-yet-uncharacterized sequence, finding homologs in the databases can give you an idea of what it might be. (Find protein sequences at NCBI.)


There are several different algorithms for homology searching, and each program offers a wide range of options and parameters to help you carry out a more informative search. The Smith-Waterman algorithm gives the most exact and informative matches, because it is guaranteed to find the optimal local alignment under a given scoring scheme; it was also one of the earliest algorithms developed for this purpose. However, it is impractical for large databases: its running time grows with the product of the query length and the database size, so it becomes infeasibly slow. Smith-Waterman is therefore most useful with smaller protein databases such as SwissProt. The most commonly used programs are the BLAST family, which gives biologically meaningful matches in a reasonable amount of time. The FASTA program, although usually a little slower, is generally considered more sensitive than BLAST when using a DNA sequence to query a DNA database.
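To make the cost of an exhaustive search concrete, here is a minimal sketch of Smith-Waterman local alignment in Python. It assumes a simple scoring scheme (match, mismatch, and a linear gap penalty chosen for illustration only); real servers use substitution matrices such as BLOSUM62 and affine gap penalties. The function names `smith_waterman` and `rank_database` are hypothetical and not part of any of the programs mentioned above.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for one optimal local alignment.

    The scoring values are illustrative assumptions, not defaults of any server.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the whole matrix: this is the O(len(a) * len(b)) step that
    # makes the method slow on large databases.
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                          H[i - 1][j] + gap,    # gap in b
                          H[i][j - 1] + gap)    # gap in a
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


def rank_database(query, database):
    """Score the query against every sequence in a {name: sequence} dict."""
    scored = [(smith_waterman(query, seq)[0], name)
              for name, seq in database.items()]
    return sorted(scored, reverse=True)
```

Calling `rank_database(query, database)` fills one full matrix per database entry, which is why heuristic programs such as BLAST and FASTA are preferred when the database is large.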


The BLAST, FASTA and Smith-Waterman (MPSrch) servers find homologs of the query sequence in the databases, which helps to identify the query. The InterProScan server, on the other hand, searches the InterPro database to identify regions of a protein, such as domains and motifs, that may be associated with a particular function. The InterPro database is an integrated resource that stores information about protein families, domains, repeat regions and other functional sites drawn from the most commonly used signature databases.
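As a rough illustration of what a signature match looks like, the sketch below scans a protein sequence for a PROSITE-style pattern, N-{P}-[ST]-{P} (the N-glycosylation site motif), rewritten as a regular expression. This is only a toy: InterProScan itself combines several methods (patterns, profiles, hidden Markov models) from its member databases, and the function name `find_signature` and the example sequence are hypothetical.

```python
import re

# PROSITE-style pattern N-{P}-[ST]-{P} written as a regular expression.
# The lookahead (?=...) lets overlapping sites be reported as well.
N_GLYCOSYLATION = re.compile(r"(?=(N[^P][ST][^P]))")


def find_signature(protein, pattern=N_GLYCOSYLATION):
    """Return (1-based start position, matched residues) for every match."""
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(protein)]


# Made-up sequence used only to exercise the function.
hits = find_signature("MKNGSAANPSTNNTS")
```

Unlike a homology search, a hit here says nothing about the rest of the sequence; it only flags a short region that fits a known signature.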
