Homology searching

One of the most common tasks in bioinformatics is searching a database for sequences similar to a query sequence. If two sequences are similar, they are likely to have similar structures and therefore, perhaps, similar functions. Thus, if you have an as-yet-uncharacterized sequence, finding homologs in the databases can give you an idea of what it might be. (Find protein sequences at NCBI.)

There are several algorithms for homology searching, and each program offers a wide range of options and parameters to help you run a more informative search. The Smith-Waterman algorithm gives the most exact and informative matches, and it is the oldest of the three approaches discussed here. However, it is so computationally intensive that it becomes impractically slow on large databases. It is most useful with smaller protein databases, such as SwissProt. The most commonly used programs are the BLAST family, which give biologically meaningful matches in a reasonable amount of time. The FASTA program, although usually a little slower, is almost always more sensitive than BLAST when a DNA sequence is used to query a DNA database.
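To make the cost of Smith-Waterman concrete, here is a minimal Python sketch of the local alignment it computes. The scoring values (match +2, mismatch -1, linear gap -2) and the function name smith_waterman are assumptions chosen for illustration; real searches normally use a substitution matrix such as BLOSUM62 and affine gap penalties. The table has one cell for every pair of positions in the two sequences, which is why an exhaustive search of this kind slows down so quickly as the database grows.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for the best local alignment.

    Scoring values are illustrative assumptions, not the settings of any
    particular server.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the table: every cell is visited once, so the work grows with
    # len(a) * len(b).
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                0,                      # start a new local alignment
                H[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                H[i - 1][j] + gap,      # gap in b
                H[i][j - 1] + gap,      # gap in a
            )
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    # Two made-up peptides that share the stretch VLAAG.
    print(smith_waterman("MKVLAAGIW", "PPVLAAGQQ"))
```

Heuristic programs such as BLAST and FASTA avoid filling this whole table for every database entry, which is where their speed advantage comes from.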

The BLAST, FASTA and Smith-Waterman (MPSrch) servers find homologs of the query sequence in the databases to help identify it. The InterProScan server, on the other hand, searches the InterPro database to identify regions of a protein that may carry out some function. InterPro is an integrated resource that stores information about protein families, domains, repeat regions and other functional sites from the most commonly used signature databases.
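As a rough illustration of what a signature is, the sketch below scans a protein for one short pattern. It is a toy, not how InterProScan itself works: the pattern is the PROSITE N-glycosylation site (PS00001, written N-{P}-[ST]-{P}), translated by hand into a Python regular expression, and the function name find_sites and the example sequence are made up for this example. Many real signatures are profiles or hidden Markov models rather than simple patterns.

```python
import re

# PROSITE PS00001, N-{P}-[ST]-{P}: Asn, any residue except Pro,
# Ser or Thr, any residue except Pro. The lookahead lets overlapping
# sites be reported as well.
N_GLYCOSYLATION = re.compile(r"(?=(N[^P][ST][^P]))")


def find_sites(protein):
    """Return (1-based start, matched residues) for every pattern hit."""
    return [(m.start() + 1, m.group(1))
            for m in N_GLYCOSYLATION.finditer(protein)]


if __name__ == "__main__":
    # Made-up sequence used only to exercise the pattern.
    print(find_sites("MKNGSAPNPTQ"))
```

A hit from a pattern this short only suggests a possible functional site; it says nothing on its own about whether the site is actually used.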
