The Needleman-Wunsch algorithm

The Needleman-Wunsch global alignment algorithm finds the optimal alignment (including gaps) of two sequences over their entire length. It reads in a scoring matrix that contains a value for every possible pair of residues or nucleotides, and finds the alignment with the maximum possible score, where the score of an alignment is the sum of the scores of its aligned pairs (taken from the scoring matrix) minus the penalties for any gaps. This works best with closely related sequences. Aligning very distantly related sequences with this algorithm will still produce a result, but much of the alignment may have little or no biological significance.
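The scoring scheme described above can be filled in with a dynamic-programming table and then traced back to recover the alignment. The Python below is a simplified sketch, not the program behind this page: the `needleman_wunsch` and `match_mismatch` names are hypothetical, a caller-supplied `score(x, y)` function stands in for the scoring matrix, and it assumes a single linear gap penalty (every gapped position costs the same), whereas the tool described here separates gap opening from gap extension.

```python
from typing import Callable, List, Tuple


def needleman_wunsch(
    a: str,
    b: str,
    score: Callable[[str, str], float],
    gap: float,
) -> Tuple[float, str, str]:
    """Globally align a and b; return (best score, aligned a, aligned b).

    score(x, y) plays the role of the scoring matrix; every gapped
    position subtracts the same `gap` penalty (a linear gap model).
    """
    m, n = len(a), len(b)
    # F[i][j] holds the best score for aligning a[:i] with b[:j];
    # ptr[i][j] records which neighbouring cell that score came from.
    F: List[List[float]] = [[0.0] * (n + 1) for _ in range(m + 1)]
    ptr: List[List[str]] = [[""] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        F[i][0], ptr[i][0] = -gap * i, "up"    # a[:i] against all gaps
    for j in range(1, n + 1):
        F[0][j], ptr[0][j] = -gap * j, "left"  # b[:j] against all gaps
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            options = [
                (F[i - 1][j - 1] + score(a[i - 1], b[j - 1]), "diag"),  # pair a[i-1] with b[j-1]
                (F[i - 1][j] - gap, "up"),                              # a[i-1] against a gap
                (F[i][j - 1] - gap, "left"),                            # b[j-1] against a gap
            ]
            # max() keeps the first option on ties, so "diag" wins ties
            F[i][j], ptr[i][j] = max(options, key=lambda opt: opt[0])
    # Trace back from the bottom-right cell to rebuild one optimal alignment
    out_a: List[str] = []
    out_b: List[str] = []
    i, j = m, n
    while i > 0 or j > 0:
        move = ptr[i][j]
        if move == "diag":
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif move == "up":
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return F[m][n], "".join(reversed(out_a)), "".join(reversed(out_b))


def match_mismatch(x: str, y: str) -> float:
    """Toy scoring function: +1 for identical residues, -1 otherwise."""
    return 1.0 if x == y else -1.0


print(needleman_wunsch("ACGT", "ACT", match_mismatch, gap=1.0))
# (2.0, 'ACGT', 'AC-T')
```

Storing a pointer per cell, rather than re-deriving each step from the scores during traceback, avoids relying on exact floating-point equality when non-integer matrix values or penalties are used.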

Enter your e-mail address in the box provided. Then either paste the two sequences you want to align into the two large text-entry boxes (one sequence per box), or browse your hard disk for the relevant files.

You can choose from a large number of comparison matrices, which determine the scores of matches and mismatches. The default is the EBLOSUM62 matrix for protein sequences and the EDNAMAT matrix for nucleotide sequences.
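A comparison matrix is essentially a lookup table keyed by a pair of residues. The sketch below reads one from text into a Python dictionary. It assumes a plain whitespace-separated layout ('#' comment lines, a header row of residue letters, then one labelled row per residue), which is a common layout for matrix files but has not been checked against the exact files this page uses; the `parse_matrix` function and the toy matrix values are hypothetical.

```python
from typing import Dict, Tuple


def parse_matrix(text: str) -> Dict[Tuple[str, str], float]:
    """Turn a whitespace-separated substitution matrix into a {(row, col): score} map.

    Assumed layout: lines starting with '#' are comments, the first remaining
    line lists the column residues, and every later line is a row residue
    followed by one score per column.
    """
    rows = [
        line.split()
        for line in text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    ]
    if not rows:
        raise ValueError("no matrix data found")
    columns = rows[0]
    table: Dict[Tuple[str, str], float] = {}
    for row in rows[1:]:
        residue, values = row[0], row[1:]
        if len(values) != len(columns):
            raise ValueError(
                f"row {residue!r} has {len(values)} scores, expected {len(columns)}"
            )
        for column, value in zip(columns, values):
            table[(residue, column)] = float(value)
    return table


# A made-up 4x4 nucleotide matrix, for illustration only
TOY_MATRIX = """
# toy matrix: +5 for a match, -4 for a mismatch
   A  C  G  T
A  5 -4 -4 -4
C -4  5 -4 -4
G -4 -4  5 -4
T -4 -4 -4  5
"""

table = parse_matrix(TOY_MATRIX)
print(table[("A", "A")], table[("A", "G")])  # 5.0 -4.0
```

A lookup such as `table[(x, y)]` then gives the score for aligning residue x with residue y. Scores are stored as floats so that matrices with non-integer values can be read the same way.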

The gap open penalty is the score taken away when a gap is created. Any floating point value between 1.0 and 100.0 is allowed. The default is 10.0.

The gap extension penalty is added to the gap open penalty for each base or residue in the gap, so long gaps are penalized more heavily than short gaps. Usually you will expect a few long gaps rather than many short gaps, so the gap extension penalty should be lower than the gap open penalty; typically it is set 5 to 10 times lower. An exception is when one or both sequences are single reads with possible sequencing errors, in which case you would expect many single-base gaps. You can get this result by setting the gap open penalty to zero (or very low) and using the gap extension penalty to control gap scoring. Any floating-point value between 0.0 and 10.0 is allowed. The default is 0.5.
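To make the effect of the two penalties concrete, the sketch below scores global alignments with separate open and extension penalties using Gotoh's three-table recurrence, a standard way of handling affine gaps in Needleman-Wunsch-style alignment. Following the wording of the paragraph above, it assumes a gap of k positions costs the gap open penalty plus k times the extension penalty (some programs instead charge only the open penalty for the first position). The function names and the simple +5/-4 `score` used in the usage note are hypothetical; the defaults mirror the 10.0 and 0.5 quoted on this page.

```python
from typing import Callable


def gap_cost(length: int, gap_open: float = 10.0, gap_extend: float = 0.5) -> float:
    """Penalty for one gap of `length` positions (assumed: open + length * extend)."""
    return gap_open + length * gap_extend if length > 0 else 0.0


def affine_global_score(
    a: str,
    b: str,
    score: Callable[[str, str], float],
    gap_open: float = 10.0,
    gap_extend: float = 0.5,
) -> float:
    """Best global alignment score with affine gaps (Gotoh's three-state recurrence)."""
    NEG = float("-inf")
    m, n = len(a), len(b)
    first = gap_open + gap_extend  # cost of the first position of a new gap
    # M: a[i-1] paired with b[j-1]; X: a[i-1] against a gap; Y: b[j-1] against a gap
    M = [[NEG] * (n + 1) for _ in range(m + 1)]
    X = [[NEG] * (n + 1) for _ in range(m + 1)]
    Y = [[NEG] * (n + 1) for _ in range(m + 1)]
    M[0][0] = 0.0
    for i in range(1, m + 1):
        X[i][0] = -gap_cost(i, gap_open, gap_extend)  # leading gap in b
    for j in range(1, n + 1):
        Y[0][j] = -gap_cost(j, gap_open, gap_extend)  # leading gap in a
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            M[i][j] = score(a[i - 1], b[j - 1]) + max(
                M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1]
            )
            # Either open a new gap (pay `first`) or extend the current one
            X[i][j] = max(M[i - 1][j] - first, X[i - 1][j] - gap_extend, Y[i - 1][j] - first)
            Y[i][j] = max(M[i][j - 1] - first, Y[i][j - 1] - gap_extend, X[i][j - 1] - first)
    return max(M[m][n], X[m][n], Y[m][n])
```

With these defaults, one 6-position gap costs 10.0 + 6 × 0.5 = 13.0, while six separate 1-position gaps cost 6 × 10.5 = 63.0, which is why these settings favour a few long gaps. For example, with a +5/-4 match/mismatch score, `affine_global_score("AAAGGGTTT", "AAATTT", score)` places all three extra residues in a single gap and scores 6 × 5 − 11.5 = 18.5.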

The memory and time required for the alignment are both proportional to mn, where m and n are the lengths of the two sequences. This means that aligning two 1000-residue sequences takes roughly 100 times longer, and uses roughly 100 times more memory, than aligning two 100-residue sequences.
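The estimate follows from the table the algorithm fills: one score per cell of an (m + 1) × (n + 1) grid. The quick Python calculation below uses a hypothetical `dp_cells` helper and assumes 8 bytes per stored score in a single table; real programs may store more per cell (for example traceback information, or several tables for affine gaps). The extra boundary row and column make the ratio slightly below 100 for these lengths.

```python
def dp_cells(m: int, n: int) -> int:
    """Cells in the (m + 1) x (n + 1) table filled for sequences of lengths m and n."""
    return (m + 1) * (n + 1)


BYTES_PER_CELL = 8  # assumption: one 8-byte float per cell, in a single table

small = dp_cells(100, 100)    # 10,201 cells
large = dp_cells(1000, 1000)  # 1,002,001 cells
print(f"{large / small:.1f}x more cells")                              # 98.2x more cells
print(f"~{large * BYTES_PER_CELL / 1e6:.0f} MB for the larger table")  # ~8 MB
```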
