The Needleman-Wunsch global alignment algorithm finds the optimal alignment
(including gaps) of two sequences over their entire length. The algorithm reads in
a scoring matrix that contains a value for every possible residue or nucleotide
pairing and finds an alignment with the maximum possible score, where the score of
an alignment is the sum of the pair scores taken from the scoring matrix, less any
gap penalties. This works best with closely related sequences. Using this algorithm
to align very distantly related sequences will produce a result, but much of the
alignment may have little or no biological significance.
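The dynamic-programming idea behind the algorithm can be sketched as follows. This is
a minimal illustration, not the program's actual implementation: it assumes a simple
match/mismatch score instead of a full scoring matrix and a single linear gap penalty
instead of separate open and extension penalties. The function name and its default
values are hypothetical.

```python
def needleman_wunsch(a, b, match=1.0, mismatch=-1.0, gap=2.0):
    """Globally align a and b; return (score, aligned_a, aligned_b).

    Assumptions for this sketch: match/mismatch scoring and a linear
    gap penalty (every gap position costs `gap`).
    """
    m, n = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        score[i][0] = score[i - 1][0] - gap
    for j in range(1, n + 1):
        score[0][j] = score[0][j - 1] - gap

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] - gap,       # gap in b
                score[i][j - 1] - gap,       # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    top, bottom = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            pair = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + pair:
                top.append(a[i - 1])
                bottom.append(b[j - 1])
                i -= 1
                j -= 1
                continue
        if j == 0 or (i > 0 and score[i][j] == score[i - 1][j] - gap):
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[m][n], "".join(reversed(top)), "".join(reversed(bottom))


best, row_a, row_b = needleman_wunsch("GATTACA", "GATACA")
print(best)
print(row_a)
print(row_b)
```

With these toy scores, the best alignment of GATTACA and GATACA has six matching
columns and one gap column. Several equally good alignments may exist; the traceback
returns only one of them.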
Enter your e-mail address in the box provided. Then either paste the two sequences
you want to align into the two large text-entry boxes provided, one sequence per box,
or browse your hard disk for the relevant files.
You can choose from a large number of comparison matrices. These determine the
scores of matches and mismatches. The default is the EBLOSUM62 matrix for protein
sequences and the EDNAMAT matrix for nucleotide sequences.
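To illustrate how a comparison matrix is used, the sketch below sums the pair scores
of an existing alignment by looking each column up in a matrix. The matrix here is a
made-up toy for the four nucleotides and is not the contents of EDNAMAT or EBLOSUM62;
the names TOY_DNA and matrix_score are hypothetical, and gap columns are simply
skipped because they are scored by the gap penalties rather than the matrix.

```python
# Toy nucleotide matrix (assumed values): 5 for identical bases, -4 otherwise.
TOY_DNA = {(x, y): (5 if x == y else -4) for x in "ACGT" for y in "ACGT"}


def matrix_score(row_a, row_b, matrix):
    """Sum the matrix scores over all aligned columns that contain no gap."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    total = 0
    for x, y in zip(row_a, row_b):
        if x == "-" or y == "-":
            continue  # gap columns are handled by the gap penalties
        total += matrix[(x, y)]
    return total


print(matrix_score("ACGT", "ACGA", TOY_DNA))        # three matches, one mismatch
print(matrix_score("GATTACA", "GAT-ACA", TOY_DNA))  # six matches, one gap column
```

Choosing a different matrix changes only the lookup table, not the procedure.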
The gap open penalty is the score taken away when a gap is created. Any floating point
value between 1.0 and 100.0 is allowed. The default is 10.0.
The gap extension penalty is added to the standard gap penalty for each base or
residue in the gap. This means that long gaps are penalized more heavily than
short gaps. Usually you will expect a few long gaps rather than many short gaps,
so the gap extension penalty should be lower than the gap open penalty. An exception
is where one or both sequences are single reads with possible sequencing errors,
in which case you would expect many single-base gaps. You can get this result by
setting the gap open penalty to zero (or very low) and using the gap extension
penalty to control gap scoring. Any floating point value between 0.0 and 10.0 is
allowed. The default is 0.5 for any sequence. Typically, the cost of extending a gap
is set to be 5-10 times lower than the cost of opening a gap.
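The effect of the two penalties can be checked with a little arithmetic. The sketch
below follows the description above literally: the open penalty is charged once per
gap and the extension penalty once for each position in the gap. This is an
assumption for illustration; some implementations charge the extension penalty only
for positions after the first. The function name gap_cost is hypothetical.

```python
def gap_cost(length, gap_open=10.0, gap_extend=0.5):
    """Penalty for a single gap spanning `length` positions (assumed convention)."""
    if length < 1:
        raise ValueError("a gap must span at least one position")
    return gap_open + gap_extend * length


one_long_gap = gap_cost(6)         # 10.0 + 0.5 * 6 = 13.0
six_short_gaps = 6 * gap_cost(1)   # 6 * (10.0 + 0.5) = 63.0
print(one_long_gap, six_short_gaps)

# With a zero open penalty, both arrangements cost the same,
# so many single-base gaps are no longer discouraged.
print(gap_cost(6, gap_open=0.0), 6 * gap_cost(1, gap_open=0.0))
```

With the default values, six missing positions cost far less as one long gap than as
six separate gaps, which is why the defaults favor a few long gaps.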
The memory and time required to do the alignment are proportional to (on the order
of) the product mn, where m and n are the lengths of the two sequences. This means
that aligning two 1000-residue sequences takes roughly 100 times longer and uses
roughly 100 times more memory than aligning two 100-residue sequences.
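The scaling can be estimated before submitting a job. The sketch below compares the
product of the sequence lengths for two jobs and gives a rough memory figure. The
value of 8 bytes per matrix cell is an assumption for illustration only, and both
function names are hypothetical.

```python
def relative_cost(m, n, base_m, base_n):
    """How many times more work an m-by-n job is than a base_m-by-base_n job."""
    return (m * n) / (base_m * base_n)


def estimated_bytes(m, n, bytes_per_cell=8):
    """Rough size of one m-by-n score matrix (bytes_per_cell is assumed)."""
    return m * n * bytes_per_cell


print(relative_cost(1000, 1000, 100, 100))  # 100.0
print(estimated_bytes(1000, 1000))          # 8000000
```

Doubling the length of both sequences therefore quadruples the estimated time and
memory.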