One widely used bioinformatic technique is to align several related sequences
to find which residues or bases are conserved between them and which are
variable. Conserved residues may, for example, form an essential part of an
enzyme's active site, whereas variable residues could be part of a 'generic' alpha helix.
Dot matrix plots are a means of comparing two sequences graphically to find
regions of close similarity between them.
JDotter is a Java program that draws dot matrix plots; it runs as a client-server application.
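The idea behind a dot matrix plot can be sketched in a few lines of Python. This is only an illustration of the general technique, not how JDotter itself works: the function names, the `window` and `min_matches` parameters, and the example sequences are all assumptions made for this sketch. A dot is placed at (i, j) when the windows starting at position i of the first sequence and position j of the second share enough identical characters, so similar regions show up as diagonal runs.

```python
def dot_matrix(seq_a, seq_b, window=1, min_matches=1):
    """Return grid[i][j] = True when the windows starting at seq_a[i] and
    seq_b[j] have at least min_matches identical positions.
    window and min_matches are illustrative filter settings."""
    rows = len(seq_a) - window + 1
    cols = len(seq_b) - window + 1
    grid = []
    for i in range(rows):
        row = []
        for j in range(cols):
            matches = sum(seq_a[i + k] == seq_b[j + k] for k in range(window))
            row.append(matches >= min_matches)
        grid.append(row)
    return grid


def render(grid, seq_a, seq_b):
    """Draw the grid as text: seq_b across the top, seq_a down the side."""
    cols = len(grid[0]) if grid else 0
    lines = ["  " + seq_b[:cols]]
    for i, row in enumerate(grid):
        lines.append(seq_a[i] + " " + "".join("*" if hit else "." for hit in row))
    return "\n".join(lines)


# Comparing a sequence with itself gives an unbroken main diagonal.
print(render(dot_matrix("GATTACA", "GATTACA"), "GATTACA", "GATTACA"))
```

Raising `window` and `min_matches` together suppresses the scattered single-character hits that otherwise clutter the plot, at the cost of missing very short matches.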
Dot matrix plots, however, are not very useful for producing an actual
alignment of the two sequences. The ability to calculate a correct alignment
is important to many studies: it can show whether a segment of a gene has been
duplicated, how many point mutations separate the two sequences, and where
the indels (INsertions/DELetions) are located. It can also give you a distance
between species and lend insight into the phylogeny of the sequences
under scrutiny. The most basic algorithm for aligning two sequences was
developed by Needleman and Wunsch.
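A minimal sketch of Needleman-Wunsch global alignment is given here to show how the dynamic programming table is filled and traced back. The scoring values (match +1, mismatch -1, gap -1) and the function name are assumptions chosen for illustration; real tools typically use substitution matrices and more elaborate gap penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align a and b with a simple linear gap penalty.
    Returns (score, aligned_a, aligned_b). Scoring values are illustrative."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    # Aligning a prefix against nothing costs one gap per character.
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                              score[i - 1][j] + gap,      # gap in b
                              score[i][j - 1] + gap)      # gap in a
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


total, top, bottom = needleman_wunsch("GATTACA", "GATACA")
print(total)
print(top)
print(bottom)
```

When several alignments tie for the best score, this traceback returns just one of them, preferring a diagonal step where possible.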
A variation of the Needleman-Wunsch algorithm, developed by Smith and Waterman, finds the local
(subsequence) alignment between two given sequences that gives the maximum degree of
similarity.
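The difference from global alignment can be seen in a short sketch of Smith-Waterman: cell scores are never allowed to drop below zero, and the traceback starts at the highest-scoring cell and stops at the first zero. As before, the scoring values (match +2, mismatch -1, gap -1), the function name, and the example sequences are assumptions for illustration only.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Find one best-scoring local alignment of a and b.
    Returns (score, aligned_a, aligned_b). Scoring values are illustrative."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # The extra 0 lets an alignment restart anywhere.
            score[i][j] = max(0,
                              score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)
    # Trace back from the best cell until the score falls to zero.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


# The shared core is found even though the flanking regions do not match.
print(smith_waterman("CCCGATTACACCC", "TTGATTACATT"))
```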
Other two-sequence alignment programs include:
amino acid vs amino acid two-sequence alignment. SIM finds a user-defined number of best non-intersecting alignments between two protein sequences or within a sequence.
nucleotide vs nucleotide two-sequence alignment. Like SIM, this allows you to see not just the best alignment between two nucleotide sequences but all non-overlapping alignments.
genomic sequence vs cDNA/EST two sequence alignment.
The usefulness of pairwise alignments becomes limited when comparing
highly similar sequences. In these cases, to extract meaningful information,
it is often more helpful to compare several proteins from the same family,
including those from distantly related organisms, such as human and yeast.
One would expect very little sequence similarity to arise by chance between proteins from distantly related species,
so any identical or conserved amino acids could indicate functional or structural importance.
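Once a multiple alignment exists, picking out fully conserved positions is a simple column scan. The sketch below assumes the sequences have already been aligned by some other tool (equal length, with '-' marking gaps); the function name and the three short example sequences are made up for illustration.

```python
def conserved_columns(aligned):
    """Return (column_index, residue) for every column in which all
    sequences carry the same residue and none has a gap.
    Expects pre-aligned sequences of equal length with '-' for gaps."""
    if len({len(seq) for seq in aligned}) > 1:
        raise ValueError("aligned sequences must all have the same length")
    conserved = []
    for col, residues in enumerate(zip(*aligned)):
        first = residues[0]
        if first != "-" and all(r == first for r in residues):
            conserved.append((col, first))
    return conserved


# Hypothetical aligned protein fragments.
alignment = ["MKV-LHG",
             "MRVALHG",
             "MKI-LHA"]
print(conserved_columns(alignment))  # [(0, 'M'), (4, 'L'), (5, 'H')]
```

This counts only strict identity; treating chemically similar residues as conserved would need a substitution matrix or residue grouping, which is left out here.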
Sample protein sequence file.
Multiple alignments can also be used to discover SNPs, by taking a set of sequences from the
same region (of the same species). These sequences should all be identical, with the exception
of any polymorphisms. Alignments can help you to quickly find these polymorphic regions.
HaploSNPer uses CAP3 or PHRAP to align the sequences together, and then uses
user-defined parameters to discover whether there are any SNPs present.
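The general idea of scanning an alignment for polymorphic positions can be sketched as follows. This is not HaploSNPer's actual method: the function name, the `min_minor_count` threshold (standing in for a user-defined parameter), and the example reads are all assumptions. A column is reported only when its second most common base is seen often enough, which helps separate likely polymorphisms from isolated sequencing errors.

```python
from collections import Counter


def candidate_snps(aligned_reads, min_minor_count=2):
    """Return (position, base_counts) for alignment columns that look polymorphic.
    A column qualifies when its second most common base occurs at least
    min_minor_count times (an illustrative, user-adjustable threshold)."""
    snps = []
    for pos, column in enumerate(zip(*aligned_reads)):
        # Ignore gaps and ambiguity codes; count only unambiguous bases.
        counts = Counter(base for base in column if base in "ACGT")
        if len(counts) < 2:
            continue
        ranked = counts.most_common()
        if ranked[1][1] >= min_minor_count:
            snps.append((pos, dict(counts)))
    return snps


# Hypothetical aligned reads from the same region.
reads = ["ACGTAC",
         "ACGTAC",
         "ACATAC",
         "ACATAC",
         "ACGTTC"]
print(candidate_snps(reads))
```

With these reads, position 2 (G in three reads, A in two) is reported, while the single T at position 4 falls below the threshold and is treated as probable noise.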
Sample DNA sequence file
Multiple sequence alignment may also be a vital prerequisite to trying to
determine the phylogenetic relationships among a group of related sequences,
and by extrapolation between the species or varieties that contain those
sequences. More on this in the next tutorial about phylogenetic reconstruction.