Alignments

One widely used bioinformatic technique is to align several related sequences in order to find which residues or bases are conserved between them and which are variable. Conserved residues may, for example, form an essential part of the active site of an enzyme, whereas variable residues could be part of a 'generic' alpha helix.

[ dot matrix ]

Dot matrix plots are a means of comparing two sequences graphically to find regions of close similarity between them.
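As a rough illustration of the idea, the sketch below builds a small dot matrix in Python and prints it as text. The function names `dot_matrix` and `render`, and the `window` and `min_matches` parameters, are assumptions made for this example; they are not taken from JDotter or any other program mentioned on this page.

```python
def dot_matrix(seq_a, seq_b, window=1, min_matches=1):
    """Return a grid where grid[i][j] is True when the windows starting at
    seq_a[i] and seq_b[j] share at least min_matches identical positions."""
    rows = len(seq_a) - window + 1
    cols = len(seq_b) - window + 1
    grid = []
    for i in range(rows):
        row = []
        for j in range(cols):
            # Count identical characters inside the two windows.
            matches = sum(seq_a[i + k] == seq_b[j + k] for k in range(window))
            row.append(matches >= min_matches)
        grid.append(row)
    return grid


def render(grid, seq_a, seq_b):
    """Draw the grid as text: seq_b across the top, seq_a down the side."""
    width = len(grid[0]) if grid else 0
    lines = ["  " + seq_b[:width]]
    for i, row in enumerate(grid):
        lines.append(seq_a[i] + " " + "".join("*" if hit else "." for hit in row))
    return "\n".join(lines)


if __name__ == "__main__":
    # Toy sequences chosen for illustration only.
    print(render(dot_matrix("GATTACA", "GATCACA"), "GATTACA", "GATCACA"))
```

Regions of similarity show up as diagonal runs of `*`. Raising `window` and `min_matches` filters out isolated chance matches, which is usually needed for nucleotide sequences.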

[ jdotter ]

JDotter is a Java program for drawing dot matrix plots. It runs as a client-server application.

[ needleman wunsch ]

Dot matrix plots, however, do not give you an actual alignment of the two sequences. A correct alignment matters for many studies: it can show whether a segment of a gene has been duplicated, how many point mutations separate the two sequences, and where the indels (INsertions/DELetions) lie. It can also give you a distance between species and lend insight into the phylogeny of the sequences under scrutiny. The most basic algorithm for aligning two sequences over their full length (a global alignment) was developed by Needleman and Wunsch.
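A minimal sketch of the Needleman-Wunsch dynamic programming approach is shown here in Python. The scoring values (`match=1`, `mismatch=-1`, `gap=-1`) and the function names are assumptions chosen for illustration; real tools typically use substitution matrices and more elaborate gap penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of a and b with a simple linear gap penalty.
    Returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    # First row and column: aligning a prefix against nothing costs gaps.
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                              score[i - 1][j] + gap,      # gap in b
                              score[i][j - 1] + gap)      # gap in a
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def count_differences(aligned_a, aligned_b):
    """Count mismatched columns and gapped columns in a pairwise alignment."""
    pairs = list(zip(aligned_a, aligned_b))
    gaps = sum("-" in pair for pair in pairs)
    mismatches = sum(x != y and "-" not in (x, y) for x, y in pairs)
    return mismatches, gaps


if __name__ == "__main__":
    total, top, bottom = needleman_wunsch("ACGT", "AGT")
    print(total, top, bottom, count_differences(top, bottom))
```

The `count_differences` helper shows how an alignment, once computed, yields the point mutation and indel counts mentioned above. When several alignments share the best score, this traceback returns only one of them.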

[ smith waterman ]

A variation of the Needleman-Wunsch algorithm, developed by Smith and Waterman, finds the local alignment: the pair of subsequences, one from each of the two given sequences, that gives the maximum degree of similarity.
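The local variant can be sketched in Python along the same lines. The scoring values (`match=2`, `mismatch=-1`, `gap=-1`) and the function name are assumptions for this example only. The key differences from a global alignment are that cell scores never drop below zero and that the traceback starts at the highest-scoring cell and stops at the first zero.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Local alignment of a and b with a simple linear gap penalty.
    Returns (score, aligned_sub_a, aligned_sub_b)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # The 0 lets an alignment restart instead of carrying a penalty.
            score[i][j] = max(0,
                              score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)
    # Trace back from the best cell until a zero-scoring cell is reached.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    # Toy sequences: a shared core flanked by unrelated bases.
    print(smith_waterman("TTTGATTACATTT", "CCCGATTACACCC"))
```

In the toy example the unrelated flanking bases are ignored and only the shared core is reported, which is the behaviour that distinguishes a local from a global alignment.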

Other two-sequence alignment programs include:
[ SIM ]

Amino acid vs amino acid two-sequence alignment. SIM finds a user-defined number of the best non-intersecting alignments between two protein sequences, or within a single sequence.

[ LFasta ]

Nucleotide vs nucleotide two-sequence alignment. Like SIM, LFasta lets you see not just the best alignment between two nucleotide sequences but all non-overlapping alignments.

[ MGAlign ]

Genomic sequence vs cDNA/EST two-sequence alignment.

[ clustalw ]

Pairwise alignments become less informative when the sequences being compared are highly similar. In these cases, to extract meaningful information, it is often more helpful to compare several proteins from the same family, including those from distantly related organisms such as human and yeast. One would expect very little chance sequence similarity between proteins from distantly related species, so any identical or conserved amino acids could indicate functional or structural importance.
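Once a multiple alignment is available, picking out the fully conserved positions is simple. The Python sketch below assumes the sequences have already been aligned by some other program (it does not perform the alignment itself), that gaps are written as `-`, and that the short sequences used are invented toy data rather than real proteins. The function name `conserved_columns` is hypothetical.

```python
def conserved_columns(aligned):
    """Return the 0-based column indices at which every aligned sequence
    carries the same residue and none has a gap."""
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")
    conserved = []
    for col in range(length):
        residues = {seq[col] for seq in aligned}
        # One distinct symbol, and it is not the gap character.
        if len(residues) == 1 and "-" not in residues:
            conserved.append(col)
    return conserved


if __name__ == "__main__":
    # Invented toy alignment, for illustration only.
    toy = ["MKV-LA", "MRVELA", "MKVDLA"]
    print(conserved_columns(toy))
```

This only detects strict identity. Treating chemically similar amino acids as conserved would need a substitution matrix or residue grouping, which is left out of this sketch.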

[ SNPServer ]

Multiple alignments can also be used to discover SNPs, by taking a set of sequences that cover the same region in the same species. These sequences should all be identical except at any polymorphisms, and an alignment helps you find those polymorphic positions quickly. SNPServer uses CAP3 to align the sequences, and then applies user-defined parameters to decide whether any SNPs are present.
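The general idea of scanning an alignment for polymorphic columns can be sketched in Python as follows. This is not SNPServer's actual method: the function name `snp_candidates`, the `min_minor_count` threshold, and the toy reads are all assumptions made for illustration, and the input is taken to be already aligned (for example by an assembler), with `-` marking gaps.

```python
from collections import Counter


def snp_candidates(aligned, min_minor_count=2):
    """Return (column, base_counts) for each column where a second base
    is seen at least min_minor_count times among the aligned sequences."""
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")
    candidates = []
    for col in range(length):
        # Ignore gaps and ambiguity codes; count only A, C, G and T.
        counts = Counter(seq[col] for seq in aligned if seq[col] in "ACGT")
        if len(counts) < 2:
            continue
        minor_count = counts.most_common()[1][1]  # second most frequent base
        if minor_count >= min_minor_count:
            candidates.append((col, dict(counts)))
    return candidates


if __name__ == "__main__":
    # Invented toy reads covering the same region.
    reads = ["ACGTAC", "ACGTAC", "ACTTAC", "ACTTAC", "ACGTAA"]
    print(snp_candidates(reads))
```

Requiring the minor base to appear more than once is one simple way to separate likely polymorphisms from isolated sequencing errors. In the toy reads, the single `A` in the last column is ignored at the default threshold.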

Multiple sequence alignment is also often a vital prerequisite for determining the phylogenetic relationships among a group of related sequences and, by extrapolation, among the species or varieties that contain those sequences. More on this in the next tutorial, on phylogenetic reconstruction.
