Alignments

A widely used bioinformatic technique is to align several related sequences to find which residues or bases are conserved between them and which are variable. Conserved residues may, for example, form an essential part of the active site of an enzyme, while variable residues could be part of a 'generic' alpha helix.

[ dot matrix ]

Dot matrix plots are a means of comparing two sequences graphically to find regions of close similarity between them.
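As a rough illustration of the idea, the sketch below builds a dot matrix in plain Python. It is not how JDotter or any particular tool works: the function names `dot_matrix` and `render`, and the `window` and `min_matches` parameters, are assumptions made for this example. A cell is marked when a short window starting at that pair of positions contains enough identical characters, so regions of similarity show up as diagonal runs.

```python
def dot_matrix(seq_a, seq_b, window=3, min_matches=3):
    """Return a grid of booleans for two sequences.

    Cell (i, j) is True when the windows starting at seq_a[i] and
    seq_b[j] share at least `min_matches` identical positions.
    `window` and `min_matches` are illustrative parameters only.
    """
    rows = len(seq_a) - window + 1
    cols = len(seq_b) - window + 1
    grid = []
    for i in range(rows):
        row = []
        for j in range(cols):
            # Count identical characters inside the two windows.
            matches = sum(
                1 for k in range(window) if seq_a[i + k] == seq_b[j + k]
            )
            row.append(matches >= min_matches)
        grid.append(row)
    return grid


def render(grid):
    """Draw the grid as text: '*' for a dot, '.' for an empty cell."""
    return "\n".join(
        "".join("*" if cell else "." for cell in row) for row in grid
    )
```

Printing `render(dot_matrix(a, b))` for two related sequences `a` and `b` should show the shared regions as diagonal lines of `*`. A larger `window` with a slightly lower `min_matches` is one common way to suppress background noise, at the cost of missing very short matches.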

[ jdotter ]

JDotter is a Java program for drawing dot matrix plots. It runs as a client-server application.

[ needleman wunsch ]

Dot matrix plots, however, do not give you an actual alignment of the two sequences. A correct alignment matters in many studies: it can show whether a segment of a gene has been duplicated, how many point mutations separate the two sequences, and where the indels (INsertions/DELetions) lie. It can also give you a distance between species and lend an insight into the phylogeny of the sequences under scrutiny. The most basic algorithm for aligning two sequences along their full length (a global alignment) was developed by Needleman and Wunsch.
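A minimal sketch of Needleman-Wunsch global alignment follows, to make the dynamic programming idea concrete. The scoring values (`match=1`, `mismatch=-1`, `gap=-1`) and the linear gap penalty are assumptions chosen for simplicity; real tools typically use substitution matrices and separate gap-opening and gap-extension penalties. The function name `needleman_wunsch` is likewise just for this example.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align strings a and b.

    Returns (score, aligned_a, aligned_b). Uses a simple linear gap
    penalty; all scoring values are illustrative assumptions.
    """
    n, m = len(a), len(b)

    def sub(i, j):
        # Substitution score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align two characters
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))
```

Calling `needleman_wunsch("ACGT", "ACT")` returns the total score together with the two gapped strings; the `-` characters in the output mark the indels. When several alignments share the best score, this traceback returns only one of them.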

[ smith waterman ]

A variation of the Needleman-Wunsch algorithm, developed by Smith and Waterman, finds the pair of subsequences, one from each of the two given sequences, whose alignment gives the maximum degree of similarity (a local alignment).
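The difference from a global alignment can be sketched in a few lines: cell scores are never allowed to drop below zero, and the traceback starts at the highest-scoring cell and stops at the first zero. The scoring values below (`match=2`, `mismatch=-1`, `gap=-2`) and the function name `smith_waterman` are assumptions for illustration, not the parameters of any particular program.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Find one best local alignment between strings a and b.

    Returns (score, aligned_a, aligned_b). Linear gap penalty and
    scoring values are illustrative assumptions.
    """
    n, m = len(a), len(b)

    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    score = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                0,                                # start a new alignment here
                score[i - 1][j - 1] + sub(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until the score falls to zero.
    top, bottom = [], []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        if score[i][j] == score[i - 1][j - 1] + sub(i, j):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))
```

With two sequences that share only a short core, such as `"TTTACGTAAA"` and `"GGGACGTCCC"`, the function reports just the shared `ACGT` segment and ignores the unrelated flanks, which a global alignment would be forced to include.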

Other two-sequence alignment programs include:

[ SIM ]

Amino acid vs amino acid two-sequence alignment. SIM finds a user-defined number of the best non-intersecting alignments between two protein sequences or within a single sequence.

[ LFasta ]

Nucleotide vs nucleotide two-sequence alignment. Like SIM, this shows you not just the best alignment between two nucleotide sequences but all non-overlapping alignments.

[ MGAlign ]

Genomic sequence vs cDNA/EST two-sequence alignment.

[ clustalw ]

Pairwise alignments become less useful when the sequences being compared are highly similar, because almost every position matches. In these cases, to extract meaningful information, it is often more helpful to compare several proteins from the same family, including those from distantly related organisms such as human and yeast. One would expect very little chance sequence similarity between proteins from distantly related species, so any identical or conserved amino acids could indicate functional or structural importance.
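Once a multiple alignment exists, spotting identical columns is straightforward. The sketch below assumes the sequences have already been aligned (for example by ClustalW) and padded with `-` to equal length; it does not perform the alignment itself. The function name `conservation_line` and the toy sequences in the usage note are made up for illustration. It marks fully identical columns with `*`, in the spirit of the consensus line shown under ClustalW output.

```python
def conservation_line(aligned):
    """Mark fully conserved columns of a multiple alignment.

    `aligned` is a list of equal-length strings in which '-' is a gap.
    Returns a string with '*' under columns where every sequence has
    the same residue and no gaps, and ' ' elsewhere.
    """
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    marks = []
    for column in zip(*aligned):
        # A column is conserved if it holds one distinct residue and no gap.
        conserved = "-" not in column and len(set(column)) == 1
        marks.append("*" if conserved else " ")
    return "".join(marks)
```

For the toy alignment `["MKV-LA", "MRVGLA", "MKVALS"]`, the first, third and fifth columns are identical across all three sequences and would be starred. Real tools also distinguish conservative substitutions from identities, which this sketch does not attempt.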

[ HaploSNPer ]

Multiple alignments can also be used to discover SNPs, by taking a set of sequences from the same region of the same species. These sequences should all be identical, with the exception of any polymorphisms, and an alignment helps you find the polymorphic positions quickly. HaploSNPer uses CAP3 or PHRAP to align the sequences together, and then applies user-defined parameters to decide whether any SNPs are present.
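The principle of scanning an alignment for polymorphic columns can be sketched as follows. This is not HaploSNPer's algorithm: the function name `candidate_snps` and the single `min_minor_count` threshold are assumptions standing in for the kind of user-defined parameter mentioned above, and the reads are assumed to be already aligned to equal length. Requiring each allele to be seen more than once is one simple way to avoid calling a lone sequencing error a SNP.

```python
from collections import Counter


def candidate_snps(aligned_reads, min_minor_count=2):
    """List alignment columns that look polymorphic.

    `aligned_reads` is a list of equal-length strings over A, C, G, T
    (other characters such as '-' or 'N' are ignored). A column is
    reported when at least two different bases each occur at least
    `min_minor_count` times. Returns a list of (position, counts),
    with 0-based positions.
    """
    snps = []
    for pos, column in enumerate(zip(*aligned_reads)):
        counts = Counter(base for base in column if base in "ACGT")
        # Keep only alleles with enough supporting reads.
        supported = {b: c for b, c in counts.items() if c >= min_minor_count}
        if len(supported) >= 2:
            snps.append((pos, dict(sorted(supported.items()))))
    return snps
```

Given five reads where two carry a `T` at a position that is `G` in the other three, that column is reported, while a position where only one read differs is skipped under the default threshold.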

Multiple sequence alignment may also be a vital prerequisite for determining the phylogenetic relationships among a group of related sequences and, by extrapolation, among the species or varieties that contain those sequences. More on this in the next tutorial about phylogenetic reconstruction.
