Dot-matrix plots

Dot-matrix plots are a way of visualizing the similarities and differences between two sequences. One sequence is written along the top of a matrix and the other down its side. Wherever the residue at a position in one sequence is identical to the residue at a position in the other, a dot is placed at the intersection of that row and column. A diagonal run of dots therefore marks a region of similarity between the two sequences. On a typical unfiltered dot plot, however, the background noise is so heavy that the plot becomes very difficult to interpret. This is especially true of DNA, where with only four bases roughly one position pair in four matches by chance. It is therefore usual to apply a filter to reduce this noise. One such filter places a dot only when a specified proportion of a small window of successive residues match, for example at least 6 out of 10. Another filter weights each pair of residues according to their chemical similarity instead of scoring only exact matches.
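The Python sketch below shows both the raw plot and the windowed filter. It is an illustration under stated assumptions, not the implementation used by any particular program. The function names are hypothetical. The filtered dot is placed at the start of each window, although programs differ on this. The default `identity_score` counts only exact matches. Passing a different `score` function, for example one built from a substitution matrix, gives the weighted variant.

```python
from typing import Callable, Set, Tuple

Dot = Tuple[int, int]  # (row index in seq_a, column index in seq_b)


def identity_score(a: str, b: str) -> float:
    """Score 1 for an exact match, 0 otherwise."""
    return 1 if a == b else 0


def dot_plot(seq_a: str, seq_b: str) -> Set[Dot]:
    """Unfiltered plot: a dot wherever residue i of seq_a equals residue j of seq_b."""
    return {(i, j)
            for i, a in enumerate(seq_a)
            for j, b in enumerate(seq_b)
            if a == b}


def filtered_dot_plot(seq_a: str, seq_b: str, window: int = 10,
                      threshold: float = 6,
                      score: Callable[[str, str], float] = identity_score) -> Set[Dot]:
    """Windowed filter (a sketch): slide a window of `window` residue pairs
    along every diagonal and keep a dot at the window's start (i, j) when the
    summed pair scores reach `threshold`. With identity_score this means
    'at least 6 of 10 positions match'; a similarity-based score gives the
    weighted variant."""
    dots: Set[Dot] = set()
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            total = sum(score(seq_a[i + k], seq_b[j + k]) for k in range(window))
            if total >= threshold:
                dots.add((i, j))
    return dots


def render(seq_a: str, seq_b: str, dots: Set[Dot]) -> str:
    """Text rendering: seq_a runs down the rows, seq_b across the columns."""
    return "\n".join(
        "".join("*" if (i, j) in dots else "." for j in range(len(seq_b)))
        for i in range(len(seq_a)))
```

For example, `print(render(seq_a, seq_b, filtered_dot_plot(seq_a, seq_b)))` prints the filtered plot as rows of `*` and `.` characters. Checking every window on every diagonal takes time proportional to the product of the two sequence lengths times the window size. That is acceptable for teaching-sized sequences but slow for long ones.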

A dot-matrix plot can also be generated by looking only for exactly matching blocks of residues. In this approach, both sequences are divided into words (or tuples) of a user-specified length. Each array of words is then sorted alphabetically, with every word keeping a record of its position in the sequence. Because identical words end up at corresponding places in the two sorted arrays, stepping through both arrays together immediately gives the positions of all the words the two sequences share.
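A minimal Python sketch of the word-based method follows. It assumes overlapping words, one starting at every position, which is an assumption rather than something stated above. The function names are hypothetical, and this is not Dottup's source code. After sorting, a single merge-style pass over the two lists finds every shared word. Each run of identical words is paired off in full, so a repeated word yields all of its position pairs.

```python
from typing import List, Tuple


def sorted_words(seq: str, k: int) -> List[Tuple[str, int]]:
    """Split seq into overlapping words of length k and sort them
    alphabetically, keeping each word's start position."""
    return sorted((seq[i:i + k], i) for i in range(len(seq) - k + 1))


def word_matches(seq_a: str, seq_b: str, k: int) -> List[Tuple[int, int]]:
    """Walk the two sorted word lists in step and report every
    (pos_a, pos_b) pair whose k-words are identical."""
    words_a = sorted_words(seq_a, k)
    words_b = sorted_words(seq_b, k)
    hits: List[Tuple[int, int]] = []
    ia = ib = 0
    while ia < len(words_a) and ib < len(words_b):
        wa, wb = words_a[ia][0], words_b[ib][0]
        if wa < wb:
            ia += 1  # word only in seq_a so far; advance A
        elif wa > wb:
            ib += 1  # word only in seq_b so far; advance B
        else:
            # Find the run of this word in each list, then pair every member.
            ja = ia
            while ja < len(words_a) and words_a[ja][0] == wa:
                ja += 1
            jb = ib
            while jb < len(words_b) and words_b[jb][0] == wb:
                jb += 1
            hits.extend((pa, pb)
                        for _, pa in words_a[ia:ja]
                        for _, pb in words_b[ib:jb])
            ia, ib = ja, jb
    return sorted(hits)
```

Each returned `(pos_a, pos_b)` pair marks the start of a diagonal segment of length `k`, so drawing those segments produces the dot plot.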

The EMBOSS Dottup server uses this second, word-based method. For each of the two sequences, either enter a sequence filename in one of the two smaller text-entry boxes, or type or paste the sequence itself into one of the two larger text-entry boxes. The sequences can be in any standard format and can be either DNA or protein, but both must be of the same type.

The main option here is the word size, or tuple value. It accepts any integer greater than 2 (default = 4). A longer word size produces less random noise and runs more quickly, but is less sensitive: a similar region appears on the plot only if it contains at least one exact run of that many identical residues.
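To see why longer words reduce noise, assume two unrelated sequences whose residues are drawn independently and uniformly from an alphabet of size A (4 for DNA). Any given pair of k-words is then identical with probability 1/A^k. The expected number of chance hits for sequences of lengths m and n is therefore about (m - k + 1)(n - k + 1)/A^k. The hypothetical helper below evaluates this estimate. Real sequences are not uniformly random, so treat its numbers as a rough guide only.

```python
def expected_random_hits(len_a: int, len_b: int, k: int,
                         alphabet_size: int = 4) -> float:
    """Expected number of chance k-word matches between two unrelated
    sequences, assuming independent, uniformly distributed residues."""
    words_a = max(len_a - k + 1, 0)  # number of overlapping words in sequence A
    words_b = max(len_b - k + 1, 0)  # number of overlapping words in sequence B
    return words_a * words_b / alphabet_size ** k


if __name__ == "__main__":
    # Two 1000-base DNA sequences: each extra base in the word divides the
    # expected number of chance hits by roughly 4.
    for k in (2, 4, 6, 8, 10):
        print(k, round(expected_random_hits(1000, 1000, k), 1))
```

For proteins, pass `alphabet_size=20`. Noise then falls much faster with word size, which is one reason shorter words are practical for protein comparisons.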

The graph pull-down menu lists the devices (output formats) in which the resulting plot can be produced; the plot is not displayed directly on screen. The most convenient format is PostScript, which can be viewed with GSView on a PC or Adobe Photoshop on a Mac, or sent directly to a PostScript-compatible printer.
