Dot matrix plots are a means of visualizing the similarities and differences
between two sequences. The sequences to be compared are arranged along
the margins of a matrix. Wherever a residue in one sequence is identical to
a residue in the other, a dot is placed at their intersection in the matrix.
A diagonal stretch of points therefore indicates a region of similarity
between the two sequences. However, the background noise on a typical
dot plot, especially a DNA dot plot, is often so high that the plot becomes
very difficult to interpret. It is therefore usual to apply a filter
to reduce this noise. One such filtering mechanism
is to place a dot only when a specified proportion of a small group of
successive bases match, say 6 out of every 10. Another way
of filtering out the noise is to weight each pair of residues according to
their chemical similarity.
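The windowed filter described above can be sketched in Python as follows. This is only an illustration and not the code of any EMBOSS program; the function name window_dot_plot and its parameters window and min_matches are assumptions made for this example. With window=1 and min_matches=1 it reduces to the unfiltered plot.

```python
def window_dot_plot(seq_a, seq_b, window=10, min_matches=6):
    """Return the set of (i, j) positions at which a dot would be placed.

    Hypothetical helper: a dot is placed at (i, j) when at least
    `min_matches` of the `window` residue pairs seq_a[i + k], seq_b[j + k]
    are identical.
    """
    dots = set()
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            # Count identical residues along the diagonal starting at (i, j).
            matches = sum(
                1 for k in range(window) if seq_a[i + k] == seq_b[j + k]
            )
            if matches >= min_matches:
                dots.add((i, j))
    return dots


# Unfiltered plot: every single identical residue pair gets a dot.
unfiltered = window_dot_plot("AC", "CA", window=1, min_matches=1)

# Filtered plot: 6 out of 10, as in the text above.
filtered = window_dot_plot("ACGTACGTTGCA", "ACGTACGTTGCA")
```

This direct version compares every window against every other window, so it is slow on long sequences; it is meant to show the rule, not to be efficient.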
Another way in which dot matrix plots can be generated is by looking only
for exactly matching blocks. This algorithm subdivides both sequences
into words (or tuples) of a user-specified length. The resulting arrays of
words are then sorted alphabetically (each word keeping its location). Then,
by comparing the sorted array from one sequence with that from
the other sequence, you can immediately get the respective locations
of all the identical words.
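The sort-and-compare steps above can be sketched in Python as follows. This is a minimal illustration, not the Dottup source code; the function names sorted_words and word_matches are made up for this example, and the use of overlapping words (one starting at every position) is an assumption, since the text does not say how the sequences are subdivided.

```python
def sorted_words(seq, word_size):
    """Return (word, start position) pairs, sorted alphabetically by word."""
    return sorted(
        (seq[i:i + word_size], i) for i in range(len(seq) - word_size + 1)
    )


def word_matches(seq_a, seq_b, word_size=4):
    """Return sorted (position in seq_a, position in seq_b) pairs of identical words."""
    words_a = sorted_words(seq_a, word_size)
    words_b = sorted_words(seq_b, word_size)
    hits = []
    a = b = 0
    # Walk both sorted arrays in step, as in a merge.
    while a < len(words_a) and b < len(words_b):
        word_a, word_b = words_a[a][0], words_b[b][0]
        if word_a < word_b:
            a += 1
        elif word_a > word_b:
            b += 1
        else:
            # The same word may occur several times in each sequence:
            # find the end of its run in both arrays.
            a_end = a
            while a_end < len(words_a) and words_a[a_end][0] == word_a:
                a_end += 1
            b_end = b
            while b_end < len(words_b) and words_b[b_end][0] == word_a:
                b_end += 1
            # Every occurrence in one sequence pairs with every occurrence
            # in the other.
            for _, pos_a in words_a[a:a_end]:
                for _, pos_b in words_b[b:b_end]:
                    hits.append((pos_a, pos_b))
            a, b = a_end, b_end
    return sorted(hits)


hits = word_matches("ACGTAC", "TACGT", word_size=4)
```

Here the only shared word of length 4 is ACGT, which starts at position 0 in the first sequence and position 1 in the second (counting from 0).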
The EMBOSS Dottup server uses the second method of generating dot matrix
plots. Either enter a sequence filename in each of the two smaller text-entry
boxes, or type (or cut and paste) a sequence into each of the two larger
text-entry boxes. The sequences can be in any standard format, and can
be either DNA or protein, but both sequences must be of the same type.
The main option here is the word size, or tuple value. It accepts
any integer greater than 2 (default = 4). A longer word size produces
less random noise and runs more quickly, but is less sensitive.
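The effect of word size on random noise can be illustrated with a short Python sketch. It counts the dots produced for two unrelated random DNA sequences at several word sizes. This is an illustration only: the function name count_word_hits is made up, the sequences are artificial, and a dictionary lookup is used here in place of the sorting step purely to keep the example short. It does not measure running time.

```python
import random
from collections import defaultdict


def count_word_hits(seq_a, seq_b, word_size):
    """Count the (i, j) pairs at which both sequences start the same word."""
    # Index every word of seq_b by its start positions.
    index = defaultdict(list)
    for j in range(len(seq_b) - word_size + 1):
        index[seq_b[j:j + word_size]].append(j)
    # For each word of seq_a, add the number of places it occurs in seq_b.
    return sum(
        len(index.get(seq_a[i:i + word_size], ()))
        for i in range(len(seq_a) - word_size + 1)
    )


# Two unrelated random DNA sequences, so every dot is background noise.
rng = random.Random(0)
seq_a = "".join(rng.choice("ACGT") for _ in range(500))
seq_b = "".join(rng.choice("ACGT") for _ in range(500))

hits_by_size = {k: count_word_hits(seq_a, seq_b, k) for k in (3, 4, 6, 8)}
for k, hits in hits_by_size.items():
    print(f"word size {k}: {hits} dots")
```

The number of dots can never increase as the word size grows, because any two positions that share a word of a given length also share the shorter word starting at the same positions. The same reasoning explains the loss of sensitivity: a genuine region of similarity that is interrupted by mismatches may contain no exactly matching word of the longer length.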
The graph pull-down menu gives a list of devices to which the resulting
plot can be sent. The plot itself will not be displayed on screen. The most
convenient graph format is PostScript, which can be viewed with GSView on a
PC or with Adobe Photoshop on a Mac, or can be printed directly on a
PostScript-compatible printer.