We will search the Swissprot database for homologs of the Felis catus
cannabinoid receptor (AAG37779).
Go to the NCBI BLAST home page and click on the protein blast link.
In the large text-entry box at the top of the page, we can either enter
the sequence accession code (AAG37779) or paste the whole sequence,
which can be retrieved from the Protein division of Entrez.
Choose the Swissprot database for the search.
Since we are doing a simple BlastP search, we will not change the algorithm.
The Algorithm parameters section offers many other options
(such as matrices, word sizes and e-values), but for an initial search
it is perfectly fine to use the default settings.
One useful option, however, is the 'Low complexity regions' filter. Checking it
avoids matches that are statistically significant but biologically irrelevant.
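To give a feel for what a low-complexity filter does, here is a minimal sketch that lowercases windows of a protein sequence whose residue composition has low Shannon entropy. It is not the filter that BLAST itself applies (NCBI uses the SEG program for protein queries); the function names, the window size and the threshold are illustrative assumptions.

```python
import math
from collections import Counter


def window_entropy(window: str) -> float:
    """Shannon entropy (in bits) of the residue composition of a window."""
    counts = Counter(window)
    total = len(window)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def mask_low_complexity(seq: str, window: int = 12, threshold: float = 2.2) -> str:
    """Lowercase every residue covered by a window whose entropy is below threshold.

    Hypothetical helper: window and threshold are illustrative values,
    not the settings used by the NCBI server.
    """
    seq = seq.upper()
    masked = [False] * len(seq)
    for start in range(len(seq) - window + 1):
        if window_entropy(seq[start:start + window]) < threshold:
            for i in range(start, start + window):
                masked[i] = True
    return "".join(ch.lower() if m else ch for ch, m in zip(seq, masked))


# Made-up sequence: two diverse flanks around a poly-glutamine run.
example = "ACDEFGHIKLMNPQRSTVWY" + "Q" * 15 + "ACDEFGHIKLMNPQRSTVWY"
print(mask_low_complexity(example))
```

The poly-Q run and a few residues on either side come out in lowercase, which mirrors the way masked regions of the query are displayed in the BLAST alignments.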
Click on 'Blast!' to start the BLAST search.
As you change options in the main body of the page, the changes are reflected
in the search summary beside this button,
so that you can confirm that you are running the intended search.
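The same search can in principle be described as a set of request parameters for the NCBI BLAST URL API instead of a filled-in web form. The sketch below only builds and prints the request URL; it does not contact the server. The parameter names (CMD, PROGRAM, DATABASE, QUERY, FILTER) are quoted from memory of the URL API documentation and should be checked against it before use, and build_submit_params is a hypothetical helper.

```python
from urllib.parse import urlencode

# Endpoint of the NCBI BLAST URL API (assumed; verify against the NCBI docs).
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def build_submit_params(query: str,
                        database: str = "swissprot",
                        program: str = "blastp",
                        low_complexity_filter: bool = True) -> dict:
    """Collect the form choices made in the tutorial as URL API parameters."""
    params = {
        "CMD": "Put",          # submit a new search
        "PROGRAM": program,    # simple BlastP search, default algorithm
        "DATABASE": database,  # the Swissprot database
        "QUERY": query,        # accession code or raw sequence
    }
    if low_complexity_filter:
        params["FILTER"] = "L"  # assumed value for low-complexity filtering
    return params


params = build_submit_params("AAG37779")
print(BLAST_URL + "?" + urlencode(params))
```

Printing the URL before sending anything plays the same role as the summary on the web form: it lets you check that the search is the one you intended.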
The server then tells us that our BLAST search has been submitted
and how often the page will be refreshed. It also assigns the job a
request ID, so that if we want to reformat the results or look at the
search again (within about a week or so of running it), we do not have
to repeat the whole BLAST search: we just re-enter the request ID in
the Recent Results tab.
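When a search is submitted through the URL API rather than the web form, the request ID has to be read out of the server's reply. The sketch below assumes that the reply embeds a small "QBlastInfo" block of KEY = VALUE lines; the sample texts and the request ID in them are made up for illustration, and parse_qblast_info is a hypothetical helper.

```python
import re


def parse_qblast_info(text: str) -> dict:
    """Extract KEY = VALUE pairs from a QBlastInfo block, if one is present."""
    block = re.search(r"QBlastInfoBegin(.*?)QBlastInfoEnd", text, re.DOTALL)
    if block is None:
        return {}
    pairs = re.findall(r"^[ \t]*(\w+)[ \t]*=[ \t]*(\S+)[ \t]*$",
                       block.group(1), re.MULTILINE)
    return dict(pairs)


# Made-up reply to a submission (assumed layout, placeholder request ID).
submit_reply = """<!--QBlastInfoBegin
    RID = EXAMPLE0RID
    RTOE = 30
QBlastInfoEnd
-->"""

# Made-up reply to a later status check on the same job.
status_reply = """QBlastInfoBegin
\tStatus=READY
QBlastInfoEnd"""

info = parse_qblast_info(submit_reply)
print("request ID:", info.get("RID"))
print("status:", parse_qblast_info(status_reply).get("Status"))
```

Keeping the request ID is what makes it possible to come back to the results later without running the search again.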
A BlastP search automatically runs a CD (conserved domain) search (against Pfam)
and shows any functional domain found in our sequence. In this case, we can see
that our sequence contains a seven-transmembrane (7tm) receptor domain.
(Clicking on the identified functional domain brings us to a more detailed
description of the Pfam family, which includes an alignment of our sequence
against one of the domain sequences used to create the consensus signature
for that domain.)
The results come in three main sections. The first section is the
graphical overview of the hits. The different colors indicate how good
the match is (as indicated by the guide at the top of this section).
The second section is a list of short descriptions of the matches
that were found, with their definition lines, e-values (expectation value:
the number of alignments with an equal or better score that would be
expected by chance in a database of this size) and scores.
The third section consists of the alignments of our query
sequence against the matches from the database. Moving the mouse
over one of the lines in the graphical overview (as opposed to clicking
on it) shows the score and definition line of that hit.
Clicking on one of the colored lines takes you directly
to the alignment in section 3. Clicking on the accession number link in
section 2 takes you to the sequence itself. Clicking on the score in
section 2 takes you to the relevant alignment in section 3, where
clicking on the accession number again takes you directly to the
sequence itself.
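To make the e-values in section 2 more concrete: in the standard Karlin-Altschul statistics, the e-value is the expected number of chance alignments scoring at least as well, roughly E = m * n * 2^(-S') for a bit score S', query length m and database length n. The probability of seeing at least one such chance hit is then 1 - exp(-E), which is close to E when E is small. The sketch below uses made-up lengths and scores, and ignores the effective-length corrections that BLAST applies.

```python
import math


def expect_value(bit_score: float, query_len: int, db_len: int) -> float:
    """Expected number of chance hits with at least this bit score."""
    return query_len * db_len * 2.0 ** (-bit_score)


def prob_at_least_one(e_value: float) -> float:
    """Probability of one or more chance hits, 1 - exp(-E)."""
    return -math.expm1(-e_value)


# Illustrative numbers only: a 400-residue query against 2e8 database residues.
for bits in (30.0, 60.0, 200.0):
    e = expect_value(bits, 400, 200_000_000)
    print(f"bit score {bits:6.1f}  E = {e:.3g}  P = {prob_at_least_one(e):.3g}")
```

This is why a hit with an e-value far below 1 is unlikely to be a chance match, while an e-value of 10 means about ten such alignments would be expected from unrelated sequences alone.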
If we look at the matches that our search found, we can see that they
are divided into two main categories. The first category has very high scores
and very low expectation values. From their definition lines, we can also see
that they are all annotated as cannabinoid receptors from various
species, including the top hit, the cannabinoid receptor from
Felis catus, which is the protein that we used as our query.
However, if we look closely at the alignment, we can see that the sequences
are not identical. Some portions of the query sequence have been masked
as low complexity regions (shown as light gray lowercase letters),
and there are also one or two mismatches scattered along the sequence.
This is because they are two different sequences submitted by two
different groups at two different times. Furthermore, the one that we
used as a query is not the complete sequence (it only starts to match
from position 53 on the subject sequence), whereas the one that the
search found in the Swissprot database is.
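The "one or two mismatches" in an alignment like this can be quantified as a percent identity. The sketch below counts identical columns over the aligned length, treating lowercase (masked) residues as ordinary residues for the comparison. The function name is hypothetical and the two short aligned strings are made up; they are not taken from the actual Felis catus alignment.

```python
def percent_identity(query_aln: str, subject_aln: str) -> float:
    """Percentage of alignment columns in which both rows carry the same residue.

    Gaps are written as '-'; masked (lowercase) residues are compared
    case-insensitively.
    """
    if len(query_aln) != len(subject_aln):
        raise ValueError("aligned rows must have the same length")
    matches = 0
    columns = 0
    for q, s in zip(query_aln.upper(), subject_aln.upper()):
        if q == "-" and s == "-":
            continue  # skip columns that are empty in both rows
        columns += 1
        if q == s and q != "-":
            matches += 1
    return 100.0 * matches / columns if columns else 0.0


# Made-up aligned fragments: one mismatch (L/I), one gap, two masked residues.
query_row = "MKsaLLT-V"
subject_row = "MKSALITAV"
print(f"{percent_identity(query_row, subject_row):.1f}% identity")
```

For two independently submitted copies of the same protein, a value close to, but not exactly, 100% over the aligned region is what one would expect to see.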
Secondly, we find not only other brain-type cannabinoid receptors like
our query, but also the type-2 cannabinoid receptors, which have
slightly lower scores and slightly higher e-values. If we look at their
alignments, we can see that they are a lot patchier, but that the
similarity between the query and the matches is still evident.
Further down, we come across a number of other G protein-coupled receptors
(GPCRs). The cannabinoid receptor is itself a GPCR, and we find these other
members of this large superfamily because large families of proteins share
similar conserved domains.
It may also be useful to look at the BLAST and PSI-BLAST tutorials
provided at NCBI.