BLAST worked example

We are going to try to find all the homologs of the Felis catus cannabinoid receptor (AAG37779) in the Swissprot database.

Go to the NCBI BLAST home page and click on the protein blast link. Into the large text-entry box at the top of the page, we can either enter the sequence accession code (AAG37779) or paste the whole sequence, which can be retrieved from the Protein division of Entrez. Choose the Swissprot database for the search. Since we are doing a simple BlastP search, we will not change the algorithm. The Algorithm parameters section offers many other options (such as scoring matrices, word sizes and E-value thresholds), but for an initial search it is perfectly fine to use the default settings. One useful option, however, is the 'Low complexity regions' filter: checking it masks compositionally biased stretches of the query, which prevents matches that are statistically significant but biologically irrelevant.
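The same form settings can also be written down as a request to the NCBI BLAST URL API. The sketch below only builds the URL-encoded request body and does not contact the server. The endpoint and the parameter names (CMD, PROGRAM, DATABASE, QUERY, FILTER) reflect our understanding of that API and should be checked against NCBI's documentation; the helper name build_blastp_request is hypothetical.

```python
from urllib.parse import urlencode

# Endpoint of the NCBI BLAST URL API (assumed; check NCBI's documentation).
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def build_blastp_request(query, database="swissprot", low_complexity_filter=True):
    """Return the URL-encoded body for submitting a BlastP search.

    Hypothetical helper: it mirrors the choices made on the web form
    (program, database, low-complexity filter) and leaves every other
    algorithm parameter at the server default.
    """
    params = {
        "CMD": "Put",          # submit a new search
        "PROGRAM": "blastp",   # protein query against a protein database
        "DATABASE": database,
        "QUERY": query,        # accession code or raw sequence
    }
    if low_complexity_filter:
        params["FILTER"] = "L"  # assumed value for the low-complexity filter
    return urlencode(params)


body = build_blastp_request("AAG37779")
print(body)
```

Sending this body to BLAST_URL in a POST request would correspond to clicking the submit button on the form; that step is deliberately left out here.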

[Image: BLASTP filter option]

Click on 'Blast!' to start the BLAST search. As you change options in the main body of the page, the changes are reflected in the search summary next to this button, so that you can check that you are running the intended search.

[Image: completed BLASTP form]

The server then tells us that our BLAST search has been submitted and how often the page will be refreshed. It also gives the job a request ID (RID). If we want to reformat the results or look at the search again (within about a week or so of it being performed), we do not have to rerun the whole search; we just re-enter the request ID in the Recent Results tab.
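When the search is submitted by a script rather than through the form, the request ID has to be read out of the server's reply. A minimal sketch, assuming the reply embeds a QBlastInfo block with "RID = ..." and "RTOE = ..." (estimated wait in seconds) lines, and that a later status reply contains a "Status=..." line. The two reply strings below are hypothetical stand-ins, not real server output.

```python
import re


def parse_submission(text):
    """Extract the request ID (RID) and estimated wait in seconds (RTOE)."""
    rid = re.search(r"RID = (\S+)", text)
    rtoe = re.search(r"RTOE = (\d+)", text)
    if rid is None or rtoe is None:
        raise ValueError("no RID/RTOE found in server reply")
    return rid.group(1), int(rtoe.group(1))


def parse_status(text):
    """Return the job status (e.g. WAITING or READY), or None if absent."""
    match = re.search(r"Status=(\w+)", text)
    return match.group(1) if match else None


# Hypothetical replies, shaped like the blocks the server is assumed to send.
submission_reply = """
QBlastInfoBegin
    RID = EXAMPLE0RID
    RTOE = 12
QBlastInfoEnd
"""
status_reply = """
QBlastInfoBegin
    Status=READY
QBlastInfoEnd
"""

rid, wait_seconds = parse_submission(submission_reply)
print(rid, wait_seconds, parse_status(status_reply))
```

Keeping the RID is what makes it possible to come back to the results later without repeating the search.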

[Image: preformat page]

A BlastP search automatically runs a CD (conserved domain) search (against Pfam) and shows any functional domains found in our sequence. In this case, we can see that our sequence contains a 7-transmembrane domain. (Clicking on the identified functional domain brings us to a more detailed description of the Pfam family, including an alignment of our sequence against one of the domain sequences used to create the consensus signature for that domain.)

[Image: Pfam page]

The results come in three main sections. The first section is the graphical overview of the hits. The different colors indicate how good each match is (as shown by the guide at the top of this section). The second section is a list of short descriptions of the matches that were found, with their definition lines, scores and E-values (expectation values: the number of alignments with an equal or better score that we would expect to find by chance in a database of this size). The third section consists of the alignments of our query sequence against the matches from the database.

Moving the mouse over one of the lines in the graphical overview (as opposed to clicking on it) shows the score and definition line of that hit. Clicking on one of the colored lines takes you directly to the alignment in section 3. Clicking on the accession number link in section 2 takes you to the sequence itself. Clicking on the score in section 2 takes you to the relevant alignment in section 3, where clicking on the accession number again takes you directly to the sequence itself.
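To make the E-value more concrete: it is an expected count of chance alignments, not strictly a probability, although the two are nearly equal when E is small. The sketch below uses the standard relation E = m * n * 2^(-S'), where S' is the bit score, m the query length and n the total database length. BLAST itself uses edge-corrected "effective" lengths, so the numbers it reports will differ somewhat; the function names and the input values are illustrative only.

```python
import math


def expect_value(bit_score, query_length, database_length):
    """Expected number of chance alignments scoring at least bit_score.

    Uses E = m * n * 2**(-S'), where S' is the bit score, m the query
    length and n the total database length (both in residues).
    """
    return query_length * database_length * 2.0 ** (-bit_score)


def chance_probability(e_value):
    """Probability of seeing at least one such alignment by chance."""
    return -math.expm1(-e_value)  # equals 1 - exp(-E), stable for small E


# Illustrative numbers only, not taken from the search in this example.
e = expect_value(bit_score=40, query_length=400, database_length=200_000_000)
print(f"E = {e:.3g}, P = {chance_probability(e):.3g}")
```

Note that every extra bit of score halves the E-value, while doubling the database size doubles it, which is why the same alignment can look less significant in a larger database.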

If we look at the matches that our search found, we can see that they fall into two main categories. The first have very high scores and very low expectation values. From their definition lines, we can also see that they are all annotated as cannabinoid receptors from various species. The top hit is the cannabinoid receptor from Felis catus, the same protein that we used as our query.

[Image: BLASTP result section 1]

However, if we look closely at the alignment, we can see that the sequences are not identical. Some portions of the query sequence have been masked as low-complexity regions (shown as light gray lowercase letters), and there are also one or two mismatches scattered along the sequence.
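Because masked residues are shown in lowercase, the masked stretches can be located in an alignment line with a few lines of code. This is a minimal sketch; the function name masked_regions is hypothetical and the fragment is made up for illustration, not copied from the real alignment.

```python
import re


def masked_regions(aligned_query):
    """Return 1-based (start, end) positions of lowercase runs.

    Positions count residues only; alignment gaps ('-') are skipped.
    """
    residues = aligned_query.replace("-", "")
    return [(m.start() + 1, m.end()) for m in re.finditer(r"[a-z]+", residues)]


# Made-up fragment for illustration, not part of the real alignment.
fragment = "MKSILDGLADttfrtittdlLYVGSND"
print(masked_regions(fragment))
```

The positions are relative to the start of the fragment passed in, so they would need to be offset by the query start coordinate of the alignment line to give positions in the full query.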

[Image: BLASTP result section 2]

This is because they are in fact two different sequences, submitted by two different groups at two different times. Furthermore, the one that we used as a query is not the complete sequence (it only starts to match at position 53 of the subject sequence), whereas the one that the search found in the Swissprot database is.

Secondly, we find not only other brain-type cannabinoid receptors like our query, but also the other type, the type-2 cannabinoid receptors, which have slightly lower scores and slightly higher E-values. If we look at their alignments, we can see that they are a lot patchier, but the similarity between the query and the matches is still evident.

Further down, we come across a number of other G protein-coupled receptors (GPCRs). The cannabinoid receptor is itself a GPCR, and we find these other members of this large superfamily because large families of proteins share similar conserved domains.
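The grouping of hits described above can be reproduced from the definition lines alone. The sketch below assumes that the definition lines contain the phrases "Cannabinoid receptor 1" and "Cannabinoid receptor 2"; this wording, the function name classify_hit and the example lines are assumptions for illustration and should be adapted to the definition lines actually returned.

```python
def classify_hit(definition):
    """Assign a hit to a group based on its definition line."""
    text = definition.lower()
    if "cannabinoid receptor 1" in text:
        return "type-1 cannabinoid receptor"
    if "cannabinoid receptor 2" in text:
        return "type-2 cannabinoid receptor"
    return "other hit"


# Hypothetical definition lines, not copied from the real results.
definitions = [
    "Cannabinoid receptor 1",
    "Cannabinoid receptor 2",
    "Some other receptor",
]
for line in definitions:
    print(f"{line} -> {classify_hit(line)}")
```

A text match like this only sorts the hits by annotation; whether an "other hit" is really a GPCR still has to be judged from its alignment and domain content.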

It may also be useful to look at the BLAST and PSI-BLAST tutorials provided at NCBI.
