# TissueInfo


## TissueInfo

TissueInfo is a bioinformatics pipeline to calculate the tissue expression profile of transcripts, ESTs or proteins. More background information about the project can be found on the TissueInfo project page.

This wiki page is concerned with ongoing or future improvements to TissueInfo.

### Calculating tissue expression profile similarity

Once TissueInfo has produced transcript expression profiles, how can we quantify the similarity between the expression profiles of two transcripts? A quantitative similarity measure would be useful to cluster transcripts by tissue expression, or simply to assess how compatible the tissue expression profiles of the transcripts in a gene list are.
To illustrate this discussion, consider the three transcripts $t_1$, $t_2$ and $t_3$, whose expression is shown in Table 1 below.

**Table 1.** EST counts for each transcript and tissue.

| transcript id | hypothalamus | hippocampus | brain | liver |
|---|---|---|---|---|
| $t_1$ | 0 | 1 | 10 | 232 |
| $t_2$ | 1 | 3 | 50 | 25 |
| $t_3$ | 0 | 0 | 40 | 100 |

#### Naive distance with EST counts

We could cluster transcripts based on the counts of ESTs in each tissue tested. The quantitative measure would be a distance between points in an $n$-dimensional space, if $n$ tissues were tested. The coordinate of each transcript along a tissue's dimension could simply be the number of times the transcript is detected in an EST library made from that tissue.

If we use the Euclidean distance $d(t_i,t_j) = \sqrt{\sum_{t} (x_{i,t} - x_{j,t})^2}$, where $x_{i,t}$ is the EST count of transcript $t_i$ in tissue $t$, we obtain $d(t_1,t_2) = 210.84$ and $d(t_1,t_3) = 135.37$. These distances suggest that $t_1$ is closer to $t_3$ than to $t_2$. Yet $t_1$ and $t_2$ both have matches in hippocampus, a tissue where expression is infrequently reported. Intuitively, this match should count more than expression in liver, which is common for many genes.
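
The two distances can be checked with a minimal Python sketch that treats each row of Table 1 as a point in a 4-dimensional space. The names `EST_COUNTS` and `euclidean` are illustrative only and are not part of TissueInfo.

```python
import math

# EST counts per tissue, copied from Table 1.
# Column order: hypothalamus, hippocampus, brain, liver (hypothetical layout).
EST_COUNTS = {
    "t1": [0, 1, 10, 232],
    "t2": [1, 3, 50, 25],
    "t3": [0, 0, 40, 100],
}

def euclidean(x, y):
    """Euclidean distance between two expression profiles of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

d12 = euclidean(EST_COUNTS["t1"], EST_COUNTS["t2"])
d13 = euclidean(EST_COUNTS["t1"], EST_COUNTS["t3"])
print(f"d(t1,t2) = {d12:.2f}")  # d(t1,t2) = 210.84
print(f"d(t1,t3) = {d13:.2f}")  # d(t1,t3) = 135.37
```

Almost all of $d(t_1,t_2)$ comes from the liver coordinate (207 ESTs apart), which is why raw counts let highly expressed, common tissues dominate the distance.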

#### Expression confidence scores

We could transform the EST counts into measures of how much we trust expression in a given tissue. For instance, the EST counts 1 and 3 in hippocampus carry the same information: expression was detected in hippocampus for both $t_1$ and $t_2$. The difference between 1 and 3 is within sampling error, and it would be unwise to conclude that $t_2$ is expressed three times more strongly than $t_1$ in hippocampus.

**Table 2.** Expression confidence scores for the transcripts of Table 1.

| transcript id | hypothalamus | hippocampus | brain | liver |
|---|---|---|---|---|
| $t_1$ | 0 | 1 | 2 | 3 |
| $t_2$ | 1 | 1 | 2 | 2 |
| $t_3$ | 0 | 0 | 2 | 2 |

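The Euclidean distance calculated with confidence scores yields $d(t_1,t_3) = \sqrt{2} \approx 1.41$. Similarly, $d(t_1,t_2) = 1.41$. The two pairs are now equidistant: confidence scores remove the advantage $t_3$ had with raw counts, but the distance cannot tell the two pairs apart.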
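
A minimal Python sketch of the same Euclidean distance, this time applied to the confidence scores. The dictionary name `CONFIDENCE` and the column order are assumptions made for illustration.

```python
import math

# Confidence scores per tissue, copied from the confidence score table.
# Column order: hypothalamus, hippocampus, brain, liver (hypothetical layout).
CONFIDENCE = {
    "t1": [0, 1, 2, 3],
    "t2": [1, 1, 2, 2],
    "t3": [0, 0, 2, 2],
}

def euclidean(x, y):
    """Euclidean distance between two confidence score profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

d12 = euclidean(CONFIDENCE["t1"], CONFIDENCE["t2"])
d13 = euclidean(CONFIDENCE["t1"], CONFIDENCE["t3"])
print(f"d(t1,t2) = {d12:.2f}, d(t1,t3) = {d13:.2f}")  # d(t1,t2) = 1.41, d(t1,t3) = 1.41
```

Each pair differs by exactly 1 in two tissues (hypothalamus and liver for $t_1,t_2$; hippocampus and liver for $t_1,t_3$), so both distances equal $\sqrt{2}$: a squared difference does not care which tissue the disagreement occurs in.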

#### Scoring with confidence scores

We could quantify expression profile similarity with an aggregate of confidence scores (inspired by sequence similarity scores). Here, we sum one contribution per tissue. The contribution is positive when the confidence scores of both transcripts are positive; we take the minimum of the two scores. The contribution is zero when the confidence score is zero in both transcripts. Finally, when expression is detected in the tissue for one transcript but not for the other, the contribution is minus the confidence score of the transcript in which expression was detected. For instance, $s(t_1,t_2) = -1 + \min(1,1) + \min(2,2) + \min(3,2) = -1 + 1 + 2 + 2 = 4$, where the $-1$ comes from hypothalamus, in which only $t_2$ is expressed.

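$$
s(t_i,t_j) = \sum_{t \in \text{tissues}} S(t_i,t_j,t)
$$

where, writing $c_{i,t}$ for the confidence score of transcript $t_i$ in tissue $t$,

$$
S(t_i,t_j,t) =
\begin{cases}
\min(c_{i,t}, c_{j,t}) & \text{if } c_{i,t} > 0 \text{ and } c_{j,t} > 0,\\
0 & \text{if } c_{i,t} = c_{j,t} = 0,\\
-\max(c_{i,t}, c_{j,t}) & \text{otherwise (expression detected in only one transcript).}
\end{cases}
$$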

This formulation yields
$s(t_1,t_3) = 0 - 1 + \min(2,2) + \min(3,2) = -1 + 2 + 2 = 3$. Since $s(t_1,t_2) = 4 > s(t_1,t_3) = 3$, the score suggests that $t_1$ and $t_2$ have closer expression profiles than $t_1$ and $t_3$.
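
The per-tissue rules translate directly into code. The following is a minimal Python sketch of the score as described on this page, assuming confidence scores are non-negative integers; `tissue_score` and `similarity` are hypothetical names, not TissueInfo's API.

```python
# Confidence scores per tissue (hypothalamus, hippocampus, brain, liver),
# copied from the confidence score table; the layout is an assumption.
CONFIDENCE = {
    "t1": [0, 1, 2, 3],
    "t2": [1, 1, 2, 2],
    "t3": [0, 0, 2, 2],
}

def tissue_score(ci, cj):
    """Contribution S of one tissue, given the two (non-negative) confidence scores."""
    if ci > 0 and cj > 0:
        return min(ci, cj)   # detected in both: credit the weaker evidence
    if ci == 0 and cj == 0:
        return 0             # detected in neither: no information
    # Detected in only one transcript: one score is 0, so max() picks the
    # confidence of the transcript where expression was detected.
    return -max(ci, cj)

def similarity(x, y):
    """Similarity score s: sum of per-tissue contributions."""
    return sum(tissue_score(ci, cj) for ci, cj in zip(x, y))

s12 = similarity(CONFIDENCE["t1"], CONFIDENCE["t2"])  # -1 + 1 + 2 + 2
s13 = similarity(CONFIDENCE["t1"], CONFIDENCE["t3"])  #  0 - 1 + 2 + 2
print(s12, s13)  # 4 3
```

Unlike a distance, a larger $s$ means more similar profiles, so a clustering procedure built on it would merge the highest-scoring pairs first. The same function gives $s(t_2,t_3) = -1 - 1 + 2 + 2 = 2$.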

#### More information

Some data regarding the test of this idea are presented on the TEPSS page.

We are currently preparing a manuscript for publication about this extension of TissueInfo. Contact me if you would like to try a pre-release version of TissueInfo.

Fabien Campagne, 11:30, 8 August 2007 (EDT)

#### Future developments

This page lists ideas for future development of TissueInfo.