# HowTo/EvalTopK

### From Icbwiki

This page describes how to use EvalTopK/TopK to evaluate the statistical significance of ranking predictors. Ranking predictors are methods that return a ranked list of items. The items are generally ranked such that the items at the top of the list are more likely to be of the correct type than those further down the list. Information retrieval method generate ranked lists of documents and can be considered ranking predictors if documents are classified as relevant or non-relevant to a query.

Ranking predictors can be evaluated against the expectation of a random prediction with the approach implemented in TopK. For more details, see the manuscript.

## Download

TopK was developed by Sofus Attila Macskassy when at Rutgers University and can be downloaded from [1]

## Input file conversion

TopK expects predictions to be provided in a file where each line is of the form:

score<space>[+/-]

The first field of each line is the value of the score that generated the ranking of each returned item. The second field should be a plus or minus sign to indicate that the item belongs to the positive(+) or negative(-) class.

For instance,

21.299552041058323 + 21.15306983582886 + 20.03279567560194 -

defines a ranking of three items, the first two belong to the positive class, and the last one to the negative class.

The following description assumes that you use the output of the predictor and the class of labels to produce a file called scores.txt in the above format.

## Running TopK

Given the formatted scores.txt file, significance of the observed ranking can be estimated with TopK as follows:

java -cp . EvalTopK -dp scores.txt -mx 100 p 0.05 0.01 1e-4 1e-20 | more

The -dp option points to the scores.txt file. The -mx option indicates the maximum rank for which probabilities should be estimated. The default is to estimate for all ranks. The p character option defines a list of alphas (confidence intervals). For each alpha, TopK will estimate how many positive items would appear in the top K predictions if the predictions were done randomly. In the example above, the value 0.05 instructs TopK to estimate the random number of positive items at the 5% confidence interval.

The following is an example of results obtained by analyzing results of TEPSS when predicting the cytosolic ribosome:

reading predictions from scores.txt numPos=331 size =32414 thresholds = 1.0[0.0] 0.9999[9.999999999998899E-5] 0.99[0.010000000000000009] 0.95[0.050000000000000044] minK =1 maxK =100 # K predFileNumP predFileThreshold numP(1.0) numP(0.9999) numP(0.99) numP(0.95) 1 1 1.0 1.0 1.0 1.0 1.0 2 2 1.0000000000000002 2.0 2.0 1.0 1.0 3 2 0.9999989446897791 3.0 2.0 1.0 1.0 4 3 0.9999999893202385 4.0 2.0 1.0 1.0 5 4 0.9999999998922467 5.0 2.0 1.0 1.0 6 4 0.9999999993589 6.0 2.0 1.0 1.0 7 4 0.999999997774955 8.0 2.0 1.0 1.0 8 4 0.9999999941162624 8.0 2.0 1.0 1.0 9 5 0.9999999999112812 9.0 2.0 1.0 1.0 10 6 0.9999999999987297 8.0 3.0 1.0 1.0 11 7 0.9999999999999828 8.0 3.0 1.0 1.0 12 8 0.9999999999999999 9.0 3.0 1.0 1.0 ... 98 70 0.9999999999999998 99.0 4.0 3.0 99 71 1.0000000000000002 17.0 4.0 3.0 100 72 1.0 100.0 4.0 3.0 100 101 100 101 DONE!

The header are: K: rank in the output predFileNumP: Number of positive item found at this rank in the input file. predFileThreshold: 1-Pvalue of observing this distribution of positive items by chance numP(1.0): Number of positive items expected at this rank at P-value=1e-20 numP(0.9999): Number of positive items expected at this rank at P-value=1e-4 numP(0.99): Number of positive items expected at this rank at P-value=0.01 numP(0.95): Number of positive items expected at this rank at P-value=0.05

The ranking is highly significant (the third value is 1-PValue). Indeed, the observed number of top positive is larger than numP(0.9999) for all ranks.

For comparison, the following output is produced when a random ranking is provided to TopK (random-scores.txt contains the same number of positive and negative items than the actual score, but is shuffled):

reading predictions from random-scores-same-%.txt numPos=331 size =32414 thresholds = 1.0[0.0] 0.99[0.010000000000000009] 0.95[0.050000000000000044] minK =1 maxK =100 # K predFileNumP predFileThreshold numP(1.0) numP(0.99) numP(0.95) 1 0 0.9897883630530018 1.0 1.0 1.0 2 0 0.9796806918047205 2.0 1.0 1.0 3 0 0.9696759309449351 3.0 1.0 1.0 4 0 0.9597730358431865 4.0 1.0 1.0 5 0 0.9499709724410238 5.0 1.0 1.0 6 0 0.9402687171453349 6.0 1.0 1.0 7 0 0.9306652567227508 8.0 1.0 1.0 8 0 0.9211595881951108 8.0 1.0 1.0 9 0 0.9117507187359803 9.0 1.0 1.0 10 0 0.9024376655682097 8.0 1.0 1.0 11 0 0.8932194558625228 8.0 1.0 1.0 12 0 0.8840951266371272 9.0 1.0 1.0 13 0 0.8750637246583329 13.0 1.0 1.0 14 0 0.8661243063421727 14.0 1.0 1.0 15 0 0.8572759376570105 15.0 1.0 1.0 ... 98 2 0.9207905818334529 99.0 4.0 3.0 99 2 0.9189041290989911 17.0 4.0 3.0 100 2 0.9169983228846783 100.0 4.0 3.0 100 101 100 101 DONE!

At rank 15, no positive item has been found, while 1 where expected at the 5% confidence interval level. The probability that the ranking can be explained under the null model is 0.8572 (P-value=1-0.8572). The null hypothesis cannot be rejected and we must conclude that the ranking is not better than a random ranking.