# Twease/tasks/Pseudo relevance feedback for query expansion

Pseudo-relevance feedback (also called blind query expansion) is a key technique for improving the performance of information retrieval engines. In the pseudo-relevance feedback scheme, the following steps are carried out:

1. Issue the initial query with a set of terms $T_0$, retrieving a document set $D_0$.
2. Examine $D_0$ and select a set of new terms $T_1$. (Different strategies exist for selecting $T_1$ from $D_0$.)
3. Issue a new query with $T_0 \cup T_1$, retrieving $D_1$, which is presented to the user. The terms in $T_1$ are usually reweighted.
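The loop above can be sketched in Python. This is a toy stand-in under stated assumptions, not a real retrieval engine: `search` is a simple term-overlap ranker over a four-document corpus (in place of BM25) and `select_terms` picks the most frequent new terms from the top documents; both names are illustrative.

```python
from collections import Counter

# Toy corpus: doc id -> text. Stand-in for a real indexed collection.
DOCS = {
    1: "query expansion improves retrieval recall",
    2: "pseudo relevance feedback selects expansion terms",
    3: "blind feedback uses top ranked documents",
    4: "unrelated cooking recipe with many terms",
}

def search(terms, k=3):
    """Rank documents by simple term-overlap score (stand-in for BM25)."""
    scores = {d: sum(t in text.split() for t in terms) for d, text in DOCS.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [d for d in ranked if scores[d] > 0][:k]

def select_terms(doc_ids, t0, n=5):
    """Step 2: pick the n most frequent new terms T1 from the top documents D0."""
    counts = Counter(w for d in doc_ids for w in DOCS[d].split() if w not in t0)
    return [w for w, _ in counts.most_common(n)]

t0 = {"feedback", "expansion"}
d0 = search(t0)               # step 1: initial retrieval
t1 = select_terms(d0, t0)     # step 2: select new terms from D0
d1 = search(t0 | set(t1))     # step 3: expanded query, presented to the user
```

In a real system the expansion terms in $T_1$ would also carry reduced weights in the scoring function, which this overlap-count sketch omits.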

Most papers I have read that study pseudo-relevance feedback conclude that the set $T_1$ should be limited in size (to about 20 or 30 terms). This paper, in particular, performed a parameter scan to determine the impact of the number of top documents used to compute $T_1$ and of the number of terms kept in $T_1$.

Why does BM25 performance decline after a certain threshold of new terms is included? I find this counter-intuitive, since more terms should help identify more documents. Is the limit dependent on the question asked? It does not seem to be, since many papers observed the same effect across different topics and text collections.
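One candidate explanation is query drift: BM25 sums independent per-term contributions, so every weakly related expansion term a document matches adds score, even for off-topic documents that match none of the original terms. The numbers below are made up for illustration, not real BM25 weights:

```python
# Hypothetical per-term contributions: original terms T0 matched by the
# relevant document score highly; each (down-weighted) expansion term in
# T1 contributes a small amount to whatever document contains it.
T0_WEIGHT = 2.0   # contribution of each matched original term (illustrative)
T1_WEIGHT = 0.2   # contribution of each matched expansion term (illustrative)

def score(n_t0_matched, n_t1_matched):
    # Like BM25, the score is a sum of independent per-term contributions,
    # so extra matched expansion terms always add score.
    return n_t0_matched * T0_WEIGHT + n_t1_matched * T1_WEIGHT

for n_t1 in (10, 30, 100):
    relevant = score(2, 5)        # matches both T0 terms and a few of T1
    off_topic = score(0, n_t1)    # matches no T0 term but many noisy T1 terms
    print(n_t1, relevant, off_topic, off_topic > relevant)
```

With enough noisy expansion terms the off-topic document overtakes the relevant one, which would produce exactly the threshold effect observed in the literature.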

Could the effect be similar to what we report in our TREC 2006 notebook paper with words identified by stemming? (TODO: design an experiment to test this hypothesis.) If so, we could correct for it with the BM25ec scorer by regrouping terms into independent classes.
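If term classes are the right correction, the scoring change might look like the following sketch. This is my guess at the class-based idea, not the actual BM25ec implementation, and the weights are made up: dependent terms (e.g. stem variants) contribute one capped score per class instead of being summed independently.

```python
def naive_score(matched_weights_by_class):
    """Independent summation: every matched variant adds its weight."""
    return sum(w for ws in matched_weights_by_class for w in ws)

def class_score(matched_weights_by_class):
    """One contribution per class: only the best-matching variant counts."""
    return sum(max(ws) for ws in matched_weights_by_class if ws)

# A "retriev*" class matched via three stem variants, plus one unrelated
# term in its own class. Independent summation triple-counts the variants;
# the class cap keeps a single contribution per class.
classes = [[1.2, 1.1, 0.9], [0.8]]
```

Under this sketch, `naive_score(classes)` inflates to 4.0 while `class_score(classes)` stays at 2.0, which is the kind of over-counting a class-based correction would remove.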

How can we cluster terms into groups such that the terms within a class are related? I put some ideas on this other page.
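As a trivial baseline (an assumption of mine, not one of the ideas referenced above), terms could be grouped by a crude shared-prefix key, a stand-in for stemming-based classes:

```python
from itertools import groupby

def crude_classes(terms, prefix_len=5):
    """Group terms whose first prefix_len characters agree (naive stemming)."""
    key = lambda t: t[:prefix_len]
    # groupby needs the input sorted by the same key to form whole groups.
    return [list(g) for _, g in groupby(sorted(terms), key=key)]

terms = ["retrieve", "retrieval", "retrieved", "feedback", "query"]
groups = crude_classes(terms)
```

This places the three `retriev*` variants in one class and leaves `feedback` and `query` in singleton classes; a real approach would likely use a proper stemmer or term co-occurrence instead.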