MG4J Team Visit

This page describes current and potential future interactions between MG4J, Textractor and Twease.

Let's make sure we discuss:

  • New MG4J 1.1 features, including
    • Index clusters
    • Remote Index search
    • TREC Genomics track passage and aspect retrieval task

Here are some areas of potential interactions between MG4J and Textractor/Twease:

  1. The DidYouMean engine and the "suggest related" tool.
  2. Changing the granularity of the index at runtime, i.e., whole-document, sentence-level, or section-level searches. Discuss how to extend MG4J to support indexing boundaries. Boundaries differ from words in that they have no length (any number of boundaries can exist between two consecutive words at positions i and i+1).
  3. BM25 behavior when query expansion is used (i.e., with post-indexing stemming, as done in Twease).
  4. BM25ec (equivalence classes).
  5. Automatic query construction from topic narratives.
  6. Ranking terms by TF-IDF, and the TF-IDF compressed file representation (reader, writer) and calculator.
  7. Identifying term dependencies automatically to assign terms to classes and leveraging BM25ec.
  8. Query expansion strategies that downweight terms not in the initial query with BM25 (see the published methods below). Discuss how to extend the current BM25 scorer to support adjusting term weights. How can we also support ranking by terms not in the DocumentIterator used to retrieve the documents?
    1. Billerbeck B, Zobel J: Questioning Query Expansion: An Examination of Behaviour and Parameters. In: The Fifteenth Australian Database Conference (ADC2004): 2004; Dunedin, New Zealand: Australian Computer Society; 2004.
    2. Robertson S, Walker S: Okapi/Keenbow at TREC-8. In: Eighth Text REtrieval Conference (TREC-8): 2000.
  9. Retrieval performance evaluation without human judgements.
  10. Future applications to biological sequence retrieval.
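
To make item 8 concrete for the discussion, here is a minimal sketch of BM25 scoring with a per-term weight, so that expansion terms can count less than the terms of the initial query. It is written in plain Python and is not MG4J's scorer API. The function names, the 0.3 expansion weight, the example terms, and the choice of IDF variant are all assumptions made for illustration.

```python
import math


def bm25_idf(n_docs, doc_freq):
    """IDF with +1 inside the log so it stays positive (an assumed variant)."""
    return math.log(1.0 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))


def weighted_bm25(term_freqs, doc_len, avg_doc_len, n_docs, doc_freqs,
                  term_weights, k1=1.2, b=0.75):
    """Score one document; each query term's contribution is scaled by its weight.

    term_freqs   -- term -> frequency in this document
    doc_freqs    -- term -> number of documents containing the term
    term_weights -- term -> weight (hypothetical: 1.0 for initial-query terms,
                    less than 1.0 for terms added by query expansion)
    """
    # Document length normalization, shared by all terms of this document.
    norm = k1 * (1.0 - b + b * doc_len / avg_doc_len)
    score = 0.0
    for term, weight in term_weights.items():
        tf = term_freqs.get(term, 0)
        if tf == 0:
            continue  # a term absent from the document contributes nothing
        idf = bm25_idf(n_docs, doc_freqs[term])
        score += weight * idf * tf * (k1 + 1.0) / (tf + norm)
    return score


# Hypothetical example: "kinase" is in the initial query, "kinases" was added
# by expansion and is downweighted to 0.3.
doc_term_freqs = {"kinase": 3, "kinases": 2}
collection_doc_freqs = {"kinase": 50, "kinases": 20}
score = weighted_bm25(doc_term_freqs, doc_len=100, avg_doc_len=120,
                      n_docs=1000, doc_freqs=collection_doc_freqs,
                      term_weights={"kinase": 1.0, "kinases": 0.3})
```

Because the score is a weighted sum over terms, the weights can be adjusted without touching the retrieval step. The sketch also scores any term for which frequencies are available, which is one way to frame the question of ranking by terms that were not in the DocumentIterator used to retrieve the documents.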