Passage Consensus

From Icbwiki

We can use data from multiple TREC evaluations to arrive at a consensus on the actual location of passages within documents. The class "evaluation.passage.PassageConsensusEvaluation" was created for this purpose. The consensus process takes as input the output of evaluate.Evaluate together with a consensus algorithm (class). It writes a set of files in which the start and end positions of passages are updated (if applicable) to the consensus of similar passages from all the runs. The number of files written equals the number of files read. By default, each output file is named after its input file with the extension ".consensus" appended.

Input File Format

Input to the consensus process should be of the form:

 topic id rank score subscore start length runId

Fields should be delimited by tab characters, but any whitespace should work as well. The "subscore" field is optional. For example, the following lines, which omit the subscore, could be used as input:

 160 11724899 58 0.9785368 23692 50 icb1
 160 11724899 58 0.9785368 23712 10 icb1
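
To make the field layout concrete, here is a minimal sketch of how one input line could be split into its fields. It is not the parser used by PassageConsensusEvaluation: the class name PassageLine, its field types, and the end() helper (which assumes the end position is start plus length) are all hypothetical and chosen only for illustration.

```java
/** Hypothetical holder for one input line; not part of the actual tool. */
final class PassageLine {
    final String topic;
    final String id;
    final int rank;
    final double score;
    final Double subscore; // null when the optional field is omitted
    final long start;
    final long length;
    final String runId;

    private PassageLine(String topic, String id, int rank, double score,
                        Double subscore, long start, long length, String runId) {
        this.topic = topic;
        this.id = id;
        this.rank = rank;
        this.score = score;
        this.subscore = subscore;
        this.start = start;
        this.length = length;
        this.runId = runId;
    }

    /** Splits a line on any run of whitespace (tabs or spaces). */
    static PassageLine parse(String line) {
        String[] f = line.trim().split("\\s+");
        if (f.length != 7 && f.length != 8) {
            throw new IllegalArgumentException(
                "expected 7 or 8 fields, got " + f.length);
        }
        int i = 0;
        String topic = f[i++];
        String id = f[i++];
        int rank = Integer.parseInt(f[i++]);
        double score = Double.parseDouble(f[i++]);
        // Eight fields means the optional subscore is present.
        Double subscore = (f.length == 8) ? Double.valueOf(f[i++]) : null;
        long start = Long.parseLong(f[i++]);
        long length = Long.parseLong(f[i++]);
        String runId = f[i];
        return new PassageLine(topic, id, rank, score, subscore, start, length, runId);
    }

    /** Assumption: the passage ends at start + length. */
    long end() {
        return start + length;
    }

    public static void main(String[] args) {
        PassageLine p = parse("160\t11724899\t58\t0.9785368\t23692\t50\ticb1");
        System.out.println(p.id + " " + p.start + ".." + p.end() + " " + p.runId);
    }
}
```

Deciding by field count whether the subscore is present keeps the sketch short; it relies on no field containing embedded whitespace.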

Command Line Usage

The following are valid options for the Passage Consensus process:

 -c,--consensus <PassageConsensus classname>
 -h,--help
 -i,--input <Input File(s)> | -l,--list <Input list>
 -o,--output <Extension to append to output files> (default is ".consensus")

At a minimum, the passage consensus process requires the names of the input file(s) and the consensus algorithm (class) to use. Input files can be listed individually, or a list of files can be given in a text file, in the same format that the Rank Fusion process accepts.

For example, the following command will calculate the consensus of passages from the runs called eval-reranked-2006-1.txt, eval-reranked-2006-2.txt and eval-reranked-2006-3.txt using the union of applicable passages from all three runs.

 $ java -cp classes-eval:lib/commons-cli-1.0.jar:lib/commons-collections-3.2.jar:lib/commons-logging-1.1.jar:lib/commons-lang-2.3.jar:lib/commons-io-1.3.1.jar:lib/fastutil-5.0.9.jar evaluation.passage.PassageConsensusEvaluation --consensus evaluation.passage.UnionConsensus --input eval-reranked-2006-*.txt

Consensus Algorithms

The following algorithms are currently supported by the passage consensus process:

  • Union. The consensus range is the smallest range that covers all of the given ranges (their union). If the ranges are disjoint, the gap between them is included in the result.
  • Median. The consensus range is determined by taking the median start and median end values of the given ranges.
  • Mean. The consensus range is determined by taking the mean start and mean end values of the given ranges.
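
The three algorithms above can be sketched as follows. This is an illustration of the descriptions only, not the code of the actual consensus classes: the class name ConsensusSketch, the representation of a range as a {start, end} array, averaging the two middle values when the number of ranges is even, and rounding the mean to the nearest integer are all assumptions.

```java
import java.util.Arrays;

/** Hypothetical sketch of the three consensus rules over {start, end} ranges. */
final class ConsensusSketch {

    /** Union: smallest start to largest end, so gaps are included. */
    static long[] union(long[][] ranges) {
        requireNonEmpty(ranges);
        long start = Long.MAX_VALUE;
        long end = Long.MIN_VALUE;
        for (long[] r : ranges) {
            start = Math.min(start, r[0]);
            end = Math.max(end, r[1]);
        }
        return new long[] {start, end};
    }

    /** Median: median of the starts and median of the ends, taken separately. */
    static long[] median(long[][] ranges) {
        requireNonEmpty(ranges);
        return new long[] {medianOf(column(ranges, 0)), medianOf(column(ranges, 1))};
    }

    /** Mean: mean of the starts and mean of the ends, rounded to the nearest integer. */
    static long[] mean(long[][] ranges) {
        requireNonEmpty(ranges);
        return new long[] {meanOf(column(ranges, 0)), meanOf(column(ranges, 1))};
    }

    private static long[] column(long[][] ranges, int index) {
        long[] values = new long[ranges.length];
        for (int i = 0; i < ranges.length; i++) {
            values[i] = ranges[i][index];
        }
        return values;
    }

    private static long medianOf(long[] values) {
        long[] sorted = values.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        // Assumption: for an even count, average the two middle values.
        return (n % 2 == 1) ? sorted[n / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2;
    }

    private static long meanOf(long[] values) {
        long sum = 0;
        for (long v : values) {
            sum += v;
        }
        return Math.round((double) sum / values.length);
    }

    private static void requireNonEmpty(long[][] ranges) {
        if (ranges == null || ranges.length == 0) {
            throw new IllegalArgumentException("at least one range is required");
        }
    }

    public static void main(String[] args) {
        long[][] ranges = {{100, 200}, {150, 180}, {400, 500}};
        System.out.println("union  " + Arrays.toString(union(ranges)));
        System.out.println("median " + Arrays.toString(median(ranges)));
        System.out.println("mean   " + Arrays.toString(mean(ranges)));
    }
}
```

With the disjoint third range in the usage example, the union stretches across the gap, while the median stays close to the two overlapping ranges; the mean is pulled toward the outlier.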