Easy Tasks Dominate Information Retrieval Evaluation Results
The evaluation of information retrieval systems involves the creation of topics, potential user needs for which systems try to find relevant documents. The difficulty of these topics differs greatly, and final system scores are typically based on the mean over all topics. However, topics that are relatively easy to solve have a much larger impact on the final system ranking than hard topics. This paper presents a novel methodology to measure that effect. The results of a large evaluation experiment with 100 topics from the Cross Language Evaluation Forum (CLEF) allow the topics to be split into four groups according to difficulty. The easy topics have a larger impact, especially for multilingual retrieval. Nevertheless, the internal test reliability, as measured by Cronbach's alpha, is higher for more difficult topics. We show how alternative, robust measures such as the geometric average distribute the effect of the topics more evenly.
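The contrast between the arithmetic and the geometric average can be illustrated with a minimal sketch. The per-topic scores below are invented for illustration and are not taken from the CLEF experiment; the function names and the small `eps` floor (used to avoid taking the logarithm of zero) are likewise assumptions, not the paper's implementation.

```python
import math


def arithmetic_mean(scores):
    """Arithmetic mean of per-topic scores."""
    return sum(scores) / len(scores)


def geometric_mean(scores, eps=1e-5):
    """Geometric mean of per-topic scores.

    Scores below eps are raised to eps so that log() is defined;
    the value of eps is an assumption made for this sketch.
    """
    log_sum = sum(math.log(max(s, eps)) for s in scores)
    return math.exp(log_sum / len(scores))


# Hypothetical per-topic average precision: two easy topics, then two hard ones.
system_a = [0.90, 0.80, 0.02, 0.02]  # strong on easy topics, fails on hard ones
system_b = [0.70, 0.60, 0.15, 0.15]  # weaker on easy topics, better on hard ones

mean_a, mean_b = arithmetic_mean(system_a), arithmetic_mean(system_b)
gmean_a, gmean_b = geometric_mean(system_a), geometric_mean(system_b)

# The arithmetic mean is driven by the large scores on easy topics,
# while the geometric mean penalises near-zero scores on hard topics.
print("arithmetic mean ranks A first:", mean_a > mean_b)
print("geometric mean ranks B first:", gmean_b > gmean_a)
```

In this toy case the two averages rank the systems in opposite order, which is the kind of effect the abstract attributes to easy topics dominating the mean.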