Examining the information retrieval process from an inductive perspective

Ronan Cummins, Mounia Lalmas, Colm O'Riordan

Research output: Chapter in Book or Conference Publication/Proceeding > Conference Publication > peer-review

1 Citation (Scopus)

Abstract

Term-weighting functions derived from various models of retrieval aim to model human notions of relevance more accurately. However, there is a lack of analysis of the sources of evidence from which important features of these term-weighting schemes originate. In general, features pertaining to these term-weighting schemes can be collected from (1) the document, (2) the entire collection and (3) the query. In this work, we perform an empirical analysis to determine the increase in effectiveness as information from these three different sources becomes more accurate. First, we determine the number of documents that must be indexed to estimate collection-wide features accurately enough to obtain near-optimal effectiveness for a range of term-weighting functions. Similarly, we determine the amount of a document and query that must be sampled to achieve near-peak effectiveness. This analysis also allows us to determine the factors that contribute most to the performance of a term-weighting function (i.e. the document, the collection or the query). We use our framework to construct a new model of weighting where we discard the 'bag of words' model and aim to retrieve documents based on the initial physical representation of a document using some basic axioms of retrieval. We show that this is a good first step towards incorporating some more interesting features into a term-weighting function.
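The three sources of evidence named in the abstract can be illustrated with a small Python sketch. This is not the authors' method or any of the term-weighting functions they evaluate: it assumes a plain TF-IDF style score, tokenised documents given as lists of strings, and a smoothed IDF chosen only for illustration. The names `idf_from_sample` and `score` are hypothetical. The point is simply to show where document, collection and query evidence enter, and how the collection-wide part could be estimated from only a sample of the documents.

```python
import math
import random
from collections import Counter


def idf_from_sample(docs, sample_size, seed=0):
    """Estimate a smoothed IDF per term from a random sample of the collection.

    Illustrative assumption: idf(t) = log((n + 1) / (df(t) + 0.5)),
    where n is the number of sampled documents.
    """
    rng = random.Random(seed)
    sample = rng.sample(docs, min(sample_size, len(docs)))
    df = Counter()
    for doc in sample:
        df.update(set(doc))  # count each term at most once per document
    n = len(sample)
    idf = {term: math.log((n + 1) / (count + 0.5)) for term, count in df.items()}
    return idf, n


def score(query, doc, idf, n):
    """Combine the three sources of evidence in a TF-IDF style sum."""
    tf_doc = Counter(doc)      # (1) evidence from the document
    tf_query = Counter(query)  # (3) evidence from the query
    unseen_idf = math.log((n + 1) / 0.5)  # term absent from the sample
    return sum(
        qtf * tf_doc[term] * idf.get(term, unseen_idf)  # (2) collection evidence
        for term, qtf in tf_query.items()
    )


if __name__ == "__main__":
    collection = [["a", "b"], ["a", "c"], ["a", "d"]]
    idf, n = idf_from_sample(collection, sample_size=2)
    print(score(["b"], collection[0], idf, n))
```

Varying `sample_size` and comparing the resulting rankings against those obtained from the full collection gives a rough feel for the kind of question the abstract poses, although the paper's actual experimental setup is not described on this page.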

Original language: English
Title of host publication: CIKM'10 - Proceedings of the 19th International Conference on Information and Knowledge Management and Co-located Workshops
Pages: 89-98
Number of pages: 10
Publication status: Published - 2010
Event: 19th International Conference on Information and Knowledge Management and Co-located Workshops, CIKM'10 - Toronto, ON, Canada
Duration: 26 Oct 2010 to 30 Oct 2010

Publication series

Name: International Conference on Information and Knowledge Management, Proceedings

Conference

Conference: 19th International Conference on Information and Knowledge Management and Co-located Workshops, CIKM'10
Country/Territory: Canada
City: Toronto, ON
Period: 26/10/10 to 30/10/10

Keywords

  • Information retrieval
  • Models
  • Term-weighting
