Abstract
Evolutionary algorithms and, in particular, Genetic Programming (GP) are increasingly being applied to the problem of evolving term-weighting schemes in Information Retrieval (IR). One fundamental problem with the solutions generated by these stochastic processes is that they are often difficult to analyse. A number of questions regarding these evolved term-weighting schemes remain unanswered. One interesting question is whether different runs of the GP process bring us to similar points in the solution space. This paper deals with determining a number of measures of the distance between the ranked lists (phenotype) returned by different term-weighting schemes. Using these distance measures, we develop trees that show the phenotypic distance between these term-weighting schemes. This framework gives us a representation of where these evolved solutions lie in the solution space. Finally, we evolve several global term-weighting schemes and show that this framework is indeed useful for determining the relative closeness of these schemes and for determining the expected performance on general test data.
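
As a rough illustration of what a distance between ranked lists can look like, the sketch below computes a normalised Kendall tau distance between the document rankings returned by two schemes. The abstract does not name the measures used in the paper, so the choice of Kendall tau, the function name `kendall_tau_distance`, and the document identifiers are all assumptions made for this example only.

```python
from itertools import combinations


def kendall_tau_distance(ranking_a, ranking_b):
    """Fraction of document pairs that the two rankings order differently.

    Assumption for illustration: both rankings contain the same documents,
    each exactly once. Returns 0.0 for identical rankings and 1.0 for
    exactly reversed rankings.
    """
    if len(set(ranking_a)) != len(ranking_a) or len(set(ranking_b)) != len(ranking_b):
        raise ValueError("rankings must not contain duplicate documents")
    if set(ranking_a) != set(ranking_b):
        raise ValueError("rankings must contain the same documents")

    # Position of every document in the second ranking.
    position_b = {doc: i for i, doc in enumerate(ranking_b)}

    # Each pair (x, y) has x ranked before y in ranking_a.
    pairs = list(combinations(ranking_a, 2))
    if not pairs:
        return 0.0

    # A pair is discordant when ranking_b places x after y.
    discordant = sum(1 for x, y in pairs if position_b[x] > position_b[y])
    return discordant / len(pairs)


# Hypothetical rankings returned by three term-weighting schemes for one query.
scheme_1 = ["d1", "d2", "d3", "d4"]
scheme_2 = ["d1", "d3", "d2", "d4"]
scheme_3 = ["d4", "d3", "d2", "d1"]

print(kendall_tau_distance(scheme_1, scheme_2))
print(kendall_tau_distance(scheme_1, scheme_3))
```

Pairwise distances of this kind, averaged over a set of queries, could populate a distance matrix from which a tree of schemes is built. How the paper itself constructs its trees is not stated in the abstract.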
| Original language | English |
|---|---|
| Pages (from-to) | 6-11 |
| Number of pages | 6 |
| Journal | CEUR Workshop Proceedings |
| Publication status | Published - 2006 |
| Event | ECAI 2006 3rd International Workshop on Text-Based Information Retrieval, TIR 2006 - Riva del Garda, Italy. Duration: 29 Aug 2006 → 29 Aug 2006 |