Improving ESA with document similarity

    Research output: Chapter in Book or Conference Publication/Proceeding > Conference Publication > peer-review

    9 Citations (Scopus)

    Abstract

    Explicit semantic analysis (ESA) is a technique for computing semantic relatedness between natural language texts. It is a document-based distributional model similar to latent semantic analysis (LSA), and for general English usage it is often built on the Wikipedia database. Unlike LSA, however, ESA does not use dimensionality reduction, and therefore it is sometimes unable to account for similarity between words that do not co-occur with the same concepts, even if those concepts cover similar subjects. In the Wikipedia implementation, ESA concepts are Wikipedia articles, and the Wikilinks between the articles are used to overcome this concept-similarity problem. In this paper, we provide two general solutions for integrating concept-concept similarities into the ESA model; these solutions do not rely on a particular corpus structure and do not alter the explicit concept-mapping properties that distinguish ESA from models such as LSA and latent Dirichlet allocation (LDA).
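    To make the concept-similarity problem concrete, here is a minimal sketch of basic ESA plus one generic way of folding concept-concept similarities back in. It assumes a toy three-concept corpus in place of Wikipedia, raw TF-IDF weights, and cosine relatedness; all names (`CONCEPTS`, `build_index`, `esa_vector`, `concept_similarity`, `expand`) are hypothetical. The `expand` step, which multiplies an interpretation vector by a concept-concept similarity matrix computed from document cosine similarity, is an illustrative assumption and not necessarily either of the two methods proposed in the paper.

```python
import math
from collections import Counter

# Hypothetical toy "concepts" standing in for Wikipedia articles.
CONCEPTS = {
    "Feline": "cat kitten whiskers pet purr",
    "Canine": "dog puppy bark pet leash",
    "Automobile": "car engine wheel road drive",
}

def tokenize(text):
    return text.lower().split()

def build_index(concepts):
    """Inverted index {word: {concept: tf-idf weight}} mapping words to concepts."""
    n = len(concepts)
    tfs = {c: Counter(tokenize(t)) for c, t in concepts.items()}
    df = Counter(w for tf in tfs.values() for w in tf)  # document frequency
    index = {}
    for c, tf in tfs.items():
        for w, f in tf.items():
            index.setdefault(w, {})[c] = f * math.log(n / df[w])
    return index

def esa_vector(text, index, names):
    """Interpretation vector: sum of the concept vectors of the text's words."""
    pos = {c: i for i, c in enumerate(names)}
    vec = [0.0] * len(names)
    for w in tokenize(text):
        for c, weight in index.get(w, {}).items():
            vec[pos[c]] += weight
    return vec

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def concept_similarity(index, names):
    """Assumed concept-concept similarity: cosine between the concepts' word vectors."""
    vocab = sorted(index)
    docvecs = [[index[w].get(c, 0.0) for w in vocab] for c in names]
    return [[cosine(a, b) for b in docvecs] for a in docvecs]

def expand(vec, sim):
    """v'_i = sum_j S_ij * v_j: spread weight to similar concepts, same concept axes."""
    n = len(vec)
    return [sum(sim[i][j] * vec[j] for j in range(n)) for i in range(n)]

if __name__ == "__main__":
    names = list(CONCEPTS)
    idx = build_index(CONCEPTS)
    S = concept_similarity(idx, names)
    k, p = esa_vector("kitten", idx, names), esa_vector("puppy", idx, names)
    print("plain ESA:", cosine(k, p))
    print("expanded: ", cosine(expand(k, S), expand(p, S)))
```

    Here "kitten" and "puppy" never co-occur with the same concept, so plain ESA scores them 0. After expansion, the overlap between the Feline and Canine documents (the shared word "pet") gives a small positive score, while "car" stays unrelated to both. Because `expand` keeps the original concept axes, each dimension still corresponds to a named concept.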

    Original language: English
    Title of host publication: Advances in Information Retrieval - 35th European Conference on IR Research, ECIR 2013, Proceedings
    Pages: 582-593
    Number of pages: 12
    Publication status: Published - 2013
    Event: 35th European Conference on Information Retrieval, ECIR 2013 - Moscow, Russian Federation
    Duration: 24 Mar 2013 - 27 Mar 2013

    Publication series

    Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    Volume: 7814 LNCS
    ISSN (Print): 0302-9743
    ISSN (Electronic): 1611-3349

    Conference

    Conference: 35th European Conference on Information Retrieval, ECIR 2013
    Country/Territory: Russian Federation
    City: Moscow
    Period: 24/03/13 - 27/03/13
