TY - GEN
T1 - Improving ESA with document similarity
AU - Polajnar, Tamara
AU - Aggarwal, Nitish
AU - Asooja, Kartik
AU - Buitelaar, Paul
PY - 2013
Y1 - 2013
N2 - Explicit semantic analysis (ESA) is a technique for computing semantic relatedness between natural language texts. It is a document-based distributional model, similar to latent semantic analysis (LSA), that is often built on the Wikipedia database when it is required for general English usage. Unlike LSA, however, ESA does not use dimensionality reduction, and therefore it is sometimes unable to account for similarity between words that do not co-occur with the same concepts, even if their concepts themselves cover similar subjects. In the Wikipedia implementation, ESA concepts are Wikipedia articles, and the Wikilinks between the articles are used to overcome the concept-similarity problem. In this paper, we provide two general solutions for integration of concept-concept similarities into the ESA model, ones that do not rely on a particular corpus structure and do not alter the explicit concept-mapping properties that distinguish ESA from models like LSA and latent Dirichlet allocation (LDA).
AB - Explicit semantic analysis (ESA) is a technique for computing semantic relatedness between natural language texts. It is a document-based distributional model, similar to latent semantic analysis (LSA), that is often built on the Wikipedia database when it is required for general English usage. Unlike LSA, however, ESA does not use dimensionality reduction, and therefore it is sometimes unable to account for similarity between words that do not co-occur with the same concepts, even if their concepts themselves cover similar subjects. In the Wikipedia implementation, ESA concepts are Wikipedia articles, and the Wikilinks between the articles are used to overcome the concept-similarity problem. In this paper, we provide two general solutions for integration of concept-concept similarities into the ESA model, ones that do not rely on a particular corpus structure and do not alter the explicit concept-mapping properties that distinguish ESA from models like LSA and latent Dirichlet allocation (LDA).
UR - https://www.scopus.com/pages/publications/84875431300
U2 - 10.1007/978-3-642-36973-5_49
DO - 10.1007/978-3-642-36973-5_49
M3 - Conference Publication
AN - SCOPUS:84875431300
SN - 9783642369728
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 582
EP - 593
BT - Advances in Information Retrieval - 35th European Conference on IR Research, ECIR 2013, Proceedings
T2 - 35th European Conference on Information Retrieval, ECIR 2013
Y2 - 24 March 2013 through 27 March 2013
ER -