research-article

Word Vector Compositionality based Relevance Feedback using Kernel Density Estimation

Authors:
Dwaipayan Roy (Indian Statistical Institute, Kolkata, India),
Debasis Ganguly (Dublin City University, Dublin, Ireland),
Mandar Mitra (Indian Statistical Institute, Kolkata, India),
Gareth J.F. Jones (Dublin City University, Dublin, Ireland)

CIKM '16: Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, October 2016, Pages 1281–1290, https://doi.org/10.1145/2983323.2983750

Published: 24 October 2016

ABSTRACT

A limitation of standard information retrieval (IR) models is that the notion of term compositionality is restricted to pre-defined phrases and term proximity. Standard text-based IR models provide no easy way of representing semantic relations between terms that are not necessarily phrases, such as the equivalence relationship between 'osteoporosis' and the terms 'bone' and 'decay'. To alleviate this limitation, we introduce a relevance feedback (RF) method that makes use of word embedding vectors. We leverage the fact that vector addition of word embeddings yields a semantic composition of the corresponding terms; for example, adding the vectors for 'bone' and 'decay' yields a vector that is likely to be close to the vector for 'osteoporosis'. Our proposed RF model incorporates semantic relations by exploiting term compositionality with embedded word vectors. We develop our RF model as a generalization of the relevance model (RLM). Our experiments demonstrate that our word-embedding-based RF model significantly outperforms the RLM on standard TREC test collections, namely the TREC 6, 7, 8 and Robust ad-hoc collections and the TREC 9 and 10 WT10G collections.
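The composition idea in the abstract can be illustrated with a small sketch. It is not the paper's implementation: the 3-dimensional vectors in `EMB` are invented toy values (real embeddings are learned from text and have many more dimensions), and `kde_score` is a generic Gaussian-kernel average that only hints at how kernel density estimation could weight a candidate term by its closeness to query-term vectors and their sum. The abstract does not specify the estimator, so the names, the choice of pivots, and the bandwidth `sigma` are all assumptions.

```python
import math

# Toy embeddings, invented for illustration only (assumption, not real data).
EMB = {
    "bone":         [1.0, 0.0, 0.1],
    "decay":        [0.0, 1.0, 0.1],
    "osteoporosis": [0.9, 0.9, 0.2],
    "guitar":       [0.0, 0.1, 1.0],
}

def add(u, v):
    """Element-wise vector addition: the composition operator."""
    return [a + b for a, b in zip(u, v)]

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def gaussian_kernel(u, v, sigma=1.0):
    """Gaussian kernel on the Euclidean distance between u and v."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def kde_score(word, pivots, sigma=1.0):
    """Average kernel value between a candidate word and the pivot vectors.

    Hypothetical helper: a plain kernel density estimate evaluated at the
    word's vector, with the pivots acting as the sample points.
    """
    vec = EMB[word]
    return sum(gaussian_kernel(vec, p, sigma) for p in pivots) / len(pivots)

# Compose 'bone' + 'decay' and compare the result with candidate terms.
composed = add(EMB["bone"], EMB["decay"])
sim_osteo = cosine(composed, EMB["osteoporosis"])
sim_guitar = cosine(composed, EMB["guitar"])

# Pivots: the individual query terms plus their composition (assumption).
pivots = [EMB["bone"], EMB["decay"], composed]
score_osteo = kde_score("osteoporosis", pivots)
score_guitar = kde_score("guitar", pivots)

print(f"cosine(bone+decay, osteoporosis) = {sim_osteo:.3f}")
print(f"cosine(bone+decay, guitar)       = {sim_guitar:.3f}")
print(f"kde_score(osteoporosis)          = {score_osteo:.3f}")
print(f"kde_score(guitar)                = {score_guitar:.3f}")
```

With these toy values the composed vector lies much closer to 'osteoporosis' than to the unrelated term, and the kernel-based score ranks the two candidates the same way. With real embeddings the effect is statistical rather than guaranteed, which is why the abstract says the sum is "likely to be close".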

Index Terms

1. Information systems
  1. Information retrieval
    1. Retrieval models and ranking
      1. Language models

Published in
CIKM '16: Proceedings of the 25th ACM International on Conference on Information and Knowledge Management
October 2016
2566 pages
ISBN: 9781450340731
DOI: 10.1145/2983323
General Chairs:
Snehasis Mukhopadhyay (Indiana University Purdue University Indianapolis, USA),
ChengXiang Zhai (University of Illinois at Urbana-Champaign, USA)
Program Chairs:
Elisa Bertino (Purdue University),
Fabio Crestani (University of Lugano),
Javed Mostafa (University of North Carolina),
Jie Tang (Tsinghua University),
Luo Si (Alibaba Group Inc & Purdue University),
Xiaofang Zhou (University of Queensland),
Yi Chang (Yahoo Research),
Yunyao Li (IBM Research - Almaden),
Parikshit Sondhi (WalmartLabs)
Copyright © 2016 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
kernel density estimation
relevance feedback
word compositionality
word vector embedding
Qualifiers
- research-article
Acceptance Rates
CIKM '16 Paper Acceptance Rate: 160 of 701 submissions, 23%
Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%