ABSTRACT
A limitation of standard information retrieval (IR) models is that the notion of term compositionality is restricted to pre-defined phrases and term proximity. Standard text-based IR models provide no easy way of representing semantic relations between terms that are not necessarily phrases, such as the equivalence relationship between 'osteoporosis' and the terms 'bone' and 'decay'. To alleviate this limitation, we introduce a relevance feedback (RF) method that makes use of word embedding vectors. We leverage the fact that the vector addition of word embeddings leads to a semantic composition of the corresponding terms; e.g., adding the vectors for 'bone' and 'decay' yields a vector that is likely to be close to the vector for the word 'osteoporosis'. Our proposed RF model enables incorporation of semantic relations by exploiting term compositionality with embedded word vectors. We develop our model for RF as a generalization of the relevance model (RLM). Our experiments demonstrate that our word embedding based RF model significantly outperforms the RLM on standard TREC test collections, namely the TREC 6, 7, 8 and Robust ad-hoc collections and the TREC 9 and 10 WT10G test collections.
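The composition idea above can be illustrated with a small sketch. It assumes hand-picked 3-dimensional toy vectors; the vocabulary, vector values, bandwidth `h` and helper names such as `kde_score` are all hypothetical. The sketch adds the vectors for 'bone' and 'decay', finds the nearest vocabulary term by cosine similarity, and then scores candidate expansion terms with an unnormalised Gaussian kernel density centred on the query-term vectors and their pairwise sums. It is a generic illustration of kernel density estimation over an embedding space, not the estimator defined in the paper.

```python
import math
from itertools import combinations

# Hypothetical 3-d toy embeddings, hand-picked for illustration only.
# Real word2vec vectors are learned from a corpus and have far more dimensions.
EMB = {
    "bone":         [0.9, 0.1, 0.0],
    "decay":        [0.1, 0.9, 0.0],
    "osteoporosis": [0.7, 0.7, 0.1],
    "calcium":      [0.8, 0.0, 0.3],
    "river":        [0.0, 0.1, 0.9],
}


def add(u, v):
    """Element-wise vector addition, used here as the composition operator."""
    return [a + b for a, b in zip(u, v)]


def unit(v):
    """Scale v to unit Euclidean length."""
    n = math.sqrt(sum(a * a for a in v))
    return [a / n for a in v]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    return sum(a * b for a, b in zip(unit(u), unit(v)))


def nearest(vec, exclude=()):
    """Vocabulary term whose embedding is most cosine-similar to vec."""
    candidates = [w for w in EMB if w not in exclude]
    return max(candidates, key=lambda w: cosine(EMB[w], vec))


def kde_score(term, query, h=0.5):
    """Unnormalised Gaussian KDE evaluated at a term's unit vector.

    Kernel centres (an illustrative assumption): each query-term vector plus
    the sum of every pair of query-term vectors, all rescaled to unit length.
    The constant normalising factor of the Gaussian kernel is dropped.
    """
    centres = [EMB[q] for q in query]
    centres += [add(EMB[a], EMB[b]) for a, b in combinations(query, 2)]
    centres = [unit(c) for c in centres]
    x = unit(EMB[term])
    total = 0.0
    for c in centres:
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, c))
        total += math.exp(-sq_dist / (2 * h * h))
    return total / len(centres)


if __name__ == "__main__":
    query = ["bone", "decay"]
    composed = add(EMB["bone"], EMB["decay"])
    print(nearest(composed, exclude=query))  # osteoporosis
    ranked = sorted((w for w in EMB if w not in query),
                    key=lambda w: kde_score(w, query), reverse=True)
    print(ranked)  # ['osteoporosis', 'calcium', 'river']
```

Rescaling the summed vectors to unit length keeps the squared Euclidean distance in the kernel equal to 2 minus twice the cosine similarity; without it, a composed vector such as 'bone' + 'decay' would be longer than any single-word vector and would sit far from all of them in Euclidean terms.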
Index Terms
- Word Vector Compositionality based Relevance Feedback using Kernel Density Estimation
Recommendations
Interactive content-based image retrieval using relevance feedback
Database search engines are generally used in a one-shot fashion in which a user provides query information to the system and, in return, the system provides a number of database instances to the user. A relevance feedback system allows the user to ...
Image retrieval based on indexing and relevance feedback
In content based image retrieval (CBIR) system, search engine retrieves the images similar to the query image according to a similarity measure. It should be fast enough and must have a high precision of retrieval. Indexing scheme is used to achieve a ...
A novel log-based relevance feedback technique in content-based image retrieval
MULTIMEDIA '04: Proceedings of the 12th annual ACM international conference on Multimedia
Relevance feedback has been proposed as an important technique to boost the retrieval performance in content-based image retrieval (CBIR). However, since there exists a semantic gap between low-level features and high-level semantic concepts in CBIR, ...