Abstract
In the multilingual World Wide Web, it is critical for Web applications, such as multilingual search engines and targeted international advertisements, to know what languages the user understands. However, online users are often unwilling to make the effort to explicitly provide this information. Additionally, language identification techniques struggle when a user does not use all the languages they know to directly interact with the applications. This work proposes a method of inferring the language(s) online users comprehend by analyzing their social profiles. It is mainly based on the intuition that a user’s experiences could imply what languages they know. This is nontrivial, however, as social profiles are usually incomplete, and the languages that are regionally related or similar in vocabulary may share common features; this makes the signals that help to infer language scarce and noisy. This work proposes a language and social relation-based factor graph model to address this problem. To overcome these challenges, it explores external resources to bring in more evidential signals, and exploits the dependency relations between languages as well as social relations between profiles in modeling the problem. Experiments in this work are conducted on a large-scale dataset. The results demonstrate the success of our proposed approach in language inference and show that the proposed framework outperforms several alternative methods.
Notes
- 1. In this paper, comprehend means that the user is able to grasp information in that language to a good extent.
Acknowledgements
This research is supported by Science Foundation Ireland through the CNGL Programme (Grant 12/CE/I2267) in the ADAPT Centre (www.adaptcentre.ie) at Trinity College Dublin. The work is also supported by the National Natural Science Foundation of China under Project No. 61300129, and by a project sponsored by the Scientific Research Foundation for the Returned Overseas Chinese Scholars, State Education Ministry, China, under grant number [2013] 1792.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Xu, Y., Rami Ghorab, M., Wang, Z., Zhou, D., Lawless, S. (2016). Do Your Social Profiles Reveal What Languages You Speak? Language Inference from Social Media Profiles. In: Ferro, N., et al. Advances in Information Retrieval. ECIR 2016. Lecture Notes in Computer Science, vol 9626. Springer, Cham. https://doi.org/10.1007/978-3-319-30671-1_41
DOI: https://doi.org/10.1007/978-3-319-30671-1_41
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-30670-4
Online ISBN: 978-3-319-30671-1
eBook Packages: Computer Science (R0)