Abstract
Keywords: Personalized Search; Personalized Web Search; Web Search; Re-ranking; User Modeling; Improving Web Search; User Profiling; Improving Ranking; Query Log Analysis; Simulated Feedback
Web search faces three main problems. First, there is too much information available on the web. Second, the query words chosen by users are often ambiguous (the query "Java" can mean the island of Java in Indonesia or the Java programming language) and are sometimes poor descriptors of the information need ("SBH" can mean "State Bank of Hyderabad" or "Syracuse Behavioral Healthcare", among others). Third, users are often not patient enough to go through the long list of results returned by a search engine to find relevant information. It has been observed that users typically view only the top few results, usually the top 5 or 10, sometimes 20, and much less often 30 or more. In this scenario, web search can be made more useful, more effective and less burdensome by inferring what is relevant to the current user for a given query, taking the individual user's interests into account, and placing those results at the top so that the user does not have to scroll down a long list.
Personalized search aims to customize search results for each individual user, taking that user's idiosyncrasies into account, so that the documents most relevant to the user appear at the top. This is likely to satisfy users and help them find relevant information easily and quickly.
The major challenges for personalized search are twofold. The first is modeling an appropriate user context and learning a user model from it. The second is using the user model to improve search accuracy. Another important challenge is the evaluation of experiments. There are no standard benchmark datasets on which experiments can be performed, which makes it difficult to compare against earlier work in the literature and to replicate its results. There are also no standard metrics for evaluating personalized search algorithms effectively; metrics commonly used to evaluate Information Retrieval systems are usually used instead.
We propose three approaches for personalized web search. Our first approach is based on Statistical Language Modeling techniques. Despite recent progress in language modeling and IR, there has not been much work applying language modeling techniques to personalized web search. In this approach, we learn a user model by capturing statistical properties of the text from the user's past searches. We explore different contexts: single words and pairs of adjacent words (Simple N-Gram based method), and the relationship between query words and document words (Noisy Channel model based method). Our second approach is based on Machine Learning algorithms, which offer a promising framework for learning user profiles. We use ranking SVM, a variation of the classification support vector machine, to learn the user model. These two approaches belong to the class of approaches that exploit user feedback data, either explicit or implicit. In the third approach, we attempt to model a user based only on the past queries posed by that user. The basic idea is to see whether the previous queries alone contain enough information, without having to use the user's clickthrough data. We employ simple approaches based on language modeling to learn the user model from the user's previous queries.
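
As an illustration of the general idea behind a language-model user profile, the following is a minimal sketch only, not the method evaluated in this work. It assumes a unigram model with additive smoothing; the function names (build_user_model, avg_log_prob, rerank), the smoothing constant alpha, the vocabulary size and the toy data are all hypothetical choices made for the example.

```python
import math
from collections import Counter

def build_user_model(past_texts):
    """Count unigrams over text taken from a user's past searches."""
    counts = Counter()
    for text in past_texts:
        counts.update(text.lower().split())
    return counts

def avg_log_prob(text, counts, vocab_size, alpha=1.0):
    """Average per-token log-probability of text under the smoothed user model.

    alpha and vocab_size are assumed values for additive smoothing.
    """
    tokens = text.lower().split()
    if not tokens:
        return float("-inf")
    total = sum(counts.values())
    denom = total + alpha * vocab_size
    return sum(math.log((counts[t] + alpha) / denom) for t in tokens) / len(tokens)

def rerank(results, counts, vocab_size):
    """Order result snippets so those most likely under the user model come first."""
    return sorted(results,
                  key=lambda r: avg_log_prob(r, counts, vocab_size),
                  reverse=True)

# Toy data (hypothetical): a user whose past searches are about programming.
past = ["java programming language tutorial",
        "java compiler error",
        "python programming"]
results = ["java island in indonesia",
           "java programming language guide"]

model = build_user_model(past)
ranked = rerank(results, model, vocab_size=20)
print(ranked)
```

For the ambiguous query "Java", a profile built from programming-related searches moves the programming-language result above the island result in this toy setting. A bigram variant would count pairs of adjacent words in the same way.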