Search Personalization Using Machine Learning

Hema Yoganarasimhan, 2017, 17-117

As the Internet has grown, the amount of information and the number of products available on individual websites have increased exponentially. While large assortments give consumers an abundance of choice, they also make it hard for consumers to locate the exact product or information they are looking for. Therefore, most businesses have adopted query-based search models that help consumers locate the product or information that best fits their needs. However, search is costly in both time and effort. Long or unsuccessful searches can have negative consequences for a firm, since consumers may leave its website without clicking or making purchases. Firms have long grappled with the question of how to improve the search process.

This report considers the problem of optimally ranking the results shown in response to a query by personalizing that ranking for each user. Author Hema Yoganarasimhan proposes a scalable framework to implement personalized search and evaluate the returns to personalization.

She presents a machine learning framework for search personalization that employs a three-pronged approach – (a) Feature generation, (b) NDCG-based LambdaMART algorithm, and (c) Feature selection using the Wrapper method. She applies the framework to a large-scale dataset from Yandex, the fourth-largest search engine in the world, with over 60% market share in Russia and Eastern Europe. She deploys the algorithm on Amazon EC2 servers for speed and efficiency.
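To make the ranking objective concrete, the following is a minimal sketch of normalized discounted cumulative gain (NDCG), the metric that the LambdaMART step optimizes. It uses one common formulation (exponential gain, logarithmic position discount); the report summary does not state which variant is used, so treat this choice, the function names, and the relevance grades below as illustrative assumptions rather than the author's implementation.

```python
import math


def dcg(relevances, k=None):
    """Discounted cumulative gain of a ranked list of relevance grades.

    Assumes the common form sum((2^rel - 1) / log2(position + 1)),
    with positions starting at 1. This variant is an assumption.
    """
    if k is not None:
        relevances = relevances[:k]
    return sum(
        (2 ** rel - 1) / math.log2(pos + 2)  # pos is 0-based, hence +2
        for pos, rel in enumerate(relevances)
    )


def ndcg(relevances, k=None):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical relevance grades for the same five results under two orderings.
baseline = [0, 1, 0, 2, 0]      # most relevant result shown in position 4
personalized = [2, 1, 0, 0, 0]  # most relevant result moved to position 1

print(f"baseline NDCG:     {ndcg(baseline):.3f}")
print(f"personalized NDCG: {ndcg(personalized):.3f}")
```

In this toy case the personalized ordering scores higher because relevant results sit nearer the top, which is the property a ranking algorithm trained on NDCG is rewarded for.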


Personalization improves clicks to the top position by 3.5% and reduces the average error in rank of a click (AERC) by 9.43% over the baseline. Moreover, personalization based on short-term history or within-session behavior is shown to be less valuable than long-term or across-session personalization.

In addition, there is significant heterogeneity in returns to personalization as a function of user history and query type. First, the quality of personalized results increases monotonically with the length of a user’s history. Given that storing long personal histories has privacy and storage costs, firms can use the estimates from this study to decide the optimal length of history to store.

Second, queries can be classified based on user intent as navigational (go to a website), informational (find information on a topic), and transactional (do something). Using a data-driven classification of queries based on this “do-know-go” system, the author shows that transactional and informational queries benefit significantly more from personalization than navigational queries do. Third, queries benefit differentially based on their past performance, with poorly performing queries benefiting much more from personalization.

Finally, the author demonstrates the scalability of the proposed framework and derives the set of optimal features that maximizes accuracy while minimizing computing time.
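The trade-off between accuracy and computing time can be illustrated with a generic wrapper-style feature selection loop. The sketch below uses greedy forward selection, which is one standard wrapper variant and not necessarily the procedure used in the report. The `evaluate` callback stands in for "train the ranker on this feature subset and return a validation score"; the feature names and gain values are hypothetical.

```python
def forward_wrapper_selection(features, evaluate, min_gain=1e-3):
    """Greedy forward selection: repeatedly add the feature that most
    improves the score, and stop when the best gain falls below min_gain.

    `evaluate` takes a frozenset of feature names and returns a score
    (higher is better). In practice it would retrain and validate a model.
    """
    selected = []
    remaining = list(features)
    best_score = evaluate(frozenset())
    while remaining:
        # Score every candidate subset formed by adding one more feature.
        scored = [(evaluate(frozenset(selected + [f])), f) for f in remaining]
        score, feature = max(scored)
        if score - best_score < min_gain:
            break  # extra features no longer justify their computing cost
        selected.append(feature)
        remaining.remove(feature)
        best_score = score
    return selected, best_score


# Hypothetical additive gains, used only to make the example runnable.
TOY_GAINS = {
    "user_history": 0.05,
    "query_click_rate": 0.03,
    "session_clicks": 0.0005,
}


def toy_evaluate(subset):
    """Stand-in for model training: base score plus per-feature gains."""
    return 0.70 + sum(TOY_GAINS[f] for f in subset)


selected, score = forward_wrapper_selection(TOY_GAINS, toy_evaluate)
print(selected, round(score, 4))
```

Because each candidate subset requires a full model evaluation, the stopping threshold `min_gain` is where the accuracy versus computing time trade-off is expressed in this sketch.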

Hema Yoganarasimhan is Associate Professor, Foster School of Business, University of Washington, Seattle.

This research was supported by funding from the Marketing Science Institute. I am grateful to Yandex for providing the data used in this paper. I also thank Preyas Desai, Daria Dzyabura, Brett Gordon, Lan Luo, and Olivier Toubia for detailed comments on the paper. Thanks are also due, for their constructive feedback, to the participants of the following conferences: UT Dallas FORMS 2014, Marketing Science 2014, Big Data and Marketing Analytics 2014, and Stanford Digital Marketing 2016; and to the participants of the following seminars: University of Washington Marketing Camp 2015, Harvard Business School Marketing Seminar 2017, and Duke Marketing Seminar 2017.




