Naga Chokkanathan

Recommender Systems

Posted on: August 11, 2011

Yesterday, I attended a session organized by Yahoo! R & D, where Mr. Deepak Agarwal delivered a very nice talk on "Recommender Systems: Art and science of matching items to users". It had many interesting insights that will be useful to solution designers, business analysts, programmers and anyone else who wants to know how 'personalization' works on the web. I am posting my raw notes from the session here; please ignore any spelling or grammar errors 🙂

– Examples: Recommend ads on pages (Google AdWords), recommend news items on portals, recommend related items to buy (Amazon), recommend movies based on ratings etc. (Netflix), recommend people (Facebook), recommend deals (eBay)

– In most cases, the screen real estate available is limited, but the choices are many

– We use these recommender systems on a daily basis, without realizing it

– Goal of recommender systems: Serve the right items to users in an automated fashion, to optimize long-term business objectives

– Components:
    1. User
    2. Context (page, previous item viewed etc.)
    3. Item inventory (articles, web pages, ads etc.)

    Challenge: Construct an automated algorithm to select item(s) to show

    – Get feedback (click, time spent, rating, buy etc.)
    – Refine parameters of the algorithm
    – Repeat (a large number of times)
    – Optimize metric(s) of interest (total clicks, total revenue etc.)

    Low marginal cost per serve
    Efficient and intelligent systems can provide significant improvements

– Data mining: Clever algorithms

    * So much data: can we process it all, and process it fast enough?
    * Ideally, we want to learn every user-item interaction (# of things to learn increases with data size)
    * Dynamic nature
    * Learn quickly, because there is very little reaction time
    * Balance between heavy users (fewer in number – 20:80 rule) and not-so-frequent users (large in number)

– Simple approach: Segment users / items and serve the most popular items or item segments to user segments

Example: Yahoo! front page

But it didn’t work, because people who have already read an article won’t click on it again. That was dragging down the overall rating for articles, so the rating was no longer realistic

– Clues to solve the mystery

Other sources of bias? How to adjust for them?

– Simple idea to remove bias – display articles at random to a small randomly chosen population

    – Call this the "Random Bucket"
    – Randomization removes bias in data (Charles Peirce, 1877; R. A. Fisher, 1935)

– Methodology: Select a random set of users, show them a random set of articles and see how many click on each. Because the users and articles were chosen randomly, there won’t be any bias

– The random bucket ensures a continuous flow of data for all articles, so we quickly discard bad articles and converge to the best one
    – Click lift 40% initially
    – After 3 years, 200+% (due to continuous algorithm improvements, and content improvements due to that)
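To make the random bucket idea concrete, here is a minimal simulation sketch. It is not Yahoo!'s actual system: the function name, the 5% bucket size and the click probabilities are all made up for illustration. A small random share of visitors is shown a uniformly random article, and the click rate inside that bucket gives an estimate of each article's CTR that does not depend on who the algorithm chose to show it to.

```python
import random

def simulate_random_bucket(true_ctr, n_users, bucket_fraction=0.05, seed=0):
    """Serve uniformly random articles to a small random slice of users.

    true_ctr: dict of article -> click probability (hypothetical values).
    Returns the observed CTR per article inside the random bucket.
    """
    rng = random.Random(seed)
    articles = list(true_ctr)
    views = {a: 0 for a in articles}
    clicks = {a: 0 for a in articles}
    for _ in range(n_users):
        # Only a small, randomly chosen share of traffic enters the bucket.
        if rng.random() >= bucket_fraction:
            continue
        # The article is picked at random, independent of the user.
        article = rng.choice(articles)
        views[article] += 1
        if rng.random() < true_ctr[article]:
            clicks[article] += 1
    return {a: clicks[a] / views[a] if views[a] else 0.0 for a in articles}

true_ctr = {"A": 0.02, "B": 0.05, "C": 0.10}  # made-up click probabilities
estimates = simulate_random_bucket(true_ctr, n_users=200_000)
best = max(estimates, key=estimates.get)
print(estimates, "best:", best)
```

The rest of the traffic (95% in this sketch) can then be served whatever the bucket currently ranks highest.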

– Typical media: Editor decides which article goes where

– Internet media: Users decide

– Lessons Learnt:

    – It is okay to start with simple models that learn a few things, but beware of the biases inherent in your data
    – Randomization is a friend, use it when you can. Update the models fast, this may reduce the bias
    – What if we can’t afford complete randomization? Learn how to gamble – Gamblers do adaptive randomization

– Why learn how to gamble?

    Consider a slot machine with two arms
    P1 > P2 … unknown payoff probabilities

    Gambler has 1000 plays, what is the best way to experiment? (To maximize total expected reward)

    This is called the "bandit" problem, and it has been studied for a long time

    Optimal solution: Play the arm that has the maximum potential of being good, i.e. be optimistic in the face of uncertainty
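One well-known strategy in this "optimistic" spirit is UCB1. The sketch below is my own illustration and not necessarily the method described in the talk; the payoff probabilities 0.7 and 0.3 are assumptions. Each arm gets a score equal to its observed win rate plus a bonus that shrinks as the arm is played more, and the gambler always plays the arm with the highest score.

```python
import math
import random

def ucb1(payoff_probs, n_plays=1000, seed=0):
    """Play a multi-armed slot machine with the UCB1 rule.

    payoff_probs are the true (unknown to the gambler) win probabilities.
    Returns (pulls, wins) per arm.
    """
    rng = random.Random(seed)
    n_arms = len(payoff_probs)
    pulls = [0] * n_arms
    wins = [0] * n_arms
    for t in range(1, n_plays + 1):
        if t <= n_arms:
            arm = t - 1  # play every arm once to get a first estimate
        else:
            # Optimistic score: observed mean plus an uncertainty bonus.
            arm = max(
                range(n_arms),
                key=lambda a: wins[a] / pulls[a]
                + math.sqrt(2 * math.log(t) / pulls[a]),
            )
        pulls[arm] += 1
        if rng.random() < payoff_probs[arm]:
            wins[arm] += 1
    return pulls, wins

pulls, wins = ucb1([0.7, 0.3])  # hypothetical P1 > P2
print("pulls per arm:", pulls, "total reward:", sum(wins))
```

The weaker arm still gets played from time to time, but less and less often as the evidence against it builds up.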

– Recommender Problems: Bandits?

    Two items

    Item 1 CTR = 2/100
    Item 2 CTR = 250/10000

    – Greedy: Show Item 2 to all

        Not a good idea

    – Item 1 CTR estimate is noisy; Item 1 could potentially be better
        – Invest in Item 1 for better overall performance on average
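To see how noisy the Item 1 estimate is, we can put a confidence interval around each CTR. The sketch below uses the Wilson score interval at roughly 95% confidence; that choice is mine for illustration, not something from the talk.

```python
import math

def wilson_interval(clicks, views, z=1.96):
    """Wilson score interval for a click-through rate (z=1.96 is ~95%)."""
    p = clicks / views
    denom = 1 + z * z / views
    centre = (p + z * z / (2 * views)) / denom
    half = z * math.sqrt(p * (1 - p) / views + z * z / (4 * views * views)) / denom
    return centre - half, centre + half

lo1, hi1 = wilson_interval(2, 100)       # Item 1: 2 clicks in 100 views
lo2, hi2 = wilson_interval(250, 10000)   # Item 2: 250 clicks in 10000 views
print(f"Item 1: {lo1:.4f} to {hi1:.4f}")
print(f"Item 2: {lo2:.4f} to {hi2:.4f}")
```

The interval for Item 1 stretches up to about 7%, while the one for Item 2 tops out below 3%. Item 1 may well be the better item, which is why showing Item 2 to everybody is not a good idea.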

– Go Granular, But with back-off

    Too little data at the granular level, so we need to borrow from coarser resolutions with abundant data (Smoothing, Shrinkage)

    For example, if there is no data at "pub-id=88, ad-id=77, zip=Palo Alto", look at the data for Palo Alto as a whole; if that is also too little, look at the data for the Bay Area

    Similarly, look at all ads given by the publisher with ID 88, and so on

    There are different levels from which we can get data, so combine them in a weighted calculation

    CTR = Weight1 * Score1 + Weight2 * Score2 + … + Weightn * Scoren

– How to decide the weight – Or rather, how much to borrow from ancestors

    – Depends on heterogeneity in CTRs of small cells
        * Ancestors with similar CTR child nodes are more credible

    – Example: If all zip codes in the Bay Area have similar CTRs, more weight is given to the Bay Area node itself
        * Pool similar cases, separate dissimilar ones


    – If you have a lot of data, use it
    – If you have less data, look at the ancestors
    – If ancestors have less data, look for the right one based on the formula above
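A minimal sketch of the back-off idea, with some simplifying assumptions of mine: every level blends its own raw CTR with its parent's estimate, and the weight on its own data is views / (views + prior_strength). In the talk the amount borrowed depends on how similar the child CTRs are; here `prior_strength` is just a fixed, made-up constant, and the counts and `global_ctr` are invented for illustration.

```python
def backoff_ctr(levels, prior_strength=100.0, global_ctr=0.01):
    """Shrink a fine-grained CTR towards its ancestors.

    levels: (clicks, views) pairs ordered from coarsest to finest,
            e.g. Bay Area -> Palo Alto -> (pub-id, ad-id, zip) cell.
    prior_strength: hypothetical constant; roughly how many views a level
            needs before its own data outweighs its parent's estimate.
    """
    estimate = global_ctr
    for clicks, views in levels:
        # Weighted average of this level's raw CTR and the parent estimate.
        estimate = (clicks + prior_strength * estimate) / (views + prior_strength)
    return estimate

# Made-up counts: lots of data for the Bay Area, some for Palo Alto,
# almost none for the specific cell.
sparse_cell = backoff_ctr([(3000, 100000), (20, 1000), (0, 5)])
# Same ancestors, but now the cell itself has plenty of data.
rich_cell = backoff_ctr([(3000, 100000), (20, 1000), (500, 10000)])
print(f"sparse cell: {sparse_cell:.4f}, rich cell: {rich_cell:.4f}")
```

With only 5 views, the sparse cell's estimate stays close to the Palo Alto value (about 2%) instead of dropping to its raw CTR of 0. With 10000 views, the rich cell's estimate ends up close to its own raw CTR of 5%.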

– Hierarchical smoothing

– Back-off model

– Data Model:

    User i visits a website
    Article j is served based on an algorithm
    Depending on whether they click on the article or not, we decide the score for future checks

– When a new user comes

    – Look at the available (Little) data
    – Assign the news items (to start with) based on this data
    – Observe continuously
    – Move them from one news item / category to another

– Post Click utilities
    Recommender + Editorial

    CTR + Utility per click needs to be compared


