A scalable preference elicitation algorithm using group generalized binary search Max Yi Ren, Clayton Scott and Panos Papalambros University of Michigan, Ann Arbor. | IDETC, Portland, Aug. 06, 2013

Overview

In online shopping, one may encounter many alternatives to choose from.

Recommender systems: To explore similar products or products from people with similar tastes.

Goal of this paper: Develop an interaction that helps users find their most preferred product with the least effort (the fewest queries).

Contribution: Enabling efficient evaluation of a large number of design alternatives by the crowd.

Difference from conjoint analysis: The proposed interaction searches for the optimal design rather than partworth estimates.


Problem statement

Objective: To find an efficient query algorithm that elicits the most preferred product of a user from a product set.

A product set has a finite number of products, represented by a set of known features. E.g., a car can be represented by (1) MPG, (2) safety, (3) price, (4) style and so on.

A query is a pairwise comparison question to the user: “Do you prefer design A or design B?” The response to a query is binary: A or B.

An interaction consists of a sequence of queries. It terminates when $\text{Pr}(\text{product } i \text{ is the most preferred} | \text{all previous responses}) = 1$.

Problem statement (cont.)

Key assumptions:
(1) User preference can be modeled by a linear utility with known features.

(2) The user has consistent preference throughout the interaction, i.e., fixed partworth.

(3) No response errors.


Key question: What is the best way to query so that an interaction will have the least expected number of queries?

Demonstration of the problem

If you know that the user prefers A over B, which two products will you query next?

Demonstration of the problem (cont.)

When the user prefers A $\succ$ B and A $\succ$ C, we can conclude that A $\succ$ D if the linear utility model holds.
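Under the linear utility assumption, this kind of deduction can be checked mechanically. A minimal sketch (the 2-D feature vectors for A, B, C, D below are made up for illustration):

```python
import numpy as np

# Hypothetical 2-D feature vectors for products A-D.  Responses
# A > B and A > C constrain the partworth vector w to the cone
#   w . (a - b) > 0  and  w . (a - c) > 0.
# If (a - d) is a nonnegative combination of (a - b) and (a - c), then
# w . (a - d) > 0 for every consistent w, so A > D needs no query.
a = np.array([2.0, 2.0])
b = np.array([0.0, 3.0])
c = np.array([3.0, 0.0])
d = np.array([0.5, 0.5])

M = np.column_stack([a - b, a - c])   # columns span the response cone
coef = np.linalg.solve(M, a - d)      # express a - d in that basis
implied = bool(np.all(coef >= 0))     # nonnegative => A > D is implied
print(implied)
```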

Object identification (The "Twenty Questions" game)

20Q is a game between two people:
◊ Player A first thinks about a person that both players know.
◊ Player B needs to figure out that person by asking yes/no questions, e.g., "Is the person graduating soon?".
◊ Player B wins if he/she figures out the person within 20 questions.

The query strategy of 20Q is called Generalized Binary Search (GBS).

Main idea of GBS: Query the most uncertain question, i.e., a query that you believe has a 50 percent chance of being answered "yes" or "no".

Object identification (cont.)

Main idea of GBS: Query the most uncertain question, i.e., a query that you believe has a 50 percent chance of being answered "yes" or "no". In this example, either feature 1 or feature 2 can be queried.
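As a concrete sketch of the GBS rule (toy data, uniform prior): objects are bit-vectors of binary features, and the chosen query is the feature whose "yes" probability under the current prior is closest to 1/2.

```python
import numpy as np

# Toy GBS step: four objects described by three binary features, with a
# uniform prior over which object player A has in mind.
objects = np.array([[1, 1, 0],
                    [1, 0, 0],
                    [1, 1, 1],
                    [0, 0, 1]])
prior = np.full(len(objects), 1 / len(objects))

p_yes = objects.T @ prior                    # Pr(feature j = 1) under the prior
best = int(np.argmin(np.abs(p_yes - 0.5)))   # most uncertain feature
print(best, p_yes)                           # features 1 and 2 both split 50/50
```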

Group identification

Group identification problem: In 20Q, what should be the query strategy if player B wins by knowing which group the person belongs to?

In this case, feature 1 should be queried.

Group Generalized Binary Search (GGBS): To minimize the expected number of queries, we need to query the feature that (1) is the most uncertain and (2) the response to which will not separate objects from the same group.
(G. Bellala, S. Bhavnani and C. Scott, Group-based active query selection for rapid diagnosis in time-critical situations, 2012)


Group generalized binary search

This query strategy leads to an efficiency measure for queries; the most efficient query is picked in each iteration.


The measure requires calculation of
\[ \text{Pr}(\text{The answer to query } j \text{ is "yes"}| \text{all previous responses}), \]
and
\[ \text{Pr}(\text{Group } i \text{ is the correct group}| \\ \text{all previous responses and the answer to query } j \text{ is "yes"}). \]
Both are based on the calculation of
\[ \text{Pr}(\text{Group } i \text{ is the correct group} | \text{all previous responses}). \]
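The chain of conditionals can be sketched with hypothetical numbers; here `pi` stands in for the group posteriors the algorithm maintains, and `q[i, j]` is the (made-up) fraction of group i's posterior mass that would answer "yes" to query j.

```python
import numpy as np

# pi[i] = Pr(group i | previous responses), hypothetical values.
pi = np.array([0.5, 0.3, 0.2])
# q[i, j] = fraction of group i that answers "yes" to query j (two queries).
q = np.array([[1.0, 0.4],
              [0.0, 0.6],
              [0.5, 0.5]])

p_yes = pi @ q                            # Pr(query j answered "yes" | history)
# Bayes update: Pr(group i | history, "yes" to query j)
pi_given_yes = (pi[:, None] * q) / p_yes
print(p_yes)
print(pi_given_yes[:, 0])
```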

Our problem is identical to group identification

The key step in the solution is the calculation of
\[\pi_{\Theta^i_a}:= \text{Pr}(\text{Product } i \text{ is the most preferred} \\ | \text{all previous responses up to iteration "a"}).\]


Implementation

On the figure, the green, blue and red shaded areas represent $\pi_{\Theta^A}$, $\pi_{\Theta^B}$ and $\pi_{\Theta^C}$, respectively.

Without any prior knowledge, we can initialize the distribution of ${\bf w}$ so that $\pi_{\Theta^A}=\pi_{\Theta^B}=\pi_{\Theta^C}=1/3$.

We also assume that ${\bf w}$ is piece-wise uniformly distributed over the circle.

The distribution of ${\bf w}$ can be determined by $\pi_{\Theta^A}$, $\pi_{\Theta^B}$, $\pi_{\Theta^C}$ and the arc lengths.
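A minimal sketch of this initialization (the arc lengths below are hypothetical):

```python
import numpy as np

# With no prior knowledge, set pi_A = pi_B = pi_C = 1/3; the
# piecewise-uniform density of w on each arc is then fixed by the
# (hypothetical) arc lengths, which together cover the circle.
arc_len = np.array([2.0, 2.5, 2 * np.pi - 4.5])   # three arcs, summing to 2*pi
pi = np.full(3, 1 / 3)                            # Pr(product i is the best)
density = pi / arc_len                            # density of w on each arc
print(density)
```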

Implementation (cont.)

Calculating conditional probabilities is time consuming.

For the given distribution of ${\bf w}$, $\pi_{\Theta^i_a}$ requires the calculation of arc lengths, or surface areas of a hypersphere in general.

At each iteration, all arcs will be calculated for each candidate query.

E.g., to evaluate the efficiency of a query with A and C, we need to calculate
\[ \text{Pr}(A \text{ is the best}| A \text{ better than } C), \]
\[ \text{Pr}(B \text{ is the best}| A \text{ better than } C), \]
\[ \text{Pr}(C \text{ is the best}| A \text{ better than } C) \text{ (trivial)}. \]
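This update can be sanity-checked by Monte Carlo on the circle (the feature vectors below are made up, and w is sampled uniformly for simplicity): the response "A better than C" keeps only the half-circle where w . (a - c) > 0, and the conditionals are the surviving arc mass of each product's region.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 200_000)
w = np.column_stack([np.cos(theta), np.sin(theta)])   # samples of w on the circle

# Hypothetical feature vectors; each product is best on some arc.
a = np.array([2.0, 1.0]); b = np.array([1.0, 2.0]); c = np.array([0.5, 0.5])
utils = w @ np.column_stack([a, b, c])                # utilities of A, B, C
best = utils.argmax(axis=1)                           # which product wins per w

keep = w @ (a - c) > 0                                # response: A better than C
for i, name in enumerate("ABC"):
    p = np.mean(best[keep] == i)                      # Pr(name is best | A > C)
    print(name, round(p, 3))                          # C's probability is 0 (trivial)
```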

A fast approximation using linear SVM

Approximating $\pi_{\Theta^A}$:

(1) Fit a circle such that its center lies on the green arc and it is tangent to one of the boundaries.

(2) Calculate the radius $R_A$ of the circle. The approximate area then scales as $R_A^{D-1}$.

(3) Let the depth of the green arc (the density of ${\bf w}$ for product A) be $p_A$; we then have
\[ \pi_{\Theta^A} \approx p_A R_A^{D-1}.\]

A fast approximation using linear SVM (cont.)

Finding such a circle involves a convex optimization problem identical to a hard-margin SVM, so liblinear can be used to achieve $O(\text{current query size})$ complexity.
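Geometrically, the circle-fitting step is a largest-inscribed-ball problem. A rough analogue (with hypothetical halfspace constraints, solved here as a Chebyshev-center LP rather than via liblinear's SVM form):

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical response constraints A x <= b cutting out a triangle.
# Largest inscribed circle:  maximize R  s.t.  a_i . x + R*||a_i|| <= b_i.
A = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
b = np.array([1.0, 1.0, 0.5])

norms = np.linalg.norm(A, axis=1)
# Variables (x1, x2, R); linprog minimizes, so the objective is -R.
res = linprog(c=[0, 0, -1],
              A_ub=np.column_stack([A, norms]),
              b_ub=b,
              bounds=[(None, None), (None, None), (0, None)])
x_center, radius = res.x[:2], res.x[2]
print(radius)   # R_A; the area estimate is then ~ p_A * R_A**(D-1)
```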


Simulation study

Algorithm implementation: appGGBS

◊ Find the four products with the highest conditional probabilities to be the most preferred (6 candidate queries in total)

◊ Pick a pair from the four designs using GGBS

◊ After getting user response, calculate $\pi_{\Theta^i_a}$ for all candidate products

◊ Terminate when one product has probability 1 to be the most preferred
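The loop above can be sketched as follows; `posterior_probs` and `ggbs_pick` are hypothetical stand-ins, and the demo uses a toy elimination posterior and a naive pair picker instead of the paper's actual GGBS criterion.

```python
def app_ggbs(products, respond, posterior_probs, ggbs_pick):
    history = []
    while True:
        pi = posterior_probs(products, history)      # pi_i for each product
        if max(pi.values()) >= 1.0:                  # one product is certain
            return max(pi, key=pi.get)
        top4 = sorted(pi, key=pi.get, reverse=True)[:4]
        a, b = ggbs_pick(top4, pi, history)          # choose one of the 6 pairs
        history.append((a, b, respond(a, b)))        # record the user's answer

# Toy demo: the simulated user's true utilities.
util = {"A": 3, "B": 2, "C": 1, "D": 0}

def respond(a, b):
    return a if util[a] > util[b] else b

def posterior_probs(products, history):
    losers = {b if winner == a else a for a, b, winner in history}
    alive = [p for p in products if p not in losers]
    return {p: (1 / len(alive) if p in alive else 0.0) for p in products}

def ggbs_pick(top4, pi, history):
    return top4[0], top4[1]                          # naive stand-in

print(app_ggbs(list(util), respond, posterior_probs, ggbs_pick))
```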


Simulation study (cont.)

Algorithm to compare with: EGO

◊ The first query contains the two products with highest prior probabilities to be the most preferred

◊ After getting user response, update the partworth vector ${\bf w}$ and calculate $\pi_{\Theta^i_a}$

◊ Each following query contains (i) the previously preferred product and (ii) the product with highest predicted utility that hasn’t been queried yet

◊ Terminate when one product has probability 1 to be the most preferred
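A toy sketch of this baseline; all function arguments are hypothetical stand-ins, the partworth update is a stub, and termination is simplified to exhausting the candidate list.

```python
def ego(products, respond, predict_utility, update_partworth):
    w = None                                          # partworth estimate
    remaining = set(products)
    incumbent = max(remaining, key=lambda p: predict_utility(p, w))
    remaining.discard(incumbent)
    while remaining:
        # Challenger: highest predicted utility among unqueried products.
        challenger = max(remaining, key=lambda p: predict_utility(p, w))
        remaining.discard(challenger)
        winner = respond(incumbent, challenger)       # user picks one
        w = update_partworth(w, incumbent, challenger, winner)
        incumbent = winner                            # keep the preferred one
    return incumbent

util = {"A": 3, "B": 2, "C": 1}                       # simulated user
result = ego(list(util),
             respond=lambda a, b: a if util[a] > util[b] else b,
             predict_utility=lambda p, w: 0,          # uninformative stub
             update_partworth=lambda w, a, b, winner: w)
print(result)
```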


Simulation results: Performance

appGGBS has better overall performance. Its advantage grows with the number of products and shrinks with the number of features.

EGO can outperform appGGBS when the number of features is high and the number of products is low.


Simulation results: Computation

With 15 product features and 160 candidate products, the average response time is less than 1 second.


User test

A set of laptops with:

◊ Screen size: 11 or 13 inches

◊ Storage: 64, 128 or 256 GB

◊ CPU: Average or fast

◊ Keyboard: With or without

◊ Battery life: Half-day or full-day

◊ Price: based on features and market prices

Test procedure:

◊ The user is shown the entire product set and asked to choose the most preferred one. The user can use filters and sorting tools.

◊ The user then goes through two interactive sessions using appGGBS and EGO.


User test results

All user choices from the product list are consistent with the outcome from the interactive sessions.

appGGBS outperforms EGO on average.

Note: For the linear utility model to work, the features had to be hand-tuned so that appGGBS (and EGO) would not mistakenly eliminate the correct product based on user responses.

Conclusions

An active query algorithm can enhance online shopping and design evaluation experiences by quickly learning a user's preferred product from a large set of products.

We proposed a computationally efficient algorithm with strong theoretical support and tested its performance with real users.

The algorithm is shown to be superior to a previously proposed heuristic algorithm in both the simulations and the user test.

Future work

A collaborative filter can be incorporated into the appGGBS algorithm to provide better estimates of the prior probabilities $\pi_{\Theta^i}$.

Response error needs to be handled.

Multiple choice instead of pairwise comparison will be implemented.
Presentation available at http://maxyiren.appspot.com/idetc2013a.html

This project is partially supported by the Automotive Research Center at the University of Michigan. This support is gratefully acknowledged.

Thank you all for attending.
What questions do you have?

Research Fellow
Optimal Design Lab
University of Michigan
yiren@umich.edu
maxyiren.appspot.com

Associate Professor
Electrical Engineering and Computer Science
University of Michigan
clayscot@umich.edu
web.eecs.umich.edu/~cscott/

Professor
Optimal Design Lab
University of Michigan
pyp@umich.edu
ode.engin.umich.edu



Group identification (cont.)

The expected number of queries for group identification is a function of:

(1) The query strategy, i.e., a tree structure that tells which query to make under what previous responses

(2) The prior distribution of the groups
