logged contextual bandits without probabilities #3910

chanansh · 2022-05-04T11:40:00Z

The example in https://github.com/VowpalWabbit/vowpal_wabbit/wiki/Logged-Contextual-Bandit-Example assumes one knows the action probabilities. However, in many cases, these probabilities are unknown as they were not logged. What is the best practice in this situation? Is it predicting the action (propensity model) or is there a way to explicitly tell VW to learn this internally without having an external model learning the action probabilities?

I have seen a warm start example here https://github.com/VowpalWabbit/vowpal_wabbit/wiki/Warm-starting-contextual-bandits but it is not clear because it seems as if the data is of supervised learning and there is no action/cost data.

jackgerrits · 2022-05-05T16:10:55Z

There is no pre-canned solution for "what do I do if I have data that was collected with exploration but I don't have the probabilities?". Some people try and learn a probability, which can work but is very situational.

This paper may also be of interest to you: https://arxiv.org/abs/1003.0120

I am not sure whether warm cb helps. There is an option to use cost-sensitive labels as input, but it's not clear to me whether this is right.

I agree, I don't think cbify fits here as that would require supervised data as far as I know.

chanansh · 2022-05-27T15:27:55Z

@JohnLangford can you please help to answer this?

chanansh · 2022-05-27T17:33:57Z

@jackgerrits thanks for the reference. Indeed an interesting paper.

JohnLangford · 2022-05-27T20:05:30Z

Predicting the probability of the action is about the only approach to solving the "gah, we didn't record the probabilities" problem. However, my experience with this approach is that it's fairly sensitive to the quality of these predictions. Do you have the right features upon which to make these predictions? (Is there true exploration going on even if not recorded?) Small errors in probabilities can be magnified because the optimization process may (effectively) seek out those errors. This is also the regime where doubly robust approaches may be particularly helpful.
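To make the sensitivity concrete, here is a minimal sketch of an inverse propensity scoring (IPS) estimate of a policy's average cost. It is an illustration only, not VW's internal implementation; the function name `ips_cost` and the tuple layout of the logged data are assumptions made for this example. Because each matching example is weighted by `1 / prob`, halving an estimated probability doubles that example's contribution.

```python
def ips_cost(logged, policy):
    """Estimate the average cost of `policy` from logged bandit data.

    logged: list of (context, action, cost, prob) tuples, where `prob` is the
            (possibly estimated) probability that the logging policy chose
            `action` for `context`. This layout is hypothetical.
    policy: function mapping a context to an action.
    """
    total = 0.0
    for context, action, cost, prob in logged:
        # Only examples where the evaluated policy agrees with the logged
        # action contribute, each weighted by the inverse propensity.
        if policy(context) == action:
            total += cost / prob
    return total / len(logged)


def always_one(context):
    # Toy policy that always picks action 1.
    return 1


# Same logged events, two different propensity estimates for the first one.
good_estimate = [("a", 1, 1.0, 0.50), ("b", 2, 0.0, 0.50)]
bad_estimate = [("a", 1, 1.0, 0.25), ("b", 2, 0.0, 0.50)]

cost_good = ips_cost(good_estimate, always_one)
cost_bad = ips_cost(bad_estimate, always_one)
```

In this toy case, underestimating one propensity by a factor of two doubles the estimated cost of the policy, which is the kind of error a learner optimizing this estimate could end up chasing.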

chanansh · 2022-05-27T20:15:54Z

Thanks @JohnLangford, does VW support this out of the box? Or do I need to make the predictions myself outside the package? Do you have a reference for how it can be done in VW?

JohnLangford · 2022-05-27T20:19:54Z

There isn't a canned way to do it, but you could of course invoke VW twice to do this. Use multiclass prediction with probabilities, then do contextual bandits.
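A rough sketch of the data preparation for that two-pass approach follows, assuming the logged data is available as (features, action, cost) records. The helper names, the file names in the comments, and the probability floor `min_prob` are all hypothetical choices for this example; the floor is an added safeguard, not something suggested in this thread. The prediction format parsed by `parse_probabilities` (`1:0.3 2:0.5 3:0.2`) is an assumption about what `--probabilities` writes with `-p` and should be checked against your VW version.

```python
# Pass 1 (propensity model), untested command sketch:
#   vw --oaa 3 --loss_function logistic --probabilities -d propensity.txt -f propensity.model
#   vw -t -i propensity.model -d propensity.txt -p propensity.preds
# Pass 2 (contextual bandit on the estimated probabilities):
#   vw --cb 3 --cb_type dr -d cb.txt -f cb.model


def to_multiclass_line(action, features):
    """Format one example for pass 1: the logged action is the class label."""
    return f"{action} | {features}"


def parse_probabilities(pred_line):
    """Parse a prediction line assumed to look like '1:0.3 2:0.5 3:0.2'."""
    probs = {}
    for token in pred_line.split():
        label, prob = token.split(":")
        probs[int(label)] = float(prob)
    return probs


def to_cb_line(action, cost, prob, features, min_prob=0.01):
    """Format one example for pass 2 as 'action:cost:probability | features'.

    The estimated probability is clipped to [min_prob, 1.0] so that a tiny
    estimate does not produce an enormous importance weight.
    """
    clipped = min(1.0, max(min_prob, prob))
    return f"{action}:{cost}:{clipped} | {features}"


# Example: the logged action was 2 with cost 1.0 for this context.
features = "user_a hour_9"
stage1_line = to_multiclass_line(2, features)
predicted = parse_probabilities("1:0.3 2:0.5 3:0.2")
stage2_line = to_cb_line(2, 1.0, predicted[2], features)
```

Choosing `--cb_type dr` in the second pass reflects the doubly robust suggestion above. Predicting propensities on the same examples the propensity model was trained on can produce overconfident estimates, so a held-out split may be worth considering.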

olgavrou · 2022-09-22T15:44:29Z

Closing as this isn't currently on our roadmap. If this feature has high demand, we can reopen it.

olgavrou closed this as completed Sep 22, 2022

olgavrou mentioned this issue Dec 8, 2022

Use CB for campaign recommendation using Vowpal Wabbit when I don't have probabilities in my data #4319

Closed
