
logged contextual bandits without probabilities #3910

Closed
chanansh opened this issue May 4, 2022 · 7 comments

Comments

@chanansh
chanansh commented May 4, 2022

The example in https://github.com/VowpalWabbit/vowpal_wabbit/wiki/Logged-Contextual-Bandit-Example assumes one knows the action probabilities. However, in many cases these probabilities are unknown because they were never logged. What is the best practice in this situation? Is it to predict the action (a propensity model), or is there a way to tell VW to learn the probabilities internally, without an external model for the action probabilities?

I have seen a warm start example here https://github.com/VowpalWabbit/vowpal_wabbit/wiki/Warm-starting-contextual-bandits, but it is unclear to me how it applies, because the data there appears to be supervised-learning data with no action/cost information.

@jackgerrits
Member

There is no pre-canned solution for "what do I do if I have data that was collected with exploration but I don't have the probabilities?". Some people try to learn the probabilities, which can work but depends heavily on the situation.
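As a minimal illustration of learning the probabilities, assuming the context takes only a few discrete values, one can count how often each action was logged in each context and smooth the counts. The function name `estimate_propensities`, the `alpha` smoothing parameter, and the toy data are hypothetical; this is plain Python, not a VW feature, and realistic contexts would need a proper classifier rather than counting.

```python
from collections import Counter, defaultdict


def estimate_propensities(logs, num_actions, alpha=1.0):
    """Hypothetical helper: estimate P(action | context) from logged pairs.

    logs: iterable of (context, action) with actions numbered 1..num_actions
    alpha: additive (Laplace) smoothing so no action gets probability 0
    """
    # Count how often each action was logged for each (discrete) context.
    counts = defaultdict(Counter)
    for context, action in logs:
        counts[context][action] += 1

    propensities = {}
    for context, action_counts in counts.items():
        total = sum(action_counts.values()) + alpha * num_actions
        # Smoothed relative frequency of each action within this context.
        propensities[context] = {
            a: (action_counts[a] + alpha) / total
            for a in range(1, num_actions + 1)
        }
    return propensities


# Toy logged data (hypothetical): context bucket and the action that was shown.
logs = [("mobile", 1), ("mobile", 1), ("mobile", 2), ("desktop", 2)]
props = estimate_propensities(logs, num_actions=2)
print(props["mobile"])  # {1: 0.6, 2: 0.4}
```

Smoothing keeps every estimated probability strictly positive, so no logged example ends up with an infinite 1/p importance weight.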

This paper may also be of interest to you: https://arxiv.org/abs/1003.0120

I am not sure whether warm cb helps. There is an option to use cost-sensitive labels as input, but it's not clear to me if that is right here.

I agree; I don't think cbify fits here, since as far as I know it requires supervised data.

@chanansh
Author

@JohnLangford can you please help to answer this?

@chanansh
Author

chanansh commented May 27, 2022

@jackgerrits thanks for the reference. Indeed an interesting paper.

@JohnLangford
Member

Predicting the probability of the action is about the only approach to solving the "gah, we didn't record the probabilities" problem. However, my experience with this approach is that it's fairly sensitive to the quality of these predictions. Do you have the right features upon which to make these predictions? (Is there true exploration going on, even if it was not recorded?) Small errors in the probabilities can be magnified because the optimization process may (effectively) seek out those errors. This is also the regime where doubly robust approaches may be particularly helpful.
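To see why errors in estimated probabilities get magnified, and why doubly robust (DR) estimation can help, here is a sketch of the standard inverse propensity scoring (IPS) and DR estimators for the average cost of a deterministic target policy. These are textbook formulas in plain Python, not VW's internal implementation; the records, the target policy, and the `cost_model` regressor are hypothetical toy values.

```python
def ips_estimate(records, policy):
    """IPS estimate of the target policy's average cost.

    records: list of (context, logged_action, cost, propensity)
    policy: function mapping a context to the action the target policy picks
    """
    total = 0.0
    for context, action, cost, prob in records:
        if policy(context) == action:
            total += cost / prob  # reweight matching examples by 1/p
    return total / len(records)


def dr_estimate(records, policy, cost_model):
    """Doubly robust estimate: model-predicted cost plus an IPS correction."""
    total = 0.0
    for context, action, cost, prob in records:
        chosen = policy(context)
        predicted = cost_model(context, chosen)
        total += predicted
        if chosen == action:
            # Only the cost model's residual is reweighted by 1/p.
            total += (cost - predicted) / prob
    return total / len(records)


def always_action_1(context):
    return 1


def cost_model(context, action):
    # Hypothetical regressor that is fairly accurate on this toy data.
    return 0.9 if (context, action) == ("a", 1) else 0.1


# (context, logged_action, cost, propensity) with the true propensities.
true_records = [("a", 1, 1.0, 0.1), ("a", 2, 0.0, 0.9),
                ("b", 1, 0.0, 0.5), ("b", 2, 1.0, 0.5)]
# Same data, but the first propensity was underestimated as 0.05.
bad_records = [("a", 1, 1.0, 0.05)] + true_records[1:]

for name, recs in [("true p", true_records), ("estimated p", bad_records)]:
    print(name, ips_estimate(recs, always_action_1),
          dr_estimate(recs, always_action_1, cost_model))
```

Underestimating a single propensity by 0.05 roughly doubles the IPS estimate (about 2.5 to about 5.0; values above the maximum cost of 1 are an artifact of the tiny sample), while the DR estimate moves only from about 0.7 to about 0.95, because only the small residual of a good cost model is reweighted. With a poor cost model, DR loses much of this advantage.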

@chanansh
Author

Thanks @JohnLangford, does VW support this out of the box, or do I need to make the predictions myself outside the package? Do you have a reference for how it can be done in VW?

@JohnLangford
Member

There isn't a canned way to do it, but you could of course invoke VW twice: first use multiclass prediction with probabilities, then do contextual bandits.
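A sketch of the glue between the two VW passes, assuming the first (multiclass) pass was run with probability output, for example `--oaa <K>` with `--probabilities` and `--loss_function logistic` (check the options and output format for your VW version), and wrote one line of space-separated `class:probability` pairs per example. The helpers below look up the predicted probability of the action that was actually logged and write a line in VW's `action:cost:probability | features` contextual bandit format for the second `--cb <K>` pass. The function names, the parsing assumption, and the `min_prob` floor are hypothetical; the floor is a common heuristic for limiting the 1/p blow-up, not something prescribed in this thread.

```python
def parse_prob_line(line):
    """Parse a line of space-separated `class:probability` pairs into a dict.

    Assumes the first pass wrote its per-class probabilities in this form;
    verify against the prediction file your VW version actually produces.
    """
    probs = {}
    for pair in line.split():
        label, prob = pair.split(":")
        probs[int(label)] = float(prob)
    return probs


def to_cb_line(action, cost, class_probs, features, min_prob=0.05):
    """Format one logged example as a VW contextual bandit line.

    action: logged action (1-based, as in VW's --cb labels)
    cost: observed cost of that action
    class_probs: dict action -> probability predicted by the first pass
    features: feature strings for the default namespace
    min_prob: hypothetical floor on the propensity to limit 1/p blow-up
    """
    prob = max(class_probs.get(action, 0.0), min_prob)
    return f"{action}:{cost}:{prob:.4f} | {' '.join(features)}"


probs = parse_prob_line("1:0.7 2:0.3")
print(to_cb_line(2, 0.0, probs, ["user_mobile", "hour_9"]))
# 2:0.0:0.3000 | user_mobile hour_9
```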

@olgavrou
Collaborator

Closing, as this isn't currently on our roadmap. If this feature is in high demand, we can reopen it.


4 participants