logged contextual bandits without probabilities #3910
Comments
There is no pre-canned solution for "what do I do if I have data that was collected with exploration but I don't have the probabilities?". Some people try to learn the probabilities, which can work but is very situational. This paper (https://arxiv.org/abs/1003.0120) may also be of interest to you. I am not sure whether warm cb helps. There is the option to use cost-sensitive labels as input, but it's not clear to me whether this is right. I agree that cbify doesn't fit here, since as far as I know it would require supervised data.
@JohnLangford can you please help to answer this?
@jackgerrits thanks for the reference. Indeed an interesting paper.
Predicting the probability of the action is about the only approach to solving the "gah, we didn't record the probabilities" problem. However, my experience with this approach is that it's fairly sensitive to the quality of these predictions. Do you have the right features upon which to make these predictions? (Is there true exploration going on even if it was not recorded?) Small errors in probabilities can be magnified because the optimization process may (effectively) seek out those errors. This is also the regime where doubly robust approaches may be particularly helpful.
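To illustrate why errors in predicted probabilities matter, here is a minimal sketch of an inverse propensity scoring (IPS) estimate of a policy's average cost. It is not VW code; the function name `ips_estimate` and the tuple layout of the logged data are assumptions made for this example.

```python
def ips_estimate(logged, target_policy):
    """IPS estimate of the average cost of target_policy.

    logged: list of (context, action, cost, probability) tuples, where
    probability is the (estimated) chance the logging policy chose action.
    This layout is hypothetical, chosen only for illustration.
    """
    total = 0.0
    for context, action, cost, prob in logged:
        # Only examples where the target policy agrees with the logged
        # action contribute, each reweighted by 1 / probability.
        if target_policy(context) == action:
            total += cost / prob
    return total / len(logged)


def always_action_1(context):
    # A trivial policy that ignores the context.
    return 1


# Same logged events, but the second copy underestimates the propensity
# of the first event (0.25 instead of 0.5).
true_probs = [("x", 1, 1.0, 0.5), ("x", 2, 0.0, 0.5)]
bad_probs = [("x", 1, 1.0, 0.25), ("x", 2, 0.0, 0.5)]

estimate_true = ips_estimate(true_probs, always_action_1)
estimate_bad = ips_estimate(bad_probs, always_action_1)
```

In this toy data, halving one estimated probability doubles the estimated cost (1.0 becomes 2.0). A learner optimizing against such estimates can be drawn toward exactly the actions whose probabilities were mispredicted.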
Thanks @JohnLangford, does VW support this out of the box, or do I need to make the predictions myself outside the package? Do you have a reference for how this can be done in VW?
There isn't a canned way to do it, but you could of course invoke VW twice: first use multiclass prediction with probabilities, then do contextual bandits.
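A rough sketch of the data plumbing between the two invocations, under stated assumptions: stage one is a multiclass model trained to predict the logged action (for example with `--oaa` and `--probabilities`), and its prediction file holds one space-separated `label:probability` list per example. That output layout, the helper names, and the `min_prob` floor are assumptions for illustration and are not specified in this thread. The stage-two line follows the `action:cost:probability | features` format used in the Logged Contextual Bandit Example wiki page.

```python
def to_multiclass_line(action, features):
    """Stage-one training example: the logged action is the class label."""
    return f"{action} | {features}"


def parse_probabilities(pred_line):
    """Parse '1:0.2 2:0.5 3:0.3' into {1: 0.2, 2: 0.5, 3: 0.3}.

    The 'label:probability' layout is an assumption about the
    stage-one prediction output.
    """
    probs = {}
    for token in pred_line.split():
        label, prob = token.split(":")
        probs[int(label)] = float(prob)
    return probs


def to_cb_line(action, cost, features, probs, min_prob=0.05):
    """Stage-two example in 'action:cost:probability | features' form.

    min_prob is a hypothetical floor on the estimated propensity so that
    a tiny predicted probability cannot produce a huge importance weight.
    """
    p = max(probs[action], min_prob)
    return f"{action}:{cost}:{p} | {features}"


# One logged interaction: action 2 was taken and incurred cost 1.0.
features = "user_a time_morning"
stage_one_line = to_multiclass_line(2, features)
probs = parse_probabilities("1:0.2 2:0.5 3:0.3")
stage_two_line = to_cb_line(2, 1.0, features, probs)
```

Flooring the probability trades some bias for lower variance; whether that is appropriate depends on how much real exploration was present in the logged data.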
Closing, as this isn't currently on our roadmap. If this feature is in high demand, we can reopen it.
The example in https://github.com/VowpalWabbit/vowpal_wabbit/wiki/Logged-Contextual-Bandit-Example assumes one knows the action probabilities. However, in many cases these probabilities are unknown because they were not logged. What is the best practice in this situation? Is it to predict the action probabilities with a separate propensity model, or is there a way to explicitly tell VW to learn them internally, without an external model?
I have seen a warm start example here: https://github.com/VowpalWabbit/vowpal_wabbit/wiki/Warm-starting-contextual-bandits but it is not clear to me, because the data appears to be supervised learning data with no action/cost information.