Feature: Implementing SyntheticContinuousBanditDataset #112

usaito · 2021-07-06T04:31:30Z

new feature

Implement SyntheticContinuousBanditDataset, which generates synthetic data with (1-dimensional) continuous actions:
https://github.com/st-tech/zr-obp/blob/continuous-dataset/obp/dataset/synthetic_continuous.py
This class works as follows:

from obp.dataset import (
    SyntheticContinuousBanditDataset,
    linear_reward_funcion_continuous,
    linear_behavior_policy_continuous,
)

dataset = SyntheticContinuousBanditDataset(
    dim_context=5,
    min_action_value=1,
    max_action_value=10,
    reward_function=linear_reward_funcion_continuous,
    behavior_policy_function=linear_behavior_policy_continuous,
    random_state=12345,
)
bandit_feedback = dataset.obtain_batch_bandit_feedback(n_rounds=10000)

Note that min_action_value and max_action_value control the action space (\mathcal{A} = [1, 10] in the example code above).
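To make the role of the two bounds concrete, here is a minimal standard-library sketch of how data with a bounded continuous action space could be generated. It is not the obp implementation: the names generate_feedback and sample_truncated_normal, the truncated-Gaussian behavior policy, and the linear reward model are all assumptions made for illustration.

```python
import random


def sample_truncated_normal(mean, std, low, high, rng):
    """Draw from N(mean, std**2) restricted to [low, high] by rejection sampling."""
    while True:
        value = rng.gauss(mean, std)
        if low <= value <= high:
            return value


def generate_feedback(n_rounds, dim_context, min_action_value, max_action_value, seed):
    """Hypothetical generator of logged data with actions in [min, max]."""
    rng = random.Random(seed)
    # Illustrative fixed coefficients for the behavior policy and the reward model.
    policy_coef = [rng.uniform(-1.0, 1.0) for _ in range(dim_context)]
    reward_coef = [rng.uniform(-1.0, 1.0) for _ in range(dim_context)]
    midpoint = 0.5 * (min_action_value + max_action_value)
    feedback = {"context": [], "action": [], "reward": []}
    for _ in range(n_rounds):
        context = [rng.gauss(0.0, 1.0) for _ in range(dim_context)]
        # Center the behavior policy on a linear function of the context,
        # clamped into the action space so rejection sampling stays cheap.
        mean_action = midpoint + sum(w * x for w, x in zip(policy_coef, context))
        mean_action = min(max(mean_action, min_action_value), max_action_value)
        action = sample_truncated_normal(
            mean_action, 1.0, min_action_value, max_action_value, rng
        )
        # Assumed reward model: linear in the context, scaled by the action, plus noise.
        expected_reward = action * sum(w * x for w, x in zip(reward_coef, context))
        reward = rng.gauss(expected_reward, 1.0)
        feedback["context"].append(context)
        feedback["action"].append(action)
        feedback["reward"].append(reward)
    return feedback


bandit_feedback = generate_feedback(
    n_rounds=1000, dim_context=5, min_action_value=1, max_action_value=10, seed=12345
)
```

Every logged action in this sketch falls inside [1, 10], mirroring the action space described above, and the fixed seed makes repeated calls return identical data.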

tests

Add tests for SyntheticContinuousBanditDataset:
https://github.com/st-tech/zr-obp/blob/continuous-dataset/tests/dataset/test_synthetic_continuous.py

refactor

Rename action_prob to pscore in test_synthetic.py:

zr-obp/tests/dataset/test_synthetic.py

Lines 313 to 315 in c77dd99

    
    pscore = linear_behavior_policy(context=context, action_context=action_context)
    assert pscore.shape[0] == n_rounds and pscore.shape[1] == n_actions
    assert np.all(0 <= pscore) and np.all(pscore <= 1)
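For readers without the test file at hand, the checks above can be reproduced in a self-contained way. The sketch below uses a hypothetical toy_linear_behavior_policy (a softmax over linear logits, written for illustration and not the function from obp) and applies the same shape and range assertions, plus a row-sum check that applies to this toy policy only.

```python
import numpy as np


def toy_linear_behavior_policy(context, action_context, random_state=12345):
    """Hypothetical stand-in: softmax over linear logits, one row per round."""
    rng = np.random.default_rng(random_state)
    coef = rng.uniform(size=(context.shape[1], action_context.shape[1]))
    logits = context @ coef @ action_context.T  # shape (n_rounds, n_actions)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp_logits = np.exp(logits)
    return exp_logits / exp_logits.sum(axis=1, keepdims=True)


n_rounds, dim_context, n_actions = 10, 3, 5
rng = np.random.default_rng(0)
context = rng.normal(size=(n_rounds, dim_context))
action_context = np.eye(n_actions)  # one-hot action representation

pscore = toy_linear_behavior_policy(context=context, action_context=action_context)
assert pscore.shape[0] == n_rounds and pscore.shape[1] == n_actions
assert np.all(0 <= pscore) and np.all(pscore <= 1)
# Extra check for this toy policy only: each row is a probability distribution.
assert np.allclose(pscore.sum(axis=1), 1.0)
```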


nomuramasahir0 · 2021-07-10T08:35:51Z

Other than the above minor points, LGTM!

usaito · 2021-07-10T14:18:04Z

@nmasahiro Thanks!

Feature: Implementing Continuous OPE Estimators

Feature: Implementing Continuous NN Policy Learner

usaito added 26 commits July 6, 2021 13:29

implement SyntheticContinuousBanditDataset

b1c817b

add tests of synthetic_continuous.py

c77dd99

implement continuous ope estimators

c0874d4

add tests of continuous ope estimators

bf2eaf9

add some check funcs for continuous ope

ea3b2f9

add tests of meta_continuous

ce63f55

implement meta continuous

8d8cd53

black and flake8

66b8b80

add synthetic_continuous_bandit_feedback

6ee2fde

fix a bug

bee833c

add example code

786eeca

flake8

efab71d

fix typos

1c85d49

implement ContinuousNNPolicyLearner

a0a1ace

fix typos

0332bb0

add some descriptions

1d5aaf2

add tests of ContinuousNNPolicyLearner

1e04f89

add some check functions

d92a476

update

06ecd89

move some arguments to init

8f5884b

fix tests of SyntheticContinuousBanditDataset

84e74c7

fix docstring

97429af

Merge branch 'continuous-dataset' of github.com:st-tech/zr-obp into continuous-estimators

82c5f30

fix some tests to adjust the changes of SyntheticContinuousBanditDataset

0582f84

Merge branch 'continuous-dataset' of github.com:st-tech/zr-obp into continuous-policy-learner

3189863

fix some tests to adjust to the changes of SyntheticContinuousBanditDataset

900c4ff

usaito changed the title from "Feature: Implement SyntheticContinuousBanditDataset" to "Feature: Implementing SyntheticContinuousBanditDataset" on Jul 8, 2021

usaito added 3 commits July 8, 2021 20:06

fix docstrings

e33345f

fix docstrings

f404d92

fix docstrings

eee7bc9

nomuramasahir0 reviewed Jul 10, 2021

obp/dataset/synthetic_continuous.py (resolved)

obp/dataset/synthetic_continuous.py (resolved)

obp/dataset/synthetic_continuous.py (outdated, resolved)

reflect review

a11d8f3

usaito and others added 7 commits July 14, 2021 23:25

fix docs

acb0076

update based on review

4c3f8cc

update based on review

e7c940a

Merge branch 'continuous-dataset' into continuous-policy-learner

460d8ca

Merge branch 'master' into continuous-dataset

a4fd4f1

usaito merged commit a4b61e9 into master Aug 15, 2021
