Hi @FabianKaiser, I checked and there is no bug. The thing is that the PLT reduction in 9.7 no longer returns a loss when predicting (in 9.6 the returned value was not correct anyway), and since you use PLT with a holdout dataset, you get 0 loss for that set, which triggers premature early stopping of the training. If you add 'holdout_off': True to your training params, the training will proceed as in 9.6, resulting in a similar model.
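A minimal sketch of that workaround, assuming the vowpalwabbit Python package; the parameter values are taken from the log header, and the dataset path is an assumption:

```python
# Sketch of the suggested workaround. The key addition is holdout_off=True,
# which disables holdout evaluation, so the zero predict-time loss in 9.7
# cannot trigger premature early stopping.
params = {
    "plt": 1013,          # number of labels (log header: "PLT k = 1013")
    "kary_tree": 2,       # log header: "kary_tree = 2"
    "passes": 31,
    "cache": True,        # multi-pass training requires a cache file
    "holdout_off": True,  # the workaround: disable the holdout set
}

def train(data_path="train.vw"):
    # Imported lazily so the sketch stays importable without the package.
    from vowpalwabbit import Workspace  # pip install vowpalwabbit
    model = Workspace(data=data_path, **params)
    model.finish()
    return model
```

The equivalent command-line flag is `--holdout_off`.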
@jackgerrits There are two possible solutions to this problem: one is to disable the holdout dataset for the PLT reduction (I am not sure if that is possible). Alternatively, I can add an additional loss calculation to the prediction method that would basically mimic the learn step without updating the base classifiers.
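The second option can be sketched in the abstract (this is not VW's internal API, just an illustration of computing the learn-step loss at predict time without touching the base classifiers):

```python
import math

def predict_time_loss(node_scores, positive_nodes):
    """Illustration only: accumulate the logistic loss the learn step
    would compute for each visited tree node, using the raw score of
    the corresponding base classifier, but without any weight update.

    node_scores: {node_id: raw_score} for the nodes the traversal visits
    positive_nodes: set of node ids lying on paths to the true labels
    """
    loss = 0.0
    for node, score in node_scores.items():
        y = 1.0 if node in positive_nodes else -1.0
        loss += math.log1p(math.exp(-y * score))  # logistic loss
    return loss
```

Reporting this value during prediction would restore a meaningful progressive loss on the holdout set, so early stopping would again be driven by real loss rather than a constant zero.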
Describe the bug
The PLT reduction performs worse and trains faster in VW 9.7 compared to VW 9.6.
The difference can also be seen in the training logs:
reddit_data_sample.csv
VW 9.6
PLT k = 1013
kary_tree = 2
creating cache_file = train.vw.cache
Reading datafile = train.vw
num sources = 1
Num weight bits = 30
learning rate = 0.5
initial_t = 0
power_t = 0.5
decay_learning_rate = 1
Enabled reductions: gd, scorer-identity, plt
Input label = MULTILABEL
Output pred = MULTILABELS
average since example example current current current
loss last counter weight label predict features
1.000000 1.000000 1 1.0 254 127
1.000000 1.000000 2 2.0 240 103
1.000000 1.000000 4 4.0 960 42
1.000000 1.000000 8 8.0 172 199
1.000000 1.000000 16 16.0 645 184
1.000000 1.000000 32 32.0 268 236
1.000000 1.000000 64 64.0 207 215
1.000000 1.000000 128 128.0 117 118
1.000000 1.000000 256 256.0 777 62
1.000000 1.000000 512 512.0 900 216
1.000000 1.000000 1024 1024.0 852 446
1.000000 1.000000 2048 2048.0 389 190
1.000000 1.000000 4096 4096.0 545 152
1.000000 1.000000 8192 8192.0 603 83
1.000244 1.000488 16384 16384.0 730 101
0.996429 0.996429 32768 32768.0 917 124 h
0.994506 0.992584 65536 65536.0 977 89 h
0.991005 0.987503 131072 131072.0 150 68 h
0.984825 0.978646 262144 262144.0 29 101 h
0.976190 0.967556 524288 524288.0 804 108 h
finished run
number of examples per pass = 24300
passes used = 31
weighted example sum = 753300.000000
weighted label sum = 0.000000
average loss = 0.958889 h
total feature number = 103905211
VW 9.7
PLT k = 1013
kary_tree = 2
creating cache_file = train.vw.cache
Reading datafile = train.vw
num sources = 1
Num weight bits = 30
learning rate = 0.5
initial_t = 0
power_t = 0.5
decay_learning_rate = 1
Enabled reductions: gd, scorer-identity, plt
Input label = MULTILABEL
Output pred = MULTILABELS
average since example example current current current
loss last counter weight label predict features
14.55609 14.55609 1 1.0 382 81
15.16680 15.77752 2 2.0 879 146
14.76802 14.36924 4 4.0 283 65
17.27986 19.79169 8 8.0 930 119
16.27946 15.27906 16 16.0 863 113
15.91260 15.54575 32 32.0 377 185
15.79295 15.67330 64 64.0 684 102
16.23921 16.68547 128 128.0 226 66
16.49474 16.75027 256 256.0 845 89
16.48981 16.48488 512 512.0 935 56
16.35399 16.21816 1024 1024.0 0 149
15.75564 15.15729 2048 2048.0 365 101
15.00027 14.24491 4096 4096.0 471 95
14.03309 13.06592 8192 8192.0 138 141
12.93321 11.83332 16384 16384.0 925 104
0.000000 0.000000 32768 32768.0 95 79 h
0.000000 0.000000 65536 65536.0 860 85 h
finished run
number of examples per pass = 24300
passes used = 4
weighted example sum = 97200.000000
weighted label sum = 0.000000
average loss = 0.000000 h
total feature number = 13414196
How to reproduce
Version
9.7
OS
Linux
Language
Python
Additional context
No response