
Worse performance and shorter training time for the PLT reduction when updating from VW 9.6 to 9.7 #4511

Closed
FabianKaiser opened this issue Feb 27, 2023 · 3 comments
Labels
Bug (Bug in learning semantics, critical by default)

Comments

FabianKaiser commented Feb 27, 2023

Describe the bug

The PLT reduction performs worse and trains faster in VW 9.7 than in VW 9.6.

The difference is also visible in the training logs:

  • the loss behaves very differently in the two versions
  • fewer passes are used in VW 9.7 (likely a consequence of the different loss)
  • the performance is worse in VW 9.7 (not drastic on the example data, but more pronounced on larger datasets)

reddit_data_sample.csv

VW 9.6

PLT k = 1013
kary_tree = 2
creating cache_file = train.vw.cache
Reading datafile = train.vw
num sources = 1
Num weight bits = 30
learning rate = 0.5
initial_t = 0
power_t = 0.5
decay_learning_rate = 1
Enabled reductions: gd, scorer-identity, plt
Input label = MULTILABEL
Output pred = MULTILABELS
average since example example current current current
loss last counter weight label predict features
1.000000 1.000000 1 1.0 254 127
1.000000 1.000000 2 2.0 240 103
1.000000 1.000000 4 4.0 960 42
1.000000 1.000000 8 8.0 172 199
1.000000 1.000000 16 16.0 645 184
1.000000 1.000000 32 32.0 268 236
1.000000 1.000000 64 64.0 207 215
1.000000 1.000000 128 128.0 117 118
1.000000 1.000000 256 256.0 777 62
1.000000 1.000000 512 512.0 900 216
1.000000 1.000000 1024 1024.0 852 446
1.000000 1.000000 2048 2048.0 389 190
1.000000 1.000000 4096 4096.0 545 152
1.000000 1.000000 8192 8192.0 603 83
1.000244 1.000488 16384 16384.0 730 101
0.996429 0.996429 32768 32768.0 917 124 h
0.994506 0.992584 65536 65536.0 977 89 h
0.991005 0.987503 131072 131072.0 150 68 h
0.984825 0.978646 262144 262144.0 29 101 h
0.976190 0.967556 524288 524288.0 804 108 h

finished run
number of examples per pass = 24300
passes used = 31
weighted example sum = 753300.000000
weighted label sum = 0.000000
average loss = 0.958889 h
total feature number = 103905211

VW 9.7

PLT k = 1013
kary_tree = 2
creating cache_file = train.vw.cache
Reading datafile = train.vw
num sources = 1
Num weight bits = 30
learning rate = 0.5
initial_t = 0
power_t = 0.5
decay_learning_rate = 1
Enabled reductions: gd, scorer-identity, plt
Input label = MULTILABEL
Output pred = MULTILABELS
average since example example current current current
loss last counter weight label predict features
14.55609 14.55609 1 1.0 382 81
15.16680 15.77752 2 2.0 879 146
14.76802 14.36924 4 4.0 283 65
17.27986 19.79169 8 8.0 930 119
16.27946 15.27906 16 16.0 863 113
15.91260 15.54575 32 32.0 377 185
15.79295 15.67330 64 64.0 684 102
16.23921 16.68547 128 128.0 226 66
16.49474 16.75027 256 256.0 845 89
16.48981 16.48488 512 512.0 935 56
16.35399 16.21816 1024 1024.0 0 149
15.75564 15.15729 2048 2048.0 365 101
15.00027 14.24491 4096 4096.0 471 95
14.03309 13.06592 8192 8192.0 138 141
12.93321 11.83332 16384 16384.0 925 104
0.000000 0.000000 32768 32768.0 95 79 h
0.000000 0.000000 65536 65536.0 860 85 h

finished run
number of examples per pass = 24300
passes used = 4
weighted example sum = 97200.000000
weighted label sum = 0.000000
average loss = 0.000000 h
total feature number = 13414196

How to reproduce

import os
from vowpalwabbit import Workspace
import pandas as pd
import gc
from sklearn.metrics import precision_score, recall_score, f1_score


def to_vw_format(text: str, label=None) -> str:
    if label is None:
        label = ''
    return f'{label} |text {text}'


def evaluate(labels: pd.DataFrame):
    print(f"precision: {precision_score(labels['target'], labels['pred'], average='weighted', zero_division=0)}")
    print(f"recall: {recall_score(labels['target'], labels['pred'], average='weighted', zero_division=0)}")
    print(f"f1: {f1_score(labels['target'], labels['pred'], average='weighted', zero_division=0)}")

    # Fraction of examples whose predicted label matches the target
    accuracy = (labels['pred'] == labels['target']).mean()
    print(f"accuracy: {accuracy}")


# Load the attached sample dataset (saved locally)
data = pd.read_csv('reddit_data_sample_local.csv')

target_var = 'target'
training_var = 'text'

cleaned_targets = data[target_var].dropna()
unique_labels = cleaned_targets.unique().tolist()
num_classes = len(unique_labels)
numbers = list(range(num_classes))
mapping = dict(zip(unique_labels, numbers))

training_data = data.sample(frac=0.9, random_state=25)
testing_data = data.drop(training_data.index)

training_data = training_data.dropna(subset=[training_var]).sample(frac=1).reset_index(drop=True)

vw_training_file_name = 'train.vw'

# Write the training set in VW text format (one "label |text ..." line per example)
with open(vw_training_file_name, "wb") as f:
    for text, label in zip(training_data[training_var], training_data[target_var]):
        vw_label = mapping[label]
        example = to_vw_format(text, vw_label) + ' \n'
        f.write(example.encode())

os.makedirs('model', exist_ok=True)

params = {
    'loss_function': 'logistic',
    'data': vw_training_file_name,
    'c': True,                    # use a cache file (required for multiple passes)
    'k': True,                    # overwrite any existing cache file
    'f': 'model/model.vw',        # write the final model here
    'compressed': True,
    'plt': num_classes,           # PLT reduction over num_classes labels
    'b': 30,                      # 30 bits in the weight table
    'passes': 50,
    'example_queue_limit': 256,
    'learning_rate': 0.5,
}

model = Workspace(**params)

model.finish()

del model
gc.collect()

# Reload the trained model for prediction only
params = {
    'loss_function': 'logistic',
    'predict_only_model': True,
    'i': 'model/model.vw',        # initial model to load
    'top_k': 100,                 # number of labels returned per prediction
}
model = Workspace(**params)

# Predict on the held-out split; predict() returns a list of labels, take the first one
predictions = testing_data.text.apply(lambda x: model.predict(to_vw_format(x))[0]).rename('pred')
evaluate(pd.concat([predictions, testing_data.target.apply(lambda x: mapping[x])], axis=1))

del model
gc.collect()
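
For reference, each line the script writes to train.vw has the plain VW text format produced by to_vw_format, i.e. the mapped label index followed by the text namespace. The label 42 and the post text below are made-up placeholders:

42 |text an example reddit post about cooking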

Version

9.7

OS

Linux

Language

Python

Additional context

No response

FabianKaiser added the Bug label Feb 27, 2023
mwydmuch commented Mar 6, 2023

Hello @FabianKaiser, thank you for reporting the issue and providing the code to replicate it. I will look into this by the end of this week.

mwydmuch commented Mar 16, 2023

Hi @FabianKaiser, I checked and there is no bug. The thing is that the PLT reduction in 9.7 no longer returns a loss when predicting (in 9.6 the returned value was not correct anyway), and since you use PLT with a holdout dataset, you get a loss of 0 on that set, which triggers premature early stopping of the training. If you add 'holdout_off': True to your training params, the training will proceed as in 9.6, resulting in a similar model.
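
With the training params from the reproduction script above, the workaround is a one-line addition (a minimal sketch; only the 'holdout_off' entry is new, everything else is copied from the script):

params = {
    'loss_function': 'logistic',
    'data': vw_training_file_name,
    'c': True,
    'k': True,
    'f': 'model/model.vw',
    'compressed': True,
    'plt': num_classes,
    'b': 30,
    'passes': 50,
    'example_queue_limit': 256,
    'learning_rate': 0.5,
    'holdout_off': True,  # disable the holdout set so a 0 holdout loss cannot stop training early
}
model = Workspace(**params)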

@jackgerrits There are two possible solutions to this problem: one is to disable the holdout dataset for the PLT reduction (not sure if that is possible). Alternatively, I can add a loss calculation to the predict method that would basically mimic the learn step without updating the base classifiers.

jackgerrits commented

Closing this as it should be resolved by #4534
