
Test metrics not logging to Comet after training #760

Closed
fdelrio89 opened this issue Jan 28, 2020 · 10 comments

@fdelrio89
Contributor

🐛 Bug

When testing a model with Trainer.test, metrics are not logged to Comet if the model was previously trained with Trainer.fit. Training metrics, on the other hand, are logged correctly.

Code sample

    comet_logger = CometLogger()
    trainer = Trainer(logger=comet_logger)
    model = get_model()

    trainer.fit(model) # Metrics are logged to Comet
    trainer.test(model) # No metrics are logged to Comet

Expected behavior

Test metrics should also be logged to Comet.

Environment

PyTorch version: 1.3.0
Is debug build: No
CUDA used to build PyTorch: 10.1.243

OS: Ubuntu 18.04.3 LTS
GCC version: (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
CMake version: version 3.10.2

Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: 10.1.168
GPU models and configuration:
GPU 0: GeForce GTX 1080 Ti
GPU 1: GeForce GTX 1080 Ti
GPU 2: GeForce GTX 1080 Ti
GPU 3: GeForce GTX 1080 Ti
GPU 4: GeForce GTX 1080 Ti
GPU 5: GeForce GTX 1080 Ti
GPU 6: GeForce GTX 1080 Ti
GPU 7: GeForce GTX 1080 Ti

Nvidia driver version: 418.67
cuDNN version: /usr/local/cuda-10.1/targets/x86_64-linux/lib/libcudnn.so.7.6.1

Versions of relevant libraries:
[pip3] numpy==1.16.4
[pip3] pytorch-lightning==0.6.0
[pip3] torch==1.3.0
[pip3] torchvision==0.4.1
[conda] Could not collect

Additional context

I believe the issue is caused by the fact that, at the end of the training routine, logger.finalize("success") is called. This in turn calls experiment.end() inside the logger, and the Experiment object doesn't expect to send any more information after that.
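For illustration, a minimal sketch of that sequence using the Comet SDK directly (the API key and project name are just placeholders):

    from comet_ml import Experiment

    exp = Experiment(api_key="<API_KEY>", project_name="demo")
    exp.log_metric("train_loss", 0.5)  # sent to Comet while the experiment is live

    exp.end()  # this is what CometLogger.finalize("success") ends up calling

    exp.log_metric("test_loss", 0.4)   # the experiment has already ended,
                                       # so this no longer reaches Comet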

An alternative is to create another Trainer object with another logger, but this means the metrics will be logged into a different Comet experiment from the original one. That can be avoided using the ExistingExperiment object from the Comet SDK, but the solution seems a little hacky, and the CometLogger currently doesn't support this kind of experiment.
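At the SDK level, resuming a run looks roughly like this (again with placeholder values; this assumes the ExistingExperiment class and its previous_experiment argument):

    from comet_ml import Experiment, ExistingExperiment

    exp = Experiment(api_key="<API_KEY>", project_name="demo")
    key = exp.get_key()  # remember the experiment key before it is ended
    exp.end()

    # Reattach to the same Comet experiment and keep logging into it
    resumed = ExistingExperiment(api_key="<API_KEY>", previous_experiment=key)
    resumed.log_metric("test_loss", 0.42)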

fdelrio89 added the bug label Jan 28, 2020
@williamFalcon
Contributor

Did you find a solution?
Mind submitting a PR?
@fdelrio89

williamFalcon added this to the 0.6.1 milestone Feb 11, 2020
@fdelrio89
Contributor Author

I did solve the issue, but in a kind of hacky way. It's not very elegant, but it works for me, and I haven't had the time to think of a better solution.

I solved it by getting the experiment key and creating another logger and trainer with it.

    comet_logger = CometLogger()
    trainer = Trainer(logger=comet_logger)
    model = get_model()

    trainer.fit(model)

    experiment_key = comet_logger.experiment.get_key()
    comet_logger = CometLogger(experiment_key=experiment_key)
    trainer = Trainer(logger=comet_logger)

    trainer.test(model)

For this to work, I had to modify the CometLogger class to accept an experiment_key and create a CometExistingExperiment from the Comet SDK when this parameter is present.

# Assumes the usual Comet SDK aliases, e.g.
# from comet_ml import Experiment as CometExperiment
# from comet_ml import ExistingExperiment as CometExistingExperiment

class CometLogger(LightningLoggerBase):
    ...

    @property
    def experiment(self):
        ...

        if self.mode == "online":
            if self.experiment_key is None:
                # no experiment_key given: start a brand new Comet experiment
                self._experiment = CometExperiment(
                    api_key=self.api_key,
                    workspace=self.workspace,
                    project_name=self.project_name,
                    **self._kwargs
                )
            else:
                # experiment_key given: resume the existing Comet experiment
                self._experiment = CometExistingExperiment(
                    api_key=self.api_key,
                    workspace=self.workspace,
                    project_name=self.project_name,
                    previous_experiment=self.experiment_key,
                    **self._kwargs
                )
        else:
            ...

        return self._experiment

I can happily do the PR if this solution is acceptable to you, but I think a better solution could be achieved; I just haven't had the time to think about it. @williamFalcon

@xssChauhan
Contributor

xssChauhan commented Feb 17, 2020

@williamFalcon Any progress on this issue? I'm facing the same problem.

@xssChauhan
Contributor

@fdelrio89 Since the logger object is available for the lifetime of the trainer, maybe you can refactor to store the experiment_key directly in the logger object itself, instead of having to re-instantiate the logger.
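Something along these lines, for example (a hypothetical subclass just to sketch the idea, assuming the modified experiment property from the snippet above; not an actual fix):

    from pytorch_lightning.loggers import CometLogger  # pytorch_lightning.logging in older releases

    class KeyPreservingCometLogger(CometLogger):
        def finalize(self, status):
            # remember the key before the underlying experiment is ended
            self.experiment_key = self.experiment.get_key()
            super().finalize(status)  # this ends the current comet_ml Experiment
            self._experiment = None   # so the next access to `experiment` can rebuild it,
                                      # now as an ExistingExperiment with the stored key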

@fdelrio89
Contributor Author

@xssChauhan good idea, I just submitted a PR (#892) considering this. Thanks!

@Borda
Member

Borda commented Feb 26, 2020

I assume this was fixed by #892.
If you have any other problems, feel free to reopen or create a new issue... 🤖

@dvirginz

Actually I'm still facing the problem.

@Borda
Member

Borda commented Apr 19, 2020

@dvirginz Are you using the latest master? Could you provide a minimal example?

@dvirginz

@dvirginz Are you using the latest master? Could you provide a minimal example?

You are right, sorry.
After building from source, it works.

@tejasvi
Contributor

tejasvi commented Jul 14, 2020

I should probably open a new issue, but this happens with the Weights & Biases logger too. I haven't had time to dig into it yet.
