
Test metrics not logging to Comet after training #760

Closed
fdelrio89 opened this issue Jan 28, 2020 · 10 comments

@fdelrio89
Contributor

🐛 Bug

When testing a model with Trainer.test, metrics are not logged to Comet if the model was previously trained with Trainer.fit. Training metrics, on the other hand, are logged correctly.

Code sample

    comet_logger = CometLogger()
    trainer = Trainer(logger=comet_logger)
    model = get_model()

    trainer.fit(model) # Metrics are logged to Comet
    trainer.test(model) # No metrics are logged to Comet

Expected behavior

Test metrics should also be logged to Comet.

Environment

PyTorch version: 1.3.0
Is debug build: No
CUDA used to build PyTorch: 10.1.243

OS: Ubuntu 18.04.3 LTS
GCC version: (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
CMake version: version 3.10.2

Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: 10.1.168
GPU models and configuration:
GPU 0: GeForce GTX 1080 Ti
GPU 1: GeForce GTX 1080 Ti
GPU 2: GeForce GTX 1080 Ti
GPU 3: GeForce GTX 1080 Ti
GPU 4: GeForce GTX 1080 Ti
GPU 5: GeForce GTX 1080 Ti
GPU 6: GeForce GTX 1080 Ti
GPU 7: GeForce GTX 1080 Ti

Nvidia driver version: 418.67
cuDNN version: /usr/local/cuda-10.1/targets/x86_64-linux/lib/libcudnn.so.7.6.1

Versions of relevant libraries:
[pip3] numpy==1.16.4
[pip3] pytorch-lightning==0.6.0
[pip3] torch==1.3.0
[pip3] torchvision==0.4.1
[conda] Could not collect

Additional context

I believe the issue is caused by the fact that, at the end of the training routine, logger.finalize("success") is called. This in turn calls experiment.end() inside the logger, and the Experiment object doesn't expect to send any more information after that.
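For illustration, a minimal sketch of that sequence using the Comet SDK directly (the API key and project name are just placeholders):

    from comet_ml import Experiment

    exp = Experiment(api_key="<API_KEY>", project_name="demo")
    exp.log_metric("train_loss", 0.5)  # sent to Comet while the experiment is live

    exp.end()  # this is what CometLogger.finalize("success") ends up calling

    exp.log_metric("test_loss", 0.4)   # the experiment has already ended,
                                       # so this no longer reaches Comet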

An alternative is to create another Trainer object with another logger, but this means the metrics will be logged into a different Comet experiment from the original one. That can be avoided using the ExistingExperiment object from the Comet SDK, but the solution seems a little hacky, and the CometLogger currently doesn't support this kind of experiment.
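At the SDK level, resuming a run looks roughly like this (again with placeholder values; this assumes the ExistingExperiment class and its previous_experiment argument):

    from comet_ml import Experiment, ExistingExperiment

    exp = Experiment(api_key="<API_KEY>", project_name="demo")
    key = exp.get_key()  # remember the experiment key before it is ended
    exp.end()

    # Reattach to the same Comet experiment and keep logging into it
    resumed = ExistingExperiment(api_key="<API_KEY>", previous_experiment=key)
    resumed.log_metric("test_loss", 0.42)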

fdelrio89 added the bug label Jan 28, 2020
@williamFalcon
Contributor

Did you find a solution?
Mind submitting a PR?
@fdelrio89

williamFalcon added this to the 0.6.1 milestone Feb 11, 2020
@fdelrio89
Contributor Author

I did solve the issue, but in a kind of hacky way. It's not very elegant, but it works for me, and I haven't had the time to think of a better solution.

I solved it by getting the experiment key and creating another logger and trainer with it.

    comet_logger = CometLogger()
    trainer = Trainer(logger=comet_logger)
    model = get_model()

    trainer.fit(model)

    experiment_key = comet_logger.experiment.get_key()
    comet_logger = CometLogger(experiment_key=experiment_key)
    trainer = Trainer(logger=comet_logger)

    trainer.test(model)

For this to work, I had to modify the CometLogger class to accept an experiment_key and create a CometExistingExperiment from the Comet SDK when this parameter is present.

# Assumes the usual Comet SDK aliases, e.g.
# from comet_ml import Experiment as CometExperiment
# from comet_ml import ExistingExperiment as CometExistingExperiment

class CometLogger(LightningLoggerBase):
    ...

    @property
    def experiment(self):
        ...

        if self.mode == "online":
            if self.experiment_key is None:
                # no experiment_key given: start a brand new Comet experiment
                self._experiment = CometExperiment(
                    api_key=self.api_key,
                    workspace=self.workspace,
                    project_name=self.project_name,
                    **self._kwargs
                )
            else:
                # experiment_key given: resume the existing Comet experiment
                self._experiment = CometExistingExperiment(
                    api_key=self.api_key,
                    workspace=self.workspace,
                    project_name=self.project_name,
                    previous_experiment=self.experiment_key,
                    **self._kwargs
                )
        else:
            ...

        return self._experiment

I can happily do the PR if this solution is acceptable to you, but I think a better solution could be achieved; I just haven't had the time to think about it. @williamFalcon

@xssChauhan
Contributor

xssChauhan commented Feb 17, 2020

@williamFalcon Any progress on this issue? I'm facing the same problem.

@xssChauhan
Contributor

@fdelrio89 Since the logger object is available for the lifetime of the trainer, maybe you can refactor to store the experiment_key directly in the logger object itself, instead of having to re-instantiate the logger.
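Something along these lines, for example (a hypothetical subclass just to sketch the idea, assuming the modified experiment property from the snippet above; not an actual fix):

    from pytorch_lightning.loggers import CometLogger  # pytorch_lightning.logging in older releases

    class KeyPreservingCometLogger(CometLogger):
        def finalize(self, status):
            # remember the key before the underlying experiment is ended
            self.experiment_key = self.experiment.get_key()
            super().finalize(status)  # this ends the current comet_ml Experiment
            self._experiment = None   # so the next access to `experiment` can rebuild it,
                                      # now as an ExistingExperiment with the stored key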

@fdelrio89
Contributor Author

@xssChauhan good idea, I just submitted a PR (#892) considering this. Thanks!

@Borda
Member

Borda commented Feb 26, 2020

I assume this was fixed by #892.
If you have any other problems, feel free to reopen or create a new issue... 🤖

@dvirginz

Actually I'm still facing the problem.

@Borda
Member

Borda commented Apr 19, 2020

@dvirginz Are you using the latest master? Could you provide a minimal example?

@dvirginz

@dvirginz Are you using the latest master? Could you provide a minimal example?

You are right, sorry.
After building from source, it works.

@tejasvi
Contributor

tejasvi commented Jul 14, 2020

I should probably open a new issue, but this happens with the Weights & Biases logger too. I haven't had time to dig into it yet.
