[Tensorboard] Log text prediction in evaluation #163

Open
thomasw21 opened this issue Oct 27, 2021 · 14 comments
Labels
Good First Issue (Good for newcomers)

Comments

@thomasw21
Member

A very useful tool for understanding model performance beyond the loss: actually showing what the predictions are.

It'd be very useful to be able to "see" the output of the model during evaluation in text format. These should be logged to TensorBoard. TensorBoard likely supports markdown, so we could put the predictions in bold.

Maybe we can print out only the first batch, as we should get a good number of examples from it.
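A minimal sketch of what such logging could look like with PyTorch's `torch.utils.tensorboard` writer; the `log_predictions` helper and the bold-prediction markdown convention are illustrative assumptions, not existing Meg-DS code:

```python
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="runs/eval")

def log_predictions(prompts, predictions, step):
    # TensorBoard's text dashboard renders markdown, so the model's
    # prediction can be set off from the prompt in bold.
    lines = [f"{p} **{pred}**" for p, pred in zip(prompts, predictions)]
    writer.add_text("eval/predictions", "\n\n".join(lines), global_step=step)

log_predictions(["The capital of France is"], ["Paris."], step=0)
writer.flush()
```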

@thomasw21 thomasw21 added the Good First Issue label Oct 27, 2021
@thomasw21
Member Author

@TevenLeScao also suggested that we make inference work in Meg-DS: a very simple greedy search. The motivation is that teacher forcing won't tell us much about the model (it's very similar to validation loss), whereas greedy search will show what the model actually infers.

Personally I don't agree with the statement that teacher forcing won't tell us much, but I do agree that running actual inference in Meg-DS will probably allow us to notice bugs very quickly.
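For reference, greedy search here just means repeatedly taking the argmax over the next-token logits. A minimal sketch, assuming a causal LM whose forward pass maps `input_ids` of shape `(batch, seq)` to logits of shape `(batch, seq, vocab)` (no KV cache, for clarity):

```python
import torch

@torch.no_grad()
def greedy_generate(model, input_ids, max_new_tokens=32, eos_token_id=None):
    # input_ids: (batch, seq) prompt tokens.
    for _ in range(max_new_tokens):
        logits = model(input_ids)  # (batch, seq, vocab); full re-forward each step
        # Greedy step: take the most likely token at the last position.
        next_token = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        input_ids = torch.cat([input_ids, next_token], dim=-1)
        if eos_token_id is not None and (next_token == eos_token_id).all():
            break
    return input_ids
```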

@KMFODA

KMFODA commented Aug 10, 2022

Hey @thomasw21. Is this still needed? If so, I'd love to take it on.

@thomasw21
Member Author

Hey! We have finished training BLOOM, so the TensorBoard integration might not be required anymore. However, I think having a generation engine in Meg-DS would be greatly appreciated, as we currently rely on our transformers-converted checkpoints to generate.

@KMFODA

KMFODA commented Aug 15, 2022

I see. I'd like to help with that then. Where would be the best place for having that generation engine?

@mayank31398
Collaborator

@KMFODA @thomasw21, see #328.
Already added the ability to benchmark the system, an interactive CLI, and a generation server.
Testing a few things.
Will try to get this merged this week.

@thomasw21
Member Author

> Already added the ability to benchmark the system, an interactive CLI, and a generation server.

IMO this issue is different: we want an inference mechanism in Meg-DS, without having to convert to transformers. The context was that we were training in Meg-DS and had no way to "test" the model until we had built the transformers skeleton, converted the weights, and then leveraged transformers' inference mechanisms.

> Where would be the best place for having that generation engine?

I'm not sure what you're asking; probably in this repo?

@KMFODA

KMFODA commented Aug 16, 2022

> Where would be the best place for having that generation engine?
> I'm not sure what you're asking; probably in this repo?

Sorry, I'm new to this repo. I meant to ask: where in the repo itself should this generation engine live?

@mayank31398
Collaborator

Hmm, @thomasw21: the PR I referred to above uses both the HF accelerate and DS-inference libraries, depending on what we want to infer with. But it does require the transformers version of BLOOM.
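For context, that transformers-dependent path looks roughly like the following (a hedged sketch: the model id is illustrative, and `deepspeed.init_inference` arguments vary across DeepSpeed versions):

```python
import deepspeed
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-560m")
model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-560m", torch_dtype=torch.half
)

# Swap in DeepSpeed's fused inference kernels for the HF module.
engine = deepspeed.init_inference(
    model, mp_size=1, dtype=torch.half, replace_with_kernel_inject=True
)
model = engine.module

inputs = tokenizer("DeepSpeed is", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note that this path starts from a transformers checkpoint either way, which is exactly the dependency this issue wants to remove.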

@mayank31398
Collaborator

@KMFODA currently, I am planning to create a standalone library. For now, I am adding to this repo itself.

@thomasw21
Member Author

> Sorry, I'm new to this repo. I meant to ask: where in the repo itself should this generation engine live?

I mean you can probably create a `megatron/inference` folder.

@mayank31398
Collaborator

@thomasw21, I am not sure how this differs from the PR I pointed to above ^^. Can you explain?

@thomasw21
Member Author

If you don't have the transformers skeleton (i.e. the modeling code), how would one be able to use transformers or DS-inference?

@mayank31398
Collaborator

Oh, I think I understand the issue now. Maybe something like loading from the universal checkpoints and running inference, etc.?

@thomasw21
Member Author

@mayank31398 yup! Essentially this is what this issue is about.

adammoody pushed a commit to adammoody/Megatron-DeepSpeed that referenced this issue Aug 16, 2023
* support flash_attn

* update

* update