Fix alibi #222
Conversation
if not hasattr(self, "logged_alibi"):
    logger.debug("Using Alibi.")
    self.logged_alibi = True
Testing purposes.
Just for information: on my side I was going to propose isolating the three ways of calculating the score according to the positional embedding, by creating a method for each positional-embedding type and then wrapping these methods in log_debug_usage, so we get a log (as for the activation functions) that can be detected in the tests.
The minor advantage is that it would also let us test rotary and absolute embeddings (in all cases), but I don't mind if you think it's easier to keep it the way you did.
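For concreteness, here is a rough sketch of that idea: one scoring method per positional-embedding type, each wrapped in a log_debug_usage-style decorator so the tests can detect which path ran. The decorator, class, and method names below are illustrative stand-ins, not the repo's actual API.

```python
import functools
import logging

logger = logging.getLogger(__name__)


def log_debug_usage(logger, msg):
    """Emit `msg` at DEBUG level the first time the wrapped function is called."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if not wrapper._logged:  # log only once, like the hasattr check above
                logger.debug(msg)
                wrapper._logged = True
            return func(*args, **kwargs)
        wrapper._logged = False
        return wrapper
    return decorator


class ParallelAttentionSketch:
    @log_debug_usage(logger, "Using absolute position embeddings.")
    def _score_absolute(self, query_layer, key_layer):
        ...

    @log_debug_usage(logger, "Using rotary position embeddings.")
    def _score_rotary(self, query_layer, key_layer):
        ...

    @log_debug_usage(logger, "Using Alibi.")
    def _score_alibi(self, query_layer, key_layer):
        ...
```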
Yeah, it's true that your version looks nice. I just think it's the wrong abstraction. There's no reason to group all the positional embeddings together (they get applied in different places, they do different things, and they have different constraints, despite sharing the common purpose of providing sequential information). One could argue that using a pure causal mask is a position embedding mechanism.
What I was thinking of was abstracting only the alibi computation into a separate function so it can use the pretty @log_debug_usage(logger, msg) decorator, but I was lazy ^^'
using a pure causal mask is a position embedding mechanism
I agree with this statement :) Models that don't have any position embeddings (like sinusoidal or learned or alibi) are actually able to achieve good (but not great) PPL because the causal mask encodes some kind of order.
Thanks a lot for the fix! 🎊
Can we also add a test in test_model.py in order to test alibi when we use it without megatron?
Not super clear on what you mean by …
Oh sorry, I meant …
Ah, makes sense ... I'll try something out.
Okay, after thinking about it, I think we need to create a test so that all the models we have match their deepspeed version. Basically, the only reason I see for adding this test is that our EAI evaluation #212 uses the Megatron-LM version of the models, so it applies to all the models we have. I don't think we should implement that test in this PR; I'll open an issue to create a test for all models. #226
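For illustration, a minimal sketch of what such a Megatron-vs-DeepSpeed parity test might look like. All helper names (get_default_args, build_megatron_gpt, build_deepspeed_gpt, make_dummy_batch) are placeholders, not existing utilities in the repo.

```python
import torch


def test_megatron_matches_deepspeed_version():
    # Build the same architecture twice from the same seed: once through the
    # plain Megatron path and once through the DeepSpeed path (placeholders).
    args = get_default_args(position_embedding_type="alibi")
    torch.manual_seed(0)
    megatron_model = build_megatron_gpt(args)
    torch.manual_seed(0)
    deepspeed_model = build_deepspeed_gpt(args)

    # Identical dummy batch for both models (placeholder helper).
    tokens, position_ids, attention_mask = make_dummy_batch(args)
    with torch.no_grad():
        megatron_logits = megatron_model(tokens, position_ids, attention_mask)
        deepspeed_logits = deepspeed_model(tokens, position_ids, attention_mask)

    # The two paths should produce the same outputs.
    torch.testing.assert_close(megatron_logits, deepspeed_logits)
```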
Following the latest offline discussions: if we ever have to run the evaluation on the Megatron version without deepspeed, we will add a test to verify that the architectures are identical.
In the meantime, I have full confidence in this fix and I think we can move forward and relaunch a training run.
Thanks again Thomas! 🚀
@@ -470,9 +486,19 @@ def __init__(self, init_method, output_layer_init_method,
        self.mlp = ParallelMLP(init_method,
                               output_layer_init_method)

        # Alibi
        if args.position_embedding_type == PositionEmbeddingType.alibi:
            self.alibi = self._build_alibi_tensor(args.seq_length, args.num_attention_heads, args.micro_batch_size).to(torch.cuda.current_device())
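For context on the call above, here is roughly what an alibi-tensor builder does, following the ALiBi paper: compute one slope per attention head, multiply it by the relative positions, and tile the result across the micro batch. This is a sketch of the general recipe, not necessarily the exact implementation in this PR.

```python
import math

import torch


def build_alibi_tensor(seq_length, num_heads, batch_size):
    """Return an alibi bias of shape [batch_size * num_heads, 1, seq_length]."""
    def get_slopes(n):
        # Geometric sequence of per-head slopes: exact for powers of two,
        # interpolated from the two nearest powers of two otherwise.
        start = 2 ** (-(2 ** -(math.log2(n) - 3)))
        if math.log2(n).is_integer():
            return [start * start ** i for i in range(n)]
        closest = 2 ** math.floor(math.log2(n))
        return get_slopes(closest) + get_slopes(2 * closest)[0::2][: n - closest]

    slopes = torch.tensor(get_slopes(num_heads))
    # Relative positions 0..seq_length-1 scaled by each head's slope:
    # shape [num_heads, 1, seq_length].
    alibi = slopes.unsqueeze(1).unsqueeze(1) * torch.arange(seq_length).unsqueeze(0).unsqueeze(0)
    # Tile over the batch so the first dimension matches the attention-score layout.
    return alibi.repeat(batch_size, 1, 1)
```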
Does args.micro_batch_size work with batch size rampup? And if not, do we care?
The micro batch size doesn't increase during batch size rampup; it's constant.
We should care, since all our experiments run with batch size rampup, but I would expect it to crash loudly if it didn't match.
Have we tested that we get the same results with and without tensor parallelism? Correct me if I'm wrong, but I think the attention heads are distributed across workers, since self.query_key_value is ColumnParallel. We might need to extract the correct part of the alibi tensor depending on the tensor-parallel index, since different attention heads should have different alibi slopes? i.e. …
might be wrong?
That's actually a great point. Skimming through it, I feel you're completely right! Though we are lucky, as we're using TP=1 in our experiments. If I understand correctly, you believe … If so, I would suggest you open an issue/PR to introduce such a test, and if it fails, fix it?
Yup, exactly: there should be some offset depending on the tp_rank, and possibly also a stride, since the batch_size is baked into the first dimension as well. I've opened an issue for it here: #227
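To make the concern concrete, here is a hypothetical sketch of the kind of slicing tracked in #227: each tensor-parallel rank keeps only the alibi rows for the heads it owns. The shape assumptions follow the builder sketch above, and the function and argument names are illustrative.

```python
def slice_alibi_for_tp_rank(alibi, num_heads, batch_size, tp_rank, tp_world_size):
    """Keep only this rank's heads from an alibi tensor of shape
    [batch_size * num_heads, 1, seq_length], where heads vary fastest."""
    heads_per_rank = num_heads // tp_world_size
    seq_length = alibi.shape[-1]
    # Undo the batch tiling so the head dimension is explicit.
    alibi = alibi.view(batch_size, num_heads, 1, seq_length)
    start = tp_rank * heads_per_rank
    local = alibi[:, start:start + heads_per_rank]
    # Re-flatten to the layout the attention scores expect.
    return local.reshape(batch_size * heads_per_rank, 1, seq_length)
```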