
When I use the model's generate method within @tf.function, it encounters an error #33241

Open · HelloWorldU opened this issue Sep 2, 2024 · 5 comments

@HelloWorldU
System Info

transformers version: 4.43.3
Platform: Linux-5.15.0-105-generic-x86_64-with-glibc2.17
Python version: 3.8.19
Huggingface_hub version: 0.24.5
Safetensors version: 0.4.4
Accelerate version: not installed
Accelerate config: not found
PyTorch version (GPU?): not installed (NA)
Tensorflow version (GPU?): 2.7.0 (True)
Flax version (CPU?/GPU?/TPU?): not installed (NA)
Jax version: not installed
JaxLib version: not installed
Using distributed or parallel set-up in script?: Yes, using TensorFlow MirroredStrategy for distributed training.

Who can help?

@ArthurZucker The original issue is #33329, thanks a lot.

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

```python
def train_generator_step(self, input_ids, attention_mask, labels, styles, max_len, step,
                         accumulation_steps=4, lambda_rec=1.0, lambda_lm=1.0, lambda_adv=1.0,
                         lambda_kl=1.0, gamma=1.0):

    max_len = tf.constant(max_len, dtype=tf.int32)
    max_len_value = max_len
    seq_len = input_ids.shape[2]
    max_new_tokens = tf.maximum(max_len_value - seq_len - 10, 1)
    max_new_tokens = tf.cast(max_new_tokens, tf.int32)
    max_new_tokens = tf.maximum(max_new_tokens, 1)
    max_new_tokens = tf.constant(max_new_tokens, dtype=tf.int32)

    @tf.function
    def step_fn(input_ids=input_ids, attention_mask=attention_mask, labels=labels, styles=styles,
                accumulation_steps=accumulation_steps, lambda_rec=lambda_rec, lambda_lm=lambda_lm,
                lambda_adv=lambda_adv, lambda_kl=lambda_kl, gamma=gamma):
        with tf.GradientTape() as tape:
            tf.debugging.enable_check_numerics()

            accumulation_steps, lambda_rec, lambda_lm, lambda_adv, lambda_kl, gamma = pr.conv_tensor_to_float(
                accumulation_steps, lambda_rec, lambda_lm, lambda_adv, lambda_kl, gamma)

            epsilon = 1e-6  # quick fix

            # First, reshape the input.
            actual_shape = tf.shape(input_ids)
            input_ids = tf.reshape(input_ids, (actual_shape[0] * actual_shape[1], actual_shape[2]))
            attention_mask = tf.reshape(attention_mask, (actual_shape[0] * actual_shape[1], actual_shape[2]))

            # Then repeat styles and labels.
            styles = tf.repeat(styles, repeats=actual_shape[0])
            labels = tf.repeat(labels, repeats=actual_shape[0], axis=0)

            # Embed the style labels.
            style_embeddings = self.embedding(styles)  # [num_devices * batch_size, n_embd]
            print("Style embeddings shape:", style_embeddings.shape)  # Debug info

            # Embed the input IDs into the same embedding space.
            input_embeddings = self.gen.transformer.wte(input_ids)  # [num_devices * batch_size, seq_len, n_embd]
            print("Input embeddings shape:", input_embeddings.shape)  # Debug info

            extended_input_embeddings = input_embeddings + tf.expand_dims(style_embeddings, axis=1)
            print("Extended embeddings shape:", extended_input_embeddings.shape)  # Debug info

            input_ids, attention_mask, labels, styles = dis.convert_tensor(input_ids, attention_mask, labels, styles)

            outputs = self.gen(input_ids=input_ids, attention_mask=attention_mask, training=True)
            logits = outputs.logits
            print("Logits shape:", logits.shape)  # Debug info
            print("Logits dtype:", logits.dtype)  # Debug info
            print("labels shape:", labels.shape)  # Debug info

            loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True, reduction='none')
            mask = tf.cast(labels != -100, logits.dtype)
            print("Mask shape:", mask.shape)  # Debug info
            print("Mask dtype:", mask.dtype)  # Debug info

            # Check for NaN or Inf in logits
            tf.debugging.check_numerics(logits, "Logits contain NaN or Inf")

            # Individual losses
            rec_loss = loss_fn(tf.where(labels == -100, tf.zeros_like(labels, dtype=logits.dtype), tf.cast(labels, logits.dtype)), logits)
            rec_loss = tf.reduce_sum(rec_loss * mask) / (tf.reduce_sum(mask) + epsilon)
            rec_loss = tf.cast(rec_loss, tf.float32)
            print("Reconstruction loss:", rec_loss)  # Debug info

            for var in self.gen.trainable_variables:
                tf.debugging.check_numerics(var, message="Model weight check")

            print("Input shapes:",
                  "input_ids:", input_ids.shape,
                  "input_ids dtype:", input_ids.dtype,
                  "attention_mask:", attention_mask.shape,
                  "labels:", labels.shape,
                  "styles:", styles.shape,
                  "max_len_value:", max_len_value)

            new_shape = tf.shape(input_ids)
            print("New shape:", new_shape)  # Debug info
            print("Seq len:", seq_len)  # Debug info

            # max_new_tokens = tf.maximum(max_new_tokens, 1)
            print("Max length:", max_len_value)  # Debug info
            print("Max new tokens:", max_new_tokens)  # Debug info
            if isinstance(max_new_tokens, tf.Tensor):
                print("Max new tokens:", tf.get_static_value(max_new_tokens))  # Debug info
            batch_size = new_shape[0]

            # Extend input_ids: here we need to pad to [batch_size, max_new_tokens].
            padding = tf.zeros((batch_size, max_new_tokens), dtype=input_ids.dtype)
            print("Padding shape:", padding.shape)  # Debug info

            extended_input_ids = tf.concat([input_ids, padding], axis=1)
            extended_attention_mask1 = tf.concat([attention_mask, tf.zeros((tf.shape(attention_mask)[0],
                                                  max_new_tokens), dtype=attention_mask.dtype)], axis=1)

            extended_input_ids = tf.cast(extended_input_ids, tf.int32)
            extended_attention_mask1 = tf.cast(extended_attention_mask1, tf.float32)

            print(f"Extended input_ids shape: {extended_input_ids.shape}")
            print(f"Extended attention_mask shape: {extended_attention_mask1.shape}")

            # Ensure the maximum length is greater than the minimum length.
            max_length = max_len_value + max_new_tokens
            min_length = 1
            # tf.print("max_length:", max_length)
            # tf.print("min_length:", min_length)

            tf.debugging.assert_greater(max_length, min_length, message=f"max_length ({max_length}) must be greater than min_length ({min_length})")

            pad_token_id = int(self.tokenizer.pad_token_id)
            eos_token_id = int(self.tokenizer.eos_token_id)
            bos_token_id = int(self.tokenizer.bos_token_id)

            try:
                generated_ids = self.gen.generate(
                    extended_input_ids,
                    attention_mask=extended_attention_mask1,
                    max_new_tokens=max_new_tokens,
                    pad_token_id=pad_token_id,
                    eos_token_id=eos_token_id,
                    bos_token_id=bos_token_id,
                    # use_cache=True,
                    # num_beams=1,  # greedy search
                    do_sample=False,  # no sampling
                    # temperature=1.0,  # reduce randomness
                )
                print("Generation successful. Generated IDs shape:", generated_ids.shape)

            except Exception as e:
                print(f"Error during generation: {e}")
                print(f"input_ids shape: {input_ids.shape}")
                print(f"attention_mask shape: {attention_mask.shape}")
                print(f"max_len_value: {max_len_value}")
                raise
```
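For what it's worth, the failure seems to reduce to passing a symbolic tf.Tensor as max_new_tokens inside a @tf.function: the raise at configuration_utils.py line 544 in the traceback below suggests the generation-config validation compares the value against 0 at the Python level, which misfires once the value is a graph tensor. A minimal sketch of the pattern (the model name is taken from the discussion below; the dummy token IDs and loading details are illustrative, not the exact training setup):

```python
import tensorflow as tf
from transformers import TFGPT2LMHeadModel

# Model from the discussion below; loading details are illustrative.
model = TFGPT2LMHeadModel.from_pretrained("uer/gpt2-chinese-cluecorpussmall")

input_ids = tf.constant([[101, 102, 103]], dtype=tf.int32)  # dummy token IDs

@tf.function
def broken_step(max_new_tokens):
    # Inside the traced function, max_new_tokens is a symbolic int32 tensor,
    # so the generation-config validation rejects it despite the value 43.
    return model.generate(input_ids, max_new_tokens=max_new_tokens)

broken_step(tf.constant(43, dtype=tf.int32))
# -> ValueError: max_new_tokens must be greater than 0, but is 43.
```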

"""
Relevant Message
"""

Training gen
Style embeddings shape: (4, 768)
Input embeddings shape: (4, 72, 768)
Extended embeddings shape: (4, 72, 768)
Logits shape: (4, 72, 21128)
Logits dtype: <dtype: 'float16'>
labels shape: (4, 72)
Mask shape: (4, 72)
Mask dtype: <dtype: 'float16'>
Reconstruction loss: Tensor("Cast_3:0", shape=(), dtype=float32)
Input shapes: input_ids: (4, 72) input_ids dtype: <dtype: 'int32'> attention_mask: (4, 72) labels: (4, 72) styles: (4,) max_len_value: tf.Tensor(125, shape=(), dtype=int32)
New shape: Tensor("Shape_1:0", shape=(2,), dtype=int32)
Seq len: 72
Max length: tf.Tensor(125, shape=(), dtype=int32)
Max new tokens: tf.Tensor(43, shape=(), dtype=int32)
Max new tokens: 43
Padding shape: (4, 43)
Extended input_ids shape: (4, 115)
Extended attention_mask shape: (4, 115)
/root/miniconda3/envs/gpt2-env/lib/python3.8/site-packages/tensorflow/python/autograph/impl/api.py:377: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use and modify the model generation configuration (see https://huggingface.co/docs/transformers/generation_strategies#default-text-generation-configuration )
return py_builtins.overload_of(f)(*args)
Error during generation: max_new_tokens must be greater than 0, but is 43.
input_ids shape: (4, 72)
attention_mask shape: (4, 72)
max_len_value: 125
Traceback (most recent call last):
File "train.py", line 530, in
train_model.train(train_tf_dataset_X, train_tf_dataset_Y, valid_tf_dataset_X, valid_tf_dataset_Y, trainconfig.epochs)
File "train.py", line 306, in train
rec_loss, lm_loss, adv_loss, kl_loss, current_lr, accuracy, total_gen_loss = self.distributed_train_generator_step(
File "train.py", line 138, in distributed_train_generator_step
loss, rec_loss, lm_loss, adv_loss, kl_loss, current_lr, accuracy = self.strategy.run(
File "/root/miniconda3/envs/gpt2-env/lib/python3.8/site-packages/tensorflow/python/distribute/distribute_lib.py", line 1316, in run
return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
File "/root/miniconda3/envs/gpt2-env/lib/python3.8/site-packages/tensorflow/python/distribute/distribute_lib.py", line 2892, in call_for_each_replica
return self._call_for_each_replica(fn, args, kwargs)
File "/root/miniconda3/envs/gpt2-env/lib/python3.8/site-packages/tensorflow/python/distribute/mirrored_strategy.py", line 677, in _call_for_each_replica
return mirrored_run.call_for_each_replica(
File "/root/miniconda3/envs/gpt2-env/lib/python3.8/site-packages/tensorflow/python/distribute/mirrored_run.py", line 104, in call_for_each_replica
return _call_for_each_replica(strategy, fn, args, kwargs)
File "/root/miniconda3/envs/gpt2-env/lib/python3.8/site-packages/tensorflow/python/distribute/mirrored_run.py", line 246, in _call_for_each_replica
coord.join(threads)
File "/root/miniconda3/envs/gpt2-env/lib/python3.8/site-packages/tensorflow/python/training/coordinator.py", line 389, in join
six.reraise(*self._exc_info_to_raise)
File "/root/miniconda3/envs/gpt2-env/lib/python3.8/site-packages/six.py", line 719, in reraise
raise value
File "/root/miniconda3/envs/gpt2-env/lib/python3.8/site-packages/tensorflow/python/training/coordinator.py", line 297, in stop_on_exception
yield
File "/root/miniconda3/envs/gpt2-env/lib/python3.8/site-packages/tensorflow/python/distribute/mirrored_run.py", line 346, in run
self.main_result = self.main_fn(*self.main_args, **self.main_kwargs)
File "/root/miniconda3/envs/gpt2-env/lib/python3.8/site-packages/tensorflow/python/autograph/impl/api.py", line 601, in wrapper
return func(*args, **kwargs)
File "train.py", line 133, in generator_step
loss, rec_loss, lm_loss, adv_loss, kl_loss, current_lr, accuracy, gradients = self.model.train_generator_step(*args, **kwargs)
File "/root/autodl-tmp/model/model.py", line 302, in train_generator_step
step_total_loss, step_rec_loss, step_lm_loss, step_adv_loss, step_kl_loss, step_gradients, step_accuracy = step_fn(
File "/root/miniconda3/envs/gpt2-env/lib/python3.8/site-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/root/miniconda3/envs/gpt2-env/lib/python3.8/site-packages/tensorflow/python/framework/func_graph.py", line 1129, in autograph_handler
raise e.ag_error_metadata.to_exception(e)
ValueError: in user code:

File "/root/autodl-tmp/model/model.py", line 220, in step_fn *
generated_ids = self.gen.generate(
File "/root/miniconda3/envs/gpt2-env/lib/python3.8/site-packages/transformers/generation/tf_utils.py", line 738, in generate *
model_kwargs = generation_config.update(**kwargs) # All unused kwargs must be model kwargs
File "/root/miniconda3/envs/gpt2-env/lib/python3.8/site-packages/transformers/generation/configuration_utils.py", line 1207, in update *
self.validate()
File "/root/miniconda3/envs/gpt2-env/lib/python3.8/site-packages/transformers/generation/configuration_utils.py", line 544, in validate *
raise ValueError(f"max_new_tokens must be greater than 0, but is {self.max_new_tokens}.")

ValueError: max_new_tokens must be greater than 0, but is 43.
"""
Definetely, l occured this mistake within @tf.function, and there is no logical mistake when l dubug my code under eager-excution model, similarly, when i use max_length and min_length paramters, it would be occured to "ValueError: max_length must be greater than min_length, 1 is larger than 128.", like this. But, when l set the paramter"max_new_tokens" as a constant value like 50, it would be fine, l donno what leads this, and debug this for at least 20 times.
"""

Expected behavior

Of course, the value of my variable is dynamic, but I have already defined it outside the graph and passed it in as a parameter. The expected behavior is that 43 is accepted as max_new_tokens, but it reports an error instead.
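Since a constant like 50 works, a plausible workaround is to resolve max_new_tokens to a plain Python int in eager code, before tracing, so generate never sees a symbolic tensor. A minimal sketch under that assumption (make_step_fn is a hypothetical helper, not part of the original code):

```python
import tensorflow as tf

def make_step_fn(model, max_len, seq_len):
    # Hypothetical helper: resolve the token budget eagerly, before tracing,
    # so it is a Python int rather than a tf.Tensor.
    max_new_tokens = max(int(max_len) - int(seq_len) - 10, 1)

    @tf.function
    def step_fn(input_ids, attention_mask):
        # max_new_tokens is captured as a Python constant in the trace, so the
        # generation-config validation sees a plain positive integer.
        return model.generate(
            input_ids,
            attention_mask=attention_mask,
            max_new_tokens=max_new_tokens,
            do_sample=False,
        )

    return step_fn
```

Alternatively, since the debug output above shows that tf.get_static_value(max_new_tokens) already recovers 43, converting that static value with int(...) before building step_fn might achieve the same thing.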

HelloWorldU added the bug label on Sep 2, 2024
@HelloWorldU (Author)

Sorry, the original issue is #33229.

@ArthurZucker (Collaborator)

Hey! Thanks for posting! Which model are you using? Also, one weird thing: train_generator_step should not use generate; in general, generate is for inference!

@HelloWorldU (Author)

> Hey! Thanks for posting! Which model are you using? Also, one weird thing: train_generator_step should not use generate; in general, generate is for inference!

Yeah, I know. I use the uer/gpt2-chinese-cluecorpussmall model as the generator, and I need generate to train my GAN; you can find the complete code in my repository. Basically, I need the generated IDs to train my model.

@ArthurZucker (Collaborator)

I would recommend not using TF in the meantime 😅

@HelloWorldU (Author)

> I would recommend not using TF in the meantime 😅

Anyway, thanks for looking. I didn't mean to provoke you.
