Docs: fix indentation in HammingDiversityLogitsProcessor (huggingfa…
gante authored and EduardoPach committed Nov 18, 2023
1 parent cd9ce13 commit 3fd3c66
Showing 1 changed file with 86 additions and 135 deletions: src/transformers/generation/logits_process.py
@@ -1066,155 +1066,106 @@ def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.FloatTensor:

class HammingDiversityLogitsProcessor(LogitsProcessor):
r"""
[`LogitsProcessor`] that enforces diverse beam search.
Note that this logits processor is only effective for [`PreTrainedModel.group_beam_search`]. See [Diverse Beam
Search: Decoding Diverse Solutions from Neural Sequence Models](https://arxiv.org/pdf/1610.02424.pdf) for more
details.
<Tip>
Diverse beam search can be particularly useful in scenarios where a variety of different outputs is desired,
rather than multiple similar sequences. It allows the model to explore different generation paths and provides
a broader coverage of possible outputs.
</Tip>
<Tip warning={true}>
This logits processor can be resource-intensive, especially when using large models or long sequences.
</Tip>
Traditional beam search often generates very similar sequences across different beams.
The `HammingDiversityLogitsProcessor` addresses this by penalizing beams that generate tokens already chosen by
other beams in the same time step.
How It Works:
- **Grouping Beams**: Beams are divided into groups. Each group selects tokens independently of the others.
- **Penalizing Repeated Tokens**: If a beam in a group selects a token already chosen by another group in the
same step, a penalty is applied to that token's score.
- **Promoting Diversity**: This penalty discourages beams within a group from selecting the same tokens as
beams in other groups (see the sketch after this list).
Benefits:
- **Diverse Outputs**: Produces a variety of different sequences.
- **Exploration**: Allows the model to explore different paths.
Args:
diversity_penalty (`float`):
This value is subtracted from a beam's score if it generates a token that has already been chosen by a
beam from another group during the same time step. Note that `diversity_penalty` is only effective if
group beam search is enabled. A higher `diversity_penalty` enforces greater diversity among the beams,
making it less likely for multiple beams to choose the same token. Conversely, a lower penalty allows
beams to choose similar tokens more freely. Adjusting this value can help strike a balance between
diversity and natural likelihood.
num_beams (`int`):
Number of beams used for group beam search. Beam search is a method that maintains beams (or "multiple
hypotheses") at each step, expanding each one and keeping the top-scoring sequences. A higher `num_beams`
will explore more potential sequences. This can increase the chance of finding a high-quality output but
also increases computational cost.
num_beam_groups (`int`):
Number of groups to divide `num_beams` into in order to ensure diversity among different groups of beams.
Each group of beams will operate independently, selecting tokens without considering the choices of other
groups. This division promotes diversity by ensuring that beams within different groups explore different
paths. For instance, if `num_beams` is 6 and `num_beam_groups` is 2, there will be 2 groups each containing
3 beams (see the sketch after this argument list). The choice of `num_beam_groups` should be made
considering the desired level of output diversity and the total number of beams. See [this
paper](https://arxiv.org/pdf/1610.02424.pdf) for more details.
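To make the grouping arithmetic concrete, here is a minimal sketch; the loop is purely illustrative and not a library API. `generate` expects `num_beams` to be divisible by `num_beam_groups`:
```python
>>> # With num_beams=6 and num_beam_groups=2, beams 0-2 form group 0 and beams 3-5
>>> # form group 1; each group picks its tokens in turn within a decoding step.
>>> num_beams, num_beam_groups = 6, 2
>>> group_size = num_beams // num_beam_groups
>>> for group_idx in range(num_beam_groups):
...     start = group_idx * group_size
...     print(f"group {group_idx}: beams {list(range(start, start + group_size))}")
group 0: beams [0, 1, 2]
group 1: beams [3, 4, 5]
```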
Examples:
For more details, see [Diverse Beam Search: Decoding Diverse Solutions from Neural Sequence
Models](https://arxiv.org/pdf/1610.02424.pdf).
```python
>>> from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
>>> import torch
>>> # Initialize the model and tokenizer
>>> tokenizer = AutoTokenizer.from_pretrained("t5-base")
>>> model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")
>>> # A long text about the solar system
>>> text = "The Solar System is a gravitationally bound system comprising the Sun and the objects that orbit it, either directly or indirectly. Of the objects that orbit the Sun directly, the largest are the eight planets, with the remainder being smaller objects, such as the five dwarf planets and small Solar System bodies. The Solar System formed 4.6 billion years ago from the gravitational collapse of a giant interstellar molecular cloud."
>>> inputs = tokenizer("summarize: " + text, return_tensors="pt")
>>> # Generate diverse summary
>>> outputs_diverse = model.generate(
... **inputs,
... num_beam_groups=2,
... diversity_penalty=10.0,
... max_length=100,
... num_beams=4,
... num_return_sequences=2,
... )
>>> summaries_diverse = tokenizer.batch_decode(outputs_diverse, skip_special_tokens=True)
>>> # Generate non-diverse summary
>>> outputs_non_diverse = model.generate(
... **inputs,
... max_length=100,
... num_beams=4,
... num_return_sequences=2,
... )
>>> summary_non_diverse = tokenizer.batch_decode(outputs_non_diverse, skip_special_tokens=True)
>>> # With `diversity_penalty`, the resulting beams are much more diverse
>>> print(summary_non_diverse)
['the solar system formed 4.6 billion years ago from the collapse of a giant interstellar molecular cloud. of the objects that orbit the Sun directly, the largest are the eight planets.',
'the Solar System formed 4.6 billion years ago from the collapse of a giant interstellar molecular cloud. of the objects that orbit the Sun directly, the largest are the eight planets.']
>>> print(summaries_diverse)
['the solar system formed 4.6 billion years ago from the collapse of a giant interstellar molecular cloud. of the objects that orbit the Sun directly, the largest are the eight planets.',
'the solar system formed 4.6 billion years ago from the collapse of a giant interstellar molecular cloud. of the objects that orbit the Sun directly, the largest are the eight planets. the rest of the objects are smaller objects, such as the five dwarf planets and small solar system bodies.']
```
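For completeness, the processor can also be invoked directly. The sketch below uses dummy tensors; the extra `current_tokens` and `beam_group_idx` arguments reflect the `__call__` signature that group beam search uses internally (verify against your installed `transformers` version), and all values are made up for illustration:
```python
>>> import torch
>>> from transformers import HammingDiversityLogitsProcessor
>>> # Toy setup: 4 beams in 2 groups over a 10-token vocabulary.
>>> processor = HammingDiversityLogitsProcessor(diversity_penalty=1.0, num_beams=4, num_beam_groups=2)
>>> scores = torch.zeros(2, 10)  # next-token scores for the 2 beams of the second group
>>> current_tokens = torch.tensor([3, 7, 0, 0])  # this step's picks; the first group chose tokens 3 and 7
>>> input_ids = torch.zeros(2, 5, dtype=torch.long)  # placeholder ids generated so far
>>> new_scores = processor(input_ids, scores, current_tokens, beam_group_idx=1)
>>> print(new_scores[0, 3].item(), new_scores[0, 7].item())  # tokens 3 and 7 each penalized once
-1.0 -1.0
```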
"""

def __init__(self, diversity_penalty: float, num_beams: int, num_beam_groups: int):
