ASR Confidence update and tutorial #6810

Merged (18 commits) on Jul 15, 2023
6 changes: 6 additions & 0 deletions docs/source/starthere/tutorials.rst
@@ -109,6 +109,12 @@ To run a tutorial:
* - ASR
- Hybrid ASR-TTS Models Tutorial
- `Multi-lingual ASR <https://colab.research.google.com/github/NVIDIA/NeMo/blob/stable/tutorials/asr/ASR_TTS_Tutorial.ipynb>`_
+ * - ASR
+ - ASR Confidence Estimation
+ - `ASR Confidence Estimation <https://colab.research.google.com/github/NVIDIA/NeMo/blob/stable/tutorials/asr/ASR_Confidence_Estimation.ipynb>`_
+ * - ASR
+ - Confidence-based Ensembles
+ - `Confidence-based Ensembles <https://colab.research.google.com/github/NVIDIA/NeMo/blob/stable/tutorials/asr/Confidence_Ensembles.ipynb>`_
* - NLP
- Using Pretrained Language Models for Downstream Tasks
- `Pretrained Language Models for Downstream Tasks <https://colab.research.google.com/github/NVIDIA/NeMo/blob/stable/tutorials/nlp/01_Pretrained_Language_Models_for_Downstream_Tasks.ipynb>`_
70 changes: 35 additions & 35 deletions nemo/collections/asr/metrics/rnnt_wer.py
@@ -100,32 +100,33 @@ class AbstractRNNTDecoding(ConfidenceMixin):
from the `token_confidence`.
aggregation: Which aggregation type to use for collapsing per-token confidence into per-word confidence.
Valid options are `mean`, `min`, `max`, `prod`.
- method_cfg: A dict-like object which contains the method name and settings to compute per-frame
+ measure_cfg: A dict-like object which contains the measure name and settings to compute per-frame
confidence scores.

- name: The method name (str).
+ name: The measure name (str).
Supported values:
- 'max_prob' for using the maximum token probability as a confidence.
- 'entropy' for using a normalized entropy of a log-likelihood vector.

entropy_type: Which type of entropy to use (str).
- Used if confidence_method_cfg.name is set to `entropy`.
+ Used if confidence_measure_cfg.name is set to `entropy`.
Supported values:
- - 'gibbs' for the (standard) Gibbs entropy. If the temperature α is provided,
+ - 'gibbs' for the (standard) Gibbs entropy. If the alpha (α) is provided,
the formula is the following: H_α = -sum_i((p^α_i)*log(p^α_i)).
- Note that for this entropy, the temperature should comply the following inequality:
- 1/log(V) <= α <= -1/log(1-1/V) where V is the model vocabulary size.
+ Note that for this entropy, the alpha should comply with the following inequality:
+ (log(V)+2-sqrt(log^2(V)+4))/(2*log(V)) <= α <= (1+log(V-1))/log(V-1),
+ where V is the model vocabulary size.
- 'tsallis' for the Tsallis entropy with the Boltzmann constant one.
Tsallis entropy formula is the following: H_α = 1/(α-1)*(1-sum_i(p^α_i)),
where α is a parameter. When α == 1, it works like the Gibbs entropy.
More: https://en.wikipedia.org/wiki/Tsallis_entropy
- - 'renui' for the Rényi entropy.
+ - 'renyi' for the Rényi entropy.
Rényi entropy formula is the following: H_α = 1/(1-α)*log_2(sum_i(p^α_i)),
where α is a parameter. When α == 1, it works like the Gibbs entropy.
More: https://en.wikipedia.org/wiki/R%C3%A9nyi_entropy

- temperature: Temperature scale for logsoftmax (α for entropies). Here we restrict it to be > 0.
- When the temperature equals one, scaling is not applied to 'max_prob',
+ alpha: Power scale for logsoftmax (α for entropies). Here we restrict it to be > 0.
+ When alpha equals one, scaling is not applied to 'max_prob',
and any entropy type behaves like the Shannon entropy: H = -sum_i(p_i*log(p_i))

entropy_norm: A mapping of the entropy value to the interval [0,1].
@@ -139,7 +140,7 @@ class AbstractRNNTDecoding(ConfidenceMixin):
timestep during greedy decoding. Setting to larger values allows longer sentences
to be decoded, at the cost of increased execution time.
preserve_frame_confidence: Same as above, overrides above value.
- confidence_method: Same as above, overrides confidence_cfg.method.
+ confidence_measure_cfg: Same as above, overrides confidence_cfg.measure_cfg.

"beam":
beam_size: int, defining the beam size for beam search. Must be >= 1.
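
To make the α-scaled entropies above concrete, here is a minimal, self-contained sketch of turning a single frame's probability vector into a [0, 1] confidence score. It is illustrative only, not NeMo's implementation (which operates on batched log-probability tensors); the helper name is hypothetical, and the linear normalization by the uniform distribution's entropy assumes α lies within the bounds quoted above.

import math

def entropy_confidence(probs, entropy_type="gibbs", alpha=0.5):
    """Hypothetical helper: 1 - H_alpha(probs) / H_alpha(uniform), in [0, 1].

    Assumes alpha != 1 for 'tsallis' and 'renyi' (alpha == 1 is the Shannon
    limit) and alpha within the quoted bounds for 'gibbs'.
    """
    V = len(probs)
    scaled = [p ** alpha for p in probs if p > 0.0]  # the p_i^α terms
    if entropy_type == "gibbs":      # H_α = -sum_i p_i^α * log(p_i^α)
        h = -sum(q * math.log(q) for q in scaled)
        h_max = alpha * V ** (1.0 - alpha) * math.log(V)
    elif entropy_type == "tsallis":  # H_α = (1 - sum_i p_i^α) / (α - 1)
        h = (1.0 - sum(scaled)) / (alpha - 1.0)
        h_max = (V ** (1.0 - alpha) - 1.0) / (1.0 - alpha)
    elif entropy_type == "renyi":    # H_α = log2(sum_i p_i^α) / (1 - α)
        h = math.log2(sum(scaled)) / (1.0 - alpha)
        h_max = math.log2(V)
    else:
        raise ValueError(f"unknown entropy type: {entropy_type}")
    return 1.0 - h / h_max

# The Gibbs alpha bounds from the inequality above, for a toy vocabulary V = 4:
V = 4
lo = (math.log(V) + 2 - math.sqrt(math.log(V) ** 2 + 4)) / (2 * math.log(V))
hi = (1 + math.log(V - 1)) / math.log(V - 1)
print(f"{lo:.2f} <= alpha <= {hi:.2f}")   # 0.34 <= alpha <= 1.91

# A peaked distribution is confident, a flat one is not:
print(entropy_confidence([0.97, 0.01, 0.01, 0.01], "renyi"))  # ~0.64
print(entropy_confidence([0.25, 0.25, 0.25, 0.25], "renyi"))  # 0.0

Choosing α < 1 up-weights low-probability tokens, which makes the entropy (and hence the confidence) more sensitive to probability mass hiding in the tail of the distribution.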
@@ -255,15 +256,13 @@ def __init__(self, decoding_cfg, decoder, joint, blank_id: int):
# initialize confidence-related fields
self._init_confidence(self.cfg.get('confidence_cfg', None))

- # Update preserve frame confidence
- if self.preserve_frame_confidence is False:
- if self.cfg.strategy in ['greedy', 'greedy_batch']:
- self.preserve_frame_confidence = self.cfg.greedy.get('preserve_frame_confidence', False)
- self.confidence_method_cfg = self.cfg.greedy.get('confidence_method_cfg', None)
-
- elif self.cfg.strategy in ['beam', 'tsd', 'alsd', 'maes']:
- # Not implemented
- pass
+ # Confidence estimation is not implemented for these strategies
+ if (
+ not self.preserve_frame_confidence
+ and self.cfg.strategy in ['beam', 'tsd', 'alsd', 'maes']
+ and self.cfg.beam.get('preserve_frame_confidence', False)
+ ):
+ raise NotImplementedError(f"Confidence calculation is not supported for strategy `{self.cfg.strategy}`")

if self.cfg.strategy == 'greedy':
if self.big_blank_durations is None:
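
With the beam-family strategies now raising NotImplementedError when frame confidence is requested, confidence estimation has to be configured on a greedy strategy. Below is a hedged sketch of such a decoding config using OmegaConf; the field names follow the docstrings in this PR, but the exact dataclasses and defaults should be checked against nemo.collections.asr.parts.utils.asr_confidence_utils, and the values shown are illustrative.

from omegaconf import OmegaConf

# Illustrative values only; field names follow the docstring above.
decoding_cfg = OmegaConf.create(
    {
        "strategy": "greedy_batch",  # 'beam'/'tsd'/'alsd'/'maes' would raise NotImplementedError
        "confidence_cfg": {
            "preserve_frame_confidence": True,   # per-frame scores
            "preserve_token_confidence": True,   # collapsed to per-token scores
            "preserve_word_confidence": True,    # aggregated to per-word scores
            "aggregation": "prod",               # mean | min | max | prod
            "measure_cfg": {
                "name": "entropy",               # or "max_prob"
                "entropy_type": "tsallis",       # gibbs | tsallis | renyi
                "alpha": 0.5,                    # power scale, must be > 0
                "entropy_norm": "exp",           # entropy-to-[0,1] mapping
            },
        },
    }
)
# A NeMo ASR model would then pick this up via, for example:
# model.change_decoding_strategy(decoding_cfg)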
@@ -278,7 +277,7 @@ def __init__(self, decoding_cfg, decoder, joint, blank_id: int):
),
preserve_alignments=self.preserve_alignments,
preserve_frame_confidence=self.preserve_frame_confidence,
- confidence_method_cfg=self.confidence_method_cfg,
+ confidence_measure_cfg=self.confidence_measure_cfg,
)
else:
self.decoding = greedy_decode.GreedyTDTInfer(
@@ -292,7 +291,7 @@ def __init__(self, decoding_cfg, decoder, joint, blank_id: int):
),
preserve_alignments=self.preserve_alignments,
preserve_frame_confidence=self.preserve_frame_confidence,
- confidence_method_cfg=self.confidence_method_cfg,
+ confidence_measure_cfg=self.confidence_measure_cfg,
)
else:
self.decoding = greedy_decode.GreedyMultiblankRNNTInfer(
@@ -305,7 +304,7 @@ def __init__(self, decoding_cfg, decoder, joint, blank_id: int):
),
preserve_alignments=self.preserve_alignments,
preserve_frame_confidence=self.preserve_frame_confidence,
- confidence_method_cfg=self.confidence_method_cfg,
+ confidence_measure_cfg=self.confidence_measure_cfg,
)

elif self.cfg.strategy == 'greedy_batch':
@@ -321,7 +320,7 @@ def __init__(self, decoding_cfg, decoder, joint, blank_id: int):
),
preserve_alignments=self.preserve_alignments,
preserve_frame_confidence=self.preserve_frame_confidence,
- confidence_method_cfg=self.confidence_method_cfg,
+ confidence_measure_cfg=self.confidence_measure_cfg,
)
else:
self.decoding = greedy_decode.GreedyBatchedTDTInfer(
@@ -335,7 +334,7 @@ def __init__(self, decoding_cfg, decoder, joint, blank_id: int):
),
preserve_alignments=self.preserve_alignments,
preserve_frame_confidence=self.preserve_frame_confidence,
- confidence_method_cfg=self.confidence_method_cfg,
+ confidence_measure_cfg=self.confidence_measure_cfg,
)

else:
@@ -349,7 +348,7 @@ def __init__(self, decoding_cfg, decoder, joint, blank_id: int):
),
preserve_alignments=self.preserve_alignments,
preserve_frame_confidence=self.preserve_frame_confidence,
- confidence_method_cfg=self.confidence_method_cfg,
+ confidence_measure_cfg=self.confidence_measure_cfg,
)

elif self.cfg.strategy == 'beam':
@@ -1006,32 +1005,33 @@ class RNNTDecoding(AbstractRNNTDecoding):
from the `token_confidence`.
aggregation: Which aggregation type to use for collapsing per-token confidence into per-word confidence.
Valid options are `mean`, `min`, `max`, `prod`.
- method_cfg: A dict-like object which contains the method name and settings to compute per-frame
+ measure_cfg: A dict-like object which contains the measure name and settings to compute per-frame
confidence scores.

- name: The method name (str).
+ name: The measure name (str).
Supported values:
- 'max_prob' for using the maximum token probability as a confidence.
- 'entropy' for using a normalized entropy of a log-likelihood vector.

entropy_type: Which type of entropy to use (str).
- Used if confidence_method_cfg.name is set to `entropy`.
+ Used if confidence_measure_cfg.name is set to `entropy`.
Supported values:
- - 'gibbs' for the (standard) Gibbs entropy. If the temperature α is provided,
+ - 'gibbs' for the (standard) Gibbs entropy. If the alpha (α) is provided,
the formula is the following: H_α = -sum_i((p^α_i)*log(p^α_i)).
- Note that for this entropy, the temperature should comply the following inequality:
- 1/log(V) <= α <= -1/log(1-1/V) where V is the model vocabulary size.
+ Note that for this entropy, the alpha should comply with the following inequality:
+ (log(V)+2-sqrt(log^2(V)+4))/(2*log(V)) <= α <= (1+log(V-1))/log(V-1),
+ where V is the model vocabulary size.
- 'tsallis' for the Tsallis entropy with the Boltzmann constant one.
Tsallis entropy formula is the following: H_α = 1/(α-1)*(1-sum_i(p^α_i)),
where α is a parameter. When α == 1, it works like the Gibbs entropy.
More: https://en.wikipedia.org/wiki/Tsallis_entropy
- - 'renui' for the Rényi entropy.
+ - 'renyi' for the Rényi entropy.
Rényi entropy formula is the following: H_α = 1/(1-α)*log_2(sum_i(p^α_i)),
where α is a parameter. When α == 1, it works like the Gibbs entropy.
More: https://en.wikipedia.org/wiki/R%C3%A9nyi_entropy

- temperature: Temperature scale for logsoftmax (α for entropies). Here we restrict it to be > 0.
- When the temperature equals one, scaling is not applied to 'max_prob',
+ alpha: Power scale for logsoftmax (α for entropies). Here we restrict it to be > 0.
+ When alpha equals one, scaling is not applied to 'max_prob',
and any entropy type behaves like the Shannon entropy: H = -sum_i(p_i*log(p_i))

entropy_norm: A mapping of the entropy value to the interval [0,1].
@@ -1047,7 +1047,7 @@ class RNNTDecoding(AbstractRNNTDecoding):

preserve_frame_confidence: Same as above, overrides above value.

- confidence_method: Same as above, overrides confidence_cfg.method.
+ confidence_measure_cfg: Same as above, overrides confidence_cfg.measure_cfg.

"beam":
beam_size: int, defining the beam size for beam search. Must be >= 1.
21 changes: 11 additions & 10 deletions nemo/collections/asr/metrics/rnnt_wer_bpe.py
@@ -100,32 +100,33 @@ class RNNTBPEDecoding(AbstractRNNTDecoding):
from the `token_confidence`.
aggregation: Which aggregation type to use for collapsing per-token confidence into per-word confidence.
Valid options are `mean`, `min`, `max`, `prod`.
- method_cfg: A dict-like object which contains the method name and settings to compute per-frame
+ measure_cfg: A dict-like object which contains the measure name and settings to compute per-frame
confidence scores.

- name: The method name (str).
+ name: The measure name (str).
Supported values:
- 'max_prob' for using the maximum token probability as a confidence.
- 'entropy' for using a normalized entropy of a log-likelihood vector.

entropy_type: Which type of entropy to use (str).
- Used if confidence_method_cfg.name is set to `entropy`.
+ Used if confidence_measure_cfg.name is set to `entropy`.
Supported values:
- - 'gibbs' for the (standard) Gibbs entropy. If the temperature α is provided,
+ - 'gibbs' for the (standard) Gibbs entropy. If the alpha (α) is provided,
the formula is the following: H_α = -sum_i((p^α_i)*log(p^α_i)).
- Note that for this entropy, the temperature should comply the following inequality:
- 1/log(V) <= α <= -1/log(1-1/V) where V is the model vocabulary size.
+ Note that for this entropy, the alpha should comply with the following inequality:
+ (log(V)+2-sqrt(log^2(V)+4))/(2*log(V)) <= α <= (1+log(V-1))/log(V-1),
+ where V is the model vocabulary size.
- 'tsallis' for the Tsallis entropy with the Boltzmann constant one.
Tsallis entropy formula is the following: H_α = 1/(α-1)*(1-sum_i(p^α_i)),
where α is a parameter. When α == 1, it works like the Gibbs entropy.
More: https://en.wikipedia.org/wiki/Tsallis_entropy
- - 'renui' for the Rényi entropy.
+ - 'renyi' for the Rényi entropy.
Rényi entropy formula is the following: H_α = 1/(1-α)*log_2(sum_i(p^α_i)),
where α is a parameter. When α == 1, it works like the Gibbs entropy.
More: https://en.wikipedia.org/wiki/R%C3%A9nyi_entropy

- temperature: Temperature scale for logsoftmax (α for entropies). Here we restrict it to be > 0.
- When the temperature equals one, scaling is not applied to 'max_prob',
+ alpha: Power scale for logsoftmax (α for entropies). Here we restrict it to be > 0.
+ When alpha equals one, scaling is not applied to 'max_prob',
and any entropy type behaves like the Shannon entropy: H = -sum_i(p_i*log(p_i))

entropy_norm: A mapping of the entropy value to the interval [0,1].
@@ -141,7 +142,7 @@ class RNNTBPEDecoding(AbstractRNNTDecoding):

preserve_frame_confidence: Same as above, overrides above value.

- confidence_method: Same as above, overrides confidence_cfg.method.
+ confidence_measure_cfg: Same as above, overrides confidence_cfg.measure_cfg.

"beam":
beam_size: int, defining the beam size for beam search. Must be >= 1.