For semantic entropy they were using the classwise probability as defined by:

![image](https://private-user-images.githubusercontent.com/178942882/359624129-06b66da6-1fbf-4a43-a5f2-641887355e9f.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk1NjY1NzYsIm5iZiI6MTczOTU2NjI3NiwicGF0aCI6Ii8xNzg5NDI4ODIvMzU5NjI0MTI5LTA2YjY2ZGE2LTFmYmYtNGE0My1hNWYyLTY0MTg4NzM1NWU5Zi5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjUwMjE0JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI1MDIxNFQyMDUxMTZaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT0xMTMzNjlhODJkNjkyMzdhN2Y4ZjY4YWYxYjc2YjRjZDdmZDFhNTM2MzE5ZTg5ZTUwM2U3MzFmM2I1MzMwNjA5JlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCJ9.f9nOoywJbrp6HJbUKn3qMlceP3ykDFdw-YPITIu3NNU)

Here is an example of that calculation from the same paper:

![image](https://private-user-images.githubusercontent.com/178942882/359624407-7f0f97e5-974f-4ece-bf2e-07d9398d76ee.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk1NjY1NzYsIm5iZiI6MTczOTU2NjI3NiwicGF0aCI6Ii8xNzg5NDI4ODIvMzU5NjI0NDA3LTdmMGY5N2U1LTk3NGYtNGVjZS1iZjJlLTA3ZDkzOThkNzZlZS5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjUwMjE0JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI1MDIxNFQyMDUxMTZaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT0yNWNhYWUyM2QyZWI5NjdmMDNiYmViYjM0ZDk0NDllOTQxZWYyNzUyZTg1NjBmNmViOWUwYjM0M2ZmZjY0MjVkJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCJ9.XoBwbvYYk9UWnxunSt30FB-OYnATTUDa6oqu7n5h4EU)
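In case the screenshots do not load, the definition being referenced is roughly the following (my paraphrase of the paper, not a verbatim copy of the images): the probability of a semantic class is the sum of the probabilities of the sequences in it, and semantic entropy is the entropy over those class probabilities,

$$p(c \mid x) = \sum_{s \in c} p(s \mid x), \qquad SE(x) = -\sum_{c} p(c \mid x)\,\log p(c \mid x).$$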
However, the way this is being calculated, they are adding up all of the sampled texts that were output, without taking into account that sampled texts often repeat, which gives you probabilities greater than 1.

For example, let's say the model produces 5 outputs, ['Paris', 'Paris', 'Paris', 'Its Paris', 'London'], with the following likelihoods: [0.6, 0.6, 0.6, 0.3, 0.1]. Based on the way this library was calculating it, they'd get the probability of the first class as 0.6 + 0.6 + 0.6 + 0.3 = 2.1 and the second class as 0.1. But how can the first class have a probability greater than 1? It shouldn't, because only non-repeating outputs within a class should be added together. Since the first three outputs are identical, the class probabilities should be 0.9 and 0.1.
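Here is a minimal numeric sketch of that arithmetic (toy values from the example above, not the library's code), comparing the summed-with-repeats class probabilities to the deduplicated ones:

```python
import numpy as np

# Toy samples from the example above: three identical 'Paris' outputs,
# one 'Its Paris', and one 'London'.
outputs = ['Paris', 'Paris', 'Paris', 'Its Paris', 'London']
probs = np.array([0.6, 0.6, 0.6, 0.3, 0.1])
classes = [0, 0, 0, 0, 1]  # class 0 = "Paris" meaning, class 1 = "London"

# Current behaviour: sum over every sample in the class, repeats included.
summed = [probs[[j for j, c in enumerate(classes) if c == k]].sum() for k in (0, 1)]
print(summed)  # roughly [2.1, 0.1] -- class 0 exceeds 1

# Deduplicated: count each distinct output text at most once per class.
dedup = []
for k in (0, 1):
    seen = {}
    for text, p, c in zip(outputs, probs, classes):
        if c == k and text not in seen:
            seen[text] = p
    dedup.append(sum(seen.values()))
print(dedup)  # roughly [0.9, 0.1]
```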
In the code you can see it in `semantic_entropy.py` inside the `estimators` folder:

```python
for i in range(len(hyps_list)):
    class_likelihoods = [
        np.array(loglikelihoods_list[i])[np.array(class_idx)]
        for class_idx in self._class_to_sample[i]
    ]
    # Log-sum-exp over *every* sample assigned to the class, repeats included.
    class_lp = [
        np.logaddexp.reduce(likelihoods) for likelihoods in class_likelihoods
    ]
    if log_weights[i] is None:
        log_weights[i] = [0 for _ in hyps_list[i]]
    semantic_logits[i] = -np.mean(
        [
            class_lp[self._sample_to_class[i][j]] * np.exp(log_weights[i][j])
            for j in range(len(hyps_list[i]))
        ]
    )
```
The `class_lp` portion is summing over all outputs in each class instead of only the unique outputs in each class. This means that the more outputs you generate, the larger the estimated uncertainty will get.
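One possible fix, sketched below with hypothetical names and the same structure as the snippet above (I have not tested this against the library), is to keep a single log-likelihood per distinct output text within each class before applying the log-sum-exp:

```python
import numpy as np

def unique_class_log_probs(hyps, loglikelihoods, class_to_sample):
    """Log-probability of each semantic class, counting every distinct
    output text at most once (a sketch, not the library's actual API)."""
    class_lp = []
    for class_idx in class_to_sample:
        # Keep one log-likelihood per unique hypothesis text in this class.
        unique_ll = {}
        for j in class_idx:
            unique_ll.setdefault(hyps[j], loglikelihoods[j])
        class_lp.append(np.logaddexp.reduce(list(unique_ll.values())))
    return class_lp

# With the 'Paris' example, the exponentiated class log-probs come out to ~[0.9, 0.1]:
hyps = ['Paris', 'Paris', 'Paris', 'Its Paris', 'London']
lls = np.log([0.6, 0.6, 0.6, 0.3, 0.1])
print(np.exp(unique_class_log_probs(hyps, lls, [[0, 1, 2, 3], [4]])))
```

Whether the mean in `semantic_logits` should then run over classes or over unique samples is a separate design choice I leave to the maintainers.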
@gyampols thank you for your finding. Indeed, we use the official implementation from the original paper, so this approach has the "bug" you pointed out. The question is whether this "bug" leads to better results or not.

We will implement the "corrected" version in addition to the original one and conduct experiments. If you have already tested the "corrected" version, please share your results.