ways to compute ROC-AUC and the label #262

Open
EdwardChang5467 opened this issue Nov 26, 2024 · 4 comments
@EdwardChang5467

Thank you again for providing this repository. I would like to know how to use it to evaluate uncertainty estimates once they have been obtained, and how the labels used in that evaluation are produced, for example the ROC-AUC value of the CCP method on biographical claims. Could you please provide an example?

@IINemo
Owner

IINemo commented Nov 28, 2024

Hi, thank you for your question.

  1. You can perform evaluation by using the polygraph_eval script. An example config file for it can be found here: https://github.com/IINemo/lm-polygraph/blob/main/examples/configs/polygraph_eval_coqa.yaml
    If you have any trouble, I would be happy to help you run the script with an appropriate configuration.
  2. The "ground truth" labels are obtained by querying GPT-4o. You can see how the labels for claims are obtained here: https://github.com/IINemo/lm-polygraph/blob/main/src/lm_polygraph/generation_metrics/openai_fact_check.py
    Before validating the claims, we split the text into atomic claims using GPT-4o here:
    https://github.com/IINemo/lm-polygraph/blob/main/src/lm_polygraph/stat_calculators/extract_claims.py
    (A sketch of how these fact-check verdicts become numeric labels is given right after this list.)
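
To make the label step concrete, here is a minimal sketch (hypothetical variable names, not the repository's code) of turning fact-check verdicts for atomic claims into numeric labels that can later be scored against the uncertainty estimates:

import numpy as np

# Hypothetical mapping from fact-check verdicts to numeric labels:
# 1 marks an unsupported (false) claim, 0 a supported one,
# and NaN an undecidable one.
verdict_to_label = {"False": 1.0, "True": 0.0, "Not known": float("nan")}

verdicts = ["True", "False", "Not known", "False"]
labels = np.array([verdict_to_label[v] for v in verdicts])
# labels -> [0., 1., nan, 1.]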

@IINemo
Owner

IINemo commented Nov 29, 2024

Here is an example of launching the benchmark script:

CUDA_VISIBLE_DEVICES=0 HYDRA_CONFIG=`pwd`/examples/configs/polygraph_eval_mmlu.yaml polygraph_eval
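
The same pattern should apply to the other configs under examples/configs/, e.g. the CoQA config mentioned earlier in this thread (assuming the command is run from the repository root):

CUDA_VISIBLE_DEVICES=0 HYDRA_CONFIG=`pwd`/examples/configs/polygraph_eval_coqa.yaml polygraph_eval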

@EdwardChang5467
Author

When I'm labeling, I find that sometimes it outputs 'nan'. When calculating ROC-AUC, is it filtered out along with the corresponding uncertainty score?

@cant-access-rediska0123
Collaborator

> When I'm labeling, I find that sometimes it outputs 'nan'. When calculating ROC-AUC, is it filtered out along with the corresponding uncertainty score?

Hi, thank you for your question! Yes, when the uncertainty estimator outputs 'NaN' for certain claims, those claims are excluded from the ROC-AUC calculations. Additionally, some claims cannot be determined as either 'True' or 'False' (e.g., when GPT-4 outputs 'Not known' instead of a definitive 'True' or 'False'). These claims are labeled as 'NaN' in OpenAIFactCheck and are similarly excluded from the computations.
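
As a minimal illustration of the filtering described above (hypothetical arrays, not the library's internal code): claims with a NaN label or a NaN uncertainty score are dropped before the ROC-AUC is computed.

import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical claim-level labels (1 = false claim, 0 = true claim,
# NaN = "Not known") and uncertainty scores (NaN where the estimator failed).
labels = np.array([1.0, 0.0, np.nan, 1.0, 0.0])
uncertainty = np.array([0.8, 0.2, 0.5, np.nan, 0.1])

# Drop every claim where either the label or the uncertainty is NaN,
# then compute ROC-AUC on the remaining claims.
mask = ~np.isnan(labels) & ~np.isnan(uncertainty)
print(roc_auc_score(labels[mask], uncertainty[mask]))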
