ways to compute ROC-AUC and the label #262

Open
EdwardChang5467 opened this issue Nov 26, 2024 · 4 comments
@EdwardChang5467

Thank you again for providing this repository. I would like to know how to use it to evaluate uncertainty estimates once they have been obtained, and how the labels used in that evaluation are produced, for example the ROC-AUC value of the CCP method on biographical claims. Could you please provide an example?

@IINemo
Owner

IINemo commented Nov 28, 2024

Hi, thank you for your question.

  1. You can perform evaluation by using the polygraph_eval script. An example config file for it can be found here: https://github.com/IINemo/lm-polygraph/blob/main/examples/configs/polygraph_eval_coqa.yaml
    If you have any trouble, I would be happy to help you run the script with an appropriate configuration.
  2. The "ground truth" labels are obtained by querying GPT-4o. You can see how the labels for claims are obtained here: https://github.com/IINemo/lm-polygraph/blob/main/src/lm_polygraph/generation_metrics/openai_fact_check.py
    Before validating the claims, we split the text into atomic claims using GPT-4o here:
    https://github.com/IINemo/lm-polygraph/blob/main/src/lm_polygraph/stat_calculators/extract_claims.py
    (A sketch of how these fact-check verdicts become numeric labels is given right after this list.)
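
To make the label step concrete, here is a minimal sketch (hypothetical variable names, not the repository's code) of turning fact-check verdicts for atomic claims into numeric labels that can later be scored against the uncertainty estimates:

import numpy as np

# Hypothetical mapping from fact-check verdicts to numeric labels:
# 1 marks an unsupported (false) claim, 0 a supported one,
# and NaN an undecidable one.
verdict_to_label = {"False": 1.0, "True": 0.0, "Not known": float("nan")}

verdicts = ["True", "False", "Not known", "False"]
labels = np.array([verdict_to_label[v] for v in verdicts])
# labels -> [0., 1., nan, 1.]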

@IINemo
Owner

IINemo commented Nov 29, 2024

Here is an example of launching the benchmark script:

CUDA_VISIBLE_DEVICES=0 HYDRA_CONFIG=`pwd`/examples/configs/polygraph_eval_mmlu.yaml polygraph_eval
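
The same pattern should apply to the other configs under examples/configs/, e.g. the CoQA config mentioned earlier in this thread (assuming the command is run from the repository root):

CUDA_VISIBLE_DEVICES=0 HYDRA_CONFIG=`pwd`/examples/configs/polygraph_eval_coqa.yaml polygraph_eval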

@EdwardChang5467
Author

When I'm labeling, I find that sometimes it outputs 'nan'. When calculating ROC-AUC, is it filtered out along with the corresponding uncertainty score?

@cant-access-rediska0123
Collaborator

> When I'm labeling, I find that sometimes it outputs 'nan'. When calculating ROC-AUC, is it filtered out along with the corresponding uncertainty score?

Hi, thank you for your question! Yes, when the uncertainty estimator outputs 'NaN' for certain claims, those claims are excluded from the ROC-AUC calculations. Additionally, some claims cannot be determined as either 'True' or 'False' (e.g., when GPT-4 outputs 'Not known' instead of a definitive 'True' or 'False'). These claims are labeled as 'NaN' in OpenAIFactCheck and are similarly excluded from the computations.
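
As a minimal illustration of the filtering described above (hypothetical arrays, not the library's internal code): claims with a NaN label or a NaN uncertainty score are dropped before the ROC-AUC is computed.

import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical claim-level labels (1 = false claim, 0 = true claim,
# NaN = "Not known") and uncertainty scores (NaN where the estimator failed).
labels = np.array([1.0, 0.0, np.nan, 1.0, 0.0])
uncertainty = np.array([0.8, 0.2, 0.5, np.nan, 0.1])

# Drop every claim where either the label or the uncertainty is NaN,
# then compute ROC-AUC on the remaining claims.
mask = ~np.isnan(labels) & ~np.isnan(uncertainty)
print(roc_auc_score(labels[mask], uncertainty[mask]))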
