ContraDoc: Understanding Self-Contradictions in Documents with Large Language Models

This is the repo for ContraDoc: Understanding Self-Contradictions in Documents with Large Language Models.

Dataset Introduction:

CONTRADOC contains 449 self-contradictory documents (positive examples) and 442 non-contradictory documents (negative examples). The documents are sourced from CNN/DailyMail news, Wikipedia, and story summaries, and range in length from 300 to 2,200 tokens. The dataset was created with a two-stage pipeline: GPT-4 modifies a document to introduce a self-contradiction, then humans annotate and verify the result. It is intended as a benchmark for testing a model's ability to find contradictions in long documents.

Dataset Format:

The positive examples are stored under "pos" and the negative examples under "neg". Please refer to the paper for more details on each label.

{"pos":
  {"DOC_ID":
    {"text": DOCUMENT, 
      "evidence": SENTENCE_INTRODUCING_CONTRADICTION,
      "unique id": DOC_ID,
      "doc_type":"story_OR_news_OR_wiki",
      "contra_plug": "Insert_OR_Replace", 
      "contra_type": [contradiction type],
      "scope": "global_OR_local_OR_intra",
      "ref sentences"(optional): [sentences contradict the evidence]
    },
  },
},
{"neg":
  {"DOC_ID":
    {"text": DOCUMENT,
      "doc_type": "story_OR_news_OR_wiki",
      "unique id": DOC_ID
    },
  },
}
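
The following is a minimal sketch of how the format above could be read and summarized in Python. It assumes that ContraDoc.json parses as a single JSON object with top-level "pos" and "neg" keys, each mapping a DOC_ID to a record with the fields listed above; the helper names `load_contradoc` and `summarize` are hypothetical and not part of this repo.

```python
import json
from collections import Counter


def load_contradoc(path: str = "ContraDoc.json") -> dict:
    """Read the dataset file (assumed to be one JSON object) into a dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def summarize(dataset: dict) -> dict:
    """Count documents per split, plus doc_type and scope for positives."""
    pos = dataset.get("pos", {})  # self-contradictory documents
    neg = dataset.get("neg", {})  # non-contradictory documents
    return {
        "n_pos": len(pos),
        "n_neg": len(neg),
        # Records missing a field are counted under the key None.
        "pos_doc_types": Counter(d.get("doc_type") for d in pos.values()),
        "pos_scopes": Counter(d.get("scope") for d in pos.values()),
    }
```

With the file in the working directory, `summarize(load_contradoc())` should return the split sizes along with the per-type and per-scope counts, provided the layout assumption holds.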

Dataset Creation:

First, generate and store the false documents produced by GPT. Remember to set up your OpenAI API key.

cd extract
mkdir contradiction_json
python extract.py # check --help for information/options

The raw files are stored in contradiction_json. Next, aggregate the files and run contradoc_create.py to perform insertion, replacement, and an initial filtering of the generated false-document candidates. Note that all generated documents need further human verification to guarantee their validity.
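
The README does not specify how the aggregation step is done, so the following is only a rough sketch under stated assumptions: that each raw file in contradiction_json has a .json extension and is valid JSON on its own. The function name `aggregate_raw_files` and the output record layout are hypothetical, and the input format expected by contradoc_create.py may differ.

```python
import json
from pathlib import Path


def aggregate_raw_files(raw_dir: str = "contradiction_json") -> list:
    """Parse every *.json file in raw_dir and collect the results.

    Files are visited in sorted filename order so the output is stable.
    """
    records = []
    for path in sorted(Path(raw_dir).glob("*.json")):
        with path.open(encoding="utf-8") as f:
            # Keep the source filename so each candidate can be traced back.
            records.append({"source_file": path.name, "content": json.load(f)})
    return records
```

The aggregated list can then be written to a single file with `json.dump` before running the creation script, if that is the layout it expects.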
