This repository contains a large-scale temporal question answering dataset designed for evaluating and training language models on temporal reasoning tasks. The dataset consists of question-answer pairs with a focus on temporal aspects, covering a wide range of events and entities from 1987 to 2023.
- Size: The dataset comprises 100,228,457 question-answer pairs, making it one of the largest temporal question answering datasets available.
- Question Types: Questions are categorized by complexity, including easy and hard questions, so that each level tests a different depth of temporal reasoning and understanding.
- Content: The dataset covers a diverse range of events and entities, sourced from Wikipedia and Wikidata, ensuring a rich and varied set of questions for evaluation.
- Metadata: Each question-answer pair includes additional metadata, such as entity/event IDs, question difficulty ratings, and temporal attributes, providing valuable information for analysis and model evaluation.
Name | Total |
---|---|
Attribute Event | 83,798 |
Attribute Entity | 84,079 |
Attribute Time | 9,454 |
Comparison Event | 25,353,340 |
Comparison Entity | 74,678,117 |
Comparison Time | 54,022,952 |
Counting Event | 18,325 |
Counting Entity | 10,798 |
Counting Time | 12,732 |
Multi-Hop | 76,933 |
Unnamed Event | 8,707,123 |
Total | 100,228,457 |
- Performance Evaluation: The dataset can be used to evaluate the performance of language models on temporal reasoning tasks, including across-time comparison, event/entity detection, and multi-hop reasoning.
- Fine-Tuning: Researchers can leverage this dataset for fine-tuning language models, enhancing their temporal reasoning capabilities and performance on similar tasks.
- Download: The dataset is available on Hugging Face.
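For the performance-evaluation use case, a minimal exact-match scorer might look like the sketch below. The record keys `question` and `answer`, the `normalize` helper, and the choice of exact match as the metric are assumptions made for illustration; they are not taken from the dataset schema or from the authors' evaluation protocol.

```python
import re
import string
from typing import Callable, Iterable, Mapping


def normalize(text: str) -> str:
    """Lowercase, drop ASCII punctuation, and collapse whitespace."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()


def exact_match_accuracy(
    records: Iterable[Mapping[str, str]],
    predict: Callable[[str], str],
) -> float:
    """Fraction of records whose normalized prediction equals the normalized answer.

    `records` is assumed to hold dicts with hypothetical keys "question" and "answer";
    `predict` is any callable that maps a question string to an answer string.
    """
    records = list(records)
    if not records:
        return 0.0
    hits = sum(
        normalize(predict(r["question"])) == normalize(r["answer"]) for r in records
    )
    return hits / len(records)


# Toy records, made up for illustration only (not drawn from the dataset).
toy_records = [
    {"question": "In which year did Event A take place?", "answer": "1999"},
    {"question": "In which year did Event B take place?", "answer": "2005"},
]
# A stand-in "model" that always answers "1999".
print(exact_match_accuracy(toy_records, lambda question: "1999"))
```

In practice `predict` would wrap a call to the language model under evaluation, and a looser metric (for example token-level F1) may suit free-form answers better than exact match.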
This project contains Python scripts designed to generate various types of questions based on event data. The scripts read event attributes from a database, construct questions, and store them back in the database.
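The construct step described above can be sketched with simple string templates. Everything in this example is hypothetical: the `Event` fields, the question templates, and the output keys are placeholders chosen for illustration, since the project's real schema and templates are not shown here. The database read and write steps are left out so the sketch stays self-contained.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class Event:
    # Hypothetical fields; the real database columns may differ.
    event_id: str
    name: str
    year: int


def attribute_question(event: Event) -> Dict[str, object]:
    """Build one attribute-style question asking for the year of an event."""
    return {
        "type": "attribute_event",
        "question": f"In which year did {event.name} take place?",
        "answer": str(event.year),
        "event_ids": [event.event_id],
    }


def comparison_questions(events: Sequence[Event]) -> List[Dict[str, object]]:
    """Build one comparison-style question per pair of events with different years."""
    questions = []
    for a, b in combinations(events, 2):
        if a.year == b.year:
            continue  # skip ties, where "earlier" has no single answer
        earlier = a if a.year < b.year else b
        questions.append(
            {
                "type": "comparison_event",
                "question": f"Which happened earlier, {a.name} or {b.name}?",
                "answer": earlier.name,
                "event_ids": [a.event_id, b.event_id],
            }
        )
    return questions


# Placeholder events standing in for rows read from the database.
events = [Event("E1", "Event A", 1990), Event("E2", "Event B", 2001)]
for q in [attribute_question(events[0])] + comparison_questions(events):
    print(q["question"], "->", q["answer"])
```

Note that a pairwise template like `comparison_questions` yields up to n(n-1)/2 questions for n events, so comparison-style output grows much faster than attribute-style output.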
- Python 3.x
- `psycopg2` for PostgreSQL database interaction
- `requests` for HTTP requests
- `configparser` for reading database configuration
- `SPARQLWrapper` for executing SPARQL queries
- Clone the repository:
  `git clone <repository_url>`
  `cd <repository_folder>`
- Install the required Python packages:
  `pip install psycopg2 configparser pandas requests SPARQLWrapper`
- Configure the database connection:
  - Create a `database.ini` file with the following format:

        [postgresql]
        host=your_host
        database=your_database
        user=your_user
        password=your_password
- Ensure your database is set up and populated with the required data.
- Run the question generation script for the desired question type.
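The `database.ini` file from the configuration step can be read with `configparser` and passed to `psycopg2`. The following is a minimal sketch under stated assumptions: the helper names `load_db_config` and `connect` are hypothetical and may not match what the project's scripts actually define.

```python
from configparser import ConfigParser
from typing import Dict


def load_db_config(path: str = "database.ini", section: str = "postgresql") -> Dict[str, str]:
    """Read one section of an INI file into a plain dict of connection parameters."""
    parser = ConfigParser()
    # ConfigParser.read skips missing files silently and returns the files it parsed.
    if not parser.read(path):
        raise FileNotFoundError(f"Config file not found: {path}")
    if not parser.has_section(section):
        raise KeyError(f"Section [{section}] not found in {path}")
    return dict(parser.items(section))


def connect(path: str = "database.ini"):
    """Open a PostgreSQL connection using the parameters from the INI file."""
    # Imported lazily so that load_db_config works even without the driver installed.
    import psycopg2

    return psycopg2.connect(**load_db_config(path))
```

Keeping the loader separate from the connection call makes it easy to check the configuration without a running database, for example by printing `load_db_config()` before running a generation script.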