DataScienceUIBK/ComplexTempQA

ComplexTempQA

Overview

This repository contains a large-scale temporal question answering dataset designed for evaluating and training language models on temporal reasoning tasks. The dataset consists of question-answer pairs with a focus on temporal aspects, covering a wide range of events and entities from 1987 to 2023.

Dataset Description

  • Size: The dataset comprises 100,228,457 question-answer pairs, making it one of the largest temporal question answering datasets available.
  • Question Types: Questions are rated by complexity as easy or hard, and each level is designed to test a different depth of temporal reasoning and understanding.
  • Content: The dataset covers a diverse range of events and entities, sourced from Wikipedia and Wikidata, ensuring a rich and varied set of questions for evaluation.
  • Metadata: Each question-answer pair includes additional metadata, such as entity/event IDs, question difficulty ratings, and temporal attributes, providing valuable information for analysis and model evaluation.

Dataset distribution

Name                      Total
Attribute Event          83,798
Attribute Entity         84,079
Attribute Time            9,454
Comparison Event     25,353,340
Comparison Entity    74,678,117
Comparison Time      54,022,952
Counting Event           18,325
Counting Entity          10,798
Counting Time            12,732
Multi-Hop                76,933
Unnamed Event         8,707,123
Total               100,228,457

Evaluation and Usage

  • Performance Evaluation: The dataset can be used to evaluate the performance of language models on temporal reasoning tasks, including across-time comparison, event/entity detection, and multi-hop reasoning.
  • Fine-Tuning: Researchers can leverage this dataset for fine-tuning language models, enhancing their temporal reasoning capabilities and performance on similar tasks.
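As an illustration of the performance evaluation use case, the sketch below scores model answers with normalized exact match. This is a common question answering metric, not necessarily the one used by the dataset authors; the function names `normalize` and `exact_match_accuracy` are assumptions made for this example.

```python
import re
import string


def normalize(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match_accuracy(predictions, references):
    """Fraction of predictions that equal their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(references)
```

Exact match is strict, so date answers written in different formats would count as misses unless they are converted to a common format first.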

Dataset

Question Generation

This project contains Python scripts designed to generate various types of questions based on event data. The scripts read event attributes from a database, construct questions, and store them back in the database.
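The construction step can be pictured as filling question templates from event attributes. The following is a minimal sketch under stated assumptions: the `Event` fields, the templates, and the output keys are hypothetical and do not reflect the repository's actual schema or scripts. Database reads and writes are left out so the example runs on its own.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Event:
    # Hypothetical record layout; the real database schema is not shown here.
    event_id: str
    name: str
    start: date


def attribute_question(event):
    """Build an attribute-style question asking for the year of an event."""
    return {
        "question": f"In which year did {event.name} take place?",
        "answer": str(event.start.year),
        "type": "attribute_event",
        "ids": [event.event_id],
    }


def comparison_question(a, b):
    """Build a comparison-style question asking which of two events came first."""
    if a.start == b.start:
        raise ValueError("events share a start date, so the answer is ambiguous")
    earlier = a if a.start < b.start else b
    return {
        "question": f"Which happened first, {a.name} or {b.name}?",
        "answer": earlier.name,
        "type": "comparison_event",
        "ids": [a.event_id, b.event_id],
    }
```

In a full pipeline, each returned dictionary would be written back to the database as one question-answer row.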

Requirements

  • Python 3.x
  • psycopg2 for PostgreSQL database interaction
  • requests for HTTP requests
  • configparser for reading database configuration (part of the Python 3 standard library)
  • SPARQLWrapper for executing SPARQL queries
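Since the dataset draws on Wikidata, a SPARQL lookup with SPARQLWrapper might look like the sketch below. It assumes the public Wikidata endpoint and the "point in time" property (P585). The function names, the user agent string, and the choice of property are illustrative assumptions, not the queries used by the repository's scripts.

```python
import re


def build_point_in_time_query(qid):
    """Return a SPARQL query for the 'point in time' (P585) values of one item."""
    if not re.fullmatch(r"Q[1-9]\d*", qid):
        raise ValueError(f"not a valid Wikidata item ID: {qid!r}")
    # The wd: and wdt: prefixes are predefined by the Wikidata query service.
    return (
        "SELECT ?time WHERE { "
        f"wd:{qid} wdt:P585 ?time . "
        "}"
    )


def run_query(query, endpoint="https://query.wikidata.org/sparql"):
    """Execute a SELECT query and return the ?time bindings as strings."""
    # Imported lazily so the query builder works without SPARQLWrapper installed.
    from SPARQLWrapper import SPARQLWrapper, JSON

    sparql = SPARQLWrapper(endpoint, agent="complextempqa-example/0.1")
    sparql.setQuery(query)
    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()
    return [row["time"]["value"] for row in results["results"]["bindings"]]
```

Validating the ID before building the query keeps arbitrary text out of the SPARQL string.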

Setup

  1. Clone the repository:
    git clone <repository_url>
    cd <repository_folder>
  2. Install the required Python packages:
    pip install psycopg2 requests SPARQLWrapper pandas
  3. Configure the database connection:
    • Create a database.ini file with the following format:
      [postgresql]
      host=your_host
      database=your_database
      user=your_user
      password=your_password
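A database.ini file in this format can be read with configparser and passed to psycopg2. The sketch below shows one way to do that; the helper names `load_db_config` and `connect` are assumptions, and the repository's scripts may load the file differently.

```python
from configparser import ConfigParser


def load_db_config(path="database.ini", section="postgresql"):
    """Return the key/value pairs of one section of an INI file as a dict."""
    parser = ConfigParser()
    # ConfigParser.read silently skips missing files, so check what was read.
    if not parser.read(path):
        raise FileNotFoundError(f"could not read config file: {path}")
    if not parser.has_section(section):
        raise KeyError(f"section [{section}] not found in {path}")
    return dict(parser.items(section))


def connect(path="database.ini"):
    """Open a PostgreSQL connection using the settings in database.ini."""
    # Imported lazily so the loader can be used without the driver installed.
    import psycopg2

    # host, database, user and password are passed as keyword arguments.
    return psycopg2.connect(**load_db_config(path))
```

Keeping credentials in database.ini rather than in the scripts lets the file be excluded from version control.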

Running the Scripts

  1. Ensure your database is set up and populated with the required data.
  2. Run the question generation script for the desired question type.
