ORCHESTRA Dataset

中文簡介

ORCHESTRA (cOmpRehensive Classical cHinESe poeTRy dAtaset) 是一個全面的古典中文詩歌的數據集,數據來自搜韻網。本數據集由 nk2028 進行格式轉換並發佈,希望透過公開高品質的古典中文詩歌數據,促進對古典中文詩歌及古典中文自然語言處理的研究。

ORCHESTRA-simple 是 ORCHESTRA 數據集的簡化格式,僅保留 id, title, group_index, type, dynasty, author, content 這 7 個欄位,而去除其他欄位,以簡化使用。

本資料集可用於大型語言模型的訓練。如欲作其他用途,請向數據提供者搜韻網諮詢。

數據集 ORCHESTRA-simple-1M 已經發佈於 Hugging Face Datasets。

English Introduction

ORCHESTRA (cOmpRehensive Classical cHinESe poeTRy dAtaset) is a comprehensive dataset of classical Chinese poetry, with data sourced from the SouYun website. nk2028 converted the data to its current format and published it, in the hope that publicly releasing high-quality classical Chinese poetry data will promote research on classical Chinese poetry and on natural language processing of classical Chinese.

ORCHESTRA-simple is a simplified format of the ORCHESTRA dataset. It retains only 7 fields (id, title, group_index, type, dynasty, author, and content) and drops all other fields to simplify usage.
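The field selection described above can be sketched in a few lines of Python. This is an illustration only, not the conversion script used by nk2028: the helper name `to_simple`, the extra field `source_url`, and all sample values are hypothetical placeholders, and the value types of the real fields are not specified in this README.

```python
from typing import Any, Dict

# The seven fields named in this README, in the order they are listed.
SIMPLE_FIELDS = ("id", "title", "group_index", "type", "dynasty", "author", "content")


def to_simple(record: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the seven ORCHESTRA-simple fields (hypothetical helper)."""
    missing = [name for name in SIMPLE_FIELDS if name not in record]
    if missing:
        raise KeyError(f"record is missing fields: {missing}")
    return {name: record[name] for name in SIMPLE_FIELDS}


# Placeholder record: every value here is made up for illustration,
# and "source_url" stands in for any field outside the simple format.
full = {
    "id": 1,
    "title": "example title",
    "group_index": 0,
    "type": "example type",
    "dynasty": "example dynasty",
    "author": "example author",
    "content": "example content",
    "source_url": "placeholder",
}

simple = to_simple(full)
print(list(simple.keys()))
```

Raising on a missing field, rather than filling in a default, is a design choice of this sketch so that malformed records are noticed early.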

This dataset may be used for training large language models. For any other use, please consult the data provider, the SouYun website.

The dataset ORCHESTRA-simple-1M has been published on Hugging Face Datasets.
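A minimal loading sketch using the Hugging Face `datasets` library follows. The repository id `nk2028/ORCHESTRA-simple-1M` and the `train` split name are assumptions inferred from the dataset and publisher names, not confirmed by this README; check the dataset page for the actual values. The wrapper name `load_orchestra_simple` is hypothetical.

```python
# Assumed repository id on the Hugging Face Hub; verify on the dataset page.
DATASET_ID = "nk2028/ORCHESTRA-simple-1M"


def load_orchestra_simple(split: str = "train"):
    """Download and return one split of the dataset (split name is assumed)."""
    # Imported here so the module can be imported without `datasets` installed.
    from datasets import load_dataset  # pip install datasets

    return load_dataset(DATASET_ID, split=split)


# Example usage (requires network access):
#   ds = load_orchestra_simple()
#   print(ds[0]["title"], ds[0]["author"])
```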