BoundaryDetection (AAAI 2024 full paper)

1. This repository contains the code and the hybrid essay dataset for our boundary detection model TriBERT (https://arxiv.org/abs/2307.12267), which has been accepted as a full paper at AAAI 2024.

2. For details about the boundary detection model TriBERT and how the hybrid essay dataset was constructed, please refer to our paper:

  **Towards Automatic Boundary Detection for Human-AI Collaborative Hybrid Essay in Education**
  BibTeX:
  @article{
      zeng2023towards,
      title={Towards Automatic Boundary Detection for Human-AI Collaborative Hybrid Essay in Education},
      author={Zeng, Zijie and Sha, Lele and Li, Yuheng and Yang, Kaixun and Ga{\v{s}}evi{\'c}, Dragan and Chen, Guanliang},
      journal={Proceedings of the 38th AAAI Conference on Artificial Intelligence},
      year={2024}
  }

3. The columns of data.xlsx (the hybrid essay dataset) have the following meanings:

----------------------------------**********************-----------------------------------

essay_id: The id number of the original source essay.

essayset: The id of the prompt of the source essay.

essay: The original source essay on which the hybrid essay is based.

score1: The score given by rater 1 for the original source essay.

score2: The score given by rater 2 for the original source essay.

score: The final score for the original source essay.

ratio: Randomly generated number, PLEASE IGNORE THIS.

train_ix: Indicates which split (Train/Valid/Test) the current row belongs to.

sent_and_label: The list of <sentence, label> pairs, one per sentence of the hybrid essay. The label gives the authorship of the sentence: 'human' means human-written and 'machine' means ChatGPT-generated.

hybrid_text: The hybrid essay written collaboratively by students and ChatGPT.

boundary_ix: The list containing all boundaries for the above hybrid essay.

boundary_num: The number of boundaries of this hybrid essay.

author_seq: The structure of the hybrid essay. For example, 'H_M' means that the hybrid essay begins with human-written sentences and ends with machine-generated sentences (ChatGPT). 'M_H_M' means that the beginning text and ending text are machine-generated while the middle part is human-written.

human_part: Concatenation of all human-written sentences (extracted from the hybrid text).

machine_part: Concatenation of all ChatGPT-generated sentences (extracted from the hybrid text).

----------------------------------**********************-----------------------------------
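The relationship between sent_and_label and the derived columns (author_seq, boundary_ix, boundary_num, human_part, machine_part) can be illustrated with a minimal Python sketch. It is not the code used to build data.xlsx. The helper name `describe_hybrid_essay` is hypothetical, the input is assumed to be an already-parsed list of (sentence, label) tuples (the cell in data.xlsx may be stored as a string that needs parsing first), and two conventions are assumptions: a boundary is taken to be the 0-based index of the first sentence after an authorship change, and parts are joined with a single space.

```python
from itertools import groupby

# Label strings 'human' and 'machine' follow the column description above.
ABBREV = {"human": "H", "machine": "M"}


def describe_hybrid_essay(sent_and_label):
    """Derive structure fields from a list of (sentence, label) pairs.

    Hypothetical helper for illustration only. The boundary index
    convention and the space-joined concatenation are assumptions.
    """
    labels = [label for _, label in sent_and_label]

    # Collapse consecutive identical labels into runs,
    # e.g. ['human', 'human', 'machine'] -> ['human', 'machine'].
    runs = [label for label, _ in groupby(labels)]
    author_seq = "_".join(ABBREV[label] for label in runs)

    # Assumed convention: index of the first sentence whose author
    # differs from that of the previous sentence (0-based).
    boundary_ix = [
        i for i in range(1, len(labels)) if labels[i] != labels[i - 1]
    ]

    return {
        "author_seq": author_seq,
        "boundary_ix": boundary_ix,
        "boundary_num": len(boundary_ix),
        "human_part": " ".join(
            s for s, label in sent_and_label if label == "human"
        ),
        "machine_part": " ".join(
            s for s, label in sent_and_label if label == "machine"
        ),
    }


example = [
    ("I grew up near the sea.", "human"),
    ("Every summer we went fishing.", "human"),
    ("The ocean plays a vital role in regulating climate.", "machine"),
]
info = describe_hybrid_essay(example)
print(info["author_seq"])    # H_M
print(info["boundary_num"])  # 1
```

If the values in boundary_ix of data.xlsx use a different indexing convention, only the list comprehension for `boundary_ix` needs to change; the other fields do not depend on it.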
