Data for Experiments

Format:

{
    "comment": "comment text",
    "classification": "technical debt types (DESIGN | IMPLEMENTATION | DEFECT | DOCUMENTATION | TEST | WITHOUT_CLASSIFICATION)"
}

Format:

{
    "comment": "comment text",
    "classification": "technical debt types (DESIGN | IMPLEMENTATION | DEFECT | DOCUMENTATION | TEST | WITHOUT_CLASSIFICATION)",
    "projectname": "repository name" // unused when training
}
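
As a minimal sketch of reading one record in the format above: the function name `parse_record` and the assumption that each record arrives as a single JSON string (for example, one line of a JSON Lines file) are illustrative and not part of the dataset description. The label set is copied from the format.

```python
import json

# Label set copied from the format description above.
ALLOWED_LABELS = {
    "DESIGN", "IMPLEMENTATION", "DEFECT",
    "DOCUMENTATION", "TEST", "WITHOUT_CLASSIFICATION",
}

def parse_record(line: str) -> tuple:
    """Parse one JSON record (hypothetical helper) and return (comment, label)."""
    record = json.loads(line)
    label = record["classification"]
    if label not in ALLOWED_LABELS:
        raise ValueError(f"unexpected classification: {label!r}")
    # "projectname" is ignored here because it is unused when training.
    return record["comment"], label

# Made-up example record, for illustration only.
example = '{"comment": "TODO: refactor this", "classification": "DESIGN", "projectname": "demo"}'
print(parse_record(example))
```

Rejecting unknown labels early is a design choice for this sketch; it surfaces typos in the data before training starts.
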
  • 10-fold maldonado62k: 10 folds/projects of training and validation data from the Maldonado62K dataset. Used to answer RQ1. The data format is the same as for tesoro_as_extra_data.

  • 10-fold tesoro_comment: 10 folds of training and validation data from the $\text{Tesoro}_{comment}$ dataset. Used to answer RQ2.

Format:

{
    "id": "function id in the dataset",
    "comment_id": "comment id of the function",
    "comment": "comment text",
    "classification": "technical debt types (DESIGN | IMPLEMENTATION | DEFECT | DOCUMENTATION | TEST | NONSATD)",
    "code": "full function context",
    "code_context_2": "2 lines code context",
    "code_context_10": "10 lines code context",
    "code_context_20": "20 lines code context",
    "repo": "Repository that contains this source" // unused when training
}
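
The record above carries the code at several context sizes. A minimal sketch of picking one of them, assuming the record has already been loaded into a Python dict; the helper name `select_context` and its `lines` parameter are hypothetical:

```python
from typing import Optional

# Maps a context size to the field name given in the format above.
CONTEXT_FIELDS = {
    2: "code_context_2",
    10: "code_context_10",
    20: "code_context_20",
}

def select_context(record: dict, lines: Optional[int] = None) -> str:
    """Return the full function when lines is None, else the N-line context."""
    if lines is None:
        return record["code"]
    if lines not in CONTEXT_FIELDS:
        raise ValueError(f"no context field for {lines} lines")
    return record[CONTEXT_FIELDS[lines]]

# Made-up record, for illustration only.
sample = {
    "comment": "FIXME: handle null",
    "classification": "DEFECT",
    "code": "full function body",
    "code_context_2": "two lines",
    "code_context_10": "ten lines",
    "code_context_20": "twenty lines",
}
print(select_context(sample, 10))
```
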
  • 10-fold tesoro_code: 10 folds of training and validation data from the $\text{Tesoro}_{code}$ dataset. Used to answer RQ3.

Format:

{
    "id": "function id in the dataset",
    "original_code": "raw function",
    "code_wo_comment": "original code without comment",
    "cleancode": "normalized version of code (lowercased, newlines \n removed)",
    "label": "binary list corresponding to 4 TD types (DESIGN, IMPLEMENTATION, DEFECT, TEST)",
    "repo": "Repository that contains this source" // unused when training
}
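
To illustrate the `label` field, here is a minimal sketch that turns the binary list back into type names. It assumes that the positions follow the order listed in the format (DESIGN, IMPLEMENTATION, DEFECT, TEST) and that `1` marks a type as present; the helper name `decode_label` is hypothetical.

```python
# Order assumed to match the listing in the format description above.
TD_TYPES = ["DESIGN", "IMPLEMENTATION", "DEFECT", "TEST"]

def decode_label(label: list) -> list:
    """Return the names of the TD types whose flag is 1."""
    if len(label) != len(TD_TYPES):
        raise ValueError(f"expected {len(TD_TYPES)} flags, got {len(label)}")
    return [name for name, flag in zip(TD_TYPES, label) if flag == 1]

print(decode_label([1, 0, 0, 1]))  # ['DESIGN', 'TEST']
print(decode_label([0, 0, 0, 0]))  # []
```

Under these assumptions, a function can carry several debt types at once, and an all-zero list means none of the four types applies.
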