Commit

refactor: optimize dataset name
ganler committed Apr 21, 2024
1 parent 6d92053 commit a60032b
Showing 1 changed file with 7 additions and 7 deletions.
14 changes: 7 additions & 7 deletions docs/curate_dataset.md
@@ -20,7 +20,7 @@ python scripts/curate/dataset_ensemble_clone.py

> [!Tip]
>
- > **Output**: `repoqa-{datetime}.json` by adding a `"content"` field (path to content) for each repo.
+ > **Output**: `repoqa-snf-{datetime}.json` by adding a `"content"` field (path to content) for each repo.

### Step 3: Dependency analysis
@@ -45,23 +45,23 @@ python scripts/curate/dep_analysis/{language}.py # python
### Step 4: Merge step 2 and step 3

```shell
- python scripts/curate/merge_dep.py --dataset-path repoqa-{datetime}.json
+ python scripts/curate/merge_dep.py --dataset-path repoqa-snf-{datetime}.json
```

> [!Tip]
>
> **Input**: Download dependency files into `scripts/curate/dep_analysis/data`.
>
- > **Output**: Update `repoqa-{datetime}.json` by adding a `"dependency"` field for each repository.
+ > **Output**: Update `repoqa-snf-{datetime}.json` by adding a `"dependency"` field for each repository.
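
As a rough illustration of what "adding a `"dependency"` field for each repository" could look like, here is a minimal sketch. It is not the code in `merge_dep.py`: the dataset layout (a mapping from language to a list of repo records), the `"repo"` key, and the shape of the dependency data are all assumptions made for the example.

```python
import json


def merge_dependencies(dataset, dep_by_repo):
    """Add a "dependency" field to each repo record, in place.

    ASSUMPTION: `dataset` maps a language name to a list of repo records,
    and each record has a "repo" key. The real schema may differ.
    """
    for repos in dataset.values():
        for record in repos:
            deps = dep_by_repo.get(record["repo"])
            if deps is not None:
                # Only records with known dependency data are updated.
                record["dependency"] = deps
    return dataset


# Hypothetical in-memory data standing in for the JSON files.
dataset = {"python": [{"repo": "org/project", "content": "path/to/content"}]}
dep_by_repo = {"org/project": {"a.py": ["b.py"]}}

merged = merge_dependencies(dataset, dep_by_repo)
print(json.dumps(merged, indent=2))
```

Existing fields such as `"content"` are left untouched; records without dependency data are simply skipped.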

### Step 5: Function collection with TreeSitter

```shell
# collect functions (in-place)
- python scripts/curate/function_analysis.py --dataset-path repoqa-{datetime}.json
+ python scripts/curate/function_analysis.py --dataset-path repoqa-snf-{datetime}.json
# select needles (in-place)
- python scripts/curate/needle_selection.py --dataset-path repoqa-{datetime}.json
+ python scripts/curate/needle_selection.py --dataset-path repoqa-snf-{datetime}.json
```

> [!Tip]
@@ -72,7 +72,7 @@ python scripts/curate/needle_selection.py --dataset-path repoqa-{datetime}.json
### Step 6: Annotate each function with description to make a final dataset

```shell
- python scripts/curate/needle_annotation.py --dataset-path repoqa-{datetime}.json
+ python scripts/curate/needle_annotation.py --dataset-path repoqa-snf-{datetime}.json
```

> [!Tip]
@@ -85,7 +85,7 @@ python scripts/curate/needle_annotation.py --dataset-path repoqa-{datetime}.json
### Step 7: Merge needle description to the final dataset

```shell
- python scripts/curate/merge_annotation.py --dataset-path repoqa-{datetime}.json --annotation-path {output-desc-path}.jsonl
+ python scripts/curate/merge_annotation.py --dataset-path repoqa-snf-{datetime}.json --annotation-path {output-desc-path}.jsonl
```
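
Since steps 4 through 7 all take the same `--dataset-path`, the commands can be generated from a single value so that a rename like this one only has to happen in one place. The sketch below only builds and prints the command lines; it does not run them. The script paths and flags are the ones shown in this document, while the helper name `curation_commands` is hypothetical, and running the steps back to back may not be valid if a step needs manual input in between.

```python
import shlex


def curation_commands(dataset_path, annotation_path):
    """Return the step 4-7 command lines as argument lists (hypothetical helper)."""
    base = ["--dataset-path", dataset_path]
    return [
        ["python", "scripts/curate/merge_dep.py", *base],           # step 4
        ["python", "scripts/curate/function_analysis.py", *base],   # step 5
        ["python", "scripts/curate/needle_selection.py", *base],    # step 5
        ["python", "scripts/curate/needle_annotation.py", *base],   # step 6
        ["python", "scripts/curate/merge_annotation.py", *base,
         "--annotation-path", annotation_path],                     # step 7
    ]


# Placeholders are kept as-is; substitute the real file names before use.
commands = curation_commands("repoqa-snf-{datetime}.json", "{output-desc-path}.jsonl")
for cmd in commands:
    print(shlex.join(cmd))
```

Each entry is an argument list, so it could be passed to `subprocess.run(cmd, check=True)` once the placeholders are replaced with real paths.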

> [!Tip]
