Commit

Merge pull request #56 from JHU-CLSP/mainaug15
[WIP]
Daniel Khashabi authored Sep 2, 2023
2 parents 3c7df69 + c34624c commit 9842ba5
Showing 107 changed files with 7,005 additions and 7,019 deletions.
7 changes: 5 additions & 2 deletions .github/workflows/ci.yml
@@ -32,16 +32,19 @@ jobs:
path: "requirements.txt"

# Runs a set of commands using the runners shell
- name: test the task formats
- name: run Django/Turkle server
run: |
echo 'Python version'
python --version
python3 --version
echo 'Moving to src directory'
cd src
echo 'Print the current directory'
pwd
echo 'List the files in the current directory'
ls -l
echo 'Clone Turkle'
./1.run_website.sh
./1.run_website.sh & sleep 30
echo 'Generate the input files'
python 2.generate_input_csv.py
echo 'Upload the tasks'
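
Review note: the new step backgrounds `./1.run_website.sh` and then waits a fixed 30 seconds. A possible alternative, sketched here under assumptions, is to poll the server until it answers. The helper name `wait_for_server` and the URL in the usage comment are hypothetical; the commit does not state which address the Turkle server listens on.

```python
import time
import urllib.error
import urllib.request


def wait_for_server(url, timeout=30.0, interval=1.0):
    """Poll `url` until the server answers or `timeout` seconds elapse.

    Returns True as soon as any HTTP response arrives, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return True
        except urllib.error.HTTPError:
            # The server replied with an error status, so it is up.
            return True
        except (urllib.error.URLError, OSError):
            # Connection refused or timed out: not ready yet, try again.
            time.sleep(interval)
    return False


# Hypothetical usage (the address is an assumption, not taken from the commit):
#   ok = wait_for_server("http://localhost:8000/", timeout=60)
```

Compared with a fixed sleep, this returns as soon as the server is reachable and reports failure explicitly when it never comes up.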
4 changes: 0 additions & 4 deletions .gitignore
@@ -2,9 +2,5 @@
src/Turkle/
/src/Turkle/*
src/Turkle/*
/tasks/*/batch.csv
tasks/*/batch.csv
*/tasks/*/batch.csv
*/batch.csv
/tasks/*/input.csv
tasks/*/input.csv
26 changes: 26 additions & 0 deletions data/splits/evaluation_tasks.txt
@@ -0,0 +1,26 @@
Associate countries and languages with Ethnologue
atomic_event2event-effects 4
Author In-Group Analysis Phrase Classification 2
Compile list of area chairs
Elicitation obj
Elicitation subj
Full sentence style annotations
Gun violence structured extraction
Lattice
NER - Task scruples 26,200 - 30,922
neural-pop (PLAN evaluation) t5-human-test b
Paraphrase Clustering with Merge
Photo Collection GVDB
Radiology Report Sentence Classification
Reddit In-group Analysis Comment annotation 3
ROT Details [m=50] rocstories - 0 - 99
Scalar Adjectives Identification
Script KD eval LONG V2 - disc result eval 1
Sherlock IMG 2 TXT Eval 15
Spanish Word Alignment
wikiHow step-goal linking pilot cleanse-url
winogrande validation (grammar) additional_ph
sandbox_audio_quality
sandbox_figure_descriptions
sandbox_lamecows
sandbox_scambaiting
3 changes: 3 additions & 0 deletions data/splits/subjective_evaluation_tasks.txt
@@ -0,0 +1,3 @@
ANES 2008 open-ended survey
Recreation of the Dan Johnson
Congressional Bills 5 point
20 changes: 13 additions & 7 deletions requirements.txt
@@ -1,8 +1,14 @@
-GitPython
 selenium==4.8.2
-beautifulsoup4
-requests
-pandas
-rouge-score
-python-dateutil
-colorama
+GitPython==3.1.31
+beautifulsoup4==4.11.2
+requests==2.28.2
+pandas==1.5.3
+rouge-score==0.1.2
+python-dateutil==2.8.2
+colorama==0.4.6
+Pillow==9.4.0
+transformers==4.26.1
+tqdm==4.64.1
+numpy==1.24.1
+boto3==1.28.34
+jsonlines==3.1.0
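
Review note: with every dependency now pinned, a quick check that the environment matches the pins can be useful. The following is a minimal sketch, not part of the commit; `parse_pins` and `find_mismatches` are hypothetical helper names, and only plain `name==version` lines are handled.

```python
from importlib import metadata


def parse_pins(text):
    """Return {package: version} for every `name==version` line in `text`."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and unpinned requirements.
        if not line or line.startswith('#') or '==' not in line:
            continue
        name, version = line.split('==', 1)
        pins[name.strip()] = version.strip()
    return pins


def find_mismatches(pins):
    """Return {package: (pinned, installed)} where the two differ.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


# Hypothetical usage from the repository root:
#   with open("requirements.txt") as f:
#       print(find_mismatches(parse_pins(f.read())))
```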
37 changes: 26 additions & 11 deletions src/2.generate_input_csv.py
@@ -1,27 +1,42 @@
import pandas as pd
import os

"""
Script for removing unnecessary data from csv files
"""


def create_input(csv_file):
df = pd.read_csv(csv_file, low_memory=False)
df = df.loc[:, ~df.columns.str.startswith('Answer.')]
df.drop_duplicates(inplace=True)
df.to_csv(csv_file.replace('batch.csv', 'input.csv') , encoding='utf-8-sig', index=False)
df.to_csv(csv_file.replace('batch.csv', 'input.csv'), encoding='utf-8-sig', index=False)


ROOT = '../tasks'

if __name__ == '__main__':
# ensure that ../tasks is available
if not os.path.exists('../tasks'):

if not os.path.exists(ROOT):
raise Exception("No directory named `tasks` found. Make sure that you run this script in the `src/` directory")

for root, dirs, files in os.walk('../tasks'):
# if files is empty then show an error
if not files or len(files) == 0:
raise Exception(f"No files in the specified directory: {dirs}")
items = os.listdir(ROOT)
# if files is empty then show an error
if len(items) == 0:
raise Exception(f"No files in the specified directory `{items}`: {ROOT}")

for item in items:

# skip if not a dir
if not os.path.isdir(os.path.join(ROOT, item)):
continue

file = os.path.join(ROOT, item, 'batch.csv')

# make sure the file exists
if not os.path.exists(file):
raise Exception(f"File `{file}` does not exist")

for file in files:
if file.endswith('batch.csv'):
path = os.path.join(root, file)
print(' ** Reading: ' + path)
create_input(path)
print(' ** Reading: ' + file)
create_input(file)
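
Review note: the hunk above interleaves old and new lines and has lost its indentation. As far as it can be read, the change replaces the recursive `os.walk` with a single-level `os.listdir` over `ROOT` and makes a missing `batch.csv` a hard error. Below is a minimal sketch of that traversal only, with indentation inferred. `find_batch_files` is a hypothetical helper name, and the specific exception types are an assumption (the commit raises a bare `Exception`).

```python
import os


def find_batch_files(root):
    """Return the `batch.csv` path of every task directory directly under `root`."""
    if not os.path.exists(root):
        raise FileNotFoundError(f"No directory named `{root}` found")

    items = os.listdir(root)
    if len(items) == 0:
        raise FileNotFoundError(f"No files in the specified directory: {root}")

    paths = []
    for item in items:
        # Skip plain files; only sub-directories are treated as tasks.
        if not os.path.isdir(os.path.join(root, item)):
            continue

        file = os.path.join(root, item, 'batch.csv')

        # Every task directory is required to contain a batch.csv.
        if not os.path.exists(file):
            raise FileNotFoundError(f"File `{file}` does not exist")

        paths.append(file)
    return paths
```

In the committed script each discovered path is passed straight to `create_input`; returning a list here only keeps the traversal testable on its own.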
5 changes: 3 additions & 2 deletions src/3.upload_tasks.py
@@ -46,8 +46,9 @@ def __init__(self, batch_name, project_name, template, csv):
 output_encoding = 'utf8'
 df = pd.read_csv(csvpath, encoding=input_encoding)
 df.to_csv(csvpath, index=False, encoding=output_encoding)
-print(dir)
-options = Options(batch_name=dir, project_name=dir,
+print(f"{Fore.BLUE} -> {dir}")
+options = Options(batch_name=dir,
+project_name=dir,
 template=temp,
 csv=csvpath)
 result = client.upload(options)