Updated scripts for preparing subgraphs input data #145
9d6e4c8 to d685fb7
A few comments, nothing too major :)
}
],
"source": [
"with open('to_subgraphs/mintaka_mixtral_valid.jsonl', 'w') as f:\n",
I think, for the sake of reproducibility, it is worth adding two lines that create the folder if it does not exist:
from pathlib import Path
output_path = './to_subgraphs'
Path(output_path).mkdir(parents=True, exist_ok=True)
with open(Path(output_path) / "mintaka_mixtral_valid.jsonl", 'w+') as f:
    for data_line in data_to_subgraphs(valid_df):
        f.write(ujson.dumps(data_line) + '\n')
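The same idea can be wrapped in a small helper so that both the validation and test exports share it. This is only a sketch: `write_jsonl` is a hypothetical name, the records are made up, and it uses the standard-library `json` module in place of `ujson` so that it runs without extra dependencies.

```python
import json
import tempfile
from pathlib import Path


def write_jsonl(records, path):
    """Write an iterable of JSON-serialisable records, one per line.

    Hypothetical helper, not part of the PR: it creates the parent
    folder first so the call works on a fresh checkout.
    """
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return path


# Usage with made-up records and a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    out = write_jsonl(
        [{"id": "a1"}, {"id": "b2"}],
        Path(tmp) / "to_subgraphs" / "example.jsonl",
    )
    lines = out.read_text(encoding="utf-8").splitlines()
```

In the notebook the call would presumably look like `write_jsonl(data_to_subgraphs(valid_df), 'to_subgraphs/mintaka_mixtral_valid.jsonl')`, assuming `data_to_subgraphs` yields JSON-serialisable dicts.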
Done
}
],
"source": [
"with open('to_subgraphs/mintaka_mixtral_test.jsonl', 'w') as f:\n",
Same thing as above:
with open(Path(output_path) / "mintaka_mixtral_test.jsonl", 'w+') as f:
    for data_line in data_to_subgraphs(test_df):
        f.write(ujson.dumps(data_line) + '\n')
Done
}
],
"source": [
"with open('./mintaka_mixtral_50_valid_predictions.json', 'r') as f:\n",
I think in this case it is better to use pandas.read_json, since we want to load the data into a DataFrame anyway. It gets rid of 2 unnecessary steps.
import datasets
import pandas as pd
valid_predictions = pd.read_json('./mintaka_mixtral_50_valid_predictions.json')
test_predictions = pd.read_json('./mintaka_mixtral_50_test_predictions.json')
ds = datasets.load_dataset("AmazonScience/mintaka")
valid_df = pd.merge(
    valid_predictions,
    ds['validation'].to_pandas(),
    on=['id', 'question'],
)
test_df = pd.merge(
    test_predictions.rename(columns={'mixtral answers': 'model_answer'}),
    ds['test'].to_pandas(),
    on=['id', 'question'],
)
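One thing that may be worth checking with this approach is whether every prediction finds its dataset row. The sketch below uses small made-up frames in place of the real files (the column `answerText` and all values are assumptions for illustration) and merges with `how="left"` and `indicator=True` so that unmatched predictions are visible instead of silently dropped. Passing `dtype={"id": str}` to `pd.read_json` is an extra precaution, on the assumption that ids should stay strings.

```python
import io

import pandas as pd

# Made-up stand-in for a predictions file (a JSON array of records).
predictions_json = io.StringIO(
    '[{"id": "a1", "question": "Q1?", "model_answer": "x"},'
    ' {"id": "b2", "question": "Q2?", "model_answer": "y"},'
    ' {"id": "zz", "question": "Q9?", "model_answer": "q"}]'
)
# Keep ids as strings rather than letting read_json infer a dtype.
predictions = pd.read_json(predictions_json, dtype={"id": str})

# Made-up stand-in for ds['validation'].to_pandas().
reference = pd.DataFrame(
    {
        "id": ["a1", "b2", "c3"],
        "question": ["Q1?", "Q2?", "Q3?"],
        "answerText": ["x", "z", "w"],
    }
)

# A left merge keeps every prediction; the _merge column records
# whether a matching dataset row was found.
merged = pd.merge(
    predictions,
    reference,
    on=["id", "question"],
    how="left",
    indicator=True,
    validate="one_to_one",
)
unmatched = merged[merged["_merge"] != "both"]
```

With the default inner merge used in the suggestion above, the `zz` row would simply disappear from the result, so a check like this is mainly useful as a one-off sanity test.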
Yes, I agree! Thank you.
Done
d685fb7 to e1f9f7b
No description provided.