Updated scripts for preparing subgraphs input data #145

Merged
merged 1 commit into from
Mar 19, 2024

Conversation

mdsalnikov
Collaborator

No description provided.

@mdsalnikov mdsalnikov requested a review from highly0 March 7, 2024 08:38
@mdsalnikov mdsalnikov force-pushed the feature/mistral_mixtral_subgraphs_preparing branch 2 times, most recently from 9d6e4c8 to d685fb7 Compare March 7, 2024 10:36
Collaborator

@highly0 highly0 left a comment

A few comments, nothing too major :)

}
],
"source": [
"with open('to_subgraphs/mintaka_mixtral_valid.jsonl', 'w') as f:\n",
Collaborator

For the sake of reproducibility, I think we should include two lines that create the folder if it does not exist:

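from pathlib import Path

# Create the output folder (and any missing parents); no error if it already exists.
output_path = './to_subgraphs'
Path(output_path).mkdir(parents=True, exist_ok=True)

# Keep the .jsonl extension used in the notebook; write-only mode is enough here.
with open(Path(output_path) / "mintaka_mixtral_valid.jsonl", 'w') as f:
    for data_line in data_to_subgraphs(valid_df):
        f.write(ujson.dumps(data_line) + '\n')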
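
For anyone who wants to try the pattern in isolation, here is a minimal sketch under a few assumptions: it uses the standard-library json module instead of ujson, writes into a temporary directory, and the `write_jsonl` helper and the sample records are hypothetical stand-ins rather than code from this PR.

```python
import json
import tempfile
from pathlib import Path


def write_jsonl(records, output_dir, filename):
    """Hypothetical helper: create output_dir if needed, write one JSON object per line."""
    output_dir = Path(output_dir)
    # parents=True creates missing parent folders; exist_ok=True makes reruns safe.
    output_dir.mkdir(parents=True, exist_ok=True)
    target = output_dir / filename
    with open(target, "w") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")
    return target


# Hypothetical sample records standing in for data_to_subgraphs(valid_df).
sample_records = [
    {"id": "a1", "question": "q1"},
    {"id": "b2", "question": "q2"},
]

with tempfile.TemporaryDirectory() as tmp:
    out_dir = Path(tmp) / "to_subgraphs"
    path = write_jsonl(sample_records, out_dir, "example_valid.jsonl")
    # A second call must not fail even though the folder now exists.
    path = write_jsonl(sample_records, out_dir, "example_valid.jsonl")
    lines = path.read_text().splitlines()
    loaded = [json.loads(line) for line in lines]
```

Because the file is opened in "w" mode, rerunning the cell overwrites the previous output instead of appending to it.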

Collaborator Author

Done

}
],
"source": [
"with open('to_subgraphs/mintaka_mixtral_test.jsonl', 'w') as f:\n",
Collaborator

Same thing as above:

# Write the test split (test_df, not valid_df) and keep the .jsonl extension.
with open(Path(output_path) / "mintaka_mixtral_test.jsonl", 'w') as f:
    for data_line in data_to_subgraphs(test_df):
        f.write(ujson.dumps(data_line) + '\n')

Collaborator Author

Done

}
],
"source": [
"with open('./mintaka_mixtral_50_valid_predictions.json', 'r') as f:\n",
Collaborator

I think in this case it is better to use pandas.read_json, since we want to load the data into a DataFrame anyway. That gets rid of two unnecessary steps.

import datasets
import pandas as pd

valid_predictions = pd.read_json('./mintaka_mixtral_50_valid_predictions.json')
test_predictions = pd.read_json('./mintaka_mixtral_50_test_predictions.json')

ds = datasets.load_dataset("AmazonScience/mintaka")

valid_df = pd.merge(
    valid_predictions,
    ds['validation'].to_pandas(),
    on=['id', 'question'],
)

test_df = pd.merge(
    test_predictions.rename(columns={'mixtral answers': 'model_answer'}),
    ds['test'].to_pandas(),
    on=['id', 'question'],
)
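
As a rough illustration of what the suggestion does, here is a small self-contained sketch. The toy records, the file name, and the `reference_answer` column are hypothetical and only stand in for the real prediction files and the Mintaka split; it assumes the prediction file is a JSON list of records.

```python
import json
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical stand-ins for a predictions file and a dataset split.
predictions = [
    {"id": "a1", "question": "q1", "model_answer": "x"},
    {"id": "b2", "question": "q2", "model_answer": "y"},
]
dataset_split = pd.DataFrame(
    {
        "id": ["a1", "b2", "c3"],
        "question": ["q1", "q2", "q3"],
        "reference_answer": ["x", "z", "w"],
    }
)

with tempfile.TemporaryDirectory() as tmp:
    pred_path = Path(tmp) / "example_predictions.json"
    pred_path.write_text(json.dumps(predictions))
    # read_json loads the file straight into a DataFrame,
    # so no manual open() / json.load() / DataFrame() steps are needed.
    pred_df = pd.read_json(pred_path)

# The default inner join keeps only rows whose (id, question) pair appears in both frames.
merged = pd.merge(pred_df, dataset_split, on=["id", "question"])
```

Note that with the default inner join, dataset rows without a matching prediction (here `c3`) are dropped silently; passing `how="left"` or `how="outer"` would keep them if that matters for the subgraph preparation.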

Collaborator Author

Yes, I agree! Thank you.
Done

@mdsalnikov mdsalnikov force-pushed the feature/mistral_mixtral_subgraphs_preparing branch from d685fb7 to e1f9f7b Compare March 18, 2024 13:47
@highly0 highly0 merged commit a6eaf14 into master Mar 19, 2024
1 check passed
2 participants