Updated scripts for preparing subgraphs input data #145

mdsalnikov · 2024-03-07T08:38:19Z

No description provided.

highly0

A few comments, nothing too major :)

highly0 · 2024-03-11T09:41:34Z

experiments/subgraphs_datasets_prepare_input_data/mintaka_subgraphs_preparing.ipynb

+    }
+   ],
+   "source": [
+    "with open('to_subgraphs/mintaka_mixtral_valid.jsonl', 'w') as f:\n",


For the sake of reproducibility, I think we should include a couple of lines that create the folder if it does not exist:

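```python
from pathlib import Path

import ujson

# Create the output folder if it does not exist yet
output_path = './to_subgraphs'
Path(output_path).mkdir(parents=True, exist_ok=True)

# data_to_subgraphs and valid_df are defined earlier in the notebook
with open(Path(output_path) / "mintaka_mixtral_valid.jsonl", 'w') as f:
    for data_line in data_to_subgraphs(valid_df):
        f.write(ujson.dumps(data_line) + '\n')
```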
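The reproducibility point can be checked in isolation: on a fresh checkout `to_subgraphs/` does not exist, so a bare `open(..., 'w')` fails, while `mkdir(parents=True, exist_ok=True)` makes both the first run and any re-run succeed. A minimal standard-library sketch in a temporary directory; `write_lines` is a hypothetical helper, not part of the notebook:

```python
import tempfile
from pathlib import Path


def write_lines(output_dir: Path, filename: str, lines: list) -> Path:
    """Hypothetical helper: create output_dir if needed, then write one line per item."""
    # parents=True also creates missing intermediate folders; exist_ok=True
    # turns the call into a no-op when the folder is already there.
    output_dir.mkdir(parents=True, exist_ok=True)
    path = output_dir / filename
    with open(path, "w") as f:
        for line in lines:
            f.write(line + "\n")
    return path


with tempfile.TemporaryDirectory() as tmp:
    output_dir = Path(tmp) / "to_subgraphs"

    # The folder does not exist yet, so a bare open() raises FileNotFoundError.
    try:
        with open(output_dir / "mintaka_mixtral_valid.jsonl", "w"):
            pass
        failed_without_mkdir = False
    except FileNotFoundError:
        failed_without_mkdir = True

    # With mkdir first, the first run and a re-run both succeed.
    write_lines(output_dir, "mintaka_mixtral_valid.jsonl", ['{"id": "a1"}'])
    path = write_lines(output_dir, "mintaka_mixtral_valid.jsonl", ['{"id": "a1"}'])
    content = path.read_text()

print(failed_without_mkdir)  # True
```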

highly0 · 2024-03-11T09:42:16Z

experiments/subgraphs_datasets_prepare_input_data/mintaka_subgraphs_preparing.ipynb

+    }
+   ],
+   "source": [
+    "with open('to_subgraphs/mintaka_mixtral_test.jsonl', 'w') as f:\n",


Same as above:

```python
# Reuses Path, ujson and output_path from the previous suggestion
with open(Path(output_path) / "mintaka_mixtral_test.jsonl", 'w') as f:
    for data_line in data_to_subgraphs(test_df):
        f.write(ujson.dumps(data_line) + '\n')
```
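Since the valid and test blocks differ only in the split name, one way to avoid copy-paste slips (for example, passing `valid_df` when writing the test file) is to derive both the filename and the data from a single loop variable. A minimal sketch, assuming each record is a JSON-serializable dict; `write_jsonl` and the toy `splits` data are hypothetical, and the standard-library `json` module stands in for `ujson` to keep the example self-contained:

```python
import json
import tempfile
from pathlib import Path
from typing import Iterable


def write_jsonl(records: Iterable[dict], path: Path) -> int:
    """Write one JSON object per line to path; return the number of lines written."""
    path.parent.mkdir(parents=True, exist_ok=True)
    count = 0
    with open(path, "w") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")
            count += 1
    return count


# Hypothetical stand-ins for data_to_subgraphs(valid_df) / data_to_subgraphs(test_df)
splits = {
    "valid": [{"id": "v1", "question": "Q1"}],
    "test": [{"id": "t1", "question": "Q2"}, {"id": "t2", "question": "Q3"}],
}

with tempfile.TemporaryDirectory() as tmp:
    output_path = Path(tmp) / "to_subgraphs"
    written = {}
    read_back = {}
    for split, records in splits.items():
        # Filename and data both come from the same loop variable,
        # so a split can't be written to another split's file.
        path = output_path / f"mintaka_mixtral_{split}.jsonl"
        written[split] = write_jsonl(records, path)
        with open(path) as f:
            read_back[split] = [json.loads(line) for line in f]

print(written)  # {'valid': 1, 'test': 2}
```

In the notebook the loop would presumably run over `{"valid": valid_df, "test": test_df}` and call `write_jsonl(data_to_subgraphs(df), ...)`, assuming `data_to_subgraphs` yields plain dicts.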

highly0 · 2024-03-11T09:46:29Z

experiments/subgraphs_datasets_prepare_input_data/mintaka_subgraphs_preparing.ipynb

+    }
+   ],
+   "source": [
+    "with open('./mintaka_mixtral_50_valid_predictions.json', 'r') as f:\n",


In this case I think it is better to use `pandas.read_json`, since we want to load the data into a DataFrame anyway. This gets rid of two unnecessary steps:

```python
import datasets
import pandas as pd

# Load the model predictions straight into DataFrames
valid_predictions = pd.read_json('./mintaka_mixtral_50_valid_predictions.json')
test_predictions = pd.read_json('./mintaka_mixtral_50_test_predictions.json')

ds = datasets.load_dataset("AmazonScience/mintaka")

# Join predictions with the Mintaka splits on question id and text
valid_df = pd.merge(
    valid_predictions,
    ds['validation'].to_pandas(),
    on=['id', 'question'],
)
test_df = pd.merge(
    test_predictions.rename(columns={'mixtral answers': 'model_answer'}),
    ds['test'].to_pandas(),
    on=['id', 'question'],
)
```
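`pd.merge` defaults to an inner join, so questions present in only one of the inputs are dropped without any warning. A minimal sketch of a sanity check for this merge step, on hypothetical toy frames (the `gold_answer` column is made up for illustration): `validate="one_to_one"` makes pandas raise `pandas.errors.MergeError` on duplicate `(id, question)` keys, and comparing row counts surfaces dropped predictions.

```python
import pandas as pd

# Hypothetical toy data standing in for the predictions file and a Mintaka split
predictions = pd.DataFrame(
    {
        "id": ["a1", "b2", "c3"],
        "question": ["Q1", "Q2", "Q3"],
        "mixtral answers": ["A1", "A2", "A3"],
    }
)
mintaka_split = pd.DataFrame(
    {
        "id": ["a1", "b2", "d4"],
        "question": ["Q1", "Q2", "Q4"],
        "gold_answer": ["gold1", "gold2", "gold4"],
    }
)

merged = pd.merge(
    predictions.rename(columns={"mixtral answers": "model_answer"}),
    mintaka_split,
    on=["id", "question"],
    how="inner",
    # Raises pandas.errors.MergeError if either side has duplicate keys
    validate="one_to_one",
)

# The inner join silently drops unmatched rows, so compare row counts
dropped = len(predictions) - len(merged)
print(f"kept {len(merged)} rows, dropped {dropped}")  # kept 2 rows, dropped 1
```

On the real data, a non-zero `dropped` count would suggest that the predictions file and the Mintaka split disagree on some `id`/`question` pairs.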

Yes, I agree! Thank you.
Done

mdsalnikov requested a review from highly0 March 7, 2024 08:38

mdsalnikov force-pushed the feature/mistral_mixtral_subgraphs_preparing branch 2 times, most recently from 9d6e4c8 to d685fb7 on March 7, 2024 10:36

highly0 reviewed Mar 11, 2024

Updated scripts for preparing subgraphs input data

e1f9f7b

mdsalnikov force-pushed the feature/mistral_mixtral_subgraphs_preparing branch from d685fb7 to e1f9f7b on March 18, 2024 13:47

highly0 approved these changes Mar 19, 2024

highly0 merged commit a6eaf14 into master Mar 19, 2024
1 check passed

mdsalnikov commented Mar 7, 2024

highly0 left a comment

highly0 Mar 11, 2024

mdsalnikov Mar 18, 2024

highly0 Mar 11, 2024

mdsalnikov Mar 18, 2024

highly0 Mar 11, 2024

mdsalnikov Mar 18, 2024
