Make output artifacts live in the task hierarchy they were produced in #647
Comments
I agree, though I would not advocate for different behavior on S3 vs locally on the AMLB level. I would expect I imagine using
Yeah, I have no issues with the Re AWS mode logic: I don't mind too much the redundant path information so long as it is changed to the
As a follow-up, would it break AMLB's logic if I send a PR changing to
Really busy so I can't look around right now. From the top of my head, I can't think of a reason to keep the extra I would want the change to happen for all frameworks though, otherwise it just creates more work later and might lead to unexpected issues.
Currently, if I create an output_dir for a given artifact in a given task, such as `iris` on fold 0, it produces what I consider a suboptimal path. For the past 3 years this has led me to maintain a fork of AMLB with different saving logic for AutoGluon, so that the S3 folder format is easier to work with.
Current Logic
Output:
Shortening for brevity:
Then if I want to save the file, it makes it even longer:
Current AWS Mode Logic
As an example, when the files are saved to S3, parsing becomes very complicated with the logic in mainline. If I simply want to concatenate all `leaderboard.csv` files in a benchmark run, while adding additional `dataset`, `fold` and `method` columns to differentiate between the runs, this becomes well over 200 lines of very complicated, very slow code.
Here is a real example of a path I have in S3 using the mainline logic:
Breaking it down:
Writing logic to find the right files given the run properties is challenging with this format. The relative path of `leaderboard.csv` to `results.csv` for a given task result requires knowledge of the dataset and fold name, which isn't ideal, especially when the dataset name in S3 can differ from what it is in the `task_metadata.csv` due to special characters.
Proposal
Instead, I recommend the final output locations of the artifacts be:
This is much nicer IMO: it makes all of the artifacts related to a given method_task run (`autogluon.test.test.local.20241110T031036/iris/0/`) available in the same directory. It then becomes very easy to tell what artifacts are available from the run, rather than having them spread across many folders.
AWS Mode
For AWS mode, I propose that the artifact save format should be:
(optionally can also remove the output/ dir if it is redundant)
This would make the relative path of all artifacts identical for all tasks, since they live in the same directory with the same structure regardless of the task name / fold / method.
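To illustrate why a uniform layout helps, here is a rough sketch of the `leaderboard.csv` concatenation mentioned earlier, written against the proposed structure. It assumes each file sits directly at `<run_dir>/<dataset>/<fold>/leaderboard.csv`; that exact location, the `collect_leaderboards` helper, and passing `method` in as an argument are all assumptions for illustration, not existing AMLB behavior.

```python
import csv
from pathlib import Path


def collect_leaderboards(run_dir, method, filename="leaderboard.csv"):
    """Gather rows from every <run_dir>/<dataset>/<fold>/<filename>.

    Hypothetical helper: the layout is assumed from the proposal above.
    The dataset and fold are read from the directory names, so no
    per-task path knowledge is needed.
    """
    rows = []
    for path in sorted(Path(run_dir).glob(f"*/*/{filename}")):
        fold_dir = path.parent
        dataset, fold = fold_dir.parent.name, fold_dir.name
        with path.open(newline="") as handle:
            for row in csv.DictReader(handle):
                # Added columns go last so they win over same-named columns.
                rows.append({**row, "method": method, "dataset": dataset, "fold": fold})
    return rows
```

With this layout, something like `collect_leaderboards("autogluon.test.test.local.20241110T031036", method="autogluon")` would return one dict per leaderboard row, tagged with the dataset and fold it came from. The same idea should carry over to S3 by listing keys under the run prefix instead of globbing a local directory.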