Strategies for Managing Machine Learning Model Metadata and Lineage #814

Open
1 task
ShellLM opened this issue Apr 24, 2024 · 1 comment
Labels
MachineLearning ML Models, Training and Inference Models LLM and ML model repos and links Research personal research notes for a topic Software2.0 Software development driven by AI and neural networks.

Comments


ShellLM commented Apr 24, 2024

Strategies for Managing Machine Learning Model Metadata and Lineage

Snippet

Keeping track of models and their associated metadata.

Discussion

I am starting to accumulate a large number of models for a project I am working on. Many of these are old models that I am keeping for archival purposes, and many are fine-tuned from other models. Is there an industry-standard way of dealing with this? In particular, I am looking for a way to track the following:

  • Information about parameters used to train the model
  • Datasets used to train the model
  • Other metadata about the model (e.g. which objects an object detection model was trained to detect)
  • Model performance
  • Model lineage (which model it was fine-tuned from)
  • Model progression (whether this model is a direct upgrade of some other model, such as being fine-tuned from the same base model but with better hyperparameters)
  • Model source (not sure about this, but I'm thinking of some way of linking the model to the Python script that was used to train it; not crucial, but something like this would be nice)

Are there any tools or services which could help me achieve some of this functionality? Also, if this is not the right sub for this question, could I get some pointers in the correct direction? Thanks!
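
As a rough illustration of the requirements listed above, the fields could be captured in a small per-model record before settling on any particular tool. This is only a sketch using the Python standard library: the class `ModelRecord`, the helpers `hash_script` and `lineage`, and all field names and example values are hypothetical and do not come from the original post or from any tracking library.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ModelRecord:
    """One entry per saved model. All field names are illustrative."""
    model_id: str
    hyperparameters: dict           # parameters used to train the model
    datasets: list                  # datasets used to train the model
    metrics: dict                   # model performance
    extra: dict = field(default_factory=dict)  # e.g. detection classes
    parent_id: str | None = None       # lineage: fine-tuned from this model
    supersedes_id: str | None = None   # progression: direct upgrade of this model
    script_sha256: str | None = None   # source: hash of the training script


def hash_script(source: bytes) -> str:
    """Fingerprint the training script so a model can be tied to its code."""
    return hashlib.sha256(source).hexdigest()


def lineage(registry: dict, model_id: str) -> list:
    """Follow parent_id links from a model back to its root.

    A parent that is not in the registry (e.g. an external pretrained
    checkpoint) is kept as the last element of the chain.
    """
    chain, seen = [], set()
    current = model_id
    while current is not None and current not in seen:
        seen.add(current)
        chain.append(current)
        record = registry.get(current)
        current = record.parent_id if record else None
    return chain


# Hypothetical example data.
base = ModelRecord(
    model_id="detector-v1",
    hyperparameters={"lr": 1e-3, "epochs": 20},
    datasets=["dataset-a"],
    metrics={"mAP": 0.61},
    extra={"classes": ["car", "person"]},
    parent_id="pretrained-backbone",
    script_sha256=hash_script(b"print('train v1')"),
)
tuned = ModelRecord(
    model_id="detector-v2",
    hyperparameters={"lr": 3e-4, "epochs": 40},
    datasets=["dataset-a"],
    metrics={"mAP": 0.66},
    parent_id="detector-v1",
    supersedes_id="detector-v1",
)
registry = {r.model_id: r for r in (base, tuned)}

# The registry serialises to JSON, so it can sit next to the weights.
registry_json = json.dumps({k: asdict(v) for k, v in registry.items()}, indent=2)
print(lineage(registry, "detector-v2"))
```

A flat JSON file like this stops scaling fairly quickly, which is where the dedicated experiment trackers suggested in the comments come in; the record mainly shows which fields such a tool would need to hold.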

Original Reddit Discussion

Comments

u/fiftyfourseventeen

Weights & Biases (wandb)

Gardienss

Aren't you just describing TensorBoard with some added metadata?

u/qalis

MLflow is built literally for this purpose

Material_Policy6327

MLflow is what we use
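
For readers who have not used MLflow, a minimal sketch of how one archived model might be recorded as a tracking run follows. The helper names `build_run_payload` and `log_to_mlflow`, the experiment name `model-archive`, the tag keys `parent_model` and `datasets`, and the example record are all assumptions made for illustration. The MLflow calls used (`set_experiment`, `start_run`, `log_params`, `log_metrics`, `set_tags`) are part of its tracking API, and the import is deferred so the payload helper works without MLflow installed.

```python
def build_run_payload(record: dict) -> dict:
    """Split a model record into the pieces an MLflow run expects.

    The record layout (model_id, hyperparameters, metrics, datasets,
    parent_id) is hypothetical.
    """
    params = {str(k): v for k, v in record.get("hyperparameters", {}).items()}
    # MLflow metrics must be numeric, so coerce to float up front.
    metrics = {str(k): float(v) for k, v in record.get("metrics", {}).items()}
    tags = {
        "parent_model": record.get("parent_id") or "none",  # lineage
        "datasets": ",".join(record.get("datasets", [])),
    }
    return {
        "run_name": record["model_id"],
        "params": params,
        "metrics": metrics,
        "tags": tags,
    }


def log_to_mlflow(payload: dict, experiment: str = "model-archive") -> None:
    """Record the payload as one MLflow run (requires `pip install mlflow`)."""
    import mlflow  # deferred so build_run_payload works without MLflow

    mlflow.set_experiment(experiment)
    with mlflow.start_run(run_name=payload["run_name"]):
        mlflow.log_params(payload["params"])
        mlflow.log_metrics(payload["metrics"])
        mlflow.set_tags(payload["tags"])


# Hypothetical record for a fine-tuned detector.
payload = build_run_payload({
    "model_id": "detector-v2",
    "hyperparameters": {"lr": 3e-4, "epochs": 40},
    "metrics": {"mAP": 0.66},
    "datasets": ["dataset-a"],
    "parent_id": "detector-v1",
})
print(payload["tags"])
# log_to_mlflow(payload)  # uncomment once MLflow is installed and configured
```

Storing lineage as a plain tag is the simplest option; whether that is enough, or whether a model registry is needed on top, depends on how many models have to be compared.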

metric_logger

My colleague wrote a blog post on how to use Comet (an experiment tracking solution that does everything you listed) for object detection use cases.

Compare Object Detection Models from Torchvision

gdpoc

In addition to wandb, Comet, MLflow, and Neptune, (plug) State Farm just open-sourced a package called ThingStore for general process logging and tracking.

Suggested labels

None

@ShellLM ShellLM added MachineLearning ML Models, Training and Inference Models LLM and ML model repos and links Research personal research notes for a topic Software2.0 Software development driven by AI and neural networks. labels Apr 24, 2024

ShellLM commented Apr 28, 2024

Related content

#665 similarity score: 0.85
#734 similarity score: 0.85
#660 similarity score: 0.84
#750 similarity score: 0.84
#699 similarity score: 0.83
#647 similarity score: 0.83
