Which tool to load and share metadata? #13
Comments
I see a couple of points that are important to consider for our choice of tool: Is the simplicity of our backend important, for us and for the users? In other words, do we want a simple backend tool?
Datalad is very powerful, but 99% of the users would probably not understand how the metadata portal actually works. The use of a CLI (and not only a GUI) might therefore be limited to only a small fraction of the community. Intake is powerful but also simple, so it is easier for the users to grasp. Because intake is linked to e.g. Dask, analysis and visualisation are a natural and simple step after the download. Intake is pure Python, with all the implications that has. Is that an important point? I think it is.
At present, my choice is: Intake
Not sure why this is an issue and not part of the discussion (#12) :). Anyway, maybe we should build a prototype using both tools to better understand the pros and cons. The suggested initial list is in #4, but I'm not sure those datasets are good for our prototype given their large size. How about the following five datasets:
I suggest the prototype include:
Ideally, we each build two prototypes, so that we can each understand both tools for a decision/discussion.
We initially explored datalad, but other options are very interesting too:
datalad
Very powerful because it is directly based on git-annex, but I still haven't fully understood how to use it properly/efficiently.
Datalad is a data management system, and only that (to my knowledge). It is very efficient because it concentrates on this one task, but that somewhat limits our application, or calls for the use of other tools in combination. Which might just be ok.
intake
A simple but powerful set of tools. Because it is simple, the community could easily contribute new catalog entries (through YAML files).
Allows for local file caching
Dask capabilities for big data
Cloud access support
Possibility for a simple GUI
Storing catalog metadata in files makes the structure of our portal efficient and very easy to understand.
The use of the YAML format makes community contribution easier, even for non-coders (JSON, and XML even more so, can be intimidating to people who are not used to coding at all).
Intake is more than just a data management tool. It streamlines not only the data download step but also the reading step, through the many drivers available (and new ones are easy to implement).
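To make the "catalog entries through yaml files" point concrete, here is a minimal sketch of what a contributed entry could look like. It only renders text in the layout used by intake's YAML catalogs (a top-level `sources` key, then one block per source with `description`, `driver` and `args`), using the standard library so that it runs without intake installed. The dataset name, description and URL are hypothetical placeholders, not datasets from our list.

```python
import json


def catalog_entry(name, description, driver, urlpath, indent="  "):
    """Render one source entry in the intake YAML catalog layout."""
    # json.dumps produces double-quoted strings, which are also valid YAML scalars.
    quote = json.dumps
    lines = [
        f"{indent}{name}:",
        f"{indent * 2}description: {quote(description)}",
        f"{indent * 2}driver: {driver}",
        f"{indent * 2}args:",
        f"{indent * 3}urlpath: {quote(urlpath)}",
    ]
    return "\n".join(lines)


def build_catalog(entries):
    """Join several entries under the top-level 'sources' key."""
    body = "\n".join(catalog_entry(**entry) for entry in entries)
    return "sources:\n" + body + "\n"


# Hypothetical entry: name, description and URL are placeholders.
catalog_text = build_catalog([
    {
        "name": "example_temperatures",
        "description": "Hypothetical CSV dataset used for illustration",
        "driver": "csv",
        "urlpath": "https://example.org/data/temperatures.csv",
    },
])

print(catalog_text)
```

Saved as e.g. `catalog.yml`, a file like this should be readable with `intake.open_catalog("catalog.yml")`, assuming a version of intake that still reads this YAML catalog layout. In practice a contributor would simply write the YAML by hand; the point is how little there is to write.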
pooch
Simple and similar to intake, except that data sources are not really organised as catalogs. It was developed to download test data for libraries, so we might run into some limitations for our metadata portal.
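For comparison, pooch describes a data source essentially as a file name plus a checksum in a registry, with no driver or descriptive metadata attached. The sketch below illustrates that registry idea with the standard library only. It is not pooch's API: the helper names `sha256_of` and `is_valid`, and the file `sample.csv`, are made up for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_valid(path, registry):
    """Check a local file against the expected hash stored in the registry."""
    expected = registry.get(Path(path).name)
    return expected is not None and sha256_of(path) == expected


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical data file standing in for a downloaded dataset.
    data_file = Path(tmp) / "sample.csv"
    data_file.write_bytes(b"time,value\n0,1.5\n")

    # The registry maps file names to known hashes, and nothing more.
    registry = {"sample.csv": sha256_of(data_file)}
    ok_before = is_valid(data_file, registry)

    # A modified (or corrupted) file no longer matches the registry.
    data_file.write_bytes(b"time,value\n0,9.9\n")
    ok_after = is_valid(data_file, registry)

print(ok_before, ok_after)
```

This works well for checking that downloads are intact, but anything like descriptions, drivers or search across datasets would have to be built on top by us.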
This comparison will be further modified/refined.