ISA light #97
Comments
Can you give a more detailed example for this?
IMO we don't need to explicitly state which assay is part of which study. This can be inferred from the processes in the assay having outputs of the study processes as inputs.
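A minimal sketch of how this inference could work, assuming each process is reduced to a name plus sets of input and output identifiers. The `Process` class, the `infer_assay_studies` function, and all sample identifiers are hypothetical and not part of any ARC or ISA tooling:

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    """Hypothetical, simplified stand-in for an ISA process."""
    name: str
    inputs: set = field(default_factory=set)
    outputs: set = field(default_factory=set)


def infer_assay_studies(studies, assays):
    """Map each assay name to the studies whose process outputs it consumes.

    `studies` and `assays` are dicts of name -> list of Process.
    """
    # Collect every output identifier produced by each study.
    study_outputs = {
        name: set().union(*(p.outputs for p in procs))
        for name, procs in studies.items()
    }
    links = {}
    for assay_name, procs in assays.items():
        assay_inputs = set().union(*(p.inputs for p in procs))
        # An assay belongs to a study if any of its inputs is a study output.
        links[assay_name] = sorted(
            study for study, outs in study_outputs.items() if outs & assay_inputs
        )
    return links


# Illustrative data only.
studies = {"growth": [Process("cultivation", {"seed_1"}, {"plant_1", "plant_2"})]}
assays = {
    "proteomics": [Process("extraction", {"plant_1"}, {"extract_1"})],
    "imaging": [Process("scan", {"slide_9"}, {"image_9"})],
}
links = infer_assay_studies(studies, assays)
```

Here `proteomics` is linked to `growth` because it consumes `plant_1`, while `imaging` ends up with no study because none of its inputs is a study output. A real implementation would have to read these identifiers from the annotation tables.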
From my point of view, the ISA-TAB format already differs slightly from the tab format. See (ISA-TAB) - not only formally but also in terms of new concepts being introduced, which are necessary to make the format work in the ARC environment. Now you are proposing a third format, which is not that different - but still - different. I think that would create more caveats than it solves problems stated here. In general I agree with the problems you stated:
Yes, and this would be so much easier with the ISA-TAB format.
Here I really see a problem in usability and development. For developers, because this introduces a lot of overhead to keep this data in sync, bloats the code, and is a potential source of errors. For users, because without a program like ARCitect this has to be done manually, which can be tedious, is prone to human error, and causes confusion (why duplicate entries?).

Although I think these points should be improved, I really do not like the solution of an additional format, firstly because this again has consequences for users, data stewards and developers.

Users
In a perfect world there would be GUI tools that make it very convenient for users to work on their ARC. I know the tools have improved a lot and they are very helpful. But as of now, users have to interact with their ARCs in more inconvenient ways, which, depending on the user's background, can be a rather confusing experience. Adding another format adds more information to sources and the knowledge base, along with potential usability issues and error sources. And just because it makes sense from a developer's view, it does not have to make sense for a user. Another scenario: a user takes another ARC as the basis for a new ARC and, depending on which format definition they look at, runs into trouble.

Data Stewardship/ Teaching
Given the nature of this project, it is impossible to have a finished definition of the ARC format from the start. As a consequence, we will have to update what we teach students from time to time, and I think it would be beneficial to keep these changes as small as possible. Introducing another format does not only change something, it also adds a possibility. That means we either leave it unexplained to keep things simple, or we explain it and make things more complicated. The ARC itself is not complicated to teach ... but its strength, being quite flexible within its given constraints, can sometimes be a bit confusing, especially as we are all still learning how to do this. And to exactly this flexibility we add another ISA format, which in turn increases the number of ways things can be done. At least for first-time ARCers it should rather be: this is the structure, this is the file format, and this is how to use it.

Development
Again, I think the problems you mentioned are valid and need a solution, but I don't think that this ISA LIGHT format should be the way to go.
This is a valid point; git diffing is further complicated by our use of the XLSX format. But we decided on XLSX for its other benefits, like containing multiple sheets and the visual representation it provides. Viewing big tables in ISA-TAB becomes a mess.
Well, we are not using ISA-TAB, but ISA-XLSX, which already differs quite a lot from ISA-TAB, but still implements the ISA abstract model, making parsing towards ISA-JSON and ISA-TAB comparatively straightforward. The reasons for creating our own ISA-XLSX specification are manifold, e.g. improved usage of controlled vocabularies, self-contained data containers (assays, studies) and the ability to add features necessary for FAIR depiction of a full research cycle (#93).
We are proposing a variant of the ISA-XLSX format (you can call it a reduced version of ISA-XLSX) which may only be used in the ARC context and carries less information than the full ISA-XLSX format. As stated above, ISA-XLSX has its own specification. Users of the ARC don't have to learn three formats now; ISA-XLSX suffices. Knowledge about the other implementations of the ISA abstract model becomes relevant only for active, manual interoperation between different ecosystems.
I think this should be more important to users than to developers, as we can expect developers to read a specification. But you have a valid point regarding exploration, in which the investigation file does not contain information about its studies anymore. This is a good point, as it would reduce discoverability of 2nd level metadata (e.g. what protocol is part of which study) from the pure investigation ISA-XLSX file. Discoverability of 1st level metadata (what study is part of this ARC) is not affected though, as looking into the studies folder suffices now.

To sum up, look at it from the following perspective: because ISA-XLSX is designed to work as an alternative to ISA-TAB and ISA-JSON, it must contain some information to be convertible into these formats. But in the ARC, a lot of this information is handled implicitly and therefore we do not write it out. This makes the format easier, as there is less duplication and less room for error (as you stated). We do not need to teach full ISA-XLSX in the ARC context or even explain that there is a difference between ISA-XLSX light and ISA-XLSX. 95% of all ARC users will never require full ISA-XLSX.
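The point about 1st level discoverability can be sketched roughly as follows. This assumes a layout in which each study lives in `studies/<name>/isa.study.xlsx` and each assay in `assays/<name>/isa.assay.xlsx`; that layout and the `discover` helper are assumptions made for illustration, not something stated in this thread:

```python
from pathlib import Path

# Assumed folder name per entity kind (hypothetical mapping for this sketch).
FOLDERS = {"study": "studies", "assay": "assays"}


def discover(arc_root, kind):
    """List the studies or assays of an ARC by looking at its folders.

    No registration in the investigation file is consulted: a subfolder
    counts if it contains the expected isa.<kind>.xlsx file.
    """
    folder = Path(arc_root) / FOLDERS[kind]
    if not folder.is_dir():
        return []
    return sorted(
        entry.name
        for entry in folder.iterdir()
        if entry.is_dir() and (entry / f"isa.{kind}.xlsx").is_file()
    )
```

Calling `discover("path/to/arc", "study")` would then return the study names found on disk, which is the information the investigation file currently duplicates.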
Hey @eik-dahms, after some consideration, maybe there was a little bit of confusion caused by me framing this as
Closed by #101
Currently, the investigation file contains registry information about studies and assays. This is necessary in ISA-Tab for two reasons:
Neither of these reasons applies in the ARC, as 1) the ISA-XLSX files have their own contextualizing information in an additional metadata sheet and 2) the ARC is a structured container, so the location of Assay and Study files is explicit.
Instead, this registration now causes two problems in the context of the ARC:
As a solution, we propose ISA light:
I would suggest having ISA-light as an option in the ISA-XLSX specification. The ARC specification would then explicitly implement ISA-light, making it non-optional (with implicit backwards compatibility in the tools).
@Freymaurer @JonasLukasczyk @Brilator @muehlhaus @chgarth
PS: This is already being tested out in the ARCitect