Layout namespaces and discovery proposal #15
I'm all for getting this well specced. 😄 |
Yeah, we kinda ran up against a wall of what we'd really be able to do with this (sorry, @parente for not getting back to you more). We came up with these issues that make it hard to get to one data standard:
Of those, the cell identity one is really the most fundamental to this kind of work: having the layouts spread all over the place in different cells will inevitably make layouts more brittle. But having a "blessed" field (e.g. As it is, we'd still be left with, say, It could be that there exists a suitable structured data vocabulary that we could bring in more-or-less whole cloth (via JSON-LD), but for the time being (and until notebook 5.0 hits), I don't see how we'd end up with anything other than work to show for it... |
Define "this". :) The metadata and rendering spec for the dashboards extension is here for the time being: https://github.com/jupyter-incubator/dashboards/wiki/Dashboard-Metadata-and-Rendering. We'll happily take issues about it or direct edits. There's probably an equivalent page for nbpresent (or certainly could be). And for RISE. And for the slideshow toolbar in Notebook. I don't think there's one Jupyter spec that can rule them all. Or, at least, I personally have no clue how to write it at the moment. |
Friday thought ... Should this turn into a simple spec about where extensions should put their metadata in the notebook document? @jhpedemonte, @dalogsdon, @nitind and I have been thinking about the metadata we use in the dashboards projects and how to go from what we evolved as we went open source and added features, to something cleaner and more easily extended in the future. (v1 draft over here: https://github.com/jupyter-incubator/dashboards/wiki/Dashboard-Metadata-and-Rendering) We still don't have an idea of how to write one spec for all tools to follow, but we did all agree that it's kind of silly to have a top level "layouts" key that the PR proposed. Really, all extensions that want to shove data into an ipynb are going to want some guidance on how to avoid stepping on each other, especially if creating extensions keeps getting easier (4.2, 5.0, ...) The PR could simply become:
|
Big 👍 from me. |
A big win would be to adopt a self-describing, application-layer meaning on top of the cell and notebook metadata, if not the whole document. JSON Schema provides some of this, but as it isn't a managed standard, you kinda get what you get with your implementation. Also, the implementation of The best standard out there I have found to this end is JSON-LD. The big three of JSON-LD are:
All of these take advantage of XML-style nestable namespace decimation, i.e. The out-of-band part means that in many cases, existing JSON doesn't need to change.
A document described in this way would then be able to have one of several things done to it:
This could circle back to the layout interop thing, as one could see, for example, an Part of this whole robust-ifying would be to create some types that would actually have impact elsewhere: for example, were we to extend the schema.org CreativeWork, with, say, |
@bollwyvl I gave the notebook document you linked a read. I'd never seen it before. I'll admit I don't completely grok the use cases it addresses. Can you give an example of what impact adding context, id, and type would have for layouts, or, say, nbpresent specifically? |
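For readers following along, here is a rough sketch of what adding a context, an id, and a type to a piece of notebook metadata could look like. `@context`, `@id`, and `@type` are JSON-LD keywords and `CreativeWork` is the schema.org type mentioned above; everything else (the `annotate` helper, the `urn:example:cell-1` identifier, and the choice to put the keys directly in cell metadata) is an assumption made for illustration, not something the notebook format or this PR defines.

```python
import json

# Hypothetical sketch: these keys are NOT part of the notebook format.
SCHEMA_ORG = "https://schema.org/"  # vocabulary used as the JSON-LD context


def annotate(metadata, node_id, node_type, context=SCHEMA_ORG):
    """Return a copy of a metadata dict with JSON-LD keywords added."""
    annotated = dict(metadata)       # leave the caller's dict untouched
    annotated["@context"] = context  # where the term definitions live
    annotated["@id"] = node_id       # a stable identifier for this node
    annotated["@type"] = node_type   # what kind of thing this node is
    return annotated


# A minimal cell, shaped like an nbformat 4 code cell.
cell = {"cell_type": "code", "metadata": {"collapsed": False}, "source": "x = 1"}
cell["metadata"] = annotate(cell["metadata"], "urn:example:cell-1", "CreativeWork")

print(json.dumps(cell["metadata"], indent=2, sort_keys=True))
```

Existing keys such as `collapsed` are left alone, so a tool that ignores the JSON-LD keywords would keep working with the same metadata.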
@parente Cool, thanks! The long con on this is being able to leverage the world + dog's notebooks as something at the scale of the Wolfram Language. Consider if
With a second layer of information beyond
Inside nbpresent notebook metadata, a number of things eventually don't fit cleanly inside a hierarchy:
If, instead of doing ugly, MongoDB-style dereferencing with n+1 queries, I could treat the underlying document as a canonical location for writing, but be able to read/subscribe to a flattened form of the graph. If there was a Here's what some of that data might look like: http://tinyurl.com/jzed66n |
OK. To echo back my understanding, it's about making notebook content more easily discoverable and reusable. The application of the approach to layout is just one example, and the fundamental idea would cover all notebook content.
I think this comes back to doing:
"Where should extensions (layout or otherwise) that want to put data in the notebook document write their data so that it avoids conflict with other extensions AND with future notebook format changes?"
"How should tools that write to notebook documents store content and metadata so that it can be identified and reused by other tools?" This was the division of right now vs over time I was hinting at (poorly) in the roadmap. Shared concepts like I suggest turning this PR into guidance on the "where" problem (#15 (comment)) and starting a new PR about how LD-JSON could apply. I don't think getting this simple proposal discussed and accepted would hinder anything later: the ID, types, etc. will apply wherever extension or non-extension metadata is stored. By the same token, the simple recommendation of having extensions / plugins store metadata in What do folks think, @bollwyvl in particular since he's the most likely candidate for seeding that new PR with his LD-JSON expertise? |
I think identifying a recommendation for where extension metadata should go is a good idea. Currently, the official recommendation is an extension-specific location, but that is not specified in detail. A recommendation is to use |
I think recommending |
I'll take the opposite position: Additionally, metadata is already a namespace for additional information - the official notebook schema can add keys outside of metadata. So having an 'extensions' namespace inside that feels a bit redundant. |
@takluyver said:
If consensus forms around doing everything in @parente said:
Anyone else have an opinion on whether this simple proposal has any value (i.e., where extension metadata should go), whether an LD-JSON proposal should be separate, or otherwise? |
I continue to think that LD-JSON, and semantic web technology in general, is an overcomplicated solution in search of a problem (this is a debate we've had before). So I'm quite happy for extension metadata to stay in a simple format. |
Indeed, it is out of respect for this perspective that I haven't pushed more on this issue since raising it some years ago. Just to summarize, here are some of the problems I think adopting strongly-typed, URI-based metadata solves:
Discoverability
Search engines, journals, content republishers, etc. make heavy use of strongly-typed data to provide better results to users about traditional metadata: who said it, what it is about, when it was said, what you can do with it, who referenced it. The only debate, really, is which standard to use, not whether this is a good idea: some folks don't like Dublin Core, for example. If notebooks do not play in this space, they will always be treated more like a figure than the artifact of record, and generating content for these outlets will always require the kinds of manual steps that limit the speed and impact of publishing.
Documentation
The Matlab and Mathematica documentation user experiences are objectively superior to the documentation our users experience because of their language homogeneity and the integration of the authoring runtimes. Of course, we could do that, too, post-hoc: tools exist for statically reverse-engineering the symbolic content of code in most of our kernels. However, by doing the kind of extensions needed for better completion, mentioned elsewhere, we could make this part of the kernel implementation itself. If we raise the bar for the naming of atomic units of code, every notebook cell (in context) would become a potential source of documentation, ideally annotating the types at cell execution time.
Data
While there are different kernels, libraries, etc., the schemas of the data of interest are often shared: tables, trees, graphs, singletons.
As mentioned earlier, if a
Interoperability
If extension developers had the choice of building their own data format or inheriting from an existing one, preferably managed outside of a specific implementation, we'd be able to start moving the ecosystem forward and building new and cool things. Describing "stuff in a viewport how the user wants it" is really quite a useful thing to achieve some consensus on, which is why I even brought this stuff up again! Widgets, I think, would benefit tremendously if they could carry units, etc. Anyhow, just throwing this stuff out there! |
I'm going to close out this proposal as it's lingered here for some time and failed to garner significant support. The dashboards extension documents where it puts its metadata and that's just fine. |
Thanks @parente |
This PR proposes a very lightweight spec for notebook layout metadata, addressing namespaces and discovery. It does not attempt to define a common schema for layout metadata intended to cover all tools and use cases, for reasons given in the PR (see Intentionally Limited Scope) and based on the discussion in the roadmap PR about dashboards.
After dwelling on this for a while and writing it up, I don't think there's much value here versus simply asking layout tools to document their metadata format and rendering procedure, like we've now done for jupyter-incubator/dashboards. If there's anything of value here, it's in the basic guidelines given for picking a unique namespace under metadata to avoid conflict. (But such guidelines are generally applicable across all tools that wish to write to metadata, not just layout tools.)
"IMHO" aside, we agreed at the dev meeting that an enhancement proposal on this topic was on the path of advancing dashboard support in Jupyter, so here it is for discussion.
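To make the namespace guideline concrete, here is a minimal sketch of two tools writing to notebook-level metadata under their own keys. The helper `write_extension_metadata` and the namespace names `example_dashboard_tool` and `example_slides_tool` are hypothetical and chosen only for illustration; the PR does not prescribe these names or this function.

```python
import json


def write_extension_metadata(notebook, namespace, data):
    """Merge `data` into notebook['metadata'][namespace].

    Hypothetical helper: each tool writes only under its own namespace key,
    so it cannot overwrite another tool's settings.
    """
    metadata = notebook.setdefault("metadata", {})
    existing = metadata.setdefault(namespace, {})
    if not isinstance(existing, dict):
        # Something else already uses this key in an incompatible way.
        raise ValueError("metadata key %r is not a mapping" % namespace)
    existing.update(data)
    return notebook


# A minimal notebook document, shaped like nbformat 4.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 0,
    "metadata": {"kernelspec": {"name": "python3"}},
    "cells": [],
}

# Two made-up tools, each confined to its own namespace.
write_extension_metadata(notebook, "example_dashboard_tool", {"layout": "grid"})
write_extension_metadata(notebook, "example_slides_tool", {"theme": "dark"})

print(json.dumps(notebook["metadata"], indent=2, sort_keys=True))
```

Because each tool touches only its own key, the existing `kernelspec` entry and the other tool's settings are left as they were.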
cc'ing folks who were in the room when we discussed dashboards during the March dev meeting, and others who expressed an interest: @bollwyvl @fperez @ellisonbg @minrk @sccolbert @blink1073 @jasongrout @rgbkrk @lbustelo @jhpedemonte @dalogsdon