Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kernelspec JSON schema #105

Merged
merged 8 commits into from
Jun 10, 2024
Merged

Conversation

JohanMabille
Copy link
Member

@JohanMabille JohanMabille commented Apr 19, 2023

Kernelspec JSON schema

This proposes to specify the kernelspec through a JSON schema, and have the specification and documentation live in a dedicated repo.

The JSON schema is joined as a standalone file.

Voting from @jupyter/software-steering-council

Like #106 , let's move to the voting phase. As said in this comment, future schema should not be JEPs, but PR open to the Jupyter Schema repo

to [the current description of the kernelspec](https://jupyter-client.readthedocs.io/en/stable/kernels.html#kernel-specs),
and adds an optional `protocol_version` field.

[A dedicated repo](https://github.com/jupyter-standards/kernelspec) for the specification and the documentation of

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sold yet on repo-per-spec, and think the "big tent" repo, with common tooling and interlinked documentation will be more sustainable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that a repo-per-spec might be overkill. However, I would like to have some separation of concerns. The repo for the kernel protocol will also hold the roadmap and additional documentation about the "release" process of the protocol. Also the kernel protocol and the widget protocol are somehow orthogonal, I would find it confusing to have them in the same repo.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sold yet on repo-per-spec, and think the "big tent" repo, with common tooling and interlinked documentation will be more sustainable.

I would agree with this. I think it would be more sustainable and easier for folks to follow if we had a single "Jupyter specs" repo.

Copy link
Member Author

@JohanMabille JohanMabille Apr 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we gather specs by topics or "thematics"? Specifications for the kernel protocol, the kernelspec and the connection file could live together in the kernel_protocol repo. Everything related to the widgets specification (for instance) could live in another one. This way we avoid the multiplication of specs repos, while keeping some separation of concerns. Also this would be consistent with the current situation: a kernel has to implement the kernel protocol, provide a kernel spec and handle a conection file, while supporting the widget protocol is optional.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, widgets are maybe an exception... but the knowledge of how comm works is not. At the end of the chain, something like the Jupyter Server API references almost everything each other... kernelspec just happens to be very low in the order, with only perhaps a known-mimetypes.json below it, as that is not a schema-level construct like uri.

When changes occur, often these things will reference each other, and documentation and generated type packages should reflect the compatibility of them.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How comm works is part of the Kernel protocol, thus I agree the Widgets protocol references the Kernel protocol, but not vice versa. And I took the Widget protocol as an example, but we can consider other things, like the LSP, which is totally unrelated. I see this as a software distribution, we have low layers upon which are built upper layers, and there should not be circular dependencies. The Jupyter Server API you mentioned is a super high layer that references everything, however lower layers should not reference it.

"title": "Kernelspec",
"description": "Specification of the kernelspec file",
"type": "object",
"properties": {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

recommend some top-level definitions to make individual pieces easier to reference in other schema.

Copy link
Member Author

@JohanMabille JohanMabille Apr 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't find a way of gathering pieces, they all appear independent to me. Having a definition per field would just add complexity and not ease their reference in other schemaa, right? Sorry for the naive question, but I'm not used to JSON schemas.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The kernelspec, and specific fields within it, need to be referenced by other schema.

Have a look at, e.g. nbformat. If the kernelspec is formalized, then conventionally specs would like to point to specific meanings without pointing at a concrete locations:

$ref = "https://schema.jupyter.org/kernelspec/v1/schema.json#/definitions/display-name"

vs

$ref = "https://schema.jupyter.org/kernelspec/v1/schema.json#/properties/display_name"

This becomes more relevant, once more complex patterns are used such as additionalProperties or patternProperties (useful for e.g. mimetypes).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the detailed explanation, I will try to make the appropriate changes.

},
"metadata": {
"description": "A dictionary of additional attributes about this kernel; used by clients to aid in kernel selection. Metadata added here should be namespaced for the tool reading and writing that metadata.",
"type": "object"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As this is not widely-used yet, could this already be further constrained to enforce namespacing:

[definitions.metadata]
type = "object"

[definitions.metadata.additionalProperties]
type = "object"

and adds an optional `protocol_version` field.

[A dedicated repo](https://github.com/jupyter-standards/kernelspec) for the specification and the documentation of
the kernelspec has been created.
Copy link
Member

@krassowski krassowski Apr 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the Notebook format, precisely nbformat v4, in metadata there is a kernelspec property, with schema defined here. Previously (in nbformat v3) this was named as kernel_info (side note: format description is outdated as it still mentions kernel_info).

Should the JEP also give mandate to update nbformat schema so that it refer to this new kernelspec a source of truth (using definition link)? In nbformat v4 language is not required.

Can we describe advantages of moving kernelspec schema out of nbformat in the JEP to explain the reasoning?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the JEP also give mandate to update nbformat schema so that it refer to this new kernelspec a source of truth (using definition link)?

I don't know actually, would it make sense to store the arguments used to launch the kernel in the nbformat for instance? If all the mandatory fields of the kernel spec make sense for the notebook format, then we should probably update it so that it refers this new kernelspec. Otherwise, maybe the name could be reverted to kernel_info or something else to avoid confusion? But that would be out of the scope of this JEP.

Can we describe advantages of moving kernelspec schema out of nbformat in the JEP to explain the reasoning?

The kernelspec is really about describing a kernel, and it is totally orthogonal to the notebook format. Kernel authors should not have to refer to the ntebook format spec to know how to implement a kernelspec.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the "kernelspec" entry in nbformat.v4 is a misnomer. It is not a kernelspec—sure, it shares some of the same fields, but there is a subtle difference here.

The kernelspec file does not have a "name" field. The (locally) unique name for a kernelspec comes from the directory on disk where the kernel.json file is located—it is not stored in the kernelspec file.

In the nbformat, this "name" field is necessary to locate the kernelspec on disk (this has always been brittle, because it means this information becomes useless the moment the notebook leaves the current context).

It really should be called something like "kernelspec_info".

Should the JEP also give mandate to update nbformat schema so that it refer to this new kernelspec a source of truth (using definition link)? In nbformat v4 language is not required.

This would be great for notebooks that always run in the same Python/conda/virtual environment (this field isn't helpful for notebooks that move around, but that's a separate issue). I would be in favor of changing this field, but I think this is out-of-scope for this JEP. We should make a follow-up JEP to update nbformat if we'd like to pursue that.

Can we describe advantages of moving kernelspec schema out of nbformat in the JEP to explain the reasoning?

The kernelspec is really about describing a kernel, and it is totally orthogonal to the notebook format. Kernel authors should not have to refer to the ntebook format spec to know how to implement a kernelspec.

I believe this is out-of-scope for the current JEP.

That said, they aren't entirely orthogonal. Yes, the kernelspec shouldn't depend on the nbformat, but the nbformat loosely depends on the kernelspec today. I think that's what @krassowski is alluding to... if we can drop nbformat's current conflicting definition of the kernelspec and point at this definition, we gain some standardization.

},
"kernel_protocol_version": {
"description": "The version of protocol this kernel implements. If not specified, the client will assume the version is <5.5 until it can get it via the kernel_info request.",
"type": "string"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this stays, should have some pattern. but as mentioned, hoisting this to and $id might be more robust over time.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this should be included in the first version of the kernelspec.

I believe the first published version should only include the spec currently documented in the protocol documentation.

Then, we can quickly publish a second version with kernel_protocol_version added as a key. That way, we don't immediately invalidate everyone kernelspecs.

Also, creates a circular dependency on JEP #66.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That way, we don't immediately invalidate everyone kernelspecs.

I realize this actually isn't true, since "kernel_protocol_version" wouldn't be a required field. That said, we still would need to settle the naming in #66 to merge this. I think "kernel_protocol_version" is a fine name for the field, so maybe this isn't any issue.

Copy link
Member Author

@JohanMabille JohanMabille Apr 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually let's say that #66 depends on this one and will be updated accordingly (if needed) when this one is accepted.

@Zsailer
Copy link
Member

Zsailer commented Apr 19, 2023

Thank you for doing this @JohanMabille! You beat me to it. I had spent some time yesterday writing the exact same JEP as an action item from the SSC meeting. 😅

@JohanMabille
Copy link
Member Author

@Zsailer arf sorry for that! This came out when discussing the Handshake pattern, we need an additional field in the kernelspec and I thought it could be the opportunity to add a schema for it. I'll be more careful next time to avoid doing the work twice!

@Zsailer
Copy link
Member

Zsailer commented Apr 19, 2023

@Zsailer arf sorry for that! This came out when discussing the Handshake pattern, we need an additional field in the kernelspec and I thought it could be the opportunity to add a schema for it. I'll be more careful next time to avoid doing the work twice!

No no no—this is fantastic! I really appreciate you taking this over! No need to apologize 😃

@Zsailer
Copy link
Member

Zsailer commented Apr 19, 2023

I think it's worth saying for voting purposes—I'm 👍 for this JEP getting accepted.

We can continue to copy-edit, but wanted to explicitly submit my support. 😃

@JohanMabille
Copy link
Member Author

More changes to come, but I need to educate myself more to JSON schemas 😅

@westurner
Copy link

westurner commented Dec 5, 2023 via email

JohanMabille and others added 3 commits February 26, 2024 21:27
Co-authored-by: Jason Grout <jasongrout@users.noreply.github.com>
Co-authored-by: Frédéric Collonval <fcollonval@gmail.com>
@JohanMabille
Copy link
Member Author

Kind ping on this, I think this is ready for a final review:

  • The Jupyter Schema repo has been created, it will hold all the schemas for Project Jupyter and the JEP has been updated accodingly (not more specific repo per spec)
  • While the first schema specifications have been proposed through JEPs, the recent works and meetings of the Jupyter Contributors involved in the schema specifications led to the conclusion that we could directly work on the Jupyter Schema repo: implement first versions of different schema and then have fast iterations to make them more robust and extract common parts into dedicated schema. This would streamline the process.

Copy link
Member

@minrk minrk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made one small suggestion on minItems, but 👍

kernelspec-spec/kernelspec.schema.json Outdated Show resolved Hide resolved
@JohanMabille
Copy link
Member Author

The JEP has been approved.

Yes: 9
No: 0
Abstention: 0
No vote: 1

Merging

@JohanMabille JohanMabille merged commit 3c52177 into jupyter:master Jun 10, 2024
1 check passed
@JohanMabille JohanMabille deleted the kernelspec-spec branch June 10, 2024 15:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants