Self-hosting gen of the schema-schema. #62

Merged
merged 1 commit into from
Jul 30, 2020

Conversation

warpfork
Collaborator

Self-hosting gen of the schema-schema!

... Mostly. I've taken the opportunity to make a few tweaks to naming as I went, and there are a few other minor divergences:

  • A few cases use keyed unions when they should be kinded; this is a significant todo.
  • A few cases use keyed unions where the schema-schema declaration says they should use inline representations!
    • In these, I've come to believe the schema-schema made a mistake; we probably will update it.
  • We're still making little hacks to dodge the current placeholder typeinfo's lack of support for inline defns.
    • ...but this is purely a problem of the placeholder typeinfo structures, and can disappear the instant we replace them.
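
To make the keyed-versus-kinded distinction above concrete, here is a minimal sketch in plain Go using only `encoding/json`. It is not code from this PR or from the generated output; the helper names `decodeKeyed` and `kindOf` are hypothetical. A keyed union is serialized as a single-entry map whose key names the member, while a kinded union has no wrapper and picks its member purely from the data kind of the value.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeKeyed (hypothetical helper) treats raw as a keyed union:
// a map with exactly one entry, where the key names the union member
// and the value is that member's content.
func decodeKeyed(raw []byte) (member string, content json.RawMessage, err error) {
	var m map[string]json.RawMessage
	if err := json.Unmarshal(raw, &m); err != nil {
		return "", nil, err
	}
	if len(m) != 1 {
		return "", nil, fmt.Errorf("keyed union must have exactly one entry, got %d", len(m))
	}
	for k, v := range m {
		member, content = k, v
	}
	return member, content, nil
}

// kindOf (hypothetical helper) reports the data kind of raw.
// A kinded union selects its member from this information alone.
func kindOf(raw []byte) (string, error) {
	var v interface{}
	if err := json.Unmarshal(raw, &v); err != nil {
		return "", err
	}
	switch v.(type) {
	case nil:
		return "null", nil
	case bool:
		return "bool", nil
	case float64:
		return "number", nil
	case string:
		return "string", nil
	case []interface{}:
		return "list", nil
	case map[string]interface{}:
		return "map", nil
	}
	return "", fmt.Errorf("unrecognized JSON value")
}

func main() {
	// Keyed: the discriminant is spelled out as the map key.
	member, content, err := decodeKeyed([]byte(`{"named":"Foo"}`))
	fmt.Println(member, string(content), err)

	// Kinded: the same information with no wrapper; a string and a map
	// would be routed to different members based on kind.
	kind, err := kindOf([]byte(`"Foo"`))
	fmt.Println(kind, err)
}
```

The practical consequence is that data written for a kinded union will not parse through a keyed-union decoder (and vice versa), which is presumably why the keyed stand-ins are flagged as a significant todo.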

And the whole specification is still written in code: the 'SpawnFoo' placeholder methods are in heavy use. (The beginning of the end for them might be nigh, though!)

The overall structure, however, is not significantly diverged. This is a description of the schema-schema, and we're generating code for it.

...!

Which means... we can soon turn around and start using this to build up tooling which actually uses schema JSON as a config mechanism. Which will then bring us quite a bit closer to being able to make free-standing usable CLI tools for working with further codegen.

There's a few other bits to go. For starters, right now, this is just generating output into a demo output dir. I've made no attempt in this commit to rig it up as a proper snake-eating-its-tail by replacing the 'SpawnFoo' methods and placeholder type info; that'll come in due time. (And I think we may still have fun choices coming up with that, incidentally; the distinction between string type names and reified pointers is still looming, and we need to figure out what the story is for gen outputs containing their own type descriptions (which may touch on the same interface design choices); etc.) I'll probably move somewhat cautiously with this, and only cut over after polishing the gen outputs some more... but it's now near in reach.

The size of the generated output is also very likely to need work. We're looking at something on the order of 1.6MB of generated output. (It's highly redundant: if you gzip it, it's 95kb.) Mind: I've made no effort whatsoever to bring this down. So, it's probably safe to assume we'll find some low-hanging fruit when we actually look into it. (I'm not yet sure what the bar will be for satisfaction with this: I regard the current number as vaguely "seems rather high", but it's also for a fairly sizable schema and for a lot of features provided, so maybe some size trades are just what we're going to face in golang.)
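
For a sense of what those numbers imply: 1.6MB shrinking to 95kb is roughly a 17:1 ratio, which is typical of text with a lot of repeated boilerplate. A minimal sketch of how one might measure this for any blob of generated source, using only the Go standard library, follows. The helper name `gzipSize` and the sample input are made up for illustration and are not taken from the actual gen output.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"strings"
)

// gzipSize (hypothetical helper) returns the number of bytes data
// occupies after gzip compression at the default level.
func gzipSize(data []byte) (int, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(data); err != nil {
		return 0, err
	}
	// Close flushes remaining compressed data and writes the gzip footer.
	if err := zw.Close(); err != nil {
		return 0, err
	}
	return buf.Len(), nil
}

func main() {
	// Stand-in for generated code: one made-up method repeated many times.
	src := []byte(strings.Repeat("func (n *Example) Kind() string { return \"map\" }\n", 1000))

	n, err := gzipSize(src)
	if err != nil {
		panic(err)
	}
	fmt.Printf("raw=%d gzip=%d ratio=%.1fx\n", len(src), n, float64(len(src))/float64(n))
}
```

A high ratio is only a hint that the source is repetitive; it does not by itself say how much the uncompressed output (or the compiled binary) can be reduced.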

That's it. There will probably be some PRs to the schema-schema documents in the specs repo shortly. Other than those, this should also be about ready to line up with and parse JSON output created by the other IPLD Schema DSL->JSON parsers we already have, which could start unlocking some really neat stuff. 🎉

... With a few minor alterations.

I've taken the opportunity to make a few tweaks to naming as I went.

A few cases use keyed unions when they should be kinded; this is a
significant todo.

A few cases use keyed unions where the schema-schema declaration says
they should use inline representations!  In these, I've come to believe
the schema-schema made a mistake; we probably will update it.

The overall structure, however, is not significantly diverged.

The whole specification is still written in code: the 'SpawnFoo'
placeholder methods are in heavy use.  (This might herald the beginning
of the end, for them, though!)  (We're also still making little
hacks to dodge the current placeholder typeinfo's lack of support
for inline defns; but this is purely a problem of the placeholder
typeinfo structures, and can disappear the instant we replace them.)

If you run this generation, the emitted code is (aside from those
caveats listed above) suitable for parsing schema declarations.

...!

Which means... we can soon turn around and start using this to build
up tooling which actually uses schema JSON as a config mechanism.
Which will then bring us quite a bit closer to being able to make
free-standing usable CLI tools for working with further codegen.

There's a few other bits to go.  For starters, right now, this is
just generating output into a demo output dir.  I've made no attempt
in this commit to rig it up as a proper snake-eating-its-tail by
replacing the 'SpawnFoo' methods and placeholder type info; that'll
come in due time.  (And I think we may still have fun choices coming
up with that, incidentally; the distinction between string type names
and reified pointers is still looming, and we need to figure out what
the story is for gen outputs containing their own type descriptions
(which may touch on the same interface design choices); etc.)  I'll
probably move somewhat cautiously with this, and only cut over after
polishing the gen outputs some more... but it's now near in reach.

The size of the generated output is also very likely to need work.
We're looking at something on the order of 1.6MB of generated output.
(It's *highly* redundant: if you gzip it, it's 95kb.)  Mind: I've made
*no* effort whatsoever to bring this down.  So, it's probably safe to
assume we'll find some low-hanging fruit when we actually look into it.
(I'm not yet sure what the bar will be for satisfaction with this:
I regard the current number as vaguely "seems rather high", but it's
also for a fairly sizable schema and for a lot of features provided,
so maybe some size trades are just what we're going to face in golang.)

That's it.  There will probably be some PRs to the schema-schema
documents in the specs repo shortly.  Other than those, this should
also be about ready to line up with and parse JSON output created
by the other IPLD Schema DSL->JSON parsers we already have, which
could start unlocking some really neat stuff.  🎉
@warpfork
Collaborator Author

warpfork commented Jul 27, 2020

If you want to peek at the actual outputs, but can't be bothered to check out the repo and run it yourself: there's a commit here that has the full output checked in: 1f5a5cb

(It's a temporary commit, though, on a branch not destined to merge -- it will probably disappear at some point.)

@warpfork
Collaborator Author

@rvagg , this might be interesting to you. Or might not -- you can wait until I make schema-schema PRs in the specs repo to talk about this stuff, rather than sift through the gory "Spawn" method slew, if you like.

@rvagg
Member

rvagg commented Jul 29, 2020

Ok, I think I see what you might be getting at with the inline representations and inline types; they appear to be coupled problems.

And you should really get "unit" into the schema-schema and docs, it's a very particular concept we need to socialise.

Also yay!

I started down the path of writing a schema validator for JS, but while doing it I felt like I was manually writing a schema-schema representation in code, and it was just wrong. I couldn't expend the brainpower at the time to figure out how to make it right, or whether I was chasing a rabbit down a hole that I'd come to regret once I got too far, so I stopped! I need to revisit some of that work at some point and figure out the relationship of schema validation to the schema-schema, and how to reuse as much of the schema-schema as possible for the task of validating that a schema takes a proper structure.

Re next steps, I assume at some point soon you might check in a copy of the codegen'd schema-schema so it can be linked against by other tools (like the schema parser), but right now it's just too darn big?

@warpfork
Collaborator Author

Yep yep and yep.

I think it might still remain entirely possible to check in the generated code, even at this size, but... it would be nice if the size goes down, yeah. And either way I think I'd like to knock around with it a bit first and try to get a feel for if the ergonomics are at all right, what other methods might be needed, etc. (Which might involve a few other demos, too.)

@warpfork
Collaborator Author

Merginate'n, todos and all. I'll be working on relevant missing features (e.g. kinded union gen) on master and continuing to update this going forward as those pieces become available.

@warpfork warpfork merged commit ac99bd4 into master Jul 30, 2020
@warpfork warpfork deleted the self-hosting-gen-of-schema branch July 30, 2020 11:22
@aschmahmann aschmahmann mentioned this pull request Feb 18, 2021
73 tasks