
Bespoke reader/parser for speed and extensibility #2101

Closed
wants to merge 7 commits into from


scauligi
Member

Very much still a WIP: f-strings (among several other things) are not implemented yet.
Also needs refactoring, proper error handling, better naming, better organization, documentation, etc.
I'm opening a PR here just so that it's visible (@allison-casey #722 (comment)), but it will be a while before it's actually ready to be reviewed/merged.

But hey, user-defined reader macros!

; run this through `hy2py -s -np`

(eval-and-read
  (defn -#UPPER- [reader -]
    (.slurp_space reader)
    (assert (= "\"" (.getc reader)))
    (setv s (reader.read-rest-of-quote))
    (-> (s.upper) hy.models.String))
  (if (in (hy.mangle "&reader") (locals))
    (assoc &reader.reader-table "#UPPER" -#UPPER-)))

(defmacro "#upper" [s]
  (.upper s))

#upper "this string is a lowercase literal that gets compiled to uppercase later"
#UPPER "this string is an all-caps literal *at read time*!"
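To illustrate what a reader macro does differently from the `#upper` macro, here is a toy Python sketch (not the PR's reader). The reader keeps a dispatch table from tags to functions, and a registered function takes over the raw character stream, so its result exists before any macro expansion happens. `MiniReader`, `read_upper`, and the method names are hypothetical; they only loosely echo `.slurp_space`, `.getc`, and `read-rest-of-quote` from the example above. String escapes are not handled.

```python
class MiniReader:
    """Toy reader: strings, bare symbols, and '#TAG' dispatch (hypothetical)."""

    def __init__(self, text):
        self.text = text
        self.pos = 0
        # Maps a tag such as "#UPPER" to a function(reader, tag) -> form.
        self.reader_table = {}

    def peekc(self):
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def getc(self):
        c = self.peekc()
        self.pos += len(c)
        return c

    def slurp_space(self):
        while self.peekc().isspace():
            self.pos += 1

    def read_symbol(self):
        start = self.pos
        while self.peekc() and not self.peekc().isspace():
            self.pos += 1
        return self.text[start:self.pos]

    def read_rest_of_quote(self):
        # Assumes the opening '"' was already consumed; no escape handling.
        end = self.text.index('"', self.pos)
        s = self.text[self.pos:end]
        self.pos = end + 1
        return s

    def read_form(self):
        self.slurp_space()
        if self.peekc() == '"':
            self.getc()
            return self.read_rest_of_quote()
        sym = self.read_symbol()
        handler = self.reader_table.get(sym)
        if handler is not None:
            # A reader macro takes over the raw character stream.
            return handler(self, sym)
        return sym


def read_upper(reader, _tag):
    # Runs at *read* time: the string is already uppercased
    # before the compiler or any ordinary macro sees it.
    reader.slurp_space()
    assert reader.getc() == '"'
    return reader.read_rest_of_quote().upper()


r = MiniReader('#UPPER "shout at read time"')
r.reader_table["#UPPER"] = read_upper
print(r.read_form())  # SHOUT AT READ TIME
```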

@scauligi
Member Author

Hmm, after implementing the rest of the parser, this may have been somewhat ill-advised.
Either that or I refactored something very poorly.

custom         0.4069 s
hy (rply)      0.3009 s

@scauligi
Member Author

scauligi commented Jun 18, 2021

Spoke too soon; the slowdown was due to my adding a (recursive) call to HyObject.replace on every generated node.
Removing that (and just making sure that all generated nodes have source info) brings us back down to:

custom         0.2623 s
hy (rply)      0.3009 s

We're still in business, friendos!

@allison-casey
Contributor

king 🙌 👑

scauligi force-pushed the custom-parser branch 4 times, most recently from 58222bb to be57ea1 on July 4, 2021 08:31
@Kodiologist
Member

You're still working on this, right?

@scauligi
Member Author

Yup, it's still in the works!
I should probably push and write up what I'm doing here so that folks can comment on design choices as I go along.

@scauligi
Member Author

I may be creeping on scope with this, but...

I have the reader yielding top-level forms as a generator now instead of tokenizing the entire file before it hits the compiler.
What does this allow you to do? Well, if you modify the reader (at compile-time), it takes effect immediately!

;; test_slicing.hy
(require slicing)

#: a:(+ 3 4):"slice slice"  ; treated as a generic tag-style macro,
                            ; limited by normal macro parsing semantics

(slicing.install-#:)        ; from here, #: is now special

#: a:(+ 3 4):"slice slice"  ; works!

After running through hy2py (spacing added for clarity):

import hy
hy.macros.require('slicing', None, assignments='ALL', prefix='slicing')

hyx_Xnumber_signXXcolonX(hyx_aXcolonX)
3 + 4
hy.models.Keyword('')
'slice slice'

slice(a, 3 + 4, 'slice slice')

The first instance gets parsed as (#: a:) (+ 3 4) : "slice slice"
since it follows the usual tag macro semantics, but the second
form gets handled by the #: reader!
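The reason the second `#:` behaves differently is that forms are read lazily: the reader table is only consulted when each top-level form is pulled from the generator. A minimal Python sketch of that idea, assuming whitespace-separated tokens only. `StreamingReader`, `read_pair`, and the `install-#pair` marker are hypothetical stand-ins for the real reader, a reader macro, and an `eval-when-compile` block that edits the table.

```python
from typing import Callable, Dict, Iterator, List


class StreamingReader:
    """Toy reader over whitespace-separated tokens (hypothetical)."""

    def __init__(self, text: str) -> None:
        self.tokens = text.split()
        self.pos = 0
        self.reader_table: Dict[str, Callable[["StreamingReader"], object]] = {}

    def next_token(self) -> str:
        tok = self.tokens[self.pos]
        self.pos += 1
        return tok

    def forms(self) -> Iterator[object]:
        # Yield one top-level form at a time; the table is consulted
        # only when each form is read, so later edits take effect.
        while self.pos < len(self.tokens):
            tok = self.next_token()
            handler = self.reader_table.get(tok)
            yield handler(self) if handler else tok


def read_pair(reader: StreamingReader) -> object:
    # Hypothetical reader macro: `#pair a b` reads as ("pair", "a", "b").
    return ("pair", reader.next_token(), reader.next_token())


def compile_all(text: str) -> List[object]:
    reader = StreamingReader(text)
    out: List[object] = []
    for form in reader.forms():
        if form == "install-#pair":
            # Stand-in for an eval-when-compile block editing the table.
            reader.reader_table["#pair"] = read_pair
        else:
            out.append(form)
    return out


print(compile_all("#pair a b install-#pair #pair a b"))
# ['#pair', 'a', 'b', ('pair', 'a', 'b')]
```

If `compile_all` first collected `list(reader.forms())`, both `#pair` occurrences would come back as plain symbols, which is what reading the whole file before compiling amounts to.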

And of course the secret sauce:

;; slicing.hy
(defmacro install-#: []
  '(eval-when-compile
     (defn parse-node [&reader]
       (setv node
         (if (= ":" (.peekc &reader))
           None
           (.parse-one-node &reader)))
       (if (= node '...)
         'Ellipsis
         node))

     (defn read-slice [&reader key]
       (setv old-ends-ident (.copy &reader.ends-ident))
       (.add &reader.ends-ident ":")

       (setv nodes [])

       (.slurp-space &reader)
       (.append nodes (parse-node &reader))
       (while (.peek-and-getc &reader ":")
         (.append nodes (parse-node &reader)))

       (setv &reader.ends-ident old-ends-ident)
       `(slice ~@nodes))

     (assoc hy.&reader.reader-table "#:" read-slice)))
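For readers less familiar with Hy, the same `read-slice` logic can be sketched in Python. This is a toy: `SliceReader` and its methods are hypothetical counterparts of `.peekc`, `.peek-and-getc`, `.slurp-space`, `.parse-one-node`, and `ends-ident`, and it returns nested tuples and raw source strings instead of Hy models. The trick is the same as above: ':' is temporarily added to the identifier terminators so that `a:` stops at `a`, and the old set is restored afterwards (here in a `finally`). Like the Hy version, it does not check that there are at most three slots.

```python
class SliceReader:
    """Toy reader over a string (hypothetical; not the PR's reader class)."""

    def __init__(self, text):
        self.text = text
        self.pos = 0
        # Characters that end a bare identifier such as `a` or `...`.
        self.ends_ident = set(' \t\n()"')

    def peekc(self):
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def getc(self):
        c = self.peekc()
        self.pos += len(c)
        return c

    def peek_and_getc(self, c):
        # Consume `c` only if it is the next character.
        if self.peekc() == c:
            self.pos += 1
            return True
        return False

    def slurp_space(self):
        while self.peekc().isspace():
            self.pos += 1

    def parse_one_node(self):
        self.slurp_space()
        c = self.getc()
        if not c:
            raise EOFError("unexpected end of input")
        if c == "(":
            items = []
            self.slurp_space()
            while not self.peek_and_getc(")"):
                items.append(self.parse_one_node())
                self.slurp_space()
            return tuple(items)
        if c == '"':
            # Keep the string literal as raw source text, quotes included.
            end = self.text.index('"', self.pos)
            literal = self.text[self.pos - 1:end + 1]
            self.pos = end + 1
            return literal
        start = self.pos - 1
        while self.peekc() and self.peekc() not in self.ends_ident:
            self.pos += 1
        return self.text[start:self.pos]


def parse_node(reader):
    # An empty slot (next char is ':') means the bound is omitted.
    if reader.peekc() == ":":
        return None
    node = reader.parse_one_node()
    return "Ellipsis" if node == "..." else node


def read_slice(reader):
    old_ends_ident = set(reader.ends_ident)
    reader.ends_ident.add(":")  # so `a:` stops before the colon
    try:
        reader.slurp_space()
        nodes = [parse_node(reader)]
        while reader.peek_and_getc(":"):
            nodes.append(parse_node(reader))
    finally:
        reader.ends_ident = old_ends_ident
    return ("slice", *nodes)


print(read_slice(SliceReader('a:(+ 3 4):"slice slice"')))
# ('slice', 'a', ('+', '3', '4'), '"slice slice"')
```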

I can probably make this a bit easier to use, perhaps by adding a defreader macro or something of the sort.


Also managed to knock the speed up a tad by removing some more redundant stuff I was doing:

custom         0.2081 s
hy (rply)      0.3102 s

Also please excuse the mess of commits; I'm treating this as a dev branch rather than a PR branch at the moment, but will clean it up before the proper PR lands.

@Kodiologist
Member

> Well, if you modify the reader (at compile-time), it takes effect immediately!

That seems in scope to me. You can use an ordinary macro in the same file you define it, so it's nice to be able to do the same thing with a reader macro. I don't remember if other Lisps let you do this.

@scauligi
Member Author

So I think I'm pretty much ready to submit this as a PR, once I clean up the commits.

Currently, the way one modifies the reader table is by manually adding an entry to the reader_table dict in the currently active reader, which exposes itself as hy.&reader inside eval-*-compile blocks (see the example I gave above).

It's quite... magic; I'd like to change this, but I'm not sure what a good way of exposing the reader would be.

Alternatively, I could leave it as-is and create a new macro defreader that handles that for you.
I haven't thought too much about it, but perhaps something like

(defreader "#:" [reader]
  (do things with (.chars reader) and so on))
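As a rough Python analogue of the proposed `defreader` (the real thing would be a Hy macro; `make_defreader`, `reader_table`, and the `#`-prefix check are all hypothetical), the macro would mostly hide the dictionary assignment behind a declarative form:

```python
from typing import Callable, Dict

ReaderFn = Callable[..., object]


def make_defreader(table: Dict[str, ReaderFn]) -> Callable[[str], Callable[[ReaderFn], ReaderFn]]:
    """Return a decorator factory that registers reader functions in `table`."""

    def defreader(tag: str) -> Callable[[ReaderFn], ReaderFn]:
        if not tag.startswith("#") or len(tag) < 2:
            raise ValueError(f"reader tags must look like '#x', got {tag!r}")

        def register(fn: ReaderFn) -> ReaderFn:
            # Same effect as (assoc &reader.reader-table tag fn),
            # without users touching the table directly.
            table[tag] = fn
            return fn

        return register

    return defreader


reader_table: Dict[str, ReaderFn] = {}
defreader = make_defreader(reader_table)


@defreader("#:")
def read_slice(reader):
    ...  # body would call reader.peekc(), reader.parse_one_node(), etc.


print(sorted(reader_table))  # ['#:']
```

One possible upside of this shape is that the table's location could change later without breaking user code, since users would only see `defreader`.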

@scauligi
Member Author

scauligi commented Jul 17, 2021

Oh, I should probably mention that I snuck in a reader macro #( ... ) for parsing tuples, and I also changed #@ to a (somewhat un-lispy?) reader macro that works much more like Python's decorator syntax.

@Kodiologist
Copy link
Member

> I snuck in a reader macro #( ... ) for parsing tuples

Why use that in place of the , result macro?

@scauligi
Member Author

#(...) parses to (, ...) since there's no model for tuples;
I tend to use tuples a lot and #(...) looks more like a top level structure (and is similar to Python tuple syntax).
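To make the "no model for tuples" point concrete, here is a toy Python parser (hypothetical, not the PR's code) in which `#(` is handled purely as read-time sugar. It produces exactly the same form as `(, ...)`, so nothing downstream needs to know the new syntax exists. Error handling (e.g. a missing `)`) is omitted.

```python
import re

# '#(' must come first so it is not split into '#' and '('.
TOKEN = re.compile(r'#\(|[()]|[^\s()]+')


def parse(text):
    tokens = TOKEN.findall(text)
    pos = 0

    def read():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok in ("(", "#("):
            items = []
            while tokens[pos] != ")":
                items.append(read())
            pos += 1  # consume ')'
            # '#(' is pure sugar: it yields the same form as '(, ...)'.
            return (",", *items) if tok == "#(" else tuple(items)
        return tok

    return read()


print(parse("#(1 2)"), parse("(, 1 2)"))
# (',', '1', '2') (',', '1', '2')
```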

@Kodiologist
Member

I wouldn't be in favor of having both syntaxes for the same thing, and I don't think the new one is much of an improvement over the old one.

@allison-casey
Contributor

@scauligi mind if i take a crack at cleaning up the API? i'll fork and pr it into this branch.

is there a reason that tuples are the only top level data structures that use a function position symbol? It does look a bit weird in hindsight.

{"a": 1}
#{"a" "b"}
[1 2]
#(1 2)  ; which is a character shorter too vOv
; vs
(, 1 2)

@scauligi
Member Author

@allison-casey By all means, that would be bomb! I'm going to be a bit busy with my defense/dissertation so I won't be too available for a while unfortunately.

Also, I think you mentioned you're working through the error handling code in cmdline.py? I still haven't figured out which errors are supposed to go where, so I don't think I'm raising errors in the right places for the new reader. Would you be able to take a look at that as well?

@allison-casey
Contributor

Also, would yielding top level forms as a generator essentially turn hy's unit of compilation to top level forms like we've been talking about in #1689?

@scauligi
Member Author

I believe it would, yeah, although I don't think hy2py should be affected at all.

@Kodiologist
Member

I'm closing this PR because it's inactive. If you'd like to pick it up again, please reopen it when it's ready to be reviewed.


3 participants