-
-
Notifications
You must be signed in to change notification settings - Fork 53
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Retain resource URL in java-parser results, improve caching #124
Retain resource URL in java-parser results, improve caching #124
Conversation
aebf914
to
c71adc3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome! Appreciate the PR desc.
I guess this could have test coverage both directly, and more importantly it's worth checking out the consumer of this defn - surely it lacked test coverage as well.
Great PR indeed! If we can easily improve the test coverage, that'd be great, but I'm willing to take the improvement regardless. |
The consumer has tests, but no tests that consider Not sure if tests for The current behavior is that Actually we could go a step further here and skip the check on class info that is already cached and coming from a JAR, on the assumption that if a class is coming from a JAR now it will still be coming from a JAR in the future. Seems this also needs a rebase now, I'll get to that. |
I like this idea!
That's too much hassle for little return, so let's not bother with it. |
Looking at this a bit closer the problem is really in java-parser (assoc ...
:file path
:path (.getPath (io/resource path)))
Also keep in mind that the So I think the better solution is actually for java-parser to do
and skip the If it's possible that other callers might still be relying on |
639c881
to
82bee7c
Compare
Glad I slept on this, on closer inspection I've changed the behavior of ;; Before
(select-keys (parser/source-info 'clojure.lang.RT) [:file :path :archive?])
{:file "clojure/lang/RT.java", :path "file:/home/arne/.m2/repository/org/clojure/clojure/1.10.1/clojure-1.10.1-sources.jar!/clojure/lang/RT.java"}
;; After
(select-keys (parser/source-info 'clojure.lang.RT) [:file :path :archive?])
{:file "clojure/lang/RT.java", :path "/home/arne/.m2/repository/org/clojure/clojure/1.10.1/clojure-1.10.1-sources.jar", :archive? true} So that ;; non-archive resources
(select-keys (parser/source-info 'orchard.java.DummyClass) [:file :path :archive?])
{:file "orchard/java/DummyClass.java", :path "/home/arne/github/orchard/test/orchard/java/DummyClass.java", :archive? false} |
0861687
to
578e5f9
Compare
Seems like a fantastic change to me!
Did you check in CIDER how are those paths currently handled? I wondering if the updated format won't breaking something. |
I can't immediately find anywhere in CIDER where Do you have any pointers for things to check? This is specifically for java class info, so things like The thing is that when (-> info
(merge
(when-let [file (:file info)]
(info/file-info file))
)
) Where (defn file-info
[path]
(let [[resource-relative resource-full] (resource/resource-path-tuple path)]
(merge {:file (or (file-path path) resource-full path)}
;; Classpath-relative path if possible
(when resource-relative
{:resource resource-relative})))) Is this path a classpath-relative path, or a CWD-relative path? And in this case (when the info is coming from java.parser) we have all this information already... And then in (let (
(file (or
(and (null var) (cider-resolve-java-class class))
(nrepl-dict-get info "file")
(button-get button 'file)))
) |
I briefly checked the code and it seems you're right.
I think all relative paths should be classpath-relative.
Too much legacy - I'm pretty sure we've slowly accumulated a lot of small inconsistencies over the years. I agree the current behavior is flawed. |
ok, in that case let's take this from the source to the consumer step by step, and try to be consistent about the meaning of each key. In the end there are two things that matter: the classpath-relative path, and the full file/jar URL. The info maps coming from orchard seem to all use For your own REPL perusal (require '[orchard.info :as info] '[orchard.java :as java] '[cider.nrepl.middleware.info :as cider-info])
(info/info 'clojure.core 'str)
(info/info-java 'clojure.lang.RT 'baseLoader)
(java/class-info 'clojure.lang.RT)
(cider-info/format-response (cider-info/info {:ns 'clojure.core :sym 'str}))
(cider-info/format-response (cider-info/info {:ns 'cider.nrepl.middleware.info :sym 'info}))
(cider-info/format-response (cider-info/info {:ns 'clojure.lang.RT :sym 'baseLoader})) My current thinking is this: we look at the places in The cider.nrepl.middleware.info can use Finally we can see how CIDER uses this information, heuristics should only be used when no |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll be interested in reviewing this PR again later, I guess it's a WIP again
(will be nice to have an updated title / desc summarizing the newer findings)
src/orchard/java/parser.clj
Outdated
"Given a jar:file:...!/... URL, return the location of the jar file on the | ||
filesystem. Returns nil on any other URL." | ||
[^URL jar-resource] | ||
(when (= "jar" (url-protocol jar-resource)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(assert (= "jar" (url-protocol jar-resource))
;; debug info:
(url-protocol jar-resource))
is just as cheap, but also ensures our assumptions hold true. Specifically, jar-path
has only one caller, so it's pretty easy to reason about what our preconditions are
(other jobs are known to take 2m+ minutes) This is worth keeping in mind, ideally we wouldn't regress on performance. I think that if solving all these problems at once we'd get a fast build for every job, see also #131 (comment) |
8c4a051
to
0a1523e
Compare
I pushed a new iteration which simply returns the Having looked around a bit more it seems it's going to be challenging to standardize the different keys used for things, e.g. orchard.java.resource
orchard.java.parser
orchard.info/file-info
orchard.info/info*
|
I think this is good from my end. It's a fairly conservative change so far, it'll get more interesting once |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the PR desc up to date?
src/orchard/util/io.clj
Outdated
[^java.net.URL url] | ||
(.getProtocol url)) | ||
|
||
(defn in-archive? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
file-within-archive?
perhaps?
Originally I thought this meant a different thing ("is this an archive")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this clear from the param? To me it seems obvious that we're checking if an URL is in archive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me it seems obvious that we're checking if an URL is in archive.
Hindsight is always 20/20 :) as mentioned at first the name genuinely confused me. If we pick this code up some years later we can forget these meanings, which is why being extra careful wrt clarity seems desirable for a project like this one.
No biggie though!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is now url-to-file-within-archive?
as in "does this URL point at a file within an archive?"
src/orchard/util/io.clj
Outdated
[^java.net.URL url] | ||
(= "jar" (url-protocol url))) | ||
|
||
(defn jar-location |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
jar-location-of-file-within-archive
?
More verbose, but:
- consistent w/ my prior suggestion
- does not take a generic name for future use cases
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with a shorter name as the longer seems extremely verbose. I'm all for clear naming, but we don't want to go overboard here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is now resource-jarfile
, since it returns the .jar
File
instance for a given resource (resources being URLs).
src/orchard/util/io.clj
Outdated
(io/as-file inner-url)))) | ||
|
||
(defn resource-artifact | ||
"Get the fully qualified file name containing the resource identified by the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Get
is confusing, lead me to believe it referred to the return value at first
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rephrased, thanks.
test/orchard/java/parser_test.clj
Outdated
(:require [orchard.misc :as misc] | ||
[clojure.test :refer :all])) | ||
|
||
(when (>= orchard.misc/java-api-version 9) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you follow instead this pattern? ffb71cd
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, I added test-newer-jdks
analogous to src-newer-jdks
@@ -198,3 +198,7 @@ | |||
classes. If no source is available, return the relative path as is." | |||
[^String path] | |||
{:javadoc (java/resolve-javadoc-path path)}) | |||
|
|||
(comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this some debug code you forgot or it's something you meant to keep?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was meaning to keep it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got rid of it after all, the tests provide enough examples.
It'd be nice to include this with the next CIDER release. ETA a couple of weeks. I guess if Arne doesn't have time to tweak this further I'll just merge it as is |
Hey folks, I've been on holiday the last two weeks, and have a good bunch of other stuff on my plate. This is definitely on my stack for soon ™, but I can't give a hard ETA. |
No rush. |
0a1523e
to
6d8e936
Compare
Rebased, going through the review comments now, and will then update the description. |
The java parser/legacy-parser returns a resource's `:file`, actually a relative path as a string, and `:path`, the relative path converted to absolute. It does this conversion through an `io/resource` lookup to find the file system location of the artifact. There are two issues with this: 1. The return value of `io/resource` is discarded, even though it is needed later on in `orchard.java`, causing unnecessary resource lookups. 2. The path is obtained through `(.getPath (io/resource path))`, which yields nonsensical results when used on resources that are inside a JAR. This change keeps the `:file` returned from java-parser (relative path string), but removes the `:path` value in favor of a `:resource-url`, i.e. the return value of `(io/resource path)`. It then provides utility functions to work with this resource URL directly, namely mapping it to a filesystem artifact, or retrieving its modification time. java.parser/parser already does a `io/resource` lookup for the resource it is parsing, meaning it has a full URL (jar: or file:). By including this URL in the map it returns callers can do further checks or operations on the resource without having to re-scan the classpath. This in turn allows us to simplify and optimize `orchard.java/class-info`. The old version would call `io/resource` on each invocation, even when the result was already cached, in order to find the artifact's modification time. This can noticably speed up cider-nrepl's `"stacktrace"` op, which calls `orchard.java/class-info` on each stack frame, in extreme cases taking multiple seconds to analyze all stack frames and return a result. This leads to rather jarring UX, with the stacktrace buffering popping up in Emacs seconds after the evaluation and exception happened. This is exarcerbated when using the nREPL sideloader, which adds overhead to each resource lookup. This also makes the return value of java-parser more correct. As mentioned the previous version would always call `(.getPath (io/resource path))`, but this only makes sense for `file:` resources, not for `jar:` resources. For `file:` URLs `.getPath` returns the path of the file. For `jar:` URLs it returns the nested url+the jar entry, so a `file:` URL but with a dangling `!/jar/entry`. Illustration of why calling `getPath` on `jar:` URLs is not useful: ```clojure (io/resource "lambdaisland/witchcraft.clj") ;;=> #java.net.URL "file:/srv/mc/witchcraft/src/lambdaisland/witchcraft.clj" (.getPath (io/resource "lambdaisland/witchcraft.clj")) ;;=> "/srv/mc/witchcraft/src/lambdaisland/witchcraft.clj" (io/resource "clojure/lang/RT.class") ;;=> #java.net.URL "jar:file:/root/.m2/repository/org/clojure/clojure/1.10.3/clojure-1.10.3.jar!/clojure/lang/RT.class" (.getPath (io/resource "clojure/lang/RT.class")) ;;=> "file:/root/.m2/repository/org/clojure/clojure/1.10.3/clojure-1.10.3.jar!/clojure/lang/RT.class" ```
6d8e936
to
3e047e2
Compare
Squashed, addressed the review comments, updated the commit message and PR title/desc. |
Eyeing in a bit 👀 |
The string may not be directly useful but it can be successfully, meaningfully parsed with a simple algo. I think it was (is?) done in cider.el. I also happen to have Emacs code that parses such strings from cider-nrepl. Regardless, out of caution we can stick to a simple no-breakage policy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you have the chance to QA this (cider.el -> cider-nrepl -> orchard) it would probably be greatly useful. cider.el's test suite is not exactly an e2e one.
This project has a make install
command (after which, I reckon, you should run cider-nrepl's own make install
)
@vemv feel free to push changes if you deem them necessary |
Can't push to your branch ( |
I'll make a new pr from a branch on the main repo, just not at the computer right now. |
Thanks! Yeah it wasn't working for w/e reason 🌀 |
The java parser/legacy-parser returns a resource's
:file
, actually a relativepath as a string, and
:path
, the relative path converted to absolute. It doesthis conversion through an
io/resource
lookup to find the file system locationof the artifact. There are two issues with this:
The return value of
io/resource
is discarded, even though it is neededlater on in
orchard.java
, causing unnecessary resource lookups.The path is obtained through
(.getPath (io/resource path))
, which yieldsnonsensical results when used on resources that are inside a JAR.
This change keeps the
:file
returned from java-parser (relative path string),but removes the
:path
value in favor of a:resource-url
, i.e. the returnvalue of
(io/resource path)
. It then provides utility functions to work withthis resource URL directly, namely mapping it to a filesystem artifact, or
retrieving its modification time.
java.parser/parser already does a
io/resource
lookup for the resource it isparsing, meaning it has a full URL (jar: or file:). By including this URL in the
map it returns callers can do further checks or operations on the resource
without having to re-scan the classpath.
This in turn allows us to simplify and optimize
orchard.java/class-info
. Theold version would call
io/resource
on each invocation, even when the resultwas already cached, in order to find the artifact's modification time.
This can noticably speed up cider-nrepl's
"stacktrace"
op, which callsorchard.java/class-info
on each stack frame, in extreme cases taking multipleseconds to analyze all stack frames and return a result. This leads to rather
jarring UX, with the stacktrace buffering popping up in Emacs seconds after the
evaluation and exception happened. This is exarcerbated when using the nREPL
sideloader, which adds overhead to each resource lookup.
This also makes the return value of java-parser more correct. As mentioned the
previous version would always call
(.getPath (io/resource path))
, but thisonly makes sense for
file:
resources, not forjar:
resources. Forfile:
URLs
.getPath
returns the path of the file. Forjar:
URLs it returns thenested url+the jar entry, so a
file:
URL but with a dangling!/jar/entry
.Illustration of why calling
getPath
onjar:
URLs is not useful: