[SPARK-20075][CORE][WIP] Support classifier, packaging in Maven coordinates #17416
Conversation
```
@@ -941,6 +965,12 @@ private[spark] object SparkSubmitUtils {
      val ri = ModuleRevisionId.newInstance(mvn.groupId, mvn.artifactId, mvn.version)
      val dd = new DefaultDependencyDescriptor(ri, false, false)
      dd.addDependencyConfiguration(ivyConfName, ivyConfName + "(runtime)")
      if (mvn.classifier.isDefined) {
```
CC @brkyvz and @BryanCutler (and maybe @vanzin) for a look. This does not quite work yet, but it's close. I believe it's this stanza that's not quite right, where I try to add classifier info to Ivy's description of an artifact.
For example, `./bin/spark-shell --packages edu.stanford.nlp:stanford-corenlp:jar:models:3.4.1` works and seems to resolve the right artifact with classifier locally, no errors, but it doesn't seem to be on the classpath.
I don't need you to debug it for me, but I'm mostly wondering if you know Ivy much better than I do (I'd not seen it until today) and can see an obvious error in this approach. I can't quite figure out which Ivy concept maps to Maven's "classifier".
As far as I can tell, this looks ok to me. This seems like the right way to add an artifact with classifier, but I'm not certain. Does the artifact show up in the `ResolveReport` at the end of `SparkSubmit.resolveMavenCoordinates`?
I have a feeling we might be missing it when we copy the files from the local repo
to the ivy cache. The way the resolution works is as follows:
1. Ivy searches the local cache
2. Ivy searches remote repos
3. Ivy downloads from the remote repo to the local repo
4. Ivy copies the file from the local repo to the cache

Then the files are served from the local cache. Step 4 is called retrieval and happens on line 1159. I think that's what you're missing.
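The retrieval step above can be illustrated with a small sketch. Ivy is handed a filename pattern in which `[token]` segments are substituted per artifact and `(optional)` segments are dropped when their token is absent; this is why a `(-[classifier])` segment has to be present in the pattern for classified artifacts to end up in the target directory. Note this is a hypothetical, simplified re-implementation of the pattern substitution, not Spark's or Ivy's actual code, and the class and method names are made up for illustration.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch (not Spark/Ivy code) of Ivy-style retrieve-pattern
// substitution: "(...)" groups survive only if all their [tokens] are
// present, then remaining [tokens] are filled in from the artifact.
public class RetrievePatternSketch {
    static final Pattern OPTIONAL = Pattern.compile("\\(([^()]*)\\)");
    static final Pattern TOKEN = Pattern.compile("\\[([a-z]+)\\]");

    static String substitute(String pattern, Map<String, String> tokens) {
        // Keep an (optional) group only if every token inside it is present.
        Matcher opt = OPTIONAL.matcher(pattern);
        StringBuffer sb = new StringBuffer();
        while (opt.find()) {
            String inner = opt.group(1);
            Matcher t = TOKEN.matcher(inner);
            boolean allPresent = true;
            while (t.find()) {
                if (!tokens.containsKey(t.group(1))) allPresent = false;
            }
            opt.appendReplacement(sb, Matcher.quoteReplacement(allPresent ? inner : ""));
        }
        opt.appendTail(sb);
        // Fill in the remaining [tokens].
        Matcher tok = TOKEN.matcher(sb.toString());
        StringBuffer out = new StringBuffer();
        while (tok.find()) {
            tok.appendReplacement(out,
                Matcher.quoteReplacement(tokens.getOrDefault(tok.group(1), "")));
        }
        tok.appendTail(out);
        return out.toString();
    }
}
```

Under a pattern like `[organization]_[artifact]-[revision](-[classifier]).[ext]`, the stanford-corenlp models artifact would land as `edu.stanford.nlp_stanford-corenlp-3.4.1-models.jar`, while an unclassified artifact would simply drop the `-models` segment.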
Test build #75179 has finished for PR 17416 at commit
```scala
case Array(groupId, artifactId, version) =>
  new MavenCoordinate(groupId, artifactId, version)
case Array(groupId, artifactId, packaging, version) =>
  new MavenCoordinate(groupId, artifactId, Some(packaging), None, version)
```
Can a classifier be given without the packaging?
It doesn't seem like it, given https://maven.apache.org/pom.html#Maven_Coordinates. If there are four dimensions, the third is packaging.
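The 3-, 4- and 5-part coordinate forms being discussed can be sketched as a small parser. This is a hypothetical illustration that mirrors, but is not, the PR's actual `SparkSubmitUtils` code; the class name `CoordinateSketch` and its fields are made up here, modeled on the `MavenCoordinate` fields visible in the diff.

```java
import java.util.Optional;

// Hypothetical sketch of parsing group:artifact[:packaging[:classifier]]:version.
// With four parts the third is packaging (per the Maven coordinates spec), so a
// classifier can only appear in the 5-part form.
public class CoordinateSketch {
    final String groupId, artifactId, version;
    final Optional<String> packaging, classifier;

    CoordinateSketch(String g, String a, Optional<String> p, Optional<String> c, String v) {
        groupId = g; artifactId = a; packaging = p; classifier = c; version = v;
    }

    static CoordinateSketch parse(String coordinate) {
        String[] parts = coordinate.split(":");
        switch (parts.length) {
            case 3:  // group:artifact:version
                return new CoordinateSketch(parts[0], parts[1],
                        Optional.empty(), Optional.empty(), parts[2]);
            case 4:  // group:artifact:packaging:version
                return new CoordinateSketch(parts[0], parts[1],
                        Optional.of(parts[2]), Optional.empty(), parts[3]);
            case 5:  // group:artifact:packaging:classifier:version
                return new CoordinateSketch(parts[0], parts[1],
                        Optional.of(parts[2]), Optional.of(parts[3]), parts[4]);
            default:
                throw new IllegalArgumentException("Bad coordinate: " + coordinate);
        }
    }
}
```

For example, `CoordinateSketch.parse("edu.stanford.nlp:stanford-corenlp:jar:models:3.4.1")` yields packaging `jar` and classifier `models`, while a 3-part coordinate leaves both empty.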
Thanks @BryanCutler, here's the current full output after trying to fix per #17416 (comment). Looks like progress, I think, but I still missed something here. Working on it...
So is the problem that it downloads It looks like it might be possible to add the classifier to If I have some time, I'll give it a shot to run it and see what's going on...
Yeah, it clearly downloads the right models .jar file, puts it in the .ivy cache correctly (I can find it there as expected) but then

Let me push some more updates including the one you suggested, which didn't seem to change this, but I wouldn't expect it to. Still tracing down exactly what sets this value of `spark.jars`.
Test build #75389 has finished for PR 17416 at commit
The essence of the problem is that the It is possible to write some code to stitch that extra info back onto the resolved artifacts, I guess. I think that would still leave some corner cases that would not work, such as depending on two artifacts that are identical except for a classifier.
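The "stitching" idea and its corner case can be illustrated with a toy lookup. This is a hypothetical sketch, not code from this PR: if the resolution report hands back artifacts keyed only by group/artifact/version, two requested coordinates that differ only in classifier collide on the same key, so the classifier cannot be re-attached unambiguously.

```java
import java.util.HashMap;
import java.util.Map;

// Toy illustration (hypothetical, not PR code) of why re-attaching classifiers
// to resolved artifacts breaks down: keyed only by group:artifact:version, two
// requests differing only in classifier map to the same key, so the second
// silently overwrites the first.
public class StitchSketch {
    static Map<String, String> classifierByKey(String[][] requested) {
        // requested rows: {groupId, artifactId, version, classifierOrNull}
        Map<String, String> byKey = new HashMap<>();
        for (String[] r : requested) {
            byKey.put(r[0] + ":" + r[1] + ":" + r[2], r[3]);  // last one wins
        }
        return byKey;
    }
}
```

Requesting both the plain and the `models`-classified stanford-corenlp artifact collapses to a single map entry, losing one of the two, which is exactly the "identical except for a classifier" corner case above.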
@srowen Can you confirm what happens when the jars are not found in your local m2 cache? Do you still find the
@sethah yeah, it downloads the right models jar and copies it into the ivy cache. It just doesn't know that it's what belongs on the classpath. `spark.jars` doesn't get the memo, and I can see why, because the I can't figure it out; I'd be surprised if it can't work, but I don't know enough Ivy to fix it. I was thinking of manually mapping the input artifacts' classifiers to the resolved ones' classifiers, but that won't work in the general case. This is mostly seeing if you see something I missed. It's a corner case, though it causes a problem for a few key artifacts that need classifiers. If it's not obvious how to continue, I think I'd shelve this for now and admit defeat.
@srowen, I finally had some time to look into this and I was able to get the correct jar on the classpath. The fix was to use the code you had in the previous commit for The reason is that when the

```java
public static DefaultModuleDescriptor newDefaultInstance(ModuleRevisionId mrid,
        DependencyArtifactDescriptor[] artifacts) {
    DefaultModuleDescriptor moduleDescriptor = new DefaultModuleDescriptor(mrid, "release",
            null, true);
    moduleDescriptor.addConfiguration(new Configuration(DEFAULT_CONFIGURATION));
    if (artifacts != null && artifacts.length > 0) {
        for (int i = 0; i < artifacts.length; i++) {
            moduleDescriptor.addArtifact(DEFAULT_CONFIGURATION,
                new MDArtifact(moduleDescriptor, artifacts[i].getName(),
                    artifacts[i].getType(), artifacts[i].getExt(), artifacts[i].getUrl(),
                    artifacts[i].getExtraAttributes()));
        }
    } else {
        moduleDescriptor.addArtifact(DEFAULT_CONFIGURATION, new MDArtifact(moduleDescriptor,
            mrid.getName(), "jar", "jar"));
    }
    moduleDescriptor.setLastModified(System.currentTimeMillis());
    return moduleDescriptor;
}
```

I think that some other code you added in the second commit was also required, which is maybe why it didn't work for you in the first place, but give it another try. Here is the output from my test, looks like it should work now:
Thanks @BryanCutler! So, do you mean you have a working version of this change? Feel free to open it as a PR to master or to this PR, whatever's simplest. I'd be pleased to have a look and get it in.
Sure, I can PR against your branch. I didn't do anything new, just used the prev version of your

both get downloaded but only the models get on the classpath
OK @BryanCutler, whatever's easiest for you. Whatever you have working, I can test and get merged.
Test build #75914 has finished for PR 17416 at commit
… still not quite right.
Thanks @BryanCutler! Yes, that's working better. I can run, for example, There's a smaller corner case that doesn't quite seem to work, and that is adding both a classified and unclassified version of the same artifact, as in Unless you see a clean fix for that too (I don't yet, still looking), I'm still inclined to merge this as it makes a significant set of artifacts now addressable, and loading via I need to add a test.
Test build #75936 has finished for PR 17416 at commit
Well, it's a little different, now that I've modified the tests to cover the classifier case. In this case, when the main artifact has a classifier, it doesn't find the dependency that the test sets up. It's definitely declared correctly in the POM that's created. I'm guessing somehow adding the `DependencyArtifactDescriptor` prevents it from attempting resolution or something. Still looking into it...

Yeah, I'm not sure why exactly that doesn't work. I know I saw it downloads them both, so I think it should work, but I didn't have time to look where it goes wrong. I agree that could be addressed later and this would be fine to be merged, pending the test.

Yeah, I think it's a little bigger than that -- the test suggests that it won't load dependencies of an artifact with a classifier, which is better than nothing maybe, but still broken enough that I'm worried it is confusing to commit this. I may noodle on this a bit more, but I couldn't make a breakthrough yesterday. Just checking if you had any bright ideas from your investigation, but if not I may shelve this one for now.

Oh, I see. I would have expected it to resolve those dependencies too, but it's hard to tell from the Ivy code what it will do. Was this part of a unit test you added here, or just some manual testing you were doing?

Yeah @BryanCutler, basically I was just modifying the existing test to include an artifact with a classifier. If you just add one to the first line in this test in

It won't find the dependency then.

Closing this for now -- happy to revive it if anyone has other ideas or wants to run with it.
Do not merge
Initial attempt to add classifier, packaging support. Ivy integration still not quite right.
What changes were proposed in this pull request?
This adds support for `group:artifact:packaging:classifier:version` in the Maven/Ivy package infrastructure available to `spark-submit` et al via `--packages`.

How was this patch tested?

Existing tests and updated tests.