Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SemanticDB support #1467

Closed
wants to merge 0 commits into from
Closed

SemanticDB support #1467

wants to merge 0 commits into from

Conversation

aishfenton
Copy link
Contributor

@aishfenton aishfenton commented Jan 29, 2023

Description

Adds support for SemanticDB

Extends the work from the stale PR by @olafurpg and @ricochet1k.

  • Fixed to work with current HEAD
  • Added Scala3 support
  • Adds a enableSemanticDb option, that is set on the scala_toolchain method. Defaults to false. Needed, as semanticdb adds 20-50% overhead to compilation times (which not everyone will want/need).
  • Adds tests for semanticdb in Scala 2 & 3.

Motivation

SemanticDB is needed for many Scala tools, including Metals.

@google-cla
Copy link

google-cla bot commented Jan 29, 2023

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@aishfenton
Copy link
Contributor Author

@ricochet1k are you able to sign the Google CLA? Seems that's needed for them to accept your part of the PR.

@aishfenton
Copy link
Contributor Author

aishfenton commented Jan 31, 2023

Updated to add Scala3 test (and fixed Scala3 bug that exposed).

Copy link

@eed3si9n eed3si9n left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @aishfenton for resurrecting @olafurpg's #1329!

We are in SemanticDB starvation with Scala and Bazel (Metals doesn't work etc), and letting scala_library generate SemanticDB monorepo-wide, I think will get the whole thing moving in the right direction.

FTR I added SemanticDB support in sbt 1.3.0 with the following settings https://www.scala-sbt.org/1.x/docs/sbt-1.3-Release-Notes.html#SemanticDB+support:

ThisBuild / semanticdbEnabled := true
ThisBuild / semanticdbVersion := "4.1.9"
ThisBuild / semanticdbIncludeInJar := false

Regarding rule vs aspect discussion from the previous PR, I think this is a better approach than using aspect since SemanticDB is basically a compiler plugin that generates some binary on the side, so making some tool build with aspect would effectively be a duplicated compilation effort on CI (or more likely on someone's laptop trying to use Metals).

@ricochet1k
Copy link
Contributor

@ricochet1k are you able to sign the Google CLA? Seems that's needed for them to accept your part of the PR.

Strange, I thought I did already. Tried again.

@aishfenton
Copy link
Contributor Author

That did the trick @ricochet1k 👍

@@ -81,6 +81,10 @@ artifacts = {
"@io_bazel_rules_scala_scala_library",
],
},
"org_scalameta_semanticdb_scalac": {
"artifact": "org.scalameta:semanticdb-scalac_2.13.6:4.7.3",

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that ScalaMeta version is repeated a few times in this file:

    "org_scalameta_scalameta": {
        "artifact": "org.scalameta:scalameta_2.13:4.3.24",
....
    "org_scalameta_trees": {
        "artifact": "org.scalameta:trees_2.13:4.3.24",

I suggest that it should be factored out as a variable similar to scala_version at line 1:

scala_version = "2.13.6"

and change the above to

        "artifact": "org.scalameta:semanticdb-scalac_{}:{}".format(scala_version, scalameta_version),

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do in follow up PR. Seems to have a few gotchas, as it also requires ScalaFmt rules to be updated.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my own experience - keeping versions duplicated allows simple grep for artifact coords

@liucijus
Copy link
Collaborator

liucijus commented Feb 8, 2023

@aishfenton I will review this next week, sorry for the delay

@liucijus
Copy link
Collaborator

@aishfenton thanks for taking this!

Do you already use SemanticDB with Rules Scala? If yes, I would like to understand how tools consume such output.

Regarding impl, have you consider implementing it as a phase? Eg. scalafmt phase. I'd like to understand if we can avoid adding additional logic to ScalacWorker and keep it within starlark files. At Wix we have macro which adds semantic db generation (authored by @martynasp-wix), which is used for refactoring automation. Here's the gist of it (most likely can be trimmed to much shorter impl by dropping configuration checks):

def _append_parameters_to_list(kwargs, name, values_to_add):
    values = kwargs.pop(name, [])
    kwargs[name] = values + values_to_add

def _add_semanticdb_support(**kwargs):
    semanticdb_plugins = ["@org_scalameta_semanticdb_scalac_2_12_15"]
    semanticdb_scalacopts = [
        "-P:semanticdb:synthetics:on",
    ]

    is_main_repo = native.repository_name() == "@"
    build_semanticdb = kwargs.pop("build_semanticdb", None)

    if build_semanticdb:
        scalacopts = semanticdb_scalacopts
        plugins = semanticdb_plugins
    elif build_semanticdb == False or not is_main_repo:
        scalacopts = []
        plugins = []
    else:
        plugins = select({
            "@wix_tools//dependencies/rules_scala:with_semanticdb": semanticdb_plugins,
            "//conditions:default": [],
        })

        scalacopts = select({
            "@wix_tools//dependencies/rules_scala:with_semanticdb": semanticdb_scalacopts,
            "//conditions:default": [],
        })

    _append_parameters_to_list(kwargs, "plugins", plugins)
    _append_parameters_to_list(kwargs, "scalacopts", scalacopts)

    return kwargs

def with_meta(scala_rule, *args, **kwargs):
    scala_rule(*args, **_add_semanticdb_support(**kwargs))

@aishfenton
Copy link
Contributor Author

aishfenton commented Feb 20, 2023

Thanks @liucijus ! I went with modifying the ScalaWorker class, as Scala3 has SemanticDB built into the compiler (it now gets enabled as a standard scalac option, not a plugin). So I was thinking it'd make the most sense to treat that similar to other first-class scalac options. But that leaves the question for what to do for Scala2... And there I was somewhat hesitant to create a divergence between how 2.x and 3.x work.

Another consideration is that often users will need to switch SemanticDB on/off in a single build (IDE/tools need it, but it adds a 30-50% compile overhead, so generally users will want it to default to off). At the moment I'm supporting that by

  • Making the enableSemanticDb flag a toolchain param
  • Setting up two toolchains (one with it enabled, one with it disabled)
  • Making use of Bazel's --extra_toolchains to switch on/off as required.

Is there a similar mechanism we could exploit for phases? (without having to create two targets for everything).

@aishfenton
Copy link
Contributor Author

aishfenton commented Feb 21, 2023

Regarding how generally they're consumed by the tools. I'm only familiar with Metals. But my guess is that most tools would be similar in how they're using it. Metals does the following.

  • It asks the build system (via different adaptors, for different build systems) to compile a specific target
  • It looks in a user specified "class folder" to find the semanticdb files, alongside each compiled class file.

The last point is going to require a couple of changes on Metal's side to be compatible with Bazel. As we'll need to pass it a back collection of Jars, rather than a single exploded directory (working on that separately with them).

I think the most important consideration is that enabling semanticdb can slow down the compile task quite a lot (the scalameta team mentioned 30-50% to me, but obviously it depends on complexity of the code). So it's going be to be important to be able to toggle it on/off depending on the caller.

@liucijus
Copy link
Collaborator

liucijus commented Feb 21, 2023

Thanks @liucijus ! I went with modifying the ScalaWorker class, as Scala3 has SemanticDB built into the compiler (it now gets enabled as a standard scalac option, not a plugin). So I was thinking it'd make the most sense to treat that similar to other first-class scalac options. But that leaves the question for what to do for Scala2... And there I was somewhat hesitant to create a divergence between how 2.x and 3.x work.

I think phase can make a decision which parameters to pass based on configured scala version. The fact that we had to split compiler code for 2 and 3 already complicates maintenance, I'd like to remove code from there where it is possible.

Another consideration is that often users will need to switch SemanticDB on/off in a single build (IDE/tools need it, but it adds a 30-50% compile overhead, so generally users will want it to default to off). At the moment I'm supporting that by

* Making the `enableSemanticDb` flag a toolchain param

* Setting up two toolchains (one with it enabled, one with it disabled)

* Making use of Bazel's `--extra_toolchains` to switch on/off as required.

Is there a similar mechanism we could exploit for phases? (without having to create two targets for everything).

What works for us with dependency tracking is filters on toolchain. It does not have to be on the toolchain, but you need to have a way to enable it per target.

@aishfenton
Copy link
Contributor Author

aishfenton commented Feb 21, 2023

@liucijus , can you explain a bit more on how filters could solve this?

From what I can grok, filters seem to be about enabling something for a subset of targets, whereas the situation here is that we want it to enable/disable it for all targets (although doing it for a subset of targets could be nice too, but that hasn't come up as a use-case yet). And we need to be able to do that from the Bazel command line (without changing any BUILD files).

@aishfenton
Copy link
Contributor Author

aishfenton commented Feb 21, 2023

FWIW. I think conceptually semanticdb is quite similar to diagnosticsproto. It's an additional diagnostic output from the compiler, for IDE/tool consumers. Generally you want it on/off for all scalac targets. But you need to be able to make that switch at the command line (which you can do with diagnosticsproto via Bazel's --extra_toolchains option).

(i.e. so that IDEs can include --extra_toolchains option, to get the additional diagnostic outputs. Whereas normal runs of bazel won't have it enabled).

@aishfenton
Copy link
Contributor Author

aishfenton commented Feb 21, 2023

Just to run the design change past you, for phases (to make sure I've got it right).

  • Create a new phase, phase_semanticdb.bzl
  • The phase would check the toolchain to see if semanticdb is enabled, or not.
  • It'd then set the appropriate p.scalacopt options depending on that (and setting different ones for Scala2/3).

We'd include that phase as a standard phase (unlike phase_scalafmt)? Since it's enabled/disabled by the toolchain, rather than by specifying a custom set of phases.

Also want check an alternative design with you .. just in case.

  • Modify, phase_compile.bzl to see if the toolchain hassemanticdb enabled, or not.
  • Set the appropriate scalacopt options, before calling the scalac worker (and setting different ones for Scala2/3).

Wondering if that'd be clearer, since the semanticdb phase above isn't really doing much more than a pass through. (and this'd be more consistent with diagnosticproto).

@liucijus
Copy link
Collaborator

@liucijus , can you explain a bit more on how filters could solve this?

From what I can grok, filters seem to be about enabling something for a subset of targets, whereas the situation here is that we want it to enable/disable it for all targets (although doing it for a subset of targets could be nice too, but that hasn't come up as a use-case yet). And we need to be able to do that from the Bazel command line (without changing any BUILD files).

Sorry for not being clear. What I had in mind is that rebuilding all the graph might be expensive, so filtering is a nice way to limit the impact. I'm not saying that it's needed in case of semantic db, but dropping some food for thought.

@liucijus
Copy link
Collaborator

Just to run the design change past you, for phases (to make sure I've got it right).

* Create a new phase, `phase_semanticdb.bzl`

* The phase would check the toolchain to see if `semanticdb` is enabled, or not.

* It'd then set the appropriate `p.scalacopt` options depending on that (and setting different ones for Scala2/3).

We'd include that phase as a standard phase (unlike phase_scalafmt)? Since it's enabled/disabled by the toolchain, rather than by specifying a custom set of phases.

Also want check an alternative design with you .. just in case.

* Modify, `phase_compile.bzl` to see if the toolchain has`semanticdb` enabled, or not.

* Set the appropriate `scalacopt` options, before calling the scalac worker (and setting different ones for Scala2/3).

Wondering if that'd be clearer, since the semanticdb phase above isn't really doing much more than a pass through. (and this'd be more consistent with diagnosticproto).

I may be wrong but I think current phase implementation forces to have the following design:

  • Have a separate phase (phase_semanticdb.bzl) to check if semantic db is enabled. Can be simple toolchain attribute check. Separate phase allows it to be replaced with different logic if Rules Scala users want it.
  • (Unfortunately) at some other phase (maybe phase_compile.bzl) you need to translate output of sematicdb phase decision to scalacopts

Btw, pass through phases may look like they do almost nothing, but they are extension points which can be changed to a different logic by the users. For example filtering idea discussed above, can be implemented as a custom phase.

@aishfenton
Copy link
Contributor Author

@liucijus had a go at those changes. Is that what you had in mind ? (one test environment is failing for bazel-green-head ... but not sure if that's related to my changes, as seems to be about scala_proto? 🤔 ).

Copy link
Collaborator

@liucijus liucijus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aishfenton my expectations are to have all the sematic db logic implemented in starlark without anything added to worker classes. If you think this is not doable for some reason - let me know.

@@ -243,6 +243,17 @@ private static String[] getPluginParamsFrom(CompileOptions ops) {
"-P:dependency-analyzer:unused-deps-ignored-targets:"
+ String.join(":", ignoredTargets));
}
}

if (ops.enableSemanticDb) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aishfenton can this decision be made in one of the phases? My goal is to not have additional logic in worker code.

String[] params = {
"-Yrangepos",
"-P:semanticdb:failures:error",
"-Xplugin-require:semanticdb"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these ops could be part of sematicdb phase. Eg.

def phase_semanticdb(ctx, p):
    ...

        return struct(
            enabled = enable_semanticdb, 
            scalacopts = [
                "-Yrangepos",
               "-P:semanticdb:failures:error",
               "-Xplugin-require:semanticdb"
          ]
        )
     ...

then this phase can be consumed by compile phase to augment other scalacopts

@crt-31
Copy link
Contributor

crt-31 commented Mar 21, 2023

I tried out the changes in aishfenton’s pull request. I was able to get it to work with a Metals BSP prototype I’m experimenting with. Here’s some of my feedback:

  1. All the generated *.semanticdb should be declared as action outputs so that they work with bazel’s caching.

  2. setup_scala_toolchain() should take a parameter for the semanticdb_scalac classpath (just like scala_compile_classpath and scala_library_classpath). This would allow to customize the semanticdb lib we want to use in this function.

  3. The semanticdb version is always being set to 2.13.6 when major version is 2.13, but it should use the version that the compiler is using. (Looks like @luicijus already made a review comment related to this).

  4. Might suggest there be an error if semanticdb_scalac depprovider has more than one jar dependency.

  5. Might wanna support user ability to add additional semanticdb options if they want. (Maybe this is something that is already available using the existing scala_toolchain.scalaopts parameter?)

  1. On Metals integration: I think that as long as an aspect can get the semanticdb jar path (which in the PR can be done using the semanticdb_scalac depprovider), a metals bsp adapter has mostly what it needs. Metals will ask for the scalacopts for a target, and then parses them to determine if semanticdb is being used (if option starts with “-Xplugin:” and contians “semanticdb-scalac”) as well as using the targetroot and sourceroot options. I would imagine, that a metals bsp adapter would be able to give Metals a groomed set of options that sets the -Xplugin, targetroot, and sourceroot to the correct values for Metals, even if bazel uses different options… which would be the case with relative path differences. Long way of saying I’m thinking the PR gives what is needed for a Metals integration as it is.

@crt-31
Copy link
Contributor

crt-31 commented Nov 30, 2023

This ticket can probably be closed now. SemanitcDB support was added in #1508

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants