Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Make
QuarkusBuild
not pollute Gradle's build cache
Make `QuarkusBuild` not pollute Gradle's build cache Improve Quarkus build in Gradle, proper up-to-date mechanisms, right config-source priorities. Currently the `QuarkusBuild` task implementation adds even large build artifacts and unmodified dependency jars to Gradle's build cache. This pollutes the Gradle build cache quite a lot and therefore lets archived build caches become unnecessary huge. On top of that, "relicts" from previous runs of `QuarkusBuild` using a different configuration, like a different package-type, causes multiple permutations of the cached artifacts and leads to restoring unexpected artifacts from the build cache. This large change updates the Gradle logic to have "proper" up-to-date checks and in turn a reasonable setting of what is cached and what is not cached. Background on how I discovered the "caching and up-to-date issue" with `QuarkusBuild`: Quarkus 2.16 added the `@CacheableTask` annotation to `QuarkusBuild`. This means that _everything_ that is generated by `QuarkusBuild` (and "declared" as an output: fast-jar directory, uber-jar, native-runner) is cached. After having Quarkus 2.16.x in our "main" branch for about 1/2 day, nearly the whole "10 GB budget" for cache artifacts in GitHub was occupied by Gradle build cache artifacts - the build cache artifacts grew from build to build - this was not the case without the `@CacheableTask` annotation. The current inputs of the `QuarkusBuild` task are also incomplete. Those inputs do not include for example the configured `finalName`, the settings on the Gradle project and the settings on the Quarkus extension. This leads to situations that the wrong Gradle cache artifacts could be restored (think: you ask for a fast-jar, but get an uber-jar). Another issue is that repeated runs of `quarkusBuild` against the exact same code base and dependencies leads to different outputs, the contents of the generated jars and e.g. quarkus-application.dat differ. This causes different outputs of the `QuarkusBuild` task, which cause depending tasks and projects to unnecessarily run/work again. The initial goal of this effort was to make caching of `fast-jar` builds work nice in CI environments, I assume that's the most common package type used in CI environments for example for testing purposes. The idea here is to _not_ cache all the dependency jars, but leverage the mechanisms available in Gradle to "just collect" the dependencies. In other words (and simplified): for `fast-jar`, cache everthing "in `quarkus-app` except `lib/`. There is no change for users running a `./gradlew quarkusBuild`. `QuarkusBuild` is still the task that users should execute (or depend on in their builds). To achive the goal to make especially `fast-jar` builds "nicely cacheable" in CI, two new tasks had to be introduced: * `QuarkusBuildDependencies` uses the dependency information from the Quarkus `ApplicationModel` to tell Gradle to collect those. This task works for `fast-jar` and `legacy-jar` and is a "no op" for other package types. This task is _never_ cacheable, but has _up-to-date_ checks. * `QuarkusBuildCacheableAppParts` performs a Quarkus application build and collects the _cacheable_ parts of that build. This task works for `fast-jar` and `legacy-jar` and is a "no op" for other package types. This task is cacheable. * `QuarkusBuild`, for `fast-jar` and `legacy-jar`, assembles the outputs of the above two tasks to produce the expected output. Performs a Quarkus build for other package types. See below on when this task is cacheable. The `build/quarkus-build/` directory is used to properly separate the outputs of the above three tasks: * `build/quarkus-build/gen/` receives the full output of a Quarkus build * `build/quarkus-build/app/` receives the _cacheable_ parts from `build/quarkus-build/gen/` * `build/quarkus-build/dep/` receives the dependency jars The output of `QuarkusBuild` is, by default, cacheable in non-CI environments and _not_ cacheable in CI environments. The behavior can be explicitly overridden using the new property `cacheLargeArtifacts` property. As outlined above, caching huge artifacts is not beneficial in CI, either due to space limitations (GitHub's 10GB limit for example) or just the cost of network traffic/duration. On a developer's machine however, the cost of storing/retrieving even bigger artifacts is rather neglectible and lower than rebuilding a Quarkus application. Before this change, the "priority" of configuration settings was (in most cases...) effectively: System properties, environment variables, `application.properties`, configurations in Gradle build scripts & Gradle project properties. This order is unintuitive and was changed to: system properties, environment variables, Gradle project properties, Quarkus build configuration, application.properties. The previous code had several places in which Quarkus related configuration settings were retrieved with sometimes different priorities of the "config sources". The whole "machinery" to get the configuration has been rewritten and encapsulated in new classes `EffectiveConfig` (effective for a Quarkus build) and `BaseConfig` (available during Gradle task configuration phase). `EffectiveConfig` is not really different from `BaseConfig, but contains the "special" settings from e.g. `ImagePush` task. The new implementation also uses SmallRye Config and especially the existing Quarkus mechanisms to pull information out of a `PackageConfig` _object_ produced from a configuration. It turned out to be easier to reason about `PackageConfig` than "raw property values". Support for `application.yaml/yml` has also been added. All calls into the "inner workings" of a Quarkus build and Quarkus code generation have been moved to separate worker processes. The code in this change to support this is pretty dead simple. The primary reason to move that work into isolated worker processes is that all the `QuarkusBootstrap` and derived pieces know nothing about Gradle, so there is no "property like" mechanism to override settings from `application.properties/yaml/yml` with those from e.g. the `QuarkusPluginExtension`. The only option would have been to modify the system properties of the Gradle build process - but that's a no-go, especially considering other tasks running in parallel (think: two `QuarkusBuild` of different projects running at the same time). It would have been relatively easy to serialize all `QuarkusBuild` actions across a Gradle build, but why - and it would prevent using "beefy machines" to run many `QuarkusBuild`s in parallel. Another option would have been to implement a rather complex (and likely very racy) mechanism to track modifications to system properties. As a conclusion, it wasn't just very simple to leverage the process isolation using the Gradle worker API, but it's also not bad. Gradle does reuse already spawned and compatible worker instances. The "trick" implemented to "prioritize" configs from Gradle project properties and Quarkus extension settings is to pass the whole config as system properties to the worker. A bunch of hopefully useful logging has been implemented and added. With info-level loogging, the tasks emit at least some basic information about what they do and at least the package type being used. To be able to investigate which configuration settings were effectively used, there's another new task called `quarkusShowEffectiveConfig` that shows all the `quarkus.*` properties and some more information, including the loaded `application.(properties|yaml|yml)`. This task is intended to debug build issues. The task can optionally save the effecitve config properties as a properties file in the `build/` directory, if run with the command line option `./gradlew quarkusShowEffectiveConfig --save-config-properties`. None of the existing tests has changed, except the `BuildCOnfigurationTest` had to be adopted to reflect the updated order of config sources. Relates to: quarkusio#30852
- Loading branch information