
Use Earthly for Build Automation #5024

Closed · wants to merge 1 commit

Conversation

@mattt (Contributor) commented Apr 21, 2022

This PR replaces our Dockerfiles, build scripts, and test scripts with a new build system using Earthly.

Motivation

Our CI system currently takes about 30 minutes to complete [1]. This long build time is resource-intensive and makes it difficult to iterate on a problem quickly. Our build system is also very complicated. For example, if a developer wanted to test a code change related to the .NET ecosystem, they'd need to install Ruby, Python, Node.js, Elm, PHP, Go, Elixir, Rust, Terraform, and Dart before they could test those changes (which require none of those dependencies).

Over the past month or so, I've been looking into ways to improve our situation. After building out a prototype, I found a solution that reduces our overall run time by 20 – 30% and greatly simplifies our build system.

Results

Here's a table comparing the CI times for a recently merged PR (before) with the CI run for this PR (after).

All times are measured in seconds (smaller is better).


Suite             Before   After   Improvement
Bundler             1559     484         69.0%
Cargo               1278     209         83.6%
Common              1054     141         86.6%
Composer            1172     321         72.6%
Docker              1093     128         88.3%
Elm                 1154     125         89.2%
Git Submodules       955     111         88.4%
GitHub Actions      1249     124         90.1%
Go Modules          1598     569         64.4%
Gradle              1039     111         89.3%
Hex                 1567     402         74.3%
Maven                995      98         90.2%
npm and yarn        1557     547         64.9%
Omnibus             1177     111         90.6%
Python              2700    1560         42.2%
Pub                 1170     234         80.0%
Terraform           1079     225         79.1%

Disclaimer: CI times are highly variable. For other runs, such as this scheduled CI run, Earthly represents closer to a 20–30% improvement.

How It Works

Earthly is a build automation tool for creating Docker images and other artifacts. Builds happen in containers, making the process self-contained, portable, and repeatable.

To use Earthly, you write a set of instructions in an Earthfile, whose syntax is a cross between a Makefile and a Dockerfile. Each build product is described by a target, which can depend on other targets. By decomposing jobs into component parts, Earthly can run steps in parallel and cache intermediate layers more efficiently than a typical Dockerfile.
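
For anyone unfamiliar with Earthly, here's a minimal, generic Earthfile (purely illustrative, not taken from this repository) showing one target depending on another:

VERSION 0.6
FROM alpine:3.15

build:
    # Each target starts from a base image (or another target) and can save artifacts.
    RUN echo "hello" > artifact.txt
    SAVE ARTIFACT artifact.txt

package:
    # Depends on +build; Earthly builds and caches each target independently,
    # so unrelated targets can run in parallel.
    COPY +build/artifact.txt .
    SAVE IMAGE example:latest

Running earthly +package builds only the targets it needs and reuses cached results for anything that hasn't changed.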

The dependabot-core project has a somewhat complex and unconventional structure. Support for each ecosystem is provided by a gem (e.g. dependabot-gradle) within a separate subdirectory (e.g. /gradle) containing its manifest, source, and test files. Each ecosystem gem depends on the dependabot-common gem, which resides in a sibling /common directory. There's also a dependabot-omnibus gem that bundles all of the ecosystem gems together with dependabot-common. A top-level Dockerfile installs the system library dependencies for all of the ecosystems into a monolithic container, which is used for development, testing, and production.

For this PR, I took our existing Dockerfile and extracted the relevant instructions for each ecosystem into a separate Earthfile in each respective subdirectory (this was challenging because the instructions for a given ecosystem are spread across different sections of the file).

For example, our Dockerfile includes the following section for installing the dependencies needed for Go support:

### GO

# Install Go
ARG GOLANG_VERSION=1.17.7
ARG GOLANG_CHECKSUM=02b111284bedbfa35a7e5b74a06082d18632eff824fd144312f6063943d49259
ENV PATH=/opt/go/bin:$PATH
RUN cd /tmp \
  && curl --http1.1 -o go.tar.gz https://dl.google.com/go/go${GOLANG_VERSION}.linux-amd64.tar.gz \
  && echo "$GOLANG_CHECKSUM go.tar.gz" | sha256sum -c - \
  && tar -xzf go.tar.gz -C /opt \
  && rm go.tar.gz

Further down, there are additional instructions for building and installing the required helper tools:

COPY --chown=dependabot:dependabot go_modules/helpers /opt/go_modules/helpers

# ...

RUN bash /opt/go_modules/helpers/build

The corresponding Earthfile in the /go_modules directory organizes this logic into an Earthly user-defined command (UDC). It copies the Go toolchain from the go target (SAVE ARTIFACT + COPY here is equivalent to a Dockerfile COPY --from=<image>). The go target is also set as the base layer for the helpers target, which produces the Go helper executable. Finally, the test target calls SETUP and runs the unit tests for dependabot-go_modules:

VERSION 0.6
FROM ruby:2.7.1
WORKDIR /dependabot-go_modules

ARG GOLANG_VERSION=1.18

SETUP:
    COMMAND

    ENV DEPENDABOT_NATIVE_HELPERS_PATH="${DEPENDABOT_NATIVE_HELPERS_PATH:-/opt}"
    
    DO ../common+CREATE_DEPENDABOT_USER
    
    COPY --chown=dependabot:dependabot +go/go "$DEPENDABOT_NATIVE_HELPERS_PATH/go"
    ENV PATH="$DEPENDABOT_NATIVE_HELPERS_PATH/go/bin:$PATH"

    COPY --chown=dependabot:dependabot +helpers/helper $DEPENDABOT_NATIVE_HELPERS_PATH/go_modules/bin/helper

go:
    FROM golang:$GOLANG_VERSION-buster

    SAVE ARTIFACT /usr/local/go

helpers:
    FROM +go

    ENV GOOS="$TARGETOS"
    ENV GOARCH="$TARGETARCH"

    COPY --dir helpers .

    WORKDIR helpers

    RUN go build -o "../helper" \
     && go clean -cache

    SAVE ARTIFACT ../helper

deps:
    COPY --dir ../common+shared/common ../common
    COPY Gemfile *.gemspec .
    RUN bundle install --jobs 4 --retry 3

    COPY --dir lib spec .

test:
    FROM +deps
    DO +SETUP

    DO ../common+CONFIGURE_GIT_USER

    RUN bundle exec rspec spec

lint:
    FROM +deps

    COPY ../+shared/.rubocop.yml ../
    COPY .rubocop.yml .
    RUN bundle exec rubocop .

By defining fully self-contained test and lint targets for the Go subproject, we can run them individually with the following commands from the top-level directory:

$ earthly ./go_modules+test
$ earthly ./go_modules+lint

The revised ci.yml workflow in my prototype takes this exact approach:

jobs:
  ci:
    name: "${{ matrix.suite.name }}"
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        suite:
          # ...
          - { path: git_submodules, name: "Git Submodules" }
          # ...    
    steps:
      - name: Checkout code
        uses: actions/checkout@v2
      - name: Install Earthly
        uses: earthly/actions-setup@v1
        with:
          version: "v0.6.10"
      - name: Run Tests
        run: |
          earthly --ci ./${{ matrix.suite.path }}+test
      - name: Lint
        run: |
          earthly --ci ./${{ matrix.suite.path }}+lint

Additional Benefits

In addition to making our builds faster, Earthly eliminates several points of friction:

  • CI can run on forks: Our current CI system uses a remote caching system that requires authentication to container registries. As of this PR, this is no longer necessary.
  • Native helpers automatically update on rebuild: Any changes to an ecosystem helper are automatically reflected the next time Earthly runs.
  • No more build scripts: Steps for building helpers are encoded directly into the container build process.
  • No more ci-test scripts: Steps for testing are encoded in each ecosystem's Earthfile.
  • No more bin/docker-dev-shell: Rebuilding containers is fast enough that it's no longer necessary to develop within a Docker image.
  • One-line system dependency upgrades: Because all system dependency versions are defined by a single ARG declaration at the top of each Earthfile, upgrading from, say, Go 1.17 to 1.18 is a single-line change (see the sketch below).
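
To illustrate that last point, the Go bump described above would amount to a one-line edit at the top of go_modules/Earthfile (the version numbers are taken from the Dockerfile excerpt and Earthfile shown earlier):

# go_modules/Earthfile
-ARG GOLANG_VERSION=1.17.7
+ARG GOLANG_VERSION=1.18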

It also lays the foundation for some compelling follow-up work:

  • Beta ecosystems: Earthly makes it easy to conditionalize logic behind a build argument, which could allow us to lift our moratorium on new ecosystems.
  • Replace Rakefile: Earthly can build non-Docker artifacts, including Ruby .gem files. We could remove our dependency on rake by porting that logic to gem targets (e.g. earthly +gem --push).
  • Slim ecosystem containers: By defining a docker target in each ecosystem's Earthfile, we can create images that ship a version of dependabot-core for only that ecosystem (see the sketch after this list).
  • Further improvements to build time: This PR only scratches the surface of what we can do to speed up CI. Additional caching directives could speed things up even more.
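
To give a flavor of the "slim ecosystem containers" idea, here's a rough sketch of what a per-ecosystem docker target in go_modules/Earthfile might look like (hypothetical; the image name and entrypoint are assumptions, not part of this PR):

docker:
    FROM +deps
    DO +SETUP

    # Ship only this ecosystem's gems and native helpers.
    USER dependabot
    ENTRYPOINT ["bundle", "exec"]

    SAVE IMAGE dependabot/dependabot-core-go_modules:latest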

Consider the setup instructions from our contribution guidelines:

Getting set up to run all of the tests on Dependabot isn't as simple as we'd like it to be - sorry about that. Dependabot needs to shell out to multiple different languages to correctly update dependency files, which makes things a little complicated.

With this new setup, we'll no longer have to apologize for things being complicated. 😃

Next Steps

Fix Failing Tests

There are still a handful of failing tests across a couple of ecosystems, and it's unclear whether those are A) broken behavior that needs fixing, or B) broken tests that rely on hardcoded assumptions.

We'll need to address these to get CI ✅ before merging this PR.

Here's a breakdown of the current test failures:

Unresolved CI Failures by Ecosystem

Bundler (3)

1) Dependabot::Bundler::FileUpdater#updated_dependency_files the updated lockfile when the Gemfile.lock didn't have a BUNDLED WITH line doesn't add in a BUNDLED WITH
   Failure/Error:
     SharedHelpers.run_helper_subprocess(
       command: command,
       function: function,
       args: args,
       env: {
         # Bundler will pick the matching installed major version
         "BUNDLER_VERSION" => bundler_version,
         "BUNDLE_GEMFILE" => File.join(helpers_path, "Gemfile"),
         "GEM_PATH" => File.join(helpers_path, "vendor/cache")
       }
   Dependabot::SharedHelpers::HelperSubprocessFailed:
   # /common/lib/dependabot/shared_helpers.rb:128:in `rescue in run_helper_subprocess'
   # /common/lib/dependabot/shared_helpers.rb:77:in `run_helper_subprocess'
   # ./lib/dependabot/bundler/native_helpers.rb:44:in `block in run_bundler_subprocess'
   # ./lib/dependabot/bundler/native_helpers.rb:40:in `run_bundler_subprocess'
   # ./lib/dependabot/bundler/file_updater/lockfile_updater.rb:68:in `block in build_updated_lockfile'
   # /common/lib/dependabot/shared_helpers.rb:48:in `block in in_a_temporary_directory'
   # /common/lib/dependabot/shared_helpers.rb:48:in `chdir'
   # /common/lib/dependabot/shared_helpers.rb:48:in `in_a_temporary_directory'
   # /common/lib/dependabot/shared_helpers.rb:37:in `in_a_temporary_repo_directory'
   # ./lib/dependabot/bundler/file_updater/lockfile_updater.rb:62:in `build_updated_lockfile'
   # ./lib/dependabot/bundler/file_updater/lockfile_updater.rb:46:in `updated_lockfile_content'
   # ./lib/dependabot/bundler/file_updater.rb:157:in `updated_lockfile_content'
   # ./lib/dependabot/bundler/file_updater.rb:41:in `updated_dependency_files'
   # ./spec/dependabot/bundler/file_updater_spec.rb:54:in `block (3 levels) in <top (required)>'
   # ./spec/dependabot/bundler/file_updater_spec.rb:240:in `block (4 levels) in <top (required)>'
   # ./spec/dependabot/bundler/file_updater_spec.rb:516:in `block (5 levels) in <top (required)>'
   # ./spec/spec_helper.rb:54:in `block (2 levels) in <top (required)>'
   # /common/spec/spec_helper.rb:49:in `block (2 levels) in <top (required)>'
   # /usr/local/bundle/gems/webmock-3.14.0/lib/webmock/rspec.rb:37:in `block (2 levels) in <top (required)>'
   # ------------------
   # --- Caused by: ---
   # JSON::ParserError:
   #   783: unexpected token at ''
   #   /common/lib/dependabot/shared_helpers.rb:118:in `run_helper_subprocess'

2) Dependabot::Bundler::FileUpdater#updated_dependency_files the updated lockfile when a gem has been yanked and it's another gem does not touch the yanked gem
   Failure/Error: expect(file.content).to include("business (1.4.1)")
     expected "GEM\n  remote: https://rubygems.org/\n  specs:\n    business (1.18.0)\n    statesman (1.3.1)\n\nPLAT...ORMS\n  ruby\n\nDEPENDENCIES\n  business (~> 1.4)\n  statesman (~> 1.3)\n\nBUNDLED WITH\n   2.2.0\n" to include "business (1.4.1)"
     Diff:
     @@ -1,15 +1,29 @@
     -business (1.4.1)
     +GEM
     +  remote: https://rubygems.org/
     +  specs:
     +    business (1.18.0)
     +    statesman (1.3.1)
     +
     +PLATFORMS
     +  ruby
     +
     +DEPENDENCIES
     +  business (~> 1.4)
     +  statesman (~> 1.3)
     +
     +BUNDLED WITH
     +   2.2.0
   # ./spec/dependabot/bundler/file_updater_spec.rb:335:in `block (6 levels) in <top (required)>'
   # ./spec/spec_helper.rb:54:in `block (2 levels) in <top (required)>'
   # /common/spec/spec_helper.rb:49:in `block (2 levels) in <top (required)>'
   # /usr/local/bundle/gems/webmock-3.14.0/lib/webmock/rspec.rb:37:in `block (2 levels) in <top (required)>'

3) Dependabot::Bundler::UpdateChecker::VersionResolver#latest_resolvable_version_details with a rubygems source with a Bundler v2 version specified attempting to update Bundler returns nil as resolution returns the bundler version installed by core
   Failure/Error: expect(subject).to be_nil
     expected: nil
          got: {:version=>#<Gem::Version "1.16.3">}
   # ./spec/dependabot/bundler/update_checker/version_resolver_spec.rb:151:in `block (6 levels) in <top (required)>'
   # ./spec/spec_helper.rb:54:in `block (2 levels) in <top (required)>'
   # /common/spec/spec_helper.rb:49:in `block (2 levels) in <top (required)>'
   # /usr/local/bundle/gems/webmock-3.14.0/lib/webmock/rspec.rb:37:in `block (2 levels) in <top (required)>'

Composer (1)

Failure caused by failed authentication to GitHub.com?

1) Dependabot::Composer::UpdateChecker#latest_resolvable_version with a git source dependency that's not the dependency we're checking with an alias 
    Failure/Error:
      SharedHelpers.run_helper_subprocess(
        command: "php -d memory_limit=-1 #{php_helper_path}",
        allow_unsafe_shell_command: true,
        function: "get_latest_resolvable_version",
        args: [
          Dir.pwd,
          dependency.name.downcase,
          git_credentials,
          registry_credentials
        ]
    Dependabot::SharedHelpers::HelperSubprocessFailed:
      Could not authenticate against github.com

Python (1)

Failure due to, um... Rust not being installed?

  Dependabot::Python::FileUpdater::PoetryFileUpdater#updated_dependency_files with a supported python version updates the lockfile
      Failure/Error:
        raise SharedHelpers::HelperSubprocessFailed.new(
          message: stdout,
          error_context: {
            command: command,
            time_taken: time_taken,
            process_exit_value: process.to_s
          }
        )
...

    running egg_info
    writing src/cryptography.egg-info/PKG-INFO
    writing dependency_links to src/cryptography.egg-info/dependency_links.txt
    writing requirements to src/cryptography.egg-info/requires.txt
    writing top-level names to src/cryptography.egg-info/top_level.txt
    reading manifest file 'src/cryptography.egg-info/SOURCES.txt'
    reading manifest template 'MANIFEST.in'
    no previously-included directories found matching 'docs/_build'
    warning: no previously-included files found matching 'vectors'
    warning: no previously-included files matching '*' found under directory 'vectors'
    warning: no previously-included files matching '*' found under directory '.github'
    warning: no previously-included files found matching 'release.py'
    warning: no previously-included files found matching '.coveragerc'
    warning: no previously-included files found matching 'codecov.yml'
    warning: no previously-included files found matching '.readthedocs.yml'
    warning: no previously-included files found matching 'dev-requirements.txt'
    warning: no previously-included files found matching 'tox.ini'
    warning: no previously-included files found matching 'mypy.ini'
    warning: no previously-included files matching '*' found under directory '.circleci'
    adding license file 'LICENSE'
    adding license file 'LICENSE.APACHE'
    adding license file 'LICENSE.BSD'
    adding license file 'LICENSE.PSF'
    writing manifest file 'src/cryptography.egg-info/SOURCES.txt'
    copying src/cryptography/py.typed -> build/lib.linux-x86_64-3.6/cryptography
    creating build/lib.linux-x86_64-3.6/cryptography/hazmat/bindings/_rust
    copying src/cryptography/hazmat/bindings/_rust/__init__.pyi -> build/lib.linux-x86_64-3.6/cryptography/hazmat/bindings/_rust
    copying src/cryptography/hazmat/bindings/_rust/asn1.pyi -> build/lib.linux-x86_64-3.6/cryptography/hazmat/bindings/_rust
    copying src/cryptography/hazmat/bindings/_rust/ocsp.pyi -> build/lib.linux-x86_64-3.6/cryptography/hazmat/bindings/_rust
    copying src/cryptography/hazmat/bindings/_rust/x509.pyi -> build/lib.linux-x86_64-3.6/cryptography/hazmat/bindings/_rust
    running build_ext
    running build_rust
    
        =============================DEBUG ASSISTANCE=============================
        If you are seeing a compilation error please try the following steps to
        successfully install cryptography:
        1) Upgrade to the latest pip and try again. This will fix errors for most
            users. See: https://pip.pypa.io/en/stable/installing/#upgrading-pip
        2) Read https://cryptography.io/en/latest/installation/ for specific
            instructions for your platform.
        3) Check our frequently asked questions for more information:
            https://cryptography.io/en/latest/faq/
        4) Ensure you have a recent Rust toolchain installed:
            https://cryptography.io/en/latest/installation/#rust
    
        Python: 3.6.9
        platform: Linux-5.10.104-linuxkit-x86_64-with-debian-10.5
        pip: 21.3.1
        setuptools: 59.6.0
        setuptools_rust: 1.1.2
        =============================DEBUG ASSISTANCE=============================

Update devcontainer.json

Development containers have built-in support for Dockerfile and docker-compose.yml, but not Earthfile. So we'll need to switch to using a pre-built dependabot-core-development image. I don't yet have a sense of how this would impact our development experience for Codespaces users.

Smoke Test Container

I've spent a lot of time getting tests to pass, but I don't have a good sense of what implicit requirements exist for the container in production.

Before merging, we should manually build a container image with Earthly and see how it works in staging.

Footnotes

  [1] https://github.com/dependabot/dependabot-core/actions/runs/2005806532

@@ -2,7 +2,6 @@

 require "rspec/its"
 require "webmock/rspec"
-require "debug"
mattt (Contributor Author):

The debug gem was causing issues during dependency resolution. I removed it here, because we didn't appear to be using it anywhere.

@@ -49,8 +49,7 @@ def self.run_bundler_subprocess(function:, args:, bundler_version:, options: {})
               # Bundler will pick the matching installed major version
               "BUNDLER_VERSION" => bundler_version,
               "BUNDLE_GEMFILE" => File.join(helpers_path, "Gemfile"),
-              # Prevent the GEM_HOME from being set to a folder owned by root
-              "GEM_HOME" => File.join(helpers_path, ".bundle")
+              "GEM_PATH" => File.join(helpers_path, "vendor/cache")
mattt (Contributor Author):

GEM_PATH defines where gems are loaded, whereas GEM_HOME has to do with where gems are installed by default. We set GEM_PATH here to point to a directory created by bundle package, which creates a self-contained distribution of vendored gems.

deivid-rodriguez (Contributor):

The vendor/cache folder generated by bundle package is not a proper GEM_PATH or GEM_HOME location; it's just a folder with .gem packages inside that acts as a local cache of gems for an application, so that bundle install doesn't need to use the network to fetch .gem packages.

I don't think the current setup, where GEM_HOME is set to a .bundle folder, works as expected either, because bundle config --local path .bundle does not actually install gems to .bundle, but to .bundle/ruby/2.7.0 instead (this is so that switching rubies does not reuse the installation path and mess up your native gems, which will have different byte code depending on the Ruby version).

So the current setup is essentially setting GEM_HOME to an empty folder. I think the reason things still work currently is that GEM_PATH is unchanged and still includes the default GEM_HOME, which has the proper version of Bundler installed.

To provide a self-contained distribution of gems one normally uses bundle config path, which is what the code was previously doing, but I don't think it was working as expected anyway, because:

  • bundle config --local path .bundle does not actually set gems to be installed to .bundle, but to .bundle/ruby/2.7.0 instead (this is so that switching rubies does not reuse the installation path and mess up your native gems, which will have different byte code depending on the Ruby version).
  • Even if that used a correct path, bundle install does not vendor the version of Bundler itself into the configured path. I think this is something we should improve upstream, but we're not there yet.

I created a patch that fixed the tests for me locally, but it's just a WIP and I'm not really happy with it yet. I'm working on simplifying things at #4973, but I'm sharing it here just in case:

diff --git a/bundler/Earthfile b/bundler/Earthfile
index 9c868ecc1..716943727 100644
--- a/bundler/Earthfile
+++ b/bundler/Earthfile
@@ -52,15 +52,12 @@ bundler-v1:
     COPY helpers/v1/Gemfile .
     COPY helpers/v1/run.rb .
 
-    RUN gem update --system $RUBYGEMS_SYSTEM_VERSION --no-document && \
-        gem install bundler:$BUNDLER_VERSION --no-document && \
-        bundle config --local without "test" && \
-        bundle install --jobs 4 --retry 3 && \
-        bundle package && \
-        rm -rf /var/lib/gems/*/cache/*
-
-    RUN cd vendor/cache && \
-        gem fetch bundler -v $BUNDLER_VERSION
+    RUN gem update --system $RUBYGEMS_SYSTEM_VERSION --no-document \
+     && gem install bundler:$BUNDLER_VERSION --no-document \
+     && bundle config --local path ".bundle" \
+     && bundle install --jobs 4 --retry 3 \
+     && gem install bundler:$BUNDLER_VERSION --no-document --install-dir .bundle/ruby/2.7.0 \
+     && rm -rf /var/lib/gems/*/cache/*
 
     SAVE ARTIFACT . helpers
 
@@ -74,14 +71,11 @@ bundler-v2:
 
     RUN gem update --system $RUBYGEMS_SYSTEM_VERSION --no-document \
      && gem install bundler:$BUNDLER_VERSION --no-document \
-     && bundle config --local without "test" \
+     && bundle config --local path ".bundle" \
      && bundle install --jobs 4 --retry 3 \
-     && bundle package \
+     && gem install bundler:$BUNDLER_VERSION --no-document --install-dir .bundle/ruby/2.7.0 \
      && rm -rf /var/lib/gems/*/cache/*
 
-    RUN cd vendor/cache \
-     && gem fetch bundler -v $BUNDLER_VERSION
-
     SAVE ARTIFACT . helpers
 
 deps:
diff --git a/bundler/lib/dependabot/bundler/native_helpers.rb b/bundler/lib/dependabot/bundler/native_helpers.rb
index d15f59907..b40783691 100644
--- a/bundler/lib/dependabot/bundler/native_helpers.rb
+++ b/bundler/lib/dependabot/bundler/native_helpers.rb
@@ -49,7 +49,8 @@ module Dependabot
               # Bundler will pick the matching installed major version
               "BUNDLER_VERSION" => bundler_version,
               "BUNDLE_GEMFILE" => File.join(helpers_path, "Gemfile"),
-              "GEM_PATH" => File.join(helpers_path, "vendor/cache")
+              # Prevent the GEM_HOME from being set to a folder owned by root
+              "GEM_HOME" => File.join(helpers_path, ".bundle", "ruby", "2.7.0")
             }
           )
         rescue SharedHelpers::HelperSubprocessFailed => e

mattt (Contributor Author):

@deivid-rodriguez Thanks for explaining all that. A few follow-up questions:

Is vendor a valid GEM_PATH? My understanding is that RubyGems looks for .gem files in a cache subdirectory of a gems dir. Is that sufficient? Or does it also need the sources in gems/<gem_name>/?

Do you know if there's a way to prevent Bundler from hardcoding that ruby/2.7.0 path?

Alternatively, do you think we could sidestep this issue with Traveling Ruby?

deivid-rodriguez (Contributor) commented Apr 29, 2022:

Hey @mattt, I just noticed that while trying to reorganize my previous comment I copied over some stuff and ended up repeating the same things again and again, sorry 😓.

Anyway, regarding your questions: a valid GEM_PATH location has the following structure:

ls -la /Users/deivid/.rbenv/versions/3.1.2/lib/ruby/gems/3.1.0 
total 0
drwxr-xr-x   11 deivid  staff    352 28 abr 16:53 .
drwxr-xr-x    3 deivid  staff     96 15 abr 12:41 ..
drwxr-xr-x    4 deivid  staff    128 28 abr 10:39 bin
drwxr-xr-x    2 deivid  staff     64 15 abr 12:41 build_info
drwxr-xr-x    3 deivid  staff     96 20 abr 08:42 bundler
drwxr-xr-x  502 deivid  staff  16064 28 abr 16:53 cache
drwxr-xr-x    4 deivid  staff    128 15 abr 22:40 doc
drwxr-xr-x    3 deivid  staff     96 15 abr 12:41 extensions
drwxr-xr-x  564 deivid  staff  18048 28 abr 16:53 gems
drwxr-xr-x    4 deivid  staff    128 26 abr 13:51 plugins
drwxr-xr-x  491 deivid  staff  15712 28 abr 16:53 specifications

When you configure bundle config path vendor, then bundle install will generate a similar structure at the vendor/ruby/2.7.0 folder.

My understanding is that RubyGems looks for .gem files in a cache subdirectory of a gems dir. Is that sufficient?

Correct, bundle cache provides something similar but that's supposed to be application specific, rather than global for your Ruby installation.

Or does it also need the sources in gems/<gem_name>/?

Overall, I think setting GEM_PATH like we are doing now shouldn't be necessary in general, since Dependabot does not really install any gems, only resolves versions and updates files. I think it's needed exclusively for some code related to vendoring gems that needs to download some gems, so dependabot ends up trying to write to the cache of gems in GEM_PATH, which is by default not writable. So, yeah, I think only the cache should be needed for us.

Do you know if there's a way to prevent Bundler from hardcoding that ruby/2.7.0 path?

That should be easy to do, "ruby" is RUBY_ENGINE, and "2.7.0" is the ABI version, which would be something like RUBY_VERSION.sub(/\d+\z/, "0").

Regarding "Traveling Ruby", I'm honestly not sure.

deivid-rodriguez (Contributor):

Do you know if there's a way to prevent Bundler from hardcoding that ruby/2.7.0 path?

Oh, I think I misunderstood this question. You're asking if there's a way for Bundler to install to vendor directly instead of vendor/ruby/2.7.0, right? If that's the case, no, there's no such option at the moment.

common/Earthfile Outdated
FROM ruby:2.7.1
WORKDIR /dependabot-common

CREATE_DEPENDABOT_USER:
mattt (Contributor Author) commented Apr 21, 2022:

This user-defined command is run whenever we switch to the non-root user. It creates the dependabot user and group idempotently if they don't already exist. As a result, ecosystems can be set up in any order and in any combination.
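
A minimal sketch of an idempotent UDC along these lines (the exact commands in the PR's common/Earthfile may differ):

CREATE_DEPENDABOT_USER:
    COMMAND

    # Only create the group/user if they don't already exist, so any
    # ecosystem's SETUP can call this in any order without failing.
    RUN getent group dependabot > /dev/null || groupadd dependabot
    RUN id -u dependabot > /dev/null 2>&1 || useradd --create-home --gid dependabot dependabot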

common/Earthfile Outdated
chown dependabot:dependabot $DEPENDABOT_NATIVE_HELPERS_PATH; \
fi

CONFIGURE_GIT_USER:
mattt (Contributor Author):

This user-defined command is run in test targets for ecosystems with specs that rely on a default git user being configured.
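
Roughly, such a UDC just needs to set a default identity before the specs run (the name and email values below are placeholders, not necessarily what the PR uses):

CONFIGURE_GIT_USER:
    COMMAND

    # Some specs shell out to git and fail when no identity is configured.
    RUN git config --global user.name "dependabot-ci" \
     && git config --global user.email "dependabot-ci@example.com"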

python/Earthfile Outdated
COPY .rubocop.yml .
RUN bundle exec rubocop .

lint-helpers:
mattt (Contributor Author):

This is a good example of how Earthly makes it easy to incorporate ecosystem-specific tools without creating a burdensome dependency.
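
For context, a lint-helpers target can pull in an ecosystem-specific linter inside its own container; the sketch below is purely illustrative (the actual linter and versions used for the Python helpers may differ):

lint-helpers:
    # Hypothetical: lint the Python native helpers without requiring
    # contributors to install Python or the linter locally.
    FROM python:3.9
    RUN pip install flake8
    COPY --dir helpers .
    RUN flake8 helpers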

@deivid-rodriguez (Contributor):

@mattt This looks very nice, I'm happy to give it a try and try to figure out some of the CI issues!

@mctofu (Contributor) left a comment:

Nice work so far! I'm still learning how this works but I gave it a test drive today and left a few notes.

README.md Outdated
$ bin/docker-dev-shell
=> running docker development shell
$ bin/dry-run.rb go_modules rsc/quote
$ docker run -it dependabot/dependabot-core \
mctofu (Contributor):

Just FYI, this script seems to be missing from the image?

$ docker run -it dependabot/dependabot-core bin/dry-run.rb
/bin/sh: 0: Can't open bin/dry-run.rb

Didn't see it included in the dependabot-core dir or the dev image either.

mctofu (Contributor):

Update: I included the bin directory in the image and now get this error:

$ bin/dry-run.rb
Traceback (most recent call last):
        11: from bin/dry-run.rb:69:in `<main>'
        10: from /usr/lib/ruby/gems/2.7.0/gems/bundler-2.3.12/lib/bundler.rb:151:in `setup'
         9: from /usr/lib/ruby/gems/2.7.0/gems/bundler-2.3.12/lib/bundler/runtime.rb:18:in `setup'
         8: from /usr/lib/ruby/gems/2.7.0/gems/bundler-2.3.12/lib/bundler/definition.rb:238:in `specs_for'
         7: from /usr/lib/ruby/gems/2.7.0/gems/bundler-2.3.12/lib/bundler/definition.rb:190:in `specs'
         6: from /usr/lib/ruby/gems/2.7.0/gems/bundler-2.3.12/lib/bundler/definition.rb:468:in `materialize'
         5: from /usr/lib/ruby/gems/2.7.0/gems/bundler-2.3.12/lib/bundler/definition.rb:269:in `resolve'
         4: from /usr/lib/ruby/gems/2.7.0/gems/bundler-2.3.12/lib/bundler/resolver.rb:23:in `resolve'
         3: from /usr/lib/ruby/gems/2.7.0/gems/bundler-2.3.12/lib/bundler/resolver.rb:48:in `start'
         2: from /usr/lib/ruby/gems/2.7.0/gems/bundler-2.3.12/lib/bundler/resolver.rb:252:in `verify_gemfile_dependencies_are_found!'
         1: from /usr/lib/ruby/gems/2.7.0/gems/bundler-2.3.12/lib/bundler/resolver.rb:252:in `map!'
/usr/lib/ruby/gems/2.7.0/gems/bundler-2.3.12/lib/bundler/resolver.rb:269:in `block in verify_gemfile_dependencies_are_found!': Could not find gem 'rubocop' in locally installed gems. (Bundler::GemNotFound)

There might be something from the dev Dockerfile we'd need to run?

RUN GREEN='\033[0;32m'; NC='\033[0m'; \
    for d in `find ${CODE_DIR} -type f -mindepth 2 -maxdepth 2 \
              -not -path "${CODE_DIR}/common/Gemfile" \
              -name 'Gemfile' | xargs dirname`; do \
      echo && \
      echo "---------------------------------------------------------------------------" && \
      echo "Installing gems for ${GREEN}$(realpath --relative-to=${CODE_DIR} $d)${NC}..." && \
      echo "---------------------------------------------------------------------------" && \
      cd $d && bundle install; \
    done

COPY --chown=dependabot:dependabot omnibus/Gemfile omnibus/dependabot-omnibus.gemspec ${CODE_DIR}/omnibus/
RUN cd omnibus \
  && bundle install

# Make omnibus gems available to bundler in root directory
RUN echo 'eval_gemfile File.join(File.dirname(__FILE__), "omnibus/Gemfile")' > Gemfile

mattt (Contributor Author):

@mctofu Sorry for the confusion there. I messed up my original +docker target. I got the dry-run script to work for me locally using these commands:

$ earthly --use-inline-cache +docker --development=true
$ docker run -it dependabot/dependabot-core-development bin/dry-run.rb go_modules rsc/quote

Earthfile Outdated

ENTRYPOINT ["/bin/sh"]

IF [ $development ]
mctofu (Contributor):

This isn't about this particular line but I wanted to comment on this line in the PR description:

No more bin/docker-dev-shell: Rebuilding containers is fast enough that it's no longer necessary to develop within a Docker image.

I'm probably doing something wrong but I'm not finding it fast to rebuild containers. I installed earthly in a 32 core codespace and it seems to do a lot of work each time I run earthly +docker --development=true, even if there are no changes. It's definitely faster than rebuilding from scratch but I'm finding it takes 15s to complete with no changes and a minute if I make a change to the npm readme. Does it run faster for you?

I often have a lot of learning to do when working on core and tend to make a lot of small, iterative changes to debug. If it's going to add more than an extra second or two to rebuild & try a change I think I'd prefer to keep the dev container to keep the feedback loop quick.

I also think this might hamper my current workflow of running rubocop -A in the container to fix my poorly styled code? I currently rely on the volume mounting done by the dev shell script to get the edits back into my workspace.

brrygrdn (Contributor):

No more bin/docker-dev-shell: Rebuilding containers is fast enough that it's no longer necessary to develop within a Docker image.
I currently rely on the volume mounting done by the dev shell script to get the edits back into my workspace.

I am behind the curve on reading up on Earthly adoption, apologies; I also wanted to comment on this part of the description.

We have an internal debugging workflow which effectively wraps the bin/docker-dev-shell and volume mounting to allow us to reproduce some issues that involve the production network stack.

It might be possible to just relocate the script into the network debugging tool but it is useful to have it travel with core right now as that means it gets more use/exposure than it would in the debugging kit.

mattt (Contributor Author):

@mctofu @brrygrdn Thanks for your feedback about bin/docker-dev-shell. I'll admit that I haven't relied on that myself, and don't have a good understanding of how it's being used.

I'm probably doing something wrong but I'm not finding it fast to rebuild containers. I installed earthly in a 32 core codespace and it seems to do a lot of work each time I run earthly +docker --development=true, even if there are no changes. It's definitely faster than rebuilding from scratch but I'm finding it takes 15s to complete with no changes and a minute if I make a change to the npm readme. Does it run faster for you?

On my 2017 iMac, it takes ~20 seconds to rebuild with --use-inline-cache (down from ~1 minute without):

$ time earthly --use-inline-cache +docker --development=true
# ...
________________________________________________________
Executed in   19.51 secs   fish           external 

$ time earthly +docker --development=true
# ...
________________________________________________________
Executed in   56.47 secs   fish           external 

How much faster does it run with --use-inline-cache?

As a recovering iOS developer, I have a higher tolerance for long compile times. 😅 But I totally get where you're coming from, and want to have a solution that can provide a quick feedback loop.

I often have a lot of learning to do when working on core and tend to make a lot of small, iterative changes to debug. If it's going to add more than an extra second or two to rebuild & try a change I think I'd prefer to keep the dev container to keep the feedback loop quick.

If we can't get the feedback loop fast enough with cached rebuilds, then we have a couple options:

  1. Run the dependabot-core-development image with a bind mount, as in bin/docker-dev-shell, or
  2. Get the same effect developing within Codespaces with dependabot-core-development as the base image.

If the docker run invocation is cumbersome, we could create a new Earthly target like this:

dev-shell:
  LOCALLY

  BUILD +docker --development=true

  RUN docker run --rm \
        -v "$(pwd)":/home/dependabot/dependabot-core \
        dependabot-core-development # etc.

I also think this might hamper my current workflow of running rubocop -A in the container to fix my poorly styled code? I currently rely on the volume mounting done by the dev shell script to get the edits back into my workspace.

That's a great call-out. We should be able to support that by doing something like this:

rubocop:
    FROM ruby:2.7.1

    RUN gem install rubocop --no-document

    # Use ENTRYPOINT so that arguments passed to `docker run` are appended
    # to the rubocop invocation rather than replacing it.
    ENTRYPOINT ["rubocop"]

    SAVE IMAGE rubocop

lint:
    LOCALLY

    ARG autocorrect
    ARG --required path

    WITH DOCKER --load=+rubocop
        RUN docker run --rm \
              -v "$(pwd)":/src --workdir /src \
              rubocop ${autocorrect:+-A} ${path}
    END

Then, to run the linter, you'd do:

$ earthly +lint --autocorrect=true --path=bundler

mattt (Contributor Author) commented May 24, 2022:

@mctofu What do you think about @brrygrdn's suggestion to move the dev shell out of core?

We have an internal debugging workflow which effectively wraps the bin/docker-dev-shell and volume mounting to allow us to reproduce some issues that involve the production network stack.

It might be possible to just relocate the script into the network debugging tool but it is useful to have it travel with core right now as that means it gets more use/exposure than it would in the debugging kit.

I just updated the +docker --development=true target to be closer to our original, but it still copies over sources rather than mounting local ecosystem directories as volumes. Do you think that'll work for your development flow?

Alternatively, we could continue to ship Dockerfile.development and bin/docker-dev-shell (since it can still build off a dependabot-core image built with Earthly).

Long-term, I'd love for us to formalize dry-run into a real CLI and have that be the default CMD / ENTRYPOINT for dependabot-core. Maybe we can keep some existing dev shell stuff around until it gets replaced (and decoupled from our debugging process).

mctofu (Contributor):

Unless we can get the time it takes to make a code edit & see it run to within a few seconds, I think we should keep the volume mounting support around.

I haven't been a fan of using the core image as a codespace dev container, as that gets really cumbersome when you end up needing to change the Dockerfile. I don't think it necessarily needs to be via Dockerfile.development and bin/docker-dev-shell though. If we do the volume mounts for a dev shell via earthly to get the same result, that could be sufficient?

I think it'd also be great if we could rework the dry-run script to work more independently on a per-ecosystem basis. Currently it supports all ecosystems, so if I just want to dry-run the go modules updater I still need to build/install a lot of things for the other ecosystems. The dev experience would be much better if getting into a dev shell was quick because it only required building the tooling for that ecosystem.

COPY --chown=dependabot:dependabot +helpers/helper $DEPENDABOT_NATIVE_HELPERS_PATH/go_modules/bin/helper

go:
FROM golang:$GOLANG_VERSION-buster
Contributor:

If this were a Dockerfile, we could have Dependabot keep this up to date for us.

I wonder how much effort it would be to support updating Earthfiles? Just a random thought... not something we need to work on now.


BTW earthly debug ast ./Earthfile | jq returns the AST, which can be easily interpreted. The AST format is just json and is spec'd here: https://github.com/earthly/earthly/blob/main/ast/spec/earthfile.go#L47.

A FROM looks like this, for example:

{"name":"FROM","args":["golang:1.17-alpine3.14"]}

--cache-from "$CORE_IMAGE:latest" \
--cache-from "$CORE_BRANCH_IMAGE:$BRANCH_REF" \
.
- name: Push dependabot-core-branch image to GHCR
Contributor:

We do sometimes use these branch builds for testing. I don't think it's worth keeping in here since it's a lot simpler without it! We could use a tag or branch prefix to trigger a branch build in the docker.yml if we need it.

@jurre (Member) commented Apr 22, 2022:

Nice work on this! One thing that I was wondering: if I understand things correctly, we're running the tests against ecosystem-specific containers (making them faster, small containers go brrrrr 🏎️!), but then compiling all of those into a single monolithic container that we run in production. With that setup, do we run the risk of having dependencies from one ecosystem interfere with another, and only noticing this once it lands in prod?

@deivid-rodriguez (Contributor):

I was also wondering whether multi-stage Dockerfiles were considered. It seems to me they would have the potential to make things faster and thinner, and would deviate less from the current setup (which is both bad, since they probably still have some of the downsides fixed here, but also good, since it would probably be a simpler, lower-friction migration).

@mattt (Contributor Author) commented Apr 28, 2022:

Nice work on this! One thing that I was wondering: if I understand things correctly, we're running the tests against ecosystem-specific containers (making them faster, small containers go brrrrr 🏎️!), but then compiling all of those into a single monolithic container that we run in production. With that setup, do we run the risk of having dependencies from one ecosystem interfere with another, and only noticing this once it lands in prod?

@jurre Yes, there's a risk that the sum of each part is different from the whole. And we'll definitely want to test that. What I'm hoping to do here is to create a fast happy path for changes covered by unit tests, and use integration / end-to-end / smoke tests to catch holistic problems.

I was also wondering whether multi-stage Dockerfiles were considered. It seems to me they would have the potential to make things faster and thinner, and would deviate less from the current setup (which is both bad, since they probably still have some of the downsides fixed here, but also good, since it would probably be a simpler, lower-friction migration).

@deivid-rodriguez Multi-stage builds primarily help with image size, since they let you isolate the context in which artifacts are built from the image you ship. But the current Dockerfile is actually quite lean — running docker-slim and dive, I couldn't find much that could be removed. Dependabot relies on the language runtimes that other systems might strip out for production images.

However, they can't help as much with speeding things up. With multi-stage builds, you can stop at a particular target, but I don't know of a way to avoid building everything up to that point.

@deivid-rodriguez (Contributor) commented Apr 29, 2022:

@mattt To be honest, I never worked with multi-stage Docker builds before, but my initial understanding when first reading about them was that, as opposed to standard Docker layers, "stages" are independent of each other, and in particular independent of the order in which they are defined. So a change in any of the stages only triggers changes in stages dependent on it.

So, take the following Dockerfile for example:

FROM ruby:2.7 as ruby
RUN gem install bundler:2.3.11

FROM node:16 as javascript
RUN npm install -g npm@v8.5.1

FROM ubuntu:20.04 as main
COPY --from=ruby /usr/local/bin/bundle /usr/local/bin/bundle
COPY --from=javascript /usr/local/bin/npm /usr/local/bin/npm

My understanding is that changing the second line to instead install bundler 2.3.12 will only trigger a rebuild of the ruby stage and the main stage, while the javascript stage can stay cached. With a standard Dockerfile, however, as long as ruby steps are defined before javascript steps, any change in ruby would trigger a rebuild of every layer after that.

@mattt (Contributor Author) commented May 2, 2022:

My understanding is that changing the second line to instead install bundler 2.3.12 will only trigger a rebuild of the ruby stage and the main stage, while the javascript stage can stay cached. With a standard Dockerfile, however, as long as ruby steps are defined before javascript steps, any change in ruby would trigger a rebuild of every layer after that.

@deivid-rodriguez Sorry, I skipped right over the issue of build time and went straight to size for some reason. 😅 But yeah, that's also my understanding of how things work. Unfortunately, some of the stages (including bundler, composer, and npm_and_yarn) involve installing system libraries, so this may not be a good fit. Unless we're confident about what's a build-time dependency and what's a run-time dependency, a multi-stage build would require a lot of copy-paste duplication, which is liable to get out of sync.

@mattt (Contributor Author) commented Jun 21, 2022:

Closing for now, because the main branch of my fork is being used for other work.

@mattt closed this Jun 21, 2022
@jeffwidman (Member) commented Aug 31, 2022:

It's sad that the diff here was completely lost... I realize we paused this work for now, but any chance you have it sitting in a feature branch and could open a different PR so that the diff is available for anyone exploring this in the future? For example I came here from #946 (comment) as I was curious to see how it was wired up.

@mattt (Contributor Author) commented Sep 23, 2022:

@jeffwidman I can't reopen the PR to change the base branch, but all of that code can be found here: https://github.com/mattt/dependabot-core/tree/earthly
