Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OSD Optimizer: Improve strategy for detecting stale compilation targets #8428

Closed
Swiddis opened this issue Oct 1, 2024 · 4 comments · Fixed by #8472
Closed

OSD Optimizer: Improve strategy for detecting stale compilation targets #8428

Swiddis opened this issue Oct 1, 2024 · 4 comments · Fixed by #8472
Labels
enhancement New feature or request

Comments

@Swiddis
Copy link
Contributor

Swiddis commented Oct 1, 2024

Is your feature request related to a problem? Please describe.

Coming from opensearch-project/dashboards-observability#2188, I'm hoping to implement caching for the compilation step of OSD, because this takes more than 10 minutes per CI run. Across all of our integ test workflows, that's multiple hours of compute time per code change.

I found that osd-optimizer determines whether to build files based on the last-modified time (mtime), instead of being based on the actual content of the file. This means that the optimizer can never work in CI, because pulling code will always update the mtime.

Describe the solution you'd like

In .osd-optimizer-cache, instead of the cacheKey depending on mtimes

{
  // ...
  "cacheKey": {
    // ...
    "mtimes": {
      ".../node_modules/@elastic/apm-rum-core/node_modules/error-stack-parser/package.json": 1689369687625,
      ".../node_modules/@elastic/apm-rum-core/node_modules/stackframe/package.json": 1689369687854,
      ".../node_modules/@elastic/apm-rum-core/package.json": 1689369687355,
      ".../node_modules/@elastic/apm-rum/package.json": 1689369687118,
      ".../node_modules/@elastic/charts/dist/theme.scss": 1727807816498.0383,
      ".../node_modules/@elastic/eui/src/global_styling/functions/_colors.scss": 1725572246345.999,
      ".../node_modules/@elastic/eui/src/global_styling/functions/_index.scss": 1725572246345.999,
      ".../node_modules/@elastic/eui/src/global_styling/functions/_math.scss": 1725572246345.999,
      // ...
    }
  },
  // ...
}

Base it on a hash of these files' content (e.g. sha256, blake3, xxhash, siphash...)

{
  // ...
  "cacheKey": {
    // ...
    "file_hashes": {
      ".../node_modules/@elastic/apm-rum-core/node_modules/error-stack-parser/package.json": "f90bcab5e738adcc6cbfe48280633df341a36bfb7a0344903e9122023c86f9b2",
      ".../node_modules/@elastic/apm-rum-core/node_modules/stackframe/package.json": "fd31383a3ec16fbc09be017856b4438c9b8c33f134cf6be25979953d56f5ec73",
      ".../node_modules/@elastic/apm-rum-core/package.json": "6b1cd6ad20a0b7f351dd838b6365b2fcd9cfa8c4185c9e61fd4dccfef2034ae6",
      ".../node_modules/@elastic/apm-rum/package.json": "3135c78159ac6dcf0c00f7b0c21a6c85b4b4e67805f5cf7f766bff10b63f9eb7",
      ".../node_modules/@elastic/charts/dist/theme.scss": "116e43edbab9d1b81e47ad1391b7b159ba7e9546e4d6462a606fdd8057fb6422",
      ".../node_modules/@elastic/eui/src/global_styling/functions/_colors.scss": "22739aa86cde21eac0c9685e61b4c94d986fcb69193c6f8f28d1ebafcae975f1",
      ".../node_modules/@elastic/eui/src/global_styling/functions/_index.scss": "e825b6dccb6a688ca0109c52a632beb90e365bf4463fef1a02a6cfedd438078f",
      ".../node_modules/@elastic/eui/src/global_styling/functions/_math.scss": "866ad28b44e11c7d3c08e1ef65cf5ce0bc3dc35a8bbd3410e9a4baa6b79ea905",
      // ...
    }
  },
  // ...
}

This makes it possible to cache build artifacts even when moving files between machines. Instead of actual_mtime > cache_mtime, check actual_hash != cache_hash.

Describe alternatives you've considered

N/A

Additional context

N/A

@Swiddis
Copy link
Contributor Author

Swiddis commented Oct 2, 2024

Experimented with this more on the downstream issue, see opensearch-project/dashboards-observability#2188 (comment). I experimentally verified that there's no build time savings at all when using the current cache system.

osd_build

@Swiddis Swiddis changed the title OSD Optimizer: Cache on file content hash instead of last-modified-time OSD Optimizer: Improve caching strategy for detecting stale files Oct 2, 2024
@Swiddis Swiddis changed the title OSD Optimizer: Improve caching strategy for detecting stale files OSD Optimizer: Improve strategy for detecting stale compilation targets Oct 2, 2024
@Swiddis
Copy link
Contributor Author

Swiddis commented Oct 2, 2024

Proof of concept that works locally, I replaced the getMtimes method in-place to compute hashes instead of the actual mtimes: Swiddis@1e2529c.

Only issue is that it'd never get through PR review to keep the old naming of everything. Also need to figure out how to test it in the real build.

@Swiddis
Copy link
Contributor Author

Swiddis commented Oct 2, 2024

Set up some job runs connected to the POC:

It works as expected. (That's not a vertical line -- it takes 0.4 seconds to hash all of the source files)
osd_build_2

@Swiddis
Copy link
Contributor Author

Swiddis commented Oct 3, 2024

To avoid all the renaming work, we can just say MTIME stands for "Modern Tracking of Incremental Modification Events"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant