Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Default value of smithy4sOutputDir can break codegen cache #1495

Closed
majk-p opened this issue Apr 19, 2024 · 5 comments · Fixed by #1499
Closed

Default value of smithy4sOutputDir can break codegen cache #1495

majk-p opened this issue Apr 19, 2024 · 5 comments · Fixed by #1499

Comments

@majk-p
Copy link
Contributor

majk-p commented Apr 19, 2024

TL;DR

Can we please change:

config / smithy4sOutputDir := (config / sourceManaged).value / "scala"

to something like:

config / smithy4sOutputDir := (config / sourceManaged).value / "scala" / "smithy4s"

Or exclude codegenArgs.output from cache altogether?

Problem

I'm working on an sbt plugin that uses smithy4s, but also generates some of it's own managed sources. Smithy4s produces files to smithy4sOutputDir which by default is defined as smithy4sOutputDir := (config / sourceManaged).value / "scala". This value is then later used by Smithy4sCodegenPlugin as a part of codegenArgs used to produce files. The entire CodegenArgs object is used as a cache key and thus the output counts in. That wouldn't be a problem because the path would just remain cached as a String, but smithy4s also overrides the json format for os.Path in a way that calculates the hash of the entire directory content. It isn't a problem as long as smithy4s is the only source generator. When another plugin adds managed sources the hashes change and we stop hitting cache. In some cases this might lead to repeated very long compilation times.

Perhaps it's worth reconsidering if smithy4s codegen cache should rely on the contents of source_managed and resource_managed at all, since it seems easy to invalidate the cache.

It's also not obvious to the end user why the cache got invalidated. In my debugging case I would be very happy if the plugin told me why it decided to rerun codegen even though smithy sources remained unchanged.

Solution

I think adding another segment in the path should solve this and save fellow sbt plugin authors some headache. This way other plugins can either use the root sourceManaged / scala directory or anything different than sourceManaged / scala / smithy4s.

The other option is to modify JsonConverters.codegenArgsIso in a way that it doesn't serialize output.

@kubukoz
Copy link
Member

kubukoz commented Apr 19, 2024

I'm thinking to do both:

  • nest the generated sources under an extra directory so that it's clearly separated from other codegens
  • remove the output's contents from the input hash, that feels like something we got "for free" unintentionally

@Baccata
Copy link
Contributor

Baccata commented Apr 22, 2024

I agree with Jakub.

@majk-p
Copy link
Contributor Author

majk-p commented Apr 22, 2024

Sounds good, I can raise a PR to address this

@majk-p
Copy link
Contributor Author

majk-p commented Apr 22, 2024

Working on my initial problem I've noticed an additional issue around pathFormat

implicit val pathFormat: JsonFormat[os.Path] =
  BasicJsonProtocol.projectFormat[os.Path, HashFileInfo](
    p => {
      if (os.isFile(p)) FileInfo.hash(p.toIO)
      else
        // If the path is a directory, we get the hashes of all files
        // then hash the concatenation of the hash's bytes.
        FileInfo.hash(
          p.toIO,
          Hash(
            os.walk(p)
              .map(_.toIO)
              .map(Hash(_))
              .foldLeft(Array.emptyByteArray)(_ ++ _)
          )
        )
    },
    hash => os.Path(hash.file)
  )

It assumes all paths will exist, which is not always the case. In one of my sub-modules I noticed the resource_managed was not present, which silently causes the cache miss and code regeneration. Since we're about to exclude hashes of the outputs (both sources and resources) the issue should disappear for now, but this may require some more thought in future.

@kubukoz
Copy link
Member

kubukoz commented Apr 22, 2024

The paths are checked for existence first:

val inputFiles =
  (inputDirs ++ generatedFiles)
    .filter(_.exists())
    .toList

but only for the inputs 😅

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants