Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Storing module reference in module causes resolve to stack overflow (500USD Bounty) #3715

Closed
KireinaHoro opened this issue Oct 10, 2024 · 15 comments
Labels
Milestone

Comments

@KireinaHoro
Copy link

KireinaHoro commented Oct 10, 2024


From the maintainer Li Haoyi: I'm putting a 500USD bounty on this issue, payable by bank transfer on a merged PR implementing this.

The task is to appropriately detect module cycles during task resolution time, report a nice user-friendly error, and direct the user towards use of ModuleRef to fix the issue. This should be covered by unit tests, and we should verify that the module cycles in the code sample below gets properly recognized and handled.


I'm trying to work out a pattern that would allow the consumer of a foreign module swap out dependencies of modules. Given the following module definition:

import mill._, scalalib._

trait CommonModule extends ScalaModule {
  def scalaVersion = "2.13.12"
  def m1 = mym1
  def m2 = mym2
}

object mym1 extends M1
trait M1 extends CommonModule

object mym2 extends M2
trait M2 extends CommonModule {
  override def moduleDeps = super.moduleDeps ++ Agg(m1)
}

One can then define another build.sc, import the above, instantiate new M1 and M2 (for example with new Scala versions), and then have the newly instantiated M2 depend on M1:

import mill._, scalalib._
import $file.dep.build

trait MyModules {
  def m1 = newm1
  def m2 = newm2
}
object newm1 extends dep.build.M1 with MyModules
object newm2 extends dep.build.M2 with MyModules

However this breaks resolve with a stack overflow error:

$ mill resolve __
No mill version specified.
You should provide a version via '.mill-version' file or --mill-version option.
Using mill version 0.11.12
[1/1] resolve 
Exception in thread "MillServerActionRunner" java.lang.StackOverflowError
        at java.base/java.security.AccessController.doPrivileged(Native Method)
        at java.base/java.lang.Class.getClasses(Class.java:1752)
        at mill.define.Reflect$.reflectNestedObjects0(Reflect.scala:77)
        at mill.resolve.ResolveCore$.resolveDirectChildren0(ResolveCore.scala:349)
        at mill.resolve.ResolveCore$.$anonfun$resolveDirectChildren$3(ResolveCore.scala:300)
        at scala.util.Either.flatMap(Either.scala:360)
        at mill.resolve.ResolveCore$.resolveDirectChildren(ResolveCore.scala:295)
        at mill.resolve.ResolveCore$.resolveTransitiveChildren(ResolveCore.scala:223)
        at mill.resolve.ResolveCore$$anonfun$1.applyOrElse(ResolveCore.scala:229)
        at mill.resolve.ResolveCore$$anonfun$1.applyOrElse(ResolveCore.scala:228)
        at scala.collection.StrictOptimizedIterableOps.collect(StrictOptimizedIterableOps.scala:151)
        at scala.collection.StrictOptimizedIterableOps.collect$(StrictOptimizedIterableOps.scala:136)
        at scala.collection.immutable.HashSet.collect(HashSet.scala:34)
        at mill.resolve.ResolveCore$.$anonfun$resolveTransitiveChildren$2(ResolveCore.scala:228)
        at scala.util.Either.map(Either.scala:390)
        at mill.resolve.ResolveCore$.$anonfun$resolveTransitiveChildren$1(ResolveCore.scala:226)
        at scala.util.Either.flatMap(Either.scala:360)
        at mill.resolve.ResolveCore$.resolveTransitiveChildren(ResolveCore.scala:224)
        at mill.resolve.ResolveCore$$anonfun$1.applyOrElse(ResolveCore.scala:229)
        at mill.resolve.ResolveCore$$anonfun$1.applyOrElse(ResolveCore.scala:228)
        at scala.collection.StrictOptimizedIterableOps.collect(StrictOptimizedIterableOps.scala:151)
        at scala.collection.StrictOptimizedIterableOps.collect$(StrictOptimizedIterableOps.scala:136)
        at scala.collection.immutable.HashSet.collect(HashSet.scala:34)
        at mill.resolve.ResolveCore$.$anonfun$resolveTransitiveChildren$2(ResolveCore.scala:228)
        at scala.util.Either.map(Either.scala:390)
        at mill.resolve.ResolveCore$.$anonfun$resolveTransitiveChildren$1(ResolveCore.scala:226)
        at scala.util.Either.flatMap(Either.scala:360)
        at mill.resolve.ResolveCore$.resolveTransitiveChildren(ResolveCore.scala:224)
        at mill.resolve.ResolveCore$$anonfun$1.applyOrElse(ResolveCore.scala:229)
        at mill.resolve.ResolveCore$$anonfun$1.applyOrElse(ResolveCore.scala:228)
        at scala.collection.StrictOptimizedIterableOps.collect(StrictOptimizedIterableOps.scala:151)
        at scala.collection.StrictOptimizedIterableOps.collect$(StrictOptimizedIterableOps.scala:136)
        at scala.collection.immutable.HashSet.collect(HashSet.scala:34)
        at mill.resolve.ResolveCore$.$anonfun$resolveTransitiveChildren$2(ResolveCore.scala:228)
        at scala.util.Either.map(Either.scala:390)
        at mill.resolve.ResolveCore$.$anonfun$resolveTransitiveChildren$1(ResolveCore.scala:226)
        at scala.util.Either.flatMap(Either.scala:360)
        at mill.resolve.ResolveCore$.resolveTransitiveChildren(ResolveCore.scala:224)
        at mill.resolve.ResolveCore$$anonfun$1.applyOrElse(ResolveCore.scala:229)
        at mill.resolve.ResolveCore$$anonfun$1.applyOrElse(ResolveCore.scala:228)
        at scala.collection.StrictOptimizedIterableOps.collect(StrictOptimizedIterableOps.scala:151)
        at scala.collection.StrictOptimizedIterableOps.collect$(StrictOptimizedIterableOps.scala:136)
        at scala.collection.immutable.HashSet.collect(HashSet.scala:34)
        at mill.resolve.ResolveCore$.$anonfun$resolveTransitiveChildren$2(ResolveCore.scala:228)
        at scala.util.Either.map(Either.scala:390)
[...]

After experimenting around a bit, it seems like the def m1 and def m2 module references are somehow picked up as submodules and thus resulted in an infinite loop. Wrapping the module references in Some(..) works just ok, however it's quite cumbersome to write and feels hacky. Can this be fixed somehow?

@lihaoyi
Copy link
Member

lihaoyi commented Oct 10, 2024

Thanks for the report. This is expected but not ideal. The way to work around this is to wrap the reference in a ModuleRef, and we should try to detect these infinite recursions and provide a proper error message suggesting that

@lefou
Copy link
Member

lefou commented Oct 12, 2024

The proper way to refer to already existing modules is to use ModuleRef.

import mill._, scalalib._

trait CommonModule extends ScalaModule {
  def scalaVersion = "2.13.12"
  def m1 = ModuleRef(mym1)
  def m2 = ModuleRef(mym2)
}

object mym1 extends M1
trait M1 extends CommonModule

object mym2 extends M2
trait M2 extends CommonModule {
  override def moduleDeps = super.moduleDeps ++ Agg(m1())
}

@lihaoyi We already try to detect cycles in moduleDeps. Also detecting them for module references looks like the right thing to do. I assume, the resolver could maintain just a list of already seen/referenced modules and report when it detect a second reference.

@lihaoyi lihaoyi added the bounty label Oct 16, 2024
@lihaoyi
Copy link
Member

lihaoyi commented Oct 16, 2024

Added a 500USD bounty if anyone wants to dig into this

@lihaoyi lihaoyi changed the title Storing module reference in module causes resolve to stack overflow Storing module reference in module causes resolve to stack overflow (500USD Bounty) Oct 16, 2024
@myyk
Copy link
Contributor

myyk commented Oct 28, 2024

I'm taking a crack at this.

myyk added a commit to myyk/mill that referenced this issue Oct 29, 2024
myyk added a commit to myyk/mill that referenced this issue Oct 29, 2024
myyk added a commit to myyk/mill that referenced this issue Oct 29, 2024
@myyk
Copy link
Contributor

myyk commented Oct 29, 2024

I wrote the starts of two rudimentary tests. Although, I haven't made it much further. I was hoping to build something like ModuleUtils does but there's a pretty big fundamental difference. For ModuleUtils's recursive method, there's an equality check that can be used between the dependencies to see if it's been seen before or not.

I could be wrong, but I think as ResolveCore is walking the graph, it's creating new objects through reflection. Which raises a question I don't know how to answer:

How can I check if two of these modules are the same and not just named the same and the same class?

@lefou
Copy link
Member

lefou commented Oct 29, 2024

@myyk This is exactly the point where I stopped to investigate. Since this is reflection land, we can't know for sure whether the instance is meant to be a singleton.

So, best is to always stick to the rule to use a ModuleRef whenever we want to keep a reference to another modules and only hold it directly, if it is meant to be a real child object.

@lefou
Copy link
Member

lefou commented Oct 29, 2024

We could try to look for patterns though. E.g. whenever we instantiate a module, we could look into the chain of parents. E.g. if we find the same types in a loop, multiple times (e.g. 3x), we should at least print a warning. So when we later run into a StackOverflowError, it would be the last message Mill printed before and should be as good as a good exception message.

@lihaoyi
Copy link
Member

lihaoyi commented Oct 29, 2024

I think during module resolution, if the recursion "stack" contains the same module class twice, that's probably a good enough signal that we should error. Note that most modules are singleton objects, so they have a more specific class than JavaModule, and basically all of them should be unique subclasses

Like @lefou suggests, we can probably say that all references to modules except their "primary" definition should be via ModuleRef, and therefore if during recursion we bump into the same exact class twice that should be grounds for raising an error

@lihaoyi
Copy link
Member

lihaoyi commented Oct 29, 2024

If I'm not mistaken, we also do materialize the concrete module objects as well at some point after traversing their classes/method-names. But this only happens after the stackoverflow has already occurred, and so it would not be useful for us to report errors at this stage

@lefou
Copy link
Member

lefou commented Oct 29, 2024

We do have objects in trait sometimes, which are valid child modules. So we need to be careful with errorring out just on heuristics. I know we had it in ScoverageModule (inner scoverage data holder module) but changed to lazy val to be overrideable, but I seen it now and then in Mill and external projects. Bloop had some implicit inner object of the same type, and so on.

@myyk
Copy link
Contributor

myyk commented Oct 29, 2024

We do have objects in trait sometimes, which are valid child modules. So we need to be careful with errorring out just on heuristics. I know we had it in ScoverageModule but changed to lazy val to be overrideable, but I seen it now and then in Mill and external projects.

Shouldn't those traits be wrapping those objects in ModuleRef?

@lefou
Copy link
Member

lefou commented Oct 29, 2024

We do have objects in trait sometimes, which are valid child modules. So we need to be careful with errorring out just on heuristics. I know we had it in ScoverageModule but changed to lazy val to be overrideable, but I seen it now and then in Mill and external projects.

Shouldn't those traits be wrapping those objects in ModuleRef?

Not if they meant to be concrete sub modules. object in traits are only singletons inside that scope. (Can't tell if they count as path dependent types, but kind of.)

@lihaoyi
Copy link
Member

lihaoyi commented Oct 29, 2024

We do have objects in trait sometimes, which are valid child modules. So we need to be careful with errorring out just on heuristics. I know we had it in ScoverageModule (inner scoverage data holder module) but changed to lazy val to be overrideable, but I seen it now and then in Mill and external projects. Bloop had some implicit inner object of the same type, and so on.

Ah good point, that makes it a bit trickier since they will have the exact same class even if they are different instances.

But even in that case, if the exact same class appears twice in the module stack during resolution, that is an indication that there is a loop in the resolution module graph, which is opens up the possibility of a stackoverflow. I think we should be able to safely disallow that and request that people use ModuleRef for such cases if they need references that do not convey a parent-child relationship

myyk added a commit to myyk/mill that referenced this issue Oct 30, 2024
@myyk
Copy link
Contributor

myyk commented Oct 30, 2024

I've caught some of the cyclic graphs but I'm not sure if my changes catches all of them. I added these test graphs:

  object CyclicModuleRefInitError extends TestBaseModule {
    import mill.Agg

    // See issue: https://github.com/com-lihaoyi/mill/issues/3715
    trait CommonModule extends TestBaseModule {
      def moduleDeps: Seq[CommonModule] = Seq.empty
      def a = myA
      def b = myB
    }

    object myA extends A
    trait A extends CommonModule
    object myB extends B
    trait B extends CommonModule {
      override def moduleDeps = super.moduleDeps ++ Agg(a)
    }
  }

  object CyclicModuleRefInitError2 extends TestBaseModule {
    // The cycle is in the child
    def A = CyclicModuleRefInitError
  }

  object CyclicModuleRefInitError3 extends TestBaseModule {
    // The cycle is in directly here
    object A extends Module {
      def b = B
    }
    object B extends Module {
      def a = A
    }
  }

  object CrossedCyclicModuleRefInitError extends TestBaseModule {
    object cross extends mill.Cross[Cross]("210", "211", "212")
    trait Cross extends Cross.Module[String] {
      def suffix = Task { crossValue }
      def c2 = cross2
    }

    object cross2 extends mill.Cross[Cross2]("210", "211", "212")
    trait Cross2 extends Cross.Module[String] {
      override def millSourcePath = super.millSourcePath / crossValue
      def suffix = Task { crossValue }
      def c1 = cross
    }
  }

Can any of you think of any more test cases with cycles you'd like to see covered?

@lihaoyi
Copy link
Member

lihaoyi commented Oct 30, 2024

@myyk I think that looks great. You covered the three major cases I can think of: "vanilla" circular references, circular references through inherited traits, and circular references through Cross[T] modules

lihaoyi added a commit that referenced this issue Nov 1, 2024
…#3878)

This completes #3715 with all the cycle cases I could think of.

It also touches `resolveDirectChildren` where I made some changes that I
would help with readability. I thought I was going to need to build on
those but didn't. These can be reverted if it's unwanted.

---------

Co-authored-by: Li Haoyi <haoyi.sg@gmail.com>
@lihaoyi lihaoyi closed this as completed Nov 1, 2024
@lefou lefou added this to the 0.12.2 milestone Nov 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants