Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change startup script name generation #1020

Merged
merged 12 commits into from
Jan 8, 2018
Merged

Change startup script name generation #1020

merged 12 commits into from
Jan 8, 2018

Conversation

atrosinenko
Copy link
Contributor

This PR tries to lower the chance of generation of multiple scripts
with equal names in one archive. Fixes #1016.

Additionally, an attempt was made to improve the word splitting
for the script names, so, for example UITest becomes ui-test
and not u-i-test.

I had to edit ForwarderScripts.apply(...) as well but I'm not sure I did it the right way.

@lightbend-cla-validator

Hi @atrosinenko,

Thank you for your contribution! We really value the time you've taken to put this together.

Before we proceed with reviewing this pull request, please sign the Lightbend Contributors License Agreement:

http://www.lightbend.com/contribute/cla

(qualifiedClassName, lowerCased.replace('.', '-'), lowerCased.split("\\.").last)
}

names.groupBy(_._3).toSeq.flatMap {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the preferred way to rephrase this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue that _._3 is hard to read as this parameter has no name.
A better way would be

names.groupBy {
  case (_, _, foo) => foo
}.toSeq.flatMap {
 ...
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I didn't thought about it. It really looks more readable.

@atrosinenko
Copy link
Contributor Author

Strange, at least one of Travis failures looks like some misconfiguration.

@muuki88 muuki88 added rpm universal Zip, tar.gz, tgz and bash issues and removed rpm labels Aug 29, 2017
Copy link
Contributor

@muuki88 muuki88 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your well crafted PR ( refactoring logic and unit tests ).

I think the logic is way to complex. You already have an amazingly simple idea in your code 😄
Instead of applying heuristics to hope for the best, we can actually be certain what name to use.

  1. Create a tuple of (simpe name, fully qualified name)
  2. Group by the simple name -> Map[simple name, List[fully qualified name]
  3. Partition `(fully qualified names length == 1, fully qualified names length > 1)
  4. Extend simple name and start from step 1 with the fully qualified names length > 1 partition

(qualifiedClassName, lowerCased.replace('.', '-'), lowerCased.split("\\.").last)
}

names.groupBy(_._3).toSeq.flatMap {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue that _._3 is hard to read as this parameter has no name.
A better way would be

names.groupBy {
  case (_, _, foo) => foo
}.toSeq.flatMap {
 ...
}

@atrosinenko
Copy link
Contributor Author

If you propose to extend the simpleNames just for the classes that generate conflicting script names without this, then it may introduce additional conflicts, something like

pkg1.Pkg1Test -> pkg-1-test
pkg1.Test     -> test -> pkg-1-test
pkg2.Test     -> test -> pkg-2-test

So it may be not so simple. On the other hand, when these heuristics fail, it does not render resulting zip completely unusable (though, it may became unusable in fully automatic setups). So there may be another option to drop the most nested numbering logic completely (not very good, does the other means lower the chance of failure to some negligible level?). Or apply some iterative algorithm.

Another problem, if we decide to iteratively extend the simpleName to not make it very long: consider the following classes:

com.company.feature1.core.MainClass
com.company.feature2.ui.MainClass

If we just extend it from right to left, then we probably end up with meaningless core-main-class and ui-main-class. In fact, we would prefer feature-1-main-class and feature-2-main-class. Another option may be to strip equal prefix from the left, then extend it as main-class -> feature-1-main-class -> feature-1-core-main-class.

@muuki88
Copy link
Contributor

muuki88 commented Aug 30, 2017

If you propose to extend the simpleNames just for the classes that generate conflicting script names without this, then it may introduce additional conflicts.

This is true if we change the class name. In your example the camel cased name Pkg1Test is transformed to pkg1-test, which we shouldn't do. We should only transform . (separating packages) to - in the script name. The compiler then ensures that if we expand from right to left in the end every bash script is unique.

Another problem, if we decide to iteratively extend the simpleName to not make it very long

I'm willing to choose this trade off. I consider this rather an edge case as this could be fixed on an application level by renaming classes or moving it to another package. I know naming is one of the harder problems in software engineering, but users of your scripts will be happy to have meaningful script names instead of weird package names ;)

I first thought of adding a "resolve conflict strategy" as sbt-assembly does it, but I think it's an overkill as we need to document and maintain it.

Long story short: A recursive algorithm extending only the conflicting classes until all conflicts are resolved seems the most stable and simplest solution :)

@atrosinenko
Copy link
Contributor Author

So should the lower casing algorithm be completely dropped? Previously, class names were splitted into words separated with - (but this failed with abbreviations), if I got it right.

@muuki88
Copy link
Contributor

muuki88 commented Aug 30, 2017

Either this or we replace the dots with underscored. This would keep backwards compatibility and wouldn't generate toooo ugly script names.

@lightbend-cla-validator
Copy link

Hi @atrosinenko,

Thank you for your contribution! We really value the time you've taken to put this together.

Before we proceed with reviewing this pull request, please sign the Lightbend Contributors License Agreement:

http://www.lightbend.com/contribute/cla

@atrosinenko
Copy link
Contributor Author

Please excuse me for overcomplicating everything once again, but here is another try: now heuristics are even worse :( but they should give more meaningful results (see tests). Based on your review I have to either try to simplify it to some informally provable form or drop it in favor of simpler prefix-based strategy.

@sbt sbt deleted a comment Sep 4, 2017
@sbt sbt deleted a comment Sep 4, 2017
@muuki88
Copy link
Contributor

muuki88 commented Sep 4, 2017

Thanks for your time and dedication @atrosinenko :)
The results make sense (stripping common packages and keep script names short).

Still I think this in an edge case and we should rather add the ability to configure script names then a complicated algorithm that can't be explain in two sentences. Maintaining and explaining this will cost more time I fear ;)

What do you thing of this implementation?

  private case class MainClass(scriptName: String, fullyQualifiedClassName: String) {
    val parts: Array[String] = toLowerCase(fullyQualifiedClassName).split("\\.")
    def asTuple: (String, String) = (fullyQualifiedClassName, scriptName)
  }

  def getMainClasses(discoveredMainClasses: Seq[String]): Seq[(String, String)] = {
    val mainClasses = discoveredMainClasses.map { qualifiedClassName =>
      val lowerCased = toLowerCase(qualifiedClassName)
      val parts = lowerCased.split("\\.")
      MainClass(parts.last, qualifiedClassName)
    }
    disambiguateNames(mainClasses, 1).map(_.asTuple)
  }

  private def disambiguateNames(mainClasses: Seq[MainClass], expansionIndex: Int): Seq[MainClass] =
    if (mainClasses.isEmpty) {
      // end of recursion
      mainClasses
    } else {
      // find duplicates
      val (duplicates, uniques) = mainClasses
        .groupBy(_.scriptName)
        .partition {
          case (_, classes) => classes.length > 1
        }

      // expand names for duplicated script names
      val expandedScriptNames = for {
        (_, classes) <- duplicates
        extendedClasses <- classes.map(expandClassName(expansionIndex))
      } yield extendedClasses

      uniques.toSeq.flatMap(_._2) ++ disambiguateNames(expandedScriptNames.toSeq, expansionIndex + 1)
    }

  private def expandClassName(expansionIndex: Int)(mainClass: MainClass): MainClass = {
    val scriptName = mainClass.parts.takeRight(expansionIndex + 1).mkString("_")
    mainClass.copy(scriptName = scriptName)
  }

@atrosinenko
Copy link
Contributor Author

Please excuse me for the delay and thank you for code advice. I have rewritten disambiguation logic once again, now heavily based on your proposal. Now it looks much simpler than from the second try. I decided to expand from the left after stripping common prefix instead of expanding from the right -- looks like it can give more meaningful distinction by features instead of some technical innermost packages (but I don't insist it is the most correct approach).

I had to manually rebase previous commits as well, so it may be worth to minimally glance on them.

@atrosinenko
Copy link
Contributor Author

Oops, fixed the corner case of single main class (the common prefix included the class name itself) -- that didn't caught by the tests on Linux.
PS: The CLA signing bot suggested me to notify the maintainer that CLA is signed for already opened PRs. :)

@muuki88
Copy link
Contributor

muuki88 commented Sep 17, 2017

Thanks @atrosinenko I'll try to take a look this week. I'll be away for 4 weeks on vacation then and will merge it after that :)

@muuki88
Copy link
Contributor

muuki88 commented Sep 21, 2017

I'll be on vacation un 20.10. Thanks for your understanding if this can't be merged until then

Copy link
Contributor

@muuki88 muuki88 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all your patience and passion you put into this. Once again I'm requesting more simplification 😉

The main reasons is that the behaviour is non-obvious for the user. As a way forward I would suggest that we simply keep the disambiguateNames method and remove all the prettifying stuff. The next step would be a pull request (if you have time and motivation) and add a setting like

executableScriptNames: SettingKey[Map[String, String]]

which provides a way to customize the generated script names.

*/
def toLowerCase(qualifiedClassName: String): String = {
// suppose list is not very huge, so no need in tail recursion
def split(chars: List[Char]): List[Char] = chars match {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think this is too much for an edge case. Prettifying should be done by the user.
We could do that in a second pull request where we provide a setting that let's the user
configure executable script names for each main class.


private[this] def disambiguateNames(
mainClasses: Seq[MainClass],
expansionIndex: Int
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that parameter name 👍

val parts = lowerCased.split("\\.")
MainClass(qualifiedClassName, parts)
}
val commonPrefixLength = mainClasses.map(_.parts.init.toList).reduce(commonPrefix).size
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would leave out every prettifying here. As mentioned below the way forward would be
a user facing setting where a mapping between mainClass and executable script name
can be configured.

(fullyQualifiedClassName, scriptName(expansionIndex))
}

private[this] def disambiguateNames(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well done implementation. This the bare minimum for these feature to work

This commit tries to lower the chance of generation of multiple scripts
with equal names in one archive. Fixes #1016.

Additionally, an attempt was made to improve the word splitting
for the script names, so, for example "UITest" becomes "ui-test"
and not "u-i-test".
@atrosinenko
Copy link
Contributor Author

Please excuse me for huge delay. I have stripped some of complicated logic.

For now, I have left lowercasing logic as-is , but changed other beautifying to simply the following:

  • split lower-cased names by uniqueness of their last parts (simple names)
  • a) unique names replace by lower-cased simple names
  • b) non-unique names replace by lower cased fully qualified names with . replaced by -

I cannot prove right now that lower-casing of fully qualified class names is reversible (that is, cannot introduce name clashes), but names that differ just in letter case are already clashed on Windows. Supposing all lower-cased qualified names are distinct, this trivial disambiguation seems to not introduce clashes:

  • the part of resulting name after the last _ (if any) is generated from the simple class name, so no clashes for names from a) among them and with names from b) (even from the default package)
  • names from b) are all distinct because they are just lower-cased qualified names and we supposed they are all distinct :)

@muuki88
Copy link
Contributor

muuki88 commented Jan 2, 2018

No need to apologise 😄 . I have to thank you for keeping up with all my change requests ❤️

The changes are looking maintainable (great docs, thank you 👍 ) and reasonable.
I have only a last request :) :) :)
Could you add a small section to the java app multiple apps documentation that the start scripts are disambiguated. Ideally with a couple of examples?

@atrosinenko
Copy link
Contributor Author

I will try to make requested fixes soon.

I spotted that name lower-casing logic can introduce clashes, at least in some corner cases like Test1Class and Test1class will both end up in test-1-class (for this particular case there is already clash for case-insensitive file system). I want to show warnings in such cases. I also want to slightly factor out common logic from BatStartScriptPlugin and BashStartScriptPlugin, so it will be easier to implement this warning.

@atrosinenko
Copy link
Contributor Author

But since this logic just inserts some -s and changes characters to lower case, it seems to introduce new clashes only if there are already some "case-insensitive clashes" like pkg1.A and pkg1.a (or class names contain a - :) ). Meanwhile, subsequent disambiguation logic can introduce new clashes, if class names contain a _, but it is too many coincidences.

@muuki88
Copy link
Contributor

muuki88 commented Jan 4, 2018

But since this logic just inserts some -s and changes characters to lower case, it seems to introduce new clashes only if there are already some "case-insensitive clashes" like pkg1.A and pkg1.a (or class names contain a - :) ).

True, and you are right that these are too many coincidences ( or rather obscure class namings ).
So keeping things simple is a better option :)

@atrosinenko
Copy link
Contributor Author

I have added a note to the docs about name disambiguation and added a warning on script name conflicts (although, I did not test the warning itself at all, only added a test on duplication detection). When adding a warning, I refactored BashStartScriptPlugin and BatStartScriptPlugin and I hope they still work :) -- ^validate and ^validateOSX pass on my Ubuntu, but I did not tested that these scripts works. I can test it in real life setting next week.

Copy link
Contributor

@muuki88 muuki88 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice refactoring! I like the shape this becomes. The documentation is also pretty neat.

During the refactoring you added an interesting type. I didn't quite get it, but if necessary this is good to go.

Thanks for you dedication ❤️

IO.write(script, scriptContent)
// TODO - Better control over this!
if (makeScriptsExecutable)
script.setExecutable(true)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You you simply do

script.setExecutable(makeScriptsExecutable)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, assuming script.setExecutable(false) is no-op and works well on Windows :) Meanwhile, does building zips with executable bits set on Windows works at all?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course, I don't hesitate to simplify this line, if it is possible.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. No idea what happens on windows. We can leave this as is and should put a comment in there that setExecutable may behave differently on different platforms.


IO.write(file, scriptContent)
if (makeScriptsExecutable)
file.setExecutable(true)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You you simply do

script.setExecutable(makeScriptsExecutable)

def withScriptName(scriptName: String): SpecializedScriptConfig
}

protected[this] type SpecializedScriptConfig <: ScriptConfig
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the purpose of this type?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some template methods (or how they call it?) can depend on fields from the specialized type. Particularly, it was the way I allow Replacements.appDefines() to access configLocation and extraDefines from specialized config.

mainClasses: Seq[String],
config: SpecializedScriptConfig): Seq[(String, String)]

protected[this] trait ScriptConfig {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like binding the script config type to the actual instance.

@@ -223,6 +223,48 @@ For two main classes ``com.example.FooMain`` and ``com.example.BarMain`` ``sbt s

Now you can package your application as usual, but with multiple start scripts.

A note on script names
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well done 👍

@atrosinenko
Copy link
Contributor Author

Forgot to say, there were some hardcoded bin/s and scripts/s in the paths for main (and, probably, forwarder) scripts, sometimes script was saved to scripts/ and the second element of tuple referenced bin/. I'm not sure I got it right, but for now I simply changed it to scriptTargetFolder (== "bin/") everywhere.

@muuki88
Copy link
Contributor

muuki88 commented Jan 7, 2018

All tests seem to pass, so the bin refactoring worked ( and is quite nice ).
As mentioned in the comment above we can leave the setExecutable code as is, but should put an inline comment why.

Then this is good to merge ❤️

@atrosinenko
Copy link
Contributor Author

I have added an explanation for the SpecializedScriptConfig type and changed executable bit logic to unconditionally setting some particular value, as you proposed first, because it is simpler and, after all, it is worth to not only set an executable bit on shell script, but also to explicitly unset it on .bat script. I don't know how this behaves on Windows, but AFAIK the shell scripts were previously generated when building on Windows too, weren't they? If so, this should not error out, but will fail to set that bit if I get the documentation on setExecutable(...) right (it can throw an exception, but it is when SecurityManager disallows an operation).

@muuki88
Copy link
Contributor

muuki88 commented Jan 8, 2018

Nice nice nice 😎

I retriggered the one build that timed-out. If this is green, then we can merge this.
I'm very happy about the output. Thanks for all your time and patience ☺️

@muuki88 muuki88 merged commit e0b0a13 into sbt:master Jan 8, 2018
@atrosinenko
Copy link
Contributor Author

Thanks for all your time and assistance! :)

@atrosinenko atrosinenko deleted the universal-script-names branch January 9, 2018 13:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
universal Zip, tar.gz, tgz and bash issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants