Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

some fixes: #1

Merged
merged 2 commits into from
Nov 10, 2016
Merged

some fixes: #1

merged 2 commits into from
Nov 10, 2016

Conversation

jampy
Copy link

@jampy jampy commented Nov 9, 2016

  • simplify hex-to-decimal conversion
  • really stable, reproducible module ID calculation
  • don't fix module ID collisions by choosing a different ID, because that makes the ID unstable; instead, throw an exception
  • provide a "seed" option so that collisions can be fixed manually

This PR fixes some issues that I think the original code has.

First of all, I think that the old genModuleId() is dangerous because it tries to "fix" module ID collisions by pratically choosing a different ID. However, this makes a module ID dependent to the other modules in the build - effectively defeating the purpose of the stable ID.

- simplify hex-to-decimal conversion
- really stable, reproducible module ID calculation
- don't fix module ID collisions by choosing a different ID, because that makes the ID unstable; instead, throw an exception
- provide a "seed" option so that collisions can be fixed manually
@jampy
Copy link
Author

jampy commented Nov 9, 2016

A note on the seed option:

This PR may cause builds to fail when two modules get the same module ID. This is by design because such collisions are problematic for the solution itself.

In such a situation, a webpack-stable-module-id-and-hash module id collision exception is thrown.

To solve the problem, a different seed (number 0..31) should be choosen. The seed affects the module ID algorithm. When changing this value, the module ids will be completely different (but still stable).

The seed is also included in the file names, so everything remains unique.

@zhenyong
Copy link
Owner

zhenyong commented Nov 9, 2016

@jampy Thanks so much for your PR.

I agree with your point that the old genModuleId() is dangerous.

Your solution is awesome, but it seems like just extremely reduce happenstance of pratically choosing a different ID.

Once we try to make module id (hash) shorter, it absolutely causes a module ID dependent to the other (in theory). Do you think so?

Let's think about it.

Awesome PR really!

@KyleAMathews
Copy link
Contributor

👍 to this change. Making the hash a bit longer makes hash collisions very unlikely.

Excited to try out this module! Wish something like this was enabled by default in Webpack.

@jampy
Copy link
Author

jampy commented Nov 9, 2016

@zhenyong thanks

Your solution is awesome, but it seems like just extremely reduce happenstance of pratically choosing a different ID.

Once we try to make module id (hash) shorter, it absolutely causes a module ID dependent to the other (in theory). Do you think so?

I'm not sure I understood your question, sorry. The basic principle is still yours. As with any hash, collisions are basically always possible. The larger the hash, the less probable it is. The hash computed by this PR is 28 bits long (2^28 possible IDs). I choose 28 bits because I guess that in JavaScript at least a 32 bit signed integer can be considered safe.

Perhaps also 63 bits would be possible - could be worth to check out.

The PR could be improved to use up to 31 bits - it just takes a little more effort (a little bit more math than just extracting a substring).

The size of the hash (in my version) really only affects the probability of a collision (and the probability of a failed build). The IDs should however remain stable in any case.

BTW, I'm using this in a large project (~1500 modules) and it works flawlessly so far :-)

@zhenyong
Copy link
Owner

@KyleAMathews
webpack 2 provide HashedModuleIdsPlugin

@zhenyong
Copy link
Owner

@jampy I hope the original codes not make harm to ur large project O(∩_∩)O~

I agree with that your solution can kill almost 99.9%(maybe ^_^) collision.

In a sense, it is the equivalent of changingvar len = 4; to var len = 28 or more. And only if we use the full hash (decimal 32~64bit) that can 100% kill the collision in theory.

Here is anthor issue for both current code and your PR , that manually change "seed" to kill collision is not elegant for CI .

Really a puzzle ha ^_^!

@KyleAMathews
I may first change the default "pickup" length from 4 to 8. (I know ...)

Let me think about more.

@zhenyong
Copy link
Owner

#3

@jampy
Copy link
Author

jampy commented Nov 10, 2016

@zhenyong

I agree with that your solution can kill almost 99.9%(maybe ^_^) collision.

Actually I think the probability of a collision is somewhat like 1 / 268435456 * (number of modules). With 1500 modules that's 0,0005%. With a 31 bit hash that would be 0,00007%.

And only if we use the full hash (decimal 32~64bit) that can 100% kill the collision in theory.

A hash can never be 100% collision-safe. You can just reduce the probability.

Here is anthor issue for both current code and your PR , that manually change "seed" to kill collision is not elegant for CI .

That's true, but I doubt there can be a solution that's hash based avoiding this problem completely.

However, I think that it's worth it and the build should already fail on the development machine, even before reaching CI.

@jampy
Copy link
Author

jampy commented Nov 10, 2016

In a sense, it is the equivalent of changingvar len = 4; to var len = 28 or more.

Not really.

Your original hexToNum() function seems broken to me. I'm not sure if it is supposed to convert (part of) a hex string to the equivalent integer - anyway it does not. It probably makes a much worse conversion and likely will lead to IDs higher than 2^31.

For example: hexToNum("FFFFFFFF") returns 7070707070707070

The crucial part is really that usedIds should never cause any changes in the module IDs.

Just think of this situation (with your version):

  • module A has hash "123456789ABC", genModuleId() returns 4950515253545556 (len 8)
  • module B has hash "123456789FFF", genModuleId() returns 495051525354555650 (len 9)

So, you have two different IDs - great.

Now "module A" is removed from the codebase. This causes "module B" to get the ID 4950515253545556 - which is the ID that module A had.

So, the main purpose of the plugin is lost.

Do you see the problem now?

Do you see any problems with my PR or would you mind simply accepting it? Thanks.

@zhenyong zhenyong merged commit fee79ff into zhenyong:master Nov 10, 2016
@zhenyong
Copy link
Owner

@jampy I really know the original problem from begining, thanks your detail explanation.

@jampy
Copy link
Author

jampy commented Nov 10, 2016

awesome! thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants