-
-
Notifications
You must be signed in to change notification settings - Fork 224
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Minify unicode-range: U+2600-26FF → U+26?? #344
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent first try! I've added some comments to the code, let me know what you think!
@@ -1201,6 +1201,35 @@ func (c *cssMinifier) minifyProperty(prop Hash, values []Token) []Token { | |||
values[0].Data = oneBytes | |||
values[0].Ident = 0 | |||
} | |||
case UnicodeRange: | |||
for i, value := range values { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Be aware that value
may be just a CommaToken which you may need to continue
?
css/css.go
Outdated
copy(data, value.Data) | ||
|
||
// If we have a range of exactly two parts. | ||
if parts := bytes.Split(data, []byte{'-'}); len(parts) == 2 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would remove U+
first, for speed just check if the first byte is U
and the second a +
, then take a slice starting at 2
to skip them both. Next we have two cases: one single value or a range. Start looking byte for byte until we meet the end or -
. Keep track of leading 0
s so we trim those too. If this is a range, continue like you do.
css/css.go
Outdated
parts[0] = bytes.TrimPrefix(parts[0], []byte("U+")) | ||
|
||
// And both parts are the same length. | ||
if len(parts[0]) == len(parts[1]) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't a requirement as far as I can see
css/css.go
Outdated
} | ||
|
||
// If both parts now match we only need one. | ||
if bytes.Equal(parts[0], parts[1]) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Put this check first: in reverse order check if the first part is a zero and the second part an F
. If this is not the case, check that both bytes are equal, or the left side is a 0
or is shorter than the right side (e.g. U+0-04FF => U+4??
as far as I can see).
If you put this check first before setting the ?
s, we can get rid of the copy()
.
Take 2, I no longer have the I believe both parts do have to be the same length with the exception of a special case where the left side can be grown to be the same length, e.g. I also believe your explicit example is invalid and Also there's a TODO test that I would like to resolve before this PR is merged but I don't yet see an elegant way to do that. It's not a bug, just a missed optimisation, it won't replace with question marks if there's no common prefix. And finally I wasn't sure if the bitwise mask for case insensitive byte checking was too esoteric, if so // Starts with "U+..." or "u+..."
if len(value.Data) >= 2 && value.Data[0]|32 == 'u' && value.Data[1] == '+' { could become // Starts with "U+..." or "u+..."
if bytes.HasPrefix(value.Data, []byte("U+")) || bytes.HasPrefix(value.Data, []byte("u+")) { |
// U+2600-26FF → U+26?? | ||
for i, value := range values { | ||
// Starts with "U+..." or "u+..." | ||
if len(value.Data) >= 2 && value.Data[0]|32 == 'u' && value.Data[1] == '+' { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a personal style I have, but I prefer to write it the other way around, as in 1 < len(value.Data)
or 2 <= len(value.Data)
. That way the left side is always smaller then the right side, which is more intuitive (e.g. when we write intervals). It's easier to process mentally, for me at least.
for i, value := range values { | ||
// Starts with "U+..." or "u+..." | ||
if len(value.Data) >= 2 && value.Data[0]|32 == 'u' && value.Data[1] == '+' { | ||
hyphen := bytes.IndexByte(value.Data, '-') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Try to avoid function in bytes
like these, as this makes it easier to write code non-optimally. For example, in this case you should parse from left to right and put the bytes in left
. Then if you encounter -
, stop parsing and put the remainder in right
.
// Skip over "U+", we have a range of two same length parts. | ||
left := value.Data[2:hyphen] | ||
right := value.Data[hyphen+1:] | ||
if len(left) != len(right) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As you said, this is not supported yet, but as you parse from left to right like I explain above, you can keep track of leading zeros. Then you can check for lengths to be equal (ignoring leading zeros).
} | ||
|
||
// Starting at the ends compare each part byte by byte. | ||
for j := len(left); j > 0; j-- { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here too do I prefer j < 0
|
||
// Starting at the ends compare each part byte by byte. | ||
for j := len(left); j > 0; j-- { | ||
if left[j-1] != '0' || right[j-1]|32 != 'f' { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is actually a really smart way to check whether it's either upper or lower case!
// Starting at the ends compare each part byte by byte. | ||
for j := len(left); j > 0; j-- { | ||
if left[j-1] != '0' || right[j-1]|32 != 'f' { | ||
if bytes.EqualFold(left[:j], right[:j]) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is bit wasteful to check the entire length every time, the first character will be checked for the same thing many times!
The bitwise mask is excellent! Be aware that the alternative you put is pretty slow, you need to do two byte-slice allocations and two function calls! I like where this is going ;-), I've left a few comments we can discuss, let me know how it goes. |
This has now been implemented |
@tdewolff Consider this my opening salvo, it works, but it's not very elegant nor efficient, I particularly don't like the
copy()
, any feedback would be much appreciated.I've decided to just tackle the first part of #321 in this PR as it was simpler imo and both optimisations are fairly self contained.
Updates #321