Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove regexp use #40

Merged
merged 5 commits into from
May 17, 2022
Merged

Remove regexp use #40

merged 5 commits into from
May 17, 2022

Conversation

kolyshkin
Copy link
Contributor

Copy/paste from the most important commit:
size: stop using regexp

Using a regular expression brings in the whole regexp and regexp/syntax
packages, which increase the resulting binary size by about 130K
(Go 1.18.1, Linux/amd64).

Besides, regular expressions are generally slow and incur some
initialization overhead (to compile all global regexp.MustComplile
variables). This, unlike the size difference, is not the main motivation
for this commit, but it feels like it should have also been mentioned.

A quick benchmark comparison shows a huge improvement (again, this is
not why this is done, nevertheless it pleases the eye):

name         old time/op    new time/op    delta
ParseSize-4    10.6µs ± 3%     2.6µs ±29%  -75.10%  (p=0.002 n=6+6)

name         old alloc/op   new alloc/op   delta
ParseSize-4    3.26kB ± 0%    0.20kB ± 0%  -93.75%  (p=0.000 n=7+6)

name         old allocs/op  new allocs/op  delta
ParseSize-4      72.0 ± 0%      26.0 ± 0%  -63.89%  (p=0.000 n=7+6)

Compatibility note: I chose to accept (rather than error out) input
strings like ".4 Gb" as this is valid and makes sense, and the inability
to process such input by the old implementation is more about the
limitation of regex being used, rather than a decision to error out on
those.

For the rest, please see commit messages.

@@ -128,6 +129,8 @@ func TestRAMInBytes(t *testing.T) {
assertSuccessEquals(t, 32, RAMInBytes, "32.3")
tmp := 32.3 * MiB
assertSuccessEquals(t, int64(tmp), RAMInBytes, "32.3 mb")
tmp = 0.3 * MiB
assertSuccessEquals(t, int64(tmp), RAMInBytes, "0.3MB")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be 0.3 <space> MB ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, never mind; expected is the first argument 😞 we should swap those; always confuses me if assertions use <expected> <actual>

@@ -99,11 +99,11 @@ func TestFromHumanSize(t *testing.T) {
assertSuccessEquals(t, 32.5*KB, FromHumanSize, "32.5 kB")
assertSuccessEquals(t, 32, FromHumanSize, "32.5 B")
assertSuccessEquals(t, 300, FromHumanSize, "0.3 K")
assertSuccessEquals(t, 300, FromHumanSize, ".3kB")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Slightly on the fence if we want to accept .<number> (and <number>.) or not 🤔

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Came up with some additional test-cases (TBD if the valid/invalids are in the right one);

  • tests with leading/trailing whitespace (I think generally we should be "ok" with those, and trim before use)
  • tests with trailing . (should be an error)
  • tests with multiple (but valid) units, e.g. bb (looks like those aren't detected)
  • TBD: multiple spaces between number and unit (probably should be an error)
  • TBD: number with ., but no number following (should we accept those?)
func TestFromHumanSize(t *testing.T) {
	assertSuccessEquals(t, 0, FromHumanSize, " 0")
	assertSuccessEquals(t, 0, FromHumanSize, " 0b")
	assertSuccessEquals(t, 0, FromHumanSize, " 0B")
	assertSuccessEquals(t, 0, FromHumanSize, " 0 B")
	assertSuccessEquals(t, 0, FromHumanSize, "0 ")
	assertSuccessEquals(t, 0, FromHumanSize, "0b ")
	assertSuccessEquals(t, 0, FromHumanSize, "0B ")
	assertSuccessEquals(t, 0, FromHumanSize, "0 B ")
	assertSuccessEquals(t, 0, FromHumanSize, "0")
	assertSuccessEquals(t, 0, FromHumanSize, "0b")
	assertSuccessEquals(t, 0, FromHumanSize, "0B")
	assertSuccessEquals(t, 0, FromHumanSize, "0 B")
	assertSuccessEquals(t, 32, FromHumanSize, "32")
	assertSuccessEquals(t, 32, FromHumanSize, "32b")
	assertSuccessEquals(t, 32, FromHumanSize, "32B")
	assertSuccessEquals(t, 32, FromHumanSize, "32")
	assertSuccessEquals(t, 32*KB, FromHumanSize, "32k")
	assertSuccessEquals(t, 32*KB, FromHumanSize, "32K")
	assertSuccessEquals(t, 32*KB, FromHumanSize, "32kb")
	assertSuccessEquals(t, 32*KB, FromHumanSize, "32Kb")
	assertSuccessEquals(t, 32*MB, FromHumanSize, "32Mb")
	assertSuccessEquals(t, 32*GB, FromHumanSize, "32Gb")
	assertSuccessEquals(t, 32*TB, FromHumanSize, "32Tb")
	assertSuccessEquals(t, 32*PB, FromHumanSize, "32Pb")

	assertSuccessEquals(t, 32.5*KB, FromHumanSize, "32.5kB")
	assertSuccessEquals(t, 32.5*KB, FromHumanSize, "32.5 kB")
	assertSuccessEquals(t, 32, FromHumanSize, "32.5 B")
	assertSuccessEquals(t, 300, FromHumanSize, "0.3 K")
	assertSuccessEquals(t, 300, FromHumanSize, ".3kB")

	assertError(t, FromHumanSize, "")
	assertError(t, FromHumanSize, ".")
	assertError(t, FromHumanSize, ". ")
	assertError(t, FromHumanSize, " ")
	assertError(t, FromHumanSize, "  ")
	assertError(t, FromHumanSize, " .")
	assertError(t, FromHumanSize, " . ")
	assertError(t, FromHumanSize, "0.")
	assertError(t, FromHumanSize, "0. ")
	assertError(t, FromHumanSize, "0.b")
	assertError(t, FromHumanSize, "0.B")
	assertError(t, FromHumanSize, "-0")
	assertError(t, FromHumanSize, "-0b")
	assertError(t, FromHumanSize, "-0B")
	assertError(t, FromHumanSize, "-0 b")
	assertError(t, FromHumanSize, "-0 B")
	assertError(t, FromHumanSize, "-32")
	assertError(t, FromHumanSize, "-32b")
	assertError(t, FromHumanSize, "-32B")
	assertError(t, FromHumanSize, "-32 b")
	assertError(t, FromHumanSize, "-32 B")
	assertError(t, FromHumanSize, "32.")
	assertError(t, FromHumanSize, "32.b")
	assertError(t, FromHumanSize, "32.B")
	assertError(t, FromHumanSize, "32. b")
	assertError(t, FromHumanSize, "32. B")
	assertError(t, FromHumanSize, "32b.")
	assertError(t, FromHumanSize, "32B.")
	assertError(t, FromHumanSize, "32 b.")
	assertError(t, FromHumanSize, "32 B.")
	assertError(t, FromHumanSize, "32 bb")
	assertError(t, FromHumanSize, "32 BB")
	assertError(t, FromHumanSize, "32 b b")
	assertError(t, FromHumanSize, "32 B B")
	assertError(t, FromHumanSize, "32  b")
	assertError(t, FromHumanSize, "32  B")
	assertError(t, FromHumanSize, "hello")
	assertError(t, FromHumanSize, "-32")
	assertError(t, FromHumanSize, " 32 ")
	assertError(t, FromHumanSize, "32m b")
	assertError(t, FromHumanSize, "32bm")
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, as to what should pass and what should fail, I think we should follow the rules of strconv.ParseFloat (plus our own way of handling suffixes). This is why I decided we should pass .3 as it's a valid floating point number in those programming languages/environments I am familiar with.

Adding your tests now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is why I decided we should pass .3 as it's a valid floating point number in those programming languages/environments I am familiar wit

Ah, gotcha, yes, that makes sense.

With the leading/trailing . I was a bit concerned about those coming from "idk" (accidental things), so was veering on the "safe side".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

	assertSuccessEquals(t, 0, FromHumanSize, " 0")
	assertSuccessEquals(t, 0, FromHumanSize, " 0b")
	assertSuccessEquals(t, 0, FromHumanSize, " 0B")
	assertSuccessEquals(t, 0, FromHumanSize, " 0 B")

Neither the old nor the new code tolerates extra leading or trailing space. We can certainly relax that, although my original intent was to replicate the functionality as-is (with the exception of accepting strings like ".33 GB" or "345. B", as described in the last commit).

go.mod Outdated
@@ -0,0 +1,3 @@
module github.com/docker/go-units
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking if this repository should also live in the moby org; wondering what's best; add a go.mod already, or wait with adding until after we moved; wdyt?

In either case, may be better to do it separate

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, let's move this to moby first.

The go.mod is added for selfish reasons only -- so I can actually run go test.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

reminds me, I also want to look at https://github.com/tonistiigi/units, and see where the gaps (and overlaps) are between the two, as both end up in our dependency tree; perhaps we can unify them

(I know there's some possibly "bad" choices in docker/go-units, so maybe there's reasons for one over the other, but 🤷‍♂️)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filed #41 about move to under moby org. Once done we can move from there.

Looking into https://github.com/tonistiigi/units

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, Tonis' repo is about printing/output, and does not cover conversion/input.

Copy link
Member

@thaJeztah thaJeztah Apr 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup; it's "the other side of go-units", so mostly complementary; but in the same area, so it could make sense to have these together (not urgent, but an option).

I checked with Tonis, and he wasn't against, so it's an option we can consider (not "MUST", but can)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it could make sense to have these together

Perhaps it's better to keep it separate, as they solve complementary, but different tasks.

Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! overall looks good; left some comments;

also, can you drop the first commit for now (adding the go.mod)?

size_test.go Show resolved Hide resolved
size_test.go Outdated Show resolved Hide resolved
size_test.go Show resolved Hide resolved
kolyshkin and others added 5 commits May 16, 2022 09:42
Such sizes are quite valid but were never tested.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Shows on my laptop (6 iterations processed by benchstat):

name         time/op
ParseSize-4  10.6µs ± 3%

name         alloc/op
ParseSize-4  3.26kB ± 0%

name         allocs/op
ParseSize-4    72.0 ± 0%

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This way, error messages will show correct line information.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Co-authored-by: Kir Kolyshkin <kolyshkin@gmail.com>
Using a regular expression brings in the whole regexp and regexp/syntax
packages, which increase the resulting binary size by about 130K
(Go 1.18.1, Linux/amd64).

Besides, regular expressions are generally slow and incur some
initialization overhead (to compile all global regexp.MustComplile
variables). This, unlike the size difference, is not the main motivation
for this commit, but it feels like it should have also been mentioned.

A quick benchmark comparison shows a huge improvement (again, this is
not why this is done, nevertheless it pleases the eye):

name         old time/op    new time/op    delta
ParseSize-4    10.6µs ± 3%     2.6µs ±29%  -75.10%  (p=0.002 n=6+6)

name         old alloc/op   new alloc/op   delta
ParseSize-4    3.26kB ± 0%    0.20kB ± 0%  -93.75%  (p=0.000 n=7+6)

name         old allocs/op  new allocs/op  delta
ParseSize-4      72.0 ± 0%      26.0 ± 0%  -63.89%  (p=0.000 n=7+6)

Compatibility note: As a result, we are now a but more relaxed to the
input, allowing e.g. ".4 Gb", or "-0", or "234. B", following the rules
of strconv.ParseFloat. It seems that those were previously rejected as
a result of a regex being used, not deliberately.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
@kolyshkin
Copy link
Contributor Author

also, can you drop the first commit for now (adding the go.mod)?

Done

@kolyshkin kolyshkin requested a review from thaJeztah May 16, 2022 16:43
Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@thaJeztah thaJeztah merged commit e682442 into docker:master May 17, 2022
@kolyshkin kolyshkin mentioned this pull request Jun 6, 2022
kolyshkin added a commit to kolyshkin/runc that referenced this pull request Aug 4, 2022
We only use a single method, RAMInBytes, which can be easily implemented
locally.

Do that (based on docker/go-units#40, so the
implementation is fully backward-compatible, except for the addition
of treating -1), and remove docker/go-units dependency.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
kolyshkin added a commit to kolyshkin/runc that referenced this pull request Aug 4, 2022
We only use a single method, RAMInBytes, which can be easily implemented
locally.

Do that (based on docker/go-units#40, so the
implementation is fully backward-compatible, except for the addition
of treating -1), and remove docker/go-units dependency.

This implementation relies on strings.Cut() which is only available
since Go 1.18.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
kolyshkin added a commit to kolyshkin/runc that referenced this pull request Aug 4, 2022
We only use a single method, RAMInBytes, which can be easily implemented
locally.

Do that (based on docker/go-units#40, so the
implementation is fully backward-compatible, except for the addition
of treating -1), and remove docker/go-units dependency.

This implementation relies on strings.Cut() which is only available
since Go 1.18.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
kolyshkin added a commit to kolyshkin/runc that referenced this pull request Aug 18, 2022
We only use a single method, RAMInBytes, which can be easily implemented
locally.

Do that (based on docker/go-units#40, so the
implementation is fully backward-compatible, except for the addition
of treating -1), and remove docker/go-units dependency.

This implementation relies on strings.Cut() which is only available
since Go 1.18.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
kolyshkin added a commit to kolyshkin/runc that referenced this pull request Aug 18, 2022
We only use a single method, RAMInBytes, which can be easily implemented
locally.

Do that (based on docker/go-units#40, so the
implementation is fully backward-compatible, except for the addition
of treating -1), and remove docker/go-units dependency.

This implementation relies on strings.Cut() which is only available
since Go 1.18.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
kolyshkin added a commit to kolyshkin/runc that referenced this pull request Sep 1, 2022
Mostly to include docker/go-units#40

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
debarshiray added a commit to debarshiray/toolbox that referenced this pull request Dec 14, 2023
... to improve performance by not using regular expressions [1].
However, the potential reduction in binary size is lost because Toolbx
already uses the 'regexp' package to check if a string might be the ID
of an image or a valid container name.

[1] go-units commit 737572633c434ce2
    docker/go-units@737572633c434ce2
    docker/go-units#40
debarshiray added a commit to debarshiray/toolbox that referenced this pull request Dec 14, 2023
... to improve performance by not using regular expressions [1].
However, the potential reduction in binary size is lost because Toolbx
already uses the 'regexp' package to check if a string might be the ID
of an image or a valid container name.

[1] go-units commit 737572633c434ce2
    docker/go-units@737572633c434ce2
    docker/go-units#40

containers#1420
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants