-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Do not use regexp #3460
Do not use regexp #3460
Conversation
Fedora CI failed with
which is pretty weird. Perhaps it's the problem with the underlying fs and so the kernel remounted it read-only. Restarted. |
Yes, but not the bottleneck of runc AFAICS. |
A few of runc dependencies use regexp as well.
Once (and if) all of these are merged, I am going to bring this out of draft, as it will save us 200K (130K stripped) from the runc binary size (which is about 5%, comparing the stripped binaries). |
Currently saves 531K from the stripped binary (which is 5.6% of binary size). Still very far from crun footprint, yet it seems like a worthwhile saving. |
66c008c
to
1d053af
Compare
|
|
Rebased, made a fresh comparison: [kir@kir-rhat runc]$ go version
go version go1.18.1 linux/amd64
[kir@kir-rhat runc]$ git describe HEAD
v1.1.0-239-g93ad6a85
[kir@kir-rhat runc]$ ls -l runc.main runc.no-regexp
-rwxrwxr-x. 1 kir kir 12999304 Jul 14 13:31 runc.main
-rwxrwxr-x. 1 kir kir 12708080 Jul 14 13:31 runc.no-regexp
[kir@kir-rhat runc]$ strip runc.main runc.no-regexp
[kir@kir-rhat runc]$ ls -l runc.main runc.no-regexp
-rwxrwxr-x. 1 kir kir 9310880 Jul 14 13:33 runc.main
-rwxrwxr-x. 1 kir kir 9100352 Jul 14 13:33 runc.no-regexp
[kir@kir-rhat runc]$ size runc.main runc.no-regexp
text data bss dec hex filename
5645422 3657744 229832 9532998 917646 runc.main
5503086 3587520 229416 9320022 8e3656 runc.no-regexp So, the saving is about 200K for a stripped binary, which is 2.3% of its size. |
c52b39d
to
8d41fd0
Compare
77be969
to
1ea0c76
Compare
go.mod
Outdated
@@ -8,7 +8,6 @@ require ( | |||
github.com/containerd/console v1.0.3 | |||
github.com/coreos/go-systemd/v22 v22.3.2 | |||
github.com/cyphar/filepath-securejoin v0.2.3 | |||
github.com/docker/go-units v0.4.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pushed v0.5.0 so you could update to that instead if you want to
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, replace the docker/units removal with the bump commit; PTAL
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just out of curiosity, I have re-applied the commit that replaces docker/units with the in-house implementation of RAMInBytes, and the difference between these two is negligible (meaning, I guess, that the linker is smart enough to remove the functions that are not used). Here are the raw numbers:
[kir@kir-rhat runc]$ size runc-no-regex runc-no-units
text data bss dec hex filename
5522150 3598080 229384 9349614 8ea9ee runc-no-regex
5520182 3597232 229352 9346766 8e9ece runc-no-units
[kir@kir-rhat runc]$ ls -l runc-no-regex runc-no-units
-rwxr-xr-x. 1 kir kir 9129056 Aug 31 18:26 runc-no-regex
-rwxr-xr-x. 1 kir kir 9128992 Aug 31 18:32 runc-no-units
89a7058
to
402bfa8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
code LGTM, but left some comments to improve some of the comments (the "GoDoc" suggestions aren't new, so would also be fine as a follow-up)
@@ -261,7 +258,28 @@ func loadState(root string) (*State, error) { | |||
} | |||
|
|||
func validateID(id string) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps would be good to (despite it not being exported) add a GoDoc to describe accepted formats, and outline some of it.
Looking at this PR made me go down a bit of a trip in history; let me post below, but I should probably post elsewhere
I noticed the underscore being added, which was not explicitly included in the
regexp, so I wondered if that was accidentally included in the past (as it's
easy to forget it's part of \w
). I also was curious if this ID was meant to
match other ID formats (e.g., used in Moby), some of which are required to start
with an alpha-numeric character, and ._-+
only allowed as separator.
So, I went on a short trip down history;
- Validation was added in 66e6806 / docker-archive/libcontainer@8fcda56 (Adds ID validation. docker-archive/libcontainer#249),
which allowed for "word" (\w
) characters and had the underscore added explicitly. Maximum length was limited to 1024 characters (no reason mentioned for this). - The length constraint was moved from the regex to a separate check in docker-archive/libcontainer@de57f78
- In d950500 (Allow hyphen in "id" (based on
cwd
pathname) #31), hyphens (-
) were added to the list of allowed characters;A directory with a hyphen currently generates an InvalidId error because
of the regex in libcontainer. I don't believe there is any reason a
hyphen should be disallowed. - Periods (
.
) were added in d4be340 (Fixing valid-id in regex #647), which also removed the explicit underscore, acknowledging that it's reduntant, and already part of\w
. The issue linked to that mentions Valid IDs #635 (comment);It is for valid filepath names because the container id needs to be written
to disk in something like/run
to store state. - Finally,
+
symbols were added in 4629512 (Allow + in container ID #675). The motivation there was Allow + in container ID #675 (comment)I added just the + because I have a use case where the foo+abc is used as a
dir name and I think we should allow the containers to be named following the
same rules as file/dir names.
Yes, I agree with you that we should allow more chars that fall into this category.
So, from the above;
- The original intent was for the ID to be something that can be used as part of a path, and limiting it to "portable" characters.
- Up until Allow + in container ID #675, this meant IDs followed the POSIX Portable Filename Character Set
Some what unfortunately, the runtime-spec does NOT define a format (or restrictions) in any way, which means that currently runc is more restrictive than the OCI runtime spec (overall this is fine). I found multiple attempts to have this included;
- There was a discussion in Add runtime state configuration and structs runtime-spec#87 (comment) (and on the mailing list) about the ID format being under-specified, and I see a patch was created for clarification (but not merged); wking/opencontainer-runtime-spec@2ec34c6 (posted at the end of this comment).
- Following Allow + in container ID #675, @mikebrow opened a ticket in the runc repository to gather interest on formalizing the validation Expand support for container id naming #676, which did not get participation, so was closed.
- Allow + in container ID #675 (comment) also mentions that the runtime spec removed the requirement for IDs to be usable as element in paths.
To summarize my thoughts;
- It's probably good to document this function, and to outline the accepted characters (possibly outline (some of) the motivations for the chosen set of characters.
- I (personally) would be in favor of formalizing accepted characters, and would've been in favor of adopting the POSIX Portable Filename Character Set, but the
+
addition was a bit unfortunate for that. - Perhaps it's worth giving it another attempt to update the OCI spec (at least with a recommendation if that's acceptable)
proposed PATCH for formalizing the format from 2015
From 2ec34c62b019431c0edebead50b56ebafe70a67c Mon Sep 17 00:00:00 2001
From: "W. Trevor King" <wking@tremily.us>
Date: Mon, 10 Aug 2015 12:07:20 -0700
Subject: [PATCH] runtime: Document container ID charset and uniqueness domain
Allow the runtime to use it's own scheme, but let the caller use UUIDs
if they want. Jonathan asked for clarification as part of #87, but
didn't suggest a particular approach [1]. When we discussed it in the
2015-08-26 meeting [2], the consensus was to just allow everything.
With container IDs like 'a/b/c' leading to state entries like
'/var/oci/containers/a/b/c/state.json'. But that could get ugly with
container IDs that contain '../' etc. And perhaps there are some
filesystems out there that cannot represent non-ASCII characters
(actually, I'm not even sure what charset our JSON is in ;). I'd
rather pick this minimal charset which can handle UUIDs, and make life
easy for runtime implementers and safe for bundle consumers at a
slight cost of flexibility for bundle-authors.
There was some confusion on the list about what "ASCII letters" meant
[3], so I've explicitly listed the allowed character ranges. Here's a
Python 3 script that shows the associated Unicode logic:
import unicodedata
# http://www.unicode.org/reports/tr44/tr44-4.html#GC_Values_Table
category = {
'Ll': 'lowercase letter',
'Lu': 'uppercase letter',
'Nd': 'decimal number',
'Pd': 'dash punctuation',
}
for i in range(1<<7):
char = chr(i)
abbr = unicodedata.category(char)
if abbr[0] in ['L', 'N'] or abbr == 'Pd':
cat = category[abbr]
print('{:02x} {} {}'.format(i, char, cat))
[1]: https://github.com/opencontainers/specs/pull/87#discussion_r35828046
[2]: https://github.com/opencontainers/specs/wiki/Meeting-Minutes-2015-08-26
[3]: https://groups.google.com/a/opencontainers.org/d/msg/dev/P9gZBYhiqDE/-ptpOcQ5FwAJ
Message-Id: <7ec9cff6-c1a6-4beb-82de-16eb412bf2f8@opencontainers.org>
Reported-by: Jonathan Boulle <jonathanboulle@gmail.com>
Signed-off-by: W. Trevor King <wking@tremily.us>
---
runtime.md | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/runtime.md b/runtime.md
index be2045b58..c9da6af9a 100644
--- a/runtime.md
+++ b/runtime.md
@@ -11,6 +11,9 @@ By providing a default location that container state is stored external applicat
* **version** (string) Version of the OCI specification used when creating the container.
* **id** (string) ID is the container's ID.
+ Only ASCII letters, numbers, and hyphens are valid (a–z, A–Z, 0–9, and ‘-’).
+ This value must be unique for a given host, but need not be universally unique.
+ Runtimes must allow the caller to set this ID, so that callers may choose, for example, to use [UUIDs][uuid] for universal uniqueness.
* **pid** (int) Pid is the ID of the main process within the container.
* **root** (string) Root is the path to the container's bundle directory.
@@ -94,3 +97,5 @@ If a hook returns a non-zero exit code, then an error is logged and the remainin
`path` is required for a hook.
`args` and `env` are optional.
+
+[uuid]: https://tools.ietf.org/html/rfc4122
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Finally, + symbols were added in 4629512 (Allow + in container ID #675).
Most interestingly, as pointed out by @mrunalp below, this commit also implicitly adds comma (,
) to the list of allowed characters, since +-\.
is now a range from +
(ASCII 43) to .
(ASCII 45) and that includes ASCII 44 (,
).
Now we need to decide whether to keep supporting ,
or not. I vote to drop ,
and deal with the consequences.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see my comment about TrimLeft
vs TrimPrefix
matches := re.FindStringSubmatch(verStr) | ||
if len(matches) < 2 { | ||
return 0, fmt.Errorf("can't parse version %s: incorrect number of matches %v", verStr, matches) | ||
str = strings.TrimLeft(str, `"v`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OH! Actually, this is not correct; this should be strings.TrimPrefix
. strings.TrimLeft
takes a "cut set" of characters, so currently "vvvvvv""v"v"v""v245.4-1.fc32"
would be accepted;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is correct since it allows us to Trim either v
or "v
(i.e. with or without the quote).
Sure, it will also cut v"
or vvvv"""vvvv
but we do not expect such input here (the input expectations are documented).
Within the realm of expected input,
strings.TrimLeft(str, `"v`)
work the same as
strings.TrimPrefix(str, `"`)
strings.TrimPrefix(str, `v`)
but is somewhat more compact.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@thaJeztah PTAL ^^
@thaJeztah addressed your comments, PTAL |
case c == '_': | ||
case c == '+': | ||
case c == '-': | ||
case c == '.': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current regex also includes the comma :/
https://go.dev/play/p/n9ZiwZM8KMd
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like we missed \
before -
making it a range from +
to .
instead here - https://github.com/opencontainers/runc/pull/675/files
It should have been ^[\w+\-\.]+$
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh wow.
I guess that the best way is to document that we're no longer allowing ,
in container name.
The alternative is to add ,
to the above switch
.
Replace a regex with a simple for loop. Document the function. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Rewrite systemdVersionAtoi to not use regexp, and fix two issues: 1. It was returning 0 (rather than -1) for some errors. 2. The comment was saying that the input string is without quotes, while in fact it is. Note the new function, similar to the old one, works on input either with or without quotes. Amend the test to add test cases without quotes. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
@thaJeztah @mrunalp I have documented the function, PTAL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
w.r.t.;
I guess that the best way is to document that we're no longer allowing , in container name.
The alternative is to add , to the above switch.
I think it should be ok to not allow ,
, as (from the discussion) it looks like that was a bug, but perhaps good to make sure it's called out in the changelog / release-notes
Don't get me wrong, I love regular expressions. But the regexp package is somewhat big and slow, and we can do just fine without.
Remove two last uses of regexp in runc code (modulo tests).
This saves about 200K (2.3%) from the resulting binary, without losing any of the functionality.
Currently a draft pending