Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid computing strlen() inside loops #16584

Merged
merged 3 commits into from
Oct 2, 2024
Merged

Conversation

rilysh
Copy link
Contributor

@rilysh rilysh commented Sep 30, 2024

Description

Compiling with -O0 (no proper optimizations), strlen() call in loops for comparing the size, isn't being called/initialized before the actual loop gets started, which causes n-numbers of strlen() calls (as long as the string is). Keeping the length before entering in the loop is a good idea.

On some places, even with -O2, both GCC and Clang can't recognize this pattern, which seem to happen in an array of char pointer.

Out of this context, on some places, 32-bit int is used to store the length of a given string, which implicitly casts strlen() result to an int. I haven't made any changes to this.

How Has This Been Tested?

Not yet tested.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

Compiling with -O0 (no proper optimizations), strlen() call
in loops for comparing the size, isn't being called/initialized
before the actual loop gets started, which causes n-numbers of
strlen() calls (as long as the string is). Keeping the length
before entering in the loop is a good idea.

On some places, even with -O2, both GCC and Clang can't
recognize this pattern, which seem to happen in an array
of char pointer.

Signed-off-by: rilysh <nightquick@proton.me>
Since zvol_name was modified, we need to compute the
length again.

Signed-off-by: rilysh <nightquick@proton.me>
cmd/zdb/zdb.c Outdated
@@ -8816,7 +8816,7 @@ zdb_read_block(char *thing, spa_t *spa)
for (s = strtok_r(flagstr, ":", &tmp);
s != NULL;
s = strtok_r(NULL, ":", &tmp)) {
for (i = 0; i < strlen(flagstr); i++) {
for (i = 0, len = strlen(flagstr); i < len; i++) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would move this strlen() before the if.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e.g. like this?

len = strlen(flagstr);
for (i = 0; i < len; i++)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I do like compact code, but not by the cost of readability.

Copy link
Member

@amotin amotin Oct 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What really looks odd to me though, unrelated to this commit, is that flagstr is used inside the strtok_r() loop, while token's cut by it (s) is not. Does this code actually work as intended, whatever it supposed to do?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried with a small example, this portion looks a little bit hacky

#include <stdio.h>
#include <string.h>

int main(void)
{
	char *flag, *tmp;
        char *s, *tok;

	s = "hello:world:s:this:world";
	flag = strdup(s ? s : "");

	tmp = NULL;
        for (s = strtok_r(flag, ":", &tmp); s != NULL;
	     s = strtok_r(NULL, ":", &tmp)) {
		fprintf(stdout, "len: %zu\n", strlen(flag));
	}
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks "a little bit" broken. But I have no idea what it supposed to do, just suspect not this.

@behlendorf behlendorf added the Status: Code Review Needed Ready for review and testing label Sep 30, 2024
@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Sep 30, 2024
Signed-off-by: rilysh <nightquick@proton.me>
@behlendorf behlendorf merged commit 86737c5 into openzfs:master Oct 2, 2024
16 of 21 checks passed
@n0xena
Copy link

n0xena commented Oct 2, 2024

TIL: for loops work different than I thought for the past 20 years

Sorry guys to spam as it likely doesn't belong here - but as I just found this: Java suffers from the same problem!
Please bare with me - I just started learning C - but I'm quite familiar with Java for about 20 years or so.

#include <stdio.h>
#include <string.h>

int ll(char *s);

int main(void)
{
        char *s;
        s="hello";
        for(int i=0; i<ll(s); i++)
        {
                printf("%c\n", s[i]);
        }
        return 0;
}

int ll(char *s)
{
        printf("ll called\n");
        return strlen(s);
}
class Test
{
        public static void main(String... args)
        {
                String string="hello";
                for(int i=0; i<len(string); i++)
                {
                        System.out.println(string.substring(i, i+1));
                }
        }
        private static int len(String string)
        {
                System.out.println("len called");
                return string.length();
        }
}

Until this here I always thought the computation of the strings length is done once - wrote this test code and got surprised: Nope! It's executed every single time! Both codes produce the same output:

ll called
h
ll called
e
ll called
l
ll called
l
ll called
o
ll called
len called
h
len called
e
len called
l
len called
l
len called
o
len called

I started learning Java roughly 20 years ago (plus/minus) - and in the book I used (no, it was not Head First Java) this style was all over the place. Wonder if the books author ever learned about that.
Now that I learned about this issue I wonder how many more languages suffer from this?

Anyway - @rilysh nice catch!

@amotin
Copy link
Member

amotin commented Oct 2, 2024

@n0xena for loop is expected to execute that function every time. The only way when compiler can optimize it is if it knows that the function has no side effects. I think some libraries and/or compilers can be more "clever" on that front than others. But it is probably better not to rely on it when possible.

@n0xena
Copy link

n0xena commented Oct 3, 2024

@amotin Thanks for the explanation. 👍

ptr1337 pushed a commit to CachyOS/zfs that referenced this pull request Nov 14, 2024
Compiling with -O0 (no proper optimizations), strlen() call
in loops for comparing the size, isn't being called/initialized
before the actual loop gets started, which causes n-numbers of
strlen() calls (as long as the string is). Keeping the length
before entering in the loop is a good idea.

On some places, even with -O2, both GCC and Clang can't
recognize this pattern, which seem to happen in an array
of char pointer.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: rilysh <nightquick@proton.me>
Closes openzfs#16584
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Accepted Ready to integrate (reviewed, tested)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants