Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Zstd CPU overhead optimizations #11709

Open
wants to merge 12 commits into
base: master
Choose a base branch
from
Open
9 changes: 6 additions & 3 deletions lib/libzstd/Makefile.am
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@ include $(top_srcdir)/config/Rules.am
VPATH = $(top_srcdir)/module/zstd

# -fno-tree-vectorize is set for gcc in zstd/common/compiler.h
# Set it for other compilers, too.
# Set it for other compilers, too.
AM_CFLAGS += -fno-tree-vectorize

noinst_LTLIBRARIES = libzstd.la
Expand All @@ -14,8 +14,11 @@ KERNEL_C = \

nodist_libzstd_la_SOURCES = $(KERNEL_C)

lib/zstd.$(OBJEXT): CFLAGS += -fno-tree-vectorize -include $(top_srcdir)/module/zstd/include/zstd_compat_wrapper.h -Wp,-w
lib/zstd.l$(OBJEXT): CFLAGS += -fno-tree-vectorize -include $(top_srcdir)/module/zstd/include/zstd_compat_wrapper.h -Wp,-w
# see also: module/zstd/Makefile.in for the
# kernel-build equivalents of these rules

lib/zstd.$(OBJEXT): CFLAGS += -fno-tree-vectorize -include $(top_srcdir)/module/zstd/include/zstd_compat_wrapper.h -Wp,-w -I$(top_srcdir)/module/zstd/include
lib/zstd.l$(OBJEXT): CFLAGS += -fno-tree-vectorize -include $(top_srcdir)/module/zstd/include/zstd_compat_wrapper.h -Wp,-w -I$(top_srcdir)/module/zstd/include

zfs_zstd.$(OBJEXT): CFLAGS += -include $(top_srcdir)/module/zstd/include/zstd_compat_wrapper.h
zfs_zstd.l$(OBJEXT): CFLAGS += -include $(top_srcdir)/module/zstd/include/zstd_compat_wrapper.h
Expand Down
3 changes: 3 additions & 0 deletions module/zstd/Makefile.in
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,9 @@ obj-$(CONFIG_ZFS) := $(MODULE).o
asflags-y := -I$(zstd_include)
ccflags-y := -I$(zstd_include)

# see also: lib/libzstd/Makefile.am for the
# userspace-build equivalents of these rules

# Zstd uses -O3 by default, so we should follow
ccflags-y += -O3

Expand Down
21 changes: 21 additions & 0 deletions module/zstd/include/string.h
Original file line number Diff line number Diff line change
Expand Up @@ -55,6 +55,27 @@ extern "C" {
#include_next <string.h>
#endif /* _KERNEL */

/*
* GCC/Clang doesn't know that the kernel's memcpy (etc)
* are standards-compliant so it can't be sure that it
* can inline its own optimized copies without altering
* behavior. This is particularly significant for
* zstd which, for example, expects a fast inline memcpy
* in its inner loops. Explicitly using the __builtin
* versions permits the compiler to inline where it
* considers this profitable, falling back to the
* kernel's memcpy (etc) otherwise.
*/
#if defined(_KERNEL) && defined(__GNUC__) && __GNUC__ >= 4
#define ZSTD_memcpy(d, s, l) __builtin_memcpy((d), (s), (l))
#define ZSTD_memmove(d, s, l) __builtin_memmove((d), (s), (l))
#define ZSTD_memset(p, v, l) __builtin_memset((p), (v), (l))
#else
#define ZSTD_memcpy(d, s, l) memcpy((d), (s), (l))
#define ZSTD_memmove(d, s, l) memmove((d), (s), (l))
#define ZSTD_memset(p, v, l) memset((p), (v), (l))
#endif

Copy link
Contributor

@mjguzik mjguzik Oct 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to ask what was done here to confirm compiler builtins are not used as is. Note general kernel code in both FreeBSD and Linux is using them, so something special would have to be happening in zstd code for that to not happen.

I did a quick check by adding this on FreeBSD:

diff --git a/sys/contrib/openzfs/module/zstd/lib/common/xxhash.c b/sys/contrib/openzfs/module/zstd/lib/common/xxhash.c
index 597de18fc895..c3254a1ce646 100644
--- a/sys/contrib/openzfs/module/zstd/lib/common/xxhash.c
+++ b/sys/contrib/openzfs/module/zstd/lib/common/xxhash.c
@@ -86,6 +86,9 @@ static void XXH_free (void* p) { free(p); }
#include <string.h>
static void* XXH_memcpy(void* dest, const void* src, size_t size) { return memcpy(dest,src,size); }

+void foomemset(char *buf);
+void foomemset(char *buf) { memset(buf, 0, 20); }
+

... and confirmed foomemset is using the builtin at least on that platform, I don't have handy manner to check Linux right now.

Regardless of what's going on, replacing memcpy et al uses with ZSTD_ prefixed variants is in my opinion too problematic -- for one one will have to keep a look out on each update.

In absolute worst case this can redefine these routines, but as noted previously this should not even be necessary. If things fail on Linux, there is probably a missing include.

Copy link
Contributor Author

@adamdmoss adamdmoss Oct 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to ask what was done here to confirm compiler builtins are not used as is.

The kernel's out-of-line memcpy() etc were hot on perftop; their being visible at all there means they're non-builtins.

replacing memcpy et al uses with ZSTD_ prefixed variants is in my opinion too problematic -- for one one will have to keep a look out on each update.

Upstream zstd already did this

#ifdef __cplusplus
}
#endif
Expand Down
Loading