-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PCRE2: optimize memory allocations #15395
base: master
Are you sure you want to change the base?
PCRE2: optimize memory allocations #15395
Conversation
Wraps the system API (pthread on unix, FLS on windows) for thread local storage (TLS) or thread specific storage (TSS). Allows to register a destructor to automatically cleanup the thread local values when a thread terminates.
Both are merely scratchpad for the current thread to execute the current, blocking, PCRE2 match. There are no concurrency issues. We only need to make sure that only one thread can access both allocations at any one time. Keeping a thread local is simpler and faster than using Crystal::ThreadLocalValue and we don't need to keep live references to GC allocated objects (both are allocated using malloc).
Damn, that was promising, and it's working well on Linux, but both Darwin and Windows completely broke 😭 Windows could be explained by using |
WTH: I can't reproduce the errors in my Windows VM (MSVC) 😕 |
I can reproduce when running the compiler specs. Looking at the memory graph, the committed memory keeps growing rapidly, and it starts failing exactly when reaching 32GB (the VM has 16GB of RAM). That can't be a coincidence. When running the compiler specs from master, the committed memory is rather stable or grows slowly (around 3.5GB). Let's note that a linker command still fails, but I get a bunch of weird spec errors in the VM. Maybe |
We noticed in #15088 that we don't need
Crystal::ThreadLocalValue
in the Regex PCRE2 engine.We can reuse the JIT stack but also the match data for every Regex (no need for a specific match data per Regex). We can allocate one and make sure it's not used by other threads... hence the thread locals: no more spinlock (thread contention) nor hash.
It's simpler and faster. Here's the benchmark from #13144 for example:
Enabling MT also no longer has any impact on performance:
The drawback is that we must allocate each matchdata with a maximum number of ovectors (65535). That might increase memory usage, though I failed to notice it in practice. Maybe not allocating memory for every regular expression is helping?
Note: this PR will be separated into a couple PRs to introduce
Crystal::System::ThreadLocal(T)
. The point of this new type is for this patch, so I want approval on the overall approach before the split.I could have used
@[ThreadLocal]
but some targets don't support it (namely: Android, MinGW and OpenBSD) and we can't register destructors either (on thread shutdown). But usingpthread_key_create
orFlsAlloc
we can 😍