Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implement getenv and putenv in go #1086

Merged
merged 23 commits into from
Oct 18, 2024
Merged

implement getenv and putenv in go #1086

merged 23 commits into from
Oct 18, 2024

Conversation

withinboredom
Copy link
Collaborator

Inspired by the discussion in #1080, this is an ultra-risky implementation that should fix #915. Basically, Go uses RWLocks to prevent multiple threads from writing to the environment, while PHP uses TSRM. TSRM and Go don't know about each other and if one happens to be reading the environment while the other is writing to it, a segfault is bound to happen.

So... this PR replaces the built-in getenv and putenv with ones that defer to Go. See #1080 discussions for why this may not be the best solution, but maybe it is a 'good enough' solution to prevent segfaults from concurrent environmental accesses in Go and PHP. If we are happy with this approach, this will need many tests to ensure the behavior is exactly the same as the built-in ones.

For example, a missing feature is setting the TZ environment variable which should trigger a reset of the current timezone... however, that would bleed into other threads currently running, so it isn't ideal. In fact, it might be good that it doesn't work.

frankenphp.go Outdated Show resolved Hide resolved
frankenphp.go Outdated Show resolved Hide resolved
frankenphp.go Outdated Show resolved Hide resolved
frankenphp.go Outdated Show resolved Hide resolved
frankenphp.go Outdated Show resolved Hide resolved
frankenphp.go Outdated Show resolved Hide resolved
@withinboredom
Copy link
Collaborator Author

In theory, we could call the go-side to fill the track_vars array. However, I believe this would end up conflicting with #1080 and in any case, would be an optimization for after that is merged.

@withinboredom
Copy link
Collaborator Author

withinboredom commented Oct 12, 2024

huh. I guess I will have to do that since ASAN found go/php stepping on each other.

frankenphp.go Outdated Show resolved Hide resolved
frankenphp.go Show resolved Hide resolved
Copy link
Owner

@dunglas dunglas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also rebase please?

This seems almost ready to merge.

frankenphp.c Outdated Show resolved Hide resolved
frankenphp.go Outdated Show resolved Hide resolved
frankenphp.c Outdated Show resolved Hide resolved
@AlliBalliBaba
Copy link
Collaborator

I think I realized, why $_ENV sometimes behaves weird #1061 . We can fix that in one swoop or a separate PR after this one.

Basically:

  • PHP has the following superglobals: _GET _POST _COOKIE _SERVER _ENV _REQUEST and _FILES.

  • _SERVER, _ENV and _REQUEST are special in that they have a 'jit' functionality, where they only get loaded when they are encountered on script compilation (this can be disabled via auto_globals_jit = Off)

  • Since in worker mode we do not recompile scripts, these 3 globals are never reloaded.

FrankenPHP does actually reload $_SERVER here. The other 2 are destroyed here, but their reference as a global variable in the EG(symbol_table) remains.

We should probably not destroy $_ENV and reset it in the EG(symbol_table) after each request.

We could also fix $_REQUEST as it's currently always empty in worker mode with auto_globals_jit = On. That will come at a performance penalty though (we can also just document the behaviour).

@withinboredom
Copy link
Collaborator Author

Yes @AlliBalliBaba, I think after this PR, we are in a place were we can implement the env in a php-fpm compatible way (if it isn't already). We should do some testing on fpm + frankenphp and identify any differences between them. Your PR gets us most of the way there, but this one just unifies Go/PHP environment access. I think there is still more to be done that is out of this PR's scope.

@withinboredom
Copy link
Collaborator Author

In other words, after this PR, the environment is entirely managed in go (whether it actually gets the environment or from a map), so we can do whatever we need to. There might be a slight performance penalty, due to the cgo call, but getting it right in a higher level language will probably be easier before we optimize back to c (if we even need to).

@withinboredom
Copy link
Collaborator Author

Not actually sure how to test $_ENV due to test pollution (previous tests causing $_ENV to be populated when we reach the test where we want to test $_ENV), so I've removed that part of the test.

frankenphp.c Outdated Show resolved Hide resolved
frankenphp.c Outdated
@@ -472,7 +563,28 @@ int frankenphp_update_server_context(
return SUCCESS;
}

PHPAPI void get_full_env(zval *track_vars_array) {
struct go_getfullenv_return full_env = go_getfullenv(thread_index);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can probably cache the environment somewhere here as a thread local (or even global) zend_array, so it will only get imported once. I'll check later this week again, but I think that's how FPM also behaves with since putenv doesn't alter $_ENV or $_SERVER

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should probably do it on the go side and not mix source of truths, but yeah.

Copy link
Collaborator

@AlliBalliBaba AlliBalliBaba Oct 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah even in the manual someone mentions how putenv doesn't affect $_ENV. Since $_ENV is unchanging for the duration of the process I think it would be most efficient to directly store it as a zend_array at the start of the process. I can also do that in a follow-up PR if you want.

putenv and getenv should always access the real environment though like in your implementation 👍

Copy link
Collaborator Author

@withinboredom withinboredom Oct 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is working as it is supposed to. This only gets called in the following scenarios:

  1. on each request in cgi-mode when $_SERVER is filled
  2. at worker startup in worker mode to fill $_SERVER
  3. getenv() without any parameters
  4. jit-ENV when $_ENV is first accessed

There shouldn't be any reason to cache this.

@withinboredom
Copy link
Collaborator Author

This should be good to merge!

@withinboredom withinboredom merged commit e812473 into main Oct 18, 2024
45 checks passed
@withinboredom withinboredom deleted the go-env branch October 18, 2024 11:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Segmentation violation when running parallel E2E tests
3 participants