-
Notifications
You must be signed in to change notification settings - Fork 7.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RFC] Only unserialize Phar metadata when getMetadata() is called #5855
Conversation
@bishopb @smalyshev - this took longer than I thought. Tests were passing locally.
|
43cceb3
to
880ea80
Compare
Is this one ready for review? It looks like the new parameter for getMetadata from the RFC is not implemented. |
I'd gotten distracted when investigating how Phars track metadata internally, and the best way to refactor it to implement the main part of this RFC. This should be implemented by Friday or earlier (implementation details can be copied from unserialize()). |
To clarify what I plan to implement:
On second thought, https://externals.io/message/111093#111121 sound like a much better idea if future options are changed in unserialize() in the future (e.g. the 'max_depth' option already added in php 8.0, or future options such as 'forbid_references') |
https://wiki.php.net/rfc/phar_stop_autoloading_metadata#proposal has been amended as version 0.4 with the intent to change the signature to |
880ea80
to
1dfc39d
Compare
That's implemented now, and tests passed locally. |
a2bfcc3
to
2aa66a1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some initial notes.
ext/phar/phar.c
Outdated
/* Assert it should not be possible to create raw data in a persistent phar (i.e. from cache_list) */ | ||
|
||
ZEND_ASSERT(has_unserialize_options || Z_ISUNDEF(tracker->val)); | ||
if (has_unserialize_options && tracker->str == NULL) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, is the idea here that we can call getMetadata multiple times with different unserialize options and because of that have to go through an artificial serialize + unserialize cycle here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like this is expending unnecessary effort and complexity for something nobody is going to care about. I would just spec the function to say "returns the metadata if already unserialized, or unserializes it with the given options if not".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, is the idea here that we can call getMetadata multiple times with different unserialize options and because of that have to go through an artificial serialize + unserialize cycle here?
Only if there wasn't serialized data. But in the latest commit, I realize this is unnecessary - both setMetaData and reading a phar file ensure there is serialized data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did that a few days ago but forgot to push that.
I also added another commit
Improve robustness of `Phar*->setMetadata()`
Add sanity checks for edge cases freeing metadata, when destructors
or serializers modify the phar recursively.
Typical use cases of php have read-only phars (from the ini setting) and would not be affected.
Apart from the one unresolved comment, this all looks okay-ish, but I also have pretty much zero familiarity with phar and the issues that may be involved here. Maybe @bishopb wants to take a look at this as well. |
a496a62
to
f2c1329
Compare
In other words, don't automatically unserialize when the magic phar:// stream wrappers are used. RFC: https://wiki.php.net/rfc/phar_stop_autoloading_metadata Also, change the signature from `getMetadata()` to `getMetadata(array $unserialize_options = [])`. Start throwing earlier if setMetadata() is called and serialization threw. See https://externals.io/message/110856 and https://bugs.php.net/bug.php?id=76774 This was refactored to add a phar_metadata_tracker for the following reasons: - The way to properly copy a zval was previously implicit and undocumented (e.g. is it a pointer to a raw string or an actual value) - Avoid unnecessary serialization and unserialization in the most common case - If a metadata value is serialized once while saving a new/modified phar file, this allows reusing the same serialized string. - Have as few ways to copy/clone/lazily parse metadata (etc.) as possible, so that code changes can be limited to only a few places in the future. - Performance is hopefully not a concern - copying a string should be faster than unserializing a value, and metadata should be rare in most cases. Remove unnecessary skip in a test(Compression's unused) Add additional assertions about usage of persistent phars Improve robustness of `Phar*->setMetadata()` - Add sanity checks for edge cases freeing metadata, when destructors or serializers modify the phar recursively. - Typical use cases of php have phar.readonly=1 and would not be affected.
da612a3
to
28417b7
Compare
I updated the UPGRADING documentation and squashed the commits and rebased with no other changes. At the direction of the release manager for php 8.0, I'll be merging this PR this evening in the absence of major issues. https://externals.io/message/111286 (there are currently 24 yes votes and 0 no votes)
To avoid potential merge conflicts, the update to NEWS was left out of this PR |
In other words, don't automatically unserialize when the magic
phar:// stream wrappers are used.
RFC: https://wiki.php.net/rfc/phar_stop_autoloading_metadata
See https://externals.io/message/110856 and
https://bugs.php.net/bug.php?id=76774
This was refactored to add a phar_metadata_tracker for the following reasons:
(e.g. is it a pointer to a raw string or an actual value)
this allows reusing the same serialized string.
so that code changes can be limited to only a few places in the future.
than unserializing a value, and metadata should be rare in most cases.