Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

json_decode() and integer overflow strike back #1003

Closed
theDisco opened this issue Dec 8, 2015 · 25 comments
Closed

json_decode() and integer overflow strike back #1003

theDisco opened this issue Dec 8, 2015 · 25 comments

Comments

@theDisco
Copy link
Contributor

theDisco commented Dec 8, 2015

I have just installed version 3.0.0-beta1 after reading #717 and #941 in hope to get the integer overflow problem in json_decode() sorted out. After some debugging I have found following problems:

  1. In some of the older releases of php5-json Debian package JSON_BIGINT_AS_STRING is not implemented throwing a warning [Warning] json_decode(): option JSON_BIGINT_AS_STRING not implemented. Below my version table:
vagrant@dev-box:/srv/app$ apt-cache policy php5-json
php5-json:
  Installed: 1.3.2-2build1
  Candidate: 1.3.9-1+deb.sury.org~trusty+2
  Version table:
     1.3.9-1+deb.sury.org~trusty+2 0
        500 http://ppa.launchpad.net/ondrej/php5/ubuntu/ trusty/main amd64 Packages
 *** 1.3.2-2build1 0
        500 http://archive.ubuntu.com/ubuntu/ trusty/main amd64 Packages
        100 /var/lib/dpkg/status

After upgrading to 1.3.9 json_decode() does not complain about JSON_BIGINT_AS_STRING anymore. Maybe you should look at how firebase/php-jwt (found here) has solved the problem and implement it similarly?

  1. After sorting out the problem with json_decode() and JSON_BIGINT_AS_STRING I get following error now Fatal error: Cannot use object of type stdClass as array in /srv/app/vendor/ruflin/elastica/lib/Elastica/Response.php on line 121 which can be traced to this line. Fix is very simple, just replace false with true, but what is more important, I couldn't find a test for it anywhere in the test directory. I am more than happy to provide a PR for that.

  2. A smaller thing, but it would save me a lot of time in debugging. The default definition of bigintConversion config variable from here should be actually listed under config key, since it is obtained by the connection from the config in here. I can provide tests for that too.

Can you please confirm my findings?

@ruflin
Copy link
Owner

ruflin commented Dec 8, 2015

@theDisco Which PHP version are you using?

@theDisco
Copy link
Contributor Author

theDisco commented Dec 8, 2015

@ruflin here it is:

php -v
PHP 5.5.30-1+deb.sury.org~trusty+1 (cli) (built: Oct  4 2015 16:23:01) 
Copyright (c) 1997-2015 The PHP Group
Zend Engine v2.5.0, Copyright (c) 1998-2015 Zend Technologies
    with Zend OPcache v7.0.6-dev, Copyright (c) 1999-2015, by Zend Technologies
    with Xdebug v2.3.2, Copyright (c) 2002-2015, by Derick Rethans

@ruflin
Copy link
Owner

ruflin commented Dec 8, 2015

@theDisco Very interesting. I found some more information here which could be related: http://stackoverflow.com/questions/19520487/json-bigint-as-string-removed-in-php-5-5

@theDisco
Copy link
Contributor Author

theDisco commented Dec 8, 2015

@ruflin haha, that is one of the links I have provided in my initial comment :) Nevertheless it doesn't explain why 2) is failing and why 3) is implemented as it is?

@ruflin
Copy link
Owner

ruflin commented Dec 8, 2015

@theDisco Sorry, missed that one. Currently busy with something else but will have a closer look later. Perhaps it makes sense to add @oldskool to the conversation?

@ruflin
Copy link
Owner

ruflin commented Dec 9, 2015

@theDisco Here a more detailed answer:

  1. We initially thought it is only depending on the PHP version. But as you proofed above it isn't. The php-jwt implementation looks promising. At the same time this error only appears, if you enable JSON_BIGINT_AS_STRING right? So if you enable it, I assume they know quite well what they are doing and should use the appropriate versions of php and plugins.

  2. I tried to figure out why we set it here to false, but couldn't find a reason. It would be awesome if you could provide a PR with a fix and test.

  3. I'm not 100% sure what you mean by "config" key? Feel free to open also directly a PR for the suggested change as quite often code says more then a thousand words.

Thanks a lot for the detailed reporting and also the suggested solutions. Sorry again about my brief answer in the beginning.

@oldskool
Copy link
Contributor

oldskool commented Dec 9, 2015

Interesting find, thanks for reporting this @theDisco.

  1. We could add an additional check before calling JSON_BIGINT_AS_STRING to see if it's actually available. Like:
if ($this->getJsonBigintConversion() && defined('JSON_BIGINT_AS_STRING')) {
    $response = JSON::parse($response, false, 512, JSON_BIGINT_AS_STRING);
} else {
    $response = JSON::parse($response);
}
  1. (continued) Of course this is just a workaround for the message and doesn't really solve the problem at hand. Maybe we can play with the jwt implementation to see if that could work for this case as well.

  2. It's set to false, because that has always been the default value that PHP uses when json_decode is called (see $assoc var in the docs).

  3. I'm not sure that I follow. The $connection->hasConfig just checks if the $_config['bigintConversion'] key is set for your Client implementation. If you have a cleaner idea of handling that, please do feel free to create a PR for that.

@ruflin
Copy link
Owner

ruflin commented Dec 9, 2015

The main reason I don't like the suggestion for 1) above is that the user doesn't get what he configured. Elastica as a library should make sure not to do magic and the the developer gets what he configured. If he doesn't he should get an exception.

For 2) Thanks @oldskool for bringing that about. I remember I strongly suggested this change to have the default behaviour. What I'm asking myself now is what exactly goes wrong in 2, as this then should be unrelated to 1). It would also fail without JSON_BIGING_*?

@theDisco
Copy link
Contributor Author

theDisco commented Dec 9, 2015

@ruflin @oldskool

  1. I like the combined approach of checking, if JSON_BIGINT_AS_STRING was configured and is defined and if not throwing an exception. Will include that in my PR.

  2. I understand that false is the default value, but then we should cast to array, because the rest of the code does not know how to deal with \stdClass. I will prepare a PR for that and we can discuss it there.

  3. I didn't explain this one well enough. It boils down to these two cases. In this one bigintConversion is not passed to the connection correctly:

$config = [
    'host' => '127.0.0.1',
    'port' => 9200,
    'bigintConversion' => true,
];
new \Elastica\Client($config);

In this case it does:

$config = [
    'host' => '127.0.0.1',
    'port' => 9200,
    'config' => [
        'bigintConversion' => true,
    ],
];
new \Elastica\Client($config);

After debugging a little bit more, it seems bigintConversion has to be added here https://github.com/ruflin/Elastica/blob/master/lib/Elastica/Client.php#L121:

    /**
     * Creates a Connection params array from a Client or server config array.
     *
     * @param array $config
     *
     * @return array
     */
    protected function _prepareConnectionParams(array $config)
    {
        $params = array();
        $params['config'] = array();
        foreach ($config as $key => $value) {
            if (in_array($key, array('curl', 'headers', 'url'))) {
                $params['config'][$key] = $value;
            } else {
                $params[$key] = $value;
            }
        }

        return $params;
    }

@ruflin
Copy link
Owner

ruflin commented Dec 9, 2015

About 3, I think the confusing part is that the current implementation uses the transport(connection) object to set the variable for response object. This works but is ugly as it has to be set multiple times and is confusing as we see. The problem is: What is the better solution? I'm quite oposed to set global variables. So the question is how do we get it best from the client to the response object (which doesn't no about the client ...).

@theDisco
Copy link
Contributor Author

theDisco commented Dec 9, 2015

I don't think it is that bad, because the client acts as a factory and uses configuration to set up the underlying ecosystem. Transport should be primitive and only responsible for transferring the data between two endpoints. The response creation can be moved to the request object and request object can be composed of serializers which take care of json transformation. This way you would completely decouple the static calls to JSON object. Nevertheless this implementation exceeds the scope of the problem we are dealing with in here. I also think all of that is not necessary if you are going to replace the communication layer with php-elasticsearch (as stated in #944 and #989).

@ruflin
Copy link
Owner

ruflin commented Dec 9, 2015

@theDisco You are correct, especially with your last statement. Going with #944 will move the problem one layer up. I'm sure @polyfractal likes that :-D

@polyfractal
Copy link

Urgh. Json encoding is the bane of my existence in PHP. This issue came up at es-php recently too: elastic/elasticsearch-php#334

:'(

I'm not really sure the best way forward. As mentioned here, JSON_BIGINT_AS_STRING is not supported uniformly across versions and platforms. There are native-PHP solutions but they look very fragile (lots of regex) and probably slow. I don't like checking if the option is supported vs. not, since the client will behave inconsistently depending on your platform.

Personally, this feels like something that should be handled outside the library, either by the user dealing with BigInts ahead of time (since ES doesn't support BigInts either), or by subclassing the Serializer and providing one that is BigInt-aware (because you know you have BigInt support).

That said, I could add a BigIntSerializer, or ability to specify custom json_encode() options. Then the user (and/or Elastica) can select for themselves?

Open to suggestions :)

EDIT: Realized this is for json_decode, not encode. Meaning you dont have a chance to escape yourself. Sigh. Will think about the best of the options :(

@theDisco
Copy link
Contributor Author

theDisco commented Dec 9, 2015

I think detection, if BigInt is supported or not is a good thing. The goal of libraries like elastica or es-php is to hide the details of low or even high level implementations and let the user concentrate on the use case. I like the simplicity of just instantiating Elastica\Client and using it without providing and connection details. To be honest, if I wouldn't hit this bug, I would probably never know about the whole licensing hassle with json libraries and to be honest I wouldn't like to know about it in the first place, because it is very sad... php has already enough bad rep, anyway...

After saying that I think it is a good thing to have 2 serializers, one with BigInt support and one without. I would add a detection for BigInt support by default and let the user overwrite the serializer, if necessary. After all you want to solve things for the user and not make them more complicated as they already are. You should also think about aligning with the requirements, es-php is >= 5.3.9, elastica >= 5.4.0.

Let me know, what you think about that?

@polyfractal
Copy link

Regarding versions, ES-PHP 2.0 is explicitly PHP 5.4+, so that will be fine. I'm trying to move people off ES-PHP 1.x as soon as possible :)

I'll mull over the options and work something out. I agree this is a less than ideal situation for the end-user, who doesn't really care about how numbers are encoded at all.

And I agree this should probably be handled at the ES-PHP level, not Elastica :)

@ruflin
Copy link
Owner

ruflin commented Dec 9, 2015

@polyfractal Yep, fun problem. Until yesterday I thought JSON_BIGINT_AS_STRING is "solved" by going with PHP 5.4 but @theDisco proofed me wrong :-(

theDisco added a commit to theDisco/Elastica that referenced this issue Dec 9, 2015
…bigintConversion config parameter to the connection, provide tests for both
@theDisco
Copy link
Contributor Author

theDisco commented Dec 9, 2015

PR is ready, when can I expect it to become available through composer?

@ruflin
Copy link
Owner

ruflin commented Dec 9, 2015

@theDisco Just commented on it. What are you referencing to in composer? Master?

@theDisco
Copy link
Contributor Author

theDisco commented Dec 9, 2015

@ruflin Sorry, let me clarify, when will you create a tag I will be able to configure in composer.json? I haven't found any information on how you deal with versioning...

@ruflin
Copy link
Owner

ruflin commented Dec 9, 2015

@theDisco There is no predefined schedule. The main issue / pr I would like to get in before tagging 3.0.0 is this one: #1001 Unfortunately there is no PR yet.

@theDisco
Copy link
Contributor Author

theDisco commented Dec 9, 2015

@ruflin I was rather thinking about 3.0.0-beta2 :)

@ruflin
Copy link
Owner

ruflin commented Dec 9, 2015

The other thing I'm thinking about is actually going with 3.0.0 to bring the adoption to discover more bugs. I think currently lots of user stick with the old version.

@theDisco
Copy link
Contributor Author

theDisco commented Dec 9, 2015

@ruflin This works out really well for me, but you need to think about people, who are still using facets instead of aggregations. They will have to invest some time to refactor their code. You are completely right though, people should be encouraged to upgrade to newer versions.

@ruflin
Copy link
Owner

ruflin commented Dec 10, 2015

@theDisco Unfortunately it is much worse then people just using Facets. It's also about the PHP version and some people use even older ES versions like 0.90.0 ...

I'm closing this issue as the PR was merged. Thanks again also for bringing up all the points and take the time to discuss it.

@ruflin ruflin closed this as completed Dec 10, 2015
@theDisco
Copy link
Contributor Author

Thanks for taking your time and merging it so quick!

michalbrauner pushed a commit to BrandEmbassy/Elastica that referenced this issue Dec 4, 2017
… bigintConversion config parameter to the connection, provide tests for both
michalbrauner pushed a commit to BrandEmbassy/Elastica that referenced this issue Dec 4, 2017
… bigintConversion config parameter to the connection, provide tests for both
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants