Performance problem on large datasets #244
Hi,

You could pass the family codes as query variables instead of inlining them:

```graphql
query ($codes: [String!]!) {
  families(codes: $codes) {
    code
    attributes {
      code
    }
  }
}
```

variables:

```json
{"codes": [
  ... 100 family codes
]}
```

This should ease the parser work.
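The point of the variables approach is that the query document stays byte-identical across requests, so a server (or client) only has to parse it once and can cache the parsed AST; only the variables payload changes. A minimal sketch of the request payload, in Python for illustration (the `{"query": ..., "variables": ...}` body is the common GraphQL-over-HTTP convention; the helper name here is made up):

```python
import json

# The query document is constant across requests; only variables change.
QUERY = """
query ($codes: [String!]!) {
  families(codes: $codes) {
    code
    attributes { code }
  }
}
"""

def make_request(codes):
    # Conventional GraphQL-over-HTTP body: {"query": ..., "variables": ...}.
    # A parsed-AST cache can be keyed on the (unchanging) query string.
    return json.dumps({"query": QUERY, "variables": {"codes": codes}})

r1 = make_request(["family_%d" % i for i in range(100)])
r2 = make_request(["other_%d" % i for i in range(100)])

# Same document both times, so the parse work can be done once and reused.
assert json.loads(r1)["query"] == json.loads(r2)["query"]
print("query document identical across requests")
```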
Hello,

Unfortunately, it does not improve the performance. Do you have any feedback from other projects about such performance problems?
GraphQL is a better fit for highly structured content (when you may request lots of different subsets of data). It is not the best fit for situations where you only query for a list with a bunch of flat attributes (unfiltered). Custom REST endpoints will always outperform GraphQL in such a scenario because, well, you don't need most of the features of GraphQL for such queries!

As for performance issues: there is indeed an overhead for each field. We do have plans to adjust common scenarios to use short evaluation paths and memoization, but there is no ETA for this feature. Actually, it is better to watch the graphql-js thread you mentioned, as they also experiment with such optimizations and we'll likely follow the same route they eventually choose.

But also make sure your data layer is optimized, as it is (usually) the bottleneck. Say, how many requests to the underlying store do you make? How is your internal data structured? Etc.
Actually, I'm just doing 4 SQL requests. It costs "only" 260 ms when requesting all the data of the families. My SQL requests use the powerfulness of the new … Combined with a dataloader, the results are really good.

The family REST endpoint, on the other hand, can trigger more than 300 SQL requests for such families. Of course, when I request only some fields, GraphQL outperforms REST. To be honest, I would bet on a 3-4x improvement with GraphQL, as the SQL requests + hydration are highly optimized.

Thanks for the reply.
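For context on how a dataloader collapses per-field lookups into a handful of queries: the sketch below (plain Python with made-up names, not the actual overblog/dataloader-php API) batches every key requested during one resolution pass into a single backend call:

```python
class TinyDataLoader:
    """Minimal batching loader: collect keys, fetch once, cache results.
    Illustration only -- not the real dataloader-php API."""

    def __init__(self, batch_fn):
        self.batch_fn = batch_fn   # maps a list of keys -> list of values
        self.cache = {}            # per-request cache, keyed by key
        self.queue = []

    def load(self, key):
        # Queue the key instead of hitting the backend immediately.
        if key not in self.cache:
            self.queue.append(key)
            self.cache[key] = None  # placeholder until dispatch

    def dispatch(self):
        if self.queue:
            values = self.batch_fn(self.queue)  # ONE backend round trip
            self.cache.update(zip(self.queue, values))
            self.queue = []

    def get(self, key):
        return self.cache[key]

backend_calls = []

def fetch_attributes(codes):
    backend_calls.append(list(codes))           # record each "SQL request"
    return [f"attrs-of-{c}" for c in codes]

loader = TinyDataLoader(fetch_attributes)
for code in [f"family_{i}" for i in range(100)]:
    loader.load(code)                           # one load per resolved family
loader.dispatch()

assert len(backend_calls) == 1                  # 100 loads -> 1 backend call
print(loader.get("family_0"))                   # attrs-of-family_0
```

This is why the GraphQL side can get away with 4 SQL requests where a naive per-entity REST implementation fires hundreds.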
Do you use the default field resolver? It is hard to say anything without seeing some code, or better, a reproducible example (since the data layer is not the issue, I guess we can just stub it with arrays to reproduce reliably).
It's a proof of concept, but the code is in this bundle: https://github.com/ahocquard/pim-community-dev/blob/poc-the-poc/src/Pim/Bundle/ResearchBundle/PimResearchBundle.php

It's not related to any other code of the whole software at this moment, so you can focus on it. I'm using a dedicated resolver.

I will try to provide an example with the same volume of data and with fake repositories.
I tried to reproduce this issue on your codebase, but couldn't. Check out https://github.com/vladar/graphql-perf-debug and run it on your machine. My laptop takes about 0.43 seconds to execute the query against an in-memory store (with dataloader, etc.) on PHP 7.2 (but make sure to run it several times to populate the PHP cache).

The result contains 100 entries, each containing 100 attributes (10,000 total). Maybe I've missed something, so feel free to adjust the code to reproduce.
And make sure you have xdebug (and other similar tools) disabled when testing. |
Closing this for now. But if you have fresh details, feel free to re-open! |
Hello,

Really sorry for the delay, I was pretty busy.

I had the same results as yours with https://github.com/vladar/graphql-perf-debug (~400 ms). The extra overhead of the SQL requests is about 250 ms.

And I found the reason: the test you did was creating the same attributes for all the families. If you have completely different attribute codes for each family, the result is pretty bad: ~6.5 s on my computer (meaning 10,000 different attributes). You can test it here:

In conclusion, the GraphQL implementation is pretty fast for most needs, but it can be pretty slow on large datasets, and it depends on the distribution of the data.
Thanks for the further details!

Actually, I did some profiling, and the bottleneck is not graphql but dataloader. Those lines are to blame: https://github.com/overblog/dataloader-php/blob/master/src/CacheMap.php#L63-L71

They make every cache operation O(N) instead of O(1), with obvious consequences. I did a quick change locally to just use an array key instead of looping over all cache entries, and the total time for your test_2.php dropped from 6 s to ~0.7 s. Even here, the overhead above 0.4 s is caused by other tools, not graphql.

There is no special caching right now on the graphql side of things, so for graphql it is irrelevant whether you have 100 identical attributes or 100 different ones.

@mcg-web Can you share some thoughts on why you guys use looping instead of a lookup in the lines above, and can you change this?
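The difference between the two strategies can be sketched in plain Python (hypothetical classes, not the actual CacheMap code): a cache that scans all entries on every lookup does O(N) work per operation, so filling it with N distinct keys costs O(N²) comparisons overall, while an array-key/hash lookup keeps each operation O(1). That is why 10,000 distinct attribute codes hurt while 100 shared ones barely register:

```python
class ScanCacheMap:
    """Linear-scan cache: loops over all entries on each lookup (O(N))."""
    def __init__(self):
        self.entries = []          # list of (key, value) pairs
        self.comparisons = 0

    def get(self, key):
        for k, v in self.entries:  # O(N) scan on every call
            self.comparisons += 1
            if k == key:
                return v
        return None

    def set(self, key, value):
        self.entries.append((key, value))

class HashCacheMap:
    """Array-key / dict cache: O(1) average per operation."""
    def __init__(self):
        self.entries = {}

    def get(self, key):
        return self.entries.get(key)

    def set(self, key, value):
        self.entries[key] = value

N = 1000                             # scaled down from the 10,000 in the thread
scan, fast = ScanCacheMap(), HashCacheMap()
for i in range(N):
    key = f"attribute_{i}"
    scan.get(key); scan.set(key, i)  # miss then insert, like a loader cache
    fast.get(key); fast.set(key, i)

# Each miss scanned all existing entries: N*(N-1)/2 comparisons in total.
print(scan.comparisons)              # 499500
```

With distinct keys the scan version degrades quadratically, which matches the jump from ~0.4 s to ~6.5 s once every family got its own attributes.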
@vladar thank you for the feedback. We should give this a deeper look, but we'll fix it as fast as possible 👍.
So this is actually not a problem of this lib; closing.
Hello,
It seems I have some performance problems on large datasets with the library.
My request is:
Inside each family, I have 100 attributes.

It takes more than 1 second to respond. It seems that the GraphQL implementation calls `resolveField` for each field, and that it's pretty slow. And when I want to display all the properties in a family, it takes more than 6 seconds.
In this case, the REST API only takes 2 seconds. Theoretically, the REST API should be slower because, with GraphQL, I use the dataloader, drastically reducing the number of requests to the database.
I tried to figure out the performance of the library by using your `phpbench` tests, but resolving 10,000 fields takes only 200 ms with the generated schema. So I'm wondering if I did something badly or if GraphQL is pretty slow on large datasets.
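As a rough mental model of that benchmark: a default field resolver does little more than one lookup per field, so 10,000 raw resolver calls are cheap on their own. The sketch below is a toy imitation in Python (not graphql-php's actual `resolveField`), counting the fields in the 100-families × 100-attributes shape from this issue:

```python
def default_field_resolver(source, field_name):
    # Toy model of a default resolver: one dict lookup per field.
    # Not graphql-php's actual resolveField implementation.
    return source.get(field_name)

# 100 families, each with 100 attributes, mirroring the dataset in the issue.
families = [
    {"code": f"family_{i}",
     "attributes": [{"code": f"attr_{i}_{j}"} for j in range(100)]}
    for i in range(100)
]

resolved_fields = 0
for fam in families:
    default_field_resolver(fam, "code"); resolved_fields += 1
    for attr in default_field_resolver(fam, "attributes"):
        default_field_resolver(attr, "code"); resolved_fields += 1

# 100 x (1 family code + 100 attribute codes) = 10,100 leaf fields.
print(resolved_fields)   # 10100
```

Since the raw per-field work is this small, a large gap between the phpbench figure and real-world timings usually points at per-field bookkeeping elsewhere (caching, dataloader, hydration) rather than field resolution itself.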
I saw that this problem is common across different languages (graphql/graphql-js#723). I can provide a Blackfire profile if you want.
Thanks for any advice.