Replies: 1 comment
---
Thanks @porscheme

1. As you can see from the profile/explain output, the query starts with a seek.
2. Due to the query/storage separation design, it is costly to fetch large amounts of data from storage into the query engine when some of the filters/limits cannot be pushed down to the storage side.
   2.1 On the NebulaGraph side, introducing more optimization rules and storage pushdown operators would help (progress: #2533). Here I can see that Filter/Limit is really costly, so there may be some room for optimization.
   2.2 On the query-composing side, adding more information to reduce the data being traversed would help:
      2.2.1 …
      2.2.2 Another approach could be to limit the traversal in the middle rather than only in the final phase:
         i. It could be something like the sketch after this list; if you check its plan, the limit is applied in the first part of the traversal.
         ii. Or, going even further, we could use …
   2.3 On the super-node perspective: when a few vertices may be connected to tons of vertices, and the queries only target sampled (limit/topN) data instead of fetching all of it, or we want to truncate the data for those super nodes, a configuration in storaged can help (see the config sketch after this list).

@Shylock-Hg, do you see any optimization rule that could help here in 2.1, please?
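The inline example referenced in 2.2.2.i did not survive extraction; below is a minimal sketch of the mid-traversal limit idea in nGQL. The `player`/`follow`/`serve`/`team` schema is an assumption borrowed for illustration, not the poster's actual data model, and `MATCH` on a tag assumes a tag index exists.

```ngql
// Hypothetical schema for illustration: (player)-[:follow]->(player)-[:serve]->(team).
// Rather than limiting only in the final RETURN, cap the intermediate
// result with WITH ... LIMIT so the second hop expands from at most
// 100 rows instead of the full first-hop result set.
// Assumes a tag index on `player` so MATCH can locate start vertices.
MATCH (v:player)-[:follow]->(v2:player)
WITH DISTINCT v2 LIMIT 100
MATCH (v2)-[:serve]->(v3:team)
RETURN v3 LIMIT 10;
```

Running this with `PROFILE`/`EXPLAIN` should show the limit applied in the first part of the traversal rather than only at the end.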
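For 2.3, the specific configuration name was cut off in the original text. As an assumption to verify: the NebulaGraph 2.x storaged documentation lists `max_edge_returned_per_vertex` (truncate the edges returned per vertex) and `enable_reservoir_sampling` (sample which edges are kept) for dealing with super nodes; check whether your version (v3.1.0) still ships these flags before relying on them.

```ini
# nebula-storaged.conf -- super-node truncation (flag names taken from the
# NebulaGraph 2.x storaged docs; availability on v3.1.0 is an assumption to verify)

# Return at most this many edges for any single vertex:
--max_edge_returned_per_vertex=10000

# Sample the kept edges with reservoir sampling instead of taking the first N:
--enable_reservoir_sampling=true
```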
---
Nebula version: v3.1.0
- graphd: 1 (128 GB RAM, 2 TB SSD)
- metad: 1 (128 GB RAM, 2 TB SSD)
- storaged: 3 (128 GB RAM, 2 TB SSD)
The query below took about 20 minutes:

Below is its profile:

I changed my query as shown below; there was a little improvement, but not enough:

Below is the profile of the changed query:
@wey-gu