Distributed Queries #2202
Refactored the query engine to use a separate processing pipeline for raw queries. Queries with a large offset no longer keep everything in memory, and queries against raw data that have a limit process only up to that limit and then bail out. In the map phase, raw data queries read only up to a certain point before yielding to the engine for further processing. Fixes #2029 and fixes #2030.
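A minimal sketch of the limit/offset behavior described in this commit message, not the code from this PR. The `point` type, the `next` iterator and `limitedRaw` are hypothetical names introduced only for illustration. It assumes points arrive one at a time and shows how skipping the offset and stopping at the limit avoids holding the full result set in memory.

```go
package main

// point is a hypothetical stand-in for one raw data point.
type point struct {
	Time  int64
	Value float64
}

// limitedRaw pulls points from next one at a time, discards the first
// offset points, and returns at most limit points. It stops calling next
// as soon as the limit is reached, so nothing beyond offset+limit points
// is ever read.
func limitedRaw(next func() (point, bool), offset, limit int) []point {
	var out []point
	skipped := 0
	for len(out) < limit {
		p, ok := next()
		if !ok {
			break // source exhausted before the limit was reached
		}
		if skipped < offset {
			skipped++ // discard without buffering
			continue
		}
		out = append(out, p)
	}
	return out
}
```

With a source of 1000 points, `limitedRaw(next, 5, 3)` reads only the first eight points: five are skipped and three are returned.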
Fixes issue #1649
No doubt about it, URL routing is definitely brittle and needs work.
With a max remote response of 1GB, testing is unable to proceed.
}

// request to start streaming results
resp, err := http.Post(m.dataNodes[0].URL.String()+"/run_mapper", "application/json", bytes.NewReader(b))
Should this be m.dataNodes[rand.Intn(len(m.dataNodes))] to avoid a single node handling all the traffic? Or is m.dataNodes already randomized beforehand?
I was thinking exactly the same thing myself.
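The suggestion in this thread could look roughly like the sketch below. It is not the PR's code: `pickNode` is a hypothetical helper, and plain URL strings stand in for the entries of `m.dataNodes`. It assumes the list is not already shuffled, and it guards the empty case because `rand.Intn` panics when its argument is not positive.

```go
package main

import (
	"errors"
	"math/rand"
)

// pickNode returns one entry of urls chosen uniformly at random, so that
// repeated requests are spread across nodes instead of always hitting
// the first one. It returns an error when the list is empty.
func pickNode(urls []string) (string, error) {
	if len(urls) == 0 {
		return "", errors.New("no data nodes available")
	}
	return urls[rand.Intn(len(urls))], nil
}
```

The caller would then build the request URL from the returned value plus "/run_mapper" instead of indexing element 0.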
Two comments but LGTM otherwise.