Handing out topology information to CrateDB clients #170
Hi @amotl, There is currently a k8s load balancer in front of all the CrateDB nodes, so round-robin is already happening in that sense, although it is L4 and so would balance on the TCP connection level only. Are you imagining we do smarter routing on a per-request basis (i.e. maybe based on the URL)? HAProxy in front would be possible, but it's a very non-trivial thing to do (as we get the current LB for free from k8s, but would need to configure HAProxy ourselves). Do we see some scalability challenge in the future with this setup? Regards,
Not sure what we would gain from changing to HAProxy on k8s. Certainly it would be an option for on-premise setups. Exposing each node by its own public IP possibly makes us bad neighbors on a cloud, as we would allocate a lot of IPs. In any case, we would give up the health check on the backend service.
We could technically expose each node as a different port on the LB ( |
Hi Romanas and Walter, thanks for your answers and thoughts about this. The issue has just been created here to add something to the topic @SBlechmann, @c0c0n3 and @chicco785 are discussing at orchestracities/ngsi-timeseries-api#452. As I am not very much into the details of efficiently operating a CrateDB cluster within a Kubernetes environment, I decided to reach out to you and ask for your opinion about this.
This doesn't sound bad at all as it will probably also be able to handle both HTTP connections to port 4200 as well as PostgreSQL wire protocol connections to port 5432?
No, not at all. We just shared some thoughts with @mfussenegger and @seut the other day and concluded that - if there would be demand for that - HAProxy would be able to apply more sophisticated balancing algorithms.
I completely understand that.
I hear you. So, the conclusion is that all balancing should already be handled by the clustering infrastructure and the CrateDB client will just communicate with a single endpoint, right? With kind regards,
So, when imagining a CrateDB cluster comprised of nodes having different roles (e.g. read-only nodes vs. equally shared roles), the K8s load balancer in front of all the CrateDB nodes will have to be made aware where to distribute the requests to, right? That might be only a subset of all CrateDB nodes, right? I am just curious about this topic: Will |
Hey Andreas,
Yes it would, but that's not really possible with the current setup, as we don't offer different kinds of nodes...
The
Correct. In fact it already does - all CrateDB k8s clusters are reachable on both ports using the same LB. You will get round-robingly assigned to one of the nodes for the duration of your connection.
I wasn't around when this was being built, but that's my understanding. I think there is a lot of merit in revisiting this in the future, especially if we can take it further and make the client itself aware of the topology of the cluster, i.e. I would think that writing directly to the node that has the primary shard for the data you're inserting would be more optimal? Cheers,
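To illustrate what "assigned to one of the nodes for the duration of your connection" means in practice, here is a minimal sketch of connection-level round-robin. It is a toy model only, not how the k8s load balancer is actually implemented; the class names and the node names `crate-0` to `crate-2` are assumptions made up for the example.

```python
from itertools import cycle


class Connection:
    """A connection pinned to the node it was assigned when opened."""

    def __init__(self, node):
        self.node = node

    def request(self, statement):
        # Every request on this connection goes to the same node.
        return (self.node, statement)


class ConnectionLevelBalancer:
    """Toy L4-style balancer: a node is chosen once per connection,
    not once per request."""

    def __init__(self, nodes):
        self._nodes = cycle(nodes)

    def open_connection(self):
        return Connection(next(self._nodes))


# Hypothetical node names, for illustration only.
lb = ConnectionLevelBalancer(["crate-0", "crate-1", "crate-2"])
first = lb.open_connection()
second = lb.open_connection()

served_by_first = [first.request(s)[0] for s in ("SELECT 1", "SELECT 2")]
```

In this model, a client that keeps one long-lived connection always talks to the same node, which is the difference to per-request balancing raised earlier in the thread.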
Hi Romanas, thanks again for sharing more insights about this topic.
I believe that is perfect and exactly the very thing @SBlechmann and @c0c0n3 were discussing at orchestracities/ngsi-timeseries-api#452 (comment) ff. If I get you right, every necessary step is already automated and a CrateDB client will just have to connect to a single endpoint (the LB) in order to have its requests distributed amongst the cluster nodes. I believe that is all I primarily wanted to gain from this discussion. Regarding my detours to "role-based" cluster nodes, where the cluster topology is more advanced, I completely understand that this is currently beyond the scope of With kind regards,
That's one heck of a job though :-) I can't wait to give crate-operator a try in our clusters. Not sure when, but hopefully in a not so distant future... |
Hi @c0c0n3, thanks for already recognizing the addendum I posted at orchestracities/ngsi-timeseries-api#452 (comment). I just wanted to make clear that the With kind regards, |
Hi there,
at orchestracities/ngsi-timeseries-api#452, we are having a nice discussion about how to properly populate the list of CrateDB database URIs to connect to when using the HTTP protocol.
I discussed that with @mfussenegger and @seut already and they told me that a round-robin-like distribution mechanism is implemented in crate-python. However, the same thing can also be implemented by using some K8s ingress technologies or by just using a dedicated HAProxy (containerized or not) as a more sophisticated HTTP load balancer in order to apply more advanced balancing mechanisms [1].
So, I wanted to take that chance to bring up this topic here if you see a chance to also bring that functionality to crate-operator or an appropriate extension somehow or if that wouldn't align with the role of crate-operator at all.
I am just imagining something like whether crate-operator might be able to provide cluster topology information to clients (or HAProxy instances) in order to populate their list of database URIs to connect to. I have to admit that I don't know much about the scope of crate-operator yet but will be happy to learn more about it.
With kind regards,
Andreas.
P.S.: This topic is obviously not only limited to load balancing HTTP connections. When aiming at the PostgreSQL interface, respective topology information might want to be used to seed PgBouncer and friends.
[1] http://cbonte.github.io/haproxy-dconv/2.3/configuration.html#4-balance
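As a rough sketch of what "providing topology information to HAProxy instances" could look like, the snippet below renders a HAProxy backend section from a list of nodes. This is not something crate-operator provides; the function name, the backend name, and the node names and addresses are all hypothetical. Only the HAProxy keywords `backend`, `mode`, `balance`, `server` and `check` are taken from the HAProxy configuration manual.

```python
def render_haproxy_backend(name, nodes, port, mode="http", algorithm="roundrobin"):
    """Render a HAProxy backend section for the given (node_name, host) pairs."""
    lines = [
        f"backend {name}",
        f"    mode {mode}",
        f"    balance {algorithm}",
    ]
    for node_name, host in nodes:
        # "check" enables HAProxy's health checking for this server.
        lines.append(f"    server {node_name} {host}:{port} check")
    return "\n".join(lines)


# Hypothetical topology, as it might be handed out by some discovery mechanism.
topology = [("crate-0", "10.0.0.10"), ("crate-1", "10.0.0.11")]

http_backend = render_haproxy_backend("cratedb_http", topology, 4200)
pg_backend = render_haproxy_backend(
    "cratedb_pg", topology, 5432, mode="tcp", algorithm="leastconn"
)
print(http_backend)
```

The same topology list is used for both the HTTP port 4200 and the PostgreSQL port 5432; the latter would presumably need `mode tcp`, since the PostgreSQL wire protocol is not HTTP.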