Implement file-based leader election method #90

nojima · 2024-06-18T08:52:42Z

We are planning to deploy yrmcds to our Kubernetes cluster.

Currently, yrmcds uses virtual_ip to detect whether an instance is the master. However, virtual IPs are difficult to use in Kubernetes clusters, so I implemented a file-based leader election method. Strictly speaking, this PR does not implement a leader election algorithm; it implements a way for yrmcds to receive the result of a leader election.

This PR introduces a new option leader_election_method:

# method of leader election. "virtual_ip" or "file".
# * virtual_ip:
#     The node that owns the virtual_ip address becomes the master.
#     "virtual_ip" must be set.
#     "master_host" and "master_file" are ignored.
# * file:
#     The node that has "master_file" becomes the master.
#     "master_host" and "master_file" must be set.
#     "virtual_ip" is ignored.
leader_election_method = "virtual_ip"
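
The constraints stated in the comments above could be checked when the configuration is loaded. The following is a minimal sketch of such a check, not the code from this PR; the names election_settings and validate are hypothetical and exist only for illustration.

```cpp
#include <optional>
#include <string>

// Mirrors the two documented values of leader_election_method.
enum class leader_election_method { virtual_ip, file };

// Hypothetical container for the options discussed in this PR.
struct election_settings {
    leader_election_method method = leader_election_method::virtual_ip;
    std::optional<std::string> virtual_ip;
    std::optional<std::string> master_host;
    std::optional<std::string> master_file;
};

// Returns an error message, or std::nullopt when the settings are consistent.
// Options documented as "ignored" for a given method are not inspected.
std::optional<std::string> validate(const election_settings& s) {
    switch (s.method) {
    case leader_election_method::virtual_ip:
        if (!s.virtual_ip)
            return std::string("\"virtual_ip\" must be set");
        break;
    case leader_election_method::file:
        if (!s.master_host)
            return std::string("\"master_host\" must be set");
        if (!s.master_file)
            return std::string("\"master_file\" must be set");
        break;
    }
    return std::nullopt;
}
```

With these assumptions, a configuration that selects file but omits master_file is rejected up front instead of failing later at runtime.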

When leader_election_method is file, a yrmcds instance is considered to be the master when a file exists at the path specified by master_file.

When leader_election_method is virtual_ip, the current behavior is unchanged.

This PR also adds a new option, master_host, which is the address slaves use to connect to the current master.
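
As a rough illustration of the file-based check, the sketch below treats the node as master exactly when the path exists. It assumes that plain existence is the whole test; the function name master_file_exists is hypothetical and is not taken from the yrmcds source.

```cpp
#include <filesystem>
#include <string>
#include <system_error>

// Returns true when the file named by master_file exists.
// The non-throwing overload is used so that a filesystem error is
// reported as "not master" instead of terminating the process.
bool master_file_exists(const std::string& master_file) {
    std::error_code ec;
    return std::filesystem::exists(master_file, ec);
}
```

An external leader elector (for example, a sidecar) would create or remove the file, and yrmcds would only observe the result.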

Minor fixes

This PR also fixes some minor issues I found while implementing it.

When a slave node is disconnected from the master for more than 5 seconds, it clears all the data it stores. That is, if a leader election takes more than 5 seconds, all data is lost.
When name resolution fails in tcp_connect, yrmcds crashes without retrying. This is an issue because we use dynamically created/updated Service resources to connect to the master node in a Kubernetes cluster.
When a slave cannot be promoted to master, yrmcds outputs too many log messages, which are not useful.

nojima · 2024-07-18T05:11:30Z

@ymmt2005
May I ask for a review?

ymmt2005 · 2024-07-18T05:14:50Z

@nojima
sure thing. Any deadline in mind?

nojima · 2024-07-18T05:38:12Z

@ymmt2005
There are no specific deadlines, as this is not an urgent task. Please review it when you are available.

ymmt2005 · 2024-07-18T12:37:37Z

@nojima

When name resolution fails in tcp_connect, yrmcds crashes without retrying. This is an issue because we use dynamically created/updated Service resources to connect to the master node in a Kubernetes cluster.

Is this really a problem? I guess Kubernetes Pod restarts yrmcdsd if it crashes.

ymmt2005

Looks mostly okay.
Please update docs/design.md to align with the new design.

ymmt2005 · 2024-07-18T13:43:01Z

src/config.hpp

@@ -52,7 +57,7 @@ class counter_config {
 class config {
 public:
    // Setup default configurations.
-    config(): m_vip("127.0.0.1"), m_tempdir(DEFAULT_TMPDIR) {
+    config(): m_vip(std::optional("127.0.0.1")), m_tempdir(DEFAULT_TMPDIR) {


Shouldn't we initialize m_leader_election_method here as well?
It looks inconsistent to initialize the field elsewhere.

nojima · Jul 19, 2024

m_leader_election_method is initialized with the following line:

yrmcds::leader_election_method m_leader_election_method = yrmcds::leader_election_method::virtual_ip;

ymmt2005 · Jul 19, 2024

I know. I thought it was inconsistent.

nojima · Jul 23, 2024

I will move all member initializations to the member declarations.
I think they are easier to read when grouped together than when split between the constructor and the member declarations.

ymmt2005 · 2024-07-18T13:51:35Z

src/counter/handler.cpp

-        m_reactor.add_resource(
-            make_server_socket(g_config.vip(), port, w, true),
-            cybozu::reactor::EVENT_IN);
+        if( g_config.vip() ) {


I guess this is always true because m_vip is initialized to "127.0.0.1".
Am I right?

nojima · Jul 19, 2024

Oh, that's right.

ymmt2005 · Jul 19, 2024

Since m_vip is always set, I think using optional for it might be confusing and lead to bugs.

I'll leave it to you to decide how to fix it.

nojima · Jul 23, 2024

When leader_election_method is file, yrmcds actually has no virtual IP. Therefore, I feel that the type of m_vip should be std::optional to model the situation correctly.

What is wrong is the initial value of m_vip. It should be initialized to nullopt and should hold the user-specified value or 127.0.0.1 only in virtual_ip mode.

nojima · Jul 24, 2024

↑ The comment above was wrong.

Since the initial value of leader_election_method is virtual_ip, the initial value of m_vip must be 127.0.0.1.
Therefore, we must clear m_vip when leader_election_method is changed to anything other than virtual_ip.
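
The conclusion of this thread can be sketched as follows. This is a simplified stand-in for the real config class, assuming the virtual IP can be modelled as a std::string; the names config_sketch and apply are hypothetical. It also keeps the defaults next to the member declarations, as proposed in the src/config.hpp thread.

```cpp
#include <optional>
#include <string>
#include <utility>

enum class leader_election_method { virtual_ip, file };

// Simplified stand-in for the yrmcds config class (illustration only).
class config_sketch {
public:
    // Applies the election method together with an optional user-supplied VIP.
    // Outside virtual_ip mode the VIP is cleared, so callers that test
    // `if( cfg.vip() )` really skip listening on it.
    void apply(leader_election_method method,
               std::optional<std::string> user_vip) {
        m_method = method;
        if (method == leader_election_method::virtual_ip) {
            m_vip = std::move(user_vip).value_or(default_vip);
        } else {
            m_vip.reset();
        }
    }

    const std::optional<std::string>& vip() const noexcept { return m_vip; }
    leader_election_method method() const noexcept { return m_method; }

private:
    static constexpr const char* default_vip = "127.0.0.1";

    // Defaults are consistent with each other: virtual_ip mode has a VIP.
    leader_election_method m_method = leader_election_method::virtual_ip;
    std::optional<std::string> m_vip = std::string(default_vip);
};
```

Under this sketch, the check `if( g_config.vip() )` questioned in the review is no longer always true: it is false whenever the method is file.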

ymmt2005 · 2024-07-18T14:05:19Z

src/memcache/handler.cpp

+        fd = cybozu::tcp_connect(master_host.c_str(), g_config.repl_port());
+    } catch( std::runtime_error& err ) {
+        logger::error() << "Failed to connect to the master (" << master_host << "): " << err.what();
+        m_reactor.run_once();


What does this intend to do? A comment would be helpful.

nojima · Jul 19, 2024

The existing error handling already called it, so I called it in the code I added as well:

if( fd == -1 ) { m_reactor.run_once(); return false; }

I'm not aware of the original intent of that code, so this is only a guess: since signal handling and other work depend on the reactor, the reactor should be run from time to time while reconnecting to the master.

ymmt2005 · Jul 19, 2024

I see. Thanks.
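
The pattern discussed in this thread can be sketched in isolation. The example below is an assumption-laden illustration, not the handler code: try_connect_once is a hypothetical helper, and the connect and reactor steps are passed in as callables so the block stays self-contained. A failed attempt, whether reported by an exception or by a -1 descriptor, runs the reactor once and reports failure so the caller can retry.

```cpp
#include <functional>
#include <iostream>
#include <stdexcept>

// Performs one connection attempt.
// Returns the connected descriptor, or -1 when the attempt failed.
int try_connect_once(const std::function<int()>& connect,
                     const std::function<void()>& run_reactor_once) {
    int fd = -1;
    try {
        fd = connect();
    } catch (const std::runtime_error& err) {
        // e.g. a name resolution failure; log it and fall through to retry.
        std::cerr << "Failed to connect to the master: " << err.what() << '\n';
    }
    if (fd == -1) {
        // Give the reactor a chance to handle signals and other events
        // before the caller tries again.
        run_reactor_once();
        return -1;
    }
    return fd;
}
```

Both failure paths end in an explicit return, so a failed attempt can never continue with an invalid descriptor.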

ymmt2005 · 2024-07-18T14:06:19Z

src/memcache/handler.cpp

-        m_reactor.add_resource(
-            make_server_socket(g_config.vip(), g_config.port(), w, true),
-            cybozu::reactor::EVENT_IN);
+        if( g_config.vip() ) {


Ditto. This is always true, IIUC.

nojima · 2024-07-19T03:04:51Z

@ymmt2005

When name resolution fails in tcp_connect, yrmcds crashes without retrying. This is an issue because we use dynamically created/updated Service resources to connect to the master node in a Kubernetes cluster.

Is this really a problem? I guess Kubernetes Pod restarts yrmcdsd if it crashes.

I consider this to be critical.

Consider the behavior of yrmcds running as a slave when the master dies: the slave checks whether it can promote itself to master, and if it cannot do so for 5 seconds, it attempts to connect to the current master. If the connection fails, the slave returns to the loop of trying to be promoted to master.

If the pod crashes when connecting to master, all data held by the slave will be lost at that point.
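
The loop described above can be sketched as a small state machine. This is an illustration under assumptions, not the yrmcds implementation: wait_for_role and outcome are hypothetical names, the promotion check and the connection attempt are injected as callables, and the 5-second window and polling interval are parameters so the sketch can be exercised quickly.

```cpp
#include <chrono>
#include <functional>
#include <thread>

enum class outcome { promoted, following };

// Repeats: poll for promotion during `wait`, then try to reach the master.
// A failed connection does not end the process; the loop simply starts over,
// so in-memory data held by the slave is kept.
outcome wait_for_role(
    const std::function<bool()>& can_promote,
    const std::function<bool()>& connect_to_master,
    std::chrono::milliseconds wait = std::chrono::seconds(5),
    std::chrono::milliseconds poll = std::chrono::milliseconds(100)) {
    using clock = std::chrono::steady_clock;
    for (;;) {
        const auto deadline = clock::now() + wait;
        while (clock::now() < deadline) {
            if (can_promote())
                return outcome::promoted;
            std::this_thread::sleep_for(poll);
        }
        if (connect_to_master())
            return outcome::following;
        // Connection failed: go back to waiting for promotion.
    }
}
```

If the connection step crashed the process instead of returning false, the loop would never get a second chance and the slave's data would be gone, which is the failure described in this comment.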

ymmt2005 · 2024-07-19T03:09:23Z

If the pod crashes when connecting to master, all data held by the slave will be lost at that point.

Makes sense. Thank you.

ymmt2005

LGTM

nojima · 2024-07-24T09:17:48Z

I want to merge #91 first, so I'll hold off on merging this PR for a while.

nojima marked this pull request as draft June 18, 2024 08:52

nojima changed the title ~~JTASK-1062: implement file-based leader election method~~ Implement file-based leader election method Jul 12, 2024

nojima added 4 commits July 12, 2024 16:23

Implement file-based leader election method

3347d33

Prevent slave from deleting data when unable to connect to master

98e49b4

Always retry connection to master

659b219

Reduce wasteful logs

6137800

nojima force-pushed the JTASK-1062 branch from 26d79d0 to 6137800 July 12, 2024 07:24

nojima marked this pull request as ready for review July 18, 2024 05:08

ymmt2005 requested changes Jul 18, 2024

nojima added 2 commits July 24, 2024 10:48

clear m_vip when leader_election_method != virtual_ip

05b092d

add missing return statement

32ad158

ymmt2005 approved these changes Jul 24, 2024
