Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement file-based leader election method #90

Open
wants to merge 6 commits into
base: master
Choose a base branch
from
Open

Conversation

nojima
Copy link
Collaborator

@nojima nojima commented Jun 18, 2024

We are planning to deploy yrmcds to our Kubernetes cluster.

Currently yrmcds uses virtual_ip to detect whether the instance is master or not. However, virtual IPs are difficult to utilize in Kubernetes clusters. Therefore, I implemented the file-based leader election method. Strictly speaking, this PR does not implement leader election algorithm, but the way for yrmcds to receive the result of a leader election.

This PR introduces a new option leader_election_method:

# method of leader election. "virtual_ip" or "file".
# * virtual_ip:
#     The node that owns the virtual_ip address becomes the master.
#     "virtual_ip" must be set.
#     "master_host" and "master_file" are ignored.
# * file:
#     The node that has "master_file" becomes the master.
#     "master_host" and "master_file" must be set.
#     "virtual_ip" is ignored.
leader_election_method = "virtual_ip"

When leader_election_method is file, a yrmcds instance is considered to be master when a file exists on the certain path (specified by master_file).

When leader_election_method is virtual_ip, the current behavior remains.

This PR also adds a new option master_host, which is a address used by slaves to connect to the current master.

Minor fixes

This PR also fixes minor issues I found during implementing this PR.

  1. When a slave node disconnected from master longer than 5 seconds, it clears all data it stores. That is, when a leader election takes more than 5 seconds, all data are lost.
  2. When an error occurred in a name resolution in tcp_connect, yrmcds crushes without retry. This is an issue because we use dynamically created/updated Service resources to connect to the master node on Kubernetes cluster.
  3. When slave cannot to promote to the master, yrmcds outputs too many logs, which are not useful.

@nojima nojima marked this pull request as draft June 18, 2024 08:52
@nojima nojima changed the title JTASK-1062: implement file-based leader election method Implement file-based leader election method Jul 12, 2024
@nojima nojima marked this pull request as ready for review July 18, 2024 05:08
@nojima
Copy link
Collaborator Author

nojima commented Jul 18, 2024

@ymmt2005
May I ask for a review?

@ymmt2005
Copy link
Member

@nojima
sure thing. Any deadline in mind?

@nojima
Copy link
Collaborator Author

nojima commented Jul 18, 2024

@ymmt2005
There are no specific deadlines, as this is not an urgent task. Please review it when you are available.

@ymmt2005
Copy link
Member

@nojima

When an error occurred in a name resolution in tcp_connect, yrmcds crushes without retry. This is an issue because we use dynamically created/updated Service resources to connect to the master node on Kubernetes cluster.

Is this really a problem? I guess Kubernetes Pod restarts yrmcdsd if it crashes.

Copy link
Member

@ymmt2005 ymmt2005 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks mostly okay.
Please update docs/design.md to align with the new design.

src/config.hpp Outdated
@@ -52,7 +57,7 @@ class counter_config {
class config {
public:
// Setup default configurations.
config(): m_vip("127.0.0.1"), m_tempdir(DEFAULT_TMPDIR) {
config(): m_vip(std::optional("127.0.0.1")), m_tempdir(DEFAULT_TMPDIR) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we initialize m_leader_election_method here as well?
It looks inconsistent to initialize the field elsewhere.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

m_leader_election_method is initialized with the following line:

yrmcds::leader_election_method m_leader_election_method = yrmcds::leader_election_method::virtual_ip;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know. I thought it was inconsistent.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will move all member initializations to the declaration statement.
I think it is easier to read when they are grouped together rather than separated in the constructor and member declarations.

m_reactor.add_resource(
make_server_socket(g_config.vip(), port, w, true),
cybozu::reactor::EVENT_IN);
if( g_config.vip() ) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is always true because m_vip is initialized to "127.0.0.1".
Am I right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, that's right.

Copy link
Member

@ymmt2005 ymmt2005 Jul 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since m_vip is always set, I think using optional for this might be confusing and lead to bugs.

I'll leave this to you to figure out how to fix it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When leader_election_method is file, yrmcds actually has no virtual IPs. Therefore, I feel that the type of m_vip should be std::optional to model the situation correctly.

What is wrong is the initial value of m_vip. It should be initialized to be nullopt and should only have the user-specified value or 127.0.0.1 when in virtual_ip mode.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

↑This was wrong.

Since the initial value of leader_election_method is virtual_ip, the initial value of m_vip must be 127.0.0.1.
Therefore, we must empty m_vip when leader_election_method is changed to other than virtual_ip.

fd = cybozu::tcp_connect(master_host.c_str(), g_config.repl_port());
} catch( std::runtime_error& err ) {
logger::error() << "Failed to connect to the master (" << master_host << "): " << err.what();
m_reactor.run_once();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this intend to do? A comment would be helpful.

Copy link
Collaborator Author

@nojima nojima Jul 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was called in the existing error handling, so I called it in the code I added.

    if( fd == -1 ) {
        m_reactor.run_once();
        return false;
    }

I'm not aware of the original intent of the code, so I'm just guessing: since signal handling, etc., depends on the reactor, we should sometimes run the reactor during re-connecting to the master.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Thanks.

m_reactor.add_resource(
make_server_socket(g_config.vip(), g_config.port(), w, true),
cybozu::reactor::EVENT_IN);
if( g_config.vip() ) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto. This is always true, IIUC.

@nojima
Copy link
Collaborator Author

nojima commented Jul 19, 2024

@ymmt2005

When an error occurred in a name resolution in tcp_connect, yrmcds crushes without retry. This is an issue because we use dynamically created/updated Service resources to connect to the master node on Kubernetes cluster.

Is this really a problem? I guess Kubernetes Pod restarts yrmcdsd if it crashes.

I consider this to be critical.

Consider the behavior of yrmcds running as a slave when master dies: the slave checks whether it can promote itself to master, and if it cannot promote itself to master for 5 seconds, it attempts to connect to the current master. If the connection fails, the slave returns to the loop of trying to be promoted to master again.

If the pod crashes when connecting to master, all data held by the slave will be lost at that point.

@ymmt2005
Copy link
Member

If the pod crashes when connecting to master, all data held by the slave will be lost at that point.

Makes sense. Thank you.

Copy link
Member

@ymmt2005 ymmt2005 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@nojima
Copy link
Collaborator Author

nojima commented Jul 24, 2024

I want to merge #91 first, so I'll leave this PR merge for a while.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants