Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Serve metrics under a dedicated port #48

Merged
merged 5 commits into from
Mar 7, 2018

Conversation

linki
Copy link
Member

@linki linki commented Feb 28, 2018

etcd v3.3 introduced a new flag to allow serving /metrics and /health under a different port than e.g. /v2/keys. This allows us to protect etcd's data via firewall rules but still let monitoring tools to access the monitoring information.

See feature request in etcd repo: etcd-io/etcd#8060.
The implementation landed in v3.3: etcd-io/etcd#8242

This PR instructs etcd to serve metrics and health under the additonal port 2381 unconditionally when the used etcd binary is >=v3.3.x. However, if not explicitely set in the senza.yaml this port won't be mapped to the outside and therefore isn't accessible. It doesn't expose more information than anything under 2379 already does. See our intended usage here: zalando-incubator/kubernetes-on-aws#879.

Note: This fails when bundled with etcd lower than v3.3. Furthermore, serving this additional endpoint unconditionally should be safe but might not be desired. It keeps the implementation very simple, though. LMKWYT

/cc @CyberDem0n @aermakov-zalando @mikkeloscar

@mikkeloscar
Copy link
Member

lgtm

etcd.py Outdated
@@ -253,6 +255,8 @@ def etcd_arguments(self, data_dir, initial_cluster, cluster_state):
self.peer_url,
'-listen-client-urls',
'http://0.0.0.0:{}'.format(self.client_port),
'-listen-metrics-urls',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It breaks compatibility with old etcd, because the same script is used for different images containing different version.
Moreover, it even breaks upgrade 3.2 => 3.3.

Those arguments should be applied conditionally, i.e. only if etcd version is >= 3.3.
You can get version from environment: ETCDVERSION.

@linki
Copy link
Member Author

linki commented Mar 1, 2018

@CyberDem0n Thanks.

It now looks at the ETCDVERSION env variable and only adds the -listen-metrics-urls flag when the version is greater than or equal to v3.3.x

etcd.py Outdated
# this section handles etcd version specific flags
etcdversion = os.environ.get('ETCDVERSION')
if etcdversion:
etcdversion = list(map(lambda x: int(x), etcdversion.split('.')))
Copy link

@aermakov-zalando aermakov-zalando Mar 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

etcdversion = [int(x) for x in etcdversion.split('.')]

You could also use tuples which are immutable (tuple(int(x) for x in etcdversion.split('.'))), but don't forget to change the comparison to match.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I changed it.

etcd.py Outdated
if etcdversion:
etcdversion = list(map(lambda x: int(x), etcdversion.split('.')))
# don't try to add additonal flags if we encounter an unexpected version format
if len(etcdversion) == 3:
Copy link

@aermakov-zalando aermakov-zalando Mar 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this is worth it TBH. If they decide to do a 3.3.1.2, this will suddenly break instead of continuing. I'd just check that the length is >= 2, or even drop the check altogether.

etcd.py Outdated
# don't try to add additonal flags if we encounter an unexpected version format
if len(etcdversion) == 3:
# etcd >= v3.3: serve metrics on an additonal port
if etcdversion[0] >= 3 and etcdversion[1] >= 3:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

etcdversion >= [3, 3]

@szuecs
Copy link
Member

szuecs commented Mar 5, 2018

@CyberDem0n can we get this merged?

@szuecs
Copy link
Member

szuecs commented Mar 5, 2018

👍

etcd.py Outdated
@@ -263,6 +266,20 @@ def etcd_arguments(self, data_dir, initial_cluster, cluster_state):
cluster_state
]

# this section handles etcd version specific flags
etcdversion = os.environ.get('ETCDVERSION')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is still one corner case left, it breaks upgrade from 3.2 to 3.3 :(
During the upgrade first it starts the old binary in order to get data, than it stops it and starts the new binary.
Current change will try to apply -listen-metrics-urls to the old binary and fail if the old version is 3.2

In order to solve it, let's introduce one more argument to this method, run_old and pass its value from https://github.com/linki/stups-etcd-cluster/blob/0a914150be28b93ff77b56747b0cf09a010138dc/etcd.py#L474:

    return self.me.etcd_arguments(self.DATA_DIR, peers, cluster_state, self.run_old)

And the line 270 has to be changed to:

    etcdversion = os.environ.get('ETCDVERSION_PREV' if run_old else 'ETCDVERSION')

etcd.py Outdated
@@ -241,8 +243,9 @@ def delete_member(self, member):
self.adjust_security_groups('revoke_ingress', member)
return result

def etcd_arguments(self, data_dir, initial_cluster, cluster_state):
return [
def etcd_arguments(self, data_dir, initial_cluster, cluster_state, run_old)):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-    def etcd_arguments(self, data_dir, initial_cluster, cluster_state, run_old)):
+    def etcd_arguments(self, data_dir, initial_cluster, cluster_state, run_old):

this is the reason why tests are failing

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

damn, sorry about that.

@CyberDem0n
Copy link
Contributor

👍

@linki
Copy link
Member Author

linki commented Mar 6, 2018

@CyberDem0n let me quickly test on our cluster that it still works as expected before merging.

@linki
Copy link
Member Author

linki commented Mar 6, 2018

@CyberDem0n ok, works with 3.3.

@CyberDem0n
Copy link
Contributor

I need a second 👍

@linki
Copy link
Member Author

linki commented Mar 7, 2018

@CyberDem0n Did it work for you as well?

@CyberDem0n
Copy link
Contributor

I didn't tried. In order to build image it needs to be merged and we need a second approval for that

@mikkeloscar
Copy link
Member

👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants