
implementing signals as microservice deployments #49

Merged · 3 commits into master · Jul 23, 2018

Conversation

@magaldima (Contributor) commented Jul 19, 2018

This PR introduces a major shift in the architecture behind the argo-events signals. I know this PR has many changes, but I'll do my best to explain what is changing and why. I'll start with why.

Why?

  1. Flexibility & extensibility of signal sources. Users can build their own micro signal services without changing the argo-events code base, and can deploy whatever combination of signal services their use cases require.
  2. Zero downtime for signals that should always be present, e.g. webhooks (signals are now deployments, so users can configure them with more than one replica).
  3. Stateless signals simplify the signal code and make (2) possible.

What is this?
To get a better idea of what this change introduces, it would be helpful to understand the history of the argo-events signal functionality.

Signals started out (as of the v0.5-alpha1 release) as separate goroutines running inside individual "executor" pods; for each active sensor there was a running pod.

A couple of weeks ago, I merged a change to make certain signals (artifact, calendar, resource, webhook) run as separate goroutines within the sensor-controller pod, while the stream signals (nats, kafka, mqtt, amqp) run as separate processes "plugged in" to the controller binary via go-plugin. All of the signals implemented the same interface, so every signal was called the same way.

Now, this PR introduces separate signal deployments which register themselves as microservices available to any micro client. These services were also made stateless. In our case, the sensor-controller becomes the sole micro client and listens to signals via the gRPC Listen() stream.
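
To make the new shape concrete, here is a minimal, hypothetical sketch of the contract between the sensor-controller (the micro client) and a signal microservice. The type and method names below are illustrative only; the real argo-events service is defined in protobuf and served over gRPC via the micro framework.

```go
// Package signalsketch is an illustrative sketch only, not the actual
// argo-events API: it models the bi-directional Listen stream between the
// sensor-controller and a stateless signal microservice.
package signalsketch

import "context"

// Event is a simplified stand-in for an event emitted by a signal service
// (e.g. a webhook request, a calendar tick, or a message from a stream).
type Event struct {
	Source  string
	Payload []byte
}

// SignalContext is what the controller sends on the stream: the signal spec
// to start listening for, or a stop request to terminate it.
type SignalContext struct {
	Name string
	Stop bool
}

// SignalStream models one bi-directional stream between the controller and a
// stateless signal process.
type SignalStream interface {
	Send(*SignalContext) error // controller -> signal pod
	Recv() (*Event, error)     // signal pod -> controller
}

// SignalClient is what the sensor-controller would hold for each registered
// signal microservice; Listen opens the bi-directional stream.
type SignalClient interface {
	Listen(ctx context.Context) (SignalStream, error)
}
```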

@CLAassistant commented Jul 19, 2018

CLA assistant check: all committers have signed the CLA.

@magaldima (Contributor, Author)

closes #17

@shrinandj (Contributor)

Is it fair to say that, after this change:

  • Any K8s pod can register itself with argo-events as a signal. It will be that pod's job to figure out whether the signal has fired. Whenever it does, that pod will inform the sensor-controller that it has fired and the sensor-controller should execute the trigger. The signal pod should also determine when to resolve the signal (i.e. mark it completed and take some finalizing actions, e.g. stop the webserver).

@magaldima (Contributor, Author) commented Jul 19, 2018

@shrinandj just to clarify your understanding:

  1. Yes, any k8s pod can register itself as a microservice that the argo-events sensor-controller CAN use as a signal service. Whether it is used depends on the name of the registered service and the type of the signal.

  2. The sensor-controller is the Listener client, so when it processes a signal from a new sensor it initiates a bi-directional stream with a stateless process on a signal pod. The controller listens on the stream for events while the signal pod sends events on the stream. When the controller receives an event that satisfies its sensor requirements, it sends a message on the stream to terminate the signal. The signal pod then finishes any processing associated with that request context. So to correct you: the controller determines when to resolve the signal, while the signal pod takes the necessary actions, e.g. stopping the webserver. A rough sketch of this loop follows below.
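
A minimal sketch of the controller-side loop described in (2), assuming hypothetical stream and event types (the real argo-events code uses the generated gRPC stream types, so names and signatures will differ):

```go
// Illustrative sketch of the controller's listen loop; not the actual
// sensor-controller implementation.
package signalsketch

import (
	"errors"
	"io"
)

// Event is a simplified stand-in for an event received from a signal pod.
type Event struct{ Payload []byte }

// stream is a minimal stand-in for the bi-directional gRPC stream.
type stream interface {
	Recv() (*Event, error) // events sent by the signal pod
	SendStop() error       // controller -> signal pod: terminate the signal
}

// listenUntilSatisfied consumes events from the signal pod until the sensor's
// constraint is satisfied, then tells the signal pod to stop so it can clean
// up (e.g. shut down its webserver).
func listenUntilSatisfied(s stream, satisfied func(*Event) bool) error {
	for {
		ev, err := s.Recv()
		if err == io.EOF {
			return errors.New("signal stream closed before the sensor was satisfied")
		}
		if err != nil {
			return err // connection failure: the signal node gets marked with an error
		}
		if satisfied(ev) {
			return s.SendStop()
		}
	}
}
```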

@magaldima changed the title from "implementing signals as microservice deployments" to "WIP - implementing signals as microservice deployments" on Jul 19, 2018
@spk83 commented Jul 20, 2018

@magaldima how are we handling signal pod errors/failures here? A signal pod would notify the controller of errors while it's running, but what would happen if the signal pod is in a crash loop? Would the controller detect the disconnection and make a decision based on that?

@magaldima (Contributor, Author) commented Jul 20, 2018

@spk83 When a signal pod dies, all of its bi-directional streams to the sensor-controller become disconnected. When this happens, the controller's goroutines listening on those streams receive an error notifying them that the context has failed or the connection was closed, and the affected signal nodes are updated with an error. During the next processing loop of the affected sensor, the controller will attempt to re-establish a connection stream with the signal pod (up to the retry limit of 3 for a sensor in an error phase). Therefore, if the signal pod is in a crash loop, the controller would try 3 times to make a connection and, after failing, would not re-queue the sensor. The sensor resource and the signal node would remain in an error phase.

Assuming the signal pod requires user intervention to fix the issue, once it becomes stable again the user would have to re-queue or re-create the sensors so that the controller can operate on them.
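
For illustration, the bounded-retry behaviour described above might look roughly like this (the retry limit of 3 comes from the comment; the function and type names are hypothetical, not the actual controller code):

```go
// Illustrative sketch only; not the actual sensor-controller implementation.
package signalsketch

import "fmt"

const maxSignalRetries = 3

// reconnectSignal retries a broken signal connection a bounded number of
// times. If every attempt fails, the error is returned and the sensor is
// left in its error phase until a user re-queues or re-creates it.
func reconnectSignal(connect func() error) error {
	var err error
	for attempt := 1; attempt <= maxSignalRetries; attempt++ {
		if err = connect(); err == nil {
			return nil
		}
		fmt.Printf("signal connection attempt %d/%d failed: %v\n", attempt, maxSignalRetries, err)
	}
	return fmt.Errorf("signal still unreachable after %d attempts: %v", maxSignalRetries, err)
}
```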

@shrinandj previously approved these changes on Jul 20, 2018
@magaldima force-pushed the integrate-micro-for-service-plugins branch from a6a46d6 to b2b8b80 on July 20, 2018 at 21:42
@magaldima changed the title from "WIP - implementing signals as microservice deployments" to "implementing signals as microservice deployments" on Jul 20, 2018
@magaldima merged commit 3e00ae1 into master on Jul 23, 2018
@magaldima deleted the integrate-micro-for-service-plugins branch on July 23, 2018 at 12:54
@shrinandj (Contributor)

After this commit, does each of the signal Docker images need to be deployed in the cluster individually?

juliev0 pushed a commit to juliev0/argo-events that referenced this pull request on Mar 29, 2022:
* implementing signals as microservice deployments, making signals resilient to pod failures and moving escalation logic after requeue failures

* updating Makefile and CONTRIBUTING guide

* fixing failing tests