
implementing signals as microservice deployments #49

Merged · 3 commits into master · Jul 23, 2018

Conversation

@magaldima (Contributor) commented Jul 19, 2018

This PR introduces a major shift in the architecture behind the argo-events signals. I know this PR has many changes, but I'll do my best to explain what is changing and why. I'll start with why.

Why?

  1. Flexibility & extensibility of signal sources. Users can build their own micro signal services without changing the argo-events code base, and can deploy whatever combination of signal services their use cases require.
  2. Zero downtime for signals that should always be present, e.g. webhooks (signals are now deployments, so users can configure them with more than one replica).
  3. Stateless signals simplify the signal code and make (2) possible.

What is this?
To get a better idea of what this change introduces, it would be helpful to understand the history of the argo-events signal functionality.

Signals started out (as of the v0.5-alpha1 release) as separate goroutines running inside individual "executor" pods; for each active sensor there was a running pod.

A couple of weeks ago, I merged a change to make certain signals (artifact, calendar, resource, webhook) run as separate goroutines within the sensor-controller pod, while the stream signals (nats, kafka, mqtt, amqp) run as separate processes "plugged in" to the controller binary via go-plugin. All of the signals implemented the same interface, so every signal was called the same way.

Now, this PR introduces separate signal deployments which register themselves as microservices available to any micro client. These services were also made stateless. In our case, the sensor-controller becomes the sole micro client and listens to signals via the gRPC Listen() stream.
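
To make the new shape concrete, here is a minimal, hypothetical sketch of the contract between the sensor-controller (the micro client) and a signal microservice. The type and method names below are illustrative only; the real argo-events service is defined in protobuf and served over gRPC via the micro framework.

```go
// Package signalsketch is an illustrative sketch only, not the actual
// argo-events API: it models the bi-directional Listen stream between the
// sensor-controller and a stateless signal microservice.
package signalsketch

import "context"

// Event is a simplified stand-in for an event emitted by a signal service
// (e.g. a webhook request, a calendar tick, or a message from a stream).
type Event struct {
	Source  string
	Payload []byte
}

// SignalContext is what the controller sends on the stream: the signal spec
// to start listening for, or a stop request to terminate it.
type SignalContext struct {
	Name string
	Stop bool
}

// SignalStream models one bi-directional stream between the controller and a
// stateless signal process.
type SignalStream interface {
	Send(*SignalContext) error // controller -> signal pod
	Recv() (*Event, error)     // signal pod -> controller
}

// SignalClient is what the sensor-controller would hold for each registered
// signal microservice; Listen opens the bi-directional stream.
type SignalClient interface {
	Listen(ctx context.Context) (SignalStream, error)
}
```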

@CLAassistant commented Jul 19, 2018

CLA assistant check: all committers have signed the CLA.

@magaldima (Contributor, Author)

closes #17

@shrinandj (Contributor)

Is it fair to say that, after this change:

  • Any K8s pod can register itself with argo-events as a signal. It will be that pod's job to figure out whether the signal has fired. Whenever it does, that pod will inform the sensor-controller that it has fired and the sensor-controller should execute the trigger. The signal pod should also determine when to resolve the signal (i.e. mark it completed and take some finalizing actions, e.g. stop the webserver).

@magaldima (Contributor, Author) commented Jul 19, 2018

@shrinandj just to clarify your understanding:

  1. Yes, any k8s pod can register itself as a microservice that the argo-events sensor-controller CAN use as a signal service. Whether it is used depends on the name of the registered service and the type of the signal.

  2. The sensor-controller is the Listener client, so when it processes a signal from a new sensor it initiates a bi-directional stream with a stateless process on a signal pod. The controller listens on the stream for events while the signal pod sends events on the stream. When the controller receives an event that satisfies its sensor requirements, it sends a message on the stream to terminate the signal. The signal pod then finishes any processing associated with that request context. So to correct you: the controller determines when to resolve the signal, while the signal pod takes the necessary actions, e.g. stopping the webserver. A rough sketch of this loop follows below.
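
A minimal sketch of the controller-side loop described in (2), assuming hypothetical stream and event types (the real argo-events code uses the generated gRPC stream types, so names and signatures will differ):

```go
// Illustrative sketch of the controller's listen loop; not the actual
// sensor-controller implementation.
package signalsketch

import (
	"errors"
	"io"
)

// Event is a simplified stand-in for an event received from a signal pod.
type Event struct{ Payload []byte }

// stream is a minimal stand-in for the bi-directional gRPC stream.
type stream interface {
	Recv() (*Event, error) // events sent by the signal pod
	SendStop() error       // controller -> signal pod: terminate the signal
}

// listenUntilSatisfied consumes events from the signal pod until the sensor's
// constraint is satisfied, then tells the signal pod to stop so it can clean
// up (e.g. shut down its webserver).
func listenUntilSatisfied(s stream, satisfied func(*Event) bool) error {
	for {
		ev, err := s.Recv()
		if err == io.EOF {
			return errors.New("signal stream closed before the sensor was satisfied")
		}
		if err != nil {
			return err // connection failure: the signal node gets marked with an error
		}
		if satisfied(ev) {
			return s.SendStop()
		}
	}
}
```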

@magaldima changed the title from "implementing signals as microservice deployments" to "WIP - implementing signals as microservice deployments" on Jul 19, 2018
@spk83 commented Jul 20, 2018

@magaldima how are we handling signal pod errors/failures here? A signal pod would notify the controller of errors while it's running, but what would happen if the signal pod is in a crash loop? Would the controller detect the disconnection and make a decision based on that?

@magaldima (Contributor, Author) commented Jul 20, 2018

@spk83 When a signal pod dies, all of its bi-directional streams to the sensor-controller become disconnected. When this happens, the controller's goroutines listening on those streams receive an error notifying them that the context has failed or the connection was closed, and the affected signal nodes are updated with an error. During the next processing loop of the affected sensor, the controller will attempt to re-establish a connection stream with the signal pod (up to the retry limit of 3 for a sensor in an error phase). Therefore, if the signal pod is in a crash loop, the controller would try 3 times to make a connection and, after failing, would not re-queue the sensor. The sensor resource and the signal node would remain in an error phase.

Assuming the signal pod requires user intervention to fix the issue, once it becomes stable again the user would have to re-queue or re-create the sensors so that the controller can operate on them.
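
For illustration, the bounded-retry behaviour described above might look roughly like this (the retry limit of 3 comes from the comment; the function and type names are hypothetical, not the actual controller code):

```go
// Illustrative sketch only; not the actual sensor-controller implementation.
package signalsketch

import "fmt"

const maxSignalRetries = 3

// reconnectSignal retries a broken signal connection a bounded number of
// times. If every attempt fails, the error is returned and the sensor is
// left in its error phase until a user re-queues or re-creates it.
func reconnectSignal(connect func() error) error {
	var err error
	for attempt := 1; attempt <= maxSignalRetries; attempt++ {
		if err = connect(); err == nil {
			return nil
		}
		fmt.Printf("signal connection attempt %d/%d failed: %v\n", attempt, maxSignalRetries, err)
	}
	return fmt.Errorf("signal still unreachable after %d attempts: %v", maxSignalRetries, err)
}
```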

@shrinandj previously approved these changes on Jul 20, 2018
@magaldima force-pushed the integrate-micro-for-service-plugins branch from a6a46d6 to b2b8b80 on July 20, 2018 at 21:42
@magaldima changed the title from "WIP - implementing signals as microservice deployments" to "implementing signals as microservice deployments" on Jul 20, 2018
@magaldima merged commit 3e00ae1 into master on Jul 23, 2018
@magaldima deleted the integrate-micro-for-service-plugins branch on July 23, 2018 at 12:54
@shrinandj (Contributor)

After this commit, does each of the signal Docker images need to be deployed in the cluster individually?

juliev0 pushed a commit to juliev0/argo-events that referenced this pull request on Mar 29, 2022:
* implementing signals as microservice deployments, making signals resilient to pod failures and moving escalation logic after requeue failures

* updating Makefile and CONTRIBUTING guide

* fixing failing tests