Using a simple message queue + moderately coupled microservices to streamline our platform #31

dadiorchen · 2024-02-01T08:56:23Z

Using a simple message queue + moderately coupled microservices to streamline our platform

Status: proposed

Deciders: @dadiorchen @Kpoke @ZavenArra @sebastiangaertner @arunbakt

Date: Thu Feb 1 05:58:58 CST 2024

Technical Story: {description | ticket/issue URL}

Context and Problem Statement

Problem

I believe the strict decoupling between the microservices, together with our use of RabbitMQ, currently blocks us from moving forward with the newly designed domain model and new features such as capture+tree binding.

Currently, we strictly limit each microservice to its own database schema and use RabbitMQ to decouple the services. This makes our business logic indirect and produces long workflows that cross services and RabbitMQ. A long workflow is difficult to code and, more importantly, difficult to test. For example, to approve a tree, we need to create it in the treetracker DB schema and then emit an event to another microservice so that it performs the corresponding action (updating some data in its own domain/schema). In this proposal, I suggest giving some microservices direct access so they can finish the job in a single place, subject to clear access rules.

Also, RabbitMQ is a big burden for us to maintain and operate correctly. RabbitMQ stores data (events), so it is part of the system's state. If we lose that data (for example, the service gets evicted by k8s, or something goes wrong with its persistent storage), the whole system is in error. Microservices, by contrast, are stateless. I feel that right now no member of the community is able to operate and maintain RabbitMQ seriously as a production service.

Solution

I suggest we make some changes to our architecture:

Moderately couple the microservices.

We should allow some microservices to access multiple database schemas when there is a good enough reason. For example, the service that handles trees and captures should have access to the raw capture schema so it can make the necessary updates, as in the example I gave above.

A bonus of this approach is that we can use database transactions to perform cross-schema updates, which makes the system state more robust.

This does not mean we should allow services any kind of access. The Machine Learning service should not have direct access to the core schema, and we still need a message queue for that case: we emit messages, such as a new capture being created, to the queue, and the ML service listens for them and processes each event.

To avoid unnecessary coupling, we need to define a clear relationship between the microservices (which service has access to which schemas) and use Terraform to strictly enforce that access in the DB through a dedicated role for every microservice.
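As a rough illustration of such an access matrix, the sketch below keeps the service-to-schema rules in one place and renders them as PostgreSQL GRANT statements. The role names, schema names, and privilege levels are all hypothetical, and in practice the same matrix would be expressed in Terraform rather than Python; this only shows the shape of the rule.

```python
# Hypothetical access matrix: service role -> {schema: privilege level}.
# All role and schema names are illustrative, not the project's real ones.
ACCESS = {
    "treetracker_service": {"treetracker": "write", "raw_capture": "write"},
    "ml_service": {"queue": "write"},  # no direct access to the core schema
}

PRIVILEGES = {
    "read": "SELECT",
    "write": "SELECT, INSERT, UPDATE, DELETE",
}


def grant_statements(access):
    """Render one USAGE grant and one table grant per (role, schema) pair."""
    statements = []
    for role in sorted(access):
        for schema in sorted(access[role]):
            level = access[role][schema]
            statements.append(f"GRANT USAGE ON SCHEMA {schema} TO {role};")
            statements.append(
                f"GRANT {PRIVILEGES[level]} ON ALL TABLES IN SCHEMA {schema} TO {role};"
            )
    return statements


def can_access(access, role, schema):
    """Check the matrix, e.g. in a test that guards against accidental coupling."""
    return schema in access.get(role, {})
```

Keeping the matrix as data means a review of "which service can touch which schema" is a review of one small table, whatever tool ends up applying it.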

Use a simple queue to replace RabbitMQ.

We can use our existing PostgreSQL database as a simple queue. Some companies have already adopted this approach, and libraries in several languages support it. Some resources:

https://www.prequel.co/blog/sql-maxis-why-we-ditched-rabbitmq-and-replaced-it-with-a-postgres-queue

https://github.com/timgit/pg-boss/tree/master

This way, we use a dedicated database schema to store the messages, and microservices send and subscribe to messages through a simple API.
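To make the "simple API" idea concrete, here is a minimal sketch of a table-backed queue with send and receive operations. It is not the API of pg-boss or any other library: the class, table, and column names are assumptions, and SQLite (from the Python standard library) stands in for PostgreSQL only so the example runs without a server.

```python
import json
import sqlite3

# In PostgreSQL, concurrent workers would typically claim a row with:
#   SELECT id, payload FROM message
#   WHERE topic = %s AND status = 'pending'
#   ORDER BY id LIMIT 1 FOR UPDATE SKIP LOCKED;
# SQLite has no SKIP LOCKED, so this single-process sketch omits it.


class SimpleQueue:
    """Hypothetical queue API backed by a single table."""

    def __init__(self, conn):
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS message ("
            " id INTEGER PRIMARY KEY AUTOINCREMENT,"
            " topic TEXT NOT NULL,"
            " payload TEXT NOT NULL,"
            " status TEXT NOT NULL DEFAULT 'pending')"
        )

    def send(self, topic, payload):
        """Append a message and return its id."""
        cur = self.conn.execute(
            "INSERT INTO message (topic, payload) VALUES (?, ?)",
            (topic, json.dumps(payload)),
        )
        return cur.lastrowid

    def receive(self, topic):
        """Claim the oldest pending message for a topic, or return None."""
        row = self.conn.execute(
            "SELECT id, payload FROM message"
            " WHERE topic = ? AND status = 'pending' ORDER BY id LIMIT 1",
            (topic,),
        ).fetchone()
        if row is None:
            return None
        self.conn.execute("UPDATE message SET status = 'done' WHERE id = ?", (row[0],))
        return row[0], json.loads(row[1])
```

Because callers only see `send` and `receive`, the table behind them could later be swapped for a different broker without touching the services.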

We can also put the queue operation inside our database transaction, so the whole process, including emitting to the queue and updating the database, is atomic. This makes the process simple and safe.
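The atomicity argument can be sketched as follows: a domain write and a queue insert share one transaction, so a failure between them leaves neither behind. The `tree` and `message` tables and the `approve_capture` function are hypothetical, and SQLite again stands in for PostgreSQL so the example is runnable as-is.

```python
import json
import sqlite3


def approve_capture(conn, capture_id, fail=False):
    """Create a tree row and enqueue an event in a single transaction."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute("INSERT INTO tree (capture_id) VALUES (?)", (capture_id,))
        if fail:
            raise RuntimeError("simulated failure before the event is queued")
        conn.execute(
            "INSERT INTO message (topic, payload) VALUES (?, ?)",
            ("tree_approved", json.dumps({"capture_id": capture_id})),
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tree (id INTEGER PRIMARY KEY, capture_id TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE message (id INTEGER PRIMARY KEY, topic TEXT NOT NULL, payload TEXT NOT NULL)"
)

approve_capture(conn, "c1")
try:
    approve_capture(conn, "c2", fail=True)
except RuntimeError:
    pass  # the tree row for c2 was rolled back, so no orphan row without an event

tree_count = conn.execute("SELECT COUNT(*) FROM tree").fetchone()[0]
message_count = conn.execute("SELECT COUNT(*) FROM message").fetchone()[0]
```

After both calls, only the successful approval has left a tree row and a queued event; the failed one left nothing, which is the property that is hard to get when the queue lives in a separate system.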

The pros of this solution:

It simplifies our microservices and makes them easier to code and test, which means faster development, fewer bugs, and a more robust system. The price is higher coupling, lower throughput capacity, and a poorer fit for a big, complicated business model, but I feel these problems are not our top priority right now.
It is easier to maintain. We add no extra maintenance by doing this: PostgreSQL is already there, and we remove RabbitMQ.
It is safer. All the state is in the database, so as long as the database is safe and recoverable we have all the state, even if the whole k8s cluster crashes.
Being able to use transactions for a group of operations relieves developers of the burden of considering all the exceptions and consequences that unsafe operations could cause.

The cons of this solution:

The PostgreSQL message queue is not a full-featured message queue; for example, its pub/sub model cannot be as powerful as RabbitMQ's.
It is less decoupled.
It is less scalable. It cannot support a use case that needs huge message throughput, and the queue is not clustered, so we cannot scale it easily.

Summary

By using a simple queue and moderately coupled microservices to streamline our platform, we can speed up development of the new features and the new domain model we planned. The platform also becomes easier to maintain (without the burden of RabbitMQ) and safer in terms of data and application robustness.

The problem with this solution is its ability to scale up, but I think that by the time we reach its ceiling we will be big enough to set up a larger team, including cloud engineering, to handle the platform. Until then, this solution is a suitable choice for us. Also, RabbitMQ is not really used currently, so the switch is easy for us.

ZavenArra

This is a superb idea. It also allows engineers to get some familiarity with the queue concept through a more familiar tool (PostgreSQL). If huge throughput is needed later, the system can be extended to make use of RabbitMQ. But you might want to model the throughput that you need, and consider whether the queue should run in a separate DO database, so that performance can be easily monitored and segmented.

dadiorchen · 2024-02-02T21:50:13Z

Another benefit:
Reduced cost: the minimum installation of RabbitMQ is a 3-node cluster with 3GB of memory, which occupies almost an entire node/droplet.

arunbakt · 2024-02-24T15:09:20Z

@dadiorchen Sorry, I may not be up to speed on the current state of things. The proposal is indeed a viable alternative, but a short-term one.

If we store generated events in a partitioned table in the source schema before emitting them to a queue, we can still maintain transactional integrity and avoid losing data if the queuing system crashes. For testing a distributed event-based system, couldn't we take the path where individual microservices are tested in isolation for the events they emit (validating the contracts)? If they have to be tested end to end, don't we have a production-like test env? If not, I see the issue there.

Sharing schemas, even if access is locked down by Terraform, would not be fail-proof. I think using a DB dedicated to queueing is probably a good middle ground for the short term. But I would still store the events at the source in a partitioned table and replay them to a queue (DB or RabbitMQ) so that we can switch to a messaging system later.
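A rough sketch of the store-then-replay idea: events are first written to a table in the source schema, and a separate relay step plays unpublished rows to whatever queue is in use. The table name `event_outbox`, the function names, and the `publish` callable are all hypothetical; SQLite stands in for PostgreSQL so the sketch runs standalone, and table partitioning is left out.

```python
import json
import sqlite3


def record_event(conn, topic, payload):
    """Store the event in the source schema's own event table."""
    conn.execute(
        "INSERT INTO event_outbox (topic, payload) VALUES (?, ?)",
        (topic, json.dumps(payload)),
    )


def relay_pending(conn, publish):
    """Replay unpublished events, in order, through any publish(topic, payload) callable."""
    rows = conn.execute(
        "SELECT id, topic, payload FROM event_outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for event_id, topic, payload in rows:
        publish(topic, json.loads(payload))  # if this raises, the row stays unpublished
        with conn:  # commit the "published" mark for this event
            conn.execute("UPDATE event_outbox SET published = 1 WHERE id = ?", (event_id,))
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE event_outbox ("
    " id INTEGER PRIMARY KEY,"
    " topic TEXT NOT NULL,"
    " payload TEXT NOT NULL,"
    " published INTEGER NOT NULL DEFAULT 0)"
)
```

Since `publish` is just a callable, the target can be a DB-backed queue today and a broker later. Note that an event is marked only after `publish` returns, so a crash in between would replay it; consumers would need to tolerate duplicates.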

I see DigitalOcean has a managed Kafka. Is that an option we could consider?

dadiorchen · 2024-02-26T09:57:31Z

@arunbakt I might be combining two changes in this proposal: 1. coupling the microservices moderately, and 2. using PostgreSQL to implement a queue.

Regardless of No.2, for No.1, we give some core microservices more visibility to reduce the complexity of going back and forth between services, which eases development.

For No.2, we definitely need a dedicated schema for the queue, accessed through an API, so that in the future we can replace it with a more advanced MQ.

dadiorchen added 2 commits February 1, 2024 05:56

chore: init

f9b6b58

chore: the proposal

2552fc9

dadiorchen marked this pull request as ready for review February 1, 2024 08:59

ZavenArra self-requested a review February 2, 2024 20:06

ZavenArra approved these changes Feb 2, 2024
