
Revisit production infrastructure architecture / tooling #344

Open
benoit74 opened this issue Jan 7, 2025 · 1 comment
Labels
enhancement (New feature or request), question (Further information is requested)

Comments


benoit74 commented Jan 7, 2025

Currently, our infrastructure is split in two:

  • some machines run k8s, with both the control plane and the machines themselves provided exclusively by Scaleway
  • other machines are provided by Scaleway and other providers, and are managed manually

I see multiple drawbacks which could be opportunities for discussion / improvement:

  • we use full-blown k8s, which is designed for large clusters of hundreds of machines with workloads moving everywhere, while in our case we have fixed workload assignments and only 4 machines
    • this allows us to benefit from inventions designed for the masses
    • this comes with a significant maintenance burden
      • many features are not tailored to our use case, so we need to tweak them even when we do not care about them
      • simple things, like not overwhelming a machine with many old software versions, become complex and risky to tackle
    • this comes with a significant risk for production (significant accidental complexity for very simple services)
    • we have limited opportunities to implement things that are supposed to be more straightforward (storage HA, IPv6)
  • we pay a lot (1260€/year) for a control plane with no SLA, no HA and some serious limitations (no IPv6 support, limited etcd size, no backups)
  • only the k8s machines are monitored in Grafana because they were the most urgent ones; if the technology were homogeneous, everything would be monitored

One perspective I have in mind is using something simpler or better tailored to our "one machine - one role" use case, like k3s. Or even simplifying the stack further with even simpler tools.
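To illustrate how small a tailored setup could be: k3s runs the control plane as a single binary and lets you disable its bundled components via the same options as its CLI flags. A hypothetical sketch of a server config (the disabled components and the node label are illustrative assumptions, not a vetted production setup):

```yaml
# /etc/rancher/k3s/config.yaml — hypothetical sketch, not a recommendation.
# k3s config file keys mirror its CLI flags (e.g. --disable, --node-label).
write-kubeconfig-mode: "0644"
# Drop bundled components a "one machine - one role" setup may not need:
disable:
  - traefik      # bundled ingress controller
  - servicelb    # bundled load balancer
# Label the node for its single role, so workloads can be pinned to it:
node-label:
  - "role=storage"
```

The appeal here is that the control plane runs on our own machine (no separately billed managed control plane), and etcd can be replaced by an embedded SQLite datastore on a single-server setup.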

benoit74 added the enhancement and question labels on Jan 7, 2025

rgaudin commented Jan 8, 2025

Important to note that our usage of k8s features is very limited due to our use of local storage (which enforces "one machine, one role"). This makes the whole strategy very expensive.

Our use of local storage is driven by cost, as cloud block storage is very expensive.
An alternative is network storage, which is either not possible or too expensive, since it competes with the former on cost.
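To make the constraint concrete: a local PersistentVolume in upstream k8s must be pinned to a specific node via node affinity, which is exactly what ties each workload to one machine and forfeits rescheduling. A minimal sketch (volume name, path, storage class and hostname are hypothetical):

```yaml
# Hypothetical local PV — the nodeAffinity block is mandatory for local volumes,
# so any pod bound to this volume can only ever run on machine-1.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-local-pv
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /data/example
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - machine-1
```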

I also think we should discuss our general needs and constraints and revisit the strategy.
