
Revisit production infrastructure architecture / tooling #344

Open
benoit74 opened this issue Jan 7, 2025 · 1 comment
Labels
enhancement (New feature or request), question (Further information is requested)

Comments


benoit74 commented Jan 7, 2025

Currently, our infrastructure is split in two:

  • some machines run k8s, with both the control plane and the machines themselves provided exclusively by Scaleway
  • other machines are provided by Scaleway and other providers, and are managed manually

I see multiple drawbacks which could be opportunities for discussion / improvement:

  • we use full-blown k8s, which is designed for large clusters of hundreds of machines with workloads moving everywhere, while in our case we have fixed workload assignments and only 4 machines
    • this allows us to benefit from inventions designed for the masses
    • this comes with a significant maintenance burden
      • many features are not tailored to our use case, so we need to tweak them even when we do not care about them
      • simple things, like not overwhelming a machine with many old software versions, become complex and risky to tackle
    • this comes with a significant risk for production (significant accidental complexity for very simple services)
    • we have limited opportunities to implement things that are supposed to be more straightforward (storage HA, IPv6)
  • we pay a lot (1260€/year) for a control plane with no SLA, no HA and some serious limitations (no IPv6 support, limited etcd size, no backups)
  • only the k8s machines are monitored in Grafana because they were the most urgent ones; if the technology were homogeneous, everything would be monitored

One perspective I have in mind is using something simpler or better tailored to our "one machine - one role" use case, like k3s. Or even simplifying the stack further with even simpler tools.
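To illustrate how small a tailored setup could be: k3s runs the control plane as a single binary and lets you disable its bundled components via the same options as its CLI flags. A hypothetical sketch of a server config (the disabled components and the node label are illustrative assumptions, not a vetted production setup):

```yaml
# /etc/rancher/k3s/config.yaml — hypothetical sketch, not a recommendation.
# k3s config file keys mirror its CLI flags (e.g. --disable, --node-label).
write-kubeconfig-mode: "0644"
# Drop bundled components a "one machine - one role" setup may not need:
disable:
  - traefik      # bundled ingress controller
  - servicelb    # bundled load balancer
# Label the node for its single role, so workloads can be pinned to it:
node-label:
  - "role=storage"
```

The appeal here is that the control plane runs on our own machine (no separately billed managed control plane), and etcd can be replaced by an embedded SQLite datastore on a single-server setup.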

benoit74 added the enhancement and question labels on Jan 7, 2025

rgaudin commented Jan 8, 2025

Important to note that our usage of k8s features is very limited due to our use of local storage (which enforces "one machine, one role"). This makes the whole strategy very expensive.

Our use of local storage is driven by cost, as cloud block storage is very expensive.
An alternative is network storage, which is either not possible or too expensive, since it competes with the former on cost.
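To make the constraint concrete: a local PersistentVolume in upstream k8s must be pinned to a specific node via node affinity, which is exactly what ties each workload to one machine and forfeits rescheduling. A minimal sketch (volume name, path, storage class and hostname are hypothetical):

```yaml
# Hypothetical local PV — the nodeAffinity block is mandatory for local volumes,
# so any pod bound to this volume can only ever run on machine-1.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-local-pv
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /data/example
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - machine-1
```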

I also think we should discuss our general needs and constraints and revisit the strategy.
