Update images to match official Supabase Docker compose #48

Merged
merged 21 commits into supabase-community:main from drpsyko101:update-images on Apr 12, 2024

Conversation

drpsyko101
Contributor

@drpsyko101 commented Oct 25, 2023

What kind of change does this PR introduce?

A major update to sync the image tags to the official Docker compose, as well as adding more services such as imgproxy, analytics, vector, and functions. Full changelog:

  • Image tag update:
    • db - supabase/postgres:15.1.0.147
    • studio - supabase/studio:20240326-5e5586d
    • auth - supabase/gotrue:v2.143.0
    • rest - postgrest/postgrest:v12.0.1
    • realtime - supabase/realtime:v2.27.5
    • meta - supabase/postgres-meta:v0.77.2
    • storage - supabase/storage-api:v0.46.4
    • imgproxy (new) - darthsim/imgproxy:v3.8.0
    • kong - kong:2.8.1
    • analytics (new) - supabase/logflare:1.4.0
    • vector (new) - timberio/vector:0.34.0-alpine
    • functions (new) - supabase/edge-runtime:v1.36.1
    • minio (new, disabled) - minio/minio:latest
  • Update environment variables implementation
    • Environment variables are now set on spec.template.spec.containers[0].env instead of through a ConfigMap (see the sketch after this list)
    • ConfigMap is now reserved for non-sensitive file-based volume mounts
    • Remove unused environment variables
  • Update db initialization scripts so as not to override built-in image scripts
  • Update README to reflect the new changes
  • Ingress removal for all services except for kong.
  • Secrets are now handled by the Helm chart instead of being manually created by the user
  • Add PR linting and testing charts
  • Add support for Supabase migration script (Golden source of truth for init scripts needed #38)
  • Add a simple Helm test suite for checking pod liveness
  • Add livenessProbe & readinessProbe to all deployments
  • Add support for dashboard basic authentication
  • PersistentVolumeClaim is now handled by <service>.persistence
    • <service>.storage has been removed to standardize PVC handling across deployments
  • Fixes:
    • storage will fail if /var/lib/storage doesn't exist
    • kong shell doesn't support pipefail
    • Namespace cannot be resolved if using a non-standard domain search value
    • kong nil pointer evaluating interface (#10): secrets are now defined in secrets/jwt
    • Database password not URL-encoded, which results in malformed DB URIs
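
As a hedged illustration of the new environment handling (the container, value keys, and secret name below are illustrative, not necessarily the chart's actual names):

# Sketch of a deployment template under the new scheme: non-sensitive values
# render inline from values.yaml, while secrets come from a chart-managed Secret.
spec:
  template:
    spec:
      containers:
        - name: auth
          env:
            - name: GOTRUE_SITE_URL
              value: {{ .Values.auth.environment.GOTRUE_SITE_URL | quote }}
            - name: GOTRUE_JWT_SECRET
              valueFrom:
                secretKeyRef:
                  name: supabase-jwt   # hypothetical chart-managed Secret
                  key: secret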

Version compatibility

0.0.x to 0.1.x

  • supabase/postgres is bumped from 14.1 to 15.1, which warrants backing up all your data before proceeding with the major version update
  • Initialization scripts for supabase/postgres have been reworked to closely match the Docker Compose version. Further tweaks to the scripts are needed to ensure backward compatibility.
  • Migration scripts are now exposed at db.config, which will be mounted at /docker-entrypoint-initdb.d/migrations/. Copy your migration files from your local project's supabase/migrations directory and populate db.config (see the sketch after this list).
  • Ingress is now limited to the kong service only. This is designed to funnel all entry to the stack through the secured kong service.
  • kong.yaml has been modified to follow the Docker Compose kong.yaml template.
  • supabase/storage does not come with a pre-populated /var/lib/storage, so an emptyDir will be created if persistence is disabled. This might be incompatible with the previous version if the persistent storage location was set to anything other than the path above.
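
A hedged sketch of populating db.config (the migration file name and its contents are hypothetical; each key becomes a file under /docker-entrypoint-initdb.d/migrations/):

db:
  config:
    # copied from your local project's supabase/migrations directory
    20231025000000_init.sql: |
      create table if not exists public.todos (
        id bigint generated by default as identity primary key,
        title text not null
      );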

Known Issues

  • A / character in the dashboard username/password may cause the kong service to fail to initialize.
  • Changing the POSTGRES_BACKEND_URL value after the initial deployment may cause analytics to return 404. A workaround is to simply restart the db and/or analytics deployments.
  • Certain API calls such as /api/organizations/default-org-slug/billing/subscription, /api/projects/default/billing/addons, and /api/integrations/default-org-slug?expand=true return 404.
  • vector requires read access to the /var/log/pods directory. When run in a Kubernetes cluster, this can be provided with a hostPath volume, as sketched below.
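
A minimal sketch of such a mount on the vector pod spec (volume name illustrative):

volumes:
  - name: pod-logs
    hostPath:
      path: /var/log/pods
containers:
  - name: vector
    volumeMounts:
      - name: pod-logs
        mountPath: /var/log/pods
        readOnly: true   # vector only needs read access to tail pod logs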

* version update:
  * db: 15.1.0.117
  * studio: 20230921-d657f29
  * auth: v2.99.0
  * rest: v11.2.0
  * realtime: v2.10.1
  * meta: v0.68.0
  * storage: v0.40.4
  * kong: 2.8.1
* major changes:
  * add imgproxy service
  * add dashboard basic auth
  * removal of all ingress except for db and kong
  * db port & database are included in secret and used by other services
  * db storage renamed as volume following other services
  * db init-scripts are now independent from image init-scripts
  * add db migration config
  * set internal URLs programmatically in deployments respectively
* minor changes:
  * update all services environment variable default values
  * add useful value comments about the new major changes
* bugfixes:
  * fix kong container command to bash
  * fix kong.yaml location due to invalid path on image
  * fix obsolete storage values and replaced with persistence instead
  * fix storage mount path by setting the default image persistence path
  * fix auth default mailer URL paths
  * fix kong paths using a possible different DNS search name
  * fix db SSL mode allowed by default
  * fix README instruction applying asset to specific namespace
* add analytic secret
* add analytic kong path
* fix imgproxy service port
* bump chart version
* add functions path to kong
* fix kong services dynamic ports instead of static ports
* fix analytics & imgproxy volumes & volumeMounts generation
* secrets are now built-in inside values.yaml
* previous secrets can still be used by removing the secret values
* bump minor project version for major structure changes
* update README to reflect all of the major changes
* add simple health check for supported services
* fix auth env DB_PORT value
* add cluster role and cluster role binding for vector log collection
* add node name as vector env
* disable vector own log from being collected
* change vector secret to config map
* remove useless copy vector if statement
* update ingress TLS secret name and hosts
* fix migration copy script bash syntax
* add detailed `kubectl get pod` instruction
* add missing `db` persistence default values
* remove deprecated `storage.storage` field
* update `db`, `realtime` environment variables
* update README uninstall instructions
* further simplify example values
* fix password not URL-encoded for db-uri
* fix `db_user` default value for related services
* fix `kong` config depending on enabled services
* fix `db` liveness command
* remove `PG_USER` from `db`
* add chart icon based on supabase Github avatar
* fix kong deployment malformed if dashboard is disabled
* fix chart YAML lint errors
@drpsyko101 marked this pull request as ready for review on November 5, 2023 13:37
@drpsyko101
Contributor Author

@milanvanschaik Added linting & testing GitHub Actions during the PR based on helm/chart-testing-action. But to enable automatic deployment to GitHub Pages, an environment and a token must be set up for the main branch. Proposed code for .github/workflows/release.yaml:

name: Release Charts

on:
  push:
    branches:
      - main

jobs:
  release:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v3
        with:
          fetch-depth: 0

      - name: Configure Git
        run: |
          git config user.name "$GITHUB_ACTOR"
          git config user.email "$GITHUB_ACTOR@users.noreply.github.com"

      - name: Run chart-releaser
        uses: helm/chart-releaser-action@v1.5.0
        env:
          CR_TOKEN: "${{ secrets.GITHUB_TOKEN }}"

@martincollado

martincollado commented Nov 6, 2023

Any plans to merge this PR? We really need these fixes and improvements for Supabase on Kubernetes.

@Rjpdude

Rjpdude commented Nov 29, 2023

I tip my hat to you, sir. Awesome stuff right here.

@ushuz

ushuz commented Dec 1, 2023

@milanvanschaik @bigbitbus @kiwicopple @devudopw @pieveee @icrotz

Sorry to bother you, but would you have some time to review and merge this? Thanks.

@kiwicopple
Contributor

We're just trying to figure out the right maintainers for this repo. At the moment we have a few community members; we don't have available k8s expertise internally.


@drpsyko101
Contributor Author

@kiwicopple I could volunteer, although I still need to figure out the GitHub Actions Helm deployment, coming from a GitLab CI/CD workflow background.

@arpagon
Member

arpagon commented Dec 5, 2023

Hello there @kiwicopple

I'd like to offer my help as a volunteer maintainer for this repository. I believe I can contribute effectively based on my experience and skills.

Here's a brief overview of my qualifications:

  • I have experience with GitHub Actions and can contribute to automation and CI/CD processes.
  • I've developed approximately a dozen Helm Charts for various applications in the past, which can be beneficial for maintaining Helm Chart deployments.
  • I'm familiar with multiple cloud providers, including GKE, AKS, and Oracle Cloud Infrastructure Container Engine for Kubernetes (OKE), which can be helpful for addressing cloud-related issues.
  • I've worked with self-hosted Kubernetes setups like k3s, Ubuntu Charmed Kubernetes, and microk8s, which means I can provide insights into managing Kubernetes clusters effectively.

I'm particularly interested in understanding the community's needs and objectives. While maintaining Helm Charts via GitHub Actions is one potential area, I'm open to exploring various ways to assist the project and contribute to its growth.

Let's initiate a discussion within the community to align our goals and determine how I can best support our shared objectives. Feel free to reach out if you have any questions or would like to discuss this further. I'm here to collaborate and work together with the community.

Thank you for considering my involvement, and I look forward to being part of this collaborative effort.

@kiwicopple
Contributor

Thanks @arpagon - I've also seen you in the community for a long time now. I've added you to the SRE team.

Also feel free to reach out to me on Twitter to discuss the maintainer role more.

* add JWT expiry
* update `db` JWT schema
* update `db` arguments based on docker-compose
* update README with local testing suite
* fix testing conditions
* remove `db` ingress
@mul14

mul14 commented Dec 13, 2023

@drpsyko101 I get an error when using

db:
  enabled: true
  persistence:
    enabled: true
    size: 100Gi
Error: INSTALLATION FAILED: 1 error occurred:
        * PersistentVolumeClaim "supabase-db-pvc" is invalid: spec.resources[storage]: Required value

You may want to check templates/db/volume.yaml.

spec:
  {{- if .Values.db.persistence.storageClassName }}
  storageClassName: {{ .Values.db.persistence.storageClassName }}
  {{- end }}
  accessModes:
  {{- range .Values.db.persistence.accessModes }}
    - {{ . | quote }}
  {{- end }}
  resources:
    requests:
-     db: {{ .Values.db.persistence.size | quote }}
+     storage: {{ .Values.db.persistence.size | quote }}

* update analytics command and environment variables
* fix persistent volume claim request schema
@drpsyko101
Contributor Author

drpsyko101 commented Dec 13, 2023

@mul14 Good catch! This commit drpsyko101@c4b8116 should fix the issue affecting db, imgproxy and storage. Maybe I should include another values.yaml with persistent volumes enabled for CI as well.

@kiwicopple requested a review from arpagon on December 14, 2023 08:21
@n84ck

n84ck commented Dec 15, 2023

Hi @drpsyko101 there is a small typo in the values.example.yaml file.

The text 'repo' should be changed to 'tag'. There is no latest tag, so the example will not work without this change.

auth:
  image:
    repo: v2.125.1
  environment:
    GOTRUE_SITE_URL: http://example.com
    GOTRUE_EXTERNAL_EMAIL_ENABLED: "true"
    GOTRUE_MAILER_AUTOCONFIRM: "true"
    GOTRUE_SMTP_ADMIN_EMAIL: "your-mail@example.com"
    GOTRUE_SMTP_HOST: "smtp.example.com"
    GOTRUE_SMTP_PORT: "587"
    GOTRUE_SMTP_SENDER_NAME: "your-mail@example.com"

@drpsyko101
Contributor Author

drpsyko101 commented Dec 15, 2023

Thanks @n84ck! It's a typo from drpsyko101@70a3545. Not sure how it got through the chart testing script, though.

It seems like they've removed the latest tag quite recently. It's probably a good idea to set spec.template.spec.containers[0].imagePullPolicy to Always in the values.example.yaml, as sketched below.
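
For example, a hedged sketch of the corrected values.example.yaml entry (whether the chart exposes a pullPolicy key, or the policy has to be set in the deployment template instead, is an assumption here):

auth:
  image:
    tag: v2.125.1        # 'tag', not 'repo': pin an explicit version, since there is no latest tag
    pullPolicy: Always   # hypothetical key mapping to the container's imagePullPolicy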

@nabi-chan

Hello, I have trouble with analytics.
When I try to get some info from the Log Explorer, it always returns a 500 error.
The analytics container always prints the logs below.
Can anyone help me?

LOGFLARE_NODE_HOST is: 127.0.0.1
13:05:07.208 [info] Starting migration
13:05:07.456 [info] Migrations already up
13:05:07.458 [info] Migration finished
13:05:09.416 [info] Elixir.Logflare.SigtermHandler is being initialized...
13:05:09.642 [info] Table counters started!
13:05:09.642 [info] Rate counter table started!
13:05:09.652 [info] Running LogflareWeb.Endpoint with cowboy 2.10.0 at 0.0.0.0:4000 (http)
13:05:09.657 [info] Access LogflareWeb.Endpoint at http://localhost:4000
13:05:09.659 [info] Running LogflareGrpc.Endpoint with Cowboy using http://0.0.0.0:50051
13:05:09.665 [info] Executing startup tasks
13:05:09.666 [info] Ensuring single tenant user is seeded...
13:05:09.762 [error] GenServer {Logflare.Backends.SourceRegistry, Logflare.Repo.Postgres.Adaptorb845a513a0a0eb88efd00dff0544f0b09886f15c2bc64d24a58c4e2d83808f3d} terminating
** (stop) exited in: DBConnection.Holder.checkout(#PID<0.5071.0>, [log: #Function<13.1785099/1 in Ecto.Adapters.SQL.with_log/3>, repo: Logflare.Repo.Postgres.Adaptorb845a513a0a0eb88efd00dff0544f0b09886f15c2bc64d24a58c4e2d83808f3d, timeout: 15000, pool_size: 3, pool: DBConnection.ConnectionPool])
** (EXIT) killed
(db_connection 2.5.0) lib/db_connection/holder.ex:97: DBConnection.Holder.checkout/3
(db_connection 2.5.0) lib/db_connection.ex:1200: DBConnection.checkout/3
(db_connection 2.5.0) lib/db_connection.ex:1525: DBConnection.run/6
(db_connection 2.5.0) lib/db_connection.ex:656: DBConnection.parsed_prepare_execute/5
(db_connection 2.5.0) lib/db_connection.ex:648: DBConnection.prepare_execute/4
(postgrex 0.17.1) lib/postgrex.ex:361: Postgrex.query_prepare_execute/4
(logflare 1.4.0) lib/logflare/backends/adaptor/postgres_adaptor/pg_repo.ex:207: Logflare.Backends.Adaptor.PostgresAdaptor.PgRepo.handle_continue/2
(stdlib 4.3.1) gen_server.erl:1123: :gen_server.try_dispatch/4
Last message: {:continue, :connect_repo}
13:05:09.764 [error] Task #PID<0.5023.0> started from Logflare.Supervisor terminating
** (CaseClauseError) no case clause matching: {:error, {:shutdown, {:failed_to_start_child, "source-backend-1", {{:killed, {DBConnection.Holder, :checkout, [#PID<0.5071.0>, [log: #Function<13.1785099/1 in Ecto.Adapters.SQL.with_log/3>, repo: Logflare.Repo.Postgres.Adaptorb845a513a0a0eb88efd00dff0544f0b09886f15c2bc64d24a58c4e2d83808f3d, timeout: 15000, pool_size: 3, pool: DBConnection.ConnectionPool]]}}, {GenServer, :call, [{:via, Registry, {Logflare.Backends.SourceRegistry, Logflare.Repo.Postgres.Adaptorb845a513a0a0eb88efd00dff0544f0b09886f15c2bc64d24a58c4e2d83808f3d}}, :connected?, 5000]}}}}}
(logflare 1.4.0) lib/logflare/backends.ex:196: Logflare.Backends.start_source_sup/1
(logflare 1.4.0) lib/logflare/single_tenant.ex:176: anonymous fn/2 in Logflare.SingleTenant.ensure_supabase_sources_started/0
(elixir 1.14.4) lib/enum.ex:2468: Enum."-reduce/3-lists^foldl/2-0-"/3
(logflare 1.4.0) lib/logflare/single_tenant.ex:174: Logflare.SingleTenant.ensure_supabase_sources_started/0
(logflare 1.4.0) lib/logflare/application.ex:197: Logflare.Application.startup_tasks/0
(elixir 1.14.4) lib/task/supervised.ex:89: Task.Supervised.invoke_mfa/2
(stdlib 4.3.1) proc_lib.erl:240: :proc_lib.init_p_do_apply/3
Function: #Function<0.43670906/0 in Logflare.Application.get_children/1>
Args: []

@drpsyko101
Contributor Author

drpsyko101 commented Mar 29, 2024

@nabi-chan It is a known issue at this moment.

analytics internal server error due to Logflare endpoint match error.

I'm still investigating the source of the problem. It might be related to Kubernetes resources rather than the container itself.

Edit: Adding the missing LOGFLARE_URL and LOGFLARE_API_KEY environment variables to the studio deployment will show the exact error:

07:36:19.593 [error] GenServer {Logflare.Endpoints.Cache, 1, %{"iso_timestamp_start" => "2024-03-29T06:00:00.000Z", "project" => "default", "project_tier" => "ENTERPRISE", "ref" => "default", "sql" => "select id, timestamp, event_message, request.method, request.path, response.status_code\n  from edge_logs\n  cross join unnest(metadata) as m\n  cross join unnest(m.request) as request\n  cross join unnest(m.response) as response\n  \n  order by timestamp desc\n  limit 100\n  "}} terminating
** (MatchError) no match of right hand side value: {:error, {:already_started, #PID<0.5145.0>}}
(logflare 1.4.0) lib/logflare/backends/adaptor/postgres_adaptor/pg_repo.ex:37: Logflare.Backends.Adaptor.PostgresAdaptor.PgRepo.create_repo/1
(logflare 1.4.0) lib/logflare/backends/adaptor/postgres_adaptor.ex:108: Logflare.Backends.Adaptor.PostgresAdaptor.execute_query/2
(logflare 1.4.0) lib/logflare/endpoints.ex:261: Logflare.Endpoints.exec_query_on_backend/4
(logflare 1.4.0) lib/logflare/endpoints/cache.ex:127: Logflare.Endpoints.Cache.handle_call/3
(stdlib 4.3.1) gen_server.erl:1149: :gen_server.try_handle_call/4
(stdlib 4.3.1) gen_server.erl:1178: :gen_server.handle_msg/6
(stdlib 4.3.1) proc_lib.erl:240: :proc_lib.init_p_do_apply/3
Last message (from #PID<0.5224.0>): :query

@AntonOfTheWoods

@drpsyko101 where are we on this now? Is it pretty much good, or still in dev?

@arpagon are you still able to dedicate some time to this, or should someone else take over review duties?

* update image tags
@drpsyko101
Contributor Author

drpsyko101 commented Mar 30, 2024

@AntonOfTheWoods I'm actively running it in prod, but it only covers microk8s and minikube environments. More in-depth tests are needed for vector and analytics since I don't touch that part much in my code.

@nabi-chan Fixed the analytics issue when using URL parameters in the JDBC-style connection string. I can only assume that the = character broke the query match request to the db.

Edit: Changing the POSTGRES_BACKEND_URL analytics environment variable after the initial deployment will cause it to return 404 to the studio.

@AntonOfTheWoods

AntonOfTheWoods commented Mar 30, 2024

@AntonOfTheWoods I'm actively running it in prod, but it only covers microk8s and minikube environments. More in-depth tests are needed for vector and analytics since I don't touch that part much in my code.

Nice @drpsyko101. I see you have an extra commit in your repo for using existing secrets - is that intended for here also (yes please!)?


@AntonOfTheWoods left a comment


It looks like a bit of factorisation could reduce the number of lines of repeated code.

securityContext:
{{- toYaml .Values.analytics.podSecurityContext | nindent 8 }}
initContainers:
- name: init-db


This is repeated in quite a few places, and probably could also go into a couple of other deployments that don't currently have an init container db check. Could this be extracted to the top-level _helpers.tpl and included? (See the sketch below.)
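
A sketch of what that could look like in the top-level _helpers.tpl, assuming each service can supply the db host and port through values (the template name and value keys are hypothetical):

{{/*
Shared init container that waits until the database accepts connections.
A deployment would pull it in with:
  initContainers:
    {{- include "supabase.initDbContainer" . | nindent 8 }}
*/}}
{{- define "supabase.initDbContainer" -}}
- name: init-db
  image: postgres:15-alpine
  imagePullPolicy: IfNotPresent
  command:
    - sh
    - -c
    - until pg_isready -h "$DB_HOST" -p "$DB_PORT"; do sleep 2; done
  env:
    - name: DB_HOST
      value: {{ .Values.db.host | quote }}
    - name: DB_PORT
      value: {{ .Values.db.port | quote }}
{{- end }}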

Contributor Author


You are right. However, there might be slight changes to the analytics init-db, since changing POSTGRES_BACKEND_URL after the initial deployment may cause it to return 404. I'm still trying to figure out where the initial value resides and maybe add an alteration script to the init-db.


@AntonOfTheWoods commented Mar 30, 2024


If you are interested, I can try and adapt the bitnami way of doing things to what we have here. Personally, I think it would be worth exploring using the bitnami postgresql and minio charts as sub-dependencies. While they can be horribly slow updating some stuff, they are pretty good on the highly used charts, like postgres & co. What do you think?

image: postgres:15-alpine
imagePullPolicy: IfNotPresent
env:
- name: DB_HOST


I'm a fan of how bitnami does these things, because it allows you to easily refer to external dbs, e.g. https://github.com/bitnami/charts/blob/main/bitnami/supabase/templates/_helpers.tpl#L413C1-L426C12. I am a n00b, but is there a benefit to having DB_HOST as an explicit external env var for each submodule - could there not be an externalDatabase section like bitnami's (https://github.com/bitnami/charts/blob/main/bitnami/supabase/values.yaml#L3142)?

Contributor Author


I could add external db values somewhere under db.external.host, as sketched below. But as you can see, the main objective of this PR is to make sure all deployments are up to date with the official Supabase repo. Any QoL improvements will hopefully come after this PR is merged.
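
For the record, a hedged sketch of what such values could look like (all keys hypothetical, following the db.external.host idea above):

db:
  enabled: false          # skip the bundled Postgres deployment
  external:
    host: db.example.com
    port: "5432"
    database: postgres
    user: supabase_admin  # the password would still come from the chart-managed secret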


Sure, have you had any signs of life from @arpagon though? It seems he might be too busy, and it might be better to just move forward and try and get things merged later. This already officially qualifies as a super-mega-merge PR, so a few extra large-scale changes probably wouldn't hurt :-). And bitnami are a common source of inspiration for charts around the interwebs. So we could continue this on your repo and then try and get that merged later maybe?

@drpsyko101
Contributor Author

@AntonOfTheWoods Using existing refs still needs a little more work on handling custom keys and escaping characters. It will probably be in the next PR, I think.

@nabi-chan

@nabi-chan Fixed the analytics issue when using URL parameters in the JDBC-style connection string. I can only assume that the = character broke the query match request to the db.

@drpsyko101
It looks like there's another issue... :0... Is there any solution?

{
  "code": 502,
  "errors": [],
  "message": "Something went wrong! Unknown error. If this continues please contact support.",
  "status": "UNKNOWN"
}

@drpsyko101
Contributor Author

@nabi-chan Apparently, changing the POSTGRES_BACKEND_URL value after the initial deployment may still cause analytics to return 404 to the studio. Restarting db and/or analytics will resolve the issue.

@nabi-chan

nabi-chan commented Mar 31, 2024

@nabi-chan Apparently, changing the POSTGRES_BACKEND_URL value after the initial deployment may still cause analytics to return 404 to the studio. Restarting db and/or analytics will resolve the issue.

Hello @drpsyko101, thanks for your help! Anyway, I restarted everything, but when I try I still get these logs in analytics. How can I fix it?

01:18:13.057 [error] GenServer {Logflare.Endpoints.Cache, 1, %{"project" => "default", "project_tier" => "ENTERPRISE", "ref" => "default", "sql" => "\nSELECT\n-- log-event-chart\n timestamp_trunc(t.timestamp, minute) as timestamp,\n count(t.timestamp) as count\nFROM\n auth_logs t\n cross join unnest(metadata) as metadata\n where t.timestamp > '2024-03-30T18:00:00.000Z'\nGROUP BY\ntimestamp\nORDER BY\n timestamp ASC\n "}} terminating
** (MatchError) no match of right hand side value: {:error, {:already_started, #PID<0.5085.0>}}
(logflare 1.4.0) lib/logflare/backends/adaptor/postgres_adaptor/pg_repo.ex:37: Logflare.Backends.Adaptor.PostgresAdaptor.PgRepo.create_repo/1
(logflare 1.4.0) lib/logflare/backends/adaptor/postgres_adaptor.ex:108: Logflare.Backends.Adaptor.PostgresAdaptor.execute_query/2
(logflare 1.4.0) lib/logflare/endpoints.ex:261: Logflare.Endpoints.exec_query_on_backend/4
(logflare 1.4.0) lib/logflare/endpoints/cache.ex:127: Logflare.Endpoints.Cache.handle_call/3
(stdlib 4.3.1) gen_server.erl:1149: :gen_server.try_handle_call/4
(stdlib 4.3.1) gen_server.erl:1178: :gen_server.handle_msg/6
(stdlib 4.3.1) proc_lib.erl:240: :proc_lib.init_p_do_apply/3
Last message (from #PID<0.5199.0>): :query
01:18:13.058 [error] Endpoint query exited for an unknown reason

* fix storage fails to deploy if minio stub exists
* update postgres-meta image tag
@AntonOfTheWoods

AntonOfTheWoods commented Apr 2, 2024

@drpsyko101, I no longer seem to be getting any errors, but I'm not actually getting any logs either. With the Docker Compose setup I can see various logs for the different services, but I don't get any on kube. Can you confirm you are actually seeing logs?

EDIT: it looks like maybe only the aggregator is being installed, and the agents are not. Or is there some mode in which they can work as both?

@drpsyko101
Contributor Author

@AntonOfTheWoods Same here. The request sent to analytics returns an empty array. I'll crosscheck the init script on db with the docker compose later.

@nabi-chan The configuration might still be stuck somewhere, either in the ephemeral storage or in the db. Try clearing the analytics schema from the db and then simply deleting the analytics pod.

@AntonOfTheWoods

@drpsyko101 I'm not sure how relevant the compose file is going to be, given that straight docker compose, or at least the compose file provided by supabase, isn't meant to be multi-server, and Kubernetes definitely is. My reading of the Vector kube setup is that it requires an agent deployed on every host (at least every host that will be running supabase pods).

So that would ideally mean a DaemonSet or similar for the agents (see the sketch below), or studying, or even directly using, the vector.dev provided helm chart as a sub-chart. I will have a play and see if I can get anything working.
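
A minimal sketch of the agent half as a DaemonSet, assuming the chart keeps its own templates rather than pulling in the vector.dev chart as a sub-chart (names and labels illustrative, vector config omitted):

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: supabase-vector-agent
spec:
  selector:
    matchLabels:
      app: supabase-vector-agent
  template:
    metadata:
      labels:
        app: supabase-vector-agent
    spec:
      containers:
        - name: vector
          image: timberio/vector:0.34.0-alpine
          volumeMounts:
            - name: pod-logs
              mountPath: /var/log/pods
              readOnly: true   # one agent per node tails that node's pod logs
      volumes:
        - name: pod-logs
          hostPath:
            path: /var/log/pods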

@ThePragmaticFox

ThePragmaticFox commented Apr 5, 2024

I successfully adjusted and installed the helm chart, thanks for it. I have four remaining issues:

  • The logs don't work yet, I either get nothing, or if I manually query I get:
    { "code": 502, "errors": [], "message": "Something went wrong! Unknown error. If this continues please contact support.", "status": "UNKNOWN" }
    Don't think it's too big of a deal, will just need some fine-tuning.

  • https://URL/api/pg-meta/default/query?key=migrations yields a 400 error. Actually I double-checked; this also happens in the official supabase.com hosted instance. So, I guess it will be fixed at some point.

  • https://URL/ redirects to https://URL/default, not the project I specified. Not very important, but ... a bit pointless.

  • https://URL/project/default/realtime/inspector: when I join a channel, it doesn't work, because I didn't find a way to make websockets (wss://URL/realtime/v1/websocket) work over ingress-nginx. Does anyone have an idea how we can adjust the ingress to kong so it works for websockets? (See the sketch below.)
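
For ingress-nginx, WebSocket upgrades are proxied out of the box, so the usual culprit is the default 60s proxy timeout closing idle connections. A hedged sketch of annotations on the kong ingress (values are a common starting point, untested against this chart):

metadata:
  annotations:
    # keep long-lived WebSocket connections open past nginx's 60s default
    nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "3600"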

Btw.

Certain API calls such as /api/organizations/default-org-slug/billing/subscription, /api/projects/default/billing/addons, /api/integrations/default-org-slug?expand=true return 404

I fixed that by bumping the versions of all components to match https://github.com/supabase/cli/blob/c9a0c2972194d51004fe6849619339a17c85f1c8/internal/utils/misc.go (sorry, I don't know which bump actually resolved the issue).

@drpsyko101
Contributor Author

drpsyko101 commented Apr 5, 2024

@AntonOfTheWoods I've managed to get vector to parse the correct event log that can be picked up by analytics. The only remaining issue is resolving the secret handling for vector.yaml. Another concern is exposing /var/log/pods to the vector pod through hostPath, as this might not be present in all Kubernetes environments.

Edit: It seems impossible to pass a secret to the http uri sink at the moment (vectordotdev/vector#1155). Is there any way to pass the key via headers or something? Updated the image to 0.34.0-alpine, which supports importing secrets; see the sketch below.
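
For reference, vector config files support ${VAR} interpolation from the container environment, which is one way to keep the API key out of the ConfigMap; a hedged sketch of an http sink using it (sink name, input, and endpoint are illustrative):

sinks:
  analytics_http:
    type: http
    inputs:
      - kubernetes_parsed          # upstream transform, illustrative name
    # ${LOGFLARE_API_KEY} is interpolated from the environment, e.g. injected
    # as a Secret-backed env var on the vector pod
    uri: http://supabase-analytics:4000/api/logs?api_key=${LOGFLARE_API_KEY}
    encoding:
      codec: json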

* add helm install notes
* add environment variable support to vector
* update image tags
* update README on vector volume
* fix vector volume error
@AntonOfTheWoods

@drpsyko101 I can see the logs! Hurrah! From what I can tell from https://kubernetes.io/docs/concepts/storage/volumes/#hostpath, this is pretty much the only use case where hostPath is considered "valid". There seem to be a few very old tickets in the vector helm repo about vector trying to load several hostPath dirs read-write, but I'm not sure how relevant they are now. Looking at Google's GKE, I think mounting /var/log read-only is actually fine even in their fully-managed (Autopilot) mode, so I'm not sure there are many commonly used platforms that don't at least allow that. I have no idea how it works behind the scenes, but I don't think there is any fundamental issue preventing logs from being supported in this chart.

This is good news anyway! I unfortunately no longer have any multi-node clusters, but I suspect that using a Deployment rather than a DaemonSet is going to be a problem for multi-node clusters. If you are using a hostPath to get the logs, then you of course need a pod on every node, and Deployments are not meant to be node-specific, nor to guarantee one replica per node.

@arpagon
Member

arpagon commented Apr 10, 2024

Hello @drpsyko101 and @AntonOfTheWoods,

I want to extend a sincere thank you to both of you for your incredible contributions. @drpsyko101, your initiative to align the Helm chart with the official Docker compose is incredibly valuable, and @AntonOfTheWoods, your detailed feedback and insights have been instrumental in shaping this update.

I apologize for the delay in reviewing this PR. I recognize that the complexity of the changes and my own schedule have created an unexpected bottleneck in the process.

Here's how we're going to move forward:

  • Focused Review: I'm dedicating significant time over the coming week for a comprehensive review and testing of the proposed updates. I'll provide detailed feedback by the end of this week.

  • Open Collaboration: @drpsyko101 and @AntonOfTheWoods, please continue to share your thoughts. I welcome your input as we navigate any further adjustments. Let's work together to get this ready for merging.

  • Future Improvements: This experience highlights an area for growth within our project. I'm committed to streamlining our review and feedback processes to provide better support for contributors. I'd welcome your suggestions on how we can achieve this.

Thank you again for your dedication and patience.

@AntonOfTheWoods

AntonOfTheWoods commented Apr 11, 2024

Thanks for your update @arpagon! Just to preface the discussion, I think it is very clear that by FAR the most important issue for current supabase helm charts is the reactivity of the maintainers, notably on this repo and the bitnami one. I think if we can fix that issue there is a good chance we get a high-quality and up-to-date supabase chart for the community to share and improve on. While it's obviously a challenge finding the time, I am a firm believer that it is better to make it clear at the top of the readme if a project has no maintainers who can look at bug reports and PRs in a reasonable timeframe (<2 weeks-ish). Everyone has priorities and life happens sometimes, but users and contributors can waste massive amounts of time if maintainers drop the ball without announcing it, and that can cause a great amount of pointless frustration.

There is one thing that I would also like to point out about trying to "make this chart like the upstream docker compose". The goals of kubernetes and docker compose (without swarm, but who would use that nowadays?) are fundamentally different, and the compose that is provided is definitely not meant to be multi-machine. As a SAAS provider, I can understand supabase wanting to provide a great compose experience - it works great on a dev machine and for hobbyists, but if you need any sort of reliability, scalability or performance then you'll need to go with their services. I hope we can craft a chart that enables proper scaling across a reasonably-sized kube cluster with heterogeneous services, though I definitely understand that supabase the company might not be hugely interested in that!

There are many parts of the current chart that look like they are just trying to copy/paste from the compose. While I don't claim to be an expert, the Bitnami charts are typically pretty well respected for what they do. Bitnami is now fully corporate and obviously don't have a great demand from paying customers for supabase. They do, however, have excellent charts for postgresql and minio, and I would definitely recommend using relevant charts from them as dependencies rather than the current approach. They reuse many of their own charts as dependencies and typically (when they have customers for it anyway) that works great, and enables them to have high quality, very complicated systems with relatively clear, concise and well-written charts.

Regarding the vector comments above, I got a response from one of their maintainers on the agent vs aggregator thing and their own chart for installing vector is supposed to be installed twice, once in the agent role (as a DaemonSet) and once with the aggregator role (deployment or statefulset). That seems a very dubious way of doing things (why not just allow a single instance install one or more roles???) but it seems clear that this PR's current deployment based system won't work with a multi-node cluster.

One thing that admittedly makes for more verbose charts but I think is much, much better for explicitly integrating the constraints of the underlying software is not having obligatory config defined in environment sections in the values.yaml. I would argue that all obligatory configuration should really be explicit in the templates in the env section with default values, rather than just as "put everything you want into environment". There is always the possibility of an extraEnv that does what the current environment section does for non-obligatory stuff, and that is the approach Bitnami typically takes.

I have a project based on react-admin on the frontend that has a very ugly combination of rxdb, dexie and wa-sqlite, with fastapi graphql + REST on the backend I am intending to migrate to react-admin + electric-sql + supabase. To get things started I am trying to convert the compose-based ra-supabase example project to use this chart (along with a number of DDL changes for electric compatibility). It is taking far longer than I hoped but my intention was to get something working, then iteratively work on improving this chart (based on this PR!) until it was sparkly clean and then submit for discussion. I'm still very new to supabase so I regularly get things that aren't working and the extent of the error messages is, e.g, TypeError: fetch failed, with nothing in any logs I could find...

Anyway, if we can get something similar to this merged, then I will definitely be submitting plenty of PRs in the next month or two!

@arpagon
Member

arpagon commented Apr 12, 2024

Thanks for your update @arpagon! Just to preface the discussion, I think it is very clear that by FAR the most important issue for current supabase helm charts is the reactivity of the maintainers, notably on this repo and the bitnami one. I think if we can fix that issue there is a good chance we get a high-quality and up-to-date supabase chart for the community to share and improve on. While it's obviously a challenge finding the time, I am a firm believer that it is better to make it clear at the top of the readme if a project has no maintainers who can look at bug reports and PRs in a reasonable timeframe (<2 weeks-ish). Everyone has priorities and life happens sometimes, but users and contributors can waste massive amounts of time if maintainers drop the ball without announcing it, and that can cause a great amount of pointless frustration.

I totally agree. I had a bottleneck at the beginning of the year, but I think I am still able to maintain this repo (on a regular basis of <2 weeks-ish).

There is one thing that I would also like to point out about trying to "make this chart like the upstream docker compose". The goals of kubernetes and docker compose (without swarm, but who would use that nowadays?) are fundamentally different, and the compose that is provided is definitely not meant to be multi-machine.

You are right: this chart should work in a multi-node cluster, and that should be the medium-term objective for this chart. For now, we can integrate this PR and work on a roadmap together with everyone who has engaged in this discussion, wdyt?

As a SAAS provider, I can understand supabase wanting to provide a great compose experience - it works great on a dev machine and for hobbyists, but if you need any sort of reliability, scalability or performance then you'll need to go with their services. I hope we can craft a chart that enables proper scaling across a reasonably-sized kube cluster with heterogeneous services, though I definitely understand that supabase the company might not be hugely interested in that!

I agree; that was my main concern when talking to @kiwicopple. And for me, it was very satisfying to see that Supabase's vision really does embrace self-hosting; it's just not a priority. That is why the work falls on us community contributors.

There are many parts of the current chart that look like they are just trying to copy/paste from the compose. While I don't claim to be an expert, the Bitnami charts are typically pretty well respected for what they do. Bitnami is now fully corporate and obviously don't have a great demand from paying customers for supabase. They do, however, have excellent charts for postgresql and minio, and I would definitely recommend using relevant charts from them as dependencies rather than the current approach. They reuse many of their own charts as dependencies and typically (when they have customers for it anyway) that works great, and enables them to have high quality, very complicated systems with relatively clear, concise and well-written charts.

To be honest, I'm not a big fan of bitnami charts. I mean, I love the work they do and the standardization they have achieved. But I like much more that each Application, in this case Supabase, follows its own path with its own challenges.

Regarding the vector comments above, I got a response from one of their maintainers on the agent vs aggregator thing and their own chart for installing vector is supposed to be installed twice, once in the agent role (as a DaemonSet) and once with the aggregator role (deployment or statefulset). That seems a very dubious way of doing things (why not just allow a single instance install one or more roles???) but it seems clear that this PR's current deployment based system won't work with a multi-node cluster.

One thing that admittedly makes for more verbose charts but I think is much, much better for explicitly integrating the constraints of the underlying software is not having obligatory config defined in environment sections in the values.yaml. I would argue that all obligatory configuration should really be explicit in the templates in the env section with default values, rather than just as "put everything you want into environment". There is always the possibility of an extraEnv that does what the current environment section does for non-obligatory stuff, and that is the approach Bitnami typically takes.

I have a project based on react-admin on the frontend that has a very ugly combination of rxdb, dexie and wa-sqlite, with fastapi graphql + REST on the backend I am intending to migrate to react-admin + electric-sql + supabase. To get things started I am trying to convert the compose-based ra-supabase example project to use this chart (along with a number of DDL changes for electric compatibility). It is taking far longer than I hoped but my intention was to get something working, then iteratively work on improving this chart (based on this PR!) until it was sparkly clean and then submit for discussion. I'm still very new to supabase so I regularly get things that aren't working and the extent of the error messages is, e.g, TypeError: fetch failed, with nothing in any logs I could find...

Anyway, if we can get something similar to this merged, then I will definitely be submitting plenty of PRs in the next month or two!

Let's discuss everything else in a roadmap thread, okay?

@arpagon merged commit fa0547c into supabase-community:main on Apr 12, 2024
@drpsyko101 deleted the update-images branch on April 13, 2024 11:33
badgifter pushed a commit to badgifter/supabase-helm that referenced this pull request May 30, 2024
Update images to match official Supabase Docker compose