
[Feature] Spark Operator Deployment Not Handling Multiple Namespaces #2052

Closed
1 task done
vara-bonthu opened this issue Jun 10, 2024 · 6 comments · Fixed by #2072

Comments

@vara-bonthu
Contributor

vara-bonthu commented Jun 10, 2024

Description


  • The Spark Operator deployment currently handles either a single namespace or all namespaces. It does not correctly handle a specified list of namespaces.

Description:
When deploying the Spark Operator with Helm, the following behaviors were observed:

Deploying with a single namespace works as expected.

sparkJobNamespaces:
  - "ns1"

Deploying with an empty value correctly monitors all namespaces.

sparkJobNamespaces: ""

Deploying with multiple namespaces as a list, as shown below, succeeds, but the operator does not monitor any namespace.

sparkJobNamespaces:
  - "ns1"
  - "ns2"
  • ✋ I have searched the open/closed issues and my issue is not listed.

Reproduction Code [Required]

Steps to reproduce the behavior:

Expected behavior

The Spark Operator should be able to monitor multiple specified namespaces when provided as a list.

Actual behavior

The Spark Operator deployment starts, but it does not monitor any of the specified namespaces when provided as a list.

Terminal Output Screenshot(s)

Environment & Versions

Additional context

@imtzer

imtzer commented Jun 10, 2024

Hi @vara-bonthu, it does not support multiple specified namespaces, as you can see in deployment.yaml:

{{- $jobNamespaces := .Values.sparkJobNamespaces | default list }}
---
...
        args:
...
        {{- if eq (len $jobNamespaces) 1 }}
        - -namespace={{ index $jobNamespaces 0 }}
        {{- end }}
...

If you set more than one namespace, it falls back to the default, which is all namespaces,
so I think this is not a bug
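For illustration, one conceivable change to the template above would be to emit every configured namespace in a single flag (a hypothetical sketch only: it assumes the operator binary accepted a comma-separated -namespace value, which the current code does not):

```yaml
{{- $jobNamespaces := .Values.sparkJobNamespaces | default list }}
...
        args:
...
        {{- if gt (len $jobNamespaces) 0 }}
        # Hypothetical: join all configured namespaces into one flag value.
        # Only valid if the operator binary parsed a comma-separated list.
        - -namespace={{ join "," $jobNamespaces }}
        {{- end }}
...
```

The `join` helper is a standard Sprig function available in Helm templates; the real blocker, as discussed below, is on the operator side rather than in the chart.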

@vara-bonthu
Contributor Author

You are correct that the current Helm template only allows specifying either a single namespace or all namespaces if more than one is defined. This makes the sparkJobNamespaces list somewhat misleading, as it doesn't support multiple namespaces in the way our documentation suggests. It might be more logical to use a string instead of a list if multiple namespaces are not supported.

sparkJobNamespaces | list | [""] | List of namespaces where to run spark jobs

To provide better support for users who want to deploy multiple instances of Spark Operator and monitor dedicated namespaces, it would be beneficial to implement support for multiple namespaces per deployment. For example, allowing spark-operator-1 to manage ns1 and ns2, while spark-operator-2 manages ns3 and ns4, etc.
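As a concrete sketch of that deployment layout, each operator release could carry its own values file (illustrative only; the file and release names are hypothetical):

```yaml
# values-spark-operator-1.yaml
sparkJobNamespaces:
  - "ns1"
  - "ns2"

# values-spark-operator-2.yaml
sparkJobNamespaces:
  - "ns3"
  - "ns4"
```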

This enhancement would improve flexibility and align with user expectations based on our documentation.

I will change this to a feature request. Thanks

@vara-bonthu vara-bonthu changed the title [BUG] Spark Operator Deployment Not Handling Multiple Namespaces [Feature] Spark Operator Deployment Not Handling Multiple Namespaces Jun 10, 2024
@imtzer

imtzer commented Jun 11, 2024

OK, I will try to do this

@imtzer

imtzer commented Jun 13, 2024

Hi @vara-bonthu, multi-namespace support has been widely discussed in many repos, e.g. #507, #25692, #74415

In this repo, the InformerFactory code generated by code-generator only supports one namespace or all namespaces due to a library API restriction, and there are three possible solutions:

  1. Use controller-runtime's multi-namespace informer instead, here. But currently the informer code is generated by code-generator, so this would be a huge amount of work
  2. Add a new informer type that wraps several single-namespace informers to handle SparkApplications across multiple namespaces, which is mentioned in #25692 and here. Some say this brings additional resource consumption
  3. Handle SparkApplications in all namespaces by default, but use a namespace parameter to control which namespaces' SparkApplications actually get handled

But I don't think any of those is the best solution

@yuchaoran2011
Contributor

@imtzer Thanks for the summary. I looked into refactoring the code base onto controller-runtime with a co-worker years ago but didn't follow through due to the massive scale of changes required. It would basically be a rewrite, but it's probably the right thing to do at some point.
The third option you mentioned is probably the easiest to achieve

@vikas-saxena02
Contributor

@vara-bonthu thank you very much, I have been stuck on this for the past 2 weeks. The rationale behind our problem is that, with the number of concurrent jobs increasing across various namespaces in our cluster, we are seeing performance issues with the spark-operator pod, and hence we wanted to deploy multiple Spark operators, each handling 2-3 namespaces (for us, 1 namespace = 1 team). This change will help for sure.
