
[FEA] Allow users to specify custom Dependency jars (spark) #1359

Open
amahussein opened this issue Sep 24, 2024 · 0 comments
Assignees
Labels
feature request New feature or request user_tools Scope the wrapper module running CSP, QualX, and reports (python)

Comments

@amahussein
Collaborator

amahussein commented Sep 24, 2024

Is your feature request related to a problem? Please describe.

Users with a custom Spark build might face issues using the Python wrapper if the open-source version of Spark is not compatible with their custom build.
It would be helpful to allow users to set their own dependencies so that they can use custom Spark/Hadoop jars.

Suggested proposal

  • Add a new input argument that takes a path to a local JSON file.
    • --tools_config_file
    • The argument has to be a local path because the wrapper cannot access remote storage before the Spark dependencies are set up.
  • The config file can also be used in the future to hold any other configurations that the user wants to override in the default behavior.
    • The suggested current structure is:

      ```json
      {
        "dependencies": {
          "deployMode": {
            "LOCAL": {
              "EnvDep1": [
                {
                  "name": "Apache Spark",
                  "uri": "path to the location of the compressed tgz (https:// or file://)",
                  "type": "archive (for tgz files) / jar (for jar files)",
                  "relativePath": "where to find the jars within the unarchived folder (i.e., jars/*)",
                  "sha512": "XYZ (used for verification)",
                  "size": "size in bytes (used for verification)"
                },
                {
                  "name": "Storage connector (e.g., a Hadoop jar if it is not included in the Spark jars, or any other required jar)",
                  "uri": "",
                  "type": "jar",
                  "relativePath": "where to find the jars within the unarchived folder (i.e., jars/*)",
                  "sha512": "XYZ (used for verification)",
                  "size": "size in bytes (used for verification)"
                }
              ]
            }
          }
        }
      }
      ```

      Remaining dependencies can be appended to the array as needed.
    The above format requires some changes in the current code, which today only distinguishes between local-filesystem and https sources.
@amahussein amahussein added ? - Needs Triage feature request New feature or request user_tools Scope the wrapper module running CSP, QualX, and reports (python) labels Sep 24, 2024
@mattahrens mattahrens changed the title [FEA] Allow users to specify custome Dependency jars (spark) [FEA] Allow users to specify custom Dependency jars (spark) Sep 24, 2024
@amahussein amahussein self-assigned this Sep 25, 2024