Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for app ids in spark #691

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

Add support for app ids in spark #691

wants to merge 1 commit into from

Conversation

ambud
Copy link

@ambud ambud commented Feb 17, 2023

gprofiler doesn't pull the spark app ids which are necessary as a fallback mechanism if app name is unavailable.

Description

Adding a simple cli parser to pull app ids in case app names are not available

Related Issue

None

Motivation and Context

Without app ids we get no way to identify what app this is.

How Has This Been Tested?

Locally tested the snippet, added try catch in case the snippet fails.

@Jongy
Copy link
Contributor

Jongy commented Feb 18, 2023

Hi @ambud ! I let the CI run, there are small lint errors that you can fix.

To e2e test the profiler, you can build it locally with the instructions here https://github.com/Granulate/gprofiler/blob/master/CONTRIBUTING.md#docker-build for a Docker image build, and ./scripts/build_x86_64_executable.sh for an executable build (depending on how you wish to deploy it). Note that the build can take 20-30 mins on an average server and requires at least 16GB of RAM (it builds all relevant components of the profiler). Those last few instructions are not included in the CONTRIBUTING doc, I'll make sure to add them :)

@Jongy
Copy link
Contributor

Jongy commented Feb 18, 2023

Note that you might be able to achieve the same goal by using the --app-id-args-filter option. It will match an RE against each argument individually and will include all of them in the gProfiler appid for the application. (Using it you cannot match against the --app-id flag, however - only against the value)

@ambud
Copy link
Author

ambud commented Feb 18, 2023

Hi @ambud ! I let the CI run, there are small lint errors that you can fix.

To e2e test the profiler, you can build it locally with the instructions here https://github.com/Granulate/gprofiler/blob/master/CONTRIBUTING.md#docker-build for a Docker image build, and ./scripts/build_x86_64_executable.sh for an executable build (depending on how you wish to deploy it). Note that the build can take 20-30 mins on an average server and requires at least 16GB of RAM (it builds all relevant components of the profiler). Those last few instructions are not included in the CONTRIBUTING doc, I'll make sure to add them :)

I had already tried running the script however it failed due to Granulate utils dependency, is there something I need to install?

@ambud
Copy link
Author

ambud commented Feb 18, 2023

Note that you might be able to achieve the same goal by using the --app-id-args-filter option. It will match an RE against each argument individually and will include all of them in the gProfiler appid for the application. (Using it you cannot match against the --app-id flag, however - only against the value)

Thanks for this pointer, I don't think it will be as clean though since there are several places the application_XXXX is specified.

@ambud
Copy link
Author

ambud commented Feb 18, 2023

Updated the code and fixed the lint issue

@Jongy
Copy link
Contributor

Jongy commented Feb 18, 2023

I had already tried running the script however it failed due to Granulate utils dependency, is there something I need to install?

Did you clone with submodules? (see as suggested here. granulate-utils is a submodule necessary for the build, yeah. Don't need to build it or anything, just clone

@Jongy
Copy link
Contributor

Jongy commented Feb 18, 2023

Thanks for this pointer, I don't think it will be as clean though since there are several places the application_XXXX is specified.

I see, so a pattern like (application_[^ ]+) will catch many sites.

If the matching was done on the full commandline, the pattern could be --app-id[ =]([^ ]+) and it'd do the trick, am I right?

for idx, x in enumerate(args):
if x == self._APP_ID_KEY:
return f"spark: {args[idx+1]}"
except Exception:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you expect any particular exception type here? ValueError for idx+1? Please change to ValueError then (it's nicer compared to generic except Exception which might catch unexpected things)

@ambud
Copy link
Author

ambud commented Feb 18, 2023

I had already tried running the script however it failed due to Granulate utils dependency, is there something I need to install?

Did you clone with submodules? (see as suggested here. granulate-utils is a submodule necessary for the build, yeah. Don't need to build it or anything, just clone

Oh I didn't realize that's where the build instructions were. Let me try.

@Jongy
Copy link
Contributor

Jongy commented Feb 20, 2023

We're currently investigating another approach, so putting this on hold.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants