Loading curations from ClearlyDefined in the analyzer step is inefficient #3905

Closed
1 of 2 tasks
oheger-bosch opened this issue Apr 19, 2021 · 3 comments
Labels
analyzer (About the analyzer tool), enhancement (Issues that are considered to be enhancements)

Comments

@oheger-bosch
Member

oheger-bosch commented Apr 19, 2021

When analyzing a larger project, I noticed that about half of the time was consumed by loading curation data from ClearlyDefined via ClearlyDefinedPackageCurationProvider. So optimizing this class should actually have a measurable effect on analyzer runs.

Points to optimize could be:

  • Packages that are referenced by multiple projects are queried multiple times.
  • Requests for single packages are sent one by one. It could be checked whether the ClearlyDefined API supports bulk requests, or some parallelization could be done on the client side.
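
To make the two points above a bit more concrete, here is a rough sketch of the client-side variant: collapse the package ids of all projects into one set, then run the remaining lookups on a small thread pool. Everything in it is an assumption for illustration (the `fetchCurations` function, the `lookup` parameter, plain strings as ids and curations); it is not the code of ClearlyDefinedPackageCurationProvider.

```kotlin
import java.util.concurrent.Callable
import java.util.concurrent.Executors

/**
 * Hypothetical helper: query every distinct package id exactly once and run the
 * lookups in parallel. [lookup] stands in for a single remote request.
 */
fun fetchCurations(
    idsPerProject: Map<String, List<String>>,
    threads: Int = 4,
    lookup: (String) -> List<String>
): Map<String, List<String>> {
    // Packages referenced by several projects collapse into a single entry.
    val distinctIds = idsPerProject.values.flatten().toSet()

    val pool = Executors.newFixedThreadPool(threads)
    try {
        val tasks = distinctIds.map { id -> Callable { id to lookup(id) } }
        // invokeAll() blocks until all lookups have completed.
        return pool.invokeAll(tasks).associate { it.get() }
    } finally {
        pool.shutdown()
    }
}

fun main() {
    val projects = mapOf(
        "app" to listOf("Maven:org.example:a:1.0", "Maven:org.example:b:1.0"),
        "lib" to listOf("Maven:org.example:b:1.0")
    )

    // The lambda simulates the remote call; a real provider would do an HTTP request here.
    val curations = fetchCurations(projects) { id -> listOf("curation for $id") }
    curations.forEach { (id, found) -> println("$id -> $found") }
}
```

Whether parallel single requests or a real bulk request is the better choice depends on what the service offers and how it rate-limits clients, so this is only one of the two options.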
oheger-bosch added the enhancement and analyzer labels Apr 19, 2021
sschuberth added a commit that referenced this issue Apr 19, 2021
This is a preparation for resolving #3905.

Signed-off-by: Sebastian Schuberth <sebastian.schuberth@bosch.io>
sschuberth added a commit that referenced this issue Apr 20, 2021
This is a preparation for resolving #3905.

Signed-off-by: Sebastian Schuberth <sebastian.schuberth@bosch.io>
@sschuberth
Member

It could be checked whether the ClearlyDefined API supports bulk requests

Looks good: https://api.clearlydefined.io/api-docs/#/curations/post_curations_

sschuberth added a commit that referenced this issue Jan 4, 2022
Requesting curations for multiple packages / ids at once can be much
more performant than single requests, depending on the actual provider
implementation.

Partly resolves #3905.

Signed-off-by: Sebastian Schuberth <sebastian.schuberth@bosch.io>
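
One way to read "depending on the actual provider implementation" is an interface whose multi-id lookup has a default that falls back to single requests, so that only providers with a bulk-capable backend need to override it. The sketch below illustrates that idea with made-up names (`Curation`, `CurationProvider`, `BulkProvider`) and string ids; it is not the interface from the referenced commit.

```kotlin
// Illustrative types only; names and signatures are assumptions.
data class Curation(val comment: String)

interface CurationProvider {
    /** Look up the curations for a single package id. */
    fun getCurationsFor(id: String): List<Curation>

    /**
     * Look up the curations for several ids. The default loops over the single-id
     * lookup; a provider backed by a bulk-capable service can override it.
     */
    fun getCurationsFor(ids: Collection<String>): Map<String, List<Curation>> =
        ids.distinct().associateWith { getCurationsFor(it) }
}

/** A provider that answers any number of ids with one simulated round trip. */
class BulkProvider(private val remote: Map<String, List<Curation>>) : CurationProvider {
    var requestCount = 0
        private set

    override fun getCurationsFor(id: String): List<Curation> =
        getCurationsFor(listOf(id)).getValue(id)

    override fun getCurationsFor(ids: Collection<String>): Map<String, List<Curation>> {
        requestCount++ // One request, regardless of how many ids are passed.
        return ids.distinct().associateWith { remote[it].orEmpty() }
    }
}

fun main() {
    val provider = BulkProvider(mapOf("a" to listOf(Curation("fix declared license"))))
    val curations = provider.getCurationsFor(listOf("a", "b", "a"))
    println("${curations.size} ids resolved with ${provider.requestCount} request(s)")
}
```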
sschuberth added a commit that referenced this issue Jan 5, 2022
Requesting curations for multiple packages / ids at once can be much
more performant than single requests, depending on the actual provider
implementation.

Partly resolves #3905.

Signed-off-by: Sebastian Schuberth <sebastian.schuberth@bosch.io>
sschuberth reopened this Jan 5, 2022
@sschuberth
Member

Reopening this to discuss the following point with @oheger-bosch:

Packages that are referenced by multiple projects are queried multiple times.

After looking at

/**
 * Add the given [packageSet] to this builder. This function can be used for packages that have been obtained
 * independently of a [ProjectAnalyzerResult].
 */
fun addPackages(packageSet: Set<Package>): AnalyzerResultBuilder {
    val (curations, duration) = measureTimedValue { curationProvider.getCurationsFor(packageSet.map { it.id }) }

and

.addPackages(managerResult.sharedPackages)

and

/**
 * A set with [Package]s shared across the projects analyzed by this [PackageManager]. Package managers that
 * produce a shared [DependencyGraph] typically do not collect packages on a project-level, but globally. Such
 * packages can be stored in this property.
 */
val sharedPackages: Set<Package> = sortedSetOf()

@oheger-bosch and I came to the conclusion that this issue will disappear once all package managers are migrated to the dependency graph format (see #3825), which uses a shared list of packages for all projects of the same type.

So I'm closing this in favor of keeping only #3825.
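
For anyone wondering why the shared package set makes the duplicate queries go away, here is a simplified model of the reasoning above. The `CountingProvider` class and both helper functions are hypothetical and only count how many ids are looked up; they do not mirror the real AnalyzerResultBuilder.

```kotlin
// Hypothetical, simplified model; not ORT's actual classes.
class CountingProvider {
    var lookups = 0
        private set

    fun getCurationsFor(ids: Collection<String>): Map<String, List<String>> {
        lookups += ids.size
        return ids.associateWith { emptyList<String>() }
    }
}

/** Packages collected per project: every project triggers its own lookup. */
fun lookupsPerProject(projects: Map<String, Set<String>>): Int {
    val provider = CountingProvider()
    projects.values.forEach { provider.getCurationsFor(it) }
    return provider.lookups
}

/** Packages collected in one shared set: a single lookup covers all projects. */
fun lookupsWithSharedPackages(projects: Map<String, Set<String>>): Int {
    val provider = CountingProvider()
    val sharedPackages = projects.values.flatten().toSortedSet()
    provider.getCurationsFor(sharedPackages)
    return provider.lookups
}

fun main() {
    val projects = mapOf(
        "app" to setOf("pkg-a", "pkg-b"),
        "lib" to setOf("pkg-b", "pkg-c")
    )

    println("per project: ${lookupsPerProject(projects)} ids looked up")
    println("shared set:  ${lookupsWithSharedPackages(projects)} ids looked up")
}
```

In this model the overlap between the projects ("pkg-b") is looked up twice in the per-project case and once with the shared set.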
