Investigate the performance of the plugin #169

Closed
lubosmj opened this issue Mar 2, 2022 · 5 comments

lubosmj commented Mar 2, 2022

The sync and import pipelines are very slow when downloading or associating content.

The root cause of the problem might be the use of built-in ostree capabilities provided by the gi module.

lubosmj added the Task label Mar 2, 2022

lubosmj commented Mar 31, 2022

I am attaching the results of profiling one sync task (syncing fedora/stable/x86_64/iot from https://d2ju0wfl996cmc.cloudfront.net/; 739 MB, 22,799 content units, 7:54-10:29):

| Stage (in pipeline order) | waiting time average | queue length average | interarrival average | service time average |
| --- | --- | --- | --- | --- |
| pulpcore.plugin.stages.artifact_stages.QueryExistingArtifacts | 0.000000 | 0.000000 | 0.038544 | 0.000000 |
| pulpcore.plugin.stages.artifact_stages.ArtifactDownloader | 0.037405 | 0.000000 | 0.039208 | 11.835289 |
| pulpcore.plugin.stages.artifact_stages.ArtifactSaver | 0.030090 | 0.000000 | 0.039210 | 4.536510 |
| pulpcore.plugin.stages.content_stages.QueryExistingContents | 0.029402 | 0.000000 | 0.039741 | 17.828544 |
| pulpcore.plugin.stages.content_stages.ContentSaver | 0.032978 | 0.000000 | 0.040197 | 18.693540 |
| pulpcore.plugin.stages.artifact_stages.RemoteArtifactSaver | 0.037385 | 0.000000 | 0.040579 | 18.941958 |
| pulpcore.plugin.stages.content_stages.ResolveContentFutures | 0.037729 | 0.000000 | 0.040769 | 15.270633 |
| pulp_ostree.app.tasks.stages.OstreeAssociateContent | 0.018239 | 0.000000 | 0.040770 | 0.017073 |
| pulpcore.plugin.stages.content_stages.ContentAssociation | 0.017619 | 0.000000 | 0.040771 | 0.046093 |
| pulpcore.plugin.stages.api.EndStage | 0.005339 | 0.000000 | 0.040771 | 0.008337 |

The waiting time, queue length, and interarrival averages describe the queue feeding each stage.

Content units spend too much time waiting in the stages pulpcore.plugin.stages.content_stages.QueryExistingContents, pulpcore.plugin.stages.content_stages.ContentSaver, pulpcore.plugin.stages.artifact_stages.RemoteArtifactSaver, and pulpcore.plugin.stages.content_stages.ResolveContentFutures. The slowdown may be caused by the number of compared units (22,799). The way we resolve futures in the first stage should also be reconsidered.

The statistics are described at https://docs.pulpproject.org/pulpcore/plugins/api-reference/profiling.html?highlight=profiling#profiling-api-machinery.
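
To give a sense of why the number of compared units matters: de-duplication has to look up every incoming unit by its natural key, so the query and save work grows with the size of the repository even when the lookups are batched. A rough, conceptual sketch of that pattern (not pulpcore's actual implementation; the field names mirror the plugin's OstreeObject model):

```python
from django.db.models import Q


def query_existing(model, incoming, batch_size=500):
    """Conceptual sketch: find already-stored units by natural key, in batches."""
    existing = {}
    for start in range(0, len(incoming), batch_size):
        batch = incoming[start:start + batch_size]
        lookup = Q()
        for unit in batch:
            # one OR clause per incoming unit; with ~23k units this adds up
            lookup |= Q(checksum=unit["checksum"], typ=unit["typ"])
        for obj in model.objects.filter(lookup):
            existing[(obj.checksum, obj.typ)] = obj
    return existing
```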


lubosmj commented Apr 4, 2022

I have profiled the code with line_profiler and here are the results:

File: /home/vagrant/devel/pulp_ostree/pulp_ostree/app/tasks/synchronizing.py
Function: submit_related_objects at line 255

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   255                                               async def submit_related_objects(self, commit_dc):
   256                                                   """Queue related DeclarativeContent objects and additionally download dirtree metadata."""
   257         1         44.0     44.0      0.0          _, loaded_commit, _ = self.repo.load_commit(commit_dc.content.checksum)
   258                                           
   259                                                   # it is necessary to download referenced dirtree objects; otherwise, the traversal cannot
   260                                                   # be executed without errors; the traversing allows us to read all referenced checksums,
   261                                                   # meaning that in the end we will have a list of all objects referenced by a single commit
   262         1        316.0    316.0      0.0          dirtree_checksum = bytes_to_checksum(loaded_commit[6])
   263         2         23.0     11.5      0.0          relative_path = get_checksum_filepath(
   264         1          3.0      3.0      0.0              dirtree_checksum, OstreeObjectType.OSTREE_OBJECT_TYPE_DIR_TREE
   265                                                   )
   266         1       1317.0   1317.0      0.2          await self.download_remote_object(relative_path)
   267                                           
   268         1        247.0    247.0      0.0          _, dirtree_obj = self.repo.load_variant(OSTree.ObjectType.DIR_TREE, dirtree_checksum)
   269         1       3137.0   3137.0      0.4          subtree_checksums = {bytes_to_checksum(subtree[1]) for subtree in dirtree_obj[1]}
   270         1        853.0    853.0      0.1          await self.download_dirtrees(subtree_checksums)
   271                                           
   272         1     868466.0 868466.0     99.3          await super().submit_related_objects(commit_dc)

File: /home/vagrant/devel/pulp_ostree/pulp_ostree/app/tasks/stages.py
Function: submit_related_objects at line 30

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    30                                               async def submit_related_objects(self, commit_dc):
    31                                           
    32         1     893584.0 893584.0      6.5          _, related_objects = self.repo.traverse_commit(commit_dc.content.checksum, maxdepth=0)
    33     28744    2898159.0    100.8     20.9          for obj_checksum, obj_type in related_objects.values():
    34     28743      54750.0      1.9      0.4              if obj_checksum == commit_dc.content.checksum:
    35         1          1.0      1.0      0.0                  continue
    36                                           
    37     28742    3990124.0    138.8     28.8              obj = OstreeObject(typ=obj_type, checksum=obj_checksum)
    38     28742     424510.0     14.8      3.1              obj_relative_path = get_checksum_filepath(obj_checksum, obj_type)
    39     28742    5136164.0    178.7     37.1              object_dc = self.create_object_dc_func(obj_relative_path, obj)
    40     28742     118836.0      4.1      0.9              object_dc.extra_data["commit_relation"] = await commit_dc.resolution()
    41     28742     335332.0     11.7      2.4              await self.put(object_dc)

The method create_object_dc_func() just creates a new DeclarativeContent object that is submitted to the pipeline. The problem seems to be that we are creating and submitting tens of thousands of objects to the pipeline (~28,000 content units, the normal size of a Fedora IoT repository).
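
For context, a helper like create_object_dc_func() essentially wraps each related object in the usual pulpcore declarative structures before it is put on the queue. A minimal sketch of that shape (the actual helper in the plugin may differ):

```python
from urllib.parse import urljoin

from pulpcore.plugin.models import Artifact
from pulpcore.plugin.stages import DeclarativeArtifact, DeclarativeContent


def create_object_dc(remote, relative_path, obj):
    """Sketch: one DeclarativeArtifact plus one DeclarativeContent per related OSTree object.

    For a single Fedora IoT commit this means ~28,000 of these objects (and queue puts).
    """
    artifact = DeclarativeArtifact(
        artifact=Artifact(),
        url=urljoin(remote.url, relative_path),
        relative_path=relative_path,
        remote=remote,
        deferred_download=False,
    )
    return DeclarativeContent(content=obj, d_artifacts=[artifact])
```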

Based on the profiling output, I conclude that the root cause of the problem is probably not the use of the built-in ostree capabilities provided by the gi module.
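
For reference, the per-line numbers above come from line_profiler. One minimal way to hook it up is to decorate the two submit_related_objects() methods with a LineProfiler instance and dump the statistics once the sync finishes (just a sketch, assuming a line_profiler version that can wrap coroutine functions and that the profiler runs in the same process as the stages):

```python
from line_profiler import LineProfiler

profile = LineProfiler()

# In pulp_ostree/app/tasks/synchronizing.py and pulp_ostree/app/tasks/stages.py,
# decorate the methods of interest:
#
#     @profile
#     async def submit_related_objects(self, commit_dc):
#         ...
#
# After the sync task has finished, print the per-line timings:
#
#     profile.print_stats()
```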


lubosmj commented Mar 8, 2023

Another idea.

Maybe we could bypass the content de-duplication procedure (since this is likely where the performance suffers) and add a new endpoint that resolves conflicts in the background. Repositories would need to be locked for modification. We would also have to find a reasonable trade-off between the time spent on content resolution and disk space usage.
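
A rough sketch of how such a background resolution task could be queued while the repository is locked for modification, using pulpcore's standard dispatch() locking (the task itself is hypothetical):

```python
from pulpcore.plugin.tasking import dispatch


def deduplicate_content(repository_pk):
    """Hypothetical task: merge content units duplicated because sync skipped de-duplication."""
    ...


def launch_deduplication(repository):
    """Hypothetical endpoint body: run the clean-up with the repository held as an exclusive resource."""
    return dispatch(
        deduplicate_content,
        exclusive_resources=[repository],  # no other task may modify the repository meanwhile
        kwargs={"repository_pk": str(repository.pk)},
    )
```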


lubosmj commented Mar 20, 2023

Another idea.

It might be worth trying to perform the mirroring internally in the following manner:

ostree --repo={repo} init --mode=archive
ostree --repo={repo} remote add {remote} {url}
ostree --repo={repo} pull --mirror {remote}:{ref}

Thus, instead of manually building and issuing tons of HTTP GET requests, we will mirror a remote repository by leveraging the ostree utilities. Then, we will traverse the downloaded repository and publish all the static content.
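
A sketch of driving those commands from the plugin (the helper and paths are illustrative; the same pulls could also go through the gi bindings):

```python
import subprocess


def mirror_remote(repo_path, remote_name, url, ref):
    """Sketch: mirror one ref of a remote OSTree repository into a local archive-mode repo."""
    ostree = ["ostree", f"--repo={repo_path}"]
    subprocess.run([*ostree, "init", "--mode=archive"], check=True)
    subprocess.run([*ostree, "remote", "add", remote_name, url], check=True)
    subprocess.run([*ostree, "pull", "--mirror", f"{remote_name}:{ref}"], check=True)
```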

In the end, the plugin would mimic the behaviour of pulp_file, enhanced with the ostree functionality. Content de-duplication is already in place once we submit content units to pulpcore's pipeline.

There should still be a notion of Commit, Ref, and Object content units so that users can perform a recursive copy.


lubosmj commented Nov 16, 2023

Closing in favour of #289, where more details are considered.

lubosmj closed this as completed Nov 16, 2023