[RFC] Full cache #75

BrunoMSantos · 2021-11-23T01:35:57Z

I've been holding this in different forms for quite a long time. Most of the complication and rebase conflicts were actually on the test folder and honestly I wasn't happy with any of it anyway, so I'm skipping it for now. We can at least have a discussion about the actual caching behaviour, which amounts to very little code changes in fact.

Note that I have no idea how to cleanly do the unit testing for this (ideas are welcome), but I did test it in an ad hoc fashion and it seems to work well.

I can also say that the changes did not measurably impact performance on my machine for the existing test set. For what it's worth, it actually scored better by a few seconds (cumulative) over 100 runs of make test. Then again I wasn't expecting any difference either.

BrunoMSantos · 2021-11-23T01:55:41Z

hawkmoth/__init__.py

+        if nmatches == 0:
+            self.logger.warning('No matching docstring.',
+                                location=(self.env.docname, self.lineno))
+        elif nmatches > 1:


Bah, this needs to be auto and ... or it would trigger in the cautodoc directive when issued with globs or multiple files.

jnikula · 2021-11-23T07:44:52Z

Interesting! I'm afraid I might not have time to look into it until the weekend.

BrunoMSantos · 2021-11-23T10:53:05Z

Interesting! I'm afraid I might not have time to look into it until the weekend.

No worries!

BrunoMSantos · 2021-11-23T20:42:50Z

doc/directives.rst

+         of cached entries that match the :rst:dir:`c:autodoc:clang` options.
+         Note that this includes any file outside the :data:`cautodoc_root`
+         search path that has been previously cached by a directive which
+         explicitly included it.


Kept thinking about this. This last sentence is a lie in the current version... Only files in or nested cautodoc_root will ever be considered in this case, even if others were cached in between.

hawkmoth/__init__.py

jnikula · 2021-12-05T10:29:00Z

hawkmoth/__init__.py

+        nfiles = 0
+
+        filenames = glob.glob(os.path.join(self.env.config.cautodoc_root, '*.[ch]'))
+        filenames += glob.glob(os.path.join(self.env.config.cautodoc_root, '**/*.[ch]'))


If I understand this correctly, encountering a directive without filenames specified will scan all the *.[ch] files in source tree. For every directive.

The obvious improvement would be to scan all the files only once at extension setup. However, I still have really mixed feelings about on the one hand restricting to *.[ch] suffix and on the other hand scanning the entire source tree, and parsing all the files if there's just one directive without a filename specified.

Say, if you have a moderately big project, but only have a handful of files to document, you'd end up running clang on every file in the project across the entire project.

If I understand this correctly, encountering a directive without filenames specified will scan all the *.[ch] files in source tree. For every directive.

No, we take the file names on every directive, which might be a bit wasteful, but it's just walking the file system here. Clang will only be called once per file (not considering the issue you raised with parallel builds).

That said, we can take the file list only once though for a small performance improvement indeed.

The obvious improvement would be to scan all the files only once at extension setup.

At setup we don't know if there are any directives that do not list a file name. Essentially, this means that we will always have worst case parsing even if the user never uses the auto discovery explicitly, which would be against your last point and which I'm also against.

This may be worked around if we instead have to pass all the filenames we're interested in through the configuration file instead of a directory. Or maybe a white / black list of source files.

And as for the globing pattern(s), we could default to *.[ch] but make it user configurable.

However, I still have really mixed feelings about on the one hand restricting to *.[ch] suffix and on the other hand scanning the entire source tree, and parsing all the files if there's just one directive without a filename specified.

Say, if you have a moderately big project, but only have a handful of files to document, you'd end up running clang on every file in the project across the entire project.

Hmm, I don't think we can have it both ways, but that's why the documentation should be very clear that omitting the file names will likely trade performance for convenience.

Say, if you have a moderately big project, but only have a handful of files to document, you'd end up running clang on every file in the project across the entire project.

Hmm, I don't think we can have it both ways, but that's why the documentation should be very clear that omitting the file names will likely trade performance for convenience.

Well, I guess a possible compromise would be to stop parsing when we get a match. But if anyone is ever going to rely on auto discovery, they should be assured that there is no ambiguity in their docs. And returning the 1st hit might miss the fact that there are multiple symbols with the same name. And the likelihood of such duplicate symbol names only increase with project size.

jnikula · 2021-12-05T10:41:15Z

hawkmoth/__init__.py

    def run(self):
+        clang_args = self.__get_clang_args()
+        cache = self.env.temp_data.setdefault('cautodoc_cache', {})


So my limited understanding of Sphinx parallel build (sphinx-build -j N argument) is that self.env is not shared during document reading i.e. each document has its own copy, and the data gets merged at a later stage.

See documentation for parallel_read_safe under https://www.sphinx-doc.org/en/master/extdev/index.html#extension-metadata as well as env-merge-info event documentation at https://www.sphinx-doc.org/en/master/extdev/appapi.html#event-env-merge-info

I can't be certain without actually digging into Sphinx source and/or testing, but I think this implementation is still a per-document cache, not a global cache. I'd be happy to be proven wrong, but this was my conclusion also when I opted for the simple cache.

Also see the todo example extension at https://www.sphinx-doc.org/en/master/development/tutorials/todo.html
I think the only ways to benefit from global cache is to either

Disable parallel build in the extension setup, so we can use the cache synchronously across the documents.
or

Switch to a three phase build where the first phase simply gathers the directives and the filenames to parse, the second phase is the env-merge-info merges them, and the third phase is the doctree-resolved event where stuff gets parsed and output generated.

Interesting. I didn't test with parallel build indeed so I failed to notice this. I also assumed that part was fine because this is how you handled the single file cache, which apparently does not cache as much as I thought it did. Of course this needs to be looked into before going any further.

Thanks for the references. I'll see what can be done about this.

BrunoMSantos · 2021-12-11T17:53:48Z

Sorry, I didn't mean to close this, it was a confusion on my part with the automatic mirror from GitLab... still learning how it all fits together. Feel free to keep the discussion going of course.

I'll open a new PR once I sort out the parallel build issue at least.

We need this method to return a bit more context to the caller so we know when to do automatic name resolution or not. Returning a list instead of yielding each file name also allows us to issue an additional warning for the CAutoDocDirective's implementation when no file is matched.

BrunoMSantos · 2022-10-23T15:43:02Z

Nothing to see here! I just pushed some of the smaller changes I had worked on ages ago so that they lived elsewhere other than my hard drive. I forgot github automatically updates PRs on code push... I haven't touched this in ages and don't remember what state it is in. Judging from CI, not a good one...

Maybe we should just close it for the time being?

jnikula · 2022-10-27T19:27:18Z

It's not like anything gets lost if we close this one.

BrunoMSantos requested a review from jnikula November 23, 2021 01:36

BrunoMSantos commented Nov 23, 2021

View reviewed changes

jnikula reviewed Dec 5, 2021

View reviewed changes

hawkmoth/__init__.py Show resolved Hide resolved

jnikula reviewed Dec 5, 2021

View reviewed changes

BrunoMSantos force-pushed the full-cache branch from eb9a0a5 to f47b8df Compare December 11, 2021 17:38

BrunoMSantos closed this Dec 11, 2021

BrunoMSantos deleted the full-cache branch December 11, 2021 17:43

BrunoMSantos restored the full-cache branch December 11, 2021 17:43

BrunoMSantos reopened this Dec 11, 2021

BrunoMSantos added 4 commits April 11, 2022 18:33

cautodoc: implement full cache

7556ef9

cautodoc: detect no-op directives and over matching

65ab493

doc: update the directive option descriptions

7adf147

BrunoMSantos force-pushed the full-cache branch from f47b8df to 7adf147 Compare October 23, 2022 15:10

jnikula closed this Oct 27, 2022

BrunoMSantos mentioned this pull request May 26, 2023

Make :file: directive option actually optional #157

Closed

BrunoMSantos mentioned this pull request Jul 9, 2023

Optional file option #168

Closed

jnikula mentioned this pull request Jul 29, 2023

Make it possible to specify all source files to parse in conf.py #173

Open

stephanlachnit mentioned this pull request Nov 26, 2023

Cursor prop cache v2 #225

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[RFC] Full cache #75

[RFC] Full cache #75

BrunoMSantos commented Nov 23, 2021

BrunoMSantos Nov 23, 2021

jnikula commented Nov 23, 2021

BrunoMSantos commented Nov 23, 2021

BrunoMSantos Nov 23, 2021

jnikula Dec 5, 2021

BrunoMSantos Dec 11, 2021

BrunoMSantos Dec 11, 2021

jnikula Dec 5, 2021

jnikula Dec 5, 2021

BrunoMSantos Dec 11, 2021

BrunoMSantos commented Dec 11, 2021

BrunoMSantos commented Oct 23, 2022 •

edited

Loading

jnikula commented Oct 27, 2022

[RFC] Full cache #75

[RFC] Full cache #75

Conversation

BrunoMSantos commented Nov 23, 2021

Choose a reason for hiding this comment

jnikula commented Nov 23, 2021

BrunoMSantos commented Nov 23, 2021

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

BrunoMSantos commented Dec 11, 2021

BrunoMSantos commented Oct 23, 2022 • edited Loading

jnikula commented Oct 27, 2022

BrunoMSantos commented Oct 23, 2022 •

edited

Loading