-
Notifications
You must be signed in to change notification settings - Fork 385
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add option to persist rule index #1880
base: master
Are you sure you want to change the base?
Conversation
**What type of PR is this?** Other **What package or component does this PR mostly affect?** all **What does this PR do? Why is it needed?** Prefactor for #1880
10667ba
to
936cb51
Compare
} | ||
defer indexDbFile.Close() | ||
|
||
indexDbContent, err := io.ReadAll(indexDbFile) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How large does this file get? I'm wondering whether json.NewDecoder
could be more appropriate here. Doesn't matter for the first iteration though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume this could get fairly large and I've been wondering if we'll want something more compact then json. But I figured if this is EXPERIMENTAL and we can break it whenever we want then plain json is a good place to start?
@@ -96,6 +99,7 @@ func (ucr *updateConfigurer) RegisterFlags(fs *flag.FlagSet, cmd string, c *conf | |||
fs.StringVar(&ucr.memProfile, "memprofile", "", "write memory profile to `file`") | |||
fs.Var(&gzflag.MultiFlag{Values: &ucr.knownImports}, "known_import", "import path for which external resolution is skipped (can specify multiple times)") | |||
fs.StringVar(&ucr.repoConfigPath, "repo_config", "", "file where Gazelle should load repository configuration. Defaults to WORKSPACE.") | |||
fs.StringVar(&uc.ruleIndexFile, "indexdb", "", "EXPERIMENTAL: file to cache the rule index") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestions for the naming of the cli arg? I used "indexdb" because this is caching the RuleIndex
data 🤷
Maybe we should use a --experimental
prefix?
Just to re-iterate a few points and limitations of this: This is EXPERIMENTAL with no api guarantees. 99% chance there will be breaking changes. You should never commit this file to source-control. When this EXPERIMENTAL feature is used:
If There is no cache invalidation here. It is entirely up to the user to pass the correct list of FUTURE: if those updated |
ca7cf8f
to
ae8009a
Compare
@fmeum did you have any opinions on this going forward? Is this something you think should live in gazelle? For large repos even simply walking the fs can be slow (even after the changes making it concurrent) so any type of incremental BUILD-generation needs to run on a subset of the workspace which is the goal of this PR. |
I could very well see this live in Gazelle. Do you think that any of the two recent proposals #1891 and #1907 could substantially improve the situation for your large repos? If so, I would prefer to explore this route first, mostly since serialization, even if experimental, often ends up constraining future efforts to change the persisted data structure. Not doing unnecessary work in the first place could save us the complexity of having to persist this work. |
Avoiding any type of caching/persisting would always be ideal if we can do it 👍 For persisting/serialization... I would prefer just thinking of it as caching, it should never be required. If we just throw a hash at the start of the serialized data then we can ignore any cached data from an older version of gazelle or different list of languages etc (hash = gazelle version + every language compiled into the gazelle binary maybe), then we should never have to deal with older data formats at least. |
I'll try to leave comments on #1891 and #1907. I worry any major improvements will require breaking changes to truly help though, maybe completely replacing traversal+indexing+resolving with something lazy by default? I think a lot of scenarios might still need to know the fs tree in order to do things based on convention or lazy indexing though, but if we can then do the actual indexing+resolving in a lazy fashion that would be great. |
What type of PR is this?
Feature
What package or component does this PR mostly affect?
cmd/gazelle
What does this PR do? Why is it needed?
Add the ability to persist the
RuleIndex
across runs to allow updating only the specified directories when indexing is required.Which issues(s) does this PR fix?
Ref: #1181