[Feature] Setting to turn off automatic reindexing of localDocs collections #2651

Open
3Simplex opened this issue Jul 11, 2024 · 4 comments
Labels: chat-ui-ux, enhancement, local-docs

Comments

@3Simplex
Collaborator

3Simplex commented Jul 11, 2024

After a long debate, I think we've settled on a simple option in LocalDocs that will turn off all automatic reindexing of LocalDocs collections.


OLDER ORIGINAL REQUEST

Feature Request

The option to "Lock" a localDocs collection to prevent reindex would be useful to ensure that an important collection remains unchanged. (Larger collections take hours to index and embed.)

[Image: Screenshot 2024-07-11 163757]

  • A "Lock" button which will disable the "remove" and "rebuild" options from the collection.
  • When "Locked" this collection will not be automatically changed for any reason.

[Images: lockCollection1, UnlockCollection1]

3Simplex added the enhancement label Jul 11, 2024
manyoso self-assigned this Jul 11, 2024
manyoso added the local-docs and chat-ui-ux labels Jul 11, 2024
@cebtenzzre
Member

cebtenzzre commented Jul 15, 2024

I would much rather add an "Are you sure?" dialog to both buttons, and add the "Update" button that we have been lacking for a while, which is like Rebuild but non-destructive. I cannot think of any reason to intentionally keep a LocalDocs collection inconsistent with what is actually on disk that would also outweigh the confusion such a situation is likely to cause (since you can't actually inspect the collection to see which files are and aren't in it).
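[Editor's sketch] One way to read "like Rebuild but non-destructive" is an update that diffs the indexed state against the directory and touches only what differs. The snippet below is a minimal illustration of that idea, not GPT4All's implementation: `FileStamp`, `plan_update`, and the sample paths are hypothetical names, and comparing size plus modification time is an assumed change test.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileStamp:
    """Size and modification time recorded for a file (hypothetical record)."""
    size: int
    mtime: float


def plan_update(indexed, on_disk):
    """Compare two {path: FileStamp} maps.

    Returns (to_add, to_reindex, to_remove) as sorted lists of paths.
    Files whose stamp is unchanged appear in none of the lists.
    """
    to_add = sorted(p for p in on_disk if p not in indexed)
    to_remove = sorted(p for p in indexed if p not in on_disk)
    to_reindex = sorted(
        p for p in on_disk if p in indexed and on_disk[p] != indexed[p]
    )
    return to_add, to_reindex, to_remove


# State stored at the last indexing run (made-up sample data).
indexed = {
    "a.pdf": FileStamp(100, 1.0),
    "b.txt": FileStamp(20, 2.0),
    "c.md": FileStamp(5, 3.0),
}
# State found on disk now.
on_disk = {
    "a.pdf": FileStamp(100, 1.0),  # unchanged, left alone
    "b.txt": FileStamp(25, 9.0),   # modified since indexing
    "d.docx": FileStamp(7, 4.0),   # new file
}

to_add, to_reindex, to_remove = plan_update(indexed, on_disk)
print(to_add, to_reindex, to_remove)
```

With a plan like this, only `b.txt` and `d.docx` would need embedding, which is the difference from a full rebuild that matters for collections that take hours to index.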

For example, if you are worried that your OneDrive might disconnect and the files will disappear temporarily, you should make a copy of the files instead. There are all manner of sync programs you can use to maintain a copy of a set of files. But trying to build this kind of sync functionality into GPT4All itself seems like unnecessary complexity.

If your use case would be served by, e.g., leaving embeddings in a cache for some time in case files are moved or deleted but then restored shortly afterward, I would also prefer that. They could even be cached indefinitely in the collection until you clear the cache. But I don't think GPT4All should ever reference files that currently do not exist at the specified path.
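[Editor's sketch] The caching idea in this comment could look roughly like the following: embeddings are keyed by a hash of the file content, so a file that disappears and later returns unchanged does not have to be embedded again. This is an assumption-laden illustration, not GPT4All code; `EmbeddingCache` and `fake_embed` are hypothetical, and `fake_embed` merely stands in for a real embedding model.

```python
import hashlib


class EmbeddingCache:
    """Keeps embeddings keyed by content hash until explicitly cleared."""

    def __init__(self, embed):
        self._embed = embed   # callable: bytes -> list of floats
        self._by_hash = {}
        self.misses = 0       # how many times the model actually ran

    def get(self, content):
        key = hashlib.sha256(content).hexdigest()
        if key not in self._by_hash:
            self.misses += 1
            self._by_hash[key] = self._embed(content)
        return self._by_hash[key]

    def clear(self):
        """Drop every cached embedding (the 'clear the cache' action)."""
        self._by_hash.clear()


def fake_embed(content):
    # Stand-in for a real embedding model; returns a one-element vector.
    return [float(len(content))]


cache = EmbeddingCache(fake_embed)
first = cache.get(b"report text")    # file indexed for the first time
# ... the file is deleted, then restored with identical content ...
second = cache.get(b"report text")   # served from the cache
print(cache.misses)
```

The cache only avoids recomputation. Which files a collection may reference would still be decided by what currently exists at the specified path, in line with the last sentence of the comment.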

@3Simplex
Collaborator Author

From what I can see, the DB stores all the data that the files provide. It does not rely on the files existing in order to function. The program itself requires the files to exist, which triggers actions the user may not want to occur, i.e., an "update db" for each collection upon any change to the files or structure within the collection, or upon changes to the settings that govern collections.

[Image: Screenshot 2024-07-15 124033]

I want to choose when my collection is updated. I don't want to rebuild all of my collections because I chose to add a new filetype as a setting. I don't want to rebuild all of my collections because one of my collections needs a larger chunk size and fewer chunks. I don't want to rebuild when I make one small change to a volatile directory that is otherwise fine.

If I have taken the time to embed for several hours, I want the result protected once it is done.

@manyoso
Collaborator

manyoso commented Jul 16, 2024

@cebtenzzre I think in the end this is about having a setting that turns off automatic re-indexing when we discover a change through QFileSystemWatcher... some users want to manually control re-indexing. Having that setting (not per collection) plus an 'ARE YOU SURE' dialog would, I think, get @3Simplex what he's after.
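[Editor's sketch] The proposal here can be modeled as one global flag that decides what happens when the file watcher reports a change, plus a confirmation step in front of the manual action. The snippet is a plain-Python illustration under stated assumptions, not GPT4All's code and not Qt: `Collection`, `ReindexController`, `auto_reindex`, and the `stale` marker are all hypothetical names. The handler name only echoes the `directoryChanged` signal that QFileSystemWatcher emits.

```python
class Collection:
    """Hypothetical stand-in for a LocalDocs collection."""

    def __init__(self, name):
        self.name = name
        self.stale = False       # True when disk changed but the index did not
        self.reindex_count = 0

    def reindex(self):
        self.reindex_count += 1
        self.stale = False


class ReindexController:
    def __init__(self, auto_reindex=True):
        # One global setting, not per collection.
        self.auto_reindex = auto_reindex

    def on_directory_changed(self, collection):
        """Called when the file watcher reports a change for a collection."""
        if self.auto_reindex:
            collection.reindex()
        else:
            # Remember that the index is out of date, but do nothing yet.
            collection.stale = True

    def manual_update(self, collection, confirmed):
        """User-triggered update; `confirmed` is the 'Are you sure?' answer."""
        if not confirmed:
            return False
        collection.reindex()
        return True


docs = Collection("standards")
controller = ReindexController(auto_reindex=False)

controller.on_directory_changed(docs)   # watcher fires, setting is off
stale_after_change = docs.stale
count_after_change = docs.reindex_count

controller.manual_update(docs, confirmed=True)
print(stale_after_change, count_after_change, docs.reindex_count)
```

Tracking a `stale` marker is an addition of this sketch. It would let the UI show that a collection differs from disk without acting on it, which speaks to the concern raised earlier about collections silently diverging.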

@dgcruzing

dgcruzing commented Jul 26, 2024

I agree with this feature, as I have just been experiencing this myself.

I run a Synology NAS with several collections, and I find that this feature would open GPT4All up to a large number of users, both domestic and commercial. I am testing it for the construction industry (training), and it makes sense there, as I have tier 1 company files on it that are industry standards (engineering reports and Australian standards). Of course, the end game is to train an LLM on these, but I can't recommend GPT4All to architects/builders running a NAS. This re-indexing on every startup is a show-stopper.
So it is a +1 from me on this one.

Edit: I am testing Windows mapped drives to avoid the re-indexing; if that proves stable, it is a workaround for files stored on a NAS.

manyoso changed the title from '[Feature] Option to "Lock" a localDocs collection to prevent reindex.' to '[Feature] Setting to turn off automatic reindexing of localDocs collections' Jul 28, 2024