[BUG/FEAT]: Delete documents under “My Documents” after changing the embedding model #2745
Comments
It is still not clear to me why everything under ‘My Documents’ should have to be deleted if the documents have already been removed on the right-hand side, in the workspace. Documents in ‘My Documents’ only become part of the workspace after ‘Move to Workspace’, and only then are they vectorised by ‘Save and Embed’. In addition, as described, I am unable to define document hierarchies using different folders in ‘My Documents’. Kind regards
The documents should not be deleted, but the cached embeddings and the vectors stored in the tables should be. There is no guarantee that the new embedding model has the same dimensions as the old one, and if it does not, all upserts and similarity searches will fail due to a dimension mismatch.
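To illustrate the dimension mismatch described above, here is a minimal sketch in plain Python. It is not AnythingLLM's code or any real vector database API; `VectorTable`, `upsert` and `search` are hypothetical names, and the 3- and 4-dimensional vectors stand in for the output of two different embedding models.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity; only defined for vectors of equal length."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class VectorTable:
    """Hypothetical table that is fixed to one vector dimension."""

    def __init__(self, dim):
        self.dim = dim
        self.rows = {}

    def upsert(self, doc_id, vector):
        # A vector from a model with a different output size is rejected.
        if len(vector) != self.dim:
            raise ValueError(f"table expects {self.dim} dims, got {len(vector)}")
        self.rows[doc_id] = list(vector)

    def search(self, query, k=1):
        # The query vector must match the stored vectors as well.
        if len(query) != self.dim:
            raise ValueError(f"table expects {self.dim} dims, got {len(query)}")
        scored = sorted(
            ((cosine_similarity(query, vec), doc_id) for doc_id, vec in self.rows.items()),
            reverse=True,
        )
        return [doc_id for _, doc_id in scored[:k]]


table = VectorTable(dim=3)            # built with the "old" 3-dim model
table.upsert("a", [1.0, 0.0, 0.0])
table.upsert("b", [0.0, 1.0, 0.0])
nearest = table.search([0.9, 0.1, 0.0])

try:
    table.upsert("c", [1.0, 0.0, 0.0, 0.0])  # vector from a "new" 4-dim model
    upsert_failed = False
except ValueError:
    upsert_failed = True
```

In this sketch the table keeps working only as long as every vector comes from a model with the same output size, which is why the stored vectors have to be cleared when the model changes.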
Hi, I have explained several times that on my Windows 10 system in AnythingLLM, after changing the embedding model provider (via Ollama), I have to delete the documents under 'My Documents' in order for the vector database to be filled with the new data of a newly tested embedding model. Before doing this, I reset the vector database. In my opinion, you shouldn't have to delete anything on the LEFT side under 'My Documents', but only on the RIGHT side, in the workspace! If I delete everything on the right side after I've integrated a new embedding model via Ollama and reset the vector database, and then start 'Save and Embed', the vector database is obviously filled with the old data again. Only when I delete the documents in 'My Documents' on the left, upload them again, move them to the workspace and click 'Save and Embed' does the vector database get filled by the new embedding model. Also, it is still a mystery to me how, for example, I can create three folders with different documents in 'My Documents'. Drag and drop or similar methods do not work. Unfortunately, the manual is very sparse on this point. Kind regards
We are saying the same thing.
Hi, thank you for looking into the problem I described. I have one more request: It is still a mystery to me how I can, for example, create three folders with different documents in "My Documents". Drag and drop or similar methods do not work. Therefore, I am currently still unable to create a hierarchically organised workspace. Kind regards P. S.: I GOT IT NOW! :-))
Hi,
I don't understand how the workspaces and the functions provided there are implemented.
Question #1:
On the left, under “My Documents”, I can create folders and assign names to them.
In my case, I created the folders “Testfolder” and “Testfolder2” in addition to the already existing “custom-documents” folder. However, despite multiple attempts, I have not been able to get documents uploaded via the “Click to upload or drag and drop” area into these subfolders. They always end up in the “custom-documents” directory and cannot be moved from there to either “Testfolder” or “Testfolder2” using drag and drop.
Where can I find more detailed instructions that will help me solve the above problem so that I can create folders with appropriate names under “My Documents” to structure the source documents and upload the desired source documents to them?
Question #2:
The actual workspace is on the right. To get one or more of the documents from “My Documents” there, the relevant document from “My Documents” must be selected and then “Move to Workspace” clicked. Vectorization can then be started by clicking on “Save and Embed”.
Before using it as a local RAG system, the appropriate combination of LLM (LM Studio) and embedding model (Ollama) must be found, which means that a large number of tests have to be carried out. This also includes testing different embedding models (via Ollama).
Problem:
If you test a new embedding model, you have to reset the existing vector database with “Reset Vector Database” and repeat the embedding process with the new embedding model. In my opinion, it should be sufficient to delete all documents in the workspace on the right, reselect the relevant documents on the left under “My Documents”, and click on “Move to Workspace” so that the documents move to the workspace on the right again. “Save and Embed” should then vectorize them with the new embedding model under test.
Exactly this procedure, which is in line with expectations, does NOT work!
Afterwards, a glance at “Max Context Snippets” and “Vector Count” in the relevant workspace shows that the values have not changed, even though I deliberately change the value of “Text Chunk Size” each time before testing a different embedding model.
Only after I delete the relevant documents on the left under “My Documents”, upload them again, move them to the right using “Move to Workspace” and then click on “Save and Embed” is the vector database generated and saved by the new embedding model.
As I understand it, this should not be necessary. It should work as follows: as soon as I delete the already embedded documents on the right-hand side, in the workspace, then select these documents again on the left under “My Documents”, move them into the workspace and click on “Save and Embed”, the embedding should be carried out with the new embedding model and the data vectorized accordingly.
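The behavior described above would be consistent with cached embeddings being reused per document. The following is only a sketch of that idea under stated assumptions, not AnythingLLM's actual implementation: `EmbeddingCache`, `fake_embed` and the model names `model-a` and `model-b` are all hypothetical. It compares a cache keyed by document alone with one whose key also includes the embedding model.

```python
def fake_embed(text, model):
    """Hypothetical stand-in for a real embedding call (not Ollama's API)."""
    dims = {"model-a": 3, "model-b": 5}[model]
    return [float(len(text) + i) for i in range(dims)]


class EmbeddingCache:
    """Hypothetical cache; the key may or may not include the model name."""

    def __init__(self, key_includes_model):
        self.key_includes_model = key_includes_model
        self.store = {}

    def get_or_embed(self, doc_id, text, model):
        key = (doc_id, model) if self.key_includes_model else (doc_id,)
        if key not in self.store:
            # Cache miss: only now is the embedding model actually called.
            self.store[key] = fake_embed(text, model)
        return self.store[key]


# Keyed by document only: switching models still returns the old vector.
naive = EmbeddingCache(key_includes_model=False)
old_vec = naive.get_or_embed("doc1", "hello", "model-a")
stale_vec = naive.get_or_embed("doc1", "hello", "model-b")

# Keyed by document and model: the new model produces a fresh vector.
aware = EmbeddingCache(key_includes_model=True)
aware.get_or_embed("doc1", "hello", "model-a")
fresh_vec = aware.get_or_embed("doc1", "hello", "model-b")
```

With the document-only key, re-embedding after a model change yields the old 3-dimensional vector; deleting and re-uploading the document would sidestep such a cache only if that removes the cached entry, which is an assumption here.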
Thanks in advance and kind regards
Joomgallerytestit