Context Compressor: A Recursive Ternary Embedding Document Search Algorithm #15985
SuiltaPico started this conversation in Ideas
This algorithm compresses a document by recursively dividing it into three groups. Suppose we have an array of text segments `[1, 2, 3, 4, 5, 6]`. We first divide it into three groups, `[1, 2, 3]`, `[3, 4, 5]`, and `[4, 5, 6]`, and embed each group. Next, we use vector similarity to find the two most similar groups. Suppose the comparison shows that the group `[4, 5, 6]` has the lowest similarity. We then apply the same method to the text segment array `[1, 2, 3, 4]`, and so on.

However, this is only a preliminary idea, and I haven't had time to put it into practice. I also have not determined the termination condition for the recursion. I hope the idea is useful to others and that we can discuss its feasibility together.
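To make the discussion concrete, here is a minimal Python sketch of the recursion. It is one possible reading of the idea, not a tested implementation, and it rests on several assumptions that the post does not settle: similarity is measured against a search `query`; the three groups are contiguous and non-overlapping (simpler than the overlapping groups in the example); recursion stops once at most `min_segments` segments remain (a placeholder termination condition); and `embed` is a toy bag-of-words stand-in for a real embedding model. All function and parameter names are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def split_in_three(segments):
    """Split into three contiguous, non-overlapping groups (an assumption)."""
    n = len(segments)
    a, b = n // 3, 2 * n // 3
    return [segments[:a], segments[a:b], segments[b:]]


def compress(segments, query, min_segments=2):
    """Recursively drop the group least similar to the query.

    Assumed termination condition: stop when at most `min_segments`
    segments remain, or when fewer than three are left to split.
    """
    if len(segments) <= max(min_segments, 2):
        return segments
    groups = split_in_three(segments)
    q = embed(query)
    scores = [cosine(embed(" ".join(g)), q) for g in groups]
    worst = scores.index(min(scores))  # ties resolve to the earliest group
    kept = [s for i, g in enumerate(groups) if i != worst for s in g]
    return compress(kept, query, min_segments)


segments = [
    "cats purr",
    "cats nap",
    "dogs bark",
    "cats hunt mice",
    "stocks fell",
    "rain tomorrow",
]
print(compress(segments, "cats"))  # ['cats purr', 'cats nap']
```

With `min_segments=4`, the same call stops after one round and returns the first four segments, which mirrors the step from six segments to `[1, 2, 3, 4]` described above. Each round removes at least one segment, so the recursion always terminates under this stopping rule; whether a fixed size threshold is the right rule is still an open question.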