The documentation of BaseLoader says: "Implementations should implement the lazy-loading method using generators to avoid loading all Documents into memory at once."
I fully agree with this objective. But the current API doesn't make it possible.
The TextSplitter API is incompatible with an async approach.
With this patch in langchain_text_splitters/base.py#textSplitter:

    async def atransform_documents(
        self, documents: AsyncIterator[Document], **kwargs: Any
    ) -> AsyncIterator[Document]:
        """Asynchronously transform a list of documents.

        Args:
            documents: A sequence of Documents to be transformed.

        Returns:
            A list of transformed Documents.
        """
        return await run_in_executor(
            None, self.transform_documents, documents, **kwargs
        )

It is then possible to feed a Splitter from a lazy load. But the pipeline is no longer lazy, because the whole result is built before anything is returned.
To remain lazy, it must be implemented this way:
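
    async def atransform_documents(
        self, documents: AsyncIterator[Document], **kwargs: Any
    ) -> AsyncIterator[Document]:
        """Transform sequence of documents by splitting them."""
        async for doc in documents:
            # split_documents() returns a list, so yield its chunks one by one
            # to match the declared AsyncIterator[Document] return type.
            for chunk in self.split_documents([doc]):
                yield chunk
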
In this case, the caller must be modified accordingly:
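As a rough, self-contained sketch of that pattern, here is how a caller could consume such an async generator. `Doc`, `ToySplitter` and `lazy_load` are hypothetical stand-ins, not the real LangChain classes; the `events` list only exists to show that loading and splitting interleave.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, List


@dataclass
class Doc:
    """Hypothetical stand-in for langchain's Document."""
    page_content: str


class ToySplitter:
    """Hypothetical splitter that cuts text into fixed-size chunks."""

    def __init__(self, chunk_size: int) -> None:
        self.chunk_size = chunk_size

    def split_documents(self, docs: List[Doc]) -> List[Doc]:
        return [
            Doc(d.page_content[i : i + self.chunk_size])
            for d in docs
            for i in range(0, len(d.page_content), self.chunk_size)
        ]

    async def atransform_documents(
        self, documents: AsyncIterator[Doc]
    ) -> AsyncIterator[Doc]:
        # Split one input document at a time and yield each chunk.
        async for doc in documents:
            for chunk in self.split_documents([doc]):
                yield chunk


events: List[str] = []  # records the order of loads and emitted chunks


async def lazy_load() -> AsyncIterator[Doc]:
    """Hypothetical lazy loader: produces documents only when asked."""
    for text in ["abcdef", "ghij"]:
        events.append(f"load:{text}")
        yield Doc(text)


async def main() -> List[str]:
    splitter = ToySplitter(chunk_size=4)
    # The caller iterates with `async for` instead of awaiting a full list.
    async for chunk in splitter.atransform_documents(lazy_load()):
        events.append(f"chunk:{chunk.page_content}")
    return events


result = asyncio.run(main())
```

In this sketch the second document is only loaded after the chunks of the first one have been consumed, which is the behaviour the lazy-loading guideline asks for.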
It's still not lazy. For this to be the case, you need to modify afrom_documents() from:
to:
possibly with:
and review all implementations.
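To illustrate the kind of change this implies, here is a minimal sketch of a store constructor that accepts an async iterator and ingests it in small batches. This is an assumption about what a lazy variant could look like, not the actual LangChain signature: `ToyStore`, its `afrom_documents` and `aadd_texts` methods, and the `abatched` helper are all hypothetical and use plain strings instead of Documents.

```python
import asyncio
from typing import AsyncIterator, List


async def abatched(
    items: AsyncIterator[str], size: int
) -> AsyncIterator[List[str]]:
    """Group an async stream into lists of at most `size` items."""
    batch: List[str] = []
    async for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly shorter, batch
        yield batch


class ToyStore:
    """Hypothetical stand-in for a vector store; it only records its input."""

    def __init__(self) -> None:
        self.batches: List[List[str]] = []

    async def aadd_texts(self, texts: List[str]) -> None:
        self.batches.append(texts)

    @classmethod
    async def afrom_documents(
        cls, documents: AsyncIterator[str], batch_size: int = 2
    ) -> "ToyStore":
        # Assumed lazy signature: takes an async iterator, never a full list.
        store = cls()
        async for batch in abatched(documents, batch_size):
            await store.aadd_texts(batch)
        return store


async def chunks() -> AsyncIterator[str]:
    for text in ["a", "b", "c", "d", "e"]:
        yield text


store = asyncio.run(ToyStore.afrom_documents(chunks(), batch_size=2))
```

With this shape, at most `batch_size` items are held in memory at a time, at the cost of every implementation having to accept an iterator rather than a list.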
What do you think? This conforms to the BaseLoader approach.
Note that I had already proposed a solution for making transformations lazy, while making them Runnable. Shall I try it?