Replies: 3 comments 2 replies
-
🤖 Hey @kareiweb, nice to see you diving into new challenges! How's everything going on your end?

Based on the current implementation of the RecursiveUrlLoader, there does not appear to be a built-in way to include custom directories. However, you can exclude certain directories from being loaded using the excludeDirs option:

const loader = new RecursiveUrlLoader(url, {
  extractor: compiledConvert,
  maxDepth: 1,
  excludeDirs: ["https://js.langchain.com/docs/api/"],
});

In this example, the loader skips any URL under https://js.langchain.com/docs/api/.

If you want to load documents from specific directories, you might consider using the DirectoryLoader. Note that it reads files from a local directory rather than from URLs:

const directoryLoader = new DirectoryLoader(
  "src/document_loaders/example_data/",
  {
    ".pdf": (path: string) => new PDFLoader(path),
  },
);
const docs = await directoryLoader.load();
console.log({ docs });

In this example, the DirectoryLoader loads every .pdf file in src/document_loaders/example_data/ with a PDFLoader.

If you need to load documents from multiple directories or URLs, you could create multiple instances of the relevant loader, one per directory or URL, and combine the results.

I hope this helps! If you have any other questions or need further clarification, feel free to ask.
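For anyone wondering what the excludeDirs option implies in practice, here is a minimal sketch of prefix-based exclusion. It is an illustration only, not the library's code: the helper names isExcluded and filterLinks are hypothetical, and it assumes an excluded "directory" is matched as a plain URL prefix.

```typescript
// Hypothetical helper, NOT taken from langchainjs: treats each excluded
// directory as a URL prefix and reports whether the URL falls under any of them.
function isExcluded(url: string, excludeDirs: string[]): boolean {
  return excludeDirs.some((dir) => url.startsWith(dir));
}

// Keep only the candidate links that fall outside every excluded directory.
function filterLinks(links: string[], excludeDirs: string[]): string[] {
  return links.filter((link) => !isExcluded(link, excludeDirs));
}
```

Under that assumption, passing the links https://js.langchain.com/docs/api/classes/foo and https://js.langchain.com/docs/get_started/introduction with the excludeDirs value from the example above keeps only the introduction page.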
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant. If you want to continue the conversation, start your reply with @dosu-bot.
-
🤖 Currently, the RecursiveUrlLoader in langchainjs does not support loading an array of URLs or including custom directories directly. It is designed to recursively load URLs from a single base URL, excluding any directories specified in the excludeDirs option.

However, you can achieve similar functionality by creating multiple instances of RecursiveUrlLoader, each with a different base URL from your array, and then loading them separately. Here's an example:

const urls = ['http://example1.com', 'http://example2.com', 'http://example3.com']; // Your array of URLs
const loaders = urls.map(url => new RecursiveUrlLoader(url, { /* your options here */ }));
// Load documents from all URLs
const allDocuments = [];
for (const loader of loaders) {
  const documents = await loader.load();
  allDocuments.push(...documents);
}

In this example, each URL in the array gets its own RecursiveUrlLoader, and the documents from all of them are collected into allDocuments. For more information, you can refer to the RecursiveUrlLoader in the langchainjs codebase.

Regarding your question about a feature similar to the UnstructuredURLLoader in Python LangChain: langchainjs currently does not have a direct equivalent. The closest feature is the RecursiveUrlLoader, which loads multiple URLs at once from a single base URL and its linked pages, controlled by options such as maxDepth and excludeDirs:

const url = "https://js.langchain.com/docs/get_started/introduction";
const compiledConvert = compile({ wordwrap: 130 }); // returns (input: string) => string;
const loader = new RecursiveUrlLoader(url, {
  extractor: compiledConvert,
  maxDepth: 1,
  excludeDirs: ["https://js.langchain.com/docs/api/"],
});
const docs = await loader.load();
expect(docs.length).toBeGreaterThan(1);
expect(docs[0].pageContent).toContain("LangChain");

In this example, the loader starts at the introduction page, follows links up to a depth of 1, and skips anything under https://js.langchain.com/docs/api/. You can find this example in the langchainjs test suite.

I hope this helps! If you have any more questions or need further clarification, feel free to ask.
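The one-loader-per-URL approach described above can also be wrapped in a small helper that loads everything concurrently. This is a rough sketch under stated assumptions rather than a langchainjs API: LoadedDoc, DocLoader, loadFromUrls, and fakeLoader are hypothetical names, and the only thing assumed about a real loader is that it exposes an async load() method returning documents.

```typescript
// Minimal shapes assumed for this sketch; the real langchainjs types are richer.
interface LoadedDoc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

interface DocLoader {
  load(): Promise<LoadedDoc[]>;
}

// Build one loader per URL with a caller-supplied factory, load them
// concurrently, and flatten the results in the original URL order.
async function loadFromUrls(
  urls: string[],
  makeLoader: (url: string) => DocLoader,
): Promise<LoadedDoc[]> {
  const perUrl = await Promise.all(urls.map((url) => makeLoader(url).load()));
  return ([] as LoadedDoc[]).concat(...perUrl);
}

// Stand-in loader so the sketch runs without network access or langchainjs.
const fakeLoader = (url: string): DocLoader => ({
  load: async () => [
    { pageContent: `content of ${url}`, metadata: { source: url } },
  ],
});
```

In real use you would presumably pass a factory such as (url) => new RecursiveUrlLoader(url, { maxDepth: 1 }) instead of fakeLoader. Note that Promise.all rejects as soon as any single loader fails, so a sequential loop with per-URL error handling may be the better fit if some sites are unreliable.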
-
@kareiweb Any updates on this? I want to do something in JS similar to WebBaseLoader in Python, where I pass in the array of URLs I want.
-
Is there a way to include custom directories in the Recursive URL Loader,
in case it can't reach them, or when it's just one or two links from another website?
Maybe something like this: