
Puppeteer Pool, run a cluster of instances in parallel

joone/headless-cluster

 
 

Headless Cluster

headless-cluster is a fork of the puppeteer-cluster library for running multiple Puppeteer instances concurrently. It extends puppeteer-cluster with authenticated proxy support and tracks recent Puppeteer features.

Proxy support

headless-cluster supports authenticated proxies. Pass a data object to cluster.execute that contains the proxy settings (contextOptions) and the credentials (authentication). In the task callback, read the credentials from data and call page.authenticate to set the username and password before navigating. The full example is in examples/execute-proxy.js.

  // Assumption: the package is installed from npm under the name "headless-cluster".
  // Adjust the path if you run this from the repository (see examples/execute-proxy.js).
  const { Cluster } = require('headless-cluster');

  (async () => {
    // Create a cluster with at most 2 concurrent workers
    // You can also use Cluster.CONCURRENCY_BROWSER
    const cluster = await Cluster.launch({
      concurrency: Cluster.CONCURRENCY_CONTEXT,
      maxConcurrency: 2,
    });

    // Define a task
    await cluster.task(async ({ page, data }) => {
      // Apply the proxy credentials, if any, before navigating
      if (data.authentication) {
        await page.authenticate(data.authentication);
      }
      try {
        await page.goto(data.url);
      } catch (err) {
        console.log(err);
        return 'Failed to load the page';
      }
      const pageTitle = await page.evaluate(() => document.title);
      return pageTitle;
    });

    // Use a try-catch block because "execute" throws instead of emitting events
    try {
      // Run the tasks one after another via execute
      const data = {
        url: 'https://www.google.com',
        contextOptions: { proxyServer: 'http://localhost:3128' },
        authentication: { username: 'foobar', password: 'Ya4zAzj8i' },
      };

      // First task goes through the authenticated proxy
      const result1 = await cluster.execute(data);
      console.log(result1);
      // Second task carries no proxy settings
      const result2 = await cluster.execute({ url: 'https://www.wikipedia.org' });
      console.log(result2);
    } catch (err) {
      // Handle crawling error
      console.error(err);
    }

    // Shut down after everything is done
    await cluster.idle();
    await cluster.close();
  })();
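
Proxy endpoints are often supplied as a single URL with embedded credentials. The sketch below shows one way to turn such a URL into the data object shape used above (url, contextOptions.proxyServer, authentication). The helper name buildProxyJob is hypothetical and not part of headless-cluster; it relies only on the standard URL class.

```javascript
// Hypothetical helper (not part of headless-cluster): builds the data object
// passed to cluster.execute from a target URL and an optional proxy URL.
function buildProxyJob(url, proxyUrl) {
  const job = { url };
  if (!proxyUrl) {
    return job; // no proxy: plain job, as in the second execute call above
  }

  const parsed = new URL(proxyUrl);
  // Keep only scheme, host and port; credentials are passed separately.
  job.contextOptions = { proxyServer: `${parsed.protocol}//${parsed.host}` };

  if (parsed.username) {
    // URL components are percent-encoded, so decode them before use.
    job.authentication = {
      username: decodeURIComponent(parsed.username),
      password: decodeURIComponent(parsed.password),
    };
  }
  return job;
}
```

A call such as cluster.execute(buildProxyJob('https://www.google.com', 'http://foobar:Ya4zAzj8i@localhost:3128')) would then hand the task the same fields as the hand-written object. Note that the URL class omits a scheme's default port (for example 80 for http) from host, so the resulting proxyServer string omits it as well.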
