Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Solving request interception chain #364

Closed
XavierGeerinck opened this issue Nov 1, 2020 · 15 comments · Fixed by #592
Closed

Solving request interception chain #364

XavierGeerinck opened this issue Nov 1, 2020 · 15 comments · Fixed by #592
Labels
enhancement New feature or request ongoing-research upstream-issue The issue is caused by a dependency

Comments

@XavierGeerinck
Copy link

Hi Everyone!

Currently I am having the problem that when you have 2 plugins that try to intercept and handle requests, that the dependencies start to fail. (E.g. when I have the adblocker plugin and a personal plugin that both try to block resources, then it starts stating that a request has already been handled).

I agree that this is how it is with Puppeteer and that it will be hard to fix it (sadly enough). But seeing the plugin interface, I think it might be worthwhile to find a solution to this.

Some thoughts that I had:

  • Create page hooks (such as the onPageCreated) but then for onPageRequest, onPageRequestFinished, onPageRequestFailed and make them check if a request has been handled already (it should be possible to keep working with requests until an .abort has been called or the request finished?)
  • Add a weight system to allow certain plugins to come before others (e.g. adblocker to block requests before other plugins).

I'd love to hear some thoughts from the community to see if this is feasible or not? Looking forward to the responses!

@berstend
Copy link
Owner

berstend commented Nov 2, 2020

@XavierGeerinck thanks for the well thought-out comment 👍

It's been a while since I worked on that aspect, but I remember the main annoyance being that when request interception is enabled someone needs to decide if the request shall be continued or aborted (but that decision can only happen once).

That makes it a bit annoying to work with from a plugin framework perspective 😄 If I remember correctly puppeteer is holding an internal state for this.

If all code happens within plugins this would be relatively simple to fix, it get's more tricky when we need to take care of code in the user's scripts as well (by patching .enableRequestInterception) and the like.

I would need to refresh my memory on the issues I ran into while working on a solution to this a while back, happy for other input on this though.

@berstend berstend added enhancement New feature or request ongoing-research labels Nov 2, 2020
@XavierGeerinck
Copy link
Author

Well I am personally fine if plugins have to be written for this to work inside the core, but I also believe a better solution could be found. The solution I personally was thinking was to indeed handle this within the plugin framework and:

  • add the appropriate hooks, then when they exist make the hooks return abort or continue and finally decide in the framework to continue or not :)
  • Add a specific flag to show what was decided (so that in the code you can check if the request was indeed handled or not)

Is this something that could be added? Would love to see this upstream!

In the meantime I also opened a request on puppeteer for official plugin support as well as solving this use case: puppeteer/puppeteer#6567

let me know what you think!

@knuppe
Copy link

knuppe commented Nov 21, 2020

A simple solution would be.

page.on('request', async (req) => {
    if (req._interceptionHandled) return;
    /* ... */
}

@XavierGeerinck
Copy link
Author

_interceptionHandled

I am not sure tbh.. this is what I used before and it will then crash further on in the code if you have multiple listeners, since the request was already handled then.

@berstend
Copy link
Owner

berstend commented Dec 8, 2020

I'm lacking the bandwidth at the moment to look into this properly, but I'm pretty sure there's a way to fix this so multiple plugins as well as external scripts can use request interception at the same time. :-)

I will revisit this once playwright support and the new stealth plugin has landed, if anyone else wants to take a stab at this in the meantime that'd be great as well.

@berstend
Copy link
Owner

berstend commented Dec 8, 2020

I guess one "transparent" way to handle this could be to patch request.continue() to not finalize the interception but have all other listeners be called as well. In case the following handlers don't issue either continue() or abort() then a fallback handler would do the final continue(). In that scenario abort() would be final (so if the adblocker plugin aborts a request it wouldn't show up in other handlers).

In terms of order of execution: First come first served? That would mean plugins get the first pick (in order they're added), followed by handlers the user might've defined.

@benallfree
Copy link

benallfree commented Jan 4, 2021

edit: @berstend I had a similar idea, PR in puppeteer/puppeteer#6735

I'm here because I hit this same issue. My use case is I want to intercept pptr requests and potentially pull them from a Redis cache rather than repeating the request or relying on the pptr cache. Unfortunately, whoever calls request.continue() or request.abort() first, wins.

To me, this means pptr can have at most one request interceptor in page.on('request'). Since that's the case, puppeteer-extra should add one listener there and then all the plugins should hook into a puppeteer-extra event system. That way all the plugins can run, each saying whether or not to continue or abort.

page.on('request', req=> {
   const votes = await Promise.all(px.onRequestListeners.map( (e)=>e(req) ))
   const shouldContinue = votes.reduce( (a, e)=>a&&e, true)
   if(shouldContinue) req.continue() else req.abort()
});

@benallfree
Copy link

Note: I published a package to fix this https://www.npmjs.com/package/enchant-puppeteer

@berstend berstend added the upstream-issue The issue is caused by a dependency label Feb 8, 2021
benww added a commit to benww/puppeteer-extra that referenced this issue Apr 23, 2021
…: No resource with given identifier found`

We recently started using [`enchant-puppeteer`](berstend#364 (comment)) and doing our own asynchronous request interception while also using `puppeteer-extra-plugin-stealth`.  It seems like this combination makes it much more likely for this error to happen, especially on pages with redirects. The error was causing the whole process to be killed.

This seems relevant: puppeteer/puppeteer#2258 (comment)

We aren’t sure this is the right way to handle this problem. However, this seems like an improvement.
@berstend
Copy link
Owner

@benallfree any specific reason your upstream PR is closed but not merged? 🤔
puppeteer/puppeteer#6733

@benallfree
Copy link

@berstend Sorry for the confusion, it was moved to puppeteer/puppeteer#6735

berstend pushed a commit that referenced this issue Apr 26, 2021
We recently started using [`enchant-puppeteer`](#364 (comment)) and doing our own asynchronous request interception while also using `puppeteer-extra-plugin-stealth`.  It seems like this combination makes it much more likely for this error to happen, especially on pages with redirects. The error was causing the whole process to be killed.

This seems relevant: puppeteer/puppeteer#2258 (comment)

We aren’t sure this is the right way to handle this problem. However, this seems like an improvement.
@FdezRomero
Copy link
Contributor

@benallfree Your PR puppeteer/puppeteer#6735 was already merged. What would need to be done to completely fix the issue in puppeteer-extra? Thanks!

@benallfree
Copy link

benallfree commented Nov 22, 2021

@FdezRomero Yes, here are the upgrade instructions: https://github.com/puppeteer/puppeteer/blob/v11.0.0/docs/api.md#upgrading-to-cooperative-mode-for-package-maintainers

Basically it entails adding a 2nd arg to each .abort(), .continue(), and .respond() you call.

Also take note of the setInterceptResolutionStrategy() suggestion in the upgrade notes. puppeteer-extra could really benefit from this.

If you'd like hands-on help implementing this upgrade, please let me know. puppeteer-extra and this ticket were the original inspiration for what eventually became a core contrib to Puppeteer 👍

@FdezRomero
Copy link
Contributor

FdezRomero commented Nov 22, 2021

@benallfree Thanks a lot for the details! I'll try adding support for Cooperative Mode for the stealth, adblocker and resource blocker plugins and confirm that they get along well 👍🏻

I agree that it should be made possible to define the priorities, so I'll add setInterceptResolutionStrategy() as suggested there. I'll let you know if I need some help 😉

@FdezRomero
Copy link
Contributor

@berstend @benallfree I created a PR to fix this issue for both plugins in #592, feedback appreciated 😄

@dekelev
Copy link

dekelev commented Feb 4, 2022

@berstend Looks like PR is stuck waiting for review for few months now. Can we please merge this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request ongoing-research upstream-issue The issue is caused by a dependency
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants