Link crawler parallelization is hampered by session locks #193
brendanheywood
added a commit
that referenced
this issue
Feb 4, 2025
When we spawn N adhoc tasks to crawl the site in parallel, all of these will still use the same session cookie:
https://github.com/catalyst/moodle-tool_crawler/blob/MOODLE_310_STABLE/classes/robot/crawler.php#L1092
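Because Moodle locks the session for the duration of each request, requests that share a session cookie are served one at a time. This means that most of the time a worker spends waiting for its request to finish is really spent waiting for some other crawling process to finish. If there are N adhoc tasks crawling, then each should have its own independent session cookie.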
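To illustrate the effect, here is a minimal simulation, not the plugin's code (which is PHP). It assumes each request holds a per-session lock for its full duration. `peak_concurrency` and its parameters are hypothetical names made up for this sketch.

```python
import threading
import time


def peak_concurrency(num_workers, shared_session,
                     requests_per_worker=3, request_seconds=0.05):
    """Return the peak number of simulated requests in flight at once.

    Each request holds the lock of its session for the whole request,
    which is the assumption this sketch makes about session locking.
    """
    # Create the locks up front so the worker threads never race on creation.
    if shared_session:
        shared_lock = threading.Lock()
        session_locks = [shared_lock] * num_workers
    else:
        session_locks = [threading.Lock() for _ in range(num_workers)]

    counter_lock = threading.Lock()
    state = {"in_flight": 0, "peak": 0}

    def worker(worker_id):
        for _ in range(requests_per_worker):
            with session_locks[worker_id]:
                with counter_lock:
                    state["in_flight"] += 1
                    state["peak"] = max(state["peak"], state["in_flight"])
                time.sleep(request_seconds)  # stands in for the HTTP request
                with counter_lock:
                    state["in_flight"] -= 1

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["peak"]


if __name__ == "__main__":
    print("shared session peak:", peak_concurrency(4, shared_session=True))
    print("per-worker session peak:", peak_concurrency(4, shared_session=False))
```

With one shared session the peak can never exceed one request in flight, no matter how many workers are spawned. With one session per worker the requests can overlap.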
So I propose that tool_crawler_crawl() and crawler() both take an optional param called 'worker', passed through from the adhoc task's custom data, so that each worker has its own cookie jar file.
On top of this, I think it's both safe and better to move the cookie jar file to local temp. It doesn't matter if it gets thrown away and rebuilt semi-regularly.
Implementing this should increase the crawl rate, probably by around a factor of 10.
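The per-worker cookie jar idea can be sketched roughly as below. This is an illustration in Python, not the plugin's PHP implementation. The helper names, the `tool_crawler_cookies` file name and the `temp_dir` parameter are all assumptions made up for the example. Only the optional 'worker' value comes from the proposal.

```python
import tempfile
from http.cookiejar import LoadError, MozillaCookieJar
from pathlib import Path


def cookie_jar_path(worker=None, temp_dir=None):
    """Build a cookie jar path in local temp, unique per worker.

    The file name prefix is hypothetical; 'worker' mirrors the optional
    value that would come from the adhoc task's custom data.
    """
    base = Path(temp_dir) if temp_dir else Path(tempfile.gettempdir())
    suffix = "" if worker is None else f"_{int(worker)}"
    return base / f"tool_crawler_cookies{suffix}.txt"


def load_cookie_jar(worker=None, temp_dir=None):
    """Open the worker's cookie jar, starting empty if it is missing or unreadable."""
    path = cookie_jar_path(worker, temp_dir)
    jar = MozillaCookieJar(str(path))
    if path.exists():
        try:
            jar.load(ignore_discard=True)
        except (LoadError, OSError):
            # The jar is disposable: a lost or corrupt file only means the
            # worker has to log in again and rebuild its session.
            pass
    return jar


if __name__ == "__main__":
    for worker in range(3):
        print(worker, cookie_jar_path(worker))
```

Because each worker reads and writes only its own file, no two workers ever present the same session cookie, and losing a file costs only a fresh login.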