Improve harvesting scheduler #8184
Merge from upstream
Bumps [urllib3](https://github.com/urllib3/urllib3) from 1.26.2 to 1.26.3. - [Release notes](https://github.com/urllib3/urllib3/releases) - [Changelog](https://github.com/urllib3/urllib3/blob/1.26.3/CHANGES.rst) - [Commits](urllib3/urllib3@1.26.2...1.26.3) Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Toni <toni.schoenbuchner@csgis.de>
…6881) * [Fixes GeoNode#6880] Circle CI upload tests fail irregularly * CircleCI test fix: sometimes expires due to upload timeout in the test environment * - Avoid infinite loop on upload testing * Revert "CircleCI test fix: sometimes expires due to upload timeout in the test environment" This reverts commit 66139fd. Co-authored-by: Alessio Fabiani <alessio.fabiani@geo-solutions.it> Co-authored-by: afabiani <alessio.fabiani@gmail.com>
…de#6911) * get meaningful document filenames on download * - Strip extension from document title before slugifying it (e.g.: image.jpg instead of imagejpg.jpg) Co-authored-by: afabiani <alessio.fabiani@gmail.com> Co-authored-by: Alessio Fabiani <alessio.fabiani@geo-solutions.it>
…loop on "wait_for_progress"
…ng slash at the end of GEOSERVER_LOCATION (GeoNode#6913) * [Fixes GeoNode#6916] gsimporter.api.NotFound caused by missing trailing slash at the end of GEOSERVER_LOCATION * [Fixes GeoNode#6916] unit test for GEOSERVER_LOCATION
Bumps [django-cors-headers](https://github.com/adamchainz/django-cors-headers) from 3.6.0 to 3.7.0. - [Release notes](https://github.com/adamchainz/django-cors-headers/releases) - [Changelog](https://github.com/adamchainz/django-cors-headers/blob/master/HISTORY.rst) - [Commits](adamchainz/django-cors-headers@3.6.0...3.7.0) Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [amqp](https://github.com/celery/py-amqp) from 5.0.3 to 5.0.5. - [Release notes](https://github.com/celery/py-amqp/releases) - [Changelog](https://github.com/celery/py-amqp/blob/master/Changelog) - [Commits](celery/py-amqp@v5.0.3...v5.0.5) Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [pip](https://github.com/pypa/pip) from 21.0 to 21.0.1. - [Release notes](https://github.com/pypa/pip/releases) - [Changelog](https://github.com/pypa/pip/blob/master/NEWS.rst) - [Commits](pypa/pip@21.0...21.0.1) Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [coverage](https://github.com/nedbat/coveragepy) from 5.3.1 to 5.4. - [Release notes](https://github.com/nedbat/coveragepy/releases) - [Changelog](https://github.com/nedbat/coveragepy/blob/master/CHANGES.rst) - [Commits](nedbat/coveragepy@coverage-5.3.1...coverage-5.4) Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [pytest](https://github.com/pytest-dev/pytest) from 6.2.1 to 6.2.2. - [Release notes](https://github.com/pytest-dev/pytest/releases) - [Changelog](https://github.com/pytest-dev/pytest/blob/master/CHANGELOG.rst) - [Commits](pytest-dev/pytest@6.2.1...6.2.2) Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [djangorestframework-gis](https://github.com/openwisp/django-rest-framework-gis) from 0.16 to 0.17. - [Release notes](https://github.com/openwisp/django-rest-framework-gis/releases) - [Changelog](https://github.com/openwisp/django-rest-framework-gis/blob/master/CHANGES.rst) - [Commits](openwisp/django-rest-framework-gis@v0.16.0...v0.17.0) Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
… it has… (GeoNode#6923) * [Fixes GeoNode#6922][REST API v2] Expose the curated thumbnail URL if it has been uploaded * - Add REST APIs test suite to CircleCI
* [Cleanup and Refactor] Remove QGIS server backend dependencies * [Cleanup and Refactor] Remove QGIS server backend dependencies * - Fix LGTM issues
…iddleware Feature#650 basic auth middleware
Upstream master
…olutions_master
… not need it anymore
…ing does not need it anymore" This reverts commit 8d50cf2.
- remove django-celery-beat as a dependency
- implement a simple harvesting scheduler as a celery task
- implement an action for resetting a harvester's status
- add (and fix) tests
… not need it anymore
Codecov Report
Coverage Diff:

| | master | #8184 | +/- |
|---|---:|---:|---:|
| Coverage | 59.26% | 59.25% | -0.02% |
| Files | 763 | 765 | +2 |
| Lines | 46562 | 46683 | +121 |
| Branches | 5886 | 5897 | +11 |
| Hits | 27597 | 27662 | +65 |
| Misses | 17384 | 17435 | +51 |
| Partials | 1581 | 1586 | +5 |
@afabiani did you have time to review this yet? If possible, I'd like to merge soon, in order to avoid picking up conflicts with master again, which makes the PR burdensome to resolve and keep up to date.

The error is due to some migrations that are now conflicting after I tried to update this branch to the latest master. Need to fix.

Migration conflicts (hopefully) fixed.

@giohappy this PR will probably break the master.demo migrations; those must be fixed manually on the DB.
This PR implements an alternative strategy for the harvesting scheduler.
The proposed implementation introduces the `harvesting_scheduler()` celery task, which schedules harvesting sessions: it checks each harvester to determine whether it is time to refresh its harvestable resources or to perform a harvesting session, and then dispatches the relevant celery tasks accordingly. This scheduler task is assigned to the celery beat file-based scheduler and is configured to run every 30 seconds.
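The scheduling decision described above can be sketched in plain Python. This is a minimal standalone illustration, not the PR's code: the `Harvester` fields, the interval defaults, the task names and the dotted task path in the beat entry are all hypothetical; only the 30-second period and the "refresh resources or run a harvesting session" decision come from the description. It further assumes a harvester is only scheduled while its status is `ready` and that at most one task is dispatched per harvester per tick.

```python
"""Hypothetical sketch of the per-harvester scheduling decision.

Assumed names (not from the PR): Harvester fields, interval defaults,
task names and the dotted task path used in the beat entry.
"""
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

# Celery beat entry running the scheduler every 30 seconds. The period comes
# from the PR description; the dotted task path is an assumption.
CELERY_BEAT_SCHEDULE = {
    "harvesting-scheduler": {
        "task": "geonode.harvesting.tasks.harvesting_scheduler",
        "schedule": 30.0,  # seconds
    },
}


@dataclass
class Harvester:
    name: str
    status: str = "ready"
    scheduling_enabled: bool = True
    refresh_interval: timedelta = timedelta(minutes=60)
    harvest_interval: timedelta = timedelta(hours=12)
    last_refreshed: Optional[datetime] = None
    last_harvested: Optional[datetime] = None


def is_due(last_run: Optional[datetime], interval: timedelta, now: datetime) -> bool:
    """A job is due if it never ran or its interval has elapsed since the last run."""
    return last_run is None or now - last_run >= interval


def harvesting_scheduler(harvesters: List[Harvester], now: datetime) -> List[Tuple[str, str]]:
    """Return the (task name, harvester name) pairs to dispatch at ``now``.

    A real Celery task would send these with ``.apply_async()``; here they are
    only collected so the decision logic can be inspected.
    """
    dispatched = []
    for harvester in harvesters:
        # Skip harvesters that are disabled or already busy with another job.
        if not harvester.scheduling_enabled or harvester.status != "ready":
            continue
        # Refreshing the list of harvestable resources takes priority; the
        # harvesting session is picked up on a later tick once it is due.
        if is_due(harvester.last_refreshed, harvester.refresh_interval, now):
            dispatched.append(("refresh_harvestable_resources", harvester.name))
        elif is_due(harvester.last_harvested, harvester.harvest_interval, now):
            dispatched.append(("harvesting_session", harvester.name))
    return dispatched


if __name__ == "__main__":
    now = datetime(2021, 10, 1, 12, 0)
    harvesters = [
        Harvester("never-refreshed"),
        Harvester(
            "fresh",
            last_refreshed=now - timedelta(minutes=5),
            last_harvested=now - timedelta(days=1),
        ),
        Harvester("busy", status="updating-harvestable-resources"),
    ]
    print(harvesting_scheduler(harvesters, now))
    # [('refresh_harvestable_resources', 'never-refreshed'), ('harvesting_session', 'fresh')]
```

Because each tick only reads state and dispatches work, a tick that finds nothing due is cheap, which is what makes a short fixed period such as 30 seconds practical. In a Django project that loads Celery settings with the `CELERY` namespace, the dict above would be picked up as the `beat_schedule` setting and stored by Celery's default file-based `PersistentScheduler`.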
This PR ditches the `django-celery-beat` dependency, which was being used to create `PeriodicTask`s for each harvester. This seemed like a good approach, as it allowed storing the celery beat configuration in the DB and provided an admin backend for changing it dynamically. Unfortunately, `django-celery-beat` proved unreliable with the previous implementation, as harvesters were not being scheduled. There seem to be multiple issues reported on django-celery-beat's issue tracker that deal with similar problems: https://github.com/celery/django-celery-beat/issues?page=1&q=is%3Aissue+is%3Aopen
In addition to the scheduler work, this PR also includes a new admin action for the `harvesting.harvesters` changelist that allows a harvester's status to be reset back to `ready`. This can be useful for testing or debugging purposes, as a harvester can sometimes get stuck in another status.
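A standalone sketch of such an admin action, assuming the harvester model keeps its state in a field called `status` and that `ready` is the plain string stored there (both are assumptions; the PR only names the `ready` status). The function follows Django's `(modeladmin, request, queryset)` admin action signature but does not import Django, so the `ModelAdmin` registration is shown only as a comment, and the function name and message text are hypothetical.

```python
"""Hypothetical sketch of a Django admin action resetting harvester status.

Assumptions: the status field is named ``status`` and its idle value is the
string "ready". Django is not imported, so the sketch runs standalone.
"""

READY = "ready"  # idle status named in the PR description


def reset_harvester_status(modeladmin, request, queryset):
    """Admin action: put every selected harvester back into the "ready" status.

    ``queryset.update()`` issues a single UPDATE and returns the number of
    rows changed, which is reported back to the admin user.
    """
    updated = queryset.update(status=READY)
    modeladmin.message_user(request, f"Reset status of {updated} harvester(s) to {READY!r}")


# Label shown in the changelist's "Action" dropdown.
reset_harvester_status.short_description = "Reset harvester status to ready"

# In a real admin.py the action would be registered like this (not executed here):
#   class HarvesterAdmin(admin.ModelAdmin):
#       actions = [reset_harvester_status]
```

Using `queryset.update()` resets all selected rows in one query, but it bypasses the model's `save()` method and the `pre_save`/`post_save` signals; if the real model relies on those, iterating over the queryset and saving each instance would be the safer choice.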