feat: Adds resendfailedbatches setting to allow failed statements to be retried. #367
Conversation
@ryansmith94 This and the other two PRs I've opened are ready for review.
src/loader/utils/load_batch.php
Outdated
$logerror($e->getTraceAsString());
$loadedevents = construct_loaded_events($transformedevents, false);

// Recursively retry sending statements in increasingly smaller batches so that only the actual bad data fails.
I think this will cause some issues because it will delay the time the cron takes to execute, so the next cron run will try to process the same events and we'll duplicate statements. Hope that makes sense.
I believe that Moodle cron has some built-in locking that will prevent that from happening. See "running cron on multiple servers" here: https://docs.moodle.org/36/en/Cron
Ah didn't know that, good spot.
Tasks can run in parallel and processes use locking to prevent tasks from running at the same time which allows cron to be triggered from multiple web servers that serve the same Moodle instance.
Do you think it would be better to just validate the statements before we send them? Otherwise if the first statement in the batch is invalid we might send X requests where X is the max statements in a batch.
Validating before we send them is good, but it'll be a lot of effort to ensure validation is as good and matches the validation done by the LRS.
X requests would not be sent in that scenario. Let's say it's a batch of 10 statements and the first statement is bad.
Round 1: 1 batch of 10 statements fails
Round 2: 2 batches of 5 statements (1 fails, 1 succeeds)
Round 3: 2 batches of up to 3 statements (1 fails, 1 succeeds)
Round 4: 2 batches of up to 2 statements (1 fails, 1 succeeds)
Round 5: 2 batches of 1 statement (1 fails, 1 succeeds)
Total requests: 9. Ok, that's actually pretty close to 10, but if you have up to 12 events, it's still only 9 requests!
I'm open to suggestions of a better approach. The current approach of failing the whole batch is not great.
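The round-by-round walkthrough above can be checked with a small recursive sketch. This is a hypothetical helper for illustration only (`count_requests` is not part of the plugin), assuming the single bad statement always lands in the first half of each split:

```php
<?php
// Hypothetical simulation (not part of the plugin) of the halving retry
// described above. It counts the HTTP requests made for a batch that
// contains exactly one bad statement.
function count_requests(int $batchsize, bool $containsbad): int {
    $requests = 1; // One request is always made for this batch.
    if (!$containsbad || $batchsize === 1) {
        return $requests; // Either it succeeded, or there is nothing left to split.
    }
    // Mirror the plugin's split: the new max batch size is round($batchsize / 2).
    $firsthalf = (int) round($batchsize / 2);
    $secondhalf = $batchsize - $firsthalf;
    $requests += count_requests($firsthalf, true);        // Half with the bad statement.
    if ($secondhalf > 0) {
        $requests += count_requests($secondhalf, false);  // Clean half succeeds at once.
    }
    return $requests;
}

echo count_requests(10, true); // 10 -> 5+5 -> 3+2 -> 2+1 -> 1+1: nine requests.
```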
Yeah, matching the validation of the LRS is pretty much impossible given the slight differences between LRSs and the little holes in the conformance tests. I'd be interested to know what other people think about this issue (@davidpesce and @caperneoignis). I don't think the current approach is too bad; I agree it's not ideal, but I'm not sure what the ideal solution is either.
I made changes so the recursive retry functionality only triggers in the event of a 400 Bad Request response from the LRS. This ensures that slow responses don't lead to repeated retries that block cron.
I do still think we should retry in the event of 404, 429, 503 and 504, but that can be handled as a separate issue, and probably means retrying next time cron runs rather than in the same cron task.
I like the approach of us tackling specific error codes or groups of error codes one by one, rather than having a catch all approach, so happy for this to start with just 400 errors.
Sorry avoided checking in here to avoid getting in trouble with the better half. I'm quite happy to retry on any error code and perhaps only put events in the failed log if we get a 400-499 error code.
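That policy can be sketched in a couple of lines. These helper names are purely illustrative, not the plugin's API: retry on any error response, but only move events to the failed log for client errors.

```php
<?php
// Illustrative policy helpers (hypothetical names, not part of the plugin).
function should_retry(int $statuscode): bool {
    // Any 4xx or 5xx response is worth retrying.
    return $statuscode >= 400;
}

function should_record_as_failed(int $statuscode): bool {
    // Only client errors (400-499) indicate bad data, so only those
    // events would be written to the failed log.
    return $statuscode >= 400 && $statuscode <= 499;
}

var_dump(should_retry(503));            // bool(true)
var_dump(should_record_as_failed(503)); // bool(false)
var_dump(should_record_as_failed(400)); // bool(true)
```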
Also putting this behind a flag would be good I think.
I added a setting but haven't tested it just yet. I'll comment again once I have tested. (It's passing automated tests)
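For reference, Moodle plugin settings of this kind are registered in the plugin's settings.php via the standard admin settings API. A minimal sketch, assuming a checkbox setting; the setting key and language string identifiers here are illustrative and may not match what the PR actually uses:

```php
<?php
defined('MOODLE_INTERNAL') || die();

// Illustrative registration of a resend-failed-batches toggle
// (key and string identifiers are assumptions, not the PR's actual names).
$settings->add(new admin_setting_configcheckbox(
    'logstore_xapi/resendfailedbatches',
    get_string('resendfailedbatches', 'logstore_xapi'),
    get_string('resendfailedbatches_desc', 'logstore_xapi'),
    0 // Disabled by default.
));
```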
Cool thanks Andrew 👍
(not tested yet!)
@ryansmith94 this has been tested with both the setting on and off with batches of 2 statements, one of which is good and one is bad. In both cases it worked as expected. I'm expecting it to pass automated testing and then it should be good for you to review again.
Thanks @garemoko. Made some comments that you might find interesting/useful but no need to make any changes.
// In the event of a 400 error, recursively retry sending statements in increasingly
// smaller batches so that only the actual bad data fails.
if ($batchsize === 1 || $e->getCode() !== 400 || $config['lrs_resend_failed_batches'] !== '1') {
    $loadedevents = construct_loaded_events($transformedevents, false);
Personally I would return early here instead of assigning to $loadedevents to be returned later, because it's quicker to read since you don't need to scroll down looking for any uses/modifications to $loadedevents. Happy to merge with this, just sharing because you might agree and find it useful in future.
Yep, good point.
// In the event of a 400 error, recursively retry sending statements in increasingly
// smaller batches so that only the actual bad data fails.
if ($batchsize === 1 || $e->getCode() !== 400 || $config['lrs_resend_failed_batches'] !== '1') {
I might have flipped this if around and just returned construct_loaded_events($transformedevents, false); outside without an else, but that is very much a personal preference. I think I prefer it that way because it means you don't have to negate the condition. You could instead write the following.
$hasInvalidStatement = $e->getCode() === 400;
$isResendEnabled = $config['lrs_resend_failed_batches'] === '1';
if ($batchsize > 1 && $hasInvalidStatement && $isResendEnabled) {
    // ...
}
You could even extract resend logic into a function to remove the need for the comment that currently explains the if block.
$hasInvalidStatement = $e->getCode() === 400;
$isResendEnabled = $config['lrs_resend_failed_batches'] === '1';
if ($batchsize > 1 && $hasInvalidStatement && $isResendEnabled) {
    retry_in_smaller_batches($config, $transformedevents, $loader);
}
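One possible shape for that extracted helper, sketched under assumptions: the $loader callable and the function itself are hypothetical (the PR keeps this logic inline), and $loader is assumed to send the events using the batch size in $config and return the loaded events.

```php
<?php
// Hypothetical extraction of the resend logic suggested above
// (not the plugin's actual code).
function retry_in_smaller_batches(array $config, array $transformedevents, callable $loader) {
    $newconfig = $config;
    // Halve the maximum batch size, as the PR does on each retry.
    $newconfig['lrs_max_batch_size'] = round($config['lrs_max_batch_size'] / 2);
    // Re-run the loader on the same events with the smaller batch size.
    return $loader($newconfig, $transformedevents);
}
```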
I think in the long run we'll end up with an if for batchsize > 1 and the setting, containing a switch for the various types of error.
Extracting the logic does look clean.
Yeah sounds good 👍
    $loadedevents = construct_loaded_events($transformedevents, false);
} else {
    $newconfig = $config;
    $newconfig['lrs_max_batch_size'] = round($batchsize / 2);
Really like the way you solved this recursively with a modified copy of $config.
🎉 This PR is included in version 4.2.0 🎉 The release is available on GitHub release. Your semantic-release bot 📦🚀
Description
This PR does a few different things in order to improve the handling of failed batches.
Previously emit_task.php would record in the trace a list of event ids which the plugin had attempted to send to the LRS (some or all of which might have been in failed batches) with the message "Events [list of event ids] have been successfully sent to the LRS." But those events had not necessarily been successfully sent, and having the event ids served no purpose because the events were no longer in the database to look up by id.
Now emit_task.php will instead record the number of successful and (separately) unsuccessful events so the person looking at the logs can see how many events were successful and how many were not.
Previously load_batch.php, in the event of a failed xAPI request, would try to trace the event id using the $eventobj->id variable. However, this only resulted in an error because:
A. The $eventobj variable did not exist in that scope.
B. It couldn't exist because the xAPI request would normally relate to multiple events, not a single id.
C. Having the event id(s) would have been useless anyway, because the database row id of a deleted row is useless (the event gets a new id in the failed log table).
Now the error instead includes the size of the batch that was rejected.
Previously if a batch failed, all events in that batch were marked as failed, even if there was only one event in the batch that had an issue.
Now the plugin will recursively retry the events in smaller and smaller batches (half size each time) until it succeeds or gets down to a batch size of one. This means that events will not be rejected for just being in the same batch as a bad request. It also makes debugging of bad events easier because you'll end up with a bad request containing just one event's statement(s).
Note: I have not tested this with a large data set, only with a pair of events, one of which is bad.
Related Issues
PR Type