-
Notifications
You must be signed in to change notification settings - Fork 86
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Adds resendfailedbatches
setting to allow failed statements to be retried.
#367
Changes from 7 commits
1d2dacc
50e817d
e5646a5
eb699fe
9ef0e93
9e107ed
1800b97
07576a8
8f8b933
708c2ac
47e60e2
4f899c3
5a37bf7
fc850b4
08e72bd
5cb6db3
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -28,10 +28,23 @@ function load_batch(array $config, array $transformedevents, callable $loader) { | |
$loadedevents = construct_loaded_events($transformedevents, true); | ||
return $loadedevents; | ||
} catch (\Exception $e) { | ||
$batchsize = count($transformedevents); | ||
$logerror = $config['log_error']; | ||
$logerror("Failed load for event id #" . $eventobj->id . ": " . $e->getMessage()); | ||
$logerror("Failed load batch (" . $batchsize . " events)" . $e->getMessage()); | ||
$logerror($e->getTraceAsString()); | ||
$loadedevents = construct_loaded_events($transformedevents, false); | ||
|
||
// Recursively retry sending statements in increasingly smaller batches so that only the actual bad data fails. | ||
if ($batchsize === 1) { | ||
$loadedevents = construct_loaded_events($transformedevents, false); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Personally I would return early here instead of assigning to There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yep, good point. |
||
} else { | ||
$newconfig = $config; | ||
$newconfig['lrs_max_batch_size'] = round($batchsize / 2); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Really like the way you solved this recursively with a modified copy of |
||
$batches = get_event_batches($newconfig, $transformedevents); | ||
$loadedevents = array_reduce($batches, function ($result, $batch) use ($newconfig, $loader) { | ||
$loadedbatchevents = load_batch($newconfig, $batch, $loader); | ||
return array_merge($result, $loadedbatchevents); | ||
}, []); | ||
} | ||
return $loadedevents; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this will cause some issues because this will delay the time the CRON takes to execute so the next CRON will try to process the same events so we'll duplicate statements. Hope that makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that Moodle cron has some built in stuff that will prevent that happening. See "running cron on multiple servers" here: https://docs.moodle.org/36/en/Cron
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah didn't know that, good spot.
Do you think it would be better to just validate the statements before we send them? Otherwise if the first statement in the batch is invalid we might send X requests where X is the max statements in a batch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Validating before we send them is good, but it'll be a lot of effort to ensure validation is as good and matches the validation done by the LRS.
X requests would not be sent in that scenario. Let's say it's a batch of 10 statements the first statement is bad.
Round 1: 1 batch of 10 statements fails
Round 2: 2 batches of 5 statements (1 fails, 1 succeeds)
Round 3: 2 batches of up to 3 statements (1 fails, 1 succeeds)
Round 4: 2 batches of up to 2 statements (1 fails, 1 succeeds)
Round 5: 2 batches of 1 statement (1 fails, 1 succeeds)
Total requests: 9. Ok, that's actually pretty close to 10, but if you have up to 12 events, it's still only 9 requests!
I'm open to suggestions of a better approach. The current approach of failing the whole batch is not great.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah matching the validation of the LRS is pretty much impossible given the slight differences between the LRSs and the little holes in the conformance tests. I'd be interested to know what other people think about this issue (@davidpesce and @caperneoignis). I don't think the current approach is too bad, agree that it's not ideal though, but I'm not sure what the ideal solution is either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made changes so the recursive retry functionality only triggers in the event of a 400 Bad Request response from the LRS. This ensures that in the event of slow responses we're not retrying and retrying so that cron gets blocked.
I do still think we should retry in the event of 404, 429, 503 and 504, but that can be handled as a separate issue, and probably means retrying next time cron runs rather than in the same cron task.
I like the approach of us tackling specific error codes or groups of error codes one by one, rather than having a catch all approach, so happy for this to start with just 400 errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry avoided checking in here to avoid getting in trouble with the better half. I'm quite happy to retry on any error code and perhaps only put events in the failed log if we get a 400-499 error code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also putting this behind a flag would be good I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a setting but haven't tested it just yet. I'll comment again once I have tested. (It's passing automated tests)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool thanks Andrew 👍