feat: Adds resendfailedbatches setting to allow failed statements to be retried. #367

Merged
16 commits merged on Jan 8, 2019
Changes from 7 commits
classes/task/emit_task.php (16 additions & 1 deletion)
@@ -41,6 +41,16 @@ private function get_failed_events($events) {
return $failedevents;
}

private function get_successful_events($events) {
$loadedevents = array_filter($events, function ($loadedevent) {
return $loadedevent['loaded'] === true;
});
$successfulevents = array_map(function ($loadedevent) {
return $loadedevent['event'];
}, $loadedevents);
return $successfulevents;
}

private function get_event_ids($loadedevents) {
return array_map(function ($loadedevent) {
return $loadedevent['event']->id;
@@ -61,13 +71,17 @@ private function delete_processed_events($events) {
global $DB;
$eventids = $this->get_event_ids($events);
$DB->delete_records_list('logstore_xapi_log', 'id', $eventids);
mtrace("Events (".implode(', ', $eventids).") have been successfully sent to LRS.");
}

private function store_failed_events($events) {
global $DB;
$failedevents = $this->get_failed_events($events);
$DB->insert_records('logstore_xapi_failed_log', $failedevents);
mtrace(count($failedevents) . " event(s) have failed to send to the LRS.");
}

private function record_successful_events($events) {
mtrace(count($this->get_successful_events($events)) . " event(s) have been successfully sent to the LRS.");
}

/**
@@ -81,6 +95,7 @@ public function execute() {
$extractedevents = $this->extract_events($store->get_max_batch_size());
$loadedevents = $store->process_events($extractedevents);
$this->store_failed_events($loadedevents);
$this->record_successful_events($loadedevents);
$this->delete_processed_events($loadedevents);
}
}
src/loader/utils/load_batch.php (15 additions & 2 deletions)
@@ -28,10 +28,23 @@ function load_batch(array $config, array $transformedevents, callable $loader) {
$loadedevents = construct_loaded_events($transformedevents, true);
return $loadedevents;
} catch (\Exception $e) {
$batchsize = count($transformedevents);
$logerror = $config['log_error'];
$logerror("Failed load for event id #" . $eventobj->id . ": " . $e->getMessage());
$logerror("Failed load batch (" . $batchsize . " events)" . $e->getMessage());
$logerror($e->getTraceAsString());
$loadedevents = construct_loaded_events($transformedevents, false);

// Recursively retry sending statements in increasingly smaller batches so that only the actual bad data fails.
Member
I think this will cause some issues: it will lengthen the time the cron task takes to execute, so the next cron run will try to process the same events and we'll duplicate statements. Hope that makes sense.

Contributor Author
I believe that Moodle cron has built-in locking that will prevent that from happening. See "Running cron on multiple servers" here: https://docs.moodle.org/36/en/Cron

Member
Ah didn't know that, good spot.

Tasks can run in parallel and processes use locking to prevent tasks from running at the same time which allows cron to be triggered from multiple web servers that serve the same Moodle instance.

Do you think it would be better to just validate the statements before we send them? Otherwise, if the first statement in the batch is invalid, we might send X requests, where X is the maximum number of statements in a batch.

Contributor Author (@garemoko, Dec 21, 2018)
Validating before we send them would be good, but it would take a lot of effort to make our validation as thorough as, and consistent with, the validation done by the LRS.

X requests would not be sent in that scenario. Let's say it's a batch of 10 statements and the first statement is bad.

Round 1: 1 batch of 10 statements fails
Round 2: 2 batches of 5 statements (1 fails, 1 succeeds)
Round 3: 2 batches of up to 3 statements (1 fails, 1 succeeds)
Round 4: 2 batches of up to 2 statements (1 fails, 1 succeeds)
Round 5: 2 batches of 1 statement (1 fails, 1 succeeds)

Total requests: 9. Ok, that's actually pretty close to 10, but if you have up to 12 events, it's still only 9 requests!

I'm open to suggestions of a better approach. The current approach of failing the whole batch is not great.
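To make that request count concrete, here is a small standalone simulation of the halving retry (a sketch, not plugin code). It assumes a batch fails whenever it contains the single bad statement, and that sub-batches are built by chunking at the halved lrs_max_batch_size, mirroring the diff below.

<?php
// Sketch only: simulate how many LRS requests the halving retry makes
// when exactly one statement in the batch is bad.
function count_requests(array $batch, int $badid): int {
    $requests = 1; // One POST to the LRS for this batch.
    if (!in_array($badid, $batch, true)) {
        return $requests; // The batch succeeded; no retry needed.
    }
    if (count($batch) === 1) {
        return $requests; // A single bad statement: give up and log it as failed.
    }
    // Mirror $newconfig['lrs_max_batch_size'] = round($batchsize / 2).
    $chunks = array_chunk($batch, (int) round(count($batch) / 2));
    foreach ($chunks as $chunk) {
        $requests += count_requests($chunk, $badid);
    }
    return $requests;
}

echo count_requests(range(1, 10), 1) . PHP_EOL; // 9 requests for a batch of 10.
echo count_requests(range(1, 12), 1) . PHP_EOL; // 9 requests for a batch of 12.

Run with php, this prints 9 for both a batch of 10 and a batch of 12, matching the rounds listed above.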

Member
Yeah, matching the validation of the LRS is pretty much impossible given the slight differences between LRSs and the small holes in the conformance tests. I'd be interested to know what other people think about this issue (@davidpesce and @caperneoignis). I don't think the current approach is too bad; I agree it's not ideal, but I'm not sure what the ideal solution is either.

Contributor Author
I made changes so the recursive retry functionality only triggers in the event of a 400 Bad Request response from the LRS. This ensures that when responses are merely slow we don't keep retrying and blocking cron.

I do still think we should retry in the event of 404, 429, 503 and 504, but that can be handled as a separate issue, and probably means retrying next time cron runs rather than in the same cron task.

I like the approach of us tackling specific error codes or groups of error codes one by one, rather than having a catch all approach, so happy for this to start with just 400 errors.
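As a rough sketch of that guard, assuming the LRS HTTP status is exposed on the exception via getCode() (the real plugin may surface it differently):

<?php
// Sketch only: should a failed batch be split and retried?
function should_split_and_retry(\Exception $e, int $batchsize): bool {
    // Only a 400 Bad Request suggests bad data worth isolating by splitting,
    // and a batch of one statement cannot be split any further.
    return (int) $e->getCode() === 400 && $batchsize > 1;
}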

Member
Sorry, I avoided checking in here to avoid getting into trouble with the better half. I'm quite happy to retry on any error code and perhaps only put events in the failed log if we get a 400-499 error code.
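A possible shape for that policy, sketched under the assumption that the HTTP status is available when a batch fails (not what this PR currently implements):

<?php
// Sketch only: classify a failed batch by HTTP status class.
function is_permanent_failure(int $status): bool {
    // 400-499 responses would go to logstore_xapi_failed_log;
    // anything else would stay in logstore_xapi_log for a later retry.
    return $status >= 400 && $status < 500;
}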

Member
Also putting this behind a flag would be good I think.

Contributor Author
I added a setting but haven't tested it just yet. I'll comment again once I have tested. (It's passing automated tests)
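For reference, the setting could look something like this in settings.php; the setting name comes from the PR title, while the language string keys and default value here are assumptions:

<?php
// Sketch only: a checkbox setting along the lines described in this PR.
defined('MOODLE_INTERNAL') || die();

if ($hassiteconfig) {
    $settings->add(new admin_setting_configcheckbox(
        'logstore_xapi/resendfailedbatches',
        get_string('resendfailedbatches', 'logstore_xapi'),
        get_string('resendfailedbatches_desc', 'logstore_xapi'),
        0 // Off by default so existing behaviour is unchanged.
    ));
}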

Member
Cool thanks Andrew 👍

if ($batchsize === 1) {
$loadedevents = construct_loaded_events($transformedevents, false);
Member
Personally I would return early here instead of assigning to $loadedevents to be returned later; it's quicker to read because you don't need to scroll down looking for later uses or modifications of $loadedevents (see the sketch after this diff). Happy to merge as is, just sharing that because you might agree and find it useful in future.

Contributor Author
Yep, good point.

} else {
$newconfig = $config;
$newconfig['lrs_max_batch_size'] = round($batchsize / 2);
Member
Really like the way you solved this recursively with a modified copy of $config.

$batches = get_event_batches($newconfig, $transformedevents);
$loadedevents = array_reduce($batches, function ($result, $batch) use ($newconfig, $loader) {
$loadedbatchevents = load_batch($newconfig, $batch, $loader);
return array_merge($result, $loadedbatchevents);
}, []);
}
return $loadedevents;
}
}
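Following the early-return suggestion in the thread above, the catch branch could be shaped like this (a sketch with the same behaviour, using the helpers already in this file):

// Inside the catch block, after logging the error (sketch only):
if ($batchsize === 1) {
    return construct_loaded_events($transformedevents, false);
}

$newconfig = $config;
$newconfig['lrs_max_batch_size'] = round($batchsize / 2);
$batches = get_event_batches($newconfig, $transformedevents);
return array_reduce($batches, function ($result, $batch) use ($newconfig, $loader) {
    return array_merge($result, load_batch($newconfig, $batch, $loader));
}, []);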