Fix for issue #47 If stalled job restarts, the already exported data is overwritten #64

Open
wants to merge 2 commits into base: 2

Conversation

SanderHamaka

Issue #47
If an export job stalls and restarts, the CSV writer overwrites the file, so the resulting export contains only the records written after the last (re)start and no header row.
By changing the mode to append instead of write, every record (even after a restart of the job) is appended on a new line. Because of that, setNewLine has to be reset to an empty string, otherwise there is an extra empty line between each record in the file.
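For illustration, here is a minimal sketch of the change, assuming the job writes through League\Csv\Writer; the path, rows, and surrounding code below are illustrative stand-ins, not the actual contents of GenerateCSVJob.php:

```php
<?php
// Minimal sketch of the fix described in this PR (illustrative, not the
// module's actual implementation).
require 'vendor/autoload.php';

use League\Csv\Writer;

$outputPath = '/tmp/export.csv';
$rows = [
    ['ID', 'Title'],        // header row, written only on the first round
    ['1', 'First record'],
    ['2', 'Second record'],
];

// Before: mode 'w' truncated the file on every (re)start of the job, so an
// export that stalled kept only the rows written after the last restart and
// lost the header row.
// $csvWriter = Writer::createFromPath($outputPath, 'w');

// After: mode 'a' appends, so rows written before a stall survive a restart.
$csvWriter = Writer::createFromPath($outputPath, 'a');

// Per the description above, the newline sequence also has to be reset to an
// empty string in append mode, otherwise an extra blank line appears between
// records.
$csvWriter->setNewline('');

foreach ($rows as $row) {
    $csvWriter->insertOne($row);
}
```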

… CSV writer overwrites the file, resulting in an export containing only the records from the last (re)start and no header row.

By changing the mode to append instead of write, every record is appended on a new line. Because of that, setNewLine has to be reset, otherwise there is an empty line between each record in the file.
src/Jobs/GenerateCSVJob.php (review comment, outdated and resolved)
@dhensby
Contributor

dhensby commented Mar 10, 2022

OK - thanks for this. These are changes that make me a little worried, and we can see they are now causing the tests to fail - I think due to the line ending changes.

The fact that this causes blank lines between values when we're in append mode is strange; that seems like a bug in the CSV writer, but it's hard to be sure.

I definitely see that exports that take a few rounds of output need a way to append to existing files, so this looks like an important fix. If we use append mode, could that mean a new job appends to an old job's file, or are the output names always unique?

@SanderHamaka
Author

> If we use append mode, could that mean a new job appends to an old job's file, or are the output names always unique?

I can only speak from my own experience, but looking at 30+ large exports that took multiple rounds to complete, the output names seem to be unique.

@GuySartorelli
Member

GuySartorelli commented Mar 10, 2022

> If we use append mode, could that mean a new job appends to an old job's file, or are the output names always unique?

The file name is a hash based on a random token generated when the job is constructed, so it should be unique for each new job instance.

```php
$this->ID = Injector::inst()->create(RandomGenerator::class)->randomToken('sha1');
```

```php
public function getSignature()
{
    return md5(get_class($this) . '-' . $this->ID);
}
```

Though I thought (from memory, so don't take this as definitive) that a new QueuedJob instance was created each time the job is run - so wouldn't this be unique between rounds of the same job as well?
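For illustration only (none of this is the module's code), here is a minimal sketch of what that naming scheme implies, assuming the ID really is generated once per job and reused across resumptions; the class and method below are hypothetical stand-ins, and only the md5(...) line mirrors the snippet above:

```php
<?php
// Hypothetical stand-in for the signature logic quoted above.
class FakeExportJob
{
    public $ID;

    public function __construct()
    {
        // Stand-in for RandomGenerator::randomToken('sha1'): one random
        // token per job instance, generated at construction time.
        $this->ID = bin2hex(random_bytes(20));
    }

    public function getSignature()
    {
        return md5(get_class($this) . '-' . $this->ID);
    }
}

$jobA = new FakeExportJob();
$jobB = new FakeExportJob();

// The same job instance keeps one signature, so every round of a resumed job
// targets the same output file when append mode is used.
var_dump($jobA->getSignature() === $jobA->getSignature()); // bool(true)

// A different job instance gets a different token and therefore a different
// signature, so a new job should not append to an old job's file.
var_dump($jobA->getSignature() === $jobB->getSignature()); // bool(false)
```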

@SanderHamaka
Author

> Though I thought (from memory, so don't take this as definitive) that a new QueuedJob instance was created each time the job is run - so wouldn't this be unique between rounds of the same job as well?

I can see in my output that between rounds the same file in the same directory is used. I have exports of 10,000+ records. Before this fix, the job stalled (and resumed) after processing 6,000 records, resulting in a single CSV file in that folder containing only the final 4,000 records. After this fix, the exports still get a unique folder and a unique file, but now I have the header row and all 10,000+ records. There is always just one CSV file in the new folder after the job finishes.
