Fix for issue #47 If stalled job restarts, the already exported data is overwritten #64

Open
wants to merge 2 commits into base: 2

Conversation

SanderHamaka

Issue #47
If an export job stalls and restarts, the CSV writer overwrites the file, so the resulting export contains only the records written after the last (re)start and no header row.
By changing the mode to append instead of write, every record (even after a restart of the job) is appended on a new line. Because of that, setNewLine has to be reset to an empty string, otherwise there is an extra empty line between each record in the file.
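For illustration, here is a minimal sketch of the change, assuming the job writes through League\Csv\Writer; the path, rows, and surrounding code below are illustrative stand-ins, not the actual contents of GenerateCSVJob.php:

```php
<?php
// Minimal sketch of the fix described in this PR (illustrative, not the
// module's actual implementation).
require 'vendor/autoload.php';

use League\Csv\Writer;

$outputPath = '/tmp/export.csv';
$rows = [
    ['ID', 'Title'],        // header row, written only on the first round
    ['1', 'First record'],
    ['2', 'Second record'],
];

// Before: mode 'w' truncated the file on every (re)start of the job, so an
// export that stalled kept only the rows written after the last restart and
// lost the header row.
// $csvWriter = Writer::createFromPath($outputPath, 'w');

// After: mode 'a' appends, so rows written before a stall survive a restart.
$csvWriter = Writer::createFromPath($outputPath, 'a');

// Per the description above, the newline sequence also has to be reset to an
// empty string in append mode, otherwise an extra blank line appears between
// records.
$csvWriter->setNewline('');

foreach ($rows as $row) {
    $csvWriter->insertOne($row);
}
```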

… CSV writer overwrites the file, resulting in an export containing only the records from the last (re)start and no header row.

By changing the mode to append instead of write, every record is appended on a new line. Because of that, setNewLine has to be reset, otherwise there is an empty line between each record in the file.
src/Jobs/GenerateCSVJob.php (review comment, outdated and resolved)
@dhensby
Contributor

dhensby commented Mar 10, 2022

OK - thanks for this. These are changes that make me a little worried, and we can see they are now causing the tests to fail - I think due to the line ending changes.

The fact that this causes blank lines between values when we're in append mode is strange; that seems like a bug in the CSV writer, but it's hard to be sure.

I definitely see that exports that take a few rounds of output need a way to append to existing files, so this looks like an important fix. If we use append mode, could that mean a new job appends to an old job's file, or are the output names always unique?

@SanderHamaka
Author

> If we use append mode, could that mean a new job appends to an old job's file, or are the output names always unique?

I can only speak from my own experience, but looking at 30+ large exports that took multiple rounds to complete, the output names seem to be unique.

@GuySartorelli
Member

GuySartorelli commented Mar 10, 2022

> If we use append mode, could that mean a new job appends to an old job's file, or are the output names always unique?

The file name is a hash based on a random token generated when the job is constructed, so it should be unique for each new job instance.

```php
$this->ID = Injector::inst()->create(RandomGenerator::class)->randomToken('sha1');
```

```php
public function getSignature()
{
    return md5(get_class($this) . '-' . $this->ID);
}
```

Though I thought (from memory, so don't take this as definitive) that a new QueuedJob instance was created each time the job is run - so wouldn't this be unique between rounds of the same job as well?
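For illustration only (none of this is the module's code), here is a minimal sketch of what that naming scheme implies, assuming the ID really is generated once per job and reused across resumptions; the class and method below are hypothetical stand-ins, and only the md5(...) line mirrors the snippet above:

```php
<?php
// Hypothetical stand-in for the signature logic quoted above.
class FakeExportJob
{
    public $ID;

    public function __construct()
    {
        // Stand-in for RandomGenerator::randomToken('sha1'): one random
        // token per job instance, generated at construction time.
        $this->ID = bin2hex(random_bytes(20));
    }

    public function getSignature()
    {
        return md5(get_class($this) . '-' . $this->ID);
    }
}

$jobA = new FakeExportJob();
$jobB = new FakeExportJob();

// The same job instance keeps one signature, so every round of a resumed job
// targets the same output file when append mode is used.
var_dump($jobA->getSignature() === $jobA->getSignature()); // bool(true)

// A different job instance gets a different token and therefore a different
// signature, so a new job should not append to an old job's file.
var_dump($jobA->getSignature() === $jobB->getSignature()); // bool(false)
```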

@SanderHamaka
Author

> Though I thought (from memory, so don't take this as definitive) that a new QueuedJob instance was created each time the job is run - so wouldn't this be unique between rounds of the same job as well?

I can see in my output that between rounds the same file in the same directory is used. I have exports of 10,000+ records. Before this fix, the job stalled (and resumed) after processing 6,000 records, resulting in a single CSV file in that folder containing only the final 4,000 records. After this fix, the exports still get a unique folder and a unique file, but now I have the header row and all 10,000+ records. There is always just one CSV file in the new folder after the job finishes.
