-
Notifications
You must be signed in to change notification settings - Fork 365
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Generate tasks from diffs for exporting #803
Conversation
a4c1e0d
to
61caf0d
Compare
export/tasks_generator.go
Outdated
"fmt" | ||
"strings" | ||
|
||
rinqueue "github.com/erikdubbelboer/ringqueue" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
interesting - why rinqueue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Auto-happened because of something Go-ish. There is a bug in ringqueue
that the package is actually named rinqueue
. Anything else causes fun with gofmt
while saving etc.
export/tasks_generator.go
Outdated
|
||
const successFilename = "_lakefs_success" | ||
|
||
const CopyActipn = "export:copy" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- s/Actipn/Action/
- can scope them in one const - usually, it just telling me they are all part of the same gang
- optional add a type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1+2 are great. Not doing 3, because Action
is actually a string
on TaskData
.
export/tasks_generator.go
Outdated
return *ret, true | ||
} | ||
|
||
func makeDirMatchCache(pred func(path string) bool) *dirMatchCache { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
func makeDirMatchCache(pred func(path string) bool) *dirMatchCache { | |
func newDirMatchCache(pred func(path string) bool) *dirMatchCache { |
any reason you prefer 'make' and not the usual 'new' prefix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Value factories are called make
in C++. Wait, you mean we're not using C++?!
(Done)
export/tasks_generator.go
Outdated
dir = dirname(dir) | ||
var ok bool | ||
if ret, ok = dmc.upMatchCache[dir]; ok { | ||
break | ||
} | ||
if dmc.pred(dir) { | ||
copy := dir | ||
ret = © | ||
break | ||
} | ||
if dir == "" { | ||
break | ||
} | ||
} | ||
for dir = dirname(filename); dir != "" && (ret == nil || dir != *ret); dir = dirname(dir) { | ||
dmc.upMatchCache[dir] = ret | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does pred
can return true for empty dir?
if empty dir got no meaning and we do not cache the result for empty dir, I think we can move the dir == ""
check to be after dir = dirname(dir)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point: it certainly does make sense to return true for the empty directory.
Exporting the object and writing a test for it, so I can fix its behaviour (in the sense of not changing, and also in the sense of making it behave correctly).
export/tasks_generator.go
Outdated
return &dirMatchCache{pred: pred, upMatchCache: make(map[string]*string)} | ||
} | ||
|
||
// generateTasksFromDiffs converts diffs into many tasks that depend on startTaskID, with a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/generateTasksFromDiffs/GenerateTasksFromDiffs/
var ErrBadTypeConversion = errors.New("bad type") | ||
|
||
// nolint: stylecheck | ||
func (dst *TaskStatusCodeValue) Scan(src interface{}) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Usually, all the receivers have the same name (and optionally - one letter)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know. But the sql
package does it this way
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
more than one letter - but I don't think for the same type have a different name for the receiver (src, dst)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, it's pgtype that does this for Scan
and Value
. Will keep this code as it does follow the gold standard.
return fmt.Errorf("cannot convert %T to TaskStatusCodeValue: %w", src, ErrBadTypeConversion) | ||
} | ||
|
||
if !(sc == TaskPending || sc == TaskInProgress || sc == TaskAborted || sc == TaskCompleted) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can't commit on previous lines but check the code-doc of these of does not have the matched constant name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I understand what I need to do here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some of the tasks here got wrong code-doc comment - ex: TaskAborted is ABORTED
// TaskPending indicates a task is waiting for an actor to perform it (new or being
// retried)
TaskPending TaskStatusCodeValue = "pending"
// IN_PROGRESS indicates a task is being performed by an actor.
TaskInProgress TaskStatusCodeValue = "in-progress"
// ABORTED indicates an actor has aborted this task with message, will not be reissued
TaskAborted TaskStatusCodeValue = "aborted"
// TaskCompleted indicates an actor has completed this task with message, will not reissued
TaskCompleted TaskStatusCodeValue = "completed"
// TaskInvalid is used by the API to report errors
TaskInvalid TaskStatusCodeValue = "[invalid]"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, thanks! I went from C++ style to Go style and missed the comments. It's part of parade, so on #812.
parade/parade_test.go
Outdated
t.Helper() | ||
rows, err := db.Query(`SELECT id FROM tasks WHERE id LIKE format('%s%%', $1::text)`, prefix) | ||
if err != nil { | ||
t.Errorf("[I] select remaining IDs for prefix %s: %s", prefix, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fatalf
for rows.Next() { | ||
var id parade.TaskID | ||
if err = rows.Scan(&id); err != nil { | ||
t.Errorf("[I] scan ID value: %s", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fatalf - not sure you can continue to scan after that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not? It's just a broken row (e.g. parse failure). rows.Next
is welcome to fail if the database is 🔥 on fire.
So you're actually right, the code is wrong, and I added a test that rows were successfully scanned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right that calling Next after Scan that failed will probably exit. The code reader that sees Errorf will think that this error is something we can live with and continue - in this case adding the id to the slice, which is not what we wanted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not done, because of a difference in how we two think about test failures.
This is how Moketa the code monkey once tried to teach me how to test. Basically, stop the test only when continuing the test will fail on testing code. Otherwise continue and print more error messages to get more data about the failure. In this case I don't mind pushing a dud id onto the slice: it won't be the first error, but the remaining error messages might help me understand what went wrong. E.g. otherwise I don't see any of the other, possibly successfully-parsed, IDs.
parade/parade_test.go
Outdated
if err := dst.Scan(c.in); err != nil { | ||
if !errors.Is(err, c.err) { | ||
t.Errorf("got err %v, expected %v", err, c.err) | ||
} | ||
} else { | ||
if dst != c.out { | ||
t.Errorf("expected %s, got %s", c.out, dst) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if err := dst.Scan(c.in); err != nil { | |
if !errors.Is(err, c.err) { | |
t.Errorf("got err %v, expected %v", err, c.err) | |
} | |
} else { | |
if dst != c.out { | |
t.Errorf("expected %s, got %s", c.out, dst) | |
} | |
} | |
err := dst.Scan(c.in) | |
if !errors.Is(err, c.err) { | |
t.Fatalf("got err %v, expected %v", err, c.err) | |
} | |
if err == nil && dst != c.out { | |
t.Fatalf("expected %s, got %s", c.out, dst) | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Taking it but keeping it an error rather than a fatal. It gives me less indentation (which is good for my 80-column brain) and it tests the case where an error was expected but not received. Thanks!
Codecov Report
@@ Coverage Diff @@
## master #803 +/- ##
==========================================
+ Coverage 41.87% 42.57% +0.69%
==========================================
Files 133 134 +1
Lines 10435 10583 +148
==========================================
+ Hits 4370 4506 +136
- Misses 5483 5491 +8
- Partials 582 586 +4
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @nopcoder ! PTAL...
export/tasks_generator.go
Outdated
"fmt" | ||
"strings" | ||
|
||
rinqueue "github.com/erikdubbelboer/ringqueue" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Auto-happened because of something Go-ish. There is a bug in ringqueue
that the package is actually named rinqueue
. Anything else causes fun with gofmt
while saving etc.
export/tasks_generator.go
Outdated
|
||
const successFilename = "_lakefs_success" | ||
|
||
const CopyActipn = "export:copy" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1+2 are great. Not doing 3, because Action
is actually a string
on TaskData
.
export/tasks_generator.go
Outdated
dir = dirname(dir) | ||
var ok bool | ||
if ret, ok = dmc.upMatchCache[dir]; ok { | ||
break | ||
} | ||
if dmc.pred(dir) { | ||
copy := dir | ||
ret = © | ||
break | ||
} | ||
if dir == "" { | ||
break | ||
} | ||
} | ||
for dir = dirname(filename); dir != "" && (ret == nil || dir != *ret); dir = dirname(dir) { | ||
dmc.upMatchCache[dir] = ret | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point: it certainly does make sense to return true for the empty directory.
Exporting the object and writing a test for it, so I can fix its behaviour (in the sense of not changing, and also in the sense of making it behave correctly).
export/tasks_generator.go
Outdated
return *ret, true | ||
} | ||
|
||
func makeDirMatchCache(pred func(path string) bool) *dirMatchCache { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Value factories are called make
in C++. Wait, you mean we're not using C++?!
(Done)
export/tasks_generator.go
Outdated
// "generate success" task after generating all files in each directory that matches | ||
// generateSuccessFor. | ||
func GenerateTasksFromDiffs(exportID string, dstPrefix string, diffs catalog.Differences, generateSuccessFor func(path string) bool) ([]parade.TaskData, error) { | ||
const initialSize = 100000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure it matters, so reduced 100x.
return nil, fmt.Errorf("%s: failed to serialize %+v: %w", successPath, data, err) | ||
} | ||
bodyStr := string(body) | ||
totalDependencies := td // copy to get new address each time |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh? "I just want an lvalue reference...".
var ErrBadTypeConversion = errors.New("bad type") | ||
|
||
// nolint: stylecheck | ||
func (dst *TaskStatusCodeValue) Scan(src interface{}) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know. But the sql
package does it this way
return fmt.Errorf("cannot convert %T to TaskStatusCodeValue: %w", src, ErrBadTypeConversion) | ||
} | ||
|
||
if !(sc == TaskPending || sc == TaskInProgress || sc == TaskAborted || sc == TaskCompleted) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I understand what I need to do here.
for rows.Next() { | ||
var id parade.TaskID | ||
if err = rows.Scan(&id); err != nil { | ||
t.Errorf("[I] scan ID value: %s", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not? It's just a broken row (e.g. parse failure). rows.Next
is welcome to fail if the database is 🔥 on fire.
So you're actually right, the code is wrong, and I added a test that rows were successfully scanned.
parade/parade_test.go
Outdated
if err := dst.Scan(c.in); err != nil { | ||
if !errors.Is(err, c.err) { | ||
t.Errorf("got err %v, expected %v", err, c.err) | ||
} | ||
} else { | ||
if dst != c.out { | ||
t.Errorf("expected %s, got %s", c.out, dst) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Taking it but keeping it an error rather than a fatal. It gives me less indentation (which is good for my 80-column brain) and it tests the case where an error was expected but not received. Thanks!
f3046b6
to
0cc7e51
Compare
Correctly handle enum values and report errors nicely. This will give better behaviour when adding new enum values, and also makes code type-safer which is nice.
Only useful for tests, but still necessary for them.
These should *not* decrement total_dependencies, they've already been counted on num_signals.
This will be _partially_ filled in when computing differences, according to some requested set of fields. This will allow various applications of differences.
It's actually manually generated, and we missed it again on master.
Code was correct, but efficiency might not matter here.
Path is now Entry.Path.
This is a fairly major change to `DirMatchCache`, so explicitly test it (and capitalize `dirMatchCache` to export it...).
Turns out this is Go not C++, and constructors are not named like C++ value factories.
Remove dependency on `ringqueue`, which doesn't even name its package correctly. This might take a bit more memory, and/or be more CPU cache friendly. So it probably does not matter.
0cc7e51
to
d7eb030
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Part of #534.
Reviewers: