-
Notifications
You must be signed in to change notification settings - Fork 362
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added Parallelism configuration option to lakectl #8283
Added Parallelism configuration option to lakectl #8283
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Concise fix :-)
Confused about the precedence of the 2 settings for parallelism, so asking for a change in case I am right about the required precedence.
cmd/lakectl/cmd/root.go
Outdated
defaultSyncParallelism = 25 | ||
defaultSyncPresign = true | ||
defaultNoProgress = false | ||
defaultParallelismConfig = -1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use 0? If not, we need to document what "-1" means.
pkg/local/sync.go
Outdated
parallelismToUse := s.cfg.Parallelism | ||
if parallelismToUse <= 0 { | ||
parallelismToUse = s.cfg.SyncFlags.Parallelism | ||
} | ||
for i := 0; i < parallelismToUse; i++ { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2 changes, the first is more important:
- AFAIU SyncFlags.Parallelism is more specific than the new setting, so if both are specified we should use it and not the new more global setting.
- It is enough to call the var "parallelism", it will obviously be used.
parallelismToUse := s.cfg.Parallelism | |
if parallelismToUse <= 0 { | |
parallelismToUse = s.cfg.SyncFlags.Parallelism | |
} | |
for i := 0; i < parallelismToUse; i++ { | |
parallelism := s.cfg.SyncFlags.Parallelism | |
if parallelism <= 0 { | |
parallelism = s.cfg.Parallelism | |
} | |
for i := 0; i < parallelism; i++ { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
I understand, problem is, the default for s.cfg.SyncFlags.Parallelism
is 25, so unless the user flagged with a 0 or less, the used value will be the default flag and not the configuration.
🎊 PR Preview 665c8e1 has been successfully built and deployed to https://treeverse-lakeFS-preview-pr-8283.surge.sh 🕐 Build time: 0.009s 🤖 By surge-preview |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand the flag redundancy here.
You can have both a default param value and override it with a config value if exists it's not mutually exclusive.
cmd/lakectl/cmd/root.go
Outdated
defaultParallelismConfig = 25 | ||
defaultSyncParallelism = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure why both are needed.
We should:
- Use configuration param if provided
- Use modified param value if passed as part of command
- Fallback to default param value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My thought was, that positive default value should come from one source only. In case both have positive default values how will we determine which one to use? I chose the config val to be positive by default, following Ariel's change request to go from the more general case to specific case.
In any case, this code run according to the priorities:
- no flag no config val - uses 25 as config default
- no flag with config - uses config
- flag with no config - uses the flag,
- flag with config - uses flag
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your priorities seem correct. We have a way to determine whether a flag in Cobra was explicitly provided by the user or not (there's at least one place in the code where we use it AFAIR).
You can use the same method to do that and determine whether we need to use the config value or the param value
665c8e1
to
56bf6e1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ItamarYuran thanks! I think we're in the right direction!
Added some more comments, if anything is not clear - let's sync to reduce the review cycle.
pkg/local/sync.go
Outdated
@@ -74,7 +74,11 @@ func (s *SyncManager) Sync(rootPath string, remote *uri.URI, changeSet <-chan *C | |||
defer s.progressBar.Stop() | |||
|
|||
wg, ctx := errgroup.WithContext(s.ctx) | |||
for i := 0; i < s.cfg.SyncFlags.Parallelism; i++ { | |||
parallelism := s.cfg.Parallelism |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is not needed anymore?
cmd/lakectl/cmd/root.go
Outdated
|
||
presignMode := getPresignMode(cmd, client) | ||
return local.SyncFlags{ | ||
SetParallelism: setParallelism, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't really need SetParallelism anymore.
Note that you have cfg in your context here, so basically you can make the decision (what value to use for the parallelism) here.
pkg/local/config.go
Outdated
@@ -9,6 +9,7 @@ const ( | |||
) | |||
|
|||
type SyncFlags struct { | |||
SetParallelism bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's not needed anymore, right?
cmd/lakectl/cmd/root.go
Outdated
@@ -574,6 +579,7 @@ func initConfig() { | |||
viper.SetDefault("server.retries.min_wait_interval", defaultMinRetryInterval) | |||
viper.SetDefault("experimental.local.posix_permissions.enabled", false) | |||
viper.SetDefault("local.skip_non_regular_files", false) | |||
viper.SetDefault("options.parallelism", defaultParallelism) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need a default for this value since the default is handled in param
@N-o-Z Thank you for your review! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ItamarYuran Perfect!
Please note the single comment remained
pkg/local/sync.go
Outdated
parallelism := s.cfg.SyncFlags.Parallelism | ||
|
||
for i := 0; i < parallelism; i++ { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please revert this change (redundant assignment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Closes #8159
Change Description
Added the option to configure the amount of parallelism for lakectl.
See feature request for more backround.
Currently, the configuration nests in a new section "Options"