Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added Parallelism configuration option to lakectl #8283

Conversation

ItamarYuran
Copy link
Contributor

Closes #8159

Change Description

Added the option to configure the amount of parallelism for lakectl.

See feature request for more backround.

Currently, the configuration nests in a new section "Options"

@ItamarYuran ItamarYuran added area/lakectl Issues related to lakeFS' command line interface (lakectl) include-changelog PR description should be included in next release changelog labels Oct 13, 2024
Copy link

E2E Test Results - Quickstart

11 passed

Copy link

E2E Test Results - DynamoDB Local - Local Block Adapter

13 passed

Copy link
Contributor

@arielshaqed arielshaqed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Concise fix :-)

Confused about the precedence of the 2 settings for parallelism, so asking for a change in case I am right about the required precedence.

defaultSyncParallelism = 25
defaultSyncPresign = true
defaultNoProgress = false
defaultParallelismConfig = -1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use 0? If not, we need to document what "-1" means.

Comment on lines 77 to 81
parallelismToUse := s.cfg.Parallelism
if parallelismToUse <= 0 {
parallelismToUse = s.cfg.SyncFlags.Parallelism
}
for i := 0; i < parallelismToUse; i++ {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 changes, the first is more important:

  • AFAIU SyncFlags.Parallelism is more specific than the new setting, so if both are specified we should use it and not the new more global setting.
  • It is enough to call the var "parallelism", it will obviously be used.
Suggested change
parallelismToUse := s.cfg.Parallelism
if parallelismToUse <= 0 {
parallelismToUse = s.cfg.SyncFlags.Parallelism
}
for i := 0; i < parallelismToUse; i++ {
parallelism := s.cfg.SyncFlags.Parallelism
if parallelism <= 0 {
parallelism = s.cfg.Parallelism
}
for i := 0; i < parallelism; i++ {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!
I understand, problem is, the default for s.cfg.SyncFlags.Parallelism is 25, so unless the user flagged with a 0 or less, the used value will be the default flag and not the configuration.

Copy link

github-actions bot commented Oct 14, 2024

🎊 PR Preview 665c8e1 has been successfully built and deployed to https://treeverse-lakeFS-preview-pr-8283.surge.sh

🕐 Build time: 0.009s

🤖 By surge-preview

Copy link
Member

@N-o-Z N-o-Z left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand the flag redundancy here.
You can have both a default param value and override it with a config value if exists it's not mutually exclusive.

Comment on lines 159 to 160
defaultParallelismConfig = 25
defaultSyncParallelism = 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why both are needed.
We should:

  1. Use configuration param if provided
  2. Use modified param value if passed as part of command
  3. Fallback to default param value

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thought was, that positive default value should come from one source only. In case both have positive default values how will we determine which one to use? I chose the config val to be positive by default, following Ariel's change request to go from the more general case to specific case.
In any case, this code run according to the priorities:

  • no flag no config val - uses 25 as config default
  • no flag with config - uses config
  • flag with no config - uses the flag,
  • flag with config - uses flag

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your priorities seem correct. We have a way to determine whether a flag in Cobra was explicitly provided by the user or not (there's at least one place in the code where we use it AFAIR).
You can use the same method to do that and determine whether we need to use the config value or the param value

@ItamarYuran ItamarYuran force-pushed the 8159-allow-for-the-lakectl-local-parallelism-flag-to-have-persistent-default-values-through-env-varsconfig branch from 665c8e1 to 56bf6e1 Compare October 15, 2024 14:21
Copy link
Member

@N-o-Z N-o-Z left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ItamarYuran thanks! I think we're in the right direction!
Added some more comments, if anything is not clear - let's sync to reduce the review cycle.

@@ -74,7 +74,11 @@ func (s *SyncManager) Sync(rootPath string, remote *uri.URI, changeSet <-chan *C
defer s.progressBar.Stop()

wg, ctx := errgroup.WithContext(s.ctx)
for i := 0; i < s.cfg.SyncFlags.Parallelism; i++ {
parallelism := s.cfg.Parallelism
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is not needed anymore?


presignMode := getPresignMode(cmd, client)
return local.SyncFlags{
SetParallelism: setParallelism,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't really need SetParallelism anymore.
Note that you have cfg in your context here, so basically you can make the decision (what value to use for the parallelism) here.

@@ -9,6 +9,7 @@ const (
)

type SyncFlags struct {
SetParallelism bool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not needed anymore, right?

@@ -574,6 +579,7 @@ func initConfig() {
viper.SetDefault("server.retries.min_wait_interval", defaultMinRetryInterval)
viper.SetDefault("experimental.local.posix_permissions.enabled", false)
viper.SetDefault("local.skip_non_regular_files", false)
viper.SetDefault("options.parallelism", defaultParallelism)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need a default for this value since the default is handled in param

@ItamarYuran
Copy link
Contributor Author

@N-o-Z Thank you for your review!
Got rid of all the redundancies. LMK what do you think.

@ItamarYuran ItamarYuran requested a review from N-o-Z October 22, 2024 09:17
Copy link
Member

@N-o-Z N-o-Z left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ItamarYuran Perfect!
Please note the single comment remained

Comment on lines 76 to 78
parallelism := s.cfg.SyncFlags.Parallelism

for i := 0; i < parallelism; i++ {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please revert this change (redundant assignment)

Copy link
Contributor

@arielshaqed arielshaqed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes, and thanks for continuing the review @N-o-Z in the best way! @N-o-Z 's comment remains, of course.

@ItamarYuran ItamarYuran merged commit 2f4a142 into master Oct 23, 2024
37 checks passed
@ItamarYuran ItamarYuran deleted the 8159-allow-for-the-lakectl-local-parallelism-flag-to-have-persistent-default-values-through-env-varsconfig branch October 23, 2024 09:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/lakectl Issues related to lakeFS' command line interface (lakectl) include-changelog PR description should be included in next release changelog
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow for the lakectl local --parallelism flag to have persistent default values through env vars/config
3 participants