Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

1225 config collector elemental #32

Merged
merged 16 commits into from
Jun 7, 2023
Merged

Conversation

jimmykarily
Copy link
Contributor

@jimmykarily jimmykarily commented May 25, 2023

Fixes kairos-io/kairos#1225

and probably kairos-io/kairos#209 too

There are basically 3 fixes here:

  • Merging the user's cloud config before we run the before-install and kairos-install.pre.* hooks
  • Creating all partitions defined in the cloud config, not just the first
  • Expanding the last partition after we have created all the defined partitions, so that we expand the partition we defined.

@jimmykarily jimmykarily self-assigned this May 25, 2023
@@ -106,7 +95,14 @@ func ManualInstall(c string, options map[string]string, strictValidations, debug
}
}

return RunInstall(debug, options)
// Load the installation Config from the system
installConfig, err := elementalConfig.ReadConfigRun("/etc/elemental")
Copy link
Contributor Author

@jimmykarily jimmykarily May 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DRY this (and put the path to a constant)

Copy link
Member

@Itxaka Itxaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was expecting this to be different.

i.e. the ReadConfigRun to call the collector so we merge options from elemental config <- cloud-config so we can move the options out of elemental, so they are configured during install/upgrade/reset directly? Otherwise this still reads most of stuff from elemental config which is on RO and not easily modificable and we need to get exposing things over to the agent.

Ideally IMO, one function should read both files and merge them, and we can deprecate the elemental config directly and have everything by default unless set on the cloud-config.

thoughts?

@jimmykarily
Copy link
Contributor Author

I was expecting this to be different.

i.e. the ReadConfigRun to call the collector so we merge options from elemental config <- cloud-config so we can move the options out of elemental, so they are configured during install/upgrade/reset directly? Otherwise this still reads most of stuff from elemental config which is on RO and not easily modificable and we need to get exposing things over to the agent.

Ideally IMO, one function should read both files and merge them, and we can deprecate the elemental config directly and have everything by default unless set on the cloud-config.

thoughts?

I was trying to understand the various different configs and this is what I ended up with:

  • pkg/config/config.go is the struct that holds only the relevant parts from the yaml collected with the collector. We have similar structs in the other consumers of the collector. This is the struct to which we unmarshal the yaml created by the collector. This is where user's config ends up being unmarshalled to.
  • pkg/types/v1/config.go seems to be a relic wrapper from when we used to call out to elemental (wrapping the command line options?). We should get rid of this by moving the various options where they make more sense (other configs)
  • RunConfig : yet another "config" which also embeds the one above. This can be gone too?
  • InstallSpec : doesn't have a "config" in its name, but it's still another "config".

I tried to clean my head from what I've seen and think about what kind of configuration we actually need. I see 2 different types:

  • The cloud-config including the user provided one
  • The cli configuration which is simply [optionally] replacing command line arguments that can be passed to the binary.

The user should be able to change both (no hardcoded locations, but maybe default config for the cli could exist).

If this is a good plan ^ I would need some help consolidating and cleaning up because I'm not sure what all those options are used for.

Still, we can still move on with this PR as a fix for custom partitioning and refactor on another PR.

@Itxaka
Copy link
Member

Itxaka commented May 26, 2023

I was expecting this to be different.
i.e. the ReadConfigRun to call the collector so we merge options from elemental config <- cloud-config so we can move the options out of elemental, so they are configured during install/upgrade/reset directly? Otherwise this still reads most of stuff from elemental config which is on RO and not easily modificable and we need to get exposing things over to the agent.
Ideally IMO, one function should read both files and merge them, and we can deprecate the elemental config directly and have everything by default unless set on the cloud-config.
thoughts?

I was trying to understand the various different configs and this is what I ended up with:

* [pkg/config/config.go](https://github.com/kairos-io/kairos-agent/blob/5fe8eb8cc2617e96e609da282d1079a95ba06e0c/pkg/config/config.go#L38) is the struct that holds only the relevant parts from the yaml collected with the collector. We have similar structs in the other consumers of the collector. This is the struct to which we unmarshal the yaml created by the collector. This is where user's config ends up being unmarshalled to.

This was the config from the kairos-agent only, and it used parts of that to build the args to pass to elemental. This was loaded from the cloud-config/config or interactive install inputs

* [pkg/types/v1/config.go](https://github.com/kairos-io/kairos-agent/blob/5fe8eb8cc2617e96e609da282d1079a95ba06e0c/pkg/types/v1/config.go#L42) seems to be a relic wrapper from when we used to call out to `elemental` (wrapping the command line options?). We should get rid of this by moving the various options where they make more sense (other configs)

This is the main config of elemental. This is what is used around in a lot of different places and exposes interfaces so they can be easily substituted by other structs that can fulfill that interface, see for example the Mounter interface and how we have a mocked one so we can tests the whole thing without resorting to e2e tests.

This would be impossible to remove as its being used almost everywhere for everything. You need to log? cfg.Logger, mount stuff? cfg.Mounter.

Not only that but allows for future proofing without too much hassle. Want to substitute the current mounter? Just write a new one or adapt an existing lib to fullfill the interface. Done.

* `RunConfig` : yet another "config" which also embeds the one above. This can be gone too?

Runconfig was the config that affected run stuff (install,reset,upgrade). It embeds the config on it and it was especialized for that run config with values that were only used for it. There was a BuildConfig that had different values that was dedicated to building stuff as as separation of concerns. IMO this one could go away easily as we only have one config there for install/upgrade/reset and the build stuff is somewhere else

* `InstallSpec` : doesn't have a "config" in its name, but it's still another "config".

InstallSpec/UpgradeSpec/ResetSpec is the final spec to be used for action X. Its the results of reading the config and sanitizing it. Its the guide to the action and the values that will be used. Its a byproduct of the config + sanitizing it + adding defaults + runtime stuff (like platform arch).

I tried to clean my head from what I've seen and think about what kind of configuration we actually need. I see 2 different types:

* The cloud-config including the user provided one

* The cli configuration which is simply [optionally] replacing command line arguments that can be passed to the binary.

The user should be able to change both (no hardcoded locations, but maybe default config for the cli could exist).

If this is a good plan ^ I would need some help consolidating and cleaning up because I'm not sure what all those options are used for.

Still, we can still move on with this PR as a fix for custom partitioning and refactor on another PR.

IMO ideally we should.

  • Drop the RunConfig and use a expanded Config.
  • Move the loading of the values from /etc/elemental/config.yaml to the cloud-config loader. Deprecate loading from /etc/elemental/config.yaml
  • Load from the collector the values and dump them into a Config object (expanded)
  • Use that Config everywhere. Rely on it for all the stuff used in the lifecycle of the agent (so log with cfg.Logger for example)
  • On Config.Sanitize() or init or whatever, add the default values to empty values.

So basically, your cloud config would for example look like a merge of the elemental config and our current cloud-config:

install:
  auto: true
  reboot: true
  device: /dev/vda
  grub-entry-name: "Kairos"
  system:
    size: 3000
  grub_options:
    extra_cmdline: "rd.immucore.debug foobarzz"

stages:
  initramfs:
    - name: "Set user and password"
      users:
        kairos:
          passwd: "kairos"
      hostname: kairos-{{ trunc 4 .Random }}

And that would generate a Config object that had Config.Auto: true, Config.Reboot: true, etc...

l.Error(out)
return err
}
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NOTE: Let's try an installation without custom partitions with this fix to see if this breaks something. @Itxaka has a hunch...

@jimmykarily
Copy link
Contributor Author

jimmykarily commented May 29, 2023

I run a kairos-agent manual-install with no custom partitions, using a custom build of kairos-agent from this branch and I got this error:

INFO[2023-05-29T15:12:06Z] Copying /run/cos/state/cOS/active.img source... 
INFO[2023-05-29T15:12:09Z] Finished copying /run/cos/state/cOS/active.img into /run/cos/state/cOS/passive.img 
INFO[2023-05-29T15:12:10Z] Running after-install hook                   
ERRO[2023-05-29T15:12:10Z] mount: /tmp/mnt/STATE: /dev/vda4 already mounted on /run/cos/state.
 : failed to run STATEDIR=/tmp/mnt/STATE
STATE=$(blkid -L COS_STATE || true)
mkdir -p $STATEDIR || true
mount ${STATE} $STATEDIR
: exit status 32 
ERRO[2023-05-29T15:12:10Z] 1 error occurred:
	* failed to run STATEDIR=/tmp/mnt/STATE
STATE=$(blkid -L COS_STATE || true)
mkdir -p $STATEDIR || true
mount ${STATE} $STATEDIR
: exit status 32
 
ERRO[2023-05-29T15:12:10Z] umount: /tmp/mnt/STATE: not mounted.
 : failed to run umount /tmp/mnt/STATE
: exit status 32 
ERRO[2023-05-29T15:12:10Z] 1 error occurred:
	* failed to run umount /tmp/mnt/STATE
: exit status 32
 
INFO[2023-05-29T15:12:10Z] Unmounting disk partitions                   
INFO[2023-05-29T15:12:10Z] kairos-agent version v2.1.0-rc4-3-g1a2174d-dirty 
INFO[2023-05-29T15:12:10Z] Running stage: kairos-install.after.before   
INFO[2023-05-29T15:12:10Z] Done executing stage 'kairos-install.after.before' 
INFO[2023-05-29T15:12:10Z] Running stage: kairos-install.after          
INFO[2023-05-29T15:12:10Z] Done executing stage 'kairos-install.after'  
INFO[2023-05-29T15:12:10Z] Running stage: kairos-install.after.after    
INFO[2023-05-29T15:12:10Z] Done executing stage 'kairos-install.after.after' 
INFO[2023-05-29T15:12:10Z] Running stage: kairos-install.after.before   
INFO[2023-05-29T15:12:10Z] Done executing stage 'kairos-install.after.before' 
INFO[2023-05-29T15:12:10Z] Running stage: kairos-install.after          
INFO[2023-05-29T15:12:10Z] Done executing stage 'kairos-install.after'  
INFO[2023-05-29T15:12:10Z] Running stage: kairos-install.after.after    
INFO[2023-05-29T15:12:10Z] Done executing stage 'kairos-install.after.after' 
[   89.083940][ T1778] EXT4-fs (vda5): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
[   89.090567][ T1783] EXT4-fs (vda2): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.

@Itxaka is this relevant to what you were afraid would happen? Does it ring any bells?

@Itxaka
Copy link
Member

Itxaka commented May 29, 2023

Not that I can remember. That looks like the boot assessment after-install run

@jimmykarily
Copy link
Contributor Author

hmm I tried again with the full config and I get the same error when I use custom partitions:

ERRO[2023-05-29T15:21:31Z] umount: /tmp/mnt/STATE: not mounted.
 : failed to run umount /tmp/mnt/STATE
: exit status 32 
ERRO[2023-05-29T15:21:31Z] 1 error occurred:
	* failed to run umount /tmp/mnt/STATE
: exit status 32
 
INFO[2023-05-29T15:21:31Z] Unmounting disk partitions   

I don't know how this error popped up. It wasn't there on Friday, I'm pretty sure.

@jimmykarily
Copy link
Contributor Author

jimmykarily commented May 29, 2023

Ignore it all, I thought any "recent" version of kairos core would be ok and used kairos-opensuse-leap-v2.0.2-k3sv1.23.17+k3s1.iso. Turns out it works fine on a build from latest master, both with and without custom partitions. I don't get any errors (probably last week I was testing on a recent build).

…hooks

Signed-off-by: Dimitris Karakasilis <dimitris@karakasilis.me>
Signed-off-by: Dimitris Karakasilis <dimitris@karakasilis.me>
to let the user set this in the cloud-config file instead of
/etc/elemental/config.yaml

See also here:
kairos-io/kairos#209 (comment)

Signed-off-by: Dimitris Karakasilis <dimitris@karakasilis.me>
- that "return" made it stop after creating the first partition
- the expand was happening before we created the partitions thus,
  our partitions were never expanded

Signed-off-by: Dimitris Karakasilis <dimitris@karakasilis.me>
Signed-off-by: Dimitris Karakasilis <dimitris@karakasilis.me>
@jimmykarily jimmykarily force-pushed the 1225-config-collector-elemental branch from a53d8a0 to 83adcb7 Compare May 31, 2023 12:52
Signed-off-by: Itxaka <itxaka.garcia@spectrocloud.com>
@Itxaka Itxaka force-pushed the 1225-config-collector-elemental branch from d528ff4 to 74bee8f Compare June 1, 2023 13:06
Itxaka and others added 2 commits June 1, 2023 16:16
Signed-off-by: Itxaka <itxaka.garcia@spectrocloud.com>
which should test that this line was removed:

https://github.com/kairos-io/kairos-agent/pull/32/files#diff-9bf4494bcc865684d4aeae7837925b1732855e863067c12a61f6f38e2fc16716L114

Signed-off-by: Itxaka Serrano Garcia <itxaka.garcia@spectrocloud.com>
@jimmykarily jimmykarily force-pushed the 1225-config-collector-elemental branch from b70e46f to 6e7412d Compare June 2, 2023 10:26
Signed-off-by: Dimitris Karakasilis <dimitris@karakasilis.me>
@jimmykarily
Copy link
Contributor Author

We created most of the needed tests. The only one missing I think is the one that tests this fix: https://github.com/kairos-io/kairos-agent/pull/32/files#diff-bdceadb6b1ed909e97670e80e5251016ad0826cff05ec4cac9ed83add43cd992R338-R345

which ensures that the users cloud config is merge before we run the hooks. I started writing a test here: https://github.com/kairos-io/kairos-agent/pull/32/files#diff-f26da7c79a4ee9afdb26e2f102d65f35feea2f2d731b3d1bdd4d48349c88e0ebR118 but I find that impossible to mock enough things to get RunInstall to work in tests. It a high level function and doesn't take interface arguments like others so it's not "mockable" in its current form.

Create a new spec from scratch instead of using the helper method, as
some fields still need to be added to the spec before the Sanitize
method is called. Otherwise we will get an error as some things may be
missing like the Target

Signed-off-by: Itxaka <itxaka.garcia@spectrocloud.com>
Signed-off-by: Itxaka <itxaka.garcia@spectrocloud.com>
@Itxaka Itxaka marked this pull request as ready for review June 7, 2023 08:10
Signed-off-by: Itxaka <itxaka.garcia@spectrocloud.com>
@Itxaka Itxaka force-pushed the 1225-config-collector-elemental branch from eb83d4e to b4cb70a Compare June 7, 2023 08:11
@Itxaka Itxaka requested a review from mauromorales June 7, 2023 08:12
pkg/config/config.go Outdated Show resolved Hide resolved
Co-authored-by: Mauro Morales <mauro.morales@spectrocloud.com>
Copy link
Member

@mauromorales mauromorales left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Signed-off-by: Itxaka <itxaka.garcia@spectrocloud.com>
@codecov-commenter
Copy link

Codecov Report

Merging #32 (7113a8e) into main (1e1638f) will decrease coverage by 0.16%.
The diff coverage is 23.07%.

❗ Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the Github App Integration for your organization. Read more.

@@            Coverage Diff             @@
##             main      #32      +/-   ##
==========================================
- Coverage   57.90%   57.74%   -0.16%     
==========================================
  Files          42       42              
  Lines        5093     5110      +17     
==========================================
+ Hits         2949     2951       +2     
- Misses       1895     1910      +15     
  Partials      249      249              
Impacted Files Coverage Δ
internal/agent/install.go 4.71% <0.00%> (-0.07%) ⬇️
internal/agent/interactive_install.go 0.00% <0.00%> (ø)
pkg/config/config.go 0.00% <ø> (ø)
pkg/types/v1/config.go 78.27% <ø> (ø)
pkg/partitioner/disk.go 55.92% <42.85%> (-0.61%) ⬇️
pkg/cloudinit/layout_plugin.go 86.56% <100.00%> (-0.20%) ⬇️
pkg/elemental/elemental.go 82.25% <100.00%> (+0.08%) ⬆️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@Itxaka Itxaka merged commit a113147 into main Jun 7, 2023
@Itxaka Itxaka deleted the 1225-config-collector-elemental branch June 7, 2023 09:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

✨ Use config/collector on kairos-agent ex-elemental commands
4 participants