-
Notifications
You must be signed in to change notification settings - Fork 908
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(WSL): Add support for Ubuntu Pro configs #5116
Conversation
Previously dscheck would fail due to the for loop not properly expanding. This change removes the quotes around $@ and corrects some spelling mistakes to fix that.
ds-identify changes in the previous commits broke its tests, so this corrects those tests and accounts for the new ds-identify which considers the new directory locations. This also breaks up ds-identify into more functions to allow for better testing. Co-authored-by: Carlos Nihelton <carlosnsoliveira@gmail.com>
Previously, configuration files were assumed to be YAML only. This change allows for non-YAML files to be loaded by just setting userdata_raw to a list of binary data instead of attempting to parse each config manually. This has the consequence of not being able to control how merging works for those files.
Breaks up yaml/binary loading into a function for better reusability. Also adds more context to the usage of a list instead of manually merging dicts from YAML.
Previously if a ~/.cloud-init dir was not found then we'd improperly error out. Since we've added Ubuntu Pro configs, that's not always a bad thing so this change allows for no ~/.cloud-init to be found and userdata will still attempt to load from other locations.
Reorder test imports Fix variable types and formatting
The main precedence test and others unittests were broken due to previous changes. This change fixes those tests somewhat, but a commit after this will permanently fix the problems.
Since we do actually need $@ to expand to properly handle the directories passed in by the caller, we can safetly disable shellcheck here. Co-authored-by: Carlos Nihelton <carlosnsoliveira@gmail.com>
Adds more incremental tests to userdata loading to account for different machine configurations, including the new cloud-init configs provided by Ubuntu Pro.
Minor refactor and renaming
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cloudinit/sources/DataSourceWSL.py
Outdated
def landscape_file_name(instance_name) -> str: | ||
""" | ||
Return the Landscape configuration name. | ||
""" | ||
return "%s.user-data" % instance_name | ||
|
||
|
||
def agent_file_name() -> str: | ||
""" | ||
Return the Pro agent configuration name. | ||
""" | ||
return "agent.yaml" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why define landscape_file_name
/agent_file_name
functions here and pay the cost of a function call instead of just referencing variables?
LANDSCAPE_USERDATA_TMPL = "%s:.user-data"
AGENT_FILE = "agent.yaml"
|
||
win_profile_dir = win_path_2_wsl(home) | ||
seed_dir = os.path.join(win_profile_dir, ".cloud-init") | ||
def cloud_init_data_dir(user_home: PurePath) -> Optional[PurePath]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels like cloud_init_data_dir
and ubuntu_pro_data_dir
is a significant wrapper around a fairly straightforward isdir check and log if absent. Do we need two separate functions to do this or can we maybe just rely on the FileNotFoundError in load_yaml_or_bin
to log for us if we pass the full user-data path and react appropriately to the return value of None. We'd have enough logging of absent files to understand what user-data paths were searched and it wouldn't require these extra function definitions.
cloudinit/sources/DataSourceWSL.py
Outdated
if user_data is None and seed_dir is not None: | ||
# Regular user data | ||
file = self.find_user_data_file(seed_dir) | ||
if os.path.exists(file.as_posix()): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we expect to get here if file doesn't exist? I thought we'd raise IOError if find_user_data_file found no existing file in seed_dir and I presumed we can only return a file path that exists so I don't think we need this test again.
cloudinit/sources/DataSourceWSL.py
Outdated
file = self.find_user_data_file(seed_dir) | ||
if os.path.exists(file.as_posix()): | ||
bin_user_data = util.load_binary_file(file.as_posix()) | ||
user_data = util.load_yaml(bin_user_data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
user-data could very well not be YAML here but some other base64encoded blob that cloud-init would understand. If user-data isn't YAML but any number of other supported userdata blobs such as #include-url
, you'd get a TypeError raised on the line above which would prevent you from setting user_data in this line. It'd also raise that traceback as your exception handling below doesn't catch TypeError. I think you may want to separate some of this processing into different try/except blocks as earlier exceptions will prevent any user_data processing from happening.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if I addressed this correctly or not, so please let me know if my recent commits were sufficient or not
tools/ds-identify
Outdated
WSL_win2linux_path() { | ||
local val="" | ||
val="$(wslpath -au "$1" )" | ||
_RET="$val" | ||
} | ||
|
||
WSL_linux2win_path() { | ||
local val="" | ||
val="$(wslpath -am "$1" )" | ||
_RET="$val" | ||
} | ||
|
||
WSL_run_cmd() { | ||
local val="" | ||
val=$(/init "$cmdexe" /c "$@" 2>/dev/null) | ||
_RET="$val" | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are defining a number of separate functions here for one-liners that could be used inline at the callsites. I think for both shell execution efficiency and readability of the code we probably don't want that level of encapsulation and 'complexity' for these cases as it doesn't add a whole lot of value to represent the one-liner as a function definition elsewhere in the code. Each of these function definitions end up putting a burden on the call-sites to additionally process the var $_RET after the fact making the call-sites longer too. Is the reason you've defined these separate functions to make mocking easier in our test_ds_identify?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the only reason was to make testing more robust.
self.userdata_raw = cast(Any, [user_data, agent_data]) | ||
return True | ||
|
||
# We only care about overriding modules entirely, so we can just |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are the use-cases we expect to see that would require full top-level key replacement vs individual leaf value replacement?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An example might be that both files provides the landscape
top-level key, which themselves contain keys and URLs. We'd only want the entire module from either of the files to preserve the correct credentials. Same for ubuntu_advantage
, etc.
) | ||
return True | ||
except (ValueError, IOError) as err: | ||
LOG.error("Unable to load metadata: %s", str(err)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only way we get to a ValueError or IOError from load_instance_metadata function is if the metadata path exists but the file is not readable for some reason or not structured as a YAML dict which contains an instance-id key right?
It looks like in cases where either the file or parent seed_dir doesn't exist we expect we return the default {"instance-id": DEFAULT_INSTANCE_ID}
which will allow us to happily proceed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I'm not sure what you are suggesting for a change though, could you clarify?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ashuntu Thanks for representing failure conditions. If we receive invalid metadata structure (a string instead of YAML dict) or if that metadata on disk doesn't contain an instance-id key we would have raised a ValueError from load_instance_metadata. this raised ValueError would leave self.metadata unset. I think in this case we want to LOG.error as you have done and also return False
because we can't detect WSL datasource with invalid metadata as that may trigger cloud-init to re-run again across the perceived meta-data "instance-id" change causing cloud-init to regenerate SSH host keys, recreate users, reset password etc due to some invalid meta-data format on disk. I think it's better to be noisy about this and return False to avoid cloud-init trying to run in a misconfigured environment.
LOG.error("Unable to load metadata: %s", str(err)) | |
LOG.error("Unable to load metadata: %s", str(err)) | |
return False |
cloudinit/sources/DataSourceWSL.py
Outdated
agent_data = load_agent_data(user_home) | ||
user_data = load_landscape_data(self.instance_name, user_home) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we generate any kind of exception from these 2 functions, we will not try to process and fallback to valid WSL user-data below as we will have ejected from this try block. Is this desired behavior?
It seems load_yaml_or_bin function is resilient enough to return None on FileNotFound or non-YAML binaries. So, maybe we don't expect this to raise unless there is a file permissions error. If so we'd fall through without any user_data defined as _get_data would return False skipping this DataSourceWSL detection.
tools/ds-identify
Outdated
WSL_win2linux_path() { | ||
local val="" | ||
val="$(wslpath -au "$1" )" | ||
_RET="$val" | ||
} | ||
|
||
WSL_linux2win_path() { | ||
local val="" | ||
val="$(wslpath -am "$1" )" | ||
_RET="$val" | ||
} | ||
|
||
WSL_run_cmd() { | ||
local val="" | ||
val=$(/init "$cmdexe" /c "$@" 2>/dev/null) | ||
_RET="$val" | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have created a number of very simple functions here that increase complexity of the shell code for what otherwise would be a one-liner inline in other call-sites. This complexity also forces call-sites to process changes to the env var _RET which makes those call-sites busier and harder to read for limited benefit. Is the reason we are doing this to get around test_ds_identify test mocking?
Created a unified load_ubuntu_pro_data to share some of the logic. Also changed functions for getting file names into const variables. _get_data separates some of the logic out of the single try/except to prevent early termination of certain code paths. load_ubuntu_pro_data is robust enough to only fail when it should.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ashuntu I think we are getting closer.
A couple of requests:
- better error handling when WSLPATH_CMD doesn't exit to avoid trying detecting the datasource and presenting tracebacks
- drop separate wsl_path_2_win function and inline it inside of instance_name
- drop separate win_path_2_wsl function and inline it inside find_home
- don't perform instance_name in init pre-flight check in get_data for WSLPATH_CMD
- Don't log full user-data when merging as it's sensitive information
- consolidate WSL_path from two separate path functions in ds-identify
Here's a consolidated diff that takes into account most suggestions for your consideration
diff --git a/cloudinit/sources/DataSourceWSL.py b/cloudinit/sources/DataSourceWSL.py
index f0b9e87ea..815296402 100644
--- a/cloudinit/sources/DataSourceWSL.py
+++ b/cloudinit/sources/DataSourceWSL.py
@@ -25,27 +25,18 @@ LANDSCAPE_DATA_FILE = "%s.user-data"
AGENT_DATA_FILE = "agent.yaml"
-def wsl_path_2_win(path: str) -> PurePath:
- """
- Translates a path inside the current WSL instance's filesystem to a
- Windows accessible path.
-
- Example:
- # Running under an instance named "CoolInstance"
- root = wslpath2win("/") # root == "//wsl.localhost/CoolInstance/"
-
- :param path: string representing a Linux path, whether existing or not.
- """
- out, _ = subp.subp([WSLPATH_CMD, "-am", path])
- return PurePath(out.rstrip())
-
-
def instance_name() -> str:
"""
Returns the name of the current WSL instance as seen from outside.
+
+ Obtained by tranlating the root path inside with current WSL instance's
+ filesytsem to a Windows accessible path. and returning the final path
+ element which happens to be the instance name.
+
+ Example: WSLPATH_CMD -qm / == "//wsl.localhost/CoolInstance/"
"""
- root_net_path = wsl_path_2_win("/")
- return root_net_path.name
+ out, _ = subp.subp([WSLPATH_CMD, "-am", "/"])
+ return PurePath(out.rstrip()).name
def mounted_win_drives() -> List[str]:
@@ -64,26 +55,6 @@ def mounted_win_drives() -> List[str]:
return mounted
-def win_path_2_wsl(path: str) -> PurePath:
- """
- Returns a translation of a Windows path to a Linux path that can be
- accessed inside the current instance filesystem.
-
- It requires the Windows drive mounting feature to be enabled and the
- disk drive must be muonted for this to succeed.
-
- Example:
- # Assuming Windows drives are mounted under /mnt/ and "S:" doesn't exist:
- p = winpath2wsl("C:\\ProgramData") # p == "/mnt/c/ProgramData/"
- n = winpath2wsl("S:\\CoolFolder") # Exception! S: is not mounted.
-
- :param path: string representing a Windows path. The root drive must exist,
- although the path is not required to.
- """
- out, _ = subp.subp([WSLPATH_CMD, "-au", path])
- return PurePath(out.rstrip())
-
-
def cmd_executable() -> PurePath:
"""
Returns the Linux path to the Windows host's cmd.exe.
@@ -111,6 +82,10 @@ def cmd_executable() -> PurePath:
def find_home() -> PurePath:
"""
Finds the user's home directory path as a WSL path.
+
+ raises IOError: when windows drives are not mounted or no cmd.exe found
+ raises ProcessExecutionError: when cmd.exe is unable to show profile dir
+ or the profile dir isn't mounted?
"""
cmd = cmd_executable()
@@ -124,7 +99,15 @@ def find_home() -> PurePath:
raise subp.ProcessExecutionError(
"No output from cmd.exe to show the user profile dir."
)
- return win_path_2_wsl(home)
+
+ # Returns a translation of a Windows path to a Linux path that can be
+ # accessed inside the current instance filesystem.
+ # Example:
+ # Assuming Windows drives are mounted under /mnt/ and "S:" doesn't exist:
+ # WSLPATH_CMD -au "C:\\ProgramData" == "/mnt/c/ProgramData/"
+ # WSLPATH_CMD -au "S:\\Something" raises exception S: is not mounted
+ out, _ = subp.subp([WSLPATH_CMD, "-au", home])
+ return PurePath(out.rstrip())
def cloud_init_data_dir(user_home: PurePath) -> Optional[PurePath]:
@@ -235,7 +218,7 @@ class DataSourceWSL(sources.DataSource):
def __init__(self, sys_cfg, distro: Distro, paths: Paths, ud_proc=None):
super().__init__(sys_cfg, distro, paths, ud_proc)
- self.instance_name = instance_name()
+ self.instance_name = None
def find_user_data_file(self, seed_dir: PurePath) -> PurePath:
"""
@@ -287,6 +270,12 @@ class DataSourceWSL(sources.DataSource):
return False
def _get_data(self) -> bool:
+ if not subp.which(WSLPATH_CMD):
+ LOG.debug(
+ "No WSL command %s found. Can not detect WSL datasource",
+ WSLPATH_CMD
+ )
+ return False
self.vendordata_raw = None
user_home = find_home()
seed_dir = cloud_init_data_dir(user_home)
@@ -300,6 +289,7 @@ class DataSourceWSL(sources.DataSource):
)
except (ValueError, IOError) as err:
LOG.error("Unable to load metadata: %s", str(err))
+ return False
# Load Ubuntu Pro configs
agent_data, user_data = load_ubuntu_pro_data(user_home)
@@ -318,7 +308,6 @@ class DataSourceWSL(sources.DataSource):
# No configs were found
if not any([user_data, agent_data]):
- self.userdata_raw = None
return False
# If we cannot reliably model data files as dicts, then we cannot merge
@@ -336,14 +325,21 @@ class DataSourceWSL(sources.DataSource):
# provides them instead.
# That's the reason for not using util.mergemanydict().
merged = {}
+ agent_overrides = []
if user_data:
- for key in user_data:
- merged[key] = user_data[key]
+ merged = user_data
if agent_data:
+ if user_data:
+ LOG.debug("Merging both user_data and agent.yaml configs.")
for key in agent_data:
+ if key in merged:
+ agent_overrides.append(key)
merged[key] = agent_data[key]
-
- LOG.debug("Merged data: %s", merged)
+ if agent_overrides:
+ LOG.debug(
+ " agent.yaml overrides config keys:"
+ f" {', '.join(agent_overrides)}"
+ )
self.userdata_raw = "#cloud-config\n%s" % yaml.dump(merged)
return True
diff --git a/tests/unittests/test_ds_identify.py b/tests/unittests/test_ds_identify.py
index 0e369af89..c8d41d3ba 100644
--- a/tests/unittests/test_ds_identify.py
+++ b/tests/unittests/test_ds_identify.py
@@ -1242,7 +1242,7 @@ class TestWSL(DsIdentifyBase):
# Forcing WSL_linux2win_path to return a path we'll fail to parse
# (missing one / in the begining of the path).
for i, m in enumerate(data["mocks"]):
- if m["name"] == "WSL_linux2win_path":
+ if m["name"] == "WSL_path":
data["mocks"][i]["RET"] = "/wsl.localhost/cant-findme"
cloudinitdir = os.path.join(userprofile, ".cloud-init")
@@ -2416,7 +2416,7 @@ VALID_CFG = {
MOCK_VIRT_IS_WSL,
MOCK_UNAME_IS_WSL,
{
- "name": "WSL_linux2win_path",
+ "name": "WSL_path",
"ret": 0,
"RET": "//wsl.localhost/%s/" % MOCK_WSL_INSTANCE_DATA["name"],
},
@@ -2439,7 +2439,7 @@ VALID_CFG = {
MOCK_VIRT_IS_WSL,
MOCK_UNAME_IS_WSL,
{
- "name": "WSL_linux2win_path",
+ "name": "WSL_path",
"ret": 0,
"RET": "//wsl.localhost/%s/" % MOCK_WSL_INSTANCE_DATA["name"],
},
diff --git a/tools/ds-identify b/tools/ds-identify
index ea18c1d4b..33f085353 100755
--- a/tools/ds-identify
+++ b/tools/ds-identify
@@ -1594,15 +1594,9 @@ dscheck_VMware() {
return "${DS_NOT_FOUND}"
}
-WSL_win2linux_path() {
- local val=""
- val="$(wslpath -au "$1" )"
- _RET="$val"
-}
-
-WSL_linux2win_path() {
- local val=""
- val="$(wslpath -am "$1" )"
+WSL_path() {
+ local params="$1" path="$2" val=""
+ val="$(wslpath "$params" "$1" )"
_RET="$val"
}
@@ -1629,7 +1623,7 @@ WSL_profile_dir() {
# wslpath is a program supplied by WSL itself that translates Windows and Linux paths,
# respecting the mountpoints where the Windows drives are mounted.
# (in fact it's a symlink to /init).
- WSL_win2linux_path "$profiledir"
+ WSL_path "-au" "$profiledir"
return $?
fi
fi
@@ -1640,7 +1634,7 @@ WSL_profile_dir() {
WSL_instance_name() {
local val="" instance_name=""
- WSL_linux2win_path "/"
+ WSL_path "-au" "/"
instance_name="${_RET}"
# Extracts "Ubuntu/" from "//wsl.localhost/Ubuntu/"
val="${instance_name#//*/}"
""" | ||
Returns the Windows user profile directory translated as a Linux path | ||
accessible inside the current WSL instance. | ||
Finds the user's home directory path as a WSL path. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we document the conditions where we think we'll raise ProcessExecutionError from this function?
Finds the user's home directory path as a WSL path. | |
Finds the user's home directory path as a WSL path. | |
raises: IOError when no mountpoint with cmd.exe found | |
ProcessExecutionError when either cmd.exe is unable to show proflie dir or | |
the profile drive is not mounted? |
cloudinit/sources/DataSourceWSL.py
Outdated
@@ -119,11 +124,19 @@ def cloud_init_data_dir() -> PurePath: | |||
raise subp.ProcessExecutionError( | |||
"No output from cmd.exe to show the user profile dir." | |||
) | |||
return win_path_2_wsl(home) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's collapse this win_path_2_wsl and linline it in this function.
return win_path_2_wsl(home) | |
# Returns a translation of a Windows path to a Linux path that can be | |
# accessed inside the current instance filesystem. | |
# Example: | |
# Assuming Windows drives are mounted under /mnt/ and "S:" doesn't exist: | |
# WSLPATH_CMD -au "C:\\ProgramData" == "/mnt/c/ProgramData/" | |
# WSLPATH_CMD -au "S:\\Something" raises exception S: is not mounted | |
out, _ = subp.subp([WSLPATH_CMD, "-au", home]) | |
return PurePath(out.rstrip()) |
) | ||
return True | ||
except (ValueError, IOError) as err: | ||
LOG.error("Unable to load metadata: %s", str(err)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ashuntu Thanks for representing failure conditions. If we receive invalid metadata structure (a string instead of YAML dict) or if that metadata on disk doesn't contain an instance-id key we would have raised a ValueError from load_instance_metadata. this raised ValueError would leave self.metadata unset. I think in this case we want to LOG.error as you have done and also return False
because we can't detect WSL datasource with invalid metadata as that may trigger cloud-init to re-run again across the perceived meta-data "instance-id" change causing cloud-init to regenerate SSH host keys, recreate users, reset password etc due to some invalid meta-data format on disk. I think it's better to be noisy about this and return False to avoid cloud-init trying to run in a misconfigured environment.
LOG.error("Unable to load metadata: %s", str(err)) | |
LOG.error("Unable to load metadata: %s", str(err)) | |
return False |
cloudinit/sources/DataSourceWSL.py
Outdated
@@ -238,22 +288,65 @@ def check_instance_id(self, sys_cfg) -> bool: | |||
|
|||
def _get_data(self) -> bool: | |||
self.vendordata_raw = None | |||
seed_dir = cloud_init_data_dir() | |||
user_home = find_home() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the environment runs into issues below find_home
will raise an IOError right?
- the correct mounted disks
- or finding a viable cmd.exe in mounted disks
- being unable to determine the user profile dir find_home
In any of the above conditions, ds-identify
would have rejected this platform as DS_NOT_FOUND, and I think we need to LOG.debug("Unable to detect WSL datasource: %s", str(e)) and return False in this case to align w/ ds-identify discovery as well.
user_home = find_home() | |
try: | |
user_home = find_home() | |
except IOError as e: | |
LOG.debug("Unable to detect WSL datasource: %s", e) | |
return False |
cloudinit/sources/DataSourceWSL.py
Outdated
for key in agent_data: | ||
merged[key] = agent_data[key] | ||
|
||
LOG.debug("Merged data: %s", merged) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please also don't log full user-data to logs as it typically contains sensitive infomraiton like passwords. The less places that is logged the better. Maybe only log that we have "Merged config data from both user_data and agent.yaml" and maybe list the keys overridden by agent.yaml.
then the first user-provided configuration will be used in its place. | ||
|
||
2. ``%USERPROFILE%\.ubuntupro\.cloud-init\agent.yaml`` holds data provided by | ||
the Ubuntu Pro for WSL agent. If this file is present, it's modules will be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo
the Ubuntu Pro for WSL agent. If this file is present, it's modules will be | |
the Ubuntu Pro for WSL agent. If this file is present, its modules will be |
cloudinit/sources/DataSourceWSL.py
Outdated
|
||
# No configs were found | ||
if not any([user_data, agent_data]): | ||
self.userdata_raw = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to unset userdata_raw, a return False
prevents cloud-init from trying to pickle or preseve this class instance and/or attributes.
cloudinit/sources/DataSourceWSL.py
Outdated
DEFAULT_INSTANCE_ID = "iid-datasource-wsl" | ||
LANDSCAPE_DATA_FILE = "%s.user-data" | ||
AGENT_DATA_FILE = "agent.yaml" | ||
|
||
|
||
def wsl_path_2_win(path: str) -> PurePath: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can drop wsl_path_2_win function and just call the content from within instance_name
as that's a tiny function and it only has one call-site.
tools/ds-identify
Outdated
WSL_win2linux_path() { | ||
local val="" | ||
val="$(wslpath -au "$1" )" | ||
_RET="$val" | ||
} | ||
|
||
WSL_linux2win_path() { | ||
local val="" | ||
val="$(wslpath -am "$1" )" | ||
_RET="$val" | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since these functions are primarily to ease mocking and are near identical, let's refactor them to one function that accepts two positional params
WSL_win2linux_path() { | |
local val="" | |
val="$(wslpath -au "$1" )" | |
_RET="$val" | |
} | |
WSL_linux2win_path() { | |
local val="" | |
val="$(wslpath -am "$1" )" | |
_RET="$val" | |
} | |
WSL_path() { | |
local params="$1" path="$2" val="" | |
val="$(wslpath "$params" "$1" )" | |
_RET="$val" | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good @ashuntu.
@ashuntu and @CarlosNihelton
The only significant questions I have outstanding now are the following:
- Given that
.ubuntu-pro/.cloud-init/agent.yaml
is not instance name specific, do we expect that this agent.yaml is shared and common amongst all instances on a system? - It looks like we have logic that expects agent.yaml may be bytes instead ofYAML. If non-YAML is expected, maybe this filename should be something else?
.cloud-init/<instance_name>.meta-data
contains aninstance-id
value, that should change if we expect to apply new configuration Given that self.metadata is not being updated due to files in the .ubuntupro subdirectory, is there a mechanism through ubuntu-pro-for-wsl that will update the.cloud-init/<instance_name>.meta-data
keyinstance-id
with a new UUID when new config is provided via UP4W? Otherwise, any updates to the .ubuntupro files will be ignored by any WSL instance which has already been launched because cloud-init only re-runs config when instance-id changes.
# If we cannot reliably model data files as dicts, then we cannot merge | ||
# ourselves, so we can pass the data in ascending order as a list for | ||
# cloud-init to handle internally | ||
if isinstance(agent_data, bytes) or isinstance(user_data, bytes): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When would we expect agent-data.yaml file to by bytes and not YAML? If we expect this case, shouldn't this be named just agent-data
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From cloud-init's point of view, is it better to be strict with what we know from the UP4W implementation (which our team controls and do enforce YAML as of now) or let the datasource be more flexible so that changes in UP4W implementation won't affect the datasource as much?
The question sounds silly in principle, but there is a layer of trust here: if we know the agent only writes YAML, ingesting any other format would mean ingesting invalid (thus potentially harmful) config data. I'm in favor of the less future maintenance path, trusting that the user has full control over the directories involved and apply the relevant protection to their systems. I hope I'm not too positive about this 😄
|
||
1. ``%USERPROFILE%\.ubuntupro\.cloud-init\<InstanceName>.user-data`` holds data | ||
provided by Landscape to configure a specific WSL instance. If this file | ||
is present, normal user-provided configurations are not looked for. This |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is present, normal user-provided configurations are not looked for. This | |
is present, normal user-provided configurations are ignored. This |
file is merged with (2) on a per-module basis. If this file is not present, | ||
then the first user-provided configuration will be used in its place. | ||
|
||
2. ``%USERPROFILE%\.ubuntupro\.cloud-init\agent.yaml`` holds data provided by |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This agent.yaml file is not <InstanceName>
specific, it will therefore be consumed by default and override any conflicting keys in the user-data for all instances, is that intended?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, when ubuntu-pro-for-wsl writes this agent.yaml, will it also update any .cloud-init/<InstanceName>.meta-data
to ensure that expected instances get this agent.yaml update? Otherwise cloud-init will not re-run on a WSL instance that is formerly launched because it will see an unchanged metadata["instance-id"] and assume it needs to remain inert.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Landscape job is (or will eventually be) to send cloud-init user-data to the UP4W agent as part of an Install command, which implies we won't get updates, but rather uninstall/install requests coming with the relevant config data. The API via which both components talk to each other already supports that use-case, but it lacks implementation on Landscape side. So, no updates so far in the horizon.
First, configurations from Ubuntu Pro/Landscape are checked for in the | ||
following paths: | ||
|
||
1. ``%USERPROFILE%\.ubuntupro\.cloud-init\<InstanceName>.user-data`` holds data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If landscape updates %USERPROFILE%\.ubuntupro\.cloud-init\<InstanceName>.user-data
will UP4W update %USERPROFILE%\.cloud-init/<InstanceName>.meta-data
with a instance-id: <new_UUID>
to trigger cloud-init to re-run on next WSL instance start?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We definitely can, but it's not scoped at this moment. It's as easy for WSL users to just wipe the instance and start from a clean default, that metadata.instance-id
might end up never been a feature for this data source. We need real users to probe that assumption.
Thanks for the reviews so far, @blackboxsw ! Answers below:
It's shared and holds information controlled by the Ubuntu Pro for WSL agent, which is globally applied to instances of any Ubuntu release.
We (UP4W) are in control of that file and it's currently YAML.
We don't (currently maybe) support that feature at the moment. We want to land something minimally viable so we can collect user feedback to guide the next developments in that area. Limiting scope is a must at this moment. |
cloudinit/sources/DataSourceWSL.py
Outdated
# Load Ubuntu Pro configs | ||
agent_data, user_data = load_ubuntu_pro_data(user_home) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just realized we don't check if the distro is Ubuntu before doing this. Sorry for not raising that before, but we should just skip this step if the distro is not Ubuntu, as it contains config data that's not relevant to other distros, such as an Ubuntu Pro subscription token.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code and changes look good. This appears to come down to unittest issue given the new calls.
I've pushed a commit that refactors the unittests to use a pytest fixture for common setup and mocks the new subp.which call as well as setting up the distro as ubuntu passed into each unittest.
If tests succeed here, I'll squash merge your changeset. Thank you!
diff --git a/tests/unittests/sources/test_wsl.py b/tests/unittests/sources/test_wsl.py
index dcd31984b..31c5c897e 100644
--- a/tests/unittests/sources/test_wsl.py
+++ b/tests/unittests/sources/test_wsl.py
@@ -14,6 +14,7 @@ import pytest
from cloudinit import util
from cloudinit.sources import DataSourceWSL as wsl
+from tests.unittests.distros import _get_distro
from tests.unittests.helpers import does_not_raise, mock
INSTANCE_NAME = "Noble-MLKit"
@@ -251,34 +252,41 @@ def join_payloads_from_content_type(
class TestWSLDataSource:
- @mock.patch("cloudinit.sources.DataSourceWSL.instance_name")
- @mock.patch("cloudinit.sources.DataSourceWSL.find_home")
- def test_metadata_id_default(self, m_home_dir, m_iname, tmpdir, paths):
+ @pytest.fixture(autouse=True)
+ def setup(self, mocker, tmpdir):
+ mocker.patch(
+ "cloudinit.sources.DataSourceWSL.instance_name",
+ return_value=INSTANCE_NAME,
+ )
+ mocker.patch(
+ "cloudinit.sources.DataSourceWSL.find_home",
+ return_value=PurePath(tmpdir),
+ )
+ mocker.patch(
+ "cloudinit.sources.DataSourceWSL.subp.which",
+ return_value="/usr/bin/wslpath",
+ )
+
+ def test_metadata_id_default(self, tmpdir, paths):
"""
Validates that instance-id is properly set, indepedent of the existence
of user-data.
"""
- m_iname.return_value = INSTANCE_NAME
- m_home_dir.return_value = PurePath(tmpdir)
ds = wsl.DataSourceWSL(
sys_cfg=SAMPLE_CFG,
- distro=None,
+ distro=_get_distro("ubuntu"),
paths=paths,
)
ds.get_data()
assert ds.get_instance_id() == wsl.DEFAULT_INSTANCE_ID
- @mock.patch("cloudinit.sources.DataSourceWSL.instance_name")
- @mock.patch("cloudinit.sources.DataSourceWSL.find_home")
- def test_metadata_id(self, m_home_dir, m_iname, tmpdir, paths):
+ def test_metadata_id(self, tmpdir, paths):
"""
Validates that instance-id is properly set, indepedent of the existence
of user-data.
"""
- m_iname.return_value = INSTANCE_NAME
- m_home_dir.return_value = PurePath(tmpdir)
SAMPLE_ID = "Nice-ID"
metadata_path = tmpdir.join(
".cloud-init", f"{INSTANCE_NAME}.meta-data"
@@ -290,7 +298,7 @@ class TestWSLDataSource:
ds = wsl.DataSourceWSL(
sys_cfg=SAMPLE_CFG,
- distro=None,
+ distro=_get_distro("ubuntu"),
paths=paths,
)
ds.get_data()
@@ -298,19 +306,15 @@ class TestWSLDataSource:
assert ds.get_instance_id() == SAMPLE_ID
@mock.patch("cloudinit.util.lsb_release")
- @mock.patch("cloudinit.sources.DataSourceWSL.instance_name")
- @mock.patch("cloudinit.sources.DataSourceWSL.find_home")
- def test_get_data_cc(self, m_home_dir, m_iname, m_gld, paths, tmpdir):
- m_gld.return_value = SAMPLE_LINUX_DISTRO
- m_iname.return_value = INSTANCE_NAME
- m_home_dir.return_value = PurePath(tmpdir)
+ def test_get_data_cc(self, m_lsb_release, paths, tmpdir):
+ m_lsb_release.return_value = SAMPLE_LINUX_DISTRO
data_path = tmpdir.join(".cloud-init", f"{INSTANCE_NAME}.user-data")
data_path.dirpath().mkdir()
data_path.write("#cloud-config\nwrite_files:\n- path: /etc/wsl.conf")
ds = wsl.DataSourceWSL(
sys_cfg=SAMPLE_CFG,
- distro=None,
+ distro=_get_distro("ubuntu"),
paths=paths,
)
@@ -325,19 +329,15 @@ class TestWSLDataSource:
assert "wsl.conf" in cast(str, userdata)
@mock.patch("cloudinit.util.lsb_release")
- @mock.patch("cloudinit.sources.DataSourceWSL.instance_name")
- @mock.patch("cloudinit.sources.DataSourceWSL.find_home")
- def test_get_data_sh(self, m_home_dir, m_iname, m_gld, tmpdir, paths):
- m_gld.return_value = SAMPLE_LINUX_DISTRO
- m_iname.return_value = INSTANCE_NAME
- m_home_dir.return_value = PurePath(tmpdir)
+ def test_get_data_sh(self, m_lsb_release, tmpdir, paths):
+ m_lsb_release.return_value = SAMPLE_LINUX_DISTRO
COMMAND = "echo Hello cloud-init on WSL!"
data_path = tmpdir.join(".cloud-init", f"{INSTANCE_NAME}.user-data")
data_path.dirpath().mkdir()
data_path.write(f"#!/bin/sh\n{COMMAND}\n")
ds = wsl.DataSourceWSL(
sys_cfg=SAMPLE_CFG,
- distro=None,
+ distro=_get_distro("ubuntu"),
paths=paths,
)
@@ -354,14 +354,8 @@ class TestWSLDataSource:
assert COMMAND in userdata
@mock.patch("cloudinit.util.get_linux_distro")
- @mock.patch("cloudinit.sources.DataSourceWSL.instance_name")
- @mock.patch("cloudinit.sources.DataSourceWSL.find_home")
- def test_data_precedence(
- self, m_home_dir, m_instance, m_distro, tmpdir, paths
- ):
- m_distro.return_value = SAMPLE_LINUX_DISTRO
- m_instance.return_value = INSTANCE_NAME
- m_home_dir.return_value = PurePath(tmpdir)
+ def test_data_precedence(self, m_get_linux_dist, tmpdir, paths):
+ m_get_linux_dist.return_value = SAMPLE_LINUX_DISTRO
# Set up basic user data:
@@ -380,7 +374,7 @@ class TestWSLDataSource:
# Run the datasource
ds = wsl.DataSourceWSL(
sys_cfg=SAMPLE_CFG,
- distro=None,
+ distro=_get_distro("ubuntu"),
paths=paths,
)
@@ -425,7 +419,7 @@ ubuntu_advantage:
# Run the datasource
ds = wsl.DataSourceWSL(
sys_cfg=SAMPLE_CFG,
- distro=None,
+ distro=_get_distro("ubuntu"),
paths=paths,
)
@@ -459,7 +453,7 @@ package_update: true"""
# Run the datasource
ds = wsl.DataSourceWSL(
sys_cfg=SAMPLE_CFG,
- distro=None,
+ distro=_get_distro("ubuntu"),
paths=paths,
)
cloudinit/sources/DataSourceWSL.py
Outdated
@@ -296,7 +296,7 @@ def _get_data(self) -> bool: | |||
return False | |||
|
|||
# # Load Ubuntu Pro configs only on Ubuntu distros | |||
if platform.freedesktop_os_release().get("NAME") == "Ubuntu": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is good given that freedesktop_os_release isn't avail until python 3.10 and cloud-init currently supports 3.7 environments still for a bit.
- add missing subp.which mock of WSLPATH_CMD - provide non-empty distro to all DataSourceWSL init - consolidate common mocks in autouse pytest fixture
Validated on Ubuntu-Preview clean boot with updated ds-identify and DataSourceWSL with and without .ubuntupro config dir. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. thanks for limiting behavior of .ubuntupro subdir to ubuntu only. Changest looks good and behavior well on live WSL instances.
Proposed Commit Message
Additional Context
Currently, the WSL datasource can only work with user-provided configurations manually created in the host Windows filesystem. This change allows organizations to provide configurations via Landscape/Ubuntu Pro and have them loaded into WSL instances. These new configurations will be created under
%USERPROFILE%/.ubuntupro/.cloud-init
. This change will require ubuntu-pro-for-wsl to implement these files after this is merged, as already created in canonical/ubuntu-pro-for-wsl#506.Test Steps
As described in the updated docs and the spec sheet for this change, you can test this by manually creating the necessary files on your Windows host system. If an
agent.yaml
file is found, it should properly merge (overriding any conflicts) on a per-module basis with either a Landscape config (in%USERPROFILE%/.ubuntupro/.cloud-init/<InstanceName>.user-data
) or a user-data found by the previous implementation as described in #4786.ds-check should also verify the environment by using any one of the files described above.
Checklist
Merge type
Edit:
Force pushed to add more context to some commits
UDENG-2381