feat(WSL): Add support for Ubuntu Pro configs #5116

ashuntu · 2024-03-28T21:14:58Z

Proposed Commit Message

feat(WSL): Add support for Ubuntu Pro configs

Currently, WSL only supports manual cloud-init configurations provided
in the host Windows filesystem. This adds support for Landscape/Ubuntu
Pro for WSL to provide cloud-init configurations and have them merged 
with or override manual user configurations. This adds support for 
organizations to better provision WSL instances using cloud-init.

Additional Context

Currently, the WSL datasource can only work with user-provided configurations manually created in the host Windows filesystem. This change allows organizations to provide configurations via Landscape/Ubuntu Pro and have them loaded into WSL instances. These new configurations will be created under %USERPROFILE%/.ubuntupro/.cloud-init. This change will require ubuntu-pro-for-wsl to implement these files after this is merged, as already created in canonical/ubuntu-pro-for-wsl#506.

Test Steps

As described in the updated docs and the spec sheet for this change, you can test this by manually creating the necessary files on your Windows host system. If an agent.yaml file is found, it should properly merge (overriding any conflicts) on a per-module basis with either a Landscape config (in %USERPROFILE%/.ubuntupro/.cloud-init/<InstanceName>.user-data) or a user-data found by the previous implementation as described in #4786.

ds-check should also verify the environment by using any one of the files described above.

Checklist

My code follows the process laid out in the documentation
I have updated or added any unit tests accordingly
I have updated or added any documentation accordingly

Merge type

Squash merge using "Proposed Commit Message"
Rebase and merge unique commits. Requires commit messages per-commit each referencing the pull request number (#<PR_NUM>)

Edit:
Force pushed to add more context to some commits

UDENG-2381

Previously dscheck would fail due to the for loop not properly expanding. This change removes the quotes around $@ and corrects some spelling mistakes to fix that.

ds-identify changes in the previous commits broke its tests, so this corrects those tests and accounts for the new ds-identify which considers the new directory locations. This also breaks up ds-identify into more functions to allow for better testing. Co-authored-by: Carlos Nihelton <carlosnsoliveira@gmail.com>

Previously, configuration files were assumed to be YAML only. This change allows for non-YAML files to be loaded by just setting userdata_raw to a list of binary data instead of attempting to parse each config manually. This has the consequence of not being able to control how merging works for those files.

Breaks up yaml/binary loading into a function for better reusability. Also adds more context to the usage of a list instead of manually merging dicts from YAML.

Previously if a ~/.cloud-init dir was not found then we'd improperly error out. Since we've added Ubuntu Pro configs, that's not always a bad thing so this change allows for no ~/.cloud-init to be found and userdata will still attempt to load from other locations.

Reorder test imports Fix variable types and formatting

The main precedence test and others unittests were broken due to previous changes. This change fixes those tests somewhat, but a commit after this will permanently fix the problems.

Since we do actually need $@ to expand to properly handle the directories passed in by the caller, we can safetly disable shellcheck here. Co-authored-by: Carlos Nihelton <carlosnsoliveira@gmail.com>

Adds more incremental tests to userdata loading to account for different machine configurations, including the new cloud-init configs provided by Ubuntu Pro.

Minor refactor and renaming

blackboxsw

Thanks @ashuntu for working through this and your patience. Here's a first pass on this PR and I'll wrap up review a bit later today. Also cc @holmanb on some of this ds-identify work to get some extra eyes on the shell-isms we are dealing with here.

blackboxsw · 2024-04-12T04:31:17Z

cloudinit/sources/DataSourceWSL.py

+def landscape_file_name(instance_name) -> str:
+    """
+    Return the Landscape configuration name.
+    """
+    return "%s.user-data" % instance_name
+
+
+def agent_file_name() -> str:
+    """
+    Return the Pro agent configuration name.
+    """
+    return "agent.yaml"
+


Why define landscape_file_name/agent_file_name functions here and pay the cost of a function call instead of just referencing variables?
LANDSCAPE_USERDATA_TMPL = "%s:.user-data"
AGENT_FILE = "agent.yaml"

blackboxsw · 2024-04-12T05:09:37Z

cloudinit/sources/DataSourceWSL.py


-    win_profile_dir = win_path_2_wsl(home)
-    seed_dir = os.path.join(win_profile_dir, ".cloud-init")
+def cloud_init_data_dir(user_home: PurePath) -> Optional[PurePath]:


It feels like cloud_init_data_dir and ubuntu_pro_data_dir is a significant wrapper around a fairly straightforward isdir check and log if absent. Do we need two separate functions to do this or can we maybe just rely on the FileNotFoundError in load_yaml_or_bin to log for us if we pass the full user-data path and react appropriately to the return value of None. We'd have enough logging of absent files to understand what user-data paths were searched and it wouldn't require these extra function definitions.

blackboxsw · 2024-04-12T05:16:58Z

cloudinit/sources/DataSourceWSL.py

+            if user_data is None and seed_dir is not None:
+                # Regular user data
+                file = self.find_user_data_file(seed_dir)
+                if os.path.exists(file.as_posix()):


Do we expect to get here if file doesn't exist? I thought we'd raise IOError if find_user_data_file found no existing file in seed_dir and I presumed we can only return a file path that exists so I don't think we need this test again.

blackboxsw · 2024-04-12T05:20:36Z

cloudinit/sources/DataSourceWSL.py

+                file = self.find_user_data_file(seed_dir)
+                if os.path.exists(file.as_posix()):
+                    bin_user_data = util.load_binary_file(file.as_posix())
+                    user_data = util.load_yaml(bin_user_data)


user-data could very well not be YAML here but some other base64encoded blob that cloud-init would understand. If user-data isn't YAML but any number of other supported userdata blobs such as #include-url, you'd get a TypeError raised on the line above which would prevent you from setting user_data in this line. It'd also raise that traceback as your exception handling below doesn't catch TypeError. I think you may want to separate some of this processing into different try/except blocks as earlier exceptions will prevent any user_data processing from happening.

I'm not sure if I addressed this correctly or not, so please let me know if my recent commits were sufficient or not

blackboxsw · 2024-04-12T18:15:40Z

tools/ds-identify

+WSL_win2linux_path() {
+    local val=""
+    val="$(wslpath -au "$1" )"
+    _RET="$val"
+}
+
+WSL_linux2win_path() {
+    local val=""
+    val="$(wslpath -am "$1" )"
+    _RET="$val"
+}
+
+WSL_run_cmd() {
+    local val=""
+    val=$(/init "$cmdexe" /c "$@" 2>/dev/null)
+    _RET="$val"
+}
+


We are defining a number of separate functions here for one-liners that could be used inline at the callsites. I think for both shell execution efficiency and readability of the code we probably don't want that level of encapsulation and 'complexity' for these cases as it doesn't add a whole lot of value to represent the one-liner as a function definition elsewhere in the code. Each of these function definitions end up putting a burden on the call-sites to additionally process the var $_RET after the fact making the call-sites longer too. Is the reason you've defined these separate functions to make mocking easier in our test_ds_identify?

Yes, the only reason was to make testing more robust.

blackboxsw · 2024-04-12T19:38:48Z

cloudinit/sources/DataSourceWSL.py

+            self.userdata_raw = cast(Any, [user_data, agent_data])
+            return True
+
+        # We only care about overriding modules entirely, so we can just


What are the use-cases we expect to see that would require full top-level key replacement vs individual leaf value replacement?

An example might be that both files provides the landscape top-level key, which themselves contain keys and URLs. We'd only want the entire module from either of the files to preserve the correct credentials. Same for ubuntu_advantage, etc.

blackboxsw · 2024-04-12T19:50:10Z

cloudinit/sources/DataSourceWSL.py

-            )
-            return True
+        except (ValueError, IOError) as err:
+            LOG.error("Unable to load metadata: %s", str(err))


The only way we get to a ValueError or IOError from load_instance_metadata function is if the metadata path exists but the file is not readable for some reason or not structured as a YAML dict which contains an instance-id key right?

It looks like in cases where either the file or parent seed_dir doesn't exist we expect we return the default {"instance-id": DEFAULT_INSTANCE_ID} which will allow us to happily proceed.

Yes. I'm not sure what you are suggesting for a change though, could you clarify?

@ashuntu Thanks for representing failure conditions. If we receive invalid metadata structure (a string instead of YAML dict) or if that metadata on disk doesn't contain an instance-id key we would have raised a ValueError from load_instance_metadata. this raised ValueError would leave self.metadata unset. I think in this case we want to LOG.error as you have done and also return False because we can't detect WSL datasource with invalid metadata as that may trigger cloud-init to re-run again across the perceived meta-data "instance-id" change causing cloud-init to regenerate SSH host keys, recreate users, reset password etc due to some invalid meta-data format on disk. I think it's better to be noisy about this and return False to avoid cloud-init trying to run in a misconfigured environment.

Suggested change

LOG.error("Unable to load metadata: %s", str(err))

LOG.error("Unable to load metadata: %s", str(err))

return False

blackboxsw · 2024-04-12T20:01:48Z

cloudinit/sources/DataSourceWSL.py

+            agent_data = load_agent_data(user_home)
+            user_data = load_landscape_data(self.instance_name, user_home)


If we generate any kind of exception from these 2 functions, we will not try to process and fallback to valid WSL user-data below as we will have ejected from this try block. Is this desired behavior?

It seems load_yaml_or_bin function is resilient enough to return None on FileNotFound or non-YAML binaries. So, maybe we don't expect this to raise unless there is a file permissions error. If so we'd fall through without any user_data defined as _get_data would return False skipping this DataSourceWSL detection.

blackboxsw · 2024-04-12T20:12:32Z

tools/ds-identify

+WSL_win2linux_path() {
+    local val=""
+    val="$(wslpath -au "$1" )"
+    _RET="$val"
+}
+
+WSL_linux2win_path() {
+    local val=""
+    val="$(wslpath -am "$1" )"
+    _RET="$val"
+}
+
+WSL_run_cmd() {
+    local val=""
+    val=$(/init "$cmdexe" /c "$@" 2>/dev/null)
+    _RET="$val"
+}


We have created a number of very simple functions here that increase complexity of the shell code for what otherwise would be a one-liner inline in other call-sites. This complexity also forces call-sites to process changes to the env var _RET which makes those call-sites busier and harder to read for limited benefit. Is the reason we are doing this to get around test_ds_identify test mocking?

tools/ds-identify

Created a unified load_ubuntu_pro_data to share some of the logic. Also changed functions for getting file names into const variables. _get_data separates some of the logic out of the single try/except to prevent early termination of certain code paths. load_ubuntu_pro_data is robust enough to only fail when it should.

blackboxsw

Thanks @ashuntu I think we are getting closer.
A couple of requests:

better error handling when WSLPATH_CMD doesn't exit to avoid trying detecting the datasource and presenting tracebacks
drop separate wsl_path_2_win function and inline it inside of instance_name
drop separate win_path_2_wsl function and inline it inside find_home
don't perform instance_name in init pre-flight check in get_data for WSLPATH_CMD
Don't log full user-data when merging as it's sensitive information
consolidate WSL_path from two separate path functions in ds-identify

Here's a consolidated diff that takes into account most suggestions for your consideration

diff --git a/cloudinit/sources/DataSourceWSL.py b/cloudinit/sources/DataSourceWSL.py
index f0b9e87ea..815296402 100644
--- a/cloudinit/sources/DataSourceWSL.py
+++ b/cloudinit/sources/DataSourceWSL.py
@@ -25,27 +25,18 @@ LANDSCAPE_DATA_FILE = "%s.user-data"
 AGENT_DATA_FILE = "agent.yaml"
 
 
-def wsl_path_2_win(path: str) -> PurePath:
-    """
-    Translates a path inside the current WSL instance's filesystem to a
-    Windows accessible path.
-
-    Example:
-    # Running under an instance named "CoolInstance"
-    root = wslpath2win("/") # root == "//wsl.localhost/CoolInstance/"
-
-    :param path: string representing a Linux path, whether existing or not.
-    """
-    out, _ = subp.subp([WSLPATH_CMD, "-am", path])
-    return PurePath(out.rstrip())
-
-
 def instance_name() -> str:
     """
     Returns the name of the current WSL instance as seen from outside.
+
+    Obtained by tranlating the root path inside with current WSL instance's
+    filesytsem to a Windows accessible path. and returning the final path
+    element which happens to be the instance name.
+
+    Example: WSLPATH_CMD -qm / == "//wsl.localhost/CoolInstance/"
     """
-    root_net_path = wsl_path_2_win("/")
-    return root_net_path.name
+    out, _ = subp.subp([WSLPATH_CMD, "-am", "/"])
+    return PurePath(out.rstrip()).name
 
 
 def mounted_win_drives() -> List[str]:
@@ -64,26 +55,6 @@ def mounted_win_drives() -> List[str]:
     return mounted
 
 
-def win_path_2_wsl(path: str) -> PurePath:
-    """
-    Returns a translation of a Windows path to a Linux path that can be
-    accessed inside the current instance filesystem.
-
-    It requires the Windows drive mounting feature to be enabled and the
-    disk drive must be muonted for this to succeed.
-
-    Example:
-    # Assuming Windows drives are mounted under /mnt/ and "S:" doesn't exist:
-    p = winpath2wsl("C:\\ProgramData") # p == "/mnt/c/ProgramData/"
-    n = winpath2wsl("S:\\CoolFolder") # Exception! S: is not mounted.
-
-    :param path: string representing a Windows path. The root drive must exist,
-    although the path is not required to.
-    """
-    out, _ = subp.subp([WSLPATH_CMD, "-au", path])
-    return PurePath(out.rstrip())
-
-
 def cmd_executable() -> PurePath:
     """
     Returns the Linux path to the Windows host's cmd.exe.
@@ -111,6 +82,10 @@ def cmd_executable() -> PurePath:
 def find_home() -> PurePath:
     """
     Finds the user's home directory path as a WSL path.
+
+    raises IOError: when windows drives are not mounted or no cmd.exe found
+    raises ProcessExecutionError: when cmd.exe is unable to show profile dir
+        or the profile dir isn't mounted?
     """
     cmd = cmd_executable()
 
@@ -124,7 +99,15 @@ def find_home() -> PurePath:
         raise subp.ProcessExecutionError(
             "No output from cmd.exe to show the user profile dir."
         )
-    return win_path_2_wsl(home)
+
+    # Returns a translation of a Windows path to a Linux path that can be
+    # accessed inside the current instance filesystem.
+    # Example:
+    # Assuming Windows drives are mounted under /mnt/ and "S:" doesn't exist:
+    # WSLPATH_CMD -au "C:\\ProgramData" == "/mnt/c/ProgramData/"
+    # WSLPATH_CMD -au "S:\\Something" raises exception S: is not mounted
+    out, _ = subp.subp([WSLPATH_CMD, "-au", home])
+    return PurePath(out.rstrip())
 
 
 def cloud_init_data_dir(user_home: PurePath) -> Optional[PurePath]:
@@ -235,7 +218,7 @@ class DataSourceWSL(sources.DataSource):
 
     def __init__(self, sys_cfg, distro: Distro, paths: Paths, ud_proc=None):
         super().__init__(sys_cfg, distro, paths, ud_proc)
-        self.instance_name = instance_name()
+        self.instance_name = None
 
     def find_user_data_file(self, seed_dir: PurePath) -> PurePath:
         """
@@ -287,6 +270,12 @@ class DataSourceWSL(sources.DataSource):
             return False
 
     def _get_data(self) -> bool:
+        if not subp.which(WSLPATH_CMD):
+            LOG.debug(
+                "No WSL command %s found. Can not detect WSL datasource",
+                WSLPATH_CMD
+            )
+            return False
         self.vendordata_raw = None
         user_home = find_home()
         seed_dir = cloud_init_data_dir(user_home)
@@ -300,6 +289,7 @@ class DataSourceWSL(sources.DataSource):
             )
         except (ValueError, IOError) as err:
             LOG.error("Unable to load metadata: %s", str(err))
+            return False
 
         # Load Ubuntu Pro configs
         agent_data, user_data = load_ubuntu_pro_data(user_home)
@@ -318,7 +308,6 @@ class DataSourceWSL(sources.DataSource):
 
         # No configs were found
         if not any([user_data, agent_data]):
-            self.userdata_raw = None
             return False
 
         # If we cannot reliably model data files as dicts, then we cannot merge
@@ -336,14 +325,21 @@ class DataSourceWSL(sources.DataSource):
         # provides them instead.
         # That's the reason for not using util.mergemanydict().
         merged = {}
+        agent_overrides = []
         if user_data:
-            for key in user_data:
-                merged[key] = user_data[key]
+            merged = user_data
         if agent_data:
+            if user_data:
+                LOG.debug("Merging both user_data and agent.yaml configs.")
             for key in agent_data:
+                if key in merged:
+                    agent_overrides.append(key)
                 merged[key] = agent_data[key]
-
-        LOG.debug("Merged data: %s", merged)
+            if agent_overrides:
+                LOG.debug(
+                    " agent.yaml overrides config keys:"
+                    f" {', '.join(agent_overrides)}"
+                )
         self.userdata_raw = "#cloud-config\n%s" % yaml.dump(merged)
         return True
 
diff --git a/tests/unittests/test_ds_identify.py b/tests/unittests/test_ds_identify.py
index 0e369af89..c8d41d3ba 100644
--- a/tests/unittests/test_ds_identify.py
+++ b/tests/unittests/test_ds_identify.py
@@ -1242,7 +1242,7 @@ class TestWSL(DsIdentifyBase):
         # Forcing WSL_linux2win_path to return a path we'll fail to parse
         # (missing one / in the begining of the path).
         for i, m in enumerate(data["mocks"]):
-            if m["name"] == "WSL_linux2win_path":
+            if m["name"] == "WSL_path":
                 data["mocks"][i]["RET"] = "/wsl.localhost/cant-findme"
 
         cloudinitdir = os.path.join(userprofile, ".cloud-init")
@@ -2416,7 +2416,7 @@ VALID_CFG = {
             MOCK_VIRT_IS_WSL,
             MOCK_UNAME_IS_WSL,
             {
-                "name": "WSL_linux2win_path",
+                "name": "WSL_path",
                 "ret": 0,
                 "RET": "//wsl.localhost/%s/" % MOCK_WSL_INSTANCE_DATA["name"],
             },
@@ -2439,7 +2439,7 @@ VALID_CFG = {
             MOCK_VIRT_IS_WSL,
             MOCK_UNAME_IS_WSL,
             {
-                "name": "WSL_linux2win_path",
+                "name": "WSL_path",
                 "ret": 0,
                 "RET": "//wsl.localhost/%s/" % MOCK_WSL_INSTANCE_DATA["name"],
             },
diff --git a/tools/ds-identify b/tools/ds-identify
index ea18c1d4b..33f085353 100755
--- a/tools/ds-identify
+++ b/tools/ds-identify
@@ -1594,15 +1594,9 @@ dscheck_VMware() {
     return "${DS_NOT_FOUND}"
 }
 
-WSL_win2linux_path() {
-    local val=""
-    val="$(wslpath -au "$1" )"
-    _RET="$val"
-}
-
-WSL_linux2win_path() {
-    local val=""
-    val="$(wslpath -am "$1" )"
+WSL_path() {
+    local params="$1" path="$2" val=""
+    val="$(wslpath "$params" "$1" )"
     _RET="$val"
 }
 
@@ -1629,7 +1623,7 @@ WSL_profile_dir() {
                 # wslpath is a program supplied by WSL itself that translates Windows and Linux paths,
                 # respecting the mountpoints where the Windows drives are mounted.
                 # (in fact it's a symlink to /init).
-                WSL_win2linux_path "$profiledir"
+                WSL_path "-au" "$profiledir"
                 return $?
             fi
         fi
@@ -1640,7 +1634,7 @@ WSL_profile_dir() {
 
 WSL_instance_name() {
     local val="" instance_name=""
-    WSL_linux2win_path "/"
+    WSL_path "-au" "/"
     instance_name="${_RET}"
     # Extracts "Ubuntu/" from "//wsl.localhost/Ubuntu/"
     val="${instance_name#//*/}"

blackboxsw · 2024-04-19T22:19:22Z

cloudinit/sources/DataSourceWSL.py

    """
-    Returns the Windows user profile directory translated as a Linux path
-    accessible inside the current WSL instance.
+    Finds the user's home directory path as a WSL path.


Can we document the conditions where we think we'll raise ProcessExecutionError from this function?

Suggested change

Finds the user's home directory path as a WSL path.

Finds the user's home directory path as a WSL path.

raises: IOError when no mountpoint with cmd.exe found

ProcessExecutionError when either cmd.exe is unable to show proflie dir or

the profile drive is not mounted?

blackboxsw · 2024-04-19T22:20:15Z

cloudinit/sources/DataSourceWSL.py

@@ -119,11 +124,19 @@ def cloud_init_data_dir() -> PurePath:
        raise subp.ProcessExecutionError(
            "No output from cmd.exe to show the user profile dir."
        )
+    return win_path_2_wsl(home)


Let's collapse this win_path_2_wsl and linline it in this function.

Suggested change

return win_path_2_wsl(home)

# Returns a translation of a Windows path to a Linux path that can be

# accessed inside the current instance filesystem.

# Example:

# Assuming Windows drives are mounted under /mnt/ and "S:" doesn't exist:

# WSLPATH_CMD -au "C:\\ProgramData" == "/mnt/c/ProgramData/"

# WSLPATH_CMD -au "S:\\Something" raises exception S: is not mounted

out, _ = subp.subp([WSLPATH_CMD, "-au", home])

return PurePath(out.rstrip())

blackboxsw · 2024-04-20T01:10:35Z

cloudinit/sources/DataSourceWSL.py

-            )
-            return True
+        except (ValueError, IOError) as err:
+            LOG.error("Unable to load metadata: %s", str(err))


@ashuntu Thanks for representing failure conditions. If we receive invalid metadata structure (a string instead of YAML dict) or if that metadata on disk doesn't contain an instance-id key we would have raised a ValueError from load_instance_metadata. this raised ValueError would leave self.metadata unset. I think in this case we want to LOG.error as you have done and also return False because we can't detect WSL datasource with invalid metadata as that may trigger cloud-init to re-run again across the perceived meta-data "instance-id" change causing cloud-init to regenerate SSH host keys, recreate users, reset password etc due to some invalid meta-data format on disk. I think it's better to be noisy about this and return False to avoid cloud-init trying to run in a misconfigured environment.

Suggested change

LOG.error("Unable to load metadata: %s", str(err))

LOG.error("Unable to load metadata: %s", str(err))

return False

blackboxsw · 2024-04-20T01:20:28Z

cloudinit/sources/DataSourceWSL.py

@@ -238,22 +288,65 @@ def check_instance_id(self, sys_cfg) -> bool:

    def _get_data(self) -> bool:
        self.vendordata_raw = None
-        seed_dir = cloud_init_data_dir()
+        user_home = find_home()


If the environment runs into issues below find_home will raise an IOError right?

the correct mounted disks

or finding a viable cmd.exe in mounted disks

being unable to determine the user profile dir find_home

In any of the above conditions, ds-identify would have rejected this platform as DS_NOT_FOUND, and I think we need to LOG.debug("Unable to detect WSL datasource: %s", str(e)) and return False in this case to align w/ ds-identify discovery as well.

Suggested change

user_home = find_home()

try:

user_home = find_home()

except IOError as e:

LOG.debug("Unable to detect WSL datasource: %s", e)

return False

blackboxsw · 2024-04-20T02:25:44Z

cloudinit/sources/DataSourceWSL.py

+            for key in agent_data:
+                merged[key] = agent_data[key]
+
+        LOG.debug("Merged data: %s", merged)


Please also don't log full user-data to logs as it typically contains sensitive infomraiton like passwords. The less places that is logged the better. Maybe only log that we have "Merged config data from both user_data and agent.yaml" and maybe list the keys overridden by agent.yaml.

blackboxsw · 2024-04-20T03:44:25Z

doc/rtd/reference/datasources/wsl.rst

+   then the first user-provided configuration will be used in its place.
+
+2. ``%USERPROFILE%\.ubuntupro\.cloud-init\agent.yaml`` holds data provided by
+   the Ubuntu Pro for WSL agent. If this file is present, it's modules will be


typo

Suggested change

the Ubuntu Pro for WSL agent. If this file is present, it's modules will be

the Ubuntu Pro for WSL agent. If this file is present, its modules will be

blackboxsw · 2024-04-20T03:53:33Z

cloudinit/sources/DataSourceWSL.py

+
+        # No configs were found
+        if not any([user_data, agent_data]):
+            self.userdata_raw = None


No need to unset userdata_raw, a return False prevents cloud-init from trying to pickle or preseve this class instance and/or attributes.

blackboxsw · 2024-04-20T03:55:36Z

cloudinit/sources/DataSourceWSL.py

+DEFAULT_INSTANCE_ID = "iid-datasource-wsl"
+LANDSCAPE_DATA_FILE = "%s.user-data"
+AGENT_DATA_FILE = "agent.yaml"
+

 def wsl_path_2_win(path: str) -> PurePath:


I think we can drop wsl_path_2_win function and just call the content from within instance_name as that's a tiny function and it only has one call-site.

blackboxsw · 2024-04-20T04:00:08Z

tools/ds-identify

+WSL_win2linux_path() {
+    local val=""
+    val="$(wslpath -au "$1" )"
+    _RET="$val"
+}
+
+WSL_linux2win_path() {
+    local val=""
+    val="$(wslpath -am "$1" )"
+    _RET="$val"
+}


Since these functions are primarily to ease mocking and are near identical, let's refactor them to one function that accepts two positional params

Suggested change

WSL_win2linux_path() {

local val=""

val="$(wslpath -au "$1" )"

_RET="$val"

}

WSL_linux2win_path() {

local val=""

val="$(wslpath -am "$1" )"

_RET="$val"

}

WSL_path() {

local params="$1" path="$2" val=""

val="$(wslpath "$params" "$1" )"

_RET="$val"

}

tools/ds-identify

blackboxsw · 2024-04-25T03:42:18Z

Thanks @ashuntu note that James just put up a WSL tutorial as well :) #5206

blackboxsw

Looks good @ashuntu.

@ashuntu and @CarlosNihelton
The only significant questions I have outstanding now are the following:

Given that .ubuntu-pro/.cloud-init/agent.yaml is not instance name specific, do we expect that this agent.yaml is shared and common amongst all instances on a system?
It looks like we have logic that expects agent.yaml may be bytes instead ofYAML. If non-YAML is expected, maybe this filename should be something else?
.cloud-init/<instance_name>.meta-data contains an instance-id value, that should change if we expect to apply new configuration Given that self.metadata is not being updated due to files in the .ubuntupro subdirectory, is there a mechanism through ubuntu-pro-for-wsl that will update the .cloud-init/<instance_name>.meta-data key instance-id with a new UUID when new config is provided via UP4W? Otherwise, any updates to the .ubuntupro files will be ignored by any WSL instance which has already been launched because cloud-init only re-runs config when instance-id changes.

blackboxsw · 2024-04-25T14:57:43Z

cloudinit/sources/DataSourceWSL.py

+        # If we cannot reliably model data files as dicts, then we cannot merge
+        # ourselves, so we can pass the data in ascending order as a list for
+        # cloud-init to handle internally
+        if isinstance(agent_data, bytes) or isinstance(user_data, bytes):


When would we expect agent-data.yaml file to by bytes and not YAML? If we expect this case, shouldn't this be named just agent-data?

From cloud-init's point of view, is it better to be strict with what we know from the UP4W implementation (which our team controls and do enforce YAML as of now) or let the datasource be more flexible so that changes in UP4W implementation won't affect the datasource as much?

The question sounds silly in principle, but there is a layer of trust here: if we know the agent only writes YAML, ingesting any other format would mean ingesting invalid (thus potentially harmful) config data. I'm in favor of the less future maintenance path, trusting that the user has full control over the directories involved and apply the relevant protection to their systems. I hope I'm not too positive about this 😄

blackboxsw · 2024-04-25T14:58:42Z

doc/rtd/reference/datasources/wsl.rst

+
+1. ``%USERPROFILE%\.ubuntupro\.cloud-init\<InstanceName>.user-data`` holds data
+   provided by Landscape to configure a specific WSL instance. If this file
+   is present, normal user-provided configurations are not looked for. This


Suggested change

is present, normal user-provided configurations are not looked for. This

is present, normal user-provided configurations are ignored. This

blackboxsw · 2024-04-25T15:00:16Z

doc/rtd/reference/datasources/wsl.rst

+   file is merged with (2) on a per-module basis. If this file is not present,
+   then the first user-provided configuration will be used in its place.
+
+2. ``%USERPROFILE%\.ubuntupro\.cloud-init\agent.yaml`` holds data provided by


This agent.yaml file is not <InstanceName> specific, it will therefore be consumed by default and override any conflicting keys in the user-data for all instances, is that intended?

Also, when ubuntu-pro-for-wsl writes this agent.yaml, will it also update any .cloud-init/<InstanceName>.meta-data to ensure that expected instances get this agent.yaml update? Otherwise cloud-init will not re-run on a WSL instance that is formerly launched because it will see an unchanged metadata["instance-id"] and assume it needs to remain inert.

Landscape job is (or will eventually be) to send cloud-init user-data to the UP4W agent as part of an Install command, which implies we won't get updates, but rather uninstall/install requests coming with the relevant config data. The API via which both components talk to each other already supports that use-case, but it lacks implementation on Landscape side. So, no updates so far in the horizon.

blackboxsw · 2024-04-25T15:06:39Z

doc/rtd/reference/datasources/wsl.rst

+First, configurations from Ubuntu Pro/Landscape are checked for in the
+following paths:
+
+1. ``%USERPROFILE%\.ubuntupro\.cloud-init\<InstanceName>.user-data`` holds data


If landscape updates %USERPROFILE%\.ubuntupro\.cloud-init\<InstanceName>.user-data will UP4W update %USERPROFILE%\.cloud-init/<InstanceName>.meta-data with a instance-id: <new_UUID> to trigger cloud-init to re-run on next WSL instance start?

We definitely can, but it's not scoped at this moment. It's as easy for WSL users to just wipe the instance and start from a clean default, that metadata.instance-id might end up never been a feature for this data source. We need real users to probe that assumption.

CarlosNihelton · 2024-04-25T16:54:21Z

Thanks for the reviews so far, @blackboxsw !

Answers below:

Given that .ubuntu-pro/.cloud-init/agent.yaml is not instance name specific, do we expect that this agent.yaml is shared and common amongst all instances on a system?

It's shared and holds information controlled by the Ubuntu Pro for WSL agent, which is globally applied to instances of any Ubuntu release.

It looks like we have logic that expects agent.yaml may be bytes instead of YAML. If non-YAML is expected, maybe this filename should be something else?

We (UP4W) are in control of that file and it's currently YAML.

.cloud-init/<instance_name>.meta-data contains an instance-id value, that should change if we expect to apply new configuration Given that self.metadata is not being updated due to files in the .ubuntupro subdirectory, is there a mechanism through ubuntu-pro-for-wsl that will update the .cloud-init/<instance_name>.meta-data key instance-id with a new UUID when new config is provided via UP4W? Otherwise, any updates to the .ubuntupro files will be ignored by any WSL instance which has already been launched because cloud-init only re-runs config when instance-id changes.

We don't (currently maybe) support that feature at the moment. We want to land something minimally viable so we can collect user feedback to guide the next developments in that area. Limiting scope is a must at this moment.

CarlosNihelton · 2024-04-25T17:10:07Z

cloudinit/sources/DataSourceWSL.py

+        # Load Ubuntu Pro configs
+        agent_data, user_data = load_ubuntu_pro_data(user_home)


I just realized we don't check if the distro is Ubuntu before doing this. Sorry for not raising that before, but we should just skip this step if the distro is not Ubuntu, as it contains config data that's not relevant to other distros, such as an Ubuntu Pro subscription token.

blackboxsw

Code and changes look good. This appears to come down to unittest issue given the new calls.
I've pushed a commit that refactors the unittests to use a pytest fixture for common setup and mocks the new subp.which call as well as setting up the distro as ubuntu passed into each unittest.

If tests succeed here, I'll squash merge your changeset. Thank you!

diff --git a/tests/unittests/sources/test_wsl.py b/tests/unittests/sources/test_wsl.py
index dcd31984b..31c5c897e 100644
--- a/tests/unittests/sources/test_wsl.py
+++ b/tests/unittests/sources/test_wsl.py
@@ -14,6 +14,7 @@ import pytest
 
 from cloudinit import util
 from cloudinit.sources import DataSourceWSL as wsl
+from tests.unittests.distros import _get_distro
 from tests.unittests.helpers import does_not_raise, mock
 
 INSTANCE_NAME = "Noble-MLKit"
@@ -251,34 +252,41 @@ def join_payloads_from_content_type(
 
 
 class TestWSLDataSource:
-    @mock.patch("cloudinit.sources.DataSourceWSL.instance_name")
-    @mock.patch("cloudinit.sources.DataSourceWSL.find_home")
-    def test_metadata_id_default(self, m_home_dir, m_iname, tmpdir, paths):
+    @pytest.fixture(autouse=True)
+    def setup(self, mocker, tmpdir):
+        mocker.patch(
+            "cloudinit.sources.DataSourceWSL.instance_name",
+            return_value=INSTANCE_NAME,
+        )
+        mocker.patch(
+            "cloudinit.sources.DataSourceWSL.find_home",
+            return_value=PurePath(tmpdir),
+        )
+        mocker.patch(
+            "cloudinit.sources.DataSourceWSL.subp.which",
+            return_value="/usr/bin/wslpath",
+        )
+
+    def test_metadata_id_default(self, tmpdir, paths):
         """
         Validates that instance-id is properly set, indepedent of the existence
         of user-data.
         """
-        m_iname.return_value = INSTANCE_NAME
-        m_home_dir.return_value = PurePath(tmpdir)
 
         ds = wsl.DataSourceWSL(
             sys_cfg=SAMPLE_CFG,
-            distro=None,
+            distro=_get_distro("ubuntu"),
             paths=paths,
         )
         ds.get_data()
 
         assert ds.get_instance_id() == wsl.DEFAULT_INSTANCE_ID
 
-    @mock.patch("cloudinit.sources.DataSourceWSL.instance_name")
-    @mock.patch("cloudinit.sources.DataSourceWSL.find_home")
-    def test_metadata_id(self, m_home_dir, m_iname, tmpdir, paths):
+    def test_metadata_id(self, tmpdir, paths):
         """
         Validates that instance-id is properly set, indepedent of the existence
         of user-data.
         """
-        m_iname.return_value = INSTANCE_NAME
-        m_home_dir.return_value = PurePath(tmpdir)
         SAMPLE_ID = "Nice-ID"
         metadata_path = tmpdir.join(
             ".cloud-init", f"{INSTANCE_NAME}.meta-data"
@@ -290,7 +298,7 @@ class TestWSLDataSource:
 
         ds = wsl.DataSourceWSL(
             sys_cfg=SAMPLE_CFG,
-            distro=None,
+            distro=_get_distro("ubuntu"),
             paths=paths,
         )
         ds.get_data()
@@ -298,19 +306,15 @@ class TestWSLDataSource:
         assert ds.get_instance_id() == SAMPLE_ID
 
     @mock.patch("cloudinit.util.lsb_release")
-    @mock.patch("cloudinit.sources.DataSourceWSL.instance_name")
-    @mock.patch("cloudinit.sources.DataSourceWSL.find_home")
-    def test_get_data_cc(self, m_home_dir, m_iname, m_gld, paths, tmpdir):
-        m_gld.return_value = SAMPLE_LINUX_DISTRO
-        m_iname.return_value = INSTANCE_NAME
-        m_home_dir.return_value = PurePath(tmpdir)
+    def test_get_data_cc(self, m_lsb_release, paths, tmpdir):
+        m_lsb_release.return_value = SAMPLE_LINUX_DISTRO
         data_path = tmpdir.join(".cloud-init", f"{INSTANCE_NAME}.user-data")
         data_path.dirpath().mkdir()
         data_path.write("#cloud-config\nwrite_files:\n- path: /etc/wsl.conf")
 
         ds = wsl.DataSourceWSL(
             sys_cfg=SAMPLE_CFG,
-            distro=None,
+            distro=_get_distro("ubuntu"),
             paths=paths,
         )
 
@@ -325,19 +329,15 @@ class TestWSLDataSource:
         assert "wsl.conf" in cast(str, userdata)
 
     @mock.patch("cloudinit.util.lsb_release")
-    @mock.patch("cloudinit.sources.DataSourceWSL.instance_name")
-    @mock.patch("cloudinit.sources.DataSourceWSL.find_home")
-    def test_get_data_sh(self, m_home_dir, m_iname, m_gld, tmpdir, paths):
-        m_gld.return_value = SAMPLE_LINUX_DISTRO
-        m_iname.return_value = INSTANCE_NAME
-        m_home_dir.return_value = PurePath(tmpdir)
+    def test_get_data_sh(self, m_lsb_release, tmpdir, paths):
+        m_lsb_release.return_value = SAMPLE_LINUX_DISTRO
         COMMAND = "echo Hello cloud-init on WSL!"
         data_path = tmpdir.join(".cloud-init", f"{INSTANCE_NAME}.user-data")
         data_path.dirpath().mkdir()
         data_path.write(f"#!/bin/sh\n{COMMAND}\n")
         ds = wsl.DataSourceWSL(
             sys_cfg=SAMPLE_CFG,
-            distro=None,
+            distro=_get_distro("ubuntu"),
             paths=paths,
         )
 
@@ -354,14 +354,8 @@ class TestWSLDataSource:
         assert COMMAND in userdata
 
     @mock.patch("cloudinit.util.get_linux_distro")
-    @mock.patch("cloudinit.sources.DataSourceWSL.instance_name")
-    @mock.patch("cloudinit.sources.DataSourceWSL.find_home")
-    def test_data_precedence(
-        self, m_home_dir, m_instance, m_distro, tmpdir, paths
-    ):
-        m_distro.return_value = SAMPLE_LINUX_DISTRO
-        m_instance.return_value = INSTANCE_NAME
-        m_home_dir.return_value = PurePath(tmpdir)
+    def test_data_precedence(self, m_get_linux_dist, tmpdir, paths):
+        m_get_linux_dist.return_value = SAMPLE_LINUX_DISTRO
 
         # Set up basic user data:
 
@@ -380,7 +374,7 @@ class TestWSLDataSource:
         # Run the datasource
         ds = wsl.DataSourceWSL(
             sys_cfg=SAMPLE_CFG,
-            distro=None,
+            distro=_get_distro("ubuntu"),
             paths=paths,
         )
 
@@ -425,7 +419,7 @@ ubuntu_advantage:
         # Run the datasource
         ds = wsl.DataSourceWSL(
             sys_cfg=SAMPLE_CFG,
-            distro=None,
+            distro=_get_distro("ubuntu"),
             paths=paths,
         )
 
@@ -459,7 +453,7 @@ package_update: true"""
         # Run the datasource
         ds = wsl.DataSourceWSL(
             sys_cfg=SAMPLE_CFG,
-            distro=None,
+            distro=_get_distro("ubuntu"),
             paths=paths,
         )

blackboxsw · 2024-04-29T20:57:35Z

cloudinit/sources/DataSourceWSL.py

@@ -296,7 +296,7 @@ def _get_data(self) -> bool:
            return False

        # # Load Ubuntu Pro configs only on Ubuntu distros
-        if platform.freedesktop_os_release().get("NAME") == "Ubuntu":


This is good given that freedesktop_os_release isn't avail until python 3.10 and cloud-init currently supports 3.7 environments still for a bit.

- add missing subp.which mock of WSLPATH_CMD - provide non-empty distro to all DataSourceWSL init - consolidate common mocks in autouse pytest fixture

blackboxsw · 2024-04-29T21:53:41Z

Validated on Ubuntu-Preview clean boot with updated ds-identify and DataSourceWSL with and without .ubuntupro config dir.

blackboxsw

LGTM. thanks for limiting behavior of .ubuntupro subdir to ubuntu only. Changest looks good and behavior well on live WSL instances.

ashuntu and others added 17 commits April 1, 2024 17:34

Fix dscheck for WSL

21de562

Previously dscheck would fail due to the for loop not properly expanding. This change removes the quotes around $@ and corrects some spelling mistakes to fix that.

feat(WSL): Add support for Landscape and agent configs

a241e37

Simplify profiledir variable

8b1ed65

Reduce amount of cmd.exe calls

565f09d

Fix comments and formatting

3b6c010

Add myself to cla-signers

40c7d55

Add file loading utility function

281a0ed

Breaks up yaml/binary loading into a function for better reusability. Also adds more context to the usage of a list instead of manually merging dicts from YAML.

Add WSL datasource unit test

0c2489c

Fix lint problems and Ruff formatting

7f1f122

Reorder test imports Fix variable types and formatting

Fix pecedence test and dir mocks

71b3315

The main precedence test and others unittests were broken due to previous changes. This change fixes those tests somewhat, but a commit after this will permanently fix the problems.

Disable shellcheck for profile directory expansion

2d38ebf

Since we do actually need $@ to expand to properly handle the directories passed in by the caller, we can safetly disable shellcheck here. Co-authored-by: Carlos Nihelton <carlosnsoliveira@gmail.com>

Increment precedence testing

b6722d0

Adds more incremental tests to userdata loading to account for different machine configurations, including the new cloud-init configs provided by Ubuntu Pro.

Satisfy mypy and black

352bad9

Minor refactor and renaming

Update WSL datasource documentation

bf210e5

ashuntu force-pushed the ubuntu-pro branch from 5c4b586 to bf210e5 Compare April 1, 2024 22:45

blackboxsw self-assigned this Apr 4, 2024

blackboxsw reviewed Apr 12, 2024

View reviewed changes

ashuntu added 4 commits April 16, 2024 09:40

Fix WSL_run_cmd relying on external variables

92c3c5f

Add clarifying comment regarding variable expansion

ed33dbd

Remove unnecessary wrapper for Ubuntu Pro dir

d7e666a

blackboxsw requested changes Apr 20, 2024

View reviewed changes

ashuntu added 5 commits April 24, 2024 17:15

Merge wslpath functions into a single function

da6b5c4

Correct grammar in WSL docs

bb2e093

Inline wsl_path_2_win

06b22fd

Inline win_path_2_wsl and add exceptions to doc comment

71de3f7

Instantiate instance name in _get_data

44d9e2b

ashuntu added 3 commits April 24, 2024 17:35

Add additional environment checks before attempting to get data

8e03e7f

Streamline raw user_data cases

b57387e

Log merged keys, instead of full merged data

eef9704

blackboxsw reviewed Apr 25, 2024

View reviewed changes

CarlosNihelton reviewed Apr 25, 2024

View reviewed changes

ashuntu added 2 commits April 25, 2024 16:22

Check for Ubuntu distro before looking for Pro data

9ade636

Correct linting and utilize datasource distro

1414b58

blackboxsw reviewed Apr 29, 2024

View reviewed changes

tests: adding missed mocks subp.which, refactor using common fixture

a174264

- add missing subp.which mock of WSLPATH_CMD - provide non-empty distro to all DataSourceWSL init - consolidate common mocks in autouse pytest fixture

blackboxsw approved these changes Apr 29, 2024

View reviewed changes

blackboxsw merged commit 70e87f7 into canonical:main Apr 29, 2024
29 checks passed

	LOG.error("Unable to load metadata: %s", str(err))
	LOG.error("Unable to load metadata: %s", str(err))
	return False

		agent_data = load_agent_data(user_home)
		user_data = load_landscape_data(self.instance_name, user_home)

-    Finds the user's home directory path as a WSL path.
+    Finds the user's home directory path as a WSL path.
+    raises: IOError when no mountpoint with cmd.exe found
+                ProcessExecutionError when either cmd.exe is unable to show proflie dir or
+                the profile drive is not mounted?

	the Ubuntu Pro for WSL agent. If this file is present, it's modules will be
	the Ubuntu Pro for WSL agent. If this file is present, its modules will be

	is present, normal user-provided configurations are not looked for. This
	is present, normal user-provided configurations are ignored. This

		# Load Ubuntu Pro configs
		agent_data, user_data = load_ubuntu_pro_data(user_home)

feat(WSL): Add support for Ubuntu Pro configs #5116

feat(WSL): Add support for Ubuntu Pro configs #5116

Conversation

ashuntu commented Mar 28, 2024 • edited Loading

Proposed Commit Message

Additional Context

Test Steps

Checklist

Merge type

blackboxsw left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

blackboxsw Apr 12, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

blackboxsw Apr 12, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

blackboxsw left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

blackboxsw commented Apr 25, 2024

blackboxsw left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

CarlosNihelton commented Apr 25, 2024

Choose a reason for hiding this comment

blackboxsw left a comment • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

blackboxsw commented Apr 29, 2024

blackboxsw left a comment

Choose a reason for hiding this comment

ashuntu commented Mar 28, 2024 •

edited

Loading

blackboxsw Apr 12, 2024 •

edited

Loading

blackboxsw Apr 12, 2024 •

edited

Loading

blackboxsw left a comment •

edited

Loading