-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding Cached VM ID File for Provisioning Lifecycle Management #164
Adding Cached VM ID File for Provisioning Lifecycle Management #164
Conversation
Since dmidecode is not used anymore, please fix the description to remove reference to dmidecode |
I would recommend an additional cli feature to clear the provisioning directory. This might come in handy if we need to force azure-init to provision again. For any software that has a "cache", it should offer some feature to clear the cache |
libazureinit/src/status.rs
Outdated
/// - `None` if something fails or the output is empty. | ||
pub fn get_vm_id() -> Option<String> { | ||
// Test override check | ||
if let Ok(mock_id) = std::env::var("MOCK_VM_ID") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TEST_PROVISION_DIR vs MOCK_VM_ID
if there are going to be env vars to support testing, maybe we need a scheme that's consistent.
e.g. AZURE_INIT_TEST_PROVISION_DIR
AZURE_INIT_TEST_VM_ID
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's possible to avoid this mock altogether by allowing an optional VM id param to
- is_provisioning_complete()
- mark_provisioning_complete()
but to get the best testing, it may be best to allow overriding "/sys/class/dmi/id/product_uuid" so we can put whatever id in there we want and test the scenarios.
is_vm_gen1() would some special care to mock or parameterized option for get_vm_id()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it might be practical for testing if all paths derived from some configurable root path and any file structure is easily faked, but certainly beyond the scope of this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can easily make an issue for the last idea and circle back to it!
For now, I updated is_provisioning_complete()
and mark_provisioning_complete()
to accept an optional vm_id, so we can pass in a test VM ID directly rather than relying on get_vm_id()
or MOCK_VM_ID
. In addition, I removed MOCK_VM_ID
from get_vm_id()
itself. Now it takes an optional path argument to override /sys/class/dmi/id/product_uuid
, which avoids environment variables for testing.
I also updated is_vm_gen1()
to accept an optional mock_efi_path
, letting us simulate Gen1 or Gen2 in tests without touching real system files.
That was definitely a little more complicated than I anticipated, so please take a look and let me know what you think!
I amended the logic in const DEFAULT_PROVISION_DIR: &str = "/var/lib/azure-init/"; With this change though, we have to pass the config into To me, it seemed better to do it as I show below because I anticipate needing a cloned version of config when I complete some of the issues related to toggling kvp telemetry on and off: let clone_config: Config = config.clone();
match provision(config, opts).await {
mark_provisioning_complete(Some(&clone_config), None) instead of doing this and cloning the config inline. match provision(config.clone(), opts).await
mark_provisioning_complete(Some(&config), None) This is one part of the PR that I would love feedback on if anyone else has a thought or a Rust standard we need to conform to! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, I think you're going to need to adjust the tests a bit to avoid writing to the environment 😿
libazureinit/src/status.rs
Outdated
fn swap_uuid_to_little_endian(bytes: [u8; 16]) -> [u8; 16] { | ||
let d1 = BigEndian::read_u32(&bytes[0..4]); | ||
let d2 = BigEndian::read_u16(&bytes[4..6]); | ||
let d3 = BigEndian::read_u16(&bytes[6..8]); | ||
|
||
let d1_le = d1.swap_bytes(); | ||
let d2_le = d2.swap_bytes(); | ||
let d3_le = d3.swap_bytes(); | ||
|
||
let mut swapped = [0u8; 16]; | ||
|
||
swapped[0] = (d1_le >> 24) as u8; | ||
swapped[1] = (d1_le >> 16) as u8; | ||
swapped[2] = (d1_le >> 8) as u8; | ||
swapped[3] = (d1_le) as u8; | ||
|
||
swapped[4] = (d2_le >> 8) as u8; | ||
swapped[5] = (d2_le) as u8; | ||
|
||
swapped[6] = (d3_le >> 8) as u8; | ||
swapped[7] = (d3_le) as u8; | ||
|
||
swapped[8..16].copy_from_slice(&bytes[8..16]); | ||
|
||
swapped | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should be able to drop the byteorder dependency and avoid some bit shifting with something like the following:
fn swap_uuid_to_little_endian(mut bytes: [u8; 16]) -> Uuid {
let (d1, remainder) = bytes.split_at(std::mem::size_of::<u32>());
let d1 = d1.try_into().map(u32::from_be_bytes).unwrap_or(0).to_ne_bytes();
let (d2, remainder) = remainder.split_at(std::mem::size_of::<u16>());
let d2 = d2.try_into().map(u16::from_be_bytes).unwrap_or(0).to_ne_bytes();
let (d3, _) = remainder.split_at(std::mem::size_of::<u16>());
let d3 = d3.try_into().map(u16::from_be_bytes).unwrap_or(0).to_ne_bytes();
let native_endian= d1
.into_iter()
.chain(d2.into_iter())
.chain(d3.into_iter())
.collect::<Vec<_>>();
debug_assert_eq!(native_endian.len(), 8);
bytes[..native_endian.len()].copy_from_slice(&native_endian);
uuid::Uuid::from_bytes(bytes)
}
Frustratingly, I can't find a great way to avoid the fallible calls; there's not a stable API to split an array into a known length. split_at
gives us a slice, which even though we know is the size of the u32, we have to use try_into()
to attempt to convert it to [u8; 4]
. I settled for using unwrap_or(0)
although we should never hit the or case.
src/main.rs
Outdated
Ok(_) => ExitCode::SUCCESS, | ||
Ok(_) => { | ||
if let Err(e) = | ||
mark_provisioning_complete(Some(&clone_config), None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally, I think, this function would be internal to the library and something it calls at the end of a successful provision. However, since the current Provision
API has the user manually calling goalstate::report_health
after it completes, perhaps we don't want this call to be done until after that?
We should consider moving the health reporting to be internal to the library as well and done for the user without additional calls, but if that's what we want to do, we can obviously do that separately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that having the library automatically call mark_provisioning_complete() once provisioning and health reporting are done would make the most sense! It'll probably be more intuitive to follow that logic too. I can file an issue so we can discuss there and decide if we want to wrap the entire flow (including report_health) inside the library?
In the meantime, if it's good with you, I'll just move the mark_provisioning_complete() call to occur after report_health() so we’re at least marking provisioning complete at the very end of the process (which I agree makes more sense).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Works for me 👍🏼
src/main.rs
Outdated
return ExitCode::SUCCESS; | ||
} | ||
|
||
let clone_config = config.clone(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW I don't think this is "odd" or breaking any Rust conventions I'm aware of so doing this or doing it inline are both fine
- Creates a .provisioned file named after the VM ID to indicate provisioning completion - Provides functions to check if provisioning is complete and to mark it complete - Skips re-provisioning if the .provisioned file already exists (ie, when a VM has been rebooted)
- Replaced dmidecode system call with direct read from /sys/class/dmi/id/product_uuid. - Implemented Gen1/Gen2 detection using /sys/firmware/efi presence. - Applied byte-swap to the first three UUID fields for Gen1 VMs using byteorder.
… VM ID to KVP: - Add KVP tracing calls to log the final resolved VM ID in for visibility. - Fetch VM ID once in and store it as a struct field. - Modify to include the VM ID in every event key.
…rectory, remove MOCK_VM_ID, and address othe minor PR comments. - Moved the provisioning directory setting from a hardcoded constant into the Config struct via a new ProvisioningDir field. - Updated get_provisioning_dir(), check_provision_dir(), is_provisioning_complete(), and mark_provisioning_complete() to accept an Option<&Config> so they use config.provisioning_dir.path if provided, falling back to /var/lib/ azure-init/ otherwise. - Removed MOCK_VM_ID from get_vm_id(); it now takes an optional path argument to override /sys/class/dmi/id/product_uuid, avoiding environment variables for testing. - Updated is_vm_gen1() to accept an optional mock_efi_path, allowing simulation of Gen1 versus Gen2 behavior without relying on real system files. - Adjusted tests accordingly to support the new signatures and behavior. - Open to feedback on whether we should clone the config before passing it to provision() or pass it inline.
…g_complete fatal - Removed environment variable overrides in tests; replaced with a test-local Config pointing to a TempDir. - Moved mark_provisioning_complete into provision() after report_health, ensuring provisioning is only considered complete if marking succeeds. - Made mark_provisioning_complete errors fatal by returning an error up the chain. - Cleaned up clippy warnings by removing explicit .into_iter() calls where arrays already implement IntoIterator and updated swap_uuid_to_little_endian() according to the PR suggestions, dropping the byteswap dependency.
ac67757
to
0d03035
Compare
libazureinit/src/status.rs
Outdated
fn get_provisioning_dir(config: Option<&Config>) -> PathBuf { | ||
config | ||
.map(|cfg| cfg.provisioning_dir.path.clone()) | ||
.unwrap_or_else(|| PathBuf::from("/var/lib/azure-init/")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Config already has a default. If there's a possibility that the config module will fail to provide a default, this value (/var/lib/azure-init
) should be declared a const that both modules can access. This way there's only one possible default
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should've caught that in my look-over -- will update, thank you!
…FAULT_PROVISIONING_DIR as a public constant next to ProvisioningDir.
libazureinit/src/config.rs
Outdated
/// | ||
/// This constant is declared outside its related struct so that both the `ProvisioningDir` struct | ||
/// and other modules (like `status.rs`) can reference the same path without risking any mismatch. | ||
pub const DEFAULT_PROVISIONING_DIR: &str = "/var/lib/azure-init/"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consider calling it something like [PERSISTENT_]STATE_DIR or simar. this path's scope is potentially wider than "provisioning" if azure-init wants non-provisioning state for some reason
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we change the config struct too then to something similar?
pub const DEFAULT_PERSISTENT_STATE_DIR: &str = "/var/lib/azure-init/";
#[derive(Serialize, Deserialize, Debug, Clone)]
#[serde(default)]
pub struct PersistentStateDir{
/// Specifies the path used for storing persistent status files.
/// Defaults to `/var/lib/azure-init/`.
pub path: PathBuf,
}
impl Default for PersistentStateDir {
fn default() -> Self {
Self {
path: PathBuf::from(DEFAULT_PERSISTENT_STATE_DIR),
}
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And, looking at the rest of the code too, are you okay with keeping the functions in status.rs get_provisioning_dir
and check_provisioning_dir
or does it make more sense to change them too?
TLDR --> get_persistent_state_dir
and check_persistent_state_dir
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's looking pretty solid, and my notes are less relevant if the provisioning marking/checking APIs become private and part of the provision API down the road.
libazureinit/src/status.rs
Outdated
/// Retrieves the VM ID by reading `/sys/class/dmi/id/product_uuid` and byte-swapping if Gen1. | ||
/// | ||
/// The VM ID is a unique system identifier that persists across reboots but changes | ||
/// when a VM is cloned or redeployed. | ||
/// | ||
/// # Returns | ||
/// - `Some(String)` containing the VM ID if retrieval is successful. | ||
/// - `None` if something fails or the output is empty. | ||
pub fn get_vm_id( | ||
custom_path: Option<&str>, | ||
mock_efi_path: Option<&str>, | ||
) -> Option<String> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you're exposing this as a public API, I would recommend keeping the test parameters private. The easiest way would be to do something like this:
/// Retrieves the VM ID by reading `/sys/class/dmi/id/product_uuid` and byte-swapping if Gen1. | |
/// | |
/// The VM ID is a unique system identifier that persists across reboots but changes | |
/// when a VM is cloned or redeployed. | |
/// | |
/// # Returns | |
/// - `Some(String)` containing the VM ID if retrieval is successful. | |
/// - `None` if something fails or the output is empty. | |
pub fn get_vm_id( | |
custom_path: Option<&str>, | |
mock_efi_path: Option<&str>, | |
) -> Option<String> { | |
/// Retrieves the VM ID by reading `/sys/class/dmi/id/product_uuid` and byte-swapping if Gen1. | |
/// | |
/// The VM ID is a unique system identifier that persists across reboots but changes | |
/// when a VM is cloned or redeployed. | |
/// | |
/// # Returns | |
/// - `Some(String)` containing the VM ID if retrieval is successful. | |
/// - `None` if something fails or the output is empty. | |
pub fn get_vm_id() -> Option<String> { | |
private_get_vm_id(None, None) | |
} | |
fn private_get_vm_id( | |
custom_path: Option<&str>, | |
mock_efi_path: Option<&str>, | |
) -> Option<String> { |
/// This function checks whether a provisioning status file exists for the current VM ID. | ||
/// If the file exists, provisioning has already been completed and should be skipped. | ||
/// If the file does not exist or the VM ID has changed, provisioning should proceed. | ||
/// |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is a public function, it'd be good to expand the documentation slightly to describe what happens if either of the optional parameters isn't provided; if config
is None, the default provisioning directory is used to check if things are complete, and if no VM id is provided, it'll be read.
If the parameters exist to make testing easier, you can use the same trick at above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this one, one of the parameters does actually matter beyond just testing --
config
is how we determine the provisioning_dir
(ie: where we put the vm_id.provisioned file).
main.rs
sends the config when is_provisioning_competed() is called because it needs to know where to check for the file.
But, vm_id
is just for testing - so I can make a private / public version here too. I'll also update the documentation to make everything clearer!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the spirit of keeping this all consistent, I did the same for mark_provisioning_complete()
and split it into a private and public function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good example of where the optional parameter would be useful, desirable even. If the VM ID is known earlier it can be reused with this parameter without having to query it again.
The main entry point should query for VM ID and pass it to logging -> KVP.
And then it can re-use when querying for provisioning status.
This would avoid multiple calls to get_vm_id().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After some discussion offline, I've pushed a refactor to streamline how we handle vm_id
. Instead of calling get_vm_id()
in multiple places, we now retrieve it once in main.rs
and pass it explicitly where needed (logging, provisioning, status etc.).
vm_id
is now passed frommain.rs
→setup_layers()
→EmitKVPLayer
, soEmitKVPLayer
no longer fetches it itself.is_provisioning_complete()
andmark_provisioning_complete()
now takevm_id
as a parameter frommain.rs
instead of each callingget_vm_id()
internally.- Since
vm_id
is always passed explicitly, we no longer need the separate private/public function split foris_provisioning_complete()
andmark_provisioning_complete()
.
This removes redundant lookups with get_vm_id()
. Let me know if anything looks off with that or anyone has any questions!
…DEFAULT_PROVISIONING_DIR to DEFAULT_AZURE_INIT_DATA_DIR and ProvisionDir to AzureInitDataDir, propagating the nature of that change throughout the files.
libazureinit/src/status.rs
Outdated
fn is_vm_gen1(mock_efi_path: Option<&str>) -> bool { | ||
if let Some(path) = mock_efi_path { | ||
!Path::new(path).exists() | ||
} else { | ||
if Path::new("/sys/firmware/efi").exists() | ||
|| Path::new("/dev/efi").exists() | ||
{ | ||
return false; | ||
} | ||
true | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While this works, I would suggest a mocking strategy that avoids new code paths for mocks. This inherently guarantees we can't get coverage over the actual potential code paths. While this function is relatively simple, it can be problematic for more complex cases.
Consider something like:
fn is_vm_gen1(sysfs_efi_path: Option<&str>, dev_efi_path: Option<&str>) -> bool {
let sysfs_efi_path = sysfs_efi_path.unwrap_or("/sys/firmware/efi");
let dev_efi_path = dev_efi_path.unwrap_or("/dev/efi");
!Path::new(sysfs_efi_path).exists() && !Path::new(dev_efi_path).exists()
}
#[cfg(test)]
mod tests {
use super::*;
use std::fs::{create_dir, remove_dir};
#[test]
fn test_gen1_vm() {
// Mock paths that do not exist
assert!(is_vm_gen1(Some("/nonexistent/efi"), Some("/nonexistent/dev_efi")));
}
#[test]
fn test_gen2_vm_with_sysfs_efi() {
let mock_path = "/tmp/mock_efi";
create_dir(mock_path).ok(); // Create a mock directory
assert!(!is_vm_gen1(Some(mock_path), Some("/nonexistent/dev_efi")));
remove_dir(mock_path).ok(); // Clean up
}
#[test]
fn test_gen2_vm_with_dev_efi() {
let mock_path = "/tmp/mock_dev_efi";
create_dir(mock_path).ok(); // Create a mock directory
assert!(!is_vm_gen1(Some("/nonexistent/efi"), Some(mock_path)));
remove_dir(mock_path).ok(); // Clean up
}
}
This approach allows us to pretty much get 100% coverage.
libazureinit/src/status.rs
Outdated
} | ||
|
||
fn private_get_vm_id( | ||
custom_path: Option<&str>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah this is better. just a nit on name:
custom_path: Option<&str>, | |
product_uuid_path: Option<&str>, | |
dev_efi_path: Option<&str>, |
Opinionated, but instead of "mock_" just write a descriptive name which allow for overrides (either for tests or in some cases, callers that just want to specify some, usually default, parameter).
…ID handling - Removed private/public split in is_provisioning_complete and mark_provisioning_complete - Functions now require vm_id to be passed explicitly instead of calling get_vm_id() - Updated main.rs to retrieve vm_id once and pass it through all necessary functions, including provision() and setup_layers() - Updated EmitKVPLayer to accept vm_id at initialization, eliminating redundant lookups to append VM ID to each event key
73d8efa
to
b60671d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All my concerns have been addressed, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks for the work and all the testing for this PR
This PR adds a new
status.rs
module that helps manage the provisioning lifecycle for Azure Linux VMs.The core idea is to create and check a
.provisioned
file named after the VM ID when provisioning is completed, ensuring we skip provisioning on reboot if the file already exists. Additionally, this file can serve as an indicator that provisioning has been completed.Key Points
Provisioning File Logic
\{provision_dir}/{vm_id}.provisioned
.Config.provisioning_dir
and defaults to/var/lib/azure-init/
.VM ID Retrieval
/sys/class/dmi/id/product_uuid
and byte-swapping if Gen1.is_vm_gen1()
accepts a mock EFI path, allowing test scenarios to simulate different VM generations without relying on actual system files.Integration with
main.rs
main.rs
checksis_provisioning_complete()
to see if it needs to run provisioning.mark_provisioning_complete()
creates the.provisioned
file.vm_id
once and passes it through all necessary functions.Unit Test Setup
Environment variables drive our partial mocking in tests:
TEST_PROVISION_DIR
→ points to aTempDir
instead of/var/lib/azure-init/
.Tests verify that:
mark_provisioning_complete()
actually writes the file..provisioned
file is detected, so provisioning is skipped (simulated reboot test).Linked Issues:
azure-init
with WALinuxAgent