Report failures to Azure via Wireserver with KVP Fallback #170

Draft
peytonr18 wants to merge 4 commits into main from probertson-report-failure

Conversation

peytonr18
Contributor

@peytonr18 peytonr18 commented Mar 5, 2025

This PR introduces failure reporting for provisioning. If provisioning fails, the error is reported to Azure via wireserver or, as a fallback, via KVP.

Key Changes

  • XML Payload for Failure Reporting:

    • Implemented build_report_failure_file and report_failure() functions.
    • These functions construct an XML payload similar to cloud‑init, where:
      • <State> is set to "NotReady",
      • <Substatus> is set to "ProvisioningFailed", and
      • <Description> contains a user-visible error message.
  • Integration in the Provisioning Flow:

    • In the provision() function, when a provisioning error occurs:
      • The current goalstate is fetched from wireserver.
      • report_failure() is called with the appropriate error description.
      • If the wireserver connection fails and the error can't be posted, the error is logged via tracing::error! as a fallback.
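The payload and fallback behavior listed above can be sketched roughly as follows. This is a minimal, std-only illustration, not the code in this PR: `xml_escape`, `build_failure_fragment`, `report_or_log`, and `ReportOutcome` are hypothetical names, the closure stands in for the HTTP POST to wireserver, `eprintln!` stands in for `tracing::error!`, and the enclosing health document (everything outside the three elements named above) is omitted.

```rust
/// Escape XML special characters so an arbitrary error message can be
/// embedded safely as element text.
fn xml_escape(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&apos;"),
            _ => out.push(c),
        }
    }
    out
}

/// Hypothetical helper: builds only the three elements described in the PR
/// text. The surrounding document structure is an open detail and is omitted.
fn build_failure_fragment(description: &str) -> String {
    format!(
        "<State>NotReady</State>\n<Substatus>ProvisioningFailed</Substatus>\n<Description>{}</Description>",
        xml_escape(description)
    )
}

/// What happened to the failure report.
#[derive(Debug, PartialEq)]
enum ReportOutcome {
    /// The payload was accepted by the (stand-in) wireserver call.
    Posted,
    /// Posting failed, so the error was only logged locally.
    LoggedLocally(String),
}

/// Try to post the failure; if posting fails, log the error instead.
/// `post` is an assumption standing in for the real wireserver request.
fn report_or_log<F>(description: &str, post: F) -> ReportOutcome
where
    F: FnOnce(&str) -> Result<(), String>,
{
    let payload = build_failure_fragment(description);
    match post(&payload) {
        Ok(()) => ReportOutcome::Posted,
        Err(e) => {
            // The PR describes tracing::error! here; eprintln! keeps this std-only.
            eprintln!("failed to report provisioning failure: {e}");
            ReportOutcome::LoggedLocally(e)
        }
    }
}

fn main() {
    // Simulate an unreachable wireserver so the fallback path runs.
    let outcome = report_or_log("Provisioning failed: <no user>", |_payload| {
        Err("connection refused".to_string())
    });
    println!("{outcome:?}");
}
```

Passing the transport as a closure is one possible way to approach the testing question raised under Next Steps: a unit test can supply a closure that returns `Ok` or `Err` without standing up a server.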

Next Steps

  • Open this as a draft PR to gather feedback.
  • Add more unit tests for report_failure() (I could use some insight into how best to mimic the server/request dynamic) and refine the implementation further if needed.

Resolves #58.

@peytonr18 peytonr18 force-pushed the probertson-report-failure branch from a7f05c0 to f8be2fe Compare March 5, 2025 02:35
@peytonr18 peytonr18 changed the title Report failures to Azure via wireserver with KVP Fallback Report failures to Azure via Wireserver with KVP Fallback Mar 5, 2025
…lure events; adding unit tests to check formatting according to cloud-init documentation; adding a KVP entry for provisioning success
@peytonr18 peytonr18 force-pushed the probertson-report-failure branch from b9d440e to c120b01 Compare March 6, 2025 03:50
@peytonr18
Contributor Author

peytonr18 commented Mar 6, 2025

To Do items here:

  • Consider pulling report_health-specific functionality into helper functions to declutter on_event() -- DONE, refactored into handle_health_event().
  • Investigate how to make the "extra" dynamic based on the emitted trace (or, consider if we even need that).
  • Once PR #164 (Adding Cached VM ID File for Provisioning Lifecycle Management) is merged, acquire vm_id using the function defined in status.rs.

@peytonr18 peytonr18 force-pushed the probertson-report-failure branch from 09e5f87 to 8ce2f99 Compare March 6, 2025 20:58
@cjp256
Contributor

cjp256 commented Mar 7, 2025

There is a new endpoint that does not require us to fetch goalstate and uses JSON instead of XML.

sudo curl -X POST -H 'x-ms-guest-agent-name: azure-init/some-version' \
--header 'Content-Type: application/json' \
--data '{
    "state": "NotReady",
    "details": {
        "subStatus":  "ProvisioningFailed",
        "description": "The VM encountered an error during deployment."
    }
}' \
http://168.63.129.16/provisioning/health 

Example provisioning complete report without description
{
    "state": "Ready"
}

Example provisioning failure report
{
    "state": "NotReady",
    "details": {
        "subStatus":  "ProvisioningFailed",
        "description": "The provisioning client has something to report."
    }
}

Example provisioning in-progress report
{
    "state": "NotReady",
    "details": {
        "subStatus":  "Provisioning",
        "description": "The provisioning client has something to report."
    }
}
Response code   Meaning
201             Success
429             Too many requests (rate limited)
503             GoalState is not ready yet; try again
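As a rough illustration of how a client might build these JSON bodies and react to the response codes listed above, here is a std-only sketch. The names `json_escape`, `health_body`, `NextStep`, and `classify` are hypothetical, the retry decisions are one possible reading of the table rather than documented behavior, and the HTTP request itself is left out.

```rust
/// Escape a string for use inside a JSON string literal.
fn json_escape(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Hypothetical helper: builds a body shaped like the examples above.
/// `details` is an optional (subStatus, description) pair.
fn health_body(state: &str, details: Option<(&str, &str)>) -> String {
    match details {
        Some((sub_status, description)) => format!(
            r#"{{"state":"{}","details":{{"subStatus":"{}","description":"{}"}}}}"#,
            json_escape(state),
            json_escape(sub_status),
            json_escape(description)
        ),
        None => format!(r#"{{"state":"{}"}}"#, json_escape(state)),
    }
}

/// What the caller might do next, based on the response code.
#[derive(Debug, PartialEq)]
enum NextStep {
    /// 201: the report was accepted.
    Done,
    /// 429: rate limited, so wait before retrying (assumed policy).
    RetryAfterBackoff,
    /// 503: goal state is not ready yet, so try again.
    RetryGoalStateNotReady,
    /// Any code not listed in the table above.
    Unexpected(u16),
}

/// Map an HTTP status code to a next step.
fn classify(status: u16) -> NextStep {
    match status {
        201 => NextStep::Done,
        429 => NextStep::RetryAfterBackoff,
        503 => NextStep::RetryGoalStateNotReady,
        other => NextStep::Unexpected(other),
    }
}

fn main() {
    let body = health_body(
        "NotReady",
        Some(("ProvisioningFailed", "The VM encountered an error during deployment.")),
    );
    println!("{body}");
    for status in [201u16, 429, 503, 500] {
        println!("{status} -> {:?}", classify(status));
    }
}
```

In a real client a JSON library would likely replace the hand-written escaping; it is written out here only to keep the sketch dependency-free.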

We don't have to do it in this PR, but I would like to switch over to the new endpoint to avoid dealing with goalstate.

The one upside to fetching goalstate is getting the container id.

@peytonr18 peytonr18 force-pushed the probertson-report-failure branch 3 times, most recently from 37e441e to 24c4c29 Compare March 24, 2025 22:47
…() so we can't emit traces there. Instead, move report_success to the provisioing_result block and check for success there or go to failure logic.
@peytonr18 peytonr18 force-pushed the probertson-report-failure branch from 24c4c29 to 33297d7 Compare March 24, 2025 22:48
Development

Successfully merging this pull request may close these issues.

[RFE] Report failures to Azure when there's an unrecoverable error
2 participants