doc: Overhaul debugging documentation (#4578)
- New page and content describing debugging for users
- New page and content documenting cloud-init's status
- New page and content documenting cloud-init's exported errors
- New page and content documenting cloud-init's failure states
- New page and content documenting how to re-run cloud-init
- New content documenting how to validate user-data
- New content documenting how to use cloud-init with libvirt

Documents GH-4500
Fixes GH-4608
holmanb committed Dec 6, 2023
1 parent 3393d82 commit f356f97
Showing 12 changed files with 806 additions and 211 deletions.
1 change: 0 additions & 1 deletion doc/rtd/development/index.rst
@@ -73,7 +73,6 @@ Debugging and reporting

../howto/bugs.rst
logging.rst
debugging.rst
internal_files.rst
../howto/debugging.rst

131 changes: 131 additions & 0 deletions doc/rtd/explanation/exported_errors.rst
@@ -0,0 +1,131 @@
.. _exported_errors:

Exported errors
===============

Cloud-init makes internal errors available to users for debugging. These
errors map to logged errors and may be useful for understanding what
happens when cloud-init doesn't do what you expect.

Aggregated errors
-----------------

When a :ref:`recoverable error<recoverable_failure>` occurs, the internal
cloud-init state information is made visible under a top level aggregate key
``recoverable_errors`` with errors sorted by error level:

.. code-block:: shell-session
    :emphasize-lines: 11-19

    $ cloud-init status --format json
    {
      "boot_status_code": "enabled-by-generator",
      "config": {...},
      "datasource": "",
      "detail": "Cloud-init enabled by systemd cloud-init-generator",
      "errors": [],
      "extended_status": "degraded done",
      "init": {...},
      "last_update": "",
      "recoverable_errors":
      {
        "WARNING": [
          "Failed at merging in cloud config part from p-01: empty cloud config",
          "No template found in /etc/cloud/templates for template source.deb822",
          "No template found in /etc/cloud/templates for template sources.list",
          "No template found, not rendering /etc/apt/soures.list.d/ubuntu.source"
        ]
      },
      "status": "done"
    }

Reported recoverable error messages are grouped by the level at which
they are logged. The complete list of levels, in order of increasing
criticality, is:

.. code-block:: shell-session

    WARNING
    DEPRECATED
    ERROR
    CRITICAL

Each message has a single level. In cloud-init's :ref:`log files<log_files>`,
the level at which logs are reported is configurable. These messages are
exported via the ``'recoverable_errors'`` key regardless of which level of
logging is configured.
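
As a rough illustration of consuming this key, the following Python sketch
reads the JSON output and lists messages with the most critical level first.
The function name and the sample input are hypothetical; only the
``recoverable_errors`` key and the level names come from this page.

```python
import json

# Levels in order of increasing criticality, as listed above.
LEVELS = ["WARNING", "DEPRECATED", "ERROR", "CRITICAL"]


def most_critical_first(status_json: str) -> list[tuple[str, str]]:
    """Return (level, message) pairs, most critical level first."""
    status = json.loads(status_json)
    recoverable = status.get("recoverable_errors", {})
    pairs = []
    for level in reversed(LEVELS):
        for message in recoverable.get(level, []):
            pairs.append((level, message))
    return pairs


# Hypothetical sample input, shaped like the aggregated output above.
sample = json.dumps(
    {
        "status": "done",
        "recoverable_errors": {
            "WARNING": ["example warning"],
            "ERROR": ["example error"],
        },
    }
)

for level, message in most_critical_first(sample):
    print(f"{level}: {message}")
```

On a real instance, the input would be the text printed by
``cloud-init status --format json`` rather than the sample string.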

Per-stage errors
----------------

The keys ``errors`` and ``recoverable_errors`` are also exported for each
stage to allow identifying when recoverable and non-recoverable errors
occurred.
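
One possible way to use the per-stage keys is sketched below in Python. It
assumes that each completed stage appears as a JSON object under its stage
name and that this object may carry a ``recoverable_errors`` mapping of level
to messages. The helper name and the sample input are hypothetical, and the
exact nesting may differ between cloud-init releases.

```python
import json

# Stage keys in run order, as documented on this page.
STAGES = ("init-local", "init", "modules-config", "modules-final")


def recoverable_errors_by_stage(
    status_json: str, stages: tuple[str, ...] = STAGES
) -> dict[str, dict[str, list[str]]]:
    """Map each stage name to its non-empty recoverable errors."""
    status = json.loads(status_json)
    found = {}
    for stage in stages:
        entry = status.get(stage)
        # Only completed stages are listed, so a stage may be absent.
        if not isinstance(entry, dict):
            continue
        errors = entry.get("recoverable_errors") or {}
        if errors:
            found[stage] = errors
    return found


# Hypothetical sample input; the nesting is an assumption, not a guarantee.
sample = json.dumps(
    {
        "init": {"recoverable_errors": {"WARNING": ["example warning"]}},
        "modules-config": {"recoverable_errors": {}},
        "status": "done",
    }
)

print(recoverable_errors_by_stage(sample))
```

Passing a different ``stages`` tuple lets the same helper follow whatever
stage keys a given release actually reports.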

.. code-block:: shell-session
    :emphasize-lines: 4-11,16-21

    $ cloud-init status --format json
    {
      "boot_status_code": "enabled-by-generator",
      "config":
      {
        "WARNING": [
          "No template found in /etc/cloud/templates for template source.deb822",
          "No template found in /etc/cloud/templates for template sources.list",
          "No template found, not rendering /etc/apt/soures.list.d/ubuntu.source"
        ]
      },
      "datasource": "",
      "detail": "Cloud-init enabled by systemd cloud-init-generator",
      "errors": [],
      "extended_status": "degraded done",
      "init":
      {
        "WARNING": [
          "Failed at merging in cloud config part from p-01: empty cloud config"
        ]
      },
      "last_update": "",
      "recoverable_errors":
      {
        "WARNING": [
          "Failed at merging in cloud config part from p-01: empty cloud config",
          "No template found in /etc/cloud/templates for template source.deb822",
          "No template found in /etc/cloud/templates for template sources.list",
          "No template found, not rendering /etc/apt/soures.list.d/ubuntu.source"
        ]
      },
      "status": "done"
    }

.. note::

    Only completed cloud-init stages are listed in the output of
    ``cloud-init status --format json``.

The JSON representation of cloud-init :ref:`boot stages<boot_stages>`
(in run order) is:

.. code-block:: shell-session

    "init-local"
    "init"
    "modules-config"
    "modules-final"

Limitations of exported errors
------------------------------

- Exported recoverable errors represent logged messages, which are not
  guaranteed to be stable between releases. The contents of the
  ``'errors'`` and ``'recoverable_errors'`` keys are not guaranteed to have
  stable output.
- Exported errors and recoverable errors may occur at different stages
  since users may reorder configuration modules to run at different
  stages via :file:`cloud.cfg`.

Where to next?
--------------
See :ref:`here<how_to_debug>` for a detailed guide to debugging cloud-init.
78 changes: 78 additions & 0 deletions doc/rtd/explanation/failure_states.rst
@@ -0,0 +1,78 @@
.. _failure_states:

Failure states
==============

Cloud-init has multiple modes of failure. This page describes these
modes and how to gather information about failures.

.. _critical_failure:

Critical failure
----------------

Critical failure happens when cloud-init experiences a condition that it
cannot safely handle. When this happens, cloud-init may be unable to complete,
and the instance is likely to be in an unknown broken state.

Cloud-init experiences critical failure when:

* there is a major problem with the cloud image that is running cloud-init
* there is a severe bug in cloud-init

When this happens, error messages will be visible in the output of
``cloud-init status --long`` under the ``errors`` key.

The same errors also appear under the same key nested within the
stage-level keys that store information related to each
:ref:`stage of cloud-init<boot_stages>`: ``init-local``, ``init``,
``modules-config``, ``modules-final``.

.. _recoverable_failure:

Recoverable failure
-------------------

In the case that cloud-init is able to complete yet something went wrong,
cloud-init has experienced a "recoverable failure". When this happens,
the service will return with exit code 2, and error messages will be
visible in the output of ``cloud-init status --long`` under the top
level ``recoverable_errors`` and ``errors`` keys.

To identify which stage an error came from, one can check under the
module-level keys: ``init-local``, ``init``, ``modules-config``,
``modules-final`` for the same error keys.

See :ref:`this more detailed explanation<exported_errors>` to learn how to
use cloud-init's exported errors.

Cloud-init error codes
----------------------

Cloud-init's ``status`` subcommand is useful for understanding which type of
error cloud-init experienced while running. The return code will be one of the
following:

.. code-block:: shell-session

    0 - success
    1 - unrecoverable error
    2 - recoverable error

If ``cloud-init status`` exits with exit code 1, cloud-init experienced
critical failure and was unable to recover. In this case, something is likely
seriously wrong with the system, or cloud-init has experienced a serious bug.
If you believe that you have experienced a serious bug, please file a
:ref:`bug report<reporting_bugs>`.

If cloud-init exits with exit code 2, cloud-init was able to complete
gracefully, however something went wrong and the user should investigate.
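
The return codes above can be checked from a script. The following Python
sketch is one way to do so; the function names are hypothetical, and it
assumes ``cloud-init`` is installed and on the ``PATH`` of the machine where
``check_cloud_init()`` is called.

```python
import subprocess

# Meanings of the exit codes listed above.
EXIT_CODES = {
    0: "success",
    1: "unrecoverable error",
    2: "recoverable error",
}


def describe_exit_code(code: int) -> str:
    """Translate a ``cloud-init status`` exit code into its meaning."""
    return EXIT_CODES.get(code, f"unexpected exit code {code}")


def check_cloud_init() -> str:
    """Run ``cloud-init status`` and describe its exit code."""
    result = subprocess.run(
        ["cloud-init", "status"],
        capture_output=True,
        text=True,
        check=False,  # non-zero exit codes are expected, so do not raise
    )
    return describe_exit_code(result.returncode)
```

Calling ``check_cloud_init()`` on an instance returns one of the three
descriptions; any other code is reported as unexpected rather than guessed at.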

See :ref:`this more detailed explanation<reported_status>` for more information
on cloud-init's status.

Where to next?
--------------

See :ref:`our debugging guide<how_to_debug>` for a detailed walkthrough of
debugging cloud-init.
2 changes: 2 additions & 0 deletions doc/rtd/explanation/index.rst
@@ -20,3 +20,5 @@ knowledge and become better at using and configuring ``cloud-init``.
security.rst
analyze.rst
kernel-cmdline.rst
failure_states.rst
exported_errors.rst
50 changes: 31 additions & 19 deletions doc/rtd/howto/debug_user_data.rst
@@ -1,43 +1,55 @@
How to debug user data
======================
.. _check_user_data_cloud_config:

Two of the most common issues with cloud config user data are:
How to validate user data cloud config
======================================

The two most common issues with cloud config user data are:

1. Incorrectly formatted YAML
2. The first line does not contain ``#cloud-config``
2. The first line does not start with ``#cloud-config``
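
The second issue can be caught with a quick check before running any schema
validation. The Python sketch below is a minimal illustration with a
hypothetical function name; it only inspects the first line and does not
validate the YAML itself.

```python
def starts_with_cloud_config(user_data: str) -> bool:
    """Return True if the first line starts with ``#cloud-config``."""
    first_line = user_data.split("\n", 1)[0]
    return first_line.startswith("#cloud-config")


# A leading blank line pushes the header off the first line.
print(starts_with_cloud_config("#cloud-config\nusers: []\n"))
print(starts_with_cloud_config("\n#cloud-config\nusers: []\n"))
```

This is a convenience check only; ``cloud-init schema`` remains the way to
validate keys and values.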

Static user data validation
---------------------------

To verify your cloud config is valid YAML you can use `validate-yaml.py`_.

To ensure the keys and values in your user data are correct, you can run:
Cloud-init is capable of validating cloud config user data directly from
its datasource (i.e. on a running cloud instance). To do this, you can run:

.. code-block:: shell-session
sudo cloud-init schema --system --annotate
Or, to test YAML in a file:
Or, to test YAML in a specific file:

.. code-block:: shell-session
cloud-init schema -c test.yml --annotate
Log analysis
------------
Example output:

If you can log into your system, the best way to debug your system is to
check the contents of the log files :file:`/var/log/cloud-init.log` and
:file:`/var/log/cloud-init-output.log` for warnings, errors, and
tracebacks. Tracebacks are always reportable bugs.
.. code-block:: shell-session
To report any bugs you find, :ref:`refer to this guide <reporting_bugs>`.
$ cloud-init schema --config-file=test.yaml --annotate
#cloud-config
users:
- name: holmanb # E1,E2,E3
gecos: Brett Holman
primary_group: holmanb
lock_passwd: false
invalid_key: true
Validation service
------------------
# Errors: -------------
# E1: Additional properties are not allowed ('invalid_key' was unexpected)
# E2: {'name': 'holmanb', 'gecos': 'Brett Holman', 'primary_group': 'holmanb', 'lock_passwd': False, 'invalid_key': True} is not of type 'array'
# E3: {'name': 'holmanb', 'gecos': 'Brett Holman', 'primary_group': 'holmanb', 'lock_passwd': False, 'invalid_key': True} is not of type 'string'
Another option is to use the self-hosted HTTP `validation service`_.
Refer to its documentation for more information.
Debugging
---------

If your user-data cloud config is correct according to the
``cloud-init schema`` command, but you are still having issues, then please
refer to our :ref:`debugging guide<how_to_debug>`.

To report any bugs you find, :ref:`refer to this guide <reporting_bugs>`.

.. LINKS
.. _validate-yaml.py: https://github.com/canonical/cloud-init/blob/main/tools/validate-yaml.py