feat: support metrics log stream #28

airkei · 2025-01-08T23:52:48Z

Why

https://tier4.atlassian.net/browse/RT4-13461
This PR adds support for a new CloudWatch log group for metrics ({self.profile}-edge-otaclient-metrics).
We move metrics to a separate log group because AWS subscription filters are applied at the log group level. With a dedicated log group, CloudWatch can apply the filter efficiently.

What

Change the format of the items in the queue by adding a LogGroupType field.

before: Queue[tuple[ecu_id, LogMessage]]
after: Queue[tuple[LogGroupType(enum), ecu_id, LogMessage]]
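
A minimal sketch of the new queue item shape, for illustration only. The enum member values and the LogMessage fields are assumptions made for this example (the fields are modeled on a CloudWatch log event), and "main_ecu" is a made-up ECU id.

```python
from enum import Enum
from queue import Queue
from typing import Tuple, TypedDict


class LogGroupType(Enum):
    # Member values are assumptions for this sketch.
    LOG = 0
    METRICS = 1


class LogMessage(TypedDict):
    # Assumed fields, modeled on a CloudWatch log event.
    timestamp: int  # milliseconds since the epoch
    message: str


# after: each queue item carries the log group type first
queue: "Queue[Tuple[LogGroupType, str, LogMessage]]" = Queue()
queue.put_nowait(
    (LogGroupType.METRICS, "main_ecu", {"timestamp": 0, "message": "{}"})
)

# The consumer unpacks three fields instead of two.
log_group_type, ecu_id, log_message = queue.get_nowait()
```

Putting the type first means a consumer can branch on it before looking at the ECU id or the message.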

Once the logging thread receives the log group type from the queue, it selects the target CloudWatch log group based on that value. The log group and stream names will be:

log
- group
  /aws/greengrass/edge/{self.region}/{self.account_id}/{self.profile}-edge-otaclient (same as the existing)
- stream
  yyyy/MM/dd/fms-${stage}-edge-${vehicle_id}-Core/${ecu_id} (same as the existing)
metrics
- group
  /aws/greengrass/edge/{self.region}/{self.account_id}/{self.profile}-edge-otaclient-metrics
- stream
  same as the existing log stream name
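
The group selection described above could be sketched roughly as follows. SessionConfig and select_log_group are hypothetical stand-ins, not the code in this PR; the property names follow the initial diff, and the region, account id, and profile values are placeholders.

```python
from dataclasses import dataclass
from enum import Enum


class LogGroupType(Enum):
    # Member values are assumptions for this sketch.
    LOG = 0
    METRICS = 1


@dataclass(frozen=True)
class SessionConfig:
    """Hypothetical stand-in for the real session config."""

    region: str
    account_id: str
    profile: str

    @property
    def aws_cloudwatch_log_group(self) -> str:
        return (
            f"/aws/greengrass/edge/{self.region}/"
            f"{self.account_id}/{self.profile}-edge-otaclient"
        )

    @property
    def aws_cloudwatch_metrics_log_group(self) -> str:
        return (
            f"/aws/greengrass/edge/{self.region}/"
            f"{self.account_id}/{self.profile}-edge-otaclient-metrics"
        )


def select_log_group(config: SessionConfig, log_group_type: LogGroupType) -> str:
    """Pick the target log group; the stream name is the same for both."""
    if log_group_type is LogGroupType.METRICS:
        return config.aws_cloudwatch_metrics_log_group
    return config.aws_cloudwatch_log_group


config = SessionConfig(region="ap-northeast-1", account_id="123456789012", profile="dev")
log_group = select_log_group(config, LogGroupType.LOG)
metrics_group = select_log_group(config, LogGroupType.METRICS)
```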

NOTE: Metrics are supported only via the gRPC interface. The existing HTTP interface has no way to send metrics.

Test

Verified that the modified pytest cases pass.
Verified with E2E tests (logs were delivered from the ECU to CloudWatch as expected).

airkei · 2025-01-09T00:00:35Z

src/otaclient_iot_logging_server/greengrass_config.py

+    @computed_field
+    @property
+    def aws_cloudwatch_metrics_log_group(self) -> str:
+        return (
+            f"/aws/greengrass/edge/{self.region}/"
+            f"{self.account_id}/{self.profile}-edge-otaclient-metrics"
+        )
+


new log group name for metrics

airkei · 2025-01-09T00:01:20Z

src/otaclient_iot_logging_server/_common.py

 from queue import Queue
 from typing import Literal, TypedDict

 from typing_extensions import NotRequired, TypeAlias

-LogsQueue: TypeAlias = "Queue[tuple[str, LogMessage]]"
+LogsQueue: TypeAlias = "Queue[tuple[LogGroupType, str, LogMessage]]"


change the data format of the queue.

airkei · 2025-01-09T00:01:44Z

tests/test_aws_iot_logger.py

@@ -76,10 +76,13 @@ def generate_random_msgs(
 ) -> list[tuple[str, LogMessage]]:
    _res: list[tuple[str, LogMessage]] = []
    for _ in range(msg_num):
-        _ecu, *_ = random.sample(ecus_list, 1)
+        _ecu_id, *_ = random.sample(ecus_list, 1)
+        _log_group_type = random.choice(list(LogGroupType))


Test both log group types.

github-actions · 2025-02-26T08:34:19Z

Coverage Report

File	Stmts	Miss	Cover	Missing
src/otaclient_iot_logging_server
__init__.py	3	0	100%
__main__.py	19	1	94%	52
_common.py	19	0	100%
_log_setting.py	26	10	61%	63, 65–66, 68–69, 73–74, 77–78, 80
_sd_notify.py	33	8	75%	42, 52, 57–59, 65–67
_utils.py	54	2	96%	73, 137
_version.py	9	0	100%
aws_iot_logger.py	105	58	44%	70–72, 74–75, 78, 81–82, 85, 91, 95–102, 105–106, 109, 112–117, 121–124, 128–130, 133–135, 138–143, 158, 164–166, 168–172, 176, 226, 233–236
boto3_session.py	35	9	74%	50, 58–59, 61, 76–77, 81, 83, 91
config_file_monitor.py	44	6	86%	64–66, 83–85
configs.py	46	1	97%	75
ecu_info.py	37	1	97%	75
greengrass_config.py	101	5	95%	155, 274–277
log_proxy_server.py	48	29	39%	46–47, 49–51, 54–56, 59–60, 64, 67, 69–70, 73, 76, 80–82, 84–85, 89–90, 97–98, 100–101, 106, 112
servicer.py	59	5	91%	59, 101–103, 119
src/otaclient_iot_logging_server/v1
__init__.py	1	0	100%
_types.py	47	0	100%
api_stub.py	14	0	100%
TOTAL	700	135	80%

Tests	Skipped	Failures	Errors	Time
53	0 💤	0 ❌	0 🔥	16.909s ⏱️

airkei · 2025-02-26T10:08:45Z

src/otaclient_iot_logging_server/aws_iot_logger.py

+        for log_group_name in log_group_names:
+            try:
+                client.create_log_group(logGroupName=log_group_name)
+                logger.info(f"{log_group_name=} has been created")
+            except exc_types.ResourceAlreadyExistsException as e:
+                logger.debug(
+                    f"{log_group_name=} already existed, skip creating: {e.response}"
                )
-                raise e.__cause__ from None
-            logger.error(f"failed to create {log_group_name=}: {e!r}")
-            raise
-        except Exception as e:
-            logger.error(f"failed to create {log_group_name=}: {e!r}")
-            raise
+            except ValueError as e:
+                if e.__cause__ and isinstance(
+                    e.__cause__, awscrt.exceptions.AwsCrtError
+                ):
+                    logger.error(
+                        (f"failed to create mtls connection to remote: {e.__cause__}")
+                    )
+                    raise e.__cause__ from None
+                logger.error(f"failed to create {log_group_name=}: {e!r}")
+                raise
+            except Exception as e:
+                logger.error(f"failed to create {log_group_name=}: {e!r}")
+                raise


Create both AWS CloudWatch log groups for LOG and METRICS.
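
For readers who want to try the "create, and skip if it already exists" pattern without AWS access, here is a rough sketch against an in-memory stand-in. FakeLogsClient, the local ResourceAlreadyExistsException class, and ensure_log_groups are all hypothetical; the real code calls a CloudWatch Logs client and also handles the mTLS error path shown in the diff above.

```python
import logging

logger = logging.getLogger(__name__)


class ResourceAlreadyExistsException(Exception):
    """Stand-in for the client's 'already exists' error (assumption)."""


class FakeLogsClient:
    """In-memory stand-in for a CloudWatch Logs client (hypothetical)."""

    def __init__(self):
        self.groups = set()

    def create_log_group(self, *, logGroupName):
        if logGroupName in self.groups:
            raise ResourceAlreadyExistsException(logGroupName)
        self.groups.add(logGroupName)


def ensure_log_groups(client, log_group_names):
    """Create every log group, treating 'already exists' as success."""
    for log_group_name in log_group_names:
        try:
            client.create_log_group(logGroupName=log_group_name)
            logger.info("%s has been created", log_group_name)
        except ResourceAlreadyExistsException:
            logger.debug("%s already existed, skip creating", log_group_name)


client = FakeLogsClient()
names = ["example-edge-otaclient", "example-edge-otaclient-metrics"]
ensure_log_groups(client, names)
ensure_log_groups(client, names)  # second call is a no-op, no exception
```

Handling the exception inside the loop means an existing LOG group does not prevent the METRICS group from being created.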

airkei · 2025-02-26T10:15:08Z

src/otaclient_iot_logging_server/servicer.py

@@ -86,7 +97,7 @@ async def _put_log(
        )
        # logger.debug(f"receive log from {ecu_id}: {_logging_msg}")
        try:
-            self._queue.put_nowait((ecu_id, _logging_msg))
+            self._queue.put_nowait((_logging_group_type, ecu_id, _logging_msg))


Put the log group type into the queue; the aws_iot_logger thread then gets it and handles it.

airkei · 2025-02-26T10:15:46Z

tests/test_log_proxy_server.py

+            _log_group_type, _ecu_id, _log_msg = self._queue.get_nowait()
+            # always log type is LOG in HTTP
+            assert _log_group_type == LogGroupType.LOG


verify log group for HTTP.

airkei · 2025-02-26T10:15:58Z

tests/test_log_proxy_server.py

+            assert _log_group_type == convert_from_log_type_to_log_group_type(
+                item.log_type
+            )


verify log group for gRPC.
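
To make the assertion above easier to follow, here is one way such a conversion could look. This is only a guess at the shape, not the implementation in this PR: the LogType enum, its members, and all member values are assumptions for illustration.

```python
from enum import Enum


class LogType(Enum):
    # Hypothetical gRPC-side log type; members and values are assumptions.
    LOG = 0
    METRICS = 1


class LogGroupType(Enum):
    # Member values are assumptions for this sketch.
    LOG = 0
    METRICS = 1


def convert_from_log_type_to_log_group_type(log_type: LogType) -> LogGroupType:
    """Map the request's log type to the CloudWatch log group type."""
    if log_type is LogType.METRICS:
        return LogGroupType.METRICS
    # Anything that is not metrics goes to the regular log group.
    return LogGroupType.LOG


converted = [convert_from_log_type_to_log_group_type(t) for t in LogType]
```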

Bodong-Yang

Thank you! Overall LGTM, with only some minor comments about the variable names for the log groups.

Bodong-Yang · 2025-02-27T05:08:27Z

src/otaclient_iot_logging_server/aws_iot_logger.py

        self._session_config = session_config
        self._log_group_name = session_config.aws_cloudwatch_log_group
+        self._metrics_group_name = session_config.aws_cloudwatch_metrics_log_group
        self._interval = interval


About the variable naming here: previously we had only one log_group, so log_group meant the otaclient_logs log_group.
Now that we have two log_groups, the name becomes ambiguous: (otaclient_logs_)log_group and (otaclient_)metrics_group are both (aws_cloudwatch_)log_group.

I suggest that we prefix otaclient_logs to the variable names related to the (otaclient_logs_)log_group.

Thank you, agreed.
A long, explicit name is better than a short, unclear one. Fixed in the latest commit.

Bodong-Yang · 2025-02-27T05:14:13Z

src/otaclient_iot_logging_server/greengrass_config.py

+    @computed_field
+    @property
+    def aws_cloudwatch_metrics_log_group(self) -> str:
+        return (
+            f"/aws/greengrass/edge/{self.region}/"
+            f"{self.account_id}/{self.profile}-edge-otaclient-metrics"
+        )
+


I suggest also renaming the previous aws_cloudwatch_log_group property to aws_cloudwatch_otaclient_logs_log_group, as we have two log groups now.

And to keep the naming scheme consistent, we can also rename aws_cloudwatch_metrics_log_group to aws_cloudwatch_otaclient_metrics_log_group.

Thank you, fixed.

Bodong-Yang · 2025-02-27T05:17:18Z

src/otaclient_iot_logging_server/aws_iot_logger.py

                except Empty:
                    break

-            for log_stream_suffix, logs in message_dict.items():
+            for (log_group_type, log_stream_suffix), logs in message_dict.items():


Nice one, I just learned that we can unpack the tuple directly here!
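
A small self-contained sketch of that pattern: drain the queue, bucket messages under a (log group type, stream suffix) key, then unpack the key in the for statement. drain_and_group is a hypothetical helper, the enum values and message fields are assumptions, and the sketch assumes the stream suffix is the ECU id.

```python
from collections import defaultdict
from enum import Enum
from queue import Empty, Queue


class LogGroupType(Enum):
    # Member values are assumptions for this sketch.
    LOG = 0
    METRICS = 1


def drain_and_group(queue, max_items=100):
    """Drain up to max_items and bucket messages by (group type, stream suffix)."""
    message_dict = defaultdict(list)
    for _ in range(max_items):
        try:
            log_group_type, ecu_id, log_message = queue.get_nowait()
        except Empty:
            break
        message_dict[(log_group_type, ecu_id)].append(log_message)
    return message_dict


queue = Queue()
queue.put_nowait((LogGroupType.LOG, "main_ecu", {"timestamp": 1, "message": "boot"}))
queue.put_nowait((LogGroupType.METRICS, "main_ecu", {"timestamp": 2, "message": "{}"}))
queue.put_nowait((LogGroupType.LOG, "main_ecu", {"timestamp": 3, "message": "done"}))

summary = {}
# The parenthesized target unpacks the tuple key in place.
for (log_group_type, log_stream_suffix), logs in drain_and_group(queue).items():
    summary[(log_group_type.name, log_stream_suffix)] = len(logs)
```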

sonarqubecloud · 2025-02-27T10:31:44Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
73.3% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

Bodong-Yang

Thank you!
By the way, is AWS configured to have the metrics_group created in advance, or to allow otaclient to create it?

UPDATED: it seems AWS is configured to allow the greengrass web.auto agent to create log_groups with any name. Since iot-logger also uses the greengrass cert, we can create the log_group.

(Although I don't think this is good practice. I think FMS should define the log_groups with IaC in advance, instead of letting the edge do so 🤔)

airkei · 2025-02-28T00:47:42Z

By the way, is AWS configured to have the metrics_group created in advance, or to allow otaclient to create it?
UPDATED: it seems AWS is configured to allow the greengrass web.auto agent to create log_groups with any name.

Yes, correct. In the test, I verified that the target log group is created by iot-logger as expected.

(Although I don't think this is good practice. I think FMS should define the log_groups with IaC in advance, instead of letting the edge do so 🤔)

Yes, correct. Ideally, the CloudWatch log group should be created from a template first, and iot-logger should only put logs into it. Otherwise, a resource (the log group) gets created outside the control of IaC, as in this case.

Bodong-Yang · 2025-02-28T00:56:12Z

Yes, correct. Ideally, the CloudWatch log group should be created from a template first, and iot-logger should only put logs into it. Otherwise, a resource (the log group) gets created outside the control of IaC, as in this case.

Let's consult with the FMS team about this later 👍. It seems the current IaC settings have been in place for quite a long time, so it is time to update them.

airkei changed the title ~~feat: support metrics log stream~~ DNM: feat: support metrics log stream Jan 9, 2025

airkei force-pushed the feat/support_metrics_log_stream branch from cce23c4 to 33972d8 Compare January 16, 2025 02:36

airkei force-pushed the feat/support_metrics_log_stream branch from 33972d8 to 3e59bb5 Compare February 26, 2025 08:33

airkei changed the title ~~DNM: feat: support metrics log stream~~ feat: support metrics log stream Feb 26, 2025

feat: support metrics log stream

9b1a809

airkei force-pushed the feat/support_metrics_log_stream branch from 3e59bb5 to 9b1a809 Compare February 26, 2025 10:12

airkei self-assigned this Feb 26, 2025

airkei marked this pull request as ready for review February 26, 2025 10:16

airkei requested a review from a team as a code owner February 26, 2025 10:16

Bodong-Yang reviewed Feb 27, 2025

airkei added 2 commits February 27, 2025 19:24

rename varialble names for log groups

82c9a07

fix test cases

8874c60

airkei requested a review from Bodong-Yang February 27, 2025 10:31

Bodong-Yang approved these changes Feb 28, 2025

airkei merged commit bfee052 into main Feb 28, 2025
9 checks passed
