Skip to content

Commit

Permalink
Merge branch 'main' into fix/revert-loading-smoother-params-from-node…
Browse files Browse the repository at this point in the history
…-instance
  • Loading branch information
mkuri authored Apr 1, 2022
2 parents 82949b5 + a8025f0 commit 071c065
Show file tree
Hide file tree
Showing 11 changed files with 430 additions and 159 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -62,7 +62,6 @@ class OcclusionSpotModule : public SceneModuleInterface
PlannerParam param_;
tier4_autoware_utils::StopWatch<std::chrono::milliseconds> stop_watch_;
std::vector<lanelet::BasicPolygon2d> partition_lanelets_;
std::vector<lanelet::BasicPolygon2d> close_partition_;

protected:
int64_t module_id_{};
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -109,7 +109,8 @@ bool OcclusionSpotModule::modifyPathVelocity(
std::vector<utils::PossibleCollisionInfo> possible_collisions;
// extract only close lanelet
if (param_.use_partition_lanelet) {
planning_utils::extractClosePartition(ego_pose.position, partition_lanelets_, close_partition_);
planning_utils::extractClosePartition(
ego_pose.position, partition_lanelets_, debug_data_.close_partition);
}
DEBUG_PRINT(show_time, "extract[ms]: ", stop_watch_.toc("processing_time", true));
std::vector<geometry_msgs::msg::Point> parked_vehicle_point;
Expand Down Expand Up @@ -153,7 +154,6 @@ bool OcclusionSpotModule::modifyPathVelocity(
utils::applySafeVelocityConsideringPossibleCollision(path, possible_collisions, param_);
// these debug topics needs computation resource
debug_data_.parked_vehicle_point = parked_vehicle_point;
debug_data_.close_partition = close_partition_;
debug_data_.z = path->points.front().point.pose.position.z;
debug_data_.possible_collisions = possible_collisions;
debug_data_.interp_path = interp_path;
Expand Down
6 changes: 3 additions & 3 deletions system/system_monitor/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -128,12 +128,12 @@ Thermal throttling event can be monitored by reading contents of MSR(Model Speci

## <u>HDD Monitor</u>

Generally, S.M.A.R.T. information is used to monitor HDD temperature, and normally accessing disk device node is allowed for root user or disk group.<br>
Generally, S.M.A.R.T. information is used to monitor HDD temperature and life of HDD, and normally accessing disk device node is allowed for root user or disk group.<br>
As with the CPU monitor, this package provides an approach to minimize security risks as much as possible:<br>

- Provide a small program named 'hdd_reader' which accesses S.M.A.R.T. information and sends HDD temperature to HDD monitor by using socket programming.
- Provide a small program named 'hdd_reader' which accesses S.M.A.R.T. information and sends some items of it to HDD monitor by using socket programming.
- Run 'hdd_reader' as a specific user.
- HDD monitor is able to know HDD temperature as an unprivileged user since HDD temperature is sent by socket communication.
- HDD monitor is able to know some items of S.M.A.R.T. information as an unprivileged user since those are sent by socket communication.

### Instructions before starting

Expand Down
3 changes: 3 additions & 0 deletions system/system_monitor/config/hdd_monitor.param.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -7,5 +7,8 @@
name: /
temp_warn: 55.0
temp_error: 70.0
power_on_hours_warn: 3000000
total_data_written_warn: 4915200 # =150TB (1unit=32MB)
total_data_written_safety_factor: 0.05
free_warn: 5120 # MB(8hour)
free_error: 100 # MB(last 1 minute)
22 changes: 11 additions & 11 deletions system/system_monitor/docs/hdd_reader.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,15 +2,15 @@

## Name

hdd_reader - Read S.M.A.R.T. information for monitoring HDD temperature
hdd_reader - Read S.M.A.R.T. information for monitoring HDD temperature and life of HDD

## Synopsis

hdd_reader [OPTION]

## Description

Read S.M.A.R.T. information for monitoring HDD temperature.<br>
Read S.M.A.R.T. information for monitoring HDD temperature and life of HDD.<br>
This runs as a daemon process and listens to a TCP/IP port (7635 by default).

**Options:**<br>
Expand All @@ -24,15 +24,15 @@ Returns 0 if OK; non-zero otherwise.

## Notes

The 'hdd_reader' accesses minimal data enough to get Model number, Serial number, and HDD temperature.<br>
The 'hdd_reader' accesses minimal data enough to get Model number, Serial number, HDD temperature, and life of HDD.<br>
This is an approach to limit its functionality, however, the functionality can be expanded for further improvements and considerations in the future.<br><br>

### [ATA]

| Purpose | Name | Length |
| --------------------------- | -------------------- | -------------------- |
| Model number, Serial number | IDENTIFY DEVICE data | 256 words(512 bytes) |
| HDD temperature | SMART READ DATA | 256 words(512 bytes) |
| Purpose | Name | Length |
| ---------------------------- | -------------------- | -------------------- |
| Model number, Serial number | IDENTIFY DEVICE data | 256 words(512 bytes) |
| HDD temperature, life of HDD | SMART READ DATA | 256 words(512 bytes) |

For details please see the documents below.<br>

Expand All @@ -43,10 +43,10 @@ For details please see the documents below.<br>

### [NVMe]

| Purpose | Name | Length |
| --------------------------- | ---------------------------------- | ---------------- |
| Model number, Serial number | Identify Controller data structure | 4096 bytes |
| HDD temperature | SMART / Health Information | 1 Dword(4 bytes) |
| Purpose | Name | Length |
| ---------------------------- | ---------------------------------- | ------------------- |
| Model number, Serial number | Identify Controller data structure | 4096 bytes |
| HDD temperature, life of HDD | SMART / Health Information | 36 Dword(144 bytes) |

For details please see the documents below.<br>

Expand Down
14 changes: 9 additions & 5 deletions system/system_monitor/docs/ros_parameters.md
Original file line number Diff line number Diff line change
Expand Up @@ -22,11 +22,15 @@ hdd_monitor:

&nbsp;&nbsp;disks:

| Name | Type | Unit | Default | Notes |
| :--------- | :----: | :--: | :-----: | :-------------------------------------------------------------------------- |
| name | string | n/a | none | The disk name to monitor temperature. (e.g. /dev/sda) |
| temp_error | float | DegC | 55.0 | Generates warning when HDD temperature reaches a specified value or higher. |
| temp_error | float | DegC | 70.0 | Generates error when HDD temperature reaches a specified value or higher. |
| Name | Type | Unit | Default | Notes |
| :------------------------------- | :----: | :---------------: | :-----: | :--------------------------------------------------------------------------------- |
| name | string | n/a | none | The disk name to monitor temperature. (e.g. /dev/sda) |
| temp_warn | float | DegC | 55.0 | Generates warning when HDD temperature reaches a specified value or higher. |
| temp_error | float | DegC | 70.0 | Generates error when HDD temperature reaches a specified value or higher. |
| power_on_hours_warn | int | Hour | 3000000 | Generates warning when HDD power-on hours reaches a specified value or higher. |
| total_data_written_attribute_id | int | n/a | 0xF1 | S.M.A.R.T attribute ID of total data written. |
| total_data_written_warn | int | depends on device | 4915200 | Generates warning when HDD total data written reaches a specified value or higher. |
| total_data_written_safety_factor | int | %(1e-2) | 0.05 | Safety factor of HDD total data written. |

hdd_monitor:

Expand Down
44 changes: 43 additions & 1 deletion system/system_monitor/docs/topics_hdd_monitor.md
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
# ROS topics: CPU Monitor
# ROS topics: HDD Monitor

## <u>HDD Temperature</u>

Expand All @@ -22,6 +22,48 @@
| HDD [0-9]: serial | S4EMNF0M820682 |
| HDD [0-9]: temperature | 37.0 DegC |

## <u>HDD PowerOnHours</u>

/diagnostics/hdd_monitor: HDD PowerOnHours

<b>[summary]</b>

| level | message |
| ----- | -------------- |
| OK | OK |
| WARN | lifetime limit |

<b>[values]</b>

| key | value (example) |
| ------------------------- | ----------------------- |
| HDD [0-9]: status | OK / lifetime limit |
| HDD [0-9]: name | /dev/nvme0 |
| HDD [0-9]: model | PHISON PS5012-E12S-512G |
| HDD [0-9]: serial | FB590709182505050767 |
| HDD [0-9]: power on hours | 4834 Hours |

## <u>HDD TotalDataWritten</u>

/diagnostics/hdd_monitor: HDD TotalDataWritten

<b>[summary]</b>

| level | message |
| ----- | --------------- |
| OK | OK |
| WARN | warranty period |

<b>[values]</b>

| key | value (example) |
| ----------------------------- | ---------------------------- |
| HDD [0-9]: status | OK / warranty period |
| HDD [0-9]: name | /dev/nvme0 |
| HDD [0-9]: model | PHISON PS5012-E12S-512G |
| HDD [0-9]: serial | FB590709182505050767 |
| HDD [0-9]: total data written | 146295330 <br> not available |

## <u>HDD Usage</u>

/diagnostics/hdd_monitor: HDD Usage
Expand Down
36 changes: 36 additions & 0 deletions system/system_monitor/include/hdd_reader/hdd_reader.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -24,9 +24,39 @@
#include <boost/serialization/serialization.hpp>
#include <boost/serialization/string.hpp>

#include <bitset>
#include <map>
#include <string>

/**
* @brief ATA attribute IDs
*/
enum class ATAAttributeIDs : uint8_t { TEMPERATURE = 0, POWER_ON_HOURS = 1, SIZE };

/**
* @brief HDD device
*/
struct HDDDevice
{
std::string name_; //!< @brief Device name
uint8_t
total_data_written_attribute_id_; //!< @brief S.M.A.R.T attribute ID of total data written

/**
* @brief Load or save data members.
* @param [inout] ar archive reference to load or save the serialized data members
* @param [in] version version for the archive
* @note NOLINT syntax is needed since this is an interface to serialization and
* used inside boost serialization.
*/
template <typename archive>
void serialize(archive & ar, const unsigned /*version*/) // NOLINT(runtime/references)
{
ar & name_;
ar & total_data_written_attribute_id_;
}
};

/**
* @brief HDD information
*/
Expand All @@ -38,6 +68,9 @@ struct HDDInfo
uint8_t temp_; //!< @brief temperature(DegC)
// Lowest byte of the raw value contains the exact temperature value (Celsius degrees)
// in S.M.A.R.T. information.
uint64_t power_on_hours_; //!< @brief power on hours count
uint64_t total_data_written_; //!< @brief total data written
bool is_valid_total_data_written_; //!< @brief whether total_data_written_ is valid value

/**
* @brief Load or save data members.
Expand All @@ -53,6 +86,9 @@ struct HDDInfo
ar & model_;
ar & serial_;
ar & temp_;
ar & power_on_hours_;
ar & total_data_written_;
ar & is_valid_total_data_written_;
}
};

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,7 @@
#define SYSTEM_MONITOR__HDD_MONITOR__HDD_MONITOR_HPP_

#include <diagnostic_updater/diagnostic_updater.hpp>
#include <hdd_reader/hdd_reader.hpp>

#include <climits>
#include <map>
Expand All @@ -32,13 +33,35 @@
*/
struct HDDParam
{
std::string device_; //!< @brief device
float temp_warn_; //!< @brief HDD temperature(DegC) to generate warning
float temp_error_; //!< @brief HDD temperature(DegC) to generate error
int free_warn_; //!< @brief HDD free space(MB) to generate warning
int free_error_; //!< @brief HDD free space(MB) to generate error
std::string device_; //!< @brief device
float temp_warn_; //!< @brief HDD temperature(DegC) to generate warning
float temp_error_; //!< @brief HDD temperature(DegC) to generate error
int power_on_hours_warn_; //!< @brief HDD power on hours to generate warning
uint64_t total_data_written_warn_; //!< @brief HDD total data written to generate warning
float total_data_written_safety_factor_; //!< @brief safety factor of HDD total data written
int free_warn_; //!< @brief HDD free space(MB) to generate warning
int free_error_; //!< @brief HDD free space(MB) to generate error

HDDParam()
: temp_warn_(55.0),
temp_error_(70.0),
power_on_hours_warn_(3000000),
total_data_written_warn_(4915200),
total_data_written_safety_factor_(0.05),
free_warn_(5120),
free_error_(100)
{
}
};

HDDParam() : temp_warn_(55.0), temp_error_(70.0), free_warn_(5120), free_error_(100) {}
/**
* @brief SMART information items to check
*/
enum class HDDSMARTInfoItem : uint32_t {
TEMPERATURE = 0,
POWER_ON_HOURS = 1,
TOTAL_DATA_WRITTEN = 2,
SIZE
};

class HDDMonitor : public rclcpp::Node
Expand All @@ -50,11 +73,6 @@ class HDDMonitor : public rclcpp::Node
*/
explicit HDDMonitor(const rclcpp::NodeOptions & options);

/**
* @brief Update the diagnostic state.
*/
void update();

protected:
using DiagStatus = diagnostic_msgs::msg::DiagnosticStatus;

Expand All @@ -64,7 +82,36 @@ class HDDMonitor : public rclcpp::Node
* @note NOLINT syntax is needed since diagnostic_updater asks for a non-const reference
* to pass diagnostic message updated in this function to diagnostic publish calls.
*/
void checkTemp(diagnostic_updater::DiagnosticStatusWrapper & stat); // NOLINT(runtime/references)
void checkSMARTTemperature(
diagnostic_updater::DiagnosticStatusWrapper & stat); // NOLINT(runtime/references)

/**
* @brief check HDD power on hours
* @param [out] stat diagnostic message passed directly to diagnostic publish calls
* @note NOLINT syntax is needed since diagnostic_updater asks for a non-const reference
* to pass diagnostic message updated in this function to diagnostic publish calls.
*/
void checkSMARTPowerOnHours(
diagnostic_updater::DiagnosticStatusWrapper & stat); // NOLINT(runtime/references)

/**
* @brief check HDD total data written
* @param [out] stat diagnostic message passed directly to diagnostic publish calls
* @note NOLINT syntax is needed since diagnostic_updater asks for a non-const reference
* to pass diagnostic message updated in this function to diagnostic publish calls.
*/
void checkSMARTTotalDataWritten(
diagnostic_updater::DiagnosticStatusWrapper & stat); // NOLINT(runtime/references)

/**
* @brief check S.M.A.R.T. information
* @param [out] stat diagnostic message passed directly to diagnostic publish calls
* @note NOLINT syntax is needed since diagnostic_updater asks for a non-const reference
* to pass diagnostic message updated in this function to diagnostic publish calls.
*/
void checkSMART(
diagnostic_updater::DiagnosticStatusWrapper & stat,
HDDSMARTInfoItem item); // NOLINT(runtime/references)

/**
* @brief check HDD usage
Expand Down Expand Up @@ -94,19 +141,39 @@ class HDDMonitor : public rclcpp::Node
*/
std::string getDeviceFromMountPoint(const std::string & mount_point);

/**
* @brief timer callback
*/
void onTimer();

/**
* @brief update HDD information list
*/
void updateHDDInfoList();

diagnostic_updater::Updater updater_; //!< @brief Updater class which advertises to /diagnostics
rclcpp::TimerBase::SharedPtr timer_; //!< @brief timer to get HDD information from HDDReader

char hostname_[HOST_NAME_MAX + 1]; //!< @brief host name

int hdd_reader_port_; //!< @brief port number to connect to hdd_reader
std::map<std::string, HDDParam> hdd_params_; //!< @brief list of error and warning levels
std::vector<std::string> hdd_devices_; //!< @brief list of devices
std::vector<HDDDevice> hdd_devices_; //!< @brief list of devices
//!< @brief diagnostic of connection
diagnostic_updater::DiagnosticStatusWrapper connect_diag_;
HDDInfoList hdd_info_list_; //!< @brief list of HDD information

/**
* @brief HDD temperature status messages
* @brief HDD SMART status messages
*/
const std::map<int, const char *> temp_dict_ = {
{DiagStatus::OK, "OK"}, {DiagStatus::WARN, "hot"}, {DiagStatus::ERROR, "critical hot"}};
const std::map<int, const char *> smart_dicts_[static_cast<uint32_t>(HDDSMARTInfoItem::SIZE)] = {
// temperature
{{DiagStatus::OK, "OK"}, {DiagStatus::WARN, "hot"}, {DiagStatus::ERROR, "critical hot"}},
// power on hours
{{DiagStatus::OK, "OK"}, {DiagStatus::WARN, "lifetime limit"}, {DiagStatus::ERROR, "unused"}},
// total data written
{{DiagStatus::OK, "OK"}, {DiagStatus::WARN, "warranty period"}, {DiagStatus::ERROR, "unused"}},
};

/**
* @brief HDD usage status messages
Expand Down
Loading

0 comments on commit 071c065

Please sign in to comment.