SkyPredictor is a metric trend predictor to generate alarm baselines.
In the SkyWalking alert system, alerts can only be configured using predefined fixed values (thresholds). However, in real-world scenarios, appropriate thresholds often change over time. For example, a system's CPU usage in the early morning differs significantly from its usage during peak business hours. This makes dynamic baseline prediction particularly important.
The SkyWalking Predictor service can periodically collect metric data from the SkyWalking service and predict metric values for a future period.
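
To make the difference between a fixed threshold and a dynamic baseline concrete, here is a minimal sketch in Python. It is not the Predictor's actual model: it assumes a naive hour-of-day baseline (mean plus or minus `k` standard deviations), and the function names and sample data are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def hourly_baseline(samples, k=2.0):
    """Build a per-hour (lower, upper) band from (hour_of_day, value) samples.

    Illustrative only: a real predictor would use a proper forecasting model.
    """
    by_hour = defaultdict(list)
    for hour, value in samples:
        by_hour[hour].append(value)

    bands = {}
    for hour, values in by_hour.items():
        center = mean(values)
        spread = pstdev(values)  # population standard deviation
        bands[hour] = (center - k * spread, center + k * spread)
    return bands


def is_anomalous(hour, value, bands):
    """Return True when the value falls outside the band for that hour."""
    lower, upper = bands[hour]
    return not (lower <= value <= upper)


# Hypothetical CPU usage (%) observed at 03:00 and 14:00 over three days.
history = [(3, 10), (3, 12), (3, 14), (14, 70), (14, 75), (14, 80)]
bands = hourly_baseline(history)
```

With these bands, a CPU usage of 70% is normal at 14:00 but anomalous at 03:00, a distinction that a single fixed threshold cannot express.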
- Predictor: Fetches data from OAP and provides a gRPC API for querying predicted metric values.
- OAP: Provides a data query protocol and periodically retrieves predicted data from the Predictor, integrating it into the alert module.
Configure system logs in SkyWalking Predictor.
Name | Default | Environment Key | Description |
---|---|---|---|
logging.level | INFO | LOGGING_LEVEL | Minimum log output level. |
logging.format | :%(asctime)s - %(name)s - %(levelname)s - %(message)s | LOGGING_FORMAT | Log output format. |
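
As a rough illustration of how these two settings map onto Python's standard `logging` module, the sketch below reads the environment keys from the table and applies them with `logging.basicConfig`. The helper name `configure_logging` is hypothetical, and this is not necessarily how the Predictor wires its own logging.

```python
import logging
import os

# Assumption: the default format from the table, written without the leading colon.
DEFAULT_FORMAT = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"


def configure_logging(env=os.environ):
    """Apply LOGGING_LEVEL and LOGGING_FORMAT to the root logger."""
    level_name = env.get("LOGGING_LEVEL", "INFO").upper()
    fmt = env.get("LOGGING_FORMAT", DEFAULT_FORMAT)

    # Standard level names (DEBUG, INFO, ...) are integer constants on the module.
    level = getattr(logging, level_name, None)
    if not isinstance(level, int):
        raise ValueError(f"unknown log level: {level_name}")

    # force=True replaces handlers installed by an earlier basicConfig call.
    logging.basicConfig(level=level, format=fmt, force=True)
    return level
```

For example, starting the process with `LOGGING_LEVEL=DEBUG` in the environment would lower the minimum output level to debug messages.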
Configure the external services provided by SkyWalking Predictor.
Name | Default | Environment Key | Description |
---|---|---|---|
server.grpc.port | 18080 | GRPC_PORT | Port for providing external gRPC services. |
server.monitor.enabled | true | MONITOR_ENABLED | Whether to enable Prometheus metrics monitoring service. |
server.monitor.port | 8000 | MONITOR_PORT | Port for providing external monitoring services. |
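
The sketch below shows one way the environment keys in this table could override the defaults, with each value coerced to its expected type. The `ServerConfig` class, the `load_server_config` helper, and the accepted boolean spellings are assumptions made for illustration; the Predictor's own parsing rules are not documented here.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerConfig:
    grpc_port: int
    monitor_enabled: bool
    monitor_port: int


def _as_bool(raw):
    # Assumption: these spellings count as "true"; anything else is false.
    return raw.strip().lower() in ("1", "true", "yes", "on")


def load_server_config(env=os.environ):
    """Resolve server settings from the environment, falling back to the documented defaults."""
    return ServerConfig(
        grpc_port=int(env.get("GRPC_PORT", "18080")),
        monitor_enabled=_as_bool(env.get("MONITOR_ENABLED", "true")),
        monitor_port=int(env.get("MONITOR_PORT", "8000")),
    )
```

Calling `load_server_config({"GRPC_PORT": "19090", "MONITOR_ENABLED": "false"})` would yield a gRPC port of 19090 with monitoring disabled, while the monitor port keeps its default of 8000.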
Configure how data is fetched and which metrics are predicted for the dynamic baseline.
Please make sure the following modules are activated in OAP:
- status-query: Queries /status/config/ttl to get the TTL (in days), which determines how much metrics data is fetched.
- graphql in query: Queries services and metrics through GraphQL.
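
To check that the status-query module is reachable before starting the Predictor, a small script like the following could be used. It is only a sketch: the helper names are hypothetical, the use of HTTP Basic authentication for the username and password is an assumption, and the response body is returned as raw text because its format is not described here.

```python
import base64
import urllib.request
from urllib.parse import urljoin


def ttl_request(base_address, username=None, password=None):
    """Build a request for the TTL status endpoint of the OAP server."""
    url = urljoin(base_address.rstrip("/") + "/", "status/config/ttl")
    request = urllib.request.Request(url)
    if username:
        # Assumption: OAP authentication is HTTP Basic.
        token = base64.b64encode(f"{username}:{password or ''}".encode()).decode()
        request.add_header("Authorization", f"Basic {token}")
    return request


def fetch_ttl(base_address, timeout=5.0, **auth):
    """Return the raw response body of /status/config/ttl."""
    with urllib.request.urlopen(ttl_request(base_address, **auth), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    print(fetch_ttl("http://localhost:12800/"))
```

If the request fails with a connection or HTTP error, the module is probably not activated or the address is wrong.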
Name | Default | Environment Key | Description |
---|---|---|---|
baseline.cron | */8 * * * * | BASELINE_FETCH_CRON | Cron expression that schedules data retrieval and prediction for the baseline. |
baseline.fetch.server.address | http://localhost:12800/ | BASELINE_FETCH_SERVER_ENDPOINT | Address of the OAP RESTful server. |
baseline.fetch.server.username | | BASELINE_FETCH_SERVER_USERNAME | If OAP access requires authentication, the username must be provided. |
baseline.fetch.server.password | | BASELINE_FETCH_SERVER_PASSWORD | If OAP access requires authentication, the password must be provided. |
baseline.fetch.server.down_sampling | HOUR | BASELINE_FETCH_SERVER_DOWN_SAMPLING | Specify the downsampling type of the data to download from OAP. HOUR and MINUTE are supported. Note that retrieving minute-level data takes longer. |
baseline.fetch.server.layers | GENERAL | BASELINE_FETCH_SERVER_LAYERS | Specify the layers whose service data should be fetched. Use a comma (,) to separate multiple layers. |
baseline.fetch.metrics | service_cpm,service_percentile | BASELINE_FETCH_METRICS | List of metrics to be monitored. Use a comma (,) to separate multiple names. |
baseline.fetch.predict.directory | ./out_predict | BASELINE_PREDICT_DIRECTORY | The directory where prediction results are saved for later queries. |
baseline.fetch.predict.min_days | 2 | BASELINE_PREDICT_MIN_DAYS | The minimum number of days of data required for metric prediction, preventing inaccuracies due to insufficient data. |
baseline.fetch.predict.frequency | h | BASELINE_PREDICT_FREQUENCY | Specify the frequency of the predicted data. Currently, only hourly (h) is supported. |
baseline.fetch.predict.period | 24 | BASELINE_PREDICT_PERIOD | Specify the number of future data points to predict. |
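
The following sketch illustrates how the default values in this table could be read. It assumes standard five-field cron semantics (the first field is minutes), that the minimum-days check compares the first and last sample times, and that predicted points start at the next full hour. All helper names are hypothetical, and the Predictor's real scheduling and alignment rules may differ.

```python
from datetime import datetime, timedelta


def cron_step_minutes(field):
    """Expand a '*/N' minute field into the minutes of the hour it matches."""
    if not field.startswith("*/"):
        raise ValueError("only '*/N' step fields are handled in this sketch")
    step = int(field[2:])
    return list(range(0, 60, step))


def has_enough_history(first_sample, last_sample, min_days=2):
    """Check that the collected data spans at least min_days."""
    return last_sample - first_sample >= timedelta(days=min_days)


def prediction_hours(last_sample, period=24):
    """List the timestamps of the next `period` hourly points."""
    start = last_sample.replace(minute=0, second=0, microsecond=0)
    return [start + timedelta(hours=i) for i in range(1, period + 1)]


# Minute field of the default cron expression "*/8 * * * *".
minutes = cron_step_minutes("*/8")

# With frequency "h" and period 24, one day of hourly points is predicted.
horizon = prediction_hours(datetime(2024, 1, 3, 10, 30))
```

Under these assumptions, the default schedule runs at minutes 0, 8, 16, and so on up to 56 of every hour, and the default frequency and period together cover the 24 hours following the latest data.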
Currently, VM deployment only supports single-node deployment.
Please ensure that Python >= 3.12 is installed on the local machine.
# install all dependencies
make install
# starting predictor server
python3 -m server.server
Please edit the config.yaml file to deploy in your Kubernetes cluster. The following configurations need to be updated:
- OAP Restful Address: The Restful Address of OAP to fetch metrics data.
- Metrics List: Which metrics need to be fetched for prediction.
Please follow the deployment.yml to deploy the predictor in your Kubernetes cluster. Update the comment in the file, which includes two configs:
- PVC Storage Size: The storage size for storing the prediction results.
Then, you can use kubectl apply -f deployment.yml to deploy the SkyWalking Predictor into your cluster.
NOTE: The Predictor service currently does not support cluster awareness.
Additionally, due to the ReadWriteOnce limitation of PVC, it can only be deployed on a single node.