Consider a black box: you receive values from a proximity sensor as time-series data. You want to run inference on those values to predict certain classes, using an approach that is optimized as far as possible and adds no overhead during real-time communication.
You have the sensor and a microcontroller board (most of this repository targets STM32 ultra-low-power boards, STM32-ULP).
A possible pipeline could be: (pipeline diagram)
A more detailed pipeline could be: (expanded pipeline diagram)
One possible approach for data annotation after raw data collection from sensors:
Label Studio - https://labelstud.io/
An example of how the labeling configuration can be set up in Label Studio to represent the different classes:
```xml
<View>
  <Header value="Time Series classification" style="font-weight: normal"/>
  <Choices name="pattern" toName="ts">
    <Choice value="Left_Prox"/>
  </Choices>
  <TimeSeriesLabels name="label" toName="ts">
    <Label value="phase_1" background="red"/>
    <Label value="phase_2" background="green"/>
    <Label value="phase_3" background="yellow"/>
    <Label value="phase_4" background="blue"/>
  </TimeSeriesLabels>
  <TimeSeries name="ts" value="$Left_Prox" valueType="url">
    <Channel column="Left_Prox"/>
  </TimeSeries>
</View>
```
What the data could look like:
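As a minimal sketch of the single-channel format the config above expects, the snippet below generates such a CSV; the column name must match the `<Channel column="Left_Prox"/>` entry, and the values are synthetic placeholders, not real sensor data:

```python
# Illustrative only: write a synthetic CSV in the format the Label Studio
# config above expects (one column named exactly "Left_Prox").
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)
samples = rng.integers(low=400, high=900, size=200)  # fake proximity readings
pd.DataFrame({"Left_Prox": samples}).to_csv("left_prox.csv", index=False)
```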
Transforming raw data into features
The performance of ML models heavily depends on the relevance of the features used to train them.
Five processes in feature engineering:
- Feature Creation
- Feature Transformation
- Feature Extraction
- Feature Selection
- Feature Scaling
Ref: https://www.geeksforgeeks.org/what-is-feature-engineering/
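As an illustration of the extraction step, here is a minimal sketch of windowed feature extraction from a raw 1-D signal; the window size, step, and feature set are assumptions for illustration, not this repository's actual configuration:

```python
import numpy as np

def extract_features(signal: np.ndarray, window: int = 64, step: int = 32) -> np.ndarray:
    """Return one row of (mean, std, min, max, RMS) per sliding window."""
    rows = []
    for start in range(0, len(signal) - window + 1, step):
        w = signal[start:start + window]
        rows.append([w.mean(), w.std(), w.min(), w.max(), np.sqrt(np.mean(w ** 2))])
    return np.asarray(rows)

# Example: 1000 fake proximity samples -> (n_windows, 5) feature matrix
features = extract_features(np.random.default_rng(0).normal(size=1000))
```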
Some of the feature selection methods used here, both general-purpose and specific to LinearSVC (the model used):

General approaches:

- minimum Redundancy - Maximum Relevance (mRMR) - https://github.com/smazzanti/mrmr (see the sketch after this list)
- SHAP (SHapley Additive exPlanations) - explains the impact of different features on the output of the model. Its values show how much each feature contributes to the model's decision, providing insight into feature importance. In a SHAP summary plot, features at the top have a higher impact on the model's output than those at the bottom, and the wider a feature's spread from 0, the larger its impact (see the sketch after this list).
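A minimal mRMR sketch, following the usage shown in the linked repository's README (installed via `pip install mrmr_selection`); the toy data and `K` value are placeholders:

```python
import numpy as np
import pandas as pd
from mrmr import mrmr_classif  # pip install mrmr_selection

# Toy feature matrix/labels, only to make the snippet self-contained
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(200, 40)), columns=[f"f{i}" for i in range(40)])
y = pd.Series(rng.integers(0, 2, size=200))

# Returns the names of the K features with maximum relevance / minimum redundancy
selected = mrmr_classif(X=X, y=y, K=10)
X_selected = X[selected]
```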
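And a minimal SHAP sketch; `LinearExplainer` suits linear models such as LinearSVC, and the data here are again synthetic placeholders:

```python
import numpy as np
import pandas as pd
import shap
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(200, 10)), columns=[f"f{i}" for i in range(10)])
y = rng.integers(0, 2, size=200)

svc = LinearSVC(dual=False).fit(X, y)
explainer = shap.LinearExplainer(svc, X)  # X doubles as the background dataset
shap_values = explainer.shap_values(X)
# Beeswarm summary: features sorted by impact; spread from 0 shows magnitude
shap.summary_plot(shap_values, X)
```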
Model-specific approaches:
SVM weighting - using the absolute coefficients of the fitted model. These weights tell how important each feature is to the decision boundary created by the model: the larger the absolute value of a weight, the more significant that feature is for classification.
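A minimal sketch of this ranking on synthetic data (names and sizes are placeholders):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=200, n_features=20, random_state=0)
svc = LinearSVC(dual=False).fit(X, y)

# Binary case: coef_ has shape (1, n_features); rank by absolute weight
importance = np.abs(svc.coef_).ravel()
order = np.argsort(importance)[::-1]  # most significant features first
print("top features:", order[:5])
```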
Recursive feature elimination (RFE) - the least important features are iteratively pruned from the current set of features, refitting the estimator after each elimination step.
Example (imports added for completeness):

```python
from sklearn.svm import LinearSVC
from sklearn.feature_selection import RFE

# Weighted linear SVM (class 1 weighted 10x against class imbalance)
svc = LinearSVC(C=0.1, dual=False, loss='squared_hinge', penalty='l2',
                class_weight={0: 1, 1: 10}, max_iter=10000, verbose=True)
# Prune one feature per step until 30 remain
rfe = RFE(estimator=svc, n_features_to_select=30, step=1)
rfe.fit(X_train_standardized, y_train)
X_train_rfe = rfe.transform(X_train_standardized)  # keep selected features only
X_test_rfe = rfe.transform(X_test_standardized)
svc.fit(X_train_rfe, y_train)  # final model on the reduced feature set
```
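Continuing the snippet above, the surviving features can be inspected through RFE's fitted attributes:

```python
# Boolean mask over the original feature columns (True = kept)
selected_mask = rfe.support_
# Rank per feature: 1 = selected, larger values were pruned earlier
feature_ranks = rfe.ranking_
```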