The company wants to develop an accurate compensation delivery base on the work position level.
Because the scatter plot of the datasets shows the non-linear relationship (As shown below), hence the model chose is the SVR machine learning model to build the non-linear regression.
-
SVR Class in the scikit-learn does not include the feature scaling procedure, hence the data need to be preprocessing before using to train the model.
-
The input for the prediction need to be scaled as well.
- The algorithms are sensitive to the magnitude. Non scaled inputs could deviate the weights and contribute to the bad model.
- Can faster the calculation.
SVR model captures the relationship between the position level and the according to salary. The model does not affect by an outlier, which is the Ceo who has a much higher income than the other C-level employees.
Here is an interesting article about the CEO compensation. Statistical research on the correlation between CEO income and the company stock price performance.
https://www.institutionalinvestor.com/article/b1db3jy3201d38/The-MBA-Myth-and-the-Cult-of-the-CEO
The SVR model can successfully predict the accurate salary base on the position level, and it is robust.