Efficient Click-Through Rate (effCTR) implements Logistic Regression with Stochastic Gradient Descent (SGD) and Naive Bayes in Python. It utilises sparse matrices from `scipy.sparse` to achieve up to a 60x speedup compared to the implementations from scikit-learn.
Below, more information is provided about the Naive Bayes and Logistic Regression with SGD implementations developed in this repository. For each algorithm, a simple user guide is presented. Furthermore, a notebook with an example of how the algorithms can be used on a real dataset can be found here.
To install the package from PyPI:

```shell
pip3 install effCTR
```
One can also install it directly from GitHub:

```shell
pip3 install git+https://github.com/ksolarski/effCTR.git
```
Due to the high dimension of the feature matrix in this problem, fitting Logistic Regression with a closed-form or Newton-type solver is infeasible, since the required matrix cannot be inverted at this scale. Hence, Stochastic Gradient Descent is applied to this problem. The loss function is the negative log-likelihood:

$$L(w) = -\sum_{i} \left[ y_i \log \sigma(x_i^\top w) + (1 - y_i) \log\left(1 - \sigma(x_i^\top w)\right) \right],$$

where $\sigma(z) = 1 / (1 + e^{-z})$ is the sigmoid function.
Then the gradient with respect to the weights is given by:

$$\nabla_w L = X^\top \left( \sigma(Xw) - y \right).$$
Hence, in each iteration of the algorithm, the weights are updated on a batch $(X_b, y_b)$ in the following way:

$$w \leftarrow w - \eta \, X_b^\top \left( \sigma(X_b w) - y_b \right),$$
where the learning rate $\eta$ is set via the argument `learning_rate`. One can also specify how many passes to make through the dataset with the argument `max_epochs`, how large a chunk of data is used in each iteration with the argument `chunksize`, and whether to iterate through consecutive batches or draw batches randomly with the argument `randomized`. One can also add early stopping by using the arguments `early_stopping` and `n_iter_no_change`. The demo shows that the implementation from this repo can be ~60x faster on large datasets than the implementation from scikit-learn.
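The batch update above can be sketched directly with `scipy.sparse`. This is a minimal illustration of the technique, not the package's actual internals; the synthetic `X`, `y` and the `sigmoid` helper are assumptions made for the example:

```python
import numpy as np
from scipy import sparse

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# synthetic sparse data standing in for a real click-through dataset
rng = np.random.default_rng(0)
X = sparse.random(1000, 50, density=0.05, format="csr", random_state=0)
y = rng.integers(0, 2, size=1000).astype(float)

w = np.zeros(X.shape[1])
learning_rate, chunksize = 0.1, 100

# one epoch over consecutive batches (the randomized=False behaviour)
for start in range(0, X.shape[0], chunksize):
    X_b = X[start:start + chunksize]          # sparse slice stays sparse
    y_b = y[start:start + chunksize]
    grad = X_b.T @ (sigmoid(X_b @ w) - y_b)   # gradient of the log-loss on the batch
    w -= learning_rate * grad                 # SGD update
```

Because `X_b` stays in CSR format, each gradient step costs time proportional to the number of nonzeros in the batch rather than to its full dense size, which is where the speedup on sparse click data comes from.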
To fit the model:

```python
from effCTR.models.Logistic_SGD import Logistic_SGD

Logistic_SGD = Logistic_SGD()
Logistic_SGD.fit(X, y)
```
Afterwards, one can obtain predictions:

```python
preds_logistic = Logistic_SGD.predict(X)
```
One can also use the methods `plot_weights` and `log_likelihood` to plot how the weights and the log-likelihood change throughout the training process.
Under the assumption of conditional independence and using Bayes' theorem, one can show that the probability of a click given a set of features $x = (x_1, \dots, x_p)$ is:

$$P(\text{click} \mid x) = \frac{P(\text{click}) \prod_{j=1}^{p} P(x_j \mid \text{click})}{P(x)},$$

where the prior $P(\text{click})$ and the conditional probabilities $P(x_j \mid \text{click})$ are estimated from frequencies in the training data.
However, an estimated probability can be zero (when there are no observations for a particular feature value given click, or given no click), which makes the corresponding log-probability undefined. Consequently, zero probabilities can be replaced by a small constant via the arguments `replace_by_epsilon` and `epsilon`.
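The estimation and the epsilon replacement can be sketched as follows. This is only an illustration of the idea under a Bernoulli (binary-feature) assumption, not the package's actual code; the synthetic data and the variable names are assumptions made for the example:

```python
import numpy as np
from scipy import sparse

# synthetic binary sparse features and click labels
rng = np.random.default_rng(1)
X = sparse.random(500, 20, density=0.1, format="csr", random_state=1)
X.data[:] = 1.0
y = rng.integers(0, 2, size=500)

epsilon = 1e-10  # small constant standing in for the epsilon argument

# per-feature estimates of P(x_j = 1 | click) and P(x_j = 1 | no click)
p1 = np.asarray(X[y == 1].mean(axis=0)).ravel()
p0 = np.asarray(X[y == 0].mean(axis=0)).ravel()

# replace zero (and one) probabilities so the log-probabilities stay finite
p1 = np.clip(p1, epsilon, 1 - epsilon)
p0 = np.clip(p0, epsilon, 1 - epsilon)

# log-odds of a click under conditional independence
prior = y.mean()
log_odds = (
    np.log(prior / (1 - prior))
    + X @ np.log(p1 / p0)
    + (1 - X.toarray()) @ np.log((1 - p1) / (1 - p0))
)
preds = log_odds > 0
```

Without the clipping step, a single feature value never seen for one class would push the corresponding log-probability to minus infinity and dominate every prediction.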
To fit the model:

```python
from effCTR.models.Naive_Bayes import Naive_Bayes

Naive_Bayes = Naive_Bayes()
Naive_Bayes.fit(X, y)
```
Afterwards, one can obtain predictions:

```python
preds_bayes = Naive_Bayes.predict(X)
```
There is no fixed roadmap for this project; it was created for fun and for learning purposes. Contributions and collaborations are welcome and encouraged.