Performing `xcs.predict(X)` alters the model when it should not (supervised learning) #58
Note that the first option for solving this may be as simple as changing this line to

```c
if (xcsf->explore) {
    clset_add(&xcsf->pset, new);
}
```

as well as freeing that memory afterwards.
My workaround for now looks like this (I do all this in Python):

Increasing the population size is necessary because otherwise the maximum population size is exceeded and a rule is deleted by roulette-wheel deletion; and if we use a small fitness (which we should, since the fitness is also used as a mixing weight), the probability of the default rule being deleted is rather high. Note that I don't think this is a very good workaround. 😉
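The workaround snippet itself is not reproduced above, but a minimal, library-agnostic sketch of the idea it describes (a maximally general default rule that predicts the data mean, given a deliberately small fitness) might look like the following. All class and function names here are illustrative and not part of the xcsf API:

```python
import numpy as np

class Rule:
    """Toy rule: a hyperrectangle condition plus a constant prediction.
    Purely illustrative; not the xcsf classifier representation."""
    def __init__(self, lower, upper, prediction, fitness):
        self.lower, self.upper = np.asarray(lower), np.asarray(upper)
        self.prediction, self.fitness = prediction, fitness

    def matches(self, x):
        return bool(np.all((self.lower <= x) & (x <= self.upper)))

def add_default_rule(population, X_train, y_train, eps=0.01):
    """Append a rule that matches the whole input space and predicts the
    data mean, with a small fitness so it barely influences mixing."""
    low = np.full(X_train.shape[1], -np.inf)
    high = np.full(X_train.shape[1], np.inf)
    population.append(Rule(low, high, float(np.mean(y_train)), fitness=eps))
    return population

X = np.array([[0.0, 0.0], [1.0, 1.0]])
y = np.array([0.0, 2.0])
pop = add_default_rule([], X, y)
assert pop[0].matches(np.array([100.0, -100.0]))  # matches everything
assert pop[0].prediction == 1.0                   # the data mean
```

As the comment above notes, the maximum population size would also need to be raised by one alongside this, so that roulette-wheel deletion does not remove the low-fitness default rule.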
You're right that generally we wouldn't expect `predict()` to alter the model.

I think a simple temporary solution from Python would be to checkpoint the population set before predicting and restore it afterwards.

A proper solution for this will require some thought. It would be reasonably simple to make it return user-defined values instead of covering when calling `predict()`. I think in practice, if there is no matching rule, you would want to impute the value? Something like using the most common class or picking the nearest training sample, etc., which I guess could be done manually in Python afterwards if those samples could be identified.
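The imputation idea could be sketched roughly as follows. Here `preds` and `matched` are hypothetical stand-ins for however the model's raw predictions and a per-sample "did any rule match" mask would be obtained; they are assumptions, not real xcsf calls:

```python
import numpy as np

def predict_with_imputation(preds, matched, X_train, y_train, X):
    """For samples where no rule matched, fall back to the label of the
    nearest training sample instead of relying on covering. All inputs
    are plain arrays; obtaining `matched` from the library is left open."""
    preds = np.array(preds, dtype=float)
    for i in np.where(~np.asarray(matched))[0]:
        nearest = int(np.argmin(np.linalg.norm(X_train - X[i], axis=1)))
        preds[i] = y_train[nearest]
    return preds

X_train = np.array([[0.0], [10.0]])
y_train = np.array([1.0, 5.0])
X = np.array([[9.0], [1.0]])
out = predict_with_imputation([0.0, 0.0], [False, True], X_train, y_train, X)
# first sample is unmatched: nearest training sample is [10.0], so label 5.0
```

A most-common-class fallback would simply replace the nearest-neighbour lookup with the mode of `y_train`.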
Ah, nice, I was not aware of that workaround. Thank you! I'll have to benchmark how much of a performance problem that is when compared to adding a default rule, though.
That would definitely be an option.
I'm wrapping XCS into an (almost) scikit-learn-compatible Python object anyway, and within that object I'm already computing a default prediction (right now I go with the data mean), which is used to create the default rule. Being able to provide the default prediction directly to `predict()` would be convenient.
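A sketch of that wrapper arrangement, assuming a generic underlying model with `fit`/`predict` (the xcsf specifics, and how the default is actually injected as a rule, are deliberately left open; `ConstantModel` is a made-up stand-in):

```python
import numpy as np

class DefaultPredictionWrapper:
    """(Almost) scikit-learn-style wrapper that records the training-target
    mean at fit time as a default prediction. How that default is used by
    the underlying model (e.g. to build a default rule) is left open."""
    def __init__(self, model):
        self.model = model

    def fit(self, X, y):
        self.default_prediction_ = float(np.mean(y))
        self.model.fit(X, y)
        return self

    def predict(self, X):
        return self.model.predict(X)

class ConstantModel:
    """Trivial stand-in model for demonstration only."""
    def fit(self, X, y):
        self.c_ = float(np.mean(y))
    def predict(self, X):
        return np.full(len(X), self.c_)

wrapper = DefaultPredictionWrapper(ConstantModel())
wrapper.fit(np.zeros((4, 2)), np.array([1.0, 2.0, 3.0, 2.0]))
# wrapper.default_prediction_ is the data mean, 2.0
```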
Work around xcsf-dev/xcsf#58.
Hi! 🙂

When doing supervised learning, I'd expect `model.predict(X)` not to change `model`. However, `xcs.predict(X)` does sometimes change the model. This is especially problematic for large, high-dimensional data sets.

Why is the model even changed by `xcs.predict(X)`? Due to how XCSF performs covering: the classifiers created by covering are simply added to the population (and, correspondingly, other classifiers are deleted). In my opinion, this is generally undesirable when doing supervised learning, where users expect `model.predict(X)` not to alter the model. Also, in most cases, supervised learning fits the model once and from then on only performs predictions, which means that there will never be a fitness signal for these newly created classifiers.

This is especially problematic for high-dimensional data, where new data is matched by existing classifiers with only a low probability. In my case, I had a large test data set (50000 test data points, 20 dimensions), and upon doing `xcs.predict(X)` the existing, fitted population essentially got erased (all classifiers were replaced by random new classifiers with experience 0 and a correspondingly bad fit). While I'd expect the predictions to be bad in this case, I would definitely not expect the model's state to be erased.

How could this be solved? I guess one way to approach this while keeping the overall XCSF character would be, upon `xcs.predict(X)`, to generate covering classifiers as necessary to perform the prediction but to not put them into the population. Another way would be to add a default rule which matches everything and predicts the data mean or something like that.

What do you think?
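The first proposal (covering for the prediction only, without touching the population) can be sketched in a simplified, library-independent form. Mixing is reduced here to a fitness-weighted average, and all names are illustrative rather than xcsf internals:

```python
def predict_one(population, x, make_cover):
    """Predict for a single input x. If no classifier in the population
    matches, create a temporary covering classifier for this call only;
    it is never added to the population, so predict has no side effects."""
    matching = [cl for cl in population if cl.matches(x)]
    if not matching:
        matching = [make_cover(x)]  # temporary, discarded after this call
    total_fitness = sum(cl.fitness for cl in matching)
    return sum(cl.fitness * cl.predict(x) for cl in matching) / total_fitness

class Cl:
    """Minimal demo classifier with a constant prediction."""
    def __init__(self, pred, fitness=1.0):
        self.pred, self.fitness = pred, fitness
    def matches(self, x):
        return False  # force covering in this demo
    def predict(self, x):
        return self.pred

pop = [Cl(3.0)]
y = predict_one(pop, 0.0, make_cover=lambda x: Cl(0.5))
assert len(pop) == 1   # population untouched by prediction
assert y == 0.5        # prediction came from the temporary cover
```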