-
-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FeaturesRefactor] Features::view()/preprocess() #3970
[FeaturesRefactor] Features::view()/preprocess() #3970
Conversation
- implement DotIterator, SGMatrix and SGVector iterators - Perceptron training with iterators + unit test SGVector: make SGVector(index_t len) 'ctor zero-initialize memory with SG_CALLOC like SGMatrix(index_t rows, index_t cols) 'ctor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice one! we should talk about some :)
SGMatrix<ST> target(first_vec.vlen, get_num_vectors()); | ||
target.set_column(0, first_vec); | ||
|
||
for (index_t i = 1; i < get_num_vectors(); ++i) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this whole part above should be rather done by a smart memcpy instead of setting the columns separately. needs a stride etc.
@@ -1067,6 +1048,24 @@ CFeatures* CDenseFeatures<ST>::create_merged_copy(CFeatures* other) | |||
return create_merged_copy(list); | |||
} | |||
|
|||
template <class ST> | |||
Some<CDenseFeatures<ST>> CDenseFeatures<ST>::view(const SGVector<index_t>& subset) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the problem with this is that this right away makes it inaccessible/usable from SWIG interfaces... we are having difficulties with exposing Some in SWIG
auto feats_view = wrap(this->duplicate()); | ||
|
||
auto sg_subset = SGVector<index_t>(subset.size()); | ||
std::copy(subset.cbegin(), subset.cend(), sg_subset.data()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sg_memcpy
? as both std::vector and SGVector backed by a continuous memory chunk
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
This issue is now being closed due to a lack of activity. Feel free to reopen it. |
[Work in progress]
Continues #3968, only the last commit is relevant.
The design is outlined in this gist.
I might split this into a few PRs that builds on the fundamental changes [1-2-3],
[4] cross validation, [5]
add/remove_subset()
replacement, [6] preprocessors.Anyway, what's in here:
Base methods to construct new features
Features::preprocess()/view()
.Base method to construct new labels
Labels::view()
, this requires a shallow copy 'ctorlike in
Features
classes (duplicate()
+ copy 'ctors),implemented for (Dense/Binary/Multiclass/Regression)Labels.
DenseFeatures
:get_feature_vector()
method;eval()
method;get_feature_matrix()
constness.CrossValidation
: useview()
instead ofadd/remove_subset()
.Replace
add/remove_subset()
withview()
+ unit tests failing due toDenseFeatures
changes [see below]:StochasticGBMachine
: refactorget_subset()
and related functionsto use a new labels object instead of adding the subset to
m_labels
,refactor unit tests and add case that covers
subset_frac < 1
branch;LMNNImpl
: refactor to usepreprocess()
instead ofapply_to_feature_matrix()
;CARTree
: fix feature matrix constness, minor changes to unit test (some
);NormOne::apply_to_feature_vector()
uses linalg;PruneVarSubMean::init()
to useDotFeatures::get_mean()/get_std()
(the latter was added to
DotFeatures
), add unit test;RandomFourierGaussPreproc
: fix feature matrix constness;MultipleProcessors
unit test with eval case;LeastAngleRegression.ols_equivalence
unit test to usepreprocess()
[Misc] Initializer list constructor for
SGMatrix/Vector
(not a big breakthrough but anyway useful for unit tests with fixed data...)
Unit tests: