
vaex-ml: package centred around machine learning related tasks #254

Merged
merged 36 commits into from
Jun 25, 2019


JovanVeljanoski
Member

This is a large PR introducing vaex-ml, a vaex package centred on machine learning tasks and applications. It contains the following modules:

  • vaex.ml.transformations: methods related to preprocessing: scalers, categorical encoders, PCA
  • vaex.ml.cluster: provides an efficient KMeans clustering algorithm
  • vaex.ml.ui: provides means to construct an ipywidget for nearly any transformer in this package
  • vaex.ml.xgboost: a binding to the xgboost library
  • vaex.ml.lightgbm: a binding to the lightgbm library
  • vaex.ml.catboost: a binding to the catboost library
  • vaex.ml.sklearn: a binding to the scikit-learn library. At the moment, only the estimators are supported.
  • vaex.ml.incubator: a module housing various machine learning models. The bindings in the incubator are considered experimental and are under testing. The API, implementation or support may change without notice.
  • vaex.ml.datasets: contains datasets for experimentation and training. It currently includes the classic Titanic and Iris datasets, as well as methods for replicating the Iris dataset to a total of 10^9 samples, creating a "big data" example.
  • vaex.ml.generate: module that auto-generates an alternative API for the transformers and ML models.
  • vaex.ml.pipeline: provides a pipeline object for the vaex-ml transformers and estimators
  • vaex.ml.state: methods for serialisation of vaex objects
  • vaex.ml.linear_model: provides an implementation of linear models that operate on a grid (binned data) instead of individual samples.
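
The idea behind fitting a linear model on a grid can be illustrated with a minimal, pure-Python sketch. This is not the vaex.ml.linear_model implementation: it assumes a single feature, reduces the raw samples to per-bin counts and means on a regular grid, and then fits y = a + b*x by weighted least squares with the bin counts as weights. The helper names bin_data and fit_weighted_line are hypothetical.

```python
def bin_data(xs, ys, xmin, xmax, nbins):
    """Reduce raw samples to per-bin (count, mean x, mean y) on a regular grid.

    Hypothetical helper for illustration; not part of vaex-ml.
    """
    width = (xmax - xmin) / nbins
    count = [0] * nbins
    sum_x = [0.0] * nbins
    sum_y = [0.0] * nbins
    for x, y in zip(xs, ys):
        i = int((x - xmin) / width)
        i = min(max(i, 0), nbins - 1)  # clamp values on the edges into the first/last bin
        count[i] += 1
        sum_x[i] += x
        sum_y[i] += y
    # Keep only non-empty bins; the condition is checked before dividing by n.
    return [(n, sx / n, sy / n) for n, sx, sy in zip(count, sum_x, sum_y) if n > 0]


def fit_weighted_line(bins):
    """Weighted least squares fit of y = a + b*x over bins, weighted by bin counts."""
    w = sum(n for n, _, _ in bins)
    mx = sum(n * x for n, x, _ in bins) / w  # equals the mean x of the raw samples
    my = sum(n * y for n, _, y in bins) / w  # equals the mean y of the raw samples
    sxx = sum(n * (x - mx) ** 2 for n, x, _ in bins)
    sxy = sum(n * (x - mx) * (y - my) for n, x, y in bins)
    b = sxy / sxx
    a = my - b * mx
    return a, b


# Usage: 1000 samples on the exact line y = 2 + 3x, reduced to 20 bins.
xs = [i / 100 for i in range(1000)]
ys = [2.0 + 3.0 * x for x in xs]
bins = bin_data(xs, ys, 0.0, 10.0, 20)
a, b = fit_weighted_line(bins)
```

Once the data are binned, the cost of the fit depends on the number of bins rather than the number of rows, which is what makes this approach attractive for very large DataFrames. Because the within-bin structure is discarded, the result in general only approximates a fit on the raw samples; here it is exact because the toy data lie on a perfect line.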

Everything is covered by a full suite of tests run with pytest.

NB: The contents above and this description itself may change slightly as vaex-ml is integrated with vaex.

maartenbreddels (Member) left a comment

Awesome! I'll fix Travis.
packages/vaex-ml/vaex/ml/__pycache__/ should not be committed.

@@ -21,7 +21,8 @@
 'vaex-server==0.2',
 'vaex-hdf5==0.4',
 'vaex-astro==0.4',
-'vaex-arrow==0.3'
+'vaex-arrow==0.3',
+'vaex-ml==0.4'
Member

I don't think vaex-ml should be installed by default when you install vaex (just like vaex-ui). What do you think?

Member

Especially since vaex-ml now still depends on numba.

Member Author

Why not? I thought it would be good to include it by default. Otherwise I am afraid it will not get noticed, or it will add extra complexity.

For production environments, people can of course choose what to install.

@@ -0,0 +1 @@
!coverage.py: This is a private format, don't read it directly!{"lines":{}}
Member

I don't think this file should go into the repo.

Member Author

Agreed! I tried to exclude it; I will check the .gitignore file again.

Member Author

Same for __pycache__/ and similar files.

JovanVeljanoski merged commit 24bf089 into master on Jun 25, 2019
maartenbreddels
Member

🎉 yeah!
