MindWare is an efficient open-source system to help users to automate the process of 1) data pre-processing, 2) feature engineering, 3) algorithm selection, 4) architecture design, 5) hyper-parameter tuning, and 6) model ensembling. It is capable of improving its AutoML power by decomposing the entire large AutoML search space into small ones, and solve each sub-problems jointly and efficiently. MindWare is designed and developed by the AutoML team from the DAIR Lab at Peking University. The goal is to make machine learning easier to apply both in industry and academia, and help facilitate data science.
- Non-expert users who want to use machine learning techniques in their applications.
- ML Platform owners who want to support AutoML in their platform.
- Researchers and data scientists who want to experiment new AutoML algorithms, may it be: hyper-parameter tuning algorithm, neural architect search method, etc.
- User friendliness. MindWare needs few human assistance. To use MindWare, the users can define the task by writing only a few lines of code, regardless of the techinical details of the execution of the system.
- High extensibility. New state-of-the-art ML algorithms or feature engineer operations can be added to the system. The decomposition techniques in MindWare ensures the efficiency of finding the best configurations over the enlarged search space.
- Advanced characteristic. MindWare provides special supports for large datasets. In addition, MindWare enables transfer-learning, meta-learning techniques to make AutoML with more intelligent behaviors.
MindWare requires python>=3.6. There are two ways to install MindWare:
MindWare is available on PyPI. You can install it by tying:
pip install mindware
If you want to try the latest version, please manually install MindWare from source code by:
git clone https://github.com/thomas-young-2013/mindware.git && cd mindware
cat requirements/main.txt | xargs -n 1 -L 1 pip install
python setup.py install --user
For more detailed installation instructions, please refer to our Installation Guide Document.
Here is a brief example that uses the package.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from mindware.utils.data_manager import DataManager
from mindware.estimators import Classifier
iris = load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=1, stratify=y)
dm = DataManager(X_train, y_train)
train_data = dm.get_data_node(X_train, y_train)
test_data = dm.get_data_node(X_test, y_test)
clf = Classifier(time_limit=3600)
clf.fit(train_data)
pred = clf.predict(test_data)
For more details and characteristics, please read the documents and examples.
MindWare has a frequent release cycle. Please let us know if you encounter a bug by filling an issue.
We appreciate all contributions. If you are planning to contribute any bug-fixes, please do so without further discussions.
If you plan to contribute new features, new modules, etc. please first open an issue or reuse an existing issue, and discuss the feature with us.
To learn more about making a contribution to MindWare, please refer to our How-to contribution page.
We appreciate all contributions and thank all the contributors!
- Check the existing open and closed issues.
- File an issue on GitHub.
- Discuss on the MindWare Gitter.
Targeting at openness and advancing state-of-art technology, we have also released several open source projects.
- OpenBOX: an open source system and service to efficiently solve generalized blackbox optimization problems.
We encourage researchers to leverage the project to accelerate the AI development and research.
VolcanoML: Speeding up End-to-End AutoML via Scalable Search Space Decomposition Yang Li, Yu Shen, Wentao Zhang, Jiawei Jiang, Bolin Ding, Yaliang Li, Jingren Zhou, Zhi Yang, Wentao Wu, Ce Zhang and Bin Cui International Conference on Very Large Data Bases (VLDB 2021). https://arxiv.org/abs/2107.08861
Efficient Automatic CASH via Rising Bandits
Yang Li, Jiawei Jiang, Jinyang Gao, Yingxia Shao, Ce Zhang and Bin Cui
Proceedings of the AAAI Conference on Artificial Intelligence (AAAI 2020).
https://ojs.aaai.org/index.php/AAAI/article/view/5910
MFES-HB: Efficient Hyperband with Multi-Fidelity Quality Measurements Yang Li, Yu Shen, Jiawei Jiang, Jinyang Gao, Ce Zhang and Bin Cui Proceedings of the AAAI Conference on Artificial Intelligence (AAAI 2021). https://arxiv.org/abs/2012.03011
OpenBox: A Generalized Black-box Optimization Service Yang Li, Yu Shen, Wentao Zhang, Yuanwei Chen, Huaijun Jiang, Mingchao Liu, Jiawei Jiang, Jinyang Gao, Wentao Wu, Zhi Yang, Ce Zhang and Bin Cui ACM SIGKDD Conference on Knowledge Discovery and Data Mining (SIGKDD 2021). https://arxiv.org/abs/2106.00421
The entire codebase is under MIT license.