MLxtend: Providing machine learning and data science utilities and extensions to Python's scientific computing stack

doi:10.21105/JOSS.00638

Paper

MLxtend: Providing machine learning and data science utilities and extensions to Python's scientific computing stack

Published Apr 22, 2018 · S. Raschka

J. Open Source Softw.

524

Citations

32

Influential Citations

PDF

Abstract

MLxtend is a library that implements a variety of core algorithms and utilities for machine learning and data mining. The primary goal of MLxtend is to make commonly used tools accessible to researchers in academia and data scientists in industries focussing on userfriendly and intuitive APIs and compatibility to existing machine learning libraries, such as scikit-learn, when appropriate. While MLxtend implements a large variety of functions, highlights include sequential feature selection algorithms (Pudil, Novovičová, and Kittler 1994), implementations of stacked generalization (Wolpert 1992) for classification and regression, and algorithms for frequent pattern mining (Agrawal and Ramakrishnan 1994). The sequential feature selection algorithms cover forward, backward, forward floating, and backward floating selection and leverage scikit-learn’s cross-validation API (Pedregosa et al. 2011) to ensure satisfactory generalization performance upon constructing and selecting feature subsets. Besides, visualization functions are provided that allow users to inspect the estimated predictive performance, including performance intervals, for different feature subsets. The ensemble methods in MLxtend cover majority voting, stacking, and stacked generalization, all of which are compatible with scikit-learn estimators and other libraries as XGBoost (Chen and Guestrin 2016). In addition to feature selection, classification, and regression algorithms, MLxtend implements model evaluation techniques for comparing the performance of two different models via McNemar’s test and multiple models via Cochran’s Q test. An implementation of the 5x2 cross-validated paired t-test (Dietterich 1998) allows users to compare the performance of machine learning algorithms to each other. Furthermore, different flavors of the Bootstrap method (Efron and Tibshirani 1994), such as the .632 Bootstrap method (Efron 1983) are implemented to compute confidence intervals of performance estimates. All in all, MLxtend provides a large variety of different utilities that build upon and extend the capabilities of Python’s scientific computing stack.

Highly Cited

Study Snapshot

MLxtend is a user-friendly library that provides machine learning and data mining utilities, extending the capabilities of Python's scientific computing stack.

PopulationOlder adults (50-71 years)

Sample size24

MethodsObservational

OutcomesBody Mass Index projections

ResultsSocial networks mitigate obesity in older groups.

MLxtend: Providing machine learning and data science utilities and extensions to Python's scientific computing stack

References

Citations