I am happy to announce that the Datumbox Machine Learning Framework is now open-sourced under GPL 3.0, and you can download its code from GitHub!
What types of models/algorithms are supported?
The framework is divided into several layers: Machine Learning, Statistics, Mathematics, Algorithms and Utilities. Each layer provides a set of classes that are used for training machine learning models. The two most important are the Statistics layer and the Machine Learning layer.
The Statistics layer provides classes for calculating descriptive statistics, performing various types of sampling, estimating CDFs and PDFs from commonly used probability distributions, and performing over 35 parametric and non-parametric tests. Classes of this kind are usually needed for exploratory data analysis, sampling and feature selection.
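To give a feel for the kind of calculation the Statistics layer covers, here is a minimal sketch of three descriptive statistics in plain Java. Note that this is not the framework's API: the class name DescriptiveStats and its methods are hypothetical and exist only for this illustration.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; NOT a Datumbox class.
final class DescriptiveStats {
    private DescriptiveStats() {}

    // Arithmetic mean of a non-empty sample.
    static double mean(double[] x) {
        if (x.length == 0) {
            throw new IllegalArgumentException("empty sample");
        }
        double sum = 0.0;
        for (double v : x) {
            sum += v;
        }
        return sum / x.length;
    }

    // Unbiased sample variance (divides by n - 1).
    static double sampleVariance(double[] x) {
        if (x.length < 2) {
            throw new IllegalArgumentException("need at least two values");
        }
        double m = mean(x);
        double squaredDeviations = 0.0;
        for (double v : x) {
            squaredDeviations += (v - m) * (v - m);
        }
        return squaredDeviations / (x.length - 1);
    }

    // Median; sorts a copy so the caller's array is left untouched.
    static double median(double[] x) {
        if (x.length == 0) {
            throw new IllegalArgumentException("empty sample");
        }
        double[] sorted = x.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1
                ? sorted[mid]
                : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    public static void main(String[] args) {
        double[] data = {2, 4, 4, 4, 5, 5, 7, 9};
        System.out.println("mean     = " + mean(data));
        System.out.println("variance = " + sampleVariance(data));
        System.out.println("median   = " + median(data));
    }
}
```

Summaries like these are typically the first step of exploratory data analysis, before any sampling or statistical testing takes place.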
The Machine Learning layer provides classes that can be used on a wide range of problems, including Classification, Regression, Cluster Analysis, Topic Modeling, Dimensionality Reduction, Feature Selection, Ensemble Learning and Recommender Systems. Here are some of the supported algorithms: LDA, Max Entropy, Naive Bayes, SVM, Bootstrap Aggregating, AdaBoost, K-means, Hierarchical Clustering, Dirichlet Process Mixture Models, Softmax Regression, Ordinal Regression, Linear Regression, Stepwise Regression, PCA and more.
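As a rough idea of what one of the listed algorithms does, the following is a minimal sketch of K-means (Lloyd's algorithm) on one-dimensional points in plain Java. It is a simplified illustration and not the framework's implementation: the class KMeans1D, its method names, and the choice to pass the initial centroids explicitly are all assumptions made for this example.

```java
import java.util.Arrays;

// Hypothetical illustration of K-means on 1-D data; NOT a Datumbox class.
final class KMeans1D {
    private KMeans1D() {}

    // Runs Lloyd's algorithm and returns the final centroids.
    static double[] fit(double[] points, double[] initialCentroids, int maxIterations) {
        double[] centroids = initialCentroids.clone();
        for (int iter = 0; iter < maxIterations; iter++) {
            double[] sums = new double[centroids.length];
            int[] counts = new int[centroids.length];

            // Assignment step: attach every point to its nearest centroid.
            for (double p : points) {
                int c = nearest(centroids, p);
                sums[c] += p;
                counts[c]++;
            }

            // Update step: move each centroid to the mean of its points.
            boolean changed = false;
            for (int c = 0; c < centroids.length; c++) {
                if (counts[c] == 0) {
                    continue; // an empty cluster keeps its previous centroid
                }
                double updated = sums[c] / counts[c];
                if (updated != centroids[c]) {
                    changed = true;
                }
                centroids[c] = updated;
            }
            if (!changed) {
                break; // converged: no centroid moved
            }
        }
        return centroids;
    }

    // Index of the centroid closest to p (ties go to the lower index).
    static int nearest(double[] centroids, double p) {
        int best = 0;
        for (int c = 1; c < centroids.length; c++) {
            if (Math.abs(p - centroids[c]) < Math.abs(p - centroids[best])) {
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] points = {1, 2, 3, 10, 11, 12};
        double[] centroids = fit(points, new double[]{0, 5}, 100);
        System.out.println(Arrays.toString(centroids));
    }
}
```

A production implementation would normally work on multi-dimensional records, choose the initial centroids automatically and support several distance measures; the sketch leaves all of that out to keep the two alternating steps visible.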