Double-click the ML Engine icon on the desktop to open ML Engine.
Make sure that you have created and saved a dataset before you begin creating models.
ML Engine has two main tabs: Classification and Regression. Depending on the data and the problem, you can create classification or regression models using a range of algorithms. To create a classification model, click the Classification tab; to create a regression model, click the Regression tab.
Classification is a supervised learning approach in which an algorithm learns from labeled input data (data whose classes are already known) and uses what it has learned to classify new data.
Regression is a supervised learning approach for predicting a continuous outcome variable (y) from the values of one or more predictor variables (x).
The process of creating a model is the same for classification and regression: it follows the Create Model > Cross Validation Details submenu.
Model creation starts with a base model, after which further models can be created using various algorithms. To start, select the Classification or Regression tab and click +Add Base Model.
- Select the raw Data File
- Select dataset for modelling
- Provide a name for the Base Model
- Click Select Dataset
- Select an algorithm
- Tune the parameters for the selected algorithm. Default parameters are preset; change them if needed.
- Click Select Classifier/Regressor
- Click Create New Model Version to create a new model using the selected algorithm and the parameters.
You can also click the AutoPilot button to have ML Engine automatically create and validate ML models using several algorithms. You can create as many versions as you need by changing the parameters of an algorithm or by selecting other algorithms.
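This guide does not say how AutoPilot chooses algorithms or how the cross-validation behind Create Model > Cross Validation Details is configured. The sketch below is only a generic, hypothetical illustration of the idea of creating several versions: each parameter value (here `n_neighbors` of a toy one-feature k-nearest-neighbours classifier) is scored with k-fold cross-validation, and the best-scoring one is kept. All function names and data are made up; this is not ML Engine's implementation.

```python
from collections import Counter
from statistics import mean


def k_fold_indices(n, k):
    """Yield (train, test) index lists; sample i is placed in test fold i % k."""
    for fold in range(k):
        test = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        yield train, test


def knn_predict(train_x, train_y, x, n_neighbors):
    """Majority vote among the n_neighbors training points closest to x (one feature)."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:n_neighbors]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def cross_val_accuracy(xs, ys, n_neighbors, k=5):
    """Mean accuracy over k folds for one parameter setting (one hypothetical 'version')."""
    fold_scores = []
    for train, test in k_fold_indices(len(xs), k):
        train_x = [xs[i] for i in train]
        train_y = [ys[i] for i in train]
        correct = sum(knn_predict(train_x, train_y, xs[i], n_neighbors) == ys[i] for i in test)
        fold_scores.append(correct / len(test))
    return mean(fold_scores)


# Made-up data: one feature, two classes.
xs = [1.0, 1.5, 2.0, 2.2, 3.1, 6.0, 6.5, 7.1, 7.4, 8.0]
ys = ["a", "a", "a", "a", "a", "b", "b", "b", "b", "b"]

# One "version" per parameter value; keep the best-scoring one.
results = {n: cross_val_accuracy(xs, ys, n) for n in (1, 3, 5)}
best = max(results, key=results.get)
print(results, "best n_neighbors:", best)
```

Assigning folds by `i % k` rather than in contiguous blocks keeps both classes in every fold for this sorted toy data.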
The Leader board lists all versions of the base model, sorted in descending order of performance. For each model, the following information is shown:
Version - Tag - the auto-generated version number and tag of the model, shown as V."number" - "Tag name". For example, V.1 - v.acd (as shown in Figure 24 #4) means Version 1 of base model myIrismodel, tagged as v.acd.
To rename a tag, select the model, type a tag name in the text box, and click Rename Tag.
Dataset - the "raw data" - "dataset" combination used to create the model
Algorithm - the algorithm used to create the model
Rank - rank of the model, based on its performance compared with all other versions of the selected base model
Accuracy (classification) - model accuracy as a percentage
Error (regression) - how well the model predicts the response variable; lower is better. The error value shown is the mean absolute error explained in step 5 (model performance -> Regression).
Doc - auto-generated documentation for the model (see model manager for more information)
Publish - click to publish a model to a reviewer (see publish model for details)
Delete - delete model
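The ranking rule can be sketched as follows. This is a hypothetical model of the Leader board, not ML Engine's code: the `ModelVersion` fields and the `build_leaderboard` function are invented for illustration, and it assumes that "best performance" means the highest accuracy for classification and the lowest error (MAE) for regression.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ModelVersion:
    """One row of a hypothetical leaderboard (field names are illustrative)."""
    version: int
    tag: str
    dataset: str
    algorithm: str
    accuracy: Optional[float] = None  # classification: percent, higher is better
    error: Optional[float] = None     # regression: MAE, lower is better
    rank: int = 0

    @property
    def label(self) -> str:
        # Mirrors the V."number" - "Tag name" display format.
        return f"V.{self.version} - {self.tag}"


def build_leaderboard(versions: List[ModelVersion], task: str) -> List[ModelVersion]:
    """Order versions best-first and assign ranks starting at 1."""
    if task == "classification":
        ordered = sorted(versions, key=lambda v: v.accuracy, reverse=True)
    elif task == "regression":
        ordered = sorted(versions, key=lambda v: v.error)
    else:
        raise ValueError(f"unknown task: {task!r}")
    for position, v in enumerate(ordered, start=1):
        v.rank = position
    return ordered


# Made-up versions of a base model such as myIrismodel.
board = build_leaderboard(
    [
        ModelVersion(1, "v.acd", "iris_raw - iris_v1", "DecisionTree", accuracy=93.3),
        ModelVersion(2, "v.knn", "iris_raw - iris_v1", "KNeighbors", accuracy=96.7),
    ],
    "classification",
)
for v in board:
    print(v.rank, v.label, v.algorithm, v.accuracy)
# 1 V.2 - v.knn KNeighbors 96.7
# 2 V.1 - v.acd DecisionTree 93.3
```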
Evaluate Model Performance
Classification Performance Metrics
Model performance is shown in the form of a pie chart in which green represents the accuracy of the model and red represents the error.
Accuracy - the number of correct predictions divided by the total number of predictions (shown in %); in other words, the fraction of predictions the model got right. Accuracy can be misleading, especially when classes are imbalanced, so also consider the other metrics below when evaluating a model.
Error - 100 minus accuracy (shown in %)
F1_score - the harmonic mean of precision and recall (multiplied by 100). It is usually more informative than accuracy when the class distribution is uneven. Read about Precision, Recall and F1_score here.
Hamming_loss - the fraction of labels that are predicted incorrectly, that is, the number of incorrectly predicted labels divided by the total number of labels.
Additional performance metrics are available in the auto-generated documentation, including Precision, Recall and Jaccard similarity score (see Model Documentation).
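The classification formulas above can be checked by hand with a short sketch. It assumes a binary problem with label 1 as the positive class for precision, recall and F1; how ML Engine averages F1 across several classes is not stated in this guide. The function names are illustrative, and the made-up example shows accuracy looking good on imbalanced data while F1 does not.

```python
def accuracy_pct(y_true, y_pred):
    """Correct predictions divided by total predictions, as a percentage."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return 100 * correct / len(y_true)


def error_pct(y_true, y_pred):
    """Error as shown on the pie chart: 100 minus accuracy."""
    return 100 - accuracy_pct(y_true, y_pred)


def precision_recall_f1(y_true, y_pred, positive):
    """Binary precision, recall and F1 (harmonic mean) for the given positive label."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def hamming_loss(y_true, y_pred):
    """Fraction of individual labels predicted wrongly (one 0/1 label list per sample)."""
    total = sum(len(labels) for labels in y_true)
    wrong = sum(a != b for t, p in zip(y_true, y_pred) for a, b in zip(t, p))
    return wrong / total


# Imbalanced made-up example: 3 positives out of 10, and the model finds only one.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(accuracy_pct(y_true, y_pred))   # 80.0
print(error_pct(y_true, y_pred))      # 20.0
_, _, f1 = precision_recall_f1(y_true, y_pred, positive=1)
print(round(100 * f1, 1))             # 50.0

# Multi-label example: 1 wrong label out of 6.
print(hamming_loss([[1, 0, 1], [0, 1, 0]], [[1, 1, 1], [0, 1, 0]]))
```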
Regression Performance Metrics
Model performance is shown in a bar chart of the four main performance metrics, explained below:
Root Mean Square Error (RMSE) - the standard deviation of the residuals (prediction errors). Residuals measure how far data points are from the regression line, so RMSE measures how spread out the residuals are, that is, how concentrated the data is around the line of best fit.
Mean Absolute Error (MAE) - the average of the absolute differences between the target and predicted values. It measures the average magnitude of the errors in a set of predictions, without considering their direction.
Mean Squared Log Error (MSLE) - the mean of the squared differences between log(1 + target) and log(1 + prediction). Because a difference of logarithms is the logarithm of a ratio, MSLE reflects the ratio between the true and predicted values (relative error) rather than their absolute difference.
Median Absolute Error (MEDAE) - the median of all absolute differences between the target and the prediction. It is robust to outliers.
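The four regression metrics can be written out directly. This is a minimal sketch, not ML Engine's code; the function names are illustrative, and MSLE assumes values greater than -1 (in practice non-negative) so that log(1 + value) is defined. In the made-up data the last prediction is far off, which shows RMSE and MAE reacting to the outlier while the median absolute error barely moves.

```python
import math
from statistics import median


def rmse(y_true, y_pred):
    """Square root of the mean squared residual."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean of the absolute residuals (direction ignored)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def msle(y_true, y_pred):
    """Mean squared difference of log(1 + value); values must be greater than -1."""
    return sum((math.log1p(t) - math.log1p(p)) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def medae(y_true, y_pred):
    """Median of the absolute residuals; a single outlier barely moves it."""
    return median([abs(t - p) for t, p in zip(y_true, y_pred)])


y_true = [3.0, 5.0, 2.5, 7.0, 10.0]
y_pred = [2.5, 5.0, 4.0, 8.0, 30.0]    # made-up; the last prediction is an outlier
print(round(rmse(y_true, y_pred), 2))  # 8.98
print(mae(y_true, y_pred))             # 4.6
print(medae(y_true, y_pred))           # 1.0
print(msle(y_true, y_pred))
```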
Other Performance Metrics
Click this icon
Receiver Operating Characteristic (ROC) curve (classification models only) - a plot of the true positive rate (sensitivity) against the false positive rate (1 - specificity). The ROC curve shows the tradeoff between sensitivity and specificity: increasing sensitivity generally comes at the cost of specificity. It represents how well a model can distinguish between classes: curves closer to the top-left corner indicate better performance, and the closer the curve comes to the 45-degree diagonal of the ROC space, the less accurate the test.
A plot of predicted versus original (actual) values
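To make the ROC description concrete, the sketch below computes ROC points by sweeping a decision threshold over a binary classifier's scores, assuming class 1 is the positive class. It is a generic, hypothetical illustration rather than ML Engine's code. The area under the curve (AUC) is included as a common one-number summary of the curve (1.0 for a perfect classifier, 0.5 for the diagonal); this guide does not say that ML Engine reports it.

```python
def roc_points(y_true, scores):
    """(FPR, TPR) pairs, one per distinct score used as the decision threshold."""
    positives = sum(1 for y in y_true if y == 1)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("ROC needs at least one positive and one negative sample")
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under the ROC points."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]  # made-up predicted probabilities of class 1
pts = roc_points(y_true, scores)
print(pts)                      # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(area_under_curve(pts))    # 0.75
```

The lowest threshold always predicts every sample as positive, so the curve ends at (1.0, 1.0).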
Click to see the following information
the columns and a sample of the raw data used to create the model
metadata of the algorithm used to create the model
metadata of the dataset
auto-generated documentation of the model (see model manager for details)
publish a model for review (see publish a model for details)