Forest-Based Classification and Regression

Creates models and generates predictions using an adaptation of Leo Breiman's random forest algorithm, which is a supervised machine learning method. Predictions can be performed for both categorical variables (classification) and continuous variables (regression). Explanatory variables are fields in the attribute table of the training features. The tool can be run to generate a model and assess its performance, or to generate a model and predict results for another dataset.

Analysis Type

Specifies the operation mode of the tool. The tool can be run to train a model only to assess performance, or to train a model and predict to features. The analysis types are as follows:

Train a model to assess model performance: A model will be trained and fit to the input data. Use this option to assess the accuracy of your model before generating predictions on a new dataset. The output of this option will be a feature service of your fitted training data, model diagnostics, and an optional table of variable importance.
Train a model and predict values: Predictions or classifications will be generated for features. Explanatory variables must be provided for both the training features and the features to be predicted. The output of this option will be a feature service of your predicted values, model diagnostics, and an optional table of variable importance.

Train a model to assess model performance

Use this mode if you want to fit a model and investigate the fit.

With this choice, a model will be trained using an input layer. Use this option to assess the accuracy of your model before generating predictions on a new dataset. This option will output model diagnostics in the messages window and apply the model to your training data.

Train a model and predict values

Use this mode if you want to fit a model and apply it to a dataset to generate predictions.

Predictions or classifications will be generated for features. The output of this option will be a feature service, model diagnostics, and an optional table of variable importance.

Choose training layer

The feature layer containing the variable to predict and the fields that will be used to generate the prediction.

In addition to choosing a layer from your map, you can choose Choose Analysis Layer at the bottom of the drop-down list to browse to your contents for a big data file share dataset or feature layer. You can optionally apply a filter on your input layer or apply a selection on hosted layers added to your map. Filters and selections are only applied for analysis.

Choose a layer to predict values for

A feature layer representing locations where predictions will be made. This feature layer must also contain fields that correspond to each of the explanatory variables used from the training features.

Choose the field to predict

The field from the training features containing the values to be used to train the model. This field contains known (training) values of the variable that will be used to predict at unknown locations. If the values are categorical (for example, Maple, Pine, Oak), select the Categorical check box.

Choose one or more explanatory variables

One or more fields representing the explanatory variables that help predict the value or category of the variable to predict. Select the Categorical check box for any variable that represents classes or categories (such as landcover or presence or absence), and leave it unchecked if the variable is continuous.

Number of trees

The number of trees to create in the model. More trees will generally result in more accurate model prediction, but the model will take longer to calculate. The default number of trees is 100.

Minimum leaf size

The minimum number of observations required to keep a leaf (that is, the terminal node on a tree without further splits). The default minimum for regression is 5 and the default for classification is 1. For very large data, increasing these numbers will decrease the run time of the tool.

Maximum tree depth

The maximum number of splits that will be made down a tree. With a large maximum depth, more splits will be created, which may increase the chances of overfitting the model. The default is data driven and depends on the number of trees created and the number of variables included.
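As a rough illustration of how the minimum leaf size and maximum tree depth limit tree growth, the sketch below checks whether a candidate split would be allowed. The function name `may_split` and its arguments are hypothetical, and the rule shown (both child nodes must keep at least the minimum number of observations) is an assumption based on common random forest implementations, not a description of this tool's internals.

```python
def may_split(n_left, n_right, depth, min_leaf_size=5, max_depth=None):
    """Return True if a node at `depth` may be split into two children.

    n_left, n_right: observations that would fall into each child.
    min_leaf_size:   minimum observations a leaf must keep (assumed rule).
    max_depth:       maximum number of splits down a tree; None means no limit.
    """
    # Stop when the branch has already reached the maximum depth.
    if max_depth is not None and depth >= max_depth:
        return False
    # Reject a split that would leave either child below the minimum leaf size.
    return n_left >= min_leaf_size and n_right >= min_leaf_size


# A regression-style default of 5 rejects a split that isolates 3 observations,
# while a classification-style default of 1 accepts it.
allowed_regression = may_split(3, 40, depth=2, min_leaf_size=5)
allowed_classification = may_split(3, 40, depth=2, min_leaf_size=1)
blocked_by_depth = may_split(20, 20, depth=4, min_leaf_size=1, max_depth=4)
```

Raising the minimum leaf size or lowering the maximum depth rejects more splits, which is why both settings shorten run time and yield simpler trees.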

Data available per tree (%)

Specifies the percentage of the features in the training layer used for each decision tree. The default is 100 percent of the data. Samples for each tree are taken randomly from two-thirds of the data specified.

Each decision tree in the forest is created using a random sample or subset (approximately two-thirds) of the training data available. Using a lower percentage of the input data for each decision tree increases the speed of the tool for very large datasets.
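The two-stage sampling described above can be sketched as follows. This is a minimal illustration, not the tool's implementation: the function name `sample_for_tree` is hypothetical, and sampling without replacement is an assumption.

```python
import random


def sample_for_tree(features, percent_available=100, rng=None):
    """Return the training sample used to grow one decision tree."""
    rng = rng or random.Random()
    # Stage 1: restrict the tree to the requested percentage of the training features.
    n_available = max(1, round(len(features) * percent_available / 100))
    available = rng.sample(features, n_available)
    # Stage 2: draw roughly two-thirds of the available features for this tree.
    n_tree = max(1, round(n_available * 2 / 3))
    return rng.sample(available, n_tree)


features = list(range(900))  # stand-in for 900 training features
sample = sample_for_tree(features, percent_available=50, rng=random.Random(0))
print(len(sample))  # 300: two-thirds of the 450 features made available
```

Halving the percentage halves the number of features each tree must process, which is where the speed gain on very large datasets comes from.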

Number of randomly sampled variables

Specifies the number of explanatory variables used to create each decision tree.

Each of the decision trees in the forest is created using a random subset of the explanatory variables specified. Increasing the number of variables used in each decision tree will increase the chances of overfitting your model, particularly if there are one or two dominant variables. A common practice is to use the square root of the total number of explanatory variables if your variable to predict is numeric, or to divide the total number of explanatory variables by 3 if the variable to predict is categorical.

Choose how explanatory fields are matched

How the corresponding variables in the training layer will match the variables in the prediction layer. Only the variables used in training will be included in the table.
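One way to picture the matching is as a mapping from each training field to a prediction field. The sketch below checks that every explanatory variable used in training has a counterpart in the prediction layer. All field names and the helper `unmatched_fields` are hypothetical and serve only as an illustration.

```python
def unmatched_fields(training_fields, prediction_fields, matches):
    """Return the training fields that have no valid match in the prediction layer.

    matches maps a training field name to the prediction field name it pairs with.
    """
    available = set(prediction_fields)
    return [f for f in training_fields if matches.get(f) not in available]


# Hypothetical field names for illustration only.
training_fields = ["elevation", "landcover", "rainfall"]
prediction_fields = ["elev_m", "lc_class", "population"]
matches = {"elevation": "elev_m", "landcover": "lc_class", "rainfall": "rain_mm"}

missing = unmatched_fields(training_fields, prediction_fields, matches)
print(missing)  # ['rainfall'] because 'rain_mm' is not in the prediction layer
```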

Number of runs for validation

Specifies the percentage (between 0 percent and 50 percent) of features in the training layer to reserve as the test dataset for validation. The model will be trained without this random subset of data, and the observed values for those features will be compared to the predicted values. The default is 10 percent.
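The hold-out described above can be sketched as a random split of the training features. This is an illustrative sketch under stated assumptions: the function name `split_for_validation` is hypothetical, and the way the tool rounds the subset size is not documented here.

```python
import random


def split_for_validation(features, percent_excluded=10, rng=None):
    """Split features into (training, test) lists for validation."""
    if not 0 <= percent_excluded <= 50:
        raise ValueError("percent_excluded must be between 0 and 50")
    rng = rng or random.Random()
    shuffled = list(features)
    rng.shuffle(shuffled)  # random subset: shuffle, then cut
    n_test = round(len(shuffled) * percent_excluded / 100)
    # The model is trained on the first list and checked against the second.
    return shuffled[n_test:], shuffled[:n_test]


train, test = split_for_validation(list(range(200)), 10, random.Random(1))
print(len(train), len(test))  # 180 20
```

After training on `train`, the predictions for `test` would be compared with the known values of the field to predict to estimate how the model performs on unseen data.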

Result layer name

The name of the layer that will be created in My Content and added to the map. The default name is based on the tool name and the input layer name. If the layer name already exists, you will be prompted to provide another name.

The results returned will depend on the type of analysis. If you are training to assess model fit, results will contain a layer of training data fit to the model and result info assessing the model fit. If you are training and predicting, results will contain a layer of the training data fit to the model, a layer of predicted results, and result info assessing the model fit.

Using the Save result in drop-down menu, you can specify the name of a folder in My Content where the result will be saved.