Random Forest

The Random Forest algorithm is a supervised learning algorithm, most commonly used for classification. It is a tree-based ensemble method built from many decision trees.

The difference between Random Forest and a single decision tree is that in Random Forest the choice of the root node and of the features used to split each node is randomized: every tree is trained on a random sample of the data, and every split considers only a random subset of the features.

A Random Forest combines hundreds or thousands of decision trees. Each tree is trained on a separate bootstrap sample of the observations, and each node in a tree is split using only a limited, randomly chosen set of features. The final output of the forest is obtained by averaging the predictions of the individual trees (or by majority vote for classification). Increasing the number of trees generally improves accuracy and helps to counter the problem of overfitting.
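The short sketch below illustrates this workflow with scikit-learn's RandomForestClassifier; the Iris dataset and the parameter values (`n_estimators`, `max_features`) are illustrative choices, not requirements of the algorithm.

```python
# A minimal Random Forest classification sketch using scikit-learn.
# Dataset (Iris) and hyperparameter values are illustrative assumptions.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# n_estimators: number of trees in the forest.
# max_features: size of the random feature subset considered at each split.
clf = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=42)
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
```

Each tree in the fitted forest has seen a different bootstrap sample, and `predict` aggregates the votes of all the trees.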

Basic terms:

  1. Entropy: a measure of the impurity or randomness in a given dataset.
  2. Information gain: the reduction in entropy obtained when the dataset is split on a feature; the larger the reduction, the better the split (see the sketch after this list).
  3. Leaf node: a terminal node that holds the final classification or decision and is not split further.
  4. Decision node: a node that splits into two or more branches based on a feature test.
  5. Root node: the topmost decision node, where the full dataset is first split.
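To make entropy and information gain concrete, here is a small Python sketch that computes both for a toy binary split. The labels and the candidate split are made up purely for demonstration.

```python
# Illustrative computation of entropy and information gain for one split.
# The toy labels below are an assumption made for demonstration only.
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of an array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

# Parent node: 6 positives, 4 negatives.
parent = np.array([1, 1, 1, 1, 1, 1, 0, 0, 0, 0])

# A candidate split produces two child nodes.
left = np.array([1, 1, 1, 1, 0])    # mostly positive
right = np.array([1, 1, 0, 0, 0])   # mostly negative

# Information gain = parent entropy minus the weighted entropy of the children.
weighted_child = (len(left) * entropy(left) + len(right) * entropy(right)) / len(parent)
gain = entropy(parent) - weighted_child

print(f"Parent entropy:   {entropy(parent):.3f}")
print(f"Information gain: {gain:.3f}")
```

A decision tree evaluates many candidate splits like this one and keeps the split with the highest information gain; in a Random Forest, only a random subset of features is considered as candidates at each node.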

Advantages:

  • It can be used for both classification and regression tasks (a regression sketch follows this list). With enough trees in the forest, the classifier is far less prone to overfitting than a single decision tree.
  • A Random Forest classifier can handle missing values.
  • A Random Forest can also be built on categorical features.
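As a counterpart to the classification example above, the sketch below uses scikit-learn's RandomForestRegressor on synthetic data; the generated data and parameter values are assumptions for illustration.

```python
# A minimal Random Forest regression sketch using scikit-learn.
# The synthetic sine-wave data and hyperparameters are illustrative assumptions.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=500)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

reg = RandomForestRegressor(n_estimators=200, random_state=0)
reg.fit(X_train, y_train)

print("Test MSE:", mean_squared_error(y_test, reg.predict(X_test)))
```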
