In decision trees, the cost function is the metric used to score the quality of a candidate split at a node. Depending on the nature of the task (classification or regression), different cost functions are used:
- Classification:
  - Gini Impurity: It measures the disorder in a set, computed as 1 − Σᵢ pᵢ², where pᵢ is the fraction of elements belonging to class i. If all elements are of the same class, the Gini impurity is 0 (complete purity); it is maximized when the elements are spread evenly across the classes.
  - Entropy: It measures the amount of randomness in the set, computed as −Σᵢ pᵢ log₂ pᵢ. A set with an equal distribution of classes has maximum entropy.
  - Information Gain: Information Gain is the entropy of the parent node minus the size-weighted entropy of the child nodes. The idea is to choose the split that maximizes information gain.
- Regression:
  - Variance Reduction: For regression problems, we split nodes so as to reduce the variance of the target variable. The cost of a potential split is the sum of the variances of the child nodes, each weighted by the number of data points it contains; the best split is the one that most reduces this weighted variance relative to the parent.
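The measures above can be sketched in plain Python (the function names are illustrative, not a standard API):

```python
import math
from collections import Counter

def gini(labels):
    """Gini impurity: 0 for a pure set, larger when classes are mixed."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy in bits: 0 for a pure set, log2(k) for k evenly mixed classes."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    return entropy(parent) - sum(len(ch) / n * entropy(ch) for ch in children)

def variance(values):
    """Variance of a numeric target: the regression analogue of impurity."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)
```

For example, `gini(['a', 'a', 'b', 'b'])` is 0.5 (maximally mixed for two classes), while `gini(['a', 'a', 'a', 'a'])` is 0.0.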
For any given node, when building the tree, the algorithm will consider each possible split (across all features and thresholds) and choose the one that minimizes the weighted impurity of the resulting child nodes for classification, or the weighted variance for regression. This greedy process recurses on each child until a specified stopping criterion is reached, such as a maximum tree depth or a minimum node size.
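A minimal sketch of this exhaustive search for a classification split, assuming X is a list of numeric feature rows (the `best_split` name and its return format are illustrative choices):

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a set of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y):
    """Try every (feature, threshold) pair and return the one whose
    children have the lowest size-weighted Gini impurity."""
    n = len(y)
    best = None  # (weighted_impurity, feature_index, threshold)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left  = [y[i] for i in range(n) if X[i][j] <= t]
            right = [y[i] for i in range(n) if X[i][j] > t]
            if not left or not right:  # skip splits that leave a child empty
                continue
            cost = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or cost < best[0]:
                best = (cost, j, t)
    return best
```

On a toy dataset such as `X = [[1.0], [2.0], [3.0], [4.0]]`, `y = ['a', 'a', 'b', 'b']`, the search finds the threshold 2.0 on feature 0, which separates the classes perfectly (weighted impurity 0).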
Though these cost functions help guide the growth of the tree, greedily optimizing them also contributes to the potential for overfitting. So techniques like pruning (removing sections of the tree that provide little predictive power) are often applied after the tree is constructed to simplify the model and improve generalization.
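One simple post-pruning strategy, reduced-error pruning, can be sketched for a tree stored as nested dicts; the dict layout and helper names here are assumptions for illustration, not a particular library's API (scikit-learn, for instance, uses cost-complexity pruning instead):

```python
from collections import Counter

def predict(node, x):
    """Walk the tree; internal nodes hold 'feature'/'threshold', leaves hold 'label'."""
    while 'label' not in node:
        node = node['left'] if x[node['feature']] <= node['threshold'] else node['right']
    return node['label']

def errors(node, X, y):
    """Count misclassified examples under this (sub)tree."""
    return sum(predict(node, xi) != yi for xi, yi in zip(X, y))

def leaf_labels(node):
    """Collect the labels of all leaves in the subtree."""
    if 'label' in node:
        return [node['label']]
    return leaf_labels(node['left']) + leaf_labels(node['right'])

def prune(node, X_val, y_val):
    """Reduced-error pruning: bottom-up, replace a subtree with a leaf
    predicting its majority leaf label whenever that does not increase
    the error on the validation examples routed to that subtree."""
    if 'label' in node or not X_val:
        return node
    f, t = node['feature'], node['threshold']
    go_left = [x[f] <= t for x in X_val]
    node['left'] = prune(node['left'],
                         [x for x, g in zip(X_val, go_left) if g],
                         [yy for yy, g in zip(y_val, go_left) if g])
    node['right'] = prune(node['right'],
                          [x for x, g in zip(X_val, go_left) if not g],
                          [yy for yy, g in zip(y_val, go_left) if not g])
    leaf = {'label': Counter(leaf_labels(node)).most_common(1)[0][0]}
    return leaf if errors(leaf, X_val, y_val) <= errors(node, X_val, y_val) else node
```

The key design point is that each subtree is judged only on the validation examples that actually reach it, so collapsing it cannot affect predictions elsewhere in the tree.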