A decision tree is a hierarchical data structure that uses a divide-and-conquer strategy to describe data. In this lesson we focus on decision trees with categorical labels, but decision trees can also be used for non-parametric classification and regression. The process of adjusting a decision tree to minimize misclassification error is called pruning, and it comes in two forms: pre-pruning and post-pruning.
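To make the pre-pruning idea concrete, here is a minimal sketch (not code from this lesson) of a greedy tree builder on a toy one-dimensional dataset, where a `max_depth` cap stops splitting early; the function names and data are illustrative assumptions.

```python
def majority(labels):
    # Most common label among the samples reaching a node.
    return max(set(labels), key=labels.count)

def build_tree(xs, ys, depth=0, max_depth=2):
    # Pre-pruning: stop splitting once max_depth is reached
    # or the node is already pure.
    if depth >= max_depth or len(set(ys)) == 1:
        return ("leaf", majority(ys))
    # Greedy split: try each x value as a threshold and keep the one
    # with the fewest training misclassifications.
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue
        err = (len(left) - left.count(majority(left))
               + len(right) - right.count(majority(right)))
        if best is None or err < best[0]:
            best = (err, t)
    if best is None:
        return ("leaf", majority(ys))
    t = best[1]
    l = build_tree([x for x in xs if x <= t],
                   [y for x, y in zip(xs, ys) if x <= t], depth + 1, max_depth)
    r = build_tree([x for x in xs if x > t],
                   [y for x, y in zip(xs, ys) if x > t], depth + 1, max_depth)
    return ("split", t, l, r)

xs = [1, 2, 3, 4, 5, 6]
ys = ["a", "a", "b", "b", "a", "a"]
tree = build_tree(xs, ys, max_depth=1)  # the depth cap forces leaves right below the root
```

With `max_depth=1` the tree makes a single split and both children are forced to be leaves, however impure they still are; post-pruning would instead grow the full tree first and cut branches back afterwards.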

Some researchers have found that this approach yields trees that are more accurate than pruning based on tree size alone. In addition, because of its top-down nature, each subtree of the tree needs to be examined only once, so in the worst case the time complexity is linear in the number of non-leaf nodes of the decision tree. Pruning trims branches that reflect anomalies in the training data caused by noise or outliers, and simplifies the original tree in a way that improves its generalization performance. A decision tree consists of nodes that form a rooted tree, meaning a directed tree in which only the root node has no incoming edges.
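As an illustrative sketch (the class and function names are assumptions, not from this lesson), a rooted decision tree can be represented as a directed structure where edges point from each node to its children, and a single top-down pass visits each subtree once, so a walk over the tree is linear in the number of nodes:

```python
class Node:
    """A node of a rooted, directed tree: only the root has no incoming edge."""
    def __init__(self, label=None, children=None):
        self.label = label               # predicted class if this is a leaf
        self.children = children or []   # outgoing edges to subtrees

def count_internal(node):
    # One visit per subtree: O(number of nodes) overall.
    if not node.children:
        return 0
    return 1 + sum(count_internal(c) for c in node.children)

# Root -> two internal nodes -> four leaves.
leaves = [Node(label=l) for l in "abab"]
root = Node(children=[Node(children=leaves[:2]), Node(children=leaves[2:])])
n_internal = count_internal(root)  # 3 non-leaf nodes: the root and its two children
```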

Once the series of trees has been generated, the best tree is chosen by its generalization accuracy, measured on a held-out set or through cross-validation. You can reduce the risk of overfitting by limiting the size of the tree or by removing sections of the tree that provide little predictive power. In the first phase, the main idea is to produce the next-best tree from the current best one by pruning the branch that shows the smallest increase in apparent error rate per leaf removed.
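This selection criterion (the "weakest link" in cost-complexity pruning) can be written out as a small formula: the increase in apparent (training) error per leaf removed if a node's subtree were collapsed into a single leaf. A hedged sketch, with an illustrative function name of our own:

```python
def weakest_link_alpha(errors_as_leaf, errors_of_subtree, n_subtree_leaves):
    # alpha = (R(node as leaf) - R(subtree)) / (leaves - 1):
    # the apparent-error increase per leaf cut when collapsing the subtree.
    return (errors_as_leaf - errors_of_subtree) / (n_subtree_leaves - 1)

# A subtree with 3 leaves makes 2 training errors; collapsing it into one
# leaf would make 6 errors, i.e. pruning costs 2 extra errors per leaf cut.
alpha = weakest_link_alpha(6, 2, 3)  # (6 - 2) / (3 - 1) = 2.0
```

At each step, the internal node with the smallest alpha is pruned first, yielding the next tree in the series.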

Pruning is a data compression technique used in machine learning and search algorithms that reduces the size of decision trees by removing sections of the tree that are non-critical and redundant for classifying instances. In reduced-error pruning, a full tree is trained on a subset of the data and its accuracy is then evaluated on a separate pruning set. Starting from the whole tree, for each internal node the number of classification errors made on the pruning set when its subtree is kept is compared with the number made when the node is converted into a leaf assigned the best (majority) class; the node is replaced whenever this does not increase the error.
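The procedure just described (reduced-error pruning) can be sketched on a tiny hand-built tree. The tuple representation and helper names below are assumptions for illustration: a node is either `("leaf", class)` or `("split", threshold, left, right)`.

```python
def predict(tree, x):
    if tree[0] == "leaf":
        return tree[1]
    _, t, left, right = tree
    return predict(left, x) if x <= t else predict(right, x)

def errors(tree, data):
    # Classification errors the tree makes on (x, y) pairs.
    return sum(predict(tree, x) != y for x, y in data)

def majority(labels):
    return max(set(labels), key=labels.count)

def prune(tree, data):
    # Bottom-up: prune the children first, then decide at this node.
    if tree[0] == "leaf" or not data:
        return tree
    _, t, left, right = tree
    left = prune(left, [(x, y) for x, y in data if x <= t])
    right = prune(right, [(x, y) for x, y in data if x > t])
    pruned = ("split", t, left, right)
    leaf = ("leaf", majority([y for _, y in data]))
    # Keep the subtree only if it beats the majority-class leaf
    # on the pruning set; otherwise collapse the node.
    return leaf if errors(leaf, data) <= errors(pruned, data) else pruned

tree = ("split", 3, ("leaf", "a"), ("split", 5, ("leaf", "b"), ("leaf", "a")))
pruning_set = [(1, "a"), (4, "a"), (6, "a")]
pruned = prune(tree, pruning_set)  # collapses to ("leaf", "a")
```

Here the inner `("leaf", "b")` branch only hurts accuracy on the pruning set, so the whole tree collapses to a single majority-class leaf.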