Reduced-Error Pruning

  • Reduced-error pruning considers each of the decision nodes in the tree to be a candidate for pruning (a code sketch of the procedure follows this list)
  • Pruning a decision node consists of removing the subtree rooted at that node, making it a leaf node, and assigning it the most common classification of the training examples affiliated with that node
  • Nodes are removed only if the resulting pruned tree performs no worse than the original over the validation set
  • Reduced-error pruning has the effect that any leaf node added because of coincidental regularities in the training set is likely to be pruned, because those same coincidences are unlikely to occur in the validation set
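The procedure above maps directly onto code. Below is a minimal Python sketch under assumed representations: a hypothetical Node class in which every node remembers the training labels that reached it, plus illustrative helpers classify and accuracy. None of these names come from the original notes; they only make the three bullet points concrete.

```python
from collections import Counter

class Node:
    """Decision-tree node: internal nodes test a feature, leaves carry a class label."""
    def __init__(self, feature=None, children=None, label=None, train_labels=None):
        self.feature = feature                  # index of the feature tested here (internal nodes)
        self.children = children or {}          # feature value -> child Node
        self.label = label                      # predicted class (leaves only; None for internal nodes)
        self.train_labels = train_labels or []  # labels of the training examples affiliated with this node

def classify(node, example):
    """Follow the tree from `node` down to a leaf and return that leaf's label."""
    while node.label is None:
        node = node.children[example[node.feature]]
    return node.label

def accuracy(tree, X, y):
    return sum(classify(tree, xi) == yi for xi, yi in zip(X, y)) / len(y)

def internal_nodes(node):
    """Yield every non-leaf node in the subtree rooted at `node`."""
    if node.label is None:
        yield node
        for child in node.children.values():
            yield from internal_nodes(child)

def reduced_error_prune(tree, X_val, y_val):
    """Greedily prune nodes as long as some prune performs no worse on the validation set."""
    pruned_something = True
    while pruned_something:
        pruned_something = False
        baseline = accuracy(tree, X_val, y_val)
        for node in list(internal_nodes(tree)):   # snapshot: the loop mutates the tree
            saved = (node.feature, node.children, node.label)
            # Tentatively make this node a leaf labeled with the majority class
            # of the training examples affiliated with it (assumes train_labels is non-empty).
            node.label = Counter(node.train_labels).most_common(1)[0][0]
            node.feature, node.children = None, {}
            if accuracy(tree, X_val, y_val) >= baseline:
                pruned_something = True            # keep the prune and rescan the tree
                break
            node.feature, node.children, node.label = saved   # revert the prune
    return tree
```

The `>=` comparison is what implements "performs no worse than the original": a prune that leaves validation accuracy unchanged is still kept, which is exactly what shrinks the tree.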

 
The impact of reduced-error pruning on the accuracy of the decision tree is illustrated in the figure below.

  • The additional line in the figure shows accuracy over the test examples as the tree is pruned. When pruning begins, the tree is at its maximum size and lowest accuracy over the test set. As pruning proceeds, the number of nodes is reduced and accuracy over the test set increases.
  • The available data has been split into three subsets: the training examples, the validation examples used for pruning the tree, and a set of test examples used to provide an unbiased estimate of accuracy over future unseen examples (a sketch of such a split follows this list). The plot shows accuracy over the training and test sets.
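One common way to realize this three-way split is sketched below with scikit-learn's train_test_split; the 60/20/20 proportions and the placeholder data are illustrative choices, not from the original notes.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((100, 4))        # placeholder feature matrix
y = rng.integers(0, 2, 100)     # placeholder binary labels

# Split off 20% as the test set, then carve 25% of the remainder
# (20% of the whole) out as the validation set used only for pruning.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)
# Result: 60% training, 20% validation, 20% test.
```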

Pros and Cons

Pro: Produces the smallest version of the most accurate subtree of T

Con: Uses less data to construct T, since the examples held out for validation cannot be used for training


Can we afford to hold out Dvalidation? If not (the data is too limited), holding out examples may make the error worse (insufficient Dtrain).
