Application to Decision Tree Learning

Let us apply the MDL principle to the problem of learning decision trees from training data.
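For reference, the MDL principle chooses the hypothesis that minimizes the combined description length of the hypothesis and of the data given the hypothesis. Writing D for the training data and H for the hypothesis space:

$$h_{MDL} = \arg\min_{h \in H} \left[ L_{C_1}(h) + L_{C_2}(D \mid h) \right]$$

where $L_C(x)$ denotes the description length of $x$ under encoding $C$.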

What should we choose for the representations C1 and C2 of hypotheses and data?

  • For C1: some obvious encoding of decision trees will do, one in which the description length grows with the number of nodes and with the number of edges in the tree.
  • For C2: suppose that the sequence of instances (x1 … xm) is already known to both the transmitter and the receiver, so that we need only transmit the classifications (f(x1) … f(xm)).
  • Now if the training classifications (f(x1) … f(xm)) are identical to the predictions of the hypothesis, then there is no need to transmit any information about these examples: the description length of the classifications given the hypothesis is zero.
  • If some examples are misclassified by h, then for each misclassification we must transmit a message that identifies which example is misclassified (at most log2 m bits) as well as its correct classification (at most log2 k bits, for k possible classifications).
  • The hypothesis hMDL under the encodings C1 and C2 is simply the one that minimizes the sum of these two description lengths; the sketch after this list makes the trade-off concrete.
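Here is a minimal sketch of this trade-off in Python. The candidate trees, their bit costs, and the mdl_cost helper are hypothetical illustrations, assuming the simple scheme above in which each misclassification costs log2(m) bits to identify the example plus log2(k) bits to transmit its correct class:

```python
import math

def mdl_cost(tree_bits, errors, m, k=2):
    """Hypothetical total description length L_C1(h) + L_C2(D | h).

    tree_bits -- bits needed to encode the tree itself (grows with
                 the number of nodes and edges)
    errors    -- number of training examples the tree misclassifies
    m         -- number of training examples (known to both parties)
    k         -- number of possible classifications
    """
    # C2 cost: identify each misclassified example (log2 m bits)
    # and transmit its correct classification (log2 k bits).
    data_bits = errors * (math.log2(m) + math.log2(k))
    return tree_bits + data_bits

# Made-up candidate trees over m = 1000 binary-labeled examples.
m = 1000
candidates = {
    "deep tree, 0 errors":   mdl_cost(tree_bits=900, errors=0,  m=m),
    "medium tree, 5 errors": mdl_cost(tree_bits=300, errors=5,  m=m),
    "stump, 80 errors":      mdl_cost(tree_bits=40,  errors=80, m=m),
}
h_mdl = min(candidates, key=candidates.get)
print(h_mdl)  # -> "medium tree, 5 errors"
```

The deep tree pays too many bits to describe itself, the stump pays too many bits to describe its errors, and the medium tree minimizes the sum: this is exactly the trade-off between hypothesis complexity and training error that MDL formalizes.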
