Apply the MDL principle to the problem of learning decision trees from some training data.
What should we choose for the representations C1 and C2 of hypotheses and data?
- For C1: use some obvious encoding of decision trees, in which the description length of a tree grows with its number of nodes and number of edges.
- For C2: Suppose that the sequence of instances (x1 ... xm) is already known to both the transmitter and the receiver, so that we need only transmit the classifications (f(x1) ... f(xm)).
- Now if the training classifications (f(x1) ... f(xm)) are identical to the predictions of the hypothesis, then there is no need to transmit any information about these examples: the description length of the classifications given the hypothesis is zero.
- If some examples are misclassified by h, then for each misclassification we need to transmit a message that identifies which example is misclassified (at most log2 m bits, since the receiver already knows the sequence of m instances) as well as its correct classification.
- The hypothesis hMDL under the encodings C1 and C2 is simply the one that minimizes the sum of these two description lengths: hMDL = argmin over h of L_C1(h) + L_C2(D | h). A small scoring sketch follows this list.
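To make the two description lengths concrete, here is a minimal Python sketch of an MDL score for a decision tree. The encodings are illustrative assumptions rather than a canonical scheme: C1 charges a fixed number of bits per tree node, and C2 charges log2(m) bits to point at each misclassified example plus log2(k) bits for its correct label (k = number of classes). The helper names (description_length_c1, description_length_c2, mdl_score) and the constant BITS_PER_NODE are hypothetical.

```python
import math

BITS_PER_NODE = 8  # assumed per-node cost under C1; any tree encoding
                   # whose length grows with the number of nodes works

def description_length_c1(num_nodes):
    """L_C1(h): description length of the tree itself."""
    return num_nodes * BITS_PER_NODE

def description_length_c2(predictions, labels, num_classes):
    """L_C2(D|h): zero when h predicts every training label correctly;
    otherwise, for each misclassification, the bits to identify the
    example (log2 m) plus its correct classification (log2 k)."""
    m = len(labels)
    errors = sum(1 for p, y in zip(predictions, labels) if p != y)
    if errors == 0:
        return 0.0  # receiver reconstructs all labels from h alone
    return errors * (math.log2(m) + math.log2(num_classes))

def mdl_score(num_nodes, predictions, labels, num_classes=2):
    """Total description length; hMDL minimizes this over hypotheses."""
    return (description_length_c1(num_nodes)
            + description_length_c2(predictions, labels, num_classes))

# Example: a 5-node tree that misclassifies 2 of 100 binary examples.
labels = [0] * 98 + [1, 1]
preds = [0] * 100  # this tree predicts 0 everywhere
print(mdl_score(5, preds, labels))  # 40 + 2*(log2 100 + 1) ≈ 55.3
```

Under such a score, growing the tree pays off only when the extra nodes eliminate enough misclassifications to save more C2 bits than they cost in C1 bits, which is exactly the trade-off between hypothesis complexity and fit to the data that hMDL resolves.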