- ID3 can be characterized as searching a space of hypotheses for one that fits the training examples.
- The hypothesis space searched by ID3 is the set of possible decision trees.
- ID3 performs a simple-to-complex, hill-climbing search through this hypothesis space, beginning with the empty tree, then considering progressively more elaborate hypotheses in search of a decision tree that correctly classifies the training data.
Figure: Hypothesis space search by ID3. ID3 searches through the space of possible decision trees from simplest to increasingly complex, guided by the information gain heuristic.
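To make this search concrete, the following is a minimal Python sketch (not code from the source) of the greedy, simple-to-complex loop: starting from an empty tree, it repeatedly chooses the attribute with the highest information gain for the current set of examples and recurses, never revisiting an earlier choice. The data representation (a list of (attribute-dict, label) pairs), the toy examples, and all helper names are illustrative assumptions.

```python
from collections import Counter
from math import log2

def entropy(examples):
    """Entropy of the class labels over a set of (attributes_dict, label) examples."""
    counts = Counter(label for _, label in examples)
    total = len(examples)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def information_gain(examples, attribute):
    """Reduction in entropy obtained by partitioning the examples on `attribute`."""
    total = len(examples)
    remainder = 0.0
    for value in {attrs[attribute] for attrs, _ in examples}:
        subset = [(attrs, label) for attrs, label in examples if attrs[attribute] == value]
        remainder += (len(subset) / total) * entropy(subset)
    return entropy(examples) - remainder

def id3(examples, attributes):
    """Greedy, simple-to-complex tree growth: pick the best attribute, recurse, never backtrack."""
    labels = [label for _, label in examples]
    if len(set(labels)) == 1:          # all examples agree: make a leaf
        return labels[0]
    if not attributes:                 # no attributes left: majority-vote leaf
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(examples, a))
    tree = {best: {}}
    for value in {attrs[best] for attrs, _ in examples}:
        subset = [(attrs, label) for attrs, label in examples if attrs[best] == value]
        remaining = [a for a in attributes if a != best]
        tree[best][value] = id3(subset, remaining)
    return tree

# Toy illustration with hypothetical data.
data = [
    ({"outlook": "sunny", "wind": "weak"}, "no"),
    ({"outlook": "sunny", "wind": "strong"}, "no"),
    ({"outlook": "overcast", "wind": "weak"}, "yes"),
    ({"outlook": "rain", "wind": "weak"}, "yes"),
    ({"outlook": "rain", "wind": "strong"}, "no"),
]
print(id3(data, ["outlook", "wind"]))
```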
By viewing ID3 in terms of its search space and search strategy, we can gain some insight into its capabilities and limitations:
1. ID3's hypothesis space of all decision trees is a complete space of finite discrete-valued functions, relative to the available attributes. Because every finite discrete-valued function can be represented by some decision tree, ID3 avoids one of the major risks of methods that search incomplete hypothesis spaces: that the hypothesis space might not contain the target function.
2. ID3 maintains only a single current hypothesis as it searches through the space of decision trees. This contrasts, for example, with the earlier version space candidate-elimination method, which maintains the set of all hypotheses consistent with the available training examples. By determining only a single hypothesis, ID3 loses the capabilities that follow from explicitly representing all consistent hypotheses. For example, it does not have the ability to determine how many alternative decision trees are consistent with the available training data, or to pose new instance queries that optimally resolve among these competing hypotheses.
3. ID3 in its pure form performs no backtracking in its search. Once it selects an attribute to test at a particular level in the tree, it never backtracks to reconsider this choice. It is therefore susceptible to the usual risk of hill-climbing search without backtracking: converging to a locally optimal solution rather than a globally optimal one. In the case of ID3, a locally optimal solution corresponds to the decision tree it selects along the single search path it explores. However, this locally optimal solution may be less desirable than trees that would have been encountered along a different branch of the search.
4. ID3 uses all training examples at each step in the search to make statistically based decisions regarding how to refine its current hypothesis. One advantage of using statistical properties of all the examples (e.g., information gain) is that the resulting search is much less sensitive to errors in individual training examples. ID3 can easily be extended to handle noisy training data by modifying its termination criterion to accept hypotheses that imperfectly fit the training data (a minimal sketch of such a relaxed criterion follows this list).
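As a hedged illustration of the last point, the perfect-classification stopping test in the sketch above could be replaced by a relaxed criterion that turns a node into a leaf once one class sufficiently dominates, so a few noisy examples no longer force further splitting. The threshold value and helper name below are assumptions for illustration, not part of ID3 itself.

```python
from collections import Counter

def should_stop(examples, purity_threshold=0.95):
    """Relaxed termination test: accept a (majority-vote) leaf once one class
    accounts for at least `purity_threshold` of the examples, even if a few
    possibly noisy examples disagree."""
    counts = Counter(label for _, label in examples)
    majority_count = counts.most_common(1)[0][1]
    return majority_count / len(examples) >= purity_threshold

# In the id3() sketch above, the check `if len(set(labels)) == 1` would be
# replaced by `if should_stop(examples)`, returning the majority label.
```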