To define information gain, we begin by defining a measure called entropy. Entropy measures the impurity of a collection of examples.
Given a collection S containing positive and negative examples of some target concept, the entropy of S relative to this Boolean classification is

Entropy(S) = −p+ log2 p+ − p− log2 p−

where
- p+ is the proportion of positive examples in S
- p− is the proportion of negative examples in S
Example:
Suppose S is a collection of 14 examples of some Boolean concept, including 9 positive and 5 negative examples (written [9+, 5−]). Then the entropy of S relative to this Boolean classification is

Entropy([9+, 5−]) = −(9/14) log2 (9/14) − (5/14) log2 (5/14) = 0.940
- The entropy is 0 if all members of S belong to the same class
- The entropy is 1 when the collection contains an equal number of positive and negative examples
- If the collection contains unequal numbers of positive and negative examples, the entropy is between 0 and 1
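These properties are easy to verify numerically. Below is a minimal Python sketch of this two-class entropy; the function name and the (pos, neg) count interface are our own choices for illustration, not part of ID3 itself.

```python
import math

def entropy(pos, neg):
    """Entropy of a collection with `pos` positive and `neg` negative examples."""
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        if count > 0:                 # by convention, 0 * log2(0) = 0
            p = count / total
            result -= p * math.log2(p)
    return result

print(f"{entropy(9, 5):.3f}")   # 0.940 -- the [9+, 5-] collection above
print(entropy(7, 7))            # 1.0   -- equal split: maximum entropy
print(entropy(14, 0))           # 0.0   -- all one class: zero entropy
```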
Information Gain Measures the Expected Reduction in Entropy
- Information gain is the expected reduction in entropy caused by partitioning the examples according to a given attribute.
- The information gain, Gain(S, A), of an attribute A relative to a collection of examples S is defined as

Gain(S, A) = Entropy(S) − Σv∈Values(A) (|Sv| / |S|) Entropy(Sv)

where Values(A) is the set of all possible values for attribute A, and Sv is the subset of S for which attribute A has value v.
For example, suppose S is the collection [9+, 5−] above, and the attribute Wind has the values Weak and Strong. Partitioning S on Wind gives

S = [9+, 5−]
SWeak = [6+, 2−]
SStrong = [3+, 3−]

The information gain of the attribute Wind is then
Gain(S, Wind) = Entropy(S) − (8/14) Entropy(SWeak) − (6/14) Entropy(SStrong)
             = 0.940 − (8/14)(0.811) − (6/14)(1.00)
             = 0.048
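The same calculation in code, reusing the entropy function from the sketch above (the (pos, neg)-pair representation of the subsets is our simplification):

```python
def gain(parent, subsets):
    """Information gain: parent entropy minus the weighted entropy of the subsets.

    `parent` and each element of `subsets` are (pos, neg) example counts;
    the subsets are assumed to partition the parent collection.
    """
    total = sum(p + n for p, n in subsets)
    g = entropy(*parent)
    for p, n in subsets:
        g -= (p + n) / total * entropy(p, n)
    return g

# Partitioning S = [9+, 5-] on Wind: Weak -> [6+, 2-], Strong -> [3+, 3-]
print(f"{gain((9, 5), [(6, 2), (3, 3)]):.3f}")   # 0.048
```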
An Illustrative Example
- To illustrate the operation of ID3, consider the learning task represented by the training examples in the table below.
- Here the target attribute is PlayTennis, which can take the values Yes or No for different days.
- Consider the first step of the algorithm, in which the topmost node of the decision tree is created.
| Day | Outlook  | Temperature | Humidity | Wind   | PlayTennis |
|-----|----------|-------------|----------|--------|------------|
| D1  | Sunny    | Hot         | High     | Weak   | No         |
| D2  | Sunny    | Hot         | High     | Strong | No         |
| D3  | Overcast | Hot         | High     | Weak   | Yes        |
| D4  | Rain     | Mild        | High     | Weak   | Yes        |
| D5  | Rain     | Cool        | Normal   | Weak   | Yes        |
| D6  | Rain     | Cool        | Normal   | Strong | No         |
| D7  | Overcast | Cool        | Normal   | Strong | Yes        |
| D8  | Sunny    | Mild        | High     | Weak   | No         |
| D9  | Sunny    | Cool        | Normal   | Weak   | Yes        |
| D10 | Rain     | Mild        | Normal   | Weak   | Yes        |
| D11 | Sunny    | Mild        | Normal   | Strong | Yes        |
| D12 | Overcast | Mild        | High     | Strong | Yes        |
| D13 | Overcast | Hot         | Normal   | Weak   | Yes        |
| D14 | Rain     | Mild        | High     | Strong | No         |
- ID3 determines the information gain for each candidate attribute (i.e., Outlook, Temperature, Humidity, and Wind), then selects the one with the highest information gain.
- The information gain values for all four attributes are
Gain(S, Outlook) = 0.246
Gain(S, Humidity) = 0.151
Gain(S, Wind) = 0.048
Gain(S, Temperature) = 0.029
- According to the information gain measure, the Outlook attribute provides the best prediction of the target attribute, PlayTennis, over the training examples. Therefore, Outlook is selected as the decision attribute for the root node, and branches are created below the root for each of its possible values (i.e., Sunny, Overcast, and Rain).
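As a check, the sketch below recomputes these gains from the table, reusing the entropy and gain helpers defined earlier. Small differences in the third decimal place (e.g., 0.247 vs. 0.246) come from rounding intermediate values.

```python
from collections import defaultdict

# The 14 training examples from the table:
# (Outlook, Temperature, Humidity, Wind, PlayTennis)
examples = [
    ("Sunny",    "Hot",  "High",   "Weak",   "No"),
    ("Sunny",    "Hot",  "High",   "Strong", "No"),
    ("Overcast", "Hot",  "High",   "Weak",   "Yes"),
    ("Rain",     "Mild", "High",   "Weak",   "Yes"),
    ("Rain",     "Cool", "Normal", "Weak",   "Yes"),
    ("Rain",     "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny",    "Mild", "High",   "Weak",   "No"),
    ("Sunny",    "Cool", "Normal", "Weak",   "Yes"),
    ("Rain",     "Mild", "Normal", "Weak",   "Yes"),
    ("Sunny",    "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High",   "Strong", "Yes"),
    ("Overcast", "Hot",  "Normal", "Weak",   "Yes"),
    ("Rain",     "Mild", "High",   "Strong", "No"),
]
attributes = ["Outlook", "Temperature", "Humidity", "Wind"]

def counts(rows):
    """(pos, neg) counts of the PlayTennis label over a list of rows."""
    pos = sum(1 for row in rows if row[-1] == "Yes")
    return pos, len(rows) - pos

for i, attr in enumerate(attributes):
    groups = defaultdict(list)
    for row in examples:
        groups[row[i]].append(row)            # partition S on this attribute
    g = gain(counts(examples), [counts(rows) for rows in groups.values()])
    print(f"Gain(S, {attr}) = {g:.3f}")
# Gain(S, Outlook)     = 0.247
# Gain(S, Temperature) = 0.029
# Gain(S, Humidity)    = 0.152
# Gain(S, Wind)        = 0.048
```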
- The process of selecting a new attribute and partitioning the training examples is then repeated for each non-terminal descendant node, using only the examples associated with that node. For example, the descendant for Outlook = Rain has

SRain = {D4, D5, D6, D10, D14}

Gain(SRain, Temperature) = 0.970 − (0/5)(0.0) − (3/5)(0.918) − (2/5)(1.0) = 0.019
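Continuing the sketch, the same helpers can score attributes on the Rain subset (the 0.020 printed here vs. 0.019 above is again just rounding):

```python
# Recurse below the Outlook = Rain branch: keep only the matching examples
s_rain = [row for row in examples if row[0] == "Rain"]   # D4, D5, D6, D10, D14

groups = defaultdict(list)
for row in s_rain:
    groups[row[1]].append(row)                           # partition SRain on Temperature
g = gain(counts(s_rain), [counts(rows) for rows in groups.values()])
print(f"Gain(SRain, Temperature) = {g:.3f}")             # 0.020
```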