21.2 ID3 (Iterative Dichotomiser 3)

ID3 (Iterative Dichotomiser 3) is a popular algorithm used to construct a decision tree for classification problems.

It selects the best attribute for splitting the data based on Information Gain, which is calculated using Entropy.

The main goal of ID3 is to create a tree that classifies data with maximum purity at each node.

What ID3 Stands For

ID3 stands for Iterative Dichotomiser 3.

Key Idea of ID3

ID3 chooses the attribute that reduces entropy the most and makes the data as pure as possible after splitting.

Measures Used in ID3

ID3 uses the following measures:

Entropy – to measure impurity
Information Gain – to select the best attribute for splitting

(a) Entropy

Entropy measures the impurity or randomness in the dataset.

Formula:

Entropy = 0 → Pure node
Entropy is maximum → Highly impure node

6. Information Gain Formula

Where:

IG(S, A)
→ Information Gain of attribute A on dataset S

S
→ Complete dataset (parent/root node)

A
→ Attribute (feature) used for splitting

H(S)
→ Entropy of the dataset before splitting

Values(A)
→ All possible values of attribute A

v
→ A specific value of attribute A

Sᵥ
→ Subset of dataset S where attribute A = v

|S|
→ Total number of samples in dataset S

|Sᵥ|
→ Number of samples in subset Sᵥ

|Sᵥ| / |S|
→ Weight of subset Sᵥ

H(Sᵥ)
→ Entropy of subset Sᵥ after splitting

∑ (summation)
→ Sum of weighted entropies of all subsets

Algorithm Steps of ID3

ID3 Algorithm Steps

Calculate the entropy of the dataset.
For each attribute, calculate the Information Gain.
Select the attribute with the highest Information Gain as the root node.
Split the dataset based on the selected attribute.
Repeat the process recursively for each child node.
Stop when:
- All samples belong to the same class, or
- No attributes are left for splitting.

Characteristics of ID3

Suitable for multiclass classification

Uses Entropy & Information Gain

Mainly used for classification

Works best with categorical data

Can split a node into more than two branches

Pure and Impure Split using Entropy (with Example)

Pure and Impure Split using Entropy (with
Example)

Imagine separating apples and oranges.
If one basket has only apples, the separation is perfect.
If another basket has apples and oranges mixed, the separation is confusing.

Decision trees face the same situation.
Entropy helps the tree decide whether a split is clean (pure) or messy (impure).

Based on entropy values, splits are classified as pure or impure.
Definition of Pure and Impure Split

 Pure Split
→ All samples belong to the same class
→ Entropy = 0

 Impure Split
→ Samples belong to more than one class
→ Entropy > 0
Suppose, given data is
Feature

Overall Class Distribution

Feature-wise Split (as needed for entropy)
Feature

Total samples = 9
Yes (Y) = 6
No (N) = 3

Step 1: Entropy of Root Node (Before Split)

Formula of Entropy

Since entropy > 0, the node is IMPURE.

Step 2: Split on Feature → C1 and C2

Split 1: Node C1
From your notes:

Y = 3
N = 3
Total = 6

Maximum entropy → Completely IMPURE split

Split 2: Node C2
From your notes:

Y = 3
N = 0
Total = 3

Entropy = 0 → PURE node
Step 3: Interpretation (Pure vs Impure)

Pure Split

All samples belong to one class only
Entropy = 0
No further splitting needed
In this example:
C2 is a PURE split

Impure Split

Samples belong to multiple classes
Entropy > 0
Further splitting required
In this example:
- Root node (0.918) → Impure
- C1 node (1) → Highly impure

Step 4: Summary Table

Thus, ID3 constructs a decision tree by selecting attributes that provide maximum
information gain, resulting in an efficient and interpretable classification model.

Key Points