Introduction to data mining

Author: Admin | 2025-04-28

11/24/2018 Introduction to Data Mining, 2nd Edition

Slides 12-14: Apply Model to Test Data

  Home Owner
    Yes: NO
    No: MarSt
      Single, Divorced: Income
        < 80K: NO
        > 80K: YES
      Married: NO

  At MarSt, the Married branch leads to a NO leaf: assign Defaulted to “No”.

Slide 15: Decision Tree Classification Task

Slide 16: Decision Tree Induction

  Many algorithms:
    Hunt’s Algorithm (one of the earliest)
    CART
    ID3, C4.5
    SLIQ, SPRINT

Slide 17: General Structure of Hunt’s Algorithm

  Let Dt be the set of training records that reach a node t.

  General procedure:
    If Dt contains records that all belong to the same class yt, then t is a leaf node labeled as yt.
    If Dt contains records that belong to more than one class, use an attribute test to split the data into smaller subsets. Recursively apply the procedure to each subset.
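The general procedure on slide 17 can be sketched in a few lines of Python. This is a minimal illustration, not the book's implementation: the function names `hunt` and `classify`, the toy records, and the rule of testing attributes in the order given are all assumptions made for the example (the slides leave the choice of attribute test to a separate criterion). The majority-class fallback when no attributes remain is also an assumption.

```python
from collections import Counter

def hunt(records, attributes, label="Defaulted"):
    """Grow a tree recursively; returns a class label (leaf) or (attribute, children)."""
    classes = Counter(r[label] for r in records)
    # Case 1: all records in D_t share one class, so t becomes a leaf with that label.
    # (Assumption: if no attributes remain, fall back to the majority class.)
    if len(classes) == 1 or not attributes:
        return classes.most_common(1)[0][0]
    # Case 2: more than one class, so split on an attribute test and recurse.
    # (Assumption: attributes are tested in the order given, with a multi-way split.)
    attr, rest = attributes[0], attributes[1:]
    children = {}
    for value in sorted({r[attr] for r in records}):
        subset = [r for r in records if r[attr] == value]
        children[value] = hunt(subset, rest, label)
    return (attr, children)

def classify(tree, record):
    """Apply the model to a test record by following branches down to a leaf."""
    while isinstance(tree, tuple):
        attr, children = tree
        tree = children[record[attr]]
    return tree

# Hypothetical toy training records (not the data set from the slides).
records = [
    {"HomeOwner": "Yes", "MarSt": "Single",   "Defaulted": "No"},
    {"HomeOwner": "Yes", "MarSt": "Married",  "Defaulted": "No"},
    {"HomeOwner": "No",  "MarSt": "Married",  "Defaulted": "No"},
    {"HomeOwner": "No",  "MarSt": "Single",   "Defaulted": "Yes"},
    {"HomeOwner": "No",  "MarSt": "Divorced", "Defaulted": "Yes"},
]
tree = hunt(records, ["HomeOwner", "MarSt"])
prediction = classify(tree, {"HomeOwner": "No", "MarSt": "Married"})
print(tree)
print(prediction)
```

On this toy data the Home Owner = Yes subset is pure and becomes a leaf immediately, while the Home Owner = No subset is split again on marital status, mirroring the two cases of the procedure.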
Slides 18-21: Hunt’s Algorithm

  [Figure: Hunt’s algorithm example tree; class counts at the nodes: (7,3) (3,0) (4,3) (3,0) (3,0) (3,0) (1,3) (3,0) (1,0) (0,3)]

Slide 22: Design Issues of Decision Tree Induction

  Greedy strategy: because the number of possible decision trees can be very large, many decision tree algorithms employ a heuristic-based approach to guide their search in the vast hypothesis space.
  Split the records based on an attribute test that optimizes a certain criterion.

Slide 23: Tree Induction

  How should training records be split?
    Method for specifying the test condition, depending on attribute types
    Measure for evaluating the goodness of a test condition
  How should the splitting procedure stop?
    Stop splitting if all the records belong to the same class or have identical attribute values
    Early termination

Slide 24: How to specify the attribute test condition?

Slide 25: Methods for Expressing Test Conditions

  Depends on attribute types
    Binary
    Nominal
    Ordinal
    Continuous
  Depends on number of ways to split
    2-way split
    Multi-way split

Slide 26: Test Condition for Nominal Attributes

  Multi-way split: use as many partitions as distinct values.
  Binary split: divides values into two subsets.

Slide 27: Test Condition for Ordinal Attributes

  Multi-way split: use as many partitions as distinct values.
  Binary
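The two kinds of test condition for a nominal attribute (slide 26) can be made concrete with a short Python sketch. The helper names `multiway_split` and `binary_splits` are hypothetical, and the marital-status values are borrowed from the earlier tree purely for illustration. For an attribute with k distinct values, the binary case yields 2^(k-1) - 1 candidate splits, which the code enumerates by pinning one value to the left subset so that mirrored pairs are not counted twice.

```python
from itertools import combinations

def multiway_split(values):
    """Multi-way split: one partition per distinct value."""
    return [{v} for v in sorted(set(values))]

def binary_splits(values):
    """Binary split: every way to divide the distinct values into two non-empty subsets.

    Assumes `values` is non-empty.
    """
    distinct = sorted(set(values))
    first, rest = distinct[0], distinct[1:]
    splits = []
    # Pin the first value to the left subset so each split is listed only once;
    # stopping before len(rest) keeps the right subset non-empty.
    for size in range(len(rest)):
        for extra in combinations(rest, size):
            left = {first, *extra}
            right = set(distinct) - left
            splits.append((left, right))
    return splits

# Illustrative attribute values taken from the MarSt node of the example tree.
marital = ["Single", "Married", "Divorced", "Single"]
partitions = multiway_split(marital)
splits = binary_splits(marital)
for left, right in splits:
    print(sorted(left), "|", sorted(right))
```

With three distinct values this produces three partitions for the multi-way split and three candidate binary splits, one of which is the {Single, Divorced} versus {Married} grouping used in the tree above.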
