DMA/L/z4
From WikiZMSI
In class
- Use the UCI 'Wine' data set (from lab no. 1) for this laboratory. Note: CART is better served by the original continuous attributes, so do not discretize the data.
- Implement three scripts, each computing one impurity function: 'classification error', 'entropy', and 'Gini index'.
- Implement a recursive script that builds the full CART tree (the data set and the impurity function are passed as arguments). Remark: in MATLAB, it is convenient to store the tree as a matrix in which each row represents one tree node and the columns hold its information (index of parent, indices of children, index of split attribute, split value, assigned class).
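The in-class tasks above can be sketched as follows. This is only an illustration in plain Python rather than MATLAB, and every function name in it is hypothetical. It assumes binary splits of the form "attribute <= threshold", with candidate thresholds taken halfway between consecutive distinct attribute values, and it assumes a node is split until it is pure or no split is possible. The tree is kept as a list of rows, which mirrors the matrix layout suggested in the remark.

```python
import math
from collections import Counter

def class_probs(labels):
    """Relative frequency of each class among the labels."""
    n = len(labels)
    return [count / n for count in Counter(labels).values()]

def classification_error(labels):
    return 1.0 - max(class_probs(labels))

def entropy(labels):
    return -sum(p * math.log2(p) for p in class_probs(labels))

def gini(labels):
    return 1.0 - sum(p * p for p in class_probs(labels))

def best_split(X, y, impurity):
    """Return (attribute, threshold) with the lowest weighted child impurity, or None."""
    n, best, best_score = len(y), None, float("inf")
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0  # assumed: midpoint between neighbouring values
            left = [y[i] for i in range(n) if X[i][j] <= t]
            right = [y[i] for i in range(n) if X[i][j] > t]
            if not left or not right:
                continue
            score = (len(left) * impurity(left) + len(right) * impurity(right)) / n
            if score < best_score:
                best, best_score = (j, t), score
    return best

# Assumed row layout: [parent, left child, right child, attribute, threshold, class]
# The value -1 marks "no parent", "no child" or "no split attribute".
def build_tree(X, y, impurity, tree=None, parent=-1):
    """Recursively grow the full tree; returns the list of node rows."""
    if tree is None:
        tree = []
    idx = len(tree)
    majority = Counter(y).most_common(1)[0][0]
    tree.append([parent, -1, -1, -1, None, majority])
    split = None if len(set(y)) == 1 else best_split(X, y, impurity)
    if split is not None:
        j, t = split
        left = [i for i in range(len(y)) if X[i][j] <= t]
        right = [i for i in range(len(y)) if X[i][j] > t]
        tree[idx][3], tree[idx][4] = j, t
        tree[idx][1] = len(tree)
        build_tree([X[i] for i in left], [y[i] for i in left], impurity, tree, idx)
        tree[idx][2] = len(tree)
        build_tree([X[i] for i in right], [y[i] for i in right], impurity, tree, idx)
    return tree

def predict(tree, row):
    """Follow the splits from the root (row 0) down to a leaf."""
    node = 0
    while tree[node][1] != -1:
        _, left, right, j, t, _ = tree[node]
        node = left if row[j] <= t else right
    return tree[node][5]

# Tiny made-up data set, used only to exercise the functions.
X = [[1.0, 5.0], [2.0, 4.0], [3.0, 7.0], [4.0, 6.0]]
y = ["a", "a", "b", "b"]
tree = build_tree(X, y, gini)
```

Passing `entropy` or `classification_error` instead of `gini` changes only how candidate splits are scored; the recursion and the row layout stay the same.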
At home
For a selected data set (other than 'Wine'), do the following:
- Implement a script that prunes the full CART tree for a given penalty value (the penalty is charged once per leaf). Difficulty: medium.
- Implement a script that finds the optimal penalty value by cross-validation and then prunes the full CART tree using that optimal penalty. Difficulty: high.
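The pruning step for a fixed penalty can be sketched as below. This is a hedged illustration, not a prescribed solution: it assumes the cost of a tree is its number of misclassified training samples plus the penalty times its number of leaves, and it assumes a simplified, hypothetical row layout `[left child, right child, errors if this node were a leaf]`. The function name `prune` and the example numbers are made up.

```python
def prune(tree, penalty, node=0):
    """Prune in place, bottom-up; return (cost, number of leaves) below `node`.

    Assumed cost: misclassified samples + penalty * number of leaves.
    """
    left, right, errors = tree[node]
    leaf_cost = errors + penalty
    if left == -1:  # already a leaf
        return leaf_cost, 1
    left_cost, left_leaves = prune(tree, penalty, left)
    right_cost, right_leaves = prune(tree, penalty, right)
    if leaf_cost <= left_cost + right_cost:
        # Collapsing is no worse than keeping the subtree: make this node a leaf.
        # The rows of the removed descendants stay in the list but become unreachable.
        tree[node][0] = tree[node][1] = -1
        return leaf_cost, 1
    return left_cost + right_cost, left_leaves + right_leaves

# Made-up full tree: row 0 is the root, -1 means "no child".
full_tree = [
    [1, 2, 10],   # root
    [3, 4, 4],    # inner node
    [-1, -1, 1],  # leaf
    [-1, -1, 1],  # leaf
    [-1, -1, 2],  # leaf
]

pruned = [row[:] for row in full_tree]  # keep the full tree untouched
cost, leaves = prune(pruned, penalty=2)
```

For the cross-validation task, one possible design is to call such a routine for each candidate penalty on trees grown from the training folds, score the pruned trees on the held-out folds, and keep the penalty with the lowest average error.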