DMA/L/z4

In class

  • For this laboratory, use the UCI 'Wine' data set (from lab no. 1). Note: CART works better when the data is not discretized.
  • Implement three scripts, each computing a different impurity function: 'classification error', 'entropy', 'Gini index'.
  • Implement a recursive script that constructs the full CART tree (the data set and the impurity function are passed as arguments). Remark: in MATLAB, it is convenient to store the tree as a matrix in which each row represents one tree node and the columns hold the node's information (index of parent, indices of children, index of split attribute, split value, assigned class).
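
The two implementation tasks above can be sketched as follows. This is only an illustration, written in Python rather than MATLAB, and it rests on several assumptions that the assignment does not state: all attributes are numeric, candidate thresholds are midpoints between consecutive distinct attribute values, the tree is a list of rows mirroring the matrix layout from the remark, and -1 marks a missing parent, child, or split attribute. All function names (best_split, build_tree, predict) are hypothetical and chosen for this example only.

```python
import math
from collections import Counter

def class_proportions(labels):
    n = len(labels)
    return [c / n for c in Counter(labels).values()]

def classification_error(labels):
    return 1.0 - max(class_proportions(labels))

def entropy(labels):
    return -sum(p * math.log2(p) for p in class_proportions(labels))

def gini(labels):
    return 1.0 - sum(p * p for p in class_proportions(labels))

def best_split(X, y, impurity):
    """Return (attribute, threshold, impurity decrease) or None if no split exists."""
    n, best = len(y), None
    parent = impurity(y)
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0  # assumption: midpoint thresholds
            left = [y[i] for i in range(n) if X[i][j] <= t]
            right = [y[i] for i in range(n) if X[i][j] > t]
            if not left or not right:
                continue
            gain = parent - (len(left) * impurity(left) + len(right) * impurity(right)) / n
            if best is None or gain > best[2]:
                best = (j, t, gain)
    return best

def build_tree(X, y, impurity, tree=None, parent=-1):
    """Grow the full tree recursively; return (tree, index of the node just created)."""
    if tree is None:
        tree = []
    idx = len(tree)
    majority = Counter(y).most_common(1)[0][0]
    # Row layout: [parent, left child, right child, split attribute, split value, class]
    tree.append([parent, -1, -1, -1, None, majority])
    split = best_split(X, y, impurity) if len(set(y)) > 1 else None
    if split is None:  # pure node, or no attribute separates the samples
        return tree, idx
    j, t, _ = split
    left = [i for i in range(len(y)) if X[i][j] <= t]
    right = [i for i in range(len(y)) if X[i][j] > t]
    tree[idx][3], tree[idx][4] = j, t
    _, tree[idx][1] = build_tree([X[i] for i in left], [y[i] for i in left], impurity, tree, idx)
    _, tree[idx][2] = build_tree([X[i] for i in right], [y[i] for i in right], impurity, tree, idx)
    return tree, idx

def predict(tree, x, idx=0):
    """Follow the splits from node idx down to a leaf and return its class."""
    _, left, right, attr, value, label = tree[idx]
    if left == -1:
        return label
    return predict(tree, x, left if x[attr] <= value else right)
```

Calling build_tree(X, y, gini) with X as a list of numeric rows and y as a list of class labels returns the node table and the root index (0). Because the impurity function is passed as an argument, the same builder works with classification_error and entropy.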

At home

For a data set of your choice (other than 'Wine'), do the following:

  • Implement a script that prunes the full CART tree for a given penalty value (the penalty is charged once per leaf). Difficulty: medium.
  • Implement a script that finds the optimal penalty value by cross-validation and then prunes the full CART tree using that optimal penalty. Difficulty: high.
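
One possible shape for these two scripts is sketched below, again in Python and under assumptions of this example rather than of the assignment. The node table gets an extra hypothetical column, errors, holding the number of training samples the node would misclassify if it were a leaf. The cost of a subtree is taken to be its training errors plus the penalty times its number of leaves, so the penalty is expressed in misclassified samples. The callables fit and count_errors are placeholders for your own tree-building and evaluation code.

```python
import copy

# Assumed row layout: [parent, left, right, split attribute, split value, class, errors]
PARENT, LEFT, RIGHT, ATTR, VALUE, LABEL, ERRORS = range(7)

def prune(tree, penalty, idx=0):
    """Prune in place below node idx; return (errors, leaves) of the pruned subtree."""
    node = tree[idx]
    if node[LEFT] == -1:
        return node[ERRORS], 1
    l_err, l_leaves = prune(tree, penalty, node[LEFT])
    r_err, r_leaves = prune(tree, penalty, node[RIGHT])
    sub_err, sub_leaves = l_err + r_err, l_leaves + r_leaves
    # Collapse the node when a single leaf costs no more than the whole subtree.
    if node[ERRORS] + penalty <= sub_err + penalty * sub_leaves:
        node[LEFT] = node[RIGHT] = node[ATTR] = -1
        node[VALUE] = None
        return node[ERRORS], 1
    return sub_err, sub_leaves

def kfold_indices(n, k):
    """Split range(n) into k folds by taking every k-th index."""
    return [list(range(f, n, k)) for f in range(k)]

def choose_penalty(X, y, penalties, k, fit, count_errors):
    """Return the penalty with the fewest validation errors summed over k folds.

    fit(X, y) -> full tree and count_errors(tree, X, y) -> int are
    hypothetical callables supplied by the caller.
    """
    totals = {p: 0 for p in penalties}
    for fold in kfold_indices(len(y), k):
        held = set(fold)
        train = [i for i in range(len(y)) if i not in held]
        full = fit([X[i] for i in train], [y[i] for i in train])
        for p in penalties:
            tree = copy.deepcopy(full)  # prune() modifies the tree in place
            prune(tree, p)
            totals[p] += count_errors(tree, [X[i] for i in fold], [y[i] for i in fold])
    return min(penalties, key=totals.get)  # on ties, the first listed penalty wins

# Hand-built example tree: the root splits on attribute 0, its right child on attribute 1.
example = [
    [-1, 1, 2, 0, 5.0, 'A', 4],
    [0, -1, -1, -1, None, 'A', 0],
    [0, 3, 4, 1, 2.0, 'B', 1],
    [2, -1, -1, -1, None, 'B', 0],
    [2, -1, -1, -1, None, 'A', 0],
]
```

On the example tree, prune(copy.deepcopy(example), 1.0) collapses only the lower split and returns (1, 2), that is, one training error and two leaves. Rows of removed nodes stay in the table but are no longer reachable from the root. After choose_penalty has picked a value, the final step is to build the full tree on the whole data set and prune it with that value.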