1 year ago

#297331

Gianluca Armeli

How to select the best splitting criterion in a decision tree when several splits are tied for best?

I wrote a decision tree regressor from scratch in Python. It is outperformed by the sklearn implementation. Both trees partition the training data identically and end up with the same leaf nodes. However, when searching for the best split, several candidate splits achieve the same optimal variance reduction and differ only in the feature index. The feature that my algorithm selects as the splitting criterion in the grown tree leads to major outliers in the test set predictions, whereas the feature selected by sklearn does not.

So what is the right thing to do if there are multiple best splits in the same branch while building the tree? Which feature is the best one to choose?
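To make the situation concrete, here is a minimal sketch of how such ties can arise and one possible way to break them. It is not my actual implementation: the helper names (`variance_reduction`, `tied_best_splits`, `pick_split`), the midpoint thresholds, and the toy data are all assumptions made for illustration. The seeded random choice is only one option; as far as I can tell from the scikit-learn documentation for `DecisionTreeRegressor`, features are randomly permuted at each split, so with tied improvements the selected split can depend on `random_state`.

```python
import math
import random


def variance(values):
    """Population variance of a list of numbers (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def variance_reduction(y, left_idx, right_idx):
    """Parent variance minus the size-weighted variance of the two children."""
    left = [y[i] for i in left_idx]
    right = [y[i] for i in right_idx]
    weighted = (len(left) * variance(left) + len(right) * variance(right)) / len(y)
    return variance(y) - weighted


def tied_best_splits(X, y, tol=1e-12):
    """Return (best_gain, ties), where ties lists every (feature, threshold)
    whose variance reduction equals the best one within `tol`."""
    candidates = []
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        # Candidate thresholds: midpoints between consecutive distinct values
        # (an assumption of this sketch).
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [i for i, row in enumerate(X) if row[j] <= t]
            right = [i for i, row in enumerate(X) if row[j] > t]
            candidates.append((variance_reduction(y, left, right), j, t))
    best = max(gain for gain, _, _ in candidates)
    ties = [(j, t) for gain, j, t in candidates
            if math.isclose(gain, best, abs_tol=tol)]
    return best, ties


def pick_split(ties, rng):
    """Break ties with a seeded RNG instead of always taking the first feature."""
    return rng.choice(ties)


# Toy data (hypothetical): both features separate the targets equally well.
X = [[1.0, 10.0], [2.0, 20.0], [3.0, 35.0], [4.0, 40.0]]
y = [1.0, 1.0, 5.0, 5.0]

best_gain, ties = tied_best_splits(X, y)
feature, threshold = pick_split(ties, random.Random(0))
print(best_gain, ties, (feature, threshold))
```

On this toy data the two tied splits produce the same partition of the training rows, but an unseen point such as `[2.4, 30.0]` would go left under the feature 0 split and right under the feature 1 split. That is the kind of divergence on the test set I am seeing.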

python

regression

decision-tree

variance

reduction

0 Answers
