Decision Tree in R with binary and continuous input

We are modelling a decision tree using both continuous and binary inputs to analyze weather effects on biking behavior. A linear regression suggests that "rain" has a large impact on bike counts. Our rain variable is binary and indicates whether it rained in a given hour.

When we use rpart to build a decision tree, "rain" does not appear as a node, although we expect it to be highly decisive for the number of bikes. This might be due to how the rain variable is coded. rpart seems to prefer continuous variables (such as temperature) for decision nodes.

Is there anything we should know about how rpart decides whether to use continuous or binary variables as decision nodes? Is it possible to control this selection of variables?

library("rpart")
fit <- rpart(bikecount ~ df.weather$temp+df.weather$weekday+df.weather$rain, data=training.data, method="class")

Answers


The function rpart implements the CART algorithm of Breiman, Friedman, Olshen and Stone (1984), which is known to suffer from biased variable selection. That is, given two or more variables that are equally predictive of the outcome, the variable with the largest number of unique values is the most likely to be selected for splitting, simply because it offers more candidate split points. See for example Loh and Shih (1997) and Hothorn, Hornik and Zeileis (2006).
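The bias described above can be checked with a small simulation. This is only a sketch under stated assumptions: the data are simulated pure noise, and the names x_bin, x_cont and root_var are made up for illustration. Neither predictor is related to the response, so an unbiased method would pick each about equally often for the first split.

```r
library("rpart")

set.seed(1)
n_sim <- 200  # number of simulated data sets (arbitrary choice)
n_obs <- 200  # observations per data set (arbitrary choice)

# For each simulated data set, record which variable rpart uses at the root.
root_var <- replicate(n_sim, {
  d <- data.frame(
    y      = rnorm(n_obs),                   # response: pure noise
    x_bin  = factor(rbinom(n_obs, 1, 0.5)),  # binary predictor, unrelated to y
    x_cont = rnorm(n_obs)                    # continuous predictor, unrelated to y
  )
  # cp = 0 allows a split with any positive improvement; maxdepth = 1 keeps
  # only the root split; xval = 0 skips cross-validation for speed.
  stump <- rpart(y ~ x_bin + x_cont, data = d, method = "anova",
                 control = rpart.control(maxdepth = 1, cp = 0, xval = 0))
  as.character(stump$frame$var[1])
})

# How often each variable was chosen for the first split
table(root_var)
```

If the continuous predictor is chosen far more often than the binary one, that reflects the number of candidate split points rather than any real predictive value.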

Unbiased recursive partitioning methods separate selection of 1) the splitting variable and 2) the splitting value, which solves this variable selection bias. Unbiased recursive partitioning has been implemented in the R package partykit.
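To make the two-step idea concrete, here is a minimal base R sketch. It is not what partykit does internally: ordinary correlation and t-tests stand in for the conditional inference tests, and the simulated data and all variable names (d, p_values, split_var, split_value) are assumptions made for illustration.

```r
set.seed(7)
n <- 300
d <- data.frame(
  temp = runif(n, 0, 30),    # continuous predictor with many unique values
  rain = rbinom(n, 1, 0.3)   # binary predictor
)
# Hypothetical data-generating process: rain lowers the count, temp has no effect.
d$y <- 100 - 40 * d$rain + rnorm(n, sd = 15)

# Step 1: one association test per predictor; select the variable with the
# smallest p-value. The number of possible split points plays no role here.
p_values <- c(
  temp = cor.test(d$temp, d$y)$p.value,
  rain = t.test(y ~ rain, data = d)$p.value
)
split_var <- names(which.min(p_values))

# Step 2: search for the best split point within the selected variable only,
# here by minimizing the within-node sum of squared errors.
x <- d[[split_var]]
candidates <- head(sort(unique(x)), -1)  # the largest value cannot be a cut point
sse <- sapply(candidates, function(cut) {
  left  <- d$y[x <= cut]
  right <- d$y[x > cut]
  sum((left - mean(left))^2) + sum((right - mean(right))^2)
})
split_value <- candidates[which.min(sse)]

split_var
split_value
```

Because the variable is chosen by a test rather than by comparing the best split of each variable, a binary predictor competes on equal terms with a continuous one.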

It is unclear to me why the predictor variables in your formula include $ while the response variable does not, given that the data argument has been specified. If the code you provide above nevertheless works with rpart, you should be able to fit an unbiased classification tree as follows:

library("partykit")
ct <- ctree(bikecount ~ df.weather$temp + df.weather$weekday + df.weather$rain, 
            data=training.data)
plot(ct)
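For comparison, here is a self-contained sketch on simulated data, with the formula written without $ so that every variable is looked up in the data argument. The data frame sim and its data-generating process are invented for illustration and are not the asker's data; the response is kept numeric, so ctree fits a regression tree here.

```r
library("partykit")

set.seed(42)
n <- 500
# Hypothetical hourly weather data
sim <- data.frame(
  temp    = runif(n, -5, 30),
  weekday = factor(sample(c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"),
                          n, replace = TRUE)),
  rain    = factor(rbinom(n, 1, 0.2), levels = c(0, 1),
                   labels = c("dry", "wet"))
)
# Assumed effect sizes: counts rise with temperature and drop when it rains.
sim$bikecount <- rpois(n, lambda = exp(3 + 0.03 * sim$temp -
                                         1 * (sim$rain == "wet")))

# All variables are referenced by name only and resolved through `data`.
ct <- ctree(bikecount ~ temp + weekday + rain, data = sim)
print(ct)
```

Printing or plotting the tree shows which variables were selected for splitting.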

References

Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1984). Classification and regression trees. Wadsworth, Monterey, CA.

Hothorn, T., Hornik, K., & Zeileis, A. (2006). Unbiased recursive partitioning: A conditional inference framework. Journal of Computational and Graphical Statistics, 15(3), 651-674.

Loh, W. Y., & Shih, Y. S. (1997). Split selection methods for classification trees. Statistica Sinica, 7(4), 815-840.

