# How to create a predictive model in R when the outcome variable has more than 10 classes?

I have a dataset of vehicles and am trying to predict what will fail. Here is my data set:

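| vId | MileageSincePartLastReplaced | AgeOfPart | TypeOfVehicle | Failure |
|-----|------------------------------|-----------|---------------|---------|
| x | 100000 | 200days | Truck | Alt Belt |
| y | 200000 | 600days | PCar | Transmission Belt |
| z | 140000 | 230days | Van | Fan Belt |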

Failure is the outcome variable, with 20 different failure categories. What I am looking for is: given the mileage driven and the age of the part, which one is likely to fail?

I am unable to figure out which model or models I should be looking at. I was looking into multinomial regression and ordered logistic regression but was not sure. Any help on how to go about this, and which package I could use?

Note: I asked the same question on Stack Overflow and it was suggested that I move it to this forum.

## Answers

I don't entirely understand the description of your dataset, but if there are 20 classes of interest, one typical solution would be to bundle some of them together (for example, classes 1-10 become group 1 and classes 11-20 become group 2).

Find the split that is easiest to separate, then split each resulting group with another classifier, and recurse. You need to keep track of the original labels as you go, but assign temporary labels for each split. This may or may not work better than a single classifier over all 20 classes.
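A minimal sketch of one level of that idea, using `rpart` (which ships with standard R installations). Everything here is an assumption for illustration: the data are simulated, only four failure types stand in for the 20, the `group_of` bundling is arbitrary, and `predict_two_stage` is a hypothetical helper, not part of any package.

```r
library(rpart)

set.seed(1)
# Simulated toy data: 4 failure types standing in for the 20 in the question
n <- 400
failure <- factor(sample(c("Alt Belt", "Fan Belt", "Timing Belt", "Transmission Belt"),
                         n, replace = TRUE))
d <- data.frame(
  mileage = 50000 * as.integer(failure) + rnorm(n, sd = 15000),
  age     = 150 * as.integer(failure) + rnorm(n, sd = 60),
  failure = failure
)

# Temporary coarse labels; the original labels stay in d$failure
group_of <- c("Alt Belt" = "A", "Fan Belt" = "A",
              "Timing Belt" = "B", "Transmission Belt" = "B")
d$group <- factor(group_of[as.character(d$failure)])

# Top-level classifier separates the coarse groups
top <- rpart(group ~ mileage + age, data = d, method = "class")

# One classifier per group, trained on the original labels within that group
sub <- lapply(split(d, d$group), function(g) {
  g$failure <- droplevels(g$failure)
  rpart(failure ~ mileage + age, data = g, method = "class")
})

# Route each row through the top model, then through the matching sub-model
predict_two_stage <- function(newdata) {
  grp <- as.character(predict(top, newdata, type = "class"))
  out <- character(nrow(newdata))
  for (g in unique(grp)) {
    idx <- grp == g
    out[idx] <- as.character(
      predict(sub[[g]], newdata[idx, , drop = FALSE], type = "class"))
  }
  out
}

pred <- predict_two_stage(d)
mean(pred == d$failure)  # training accuracy only; evaluate on held-out data in practice
```

With 20 classes the same pattern would be applied again inside each group until every leaf holds a single original label.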

If you are using only two predictor variables (mileage and age of part), you are going to find that predicting 20 classes is hard no matter what you do.
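One way to judge how hard the problem is would be to compare any model against the accuracy of always guessing the most common class. The sketch below uses simulated labels (the `failure` vector and its class probabilities are assumptions, not the real data) and base R only.

```r
set.seed(7)
# Simulated labels for 20 failure types; replace with the real Failure column
failure <- factor(sample(paste0("type", 1:20), 1000, replace = TRUE,
                         prob = c(rep(0.1, 5), rep(0.5 / 15, 15))))

counts <- table(failure)

# Accuracy obtained by always predicting the most frequent class
baseline <- max(counts) / sum(counts)

sort(counts, decreasing = TRUE)[1:5]  # the five most common failure types
baseline
```

A classifier that does not clearly beat `baseline` on held-out data is not learning much from mileage and age.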

Try the random forest method. It's a very useful tool for multinomial outcomes.

    library(randomForest)
    ?randomForest  # check the help page and set the necessary parameters
    randomForest(x, y, ntree = 500)  # x: predictors; y: response (a factor for classification)
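A fuller sketch of how this could look end to end, assuming the `randomForest` package is installed. The data are simulated to resemble the columns in the question, with only three failure types, and `AgeOfPart` is assumed to be already numeric; values such as `200days` would first need converting, for example with `as.numeric(sub("days", "", x))`.

```r
library(randomForest)

set.seed(42)
# Simulated data shaped like the question's columns; values are made up
n <- 600
Failure <- factor(sample(c("Alt Belt", "Fan Belt", "Transmission Belt"),
                         n, replace = TRUE))
vehicles <- data.frame(
  MileageSincePartLastReplaced = 60000 * as.integer(Failure) + rnorm(n, sd = 20000),
  AgeOfPart     = 200 * as.integer(Failure) + rnorm(n, sd = 80),  # days, numeric
  TypeOfVehicle = factor(sample(c("Truck", "PCar", "Van"), n, replace = TRUE)),
  Failure       = Failure
)

# Hold out 30% of the rows for evaluation
train_idx <- sample(n, size = 420)

# A factor response makes randomForest fit a classification forest
fit <- randomForest(Failure ~ MileageSincePartLastReplaced + AgeOfPart + TypeOfVehicle,
                    data = vehicles[train_idx, ], ntree = 500)

pred <- predict(fit, vehicles[-train_idx, ])
table(predicted = pred, actual = vehicles$Failure[-train_idx])  # confusion matrix
importance(fit)  # mean decrease in Gini for each predictor
```

The confusion matrix shows which failure types get mistaken for each other, which is often more informative than a single accuracy number when there are many classes.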