Logistic Regression with variables that do not vary
A few questions about constant variables and logistic regression:
1) Let's say I have a continuous variable that takes only one value across the whole data set. I know I should ideally eliminate the variable, since it adds no predictive value. Instead of doing this manually for each feature, does logistic regression automatically set the coefficient of such variables to 0?
2) If I use such a variable (one that has only one value) in logistic regression with L1 regularization, will the regularization force the coefficient to 0?
3) Along similar lines, suppose I have a categorical variable with 3 levels (the first covers about 60% of the data set, the second 35%, and the third 5%), and I split the data into training and test sets. There is a good chance that the third level will not end up in the test set, leading to a scenario where the variable takes a value in the training set that never appears in the test set. How do I handle such scenarios? Does regularization take care of things like this automatically?
Regarding question 3)
If you want to be sure that both the training and test sets contain samples from every level of the categorical variable, you can split each level's subgroup into training and test parts separately and then combine the parts again (a stratified split).
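As a rough illustration of that per-level split, here is a minimal sketch using only the Python standard library. The function name `stratified_split`, the `level_of` callback, and the toy rows are all made up for this example; the 60/35/5 proportions are taken from the question.

```python
import random
from collections import defaultdict

def stratified_split(rows, level_of, test_fraction=0.3, seed=0):
    """Split rows level by level, then recombine (hypothetical helper)."""
    rng = random.Random(seed)
    by_level = defaultdict(list)
    for row in rows:
        by_level[level_of(row)].append(row)

    train, test = [], []
    for group in by_level.values():
        rng.shuffle(group)
        # Send at least one row per level to the test set, but keep at least
        # one for training whenever the level has two or more rows.
        n_test = max(1, round(len(group) * test_fraction))
        if len(group) > 1:
            n_test = min(n_test, len(group) - 1)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    rng.shuffle(train)
    rng.shuffle(test)
    return train, test

# Toy data: level "A" is 60% of rows, "B" is 35%, "C" is 5%.
rows = ([("A", i) for i in range(60)]
        + [("B", i) for i in range(35)]
        + [("C", i) for i in range(5)])
train, test = stratified_split(rows, level_of=lambda r: r[0])
train_levels = {r[0] for r in train}
test_levels = {r[0] for r in test}
print(sorted(train_levels), sorted(test_levels))
```

A level with only a single row cannot appear on both sides, so in this sketch it goes to the test set only. If you already use scikit-learn, `train_test_split` has a `stratify` argument that does the same kind of split for you.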
Regarding question 1) and 2)
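The coefficient for a variable with zero variance should be zero, yes. A constant column is perfectly collinear with the intercept, so the data carry no information about its coefficient. However, whether such a coefficient will "automatically" be set to zero or excluded from the regression depends on the implementation: some drop the column or report the coefficient as undefined, while others may return an arbitrary split between the intercept and the constant column. With L1 regularization the penalty does favor exactly zero, provided the intercept itself is not penalized, because the intercept can absorb the constant's contribution at no cost.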
If you implement logistic regression yourself, you can post the code and we can discuss specifically.
I recommend that you find an existing implementation of logistic regression and test it on toy data. Then you will have your answer as to whether or not the coefficient is set to zero (which I assume it will be).
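To show what such a toy-data check could look like, here is a small self-contained sketch. Note that it is a hand-written logistic regression fit by (proximal) gradient descent, not any particular library's implementation, so it only illustrates the idea; a real library may behave differently, which is exactly why it is worth testing. The function names, the toy data, the step size, and the penalty strength are all assumptions made for this example, and the intercept is deliberately left unpenalized.

```python
import math

def soft_threshold(value, threshold):
    """Proximal step for the L1 penalty: shrink toward zero, clip at zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def fit_logistic(X, y, l1=0.0, lr=0.2, n_iter=2000):
    """Fit intercept and weights by proximal gradient descent on the mean log-loss.

    The L1 penalty applies to the weights only; the intercept is unpenalized.
    """
    n, d = len(X), len(X[0])
    b, w = 0.0, [0.0] * d
    for _ in range(n_iter):
        grad_b, grad_w = 0.0, [0.0] * d
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted prob. minus label
            grad_b += err / n
            for j in range(d):
                grad_w[j] += err * xi[j] / n
        b -= lr * grad_b
        w = [soft_threshold(wj - lr * gj, lr * l1) for wj, gj in zip(w, grad_w)]
    return b, w

# Toy data: one informative feature on a grid, one constant feature (always 2.0).
n = 100
x1 = [-2.0 + 4.0 * i / (n - 1) for i in range(n)]
# Label is 1 when x1 > -0.5, with every 7th label flipped so the classes overlap.
y = [(1 if v > -0.5 else 0) ^ (1 if i % 7 == 0 else 0) for i, v in enumerate(x1)]
X = [[v, 2.0] for v in x1]

b_plain, w_plain = fit_logistic(X, y, l1=0.0)  # no regularization
b_l1, w_l1 = fit_logistic(X, y, l1=0.1)        # L1 on the weights
print("no penalty:", b_plain, w_plain)
print("L1 penalty:", b_l1, w_l1)
```

In this sketch the unpenalized fit leaves the constant column with a nonzero coefficient, because plain gradient descent shares the offset between the intercept and the constant column. The L1 fit pushes the constant column's coefficient to zero while keeping the informative one. Running the same toy data through the library you actually plan to use will tell you what it does.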