How do I combine text and numerical features in a training set for machine learning?

I am trying to predict the number of likes on a social network post based on both numerical features and text features. I have a dataframe with the required features, but I don't know what to do with the post text. Should I vectorize it, or do something else, to get a suitable training matrix? I am going to use LinearSVC from sklearn for the analysis.

Answers


There are a lot of different ways you can transform your text features into numerical ones.

One of the most common is the bag-of-words approach, where you transform each text into an array holding the number of occurrences of each word.
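To make the idea concrete, here is a minimal sketch of bag of words in plain Python. The helper name `bag_of_words`, the whitespace tokenisation, and the sample posts are all assumptions made for illustration; a real vectorizer handles punctuation, stop words, and sparse storage for you.

```python
from collections import Counter


def bag_of_words(texts):
    """Return (vocabulary, count matrix) for a list of strings.

    Hypothetical helper for illustration only.
    """
    # Naive tokenisation: lowercase and split on whitespace.
    tokenized = [text.lower().split() for text in texts]
    # A sorted vocabulary gives every word a stable column index.
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # One row per text, one column per vocabulary word.
        matrix.append([counts[word] for word in vocabulary])
    return vocabulary, matrix


# Made-up example posts.
posts = ["good morning good people", "morning coffee"]
vocabulary, matrix = bag_of_words(posts)
print(vocabulary)  # ['coffee', 'good', 'morning', 'people']
print(matrix)      # [[0, 2, 1, 1], [1, 0, 1, 0]]
```

Each row is now a purely numerical vector, so it can sit next to your other numerical columns.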

If you are using scikit-learn, I recommend reading the Text feature extraction section of its User Guide.
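As a rough sketch of how the text and numerical columns could be combined in scikit-learn, the example below vectorizes the text with `TfidfVectorizer`, scales the numerical columns, and joins both with `ColumnTransformer` before fitting `LinearSVC`. The column names (`text`, `followers`, `hour`, `many_likes`) and the data are invented for illustration. Since `LinearSVC` is a classifier, the sketch assumes the like count has already been bucketed into a 0/1 label; for a raw count, a regressor such as `LinearSVR` would be the analogous choice.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Hypothetical toy data; the real dataframe will have its own columns.
df = pd.DataFrame({
    "text": [
        "look at my cute cat",
        "quarterly tax report attached",
        "cute puppy and cat photos",
        "meeting notes and tax forms",
        "my cat sleeping again",
        "report on office forms",
    ],
    "followers": [1200, 90, 3000, 40, 1500, 75],
    "hour": [18, 9, 20, 10, 19, 11],
    "many_likes": [1, 0, 1, 0, 1, 0],  # assumed: likes bucketed into a label
})

feature_columns = ["text", "followers", "hour"]

features = ColumnTransformer([
    # A single column name (not a list) passes 1-D text to the vectorizer.
    ("text", TfidfVectorizer(), "text"),
    # A list of names passes a 2-D block of numerical columns to the scaler.
    ("numeric", StandardScaler(), ["followers", "hour"]),
])

# The pipeline builds the combined feature matrix, then fits the classifier.
model = make_pipeline(features, LinearSVC())
model.fit(df[feature_columns], df["many_likes"])

predictions = model.predict(df[feature_columns])
print(predictions)
```

Scaling the numerical columns matters here because linear SVMs are sensitive to feature magnitudes, and raw follower counts would otherwise dwarf the TF-IDF values.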

Also look at NLTK (the Natural Language Toolkit) for more complex ways to process your text data.

