Error: Found array with dim 3. Estimator expected <= 2
I have a 14x5 data matrix named data. The first column (Y) is the dependent variable, followed by four independent variables (X, S1, S2, S3). When I try to fit a regression model to the first T rows of the independent variables (e.g. data['S2'][:T]), I get the following error:
ValueError: Found array with dim 3. Estimator expected <= 2.
I'd appreciate any insight on a fix. Code below.
    import pandas as pd
    import numpy as np
    from sklearn.linear_model import LinearRegression

    data = pd.read_csv('C:/path/Macro.csv')
    T = len(data['X']) - 1

    # Fit variables
    X = data['X'][:T]
    S1 = data['S1'][:T]
    S2 = data['S2'][:T]
    S3 = data['S3'][:T]
    Y = data['Y'][:T]

    regressor = LinearRegression()
    regressor.fit([[X, S1, S2, S3]], Y)
You are passing a 3-dimensional array as the first argument to fit(). X, S1, S2, S3 are all Series objects (1-dimensional), so the following
[[X, S1, S2, S3]]
is 3-dimensional. sklearn estimators expect an array of feature vectors (2-dimensional).
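To see the dimensionality problem directly, here is a minimal sketch. The Series values are made-up stand-ins for the question's columns (13 rows each, matching T = 13 for a 14-row file), and `np.column_stack` is just one way to build the 2-dimensional layout, not necessarily the one you will want in the end.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the question's columns (values are made up).
base = np.arange(13, dtype=float)
X = pd.Series(base)
S1 = pd.Series(base * 2)
S2 = pd.Series(base * 3)
S3 = pd.Series(base * 4)

# What the question passes to fit(): a list containing a list of Series.
nested = np.asarray([[X, S1, S2, S3]])
print(nested.ndim, nested.shape)

# What an estimator expects: one row per sample, one column per feature.
features = np.column_stack([X, S1, S2, S3])
print(features.ndim, features.shape)
```

The nested list converts to an array of shape (1, 4, 13), which is where "dim 3" comes from, while the stacked version has shape (13, 4).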
Try something like this:
    # .ix was removed in pandas 1.0; use .loc (label-based) and .iloc (position-based) instead.
    X = data.loc[:, 'X':].iloc[:T]   # first T rows, columns from X onward
    y = data['Y'].iloc[:T]           # first T rows, Y column

    regressor = LinearRegression()
    regressor.fit(X, y)
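For an end-to-end check, here is a small self-contained sketch on synthetic data. The random DataFrame only imitates the shape described in the question (14 rows; columns Y, X, S1, S2, S3) and is not the real Macro.csv. It also covers fitting on a single column such as S2: selecting with double brackets keeps the result 2-dimensional, whereas `data['S2']` alone is a 1-dimensional Series that the estimator would also reject.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for Macro.csv (assumption: 14 rows, columns Y, X, S1, S2, S3).
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.normal(size=(14, 5)),
                    columns=['Y', 'X', 'S1', 'S2', 'S3'])
T = len(data) - 1

y = data['Y'].iloc[:T]

# All four predictors: a DataFrame selection is already 2-dimensional.
X_all = data[['X', 'S1', 'S2', 'S3']].iloc[:T]
model_all = LinearRegression().fit(X_all, y)

# A single predictor: double brackets give a (T, 1) DataFrame, not a 1-D Series.
X_s2 = data[['S2']].iloc[:T]
model_s2 = LinearRegression().fit(X_s2, y)

print(X_all.shape, model_all.coef_.shape)
print(X_s2.shape, model_s2.coef_.shape)
```

Both calls to `fit` receive a 2-dimensional feature matrix, so neither raises the ValueError from the question.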