issue plotting too many lines on curve fit with matplotlib

I'm not sure what I'm doing wrong, but when I try to apply polyfit to my scatter plot data (year, rating), it keeps plotting a whole bunch of lines rather than one single line. It looks like this:

my code is below:

data = movies[['year', 'rtAllCriticsRating']]
data.year = data.year.astype(float).fillna(0.0)
data = data.convert_objects(convert_numeric=True)
data = data[data.rtAllCriticsRating > 0]
#print data
>>> 1995   5.4
    1950   2.3
    ....

#############issues start HERE########################
fig = plt.figure(figsize=(15, 15), dpi=100)
fig.add_subplot(212, axisbg='lightgrey')

# fit with np.polyfit
p = np.polyfit(data.year, data.rtAllCriticsRating, 3)
print p
plt.plot(data.year, data.rtAllCriticsRating, 'bo')
plt.plot(data.year,np.polyval(p, data.year),'r-') # A red solid line
plt.xlim(1900, 2020)
plt.ylim(0, 11)
plt.grid()
plt.xlabel('X Axis is by year')
plt.ylabel('Y Axis is by AllCriticRating')

What is going on, and how do I fix this? My main goal is to overlay on this scatter plot a red line showing how the average movie rating (the average of rtAllCriticsRating across all movies in a year) has changed over time.

Answers


It looks like your data.year array is not in any particular order. For a scatter plot that doesn't matter, because each point is drawn independently. A line plot, however, connects the points in the order they appear in the array, so unsorted years make the line jump back and forth across the plot. To overlay a fitted curve, you need the x values in numerical (in this case chronological) order. Try the following:

plt.plot(np.sort(data.year), np.polyval(p, np.sort(data.year)), 'r-')

This should connect the points from left to right in increasing year order, forming one single curve.
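
For a fuller picture, here is a minimal sketch of both overlays: the polynomial fit evaluated on an increasing grid of years, and the per-year average rating that the question names as its main goal. The helper names (fitted_curve, yearly_mean, plot_overlay) are made up for illustration, and the sketch assumes the years and ratings are plain numeric arrays with no missing values.

```python
import numpy as np


def fitted_curve(years, ratings, degree=3, num_points=200):
    """Fit a polynomial, then evaluate it on an evenly spaced, increasing grid."""
    years = np.asarray(years, dtype=float)
    ratings = np.asarray(ratings, dtype=float)
    coeffs = np.polyfit(years, ratings, degree)
    # The grid is increasing by construction, so the line is drawn left to right.
    grid = np.linspace(years.min(), years.max(), num_points)
    return grid, np.polyval(coeffs, grid)


def yearly_mean(years, ratings):
    """Return the distinct years in ascending order and the mean rating of each."""
    years = np.asarray(years, dtype=float)
    ratings = np.asarray(ratings, dtype=float)
    # np.unique sorts the years; `inverse` maps every row to its year's index.
    unique_years, inverse = np.unique(years, return_inverse=True)
    sums = np.bincount(inverse, weights=ratings)
    counts = np.bincount(inverse)
    return unique_years, sums / counts


def plot_overlay(ax, years, ratings, degree=3):
    """Draw the scatter, the yearly average, and the fit on a matplotlib Axes."""
    ax.plot(years, ratings, 'bo')        # raw points; their order is irrelevant
    mean_years, means = yearly_mean(years, ratings)
    ax.plot(mean_years, means, 'r-')     # average rating per year, in red
    grid, fit = fitted_curve(years, ratings, degree)
    ax.plot(grid, fit, 'k--')            # polynomial trend, dashed black
```

With the question's data this could be called as plot_overlay(plt.gca(), data.year, data.rtAllCriticsRating). Evaluating the polynomial on a linspace grid rather than on the sorted data also keeps the curve smooth where years are sparse. If the data is already in a pandas DataFrame, data.groupby('year').rtAllCriticsRating.mean() should give the same yearly averages, indexed by year in ascending order.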

