# Smoothing Small Data Set With Second Order Quadratic Curve

I'm doing some specific signal analysis, and I am in need of a method that would smooth out a given bell-shaped distribution curve. A running average approach isn't producing the results I desire. I want to keep the min/max, and general shape of my fitted curve intact, but resolve the inconsistencies in sampling.

In short: if given a set of data that models a simple quadratic curve, what statistical smoothing method would you recommend?

If possible, please reference an implementation, library, or framework.

Thanks SO!

**Edit**: Some helpful data

(A possible signal graph)

alt text http://i40.tinypic.com/b49942.png

The dark colored quadratic is my "fitted" curve of the light colored connected data points.

The sample @ -44 (approx.), is a problem in my graph (i.e. a potential sample inconsistency). I need this curve to "fit" the distribution better, and overcome the values that do not trend accordingly. Hope this helps!

## Answers

A "quadratic" curve is one thing; "bell-shaped" usually means a Gaussian normal distribution. Getting a best-estimate Gaussian couldn't be easier: you compute the sample mean and variance and your smooth approximation is

y = exp(-squared(x-mean)/variance)

If, on the other hand, you want to approximate a smooth curve with a quadradatic, I'd recommend computing a quadratic polynomial with minimum square error. I can nenver remember the formulas for this, but if you've had differential calculus, write the formula for the total square error (pointwise) and differentiate with respect to the coefficients of your quadratic. Set the first derivatives to zero and solve for the best approximation. Or you could look it up.

Finally, if you just want a smooth-looking curve to approximate a set of points, cubic splines are your best bet. The curves won't necessarily mean anything, but you'll get a nice smooth approximation.

#include <iostream> #include <math.h> struct WeightedData { double x; double y; double weight; }; void findQuadraticFactors(WeightedData *data, double &a, double &b, double &c, unsigned int const datasize) { double w1 = 0.0; double wx = 0.0, wx2 = 0.0, wx3 = 0.0, wx4 = 0.0; double wy = 0.0, wyx = 0.0, wyx2 = 0.0; double tmpx, tmpy; double den; for (unsigned int i = 0; i < datasize; ++i) { double x = data[i].x; double y = data[i].y; double w = data[i].weight; w1 += w; tmpx = w * x; wx += tmpx; tmpx *= x; wx2 += tmpx; tmpx *= x; wx3 += tmpx; tmpx *= x; wx4 += tmpx; tmpy = w * y; wy += tmpy; tmpy *= x; wyx += tmpy; tmpy *= x; wyx2 += tmpy; } den = wx2 * wx2 * wx2 - 2.0 * wx3 * wx2 * wx + wx4 * wx * wx + wx3 * wx3 * w1 - wx4 * wx2 * w1; if (den == 0.0) { a = 0.0; b = 0.0; c = 0.0; } else { a = (wx * wx * wyx2 - wx2 * w1 * wyx2 - wx2 * wx * wyx + wx3 * w1 * wyx + wx2 * wx2 * wy - wx3 * wx * wy) / den; b = (-wx2 * wx * wyx2 + wx3 * w1 * wyx2 + wx2 * wx2 * wyx - wx4 * w1 * wyx - wx3 * wx2 * wy + wx4 * wx * wy) / den; c = (wx2 * wx2 * wyx2 - wx3 * wx * wyx2 - wx3 * wx2 * wyx + wx4 * wx * wyx + wx3 * wx3 * wy - wx4 * wx2 * wy) / den; } } double findY(double const a, double const b, double const c, double const x) { return a * x * x + b * x + c; }; int main(int argc, char* argv[]) { WeightedData data[9]; data[0].weight=1; data[0].x=1; data[0].y=-52.0; data[1].weight=1; data[1].x=2; data[1].y=-48.0; data[2].weight=1; data[2].x=3; data[2].y=-43.0; data[3].weight=1; data[3].x=4; data[3].y=-44.0; data[4].weight=1; data[4].x=5; data[4].y=-35.0; data[5].weight=1; data[5].x=6; data[5].y=-31.0; data[6].weight=1; data[6].x=7; data[6].y=-32.0; data[7].weight=1; data[7].x=8; data[7].y=-43.0; data[8].weight=1; data[8].x=9; data[8].y=-52.0; double a=0.0, b=0.0, c=0.0; findQuadraticFactors(data, a, b, c, 9); std::cout << " x \t y" << std::endl; for (int i=0; i<9; ++i) { std::cout << " " << data[i].x << ", " << findY(a,b,c,data[i].x) << std::endl; } }

How about a simple digital low-pass filter?

y[0] = x[0]; for (i = 1; i < len; ++i) y[i] = a * x[i] + (1.0 - a) * y[i - 1];

In this case, **x[]** is your input data and **y[]** is the filtered output. The **a** coefficient is a value between 0 and 1 that you should tweak. An **a** value of 1 reproduces the input and the cut-off frequency decreases as **a** approaches 0.

Perhaps the parameters for your running average are set wrong (sample window too small or large)?

Is it just noise superimposed on your bell curve? How close is the noise frequency to that of the signal you're trying to retrieve? A picture of what you're trying to extract might help us identify a solution.

You could try some sort of fitting algorithm using a least squares fit if you can make a reasonable guess of the function parameters. Those sorts of techniques often have some immunity to noise.