Clean noise from an image
How can I clean the noise from an image using MATLAB?
Let's look at this example:
As you can see, the numbers are not clearly visible.
How can I remove the noise and the pixels that are not part of the numbers, so that recognizing them becomes easier?
Did you start with a bilevel (two-color, black and white) image, or did you threshold it yourself?
If it's the latter, you may find it easier to perform noise reduction before you threshold. In that case, please upload the image as it was before thresholding.
If it's the former, you'll have a tough time as far as traditional noise reduction is concerned. The reason is that many noise reduction approaches take advantage of the difference in statistical properties between the noise and the actual natural image. Thresholding essentially destroys that distinction.
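To illustrate why the order matters, here is a minimal sketch in plain Python (standard library only, so each operation is spelled out). It assumes a grayscale image stored as a list of rows of 0-255 values with dark ink on a light background. A 3x3 median filter, used here as one common choice of noise filter, removes an isolated dark speck that thresholding the raw image would keep. The function names are illustrative only.

```python
from statistics import median


def median_filter3(img):
    """3x3 median filter; at the borders only the neighbours that exist are used."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [img[yy][xx]
                      for yy in range(max(0, y - 1), min(h, y + 2))
                      for xx in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = median(window)
    return out


def threshold(img, t):
    """Mark pixels darker than t as ink (1) and everything else as background (0)."""
    return [[1 if v < t else 0 for v in row] for row in img]


# Synthetic example: a dark square stroke plus one isolated dark speck in the top-right corner.
gray = [
    [200, 200, 200, 200, 10],
    [200, 20, 20, 20, 200],
    [200, 20, 20, 20, 200],
    [200, 20, 20, 20, 200],
    [200, 200, 200, 200, 200],
]

raw = threshold(gray, 100)                    # thresholding first keeps the speck
clean = threshold(median_filter3(gray), 100)  # filtering first removes it
print(raw[0][4], clean[0][4])                 # prints: 1 0
```

The median filter also rounds off the corners of the square stroke, so as with any filter the window size is a trade-off between removing noise and preserving shape.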
OK, technically your image isn't really noisy; it's blurry (the digits are running into each other) and has background interference.
But anyway, here is how I would deal with it:
- Pick a color channel to work with (RGB has three channels; typically one is enough). I chose green because it looked the easiest to manipulate.
- Blur the image (I used a 5x5 Gaussian kernel in GIMP).
- Threshold using an empirically determined threshold (basically, try thresholds until you get a decent result). It's OK if some of the numbers have gaps; we can close them in the next step.
- Morphological image processing (erosion and dilation)
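The four steps above can be sketched in plain Python instead of GIMP. This is a minimal sketch under stated assumptions: the image is a list of rows of (R, G, B) tuples with dark digits on a lighter background, the threshold t still has to be found by trial, the Gaussian sigma of 1.0 is a guess, and the morphological step is implemented as a closing (dilation followed by erosion with a square window), which is one standard way to fill small gaps. All function names are hypothetical.

```python
import math


def green_channel(rgb):
    """Step 1: keep only the green channel of an image given as rows of (R, G, B)."""
    return [[px[1] for px in row] for row in rgb]


def gaussian_kernel(size=5, sigma=1.0):
    """Normalized size x size Gaussian kernel (sigma=1.0 is an assumption)."""
    r = size // 2
    k = [[math.exp(-(x * x + y * y) / (2 * sigma * sigma))
          for x in range(-r, r + 1)] for y in range(-r, r + 1)]
    total = sum(map(sum, k))
    return [[v / total for v in row] for row in k]


def blur(img, kernel):
    """Step 2: convolve with the kernel, replicating edge pixels at the border."""
    h, w, r = len(img), len(img[0]), len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for ky in range(-r, r + 1):
                for kx in range(-r, r + 1):
                    yy = min(max(y + ky, 0), h - 1)
                    xx = min(max(x + kx, 0), w - 1)
                    acc += img[yy][xx] * kernel[ky + r][kx + r]
            out[y][x] = acc
    return out


def threshold(img, t):
    """Step 3: pixels darker than t become ink (1), the rest background (0)."""
    return [[1 if v < t else 0 for v in row] for row in img]


def _window_values(b, y, x, r):
    """Values in the (2r+1) x (2r+1) window around (y, x), clipped to the image."""
    h, w = len(b), len(b[0])
    return [b[yy][xx]
            for yy in range(max(0, y - r), min(h, y + r + 1))
            for xx in range(max(0, x - r), min(w, x + r + 1))]


def dilate(b, r):
    """A pixel becomes ink if any pixel in its square window is ink."""
    h, w = len(b), len(b[0])
    return [[1 if any(_window_values(b, y, x, r)) else 0 for x in range(w)] for y in range(h)]


def erode(b, r):
    """A pixel stays ink only if every pixel in its square window is ink."""
    h, w = len(b), len(b[0])
    return [[1 if all(_window_values(b, y, x, r)) else 0 for x in range(w)] for y in range(h)]


def close_gaps(b, r=1):
    """Step 4: morphological closing (dilate, then erode) fills gaps up to about 2r pixels."""
    return erode(dilate(b, r), r)


def clean(rgb, t, r=1):
    """Green channel -> 5x5 Gaussian blur -> threshold -> closing."""
    return close_gaps(threshold(blur(green_channel(rgb), gaussian_kernel()), t), r)


# Usage: a horizontal stroke with a one-pixel gap at column 4.
gappy = [[0] * 9 for _ in range(5)]
for x in (2, 3, 5, 6):
    gappy[2][x] = 1
print(close_gaps(gappy, 1)[2])  # [0, 0, 1, 1, 1, 1, 1, 0, 0]
```

Closing rather than plain dilation is used so that strokes are not left permanently thickened after their gaps are filled.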
Blur (5x5 Gaussian):
Thresholded image (I used a threshold of ~93 in GIMP):
You can see that the gaps in the middle 6 and 9 have disappeared. Unfortunately, I couldn't get the gap in the left 3 to go away; it's simply too large. These are the problems causing it:
- The line along the top of the image is much darker than some parts of the 3. If you use a threshold strict enough to remove the line, a gap will be created in the 3. If you could remove that line some other way (e.g. by more aggressive cropping), the thresholding result would be much better as far as the 3 is concerned.
- Also, the middle 2 and 6 are running together. Heavy thresholding is required to prevent them from merging into a single blob.
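The trade-off in the second point can be seen on a toy example. This plain-Python sketch (hypothetical names; a grayscale image as a list of rows, dark ink on a light background) thresholds two dark blobs joined by a fainter bridge and counts 8-connected components: a lenient threshold merges them into one blob, a stricter one separates them. The flip side, as with the 3 above, is that a stricter threshold can also open gaps in faint strokes.

```python
from collections import deque


def threshold(img, t):
    """Pixels darker than t become ink (1), everything else background (0)."""
    return [[1 if v < t else 0 for v in row] for row in img]


def count_components(b):
    """Count 8-connected groups of ink pixels using breadth-first search."""
    h, w = len(b), len(b[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if b[y][x] and not seen[y][x]:
                count += 1
                seen[y][x] = True
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = cy + dy, cx + dx
                            if 0 <= ny < h and 0 <= nx < w and b[ny][nx] and not seen[ny][nx]:
                                seen[ny][nx] = True
                                queue.append((ny, nx))
    return count


# Two dark blobs (10) joined by a fainter bridge (80) on a light background (200).
gray = [
    [200, 200, 200, 200, 200, 200, 200],
    [200, 10, 10, 80, 10, 10, 200],
    [200, 10, 10, 80, 10, 10, 200],
    [200, 200, 200, 200, 200, 200, 200],
]

print(count_components(threshold(gray, 100)))  # lenient threshold: 1 (merged)
print(count_components(threshold(gray, 50)))   # strict threshold: 2 (separated)
```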
Let's do it step by step in Mathematica:

    (* first separate the image into HSB channels *)
    i1 = ColorSeparate[ColorNegate@yourColorImage, "HSB"]
    (* keep the B channel *)
    i2 = i1[[3]]
    (* and binarize it *)
    i3 = Binarize[i2, 0.92]
    (* perform a thinning to get the skeleton *)
    i4 = Thinning[i3]
    (* now we cut those hairs *)
    i5 = Pruning[i4, 10]
    (* remove the small lines *)
    i6 = DeleteSmallComponents[i5, 30]
    (* and finally dilate *)
    i7 = Dilation[i6, 3]
    (* now we can perform an OCR *)
    TextRecognize@i7
    (* --> "93 269 23" *)
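Thinning is the least obvious step here. Mathematica's Thinning (and MATLAB's bwmorph(..., 'thin', Inf) below) do not necessarily use this exact algorithm, but the classic Zhang-Suen method, swapped in here for illustration, shows the idea: repeatedly peel off boundary pixels whose removal does not break connectivity until a one-pixel-wide skeleton remains. This is a plain-Python sketch that assumes a binary image as a list of rows of 0/1, with pixels outside the image treated as 0.

```python
def zhang_suen_thin(img):
    """Zhang-Suen thinning of a 0/1 image; returns a new one-pixel-wide skeleton."""
    b = [row[:] for row in img]
    h, w = len(b), len(b[0])

    def px(y, x):
        return b[y][x] if 0 <= y < h and 0 <= x < w else 0

    def neighbours(y, x):
        # P2..P9: clockwise, starting from the pixel directly above (y, x)
        return [px(y - 1, x), px(y - 1, x + 1), px(y, x + 1), px(y + 1, x + 1),
                px(y + 1, x), px(y + 1, x - 1), px(y, x - 1), px(y - 1, x - 1)]

    changed = True
    while changed:
        changed = False
        for step in (0, 1):  # two sub-iterations with mirrored conditions
            to_delete = []
            for y in range(h):
                for x in range(w):
                    if not b[y][x]:
                        continue
                    n = neighbours(y, x)
                    p2, p3, p4, p5, p6, p7, p8, p9 = n
                    count = sum(n)  # B(P1): number of ink neighbours
                    # A(P1): number of 0 -> 1 transitions around the ring P2..P9, P2
                    transitions = sum(1 for a, c in zip(n, n[1:] + n[:1]) if a == 0 and c == 1)
                    if step == 0:
                        cond = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        cond = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if 2 <= count <= 6 and transitions == 1 and cond:
                        to_delete.append((y, x))
            for y, x in to_delete:  # delete after scanning, so each pass sees one snapshot
                b[y][x] = 0
            if to_delete:
                changed = True
    return b


# Usage: a bar three pixels thick thins down to a single row.
bar = [[0] * 9 for _ in range(5)]
for y in (1, 2, 3):
    for x in range(1, 8):
        bar[y][x] = 1
thin = zhang_suen_thin(bar)
print(thin[2])  # [0, 0, 1, 1, 1, 1, 0, 0, 0]
```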
Since this question is tagged MATLAB, I translated @belisarius's solution into MATLAB (I think it is superior to the currently accepted answer):

    % read image
    I = imread('http://i.stack.imgur.com/nGNGf.png');

    % complement it, and convert to HSV colorspace
    hsv = rgb2hsv(imcomplement(I));
    I1 = hsv(:,:,3);                 % work with V channel

    % binarize/threshold image
    % (newer releases recommend imbinarize(I1, 0.92) instead of im2bw)
    I2 = im2bw(I1, 0.92);

    % perform morphological thinning to get the skeleton
    I3 = bwmorph(I2, 'thin', Inf);

    % prune the skeleton (remove small branches at the endpoints)
    I4 = bwmorph(I3, 'spur', 7);

    % remove small components
    I5 = bwareaopen(I4, 30);

    % dilate image
    I6 = imdilate(I5, strel('square', 2*3+1));

    % show step-by-step results
    figure('Position', [200 150 700 700])
    subplot(711), imshow(I)
    subplot(712), imshow(I1)
    subplot(713), imshow(I2)
    subplot(714), imshow(I3)
    subplot(715), imshow(I4)
    subplot(716), imshow(I5)
    subplot(717), imshow(I6)
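For the 'spur' step, here is a simplified plain-Python sketch of the idea rather than a reimplementation of bwmorph: n times in a row, delete every ink pixel that has exactly one 8-connected neighbour (a line endpoint). Short side branches disappear, but note that this simplified version also trims n pixels from both ends of every open stroke. The binary image is assumed to be a list of rows of 0/1, and the names are hypothetical.

```python
def count_neighbours(b, y, x):
    """Number of ink pixels among the 8 neighbours of (y, x)."""
    h, w = len(b), len(b[0])
    return sum(b[yy][xx]
               for yy in range(max(0, y - 1), min(h, y + 2))
               for xx in range(max(0, x - 1), min(w, x + 2))
               if (yy, xx) != (y, x))


def prune_spurs(skel, n):
    """Simplified spur removal: n times, delete all current line endpoints."""
    b = [row[:] for row in skel]
    h, w = len(b), len(b[0])
    for _ in range(n):
        ends = [(y, x) for y in range(h) for x in range(w)
                if b[y][x] and count_neighbours(b, y, x) == 1]
        if not ends:
            break
        for y, x in ends:  # delete all endpoints found in this pass at once
            b[y][x] = 0
    return b


# A horizontal stroke with a short vertical side branch on top.
skel = [
    [0, 0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1, 1],
]
pruned = prune_spurs(skel, 2)
for row in pruned:
    print(row)
```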
Finally, you can apply some form of OCR to recognize the numbers. Unfortunately, at the time of writing MATLAB had no built-in function equivalent to Mathematica's TextRecognize (newer releases of the Computer Vision Toolbox include an ocr function). In the meantime, look on the File Exchange; I'm sure you will find dozens of submissions filling the gap :)
I think there are two things you could aim to do to make them more detectable:
- Remove patches smaller than a certain number of pixels (this would remove the spots between the sets of digits)
- The numbers should be "closed" (continuous) shapes, so you need an algorithm to detect the pixels (at the top of each number) that should be changed to black in order to close the number shapes.
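The first suggestion is essentially what bwareaopen does in the MATLAB answer above. As a plain-Python sketch (hypothetical names; a binary image as a list of rows of 0/1), label 8-connected components with a breadth-first search and keep only those with at least min_size pixels. For the second suggestion, a morphological closing (dilation followed by erosion) is the usual tool for filling small gaps.

```python
from collections import deque


def remove_small_components(b, min_size):
    """Keep only 8-connected ink components with at least min_size pixels."""
    h, w = len(b), len(b[0])
    seen = [[False] * w for _ in range(h)]
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if b[y][x] and not seen[y][x]:
                # Breadth-first search collects the whole component containing (y, x).
                seen[y][x] = True
                queue, pixels = deque([(y, x)]), []
                while queue:
                    cy, cx = queue.popleft()
                    pixels.append((cy, cx))
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = cy + dy, cx + dx
                            if 0 <= ny < h and 0 <= nx < w and b[ny][nx] and not seen[ny][nx]:
                                seen[ny][nx] = True
                                queue.append((ny, nx))
                if len(pixels) >= min_size:
                    for py, px in pixels:
                        out[py][px] = 1
    return out


# Usage: a 4-pixel blob, a 1-pixel speck and a 3-pixel stroke; drop anything under 3 pixels.
spots = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
]
for row in remove_small_components(spots, 3):
    print(row)
```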
You also have linear features that are part of the noise, which could be detected through edge or line detection.
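A proper line detector (for example a Hough transform) is more robust, but for a near-horizontal line like the one along the top of the example image, a crude run-length filter, swapped in here as a simpler substitute, may be enough. This plain-Python sketch (hypothetical names; a binary image as rows of 0/1) clears any horizontal run of ink at least min_len pixels long. min_len must be longer than the widest digit stroke, and pixels where a digit touches the line are cleared along with it.

```python
def remove_long_horizontal_runs(b, min_len):
    """Clear every horizontal run of ink pixels that is at least min_len long."""
    out = [row[:] for row in b]
    for y, row in enumerate(b):
        x, w = 0, len(row)
        while x < w:
            if not row[x]:
                x += 1
                continue
            start = x
            while x < w and row[x]:  # walk to the end of this run
                x += 1
            if x - start >= min_len:
                for xx in range(start, x):
                    out[y][xx] = 0
    return out


# Row 0 is a long "noise" line; rows 1-3 contain a short digit-like stroke.
img = [
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 1, 1, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0, 0],
]
for row in remove_long_horizontal_runs(img, 5):
    print(row)
```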
Detecting contiguous "zones" (connected components) and calculating characteristics such as compactness or the length/height ratio might also help identify which structures to keep.
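As a sketch of that idea, the plain-Python code below labels 8-connected components and computes a few simple shape measures per component: area, bounding-box height and width, aspect ratio (height / width) and extent (area / bounding-box area, used here as a simple stand-in for compactness). The filter keeps only components that are at least as tall as they are wide, on the assumption that digits are upright while noise lines are long and flat. The names and the limits are hypothetical and would need tuning on real images.

```python
from collections import deque


def components(b):
    """Return a list of 8-connected ink components, each a list of (y, x) pixels."""
    h, w = len(b), len(b[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for y in range(h):
        for x in range(w):
            if b[y][x] and not seen[y][x]:
                seen[y][x] = True
                queue, pixels = deque([(y, x)]), []
                while queue:
                    cy, cx = queue.popleft()
                    pixels.append((cy, cx))
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = cy + dy, cx + dx
                            if 0 <= ny < h and 0 <= nx < w and b[ny][nx] and not seen[ny][nx]:
                                seen[ny][nx] = True
                                queue.append((ny, nx))
                comps.append(pixels)
    return comps


def shape_features(pixels):
    """Area, bounding-box size, aspect ratio (h / w) and extent of one component."""
    ys = [y for y, _ in pixels]
    xs = [x for _, x in pixels]
    height = max(ys) - min(ys) + 1
    width = max(xs) - min(xs) + 1
    area = len(pixels)
    return {"area": area, "height": height, "width": width,
            "aspect": height / width, "extent": area / (height * width)}


def keep_digit_like(b, min_aspect=1.0, max_aspect=4.0):
    """Keep components whose aspect ratio falls in [min_aspect, max_aspect]."""
    h, w = len(b), len(b[0])
    out = [[0] * w for _ in range(h)]
    for pixels in components(b):
        f = shape_features(pixels)
        if min_aspect <= f["aspect"] <= max_aspect:
            for y, x in pixels:
                out[y][x] = 1
    return out


# Usage: an upright 3x2 blob (digit-like) and a flat 1x6 line (noise-like).
scene = [
    [1, 1, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 1, 1, 1],
]
for pixels in components(scene):
    print(shape_features(pixels))
for row in keep_digit_like(scene):
    print(row)
```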