I am programming a face recognition program using OpenCV.
When generating the eigenfaces:
- do I need to use a big database of unknown faces?
- do I need to use only photos of the people I want my system to recognize?
- do I need to use both?
I am talking about the eigenfaces generation, that is, the "learning" step.
And how many photos do I need to get decent accuracy? More like 20, or 2000?
Eigenfaces works by projecting faces onto a particular "face basis" computed with principal component analysis (PCA). The images used to build that basis do not have to include photos of the people you want to recognize.
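To make the idea concrete, here is a minimal sketch of building a face basis with PCA and projecting a face that was not part of the basis set. It uses plain NumPy rather than OpenCV, and the random arrays are only a synthetic stand-in for real, flattened grayscale face images; the function names `eigenface_basis` and `project` are made up for this illustration.

```python
import numpy as np

def eigenface_basis(faces, n_components):
    """faces: (n_images, n_pixels) array, one flattened grayscale image per row."""
    mean_face = faces.mean(axis=0)
    centered = faces - mean_face
    # Rows of vt are the principal directions (eigenfaces), sorted by variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean_face, vt[:n_components]

def project(face, mean_face, basis):
    """Coefficients of one flattened face in the eigenface basis."""
    return basis @ (face - mean_face)

rng = np.random.default_rng(0)
generic_faces = rng.random((200, 32 * 32))  # synthetic stand-in for a generic face database
new_face = rng.random(32 * 32)              # a face that is NOT in the basis set

mean_face, basis = eigenface_basis(generic_faces, n_components=20)
coeffs = project(new_face, mean_face, basis)
print(coeffs.shape)
```

The point of the sketch is that `new_face` never appears in `generic_faces`, yet it still gets a coefficient vector, and recognition then compares such vectors.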
Instead, I would encourage you to train on a big database (at least 10k faces) that is well registered (eigenfaces doesn't work well with images that are shifted). The original paper by Turk and Pentland was remarkable partly due to the large pin-registered face database they released. I would also try to normalize the lighting so that it matches between the database and your test inputs.
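One simple way to approximate the lighting normalization is to shift every image to zero mean and scale it to unit variance before PCA. This is only a sketch of one possible choice (histogram equalization, e.g. OpenCV's `cv2.equalizeHist`, is another), and `normalize_lighting` is a hypothetical helper name; the image here is random synthetic data.

```python
import numpy as np

def normalize_lighting(image):
    """Shift to zero mean and scale to unit variance (flat images stay at zero)."""
    image = np.asarray(image, dtype=np.float64)
    centered = image - image.mean()
    std = centered.std()
    return centered / std if std > 0 else centered

rng = np.random.default_rng(1)
face = rng.random((32, 32))    # synthetic stand-in for a grayscale face image
brighter = 1.5 * face + 0.2    # same image with more contrast and a brightness offset

a = normalize_lighting(face)
b = normalize_lighting(brighter)
print(np.allclose(a, b))
```

This removes global brightness and contrast differences only; it does nothing about shadows or the direction of the light, so consistent capture conditions still matter.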
In terms of testing, the first 20 components should be sufficient to reconstruct a human-recognizable face, and the first 100 components should be enough to discriminate between any two faces in an essentially arbitrarily large dataset.
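The sketch below shows how the number of components enters both steps: reconstruction from the first k components, and identification by nearest neighbour in the k-dimensional coefficient space. It is plain NumPy on random synthetic data, so the numbers it prints say nothing about real faces; `reconstruction_error` and `identify` are illustrative names, and nearest-neighbour matching is an assumed (though common) choice of classifier.

```python
import numpy as np

rng = np.random.default_rng(2)
n_pixels = 24 * 24
generic_faces = rng.random((300, n_pixels))  # synthetic stand-in for a generic database
gallery = rng.random((5, n_pixels))          # synthetic stand-in for the known people
probe = gallery[3] + 0.01 * rng.standard_normal(n_pixels)  # noisy new photo of person 3

mean_face = generic_faces.mean(axis=0)
_, _, vt = np.linalg.svd(generic_faces - mean_face, full_matrices=False)

def reconstruction_error(face, k):
    """Distance between a face and its rebuild from the first k components."""
    basis = vt[:k]
    coeffs = basis @ (face - mean_face)
    rebuilt = mean_face + basis.T @ coeffs
    return float(np.linalg.norm(face - rebuilt))

def identify(probe, gallery, k):
    """Index of the gallery face closest to the probe in coefficient space."""
    basis = vt[:k]
    gallery_coeffs = (gallery - mean_face) @ basis.T
    probe_coeffs = basis @ (probe - mean_face)
    distances = np.linalg.norm(gallery_coeffs - probe_coeffs, axis=1)
    return int(np.argmin(distances))

err_20 = reconstruction_error(probe, 20)
err_100 = reconstruction_error(probe, 100)
match = identify(probe, gallery, k=100)
print(err_20, err_100, match)
```

Because the bases are nested, the error with 100 components can never exceed the error with 20; how small it gets on real faces depends on how well the images are registered.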
You don't need too many random faces to compose a human face; somewhere close to 20 should give good results, though use more if you can. They should all be aligned with one another as closely as possible, front-facing, in grayscale, and taken under the same lighting conditions.