Business card parser: how to extract fields from the text recognized on a business card?

I have developed an iPhone app with an OCR scanning feature. Using the Tesseract API, I get the text from the captured image. Now I need to classify each piece of text as a name, address, email, phone number, etc. Because business cards have no fixed structure or format, it is difficult to make assumptions about the layout.

However, a few things can be assumed: 1) a string containing "@" is most likely an email address; 2) a string of digits with parentheses or a + sign is most likely a phone number. Beyond that, there are still many possibilities.
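To make the two assumptions above concrete, here is a minimal sketch in Python (chosen only for brevity; the same patterns should port to NSRegularExpression on iOS). The names `EMAIL_RE`, `PHONE_RE`, and `classify_lines` are illustrative, and the patterns are rough guesses rather than a complete validator, so anything they miss simply lands in the "other" bucket.

```python
import re

# Heuristic 1 (assumed pattern): a token containing "@" followed by a dotted domain.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

# Heuristic 2 (assumed pattern): an optional "+" or "(", then at least seven
# characters of digits, spaces, parentheses, dots, or dashes, ending in a digit.
PHONE_RE = re.compile(r"\+?\(?\d[\d\s().-]{5,}\d")


def classify_lines(ocr_text):
    """Sort OCR output lines into emails, phones, and unclassified text."""
    result = {"emails": [], "phones": [], "other": []}
    for raw in ocr_text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines produced by the OCR engine
        emails = EMAIL_RE.findall(line)
        if emails:
            result["emails"].extend(emails)
            continue
        phones = PHONE_RE.findall(line)
        if phones:
            result["phones"].extend(phones)
            continue
        # Names, titles, company names, and addresses end up here.
        result["other"].append(line)
    return result


if __name__ == "__main__":
    card = "Jane Doe\nAcme Corp\njane.doe@example.com\nTel: +1 (555) 123-4567\n123 Main Street\n"
    print(classify_lines(card))
```

The leftover "other" lines are the hard part: separating a name from a company or an address needs more than pattern matching.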


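You will need the help of the NSLinguisticTagger class. Its name-type tagging scheme can mark tokens as personal, place, or organization names, which covers the fields that simple patterns cannot. This is your best bet; otherwise you will have to write similar logic for each field, as you described above.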

You can check the logic we used in this JavaScript BCR library, which is also based on Tesseract (the JavaScript port).

