Optical Character Recognition, or simply OCR, enables us to read text from scanned or bitmap images. It is a handy way to copy text without retyping it. I know there is plenty of OCR software available, but I just want to show you how to do it on your own.
We will be using Microsoft Office Document Imaging (MODI), which comes with Microsoft Office up to the 2007 release. The MODI library is not installed by default, so you have to add it using the Microsoft Office setup package. See the screenshot below:
Select Add or Remove Features.
Click on Office Tools and select Microsoft Office Document Imaging. Once installed, the library is available as a COM reference for your .NET projects. In Visual Studio, choose Add Reference, select the COM tab, and find Microsoft Office Document Imaging Library as shown:
That's it and now we're ready to code. You only need a few lines of code to use MODI.
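// Requires a COM reference to the Microsoft Office Document Imaging Library.
MODI.Document doc = new MODI.Document();
// Verbatim string (@"...") so the backslashes are not read as escape sequences.
doc.Create(@"D:\ScanFiles\Sample.jpg");
doc.OCR(MODI.MiLANGUAGES.miLANG_ENGLISH, true, true);
string strText = "";
foreach (MODI.Image image in doc.Images)
{
    // Append each page's text instead of overwriting the previous page.
    strText += image.Layout.Text;
}
// Close the document without saving changes.
doc.Close(false);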
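If you want something a bit more reusable, the few lines above can be wrapped in a small helper. This is only a rough sketch: the class name OcrSample and the method ReadText are names I made up for illustration and are not part of MODI, and it assumes the same COM reference and sample path used above. It collects the text of every page and closes the document even if the OCR step throws.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

class OcrSample
{
    // Hypothetical helper (not part of MODI): returns the recognized
    // text of every page in the given image file.
    static string ReadText(string imagePath)
    {
        MODI.Document doc = new MODI.Document();
        doc.Create(imagePath);

        StringBuilder text = new StringBuilder();
        try
        {
            // Same call as in the snippet above: English, with
            // auto-orientation and image straightening enabled.
            doc.OCR(MODI.MiLANGUAGES.miLANG_ENGLISH, true, true);

            foreach (MODI.Image image in doc.Images)
            {
                // One page per image; keep the pages on separate lines.
                text.AppendLine(image.Layout.Text);
            }
        }
        finally
        {
            // Close without saving and release the COM object,
            // whether or not the OCR step succeeded.
            doc.Close(false);
            Marshal.ReleaseComObject(doc);
        }
        return text.ToString();
    }

    static void Main()
    {
        // Sample path from this post; change it to your own scanned file.
        Console.WriteLine(ReadText(@"D:\ScanFiles\Sample.jpg"));
    }
}
```

Any error from the OCR call is left to propagate to the caller here, so you can decide for yourself how to handle an image that cannot be read.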
There you go: you now have your own OCR software. For any questions, hit the comments. Thank you for reading. Happy coding!


