Today I am sharing something useful: you can extract text from an image (JPG, JPEG, BMP, TIFF, GIF) and convert it into editable Word, text, Excel, PDF or HTML output. The converted documents aim to look just like the original, including tables, columns and graphics.
Optical character recognition (OCR) is a process for converting scanned images of printed or handwritten text into machine-readable text. OCR software works by analyzing a document and comparing it with fonts stored in its database and/or by noting features typical of characters. Some OCR software also runs the result through a spell checker to “guess” unrecognized words. 100% accuracy is difficult to achieve, but a close approximation is what most software strives for.
Optical Character Recognition, usually abbreviated to OCR, is the mechanical or electronic conversion of scanned or photographed images of typewritten or printed text into machine-encoded, computer-readable text. It is widely used as a form of data entry from some sort of original paper data source, whether passport documents, invoices, bank statements, receipts, business cards, mail, or any number of printed records. It is a common method of digitizing printed texts so that they can be electronically edited, searched, stored more compactly, displayed online, and used in machine processes such as machine translation, text-to-speech, key data extraction and text mining. OCR is a field of research in pattern recognition, artificial intelligence and computer vision.
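To make the two ideas above a little more concrete (comparing a character against stored fonts, then letting a spell checker “guess” a doubtful word), here is a minimal Python sketch. It is only a toy illustration, not how any of the tools in this post are actually implemented: the 3x3 `TEMPLATES` “font database”, the function names and the small vocabulary are all made up for the example.

```python
from difflib import get_close_matches

# Hypothetical 3x3 "font database": each glyph is 9 pixels, 1 = ink, 0 = paper.
TEMPLATES = {
    "I": (0, 1, 0,
          0, 1, 0,
          0, 1, 0),
    "L": (1, 0, 0,
          1, 0, 0,
          1, 1, 1),
    "T": (1, 1, 1,
          0, 1, 0,
          0, 1, 0),
}


def match_glyph(pixels):
    """Return the template character whose pixels differ least from the input."""
    def distance(ch):
        # Count the positions where the scanned glyph and the template disagree.
        return sum(a != b for a, b in zip(pixels, TEMPLATES[ch]))
    return min(TEMPLATES, key=distance)


def spell_fix(word, vocabulary):
    """Replace a word with its closest vocabulary entry, if one is close enough."""
    matches = get_close_matches(word, vocabulary, n=1, cutoff=0.6)
    return matches[0] if matches else word


# A "T" with one pixel missing from its stem, as a noisy scan might produce.
noisy_t = (1, 1, 1,
           0, 1, 0,
           0, 0, 0)
print(match_glyph(noisy_t))
print(spell_fix("rec0gnition", ["recognition", "resolution", "scanner"]))
```

Real OCR engines work on much larger images and use far more robust features, but the overall shape (match shapes first, then clean up words) is the same idea described above.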
This website offers this extraction free of cost:
http://www.onlineocr.net
SOFTWARE
OCR Using Microsoft OneNote 2007
For the occasional basic OCR stuff, MS OneNote’s optical character recognition feature is a timesaver. You might have missed it… it’s called Copy Text from Picture.
- Drag a scan or a saved picture into OneNote. You can also use OneNote to clip part of the screen or an image into OneNote.
- Right click on the inserted picture and select Copy Text from Picture. The copied optically recognized text goes into the clipboard and you can now paste it into any program like Word or Notepad.
OCR Using Microsoft Office Document Imaging
Another little-used tool within the Microsoft family. It’s right there under Menu – Microsoft Office – Microsoft Office Tools – Microsoft Office Document Imaging.
- Open the file in Microsoft Office Document Imaging – File – Open.
- Click the little eye icon – Recognize Text Using OCR.
- Click on the MS Word icon – Send Text to Word.
- An MS Word file opens with the editable converted text.
- Alternatively, you can also use MS Paint to select a specific area and copy it to the clipboard. Open MS Office Document Imaging – select Page – Paste Page to copy the selection for OCR.
So, now let’s leave the Microsoft family behind and look at three free tools which call themselves OCR software…
SimpleOCR
The difficulty I was having with handwriting recognition using MS tools could have found a solution in SimpleOCR. But the software offers handwriting recognition only as a 14-day free trial. Machine print recognition, though, does not have any restrictions.
- The software can be set up to read directly from a scanner or by adding a page (JPG, TIFF, BMP formats).
- SimpleOCR offers some control over the conversion through text selection, image selection and text ignore features.
- Conversion to text takes the process into a validation stage; a user can correct discrepancies in the converted text using an in-built spell-checker.
- The converted file can be saved in DOC or TXT format.
SimpleOCR (v3.1) is a 9MB download and is compatible with Windows.
TopOCR
Just what I was talking about in the beginning! TopOCR, in a breakaway from typical OCR software, is designed more for digital cameras (at least 3MP) and mobile phones along with scanners. Like SimpleOCR, it has a two-window interface – the source image window and the text window.
- The software supports JPEG, TIFF, GIF and BMP formats.
- Image settings like brightness, color, contrast, despeckle, sharpen etc. can be used to improve readability of the image.
- Camera filter settings can also be configured for enhancing the image.
- The converted file can be saved in a variety of formats – PDF, RTF, HTML and TXT.
- TopOCR functions well with text in straight lines, but the usual failing of OCR with multi-column text remains.
- The software, though, parses a mixed page (text plus graphics) well and processes the text only.
- The software works with 11 languages.
TopOCR (v3.1) is an 8MB download and is compatible with Windows (not tested on Vista).
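Why do image settings like contrast matter so much for OCR? Here is a rough Python sketch of what a contrast adjustment does in principle, working on a plain list of grayscale values (0 = black, 255 = white). This is an illustration only; it is not TopOCR’s code, and the function names and the threshold of 128 are my own assumptions.

```python
def stretch_contrast(pixels):
    """Linearly rescale grayscale values so they span the full 0-255 range."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        # A completely flat image has no contrast to stretch.
        return list(pixels)
    return [round((p - lo) * 255 / (hi - lo)) for p in pixels]


def binarize(pixels, threshold=128):
    """Turn every pixel into pure black (0) or pure white (255)."""
    return [0 if p < threshold else 255 for p in pixels]


# A washed-out scan: "ink" at 100 and 110, "paper" at 140.
faded = [100, 110, 140]
print(binarize(stretch_contrast(faded)))
```

On the faded values, the ink and the paper sit close together; after stretching they are pushed to opposite ends of the range, so a simple threshold can separate them cleanly.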
FreeOCR
This free OCR software uses the Tesseract OCR engine. Tesseract OCR code was developed at HP Labs between 1985 and 1995 and is currently with Google. It is thought of as one of the most accurate open-source OCR engines available. FreeOCR is a simple Windows interface for that underlying code.
- It supports most image files and multi-page TIFF files.
- It can handle PDF formats and is also compatible with TWAIN devices like scanners.
- FreeOCR also has the familiar double window interface with easy to understand settings.
- Before starting the one click conversion process, you can adjust the image contrast for better readability.
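Since FreeOCR is a front end for Tesseract, you can also call the engine yourself if you have the `tesseract` command-line program installed. The Python sketch below assumes exactly that; the function names are made up for this post, and `scan.png` in the usage comment is just a placeholder file name. It is not how FreeOCR itself talks to the engine.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, lang="eng"):
    """Build the argument list for the tesseract command-line tool.

    Passing 'stdout' as the output base makes tesseract print the
    recognized text instead of writing it to a file.
    """
    return ["tesseract", image_path, "stdout", "-l", lang]


def ocr_image(image_path, lang="eng"):
    """Run tesseract on one image and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract is not installed or not on PATH")
    result = subprocess.run(
        build_tesseract_command(image_path, lang),
        capture_output=True,  # collect the text instead of printing it
        text=True,            # decode the output as a string
        check=True,           # raise an error if tesseract fails
    )
    return result.stdout


# Usage (needs tesseract and a real image file):
# print(ocr_image("scan.png"))
```

The command is built in its own small function so it can be inspected or changed (for example, to a different language code) without actually running the engine.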
Free OCR tools come with their own limitations. And scanning a page has a lot to do with resolution, contrast and clarity of fonts. From an average user’s standpoint, 100% OCR accuracy remains a pipe dream.
Though the free tools were adequate with printed text, they failed with normal cursive handwritten text. My personal preference for offhand OCR use leans towards the two Microsoft products I mentioned in the beginning.
Your own say matters. Which is your tool of choice? Does the free OCR software recognize what you throw at it? And more importantly, do you recognize what it throws back at you? Let us know.