OCR: Why it's our Friend
By Don Herion
Do you know what the acronym OCR stands for? Is it:
Online Computer Reviews
Oil Conservation Route
Owl Care Room
Outer Core Reactor
Onion Cutting Routine
Optical Character Recognition
If you chose Optical Character Recognition, you can read the rest of this
article. If you chose Owl Care Room or one of the others, you can
still read this article, but maybe Web design is not for you.
OCR: Optical Character Recognition. Software designed to accurately convert
scanned text documents into computer-editable text. OCR technology uses algorithms to
translate combinations of dots in a bitmap into a recognized character.
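To make "translating combinations of dots into a character" a little more concrete, here is a minimal Python sketch of template matching, one of the simplest ways to do it. It is a toy under stated assumptions, not how TypeReader or any commercial product works: the 5x3 glyph templates and the function names are made up for illustration.

```python
# Hypothetical 5x3 glyph templates ("#" = ink, "." = blank paper).
# Real OCR engines use far richer models; this only illustrates the idea.
TEMPLATES = {
    "I": ["###",
          ".#.",
          ".#.",
          ".#.",
          "###"],
    "L": ["#..",
          "#..",
          "#..",
          "#..",
          "###"],
    "T": ["###",
          ".#.",
          ".#.",
          ".#.",
          ".#."],
}


def count_matching_dots(glyph, template):
    """Count the positions where the scanned glyph and the template agree."""
    return sum(
        g == t
        for glyph_row, template_row in zip(glyph, template)
        for g, t in zip(glyph_row, template_row)
    )


def recognize(glyph):
    """Return the template character that agrees with the glyph on the most dots."""
    return max(TEMPLATES, key=lambda ch: count_matching_dots(glyph, TEMPLATES[ch]))


# A scanned "T" with one stray dot of scanner noise in the third row.
noisy_t = ["###",
           ".#.",
           ".##",
           ".#.",
           ".#."]

print(recognize(noisy_t))  # prints "T": 14 of 15 dots agree with the T template
```

Notice that the "I" and "T" templates differ by only two dots. Characters whose bitmaps are that close are exactly the ones a noisy scan can push over the line into the wrong answer.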
OCR is a technology that has been around for years. In the beginning, the accuracy
claims made by software companies were somewhat overstated (i.e., very exaggerated).
Users often had to spend hours rechecking converted documents because the OCR
software saw an "i" and turned it into a "1". Alternatively,
it would convert italicized text into something only a CIA operative with
a decoder ring could decipher. Fortunately, most of these software glitches have
now been resolved. At least that's what the makers of OCR technology would
have us believe. Most claim an accuracy rate of 99%.
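It helps to put that 99% figure into numbers. The short Python sketch below assumes the claim means 99% of individual characters are recognized correctly (vendors may measure it differently), and it uses made-up figures of about 2,000 characters per page and a 20-page chapter.

```python
def expected_errors(char_count, accuracy):
    """Expected number of misrecognized characters at a given per-character accuracy."""
    return char_count * (1 - accuracy)


# Assumed figures for illustration only; neither comes from a vendor or the book.
chars_per_page = 2000
pages = 20

per_page = expected_errors(chars_per_page, 0.99)
per_chapter = expected_errors(chars_per_page * pages, 0.99)

print(round(per_page))     # about 20 errors on each page
print(round(per_chapter))  # about 400 errors across the chapter
```

Under those assumptions, even a "99% accurate" program still leaves a proofreader with a few hundred corrections to track down.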
A Real World Project
Recently I received permission to place a sample book chapter on our iBoost Web
site. Peachpit Press graciously gave us the OK to put Chapter 10 of HTML
4 for the World Wide Web by Elizabeth Castro online. It's an excellent book,
and I highly recommend it.
[Screenshot of TypeReader 6]
Unfortunately for us, they could only give us the book in FrameMaker format. Not having the
program, I needed an alternative. My choice was to OCR a hard copy of the chapter.
But OCR software is not cheap. OmniPage Pro 10 runs about $500. TypeReader Pro
6.0 from Expervision costs almost $300. Both claim to be the best, boasting 99
percent accuracy and the power to recognize foreign languages, export the results
to HTML and make a great cup of coffee, all while I sit back and try to decipher
Dennis Miller's latest obscure reference on Monday Night Football.
After a very, very careful review of both programs, examining all the pluses and minuses,
and expending hours of serious meditation (i.e., playing Game Boy), I chose TypeReader
6.0. The deciding factors were Expervision's
accuracy claims, its support of over 2,600 fonts and, most importantly, the availability of
a free 30-day trial version. I downloaded the 15 MB program and installed it
without a hitch.