Typical scanned text
Typical photographed text
What’s so difficult about reading text from a photograph?
Text in a photograph is much harder for a computer
to read than text on a cleanly scanned page.
The examples here show how photographed text typically suffers from
poor resolution and contrast as well as compression artefacts.
These effects make it much harder to separate the characters
from one another to read the text.
In a scanned page, furthermore, the lines of text are
straight and horizontal or almost horizontal,
the fonts are consistent and of reasonable size,
and the images themselves do not
suffer from the distortion introduced by a camera lens, by perspective,
or by uneven lighting.
In a typical image from a digital camera, however, text occurs in a wide range
of fonts and at all orientations. The baseline of a row of text is
typically far from straight, and the images are full of the distortions
introduced by the lens of the camera and by perspective.
Lighting is frequently poor. Our system copes with all these variables.
Photographic images are often full of distractions. Although a scanned page may include
an image or other non-text items, these often fit tidily with the flow of the
text and are relatively easy to find and ignore.
Photographs, however, can be full of details that, to a computer at least,
look superficially like text; brickwork and foliage cause particular problems.
Distinguishing these details from genuine text
is well beyond the capabilities of most OCR systems.