Restoration of archival documents using a wavelet technique
Post on 06-Nov-2016
Restoration of Archival DocumentsUsing a Wavelet Technique
Chew Lim Tan, Member, IEEE,Ruini Cao, and Peiyi Shen
AbstractThis paper addresses a problem of restoring handwritten archivaldocuments by recovering their contents from the interfering handwriting on thereverse side caused by the seeping of ink. We present a novel method that worksby first matching both sides of a document such that the interfering strokes aremapped with the corresponding strokes originating from the reverse side. Thisfacilitates the identification of the foreground and interfering strokes. A waveletreconstruction process then iteratively enhances the foreground strokes andsmears the interfering strokes so as to strengthen the discriminating capability ofan improved Canny edge detector against the interfering strokes. The method hasbeen shown to restore the documents effectively with average precision and recallrates for foreground text extraction at 84 percent and 96 percent, respectively.
Index TermsDocument image analysis, wavelet enhancement, waveletsmearing, Canny edge detector, text extraction, image segmentation, bleed-through, show-through, noise cancellation, denoising.
AN important task in document image analysis is text segmenta-tion , . However, this paper introduces a rather differentproblem of text segmentation, that is, how to extract clear textstrings from interfering images originating from the reverse side.The motivation of this research arises from a request from theNational Archives of Singapore to restore the appearance of theirhandwritten archival documents that have been kept over longperiods of time during which the seeping of ink has resulted indouble images as shown in Fig. 1. Given this problem, we have toseparate three classes of objects: the foreground text, interferingstrokes from the reverse side, and the background. Usually, theforeground writing appears darker than the interfering strokes.However, there are cases where the foreground and interferingstrokes have similar intensities, or worse still, the interferingstrokes are more prominent than the foreground.
Many segmentation and binarization approaches have beenreported in the literature , , , , , . These methods aimto extract clear text from either noisy or textured background.However, they deal with one-sided documents and most methodsbasically assume separable gray scale and/or distinctive featuresbetween the foreground and background.
Similar work can be seen in solving the show-through problemin scanning duplex printed documents. Dons work segments thedouble-sided images based on the isolated gray-scale range ofinterfering images and the noise characteristics . Sharma , develops a unique scanning model and an adaptive linear-filteringscheme for removal of show-through using both sides of thedocument. Our problem, however, is different from show-throughin that the interfering strokes are the result of bleed-through dueto anisotropic absorption and spreading in the paper. As such,corresponding images on both sides may not be completely matchedlike in the show-through situation.
Related work is seen in the denoising techniques presented by
Donoho ,  based on thresholding and shrinking empirical
wavelet coefficients for recovering and/or denoising signals.
Berkner et al.  propose a wavelet-based approach to sharpening
and smoothing of images for use in deblurring or denoising of
images. Our problem again differs from these works here in that the
interfering strokes are not really noise but rather distinctive images
in their own right that we want to distinguish and remove.
Finally, Lu et al. , Lu  presents a similar wavelet method
by decreasing the edge contrast and smearing the direct components
of the edges with its neighboring pixels. Though his edge-based
wavelet image preprocessing method can handle the change of
feature coefficients (local maxima) , , , it is still
inadequate for our present problem: 1) Due to the bleed-through
problem discussed earlier, we find different edge positions between
the interfering and original strokes on either side. 2) As a result, any
mismatch between the interfering strokes observed on the front and
their original strokes on the reverse side will result in a mistaken
identity of interfering strokes as foreground edges.In view of the above, we propose a new method which differs
from others in the following aspects. 1) We develop an improvedCanny edge detector to suppress unwanted interfering strokes, . 2) We process the image by using wavelet enhancing andsmearing operations to work on the foreground and interferingstrokes, respectively. 3) We adopt a set of wavelet enhancementand smearing coefficients in different scales instead of thetraditional local maxima reconstruction method.
2 PROPOSED METHOD
2.1 Image Matching and Overlay
It is natural that weak foreground strokes may not necessarily seepinto the reverse side (Fig. 1d). On the other hand, interferingstrokes must have been originated from strong foreground strokeson the reverse side. Thus, we match both images from either sideof a page by hand. Let fx; y denote the k bits per pixel gray-scalefront images, and bx; y the reverse side image of the same page,where x and y represent the row and the column, respectively. Thetwo images have the same dimension M N . An overlayoperation is carried out as follows:
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 24, NO. 10, OCTOBER 2002 1399
Fig. 1. Sample images: (a) sample1, front side, (b) sample1, reverse side,
(c) sample2, front side, and (d) sample2, reverse side.
. C.L. Tan is with the School of Computing, National University ofSingapore, 3 Science Drive 2, Singapore, 117543.E-mail: firstname.lastname@example.org.
. R. Cao is with Hotcard Technology Pte Ltd., 2 Jurong East Street 21, #05-30 IMM Building., Singapore 609601. E-mail: email@example.com.
. P. Shen is with the Communication Solution Group, Agilent Technologies,Yishun Ave. 7, No. 1, Singapore, 768923.E-mail: firstname.lastname@example.org.
Manuscript received 27 Dec. 2000; revised 17 Aug. 2001; accepted 10 May 2002.Recommended for acceptance by L. Vincent.For information on obtaining reprints of this article, please send e-mail to:email@example.com, and reference IEEECS Log Number 113366.
0162-8828/02/$17.00 2002 IEEE
1. Invert the reverse side image:
invertbx; y 2k 1 bx; y: 1
2. Flip the inverted reverse side image and superimpose it onthe front image such that corresponding strokes on eitherside are matched to effect an image subtraction:
ax; y flipinvertbx; y fx; y; 2where flip means flipping the image horizontallyresulting in its mirror image:
flipbx; y bx;N y: 3
3. Scale the resultant image:
cx; y curve ax; y minax; ymaxax; y minax; y 2
where curve is a nonlinear transform that shrinks lowintensity values to make the subtracted strokes in theoverlay operation less prominent.
y curvex 2k 12k 12 x x
Fig. 2 shows the results of overlay and nonlinear curve processing.
2.2 Enhancement and Smearing Features
Comparing Fig. 1 and Fig. 2, we could see that the overlaypreprocessing has weakened most of the interfering strokes. Thoughsome of the foreground strokes have also been affected, the majorportion of the foreground strokes remains intact. These foregroundstrokes, though somewhat impaired, now serve as seeds to start thefollowing enhancement and smearing processes. The idea now is todetect the foreground strokes on the front and enhance them usingwavelets. The detected and binarized strokes from the foregroundoverlay image form the enhancement feature image. Similarly, wedetect the foreground strokes on the reverse side to locate theircorresponding interfering strokes on the front so as to smear theseinterfering strokes using wavelets. The detected and binarizedstrokes from the reverse side overlay image result in the smearing
feature image. Figs. 3a and 3b show the detected enhancement/
smearing features of sample 1 and sample 2, respectively. Iterative
enhancement and smearing processes are then carried out on the
original front image using the enhancement and smearing feature
images to identify candidate strokes. The same process could be done
ontothereversesidewherethereversesidewillbetreatedasthefront.Let Ex; y be the enhancement feature image and Sx; y be the
smearing feature image. Both have the same dimension M N asthe front image fx; y and the reverse side image bx; y. Theenhancement and smearing features may be described as follows:
Ex; y 255; detected stroke on fx; y0; otherwise;
1400 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 24, NO. 10, OCTOBER 2002
Fig. 2. Overlay results of images in Fig. 1: (a) sample1, front side, (b) sample1,
reverse side, (c) sample2, front side, and (d) sample2, reverse side.
Fig. 3. (a) Enhancement/smearing features of Fig.1a, (b) enhancement/smearingfeatures of Fig.1c, (Note: Multichannel in (a) and (b): green, original background;red: enhancement features; blue: smearing features); (c) enhanced/smearedimage of Fig. 1a; (d) enhanced/smeared image of Fig. 1c, (e) segmentation result of(c), (f) segmentation result of (d), (g) final binarized image of (e), and (h) finalbinarized image of (f).
Sx; y 255; detected stroke on bx; y0; otherwise:
To detect the foreground strokes on either side so as to determine
Ex; y and Sx; y, an improved Canny edge detection algorithmwith an orientation filter and orientation constraint ,  is first
adopted to pick up the edges of the foreground strokes based on the
observation that the interfering strokes are not as sharp as the
foreground strokes. The orientation filter favors foreground strokes
that are predominantly slanting at a particular angle against
interfering strokes slanting at a different angle. The orientation
constraint helps minimize erroneous edges caused by nearby
interfering strokes. The resultant edges then serve as loci to recover
streaks of gray-level images of a predetermined width from the
foreground overlay images. The recovered streaks are binarized
using an adaptive Niblacks threshold  to giveEx; y andSx; y,respectively.
2.3 Iterative Wavelet Reconstruction
The wavelet decomposition of fx; y is written as follows ,Cjfm;n < fx; y;j;m;nx; y >m;n2Z2D1j fm;n < fx; y;1j;m;nx; y >m;n2Z2D2j fm;n < fx; y;2j;m;nx; y >m;n2Z2D3j fm;n < fx; y;3j;m;nx; y >m;n2Z2 ;
where Cjfm;n is the wavelet approximation coefficient andDkjfm;n k 1; 2; 3 are the wavelet detail coefficients at scale j ofthe wavelet decomposition. With the image wavelet representation
Wfx; y, the enhancement feature Ex; y and smearing featureSx; y, the iterative wavelet reconstruction may be described asfollows:
1. The multiscale decomposition of the foreground. Unlikethe traditional wavelet decomposition, the resultant imagein each scale and iteration retains the same size as theoriginal foreground image.
Wfx; y fCJ x; y; Dkj x; y; j 0; . . . ; J; k 1; 2; 3g: 8
2. Basically, wavelet details Dkj are related to the highfrequency components in the edges which are enhancedand smeared by the enhancement coefficient ekj andsmearing coefficient skj , respectively, according toEx; y and Sx; y as below:
Dof if Ex; y 255 Dkj x; y ekjDkj x; y;if Sx; y 255 Dkj x; y skjDkj x; y;
gforj 0; . . . ; J ; k 1; 2; 3;x 1; . . . ;M; y 1; . . . ; N;9
where ekj and skj fekj > 1; 1 > skj > 0; j 0; . . . ; J; k 1; 2; 3; g
are set empirically. The enhanced/smeared image f 0x; y isreconstructed from the modified coefficients.
f 0x; y inverse wavelet transformfCJ x; y; Dkj x; y; j 0; . . . ; J; k 1; 2; 3g:
3. To the reconstructed image f 0x; y, apply the wavelettransform again. Note that, in obtaining the inverse of thewavelet transform, the revised Dkj obtained in (9) are usedagain.
Wf 0x; y fC0J x; y; D0kj x; y; j 0; . . . ; J; k 1; 2; 3g 11
f 0x; y inverse wavelet transformfC0J x; y; Dkj x; y; j 0; . . . ; J; k 1; 2; 3g:
4. Iteratively, process the wavelet decomposition and recon-struction using (11) and (12), we could get the finalenhanced/smeared gray-scale image. The iteration progres-sively strengthens or weakens the strokes in the recon-structed image through the gradual changes in the edge .
5. Clip the final enhanced/smeared image using the followingfunction:
f 0 0x; y 0; f 0x; y 0255; f 0x; y 255f 0x; y; otherwise: