Restoration of archival documents using a wavelet technique

Download Restoration of archival documents using a wavelet technique

Post on 06-Nov-2016

215 views

Category:

Documents

0 download

TRANSCRIPT

  • Restoration of Archival DocumentsUsing a Wavelet Technique

    Chew Lim Tan, Member, IEEE,Ruini Cao, and Peiyi Shen

    AbstractThis paper addresses a problem of restoring handwritten archivaldocuments by recovering their contents from the interfering handwriting on thereverse side caused by the seeping of ink. We present a novel method that worksby first matching both sides of a document such that the interfering strokes aremapped with the corresponding strokes originating from the reverse side. Thisfacilitates the identification of the foreground and interfering strokes. A waveletreconstruction process then iteratively enhances the foreground strokes andsmears the interfering strokes so as to strengthen the discriminating capability ofan improved Canny edge detector against the interfering strokes. The method hasbeen shown to restore the documents effectively with average precision and recallrates for foreground text extraction at 84 percent and 96 percent, respectively.

    Index TermsDocument image analysis, wavelet enhancement, waveletsmearing, Canny edge detector, text extraction, image segmentation, bleed-through, show-through, noise cancellation, denoising.

    1 INTRODUCTION

    AN important task in document image analysis is text segmenta-tion [1], [2]. However, this paper introduces a rather differentproblem of text segmentation, that is, how to extract clear textstrings from interfering images originating from the reverse side.The motivation of this research arises from a request from theNational Archives of Singapore to restore the appearance of theirhandwritten archival documents that have been kept over longperiods of time during which the seeping of ink has resulted indouble images as shown in Fig. 1. Given this problem, we have toseparate three classes of objects: the foreground text, interferingstrokes from the reverse side, and the background. Usually, theforeground writing appears darker than the interfering strokes.However, there are cases where the foreground and interferingstrokes have similar intensities, or worse still, the interferingstrokes are more prominent than the foreground.

    Many segmentation and binarization approaches have beenreported in the literature [2], [3], [4], [5], [6], [7]. These methods aimto extract clear text from either noisy or textured background.However, they deal with one-sided documents and most methodsbasically assume separable gray scale and/or distinctive featuresbetween the foreground and background.

    Similar work can be seen in solving the show-through problemin scanning duplex printed documents. Dons work segments thedouble-sided images based on the isolated gray-scale range ofinterfering images and the noise characteristics [8]. Sharma [9], [10]develops a unique scanning model and an adaptive linear-filteringscheme for removal of show-through using both sides of thedocument. Our problem, however, is different from show-throughin that the interfering strokes are the result of bleed-through dueto anisotropic absorption and spreading in the paper. As such,corresponding images on both sides may not be completely matchedlike in the show-through situation.

    Related work is seen in the denoising techniques presented by

    Donoho [11], [12] based on thresholding and shrinking empirical

    wavelet coefficients for recovering and/or denoising signals.

    Berkner et al. [13] propose a wavelet-based approach to sharpening

    and smoothing of images for use in deblurring or denoising of

    images. Our problem again differs from these works here in that the

    interfering strokes are not really noise but rather distinctive images

    in their own right that we want to distinguish and remove.

    Finally, Lu et al. [14], Lu [15] presents a similar wavelet method

    by decreasing the edge contrast and smearing the direct components

    of the edges with its neighboring pixels. Though his edge-based

    wavelet image preprocessing method can handle the change of

    feature coefficients (local maxima) [16], [17], [18], it is still

    inadequate for our present problem: 1) Due to the bleed-through

    problem discussed earlier, we find different edge positions between

    the interfering and original strokes on either side. 2) As a result, any

    mismatch between the interfering strokes observed on the front and

    their original strokes on the reverse side will result in a mistaken

    identity of interfering strokes as foreground edges.In view of the above, we propose a new method which differs

    from others in the following aspects. 1) We develop an improvedCanny edge detector to suppress unwanted interfering strokes[19], [20]. 2) We process the image by using wavelet enhancing andsmearing operations to work on the foreground and interferingstrokes, respectively. 3) We adopt a set of wavelet enhancementand smearing coefficients in different scales instead of thetraditional local maxima reconstruction method.

    2 PROPOSED METHOD

    2.1 Image Matching and Overlay

    It is natural that weak foreground strokes may not necessarily seepinto the reverse side (Fig. 1d). On the other hand, interferingstrokes must have been originated from strong foreground strokeson the reverse side. Thus, we match both images from either sideof a page by hand. Let fx; y denote the k bits per pixel gray-scalefront images, and bx; y the reverse side image of the same page,where x and y represent the row and the column, respectively. Thetwo images have the same dimension M N . An overlayoperation is carried out as follows:

    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 24, NO. 10, OCTOBER 2002 1399

    Fig. 1. Sample images: (a) sample1, front side, (b) sample1, reverse side,

    (c) sample2, front side, and (d) sample2, reverse side.

    . C.L. Tan is with the School of Computing, National University ofSingapore, 3 Science Drive 2, Singapore, 117543.E-mail: tancl@comp.nus.edu.sg.

    . R. Cao is with Hotcard Technology Pte Ltd., 2 Jurong East Street 21, #05-30 IMM Building., Singapore 609601. E-mail: caorn@hotcardtech.com.

    . P. Shen is with the Communication Solution Group, Agilent Technologies,Yishun Ave. 7, No. 1, Singapore, 768923.E-mail: pei-yi_shen@aglient.com.

    Manuscript received 27 Dec. 2000; revised 17 Aug. 2001; accepted 10 May 2002.Recommended for acceptance by L. Vincent.For information on obtaining reprints of this article, please send e-mail to:tpami@computer.org, and reference IEEECS Log Number 113366.

    0162-8828/02/$17.00 2002 IEEE

  • 1. Invert the reverse side image:

    invertbx; y 2k 1 bx; y: 1

    2. Flip the inverted reverse side image and superimpose it onthe front image such that corresponding strokes on eitherside are matched to effect an image subtraction:

    ax; y flipinvertbx; y fx; y; 2where flip means flipping the image horizontallyresulting in its mirror image:

    flipbx; y bx;N y: 3

    3. Scale the resultant image:

    cx; y curve ax; y minax; ymaxax; y minax; y 2

    k 1

    ; 4

    where curve is a nonlinear transform that shrinks lowintensity values to make the subtracted strokes in theoverlay operation less prominent.

    y curvex 2k 12k 12 x x

    q: 5

    Fig. 2 shows the results of overlay and nonlinear curve processing.

    2.2 Enhancement and Smearing Features

    Comparing Fig. 1 and Fig. 2, we could see that the overlaypreprocessing has weakened most of the interfering strokes. Thoughsome of the foreground strokes have also been affected, the majorportion of the foreground strokes remains intact. These foregroundstrokes, though somewhat impaired, now serve as seeds to start thefollowing enhancement and smearing processes. The idea now is todetect the foreground strokes on the front and enhance them usingwavelets. The detected and binarized strokes from the foregroundoverlay image form the enhancement feature image. Similarly, wedetect the foreground strokes on the reverse side to locate theircorresponding interfering strokes on the front so as to smear theseinterfering strokes using wavelets. The detected and binarizedstrokes from the reverse side overlay image result in the smearing

    feature image. Figs. 3a and 3b show the detected enhancement/

    smearing features of sample 1 and sample 2, respectively. Iterative

    enhancement and smearing processes are then carried out on the

    original front image using the enhancement and smearing feature

    images to identify candidate strokes. The same process could be done

    ontothereversesidewherethereversesidewillbetreatedasthefront.Let Ex; y be the enhancement feature image and Sx; y be the

    smearing feature image. Both have the same dimension M N asthe front image fx; y and the reverse side image bx; y. Theenhancement and smearing features may be described as follows:

    Ex; y 255; detected stroke on fx; y0; otherwise;

    1400 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 24, NO. 10, OCTOBER 2002

    Fig. 2. Overlay results of images in Fig. 1: (a) sample1, front side, (b) sample1,

    reverse side, (c) sample2, front side, and (d) sample2, reverse side.

    Fig. 3. (a) Enhancement/smearing features of Fig.1a, (b) enhancement/smearingfeatures of Fig.1c, (Note: Multichannel in (a) and (b): green, original background;red: enhancement features; blue: smearing features); (c) enhanced/smearedimage of Fig. 1a; (d) enhanced/smeared image of Fig. 1c, (e) segmentation result of(c), (f) segmentation result of (d), (g) final binarized image of (e), and (h) finalbinarized image of (f).

  • Sx; y 255; detected stroke on bx; y0; otherwise:

    6

    To detect the foreground strokes on either side so as to determine

    Ex; y and Sx; y, an improved Canny edge detection algorithmwith an orientation filter and orientation constraint [19], [20] is first

    adopted to pick up the edges of the foreground strokes based on the

    observation that the interfering strokes are not as sharp as the

    foreground strokes. The orientation filter favors foreground strokes

    that are predominantly slanting at a particular angle against

    interfering strokes slanting at a different angle. The orientation

    constraint helps minimize erroneous edges caused by nearby

    interfering strokes. The resultant edges then serve as loci to recover

    streaks of gray-level images of a predetermined width from the

    foreground overlay images. The recovered streaks are binarized

    using an adaptive Niblacks threshold [21] to giveEx; y andSx; y,respectively.

    2.3 Iterative Wavelet Reconstruction

    The wavelet decomposition of fx; y is written as follows [22],Cjfm;n < fx; y;j;m;nx; y >m;n2Z2D1j fm;n < fx; y;1j;m;nx; y >m;n2Z2D2j fm;n < fx; y;2j;m;nx; y >m;n2Z2D3j fm;n < fx; y;3j;m;nx; y >m;n2Z2 ;

    8>>>>>:

    7

    where Cjfm;n is the wavelet approximation coefficient andDkjfm;n k 1; 2; 3 are the wavelet detail coefficients at scale j ofthe wavelet decomposition. With the image wavelet representation

    Wfx; y, the enhancement feature Ex; y and smearing featureSx; y, the iterative wavelet reconstruction may be described asfollows:

    1. The multiscale decomposition of the foreground. Unlikethe traditional wavelet decomposition, the resultant imagein each scale and iteration retains the same size as theoriginal foreground image.

    Wfx; y fCJ x; y; Dkj x; y; j 0; . . . ; J; k 1; 2; 3g: 8

    2. Basically, wavelet details Dkj are related to the highfrequency components in the edges which are enhancedand smeared by the enhancement coefficient ekj andsmearing coefficient skj , respectively, according toEx; y and Sx; y as below:

    Dof if Ex; y 255 Dkj x; y ekjDkj x; y;if Sx; y 255 Dkj x; y skjDkj x; y;

    gforj 0; . . . ; J ; k 1; 2; 3;x 1; . . . ;M; y 1; . . . ; N;9

    where ekj and skj fekj > 1; 1 > skj > 0; j 0; . . . ; J; k 1; 2; 3; g

    are set empirically. The enhanced/smeared image f 0x; y isreconstructed from the modified coefficients.

    f 0x; y inverse wavelet transformfCJ x; y; Dkj x; y; j 0; . . . ; J; k 1; 2; 3g:

    10

    3. To the reconstructed image f 0x; y, apply the wavelettransform again. Note that, in obtaining the inverse of thewavelet transform, the revised Dkj obtained in (9) are usedagain.

    Wf 0x; y fC0J x; y; D0kj x; y; j 0; . . . ; J; k 1; 2; 3g 11

    f 0x; y inverse wavelet transformfC0J x; y; Dkj x; y; j 0; . . . ; J; k 1; 2; 3g:

    12

    4. Iteratively, process the wavelet decomposition and recon-struction using (11) and (12), we could get the finalenhanced/smeared gray-scale image. The iteration progres-sively strengthens or weakens the strokes in the recon-structed image through the gradual changes in the edge [16].

    5. Clip the final enhanced/smeared image using the followingfunction:

    f 0 0x; y 0; f 0x; y 0255; f 0x; y 255f 0x; y; otherwise:

    8

Recommended

View more >