restoration of archival documents using a wavelet technique

Download Restoration of archival documents using a wavelet technique

Post on 06-Nov-2016

215 views

Category:

Documents

0 download

Embed Size (px)

TRANSCRIPT

  • Restoration of Archival DocumentsUsing a Wavelet Technique

    Chew Lim Tan, Member, IEEE,Ruini Cao, and Peiyi Shen

    AbstractThis paper addresses a problem of restoring handwritten archivaldocuments by recovering their contents from the interfering handwriting on thereverse side caused by the seeping of ink. We present a novel method that worksby first matching both sides of a document such that the interfering strokes aremapped with the corresponding strokes originating from the reverse side. Thisfacilitates the identification of the foreground and interfering strokes. A waveletreconstruction process then iteratively enhances the foreground strokes andsmears the interfering strokes so as to strengthen the discriminating capability ofan improved Canny edge detector against the interfering strokes. The method hasbeen shown to restore the documents effectively with average precision and recallrates for foreground text extraction at 84 percent and 96 percent, respectively.

    Index TermsDocument image analysis, wavelet enhancement, waveletsmearing, Canny edge detector, text extraction, image segmentation, bleed-through, show-through, noise cancellation, denoising.

    1 INTRODUCTION

    AN important task in document image analysis is text segmenta-tion [1], [2]. However, this paper introduces a rather differentproblem of text segmentation, that is, how to extract clear textstrings from interfering images originating from the reverse side.The motivation of this research arises from a request from theNational Archives of Singapore to restore the appearance of theirhandwritten archival documents that have been kept over longperiods of time during which the seeping of ink has resulted indouble images as shown in Fig. 1. Given this problem, we have toseparate three classes of objects: the foreground text, interferingstrokes from the reverse side, and the background. Usually, theforeground writing appears darker than the interfering strokes.However, there are cases where the foreground and interferingstrokes have similar intensities, or worse still, the interferingstrokes are more prominent than the foreground.

    Many segmentation and binarization approaches have beenreported in the literature [2], [3], [4], [5], [6], [7]. These methods aimto extract clear text from either noisy or textured background.However, they deal with one-sided documents and most methodsbasically assume separable gray scale and/or distinctive featuresbetween the foreground and background.

    Similar work can be seen in solving the show-through problemin scanning duplex printed documents. Dons work segments thedouble-sided images based on the isolated gray-scale range ofinterfering images and the noise characteristics [8]. Sharma [9], [10]develops a unique scanning model and an adaptive linear-filteringscheme for removal of show-through using both sides of thedocument. Our problem, however, is different from show-throughin that the interfering strokes are the result of bleed-through dueto anisotropic absorption and spreading in the paper. As such,corresponding images on both sides may not be completely matchedlike in the show-through situation.

    Related work is seen in the denoising techniques presented by

    Donoho [11], [12] based on thresholding and shrinking empirical

    wavelet coefficients for recovering and/or denoising signals.

    Berkner et al. [13] propose a wavelet-based approach to sharpening

    and smoothing of images for use in deblurring or denoising of

    images. Our problem again differs from these works here in that the

    interfering strokes are not really noise but rather distinctive images

    in their own right that we want to distinguish and remove.

    Finally, Lu et al. [14], Lu [15] presents a similar wavelet method

    by decreasing the edge contrast and smearing the direct components

    of the edges with its neighboring pixels. Though his edge-based

    wavelet image preprocessing method can handle the change of

    feature coefficients (local maxima) [16], [17], [18], it is still

    inadequate for our present problem: 1) Due to the bleed-through

    problem discussed earlier, we find different edge positions between

    the interfering and original strokes on either side. 2) As a result, any

    mismatch between the interfering strokes observed on the front and

    their original strokes on the reverse side will result in a mistaken

    identity of interfering strokes as foreground edges.In view of the above, we propose a new method which differs

    from others in the following aspects. 1) We develop an improvedCanny edge detector to suppress unwanted interfering strokes[19], [20]. 2) We process the image by using wavelet enhancing andsmearing operations to work on the foreground and interferingstrokes, respectively. 3) We adopt a set of wavelet enhancementand smearing coefficients in different scales instead of thetraditional local maxima reconstruction method.

    2 PROPOSED METHOD

    2.1 Image Matching and Overlay

    It is natural that weak foreground strokes may not necessarily seepinto the reverse side (Fig. 1d). On the other hand, interferingstrokes must have been originated from strong foreground strokeson the reverse side. Thus, we match both images from either sideof a page by hand. Let fx; y denote the k bits per pixel gray-scalefront images, and bx; y the reverse side image of the same page,where x and y represent the row and the column, respectively. Thetwo images have the same dimension M N . An overlayoperation is carried out as follows:

    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 24, NO. 10, OCTOBER 2002 1399

    Fig. 1. Sample images: (a) sample1, front side, (b) sample1, reverse side,

    (c) sample2, front side, and (d) sample2, reverse side.

    . C.L. Tan is with the School of Computing, National University ofSingapore, 3 Science Drive 2, Singapore, 117543.E-mail: tancl@comp.nus.edu.sg.

    . R. Cao is with Hotcard Technology Pte Ltd., 2 Jurong East Street 21, #05-30 IMM Building., Singapore 609601. E-mail: caorn@hotcardtech.com.

    . P. Shen is with the Communication Solution Group, Agilent Technologies,Yishun Ave. 7, No. 1, Singapore, 768923.E-mail: pei-yi_shen@aglient.com.

    Manuscript received 27 Dec. 2000; revised 17 Aug. 2001; accepted 10 May 2002.Recommended for acceptance by L. Vincent.For information on obtaining reprints of this article, please send e-mail to:tpami@computer.org, and reference IEEECS Log Number 113366.

    0162-8828/02/$17.00 2002 IEEE

  • 1. Invert the reverse side image:

    invertbx; y 2k 1 bx; y: 1

    2. Flip the inverted reverse side image and superimpose it onthe front image such that corresponding strokes on eitherside are matched to effect an image subtraction:

    ax; y flipinvertbx; y fx; y; 2where flip means flipping the image horizontallyresulting in its mirror image:

    flipbx; y bx;N y: 3

    3. Scale the resultant image:

    cx; y curve ax; y minax; ymaxax; y minax; y 2

    k 1

    ; 4

    where curve is a nonlinear transform that shrinks lowintensity values to make the subtracted strokes in theoverlay operation less prominent.

    y curvex 2k 12k 12 x x

    q: 5

    Fig. 2 shows the results of overlay and nonlinear curve processing.

    2.2 Enhancement and Smearing Features

    Comparing Fig. 1 and Fig. 2, we could see that the overlaypreprocessing has weakened most of the interfering strokes. Thoughsome of the foreground strokes have also been affected, the majorportion of the foreground strokes remains intact. These foregroundstrokes, though somewhat impaired, now serve as seeds to start thefollowing enhancement and smearing processes. The idea now is todetect the foreground strokes on the front and enhance them usingwavelets. The detected and binarized strokes from the foregroundoverlay image form the enhancement feature image. Similarly, wedetect the foreground strokes on the reverse side to locate theircorresponding interfering strokes on the front so as to smear theseinterfering strokes using wavelets. The detected and binarizedstrokes from the reverse side overlay image result in the smearing

    feature image. Figs. 3a and 3b show the detected enhancement/

    smearing features of sample 1 and sample 2, respectively. Iterative

    enhancement and smearing processes are then carried out on the

    original front image using the enhancement and smearing feature

    images to identify candidate strokes. The same process could be done

    ontothereversesidewherethereversesidewillbetreatedasthefront.Let Ex; y be the enhancement feature image and Sx; y be the

    smearing feature image. Both have the same dimension M N asthe front image fx; y and the reverse side image bx; y. Theenhancement and smearing features may be described as follows:

    Ex; y 255; detected stroke on fx; y0; otherwise;

    1400 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 24, NO. 10, OCTOBER 2002

    Fig. 2. Overlay results of images in Fig. 1: (a) sample1, front side, (b) sample1,

    reverse side, (c) sample2, front side, and (d) sample2, reverse side.

    Fig. 3. (a) Enhancement/smearing features of Fig.1a, (b) enhancement/smearingfeatures of Fig.1c, (Note: Multichannel in (a) and (b): green, original background;red: enhancement features; blue: smearing features); (c) enhanced/smearedimage of Fig. 1a; (d) enhanced/smeared image of Fig. 1c, (e) segmentation result of(c), (f) segmentation result of (d), (g) final binarized image of (e), and (h) finalbinarized image of (f).

  • Sx; y 255; detected stroke on bx; y0; otherwise:

    6

    To detect the foreground strokes on either side so as to determine

    Ex; y and Sx; y, an improved Canny edge detection algorithmwith an orientation filter and orientation constraint [19]

Recommended

View more >