
Microelectronic Engineering 86 (2009) 340–351

journal homepage: www.elsevier.com/locate/mee

A multi-resolution approach for line-edge roughness detection

Wei Sun a,1,2, Rajib Mukherjee a,2, Pieter Stroeve b, Ahmet Palazoglu b, Jose A. Romagnoli a,*

a Cain Department of Chemical Engineering, Louisiana State University, Baton Rouge, LA 70803-7303, United States
b Department of Chemical Engineering and Materials Science, University of California, Davis, CA 95616, United States

Article info

Article history:
Received 20 April 2007
Received in revised form 23 October 2008
Accepted 3 November 2008
Available online 14 November 2008

Keywords:
Scanning electron microscopy (SEM)
Wavelet decomposition
Line-edge roughness (LER)
Shannon’s entropy
k-means clustering
Flatness factor

* Corresponding author. Tel.: +1 225 578 1377; fax: +1 225 578 1476. E-mail address: [email protected] (J.A. Romagnoli).
1 On leave from Beijing University of Chemical Technology.
2 These authors contributed equally to this research.

0167-9317/$ - see front matter © 2008 Elsevier B.V. All rights reserved.
doi:10.1016/j.mee.2008.11.001

Abstract

A wavelet-based line-edge detection framework is presented that proves to be solely image-dependent. In this analysis, surfaces are considered as a combination of an underlying surface structure and a surface detail, corresponding to low-frequency and high-frequency features, respectively. Through the multi-scale analysis offered by wavelet decomposition, the underlying surface structure is extracted and used to define the line-edge searching region, which, in turn, helps characterize the line-edge roughness (LER), providing valuable information for the evaluation of device fabrication and performance. We focus on exploring the optimal wavelet decomposition, to better separate the underlying structure and the surface detail, using a number of metrics including Shannon’s entropy, k-means clustering and the flatness factor. The impact of different wavelet functions and resolution levels on line-edge roughness characterization is discussed. An SEM image of a plane diffraction grating is studied to demonstrate the application of the proposed framework.

© 2008 Elsevier B.V. All rights reserved.

1. Introduction

In semiconductor fabrication [1,2], as critical dimensions (CD) continue to shrink, lithography lines tend to lose their definition and become rougher, leading to significant yield losses and performance challenges. Accordingly, line-edge roughness (LER) has become a critical quantity that needs to be accurately measured and controlled to meet such quality objectives. A full understanding of LER is a starting point, which can provide important information about the fabrication process as well as the device performance [3–10]. Off-line LER characterization is usually based on a top-down scanning electron microscopy (SEM) image [11,12]. To characterize the LER from an image, a clear line-edge searching region needs to be defined first, followed by the extraction of the roughness information.

A 2D image surface can be considered as a combination of an underlying structure, comprising the substrate background and edges, and a high-frequency component [2]. The high-frequency component could result from fabrication errors or from artifacts of the imaging process [8,12], and it interferes with the visual observations required to identify the line-edge. Proper surface characterization by image analysis is therefore required to differentiate among image features and obtain the exact surface characteristics.

Surface characterization is usually done by Fourier transformation (FT), which generally yields erroneous or incomplete results due to its inherent limitations. FT assumes that all frequency components are present within the surface profile and that they are infinitely continuous. In reality, a surface may have discontinuities and abrupt changes with finite features. Furthermore, as FT is localized in frequency only, the spatial information is lost in the transformation, so FT cannot be used effectively for locating or detecting spatial surface features. Thus, traditional Fourier-based analyses fail to differentiate among image features with different spatial frequencies because they cannot provide spatial localization.

These limitations of FT are generally overcome by using the short-time Fourier transform (STFT) [13], in which the surface profile is represented in both space and frequency. In STFT, window functions of a specified length are used, which fixes a constant space and frequency resolution for the transformation. However, STFT has its own limitations: perfect space and frequency resolution cannot be achieved simultaneously. Short windows give good space resolution but poor frequency resolution, and long windows give good frequency resolution but poor space resolution [13].

These limitations of FT and STFT are overcome in wavelet theory. With wavelet decomposition, one can analyze spectral features of the surface. It also provides an effective technique for the analysis of surface profiles in which not all frequency components are present in a single layer [14]. Wavelet sub-images also allow certain frequency ranges to be isolated. Thus, one has the opportunity to study surface deposition effects independently of the initial surface, where deposited layers may have different frequency characteristics from the initial base layers. Wavelet analysis allows the deposited layers to be analyzed without the inherent masking caused by periodicities of the support layers. Wavelet decomposition uses basis functions with compact support and makes it possible to study a 2D image at different resolutions, not only quantifying the periodicity of surface features but also revealing their locations on the image surface [13–15]. In summary, wavelet decomposition can be used to efficiently separate the meaningful structural components from the random artifacts. As a consequence, based on the identified underlying surface structure, the edge region can be identified more accurately, without the arbitrariness of visual observation.

Fig. 1. Haar, Daubechies, Symlet and Coiflet wavelet families.

In wavelet decomposition, high-frequency components are removed sequentially as the decomposition level increases. At higher levels of decomposition, one obtains a smooth approximate image where only the dominant low-frequency features are retained. This smooth image should contain the most important background and edge information without the interference of the imaging artifacts and unimportant surface details. For this task, there are two key decisions. The first is the choice of the wavelet basis function and the second is the determination of the decomposition level.

In a previous study [16], we introduced the wavelet-based LER detection and characterization method, but did not elaborate on the choice of the wavelet basis function and the decomposition level. In this paper, we explore different wavelet functions and decomposition levels for separating the underlying structure and the edges, thereby defining the edge searching region. Our aim is to show that there is an optimal wavelet function and an optimal decomposition level, with optimality quantified by three different metrics. These metrics are derived from Shannon’s entropy, a cluster analysis and the flatness factor. After obtaining the optimal decomposition level, LER values are calculated at different threshold values. The results show that wavelet decomposition is a key tool in establishing a rational framework for line-edge detection and roughness characterization.

2. Wavelet-based LER detection and characterization

2.1. Wavelet transform on image analysis

Wavelet decomposition is widely used in time series and image analysis [13–15]. Its salient advantage over other analysis methods, especially the Fourier transform, is that it not only gives temporal (spatial) information about a signal but also localizes that information in the frequency domain. A 1D signal $f(t)$ can be decomposed using the wavelet transformation and a family of real orthonormal bases, $\psi_{j,k}(t)$. The bases are generated from a kernel function $\psi(t)$:

$$\psi_{j,k}(t) = 2^{-j/2}\,\psi(2^{-j}t - k) \tag{1}$$

where $j$ and $k$ are integers, representing the dilations and translations of the kernel function. The wavelet coefficients of the signal can be calculated by the inner product

$$a_{j,k} = \langle f(t), \psi_{j,k}(t) \rangle = \int f(t)\,\psi_{j,k}(t)\,dt \tag{2}$$

Another basis function, the scaling function $\phi(t)$, is needed in addition to the mother wavelet $\psi(t)$ to establish the multi-resolution characteristics of the wavelet decomposition. $\phi(t)$ can be expressed as a weighted sum of shifted $\phi(2t)$:

$$\phi(t) = \sqrt{2} \sum_{n} l(n)\,\phi(2t - n) \tag{3}$$

where the $l(n)$ are the coefficients of a low-pass filter in the fast discrete wavelet transform (DWT) calculation. The mother wavelet $\psi(t)$ and the scaling function $\phi(t)$ are related through the expression

$$\psi(t) = \sqrt{2} \sum_{n} h(n)\,\phi(2t - n) \tag{4}$$

where the $h(n)$ are coefficients that now correspond to a high-pass filter. The two filters are related through the expression

$$h(n) = (-1)^{n}\, l(1 - n) \tag{5}$$

While the mother wavelet $\psi(t)$ represents the detail (high-frequency) elements of the signal, the scaling function $\phi(t)$ captures the smoothed approximation (low-frequency) component of the signal. In the actual calculation, $l(n)$ and $h(n)$ are more commonly used to perform the transform, as

$$c_{j,k} = \sum_{n} c_{j-1,n}\, l(n - 2k), \qquad d_{j,k} = \sum_{n} c_{j-1,n}\, h(n - 2k) \qquad \text{for } j = 1, 2, \ldots, J. \tag{6}$$

Here, the coefficients $c_{0,n}$ represent the original signal. The coefficients $c_{j,k}$ correspond to the approximate (low-frequency) part of the decomposed signal at the $j$th level, and are usually called the approximation coefficients. The coefficients $d_{j,k}$ correspond to the detail (high-frequency) part of the decomposed signal at the $j$th level and are usually referred to as the wavelet coefficients.
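As an illustration of Eqs. (5) and (6), the following minimal sketch applies the filter-bank recursion with the Haar filter pair in plain Python. The choice of Python, the function names `dwt_step` and `dwt`, and the sample signal are assumptions made for this example only; they are not taken from the original study.

```python
import math

# Haar low-pass filter l(n); the high-pass filter follows Eq. (5): h(n) = (-1)^n l(1 - n).
LOW = [1 / math.sqrt(2), 1 / math.sqrt(2)]
HIGH = [(-1) ** n * LOW[1 - n] for n in range(2)]  # [1/sqrt(2), -1/sqrt(2)]


def dwt_step(c_prev):
    """One level of Eq. (6): c_{j,k} = sum_n c_{j-1,n} l(n - 2k), and likewise for d_{j,k}."""
    half = len(c_prev) // 2
    approx = [sum(c_prev[2 * k + m] * LOW[m] for m in range(2)) for k in range(half)]
    detail = [sum(c_prev[2 * k + m] * HIGH[m] for m in range(2)) for k in range(half)]
    return approx, detail


def dwt(signal, levels):
    """Repeat dwt_step on the approximation coefficients; details are kept per level."""
    approx, details = list(signal), []
    for _ in range(levels):
        approx, detail = dwt_step(approx)
        details.append(detail)
    return approx, details


# Hypothetical 8-sample signal (length must be divisible by 2**levels).
signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
c2, (d1, d2) = dwt(signal, levels=2)
```

Because the Haar basis is orthonormal, the sum of squares of `c2`, `d1` and `d2` equals that of `signal`, which is a convenient sanity check for any implementation of Eq. (6).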

Fig. 1 depicts the wavelet and scaling function shapes for the four wavelet families that are used in this study. For the Daubechies, Symlet and Coiflet families, each member is also designated by its zero-crossings (or vanishing moments). For example, db8 designates the Daubechies wavelet with 8 zero-crossings.

In moving from the 1D to the 2D wavelet transform, the rows and columns of the data matrix (x and y coordinates) that represent the image are treated as independent. Therefore, the 2D filters become the tensor products of their 1D counterparts [15–17]. A high-pass and a low-pass filter are applied to the image in the x-direction (across the rows of the matrix), and the results are down-sampled by deleting every other column. This results in two images of approximately half the size of the original, one containing the high-frequency components of the rows and the other containing the low-frequency components. These two images are then each filtered down the columns using high-pass and low-pass filters, and the results are down-sampled along the rows (deleting every other row). The resulting four images are approximately one-fourth the size of the original image. One of the sub-images represents the smoothed approximation (A), while the others represent the horizontal detail (HD), the vertical detail (VD) and the diagonal detail (DD) sub-images. The process can be iterated on the approximation sub-image to obtain the decomposition at the next level (Fig. 2), thereby removing details at a frequency level relatively lower than the first level.
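The row-then-column procedure described above can be sketched for a single level as follows. This is a minimal plain-Python illustration using the Haar filter pair; the function names and the test image are hypothetical. Which of the two single-direction outputs is labelled HD and which VD depends on the convention adopted, so neutral names are used here.

```python
import math

S = 1 / math.sqrt(2)  # Haar filter tap; the Haar wavelet is chosen here only for brevity


def split_rows(image):
    """Low-pass and high-pass filter every row, keeping one output per pixel pair."""
    low = [[(r[2 * k] + r[2 * k + 1]) * S for k in range(len(r) // 2)] for r in image]
    high = [[(r[2 * k] - r[2 * k + 1]) * S for k in range(len(r) // 2)] for r in image]
    return low, high


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def split_columns(image):
    """The same filtering applied down the columns: transpose, filter rows, transpose back."""
    low, high = split_rows(transpose(image))
    return transpose(low), transpose(high)


def dwt2_haar(image):
    """One 2D level: approximation, two single-direction details and the diagonal detail."""
    row_low, row_high = split_rows(image)
    approx, detail_col = split_columns(row_low)        # low along rows; low/high down columns
    detail_row, detail_diag = split_columns(row_high)  # high along rows; low/high down columns
    return approx, detail_col, detail_row, detail_diag


# Hypothetical 4 x 4 image whose intensity changes only from row to row (a horizontal edge).
image = [[1.0] * 4, [1.0] * 4, [5.0] * 4, [9.0] * 4]
approx, detail_col, detail_row, detail_diag = dwt2_haar(image)
```

For this image only `detail_col` is non-zero, since the intensity varies down the columns but not along the rows. Calling `dwt2_haar(approx)` again would give the next decomposition level.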

2.2. Wavelet-based LER detection

Usually the edge searching region is segmented out from the whole image by visual observation. Generally, this step includes all possible regions to avoid loss of information for edge detection. The high-frequency component is hard to differentiate from the edge surface by intensity value alone, which affects the line-edge detection and, in turn, the LER characterization result. In our method [16], when the original image is decomposed to the nth level, one smoothed (approximate) sub-image and 3n detail sub-images are obtained (Fig. 2). The smoothed sub-image reveals the underlying structure of the surface by eliminating the high-frequency details of the surface. By searching the smoothed image, background and edge regions can be defined without the interference of high-frequency components. The smoothed sub-image is used for finding the edge searching region. This is done by searching for the local minima in the direction perpendicular to the edge. The edge searching region is thus automatically and systematically segmented out from the whole image; its boundary is the line surrounding the edge along which the local minima, taken perpendicular to the edge, are found. This step corresponds to the decision associated with “the specification of image region for line-edge searching” in the conventional edge detection and characterization method [5], and does not require any visual observation. High-frequency sub-images represent the surface deviation of both edges and valleys. The edges and valleys without the high-frequency components can be used to define the background intensity and, further, to determine the intensity level or height value where the edge boundary is located.
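The local-minimum search can be sketched as follows for a single intensity profile taken perpendicular to the edges of the smoothed sub-image. This is only an illustrative sketch: the function names, the tie-breaking rule on plateaus and the sample profile are assumptions, not details given in the text.

```python
def local_minima(profile):
    """Indices whose value is no higher than the left neighbour and strictly lower than the right one."""
    return [
        i
        for i in range(1, len(profile) - 1)
        if profile[i] <= profile[i - 1] and profile[i] < profile[i + 1]
    ]


def search_regions(profile):
    """Consecutive minima bracket one bright line, i.e. one candidate edge searching region."""
    minima = local_minima(profile)
    return list(zip(minima, minima[1:]))


# Hypothetical smoothed intensity profile across two grating lines.
profile = [0.2, 0.1, 0.5, 0.9, 0.6, 0.15, 0.4, 0.8, 0.3, 0.05, 0.2]
minima = local_minima(profile)
regions = search_regions(profile)
```

Repeating the search for every profile along the edge direction traces the boundary lines of the searching region without visual inspection.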

Fig. 2. Decomposition strategy for a three-level DWT. For example, L3DD represents the level 3 diagonal detail.

Fig. 3. (a) Original image, (b) enhanced SEM image and (c) horizontally positioned SEM image showing the region selected for image processing.

For edge boundary detection, a suitable threshold value is required to separate adjacent edges and retain edge roughness information. Only the part of the edges above the threshold is considered a segment, while the rest is disregarded. The threshold value for segmentation affects the edge boundary detection and, in turn, the edge characterization. Conventionally, this threshold segmentation is also made visually, based on height or intensity information from an image, which makes LER characterization dependent on the choice of the segmentation threshold value. The approximation sub-image describes the underlying surface and gives an average intensity (or height) along the edge separating line. Detail sub-images contain all high-frequency components, which represent the deviation of intensity (or height). By combining the average of the surface approximation and the standard deviation of the surface detail, an image-dependent threshold value for segmentation is obtained. The wavelet-based method ensures that results are related only to the image itself and thus provides a systematic strategy for edge characterization. In order to compare the results with the line-edge searching regions obtained from different wavelet decompositions, the LER values at a number of fixed thresholds can also be calculated.

2.3. The choice of wavelet decomposition level and wavelet function

Fig. 4. The framework for choosing the optimal wavelet decomposition level and the most suitable wavelet function.

Table 1
Minimum entropy values at different wavelet decomposition levels for different wavelet functions (×1.000e+004).

| Level | Haar | db3 | db8 | sym4 | sym8 | coif1 | coif3 |
|---|---|---|---|---|---|---|---|
| 1st level | 6.3685 | 7.2629 | 7.2129 | 7.3803 | 7.4131 | 7.0063 | 7.4562 |
| 2nd level | 6.3106 | 7.2855 | 6.9863 | 7.0084 | 6.8174 | 6.9729 | 6.9848 |
| 3rd level | 5.9359 | 7.2260 | 6.8791 | 7.2500 | 6.7712 | 7.1787 | 6.9219 |
| 4th level | 5.8109 | 7.7394 | 7.3619 | 7.4626 | 7.0373 | 7.2715 | 7.2865 |
| 5th level | 4.8776 | 8.2945 | 7.1297 | 7.4398 | 7.1211 | 6.6188 | 7.3842 |
| 6th level | 6.2146 | 7.6588 | 8.0382 | 6.3945 | 6.8811 | 7.5765 | 6.8679 |

The edge region segmentation and threshold segmentation are essential for proper LER estimation. Segmentation based on wavelet analysis is efficient in the sense that it does not require any visual interpretation. In this study, we are interested in the approximation part of the wavelet decomposition since, with the removal of high-frequency components, it becomes easier to identify the separating line between paired edges and also between the edge and its background. Theoretically, the wavelet decomposition can be performed $n$ times, where $2^{n} \le m < 2^{n+1}$ and $m$ is the number of pixels of an image in both directions. However, once the image is decomposed to a certain level, the edges are totally smoothed out. One needs to find a suitable decomposition level that maintains a clear background and edge contour while also removing irrelevant higher-frequency components on the surface. Different wavelet basis functions have different waveforms, central frequencies and vanishing moments (Fig. 1) [15]. The decompositions performed by different wavelet functions separate the image into different sub-images in terms of frequency and spatial localization. In other words, each level captures features with different spatial frequencies based on the characteristics of the selected wavelet function. The natural questions are which wavelet function and what level of decomposition would yield an approximate sub-image with the maximum information regarding the key features, i.e., LER. This optimization of the information content of the image used for analysis can be quantified through a set of metrics, and here we focus on three such measures, namely the entropy analysis, the cluster analysis and the flatness factor analysis. These criteria help identify the appropriate wavelet function and decomposition level to be used for proper segmentation for line-edge detection and characterization.
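As a small worked example of the theoretical bound quoted above (2^n ≤ m < 2^(n+1)), the sketch below computes the largest admissible number of decompositions for an image side of m pixels. The function name is hypothetical and Python is used only for illustration.

```python
def max_decomposition_level(m):
    """Largest n with 2**n <= m < 2**(n + 1), for an image side of m pixels."""
    if m < 1:
        raise ValueError("m must be a positive integer")
    # bit_length() is the number of binary digits of m, i.e. floor(log2(m)) + 1.
    return m.bit_length() - 1


n_512 = max_decomposition_level(512)
n_500 = max_decomposition_level(500)
```

For a 512 × 512 image this gives n = 9, so the six levels examined in this study lie well within the theoretical limit; the practical limit is set instead by the level at which the edges are smoothed out.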

2.3.1. Shannon’s entropy

The concept of Shannon’s entropy plays a central role in information theory and is sometimes referred to as a measure of uncertainty [18]. It is defined as

$$H = -\sum_{i=1}^{n} p_i \log_2 p_i \tag{7}$$

where $n$ is the number of possible classes. The probability $p_i$ is calculated as the percentage of class $i$ pixels within the segment. In a perfectly classified and segmented image, the entropy of each segment is zero, since all the pixels in each segment are classified into the same class. The larger the number of different classes found within a segment, the higher the entropy. Shannon’s entropy provides a good measurement of the randomness of a given image. We calculate the entropy value of the approximation image at each wavelet decomposition level. In the beginning, the entropy value decreases due to the removal of the high-frequency components, which correspond to the random part of the original image. After a certain level of decomposition, the high-frequency components associated with the original image are smoothed out. On further decomposition, the artifacts of the wavelet basis function are added to the approximation image, which makes the entropy value increase again. For a particular wavelet basis function, the approximation image that gives the minimum entropy value can then be used as the basis to define the line-edge searching region. The image at this level contains enough information for background and edge separation and has no meaningful high-frequency components to interfere with the identification of the edge region.

Fig. 5. Clusters of approximation at each wavelet decomposition level by Haar wavelet.

Among different wavelet basis functions, the entropy values obtained at the optimal decomposition levels differ, which means that their information contents also differ. The more of the key underlying structure is retained, the better the definition of the line-edge searching region. In this sense, a higher entropy value indicates that more information is contained. Thus, the wavelet function with the higher entropy value at its optimal decomposition level, compared to other wavelet functions, is expected to perform better.
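The two-stage selection described above (minimum entropy across levels for each wavelet, then maximum of those minima across wavelets) can be sketched with the values of Table 1. The code is a plain-Python illustration; the function names are hypothetical, and `shannon_entropy` implements Eq. (7) on a list of class labels, which may differ from the entropy routine used to produce Table 1.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Eq. (7), written as sum(p_i * log2(1 / p_i)) over the classes present in `labels`."""
    total = len(labels)
    return sum((c / total) * math.log2(total / c) for c in Counter(labels).values())


# Entropy values from Table 1 (units of 1e4), levels 1 to 6 for each wavelet function.
TABLE_1 = {
    "Haar": [6.3685, 6.3106, 5.9359, 5.8109, 4.8776, 6.2146],
    "db3": [7.2629, 7.2855, 7.2260, 7.7394, 8.2945, 7.6588],
    "db8": [7.2129, 6.9863, 6.8791, 7.3619, 7.1297, 8.0382],
    "sym4": [7.3803, 7.0084, 7.2500, 7.4626, 7.4398, 6.3945],
    "sym8": [7.4131, 6.8174, 6.7712, 7.0373, 7.1211, 6.8811],
    "coif1": [7.0063, 6.9729, 7.1787, 7.2715, 6.6188, 7.5765],
    "coif3": [7.4562, 6.9848, 6.9219, 7.2865, 7.3842, 6.8679],
}


def optimal_levels(table):
    """For each wavelet, the (level, entropy) pair with the minimum entropy."""
    return {
        name: min(enumerate(values, start=1), key=lambda pair: pair[1])
        for name, values in table.items()
    }


best = optimal_levels(TABLE_1)
# Among the per-wavelet minima, pick the wavelet whose minimum entropy is highest.
chosen = max(best, key=lambda name: best[name][1])
```

With the Table 1 values, `best["db3"]` is level 3 with entropy 7.2260 and `chosen` is `"db3"`, while Haar has the lowest minimum (4.8776 at level 5).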

2.3.2. Clustering analysis

Clustering partitions a data set into several groups such that the similarity within a group is larger than that among groups [19,20]. The k-means clustering partitions $n$ vectors $x_j$ into $c$ groups. It finds a cluster center $c_i$ for each group such that an objective function of the distance between the vectors and the cluster center is minimized. The distance function for vector $x_k$ in cluster $i$ with cluster center $c_i$ is $d(x_k, c_i)$. The objective function $J$ is defined as

$$J = \sum_{i=1}^{c} J_i = \sum_{i=1}^{c} \left( \sum_{k,\, x_k \in G_i} d(x_k, c_i) \right) \tag{8}$$

When the Euclidean distance is chosen as the measure between the vectors $x_k$ in cluster $i$ and the corresponding cluster center $c_i$, the objective function becomes

$$J = \sum_{i=1}^{c} J_i = \sum_{i=1}^{c} \left( \sum_{k,\, x_k \in G_i} \lVert x_k - c_i \rVert^{2} \right) \tag{9}$$

For the SEM image, the cluster members are the intensity levels at different pixels, used to differentiate between the edges and the background. A 2-cluster k-means clustering of the pixel intensities of the original image indicates the region between two close edges. At a higher level of decomposition, the spikes associated with high-frequency artifacts are removed from the approximate image. One can observe that the removal of these spikes alters the clustering regions. At a certain level of approximation, the separating zone between the edges can no longer be distinguished as a separate cluster. At this level of decomposition, the wavelength is comparable with the edge width, so the edges themselves are removed into the detail sub-image and the whole region falls into one cluster. Thus, one can no longer discriminate between the adjacent edges.

In this paper, we show that the clustering of the image physically characterizes the change of entropy. At the level where the minimum entropy is reached, the clearest 2-cluster k-means clustering is also obtained. Before this level, high-frequency components still exist in the approximation image; after the optimal level of approximation, the image is over-smoothed and the entropy value starts to increase due to the added variance from the wavelet basis function.
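A 2-cluster k-means on pixel intensities, minimizing the objective of Eq. (9), can be sketched as below. This is a minimal plain-Python version of the standard iterative (Lloyd-type) procedure; the initialization at the minimum and maximum intensity, the function name and the sample intensities are assumptions for illustration and are not specified in the text.

```python
def kmeans_1d(values, c=2, max_iter=100):
    """k-means on scalar intensities with squared Euclidean distance (Eq. 9); requires c >= 2."""
    lo, hi = min(values), max(values)
    # Assumption: initial centres are spread evenly between the extreme intensities.
    centres = [lo + (hi - lo) * i / (c - 1) for i in range(c)]

    def assign(current):
        # Each value goes to the cluster whose centre is nearest.
        return [min(range(c), key=lambda i: (x - current[i]) ** 2) for x in values]

    for _ in range(max_iter):
        labels = assign(centres)
        updated = []
        for i in range(c):
            members = [x for x, lab in zip(values, labels) if lab == i]
            updated.append(sum(members) / len(members) if members else centres[i])
        if updated == centres:  # centres stopped moving: converged
            break
        centres = updated

    labels = assign(centres)
    objective = sum((x - centres[lab]) ** 2 for x, lab in zip(values, labels))
    return centres, labels, objective


# Hypothetical normalized intensities: three background pixels and three edge pixels.
intensities = [0.10, 0.15, 0.20, 0.80, 0.85, 0.90]
centres, labels, objective = kmeans_1d(intensities)
```

Applied to the approximation image at each level, the label map shows whether the zone between adjacent edges still forms its own cluster or has merged with the edges.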

2.3.3. Flatness factor

The Flatness Factor (FF) is the normalized fourth moment of the wavelet coefficient [21]. It is defined as

$$F_m^{4} = \frac{\langle a_{m,(n_1,n_2)}^{4} \rangle_m}{\left( \langle a_{m,(n_1,n_2)}^{2} \rangle_m \right)^{2}} \tag{10}$$

where $a_{m,(n_1,n_2)}$ is the approximation coefficient at level $m$ and pixel coordinate $(n_1, n_2)$. The FF is a measure of the peakedness (or flatness) of a statistical distribution. In our case, it gives the flatness of the probability distribution of the approximation image at each level.

Fig. 6. Clusters of approximation at each wavelet decomposition level by db3.

In our context, FF provides a measure of the approximation image surface, which statistically characterizes the surface intensity distribution. At the lower levels of wavelet decomposition, the surface intensity distribution appears peakier. As the wavelet decomposition continues and the high-frequency components are removed, the distribution becomes less peaky than at the previous level, that is, FF increases. At a certain level of decomposition, FF reaches a maximum, after which it starts decreasing again due to the added variance from the wavelet function. For each wavelet function, FF differs from level to level due to the removal of different high-frequency components, consistent with the central frequency of the wavelet basis function. The level with the minimum entropy should yield the maximum FF. Among the wavelet functions, the one with the lowest FF at its optimal decomposition level should contain the highest-magnitude intensity (height) information, which comes from the original image and corresponds to the key underlying structure.
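Eq. (10) can be evaluated directly from the approximation coefficients, as in the sketch below. The function name and the small test arrays are hypothetical; the Haar column of Table 2 is included only to show how the level with maximum FF would be picked.

```python
def flatness_factor(coeffs):
    """Eq. (10): mean of a^4 over all pixels divided by the square of the mean of a^2."""
    flat = [a for row in coeffs for a in row]
    mean_sq = sum(a ** 2 for a in flat) / len(flat)
    mean_fourth = sum(a ** 4 for a in flat) / len(flat)
    return mean_fourth / mean_sq ** 2


ff_uniform = flatness_factor([[1.0, 1.0], [1.0, 1.0]])  # all coefficients equal
ff_spiky = flatness_factor([[0.0, 0.0], [0.0, 2.0]])    # a single dominant coefficient

# Haar column of Table 2 (levels 1 to 6); the level with the maximum FF is selected.
HAAR_FF = [1.7356, 1.7895, 1.9097, 1.7964, 2.0793, 1.4123]
best_level = max(range(1, len(HAAR_FF) + 1), key=lambda level: HAAR_FF[level - 1])
```

For the Haar values the maximum FF occurs at level 5, the same level at which Table 1 shows the minimum entropy for Haar.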

2.4. Edge roughness characterization

Edge characterization usually refers to LER characterization. The LER denotes the deviation from a reference straight line or from a reference flat surface, depending on whether one works in one or two dimensions [10,22]. There are several ways to characterize the line-edge roughness. The “sigma” value is chosen in this paper, since it is the most straightforward one and the most closely related to the line boundary. It is defined as

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N_p} d_i^{2}}{N_p}} \tag{11}$$

where $N_p$ is the number of points along the line-edge, and $d_i$ is the signed distance of each edge point from the linear fit to the line-edge [23,24]. Other parameters for LER characterization are available in the literature [11,23]. In this paper, we focus on the sigma value for comparing the performance of different wavelet functions and decomposition levels.
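A minimal sketch of Eq. (11) is given below: a least-squares line is fitted to the detected edge points and sigma is the root mean square of the residuals. The function names and sample points are hypothetical, and the residual is taken along the y-direction, which is an assumption; for a nearly horizontal edge this is close to the perpendicular distance.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def ler_sigma(xs, ys):
    """Eq. (11): sqrt(sum(d_i^2) / N_p), with d_i the signed residual from the fitted line."""
    slope, intercept = linear_fit(xs, ys)
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    return math.sqrt(sum(d ** 2 for d in residuals) / len(residuals))


# Hypothetical edge positions (pixel units) along a horizontally positioned line.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, -1.0, 1.0, -1.0]
sigma = ler_sigma(xs, ys)
```

Multiplying the result by the pixel size converts sigma to nanometres, the unit used in Table 3.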

Table 2
Maximum FF values at different wavelet decomposition levels for different wavelet functions.

| Level | Haar | db3 | db8 | sym4 | sym8 | coif1 | coif3 |
|---|---|---|---|---|---|---|---|
| 1st level | 1.7356 | 1.5445 | 1.5725 | 1.5310 | 1.4572 | 1.6091 | 1.4585 |
| 2nd level | 1.7895 | 1.5775 | 1.6620 | 1.6221 | 1.6819 | 1.6568 | 1.6591 |
| 3rd level | 1.9097 | 1.5948 | 1.6870 | 1.5957 | 1.7237 | 1.6150 | 1.6876 |
| 4th level | 1.7964 | 1.3468 | 1.4273 | 1.4452 | 1.5621 | 1.4871 | 1.4904 |
| 5th level | 2.0793 | 1.2645 | 1.5146 | 1.4929 | 1.5273 | 1.7797 | 1.4708 |
| 6th level | 1.4123 | 1.4700 | 1.2649 | 1.8722 | 1.7175 | 1.4728 | 1.7169 |

3. Results and discussion on LER detection by sigma value calculation

3.1. SEM image used for edge detection and characterization

An SEM image is used as an example for line-edge detection and roughness characterization. The image analyzed is an SEM image of a 1740 line/mm plane diffraction grating, holographically produced on a 50 mm diameter fused silica substrate. The grating lines are in photoresist, nominally 550 nm tall and 150 nm wide, with nominally vertical sidewalls. Fig. 3a shows the original image. High-frequency components may be caused by the photoresist itself, by exposure variations, or by the charging process associated with SEM imaging. In this paper, our focus is to characterize the uniformity of the edge and its deviation from the ideal straight line. Characterizing edge uniformity in a consistent manner will help identify sources of variation, whether physically associated with the structure under study or related to the imaging process.

The image data are in TIFF format and are imported into MATLAB® to yield an N × M array for further processing. In order to increase the contrast of the image, image enhancement is performed by intensity normalization, as shown in Fig. 3b. In order to avoid artifacts associated with features aligned to the SEM electron beam scanning direction, the scanning direction was not chosen to be parallel to the photoresist lines. To simplify the calculations, the image is first rotated to position the edge horizontally and a 512 × 512 image is selected for further study, as shown in Fig. 3c. The geometric transformation does not affect the intensity of the pixels and thus does not alter the results. Image enhancement by normalization of the intensity value only changes the relative value of the intensity at each pixel. This does not affect the coordinates of the pixels, which are used for LER characterization. We have used wavelets to indicate where an edge is located and which intensity level, in the normalized form, is to be used for threshold segmentation.

3.2. Choice of wavelet decomposition level and wavelet function

Fig. 4 depicts the decision-making steps involved in arriving at the optimal solution. The image, transformed into an array of pixel intensities, is represented by a number of wavelet functions at multiple decomposition levels, and entropy, FF and clustering analyses are performed. For each wavelet function, the minimum entropy, maximum FF and clearest clustering are identified by comparing the results. The optimum decomposition level of each wavelet function is thus obtained, and the maximum entropy, minimum FF and clearest clustering among the different wavelet functions are then investigated. From this analysis, the most suitable wavelet function and the optimum decomposition level are obtained, followed by LER characterization.
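The entropy part of this decision procedure amounts to a two-stage selection, which the following minimal sketch illustrates. It assumes the entropy values have already been computed and stored as a nested dictionary; the function name `select_wavelet_and_level`, the wavelet labels and the numbers are hypothetical placeholders and are not the Table 1 values. The FF and clustering checks are not modeled here.

```python
def select_wavelet_and_level(entropy_table):
    """Apply the two-stage entropy rule to {wavelet: {level: entropy}}."""
    # Stage 1: for each wavelet, the optimal level is the minimum-entropy one.
    optimal_level = {
        name: min(levels, key=levels.get) for name, levels in entropy_table.items()
    }
    # Stage 2: among those minima, pick the wavelet with the largest entropy.
    entropy_at_optimum = {
        name: entropy_table[name][level] for name, level in optimal_level.items()
    }
    chosen = max(entropy_at_optimum, key=entropy_at_optimum.get)
    return chosen, optimal_level[chosen], optimal_level


# Placeholder numbers for illustration only; they are NOT the Table 1 values.
toy_entropy = {
    "wavelet_a": {1: 5.0, 2: 4.2, 3: 4.6},
    "wavelet_b": {1: 4.8, 2: 4.5, 3: 3.9},
}
chosen, level, per_wavelet = select_wavelet_and_level(toy_entropy)
print(chosen, level, per_wavelet)
```

In this toy case each wavelet reaches its minimum entropy at a different level, and the wavelet whose minimum is larger is the one selected.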

The approximation images at six different levels and from four wavelet families are considered in this paper. The entropy values are shown in Table 1. The results show that the minimum entropy value is obtained at different decomposition levels when different wavelet functions are used. Among the minimum entropy values from the different wavelet functions, db3 gives the highest and Haar the lowest. The clustering on the edge surface shows that, for all the wavelets except Haar, the paired edges are separated from the substrate background at the level where the minimum entropy value is obtained (Figs. 5 and 6; only the graphs corresponding to the Haar and db3 wavelets are reproduced here). The Haar wavelet fails to separate the substrate background and the edges at the minimum entropy level, indicating that this wavelet function is not suitable for this application, primarily because its discrete structure does not reproduce the key feature forms of the image well. It is also observed that the clearest clustering is obtained by db3, even though the other wavelet functions are also able to perform the separation task.

Fig. 7. Line-edge searching region obtained by different wavelet functions at the optimal decomposition level.

Table 3. Sigma values at different threshold values based on different wavelet functions.

Wavelet  Quantity    Mean    Mean + 1std  Mean + 2std  Mean + 3std
db3      Threshold   0.4412  0.5741       0.7070       0.8400
db3      Sigma (nm)  15.3    16.6         15.3         12.1
db3      Width (nm)  137.4   123.2        95.6         54.6
db8      Threshold   0.5216  0.6560       0.7903       0.9247
db8      Sigma (nm)  17.3    15.4         12.7         87.5
db8      Width (nm)  132.6   109.5        72.7         24.3
sym4     Threshold   0.5170  0.6372       0.7574       0.8776
sym4     Sigma (nm)  19.6    16.8         14.3         12.7
sym4     Width (nm)  126.7   109.7        82.0         39.6
sym8     Threshold   0.5015  0.6345       0.7675       0.9005
sym8     Sigma (nm)  16.4    15.5         13.3         12.7
sym8     Width (nm)  135.5   112.9        76.5         34.1
coif1    Threshold   0.4905  0.6169       0.7433       0.8697
coif1    Sigma (nm)  19.2    18.4         15.4         12.6
coif1    Width (nm)  132.8   114.9        88.4         44.9
coif3    Threshold   0.5502  0.6857       0.8212       0.9567
coif3    Sigma (nm)  26.8    17.5         11.9         217.1
coif3    Width (nm)  132.9   99.3         59.2         41.1

The FF values corresponding to the approximation images are shown in Table 2. The maximum FF values are obtained at the same wavelet decomposition level at which the minimum entropy values are reached. This further supports the conclusion that the image at that level is smoothed to the maximal degree relative to the original image, and that extra variance is added to the approximation image beyond that decomposition level. It is also shown that the lowest FF at the optimal decomposition level is obtained from the wavelet function with which the highest entropy value is reached.

By comparing the results in Tables 1 and 2 and Figs. 5 and 6, it can be concluded that the criterion based on the entropy value for choosing the optimal decomposition level of a wavelet function and the most suitable wavelet function is valid for the separation between the key underlying structure and the surface detail. The clustering results and FF values further support this conclusion.

The line-edge searching regions were obtained from different wavelets at the wavelet decomposition level at which the minimum entropy value was obtained (Fig. 7). This is done by searching for local minima in the vertical direction on the edge [16]. It shows again that it is impossible to separate two adjacent edges by using the Haar wavelet function, since the two edge regions become one. With db3, a clear searching region bounded by almost straight lines is obtained, whereas the other wavelet functions give searching regions bounded by noisier curves.

3.3. Line-edge characterization based on the choice of wavelet decomposition level

An ideal LER parameter should give a complete description of the LER. However, the roughness parameter commonly used still depends on the threshold value, expressed in terms of either surface intensity or height information. Given a threshold value, the line-edge boundaries are found by a signal threshold algorithm [24] and then the LER calculation is performed. How to choose the threshold value remains an open question, and the choice is largely arbitrary. Often, several threshold values are chosen [24] and the corresponding LER parameters are calculated. The maximum value of LER could give a better description of the LER, but no satisfactory conclusions can be drawn.

In a previous paper [16], we presented a systematically generated threshold value based on the approximation mean at the chosen wavelet decomposition level and the standard deviation from the detail sub-image. It covers most of the range of the intensity (height) values of an edge. Table 3 shows sigma values at the threshold values defined by this method. The corresponding edge boundaries are shown in Figs. 8–10 (only results for three wavelets are presented). The edge boundaries show that two adjacent edges are not totally separated when the mean of the approximation is used as the threshold value, so the sigma value at that threshold cannot be used to characterize the LER. Among the other threshold values, which are generated using the standard deviation, the maximum sigma values are obtained at the mean plus one standard deviation. This is in agreement with our previous results [16].

Among all the line-edge boundaries at different threshold values, two sigma values are not valid: those obtained by db8 and coif3 at the threshold value of mean + 3std (results from coif3 are shown in Fig. 10d). Their threshold value is higher than the top of the edges, so the calculation fails to find the actual edge region at that threshold value.

Fig. 8. Line-edge boundaries found by db3 at level 3, threshold values are 0.4412, 0.5741, 0.7070 and 0.8400 from top to bottom.

Fig. 9. Line-edge boundaries found by sym4 at level 2, threshold values are 0.5170, 0.6372, 0.7574 and 0.8776 from top to bottom.

Fig. 10. Line-edge boundaries found by coif3 at level 3, threshold values are 0.5502, 0.6857, 0.8212 and 0.9567 from top to bottom.

Table 4. Sigma values at fixed threshold values. The sigma and width values are all in nm.

Wavelet  Quantity  0.3    0.4    0.5    0.6    0.7    0.8   (threshold)
db3      Sigma     13.3   14.9   15.5   15.5   15.3   11.9
db3      Width     146.8  141.7  132.7  119.9  95.6   67.1
db8      Sigma     12.2   14.5   16.5   16.6   15.5   11.7
db8      Width     151.9  146.6  135.7  121.2  95.7   67.0
sym4     Sigma     11.3   15.0   19.0   18.7   13.6   11.5
sym4     Width     151.1  144.6  132.2  113.9  91.5   66.9
sym8     Sigma     12.7   14.8   16.4   16.3   15.5   11.7
sym8     Width     151.0  145.8  135.5  121.0  95.9   67.0
coif1    Sigma     18.0   18.7   19.2   18.6   16.3   11.7
coif1    Width     149.6  143.6  132.8  118.5  94.8   67.0
coif3    Sigma     23.4   25.6   27.7   25.2   17.0   12.1
coif3    Width     160.5  154.4  141.9  123.5  95.7   67.2

Fig. 11. Edge boundaries by different wavelet functions at the threshold value of 0.3.

Fig. 12. Edge boundaries by different wavelet functions at the threshold value of 0.5.

Because the threshold values obtained from different wavelet decompositions differ, it is difficult to compare the characterization results from different wavelet functions. Thus, a series of fixed threshold values is used for the different wavelet functions to compare the sigma values (Table 4). The edge searching regions defined by different wavelet decompositions are slightly different. Hence, for a given threshold value, the sigma values from different wavelet decompositions also vary slightly. However, considering the pixel dimension in the image, which is about 6.3 × 6.3 nm²/pixel, the results from the Daubechies and Symlet families can be considered identical. The edge boundary plots at different threshold values (Figs. 11 and 12) also show that the results from different wavelet functions are similar. The Coiflet family gives relatively higher sigma and width values, mainly because of the different edge searching region and because more surface detail (spikes) is considered as the edge boundary. These results agree with the conclusion drawn in the previous section that wavelet function db3 is the most suitable one for the analysis of this particular image.
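The pixel-size argument can be checked with simple arithmetic on the sigma values of Table 4. The sketch below is illustrative only: it copies the tabulated sigma values, uses the 6.3 nm pixel edge length quoted in the text, and expresses the spread between wavelet functions at each fixed threshold in pixels. The helper name `spread_in_pixels` is hypothetical, and using the max-minus-min spread as the measure of agreement is an assumption made here, not a criterion stated in the paper.

```python
PIXEL_NM = 6.3  # pixel edge length quoted in the text, in nm

THRESHOLDS = [0.3, 0.4, 0.5, 0.6, 0.7, 0.8]

# Sigma values (nm) copied from Table 4, one list entry per threshold.
SIGMA_NM = {
    "db3":   [13.3, 14.9, 15.5, 15.5, 15.3, 11.9],
    "db8":   [12.2, 14.5, 16.5, 16.6, 15.5, 11.7],
    "sym4":  [11.3, 15.0, 19.0, 18.7, 13.6, 11.5],
    "sym8":  [12.7, 14.8, 16.4, 16.3, 15.5, 11.7],
    "coif1": [18.0, 18.7, 19.2, 18.6, 16.3, 11.7],
    "coif3": [23.4, 25.6, 27.7, 25.2, 17.0, 12.1],
}


def spread_in_pixels(wavelets):
    """Max-minus-min sigma across the given wavelets, per threshold, in pixels."""
    spreads = []
    for i in range(len(THRESHOLDS)):
        values = [SIGMA_NM[name][i] for name in wavelets]
        spreads.append((max(values) - min(values)) / PIXEL_NM)
    return spreads


db_sym = spread_in_pixels(["db3", "db8", "sym4", "sym8"])
all_six = spread_in_pixels(list(SIGMA_NM))
for t, a, b in zip(THRESHOLDS, db_sym, all_six):
    print(f"threshold {t:.1f}: db/sym spread {a:.2f} px, all wavelets {b:.2f} px")
```

With these numbers the Daubechies and Symlet results stay within one pixel of each other at every threshold, while including the Coiflet family pushes the spread well above one pixel at the lower thresholds.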

Both sets of threshold values can cover the intensity range of the edge. The auto-generated threshold values provide a better starting point, where two adjacent edges are separated to some extent, by considering the underlying contour structure of the image and the corresponding detail deviation. More threshold values can be inserted if the step between two systematically generated threshold values is too large for describing the LER.
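As a rough illustration of how the systematically generated thresholds and the optional intermediate values might be produced, consider the sketch below. The mean 0.4412 is the db3 value from Table 3; the standard deviation 0.1329 is not stated in the text and is inferred here from the difference between the first two db3 thresholds. The function names `generated_thresholds` and `refine`, and the even subdivision rule, are assumptions made for illustration; the paper only states that more threshold values can be inserted.

```python
import math


def generated_thresholds(approx_mean, detail_std, k_max=3):
    """Thresholds at mean + k * std for k = 0 .. k_max."""
    return [approx_mean + k * detail_std for k in range(k_max + 1)]


def refine(thresholds, max_step):
    """Insert evenly spaced values wherever a gap exceeds max_step (assumed rule)."""
    refined = [thresholds[0]]
    for lo, hi in zip(thresholds, thresholds[1:]):
        # Number of equal sub-intervals needed so that each is <= max_step.
        pieces = max(1, math.ceil((hi - lo) / max_step - 1e-12))
        for j in range(1, pieces + 1):
            refined.append(lo + (hi - lo) * j / pieces)
    return refined


# db3 mean from Table 3; std inferred as 0.5741 - 0.4412 (an assumption).
base = generated_thresholds(0.4412, 0.1329)
finer = refine(base, max_step=0.07)
print([round(t, 4) for t in base])
print([round(t, 4) for t in finer])
```

The generated values reproduce the db3 thresholds of Table 3 to within rounding, and a maximum step of 0.07 halves each interval in this example.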

4. Conclusions

This study shows that wavelet decomposition can better differentiate the key underlying structure (background and edges) in a given image, and therefore provides a solely image-dependent edge searching region for LER characterization. The optimal wavelet decomposition level can be defined theoretically by the minimum entropy criterion, and the most suitable wavelet function can then be chosen by the maximum entropy criterion at the optimal wavelet decomposition level of each wavelet function. The clustering results and flatness factor calculation are used to support these results physically and statistically. Furthermore, the auto-generated threshold values give a better starting point for LER characterization, where two adjacent edges are separated to some extent. We show that the optimum wavelet function and level of decomposition can be obtained by comparing line-edge characterization results. These results can be used for further on-line LER characterization [2] and help circumvent human visual input, thus establishing an objective relationship between fabrication and device performance.

Acknowledgments

The authors gratefully acknowledge Jerald Britten of Lawrence Livermore National Laboratory (LLNL) for providing the grating sample for analysis, and Phillip Hailey and Bassem El Dasher (LLNL) for SEM imaging.


References

[1] D. Widman, H. Mader, H. Friedrich, Technology of Integrated Circuits, Springer-Verlag, Heidelberg, Germany, 2000.

[2] B. Yaakobovitz, Y. Coen, Y. Tsur, Microelectronic Engineering 84 (2007) 619–625.

[3] A. Yamaguchi, H. Fukuda, H. Kawada, T. Iizumi, Journal of Photopolymer Science and Technology 16 (3) (2003) 387–393.

[4] S. Eder-Kapl, H. Loeschner, M. Zeininger, W. Fallmann, O. Kirch, G.P. Patsis, V. Constantoudis, E. Gogolides, Microelectronic Engineering 73–74 (2004) 252–258.

[5] G.P. Patsis, V. Constantoudis, A. Tserepi, E. Gogolides, G. Grozev, T. Hoffmann, Microelectronic Engineering 67–68 (2003) 319–325.

[6] A. Yamaguchi, O. Komuro, Japan Journal of Applied Physics 42 (2003) 3763–3770.

[7] W. Steinhogl, G. Schindler, G. Steinlesberger, M. Traving, M. Engelhardt, Microelectronic Engineering 76 (2004) 126–130.

[8] L.H.A. Leunissen, W.G. Lawrence, M. Ercken, Microelectronic Engineering 73–74 (2004) 265–270.

[9] L.H.A. Leunissen, M. Ercken, G.P. Patsis, Microelectronic Engineering 78–79 (2005) 2–10.

[10] E. Gogolides, V. Constantoudis, G.P. Patsis, A. Tserepi, Microelectronic Engineering 83 (2006) 1067–1072.

[11] V. Constantoudis, G.P. Patsis, A. Tserepi, E. Gogolides, Journal of Vacuum Science and Technology B 21 (3) (2003) 1019–1026.

[12] T. Sugihara, F. Van Roey, A.M. Goethals, K. Ronse, L. Van den Hove, Microelectronic Engineering 46 (1999) 339–343.

[13] G. Strang, T. Nguyen, Wavelets and Filter Banks, Wellesley-Cambridge Press, Wellesley, MA, 1996.

[14] A. Maksumov, R. Vidu, A. Palazoglu, P. Stroeve, Journal of Colloid and Interface Science 272 (2004) 365–377.

[15] P.S. Addison, The Illustrated Wavelet Transform Handbook, Introduction Theory and Application in Science, Engineering, Medical and Finance, Institute of Physics Publishing, Philadelphia, PA, 2002.

[16] W. Sun, J.A. Romagnoli, J. Tringe, S. Letant, P. Stroeve, A. Palazoglu, IEEE Transactions on Semiconductor Manufacturing, in press.

[17] D.-M. Tsai, B. Hsiao, Pattern Recognition 34 (2001) 1285–1305.

[18] C.E. Shannon, Bell System Technical Journal 27 (1948) 379–423, 623–656.

[19] B.S. Everitt, Cluster Analysis, Heinemann Education, 1993.

[20] J.-S.R. Jang, C.-T. Sun, E. Mizutani, Neuro-Fuzzy and Soft Computing, Prentice-Hall, Englewood Cliffs, NJ, 1997.

[21] O. Holub, S. Ferreira, Computer Physics Communications 175 (2006) 620–623.

[22] S. Winkelmeier, M. Sarstedt, M. Ercken, M. Goethals, K. Ronse, Microelectronic Engineering 57–58 (2001) 665–672.

[23] T. Yamaguchi, K. Yamazaki, M. Nagas, H. Namatsu, Japan Journal of Applied Physics 42 (2003) 3755–3762.

[24] G.P. Patsis, V. Constantoudis, A. Tserepi, E. Gogolides, G. Grozev, Journal of Vacuum Science and Technology B 21 (3) (2003) 1008–1018.