log-normality of indoor radon data: pragmatic approach

LOG-NORMALITY OF INDOOR RADON DATA:

PRAGMATIC APPROACH

Giorgia Cinelli 1, François Tondeur 2

1 Dipartimento di Scienze della Terra e Geologico-Ambientali, Alma Mater Studiorum Università di Bologna, Bologna, Italy; [email protected]

2 ISIB, Haute Ecole P.-H. Spaak, Brussels, Belgium; [email protected]

Abstract

The analysis of the statistical distribution of indoor radon data is fundamental for a correct estimate of the high radon risk areas. The data considered in this paper, Belgian long-term indoor radon measurements, are organized into geological units. For each unit, the hypothesis of log-normality of the distribution has been checked by statistical tests. Despite significant deviations from log-normality, the log-normal distribution performs reasonably well in predicting the percentage of cases above the action level. Some improvement is obtained by restricting the log-normal model to the values above the median (High Values distribution).

In addition, results are presented about the variability of the logarithmic standard deviation.

Keywords: Indoor Radon, Statistical Distributions, Risk Analysis.

1. Introduction

Radon risk mapping is a crucial step in the management of the indoor radon risk. Radon-affected areas, where appropriate regulations and actions must be applied, are usually defined as areas where more than 1% of the dwellings show a concentration higher than the reference level used in the country (ICRP 1990, Miles and Appleton 2005). Because the data are often too scarce, it is frequently impossible to evaluate this percentage directly from the data themselves, and the evaluation is normally based on a statistical distribution fitted to the data. The study of the distribution of the data is therefore of crucial importance.

A log-normal distribution is usually assumed to model indoor radon data (Miles and Appleton 2005, Cinelli et al. 2010). Assuming log-normality implies neglecting the extreme values (low and high), which cannot be modeled correctly by such a distribution. Only limited importance is then given to the extremes, which could nevertheless be very important in terms of risk (Tuia and Kanevski 2007, Hámori et al. 2006, Pegoretti 2007). The present work focuses on indoor radon data collected in the southern part of Belgium. The goal is to find the distribution that best models our data and, consequently, to obtain the best estimate of the high radon risk areas.

2. Instruments and methods

2.1 Database

The database of indoor radon measurements has been collected by the Federal Agency for Nuclear Control (FANC). These data, about 10900 measurements, were obtained with track-etch Makrofol detectors exposed for three months on ground floor levels, between 1995 and 2009 (Dehandschutter et al. 2009). The database includes the geographical coordinates (Belgian Lambert System 1972), the radon concentration, and the local geological unit determined from the digital geological map (GSB). The data are organized in geological groups (Cinelli et al. 2009); in this study we consider only the groups with more than 100 data, in order to have good statistics for the analysis. The data have been log-transformed.

Table 1 reports, for each group, the number of data (N), the logarithmic mean (LM), the logarithmic median (LMe) and the logarithmic standard deviation (LSD). The fact that the LM is generally higher than the LMe is an indication of deviation from log-normality.

Geological group          N      LM      LMe     LSD
Revinian (Rv)             428    4.853   4.713   1.072
Salmian (Sm)              681    4.796   4.662   0.858
Gedinnian (G)             957    4.957   4.898   0.898
Siegenian (Cb)            3963   4.917   4.836   0.924
Emsian (Bt)               353    4.745   4.605   0.882
Couvinian (Co)            294    4.569   4.522   0.904
Frasnian (Fr)             165    4.101   4.060   0.623
Famennian (Fa)            304    4.168   4.182   0.734
Trias-Jurassic (Tr-Ju)    438    4.163   4.111   0.652
Eocene (Eo)               223    3.865   3.850   0.611

Table 1 Number of data (N), logarithmic mean (LM), logarithmic median (LMe) and logarithmic standard deviation (LSD) for each geological group considered.
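The quantities in Table 1 can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes natural logarithms (which seems consistent with values such as LMe = 4.605, i.e. ln 100, for the Emsian) and the sample (N-1) standard deviation. The function name `log_summary` and the sample concentrations are hypothetical.

```python
import math
import statistics

def log_summary(concentrations_bq_m3):
    """Return (N, LM, LMe, LSD) of the natural-log-transformed concentrations."""
    # The logarithm is only defined for strictly positive concentrations
    logs = [math.log(c) for c in concentrations_bq_m3 if c > 0]
    return (
        len(logs),
        statistics.mean(logs),    # LM: logarithmic mean
        statistics.median(logs),  # LMe: logarithmic median
        statistics.stdev(logs),   # LSD: sample standard deviation (assumed N-1)
    )

# Hypothetical concentrations in Bq/m3, for illustration only
sample = [35.0, 60.0, 110.0, 125.0, 240.0, 900.0]
n, lm, lme, lsd = log_summary(sample)
print(n, round(lm, 3), round(lme, 3), round(lsd, 3))
```

In this made-up sample the single high value pulls LM above LMe, which is the kind of asymmetry the text interprets as a deviation from log-normality.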

2.2 Study of log-normality

Two different statistical methods have been used to test the hypothesis of log-normality for data coming from the same geological group.

In the first method, applied by Tuia and Kanevski (2008), the skewness and kurtosis of the data set are tested against the log-normal hypothesis. The second method is the Shapiro-Wilk test (STATISTICA 7); its p-values are shown in Table 2.

For every geological group, the log-normality hypothesis is rejected at the 0.05 level by both methods.

The skewness values are positive for all but two geological groups (Couvinian and Famennian), and they are significantly greater than zero. The distributions of the geologically grouped data are thus positively skewed, i.e. the high-concentration tail is probably stronger than in a log-normal distribution.
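As a rough illustration of testing skewness and kurtosis against normality of the log-transformed data, the sketch below uses the common large-sample standard errors sqrt(6/N) and sqrt(24/N). This is one textbook approach and not necessarily the exact procedure of Tuia and Kanevski. The function names are hypothetical, and the kurtosis in Table 2 is assumed to be the excess kurtosis (zero for a normal distribution).

```python
import math

def skewness_and_kurtosis(log_values):
    """Moment estimators of skewness and excess kurtosis of log-concentrations."""
    n = len(log_values)
    mean = sum(log_values) / n
    m2 = sum((x - mean) ** 2 for x in log_values) / n
    m3 = sum((x - mean) ** 3 for x in log_values) / n
    m4 = sum((x - mean) ** 4 for x in log_values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0

def z_score(statistic, n, variance_factor):
    """z = statistic / sqrt(variance_factor / n).

    variance_factor is 6 for skewness and 24 for excess kurtosis
    (large-sample approximations under normality).
    """
    return statistic / math.sqrt(variance_factor / n)

# Siegenian values copied from Table 2 (kurtosis ASSUMED to be excess kurtosis)
z_skew = z_score(0.266, 3963, 6.0)
z_kurt = z_score(0.447, 3963, 24.0)
print(f"z(skewness) = {z_skew:.1f}, z(kurtosis) = {z_kurt:.1f}")
```

Under these assumptions, |z| above 1.96 corresponds to rejection at the 0.05 level. With almost 4000 data, even a modest skewness gives a large z, which helps explain why a formal rejection can coexist with a reasonable practical fit.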

Geological group          N      Skewness   Kurtosis   p-value (Shapiro-Wilk)
Revinian (Rv)             428    0.293      0.852      0
Salmian (Sm)              681    0.665      0.624      0
Gedinnian (G)             957    0.291      1.534      0
Siegenian (Cb)            3963   0.266      0.447      0
Emsian (Bt)               353    0.686      0.732      0
Couvinian (Co)            294    -0.295     4.331      0
Frasnian (Fr)             165    0.644      1.961      0.00145
Famennian (Fa)            304    -0.174     0.792      0.0008
Trias-Jurassic (Tr-Ju)    438    0.956      3.501      0
Eocene (Eo)               223    0.436      0.705      0.01479

Table 2 Number of data (N), skewness and kurtosis parameters, and p-value from the Shapiro-Wilk test for each geological group considered.

The normal quantile-quantile (q-q) plot compares graphically the distribution of the log-transformed data with a standard normal distribution, providing another check of the log-normality of the data.

Figure 1 shows the normal q-q plots for some geological groups. They show that the log-normal distribution correctly describes the bulk of the indoor data but fails for the extreme values.

Figure 1 Normal quantile-quantile plots for some geological groups.
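The coordinates of such a normal q-q plot can be sketched as below. The plotting position (i + 0.5)/N is one common convention and is an assumption here; the software used by the authors may use a different one. The function name `normal_qq_points` is hypothetical.

```python
from statistics import NormalDist

def normal_qq_points(log_values):
    """Pair each sorted log-value with its theoretical standard-normal quantile."""
    ordered = sorted(log_values)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Plotting position (i + 0.5) / n: an assumed, commonly used convention
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, ordered))

# Hypothetical log-concentrations, for illustration only
for q, x in normal_qq_points([4.1, 5.3, 4.8, 6.9, 4.5]):
    print(round(q, 3), x)
```

For log-normal data the points fall close to a straight line whose intercept and slope estimate the logarithmic mean and standard deviation; curvature at the ends signals tails that the log-normal model does not describe.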

3. High values distribution

The low concentration values are highly uncertain. This may be due to several aspects:

a. the subtraction of the background can be inaccurate;

b. negative concentrations, which arise in a few cases from background subtraction because of statistical fluctuations, are never reported as negative values, which would be physically meaningless (and could not be used in the log-transformation), but often as 0 or as “less than the detection limit”;

c. very often, but not always, the other results lower than the detection limit are also not reported by the laboratory at their value, but as “less than the detection limit”, and the value included in the database is arbitrary.

For example, Fig. 2 shows the Siegenian data treated in two ways for concentrations lower than 10 Bq/m3: in one graph the value 10 Bq/m3 has been assigned to all such data, and in the other a uniform distribution between 1 and 10 Bq/m3 has been used.

Figure 2 Normal quantile-quantile plots for the Siegenian geological group: on the right, the value 10 Bq/m3 is assigned to all data lower than 10 Bq/m3; on the left, a uniform distribution between 1 and 10 Bq/m3 is used for these data.

These uncertain low concentration values influence the LM and the LSD, as well as the skewness and the kurtosis, but knowledge of their distribution is not useful for determining the percentage above the action level, and hence for the risk analysis. On the contrary, knowledge of the distribution of the high concentration values is fundamental for estimating the percentage above the Action Level, fixed in Belgium at 400 Bq/m3.

For these reasons we have decided to consider only the data above the median and to construct a High Value (HV) distribution, which is a normal distribution of the log-transformed data having the logarithmic median (LMe) as its mean. The median is robust, i.e. not influenced by extreme low and high values, and is a good starting point for the fit.
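A minimal sketch of a High Value fit is given below. The text fixes the mean at the logarithmic median but does not state how the width is obtained. Here the standard deviation is ASSUMED to be the root-mean-square deviation of the values above the median from that median, and the authors' actual fitting procedure may differ. The function names and sample data are hypothetical.

```python
import math
from statistics import NormalDist, median

def fit_high_value(log_values):
    """Return (mu, sigma) of a normal distribution fitted to the upper half only.

    mu is fixed at the median, as in the text. sigma is ASSUMED to be the
    root-mean-square deviation of the values above the median (not stated
    in the paper).
    """
    mu = median(log_values)
    upper = [x for x in log_values if x > mu]
    sigma = math.sqrt(sum((x - mu) ** 2 for x in upper) / len(upper))
    return mu, sigma

def percent_above(level_bq_m3, mu, sigma):
    """Percentage of a log-normal population above a concentration level."""
    return 100.0 * (1.0 - NormalDist(mu, sigma).cdf(math.log(level_bq_m3)))

# Hypothetical log-concentrations, for illustration only
logs = [3.2, 3.9, 4.4, 4.6, 4.7, 5.0, 5.4, 6.1]
mu_hv, sigma_hv = fit_high_value(logs)
print(round(mu_hv, 3), round(sigma_hv, 3),
      round(percent_above(400.0, mu_hv, sigma_hv), 2))
```

Because only values above the median enter the width, the arbitrary treatment of results below the detection limit has no effect on this fit.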

[Figure panels: normal quantile-quantile plots for the Siegenian data (Cb, FANC, N = 3963); axes: theoretical quantile (horizontal) and observed value (vertical). Panel "Uniformity": fitted line Var2 = 4.9095 + 0.9423 x. Second panel: fitted line Var1 = 4.9167 + 0.9217 x.]

Figure 3 Normal quantile-quantile plots of the HV values for some geological groups. The star indicates the reduced value of the Action Level (400 Bq/m3).

Figure 3 shows the High Value q-q plots for the same geological groups for which the normal q-q plots have been shown above. The reduced value of the Action Level is displayed in each plot. For the majority of our geological groups this point lies in the part where the HV data follow the log-normal distribution. This implies that the HV distribution is a consistent choice for estimating the percentage above the Action Level; the extreme values are not really important for the evaluation of this percentage.

To further confirm the quality of the HV distribution, Table 3 compares the percentage above the action level obtained from the experimental data, the High Value distribution and the log-normal distribution.

% above the Action Level (400 Bq/m3)

Geological group          Observed data   HV distribution   Log-normal distribution
Revinian (Rv)             14.95           15.22             14.42
Salmian (Sm)              9.84            9.99              9.18
Gedinnian (G)             12.23           13.47             12.48
Siegenian (Cb)            12.44           13.00             12.24
Emsian (Bt)               9.35            9.70              7.89
Couvinian (Co)            5.44            6.10              5.79
Frasnian (Fr)             1.21            0.27              0.12
Famennian (Fa)            0.66            0.65              0.55
Trias-Jurassic (Tr-Ju)    1.40            0.59              0.25
Eocene (Eo)               0               0.05              0.03

Table 3 Percentage of houses above 400 Bq/m3 using experimental data, HV and log-normal distributions for the geological groups considered.
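The log-normal column of Table 3 can apparently be reproduced from the LM and LSD of Table 1 as the upper-tail probability of a normal distribution evaluated at ln(400). The sketch below is a check under that assumption; the function name is hypothetical, and small differences are expected because Table 1 is rounded to three decimals.

```python
import math
from statistics import NormalDist

ACTION_LEVEL = 400.0  # Bq/m3, from the text

def lognormal_percent_above(lm, lsd, level=ACTION_LEVEL):
    """Percentage above `level` for log-concentrations distributed as N(lm, lsd)."""
    z = (math.log(level) - lm) / lsd  # reduced value of the action level
    return 100.0 * (1.0 - NormalDist().cdf(z))

# (LM, LSD) copied from Table 1
table1 = {
    "Revinian": (4.853, 1.072),
    "Siegenian": (4.917, 0.924),
    "Eocene": (3.865, 0.611),
}
results = {name: lognormal_percent_above(lm, lsd) for name, (lm, lsd) in table1.items()}
for name, pct in results.items():
    print(name, round(pct, 2))
```

The same function applies to the HV distribution if LM is replaced by LMe and LSD by the HV standard deviation.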

Both distributions perform rather well, but the HV is somewhat better: it overestimates the percentage by 0.15% on average, whereas the log-normal underestimates it by 0.49% on average. Thus, when the purpose is to evaluate this percentage, there is no need to introduce extreme value theory and its specific distributions.
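The average deviations can be recomputed from the rounded entries of Table 3, as sketched below with hypothetical variable names. From these rounded entries the mean differences come out at about +0.15 for the HV distribution and about -0.46 for the log-normal. The latter is close to, but not identical with, the 0.49% quoted, possibly because the authors averaged unrounded values.

```python
# Columns copied from Table 3 (percent of houses above 400 Bq/m3)
observed   = [14.95, 9.84, 12.23, 12.44, 9.35, 5.44, 1.21, 0.66, 1.40, 0.00]
high_value = [15.22, 9.99, 13.47, 13.00, 9.70, 6.10, 0.27, 0.65, 0.59, 0.05]
log_normal = [14.42, 9.18, 12.48, 12.24, 7.89, 5.79, 0.12, 0.55, 0.25, 0.03]

def mean_difference(model, reference):
    """Average of (model - reference) over the geological groups."""
    return sum(m - r for m, r in zip(model, reference)) / len(reference)

hv_bias = mean_difference(high_value, observed)
ln_bias = mean_difference(log_normal, observed)
print(f"HV: {hv_bias:+.2f}  log-normal: {ln_bias:+.2f} (percentage points)")
```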

4. Logarithmic Standard Deviation: Local versus Global

4.1 Local Logarithmic Standard Deviation

The previous section deals with the global distribution of radon concentrations on a given geological unit. Risk mapping, however, requires working at the local level.

In our previous work (Cinelli et al. 2010), the percentage above the Action Level in each node of a one-kilometer grid was evaluated using the global logarithmic standard deviation of the corresponding geological unit. This choice was consistent with the observation of a nearly constant variogram in the 1 km to 20 km range.

New data collected by FANC, mainly on the Siegenian, have significantly increased the sampling density on this unit and allow us to look in more detail at the value to be used for the LSD.

Figure 4 Variograms of the Siegenian data: on the left the maximum lag distance has been set at 20000 m, on the right at 1000 m.

Fig.4 shows the standardized variogram calculated with SURFER8 (Surfer, 1999) for the data

of Siegenian. At short distance, the variogram shows lower values, indicating the existence of

short-range spatial correlations between the data. Consequently, one should expect that the

local LSD calculated for a local group of data should be lower than the global LSD of the

geological unit, which would reduce the estimated percentage above the action level.
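For readers unfamiliar with the tool, an omnidirectional empirical variogram can be sketched as follows. This is not the SURFER implementation: it assumes that "standardized" means the semivariance divided by the variance of all data, and it uses simple fixed-width lag bins. The function name and the demo points are hypothetical.

```python
import math
from statistics import pvariance

def standardized_variogram(points, lag_width, n_lags):
    """Empirical omnidirectional variogram divided by the data variance.

    points is a list of (x, y, value); returns a list of
    (lag_center, standardized_semivariance, pair_count) for non-empty bins.
    """
    variance = pvariance([v for _, _, v in points])
    sums = [0.0] * n_lags
    counts = [0] * n_lags
    for i in range(len(points)):
        xi, yi, vi = points[i]
        for j in range(i + 1, len(points)):  # each pair once: O(N^2)
            xj, yj, vj = points[j]
            k = int(math.hypot(xi - xj, yi - yj) // lag_width)
            if k < n_lags:
                sums[k] += 0.5 * (vi - vj) ** 2
                counts[k] += 1
    return [((k + 0.5) * lag_width, sums[k] / counts[k] / variance, counts[k])
            for k in range(n_lags) if counts[k] > 0]

# Hypothetical points (x, y in m, log-concentration), for illustration only
demo = [(0.0, 0.0, 4.2), (300.0, 0.0, 4.4), (0.0, 400.0, 4.1), (5000.0, 5000.0, 5.6)]
print(standardized_variogram(demo, lag_width=1000.0, n_lags=10))
```

Standardized values well below 1 at short lags mean that nearby measurements are more alike than two measurements taken at random in the unit, which is the short-range correlation described above.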

Number of data                    30        100
Average radius (m)                4432      6365
% above Action Level with:
  Local LSD                       12.73%    12.51%
  Global LSD                      14.00%    13.06%
  Average local LSD               12.56%    12.57%

Table 4 Percentage of houses above 400 Bq/m3 for the Siegenian geological group, calculated with a log-normal distribution using the local LSD, the global LSD and the average of the local LSDs, for 30 and 100 data around each node, together with the average radius.

In this paper we show only the results relative to one geological group, Siegenian.

The local logarithmic standard deviations have been calculated considering a fixed number of

data in a circle around each node, the radius of which is adjusted to contain 30 or 100 data.

[Figure 4 panels: standardized variogram versus lag distance (m), direction 0.0, tolerance 90.0; one panel with lag distances from 0 to 1000 m, the other from 0 to 20000 m.]

In each node the percentage above the action level has been calculated with the log-normal distribution in three different ways, using:

a. the local LSD, calculated in each node with the mentioned number of data;

b. the global LSD, calculated from all data of the geological group;

c. the average of the local LSDs.
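The three options can be sketched for a single node as follows. The paper does not say how the local logarithmic mean is obtained; here it is ASSUMED to be the mean of the same k nearest data used for the local LSD. All names and data below are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

LOG_ACTION_LEVEL = math.log(400.0)  # action level of 400 Bq/m3 from the text

def local_mean_and_lsd(node, data, k):
    """Local LM, local LSD and search radius from the k data nearest to node.

    node is (x, y); data is a list of (x, y, log_concentration).
    """
    def dist(d):
        return math.hypot(d[0] - node[0], d[1] - node[1])
    nearest = sorted(data, key=dist)[:k]
    logs = [d[2] for d in nearest]
    return mean(logs), stdev(logs), dist(nearest[-1])

def percent_above(lm, lsd):
    """Log-normal percentage above the action level."""
    return 100.0 * (1.0 - NormalDist(lm, lsd).cdf(LOG_ACTION_LEVEL))

# Hypothetical data (coordinates in m, log-concentrations), illustration only
data = [(0, 0, 4.0), (1, 0, 5.0), (0, 1, 6.0),
        (10, 10, 4.5), (11, 10, 5.5), (10, 11, 5.9)]
nodes = [(0, 0), (10, 10)]
global_lsd = stdev([d[2] for d in data])
local = [local_mean_and_lsd(node, data, k=3) for node in nodes]
average_local_lsd = mean(lsd for _, lsd, _ in local)

for lm, lsd, radius in local:
    print(round(radius, 1),
          round(percent_above(lm, lsd), 2),                # a. local LSD
          round(percent_above(lm, global_lsd), 2),         # b. global LSD
          round(percent_above(lm, average_local_lsd), 2))  # c. average local LSD
```

Fixing the number of data rather than the radius keeps the statistical uncertainty of each local LSD comparable from node to node, at the price of a search radius that varies with the sampling density.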

The average of the local percentages for each case and the average radius are shown in Table 4. The average percentage is quite similar for the “local LSD” and “average local LSD” options, and does not seem to depend on the number of data considered around each node. Moreover, the results agree with the observed percentage, which is 12.44%. As expected, using the global LSD overestimates the percentage of houses above the action level, but the difference is not large. More results (also with the HV distribution) will be presented at the workshop.

Is there a reason to prefer the “average local LSD”? The local LSD shows important variations within the Siegenian unit. Whereas this has only a small influence on the average percentages of Table 4, local percentages can be more affected. Are those variations significant, or are they only random fluctuations? The statistical variance of the LSD is $\sigma^2(\mathrm{LSD}) = \mathrm{LSD}^2/N$, where N is the number of local data. Its average value is compared in Table 5 with the variance of the distribution of the local LSDs. The random component represents approximately 70% of the total. It means that the variability of the percentage added by the use of the local LSD rather than its average is mostly not significant. Our conclusion at this stage is thus that the “average local LSD” option seems to be the best one.

This conclusion should of course be verified for other geological groups, as well as for the HV distribution recommended in the preceding section.

Number of data    σ²(LSD) total    σ²(LSD) random
30                0.031            0.023
100               0.012            0.008

Table 5 Variance of the local LSD for the Siegenian data, in different situations.
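The share of the random component can be recomputed from Table 5, as sketched below with hypothetical names. The "approximately 70%" quoted in the text appears to correspond to the average of the two ratios, which is an interpretation rather than something the paper states. The helper `random_variance` simply encodes the formula LSD²/N as written in the text.

```python
def random_variance(lsd, n):
    """Sampling variance of the LSD as written in the text: LSD^2 / N."""
    return lsd ** 2 / n

# (total variance, random variance) of the local LSD, copied from Table 5
table5 = {30: (0.031, 0.023), 100: (0.012, 0.008)}

fractions = {n: rnd / total for n, (total, rnd) in table5.items()}
mean_fraction = sum(fractions.values()) / len(fractions)

for n, frac in fractions.items():
    print(f"N = {n}: random share = {100 * frac:.0f}%")
print(f"average random share = {100 * mean_fraction:.0f}%")
```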

5. Conclusion

The distributions of indoor radon concentrations for houses belonging to the same geological unit in Southern Belgium clearly deviate from log-normality. To some extent, this deviation is related to the way the data under the detection limit are reported. It is also due to the extreme values that are well above the action level.

Despite these facts, we observe that the log-normal distribution describes the bulk of the data reasonably well, and can thus be used to evaluate the most common risk predictor: the percentage of cases above the action level. The results are somewhat improved further by considering only the distribution of “high values”, i.e. values above the median, assuming that the median equals the mean of the log-normal.

When applying this procedure at the local scale for mapping, the average value of the local LSD over the geological unit should be preferred to the logarithmic standard deviation of the local sample itself.

Acknowledgements

The present study is related to a radon mapping project supported by the Federal Agency for Nuclear Control (FANC). We also thank FANC for giving us access to the indoor radon data analysed in the present work.

References

Cinelli G, Tondeur F, Dehandschutter B (2009) Statistical analysis of indoor radon data for the Walloon region (Belgium). Radiat Eff Defects Solids 164(5):307–312.

Cinelli G, Tondeur F, Dehandschutter B (2010) Development of an indoor risk map of the Walloon region of Belgium, integrating geological information. Environmental Earth Sciences, First Online, www.springerlink.com (DOI 10.1007/s12665-010-0568-5).

Dehandschutter B, Noel E, Pépin S, Poffijn A, Sonck M (2009) The application of radon measurements in the radon action plan in Belgium. Annales de l’Association belge de Radioprotection 34(1):89–110.

FANC (2009) Federal Agency for Nuclear Control. Available at: http://www.fanc.fgov.be/fr/page/bienvenue-sur-le-siteradon- de-l-afcn/646.aspx

GSB, Geological Survey of Belgium, http://www.naturalsciences.be/institute/structure/geology/gsb_website/products/geolmaps/cdroms

Hámori K, Tóth E, Pál L, Köteles G, Losonci A, Minda M (2006) Evaluation of indoor radon measurements in Hungary. Journal of Environmental Radioactivity 88:189–198.

ICRP (1990) ICRP publication 60: recommendations of the International Commission on Radiological Protection. Annals of the ICRP 21, pp 1–3.

Miles JCH, Appleton JD (2005) Mapping variation in radon potential both between and within geological units. J Radiol Prot 5:256–276.

Pegoretti S (2007/2008) La Distribuzione del gas Radon Indoor: Analisi con Moderne Tecniche Statistiche. PhD thesis.

STATISTICA 7, StatSoft Software.

SURFER (1999) User’s Golden Software.

Tuia D, Kanevski M (2007) Indoor radon distribution in Switzerland: lognormality and Extreme Value Theory. Journal of Environmental Radioactivity 99:649–657.