
ISPRS Journal of Photogrammetry & Remote Sensing 62 (2007) 43–63, www.elsevier.com/locate/isprsjprs

Data fusion of high-resolution satellite imagery and LiDAR data for automatic building extraction☆

Gunho Sohn ⁎, Ian Dowman

Department of Geomatic Engineering, University College London, Gower Street, London WC1E 6BT, United Kingdom

Received 7 December 2005; received in revised form 17 December 2006; accepted 3 January 2007
Available online 15 February 2007

Abstract

This paper presents a new approach for the automatic extraction of building footprints from a combination of IKONOS imagery with pan-sharpened multi-spectral bands and low-sampled (∼0.1 points/m2) airborne laser scanning data acquired with Optech's 1020 ALTM (Airborne Laser Terrain Mapper). Initially, a laser point cluster in 3D object space was recognized as an isolated building object if all the member points were similarly attributed as building points, based on the height property of the laser points and the normalized difference vegetation indices (NDVI) derived from the IKONOS imagery. As modelling cues, rectilinear lines around building outlines collected in either a data-driven or a model-driven manner were integrated in order to compensate for the weaknesses of both methods. Finally, a full description of the building outlines was accomplished by merging convex polygons, which were obtained as a building region was hierarchically divided by the extracted lines using a Binary Space Partitioning (BSP) tree. The system performance was evaluated with objective evaluation metrics against the Ordnance Survey's MasterMap®. This evaluation showed a delineation performance of up to 0.11 (branching factor), a detection percentage of 90.1% (correctness) and an overall quality of 80.5%.
© 2007 International Society for Photogrammetry and Remote Sensing, Inc. (ISPRS). Published by Elsevier B.V. All rights reserved.

Keywords: Building extraction; LiDAR; IKONOS; Fusion; Binary space partitioning

☆ Significantly enhanced version of a paper published in the proceedings of International Archive of Photogrammetry and Remote Sensing: Sohn, G., 2004. Extraction of buildings from high-resolution data and airborne LiDAR. 35 (B3), pp. 1036–1042.
⁎ Corresponding author. Present address: Department of Earth and Space Science and Engineering, York University, 4000 Keele St., Toronto, Canada, ON M3J 1P3. Tel.: +1 416 650 8011; fax: +1 416 736 5817.

E-mail addresses: [email protected], [email protected] (G. Sohn), [email protected] (I. Dowman).

0924-2716/$ - see front matter © 2007 International Society for Photogrammetry and Remote Sensing, Inc. (ISPRS). Published by Elsevier B.V. All rights reserved.
doi:10.1016/j.isprsjprs.2007.01.001

1. Introduction

One of the ultimate aims in photogrammetry is to generate an urban landscape model (ULM) showing the objects and landcover of an urban area in three dimensions (Dowman, 2000). This task requires object extraction, which detects the object of interest and extracts its geometric boundary from remotely sensed data. A large number of natural and cultural objects can be involved in constructing the ULM. As buildings are the most prominent features in an urban environment, accurate and up-to-date extraction of building objects is significant for urban planning, cartographic mapping, and civilian and military emergency response. Traditionally, building extraction has relied mainly on manual photo-interpretation with the support of DPWs (Digital Photogrammetric Workstations), which remains an expensive process, especially when a large amount of data must be processed. Thus, the full automation of building extraction has been regarded as an active research topic in the photogrammetry and computer vision communities (Ameri, 2000).

1.1. Background

In general, automatic building extraction can be achieved through a hierarchy of processing stages. The first stage starts with extracting cues (e.g., points, lines and regions), which are locally distinguishable geometric or chromatic properties extracted by low-level feature extraction algorithms from view-centered data (building cue extraction). Then, those cues that relate only to a building are found (building detection), and the final stage reconstructs the building boundaries in an object-centered model (building description). All of the tasks mentioned above are equally important for the development of an ideal building extraction system. A great deal of research effort has been made towards solving this problem; however, fully automatic building extraction is still a distant goal. Rather than directly extracting building outlines, some researchers have used existing 2D ground building plans, thus reducing the difficulty of delineating building outlines (Brenner, 2000; Suveg and Vosselman, 2004). There are several reasons for this difficulty, which are discussed in the following sections.

1.1.1. Scene complexity

Most scenes contain very rich information, which provides a large number of cues that share geometric or chromatic similarity with buildings but belong to non-building objects. This makes the building detection task difficult. In earlier studies, heuristic knowledge was used to solve this problem. For instance, it was assumed that building cues can be found in areas with a certain brightness (Park et al., 2000), specific colours (Moons et al., 1998), a pre-defined type of line grouping structure (Jaynes et al., 1994) or nearby shadow (Shufelt and McKeown, 1993). However, there are many situations in complex urban areas where such pre-specified heuristics are not valid. Since building objects are generally located above the terrain with a certain height, 3D information extracted from stereo analysis plays an important role in eliminating erroneous cues. In this context, building cues such as 3D corners (Fischer et al., 1998), 3D lines (Baillard and Zisserman, 2000) and 3D planar polygons (Ameri, 2000) have been investigated for the purpose of building extraction. Alternatively, the complexity of building detection can be greatly reduced by subtracting the terrain, as background information, from a complex scene (Baillard and Maitre, 1999; Weidner and Forstner, 1995). In the light of this, various techniques for extracting terrain surfaces from optical imagery, IfSAR and LiDAR DSMs have recently been reported (Dowman, 2004). There is, however, no unified filtering algorithm that is robust to all landforms (Sithole and Vosselman, 2004).

1.1.2. Incomplete cue extraction

There is always a significant loss of relevant building cues due to occlusion, poor contrast, shadows and disadvantageous image perspective. Under these circumstances, the cues may be highly fragmented or completely missed, which leads to wrong building descriptions. There are two ways to recover those incomplete cues in order to fully describe the building boundary: the model-driven and the data-driven method. The former approach assumes that most building outlines can be reconstructed by fitting a specific building model to the extracted cues and verifying the model with certain evidence (Lin and Nevatia, 1998). The benefit of this method is that it reconstructs the building boundary with minimal use of cues, which is a useful property, especially when sufficient cue extraction is difficult to obtain from the given data sources. However, the method requires additional information for verifying the models, and it is not always easy to collect all the verification cues due to the aforementioned scene complexity. The latter approach performs the building description with minimal use of prior knowledge of specific building structures and aims to delineate polyhedral building shapes. This method requires a very dense collection of building cues from multiple data frames (Baillard and Zisserman, 2000), or from high-quality 3D measurements such as LiDAR data (Vosselman, 1999). The method emphasizes the representation of wide variations in building shape, but may fail where all significant features of a building object cannot be extracted by low-level vision algorithms. As discussed above, both methods have their own pros and cons. It is a challenging research topic to combine the two approaches in an optimal way so that their weaknesses compensate for each other (Brenner, 2004).

1.1.3. Sensor dependency

A building extraction system is highly influenced by the sensor used, because the primary data limit the knowledge that is available for extracting buildings. For instance, a building extraction system using 3D depth information is very different from a system that relies only on monocular imagery. The primary data to support building extraction are available from a variety of sources with different resolutions (e.g., optical airborne, spaceborne, LiDAR and microwave imaging sensors, and cartographic maps). Each data source has its own strengths and weaknesses as the primary cue for building extraction. An important research issue is to fuse different data sources so that the weaknesses of individual cues are compensated for by others (Dowman, 2004). However, it is not a trivial task to determine how to optimally select primary cues from multiple sensors in the fusion process and how to combine them at each stage of building detection and description.

1.2. Data fusion

Recent developments in modern technology have diversified the methods for acquiring the primary data for building extraction. Notable newcomers are the IKONOS satellite and LiDAR. Over the last five years, both sensors have become powerful tools that replace or complement conventional data acquisition techniques. The rapid emergence of IKONOS and LiDAR has created an urgent need for new building extraction techniques. Much research effort has been made to automatically detect and describe buildings from either IKONOS (Haverkamp, 2004; Lee et al., 2003; Sohn and Dowman, 2001) or LiDAR (Vosselman, 1999; Weidner and Forstner, 1995; Maas and Vosselman, 1999). Further investigation is, however, needed to develop a fully operational building extraction system, owing to the weaknesses inherent in each sensor. IKONOS imagery is still many times poorer than aerial imagery in spatial resolving power and often shows weak reflectance around building boundaries. A consequence is a significant loss of geometric cues, which adversely affects the building description. On the other hand, the under-sampling nature of LiDAR data acquisition, due to the small footprint of the laser beam compared with the average point spacing and disadvantageous backscattering from illuminated targets, makes it difficult to extract building edges with height discontinuities.

However, an efficient data fusion procedure can compensate for the weaknesses of both sensors for building extraction (Schenk and Csatho, 2002). This is because they naturally have complementary properties: building boundaries are easier to extract from IKONOS imagery than from LiDAR, and the imagery can provide a variety of information such as intensity, colour and texture, while LiDAR can reliably attribute accurate 3D information to the cues extracted from optical imagery. There have been a few recent studies extracting buildings from a combination of IKONOS imagery and LiDAR data. Complementary fusion principles for extracting building outlines were derived by integrating LiDAR data with multi-spectral IKONOS imagery (Kim and Muller, 2002). Although a satisfactory result was reported by the authors, the exploitation of the synergy of both sensors focused mainly on detecting buildings, and sophisticated techniques for accurate building description were not considered.

Rottensteiner et al. (2005) employed the Dempster–Shafer theory as a data fusion framework for building detection from airborne orthophotos with multi-spectral bands and LiDAR data. Buildings are detected by a per-pixel classification that probabilistically investigates various properties (e.g., colour, height variance and surface roughness) of each pixel, which is later validated in a region-wise manner. Instead of chromatic information, Guo and Yasuoka (2002) combined monochromatic IKONOS imagery with LiDAR data for extracting buildings. After isolating building regions by subtracting the terrain surface from the LiDAR data, an active contour model was fitted to building outlines under the guidance of edges extracted from both the LiDAR data and the IKONOS imagery. However, the reconstructed outlines remain unregularized. Chen et al. (2004) detected step edges from LiDAR data, which were geometrically improved with the support of linear features extracted from airborne imagery with a 10-centimetre ground sampling distance. The refined lines are grouped into closed polygons by a splitting-and-merging algorithm. Although the proposed algorithm has the advantage of implicitly constructing topological relations across the extracted features, the method did not show how to cope with erroneous lines in order to achieve an objective optimality. This problem may become significant, especially when IKONOS imagery or poor-quality LiDAR data are used as the primary data source. This paper presents a new technique for automatic building extraction from multi-spectral IKONOS imagery in combination with low point density LiDAR. Particular attention is given to reconstructing building outlines with global optimization criteria, especially when sufficient rectilinear lines cannot be detected from the primary data sources.

2. Method overview

Fig. 1 illustrates the workflow of the major components and their subsequent processes in the automated building extraction system proposed in this paper. The system requires IKONOS imagery with multi-spectral bands, re-sampled at 1-metre resolution on the ground, and LiDAR data with a horizontal point spacing of about 3 m. It is assumed that both datasets are geo-referenced. The suggested building extraction system consists of two major processes: building detection and building description.

Fig. 1. Schematic workflow of the overall process of the proposed building extraction system.

For building detection, the proposed system follows the focusing strategy suggested by Baillard and Maitre (1999), in which feature classes are hierarchically categorized and a classification algorithm is selected according to the targeted feature class. This allows the subsequent building description process to focus on one building structure at a time. First, on-terrain features and off-terrain features are differentiated according to whether they are located on or above the terrain surface. This classification is made as the terrain surface is reconstructed by a newly developed algorithm, the RTF (Recursive Terrain Fragmentation) filter. Once the terrain information has been extracted, outlying points higher than a pre-defined threshold above the generated terrain surface are further classified as high-rise features. At the finest level, vegetated points are excluded from the high-rise features by computing the normalized difference vegetation index (NDVI) from the IKONOS multi-spectral bands. As a result, the LiDAR points that belong only to building structures are obtained. Further processing allows the individual objects to be bounded with rectangles, which are then fed into the building description process together with the final labels of the LiDAR points.

The next step is to reconstruct building outlines based on the result of the building detection process. This is achieved by a BUS (Building Unit Shape) organization method. The description process aims to delineate the polyhedral shapes of buildings. To this end, a generic building shape is represented as a mosaic of convex polygons, each of which is referred to as a BUS. A rectangle resulting from the building detection is used as an initial BUS. Linear features around the building boundary are extracted from the IKONOS image with the support of the classification result of the LiDAR points. In order to compensate for completely or partly missing data-driven lines, model-driven lines are additionally produced by fitting specific building models to the LiDAR points. Both data-driven and model-driven lines are employed as hyperlines to halve the initial BUS, which leads to two sub-convex polygons. This binary partitioning continues until no child polygon is generated by the hyperlines, and ends with a set of polygons (i.e., BUSes). This recursive generation of convex polygons is implemented by a Binary Space Partitioning (BSP) algorithm, in which a binary tree is incrementally expanded as child polygons are generated by the hyperlines. The final result of the BSP tree is a BUS adjacency graph, where each node represents a BUS and each arc represents the connectivity between neighbouring BUSes. A BUS grouping process merges only the BUSes that belong to building structures and eliminates spurious lines beside the building boundaries. As a result, the building outlines are reconstructed.

3. Test dataset

For the current research a LiDAR DSM was provided by Infoterra Co., covering a sub-site of the Greenwich industrial area with a size of 1.2×1.2 km (∼1,180,073.6 m2). The LiDAR DSM was acquired from the first pulse of Optech's 1020 airborne laser sensor. The data have been converted from OSGB36 (plan) and OSD Newlyn (height) to UTM/WGS84. The LiDAR DSM contains a total of 113,541 points, which approximately corresponds to a point density of 0.1 points/m2, i.e., one point per 3.2×3.2 m2. The height of the study area varies from 0.21 m to 39 m. As can be seen in Fig. 2(a), the Greenwich LiDAR DSM shows a typical urban environment. The height of the terrain in this scene ranges from 0.21 m to 14 m, and the highest terrain can be found in the south-west corner of Fig. 2(a). A number of industrial buildings of different sizes are spread over the study area. The height of the large buildings ranges from 7 m to 39 m. Some low objects with heights of 4 m to 8 m can be found in the Greenwich dataset. In addition, a group of tall trees with various heights is located along the street. A row of terraced houses can be found at the lower left corner of the scene. Fig. 2(a) also shows a very irregular surface over some rooftops, even though they are formed of planar roof surfaces. This problem is highlighted when the Greenwich LiDAR measurements are overlaid with the IKONOS image in Fig. 3(a). Although the point density of the Greenwich data is 0.1 points/m2, it is much lower over some building rooftops and the ground near buildings. This may be caused by factors such as poor weather conditions, high absorption by specific material properties, occlusion effects, the laser scanning pattern and the flying height restriction over London, which limit the laser pulses returned from dark, sloped and shiny surfaces. This problem may affect the building extraction process, as will be discussed later.

Fig. 2. Greenwich test datasets.

Over the Greenwich study area, a pan-sharpened multi-spectral (PSM) IKONOS image was also provided by Infoterra Co. The PSM image is produced by combining the multi-spectral data with the panchromatic data, and re-sampled to a 1-metre ground pixel. This IKONOS PSM image was orthorectified by Space Imaging Co. to satisfy the positional accuracy (∼1.9 m) of Space Imaging's Precision Product. To meet these accuracy levels, Space Imaging uses high-precision ground control and precise terrain models to create Precision products. Fig. 2(b) shows the Greenwich IKONOS PSM image, in which the red channel is replaced by the near-infrared channel and the green channel by the red channel.

4. Building detection

This section presents a building detection method that localizes individual buildings by sequentially removing dominant urban features that are not relevant to buildings. The success of building detection is important in keeping the subsequent building description process focused on individual building objects.

Fig. 3. Sequential results of the building detection process; white crosses mark LiDAR points; white rectangles in (f) mark building blob detection results.


4.1. Coarsest feature classification

The first step of the building detection process is to differentiate on-terrain points from off-terrain points in the Greenwich LiDAR data. To this end, a LiDAR filter, called the recursive terrain fragmentation (RTF) filter, was developed to classify on-terrain points from a cloud of LiDAR points. This filter assumes that a generic terrain surface is a set of piecewise planar surfaces, and a plane terrain model (PTM) is defined as an elementary model for reconstructing a mosaic of planar terrain surfaces from the LiDAR measurements. The final terrain surface is reconstructed in an iterative hypothesis–test scheme. The entire LiDAR space is initially hypothesized as a single PTM. If the hypothesized area does not satisfy the conditions of the PTM, the initial terrain model is fragmented into smaller sub-regions by selecting a terrain point from the underlying LiDAR points. Since a number of LiDAR points could be considered as the terrain point, an optimum solution of the terrain fragmentation is achieved by the MDL (Minimum Description Length) principle. This fragmentation process continues until all the fragmented regions are verified as PTMs. At the end of the process, every LiDAR point is labelled as either a terrain or a non-terrain point. Since the development of the RTF filter is beyond the scope of this paper, details of this technique are not described here, but can be found in Sohn and Dowman (2002) and Sithole and Vosselman (2004). Fig. 3(b) shows the on-terrain points detected by the RTF filter from Fig. 2(a). In this figure, some terrain segments that are not densely covered by the filtered on-terrain points reflect the poor quality of the Greenwich LiDAR DSM. Fig. 4 shows the terrain reconstructed by the RTF filter from the Greenwich LiDAR DSM. As shown in the figure, the RTF filter successfully removed non-terrain objects from the DTM, despite the fact that the underlying terrain surface consists of different slopes and contains building objects of a range of sizes.
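The RTF filter itself is detailed in Sohn and Dowman (2002); purely to make the hypothesize-and-test recursion described above concrete, the Python sketch below accepts a region when a single plane fits its points within a tolerance and otherwise fragments it around a seed point. The plane-fit tolerance, the quadrant-style fragmentation and the lowest-point seed are simplifying stand-ins for the paper's PTM conditions and MDL-based point selection, not the published criteria.

```python
import numpy as np

def plane_residuals(points):
    """Residuals of a least-squares plane z = a*x + b*y + c fitted to an (N, 3) array."""
    A = np.c_[points[:, 0], points[:, 1], np.ones(len(points))]
    coef, *_ = np.linalg.lstsq(A, points[:, 2], rcond=None)
    return points[:, 2] - A @ coef

def rtf_like_filter(points, tol=0.5, min_pts=4, depth=0, max_depth=12):
    """Control-flow sketch of the recursive hypothesis-test scheme: return a
    boolean on-terrain mask for an (N, 3) LiDAR array. A region is accepted as
    a plane terrain model (PTM) when one plane fits it within `tol` metres
    (placeholder test); otherwise it is fragmented around a seed point
    (placeholder for the MDL-based point selection) and the test recurses."""
    if len(points) < min_pts or depth >= max_depth:
        return np.ones(len(points), dtype=bool)        # too small to fragment further
    if np.max(np.abs(plane_residuals(points))) < tol:   # hypothesized PTM accepted
        return np.ones(len(points), dtype=bool)
    seed = points[np.argmin(points[:, 2])]               # stand-in terrain seed: lowest point
    mask = np.zeros(len(points), dtype=bool)
    left, lower = points[:, 0] < seed[0], points[:, 1] < seed[1]
    for quad in (left & lower, left & ~lower, ~left & lower, ~left & ~lower):
        if quad.any():
            mask[quad] = rtf_like_filter(points[quad], tol, min_pts, depth + 1, max_depth)
    return mask
```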

4.2. Intermediate feature classification

Fig. 4. Greenwich terrain surface reconstructed by the RTF filter.

Once either an on-terrain or an off-terrain label has been assigned to every LiDAR point, the off-terrain points are re-classified into more detailed feature classes, i.e., the low-rise or high-rise feature class. This intermediate level of feature classification is made by comparing the height attributes of off-terrain points with a pre-specified height threshold. Fig. 5(a) shows a height histogram of the result of the coarsest feature classification. As can be seen in this figure, the height histogram of on-terrain points shows a single dominant peak, suggesting that the underlying terrain surfaces of the Greenwich dataset were successfully detected by the RTF filter. It is, however, difficult to distinguish the two classes (i.e., low-rise and high-rise) from the height histogram of off-terrain points, since their heights are corrupted by the terrain surface. This problem can be solved by subtracting the height of the underlying terrain surface from that of the off-terrain points. As shown in Fig. 5(b), it becomes easier to distinguish between the low-rise and the high-rise class when the heights of off-terrain points are normalized by the terrain surface. Although the height threshold used to classify off-terrain points into either the low-rise or the high-rise class was rather arbitrarily selected as 4 m in this study, Fig. 5(b) shows that the selection of this height criterion could be automated by a statistical analysis of the normalized height histogram. Fig. 3(c) and (d) shows the final results of the intermediate level of feature classification, in which the classified low-rise and high-rise points are overlaid on the IKONOS imagery.
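As an illustration of this step, the classification reduces to interpolating the reconstructed terrain surface at each off-terrain point, subtracting it from the point height and thresholding at 4 m. The sketch below assumes the terrain and off-terrain points are available as NumPy arrays; the function name and the linear DTM interpolation are illustrative choices, not part of the published method.

```python
import numpy as np
from scipy.interpolate import LinearNDInterpolator

def split_off_terrain(off_terrain_pts, terrain_pts, height_threshold=4.0):
    """Label off-terrain LiDAR points (N, 3 arrays of x, y, z) as 'low-rise' or
    'high-rise' by normalising their heights against the reconstructed terrain
    surface and thresholding at 4 m (Section 4.2)."""
    # Linear DTM interpolated from the on-terrain points (illustrative choice).
    dtm = LinearNDInterpolator(terrain_pts[:, :2], terrain_pts[:, 2],
                               fill_value=float(np.median(terrain_pts[:, 2])))
    normalised_height = off_terrain_pts[:, 2] - dtm(off_terrain_pts[:, :2])
    return np.where(normalised_height >= height_threshold, "high-rise", "low-rise")
```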

4.3. Finest feature classification

Fig. 5. Height histograms of the coarsest and intermediate feature classification; (a) on-terrain features (solid line) and off-terrain features (dashed line); (b) low-rise features (solid line) and high-rise features (dashed line).

The high-rise feature class is generally a mixture of tree and building objects. For building detection, the trees must be excluded from the high-rise features, by which the isolation of individual buildings can be achieved. The discrimination between buildings and trees is accomplished through analysis of the normalized difference vegetation index (NDVI) derived from the IKONOS multi-spectral information. The NDVI is an established vegetation index widely used in thematic mapping applications; it considers the combined reflectance in the red and near-infrared bands and is defined as NDVI = (NIR − R) / (NIR + R), where NIR and R represent data from the IKONOS near-infrared and red channels respectively. Since vegetated areas generally produce higher reflectance in the near-infrared channel than in the visible red channel, higher NDVI values indicate a larger amount of green vegetation in the pixel. Once the NDVI map has been calculated (Fig. 6(a)), the high-rise features are divided into the building and tree classes using a global threshold on the NDVI map. This method assumes that the NDVI distribution of the high-rise feature class is clearly bimodal, with one mode for buildings and one for trees. By back-projecting the high-rise points into IKONOS image space and incorporating the NDVI information, the NDVI for each high-rise point is calculated from a small mask constructed around the back-projected point; the tree label is assigned to a high-rise point if any pixel within the mask has an NDVI value larger than a pre-specified threshold, δNDVI; otherwise, the building label is assigned. Finally, vegetated high-rise points are removed, while building-label points are retained. Fig. 6(b) shows the binary image obtained when the NDVI map is thresholded with δNDVI = 0.8. Fig. 3(e) displays the building-label points detected when a mask size of 5×5 was used over Fig. 6(b).
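The NDVI test above maps directly to a few lines of code: compute the per-pixel index from the near-infrared and red bands, then label each back-projected high-rise point as a tree if any pixel in a 5×5 window around it exceeds δNDVI = 0.8. The sketch below assumes the back-projection to image (row, column) coordinates has already been done; the function names are illustrative.

```python
import numpy as np

def ndvi_map(nir, red):
    """Per-pixel NDVI = (NIR - R) / (NIR + R) from the IKONOS bands."""
    nir, red = nir.astype(float), red.astype(float)
    return (nir - red) / np.maximum(nir + red, 1e-6)

def label_high_rise_points(pixel_coords, ndvi, ndvi_threshold=0.8, mask_size=5):
    """Assign 'tree' or 'building' to each high-rise point already back-projected
    to image (row, col) coordinates: 'tree' if any pixel in the mask around the
    point exceeds the NDVI threshold, otherwise 'building' (Section 4.3)."""
    half = mask_size // 2
    rows, cols = ndvi.shape
    labels = []
    for r, c in pixel_coords:
        window = ndvi[max(r - half, 0):min(r + half + 1, rows),
                      max(c - half, 0):min(c + half + 1, cols)]
        labels.append("tree" if (window > ndvi_threshold).any() else "building")
    return labels
```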

4.4. Building blob detection

Isolating the building-label points and forming them into individual building objects is rather straightforward. Once tree points have been differentiated from building points by the NDVI method described in the previous section, the points classified as on-terrain, low-rise and tree objects are all assigned non-building labels. Building points surrounded by non-building labels are then grouped into isolated objects (called building blobs) by a connectivity analysis in a TIN (Triangulated Irregular Network), modified from the algorithm suggested by Lumia et al. (1983). As a result, 170 building blobs were found in Fig. 3(f) after removing small blobs with fewer than 30 member points. Further processing allows the individual building blobs to be bounded with rectangular polygons, and these polygons are then fed into the building description process, which is discussed in the next section.
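A minimal version of this grouping step is sketched below: building-labelled points connected by TIN (Delaunay) edges are merged with a union-find structure, blobs with fewer than 30 points are dropped, and each surviving blob is returned as a bounding rectangle. The axis-aligned rectangles and the helper names are simplifications for illustration; the paper's connectivity analysis follows Lumia et al. (1983).

```python
import numpy as np
from scipy.spatial import Delaunay

def building_blobs(points_xy, is_building, min_points=30):
    """Group building-labelled points into blobs by connectivity in a 2D TIN
    (Delaunay triangulation), drop blobs smaller than `min_points`, and return
    an axis-aligned bounding rectangle (xmin, ymin, xmax, ymax) per blob."""
    tri = Delaunay(points_xy)
    parent = np.arange(len(points_xy))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path compression
            i = parent[i]
        return i

    # Union every TIN edge whose two endpoints both carry the building label.
    for s in tri.simplices:
        for a, b in ((s[0], s[1]), (s[1], s[2]), (s[0], s[2])):
            if is_building[a] and is_building[b]:
                parent[find(a)] = find(b)

    blobs = {}
    for i in np.flatnonzero(is_building):
        blobs.setdefault(find(i), []).append(i)

    rectangles = []
    for members in blobs.values():
        if len(members) >= min_points:
            xy = points_xy[members]
            rectangles.append((xy[:, 0].min(), xy[:, 1].min(),
                               xy[:, 0].max(), xy[:, 1].max()))
    return rectangles
```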

5. Building description

This section describes a building description method that extracts building outlines based on the result of the building detection process presented in the previous section. The BUS organization is introduced as a new middle-level feature grouping method. IKONOS imagery and LiDAR points play complementary roles in extracting lines around the building boundary. The lines extracted from both sensors then recursively partition the rectangles bounding the building blobs into convex polygons until certain termination criteria are met. Finally, building outlines are reconstructed by merging the polygons that are verified as part of the building structure.

Fig. 6. Results of the finest feature classification.


5.1. Data-driven line cue generation

As a primary cue for building extraction, straight lines are extracted from the IKONOS imagery by the Burns algorithm (Burns et al., 1986). However, since the extracted line features include a number of extraneous line segments uncorrelated with building saliencies, it is necessary to filter out those distracting features so that only focused lines of significant length located around building boundaries remain. To this end, the non-building (including on-terrain, low-rise and tree) and building points labelled by the previous classification process are used to determine whether or not a line primitive can be considered a boundary line.

First, the straight lines extracted by the Burns algorithm are filtered by a length criterion, by which only lines longer than a pre-specified length threshold, ld, remain for further processing. Then, two rectangular boxes with a certain width, lw, are generated along the two directions orthogonal to each length-filtered line vector. A line is accepted as a boundary line if non-building and building points are simultaneously found in the two boxes, or if only building-label points are found in one of the boxes and no LiDAR point is found in the other. The latter condition is considered when a low-density LiDAR dataset is used.

As a final line filtering process, geometric disturbances caused by noise are regularized over the boundary lines. A set of dominant line angles of the boundary lines is analysed from a gradient-weighted histogram quantized into 255 discrete angular units. In order to separate a weak but significant peak from other nearby dominant angles, a hierarchical histogram-clustering method is applied. Once the dominant angle, θd, is obtained, lines with angle discrepancies of less than a certain angle threshold, θth, from θd are found. Their line geometries are then modified by replacing their angles with θd. These modified lines do not contribute to the succeeding dominant angle analysis, and the next dominant angle is obtained. In this way, a set of dominant angles is obtained, by which the geometric properties of the boundary lines are regularized. Fig. 7 shows boundary lines extracted with the support of the labelled LiDAR points, after being filtered in length (ld = 5 m) and geometrically regularized with an angle threshold of θth = 30°, where a box width lw of 5 m was chosen for boundary line verification. The result shows that the boundary lines can be successfully extracted as intensity line cues. Owing to the deficiency of LiDAR measurements on the ground near building boundaries in the Greenwich dataset, distracting lines located on the ground are also detected as intensity line cues. These lines will, however, be excluded during an upper level of cue generation, referred to as polygon cue generation, which will be described in Section 5.3.
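A simplified form of this dominant-angle regularization can be written as an iterative histogram-peak-and-snap loop, sketched below. The weights, the 255-bin quantization and the 30° snapping threshold follow the description above, but the hierarchical histogram clustering used in the paper is replaced here by a plain strongest-peak search; function and parameter names are illustrative.

```python
import numpy as np

def regularize_line_angles(angles_deg, weights, angle_threshold=30.0, n_bins=255):
    """Iteratively take the strongest peak of a weighted angle histogram and snap
    every line within `angle_threshold` degrees of it to that dominant angle
    (simplified version of the regularization in Section 5.1)."""
    angles = np.asarray(angles_deg, dtype=float) % 180.0   # line orientation modulo 180 degrees
    weights = np.asarray(weights, dtype=float)
    out = angles.copy()
    remaining = np.ones(len(angles), dtype=bool)
    bin_edges = np.linspace(0.0, 180.0, n_bins + 1)
    while remaining.any():
        hist, _ = np.histogram(angles[remaining], bins=bin_edges,
                               weights=weights[remaining])
        k = int(hist.argmax())
        dominant = 0.5 * (bin_edges[k] + bin_edges[k + 1])
        diff = np.abs(angles - dominant)
        diff = np.minimum(diff, 180.0 - diff)
        snap = remaining & (diff <= angle_threshold)
        if not snap.any():
            break                                           # nothing left near a usable peak
        out[snap] = dominant
        remaining &= ~snap
    return out
```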

5.2. Model-driven line cue generation

Although a satisfactory result in extracting boundary lines can be obtained by the method presented in the previous section, the line cues produced in a data-driven way do not always provide sufficient cue density to cover all parts of the building edges. This is because significant boundary lines may be missed due to low contrast, shadow overcast and occlusion effects, especially when 1-metre satellite imagery of a complex scene is used alone. However, LiDAR data are less affected by these disadvantageous factors than optical imagery. Thus, new lines are virtually extracted from LiDAR space in order to compensate for the lack of data-driven line density by employing specific building models. It is assumed that building outlines are composed of parallel lines. Based on this, for each data-driven line, parallel lines and “U”-structured lines are inferred from LiDAR space. The process aims to acquire at most three model-driven lines starting from a data-driven line, but it may fail to generate any model-driven line. The main idea of model-driven line detection is that a small virtual box is generated from each data-driven line and grown over the building roof so that building points are maximally captured without including any non-building point.

Fig. 7. Data-driven line cue extraction and verification from the IKONOS image; (a) straight line extraction; (b) boundary line verification (white dots and black dots represent non-building and building points respectively); (c) filtered boundary lines.

First, a box growing direction, pointing towards the location of the parallel boundary line, is determined. To this end, a small virtual box with a width of lw is generated from the selected data-driven line, in the same way as for detecting boundary lines in Section 5.1. The virtual box grows in that direction until it comes across any on-terrain point (Fig. 8(a)). It then shrinks so as to contain the maximum number of building points within its minimum size (Fig. 8(b)). The virtual box is then expanded again, but this time in the two orthogonal directions parallel to the detected boundary line (Fig. 8(c)). Thus, the “U”-structured boundary lines formed with the parallel boundary line can be detected. Finally, the three detected model-driven lines are back-projected onto image space, and their line geometry is adjusted by a gradient-weighted least-squares method.

Fig. 8. Illustration of model-driven line cue generation.

When lw is chosen as 5 m, the procedure for detecting three model-driven lines from one data-driven line is illustrated in Fig. 9(a)–(c). Fig. 9(d) shows all the model-driven lines detected when the remaining data-driven line cues are involved. Although the assumption of geometric regularity, i.e., parallel or “U”-structured outlines, is made when producing the model-driven lines, this assumption may not be valid when a building shape has no symmetric property or when erroneous lines are included among the data-driven lines. For this reason, only a portion of the model-driven lines should be used to recover significant boundary segments missed in the data-driven line generation. What percentage of model-driven line cues can be used depends, however, on the degree of complexity of the individual buildings and on the false alarm rate of the boundary line detection. Therefore, a verification process for model-driven lines is necessary. As described in the following section, whether or not a model-driven line should be used in the boundary representation of the building shape is determined automatically during polygonal cue generation.

Fig. 9. Results of model-driven line cue generation; white lines represent detected model-driven lines, the white arrow points to the virtual box growing direction and white crosses represent non-building points; (a) parallel line detection; (b) “U”-structure line detection; (c) gradient-fitted virtual lines; (d) all model-driven lines detected.


5.3. Edge-constrained data interpolation

Once the data-driven and model-driven lines have been extracted from the IKONOS imagery and the LiDAR data, both sets of lines are used to refine the low and irregular density of the LiDAR data. As discussed in Section 3, the Greenwich LiDAR data acquired by Optech's ALTM 1020 system provide approximately 0.1 points/m2. The major problem of the Greenwich data is that the laser points were acquired with extremely irregular point density. This irregularity can be seen when the average distance of each point from its connected points in the TIN is measured: the laser points are distributed with an average spacing of 3.5 m and a standard deviation σ of 0.88 m. However, 9.6% of the total 113,684 points have a point spacing larger than 4.38 m (the 1σ level), with a maximum of up to 18 m (see Fig. 10(a)). In particular, points with such extreme irregularity are mainly found over roads, parking lots and building rooftops with dark surfaces, where the transmitted laser pulses were weakly reflected and many laser returns were missed.

As we will discuss in the following sections, the spatial distribution of laser points is the major factor affecting the final reconstruction results of the suggested methods. Such irregularity of the laser point density degrades the overall building description performance, leading to wrong building shapes. To prevent this problem, it is necessary to interpolate the Greenwich LiDAR data to obtain a denser and more regular point density. The interpolation process in the current experiment simply increases the number of LiDAR points by adding new laser points where the triangles in the TIN constructed from the LiDAR data are larger than a certain threshold. However, the construction of the TIN is constrained by both the data-driven and the model-driven lines, so that the interpolation of laser points over the rooftops is limited to the inside of the building outlines, while terrain points are added outside the buildings only.

All the building boundary lines are detected by extracting the data-driven and model-driven lines as presented in the previous sections. Then, starting from the centre point of each line, virtual points are created at equal distances of 2 m towards both the starting and ending points of the line. The original LiDAR points are integrated with the newly generated virtual points in a TIN structure. If a triangle of the TIN comprises on-terrain points only, or a mixture of on-terrain and virtual points, the triangle is classified as an “on-triangle”. Similarly, the triangle is classified as an “off-triangle” if its member points are off-terrain points only, or a mixture of off-terrain and virtual points. For both “on-” and “off-triangles”, if any lateral length of the triangle is larger than a certain threshold (5 m), a new laser point is created at the centre of the triangle and its height is set to the average height of the member points. In this way, the original laser points were increased to 229,866 points and the average point spacing was reduced to 2.5±0.55 m (Fig. 10(b)). This suggests that the edge-constrained interpolation method successfully made the Greenwich dataset denser and more regular. However, 10% of the data still shows irregular point spacing (larger than 3.5 m). This is because the extreme irregularity of the original dataset caused some boundary lines to be missed, so that laser points could not be interpolated there.
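One densification pass of this kind can be sketched as follows: build a TIN and, for every triangle with an edge longer than 5 m, insert a point at the centroid with the mean height of the triangle's vertices. The sketch assumes the virtual edge points have already been added along the boundary lines, and it simplifies the paper's on-/off-triangle rule to inheriting the class of the triangle's vertices; it is an illustration, not the exact implementation.

```python
import numpy as np
from scipy.spatial import Delaunay

def densify_once(points, is_off_terrain, max_edge=5.0):
    """One densification pass: build a TIN and, for every triangle with an edge
    longer than `max_edge`, add a point at the triangle centroid with the mean
    height of its vertices. The edge constraint (virtual points every 2 m along
    the boundary lines) is assumed to have been applied before calling this."""
    tri = Delaunay(points[:, :2])
    new_pts, new_off = [], []
    for simplex in tri.simplices:
        v = points[simplex]
        edge_lengths = np.linalg.norm(v[[0, 1, 2], :2] - v[[1, 2, 0], :2], axis=1)
        if edge_lengths.max() > max_edge:
            new_pts.append(v.mean(axis=0))                        # centroid with averaged height
            new_off.append(bool(is_off_terrain[simplex].any()))   # simplified class inheritance
    if not new_pts:
        return points, is_off_terrain
    return (np.vstack([points, np.asarray(new_pts)]),
            np.concatenate([is_off_terrain, np.asarray(new_off)]))
```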

5.4. BUS generation

For building description, a major difficulty lies in constructing topological relations between linear features. This can be accomplished by relying on hard constraints, which explicitly limit the number of neighbouring lines and the grouping rules (Kim and Muller, 1999). However, how to determine those constraints is always subjective and depends on various factors, including the sensor type used, image quality, and scene and object complexity. A better idea is to construct the topological feature relations implicitly, which can accommodate higher degrees of shape complexity.

Fig. 10. Edge-constrained interpolation results.

5.4.1. BUS partitioning

There are two ways to extract building outlines: constructing topological relations from extracted lines (edge-based approach), or detecting outlines from segmented regions (area-based approach). The former method is better at describing a building shape with rectilinear lines, but is sensitive to the quality of the extracted lines. The latter approach always provides a full description of the scene, but produces building outlines that are ragged, have no regular geometry and do not coincide with real edges. The feature grouping method suggested in this section combines the advantages of both approaches and segments the whole scene by the extracted lines. In this way, the boundaries detected between neighbouring regions coincide with real discontinuities, and topological relations between lines are implicitly constructed as similar regions are merged.

The aforementioned hybrid building description method is accomplished by the BSP tree introduced by Fuchs et al. (1980). The Binary Space Partitioning (BSP) tree is a hierarchical subdivision of n-dimensional space into homogeneous regions using (n−1)-dimensional hyperplanes. In the 2D case, the subdivision is created using arbitrarily oriented lines, known as hyperlines. The BSP tree was initially developed for computer graphics applications to represent polyhedra in 3-dimensional space and to perform set operations upon them. Currently the BSP tree is widely used in applications ranging from motion planning and shadow generation to image representation and image compression (Radha et al., 1991). However, research on object description based on the BSP tree is rare. Sohn and Dowman (2001) explored the feasibility of the BSP tree as a tool for building description, aiming to extract building outlines from monocular IKONOS imagery. In this study, the BSP algorithm is modified with a different strategy that considers the contribution of LiDAR data.

In the framework of the BUS organization, a region surrounded by lines is referred to as a BUS, and the rectangles resulting from the building detection (Fig. 3(f)) are a special type of BUS, representing the initial shapes of buildings. The BUS partitioning based on the BSP tree decomposes the initial BUS into a set of convex polygons using both the data-driven and the model-driven lines. Fig. 11 illustrates the overall partitioning scheme for generating the BUS space. Suppose that we have an initial BUS bounding a building object, P0. A line list is prepared by integrating N lines of the data-driven and model-driven line cues, {li: i=1,…, N} (thick lines in Fig. 11). Also, N hyperlines, {hi: i=1,…, N}, are computed as P0 is intersected by each line segment from the line list; hi is defined by an expression of the form hi = a + bx + cy, where a, b, c are line parameters (thin lines in Fig. 11). When hi is selected as a hyperline to subdivide P0, two sub-convex polygons, P1+ and P1−, are generated, depending on whether the dot product of each vertex point of P0 with hi is positive (hi > 0) or negative (hi < 0). All vertices comprising P0 are stored as the root node of a binary tree. As an internal node, the root node holds a hyperline and all vertices comprising P0 (Fig. 11(a)). The two sub-polygons are represented by children of the node (Fig. 11(b)). A leaf node, in contrast to an internal node, holds no hyperline and corresponds to an unpartitioned polygon. This process is repeated recursively until no leaf node can be partitioned further. A different sequence of hyperlines will produce different partitioning results.

Fig. 11. BSP tree construction; in the BSP tree, black circles and white circles represent “closed” and “open” polygons respectively.

Thus, determining an optimal sequence of hyperlines and eliminating spurious hyperlines governs the success of our building extraction system. This partitioning optimality is achieved by a hypothesize-and-test framework in a hierarchical manner. In the following two sections, we discuss the triggering and terminating conditions for partitioning a given polygon, and an optimality criterion for selecting the hyperline that produces the best partitioning result.
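The core geometric operation of the partitioning, splitting a convex polygon by a hyperline h = a + bx + cy into its positive and negative parts, can be implemented in a few lines. The sketch below is a generic convex-polygon split (function name and tolerance are illustrative), followed by a small usage example.

```python
def split_convex_polygon(polygon, a, b, c, eps=1e-9):
    """Split a convex polygon (ordered list of (x, y) vertices) by the hyperline
    h(x, y) = a + b*x + c*y into its positive and negative parts. Either part
    may come back empty if the hyperline misses the polygon."""
    positive, negative = [], []
    n = len(polygon)
    for i in range(n):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
        h1, h2 = a + b * x1 + c * y1, a + b * x2 + c * y2
        if h1 >= -eps:
            positive.append((x1, y1))
        if h1 <= eps:
            negative.append((x1, y1))
        if (h1 > eps and h2 < -eps) or (h1 < -eps and h2 > eps):
            t = h1 / (h1 - h2)                       # edge crosses the hyperline
            p = (x1 + t * (x2 - x1), y1 + t * (y2 - y1))
            positive.append(p)
            negative.append(p)
    return positive, negative

# Example: halve the unit square with the line x = 0.5, i.e. h = -0.5 + 1*x + 0*y.
right_half, left_half = split_convex_polygon([(0, 0), (1, 0), (1, 1), (0, 1)],
                                             a=-0.5, b=1.0, c=0.0)
```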


5.4.2. BUS classification

Polygon classification is the process of determining whether or not the partitioning process is triggered over a given polygon, Pi. A polygon Pi is classified into one of four polygon classes: “empty”, “open”, “closed” and “garbage”. These classes are determined by the labelling attributes of the member points of Pi or by the geometric properties of Pi, as follows:

• “Empty” polygon: Pi is classified as an “empty” polygon if there is no member point within Pi (Fig. 12(a)).

• “Open” polygon: Pi is classified as an “open” polygon if the member points of Pi are attributed to both building and non-building labels (Fig. 12(b)).

• “Closed” polygon: Pi is classified as a “closed” polygon if the member points of Pi are attributed only to the building label (Fig. 12(c)).

• “Garbage” polygon: Pi is classified as a “garbage” polygon if Pi would be classified as an “open” polygon, but any lateral length of Pi or the area of Pi is less than a certain threshold, lth or ath respectively (Fig. 12(d)).

The triggering and terminating conditions of the partitioning process over Pi are determined by the polygon classification result. Basically, Pi is partitioned further if it is classified as an “open” polygon, whereas partitioning of Pi is rejected if it is classified as a “closed”, “empty” or “garbage” polygon. The “garbage” polygon is also considered a terminating condition in order to prevent the method from partitioning a polygon into sub-polygons that are too small (over-partitioning) and can be ignored as parts of the building.
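For illustration, the four polygon classes can be decided with the simple rule below, using the labels of the member points and the polygon's lateral lengths and area; the 3 m and 50 m2 thresholds are the values quoted later in Section 6.1, and the extra case of a polygon containing only non-building points (not named in the text) is flagged as an assumption.

```python
def classify_polygon(member_labels, lateral_lengths, area,
                     min_lateral=3.0, min_area=50.0):
    """Classify a BUS polygon as 'empty', 'closed', 'open' or 'garbage' from the
    labels of the LiDAR points inside it and its geometry (Section 5.4.2)."""
    if len(member_labels) == 0:
        return "empty"
    has_building = any(l == "building" for l in member_labels)
    has_non_building = any(l != "building" for l in member_labels)
    if has_building and has_non_building:
        # Mixed labels: "open", unless the polygon is too small to matter.
        if min(lateral_lengths) < min_lateral or area < min_area:
            return "garbage"
        return "open"
    if has_building:
        return "closed"
    # Only non-building points inside: the paper does not name this case;
    # here it simply terminates partitioning as well (assumption).
    return "non-building"
```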

5.4.3. BUS scoring

Once the partitioning of a given Pi has been decided through the polygon classification, we need to select the hyperline that actually partitions Pi from all the hyperline candidates generated in Sections 5.1 and 5.2. Thus, a hyperline selection criterion is needed that provides an optimal partitioning result over Pi.

There are usually many erroneous hyperlines inherited from their formation. For instance, blunders in data-driven lines are caused when the quality of the building detection is low, and undesirable model-driven lines may be produced if the targeted building object has no particular geometric symmetry. Those distracting hyperlines must be eliminated during the partitioning process. This optimality is achieved by a hypothesize-and-test scheme with a partition scoring function. All the hyperlines are tested to partition Pi, and the partitioning result generated by each hyperline is evaluated by the partition scoring function. The hyperline, h⁎, with the highest partitioning score is finally selected to partition Pi.

Fig. 11(b) shows the procedure for obtaining the best partition of the initial polygon, P0. A hyperline candidate, hi, is selected from the hyperline list, by which P0 is divided into the positive and negative planes, P1+ and P1−. Then, a partition scoring function, H, evaluates the partitioning results over P1+ and P1− generated by hi, and determines the final score for hi as the maximum of the two. This scoring function, H, can be described as follows:

H(P0, hi) = max(H(P1+, hi), H(P1−, hi))    (1)

Basically, H in Eq. (1) assigns a maximum score to hi if it produces the best partitioning result, and a minimum score for the worst partitioning result. That is, if P0 is classified as an “open” polygon, H computes partitioning scores according to the ratio of the distribution of building and non-building labels over P1+ and P1−. It aims to assign a higher partitioning score when a “closed” polygon with a larger area is produced by hi (Fig. 13). The partition scoring function, H, for an “open” polygon can be described by

H(P1+, hi) = (1/2) [Nbld(P1+, hi) / Nbld(P0, hi) + Nnon-bld(P1−, hi) / Nnon-bld(P0, hi)]

H(P1−, hi) = (1/2) [Nbld(P1−, hi) / Nbld(P0, hi) + Nnon-bld(P1+, hi) / Nnon-bld(P0, hi)]    (2)

where Nbld and Nnon-bld are functions that count the numbers of building labels and non-building labels belonging to the corresponding polygon.

Once the partitioning scores for hi have been computed by Eqs. (1) and (2), the remaining hyperlines are sequentially selected from the hyperline list and their partitioning scores are also measured by H. As can be seen in Fig. 11(b), the hyperline, h⁎, with the maximum partitioning score is finally selected to partition P0 by

h⁎ = arg max_{hi} H(P0, hi)    (3)

Then, all the vertices of P0 and h⁎ are stored as the root node of the BSP tree, which is expanded as new child nodes with the vertices of P1+ and P1− are added to the root node for further recursive partitioning. The same method used for the partition of P0 is applied to P1+ and P1− respectively, but only to “open” polygons. In Fig. 11(b), P1− is classified as a “closed” polygon and P1+ as an “open” polygon. Thus, only the partitioning process over P1+ is triggered, and the partition of P1+ by a hyperline, h2, is selected as the “best” result. The BSP tree of Fig. 11(b) is expanded with the new child polygons, P2+ and P2−, and the selected hyperline, h2 (Fig. 11(c)). This process continues until no leaf node of the BSP tree can be partitioned by the hyperlines (Fig. 11(d) and (e)).

Fig. 12. Illustration of polygon classification; white and dark dots represent non-building and building labels respectively.
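Eqs. (1)–(3) translate directly into a small scoring routine: count building and non-building labels in the parent polygon and in the two child polygons produced by each candidate hyperline, score both children, take the maximum, and keep the hyperline with the highest score. The sketch below assumes the point labels per polygon are already available; function names are illustrative.

```python
def count_labels(member_labels):
    """Return (number of building-label points, number of non-building points)."""
    n_bld = sum(1 for l in member_labels if l == "building")
    return n_bld, len(member_labels) - n_bld

def partition_score(parent_labels, pos_labels, neg_labels):
    """Eqs. (1)-(2): score one candidate hyperline from the labels of the points
    inside the parent polygon P0 and its two children P1+ and P1-."""
    nb0, nn0 = count_labels(parent_labels)
    nb_pos, nn_pos = count_labels(pos_labels)
    nb_neg, nn_neg = count_labels(neg_labels)
    if nb0 == 0 or nn0 == 0:
        return 0.0                                   # guard: only defined for "open" parents
    h_pos = 0.5 * (nb_pos / nb0 + nn_neg / nn0)
    h_neg = 0.5 * (nb_neg / nb0 + nn_pos / nn0)
    return max(h_pos, h_neg)                         # Eq. (1)

def select_hyperline(parent_labels, candidate_partitions):
    """Eq. (3): pick the hyperline whose partition of P0 scores highest.
    `candidate_partitions` maps each hyperline to its (pos_labels, neg_labels)."""
    return max(candidate_partitions,
               key=lambda h: partition_score(parent_labels, *candidate_partitions[h]))
```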

5.5. BUS grouping

Fig. 13. Illustration of partition scoring; Pi is a parent convex polygon to be partitioned and hi is a hyperline to partition Pi; white and dark dots represent non-building and building labels respectively.

Fig. 14(a)–(c) shows an example of how the initial BUS is segmented into a set of convex polygons by the partitioning described in the previous section. After the process finishes expanding the BSP tree, the final leaves of the BSP tree (i.e., the BUSes) are collected. Then, a BUS adjacency graph is computed from the BUSes, in which each node represents a BUS and each arc represents the connectivity between neighbouring BUSes. The BUS grouping process merges only those nodes in the adjacency graph that are verified as “building” polygons. In principle, all “closed” polygons, which include only building points as their member points, are considered “building” polygons. In addition, an “open” polygon that contains building labels with a significant ratio (ρth) is also classified as a “building” polygon, even though it includes non-building labels. This heuristic grouping rule for an “open” polygon can be described as follows:

∀Pi = “open” polygon: Pi ⇒ “building” polygon, if ρpt(Pi) = Nbld(Pi) / Nmem(Pi) > ρth    (4)

where ρpt is the ratio of building-label points to the total number of member points of Pi, and Nbld and Nmem are functions that count the number of building-label points and the total number of member points belonging to Pi. Once all the “building” polygons have been found, they are merged together and the building boundaries are reconstructed. Fig. 14(d) shows the final result of the BUS grouping process.
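This grouping rule is a one-line test per polygon; a sketch is given below, where the building-point ratio with ρth = 0.6 covers both cases, since a “closed” polygon has a ratio of 1. The function name and input format are illustrative.

```python
def is_building_polygon(member_labels, ratio_threshold=0.6):
    """BUS grouping test (Section 5.5): accept a polygon as a building part when
    its building-point ratio exceeds rho_th (Eq. (4)). A 'closed' polygon passes
    automatically since its ratio is 1; an 'empty' polygon never does."""
    if not member_labels:
        return False
    n_building = sum(1 for l in member_labels if l == "building")
    return n_building / len(member_labels) > ratio_threshold
```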

6. Experimental results

The aim of this section is to assess the positional accuracy of the building boundaries produced by the proposed building extraction technique, and to evaluate their cartographic completeness in relation to the reference building polygons provided by the Ordnance Survey. First, the building extraction results over the Greenwich dataset are presented together with a description of the parameters selected for the technique. Then, the performance of the proposed building extraction technique is evaluated by comparing the extracted building boundaries with the Ordnance Survey MasterMap®.

6.1. Building extraction result

The BUS organization method presented in the previous section was applied to the building detection result of Fig. 3(f), in which 170 building blobs were automatically detected and bounded with rectangles. Some parameters for building extraction by the BUS organization method were pre-determined. For extracting the data-driven lines, straight lines whose length is larger than ld = 5 m are extracted from the Greenwich IKONOS image. Two rectangular boxes with width lw = 5 m are created from each straight line, and the building edge lines are found from the labelling attributes of the LiDAR points within the boxes. The dominant angles of those edge lines are analysed with an angular resolution of θth = 30°, by which the geometries of the edge lines are regularized. Then, two rectangular boxes with width lw = 5 m are again created from each edge line in order to determine a box growing direction towards the building rooftop, and parallel and orthogonal lines are created for each edge line. For the generation of the BUS space, the “garbage” polygon is defined with lth = 3 m and ath = 50 m2. If LiDAR data with a higher point density are used, smaller lth and ath can be selected and thus a higher level of detail of the building shape can be delineated. For the BUS grouping process, an “open” polygon is verified as a building part if building-label points are included in the polygon with a point ratio ρth larger than 0.6. The level of detail with which a building shape is delineated can be controlled by modifying the parameters used for the BUS grouping process. Since the point spacing irregularity of the Greenwich LiDAR DSM is extremely high, the parameters of the BUS space generation and grouping processes play an important role in the current implementation. However, the selection of those parameters would be more stable if evenly distributed LiDAR data with a higher point density were used.

Fig. 14. Illustration of polygon cue generation and grouping; white lines and red crosses represent selected hyperlines and building-label points; (a) initial partitioning; (b) intermediate partitioning; (c) final BSP partitioning; (d) “open” polygon filtering.

Fig. 15(a) shows the building extraction result over the Greenwich dataset (referred to as the UCL building map), which was generated by the BUS organization method. As can be seen in this figure, the boundaries of most building shapes were successfully reconstructed from the initial BUSes generated by the building detection process. In the Greenwich scene, the contrast between the ground and buildings is not high, and most building rooftops show complex texture and different shapes comprising a number of planar surfaces. In these circumstances, a large number of erroneous linear cues, which are not relevant to building boundaries, could be generated from each initialized BUS. However, the suggested method efficiently pruned those distracting linear cues and retained the building edge lines. This suggests that an optimal balance between data-driven and model-driven cues for building extraction can be achieved by BUS partitioning. In particular, it can be concluded that the contribution of model-driven line cues to building extraction is clear. For several building objects, we can observe that extended building structures with small extent were reconstructed even though those parts are occluded or have very low contrast (Fig. 15(b)). It is difficult to obtain sufficient linear cues for describing such a small object relying on a data-driven process only. This difficulty can be overcome by employing a simple building model with minimal use of cue grouping rules. Furthermore, the recursive part segmentation of the initial convex polygon for extracting building boundaries adopts a level-of-detail (LoD) strategy, which extracts significant building parts comprising larger "building" polygons first and less significant parts with smaller extent later. This is suitable for reducing building extraction errors, which are then restricted to minor building structures.

6.2. Quality assessment

The Ordnance Survey provided reference building polygons for this study (Fig. 15(c)). These data were provided in MasterMap® format. Although the detail and accuracy of the OS MasterMap® is ideal for this study, there is an inconvenient characteristic of the OS MasterMap® that hinders the accuracy assessment. The problem with the OS truth data is that it contains some significant differences in comparison to the UCL building map. This is mainly because the OS data was constructed at a different time from the acquisition of the IKONOS image and LiDAR data, from which the UCL data was generated. Since this research is interested in evaluating the positional accuracy of the UCL building

Fig. 15. Building extraction results over the Greenwich dataset; (a) UCL building map; (b) building extraction results for small extended building structures with low contrast; (c) OS MasterMap®; (d) building extraction errors (light grey: true positives; medium grey: false positives; black: false negatives).


map, rather than the OS MasterMap®, those inherent faults in the OS data were removed from the UCL building maps. In addition, the features in the OS MasterMap® have been compiled at a larger scale than that of the IKONOS image. As a result, very small features cannot be clearly recognized in the IKONOS image. Thus, those small features, whose member points number fewer than 30, were removed from the OS data.

To evaluate the overall performance of the suggested building extraction system, the automatically extracted buildings (UCL building map) and the reference buildings (MasterMap®) were compared pixel by pixel. Based on the comparison result between the two datasets, the following objective metrics (Lee et al., 2003; Rottensteiner et al., 2005) were employed in order to provide a quantitative assessment of the developed building extraction algorithm:

\begin{aligned}
\text{Branching Factor} &= FP / TP\\
\text{Miss Factor} &= FN / TP\\
\text{Completeness (\%)} &= 100 \times TP / (TP + FN)\\
\text{Correctness (\%)} &= 100 \times TP / (TP + FP)\\
\text{Quality Percentage (\%)} &= 100 \times TP / (TP + FN + FP)
\end{aligned} \qquad (5)

Here, TP (True Positive) is the number of building pixels classified as building by both datasets, TN (True Negative) is the number of non-building pixels classified by both datasets, FP (False Positive) is the number of building pixels classified only by the UCL building map, and FN (False Negative) is the number of building pixels classified only by the OS MasterMap®.

Each metric mentioned above provides its own quantitative measure for evaluating the overall success of building extraction algorithms. The branching factor and miss factor are closely related to the boundary delineation performance of the building extraction system. The branching factor increases when the algorithm incorrectly over-classifies building pixels, while the miss factor increases when the algorithm misses building pixels. The completeness (often called the building detection percentage) and the correctness are the ratios of the correctly classified building pixels (TP) to the total building pixels classified by the OS MasterMap® (TP + FN) and by the automatic process (TP + FP), respectively. Both factors indicate a measure of building detection performance. A higher building detection rate is achieved when the algorithm produces either fewer missed building pixels or fewer over-classified building pixels. The quality percentage combines aspects of both boundary delineation accuracy and building detection rate to summarize system performance. To obtain 100% quality, the algorithm must correctly classify every building pixel, without missing any building pixel (FN = 0) and without over-classifying any background pixel (FP = 0).
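Since all five measures are simple ratios of the pixel counts, they can be computed directly from TP, FP and FN; the following sketch (the function name is illustrative) mirrors Eq. (5).

def evaluation_metrics(tp: int, fp: int, fn: int) -> dict:
    """Per-pixel evaluation measures of Eq. (5); TN is not required."""
    return {
        "branching_factor": fp / tp,
        "miss_factor": fn / tp,
        "completeness_pct": 100.0 * tp / (tp + fn),
        "correctness_pct": 100.0 * tp / (tp + fp),
        "quality_pct": 100.0 * tp / (tp + fn + fp),
    }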

Table 1 shows the pixel classification results acquired by comparing the UCL building map to the OS MasterMap®. Based on this result, the evaluation measures for the UCL building map were computed by Eq. (5). The suggested building extraction technique showed a delineation performance of 0.11 in branching factor, while the miss factor showed slightly poorer performance (0.13). This indicates that the number of building pixels over-classified by the algorithm is less than the number of missed building pixels. Also, the automatic process detected buildings at a rate of 88.3% in completeness (building detection percentage) and 90.1% in correctness. For a similar reason to the aforementioned delineation performance, we achieve higher correctness than completeness because the developed system tends to produce fewer false positive pixels than false negatives. Finally, the overall success of the technique was evaluated as an extraction quality (quality percentage) of 80.5%.

Table 1
Pixel classification results

Pixel classification    Pixels
True positive (TP)      282,726
True negative (TN)      2,535,114
False positive (FP)     31,142
False negative (FN)     37,517

The quality assessment results of building extraction systems using various primary datasets have been reported by many researchers. Lee et al. (2003) developed a method to extract building outlines from IKONOS imagery with multi-spectral bands. An unsupervised classification result based on the ISODATA algorithm was used as evidence for detecting buildings, and a squaring algorithm regularized building outlines constrained to two dominant directions. They reported that the technique achieved a building detection rate of 64%, a branching factor of 0.39 and a quality percentage of 51%. Kim and Muller (2002) integrated LiDAR with the multi-spectral bands of IKONOS for building extraction. Parish (2002) evaluated the performance of that system, which showed that the contribution of LiDAR data improved the building detection rate up to 86% compared to Lee et al.'s work, but the absence of geometric regularization in the building description process resulted in a worse branching factor of 1.0 and thereby a rather poor quality percentage of 64%. Based on the theory of Dempster–Shafer, Rottensteiner et al. (2005) detected buildings from LiDAR data and an airborne orthophoto with multi-spectral bands, which were re-sampled at 1 m resolution. When the reference map was compared to the detected buildings in a per-pixel evaluation, the system detected buildings with a completeness of 94% and a correctness of 80%. In a per-building evaluation, they reported that 95% of the buildings larger than 50 m² were detected, whereas about 89% of the detected buildings were correct. With multiple high-resolution airborne images (10 cm ground sampling) and a high-quality elevation model, Fradkin et al. (2001) reported a building detection rate of 80% and a quality percentage of 78%, and showed that the branching factor can increase up to 0.1–0.37 depending on the image quality and perspective view used. It is difficult to make a direct comparison since the aforementioned techniques were applied to scenes with different complexity and primary data sources. However, in relation to other research, it can be concluded that the building detection performance achieved by the suggested technique is acceptable.
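As a quick cross-check, plugging the pixel counts of Table 1 into the measures of Eq. (5) reproduces the values reported above for the UCL building map; the snippet below reuses the illustrative evaluation_metrics function defined earlier.

counts = dict(tp=282_726, fp=31_142, fn=37_517)   # pixel counts from Table 1
print(evaluation_metrics(**counts))
# branching factor ~0.11, miss factor ~0.13, completeness ~88.3%,
# correctness ~90.1%, quality percentage ~80.5%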

6.3. Error analysis

Although the developed building extraction system showed a satisfactory result, it also produced a certain number of building extraction errors, which should be reduced to achieve a more accurate extraction of building objects. The errors apparent in the result generated by the technique can generally be divided into three categories:

6.3.1. Building detection error

Most of the false negative pixels in the evaluation metrics (black pixels in Fig. 15(d)) were generated by under-detection of buildings. This problem is partly caused by the NDVI classification described in Section 4.3, which tends to over-remove building points, in particular over terraced houses with long and narrow structures (see the lower left part of Fig. 15(d)), and partly caused by LiDAR data acquired with extremely low density over some buildings (see the upper middle part of Fig. 15(d)). A consequence of both effects is to produce very small regions, which the algorithm fails to recognize as isolated buildings under the given threshold in the building detection stage. The false negatives could be reduced further if LiDAR data were acquired with sufficient density. However, it is still hard to safely remove trees, especially when they are located close to buildings (Baillard and Maitre, 1999). This problem could be solved with additional data, either by using the height difference between the first and last returns of LiDAR, by employing surface roughness to characterize vegetated objects, or by segmenting LiDAR intensity data (Maas, 2001).

6.3.2. Building delineation error

Medium grey-coloured pixels in Fig. 15 represent the false positive errors, indicating the delineation quality of building outlines. These errors are caused by several factors, including the inherent planimetric accuracy of the input data (i.e., the IKONOS image, LiDAR data, and OS reference data) and the degree of point spacing irregularity and density of the LiDAR data. Most boundary delineation errors deviated from the OS reference data by one or two pixels where LiDAR measurements were sufficiently dense over buildings. However, where LiDAR points were acquired with lower density, more errors were produced around the boundaries. This is because the detection of data-driven and model-driven lines is more difficult over building rooftops with coarser point density than over those with denser coverage. As a result, mis-location of both line types leads to delineation errors around building boundaries. Also, a significant lack of LiDAR measurements over rooftops can make the suggested automatic process produce many "empty" polygons, so that the reconstructed building shapes are intruded when those polygons are removed. There is also the reverse case of the former problem. In the BUS grouping, the presence of non-building points around building outlines is used as important evidence to verify the "building" polygons. If there is a significant deficiency of non-building points on the ground near buildings, erroneous "closed" polygons are generated and could be mistakenly verified as "building" polygons. The consequence is that the original building shapes are extruded in the reconstruction result. Therefore, uniform acquisition of LiDAR data with higher point density will make the proposed system more controllable, and thus reduce the building delineation errors.

6.3.3. Reference data error

Some building extraction errors presented in Fig. 15(d) are due to inherent faults in the OS building polygons. Although these errors were carefully removed from the UCL building map before performing the quality assessment, inherent faults in the OS data can still be found, and thus some parts of buildings are missing in the OS MasterMap® even though the UCL building map successfully delineates their boundaries based on the LiDAR measurements and the IKONOS image. This is because the OS data was constructed at a different time from the acquisition of the IKONOS image and LiDAR data, from which the UCL data was generated. The analysis of the reference errors suggests that the developed building extraction technique could also be used for applications detecting changes in an urban environment and supporting map compilation.

7. Conclusions

This paper presented a new building extraction method which automatically detects building objects and delineates their boundaries using IKONOS imagery and LiDAR data in coherent collaboration between the two datasets. The developed technique consists of a two-step procedure: building detection and building description. A building detection method was introduced to reduce the scene complexity in urban areas and simplify the building description process. The technique subsequently detected the dominant features comprising the urban scene, and finally isolated buildings from surrounding features. The results show that terrain information extracted from LiDAR data and chromatic cues provided by the multi-spectral bands of IKONOS imagery are important knowledge for detecting buildings. Based on the BSP (Binary Space Partitioning) tree algorithm, the BUS (Building Unit Shape) organization method was proposed as a new middle-level feature grouping tool for reconstructing building outlines. The approach has favourable characteristics in many respects. First, the proposed framework can accommodate diverse building shapes: the complexity of the building shape is constrained in a data-driven way, and the degrees of geometric freedom constraining the shape of the building are determined by analysing dominant directionalities from data-driven lines in the IKONOS imagery, which are not limited to pre-specified models. Secondly, classification results of LiDAR points serve as evidence for employing a box-type model to produce model-driven lines, so that completely or partly missed building boundaries can be compensated. Thirdly, internal topological relations amongst linear features are implicitly constructed, and subjective feature grouping relying on pre-defined thresholds can be avoided. Finally, optimality of the feature grouping is achieved by a partition scoring function, which generates polygonal cues in a level-of-detail strategy. This minimises the loss of significant building parts.

The developed building extraction system was applied to a sub-scene of the Greenwich area. The results showed that the suggested system can successfully delineate most buildings in the scene. The overall quality of the results was evaluated by objective evaluation metrics in comparison to the OS MasterMap®. This evaluation determined the branching factor to be 0.11, the miss factor to be 0.13, the completeness to be 88.3%, the correctness to be 90.1% and the quality percentage to be 80.5%, which, in relation to other studies, is satisfactory. However, a qualitative error analysis showed that the extracted building polygons tend to overlook some buildings (false negative pixels) and misclassify some objects as buildings (false positive pixels) because of the poor point density of the Greenwich LiDAR data. These errors could be reduced if the LiDAR measurements were more evenly distributed with higher density. Furthermore, the error analysis demonstrated that the technique can automatically find the inherent differences between the new data and the OS MasterMap®, which may make it a useful tool for improving a laborious topographic mapping procedure by highlighting problematic areas.

Acknowledgements

We would like to express our gratitude to Professor Peter Muller at University College London and Infoterra, U.K., who made available the IKONOS imagery and LiDAR data referred to in this article. Also, we would like to acknowledge the support of the Ordnance Survey, U.K., who kindly provided the OS MasterMap® for this research.

References

Ameri, B., 2000. Automatic recognition and 3D reconstruction of buildings from digital imagery. PhD thesis, Institute of Photogrammetry, University of Stuttgart, 110 pp.

Baillard, C., Maitre, H., 1999. 3D reconstruction of urban scenes from aerial stereo imagery: A focusing strategy. Computer Vision and Image Understanding 76 (3), 244–258.

Baillard, C., Zisserman, A., 2000. A plane-sweep strategy for the 3D reconstruction of buildings from multiple images. International Archives of Photogrammetry, Remote Sensing and Spatial Information Sciences 33 (Part B3), 56–62.

Brenner, C., 2000. Towards fully automatic generation of city models. International Archives of Photogrammetry, Remote Sensing and Spatial Information Sciences 33 (Part B3), 85–92.

Brenner, C., 2004. Modelling 3D objects using weak CSG primitives. International Archives of Photogrammetry, Remote Sensing and Spatial Information Sciences 35 (Part B3), 1085–1090.

Burns, J.B., Hanson, A.R., Riseman, E.M., 1986. Extracting straight lines. IEEE Pattern Analysis and Machine Intelligence 8 (4), 425–455.

Chen, L., Teo, T., Shao, Y., Lai, Y., Rau, J., 2004. Fusion of LiDAR data and optical imagery for building modelling. International Archives of Photogrammetry, Remote Sensing and Spatial Information Sciences 35 (Part B4), 732–737.

Dowman, I., 2000. Automatic feature extraction for urban landscape models. Proceedings of the 26th Annual Conference of the Remote Sensing Society, Leicester, September 12–14, on CD-ROM.

Dowman, I., 2004. Integration of LiDAR and IFSAR for mapping. International Archives of Photogrammetry, Remote Sensing and Spatial Information Sciences 35 (Part B2), 90–100.

Fischer, A., Kolbe, T.H., Lang, F., Cremers, A.B., Forstner, W., Plumer, L., Steinhage, V., 1998. Extracting buildings from aerial images using hierarchical aggregation in 2D and 3D. Computer Vision and Image Understanding 72 (2), 185–203.

Fradkin, M., Maitre, H., Roux, M., 2001. Building detection from multiple aerial images in dense urban areas. Computer Vision and Image Understanding 82 (3), 181–207.

Fuchs, H., Kedem, Z.M., Naylor, B.F., 1980. On visible surface generation by a priori tree structures. Computer Graphics 14 (3), 124–133.

Guo, T., Yasuoka, Y., 2002. Snake-based approach for building extraction from high-resolution satellite images and height data in urban areas. Proceedings of the 23rd Asian Conference on Remote Sensing, November 25–29, Kathmandu, 7 pp., on CD-ROM.

Haverkamp, 2004. Automatic building extraction from IKONOS imagery. Proceedings of ASPRS 2004 Conference, May 23–28, Denver, 8 pp., on CD-ROM.

Jaynes, C., Stolle, T., Collins, R., 1994. Task driven perceptual organization for extraction of rooftop polygons. Proceedings of ARPA Image Understanding Workshop, Monterey, November, pp. 359–365.

Kim, T., Muller, J.-P., 1999. Development of a graph-based approach for building detection. Image and Vision Computing 17 (1), 3–14.

Kim, J.R., Muller, J.-P., 2002. 3D reconstruction from very high resolution satellite stereo and its application to object identification. International Archives of Photogrammetry, Remote Sensing and Spatial Information Sciences 34 (Part 4), 7 pp., on CD-ROM.

Lee, D.S., Shan, J., Bethel, J.S., 2003. Class-guided building extraction from IKONOS imagery. Photogrammetric Engineering and Remote Sensing 69 (2), 143–150.

Lin, C., Nevatia, R., 1998. Building detection and description from a single intensity image. Computer Vision and Image Understanding 72 (2), 101–121.

Lumia, R., Shapiro, L., Zuniga, O., 1983. A new connected components algorithm for virtual memory computers. Computer Vision, Graphics, and Image Processing 22 (2), 287–300.

Maas, H.-G., 2001. The suitability of airborne laser scanner data for automatic 3D object reconstruction. In: Grün, A., Baltsavias, E.P., Henricsson, O. (Eds.), Automatic Extraction of Man-made Objects from Aerial and Space Images, vol. III. Balkema Publishers, Lisse, pp. 345–356.

Maas, H.-G., Vosselman, G., 1999. Two algorithms for extracting building models from raw laser altimetry data. ISPRS Journal of Photogrammetry and Remote Sensing 54 (2–3), 153–163.

Moons, T., Frere, D., Vandekerckhove, J., Gool, L., 1998. Automatic modeling and 3D reconstruction of urban house roofs from high resolution aerial imagery. Proceedings of the European Conference on Computer Vision, pp. 410–425.

Parish, B., 2002. Quality assessment of automated object identification in IKONOS imagery. M.Sc. Thesis, University College London, London.

Park, W., Kwak, K., Su, H., Kim, T.G., 2000. Line rolling algorithm for automated building extraction from 1-meter resolution satellite images. Proceedings of the International Symposium on Remote Sensing, Kyungju, November 1–3, pp. 31–36.


Radha, H.R., Leonardi, R., Vetterli, M., Naylor, B.F., 1991. Binary space partitioning tree representation of images. Journal of Visual Communication and Image Representation 2 (3), 201–221.

Rottensteiner, F., Trinder, J., Clode, S., Kubik, K., 2005. Using the Dempster–Shafer method for the fusion of LIDAR data and multi-spectral images for building detection. Information Fusion 6 (4), 283–300.

Schenk, T., Csatho, B., 2002. Fusion of LiDAR data and aerial imagery for a more complete surface description. International Archives of Photogrammetry, Remote Sensing and Spatial Information Sciences 34 (Part 3), 310–317.

Shufelt, J.A., McKeown, D.M., 1993. Fusion of monocular cues to detect man-made structures in aerial imagery. Computer Vision and Image Understanding 57 (3), 307–330.

Sithole, G., Vosselman, G., 2004. Experimental comparison of filter algorithms for bare-earth extraction from airborne laser scanning point clouds. ISPRS Journal of Photogrammetry and Remote Sensing 59 (1–2), 85–101.

Sohn, G., Dowman, I.J., 2001. Extraction of buildings from high resolution satellite data. In: Grün, A., Baltsavias, E.P., Henricsson, O. (Eds.), Automatic Extraction of Man-made Objects from Aerial and Space Images, vol. III. Balkema Publishers, Lisse, pp. 345–356.

Sohn, G., Dowman, I.J., 2002. Terrain surface reconstruction by the use of tetrahedron model with the MDL criterion. International Archives of Photogrammetry, Remote Sensing and Spatial Information Sciences 34 (Part 3), 336–344.

Suveg, I., Vosselman, G., 2004. Automatic 3D reconstruction of buildings from aerial images. ISPRS Journal of Photogrammetry and Remote Sensing 58 (3–4), 202–224.

Vosselman, G., 1999. Building reconstruction using planar faces in very high density height data. International Archives of Photogrammetry and Remote Sensing 32 (Part 2), 87–92.

Weidner, U., Forstner, W., 1995. Towards automatic building reconstruction from high resolution digital elevation models. ISPRS Journal of Photogrammetry and Remote Sensing 50 (4), 38–49.