3D Building Model Reconstruction from Multi-view Aerial Imagery and Lidar Data

Liang Cheng, Jianya Gong, Manchun Li, and Yongxue Liu

Liang Cheng, Manchun Li, and Yongxue Liu are with the Department of Geographical Information Science, Nanjing University, Nanjing, Jiangsu, 210093, China ([email protected]).

Jianya Gong is with the State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan, Hubei, 430079, China.

Photogrammetric Engineering & Remote Sensing, Vol. 77, No. 2, February 2011, pp. 125-139.

© 2011 American Society for Photogrammetry and Remote Sensing

Abstract
A novel approach integrating multi-view aerial imagery and lidar data is proposed to reconstruct 3D building models with accurate geometric position and fine details. First, a new algorithm is introduced for determining the principal orientations of a building, thereby improving the correctness and robustness of boundary segment extraction in aerial imagery. A new dynamic selection strategy based on lidar point density analysis and K-means clustering is then proposed to separate boundary segments from non-boundary segments. Second, 3D boundary segments are determined by incorporating lidar data and the 2D segments extracted from multi-view imagery. Finally, a new strategy for 3D building model reconstruction, including automatic recovery of lost boundaries and robust reconstruction of rooftop patches, is introduced. The experimental results indicate that the proposed approach provides high-quality 3D models with high correctness, high completeness, and good geometric accuracy.

Introduction
Accurate and timely updated 3D building model data are very important for urban planning, communication, transportation, tourism, and many other applications. Aerial photogrammetry has been and still is one of the preferred ways to obtain three-dimensional information about the Earth's surface (Brenner, 2005), and it appears to provide an economic means of acquiring truly three-dimensional city data (Foerstner, 1999). Although the technique is very well understood and delivers accurate results, its major drawback is that the current state of automation in 3D building model reconstruction is still surprisingly low (Suveg and Vosselman, 2004). Airborne lidar technology has also become an important means of deriving 3D building models. A laser scanning system provides directly measured three-dimensional points, which helps to improve the level of automation in the 3D reconstruction process. However, the quality of building models derived from lidar data is restricted by the ground resolution of the data. Lidar data have generally reached a one-meter level of spatial resolution, which is still much coarser than the centimeter-level aerial imagery regularly used in photogrammetry. It is difficult to obtain building models with highly precise geometric position and fine details using lidar data alone. It is clear that photogrammetry and lidar technology are strongly complementary for 3D building reconstruction, and their integration can lead to more accurate and complete products and a higher level of automation (Ackermann, 1999; Baltsavias, 1999).

Some typical research on 3D building reconstruction by integrating imagery and lidar data is reviewed in the following section; however, most existing approaches have difficulty producing highly accurate and detailed building models. Most of them did not strive for the reconstruction of highly accurate 3D building boundaries: their common idea is to determine approximate building boundaries from lidar data and then refine them using image information. The main limitation is the unstable quality of the approximate boundaries determined from lidar data, which is strongly influenced by the quality of the lidar filtering process and by irregular point spacing. Meanwhile, the robust reconstruction of building roof patches from lidar data remains very difficult. Therefore, a novel approach for reconstructing 3D building models with precise geometric position and fine details by integrating multi-view aerial imagery and lidar data is proposed. It consists of three main steps: 2D building boundary extraction, 3D building boundary determination, and 3D building model reconstruction.

Related Works
Research on 3D building reconstruction from imagery and lidar data has received increasing attention. Schenk and Csatho (2002) proposed a method for establishing a common reference frame between lidar data and aerial imagery by utilizing sensor-invariant features, such as breaklines and surface patches, for a richer and more complete surface description. Mcintosh and Krupnik (2002) merged edges detected and matched in aerial images with the digital surface model produced from airborne lidar data to facilitate the generation of an accurate surface model, which provides a better representation of surface discontinuities. Rottensteiner and Briese (2003) detected building regions and roof planes from lidar data, and then grouped these roof planes to create polyhedral building models, in which the shapes of the roof plane boundaries were improved by an overall adjustment integrating aerial imagery information. Nakagawa and Shibasaki (2003) improved the location and shape of an initial 3D model in the horizontal direction by projecting the model onto the nadir image of TLS (Three Linear Scanner) imagery, and modified the 3D model in the vertical direction according to the intersection results of the TLS imagery. Ma (2004) derived the roof planes of a building using a clustering algorithm and refined the building boundary by incorporating aerial stereo imagery. A method based on Dempster-Shafer theory for the detection of buildings in densely built-up urban areas was presented, fusing first- and last-pulse laser scanner data and multi-spectral images (Rottensteiner et al., 2005). Zhang et al. (2005) presented a coarse-to-fine strategy for 3D building model generation and texture mapping by integrating three video image sequences (two oblique views of building walls and one vertical view of building roofs), a coarse two-dimensional map, and lidar data. An approach for automatic 3D building reconstruction based on aerial CCD imagery and sparse laser scanning sample points was presented by You and Zhang (2006). Sohn and Dowman (2007) collected rectilinear lines around building outlines in an integrated data-driven and model-driven manner, then accomplished a full description of building outlines by merging convex polygons, which were obtained by hierarchically dividing the building region with the extracted lines using a Binary Space Partitioning (BSP) tree. Other related approaches have also been introduced (Mcintosh and Krupnik, 2002; Stamos and Allen, 2002; Seo, 2003; Frueh and Zakhor, 2004; Hu, 2007; Sampath and Shan, 2007; Sohn and Dowman, 2008; Habib et al., 2008).

Many studies have addressed 3D building model reconstruction using imagery alone or lidar data alone. Detailed reviews of techniques for the automatic 3D reconstruction of buildings from aerial images were given by Remondino and El-Hakim (2006). Typical approaches to 3D building reconstruction from lidar data have also been presented (Maas and Vosselman, 1999; Sithole and Vosselman, 2004; Brenner, 2005; Vosselman and Kessels, 2005; Vestri, 2006). Meanwhile, 2D map data, including 2D GIS and urban planning maps, have been combined with lidar data or aerial imagery for 3D building reconstruction in many studies (Haala and Brenner, 1999; Vosselman and Dijkman, 2001).

The extraction of building boundaries is also a focus of this research, because it is a crucial and difficult step in 3D building model reconstruction. The automatic extraction of building boundaries from an image has been a research issue for decades. Mohan and Nevatia (1989) described an approach based on perceptual grouping for detecting and describing 3D objects in complex images and applied the method to detect and describe complex buildings in aerial images. McGlone and Shufelt (1994) extracted building boundaries by calculating vanishing points using the Gaussian sphere technique to detect horizontal and vertical lines. Caselles et al. (1997) introduced a scheme for detecting object boundaries based on active contours evolving in time according to intrinsic geometric measures of the image. A detailed review of techniques for the automatic extraction of buildings from aerial images was given by Mayer (1999). In addition, many studies on boundary extraction from lidar data have been reported. To eliminate noise effects and obtain building boundaries with precise geometric position, some researchers used the minimum description length (MDL) method to regularize ragged building boundaries (Weidner and Forstner, 1995; Suveg and Vosselman, 2004). Zhang et al. (2006) used the Douglas-Peucker algorithm to remove noise in a footprint, and then adjusted the building footprints based on estimated dominant directions. Sampath and Shan (2007) performed building boundary tracing by modifying a convex hull formation algorithm, then implemented boundary regularization by a hierarchical least squares solution with perpendicularity constraints. Other related studies on boundary extraction from lidar point clouds have also been introduced (Vosselman and Kessels, 2005; Brenner, 2005).

2D Building Boundary Extraction

Lidar Data Processing
To obtain segmented building points from raw lidar data, the first step is usually to separate ground points from non-ground points. Numerous algorithms have been developed for this purpose. Sithole and Vosselman (2004) made a comparative study of eight filters and pointed out that filters estimating local surfaces were found to perform best. Thus, the linear prediction algorithm proposed by Kraus and Pfeifer (1998) is used to derive a bare DEM from the raw lidar data. Non-ground points can then be identified by comparing the bare DEM with the raw lidar data. In the dataset containing only non-ground points, the region-growing algorithm based on a plane-fitting technique proposed by Zhang et al. (2006) is used, which finally yields the segmented building points. Figure 1b illustrates the segmented lidar points belonging to a building. The reported omission and commission errors of the buildings determined by the algorithm are 10 percent and 2 percent, respectively, indicating that the segmentation algorithm works well in identifying most building measurements.
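The comparison of the bare DEM with the raw points reduces to a height-difference test. The following is a minimal sketch of that test only (the linear prediction filtering itself is not shown); the function name, the nearest-cell DEM lookup, and the 1.0 m threshold are our assumptions, not values from the paper.

```python
import numpy as np

def identify_non_ground(points, bare_dem, cell_size, origin, height_threshold=1.0):
    """Flag lidar points standing above the bare DEM by more than a threshold.

    points: (N, 3) array of (x, y, z) lidar coordinates.
    bare_dem: 2D grid of ground elevations (e.g., from linear prediction).
    cell_size, origin: georeferencing of the DEM grid (origin = lower-left x, y).
    height_threshold: assumed value in meters; tune to terrain and sensor noise.
    """
    cols = np.clip(((points[:, 0] - origin[0]) / cell_size).astype(int),
                   0, bare_dem.shape[1] - 1)
    rows = np.clip(((points[:, 1] - origin[1]) / cell_size).astype(int),
                   0, bare_dem.shape[0] - 1)
    ground_z = bare_dem[rows, cols]               # nearest-cell DEM elevation
    return (points[:, 2] - ground_z) > height_threshold   # True = non-ground
```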

Building Image Generation
To retrieve the building features of interest from a very high-resolution aerial image, a building image is first generated to reduce the complexity of subsequent processing. The complexity stems first from the data acquisition, along with other vision problems: the image is a projection of a real 3D object, which introduces deformation and multiple faces of the object into view. Shadows appear to varying degrees, depending on the sun position and the height of the buildings, and occlusions are frequent in very dense areas. Moreover, an aerial image is mostly composed of roads, buildings, etc., which all share similar types of primitives. In a building image, only one building is covered and non-building features are removed (Figure 1d). Based on the lidar points belonging to each building (Figure 1b), the building images are generated automatically, one building at a time. Since the orientation parameters of the aerial images are known, the lidar points are directly projected onto the aerial images by means of the collinearity equation, as shown in Figure 1c. A bounding rectangle (BR) of the building can be created from the projected lidar points. The BR is enlarged by a threshold to ensure that all the boundaries of the building in the aerial image are fully covered. A raster image is generated by interpolating the projected lidar data in Figure 1c, and a buffer zone is then created. Figure 1d is the building image obtained by cutting and filtering Figure 1a using the BR and the buffer zone, respectively. For automatic generation of the building images, we implemented the above algorithm with ArcObjects on top of ArcGIS® software. The subsequent processes are based on the building image, and their complexity is thereby substantially reduced.
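As a minimal sketch of the cropping step, assuming the lidar points have already been projected to pixel coordinates with the collinearity equation; the 50-pixel margin stands in for the paper's unspecified enlargement threshold, and the buffer-zone filtering is omitted.

```python
import numpy as np

def building_image(aerial_image, projected_pts, margin=50):
    """Crop one building image out of the aerial image.

    projected_pts: (N, 2) pixel coordinates (col, row) of the building's lidar
                   points, obtained beforehand with the collinearity equation.
    margin: enlargement of the bounding rectangle in pixels; an assumed value
            chosen so all building boundaries stay inside the crop.
    """
    h, w = aerial_image.shape[:2]
    cmin = max(int(projected_pts[:, 0].min()) - margin, 0)
    cmax = min(int(projected_pts[:, 0].max()) + margin, w)
    rmin = max(int(projected_pts[:, 1].min()) - margin, 0)
    rmax = min(int(projected_pts[:, 1].max()) + margin, h)
    return aerial_image[rmin:rmax, cmin:cmax]   # enlarged BR crop
```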

Line Segment Extraction

Principal Orientation Determination
Most buildings are rectangular, with boundaries forming perpendicular structures along two rectilinear axes. These two perpendicular axes have the most dominant contexture and can be treated as the two major principal axes (Rau and Chen, 2003a). In this study, a new algorithm is proposed for determining the two principal orientations of a building. The principal orientations can be accurately and robustly determined by a hierarchical determination of building orientations, which consists of two steps: (a) rough principal orientation estimation, and (b) precise principal orientation determination.

Based on the segmented building points (projected onto the aerial images, as shown in Figure 1c), the rough principal orientations of a building can be estimated by analyzing the lidar points belonging to the building. For this purpose, an algorithm is used to determine the two major directions of a building in the following two steps: (a) from the segmented building lidar points, the edge points are detected by a region-growing method, and (b) based on the edge points, the least squares method proposed by Chaudhuri and Samal (2007) is used to calculate the two major directions of the building. Considering the various geometric shapes of buildings, a value range for each rough principal orientation is constructed with a threshold (e.g., 5 degrees) for the subsequent precise principal orientation determination.

The precise principal directions are determined by finding maximum values of the accumulative array from a Hough transformation that fall within the estimated ranges of the rough principal directions. The Hough transform transfers an object from image space to parameter space, and the principal directions of a building in an image can be detected by analyzing the accumulative array. In general, a building has only two principal directions. Based on the rough principal orientation constraints, principal orientation determination consists of the following steps (Figure 2 shows the flow chart of the proposed algorithm):

Step A: Selecting a building image;
Step B: Applying the Hough transformation on the image;
Step C: Finding the maximum value M in the whole accumulative array;
Step D: Setting a threshold T = M·t (the initial value of t is set to 0.9), and keeping those cells in the accumulative array with a value greater than the threshold T;
Step E: Selecting the range of one rough principal direction;
Step F: Adding the counting numbers with the same θ value in the range. If all the accumulative values equal 0, then decrease the value of t and go back to Step C if t is greater than 0; if t equals 0, the whole processing has failed. If some accumulative values are greater than 0, go to the next step;
Step G: Selecting the range of the other rough principal direction and going back to Step E. Once both rough principal directions have been processed, go to Step H;
Step H: Detecting a peak in each of the two ranges. Each peak refers to a principal orientation of the building.

Figure 2. Flow chart of the proposed algorithm for precise principal orientation determination.
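The following is a minimal sketch of Steps C through H, assuming the Hough accumulator (rows indexed by θ, columns by ρ) and the two rough ranges in degrees are already available; the step size for relaxing t is our assumption.

```python
import numpy as np

def principal_orientations(acc, thetas, rough_ranges, t0=0.9, t_step=0.1):
    """Find one peak orientation inside each rough principal-direction range.

    acc:          2D Hough accumulative array, shape (n_theta, n_rho).
    thetas:       orientation in degrees of each accumulator row.
    rough_ranges: two (lo, hi) intervals from the lidar-based rough estimate.
    """
    m = acc.max()                                    # Step C: global maximum
    results = []
    for lo, hi in rough_ranges:                      # Steps E and G
        in_range = (thetas >= lo) & (thetas <= hi)
        t = t0
        while t > 0:
            kept = np.where(acc >= m * t, acc, 0)    # Step D: threshold T = M*t
            votes = kept[in_range].sum(axis=1)       # Step F: sum per theta
            if votes.any():
                results.append(thetas[in_range][votes.argmax()])  # Step H: peak
                break
            t -= t_step                              # relax threshold, retry
        else:
            raise RuntimeError("processing failed: t reached 0 with no peak")
    return results
```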


Figure 1. Building image generation: (a) An aerial image, (b) The segmented lidar points, (c) The projected lidar points and BR (white), and (d) The building image.

Considering the influence of image quality, perspective distortions, and other possible errors, small angular discrepancies may exist between parallel lines. Thus, a value range for each principal orientation is constructed with a threshold (e.g., 2 degrees) for the subsequent extraction of line segments.

Line Segment Extraction
The Edison detector (Meer and Georgescu, 2001) is used as the edge detector in this study. Line segments are then extracted using the Hough transformation with principal-orientation priority. We use the Hough transform because it is a sound method for detecting straight lines in digital images, and we improve it by setting the search priority for peak detection according to the principal orientations. The Adaptive Cluster Detection (ACD) algorithm (Risse, 1989) is a classical dynamic method for detecting straight lines based on the Hough transformation. Our modified ACD algorithm is briefly described as follows: (a) search for the location of the maximum accumulation, (b) determine the candidate points that contribute to the peak, (c) analyze the continuity of those candidate points to construct a line segment and determine its two terminal positions, (d) remove those candidate points' contribution from the parameter space, and (e) proceed until the values in the accumulative array are smaller than the threshold. Peak detection in the accumulative array is performed first on the principal orientations, which ensures that line segments with weak signals on the principal orientations (possible boundary segments) can be extracted effectively. This strategy avoids missing as many boundary details as possible, which is very significant for determining complete final models.
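A condensed sketch of this loop is given below, under several assumptions of ours: voting points are kept per accumulator cell, continuity is judged by a simple gap test along the dominant axis, and step (d) is simplified to zeroing the peak cell rather than subtracting each point's votes across the parameter space.

```python
import numpy as np

def extract_segments(acc, cell_points, row_order, min_votes, max_gap=5.0):
    """Condensed sketch of the modified ACD loop with orientation priority.

    acc:         Hough accumulator, modified in place; rows = theta, cols = rho.
    cell_points: dict (i, j) -> list of edge pixels that voted for cell (i, j).
    row_order:   accumulator rows to scan, principal-orientation rows first,
                 so weak boundary peaks on those orientations are found early.
    """
    segments = []
    for i in row_order:                      # principal-orientation priority
        while acc[i].max() >= min_votes:
            j = int(acc[i].argmax())         # (a) strongest accumulation
            pts = np.array(cell_points[(i, j)], float)   # (b) contributing pts
            axis = int(np.argmax(pts.max(0) - pts.min(0)))
            pts = pts[np.argsort(pts[:, axis])]          # (c) order along line
            gaps = np.linalg.norm(np.diff(pts, axis=0), axis=1)
            for run in np.split(pts, np.where(gaps > max_gap)[0] + 1):
                if len(run) >= 2:            # keep continuous runs only
                    segments.append((run[0], run[-1]))   # two terminal points
            acc[i, j] = 0   # (d) simplification: the paper removes the points'
                            # votes from the whole parameter space
    return segments         # (e) loop ends when peaks fall below min_votes
```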

Boundary Segment Selection
In this section, the extracted line segments are automatically separated into two sets: boundary segments and non-boundary segments. A dynamic strategy based on lidar point density analysis and K-means clustering for accurate boundary selection is introduced here. Plate 1a is a building image; the extracted line segments are shown in Plate 1b. Based on the extracted line segments, boundary segment selection is performed as follows.

Boundary Candidate Selection
Two rectangular boxes with a certain width are generated along the two orthogonal directions of a boundary segment, following the method proposed by Sohn and Dowman (2007). Plate 1c shows the two kinds of rectangular boxes (one pink, the other cyan) created for each segment. If no lidar points are found in either box, the line segment is removed, because it is far from a building. If lidar points are found in both boxes and the density values of the two boxes are almost equal, the line segment is also removed, because a segment surrounded by lidar points must lie on a rooftop. A value range (1 to 10 times the lidar point spacing) is applied to determine the optimal width of a rectangle, and we suggest 3 to 5 times the point spacing as the threshold. Too small a value may cause boundary segments to be regarded as non-boundary segments, because a very small box may contain no lidar points even when it lies inside a building, owing to the irregular spatial distribution of the points. On the other hand, too large a threshold may cause many non-boundary segments to be regarded as boundary segments, because the density values of two overly large boxes may be almost equal even though they contain different numbers of lidar points. The remaining line segments are considered boundary candidate segments, as shown in Plate 1d.
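A minimal sketch of the two-box density test follows; the box geometry (one box on each side of the segment, spanning its length) matches the description above, while the `eps` tolerance for "almost equal" densities is our assumption.

```python
import numpy as np

def box_densities(seg_start, seg_end, lidar_xy, width):
    """Lidar point density in the two rectangles flanking a segment.

    width: box width; the paper suggests 3 to 5 times the lidar point spacing.
    """
    p, q = np.asarray(seg_start, float), np.asarray(seg_end, float)
    d = q - p
    length = np.linalg.norm(d)
    n = np.array([-d[1], d[0]]) / length         # unit normal to the segment
    rel = lidar_xy - p
    along = rel @ (d / length)                   # coordinate along the segment
    across = rel @ n                             # signed offset from the line
    on_span = (along >= 0) & (along <= length)
    area = length * width
    left = np.sum(on_span & (across > 0) & (across <= width)) / area
    right = np.sum(on_span & (across < 0) & (across >= -width)) / area
    return left, right

def is_boundary_candidate(left, right, eps=0.1):
    """Reject segments with no nearby points (far from any building) or with
    nearly equal densities on both sides (segments lying on a rooftop).
    eps is an assumed tolerance for 'almost equal' densities."""
    if left == 0 and right == 0:
        return False
    return abs(left - right) > eps
```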

Accurate Boundary Selection
The remaining line segments are grouped according to their angles and distances. For example, Plate 1e shows three candidate boundary segments organized as one group. For the group, two rectangular boxes are again generated for each segment. Focusing on one segment, the lidar point density (LPD) of a box is calculated by Equation 1, and the difference of lidar point density between the two boxes is calculated by Equation 2:

$$\mathrm{LPD} = \frac{\text{number of lidar points in the box}}{\text{area of the box}} \qquad (1)$$

$$\text{Difference of Lidar Point Density} = \left| \mathrm{LPD}_{\text{Left Box}} - \mathrm{LPD}_{\text{Right Box}} \right| \qquad (2)$$

In Plate 1e, the differences of lidar point density for the three segments (left, middle, right) are calculated, namely d1 = 1.11, d2 = 3.65, and d3 = 0.72, respectively. Adding (d1, d2, d3) to a dataset, the dataset is defined by Equation 3:

$$L = \{\, |d_k| \mid k = 0, \ldots, m \,\} \qquad (3)$$

where dk is the difference of lidar point density corresponding to segment k. The K-means clustering algorithm (with K = 2) is applied to divide the dataset into two new datasets: one with large values and one with small values. Here, the two new datasets are: one containing d2 = 3.65, and the other containing d1 = 1.11 and d3 = 0.72. Generally, the difference of lidar point density corresponding to an accurate boundary is larger than that corresponding to an inaccurate boundary.


Plate 1. Boundary segment selection: (a) A building image, (b) The extracted line segments (red), (c) Two rectangular boxes (pink and cyan) for each segment, (d) The remaining line segments after boundary candidate selection (red), (e) Three candidate segments organized as one group, extracted from a small local region (the yellow box in (d)), and (f) The extracted segment after accurate boundary selection.


So the boundary corresponding to the dataset with large values (d2) is the one with the more precise position in the group. The selected accurate boundary segment is illustrated in Plate 1f. In some cases, if the dataset contains several large values or if the K-means clustering cannot produce true results, all segments in the group are selected as accurate segments.
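A minimal sketch of this clustering step follows, using scikit-learn's KMeans as one possible implementation (the paper does not name a library); the cluster with the larger center is kept.

```python
import numpy as np
from sklearn.cluster import KMeans

def select_accurate_boundaries(segments, density_diffs):
    """Keep the segments whose density difference falls in the large-value
    cluster; with d = (1.11, 3.65, 0.72) only the middle segment survives."""
    d = np.asarray(density_diffs, float).reshape(-1, 1)
    km = KMeans(n_clusters=2, n_init=10).fit(d)           # K = 2
    large = int(np.argmax(km.cluster_centers_.ravel()))   # large-value cluster
    return [s for s, lab in zip(segments, km.labels_) if lab == large]
```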

3D Building Boundary Determination
After the boundaries are extracted, line-based photogrammetric intersection is often used for 3D building boundary determination. The main disadvantage of that method is that the accuracy of the determined 3D boundaries is strongly influenced by the intersection angle of the two interpretation planes and is lower than the accuracy of lidar data in the vertical direction.

A method for 3D boundary determination is therefore introduced, which improves the geometric accuracy of the 3D boundary segments. To overlap the extracted boundary segments and the lidar data, the lidar points are directly projected onto the aerial images by means of the collinearity equation. Focusing on one endpoint of a boundary segment, we collect the lidar points around it, detect the edge points (the points on the building edges), and then find the nearest of these edge points. The height value Z of the endpoint is assigned the height value of that nearest edge point. For each endpoint, a ray is generated starting from the exposure station of the camera and passing through the endpoint. The intersection of this ray with a horizontal plane at height Z defines the corresponding 3D point of the endpoint in object space. The detailed calculation is given by Equation 4. Figure 3a shows a building covered by four aerial images. With the method above, each aerial image provides one dataset of 3D boundaries. The different datasets are illustrated by different gray levels and line styles in Figure 3b.

$$\begin{bmatrix} x - x_0 \\ y - y_0 \\ -f \end{bmatrix} = \lambda M \begin{bmatrix} X - X_L \\ Y - Y_L \\ Z - Z_L \end{bmatrix}$$

where λ is the projection coefficient. Rearranging, with known Z:

$$\begin{bmatrix} u \\ v \\ w \end{bmatrix} = M^{T} \begin{bmatrix} x - x_0 \\ y - y_0 \\ -f \end{bmatrix}, \qquad X = X_L + (Z - Z_L)\,\frac{u}{w}, \qquad Y = Y_L + (Z - Z_L)\,\frac{v}{w} \qquad (4)$$

Specifically, this formula describes the perspective transformation between the 3D point (X, Y, Z) in object space and the 2D image pixel (x, y) in image space using explicit interior and exterior orientation parameters. The interior orientation parameters include the principal point offsets (x0, y0) and the focal length f; the exterior orientation parameters consist of the perspective center (XL, YL, ZL) and the rotation angles ω, φ, and κ. The rotation matrix M is determined by the three rotation angles, M = Mκ Mφ Mω.

3D boundary segment refinement becomes possible thanks to the multi-view images. The refinement process includes the following three steps.

Step 1: Grouping the Segments: All the 3D segments are projected onto the 2D X-Y plane. On the 2D plane, two segments are regarded as parallel if the angle between them is less than a threshold (e.g., 4 degrees). All segments are grouped according to their angles and distances. A value range (from 1 degree to 10 degrees) is used to determine the angle threshold, and we suggest 4 degrees as the optimal value. Too small a value may cause many parallel segments to be regarded as non-parallel; too large a value may cause many non-parallel segments to be regarded as parallel.
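A minimal sketch of this monoplotting computation follows Equation 4 directly; the function name and argument packaging are ours, and M is assumed to be the 3x3 rotation matrix already built from ω, φ, κ.

```python
import numpy as np

def monoplot(x, y, Z, interior, exterior):
    """Equation 4: intersect the image ray with the horizontal plane at the
    height Z taken from the nearest lidar edge point.

    interior: (x0, y0, f); exterior: (XL, YL, ZL, M) with M the rotation
    matrix built from the angles omega, phi, kappa.
    """
    x0, y0, f = interior
    XL, YL, ZL, M = exterior
    u, v, w = M.T @ np.array([x - x0, y - y0, -f])   # rotated ray direction
    X = XL + (Z - ZL) * u / w
    Y = YL + (Z - ZL) * v / w
    return X, Y, Z
```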


Figure 3. 3D boundary determination: (a) A building covered by four aerial images, (b) Four sets of 3D boundary segments shown by different gray levels and line styles, and (c) The refined results of the 3D boundary segments.


Step 2: Overlapping the Segments and Lidar Data on the 2D Plane: In each group, the segments near the boundary are selected using the boundary segment selection algorithm above.

Step 3: Constructing a New Segment from All Endpoints of the Remaining Segments in Each Group: Among these endpoints there may exist inliers and outliers. In general, a least squares method may produce a line that fits the inliers badly, because it fits inliers and outliers alike. RANdom SAmple Consensus (RANSAC) (Fischler and Bolles, 1981), on the other hand, can produce a line computed only from the inliers. Therefore, RANSAC is selected here to construct the new segment, which generally leads to more robust results than a least squares method. Figure 3c shows the refinement results for Figure 3b. The use of multi-view imagery improves the geometric accuracy of the derived 3D boundaries.
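Below is a minimal RANSAC line-fit sketch for this step; the iteration count, inlier tolerance, and fixed random seed are our assumptions, and a final least squares refit on the returned inliers is left to the caller.

```python
import numpy as np

def ransac_line(points, n_iter=200, inlier_tol=0.1):
    """Fit a 2D line to segment endpoints while ignoring outliers.

    n_iter and inlier_tol are assumed values; tune to the endpoint noise.
    Returns the inlier points of the best line hypothesis.
    """
    pts = np.asarray(points, float)
    rng = np.random.default_rng(0)
    best_inliers = None
    for _ in range(n_iter):
        p, q = pts[rng.choice(len(pts), 2, replace=False)]  # 2-point sample
        d = q - p
        norm = np.linalg.norm(d)
        if norm == 0:
            continue                       # degenerate sample, resample
        n = np.array([-d[1], d[0]]) / norm # unit normal of the candidate line
        dist = np.abs((pts - p) @ n)       # point-to-line distances
        inliers = dist < inlier_tol
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return pts[best_inliers]   # refit (e.g., least squares) on these only
```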

3D Building Model Reconstruction
For 3D rooftop line determination, the region-growing algorithm based on a plane-fitting technique proposed by Zhang et al. (2006) is used to detect 3D rooftop patches. We improve this algorithm: in the original version, a plane is fitted to the points in a category by least squares; our improvement is to use the RANSAC algorithm instead of the least squares method to construct the plane. For the segmented lidar points, the region-growing process is repeated to construct all rooftop planes of a building, and the 3D rooftop line segments are determined by intersecting the 3D rooftop planes. The region-growing segmentation depends on the selection of seed points, and different seed-point selection strategies can sometimes produce differences in the areas of individual patches within a building footprint. The lidar measurement density also has a large effect on the detection results, and it is still difficult to extract some complex rooftop patches fully automatically. Therefore, manual editing is still needed, but only for rooftop line determination, not for boundary determination. In many cases, 20 to 30 percent of the rooftop lines needed to be manually edited.
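A minimal RANSAC plane-fit sketch for the improved plane construction follows; as with the line fit, the iteration count, inlier tolerance, and seed are assumed values.

```python
import numpy as np

def ransac_plane(points, n_iter=300, inlier_tol=0.15):
    """Fit a rooftop plane while discarding points on water towers, funnels,
    pipelines, etc. Returns (normal, d) with normal . p + d = 0 for inliers."""
    pts = np.asarray(points, float)
    rng = np.random.default_rng(0)
    best, best_count = None, -1
    for _ in range(n_iter):
        a, b, c = pts[rng.choice(len(pts), 3, replace=False)]  # 3-point sample
        normal = np.cross(b - a, c - a)
        nn = np.linalg.norm(normal)
        if nn == 0:
            continue                       # degenerate (collinear) sample
        normal /= nn
        dist = np.abs((pts - a) @ normal)  # distances to the candidate plane
        count = int((dist < inlier_tol).sum())
        if count > best_count:
            best, best_count = (normal, float(-normal @ a)), count
    return best
```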

In this section, an improved method is proposed for 3D building model reconstruction from the determined 3D boundary segments and 3D rooftop segments. The major problem in reconstructing 3D models from these segments is that the topological relationships between the segments are lost. The SMS (Split-Merge-Shape) method proposed by Rau and Chen (2003b) is an effective solution for automatically recovering the topological relationships between such 3D segments. An improved SMS method based on lidar data is proposed here, with two main improvements: automatic recovery of lost boundary segments, and robust reconstruction of 3D roof patches.

Lost Boundary Recovery
The overall process includes six steps: data preprocessing, splitting a building, detecting lost boundaries, locating lost boundaries, reconstructing lost boundaries, and merging into a building.

Step 1: Data Preprocessing: Preprocessing is the same as in the SMS method, including collinear processing, orthogonal processing, dangle removal, and dangle snapping.

Step 2: Splitting a Building: A minimum bounding rectangle (MBR) constructed from the segmented lidar points is used as the initial building outline (the outer box in Figure 4a). The solid line segments in Figure 4a are the selected boundary segments, but they are topologically disconnected from each other. Each solid segment is extended to split the outer box into two parts; the extended lines are shown as dotted lines in Figure 4a. Two boundaries must not cross each other, so segment A does not cross segment B in Figure 4a. Finally, many small regions surrounded by both solid and dotted segments are created.

Step 3: Detecting Lost Boundaries: The lidar point density of each small region can be calculated. In Figure 4a, there are three types of polygons: polygons with small lidar density (Label 1), polygons with large lidar density (Label 2), and polygons with middle lidar density (Label 3). If there were no lost boundary segments, only two types of polygons would exist: those with large density and those with small density. We can therefore infer that a few lost boundary segments exist in the polygon of Label 3.

Step 4: Locating Lost Boundaries: In the polygon of Label 3, many rectangular boxes with a certain width (about two times the lidar point spacing) are generated along the direction of one side, as shown in Figure 5a, and the lidar point density of these boxes is calculated. The same process is conducted along the perpendicular direction, as in Figure 5b. The position where the lidar point density changes dramatically is where the lost boundary should be located, as shown in Figure 5c (a minimal sketch of this scan is given after the step list).

Step 5: Reconstructing Lost Boundaries: After the edge lidar points (bold points in Figure 5d) are detected, the RANSAC algorithm is used to construct a new segment, which is the lost segment.

Step 6: Merging into a Building: After the lost boundary segments are reconstructed, there are only two types of polygons: polygons with large lidar density and polygons with small lidar density. The K-means clustering algorithm is employed again to divide the dataset of lidar point densities into two sets. All polygons with the larger density are merged into one new large polygon. The outline of the new polygon is the complete building boundary, shown in Figure 4b.

Figure 4. Complete 3D boundary recovery: (a) Split processing (solid lines: boundary lines; dotted lines: the extensions of the solid lines; outer box: initial building outline; Labels A, B: boundary lines), and (b) A complete boundary (bold).
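The density-change scan of Step 4 reduces to a one-dimensional peak test over the per-box densities; a minimal sketch under our own assumptions (the `jump_ratio` significance test and the midpoint placement of the boundary) follows.

```python
import numpy as np

def locate_lost_boundary(positions, densities, jump_ratio=3.0):
    """Scan the per-box lidar point densities along one direction and return
    the position of the sharpest density change, or None if no change is
    dramatic enough. jump_ratio is an assumed significance test.

    positions: centers of the scanned boxes along the scan direction.
    densities: lidar point density of each box, in the same order.
    """
    d = np.asarray(densities, float)
    jumps = np.abs(np.diff(d))               # density change between boxes
    k = int(jumps.argmax())
    if jumps[k] < jump_ratio * (np.median(jumps) + 1e-9):
        return None                          # no dramatic change: no lost boundary
    return 0.5 * (positions[k] + positions[k + 1])   # boundary between boxes
```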

Figure 5. Lost boundary reconstruction: (a) Scan along one direction, (b) Scan along the perpendicular direction, (c) The position of the lidar density change, and (d) Edge points and the recovered segment.

Robust Rooftop Reconstruction
A complete boundary (Figure 6b) is derived from the discontinuous boundary segments (Figure 6a) using the lost boundary recovery method above. Figure 6c consists of the recovered complete building boundary and the 3D rooftop lines. The topology between all these boundaries and rooftop lines can be recovered using the "splitting-merging" processes above, which generates the closed rooftop polygons shown in Figure 6d. Focusing on a 2D rooftop polygon, the corresponding 3D patch can be reconstructed by collecting and analyzing the lidar points belonging to it. Many structures, such as water towers, funnels, and pipelines, often exist on the tops of buildings, and the lidar points on these structures are generally not removed in the filtering process, as shown in Figure 6e. Because of these points, 3D rooftop patches reconstructed by a least squares method would have low geometric accuracy in the vertical direction. When the 3D models are reconstructed using the RANSAC algorithm instead, the lidar points on these top structures are automatically discarded as outliers, yielding 3D models with higher accuracy, especially in the vertical direction. Figure 6f shows the reconstructed 3D model.

Evaluation

Data Set
An experimental area covered by aerial images and lidar data is used to test the performance and applicability of the approach, as shown in Figure 7. It covers an area of 1,500 m by 540 m. The lidar data have an average point spacing of 1.1 m, with a horizontal accuracy of 30 cm and a vertical accuracy of 15 cm; lidar points with different height values are shown in different shades of gray. Each aerial image is 3,056 pixels by 2,032 pixels with a spatial resolution of 5 cm, and is shown by a rectangle in Figure 7. The orientation parameters of the aerial images are known. The experimental area contains many buildings with different orientations, structures, and rooftop texture conditions. Most of the buildings have complex geometric shapes. The image texture conditions also vary, including simple texture, highly repeated texture, and complex texture; the complex texture conditions arise where trees stand close to the buildings.

In addition, true orthophotos and manually constructed 3D building models exist for the experimental area, and these are taken as reference data to evaluate the quality of the 3D building models reconstructed by the proposed approach. The true orthophotos have a horizontal accuracy of 50 cm. Terrasolid software, a package for processing lidar data and images, was used to construct the reference 3D building models; TerraModeler, a tool in the Terrasolid suite, is mainly used for 3D building model reconstruction. Orthophotos are required as a reference to detect 3D building shapes, breaklines, and other features for the 3D building model reconstruction.

Overlay of Imagery and Lidar Data
The aerial images and lidar data were directly overlaid by means of the collinearity equation. Seventeen distinct lidar points (e.g., corner points of a building) that are clearly distinguishable from their neighbors were selected to test the accuracy of the data overlay, and the offsets between the projected lidar points and the corresponding corners in the aerial images were checked. Root Mean Square Errors (RMSE) of 3.1 pixels and 3.0 pixels on the X-axis and Y-axis, respectively, were achieved. The accuracy of the data registration was satisfactory in this study, so the subsequent processes could continue.

Figure 6. 3D building model reconstruction: (a) 3D boundary segments, (b) A complete 3D boundary, (c) The boundary and 3D rooftop segments, (d) Closed rooftop polygons and lidar points, (e) 3D reconstruction for a rooftop patch, and (f) The constructed 3D model.

Visual Appearance
Figure 8 illustrates the 3D building models reconstructed by the proposed approach. To visually assess the quality of the 3D building models, we compared them with the reference data, including the manually constructed 3D models, the true orthophotos, and the aerial images. Figure 9 illustrates detailed 3D models of four buildings derived by the proposed approach and by manual operation, respectively. Three pairs of circles in Figure 9 mark the detailed differences between the two kinds of models (e.g., missed corners and missed small structures). The observations demonstrate that the reconstructed 3D models provide a reliable appearance and that most of them coincide closely with the reference data, which means the proposed approach can produce accurate and reliable 3D building models. In the subsequent analysis, the quantitative evaluation is conducted in two regions (Regions 1 and 2 in Figure 8) of the experimental area.

Correctness and Completeness Comparison
In the correctness and completeness analysis of the 3D models, we focused on evaluating the correctness and completeness of the boundaries of these models. The boundaries in the true orthophotos were taken as ground truth. For each 3D boundary, we overlaid its 3D model on the true orthophotos to check the distance and angle between the boundary and the ground truth. If the angle is smaller than a threshold and the distance is smaller than a threshold, the boundary segment is considered a true one; otherwise, it is considered a wrong one. If a boundary in the reference data is not detected by our approach, we call it a mis-detected segment. The correctness and completeness are calculated by Equation 5:

Figure 8. The reconstructed 3D building models.

Figure 7. The experimental area (lidar points with different height values shown in different shades of gray; aerial images shown by rectangles).


$$\text{Correctness} = \frac{TP}{TP + FP}, \qquad \text{Completeness} = \frac{TP}{TP + FN} \qquad (5)$$

where TP is the number of true segments, FP is the number of wrong segments, and FN is the number of mis-detected segments. We used ten pixels as the distance threshold because of the complexity of the whole 3D building model reconstruction process, which includes registration of imagery and lidar data, 2D building boundary extraction, 3D building boundary determination, and 3D building model reconstruction. Any errors are propagated from each step to the final result. For example, in the first step (registration of imagery and lidar data), RMSEs of 3.1 pixels and 3.0 pixels on the X-axis and Y-axis, respectively, were estimated. Considering the additional errors produced in the other processing steps and the errors in the lidar data and aerial images, we selected a relatively large value (ten pixels; 50 cm) as the distance threshold. For 3D building model reconstruction from multi-source datasets, however, this is not a loose threshold. In the 3D Building Boundary Determination Section, we used 4 degrees as the threshold for parallel segment determination; considering that more errors may be produced in the 3D building model reconstruction stage, here we select a larger value (5 degrees) as the angle threshold. Because we focused on the quality of the reconstructed 3D boundaries, a boundary reconstruction approach based only on lidar data, introduced by Sampath and Shan (2007) and referred to here as the lidar-approach, is used to further check the performance of the proposed approach. The key idea of this lidar-approach is hierarchical regularization: initially, relatively long line segments are extracted from the building boundary and their least squares solution is determined, assuming that these long line segments lie along two mutually perpendicular directions; then all the line segments are included to determine the least squares solution, using the slopes of the long line segments as weighted approximations. We compare the correctness and completeness of the building boundaries derived by the proposed approach and by the lidar-approach. Table 1 gives the correctness and completeness of the 3D boundaries from the two approaches at Regions 1 and 2, respectively. In Region 1, the correctness of the boundaries derived by the proposed approach and the lidar-approach is 98 percent and 85 percent, respectively; the completeness is 91 percent and 82 percent, respectively. In Region 2, the correctness is 92 percent and 82 percent, respectively; the completeness is 89 percent and 87 percent, respectively. From Table 1, it is clear that the proposed algorithm leads to very high correctness and completeness, and that the proposed approach yields boundaries of much higher quality than the lidar-approach. By overlapping the boundaries reconstructed by the two approaches, Figure 10 illustrates their detailed differences: (a) some details may be missed by the lidar-approach, shown by the A labels in Figure 10; (b) some artifacts, especially fake corners, may be created by the lidar-approach, shown by the B labels in Figure 10; and (c) building boundary segments may be shifted by an offset distance, shown by the C labels in Figure 10.

Figure 9. Comparison of the appearance of the 3D building models reconstructed by our approach (top) and manual operation (bottom): (a) and (b) No difference between the two kinds of 3D models, and (c) and (d) Three pairs of circles show the differences between the two kinds of 3D models.

Geometric Accuracy Comparison
In this section, we focus on estimating the horizontal accuracy of the boundaries of these 3D models. The manually constructed 3D models cannot be taken as reference data for geometric accuracy evaluation in the vertical direction, because those models have a reliable appearance but do not have high geometric accuracy. The estimation is conducted at Regions 1 and 2 in the experimental area. Figure 11 shows the overlay of the 3D boundaries and the true orthophotos. Some correct boundary segments are selected for the estimation of the boundaries' accuracy. Focusing on a boundary segment, we measure the distance from the midpoint of the segment to its reference point (the corresponding point in the true orthophotos). From the statistics of the selected boundary segments, the mean absolute error (MAE), RMSE, and maximum error (MAX) are calculated to check the accuracy of the final boundaries.
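As a small illustration of these statistics, the following sketch computes MAE, RMSE, and MAX from the measured midpoint-to-reference distances; the function name and input format are ours, not the paper's.

```python
import numpy as np

def boundary_errors(distances):
    """Accuracy statistics over midpoint-to-reference distances (meters)."""
    d = np.asarray(distances, float)
    return {"MAE": float(np.abs(d).mean()),       # mean absolute error
            "RMSE": float(np.sqrt((d ** 2).mean())),
            "MAX": float(np.abs(d).max())}        # maximum error
```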

Figure 10. Comparison of the correctness and completeness of the 3D building boundaries reconstructed by our approach (black) and the lidar-approach (gray): A: missed details, B: artifacts, and C: shifted boundary segments.

TABLE 1. THE CORRECTNESS AND COMPLETENESS OF THE 3D BUILDING BOUNDARIES RECONSTRUCTED BY TWO APPROACHES

Test area   Approach         Dataset                                                 Correctness   Completeness
Region 1    Our approach     Multi-view aerial images (5 cm) + lidar data (1.1 m)        98%            91%
Region 1    Lidar-approach   Lidar data (1.1 m)                                          85%            82%
Region 2    Our approach     Multi-view aerial images (5 cm) + lidar data (1.1 m)        92%            89%
Region 2    Lidar-approach   Lidar data (1.1 m)                                          82%            87%


Figure 11. 3D building boundaries and orthophotos: (a) Region 1, and (b) Region 2.

TABLE 2. THE ERRORS OF THE 3D BUILDING BOUNDARIES RECONSTRUCTED BY TWO APPROACHES

Test area   Approach         Dataset                                                 MAE (m)   RMSE (m)   MAX (m)
Region 1    Our approach     Multi-view aerial images (5 cm) + lidar data (1.1 m)     0.14       0.20       0.64
Region 1    Lidar-approach   Lidar data (1.1 m)                                       0.35       0.43       1.13
Region 2    Our approach     Multi-view aerial images (5 cm) + lidar data (1.1 m)     0.20       0.23       0.50
Region 2    Lidar-approach   Lidar data (1.1 m)                                       0.30       0.39       0.98

The experimental results are listed in Table 2. In Region 1, the RMSE, MAE, and MAX of the boundaries derived by the proposed approach are 0.20 m, 0.14 m, and 0.64 m, respectively; for the lidar-approach they are 0.43 m, 0.35 m, and 1.13 m, respectively. In Region 2, the RMSE, MAE, and MAX for the proposed approach are 0.23 m, 0.20 m, and 0.50 m, respectively; for the lidar-approach they are 0.39 m, 0.30 m, and 0.98 m, respectively. Figure 12 illustrates the boundaries reconstructed by our approach (left side) and the lidar-approach (right side) for Regions 1 and 2. The short lines (orthogonal to the boundaries) represent the boundary errors, shown as vectors with length and orientation; each error line has been enlarged 20 times. From Table 2 and Figure 12, it is clear that the boundaries determined by our approach have higher geometric accuracy than those from the lidar-approach, which also proves that data integration can effectively improve the geometric accuracy of 3D models.

Figure 12. Comparison of the geometric accuracy of the 3D building boundaries reconstructed by our approach (left side) and the lidar-approach (right side); the short lines (orthogonal to the boundaries) represent the boundary errors, shown as vectors with length and orientation; each error line has been enlarged 20 times: (a) Region 1, and (b) Region 2.

Conclusions
To automatically obtain 3D building models with precise geometric position and fine details, a new approach integrating multi-view imagery and lidar data is proposed in this study. In comparison with the lidar-approach, the 3D building models derived by the proposed approach have higher correctness, completeness, and geometric accuracy, which means the proposed approach leads to higher-quality products. The main contributions of this study are as follows: (a) the data integration strategy of multi-view imagery and lidar data is systematically studied for the automatic reconstruction of 3D building models; (b) a new algorithm for determining the principal orientations of a building is proposed, thereby improving the correctness and robustness of boundary segment extraction in aerial imagery; (c) a new dynamic selection strategy based on lidar point density analysis and K-means clustering is proposed for separating boundary and non-boundary segments; and (d) a new strategy for 3D building model reconstruction, including a "detect-locate-reconstruct" method and lidar-based robust reconstruction processes, is proposed, which reduces the complexity of 3D reconstruction and improves the quality of the final 3D models.

However, it is still difficult to automatically reconstruct 3D building models with very complex structures (e.g., curved structures and very complex rooftop structures), and further investigation is needed. In addition, the proposed approach still requires further improvement to achieve more efficient generation of 3D building models.

Acknowledgments
This work is supported by the National Natural Science Foundation of China (Grant No. 41001238), the National 863 Project, the National 973 Project (Grant No. 2006CB701304), and the Foundation for Innovative Research Groups of the National Natural Science Foundation of China (Grant No. 40721001). Sincere thanks are given for the comments and contributions of the anonymous reviewers and members of the editorial team.


References
Ackermann, F., 1999. Airborne laser scanning - Present status and future expectations, ISPRS Journal of Photogrammetry and Remote Sensing, 54(2-3):64-67.

Baltsavias, E.P., 1999. A comparison between photogrammetry and laser scanning, ISPRS Journal of Photogrammetry and Remote Sensing, 54(2-3):83-94.

Brenner, C., 2005. Building reconstruction from images and laser scanning, International Journal of Applied Earth Observation and Geoinformation, 6(3-4):187-198.

Caselles, V., R. Kimmel, and G. Sapiro, 1997. Geodesic active contours, International Journal of Computer Vision, 22(1):61-79.

Chaudhuri, D., and A. Samal, 2007. A simple method for fitting of bounding rectangle to closed regions, Pattern Recognition, 40(7):1981-1989.

Fischler, M.A., and R.C. Bolles, 1981. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography, Communications of the ACM, 24:381-395.

Foerstner, W., 1999. 3D-city models: Automatic and semiautomatic acquisition methods, Proceedings of Photogrammetric Week 1999.

Frueh, C., and A. Zakhor, 2004. An automated method for large-scale, ground-based city model acquisition, International Journal of Computer Vision, 60(1):5-24.

Haala, N., and C. Brenner, 1999. Virtual city models from laser altimeter and 2D map data, Photogrammetric Engineering & Remote Sensing, 65(7):787-795.

Habib, A.F., J. Kersting, T.M. McCaffrey, and A.M.Y. Jarvis, 2008. Integration of lidar and airborne imagery for realistic visualization of 3D urban environments, International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, XXXVII(B2):617-624.

Hu, J., 2007. Integrating Complementary Information for Photorealistic Representation of Large-scale Environments, Ph.D. dissertation, University of Southern California, Los Angeles, California, 157 p.

Kraus, K., and N. Pfeifer, 1998. Determination of terrain models in wooded areas with airborne laser scanner data, ISPRS Journal of Photogrammetry and Remote Sensing, 53(4):193-203.

Ma, R., 2004. Building Model Reconstruction from Lidar Data and Aerial Photographs, Ph.D. dissertation, The Ohio State University, Columbus, Ohio, 166 p.

Maas, H.-G., and G. Vosselman, 1999. Two algorithms for extracting building models from raw laser altimetry data, ISPRS Journal of Photogrammetry and Remote Sensing, 54(2-3):153-163.

Mayer, H., 1999. Automatic object extraction from aerial imagery - A survey focusing on buildings, Computer Vision and Image Understanding, 74(2):138-149.

McGlone, J.C., and J.A. Shufelt, 1994. Projective and object space geometry for monocular building extraction, Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

Mcintosh, K., and A. Krupnik, 2002. Integration of laser-derived DSMs and matched image edges for generating an accurate surface model, ISPRS Journal of Photogrammetry and Remote Sensing, 56(3):167-176.

Meer, P., and B. Georgescu, 2001. Edge detection with embedded confidence, IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(12):1351-1365.

Mohan, R., and R. Nevatia, 1989. Using perceptual organization to extract 3D structures, IEEE Transactions on Pattern Analysis and Machine Intelligence, 11(11):1121-1139.

Nakagawa, M., and R. Shibasaki, 2003. Integrating high resolution airborne linear CCD (TLS) imagery and lidar data, Proceedings of the 2nd GRSS/ISPRS Joint Workshop on Remote Sensing and Data Fusion over Urban Areas.

Rau, J.Y., and L.C. Chen, 2003a. Fast straight lines detection using Hough Transform with principal axis analysis, Journal of Photogrammetry and Remote Sensing, 8(1):15-34.

Rau, J.Y., and L.C. Chen, 2003b. Robust reconstruction of building models from three-dimensional line segments, Photogrammetric Engineering & Remote Sensing, 69(2):181-188.

Remondino, F., and S. El-Hakim, 2006. Image-based 3D modeling: A review, The Photogrammetric Record, 21(115):269-291.

Risse, T., 1989. Hough Transform for line recognition: Complexity of evidence accumulation and cluster detection, Computer Vision, Graphics, and Image Processing, 46(3):327-345.

Rottensteiner, F., and C. Briese, 2003. Automatic generation of building models from lidar data and the integration of aerial images, International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 34(3/W13):174-180.

Rottensteiner, F., J. Trinder, S. Clode, and K. Kubik, 2005. Using the Dempster-Shafer method for the fusion of LIDAR data and multi-spectral images for building detection, Information Fusion, 6(4):283-300.

Sampath, A., and J. Shan, 2007. Building boundary tracing and regularization from airborne lidar point clouds, Photogrammetric Engineering & Remote Sensing, 73(7):805-812.

Schenk, T., and B. Csatho, 2002. Fusion of LIDAR data and aerial imagery for a more complete surface description, International Archives of Photogrammetry and Remote Sensing, XXXIV(3/W6):310-317.

Seo, S., 2003. Model-based Automatic Building Extraction from Lidar and Aerial Imagery, Ph.D. dissertation, The Ohio State University, Columbus, Ohio, 139 p.

Sithole, G., and G. Vosselman, 2004. Experimental comparison of filter algorithms for bare-earth extraction from airborne laser scanning point clouds, ISPRS Journal of Photogrammetry and Remote Sensing, 59(1-2):85-101.


Figure 12. Comparison of the geometric accuracy of the 3D building boundaries reconstructed by our approach (left side) and the lidar-approach (right side); the short lines (orthogonal to the boundaries) represent boundary errors, shown as vectors with their length and orientation; each error line is enlarged 20 times: (a) Region 1, our approach (left side) versus the lidar-approach (right side), and (b) Region 2, our approach (left side) versus the lidar-approach (right side).


Sohn, G., and I. Dowman, 2007. Data fusion of high-resolution satellite imagery and lidar data for automatic building extraction, ISPRS Journal of Photogrammetry and Remote Sensing, 62(1):43–63.

Sohn, G., and I. Dowman, 2008. A model-based approach for reconstructing terrain surface from airborne lidar data, The Photogrammetric Record, 23(122):170–193.

Stamos, I., and P.K. Allen, 2002. Geometry and texture recovery of scenes of large scale, Computer Vision and Image Understanding, 88(2):94–118.

Suveg, I., and G. Vosselman, 2004. Reconstruction of 3D building models from aerial images and maps, ISPRS Journal of Photogrammetry and Remote Sensing, 58(3–4):202–224.

Tan, G., and R. Shibasaki, 2002. Snake based approach for building extraction from high resolution satellite images and height data in urban areas, Proceedings of 23rd Asian Conference on Remote Sensing.

Vestri, C., 2006. Using range data in automatic modeling of buildings, Image and Vision Computing, 24(7):709–719.

Vosselman, G., and S. Dijkman, 2001. 3D building model reconstruction from point cloud and ground plan, International Archives of Photogrammetry, Remote Sensing and Spatial Information Sciences, 34(3/W4):37–43.

Vosselman, G., and P. Kessels, 2005. The utilisation of airborne laser scanning for mapping, International Journal of Applied Earth Observations and Geoinformation, 6(3–4):177–186.

Weidner, U., and W. Forstner, 1995. Towards automatic building extraction from high resolution digital elevation models, ISPRS Journal of Photogrammetry and Remote Sensing, 50(4):38–49.

You, H., and S. Zhang, 2006. 3D building reconstruction from aerial CCD image and sparse laser sample data, Optics and Lasers in Engineering, 44(6):555–566.

Zhang, K., J. Yan, and S.-C. Chen, 2006. Automatic construction of building footprints from airborne lidar data, IEEE Transactions on Geoscience and Remote Sensing, 44(9):2523–2533.

Zhang, Y., Z. Zhang, J. Zhang, and J. Wu, 2005. 3D building modeling with digital map, lidar data and video image sequences, The Photogrammetric Record, 20(111):285–302.

(Received 03 September 2009; accepted 23 April 2010; final version 25 May 2010)

