Alexander Reiterer, Uwe Egly, Michael Heinert, Björn Riedel (Eds.)

Second International Workshop (AIEG 2010)
Braunschweig, Germany, June 2010
Proceedings

Application of Artificial Intelligence and Innovations in Engineering Geodesy


Source: info.tuwien.ac.at/ingeo/sc4/wg423/AIEG2010.pdf



Second International Workshop on

Application of Artificial Intelligence and Innovations in Engineering Geodesy

(AIEG 2010)

Braunschweig (Germany) – June 2010

Editors:

Alexander Reiterer
Institute of Geodesy and Geophysics

Vienna University of Technology (Austria)

Uwe Egly
Institute of Information Systems

Vienna University of Technology (Austria)

Michael Heinert, Björn Riedel
Institute of Geodesy and Photogrammetry

Technische Universität Braunschweig (Germany)


Second International Workshop on Application of Artificial Intelligence and Innovations in Engineering Geodesy (AIEG 2010)
ISBN: 978-3-9501492-6-5

Local Organizing Committee: Michael Heinert, Björn Riedel

Layout: Tanja Vicovac, Alexander Reiterer

The Workshop has been supported by:


Preface

In recent years, Artificial Intelligence (AI) has become an essential technique for solving complex problems in Engineering Geodesy. AI is an extremely broad field – the topics range from the understanding of the nature of intelligence to the understanding of knowledge representation and deduction processes, eventually resulting in the construction of computer programs which act intelligently. Especially the latter topic plays a central role in applications.

In 2008, the Working Group 4.2.3 of the IAG Sub-Commission 4.2 (“Applications of Geodesy in Engineering”) was reorganized and extended from “Application of Knowledge-Based Systems” to “Application of Artificial Intelligence”. The reason behind this restructuring was to open the working group to researchers working on all sorts of problems concerning AI techniques and engineering geodesy. Current applications of AI methodologies in engineering geodesy include geodetic data analysis, deformation analysis, navigation, deformation network adjustment, optimization of complex measurement procedures, and others. Simultaneously with the restructuring of the working group, we decided to initiate a first workshop to bring together researchers from different research fields, mainly from informatics and geodesy.

As a follow-up to the AIEG workshop that was successfully held in Vienna in 2008, “The Workshop on Application of Artificial Intelligence and Innovations in Engineering Geodesy – AIEG 2010” gives an overview of the state of the art and recent developments in AI applications in engineering geodesy. The aim of AIEG is to bring together the members of the Working Group 4.2.3 in order to share and transfer experience in the development of applications of artificial intelligence in engineering geodesy.

The second AIEG workshop was held in June 2010 in Braunschweig (Germany) and was a platform for around twenty experts from informatics and geodesy for presentation, discussion and networking.

Alexander Reiterer, Uwe Egly, Michael Heinert, Björn Riedel

Braunschweig, June 2010


Table of Contents

H. Kutterer: On the Role of Artificial Intelligence Techniques in Engineering Geodesy . . . 7

T. Vicovac, A. Reiterer, U. Egly, T. Eiter, D. Rieke-Zapp: Intelligent Deformation Interpretation . . . 10

S. Demirkaya: Deformation Analysis of an Arch Dam Using ANFIS . . . 21

H. Neuner: Modelling Deformations of a Lock by Means of Artificial Neural Networks . . . 32

J.D. Wegner, A. Schunert, U. Sörgel: Recognition of Building Features in High-Resolution SAR and Optical Imagery . . . 42

E. Mai: Application of an Evolutionary Strategy in Satellite Geodesy . . . 47

K. Chmelina, K. Grossauer: A Decision Support System for Tunnel Wall Displacement Interpretation . . . 59

D. Söndgerath: Statistical Interpolation – Introduction into Kriging Methods . . . 70

M. Heinert: Support Vector Machines – Theoretical Overview and First Practical Steps . . . 71


Second Workshop on Application of Artificial Intelligence and Innovations in Engineering Geodesy 7

On the Role of Artificial Intelligence Techniques in Engineering Geodesy

Hansjörg Kutterer

Geodätisches Institut
Leibniz Universität Hannover

email: [email protected]

Keywords: Observation Process, Kinematic Surveying, Modeling, Intelligent Agent, Soft Computing, Neural Networks, Optimization.

1 Engineering Geodesy and Artificial Intelligence

By definition, engineering geodesy is concerned with the art of technical surveying. This comprises applications in various disciplines like civil engineering, mechanical engineering, or the geo-sciences. Typical tasks are the supply of control points, staking out, the control of construction machines and processes, quality inspection in industrial production, and deformation monitoring. The technical instruments as well as the strategies for observation, data processing and analysis are manifold. Most common are triangulation, trilateration, and polar techniques. Today, based on digital technology, the instruments and data streams are highly automated; automatic target recognition and tracking may serve as an example. Hence, time is a relevant parameter. Independent of the goals of a particular application, typical stipulations for the surveying work in engineering geodesy may be formulated:

• adequate quality of the results,

• rapid provision of the results,

• high efficiency of the procedures,

• high spatial and temporal levels of detail.
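The polar technique mentioned earlier reduces, in its simplest form, to converting a measured azimuth, zenith angle and slope distance into 3D coordinates. The following sketch illustrates this; the function, station values and the gon angle convention are illustrative and not taken from the paper:

```python
import math

def polar_point(e0, n0, h0, azimuth_gon, zenith_gon, slope_dist):
    """Compute target coordinates from a station by the polar method.

    e0, n0, h0   -- station easting, northing, height [m]
    azimuth_gon  -- observed azimuth to the target [gon]
    zenith_gon   -- observed zenith angle [gon]
    slope_dist   -- measured slope distance [m]
    """
    az = azimuth_gon * math.pi / 200.0    # gon -> rad (400 gon = full circle)
    z = zenith_gon * math.pi / 200.0
    hd = slope_dist * math.sin(z)         # horizontal distance
    return (e0 + hd * math.sin(az),       # easting
            n0 + hd * math.cos(az),       # northing
            h0 + slope_dist * math.cos(z))  # height

# Example: station at the origin, sighting due east (100 gon) with a
# horizontal line of sight (zenith angle 100 gon) over 50 m
e, n, h = polar_point(0.0, 0.0, 0.0, 100.0, 100.0, 50.0)
```

A full engineering-geodesy implementation would additionally apply instrument corrections and reductions, which are omitted here.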

Moreover, added values such as additionally captured semantic information or derived cause-effect models between basic influences and observed reactions can be required. Regarding the sketched situation, it is worthwhile to look at techniques from the field of Artificial Intelligence (AI). Starting already in the 1950s, AI had to suffer (and to survive) several setbacks which were due to the general claim of being able to recreate the capabilities of the human mind. Today, it provides a collection of mathematical techniques from various fields for the treatment and solution of problems such as knowledge representation, learning, reasoning, or autonomous motion. Although a rigorous mathematical foundation is not available, the concept of an intelligent agent (IA) provides some kind of a unifying paradigm. An IA is defined as a system that acts intelligently, i.e. in an autonomous, proactive, reactive, social



and adaptive way. For a comprehensive introduction to AI and IA the reader is referred to Russell and Norvig (2003) and Poole et al. (1998). Actually, several similarities between the problems of engineering geodesy and AI, in particular regarding the IA concept, can be identified. A very close relation is due to the fact that both geodesy in general and AI strongly rely on methods from mathematical stochastics. In addition, the modeling of relations between variables and parameters is basic in both disciplines. Learning – as an example – consists of model selection and parameter identification, which are common tasks in engineering geodesy. Hence, developments in AI are of special interest for engineering geodesy.

2 Application Examples

Looking at the scientific publications in (engineering) geodesy, the use of techniques which are also part of AI is not new. Several studies were performed on learning techniques using artificial neural networks or neuro-fuzzy networks (Heine, 1999; Miima, 2002). This concerns the field of Soft Computing in general – which can be understood as a branch of AI – with emphasis on fuzzy logic and fuzzy control (Haberler-Weber et al., 2007). In addition, Computer Vision has to be mentioned, which strongly relies on automated image analysis and understanding; for an exemplary application in engineering geodesy see Reiterer (2005). Other publications could be mentioned which use Bayesian strategies or Bayesian networks. For kinematic applications in machine (or robot) control, state-space filtering approaches are used which lead, e.g., to the so-called SLAM techniques (Simultaneous Localization and Mapping). All this work underlines that AI is highly relevant in engineering geodesy. Nevertheless, it has to be noted that most of the techniques which are today labeled as AI actually originate from various scientific disciplines. Finally, some ongoing work at GIH (Geodätisches Institut, Leibniz Universität Hannover) is mentioned which shows some of the range of possible AI applications in engineering geodesy. The automatic modeling of cause-effect relations for applications in geodetic deformation monitoring is studied in Martin and Kutterer (2007) and in Neuner and Kutterer (2010). Paffenholz et al. (2010) are concerned with direct, precise geo-referencing of static terrestrial laser scans; they use a Kalman filter extension for modeling the robotics-like motion and for processing the data of a multi-sensor system mainly consisting of a terrestrial laser scanner and GNSS equipment. A third – yet unpublished – work is dedicated to the modeling and optimization of the efficiency of measurement processes as a part of construction work in civil engineering. For this purpose, Petri nets and genetic algorithms are applied.
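The state-space filtering referred to above can be illustrated with the textbook Kalman filter predict/update cycle for a one-dimensional constant-velocity model. The matrices, noise levels and measurement values below are illustrative assumptions, not those of the cited multi-sensor system:

```python
import numpy as np

dt = 1.0
F = np.array([[1.0, dt], [0.0, 1.0]])   # constant-velocity state transition
H = np.array([[1.0, 0.0]])              # only the position is observed
Q = 0.01 * np.eye(2)                    # process noise (assumed)
R = np.array([[0.25]])                  # measurement noise (assumed)

x = np.zeros((2, 1))                    # state: [position, velocity]
P = np.eye(2)                           # state covariance

def kalman_step(x, P, z):
    # Predict the state forward by one time step
    x = F @ x
    P = F @ P @ F.T + Q
    # Update with the new measurement z
    y = z - H @ x                       # innovation
    S = H @ P @ H.T + R                 # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)      # Kalman gain
    x = x + K @ y
    P = (np.eye(2) - K @ H) @ P
    return x, P

# Simulated noisy positions of a target moving at roughly 1 m/s
for z in [1.0, 2.1, 2.9, 4.2]:
    x, P = kalman_step(x, P, np.array([[z]]))
```

After a few epochs the velocity estimate converges toward the underlying motion, which is the basic mechanism exploited by the extended Kalman filters used for geo-referencing and SLAM.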

References

Zienkiewicz, O.C. and Taylor, R.L.: The Finite Element Method. Vol. 1: The Basis, 5th Edition, Butterworth-Heinemann, Oxford, 2000.

Haberler-Weber, M., Huber, M., Wunderlich, T., Kandler, C.: Fuzzy System Based Analysis of Deformation Monitoring Data at Eiblschrofen Rockfall Area. Journal of Applied Geodesy, 1, pp. 17-26, 2007.

Heine, K.: Beschreibung von Deformationsprozessen durch Volterra- und Fuzzy-Modelle sowie Neuronale Netze. Deutsche Geodätische Kommission, C, 516, München, 1999.



Miima, J. B.: Artificial Neural Networks and Fuzzy Logic Techniques for the Reconstruction of Structural Deformations. Geodätische Schriftenreihe, TU Braunschweig, Germany, 2002.

Martin, S., Kutterer, H.: Modellierung von Bauwerksdeformationen mit Neuro-Fuzzy-Verfahren. In: Brunner, F.K. (ed.): Ingenieurvermessung 2007 – Beiträge zum 15. Internationalen Ingenieurvermessungskurs, Graz, Wichmann, Heidelberg, pp. 231-242, 2007.

Neuner, H., Kutterer, H.: Modellselektion in der ingenieurgeodätischen Deformationsanalyse. In: Wunderlich, T. (ed.): Ingenieurvermessung 2010. Wichmann, Berlin, pp. 199-210, 2010.

Paffenholz, J.-A., Alkhatib, H., Kutterer, H.: Adaptive Extended Kalman Filter for Geo-Referencing of a TLS-Based Multi-Sensor-System. In: Proc. XXIV FIG International Congress 2010: Facing the Challenges – Building the Capacity. Sydney, Australia, 2010.

Poole, D., Mackworth, A., Goebel, R.: Computational Intelligence – A Logical Approach. Oxford University Press, New York, Oxford, 1998.

Reiterer, A.: A Knowledge-Based Decision System for an On-Line Videotheodolite-Based Multisensor System. Geowissenschaftliche Mitteilungen, 72, Vienna, 2005.

Russell, S., Norvig, P.: Artificial Intelligence – A Modern Approach. Prentice Hall, PearsonEducation International, Upper Saddle River, New Jersey, 2003.



Intelligent Deformation Interpretation

Tanja Vicovac1, Alexander Reiterer1, Uwe Egly2, Thomas Eiter2 & Dirk Rieke-Zapp3

1Research Group Engineering Geodesy
Institute of Geodesy and Geophysics

Vienna University of Technology
email: [email protected], [email protected]

2Research Group Knowledge-Based Systems
Institute of Information Systems
Vienna University of Technology

email: [email protected], [email protected]

3Research Group Exogene Geology
Institute of Geological Sciences

University of Bern
email: [email protected]

Abstract

Rockfalls and landslides are major types of natural hazards worldwide that kill or injure a large number of individuals and cause very high costs every year. Risk assessment of such dangerous events requires an accurate evaluation of the geology, hydrogeology, morphology and interrelated factors such as environmental conditions and human activities. It is of particular importance for engineers and geologists to assess slope stability and dynamics in order to take appropriate, effective and timely measures against such events. This paper presents a decision tool for geo-risk assessment on the basis of a knowledge-based system. The integration of such a tool with novel measurement sensors into an advanced system for geo-risk monitoring, which performs data fusion on-line, is innovative. To enable such a system, a knowledge base formally capturing domain knowledge is developed, which to the best of our knowledge is unprecedented; the completed part for initial risk assessment works quite well, as extensive experiments with a number of human experts have shown.

Keywords: Knowledge-Based System, Alerting System, Rockfall and Landslide Monitoring.

1 Background and Motivation

In the last years, damage caused by rockfalls and landslides has been increasing, as well as the number of persons that were killed or injured, due to a spread of settlements in mountain areas. In addition, many global climate change scenarios predict an increase in the probability of heavy rain, which is a primary trigger for rockfalls and landslides. This causes an urgent need for highly effective and reliable tools for monitoring rockfalls and landslides at an operational level. The increasing importance of rockfall and landslide monitoring is clearly also reflected by a large number of research projects. For example, in its last



two framework programs, the European Commission has positioned research about “Natural Hazards” and “Disaster Management” as a priority topic. The core of geo-risk management consists of identifying, understanding and mitigating risk by reducing the probability or consequences of rockfalls and landslides. In the literature, several geo-risk management and geo-monitoring systems can be found; most notable are (Fujisawa, 2000; Kahmen, 2003; Mc Hugh, 2004; Scaioni et al., 2004; Scheikl et al., 2000). Examples of systems used in practice are GOCA (2009) and GeoMoS (2009). The main application field of these tools is monitoring and analyzing deformations; however, they offer no possibility for deformation interpretation. Currently this is done by human experts from geology and civil engineering, who interpret deformations on the basis of a large number of data records, documents and knowledge of different origin.

Given the increasing number of problematic sites and the limited number of human experts, automated intelligent interpretation systems are required. The implementation of a knowledge-based system enables an automatic process of interpretation and determination of the risk potential. In contrast to the mentioned monitoring tools (e.g., GOCA), it is possible to perform deformation interpretation with our system. Based on the measured deformation vectors, a measurement preprocessing is performed (mainly clustering to detect areas of similar movement). On the basis of this information and additional data about velocity and orientation, some conclusions about the kind of occurring movement can be drawn. Additionally, data of different, heterogeneous sources, such as geodetic deformation measurements, geotechnical measurements, geological maps, geomorphological maps, in-situ investigations, and numerical modeling methods have to be included in such a system. It should be emphasized that the integration of a knowledge-based system for solving this task represents an innovative method.

At the Vienna University of Technology (Institute of Geodesy and Geophysics), the interdisciplinary research project i-MeaS (“An Intelligent Image-Based Measurement System for Geo-Hazard Monitoring”) (i-MeaS, 2010) has been launched with the purpose of researching, developing and implementing an interpretation tool for geo-risk objects. The system gives on-line information about ongoing deformations and supports issuing alerts in case of excessive deformation behavior. Drawing conclusions about incidents is a non-trivial problem; by using artificial intelligence techniques, via the integration of a knowledge-based system, new directions are opened up. This new system is a complex intelligent system, working with several different data sets in real time. Deformation measurement data will be delivered by a novel type of measurement system, which consists of two image-based sensors. Inside the captured images, so-called interest points are detected. The calculation of the 3D coordinates is done by classical geodetic forward intersection. By means of such a high-precision measurement system, 3D object points can be detected with an accuracy of about 2-3 mm (object distances up to 1000 m). Subsequently, a geodetic deformation analysis can be performed that yields as a result deformation movement vectors, which constitute the input for later interpretation.

In this paper, we report on the architecture and functionality of the respective interpretation system and its development stage. In particular, we present a knowledge base for risk assessment, which to the best of our knowledge is unprecedented, and which, as comparative tests with a number of domain experts indicate, works well compared to human experts.
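The preprocessing step mentioned above – grouping measured deformation vectors into areas of similar movement – can be sketched with a simple greedy, distance-based clustering. The function, tolerances and sample data are illustrative stand-ins; the paper does not specify the clustering method:

```python
import math

def cluster_vectors(points, vectors, pos_tol=50.0, vec_tol=0.005):
    """Group displacement vectors into clusters of similar movement.

    points  -- list of (x, y, z) point positions [m]
    vectors -- list of (dx, dy, dz) displacement vectors [m]
    pos_tol -- max spatial separation within a cluster [m] (assumed)
    vec_tol -- max displacement difference within a cluster [m] (assumed)
    """
    labels = [None] * len(points)
    next_label = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        labels[i] = next_label
        for j in range(i + 1, len(points)):
            if (labels[j] is None
                    and math.dist(points[i], points[j]) < pos_tol
                    and math.dist(vectors[i], vectors[j]) < vec_tol):
                labels[j] = next_label
        next_label += 1
    return labels

# Two neighbouring points moving ~4 mm downslope and one distant,
# stable point: the first two should fall into the same cluster.
pts = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (500.0, 0.0, 0.0)]
vecs = [(0.0040, 0.0, -0.0040), (0.0041, 0.0, -0.0039), (0.0, 0.0, 0.0)]
labels = cluster_vectors(pts, vecs)
```

In a production system a density-based method would likely be preferred, but the principle – similar position and similar displacement imply one moving area – is the same.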

2 System Concept and Architecture

Remote monitoring of unstable slopes is a typical multidisciplinary problem incorporating a network of sensors of different kinds. Movements and deformations can be measured, for



instance, with geo-technical sensors (e.g., inclinometers, tilt-meters, extensometers, etc.), or optical measurement systems (e.g., tacheometers, laser scanners, etc.). Most of these sensors must be placed on-site; in hazardous terrain this is very often not possible. It is thus also necessary to apply remote monitoring methods, some of which are based on photogrammetric methods or terrestrial synthetic aperture radar (SAR). Both yield multi-temporal images that contain distances to the scene in each pixel.

Current Systems. Recently, the interest in image-based measurement systems has increased. Leica Geosystems (Walser et al., 2003) developed a prototype of an “image-assisted total station” with the purpose of defining a hybrid or semi-automatic way to combine the strength of the traditional user-driven surveying mode with the benefits of modern data processing. Furthermore, SOKKIA (2009) introduced a prototypical tacheometer which provides focused color images. The central task of all image-based deformation measurement systems is the calculation of 3D object coordinates from 2D image coordinates for a subsequent deformation analysis or object reconstruction. The basic idea of deformation measurement is capturing a zero state of the object (measurement epoch 0) and one or more subsequent object states (measurement epoch n). The time interval between the measurements depends on the type and the estimated behavior of the objects. All these measurement systems are based on permanent user interaction. Selection, pointing and measurement of the relevant object points have to be operated by a measurement expert. Most of the relevant processing steps are fully manual. The challenging task of the mentioned i-MeaS project is to develop a fully automated system (user interaction will be possible at different decision levels). Data capturing, data analysis and data interpretation should be performed as an automated process.

System Concept. In our system, we are using a new kind of optical measurement system which is based on a traditional tacheometer system, namely an image-based tacheometer system. In comparison with laser scanners, this system measures objects with higher accuracy; compared to photogrammetric systems, it is easier to use for on-line measurement processes (e.g., object monitoring), especially because measurements can be done with a high degree of automation. The processing chain of the new measurement concept starts with the capturing of geo-referenced images, followed by image point detection and by 3D point measurement. The final output of this measurement process is a list of 3D deformation vectors of the object (deformations captured between two or more measurement/time epochs). Measurement data is one of the basic elements of decision-making – however, many other factors are used by the system (more details are given below). The system architecture can be divided into several components:

• the measurement sensors (e.g., geodetic, geotechnical and on-site meteorological sensors),

• an image analysis system (which is needed because some sensors work on the basis of captured images),

• a system control component,

• a knowledge base,

• a system for deformation analysis, and

• a system for alerting.



Furthermore, the system includes a user interface. The simplified architecture of the system, with the knowledge base and the system control component as core units, is shown in Figure 1. A description of the measurement system can be found in i-MeaS (2010).

Figure 1: Simplified architecture of the system.

As mentioned above, such a complex system works on the basis of heterogeneous information. We are using the following information sources:

• generic domain knowledge (i.e., knowledge about the interdependencies of influence factors and general deformation behavior),

• case-specific knowledge (i.e., domain knowledge collected via historical notes, morphological and geological maps),

• measurement data (geodetic, geotechnical, geophysical measurement data, etc.),

• local weather data (like local temperature, the amount of precipitation, the kind of precipitation, etc.), and

• global meteo data, which are provided by meteorological services.

In order to test the optical measurement system under realistic field conditions, the sensor system was installed over several days on the “Pasterze” glacier, the largest glacier in the Eastern Alps. The end of the glacier, which is covered by debris, and a geologically stable rock face were chosen as the test site. The main purpose of the test was the evaluation of the point detection and the consecutive calculation of 3D coordinates under realistic environmental (especially illumination) conditions. The results show that the measurement system works very well and, moreover, could measure 3D object coordinates with an accuracy of 2-3 mm (the distance to the object was about 1,000 m).
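The forward intersection underlying this 3D point calculation can be illustrated in its planar textbook form: two stations with known coordinates each observe a direction to the same target, and intersecting the two rays yields the target position. The coordinates and angles below are illustrative, and the full system works in 3D with image-derived directions:

```python
import math

def forward_intersection(e1, n1, az1, e2, n2, az2):
    """Planar forward intersection from two known stations.

    (e1, n1), (e2, n2) -- station coordinates [m]
    az1, az2           -- observed azimuths to the target [rad]
    Solves e1 + t1*sin(az1) = e2 + t2*sin(az2) (and likewise for n)
    for the ray parameters t1, t2.
    """
    s1, c1 = math.sin(az1), math.cos(az1)
    s2, c2 = math.sin(az2), math.cos(az2)
    det = s1 * c2 - c1 * s2          # zero if the rays are parallel
    if abs(det) < 1e-12:
        raise ValueError("rays are (nearly) parallel")
    t1 = ((e2 - e1) * c2 - (n2 - n1) * s2) / det
    return e1 + t1 * s1, n1 + t1 * c1

# Two stations on a 100 m baseline, both sighting the same target
e, n = forward_intersection(0.0, 0.0, math.radians(30.0),
                            100.0, 0.0, math.radians(-30.0))
```

The achievable accuracy depends on the baseline-to-distance ratio and the angular precision, which is why millimetre accuracy at 1,000 m requires very precise direction measurements.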

3 Risk Assessment

As mentioned above, there is a high demand for reliable prediction of hazardous events. Apart from monitoring, the detection of possible causes is a matter of particular interest.



According to these preconditions, the requirements on a geo-risk assessment system are quite high. Eventually, knowledge-based interpretation should be able to draw conclusions about the type of occurring movements as well as provide possible causes for them. The concept of data interpretation is based on the “calculation” of risk factors for critical cause variables and on the elaboration of an interpretation for the deformation. Examples of cause variables are precipitation, permafrost, critical slope angle, etc. The value range of the risk factor is divided into six classes (low, low-medium, medium, medium-high, high, very-high). This definition is based on the results of discussions with experts. The challenging problem in developing such an alerting system is (1) to identify relevant factors and (2) subsequently to capture the interlinkage of these influence factors. The latter are described in the next section.

In our system, the process of risk assessment is divided into two steps: (1) the determination of the “Initial Risk Factor” and (2) the determination of the “Dynamic Risk Factor”. The first step estimates the plausibility of an occurring moving event. Furthermore, the zero state of interpretation and observation is defined. The second step is focused on the processing of the temporal development of the risk factor. Therefore, additional data have to be included in the decision process, e.g., measured data captured by the image-based monitoring system. Measurement data represent the 3D object deformations (data is captured in defined time periods, resulting in movement/displacement vectors). As mentioned above, the system is also able to access local and global meteo data in real time, which can be used by the dynamic system as a basis for deformation prediction. This process leads to a detailed description of the deformation and an actual estimation of the risk factor, standardized on a predefined scale, which can be directly used as a global indicator for the likelihood that a landslide or a rockfall will occur. In practice, the estimation of the risk factor is a continuous process, in which the dynamic risk factor has to be determined in a periodic feedback cycle.

In the following, we focus on the determination of the “Initial Risk Factor”. Besides difficult technical requirements related to sensor and data fusion, the most challenging tasks in developing such a system are the implementation of the knowledge base and, in a preliminary step, the knowledge acquisition. This problem was solved using a two-step approach: in the first step, a single expert was consulted, while in the second step an extensive system evaluation by many experts was carried out and their feedback was incorporated into a system refinement (details are given below).
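The six risk classes described above form an ordinal scale over a numeric risk factor, realized via non-overlapping step membership functions. A minimal sketch of such a class mapping follows; the numeric range and class boundaries are illustrative assumptions, as the paper does not state them:

```python
# The six ordinal risk classes used by the system; the uniform class
# boundaries below are an illustrative assumption, not the system's.
RISK_CLASSES = ["low", "low-medium", "medium",
                "medium-high", "high", "very-high"]

def risk_class(factor, lo=0.0, hi=1.0):
    """Map a numeric risk factor in [lo, hi] to one of six classes
    via non-overlapping step (ordinal) membership functions."""
    if not lo <= factor <= hi:
        raise ValueError("risk factor outside the defined range")
    step = (hi - lo) / len(RISK_CLASSES)
    index = min(int((factor - lo) / step), len(RISK_CLASSES) - 1)
    return RISK_CLASSES[index]
```

Because the steps do not overlap, each numeric value maps to exactly one linguistic class, which is what makes the result directly usable in rule conditions.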

4 Construction of the Knowledge Base

As described, the challenging problem is the realization of the knowledge-based part, especially the acquisition of knowledge from human experts. For this, we adopted a common methodology (Farrand et al., 2002; Negnevitsky, 2004). In order to estimate the initial risk factor, influence factors had to be identified whose values increase the likelihood of deformation. During a period of extensive discussions about the domain problem, more than thirty-five relevant influence factors were identified (e.g. vegetation, granular material, subsoil, pieces of rock, indicators, slope angle, slope profile, slip surface, material underground, saturation of soil, leaning trees, leaning rocks, cracks, rock joints, joint orientation, insolation, permafrost, stone chips, frost-thaw cycle, depth of movement, local temperature, etc.). About thirty of them are used for the determination of the “Initial Risk Factor”. Some factors, including examples of possible consequences, are listed in Table 1. On the basis of the identified factors and the discussion, we have developed an online questionnaire, which serves as a makeshift for assessing the “Initial Risk Factor” of the



Influence Factor        Examples for Consequences
vegetation              The vegetation has an influence on the slope stability
                        and soil saturation.
granular material       In interaction with the slope angle and the kind of
                        subsoil, a conclusion about the slope stability can be
                        drawn.
slope angle             In interaction with the granular material and the kind
                        of subsoil, a conclusion about the slope stability can
                        be drawn.
slip surface            Existing slip surfaces are an indicator for slope
                        movements.
soil saturation         The degree of saturation depends on the vegetation on
                        the surface.
leaning trees / rocks   Leaning trees and rocks are indicators for slope
                        movements.
insolation              Insolation can affect factors like soil saturation and,
                        in combination with the influence of granular material
                        and slope angle, a conclusion about the slope stability
                        can be drawn.
permafrost              The existence or absence of permafrost has an influence
                        on the slope stability.

Table 1: Examples of influence factors for the initial risk factor and possible consequences.

object to be observed. The questionnaire comprises questions ranging from the geological and morphological characterization, the vegetation, and the hydrology of the mountainside to administrative issues. The expert may answer all object-relevant questions in-situ/online, usually using multiple sources to find the answers: geological and geomorphological maps, historical documents, data of earlier geotechnical or geodetic measurements of the observed slope, and last but not least inspection of the endangered area.

Discussions with several experts revealed that estimating a risk factor on the basis of many influence factors and extensive domain knowledge is highly complicated. Moreover, experts sometimes largely disagree. Thus, a system which incorporates the opinion of more than one expert is indispensable to guarantee continuously high-quality decisions.

For estimating the mentioned risk factor, we developed a knowledge-based system adopting a rule-based approach, more specifically using production rules. This is because the connection between influence factors and possible causes or deformation behavior can be naturally formulated by rules, and this representation is more accessible to domain experts than other representations.

For the implementation, we have chosen JESS (Friedmann-Hill, 2003; JESS, 2010), which is a rule engine and scripting environment written entirely in Java. JESS is easy to learn and use and is well documented and supported; moreover, it supports fuzzy rules and uncertainty. Furthermore, JESS integrates well into the Eclipse software development environment, which is widely used in industry.

In order to make the collected numerical features (like measurement data, meteorological data, etc.) more suitable for the rule-based decision system, we use an abstraction procedure that is in line with the expert view. It translates the numerical input values (image features) into linguistic concepts which are represented by abstraction (“fuzzy”) sets. More specifically, they form an ordinalization, i.e., the sets are characterized by non-overlapping step membership functions; hence, this translation is a pre-stage of full fuzzification. The use of such an abstraction enables decision rules in terms of easily understood word descriptors instead of numerical values. Furthermore, all data sets are synchronized by a common time
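The ordinal abstraction can be sketched as follows. The slope-angle variable and the threshold values below are invented for illustration; the actual class boundaries were defined together with the domain experts.

```python
# Sketch of the ordinal abstraction step: map a numeric input to a
# linguistic label using non-overlapping step membership functions
# (an ordinalization, i.e. a pre-stage of full fuzzification).
# The thresholds and labels below are hypothetical, not from the system.

def abstract(value, thresholds, labels):
    """Return the linguistic label of the interval containing value."""
    for upper, label in zip(thresholds, labels):
        if value < upper:
            return label
    return labels[-1]

# Hypothetical abstraction of a slope angle in degrees.
SLOPE_THRESHOLDS = [15, 30, 45]                      # class boundaries
SLOPE_LABELS = ["low", "medium", "high", "very_high"]

print(abstract(12, SLOPE_THRESHOLDS, SLOPE_LABELS))  # low
print(abstract(50, SLOPE_THRESHOLDS, SLOPE_LABELS))  # very_high
```

The resulting labels (e.g. “very_high”) are exactly the word descriptors that appear in the decision rules.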


basis. An example of a simplified initial state rule (IS) in JESS syntax is shown in the following. The LHS of the rule ‘IS_riskpot_rockfall’ checks whether there are elements of type IS_ROCKFALL in the working memory, fact1 and fact2, where in fact1 certain slots (i.e., attributes) have certain values (danger_recent_deformation has value ‘high’, etc., and frost_thaw_cycle has value either ‘NO’ or ‘NN’), and fact2 states that a risk factor is not yet defined. The RHS includes the instruction to update the status of fact2 to have risk_defined true and to set risk_pot to high_4. The values of the used slots (danger_recent_deformation, danger_slope_angle, danger_bedding, danger_fine_grit) are determined from combinations of input elements by separate rules.

(defrule IS_riskpot_rockfall
  (declare (salience 0))
  ?fact1 <- (IS_ROCKFALL
              (danger_recent_deformation == high) &&
              (danger_slope_angle == very_high) &&
              (danger_bedding == low) &&
              (danger_fine_grit == very_high) &&
              (frost_thaw_cycle == NO ||
               frost_thaw_cycle == NN))
  ?fact2 <- (IS_ROCKFALL
              risk_defined != YES)
  =>
  (modify ?fact2 (risk_defined YES)
                 (risk_pot high_4)))

Generally, the rule base is divided into two groups of rules: (1) rules regarding the connections between facts and consequences (e.g., rain and the consequent possible deformation of the object), and (2) rules determining the initial risk factor. The example above is part of the second group. The whole “initial-risk-factor system” consists of about 70 rules. It is also notable that we have developed a tool for the visualization of rules and their firings, which helps in grasping and analyzing data dependencies.
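The chaining of the two rule groups can be illustrated with a minimal forward-style sketch in Python. This is not the JESS implementation; the facts, slot names and rule conditions below are hypothetical simplifications.

```python
# Minimal sketch of the two rule groups: group (1) derives intermediate
# danger facts from input facts, group (2) combines them into the
# initial risk factor. All conditions here are illustrative.

def group1_rules(facts):
    """Rules connecting input facts to intermediate consequences."""
    derived = dict(facts)
    # Hypothetical rule: heavy rain on a saturated slope raises the
    # danger of recent deformation.
    if facts.get("rain") == "heavy" and facts.get("soil_saturation") == "high":
        derived["danger_recent_deformation"] = "high"
    return derived

def group2_rules(facts):
    """Rules determining the initial risk factor from the danger slots."""
    if (facts.get("danger_recent_deformation") == "high"
            and facts.get("danger_slope_angle") == "very_high"):
        return "high_4"
    return "undefined"

facts = {"rain": "heavy", "soil_saturation": "high",
         "danger_slope_angle": "very_high"}
print(group2_rules(group1_rules(facts)))   # high_4
```

A production-rule engine such as JESS performs this matching and chaining automatically over the working memory instead of via explicit function calls.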

5 Evaluation and Experiments

After developing and implementing a prototype for determining the “Initial Risk Factor”, we performed an evaluation in a two-step approach: (1) evaluation by one expert, followed by an improvement of the system, and (2) an exhaustive evaluation by eight experts. In step 1, 30 different data sets were prepared by one expert, where each data set models a test site with particular characteristics concerning slope profile, vegetation, insolation, etc. Facts like soil material, slope angle, hydrological properties, information about indicators for movement, etc. were predetermined. Then, the system processed the data sets and the determined risk factors were compared with the decisions of the single expert. The discrepancies were analyzed and the findings were used for extending and upgrading the prototype system. In step 2, eight geological experts had to appraise the 30 test cases independently. The resulting risk factors were compared with the results of the prototype system. It is striking that the risk factors assigned by the different experts vary by up to two classes; in exceptional cases, the difference is more than three. A statistical overview of the differences (∆) between the eight experts and the system is shown in Figure 2; tables with results for all cases are in the Appendix.


Figure 2: Statistical overview of the processed expert evaluation.

Also at the level of the individual test cases, the answer conformity was high; it is remarkable that about 42% of the answers agree completely with the system, and 30% of the answers are at distance 1. For example, expert 1 agrees completely with the system decision in 18 test cases (∆=0); in ten test cases the decision has distance 1 and in two cases it has distance 2. From case to case, even the disagreement between the experts can be quite high. This mainly depends on the individual experience and appraisal of each expert. Over all test cases, there was complete agreement (∆=0) between the experts and the system 102 times, while the maximal disagreement (∆=4) occurred in only 3 cases. The answers of the system and the experts completely agree in one case, and they span an interval of i classes, i=1,2,3,4,5, in 4, 7, 13, 5 and 0 cases, respectively. Only in one case is the system answer outside the interval of the expert answers. Furthermore, in 15 cases (i.e. 50%) the system answer is the median of the expert answers and in 8 cases (27%) one of the middle answers (e.g., in case 6 the system answer is 2, while the middle answers in the sorted expert answers 1,1,1,1,2,2,3,4 are 1 and 2, thus the median is 1.5); in 5 cases (17%) it is one class away from the median, and only in 2 cases is the discrepancy higher, with a maximum of two classes in the outlier case. As the above statistical data indicate, risk assessment is a difficult task where expert experience is required to obtain a reasonable solution. This is witnessed by the fact that there is no single “correct” assessment for many of the test cases, and expert opinions on the risk can vary for the same test case. The system can compete with the human experts; the differences between the results of the system on the one side and the experts’ results on the other vary in a similar way as the results vary between different experts.
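The median argument for case 6 can be checked directly against the Appendix table (expert answers 1,1,1,1,2,2,3,4; system answer 2):

```python
# Verifying the worked example above: the two middle answers are 1 and 2,
# the median is 1.5, and the system answer 2 is one of the middle answers.
from statistics import median

experts = sorted([1, 1, 1, 1, 2, 2, 3, 4])   # expert answers for case 6
system = 2

middle = experts[len(experts) // 2 - 1 : len(experts) // 2 + 1]
print(middle)              # [1, 2]
print(median(experts))     # 1.5
print(system in middle)    # True
```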
In order to further improve the quality of the risk assessments of the system and to test its usability, we initiated an even broader evaluation where we asked additional experts for their opinion on our test cases. These new experts should bring a fresh view on the problem and the system because they


have no information about the system and no training on it. The results obtained from the first new probands are in accordance with the system and with the former evaluators. This also shows that the system interface is intuitive enough that an untrained expert can easily use it without major difficulties.

6 Conclusion

The main goal of the presented work is the development of an innovative automated knowledge-based interpretation system for predicting rockfalls and landslides with advanced measurement techniques. Towards this goal, we have carried out extensive knowledge acquisition and performed knowledge analysis with the help of several experts, leading to a rich knowledge base that builds on a number of influence factors, determined from various information sources (e.g., measurement data, expert knowledge, maps, etc.). The experimental prototype for risk assessment that we developed shows good results for the completed “Initial Risk Factor” part, where it behaves like a human expert. Currently, the dynamic system (the second part of the final system) is under development; besides extending the existing rule set, further rule components (e.g., for updating the risk factor, including meteorological data, etc.) will be added. Future work will include testing (collecting field data is targeted for the summer of 2010) and the integration of the knowledge-base component into an on-line geo-risk management system.

References

Farrand, P., Hussain, F., Hennessy, E.: The Efficacy of the Mind Map Study Technique. Medical Education, 36(5), pp. 426-431, 2002.

Friedmann-Hill, E.: JESS in Action. Rule-Based Systems in Java. Manning, 2003.

Fujisawa, K.: Monitoring Technique for Rock Fall Monitoring. Internal Report, 2000.

GeoMoS: http://www.leica-geosystems.com, 2009.

GOCA: http://www.goca.info, 2009.

i-MeaS: http://info.tuwien.ac.at/ingeo/research/imeas.htm, 2010.

JESS: http://www.jessrules.com, 2010.

Kahmen, H., Niemeier, W.: OASYS – Integrated Optimization of Landslide Alert Systems. In: Österreichische Zeitschrift für Vermessung und Geoinformation (VGI), 91, pp. 99-103, 2003.

McHugh, E.: Video Motion Detection for Real-Time Hazard Warning in Surface Mines. In:NIOSHTIC No. 20026093, 2004.

Negnevitsky, M.: Artificial Intelligence – A Guide to Intelligent Systems. 2nd Edition. Addison Wesley, 2004.

Scaioni, M., Giussani, A., Roncoroni, F., Sgrezaroli, M., Vassena, G.: Monitoring of Geological Sites by Laser Scanning Techniques. IAPRSSIS, Vol. 35, pp. 708-713, 2004.


Scheikl, M., Poscher, G., Grafinger, H.: Application of the New Automatic Laser Remote System (ALARM) for the Continuous Observation of the Mass Movement at the Eiblschrofen Rockfall Area – Tyrol. Workshop on Advanced Techniques for the Assessment of Natural Hazards in Mountain Areas, Austria, 2000.

Sokkia: http://www.sokkia.com, 2009.

Walser, B., Braunecker, B.: Automation of Surveying Systems through Integration of Image Analysis Methods. In: Optical 3-D Measurement Techniques VI, Grün and Kahmen (Editors), Volume I, Zurich, pp. 191-198, 2003.


Appendix

Statistical Overview of the Evaluation Results (En = Expert n ; S = System)

Test Case S E1 E2 E3 E4 E5 E6 E7 E8

1 5 5 5 5 5 5 4 5 5

2 5 5 4 4 5 5 5 4 5

3 3 3 3 1 2 3 3 4 2

4 3 3 2 3 4 3 3 3 2

5 0 0 2 0 0 0 0 0 0

6 2 1 1 1 4 2 2 1 3

7 0 0 0 0 0 0 0 0 0

8 3 4 3 1 4 3 1 2 3

9 3 3 3 2 4 3 1 1 4

10 4 4 4 3 3 2 2 3 3

11 4 4 4 3 5 2 3 2 4

12 5 5 4 5 5 4 4 4 5

13 3 2 3 3 4 3 2 4 4

14 3 3 3 3 4 4 1 2 2

15 4 5 3 5 5 4 5 2 5

16 4 4 3 4 4 2 0 2 2

17 4 4 4 5 4 2 1 1 1

18 3 3 3 5 3 3 4 2 2

19 3 3 3 4 2 2 1 1 1

20 3 3 3 3 2 2 1 1 1

21 3 1 3 1 3 2 2 0 1

22 2 1 3 3 3 2 1 0 1

23 2 1 4 2 1 2 1 2 1

24 2 0 3 4 2 2 1 2 2

25 4 4 3 4 3 3 4 4 4

26 3 2 3 2 2 3 4 3 4

27 2 1 3 2 2 3 3 2 4

28 4 3 3 5 4 3 4 2 4

29 2 2 3 3 1 2 1 2 3

30 0 1 4 2 3 2 2 4 1

∆    E1-S  E2-S  E3-S  E4-S  E5-S  E6-S  E7-S  E8-S  Total
=0   18    15    12    12    17    8     10    10    102
=1   10    12    12    13    8     13    9     14    91
=2   2     2     6     4     5     7     8     5     39
=3   0     0     0     1     0     1     2     1     5
=4   0     1     0     0     0     1     1     0     3
=5   0     0     0     0     0     0     0     0     0


Deformation Analysis of an Arch Dam Using ANFIS

Seyfullah DEMIRKAYA

Yıldız Technical University
School of Vocational Studies
Istanbul, Turkey
email: [email protected]

Abstract

Large engineering structures such as arch dams require the cooperation of several professions, and sharing their knowledge on dam monitoring and on modeling the measurements is an important area. In this study, a new procedure based on neuro-fuzzy modeling is presented and discussed for the horizontal displacements of an arch dam, which are one aspect of the deformation process. It represents a fuzzy inference system implemented in the framework of adaptive networks, based on a supervised learning algorithm that optimizes the parameters of the fuzzy inference system. To illustrate the applicability and capability of the Adaptive Neuro-Fuzzy Inference System (ANFIS), the Schlegeis Dam in Austria is used as a case study. From 1992 to 1998, i.e. for 7 years, daily records of the water level, the air temperature and the concrete temperatures on the downstream, middle and upstream face (input variables), as well as pendulum measurements (output variables), were collected at the dam. The results demonstrate that ANFIS can be applied successfully and provides high accuracy and reliability for displacement prediction in the following periods.

Keywords: Arch Dam, Deformation Analysis, ANFIS.

1 Introduction

From the point of view of surveying engineering, the integrated analysis of deformations of any type of deformable structure includes geometrical analysis and physical interpretation. Geometrical analysis describes the change in shape and dimensions of the monitored object, as well as its rigid body movements, translations and rotations. The ultimate goal of the geometrical analysis is to determine, over the whole deformable object, the displacement and strain areas in the space and time domains. Physical interpretation establishes the relationship between the causative factors and the deformations. This can be done either by a deterministic (static) method, which utilizes information on the loads, material properties and physical laws governing the stress-strain relationship, or by a stochastic (statistical) method, which analyzes the correlation between the measured deformations and the causative factors (Demirkaya, 2005).

Due to the considerable damage potential in case of failure, arch dams are among the most systematically monitored large engineering structures. Although the first monitoring instruments were installed in arch dams more than 80 years ago, new approaches and methods are currently being developed concerning the evaluation of monitoring data (Bianchi, 2000). The increased knowledge has led to the development of new approaches which can exploit the mass of data obtained by monitoring, yielding excellent results. Within the last years a fundamental change took place in the methodology of geometric analysis: the classical stochastic view has been extended by soft-computing (namely artificial intelligence) applications such as fuzzy inference systems and artificial neural networks. Because of the strongly non-linear, highly uncertain and time-varying characteristics of the structural behavior, none of the wide variety of approaches proposed for deformation prediction can be considered a single superior model. Recently, artificial neural networks have been accepted as a potentially useful tool for modeling complex non-linear systems and are widely used for displacement forecasting on dam bodies (The Ialad Web Page, 2010). Besides ANN, the Fuzzy Inference System (FIS) for describing complicated systems has become very popular and has been successfully used in various engineering problems (Demirkaya, 2008; Heine, 2008; Chang, 2006). Each method has advantages and disadvantages: while in FIS there is no systematic procedure for the design of a fuzzy controller, ANN has the ability to learn from the input-output data set, self-organize its structure and adapt to it in an interactive manner. For that reason, the use of the Adaptive Neuro-Fuzzy Inference System (ANFIS) method is proposed to self-organize the network structure and to adapt the parameters of the fuzzy inference system in order to forecast the displacements of an arch dam’s crest. This method is data-driven; it deduces the model directly from the data.
Also, it is based on a supervised learning algorithm to optimize the parameters of a FIS. In the following, the methodology of constructing the ANFIS model is presented. The theory, network architecture and parameter-estimation algorithms are described first. Next, a short description of the dam, the available data set and the model construction are given. Then, the results of the developed ANFIS models are explained and discussed. In the last section, the conclusions of this research study are outlined.

2 Neuro-Fuzzy Modeling

ANNs are able to learn a kind of process connection from given examples of input-output data. They consist of independent processing units (neurons) and simulate the processing principle of biological networks such as the human brain. A high computation rate and a high degree of robustness and failure tolerance are the advantages of ANNs. Also, they have the ability to generalize and to learn adaptively (Heine, 2008).

Fuzzy logic is another method of artificial intelligence. The key idea of fuzzy logic theory is that it allows something to be partly this and partly that, rather than having to be either all this or all that. The degree of "belongingness" to a set or category can be described numerically by a membership number between 0 and 1. The variables are "fuzzified" through the use of membership functions that define the membership degree to fuzzy sets; these variables are called linguistic variables. Membership functions are curves that define how each point in the input space is mapped to a membership value in the interval [0, 1]. They can take different forms, including triangular, trapezoidal and Gaussian curves. The fuzzy rule-based model operates on an "IF-THEN" principle, where the "IF" is a vector of fuzzy explanatory variables or premises (input) and "THEN" is the fuzzy consequence or dependent variable (output). Fuzzy logic allows the user to capture uncertainties in data (Chang, 2006). Table 1 summarizes the two methods and their contributions to neuro-fuzzy modeling.


Artificial Neural Network              | Fuzzy Inference System
Difficult to use prior rule knowledge  | Prior rule-base can be incorporated
Learning from scratch                  | Cannot learn (linguistic knowledge)
Black box                              | Interpretable (if-then rules)
Complicated learning algorithms        | Simple interpretation and implementation
Difficult to extract knowledge         | Knowledge must be available

Table 1: The Comparison of ANN and FIS.
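As a concrete illustration of the fuzzification described above, a triangular membership function can be written as follows. The "temperature" variable and its parameters are invented for illustration, not taken from the paper's model.

```python
# A triangular membership function: maps a crisp input to a membership
# degree in [0, 1]. Parameters a and c are the feet, b the peak.
# The example linguistic variable below is hypothetical.

def triangular(x, a, b, c):
    """Triangular MF with support [a, c] and peak at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# A value can belong partly to one fuzzy set and partly to another:
print(triangular(15.0, 0, 10, 20))   # 0.5 -> partly "warm"
print(triangular(15.0, 10, 20, 30))  # 0.5 -> and partly "hot"
```

This is exactly the "partly this and partly that" behavior described above: the crisp value 15 belongs to both fuzzy sets with degree 0.5.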

Neuro-fuzzy modeling refers to the application of various learning techniques developed in the neural network literature to fuzzy modeling or to a fuzzy inference system (FIS). The basic structure of a FIS consists of three conceptual components: a rule-base, which contains a selection of fuzzy rules; a data-base, which defines the membership functions (MF) used in the fuzzy rules; and a reasoning mechanism, which performs the inference procedure upon the rules to derive an output. A FIS implements a nonlinear mapping from its input space to the output space. This mapping is accomplished by a number of fuzzy if-then rules. The parameters of the if-then rules (antecedents or premises in fuzzy modeling) define a fuzzy region of the input space, and the output parameters (consequents in fuzzy modeling) specify the corresponding output. Hence, the efficiency of the FIS depends on the estimated parameters. However, the selection of the shape of the fuzzy set (described by the antecedents) corresponding to an input is not guided by any procedure (Mehta, 2009). But the rule structure of a FIS makes it possible to incorporate human expertise about the system being modeled directly into the modeling process, to decide on the relevant inputs, the number of MFs for each input, etc., and the corresponding numerical data for parameter estimation. In the present study, the concept of the adaptive network, which is a generalization of the common back-propagation neural network, is employed to tackle the parameter identification problem in a FIS. This procedure of developing a FIS using the framework of adaptive neural networks is called an Adaptive Neuro-Fuzzy Inference System (ANFIS) (Jang, 1993). As the name suggests, ANFIS combines the fuzzy qualitative approach with the adaptive capabilities of neural networks to achieve a desired performance (Jang, 1995). The details on adaptive networks are described in (Jang, 1993), where a novel architecture and learning procedure for the FIS is introduced that uses a neural network learning algorithm to construct a set of fuzzy if-then rules with appropriate MFs from the stipulated input-output pairs (Mehta, 2009; Jang, 1993).

3 ANFIS (Takagi-Sugeno)

ANFIS is a fuzzy system of the Takagi-Sugeno type placed in the framework of adaptive systems to facilitate learning and adaptation. Such a framework makes fuzzy models more systematic, less reliant on expert knowledge, and an effective practice for data processing (Mehta, 2009; Jang, 1993). To present the proposed model, consider the first-order Sugeno model shown in Fig. 1. Instead of using the classical fuzzy IF-THEN rules, Takagi and Sugeno proposed the following fuzzy IF-THEN rules:

R^l: IF x_1 is F_1^l and ... and x_n is F_n^l,
     THEN y^l = c_0^l + c_1^l x_1 + ... + c_n^l x_n        (1)


where the F_i^l are fuzzy sets, the c_i^l are real-valued parameters, y^l is the system output due to rule R^l, and l = 1, 2, ..., M. That is, they considered rules whose IF part is fuzzy but whose THEN part is crisp: the output is a linear combination of the input variables. For a real-valued input vector x = (x_1, x_2, ..., x_n)^T, the output y(x) of Takagi and Sugeno's fuzzy system is a weighted average of the y^l, where the weight w^l expresses the overall truth value of the premise of rule R^l for the input and is calculated as

    w^l = ∏_{i=1}^{n} µ_{F_i^l}(x_i)        (2)

where µ_F represents the membership function of the fuzzy set F.

Figure 1: T-S (Takagi-Sugeno) type Fuzzy Inference System.

The advantage of this fuzzy logic system is that it provides the compact system equation (2); therefore, parameter estimation and order determination methods such as neuro-fuzzy algorithms or neuro-adaptive learning techniques can be developed to estimate the parameters c_i^l and the order M. These techniques provide a method for the fuzzy modeling procedure to learn information about a data set, in order to compute the membership function parameters that best allow the associated fuzzy inference system to track the given input/output data. In this study, the well-known adaptive algorithm called ANFIS is used with the aid of the Matlab Fuzzy Logic Toolbox.
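Eqs. (1) and (2) can be sketched directly in code. The following first-order Takagi-Sugeno inference uses Gaussian MFs; the MF parameters and consequent coefficients are invented for illustration and are not the fitted Schlegeis model.

```python
# Sketch of first-order Takagi-Sugeno inference: rule weights are
# products of membership degrees (Eq. 2) and the output is the weighted
# average of the linear rule consequents (Eq. 1). Values are illustrative.
import math

def gauss(x, mean, sigma):
    """Gaussian membership function."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2)

# Each rule: ([(mean, sigma) per input], [c_1, ..., c_n, c_0])
rules = [
    ([(0.0, 1.0), (0.0, 1.0)], [1.0, 2.0, 0.5]),
    ([(2.0, 1.0), (2.0, 1.0)], [0.5, -1.0, 3.0]),
]

def ts_output(x):
    num = den = 0.0
    for mfs, coeffs in rules:
        w = 1.0
        for xi, (m, s) in zip(x, mfs):
            w *= gauss(xi, m, s)                   # Eq. (2): product of MFs
        y = sum(c * xi for c, xi in zip(coeffs, x)) + coeffs[-1]  # Eq. (1)
        num += w * y
        den += w
    return num / den                               # weighted average of y^l

print(ts_output([1.0, 1.0]))   # 3.0 (both rules fire equally here)
```

ANFIS then tunes the MF parameters (mean, sigma) and the consequent coefficients from training data instead of fixing them by hand.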

4 Data Used In the Analysis

The data used are from Theme C of the 6th ICOLD Benchmark Workshop on Numerical Analysis of Dams, which was dedicated to the interpretation and subsequent prediction of the crest displacements of the Schlegeis arch dam (The Ialad Web Page, 2010). The dam was constructed between 1969 and 1971. It is a double-curvature arch dam with a height of 131 m, a crest length of 725 m and a crest thickness of 9 m, with the top water level at 1782 m (a.s.l.).


Figure 2: The direct and inverse pendulum system of the Schlegeis Dam.

The observed radial crest displacements of the dam are analyzed using the time histories of the water level and the concrete temperatures as input parameters. The data for this study are the water level, the air temperature and the concrete temperatures at 6 points, one value per day. The water level at 09:00 is provided every day. The air temperatures are the arithmetic mean values from 00:00 until 23:00. Concrete temperatures are measured daily in block 0 in two horizons, at elevations 1750.65 m and 1677.15 m; in each horizon three thermometers are installed. The response value which has to be interpreted is the radial crest displacement of the central cross section. This crest displacement is measured by pendulums, and the point of reference is 80 m below the foundation surface. Again, one value per day at 09:00 is provided. All of these data relate to the period 1992 to 1998 (The Ialad Web Page, 2010; Demirkaya, 2008).

5 Model Construction

Once the fuzzy subsets of the water level and of the air and concrete temperatures recorded in different sections of the dam are defined, the linear equations for the crest displacements in the Sugeno-type fuzzy model are determined by ANFIS. It is then possible to estimate the displacements for given water level and temperature values. To realize this, Gaussian-type MFs were used to run the TS fuzzy inference system in this study. The 2555 data records (daily values over 7 years) are divided into three independent subsets: the training, verification and testing subsets. The training subset includes 1095 records (3 years); the verification subset has 1095 records (3 years); the testing subset has the remaining 365 records (1 year).


First, the training subsets are repeatedly used to build the FIS and to adjust the connected weights of the constructed networks. Afterwards, the verification subset is used to simulate the performance of the built models, checking their suitability for generalization, and the best FIS is selected for later use. The testing data set is then used for the final evaluation of the selected network's performance. It is worth mentioning that the testing sets must be unseen by the model in the training or verification phase. One of the most important tasks in developing a satisfactory forecasting model is the selection of the input variables, which determines the architecture of the model. The input variables are: WL, the water level of the reservoir; UT and DT, the values of the thermometers embedded in the upstream and downstream face; MT, the thermometer value in the middle of the dam; and AT, the air temperature in the vicinity of the dam. The measured horizontal displacement values are denoted by P. The MFs for the input variables are shown in Fig. 3(a-e). All of the input MFs have four subsets, suitably named QL: Quite Low, L: Low, M: Medium and H: High. In this study, the main intent is to demonstrate that the neuro-fuzzy network has the ability to deal with expert knowledge and enhance the model performance. In the beginning, two ANFIS models, named Model-1 and Model-2, were developed for displacement forecasting based on the number of MFs; they have 4 and 6 MFs, respectively. Lastly, Model-3 was obtained by dividing the data set into year-based models and has 4 MFs. The training, verification and testing data sets had been selected randomly. The rules related to the proposed model of the displacement forecasting are given as follows in the Rule Base.

6 Rule Base

IF WL is H and UT is M and MT is H and DT is M and AT is M THEN
P = 1.17 * WL - 3.044 * UT - 2.281 * MT - 1.412 * DT - 0.02781 * AT - 1967

IF WL is QL and UT is L and MT is QL and DT is L and AT is L THEN
P = 0.2844 * WL + 0.256 * UT - 1.25 * MT - 1.439 * DT - 0.01542 * AT - 455

IF WL is M and UT is QL and MT is M and DT is QL and AT is QL THEN
P = 0.548 * WL + 1.154 * UT - 2.359 * MT - 1.058 * DT + 0.0007784 * AT - 893.4

IF WL is L and UT is H and MT is L and DT is H and AT is H THEN
P = 0.4676 * WL + 1.145 * UT - 3.737 * MT - 1.829 * DT - 0.1407 * AT - 759.9
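Once a rule fires, its crisp consequent is just the stated linear equation. For example, the first rule's consequent can be evaluated as follows; the sensor readings are invented for illustration, not measured Schlegeis values.

```python
# Evaluating the consequent of the first rule with the published
# coefficients. The input readings below are hypothetical.
WL, UT, MT, DT, AT = 1780.0, 8.0, 9.0, 10.0, 12.0
P = 1.17 * WL - 3.044 * UT - 2.281 * MT - 1.412 * DT - 0.02781 * AT - 1967
print(round(P, 3))   # crest displacement predicted by this rule alone
```

In the full TS system, this value is blended with the other rules' consequents according to the firing strengths of the membership functions.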

7 Evaluation of the Model Performance

The daily recorded data from January 1992 to December 1998 are studied. Because there are no fixed rules for developing an ANFIS, we followed a trial-based approach and found that the data sets of 1992, 1996 and 1997 are best used for training, while those of 1993, 1994 and 1995 are used for verification. Lastly, the 1998 data set is used for testing. The ANFIS models are compared based on their performance on the training and testing sets. The results are summarized in Table 2. It appears that the ANFIS models are accurate and consistent across the different subsets: all values of the Root Mean Square Error (RMSE) and the Mean Absolute Error (MAE) are small, and all Correlation Coefficients (CC) are very close to unity. Table 2 also shows that the forecasting Model-3 results in much lower values of the MAE and RMSE and a higher value of the CC than Model-1 and Model-2. These results suggest that the ANFIS has a great ability to learn from input-output patterns, which represent the lumped effects of the water level and temperatures on the displacements of the dam's crest. Overall, the performance of all three ANFIS models is very good. The results demonstrate


Figure 3: The Membership Functions for the Input Variables.


Models    Training Data
          CC       RMSE     MAE
Model-1   0.9895   3.1171   1.6154
Model-2   0.9895   3.3600   1.6347
Model-3   0.9936   2.1506   1.1483

          Testing Data
          CC       RMSE     MAE
Model-1   0.979    2.364    1.571
Model-2   0.974    2.858    1.886
Model-3   0.9996   1.066    0.701

Table 2: Evaluation of the performance of the models with training and testing data.

that the ANFIS can be successfully applied to establish forecasting models that could provide accurate and reliable daily horizontal displacement predictions of the crest of the Schlegeis arch dam.
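The three quality measures reported in Table 2 (RMSE, MAE and the correlation coefficient CC) can be computed as in the following generic sketch; this is an illustration of the standard definitions, not code from the study.

```python
import math

def rmse(obs, pred):
    """Root Mean Square Error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def mae(obs, pred):
    """Mean Absolute Error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)

def cc(obs, pred):
    """Pearson correlation coefficient (undefined for constant series)."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    return cov / (so * sp)
```

A CC close to 1 together with small RMSE/MAE, as in Table 2, indicates that forecasts track the measured displacements closely in both shape and magnitude.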

Fig. 4 and 5 show the measured and forecasted displacements together with the water level and concrete temperatures, respectively, for the training and testing phases of the ANFIS Model-3. The figures nicely demonstrate that (1) the models' performances are, in general, accurate, where all data points fall roughly onto the line of agreement; (2) Model-3 is consistently superior to Model-1 and Model-2 in the training (training + verification) and testing phases. Fig. 4 shows the performance of the fuzzy model on the training data set, which consists of measured data from the 1992-1998 period. Fig. 5 depicts the results of the validation of the fuzzy model. 80% of the data set was used for the training process and the remaining 20% for the test process.

Figure 4: Forecasts of Crest Displacements for training data


Figure 5: Forecasts of Crest Displacements for testing data

To get a brief picture of the general performance of the constructed model, we also provide the pendulum records of the crest displacements versus water level for the training and testing data, respectively, in Fig. 6 and 7.

Figure 6: Crest Displacements versus water level for training data


Figure 7: Crest Displacements versus water level for testing data

8 Conclusion

In this study, the use of a novel neuro-fuzzy model, the ANFIS, is proposed to construct a daily displacement forecasting system to ensure the structural health safety of the Schlegeis arch dam. ANFIS is a powerful fuzzy-logic neural network, which provides a method for fuzzy modeling to learn information about the data set that best allows the associated fuzzy inference system to trace the given input/output data. In general, the ANFIS models provide accurate and reliable displacement forecasting, where the correlation coefficients (CC) are very close to unity (larger than 0.99) in most cases. We conclude that the constructed ANFIS models, through subtractive fuzzy clustering, can efficiently deal with vast and complex input-output patterns and have a great ability to learn and build up a neuro-fuzzy inference system for prediction; the forecasting results provide useful guidance or a reference for deformation analysis studies. Further investigations should be carried out using the ANFIS in combination with results from structural and/or statistical analysis.

9 Acknowledgements

We would like to thank Prof. Gerald ZENZ, Dr. Franz PERNER and their valued colleagues for providing the data. In addition, the authors are indebted to the reviewers and the audience of this conference for their valuable comments and suggestions.



Modelling Deformations of a Lock by Means of Artificial Neural Networks

Hans Neuner

Geodetic Institute, Leibniz Universität Hannover

email: [email protected]

Abstract: System identification is one main task in modern deformation analysis. If the physical structure of the monitored object is unknown or not accessible, the system identification is performed in a behavioural framework. Therein, the relations between input and output signals are formulated on the basis of regression models. Artificial neural networks (ANN) are a very flexible tool for modelling especially non-linear relationships between the input and the output measures. The universal approximation theorem ensures that every continuous relation can be modelled with this approach. However, some structural aspects of the ANN-based models, like the number of hidden nodes or the amount of data needed to obtain a good generalization, remain unspecified in the theorem. Therefore, we are still facing a model selection problem. In this study the methodology of modelling the deformations of a lock occurring due to water level and temperature changes is described. We emphasize the aspect of model selection by presenting and discussing the results of two heuristic approaches for the determination of the number of hidden nodes. The first one is cross-validation. The second one is a weight deletion technique based on the exact computation of the Hessian matrix. The results of these methods are compared from the viewpoint of generalization.

Keywords: artificial neural networks, lock, model selection, system identification

1 Introduction

One main task of modern deformation analysis is the identification of the monitored object. In a system-theoretical approach this object is regarded as a system (Welsch et al., 2000). The loads acting on the object represent the input to the system, and its reaction by deformation the output. The dynamic deformation modelling aims at the time-related description of the causal chain consisting of the input, the system and the output.
In engineering geodesy the properties of the monitored system are typically derived from synchronous observations of the input and output measures in the framework of the so-called experimental system identification. The modelling equations describing the system's properties can be formulated based on physical principles or in a purely mathematical way. The first case leads to the so-called structural model. This is the most expressive way of describing the system's characteristics, due to the fact that the model coefficients are physical parameters, i.e. elasticity coefficients, coefficients of thermal expansion etc. A drawback


of this approach is its complexity and its object-specific formulation. The model's structure cannot be used for similar objects as well. This is why in a predominant number of applications the second approach to system identification is used - the so-called behavioural modelling. Therein the relationship between the system's input and output is expressed by mathematical functions. The coefficients of these functions have no or very restricted physical meaning. The determination of these coefficients is a problem of regression analysis. The generality of the model's structure makes it applicable to a large number of different object types.
In most cases of system identification in the behavioural approach, the actual deformation state y_k is modelled as a linear combination of present and past values of the loads, x_k and x_{k-l} respectively, and past deformation states y_{k-p}. The characteristics of the monitored object are represented by the coefficients g of a weighting function:

y_k = g_0 x_k + g_1 x_{k-1} + \dots + g_m x_{k-m} + e_k \qquad (1)

or alternatively by the coefficients of an AR-, MA- or ARMA-model (Box and Jenkins, 1976). Representative of this second category of models, the equation of the AR-model is given here:

y_k = a_1 y_{k-1} + a_2 y_{k-2} + \dots + a_p y_{k-p} + e_k \qquad (2)

In (1) and (2) the parameters m and p define the order of the model, which needs to be chosen prior to the solution of the regression problem. Therefore a problem of model selection has to be solved before estimating the coefficients. If the linear model leads to unsatisfactory results, one can set up higher model dimensions by formulating a non-linear relationship between the deformations and the acting loads. For applications in the field of engineering geodesy the Volterra model is quite common (Pfeufer, 1990). This model can be regarded as an extension of model (1) because it includes terms of the second order g_1 g_2, of the third order g_1 g_2 g_3, etc. It is obvious that in the case of the Volterra model the problem of model selection still has to be solved.
This paper addresses the task of system identification in the behavioural approach by modelling the deformations of a lock with respect to time and to the change of the acting loads water pressure and temperature by means of artificial neural networks (ANN). The model structure of ANN is used here because it includes, as shown by Neuner and Kutterer (2010), all the aforementioned modelling strategies. However, in this very general modelling approach one still faces the model selection problem, as will be described in the next chapter. Two strategies for solving the model selection task will be presented in chapter 3. The results obtained for the modelling of the deformations of the lock with these strategies will be described in the last chapter of the paper.
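The two behavioural predictors (1) and (2) can be sketched as plain one-step predictions; this is a generic illustration, and the coefficient vectors g and a would in practice be estimated by regression, which is not shown here.

```python
def weighting_predict(x_now_and_past, g):
    """Eq. (1) without the noise term: y_k = g_0*x_k + g_1*x_{k-1} + ... + g_m*x_{k-m}.
    x_now_and_past is ordered [x_k, x_{k-1}, ..., x_{k-m}]."""
    return sum(gi * xi for gi, xi in zip(g, x_now_and_past))

def ar_predict(y_past, a):
    """Eq. (2) without the noise term: y_k = a_1*y_{k-1} + ... + a_p*y_{k-p}.
    y_past is ordered [y_{k-1}, y_{k-2}, ..., y_{k-p}]."""
    return sum(ai * yi for ai, yi in zip(a, y_past))
```

Both are linear in their coefficients, which is exactly why the model order (m or p) has to be fixed before the regression can be solved.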

2 Feed-Forward ANN

In the context of system identification, ANN can be regarded as model structures with processing units, so-called neurons, organised in layers. The minimal configuration of a meaningful ANN consists of an input and an output layer. The number of units in these layers corresponds to the number of observed acting loads and to the number of deformation components, respectively, and can therefore be regarded as fixed with respect to the model's structure. Further processing units can be included in the model in so-called hidden layers. The number of hidden layers and the number of units contained in them are variable with respect to the model's structure.


The strength of the connection between a neuron k in the layer L and a neuron i from the previous layer (L-1) is established by the weights w_{ki}^{(L)}. These are the unknown parameters of the model that need to be estimated from the observed data. In this paper only structures of ANN are considered where the information processing is performed in just one sense through the network. Such structures are called feed-forward networks. The use of such network architectures implies that phase differences between the input and deformation measures are calculated prior to the modelling with ANN and, therefore, that the input and output data series are aligned in time. The output y from a network consisting of N_I = 2 input units, N_H units organised in 1 hidden layer and N_O = 1 output unit is given by:

y_1^{(2)} = \varphi^{(2)} \left( \sum_{j=1}^{N_H} w_{1j}^{(1)} y_j^{(1)} + b_1^{(2)} \right)
          = \varphi^{(2)} \left( \sum_{j=1}^{N_H} w_{1j}^{(1)} \varphi^{(1)} \left[ \sum_{i=1}^{2} w_{ji}^{(0)} x_i + b_j^{(1)} \right] + b_1^{(2)} \right), \qquad (3)

with:

- \varphi^{(L)} - the activation function of the units in the Lth layer, L = 1, 2,
- b_i^{(L)} - the bias term of the ith unit in the Lth layer,
- y_i^{(L)} - the output of the ith unit in the Lth layer,
- x_i - the ith observed acting load.
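The forward pass of eq. (3) for such a 2-input, one-hidden-layer, single-output network can be sketched as follows; this is a generic illustration with tanh-activated hidden units and a linear output unit (the two activation functions used later in this paper), not the implementation used in the study.

```python
import math

def forward(x, W0, b1, W1, b2):
    """Eq. (3): output of a 2 - N_H - 1 feed-forward network.
    x  : list of 2 inputs (the observed acting loads)
    W0 : N_H x 2 input-to-hidden weights w_ji^(0)
    b1 : N_H hidden biases b_j^(1)
    W1 : N_H hidden-to-output weights w_1j^(1)
    b2 : output bias b_1^(2)
    Hidden activation: tanh; output activation: linear."""
    hidden = [math.tanh(sum(W0[j][i] * x[i] for i in range(len(x))) + b1[j])
              for j in range(len(W0))]
    return sum(W1[j] * hidden[j] for j in range(len(hidden))) + b2
```

The weights and biases are the free parameters; they are what the estimation procedures of the next paragraphs adjust to the observed input/output data.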

Various activation functions are available in the technical literature (i.e. Haykin, 1999). In the present study we have considered two of the most common ones: the tanh function and the linear function. While the first one is a non-linear sigmoidal function that maps its arguments into the domain [-1, 1], the linear function leaves the arguments unchanged. Motivated by the theorem of universal approximation, we have restricted this study to network architectures that contain one hidden layer. This theorem states that networks with one hidden layer and sigmoidal activation of the units in that hidden layer are able to approximate every continuous function from one finite-dimensional space to another to any desired degree of accuracy, provided a sufficient number of hidden units (Hornik et al., 1989).
The adjustment of the ANN model to the behavior of the monitored object is based on the observed input and output data. It requires the estimation of the model parameters w_{ki}^{(L)} that lead to the global minimum of a loss function E_av. Typically, this loss function is chosen to be the sum of the squared discrepancies between computed and observed output over all samples N and all output units N_O:

E_{av} = \frac{1}{N} \sum_{i=1}^{N} E_i = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{2} \sum_{k=1}^{N_O} \left( y_{ANN,i}^{(k)} - y_{obs,i}^{(k)} \right)^2 . \qquad (4)

Due to the high non-linearity of the ANN model, the equations obtained from differentiation of the loss function with respect to the unknown weights are still non-linear. Therefore, the estimation problem cannot be solved with closed formulas. There are several methods available for obtaining a solution to the mentioned estimation problem. Two different ones were used in the present study. The first one is the gradient-based method of steepest descent (Haykin, 1999). The main idea of this method is to estimate the weights in an iterative procedure, by modifying them in the opposite direction to the gradient of the loss function at the point of the actual estimate:


w_{ji}^{(L)} = w_{ji}^{(L)} - \eta \, \frac{\partial E_{av}}{\partial w_{ji}^{(L)}} . \qquad (5)

While the gradient of the loss function with respect to the weights of the output units is obtained directly by differentiating (4), the gradient with respect to the weights of the hidden units requires a back-propagation of the error through the network.
The parameter η denotes the learning rate. It sets the magnitude of the change in the opposite direction to the gradient. In this study the estimation was started with a learning rate of 0.5. The choice of the learning rate has a direct impact on the convergence of the iterative estimation process. Due to the fact that only an adaptive learning rate ensures convergence (Bishop, 2008), η was decreased during the iterative procedure by a factor of 0.8 if the loss function decreased, and increased by a factor of 1.05 when the loss function increased. In the latter case an update of the weights was omitted.
The second estimation procedure used in this study is the Levenberg-Marquardt algorithm (LM-algorithm). The main difference to the aforementioned method of steepest descent is the use of a second-order approximation of the loss function. For a second-order approximation the Hessian matrix needs to be computed. This is a cumbersome computational task. Therefore the LM-algorithm uses an approximation of the Hessian based on the Jacobi matrix J. In the iterative process of the LM-algorithm the weights are changed according to the rule:

\Delta w = \left( J^T J + \mu I \right)^{-1} \cdot \nabla E(w) \qquad (6)

In (6) µ is a regularisation parameter that assures the invertibility of the Hessian's approximation, and I is the identity matrix. In this study we have chosen a value of 0.01 for the regularisation parameter. This small value gives a high contribution of the quadratic form - the second-order approximation - to the change of the weights.
The aspects briefly presented in this chapter make clear that there are theoretically well-founded methodologies for solving ANN models, provided that the model structure is given. While the theorem of universal approximation reveals the minimum number of hidden layers, it doesn't state the precise number of units contained in them. It does not even guarantee that this number is finite. Therefore, one still faces a problem of model selection prior to the estimation task. The solution to this problem has been the object of numerous research activities in recent years (Anthony and Bartlett, 2009). However, a unique solution has not been given yet. We will address this problem in the next chapter and present two available methodologies for finding a suitable number of units in the hidden layer.
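One LM weight update according to eq. (6) can be sketched as follows. The minus sign on the gradient reflects the usual descent convention, and the small dense Gaussian-elimination solve is a dependency-free, didactic stand-in for a numerical library routine; this is a generic sketch, not the implementation used in the study.

```python
def lm_step(J, grad, mu):
    """One Levenberg-Marquardt update, eq. (6):
    solve (J^T J + mu*I) dw = -grad and return dw.
    J    : N x P Jacobian as list of lists
    grad : length-P gradient of the loss
    mu   : regularisation parameter (0.01 in the paper)."""
    P = len(grad)
    # Build A = J^T J + mu*I and right-hand side b = -grad.
    A = [[sum(J[n][i] * J[n][j] for n in range(len(J)))
          + (mu if i == j else 0.0) for j in range(P)] for i in range(P)]
    b = [-g for g in grad]
    # Gaussian elimination with partial pivoting.
    for col in range(P):
        piv = max(range(col, P), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, P):
            f = A[r][col] / A[col][col]
            for c in range(col, P):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    dw = [0.0] * P
    for i in range(P - 1, -1, -1):
        dw[i] = (b[i] - sum(A[i][j] * dw[j] for j in range(i + 1, P))) / A[i][i]
    return dw
```

Note how µ interpolates between Gauss-Newton (µ → 0) and a small steepest-descent step (large µ): the regularisation term also guarantees that the matrix to be inverted stays well conditioned.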

3 Setting the number of units in the hidden layer

3.1 Cross-Validation

The main goal of model selection is to find a model with good generalisation properties. By generalisation is meant that the model performs well beyond the data used for the estimation of its parameters. In order to do so, the model has to capture the functional relationship that leads to the mapping of the input data onto the output data.
There is a clear distinction between the approximation and generalisation capabilities of a model. A good approximation property doesn't imply a good generalisation property. This is mainly due to observation- and system-induced noise that superimposes the error-free values of the data. A model structure that is chosen to be too complex in relation to the functional relationship that has to be captured also "memorizes" in its free coefficients the noise contained


in the data. This occurrence is called overfitting. Such a model will perform well in approximating the data used for the estimation of its parameters but extremely poorly on new data. These ideas are the basis for the procedure of selecting a suitable number of hidden units in an ANN named cross-validation. The available data set is divided into two subsets: The first data set is used for estimating the weights of a certain model structure with a specified number of hidden units. Therefore, this data is called training data. Then the input data of the second subset, which hasn't been used in the estimation process, is fed into the ANN, which runs in the forward (or prediction) mode. The discrepancy between the computed and the observed data of the second subset, expressed as the mean square error (mse), is a measure of the generalisation property of the network. This is why the data of the second subset is called test data. The relationship between training and test data is usually chosen to be 70% to 30%.
In the cross-validation procedure one starts with a small model structure which is stepwise increased by adding hidden units (s. Figure 1). For every structure the approximation and generalisation errors are computed on the basis of the training and the test data, respectively. The approximation error decreases continuously with increasing complexity of the model. In the first steps the generalisation error will also decrease until a certain model complexity is reached. Beyond this point the generalisation error increases because the model overfits the training data. The number of hidden units corresponding to the minimum of the generalisation error determines the optimal model structure and can be viewed as the solution of the model selection problem.

Figure 1: Cross-Validation
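The cross-validation procedure described above reduces to a simple selection loop. In the following sketch, `fit` and `mse` are hypothetical user-supplied callbacks standing in for network training and error evaluation; they are not part of the paper's code.

```python
def cross_validate(train, test, candidate_sizes, fit, mse):
    """Select the number of hidden units by cross-validation:
    train each candidate structure on the training set, measure its
    generalisation error on the held-out test set, keep the minimum.
    fit(data, n_hidden) -> trained model; mse(model, data) -> error."""
    best_n, best_err = None, float("inf")
    for n_hidden in candidate_sizes:
        model = fit(train, n_hidden)
        err = mse(model, test)  # generalisation error on unseen data
        if err < best_err:
            best_n, best_err = n_hidden, err
    return best_n, best_err
```

With candidate sizes {1, 2, 3, 5, 10}, as used later in this paper, the loop returns the structure whose test-set mse is minimal.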

3.2 Saliency of weights

A typical approach to model selection in linear regression analysis is to start with a large model and evaluate sequentially the significance of single coefficients. This is done by a significance test that uses the ratio between the model coefficient and its variance as a test value. In the linear case there is no problem in setting up this test value, because the variance of the coefficients results from the estimation process. This approach cannot be applied unaltered to ANN models due to their high degree of non-linearity. However, the concepts used in the linear case can be transferred to the ANN case in order to evaluate the relative importance - the saliency - of single weights of the ANN.
As in the linear case, one starts with a relatively large network and removes sequentially


connections between the units with low relative importance until a reasonable network architecture is obtained. The key of the method is of course the evaluation of the relative importance of the single weights. Due to the fact that the ANN is trained by minimizing a loss function (s. Chapter 2), it is natural to use this function for the definition of the relative importance of the weights. The saliency of a weight is defined as the change in the loss function that results from its removal from the model structure.
For the direct identification of the weight with the lowest saliency, the complete model has to be trained first. Then, in a second step, each weight is removed temporarily in turn from the model, the reduced model is trained and the corresponding change in the loss function is stored. The weight with the lowest contribution to the loss function is removed permanently from the model. However, this direct approach is computation intensive and very time consuming, especially for large networks.
Therefore, a different way of evaluating the saliency of weights is presented in (Bishop, 2008). The change of the loss function ΔE_av resulting from the deletion of a weight w_i is given in a second-order approximation by:

\Delta E_{av,w_i} = \sum_i \frac{\partial E_{av}}{\partial w_i} \Delta w_i + \frac{1}{2} \sum_i \sum_j H_{ij} \Delta w_i \Delta w_j + O(\Delta w^3) \qquad (7)

In (7) H_ij denote the elements of the Hessian matrix H. Note that the removal of a weight corresponds to a change of that weight by Δw_i = -w_i. Due to the fact that the network is trained, the first term on the right side of (7) will vanish. Thus, the variation of the loss function is mainly determined by the elements of the Hessian. If the non-diagonal terms of H are discarded - a usual procedure in evaluating the Hessian - the change in the loss function follows as:

\Delta E_{av,w_i} \approx \frac{1}{2} \sum_i H_{ii} \Delta w_i^2 \qquad (8)

If a weight w_i is removed from the model, the loss function increases according to (8) approximately by H_ii · w_i² / 2. Thus, this quantity represents a measure for the saliency of the weights.
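The saliency measure H_ii · w_i² / 2 derived above translates into a short ranking routine; the helper names below are hypothetical, and the diagonal Hessian elements are assumed to be available from the trained network.

```python
def saliencies(weights, hessian_diag):
    """Saliency of each weight per eq. (8): s_i = H_ii * w_i^2 / 2.
    hessian_diag holds the diagonal elements H_ii of the Hessian of the
    trained network's loss function (off-diagonal terms discarded)."""
    return [0.5 * h * w ** 2 for w, h in zip(weights, hessian_diag)]

def prune_order(weights, hessian_diag):
    """Indices of weights ordered from least to most salient - the
    candidates for sequential removal from the network."""
    s = saliencies(weights, hessian_diag)
    return sorted(range(len(s)), key=lambda i: s[i])
```

Removing the least salient weight first, then retraining and re-evaluating, yields the sequential pruning procedure described in this chapter.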

4 Modelling the deformations of a lock by ANN

The methodologies presented in the former two chapters were used to perform a system identification of a lock situated in Uelzen. This lock has been the object of numerous research works, especially at the Geodetic Institute of Hannover (i.e.: Neuner, 2008; Boehm and Kutterer, 2006). The deformation model of the lock was already discussed in a large number of publications related to this research work. Therefore it will not be repeated here. The deformation measurements were performed with an inductive measuring plummet system that was mounted in the tower of the tail-bay. The sampling interval is 10 min. The analysed data covers a time span of 4 days. The main acting loads causing the deformation - the change of water level in the chamber due to the activity of the lock, and the temperature - were recorded synchronously with the deformation. The analysed data is presented in Figure 2:


Figure 2: Analysed data (centred in the mean of the time series); panels show deformation [mm], water level [m] and temperature [°C] versus sampling no.

The phase differences between the deformation and the acting loads were computed with methods of time series analysis. Afterwards the time series were aligned in time prior to their modelling with ANN. The phase difference between the time series in Figure 2 is already removed.
The causal relationship between the system's input and output is obvious from Figure 2. Although one might assume a linear relationship between the two components of the causal chain, this is not the case. As described in Neuner (2008), the direction of the deformation changes during the filling and the emptying of the lock. Furthermore, the system's reaction to the changes of water level is also temperature dependent. Therefore, a non-linear model structure is chosen for the system identification.
In contrast to the identification performed in Neuner and Kutterer (2010), which was based only on water level changes, the feed-forward ANN used here contains two input units corresponding to the abovementioned acting loads. The single output unit corresponds to the deformation. The network weights were estimated using the method of steepest descent and the LM-algorithm. The values used for the learning rate η and the regularisation parameter µ were already mentioned in chapter 2.
The main task of this study is to assess the number of units in the hidden layer that leads to a model structure with good generalisation properties. For this purpose the two methods described in chapter 3 were used.
For the cross-validation method the data set was separated into two subsets: the training data set covering a time span of three days and the test data set covering one more day. The training and testing procedures were performed on model structures with 1, 2, 3, 5 and 10 hidden units. The maximum number of 10 was set in accordance with the number of samples of the training set, such that the weights of the resulting model can still be determined. The results obtained from the training and the testing of the ANNs with the method of steepest descent (msd) and with the LM-algorithm are presented in Figure 3:


Figure 3: Results of the cross-validation method

The variation of the mse with respect to the number of hidden units agrees well with the theoretical concepts described in chapter 3.1. The mse computed from the training data is, for both estimation methods, smaller than the one calculated from the test data. Figure 3 also reveals that the estimation with the LM-algorithm leads to better results than the one with the method of steepest descent. This is mainly due to the second-order approximation of the loss function used in the LM-algorithm.
One notices from Figure 3 that the major decrease in the mse is obtained between 1 and 2 hidden units. In the case of more complex structures the mse remains widely constant for the test data. One exception is the minimum for the structure with 5 hidden units obtained by estimating its weights with the LM-algorithm. This minimum differs from the mse of the simpler structures with 2 and 3 hidden units by an order of 0.01 mm. This can be an inherent variation due to the chosen initial solution for the weights. Related to the 11 supplementary parameters of the model with 5 units in the hidden layer, this decrease seems quite small. Therefore a structure containing 2 units in the hidden layer can be regarded as most suitable for the system identification.
The saliency-of-weights method requires, as described in chapter 3.2, the computation of the Hessian matrix H. This contains the second-order derivatives of the loss function E_av with respect to the weights. The elements of the Hessian were calculated in this study using symbolic differentiation. It turns out that the analytic form of the differentials is quite rapidly available, while the substitution of the values into these forms is very time consuming. Therefore the evaluation of the Hessian was restricted to just a few samples of training data.
The Hessians calculated for the structures with 1 hidden layer were badly conditioned and in some cases even singular. This might be an important result when using methods of model selection that imply the inversion of the matrix H (s. Bishop, 2008).
The main result obtained with the saliency-of-weights method is shown in Figure 4 for the network structure with 5 hidden units. Due to the structure of the ANN used in this study, the removal of a weight between the hidden and the output unit implies the removal of the particular hidden unit from the structure. Therefore, it is straightforward to analyse the saliency coefficients for these connections separately (blue line in Figure 4). For a better overview


the saliency coefficients corresponding to the connections of the inputs and the bias with a certain hidden unit were summed up (red line in Figure 4).

Figure 4: Results of the saliency-of-weights method

As can be seen from Figure 4, there are 2 units in the hidden layer with salient connections to the output unit. For the same units the grouped coefficients are also salient. Therefore, the model structure resulting from the saliency-of-weights method contains 2 units in the hidden layer. This result confirms the one obtained by cross-validation.
Therefore, it can be stated that an ANN structure with 2 tanh-activated units organised in 1 hidden layer is adequate for the system identification of the lock in Uelzen, considering the water level and temperature as inputs to the system and the deformation as the system's output.

5 Conclusions

This paper deals with the problem of model selection in the framework of system identification based on ANN. In particular, it addresses the task of choosing an appropriate number of units in the hidden layer. Two solutions were implemented in order to solve this problem: cross-validation and the saliency-of-weights method. For the monitored object, the lock in Uelzen, the results obtained with the two methods agreed very well. Thus, an adequate model structure was defined for the system identification of the lock.


References

Anthony, M. and Bartlett, P. L.: Neural Network Learning: Theoretical Foundations. Cambridge University Press, UK, 2009.

Bishop, C. M.: Neural Networks for Pattern Recognition. Oxford University Press, UK, 2008.

Boehm, S. and Kutterer, H.: Modelling the Deformations of a Lock by means of Neuro-Fuzzy Techniques. XXIII International FIG Congress, 8.-13. October 2006, Munich, Germany, 2006.

Box, G. and Jenkins, G.: Time Series Analysis: Forecasting and Control. Holden-Day, San Francisco, USA, 1976.

Haykin, S.: Neural Networks: A Comprehensive Foundation. 2nd edition, Pearson Education, Singapore, 1999.

Hornik, K., Stinchcombe, M. and White, H.: Multilayer Feedforward Networks are Universal Approximators. Neural Networks, Vol. 2, pp. 359-366, Pergamon Press, 1989.

Neuner, H.: Zur Modellierung und Analyse instationärer Deformationsprozesse. Wissenschaftliche Arbeiten der Fachrichtung Geodäsie und Geoinformatik der Leibniz Universität Hannover, Nr. 269, 2008.

Neuner, H. and Kutterer, H.: Modellselektion in der ingenieurgeodätischen Deformationsanalyse. In: Wunderlich, Th. A. (Ed.): Beiträge zum 16. Internationalen Ingenieurvermessungskurs, München, 2010, pp. 199-210, Wichmann, Berlin, 2010.

Pfeufer, A.: Beitrag zur Identifikation und Modellierung dynamischer Deformationsprozesse. Vermessungstechnik (38), No. 1, pp. 19-22.

Welsch, W., Heunecke, O. and Kuhlmann, H.: Auswertung geodätischer Überwachungsmessungen. In the series: Möser, M., Müller, G., Schlemmer, H. and Werner, H. (Eds.): Handbuch Ingenieurgeodäsie. Wichmann Verlag, Heidelberg, 2000.


Recognition of building features in high-resolution SAR and optical imagery

Jan Dirk Wegner, Alexander Schunert & Uwe Sörgel

Institute of Photogrammetry and GeoInformation, Leibniz Universität Hannover

email: wegner, schunert, [email protected]

Abstract

In data of modern high-resolution SAR sensors like TerraSAR-X (TSX), man-made objects become visible in high detail. However, layover, occlusions, and interfering backscatter of different objects within the same resolution cell complicate automatic analysis, particularly in urban areas. Two possibilities to facilitate interpretability are the use of additional data sources and of time series of SAR images. We will present the current status of two research projects concerning those possibilities. First, we show building detection results achieved with a combination of SAR and optical aerial images. Features are extracted and analysed in a Conditional Random Field (CRF) framework exploiting context knowledge. Second, the persistent scatterer (PS) interferometric SAR (InSAR) technique is applied to discover building deformations in a TSX time series. Those PS are grouped and interpreted using a 3D city model.

Keywords: Building detection, Conditional Random Fields, Persistent Scatterer, Data fusion.

1 Introduction

In this paper we deal with the idea of introducing some kind of prior knowledge (or context information) in order to automatically interpret high-resolution SAR data. Furthermore, we introduce additional data to facilitate the recognition and analysis of characteristic patterns. First, Conditional Random Fields (CRF), introduced by Lafferty et al. (2001) and adapted to imagery by Kumar and Hebert (2003), are applied to combine high-resolution airborne InSAR data with an orthophoto for building detection. Second, persistent scatterers (PS) are estimated from time series of TSX images of the city of Berlin. In order to estimate deformations more robustly, the PS are grouped and assigned to particular buildings with the aid of a 3D city model. In the next section an overview of the CRF approach is given, followed by the PS analysis supported by a 3D city model.

2 Conditional Random Fields

The basic idea is to extract characteristic features for buildings in both optical and InSAR data, to insert both feature sets into a single feature vector, and to finally classify the data


based on this feature vector using a CRF. CRFs are graphical models and thus provide probabilities of the final labeling instead of just decisions. Those probabilities are very useful for post-processing or decision making. Moreover, they are undirected graphical models (i.e., random fields) as opposed to, for example, Bayesian nets, which are directed graphical models. Their main advantage is that they do not suffer from the label bias problem, which states that labels with fewer successors in the tree are preferred (Lafferty et al., 2001). Furthermore, unlike Markov Random Fields (MRF), CRFs are globally conditioned on all observations x, making them highly flexible for context modelling. They are discriminative models and thus model only the posterior distribution P(y|x) of labels y given data x, as opposed to MRFs, which model the joint distribution of data and labels. CRFs as introduced by Lafferty et al. (2001) are defined as follows (x contains all observations and y all labels):

Let G = (V, E) be a graph such that y = (y_v), v ∈ V, so that y is indexed by the vertices of G. Then (x, y) is a conditional random field if, when conditioned on x, the random variables y_v obey the Markov property with respect to the graph: p(y_v | x, y_w, w ≠ v) = p(y_v | x, y_w, w ∼ v), where w ∼ v means that w and v are neighbors in G.

The most common CRF approach is based on sufficient statistics of exponential functions:

    P(y|x) = (1/Z(x)) exp( Σ_{i∈S} A_i(x, y_i) + Σ_{i∈S} Σ_{j∈N_i} I_ij(x, y_i, y_j) )    (1)

The association potential A_i(x, y_i) measures how likely it is that a site i takes label y_i given all data x (see Eq. 2). Data x in our case are the orthophoto and the InSAR data. We use a generalized linear model to distinguish building and non-building sites in the association potential.

    A_i(x, y_i) = exp( y_i w^T h_i(x) )    (2)

Vector h_i(x) contains all node features. We take the means of the red channel, green channel, hue (Fig. 1(b)), and saturation as orthophoto features. In addition, features based on the gradient orientation histogram are used. As InSAR features we extract bright double-bounce lines, overlay them with a segmentation of the orthophoto, and calculate distance maps of such segments (Wegner et al., 2009) (Fig. 1(d)). Vector w^T contains the weights of the features in h_i(x), which are tuned during the training process.
The interaction potential I_ij(x, y_i, y_j) determines how two sites i and j should interact regarding all data x (see Eq. 3). In our case, the feature vector μ_ij(x) is simply calculated by subtracting the single-scale feature vector of site j from that of the site i of interest: μ_ij(x) = h_i(x) − h_j(x). However, in general μ_ij(x) could also be based on other features than those already used for the association potential, and other methods of comparing the features are possible, too. Vector v^T contains the weights of the features, which are adjusted during the training process. y_i is the label of the site of interest and y_j the label it is compared to. Unlike clique potentials in MRFs, label y_j does not necessarily have to be the label of a site j in the local neighborhood of y_i.

    I_ij(x, y_i, y_j) = exp( y_i y_j v^T μ_ij(x) )    (3)

In order to obtain a posterior probability P(y|x) of labels y conditioned on data x, the exponential of the sum of association potential and interaction potential is normalized by division through the partition function Z(x), which is a constant for a given data set.
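For a tiny graph, the posterior of Eq. (1) can be evaluated exactly by brute force, which makes the roles of the association potential, the interaction potential, and the partition function Z(x) concrete. The sketch below uses made-up features and weights on a 4-site chain; it keeps the potentials in the log domain, and the exhaustive enumeration behind Z(x) is only feasible for such toy graphs (the actual experiments use Loopy Belief Propagation for inference).

```python
import itertools
import numpy as np

# Toy CRF with the paper's log-linear potential forms (in the log domain):
#   A_i(x, y_i)       = y_i * w^T h_i(x)           (association)
#   I_ij(x, y_i, y_j) = y_i * y_j * v^T mu_ij(x)   (interaction)
# with labels y_i in {-1, +1}. Features, weights, and the 4-site chain
# are illustrative assumptions, not the paper's actual data.
rng = np.random.default_rng(1)

n_sites = 4
edges = [(0, 1), (1, 2), (2, 3)]      # neighbourhoods N_i as a chain
h = rng.normal(size=(n_sites, 3))     # node features h_i(x)
w = np.array([0.8, -0.3, 0.5])        # association weights
v = np.array([0.4, 0.1, -0.2])        # interaction weights

def log_potential(y):
    assoc = sum(y[i] * w @ h[i] for i in range(n_sites))
    mu = lambda i, j: h[i] - h[j]     # mu_ij(x) = h_i(x) - h_j(x)
    inter = sum(y[i] * y[j] * v @ mu(i, j) for i, j in edges)
    return assoc + inter

# Brute-force partition function Z(x) and posterior P(y|x)
labelings = list(itertools.product([-1, 1], repeat=n_sites))
scores = np.array([log_potential(np.array(y)) for y in labelings])
Z = np.exp(scores).sum()
posterior = np.exp(scores) / Z

best = labelings[int(np.argmax(posterior))]
print(best, posterior.max())
```

The normalization by Z(x) is exactly what makes the output a probability over labelings rather than an unnormalized score, which is what enables the post-processing mentioned above.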



Figure 1: (a) One test region of the orthophoto, (b) corresponding hue image, (c) AeS-1 SAR image of the same region (© Intermap), (d) distance map of segments that overlap with SAR double-bounce lines, (e) building ground truth, and building detection results with Maximum Likelihood (f), Support Vector Machines (g), and Conditional Random Fields (h)

Classifier                 | DTR μ | DTR σ | FPR μ | FPR σ
---------------------------|-------|-------|-------|------
Maximum Likelihood         |  61   |   6   |  13   |   6
Support Vector Machine     |  85   |   5   |  24   |   7
Conditional Random Field   |  85   |   7   |  27   |   8

Table 1: Mean μ and standard deviation σ of detection rate (DTR) and false positive rate (FPR) of first ML, SVM, and CRF experiments, in percent

Table 1 shows preliminary results using the limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) approach (Liu and Nocedal, 1989) for training and Loopy Belief Propagation (LBP) (Frey and MacKay, 1998) for inference. In order to evaluate the performance of the current CRF set-up, we compare building detection results to those of a Maximum Likelihood (ML) classifier and Support Vector Machines (SVM) (see the corresponding label images in Fig. 1 (f, g, h)). The CRF performs better than ML and delivers results on the same level as the SVM. In order to further improve the CRF performance compared to the SVM, context has to be modelled in a more sophisticated way in the interaction potential.

3 Persistent Scatter InSAR

Persistent Scatterer InSAR (PSI) is an extension of the classical InSAR approach mitigating the effects of atmospheric disturbances and temporal decorrelation. This is achieved by the use of a stack of SAR images and the restriction of the processing to a set of radar targets referred to as Persistent Scatterers, which exhibit a coherent backscattering behaviour over time. The outcome of a PSI analysis is an estimate of the deformation and the height of each scatterer with respect to some reference point. The PSI algorithms of the first generation, which were mainly applied to stacks of ERS and ENVISAT images, essentially applied signal


processing techniques to discriminate between the deformation and height signal on the one hand and nuisance terms like atmosphere on the other hand (Ferretti et al., 2000). Thereby only little knowledge about the relationship of the PS to each other was used, which was mainly due to the quite low spatial resolution of the ERS and ENVISAT data. With the advent of high-resolution SAR sensors like TerraSAR-X, which provide a spatial resolution of up to one meter, this has fundamentally changed. While it is hard to assign PS even to single buildings in the ERS case, there are usually plenty of PS found for every building in the TerraSAR-X case, which can be exploited for the PSI analysis (Bamler et al., 2009). The main idea is to group the PS into reasonable clusters and to work with these clusters instead of working with the single PS. One way to cluster the PS is the use of building features like building outlines, which can be extracted from 3D city models. After the projection of these building features into the radar geometry, a matching with the PS set is conceivable, which results in an assignment of PS to building features. Finally, the relationship of the PS to each other can be inferred, offering the opportunity to introduce constraints into the PSI analysis. An intermediate result of this procedure applied to a stack of TSX high-resolution spotlight images of Berlin is shown in Figure 2. On the left side the 3D city model is depicted, which is taken from Google Earth. The two buildings of interest are framed with a dashed and a solid rectangle, respectively. On the right-hand side the appropriate part of the SAR image is depicted together with the radar-coded building features, which are illustrated in dashed and solid style respectively, corresponding to the target buildings. The features used here are the approximate building outline and rows of windows.
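A minimal sketch of the grouping step: assigning PS to radar-coded building footprints reduces, in the simplest 2D case, to a point-in-polygon test per scatterer. The footprints and PS coordinates below are made-up stand-ins; a real implementation would work with projected building features and handle layover and ambiguous assignments far more carefully.

```python
import numpy as np

# Group persistent scatterers (PS) by building footprint: a ray-casting
# point-in-polygon test assigns each PS to the first outline containing
# it. All coordinates are hypothetical stand-ins for radar-coded data.

def point_in_polygon(p, poly):
    """Even-odd rule: count edge crossings of a horizontal ray from p."""
    x, y = p
    inside = False
    n = len(poly)
    for k in range(n):
        (x1, y1), (x2, y2) = poly[k], poly[(k + 1) % n]
        if (y1 > y) != (y2 > y):                     # edge spans the ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

buildings = {
    "A": [(0, 0), (4, 0), (4, 3), (0, 3)],           # rectangular outlines
    "B": [(6, 1), (9, 1), (9, 5), (6, 5)],
}
ps = np.array([(1.0, 1.0), (3.5, 2.0), (7.0, 4.0), (5.0, 0.5)])

clusters = {name: [] for name in buildings}
unassigned = []
for p in ps:
    for name, poly in buildings.items():
        if point_in_polygon(p, poly):
            clusters[name].append(tuple(p))
            break
    else:                                            # no footprint matched
        unassigned.append(tuple(p))

print(clusters, unassigned)
```

Once PS are clustered per building, relative constraints within a cluster can be formulated, which is the opportunity the text describes.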

Figure 2: Left: 3D city model of a site in Berlin with two buildings of interest framed by the two rectangles. Right: TSX image of this site overlaid with the radar-coded building features.

References

Frey, B.J. and MacKay, D.J.C.: A revolution: Belief propagation in graphs with cycles. In: M.I. Jordan, M.J. Kearns, S.A. Solla (Eds.), Advances in Neural Information Processing Systems, vol. 10, MIT Press, 1998.

Kumar, S. and Hebert, M.: Discriminative Random Fields: A Discriminative Framework for Contextual Interaction in Classification. In: Proc. IEEE Int. Conf. on Computer Vision, 2003, vol. 2, pp. 1150-1157.

Lafferty, J., McCallum, A. and Pereira, F.: Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. In: Proc. Int. Conf. on Machine Learning, 2001, 8 p.

Liu, D.C. and Nocedal, J.: On the limited memory BFGS method for large scale optimization. Mathematical Programming, 1989, vol. 45, no. 1-3, pp. 503-528.

Wegner, J.D., Thiele, A. and Soergel, U.: Fusion of optical and InSAR features for building recognition in urban areas. International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, vol. 38, part 3/W4, 2009, pp. 169-174.

Bamler, R., Eineder, M., Adam, N., Zhu, X., Gernhardt, S.: Interferometric Potential of High Resolution Spaceborne SAR. Photogrammetrie - Fernerkundung - Geoinformation, no. 5, 2009, pp. 407-419.

Ferretti, A., Prati, C., Rocca, F.: Permanent scatterers in SAR interferometry. IEEE Trans. Geosci. Remote Sens., vol. 39, no. 1, 2000, pp. 8-20.


Application of an Evolutionary Strategy in Satellite Geodesy

Enrico Mai

Satellite Geodesy and Earth System Research Group
Institute of Geodesy and Geoinformation Science
Berlin University of Technology
email: [email protected]

Abstract

Evolutionary strategies (ES) can be used to implement technical optimization algorithms for real-world problems. They are universal, undemanding, close to reality, easy to implement, and can be considered as a compromise between volume- and path-orientated searches for the optimal solution. Yet, there is no guarantee of finding the global optimum, and the convergence speed might be lower compared to methods that are tuned to a specific problem. It is demonstrated that ES are capable of dealing with typical geodetic problems: determination of a low-dimension Earth gravity field, determination of some orbital elements of a satellite, and simultaneous calculation of the offset, secular trend, and periodical parameters (frequencies, amplitudes, phases) for a given tabulated time series.

Keywords: Evolutionary Strategy, Gravity Field, Orbital Determination, Spectral Analysis.

1 Motivation

Evolutionary Strategies (ES), introduced in the early 1970s by Rechenberg (1994), have seen many improvements within the last decades. Nowadays, this approach can be regarded as an alternative to standard optimization techniques in many scientific areas, see for instance Alvers (1998) or Kursawe (1999), especially in cases where gradient methods like the classical least-squares algorithm fail.
This paper first summarizes the advantages and drawbacks of this special optimization technique (ES) and compares it with the other branch of Evolutionary Algorithms (EA), namely the Genetic Algorithms (GA), as promoted by Goldberg (1993), Goldberg (2002). Sometimes there is confusion about the differences between ES and GA. Therefore, Table 1 outlines the main properties of both techniques. Being aware of the fundamental differences, one is able to choose the appropriate method to attack a given problem, depending on practical needs.
Compared to other optimization techniques, ES algorithms are relatively easy to realize, because the main idea behind them is very simple (Rechenberg, 1994). They are universal, undemanding, close to reality, robust, and can be considered as a compromise between volume- and path-orientated search strategies. Once implemented, the same algorithm can be applied


to a wide range of problems without any big changes. In many cases it is even sufficient just to set up the new performance index that is specific to the actual problem; one rarely needs any additional a priori insight into the mathematical/physical nature of the optimization task.

Genetic Algorithms                           | Evolutionary Strategies
---------------------------------------------|---------------------------------------------
imitation of the cause                       | imitation of the effect
mutation at the genotype                     | mutation at the phenotype
more experimental, less theoretical          | sustained theoretically
mainly driven by recombination + selection   | mainly driven by mutation + selection
  (low mutation rate, soft selection)        |   (high mutation rate, rough selection)
very large population size                   | relatively small population size
no waste of any offspring                    | variable selection pressure is possible
parents of good quality will be reproduced   | all parents have the same chance to
  more likely (proportional to fitness)      |   become reproduced (same probability)
faster convergence in the beginning          | needs some time for step size adaptation
poor fine tuning capabilities                | parameters can be adjusted precisely
strong causality may become violated         | the more robust, the slower
recombination result mainly just reflects    | there is no need to transform into other
  the way of encoding                        |   representations (e.g., binary)

Table 1: Genetic Algorithms vs. Evolutionary Strategies

The only necessary condition for the ES to be applicable to a specific problem is the inherent existence of strong causality, not to be confused with weak causality (Rechenberg, 1994). On the other hand, there is no guarantee to actually find the global optimum. In addition, the convergence speed of an ES algorithm might be lower compared to some alternative methods that are tuned to a specific problem.
Many contributions have already been made in order to improve the original ES theory and algorithms, see for instance Ostermeier (1997), or Hansen and Ostermeier (2001). This paper makes use of an ES with adaption of the covariance matrix (ES-CMA) as described in detail in Hansen and Ostermeier (1996). The programming was done in MATHEMATICA, simply on a single-core PC. All of the strategy parameters were chosen empirically here; this could be avoided by applying a Meta-ES. Such an algorithm optimizes its own strategy parameters.

2 Determination of Earth Gravity Field Coefficients

Details on this application can be found in Mai (2008). The goal is to find some parameters of Earth's gravity field, traditionally represented by spherical harmonics c_nm and s_nm up to a given maximum degree n_max and order m_max. Here we solve for a 4×4 gravity field. Under certain assumptions (Mai, 2005) this leads in total to 21 unknowns (spherical harmonics), so we are dealing with a 21-dimensional optimization problem.
Earth's gravity field directly influences the motion of an orbiting satellite; we can treat the latter as a test mass for an unknown force field. In order to determine its parameters, a certain number of satellite positions (either observed in reality or simulated as in this study) is given. In this case, we get a so-called inverse problem in satellite geodesy.
We are searching for an optimal set of spherical harmonics, the usage of which leads to calculated positions r_i^c. Comparing these position vectors with the simulated ones r_i^s yields a number of deviations Δr_i := r_i^s − r_i^c. These differences should not exceed a chosen threshold value. Depending on the norm, the performance index (quality criterion) may be defined as


    Q = Σ_{i=1}^{N} ‖Δr_i‖ → min,    (1)

where N is the total number of given satellite positions.
The following example uses N = 90 vectors. The simulation was done by applying numerical orbit integration using a slim version of the well-established UTOPIA software package, as provided by the University of Texas at Austin. It is based on a Krogh-Shampine-Gordon numerical integrator. The following Keplerian elements were used as initial values:

    a_0 = 7000 km, e_0 = 0.007, i_0 = 70 deg, Ω_0 = 0, ω_0 = 0, M_0 = −70 deg.    (2)

Applying a fixed integration step size (= output step size) of 60 seconds leads to a simulated orbital arc of 90 minutes length. At an intentionally low orbital height of approximately 630 kilometers (to be fairly sensitive to gravitational effects), the satellite will almost complete one single revolution within that time interval.
For the ES algorithm it is not necessary to choose initial values for the unknowns that are close to reality or simulation, respectively. A user does not need to care about reasonable values at all; he might even just take zeros instead. A bad first guess, against all expectations, theoretically has no influence on the duration of the optimization procedure, especially in high-dimensional problems (Rechenberg, 1994).
In this case, a (1,40)-ES-CMA was realized. For each new generation there is only 1 parent but 40 descendants, and only the (mutated) offspring is subject to selection afterwards. This is denoted by the comma within the round bracket, following the usual ES notation; a plus sign would imply that the parents together with the offspring take part in the selection step. The optimization procedure contains an adaption of the covariance matrix, which describes the mutability of the unknowns.
Depending on the chosen threshold value (= termination quality Q*), the optimization runtime can vary greatly. In general, most of the time is spent on (several) adaption phase(s), but once these are done, the quality can improve dramatically by several orders of magnitude within a short time. With λ denoting the number of offspring, and G being the number of generations it takes to reach the threshold value, the computational effort can be expressed by the total number of function calls (performance index evaluations) λ · G.
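The generational loop of such a comma strategy can be sketched compactly. The following is a minimal (1,40)-ES with log-normal step-size self-adaptation, minimizing a stand-in sphere function with 21 unknowns instead of the orbit-fit quality (1); the full ES-CMA additionally adapts a covariance matrix of the mutation distribution, which this sketch omits.

```python
import numpy as np

# Minimal (1, lambda)-ES with comma selection: only the mutated offspring
# compete, the parent is always discarded. The quality function Q is an
# illustrative sphere function, not the paper's orbit-fit residual.
rng = np.random.default_rng(2)

def Q(w):                       # stand-in performance index
    return np.sum(w ** 2)

dim, lam = 21, 40               # 21 unknowns, 40 offspring per generation
parent = rng.normal(size=dim)   # a bad first guess is acceptable for ES
sigma = 1.0                     # global mutation step size
tau = 1.0 / np.sqrt(2 * dim)    # learning rate for step-size adaptation

for generation in range(400):
    # Each offspring first mutates the step size (log-normal), then the
    # object variables; the best offspring replaces the parent.
    sigmas = sigma * np.exp(tau * rng.normal(size=lam))
    offspring = parent + sigmas[:, None] * rng.normal(size=(lam, dim))
    qualities = np.array([Q(o) for o in offspring])
    best = int(np.argmin(qualities))
    parent, sigma = offspring[best], sigmas[best]
    if qualities[best] < 1e-10:  # termination quality Q*
        break

print(generation, Q(parent))
```

The computational effort is again λ · G quality function evaluations, and no derivative or normal equation matrix is needed at any point, which is the property the text emphasizes.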


Figure 1: Logarithm of the absolute residual values of the unknowns vs. generation number.


Figures 1 and 2 show all of the interim results of a single ES optimization run for the given example. Adaption phases of the covariance matrix are clearly visible. In practice, one can try to save some of this time by incorporating any available problem-specific pre-knowledge. Remark: a diagram like Figure 1 is of course only available in case of a simulation, where the optimal solution is actually known.


Figure 2: Quality following equation (1) in m vs. generation number.

Table 2 presents the final values for all unknowns and compares them to the original spherical harmonics (based on the Joint Gravity Model JGM-3) as used for the satellite orbit simulation by UTOPIA. It also enables a glimpse at the orbit's sensitivity to gravity changes. The termination quality was set to Q* = 1/1000 mm, which is way beyond the accuracy levels of any of the available satellite observation techniques (GPS, SLR, etc.).

n  m  c_nm                        s_nm
2  0  −4.8416954834480 · 10^−04   -
3  0  +9.5717060002975 · 10^−07   -
4  0  +5.3977705833457 · 10^−07   -
2  1  −1.8694714700433 · 10^−10   +1.1954500954474 · 10^−09
3  1  +2.0301372076698 · 10^−06   +2.4813079540691 · 10^−07
4  1  −5.3624358305647 · 10^−07   −4.7377249759825 · 10^−07
2  2  +2.4392609849473 · 10^−06   −1.4002665205972 · 10^−06
3  2  +9.0470636114776 · 10^−07   −6.1892285463862 · 10^−07
4  2  +3.5067012168619 · 10^−07   +6.6257136424735 · 10^−07
3  3  +7.2114491711647 · 10^−07   +1.4142039502771 · 10^−06
4  3  +9.9086882512345 · 10^−07   −2.0098746087090 · 10^−07
4  4  −1.8848146556533 · 10^−07   +3.0884815006772 · 10^−07

Table 2: Final result of the ES optimization. All digits identical with the original spherical harmonics are in bold print. For details about JGM-3 values and parameters, see Mai (2005).


The final values are, on average, precise to the level of 10^−13. This gives an idea about the achievable absolute resolution of similar inverse problems. In fact, several gravity fields, differing in degree and order, were tested. Figure 3 depicts the evolving optimal solution for a 3×3 example by showing some of the resulting satellite orbits.

Figure 3: Resulting satellite orbits around Earth (looking directly onto the orbital plane), for a 3×3 gravity field. Only a few selected orbits are actually shown; from generation 105 (light gray) till generation 330 (black). In total 747 generations were necessary to reach Q*.

The applied algorithm did not make use of any pre-knowledge from celestial mechanics or physics. Reasoned conditions might be imposed on the solution of this inverse problem. As an example, there exist certain integrals of motion that could be accounted for by adding penalty terms (Alvers, 1998) to the quality function. In doing so, the algorithm can sort out unfitted solutions more quickly, but only if condition evaluations take just a little extra time.
Remark: please be aware that the whole optimization was done without any inversion of a (normal equation) matrix. For high-resolution gravity field models there are hundreds of thousands of unknowns, and the commonly applied least-squares methods have to invert a correspondingly huge normal equation matrix. This imposes a heavy computational burden and takes much time and space in terms of memory.

3 Determination of a Satellite’s Orbital Parameters

The goal is to find the solution to a seemingly simple boundary value problem. Given are two position vectors r_A and r_B, valid at different epochs t_A and t_B with t_B > t_A (to fix the sense of direction for the satellite's motion), and a known force field (e.g., a 4×4 gravity field). The positions may be the result of some ground-based directional measurements (Figure 4). The fundamental task now is to transform the original boundary value problem into an initial


value problem; e.g., we search for the corresponding initial velocity vector v_A. By knowing the initial state vector z_A := (r_A, v_A)^T and the arc length, i.e., the time of flight t_B − t_A, the satellite orbit between A and B can be determined using traditional methods. Here, we apply the same numerical integration subroutines from UTOPIA as in the last section; but this time we assume an 8×8 gravity field, again following JGM-3.


Figure 4: Exemplary boundary value problem: given t_A, r_A, t_B, r_B and a force field, find v_A.

In this case, there are only three unknowns, namely the cartesian components of the initial velocity vector v_A. It is absolutely sufficient to be familiar with the equation of motion of the perturbed two-body problem and its numerical integration. We do not require any other theoretical knowledge about satellite geodesy or celestial mechanics, e.g., the availability of integrals of motion.
Regarding the ES, we take updated values for the unknowns and perform classical numerical integrations using UTOPIA. This yields current final state vectors z_B^UT := (r_B^UT, v_B^UT)^T, from which we take just the position information. Comparing it to the originally given final position vector r_B, a very simple and easy-to-evaluate performance index can be defined:

    Q = ‖Δr_B‖ := ‖r_B − r_B^UT‖ → min.    (3)
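Under simplifying assumptions, the boundary value problem can be made concrete as a shooting problem. The sketch below replaces the 8×8 gravity field by a pure point-mass field and the ES by a finite-difference Newton iteration on the miss vector r(t_B; v_A) − r_B; the boundary values are generated from a made-up circular orbit, not from the paper's values.

```python
import numpy as np

# Shooting sketch: find v_A such that integrating the two-body equation
# of motion from (r_A, v_A) over t_B - t_A hits r_B. Point-mass field and
# all numbers are illustrative assumptions.
MU = 398600.4418                # Earth's GM in km^3/s^2

def accel(r):
    return -MU * r / np.linalg.norm(r) ** 3

def propagate(r, v, t_total, h=60.0):
    """RK4 integration of the two-body problem; returns r(t_total)."""
    y = np.concatenate([r, v])
    f = lambda y: np.concatenate([y[3:], accel(y[:3])])
    for _ in range(int(round(t_total / h))):
        k1 = f(y)
        k2 = f(y + 0.5 * h * k1)
        k3 = f(y + 0.5 * h * k2)
        k4 = f(y + h * k3)
        y = y + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return y[:3]

# Simulate the "given" boundary values from a circular 7000 km orbit.
r_A = np.array([7000.0, 0.0, 0.0])
v_true = np.array([0.0, np.sqrt(MU / 7000.0), 0.0])
t_flight = 90 * 60.0
r_B = propagate(r_A, v_true, t_flight)

# Newton iteration on the miss vector, Jacobian by finite differences.
v = v_true + np.array([0.05, -0.05, 0.02])   # deliberately bad guess
for _ in range(8):
    miss = propagate(r_A, v, t_flight) - r_B
    J, eps = np.empty((3, 3)), 1e-6
    for k in range(3):
        dv = np.zeros(3); dv[k] = eps
        J[:, k] = (propagate(r_A, v + dv, t_flight) - r_B - miss) / eps
    v = v - np.linalg.solve(J, miss)

print(np.linalg.norm(v - v_true))    # residual velocity error in km/s
```

The same Q(v_A) of Eq. (3) could instead be handed to the (1,40)-ES-CMA, which needs no Jacobian at all; the Newton variant merely illustrates that the problem is well posed for arcs shorter than one revolution.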

As an example, the following boundary values are given for t_A = 0 sec and t_B = 90 min:

    r_A = ( 2301.718 292 292 185 km, −2255.051 484 571 533 km, −6195.703 033 567 912 km )^T,
    r_B = ( −984.488 266 610 331 km, −2371.119 117 315 444 km, −6520.625 324 462 045 km )^T.    (4)

In fact, the values for r_B were obtained from a simulated orbit using UTOPIA and (2). Therefore, as within the last section, true values for the unknowns are at hand in this artificial case. Transforming (2) into cartesian elements leads to

    v_A^true = ( 7.124 581 369 839 439 km/sec, 0.868 731 490 519 958 km/sec, 2.386 820 153 772 743 km/sec )^T.    (5)


Again, we can plot residuals of the unknowns, i.e., the components of |Δv_A| := |v_A^true − v_A^ES|. Figure 5 shows all interim results of a successful (1,40)-ES-CMA run.


Figure 5: Logarithm of the absolute residual values of the unknowns vs. generation number.

The ES was able to reproduce the true values (5) with high precision. The termination quality was set to Q* = 1 · 10^−16 km = 1 · 10^−10 mm, which again exceeds the accuracy level of any conceivable application. A matching solution was found after 145 generations and 5800 quality function evaluations, respectively. Figure 6 shows the corresponding quality plot.


Figure 6: Quality following equation (3) in km vs. generation number.

Figure 7 illustrates how changing velocity vectors v_A^ES lead to different orbits, and therefore to different final position vectors r_B^UT (emphasized by larger plot points). The corresponding orbit for generation 0, i.e., the initial guess, is also shown; it is the leftmost.
Remark: the vector (0, 0, 0)^T was not chosen as the initial guess v_A^(0) for the ES algorithm, because this choice would lead to singularities in the corresponding Keplerian elements. These elements describe the size, shape, spatial orientation, and relative position of a celestial body within an osculating or momentary ellipse. However, a satellite released in space with zero initial velocity w.r.t. Earth would just fall straight towards Earth's center of gravity. The shape of the resulting trajectory would not be elliptical. UTOPIA expects Keplerian elements


as input for its numerical integration, thus a somewhat arbitrary but eligible velocity vector (5 km/sec, 5 km/sec, 5 km/sec)^T was chosen as the initial guess.

Figure 7: Resulting satellite orbits around Earth, depending on v_A^ES. Only a few selected orbits are actually shown, from generation 1 (light gray) to generation 30 (black). In total, 145 generations were necessary to reach Q*.

Other ES applications in the field of orbit determination are conceivable (Mai, 2010).
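To make the workflow concrete, the following deliberately simplified sketch replaces UTOPIA by a plain two-body RK4 propagator and the ES-CMA by a (1, λ)-ES with an isotropic, self-adapted step size. All names, the toy boundary values (a short circular-orbit arc instead of (4)), and the mutation rule are illustrative assumptions, not the actual implementation:

```python
import math
import random

MU = 398600.4418  # km^3/sec^2, Earth's gravitational parameter

def accel(r):
    """Two-body gravitational acceleration at position r (km)."""
    d = math.sqrt(r[0]**2 + r[1]**2 + r[2]**2)
    k = -MU / d**3
    return [k * r[0], k * r[1], k * r[2]]

def propagate(r0, v0, t_end, dt=10.0):
    """RK4 propagation of the two-body problem (a stand-in for UTOPIA)."""
    r, v = list(r0), list(v0)
    t = 0.0
    while t < t_end - 1e-9:
        h = min(dt, t_end - t)
        k1r, k1v = v, accel(r)
        r2 = [r[i] + 0.5 * h * k1r[i] for i in range(3)]
        v2 = [v[i] + 0.5 * h * k1v[i] for i in range(3)]
        k2r, k2v = v2, accel(r2)
        r3 = [r[i] + 0.5 * h * k2r[i] for i in range(3)]
        v3 = [v[i] + 0.5 * h * k2v[i] for i in range(3)]
        k3r, k3v = v3, accel(r3)
        r4 = [r[i] + h * k3r[i] for i in range(3)]
        v4 = [v[i] + h * k3v[i] for i in range(3)]
        k4r, k4v = v4, accel(r4)
        r = [r[i] + h / 6 * (k1r[i] + 2*k2r[i] + 2*k3r[i] + k4r[i]) for i in range(3)]
        v = [v[i] + h / 6 * (k1v[i] + 2*k2v[i] + 2*k3v[i] + k4v[i]) for i in range(3)]
        t += h
    return r

def quality(vA, rA, rB, t_end):
    """Performance index (3): distance between propagated and given endpoint."""
    rUT = propagate(rA, vA, t_end)
    return math.sqrt(sum((rB[i] - rUT[i])**2 for i in range(3)))

random.seed(1)
rA = [7000.0, 0.0, 0.0]                      # km
v_true = [0.0, math.sqrt(MU / 7000.0), 0.0]  # circular-orbit velocity, km/sec
t_end = 600.0                                # sec
rB = propagate(rA, v_true, t_end)            # simulated "given" final position

# (1, lambda)-ES with self-adapted isotropic step size (no covariance adaptation)
parent, sigma = [5.0, 5.0, 5.0], 1.0         # arbitrary but eligible initial guess
lam, tau = 20, 1.0 / math.sqrt(3.0)
Q0 = quality(parent, rA, rB, t_end)
q_best = Q0
for gen in range(150):
    best = None
    for _ in range(lam):
        s = sigma * math.exp(tau * random.gauss(0.0, 1.0))
        child = [parent[i] + s * random.gauss(0.0, 1.0) for i in range(3)]
        q = quality(child, rA, rB, t_end)
        if best is None or q < best[0]:
            best = (q, child, s)
    parent, sigma = best[1], best[2]          # comma selection: best child survives
    q_best = min(q_best, best[0])
print(q_best < Q0)  # True
```

Even this crude variant typically drives the performance index down by several orders of magnitude from the initial guess; the covariance matrix adaptation of the real ES-CMA mainly accelerates the final convergence.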

4 Spectral Analysis, also allowing for an Offset and a Secular Trend

The goal is to simultaneously determine periodical parameters of a given time series (t_k, y_k), which possibly is superimposed by a first degree polynomial. Hence, additionally to some frequencies f_i, amplitudes a_i, and phases φ_i, the ES algorithm shall detect an offset n and a secular trend m. The time series is given by a number of k_max data points.

In general, classical spectral analysis methods like FFT do not provide the best solution in terms of an optimal, i.e., minimal, number of parameters necessary to accurately describe a tabulated periodical signal. They allow only for integer-valued frequencies, and therefore may require a relatively large number of different harmonic oscillations in order to approximate the given time series. In contrast, the ES algorithm can deal with real-valued frequencies. Furthermore, it is possible to determine any additional parameters (offset, secular trend, ...) in parallel, and data epochs don't have to be equidistant.

The time series is modeled by

y_k = n + m t_k + Σ_{i=1}^{i_max} a_i sin(f_i t_k + φ_i), (6)

so, we first have to make an assumption about the number i_max of different oscillations that might be necessary to approximate the given signal with sufficient accuracy. Again, this can be avoided by setting up a Meta-ES that would treat i_max simply as an additional parameter to be minimized. To prove the theoretical concept, some (quite important) practical aspects will be neglected here, e.g., superimposed noise and outliers. The following performance index will be used:


Q = Σ_{k=1}^{k_max} v_k² → min, (7)

equivalent to a classical least-squares approach, where v_k := y_k − y_k^ES denotes the data residuals.

First, we simulate an exemplary signal with i_max = 4, using

n = 2/3, m = 1/3, a_1 = 5/2, f_1 = 19/10, φ_1 = 1,
a_2 = 2, f_2 = 15/7, φ_2 = 4,
a_3 = 3, f_3 = 4/3, φ_3 = 5,
a_4 = 1, f_4 = 31/9, φ_4 = 3. (8)

After defining an interval of [t_a, t_e] := [0, 10], a number of k_max = 100 randomly distributed data epochs were created within this interval. The precision of the t_k values was set to 16 significant digits. The simulation is finished by calculating the corresponding y_k values following equation (6). Figure 8 depicts the resulting time series and the function behind it.
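The simulation and the performance index (7) can be sketched as follows; function and variable names are illustrative, and the random epoch placement stands in for the one actually used:

```python
import math
import random

random.seed(0)

# Nominal values from (8): offset n, trend m, and (amplitude, frequency, phase) triples
n_true, m_true = 2/3, 1/3
osc_true = [(5/2, 19/10, 1.0), (2.0, 15/7, 4.0), (3.0, 4/3, 5.0), (1.0, 31/9, 3.0)]

def model(t, n, m, osc):
    """Time-series model (6): first-degree polynomial plus a sum of sinusoids."""
    return n + m * t + sum(a * math.sin(f * t + phi) for a, f, phi in osc)

# k_max = 100 randomly distributed (i.e., non-equidistant) epochs in [0, 10]
ts = sorted(random.uniform(0.0, 10.0) for _ in range(100))
ys = [model(t, n_true, m_true, osc_true) for t in ts]

def quality(n, m, osc):
    """Performance index (7): sum of squared residuals v_k = y_k - y_k^ES."""
    return sum((y - model(t, n, m, osc))**2 for t, y in zip(ts, ys))

print(quality(n_true, m_true, osc_true))  # 0.0 for the true parameters
```

The ES then minimizes this quality function over the 14 unknowns (n, m, and four triples), exactly as in the orbit example before.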

Figure 8: Simulated time series.

The optimization was done with exactly the same (1, λ)-ES-CMA algorithm as before. Only the strategy parameter λ (number of descendants) was changed to a lower level (λ = 10). The termination quality was set to be equivalent to the precision of the data, namely Q* = 10^−16.

Figure 9: Logarithm of the absolute residual values of the unknowns vs. generation number.


The following values were chosen as an initial guess for the unknowns:

n^(0) = 0, m^(0) = 0, a_i^(0) = 1, f_i^(0) = 1, φ_i^(0) = 0 (i = 1, 2, 3, 4). (9)

Inserting (9) into our model (6) leads to a single harmonic oscillation, i.e., an ordinary sine wave with an amplitude equal to 4.
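A one-line check confirms this: with the values (9), the four identical oscillations collapse into a single sine wave of amplitude 4 (the helper name `y0` is illustrative).

```python
import math

# With a_i = 1, f_i = 1, phi_i = 0 for i = 1..4 (and n = m = 0), model (6) gives
#   sum_{i=1}^{4} sin(t) = 4 sin(t)
y0 = lambda t: sum(1.0 * math.sin(1.0 * t + 0.0) for _ in range(4))
print(all(abs(y0(0.1 * k) - 4.0 * math.sin(0.1 * k)) < 1e-12 for k in range(100)))  # True
```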

Figure 10: Quality following equation (7) vs. generation number.

It took 7945 generations for the quality to fall below the threshold value. Figures 9 and 10 show the interim results. Clearly, much time was spent adapting the covariance matrix. Figure 11 depicts the corresponding change of the power spectrum (the plot was trimmed to a maximum amplitude of 4). Simultaneously, this plot illustrates a changing superimposed first order polynomial (straight line), indicating both offset and trend. The corresponding phase spectrum was omitted here, because it showed a quite similar qualitative behavior.

Figure 11: Evolution of the power spectrum (amplitude vs. frequency), offset, and trend.


Actually, it’s more comprehensible to look at corresponding interim signal plots. Figure 12shows a selection of generations (not all of them in order to lighten the plot). It also includesthe initial guess, e. g., the aforementionedsin-wave.It’s obvious, that there exist several adjoining optimal solutions of different quality. Theseoptima are quite stable, and therefore it’s essential to usea sufficiently large number of signif-icant digits in every calculation. Otherwise the ES optimization run may become deadlocked.

Figure 12: Evolution of the resulting signal. Only a few sample generations are shown here.

In fact, spectral analysis from an ES point of view can be a highly multi-modal optimization problem. Some investigations (Meiselbach and Weisbrich, 2009) imply that in certain cases the absolutely essential strong causality principle may become violated, especially when the involved frequencies differ by several orders of magnitude. In these cases, ES won't succeed. In our example, the ES algorithm met the challenge and was able to find a satisfying solution:

n* = 0.66666666687723, m* = 0.33333333336748,
a_1* = 2.50000000068126, f_1* = 1.89999999983549, φ_1* = 1.00000000277728,
a_2* = 2.00000000346823, f_2* = 2.14285714261250, φ_2* = 4.00000000188409,
a_3* = 2.99999999898113, f_3* = 1.33333333315552, φ_3* = 5.00000000112991,
a_4* = 0.99999999989789, f_4* = 3.44444444466839, φ_4* = 2.99999999862597. (10)

In comparison with (8), we can see that all true nominal values could be reproduced with a precision of 9-10 significant digits.

Remark: of course, this precision correlates directly with the quality threshold value Q*. Future investigations should also consider signals with some noise added, and test the ability of ES algorithms to deal with outliers and other real-world challenges.


5 Conclusion

Evolutionary strategies are an alternative to classical optimization techniques in (satellite) geodesy. The same ES-CMA algorithm can be used, for instance, to determine gravity field coefficients and satellite orbits, respectively. The fundamental prerequisite for an evolutionary strategy to work is the existence of strong causality. Spectral analysis problems, in some cases, may not comply with this condition. But if they do, evolutionary strategies are capable of simultaneously detecting (real-valued) frequencies, amplitudes, phases, offset, and trend in a tabulated time series. Its data epochs needn't be equidistant.

Only three applications of evolutionary strategies in (satellite) geodesy were presented here. In the future, this optimization technique should gain more importance, especially when it comes to solving inverse problems directly. The ever-improving hardware and software capabilities will leverage this direct approach. Geodesists, too, should have it in their toolkit.

References

Alvers, M.: Zur Anwendung von Optimierungsstrategien auf Potentialfeldmodelle, Dissertation, Fachbereich Geowissenschaften, FU Berlin, 1998.

Goldberg, D.E.: Genetic Algorithms in Search, Optimization and Machine Learning, Addison-Wesley, Reading, Massachusetts, 1993.

Goldberg, D.E.: The Design of Innovation - Lessons from and for Competent Genetic Algorithms, Kluwer Academic Publishers, Dordrecht, 2002.

Hansen, N. and Ostermeier, A.: Adapting Arbitrary Normal Mutation Distributions in Evolution Strategies: The Covariance Matrix Adaptation, Proceedings of the 1996 IEEE International Conference on Evolutionary Computation (ICEC'96), 312ff, 1996.

Hansen, N. and Ostermeier, A.: Completely Derandomized Self-Adaptation in Evolution Strategies, Evolutionary Computation, 9(2), 159ff, 2001.

Kursawe, F.: Grundlegende empirische Untersuchungen der Parameter von Evolutionsstrategien - Metastrategien, Dissertation, Fachbereich Informatik, Universität Dortmund, 1999.

Mai, E.: Spektrale Untersuchung GPS-ähnlicher Orbits unter Anwendung einer Analytischen Bahntheorie 2. Ordnung, Dissertation, Fachbereich Geodäsie und Geoinformationstechnik, TU Berlin, 2005.

Mai, E.: Zur Bestimmung des Erdschwerefeldes mittels einer Evolutionsstrategie, Allgemeine Vermessungs-Nachrichten, Heft 6, 225ff, 2008.

Mai, E.: Numerische Integration von Satellitenbahnen mittels Liereihen-Entwicklung, Habilitation, Fachbereich Geodäsie und Geoinformation, TU Berlin, to be published, 2010.

Meiselbach, A. and Weisbrich, S.: Anwendung einer Evolutionsstrategie am Beispiel der Frequenzanalyse, Seminararbeit, Fachbereich Geodäsie und Geoinformationstechnik, TU Berlin, 2009.

Ostermeier, A.: Schrittweitenadaption in der Evolutionsstrategie mit einem entstochastisierten Ansatz, Dissertation, Fachbereich Verfahrenstechnik, Umwelttechnik, Werkstoffwissenschaften, TU Berlin, 1997.

Rechenberg, I.: Evolutionsstrategie '94, Frommann-Holzboog, Stuttgart, 1994.


A Decision Support System for Tunnel Wall Displacement Interpretation

Klaus Chmelina1 & Karl Grossauer2

1 Geodata ZT GmbH, Hütteldorferstrasse 85, A-1150 Vienna, Austria
email: [email protected]

2 Amberg Engineering Ltd., Trockenloostrasse 21, P.O. Box 27, CH-8105 Regensdorf-Watt, Switzerland
email: [email protected]

Abstract

The daily geotechnical interpretation of 3d tunnel wall displacements acquired in underground construction projects is an expert task requiring a high degree of experience and qualification. To assist the responsible engineer on site in this sophisticated work, a decision support system has been developed. In the introduction this paper explains the relevance and status of 3d displacement monitoring in the New Austrian Tunnelling Method. Then it presents the developed decision support system and the connected Virtual Reality system used to visualise the system results. In more detail, it describes the two major system components (the knowledge-based library and the decision support application), the input data, and how these data are stepwise evaluated by use of a pattern matching method and a trend correlation matrix. Finally, the paper summarizes the development as a successful interdisciplinary cooperation of geotechnical, geodetic and informatics experts.

Keywords: Deformation Monitoring, Tunnelling, Knowledge-Based System, Decision Support System, Virtual Reality.

1 Introduction

The geotechnical interpretation of 3d tunnel wall displacements is a particular problem of the observational method in tunnelling (Terzaghi and Peck, 1948). This method is an integral part of the New Austrian Tunnelling Method (NATM), which is widely applied in underground construction. The Eurocode 7 (EN 1997-1, 2004) specifies conditions for the application of the observational method. Requirements to be met before the construction is started are:

• Acceptable limits of the system behaviour shall be established.

• The range of possible behaviours shall be assessed and it shall be shown that there is an acceptable probability that the actual behaviour is within the acceptable limits.


• A plan of monitoring shall be devised, which will reveal whether the actual behaviour lies within the acceptable limits. The monitoring shall make this clear at a sufficiently early stage and with sufficiently short intervals to allow contingency actions to be undertaken successfully.

• The response time of the instruments and the procedures for analyzing the results shall be sufficiently rapid in relation to the evolution of the system.

• A plan of contingency actions shall be devised which may be adopted if the monitoring reveals behaviour outside the acceptable limits.

In underground construction the above mentioned system behaviour, which reflects the interaction between the ground and the excavation and support measures (OEGG, 2008), has to be predicted during design and continuously refined during construction. For this refinement, up-to-date monitoring methods and efficient data evaluation and interpretation techniques play an important role (Schubert and Steindorfer, 1996; Schubert, 2002; Steindorfer, 1996). Among these methods, especially the geodetic technique of measuring absolute 3d displacements of targets fixed to the lining by use of a total station has become the most prominent one. This method has developed considerably during the last 15 years. Its accuracy has reached the range of one millimeter (and below), which is good enough for the purpose. Typically, the displacements are measured manually once per day by the site surveyor(s). But in an increasing number of projects, automatic monitoring systems employing computer-controlled robotic total stations are already used. These systems continuously measure displacements and some even provide their data in near real-time.

The information contained in these displacements can be evaluated in sophisticated ways and used for different purposes (Sellner and Grossauer, 2002; Grossauer and Schubert, 2008). Several tools are available on the market which make data handling, evaluation and interpretation easier and more efficient on site. However, interpretation is still done manually, requiring expert knowledge. In practice, the job is often done in a very individual, empirical way, and the quality of the interpretation highly depends on the qualification and education of the engineer. This particular circumstance is well known by all experts and repeatedly discussed.

This problem has motivated the development of a decision support system that should assist the geotechnical engineer on site in the evaluation of 3d displacement data gained during tunnel excavation. The objective of the system is to provide the possibility to automatically interpret encountered situations (e.g. whether they are normal or not). Warning criteria should be implemented in order to detect the development of critical trends. In order to properly understand the output produced, all relevant data should be presented in an advanced and problem-oriented way.

2 Decision support system

The developed decision support system consists of two main components, the knowledge-based library (chapter 2.1) and the decision support application (chapter 2.2).

2.1 Knowledge-based library

The knowledge-based library is a data store containing characteristic displacement behaviour cases and a displacement trend catalogue. A displacement behaviour case is a composition of the following different kinds of data:

Page 62: Application of Artificial Intelligence and Innovations in ...info.tuwien.ac.at/ingeo/sc4/wg423/AIEG2010.pdf · the problems of engineering geodesy and AI, in particular regarding

Second Workshop on Application of Artificial Intelligence and Innovations in Engineering Geodesy 61

• the 3d displacements (of all targets of a measuring cross section; Figure 1 shows an example),

• all data potentially controlling/having an influence on the displacements, such as the

- geological conditions (rock mass properties, joint sets, ...),

- geometrical conditions (tunnel geometry, overburden, ...),

- driving related data (construction progress, construction phases, ...),

- support data (rockbolts, anchors, shotcrete layers, ...) and

• the case interpretation (= the conclusions of an expert).

The cases stored in the library may be real or synthetic; they may indicate normal or abnormal behaviour. They are all provided by leading geotechnical experts and, so to say, represent recognized state-of-the-art tunnelling knowledge. The library establishes a knowledge base and training catalogue for geotechnical engineers. It serves as the basic input data source for the decision support application (see chapter 2.2).

Figure 1: Displacement diagram showing settlements of a displacement behaviour case as stored in the knowledge-based library.

The second component of the library is the displacement trend catalogue. It contains/describes characteristic trend developments for different geotechnical situations when tunnelling from stiff to soft rock mass and vice versa (Lenz, 2007; Grossauer and Lenz, 2007). Figure 2 shows such a set of typical displacement trends when tunnelling through a fault zone of soft rock mass that strikes the tunnel at a certain horizontal angle. The upper graphic in Figure 2, for example, reflects the usually observed tendency that crown settlements increase when entering this type of fault zone and decrease when leaving such a zone again. The graphics below show the related trends for the ratios of longitudinal and vertical crown displacements


(L/S), horizontal and vertical crown displacements (H/S), and the ratios of settlements of the right and left foot points of the top heading excavation.

Technically, this library has been implemented as an MS SQL Server database. A software application (the Tunnel Information System Kronos of Geodata) allows the user to perform all necessary data management operations like querying, input, output, import, export and display of cases and trends and their associated data.

Figure 2: Example of characteristic displacement trends as stored in the library.

2.2 Decision support application

The application developed consists of three modules that successively perform the following operations:

1. Module 1: Retrieval of displacement scenarios
Candidates (so-called displacement scenarios) generally fitting the actual geological/geotechnical conditions as encountered on site are retrieved from the library. This search/filtering step reduces the total number of stored cases to a smaller and – for performance reasons better computable – number, which is then further processed by module 2.


2. Module 2: Comparison of the displacements of two cases
The displacement characteristic of the actual case is compared with each of the candidates determined by module 1. This comparison is performed by applying a raster graphic pattern matching method (Grossauer, 2008). This method (see Figure 3) transfers the two displacement vector paths to be compared into raster paths, transfers these paths into binary matrices, and superposes them to a similarity matrix (showing where both paths equal each other), from which finally a 'match' parameter in the interval [0,1] is calculated by use of a heuristic algorithm.

Figure 3: Illustration of the raster graphic pattern matching method (from left to right): two displacement vector paths of a synthetic test data set, their corresponding raster paths, the images of their binary matrices (black represents '1', white represents '0'), and the superposed similarity matrix. The example gives a 'match' parameter of 0.34.

This method is now applied to all vectors a case consists of (i.e., to the vectors of all points of a measuring cross section), delivering several 'match' parameters, which are aggregated to a 'match' parameter for the overall case. The final result of module 2 is again the candidate list, now ordered by this per-case 'match' parameter, which empirically expresses the degree of similarity of the two compared cases. In case no candidate is determined by module 1, the expected displacement behaviour has to be input manually.

3. Module 3: Comparison of trend characteristics
Assuming that no satisfactory match can be achieved by module 2 (no comparable case can be found), the question of 'normality' remains open and module 3 is invoked. This module performs another kind of case-based reasoning by evaluating a correlation matrix to find the degree of correlation between the actual trend development and the trends represented in the displacement trend catalogue. The result is a list of all trends and a rating value describing which trend the actual trend fits best.
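A minimal sketch of the raster matching step of module 2, assuming a Jaccard-style overlap score in place of the unpublished heuristic of (Grossauer, 2008); the function names, grid size, and coordinate bounds are illustrative:

```python
def rasterize(path, grid=20, bounds=(-10.0, 10.0, -10.0, 10.0)):
    """Transfer a 2d displacement vector path into a binary raster matrix."""
    xmin, xmax, ymin, ymax = bounds
    mat = [[0] * grid for _ in range(grid)]
    for x, y in path:
        col = min(grid - 1, max(0, int((x - xmin) / (xmax - xmin) * grid)))
        row = min(grid - 1, max(0, int((y - ymin) / (ymax - ymin) * grid)))
        mat[row][col] = 1
    return mat

def match(path_a, path_b, grid=20):
    """Superpose the two binary matrices and score their overlap in [0, 1]."""
    A, B = rasterize(path_a, grid), rasterize(path_b, grid)
    both = sum(a & b for ra, rb in zip(A, B) for a, b in zip(ra, rb))
    either = sum(a | b for ra, rb in zip(A, B) for a, b in zip(ra, rb))
    return both / either if either else 1.0

# Identical paths match perfectly; clearly separated paths do not match at all.
p1 = [(0.1 * i, 0.05 * i) for i in range(100)]
p2 = [(0.1 * i, -0.05 * i - 1.0) for i in range(100)]
print(match(p1, p1), match(p1, p2))  # 1.0 0.0
```

For a whole measuring cross section, the per-target scores could then be aggregated, e.g. by averaging, into one overall 'match' parameter; the paper leaves the actual aggregation rule open.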

Figures 4 to 6 show the overall evaluation process and the particular tasks of the three modules in detail. The evaluation process of the initial phase, when the first monitoring section is installed, is depicted in Figure 4. The result is the information whether the observed displacement characteristic is within the expected behaviour or not. The process starts with


Figure 4: Evaluation procedure for the initial phase (start of the evaluation process); grey areas highlight the components accessing the knowledge-based library.

the input of the encountered geological/geotechnical conditions. Module 1 retrieves proper displacement scenarios from the library that generally meet this input. Depending on whether or not a proper scenario is found, the evaluation process follows two different ways. In the case of "no", the expected displacement characteristic for the encountered conditions must be input manually (e.g. using predictions made during tunnel design). Subsequent evaluation steps comprise the comparison of the actual displacement characteristic with the expected displacement behaviour utilising module 2 and the evaluation of the degree of their similarity. If both characteristics are equal in terms of a high similarity level, the actual displacement behaviour is categorised as "normal". On the contrary, if they differ, either the tunnel performs abnormally, or the normal displacement behaviour determined is inaccurate or wrong with regard to the actual conditions. In any case, the situation must be analysed thoroughly and countermeasures initiated if necessary. The evaluation procedure for the case "yes" is generally the same, except that the actual displacements are now compared with the displacement scenarios (= candidates) derived from the library. With advancing excavation and new displacement data, the displacement behaviour of the observed monitoring section has to be continuously checked against the normal behaviour.
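The branching just described can be condensed into a small control-flow sketch; the data model, the trivial similarity score standing in for the raster matching, and the threshold are all hypothetical stand-ins:

```python
def retrieve_scenarios(library, conditions):
    """Module 1: filter stored cases by the encountered conditions."""
    return [case for case in library if case["conditions"] == conditions]

def similarity(disp_a, disp_b):
    """Module 2 stand-in: trivial score in [0, 1] instead of raster matching."""
    mean_diff = sum(abs(a - b) for a, b in zip(disp_a, disp_b)) / len(disp_a)
    return 1.0 - min(1.0, mean_diff / 10.0)

def evaluate_initial(conditions, actual, library, expected=None, threshold=0.9):
    """Initial-phase flow of Figure 4: the 'yes' branch uses library candidates,
    the 'no' branch falls back to a manually input expected behaviour."""
    candidates = [c["displacements"] for c in retrieve_scenarios(library, conditions)]
    if not candidates:
        candidates = [expected]  # manual input, e.g. a design-stage prediction
    best = max(similarity(actual, c) for c in candidates)
    return "normal" if best >= threshold else "analyse further"

library = [{"conditions": "fault zone", "displacements": [1.0, 2.0, 3.5]}]
print(evaluate_initial("fault zone", [1.0, 2.1, 3.4], library))  # normal
print(evaluate_initial("hard rock", [9.0, 9.0, 9.0], library, expected=[1.0, 1.0, 1.0]))  # analyse further
```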

The evaluation procedure following the initial phase is shown in Figure 5. It is quite similarto the procedure already described.


Figure 5: Recurrent evaluation procedure for all monitoring sections; grey areas highlight the components accessing the knowledge-based library.

It starts with the update of the geological/geotechnical conditions encountered at the respective monitoring section, in case they have changed compared to the previous section. If this is not the case, the conditions can be taken from the previous section. The next steps comprise the establishment of the normal behaviour (retrieval of a displacement scenario or, if no proper scenario can be found, the determination of the expected behaviour) and the comparison of both the actual and the normal displacement characteristic. If both do not differ substantially, the observed displacement behaviour is defined as normal and the tunnel performs as expected. If all scenarios differ substantially, the displacement trends (see Figure 2) must be analysed following the procedure shown in Figure 6 in order to evaluate the observed displacement behaviour in more detail. This particular evaluation is done by a systematic extraction and summary of the observed trends in the so-called 'input vector' (see Figure 7). The comparison of this 'input vector' with the basic trend types listed in the correlation matrix makes it possible to identify the geotechnical situation which fits best to the trends observed. Figure 7 shows an example where the single trends of the basic trend type 3.1 significantly (90 %) correlate with the observed trends. Thus, type 3.1 marks a very probable geotechnical situation for the observed behaviour, which no longer has to be regarded abnormal. Other situations (e.g. type 3.2 with 80 % correlation) are also probable, as their correlation is still


Figure 6: Procedure for the trend analysis utilizing the displacement trend catalogue; grey areas indicate all components accessing the knowledge-based library.

higher than 75 %. But if no geotechnical situation can be correlated with sufficient significance, the status of the observed behaviour is definitely abnormal and needs special further treatment.

The decision support application has been developed in C# and the Microsoft .NET framework. It is implemented as a software module in the tunnel information system Kronos of Geodata, where a user is guided through the subsequent evaluation procedures of modules 1-3 by help of a user-friendly wizard technique (Figure 8).

3 Virtual reality (VR) visualisation system

To visualise the data of the decision support system in an advanced style, a VR visualisation system has been developed in cooperation with VRVis (Center of Virtual Reality and Visualisation, Vienna). It allows to render a virtual tunnel, to navigate through it, and to interactively animate the data of a displacement behaviour case in time and space. The particular benefit of the system is that it presents a case much more vividly than other systems do, because it allows the dynamic and complete visual perception of all the data of a case simultaneously. As a consequence, the quality of data interpretation is improved. The system hardware consists of standard VR presentation equipment like a large wall-mounted silver screen, a pair of video beamers, polarisation filter glasses and a standard desktop computer. The system creates a stereoscopic 3d visualisation (Figure 9) that can be viewed by multiple users in a normal meeting room. It is to note that it shows the content necessary for interpretation, but not an impressive virtual world without use.


Figure 7: Example of a trend evaluation utilizing the correlation matrix stored in the knowledge-based library as displacement trend catalogue.

Figure 8: Kronos user interface displaying observed displacements and displacements from the knowledge-based library in one diagram to validate the output of the decision support system.


Figure 9: Virtual reality presentation scene showing an expert presenting data of a displacement behaviour case.

4 Conclusion

The presented development integrates geotechnical, geodetic and informatics know-how in an interdisciplinary approach in order to achieve an advanced evaluation and interpretation tool supporting geotechnical engineers on NATM tunnel sites. In line with earlier developments (e.g. Chmelina, 2000; Chmelina and Kahmen, 2003), it is a further approach to introduce AI methods (Artificial Intelligence) in underground construction, which is seen as more and more necessary to meet today's challenges. The development has taken place in the Tunconstruct project (Tunconstruct, 2009), which has been co-funded by the European Commission under its sixth framework programme; this support is highly acknowledged. It proves that interdisciplinary thinking can open new working fields to geodesists. The developed system is in the prototype stage and has not been tested in tunnelling practice so far. It needs further refinement and, especially, it is necessary to further feed the knowledge-based library with displacement behaviour cases as, up to now, only some very characteristic cases are implemented. Refinement and knowledge acquisition will be addressed by a future research project.

References

Chmelina, K.: Knowledge Based Analysis of 3-D Displacements in Tunnelling. Proceedings of the 5th Conference on Optical 3D-Measuring Techniques, Vienna, Oct. 1-4, 2001, pp. 292-298, 2001.

Chmelina, K. and Kahmen, H.: Combined Evaluation of Geodetic and Geotechnical Data during Tunnel Excavation by Use of a Knowledge-Based System. In: Sansò, F. (Ed.): A Window on the Future of Geodesy, Proceedings of the IAG General Assembly, Sapporo, Japan, 2003, Springer, pp. 105-110, 2003.


EN 1997-1, Eurocode 7, Geotechnical Design – Part 1: General Rules, 2004.

Grossauer, K.: A Method for the Application of the Knowledge Based Library to Real Project Data. Tunconstruct Deliverable D3.2.4.4 (confidential), Graz, 2008.

Grossauer, K. and Lenz, G.: Is it Possible to Automate the Interpretation of Displacement Monitoring Data? Felsbau 25 (2007), No. 5, pp. 99-106, 2007.

Grossauer, K. and Schubert, W.: Analysis of Tunnel Displacements for the Geotechnical Short Term Prediction. Geomechanik und Tunnelbau 1 (2008), No. 5, pp. 477-485, 2008.

Lenz, G.: Displacement Monitoring in Tunnelling – Development of a Semiautomatic Evaluation System. Diploma Thesis, Graz University of Technology, 2007.

OEGG, Guideline for the Geotechnical Design of Underground Structures with Cyclic Excavation. 2nd Revised Edition, The Austrian Society for Geomechanics, Salzburg, 2008.

Schubert, W. and Steindorfer, A.: Selective Displacement Monitoring during Tunnel Excavation. Felsbau 14 (1996), No. 2, pp. 93-97, Essen: VGE, 1996.

Schubert, W.: Displacement Monitoring in Tunnels – an Overview. Felsbau 20 (2002), No. 2, pp. 7-15, 2002.

Sellner, P. and Grossauer, K.: Prediction of Displacements for Tunnels. Felsbau 20 (2002), No. 2, pp. 22-28, 2002.

Steindorfer, A. F.: Short Term Prediction of Rock Mass Behaviour in Tunnelling by Advanced Analysis of Displacement Monitoring Data. PhD Thesis, Graz University of Technology, 1996.

Terzaghi, K. and Peck, R. B.: Soil Mechanics in Engineering Practice. John Wiley and Sons, New York, 1948.

Tunconstruct: IP Integrated Project Funded by the European Commission under its 6th Framework Programme, http://www.tunconstruct.org, 2009.


Statistical Interpolation - Introduction into Kriging Methods

Dagmar Söndgerath

Research Group of Environmental System Analysis, Institute of Geoecology

TU Braunschweig
e-mail: [email protected]

An introduction into the geostatistical interpolation technique called kriging will be given. This technique is named after D. G. Krige, a South African mining engineer and nowadays professor at the University of the Witwatersrand. The theory behind it was formalized in the 1960s by G. Matheron, a French mathematician at the Paris School of Mines in Fontainebleau. Common to all (whether non-statistical or statistical) interpolation techniques is that the value of a random field at unobserved locations is evaluated from given observations. Well-known interpolation methods like Thiessen polygons or triangulation methods do not use any spatial correlation of the data points. Inverse distance methods evaluate a weighted sum of the data points with weights inversely proportional to the distance to the point being estimated. Kriging is a similar method, but with weights deduced from the covariance function of the underlying stochastic process. The major benefit of this method is that it yields not only the interpolated surface but also the estimation error, i.e. a measure of the accuracy of the prediction. This property makes kriging superior to all other interpolation methods. Kriging is the last step in a geostatistical analysis, which in addition includes the calculation and modeling of the covariance function (or the closely related variogram function). These steps will be briefly introduced and then the focus will be laid on different kriging techniques, like ordinary, universal or co-kriging. Finally some remarks concerning the technical realization will be given.

Keywords: Kriging, Stochastic Process, Geostatistical Analysis.
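The contrast between distance-based weights and kriging weights derived from a covariance function can be sketched numerically. The 1-D data, the target location and the exponential covariance model below are hypothetical choices for illustration only, not taken from the talk:

```python
import numpy as np

# Hypothetical 1-D observations (locations s, values z) and a target location s0
s = np.array([0.0, 1.0, 3.0, 4.0])
z = np.array([1.0, 2.0, 1.5, 3.0])
s0 = 2.0

# Inverse distance weighting: weights inversely proportional to the distance
d = np.abs(s - s0)
w_idw = (1.0 / d) / np.sum(1.0 / d)
z_idw = w_idw @ z

# Simple kriging with an assumed exponential covariance C(h) = exp(-h / a)
a = 2.0
C = np.exp(-np.abs(s[:, None] - s[None, :]) / a)  # covariances among data points
c0 = np.exp(-d / a)                               # covariances data vs. target
w_sk = np.linalg.solve(C, c0)                     # kriging weights
z_sk = w_sk @ (z - z.mean()) + z.mean()           # simple kriging with known mean
var_sk = 1.0 - w_sk @ c0                          # kriging (estimation) variance

print(z_idw, z_sk, var_sk)
```

Note that, unlike the inverse distance estimate, the kriging step also returns `var_sk`, the estimation variance mentioned in the abstract as the major benefit of the method.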


Support Vector Machines – Theoretical Overview and First Practical Steps

Michael Heinert

Institut für Geodäsie und Photogrammetrie
Technische Universität Braunschweig

e-mail: [email protected]

Abstract

Besides many established learning algorithms, support vector machines offer a high model capacity, whereas the well-known over-fitting problem of all modelling techniques – whether parametric or non-parametric – can be avoided. Originally designed for pattern recognition, support vector machines have since been extended to solve interpolations, extrapolations and non-linear multiple regressions on hybrid data. Accordingly, support vector machines have already been used successfully for many geodetic purposes, e.g. for landslide modelling or velocity field interpolation. Within this paper, a theoretical overview will be given as well as a "cooking recipe" for how to realise small examples in MS Excel. Such working sheets offer a good insight into what is going on within these computational techniques.

Keywords: Support Vector Machines, Statistical Learning Theory, MS Excel Exercises

1 Introduction

Within the last years, Support Vector Machines (SVM) have successfully been used for several geodetic purposes. On the one hand, one can find the SVMs – by the way, widely unknown – within several programmes that are used by geodesists. Especially, they are added in as classification tools for geo-information systems or for remote sensing. On the other hand, they were tested as velocity field interpolation algorithms in recent tectonics and land slide detection (Riedel and Heinert, 2008; Heinert and Riedel, 2010).
Accordingly, it seems to be worthwhile and, maybe, even enjoyable to enter a new field of algorithms. Due to the fact that the SVMs were developed for pattern recognition purposes first and afterwards extended to non-linear regression, it is suitable to make initially a detour through pattern recognition.
A main interior idea of these algorithms is the implicit transform of the patterns, given by an n-dimensional input and a scalar output, into a high-dimensional feature space: x ↦ x = Φ(x). Usually, the dimension of this kernel Hilbert space H is significantly higher than the pattern vector dimension n + 1. This transform into such a space of higher order allows the linear separation of patterns that are originally arranged linearly non-separable in the data space X. The decision surface is built by a hyperplane.
This mapping Φ : X → H leads to an algorithm which does not depend on basic decisions of


the user: neither centres of radial basis functions have to be chosen, nor the architecture of a neural network has to be set up, nor a number of linguistic fuzzy classes has to be fixed a priori. This is unique among the supervised learning algorithms.

2 Support Vector Machine for linearly separable patterns

The easiest case of an SVM is the separation of linearly separable patterns. Accordingly, this example is suitable to be explained in detail.
In this first case the mapping Φ : X → H is trivial, so that x ↦ x = Φ(x). The data space and the feature space are identical.

2.1 Geometrical approach

The decision surface that shatters the two classes of patterns can be described by the definition of the hyperplane with a main point and a normal vector

w^T (x − x_0) = 0 (1)

or easier by

w^T x + b = 0 with b = −w^T x_0 (2)

as can be found in Haykin (1999, p. 319). The separation of the patterns [x_i, y_i] is reached by inserting x_i into the indicator function given by Eq. (2), which yields

w^T x_i + b ≥ 0 ∀ y_i = +1 (3)

w^T x_i + b < 0 ∀ y_i = −1. (4)

A positive or negative y_i assigns a pattern to its class according to the present position of the hyperplane. This means that patterns with a positive y_i are situated on the opposite side of the origin with respect to the hyperplane, those with a negative y_i on the same side. At that state the parameters w and b are not optimal yet, which leads to several wrong assignments.
Let us assume that the hyperplane has already reached an approximately optimal position and no faulty assignment can be found anymore. To create a margin of separation between differently assigned patterns [x_i, y_i], we postulate:

w^T x_i + b ≥ +1 ∀ y_i = +1 (5)

w^T x_i + b ≤ −1 ∀ y_i = −1. (6)

Note that patterns [x_i, y_i] for which

−1 < w^T x_i + b < +1, (7)

are situated inside of the present margin. Accordingly, its width has to be fitted as well. Multiplying each of the two inequalities (5) and (6) with the corresponding assignment y_i yields

y_i (w^T x_i + b) ≥ 1. (8)

Remembering from Eq. (2) that b = −w^T x_0, we may write

y_i (w^T x_i − w^T x_0) ≥ 1. (9)


Figure 1: SVM for linearly separable patterns: x_0 and w_opt define the position and direction of the hyperplane. The margin is fixed by the position of the support vectors on both sides of the hyperplane.

Using this expression it can be shown that the only variable part is the normal vector w, whereas the training patterns [x_i, y_i] are given and therefore constant. Note that the variable vector x_0 only describes the position vector of the hyperplane, but does not influence the width of the margin either. Accordingly, the m-dimensional normal vector with its Euclidean norm

||w|| = (w^T w)^{1/2} = √(w_1² + w_2² + . . . + w_m²) (10)

can be normalized to the unit vector

w_0 = w / ||w||. (11)

Therefore, both sides of equation (9) have to be divided by ||w||. In the resulting expression

y_i ( (w^T / ||w||) x_i − b / ||w|| ) ≥ 1 / ||w|| (12)

we can find Euclidean distances. We can state that w_0^T x_i is the projection of a single pattern vector x_i onto the unit normal vector w_0. This corresponds to the distance of x_i to that plane through the origin that is parallel to the wanted hyperplane (Fig. 1). The quotient −b · ||w||^{−1} is the projection of the hyperplane's position vector x_0 onto the unit normal vector w_0 and is equivalent to the distance between hyperplane and origin. The difference of the two terms denotes the perpendicular distance of an arbitrary pattern x_i to the hyperplane. Accordingly, in Eq. (12) we recognize the Hessian normal form of the hyperplane. Herein, the distance of the patterns on the margin is given as

| w_0^T x_i − b ||w||^{−1} | = ||w||^{−1}. (13)

This expression makes clear that the margin's width 2 · ||w||^{−1} is reciprocal to the length of the normal vector. Note that this width is double the distance between the hyperplane and one of the symmetrical margin's edges. If thus a short normal vector leads to a wide margin,


then ||w|| has to be minimised to maximise the margin. Therefore, we minimise the quadratic cost function

Φ(w) = (1/2) ||w||² = (1/2) w^T w. (14)

The geometrical meaning of this instruction is as follows: the hyperplane's distance to the patterns [x_i, y_i] has to increase more and more. Unfortunately, the hyperplane does not do this by separating the patterns; instead it departs from all patterns equally and furthermore leaves the space in between the classes (Fig. 2a). Actually, the normal vector becomes – as demanded – infinitesimally small and, in the limit, it even degenerates to a point. This means that the hyperplane rotates at infinite distance around the centroid of all patterns.
More constraints are necessary to force the hyperplane into the space between the two classes. It has therefore to be demanded that all perpendicular distances to the positive as well as to the negative edge of the margin

∑_i d_i = ∑_{i=0}^{N} [ y_i ( (w^T / ||w||) x_i − b / ||w|| ) − 1 / ||w|| ] (15)

are to be maximized together. Each condition is only satisfied if a) the labelling y_i is correct, because otherwise the result would be d_i ≤ −1, and if b) the margin has been left, because otherwise the result would be −1 < d_i < 0 instead. A single condition

d_i = y_i (w^T x_i − b) − 1 ≥ 0 (16)

can be re-formulated without the normalization by the unit normal vector. All these conditions together force the hyperplane between the two classes [x_i, y_i^+] and [x_i, y_i^−]. But now the hyperplane is able to approach the patterns of one class while the margin becomes infinitesimally small, because of an increasing Euclidean norm ||w|| (Fig. 2b).
Note that we have to formulate a dual problem. Accordingly, each condition

α_i ( y_i (w^T x_i − b) − 1 ) = 0 (17)

gets a pre-factor, namely the Lagrange multiplier

α_i ≥ 0. (18)

Figure 2: a) The minimisation of w^T w forces the hyperplane away from all patterns. b) A maximisation of all y_i (w^T x_i − b) − 1 pushes the hyperplane in the direction of the smaller group of samples and the margin vanishes. c) Both conditions together achieve an optimal result.


This weights the conditions so as to avoid that all patterns participate equally in the solution. The boundary condition α_i ≥ 0 prevents patterns from being wrongly labelled; inevitably, a negative α_i would implicitly assign a pattern to the opposite class (Schölkopf et al., 2007, p. 12). The equations (16), (17) and (18) are the so-called Karush-Kuhn-Tucker (KKT) conditions (Burges, 1998, p. 131). They are generally necessary for the solution of constrained non-linear optimizations (Kuhn and Tucker, 1951; Hillier and Liebermann, 2002).
Now we can construct a Lagrangean function

L(w, b, α) = (1/2) ||w||² − ∑_{i=0}^{N} α_i ( y_i (w^T x_i − b) − 1 ). (19)

This function has to be optimised with respect to the normal vector w and the bias b, so that we may write for the saddle point

∂L(w, b, α)/∂w = 0 and ∂L(w, b, α)/∂b = 0. (20)

The differentiation of L(w, b, α) with respect to w yields

∂L(w, b, α)/∂w = (1/2) · 2w − ∑_{i=0}^{N} α_i y_i x_i = 0, (21)

which is equal to

w = ∑_{i=0}^{N} α_i y_i x_i. (22)

So, the computation of w is the result of the sum over the weighted products of the input x_i and output y_i of each pattern. Each product can be seen as the un-normalized covariance of x_i and y_i.
The differentiation of L(w, b, α) with respect to b yields

∂L(w, b, α)/∂b = − ∑_{i=0}^{N} α_i y_i = 0, (23)

which is equal to

∑_{i=0}^{N} α_i y_i = 0 (24)

and has a special meaning: we may conclude that

∑_{i=0...n+} α_i^+ y_i^+ = − ∑_{ι=0...n−} α_ι^− y_ι^−. (25)

The numbers n+ and n− of patterns on each side of the margin are therefore irrelevant.


The expansion of the Lagrangean function (19)

L(w, b, α) = (1/2) w^T w − ∑_{i=0}^{N} α_i y_i w^T x_i − b ∑_{i=0}^{N} α_i y_i + ∑_{i=0}^{N} α_i (26)

can be simplified – term by term – using the optimality conditions which are yielded by the derivatives (22) and (24). The third term of the Lagrangean function vanishes by virtue of condition (24):

b ∑_{i=0}^{N} α_i y_i = 0. (27)

The right-hand side of the other optimality condition (22) can be used in the Lagrangean function instead of w, so that

L(w, b, α) = (1/2) ∑_{i=0}^{N} α_i y_i x_i · ∑_{j=0}^{N} α_j y_j x_j − ∑_{i=0}^{N} α_i y_i x_i · ∑_{j=0}^{N} α_j y_j x_j + ∑_{i=0}^{N} α_i. (28)

2.2 Solution for linearly separable patterns

Eq. (28) contains the optimality conditions given by the minimisation with respect to w and b. Now we are looking for the saddle point that can be expressed by an optimal set of Lagrange multipliers α_i. Accordingly, we reformulate the objective function as the maximisation of

Q(α) = ∑_{i=1}^{N} α_i − (1/2) ∑_{(i,j)=1}^{N} α_i α_j y_i y_j x_i^T x_j (29)

subject to the constraints

∑_{i=1}^{N} α_i y_i = 0 (30)

and

α_i ≥ 0 ∀ i = 1 . . . N. (31)

Herein, all conditions (17) have to be maximized (Haykin, 1999, p. 322). Several well-known algorithms are available for the necessary non-linear optimization (Domschke and Drexl, 2002; Grundmann, 2002; Hillier and Liebermann, 2002; Rardin, 1998).


Whilst their description could fill another paper of this length, it is worth noting that these methods are comparable in the quality of their results. Nevertheless, they are quite different in computing speed. Recently, it has been recommended to maximize the object function using sequential minimal optimization (SMO) (Platt, 1998, 1999).
Suppose that the object function Q(α) has been maximized with respect to (30) and (31); then the optimal weight vector is given as

w_opt = ∑_{i=1}^{N} α_opt,i y_i x_i. (32)

Most of the optimal Lagrange multipliers α_opt,i will be zero, so that only a few patterns contribute to the summation. Accordingly, the summation can be reduced to

w_opt = ∑_{ι=1}^{N^(s)} α_opt,ι y_ι^(s) x_ι^(s) ∀ α_opt,ι > 0. (33)

These N^(s) patterns [x_ι^(s), y_ι^(s)] are called the support vectors. They form a subset of patterns lying on the margin and determine the position and direction of the hyperplane. The optimal bias of this hyperplane is given by

b_opt = y_ι^(s) − w_opt^T x_ι^(s) (34)

using an arbitrary support vector. Analogously we may write

[x_ι^(s), y_ι^(s)] = [x_i, y_i] ∀ α_i > 0. (35)
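The whole derivation – maximising the dual objective (29) subject to (30) and (31), then recovering w_opt via Eq. (32) and b_opt via Eq. (34) – can be sketched numerically. The toy data set and the use of a general-purpose optimiser (instead of a dedicated SMO implementation) are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import minimize

# Tiny linearly separable 2-D training set (hypothetical example data)
X = np.array([[1.0, 1.0], [2.0, 1.5], [1.5, 2.0],
              [4.0, 4.0], [5.0, 4.5], [4.5, 5.0]])
y = np.array([-1.0, -1.0, -1.0, 1.0, 1.0, 1.0])
N = len(y)
G = (y[:, None] * X) @ (y[:, None] * X).T   # Gram matrix y_i y_j <x_i, x_j>

def neg_Q(a):
    # Negative dual objective (29): Q(a) = sum a_i - 1/2 sum a_i a_j y_i y_j <x_i, x_j>
    return -(a.sum() - 0.5 * a @ G @ a)

cons = {"type": "eq", "fun": lambda a: a @ y}   # constraint (30)
bnds = [(0.0, None)] * N                        # constraint (31)
res = minimize(neg_Q, np.zeros(N), bounds=bnds, constraints=cons)

a = res.x
w = (a * y) @ X                                 # weight vector, Eq. (32)
sv = a > 1e-6                                   # support vectors: alpha > 0
b = np.mean(y[sv] - X[sv] @ w)                  # bias, Eq. (34), averaged over SVs

print(np.sign(X @ w + b))                       # decision function on the training set
```

Only the patterns with α_i > 0 – the support vectors – end up contributing to w, exactly as Eq. (33) states.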

2.3 SVM for linearly non-separable patterns

Note that it could be necessary to allow false classifications, especially if the given set of patterns is a priori linearly non-separable. The sharp classification by a hyperplane can be relaxed in such a way that patterns may fall on the wrong side of the decision surface. To construct that kind of a soft margin we introduce slack variables ξ_i ≥ 0. Accordingly, the classifier can be extended to

y_i (w^T x_i + b) ≥ 1 − ξ_i ∀ i = 1 . . . N. (36)

These newly defined variables ξ_i describe the empirical risk of a wrong classification. Three cases are possible:

• 0 < ξ_i ≤ 1: The pattern falls inside the region of separation, but still on the right side of the decision surface,

• 1 < ξ_i ≤ 2: The pattern falls inside the region of separation, on the wrong side of the decision surface,

• ξ_i > 2: The pattern falls into the wrong class.

The indicator function that has to be constructed is non-convex. The related loss function

Φ(w, ξ) = (1/2) w^T w + C ∑_{i=1}^{N} ξ_i (37)


is expanded by a second term as approximation of the non-convex indicator function (Haykin, 1999, p. 327).
The object function of this classifier (29) stays the same. The derivatives with respect to w, b and ξ together with the KKT conditions (Burges, 1998, p. 136) lead to

0 ≤ α_i ≤ C ∀ i = 1 . . . N (38)

instead of the second constraint following Eq. (31). Surprisingly, the slack variables ξ_i do not occur explicitly. Instead, they are implicitly defined by the choice of C. The input parameter C acts as a trade-off controller between the machine's complexity and the number of non-separable patterns: with an increasing C the complexity increases as well. So it is up to the user to determine heuristically an optimal balance between complexity and misclassification.
To manage this, the algorithm has got two basic control outputs beside the optimised empirical risk: first, the number of support vectors and, second, the VC-dimension of the model (Sect. 4). The computation of the weight vector w_opt is the same as for linearly separable patterns (33). However, to compute the bias b, Eq. (34) has to be used for all support vectors. The optimal bias is then given by the mean value. The positive slack variables are finally given by

ξ_i = 1 − y_i (w^T x_i + b) ∀ ξ_i > 0. (39)

The size of C is not easy to interpret. Therefore, it is also possible to predefine a positive maximum for ξ_i – a quantity the user can imagine more easily – and to optimise C instead.

2.4 Linear SV pattern recognition using MS Excel

An example in MS Excel demonstrates how an SVM works (Fig. 3). Within the first two columns

[A3:A14],[B3:B14]

the training patterns [x_i, y_i] are stored, with a two-dimensional input x_i and the a priori known binary classification (y+, y−) in column

[C3:C14]

as output. The two following columns

[D3] =IF(K3>0;B3;-999),
[E3] =IF(K3<0;B3;-999)

are only necessary for the graphics. The equations are only demonstrated for the first row, but they can easily be copied into the cells beneath.
In column F the Lagrange multipliers α_i are stored:

[F3] =α1
...
[F14] =α12.


Figure 3: The linear SVM under MS Excel.

Before the computation these α_i are set to zero or to small random values. MS Excel contains the add-in programme SOLVER, which is designed for different optimization purposes. Running this add-in, these Lagrange multipliers are assigned to be the "changing cells". The next columns contain the monomials α_i x_{1,i} y_i and α_i x_{2,i} y_i of the object function:

[G3] =A3*C3*F3,
[H3] =B3*C3*F3.

These and the contents of the other columns are most easily summed up in the first row:

[F1] =SUM(F3:F100)
...
[K1] =SUM(K3:K100).

By using the dot product 〈x_i, x_j〉 like

[I3] =G3*G$1,
[J3] =H3*H$1

the cell references become quite easy. Accordingly, the sum of all monomials can be multiplied by each single monomial. The products of all Lagrange multipliers and the homologous classification outputs

[K3] =C3*F3.

have to be summed up in [K1]. Following the KKT condition in Eq. (24), this cell has to be zero (Kuhn and Tucker, 1951; Burges, 1998). The next columns deliver the inputs of the identified support vectors x_i^(s):

[L3] =IF(F3>0;A3;0),
[M3] =IF(F3>0;B3;0)

They can be used to determine the bias, i.e. the distance between the hyperplane and the origin. The following condition

[N3] =IF(F3>0;(C3–(G$1*L3+H$1*M3));" ")

ensures that this computation is carried out only for support vectors. In the last column that depends on the patterns one can find the slack variables ξ_i


[O3] =IF((1–C3*(A3*$G$1+B3*$H$1+$R$7))>0;(1–C3*(A3*$G$1+B3*$H$1+$R$7));" "),

which are only of interest if they are positive.
In a separate block of cells, firstly, the complexity parameter C is given

[R2] =C,

which controls the size of the slack variables. In the case of linearly separable patterns, the value of C does not matter, because there are no ξ_i to control.
The complete object function Q(α) is computed in the target cell

[R4] =F1–1/2*SUM(I1:J1).

This cell will be used by the add-in SOLVER as "Set Target Cell" when the optimization is started. The next cells

[R5] =-G1/H$1,
[R6] =-R7/H$1

supply the parameters m, X0 which are necessary to depict the decision surface in a graphic. The y-intercept X0 depends on the bias. In the case of linearly separable patterns a single computation of b with an arbitrary support vector is sufficient. In the other case, that even one pattern violates this separability condition, it is necessary to compute the average using all support vectors:

[R7] =AVERAGE(N3:N53)

The next block of cells

[Q10] =0,
[Q11] =R6,
[R10] =MAX(A3:A14),
[R11] =R6+Q11*R5,
[S10] =-(R7-1)/H1,
[S11] =S10+Q11*R5,
[T10] =-(R7+1)/H1,
[T11] =T10+Q11*R5

contains the functions for the depiction of the decision surface and its margin. After arranging all the cells and the necessary references, the optimization may be started.
To solve the object function within this example the following inputs have to be filled in into the SOLVER window:

Set Target Cell: $R$4
Equal to: Max
By Changing Cells: $F$3:$F$14
Subject to the constraints: $F$3:$F$14 <= $R$2
                            $F$3:$F$14 >= 0
                            $K$1 = 0

as can be seen in Figure 9.


2.5 Results

The classification of linearly separable patterns with a two-dimensional input x_i and a scalar output y_i is by far the easiest: high values for the parameter C lead to three support vectors on the margin between the classes. The margin itself is empty (Fig. 4a). Higher values for C are without any impact. This is comparable to a solution without slack variables ξ_i. In the chosen example (Fig. 4) they all vanish for C ≥ 2^{1/2}. Taking smaller values instead, the valid support vectors get slack variables ξ_i ≥ 0. Nevertheless, the hyperplane remains the same, although the margin increases. But with further decreasing values for C the solution changes: more support vectors are added and the position and direction of the decision surface start changing (Fig. 4b).
In a case of a priori linearly non-separable patterns, several patterns directly get positive slack variables. With a high value for C the margin vanishes (Fig. 4c). Note that a margin is the result of a sufficiently small C. The smaller C and the wider the according margin, the more support vectors participate in the solution. For infinitesimal C all patterns take part in the solution. Accordingly, the two centroids of the patterns determine the hyperplane.

Figure 4: Linear SV pattern recognition: the result for linearly separable patterns in dependence of a maximum or small value for C (a and b) as well as the comparable result for linearly non-separable patterns (c and d)

3 Nonlinear Support Vector Regression

3.1 Kernels

It would be quite costly to map big amounts of data explicitly into a space of higher order and to look for an optimal hyperplane in there. Accordingly, instead of transforming the data into


the feature space H, a kernel function is used within the data space. Such a kernel represents the hyperplane from the feature space. This kernel re-transforms the hyperplane into the data space X, which is called "the kernel trick".
Such a continuous symmetric function

K(x_i, x_j) = ∑_{ι=1}^{∞} λ_ι φ_ι(x_i) φ_ι(x_j) (40)

with the eigenfunctions φ(x) and the positive eigenvalues λ is defined within the closed interval a ≤ x_i, x_j ≤ b. A kernel function has to converge continuously and totally in the object function Q(·) to be suitable for an SVM. Therefore this function has to be positive definite. According to Mercer (1909, p. 442) the kernel K(x_i, x_j) is positive definite iff

∫_a^b ∫_a^b K(x_i, x_j) ψ(x_i) ψ(x_j) dx_i dx_j ≥ 0. (41)

The functions ψ(x) have to be square integrable:

∫_a^b ψ²(x) dx < ∞. (42)

But Mercer's condition does not describe how to construct a kernel function. It only answers the question whether an already chosen function can be a suitable kernel (Haykin, 1999, p. 332). Among these functions one can find the cross-correlation function as well as the weighted sum of a neuron within an artificial neural network (ANN) or the radial basis function (RBF) of the Euclidean norm.
Let us revisit once more the object function (29) with its Lagrange multipliers:

Q(α) = ∑_{i=1}^{N} α_i − (1/2) ∑_{(i,j)=1}^{N} α_i α_j y_i y_j 〈x_i, x_j〉. (43)

According to the non-linear mapping, a transform of the patterns into the feature space would be

Q(α) = ∑_{i=1}^{N} α_i − (1/2) ∑_{(i,j)=1}^{N} α_i y_i Φ(x_i) · α_j y_j Φ(x_j), (44)

which we rejected because of the computational costs. Let

K(x_i, x_j) = Φ(x_i) · Φ(x_j) (45)

be a function that is a suitable kernel; then the data transform will be done implicitly. A kernel is suitable if it e.g. contains the dot product 〈x_i, x_j〉 of the pattern inputs x_i. Even the dot product 〈x_i, x_j〉 itself is a suitable kernel, so that we may rewrite Eq. (29) as

Q(α) = ∑_{i=1}^{N} α_i − (1/2) ∑_{(i,j)=1}^{N} α_i α_j y_i y_j K(x_i, x_j), (46)

but now with an arbitrary suitable kernel K(x_i, x_j).


The – now repeatedly mentioned – dot product kernel

K(x_i, x_j) = 〈x_i, x_j〉 = x_i · x_j (47)

yields good results first of all for linearly separable patterns. Even if there exist some misclassifications because of stochastic errors, this kernel can be used successfully. A basic improvement is given by the polynomial kernel (Schölkopf and Smola, 2001, p. 45)

K(x_i, x_j) = (〈x_i, x_j〉)^d (48)

and the inhomogeneous polynomial kernel, respectively,

K(x_i, x_j) = (〈x_i, x_j〉 + 1)^d. (49)

The latter allows in its quadratic form – accordingly d = 2 – to solve the classical XOR problem (Haykin, 1999, p. 355). A commonly used kernel is the Gaussian or extended RBF kernel

K(x_i, x_j) = e^{−γ‖x_i−x_j‖}. (50)

Principally, the radial basis functions are widely used in interpolations, fuzzy clustering methods or neural RBF networks for modelling purposes. So, they are quite interesting for SVMs as well. Another "import" from neural networks is the neural kernel

K(x_i, x_j) = tanh(a〈x_i, x_j〉 + b), (51)

which can be defined with other activation functions instead of the hyperbolic tangent, like e.g. sigmoid functions. The by far most flexible algorithm is the so-called ANOVA kernel

K_D(x_i, x_j) = ∑_{1≤i_1<...<i_D≤N} ( ∏_{d=1}^{D} K_{i_d}(x_{i_d}, x_{j_d}) ). (52)

This allows n-dimensional regressions, especially under the condition that the κ elements of the input vector (x_i)_κ have got quite different value distributions.
All introduced kernels are characterized by the fact that they are arranged around the dot product 〈x_i, x_j〉. The RBF kernels are the only exception. They are constructed with the Euclidean norm around the dot product of the pattern distances 〈x_i−x_j, x_i−x_j〉 (Schölkopf and Smola, 2001, p. 46).
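The kernels (47)–(51) can be sketched as plain functions. The parameter values below (d, γ, a, b) are illustrative assumptions, and the closing lines merely evaluate the quadratic inhomogeneous polynomial kernel – the one the text credits with solving the XOR problem – on the four XOR patterns:

```python
import numpy as np

def linear_kernel(xi, xj):
    return xi @ xj                                   # dot product kernel (47)

def poly_kernel(xi, xj, d=2):
    return (xi @ xj + 1.0) ** d                      # inhomogeneous polynomial kernel (49)

def rbf_kernel(xi, xj, gamma=0.5):
    return np.exp(-gamma * np.linalg.norm(xi - xj))  # extended RBF kernel (50)

def neural_kernel(xi, xj, a=1.0, b=0.0):
    return np.tanh(a * (xi @ xj) + b)                # neural kernel (51)

# Kernel matrix of the quadratic (d = 2) polynomial kernel on the XOR patterns
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
K = np.array([[poly_kernel(p, q) for q in X] for p in X])
print(K)
```

Each function only touches the inputs through the dot product (or, for the RBF kernel, the Euclidean distance), so the implicit mapping Φ never has to be evaluated.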

3.2 Geometrical approach

Finally, our new knowledge from pattern recognition and kernel functions can be transferred to regression models

yi = f(xi) + νi = wT xi + b + νi. (53)

Consider the variable yi that is a summation of a deterministic part f(xi) and stochastic noise νi. For regression purposes the meaning of the hyperplane has to be re-defined: our hyperplane is not a decision surface anymore, because now it creates an optimal model approximation. Accordingly, all patterns which fall into the margin between −ε … ε on both sides of the hyperplane do not contribute to the object function anymore (Fig. 5). Furthermore, the patterns get non-negative slack variables if they fall outside the margin. This results


in a so-called ε-insensitive loss function (Haykin, 1999, Ch. 6.7). Consider once more the minimisation (37) in the form

Φ(w, ξ, ξ*) = 1/2 wTw + C Σ_{i=1}^{N} (ξi + ξ*i), (54)

but here with two sets of non-negative slack variables ξi, ξ*i, with respect to

yi − ε − ξ*i ≤ wTxi + b ≤ yi + ε + ξi ∀ i = 1 … N (55)

under the condition

ξi, ξ*i ≥ 0 ∀ i = 1 … N

can equally be solved by maximisation of the object function

Q(α, α*) = −ε Σ_{i=1}^{N} (αi + α*i) + Σ_{i=1}^{N} yi (αi − α*i) − 1/2 Σ_{i,j=1}^{N} (αi − α*i)(αj − α*j) K(xi, xj) (56)

under the condition

Σ_{i=1}^{N} αi = Σ_{i=1}^{N} α*i, (57)

and

0 ≤ αi, α*i ≤ C ∀ i = 1 … N. (58)

Figure 5: Loss function Ψε(yi, f(xi)) of the SV-regression over the residual yi − f(xi): inside the 2ε-wide insensitive area the residuals are zero; outside, the patterns get positive or negative slack variables ξi, ξ*i that contribute to the object function.
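The ε-insensitive loss sketched in Fig. 5 is a one-liner. The function below is an illustrative sketch (the name is ours): the loss is zero inside the tube |yi − f(xi)| ≤ ε and grows linearly with the excess outside of it.

```python
def eps_insensitive_loss(y, f_x, eps):
    # Psi_eps(y, f(x)) from Fig. 5: zero inside the tube |y - f(x)| <= eps,
    # growing linearly with the excess outside of it
    return max(0.0, abs(y - f_x) - eps)
```

A pattern with residual 0.2 and ε = 0.33 therefore contributes nothing to the object function, while a residual of 1.0 contributes 0.67.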


During the model set-up the parameter ε, respectively +ξi and −ξ*i, has to be adapted together with C in such a way that the number of support vectors and the complexity of the model, namely the VC dimension (Sect. 4), are as small as possible.

The solution of the regression for arbitrary inputsx is given by

f(x) = Σ_{i=1}^{N} (αi − α*i) K(x, xi) + b. (59)

In a SV regression the computation of the bias b is a quite specific problem (Smola and Schölkopf, 2004, Ch. 1.4) and cannot be solved directly.
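Eq. (59) and one possible bias heuristic can be sketched as follows. The functions and the toy numbers are our own illustration; the bias estimate mirrors the spreadsheet example below, which takes the median of yi − (f(xi) − b) over the support vectors.

```python
import statistics

def svr_raw(x, alphas, alphas_star, X, kernel):
    # f(x) - b: the kernel expansion from eq. (59)
    return sum((a - a_s) * kernel(x, xi)
               for a, a_s, xi in zip(alphas, alphas_star, X))

def estimate_bias(alphas, alphas_star, X, y, kernel):
    # heuristic as in the spreadsheet example: median of y_i - svr_raw(x_i)
    # over the support vectors (patterns with alpha_i > 0 or alpha*_i > 0)
    res = [yi - svr_raw(xi, alphas, alphas_star, X, kernel)
           for a, a_s, xi, yi in zip(alphas, alphas_star, X, y)
           if a > 0 or a_s > 0]
    return statistics.median(res)

# hypothetical toy numbers with a linear kernel: one support vector at x = 1
lin = lambda u, v: sum(p * q for p, q in zip(u, v))
b_hat = estimate_bias([1.0, 0.0], [0.0, 0.0], [[1.0], [2.0]], [2.0, 3.0], lin)
```

With these numbers the single support vector fixes b_hat = 1, and a new input x = 3 is predicted as svr_raw + b_hat = 4.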

3.3 Nonlinear SV-Regression using MS Excel

Similar to the first example, one can find in the first two columns

[A3:A14],[B3:B14]

the patterns [xi, yi], containing twelve one-dimensional input vectors xi and one scalar output yi. The cells

[C3] =IF(U3<>" ";B3;-999), [D3] =IF(V3<>" ";B3;-999)

of the next columns are designed once more for graphical purposes. They enable the depiction of the support vectors. Of much more importance for the algorithm are the two Lagrange multipliers αi, α*i, that are stored in

[E3] = α1, [F3] = α*1
… and …
[E14] = α12, [F14] = α*12.

These are the changeable cells that will be used by the add-in SOLVER. The sum of and the difference between the two corresponding Lagrange multipliers αi and α*i

[G3] =E3+F3, [H3] =E3-F3

are part of the object function. The sums

[E1] =SUM(E3:E100)...

[J1] =SUM(J3:J100).

fulfil several tasks of the solution. Especially the sum of all Lagrange multipliers has to be zero to satisfy the KKT condition (Heinert, 2010, Ch. 5). Within the cells of the following columns

[I3] =B3*H3,[J3] =A3*H3


Figure 6: A non-linear SV-regression under MS Excel.

one can find provisional results of the object function. Outside the typical view of an MS Excel table, the cells with the matrix elements of the main diagonal

[AD3] =1/2*AD$1*$H3. . .

[AO14] =1/2*AO$1*$H14

and the upper triangular

[AE3] =EXP(-$AB$2*ABS(AE$2-$A3))*AE$1*$H3 . . .

[AO3] =EXP(-$AB$2*ABS(AO$2-$A3))*AO$1*$H3 . . .

. . . [AO13] =EXP(-$AB$2*ABS(AO$2-$A13))*AO$1*$H13

contain the RBF kernels. These substitute the summation loops of a conventional programme. The elements of this matrix are summed up to get the present value of the object function Q(α, α*). For reasons of clarity we may place the transposed vectors xi and ∆α at the disposal of the computation of the matrix

[AD1:AO1] =TRANSPOSE(H3:H14),[AD2:AO2] =TRANSPOSE(A3:A14)

and it is with

[AQ1] =AD1.. .

[BB2] =AO2

positioned above the second matrix


[AQ3] =EXP(-$AB$2*ABS(AQ$2-$A3))*AQ$1 . . .

. . .

[BB14] =EXP(-$AB$2*ABS(BB$2-$A14))*BB$1.

This is needed for the computation of the regression function at the positions of the given patterns (xi, yi). The first approximation of the regression for an arbitrary input x is the summation over the column

[K3] =SUM(AQ3:BB3).

The converged object function delivers the support vectors, whose x(s)i and y(s)i are stored in the cells of the columns

[P3] =IF(G3>0;A3;" "),[Q3] =IF(G3>0;B3;" ").

In the present example, the full solution of the regression is built by the median

[R1] =MEDIAN(R3:R100)

of the singularly computed values bi of each support vector

[R3] =IF(E3>0;(Q3-K3);IF(F3>0;(Q3-K3);" "))

to yield a first approximated bias. Now, the solution for the regression with the bias b can be updated in

[L3] =K3+$R$1.

Accordingly, we get a first approximation of the residuals in

[S3] =B3-L3,

and of the slack variables ξi, ξ*i in

[T3] =IF(E3>0;($S3-$AB$3);" "), [U3] =IF(F3>0;(-$S3-$AB$3);" ").

In this computational state, they are quite often negative. This is an unacceptable violation of the KKT condition ξi, ξ*i ≥ 0. Accordingly, we put once more an updated solution of the regression in

[M3] =K3+$AB$7.

Only for graphical purposes are the following cells in the columns N and O:

[N3] =M3+$AB$3, [O3] =M3-$AB$3,


which contain the limits of the ε-insensitive object function. Using the optimal solution we can store the residuals and their squares in the columns V and W:

[V3] =B3-M3, [W3] =S3^2.

Accordingly, we can compute in

[X3] =IF(E3>0;($V3-$AB$3);" "), [Y3] =IF(F3>0;(-$V3-$AB$3);" ")

the optimal slack variables ξi, ξ*i. Very rarely one can find negative values here. In these rare cases a further iteration is necessary to get the optimal bias.
The final solution is based on the control inputs by the user in the cells

[AB2] = γ, [AB3] = ε, [AB4] = C.

The next block of cells

[AB6] =-AB3*G1+I1-SUM(AD3:AO14),
[AB7] =R1-MIN(MIN(U3:U14);0)+MIN(MIN(T3:T14);0),
[AB8] =J1,
[AB9] =(AVERAGE(W3:W14)*12/11)^(1/2)

contains the output parameters: the present value of the converged object function Q(α, α*), the optimal bias b = X0, the surface normal m = w and the empirical risk σ. To solve the object function within this second example the following inputs have to be filled into the SOLVER window:

Set Target Cell: $AB$6
Equal to: Max
By Changing Cells: $E$3:$F$14
Subject to the Constraints: $E$1 = $F$1
$E$3:$F$14 <= $AB$4
$E$3:$F$14 >= 0.
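The same dual maximisation can be sketched outside Excel. The following is a didactic projected-(sub)gradient analogue of what SOLVER does for the objective (56) under the conditions (57) and (58), written in terms of βi = αi − α*i; it is our own sketch, not a production SMO implementation, and the step size and iteration count are illustrative choices.

```python
import math

def rbf(x, z, gamma):
    # the spreadsheet's one-dimensional kernel: EXP(-gamma * ABS(x - z))
    return math.exp(-gamma * abs(x - z))

def sign(v):
    return (v > 0) - (v < 0)

def fit_svr_dual(xs, ys, gamma=0.025, eps=0.33, C=50.0, lr=0.01, iters=2000):
    # projected (sub)gradient ascent on the dual (56),
    # with beta_i = alpha_i - alpha*_i
    n = len(xs)
    K = [[rbf(xs[i], xs[j], gamma) for j in range(n)] for i in range(n)]
    beta = [0.0] * n
    for _ in range(iters):
        grad = [ys[i] - eps * sign(beta[i])
                - sum(K[i][j] * beta[j] for j in range(n))
                for i in range(n)]
        beta = [b + lr * g for b, g in zip(beta, grad)]
        mean = sum(beta) / n          # project onto sum(beta) = 0, eq. (57)
        beta = [max(-C, min(C, b - mean)) for b in beta]  # box bounds (58)
    return beta, K

def dual_objective(beta, ys, K, eps):
    # Q from eq. (56) in terms of beta = alpha - alpha*
    n = len(beta)
    return (-eps * sum(abs(b) for b in beta)
            + sum(y * b for y, b in zip(ys, beta))
            - 0.5 * sum(beta[i] * K[i][j] * beta[j]
                        for i in range(n) for j in range(n)))

# twelve patterns roughly on a sine curve, as in the worked example
xs = [0.5 * i for i in range(12)]
ys = [math.sin(x) for x in xs]
beta, K = fit_svr_dual(xs, ys)
```

After the iterations, the constraints (57) and (58) hold and the objective value is positive, i.e. better than the trivial solution β = 0.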

3.4 Results

The second chosen example shows the regression of twelve patterns situated roughly on a sine function (Fig. 7). The quality of the solution is determined by the general parameters C and ε of the SV-regression as well as by the width parameter γ of the RBF kernel.
A first solution is based on a wide and stiff kernel given by a small γ, on a moderate width of the insensitive area given by an ε in the scale of the expected standard deviation between the regression and the patterns, and finally on a moderate force on the slack variables given by an optimal C. This solution has got just four support vectors (Fig. 7a). Thus, only every third pattern determines the regression. The approximation of the sine function is constructed by four automatically assembled lines that fit the sine function quite well.
Imagine that the insensitive area is made smaller by a smaller ε in combination with a relaxed force on the slack variables given by a smaller C; then the approximation of the sine function looks much better. Unfortunately, this solution is less robust as a result of a higher number of support vectors (Fig. 7b).


Figure 7: Non-linear SV-regression with four parameter settings: (a) γ = 0.025, ε = 0.330, C = 50; (b) γ = 0.025, ε = 0.110, C = 12; (c) γ = 0.025, ε = 0.990, C = 50; (d) γ = 0.750, ε = 0.330, C = 50 (legend: "Stützvektoren" = support vectors, "Sollfunktion" = target function). The coloured patterns are the support vectors of the regression, the small dots represent all those patterns without an impact on the solution. The higher ε, the wider the insensitive area; the smaller γ, the stiffer the regression function; and the higher C, the more patterns are forced into the insensitive area.

The opposite example, one with high robustness, is reached by a minimal number of support vectors. In this very case the minimum is given by two vectors. Therefore, we combine a high ε, which means a wide insensitive area, with a moderate force onto the slack variables by an optimal C (Fig. 7c).
Small RBF kernels given by a high γ deliver a bad solution in this very case. The a priori continuous function of a sine can not easily be assumed after such a zigzag regression (Fig. 7d).
All solutions have the common characteristic that they yield automatically a non-linear regression. Provided that the user chooses more or less optimal parameters, we may expect that the qualities of the unknown function are optimally reflected. It is up to the sophisticated user to find the optimal balance between robustness and complexity of the regression. General suggestions for this balance and the parameters to reach it cannot be given here; they depend on the actual task that has to be solved.

4 Vapnik-Chervonenkis dimension

Each model – also a regression – exhibits a certain complexity. This complexity has to be in a suitable balance with the dimension and the number of patterns. In the case of an imbalance – mostly the complexity is too high – the feared effect of over-fitting occurs (Heine, 1999; Miima, 2002; Heinert and Niemeier, 2007; Heinert, 2008a,b). This imbalance happens earlier than a low redundancy in the view of an adjustment (Niemeier, 2008).


Figure 8: An example of VC-dim(Φ(x)) = 3. There exist 2³ = 8 possibilities how h = 3 patterns can be separated error-free into two classes by lines.

Since this problem is that serious, it is necessary to find a numerical measure of a model's complexity. Such a measure is the so-called Vapnik-Chervonenkis dimension, abbreviated VC-dim(F) (Haykin, 1999, p. 95).

Definition: The VC-dimension of an ensemble of dichotomies F = {Φw(x) : w ∈ W}, Φ : R^m → {0, 1}, is the cardinality h = |L| of the largest set L that is shattered by F.

A more explanatory version of this quite compact definition is given by Vapnik himself in 1998 (p. 147):

The VC-dimension of a set of indicator functions Φw(x), w ∈ W, is equal to the largest number h of vectors that can be separated into two different classes in all the 2^h possible ways using this set of functions.
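Vapnik's definition can be made concrete by enumerating all 2^h dichotomies of h = 3 points in the plane and checking each labelling for linear separability, e.g. with a simple perceptron. This is our own illustrative sketch; the point coordinates are arbitrary, chosen only to be in general position.

```python
from itertools import product

def linearly_separable(points, labels, max_epochs=1000):
    # perceptron training: for linearly separable labellings it is
    # guaranteed to converge (an epoch with zero mistakes); otherwise
    # it keeps making mistakes until max_epochs is exhausted
    w, b = [0.0, 0.0], 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for (x1, x2), y in zip(points, labels):
            if y * (w[0] * x1 + w[1] * x2 + b) <= 0:
                w[0] += y * x1
                w[1] += y * x2
                b += y
                mistakes += 1
        if mistakes == 0:
            return True
    return False

# three patterns in general position (not collinear)
pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
shattered = sum(linearly_separable(pts, lab)
                for lab in product([-1, 1], repeat=3))
# all 2**3 = 8 dichotomies are separable, so lines shatter these points
```

All eight labellings turn out to be separable, which illustrates why lines in the plane have VC dimension 3 (cf. Fig. 8).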

According to these quotations, the VC-dim(F) is the maximal number L of patterns which can be separated without errors into two classes in an n-dimensional data space R^n (Haykin, 1999, p. 94f). Consider that the patterns may have fuzzy degrees of membership and can therefore be elements of two or more sets or classes. Referring to this, h = |L| = VC(N) denotes the cardinality of separable patterns. Remember that the cardinality of a limited set is the sum of all degrees of membership of its elements (Bothe, 1995, p. 35). The membership function of an element describes each set membership with values between 0 and 1, which can be interpreted as a probability as well.
The definition of the VC-dim(F) refers to the use of a suitable set w of indicator functions Φw(x), which carry out this shattering of all the patterns into two classes. The maximal number h, respectively the cardinality, can happen in 2^h possible ways (Schölkopf and Smola, 2001, p. 9).
Said in other words: every set of functions and their combinations has got an own model capacity given by the VC dimension.
The practical computation of this dimension is not solved for all indicator functions in general (Koiran and Sontag, 2002; Elisseeff and Paugam-Moisy, 1997; Sontag, 1998;


Schmitt, 2001, 2005). Furthermore, until now it is often only possible to limit this dimension by Bachmann-Landau notations depending on the number of model kernels.

Nevertheless, the VC-dim(F) concept is of high importance for the demonstrated method of modelling. It emphasizes the necessity to reduce the model capacity. Why? Generally speaking, the VC dimension in comparison with the number of patterns describes the mental state of the model. Accordingly, we may assign the following terms to a model: "stupid" if VC-dim(F) is too small, "intelligent" if the VC-dim(F) reaches the lower bound of an optimal size, "wise" if the model is really optimal, "experienced" if the model's VC-dim(F) is around the upper bound of an optimal size and finally "autistic" if it has got a far too high size. Especially this last case is often found: the user creates a model with too many weights or parameters. The model will "remember" every single pattern instead of creating any general rules. Given a new pattern, the model will be confused.

5 Résumé

Why is a support vector machine robust against over-fitting, although it uses powerful kernels – in the case of RBF or ANOVA kernels the theoretical VC-dim(F) can even be infinite?
The decisive step has already been made in the set-up of the hyperplane. Wherever the hyperplane is positioned between the patterns, only a few patterns on the margin or in its vicinity decide over the exact position of the decision surface. This very special group of support vectors is – provided the user chose suitable values for C and ε or kernel parameters like γ – significantly smaller than the number of available patterns. Accordingly, all patterns that are not support vectors are implicitly classified or approximated by the model. That way they do not have any impact on the optimization process and therefore the ugly effect of model over-fitting is a priori excluded.

References

Bothe, H.-H.: Fuzzy Logic – Einführung in Theorie und Anwendung. 2nd ext. ed., Springer-Verlag, Berlin, Heidelberg, New York, Barcelona, Budapest, Hong Kong, London, Mailand, Paris, Santa Clara, Singapur, Tokyo, 1995.

Burges, C. J. C.: A Tutorial on Support Vector Machines for Pattern Recognition. In: Data Mining and Knowledge Discovery, vol. 2, pp. 121–167, Kluwer Academic Publishers, 1998.

Domschke, W. and Drexl, A.: Einführung in Operations Research. 5th edition, Springer, Berlin, Heidelberg, 2002.

Elisseeff, A. and Paugam-Moisy, H.: Size of multilayer networks for exact learning: analytic approach. NeuroCOLT Techn. Rep. Series, NC-TR-97-002, 1997.

Grundmann, W.: Operations Research – Formeln und Methoden. Teubner, Stuttgart, Leipzig, Wiesbaden, 2002.

Haykin, S.: Neural Networks – A Comprehensive Foundation. 2nd edition, Prentice Hall, Upper Saddle River NJ, 1999.

Heine, K.: Beschreibung von Deformationsprozessen durch Volterra- und Fuzzy-Modelle sowie Neuronale Netze. PhD thesis, German Geodetic Commission, series C, issue 516, München, 1999.

Heinert, M. and Niemeier, W.: From fully automated observations to a neural network model inference: The Bridge "Fallersleben Gate" in Brunswick, Germany, 1999–2006. J. Appl. Geodesy 1, 2007, pp. 671–80.

Heinert, M.: Systemanalyse der seismisch bedingten Kinematik Islands. PhD thesis, Geod. Schriftenr. TU Braunschweig 22, Brunswick (Germany), 2008.

Heinert, M.: Artificial neural networks – how to open the black boxes? In: Reiterer, A. and Egly, U. (Eds.): Application of Artificial Intelligence in Engineering Geodesy. Vienna, 2008, pp. 642U62.

Heinert, M.: Support Vector Machines – Teil 1: Ein theoretischer Überblick. zfv 135 (3), 2010, in press.

Heinert, M. and Riedel, B.: Support Vector Machines – Teil 2: Praktische Beispiele und Anwendungen. zfv 135, 2010, in press.

Hillier, F. S. and Liebermann, G. J.: Operations Research – Einführung. 5th edition, Oldenbourg Verlag, Munich, Vienna, 2002.

Koiran, P. and Sontag, E. D.: Neural Networks with Quadratic VC Dimension. NeuroCOLT Techn. Rep. Series, NC-TR-95-044, 1996.

Kuhn, H. W. and Tucker, A. W.: Nonlinear Programming. Proc. of 2nd Berkeley Symp., pp. 481–492, Univ. of California Press, Berkeley, 1951.

Miima, J. B.: Artificial Neural Networks and Fuzzy Logic Techniques for the Reconstruction of Structural Deformations. PhD thesis, Geod. rep. series Univ. of Technology Brunswick (Germany), issue 18, 2002.

Niemeier, W.: Ausgleichsrechnung – Eine Einführung für Studierende und Praktiker des Vermessungs- und Geoinformationswesens. 2nd rev. and ext. edition, Walter de Gruyter, Berlin, New York, 2008.

Platt, J. C.: Sequential Minimal Optimization: A Fast Algorithm for Training Support Vector Machines. Microsoft Research, Technical Report MSR-TR-98-14, 1998.

Platt, J. C.: Fast Training of Support Vector Machines using Sequential Minimal Optimization. In: Schölkopf, B., Burges, C. J. C., Smola, A. J. (Eds.): Advances in kernel methods: support vector learning. MIT Press, Cambridge (MA), 1999, pp. 185–208.

Rardin, R. L.: Optimization in Operations Research. Prentice Hall, Upper Saddle River USA, 1998.

Riedel, B. and Heinert, M.: An adapted support vector machine for velocity field interpolation at Baota landslide. In: Reiterer, A. and Egly, U. (Eds.): Application of Artificial Intelligence in Engineering Geodesy. Vienna, 2008, pp. 101–116.

Schmitt, M.: Radial basis function neural networks have superlinear VC dimension. In: Helmbold, D. and Williamson, B. (Eds.): Proceedings of the 14th Annual Conference on Computational Learning Theory COLT 2001 and 5th European Conference on Computational Learning Theory EuroCOLT 2001, Lecture Notes in Artificial Intelligence 2111, pp. 614–630, Springer-Verlag, Berlin, 2001.

Schmitt, M.: On the capabilities of higher-order neurons: A radial basis function approach. Neural Computation 17 (3), pp. 715–729, 2005.

Schölkopf, B. and Smola, A. J.: Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond (Adaptive Computation and Machine Learning). MIT Press, 2001.

Schölkopf, B., Cristianini, N., Jordan, M., Shawe-Taylor, J., Smola, A. J., Vapnik, V. N., Wahba, G., Williams, Chr. and Williamson, B.: Kernel-Machines.Org, 2007. URL: http://www.kernel-machines.org.

Sidle, R. C. and Ochiai, H.: Landslides – Processes, Prediction and Land Use. AGU Books Board, Washington, 2006.

Smola, A. J. and Schölkopf, B.: A tutorial on support vector regression. Statistics and Computing 14, pp. 199–222, 2004.

Sontag, E. D.: VC Dimension of Neural Networks. In: Bishop, C. (Ed.): Neural networks and machine learning, pp. 69–95, Springer-Verlag, Berlin, 1998.

Vapnik, V. N.: Statistical Learning Theory. In: Haykin, S. (Ed.): Adaptive and Learning Systems for Signal Processing, Communications and Control, John Wiley & Sons, New York, Chichester, Weinheim, Brisbane, Singapore, Toronto, 1998.

Zhang, J. and Jiang, B.: GPS landslide monitoring of Yunyang Baota. Report of University Wuhan, 2003.


Appendix

Non-linear constrained optimisation using the SOLVER

How can the Lagrange multipliers be optimised under MS Excel to get a maximised object function? Within this software one can find under the menu "Extras" an add-in called SOLVER. This add-in, created by the software company Frontline Systems Inc., is not part of the standard installation; it has to be activated via the add-ins manager. The SOLVER enables us to source out all the questions of optimisation while we may concentrate on the algorithmic design of support vector machines. This is quite pleasant, because one can find a lot of books and scientific articles only dealing with model optimisation (Domschke and Drexl, 2002; Rardin, 1998; Platt, 1998, and many others). Accordingly, it is sufficient for the moment to know that the SOLVER does the maximisation of the object function under MS Excel. Therefore the user assigns a "Target Cell" that should contain the present value of the object function (Fig. 9). In our very case the option "Max" has to be chosen, so that the optimisation has got the right direction. Furthermore the "Changing Cells", namely the cells containing the Lagrange multipliers, have to be assigned. Finally, the user programmes the Karush-Kuhn-Tucker conditions under the window "Subject to the Constraints".
Quite extended explanations and practical examples are given by Staiger (2007, in German).

Figure 9: SOLVER under the "Extras" menu in MS Excel: assignment of the Lagrange multipliers as the "Changing Cells" and of the "Target Cell", whose value, representing the object function, has to be maximised ("Max") "Subject to the Constraints" of the Karush-Kuhn-Tucker conditions.