
SOFTWARE TOOLS AND DATA PROCESSING METHODS TO SUPPORT SOFTWARE DEFINED BUILDINGS

by
Emil Holmegaard

The Mærsk Mc-Kinney Møller Institute
The Technical Faculty

SDU - Center for Energy Informatics
University of Southern Denmark

August 2017


Evaluation Committee
Jan Corfixen Sørensen, Ph.D., Assistant Professor (Chair)
Rune Hylsberg Jacobsen, Ph.D., Associate Professor
Henrik Blunck, Ph.D., Professor

Supervisors
Mikkel Baun Kjærgaard, Ph.D., Associate Professor
Bo Nørregaard Jørgensen, Ph.D., Professor

Print
Print and Sign, University of Southern Denmark

Correspondence
Emil Holmegaard
[email protected]
+45 40 16 90 11

Software Tools and Data Processing Methods to Support Software Defined Buildings
© August 2017


ABSTRACT

The future energy system will have fluctuating energy production due to wind and solar power. One solution for the future energy system is to integrate buildings as thermal buffers. This promotes applications for, among others, energy optimization, Demand Response (DR) and intelligent control strategies for HVAC systems. To enable such applications, the challenge is to create an infrastructure where applications are easy to port between buildings. Software Defined Buildings (SDBs), which facilitate a semantic representation (metadata) for buildings, can provide an infrastructure for Portable Building Applications (PBAs). The challenge is to facilitate a semantic representation for buildings which is both simple and powerful. Therefore the overall research question for this thesis is: How to transform energy and sensor data from buildings into knowledge that supports Software Defined Buildings?

This thesis takes a constructive research approach, combining analysis of existing solutions with prototyping of software tools and data processing methods on real data from buildings. The software tool Metafier has been developed to support the task of annotating and structuring metadata from sensors in buildings. Metafier has been evaluated by three subjects with relevant backgrounds within the topic of energy and buildings. Metafier includes data processing methods for semi-automated metadata generation.

Besides Metafier, a study of data processing methods for Non-Intrusive Load Monitoring (NILM) in an industrial setting has been performed. The data processing methods disaggregate one sensor into a semantic representation of the equipment connected to the sensor. In addition to the semantic representation, the data processing methods estimate the power draw of the connected equipment. This research provides preliminary concepts for applications that provide disaggregated energy consumption with a minimal sensor infrastructure. Such applications could be part of an SDB and developed as PBAs.

The task of annotating and structuring metadata for sensors in buildings brings value to the corresponding data stream of each sensor. The metadata provides context for the sensor and its data stream, and supplies the semantic representation which can connect a physical environment to an SDB.

Based on the contributions of the thesis, future research will include the development of software tools as PBAs and the refinement of data processing methods for automated metadata generation.


DANSK RESUMÉ (DANISH SUMMARY)

The future energy system will have fluctuating energy production based on solar and wind power. One solution is to integrate buildings as a kind of battery. This puts the focus on applications for, among others, energy optimization, Demand Response (DR) and intelligent indoor climate control. To enable these applications, the challenge is to create an infrastructure where applications can relatively easily be ported and reused across buildings. Software Defined Buildings (SDBs) can support a semantic representation (metadata) for buildings, and can offer the missing infrastructure for Portable Building Applications (PBAs). The challenge is to enable a semantic representation that is simple yet useful. This leads to the following problem statement: How can energy and sensor data from buildings be transformed into knowledge that supports Software Defined Buildings?

The approach of this thesis combines analysis of existing solutions with the construction of software tools and processing methods for data from buildings. The tool Metafier has been developed to support annotating and structuring metadata from sensors in buildings. Metafier has been evaluated by three test subjects with relevant backgrounds within the areas of energy and buildings. Metafier contains processing methods that enable it to generate metadata via a semi-automated method.

Besides Metafier, a study of processing methods for Non-Intrusive Load Monitoring (NILM) in an industrial environment has been carried out. The processing methods are used to disaggregate energy data into a semantic representation of the equipment connected to the meter. In addition to the semantic representation, the processing methods have estimated the power consumption of the connected equipment. This research provides some preliminary concepts for applications that use NILM, and can thereby minimize the number of sensors in a similar setup. Such an application could be developed as a PBA.

The task of annotating and structuring metadata for points brings context to the associated data stream. Metadata for points provides a semantic representation that can connect an SDB with a building's physical surroundings.

Based on the contributions of this thesis, future research can take its starting point in the development of applications as PBAs. Processing methods for automatic generation of metadata can also be refined.


PUBLICATIONS

included in thesis

1. Metafier: a Tool for Annotating and Structuring Building Metadata, by Emil Holmegaard, Aslak Johansen and Mikkel Baun Kjærgaard. Accepted for Proceedings of the 2017 IEEE Smart World Congress (SmartWorld 2017), IEEE, 2017.

2. Mining Building Metadata by Data Stream Comparison, by Emil Holmegaard and Mikkel Baun Kjærgaard. Published in Proceedings of the 2016 IEEE Conference on Technologies for Sustainability (SusTech), IEEE, 2016, pages 28-33.

3. Towards a Metadata Discovery, Maintenance and Validation Process to Support Applications that Improve the Energy Performance of Buildings, by Emil Holmegaard, Aslak Johansen and Mikkel Baun Kjærgaard. Published in 2016 IEEE International Conference on Pervasive Computing and Communication Workshops (PerCom Workshops), IEEE, 2016, pages 1-6.

4. NILM in an Industrial Setting: A Load Characterization and Algorithm Evaluation, by Emil Holmegaard and Mikkel Baun Kjærgaard. Published in 2016 IEEE International Conference on Smart Computing (SMARTCOMP), IEEE, 2016, pages 1-8.

5. Status and Challenges of Residential and Industrial Non-Intrusive Load Monitoring, by Ali Adabi, Patrick Mantey, Emil Holmegaard and Mikkel Baun Kjærgaard. Published in Proceedings of the 2015 IEEE Conference on Technologies for Sustainability (SusTech), IEEE, 2015, pages 181-188.

other publications by the author

6. Demand Response with Model Predictive Comfort Compliance in an Office Building, by Peter Nelleman, Mikkel Baun Kjærgaard, Emil Holmegaard, Krzysztof Arendt, Aslak Johansen, Fisayo Caleb Sangogboye and Bo Nørregaard Jørgensen. Accepted for Proceedings of the 2017 IEEE International Conference on Smart Grid Communications (SmartGridComm), IEEE, 2017.

7. OccuRE: an Occupancy REasoning Platform for Occupancy-driven Applications, by Mikkel Baun Kjærgaard, Aslak Johansen, Fisayo Caleb Sangogboye and Emil Holmegaard. Published in Proceedings of the 19th International ACM SIGSOFT Symposium on Component-Based Software Engineering, ACM, 2016, pages 39-48.

8. Energy Efficiency in a Mobile World, by Mikkel Baun Kjærgaard, Zheng Ma, Emil Holmegaard and Bo Nørregaard Jørgensen. Published in Smart Grids from a Global Perspective: Bridging Old and New Energy Systems, Springer, 2016, pages 249-268.

9. Towards Automatic Identification of Activity Modes in Electricity Consumption Data for Small and Medium Sized Enterprises, by Emil Holmegaard and Mikkel Baun Kjærgaard. Published at NILM Workshop 2014, 2014, pages 1-4.


ACKNOWLEDGMENTS

It would never have been possible to complete this Ph.D. project without the assistance, guidance and collaboration of numerous individuals.

Firstly, I would like to thank my supervisors, Associate Professor Mikkel Baun Kjærgaard and Professor Bo Nørregaard Jørgensen, for entrusting me with this project, and for their sustained advice during both easy and tough times over the last four years.

A special thanks to the technical staff at the cold store in Vejle for being a part of Energy Guild Vejle Nord and letting us use their electricity data, and furthermore for introducing us to all the systems and processes of a cold store.

Further, I would like to thank Professor David Culler for hosting me at UC Berkeley during my change of research environment. Professor David Culler is one of the most inspiring people I have met, and his knowledge of the topic of Software Defined Buildings is enormous.

Additionally, I would like to thank Aslak Johansen for collaboration, technical discussions and advice during my Ph.D. study. I would also like to thank Aisha Umair and Morten Gill Wollsen for discussions of all kinds during my time at the Center for Energy Informatics.

Finally, I owe special thanks to my wife Gitte and daughter Ella for loving me the most, even in the hardest times and when I have been absent both mentally and physically. All my late hours, stubbornness and ambitions for life are efforts to protect, love and support you.


CONTENTS

I setting the stage
  1 introduction
    1.1 Energy Informatics and Software Defined Buildings
    1.2 Common Feature of Projects
    1.3 Projects
    1.4 Contributions
    1.5 Thesis Outline
  2 research challenges and state of the art
    2.1 Background
    2.2 Research Question
    2.3 State of the Art
  3 research approach
    3.1 Research Context
    3.2 Research Approach
    3.3 Case Sites
    3.4 Evaluation Criteria
II non intrusive load monitoring
  4 a load characterization and algorithm evaluation
    4.1 Research Context
    4.2 Preliminary Analysis of Industrial Equipment
    4.3 Methods
    4.4 Evaluation Setup
    4.5 Results
    4.6 Discussion
    4.7 Summary
III software tools for metadata
  5 the tool metafier
    5.1 Research Context
    5.2 Requirements for Metafier
    5.3 Features and Implementation Details
    5.4 Summary
  6 an evaluation of metafier
    6.1 Research Context
    6.2 Evaluation
    6.3 Results from Evaluation
    6.4 Discussion
    6.5 Comparison of Existing Solutions
    6.6 Requirements for a Tool to Handle Building Metadata
    6.7 Summary
  7 data processing methods for data mining in metafier
    7.1 Research Context
    7.2 Algorithms for Generation of Metadata
    7.3 Evaluation Setup
    7.4 Results
    7.5 Discussion
    7.6 Summary
IV perspectives and implications
  8 perspectives
  9 conclusion
  10 future research
bibliography
glossary


LIST OF FIGURES

Figure 1.1 Similarities between a smartphone application and a PBA. Two different phones are illustrated to the left, with two applications which can be executed on both phones via the abstraction layer provided by the OS. Two different buildings are illustrated to the right, with two applications which can be executed on both buildings via the abstraction layer provided by the BOS.
Figure 1.2 Transformation of data to knowledge. This figure is based on Fayyad et al. [1].
Figure 1.3 Overview of the timeline of this project.
Figure 1.4 Contributions related to Figure 1.2.
Figure 2.1 UnitOfMeasure for points with more than 500 of the same UnitOfMeasure from the sMAP instance [2] at http://new.openbms.org.
Figure 2.2 A directed acyclic graph representing the conditional independence relations in a factorial HMM with 1 to K underlying Markov chains.
Figure 3.1 SDBs as a concept. The colored areas relate to topics from this Ph.D. study. The blue markings relate to data processing methods and the green markings relate to software tools.
Figure 3.2 GTH located in Vejle, Denmark.
Figure 3.3 SDU OU44, located in Odense, Denmark.
Figure 4.1 Equipment in the cold store setup, a total of forty points, all connected to one main meter.
Figure 4.2 Number of events within a specific hour of the day for compressor and condenser in the cold store and a residential house.
Figure 4.3 Power distributions for compressor, condenser and heat pump in the cold store with colored clusters estimated via k-means clustering with k=3.
Figure 4.4 Event correlation between equipment in a cold store.
Figure 4.5 Event correlation between equipment in house 1 in the REDD.
Figure 4.6 Test accuracy with one main meter for CO and FHMM.
Figure 4.7 MNE with one main meter for CO and FHMM.
Figure 4.8 Setup with meter groups for the four logical sections in the cold store data, with freezing, storage and heating connected to the main meter.
Figure 4.9 Test accuracy with four sub-meters for CO and FHMM.
Figure 4.10 MNE with four sub-meters for CO and FHMM.
Figure 4.11 MNE with four sub-meters for three periods (hot, normal and cold) with different outside temperatures.
Figure 4.12 Regression between power, temperature and goods flow, for Evaporator.
Figure 4.13 Regression between power, temperature and goods flow, for Condenser.
Figure 4.14 Test accuracy for FHMM with day-specific training.
Figure 4.15 MNE for FHMM with day-specific training.
Figure 5.1 States within the process view of metadata discovery and validation for points. The process supports the life cycle of a point. To the right is a simple example of a point following the process.
Figure 5.2 System setup, here illustrated with an interface to a sMAP instance.
Figure 5.3 Screenshot of the "list view" with an overview of all points in Metafier.
Figure 5.4 The flow in Metafier. The blue arrows indicate processes within Metafier. The green arrows indicate processes started from the GUI.
Figure 6.1 Click statistics for subject A.
Figure 6.2 Click statistics for subject B.
Figure 6.3 Click statistics for subject C.
Figure 7.1 Results with a combination of 1 (validation points: 3) for the three algorithms and All.
Figure 7.2 Results with a combination of 3 (validation points: 9) for the three algorithms and All.
Figure 7.3 Results with a combination of 5 (validation points: 15) for the three algorithms and All.
Figure 7.4 Results with a combination of 7 (validation points: 21) for the three algorithms and All.

LIST OF TABLES

Table 2.1 Solutions and systems to support metadata from buildings. ✓ is used for "supports" and ✗ for "does not support".
Table 2.2 Overview of NILM research with HF methods.
Table 3.1 Summary of sites from Section 3.3.
Table 4.1 Results for Season Weighted CO.
Table 6.1 Tasks for subjects evaluating Metafier.
Table 6.2 Results from evaluation of Metafier. Results collected by JavaScript in Metafier.
Table 6.3 Features for annotation of metadata in existing BAS and BMS. * is used when the feature is partially met, or requires a plugin.

LISTINGS

Listing 2.1 A simple query for retrieving all points from a certain room, with the point type of temperature.
Listing 5.1 An example of end points in Metafier.
Listing 5.2 A location metadata profile in Metafier.
Listing 5.3 A valid location for a point, validated through Listing 5.2.
Listing 5.4 Snippet from the DataProvider class. Parts of the class have been left out; see comments in the snippet.
Listing 5.5 Method signatures for primary methods in the foundation.

ACRONYMS

ABC Abstract Base Class

AMR Automated Meter Reading

AMI Automated Metering Infrastructure

API Application Programming Interface

BAPPS Building APPlication Stack [3]

BAS Building Automation System

BMS Building Management System

BOS Building Operating System

BOSS Building Operating System Services [4]

CFEI University of Southern Denmark (SDU) - Center for Energy Informatics

CO Combinatorial Optimization

CPS Cyber-Physical System

DA Data Analytics

DR Demand Response

DTO Data Transfer Object


DTW Dynamic Time Warping

EGVN Energy Guild Vejle Nord [5]

EI Energy Informatics

EMD Empirical Mode Decomposition

EMI Electro Magnetic Impulse

EV Electrical Vehicle

FHMM Factorial Hidden Markov Model

GTC Green Tech Center

GTH Green Tech House

GUI Graphical User Interface

HAN Home Area Network

HPL Hardware Presentation Layer

HMM Hidden Markov Model

HF High Frequency

HTTP HyperText Transfer Protocol

HVAC Heating, Ventilation and Air Conditioning

IMF Intrinsic Mode Functions

IOT Internet of Things

ICT Information and Communications Technology

JSON JavaScript Object Notation

KD Knowledge Discovery

LF Low Frequency

MNE Mean Normalized Error

MVC Model-View-Controller

NILM Non-Intrusive Load Monitoring

NILMTK Non-Intrusive Load Monitoring Toolkit

NP Nondeterministic Polynomial Time

ODBC Open Database Connectivity

OS Operating System

PBA Portable Building Application


PIR Passive InfraRed

PV Photovoltaic

REDD The Reference Energy Disaggregation Data Set [6]

REST REpresentational State Transfer

SCADA Supervisory Control And Data Acquisition

SC Slope Compare

SD Software Development

STD Standard Deviation

SDB Software Defined Building

SDU University of Southern Denmark

SMAP Simple Measurement and Actuation Profile [7]

UK United Kingdom

US United States

UUID Universally Unique Identifier

VFD Variable Frequency Drive

XML eXtensible Markup Language

XMPP eXtensible Messaging and Presence Protocol

YAML YAML Ain’t Markup Language


Part I

SETTING THE STAGE

This part presents the intention of the work conducted in this Ph.D. study: first a presentation of the challenges to solve, the settings of the challenges and the areas of work related to the challenges; then the selected approach for solving the challenges and the specific research questions to answer.


1 INTRODUCTION

This chapter provides an introduction to the topics of Energy Informatics (EI), Cyber-Physical Systems (CPSs), Software Defined Buildings (SDBs) and Non-Intrusive Load Monitoring (NILM). It holds the motivation for this project and sets the stage.

1.1 energy informatics and software defined buildings

In the near future, fossil fuels will be limited and new energy solutions must be found. Society, governments and energy experts all point to renewable energy, like wind and solar, to substitute the fossil energy sources [8, 9, 10]. The transition from fossil fuels to renewable energy sources will bring challenges [8] with interdisciplinary tasks. This includes control and optimization of a decentralized energy production, from wind and Photovoltaic (PV) production. Another challenge is the integration of Electrical Vehicles (EVs), which require large amounts of electricity to charge. Furthermore, if EVs are to be used as batteries to support the electricity grid, there will be a task of controlling and coordinating which EVs to use and when.

EI is a research area focused on applications where Information and Communications Technology (ICT) supports and facilitates the transition towards a sustainable energy system. EI covers the interdisciplinary tasks of ICT, energy engineering and software engineering to address energy challenges [11].

The Danish Government has developed a 2050 energy strategy [10], which focuses on renewable resources like wind and biofuel. One of the elements in the strategy is to add 1000 MW of offshore wind power to the Danish electricity grid. Besides renewable energy sources, two of the initiatives are to future-proof requirements for energy-effective construction materials for new buildings and to promote energy-effective buildings. The initiatives promote analyzing energy consumption and developing new software to optimize energy consumption in buildings. The objective of the strategy is to dynamically produce energy on demand or to shift energy consumption.

Denmark has around 30% of its electricity production covered by renewable energy [12]. The primary renewable energy source is wind power, which leads to fluctuations in energy production and energy prices [9]. The first solution for neutralizing the fluctuations in renewable energy production is storage, like batteries; the problems with batteries are cost, efficiency and size. The second solution would be using buildings as batteries, or more precisely as thermal buffers [13]. A third solution would be Demand Response (DR), which requires integration with commercial buildings to turn off energy-heavy equipment like Heating, Ventilation and Air Conditioning (HVAC). When applying DR in buildings, uncomfortable situations for the occupants of the building should be avoided. The three solutions all rely on the flexibility of the energy consumer. Hazas et al. [14] argue for a focus on reducing energy consumption, and that consumers should not have to start changing


behaviors. One of the ideas is to integrate energy reduction into products, such that consumers do not notice it.

The industrial sector and commercial buildings are responsible for a large share of the world-wide energy consumption. In fact, industry accounts for nearly 30% of the total electricity consumption in the United States (US) [15] and Denmark [16], and commercial buildings account for around 40% of the total energy consumption [17]. Commercial buildings are therefore an important area in which to reduce energy consumption. Reducing energy consumption in commercial buildings requires knowledge from different disciplines. An example would be optimizing the consumption of light: it requires a software engineer to implement an optimization algorithm, a sociologist to analyze the effect on the occupants and an electrical engineer to analyze whether the equipment can be managed by the algorithm. Buildings often have several systems installed (i.e. different forms of Building Management System (BMS) and Building Automation System (BAS)), with the combined goal of providing a good comfort level for occupants. The building instrumentation includes the physical sensor infrastructure in a building, which the different forms of BMS and BAS use. With a combined goal of providing a good comfort level, it would be natural for these systems to share information and collaborate, but often they work as silos [18]. Improving the energy performance of buildings at scale requires new software infrastructures for buildings, with software applications to break these silos. An SDB is in this thesis defined as:

Software Defined Building (SDB) is the concept of representing cyber and physical elements of a building to provide integration and/or interaction with cyber and physical elements within the building

An SDB creates a representation of the building, such that a building can be seen as an Operating System (OS), or more specifically as a Building Operating System (BOS). With a BOS it will be possible to create applications for buildings which can be applied or ported to multiple buildings. A Portable Building Application (PBA) is a software application which can be used for multiple buildings with minimal effort to port the application.

PBAs can be seen as an analogy to smartphone applications using the sensing instrumentation in a smartphone. Figure 1.1 illustrates the similarities between smartphone applications and PBAs. Different smartphone models expose different sensors and hardware in a more or less common way [19]. The abstraction layer provided by the smartphone OS makes it easy to develop and distribute applications via an app store.

The concept of SDB could lead to an app store for buildings, where a facility manager could upgrade the control of a HVAC system to a more energy efficient version with a few clicks. To have an SDB, the challenge is to install a BOS which provides a semantic representation of the building instrumentation. The semantic representation might be exposed by metadata.

Metadata for points from a building instrumentation might provide relevant information about: the location of the point, the type of point, the encoding of data from the point, and the unit for the associated data stream of a point. If the majority of points from a building instrumentation have metadata, the building instrumentation can provide a semantic representation. Metadata provides a semantic representation to understand the context of a point

Figure 1.1: Similarities between a smartphone application and a PBA. Two different phones are illustrated to the left, with two applications which can be executed on both phones via the abstraction layer provided by the OS. Two different buildings are illustrated to the right, with two applications which can be executed on both buildings via the abstraction layer provided by the BOS.

The semantic representation provides an abstraction layer for buildings, which enables an interface for PBAs. To enable PBAs, a challenge is to have the right amount of metadata for the building instrumentation.
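As an illustration of what "the right amount of metadata" might mean for a single point, the sketch below models a point's metadata as a key-value record and checks it against a required minimum. The key names and the required set are assumptions for illustration, not the thesis' actual schema.

```python
# Illustrative sketch (not the thesis implementation): a point's metadata
# as a key-value record, with a check for a minimum required level of
# metadata. The required keys below are an invented minimum.
REQUIRED_KEYS = {"location", "type", "unit"}

def metadata_is_sufficient(point_metadata: dict) -> bool:
    """Return True if the point carries the assumed minimum metadata."""
    return REQUIRED_KEYS <= point_metadata.keys()

temperature_point = {
    "location": "OU44/1.1.1",  # room 1.1.1 in building OU44
    "type": "temperature",
    "unit": "Celsius",
    "encoding": "float",
}

print(metadata_is_sufficient(temperature_point))  # True
print(metadata_is_sufficient({"type": "co2"}))    # False
```

A PBA could use such a check to decide whether a building instrumentation exposes enough context to run on it at all.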

1.2 common features of projects

The overall themes for this Ph.D. project are Knowledge Discovery (KD), Data Analytics (DA) and Software Development (SD). The three themes summarize to the task of transforming data to knowledge, following the process in Figure 1.2 based on [1]. For this Ph.D. project, data processing methods is used as a term for tasks related to development of algorithms and DA. Software tools is used when software applications interface with other systems or humans.

Figure 1.2: Transformation of data to knowledge, from Data via Target Data, Preprocessed Data, Transformed Data and Patterns to Knowledge, through the subprocesses Selection, Preprocessing, Transformation, Data Mining and Interpretation / Evaluation. This figure is based on Fayyad et al. [1].


Figure 1.2 illustrates five subprocesses and six sub-products (11 steps) for KD. The subprocesses are: selection is the task of selecting a subset of data; preprocessing is the task of cleaning and structuring data; transformation is the task of reducing the complexity in data; data mining is the task of finding patterns in data; and interpretation / evaluation is the task of interpreting the patterns found.

KD can be iterative within each subprocess, between multiple subprocesses or over the whole transformation. Figure 1.2 will be used for setting the context of each sub-project.
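The five subprocesses can be sketched as a simple function pipeline. The stage bodies below are invented placeholders that only illustrate the data flow from data to knowledge; they are not methods used in the thesis.

```python
# Minimal sketch of the five KD subprocesses from Figure 1.2 as a pipeline.
# The stage bodies are placeholders for illustration only.
def selection(data):           # Data -> Target Data
    return [d for d in data if d is not None]

def preprocessing(target):     # Target Data -> Preprocessed Data
    return [float(d) for d in target]

def transformation(clean):     # Preprocessed -> Transformed (reduce complexity)
    mean = sum(clean) / len(clean)
    return [d - mean for d in clean]  # e.g. de-meaning the signal

def data_mining(transformed):  # Transformed Data -> Patterns
    return {"peaks": [d for d in transformed if d > 0]}

def interpretation(patterns):  # Patterns -> Knowledge
    return f"{len(patterns['peaks'])} above-average reading(s)"

raw = [220.0, None, 240.0, 230.0]
knowledge = interpretation(
    data_mining(transformation(preprocessing(selection(raw)))))
print(knowledge)  # -> 1 above-average reading(s)
```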

1.3 projects

For this Ph.D. study, the research has been conducted as part of two research projects. The first project is Micro Grid Living Lab, a project on creating a living lab to study new solutions for reducing energy consumption with support of software tools and data processing methods. The second project is COORDICY, a project on EI for reducing energy consumption in commercial buildings. The foci of the two projects are slightly different, with a common objective of reducing energy consumption by analyzing energy data and developing software tools. Figure 1.3 illustrates which projects and tasks have been worked on during the Ph.D. study.

Figure 1.3: Overview of the timeline of this project, by quarters from 2014 to 2017: the Microgrid Living Lab project (data processing methods), a period of paternity leave / research assistant employment, and the COORDICY project (data processing methods and software tools).

Figure 1.3 illustrates the tasks associated with each sub-project. For Micro Grid Living Lab the work has been within the task of data processing methods. For COORDICY the work has been a combination of data processing methods and software tools.

For a nine month period, this Ph.D. study was paused, due to paternity leave (Q2 2016) and employment as research assistant (Q3 and Q4 2016). While being a research assistant, the focus of the work was the integration of the building instrumentation of the University of Southern Denmark (SDU) building OU44 into an implementation of a BOS. The chosen BOS was Simple Measurement and Actuation Profile [7] (sMAP), which provides an archiver wrapping a database of time series data from sensors with associated data streams and a database with associated metadata for the sensors. The tasks included labeling all sensors and actuators from SDU OU44 with metadata.

The two following subsections present the research projects Micro Grid Living Lab and COORDICY.


1.3.1 The Micro Grid Living Lab Project

Micro grids are physical connections in the energy grid, a part of the grid, or an island in the grid, which can operate in cooperation with the energy grid or on its own [20]. A building with its own energy production site could be considered a small micro grid.

The project was a triple helix collaboration between companies in Energy Guild Vejle Nord [5] (EGVN), Green Tech Center (GTC) and SDU - Center for Energy Informatics (CFEI). EGVN is a consolidation of companies in the area of Vejle that want to share thoughts, knowledge and data about energy. The objective of EGVN is to optimize conditions with respect to energy for the companies in this area. At GTC a set of renewable energy production facilities is installed; furthermore, the energy grid can be bypassed (heat and electricity), which means GTC can act as a micro grid.

The objective for the project was to create a living lab in the area of Vejle. To obtain a living lab where companies share energy data, the majority of the companies in the area were visited. One of the companies in this project was GridManager1, who was responsible for equipment to monitor energy data for the companies in EGVN. We had several meetings with GridManager where they explained the time consuming task of analyzing the energy consumption of different buildings, industrial processes and industrial equipment. Therefore one of our objectives was to provide data processing methods for the companies to understand their energy consumption. In interviews, the companies in EGVN also expressed a wish for data processing methods to analyze their energy data. Based on a literature study, we found a minimal set of research articles on the topic of NILM in industrial settings. NILM is a data processing method for disaggregation of energy data.

Following Figure 1.2, the first KD iteration for obtaining knowledge of an industrial setting is to understand the processes within the setting. The next KD iteration is to analyze sensor data for the individual processes, to get knowledge of the most frequent and maybe most valuable process.

As illustrated in Figure 1.3, this project was conducted in the first part of the course of this Ph.D. study. The main focus has been on data processing methods, particularly on the topic of NILM.

1.3.2 The COORDICY Project

Commercial buildings certified by energy efficiency and sustainability standards like LEED [21], ENERGY Star [22] and Globes [23] often experience a gap between predicted and actual energy performance. In some cases the certified buildings have performed even worse than non-certified buildings [24]. Therefore COORDICY aims to introduce continuous commissioning and retrofits based on EI and building intelligence [11]. The COORDICY project has the title: "ICT-driven Coordination for Reaching 2020 Energy Efficiency Goals in Public and Commercial Buildings" [25]. The objective for COORDICY is to facilitate SDBs and develop PBAs which can optimize control strategies and discover energy performance gaps in commercial buildings.

1 This company went bankrupt 2015-02-17

The topics within COORDICY and this project are SDB, BOS and metadata for building instrumentation, to enable PBAs. A BOS exposes services within a building, as known from a smartphone, where the OS provides interfaces to the underlying sensor infrastructure [19]. Metadata for all points in a building instrumentation can facilitate an abstraction which enables PBAs through a BOS. Following Figure 1.2, the first KD iteration for obtaining knowledge of a building is to get metadata about the building instrumentation; the next KD iteration is to analyze sensor data to identify the systems within the building; then the intelligence within the building might increase to obtain better comfort with minimal energy consumption.

As illustrated in Figure 1.3, this project was conducted in the last part of the course of this Ph.D. study. The focus has been on data processing methods and software tools; this includes DA, KD and SD with respect to SDB.

1.4 contributions

The contributions of this thesis include both theoretical and practical constructs. Figure 1.4 illustrates the contributions with respect to the KD process.

Figure 1.4: Contributions related to Figure 1.2: NILM in an Industrial Setting, Data Stream Comparison, Metafier for Maintenance of Metadata, and a Model for Discovery, Validation and Maintenance, mapped onto the KD subprocesses.

The theoretical contributions are:

• A model for discovery, validation and maintenance of metadata from building instrumentations.

• Requirements for a software tool for annotating and structuring metadata from building instrumentations.

The practical contributions are:

• An implementation and evaluation of data processing methods for NILM in an industrial setting.

• The framework Metafier, used for implementation and evaluation of data processing methods for semi-automated metadata generation, based on data stream comparison.

• An implementation and evaluation of the software tool Metafier for maintenance of metadata from building instrumentations for two real buildings.


1.5 thesis outline

This thesis is based on several publications produced during the course of this Ph.D. study. Results and developed theory from the publications are included in this thesis. The thesis is self-contained with respect to data and results. The thesis is structured in four parts. Part I provides background for the challenges, research questions and state of the art for SDBs and NILM, and describes the approach used for this thesis. Part II provides an analysis of NILM in an industrial setting, with a preliminary analysis of industrial equipment to support the task of applying NILM. Then follows Part III, which focuses on metadata for building instrumentations: Chapter 5 introduces the software tool Metafier, Chapter 6 evaluates Metafier with respect to annotating and structuring of metadata, and Chapter 7 evaluates Metafier with respect to data processing methods for semi-automated metadata generation. Part IV finalizes this thesis: Chapter 8 provides perspectives for this thesis with respect to validity and limitations of the conducted experiments, and Chapter 9 concludes on the research questions, followed by future research in Chapter 10.


2 research challenges and state of the art

This chapter presents the challenges within Software Defined Building (SDB) and Non-Intrusive Load Monitoring (NILM) for an industrial setting. The chapter is structured as an introduction to the current and future state of the energy system, SDB and NILM, which is then summarized into the challenges of each field, followed by the research questions for this project. Finally follows state of the art for the two fields.

2.1 background

The energy sector has changed over the last couple of decades, from a system where energy production and demand followed each other, to a system where production fluctuates due to renewable energy sources [26]. Fossil fuels are limited and should be reserved for purposes where it would be difficult to replace the fuel, e.g. for flights [27] and for processes that require energy with high energy density. The focus on renewable energy sources is clear in the 2050 energy strategy plan from the Danish Government [10]. One of the objectives is to optimize the utilization of the infrastructure. This includes a common objective of a sustainable energy system for district heating, gas, and electricity [10].

Overall, buildings use 40% of the total energy consumption [17], which makes them a target for reducing and optimizing energy consumption. For optimizing the infrastructure, buildings can be used as batteries, where the thermal mass can be controlled based on the predicted energy production from renewable energy sources [13]. The transition of the energy system introduces challenges which require a combined effort from Information and Communications Technology (ICT), energy engineering and software engineering [8]: the field of Energy Informatics (EI). The focus for this Ph.D. study is minimizing energy consumption with support from SDB via software tools and data processing methods. With SDBs providing an abstraction layer where applications can be ported to different buildings, all effort can be focused on optimizing energy-heavy processes within buildings by software tools and data processing methods.

The following part of this background section relates to the two projects COORDICY and Micro Grid Living Lab: first SDB with respect to metadata, then a challenge for NILM with respect to industrial settings.

2.1.1 Software Defined Buildings

This section describes challenges and existing research systems handling building instrumentation, points, Building Operating System (BOS) and metadata for building instrumentations.


A point represents a connection between the cyber and physical world which may be discretized into a data stream. Such a data stream contains either sensor readings or actuation requests depending on the direction of the connection

Improving the energy performance of buildings at scale requires software infrastructures for buildings like SDB. The infrastructure should facilitate applications which are portable, such that the effort of integrating an application is minimal. SDBs might break the silos between systems within buildings. Establishing a software infrastructure like SDB requires metadata to provide a semantic representation of the building. The metadata must be easy to query for retrieving the data streams corresponding to a query.

Data streams are continuous data from points. The term data stream is also used for historical data from points

For having Portable Building Applications (PBAs), one challenge is to enable a simple and efficient way to query data streams from points. This relates to data-on-demand via sophisticated intuitive queries, which will support web3 and Internet of Things (IoT) [28]. For applications to be portable, there must be a separation between the actual building and the application. The metadata can provide a level of abstraction. An example would be querying the data streams of a certain point by only knowing the point type and location. An example of a simple query for points could be:

SELECT points FROM room="1.1.1" AND building="OU44" WHICH HANDLE "temperature"

Listing 2.1: A simple query for retrieving all points from a certain room, with the point type of temperature

The query in Listing 2.1 returns a set of points which are located in building OU44, room 1.1.1 and have a type of temperature. The points in the set have associated data streams.
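A sketch of how such a query could be evaluated against an in-memory metadata store follows. The store layout and field names are assumptions for illustration, not an actual BOS implementation.

```python
# Hedged sketch: evaluating a query like Listing 2.1 against a small
# in-memory metadata store. The field names are invented for illustration.
points = [
    {"id": "p1", "building": "OU44", "room": "1.1.1", "type": "temperature"},
    {"id": "p2", "building": "OU44", "room": "1.1.1", "type": "co2"},
    {"id": "p3", "building": "OU44", "room": "2.0.5", "type": "temperature"},
]

def query(points, **criteria):
    """Return all points whose metadata match every given key=value pair."""
    return [p for p in points
            if all(p.get(k) == v for k, v in criteria.items())]

result = query(points, room="1.1.1", building="OU44", type="temperature")
print([p["id"] for p in result])  # -> ['p1']
```

The application only names metadata (room, building, type), never concrete point identifiers, which is what makes the query portable across buildings.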

As defined earlier, SDB is a concept of representing the cyber and physical elements (points) of a building to easily interact and integrate with the building. The challenge for SDB is to create an abstraction layer which makes the integration simple but also powerful: simple in terms of steps to integrate, and powerful in terms of operations performed on the building. Krioukov et al. [3] have, with the Building APPlication Stack [3] (BAPPS), shown how powerful an abstraction layer can be, integrating BAPPS with two buildings with different building instrumentations. For establishing an abstraction layer, annotated metadata for all points in buildings provides this abstraction. An abstraction layer for the building instrumentation will support SDBs, which thereby support PBAs.

The major challenge for metadata is to define a common way of annotating points. At the current state there are multiple ways to annotate points [29]. This complicates the task of creating intuitive queries that port to multiple buildings for retrieving data streams from points and thereby enabling a level of abstraction. The format for metadata in Simple Measurement and Actuation Profile [7] (sMAP) uses a key-value store. An example of metadata could be: metadata/room:1.1.1, where the metadata-key is metadata/room representing the room of a point and the value for the room is 1.1.1.

Figure 2.1: UnitOfMeasure for points with more than 500 of the same UnitOfMeasure from the sMAP-instance [2] at http://new.openbms.org.

Figure 2.1 illustrates the specific metadata-key "UnitOfMeasure" for points from the sMAP-instance [2] at http://new.openbms.org. Four different annotations have been used for writing the unit fahrenheit in Figure 2.1. Another problem for a BOS is to maintain or set up the building instrumentation, which requires annotating metadata for all points. Calbimonte et al. [30] argue that metadata will be kept at a very low level or be incorrect if the person who annotates the sensor does not benefit from the metadata. A similar issue was shown in Holmegaard et al. [31], where a facility manager had problems annotating metadata.
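One small data processing step suggested by the UnitOfMeasure example is normalizing variant unit spellings to a canonical form. The sketch below illustrates the idea; the variant spellings are invented and not the ones actually observed at new.openbms.org.

```python
# Hedged sketch of normalizing inconsistent unit annotations, in the
# spirit of the four spellings of fahrenheit in Figure 2.1. The variants
# below are invented examples, not observed data.
CANONICAL_UNITS = {
    "f": "Fahrenheit", "deg f": "Fahrenheit", "degf": "Fahrenheit",
    "fahrenheit": "Fahrenheit",
    "c": "Celsius", "deg c": "Celsius", "celsius": "Celsius",
}

def normalize_unit(raw: str) -> str:
    """Map a raw UnitOfMeasure string to a canonical spelling if known."""
    key = raw.strip().lower()
    return CANONICAL_UNITS.get(key, raw)  # leave unknown units untouched

print(normalize_unit("Deg F"))       # -> Fahrenheit
print(normalize_unit("fahrenheit"))  # -> Fahrenheit
print(normalize_unit("kWh"))         # -> kWh (unknown, unchanged)
```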

Working with points and their associated data streams, a challenge is to know whether the reported data are correct or not. Discovery is the process of finding new points in the building instrumentation and extracting metadata for the points. Maintenance is the process of updating, adding or removing metadata for a point in the building infrastructure, to reflect the actual situation of the point.


Validation is the process where a human has verified that the data stream of the point seems to match its annotated metadata. Another way to use validation is to require a minimum level of metadata. An example of a minimum level of required metadata could be a requirement of having metadata about location for all points from a building instrumentation.

The Knowledge Discovery (KD) process for transforming data from points and data streams to generate metadata can take many forms. Data might be tags, which often are concatenated strings with information about the location, type and system of a point. Other data includes data streams, information from Building Management System (BMS) or Building Automation System (BAS), etc. Bhattacharya et al. [32] have based their methods on tags from BAS, and developed a learning scheme based on examples provided by an expert with a human-in-the-loop approach. A short example of a tag is "SODA3R419_RVAV", which holds information about the location and type of a point. The expert for Bhattacharya et al. [32] was a facility manager with insight into the building instrumentation for the experiment. The expert was introduced to a few tags, which were then labeled by the expert. A model based on the labels was applied to the rest of the tags. Based on a few tag examples, 70% of the building instrumentation was annotated with metadata such as location, point type and which system the point belongs to. Balaji et al. [33] have created the framework Zodiac, where the methods are based on a combination of tags and data streams. They have obtained results where metadata was attached for 98% of the points in a building.
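The kind of information packed into a tag such as "SODA3R419_RVAV" can be illustrated with a rule-based parser. The naming pattern assumed below (building code, floor, "R", room number, point code) is a guess for this one example, not a general BAS convention, and real tag sets are far less regular.

```python
# Illustrative sketch: extracting location and point-type hints from a BAS
# tag. The assumed pattern (building, floor, "R", room, point code) is a
# guess based on the single example "SODA3R419_RVAV".
import re

TAG_PATTERN = re.compile(
    r"^(?P<building>[A-Z]{3,4})(?P<floor>\d)R(?P<room>\d+)_(?P<point>\w+)$")

def parse_tag(tag: str) -> dict:
    """Return the tag's fields as a dict, or {} if the pattern fails."""
    match = TAG_PATTERN.match(tag)
    return match.groupdict() if match else {}

print(parse_tag("SODA3R419_RVAV"))
# -> {'building': 'SODA', 'floor': '3', 'room': '419', 'point': 'RVAV'}
```

A single typo in a tag makes the pattern fail silently, which is exactly the fragility that motivates human-in-the-loop validation.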

Another challenge for metadata regarding buildings and building instrumentations is the format of the metadata. Bhattacharya et al. [29] analyzed the most common metadata formats, and concluded that none of the formats would be expressive enough to support the building applications described in the literature. Brick [34] builds upon the results from Bhattacharya et al. [29] and is an attempt at a common metadata format to support PBAs. The format has been created as a collaboration between the leading universities within the research areas of Cyber-Physical Systems (CPSs), EI and building control. The format can hold relationships between points, systems and other elements in a building. Brick stores these relations as subject-predicate-object triples. The work has demonstrated how the format supports a set of relevant applications. Brick so far does not include tools for annotating metadata, including versioning of changes to the Brick information. However, Brick pushes the metadata community towards one common metadata format which can store all relationships and important metadata for buildings to enable PBAs.

To summarize the challenges and topics of interest within the area of metadata for points:

tools for metadata One of the major challenges for SDBs is metadata for points from building instrumentations. Tools for maintenance of metadata are missing, or do not support metadata which can be used in third party applications. BMSs with support for metadata do not use the metadata for an internal model representation.

humans in the loop To err is human; humans can and will introduce errors. Research using tags from BAS has shown that simple mistakes in tags have increased the complexity of retrieving metadata [30, 32]. On the other hand, facility managers are fast at analyzing whether a data stream from a point is correct or not. Facility managers are essential for getting metadata for buildings.

automated metadata generation Often a script is developed to apply simple rules that fulfill a minimum of metadata for a building, such that points can be understood by others working with the building. Creating frameworks for automated generation of metadata requires a common metadata format, like Brick [34]. Furthermore, there is a requirement for validation of the generated metadata, due to errors which will propagate from training data, e.g. tags. Such a framework should then run at small intervals, to reflect the actual state of a building.

2.1.2 Non-Intrusive Load Monitoring

For NILM, the data processing methods disaggregate one point or meter into a semantic representation of the equipment connected to the point. With disaggregated energy data, a simple PBA would be a dashboard displaying the energy consumption of a building. This section is based on work presented in Adabi et al. [35]. The section describes the status and challenges for NILM.

Smart meters measure and send information to energy providers. These measurements can be used for more purposes than billing, for example predicting the next peak in energy consumption, providing information about the energy breakdown, or suggesting that the consumer renew old equipment [36].

Smart meter measurements are often collected as aggregated power consumption at 15-minute intervals. The market for Automated Metering Infrastructure (AMI) has meters with high sampling resolution in the range of 1-2 kHz, but those meters are expensive1. The Automated Meter Reading (AMR) measurements are the foundation for Data Analytics (DA) with respect to energy data, which is also the reason to collect data at higher frequency, to show the potential of DA. The data can be used for providing knowledge to energy consumers for reducing consumption. DA can also be used for energy breakdown, where the energy breakdown can be used for diagnostics, or for providing knowledge of a system to identify maintenance requirements or malfunction.
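The aggregation of high-frequency readings into the 15-minute values typically reported by smart meters can be sketched as follows; the timestamps and readings are invented for illustration.

```python
# Hedged sketch: aggregating high-frequency power readings into 15-minute
# averages, as a smart meter might report them. Data are invented.
from collections import defaultdict

def resample_15min(readings):
    """readings: iterable of (unix_seconds, watts) -> {interval_start: mean}."""
    buckets = defaultdict(list)
    for ts, watts in readings:
        buckets[ts - ts % 900].append(watts)  # 900 s = 15 minutes
    return {start: sum(vals) / len(vals) for start, vals in buckets.items()}

readings = [(0, 100.0), (300, 200.0), (900, 400.0)]
print(resample_15min(readings))  # -> {0: 150.0, 900: 400.0}
```

The averaging also shows what is lost: the short transient events that NILM feature extraction depends on disappear at this resolution.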

Armel et al. [36] argue that equipment augmented with feedback can reduce the electricity consumption by more than 12% in the residential sector, and Darby [38] points to results in the range of 5-15%. In an industrial setting, augmented feedback tailored to the setting might provide similar savings. A collaboration with a cold store facility has been established via the Micro Grid Living Lab project. The specific cold store consumes the same amount of electricity as 2040 average four-person households do in one year [39]. Potentially it would be more effective to apply saving efforts in one cold store than to implement systems for reducing electricity consumption in 2040 residential households. Disaggregation of energy data might be implemented as PBAs. Another usage of disaggregated energy data would be PBAs where the building instrumentation is limited and the disaggregated energy data are used as a proxy. Disaggregated energy data can provide knowledge for decision-making in general, e.g. for replacing inefficient equipment or optimizing work procedures to reduce and optimize the electricity consumption.

Since Hart [40] introduced NILM for disaggregation of electricity consumption readings in the late eighties, almost all research efforts have been focusing on the residential sector [41, 42, 43, 44]. To apply NILM in an industrial setting requires new assumptions, as equipment, load levels and temporal patterns are different.

1 For United States (US) residential use: 265 € + installation [37]

Additionally, it cannot be assumed that a full disaggregation of all industrial loads can be performed with only measurements of the main entry meter. NILM in residential settings has at maximum disaggregated around 100 appliances [45]. Therefore sub-metering is needed in relation to the size of the industry, the number of appliances and the type of equipment.

Industrial settings like a cold store deviate from residential settings in the number of appliances and the type of equipment. For residential settings the number of appliances is limited to around 50 and at maximum 100, but in an industrial setting this number can be enormous. Multiple appliances will have transient events at the same time, which increases the difficulty of energy disaggregation. Furthermore, the temporal pattern may change from industry to industry.
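The event-based idea behind NILM, detecting step changes (transient events) in an aggregate power signal, can be sketched as below. The threshold and the signal are invented, and real industrial loads, especially those behind VFDs, are far harder to segment than this.

```python
# Minimal sketch of event-based load detection in the spirit of NILM:
# find step changes in an aggregate power signal. Data and threshold are
# invented; simultaneous events and VFD-driven loads defeat this approach.
def detect_events(power, threshold=50.0):
    """Return (index, delta) for steps whose magnitude exceeds threshold."""
    return [(i, power[i] - power[i - 1])
            for i in range(1, len(power))
            if abs(power[i] - power[i - 1]) > threshold]

aggregate = [100.0, 102.0, 380.0, 381.0, 101.0]  # one appliance on, then off
print(detect_events(aggregate))  # -> [(2, 278.0), (4, -280.0)]
```

Matching positive and negative deltas of similar magnitude is what lets an event-based disaggregator attribute the consumption between them to one appliance.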

To summarize the challenges and topics of interest within the area of NILM:

equipment type One of the major challenges for NILM in an industrial setting is the equipment which the algorithms have to predict. Identification of different types of equipment might be difficult, whereas in residential settings equipment like a fridge, stove and water heater can be found in every home. Equipment in an industrial setting might be managed by Variable Frequency Drives (VFDs), which can change the standard power signature of the equipment.

temporal patterns In most residential houses the occupants leave the house at 7am and return at 5pm; they follow almost the same routines each day, with some abnormal patterns for holidays and special days. For example, Kim et al. [42] have used such patterns to optimize the accuracy of the algorithms. For industrial settings the patterns depend on what the processes within the industry are and whether the company uses shift work. There is a challenge in retrieving information about the industrial setting where NILM is applied, due to special setups in different settings. It might be more difficult to find and understand the temporal patterns in industrial settings than in residential settings.

industrial secrecy There can be an extensive amount of inside information in the energy bill of a public company. It is a challenge to get public data from industrial settings for research on energy data in industries, as industrial secrets are stored in the energy information. As an example, knowing how much energy a data center uses for cooling lets you estimate the number of servers in the data center.

2.2 research questions

The challenge for this thesis is to enable and support SDB. We suggest an approach based on KD, DA and Software Development (SD), with collection of data streams from points, thereby transforming data from buildings into knowledge. Knowledge in the form of metadata can then be used for providing an abstraction layer for buildings, where it will be possible to install applications for energy analysis, new control strategies, etc.

The research questions for this project are:

How to transform energy and sensor data from buildings to knowledge that supports Software Defined Buildings?

With the following sub-questions related to metadata for SDB:

Which steps does it take to have automated or semi-automated metadata generation? How much metadata can we retrieve from simple data stream comparison? Which features are required to create a tool for maintenance of metadata?

and the following sub-questions related to NILM:

How to disaggregate one sensor into a semantic representation of the equipment connected to the sensor in an industrial setting? How can data processing methods help make energy-expensive processes more feasible?

2.3 state of the art

As highlighted above, the focus for this project is to obtain knowledge via KD and DA from energy data and data streams from points. This section is split into a part focusing on SDB and a part focusing on NILM.

For SDBs, multiple research frameworks and systems exist which attempt to enable SDB. The majority of selected frameworks are developed upon and around sMAP. This section covers state of the art for SDB and metadata for buildings with respect to automated generation and software tools for maintenance of metadata.

The section describing state of the art for NILM is split into three parts inspired by Zoha et al. [46]. The three parts cover data acquisition, features for learning, and learning algorithms.

2.3.1 Software Defined Buildings

This section provides state of the art for SDBs with respect to frameworks andmetadata for buildings.

sMAP is first and foremost a protocol which provides an interface for points in CPSs, e. g. from a building. The protocol is implemented via a REpresentational State Transfer (REST) Application Programming Interface (API), which enables all kinds of devices to connect and integrate in a larger setup. Dawson-Haggerty et al. [7] have shown how sMAP can be scaled up and down, e. g. to embedded devices such as an AC plug meter. sMAP facilitates two types of databases, one for data streams from points and one for associated metadata. The two databases are wrapped in an archiver interface. Furthermore, sMAP comes with a concept


of drivers, where a driver implements the sMAP protocol. When using the driver concept, a local buffer is created to minimize the performance load on the archiver. sMAP creates a uniform interface for points, which minimizes the burden of integrating building instrumentation.
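The driver concept can be illustrated with a minimal sketch of a driver that buffers readings locally and emits one JSON document per flush. The payload shape below (path, uuid, Properties, Readings) is a simplified illustration of the idea, not the exact sMAP schema; the class and endpoint names are assumptions for the example.

```python
import json
import time
import uuid


class PlugMeterDriver:
    """Illustrative sMAP-style driver: buffers readings locally and
    serializes them as one JSON payload per flush, reducing archiver load."""

    def __init__(self, path, unit):
        self.path = path                  # resource path exposed by the driver
        self.uuid = str(uuid.uuid4())     # stable identifier for the point
        self.unit = unit
        self.buffer = []                  # local buffer of [timestamp, value]

    def add_reading(self, value, timestamp=None):
        ts = timestamp if timestamp is not None else int(time.time() * 1000)
        self.buffer.append([ts, value])

    def flush(self):
        """Return a JSON document for the archiver and clear the buffer."""
        payload = {self.path: {
            "uuid": self.uuid,
            "Properties": {"UnitofMeasure": self.unit},
            "Readings": self.buffer,
        }}
        self.buffer = []
        return json.dumps(payload)


driver = PlugMeterDriver("/plugmeter/power", "W")
driver.add_reading(42.5, timestamp=1500000000000)
doc = json.loads(driver.flush())
```

The local buffer is the key design point: the archiver receives batched documents instead of one request per sample.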

To enable PBAs, as described in Section 1.1, applications should be able to control and interact with points with minimal knowledge of the point. An abstraction layer should minimize the references between the actual building instrumentation and the application. This abstraction layer is similar to Open Database Connectivity (ODBC), which is independent of database system and programming language. An abstraction layer for PBAs would be independent of building instrumentation. BAPPS provides an abstraction layer by creating categories for the sMAP drivers of the building instrumentation. BAPPS has created a simple and efficient hierarchy for equipment in buildings, e. g. for light, where the minimal set of operations may be turning ON and OFF, but there could be a subset of lights which could also be dimmed. Besides the abstraction layer to the building instrumentation, BAPPS provides a query language, which can be used as a protocol for PBAs.
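The light hierarchy described above can be sketched as a small class hierarchy: every light supports the minimal ON/OFF operations, and a subset additionally supports dimming. The class and method names are illustrative, not the BAPPS API.

```python
class Light:
    """Minimal category in a BAPPS-like equipment hierarchy:
    every light supports at least turning ON and OFF."""

    def __init__(self, name):
        self.name = name
        self.is_on = False

    def turn_on(self):
        self.is_on = True

    def turn_off(self):
        self.is_on = False


class DimmableLight(Light):
    """Subset of lights that additionally supports dimming."""

    def __init__(self, name):
        super().__init__(name)
        self.level = 0.0  # 0.0 = off, 1.0 = full brightness

    def dim(self, level):
        # Clamp to the valid range; dimming to 0 turns the light off.
        self.level = max(0.0, min(1.0, level))
        self.is_on = self.level > 0.0


# An application controls any light through the minimal interface,
# without knowing whether the concrete fixture is dimmable.
lights = [Light("hallway"), DimmableLight("office")]
for light in lights:
    light.turn_on()
```

An application depending only on the `Light` interface stays portable across buildings with different fixtures, which is the point of the abstraction layer.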

Building Operating System Services [4] (BOSS) has identified key features for a BOS. In BOSS there is a Hardware Presentation Layer (HPL), which is implemented using sMAP; this gives an abstraction of the building instrumentation. The HPL in BOSS provides a uniform way to interface the building instrumentation, here via the sMAP protocol. BOSS has a transaction manager as known from database systems, where conflicting operations will be ignored and the system creates a rollback. The transaction manager reasons about whether a service and an application try to issue two contrary operations; an example could be equipment starting to oscillate due to an operation. The transaction manager can roll back more than one operation, and can be used to roll back to a certain timestamp. BOSS provides an authentication service to protect and limit access to the building instrumentation. Authentication also provides access grants for applications using BOSS. If a user or application has been granted access to a certain area for a certain time, then the service uses the transaction manager service to roll back to the settings as they appeared before access was granted.

2.3.1.1 Existing Systems for BOS

This section is based on work presented in Holmegaard et al. [31, 47]. The section describes how leading research systems handle building instrumentation, points and BOS with respect to the way they handle metadata. All systems have been analyzed with respect to how they handle discovery, maintenance and validation of metadata.

Sensor Andrew [48] is a software system for enabling systems and devices in a building environment to communicate over the Internet. Sensor Andrew uses the eXtensible Messaging and Presence Protocol (XMPP) at the devices to enable publishing of data streams when new data are available, which keeps the communication with points at a minimum. Sensor Andrew uses an eXtensible Markup Language (XML) format to share metadata between devices and systems using Sensor Andrew. The metadata in Sensor Andrew follows an XML schema, which ensures validation and a minimum of required metadata. The discovery process in Sensor Andrew is done via the XMPP registration and a Sensor Over XMPP protocol. The point uses the protocol to populate metadata to a central repository. Sensor Andrew does not have direct support for maintenance of metadata, but points can be replaced.

HomeOS [49] is an Operating System (OS) for devices in homes. HomeOS has roles defining which devices can be used through each role. The devices and applications in HomeOS use a manifest to share information about compatibility and relationships between devices. Discovery of metadata is done at the time when a device is registered on the home network. Maintenance of metadata in HomeOS is not supported in the prototype presented by Dixon et al. [49]. It has not been possible to find information on whether the manifest follows a schema or not.

The Building Operating System Services [4] (BOSS) introduce six services to support building applications. One of the services is the HPL, which handles the transformation from a query to actual points, based on metadata provided by the points. The metadata in BOSS uses JavaScript Object Notation (JSON) key-value pairs, which are also used for describing the relationships between points, systems and subsystems. The relationships are parsed into a graph representing the building. There is no information about a schema or minimum required metadata for BOSS. The HPL uses sMAP to expose the points and thereby provides manual functionality for discovery, validation and maintenance.

BuildingDepot 2.0 [50] provides an infrastructure for building applications. BuildingDepot consists of three central services: DataService, CentralService and AppService. The DataService provides the service for communicating with the physical environment. The CentralService handles permissions for roles and users. The AppService is the environment where building applications are executed. BuildingDepot has created sensor and building templates to ensure a common level of metadata and a communication protocol within the system and to applications running in the system. The template also covers relationships between location and points, as well as unit and type for the point. BuildingDepot uses JSON via a REST API for communication. Discovery of metadata is done via the data connector in the DataService. Metadata is validated against the templates. BuildingDepot does not support or have a software tool for maintenance of metadata.

System | Metadata Format | Schema/Metadata profile | Extensible | Discovery | Maintenance | Validation
Sensor Andrew [48] | XML | ✓ - Single | ✓ - Schema | Auto - XMPP | ✗ | Auto
HomeOS [49] | Manifest | ✗ | ✓ - Manifest | Auto - Device Registration | ✗ | Manual - Manifest
BOSS [4] | JSON | ✗ | ✓ - Key-value pairs | Manual - Driver | ✓ - GUI | Manual - Driver
BuildingDepot 2.0 [50] | JSON | ✓ - Single | ✓ - Key-value pairs | Auto - Data Connector | ✗ | Auto
HomeKit [51] | Objective-C | ✗ | ✓ - Custom Object | Auto - Device Registration | ✗ | Semi - Class Library
Weave [52] | JSON | ✓ - Single | ✓ - Custom Object | Auto - Device Registration | ✗ | Auto

Table 2.1: Solutions and systems to support metadata from buildings. ✓ is used for “supports” and ✗ is used for “does not support”.

For residential houses, Apple and Google are creating platforms to handle IoT and BAS. Apple has created HomeKit [51], a framework for iOS that App developers can use to connect to devices like Philips Hue [53]. HomeKit has a library of common household appliances, such as light, door lock, thermostat etc. HomeKit discovers metadata when new devices are recognized on the


local network. Maintenance is not possible via HomeKit. The validation process in HomeKit is done via the class library implemented in iOS.

Google has created Brillo as an OS for IoT, where the communication between devices uses Weave [52]. Weave is a JSON format for communicating between devices; the format has a schema, which can be extended. Weave has a discovery process, which is run when new devices are recognized on the local network. Weave uses a schema for validation of metadata and ensures a minimum of required metadata. From the documentation of Weave it could not be determined whether Weave supports maintenance of metadata.

A metadata profile is a schema which supports and enables the process of validation for a point. The metadata profile defines, for a given part, how the key-value set defining metadata for that specific part must be structured. A part could be e. g. location, where a key-value set would define a relationship between building, floor and room.
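A minimal sketch of such profile-based validation is given below: a profile for the location part maps each required key to an expected type, and a key-value set is checked against it. The profile keys and helper name are illustrative assumptions, not a standard from any of the surveyed systems.

```python
# Illustrative metadata profile for the "location" part: each key maps to
# the type a valid value must have. The keys are examples, not a standard.
LOCATION_PROFILE = {"building": str, "floor": int, "room": str}


def validate_part(profile, metadata):
    """Check a key-value set against a profile: every required key must be
    present and have the expected type. Returns a list of violations."""
    errors = []
    for key, expected_type in profile.items():
        if key not in metadata:
            errors.append("missing key: %s" % key)
        elif not isinstance(metadata[key], expected_type):
            errors.append("wrong type for %s" % key)
    return errors


ok = validate_part(LOCATION_PROFILE, {"building": "OU44", "floor": 2, "room": "e26-210-2"})
bad = validate_part(LOCATION_PROFILE, {"building": "OU44", "floor": "second"})
```

Supporting multiple profiles then amounts to keeping several such dictionaries, one per part or per PBA protocol.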

The existing systems are summarized in Table 2.1. In Table 2.1 the column Schema/Metadata profile indicates whether a system supports a schema or metadata profile to validate metadata, and whether the solution supports one or multiple schemas. Extensible covers if and how a system can be extended for handling metadata. Discovery can be manual or automatic; the field provides a specification of where or how the system discovers metadata. Validation can be automatic via a schema, manual inspection, or semi-automatic via a structure. From Table 2.1, none of the analyzed solutions support automatic discovery, software tools for maintenance and automatic validation of metadata at the same time. Systems like Sensor Andrew, BuildingDepot and Weave have automatic discovery and an implemented validation process, but they do not support maintenance of metadata. The systems which implement a solution using schemas support only a single schema. For supporting PBAs, it could be essential to have support for multiple schemas. Support for multiple schemas can be used for having different metadata protocols, which can be used by different PBAs. Furthermore, multiple schemas would give PBAs the potential to depend on certain parts of the schemas. sMAP and BOSS support maintenance, but do not have schemas for the validation process of metadata.

2.3.1.2 Building Metadata

Bhattacharya et al. [32] have shown a way to minimize the time-consuming task of annotating points with metadata. They have combined active learning and clustering techniques with a human-in-the-loop approach. The data used are tags describing each point in a BMS. The tags were originally created by a human when the building was constructed. These tags have been presented to a domain expert, who has the knowledge to split the tags into meaningful metadata describing the point. A real example of a tag would be "SODA3R419_RVAV", where the first three letters give the building site, which is Soda Hall at UC Berkeley. "A3" indicates that the sensor is part of air handling unit 3. "R419" gives the room location, which is 419. "RVAV" gives the point type, which is a reheat discharge air pressure sensor for variable air volume. Based on a small set of tags, Bhattacharya et al. [32] were able to annotate up to


70% of all points from the building. One problem with this approach is existing errors in tags, which will lead to errors in the learnt model. Known errors in the set of tags include physical changes to the building, e. g. a wall splitting a room into two rooms, which then have to share a set point for room temperature. Another known error was newly added equipment, which was not updated in the BMS from which the tags originated.
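Splitting a tag like "SODA3R419_RVAV" into metadata fields can be sketched with a single regular expression. The pattern below assumes the layout of that one example (3-letter site, "A" plus unit number, "R" plus room number, underscore, point-type code); real BMS tags vary per vendor and building, which is exactly why the learning approaches above are needed.

```python
import re

# Assumed layout from the example tag "SODA3R419_RVAV":
# 3-letter site, air handling unit ("A" + digits), room ("R" + digits),
# then an underscore-separated point-type code. Illustrative only.
TAG_PATTERN = re.compile(r"^(?P<site>[A-Z]{3})(?P<ahu>A\d+)(?P<room>R\d+)_(?P<type>\w+)$")


def parse_tag(tag):
    """Split a BMS tag into metadata fields, or return None if it
    does not match the assumed layout."""
    match = TAG_PATTERN.match(tag)
    return match.groupdict() if match else None


meta = parse_tag("SODA3R419_RVAV")
```

A tag with swapped letters or a different convention simply fails to match, illustrating how fragile purely rule-based tag parsing is.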

Balaji et al. [33] have created the framework Zodiac, which successfully classifies points with an average accuracy of 98%. They have used BMS tags in combination with data streams from points to extract metadata. They identify the problem of having mistakes in tags; an example could be a swap of two letters in a tag, giving the tag a new meaning. Furthermore, they identify the problem of having multiple tags meaning the same or nearly the same. They have used hierarchical clustering in combination with an approach similar to Bhattacharya et al. [32] to extract metadata from the tags. To obtain an average accuracy of 98% they combined the approaches using tags and clustering of data streams. The results of the two approaches individually were 94% and 63%, respectively.

Balaji et al. [34] have designed Brick, which can hold relationships between points, systems and other elements in a building. Brick stores these relations as subject × predicate × object triples. An example of a triple could be: "room:e26-210-2" isPartOf "floor:2" and "floor:2" isPartOf "SDU OU44". The work has demonstrated how the format supports a set of relevant applications. Brick uses RDF [54] files for storing the relationship graph; SparQL [55] can then be used for querying the graph. Brick does not have validation of the created model, which means it could contain incorrect relations. Brick cannot store historical information, like a replaced point; here a new RDF file would have to be stored for each version of the metadata model. Brick pushes the metadata community toward having one common metadata format, which can store all relationships and important metadata for buildings to enable PBA.
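The triple model can be sketched with plain tuples and a wildcard pattern match that mimics a basic SparQL triple pattern. Brick itself uses RDF files and SparQL; the plain-Python store below is only an illustration of the data model, using the triples from the example above.

```python
# Brick-style relationships as plain (subject, predicate, object) tuples.
triples = [
    ("room:e26-210-2", "isPartOf", "floor:2"),
    ("floor:2", "isPartOf", "SDU OU44"),
    ("room:e26-212-0", "isPartOf", "floor:2"),
]


def query(triples, subject=None, predicate=None, obj=None):
    """Return all triples matching the given pattern; None acts as a
    wildcard, mimicking a basic SparQL triple pattern."""
    return [t for t in triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)]


# "Which entities are part of floor 2?"
rooms_on_floor2 = query(triples, predicate="isPartOf", obj="floor:2")
```

Following isPartOf edges transitively yields the building hierarchy a PBA would navigate.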

2.3.2 Non-Intrusive Load Monitoring

This section covers the state of the art for NILM. The structure of this section follows the three steps of NILM presented by Zoha et al. [46]: Data Acquisition, Feature Extraction and Learning Algorithms. Essentially, the three steps are larger chunks of the 11 steps in Figure 1.2. First comes data acquisition, which collects information about the environment, e. g. power consumption; data acquisition relates to selection and preprocessing. Then follows extraction of features to minimize the data size and complexity, e. g. transient events based on Electro Magnetic Impulse (EMI) if the measurements use High Frequency (HF) samples; this step is similar to transformation. Finally, the learning algorithm computes on the features, either learning the model or using the model; this last step relates to data mining and interpretation/evaluation.

The research on NILM for residential households has been active since the installation of smart meters, particularly in the United Kingdom (UK). One outcome of this activity has been the development of the Non-Intrusive Load Monitoring Toolkit (NILMTK) by Batra et al. [56]. NILMTK provides algorithms for processing and analysis of all publicly available Low Frequency (LF) datasets, e. g. the


Reference Energy Disaggregation Data Set [6] (REDD). NILMTK addresses the problem of scarcity of an established code base for developers. Furthermore, NILMTK enables comparison of algorithms on heterogeneous datasets with different data types, data rates and metadata. The NILMTK platform promises to accelerate and streamline algorithm development for LF data.

2.3.2.1 Data Acquisition

Although studies conducted in recent years have contributed substantially, NILM still faces substantial challenges and limitations in its application, especially regarding training time and recognition accuracy. The widespread use of NILM is especially hindered by the limitations of the smart meters now widely deployed for AMI and the sampling rate used [36]. For research in NILM, there is very little data available at sampling rates that can capture even the low harmonics of the 60Hz signals. There are also scalability challenges due to the inaccessibility of high resolution data through smart meters. For HF data, the lack of reliable datasets is a limiting factor. Various data rates have been applied for identifying the devices in both industrial and residential settings. From its inception, NILM has been seen as a tool especially valuable for residential energy monitoring with data gathered at the revenue meter. With the wide deployment of smart meters, which provide both data acquisition and networking for AMR (and the overall system AMI), there has been growing interest in AMR data with respect to NILM. Unfortunately, the data sampling frequency required for billing purposes led to data being provided to the utility – at best – every five minutes and sometimes with as much as an hour between samples. Utilities do not make data available until a day or two after the sampling. This provides challenges for using the data in applications, e. g. for disaggregation of the consumption of each equipment. When the data are one or two days old, it will be difficult to remember exactly which equipment was used. In California the investor-owned utilities have added the capability in the meter to use a Home Area Network (HAN) in real-time, via Zigbee (802.15.4) [57]. This path provides real-time data with a sampling interval at best of 10 seconds. Existing smart meters can provide 1 second data with a firmware upgrade, according to Armel et al. [36].
One remaining challenge is to increase the sampling rate, where smart meters are limited by processor memory and buffer size. So even with the HAN link, a smart meter cannot provide 2kHz, due to the small processor memory. Therefore, companies such as Pecan Street Inc [58] have chosen to instrument more than 1200 houses with eGauge [59] meters. The instrumentation provides simultaneous 1 second data for 12 circuits as well as two or three voltage phases, collected in cloud storage.

The majority of studies on residential NILM can be divided into two main groups: studies which investigated LF sampling data [43, 60] (frequency ≤ 1Hz), and studies which examined HF sampling data (frequency > 1Hz) [61, 45, 62].

Transient voltage features were initially studied by Patel et al. [61] and continued further with EMI analysis by Gupta et al. [45]. These EMI-based methods showed higher accuracies with shorter training periods. Voltage transient features at data rates above 40kHz differ from home to home, because these features are tied to the wiring of the specific home. This suggests that transient data at 40kHz and above might not introduce verifiable and salient signatures across homes. However, HF methods have shown to be more effective in detecting appliances more precisely. A large number of appliances can be recognized in the 10-40kHz range. Even though the reported training time varies from study to study [63], a pattern can be drawn that higher sampling rates result in more accurate models, which can decrease the training time of the algorithms. Table 2.2 shows a few high frequency studies including the variables they measured, the number of appliances they targeted, the sampling rate, the training duration, and the percentage accuracy they achieved.

Name | Description | Variable | Appliances | Sample rate | Training | Accuracy
Berges et al. [64] | Signatures | Power, Voltage | 17 | 10kHz | 5 days non real-time | 86%
Kolter and Jaakkola [65] | Factorial HMM | Current | 9 | 15kHz | 2 weeks | 83%
Figueiredo et al. [66] | Integer programming | Current, Voltage | 42 | 40kHz | Not reported | 80%
Patel et al. [61] | ON/OFF transient noise | Current, Voltage | 40 | 100kHz | 150-350 events | 85-95%
Gupta et al. [45] | Harmonic analysis | Voltage | 94 | 1MHz | 6 months | 94%

Table 2.2: Overview of NILM research with HF methods.

2.3.2.2 Feature Extraction

Hart [60], the pioneer in the field of NILM, created a PQ feature2 for finding which equipment was turned ON or OFF. This step can be seen as transformation with respect to KD, to reduce the solution space. The few selected pieces of equipment in the setup of Hart [60] had different signatures in the PQ plane. One problem with using PQ is that multiple pieces of equipment have the same signature, meaning that the feature cannot distinguish all equipment. Parson et al. [43] have sub-metered everyday equipment and generated a learnt model; here a feature based on the meter data and e. g. the time between cycles of the refrigerator was used. Parson et al. [43] learnt a model for a refrigerator and then created a fit for the actual house, which optimized the results. Reactive power measurements store the most effective information when looking at features for LF current and voltage. Combining the duration of ON/OFF, date/time and the daily schedule of the occupants can increase the accuracy [42]. Kim et al. [42] show how the dependency between equipment like TV, video and gaming console can be used to ease the task of labeling and disaggregation at equipment level.
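The ON/OFF events underlying Hart's approach can be sketched as step-change detection on an aggregate power signal: a step larger than a threshold marks an ON event, a step below the negative threshold an OFF event. The threshold and the signal below are illustrative, not from a case site, and a real detector would also have to handle noise and slow ramps.

```python
def detect_events(power, threshold=50.0):
    """Detect ON/OFF events as step changes in an aggregate power signal
    (in watts). A positive step above the threshold is an ON event, a
    negative step below -threshold an OFF event. Threshold is illustrative."""
    events = []
    for t in range(1, len(power)):
        delta = power[t] - power[t - 1]
        if delta > threshold:
            events.append((t, "ON", delta))
        elif delta < -threshold:
            events.append((t, "OFF", delta))
    return events


# Illustrative aggregate signal: a 120 W appliance turns on at t=2
# and off at t=5, on top of a 40 W base load.
signal = [40, 40, 160, 160, 160, 40, 40]
events = detect_events(signal)
```

Matching the detected (ON, OFF) step pairs against known PQ signatures is what lets the method attribute events to individual equipment.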

HF current and voltage features are used for equipment monitoring and are specifically good for event detection. Laughman et al. [62] used the ΔP-ΔQ plane with HF data, which could disaggregate more equipment than Hart [60]. HF methods generally apply signal-processing techniques which require extra hardware at the circuit breaker. HF methods can identify steady state or switching transient features. Transient voltage features were initially studied by Patel et al. [61] and continued further with EMI analysis by Gupta et al. [45]. The methods showed higher accuracies with shorter training periods, see Table 2.2. HF methods have shown to be more effective in detecting equipment more precisely. A large number of equipment can be recognized in the 10-40kHz range. Even though the reported training time varies from study to study [63], higher sampling rates result in more accurate models.

2 P for active power, Q for reactive power


An increase in the number of features will often make the computation heavy. More features will also give higher accuracy, due to an easier task of selecting the proper equipment. The use of other information than the active or reactive power measurements is shown in [42, 46]. It has been shown that additional information gives higher accuracy and more precise predictions of which equipment has turned ON or OFF.

2.3.2.3 Learning Algorithms

Hart [60] proposed to disaggregate a power signal into the individual equipment represented in the signal. This can be formulated as the aggregated signal in Equation 2.1, where N is the number of equipment types and x_{n,t} is the power consumption of equipment n at time t.

    x_t = \sum_{n=1}^{N} x_{n,t}    (2.1)

The problem is to find x_{n,t}, with x_t as the only known variable. Equipment has states, e. g. ON/OFF or ON/OFF/PEAK. The equipment states add more complexity to the problem. To reduce the complexity, a one-at-a-time assumption is introduced: equipment can only be in one of the K states at time t. The one-at-a-time assumption is expressed mathematically as \sum_{k=1}^{K} z^{n}_{t,k} = 1. The power consumption \hat{\theta}^{n}_{t,k} for equipment n, at time t in state k, is given by \hat{\theta}^{n}_{t,k} = \sum_{k=1}^{K} z^{n}_{t,k} \mu^{n}_{k}, where \mu^{n}_{k} is the power draw of equipment n in state k. The N types of equipment, which can be in K states, give Equation 2.2.

    x_t = \sum_{n=1}^{N} \sum_{k=1}^{K} z^{n}_{t,k} \mu^{n}_{k}    (2.2)

There are three approaches for solving the problem: supervised, semi-supervised and unsupervised. For the unsupervised approach, Kim et al. [42] have created an adaptive model of the equipment types. For the semi-supervised approach, sub-meters provide data to train a model of \theta^{n}_{t,k}; Parson et al. [43] have done this using a Hidden Markov Model (HMM) and prior knowledge of equipment. A supervised approach is to minimize the error e_t in Equation 2.3, by using a model based on the sub-metered data and then using a Combinatorial Optimization (CO) algorithm to find a combination of turned-ON equipment at time t, given a minimal error e_t.

    x_t = \hat{x}_t + e_t    (2.3)

Using CO to find the power consumption \hat{\theta}^{n}_{t,k} of each equipment can be formulated as an optimization problem. The optimization problem is stated in Equation 2.4: an optimization for a minimal e_t, whose output provides a state vector z_t for time t. The CO problem is Nondeterministic Polynomial Time (NP)-complete, which gives an exponential solution space; for this problem it is K^N, where N is the number of equipment and K is the number of states.

    z_t = \operatorname{argmin}_{z_t} \, |x_t - \hat{\theta}^{n}_{t,k}|    (2.4)

Due to the NP-completeness of CO, the algorithm can in particular have problems with equipment of the VFD type, where equipment will have multiple states. Here another solution can be the Factorial Hidden Markov Model (FHMM), where the model combines time and power draw in independent Markov chains.

Given several independent Markov chains, the observation is a joint function of all hidden states, as shown in Figure 2.2. Due to the problem of non-tractable inference, Gibbs sampling [67] is applied. This is formulated in Equation 2.5. The output of the FHMM is an additive function of the different hidden states in the Markov chains, as in Ghahramani and Jordan [68].

[Figure 2.2 omitted: directed acyclic graph of the factorial HMM.]

Figure 2.2: A directed acyclic graph representing the conditional independence relations in a factorial HMM with 1 to K underlying Markov chains.

    x^{(i)}_1 \sim \prod_i \pi^{(i)}
    x^{(i)}_t \mid x^{(i)}_{t-1} \sim \prod \phi^{(i)}_{x^{(i)}_{t-1}}
    \theta_t \mid x^{(1:N)}_{t-1} \sim \mathcal{N}\big( \textstyle\sum_{i=1}^{N} \mu^{(i)}_{x^{(i)}_t}, \Sigma \big)    (2.5)

In Equation 2.5, the initial state distribution for the ith Markov chain is given by \pi^{(i)}, and N is the number of HMMs. x^{(i)}_t denotes the state of the ith Markov chain at time t. The transition matrix is described by \phi. The aggregate output is given by \theta, and \mu^{(i)} is the mean of the ith HMM. \Sigma is the covariance matrix describing the covariance between each state distribution.

The learning algorithms for NILM presented here are CO and FHMM. CO provides a fast and simple approach if the number of equipment is limited. Besides a low number of equipment, the number of equipment states should also be kept low for CO. In CO, each time t is considered a separate optimization problem; each time t is assumed to be independent. FHMM provides a more complex model for disaggregation, where time and power draw are modeled in independent Markov chains.
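The CO search of Equation 2.4 can be sketched as a brute-force enumeration of all K^N state combinations for a single time step, picking the combination whose modeled draw is closest to the aggregate reading. The equipment list and power draws below are illustrative, not from a case site.

```python
from itertools import product


def co_disaggregate(x_t, mu):
    """Brute-force combinatorial optimization for one time step.

    mu[n][k] is the power draw of equipment n in state k (state 0 = OFF).
    Returns the state vector z_t minimizing |x_t - modeled draw| and the
    residual error. Enumerates all K**N combinations, illustrating why
    CO scales poorly with many equipment or states."""
    best_states, best_err = None, float("inf")
    for states in product(*(range(len(mu_n)) for mu_n in mu)):
        estimate = sum(mu[n][k] for n, k in enumerate(states))
        err = abs(x_t - estimate)
        if err < best_err:
            best_states, best_err = states, err
    return best_states, best_err


# Illustrative equipment: a fridge (OFF/ON/PEAK) and a kettle (OFF/ON),
# with an aggregate reading of 2120 W at this time step.
mu = [[0.0, 120.0, 400.0], [0.0, 2000.0]]
states, err = co_disaggregate(2120.0, mu)
```

Because every time step is solved independently, ambiguous readings (several combinations with similar totals) can flip the inferred states from step to step, which is one motivation for the temporal model in FHMM.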


3 RESEARCH APPROACH

This chapter presents the chosen research approach for this project. Firstly, the research context is explained; each of the following chapters will be related to this context. Then follows an overview of the applied approach, followed by the case sites used throughout the thesis. Lastly, the evaluation criteria are listed.

3.1 research context

Figure 3.1 illustrates the concepts for a Software Defined Building (SDB) enabled by a Building Operating System (BOS). The illustration shares multiple features with Building Operating System Services [4] (BOSS) and Building APPlication Stack [3] (BAPPS). Basically the concept consists of three levels. The physical level is monitored and controlled via points. Then follows the Operating System (OS), which has different services including a BOS with Authentication, Query Engine etc. The integration between the physical level and the OS is handled by an implementation of the Simple Measurement and Actuation Profile [7] (sMAP). The last level is the environment for Portable Building Applications (PBAs); here it is assumed that the integration follows a protocol via REpresentational State Transfer (REST).

Figure 3.1 uses the following colors for topics: topics with respect to data processing methods are marked with a blue color, topics with respect to software tools are marked with a green color, and topics of software tools which also include data processing methods are marked with a mix of blue and green.

Based on the two projects (see Section 1.3), Micro Grid Living Lab has been part of the building application layer. This relates to the usage of data, which are transformed via Knowledge Discovery (KD) and Data Analytics (DA). The results of KD and DA can then be used for reducing the energy consumption of the building. For COORDICY the tasks have been related to the software tool Metafier. Metafier supports the Query Engine by annotating points with metadata.

In the PBA layer would be an application like Comfy [69]. Comfy is a commercial product by Building Robotics, Inc. The company behind Comfy was started by Andrew Krioukov and Stephen Dawson-Haggerty; Comfy builds upon [7, 3, 4, 70]. Another PBA could be Intelligent Building Control, e. g. using Controleum [71] to optimize comfort and energy consumption. An Intelligent Building Control could be implemented in the same way as Umair et al. [71] have done for greenhouses; the objectives of the control would then be to optimize comfort and minimize energy consumption. A third PBA could be for Non-Intrusive Load Monitoring (NILM), using some of the techniques from Part II. An application for NILM could then be used for finding equipment which should be replaced due to malfunctions; a way to find malfunctioning equipment would be to validate the trained model against the actual consumption pattern. Another application for NILM would be a feedback or diagnostic application, to display the current energy consumption of a building.


[Figure 3.1 omitted: a three-layer diagram. The physical world/building (networked relay, networked light, networked thermostat, weather) is exposed via sMAP. The operating system layer holds the sMAP metadata store, sMAP timeseries store, Metafier store and a BOS with metadata model, query engine, authentication, building API, metadata generator, maintenance of metadata and validation of the sensor infrastructure. The building application layer holds Comfy, Intelligent Building Control and NILM.]

Figure 3.1: SDBs as a concept. The colored areas relate to topics from this Ph.D. study. The blue markings relate to data processing methods and the green markings relate to software tools.

Figure 3.1 will be used to describe the research context in the following chapters, relating the sub-projects to the overall project and research question, see Section 2.2.

3.2 research approach

In this thesis a constructive research approach is chosen, meaning that ideas, theories and prototypes are created to address the research question from Section 2.2. The contributions of this thesis are derived from the proposed ideas, theories and prototypes. The process from Figure 1.2 together with the constructive research approach has been used when working with prototypes. The approach consists of several steps, which have been applied to this thesis.

• Selection of relevant research problems

• Literature study to obtain a knowledge base, identify challenges and identifymethods and tools to build upon

• Development of models and theories which address the challenges of theresearch area

• Implementation of prototypes which takes the proposed models and theoriesinto account

• Evaluation of prototypes with real life scenarios and data from real buildings


• Experimental analysis of the applicability of the models and theories

• Discussing and concluding on the approach, limitations and strength of theproposed models

The approach has been iterative. Related to the projects described in Section 1.3, this thesis builds on work within the two projects Micro Grid Living Lab and COORDICY. Furthermore, the methodology has been repeated during the course of this Ph.D. study. The 11 steps for transforming data to knowledge from Section 1.2 have been used iteratively and modified to fit each sub-project and the corresponding prototypes.

3.3 case sites

This section summarizes the case sites used throughout this project. The case sites have been applied to the different sub-projects, to have realistic tests and to show applicability in real scenarios. The sites vary in size, purpose and how they are instrumented. Table 3.1 provides the overall information for the different case sites.

Site                                       Construction Year  Purpose             Location         Area        Floors  Points  Data Source
Green Tech House (GTH)                     2014               Offices             Vejle, Denmark   3.000 m2    3       1.893   BMS
University of Southern Denmark (SDU) OU44  2015               Offices / Teaching  Odense, Denmark  8.300 m2    4       7.865   BAS, BMS
Cold Store                                 1998               Cold Store          Vejle, Denmark   300.000 m3  1       50      GridPoints

Table 3.1: Summary of sites from Section 3.3.

The summary in Table 3.1 is based on Section 3.3.1 to Section 3.3.3.

3.3.1 Green Tech House (GTH)

The GTH is a 3.000 m2 building split across 3 floors, constructed in 2014, see Figure 3.2. It is located at the Green Tech Center (GTC), Vejle, and contains offices, a canteen and a few meeting rooms; the total number of rooms is 50. The building is home to startups and established companies working within renewable energy and energy counseling. Each company is a tenant, paying for electricity and common expenses. The building is equipped with a Building Management System (BMS) controlling Heating, Ventilation and Air Conditioning (HVAC); at room level there is a Building Automation System (BAS) controlling light. The BMS is a Honeywell Niagara, where a REST Application Programming Interface (API) is provided. The BAS at room level communicates via Modbus and BACnet.

At room level the REST API provides current room temperature, boolean values for Passive InfraRed (PIR), current CO2 level, current lux level, a percentage of openness for radiator valves and a percentage of shading height. At system level the API provides electricity consumption for ventilation, the elevator and each tenancy. The REST API provides a data structure with ids based on human readable strings. An example of an id for a point would be: GreenTech/House/Floor1/Offices/1A11Office/CO2 level. The first two parts of the id give the building, then the floor, followed by the room type. The last two parts hold the room name and then the type of point.
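The id layout just described can be split mechanically. Below is a minimal sketch; the function name and field names are our own choices, not part of the GTH API:

```python
# Sketch: splitting a GTH point id of the form
# building(2 parts)/floor/room type/room/point type into named fields.
# The id below is the example from the text; field names are hypothetical.

def parse_point_id(point_id: str) -> dict:
    parts = point_id.split("/")
    if len(parts) != 6:
        raise ValueError(f"unexpected id layout: {point_id}")
    return {
        "building": "/".join(parts[:2]),  # first two parts give the building
        "floor": parts[2],
        "room_type": parts[3],
        "room": parts[4],
        "point_type": parts[5],
    }

info = parse_point_id("GreenTech/House/Floor1/Offices/1A11Office/CO2 level")
print(info["floor"], info["point_type"])  # Floor1 CO2 level
```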


Figure 3.2: GTH located in Vejle, Denmark.

3.3.2 SDU OU44

SDU OU44 is an 8.300 m2 building split across 4 floors, constructed in 2015. It is located at the University of Southern Denmark (SDU), Campus Odense, see Figure 3.3. The building contains auditoriums, teaching rooms and a few offices. The building is equipped with two BASs, one controlling at system level and one controlling at room level. At system level the BAS is a Schneider Electric StruxureWare [72] controlling the HVAC. The building consists of four identical HVAC systems. At room level the BAS is ETS [73] controlling every point at room level, using a KNX bus [74].

Figure 3.3: SDU OU44, located in Odense, Denmark.

For SDU OU44 a full integration into sMAP has been developed, which means all points can be accessed in both directions, read and write. The integration has been split into two parts, one for Schneider Electric StruxureWare and one for ETS. Schneider Electric StruxureWare uses a middleware from Schneider Electric which provides a REST API for the StruxureWare Enterprise Server or the StruxureWare Automation Servers. The integration is directly to the automation servers, due to the performance of the system. The integration to ETS uses KNX gateways, which are set up via a NetX Automation Server [75] exposing data from the KNX gateways via OPC UA [76].

At system level all parameters for the HVAC system are exposed. At room level the building instrumentation follows best practice, which includes the following properties: current room temperature, boolean values for PIR, current CO2 level, relative humidity, current lux level (divided into multiple zones for larger rooms, e. g. classrooms have 3 zones), a percentage of openness for radiator valves, and a percentage of shading height. All exits to other buildings or to the outdoors are instrumented with 3D cameras for counting the number of persons in the building.


3.3.3 Cold Store

The cold store is around 300.000 m3 split into multiple storage floors, constructed in 1998 and located in Vejle, Denmark. The purpose of the building is to freeze and store goods. The cold store used in this project can not be mentioned by name, due to business considerations.

The processes in a cold store can be simplified into the three processes of freezing, storage and heating. The process of freezing is initiated when new items arrive which are not already frozen; the items are then set into a freezing station to reach a proper temperature. The operation principle of the freezing station is to blow ice cold air through the items. Typically the freezing process takes 36 hours for each pallet. The process of storage is to keep the items in the storage facility at a temperature below -18 °C. The operation principle of the storage facility is like a fridge, with a set point and control mechanisms to reach the set point temperature. The process of heating is for avoiding icing on the floors, and avoiding permafrost under the building.

The data set covers forty electricity meters (hereafter points) in the cold store from the period 2014-06-01 to 2014-12-01 (dates follow the YYYY-MM-DD format) with a one minute resolution. The particular cold store uses two set points for the storage process, one for day and one for night, where the day temperature is -18 °C and the night temperature is -23 °C. The set point changes each day at 21 and 06. The reason for having two set points relates to the energy prices, which are typically lower in the night hours, following Elspot day-ahead prices [77].

The points are allocated to equipment with 18 points for compressors, 8 points for light, 5 for industrial fans, 3 for condensers, 2 for evaporators and 4 for heat pumps. The forty points measure the equipment types individually, meaning that a group of lights is measured by one point; there are 8 points measuring light groups. The cold store has more electrical loads than the metered loads. Therefore, for the NILM project, it has been necessary to create virtual main meters rather than use the real main meter.

The meters used are GridPoints [78] which measure the cumulative active energy (watt-hours) at least once per 60 seconds. The variable sampling can lead to more than one reading per 60 seconds. A weighted mean filter has been used to aggregate the readings to a time series with a resolution of exactly 60 seconds. The GridPoints use the wireless technology ZigBee to send the measurements to a data collector, a GridAgent [79]. The data collector forwards data to a central server, where access is provided via a REST API.
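The alignment of readings to a fixed 60-second grid, and the construction of a virtual main meter from sub-meters, can be sketched with pandas. Note this is a simplified stand-in: plain per-bucket averaging replaces the weighted mean filter used in the thesis, and the readings below are made up:

```python
import pandas as pd

# Sketch: align irregular meter readings to an exact 60 s grid and sum
# sub-meters into a virtual main meter. The sample data is illustrative,
# and resample(...).mean() is a simplification of the weighted mean filter.

idx = pd.to_datetime(["2014-07-01 00:00:05", "2014-07-01 00:00:50",
                      "2014-07-01 00:01:30", "2014-07-01 00:02:10"])
compressor = pd.Series([120.0, 130.0, 125.0, 140.0], index=idx)
fan = pd.Series([20.0, 22.0, 21.0, 23.0], index=idx)

def to_minute_grid(s: pd.Series) -> pd.Series:
    # One value per 60 s bucket; forward-fill buckets without a reading.
    return s.resample("60s").mean().ffill()

sub_meters = pd.DataFrame({"compressor": to_minute_grid(compressor),
                           "fan": to_minute_grid(fan)})
virtual_main = sub_meters.sum(axis=1)  # the virtual main meter
print(virtual_main)
```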

3.4 evaluation criteria

Results and answers to the research question, Section 2.2, are evaluated with respect to the following criteria.

• Software tools and data processing methods must be validated with data from real buildings. All prototypes must be tested with real data, to have a realistic test and to show the applicability



• Software tools must be evaluated by experts working in the area of the targeted solution

• Data processing methods for Micro Grid Living Lab must bring more value to the energy data. The work related to Micro Grid Living Lab needs to provide, enrich and transform simple data, to minimize the sensor infrastructure. To make expensive processes more feasible with respect to energy counseling, the infrastructure should be simple

• Electricity data from Micro Grid Living Lab should be applied to the Non-Intrusive Load Monitoring Toolkit (NILMTK), to validate the quality. NILMTK provides the ability to test multiple disaggregation algorithms on the same data set

• Data processing methods for COORDICY should use data from building instrumentation to generate metadata. To minimize the errors which can occur in tags from a BMS, it should be based on data streams from points. Furthermore, generated metadata should be evaluated against manually labelled ground truth metadata

• Data processing methods for automated metadata generation should focus on point types, e. g. temperature and humidity. Point types have been chosen based on Figure 2.1


Part II

N O N I N T R U S I V E L O A D M O N I T O R I N G

This part is based on Holmegaard and Kjærgaard [80], and presents work related to NILM for an industrial setting. First a preliminary analysis of industrial equipment related to power consumption and the theories and methods for NILM are presented, then the evaluation setup, followed by the results and a discussion. This part provides results and insights for answering research questions related to NILM.


4 A L O A D  C H A R A C T E R I Z A T I O N  A N D  A L G O R I T H M  E V A L U A T I O N

The chapter is based on Holmegaard and Kjærgaard [80], with the topic of data processing methods and Non-Intrusive Load Monitoring (NILM) in an industrial setting. The industrial setting is a cold store, see Section 3.3.3. The cold store has been anonymized, due to business considerations.

4.1 research context

The experiments for this chapter have been conducted via the project Micro Grid Living Lab. The data have been collected at a cold store; the case site is described in Section 3.3.3.

The topic of this chapter relates to data processing methods, and for Knowledge Discovery (KD) it relates to the areas from data through data mining, illustrated in Figure 1.2. The first part of the chapter is a preliminary analysis, see Section 4.2. The preliminary analysis focuses on the equipment in the cold store, to identify challenges and difficulties for NILM. This is followed by preprocessing, transformation and data mining, where six different experiments have been evaluated.

The concept of a Software Defined Building (SDB) is illustrated in Figure 3.1; NILM can be seen as an application which uses services and data from a SDB. Applications using NILM can be used to analyze electricity consumption with an objective of reducing electricity consumption, e. g. finding equipment whose pattern varies in frequency, due to a change in the pattern or simply due to changes of the processes.

As shown in the review of NILM, see Section 2.1.2, research on NILM in industrial settings has been limited. One of the challenges is getting access to data; for Holmegaard and Kjærgaard [80] data was provided via the collaboration with Energy Guild Vejle Nord [5] (EGVN). Based on state of the art, see Section 2.3.2, the Non-Intrusive Load Monitoring Toolkit (NILMTK) was chosen for analysis of electricity data from the cold store.

The contributions of work related to NILM in an industrial setting are as follows:

• Holmegaard and Kjærgaard [80] was one of the first to analyze NILM in a large industrial setting, which carries out several processes and uses a huge amount of electricity.

• An analysis of how NILM can be used with different levels of sub-metering for providing detailed breakdowns of the power consumption in an industrial setting, here a cold store. The results show that changing the level of sub-metering increased the test accuracy (F1-score) by a third, from 0.4 to 0.6. These results apply to Combinatorial Optimization (CO) and Factorial Hidden Markov Model (FHMM).


4.2 preliminary analysis of industrial equipment

This section provides an analysis of the equipment, the data collected, and to some extent what can be hypothesized when using NILM for equipment in a cold store. The tasks with respect to KD for this preliminary analysis cover data, selection and target data. The case site and the used points are described in Section 3.3.3. The basic meter setup is illustrated in Figure 4.1.

[Figure 4.1 labels: main meter connected to 18 x Compressor, 4 x Heat Pump, 8 x Light Groups, 3 x Condenser, 2 x Evaporator, 5 x Industrial Fan]

Figure 4.1: Equipment in the cold store setup, a total of forty points, all connected to one main meter.

4.2.1 Power Consumption for Industrial Equipment

This section presents an analysis of the collected data to characterize the electricity consumption of industrial equipment in the cold store. The analysis focuses on characteristics relevant to the performance of NILM disaggregation, including the amount of power level change events, power states and correlation in usage among equipment.

4.2.1.1 Equipment Events

The task of disaggregation is particularly difficult when equipment only rarely changes power level, for instance equipment which only turns ON/OFF or changes power state once a week. A power change event is defined as a change of minimum ±ΔP between two electricity consumption samples. The value of ΔP is set to 10 following previous research; e. g. this is the value used by the open-source toolkit NILMTK [56]. The number of events indicates how often equipment is going from one state to another. The data set has been analyzed with respect to events, for finding the difficulties with respect to the used NILM algorithms. Counts have been split on the hours of the day to evaluate if there are any hourly patterns. Figure 4.2 shows histograms for a compressor and a condenser. For comparison, the figure also includes a histogram for a residential house, showing the sockets of house 1 in The Reference Energy Disaggregation Data Set [6] (REDD). Histograms with minute and daily splits have also been created, but did not provide any patterns of significant interest. Therefore, plots with minute and daily splits have been omitted from further analysis.
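The event definition can be sketched directly: an event is a sample-to-sample change of at least ±ΔP, counted per hour of day. The series below is synthetic; only the ΔP = 10 threshold comes from the text:

```python
import pandas as pd
import numpy as np

# Sketch: counting power change events per hour of day. An event is a
# change of at least ±ΔP between consecutive samples (ΔP = 10, as in
# NILMTK). The synthetic series stands in for a real meter reading.

rng = np.random.default_rng(0)
idx = pd.date_range("2014-07-01", periods=1440, freq="1min")
power = pd.Series(rng.choice([100.0, 500.0, 2000.0], size=len(idx)), index=idx)

DELTA_P = 10.0
events = power.diff().abs() >= DELTA_P              # boolean event mask
per_hour = events.groupby(events.index.hour).sum()  # events per hour of day
print(per_hour.head())
```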

From the histograms in Figure 4.2 it can be observed that the compressor has the highest number of events, between 1800 and 2800 events, peaking in the afternoon. For most of the equipment, an increase in events around 20-21 and 05-06 can be observed; this increment most likely relates to the change in day and night temperature set points, see Section 3.3.3. The lowest number of events is observed for the categories light and evaporator, which are always below 250 events; those have been left out of the figure. One might expect an event pattern in relation to when the workers arrive and leave the cold store. However, such a pattern was not present in the data, as the influence of the workers is too small compared to the process loads. Furthermore, equipment used directly by workers, e. g. electrical forklifts, has been left out of the study due to the point setup in the cold store. Additionally, the lower temperature set points in the night lower the consumption in the cold store before noon due to the storage of energy in the thermal mass of the buildings and goods.

Figure 4.2: Number of events within a specific hour of the day for compressor, condenser in the cold store and a residential house.

In comparison, it can be observed from Figure 4.2 that the residential house data set has a clear pattern of events in the morning (7-9) and in the evening (18-23). This pattern is expected as the occupants wake up and make breakfast, are away during the day, and then in the afternoon come home and cook dinner.

4.2.1.2 States of Equipment

The task of disaggregation depends on learning a model of possible power states for each type of equipment. Therefore, it is relevant to analyze the power states of industrial equipment. The equipment in a cold store is hypothesized to have multiple states, as observed by Chang et al. [81], who have described load signatures of industry equipment. As compressors can be managed as a Variable Frequency Drive (VFD), the compressors can potentially have infinitely many operation levels. Other equipment groups in industry settings can also be managed as a VFD using frequency converters. Figure 4.3 illustrates the distribution of electricity consumption in descending order for condenser, compressor and heat pump. From Figure 4.3 it can be observed that the distribution of electricity consumption includes several plateaus, indicating individual power states. A popular method for estimating power states is k-means clustering, used among others by NILMTK [56]. Clustering via k-means with k = 3 has been applied, representing the ON/OFF/PEAK states. The clustering results are shown in Figure 4.3 with one color per cluster.

Figure 4.3: Power distributions for compressor, condenser and heat pump in the cold store with colored clusters estimated via k-means clustering with k=3.
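The state estimation step can be sketched with scikit-learn's k-means, mirroring the k = 3 (ON/OFF/PEAK) clustering described above; the readings below are synthetic:

```python
import numpy as np
from sklearn.cluster import KMeans

# Sketch: estimating ON/OFF/PEAK power states with k-means (k=3), the
# method NILMTK applies per equipment. The readings are synthetic stand-ins
# for a real power distribution with three plateaus.

rng = np.random.default_rng(1)
readings = np.concatenate([
    rng.normal(0, 5, 500),       # OFF plateau
    rng.normal(2000, 50, 500),   # ON plateau
    rng.normal(3200, 60, 200),   # PEAK plateau
]).clip(min=0).reshape(-1, 1)

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(readings)
states = sorted(km.cluster_centers_.ravel())
print([round(s) for s in states])  # cluster centers, roughly the plateaus
```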

It was hypothesized that the compressor was run as a VFD. From the slope of the distribution of electricity consumption for the compressor in Figure 4.3, it can be observed that the compressor consumption has several plateaus, indicating that the compressor is managed as an ON/OFF device and not as a VFD. This information has been validated with the technical staff of the cold store. The technical manager explained that they have minimized ramp time, to obtain higher efficiency of the equipment at specific operation levels. The clustering identifies two major states and one ramp state for the compressor. The clustering assumes that there are three states in addition to the OFF state, identifying two insignificant ramp states. The condenser results suggest that it has several states, especially at higher consumption levels. The k-means clustering finds three quite representative states: a PEAK state found over 3000 W, a middle state around 2000 W and a low state under 1000 W. The heat pump, when running, consumes a significant amount of power with a trend as a VFD within a band of consumption between 1700 W and 2400 W. The clustering separates the consumption levels into two clusters and one less used ramp state.

4.2.1.3 Event Correlation for Equipment

For NILM in industrial settings, one problem might be the number of equipment and overlapping power change events. To investigate this problem, correlation matrices have been calculated between events of the different equipment, see Figure 4.4. In the correlation matrices, 1 means that the equipment is dependent on each other. Negative correlation, -1, means the equipment is oppositely dependent, e. g. the equipment follows each other where the first equipment is turned ON when the other is turned OFF. The correlation matrices show events on a minute basis for the hourly time spans 00-06, 06-12, 12-18 and 18-24 to enable a temporal analysis, e. g. to evaluate if any equipment has higher correlation in the night hours than day hours.
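The per-time-span correlation analysis can be sketched as follows; the event series are synthetic, and the Pearson correlation provided by pandas stands in for the correlation measure used in the thesis:

```python
import pandas as pd
import numpy as np

# Sketch: correlating per-minute event series between equipment types
# within the hourly time spans 00-06, 06-12, 12-18 and 18-24.
# The 0/1 event indicators below are synthetic.

rng = np.random.default_rng(2)
idx = pd.date_range("2014-07-01", periods=1440, freq="1min")
events = pd.DataFrame({
    "compressor": rng.integers(0, 2, len(idx)),
    "light": rng.integers(0, 2, len(idx)),
    "fan": rng.integers(0, 2, len(idx)),
}, index=idx)

spans = {"00-06": (0, 6), "06-12": (6, 12), "12-18": (12, 18), "18-24": (18, 24)}
for name, (lo, hi) in spans.items():
    window = events[(events.index.hour >= lo) & (events.index.hour < hi)]
    corr = window.corr()  # correlation matrix for this time span
    print(name, round(float(corr.loc["light", "fan"]), 2))
```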

Figure 4.4: Event correlation between equipment in a cold store.

From the matrices in Figure 4.4, it can be observed that generally events are not highly correlated. In the period 00-06 the light and fan are negatively correlated. This relates to the light being turned OFF in the night and the fan being ON for the freezing process. The light and the evaporator are also slightly negatively correlated in the period 00-06. The compressor has a weak correlation with the light, especially between 06-12, which relates to the light having a high number of events in this period.

For comparison, the same correlation matrix has been created for house 1 in REDD, see Figure 4.5. A clear correlation between sockets, microwave and stove was found, indicating that the occupants in the house presumably have been cooking in the period 18-24. The correlation between appliances has been used by Kim et al. [42] for labeling appliances. In comparison to the industrial equipment, there are fewer appliances with overlapping events and stronger correlations when they overlap.

Figure 4.5: Event correlation between equipment in house 1 in REDD.

4.3 methods

Hart [60] proposed in the late eighties to disaggregate a power signal into the individual equipment represented in the signal. The aggregated signal is formulated as Equation 2.1, with $N$ types of equipment and the power consumption of equipment $n$ at time $t$ given as $x_{n,t}$. The problem is to find $x_{n,t}$, when the only known is $x_t$. Equipment can have states, e. g. ON/OFF or ON/OFF/PEAK. The equipment states add more complexity to the problem. The One-At-A-Time assumption is introduced, which assumes that equipment can only be in one of the $K$ states at time $t$, expressed mathematically as $\sum_{k=1}^{K} z^n_{t,k} = 1$. The power consumption $\hat{\theta}^n_{t,k}$ for equipment $n$ at time $t$ in state $k$ is given by $\hat{\theta}^n_{t,k} = \sum_{k=1}^{K} z^n_{t,k}\,\mu^n_k$, where $\mu^n_k$ is the power draw of equipment $n$ in state $k$. With $N$ types of equipment, which each can be in $K$ states, this gives Equation 2.2.
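The one-at-a-time state model above can be illustrated in a few lines; all power draws are made-up numbers:

```python
import numpy as np

# Sketch of the state model above: for one equipment n at time t, a
# one-hot state vector z (one-at-a-time assumption) selects one of the
# K state power draws mu. Values are illustrative.

K = 3                                  # states: OFF, ON, PEAK
mu = np.array([0.0, 2000.0, 3200.0])   # power draw per state
z = np.array([0, 1, 0])                # one-hot: equipment is ON

assert z.sum() == 1                    # one-at-a-time assumption holds
theta_hat = float(z @ mu)              # predicted consumption for this equipment
print(theta_hat)                       # 2000.0
```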


There are two major approaches for solving the problem, a supervised and an unsupervised approach. For the unsupervised approach, Kim et al. [42] have created an adaptive model of the equipment types. For the supervised approach, sub-meters provide data to train a model of $\theta^n_{t,k}$. Parson et al. [43] have done this by using Hidden Markov Models (HMMs) and prior knowledge of equipment. Another supervised approach is to minimize the error $e_t$ in Equation 2.3. This can be done by using a model based on the sub-metered data, and then using a CO algorithm for finding a combination of turned-ON equipment at time $t$, given a minimal error $e_t$.

Furthermore, the problem can be solved by learning a Markov chain representing the equipment states over time, and combining the chains into a FHMM. Kolter and Jaakkola [44] used this approach for disaggregation of electricity consumption in residential households.

There are a number of known challenges with the standard algorithms for NILM (e. g. CO and FHMM). The algorithms have difficulties distinguishing multiple equipment of the same type/make. The algorithms also have challenges with equipment which are steady-state loads or equipment with very few events. Another problem for industrial loads is that equipment with a relatively small consumption can drown between higher consuming equipment. Some of the challenges have been eliminated using High Frequency (HF) sampling and Electro Magnetic Impulse (EMI) signatures [41], the only downside being the enormous data collection and more complex sensor setups. Using EMI has not been feasible with the used point setup; furthermore the GridPoint does not support HF sampling.

It has been chosen to use the open source NILMTK [56] for solving the problem of finding $\hat{\theta}^n_{t,k}$ given $x_t$. NILMTK has been developed as a common toolkit for researchers working with NILM. Furthermore, it has been developed to make it possible to compare and evaluate new NILM algorithms against standard algorithms. NILMTK provides preprocessing tools, metrics to evaluate the algorithms and the essential code-base for creating NILM experiments.

4.3.1 Combinatorial Optimization (CO)

Using CO for finding the power consumption of each equipment, $\hat{\theta}^n_{t,k}$, is not new; Hart [60] used this approach in one of the first papers about NILM. The optimization problem is stated in Equation 2.4, where an optimization to minimize $e_t$ is performed, and the output is a state vector $z_t$ for time $t$. The CO problem is Nondeterministic Polynomial Time (NP)-complete, which gives an exponential solution space; for this problem it is $K^N$, where $N$ is the number of equipment and $K$ is the number of states.
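The $K^N$ solution space can be made concrete with a brute-force CO sketch that enumerates every state combination and minimizes the residual against the aggregate reading. This is feasible only for tiny $N$, and all per-state power draws below are illustrative:

```python
from itertools import product

# Sketch: brute-force combinatorial optimization over the K^N state
# combinations, minimizing |x_t - sum of predicted draws|.
# Power draws are made-up; real CO implementations prune this space.

mu = {                       # power draw per state, per equipment
    "compressor": [0.0, 2000.0, 3200.0],
    "fan": [0.0, 400.0],
    "light": [0.0, 150.0],
}
x_t = 2150.0                 # aggregate reading at time t

best = min(
    product(*(range(len(v)) for v in mu.values())),
    key=lambda states: abs(x_t - sum(mu[n][k] for n, k in zip(mu, states))),
)
assignment = dict(zip(mu, best))
print(assignment)  # {'compressor': 1, 'fan': 0, 'light': 1}
```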

Due to the NP-completeness of CO, the algorithm can in particular have problems with industrial loads consisting of VFD equipment. Batra et al. [82] have improved the CO algorithm by adding five different preprocessing steps for optimizing the predictions and for decreasing the solution space for CO. The five steps are: Time Series Synchronization, to resample data from mains and sub-meters to the same timestamps, using forward filling for missing data; Downsampling, used for removing transient noise and fluctuations in the data; Assigning Loads to Mains, which splits the data into more than one main meter for reducing the state space (in NILMTK this is handled by YAML files describing the data via a metadata format by Kelly and Knottenbelt [83]); Clustering, used for finding the states of the different equipment by using k-means++ [84]; and Equipment Power Calibration, the functionality for calibrating measurements if the meter devices are not of the same type and have different accuracies.
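Two of these preprocessing steps, time series synchronization and downsampling, can be sketched in pandas (illustrative data; this is not the NILMTK implementation):

```python
import pandas as pd

# Sketch of two preprocessing steps: synchronization (resample to shared
# timestamps, forward-fill gaps) and downsampling (smooth transient noise).
# The readings are illustrative.

idx = pd.to_datetime(["2014-07-01 00:00:03", "2014-07-01 00:01:07",
                      "2014-07-01 00:03:02"])
main = pd.Series([500.0, 520.0, 480.0], index=idx)

synced = main.resample("1min").mean().ffill()   # shared 1-minute grid
downsampled = synced.resample("5min").mean()    # suppress fluctuations
print(len(synced), len(downsampled))
```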

4.3.2 Factorial Hidden Markov Model

Given several independent Markov chains, the observation is a joint function of all the hidden states. Due to the problem of non-tractable inference, Gibbs sampling is used. This is formulated in Equation 2.5. The output of the FHMM is an additive function of the different hidden states in the Markov chains, as in Ghahramani and Jordan [68].

In Equation 2.5, the initial state distribution for the $i$th Markov chain is given by $\pi^{(i)}$, and $N$ is the number of HMMs. $x^{(i)}_t$ denotes the state of the $i$th Markov chain at time $t$. The transition matrix is described by $A$. The aggregate output is given by $\theta$, and $\mu^{(i)}$ is the mean of the $i$th HMM. $\Sigma$ is the covariance matrix describing the covariance between each state distribution.

4.4 evaluation setup

Based on the preliminary analysis in Section 4.2, it has been chosen to evaluate the standard NILM algorithms in an industrial setting. The focus is on the following parameters:

impact of point setup Previous work for residential settings has shown that increased sub-metering can increase accuracy [85, 86]. This part investigates the effect on NILM accuracy of two different point setups, one with a single main meter, and one according to logical sections in a cold store.

impact of power draw A cold store is built with materials of high insulating properties, but is still impacted by outside temperatures. It is hypothesized that a higher outside temperature will increase the power draw of the cooling equipment. This part investigates the impact of the outside temperature on the accuracy of NILM algorithms.

impact of goods flow A cold store is built with materials of high insulating properties, and it is hypothesized that there can be a relation between goods flow inside the cold store and the power draw. Furthermore, an investigation of a possible relation between outside temperature and goods flow is performed. In this part a study of the impact with respect to goods flow inside the cold store, the outside temperature and the power draw is performed.

impact of trained model The preliminary analysis revealed several patterns in the power consumption; based on these observations, a modification of the way the model is created by CO has been developed. This part investigates the impact of using an adjustment for temperature differences between the period of training and the period of testing the model.


impact of training data The preliminary analysis revealed several patterns in the power consumption; based on these observations, a modification of the way the model was trained for the FHMM algorithm has been developed. An investigation of the impact of using day specific data to train the model, and using models with similar patterns for disaggregation, is performed.

The evaluation of the results for the NILM algorithms is based on two metrics, the F1-score and Mean Normalized Error (MNE). The F1-score takes values between 0 and 1, where 1 corresponds to only true positives in the test. MNE is presented in Equation 4.1, where 0 corresponds to no error between the model and the ground truth. The F1-score captures the correctness of the true positives and false positives of the predicted electricity consumption, but does not consider how the algorithm performs over time. MNE is used for evaluating the individual predictions compared to the ground truth at each time step. The metric is calculated as the difference between the ground truth \(\theta^{n}_{t}\) and the prediction \(\hat{\theta}^{n}_{t}\), normalized with the ground truth.

\[
\mathrm{MNE}^{n} = \frac{\sum_{t=1}^{T} \left| \theta^{n}_{t} - \hat{\theta}^{n}_{t} \right|}{\sum_{t=1}^{T} \theta^{n}_{t}} \tag{4.1}
\]
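The two metrics can be sketched in plain Python; this is a direct reading of the definitions above, not the NILMTK implementation:

```python
def mne(ground_truth, predicted):
    """Mean Normalized Error (Equation 4.1): summed absolute difference
    between ground truth and prediction, normalized by the summed
    ground truth. 0 corresponds to a perfect prediction."""
    return sum(abs(g - p) for g, p in zip(ground_truth, predicted)) / sum(ground_truth)

def f1_score(actual, predicted):
    """F1-score over ON/OFF states: harmonic mean of precision and
    recall, 1.0 when the test contains only true positives."""
    tp = sum(1 for a, p in zip(actual, predicted) if a and p)
    fp = sum(1 for a, p in zip(actual, predicted) if not a and p)
    fn = sum(1 for a, p in zip(actual, predicted) if a and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```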

The results are labeled in the figures with the name of the algorithm, followed by TR# and T#, where TR means training period and T means test period. An example could be "CO TR2W - T4W", meaning Combinatorial Optimization (CO) trained with two weeks of data and tested with four weeks of data.

4.5 results

NILM for residential buildings focuses on having data from only one main meter, and then disaggregating the consumption into the individual equipment. In this section, NILM in an industrial setting is evaluated using two different setups: first with one main meter, and then with four sub-meters mapping to the logical sections in a cold store (freezing, heating and two storage).

4.5.1 Classic NILM Setup

For the evaluation, two weeks of data (the period 2014-07-01 to 2014-07-14) have been used for training. For testing, data for the following four weeks (the period 2014-07-14 to 2014-08-14) have been used. Two weeks of training data were used such that the training data contains all relevant states of the equipment. From the raw power consumption, it has been observed that the cold store has a lower consumption in weekends, when there is less activity and fewer goods arriving for freezing.
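The two-week / four-week split can be sketched with plain datetime filtering; the sample format and function name are illustrative assumptions, not the evaluation code:

```python
from datetime import datetime

def split_period(samples, train_start, train_end, test_end):
    """Split a sequence of (timestamp, watts) samples into a training
    window and the test window that follows it. Boundaries are ISO
    date strings; the end of training is the start of testing."""
    t0 = datetime.fromisoformat(train_start)
    t1 = datetime.fromisoformat(train_end)
    t2 = datetime.fromisoformat(test_end)
    train = [(t, w) for t, w in samples if t0 <= t < t1]
    test = [(t, w) for t, w in samples if t1 <= t < t2]
    return train, test
```

For the evaluation above, the call would be `split_period(samples, "2014-07-01", "2014-07-14", "2014-08-14")`.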

The setup has one main meter, forty points connected and six types of equipment. The points for light measure light for a group of light fittings. The main meter is a virtual meter, which has been established and aggregated by our software. The test accuracy for the setup with one main meter


is shown in Figure 4.6. It can be observed that both of the NILM algorithms have challenges disaggregating the power consumption. The problem is finding the right amount of power, caused by wrongly predicted equipment states, which relates to the results of Section 4.2.1.2. For this setup, neither CO nor FHMM could find the proper states.

Figure 4.6: Test accuracy with one main meter for CO and FHMM.

As seen in Figure 4.6, the F1-scores are rather low; the best test accuracy is around 0.7, but the average is around 0.4. The tendency is the same for all types of equipment, and the only difference between the two algorithms can be seen for the fan, which has a test accuracy of 0.

Figure 4.7: MNE with one main meter for CO and FHMM.


The MNE score is lower for the equipment with high test accuracy, see Figure 4.6 and Figure 4.7. The average MNE scores for CO and FHMM are 1.3 and 0.6 respectively. It can be observed that the CO algorithm did not recognize the fan, where the MNE is around 4, which was also shown by a low test accuracy.

4.5.2 Impact of Point Setup

Based on the results in Section 4.5.1, a study of the impact of having more sub-meters in the NILM setup is performed. For larger industrial buildings, the electrical wiring will provide a natural division of the electricity consumption. Dividing the total consumption \(x_{t}\) into sub-totals \(x^{i}_{t}\), where \(i\) is the number of meters, will reduce the state space for the CO and FHMM algorithms, as the number of equipments \(N\) per meter decreases. Sub-meters are costly and require maintenance¹, therefore the number of sub-meters has been limited to the number of processes in the cold store. The second test setup is illustrated in Figure 4.8; the test setup is structured with meter-groups divided into the processes of the cold store as described in Section 3.3.3. The four sub-meters are virtual, and have been established and calculated by our software.
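The state-space argument behind sub-metering can be illustrated with a small calculation; the two-state loads are a simplifying assumption, since real cold-store equipment may have more states:

```python
from math import prod

def co_state_space(states_per_appliance):
    """Size of the combination space CO must search for one meter:
    the product of the number of states of each appliance behind it."""
    return prod(states_per_appliance)

# Hypothetical example: forty two-state loads behind one main meter.
one_meter = co_state_space([2] * 40)        # 2**40 combinations

# The same loads split over four sub-meters of ten loads each:
# each meter is solved independently, so the spaces add up.
four_meters = 4 * co_state_space([2] * 10)  # 4 * 2**10 combinations
```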

Figure 4.8: Setup with meter-groups for the four logical sections in the cold store data, with freezing, storage and heating connected to the main meter. [The figure shows the forty loads (16 + 1 + 1 compressors, 4 heat pumps, 2 × 4 light groups, 2 evaporators, 5 industrial fans and 1 + 2 condensers) distributed over the main, freezing, heating, storage 1 and storage 2 meters.]

The results for the processes setup for the cold store are shown in Figure 4.9 and Figure 4.10. The processes included are freezing, heating and two storage facilities, indicated by the number after the equipment in the same order. Equipment with number 1 is used for freezing, equipment with number 2 is used for heating, and equipment with numbers 3 and 4 is used for storage one and storage two respectively.

Comparing Figure 4.6 and Figure 4.9, the test accuracy for the setup with four sub-meters has improved. The test accuracy has increased by a third for both CO and FHMM. Minimizing the number of equipment behind each meter increases

¹ The cold store pays a price of around €130 per meter per year, including data storage and maintenance.


the test accuracy. Between the two algorithms, there is a notable difference for the heat pump, where CO has a test accuracy 0.2 higher than FHMM.

Figure 4.9: Test accuracy with four sub-meters for CO and FHMM.

For storage two in Figure 4.9 and Figure 4.10 (equipment with number 4), the results are affected by very few events from the evaporator in both the training and test data. This creates large error offsets for the other equipment in this meter-group: all equipment in meter-group 4 is affected by the few evaporator events in the training data, which results in an insufficiently trained model for both CO and FHMM in this meter-group.

Figure 4.10: MNE with four sub-meters for CO and FHMM.


The MNE gives an indication of how good the prediction has been at each time step; here FHMM outperforms CO as an overall tendency.

Comparing the results from Figure 4.6 and Figure 4.9, the results with four sub-meters have an increased test accuracy and a smaller error. The results regarding freezing (equipment with number 1) do, however, highlight shortcomings, with a lower test accuracy. For the sub-meter regarding freezing, the dominant connected equipment is compressors, and the equipment with the highest error is condensers. The lower test accuracy relates to a high number of events from compressors and condensers. In conclusion, four sub-meters have improved the average test accuracy by 27.6%. The average MNE for CO and FHMM is 1.2 and 0.7 respectively. The MNE has not changed as much as the test accuracy, but has reached the same level as with one main meter.

4.5.3 Impact of Power Draw

This section provides an evaluation of the impact on the power draw. For the cold store case, the outside temperature is hypothesized to impact the power draw. Furthermore, the flow of goods is also hypothesized to influence the power draw, but for the selected store, where the type of goods was bread, no or a minimal influence was found, see Section 4.5.4.

To study the impact of temperature with respect to power draw, data from three periods with different temperature ranges relative to the Danish norms (one with high, one with medium and one with low temperatures) have been used. The three different weeks are: week 28 for hot (2014-07-07 to 2014-07-13), week 38 for normal (2014-09-15 to 2014-09-21) and week 48 for cold (2014-11-24 to 2014-11-30). For the evaluation, the week starts Monday at 00.00 and ends Sunday at 23.59. The average mean temperatures for the three weeks are as follows [87]: hot=19.5°C, normal=15.4°C and cold=4.5°C.

The period 2014-09-15 to 2014-09-21 has been used for training the algorithm, and the model was tested with data from the three different weeks (hot, normal and cold). The average mean temperature for the training period was 15.4°C, while the three test weeks have different levels of outside temperature (19.5°C, 15.4°C and 4.5°C). The setup used for this experiment is the one with four sub-meters as described in Section 4.5.2. FHMM has been used for the results, due to the higher test accuracy shown in Section 4.5.2. The results for the three weeks with different outside temperature are illustrated in Figure 4.11.

From the results in Figure 4.11, there is no clear tendency. The hot week has a lower MNE for light 3, but a higher test accuracy for light 4. The largest MNE difference is found for light 3, between hot and normal, at around 1.8. Comparing Figure 4.10 and Figure 4.11, the results from Section 4.5.2 have lower MNE due to a more similar training period than the results shown in Figure 4.11.

To test the influence of the training period in relation to MNE, an experiment was conducted with different training periods. The influence of training with cold, medium and hot weeks was tested and applied to the model for the test period. The difference in MNE was less than 0.2, indicating that the events from the cold store have more influence than the outside temperature.


Figure 4.11: MNE with four sub-meters for three periods (hot, normal and cold) withdifferent outside temperature.

4.5.4 Impact of Goods Flow

In residential buildings, it is the occupant behavior that controls the usage of equipment and thereby influences the NILM algorithms. For the cold store there might be other factors, like outside temperature and flow of goods. Here a study is performed to investigate how goods flow might influence equipment usage. When moving goods out of the cold store, energy stored in the goods due to cooling is removed; similarly, when new goods arrive, energy is needed to cool down the newly arrived goods. Information about pallet movements performed by workers on a weekly basis in 2014 has been used for this evaluation. A scatter plot has been created with outside temperature, power draw and pallet movements for each type of equipment in the cold store, and the two types of equipment with the strongest relationship have been chosen. However, as can be observed from Figure 4.12 and Figure 4.13, there is no strong relationship between pallet movements and the equipment usage, whereas there is a noticeable relationship with temperature, as also explored in Section 4.5.3. The coefficient of determination R² has been used to measure the strength of the relationships. Evaporator and condenser were the equipment with the highest R² between power and temperature. The R² for evaporator and condenser was 0.0 and 0.03 respectively for pallet movements and power draw (the green line in Figure 4.12 and Figure 4.13).
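The R² values can be reproduced with a simple linear regression; this is a generic sketch, not the code used to produce the figures:

```python
def r_squared(x, y):
    """Coefficient of determination R^2 for a simple linear regression
    of y on x, as used to compare power draw against outside
    temperature and pallet movements. For one predictor,
    R^2 = sxy^2 / (sxx * syy)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        return 0.0  # no variance in one variable: no linear relation
    return sxy ** 2 / (sxx * syy)
```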

It was hypothesized that there is a relationship between the flow of goods and the power draw in the cold store. There are several potential reasons why such a pattern was not present. Firstly, the data from the cold store about pallet movements is on a weekly basis, and the level of pallet movements seems to be almost steady. Secondly, the pallet movements are from the whole cold store, whereas the data


Figure 4.12: Regression between power, temperature and goods flow, for Evaporator.

Figure 4.13: Regression between power, temperature and goods flow, for Condenser.

set used for this study is only from segments of the cold store. The segments of the cold store that were studied store mainly bread; this type of goods does not store much energy and therefore does not need significant cooling to reach proper temperatures. Based on the considered case with a granularity of weekly data, no relationship between power consumption and pallet movements was found. From Figure 4.12 and Figure 4.13 it can be observed that there is a slight relationship between outside temperature and power draw, which was also found in Section 4.5.3.

4.5.5 Impact of Trained Model

In this section, an evaluation of the impact of the trained model with respect to the outside temperature is performed. For the cold store case, it is hypothesized that the outside temperature will impact the power draw; this relationship was shown in Section 4.5.4.

To minimize the impact of the outside temperature, an adjustment to the trained model \(\hat{\theta}^{n}_{t,k}\) has been developed. The adjustment is a weight given by the season. The season weight, \(w\), is given by Equation 4.2:

\[
w = \frac{T_{d}}{T_{m}} \tag{4.2}
\]

In Equation 4.2, \(T_{d}\) is the average temperature for the period of the data to disaggregate, and \(T_{m}\) is the average temperature for the period where the model has been trained. The temperature used for both \(T_{d}\) and \(T_{m}\) is the monthly average temperature; the month is found based on the first timestamp in the period. This can be misleading if the first day in a period is the last day of a month.

The model for CO has been trained with 7 days of data (2014-07-01 to 2014-07-07). In this experiment the model has been tested on the same three periods as the experiment with outside temperature in Section 4.5.3 (2014-07-07 to 2014-07-13, 2014-09-15 to 2014-09-21 and 2014-11-24 to 2014-11-30). The setup used for this experiment is the one with four sub-meters, illustrated in Figure 4.8. The results are shown in Table 4.1.

For the test accuracy shown in Table 4.1, there is no clear indication that the season weight compensates for the outside temperature. The results show that the test periods with the most similar outside temperature, hot and normal, were also the ones with the highest test accuracy. It is the same picture for MNE, where the average MNE is lower for the hot and normal periods.

The season weighted algorithm, which should compensate for the time of year where the CO model has been trained, has not performed as well as hypothesized. The results for the cold week should have been closer to the normal and hot weeks.

4.5.6 Impact of Training Data

This section presents an evaluation of the impact of the training data with respect to the model for disaggregation of electricity. The point setup used is the one which follows the processes. The FHMM algorithm has been used, as FHMM performed best with respect to MNE. Based on Section 4.2.1 and the observations from Section 4.5.3, the training of the FHMM has been split into day-specific training: seven models are trained, one for each day of the week. If the day to disaggregate is a Monday, then the model trained with Mondays is used. In this setup the same periods as in Section 4.5.1 have been used: training


Equipment Type              F1                    MNE
                            Cold  Normal  Hot     Cold  Normal  Hot
Light (storage 1)           0.64  0.40    0.51    1.05  3.08    3.84
Light (storage 2)           0.59  0.53    0.98    0.64  0.71    0.94
Compressor (freezing)       0.00  0.00    0.01    0.40  0.53    0.98
Compressor (storage 1)      0.13  0.96    0.98    3.01  0.46    0.32
Compressor (storage 2)      0.00  0.00    0.00    1.00  2.00    2.00
Heat Pump (heating)         0.77  0.86    0.74    0.80  0.41    0.40
Evaporator (storage 1)      0.84  0.86    0.86    0.89  0.34    0.12
Evaporator (storage 2)      0.28  0.53    0.71    2.01  1.37    0.62
Industrial Fan (freezing)   0.33  0.66    0.65    0.60  0.24    0.41
Condenser (freezing)        0.97  1.00    0.99    0.61  0.27    0.11
Condenser (storage 1)       0.64  0.80    0.95    0.95  0.64    0.11
Average                     0.47  0.60    0.67    1.08  0.91    0.90

Table 4.1: Results for Season Weighted CO.

period (2014-07-01 to 2014-07-14) and test period (2014-07-14 to 2014-08-14). Using a model with a specific day-pattern has been chosen based on the raw electricity consumption data: the electricity consumption is on average higher in the first three days of the week (Monday to Wednesday) than in the last part (Thursday to Sunday). The results for having a specifically trained model for each day type are shown in Figure 4.14 and Figure 4.15. The test accuracy in Figure 4.14 provides results similar to those in Figure 4.9.
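The day-specific scheme can be sketched as follows; the function names are illustrative, and the actual FHMM training is omitted:

```python
from collections import defaultdict
from datetime import datetime

def split_by_weekday(samples):
    """Group (timestamp, watts) training samples into seven buckets,
    one per weekday (0 = Monday), so a separate model can be trained
    for each day of the week."""
    buckets = defaultdict(list)
    for t, w in samples:
        buckets[t.weekday()].append((t, w))
    return buckets

def pick_model(models, day):
    """Select the model trained on the same weekday as the day to
    disaggregate: e.g. a Monday is disaggregated with the model
    trained on Mondays."""
    return models[day.weekday()]
```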

Using day-specific training of the FHMM has reduced the MNE for all equipment by half, from 0.7 to 0.36, and down to 0.23 at best. In these results the evaporator has a significantly low test accuracy and a high MNE. The problems for the evaporator relate to the raw electricity consumption, as described in Section 4.5.2. The heat pump has a larger error for Thursday with the day-specific training. Studying the raw power consumption for the heat pump reveals a change in the consumption pattern: on Thursdays the heat pump has been ON from around 21.00 in all timestamps, where for the other days the heat pump has been running half of the time, alternating half an hour ON and half an hour OFF. Comparing Figure 4.10 and Figure 4.15, the day-specific algorithm outperforms the normal FHMM.

The day-specific approach has improved the MNE significantly. The only downside of this approach is that it requires at least one week of training data for each type of equipment. An ideal situation would be to have one year of training data, to generate a model which can take all edge cases into account.


Figure 4.14: Test accuracy FHMM with day specific training.

Figure 4.15: MNE for FHMM with day specific training.

4.6 discussion

The cold store has significantly different loads than residential houses, and the loads have a larger variety in equipment and power draw. The pattern of consumption is very different from a residential house, as seen in Figure 4.2. The different loads from a cold store have been studied in Section 4.2. For the results, standard NILM algorithms have been applied. The results had an average test accuracy in the range of 0.54 to 0.61 and an average error in the range of 0.62 to 1.34. These are fair results for industrial equipment. Potentially, results might be further improved by algorithms tuned to the particular domain. For the cold store, the largest improvement was to change the training of the model to day-specific training, see Section 4.5.6. The effect of introducing additional meter-groups has been studied. Increasing the number of points by introducing


multiple meter-groups improved the accuracy of NILM. Minimizing the number of equipment connected to the main meter improved the results, hypothesized to be due to a smaller solution space for the algorithms. The test accuracy obtained in the experiments increased by a third when introducing four sub-meters. Therefore the results provide a recommendation for sub-metering down to ten loads per sub-meter or less to get usable results (four sub-meters for forty loads for the results in Section 4.5.2).

A day-specific model, where the FHMM algorithm was trained and tested with data from a similar day, has also been implemented. This approach reduced the MNE by half.

4.6.1 Difficulties for NILM in Industrial Settings

Based on the preliminary analysis in Section 4.2, the challenges for NILM in a cold store have been outlined. From Section 4.2.1.1 it was hypothesized that equipment with a low number of events will be difficult to disaggregate, and that equipment like compressors, with a high power draw and a high number of events, would be very dominant. Furthermore, a daily or weekly pattern in the events was hypothesized, but very few patterns were found. From Section 4.2.1.2 it was found that some of the equipment in the cold store are VFDs, which can be difficult to disaggregate. Furthermore, equipment like compressors, which might be run as VFDs, was actually operated as ON/OFF devices: the technical staff have optimized the performance of the compressors by minimizing the ramp time between the different frequencies or operational states. The compressor was slightly correlated with light, as shown in Section 4.2.1.3, which will increase the difficulty of disaggregation.

4.6.2 Variable Frequency Drives

Chang et al. [81] state that VFDs can be difficult to disaggregate. The cold store data set includes equipment characterized as VFDs, e.g. condenser and heat pump. It was also found that several types of equipment hypothesized to be VFDs are in practice managed like ON/OFF devices, e.g. the compressor.

4.6.3 One-At-A-Time Assumption

Batra et al. [88] state that the One-At-A-Time assumption is not valid for larger building complexes. This has been analyzed by creating an event correlation matrix, to see if any equipment has correlated events. Figure 4.4 shows a slight correlation between light and compressor, which can make it difficult to distinguish between the two, but otherwise equipment was found to be generally uncorrelated.
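One cell of such an event correlation matrix can be sketched as follows, assuming the events have been reduced to binary per-time-slot series (1 when the equipment switches state in a slot, 0 otherwise):

```python
def event_correlation(events_a, events_b):
    """Pearson correlation between two binary event series, used to
    check whether two pieces of equipment tend to switch at the same
    time (a violation of the One-At-A-Time assumption)."""
    n = len(events_a)
    ma = sum(events_a) / n
    mb = sum(events_b) / n
    cov = sum((a - ma) * (b - mb) for a, b in zip(events_a, events_b))
    va = sum((a - ma) ** 2 for a in events_a)
    vb = sum((b - mb) ** 2 for b in events_b)
    if va == 0 or vb == 0:
        return 0.0  # a constant series carries no correlation signal
    return cov / (va * vb) ** 0.5
```

The full matrix is obtained by evaluating this for every pair of equipment event series.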


4.7 summary

The focus of existing work within NILM has been on residential settings. In this work, standard NILM algorithms have been used for disaggregating loads in a cold store. To understand the equipment and the challenges for NILM in an industrial setting, here a cold store, Section 4.2 has provided a detailed analysis of power changes, power states and event correlations with data from a cold store. To improve the performance of the NILM algorithms, Section 4.5.2 has analyzed the effect of multiple meter-groups / points. Introducing four points instead of one improved the test accuracy (F1-score) on average by 27%. The effect of introducing more points improved the accuracy significantly. For NILM in cold stores, the recommendation for sub-metering is below ten loads per point, to get reasonable results. In Section 4.5.5 a season weighted algorithm has been introduced, to compensate for the time of year where the CO model has been trained. The season weighted algorithm did not perform as well as hypothesized; its challenges relate to the simple clustering used to train the CO model. In Section 4.5.6 a day-specific training of FHMM has been introduced, which reduced the MNE by half compared to the standard FHMM algorithm. The results in Holmegaard and Kjærgaard [80] open up for further research on the particular challenges of improving NILM in industrial settings such as cold stores, which illustrate the potential given suitable sub-metering.


Part III

SOFTWARE TOOLS FOR METADATA

This part is based on Holmegaard et al. [47], Holmegaard and Kjærgaard [89] and Holmegaard et al. [31]. Chapter 5 presents the software tool Metafier, including requirements and implementation details. Chapter 6 presents an evaluation of Metafier with respect to how Metafier supports the task of annotating and structuring metadata for building instrumentations. Chapter 7 presents data processing methods in Metafier for semi-automated generation of metadata for building instrumentations. This part provides results for answering research questions related to metadata for SDB.


5 THE TOOL METAFIER

This chapter presents an introduction to Metafier, with requirements for the prototype and an overview of the implementation and features. Firstly, the research context of Metafier is explained; secondly, the requirements for Metafier; and thirdly, the chosen technologies for the implementation of Metafier.

5.1 research context

This chapter is an introduction to the software tool Metafier. The software tool is designed to support annotating and structuring of metadata for points. The work related to Metafier has been conducted in the project COORDICY. The overall objective of Metafier is to generate and provide metadata for data streams from points, to provide context for the data streams.

Metadata is used to describe the data and to provide context for the information in the data. Meta in information technology often means “an underlying definition or description”, and metadata is often referred to as “data about data” [90]. Data streams without metadata for e.g. unit of measure, timezone, and the location they derive from are difficult to interpret. Metadata provides the information and context which, in combination with the data stream, can provide knowledge.
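As a hypothetical illustration, a minimal metadata record for a single point might look as follows; the field names and values are illustrative, not the actual Metafier or sMAP schema:

```python
# Hypothetical metadata for one point: without the unit, timezone and
# location fields, the raw data stream would be hard to interpret.
point_metadata = {
    "uuid": "example-uuid-0001",  # hypothetical identifier
    "type": "power",
    "unit": "kW",
    "timezone": "Europe/Copenhagen",
    "location": {"building": "example-building", "room": "12"},
}
```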

Figure 3.1 illustrates the interconnections of Metafier with respect to Software Defined Buildings (SDBs). Metafier facilitates tasks related to validation, maintenance and generation of metadata for a metadata model, which enables and supports querying of points from building instrumentations.

A preliminary study of Metafier for annotating and structuring metadata is presented in Chapter 6. An evaluation of algorithms to data mine data streams to annotate metadata is presented in Chapter 7.

5.2 requirements for Metafier

To develop Metafier, a number of requirements were set such that the research questions from Section 2.2 are met. Holmegaard et al. [47] and Section 2.1.1 give the basis for the requirements for Metafier. The requirements are listed here:

1. Expandable integration for multiple systems handling building instrumentation and points.

2. Expandable for multiple algorithms regarding data mining of data streams to generate metadata.

3. An Application Programming Interface (API) such that multiple Graphical User Interfaces (GUIs) and test scenarios can be created.

4. Foundation and storage developed around the Simple Measurement and Actuation Profile (sMAP) [7].


5. Support processes following Holmegaard et al. [47], including:

a) Discovery, validation and maintenance of points.

b) Versioning of metadata.

The tool must be able to integrate multiple of the systems listed in Table 2.1. This requirement should ensure that Metafier can be used in other research environments than at University of Southern Denmark (SDU) - Center for Energy Informatics (CFEI). Metafier is a tool where it should be easy to test algorithms for automated metadata generation, inspired by the Non-Intrusive Load Monitoring Toolkit (NILMTK). Therefore, the tool must be expandable with multiple algorithms for data mining. Furthermore, Metafier should provide an API such that multiple GUIs can be developed, depending on which kinds of evaluations are required for experiments. The API should also facilitate integration with external systems, e.g. Kjærgaard et al. [91]. One of the collaboration partners in COORDICY is UC Berkeley, therefore it was chosen to have storage and foundation developed around sMAP. Versioning of metadata is required such that changes in data streams can be reflected by the associated metadata. Purposes of rooms might change over time, e.g. a copier room which has been turned into an office or similar. The information about a change in purpose should be represented for all points of the location, which might be valuable when interpreting data streams for the location. In Section 2.3.1.1, six of the leading research platforms for Building Operating System (BOS) and points have been evaluated for how they handle discovery, validation and maintenance. Therefore the processes for discovery, validation and maintenance should also be included in Metafier.

A description of the processes for discovery, validation and maintenance follows in Section 5.2.1. Validation uses metadata profiles, which have similarities with the concept of Building Templates from BuildingDepot [50]. A description of metadata profiles can be found in Section 5.2.2.

5.2.1 Life Cycle for a Point

One challenge regarding SDBs and Portable Building Applications (PBAs) is the maintenance of metadata for the points of a building instrumentation. A process view of a solution is illustrated in Figure 5.1. The idea is that points are imported from a building instrumentation or a BOS. When a point is first discovered, it is assigned the state “Discovered”. The next transition is the validation process, where the point gets the state “Validated”. Maintenance operations may cause the point to drop down to the “Invalidated” state to indicate a need for validation or re-validation. An example of a point getting the state Invalidated would occur if the point type changes under maintenance, with a mismatch between the reported data stream and an ordinary data stream pattern for the new point type.

Figure 5.1: States within the process view of metadata discovery and validation for points. The process supports the life cycle for a point. To the right we have a simple example for a point following the process. [The figure shows the transitions start → Discovered → Valid? → Validated (yes) / Invalidated (no), next to an example point whose metadata evolves from State: Discovered, Type: ??, Space: 12 through State: Invalidated to State: Validated, Type: PIR.]

The subprocesses include discovery, maintenance and validation. Discovery is the process of including points and extracting metadata from the sensor instrumentation where the point originates. Extraction of metadata is a subprocess of discovery that retrieves metadata about each single point from the underlying building instrumentation. Validation is the process of controlling whether the metadata for a point follows the metadata profiles. Validation supports reaching a minimum required level of metadata associated with the point. The required level of metadata depends on the PBA which uses the points. The process of validation involves metadata profiles, see Section 5.2.2. A point must follow at least one metadata profile, but is not limited to one; if its metadata does not follow one of the metadata profiles, it gets the status Invalidated.

Maintenance covers everything from discovery to removing a point, therefore this process has been left out of Figure 5.1. Maintenance is also the process of updating the metadata for a point, e.g. setting the type of a point. Each time metadata has changed for a point, the point is re-validated, to ensure the metadata follows the metadata profiles.

Figure 5.1 includes an example of how the process can be used for a single point. First the point gets the state “Discovered”, and metadata is extracted from the building instrumentation; here the type could not be found, but the space for the point could. After the point has been discovered, it is validated against the metadata profiles and gets the state “Invalidated”, because the point type has not been set. When the point type has been set, in this case to Passive InfraRed (PIR), the point gets the state “Validated”.

The validation process runs periodically as well as after each metadata change, to re-validate whether the points still have the minimum required metadata. This part of the process ensures that metadata follows the metadata profiles, also after maintenance, so the metadata reflects the actual instrumentation of the building.

5.2.2 Metadata profile for Validation of a Point

To perform validation of points, the idea of metadata profiles to validate against is described here. The idea originates from BuildingDepot [50], where the validation of metadata is performed by multiple BuildingTemplates. The BuildingTemplates enable multiple schemas for one point, define a minimum of required metadata, and are implemented as eXtensible Markup Language (XML) schemas. One reason for having multiple metadata profiles is the difference between points, but it is also to divide and structure the metadata profiles into logical groups, e. g. a metadata profile for location and a metadata profile for metrics. Having multiple metadata profiles avoids one overly complex schema that should fit all points. The metadata profiles should follow the same principles as well-formed design classes in object-oriented programming [92]: primitive; high cohesion; complete and sufficient. This gives a guideline for creating metadata profiles which are easy to understand.

Support for multiple metadata profiles can be useful for having a required minimum of metadata supporting different PBAs, and having multiple metadata profiles introduces compliance between the metadata level of one building and different PBAs. One metadata profile can have the responsibility for metadata regarding location, another for Heating, Ventilation and Air Conditioning (HVAC) systems within buildings. An example where multiple metadata profiles are useful is the following: a PBA called X needs information about location and metrics to provide an eco-feedback view for occupants, based on their location. Therefore X requires two specific implementations of metadata profiles regarding location and metrics. A BOS, following Figure 3.1, can then provide the specific PBA with a version of the metadata which supports both metadata profiles.

Based on the requirement of a foundation built upon sMAP, metadata is stored as key-value pairs. With metadata as key-value pairs, a simple solution is JavaScript Object Notation (JSON) schemas, which provide a structure of required keys, data types of values and a readable format. This enables a definition of a minimum of required metadata while allowing flexibility. A minimum could be information about the location of points combined with the type of points. Furthermore, the metadata profiles should define the data type of all values, e. g. that floor is a number. A consequence of JSON schemas comes with the format of JSON, which only has six defined data types (object, array, number, string, boolean, null). Furthermore, the content of a value cannot be validated against a specific format or value. An example of such a validation would be to validate a room name for having the correct format, e. g. "e26-210-2"; for this room name, the range of the numbers and the requirement of exactly two dashes cannot be validated by a JSON schema. At some point it would be convenient if metadata profiles provided full semantics for understanding all kinds of metadata in a building; examples could be links between systems, properties of insulation materials, links to blueprints etc. A first version of metadata profiles needs to be simple, in the sense of which types of instrumentation it can handle, and still provide enough semantics to be useful for PBAs.
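The limitation can be illustrated with a toy, stdlib-only validator (a sketch, not the jsonschema library Metafier uses) that checks required keys and data types the way the metadata profiles do; note that it accepts any string as a room name, since the value content cannot be expressed:

```python
# Toy sketch of JSON-schema-style checking (stdlib only, not the jsonschema
# library): required keys and data types are validated, value content is not.

def check(instance, schema):
    """Return True if `instance` has the required keys and matching types."""
    types = {"object": dict, "string": str, "number": (int, float),
             "boolean": bool, "array": list}
    expected = types.get(schema.get("type", "object"), object)
    if not isinstance(instance, expected):
        return False
    for key in schema.get("required", []):
        if key not in instance:
            return False
    for key, subschema in schema.get("properties", {}).items():
        if key in instance and not check(instance[key], subschema):
            return False
    return True

# Simplified location profile (top-level nesting omitted for brevity).
location_schema = {
    "type": "object",
    "required": ["Building"],
    "properties": {"Building": {"type": "string"},
                   "Floor": {"type": "number"},
                   "Room": {"type": "string"}},
}
```

Both `check({"Building": "OU44", "Room": "e26-210-2"}, location_schema)` and `check({"Building": "OU44", "Room": "no-dashes-at-all-here"}, location_schema)` succeed: the schema can require the key and the string type, but not the room-name format.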

5.3 features and implementation details

Based on the simple requirements in Section 5.2, the following subsystems have been identified: Generator, API, Maintenance, Validation and a Store. A system setup is illustrated in Figure 5.2. Each of the subsystems is explained in Section 5.3.1 to Section 5.3.5.



Figure 5.2: System setup, here illustrated with an interface to a sMAP instance.

Metafier has been implemented in Python 2.7. The choice of Python relates to its libraries for Data Analytics (DA), in particular Pandas [93]. Pandas provides a large toolset for preprocessing, like resampling and filters. Furthermore, Flask [94] has been used for establishing a REpresentational State Transfer (REST) API for Metafier.

For Metafier the following classes have been identified: User, Profile, Point, Instance, Generator and Estimate. User contains information about the end user of Metafier, including password and information about which Points the end user can access. The class Profile transforms a JSON schema into an object, which is used for validation of the metadata entered for a Point-object. The class Point consists of a key-value set for metadata, together with information about version, a reference to an Instance and a Universally Unique Identifier (UUID) reference for the point. The class Instance is used as a Data Transfer Object (DTO), which stores the information needed to establish a connection, i. e. to a specific sMAP instance. The class Generator is used for each of the algorithms which generate metadata based on data streams. The Generator holds the name of the specific class that implements the algorithm and a UUID to identify the Generator-object. A Generator-object generates results in the form of the class Estimate. An Estimate stores information about which Point-objects have been used, the period used and a calculated percentage value for how similar two Point-objects are.
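A minimal sketch of two of these classes may clarify the model; the attributes are taken from the description above, while the constructors and parameter names are assumptions:

```python
# Illustrative sketch of two Metafier model classes; only attributes
# mentioned in the text are included, and the constructors are assumptions.
import uuid

class Point(object):
    """A point: key-value metadata, a version, a reference to an Instance
    and a UUID identifying the point."""
    def __init__(self, metadata, instance, version=1):
        self.metadata = dict(metadata)
        self.instance = instance          # DTO with connection information
        self.version = version
        self.uuid = str(uuid.uuid4())

class Estimate(object):
    """Result of a Generator run: which Point-objects were compared, the
    period used, and a percentage for how similar the two points are."""
    def __init__(self, points, period, similarity_pct):
        self.points = points
        self.period = period
        self.similarity_pct = similarity_pct
```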



5.3.1 API for Metafier

The API has been implemented using Flask [94], and follows a simple approach where HyperText Transfer Protocol (HTTP) GET will provide JSON and HTTP POST will modify objects in Metafier. The end points in the API use the following structure:

http://IP:PORT/api/points/<uuid>
http://IP:PORT/api/users/<uuid>
http://IP:PORT/api/instances/<uuid>
http://IP:PORT/api/profiles/<filename>

Listing 5.1: An example of end points in Metafier.

The parameter in <> is optional, and can be used for getting a specific object based on the parameter. If the parameter is left out, the whole collection of the specific type will be returned, e. g. all Point-objects. For HTTP POST operations, the parameter must be present.

The implementation of the API follows a Model-View-Controller (MVC) pattern, where a controller module has been created for each of the classes in Metafier. Authorization for the API has been implemented as a Bearer token, with the HS256 algorithm for hashing the username and password.
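Metafier itself uses Flask for this; the stdlib-only sketch below only mimics the routing semantics described above (GET without a <uuid> returns the whole collection, GET with a <uuid> returns one object, POST requires the <uuid>). The dispatcher and collection names are illustrative:

```python
# Stdlib-only sketch of the endpoint semantics (Metafier itself uses Flask).
import json

COLLECTIONS = {"points": {}, "users": {}, "instances": {}, "profiles": {}}

def handle(method, path, body=None):
    """Dispatch 'GET'/'POST' on paths like /api/points or /api/points/<uuid>."""
    parts = [p for p in path.split("/") if p]   # e.g. ['api', 'points', 'abc']
    collection = COLLECTIONS[parts[1]]
    identifier = parts[2] if len(parts) > 2 else None
    if method == "GET":
        if identifier is None:
            return json.dumps(list(collection.values()))  # whole collection
        return json.dumps(collection[identifier])         # one specific object
    if method == "POST":
        if identifier is None:
            raise ValueError("POST requires the <uuid> parameter")
        collection[identifier] = body                     # modify the object
        return json.dumps(body)
```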

5.3.2 Maintenance of Metadata

Maintenance of metadata covers multiple processes: annotating metadata, removing metadata, validating points and updating metadata. Therefore, maintenance has been implemented with the core functionality in the API described in Section 5.3.1 and a GUI implemented in Polymer Web Components [95]. The GUI uses the API; building upon the API ensures that external systems like OccuRe [91] can use the same features. The GUI uses standard Polymer Web Components, e. g. a Google Charts Component for data visualization.

For the GUI six key features have been implemented: search for a point, grouping of points, visualization of data from points, data validation to verify that the data stream of a point seems to match its annotated metadata, versioning of metadata and metadata validation to ensure the metadata follows the format specified in the metadata profiles. These features have been chosen to reduce the task of annotating building metadata.

A search and grouping mechanism has been implemented via sMAP queries returning collections of points, which are saved in Polymer with HTML5 Web Storage. A group is a collection of at least two points. The search mechanism will automatically synthesize a group out of the result of any query returning more than one point. This allows for a workflow where common annotations can quickly be applied to large groups of points. The objective of having search and grouping is to collect similar points, to increase the speed of annotating metadata. Groups can be used for editing a field, e. g. location, for a collection of points which share the same location. Metafier has one predefined type of group, which contains all points that relate to the same building instrumentation. This is illustrated in Figure 5.3, where the GUI shows a group for Building A and Building B, respectively.

Figure 5.3: Screenshot for "list view" with an overview of all points in Metafier.

Each point in Metafier holds a state, which can take the value Discovered, Validated or Invalidated, as defined in Section 5.2.1. This feature can minimise the effort of validating data from points in other applications.
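The grouping workflow can be sketched as follows; the function names are illustrative, and in Metafier the query results come from sMAP rather than an in-memory list:

```python
# Illustrative sketch of grouping: a query result with more than one point
# becomes a group, and an annotation is applied to every member at once.

def make_group(query_result):
    """A group is a collection of at least two points."""
    return list(query_result) if len(query_result) > 1 else None

def annotate_group(group, key, value):
    """Apply a common annotation (e.g. a shared building) to all members."""
    for point in group:
        point.setdefault("Metadata", {})[key] = value

points = [{"uuid": "p1"}, {"uuid": "p2"}, {"uuid": "p3"}]
group = make_group(points)          # three results -> a group is synthesized
annotate_group(group, "Building", "GTH")
```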

5.3.3 Validation of Points

Validation has been implemented as a core feature in Metafier. The implementation uses Python Abstract Base Class (ABC) [96] in combination with JSON schema [97] validation. Each time a Point-object is retrieved or saved, Metafier validates the object through all associated Profile-objects. For Metafier, metadata profiles have been used to ensure a consistent level of metadata. The metadata profiles provide a set of metadata keys, to have a common metadata key-set for all buildings.

For Metafier, the following metadata profiles have been implemented:

• location is a key-set for floor, room and building.

• system is a key-set for how the point relates to subsystems, e. g. heating, radiator, booking, ventilation, lighting.

• properties is a key-set for how the data stream should be interpreted, e. g. timezone information and the data type for readings (float or int).

• setpoint is a key-set for the direction of the point; for setpoint the point is an actuator.

• sensor is also a key-set for the direction of the point.

• unit is a key-set for the physical measure and encoding; e. g. for PIR the encoding would be boolean.

Listing 5.2 illustrates the metadata profile for location. A validated Point-object of Listing 5.2 could take the form of Listing 5.3.

{
  "type": "object",
  "$schema": "http://json-schema.org/metafier/location#",
  "properties": {
    "Metadata": {
      "type": "object",
      "description": "Aspects holds the information required for having location associated with a point",
      "properties": {
        "Location": {
          "type": "object",
          "description": "Construction relates to the top level of a location, Building etc",
          "properties": {
            "Building": {
              "type": "string",
              "description": "Name of the Building",
              "default": "Not Specified"
            },
            "Floor": {
              "type": "number",
              "description": "Floor where the point is located",
              "default": 0
            },
            "Room": {
              "type": "string",
              "description": "Room name of the Point",
              "default": "*"
            }
          },
          "required": ["Building"]
        }
      },
      "required": ["Location"]
    }
  }
}

Listing 5.2: A location metadata profile in Metafier.

The JSON schema (metadata profile) defines which keys must be present and their data types. The keys which must be present are listed in the value of required. For the metadata profile in Listing 5.2, Location and Building are required. The metadata profile provides a description of each key, such that the format should be easy to follow. Furthermore, the metadata profile is used for setting default values.



{
  "Metadata": {
    "Location": {
      "Building": "OU44",
      "Floor": 1,
      "Room": "e22-603-1"
    }
  }
}

Listing 5.3: A valid location for a point, validated through Listing 5.2.

Metadata profiles are used when points are retrieved and stored at their source, e. g. a sMAP instance. To have an extensible integration for multiple sources handling building instrumentation, ABC has been used, where the validation part is implemented in the base class, and the specific integration for retrieving and saving is implemented by a child class. Snippets from the base class, DataProvider, are shown in Listing 5.4. The public method get_metadata is the method to use when metadata from other systems should be retrieved. The method uses the child implementation of __retrieve_metadata__ and validates the metadata before returning it. The method __validate_metadata_structure__ is used for validation of metadata; in this case it will only validate through the metadata profile for properties. It uses the static method get_by_name in Profile, which loads a metadata profile based on the parameter for the filename of the metadata profile. The DataProvider class contains an instance of a Draft4Validator stored in the attribute self.validator. The method __validate_metadata_structure__ returns a tuple with a boolean value for whether the validation succeeded and an associated error message.

# removed: imports

class DataProvider(object):
    __metaclass__ = ABCMeta

    # removed: constructor + methods for get_all_points, set_metadata and
    # get_data + abstract methods for __retrieve_all_points__,
    # __set_metadata__ and __retrieve_data__

    def get_metadata(self, identifier):
        metadata = self.__retrieve_metadata__(identifier)
        validation = self.__validate_metadata_structure__(metadata)
        if validation[0]:
            return metadata
        else:
            return dict()

    @abstractmethod
    def __retrieve_metadata__(self, identifier):
        pass

    def __validate_metadata_structure__(self, metadata, profiles=['properties-profile']):
        fails = []
        for profile in profiles:
            try:
                if len(profile) > 0:
                    schema = json.loads(Profile.get_by_name(profile))
                    r = self.validator(schema).validate(metadata)
                    if r is not None:
                        fails.append(r)
            except Exception as e:
                self._logger.warn(e)
                fails.append(e)
        if len(fails):
            return False, fails
        else:
            return True, None

Listing 5.4: Snippet from the DataProvider class. Parts of the class have been left out; see the comments in the snippet.
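As a usage example of this ABC pattern, a simplified, self-contained variant could look as follows; the SmapDataProvider child class and its canned response are hypothetical, and the JSON-schema check is replaced by a trivial stand-in:

```python
# Simplified, self-contained variant of the DataProvider pattern; the
# SmapDataProvider child class and its canned response are hypothetical.
from abc import ABCMeta, abstractmethod

class DataProvider(object):
    __metaclass__ = ABCMeta  # Python 2 idiom used in the thesis (ignored by Python 3)

    def get_metadata(self, identifier):
        metadata = self.__retrieve_metadata__(identifier)
        ok, _errors = self.__validate_metadata_structure__(metadata)
        return metadata if ok else dict()

    @abstractmethod
    def __retrieve_metadata__(self, identifier):
        pass

    def __validate_metadata_structure__(self, metadata):
        # Stand-in for the JSON-schema check: require a non-empty dict.
        if isinstance(metadata, dict) and metadata:
            return True, None
        return False, ["empty or malformed metadata"]

class SmapDataProvider(DataProvider):
    def __retrieve_metadata__(self, identifier):
        # A real implementation would query the sMAP REST API here.
        return {"uuid": identifier, "Metadata": {"Location": {"Building": "OU44"}}}
```

The base class owns validation, so every child integration automatically gets the same metadata check.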

5.3.4 Generators for Metadata

The concept of generators is to generate metadata from raw data streams. The input of a generator should be a set of validated points and a set of invalidated points. The output of a generator should then be a set of estimates indicating how similar certain points and their corresponding data streams are. A set of generators can be implemented with different approaches for comparing data streams. Based on the estimates, metadata can be transferred and annotated to invalidated points.

An approach similar to Section 5.3.3 has been used for the implementation. A base Generator class has been implemented, which is in charge of fetching data streams for the list of associated Point-objects. Furthermore, the base class provides functionality for sanity checks of the validated points and for capturing statistics on timing and memory usage. ABC has been used, such that the Generator class provides a run method, which uses the implementation of the child classes implementing the different algorithms, like Dynamic Time Warping (DTW) and Empirical Mode Decomposition (EMD). The child classes must implement get_estimate_model and __create_estimate_result__. The first method is responsible for creating an Estimate-object, which can be used for storing information about how similar two Point-objects are, used memory and timings. The method __create_estimate_result__ performs the evaluation of the chosen algorithm, e. g. DTW, and returns a tuple with an Estimate-object and the calculated confidence of how similar two Point-objects are.
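To illustrate what a child class computes, the following is a minimal pure-Python sketch of a DTW distance; the function names and the mapping from distance to a similarity confidence are illustrative assumptions, not Metafier's actual implementation:

```python
# Minimal Dynamic Time Warping (DTW) sketch; names and the similarity
# normalization are illustrative, not Metafier's code.

def dtw_distance(a, b):
    """Classic O(len(a)*len(b)) DTW distance between two numeric sequences."""
    n, m = len(a), len(b)
    INF = float('inf')
    # cost[i][j] = DTW distance between a[:i] and b[:j]
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # insertion
                                 cost[i][j - 1],      # deletion
                                 cost[i - 1][j - 1])  # match
    return cost[n][m]

def similarity(a, b):
    """Map a DTW distance to a (0, 1] similarity confidence."""
    return 1.0 / (1.0 + dtw_distance(a, b))
```

Two identical data streams give distance 0 and similarity 1.0; the more the streams diverge, the lower the confidence.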

Figure 5.4 illustrates the flow within Metafier. Circles indicate validated points, rhombuses indicate invalidated points. The three colors of the circles and rhombuses indicate different point types; the green color could, for instance, indicate a point with the type room temperature. The blue arrows indicate tasks performed by Metafier. Metafier runs the algorithms for each data stream and calculates a similarity confidence of whether the data streams are similar. The similarity confidence is calculated based on e. g. the normalized distance matrix from DTW. The green arrows indicate tasks performed in the Metafier GUI by a user who has knowledge of the building instrumentation. The role of the user is to evaluate whether the similarity confidence has reached an acceptable threshold and to select whether metadata should be transferred or not.

Figure 5.4: The flow in Metafier. The blue arrows indicate processes within Metafier. The green arrows indicate processes started from the GUI.

5.3.5 Store and Foundation for Metafier

The requirement of having a foundation developed around sMAP has been solved by wrapping the protocol in a few methods, which handle the REST API of sMAP. The version of sMAP used is Giles [98], a version of the archiver written in Go. The implementation has been limited to two public static methods, as illustrated in Listing 5.5: get_or_create and update. The parameters for get_or_create are the name of the object type, a query holding a sMAP query and a dictionary containing the key-value set, e. g. describing a Point-object. Python has an option to pass arguments of variable length1. JSON in Python is handled as dictionaries, and sMAP uses JSON for metadata. The combination of metadata in the format of JSON and the option of arguments of variable length is used to limit the interface to two methods handling storage and updating of all object types. In Metafier, the JSON for all stored objects is converted to dictionaries. get_or_create retrieves an object based on the parameters if such an object exists, else it creates a new object with the parameters and a UUID. The method update modifies the object in the archiver and creates a version of the old object. update compares the parameters with an object from get_or_create based on the UUID.

1 Arguments starting with * are a list of arguments; arguments starting with ** are keyword arguments, i.e. a dictionary.



@classmethod
def get_or_create(cls, model, query=None, **kwargs):

@classmethod
def update(cls, model, **kwargs):

Listing 5.5: Method signatures for primary methods in foundation.

All models have a name and are mapped to a key-value set, to make the integration for sMAP as simple as possible.
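A minimal sketch of how such a foundation wrapper could behave, using an in-memory dictionary instead of the sMAP/Giles archiver; everything beyond the two method signatures (the class name, the naive matching, the version list) is an assumption:

```python
# In-memory stand-in for the sMAP/Giles-backed foundation (illustrative only).
import uuid

class Foundation(object):
    _store = {}       # model name -> {uuid: key-value dict}
    _versions = []    # old copies kept when an object is updated

    @classmethod
    def get_or_create(cls, model, query=None, **kwargs):
        objects = cls._store.setdefault(model, {})
        # Naive "query": return the first object whose items match kwargs.
        for obj in objects.values():
            if all(obj.get(k) == v for k, v in kwargs.items()):
                return obj
        # No match: create a new object from the parameters plus a UUID.
        obj = dict(kwargs)
        obj['uuid'] = str(uuid.uuid4())
        objects[obj['uuid']] = obj
        return obj

    @classmethod
    def update(cls, model, **kwargs):
        objects = cls._store.setdefault(model, {})
        old = objects.get(kwargs['uuid'])
        if old is not None:
            cls._versions.append(dict(old))  # keep a version of the old object
            old.update(kwargs)
            return old
        return None
```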

5.4 summary

This chapter presented an introduction to Metafier and the processes of discovery, validation and maintenance. Furthermore, the concept of metadata profiles has been introduced. The chapter is an introduction to Chapter 6 and Chapter 7, which present a preliminary study of Metafier for annotating and structuring metadata and an evaluation of algorithms that data mine data streams to annotate metadata.



6 AN EVALUATION OF METAFIER

This chapter presents a preliminary study of the software tool Metafier for performing the tasks of validation and maintenance. Metafier has been populated with a subset of points from University of Southern Denmark (SDU) OU44 and Green Tech House (GTH). For the study, three subjects used Metafier to annotate points. The content of this chapter originates from Holmegaard et al. [31] and evaluates the processes from Holmegaard et al. [47].

6.1 research context

Research of existing software tools for Software Defined Building (SDB), see Table 2.1, indicated a gap for software tools that handle metadata for buildings. Therefore the software tool Metafier has been developed, see implementation details in Chapter 5. Metafier facilitates processes for discovery, validation and maintenance, see Section 5.2.1 for definitions. The objective of Metafier is to support a SDB with metadata for points, by annotating points. Figure 3.1 illustrates how Metafier supports a Building Operating System (BOS) and thereby the SDB.

This chapter originates from Holmegaard et al. [31], and provides a preliminary study of using Metafier for two buildings: GTH and SDU OU44. Metadata profiles are applied to create a common metadata structure, focused on Portable Building Applications (PBAs), such that the structure is similar for the two buildings.

The findings of this chapter include:

• A preliminary study of a software tool to handle maintenance of metadata.

• Requirements for software tools to handle maintenance of metadata, derived from the preliminary study.

6.2 evaluation

A field study at the facility managers' office has been conducted, see Section 6.2.1. Furthermore, with the goal of having metadata for all points in buildings, a preliminary study of the software tool Metafier has been conducted. Three subjects were interviewed while they were using Metafier.

The evaluation covers the software tool Metafier and not the underlying metadata model. The three subjects were introduced to the features of Metafier shortly before they were asked to perform the tasks listed in Table 6.1.

6.2.1 Tasks related to BAS at the Facility Managers Office

For this preliminary study, a collaboration with the facility managers at SDU has been established. The facility managers maintain the Building Automation System (BAS), and thereby know how systems and subsystems interact at SDU. The facility managers have been an entry point for understanding the BAS used at SDU. The facility managers hold all the building instrumentation information necessary to annotate the building with enough metadata to provide a metadata model rich enough to support most types of portable applications.

Task                   Description
Login                  Type user and password
Annotate a point       Enter metadata related to selected point
Annotate a group       Enter metadata related to selected points
Validate a point       Select point and click Validate
Create a group         Search or click on multiple points
Use metadata profiles  Select a metadata profile matching selected point(s)
Save annotation        Click on save

Table 6.1: Tasks for subjects evaluating Metafier.

Based on the collaboration with the facility managers, their tasks related to BAS and metadata have been analyzed, and three main tasks have been identified. The three tasks are covered in Section 6.2.1.1 to Section 6.2.1.3.

6.2.1.1 Control

The task of Control relates to global set points and control loops influencing multiple rooms. An example could be defining the global set point for room temperature when a room is booked. The corresponding annotation task would be to use Metafier to assign the point type set point to a point.

6.2.1.2 Logic

The task of Logic relates to actions or state machines within the building automation. The facility managers define actions, e. g. that when a room is booked, the light level and comfort set point should be activated. The facility manager also links points, such that the shading height, the required lux level and the output of the lighting can be controlled by one set point. The corresponding annotation task would be to use Metafier to assign the type of control and the effects of the logic to a point.

6.2.1.3 Inspection and Commissioning

The task of Commissioning relates to testing the interconnection between the subsystems within the building. The facility manager can test whether the light level reaches the correct level when the room is set to booked. The task of Inspection relates to validating that the sensors are reporting the expected values. The corresponding annotation task would be to use Metafier to assign a boolean accepted status to a point. This relates to the process described in [47]. The task of validating points is very important for applications using data from the building, like OccuRe [91] or a personalized lighting control [70].



6.2.2 Subjects

Three test subjects participated in the evaluation of Metafier. The three test subjects have different backgrounds and knowledge about BAS and building instrumentation metadata.

subject a is a facility manager at SDU with more than 10 years of experience within the field. Subject A has a background as an electrical technician, and can be seen as an expert within BAS.

subject b is a student in software engineering at SDU, and has worked with sensor data from buildings for the last year. Subject B can be seen as a novice within BAS, but has some knowledge about metadata regarding building instrumentation.

subject c is a student in software engineering at SDU. Subject C can be seen as a novice within BAS, but has had an introduction to building instrumentation and software for building instrumentation.

6.2.3 Buildings and Data

A subset of points from SDU OU44, see Section 3.3.2, and GTH, see Section 3.3.1, has been used for the evaluation of Metafier. The subset has been chosen to cover one room in SDU OU44 and two rooms in GTH. The reason for the different number of rooms is that SDU OU44 has more than double the number of points in the selected room. The building instrumentation is stored in Simple Measurement and Actuation Profile (sMAP) [7] instances.

6.3 results from evaluation

For the evaluation described in Section 6.2, results have been collected for three subjects using Metafier. The three subjects from Section 6.2.2 performed the tasks from Table 6.1 while their interactions in Metafier were observed. The term event is used for either a click or a pressed key by the users of Metafier. Timestamps for each event have been recorded, which provides the total time from first event to last event. Annotated Points represents the number of points from SDU OU44 and GTH where the subject has either set a metadata profile, typed a key-value pair, or annotated metadata in other ways. The maximum number of annotated points a subject could reach was 112. The results consist of a screen and audio recording together with JavaScript collecting all events in Metafier. A summary of the collected events is shown in Table 6.2, where "Avg. [s]" indicates the average time between two events and "Max [s]" the maximum time between two events.

From Table 6.2 it can be observed that, in total, 2 out of 3 events were a click. Subject A used 116 events for annotating metadata of 1 point. Subject B had a click-to-key ratio where 1 out of 3 events was a click, and used on average 2 events for each point. Subject C used the mouse for 4 out of 5 events and used 7 events per point.



Subject  Events  Clicks  Entered Keys  Total Time [s]  Avg. [s]  Max [s]  Annotated Points
A        116     80      36            1171            11        300      1
B        194     60      134           560             2         155      112
C        547     436     111           1193            4         87       79

Table 6.2: Results from evaluation of Metafier. Results collected by JavaScript in Metafier.

6.3.1 Individual Results

Data have been collected for clicks and pressed keys; the events are illustrated as timelines for each of the subjects in Figure 6.1 to Figure 6.3. The figures use triangle markers, which are colored based on the time since the last event, where darker colors mean a larger delta.

Figure 6.1: Click statistics for subject A.

Subject A logged into Metafier, which is shown around 09:58 in Figure 6.1; then subject A had some questions regarding the software tool and what the task was. At 10:08 subject A started editing a point, and then had some questions regarding that specific point. The questions resulted in subject A changing to his own computer to investigate and answer the questions regarding the point. Afterwards subject A changed the values of what he had entered. A period of around 4 minutes without events occurred, from 10:14 to 10:18, because subject A was again investigating the point on his own computer. Subject A ended by clicking on the save button. Subject A annotated one point, where he entered information about the sensor type and the location of the sensor.

Subject B logged into Metafier, which is shown around 09:55 in Figure 6.2; then subject B was clicking around the software tool. After 5 minutes with questions regarding Metafier, subject B was editing a group of points from GTH. Subject B had some questions before editing points from SDU OU44, which can be seen from 10:00, with around 2 minutes without any events. Subject B annotated all points in this test; he entered the building for all of them, and furthermore set type information for one group of temperature sensors.



Figure 6.2: Click statistics for subject B.

Figure 6.3: Click statistics for subject C.

Subject C logged into Metafier, which is shown around 09:13 in Figure 6.3; subject C was clicking around Metafier for almost 7 minutes, until 09:20. Subject C started editing points from GTH via the group feature. At 09:27 subject C edited 1 point from SDU OU44 and clicked on the save button. Subject C annotated a total of 79 points; he entered the location for all of those points, and for 1 point he entered type information.

6.4 discussion

Based on the results in Section 6.3, a discussion of how the subjects performed the tasks from Table 6.1 and of the overall usage of Metafier is presented.

The explanation that a point is said to be validated when a human has verified that the data stream of the point seems to match its annotated metadata was presented to subject B, who said: “. . . else ten persons working with data and every single one need to have control that it is correct what is received. . . ”. For a building with a high level of building instrumentation, validated and invalidated points can be used in the commissioning phase. Subject B, who has worked with sensor data for a year, identified and appreciated the feature of validating points.

From Table 6.2 it is clear, based on the number of annotated points and number of events, that the grouping feature was used by subject B and subject C. Subject B used on average only 2 events to process a point. Subject C said “. . . as I can see it, the grouping mechanism is strong . . . ” and subject B “. . . it was exactly the feature I was thinking of. . . ” while using the feature. By using the grouping feature, subject B could easily annotate all points in this setup. Subject B used the least time and fewest events, but still annotated the most points.

All subjects used a minimum of one metadata profile for annotating metadata. While editing a set of points, subject B said: “. . . and then I just need to find a way to write celcius, which should be consistent. . . when I have wrote celcius once, I could chose it again. . . ”. Metadata profiles in Metafier give structure, but they could provide even more. If Metafier reused previously entered values, the values would be more consistent.

Performing the evaluation of Metafier, it was found that the views could have had more explanatory texts. Subject A said: “. . . if you should address this to me, then I would like that the descriptions was changed a bit. . . maybe just by hovering and then highlight with a more descriptive text. . . ”, was confused about the used keys, and had to ask what he should write as values. Subject A and subject C needed more guidance, due to a misunderstanding of the used metadata model. Subject C needed further explanation for a couple of the keys. Furthermore, a lack of understanding was observed of how Metafier should be linked to the tasks of facility managers and users of a building. The software tool alone was not enough; an understanding of the importance of metadata for the building instrumentation was missing.

Subject B, who had knowledge of the underlying data model, had an advantage and was the only one to annotate all points from SDU OU44 and GTH. From this preliminary study, it has been found that the software tool is not enough; the mindset of the users also matters. It is important that the user of a software tool for annotating metadata has an idea of why and how the metadata should be used afterwards. In order to take advantage of the domain knowledge held by facility managers, there is a challenge in convincing them that metadata will benefit their work.

6.5 comparison of existing solutions

This section compares features of existing Building Management System (BMS) and BAS software to the features of Metafier. The existing solutions chosen are the most commonly used ones in Denmark, which are installed in SDU OU44 and GTH: ETS, Honeywell Niagara AX, Siemens Desigo and Schneider StruxureWare. Six features have been chosen, based on Section 5.3.2, describing tasks related to maintenance of metadata. The comparison is shown in Table 6.3 and is based on [99, 100, 101, 102].

ETS is used for managing KNX products and is an application for BAS. The main purpose of ETS is the ability to create the logic between points. It has features for searching, grouping and visualization, but requires a plugin. ETS does not have a feature for validating data or metadata.

Feature             | ETS [100] | Niagara AX [101] | Desigo [99] | StruxureWare [102] | Metafier
Search              | ✓         | ✓                | ✓           | ✓                  | ✓
Grouping            | ✓         | ✓                | ✓           | ✓                  | ✓
Visualization       | ✓*        | ✓                | ✓           | ✓                  | ✓
Data validation     |           |                  |             | ✓                  | ✓
Versioning          |           |                  |             |                    | ✓
Metadata validation |           | ✓*               | ✓*          | ✓*                 | ✓

Table 6.3: Features for annotation of metadata in existing BAS and BMS. * is used when the feature is partially met or requires a plugin.

Niagara AX is a full BMS which has support for search, grouping and visualization. For metadata validation, Niagara AX has an integrated type system for setting the type of a point, e.g. a temperature sensor can be linked with either °C or °F as unit.

Desigo is the European version of Siemens BASs, and it supports search, grouping and visualization. Desigo has a type system like Niagara AX, where the most common types are included.

StruxureWare is a full BMS which has support for searching and grouping. For visualization it is required to create trendlogs for the points which should be visualized. A trendlog can be seen as a data storage for a specific point. StruxureWare supports data validation with a state for whether the data is validated or not. StruxureWare has a type system like Niagara AX, where the most common types are included.

The three BMSs support validation of metadata with respect to the most common types of points. None of these systems supports validation against a predefined metadata schema; instead, they define all fields as optional. While this provides full flexibility, it also represents a lack of formal standardization and thus a risk of deviations.

6.6 requirements for a tool to handle building metadata

This section presents requirements for tools to maintain metadata. The requirements are constructed based on the interviews of the three subjects.

Based on an interview with facility managers at SDU, visualization of data from points will only benefit facility managers the first time they inspect a point. Validation of points relates to the same task: the first time a facility manager inspects a point, they can give the point a status of validated. For usage of data from points in third-party applications, data visualization and validation still have their place.

While requiring an initial amount of metadata, a search and group feature will increase the efficiency of metadata annotation. For Metafier the query language from sMAP has been used, which can be difficult to use without knowledge of the metadata model. A requirement must therefore be to use a simpler query language, e.g. a plain text search.
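A plain text search of this kind can be sketched as follows. The dictionary-based point structure and the `text_search` helper are illustrative assumptions, not part of Metafier or sMAP:

```python
# Sketch of a plain-text search over point metadata, as an alternative to a
# structured query language. A point is modeled as a dict with a "metadata"
# mapping; this structure is a hypothetical simplification.

def text_search(points, query):
    """Return all points whose metadata contains every word of the query,
    matched case-insensitively against keys and values."""
    words = query.lower().split()
    hits = []
    for point in points:
        haystack = " ".join(
            f"{k} {v}" for k, v in point["metadata"].items()
        ).lower()
        if all(word in haystack for word in words):
            hits.append(point)
    return hits

points = [
    {"id": "p1", "metadata": {"type": "temperature", "room": "U180"}},
    {"id": "p2", "metadata": {"type": "co2", "room": "U180"}},
    {"id": "p3", "metadata": {"type": "temperature", "room": "U055"}},
]

print([p["id"] for p in text_search(points, "temperature U180")])  # ['p1']
```

Such a search requires no knowledge of the metadata model: the user simply types words that should occur somewhere in a point's metadata.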


The interface for facility managers needs to be simple. The used metadata model should be hidden from the end user, or, alternatively, the user could be presented with a choice of views. There should be simple labels with a help text introducing the end user to what the field should contain.

The end user should be able to reuse text already entered into the system, so that spelling and typing mistakes are minimized, or at least consistent.

Metafier has been developed to fulfill all features in Table 6.3, but a few modifications and additions would bring the same features to StruxureWare [102] as well.

6.7 summary

Metafier has been created as a software tool for annotating and structuring building metadata. It has been found that search and grouping of points increase the efficiency of annotating metadata. The preliminary study showed a simple model for validating points, which was understood by the test subjects. Overall, challenges were found regarding the mindset of the end users, where the metadata model was difficult to understand. The purpose of annotating metadata was unclear for most of the test subjects; this could be solved by an application using the metadata to solve some of the facility managers' tasks. The facility manager requested more guidance while annotating metadata, to ensure that metadata was correctly applied. Section 6.6 presents a set of requirements for how to create a software tool which can handle building metadata. The most important requirement is to make the software tool as simple and intuitive as possible. Furthermore, the objective of using the software tool should be clear to the end user.


7 DATA PROCESSING METHODS FOR DATA MINING IN METAFIER

This chapter is based on Holmegaard and Kjærgaard [89]; the topic is data processing methods for data mining to support the task of annotating metadata for building instrumentation. The results are obtained via Metafier; details of the implementation of Metafier are in Chapter 5.

7.1 research context

The topic of this chapter relates to Knowledge Discovery (KD) via Data Analytics (DA) to support Software Defined Buildings (SDBs) and Portable Building Applications (PBAs). A description of the KD process can be found in Section 1.2. Figure 3.1 illustrates the concept of SDB. The illustration shows how Metafier supports the Building Operating System (BOS) with metadata. The metadata can either be manually annotated or generated via metadata generators. This chapter presents data processing methods to generate metadata. One of the objectives of having automated or semi-automated metadata generation is to perform intuitive queries [28] on metadata from building instrumentations. The metadata provides an abstraction layer for points. An example of an intuitive query is illustrated in Listing 2.1. With a common metadata language or semantic representation, different devices will be able to interpret data from other devices.

Three algorithms are implemented in Metafier. The supervised learning algorithms to generate metadata are: Dynamic Time Warping (DTW), Empirical Mode Decomposition (EMD), and Slope Compare (SC). The algorithms were chosen as follows: DTW, to handle displacement between two data streams; EMD, to handle noisy data, where components with slow frequency could have common features; SC, to handle similar patterns with an offset, e.g. different temperature set points but the same increase when people occupy a room. The algorithms create estimates with a similarity confidence of how similar an invalidated point is to a validated point. For a definition of invalidated and validated points see Section 5.2.1.

The objective is to achieve knowledge in the form of metadata related to a specific point. The semi-automated metadata generation in Metafier is illustrated in Figure 5.4. The data for the KD process of this evaluation consists of data streams for points from Green Tech House (GTH). The target data is set to a period of 7 days. The only form of preprocessing is that data is ordered by date. Patterns are created for all points and compared to validated points, to match similar patterns. If there is a match above a certain similarity confidence, metadata is transferred from the validated point to the invalidated point.
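The transfer step can be sketched as follows. The point structure, the `similarity` callable and the `transfer_metadata` helper are hypothetical placeholders, not Metafier's actual interfaces:

```python
# Sketch of the semi-automated metadata transfer step: each invalidated
# point is compared against the validated points, and the point type is
# copied from the best match above a confidence threshold. All names here
# are illustrative assumptions.

THRESHOLD = 75.0  # similarity confidence in percent, as used in Section 7.3

def transfer_metadata(validated, invalidated, similarity):
    """Copy the point type from the most similar validated point onto each
    invalidated point when the similarity confidence exceeds THRESHOLD."""
    for point in invalidated:
        best_type, best_conf = None, 0.0
        for ref in validated:
            conf = similarity(ref["data"], point["data"])
            if conf > best_conf:
                best_type, best_conf = ref["type"], conf
        if best_conf > THRESHOLD:
            point["type"] = best_type  # an estimate, still to be validated
    return invalidated
```

The `similarity` callable would be one of the three algorithms presented in this chapter, each returning a similarity confidence in percent.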

The findings of this chapter include:

• Design of three algorithms for semi-automated metadata generation in Metafier(SC, EMD and DTW).


• An evaluation of a semi-automated metadata generation for point type. Theresults showed accuracy in the range 94.39% to 98.13% for the three algo-rithms.

7.2 algorithms for generation of metadata

This section presents three algorithms for transformation, followed by one algorithm to calculate the similarity of two lists.

7.2.1 Slope Compare (SC) Algorithm

SC divides a data stream into smaller chunks of 4 sensor readings. The algorithm then calculates whether the data stream decreases or increases more than ε. For all evaluations ε = ±0.05; this value can be changed based on the expected point type. SC calculates the differential coefficient by Equation 7.1.

s = (y_n - y_1) / (x_n - x_1)    (7.1)

The slope change is given by s in Equation 7.1, where y is the value of the sensor reading, x is the index, 1 indicates the first coordinate, and n the last coordinate of a chunk. The method returns the midpoint of the calculated slope. After computing all indices with a slope change, the results for two streams are compared. For the comparison, a list comparison has been used, as described in Section 7.2.4.
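The chunking and slope computation can be sketched as follows; the `slope_changes` helper and its exact return value are illustrative assumptions based on the description above:

```python
# Sketch of the Slope Compare (SC) algorithm: the stream is cut into chunks
# of 4 readings, the slope of each chunk is computed as in Equation 7.1,
# s = (y_n - y_1) / (x_n - x_1), and the midpoint indices of chunks whose
# slope exceeds the threshold are returned. The default threshold mirrors
# the 0.05 used in the evaluations.

def slope_changes(stream, chunk=4, threshold=0.05):
    """Return midpoint indices of chunks that increase or decrease more
    than the threshold."""
    midpoints = []
    for start in range(0, len(stream) - chunk + 1, chunk):
        ys = stream[start:start + chunk]
        s = (ys[-1] - ys[0]) / (chunk - 1)  # Equation 7.1 within a chunk
        if abs(s) > threshold:
            midpoints.append(start + chunk // 2)
    return midpoints
```

Two such index lists, one per data stream, would then be fed to the list comparison of Section 7.2.4.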

7.2.2 Empirical Mode Decomposition (EMD) Algorithm

At its core, the EMD method is an optimization problem where the envelope theorem [103] is applied. The implementation in Metafier follows the same approach as Fontugne et al. [104] for EMD. Instead of finding anomalies, the signatures are used for finding similarities in the data streams. The EMD method is constructed with the following steps, given the data stream of a point as f(x):

• Identify local maxima and minima of f(x)

• Generate the upper envelope e_max of the local maxima using spline interpolation. Generate a lower envelope e_min of the local minima using spline interpolation. Calculate a local mean x̄(x) = (e_max(x) + e_min(x)) / 2

• h(x) = f(x) - x̄(x) defines the modulated oscillation

• If h(x) meets the Standard Deviation (STD) criterion in Equation 7.2, set h(x) as the i-th Intrinsic Mode Function (IMF) and replace f(x) with the residual r(x) = f(x) - h(x); otherwise replace f(x) with h(x)

STD = Σ_{x=1}^{X} [ |h_{k-1}(x) - h_k(x)|² / h²_{k-1}(x) ]    (7.2)

• Repeat until r(x) reaches the stopping criterion.


The IMFs identify patterns of modulated oscillation for the data stream in different frequency bands. The IMFs are afterwards aggregated, where high frequencies are removed: all IMFs with a time scale lower than 20 minutes are removed. After the aggregation of IMFs for a specific point, the IMF aggregation is compared with that of a validated point. For calculating the similarity confidence on the IMF aggregation between two points, a list comparison has been applied, as described in Section 7.2.4.
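A single sifting iteration of the steps above can be sketched as follows. As a simplification, linear interpolation (`np.interp`) stands in for the spline envelopes of the actual method, so this is not the implementation used in Metafier:

```python
import numpy as np

# Simplified sketch of one EMD sifting iteration: find local extrema, build
# upper and lower envelopes, and subtract their mean from the signal.
# Linear interpolation replaces spline interpolation to keep the sketch
# dependency-free; this is an assumption of the sketch.

def sift_once(f):
    """Return h(x) = f(x) - local mean, or None if there are too few
    extrema to build envelopes."""
    x = np.arange(len(f))
    maxima = [i for i in range(1, len(f) - 1) if f[i - 1] < f[i] > f[i + 1]]
    minima = [i for i in range(1, len(f) - 1) if f[i - 1] > f[i] < f[i + 1]]
    if len(maxima) < 2 or len(minima) < 2:
        return None
    upper = np.interp(x, maxima, f[maxima])   # envelope of the maxima
    lower = np.interp(x, minima, f[minima])   # envelope of the minima
    local_mean = (upper + lower) / 2.0
    return f - local_mean
```

Repeating this step until the STD criterion of Equation 7.2 is met yields one IMF; subtracting the IMF and restarting yields the next.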

7.2.3 Dynamic Time Warping (DTW) Algorithm

DTW is used for measuring similarities between two temporal sequences which may vary in speed. The data streams from points are temporal sequences, e.g. the effect of sunlight in two rooms can vary in speed. DTW is divided into two steps: the first step is to calculate the distance between the two data streams and create a distance matrix; the next step is to find the optimal path in the distance matrix. The two data streams are denoted f_a and f_b, with lengths n and m respectively. The distance matrix will be an n × m matrix, filled with the Euclidean distance between f_a and f_b, given by (f_a(i) - f_b(j))², where (i, j) is the coordinate in the distance matrix. The warping path function W is given by:

W = w_1, w_2, ..., w_K,    max(m, n) ≤ K ≤ m + n + 1    (7.3)

Here the k-th element of W is defined as w_k = (i, j)_k. The warping path W must satisfy the following constraints:

w_1 = (1, 1) and w_K = (n, m). The start point of the warping path must be the start point of the data streams, and the end point of the warping path must be the last point of the data streams.

If w_k = (a, b) and w_{k-1} = (a′, b′), then a - a′ ≤ 1 and b - b′ ≤ 1. This limits the allowed steps of the warping path.

If w_k = (a, b) and w_{k-1} = (a′, b′), then a - a′ ≥ 0 and b - b′ ≥ 0. This ensures that the steps are monotonic along the x-axis.

The last step is to define an objective function for the optimum warping path. The objective function is given by:

D(f_a, f_b) = min_{w ∈ W} Σ_{(i,j) ∈ w} (f_a(i) - f_b(j))²    (7.4)

Besides the normalized distance matrix, the mean, max and min of the two data streams have been compared. If the two data streams have a low warping path cost but a huge difference in magnitude, the implementation sets the data streams to be different. A difference of ±10% in magnitude has been used.
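The two DTW steps can be sketched with dynamic programming; the `dtw_distance` helper is an illustrative textbook formulation, not Metafier's implementation, and it returns the accumulated path cost rather than the path itself:

```python
import numpy as np

# Sketch of DTW: build the distance matrix from squared differences
# (f_a(i) - f_b(j))^2, then accumulate the cost of the optimal warping path
# under the boundary, step-size and monotonicity constraints of the text.

def dtw_distance(fa, fb):
    """Return the accumulated cost of the optimal warping path."""
    n, m = len(fa), len(fb)
    dist = (np.asarray(fa, float)[:, None] - np.asarray(fb, float)[None, :]) ** 2
    cost = np.full((n, m), np.inf)
    cost[0, 0] = dist[0, 0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            prev = min(
                cost[i - 1, j] if i > 0 else np.inf,                 # step in f_a
                cost[i, j - 1] if j > 0 else np.inf,                 # step in f_b
                cost[i - 1, j - 1] if i > 0 and j > 0 else np.inf,   # diagonal
            )
            cost[i, j] = dist[i, j] + prev
    return cost[n - 1, m - 1]
```

A shifted copy of a sequence yields a low cost even though a pointwise comparison would not, which is exactly the displacement tolerance DTW was chosen for.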

7.2.4 List Similarity

For comparing lists, two dimensions for calculating similarity have been used. A length similarity and a cosine similarity are calculated, then multiplied and converted to a percentage, to provide one combined value of similarity of two lists.


The length similarity ls is given by Equation 7.5 and is based on the difference between the lengths of list A and list B.

ls = min(len(A), len(B)) / max(len(A), len(B))    (7.5)

The length similarity gives a value between 0 and 1. The cosine similarity cs is given by Equation 7.6, where A is the result of the validated point and B is the result of the invalidated point, both given as vectors.

cs = cos(θ) = (A · B) / (‖A‖ ‖B‖)    (7.6)

The cosine similarity gives a value for how similar the orientation of the vectors is, not how similar the magnitude is. Two vectors with a cosine similarity of 1 have the same orientation; if the vectors are perpendicular, the cosine similarity is 0. For a positive space, the cosine similarity gives a value between 0 and 1.

7.3 evaluation setup

The algorithms in Metafier are evaluated with data from GTH. A set of 9 rooms was chosen, 3 on each floor; 4 offices and 5 conference rooms have been selected. For all rooms, validated points representing room temperature, CO2 level, and illuminance have been included. This gives 9 rooms with 3 points in each, which is a total set of 27 validated points. All 903 points have been manually labeled with point type, as ground truth. The data from all points in GTH are sampled with an interval of 5 minutes, with data from 7 days (midnight 2016-06-28 to midnight 2016-07-05). The effect of occupancy has not been taken into consideration. A similarity confidence over 75% has been used, as this is an acceptable threshold which a user could have used. All estimates with a similarity confidence over 75% are counted as true positives when the validated and invalidated points have the same point type. The accuracy a is given by Equation 7.7, where TP are true positives (i.e. an invalidated point with the same point type as the validated point and a similarity confidence over 75%), TN are true negatives, FP are false positives, and FN are false negatives.

a = (TP + TN) / (TP + TN + FP + FN)    (7.7)

For evaluation of the selection of validated points, all experiments have first been carried out one-by-one for each validated point, tested against all other points. Afterwards, different combinations of the validated points have been created: combinations with 1, 3, 5, and 7 of each validated type. Combination 1 is structured with 1 validated point with the point type of room temperature, 1 CO2 level, and 1 illuminance. Combination 3 is structured with 3 validated points with the point type of room temperature, 3 CO2 level, and 3 illuminance. Combination 5 is structured with 5 validated points with the point type of room temperature, 5 CO2 level, and 5 illuminance. Combination 7 is structured with 7 validated points with the point type of room temperature, 7 CO2 level, and 7 illuminance. For combination 7, 21 of the 27 validated points have been used.

7.4 results

This section presents the results from the algorithms in Metafier: EMD, DTW, and SC. Furthermore, a combination of the three algorithms is shown under the label "All" in Figure 7.1 to Figure 7.4. As described in Section 7.3, combinations of validated points are shown in Figure 7.1 to Figure 7.4. The results have been split for each point type. The x-axis shows the point type and the algorithm which produced the result. The y-axis shows the accuracy in percent. For illuminance the box is colored blue, for CO2 level the box is colored green, and for room temperature the box is colored red.

Figure 7.1: Results with a combination of 1 (validation points: 3) for the three algorithms and All.

The results in Figure 7.1 for illuminance show a minimum accuracy of 94.49% and a maximum accuracy of 94.71% with a STD of 0.06. For CO2 level the minimum accuracy was 94.60% and the maximum accuracy 97.19% with a STD of 0.53. For room temperature the minimum accuracy was 94.38% and the maximum accuracy 98.13% with a STD of 1.22. For DTW the maximum accuracy was 98.13% for room temperature. For SC the maximum accuracy was 97.14% for room temperature. For EMD the maximum accuracy was 97.19% for CO2 level.

Figure 7.2: Results with a combination of 3 (validation points: 9) for the three algorithms and All.

The results in Figure 7.2 for illuminance show a minimum accuracy of 94.49% and a maximum accuracy of 94.64% with a STD of 0.04. For CO2 level the minimum accuracy was 94.60% and the maximum accuracy 95.78% with a STD of 0.25. For room temperature the minimum accuracy was 94.49% and the maximum accuracy 98.09% with a STD of 1.10.


Figure 7.3: Results with a combination of 5 (validation points: 15) for the three algorithms and All.

The results in Figure 7.3 for illuminance show a minimum accuracy of 94.49% and a maximum accuracy of 94.60% with a STD of 0.03. For CO2 level the minimum accuracy was 94.60% and the maximum accuracy 95.46% with a STD of 0.22. For room temperature the minimum accuracy was 94.54% and the maximum accuracy 98.01% with a STD of 1.08.

Figure 7.4: Results with a combination of 7 (validation points: 21) for the three algorithms and All.

The results in Figure 7.4 for illuminance show a minimum accuracy of 94.49% and a maximum accuracy of 94.57% with a STD of 0.03. For CO2 level the minimum accuracy was 94.60% and the maximum accuracy 95.26% with a STD of 0.20. For room temperature the minimum accuracy was 94.56% and the maximum accuracy 97.89% with a STD of 1.07.

7.5 discussion

The results in Figure 7.1 to Figure 7.4 show that the process of selecting validated points does not affect the results as much as the combination of algorithm and point type. It was hypothesized that the results in Figure 7.1 could have a low or random accuracy, due to the fact that the results were generated with only 1 validated point. Having more validated points decreased the standard deviation, e.g. for SC from 1.22 to 1.07 when using the combination of 7 validated points instead of 1. It is the overall tendency that the standard deviation decreases when using more validated points.

DTW achieves a high accuracy for points with the point type of room temperature. For DTW and SC the results for points with the point type of room temperature were higher than for the two other point types. Only for EMD was the accuracy almost equal for all three point types. This is surprising, due to the fact that room temperature changes slowly, and thereby should fit EMD. One reason for this can be found in the raw temperature data, where the temperature is almost steady within an interval of ±2 °C. With an almost steady data stream, the algorithm will only find the slow frequency band of changing between day and night set points.

The three algorithms had success in determining when a point was not similar to another point, but had some difficulties in detecting when two points were similar. This result is not clear from Figure 7.1 to Figure 7.4, but can be seen in the ratio of true positives compared to false negatives.

7.6 summary

Manual annotation of metadata for points from building instrumentation is a time-consuming task. With Metafier, algorithms for mining metadata by data stream comparison have been demonstrated. The system uses several algorithms and is robust enough to handle data streams with only slightly similar patterns. An evaluation of Metafier with points and data from GTH has been performed. Metafier has been evaluated with 903 points, and the overall accuracy with only 3 known examples was 94.71%. Furthermore, using DTW for mining points with the point type of room temperature achieved an accuracy as high as 98.13%. Mining metadata is extremely useful for SDB and for creating the infrastructure for PBA. Metafier has only been evaluated on three point types (room temperature, CO2 level, and illuminance), but is not limited to those three point types. Another point type could be power, which is the most frequent point type in Figure 2.1. In addition, point types for valves could be considered, due to the unit of percentage being the second most frequent in Figure 2.1.


Part IV

P E R S P E C T I V E S A N D I M P L I C AT I O N S

In this part we round off the thesis by first providing a discussion fueled by a number of insights, observations, and anecdotal evidence acquired during our work. The intention of this discussion is to provide additional angles for future research initiatives. With the perspectives of Chapter 8, we have set the backdrop for drawing out matters of importance for the following conclusion, limitations, and future work sections in Chapter 9.

The conclusion section distills the results and insights from Part II and Part III by providing answers to the research questions posed in Section 2.2.


8 PERSPECTIVES

This chapter presents the perspectives for the research of this thesis. There is a focus on validity and limitations of the experiments conducted in Part II and Part III. The validity of the experiments is assessed through the evaluation criteria from Chapter 3.

"Software tools and data processing methods must be validated with data from real buildings. All prototypes must be tested with real data, to have a realistic test and to show the applicability."
Data processing methods for Non-Intrusive Load Monitoring (NILM), see Part II, have been evaluated and applied to data from Energy Guild Vejle Nord (EGVN) [5]. Data from a cold store, see Section 3.3.3, have been used. The data set, which includes electricity data for 40 points, cannot be shared due to business considerations. The results from Part II indicate that NILM and Knowledge Discovery (KD) can provide information with fewer sensors.

Software tools and data processing methods for annotating and structuring metadata for building instrumentation, see Part III, have been evaluated with points from GTH and SDU OU44. The evaluation for Part III could have led to a larger contribution if the data set from building instrumentations had included a more diverse set of buildings, e.g. buildings from the United States (US). Furthermore, buildings with diverse purposes, e.g. a hospital, a shopping mall etc., could increase the impact. Buildings from the US or buildings with diverse purposes would provide data with different boundary values and equipment types.

"Software tools must be evaluated by experts working in the area of the targeted solution."
Metafier has been tested with one facility manager and two students from software engineering with relations to Energy Informatics (EI). The evaluation must be seen as a preliminary study, due to the fact that only three subjects have been included in the evaluation. The evaluation should have included at least one facility manager from another facility. At best, a set of facility managers from other facilities should have been included, so that data could have been compared within each facility.

"Data processing methods for Micro Grid Living Lab must bring more value to the energy data. The work related to Micro Grid Living Lab needs to provide, enrich and transform simple data, to minimize the sensor infrastructure. To make expensive processes more feasible with respect to energy counseling, the infrastructure should be simple."
NILM can bring more value to a data stream based on a trained model, but it requires a large data set to generate the trained models. To optimize the approach using NILM, a simple sensor which can be installed without an electrician could be moved around to collect the training data. This would be a feasible solution. Part II indicates that NILM and KD can provide more value to sensor data with a smaller setup. NILM has been applied to 4 virtual meters and provided estimates of the energy breakdown for 40 sensors.

"Electricity data from Micro Grid Living Lab should be applied to Non-Intrusive Load Monitoring Toolkit (NILMTK), to validate the quality. NILMTK provides the ability to test multiple disaggregation algorithms on the same data set."
Data from a cold store, see Section 3.3.3, has been evaluated with NILMTK. Standard algorithms for NILM, Combinatorial Optimization (CO) and Factorial Hidden Markov Model (FHMM), have been applied. Furthermore, two modifications of the standard algorithms were applied: Season Weighted CO, meaning CO which takes the outside temperature into account, and FHMM with day-specific training, meaning one model for each day of the week.

"Data processing methods for COORDICY should use data from building instrumentation to generate metadata. To minimize the errors which can be in tags from a Building Management System (BMS), it should be based on data streams from points. Furthermore, generated metadata should be evaluated against manually labelled ground truth metadata."
The data processing methods for Metafier have been evaluated with a combination of the approaches used in Calbimonte et al. [30] and Balaji et al. [33]. The evaluation used data from GTH for temperature, CO2 and illuminance, with an accuracy as high as 98%, and ground truth metadata for around 900 points. Perspectives for Metafier include an evaluation on SDU OU44, which has over 7000 points. In addition to a high number of points, SDU OU44 includes points with different characteristics; an example is points measuring pressure and humidity inside the HVAC.

"Data processing methods for automated metadata generation should focus on point types, e.g. temperature and humidity. Point types have been chosen based on Figure 2.1."
Chapter 7 has focused on semi-automated generation of metadata, with respect to point type, for temperature, CO2 and illuminance.

For the overall evaluation, all criteria have been met. One limitation is found with the data from EGVN, which cannot be shared due to business considerations. NILM can provide knowledge of equipment and bring value to data from a single sensor, but it requires a data set for training a model.

Data processing methods for semi-automated metadata generation can support BMSs in having an automated configuration. The data processing methods should be included in the transition from BMSs to Software Defined Buildings (SDBs). For this thesis the focus for SDB has been on commercial buildings, but it should not be limited to commercial buildings; it could also cover hospitals, shopping malls etc. For a shopping mall, a Portable Building Application (PBA) could provide customers with information about where certain product categories are placed. Another example could be a PBA that optimizes lighting conditions inside the shopping mall to reduce energy consumption. A third example could be a PBA which uses information about refrigerated display cases from the Supervisory Control And Data Acquisition (SCADA) system of the shopping mall to determine whether equipment should be replaced.

Data processing methods have not been evaluated for challenges other than metadata regarding building instrumentation.


Page 107: Software Tools and Data Processing Methods to Support ... · SOFTWARE TOOLS AND DATA PROCESSING METHODS TO SUPPORT SOFTWARE DEFINED BUILDINGS by emil holmegaard The Mærsk Mc-Kinney

9 CONCLUSION

This chapter concludes the research described in the previous chapters. Focus is on the original questions from Section 2.2:

How to transform energy and sensor data from buildings to knowledge that supports Software Defined Buildings?

The approach of Fayyad et al. [1] for Knowledge Discovery (KD) and Data Analytics (DA) has been followed for transforming sensor data from buildings into more valuable information. A concept for Software Defined Buildings (SDBs), illustrated in Figure 3.1, has set the context of the research with respect to the overall research question.

Part II has focused on transforming and disaggregating raw energy data into a representation of the equipment connected to the meter. Non-Intrusive Load Monitoring (NILM) has been applied for 4 virtual meters and provided estimates of the energy breakdown for 40 sensors. This research provides preliminary concepts for applications to bring more value to energy data and minimize the sensor infrastructure. The applications could be developed as Portable Building Applications (PBAs).

Part III has focused on annotating and structuring metadata for building instrumentation via the software tool Metafier. The task of annotating and structuring metadata for points provides context for the corresponding data stream. Metadata for points provides a semantic representation of the physical environment to an SDB, and thereby supports the SDB in enabling PBAs. Metafier has been developed with data processing methods for semi-automated metadata generation.

What steps does it take to have automated or semi-automated metadata generation? How much metadata can we retrieve from simple data stream comparison?

Part III has focused on the software tool Metafier for annotating and structuring metadata for points from building instrumentation. KD and DA have been applied on data streams to transform data into knowledge in the form of metadata. For Metafier to generate and annotate metadata, the process takes three steps. First, a set of validated points should be created. Then generators, see Section 5.3.4, can be applied to data streams from points, with or without incorrect metadata. Finally, a certain similarity confidence threshold can be used; the similarity confidence can also be applied manually via the Graphical User Interface (GUI) of Metafier. When an estimate complies with the similarity confidence, metadata will be transferred and annotated for the points.
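The final thresholding step can be sketched as follows. The generator scores and point-type names are hypothetical, and the 75% threshold matches the one used in the Chapter 7 evaluation; this is a simplified illustration, not Metafier's actual code.

```python
SIMILARITY_THRESHOLD = 0.75  # threshold used in the Chapter 7 evaluation

def annotate(similarities, threshold=SIMILARITY_THRESHOLD):
    """Transfer the best-matching point type from the validated set,
    or return None when no candidate reaches the confidence threshold
    (leaving the point for manual annotation via the GUI)."""
    point_type, score = max(similarities.items(), key=lambda kv: kv[1])
    return point_type if score >= threshold else None

# Hypothetical generator output for one unlabelled point:
print(annotate({"temperature": 0.91, "co2": 0.40, "illuminance": 0.22}))
print(annotate({"temperature": 0.60, "co2": 0.55, "illuminance": 0.50}))
```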

For the evaluation of data stream comparison in Chapter 7, a threshold of 75% for similarity confidence has been used, to have an automated evaluation without a human in the loop. Three point types have been recognized by data stream comparison in the evaluation. The data stream comparison has been evaluated with three data processing methods: Slope Compare (SC), Empirical Mode Decomposition (EMD) and Dynamic Time Warping (DTW). A total of 903 points was included in the evaluation. The accuracy for annotating points with point type for the three selected types ranged from 94.38% to 98.13%.
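Of the three methods, DTW is the most involved; a minimal textbook implementation (not the exact code used in the evaluation) illustrates how it tolerates time shifts between otherwise similar streams, which is why it suits data streams that exhibit the same pattern at slightly different times.

```python
def dtw_distance(a, b):
    """Textbook dynamic-programming Dynamic Time Warping distance
    between two numeric data streams; smaller means more similar."""
    n, m = len(a), len(b)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of the three admissible warping steps
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

temp = [20, 21, 23, 22, 21]         # hypothetical temperature stream
shifted = [20, 20, 21, 23, 22]      # same pattern, delayed one sample
print(dtw_distance(temp, temp))     # 0.0 for identical streams
print(dtw_distance(temp, shifted))  # stays small despite the shift
```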

Insights

This approach can be applied to all data streams which exhibit a pattern. The data processing methods cannot be used for data streams holding a constant value for all timestamps. An example would be a boolean value indicating whether people are present in a room; another boolean value could indicate that the light in the same room is turned on. Here it would be a challenge to annotate the correct point type: both points toggle between 0 and 1, but their point types should be "persons present" and "light status" respectively.

Contributions

Three data processing methods for simple data stream comparison, to semi-automate metadata generation. The accuracy for the three data processing methods was 94.38% to 98.13%. The data processing methods determined point type for temperature, CO2 and illuminance.

Which features are required to create a tool for maintenance of metadata?

Chapter 5 describes the life cycle of a point. The life cycle focuses on discovery, validation and maintenance, and Metafier has been designed around those three states of a point. Chapter 6 evaluates Metafier with three subjects. It was found that search and grouping of points increased the efficiency of annotating metadata. The life cycle for a point was understood by the test subjects. The major challenge for the evaluation concerned the mindset of the subjects: the used representation of metadata was difficult to understand, and the purpose of annotating metadata was unclear to most of the test subjects.

Insights

The preliminary study of Metafier with respect to maintenance of points focuses on the background of facility managers. After the evaluation, a couple of open questions emerge: Could facility managers be more efficient with software tools supporting their tasks related to metadata for points? Which applications should be developed for facility managers to have a focus on metadata for points? Should a facility manager have a degree in software engineering to understand why metadata is important? The evaluation of Metafier was conducted with only three subjects; therefore it is difficult to provide a clear conclusion for this research question.


Contributions

Analysis of systems for building instrumentation with respect to discovery, validation and maintenance. Requirements for how to develop a software tool for maintenance of metadata for points.

How to disaggregate one sensor into a semantic representation of the equipment connected to the sensor in an industrial setting?

An analysis of energy data from an industrial setting has been performed inPart II. The analysis illustrated a limited number of equipment which was oper-ated as Variable Frequency Drive (VFD). Non-Intrusive Load Monitoring Toolkit(NILMTK) has been applied to the data from the industrial setting, for this studya cold store. Analysis has been performed, to study how NILM can be used withdifferent levels of sub-metering for providing detailed breakdowns of the powerconsumption in an industrial setting. The results show that changing the level ofsub-metering increased the test accuracy, F1-score, by a third from 0.4 to 0.6. Theseresults apply to Combinatorial Optimization (CO) and Factorial Hidden MarkovModel (FHMM).

Insights

Part II studied the effect of using multiple data sources to increase the test accuracy. Weather and flow of goods were taken into account. The experiment with weather data did not show any differences, mainly due to the high insulation of the cold store. The experiment with flow of goods was limited by aggregated flow data: the pallet data reported movements per day for the whole facility, whereas the used energy data only covered a third of the facility.

Contributions

Analysis of energy data from a cold store. NILM has been applied on a data set with energy data from industrial equipment. Holmegaard and Kjærgaard [80] were among the first to apply NILM in an industrial setting.

How can data processing methods help make energy-expensive processes more feasible?

An analysis of energy data from a cold store, see Section 3.3.3, has been performed in Part II. NILM has been applied to energy data from the cold store: 4 virtual meters have been applied with NILM, providing estimates of the energy breakdown for 40 sensors. The analysis of the energy data emphasizes that compressors in the cold store were operated as ON/OFF devices, although they were hypothesized to be VFDs. The technical staff at the cold store have optimized the performance of the compressors by reducing the ramp time of the compressors. This finding from the technical staff could be transferred to other companies in Energy Guild Vejle Nord (EGVN) [5]. Similar patterns from the equipment may be found by applying NILM or other machine learning techniques to energy data from EGVN.


Insights

The cold store has a metering cost of around 130 € per meter per year, which gives a total of 5200 € for 40 points. Introducing NILM reduces the cost by a factor of 10. The analysis illustrated a reduction of the Mean Normalized Error (MNE) by half when 4 points were introduced instead of one. The hypothesized challenges of using NILM on data from compressors turned out not to apply, due to the fact that the compressors were operated as ON/OFF devices.
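The cost argument can be checked with a few lines of arithmetic, using the figures stated above (4 physical meters feeding NILM virtual metering instead of 40 physical meters):

```python
cost_per_meter_eur = 130   # yearly cost per physical meter (from the text)
n_loads = 40               # monitored loads in the cold store
n_virtual = 4              # physical meters needed with NILM virtual metering

full_metering = cost_per_meter_eur * n_loads    # 5200 EUR/year
nilm_metering = cost_per_meter_eur * n_virtual  # 520 EUR/year
print(full_metering, nilm_metering, full_metering // nilm_metering)
```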


10 FUTURE RESEARCH

This chapter outlines possible future research with respect to this thesis. There are open challenges for the individual projects which this thesis has been part of.

Future research related to Non-Intrusive Load Monitoring (NILM) would include a higher sample frequency for electricity data from the cold store, see Section 3.3.3. One challenge regarding the data, which has a sample frequency of 1 minute, is that transient events are aggregated. With High Frequency (HF) data, transient events could be detected, which could enable a human-in-the-loop approach to learn the environment where NILM is applied. Active learning could then be used for labelling unknown transient events when they occur.

For semi-automated generation of metadata via Metafier, an improvement would be to implement data processing methods for location mining. Gonzalez et al. [105] have used an approach with event correlation between points, where points are clustered based on event correlations. This would provide information about the relationship between points, e.g. valve position, room temperature and temperature of inlet air for the ventilation system. Another improvement would be to handle multiple point types besides temperature, CO2 and illuminance. A challenging example is how to determine whether a data stream is a set point, due to the non-changing value of the data stream.
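A hypothetical sketch of such event correlation: two points whose change events coincide in time get a high overlap score and would be clustered together. The streams, the change threshold and the Jaccard-overlap scoring below are invented for illustration and are not the method of Gonzalez et al.

```python
def change_events(stream, eps=0.5):
    """1 where consecutive samples differ by more than eps, else 0."""
    return [1 if abs(b - a) > eps else 0 for a, b in zip(stream, stream[1:])]

def event_correlation(s1, s2, eps=0.5):
    """Jaccard overlap of the two points' change-event times."""
    e1, e2 = change_events(s1, eps), change_events(s2, eps)
    both = sum(a & b for a, b in zip(e1, e2))
    either = sum(a | b for a, b in zip(e1, e2))
    return both / either if either else 0.0

valve = [0, 0, 100, 100, 0]       # hypothetical valve position (%)
room_temp = [20, 20, 22, 22, 20]  # reacts at the same timestamps
print(event_correlation(valve, room_temp))  # high score: cluster candidates
```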

For Metafier, the foundation of the metadata model should be replaced with Brick [34]. This will support relationships between entities and a stronger query language. Metafier would then be part of the collaboration behind Brick, which includes the leading universities in the field of Energy Informatics (EI). The evaluation from Chapter 6 should be used to improve features of the software tool Metafier. Then an evaluation with multiple facility managers can be performed for diverse buildings, with respect to both purpose and location. Such evaluations can contribute to defining the tasks of facility managers with respect to Software Defined Buildings (SDBs) and Portable Building Applications (PBAs).

PBAs which communicate and integrate with the metadata model created via Metafier should be included in future research. Three applications which could be developed include an application similar to the Comfy [69] app, an application for NILM, and an application for optimizing energy consumption as suggested by Nelleman et al. [106]. These three applications should be developed and evaluated on a set of buildings.

Integration of multiple buildings with diverse locations and diverse purposes would give a representative set of buildings. Such a set should contain multiple buildings from each continent, as well as buildings from different climatic regions. An example of a difference between continents is the use of air conditioning, which is common for buildings in the United States (US) but not as common for buildings in Denmark. Furthermore, the set should contain buildings of different ages; such an evaluation of PBAs will then face realistic challenges.


BIBLIOGRAPHY

[1] Usama Fayyad, Gregory Piatetsky-Shapiro, and Padhraic Smyth. Knowledge discovery and data mining: Towards a unifying framework. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, KDD'96, pages 82–88. AAAI Press, 1996. URL http://dl.acm.org/citation.cfm?id=3001460.3001477.

[2] University of California Berkeley. sMAP instance at OpenBMS. URL http://new.openbms.org. [Online; Accessed 2015-11-17].

[3] Andrew Krioukov, Gabe Fierro, Nikita Kitaev, and David Culler. Building application stack (BAS). In Proceedings of the Fourth ACM Workshop on Embedded Sensing Systems for Energy-Efficiency in Buildings, BuildSys '12, pages 72–79, New York, NY, USA, 2012. ACM. ISBN 978-1-4503-1170-0. URL http://doi.acm.org/10.1145/2422531.2422546.

[4] Stephen Dawson-Haggerty, Andrew Krioukov, Jay Taneja, Sagar Karandikar, Gabe Fierro, Nikita Kitaev, and David Culler. BOSS: Building operating system services. In Presented as part of the 10th USENIX Symposium on Networked Systems Design and Implementation, NSDI '13, pages 443–457, Lombard, IL, 2013. USENIX. URL https://www.usenix.org/conference/nsdi13/technical-sessions/presentation/dawson-haggerty.

[5] Green Tech Center. Energy guild. URL http://greentechcenter.dk/uk/projects/energy-guild.aspx. [Online; Accessed 2014-08-23].

[6] Zico Kolter and Matthew J Johnson. REDD: A public data set for energy disaggregation research. In Workshop on Data Mining Applications in Sustainability (SIGKDD), volume 25, pages 59–62.

[7] Stephen Dawson-Haggerty, Xiaofan Jiang, Gilman Tolle, Jorge Ortiz, and David Culler. sMAP: A simple measurement and actuation profile for physical information. SenSys '10, pages 197–210. ACM. URL http://dl.acm.org/citation.cfm?id=1870003.

[8] Sarvapali D. Ramchurn, Perukrishnen Vytelingum, Alex Rogers, and Nicholas R. Jennings. Putting the 'smarts' into the smart grid: A grand challenge for artificial intelligence. Commun. ACM, 55(4):86–97, April 2012. URL http://doi.acm.org/10.1145/2133806.2133825.

[9] Peter Meibom, Klaus Baggesen Hilger, Henrik Madsen, and Dorthe Vinther. Energy comes together in Denmark: The key to a future fossil-free Danish power system. IEEE Power and Energy Magazine, 11(5):46–55, Sept 2013. URL https://doi.org/10.1109/MPE.2013.2268751.

[10] Danish Government. Summary Energy Plan 2050, From Coal, Oil and Gas to Green Energy. ISBN 9788792727114. URL http://www.ens.dk/sites/ens.dk/files/forbrug-besparelser/energispareraadet/moeder-energispareraadet/moede-energispareraadet-16-marts-2011/Energistrategi2050_sammenfatning.pdf.

[11] Bo Nørregaard Jørgensen, Mikkel Baun Kjærgaard, Sanja Lazarova-Molnar, Hamid Reza Shaker, and Christian T. Veje. Challenge: Advancing energy informatics to enable assessable improvements of energy performance in buildings. In Proceedings of the 2015 ACM Sixth International Conference on Future Energy Systems, e-Energy '15, pages 77–82, New York, NY, USA, 2015. ACM. URL http://doi.acm.org/10.1145/2768510.2770935.

[12] Nordic Energy Regulators NordREG. Nordic market report 2014. Technical report, NordREG, Nordic Energy Regulators. URL http://www.nordicenergyregulators.org/wp-content/uploads/2014/06/Nordic-Market-Report-2014.pdf.

[13] Brian Vad Mathiesen, Henrik Lund, David Connolly, Poul Alberg Østergaard, and Bernd Møller. The design of smart energy systems for 100% renewable energy and transport solutions. In 8th Conference on Sustainable Development of Energy, Water and Environment Systems. URL https://goo.gl/B8ines.

[14] Mike Hazas, A. J. Bernheim Brush, and James Scott. Sustainability does not begin with the individual. interactions, 19(5):14–17, September 2012. URL http://doi.acm.org/10.1145/2334184.2334189.

[15] U.S. Energy Information Administration. Annual energy outlook. URL http://www.eia.gov/forecasts/aeo/er/index.cfm. [Online; Accessed 2014-09-22].

[16] Danish Energy Agency. Energy Statistic for Denmark 2012 - Data, Tables, Statistics and Maps. ISSN 0906-4699. URL https://ens.dk/sites/ens.dk/files/energistyrelsen/Nyheder/energistatistik2012dk.pdf.


[17] U.S. Energy Information Administration. Frequently asked questions. URL https://www.eia.gov/tools/faqs/faq.php?id=86&t=1. [Online; Accessed 2017-03-29].

[18] Randy H Katz, David E Culler, Seth Sanders, Sara Alspaugh, Yanpei Chen, Stephen Dawson-Haggerty, Prabal Dutta, Mike He, Xiaofan Jiang, Laura Keys, et al. An information-centric energy infrastructure: The Berkeley view. Sustainable Computing: Informatics and Systems, 1(1):7–22, 2011. URL http://www.sciencedirect.com/science/article/pii/S2210537910000028.

[19] Gian Paolo Perrucci, Frank H. P. Fitzek, and Jörg Widmer. Survey on energy consumption entities on the smartphone platform. In 2011 IEEE 73rd Vehicular Technology Conference, VTC2011-Spring, pages 1–6, May 2011. doi: 10.1109/VETECS.2011.5956528.

[20] Laurent Schmitt, Jayant Kumar, David Sun, Said Kayal, and S. S. Mani Venkata. Ecocity upon a hill: Microgrids and the future of the European city. IEEE Power and Energy Magazine, 11(4):59–70, 2013. URL http://ieeexplore.ieee.org/abstract/document/6548108/.

[21] U.S. Green Building Council. LEED - Leadership in Energy and Environmental Design. URL http://www.usgbc.org/leed. [Online; Accessed 2015-12-17].

[22] Energy Star - The simple choice for energy efficiency. URL https://www.energystar.gov/. [Online; Accessed 2015-12-17].

[23] Green Building Initiative. Green Building Initiative - Green Globes Certification. URL https://www.thegbi.org/green-globes-certification/. [Online; Accessed 2015-12-17].

[24] John H. Scofield. Efficacy of LEED-certification in reducing energy consumption and greenhouse gas emission for large New York City office buildings. Energy and Buildings, 67:517–524, 2013. ISSN 0378-7788. URL http://www.sciencedirect.com/science/article/pii/S037877881300529X.

[25] University of Southern Denmark. COORDICY. URL http://sdu.dk/coordicy. [Online; Accessed 2015-12-17].

[26] Hassan Farhangi. The path of the smart grid. IEEE Power and Energy Magazine, 8(1):18–28, January 2010. URL http://ieeexplore.ieee.org/abstract/document/5357331/.

[27] Steven Chu and Arun Majumdar. Opportunities and challenges for a sustainable energy future. Nature, 488(7411):294–303. URL http://dx.doi.org/10.1038/nature11475.

[28] Jayavardhana Gubbi, Rajkumar Buyya, Slaven Marusic, and Marimuthu Palaniswami. Internet of things (IoT): A vision, architectural elements, and future directions. Future Generation Computer Systems, 29(7):1645–1660, 2013. URL http://dx.doi.org/10.1016/j.future.2013.01.010.

[29] Arka Bhattacharya, Joern Ploennigs, and David Culler. Short paper: Analyzing metadata schemas for buildings: The good, the bad, and the ugly. In Proceedings of the 2nd ACM International Conference on Embedded Systems for Energy-Efficient Built Environments, BuildSys '15, pages 33–34, New York, NY, USA, 2015. ACM. URL http://doi.acm.org/10.1145/2821650.2821669.

[30] Jean-Paul Calbimonte, Zhixian Yan, Hoyoung Jeung, Oscar Corcho, and Karl Aberer. Deriving semantic sensor metadata from raw measurements. In Proceedings of the 5th International Conference on Semantic Sensor Networks - Volume 904, SSN'12, pages 33–48, Aachen, Germany, 2012. CEUR-WS.org. URL http://dl.acm.org/citation.cfm?id=2887689.2887692.

[31] Emil Holmegaard, Aslak Johansen, and Mikkel Baun Kjærgaard. Metafier - a tool for annotating and structuring building metadata. In Proceedings of the 2017 IEEE Smart World Congress. IEEE.

[32] Arka A. Bhattacharya, Dezhi Hong, David Culler, Jorge Ortiz, Kamin Whitehouse, and Eugene Wu. Automated metadata construction to support portable building applications. In Proceedings of the 2nd ACM International Conference on Embedded Systems for Energy-Efficient Built Environments, BuildSys '15, pages 3–12, New York, NY, USA, 2015. ACM. URL http://doi.acm.org/10.1145/2821650.2821667.

[33] Bharathan Balaji, Chetan Verma, Balakrishnan Narayanaswamy, and Yuvraj Agarwal. Zodiac: Organizing large deployment of sensors to create reusable applications for buildings. In Proceedings of the 2nd ACM International Conference on Embedded Systems for Energy-Efficient Built Environments, BuildSys '15, pages 13–22, New York, NY, USA, 2015. ACM. URL http://doi.acm.org/10.1145/2821650.2821674.

[34] Bharathan Balaji, Arka Bhattacharya, Gabriel Fierro, Jingkun Gao, Joshua Gluck, Dezhi Hong, Aslak Johansen, Jason Koh, Joern Ploennigs, Yuvraj Agarwal, Mario Berges, David Culler, Rajesh Gupta, Mikkel Baun Kjærgaard, Mani Srivastava, and Kamin Whitehouse. Brick: Towards a unified metadata schema for buildings. In Proceedings of the 3rd ACM International Conference on Systems for Energy-Efficient Built Environments, BuildSys '16, pages 41–50, New York, NY, USA, 2016. ACM. URL http://doi.acm.org/10.1145/2993422.2993577.


[35] Ali Adabi, Patrick Mantey, Emil Holmegaard, and Mikkel Baun Kjærgaard. Status and challenges of residential and industrial non-intrusive load monitoring. In Proceedings of the 2015 IEEE Conference on Technologies for Sustainability, SusTech '15, pages 181–188. IEEE. URL http://ieeexplore.ieee.org/document/7314344/.

[36] Carrie Armel, Abhay Gupta, Gireesh Shrimali, and Adrian Albert. Is disaggregation the holy grail of energy efficiency? The case of electricity. Energy Policy, 52:213–234, 2013. URL http://www.sciencedirect.com/science/article/pii/S0301421512007446.

[37] Sense Labs, Inc. URL https://sense.com. [Online; Accessed 2017-06-07].

[38] Sarah Darby. The effectiveness of feedback on energy consumption. A Review for DEFRA of the Literature on Metering, Billing and direct Displays, 486.

[39] Dong Energy. Energy consumption for residential. URL http://www.dongenergy.dk/privat/energitips/tjekditforbrug/gennemsnitsforbrug/Pages/elforbrugihus.aspx. [Online; Accessed 2014-09-22].

[40] George W Hart. Prototype Nonintrusive Appliance Load Monitor. Technical report, MIT Energy Laboratory and Electric Power Research Institute. URL http://georgehart.com/research/Hart1985.pdf.

[41] Jon Froehlich, Eric Larson, Sidhant Gupta, Gabe Cohn, Matthew Reynolds, and Shwetak Patel. Disaggregated end-use energy sensing for the smart grid. IEEE Pervasive Computing, 10(1):28–39.

[42] Hyungsul Kim, Manish Marwah, Martin F Arlitt, Geoff Lyon, and Jiawei Han. Unsupervised disaggregation of low frequency power measurements. In SDM, volume 11, pages 747–758. SIAM. URL https://doi.org/10.1137/1.9781611972818.64.

[43] Oliver Parson, Siddhartha Ghosh, Mark Weal, and Alex Rogers. Non-intrusive load monitoring using prior models of general appliance types. In AAAI. URL https://www.aaai.org/ocs/index.php/AAAI/AAAI12/paper/view/4809.

[44] Zico Kolter and Tommi Jaakkola. Approximate inference in additive factorial HMMs with application to energy disaggregation. In Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, volume 22 of Proceedings of Machine Learning Research, pages 1472–1482, La Palma, Canary Islands, 21–23 Apr 2012. PMLR. URL http://proceedings.mlr.press/v22/zico12.html.

[45] Sidhant Gupta, Matthew S. Reynolds, and Shwetak N. Patel. ElectriSense: Single-point sensing using EMI for electrical event detection and classification in the home. In Proceedings of the 12th ACM International Conference on Ubiquitous Computing, UbiComp '10, pages 139–148, New York, NY, USA, 2010. ACM. URL http://doi.acm.org/10.1145/1864349.1864375.

[46] Ahmed Zoha, Alexander Gluhak, Muhammad Ali Imran, and Sutharshan Rajasegarar. Non-intrusive load monitoring approaches for disaggregated energy sensing: A survey. Sensors, 12(12):16838–16866. URL http://www.mdpi.com/1424-8220/12/12/16838.

[47] Emil Holmegaard, Aslak Johansen, and Mikkel Baun Kjærgaard. Towards a metadata discovery, maintenance and validation process to support applications that improve the energy performance of buildings. In 2016 IEEE International Conference on Pervasive Computing and Communication Workshops, PerCom Workshops '16, pages 1–6. IEEE. URL http://ieeexplore.ieee.org/document/7457145/.

[48] Anthony Rowe, Mario E Berges, Gaurav Bhatia, Ethan Goldman, Ragunathan Rajkumar, James H Garrett Jr, José MF Moura, and Lucio Soibelman. Sensor Andrew: Large-scale campus-wide sensing and actuation. IBM Journal of Research and Development, 55(1.2):6–1. URL http://ieeexplore.ieee.org/document/5697279/.

[49] Colin Dixon, Ratul Mahajan, Sharad Agarwal, AJ Brush, Bongshin Lee, Stefan Saroiu, and Paramvir Bahl. An operating system for the home. In Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation, pages 25–25. USENIX Association. URL https://www.usenix.org/node/163038.

[50] Thomas Weng, Anthony Nwokafor, and Yuvraj Agarwal. BuildingDepot 2.0: An integrated management system for building analysis and control. In Proceedings of the 5th ACM Workshop on Embedded Systems For Energy-Efficient Buildings, BuildSys '13, pages 7:1–7:8, New York, NY, USA, 2013. ACM. URL http://doi.acm.org/10.1145/2528282.2528285.

[51] Apple Inc. Apple HomeKit framework. URL https://developer.apple.com/homekit/. [Online; Accessed 2015-11-17].

[52] Google. Google Weave. URL https://developers.google.com/weave/. [Online; Accessed 2015-11-17].

[53] Philips Lighting Holding B.V. Meet Hue. URL http://www2.meethue.com/en-us/. [Online; Accessed 2017-04-17].


[54] W3C. Resource Description Framework. URL https://www.w3.org/RDF/. [Online; Accessed 2017-02-17].

[55] W3C. SPARQL query language. URL https://www.w3.org/TR/rdf-sparql-query/. [Online; Accessed 2017-02-17].

[56] Nipun Batra, Jack Kelly, Oliver Parson, Haimonti Dutta, William Knottenbelt, Alex Rogers, Amarjeet Singh, and Mani Srivastava. NILMTK: An open source toolkit for non-intrusive load monitoring. In Proceedings of the 5th International Conference on Future Energy Systems, e-Energy '14, pages 265–276, New York, NY, USA, 2014. ACM. URL http://doi.acm.org/10.1145/2602044.2602051.

[57] Coalton Bennett and Darren Highfill. Networking AMI smart meters. In 2008 IEEE Energy 2030 Conference, pages 1–8, Nov 2008. URL http://ieeexplore.ieee.org/document/4781067/.

[58] Pecan Street Inc. URL http://www.pecanstreet.org/. [Online; Accessed 2015-02-04].

[59] eGauge Systems LLC. eGauge hardware. URL https://www.egauge.net/overview/. [Online; Accessed 2017-04-26].

[60] George William Hart. Nonintrusive appliance load monitoring. Volume 80, pages 1870–1891. IEEE. URL http://ieeexplore.ieee.org/document/192069/.

[61] Shwetak N. Patel, Thomas Robertson, Julie A. Kientz, Matthew S. Reynolds, and Gregory D. Abowd. At the flick of a switch: Detecting and classifying unique electrical events on the residential power line, 2007. URL http://dl.acm.org/citation.cfm?id=1771592.1771608.

[62] Christopher Laughman, Kwangduk Lee, Robert Cox, Steven Shaw, Steven Leeb, Les Norford, and Peter Armstrong. Power signature analysis. Power and Energy Magazine, IEEE, 1(2):56–63. URL http://ieeexplore.ieee.org/document/1192027/.

[63] Michael Zeifman and Kurt Roth. Nonintrusive appliance load monitoring: Review and outlook. Consumer Electronics, IEEE Transactions on, 57(1):76–84. URL http://ieeexplore.ieee.org/document/5735484/.

[64] Mario Berges, Ethan Goldman, H. Scott Matthews, and Lucio Soibelman. Enhancing electricity audits in residential buildings with nonintrusive load monitoring. Journal of Industrial Ecology, 14(5):844–858, October 2010. URL http://onlinelibrary.wiley.com/doi/10.1111/j.1530-9290.2010.00280.x/pdf.

[65] Zico Kolter and Tommi Jaakkola. Approximate inference in additive factorial HMMs with application to energy disaggregation. In Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, AISTATS 2012, pages 1472–1482.

[66] Marisa B. Figueiredo, Ana de Almeida, and Bernardete Ribeiro. An Experimental Study on Electrical Signature Identification of Non-Intrusive Load Monitoring (NILM) Systems, pages 31–40. Springer Berlin Heidelberg, Berlin, Heidelberg, 2011. ISBN 978-3-642-20267-4. doi: 10.1007/978-3-642-20267-4_4. URL http://dx.doi.org/10.1007/978-3-642-20267-4_4.

[67] Stuart Geman and Donald Geman. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans. Pattern Anal. Mach. Intell., 6(6):721–741, November 1984. URL http://dx.doi.org/10.1109/TPAMI.1984.4767596.

[68] Zoubin Ghahramani and Michael I Jordan. Factorial hidden Markov models. Machine Learning, 29(2-3):245–273, 1997. URL http://dx.doi.org/10.1023/A:1007425814087.

[69] Building Robotics. Comfy (by Building Robotics). URL https://www.comfyapp.com. [Online; Accessed 2016-02-17].

[70] Andrew Krioukov, Stephen Dawson-Haggerty, Linda Lee, Omar Rehmane, and David Culler. A living laboratory study in personalized automated lighting controls. In Proceedings of the Third ACM Workshop on Embedded Sensing Systems for Energy-Efficiency in Buildings, BuildSys '11, pages 1–6, New York, NY, USA, 2011. ACM. URL http://doi.acm.org/10.1145/2434020.2434022.

[71] Aisha Umair, Anders Clausen, and Bo Nørregaard Jørgensen. An agent-based negotiation approach for balancing multiple coupled control domains. In 2015 IEEE PES Innovative Smart Grid Technologies Latin America, ISGT LATAM 2015, pages 46–51, Oct 2015. URL http://ieeexplore.ieee.org/document/7381128/.

[72] Schneider Electric. Schneider Electric StruxureWare, 2016. URL http://www.schneider-electric.com/site/StruxureWare/. [Online; Accessed 2016-11-30].


[73] KNX Association. About ETS. URL https://www.knx.org/za/software/ets/about/index.php. [Online; Accessed 2017-04-27].

[74] KNX Association. What is KNX? URL https://www.knx.org/knx-en/knx/association/what-is-knx/index.php. [Online; Accessed 2017-04-27].

[75] NETxAutomation Software GmbH. NETxAutomation - Building Management Software. URL https://www.netxautomation.com/netx/en/. [Online; Accessed 2017-06-08].

[76] OPC Foundation. Unified Architecture, 2017. URL https://opcfoundation.org/about/opc-technologies/opc-ua/. [Online; Accessed 2016-04-27].

[77] Nord Pool. Nord Pool. URL https://www.nordpoolspot.com. [Online; Accessed 2014-03-20].

[78] Grid Manager. GridPoint. URL http://www.grid-manager.com/gridpoint-142. [Online; Accessed 2014-09-25].

[79] Grid Manager. GridAgent. URL http://www.grid-manager.com/gridagent-143. [Online; Accessed 2014-09-25].

[80] Emil Holmegaard and Mikkel Baun Kjærgaard. NILM in an industrial setting: A load characterization and algorithm evaluation. In 2016 IEEE International Conference on Smart Computing, SMARTCOMP ’16, pages 1–8. IEEE, 2016. URL http://ieeexplore.ieee.org/document/7501709/.

[81] Hsueh-Hsien Chang, Hong-Tzer Yang, and Ching-Lung Lin. Computer Supported Cooperative Work in Design IV, chapter Load Identification in Neural Networks for a Non-intrusive Monitoring of Industrial Electrical Loads, pages 664–674. Springer-Verlag, Berlin, Heidelberg, 2008. URL http://dx.doi.org/10.1007/978-3-540-92719-8_60.

[82] Nipun Batra, Haimonti Dutta, and Amarjeet Singh. INDiC: Improved non-intrusive load monitoring using load division and calibration. In Proceedings of the 2013 12th International Conference on Machine Learning and Applications - Volume 01, ICMLA ’13, pages 79–84, Washington, DC, USA, 2013. IEEE Computer Society. URL http://dx.doi.org/10.1109/ICMLA.2013.21.

[83] Jack Kelly and William Knottenbelt. Metadata for energy disaggregation. In 2014 IEEE 38th International Computer Software and Applications Conference Workshops, pages 578–583. IEEE, 2014. URL http://ieeexplore.ieee.org/document/6903193/.

[84] David Arthur and Sergei Vassilvitskii. K-means++: The advantages of careful seeding. In Proceedings of theEighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’07, pages 1027–1035, Philadelphia,PA, USA, 2007. Society for Industrial and Applied Mathematics. URL http://dl.acm.org/citation.cfm?id=1283383.1283494.

[85] S. N. Uttama-Nambi, Thanasis G. Papaioannou, Dipanjan Chakraborty, and Karl Aberer. Sustainable energy consumption monitoring in residential settings. In 2nd IEEE INFOCOM Workshop on Communications and Control for Smart Energy Systems (CCSES 2013), number EPFL-CONF-183771, 2013. URL http://ieeexplore.ieee.org/document/6562866.

[86] Alan Marchiori, Douglas Hakkarinen, Qi Han, and Lieko Earle. Circuit-level load monitoring for household energy management. Pervasive Computing, IEEE, 10(1):40–48, Jan 2011. URL http://ieeexplore.ieee.org/document/5582070/.

[87] Danish Meteorological Institute. Weather archive. URL http://www.dmi.dk/vejr/arkiver/ugeoversigt/.[Online; Accessed 2014-12-01].

[88] Nipun Batra, Oliver Parson, Mario Berges, Amarjeet Singh, and Alex Rogers. A comparison of non-intrusive load monitoring methods for commercial and residential buildings. arXiv preprint arXiv:1408.6595, 2014. URL http://arxiv.org/abs/1408.6595.

[89] Emil Holmegaard and Mikkel Baun Kjærgaard. Mining building metadata by data stream comparison. In Proceedings of the 2016 IEEE Conference on Technologies for Sustainability, SusTech ’16, pages 28–33. IEEE, 2016. URL http://www.ieeeexplore.ws/document/7897138/.

[90] William K. Michener. Meta-information concepts for ecological data management. Ecological Informatics, 1(1):3–7, 2006. ISSN 1574-9541. URL http://www.sciencedirect.com/science/article/pii/S157495410500004X.

[91] Mikkel Baun Kjærgaard, Aslak Johansen, Fisayo Caleb Sangogboye, and Emil Holmegaard. OccuRE: an occupancy reasoning platform for occupancy-driven applications. In Proceedings of the 19th International ACM SIGSOFT Symposium on Component-Based Software Engineering, CBSE ’16, pages 39–48. ACM, 2016. URL http://ieeexplore.ieee.org/document/7497429/.


[92] Jim Arlow and Ila Neustadt. UML 2 and the Unified Process: Practical Object-Oriented Analysis and Design. Pearson Education. ISBN 978-0321321275.

[93] Lambda Foundry, Inc. and PyData Development Team. Python data analysis library. URL http://pandas.pydata.org/. [Online; Accessed 2015-06-15].

[94] Armin Ronacher. Flask - web development, one drop at a time. URL http://flask.pocoo.org/. [Online;Accessed 2015-06-15].

[95] Polymer Authors. Polymer project. URL https://www.polymer-project.org/. [Online; Accessed 2016-01-16].

[96] Python Software Foundation. abc – Abstract Base Classes. URL https://docs.python.org/2/library/abc.html. [Online; Accessed 2015-06-15].

[97] Geraint Luff and Francis Galiegue. JSON Schema. URL http://json-schema.org/. [Online; Accessed 2016-01-16].

[98] Gabe Fierro. Giles. URL http://gtfierro.github.io/giles. [Online; Accessed 2015-06-02].

[99] Siemens Switzerland Ltd. Desigo Insight - Operating the management station, V6.0 SP, Users guide, Volume 1. URL https://goo.gl/jVcCIM. [Online; Accessed 2017-05-17].

[100] KNX Association. ETS5 Features. URL https://www.knx.org/knx-en/software/ets/features/index.php.[Online; Accessed 2017-05-17].

[101] Tridium, Inc. NiagaraAX Framework - Getting Started. URL https://goo.gl/j4RHoS. [Online; Accessed 2017-05-17].

[102] Schneider Electric. SmartStruxure Solution Overview. URL http://www.acscompanies.com/download/attachment/11459. [Online; Accessed 2017-05-17].

[103] Eugene Silberberg. A revision of comparative statics methodology in economics, or, how to do comparative statics on the back of an envelope. Journal of Economic Theory, 7(2):159–172, 1974. ISSN 0022-0531. URL http://www.sciencedirect.com/science/article/pii/0022053174901045.

[104] Romain Fontugne, Jorge Ortiz, Nicolas Tremblay, Pierre Borgnat, Patrick Flandrin, Kensuke Fukuda, David Culler, and Hiroshi Esaki. Strip, bind, and search: A method for identifying abnormal energy consumption in buildings. In Proceedings of the 12th International Conference on Information Processing in Sensor Networks, IPSN ’13, pages 129–140, New York, NY, USA, 2013. ACM. URL http://doi.acm.org/10.1145/2461381.2461399.

[105] Luis I. Lopera Gonzalez, Reimar Stier, and Oliver Amft. Data mining-based localisation of spatial low-resolution sensors in commercial buildings. In Proceedings of the 3rd ACM International Conference on Systems for Energy-Efficient Built Environments, BuildSys ’16, pages 187–196, New York, NY, USA, 2016. ACM. URL http://doi.acm.org/10.1145/2993422.2993428.

[106] Peter Nelleman, Mikkel Baun Kjærgaard, Emil Holmegaard, Krzysztof Arendt, Aslak Johansen, Fisayo Caleb Sangogboye, and Bo Nørregaard Jørgensen. Demand response with model predictive comfort compliance in an office building. In Proceedings of the 2017 IEEE International Conference on Smart Grid Communications, SmartGridComm ’17. IEEE, 2017.


G L O S S A R Y

app store A digital distribution platform for applications (apps). 4

building instrumentation Building instrumentation is the physical sensor infrastructure in a building, which different forms of Building Management System (BMS) and Building Automation System (BAS) use. A building instrumentation contains multiple points. 4–6, 8, 9, 11–15, 18, 30, 32, 55, 57–59, 62, 65, 67, 70, 71, 74, 77, 83, 87–89, 91

data The raw data for Knowledge Discovery (KD). 6, 13, 16, 27, 29, 35, 36, 77, 89

data mining The process of finding patterns in transformed data. Applicable methods include summarization, classification, and regression. 6, 21, 35, 57, 58, 68, 77, 83

data processing method The term is used for tasks related to the development of algorithms and Data Analytics (DA). 5–9, 11, 15, 27, 31, 32, 35, 55, 77, 87–90, 93

data stream Data streams are continuous data from points. The term data stream is also used for historical data from points. 6, 8, 12–14, 16–18, 21, 32, 57, 58, 61–63, 66–68, 73, 77–79, 83, 87–90, 93, 101

discovery The process of finding new points in a building instrumentation and extracting metadata for those points. 8, 13, 18–20, 58, 59, 63, 68, 69, 90, 91

energy guild vejle nord A triple-helix collaboration between Green Tech Center (GTC), University of Southern Denmark (SDU) - Center for Energy Informatics (CFEI), and companies in the area of Vejle, Denmark. The objectives of the guild are to share thoughts, knowledge and data about energy. xvi, 7, 35, 87, 91


energy informatics Energy Informatics (EI) covers the interdisciplinary tasks of Information and Communications Technology (ICT), energy engineering and software engineering to address energy challenges [11]. xvi, 3, 11, 87, 93, 102

freezing In relation to the analysis of Non-Intrusive Load Monitoring (NILM) in an industrial setting in Chapter 4, freezing denotes the process of cooling goods down to a certain temperature before they can be stored in the cold store. 31, 39, 43, 45, 47, 51

gridagent GridAgent [79] is a data collector and data forwarder. The GridAgent supports multiple communication protocols: the wireless technology ZigBee for communication with GridPoints, and GSM or Internet via Ethernet to send the measurements to a centralized database. 31

gridpoint GridPoint [78] is a smart meter device, which measures the cumulative active energy (watt-hours) with at least one sample per 60 seconds. The GridPoints use the wireless technology ZigBee to send the measurements to a data collector, a GridAgent. 29, 31, 41
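Cumulative active-energy readings of the kind described above can be converted into average power between samples. The sketch below is illustrative only: the reading values and the 60-second interval are assumptions for the example, not actual GridPoint data.

```python
# Cumulative active energy readings (watt-hours), one sample per 60 seconds,
# as reported by a metering device such as the GridPoint described above.
# The concrete values are made up for illustration.
readings_wh = [1000.0, 1002.5, 1005.0, 1010.0]
interval_s = 60  # seconds between consecutive samples (assumed)

def average_power_watts(cumulative_wh, interval_s):
    """Average power (W) between consecutive cumulative energy samples:
    energy difference in Wh times 3600 s/h, divided by the interval in s."""
    return [(b - a) * 3600.0 / interval_s
            for a, b in zip(cumulative_wh, cumulative_wh[1:])]

powers = average_power_watts(readings_wh, interval_s)
print(powers)  # [150.0, 150.0, 300.0]
```

For example, a 2.5 Wh increase over one minute corresponds to an average draw of 150 W.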

heating In relation to the analysis of Non-Intrusive Load Monitoring (NILM) in an industrial setting in Chapter 4, heating is the process of preventing permafrost under the building facility of the cold store. 31, 43, 45, 51

interpretation / evaluation The process of understanding patterns from data mining methods. This process can e. g. involve visualization of patterns or a possible return to some of the previous steps. 6, 21

invalidation The process of rejecting that the data stream of a point seems to match its annotated metadata. Opposite of validation. 58, 59, 63, 66, 74, 77, 80


knowledge The result of the whole Knowledge Discovery (KD) process. The knowledge can take multiple forms and dimensions. 15–17, 29, 57, 77, 88, 89

maintenance The process of annotating metadata for a point in the building infrastructure, to reflect the actual situation of the point. 8, 13, 14, 17–20, 57–59, 62, 68, 69, 74, 75, 90, 91

metadata Metadata for points from a building instrumentation might provide relevant information about: the location of the point, the type of point, the encoding of data from the point, and the unit for the associated data stream of the point. If the majority of points from a building instrumentation have metadata, the building instrumentation can provide a semantic representation. Metadata thus provides a semantic representation for understanding the context of a point. 4–6, 8, 9, 11–22, 27, 32, 55, 57–63, 65–71, 73–78, 83, 87–91, 93
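The kinds of key-value information listed above can be illustrated as a small Python record. All names and values below are hypothetical, chosen only to mirror the categories named in the entry (location, type, encoding, unit); `has_semantic_representation` is an invented helper, not part of any system in the thesis.

```python
# Hypothetical metadata record for one point; the keys mirror the
# categories named in the glossary entry: location, type, encoding, unit.
point_metadata = {
    "location": {"building": "B1", "floor": 2, "room": "R201"},
    "type": "temperature sensor",
    "encoding": "float32",
    "unit": "degree Celsius",
}

def has_semantic_representation(metadata):
    """A point supports semantic lookup if all core parts are annotated."""
    required = {"location", "type", "encoding", "unit"}
    return required.issubset(metadata)

print(has_semantic_representation(point_metadata))  # True
```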

metadata profile A metadata profile is a schema which supports and enables the process of validation for a point. For a given part, the metadata profile defines how the key-value set describing metadata for that part must be structured. A part could e. g. be location, where a key-value set would define a relationship between building, floor and room. 20, 58–60, 62–65, 68–71, 74
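The role of a metadata profile can be sketched as a small schema check over the location part described above. The `location_profile` table and `validate_part` helper are invented for this illustration (the thesis builds on JSON Schema [97] for the real profiles); they only show the idea of constraining how a key-value set must be structured.

```python
# A minimal stand-in for a metadata profile: for the "location" part,
# metadata must supply a building (str), floor (int) and room (str).
location_profile = {"building": str, "floor": int, "room": str}

def validate_part(part_metadata, profile):
    """Return True if every profiled key is present with the expected type."""
    return all(
        key in part_metadata and isinstance(part_metadata[key], expected)
        for key, expected in profile.items()
    )

print(validate_part({"building": "B1", "floor": 3, "room": "R419"}, location_profile))  # True
print(validate_part({"building": "B1", "floor": "three"}, location_profile))            # False
```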

metafier Metafier is the name of the platform and environment where data processing methods and software tools with respect to metadata for buildings have been developed and tested. xv, 8, 9, 27, 55, 57, 58, 61–64, 66–78, 80, 81, 83, 87–90, 93

pattern The result of the data mining process for Knowledge Discovery (KD); this can be classification rules, trees, or clusters. 35, 58, 77, 79, 83


point A point represents a connection between the cyber and physical worlds which may be discretized into a data stream. Such a data stream contains either sensor readings or actuation requests, depending on the direction of the connection. xiv, xv, 8, 11–21, 27, 29–32, 36, 37, 41–43, 50, 52, 54, 57–63, 65, 66, 69–83, 87–93

preprocessing The process of cleaning, resampling and structuring data. 6, 21, 35, 41, 61, 77

selection The process of selecting a subset of data; this can be automated, semi-automated or manual. 6, 21, 36

software defined building Software Defined Building (SDB) is the concept of representing cyber and physical elements of a building to provide integration and/or interaction with cyber and physical elements within the building. v, xvii, 3, 4, 11, 17, 27, 35, 57, 69, 77, 88, 89, 93, 104

software tool The term is used when software applications interface with other systems or humans. 5, 6, 8, 9, 11, 17, 19, 20, 27, 31, 32, 55, 57, 69, 72, 74, 76, 87, 89–91, 93

storage In relation to the analysis of Non-Intrusive Load Monitoring (NILM) in an industrial setting in Chapter 4, storage is the process of holding the temperature at a certain point inside the cold store. 31, 43, 45, 46, 51

tag The term tag is used for a small text describing a point in a BAS or BMS. A tag could be of the form "SODA3R419_RVAV", where the first three letters give the building site, which is Soda Hall at UC Berkeley. "A3" indicates that the sensor is part of air handling unit 3. "R419" gives the room location, which is room 419. "RVAV" gives the point type, which is a reheat discharge air pressure sensor for variable air volume. 13–15, 20, 21, 32, 88
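The decomposition described above can be expressed as a small parser. The pattern below hard-codes the segment layout of the example tag "SODA3R419_RVAV" (site letters, "A" plus a digit for the air handling unit, "R" plus digits for the room, and a point-type code after the underscore); real tag conventions vary per site, so this pattern is an assumption for that one naming scheme only.

```python
import re

# Pattern for tags of the form "SODA3R419_RVAV" as decomposed above:
# site (letters), air handling unit ("A" + digits), room ("R" + digits),
# and a point-type code after the underscore.
TAG_PATTERN = re.compile(
    r"^(?P<site>[A-Z]+?)(?P<unit>A\d+)(?P<room>R\d+)_(?P<type>\w+)$"
)

def parse_tag(tag):
    """Split a tag into its site, unit, room and type segments, or None."""
    match = TAG_PATTERN.match(tag)
    return match.groupdict() if match else None

print(parse_tag("SODA3R419_RVAV"))
# {'site': 'SOD', 'unit': 'A3', 'room': 'R419', 'type': 'RVAV'}
```

Note how the parse mirrors the glossary text: "SOD" for the Soda Hall site, "A3" for air handling unit 3, "R419" for the room, and "RVAV" for the point type.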

target data A subset of data for Knowledge Discovery (KD). 36, 77


transformation The process of reducing the complexity within the data by applying features. This can be done by reducing or projecting the data. 6, 13, 21, 23, 35, 78

validation The process of verifying that the data stream of a point seems to match its annotated metadata. 8, 13, 15, 18–21, 31, 57–59, 61–63, 66, 68, 69, 73–75, 77, 79–82, 87, 89–91
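One simple way to sketch the validation idea defined above: given a point annotated with a certain type, check whether most of its recent data stream falls within a plausible value range for that type. The range table, threshold and sample values below are invented for illustration and are not taken from Metafier.

```python
# Illustrative plausible value ranges per annotated point type (assumed).
PLAUSIBLE_RANGES = {
    "indoor temperature": (10.0, 35.0),   # degrees Celsius
    "co2": (300.0, 5000.0),               # ppm
}

def validate_stream(point_type, stream, tolerance=0.95):
    """Accept the annotation if at least `tolerance` of the samples fall in
    the plausible range for the annotated type; otherwise invalidate."""
    low, high = PLAUSIBLE_RANGES[point_type]
    in_range = sum(1 for value in stream if low <= value <= high)
    return in_range / len(stream) >= tolerance

print(validate_stream("indoor temperature", [21.0, 21.5, 22.0, 23.1]))  # True
print(validate_stream("indoor temperature", [450.0, 480.0, 500.0]))     # False
```

The second call invalidates the annotation: values in the hundreds are implausible for an indoor temperature sensor and look more like a CO2 stream.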

yyyy-mm-dd The date format used follows YYYY-MM-DD, meaning four digits for the year, two digits for the month and two digits for the day of the month. 31


colophon

This document was typeset using the typographical look-and-feel classicthesis developed by André Miede. The style was inspired by Robert Bringhurst’s seminal book on typography “The Elements of Typographic Style”. classicthesis is available for both LaTeX and LyX:

http://code.google.com/p/classicthesis/