
  • UPTEC E 19003

    Examensarbete 30 hp (Degree project, 30 credits), April 2019

    Application of LabVIEW and myRIO to voice controlled home automation

    Tim Lindstål, Daniel Marklund

    Masterprogram i förnybar elgenerering (Master Programme in Renewable Electricity Production)

  • Teknisk-naturvetenskaplig fakultet, UTH-enheten (Faculty of Science and Technology). Visiting address: Ångströmlaboratoriet, Lägerhyddsvägen 1, Hus 4, Plan 0. Postal address: Box 536, 751 21 Uppsala. Phone: 018 – 471 30 03. Fax: 018 – 471 30 00. Website: http://www.teknat.uu.se/student

    Abstract

    Application of LabVIEW and myRIO to voice controlled home automation

    Tim Lindstål & Daniel Marklund

    The aim of this project is to use NI myRIO and LabVIEW for voice controlled home automation. The NI myRIO is an embedded device which has a Xilinx FPGA and a dual-core ARM Cortex-A9 processor as well as analog and digital input/output, and is programmed with LabVIEW, a graphical programming language. The voice control is implemented in two different systems. The first system is based on an Amazon Echo Dot for voice recognition, a commercial smart speaker developed by Amazon Lab126. The Echo Dot devices are connected via the Internet to the voice-controlled intelligent personal assistant service known as Alexa (developed by Amazon), which is capable of voice interaction, music playback, and controlling smart devices for home automation. In this system the present thesis project focuses more on the myRIO used for the wireless control of smart home devices, where smart lamps, sensors, speakers and an LCD display were implemented.

    The other system focuses more on the myRIO for speech recognition and was built on the myRIO with a microphone connected. The speech recognition was implemented using mel frequency cepstral coefficients and dynamic time warping. A few commands could be recognized, including a wake word "Bosse" as well as four other commands for controlling the colors of a smart lamp.

    The thesis project is shown to be successful, having demonstrated that the implementation of home automation using the NI myRIO with two voice-controlled systems can correctly control home devices such as smart lamps, sensors, speakers and an LCD display.

    UPTEC E 19003
    Examiner: Tomas Nyberg
    Subject reader: Ping Wu
    Supervisor: Payman Tehrani

  • Populärvetenskaplig sammanfattning (Popular Science Summary)

    Speech recognition and voice control have become very popular in recent years, with several large companies investing enormous resources in developing systems for this purpose. Amazon, Google and Apple all have their own cloud-based speech recognition systems, which improve continuously as more users' voices can be recorded and then used as references in a library. For these speech recognition systems there is also a range of third-party applications intended for specific purposes. One example is the Philips Hue system, which together with Amazon Alexa or Google Home can be used to voice control smart lamps. Using Amazon's home automation system (Amazon Alexa) or Google's (Google Home) has certain limitations, however: they require a bridge that converts code into a protocol the smart lamps understand, and it is difficult to build something fully customized. To build something customized, some form of microcontroller that can handle and interpret code is needed.

    The NI myRIO is a relatively powerful microcontroller tailored for use in student projects. It is built around two processors, an FPGA and a real-time processor. The programming language used to write code for this microcontroller is LabVIEW, a graphical programming language based on dataflow programming.

    The goal of this project has been to create two voice controlled home automation systems using the NI myRIO microcontroller and the LabVIEW programming language. The first system uses Amazon Alexa for speech recognition, and its focus has been on using different communication protocols to control smart lamps, sensors, an LCD screen and a pair of computer speakers, and to make them interact with each other on given commands.

    For the second system the focus has been more on the theory behind speech recognition, where the goal has been to build a system that can interpret a few voice commands and also control a smart lamp with them.

    Both systems meet the project specifications. The first system, with Amazon Alexa's speech recognition, has been implemented with 28 different voice commands, where examples of implemented functions are a light show and a control system for the light level in a room. The second system can recognize five different commands, one of which is a 'wake word' that must be said before the other commands. The remaining commands are the phrases 'Red light', 'Blue light', 'Green light' and 'Yellow light'. This system uses mel frequency cepstral coefficients as the identity of a phrase and dynamic time warping to compare it with pre-recorded phrases in a library.

  • Acknowledgements

    The authors are very grateful for all the help and support provided by the supervisors Ping Wu and Payman Tehrani during the project, as well as the support from family and friends. Gratitude is also directed toward colleagues in the Department of Signals and Systems at Uppsala University for various support. A special gratitude is directed to Daniel's girlfriend, Mimi Riblom, for all her patience and support when the apartment has been used as a base for the project.


  • Contents

    Abstract
    Acknowledgements
    Contents
    List of Figures
    List of Tables
    Abbreviations

    1 Introduction
    1.1 Background
    1.2 Purpose and project specifications
    1.3 Tasks and scope
    1.4 Method
    1.5 Outline

    2 LabVIEW & NI myRIO
    2.1 LabVIEW
    2.2 NI myRIO

    3 Theory
    3.1 Speech recognition
    3.1.1 History
    3.1.2 Type of speech
    3.1.2.1 Isolated word
    3.1.2.2 Connected word
    3.1.2.3 Continuous speech
    3.1.2.4 Spontaneous speech
    3.1.3 Speech recognition techniques
    3.1.3.1 Mel Frequency Cepstral Coefficients (MFCC)
    3.1.3.2 Hidden Markov Model (HMM)
    3.1.3.3 Deep Neural Network (DNN)
    3.1.3.4 Dynamic Time Warping (DTW)
    3.2 Communication protocols
    3.2.1 Open System Interconnection Model (OSI)
    3.2.2 IEEE 802.11 & WIFI
    3.2.2.1 IEEE 802.11
    3.2.2.2 WIFI
    3.2.3 ZigBee & IEEE 802.15.4
    3.2.3.1 IEEE 802.15.4
    3.2.3.2 ZigBee
    3.2.3.3 Zigbee cluster library
    3.2.4 Inter-Integrated Circuit (I2C)
    3.2.5 Universal Asynchronous Receiver/Transmitter (UART)

    4 Implementation
    4.1 Voice controlled system using Alexa
    4.1.1 Amazon Alexa
    4.1.2 IFTTT
    4.1.3 Webserver
    4.1.3.1 Network variable
    4.1.3.2 Port Forwarding
    4.1.3.3 Debug & Public server
    4.1.4 Components
    4.1.4.1 LCD Display
    4.1.4.2 Temperature sensor
    4.1.4.3 Light sensor
    4.1.4.4 Radio module
    4.1.4.5 Philips hue bulbs
    4.1.4.6 Computer speakers
    4.1.5 System setup
    4.1.5.1 Parallel processes
    4.1.5.2 Queue system
    4.1.6 Command & functions
    4.1.6.1 On/Off & Dim Lights
    4.1.6.2 Temperature display on/off
    4.1.6.3 Light sensor display on/off
    4.1.6.4 Speaker feedback
    4.1.6.5 Light Control
    4.1.6.6 Dim level control
    4.1.6.7 Colour
    4.1.6.8 Light show
    4.2 Customized voice controlled system directly in LabVIEW
    4.2.1 Components and setup
    4.2.2 FPGA configuration
    4.2.3 Wake-word and LED lights
    4.2.4 Audio configuration and input
    4.2.5 Decoding the audio signal
    4.2.5.1 Frames
    4.2.5.2 Threshold
    4.2.5.3 Start- and end time
    4.2.5.4 Feature extraction
    4.2.5.5 Dictionary

    5 Results & Discussions

    6 Conclusions & Future work

    Appendix A
    Appendix B

    Bibliography

  • List of Figures

    2.1 Dataflow programming example [4]
    2.2 Different data types and their respective wire appearance [4]
    2.3 The front panel and block diagram of a LabVIEW VI
    2.4 Mathscript node inside a While loop (grey square)
    2.5 The front of NI myRIO
    2.6 An overview of the two processor chips on NI myRIO [11]
    2.7 Primary/Secondary Signals on MXP Connectors A and B at NI myRIO [11]
    2.8 An overview of NI myRIO [11]
    3.1 A block diagram describing the MFCC process
    3.2 Principle of Mel scale filter bank [20]
    3.3 A graphical Hidden Markov model, where the circles indicate the states and the arrows indicate probabilistic dependencies between states
    3.4 A graphical overview of the Hidden Markov model parameters for one coin, where the possible outcome can be either heads or tails
    3.5 A graphical overview of the Hidden Markov model parameters for three coins, where the probabilities that a certain coin will be used are shown
    3.6 A hidden Markov Model showing the three phonetic letters of the word nine
    3.7 A hidden Markov Model showing the three different states, beginning, middle & end, for the phoneme ay
    3.8 A hidden Markov Model showing the probability for a word as well as a specific phoneme and a specific part of a phoneme to occur
    3.9 A spectrum of the first 20 ms frame of the word "Hello", where it is possible to see more low-frequency energy than high-frequency energy, typical for male voices [9]
    3.10 A full spectrogram for the word "Hello" with all 20 ms frames added up together [9]
    3.11 A simplified model of a recurrent neural network
    3.12 A Deep neural network with input, middle & output layers
    3.13 Euclidean & DTW matching of two sequences [7]
    3.14 An empty 10 × 10 cost matrix, D
    3.15 A 10 × 10 cost matrix, D, where the first two columns of values are calculated
    3.16 A 10 × 10 cost matrix, D, where all columns of values are calculated as well as the warp path
    3.17 The ISO model with its layers
    3.18 Example of a WIFI network [24]
    3.19 The IEEE 802.15.4 standard uses two layers as well as the LLC and SSCS layers for communication, with all layers above defined by additional standards
    3.20 OSI model and the ZigBee model
    3.21 Different types of Zigbee network topology
    3.22 Example of an I2C bus with one master device and three slave devices
    3.23 Example of START and STOP conditions in an I2C circuit [23]
    3.24 An example of a single-byte I2C data transfer [23]
    3.25 Example of an I2C write register [23]
    3.26 Example of an I2C read register [23]
    3.27 UART connections between two devices
    3.28 Example of a one-byte UART communication [21]
    4.1 A model of Alexa voice service [14]
    4.2 Block diagram code of the initialization of the LCD Display
    4.3 Block diagram code of the first sequence window in the Write bytes VI
    4.4 Block diagram code of the Print text VI
    4.5 I2C Write/Read express VI
    4.6 Illustration of XCT-U software, where 'Discover Radio Modules' is marked
    4.7 X-CTU discover radio devices menu. The data should be left default
    4.8 X-CTU searching for radio modules
    4.9 Discovered radio modules in X-CTU
    4.10 Conversion of a .wav file to an array in LabVIEW
    4.11 An overview of the voice controlled system with Alexa
    4.12 Block diagram of process 1
    4.13 Block diagram of process 2
    4.14 Block diagram of the Light control part in process 4
    4.15 Block diagram of the Dim level control VI
    4.16 Color and saturation scale in decimal values (0-255)
    4.17 Process 3 block diagram with the code for the command Lightshow
    4.18 FPGA configuration VI with controls and input for LED diodes and audio
    4.19 LabVIEW code for audio configuration
    4.20 LabVIEW code for retrieving elements from the Audio IN FIFO queue
    4.21 LabVIEW code for setting up frame length
    4.22 LabVIEW code detecting the start and end time of the utterance
    4.23 Calculation of the Mel filter
    4.24 The Mel filter bank used in the project
    4.25 Calculation of Mel frequency cepstral coefficients
    4.26 Calculation of delta coefficients
    4.27 Calculation of delta-delta coefficients
    4.28 Block diagram of the code where the set of words is tested against the uttered word in Match input, creating the Distance matrix and Word index array
    4.29 Calculation of the cost matrix used in DTW
    4.30 Block diagram of the code for the requirements of the best match
    5.1 An overview of the system. 1. The radio module, which transmits Zigbee signals to the smart lamps. 2. LED diodes, which indicate when a word can be said to the customized system. 3. The LCD screen, which can display temperature and Lux values on command with the system based on Amazon Alexa. 4. A light sensor used for light level display on the LCD screen as well as in the light control system. 5. A temperature sensor used for temperature display on the LCD screen and as voice response of temperature from a speaker
    5.2 A chart of the PID regulator in the light control system, which changes the light intensity between different Lux values in the range 200-800 Lux. The Y-axis represents the Lux values and the X-axis the time, where the scale is 1/10 s. Each set point is also marked
    5.3 A chart of the PID regulator in the light control system, which changes the light intensity between different Lux values in the range 500-1500 Lux. The Y-axis represents the Lux values and the X-axis the time, where the scale is 1/10 s. Each set point is also marked
    5.4 Unfiltered & filtered audio input signals and utterance sequence of the command "Blue Light"
    5.5 Matching results and data from a run when the command "Blue Light" was said
    A.1 A chart flow of process 1
    A.2 A chart flow of process 2
    A.3 A chart flow of process 3
    A.4 A chart flow of process 4
    A.5 A chart flow of the whole system

  • List of Tables

    3.1 Dictionary with numbers and the corresponding phonetic numbers
    3.2 Zigbee public profile IDs and profile names
    3.3 Payload data packet in hex values and their corresponding function and scale
    4.1 Created applets and their function
    4.2 LCD initialize commands and their function
    4.3 Processes and which commands each of them handles
    4.4 Queues, where they are enqueued and dequeued, and which components they affect
    4.5 Color and saturation values for the different colors used
    5.1 Success rate for the customized system when Daniel is speaking
    5.2 Success rate for the customized system when Tim is speaking
    B.1 The phonetic alphabet

  • Abbreviations

    API Application Programming Interface

    ASR Automatic Speech Recognition

    CCA Clear Channel Assessment

    DLL Dynamic Link Library

    DNN Deep Neural Network

    DCT Discrete Cosine Transform

    DTW Dynamic Time Warping

    ED Energy Detection

    FFT Fast Fourier Transform

    FIR Finite Impulse Response

    FPGA Field Programmable Gate Array

    HMM Hidden Markov Model

    IEEE The Institute of Electrical and Electronics Engineers

    IoT Internet of Things

    LabVIEW Laboratory Virtual Instrument Engineering Workbench

    LAN Local Area Network

    LLC Logical Link Control

    LQI Link Quality Indication

    LR-WPAN Low Rate Wireless Personal Area Network

    LSB Least Significant Bit

    MAC Medium Access Control

    MAN Metropolitan Area Network

    MATLAB Matrix Laboratory

    MFCC Mel Frequency Cepstral Coefficients

    MLME MAC Sublayer Management Entity


    MLME-SAP MAC Sublayer Management Entity Service Access Point

    MPDU MAC Protocol Data Unit

    MSB Most Significant Bit

    NI National Instruments

    OSI Open System Interconnection

    PHY Physical Layer

    PPDU Physical Protocol Data Units

    RT Real Time

    RTOS Real Time Operating System

    SAP Service Access Point

    SPI Serial Peripheral Interface

    STFT Short Time Fourier Transform

    WLAN Wireless Local Area Network

    WPAN Wireless Personal Area Network

    ZCL Zigbee Cluster Library

    ZLL Zigbee Light Link


  • Chapter 1

    Introduction

    This section presents the background for this thesis, the specifications and objectives of the project, the tasks required to fulfill the specifications, how the work procedure of the thesis is planned, and an outline of the report.

    1.1 Background

    Speech is a basic form of communication between people and originates in its most primitive form at least 100 000 years back [16]. No other communication method is faster and more natural for humans. This is one of the reasons modern speech recognition systems have been increasingly developed in the last decades. These systems have enabled humans to communicate with machines and computers by voice commands and can make daily life easier for people. Today there are many different options for speech recognition systems available on the market, and huge companies like Amazon and Google have their own systems which are compatible with a number of interesting smart home products via WIFI connection. The use of these systems, however, requires extra tools called bridges to communicate wirelessly with home automation products, as well as an app, and they cannot be fully customized.

    The Internet is something that connects people all around the globe, but it is not limited to just personal computers and mobile devices; it can also be used to control home automation devices such as lights and fans. The technology of controlling home appliances over the Internet is called the Internet of Things (IoT) and has seen a great rise in popularity during recent years. With the use of the Internet, home appliances can communicate with each other, making it possible to integrate them in an automation system. Another example of an IoT function is a smart mirror, which is an ordinary mirror with an LED screen, making it possible for the mirror to both show the reflection and communicate with the Internet to display, for example, the local time and weather.

    1.2 Purpose and project specifications

    The purpose of this project is to use NI myRIO, which is an embedded device developed by National Instruments Inc., and the graphical programming language LabVIEW to create two


    wireless home automation systems, where both systems should be able to be used as stand-alone applications.

    The first system should use Amazon Alexa for voice recognition, and all commands sent to smart lamps, speakers, sensors and an LCD screen should be controlled by the NI myRIO. The system should implement the use of several communication protocols like WIFI, inter-integrated circuit (I2C) and Zigbee. The system should fulfill the following criteria.

    • The system should take in commands recognized by Amazon Alexa into the NI myRIO as strings, which can be handled further by the system.

    • The system should handle several communication protocols.

    • Implementation of at least three external devices like smart lamps, sensors and LCD screens.

    The second system is more focused on speech recognition and will use Mel frequency cepstral coefficients (MFCC) and Dynamic time warping (DTW) to recognize a few commands controlling colors on a smart lamp. The following criteria should be fulfilled for this system.

    • The system should recognize at least four different commands.

    • A ’wake word’ which needs to be said before an executing command should be implemented.

    1.3 Tasks and scope

    The work procedure of this thesis can be divided into four parts: literature study, hardware work, software work and finally an evaluation. Security issues related to the communication protocols are not handled in this thesis, nor is the theory behind the cloud-based communication associated with Amazon Alexa. A couple of other common speech recognition techniques are covered in theory, but not implemented in the system.

    The study of literature aims at presenting relevant theory from existing projects related to speech recognition and voice controlled systems, as well as the theory behind different communication protocols. The study's tasks include:

    • Studying theory related to NI myRIO and LabVIEW.

    • Studying WIFI-, I2C-, and Zigbee protocols.

    • Studying theory of speech recognition techniques, and in particular the theory behind mel frequency cepstral coefficients and dynamic time warping.

    • Investigating related projects and extracting ideas from these.

    To create the systems, the NI myRIO has to be wired to the applications used in the project, like sensors, LCD screens, a radio module and LED diodes. The project's hardware part contains the following tasks:


    • Connect a temperature sensor to the SDA- and SCL- input on the myRIO.

    • Connect a light sensor to the SDA- and SCL- input on the myRIO.

    • Connect an LCD screen to the SDA- and SCL- input on the myRIO.

    • Connect a pair of computer speakers to the audio output on the myRIO.

    • Connect a radio module via UART to the transmitter- and receiver ports on the myRIO.

    • Connect 9 LED diodes to digital outputs on the myRIO.

    • Connect a microphone to audio in on the myRIO.

    The biggest workload lies in the implementation of the software built in LabVIEW. After connecting the hardware components, the software is responsible for all functionality. The following tasks are included in this part:

    • Implement communication between hardware components and the NI myRIO.

    • Set up a web server on the NI myRIO and use this URL address to communicate with Amazon Alexa.

    • Implement a voice controlled system which uses two-way communication between the hardware components.

    • Implement a speech recognition system based on MFCC and DTW.

    The last part is the evaluation, which measures the performance of the systems and where the testing results are evaluated.

    1.4 Method

    The literature study was the first part of this thesis project. By investigating the concepts and theory behind speech recognition and voice controlled systems, including communication protocols, an overview of the intended systems could be obtained. Further, a closer look at the theory of the NI myRIO was taken, along with tutorials and a short web-based course in LabVIEW. By studying related projects, many common problems could be avoided, or solved later in the process.

    Following the literature study, the focus was on implementing the chosen hardware components in the first system with Amazon Alexa. This work was done closely together with the programming of the software, to verify that all connections were established correctly. The Amazon Alexa was prepared at the same time and its software installed as an app. When all hardware components were connected correctly, the work began with programming the LabVIEW software for the first system. A web server was established on the myRIO and the commands to use were defined in IFTTT. The program was then expanded continuously and implemented with more functions.

    When the first system was nearly finished, the work with the second system, based on MFCC and DTW, began. The work procedure was similar to that of the first system, but the theory of the FPGA needed to be studied carefully since it was used for sampling the signals.

    When both programs were finished, both systems were evaluated based on performance.


    1.5 Outline

    Chapter 1 presents the background, purpose and project specifications, tasks, as well as the work flow of this thesis. The objective of Chapter 2 is to give an introduction to the microcontroller NI myRIO and the graphical programming language LabVIEW. Chapter 3 presents history, relevant theory and concepts related to speech recognition and the communication protocols used in the thesis. Chapter 4 presents the implementation of software for both systems. Chapter 5 presents the results of the project and discussions. Chapter 6 contains conclusions and suggestions for further work related to the results.

  • Chapter 2

    LabVIEW & NI myRIO

    2.1 LabVIEW

    LabVIEW is a graphical programming language that creates applications using icons instead of lines of text. In contrast to text-based programming languages, where the execution order is determined by instructions, LabVIEW uses dataflow programming. Code and functions are written in block diagrams belonging to a VI, and the flow of data passes through nodes, which determines the order of execution. In other words, a node will execute when it has received all its inputs and, when executed, produces output data that is passed to the next node in the dataflow path. A simple example is shown in Figure 2.1, where two numbers are added and 50 is then subtracted from the sum. In this example the VI will execute from left to right, since the subtract function will not be able to execute before the add function has executed and sent output data to the input of the subtract function. One of the benefits of dataflow programming is that different

    Figure 2.1: Dataflow programming example [4]

    tasks are allowed to be processed concurrently, which makes it very easy to design multitasking block diagrams, for example parallel tasks managed in multiple While loops [18]. Wires are used to transfer data between objects in the block diagram, and depending on the type of data being transferred the appearance of the wire will change. The wires must also be connected to inputs and outputs that are compatible with the data being transferred; for example, an array output cannot be connected to a numeric input. A couple of different data types and their respective wire appearance are shown in Figure 2.2.

    The VI (Virtual Instrument) shown in Figure 2.3, built in LabVIEW, consists of the block diagram, where the source code is, but also of something called a front panel. The front panel is a GUI where controls, indicators, buttons, graphs etc. can be added by selecting them from a drop-down menu. This makes it very easy to create interactive and helpful UIs for your applications. As the name implies, the VI is an imitation of a physical instrument [13]. LabVIEW is used in many industries around the world; for example, LabVIEW is used


    Figure 2.2: Different data type and their respective wire appearance [4]

    Figure 2.3: The front panel and block diagram of a labVIEW VI

    in the Large Hadron Collider at CERN. LabVIEW can be used for data acquisition as well as equipment control, and although it might be different from traditional programming it is very user friendly for first-time users. It is also very flexible, as you can import C/C++ code by making a Dynamic Link Library (DLL) in C/C++ and then calling that DLL from LabVIEW. For users familiar with MATLAB, LabVIEW has something called a MathScript node, where it is possible to write your own text-based computations or cut and paste your MATLAB code into the node. A MathScript node is shown in Figure 2.4. It is also possible to call the MATLAB software to execute scripts, provided the user has a valid licence.

    Figure 2.4: Mathscript node inside a While loop (grey square)


    2.2 NI myRIO

    Figure 2.5: The front of NI myRIO.

    The myRIO, shown in Figure 2.5, is an embedded device developed by National Instruments for student projects, with an architecture based on the configurable multiprocessor Xilinx Zynq Z-7010. The two processors in the device (one fixed and one configurable) can be programmed independently due to their own peripherals and memory. The fixed processor is a dual-core processor called ARM Cortex-A9 MPCore, which includes a fixed set of peripherals and implements the ARMv7 instruction set architecture (ISA). The myRIO ARM Cortex-A9 is pre-configured in the factory with a real-time Linux distribution. Linux with real-time extensions is a popular real-time operating system (RTOS), with more deterministic scheduling and behavior fitting for operating systems in a wide range of embedded applications; an overview of the FPGA processor as well as the RT processor is shown in Figure 2.6. For NI Linux Real-Time, Eclipse Edition, C and C++ development tools are used for downloading, compilation and execution as well as debugging of C applications on the myRIO. For structured dataflow applications, however, compiling, downloading and executing code on the myRIO are performed in LabVIEW instead [11].

    The myRIO has a plethora of functions and is a great choice for standalone systems. It has a built-in accelerometer and also 4 LEDs that can be controlled, and the FPGA chip makes advanced robotics control possible. Since it has built-in WIFI it is also possible to use the myRIO for IoT systems and wireless control. Besides the many digital inputs and outputs, as well as analog input/output and audio input and output, the myRIO has 3.3 V, 5 V and +15/-15 V power outputs that can be connected to your own devices and applications. As seen in Figure 2.7, the myRIO has two MXP connectors where inputs and outputs for different communication protocols can be connected. The myRIO supports I2C (inter-integrated circuit), where the SDA and SCL ports are on pins 34 and 32 respectively, as well as SPI (Serial Peripheral Interface). It also has a Universal Asynchronous Receiver/Transmitter (UART) with Rx and Tx pins, making the myRIO able to communicate with many different devices directly without having to add an adapter of some sort.


    Figure 2.6: An overview of the two processor chips on NI myRIO [11].

    Figure 2.7: Primary/Secondary Signals on MXP Connectors A and B at NI myRIO [11].


    The Xilinx Artix-7 field programmable gate array (FPGA) is the reconfigurable processor on the myRIO. The FPGA consists of logic units, memory and other key building blocks that can be reconfigured at the hardware level. An FPGA can implement peripheral hardware such as PWM generators, communication buses, quadrature encoder interfaces, video rendering and decoding, algorithms for signal processing and other processor architectures [11]. The FPGA processor is therefore especially useful for systems requiring very fast calculations/response. An overview of the NI myRIO including expansion boards is shown in Figure 2.8. Specific information such as inputs and outputs can be found in the user guide for the NI myRIO-1900 [22].

    Figure 2.8: An overview of NI myRIO [11].

  • Chapter 3

    Theory

    3.1 Speech recognition

    3.1.1 History

    The first voice recognition technology was developed in 1952 by Bell Labs and called the Audrey system. This early system could only recognize ten digits spoken by a single voice [15]. A following step was taken in 1962 by IBM with their Shoebox machine, which besides the ten digits also could recognize 16 English words and six arithmetic commands. Greater steps were taken in 1971-1976 by the U.S. Department of Defense, which funded DARPA SUR, a research program for speech recognition within which Carnegie Mellon developed Harpy, a program that could understand 1011 words. During the same period the first commercial voice recognition company, Threshold Technology, was founded, and a system which could handle multiple voices was also introduced by Bell Labs. A new milestone was reached in 1978 by Texas Instruments when they introduced Speak & Spell, since it used a speech chip. This made it possible to make synthesized sound more human-like. The major breakthrough in the subject came, however, when probabilities of unknown sounds and statistics were introduced in the 1980s in a special so-called Hidden Markov Model. After this, voice recognition started entering homes, with the first system for consumers, called Dragon Dictate, developed in the beginning of the 1990s. This system was further improved in 1997 and could at this point recognize 100 words per minute. BellSouth made the first voice-activated portal (VAL) in 1996. However, for many people this system was inaccurate and caused nuisance. By 2001, the development of speech recognition technology had hit a plateau until the arrival of Google. Google created an app called Google Voice Search for iPhones that used data centers to perform the huge amount of data analysis needed to match user queries with samples of actual human speech. In 2010, Google introduced custom recognition on Android devices, where recorded voice queries from different users were used to develop an enhanced model of speech. This system's library consisted of 230 billion words. Eventually the modern systems developed by Apple, Google and Amazon came to rely on cloud-based calculations. These systems also implement third-party applications and can both be funny and behave more like an assistant.

    3.1.2 Type of speech

    There are four separate speech recognition classes, which recognize different types of utterances.


    3.1.2.1 Isolated word

    An isolated word system typically recognizes a single word inside an utterance window. It requires silence both before and after the recorded word and also has a "listen & non-listen" state.

    3.1.2.2 Connected word

    Connected word systems are closely related to isolated word systems, but they also allow separate utterances to be concatenated with minimal pauses in between.

    3.1.2.3 Continuous speech

    Continuous speech systems allow a user to speak naturally while a computer analyzes the content. The system's outer utterance boundaries can vary in a complex way, which makes continuous speech systems among the most difficult to create.

    3.1.2.4 Spontaneous speech

    Spontaneous speech systems can analyze unprepared speech including disfluencies. Examples of these are filled pauses, repeated words or a false start. Systems of this type which can understand spoken material in an accurate way, as well as understand the context of the words, are still beyond existing technology, but would enable new features like making summaries of conversations, notes at business meetings and eventually even translating between any existing languages perfectly.

    3.1.3 Speech recognition techniques

    3.1.3.1 Mel Frequency Cepstral Coefficients (MFCC)

    Sounds generated by humans are all different depending on the shape of the vocal tract, including the tongue and the teeth. Determining this shape correctly would make it possible to produce any sound accurately. The vocal tract can be represented by MFCCs, which are a representation of a speech signal's short-term power spectrum, where the MFCCs are the coefficients that the Mel Frequency Cepstrum consists of [6].

    The MFCC consists of six computational steps, as presented by the block diagram in Figure 3.1. Every step represents either a function or some mathematical approach, which are briefly discussed below.

    The pre-emphasis step lets the input signal pass through a filter which emphasizes the high frequencies. The energy in the higher frequencies will then increase due to the process, as explained by equation 3.1, where a is a probability factor.

    Y(n) = X(n) - aX(n-1)    (3.1)

    Let's for example say a = 0.90. Then there is a 90 % probability that a sample originates from the previous sample.
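    To make the step concrete, a minimal NumPy sketch of the pre-emphasis filter in equation 3.1 is shown below; the coefficient value a = 0.95 and the test signal are only illustrative assumptions, since the thesis implements this step in LabVIEW.

```python
import numpy as np

def pre_emphasis(x, a=0.95):
    """Apply the pre-emphasis filter Y(n) = X(n) - a*X(n-1) to a 1-D signal."""
    x = np.asarray(x, dtype=float)
    # The first sample has no predecessor, so it is passed through unchanged.
    return np.append(x[0], x[1:] - a * x[:-1])

# Example: emphasize the high-frequency content of a short test tone.
t = np.linspace(0, 0.02, 220, endpoint=False)
signal = np.sin(2 * np.pi * 440 * t)
emphasized = pre_emphasis(signal, a=0.95)
```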


    [Figure 3.1 block diagram: Voice input → Pre Emphasis → Framing → Windowing → FFT (magnitude spectrum) → Mel Filter Bank (Mel spectrum) → Discrete Cosine Transform]

    Figure 3.1: A block diagram describing the MFCC process.

    The next step is to divide the speech signal into frames in the range of 20-40 ms, where the total number of samples in a frame is defined as N.

    Further, a windowing technique called the Hamming window is used, which helps to reduce the discontinuity at the start and end of each frame. The Hamming window is defined by equation 3.2, where the window W(n) is in the range 0 ≤ n ≤ N - 1.

    W(n) = 0.54 - 0.46 cos( 2πn / (N - 1) )    (3.2)

    The windowed output signal Y(n) is then given by equation 3.3, where X(n) represents the input signal.

    Y(n) = X(n) · W(n)    (3.3)
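    As an illustration of the framing and windowing steps, the sketch below cuts a signal into overlapping frames and applies the Hamming window of equation 3.2; the 25 ms frame length and 10 ms step are assumed values, since the thesis only states that frames are 20-40 ms long.

```python
import numpy as np

def frame_and_window(x, sample_rate, frame_ms=25, step_ms=10):
    """Split a signal into overlapping frames and apply a Hamming window (eqs. 3.2-3.3)."""
    N = int(sample_rate * frame_ms / 1000)       # samples per frame
    step = int(sample_rate * step_ms / 1000)     # hop between frame starts
    n = np.arange(N)
    window = 0.54 - 0.46 * np.cos(2 * np.pi * n / (N - 1))   # W(n)
    starts = range(0, len(x) - N + 1, step)
    return np.array([x[s:s + N] * window for s in starts])   # Y(n) = X(n) * W(n)

# Example: an 11025 Hz signal cut into windowed 25 ms frames.
fs = 11025
frames = frame_and_window(np.random.randn(fs), fs)
```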

    Since all samples in each frame are in the time domain, some transform method needs to be used to transform the samples to the frequency domain. This is achieved by the Fast Fourier Transform, which converts the convolution of the glottal pulse U(n) and the vocal tract impulse response H(n) into a multiplication in the frequency domain [12]. The output in the frequency domain can then be described as in equation 3.4.

    Y(w) = FFT[ h(t) ∗ X(t) ] = H(w) · X(w)    (3.4)

    The range of the frequencies in the FFT spectrum is wide and does not follow a linear scale, which is why a logarithmic scale, called the Mel scale, is used for the filter bank output. A representation of this scale is shown in Figure 3.2, where a series of triangular filters is used to compute a weighted sum of spectral components, so that the filtered output approximates the Mel scale. The magnitude of each filter's frequency response is triangular, equal to unity at the centre and decreasing linearly to zero at the centres of the two adjacent filters [17]. The sum of each filter's spectral components then represents the output of that filter. An equation for calculating the Mel frequency is given in equation 3.5. The maximum frequency should be selected below the Nyquist frequency (f_nyquist = f_sample / 2) and the minimum frequency should be selected above 100 Hz. Typical values used for a sample rate of 11025 Hz are f_max = 5400 and f_min = 130 [3].


    Figure 3.2: Principle of Mel scale filter bank [20].

    f_Mel = 2595 log10( 1 + f_Hz / 700 )    (3.5)
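    A sketch of the Hz-to-Mel mapping in equation 3.5, and of a triangular filter bank spanning the f_min and f_max values quoted above, is given below; the choice of 26 filters and a 512-point FFT are assumptions made only for this example.

```python
import numpy as np

def hz_to_mel(f_hz):
    return 2595.0 * np.log10(1.0 + f_hz / 700.0)          # equation 3.5

def mel_to_hz(f_mel):
    return 700.0 * (10.0 ** (f_mel / 2595.0) - 1.0)       # inverse of equation 3.5

def mel_filter_bank(n_filters=26, n_fft=512, fs=11025, f_min=130.0, f_max=5400.0):
    """Triangular filters spaced evenly on the Mel scale (cf. Figure 3.2)."""
    mel_points = np.linspace(hz_to_mel(f_min), hz_to_mel(f_max), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / fs).astype(int)
    bank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        # Rising slope up to the filter centre, then falling slope to zero.
        bank[m - 1, left:center] = (np.arange(left, center) - left) / (center - left)
        bank[m - 1, center:right] = (right - np.arange(center, right)) / (right - center)
    return bank
```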

    The final step is to transform the Mel scale back to the time domain, which is achieved by the Discrete Cosine Transform. The result of the transform is the Mel Frequency Cepstral Coefficients, where the series of coefficients is called an acoustic vector. Each and every input utterance is therefore transformed into an arrangement of acoustic vectors. The Discrete Cosine Transform is described by equation 3.6, where i = (1, 2, ..., K) and K is the number of frequency bands in the Mel scale. Further, n = (1, 2, ..., N), where N is the number of extracted MFCCs and S_i is the Short-Time Fourier Transform (STFT) output of the discrete input signal [20].

    C_n = sum_{i=1}^{K} log10(S_i) cos[ n (i - 1/2) π / K ]    (3.6)
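    Combining the steps, the sketch below computes the cepstral coefficients of equation 3.6 for one windowed frame by applying the cosine sum directly to the log Mel filter-bank energies; the choice of 12 coefficients, the 512-point FFT and the small constant added before the logarithm are assumptions for the example, and the frame and filter bank come from the sketches above.

```python
import numpy as np

def mfcc_from_frame(frame, filter_bank, n_coeffs=12, n_fft=512):
    """One frame: |FFT| -> Mel filter bank -> log10 -> cosine sum of equation 3.6."""
    spectrum = np.abs(np.fft.rfft(frame, n=n_fft))     # magnitude spectrum
    mel_energies = filter_bank @ spectrum              # S_i, one value per Mel filter
    K = len(mel_energies)
    i = np.arange(1, K + 1)
    # C_n = sum_i log10(S_i) * cos(n (i - 1/2) pi / K), for n = 1..n_coeffs
    return np.array([np.sum(np.log10(mel_energies + 1e-10) *
                            np.cos(n * (i - 0.5) * np.pi / K))
                     for n in range(1, n_coeffs + 1)])

# Example use together with the framing and filter-bank sketches above:
# coeffs = mfcc_from_frame(frames[0], mel_filter_bank())
```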

    Deltas and delta-deltas represent the first and second order derivatives of the MFCC feature vector, respectively, and are also known as differential and acceleration coefficients. The use of these coefficients adds information about the dynamics of the MFCCs over time, since the MFCCs by themselves only describe the power spectral envelope of a single frame. The ASR performance increases significantly if the MFCC trajectories are calculated and appended to the original feature vector. If 12 MFCC coefficients are calculated, we can get 12 delta and 12 delta-delta coefficients, giving a total feature vector length of 36 [2].

    The delta coefficients d_t can be calculated according to equation 3.7, where this project uses N = 1. The delta-delta coefficients can be calculated in the same way, but are calculated from the deltas and not the static coefficients.

    d_t = ( sum_{n=1}^{N} n (c_{t+n} - c_{t-n}) ) / ( 2 sum_{n=1}^{N} n^2 )    (3.7)
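    A sketch of equation 3.7 with N = 1, applied to a whole sequence of MFCC vectors (one row per frame), is shown below; repeating the first and last frames at the edges is an assumption about boundary handling that the thesis does not specify.

```python
import numpy as np

def delta(features, N=1):
    """Delta coefficients d_t = sum_n n*(c_{t+n} - c_{t-n}) / (2 * sum_n n^2), eq. 3.7."""
    features = np.asarray(features, dtype=float)       # shape: (num_frames, num_coeffs)
    padded = np.pad(features, ((N, N), (0, 0)), mode="edge")   # repeat edge frames
    denom = 2 * sum(n * n for n in range(1, N + 1))
    return np.array([sum(n * (padded[t + N + n] - padded[t + N - n])
                         for n in range(1, N + 1)) / denom
                     for t in range(len(features))])

# Delta-deltas are obtained by applying the same function to the deltas:
# d = delta(mfccs); dd = delta(d)   -> 12 + 12 + 12 = 36 features per frame
```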

    3.1.3.2 Hidden Markov Model (HMM)

    The Hidden Markov model is the most commonly used modelling technique in modern speech recognition systems and depends on probabilistic functions for known (observable) states but in unknown (hidden) sequences.


    Figure 3.3: A graphical Hidden Markov model, where the circles indicate the states and the arrows indicate probabilistic dependencies between states.

    Let's for example say we have a person tossing three coins in a closed room, where the outcome can be either heads (H) or tails (T), which results in the following sequence: THTTHHTH. This sequence will be called the observation sequence (O). Someone outside of the room will only know the outcome, but not in which sequence the different coins were tossed, nor the bias of the different coins. To estimate to what degree the outcome depends on the order of the tossed coins or on their individual biases, we set up a probabilistic model which explains the sequence of observations O = (o_1, o_2, o_3, o_4, o_5, o_6, o_7, o_8) = (T, H, T, T, H, H, T, H). The coins will here represent the hidden states, since it is unknown which coin was tossed each time. It is possible to estimate the likelihood of a state from the observations, but this sequence will not be unique. To simplify the idea we can first look at one coin, where the model parameter can be described as P(H). In this case the hidden states will be the actual observed states, and P(H) will then be the ratio of heads to tails, i.e. the probability of both heads and tails will be 0.5 [5].

    O = T, H, T, T, H, H, T, H

    S = 2, 1, 2, 2, 1, 1, 2, 1

    P(H) = 1 - P(H) = 0.5


    Figure 3.4: A graphical overview of the Hidden Markov model parameters for one coin, where the possible outcome can be either heads or tails.

    If we go back to the example with three coins, the model parameters will look different since the hidden states are unknown, but probability parameters extracted from relevant side information can give an idea of the model parameters for which coins are tossed [5]. An overview of the possible outcomes is shown in Figure 3.5. Let's say we have the known observation sequence and that for this sequence the hidden states (S) are as below.

    O = T, H, T, T, H, H, T, H

    S = 3, 3, 2, 1, 1, 3, 2, 3


    For someone not seeing the hidden states the model parameters can then only be described as

    P(H): P_1, P_2, P_3

    P(T): 1 - P_1, 1 - P_2, 1 - P_3


    Figure 3.5: A graphical overview of the Hidden Markov model parameters for three coins, where the probabilities that a certain coin will be used are shown.
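    To make the three-coin model concrete, the sketch below evaluates the probability of the observation sequence with the forward algorithm; the transition probabilities a_ij, the per-coin heads probabilities and the initial distribution are made-up example values, since in the model they are exactly the hidden parameters.

```python
import numpy as np

# Hidden states: coin 1, 2, 3. Observations: 0 = heads (H), 1 = tails (T).
A = np.array([[0.5, 0.3, 0.2],       # a_ij: probability of tossing coin j after coin i
              [0.2, 0.5, 0.3],
              [0.3, 0.2, 0.5]])
B = np.array([[0.7, 0.3],            # P(H) and P(T) for each coin (example biases)
              [0.5, 0.5],
              [0.2, 0.8]])
pi = np.array([1/3, 1/3, 1/3])       # initial coin probabilities

def sequence_likelihood(obs):
    """Forward algorithm: P(O | model), summing over all hidden coin sequences."""
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    return alpha.sum()

# Observation sequence O = T,H,T,T,H,H,T,H from the example above.
O = [1, 0, 1, 1, 0, 0, 1, 0]
print(sequence_likelihood(O))
```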

    The concept can be directly transferred to speech recognition, where analyzed text transformed to the phonetic alphabet can give a probability for the next phoneme to occur in a word (without analyzed text, nothing can of course be said). Instead of just searching through all phonemes in a fixed order of the phonetic alphabet and trying to find a matching one, the HMM will reorder the list and look first for the most probable one and, if it does not match, for the one with the second best probability, and so on. The method will thus increase the speed of finding the correct phoneme.

    Figure 3.6: A hidden Markov Model showing the three phonetic letters of the word nine.

    When windowing techniques are used, each phoneme will be repeated several times in a row, since each window represents a very short time period. The phoneme will then sound different at the beginning, in the middle and at the end, which the HMM has to take into consideration [25]. A more representative illustration of how a specific phoneme could be determined can therefore be seen in Figure 3.7.


    Figure 3.7: A hidden Markov Model showing the three different states, beginning, middle & end, for the phoneme ay.

    As seen in Figures 3.6 & 3.7 there are two different probabilities to take into account when speech recognition is used to determine a word: the one between phonemes and the one within phonemes. When these parameters are known, it is possible to take the last HMM parameter into consideration, the one that describes the probability for a certain word to be used. A model of the concept for a few numbers is shown in Figure 3.8. A dictionary of the phonetic letters for numbers can be seen in Table 3.1, while a table for the whole phonetic alphabet can be found in Appendix B. Between words there is always a silent moment, which is why there also needs to be a probability for silence to occur. This will most often lead to the end, where all determined phonemes are compared to a dictionary and the final word is resolved. After this the procedure repeats again.


    Figure 3.8: A hidden Markov Model showing the probability for a word as well as a specific phoneme and a specific part of a phoneme to occur.

    3.1.3.3 Deep Neural Network (DNN)

    The most accurate way to recognize speech is by the use of deep neural networks, especially for a large vocabulary system covering a whole language [9]. The description of the concept first needs an introduction to how audio samples are recorded. Raw mp3 audio is typically sampled at 44.1 kHz, but human speech does normally not exceed 4 kHz, which is why speech recognition systems use a sample rate of 8 or 11.025 kHz (one quarter of mp3 audio) to catch all fundamental speech signals according to the Nyquist frequency (f_nyquist = f_sample / 2). These samples can be broken down into smaller parts called frames, where a typical frame is 20 ms long and all frames are analyzed separately. The use of the Fast Fourier Transform (FFT), which is an algorithm going from the time domain to the frequency domain, makes it possible to extract the signal energies in the frequency


    Numbers    Phonetic numbers
    one        w ah n
    two        t uw
    three      th r iy
    four       f ao r
    five       f ay v
    six        s ih k s
    seven      s eh v eh n
    eight      ey t
    nine       n ay n
    zero       s iy r ow

    Table 3.1: Dictionary with numbers and the corresponding phonetic numbers

    band. The equation is shown in 3.8, where N is the number of samples for the resulting sequence X_n.

    X_n = sum_{k=0}^{N-1} x_k e^{-2πjkn/N},   n = 0, 1, 2, ..., N - 1    (3.8)
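    For one 20 ms frame, equation 3.8 amounts to a short FFT; the sketch below shows how the "fingerprint" spectrum of Figure 3.9 could be computed, with an assumed 11025 Hz sample rate and a random placeholder signal standing in for the recorded word.

```python
import numpy as np

fs = 11025                           # sample rate used for speech (Hz)
frame_len = int(0.020 * fs)          # 20 ms frame
# 'audio' would be the recorded utterance, e.g. the word "Hello".
audio = np.random.randn(fs)          # placeholder signal for illustration
frame = audio[:frame_len]

spectrum = np.abs(np.fft.rfft(frame))            # |X_n|: energy per frequency bin
freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)   # corresponding frequencies in Hz
# Stacking these columns for consecutive frames gives the spectrogram of Figure 3.10.
```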

    The result X_n can then be illustrated as a spectrum, as shown in Figure 3.9, where it is easy to see in which frequency range the energy lies. For this particular frame, the spectrum can be seen as a "fingerprint" for this part of the speech.

    Figure 3.9: A spectrum of the first 20 ms frame of the word "Hello", where it is possible to see more low-frequency energy than high-frequency energy, typical for male voices [9].

    If all frames in a sample are added up together as columns after each other, it is possible to see the whole spectrogram for a word. An example of this is shown in Figure 3.10 for the word "Hello". This spectrogram will then be the "fingerprint" for the whole word, which can be further analyzed with a DNN.

    The full spectrogram can be interpreted as an n × m image with pixels, where a DNN can be used to analyze what letter each frame of sound corresponds to. The most time-effective type is a recurrent neural network with a memory which can predict the future outcome. If, for example, "Gard" is said in the beginning of a word, it is more likely that "en" or "ener" is coming in the end rather than something irrelevant like "zxl". Saving previous predictions in a memory in the neural network will thus make the system more accurate and able to predict the correct letter faster next time. A simplified model of this type is shown in Figure 3.11.

    The technique behind it is however far more complicated and consists of an input layer, several middle layers and an output layer, as seen in Figure 3.12, where each layer consists of many neurons. Consider the spectrum of the 20 ms frame in Figure 3.9 and imagine that each spectral value corresponds to an input neuron. These neurons can then be seen as a sum a, where


    Figure 3.10: A full spectrogram for the word "Hello" with all 20 ms frames added up together [9].


    Figure 3.11: A simplified model of a recurrent neural network.

These neurons can then be seen as the values a1 to an, which together form the sum

a = a1 + a2 + a3 + a4 + ... + an−1 + an

Each input neuron will be connected to every neuron in the first middle layer, where the analytic process begins. In this layer, small fractions of the speech are broken down into pieces which are easier to put together into something more understandable. Not all parts of the spectrum are equally interesting for the determination of speech, which is why weighting functions (w) are used to distinguish the importance of each parameter. The input parameters together with the weighting functions will then look like

w1a1 + w2a2 + w3a3 + w4a4 + ... + wn−1an−1 + wnan

To get the sum of the weighted functions squeezed into a value between 0 and 1, a sigmoid function can be used.


σ(x) = 1 / (1 + e^{−x})    (3.9)

Applying the sigmoid to the weighted sum (equation 3.10) gives a measure of how positive the relative sum is, and a bias value set to some specific threshold then determines whether the neuron is activated.

p = σ(w1a1 + w2a2 + w3a3 + w4a4 + ... + wn−1an−1 + wnan)    (3.10)

The same procedure will be repeated a few times depending on how many middle layers there are, and the weighting functions will of course look different in all layers. Finally, only one neuron in the output layer will be activated, representing a letter.
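A minimal Python sketch of the weighted sum and sigmoid activation in equations 3.9 and 3.10, with a bias term added as the threshold mentioned above; the input values, weights and bias are made-up numbers used only for illustration:

import numpy as np

def sigmoid(x):
    # Equation 3.9: squeezes any value into the range 0..1
    return 1.0 / (1.0 + np.exp(-x))

a = np.array([0.2, 0.8, 0.5, 0.1])    # assumed spectral input values a1..an
w = np.array([0.9, -0.3, 0.6, 0.4])   # assumed weights w1..wn for one neuron
bias = -0.2                           # assumed bias / threshold value

# Equation 3.10: p = sigma(w1*a1 + w2*a2 + ... + wn*an), here with the bias added
p = sigmoid(np.dot(w, a) + bias)
print("Neuron activation:", round(float(p), 3))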

Figure 3.12: A deep neural network with input, middle and output layers.

    3.1.3.4 Dynamic Time Warping (DTW)

Dynamic time warping is an algorithm based on time series alignment. The method was originally developed for speech recognition; it takes two sequences of feature vectors and warps the time axis in an iterative way until an optimal path is found between the two sequences [19]. Since a word can be said in many different ways (slow, fast, high pitch, low pitch, etc.), this method still allows it to be recognized.

Let's say we have two speech sequences of the same word, as in Figure 3.13, that we want to compare. Using Euclidean distance matching element by element will most often lead to poor recognition, which is why DTW is used instead.


    Figure 3.13: Euclidean & DTW matching of two sequences [7].

The first step in DTW is to create the cost matrix D : m × n, where the two compared sequences x and y represent the two axes.

x = [x1, x2, x3, ..., xi, ..., xn]
y = [y1, y2, y3, ..., yj, ..., ym]

The cost matrix can then be represented as in equation 3.11, or as an illustration with empty values in Figure 3.14.

D(i, j) = Dist(i, j) + min{ D(i−1, j), D(i, j−1), D(i−1, j−1) }    (3.11)

If the values of the compared sequences x and y are given as below

    x = [1, 2, 4, 3, 5, 3, 2, 3, 2, 5]

    y = [1, 1, 2, 4, 3, 5, 3, 2, 3, 2]

then equation 3.11 is used to calculate each element. The calculation of the elements begins in the bottom-left corner, i.e. D(1, 1), where the absolute value of the distance is |1 − 1| = 0. Since no value exists below or to the left of this element, the result is D(1, 1) = 0.


Figure 3.14: An empty 10 × 10 cost matrix, D.

If we instead look in the second column for the value of D(2, 5), we have |2 − 3| = 1 and min(6, 4, 2) = 2, which yields D(2, 5) = 1 + 2 = 3. In this way the matrix can be filled up column by column. The first and second columns in this example will then look like in Figure 3.15.
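The column-by-column filling of the cost matrix can be sketched in a few lines of Python. This is only an illustration of equation 3.11 using the example sequences from the text, not the LabVIEW code used in the project:

import numpy as np

x = [1, 2, 4, 3, 5, 3, 2, 3, 2, 5]
y = [1, 1, 2, 4, 3, 5, 3, 2, 3, 2]

n, m = len(x), len(y)
D = np.zeros((n, m))

for i in range(n):
    for j in range(m):
        dist = abs(x[i] - y[j])              # Dist(i, j)
        if i == 0 and j == 0:
            D[i, j] = dist
        else:
            # neighbours below, to the left and diagonally; inf outside the matrix
            below = D[i - 1, j] if i > 0 else np.inf
            left = D[i, j - 1] if j > 0 else np.inf
            diag = D[i - 1, j - 1] if (i > 0 and j > 0) else np.inf
            D[i, j] = dist + min(below, left, diag)   # equation 3.11

print(D[n - 1, m - 1])   # cumulative cost in the "upper right" corner, 3.0 here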

Figure 3.15: A 10 × 10 cost matrix, D, where the first two columns of values are calculated.

When all elements in the matrix are calculated, the process of creating a warp path, W = (w1, w2, w3, ..., wk), begins. This is done by backtracking and greedy search to minimize the distance as in equation 3.12.

Dist(W) = Σ_{k=1}^{L} Dist(w_ki, w_kj)    (3.12)

The search begins in the upper-right corner of the matrix and looks for the minimal value among the neighbouring left, bottom and bottom-left elements. The process is repeated until a path is found all the way to the bottom-left corner, as in Figure 3.16. Since Dist(W) is the sum


of all values in the path, it is possible to see that the warp distance in this example will be equal to the value in the upper-right corner (Dist(W) = 3). This is a very low value, which indicates a good match.
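The backtracking step can be sketched as follows, assuming the cost matrix D from the previous sketch; the greedy search simply walks from the upper-right corner towards the bottom-left corner along the smallest neighbouring values:

import numpy as np

def warp_path(D):
    """Backtrack the warp path W through a filled DTW cost matrix D."""
    i, j = D.shape[0] - 1, D.shape[1] - 1      # start in the "upper right" corner
    path = [(i, j)]
    while i > 0 or j > 0:
        # candidate predecessors: below, to the left and diagonal (inf outside D)
        candidates = {
            (i - 1, j): D[i - 1, j] if i > 0 else np.inf,
            (i, j - 1): D[i, j - 1] if j > 0 else np.inf,
            (i - 1, j - 1): D[i - 1, j - 1] if (i > 0 and j > 0) else np.inf,
        }
        i, j = min(candidates, key=candidates.get)   # greedy step towards the minimum
        path.append((i, j))
    return list(reversed(path))                      # path from (0, 0) to the corner

Applied to the matrix from the previous sketch, warp_path(D) returns the near-diagonal path whose summed element distances equal 3, matching the value in the corner.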

Figure 3.16: A 10 × 10 cost matrix, D, where all columns of values are calculated as well as the warp path.

    3.2 Communication protocols

    3.2.1 Open System Interconnection Model (OSI)

To describe how information is transferred from one networked device to another over a transmission medium, the OSI model (Open Systems Interconnection Model) is used. This model has a total of seven layers, each with its own function. Starting from the top is the Application layer (layer 7), which is the layer where the user interacts with high-level APIs. Next is the Presentation layer, where the operating system handles the data; this can include functions such as encryption/decryption, translation or data compression. The fifth layer is the Session layer, whose function is to handle multiple continuous back-and-forth transmissions of information between two nodes, in other words a session; a session between a computer and a web service is for example created whenever a website is visited. The following layer is the Transport layer, which handles the reliability of the data segments being sent with functions such as segmentation (dividing data packets into smaller parts), acknowledgement (a signal to specify that data has been sent/received) and multiplexing (combining multiple streams of information/signals into one complex signal). Following these is the Network layer, which handles the structuring of a network, i.e. addressing, routing and traffic control. Layer 2 is called the Data Link layer and consists of two sublayers named Logical Link Control (LLC) and Medium Access Control (MAC). The LLC manages the flow control of information and multiplexing for the logical link, and the MAC layer provides flow control and multiplexing for the transmission medium. The last layer is the Physical layer, whose function is the transmission and reception of raw bit streams over a physical medium. The OSI model can be divided into two parts: one part is mostly software and can be called the host layers (layers 7-4), while the other layers (layers 3-1) are mostly hardware and are sometimes called the media layers, as illustrated in Figure 3.17.


Figure 3.17: The OSI model with its layers, divided into host layers (7-4) and media layers (3-1).

    3.2.2 IEEE 802.11 & WIFI

    3.2.2.1 IEEE 802.11

IEEE 802 is a set of standardized protocols developed by the IEEE (Institute of Electrical and Electronics Engineers) dealing with LANs (local area networks) and MANs (metropolitan area networks). There are many protocols that are part of the IEEE 802 family with different working groups; for example, IEEE 802.3 deals with Ethernet, the IEEE 802.19 working group develops standards for coexistence between standards of unlicensed devices, and the standard for WLANs (wireless local area networks) is IEEE 802.11. The 802.11 protocol only operates in the lower layers of the OSI model, i.e. the data link and physical layers.

    3.2.2.2 WIFI

WIFI is a wireless networking communication technology that uses radio waves to transmit and receive information. WIFI represents a wireless local area network (WLAN), which is part of the IEEE 802.11 standards. The most widely used operating frequency is 2.4 GHz, but newer routers can also operate at 5 GHz, a technology called dual band. This technology offers more channels and a data transfer rate of up to 600 Mbit/s.

The WLAN consists of a gateway, typically a router, which receives and transmits signals from an internet service provider. The router then forwards signals to a receiver within range, which could be a computer, cell phone or other WIFI-enabled device. The range of the WLAN can be extended by a WIFI bridge, but in case of weak signals, or if a device is out of range, Ethernet cables can be used. A wireless local area network is exemplified in Figure 3.18.


    Figure 3.18: Example of a WIFI network [24].

    3.2.3 ZigBee & IEEE 802.15.4

    3.2.3.1 IEEE 802.15.4

IEEE 802.15.4 is the architecture on which the ZigBee protocol is based. To simplify the standard, IEEE 802.15.4 is defined in terms of a number of blocks, referred to as layers. Each layer is responsible for one part of the standard and provides services to the higher layers, and between the layers there are interfaces which serve to define the logical links described by the standard [10].

A low rate wireless personal area network (LR-WPAN) device consists of at least one PHY layer, containing the radio frequency (RF) transceiver together with its low-level control mechanism, and a MAC sublayer providing access to the physical channel for all types of transfers. In a graphic representation, Figure 3.19 shows these blocks, which are further described below [10].

Figure 3.19: The IEEE 802.15.4 standard uses two layers as well as the LLC and SSCS layers for communication with all layers above, which are defined by additional standards.


Two services are provided by the PHY: the PHY data service and the PHY management service. The PHY data service enables PHY protocol data units (PPDUs) to be transmitted and received across the physical radio channel. The PHY's features are radio transceiver activation and deactivation, energy detection (ED), link quality indication (LQI), channel selection, clear channel assessment (CCA), and transmission and reception of packets across the physical medium. The ultra wide band PHY also has a feature of precision ranging [10].

Two services are provided by the MAC sublayer: the MAC data service and the MAC management service, interfacing with the MAC sublayer management entity (MLME) service access point (MLME-SAP). The MAC data service enables MAC protocol data units (MPDUs) to be transmitted and received through the PHY data service. MAC sublayer features include beacon management, channel access, guaranteed time slot management, frame validation, acknowledged frame delivery, association and disassociation. The MAC sublayer also provides features for appropriate security mechanisms to be implemented [10].

    The IEEE 802.15.4 standard also includes a logical link control (LLC) and service specificconvergence sub-layer (SSCS), which are added to communicate with layers above defined byadditional standards [8].

The intention of the standard is to have a format which can be used by other protocols and features (layers three to seven). There are three different frequency bands used by the protocol (902-928 MHz in America and 868 MHz in Europe), but the 2.4 GHz band is most widely used worldwide [8].

    3.2.3.2 ZigBee

ZigBee represents the enhancement of layers three to seven according to the OSI model, where the representation of each layer is shown in Figure 3.20.

Layers three and four define the additional communication features. Examples of enhancements in the remaining layers are checking for valid nodes, encryption for security, and forwarding capability and data routing to enable mesh networking. The most prominent utilization of ZigBee is remote sensor systems using the mesh topology. A benefit of the mesh topology is that all nodes in the system can communicate with any other node [8], which allows the network to be spread over a larger area. The functionality and reliability are also better, since nodes can be bypassed in case of a disabled node. Examples of network topologies are shown in Figure 3.21, where a coordinator (the black dot) can communicate with other nodes in different ways. There is also another version of ZigBee available that supports energy saving; this version needs neither a battery nor AC power to be maintained. A key benefit of ZigBee is however the fact that there is a huge amount of pre-developed applications available. The application used in this thesis is called Light Link and is specifically used to control smart LED lights.


    Figure 3.20: OSI model and the ZigBee model

Figure 3.21: Different types of ZigBee network topology (star, mesh and cluster tree)

    3.2.3.3 Zigbee cluster library

The ZigBee Cluster Library (ZCL) is a library of standardized commands and functions which are grouped together as clusters: "A cluster is a related collection of commands and attributes, which together define an interface to specific functionality" [1]. The different clusters have different IDs, but for this project the focus will be on the clusters "On/Off", "Level Control" and "Color Control" (cluster IDs 0x0006, 0x0008 and 0x0300, respectively). Besides cluster IDs there are also profile IDs, which are IDs for related applications and devices. There are a number of public profiles, and they are designed so that products from different manufacturers can work together. For this project the Home Automation (HA) profile is used, and a table


of other profiles and their respective IDs can be seen in Table 3.2. The ZCL uses frames to

Profile ID    Profile Name
0101          Industrial Plant Monitoring (IPM)
0104          Home Automation (HA)
0105          Commercial Building Automation (CBA)
0107          Telecom Applications (TA)
0108          Personal Home & Hospital Care (PHHC)
0109          Advanced Metering Initiative (AMI)

Table 3.2: ZigBee public profile IDs and profile names

transmit information, where the cluster ID and profile ID are specified. The frames are created using a software called XCTU and consist of the following parameters:

    • Delimiter (DL)

    • Length (L)

    • Frame type (FT)

    • Frame ID (FI)

    • 64-bit destination address (64bit)

    • 16-bit destination address (16bit)

    • Source endpoint (SE)

    • Destination endpoint (DE)

    • Cluster ID (CI)

    • Profile ID (PI)

    • Broadcast radius (BR)

    • Options (Opt)

    • Data payload (DP)

    • Checksum (CS)

The delimiter is the first byte of a frame, which indicates the beginning of a data frame, and it is for this project always 0x7E. The length specifies the total number of bytes excluding the delimiter, length and checksum. The frame type specifies the API type that is used; for this project the API ID is 0x11, which is "Explicit Addressing Command Frame" and allows the endpoint and cluster ID to be specified for a wireless transmission. To receive a response to the transmission, the frame ID was chosen as 0x01 (setting it to 0 will disable the response frame). The 64-bit address represents the destination address; for this project Philips hue light bulbs were used, so the MAC address of those bulbs is the 64-bit address, and it is described in section 4.1.4.4 how these addresses are obtained. A device that joins a ZigBee network receives a 16-bit address, also called the network address, and when the address is unknown or when sending a broadcast the value is 0xFFFE; this is why the 64-bit address is included in the frame, to ensure that the


data is being transmitted to the correct device. The source endpoint and destination endpoint were set to default values (0xE8 and 0xB). The cluster ID is the specific ID for the function that is to be accessed (for example 0x0300 for the color function). The profile ID is 0104, i.e. the Home Automation (HA) profile. The broadcast radius is set to 0x00, which sets the number of broadcast hops to the maximum value. The data payload is the command and will be different depending on the function that is to be accessed; for example, the payload "01 00 01 00 10" corresponds to the command "turn on" and the payload "01 00 00 00 10" corresponds to "turn off". The payloads used in this project can be seen in Table 3.3. The checksum is the control of data integrity, i.e. a check of whether any errors occurred during transmission. The equation to calculate the checksum from a frame is shown in equation 3.13: add all frame parameters except the delimiter and length, take the lowest 8 bits of this sum (a bitwise AND with FF) and subtract the result from FF. To verify that the checksum is correct, add all parameters including the checksum; the last two digits of the sum will then be FF. All digits are in hexadecimal.

Checksum = FF − ((FT + FI + 64bit + 16bit + SE + DE + CI + PI + BR + Opt + DP) & FF)    (3.13)
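A small Python sketch of the checksum rule in equation 3.13; the example byte values below are arbitrary and only serve to illustrate the calculation, they are not a frame captured from the project:

def xbee_checksum(frame_bytes):
    """Equation 3.13: keep the lowest 8 bits of the byte sum and subtract from 0xFF."""
    return 0xFF - (sum(frame_bytes) & 0xFF)

# Arbitrary example: frame type 0x11 and frame ID 0x01 followed by made-up
# address, endpoint, cluster/profile ID and payload bytes
example = [0x11, 0x01,
           0x00, 0x17, 0x88, 0x01, 0x02, 0x03, 0x04, 0x05,   # 64-bit address
           0xFF, 0xFE,                                        # 16-bit address
           0xE8, 0x0B,                                        # source/destination endpoint
           0x00, 0x06,                                        # cluster ID (on/off)
           0x01, 0x04,                                        # profile ID (HA)
           0x00, 0x00,                                        # broadcast radius, options
           0x01, 0x00, 0x01, 0x00, 0x10]                      # payload "turn on"
cs = xbee_checksum(example)
print("Checksum: 0x%02X" % cs)
assert (sum(example) + cs) & 0xFF == 0xFF   # verification rule from the text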

Payload [hexadecimal]      Command                                                      Scale [decimal/hex]
01 00 01 00 10             Turn on                                                      -
01 00 00 00 10             Turn off                                                     -
01 00 04 XX 10 00 10       Level control, XX represents the specific light intensity    0-255 / 0x00-FF
01 00 06 XX YY 10 00 10    Color, XX = specific color and YY = saturation               0-255 / 0x00-FF

Table 3.3: Payload data packets in hex values and their corresponding function and scale
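The payload layouts in Table 3.3 can be illustrated with a few helper functions; the chosen level and hue values below are arbitrary examples, not values used in the project:

def payload_on():
    return bytes([0x01, 0x00, 0x01, 0x00, 0x10])

def payload_off():
    return bytes([0x01, 0x00, 0x00, 0x00, 0x10])

def payload_level(level):
    # level: 0-255 light intensity, placed at the XX position in Table 3.3
    return bytes([0x01, 0x00, 0x04, level & 0xFF, 0x10, 0x00, 0x10])

def payload_color(hue, saturation):
    # hue and saturation: 0-255, placed at the XX and YY positions in Table 3.3
    return bytes([0x01, 0x00, 0x06, hue & 0xFF, saturation & 0xFF, 0x10, 0x00, 0x10])

print(" ".join("%02x" % b for b in payload_level(128)))   # -> 01 00 04 80 10 00 10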

    3.2.4 Inter-Integrated Circuit (I2C)

The I2C bus is a very popular communication protocol which is developed to communicate between master and slave devices. It has the benefit that a single bus can be used for multiple devices, as shown in Figure 3.22, where typically a microcontroller is used as the master device and sensors, DACs/ADCs, LCD screens and controls are slave devices. All the devices are connected to just two pins controlled by the master device [23].

Figure 3.22: Example of an I2C bus with one master device and three slave devices connected by the SDA and SCL lines

The communication begins with initialization of a START condition sent by the master, which happens when the SDA line goes from high to low while the SCL line is high. The communication then terminates when a STOP condition is sent from the master, which is defined by a transition from low to high on the SDA line while SCL is high, as seen in Figure 3.23.

    The SCL clock defines the speed of transfer, where only one data bit is transferred during eachclock pulse. One byte then contains eight bits on the SDA line. A byte can represent either a


    Figure 3.23: Example of START and STOP conditions in an I2C circuit. [23]

register address, device address or some data read from or written to a slave, where the data bits are ordered with the most significant bit (MSB) first. Between the START and STOP conditions, data bytes of any length can be transferred between the master and slave. It is important that data on the SDA line is stable during the clock period's high phase, since changes of data while SCL is high will be interpreted as a START or STOP condition [23].

When a byte has been sent it will be followed by an ACK bit from the receiver. This bit tells the transmitter that the byte was received successfully and that it is ready for another byte to be sent. The ACK bit cannot be sent by the receiver before the SDA line has been released, which is why the receiver pulls the SDA line down from high to low. This is done during the whole ninth clock period (the ACK period) so that the SDA line is guaranteed to be stable low when the clock pulse is high [23]. An illustration of this is shown in Figure 3.24.

    Figure 3.24: An example of a single byte I2C data transfer. [23]

When writing to a slave on the I2C bus, the first byte sent by the master after the START condition will be the address of the slave. The following bit is the read/write bit, which is set to 0, representing write. When the slave has sent an ACK bit, the master will send a byte representing the address of the specific register it wants to write to. Another ACK bit is then sent from the slave, which tells the master it is ready. Afterwards, the master will begin sending data to the register until all data has been sent, and finally a STOP condition terminates the transmission. An example of a write register is presented in Figure 3.25.
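As an illustration of this write sequence, the sketch below uses the generic Linux smbus2 library rather than the LabVIEW I2C functions used on the myRIO in this project; the bus number, slave address and register are assumptions chosen only for the example:

from smbus2 import SMBus

SLAVE_ADDR = 0x3C   # assumed 7-bit slave address, e.g. a small display
REGISTER = 0x00     # assumed register to write to

with SMBus(1) as bus:            # open I2C bus 1
    # START, slave address + write bit, register byte, data bytes and STOP are
    # generated by the driver; we only provide the address, register and payload
    bus.write_i2c_block_data(SLAVE_ADDR, REGISTER, [0x01, 0x02, 0x03])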

Reading from a slave device is similar to the writing process, but needs some more steps. The first step is to send an instruction from the master to the slave telling which register it wants to read from. This is done in the same way as in the writing process, by sending the slave


    Figure 3.25: Example of an I2C write register. [23]

address followed by the read/write bit set to 0 (which means write) and then the register address it wants to read from. When the slave has sent the ACK, the master once again sends a START condition, followed by the slave's address, but this time the read/write bit is set to 1 (representing read). The slave will then send an ACK for the read request, which makes the master release the SDA bus. The master will however still continue to supply the clock, even though the slave has control of the data line. At this point, the master becomes a master-receiver while the slave acts as a slave-transmitter.

Clock pulses will still be sent out by the master, but at the same time the SDA line will be kept released, which makes it possible for the slave to transmit data. After every byte, an ACK will be sent by the master to the slave to make the slave ready to send more data. When all expected bytes have been received by the master, a NACK will be sent out by the master, which tells the slave that the bus should be released to halt the communication. Finally, a STOP condition is sent out by the master [23]. An example of an I2C read command is shown in Figure 3.26.

    Figure 3.26: Example of an I2C read register. [23]

    3.2.5 Universal Asynchronous Receiver/Transmitter (UART)

UART is a serial communication protocol which requires only two wires for communicating data between source and destination. In serial communication, a single bit is transmitted at a time, sequentially over a single wire. By serial transmission of digital information through a single wire, the channel size and overall wiring can be reduced compared to parallel communication through multiple wires. Since UART communicates asynchronously, no clock signal is required by the receiver to synchronize or validate data sent from the transmitter. This type of communication is in contrast to synchronous serial communication, where a clock signal is shared between the transmitter and the receiver in order to synchronize data. In UART, data flows from the Tx pin of the transmitter to the Rx pin of the receiver and vice versa, as seen in Figure 3.27. Also, to achieve a common reference, both devices should be connected to the same ground.

UART achieves data synchronization by two mechanisms. The two communicating devices need to share the same timing reference, which is achieved by setting the baud rate. In addition, a start bit and a stop bit are used at the beginning and end of each data byte. Baud rate is a rate of


Figure 3.27: UART connections between two devices (Tx to Rx, Rx to Tx and a shared GND).

data transfer in serial communication and is expressed in bits per second (bps). There are some standard baud rates defined (2400, 4800, 9600, ...) which can be configured in both devices.

The data frame of UART (Figure 3.28) begins with an idle state where the logic level is high. This shows that the line and transmitter are not damaged. Each frame then consists of a start bit, data bits, possibly a parity bit and finally a stop bit. The start bit signals that a new character is coming. The next 5-9 bits, depending on the configuration, represent the data. If a parity bit is configured, it is placed directly after the last data bit. This bit is used by the receiver to detect errors in the transmission, i.e. if any data has been changed. The stop bit (or two stop bits) will always be in the logic high state, which signals that the transmission of the character is completed. All characters will be sent from the transmitter one by one in the same format.

    Figure 3.28: Example of a one byte UART communication. [21]
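For comparison with the LabVIEW UART functions used in the project, a generic UART transfer can be sketched with the pyserial library; the port name and baud rate below are assumptions for illustration only:

import serial

with serial.Serial("/dev/ttyUSB0", baudrate=9600, bytesize=8,
                   parity=serial.PARITY_NONE, stopbits=1, timeout=1.0) as port:
    port.write(b"Hello")    # each byte is framed with start/stop bits by the UART
    reply = port.read(5)    # read up to 5 bytes back (empty bytes object on timeout)
    print(reply)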

Chapter 4

    Implementation

    4.1 Voice controlled system using Alexa

A voice controlled system using an Alexa Echo Dot and the myRIO was created. For the speech recognition part, Alexa was used, and the myRIO acted as the microcontroller and central control unit, which sends and receives commands to/from various devices. This system makes use of both WIFI and ZigBee communication as well as the UART and I2C protocols.

    4.1.1 Amazon Alexa

Amazon Alexa is a cloud based voice recognition system supported by Alexa devices. There are currently three different kinds of Alexa devices available on the market: Amazon Echo, Echo Dot and Amazon Tap. The first generation of device released was the Amazon Echo, which stays in listening mode until the wake-up word "Alexa" is said. When it wakes up, a single voice command can be served and then the device returns to listening mode again. The Amazon Echo is the largest in the series at 9.25 inches in height and appears as a cylinder-shaped speaker with an array of 7 small microphones. The mid-size option is the 6.2-inch-tall Amazon Tap, which is also portable and battery-powered; otherwise, its functions are similar to the Amazon Echo. The latest generation of Echo devices is the Echo Dot, which is the smallest one at only 1.6 inches in height, but it also shares most functions with the Amazon Echo. Both the Amazon Echo and the Echo Dot require a plug-in power supply, which is why they fit best in fixed locations, like a kitchen or living room [14].

The voice recognition of specific voice commands is further performed by the Alexa voice service. A concept of the voice service in connection with an Alexa device controlling smart home devices is illustrated in Figure 4.1. Sending a control command to a smart device begins with waking Alexa up by saying "Alexa". Afterwards the specific command can be said. The sound of this voice command is then sent via the connected WIFI network to the voice processing cloud. If the cloud validates the sound as a known command, the command is sent further to a Smart Home Skill Adapter, which enables cooperation with third party providers. Finally, the command is sent to the third party provider's cloud, which can remotely communicate with the specific smart home device [14].



    Figure 4.1: A model of Alexa voice service [14].

    4.1.2 IFTTT

IFTTT (If This Then That) is a web-based platform that connects different services and devices together and makes it easy to create and customize your own apps, called applets. It supports a wide variety of services such as YouTube, Google, Dropbox and Amazon Alexa, to name a few. The principle is that if a specific trigger occurs, then an action is performed, hence the name IFTTT. The concept can be broken down into four steps.

    • Choose a service

    • Choose a trigger (the IF statement)

    • Choose action service (which service should the If statement trigger)

    • Choose action fields (The That statement)

For this project, Amazon Alexa and Webhooks were used as the service and the action service, respectively. For the If statement, a specific phrase said to Alexa was chosen as the trigger, and Webhooks was used to send a web request to a specific URL, as sketched below. The only requirement for the trigger is that the user has to say "alexa trigger" before the actual command; for this project, an example would be that if a user would like to turn on lamp 1, the phrase "alexa trigger Light 1 on" would be uttered. A full list of all the specific applets created and their functions can be seen in Table 4.1.
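The web request sent by the Webhooks action can be mimicked for testing with a short Python sketch; the myRIO address and URL path below are placeholders only, since the real URL is whatever is configured in the applet:

import requests

MYRIO_URL = "http://192.168.1.50:8080/lamp1/on"    # placeholder address and path

response = requests.get(MYRIO_URL, timeout=2.0)    # same kind of request the applet sends
print(response.status_code)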


Trigger phrase               Function
Light 1 on                   Turns on lamp 1
Light 1 off                  Turns off lamp 1
Dim light 1                  Decreases the light intensity of lamp 1
Full dim light 1             Maximum light intensity on lamp 1
Light 2 on                   Turns on lamp 2
Light 2 off                  Turns off lamp 2
Dim light 2                  Decreases the light intensity of lamp 2
Full dim light 2             Maximum light intensity on lamp 2
Normal light                 Sets normal (white) light on lamp 2
Red light                    Sets red light on lamp 2
Blue light                   Sets blue light on lamp 2
Green light                  Sets green light on lamp 2
Yellow light                 Sets yellow light on lamp 2
Lightshow                    Lamp 2 fades between red, blue, green and yellow light every two seconds
Light control on             Automatically changes the intensity of lamp 1 to a specific setpoint
Light control off            Turns light control off
Light sensor display on      Displays the current light sensor value on a display
Light sensor display off     Removes the light sensor value from the display
Temperature display on       Displays the temperature sensor value on a display
Temperature display off      Removes the temperature value from the display
Voice temperature            Gets the current temperature value as an audio response
Setpoint 200                 Sets the light control value at 200 (Lux)
Setpoint 300                 Sets the light control value at 300 (Lux)
Setpoint 500                 Sets the light control value at 500 (Lux)
Setpoint 800                 Sets the light control value at 800 (Lux)
Setpoint 1000                Sets the light control value at 1000 (Lux)
Setpoint 1200                Sets the light control value at 1200 (Lux)
Setpoint 1500                Sets the light control value at 1500 (Lux)