  • Mälardalen University Press Licentiate Theses No. 200

    UTILIZING HARDWARE MONITORING TO IMPROVE THE PERFORMANCE OF INDUSTRIAL SYSTEMS

    Marcus Jägemar

    2016

    School of Innovation, Design and Engineering

  • Copyright © Marcus Jägemar, 2016
    ISBN 978-91-7485-203-5
    ISSN 1651-9256
    Printed by Arkitektkopia, Västerås, Sweden

    Abstract

    The rapidly increasing use of Information and Communications Technology has resulted in a growing demand for network capacity. In this Licentiate thesis, we show how to monitor, model and finally improve network performance for large industrial systems. We also show how to use modeling techniques to move performance testing to an earlier design phase, with the aim of reducing the total development time of large systems. Our first contribution is a low-intrusive method for long-term measurement of hardware characteristics on production nodes located at customer sites. Our second contribution is a technique to mimic the hardware usage of a production environment by creating a characteristics model. The cloned environment makes function test suites more realistic. The goal when creating the model is to reduce the system development time by moving late-stage performance testing to early design phases, thereby improving the quality of the test environment. The third and final contribution is a network performance improvement where we dynamically trade computational capacity for a reduction in message round-trip time when there are CPU cycles to spare. We have implemented an automatic feedback-controlled mechanism for transparent message compression, resulting in improved messaging performance between interconnected network nodes. Our mechanism continuously evaluates eleven compression algorithms against the message stream content and the network congestion level. The message subsystem uses the compression algorithm that provides the lowest messaging time. If the message content or network load changes, a new evaluation is performed. We have conducted several case studies in an industrial environment and verified all contributions on a large telecommunication system manufactured by Ericsson. System engineers frequently use the monitoring and modeling functionality for debugging purposes in production environments. We have deployed all techniques in a complicated industrial legacy system with minimal impact. We show that we can provide not only a solution but a cost-effective solution, which is an important requirement for industrial systems.
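The selection step described in the abstract (evaluate several compression algorithms and use the one with the lowest messaging time) can be illustrated with a minimal sketch. This is not the implementation from the thesis: the candidate set (`zlib`, `bz2`, `lzma` from the Python standard library, plus no compression), the function names, and the cost model (compression time plus transfer time at an assumed bandwidth, ignoring decompression and feedback control) are all assumptions made for illustration.

```python
import bz2
import lzma
import time
import zlib

# Hypothetical candidate set. The thesis evaluates eleven algorithms,
# which are not named in this section; these are stand-ins.
CANDIDATES = {
    "none": lambda data: data,
    "zlib": zlib.compress,
    "bz2": bz2.compress,
    "lzma": lzma.compress,
}


def estimate_messaging_time(compress, sample, bandwidth_bytes_per_s):
    """Return (estimated seconds, compressed size) for one sample message.

    Simplified cost model (an assumption): time spent compressing plus
    time needed to transfer the compressed bytes at the given bandwidth.
    """
    start = time.perf_counter()
    compressed = compress(sample)
    compress_time = time.perf_counter() - start
    transfer_time = len(compressed) / bandwidth_bytes_per_s
    return compress_time + transfer_time, len(compressed)


def select_algorithm(sample, bandwidth_bytes_per_s):
    """Pick the candidate with the lowest estimated messaging time."""
    best_name, best_time = None, float("inf")
    for name, compress in CANDIDATES.items():
        estimate, _ = estimate_messaging_time(
            compress, sample, bandwidth_bytes_per_s
        )
        if estimate < best_time:
            best_name, best_time = name, estimate
    return best_name, best_time


if __name__ == "__main__":
    # A repetitive sample message on a slow (congested) link.
    sample = b"telecom signalling payload " * 4000
    name, seconds = select_algorithm(sample, bandwidth_bytes_per_s=1000.0)
    print(f"selected {name}, estimated {seconds:.3f} s")
```

On a slow link the transfer time dominates, so a compressing candidate is selected for repetitive data; on a fast link the compression time dominates and sending uncompressed tends to win. Re-running `select_algorithm` when the sample content or the bandwidth estimate changes corresponds to the re-evaluation mentioned in the abstract.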

  • Sammanfattning (Summary)

    The telecommunication industry currently faces a major challenge in which communication performance and short delivery times are becoming increasingly important for positioning oneself amid growing competition. In this licentiate thesis, we describe how to observe, model and finally improve the communication performance of telecommunication systems and other large industrial computer systems. We also show how to shorten the total development time by using model systems for performance evaluation in early parts of the development process.

    The first research contribution is a case study with an efficient method for continuously reading out hardware characteristics from a telecom system in production. We have focused on techniques with low impact on the observed system, which makes them suitable for investigations in performance-critical production environments. We use the extracted hardware characteristics in our second research contribution, where we have created an execution model that runs on a smaller lab system. The goal of the model is 1) to shorten the time between development start and performance tests and 2) to create a better test environment for characteristics tests. In the third and final research contribution, we present a method for performance improvement by selectively compressing messages when doing so gives a shorter transfer time in the communication system. Several compression algorithms are continuously evaluated, and the compression algorithm that gives the shortest transfer time is used for a majority of the messages. Changes in the message stream or the network utilization level are monitored continuously and used when evaluating the available compression algorithms.

    All software development and testing has been carried out on an industrial telecommunication system manufactured by Ericsson. All techniques are implemented for use in production environments, and the monitoring and modeling functionality is used continuously for troubleshooting the production system. The techniques we present in this thesis also provide a cost-effective solution, which is an important requirement for industrial systems.

  • To Karolinn

  • Acknowledgements

    First of all, I would like to thank my supervisors and co-authors, Björn Lisper, Sigrid Eldh and Andreas Ermedahl, for your patience and helpful discussions during my studies. I would also like to express my gratitude to my manager, Magnus Schlyter, who has always supported me throughout the work on this thesis. The work presented in this Licentiate thesis has been funded by Ericsson and the Swedish Knowledge Foundation (KK-stiftelsen) through the ITS-EASY program at Mälardalen University.

    Furthermore, thanks to all students in the ITS-EASY research group, with whom I share the ups and downs of studying for a PhD: Apala Ray, Daniel Hallmans, Daniel Kade, David Rylander, Eduard Paul Eniou, Fredrik Ekstrand, Gaetana Sapienza, Kristian Wiklund, Markus Wallmyr, Mehrdad Saadatmand, Melika Hozhabri, Sara Dersten, Stephan Baumgart, and Tomas Olsson.

    I would also like to thank my additional co-authors: Björn Lisper, Sigrid Eldh, Andreas Ermedahl, Gordana Dodig-Crnkovic, Rafia Inam, Mikael Sjödin, Daniel Hallmans, Stig Larsson and Thomas Nolte. I really enjoyed working with you.

    I owe the greatest gratitude to my parents, my mother and father, who always wanted me to study hard to become something they never could.

    Finally and foremost, I want to express my endless love for Karolinn and our three daughters, Amelie, Lovisa and Elise. I would not have been able to write this thesis without your support and encouragement.

    Marcus Jägemar

    Sigtuna, May 2016

  • List of Publications

    Included Publications

    A  Marcus Jägemar, Sigrid Eldh, Andreas Ermedahl, Björn Lisper and Gabor Andai. Automatic Load Synthesis for Performance Verification in Early Design Phases. Technical Report, 2016. [68] This technical report, quoted in Chapter 7, is an extension of the already published papers C [64] and E [65] and the technical report I [63].

    B  Marcus Jägemar, Sigrid Eldh, Andreas Ermedahl and Björn Lisper. Automatic Message Compression with Overload Protection. In press: Journal of Systems and Software, 2016. [67] This paper, quoted in Chapter 8, is an extension of the already published paper G [66].

    Changes to Included Publications

    Papers A and B are quoted in full but have been reformatted to fit the layout of this thesis. Chapter 5 includes the related work sections of both papers. Similarly, Chapter 6 contains the future work from both papers.

    Other Publications

    C  Marcus Jägemar, Sigrid Eldh, Andreas Ermedahl and Björn Lisper. Towards Feedback-Based Generation of Hardware Characteristics. In Proceedings of the International Workshop on Feedback Computing, 2012. [64]

    D  Rafia Inam, Mikael Sjödin and Marcus Jägemar. Bandwidth Measurement using Performance Counters for Predictable Multicore Software. Proceedings of the International Conference on Emerging Technologies and Factory Automation (ETFA12), 2012. [58]

    E  Marcus Jägemar, Sigrid Eldh, Andreas Ermedahl and Björn Lisper. Automatic Multi-Core Cache Characteristics Modelling. In Proceedings of the Swedish Workshop on Multicore Computing, Halmstad, 2013. [65]

    F  Daniel Hallmans, Marcus Jägemar, Stig Larsson and Thomas Nolte. Identifying Evolution Problems for Large Long Term Industrial Evolution Systems. In Proceedings of IEEE International Workshop on Industrial Experience in Embedded Systems Design, Västerås, 2014. [54]

    G  Marcus Jägemar, Sigrid Eldh, Andreas Ermedahl and Björn Lisper. Autonomous Feedback Controlled Message Compression. In Proceedings of Computers, Software and Applications Conference (COMPSAC), Västerås, 2014. [66]

    H  Marcus Jägemar and Gordana Dodig-Crnkovic. Cognitively Sustainable ICT with Ubiquitous Mobile Services - Challenges and Opportunities. In Proceedings of the International Conference on Software Engineering (ICSE), Firenze, Italy, 2015. [62]

    Other Technical Reports

    I  Marcus Jägemar, Sigrid Eldh, Andreas Ermedahl and Björn Lisper. Technical Report: Feedback-Based Generation of Hardware Characteristics, 2012. [63]

  • Key Concepts

    Table 1 lists the most common abbreviations used throughout this thesis.

    Key Concept               Description

    2G (GSM)                  The second-generation telecom network (1991), which introduced digital communication.

    3G                        The third-generation telecom network (1998), which enabled large-scale digital communication with increased bandwidth and service availability.

    3GPP                      A standardization organization created by the telecommunication industry. 3GPP aims to create a global standard for the development and maintenance of telecommunication systems.

    4G (LTE)                  Long Term Evolution, the fourth-generation telecommunication network (2008), with increased capacity.

    Action Research (AR)      A research method where the researcher is an active part of an incremental procedure (plan, act/observe and reflect), which is repeatedly used to improve the object being investigated. AR was first expressed in 1946 by Lewin [83].

    ASIC                      Application-Specific Integrated Circuits are circuits that can be pre-programmed with specific functionality.

    Capacity                  As specified by the Oxford English Dictionary: “Ability to receive or contain; holding power”. In this thesis, we use capacity to describe the maximal capability of a resource.

    Compression Ratio         The compression ratio is denoted $cr = size_{uncompr} / size_{compr}$. A high $cr$ means that the compressed data is small relative to the uncompressed data.

    COTS                      Commercial Off-The-Shelf devices do not need to be tailored for a specific need; they can be bought from device manufacturers that produce common hardware for many purposes.

    CPI                       Cycles Per Instruction is a metric for determining the performance of a computer system. An average estimation explains how large a part of the total execution can be attributed to different execution parts, such as cache misses, branch misses and TLB misses. Eyerman, Eeckhout and Karkhanis provide a good explanation of a modern CPI structure in their paper [40].

    Five Nines                99.999% uptime, which results in a maximum of approx. 5 min downtime per year.

    FPGA                      Field-Programmable Gate Arrays are generic circuits that can be programmed at runtime with new functionality.

    HW                        Abbreviation for hardware, which means all physical parts in the network, including computers, cables, circuit boards, etc.

    ICT                       Information and Communications Technology, which makes it possible for people to communicate and easily access information.

    Low-intrusive Monitoring  The monitoring mechanism has no noticeable effect on the behavior or performance of the monitored system.

    Node                      A computer designed for message processing, which is part of a telecommunication system.

    Performance               As specified by the Oxford English Dictionary: “The quality of execution of such an action, operation, or process; the competence or effectiveness of a person or thing in performing an action; spec. the capabilities, productivity, or success of a machine, product, or person when measured against a standard.” [93]. More specifically, a quantifiable metric of how well a particular action is performed.

    PID Controller            Proportional-Integral-Derivative controller [12].

    Production Node           A node that is running at a customer site, handling real end-user traffic.

    Superscalar Processors    Processors in which low-level instructions can be executed in parallel to achieve higher performance, typically more than one instruction per clock cycle. The first commercial appearance was in 1988 with the Intel i960CA [85].

    SW                        As specified by the Oxford English Dictionary: “The programs and procedures required to enable a computer to perform a specific task, as opposed to the physical components of the system” [93].

    Test Node                 Test nodes are typically smaller than production nodes and usually only accessible by corporate personnel. Economic reasons and the need to keep debugging simple drive the demand for small test nodes.

    Table 1: Key concepts used in the context of this thesis.
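Two of the definitions in Table 1 are simple arithmetic and can be checked with a short sketch. The function names are hypothetical, `zlib` is used only as a stand-in compressor, and the 365.25-day year is an assumption; none of these come from the thesis.

```python
import zlib

# Assumption: an average year of 365.25 days.
MINUTES_PER_YEAR = 365.25 * 24 * 60


def compression_ratio(uncompressed: bytes, compressed: bytes) -> float:
    """cr = size_uncompr / size_compr, following the Table 1 definition."""
    return len(uncompressed) / len(compressed)


def max_downtime_minutes_per_year(availability: float) -> float:
    """Maximum yearly downtime in minutes for an availability fraction."""
    return (1.0 - availability) * MINUTES_PER_YEAR


if __name__ == "__main__":
    payload = b"ABCD" * 1000  # highly repetitive, so it compresses well
    cr = compression_ratio(payload, zlib.compress(payload))
    downtime = max_downtime_minutes_per_year(0.99999)
    print(f"cr = {cr:.1f}")
    print(f"five nines allow about {downtime:.2f} min of downtime per year")
```

With these assumptions, five nines corresponds to roughly 5.26 minutes of downtime per year, which matches the "approx. 5 min" figure in the table. A `cr` above 1 means the compressed message is smaller than the original.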

  • Contents

    I Thesis

    1 Introduction
        1.1 Monitoring a Production System
        1.2 Modeling a Production System
        1.3 Improving the Communication System
        1.4 Outline

    2 Background
        2.1 Telecommunication Standards
        2.2 Telecommunication Services
        2.3 Industrial Systems
        2.4 Deploying Our Target System
        2.5 System Details

    3 Research Summary
        3.1 Research Questions
            3.1.1 System Monitoring
            3.1.2 System Modeling
            3.1.3 Improving System Performance
        3.2 Delimitations
        3.3 Achievements
            3.3.1 System Monitoring
            3.3.2 System Modeling
            3.3.3 System Improvement
            3.3.4 Message Compression
        3.4 Research Methodology
        3.5 Threats to Validity
            3.5.1 Construct Validity
            3.5.2 Internal Validity
            3.5.3 Conclusion Validity
            3.5.4 Method Applicability

    4 Contributions
        4.1 Publication Mapping
        4.2 Publication Hierarchy and Timeline
        4.3 Paper A (Based on Papers C, E and I)
        4.4 Paper B (Based on Paper G)

    5 Related Work
        5.1 System Monitoring
        5.2 System Modeling
        5.3 Message and Data Compression
        5.4 Adaptive Compression

    6 Conclusion and Future Work
        6.1 Conclusion
        6.2 Future Work

    Bibliography

    II Included Papers

    7 Automatic Load Synthesis for Performance Verification in Early Design Phases
        7.1 Introduction
        7.2 Background
        7.3 Method
            7.3.1 Method Details
        7.4 Target System
            7.4.1 Target System Details
        7.5 Implementation
            7.5.1 The Characteristics Monitor
            7.5.2 The CPI Stack
            7.5.3 The Load Controller
            7.5.4 Generating L1 I-cache Misses
            7.5.5 Generating L1 and L2 Data Cache Misses
            7.5.6 Experimental Setup
        7.6 Results
            7.6.1 Running The Test Application With The Load Generator
            7.6.2 Production vs. Modeled Characteristics
            7.6.3 System Performance Measurement
            7.6.4 Performance Prediction When Switching OS
        7.7 Related Work
        7.8 Future Work
        7.9 Conclusions
        References

    8 Automatic Message Compression with Overload Protection
        8.1 Introduction
            8.1.1 Definitions
        8.2 Problem Formulation and System Model
        8.3 Adaption
            8.3.1 The Communication Procedure
            8.3.2 Network Measurements
            8.3.3 Compression Measurements
            8.3.4 Selecting the Best Compression Algorithm
            8.3.5 Compression Throttling
        8.4 Test System Setup
            8.4.1 The Test System
            8.4.2 Compression Algorithms
            8.4.3 Putting it All Together
            8.4.4 Real-World Compression Throttling
        8.5 Results
            8.5.1 Automatic Compression
            8.5.2 Algorithm Selection Methods
            8.5.3 Automatic Algorithm Selection for Changing Message Streams
            8.5.4 Overload Handling
        8.6 Related Work
        8.7 Future Work
        8.8 Conclusions
        References

  • I

    Thesis

  • More and better collaboration between academia and the software industry is an important means of achieving the goals of more studies with high quality and relevance and better transfer of research results.

    — D. Sjøberg, T. Dybå, M. Jørgensen [111]

  • 1 Introduction

    We have investigated how to improve the communication performance of a large-scale telecommunication system [13] with a major market share [121]. Our most important driving force is the ever-increasing demand for higher communication capacity. Mobile operators are compelled to make significant investments in more efficient and powerful telecommunication equipment to meet the requests from end-users. For a telecommunication equipment manufacturer, it is becoming increasingly important to enhance system performance continuously, both in current implementations and by developing new infrastructure. We describe the findings from our work on increasing the capacity of a large-scale telecommunication system. We have focused on two ways to improve the communication performance.

    The first improvement area we investigated is how to achieve higher system capacity by increasing the release rate for new software and hardware. Our method is to reduce the development time by running performance verification earlier in the development process. Many development processes run performance verification at the end of the development phase. Our suggestion is to monitor the hardware characteristics of production systems (Section 1.1) and then synthesize a hardware usage model (Section 1.2). By using this model, it is possible to test a large part of the performance of newly developed software during the design phase, thus reducing the total development time.

    As a second improvement area, we have designed, implemented and used a characteristics measurement tool to systematically monitor and improve the performance of selected subsystems. In this thesis, we have addressed one performance problem where we have reduced the round-trip message time through selective message compression (Section 1.3).

    1.1 Monitoring a Production System

    We have implemented a characteristics monitoring tool intended to run at customer sites. Our goal with the monitoring tool was to gain a better understanding of real-world systems by sampling hardware (HW) characteristics.

    Our monitor samples HW events from the CPU or any other low-level HW component. We have grouped these events into sets that each represent a certain type of behavior, for example, cache usage, TLB usage, or cycles per instruction.

    Running a monitoring tool in a production environment poses special restrictions and requirements, such as:

    • It must be possible to run the monitor concurrently with the production system.

    • The monitor must have a low probe effect [43] since it is not allowed to affect the behavior and performance of the production system.

    • The monitor must be able to capture long time intervals because the system behavior changes slowly depending on end-customer usage.

    We have addressed the production environment constraints by being very restrictive when implementing the monitoring application. First, we implemented our application as simply as possible. It is vital that no undesired behavior or faults occur when running in a sensitive environment. Second, we have chosen a low HW event sampling frequency (1 Hz) to reduce the probe effect. This sampling frequency is sufficient for the slowly changing behavior of our target system.
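
    A low-rate sampling loop of this kind can be sketched as follows. This is a minimal illustration, not the monitoring tool itself: the `read_counters` callable, the counter names, and the helper functions are hypothetical stand-ins for whatever interface exposes cumulative HW event counts on a given platform.

```python
import time
from typing import Callable, Dict, Iterator


def sample_deltas(read_counters: Callable[[], Dict[str, int]],
                  n_samples: int,
                  period_s: float = 1.0,
                  sleep: Callable[[float], None] = time.sleep
                  ) -> Iterator[Dict[str, int]]:
    """Yield per-interval event counts computed from cumulative readings.

    read_counters is a hypothetical callable returning cumulative event
    counts; the default period of 1.0 s corresponds to 1 Hz sampling.
    """
    previous = read_counters()
    for _ in range(n_samples):
        sleep(period_s)  # the monitor is idle between samples
        current = read_counters()
        yield {name: current[name] - previous[name] for name in current}
        previous = current


def cycles_per_instruction(delta: Dict[str, int]) -> float:
    """Derive CPI for one interval; 0.0 if no instructions were counted."""
    instructions = delta["instructions"]
    return delta["cycles"] / instructions if instructions else 0.0
```

    Keeping the loop to one read and one subtraction per event per second is one way to keep the probe effect small. The `sleep` parameter is injectable only so that the sketch can be exercised without waiting in real time.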

    1.2 Modeling a Production System

    We have devised a method that automatically synthesizes an HW characteristics model from data obtained by the monitoring tool, see Section 1.1. The model can replicate the HW usage of the production system.

    Our goal was to create an improved test suite consisting of an HW characteristics model together with a functional test suite. Our assumption was that a test suite covering both the functional and the characteristics perspective should improve testing in the early stages of system development. Improving the test suite should also make it possible to discover bugs, primarily performance-related ones, earlier in the development process. Finding bugs in the early design phases aligns well with the goal of reducing the total system development time, since bug fixing becomes much more difficult and time-consuming the longer a bug remains after its introduction.

    Our method uses a Proportional-Integral-Derivative (PID) controller [12] to automatically synthesize the model from the HW characteristics data obtained through our monitoring tool. No manual intervention is needed. The overall method is generic and supports any hardware characteristic. The system we have investigated is I/O-bound and mostly limited by cache and memory bandwidth. We have implemented one PID control loop per characteristics entity. In our model, we have used L1 instruction, L1 data and L2 data cache usage to represent the behavior of the system.
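
    The idea of one control loop per characteristic can be illustrated with a textbook discrete PID controller. The sketch below is an assumption-laden illustration rather than the thesis implementation: the gains, the `plant` callable (a stand-in for a load generator whose miss rate is proportional to its intensity), and the characteristic names are all hypothetical.

```python
from typing import Callable, Dict, Tuple


class PIDController:
    """Textbook discrete PID controller in positional form."""

    def __init__(self, kp: float, ki: float, kd: float) -> None:
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, target: float, measured: float, dt: float = 1.0) -> float:
        """Return the new actuator value for one control step."""
        error = target - measured
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def synthesize_load(targets: Dict[str, float],
                    plant: Callable[[str, float], float],
                    gains: Tuple[float, float, float] = (0.002, 0.005, 0.001),
                    steps: int = 100) -> Dict[str, float]:
    """Drive each characteristic towards its target with its own PID loop.

    plant(name, intensity) is a hypothetical load generator model that
    returns the measured value produced at the given intensity.
    """
    controllers = {name: PIDController(*gains) for name in targets}
    measured = {name: 0.0 for name in targets}
    for _ in range(steps):
        for name, target in targets.items():
            intensity = controllers[name].update(target, measured[name])
            measured[name] = plant(name, intensity)
    return measured
```

    With a simple proportional plant such as `lambda name, u: 100.0 * u`, these example gains make each loop settle on its target. A real load generator would need gains tuned to its own response.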

    We have evaluated our monitoring and modeling method by synthesizing a model for L1 instruction, L1 data, and L2 data cache misses according to the hardware characteristics extracted from a running production system. We have successfully tested the model on a test node together with an unmodified functional test suite. Our experiments show that using our characteristics model during the test of a production system bug fix causes the detected message round-trip time (RTT) to increase by 10.8%. Using the traditional performance measurement tests results in a 0.75% RTT increase, which may be too small a change to be detectable in an automated test suite.

    1.3 Improving the Communication System

    We have devised and implemented a mechanism that automatically finds and uses the compression algorithm that provides the shortest message Round-Trip Time (RTT).

    Our goal, when performing this work, was to improve the communication performance of our target system. We had already implemented the monitoring tool (Section 1.1) and the characteristics model (Section 1.2), and could use these tools for performance measurements.

    We added a software metric to our monitoring tool, measuring message RTT. We could deduce that 1) the message RTT varied depending on the network congestion level and 2) the hardware usage varied but was relatively low under certain conditions. Our assumption was that we could trade computational capacity for increased messaging capacity by using message compression. We defined some critical considerations, such as:

    • The compression algorithm must be selected automatically because the message content can change over time and depends on the location of system deployment.

    • Our mechanism should only use message compression if there are computational resources to spare, since other co-located services should not starve.

    • Our mechanism must handle overload situations gracefully and resume message compression once the system has returned to normal operation.
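
    The last two considerations can be illustrated with a small hysteresis switch that disables compression under high CPU load and re-enables it only after the load has dropped well below that level. This is a sketch under stated assumptions: the class name and both thresholds are hypothetical and are not taken from the thesis.

```python
class CompressionThrottle:
    """Enable or disable compression based on CPU load, with hysteresis.

    The high and low water marks are illustrative values only.
    """

    def __init__(self, high_water: float = 0.85, low_water: float = 0.60) -> None:
        if not 0.0 <= low_water < high_water <= 1.0:
            raise ValueError("require 0 <= low_water < high_water <= 1")
        self.high_water = high_water
        self.low_water = low_water
        self.enabled = True

    def update(self, cpu_load: float) -> bool:
        """Feed the latest CPU load (0.0 to 1.0); return whether to compress."""
        if self.enabled and cpu_load >= self.high_water:
            self.enabled = False  # overload: stop spending cycles on compression
        elif not self.enabled and cpu_load <= self.low_water:
            self.enabled = True   # load has recovered: resume compression
        return self.enabled
```

    Using two thresholds instead of one avoids rapid toggling when the load hovers around a single limit.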

    Our implementation automatically selects the most efficient compression algorithm depending on the current message content, CPU load and network congestion level. We have evaluated our implementation by using production system communication data gathered at customer sites and replayed in a lab (with explicit customer consent). Our experiment shows that the automatic compression mechanism produces a 9.6% reduction in RTT and that it is resilient to manually induced overload situations.
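
    The selection step can be sketched as choosing the codec with the lowest estimated messaging time, where the estimate is the measured compression and decompression time plus the transfer time of the compressed payload. The sketch below makes several assumptions: it uses three codecs from the Python standard library as stand-ins for the algorithms evaluated in the thesis, it models the network as a single bandwidth figure, and the function names are hypothetical.

```python
import bz2
import lzma
import time
import zlib
from typing import Dict, Tuple

# Stand-in codecs: name -> (compress, decompress). "none" sends data as-is.
CODECS = {
    "none": (lambda data: data, lambda data: data),
    "zlib": (zlib.compress, zlib.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}


def measure(payload: bytes) -> Dict[str, Tuple[int, float]]:
    """Return {codec: (compressed size in bytes, codec time in seconds)}."""
    results = {}
    for name, (compress, decompress) in CODECS.items():
        start = time.perf_counter()
        packed = compress(payload)
        restored = decompress(packed)
        elapsed = time.perf_counter() - start
        if restored != payload:
            raise ValueError(f"{name} did not round-trip the payload")
        results[name] = (len(packed), elapsed)
    return results


def select_algorithm(measurements: Dict[str, Tuple[int, float]],
                     bandwidth_bytes_per_s: float) -> str:
    """Pick the codec with the lowest estimated total messaging time."""
    def total_time(item):
        size, codec_time = item[1]
        return codec_time + size / bandwidth_bytes_per_s
    return min(measurements.items(), key=total_time)[0]
```

    For example, `select_algorithm(measure(sample), bandwidth)` would be re-run whenever the message content or the observed network capacity changes. On a fast link the estimate tends to favor "none", while on a congested link a stronger codec can win.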

    1.4 Outline

    The thesis consists of two major parts. The first part puts our research into its context and explains the method we have used. The second part contains the scientific papers covered in the thesis.

    Part I starts in Chapter 1 with an introduction to performance benchmarking and modeling of the hardware behavior of industrial systems. The thesis continues in Chapter 2 with further explanations of our target system. We describe standards and functionality supported by the telecommunication system we have investigated. We also describe system setup, design, and structure.

    In Chapter 3, we give a detailed summary of our research problems, research questions, and research methodology. A summary of our contributions is presented in Chapter 4. We further contextualize this thesis by reviewing related work in Chapter 5. Chapter 6 concludes Part I of the thesis by describing findings and pointing to future work.

    Part II begins with Chapter 7, where Paper A describes how to monitor and model parts of a large-scale industrial system. Chapter 8 includes Paper B, which describes how to improve the performance of a telecommunication system by using online message compression.

  • I believe that many events in my work and life have been a matter of luck or accident. But I am also aware of several occasions on which I explicitly made choices to step off the obvious path, and do something that others thought odd or worse... I have come to think of these events as ‘detours’ from the obvious career paths stretching before me. Frequently these detours have become the main road for me. There are obvious costs to such detours. Other choices might have made me richer, more influential, more famous, more productive, and so on. But I like what I am doing, even though the path has involved a lot of wandering through uncharted territory.

    — L.D. Brown¹

    ¹ Quoted from the book by M. Brydon-Miller, D. Greenwood and P. Maguire [20]

  • 2 Background

    In this chapter, we further describe our target system. We start by listing telecommunication standards (Section 2.1) and how they relate to current and future telecommunication services (Section 2.2). The platform we have worked with supports various standards spanning from 2G (GSM) via 3G (UMTS, WCDMA) and 4G (LTE) and further towards the emerging 5G standard. The main driver for new communication standards is the growing demand for higher communication bandwidth. Both traffic applications and remote control of equipment require low message latency and power-efficient communication.

    We continue, in Section 2.3, by defining our view of large-scale industrial systems. Such systems have common attributes such as 1) low acceptance of system faults, 2) many simultaneously deployed software and hardware generations within one system, 3) a long lifetime spanning several decades, 4) very large size and complexity, and 5) continuous development over the complete system lifespan.

    Section 2.4 illustrates our production system, which is an example of a large-scale industrial system. We show several deployment scenarios and their effect on system complexity. A complete production system ranges from a single circuit board with one CPU up to multiple circuit boards with a total of several thousand CPUs.

    We conclude this chapter, in Section 2.5, with a detailed description of our target system. The system we have investigated has a layered structure, uses many different programming languages, and has been developed continuously over several decades. It is a very large, fault-tolerant system with high requirements on uptime and robustness.

    | Telecom. standard | Max downlink speed | First introduced | Main features |
    |---|---|---|---|
    | 1G (NMT, C-Nets, AMPS, TACS) | - | Early 1980s | Several different analog standards for mobile voice telephony. |
    | 2G (GSM) | 14.4 kbit/s circuit switched, 22.8 kbit/s packet data [45] | 1991 | The first mobile phone network using digital radio. Introduced services such as SMS. |
    | → GPRS | 30–100 kbit/s | 2000 | Increased bandwidth over GSM. |
    | → EDGE | 236.8 kbit/s | 2003 | Increased bandwidth over GSM-GPRS. |
    | 3G (UMTS, WCDMA) | 384 kbit/s | 2001 | Mobile music and other types of smartphone apps started to be used through more advanced smartphones, which changed awareness and increased communication bandwidth. |
    | → HSPA | 14.4–672 Mbit/s [90] | 2010 | Increased bandwidth over 3G. |
    | 4G (LTE) | 100 Mbit/s–1 Gbit/s | 2009 | Mobile video. |
    | 5G | 1 Gbit/s to many users simultaneously | 2018 | Massive deployment of high bandwidth to mobile users, smart homes, high-definition video transmission. |

    Table 2.1: The most important telecommunication standards and their communication bandwidth, linked to the main features introduced by each standard.

    2.1 Telecommunication Standards

    Telecommunication systems are complex because they implement several communication standards. Standards define how systems should interact and are a fundamental tool when connecting systems from different manufacturers. The standards continuously evolve to reflect customer demands, which drives equipment manufacturers to continually develop new features and system improvements. Several standards execute concurrently for efficiency reasons. See Table 2.1 for a list of telecommunication standards and their main features.

    Groupe Spécial Mobile (GSM) [120] (2G) was introduced in 1991 and provided the second generation of mobile communication. It was the first commercial and widely available mobile communication system that supported digital communication [97]. The GSM system was an astonishing commercial success, with 1 billion subscribers in 2002 [123] and 3.5 billion in 2009 [52]. The introduction of GSM changed the way people communicate by allowing a significant portion of the population in industrialized countries to use mobile phones. Several extensions to the GSM standard, such as GPRS and EDGE, further increased the communication bandwidth, thus allowing the implementation of even more complex services.

    In 2001, the third-generation (3G) standard was introduced in response to customer demands for further increased bandwidth. The 3G standard is also known as Universal Mobile Telecommunication System (UMTS).

    A fourth generation (4G) of the telecommunication standard, also called Long Term Evolution (LTE) [61], was introduced to the market in 2009. At this point, a large part of the industrialized world had adopted the “always-online” paradigm. Society as a whole looks favorably on mobile broadband and social networking services [62], which demand higher capacity in the telecommunication infrastructure.

    Today, in 2016, we are on the brink of the next telecommunication standard (5G). It is estimated to be released to the market in 2020 with substantial improvements compared to LTE [14]. The first improvement is a massive increase in bandwidth when there are many simultaneous users. A drastically reduced latency (below 1 ms) is needed to support traffic safety and industrial infrastructure processes [36]. There is also an increasing demand for reduced energy consumption [21], both to make the network environmentally friendly [37] and to make it possible to install network nodes in remote places [38] with a scarce power supply.


    | Telecom. Standard | Max Downlink Speed | First Introd. | Main Features |
    |---|---|---|---|
    | 1G (NMT, C-Nets, AMPS, TACS) | - | Early 1980s | Several different analog standards for mobile voice telephony. |
    | 2G (GSM) | 14.4 kbit/s circuit switched, 22.8 kbit/s packet data [45] | 1991 | The first mobile phone network using digital radio. Introduced services such as SMS. |
    | → GPRS | 30–100 kbit/s | 2000 | Increased bandwidth over GSM. |
    | → EDGE | 236.8 kbit/s | 2003 | Increased bandwidth over GSM-GPRS. |
    | 3G (UMTS, WCDMA) | 384 kbit/s | 2001 | Mobile music and other types of smart-phone apps started to be used through more advanced smart-phones, which changed awareness and increased communication bandwidth. |
    | → HSPA | 14.4–672 Mbit/s [90] | 2010 | Increased bandwidth over 3G. |
    | 4G (LTE) | 100 Mbit/s–1 Gbit/s | 2009 | Mobile video. |
    | 5G | 1 Gbit/s to many users simultaneously | 2018 | Massive deployment of high bandwidth to mobile users, smart homes, high definition video transmission. |

    Table 2.1: The most important telecommunication standards and their communication bandwidth linked to the main features introduced by the standard.
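    To give a feeling for what the bandwidth steps in Table 2.1 mean in practice, the following minimal Python sketch computes the ideal time to download a file at the nominal peak downlink rate of some of the standards. The rates are taken from Table 2.1; the 5 MB file size, the names `PEAK_DOWNLINK_KBIT_S` and `transfer_seconds`, and the choice of which end of each range to use are assumptions made for illustration only. Real transfer times are longer because of protocol overhead, latency and shared capacity.

```python
# Nominal peak downlink rates in kbit/s, read from Table 2.1.
# Where the table gives a range, the upper bound is used, except for LTE,
# where the lower bound (100 Mbit/s) is used. These choices are assumptions.
PEAK_DOWNLINK_KBIT_S = {
    "2G (GSM, circuit switched)": 14.4,
    "GPRS": 100.0,
    "EDGE": 236.8,
    "3G (UMTS)": 384.0,
    "4G (LTE)": 100_000.0,
}

FILE_SIZE_BYTES = 5_000_000  # hypothetical 5 MB file, not from the thesis


def transfer_seconds(size_bytes: int, rate_kbit_s: float) -> float:
    """Ideal transfer time in seconds, ignoring overhead, latency and congestion."""
    bits = size_bytes * 8
    bits_per_second = rate_kbit_s * 1000
    return bits / bits_per_second


if __name__ == "__main__":
    for standard, rate in PEAK_DOWNLINK_KBIT_S.items():
        seconds = transfer_seconds(FILE_SIZE_BYTES, rate)
        print(f"{standard:28s} {seconds:10.1f} s")
```

    Under these assumptions the same file that occupies a GSM circuit-switched link for the better part of an hour is transferred in well under a second on LTE, which illustrates why each generation enabled a new class of services.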


    [Figure 2.1 (a) Voice and data traffic: traffic in exabytes (y-axis) plotted by year, 2010–2019 (x-axis). Series: Voice Communication, Mobile Phone Data, Mobile Computer Data.]

    [Figure 2.1 (b) Mobile application traffic: traffic in exabytes (y-axis) plotted by year, 2010–2019 (x-axis). Series: Sum; Video; Audio, Web, File sharing, Social Networking Services.]

    Figure 2.1: World-wide market outlook for mobile traffic 2010–2019 [34]. Previously published in Paper H [62].

    [Figure 2.2: available apps (in millions) and downloaded apps (in billions) plotted over time, January 2008 to July 2014. Series: Available Apple Apps, Available Google Apps, Apple Downloads, Google Downloads.]

    Figure 2.2: Download statistics for mobile phone applications [1,9,10,59,118]. Previously published in Paper H [62].


    2.2 Telecommunication Services

    The introduction of mobile phones quickly made voice communication the most important service. It was the natural way to extend the already existing wire-bound voice service into the mobile era. Voice services have now reached their peak from a capacity perspective [34], see Figure 2.1a. It is also apparent that data communication is rapidly increasing for both mobile phones and mobile computers. A report [35] by Ericsson Consumer Lab attributes the increased data usage to the following main usage areas:

    • Streaming services are quickly gaining acceptance among the population and include on-demand services such as music, pay-per-view TV and movies. Ericsson estimates that mobile video will be one of the most requested services in the coming years (2010–2019), see Figure 2.1b.

    • Home appliance monitoring is increasing rapidly. Examples include water flood monitoring, heat and light control, refrigerator warning systems, coffee-machine refill sensors, and entry and leave detection.

    • Data usage is expected to increase further at a rapid pace with the use of Information and Communication Technology (ICT) devices such as mobile phones, watches, tablets and laptops. There is a common acceptance to use ICT devices for a large portion of daily activities [24], such as bank transactions, purchases and navigation. The use of devices is expected to further increase the utilization of telecommunication networks [129]. The extraordinary increase in the download rate of mobile apps indicates the acceptance of mobile usage among people, see Figure 2.2.

    • Vehicle communication to support self-driving cars [36] and automated vehicle fleet management [37].

    • Reduced network latency is needed to implement industrial infrastructure [36] operations over wireless networks.

    The overall increase in geographical and population coverage, paired with new services such as the ones described above, will contribute to an enormous growth in mobile data traffic. In 2014, geographical coverage was mainly concentrated in Europe and the USA, with Asia, primarily India and China, quickly catching up and surpassing them [37]. In 2015 there were approximately 7.4 (3.4)¹ billion mobile subscribers world-wide, and it is estimated that there will be 9.1 (6.4) billion subscriptions by 2021 [37]. Increasing both geographical and population coverage causes an unprecedented change in global mobile data usage, which is currently one of the biggest challenges for network operators.

    ¹ The number of advanced smartphone subscriptions is given in parentheses.
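    The subscription figures quoted above (2015 and the estimate for 2021) can be turned into an implied average yearly growth rate. The sketch below does this with the standard compound annual growth rate formula; the function name `cagr` is hypothetical, and the assumption that growth is evenly compounded over the six years is ours, not a claim made in [37].

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two values `years` apart."""
    if start <= 0 or years <= 0:
        raise ValueError("start and years must be positive")
    return (end / start) ** (1.0 / years) - 1.0


# Figures in billions of subscriptions, as quoted in the text from [37].
ALL_2015, ALL_2021 = 7.4, 9.1
SMART_2015, SMART_2021 = 3.4, 6.4
YEARS = 2021 - 2015

if __name__ == "__main__":
    print(f"All subscriptions:        {cagr(ALL_2015, ALL_2021, YEARS):.1%} per year")
    print(f"Smartphone subscriptions: {cagr(SMART_2015, SMART_2021, YEARS):.1%} per year")
```

    Under this even-compounding assumption, smartphone subscriptions grow roughly three times as fast per year as subscriptions overall, which is consistent with the shift from voice to data traffic described in this section.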


    [Figure 2.3: diagram of an industrial system made up of many nodes connected through internal interfaces. Standardized interfaces connect the industrial system to the Internet and to other industrial systems.]

    Figure 2.3: Industrial systems interact with surrounding systems using standardized interfaces. We have concentrated on node-internal characteristics and performance improvements for internal interfaces.

    2.3 Industrial Systems

    The system we have targeted, and performed our experiments on, is an execution platform handling several generations of telecommunication standards. The platform, developed by Ericsson, is called Cello or Connectivity Packet Platform (CPP) [3,76]. The platform is generic and supports many existing communication standards [28], including 3G and LTE. The telecommunication system we have investigated in this thesis shares similar properties with other large-scale industrial systems. We believe that other systems can also use our research results since they share a similar system structure and behavior. A simplified view of the telecommunication system we have investigated is shown in Figure 2.3. The system is distributed over many computers, denoted nodes. Internal nodes that implement a subset of the system functionality do not necessarily use standardized communication protocols. Performance improvements can, therefore, be achieved using proprietary protocol implementations. Standardized communication is, of course, necessary for external communication. We have defined [54] behavioral patterns that are common to industrial and telecommunication systems, for example:

    • There is a low acceptance for system downtime.

    • There are multiple concurrent hardware and software generations.

    • The lifetime spans several decades.

    • The size and system complexity cause long lead-times when developing new functionality.

    • There is substantial internal communication between nodes inside the industrial system. External connections often use standardized protocols, for example 3GPP for telecommunication systems, see Figure 2.3.

    We have tried to generalize our research results as far as possible. In general, our research results should be applicable to many other systems sharing the same structure and behavior as the telecommunication system we have investigated. Some industrial systems are located in large server facilities, providing easy access for engineers and scientists. Other industrial systems are located in “friendly” places where a support engineer can access them and extract any information needed. Telecommunication systems are typically deployed in a different environment. Most network operators have their own infrastructure where the telecommunication nodes are located. Support and maintenance personnel are often employed by the operator. In the rare cases when the operator receives support from the equipment manufacturer, the manufacturer is not given full access to the nodes. Such restrictions make it difficult to monitor hardware characteristics for production nodes. Operators are traditionally very restrictive towards running diagnostics, test programs or monitoring tools that are not verified as production-level software.

    Physical access restrictions also make it vital to have adequate error handling that gathers enough information when a fault occurs. It is not possible to retrieve additional troubleshooting information at a later time, meaning that all necessary information must be packaged together with the trouble report. The scenario of restricted node access is one aspect we have tried to address in the work leading up to this thesis. System developers have always demanded hardware characteristics measurements for production nodes, but it has been hard to obtain such information.


    Figure 2.4: Many circuit boards (to the left) are interconnected to form a cabinet (to the right). Courtesy of Ericsson 2016.

    Figure 2.5: Several interconnected cabinets construct a large-scale telecommunication system. One node in Figure 2.3 can vary in size from a single circuit board up to several cabinets. Courtesy of Ericsson 2016.


    Figure 2.6: Complex lab test environment. Courtesy of Ericsson 2016.

    2.4 Deploying Our Target System

    The physical layout of a telecommunication system is governed by strict rules. One cabinet, to the right in Figure 2.4, consists of three vertically mounted sub-racks. Each sub-rack holds up to 20 circuit boards, illustrated to the left in Figure 2.4. In total, a cabinet holds up to 20 × 3 = 60 circuit boards, depending on the desired configuration. Several cabinets can be connected to form a large-scale node, see Figure 2.5. Each circuit board can have several CPUs, each with tens of cores. In total, the largest systems can consist of thousands of CPUs.
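    The capacity arithmetic above can be written out as a small Python sketch. The three sub-racks per cabinet and the maximum of 20 boards per sub-rack come from the text; the number of CPUs per board and cores per CPU are hypothetical placeholder values (the text only says "several" CPUs with tens of cores), and the class name `CabinetLayout` is ours.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CabinetLayout:
    subracks_per_cabinet: int = 3   # from the text
    boards_per_subrack: int = 20    # upper bound from the text
    cpus_per_board: int = 4         # hypothetical value for illustration
    cores_per_cpu: int = 16         # hypothetical value for illustration

    @property
    def boards_per_cabinet(self) -> int:
        return self.subracks_per_cabinet * self.boards_per_subrack

    def cpus(self, cabinets: int) -> int:
        """Total CPUs in a fully populated node of `cabinets` cabinets."""
        return cabinets * self.boards_per_cabinet * self.cpus_per_board

    def cores(self, cabinets: int) -> int:
        """Total cores in a fully populated node of `cabinets` cabinets."""
        return self.cpus(cabinets) * self.cores_per_cpu

    def cabinets_for_cpus(self, target_cpus: int) -> int:
        """Smallest number of full cabinets providing at least `target_cpus` CPUs."""
        return math.ceil(target_cpus / self.cpus(1))


if __name__ == "__main__":
    layout = CabinetLayout()
    print("Boards per cabinet:", layout.boards_per_cabinet)
    print("CPUs per cabinet:  ", layout.cpus(1))
    print("Cabinets for 1000 CPUs:", layout.cabinets_for_cpus(1000))
```

    With these placeholder values, a handful of fully populated cabinets is enough to reach the "thousands of CPUs" scale mentioned above; other board configurations change the numbers but not the structure of the calculation.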

    It is possible to deploy the system at several different levels, which is particularly useful for testing purposes. Running one board by itself provides the most basic level of system, used for low-level testing. A slightly bigger system is achieved when at least two boards are interconnected to form a small cluster. This level of system is useful for verifying cluster functionality. Much more complex testing scenarios can be formed by configuring larger nodes, such as the one shown in Figure 2.6. These types of nodes are seldom available for software design purposes since they are very costly. Large-scale nodes are mainly used when testing complex traffic scenarios and for performance-related verification.

