Redes de Sensores sem Fio Autonômicas: Abordagens, Aplicações e Desafios (Autonomic Wireless Sensor Networks: Approaches, Applications, and Challenges)

II Brazilian Conference on Critical Embedded Systems, PUC-Campinas, SP, Brazil, May 21-25, 2012


Description

The main objective of this short course is to introduce attendees to the concepts of wireless sensor networks (WSNs), communication protocols for WSNs, and autonomic computing. In addition, applications focused on environmental monitoring, precision agriculture, and security and defense will also be presented.




II Critical Embedded Systems School (CES-School)
II Brazilian Conference on Critical Embedded Systems (CBSEC 2012)

Preface

The II Brazilian Conference on Critical Embedded Systems (CBSEC 2012) aims to join academia and industry to discuss major technical and practical issues in the development of critical embedded systems. The first edition took place in May 2011 in São Carlos (Brazil).

In this second edition, the emphasis is on aerial and terrestrial autonomous vehicles. The main objective is to boost the capabilities of academia and industry in teaching, training, research and development in the area through paper presentations, short courses, tutorials, a student workshop and an exhibition. A comprehensive display of relevant scientific and technological tools, applications and methodologies with social and economic impact in strategic areas such as agriculture, security and defense, automotive, aviation, satellite and environmental protection will be put together and discussed from the 20th to the 25th of May, 2012, in Campinas (Brazil).

The II Critical Embedded Systems School (CES-School) is a joint event of CBSEC. In this edition, we received 12 short course proposals, of which four were selected for presentation. In addition, two advanced courses and one international tutorial were invited. All of them explore themes of interest to academics and professionals involved in the development of critical embedded systems.

We thank the Pontifícia Universidade Católica de Campinas for hosting the second edition of CES-School within CBSEC. Finally, we welcome the speakers and participants of CES-School 2012. We wish everyone a great conference!

Ellen Francine Barbosa (ICMC/USP)

Itana Maria de Souza Gimenes (DIN/UEM)

CES-School 2012 Chairs


Table of Contents

Tutorial

Interaction Control for Contact Robotics

Neville Hogan (MIT-USA)

Invited Courses

The “Why” and “How” of Software Safety Analysis in a Cross-Domain Review

Sören Kemmann (Fraunhofer/IESE)

Model-Driven Engineering of Complex Embedded Systems: Concepts and Tools

Flávio R. Wagner, Francisco A. M. Nascimento, Marcio F. S. Oliveira (UFRGS / University of Paderborn)

Short Courses

Introdução ao Desenvolvimento de Software Embarcado

Alexandra C. P. Aguiar, Sérgio J. Filho, Felipe G. Magalhães, Fabiano P. Hessel

(PUC-RS)

Introdução a Sistemas Embarcados e Projeto baseado em Plataformas

Marcio S. Oyamada, Alexandre A. Giron, João A. Martini (UEM, UNIOESTE)

Introdução aos Sistemas Embarcados utilizando FPGAs

Edilson R. R. Kato, Emerson C. Pedrino (UFSCar)

Redes de Sensores sem Fio Autonômicas: Abordagens, Aplicações e Desafios

Alex S. R. Pinto, Gustavo M. Araújo, José M. Machado, Adriano Cansian, Carlos Montez (UNESP - Rio Preto)


Interaction Control for Contact Robotics
Neville Hogan
Sun Jae Professor of Mechanical Engineering, Professor of Brain and Cognitive Sciences, Massachusetts Institute of Technology

Abstract
Contact robotics—close physical contact and cooperation between robots and humans—requires reliable, robust control of interaction. I will review some of the interesting and perhaps unique challenges of interaction control. Most control theory is permeated by a "signals" perspective: each system component is described as a mathematical operator that unilaterally determines its output (signals) as a function of its input (signals)—but not vice-versa. Composition of operators is straightforward and the result is modularity: the behavior of a component is essentially unaffected by its assembly into a system, thereby dramatically simplifying the design of complex machines. Unfortunately, the interactions due to physical contact are usually bi-lateral—each system affects the other. The "controlled system" blends the robot dynamics with those of the contacted object, which may be poorly or incompletely known. As a result the "signals" perspective doesn't work well. I will review the mechanical physics of interaction, define what is meant by a "port" and show its usefulness for establishing impedance or admittance control. Drawing heavily on concepts from physical systems, I will review how a port-based perspective yields simple solutions for stabilizing contact, coping with (and taking advantage of) redundancy, and selecting optimal behavior for different tasks.
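The impedance-control idea the abstract reviews can be illustrated with a one-degree-of-freedom sketch. All gains, masses and setpoints below are illustrative, not taken from the tutorial: the controller renders a virtual spring-damper at the interaction port, which keeps contact with a stiff surface stable even though the environment dynamics are unknown to the controller.

```python
# Minimal 1-DOF impedance controller: the robot renders a virtual
# spring-damper between its end-effector and a motion setpoint, so the
# interaction port behaves like a mechanical impedance rather than a
# pure position source. (Illustrative parameters, not from the tutorial.)

def impedance_force(x, v, x0=0.0, v0=0.0, k=100.0, b=20.0):
    """Commanded actuator force F = k*(x0 - x) + b*(v0 - v)."""
    return k * (x0 - x) + b * (v0 - v)

def simulate_contact(steps=2000, dt=0.001, m=1.0, wall=0.05, k_wall=5000.0):
    """Semi-implicit Euler simulation of a 1 kg mass driven toward a stiff
    wall at x=wall. The impedance controller keeps contact stable: the mass
    settles against the wall instead of bouncing."""
    x, v = 0.0, 0.0
    x0 = 0.10  # setpoint deliberately placed *inside* the wall
    for _ in range(steps):
        f_ctrl = impedance_force(x, v, x0=x0)
        f_wall = -k_wall * (x - wall) if x > wall else 0.0  # unilateral contact
        a = (f_ctrl + f_wall) / m
        v += a * dt
        x += v * dt
    return x, v
```

With these gains the mass settles where the virtual spring balances the contact stiffness, just past the wall surface, rather than chattering against it.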

Background papers

Hogan, N. and S. P. Buerger (2004). Impedance and Interaction Control. In T. R. Kurfess (ed.), Robotics and Automation Handbook. CRC Press: 19-1 to 19-24.

Fasse, E. D. and N. Hogan (1995). Control of physical contact and dynamic interaction. 7th International Symposium on Robotics Research, Germany.

Mussa-Ivaldi, F. A. and N. Hogan (1991). "Integrable Solutions of Kinematic Redundancy Via Impedance Control." International Journal of Robotics Research 10(5): 481-491.

Hogan, N. (1988). "On the Stability of Manipulators Performing Contact Tasks." IEEE Journal of Robotics and Automation 4(6): 677-686.

Hogan, N. (1985). "Impedance Control: An Approach to Manipulation." ASME Journal of Dynamic Systems, Measurement, and Control 107: 1-24.


Model-Driven Engineering of Complex Embedded Systems: Concepts and Tools

Flavio Rech Wagner*, Francisco A. M. Nascimento* and Marcio F. S. Oliveira*†
* Institute of Informatics, Federal University of Rio Grande do Sul (UFRGS), Porto Alegre, Brazil
† Cooperative Computing and Communication Laboratory (C-LAB), University of Paderborn, Paderborn, Germany

Abstract—This paper starts by presenting a brief history of engineering methods up to Model-Driven Engineering (MDE). Then, it introduces the basic principles of MDE, including the concepts of models, metamodels, transformations between models, and domain-specific languages (DSLs). Two classes of tools can be identified. The first is the framework required to support MDE of any kind. It supports different operations and common tasks, independently of the development domain, and relies strongly on standards. As such, some standard MDE approaches, for example MDA (Model Driven Architecture), Software Factories, and MIC (Model Integrated Computing), are explained, and we provide a short survey of different technologies supporting MDE: MOF and Ecore for metamodeling, UML and DSLs for modeling, and QVT, ATL, Xtend, and Xpand as transformation languages. A second class of tools adopts an MDE framework to provide Domain Specific Engineering Tools (DSETs), which aggregate domain-specific knowledge to define relations between models and how these models can be refined. Focusing on complex embedded systems, some DSETs for the development of these systems are described. The paper shows in detail a DSET which uses an MDE framework based on the Eclipse Modeling Project and OMG standards. Finally, the paper presents the application of a DSET for embedded systems in a complete development flow. For this, we start by defining a sample embedded system and show how the system requirement specification can be refined through different development phases using an MDE approach. The development process relies on different tools, which support multiple semi-automatic or automatic development tasks.

I. INTRODUCTION

Nowadays we are surrounded by devices containing hardware and software components. These devices support many different domains, such as telecommunication, avionics, automobile, space, military, medical care, and others. They are inserted into our daily lives, in cell phones, in cars as controllers for multiple subsystems (e.g., ABS, EPS, etc.), in electronic toys, in blood pressure measurement systems, and so on. In short, they are found everywhere, and so they are called embedded systems, as they are information processing systems embedded into products where the processing system is not the main goal or functionality of the product [1].

The ever-growing complexity of modern embedded systems requires the utilization of more hardware and software components to implement the functions incorporated into a single system. Such increasing functionality leads to a growing design complexity, which must be managed properly because, besides stringent requirements regarding power, performance and cost, time-to-market also hinders the design of embedded systems.

In order to overcome the difficulty of raising the abstraction level and to improve the automation of the design from the initial specification to the final system, research efforts look for modeling methods, formalisms, and suitable abstractions to specify, analyze, verify, and synthesize embedded systems in a fast and precise way.

The main motivation to use models in the design of embedded systems is abstraction. Abstraction helps us to understand a complex system by hiding information irrelevant to the specific problem being solved. However, abstraction alone does not improve development. Accuracy is required, so that models truly represent a specific system view. To be effective, a model must clearly communicate its intent and must be easy to understand and to develop [2].

A prominent effort that attempted to use models in order to raise abstraction and automate development tasks originated the Computer-Aided Software Engineering (CASE) tools. CASE tools provide graphical representations for fundamental programming concepts and automatically generate implementation code from them. The main purpose of these tools was to reduce the effort of manually coding, debugging and porting programs. However, due to the limited platforms existing at that time, the code to be generated was too complex for the available technology. Moreover, the graphical representations were too generic and poorly customizable, and thus they could not support many application domains. Nowadays, these limitations have been drastically reduced by object-oriented languages and development frameworks, which make the reuse of software components easier. However, these development frameworks and platforms are extremely complex and evolve quickly, causing a fragmented view due to the multiple tool integrations required for developing new applications [3].

Although models are used in every engineering domain, only recently have models started to play a central role in the development process of software and embedded systems [2]. Model-Driven Engineering (MDE) [4] has been proposed in order to improve complexity management and also the reusability of previously developed or specified artifacts. The MDE method raises the design abstraction level and provides mechanisms to improve the portability, interoperability, maintainability, and reusability of models.

In our MDE approach, we use only MOF concepts (Meta Object Facility, a standard representation for metamodels and models proposed by the OMG [5]) to define our internal representation. Thus, as our metamodels conform to MOF, the representation can take advantage of the concept of transformations between models to implement DSE (Design Space Exploration), formal verification and co-synthesis tasks.

Our MDE approach defines internal models conforming to MOF-based metamodels proposed to represent: applications, capturing functionality by means of processes communicating through ports and channels; platforms, indicating available hardware/software resources; mappings from applications onto platforms; and implementations, oriented to code generation and hardware synthesis. Additional metamodels and transformations extend this infrastructure to perform design-specific tasks such as DSE and verification.

We support a formal verification methodology. Using the MDE approach, we generate a MOF-based representation of a network of timed automata [6] from UML Class and Sequence diagrams. We use the network of timed automata as input to the UPPAAL model checking tool [7], which can validate the desired functional and temporal properties of the embedded system specification. Since the network of timed automata is automatically generated from UML models, the methodology is very useful for designers, making the debugging and formal validation of the system specification easier.
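The kind of safety property UPPAAL checks on the generated network of timed automata can be pictured, in a heavily simplified untimed form, as reachability over the automaton's location graph. The toy automaton below is invented for illustration, not generated from any UML model:

```python
from collections import deque

def reachable(transitions, start, bad):
    """Untimed simplification of the safety checks a model checker such as
    UPPAAL performs: explore the automaton's location graph breadth-first
    and report whether a 'bad' location (a violated property) is reachable
    from the start location."""
    seen, frontier = {start}, deque([start])
    while frontier:
        loc = frontier.popleft()
        if loc == bad:
            return True
        for nxt in transitions.get(loc, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return False

# Locations of a toy protocol automaton; 'error' would model a violation
automaton = {"idle": ["sending"], "sending": ["acked"], "acked": ["idle"]}
```

UPPAAL additionally tracks clock constraints symbolically, which is what makes the temporal properties above checkable; this sketch keeps only the discrete part.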

Moreover, we offer an MDE methodology for the co-synthesis problem [8], which is integrated with the formal verification approach. This way, after the formal validation of the desired properties, the validated system specification is used as input to our MDE co-synthesis tool. This methodology therefore exploits the MDE approach to automatically generate a correct-by-construction implementation for a specific platform.

Other approaches do not consider the influence of the structural features of the UML model on the communication behavior of a specified application. Our internal design representation, in turn, also captures the hierarchy and communication structure of the UML model in the form of a graph. This way, we represent the control and data flow dependencies in a form convenient for the co-synthesis algorithms.

During the development of complex embedded systems, a wide range of design alternatives arises from the different design activities. The combination of alternative designs and stringent requirements unveils a complex design space, which the design team must evaluate under a reduced time-to-market. Design Space Exploration (DSE) consists in systematically searching for different design candidates by mapping an application onto an architectural platform. Each candidate corresponds to a trade-off regarding design requirements and constraints.

Concerning the DSE process, all methods discussed in Section IV restrict the design space according to the activity to be performed. Moreover, the generation of candidate designs is implemented internally, usually as a function programmed directly in the tool. As a result, no extension mechanisms are provided, requiring multiple tools to support each design activity. Moreover, for most approaches either the set of constraints is restricted to those already implemented by the tool, or the method supports only limited constraint constructs.

The method proposed in our MDE approach overcomes those restrictions by defining a design space abstraction using a categorical graph product [9]. Besides the automatic construction of the design space, performed by the product of graphs, this abstraction provides a common representation for multiple design activities. Moreover, the specification of a metamodel using a well-adopted technology allows us to exploit the MDE approach, such that model-to-model transformation rules are used to implement any user constraints, improving the flexibility of the DSE method.
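As a rough illustration of the graph-product idea, the construction below is the standard categorical (tensor) product of directed graphs, not necessarily the exact formulation of [9]: candidate mappings of application tasks onto platform resources arise as product nodes, and an edge survives only where both the application and the platform provide one.

```python
from itertools import product

def tensor_product(g1, g2):
    """Categorical (tensor) product of two directed graphs, each given as a
    (nodes, edges) pair: a product node pairs an application task with a
    platform resource, and a product edge exists only where both factor
    graphs have an edge."""
    n1, e1 = g1
    n2, e2 = g2
    nodes = [(a, b) for a, b in product(n1, n2)]
    edges = [((u1, u2), (v1, v2))
             for (u1, v1) in e1 for (u2, v2) in e2]
    return nodes, edges

# Illustrative inputs: two communicating tasks; a CPU and a HW accelerator
# connected in both directions (self-edges model on-resource communication).
app = (["t1", "t2"], [("t1", "t2")])
plat = (["cpu", "hw"],
        [("cpu", "hw"), ("hw", "cpu"), ("cpu", "cpu"), ("hw", "hw")])
space = tensor_product(app, plat)
```

Each product edge is one feasible mapping of the communication t1→t2 onto a platform link; model-to-model transformation rules would then prune nodes and edges that violate user constraints, which is how the method restricts the design space.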

The remainder of the text is organized as follows. Section II provides the background on MDE. Section III presents technologies developed to support MDE, such as modeling languages, transformation languages, and engines. Section IV presents an overview of MDE approaches for embedded systems design. Sections V and VI present the MDE framework for embedded systems design under development at UFRGS. Section VII presents a case study, which illustrates the methodology described in the previous sections. Section VIII, finally, discusses future trends and gives final remarks.

II. MODEL-DRIVEN ENGINEERING BACKGROUND

The MDE approach was proposed to overcome the limitations of object technology in raising the abstraction level and dealing with the increasingly complex and rapidly evolving systems we are developing today. Proposing that "Everything is a model", MDE promotes the paradigm shift required for the necessary evolution [10]. Although the central concept of this proposal still has multiple definitions, a consensual definition of model and modeling is presented in [11]:

"Modeling, in the broadest sense, is the cost-effective use of something in place of something else for some cognitive purpose. It allows us to use something simpler, safer or cheaper than reality instead of reality for some purpose. A model represents reality for the given purpose; the model is an abstraction of reality in the sense that it cannot represent all aspects of reality. This allows us to deal with the world in a simpler manner, avoiding the complexity, danger and irreversibility of reality." [11]

Since the main principle of MDE is that "Everything is a model", models play a central role in the development process, thus defining the scope of MDE proposed in [4]. The basic concepts supporting the MDE principle are system, model, metamodel, and the relations between them, such that a model represents a system and conforms to a metamodel [10]. These concepts are organized in 3+1 layers [10], as illustrated in Figure 1.

Formally, a model in MDE is a graph composed of elements (vertices and edges), where each element corresponds to a concept in a reference graph (the metamodel), as defined below:

Definition 1: A directed graph G = 〈NG, EG, ΓG〉 consists of a set of distinct nodes NG; a set of edges EG; and a mapping function ΓG : EG → NG × NG.

Definition 2: A model M = 〈G, ω, µ〉 is a tuple where G = 〈NG, EG, ΓG〉 is a directed graph; ω is itself a model, named the reference model of M, associated with a graph Gω = 〈Nω, Eω, Γω〉; and µ : NG ∪ EG → Nω is a function associating the elements (nodes and edges) of G with the nodes of Gω (the metamodel).

Fig. 1. Basic concepts and layered organization

Fig. 2. Model transformation in the context of MDE
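Definitions 1 and 2 translate almost literally into code. The sketch below (in Python for brevity; the class and variable names are ours, not the paper's) represents a graph per Definition 1 and checks the conformance condition of Definition 2, namely that µ maps every element of G to a node of the reference model's graph:

```python
class Graph:
    """Directed graph per Definition 1: a node set, an edge set, and a
    mapping gamma from each edge to its (source, target) node pair."""
    def __init__(self, nodes, edges, gamma):
        self.nodes = set(nodes)
        self.edges = set(edges)
        self.gamma = gamma  # edge -> (source node, target node)

class Model:
    """Model per Definition 2: a graph G, a reference model omega (the
    metamodel), and mu associating each element of G with a node of
    omega's graph."""
    def __init__(self, graph, ref_model, mu):
        self.graph, self.ref, self.mu = graph, ref_model, mu

    def conforms(self):
        """Check mu : N_G ∪ E_G -> N_omega, i.e., every node and edge of
        the model maps onto a node (concept) of the metamodel graph."""
        if self.ref is None:  # the top layer closes on itself in practice
            return True
        meta_nodes = self.ref.graph.nodes
        return all(self.mu[e] in meta_nodes
                   for e in self.graph.nodes | self.graph.edges)

# Tiny illustrative metamodel: its nodes play the role of concepts.
meta_graph = Graph({"Class", "Association"}, set(), {})
metamodel = Model(meta_graph, None, {})

# A model conforming to it: two classes linked by one association.
g = Graph({"Person", "Order"}, {"places"},
          {"places": ("Person", "Order")})
mu = {"Person": "Class", "Order": "Class", "places": "Association"}
m = Model(g, metamodel, mu)
```

Mapping an edge ("places") to a metamodel node rather than a metamodel edge mirrors the definition exactly: µ's codomain is Nω.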

A metamodel is a model that is a reference model for other models, so that it defines the classes of models that can be produced conforming to it. It is an abstraction which collects the concepts of a certain domain and the relations between these concepts.

MDE models are operated on through transformations, aiming at the automation of some development activity. Such transformations define clear relationships between models [10] and are usually specified in a specialized language that operates on (graph) models. Following the description in [12], a model transformation means converting one or more source models into a target model, where all models must conform to some metamodel, including the model transformation itself, which is also a model. Figure 2 illustrates the concept of model transformation in the MDE context.

Model transformation plays a key role in MDE and has many applications, as enumerated in [13]:

• Generating low-level models from high-level ones
• Generating development artifacts (e.g., configuration files and source code)
• Mapping and synchronizing models
• Creating query-based views of a system
• Model refactoring
• Reverse engineering
• Verification, etc.

Fig. 3. MDE context: principles, standards and tools

Based on the possible applications of model transformations, they can be classified in:

• Model-to-Model, when the source and target of the transformation are models, e.g., a transformation from UML to a relational database (RDB) schema or from a Platform Independent Model (PIM) to a Platform Specific Model (PSM);
• Model-to-System, characterizing a generation from model to system, which can include program code or any other artifact, e.g., UML to Java or Simulink to C++;
• System-to-Model, meaning reverse engineering, such as from Java code to a UML model or from Java code to a business model.
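A Model-to-Model transformation of the first kind, the classic UML-to-RDB example mentioned above, can be sketched as follows. Real engines such as ATL or QVT express such rules declaratively over metamodels; here, purely for illustration, a rule is an ordinary function over dictionary-encoded model elements:

```python
def uml_to_rdb(classes):
    """Toy Model-to-Model transformation: each source 'class' element
    becomes a table, each attribute a column, following the classic
    UML-to-RDB example. (Encodings and type map are illustrative.)"""
    schema = []
    for cls in classes:
        table = {"table": cls["name"].lower(),
                 "columns": [{"name": "id", "type": "INTEGER", "pk": True}]}
        for attr, typ in cls["attrs"].items():
            sql_type = {"String": "VARCHAR(255)", "int": "INTEGER"}.get(typ, "TEXT")
            table["columns"].append({"name": attr, "type": sql_type, "pk": False})
        schema.append(table)
    return schema

# Source model: one UML-like class; target model: its relational schema.
source = [{"name": "Person", "attrs": {"name": "String", "age": "int"}}]
target = uml_to_rdb(source)
```

Both source and target here conform (informally) to their own metamodels, a class metamodel and a schema metamodel, which is what the diagram of Figure 2 makes explicit.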

A more detailed survey on model transformation approaches is presented in [13].

III. TECHNOLOGICAL FRAMEWORKS

Technological frameworks [14] are tools that support different operations and common tasks of MDE independently of the application domain. Such tools rely on standards, such as MDA, MIC, and Software Factories, in order to generalize the manipulation of models, providing facilities such as persistence, repository management, copying, etc. Figure 3 illustrates the relationship between the principles presented in Section II, standards, and tools.

An overview of some standards and tools is presented in the next subsections.

A. MDE Standards

1) Model-Driven Architecture: Model-Driven Architecture (MDA) is a standard proposed by the OMG for software development. The main purpose of MDA is the abstraction of platforms, so that business models can be reused as the technological platform evolves. MDA integrates different OMG standards, such as MOF for metamodeling, UML for system modeling, SPEM for process modeling, and QVT for model transformation. In order to separate business and application models from the underlying platform, MDA advocates three modeling dimensions (viewpoints):

Page 8: Redes de sensores sem fio autonômicas: abordagens, aplicações e desafios

• The Computation Independent Model (CIM) focuses on the required features of the system and on the environment where it must operate;
• The Platform Independent Model (PIM) focuses on business functionality and behavior, which are unlikely to change from one platform to another;
• The Platform Specific Model (PSM) describes platform-specific details integrated with the elements of the PIM.

The relationship between PIM and PSM in MDA can be established by automatic or semi-automatic mechanisms specifying a mapping between these models. MDA suggests that this mapping can be specified using QVT, so that a transformation engine can generate the automatic transformation from PIM to PSM. The languages used to express these models are defined by means of metamodels using MOF, which are able to represent abstract and concrete syntaxes, as well as the operational semantics of the modeling language. Originally, MDA was proposed for enterprise architectures that use platforms such as Java2EE, CORBA, VisiBroker, and WebSphere. However, since with the MDA approach the development of systems can focus on aspects that do not involve implementation details, many other domains have started considering MDA, such as real-time and embedded systems. Therefore, MDA and the experience with OMG standards are at the origins of MDE.

2) Model Integrated Computing: Model Integrated Computing (MIC) [15] is an initiative of Vanderbilt University. In this approach, models representing different views capture the designer's understanding of the computer-based system, including the information process, physical architecture, and operating environment. A formal specification of the dependences and constraints among these models allows the generation of tools to solve an entire class of problems. MIC proposes a two-step development process. In the first step, a domain-independent abstraction is used to formally define a domain-specific environment and the required models, languages and tools. In the second step, three typical components delivered by the previous phase are used for system engineering:

• A graphical model builder is used to specify domain-specific models. Constraints explicitly defined at the meta level allow model testing.
• A model database stores domain-specific multiview models using a multigraph architecture.
• Model interpreters are used to synthesize executable programs from the domain-specific models and to generate data structures for the tools.

MIC has a strong influence on the principles of MDE, as it has a wider basis in systems engineering than MDA. Moreover, the two-step process advocated by MIC is close to the idea of technological frameworks as a basis of development for the Domain Specific Engineering Tools present in the MDE approach.

3) Microsoft Software Factories: The main idea behind Software Factories [16] is to introduce patterns of industrialization into software development. A Software Factory is "a software product line that provides a production facility for the product family by configuring extensible tools using a software template based on a software schema".¹

A Software Factory Schema describes the artifacts that comprise a software product. It is represented by a graph, where vertices are viewpoints and edges are relationships between viewpoints (mappings). Each viewpoint defines the tools and materials required by a concern at a specific abstraction level. Attached to a viewpoint, a micro process is defined for producing the artifacts described in the viewpoint. Such a process is constrained by preconditions, postconditions and invariants that must hold when the view is stabilized.

A Software Factory Template is the collection of DSLs, patterns, frameworks and tools described in the Software Factory Schema, which is made available to developers in order to create a specific software product.

B. MDE Tools

The MDE approach has practical relevance only if it can produce and transform models bringing considerably more benefits than current practices. Therefore, to enhance the value of models, they must become tangible artifacts, which can be simulated, verified, transformed, and so on, and the burden of keeping these models in synchronization with the produced system must be reduced [4].

Supporting tools are essential to provide all the benefits of MDE. This section describes some MDE tools, focusing on tools supported by the Eclipse Modeling Project (EMP).² EMP provides a unified set of modeling frameworks, tooling, and standards implementations.

1) Metamodeling/Abstract Syntax: As the model is the most important artifact in MDE, defining the class of models an MDE process must work on is one of the first steps. This is done by metamodeling, which defines the structured data types used to represent a system (the abstract syntax). In EMP, metamodels are defined conforming to ECORE, a metametamodel (layer 3 in Figure 1) defined by the Eclipse Modeling Framework (EMF). EMF is a projection of ECORE, and of the models conforming to it, into a Java API. It provides code generation facilities and tools for building model editors and for comparing, querying, persisting and validating models. As most tools in EMP are based on ECORE and EMF, and many other projects make use of EMF, ECORE is a de facto standard.

Besides the ECORE metametamodel and EMF, other metamodeling tools can be found. Kermeta³ is based on the OMG standard Essential MOF (EMOF), which originated from ECORE, and on KM3, a metametamodel proposed in [17]. MetaGME is a metamodeling tool which implements the metamodeling concepts of MIC. Originally, its metametamodel was called Multigraph Architecture; newer versions use UML class diagram notation and OCL for metamodeling.

¹ http://msdn.microsoft.com/en-us/library/ms954811.aspx
² http://www.eclipse.org/modeling/
³ http://www.kermeta.org/


2) Concrete Syntax: A concrete syntax for a DSML (Domain Specific Modeling Language) can be defined using the tools from the Eclipse Graphical Modeling Project. It provides tools, such as GMF Notation and Graphiti, to specify the concrete syntax and to generate an editor for expressing models graphically.

The concrete syntax of languages expressed as text can also be defined, using tools such as Xtext. Xtext provides a simple EBNF-like language for defining grammars, and a generator that creates a parser, an AST metamodel (implemented in EMF), and an Eclipse text editor for the defined language.
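What Xtext generates from a grammar (a parser plus an EMF-based AST) can be approximated by hand for a tiny entity DSL. Both the DSL and the parser below are invented for illustration; Xtext would derive all of this machinery from an EBNF-like grammar specification:

```python
import re

# Hand-rolled stand-in for a generated parser of a tiny entity DSL like:
#   entity Person { name: String  age: int }
# (The DSL itself is hypothetical, not an Xtext example.)
ENTITY = re.compile(r"entity\s+(\w+)\s*\{([^}]*)\}")
FEATURE = re.compile(r"(\w+)\s*:\s*(\w+)")

def parse(text):
    """Return a list of entity AST nodes: {'name': ..., 'features': [...]},
    roughly what Xtext would materialize as EMF objects."""
    ast = []
    for name, body in ENTITY.findall(text):
        feats = [{"name": n, "type": t} for n, t in FEATURE.findall(body)]
        ast.append({"name": name, "features": feats})
    return ast

ast = parse("entity Person { name: String  age: int }")
```

The point of the generated approach is precisely that none of this code is written by hand: grammar rules double as the definition of the AST metamodel.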

3) Model Development: For common general-purpose and domain-specific languages, there is no need to build new editors, as good tools are available, such as MagicDraw, Enterprise Architect and Rhapsody for modeling with UML. Simulink and Scade are DSMLs commonly used for control engineering and signal processing, and specialized tools for them are also provided. The Eclipse Model Development Tools provide model editors for some standards such as UML, XML, and OCL.

4) Model Transformation: Since model transformation is the key operation in MDE, many transformation engines and languages have been proposed. However, after the experience with the first languages, a discussion on classification [13] and quality metrics [18] is starting to take place in the research agenda, so that a standard with high adoption may arise.

EMP had many model-to-model transformation languages, but the efforts now concentrate on ATL and on a reference implementation of QVT, QVT Operational. Other languages are provided as Eclipse projects or Eclipse plug-ins, such as VIATRA2 and GReAT.

Model-to-text (Model-to-System) transformation is provided by EMP through three different template-based languages: Java Emitter Templates (JET); Acceleo, an implementation of the OMG standard MOF Model to Text Language; and Xpand, which was initially an openArchitectureWare component.
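Template-based model-to-text generation, the common mechanism behind JET, Acceleo and Xpand, boils down to expanding a text template against model elements. The sketch below is a plain-Python stand-in, not the syntax of any of those languages, and the entity model it expands is invented for illustration:

```python
# A text template with holes; {{ and }} are literal braces in the output.
TEMPLATE = """public class {name} {{
{fields}
}}"""

def generate_java(entity):
    """Expand the template for one model element; the field list is built
    by a nested expansion, as template languages typically allow."""
    fields = "\n".join(f"    private {f['type']} {f['name']};"
                       for f in entity["features"])
    return TEMPLATE.format(name=entity["name"], fields=fields)

# Illustrative model element (the kind a parser or editor would produce).
entity = {"name": "Person",
          "features": [{"name": "name", "type": "String"},
                       {"name": "age", "type": "int"}]}
code = generate_java(entity)
```

Dedicated model-to-text languages add navigation over the metamodel, protected regions for hand-written code, and file management on top of this basic expansion step.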

IV. MODEL-DRIVEN ENGINEERING OF COMPLEX EMBEDDED SYSTEMS

In [14] two classes of MDE tools are identified. The first class is called MDE Technology Frameworks, which support the MDE process by providing tools for different operations and common tasks, independently of the development domain, such as metamodeling, transformation engines and languages, debuggers, tracing, and other facilities. These tools rely strongly on standards. Some of them were presented in the previous section, such as the tools provided by the Eclipse Modeling Project. The second class of tools adopts an MDE framework to provide Domain Specific Application Development Environments (DSAEs), which aggregate domain-specific knowledge for defining relations between models and how these models can be refined. Generalizing this concept, we assume that Domain Specific Model-Driven Engineering Tools (DSMDETs) are those tools which rely on an MDE technology framework to engineer not only software but entire systems, which may also be composed of hardware, electrical, and mechanical parts. This section presents some DSMDETs for embedded system development.

The adoption of platform-independent design and executable UML has been widely investigated. For example, xtUML [19] defines an executable and translatable UML subset for embedded real-time systems, allowing the simulation of UML models and C code generation oriented to different microcontroller platforms. The Model Execution Platform (MEP) [20] is another approach based on MDA, oriented to code generation and model execution, as is the Framework for UML Model Behavior Simulation (FUMBeS) [21].

Other approaches improve the integration of the design tools into an MDE environment by defining metamodels and transformations on them that include some refinement. This line includes the DaRT (Data Parallelism to Real Time) project [22], [23], whose evolution produced the Gaspard2 framework. It proposes an MDA-based approach that has many similarities with ours in terms of metamodeling concepts. DaRT defines MOF-based metamodels to specify the application, the architecture, and software/hardware associations, and uses transformations between models to refine an association model. In the Gaspard2 framework [24], UML/MARTE models are used as input, and transformations to other tools provide support for co-synthesis, simulation and formal verification by translating its model into synchronous reactive languages. However, no automated DSE (Design Space Exploration) strategy based on these transformations is implemented, and the main focus is code generation for simulation at the TLM (Transaction Level Model) and RT (Register Transfer) levels. In this approach, each candidate solution is simulated at a different abstraction level, thus guiding the designer in the DSE activities.

The Aspect-oriented Model-Driven Engineering for Real-Time systems (AMoERT) methodology [25] proposes an automated integration of design phases for distributed embedded real-time systems, focusing on automation systems. The proposed approach uses MDE techniques together with Aspect-Oriented Design (AOD) and previously developed (or third-party) hardware and software platforms to design the components of distributed embedded real-time systems. AOD concepts allow a separate handling of functional and non-functional requirements, improving the modularization of the produced artifacts. In addition, the methodology is supported by the GenERTiCA code generation tool [25], which uses mapping rules for the automatic transformation of UML models into source code for software and hardware components, which can be compiled or synthesized by other tools, thus obtaining the realization/implementation of the distributed embedded real-time system. During the generation process, the tool includes the required implementation code to handle the specified aspects for non-functional requirements (model weaving).

Metropolis [26] is an infrastructure for electronic system design, in which tools are integrated through an API and a common metamodel. Following the platform-based approach [27], the Metropolis infrastructure captures application, architecture, and mapping using a proposed UML-platform profile [28]. Furthermore, its infrastructure is general enough to support different Models of Computation and to accommodate new ones. No automatic support for Design Space Exploration is provided by Metropolis, which proposes an infrastructure to integrate different tools. Nevertheless, the current simulation and verification tools integrated into Metropolis and the proposed refinement process can be used to manually perform some architectural explorations (task mapping, scheduling, hardware/software partitioning) and component configuration. Moreover, the refinement process allows the explicit exploration of application algorithms, which implement a higher-level specification.

Koski [29] is a UML-based framework to support MPSoC (Multi-Processor System-on-Chip) design. It is a library-based method, which implements a platform-based design. Koski provides tools for UML system specification, estimation, verification, and system implementation on FPGA. The Koski design flow starts with a requirement analysis, which specifies the application or architecture requirements and design constraints. Following the design flow, the application, architecture, and the initial mapping are specified as UML 2.0 models. A UML interface handles these models and generates an internal representation, which is used for architectural exploration. The architectural exploration is performed in two steps: the first one is static, fast, and less accurate; the second one is dynamic. At the end of the design flow, the UML models are used to generate code, and the selected components from the platform are linked to build the system.

Another complete environment for design space exploration is the MILAN [30] framework, with two exploration tools called DESERT [31] and HiPerE [32]. The focus of MILAN is the integrated simulation of embedded systems, so that it evaluates pre-selected candidate solutions. The hierarchical simulation provided by MILAN allows a designer to explore the design space at several abstraction levels, by using the DESERT and HiPerE tools. First, the DESERT tool uses models of aggregated system sub-components and constraints to automatically compose the embedded system through Ordered Binary Decision Diagrams (OBDD), based on a complete pre-characterization of components. Moreover, the DESERT tool performs design space pruning, reducing the number of candidate solutions. After that, HiPerE can be used for accurate system-level estimation, exploring the pruned design space. Finally, by using integrated simulation at lower abstraction levels, the designer can explore the remainder of the design space, then also performing platform tuning.

V. MODES: AN MDE FRAMEWORK FOR EMBEDDED SYSTEMS DESIGN

Our MDE approach to embedded systems design automation is supported by the Model-based Design for Embedded System (MoDES) Framework [33]. In this approach, the

Fig. 4. MODES Development Flow

engineer specifies the application independently from the platform, using UML as modeling language. MoDES provides the components System Designer and Application, Platform and Implementation Managers, which transform the UML models into internal models conforming to metamodels proposed to represent: applications, capturing functionality by means of processes communicating through ports and channels; platforms, indicating available hardware/software resources; mappings from applications onto platforms; and implementations, oriented to code generation and hardware synthesis. Additional metamodels and transformations extend this infrastructure to perform design-specific tasks such as DSE (Design Space Exploration) and verification. Figure 4 illustrates the MoDES infrastructure, including the models, conforming to metamodels with the same names, and the flow of transformations between tools, which provides support for DSE (H-SPEX) [34], estimation (SPEU) [35], formal verification (UPPAAL) [7], and co-synthesis (System Designer; Application, Platform and Implementation Managers).

We implemented the MoDES framework using the open-source Eclipse Modeling Framework (EMF) to define our Ecore-conformant metamodels, while openArchitectureWare is used to define transformations and the workflow between tools. The UML models can be specified in any editor that provides XMI compatibility with EMF tools, such as Eclipse UML2.

A. System Modeling

The proposed system development methodology adopts UML and the MARTE profile together with modeling guidelines to specify application, architecture, and mapping. As an example, consider a real-time embedded system dedicated to the automation and control of an intelligent wheelchair. The application structural model is specified using Class Diagrams. Figure 5 shows a partial class diagram for the movement


Fig. 5. Application Model: UML Class Diagram

Fig. 7. Architecture Models: Composite Diagram

control of the automated wheelchair.

The behavioral model is defined using Interaction Diagrams, containing loops and conditional execution, interaction between objects, and dependencies between execution scenarios. An Interaction Overview Diagram identifies and links the scenarios used to evaluate the system during the estimation process. For our example, an Interaction Overview Diagram specifies a parallel composition of three UML sequence diagrams, which are illustrated in Figures 6 (a), (b), and (c).

The allocated architectural components, such as processing units, memories, and communication buses, are defined in Composite Diagrams. The Composite Diagram can also be used to define the mapping from application to architecture, for example to specify in which processing unit a software element must execute, as illustrated in Figure 7. The Composite Diagrams are considered as constraints during the automatic DSE process.

Alternatively, a Domain Specific Language (DSL), defined to specify models for application, platform, and implementation, can be used instead of UML. To this purpose we use the Xtext feature of openArchitectureWare, which automatically generates the parser and a text editor for these DSLs as Eclipse plug-ins from an Extended Backus-Naur Form (EBNF) [36] specification.

B. Internal Application Metamodel

To represent an application in a standard way, the model captured from UML is translated into a common application model defined by our Internal Application Metamodel (IAMM), partly depicted in Figure 8.

Conforming to this metamodel, a system specification captures the functionality of the application in terms of a set of Modules. Each Module has ModuleDeclarations and

Fig. 8. Internal Application Metamodel

a ModuleBody. ModuleDeclarations are used to specify typed Channels, Ports, Signals, and Variables. These concepts come from hardware description languages, such as VHDL. Channels are used by Processes to send or receive messages. Ports interconnect the Modules. Signals are used to specify shared memories for processes. Variables correspond to the local memories of processes. A ModuleBody consists of Interconnections with other Modules and a ModuleBehavior, as well as sub-modules. The ModuleBehavior is captured in terms of a set of Processes, and each Process has a set of Actions, which represent the occurrence of UML events in the scenarios (UML sequence diagrams), as will be shown in the following.

The behaviors of the processes are associated with a Model of Computation (MoC). This association allows the translation from an abstract behavior description to a specific MoC and the execution of algorithms to automate design tasks. Currently, two MoCs are supported, and their metamodels extend the IAMM as described in Subsections V-C and V-D.
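The IAMM concepts above can be sketched as plain classes. This is an illustrative simplification in Python: the real metamodel is an Ecore model manipulated through EMF-generated code, and the wheelchair example values below are invented.

```python
from dataclasses import dataclass, field
from typing import List

# Illustrative sketch of the IAMM concepts described in the text;
# field names mirror the metamodel, the shapes are our assumption.

@dataclass
class Port:              # Ports interconnect Modules
    name: str
    type: str

@dataclass
class Channel:           # Channels are used by Processes to send/receive messages
    name: str
    type: str

@dataclass
class Process:
    name: str
    actions: List[str] = field(default_factory=list)  # UML events in the scenarios

@dataclass
class Module:
    name: str
    declarations: List[object] = field(default_factory=list)  # Channels, Ports, ...
    processes: List[Process] = field(default_factory=list)    # the ModuleBehavior
    submodules: List["Module"] = field(default_factory=list)

# A fragment of the wheelchair example, purely for illustration
wheelchair = Module("MovementControl",
                    declarations=[Port("sensorIn", "int")],
                    processes=[Process("ctrl", ["calcAngle", "move"])])
```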

C. Interaction Graph Metamodel

The control and data flow graph (CDFG) [8] of an application model is defined conforming to the metamodel presented in Figure 9.

Figure 9 represents an InteractionGraph, which consists of a set of IGNodes and IGEdges. Each IGNode can represent different kinds of control flow:
• IGInitialNode and IGFinalNode indicate the beginning and end of an InteractionGraph, respectively;
• IGForkNode and IGJoinNode represent parallel execution; and
• IGDecisionNode and IGMergeNode represent conditionals and loops.

There are also two kinds of executable nodes, which are subclasses of the IGMessageNode class: IGCallNode captures the sending of messages and IGReplyNode represents the reply messages in the UML sequence diagram.

For each UML sequence diagram SDm there is an InteractionGraph IGSDm = 〈V,E,K,L〉, which is a CDFG, where:


Fig. 6. UML Sequence Diagrams identified as: a) SD1; b) SD2; c) SD3

Fig. 9. Interaction Graph metamodel

• V is the set of nodes, representing the actions in the behavioral modeling;

• E is the set of edges, representing the data and control flow between the actions;

• K : V → {Initial, Final, Fork, Join, Merge, Decision, Call, Reply} is a function that indicates the type of each node; and

• L : V → {IGSDi} is a relation that associates an IGCallNode to another InteractionGraph and allows the capture of the behavioral hierarchy of the application.

An Interaction Overview Diagram links multiple Sequence Diagrams, and from this diagram an InteractionGraph IGapp = 〈V,E,K,L〉 is generated, representing the CDFG for the entire application.

Therefore, our IAMM captures structural aspects of an application model by using a hierarchy of modules and processes, as well as behavioral aspects by means of the actions of sending and replying to messages, where a message may execute some method in the corresponding object.
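The formalization above can be sketched as a small data structure. The node-kind function K and the hierarchy relation L follow the definition in the text; the class itself (and the example node names) are our illustration, not the EMF implementation.

```python
# Minimal sketch of an InteractionGraph as a typed CDFG, following the
# tuple <V, E, K, L> defined above. Names such as cn_m1 are illustrative.

KINDS = {"Initial", "Final", "Fork", "Join", "Merge", "Decision", "Call", "Reply"}

class InteractionGraph:
    def __init__(self, name):
        self.name = name
        self.V, self.E = set(), set()
        self.K = {}   # node -> kind (the typing function K)
        self.L = {}   # Call node -> sub-InteractionGraph (the hierarchy relation L)

    def add_node(self, v, kind, sub=None):
        assert kind in KINDS
        self.V.add(v)
        self.K[v] = kind
        if sub is not None:       # only Call nodes reference sub-graphs
            self.L[v] = sub

    def add_edge(self, u, v):
        self.E.add((u, v))        # data/control flow between actions

# One sequence-diagram graph and the application-level graph referring to it
sd1 = InteractionGraph("SD1")
sd1.add_node("i", "Initial")
sd1.add_node("cn_m1", "Call")
sd1.add_edge("i", "cn_m1")

ig_app = InteractionGraph("app")
ig_app.add_node("cn_ig1", "Call", sub=sd1)  # behavioral hierarchy via L
```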

D. Labeled Timed Automata Metamodel

Additionally, the functional behavior of a UML model is translated into a network of Labeled Timed Automata (LTA)

Fig. 10. Labeled Timed Automata metamodel

[6], conforming to the LTA metamodel illustrated in Figure 10.

The LTA metamodel captures all concepts introduced by the UPPAAL model checking tool [7]. According to this metamodel, a system consists of LTADeclarations, which are used to declare variables, functions, and channels, and LTAProcesses, which are instances of LTATemplates. Each LTATemplate corresponds to a timed automaton, which can also have LTADeclarations of local variables and functions. Each timed automaton is represented by a set of LTALocations, corresponding to the states of the automaton, and LTATransitions, corresponding to transitions between states, thus having source and target locations.

Each transition may have attributes such as:
• LTASelections, which non-deterministically bind a given identifier to a value in a given range when a transition is taken;
• LTAGuards: the transition is enabled in a state if and only if the guard evaluates to true;
• LTASyncronizations: transitions labeled with complementary synchronization actions (send and receive) over a common channel; and
• LTAUpdates: when the transition is taken, its update expression is evaluated and the side effect of this expression changes the state of the system.
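A minimal sketch of how one transition's guard, synchronization, and update interact, in the spirit of UPPAAL's semantics. This is not the UPPAAL tool's API; all names here are our own illustration.

```python
# Hedged sketch: a transition fires only if its guard holds and, when it
# carries a synchronization label, the complementary action is available
# on the common channel; the update's side effect changes the state.

def take_transition(state, guard, update, sync=None, channel_ready=frozenset()):
    if not guard(state):
        return None                       # transition disabled in this state
    if sync is not None and sync not in channel_ready:
        return None                       # no partner on the common channel
    new_state = dict(state)               # leave the original state untouched
    update(new_state)                     # side effect of the update expression
    return new_state

s = {"x": 0}
s2 = take_transition(s,
                     guard=lambda st: st["x"] == 0,
                     update=lambda st: st.update(x=1),
                     sync="a!", channel_ready={"a!"})
```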

The LTA model is used in the UPPAAL model checker to perform formal verification of specified properties of the system. This feature is very useful for the designer, since


Fig. 11. Internal Platform metamodel

the LTA model is automatically generated and can help the designer to debug and validate the specification.

E. Internal Platform Metamodel

In a platform-based design context, a large number of hardware and software components are provided and can be reused in system development. Such reusable components must be pre-characterized so that their Quality of Service values, such as performance, energy, memory footprint, and others, are acquired. This pre-characterized library dramatically shortens the design phases and reduces the uncertainty about the system properties, thus improving design productivity. The software component characterization is performed after the component code is compiled for the target architecture, since at this time a simulation/estimation tool can capture architectural information with high accuracy. The characterization of hardware components must be performed from adequate synthesized descriptions, to obtain values that are independent of technology and frequency, such as execution cycles and gate switchings per cycle (a measure for power consumption). In our methodology, the available hardware/software components and the characterization information are stored in a platform repository. Figure 11 shows our Internal Platform Metamodel (IPMM).

In our IPMM, a Platform contains different Components, which offer Services for the application. These Services must be pre-characterized in terms of Quality of Service (QoS), and this information is reused during system development. Our approach uses performance, energy, data memory, and program memory as QoS metrics. However, other metrics could also be used, thus extending the QoS concept.
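A sketch of the pre-characterized repository idea: components offer services with QoS values that can be queried during development. The component names and delay values are illustrative (they loosely echo Listing 2); the dictionary shape is our assumption, not the Ecore model.

```python
# Illustrative platform repository: component -> service -> QoS values.
platform = {
    "HwComp1": {"calcspeed": {"delay_cycles": 2},
                "calcangle": {"delay_cycles": 1}},
    "SwComp1": {"calcspeed": {"delay_cycles": 3},
                "calcangle": {"delay_cycles": 2}},
}

def providers(service):
    """Return (component, QoS) pairs able to implement a requested service."""
    return [(comp, svcs[service]) for comp, svcs in platform.items()
            if service in svcs]

# Pick the fastest provider of calcspeed using the pre-characterized delays
best = min(providers("calcspeed"), key=lambda p: p[1]["delay_cycles"])
```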

F. Mapping Metamodel

The Mapping Metamodel is responsible for describing the rules used to transform instances of the IAMM and IPMM into an instance of the Implementation Metamodel. Conforming to the Mapping Metamodel, presented in Figure 12, a Mapping model consists of a set of Transformations, whose Rules are specified by leftSides and rightSides. The

Fig. 12. Mapping metamodel

queries on the leftSide determine the source metamodel elements, which will be manipulated by the Action of the rule side, and the specified action is applied only when the specified Condition evaluates to true. Thus, a transformation rule may also change the elements of the source metamodel.

Similarly, the queries on the rightSide determine the target metamodel elements, which will be manipulated by the Action of the rule side, and the specified action is applied only when the specified Condition evaluates to true. Instead of defining our own transformation language as a concrete syntax for the Mapping Metamodel, we use the Xtend transformation language. Therefore, transformations in Xtend are considered instances of the Mapping Metamodel. The source models are the IAM and IPM, and the target is the Implementation Model. A Mapping model also defines the rules which guide the DSE process and prune the design space. By means of model-to-model transformations, the rules in the Mapping model manipulate instances of the DSE Metamodel to generate candidate designs during the DSE process. The Mapping model provides flexibility to specify constraints that directly handle the concepts of the design, such as processors, tasks, slots, voltage levels, and others.
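The query/condition/action shape of a rule can be sketched as follows. The real rules are Xtend transformations, so this Python analogue, with invented model shapes, is only illustrative.

```python
# Sketch of one transformation rule in the spirit of the Mapping Metamodel:
# a query selects source elements, a Condition gates the Action, and the
# Action produces elements in the target model.

def apply_rule(source_model, query, condition, action, target_model):
    for element in query(source_model):
        if condition(element):             # Condition must evaluate to true
            action(element, target_model)  # Action manipulates the target
    return target_model

# Invented source model: an IAM-like dict with two processes
iam = {"processes": ["ctrl", "logger"]}

impl = apply_rule(iam,
                  query=lambda m: m["processes"],
                  condition=lambda p: p == "ctrl",
                  action=lambda p, t: t.setdefault("software", []).append(p),
                  target_model={})
```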

G. Design Space Exploration Metamodel

The DSE Metamodel defines the relevant concepts to perform automated DSE. Figure 13 shows this metamodel.

The root container in this metamodel is DSEDomain, which is a container for all elements related to DSE. It inherits properties from DSEModelElement, as do all other elements in this metamodel; the generalization was omitted to keep the diagram clear. DSEDomain contains DSEProblems, which define a DSE scenario. A DSEProblem contains a list of DesignGraphs extracted from the Application and Platform Models.

A DesignGraph contains vertices and edges, where vertices are ExplorableElements and Edges represent the dependences between vertices. An ExplorableElement is a reference to a design element from which the DesignGraph is generated. This reference is important to hook the ExplorableElements to the design model and allows the metamodel to be attached to multiple models, such as UML, Simulink, and others. Currently, this reference is implemented by holding the name of the design element as a field of ExplorableElement and using queries to find the instance of the design element in the design repository.


Fig. 13. Design Space Exploration metamodel

This implementation could be improved, but it is important to evaluate factors such as performance, increased dependence between metamodels, and traceability of design elements.

DSEProblem also contains a list of Objectives, which are the values to be optimized, defined by their name and unit. We represent a DesignSpace as a categorical graph product, as we propose in [37]. DesignDecisions represent vertices in the design space graph, and Alternatives link the allowed DesignDecisions. A DesignDecision is a tuple of n vertices from the DesignGraphs. It contains a GraphToExplorableMap, which holds an instance of DesignGraph as a key and an instance of ExplorableElement as a value, so that it can map a design decision to the ExplorableElements represented in the DesignGraphs.

DSESolution is a sub-graph of the design space and represents candidate designs. A DSESolution has its costs defined in the ObjectiveToCostMap, acquired from an estimation/simulation process. DSESolution also contains a list of decisions, which identifies the DesignDecisions selected from the DesignSpace and maps them to an ObjectiveToCostMap.

Our H-SPEX DSE tool invokes the engine that executes the transformations defined by ExplorationRules, which is an instance of the Mapping model required to generate DSESolutions conforming to the DSE Metamodel.
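The solution-side concepts can be sketched as follows; field names echo the metamodel (decisions, costs, objectives), while the concrete tasks, processors, and cost values are invented.

```python
# Illustrative sketch of DSESolutions: each candidate design carries the
# selected DesignDecisions and an objective -> cost map obtained from an
# estimation/simulation step.

objectives = ["energy", "latency"]

solutions = [
    {"decisions": [("task1", "proc1")], "costs": {"energy": 5.0, "latency": 3.0}},
    {"decisions": [("task1", "proc2")], "costs": {"energy": 2.0, "latency": 4.0}},
]

def best_for(objective, sols):
    """Select the candidate with the lowest cost for one objective."""
    return min(sols, key=lambda s: s["costs"][objective])

best_energy = best_for("energy", solutions)
```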

H. Implementation Metamodel

The Implementation Metamodel, presented in Figure 14, represents a model that can implement a system. An Implementation is a list of Resources, which are the Hardware, Software, and Communication components required to implement the system.

The metamodel also represents the association between Hardware and Software and the Communication between these resources. Each Resource may have ImplementationLinks, which are references to artifacts required for its final implementation, such as source code files

Fig. 14. Implementation metamodel

and scripts. An instance of this metamodel is obtained by the application of mapping rules, which are selected from the mapping model by means of our DSE approach.

VI. DESIGN AUTOMATION TASKS

A. Co-Synthesis Tasks

A co-synthesis design process, starting from a specification of the system functionality, produces an efficient implementation of the embedded system in terms of: a set of software modules to be executed by hardware components from a given platform; a set of hardware modules, which are specifically designed for the application, in the form of ASICs or FPGAs, with minimum latency and costs; and a set of interface modules to perform the communication between all the elements of the implementation. Thus, the co-synthesis process must include design tasks for the specification of the system functionality and its translation to a representation. This representation must be adequate for the execution of tasks such as hardware/software partitioning, scheduling, allocation, and binding during the design space exploration, and code generation for obtaining the final implementation of the specified system. The following sections present the co-synthesis tasks of our approach.

1) Capturing the Application: Our Application Manager adopts an MDE approach to generate the Internal Application Model (IAM) conforming to our IAMM. It does so by performing model transformations on the UML application


Fig. 15. UML to IAM transformation

model, which are implemented using the Xtend language from the openArchitectureWare framework. To give an idea of the kind of model transformations we define, Figure 15 illustrates part of our model transformations.

UML structural constructs

The main transformation is performed in lines 6-10 of Figure 15, where each Package in the UML model is traversed (line 7). After that, the sub-modules are identified (line 8), the processes are built from the sequence diagrams (line 9), and, finally, the InteractionGraphs are built from the sequence diagrams (line 10).

As seen in lines 12-14, which show the function handlePackage(), each Package in the UML model is traversed recursively (see line 14), and each existing UML Class in a package is transformed into a Module class by calling the function mapModule() (line 13). Each UML Attribute of each UML class is transformed into a ModuleDeclaration class, as shown in lines 17-18.

The associations between the UML classes determine the sub-modules of each module. Each UML Class which is part of an aggregation or composition of another UML Class is transformed into a sub-module, by calling the function putSubModule() (lines 23-25).
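The structural transformation just described can be sketched as a recursive traversal. The actual transformation is the Xtend code of Figure 15; the dictionary-based model shape below is our assumption for illustration only.

```python
# Illustrative sketch of the UML-to-IAM structural mapping: packages are
# traversed recursively, each class becomes a Module entry, and classes
# aggregated/composed by another class become its sub-modules.

def handle_package(pkg, modules):
    for cls in pkg.get("classes", []):
        modules[cls["name"]] = {
            "declarations": cls.get("attributes", []),  # -> ModuleDeclarations
            "submodules": cls.get("parts", []),         # aggregations/compositions
        }
    for sub in pkg.get("packages", []):                 # recurse into sub-packages
        handle_package(sub, modules)
    return modules

# Invented UML-like input with one nested package
uml = {"classes": [{"name": "Wheelchair", "attributes": ["speed"],
                    "parts": ["Motor"]}],
       "packages": [{"classes": [{"name": "Motor"}]}]}

iam = handle_package(uml, {})
```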

UML behavioral constructs

In Figure 16, we have the Interaction Graph for the sequence diagram SD1 from Figure 6-C. The IGExecutableNodes are shown as circles, the IGControlFlow edges as arrows, and the IGControlNodes as rounded boxes.

IGCallNodes cn-m1 and cn-m2 in Figure 16-A represent the message calls for calcAngle() and move() in the SD1 of Figure 16-C, respectively. The IGReplyNodes rn-m1 and rn-m2 represent the corresponding reply messages for calcAngle() and move() in the same SD1,

Fig. 16. InteractionGraph: CDFG for application

respectively.

The InteractionGraph for the entire application is shown in Figure 16-B, where we have three IGExecutableNodes cn-ig1, cn-ig2, and cn-ig3, which are associated by the relation L to the corresponding InteractionGraphs of the sequence diagrams SD1, SD2, and SD3, respectively. The IGForkNode fk-m1 and the IGJoinNode jn-m1 indicate that the three InteractionGraphs are composed in parallel.

2) Capturing the Platform: Our Platform Manager generates the Internal Platform Model (IPM) from a specification of the platform resources. The platform specification is given using a textual DSL (Domain Specific Language) for the IPM.

A parser and an editor for the textual IPM DSL were automatically generated using the Xtext feature of openArchitectureWare. From an EBNF specification, openArchitectureWare automatically produces Eclipse plug-ins which implement the parser and editor for a DSL. Listing 1 shows part of the EBNF for the IPM DSL.

Listing 2 presents an example of an IPM given in the textual DSL defined by the EBNF of Listing 1.

In the IPM example of Listing 2 we have two platform components. The component Comp1 consists of a processor with memory (component HwComp1) and an interface component InterfComp1, which can implement hardware-to-hardware and software-to-software communications. The specified processor has a functional unit fu1, which can implement the operations calcspeed and calcangle with latencies 2 and 1, respectively (as indicated by the qos attributes). The component Comp2 defines a software component (SwComp1), which consists of an API with methods that can also implement the operations calcspeed and calcangle. This platform information is read by the Platform Manager and passed to the System Designer during the co-synthesis process.


Platform: 'platform' name=ID '{' (components += Component)* '}';

Component: 'platcomponent' name=ID '{'
    (hardwarecomps += HardwareComp)*
    (sofwarecomps += SoftwareComp)*
    (interfacecomps += InterfaceComp)*
    (compservices += ComponentService)* '}';

HardwareComp: 'plathardware' name=ID '{'
    (memories += MemoryComp)*
    (processors += ProcessorComp)* '}';

MemoryComp: 'platmemory' name=ID '{'
    (attributes += Attribute)* '}';

Attribute: name=ID '=' Value ';';

Value: value=STRING | value=INT | value=ID;

ProcessorComp: 'platprocessor' name=ID '{'
    (attributes += Attribute)* '}';

SoftwareComp: 'platsofware' name=ID '{'
    (Oss += OSComp)* (APIs += APIComp)* '}';

OSComp: 'platOS' name=ID '{' (syscalls += Syscall)* '}';

Listing 1. Xtext grammar for the Platform DSL

platform TTA1 {
  platcomponent Comp1 {
    plathardware HwComp1 {
      platmemory MemComp1 { Size=4095; Width=32; }
      platprocessor Processor1 { version=1;
        FU fu1 {
          service calcspeed { qos Delay { Value=2; } }
          service calcangle { qos Delay { Value=1; } }
        }
        RF rf1 {
          service move { qos Delay { Value=1; } }
        } ...
      } // Processor1
      platinterface InterfComp1 {
        plathwhw HwHwInterf1 { Width=32;
          service read { qos Delay { Value=1; } }
        }
        platswsw SwSwInterf1 { Width=32;
          service read { qos Delay { Value=1; } }
        }
      } // InterfComp1
    } // HwComp1
  } // Comp1
  platcomponent Comp2 {
    platsofware SwComp1 {
      platAPI APIComp1 {
        method calSpeed { in1=0; in2=0; out=0;
          service calcspeed { qos Delay { Value=3; } } }
        method calAngle { in1=0; in2=0; out=0;
          service calcangle { qos Delay { Value=2; } } }
        method move { in=0; out=0;
          service move { qos Delay { Value=1; } } }
      } /* APIComp1 */
    } /* SwComp1 */ ...
  } // Comp2
} // TTA1

Listing 2. Platform Model for the case study

3) Code Generation: Our Implementation Manager generates the design artifacts for the final implementation determined by the System Designer. The code generation in the Implementation Manager is implemented using the Xpand language of openArchitectureWare.

By using Xpand templates, the Implementation Manager produces HDL descriptions for the application parts mapped to hardware and programs for the application parts mapped to software.
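Template-based generation in the spirit of Xpand can be sketched as follows; the template text, resource shape, and kind field are all invented for illustration, not taken from the actual templates.

```python
# Illustrative sketch: expand a template once per implementation resource,
# producing one artifact per software part (hardware parts would use an
# HDL template instead).

TEMPLATE = "// generated for {name}\nvoid {name}_task(void);\n"

def generate(resources):
    return {r["name"]: TEMPLATE.format(name=r["name"])
            for r in resources if r["kind"] == "software"}

files = generate([{"name": "ctrl", "kind": "software"},
                  {"name": "fft", "kind": "hardware"}])
```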

B. Design Space Exploration

1) Design Space Abstraction: Similarly to most DSE approaches, we explicitly define the design space as a mapping of graphs. However, differently from the usual approach, as presented in [38], which is a manual mapping between semantically defined graphs, our approach uses the categorical graph product, automatically generating the mapping between graphs. These graphs are free of any specific semantics from the point of view of the H-SPEX tool. In the following, we formalize the design space abstraction, which is represented in the DSE Metamodel presented in Subsection V-G.

Consider G = 〈V,E, ∂0, ∂1〉 as a graph, where V is the set of all vertices of G; E is the set of all edges of G; ∂0 : E → V is the source function of an edge; and ∂1 : E → V is the target function of an edge. Let S be the set of graphs, where Gi = 〈Vi, Ei, ∂0i, ∂1i〉 ∈ S, i = 1 . . . n, and n is the number of graphs in S. This set is formed of graphs, such as a task graph, an architectural graph, and the communication structure of buses, extracted from instances of our internal metamodels. The specific semantics of each graph is not considered during the generation of the design space, for the purpose of design space abstraction; this semantics is considered instead in the exploration rules defined in a Mapping model. The design space is the graph D resulting from the categorical graph product of all graphs in S. In this fashion, D = G1 × G2 × . . . × Gn = 〈V1 × V2 × . . . × Vn, E1 × E2 × . . . × En, ∂01 × ∂02 × . . . × ∂0n, ∂11 × ∂12 × . . . × ∂1n〉, where the functions {∂k1 × ∂k2 × . . . × ∂kn | k ∈ {0, 1}} are unambiguously induced by the product between vertices and edges, considering that any two vertices (u1, u2, . . . , un) and (v1, v2, . . . , vn) are adjacent in D if and only if ui is adjacent to vi in Gi for every i = 1 . . . n.

Each factor of the product G1 × G2 × . . . × Gn that constitutes D represents a design activity, such as task mapping, processor selection, processor allocation, voltage scaling selection, etc., such that vertices in D are design decisions and edges in D are the design alternatives available at a specific vertex of D. The projection function pi = 〈pVi, pEi〉 : D → Gi is defined and returns the factor graph Gi involved in the product. Using this abstraction, a candidate design is represented by a sub-graph G of D.
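The categorical graph product defined above can be sketched directly from its adjacency condition: vertices of D are tuples of factor vertices, and two tuples are adjacent iff their components are adjacent in every factor graph. The graphs below are illustrative (vertices, edges) pairs, not the paper's actual models.

```python
from itertools import product

def categorical_product(*graphs):
    """Each graph is (vertices, edges), with edges as a set of ordered pairs.
    Two tuple-vertices of D are adjacent iff their components are adjacent
    in every factor graph, per the adjacency condition above."""
    verts = set(product(*[g[0] for g in graphs]))
    edges = {(u, v) for u in verts for v in verts
             if all((ui, vi) in g[1] for ui, vi, g in zip(u, v, graphs))}
    return verts, edges

# Invented factors: a 2-node task graph and a 2-processor architecture graph
tasks = ({"t1", "t2"}, {("t1", "t2")})
procs = ({"p1", "p2"}, {("p1", "p2"), ("p2", "p1")})

D = categorical_product(tasks, procs)
# e.g. ("t1","p1") -> ("t2","p2") is a design alternative: its components
# are adjacent in both factor graphs
```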

Using the categorical graph product as an abstraction, DSE is performed for multiple design activities simultaneously, as


DesignSpace specificTaskMapping(
        DesignDecision v1, DesignDecision v2,
        DesignSpace inDesignSpace,
        String task, String processor) :
    let t2 = v2.get('TASK') :
    let p2 = v2.get('PROCESSOR') :
    ((t2 == getTask(task)) &&
     (p2 != getProcessor(processor))) ?
        inDesignSpace.removeEdge(v1, v2) : null -> this;

Listing 3. Sample of exploration rules

each product represents a design activity. Specific properties of this product, such as a restriction on the adjacencies, reduce the number of available alternatives, as the navigation of the design space is performed through the edges. Moreover, this representation overcomes the interdependence between design activities, as one vertex in the design space represents multiple design decisions at the same time. This abstraction also exposes the communication (dependencies) between elements and is well suited to combine the communication in multiple hierarchies, such as classes, tasks, processors, and systems.

2) Design Space Exploration Rules: In our approach, exploration rules are model-to-model transformation rules, which follow the Mapping Metamodel and are specified using the Xtend language. These rules receive an unconstrained DesignSpace instance as input and generate a constrained DesignSpace instance as output. They are constraints that guide and prune the available design space, reducing the exploration time and ensuring the feasibility of a candidate solution. The user of our DSE method is expected to define some rules that apply to his/her specific DSEProblem. However, to alleviate the user effort, a set of typical rules was implemented and is provided as a library to the user. As an example, a rule named specificTaskMapping is illustrated in Listing 3 and is applied when a Composite diagram such as the one in Figure 7 is specified. Other examples of implemented rules are:

• Multiple Assignments of a Task: Avoids assigning the same task to different processors.

• Lower / Upper Performance / Power / Memory / Communication Value: Defines the lower or upper values for performance, power, memory, or communication amount for a task.

• Task Deadline Violation: Removes the candidate from the population if there is a deadline violation.

• Specific Processor Selection: Defines the processor type that must be selected to implement the candidate design.

• Specific Task Execution Frequency: Defines the frequency at which a processor must execute for a specific task.

• Specific Task Mapping: Defines that a task must execute on a specific processor.
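The effect of a rule such as specificTaskMapping can be sketched as an edge filter over the design space. The vertex encoding ("T15:P0"), the class name, and the list-based edge representation below are simplifying assumptions for illustration; the real rules are Xtend model-to-model transformations:

```java
import java.util.*;

// Hedged sketch of an exploration rule in the spirit of specificTaskMapping:
// it removes design-space edges leading to vertices where a given task is not
// mapped to the required processor, so navigation can never reach them.
public class SpecificTaskMappingRule {

    // Keep only edges whose target maps 'task' to 'proc' (or concerns other tasks).
    static List<String[]> apply(List<String[]> edges, String task, String proc) {
        List<String[]> constrained = new ArrayList<>();
        for (String[] e : edges) {
            String target = e[1]; // e = {source, target}
            if (!target.startsWith(task + ":") || target.equals(task + ":" + proc)) {
                constrained.add(e); // edge survives the rule
            }
        }
        return constrained;
    }

    public static void main(String[] args) {
        List<String[]> edges = List.of(
            new String[]{"T13:P1", "T15:P0"},
            new String[]{"T13:P1", "T15:P1"}); // violates "T15 must run on P0"
        System.out.println(apply(edges, "T15", "P0").size()); // 1
    }
}
```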

3) Design Space Exploration Tool: The DSE method presented in this work extends the H-SPEX tool [39] by implementing the design space abstraction method described in Subsection VI-B1. We also implemented two other optimization algorithms to improve the optimization step during candidate generation: Crowding Population-based Ant Colony Optimization for Multi-Objective (CPACO-MO) [40] and Random. H-SPEX is not limited to these algorithms, and we are planning to integrate the tool with an optimization framework to improve the optimization support with analysis and graphical features. The optimization is treated as a black-box transformation, which uses an API to exchange information between the transformation engine and the optimization algorithm. In order to evaluate candidate designs, we use an extended version of SPEU [35], a static analysis tool, which provides a very fast evaluation step, since evaluation is the bottleneck of the DSE process. However, any other evaluation tool could be used, since the evaluation tool and H-SPEX are integrated by assigning the costs to a DSESolution in the DSE model. The DSESolution is then obtained by means of model-to-model transformations or through the API generated by the EMF tool from the DSE Metamodel.

C. Formal Verification Based on LTA

One of the important aspects in the design of embedded systems is to ensure that a given system really does what it is intended to do. Nowadays, with the growing complexity of embedded systems, an exhaustive test of all possible system executions, or at least of a set of representative ones, is an impractical or even impossible approach. An alternative to testing is mathematically proving correctness, by specifying precise models of the embedded system and formally verifying logical properties over these models.

An LTA is an extension of the classic finite-state automata concept [6] and captures the behavior of a system by means of states and transitions between states, where timing constraints can be associated with the transitions.
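The core of such an automaton, states plus time-guarded transitions, can be sketched as a small data structure. This is a deliberate simplification under assumed names; the actual LTA metamodel carries richer elements (clocks, labels, synchronization channels):

```java
// Minimal sketch of an LTA-style transition relation: finite states plus
// transitions guarded by a time window [minTime, maxTime]. Names are
// illustrative; only the IGNode labels are taken from the paper's example.
public class TinyLta {
    record Transition(String from, String to, int minTime, int maxTime) {}

    // A transition fires only from its source state and inside its time window.
    static boolean canFire(Transition t, String state, int elapsed) {
        return t.from().equals(state) && elapsed >= t.minTime() && elapsed <= t.maxTime();
    }

    public static void main(String[] args) {
        Transition e1 = new Transition("Start-IG-SD1", "cn-m1", 0, 10);
        System.out.println(canFire(e1, "Start-IG-SD1", 5));  // true: inside [0,10]
        System.out.println(canFire(e1, "Start-IG-SD1", 11)); // false: window passed
    }
}
```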

In our approach, by means of model transformations, we generate an LTA from each InteractionGraph, so that a set of InteractionGraphs produces a network of intercommunicating LTAs. By using model-to-code transformations, we generate a textual representation for a network of LTAs, which is submitted to the UPPAAL model checking tool. UPPAAL is able to check invariant properties, for example whether a given formula is valid at all reachable states of the LTAs, and reachability properties, i.e., whether given states are reachable during the execution of the network of LTAs. The generated network of LTAs can also be simulated by UPPAAL, allowing one to visualize specific sequences of state transitions of the specified system and thus to debug possible specification errors.

1) Generating LTA from UML: From the InteractionGraph in Figure 16, we obtain a network of timed automata, where we have an LTAProcess PWheelchair for the entire application and an LTAProcess for each sequence diagram. By using the Xpand language of the openArchitectureWare framework, we implemented model-to-code transformations that generate, from the LTA model, the textual input for the UPPAAL model checker.


2) Verifying Properties: At this point, the designer can specify logical properties using CTL formulae and use UPPAAL to verify them. As examples, we may specify properties to check whether the application model is deadlock-free (using the UPPAAL macro A[] not deadlock) and whether eventually all processes corresponding to the sequence diagrams will be executed in parallel (using the CTL formula E<> startsd1 and startsd2 and startsd3).

VII. CASE STUDY

Illustrating our approach, this section presents a development scenario for a real-time embedded system dedicated to the automation and control of an intelligent wheelchair that helps people with special needs. This wheelchair has several functions, such as movement control, collision avoidance, navigation, target pursuit, battery control, system supervision, task scheduling, and automatic movement.

Our flow starts by modeling the wheelchair system as prescribed in Section V-A. The UML model describes the wheelchair movement control, collision avoidance, and navigation Use Cases, which are essential to the system and incorporate critical hard real-time constraints. It also consists of a Class model, 18 interaction diagrams, and one composite diagram. Some of these models were presented in Section V-A.

The UML model is used as input to our design flow. The Application Manager transforms the UML model into our IM. No user-defined Mapping is required, so we use only rules from our exploration rule library, described in Section VI-B2.

The platform library provides software and hardware components to be reused during the implementation of an application. The components include mathematical functions to solve control equations, algorithms for image filtering, a real-time communication API, and RTOS components. The library also provides different architectures of a Java microcontroller, communication busses, and hardware implementations of algorithms. All components are previously characterized. Software components are simulated on the different microcontroller microarchitectures in order to define their QoS. This platform was previously defined using the Eclipse Editor generated for our Platform DSL, generating an instance of IPMM.

The System Designer coordinates the design automation tools, invoking the H-SPEX tool to perform DSE. The model for the selected candidate is used for the transformation, which generates the LTA Model as input for the UPPAAL tool for formal verification. After verification, the verified Implementation Model is ready to be synthesized by the Implementation Manager. Examples of DSE, formal verification, and synthesis are provided in the next subsections.

A. Design Space Exploration

In the automatic DSE process performed in this scenario, H-SPEX was configured to perform the following design tasks:

• definition of which objects are active or passive (runnables), among the 17 behaviors defined in the Interaction Graphs;

• mapping of the active objects to selected processors (up to 6 processors);

• allocation of the selected processors into a hierarchical bus with two segments;

• processor voltage scaling with four distinct voltage levels.

Fig. 17. Normalized DSE results with five objectives: performance (+), power (.), total memory (x), energy (*), and communication (o).

Exploring all these activities simultaneously, H-SPEX was configured to optimize the system in terms of performance (cycles), power (Watts), energy (Joules), total memory (bytes), and communication volume (bytes transmitted on the bus).

The candidate population was found after 5,000 evaluations and represents the non-dominated set of candidate designs. Figure 17 illustrates these results. The best overall candidate must be selected after a trade-off analysis of the obtained estimations, based on some criteria, such as weights for the optimized objectives or any other design feature.
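Membership in the non-dominated set rests on the standard Pareto dominance test, which can be sketched as below. The five hypothetical cost vectors assume, consistently with the objectives above, that every objective is minimized:

```java
// Pareto dominance over minimized objectives (cycles, Watts, Joules, bytes,
// bytes transmitted). The candidate cost values are made-up illustrations.
public class ParetoCheck {

    // a dominates b if a is no worse in every objective and better in at least one.
    static boolean dominates(double[] a, double[] b) {
        boolean strictlyBetter = false;
        for (int i = 0; i < a.length; i++) {
            if (a[i] > b[i]) return false; // worse in objective i
            if (a[i] < b[i]) strictlyBetter = true;
        }
        return strictlyBetter;
    }

    public static void main(String[] args) {
        double[] c1 = {100, 0.5, 2.0, 4096, 1200};
        double[] c2 = {120, 0.5, 2.5, 4096, 1300};
        System.out.println(dominates(c1, c2)); // true: c2 is dropped from the set
        System.out.println(dominates(c2, c1)); // false
    }
}
```

A candidate belongs to the non-dominated set exactly when no other candidate dominates it, which is why the final choice still requires a trade-off analysis among the survivors.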

The design space in this case study contains 2,064 alternative design decisions (vertices) and 334,080 edges, from which a set containing up to 17 vertices (the maximum active task distribution) must be selected to define a candidate design solution (subgraph). The unveiled design space presents more than 5.89 × 10^41 alternative designs, considering an unrestricted design space (fully connected graph). However, in this proposal, edges guide the available alternatives, and constraints, specified as model-to-model transformation rules, are locally applied between the current vertex and its neighbors, thus pruning the design space and speeding up the DSE process.

Let a task drawn from the wheelchair case study be identified as T15, which implements a stereovision function (in Figure 18, T15 corresponds to the Correlation-based + Median Filters vertex) comprising heavy image processing algorithms. Figure 7 shows a composite diagram specifying that H-SPEX must map task T15 onto the DSP processor P0, benefiting from the DSP processor architecture. The resulting exploration rule from the Mapping model is presented in Listing 3.


Fig. 18. Task dependency graph for the wheelchair system.

Fig. 19. Effect of constraints: Sample of a partial design space graph

Let us consider a vertex from the design space graph given by the tuple 〈T13, P1, C1, V2〉, which specifies that task T13 must be mapped to processor P1, while P1 must be allocated to communication bus C1 and execute T13 at voltage level V2. There are 48 alternatives at this vertex. Figure 19 illustrates a partial graph, representing the design space at this vertex, which is located at the center. The shadowed vertices around the vertex 〈T13, P1, C1, V2〉 in the center are pruned nodes, and the white nodes are the alternative designs that satisfy all constraints.

Applying the structural constraints and the sample design constraint defined here, the pruning process reduced the design space by 83% at this specific vertex, avoiding wasted time on unnecessary evaluations and unfeasible designs, and thus focusing the search for an adequate solution on the most relevant design points.
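The arithmetic behind the 83% figure is straightforward. The 40/8 split below is an assumption made only so the numbers work out; the paper reports just the 48 alternatives and the 83% reduction:

```java
// With 48 alternatives at the <T13,P1,C1,V2> vertex, pruning 40 infeasible
// neighbors leaves 8 and removes roughly 83% of the local design space.
public class PruningRatio {

    static double prunedPercent(int total, int surviving) {
        return 100.0 * (total - surviving) / total;
    }

    public static void main(String[] args) {
        System.out.println(prunedPercent(48, 8)); // ~83.3
    }
}
```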

B. Functional Verification

After the selection of the candidate design in the DSE process, the System Designer performs the transformation from the InteractionGraphs in the Application Internal model into the LTA model, according to the partition defined in the Implementation model. As an example, consider the Sequence diagrams presented in Figure 6. From them we obtain the network of timed automata shown in Figure 20.

For the sequence diagram SD1, we have: an LTAProcess PSD1 with 6 LTALocations, corresponding to the IGNodes labeled Start-IG-SD1, cn-m1, cn-m2, rn-m1, rn-m2, and end-IG-SD1; and 5 LTATransitions, corresponding to the IGEdges labeled e1, e2, e3, e4, and e5.

We also have an LTAProcess PWheelchair for the entire application. Thus, the diagram for the LTA model is very similar to the one for the InteractionGraphs model presented in Figure 16.

By using the Xpand language of the openArchitectureWare framework, we implemented model-to-text transformations that generate, from the LTA model, the textual input for the UPPAAL model checker.

At this point, the designer can specify logical properties using CTL formulae and use UPPAAL to verify them. We have specified properties to check: whether the application model is deadlock-free (using the UPPAAL macro A[] not deadlock); and whether eventually all processes corresponding to the sequence diagrams will be executed in parallel (using the CTL formula E<> startsd1 and startsd2 and startsd3).

C. Code Generation and Synthesis

In our approach, the code generation strategy is based on templates. The generation tool uses the EMF API to obtain information from the Implementation Model and to complete these templates, which are specified using the Xpand language from openArchitectureWare.
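The template-completion mechanism can be sketched in plain Java as follows. This is only an analogue of what the Xpand templates do; the template text, class name, and period value are illustrative assumptions:

```java
// Sketch of template-based code generation: a text template with placeholders
// is completed with values queried from the model. Real Xpand templates are
// far richer; this only shows the mechanism.
public class TemplateCodegen {

    static String generate(String className, int periodMs) {
        String template = """
            public class %s extends RealtimeThread {
                // period: %d ms
            }""";
        return template.formatted(className, periodMs);
    }

    public static void main(String[] args) {
        System.out.println(generate("MovementController", 10));
    }
}
```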

The code generator uses different templates, according to the specified resource in the Implementation Model. In this way, communicating tasks allocated to different processors imply the generation of specific communication directives and/or interconnection components. Likewise, the allocation of several active tasks to the same processor implies the generation of scheduler services, as well as of real-time directives on each active task to specify its activation pattern. Listing 4 shows part of the software source code that our tool generates for the MovementController class, which includes objects responsible for controlling the wheelchair movement.

The software source code contains two important methods of a RealtimeThread subclass: mainTask() (lines 18-23) and exceptionTask() (lines 24-26). The former represents the task body, i.e., the code executed when the task is activated. This is a periodic task, whose periodic activation is implemented as a loop with execution frequency controlled by calling the waitForNextPeriod() method. This method uses the task release parameters to interact with the scheduler and to control the correct execution of the method. The exceptionTask() method represents the exception handling code that is triggered if the mainTask() method does not finish by the established deadline. We use the Java API for real-time specification described in [41].

Besides the software source code generation, the design flow is also automated by a set of generated scripts, which configure and execute compilers, synthesis tools, and simulators for the generated and assembled components of the Implementation Model. Thus, to perform the entire design flow, a designer can execute a script, such that all design process phases, including


Fig. 20. Network of LTA in UPPAAL for InteractionGraphs in the IAM.

01 public class MovementController extends RealtimeThread
02 {
03   private static PriorityParameters
04     schedParams = new PriorityParameters(
05       PriorityScheduler.getMaxPriority()-3);
06   private static PeriodicParameters
07     releaseParams = new PeriodicParameters(
08       null,               // start time
09       null,               // end time
10       TimeObjects.10ms,   // period
11       TimeObjects.4200ms, // cost
12       TimeObjects.10ms);  // deadline
13   public static MovementActuator
14     movementActuator = new MovementActuator();
15   private int m_LastValidSpeedValue = 0;
16   private int m_LastValidAngleValue = 0;
17
18   public void mainTask() {
19     while (isRunning()) {
20       // ... movement control code
21       waitForNextPeriod();
22     }
23   }
24   public void exceptionTask() {
25     // deadline miss handling code
26   }
27   // code continues...
28 };

Listing 4. Generated source code for MovementController class

automated exploration, compilation, synthesis, simulation, and deployment, will be performed.

VIII. FINAL REMARKS

The pressure to reduce the time-to-market and the ever-growing design difficulties require new research efforts to adopt languages with a higher abstraction level and/or new approaches to cope with them. Model Driven Engineering (MDE) is the current bet to raise the design abstraction level and to provide mechanisms to improve the portability, interoperability, maintainability, and reusability of models. In addition, MDE helps to abstract platform complexity and to represent different concerns of the system.

This paper presented an introduction to MDE applied to the development of complex embedded systems. A brief history of MDE was presented, and the main concepts, namely models, metamodels, and transformations between models, were introduced.

The technological framework supporting MDE relies on standards. Three of them were identified and described, namely MDA, MIC, and Software Factories. Moreover, the most widely adopted languages, tools, and technologies for MDE were presented.

A short survey on domain-specific engineering tools for embedded systems was presented. The paper also described in detail the MDE framework for embedded systems under development at UFRGS.

In that approach, named MoDES, the MDE fundamental notion of transformation between models is used to generate, from a UML model of an application consisting of class and sequence diagrams, an internal representation model to be used by formal verification and co-synthesis tools. The obtained model captures structural aspects of an application model by using a hierarchy of modules and processes, as well as behavioral aspects by means of a CDFG model.

A new design space abstraction based on the categorical graph product was also proposed. This abstraction overcomes the challenge of dealing with interdependencies between design activities and provides a flexible representation for multiple design activities. A DSE metamodel was defined, so that the design space can be easily handled by MDE transformation engines using their transformation rules. These rules are used to implement design constraints that prune the design space and generate the candidate designs, thus improving DSE results. Moreover, UML/MARTE models are used to generate additional transformation rules, which remove unfeasible designs from the design space, thus saving time that would be spent on unnecessary evaluations.

Observing the history of MDE for embedded systems, it is possible to identify some trends. The first application of MDE to embedded systems consisted in using model-to-model transformations to integrate tools by transforming the output from one tool to another, usually integrating existing co-design tools. Afterwards, domain-specific metamodels were proposed to capture the heterogeneous nature of embedded systems, and syntactic transformations were used to generate systems from these metamodels, as in the Gaspard framework. The next steps are the development of smart generators that use transformations based on the semantics of elements, such as GenERTiCA. Additional improvements can be achieved by including domain expertise in model-to-model transformations, such as the design space exploration methodology using MDE concepts implemented by H-SPEX.

ACKNOWLEDGMENT

This work was partially supported by CNPq.

REFERENCES

[1] P. Marwedel, Embedded System Design, 1st ed. Boston, USA: Kluwer Academic Publishers, Oct. 2003. [Online]. Available: http://www.worldcat.org/isbn/1402076908

[2] B. Selic, "The pragmatics of model-driven development," IEEE Software, vol. 20, no. 5, pp. 19–25, 2003. [Online]. Available: http://dx.doi.org/10.1109/MS.2003.1231146

[3] D. C. Schmidt, "Guest Editor's Introduction: Model-Driven Engineering," Computer, vol. 39, no. 2, pp. 25–31, Feb. 2006. [Online]. Available: http://dx.doi.org/10.1109/MC.2006.58

[4] S. Kent, "Model Driven Engineering," in Proceedings of the Third International Conference on Integrated Formal Methods, ser. IFM '02. London, UK: Springer-Verlag, 2002, pp. 286–298. [Online]. Available: http://portal.acm.org/citation.cfm?id=743552

[5] OMG, "Meta Object Facility (MOF) Core Specification Version 2.4," OMG, Tech. Rep., December 2010.

[6] R. Alur and D. L. Dill, "A theory of timed automata," Theoretical Computer Science, vol. 126, no. 2, pp. 183–235, Apr. 1994. [Online]. Available: http://dx.doi.org/10.1016/0304-3975(94)90010-8

[7] K. G. Larsen, P. Pettersson, and W. Yi, "Uppaal in a nutshell," International Journal on Software Tools for Technology Transfer (STTT), vol. 1, no. 1-2, pp. 134–152, Dec. 1997. [Online]. Available: http://dx.doi.org/10.1007/s100090050010

[8] S. Edwards, L. Lavagno, E. A. Lee, and A. Sangiovanni-Vincentelli, "Design of embedded systems: formal models, validation, and synthesis," Proceedings of the IEEE, vol. 85, no. 3, pp. 366–390, Mar. 1997. [Online]. Available: http://dx.doi.org/10.1109/5.558710

[9] P. M. Weichsel, "The Kronecker Product of Graphs," Proceedings of the American Mathematical Society, vol. 13, no. 1, 1962. [Online]. Available: http://dx.doi.org/10.2307/2033769

[10] J. Bezivin, "On the unification power of models," Software and Systems Modeling, vol. 4, no. 2, pp. 171–188, May 2005.

[11] J. Rothenberg, "The nature of modeling," in AI, Simulation, and Modeling. John Wiley and Sons, 1989, pp. 75–92. [Online]. Available: http://www.rand.org/pubs/notes/2007/N3027.pdf

[12] D. Gasevic, D. Djuric, and V. Devedzic, Model Driven Engineering and Ontology Development. Berlin, Heidelberg: Springer Berlin Heidelberg, 2009. [Online]. Available: http://dx.doi.org/10.1007/978-3-642-00282-3

[13] K. Czarnecki and S. Helsen, "Feature-based survey of model transformation approaches," IBM Systems Journal, vol. 45, no. 3, pp. 621–645, Jul. 2006. [Online]. Available: http://dx.doi.org/10.1147/sj.453.0621

[14] R. France and B. Rumpe, "Model-driven Development of Complex Software: A Research Roadmap," in 2007 Future of Software Engineering, ser. FOSE '07. Washington, DC, USA: IEEE Computer Society, May 2007, pp. 37–54. [Online]. Available: http://dx.doi.org/10.1109/FOSE.2007.14

[15] J. Sztipanovits and G. Karsai, "Model-integrated computing," Computer, vol. 30, no. 4, pp. 110–111, Apr. 1997. [Online]. Available: http://dx.doi.org/10.1109/2.585163

[16] J. Greenfield, "Software factories: assembling applications with patterns, models, frameworks and tools," pp. 16–27, 2004. [Online]. Available: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.129.5929

[17] F. Jouault and J. Bezivin, "KM3: a DSL for metamodel specification," in Proceedings of the 8th IFIP WG 6.1 International Conference on Formal Methods for Open Object-Based Distributed Systems, ser. FMOODS '06, vol. 4037. Berlin, Heidelberg: Springer-Verlag, 2006, pp. 171–185. [Online]. Available: http://dx.doi.org/10.1007/11768869_14

[18] M. F. van Amstel, C. F. J. Lange, and M. G. J. van den Brand, "Metrics for Analyzing the Quality of Model Transformations," 2008.

[19] S. J. Mellor and M. Balcer, Executable UML: A Foundation for Model-Driven Architectures. Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc., 2002. [Online]. Available: http://portal.acm.org/citation.cfm?id=545976

[20] T. Schattkowsky, W. Mueller, and A. Rettberg, "A Generic Model Execution Platform for the Design of Hardware and Software," in UML for SOC Design, G. Martin and W. Muller, Eds. Springer US, 2005, pp. 63–88. [Online]. Available: http://dx.doi.org/10.1007/0-387-25745-4_4

[21] M. A. Wehrmeister, J. G. Packer, and L. M. Ceron, "Support for early verification of embedded real-time systems through UML models simulation," SIGOPS Operating Systems Review, vol. 46, no. 1, pp. 73–81, Feb. 2012. [Online]. Available: http://dx.doi.org/10.1145/2146382.2146396

[22] P. Boulet, J. L. Dekeyser, C. Dumoulin, and P. Marquet, "MDA for SoC Design, Intensive Signal Processing Experiment," in FDL. ECSI, 2003, pp. 309–317. [Online]. Available: http://dblp.uni-trier.de/rec/bibtex/conf/fdl/BouletDDM03

[23] L. Bonde, C. Dumoulin, and J.-L. Dekeyser, "Metamodels and MDA Transformations for Embedded Systems," in Advances in Design and Specification Languages for SoCs, P. Boulet, Ed. Boston: Springer US, 2005, ch. 8, pp. 89–105. [Online]. Available: http://dx.doi.org/10.1007/0-387-26151-6_8

[24] E. Piel, R. B. Atitallah, P. Marquet, S. Meftali, S. Niar, A. Etien, J. L. Dekeyser, and P. Boulet, "Gaspard2: from MARTE to SystemC simulation," in DATE'08 Workshop on Modeling and Analysis of Real-Time and Embedded Systems with the MARTE UML Profile, 2008.

[25] M. A. Wehrmeister, E. P. Freitas, C. E. Pereira, and F. Rammig, "GenERTiCA: A Tool for Code Generation and Aspects Weaving," in Proceedings of the 2008 11th IEEE Symposium on Object Oriented Real-Time Distributed Computing, ser. ISORC '08. Washington, DC, USA: IEEE Computer Society, 2008, pp. 234–238. [Online]. Available: http://dx.doi.org/10.1109/ISORC.2008.67

[26] F. Balarin, Y. Watanabe, H. Hsieh, L. Lavagno, C. Passerone, and A. Sangiovanni-Vincentelli, "Metropolis: an integrated electronic system design environment," Computer, vol. 36, no. 4, pp. 45–52, Apr. 2003. [Online]. Available: http://dx.doi.org/10.1109/MC.2003.1193228

[27] A. Sangiovanni-Vincentelli and G. Martin, "Platform-based design and software design methodology for embedded systems," IEEE Design & Test of Computers, vol. 18, no. 6, pp. 23–33, 2001.

[28] R. Chen, M. Sgroi, L. Lavagno, G. Martin, A. S. Vincentelli, and J. Rabaey, UML and Platform-based Design. Norwell, MA, USA: Kluwer Academic Publishers, 2003, pp. 107–126. [Online]. Available: http://portal.acm.org/citation.cfm?id=886350

[29] T. Kangas, P. Kukkala, H. Orsila, E. Salminen, M. Hannikainen, T. D. Hamalainen, J. Riihimaki, and K. Kuusilinna, "UML-based multiprocessor SoC design framework," ACM Transactions on Embedded Computing Systems, vol. 5, no. 2, pp. 281–320, May 2006. [Online]. Available: http://dx.doi.org/10.1145/1151074.1151077

[30] A. Bakshi, V. K. Prasanna, and A. Ledeczi, "MILAN: A Model Based Integrated Simulation Framework for Design of Embedded Systems," in Proceedings of the ACM SIGPLAN Workshop on Languages, Compilers and Tools for Embedded Systems, ser. LCTES '01. New York, NY, USA: ACM, 2001, pp. 82–93. [Online]. Available: http://dx.doi.org/10.1145/384197.384210

[31] S. Neema, J. Sztipanovits, G. Karsai, and K. Butts, "Constraint-Based Design-Space Exploration and Model Synthesis," in Embedded Software, ser. Lecture Notes in Computer Science, R. Alur and I. Lee, Eds. Berlin, Heidelberg: Springer Berlin / Heidelberg, 2003, vol. 2855, ch. 19, pp. 290–305. [Online]. Available: http://dx.doi.org/10.1007/978-3-540-45212-6_19

[32] S. Mohanty and V. K. Prasanna, "Rapid system-level performance evaluation and optimization for application mapping onto SoC architectures," 2002, pp. 160–167. [Online]. Available: http://dx.doi.org/10.1109/ASIC.2002.1158049

[33] F. A. M. Nascimento, M. F. S. Oliveira, and F. R. Wagner, "ModES: Embedded Systems Design Methodology and Tools based on MDE," in Model-Based Methodologies for Pervasive and Embedded Software, 2007. MOMPES '07. Fourth International Workshop on, Mar. 2007, pp. 67–76. [Online]. Available: http://dx.doi.org/10.1109/MOMPES.2007.14

[34] M. F. S. Oliveira, E. W. Briao, F. A. M. Nascimento, and F. R. Wagner, "Model driven engineering for MPSOC design space exploration," in Proceedings of the 20th Annual Conference on Integrated Circuits and Systems Design, ser. SBCCI '07. New York, NY, USA: ACM, 2007, pp. 81–86. [Online]. Available: http://dx.doi.org/10.1145/1284480.1284509

[35] M. F. S. Oliveira, L. B. de Brisolara, L. Carro, and F. R. Wagner, "Early Embedded Software Design Space Exploration Using UML-Based Estimation," in Proceedings of the Seventeenth IEEE International Workshop on Rapid System Prototyping. Washington, DC, USA: IEEE Computer Society, 2006, pp. 24–32. [Online]. Available: http://portal.acm.org/citation.cfm?id=1136925

[36] R. S. Scowen, "Extended BNF - A Generic Base Standard," in Proceedings of the 1993 Software Engineering Standards Symposium (SESS '93), Aug. 1993. [Online]. Available: http://www.cl.cam.ac.uk/~mgk25/iso-14977-paper.pdf

[37] M. F. S. Oliveira, F. A. M. Nascimento, W. Mueller, and F. R. Wagner, "Design space abstraction and metamodeling for embedded systems design space exploration," in Proceedings of the 7th International Workshop on Model-Based Methodologies for Pervasive and Embedded Software, ser. MOMPES '10. New York, NY, USA: ACM, 2010, pp. 29–36. [Online]. Available: http://dx.doi.org/10.1145/1865875.1865880

[38] T. Blickle, J. Teich, and L. Thiele, "System-Level Synthesis Using Evolutionary Algorithms," Design Automation for Embedded Systems, vol. 3, no. 1, pp. 23–58, Jan. 1998. [Online]. Available: http://dx.doi.org/10.1023/A:1008899229802

[39] M. F. S. Oliveira, E. W. Briao, F. A. Nascimento, and F. R. Wagner, "Model driven engineering for MPSOC design space exploration," Journal of Integrated Circuits and Systems, vol. 3, no. 1, pp. 13–22, 2008.

[40] D. Angus, "Crowding Population-based Ant Colony Optimisation for the Multi-objective Travelling Salesman Problem," in IEEE Symposium on Computational Intelligence in Multi-Criteria Decision-Making, Apr. 2007, pp. 333–340. [Online]. Available: http://dx.doi.org/10.1109/MCDM.2007.369110

[41] M. A. Wehrmeister, L. B. Becker, F. R. Wagner, and C. E. Pereira, "An Object-Oriented Platform-based Design Process for Embedded Real-Time Systems," in Proceedings of the Eighth IEEE International Symposium on Object-Oriented Real-Time Distributed Computing, ser. ISORC '05. Washington, DC, USA: IEEE Computer Society, 2005, pp. 125–128. [Online]. Available: http://dx.doi.org/10.1109/ISORC.2005.13


Introduction to Embedded Software Development

Alexandra da Costa Pinto de Aguiar
PUCRS - PPGCC
Email: [email protected]

Felipe G. de Magalhaes
PUCRS - PPGCC
Email: [email protected]

Sergio J. Filho
PUCRS - PPGCC
Email: [email protected]

Fabiano Hessel
PUCRS - PPGCC
Email: [email protected]

Oliver Longhi
PUCRS - PPGCC
Email: [email protected]

Abstract—Embedded systems are present in the lives of most people, and the trend is for these devices to become ever more essential to our daily routine. Over the years, the high convergence of systems has resulted in a constant addition of functionality to embedded devices, especially those of the communication and entertainment industry. In this context, embedded systems development in which the software was designed specifically for one application has given way to platform-based development, in which software coupled to an operating system has gained prominence. The main problem in increasing the relevance of software lies in meeting the typical requirements of embedded systems, which, even with added functionality, still face limitations on code size and energy consumption, besides timing constraints for certain applications. Thus, tools that support embedded software development have increasingly become an object of study. This short course presents introductory concepts of embedded software development, along with fundamentals, definitions, and, in a theoretical-practical fashion, practical examples of embedded development using the Hellfire platform, followed by the challenges and opportunities in the area.

I. INTRODUCTION

In recent years, embedded systems have grown in importance in people's lives. These systems can be found in the most diverse areas, in medical equipment, in the automotive industry, and in entertainment devices. Moreover, among the applications traditionally known as embedded are space applications, navigation systems, avionics, and robotics.

Often, for embedded applications to be used in different sectors, they must be adapted to a series of restrictions imposed by the environment or by the situation. Among the main limitations are the final size of the device, limited energy consumption, unrestricted operating time, control of heat generation, and immunity to interference and impacts, among others.

Besides restrictions that can be considered physical, many embedded applications share a common characteristic: a timing failure at any point of the system may render it totally or partially useless, cause large financial losses, or even lead to disasters of great proportions.

Systems with such timing restrictions are called Real-Time Systems (RTS). In their context, a response or the result of an operation is expected within a predetermined and predictable time; that is, an operation is considered correct only if, besides producing its expected logical result, it is finished by the expected time.

Consequently, it is common for RTSs to use a specific Operating System (OS), called a Real-Time Operating System (RTOS), responsible for making their management feasible. Besides providing functionality common to OSs, such as timers and interrupt and task management, one of the main functions of an RTOS is to guarantee that the execution of a given task occurs according to the timing restriction assigned to it [1]. Thus, the task scheduler of an RTOS is responsible not only for managing the execution order, but also for ensuring that the timing restrictions of a given task set are met. This mechanism is implemented based on one of the several existing scheduling policies, chosen to best fit the type of target application.
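One concrete instance of such a scheduling policy is earliest-deadline-first (EDF), which picks, among the ready tasks, the one whose absolute deadline is closest. This sketch is only one of the many policies an RTOS may implement, and the task names and deadline values are made up for the example:

```java
import java.util.*;

// Minimal earliest-deadline-first (EDF) pick: among ready tasks, dispatch
// the one with the closest absolute deadline. Illustrative names and values.
public class EdfPick {
    record Task(String name, long absoluteDeadline) {}

    static Task next(List<Task> ready) {
        return ready.stream()
                .min(Comparator.comparingLong(Task::absoluteDeadline))
                .orElseThrow(); // the scheduler is only invoked with ready tasks
    }

    public static void main(String[] args) {
        List<Task> ready = List.of(
                new Task("sensorRead", 30),
                new Task("motorControl", 10),
                new Task("logging", 100));
        System.out.println(next(ready).name()); // motorControl
    }
}
```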

The computational complexity and performance requirements of embedded applications have grown in recent years, bringing with them the need for ever greater processing capacity. At the same time, the window these devices have to reach the market, the so-called time-to-market, keeps shrinking.

The main consequence of these growing demands and constraints is the increasing complexity of embedded system design, especially considering existing methodologies [2]. Thus, the use of highly configurable platforms, along with raising the abstraction level of specifications, has been proposed to address these challenges during the development of complex systems [3].

Furthermore, besides possibly having timing constraints, it is desirable, for cost and performance reasons, that embedded systems be implemented by integrating all the components needed for their execution into a single chip, forming what is commonly called a System-on-Chip (SoC). A SoC allows the use of heterogeneous components, such as CPUs, memories, and buses, among others. Moreover, a single SoC may comprise more than one processing element (PE). SoCs that employ several processing elements on a single chip are called multiprocessor SoCs, or MPSoCs (Multiprocessor Systems-on-Chip).

In this context, it is worth noting that some of the more general characteristics of general-purpose multiprocessor computing systems also appear in MPSoCs, including several challenges previously seen only in those systems. Notable among them are the difficulty of parallel programming, the need for load balancing, better utilization of the processing units, and efficient communication mechanisms.

A. Motivation

For years, uniprocessor systems were employed on a large scale, both in industry and in academia. However, while they offered ever more functionality and higher performance, thanks to steady improvements in integrated circuit fabrication and implementation processes, their energy consumption grew proportionally. Hence, in recent years a new approach has been adopted, consisting of building systems that use multiple integrated processing elements operating at a lower frequency.

Thus, multiprocessor embedded systems (MPSoCs) are present in a large share of the applications that were traditionally handled by uniprocessor systems. The use of MPSoCs keeps growing in both academia and industry, and their widespread adoption has become a reality over the last few years [4].

Implementing multiple processing elements operating at a lower frequency on a single chip eases the solution of several problems, such as energy consumption, performance, and parallelism, while introducing new challenges due to the increased architectural complexity. Chief among these challenges are the programmability, management, optimization, and adaptation of such systems to applications with dynamic and real-time characteristics.

One way to handle these factors is to use models and tools that allow the environment to be simulated at higher abstraction levels. This way, more design alternatives can be evaluated at design time, and the effort needed to complete the system can be reduced.

Moreover, MPSoCs are typically composed of a few processing elements of medium computational power, with the application defined and its tasks placed at design time. Current applications, however, tend to be highly complex and to impose a variable load on the processing elements throughout the system's lifetime [5]. In addition, their real-time requirements, specified at design time, must be met.

Finally, all these issues motivate the research and study of new platforms to support embedded software development, which is what this work carries out.

B. Organization of the Text

Regarding the organization of the text, the chapter is divided into two main parts: the theoretical background and the case study of the Hellfire platform.

The theoretical background details the items needed to understand an embedded platform as a whole and the constraints usually imposed on these systems. Topics such as the definition and classification of embedded systems are discussed at length, with special emphasis on the main component of embedded software: the Real-Time Operating System. The main factors that distinguish real-time systems from general-purpose and best-effort ones are then presented. Task scheduling and the algorithms employed by the system scheduler are also detailed.

The case study then presents the Hellfire platform at greater length, including its various components and practical usage examples, followed by the work's final remarks. The Hellfire platform comprises an ISS (Instruction Set Simulator) capable of simulating up to 256 processors. A web-based framework was developed to ease the configuration of the platform's main component: the HellfireOS. This OS can be customized in several respects to optimize the final design, a process made easier by the integration of the simulator and the OS into the same framework.

II. THEORETICAL BACKGROUND

To aid the reader's comprehension, this Section has three main divisions: embedded systems, development platforms, and embedded software. Basic concepts and information pertinent to each of these topics are presented; readers already experienced in these subjects may concentrate on the next Section.

A. Embedded Systems

Embedded systems are everywhere, present in people's everyday lives. This statement has become ever more true, as the reach of embedded systems grows at a fast pace thanks to technological advances. Among these advances, the great miniaturization power provided by innovative chip fabrication technologies stands out [6].

To define embedded systems, it is necessary to distinguish the types of computing. There are general-purpose and special-purpose computers. As the name itself suggests, general-purpose computers are designed and developed so that any kind of task can run on them with some guaranteed minimum performance. Special-purpose computers, again as the name makes clear, are conceived and implemented for a specific end; for that goal, their performance (from a computational and energy standpoint) may well exceed that offered by a general-purpose computer.

Classically, special-purpose systems were embedded inside some larger system, which gave rise to the term embedded systems. Nowadays the term's meaning has shifted somewhat, mainly due to the strong convergence among entertainment devices that offer multiple functionalities and can be considered hybrids.

Even so, unlike general-purpose computers such as PCs (Personal Computers), Embedded Systems (ES) perform a set of predefined tasks and generally have specific requirements to meet. In most cases, these systems face quite severe constraints on the final device's physical size, cost, and energy consumption, items that tend to increase design complexity even further.

Another no less important aspect of these systems concerns the heavy market pressure imposed on companies, driving the need for ever more efficient projects. This means new systems must be developed within a few months and must guarantee their financial return within equally tight periods [7]. Therefore, for the implementation of an embedded system to succeed while respecting the constraints imposed by today's intrinsically short time-to-market, it is essential to carry out a design that comprehends the system as a whole.

New methodologies have been developed with the goal of increasing designers' productivity. Among them, the one developed by [8] stands out; it builds on a sequence of techniques, abstraction and clustering, adopted over the years to achieve that goal.

The abstraction technique describes a given object through a model in which some low-level details can be ignored. Clustering, in turn, joins a set of models belonging to the same abstraction level to conceive a new type of object, which normally has new properties not present in the isolated models that constitute it. By applying these two techniques successively, digital electronics evolved system design from layout drawings, to transistor schematics, to logic-gate netlists, and finally to the Register Transfer Level (RTL), as shown in Figure 1.

The use of platforms is also important for the efficient application of the abstraction and clustering techniques.

Fig. 1. Abstractions and clusterings in hardware design

Source: Adapted from [8]

A platform is a single abstract model that hides the details of different possible implementations, for example a cluster of components at a lower abstraction level [8]. Using platforms allows design and fabrication costs to be shared among a wider range of potential users, something that would not happen if a unique design were developed for each product [8].

A new, higher abstraction level has been emerging in recent years in response to the growing complexity of integrated circuit design. At this level, objects can be functional descriptions of complex behaviors or architectural specifications of complete hardware platforms. The relation between a platform's elements and an application is known as mapping. At the system level, mapping is done between functional objects and platform elements, associating a functional behavior with an architectural element that can implement it.

According to [8], system-level mapping operates on heterogeneous objects and also allows the separation of different, orthogonal aspects, such as:

• computation and communication: important because computation refinement is usually done manually or through compilation and scheduling, whereas communication relies on standards;

• platform and application implementation: frequently defined and designed independently by different companies or groups; and

• behavior and performance: these must be separated because performance information may or may not represent non-functional requirements or result in an implementation choice.

All these separations lead to better reuse, since they decouple independent aspects, allowing shorter design times and higher productivity through a reduction in the time needed to verify the system.

1) Real-Time Systems in Embedded Systems: As noted, embedded systems generally aim to solve quite specific problems and, in many cases, are not directly perceived by the user. In this context, a distinct class of applications can be pointed out: those that require responses within determined times and do not tolerate failures, the so-called Real-Time Systems (RTS).

RTSs can be defined as computing systems that interact physically with the real world and have timing requirements in those interactions [1]. Typically, interaction with the real world happens through sensors and actuators rather than the keyboard-and-monitor pair common in general-purpose computers.

From the existing definition of RTS, examples of the most varied kinds can be cited: airbag systems, videoconferencing, air traffic control systems, washing machine controllers, and DVD players, among others. As the use of computing systems proliferates in our society, real-time applications become more common [9]. A brief analysis of these examples shows that the tolerance of failures and delays differs among them, which characterizes a division of RTSs into two classes: hard and soft.

The main difference between hard and soft RTSs lies in the consequence that a delay in the execution of a given task may cause. In hard RTSs an eventual failure is catastrophic and may cause damage and/or pose risks to human life and the environment at large [1]. Soft RTSs, on the other hand, do not have such strong restrictions on the delay of a given task, since missing a deadline generally results only in degraded system performance, without causing the possible damage of a failure in a hard system [10].
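This distinction is often captured by a value function over completion time. The sketch below is a hypothetical illustration (the function, parameters, and decay rate are invented for this example): a hard task's result becomes a failure once the deadline passes, whereas a soft task's result merely loses value:

```python
def result_value(finish, deadline, kind, decay=0.5):
    """Value of a task's result as a function of its finish time.

    kind='hard': full value up to the deadline, failure after it.
    kind='soft': value degrades linearly after the deadline at rate
    `decay` per time unit, never dropping below zero.
    """
    if finish <= deadline:
        return 1.0
    if kind == "hard":
        # In a hard RTS, a deadline miss is a system failure.
        raise RuntimeError("deadline miss in a hard real-time task")
    return max(0.0, 1.0 - decay * (finish - deadline))

print(result_value(8, 10, "soft"))   # 1.0 (deadline met)
print(result_value(11, 10, "soft"))  # 0.5 (late, but still useful)
```

The shape of the value curve (a step, a linear decay, or something in between) is exactly what separates hard, firm, and soft timing requirements.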

2) Examples of embedded systems: To close this introductory Section on embedded systems, we briefly discuss how embedded systems are present in automotive vehicles.

Like many embedded systems, the intelligent subsystems found in automobiles are available in various models, and many are so discreet that even the driver has trouble noticing their operation. Among them, ABS braking systems, electronic fuel injection, and active suspension can be cited.

Disc brake systems with ABS (Anti-lock Braking System) and EBD (Electronic Brakeforce Distribution) are efficient, preventing wheel lock-up and ensuring better grip on the road surface. By preventing the wheels from skidding during braking, anti-lock brakes benefit drivers in two ways: (i) the car stops faster, and (ii) its trajectory can still be changed while braking. Speed sensors, a pump, valves, and the controller unit make up this system.

Electronic fuel injection systems likewise operate without the driver noticing; they are fuel delivery and electronic engine management systems. In this context, it can be stated that their large-scale use is closely related to the automotive industry's need to reduce pollutant gas emissions, acting as a powerful and effective control of the mixture admitted by the engine. This means engines with electronic injection achieve better fuel economy, since they always operate with the ideal fuel-to-air ratio.

Another system that illustrates embedded systems in the automotive world well is active suspension, a technology that controls the vertical movement of the wheels through an electronic system. Unlike conventional suspension, which simply follows the road, active suspension corrects the track's imperfections more efficiently. This grants the vehicle more stability and performance in diverse situations, such as cornering, acceleration, or braking, besides making it easier for the driver to stay in control.

Many other vehicular systems, such as parking assistants, navigation aids, and full entertainment stations featuring DVD players and other amusements, are increasingly widespread, and their cost, consequently, is also falling. It can thus be stated that the field of intelligent automotive systems is promising for research, and many initiatives around the world focus on driver safety and comfort through the use of complex embedded systems.

B. Architectures

This Section presents the platforms used for the development and implementation of embedded systems.

A development platform can be defined as the infrastructure needed to create and develop a given system. This definition covers issues at different abstraction levels.

At a lower level sit the architectural definitions, while higher levels hold information about the software to be employed. Although it can be divided into layers according to abstraction level, the development platform carries constraints normally tied to the nature of the embedded system.

1) Uniprocessor: Uniprocessor architectures are typically found in a System-on-Chip (SoC), where they are in direct contact with other components, such as memories, decoders, and dedicated circuits, among others, on the same chip. These architectures brought a very large drop in the cost of embedded devices, especially entertainment ones, due to the reduction in the total number of chips needed to compose the whole system.

Steady technological advances have enabled the integration of several hardware blocks, such as processors, memory, and peripherals, on the same chip. The integrated circuit (IC) containing such a system is called a System-on-Chip and performs specific tasks [11], [12]. One of the main results of this characteristic is the possibility of increasing the performance and functionality found in current equipment.

As with ES development, the SoC development process is also under market pressure and must therefore be effective and efficient. As can be observed, applying reuse techniques [13] is essential for the various constraints of this kind of project to be respected, since such techniques advocate interface standardization and the modularization of different components.

Figure 2 shows a typical SoC example, which includes:

• one or more microcontrollers, microprocessors, and DSP cores;

• memory blocks (ROM, RAM, EEPROM, and/or Flash);
• oscillators and PLLs (Phase-Locked Loops);
• assorted peripherals;
• external interfaces (analog and digital);
• interconnection means to link the blocks above.

Fig. 2. Example of a System-on-Chip

Finally, another important point about SoCs is performance. For certain applications, employing a single processor responsible for the entire execution of the system can be more costly, in terms of performance and/or energy consumption, than systems composed of more than one processor [14]. In this context arise the multiprocessor SoCs, or MPSoCs, addressed next.

2) Multiprocessor: Multiprocessor architectures represent an evolution over uniprocessor ones and are typically found in so-called Multiprocessor Systems-on-Chip (MPSoCs), where they also communicate with other system devices [4]. These architectures make software development harder, since problems found in general-purpose parallel systems also appear in this new context, but under a much larger and more diverse set of constraints.

The growing demand of various applications, such as multimedia and mobile systems, brings the need to use more than one processing element in a single SoC. In this case, the several processing elements (homogeneous or heterogeneous), combined with the other typical SoC components, form an MPSoC [15].

The architecture of a typical MPSoC resembles the well-established multiprocessor architectures. However, an MPSoC design has additional constraints on cost and energy consumption, enabling many new studies in this area [15]. A basic MPSoC example with processing elements (PEs), memory, an I/O interface, and interconnection buses can be seen in Figure 3.

Fig. 3. Example of an MPSoC

Multiprocessor embedded systems use two or more CPUs with reduced energy consumption, which consequently lowers their computational capacity. Even so, they can still perform complex tasks, since they parallelize the computation. One point of care when using an MPSoC is that the interconnection medium plays an important role in overall system performance. For example, in a highly communicating system, if the chosen interconnection cannot sustain many message exchanges, total performance will suffer. Thus, system performance depends not only on the processors' computational capacity but also on the communication capability.

Bus Communication. In this model, N nodes are interconnected by one or more lines in charge of transmitting the exchanged packets. Each node represents a system unit and may consist, for example, of memories and/or processors. This communication structure is widely employed, mainly because of its simplicity and implementation efficiency [16].

On simple (non-hierarchical) buses, only one processing element can use the bus at a time, while all the others must wait for the ongoing transmission to end, depending on an arbiter to gain access to the bus. Figure 4 shows a bus example where a CPU, a memory, and an I/O unit are connected by a single line.
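The arbitration just described can be sketched as a simple round-robin arbiter. This is an illustrative model only, not tied to any specific bus standard: among the masters currently requesting the bus, the grant rotates starting from the one after the previous winner, so no master is starved:

```python
def round_robin_grant(requests, last_granted):
    """Pick the next bus master under round-robin arbitration.

    requests: list of booleans, one per master (True = wants the bus).
    last_granted: index of the master granted in the previous cycle.
    Returns the index of the next master to grant, or None when idle.
    """
    n = len(requests)
    for offset in range(1, n + 1):
        candidate = (last_granted + offset) % n
        if requests[candidate]:
            return candidate
    return None  # no master is requesting the bus

# Masters 0 and 2 request; master 0 won last time, so 2 goes next.
print(round_robin_grant([True, False, True], last_granted=0))  # 2
```

Real arbiters (fixed-priority, TDMA, and others) trade fairness against worst-case latency; round-robin is the usual baseline because its worst-case wait is bounded by the number of masters.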

Fig. 4. Example of a Bus

A more efficient alternative to this type of bus, called Simple, is the Hierarchical Bus. The great difference between this topology and the simple one is the existence of several bus levels interconnected by bridges, responsible for exchanging packets between levels. This type of bus enables communication parallelism; however, this parallelism is limited, and a communication between cores connected to different sub-buses will stall several resources [7]. Figure 5 shows a hierarchical bus composed of four CPUs, a shared memory, and a unit for communication with the outside world. At the upper level, called the Master level, two CPUs and the external communication module are interconnected by a simple bus. At the lower level, called the Slave level, the other two CPUs and the memory are connected by another simple bus. Communication between the two levels occurs through a packet-exchange bridge.

Fig. 5. Hierarchical Bus Interconnecting CPUs

Network-on-Chip Communication. Due to the growing demand for highly communicating systems, combined with the limitations of bus-based models, new solutions had to be researched. A communication model that has been heavily explored in recent years is the Network-on-Chip (NoC) [16].

In this model, the approach to communication differs from the one adopted by buses: while on buses all elements are interconnected by a simple, direct communication medium, in NoCs routers manage all traffic and direct packets in the most suitable way. The efficiency of packet delivery is tied to the routing algorithms present in the network, which can be divided into three main groups:

• static and dynamic routing;
• distributed routing; and
• minimal and non-minimal routing.

Several topologies have been proposed for NoCs, the most widely used being the mesh network, in which all links have the same length, simplifying the design. In this model every router, excluding the outer nodes, which have only two links, is connected to at most four neighboring routers, speeding up packet exchange [16].

An example of a mesh network containing 16 processors is shown in Figure 6, where node 0003 can be seen with only two links, highlighted in bold, along with node 0201, which has four highlighted links.

Fig. 6. Example NoC with a Mesh Topology

Figure 7 shows another topology alternative for NoCs. Called Torus, it is very similar to the mesh topology, the main difference being that the outer nodes are connected to the outer nodes on the opposite edge.

Fig. 7. Torus Network Interconnecting CPUs

3) Virtualized: Finally, a more recent type of embedded system architecture is presented: virtualized platforms. As in their use in general-purpose systems, these platforms serve several goals, including cost reduction and performance gains. Virtualizing a computing system consists of creating a logical group of resources that resemble the physical resources offered by a computing environment [17]. This technique has been widely adopted in the business world, especially to exploit the potential of multiprocessor systems, besides offering other advantages, such as:

• allowing several operating systems to run on a single machine;

• providing isolation between virtual machines, increasing security;

• increasing system flexibility;
• improving workload management; and
• enabling hardware independence.


On the other hand, virtualization can be considered a technique that demands high computational power, since it normally requires large disk space and heavy RAM usage, besides inserting an extra management layer: the Virtual Machine Monitor (VMM), also known as the hypervisor, the layer responsible for allowing instructions executed by the virtual machine to run normally on the host machine.

In commercial servers, virtualization lets a single physical server act as multiple logical servers and provides multiple instances of different Operating Systems, such as Windows, Linux, and others. These systems frequently run on Intel and AMD multi-core processors, a trend now followed by most processor manufacturers, whose designs will go beyond four cores in the near future.

As discussed earlier, the trend of using multiprocessor platforms in embedded systems as well is clear [4], and devices using such platforms force a change in the way their developers conceive their systems. This happens mainly because techniques formerly seen in general-purpose multiprocessor computing need to be re-evaluated before being employed in ESs [18].

While virtualization enables the execution of multiple operating system instances on a single (mono- or multi-core) processor, its use in embedded systems is not trivial, since these are very different from enterprise systems [19]. Thus, for virtualization to be employed advantageously in embedded systems, much effort must be put into understanding how to adapt it to the needs and characteristics of ESs, systems normally constrained in energy consumption, memory amount, timing, and area.

The hypervisor, also called the virtual machine monitor, together with the hardware, is responsible for handling the instructions coming from the virtual machine, besides exercising full control over the virtual machines. Additionally, its mode of operation deserves attention. According to [17], there are two different types of hypervisor:

• type 1, known as hardware-level virtualization, where the hypervisor is considered an operating system in its own right, since it alone runs in kernel mode, as can be seen on the left side of Figure 8. Its main task, besides controlling the real machine, is to provide the notion of virtual machines; and

• type 2, or virtualization at the OS level, where the hypervisor is like any other user application and has no direct access to the hardware (it must first go through the machine's operating system). In this case, one of the main advantages of virtualization, precisely the use of different OSs, is lost.

Fig. 8. Type 1 and Type 2 Hypervisors

It is important to note that since the virtual machine imitates the real hardware, it must also separate execution into kernel and user modes. In this regard, the classic studies of Popek and Goldberg [17] introduce a classification of the ISA (Instruction Set Architecture) into three different groups:

1) privileged instructions: those that cause a trap when executed in user mode but do not trap when executed in kernel mode;

2) control-sensitive instructions: those that attempt to modify the configuration of the system's resources; and

3) behavior-sensitive instructions: those whose behavior or result depends on the resource configuration (the content of the relocation register or the processor mode).

Thus, according to Popek and Goldberg [17], for the virtualization of a given machine to be possible, the sensitive instructions (control- and behavior-sensitive) must be a subset of the privileged instructions. This does not hold in many processors, such as the Intel x86 family, in which case the solution commonly involves hardware-level support from the processor: Intel offers Intel VT (Virtualization Technology) and AMD offers SVM (Secure Virtual Machine). Hardware support may not be the best solution for embedded systems, since it is desirable for virtualization to cope with the diverse existing hardware, especially to speed up the tight time-to-market.
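The Popek and Goldberg criterion can be stated directly over instruction sets. The sketch below is a toy check with made-up instruction names, intended only to make the subset condition concrete (the second example echoes the known pre-VT x86 problem, where sensitive instructions such as popf silently execute in user mode instead of trapping):

```python
def virtualizable(sensitive, privileged):
    """Popek & Goldberg criterion: an ISA supports classic
    trap-and-emulate virtualization when every sensitive
    instruction is also privileged (i.e., traps in user mode)."""
    return set(sensitive) <= set(privileged)

# Toy ISA where every sensitive instruction traps in user mode.
print(virtualizable({"mov_cr", "hlt"}, {"mov_cr", "hlt", "in", "out"}))  # True

# Toy model of the x86 flaw: a sensitive instruction that does not trap.
print(virtualizable({"mov_cr", "popf"}, {"mov_cr"}))  # False
```

When the check fails, the hypervisor cannot rely on traps alone and must fall back on binary translation, para-virtualization, or hardware extensions such as the ones named above.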

When hardware support is absent, the most common way to virtualize a system is known as pure virtualization. In this case, whenever the virtual machine tries to execute a privileged instruction (an I/O request, a memory write, etc.), a trap to the hypervisor occurs. This is normally considered a very inefficient way of applying virtualization, both in general-purpose and in embedded systems [19].

Alternatively, the para-virtualization technique can be employed to replace the sensitive instructions of the original code with explicit calls to the hypervisor (hypercalls). In effect, the virtual machine's operating system acts like a normal user application running on top of a regular operating system, with the difference that the guest operating system runs on top of the hypervisor. When para-virtualization is adopted, the hypervisor must define an interface of system calls that the guest operating system can use. Furthermore, it is possible to remove all sensitive instructions from the guest OS, forcing it to use only hypercalls, which makes the hypervisor look more like a microkernel and can improve virtualization performance.
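A para-virtualized guest therefore invokes hypervisor services through an explicit, numbered interface. The sketch below models such a hypercall table in miniature; the call names and numbers are invented for illustration and do not correspond to any real hypervisor ABI:

```python
# Hypothetical hypercall numbers that a guest OS and the hypervisor
# agree on; real hypervisors publish such numbers as part of their ABI.
HC_CONSOLE_WRITE = 1
HC_SET_TIMER = 2

class TinyHypervisor:
    """Minimal hypercall dispatcher: each hypercall number maps to a
    handler that stands in for a sensitive instruction in the guest."""

    def __init__(self):
        self.console = []
        self.timer_deadline = None
        self.handlers = {
            HC_CONSOLE_WRITE: self._console_write,
            HC_SET_TIMER: self._set_timer,
        }

    def hypercall(self, number, *args):
        handler = self.handlers.get(number)
        if handler is None:
            raise ValueError(f"unknown hypercall {number}")
        return handler(*args)

    def _console_write(self, text):
        self.console.append(text)        # stands in for device I/O

    def _set_timer(self, deadline):
        self.timer_deadline = deadline   # stands in for timer hardware

# The guest OS uses hypercalls instead of touching the hardware.
hv = TinyHypervisor()
hv.hypercall(HC_CONSOLE_WRITE, "guest booted")
hv.hypercall(HC_SET_TIMER, 1000)
print(hv.console, hv.timer_deadline)  # ['guest booted'] 1000
```

The dispatch table is exactly the "interface of system calls" mentioned above: the guest is recompiled to call it, trading portability of unmodified OSs for fewer traps and better performance.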

C. Embedded Software

After examining more carefully the nuances of embedded-system development platforms, this Section presents the concept of embedded software. This software can live in several layers and languages and, nowadays, ranges from the operating system responsible for running the embedded system to the applications installed, or even developed, by the user of a mobile phone, for example.

Historically, embedded systems were application-specific and, because of that, software was seen merely as a complement of the system, sometimes an optional one. With the constant change and evolution of this market, new devices and new consumer needs made system complexity grow in equal proportion. Thus, software, once seen as optional in some cases, became the fundamental element of an embedded system. Nevertheless, tools that explore this fact and assist the developer are still scarce.

1) Real-Time Embedded Operating Systems: Among the software layers, the main one to highlight is the operating system. Since a large class of embedded systems has real-time constraints, we highlight Real-Time Operating Systems (RTOS) as the main agents in meeting the critical and non-critical constraints found in these systems.

An RTOS must meet the functional requirements and, above all, the timing requirements that are indispensable to real-time systems (RTS). In these systems, unlike general-purpose ones, mechanisms such as disk caches and virtual memory are generally not employed, because they can hinder the predictability of the tasks to be executed.

Thus, an RTOS is a system that typically supports priorities and predictable thread synchronization, besides offering deterministic behavior of the whole operating system [20]. For this to be possible, items such as the Worst-Case Execution Time (WCET) and the time within which interrupts will be serviced must be known.

Another characteristic of an RTOS is the possibility of customizing it for a given application at compile time, so that it includes only a small subset of the available features [20]. This allows underused parts of the RTOS to be discarded for certain applications, in order to save resources.

Still regarding RTOSs, an interesting point, which sets them apart from general-purpose OSs, is that the programmer typically has easier and more direct access to the hardware. The main goal of this approach is to make that access more predictable and faster, even though it may allow improper access to devices, such as the system memory.

Basic RTOS concepts. A task is defined as one of the small parts that form a running program, which has its own address space. Tasks can be classified as periodic or aperiodic. A periodic task is one whose processing activations occur in an infinite sequence, at regular intervals called the period. An aperiodic task is one whose activation corresponds to internal or external events, its execution being random. When there is a known minimum interval between two consecutive activations, the task is said to be sporadic.

In addition, another possible classification concerns task preemptivity. Preemptive tasks are those that can be interrupted during their execution, unlike non-preemptive ones, which must execute atomically.

Finally, the other basic difference between tasks concerns their priority. Static tasks do not have their priority level changed during execution; it is established by the operating system or by the user. Dynamic tasks start with a static priority value, but this level can be changed during execution according to several parameters, such as the CPU time reserved for, and that consumed by, a given task.

Another fundamental concept in RTOSs is related to time. Accordingly, there are several definitions regarding the time of occurrence of a given event, for example:

• computation (or execution) time: the time used by a task to fully execute its duties. Special cases of execution time include:

1) BCET (Best-Case Execution Time) - the best (shortest) possible execution time of a given task;

2) ACET (Average-Case Execution Time) - the average execution time of a given task, and;

3) WCET (Worst-Case Execution Time) - the worst (longest) possible execution time of a given task.

• deadline: the maximum time allowed for a task to complete its execution;

• start time: the instant at which processing of the activated task begins;

• completion time: the instant at which the execution of the task completes;

• arrival time: the instant at which the scheduler becomes aware of an activation of the task, and;

• release time: the instant at which the task is included in the queue of tasks ready to execute.
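The timing attributes above can be grouped into a simple task descriptor. The struct and the lateness helper below are illustrative only (not part of any specific RTOS API), assuming all times are expressed in ticks:

```c
#include <stdint.h>

/* Illustrative task descriptor holding the timing attributes defined
 * above (all values in ticks). */
typedef struct {
    uint32_t computation; /* execution time (e.g., the WCET) */
    uint32_t deadline;    /* relative deadline */
    uint32_t arrival;     /* instant the scheduler learns of the activation */
    uint32_t release;     /* instant the task enters the ready queue */
    uint32_t start;       /* instant execution begins */
    uint32_t completion;  /* instant execution finishes */
} task_times_t;

/* Lateness: positive if the task finished after its absolute deadline
 * (release time + relative deadline), negative or zero otherwise. */
int32_t lateness(const task_times_t *t)
{
    return (int32_t)t->completion - (int32_t)(t->release + t->deadline);
}
```

A negative lateness means the deadline was met with slack to spare; a hard real-time system requires lateness ≤ 0 for every activation.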

Task Scheduling in RTOS. As stated, an RTOS must be able to perform the tasks assigned to it while respecting the timing constraints imposed by the application. For this to be possible, there must be a task scheduler capable of respecting the constraints present in these systems. Scheduling is thus defined as the established execution order of a given set of tasks. How scheduling is performed basically depends on two factors: the task type (its attributes and constraints) and the scheduling algorithm used.

Although there are several scheduling algorithms, in general all of them are based on a basic state machine that represents the possible situations of a task over time. This machine has three states: ready, running, and blocked, and is represented in Figure 9. The possible transitions among these states are also represented; they are:

• ready to running: a task in the ready state is able to receive the CPU at any time, but has not yet received it because another task holds the processor. When that other task blocks or is preempted by the scheduler, the first task in the ready queue takes over the CPU, characterizing the transition from ready to running;

• running to ready: when the task is running and does not request any blocking service (such as I/O), it is up to the scheduler to preempt it so that another task can take over the CPU. When this happens (the task is preempted), it returns to the ready queue, and a transition from running to ready is said to have occurred;

• running to blocked: when the running task requests a blocking service, such as an I/O operation, it moves from the running state to the blocked state;

• blocked to ready: when the task is blocked, the requested service completes, and some higher-priority task is running, it returns to the system's ready queue, and;

• blocked to running: when the task is blocked, the requested service completes, and no higher-priority task is running, it returns directly to the running state.
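The three-state machine of Figure 9 can be encoded directly as a transition function. The event names below are our own, chosen only to mirror the transitions listed above:

```c
typedef enum { READY, RUNNING, BLOCKED } task_state_t;

/* Events driving the transitions described in the text. */
typedef enum {
    DISPATCH,                     /* scheduler grants the CPU           */
    PREEMPT,                      /* scheduler takes the CPU back       */
    WAIT_IO,                      /* task requests a blocking service   */
    IO_DONE_HIGHER_PRIO_RUNNING,  /* service done, higher-prio task runs */
    IO_DONE_CPU_FREE              /* service done, no higher-prio task  */
} task_event_t;

/* Next state for a (state, event) pair; undefined pairs keep the
 * current state, matching the machine in Figure 9. */
task_state_t next_state(task_state_t s, task_event_t e)
{
    switch (s) {
    case READY:
        return (e == DISPATCH) ? RUNNING : s;
    case RUNNING:
        if (e == PREEMPT) return READY;
        if (e == WAIT_IO) return BLOCKED;
        return s;
    case BLOCKED:
        if (e == IO_DONE_HIGHER_PRIO_RUNNING) return READY;
        if (e == IO_DONE_CPU_FREE) return RUNNING;
        return s;
    }
    return s;
}
```

Encoding the machine as a pure function makes each legal transition easy to test in isolation, which is exactly the kind of predictability argument made above.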

Thus, RTOS schedulers must maintain two characteristics fundamental to these systems: deadline compliance and predictability in task execution. To achieve deadline compliance, the scheduler must be able to avoid data loss or the failure of tasks to complete within the required time. Predictability, in turn, is achieved when the system is deterministic and it is known exactly when certain situations will occur.

Fig. 9. Typical task state machine

Additionally, it is important that a scheduler verify whether a given task set is schedulable or not. To perform this test, a schedulability analysis is carried out which, through mathematical formulas, computes the total CPU utilization. The optimal case occurs when this utilization is kept at 100%, but never above it. In the latter case, the task set is said to be non-schedulable.

Given a schedulable task set, it is up to the scheduler, through a scheduling policy, to decide the order in which those tasks will execute. This policy can be:

• offline - makes all task-scheduling decisions before the system executes, storing them in a table. At run time, a structure similar to a task dispatcher is used to activate the tasks according to the generated schedule. This mechanism is based on a hardware-implemented timer that signals when another task must execute. In the clock-driven approach, all tasks have the same processor time available, since the timer has a fixed interval value. In the weighted round-robin approach, on the other hand, each task may have a different amount of processor time to execute its jobs. For this, weights are used, where tasks with higher weight have more processor time available. In this case, scheduling is performed in a circular fashion;

• online - makes all task-scheduling decisions during system execution. These decisions are based on several parameters that may or may not change at run time. This approach comprises the priority-driven algorithms, where a priority is assigned to each task and the scheduler grants the computational resource to the task with the highest priority. In this category there are preemptive and non-preemptive methods, and several algorithms can be cited as examples. Among the most used are Rate Monotonic (RM) [21], [22], Deadline Monotonic (DM), Earliest Deadline First (EDF) [23], [24], Least Slack Time (LST), and Latest Release Time (LRT, or Reverse EDF). Of these, Rate Monotonic is the least complex and is widely used. The EDF algorithm is considered optimal, as are LST and LRT; compared to the latter two, however, it is the least computationally complex and, consequently, more used than the others.

Note that several scheduling policies are employed in RTSs. Even so, two algorithms stand out as the most widely used: RM (Rate Monotonic) and EDF (Earliest Deadline First) [25].
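As an illustration of the dynamic-priority approach, an EDF dispatcher simply picks, among the ready tasks, the one with the nearest absolute deadline. A minimal sketch, with hypothetical array-based bookkeeping rather than any particular kernel's data structures:

```c
#include <stdint.h>

/* Index of the ready task with the earliest absolute deadline (EDF).
 * abs_deadline[i] is task i's absolute deadline in ticks; ready[i] is
 * nonzero if task i is in the ready queue. Returns -1 if none ready. */
int edf_pick(const uint32_t *abs_deadline, const int *ready, int n)
{
    int best = -1;
    for (int i = 0; i < n; i++) {
        if (!ready[i])
            continue;
        if (best < 0 || abs_deadline[i] < abs_deadline[best])
            best = i;
    }
    return best;
}
```

A production scheduler would keep a deadline-ordered queue instead of scanning, but the linear scan makes the priority rule explicit.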

D. Programming Model and Techniques

Since embedded systems have so many peculiarities, the development of their software also suffers the consequences of this characteristic. With the use of single-processor, multiprocessor, and even virtualized platforms, one of the main problems concerns the way these resources are programmed [26]. Thus, there are several abstraction levels at which software can be designed and developed. These levels are intended to ease system development and to reduce the errors found during and after deployment.

Given the possible complexity of an embedded system's architecture, it is essential that the project be divided into several abstraction levels. The existence of increasingly robust CAD tools that automate the various stages of the methodology guarantees a reliable final product, even for systems initially described at the highest abstraction layers [25].

In short, one can adopt a methodology that spans all abstraction levels involved in a project of this kind. Generically, the main abstraction levels in the development process are: requirements, specification, architecture, components, and system integration. These levels can be seen in Figure 10, where the left side shows the top-down approach and the right side highlights the bottom-up view. These approaches concern the way the system is developed. In the first, the planning process starts with the definition of the system requirements, followed by the specification, and so on. In the second, the project starts from the components, the next step being the definition of the architecture, and so forth. The phases that constitute each of these approaches are described below:

• requirements - the functional and structural requirements of the system as a whole are listed;

• specification - each of the previously listed requirements is detailed so as to describe how the system must behave;

• architecture - the internal details of the system are described, proposing how it can be built. In this stage the components to be used are structured;

• components - once the necessary components are defined, they must be designed and implemented, including specialized software and hardware modules, and;

• system integration - the assembly of the system as a whole is achieved by joining the previously designed components.

Fig. 10. Main abstraction levels in the development process (Source: adapted from [25])

Additionally, the verification and test phase of an embedded system must guarantee its reliability without exceeding the tight time frame assigned to the design and development of the final product, allowing it to meet the expected time-to-market. For this to be possible, it is important to validate the different components (both software and hardware) separately.

Thus, the division of the system into abstraction levels is again employed, this time to allow the separate validation of software components [27]. Figure 11 illustrates these abstraction layers for a simple application containing three tasks (named T1, T2, and T3 in the figure) to be mapped onto an architecture composed of two processors and hardware subsystems. For each level, the figure shows the software organization, the interface between software and hardware, and the software development platform that will be used for validation at that level.

According to [27], Figure 11 presents four different abstraction levels (described from the highest to the lowest):

• system architecture level - a set of functions grouped into tasks forms the software. Communication among functions, tasks, and subsystems is carried out through abstract communication channels. Simulation at this level is performed, for example, in the Simulink environment, with the goal of validating the application's functionality;

• virtual architecture level - each task is refined, for example, into C code that contains the final application code and uses the HdS (Hardware-Dependent Software) API (Application Programming Interface), whose communication primitives explicitly access the communication components;


• transaction architecture level - the software is bound specifically to an OS (Operating System) and to I/O (input/output) software responsible for implementing the communication units. The resulting software uses Hardware Abstraction Level (HAL) primitives;

• virtual prototype level - the HAL API and the processor are implemented through a HAL software layer and the corresponding processor part for each of the software subsystems. Simulation at this level is performed through classic hardware/software co-simulation models.

Fig. 11. Software abstraction levels (Source: adapted from [27])

E. Application Examples

Some applications are considered classics in embedded systems and are usually used as benchmarks for testing new proposals.

One of the most widely used applications today, mainly due to the broad adoption of embedded multimedia devices, is the H.264 standard. This is a video-compression standard originally based on the MPEG-4 format. Systems that use it include Brazilian digital TV, DVD players, and video-game consoles.

H.264 is divided into profiles and, since its use is quite broad (from portable products to high-definition decoders), it is important to optimize its execution as much as possible, in order to reduce items such as manufacturing cost and power and energy consumption, especially for battery-operated devices.

Furthermore, well-known standards such as JPEG and MP3 figure among the main embedded applications in everyday use, present in products of diverse nature which therefore have equally heterogeneous constraints.

F. Software Virtualization

Software virtualization aims to allow the use of legacy systems alongside current ones and to increase the security of embedded systems, besides, together with virtualized platforms, reducing the total cost of the final product and increasing its performance. This Section presents the most important details regarding the application of this technique to embedded systems.

One of the most fitting and direct advantages of virtualization in embedded systems is allowing different OSs to coexist on the same machine. In this case, two different problems can be attacked:

• the use of legacy software, since it is possible to create (or maintain) an operating system compatible with that software alongside a more modern one that allows new resources to be exploited, and;

• splitting the system into a part the user can access, with specific calls known to them, separated from the critical part responsible for keeping the device working. In this case, two operating systems, one for the user and one for the system, can be employed simultaneously.

When virtualization is employed with these goals, the hypervisor must have full control of the hardware and create different virtual machines, one per OS. As can be seen in Figure 12, this approach can be used on both single- and multi-core machines. It also raises software development quality, since the designer can choose, among several OSs, the one best suited to the application. Moreover, the time required to develop an application can be drastically reduced, since application reusability grows significantly [28].

Fig. 12. Hypervisor separating machines with several OSs

This approach also offers the advantage of achieving a unified software architecture that can run on multiple hardware platforms. In this case, a current and recurring problem in embedded systems, software portability, can be greatly improved, and designers have the potential to meet the increasingly tight time-to-market faster.

Additionally, embedded-system security is also a strong argument for virtualization, since it can prevent attacks on the system from reaching the main OS, as can be seen in Figure 13. In this case, the malicious code stays confined to the virtual machine without contaminating the rest of the system, because it lacks the knowledge of the hypervisor needed to exploit its weak points. Furthermore, the hypervisor can detect the occurrence of an attack and restart the virtual machine without harming the rest of the system.

While embedded virtualization can bring numerous advantages, it is important to clarify at what cost these benefits can be achieved. Some of the limitations are already present in general-purpose virtualization, while others arise from its use in such severely constrained environments as embedded systems.

Fig. 13. User attack blocked through virtual-machine isolation

For virtualization mechanisms to be implemented, hardware support is needed, which often does not exist or is not viable in embedded systems. Virtualization support entails growth in chip area and, consequently, increases in energy consumption and temperature. Some techniques, such as dynamic code translation or emulation, can be used as alternatives. These techniques, however, increase application execution time, and hardware mechanisms, despite their disadvantages, prove more adequate in the vast majority of cases [SPE].

One of the main problems to be attacked is related to the task scheduling performed by the hypervisor. Embedded systems typically have timing constraints, so any slip by the hypervisor can compromise the system.

One can also consider the case where a given multi-core exhibits asymmetric multiprocessing behavior, with two OSs: a user OS and an RTOS. In this case, each OS is treated as a separate virtual machine and, in embedded systems, it is desirable that the RTOS be prioritized over the user OS, just as real-time tasks that happen to run on the user OS (such as multimedia applications) must also have preference. This priority-based scheduling goes against the principles of virtual machines, by which all virtual machines should share the real hardware in equal proportions.

In addition, the heterogeneity typical of embedded systems can represent a great challenge, since the hypervisor must, in theory, be able to communicate with as many architectures as possible. While in general-purpose computing the Intel x86 architecture is widely used, in embedded systems there is a great variety of architectures in use, from DSPs to ARM processors, as well as PowerPC and MIPS.

Furthermore, the excessive and absolute isolation brought by virtual machines, which raises security and reliability levels, can make it hard for the various embedded subsystems to cooperate with one another, something highly desirable in embedded systems.

Among the main uses of virtualization in embedded systems, one can highlight reducing the total number of processors in a system by consolidating them as several virtual machines on a single processor (whether single- or multi-core).

In another example, the reliability of asymmetric systems, where each processor of a multiprocessor system has its own OS, can be increased through resource separation, with the ability to restart the virtual machines independently. It is also possible to migrate existing systems into a virtual machine and add new features to them, thus providing opportunities for reuse and innovation. Moreover, task migration between virtual machines is made easier.

Finally, it is worth noting that several studies on virtualization in embedded systems have been carried out [29], [30], [31], and there are already several specific systems aimed at monitoring embedded virtual machines, among which [32], [33], [34], [35], [36], [37] stand out.

G. Main Challenges

The challenges in the embedded-software area mainly concern development at higher abstraction levels that nonetheless allows simulations highly faithful to the final model.

Another great challenge is the better use of multiprocessor platforms, both in exploiting the totality of that computational power and in the way software for these platforms is developed. Parallel programming has been a challenge over the years in general-purpose computing, and it is no different in the embedded world.

From the hardware standpoint, the main attention must be given to the interconnection media, which must be efficient enough to serve an ever-growing number of processing elements in a single system. Research on new routing algorithms for NoCs, on blending buses and NoCs, and even on the use of virtualization as a strategy to reduce communication overhead also appears as a strong trend.

III. CASE STUDY - THE HELLFIRE SYSTEM

The design of an embedded system involves several constraints, such as an ever-shorter time-to-market. In this context, platforms for embedded-system development, testing, and simulation, such as [38] and [26], seek to reduce the time spent on development by providing resources that speed it up, such as simulators and debugging tools.

This Section highlights the platform employed as a case study in the present work: the Hellfire System [39]. Currently, the Hellfire System consists of three modules:


• OS, which contains the description of HellfireOS. In this module, all basic RTOS functionalities are made available through an API;

• hardware, formed by the platform prototyped on an FPGA (Field-Programmable Gate Array). In this module, four MIPS processors are interconnected by a simple bus and, on each processor, an image of HellfireOS is instantiated, and;

• simulation, consisting of an environment where it is possible to simulate up to 256 processors running HellfireOS.

HellfireOS. This operating system, based on a microkernel architecture, follows the main RTOS concepts presented earlier and provides tools for the development and simulation of real-time embedded applications [39]. The operating system can be configured according to the application to be executed, and parameters such as the maximum number of tasks in the system, task stack size, heap memory size (which can be allocated dynamically), scheduling policy, debug options, processor speed, task migration, and hardware error checking can also be customized.

The main goal of this customization is to allow the size of the operating system's final binary image1 to be optimized, making it possible to run the system on architectures with reduced memory. Some of the features made available to the developer include:

• Preemptive operating system (tasks may optionally cooperate);

• Dynamic task management (add, remove, block, resume, change parameters, fork());

• System calls (information on deadlines, processor usage, memory, energy, task parameters, context-switch times);

• Different scheduling policies for fixed-priority tasks (Rate Monotonic and Priority Round Robin) and dynamic-priority tasks (Earliest Deadline First);

• Mutual exclusion and semaphores;
• Dynamic memory allocation, deallocation, and management;
• Automatic system integrity checks;
• Customized LibC (with additional features such as CRC calculation and random-number generation, among others);

• Single-precision floating-point emulation library (with additional features such as conversions, square-root calculation, and trigonometric functions);

• Inter-task communication via message passing or shared memory;

• Task migration.

Peripherals are accessed through memory-mapped I/O. The peripheral map can be configured in the hardware abstraction layer (HAL) for a specific hardware solution, which eases porting the operating system to other architectures. Currently, there are versions for the MIPS (multiprocessor) and x86 (single-processor) architectures.

1The final binary image of the system is composed of the operating system and the tasks that run on it. This image is loaded into the memory of a processing unit, allowing the operating system to execute the tasks after initialization.

Figure 14 presents the structure of the operating system. All architecture-dependent functions are implemented in the HAL (layer 1). The microkernel is implemented on top of this layer (layer 2); some low-level device drivers are also implemented there, where they have privileged access to the system and to the hardware. A reduced library of C standard functions (LibC), as well as the operating system's API (Application Programming Interface), are implemented on top of the microkernel (layer 3). Both the tasks and the operating system share the standard library, which reduces memory usage. User tasks are implemented in layer 4 and use the available API. User-level device drivers are also implemented in this layer; they have the same parameters as system tasks, that is, they are governed by the same scheduling policy.

Fig. 14. Layered structure of the Hellfire operating system

Interrupt-handling and context save/restore routines are architecture-dependent and were therefore written in assembly language. These routines are part of the hardware abstraction layer. It is worth noting that, thanks to the modularity of the operating system, this layer can be easily ported to other architectures. The operating system's execution flow follows the initialization and event-waiting patterns of other existing systems.

A. Scheduling Measurements

According to [40], a real-time operating system is not defined only by its behavior, that is, its scheduling policy, but also by its temporal properties, which impact the execution evolution of a task set.

Time Base. Provided by a 32-bit hardware counter that operates at the same frequency as the processing element. This time base is referred to as the tick and corresponds to the unit of the measures used in the definitions of task parameters. An appropriate signal (bit) of this hardware counter can be selected and used to generate timer interrupts. Depending on the selected signal and the operating frequency, different tick periods can be obtained.

The tick period is calculated according to the following formula, where a is the selected counter signal (bit) and freq is the operating frequency of the processing element, in hertz:

period = 2^a / freq

Different operating frequencies and counter-signal selections define a large set of tick-time values, allowing the developer to choose the scheduling granularity adequate for a given application. Table I presents tick-time values obtained by varying the selected counter signal (bit) and the operating frequency.
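The period formula above can be checked numerically. A minimal sketch; for the default configuration discussed next (bit 18 at 25 MHz) it yields 2^18 / 25 × 10^6 ≈ 10.49 ms:

```c
#include <stdint.h>

/* Tick period in seconds for counter signal (bit) a at operating
 * frequency freq, in hertz: period = 2^a / freq. */
double tick_period(unsigned a, double freq)
{
    return (double)((uint64_t)1 << a) / freq;
}
```

Raising the bit number by one doubles the period; raising the clock frequency shortens it proportionally, which is exactly the trade-off Table I tabulates.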

Table II lists the number of timer interrupts per second according to the operating frequency and the selected counter signal. Note that at 100 MHz with signal 15 selected, 3125 interrupts are generated per second which, under a scheduling algorithm that does not reschedule the just-preempted task, corresponds to the same number of context switches.

The values of 25 MHz for the operating frequency and counter signal 18 were adopted as defaults, which corresponds to a period of 10.48 ms between interrupts. Thus, approximately 95 context switches are performed per second. These values match the hardware prototype and were adopted as a compromise among operating system response time, ease of prototyping, and context-switch overhead.

Operating System Overhead. The Hellfire operating system provides a system call that returns the time spent in context switches, in cycles. This call uses the hardware counter to perform the measurement and is therefore independent of software tools. Given the time spent in context switches (which depends on the scheduling policy and the compiler used), the number of timer interrupts per second (ticks), and the operating frequency, the overhead can be computed as:

overhead = (tps × csl) / freq

where overhead is expressed as a number between 0 and 1, tps is the number of ticks per second, csl is the context-switch latency (or time), and freq is the operating frequency, in hertz. The time of one context switch is spent whenever a timer interrupt occurs. This cost therefore always affects task progress, since processor time slices are distributed according to the scheduling policy in use and the overhead is absorbed at every tick.

As an example, at an operating frequency of 25 MHz and a period of 10.48 ms between interrupts, a scheduled task executes for approximately 262,000 cycles (assuming that during this period the task does not request rescheduling and is preempted after the end of the tick). Considering an operating system latency of 1500 cycles², an overhead of approximately 0.57% is observed.

Timer interrupts are used to generate system ticks. The tick period must be well balanced: a time slice that is too long may make the system unresponsive (and may miss real-time constraints), while a time slice that is too short may increase the operating system overhead.

B. Task Model Implementation

A task τi is defined by the parameters of the tuple (idi, ri, WCETi, Di, Pi), which denote, respectively, the identification, release time, worst-case execution time, deadline, and period of task τi.

The behavior of a task is defined as a block of C code implemented as a void function, that is, a function that takes no parameters and returns no value. A task can be understood as a function that iterates forever, but which may be interrupted at any moment by the operating system (the task is preempted) and have its execution resumed later.

A context switch occurs only upon a hardware interrupt (which may be masked) or when the task voluntarily gives up execution, allowing the operating system to elect another task to run. Figure 15 presents an implementation example, showing in general terms how a task is organized. Local variables are declared in the task body and stored on its stack. The initialization code is a segment that executes only once; its use is not mandatory (it can, however, be used to initialize the task's data structures). The actual task code executes in an infinite loop.

Fig. 15. Body of an example task description

Each task in the system is in one of the following states: ready, running, blocked, waiting, or not yet executed. A task is considered ready when it has been preempted

²Estimated value, obtained from tests performed on the Hellfire operating system compiled with GCC 4.4.2 and running the Rate Monotonic policy. The latency depends on factors such as the compiler, the architecture, the scheduling policy, and its implementation.


TABLE I
TICK TIME VALUES

Op. freq. (MHz) | bit 15  | bit 16  | bit 17  | bit 18  | bit 19  | bit 20  | bit 21
25              | 1.31ms  | 2.62ms  | 5.24ms  | 10.48ms | 20.97ms | 41.94ms | 83.88ms
33              | 0.99ms  | 1.98ms  | 3.97ms  | 7.94ms  | 15.88ms | 31.77ms | 63.55ms
50              | 0.65ms  | 1.31ms  | 2.62ms  | 5.24ms  | 10.48ms | 20.97ms | 41.94ms
66              | 0.49ms  | 0.99ms  | 1.98ms  | 3.97ms  | 7.94ms  | 15.88ms | 31.77ms
100             | 0.32ms  | 0.65ms  | 1.31ms  | 2.62ms  | 5.24ms  | 10.48ms | 20.97ms

TABLE II
NUMBER OF CONTEXT SWITCHES

Op. freq. (MHz) | bit 15  | bit 16  | bit 17  | bit 18  | bit 19  | bit 20  | bit 21
25              | 763.36  | 381.68  | 190.84  | 95.42   | 47.69   | 23.84   | 11.92
33              | 1010.1  | 505.05  | 251.89  | 125.94  | 62.97   | 31.48   | 15.74
50              | 1538.46 | 763.36  | 381.68  | 190.84  | 95.42   | 47.69   | 23.84
66              | 2040.82 | 1010.1  | 505.05  | 251.89  | 125.94  | 62.97   | 31.48
100             | 3125    | 1538.46 | 763.36  | 381.68  | 190.84  | 95.42   | 47.69

by the operating system or when it has voluntarily requested rescheduling. In this state, the task is in the scheduling queue. In the running state, the task has just been scheduled and is executing. A task is in the blocked state when it is ready to run but has been removed from the scheduling queue (voluntarily or not). In the waiting state, the task is waiting on a semaphore and cannot make progress until another task increments the semaphore to the point where it is released. Initially, all tasks are in the not yet executed state. After its first execution, if it does not get stuck on a semaphore or become blocked, a task is kept in the ready state until it is scheduled again. If there is no task to schedule, a special task added at system initialization, called the idle task, is scheduled. Only tasks in the scheduling queue are executed. The possible task states are shown in Figure 16.

Fig. 16. Task states

To guarantee the real-time execution of the system, tasks must not disable interrupts. Masking a timer interrupt, even for a short time, may cause the kernel to miss the interrupt, invalidating the real-time scheduling.

All task-related information is stored in a special structure called the TCB (Task Control Block). In this structure, the operating system keeps all task properties: identification, description, scheduling state, progress information, period, execution time, deadline, processor and memory utilization, task context, general-purpose pointers (to the stack memory region, for example), and data transmission information.

C. Scheduling Policies

According to [41], policies that use static priorities are better suited to real-time systems because of a factor known as stability: even in an overloaded system, where deadline misses occur, the highest-priority tasks are not affected by the overload. Dynamic-priority scheduling policies, despite achieving higher processor utilization, become unstable as soon as an overload situation occurs, which is unacceptable for many real-time systems.

The default policy employed in the Hellfire system is the Rate Monotonic algorithm, in which tasks with shorter periods have priority over other tasks. This policy, however, does not guarantee that all tasks will execute within their timing constraints if the task set is not schedulable. Other policies are available, and the programmer can define new ones thanks to the high modularity of the system.

D. Inter-Task Communication

Inter-task communication follows two different models in the Hellfire system. The first is the shared-memory model, suited to tasks running on the same processor. The other is message passing, suited to tasks running on different processors.

These two models differ in their programming perspective. Shared-memory communication can be implemented by protecting a global shared data structure. This structure can be of any type, for example a struct in the C programming language. Protection is achieved with mutual exclusion primitives or semaphores [42]. This protection must be used to prevent more than one task from accessing the same data structure concurrently³, which would cause data inconsistency.

Message-passing communication is implemented with operating-system-specific primitives that can send and receive any type of data. The programmer is responsible for allocating buffers and for specifying, on a send, the unique identification of the receiving task. The destination task automatically identifies the source task on a receive.

The message-passing primitives follow the producer/consumer model. Each task has a local circular reception queue containing messages that can be removed in order by the appropriate primitive. If the queue is empty, the task is blocked. Likewise, a task sending data to another may be blocked if the receiving task has no more space in its reception queue. The insertion of data into the reception queue, as well as the blocking and release of tasks, is managed by kernel drivers activated by interrupt.

E. Hellfire System API

The API of the Hellfire operating system is presented next. The system interface is quite simple, yet it provides the basic services needed for the development of real-time embedded applications. The API consists of 6 classes of system calls: task management, system information, mutual exclusion, memory management, communication primitives, and task migration. This API is presented in Table III.

F. Toolchain

For the development of applications and of the operating system itself, a toolchain was built for the Linux environment, based on the GNU Compiler Collection (GCC) version 4.4.2. The toolchain targets the MIPS instruction set and includes:

• cross compiler (mips-elf-gcc);
• assembler (mips-elf-as);
• linker (mips-elf-ld);
• binary manipulation tools (mips-elf-objdump, mips-elf-readelf, mips-elf-objcopy).

To build a binary image to be loaded into each processing element, a few steps are performed: (i) assembly, compilation, and customization of the operating system; (ii) compilation of the application; (iii) creation of an ELF image containing the application and the operating system; (iv)

³Tasks are said to access data concurrently when a given task modifies a data structure (but does not complete the modification) and a context switch occurs, with the newly scheduled task also accessing the structure, corrupting data (on a write) or reading corrupted data.

creation of a final binary image, using the binary manipulation tools. The final image must be created so that it can be loaded directly into the memory of a given processing element.

Standard library functions are not used. The operating system includes a customized version of the standard library, which reduces memory usage and increases application performance.

G. Hardware Module

The hardware module is composed of four MIPS processors [43] interconnected by a bus. This module is prototyped on an FPGA, and an OS image is instantiated on each processor. The processor used is a Plasma [44] (MIPS architecture).

H. Simulation Module

The simulation module of the Hellfire system is called the N-MIPS MPSoC Simulator and consists of an ISS (instruction set simulator). This simulator was written in C and allows the simulation of up to 256 HellfireOS images. It is worth noting that the same image used in hardware can be simulated, with no changes to the source code. In simulations with two or more processors, a simple bus is assumed as the communication medium.

After a simulation, N-MIPS generates several reports on system behavior, as listed below:

• the standard output of each processor;
• a report containing the Plasma instructions used, how many times each instruction was used, and the usage percentage for each instruction group (arithmetic, logic, ...);

• a summary of the estimated energy consumption of the system as a whole and of each processor individually; this computation is based on [45];

• a report containing the main characteristics of the system, such as the number of deadline misses and the CPU load; and

• an individual cycle-by-cycle report of processor operation. All information contained in the system stack is shown in this report.

I. MPSoC: Partitioning and Initial Mapping

Currently, both the partitioning and the initial mapping of tasks are performed manually, that is, good results depend on the designer's experience. The developer is responsible for describing the application and for defining the task groups (partitioning) and the placement of the groups on the respective processing units (mapping). These definitions are made in the application source code, as shown in Figure 17. In the example, four tasks are partitioned between two processing elements. The number of CPUs composing the architecture is not specified there, since it is a


TABLE III
HELLFIRE OPERATING SYSTEM API

Task management:
  int OS_BlockTask(unsigned char id) - block a task
  int OS_ResumeTask(unsigned char id) - resume execution of a blocked task
  int OS_KillTask(unsigned char id) - remove a task
  int OS_AddPeriodicTask(void (*task)(), unsigned short int period, unsigned short int capacity, unsigned short int deadline, char description[], unsigned int energy_t, unsigned char locked) - add a periodic (real-time) task and configure its parameters
  int OS_ChangeTaskParameters(unsigned char id, unsigned short int period, unsigned short int capacity, unsigned short int deadline, unsigned char locked) - modify parameters (period, execution time, deadline, and migration permission)
  int OS_Fork(void) - create a copy of the task, with the same parameters
  void OS_TaskYield(void) - voluntarily request rescheduling
  void OS_Start(void) - start the operating system

System information:
  unsigned int OS_TaskDeadlineMisses(unsigned char id) - number of deadline misses of a task
  unsigned int OS_TaskCpuUsage(unsigned char id) - processor utilization of a task
  unsigned int OS_TaskEnergyUsage(unsigned char id) - energy consumption of a task
  unsigned int OS_TaskMemoryUsage(unsigned char id) - memory utilization of a task
  unsigned int OS_TaskParameters(unsigned char id, unsigned short int *period, unsigned short int *capacity, unsigned short int *deadline, unsigned char *locked) - parameters of a task
  unsigned int OS_TaskTicks(unsigned char id) - number of times a task has executed
  unsigned int OS_TaskLastTickTime(unsigned char id) - time (in cycles) of the last execution of a task
  unsigned int OS_PacketsSent(unsigned char id) - number of packets sent by a task
  unsigned int OS_PacketsReceived(unsigned char id) - number of packets received by a task
  unsigned int OS_LastContextSwitchCycles(void) - time (in cycles) of the last context switch
  unsigned char OS_TaskIdFromUniqueId(unsigned short int uid) - convert a unique identification number into a local id
  unsigned char OS_CurrentTaskId(void) - id of the current task
  unsigned short int OS_CurrentTaskUniqueId(void) - unique id of the current task
  unsigned char OS_CurrentCpuId(void) - number of the current processor
  unsigned int OS_CurrentCpuFrequency(void) - frequency of the current processor
  unsigned char OS_NCores(void) - number of processors in the MPSoC
  unsigned char OS_NTasks(void) - number of tasks on the current processor
  unsigned int OS_CpuUsage(void) - total processor utilization
  unsigned int OS_EnergyUsage(void) - total energy consumption
  unsigned int OS_MemoryUsage(void) - total memory utilization
  unsigned int OS_FreeMemory(void) - memory available for dynamic allocation

Mutual exclusion:
  void OS_EnterRegion(mutex *m) - enter a critical region
  void OS_LeaveRegion(mutex *m) - leave a critical region
  void OS_SemInit(semaphore *s, int value) - initialize a semaphore with a given value
  void OS_SemWait(semaphore *s) - decrement a semaphore, waiting if necessary
  void OS_SemPost(semaphore *s) - increment a semaphore, releasing a task if necessary

Memory management:
  void OS_Free(void *ptr) - free a memory region
  void *OS_Malloc(unsigned int size) - allocate a memory region

Communication:
  int OS_SendPacket(unsigned short int target_uid, unsigned char buf[], unsigned short size) - send a data packet to the task with the given unique identification
  int OS_ReceivePacket(unsigned short int *source_uid, unsigned char buf[], unsigned short *size) - receive a data packet from a task, identified by its unique identification

Task migration:
  int OS_TaskMigrate(unsigned char source_id, unsigned char target_cpu) - migrate a local task to another processor


Fig. 17. Manual partitioning and mapping in Hellfire

compile-time configuration parameter of the operating system and of the architecture.

Task Migration. The operating system currently has a primitive that allows the explicit migration of tasks from one CPU to another. The OS_TaskMigrate() primitive performs this function and takes as parameters the identification of a local task and the destination processor.

Every task added to the system can be migrated, provided it was configured with the TASK_CAN_MIGRATE option. System tasks, such as the idle task and drivers, are configured with the TASK_CANNOT_MIGRATE option, and the migration primitive is prevented from migrating them.

Figure 18 presents an example of the use of the migration primitive. In the example, there are 2 user tasks assigned to CPU 0. After they execute for a time determined by the algorithm, the migration task executes the migration primitive, transferring the i_am_alive task to CPU 1.

Fig. 18. Task migration example in the Hellfire operating system

In addition to transferring a task to another CPU, the migration primitive inserts the global identification of the migrated task into a list. If a message is sent to the migrated task, the communication kernel driver replies with the task's new location, so that the task on the source CPU can discover the new destination. If the task is transferred back to a CPU where it has already been, its entry in the migration list is removed.

Currently, the operating system performs partial task migration, that is, only the data and the context of the migrated task are transferred to the destination CPU. This model was initially adopted because of factors such as the absence of hardware memory management (MMU) and the lack of toolchain support for generating fully relocatable code. Despite its limitations, partial migration offers advantages in applications where the time to migrate the code, in addition to the data, becomes prohibitive. According to [46], the migration time can be significantly reduced with this method.

For partial migration to work, the code of the tasks to be migrated must be present on every CPU that is a likely migration target. Moreover, the code must be aligned, that is, the code addresses of the tasks must be exactly the same on all CPUs. Code alignment follows simple rules that establish the linking order: first, all operating system code whose addresses are not modified is linked; then the application tasks; and finally the CPU-dependent code, such as the scheduling policy and kernel drivers.

Design Flow. Bringing together the items explained above, Figure 19 shows the development flow adopted by the Hellfire system. The figure makes clear the division of the system into software and hardware configuration: application development, project creation on the platform, and OS configuration are performed as part of the final software, while processor settings (architecture and frequency, for example), customizations of the available communication medium (bus or NoC), and even the final memory size of the system concern the final hardware, which will constantly interact with the previously configured software.

After this initial configuration of both software and hardware, a binary image containing the OS and the desired application is created and, together with a hardware description, passed to the N-MIPS tool for refinement and simulation. If modifications are necessary, the framework itself allows returning to the desired point and redoing it. Once the expected result has been obtained, the same binary image (containing the application and the OS) can be tested on a real platform, as long as requirements such as device area are respected.

IV. FINAL CONSIDERATIONS

This chapter presented an extensive review of the state of the art covering the main concepts of embedded systems. Beyond an initial conceptualization, examples and descriptions of typical characteristics of these systems were presented.


Fig. 19. Development flow of the Hellfire platform

Among these characteristics, Real-Time Systems stood out as those in which the result of a computation depends not only on its logical correctness but also on the time at which the task is completed. Important real-time concepts were covered, with emphasis on scheduling, considered vital for the reader to understand the proposed system. The use of virtualization in embedded systems was also briefly discussed, highlighting the main advantages, problems, and use cases of this technique.

In the second part of the text, the Hellfire system was presented and its main component, HellfireOS, was described in detail. This system, a typical real-time operating system, offers several features that help developers increase their productivity without losing the commitment to tasks carrying timing requirements. The implementation of the task model in this operating system was presented. Hellfire can also be regarded as an operating system that is extensible and adaptable to different architectures. The development flow adopted by the Hellfire platform (and all the tools that compose it) was also presented, to ease a global view of the system.

We hope this work conveys the importance of embedded software in current systems and encourages new developers to take on the present and future challenges of this rapidly developing area.

REFERENCES

[1] J.-M. Farines, J. da Silva Fraga, and R. S. de Oliveira, Sistemas de Tempo Real. São Paulo-SP: Second Escola de Computação, IME-USP, 2000.

[2] K. Vivekanandarajah and S. K. Pilakkat, "Task mapping in heterogeneous MPSoCs for system level design," in ICECCS 2008: Proceedings of the 13th IEEE International Conference on Engineering of Complex Computer Systems. Washington, DC, USA: IEEE Computer Society, Apr. 2008, pp. 56–65. [Online]. Available: http://dx.doi.org/10.1109/ICECCS.2008.18

[3] A. Sangiovanni-Vincentelli, "Quo vadis, SLD? Reasoning about the trends and challenges of system level design," Proceedings of the IEEE, vol. 95, no. 3, pp. 467–506, 2007.

[4] Y. Cho, S. Yoo, K. Choi, N.-E. Zergainoh, and A. A. Jerraya, "Scheduler implementation in MP SoC design," in ASP-DAC '05: Proceedings of the 2005 Conference on Asia South Pacific Design Automation. New York, NY, USA: ACM Press, 2005, pp. 151–156.

[5] G. Marchesan Almeida, G. Sassatelli, and P. Benoit, "An adaptive message passing MPSoC framework," International Journal of Reconfigurable Computing, vol. 2009, pp. 1–21. [Online]. Available: http://hindawi.com/journals/ijrc/2009/242981.pdf

[6] W. Wolf, "How many system architectures?" Computer, vol. 36, no. 3, pp. 93–95, 2003.

[7] L. Carro and F. R. Wagner, "Sistemas computacionais embarcados," in Jornadas de Atualização em Informática 2003, 2003, ch. 2.

[8] L. Lavagno and C. Passerone, "Design of embedded systems," in Embedded Systems Handbook, R. Zurawski, Ed. CRC Press, 2005, ch. 3.

[9] R. S. de Oliveira, A. da Silva Carissimi, and S. S. Toscani, "Organização de sistemas operacionais convencionais e de tempo real," in Jornadas de Atualização em Informática 2002, 2002, ch. 8.

[10] H. Hansson, M. Nolin, and T. Nolte, "Real-time in embedded systems," in Embedded Systems Handbook, R. Zurawski, Ed. CRC Press, 2005, ch. 2.

[11] R. A. Bergamaschi and W. R. Lee, "Designing systems-on-chip using cores," in DAC '00: Proceedings of the 37th Conference on Design Automation. New York, NY, USA: ACM Press, 2000, pp. 420–425.

[12] R. A. Bergamaschi, S. Bhattacharya, R. Wagner, C. Fellenz, M. Muhlada, W. R. Lee, F. White, and J.-M. Daveau, "Automating the design of SoCs using cores," IEEE Des. Test, vol. 18, no. 5, pp. 32–45, 2001.

[13] K. Keutzer, S. Malik, A. R. Newton, J. M. Rabaey, and A. Sangiovanni-Vincentelli, "System-level design: Orthogonalization of concerns and platform-based design," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 19, no. 12, pp. 1523–1543, Dec. 2000. [Online]. Available: citeseer.ist.psu.edu/756855.html

[14] W. Wolf, "How many system architectures?" Computer, vol. 36, no. 3, pp. 93–95, 2003.

[15] A. Jerraya, H. Tenhunen, and W. Wolf, "Multiprocessor systems-on-chips," Computer, vol. 38, no. 7, pp. 36–40, July 2005.

[16] S. Pasricha and N. Dutt, On-Chip Communication Architectures: System on Chip Interconnect. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2008.

[17] G. J. Popek and R. P. Goldberg, "Formal requirements for virtualizable third generation architectures," Commun. ACM, vol. 17, no. 7, pp. 412–421, 1974.

[18] G. Martin, "Overview of the MPSoC design challenge," in DAC '06: Proceedings of the 43rd Annual Conference on Design Automation. New York, NY, USA: ACM Press, 2006, pp. 274–279.

[19] C. A. Waldspurger, "Memory resource management in VMware ESX Server," SIGOPS Oper. Syst. Rev., vol. 36, no. SI, pp. 181–194, 2002.

[20] G. C. Buttazzo, "Real-time operating systems: The scheduling and resource management aspects," in Embedded Systems Handbook, R. Zurawski, Ed. CRC Press, 2005, ch. 12.

[21] C. L. Liu and J. Layland, "Scheduling algorithms for multiprogramming in a hard real-time environment," Journal of the ACM, vol. 20, no. 1, pp. 46–61, 1973.

[22] J. Lehoczky, L. Sha, and Y. Ding, "The rate monotonic scheduling algorithm: exact characterization and average case behaviour," IEEE Real-Time Systems Symposium, pp. 166–171, 1989.

[23] W. H. Hesselink and R. M. Tol, "Formal feasibility conditions for earliest deadline first scheduling," Tech. Rep., 1994.

[24] M. Andrews, "Probabilistic end-to-end delay bounds for earliest deadline first scheduling," in Proceedings of IEEE INFOCOM 2000, 2000.

[25] W. Wolf, Computers as Components: Principles of Embedded Computing System Design. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2001.

[26] S. Yoo, G. Nicolescu, L. Gauthier, and A. Jerraya, "Automatic generation of fast timed simulation models for operating systems in SoC design," in DATE '02: Proceedings of the Conference on Design, Automation and Test in Europe. Washington, DC, USA: IEEE Computer Society, 2002, p. 620.

[27] K. Popovici, X. Guerin, F. Rousseau, P. S. Paolucci, and A. Jerraya, "Efficient software development platforms for multimedia applications at different abstraction levels," in Proceedings of the 18th IEEE/IFIP International Workshop on Rapid System Prototyping. Washington, DC, USA: IEEE Computer Society, 2007, pp. 113–122. [Online]. Available: http://portal.acm.org/citation.cfm?id=1263545.1263926

[28] H. Shen and F. Petrot, "Novel task migration framework on configurable heterogeneous MPSoC platforms," in Design Automation Conference, 2009. ASP-DAC 2009. Asia and South Pacific, Jan. 2009, pp. 733–738.

[29] S. Subar, "Virtualisation to enable next billion devices," Web. Available at http://www.embeddeddesignindia.co.in/ART_8800576093_2800003_TA_7cb7532e.HTM. Accessed 10 Feb. 2009.

[30] G. Heiser, "The role of virtualization in embedded systems," in IIES '08: Proceedings of the 1st Workshop on Isolation and Integration in Embedded Systems. New York, NY, USA: ACM, 2008, pp. 11–16.

[31] A. Aguiar and F. Hessel, "Embedded systems' virtualization: The next challenge?" Jun. 2010.

[32] ——, "Virtual Hellfire hypervisor: Extending the Hellfire framework for embedded virtualization support," to appear in Quality Electronic Design (ISQED), 2011 12th International Symposium on, 2011.

[33] XEN.org, "Embedded Xen project," Web. Available at http://www.xen.org/community/projects.html. Accessed 10 Aug. 2010.

[34] Wind River, "Wind River," Web. Available at http://www.windriver.com/. Accessed 2 Oct. 2010.

[35] VirtualLogix VLX, "Real-time virtualization for connected devices," Web. Available at http://www.virtuallogix.com/. Accessed 2 Oct. 2010.

[36] Trango, "Trango hypervisor," Web. Available at http://www.trango.com/. Accessed 2 Oct. 2010.

[37] XtratuM, "Trango hypervisor," Web. Available at http://www.trango.com/. Accessed 2 Oct. 2010.

[38] R. Le Moigne, O. Pasquier, and J.-P. Calvez, "A generic RTOS model for real-time systems simulation with SystemC," in Design, Automation and Test in Europe Conference and Exhibition, 2004. Proceedings, vol. 3, Feb. 2004, pp. 82–87.

[39] A. Aguiar, S. J. Filho, F. G. Magalhaes, T. D. Casagrande, and F. Hessel, "Hellfire: A design framework for critical embedded systems' applications," in ISQED. IEEE, 2010, pp. 730–737.

[40] R. Le Moigne, O. Pasquier, and J.-P. Calvez, "A generic RTOS model for real-time systems simulation with SystemC," in DATE '04: Proceedings of the Conference on Design, Automation and Test in Europe. Washington, DC, USA: IEEE Computer Society, 2004, p. 30082.

[41] L. Sha, "Rate monotonic analysis for real-time systems," Computer, vol. 26, pp. 73–74, 1993.

[42] G. L. Peterson, "Myths about the mutual exclusion problem," Information Processing Letters, vol. 12, no. 3, pp. 115–116, 1981.

[43] G. Kane, MIPS RISC Architecture. Upper Saddle River, NJ, USA: Prentice-Hall, Inc., 1988.

[44] OpenCores, "Plasma - most MIPS I(TM) opcodes," http://www.opencores.org.uk/projects.cgi/web/mips/, 2007. Accessed September 2009.

[45] S. J. Filho, A. Aguiar, C. A. M. Marcon, and F. P. Hessel, "High-level estimation of execution time and energy consumption for fast homogeneous MPSoCs prototyping," in RSP '08: Proceedings of the 2008 19th IEEE/IFIP International Symposium on Rapid System Prototyping. Washington, DC, USA: IEEE Computer Society, 2008, pp. 27–33.

[46] A. Mehran, A. Khademzadeh, and S. Saeidi, "DSM: A heuristic dynamic spiral mapping algorithm for network on chip," IEICE Electronics Express, vol. 5, no. 13, pp. 464–471, 2008.

CBSEC 2012 - CES-School paper 96934

Introduction to embedded systems and platform-based design

Marcio Seiji Oyamada, Alexandre Giron, João Angelo Martini

Abstract—Embedded systems design is constrained by functional and non-functional requirements such as performance, power and energy consumption, memory footprint, availability, reliability, cost, and design time. These requirements differ considerably from those usually found in application development for desktop-based systems. Embedded systems are multi-domain designs, composed of digital and analog hardware and software components. Platform-based design aims to decrease design time by using a predefined hardware platform; consequently, the design effort concentrates on mapping functionalities onto the platform components. This short course provides an introduction to embedded systems design. Additionally, the course presents a development board for embedded systems called the BeagleBoard. The main component of the BeagleBoard is the OMAP 3530 platform, which has two processors: a general-purpose ARM Cortex-A8 and a C64x+ DSP.

Index Terms—embedded systems, platform-based design, requirements

I. INTRODUCTION

Nowadays, the use of electronic devices with an embedded processor is increasingly common. These devices, called embedded systems (ES), have some similarities with general-purpose computers, but are usually designed to execute specific tasks. Marwedel [1] defines an embedded system as an information processing system that is integrated inside a product and usually not visible to the user. Cell phones, avionics systems, and car navigation systems are examples of embedded systems.

The development of hardware and software components for embedded systems differs in some respects from the development of general-purpose desktop software. In ES development, the hardware layer needs to be considered, because it is an important means of meeting requirements such as performance, power and energy consumption, cost, and design time.

Performance is directly related to the power and energy requirements: a high-performance processor tends to consume more power. This is a key requirement in embedded system design, since most devices are powered by batteries. High power consumption requires frequent recharging, which is inconvenient in devices such as mobile phones.

Manuscript received April 30, 2012. This work was supported in part by Fundação Araucária and CNPq. Marcio Seiji Oyamada, UNIOESTE – Curso de Ciência da Computação, Cascavel, PR, Brasil (e-mail: [email protected]). Alexandre Augusto Giron, UNIOESTE – Curso de Ciência da Computação, Cascavel, PR, Brasil (e-mail: [email protected]). Joao Angelo Martini, UEM – Departamento de Informática, Maringá, PR, Brasil (e-mail: [email protected]).

Fault tolerance is especially important in critical embedded systems, such as train and aircraft controllers and ABS brakes. In such cases, the system must be designed to mask the failure of a component.

The restrictions of an ES must be satisfied in order to deliver a final product that meets the requirements. However, the design of an ES must also be completed within the shortest possible time, because design time is a critical factor that directly impacts market acceptance and profit.

Advances in microelectronics have enabled the development of innovative embedded systems. Increases in transistor integration capacity have allowed solutions with various components integrated into a single chip, such as processors, memory, and analog and digital interfaces. These solutions are called systems-on-chip (SoC), or MPSoCs when multiple processors are integrated on a chip. The main advantages of SoCs are [2]:
• Increased operation speed, because communication between the processor and the other components is performed on chip;
• Reduced power consumption and size;
• Increased reliability compared to multi-IC solutions;
• Potentially lower cost.

This paper presents some important topics in embedded system design. Section II describes the key requirements in embedded system design. Section III presents the overall design flow of an embedded system and details the platform-based and IP-based design methodologies. In Section IV, some MPSoCs are described, and Section V describes the OMAP 3530 platform. Section VI presents the conclusions.

II. EMBEDDED SYSTEM REQUIREMENTS

A. Performance and power consumption

The performance requirement depends on the application characteristics: multimedia applications, for example, require higher performance than word processing. Performance is directly related to power consumption, since processors with higher performance tend to consume more power.



In battery-powered devices, both the peak power consumption and the total energy consumption must be taken into account. To increase the operating time of an embedded system, the energy consumed must be reduced. However, since each battery supports only a limited peak power, the maximum power of the system is also an important requirement. The decision between consuming more power in less time and decreasing the power consumption while increasing the processing time is not trivial.

The energy E used by a device results from the power P consumed over the time t, as shown in Equation 1.

E = ∫ P dt    (1)

According to [3], the dynamic power consumption of CMOS (Complementary Metal-Oxide-Semiconductor) circuits can be obtained by Equation 2:

P = C × A × F × V² / 2    (2)

where C is the switching capacitance, A is the switching activity, F is the operating frequency, and V is the supply voltage. In CMOS circuits, the frequency can also be considered roughly proportional to the voltage [4].

Based on Equation 2, the voltage has a quadratic impact on the power consumed by the circuit; since the frequency scales roughly linearly with the voltage [4], lowering frequency and voltage together reduces the power more than linearly. This has motivated the use of multiprocessor architectures with small cores, where the same performance can be obtained at a lower total system power. However, the same performance is achieved only if the division of processing among the different processors is perfect. The parallelization of an application involves communication and synchronization among multiple tasks, which decreases efficiency. If these losses are not significant, the multi-core solution can be advantageous due to the reduction in power consumption.

Thus, the energy consumption can be reduced either by decreasing the power or by decreasing the processing time. In the case of a processor, one can increase the operating frequency to complete the task faster, at the cost of higher power consumption. Alternatively, the frequency can be decreased, reducing power consumption but lengthening the processing time t. This scenario is depicted in Figure 1, where line P1 represents a processor running at a higher frequency and line P2 a processor at a lower frequency. The choice between these approaches is made during design space exploration: the designer must check the requirements, because some applications need to be completed quickly, while others should be executed with the best balance between power and processing time.

Fig. 1. Energy consumption as a function of power and processing time [1]

To improve the tradeoff between performance and power consumption, some processors include features to reduce power consumption, such as DVS (Dynamic Voltage/Frequency Scaling), clock gating, and power states.

The DVS technique reduces the frequency or voltage of the processor, thereby reducing its power consumption as stated in Equation 2. The Intel XScale is an example of an embedded processor with DVS, varying the operating frequency from 150 MHz to 800 MHz with power ranging from 0.1 to 0.9 W, as shown in Figure 2. Note, however, that the performance varies according to the frequency changes.

Fig. 2. XScale frequency vs. power consumption [5]

The clock gating method is used in synchronous circuits; the basic idea is to disable the clock of portions of the circuit that are not being used. In terms of Equation 2, when clock gating is enabled the switching activity A of the gated block becomes zero, and consequently its dynamic power consumption is also zero, yielding significant power savings.

The use of different operational states is another low-power technique in embedded processors. This technique defines states in which some components are turned off when they are not required. The StrongARM processor [6] is an example. The Run state provides full processor operation; in the Idle state, the CPU is disabled and interrupts are monitored; in the Sleep state, the CPU is completely turned off and only a few interrupt sources, such as the real-time clock, remain enabled. As shown in Figure 3, the Sleep state offers the greatest energy savings, but it takes the longest to return to the Run state.


Fig. 3. Power states in a StrongARM processor [6]

B. Development time, cost and design

Another important requirement in embedded system design is the development time. These products usually rely on innovation and novelty, making it necessary to reduce the design time. The largest profits from an innovative product are achieved early in its life cycle, when there is normally no competition from similar products. As the novelty wears off, profitability decreases, requiring the development of new innovative products. This pushes industry to develop new products in ever shorter design times.

Embedded systems are also subject to seasonal factors: video game consoles, for example, are scheduled to launch before Christmas.

Cost is also an important design requirement, because consumer-market products are very sensitive to the final price. Thus, any optimization that reduces cost is a competitive advantage; in high-volume products, even a US$ 1 reduction in product cost has a significant impact.

Embedded systems differ in many respects from desktop computers, where the design focuses on the user interface. In an embedded system, the industrial design of the product must also be considered: its size, shape, materials, and input/output interfaces. All these aspects directly influence the hardware and software to be developed.

III. DESIGN METHODOLOGIES

This section presents some aspects of the embedded system design flow. Initially, we present the different abstraction levels that can be used in embedded system design. We then describe platform-based and component-based design, discussing the differences between the two approaches.

A. Abstraction Levels

Abstract descriptions provide a suitable way to manage design complexity, hiding implementation details that the designer may want to leave out at some point. As a consequence, the description is shorter and easier to understand.

The design flow must define the abstraction levels and refinement steps that lead to a final solution [7]. Ideally, the designer would benefit from automatic refinement from the higher abstraction levels down to the system implementation; in practice, embedded system design is quite complex and the available tools do not cover all design steps.

Fig. 4 shows an ideal embedded system design flow. The system specification describes the behavior of the system under development. Software engineers see the system specification as a document that describes the required functionalities using abstract representations; for instance, using the UML notation, a system may be represented by class and use-case diagrams. In electronic design, an executable specification is usually employed to represent the system behavior. Some languages have been proposed for electronic system specification, such as SystemC [8] and SpecC [9]. These languages are extensions of existing imperative languages (for example, C++) and support hardware-oriented descriptions. The research community is now also focusing on more abstract specification languages such as UML and Simulink.

Fig. 4. Embedded system design flow [10]

Architecture exploration uses the specification to define the golden architecture that covers the application requirements in terms of performance, power consumption, energy consumption, and area, among others.

Due to strict requirements, the design of an embedded system usually involves the development of both hardware and software. Thus, it is necessary to explore the design space to obtain the best hardware architecture configuration for a given application. This phase helps designers detect and resolve problems related to the architectural design.

According to Carro and Wagner [10], the exploration phase must answer three questions:
1) How many processors and dedicated blocks are needed, and which ones?
2) What is the ideal mapping of functions onto the hardware components?
3) What is the ideal communication infrastructure to connect the components of the architecture?

From the architecture exploration step, a macroarchitecture that represents the system in terms of software and hardware components is obtained. Each component is then refined following the traditional hardware or software design flow: for software, this includes application software and RTOS generation; the hardware side includes the synthesis of specific hardware components. Communication synthesis generates the hardware and software components needed to implement the communication. In some cases, communication protocols may require hardware components such as co-processors and channel adapters, which adapt the internal component bus to the interconnection network.

The final step comprises hardware and software integration and the validation of the whole solution.

B. Platform-based Design

The design of a new architecture involves non-recurring engineering (NRE) costs that are not negligible in the overall cost of designing and manufacturing a SoC [11]. Due to these costs, developing a new architecture from scratch for each new product has become unacceptable. Consequently, platforms are proposed to cover an application domain and are then tailored to a specific product.

Platform-based design [12] uses architecture templates to obtain a solution, called a derivative, by tailoring the platform to a given application. Architecture templates are domain-specific hardware platforms composed of processors, memories, hardware blocks, and communication structures. Occasionally, these components have some degree of configurability, such as processor cache and memory sizes.

Fig. 5 presents the overall platform-based design flow. The platform is defined from past designs and the requirements of a group of applications or a domain. A solution is obtained by taking the base platform and customizing it according to the user needs, including software development, the user interface, and hardware customizations.

Fig. 5. Platform-based design [12]

Platform-based design provides gains in design time and cost, provided that the mapping of the application onto the platform components is efficient and well supported by system-level design tools.

C. Component-based Design

In component-based design, the architectural template is implemented by assembling hardware and software IP components available in a library or provided by third-party companies. Components must comply with a given protocol, making their integration into the platform possible. The reuse of pre-tested components reduces design time and facilitates the verification of the solution in terms of the expected system functionality and requirements.

Component-based design requires a well-defined process involving IP creation, qualification, and classification [13] on the IP provider side. On the client side, IP integration includes the search process, validation, and final integration with the platform. The integration step is highly influenced by the IP distribution form. IP components may be distributed in hard form, with all gates and interconnects placed and routed; in soft form, with only an RTL representation; or in firm form, with an RTL description together with physical floorplanning or placement information. Hard IP components have the advantage of yielding more predictable estimates of performance, power, and area, but they are less flexible and therefore less reusable than adaptable components.

IP integration poses problems due to heterogeneous and hard IP components. The bus-based approach uses a standard interconnect, with which the IP interface must comply, enabling plug-and-play integration. AMBA [14] and CoreConnect [15] are examples of standard buses available on the market. When the source code is available, the IP component may be changed and adapted for the target platform; another solution is to construct a wrapper around the component that adapts it to the bus or interconnection network. Software IP components are standardized by the API and target OS. OSEK [16] (for automotive systems) and ITRON [17] (for consumer electronics) are examples of domain-specific APIs.

IV. MPSOC ARCHITECTURES

The increase in performance requirements and the restrictions on power consumption shown in the previous sections call for solutions using multiple processors (homogeneous or heterogeneous) in embedded systems, called MPSoCs (multiprocessor systems-on-chip). MPSoC design opens many possible solutions in terms of processor architectures, IP components, and interconnection structure. The next sections present the hardware trade-offs for MPSoCs.

A. Processor

Fig. 6 shows the market share of each type of embedded 32-bit processor. In contrast to the personal computer market, the embedded market is shared among different architectures and manufacturers, which provide various options in terms of performance, power consumption, area, and cost.


Fig. 6. Market share of 32-bit embedded processors [18]: ARM 57%, Proprietary 16%, MIPS 12%, PowerPC 6%, x86 3%, 68K 3%, SuperH 2%, Other 1%

Processor microarchitecture design has an important impact on MPSoC quality. Microarchitecture optimization for a given application includes pipeline configuration, branch prediction, and prefetching, among others. Processor data width is another design parameter, since embedded applications require a minimal size; processor cores are available in 8-, 16-, and 32-bit versions. Currently, most embedded software remains unchanged after product deployment, making it possible to tune the architectural parameters to the system requirements.

Application-specific instruction-set processors (ASIPs) optimize the architecture by creating new instructions to efficiently execute a given application. Commercial processors such as Tensilica's [19] are sold with an environment that analyzes the application C code in order to configure and derive the optimized architecture.

The multimedia domain is composed of processing-intensive applications and requires more performance- and power-efficient architectures, such as digital signal processors (DSPs). These processors optimize the execution of DSP algorithms using MAC (multiply-and-accumulate) units, address generators, and a Harvard architecture, among other features. DSP processors execute digital signal processing algorithms efficiently and can run at low frequencies compared to general-purpose processors (GPPs), consequently decreasing energy consumption.

Very long instruction word (VLIW) processors also provide an efficient architecture for processing-intensive applications by exploiting instruction-level parallelism (ILP) at compilation time. Because the ILP is extracted statically, VLIW processors do not require the complex dispatch units and speculative techniques used in general-purpose processors.

Multithreaded architectures aim to execute multithreaded applications efficiently by supporting fast context switches and concurrent execution. Fast context switching hides memory latency by executing other threads while a memory access is in progress.

Low-power techniques such as frequency/voltage scaling and operation states, presented in Section II, need the OS or another supervisor component to control their use. On laptops, the processor typically adjusts the frequency/voltage dynamically based on the application demand. However, these techniques affect processor performance and system response; as a consequence, they require an integrated application and OS design so as not to disturb the real-time behavior commonly required of embedded applications.

B. Memory

Memory design has an important impact on processor performance and power consumption. For embedded processors, cache design is especially important because the caches account for more than 50% of the core power consumption (see Fig. 7).

Fig. 7. Processor power consumption [20]

Fig. 8 presents work by Zhang et al. [20] showing the influence of cache size on global energy consumption. Initially, as the cache size increases, global energy decreases because there are fewer memory accesses. After a given point, however, the energy cost of the cache itself dominates global energy consumption despite the small number of memory accesses. The same pattern holds for processor performance [21]: beyond a certain point, increasing the cache size does not increase performance, because the application reaches its temporal and spatial locality limit.


Fig. 8. Cache size and its influence on system energy consumption [20]

Other techniques are available to decrease power consumption and execution time in memory hierarchies. Scratchpad memories are small, fast memories inside the processor core used to decrease access time and power consumption. The main difference from a cache is that their contents are loaded explicitly by the application, making the programmer responsible for choosing which data and instructions deserve the fast memory. This makes execution time more predictable than with caches, which can be polluted by other tasks. Jain et al. [22] propose a technique to lock cache lines, avoiding undesirable line replacement. In both techniques, knowledge of the application behavior is necessary to optimize scratchpad and cache use.

C. Interconnection

SoC interconnect design complexity is increasing with the number of components and the sophistication of the communication schemes. Ad-hoc solutions cannot address the concerns of flexibility and design time; these require long-term solutions that can cope with future MPSoC requirements.

Point-to-point connections, shown in Fig. 9(a), enable designs customized for performance and predictability. However, the design time and low reuse make point-to-point interconnects impracticable for future MPSoC designs. Current MPSoC designs commonly adopt bus-based solutions (see Fig. 9(b)); due to scalability problems, many variations have been proposed, such as hierarchical buses and time-sliced arbitration.

The network-on-chip (NoC) approach represents a long-term solution for MPSoC design. A NoC, as shown in Fig. 9(c), provides the scalability and reuse needed by future MPSoC designs; predictability and real-time requirements call for NoC solutions with quality-of-service (QoS) capabilities. NoCs are currently a subject of intense research, but few real designs exist, due to their high latency and area overhead compared to other interconnection solutions.

Fig. 9. Communication topologies: (a) point-to-point, (b) bus-based connection, and (c) network-on-chip

To improve reusability, the communication interconnect is provided as an IP component [14] that must be configured for a given application (for example, the number of bus masters, or the switch buffer size in a NoC). This requires tools to explore the communication structure and to link the application QoS requirements to the actual implementation.

D. MPSoC Platforms

MPSoC with heterogeneous processors have been proposed

to reduce energy consumption and increase performance in

specific tasks. A heterogeneous MPSoC usually has a general

purpose processor, which runs the operating system, and one

or more special purpose processors like graphics or digital

signal processors.

This section presents some MPSoC platforms. Fig. 10 shows the NovaThor [23] platform, targeted at mobile phones and multimedia PDAs. Nova is a multimedia platform composed of a dual-ARM processor and audio and video accelerators. Many I/O interfaces, such as an LCD controller, USB, and flash card, are available.

Fig. 10. NovaThor U9500 architecture

OMAP 1610 [24] is another example of a platform targeted at multimedia mobile devices. The platform is composed of two processors: a general-purpose processor (ARM926), used to execute system-level tasks, and a DSP used for multimedia processing (see Figure 11). The SoC also integrates digital interfaces to external devices. An API called OMAPI is provided to access the multimedia resources available in the DSP, thereby abstracting the hardware architecture. The OMAP platform leaves the programmer responsible for identifying the code suitable for execution on the DSP.


Fig. 11. The OMAP 1610 platform

Figure 12 shows the Philips Nexperia [25] platform. Nexperia is a heterogeneous platform composed of a general-purpose processor (MIPS), DSP processors, and various application-specific hardware accelerators. The memory controller manages the communication and is interconnected with the different buses available in the platform; bridges carry communication among the subsystems, avoiding overload of the memory controller.

Fig. 12. Philips Nexperia PNX8550

Programming models for MPSoC platforms have become a major issue, due to the complexity of coordinating the platform elements. UHAPI is Nexperia's abstract programming model, used for home applications and based on use cases. UHAPI brings platform programming closer to software engineering models such as UML by providing high-level use cases for the most common needs of home applications; for instance, the API provides use cases to play DVDs, record movies, and so on. This represents an important trend: the value of a platform lies not only in the hardware solution, but also in the API that is provided.

V. PLATFORM OMAP 3530

The OMAP 3530 [26] (Open Media Application Platform) is an MPSoC for mobile and portable multimedia developed by Texas Instruments (TI). Some embedded devices using the OMAP 3 platform are presented in Table I.

TABLE I
EMBEDDED DEVICES CONTAINING THE OMAP 3 PLATFORM

Device               Type                   MPSoC
Touch Book           Netbook                OMAP 3530
Pandora              Portable video game    OMAP 3530
DevKit8000           Evaluation kit         OMAP 3530
BeagleBoard          Evaluation kit         OMAP 3530
Motorola Milestone   Smartphone             OMAP 3430
Nokia N9000          Smartphone             OMAP 3430
Samsung i8910        Smartphone             OMAP 3430
Galaxy S             Smartphone             OMAP 3630
Droid 2              Smartphone             OMAP 3630
Milestone 2          Smartphone             OMAP 3630

The OMAP 3530 is a heterogeneous MPSoC with a general-purpose ARM Cortex-A8 processor and a C64x+ DSP. These two processors are discussed in more detail in the following sections.

A. ARM Cortex A8

ARM (Advanced RISC Machine) is a 32-bit RISC (Reduced Instruction Set Computer) architecture targeting the embedded market [27]. The ARM company holds the rights to the architecture, and other companies wishing to produce ARM processors need to be licensed; ARM itself is responsible for the evolution and development of new architectures.

There are two license types: implementation and architecture. The implementation license provides all the information required to produce integrated circuits containing an ARM processor. The architecture license gives the right to develop a processor with an ARM-compatible instruction set.

Several factors make the ARM processor suitable for embedded systems. Its simple RISC architecture requires fewer transistors, decreasing the footprint, cost, and power consumption.

The ARM architecture has many features of typical RISC architectures, but it is not entirely RISC. When the first RISC processors emerged, their objective was to reduce instruction complexity and obtain higher operating frequency and performance through pipelining. Power consumption, size, and low production cost were not the main objectives, although the simplicity of RISC architectures contributes to some of them.

In RISC processors, the executable size is larger when

compared to CISC architecture due to the need of more

instructions to represent the same behavior. In the case of


embedded devices, where memory requirements are strict, some changes were made in the ARM to reduce the memory occupied by code. Sloss, Symes and Wright [28] describe the main differences between typical RISC processors and the ARM:

• Different execution cycles for some instructions: the number of registers involved and the type of memory access (sequential or random) affect the number of cycles each instruction takes;

• Shift preprocessing: before reaching the ALU, one operand can be modified using shift operations, expanding the operand before it is used and reducing code size;

• 16-bit Thumb instruction set: this increases code density by approximately 30% compared to fixed-length 32-bit instructions;

• Conditional execution: an instruction is executed only when a specific condition is satisfied, avoiding explicit branch instructions and improving performance and code density;

• Specific instructions: some instructions were added for digital signal processing, such as 16-bit integer multiplication with saturation;

• Multiple load/store: a single memory access instruction can operate on multiple registers.
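As an illustration of the conditional-execution point above, consider a simple branchy function in C; on ARM the compiler can turn the if/else into conditionally executed instructions rather than a branch. This is a sketch of the idea, not actual compiler output:

```c
/* A branchy max(). With ARM conditional execution, the compiler can
   emit a compare followed by conditionally executed moves, avoiding
   an explicit branch and the pipeline disruption it may cause. */
int max_int(int a, int b) {
    int r;
    if (a > b)
        r = a;   /* candidate for a conditionally executed move */
    else
        r = b;
    return r;
}
```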

1) Coprocessors

One feature that makes ARM processors suitable for embedded systems is the possibility of extending the instruction set using coprocessors. Coprocessors are special-purpose processors designed to extend the functionality of the processor or to improve performance for a given domain. The ARM core, for example, does not contain floating-point instructions, so these operations must be emulated in software. However, a floating-point coprocessor can be added to the architecture, improving the performance of applications that use floating-point operations. Thus, a device that does not require floating-point calculations can use an ARM processor without the floating-point unit, reducing the cost, size and power consumption of the processor, while a device that performs a large number of floating-point operations may include a floating-point coprocessor.

The ARM Cortex-A8 supports 16 coprocessors, CP0 to CP15. The instructions that access the coprocessors are part of the ARM instruction set; as a consequence, all features available in the coprocessors are accessed through assembly instructions executed by the ARM core.

As previously mentioned, the ARM architecture is licensed for manufacturing by third parties. The licensees then produce their own ARM processors, adding more or fewer coprocessors to the ARM core according to the application and the target market. By default, the compiler does not use the coprocessors when generating machine code, but it can be told which coprocessors are available.

2) ARM Cortex-A8

The ARM Cortex-A8 is based on the ARMv7 architecture, and its operating frequency ranges from 600 MHz to 1 GHz, depending on the processor model. A distinctive element of the Cortex architecture is the NEON coprocessor, which provides SIMD (single instruction, multiple data) instructions operating on 128-bit vectors and can be used to accelerate multimedia processing such as audio and video.

The ARM Cortex-A8 pipeline has three stages (see Figure 13):

• Fetch stage (3 cycles)

• Decode stage (4 cycles)

• Execution stage (6 cycles)

Fig. 13. ARM Cortex-A8 pipeline [29]

The execution stage of the pipeline has three functional units: two ALUs and one load/store (LS) unit. This reduces the occurrence of pipeline stalls and enables superscalar execution.

B. DSP TMS320C64X+

Analog signals represent physical quantities, such as pressure and temperature, that vary continuously over time. To process them computationally, a digital representation format is necessary; a digital signal, in this case, is nothing more than a sequence of discrete states that encodes a message. In general, digital signal processing functions are mathematical operations on real-time signals that are repetitive and numerically intensive [30].

Digital Signal Processors (DSPs) are specialized processors that handle signals of various types (data, video, audio, etc.) and offer advantages in terms of performance, cost and power consumption [31, 32]. In contrast to general-purpose processors (GPPs), they are developed to execute DSP algorithms efficiently [31]. Operations such as multiply-and-accumulate (MAC) and SIMD (Single Instruction Multiple Data) operations, which work in parallel on a set of data, are commonly supported by DSP processors.

Since DSP processors are designed to execute digital signal processing algorithms efficiently, their architecture differs from that of GPPs. Early DSP processors were very different from general-purpose processors, using fixed-point arithmetic instead of floating point and a Harvard architecture instead of the traditional Von Neumann architecture [32]. Such differences no longer exist in the latest generations of DSPs.

Some DSP processors now implement floating-point arithmetic, such as the TMS320C67x™ generation from Texas Instruments (TI). On the other hand, Harvard architectures, which have separate caches for data and instructions, are now used by many GPPs. However, most DSP processors still use fixed-point arithmetic, which reduces hardware complexity and energy consumption. In DSP algorithms this characteristic is not a


problem, because digital signals have a well-defined range and can be discretized with little or no loss of information.
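To make the fixed-point idea concrete, the sketch below uses the Q15 format common on fixed-point DSPs, in which a 16-bit integer n represents n/32768; the helper names are ours for illustration, not part of the C64x+ toolchain:

```c
#include <stdint.h>

/* Q15 fixed point: a 16-bit integer n represents n / 32768. Signals
   with a well-defined [-1, 1) range lose little or no information
   in this representation. */
int16_t q15_from_double(double x) {
    if (x >= 1.0)  return 32767;    /* saturate at the top of the range */
    if (x < -1.0)  return -32768;   /* saturate at the bottom */
    return (int16_t)(x * 32768.0);
}

double q15_to_double(int16_t n) {
    return n / 32768.0;
}

/* Q15 multiply: the 32-bit product has 30 fractional bits,
   so shift right by 15 to return to Q15. */
int16_t q15_mul(int16_t a, int16_t b) {
    return (int16_t)(((int32_t)a * b) >> 15);
}
```

For example, 0.5 maps to 16384, and multiplying 0.5 by 0.5 in Q15 yields 8192, i.e. 0.25.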

The TMS320C64x+ processor, or simply C64x+, is a high-performance fixed-point digital signal processor manufactured by Texas Instruments (TI). It belongs to the TMS320C6000 generation and has a VLIW architecture [33]. The specifications of the C64x+ are shown in Table II.

The C64x+ processor reaches a performance of up to 4160 million instructions per second (MIPS). The architecture is composed of 64 32-bit registers and eight general-purpose functional units. With maximum parallelization, each functional unit executes one of the eight instructions of a VLIW word every clock cycle.

Compared with a general-purpose processor that executes one instruction per cycle, this nominal figure suggests the DSP processor is eight times faster; in real applications, however, the gains are no more than fourfold.

Table III presents results comparing the performance of a DSP processor and the ARM9 RISC processor. The DSP is a C55x, which has features similar to the C64x+. The performance gain of the DSP is three times on average, with a minimum speedup of 1.1 times and a maximum of six times compared to the ARM9 [24].

The C64x+ processor has eight functional units (L1, L2, S1, S2, M1, M2, D1 and D2), divided into two datapaths of four units each (Figure 14). Each datapath has its own set of 32 registers, and it can also communicate with the other datapath at a penalty of a few cycles. The instruction format of the C64x+ is based on RISC and VLIW architectures: each instruction has a fixed 32-bit format, but the compiler can use compact 16-bit encodings where possible. In a VLIW architecture, the compiler is responsible for ordering and scheduling the instructions so as to maximize parallel execution. This differs from superscalar architectures, where the processor manages the order of the instructions and decides, based on data dependencies, when they can execute in parallel. The VLIW approach saves hardware and energy, because the work done by the hardware in superscalar architectures is done by the compiler in VLIW architectures. Another advantage of the VLIW architecture is that the optimization is performed only once, at compile time, and has no time constraints.

TABLE II
C64X+ SPECIFICATIONS

Clock frequency: 700 MHz
Instructions per second: up to 5600 million (MIPS)
Cache: L1: 256 Kb, L2: 640 Kb
Registers: 64 general purpose
Functional units: 2 multipliers, 6 ALUs
Instructions fetched per cycle: 8-14 instructions: 256 bytes
Operands fetched per cycle: 4-8 operands: 256 bytes
Arithmetic: fixed point

TABLE III
ARM9 VS C55 PERFORMANCE

Benchmark                                 | ARM9E¹ | GPP StrongARM 1100¹ | DSP TMS320C5510¹ | DSP/ARM²
Echo cancellation 16-bit (32 ms - 8 kHz)  | 24     | 39                  | 4                | 6x
Echo cancellation 32-bit (32 ms - 8 kHz)  | 37     | 41                  | 15               | 2.46x
MPEG4/H263 decoding QCIF @ 15 fps         | 33     | 34                  | 17               | 1.94x
MPEG4/H263 coding QCIF @ 15 fps           | 179    | 153                 | 41               | 3.73x
JPEG (QCIF) decoding                      | 2.1    | 2.06                | 1.2              | 1.71x
MP3 decoding                              | 19     | 20                  | 17               | 1.11x
Proportional average cycles to C5510™     | 3.1    | 3                   | 1                |

¹ Performance in MIPS (millions of instructions per second). ² DSP speedup compared to the ARM9 processor.


Fig. 14. C64x architecture overview [34]

The instruction format is shown in Figure 15. The p field bit determines whether an instruction can run in parallel with other instructions. The p bits are read from right to left: if the p bit of an instruction I is 1, then instruction I+1 can be executed in parallel with instruction I; if the p bit is zero, instruction I+1 must be executed after instruction I. Thus, up to eight instructions can be executed in parallel, each one using a different functional unit.

Fig. 15. C64x+ Instruction format [34]
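The p-bit rule above can be sketched in a few lines of C. The toy function below counts how many execute packets a fetch packet of eight instructions splits into; it illustrates the rule described in the text and is not TI's actual decoder:

```c
/* Groups a fetch packet of 8 instructions into execute packets using
   the p bits described in the text: p = 1 means the next instruction
   runs in parallel with this one; p = 0 ends the current execute
   packet. The last instruction of a fetch packet has p = 0. */
int count_execute_packets(const int pbits[8]) {
    int packets = 0;
    for (int i = 0; i < 8; i++) {
        if (pbits[i] == 0)  /* packet boundary after instruction i */
            packets++;
    }
    return packets;
}
```

With all p bits zero the packet is fully serial (eight execute packets); with the first seven bits set it is fully parallel (one execute packet).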

The C64x+ DSP pipeline is divided into three stages [34]:

• Fetch (4 cycles)

• Decode (2 cycles)

• Execution (5 cycles)

In the fetch stage, a packet of eight instructions is fetched from memory. The fetch stage has four phases for all instructions: PG (program address generate), PS (program address send), PW (program access ready wait) and PR (program fetch packet receive). During PG, the program address is generated. In the PS phase, the program address is sent to memory. In the PW phase, the memory read occurs. Finally, in the PR phase, the fetched packet is received by the CPU. In the decode stage, an instruction packet is divided into execute packets, formed from one to eight instructions that can be executed in parallel. The decode stage has two phases: DP (instruction dispatch) and DC (instruction decode). During DP, the instructions are assigned to functional units, and in DC the registers are decoded for execution inside the functional units.

The execution stage is divided into five phases (E1-E5). Different types of instructions require different numbers of cycles to execute: a 16-bit multiplication requires two cycles, a store instruction requires three cycles, and a load instruction requires five cycles. Instructions may take more cycles due to stalls that occur during execution; for instance, if an operand is not in the cache, it must be fetched from main memory, consuming additional cycles. Figure 16 shows a full C64x+ DSP pipeline without stalls.

Fig. 16. C64x pipeline

The C64x+ also supports SIMD instructions, which operate on packed 8- and 16-bit data. Table IV presents some SIMD instructions supported by the C64x+ processor. Figure 17 shows the result of executing the SADDU4 instruction, which adds four 8-bit integers with saturation. The saturation is performed directly in hardware, whereas a software implementation needs at least two instructions: a compare and an assignment.
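The difference is easy to see in portable C. The function below emulates what SADDU4 does on four packed byte lanes, with the compare-and-assign pair spelled out; it is plain C for illustration, not the TI intrinsic:

```c
#include <stdint.h>

/* Portable sketch of the SADDU4 semantics: four saturated 8-bit
   unsigned additions packed into one 32-bit word. This is not the
   TI intrinsic, just an illustration of the operation. */
static uint8_t sat_add_u8(uint8_t a, uint8_t b) {
    unsigned sum = (unsigned)a + b;         /* compare ...        */
    return sum > 255 ? 255 : (uint8_t)sum;  /* ... and assignment */
}

uint32_t saddu4_emulated(uint32_t src1, uint32_t src2) {
    uint32_t result = 0;
    for (int i = 0; i < 4; i++) {           /* one byte lane at a time */
        uint8_t a = (src1 >> (8 * i)) & 0xFF;
        uint8_t b = (src2 >> (8 * i)) & 0xFF;
        result |= (uint32_t)sat_add_u8(a, b) << (8 * i);
    }
    return result;
}
```

The hardware performs all four lanes, including the saturation, in a single instruction, while this software version needs a compare and an assignment per lane.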

TABLE IV
C64X+ SIMD INSTRUCTIONS

uint _saddu4(int src1, int src2);    Performs saturated addition between the 8-bit unsigned values in src1 and src2
double _mpy2(int src1, int src2);    Returns the products of the low and high 16-bit values in src1 and src2
int _subabs4(int src1, int src2);    Calculates the absolute value of the subtraction between src1 and src2, for each 8-bit lane
uint _avgu4(uint src1, uint src2);   Calculates the average of each pair of 8-bit values
uint _swap4(uint src);               Exchanges pairs of bytes within each 16-bit value

Fig. 17. C64x+ SIMD instructions and saturation example

C. DSP Programming

In order to obtain the best performance from a DSP processor, the application must be developed considering the processor architecture. The performance improvement can be reached using SIMD instructions and VLIW execution, which run multiple instructions in parallel on the multiple functional units available in the architecture. There are basically three steps in developing code for a DSP processor.

The first step is to write standard C code and identify the time-consuming parts; in DSP applications, these are usually loops.

In the second phase, the designer tries to optimize the code in the time-consuming parts, passing as much information as possible to the compiler: for instance, the minimum and maximum number of iterations of a loop, whether the data is aligned, and whether a memory location can be reached through different pointers. With this information the compiler can, for example, unroll a loop, pack four operations into a single SIMD instruction, or determine which instructions can be executed in parallel. The programmer can also use intrinsic functions, which are mapped directly to assembly instructions. An example is _sadd4, which adds 4-byte packed integers with saturation and is translated into the assembly instruction SADD4. One can also optimize transfers between memory/cache and CPU using intrinsics that read or write up to 8 bytes in a single access.

The last step consists of writing assembly code for the main functions if the desired performance has not been reached. When coding in assembly, it is possible to decide on which functional unit a given instruction will execute. This step relies heavily on the developer's knowledge of the architecture.

In the C64x+ DSP, a compiler flag activates feedback reporting the resources used, the pipeline depth and the number of execution cycles. The optimization is guided by the loop bounds, so that parallel execution can use the resources of the architecture efficiently.

To examine how DSP performance can be improved, we will analyze three versions of an 8-bit integer vector addition [35]. The first function, presented in Algorithm 1, is standard C vector addition; it can be compiled by any C compiler, and no extra information is passed to the compiler.

void sum(unsigned char *a, unsigned char *b,
         unsigned char *res, int n)
{
    int i;
    for (i = 0; i < n; i++) {
        res[i] = a[i] + b[i];
    }
}

Algorithm 1- Standard C vector addition

In Algorithm 2, some information is passed to the compiler. The keyword const indicates that the operand vectors a and b will not be changed inside the function. The keyword restrict indicates that no other pointer accesses the same location. The pragma MUST_ITERATE tells the compiler the minimum and maximum number of loop iterations; the last parameter is the multiple, indicating that the loop trip count is always a multiple of that value. Using this extra information, the compiler can safely unroll the loop.

Algorithm 3 uses intrinsic instructions that perform 8 sum operations in each loop iteration. In this case, 8 bytes of data from each operand vector are read from memory into registers. The _saddu4 intrinsic adds 4 bytes at a time, and the result is stored back to memory.

void sum_info(const unsigned char * restrict a,
              const unsigned char * restrict b,
              unsigned char * restrict res,
              const int n)
{
    int i;
    #pragma MUST_ITERATE(512, 1024, 8)
    for (i = 0; i < n; i++) {
        res[i] = a[i] + b[i];
    }
}

Algorithm 2- Sum with annotations

void sum_intrinsic(const unsigned char * restrict a,
                   const unsigned char * restrict b,
                   unsigned char * restrict c,
                   const int n)
{
    int i;
    #pragma MUST_ITERATE(512/8, 1024/8, 8)
    for (i = 0; i < n; i += 8) {
        unsigned int a1_a0, a3_a2;
        unsigned int b1_b0, b3_b2;
        unsigned int c1_c0, c3_c2;
        a3_a2 = _hi(_amemd8_const(&a[i]));  /* higher 4 of 8 bytes */
        a1_a0 = _lo(_amemd8_const(&a[i]));  /* lower 4 of 8 bytes */
        b3_b2 = _hi(_amemd8_const(&b[i]));
        b1_b0 = _lo(_amemd8_const(&b[i]));
        /* 4-byte saturated sum intrinsic */
        c3_c2 = _saddu4(a3_a2, b3_b2);
        c1_c0 = _saddu4(a1_a0, b1_b0);
        /* packs the two ints into a double and stores it in memory */
        _amemd8(&c[i]) = _itod(c3_c2, c1_c0);
    }
}

Algorithm 3- Vector addition using intrinsic

The compiler feedback for the three algorithms is presented in Table V. The values represent the maximum achievable performance under ideal conditions; for instance, the execution cycles are calculated assuming the data is always present in the cache. The sum_intrinsic function was about 7.5 times faster than sum_info, and sum_info, compared to the standard Algorithm 1, was about 2 times faster.

Table V also presents the following results:

• Loop unrolling: indicates how many times the original loop was unrolled;

• Minimum and maximum number of iterations: indicates the number of loop executions;

• Total cycles: number of cycles as a function of the iteration count.

TABLE V
COMPILER FEEDBACK FOR THE SUM ALGORITHM [35]

                  Sum              Sum_info         Sum_intrinsic
Loop unrolling    No               2x               2x
Min iterations    Unknown          256              32
Max iterations    Unknown          512              64
Total cycles      8+iterations*2   8+iterations*3   6+iterations*3
Cycles n=512      1030             777              192
Cycles n=1024     2054             1545             198


D. DSPBridge - ARM-DSP communication

As described in the previous sections, the ARM and DSP processors have different roles in the OMAP 3530 platform. While the former is a general-purpose processor that runs the operating system, the latter is dedicated to real-time digital signal processing. The ARM is considered the main processor, or host, while the DSP can be considered a coprocessor. The ARM and the DSP each run a specific operating system for resource management, and the connection between the two OSs is made by the DSP Bridge [36].

The DSPBridge [37] driver provides communication and control functions for the DSP processor. On the ARM side, the DSPBridge API is used for the following tasks:

• starting tasks on the DSP processor;

• sending and receiving messages to/from the DSP;

• creating and using streams for data transfer to/from the DSP;

• mapping dynamic memory into the DSP address space;

• stopping, restarting and deleting DSP tasks.

On the DSP side, the API enables message exchange between the DSP and ARM processors, as well as the use of streams. DSP applications are abstracted as execution nodes. The nodes can be loaded at boot time or at run time using the DSPBridge API.

The DSP node states are presented in Figure 18. A DSP node starts its life cycle when an ARM task calls the DSPNode_Allocate function, which allocates the data structures needed to control and communicate with the node; in this state the node is allocated only on the ARM side. After this, the DSPNode_Create function creates the node on the DSP side. When the ARM processor executes DSPNode_Run, the node starts executing on the DSP. It is possible to suspend a node using the DSPNode_Pause function, and to resume execution by calling DSPNode_Run again.

Fig. 18. Life cycle of a DSP node[38]

The node changes to the Done state after processing finishes or if the ARM calls the DSPNode_Terminate function. To finish the node life cycle, the DSPNode_Delete function releases all resources on both the DSP and ARM sides.
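The life cycle above can be modeled as a small state machine. The sketch below is plain C for illustration: the state and call names mirror the text, and are not the real DSPBridge types or API.

```c
/* Toy model of the DSP node life cycle described in the text.
   These enums and the transition function are illustrative only;
   they are not part of the real DSPBridge API. */
typedef enum { UNUSED, ALLOCATED, CREATED, RUNNING,
               PAUSED, DONE, DELETED } NodeState;
typedef enum { ALLOCATE, CREATE, RUN, PAUSE,
               TERMINATE, DELETE } NodeCall;

NodeState node_transition(NodeState s, NodeCall call) {
    switch (call) {
    case ALLOCATE:  return (s == UNUSED)    ? ALLOCATED : s;
    case CREATE:    return (s == ALLOCATED) ? CREATED   : s;
    case RUN:       return (s == CREATED || s == PAUSED) ? RUNNING : s;
    case PAUSE:     return (s == RUNNING)   ? PAUSED    : s;
    case TERMINATE: return (s == RUNNING || s == PAUSED) ? DONE : s;
    case DELETE:    return DELETED; /* releases resources on both sides */
    }
    return s;
}
```

The model captures the ordering constraints of the text: a node must be allocated before it is created, created before it runs, and Run resumes a paused node.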

E. BeagleBoard development kit

The BeagleBoard is a development kit based on the OMAP3530 platform. It measures approximately 3" x 3", allowing the development of small prototypes. Another feature of the board is its low power consumption, around 2 W; the actual consumption depends on the number of peripheral devices attached to the USB port.

The BeagleBoard supports multiple operating systems and Linux distributions, such as Angstrom, Debian, Ubuntu and Android, adapting to different product requirements. Most of these distributions provide application and library repositories in binary format, including graphics processing, window managers and compilers. This eases the development of new applications and also allows the porting of applications already developed for other architectures. The file system containing the operating system and applications is stored on the SD card. Figure 19 summarizes the resources of the BeagleBoard revision C4.

Application development for the BeagleBoard can be done by compiling and testing applications directly on the board. However, due to its restricted resources, another solution is to use a cross-compiler targeting the ARM processor, allowing development on a desktop.

The development of DSP modules should be performed on a desktop, since it requires the C6000 compiler from Texas Instruments. This compiler is free and available for the Linux and Windows operating systems. After compiling the code, it must be transferred to the SD card.

Processors: ARM Cortex-A8 @ 720 MHz (RISC); TMS320C64x+ @ 520 MHz (DSP)
Memory: 256 MB DDR RAM; 256 MB NAND flash
Peripherals and connections: HDMI, S-Video, USB, stereo audio I/O, RS232, JTAG connector
Storage: SD card slot

Fig. 19. BeagleBoard C4


VI. CONCLUSIONS

This paper presented an introduction to embedded systems, their requirements and their main characteristics. We also described the steps involved in embedded system design and some methodologies currently applied.

Embedded applications have heterogeneous characteristics. To cope with the different requirements, chip manufacturers have provided heterogeneous multiprocessor platforms. This paper described the OMAP 3530 platform, detailing the key aspects of its ARM and DSP processors and showing the main differences between developing such systems and implementing desktop applications.

REFERENCES

[1] P. MARWEDEL, “Embedded System Design”. Netherlands: Springer, 2006.

[2] G. MARTIN and H. CHANG, "System-on-Chip design," in Proc. 4th

International Conference on ASIC, pp.12-17, 2001

[3] T. GIVARGIS and F. VAHID, “PLATUNE: A Tuning Framework for

System-on-a-Chip Platforms,” IEEE Transactions on Computer – Aided

Design of Integrated Circuits and Systems, Vol. 21, No. 11, 2002, p.

1317-1327.

[4] T. SIMUNIC, L. BENINI, A. ACQUAVIVA, P. GLYNN and G. DE

MICHELI, “Dynamic Voltage Scaling and Power Management for

Portable Systems”. in Proc. Design Automation Conference. June,

2001, Las Vegas.

[5] Intel® XScale™ Microarchitecture Technical Summary, Intel

Corporation, 2000.

[6] L. BENINI, A. BOGLIOLO and G. DE MICHELI. “A Survey of Design

Techniques for System-Level Dynamic Power Management”. IEEE

Transactions on Very Large Scale Integration (VLSI) Systems, Boston,

v.8, n. 3, p. 299-316, June 2000.

[7] S. EDWARDS, L. LAVAGNO, E. A. LEE and A. SANGIOVANNI-

VICENTELLI. “Embedded Systems Design: Formal Models,

Validation, and Synthesis”. Proceedings of the IEEE, New York, v. 85,

n. 3, p. 366-390, Mar. 1997.

[8] SystemC 2.0.1 Language Reference Manual, Open SystemC Initiative,

San Jose, CA, 2003.

[9] SpecC Language Reference Manual, Copyright © R. Domer, A.

Gerstlauer, D. Gajski, 2002.

[10] F. WAGNER and L. CARRO, “Sistemas Computacionais Embarcados,”

In XXII Jornadas de Atualização em Informática. Campinas:

UNICAMP, 2003, v. 1, p. 45-94.

[11] P. MAGARSHACK and P. PAULIN. System-on-Chip Beyond the

Nanometer Wall. In: Proc. DESIGN AUTOMATION CONFERENCE,

DAC, 40., 2003, Anaheim, USA. New York: ACM Press, 2003. p. 419-

424.

[12] K. KEUTZER, S. MALIK, R. NEWTON, J. RABAEY and A.

SANGIOVANNI-VICENTELLI. “System Level Design:

Orthogonalization of Concerns and Platform-Based Design,” IEEE

Transactions on Computer-Aided Design of Circuits and Systems, New

York, v. 19, n. 12, p. 1523-1543, Dec. 2000.

[13] F. R. WAGNER, W. CESARIO, L. CARRO and A.A. JERRAYA.

“Strategies for the Integration of Hardware and Software IP

Components,” Embedded Systems-on-Chip. Integration - the VLSI

Journal, Amsterdam, v. 37, n. 4, p. 223-252, Sept. 2004.

[14] AMBA™ Specification (Rev 2.0), ARM Ltd., 1999.

[15] The CoreConnect™ Bus Architecture, IBM Microeletronics, 2006.

[16] OSEK/VDX Operating System Specification 2.2.3, Continental GmbH,

2005.

[17] µITRON 4.0 Specification, TRON Association, 2002.

[18] Embedded processors market, IDC. Available: <http://www.idc.com >.

Visited on: June 2007.

[19] Xtensa® Instruction Set Architecture (ISA) Reference Manual For All

Xtensa Processor Cores, Tensilica Inc. Santa Clara, CA, 2010.

[20] C. ZHANG, F. VAHID and R. L. LYSECKY, “A Self-Tuning Cache

Architecture for Embedded Systems,” in: Proc. DESIGN

AUTOMATION AND TEST IN EUROPE, DATE, 2004, Paris, France

Los Alamitos: IEEE Computer Society Press, 2004. p. 142-147.

[21] J. HENNESSY and D. PATTERSON. Computer Architecture: A

Quantitative Approach. 3th ed. San Francisco: Morgan Kauffman, 2002.

[22] P. JAIN, S. DEVADAS, D. ENGELS and L. RUDOLPH, “Software-

assisted cache replacement mechanisms for embedded systems,” in:

Proc. INTERNATIONAL CONFERENCE ON COMPUTER AIDED

DESIGN, ICCAD, 2001, San Jose, USA. New York: ACM Press, 2001.

p. 119-126.

[23] NovaThor Platform- U9500. Available:

http://www.stericsson.com/products/u9500-novathor.jsp

[24] OMAP™ Technology Overview White paper, Texas Instruments, Inc.

Dallas, TX, 2000.

[25] K. GOOSSENS, et al. “Service-Based Design of Systems on Chip and

Networks on Chip”. In: VAN DER STOK, P. (Ed.), Dynamic and

Robust Streaming in and Between Connected Consumer-Electronics

Devices. [S.l.]: Springer, 2005. p. 37-60.

[26] OMAP 3530/25 Applications Processor White paper, Texas

Instruments, Inc. Dallas, TX, 2009.

[27] ARM Holdings Company Profile. Available:

http://www.arm.com/about/company-profile/index.php

[28] A. SLOSS, D. SYMES and C. WRIGHT, “ARM System Developer’s

Guide: Designing and Optimizing System Software”. San Francisco,

CA, USA: Morgan Kaufmann Publishers Inc. 2004. ISBN 1558608745.

[29] Architecture and Implementation of the ARM® Cortex™-A8 Microprocessor White paper, ARM Ltd. Cambridge, UK, 2005.

[30] E. J. TAN and W. B. HEINZELMAN, “DSP architectures: past, present

and futures”. SIGARCH Comput. Archit. News, ACM, New York, NY,

USA, v. 31, p. 6–19, June 2003. ISSN 0163-5964. Available:

http://doi.acm.org/10.1145/882105.882108

[31] J. EYRE and J. BIER, “The evolution of DSP processors”. Signal Processing Magazine, IEEE, v. 17, n. 2, p. 43–51, Mar. 2000. ISSN 1053-5888.

[32] Y. MOSHE and N. PELEG, “Implementations of h.264/avc baseline

decoder on different digital signal processors”. In: ELMAR, 2005. 47th

International Symposium. [S.l.: s.n.], 2005. p.37 – 40.

[33] Fixed-Point Digital Signal Processor Texas Instruments, Inc. Dallas,

TX, 2009.

[34] TMS320C64x/C64x+ DSP: CPU and Instruction Set – Reference

Guide, Texas Instruments, Inc. Dallas, TX, 2009.

[35] D. R. HACHMANN, “Distribuição de tarefas em MPSoC Heterogêneo:

estudo de caso no OMAP3530”. Trabalho de Conclusão de Curso:

Ciência da Computação, UNIOESTE, 2011.

[36] Developing Core Software Technologies for TI’s OMAP Platform,

Texas Instruments, Inc. Dallas, TX, 2002.

[37] DSP/BIOS™ Bridge Integration Document. Texas Instruments, Inc.

Dallas, TX, 2006.


Introduction to Embedded Systems Using FPGAs

Edilson Reis Rodrigues Kato

Universidade Federal de São Carlos

São Carlos, Brazil

e-mail: [email protected]

Emerson Carlos Pedrino

Universidade Federal de São Carlos

São Carlos, Brazil

e-mail: [email protected]

Abstract— The course “Introdução aos Sistemas Embarcados utilizando FPGAs” (Introduction to Embedded Systems Using FPGAs), promoted by the CES-School (Critical Embedded Systems School) together with the 2nd CBSEC (Brazilian Conference on Critical Embedded Systems), aims to give the student an introductory view of FPGAs (hardware), of how their circuits can be programmed (schematics and hardware description languages), of which circuits can be explored, and of the various ways of embedding a computer system, helping the user weigh the best way to embed a system according to the project to be specified. In the laboratory, the Altera DE1 board will be used, allowing the user to implement a simple computer system using the SOPC Builder with the aid of the NIOS II, a 32-bit softcore processor embedded in the FPGA.

Keywords- FPGA, embedded computer system, hardware

I. INTRODUCTION

An FPGA (Field-Programmable Gate Array) is a reprogrammable, or reconfigurable, integrated circuit composed of many basic logic circuit components, in addition to other more complex circuit blocks such as DSPs (Digital Signal Processors), memories, PLLs (Phase-Locked Loops), etc. These circuits can be seen as standard components, which can be configured and connected independently through a matrix of user-programmable routing tracks and connections [1,2].

FPGA programming uses a set of software tools associated with the design flow, giving the developer a level of abstraction that allows concentrating on the algorithm to be implemented. Programming is done through hardware description languages (HDLs), such as Verilog-HDL and VHDL, or other system modeling languages, and the FPGA circuit design cycle covers specification, implementation and verification [3,4]. A dedicated system can then be embedded in an FPGA in such a way that the project's resources are optimized and its implementation is highly flexible.

Dedicated computer systems can be implemented in FPGAs in several ways: they can be designed conventionally, using digital logic elements in schematic form and establishing the processor's bit width and the peripherals needed by the project; they can start from an existing basic processor to which any element or peripheral is added; or they can use a ready-made basic processor (core).

This course aims to give the student an introductory view of FPGAs (hardware), of how their circuits can be programmed (schematics and hardware description languages), of which circuits can be explored, and of the various ways of embedding a computer system, helping the user weigh the best way to embed the system according to the project to be specified.

In the laboratory, the Altera DE1 board will be used, allowing the user to implement a simple computer system using the SOPC Builder with the aid of the NIOS II (Altera), a 32-bit softcore processor embedded in the FPGA. In-class exercises will also be proposed, which may extend the initial system for better student learning [5,6].

A. Topics covered in the course:

Introduction to FPGAs

o Hardware

o Software

Systems embeddable in FPGAs

o DSPs

o Microcontrollers

o Microcomputers

o NIOS II

o Dedicated systems

Example implementation (laboratory)

o Altera DE1 board

o NIOS II

o Assembler

o Implementation of a simple computer on an FPGA

Exercises

The student is thus expected to be able to design a computer system embedded in an FPGA and configure it according to their needs.


II. BASIC CONCEPTS

The technology of PLDs (Programmable Logic Devices), such as FPGAs and CPLDs, among other devices, is extremely powerful for digital system design today. Such a device can be defined basically as an integrated circuit (an array of logic gates) used to implement digital circuits, which can be configured and reconfigured by the end user through specific software supplied by its manufacturer.

Current devices can handle any computational task, and some already include at least one embedded CPU. Programming techniques for these devices range from HDLs to high-level languages such as Handel-C and Streams-C. Some devices also already support dynamic reconfiguration. Example applications include digital image processing, pattern recognition, cryptography, classroom experiments, and so on.
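As a minimal illustration of HDL-based design (a generic example, not taken from the course material), a 2-to-1 multiplexer can be described in a few lines of Verilog:

```verilog
// 2-to-1 multiplexer: out follows a when sel = 0 and b when sel = 1.
module mux2to1 (
    input  wire a,
    input  wire b,
    input  wire sel,
    output wire out
);
    assign out = sel ? b : a;
endmodule
```

The synthesis tool maps such a description onto the device's LUTs or logic elements automatically, which is the level of abstraction discussed above.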

The PLD programming technology dictates whether the chip's interconnections are made by transistors driven by SRAM cells, by EEPROM transistors, by fuses, by multiplexers, etc. Depending on the target application, the device may be coarse-grained, containing LUTs (lookup tables), or fine-grained, containing, for example, more combinational logic elements.

All of the interconnections discussed introduce delays when compared with the simple metal contact used in the interconnections of an MPGA (Mask Programmable Gate Array), for example. Also, delays in CPLDs are more predictable than in FPGAs (which use segmented interconnections). Regarding block sizes, a larger block implies more waste when implementing simpler functions.

Commercial FPGA categories, regardless of manufacturer, are basically divided into: symmetric array, row-based, hierarchical PLD, and sea of gates. Figure 1 shows an example of these categories.

Fig. 1. Commercial FPGA categories.

Figure 2 illustrates the PLD configuration process. The figure shows every step of the design flow, from design entry through schematic diagrams or hardware description languages, through netlist generation, mapping, placement, and routing of the chip, up to the generation of the final binary configuration file for the device [7].

Fig. 2. PLD design process.

III. PRACTICAL IMPLEMENTATION

The first part of the practical implementation consists of implementing a processor with a few peripherals embedded in the FPGA (the initial system contains only a few components): a processor, memory, and a few simple input/output peripherals.

Next, other features will be implemented and tested in the resulting computer system, exploiting the resources available on the DE1 board. The practical part of the course is based on the tutorial "Introduction to the Altera SOPC Builder" obtained from Altera's website [1].

The computer system will be implemented with the SOPC Builder together with Quartus® II and the NIOS II processor embedded in the FPGA. The user must therefore have Quartus II 10.1, NIOS II 10.1, and the Altera Monitor Program installed, and must have the DE1 development board [8-13].

A. First Part – A Simple Dedicated Computer System

To implement a simple dedicated computer system, the user can rely on the NIOS II processor embedded in the FPGA. The desired peripherals are connected to it to establish its functionality. Figure 3 illustrates the system to be implemented.

The Altera Nios II is a 32-bit processor that can be instantiated on an Altera FPGA chip. Three versions of the Nios II processor are available: economy (/e), standard (/s), and fast (/f). The computer system to be implemented uses the Nios II /e version. An easy way to start working with the computer system and the Nios II processor is to use a utility called the Altera Monitor Program. This utility provides an easy way to assemble and compile Nios II programs for the computer system, written either in assembly language or in the C programming language. The Monitor Program, which can be downloaded from Altera's website, is an application that runs on a host computer connected to the DE1 board. It can be used to control code execution on the Nios II, to list (and edit) the contents of the processor registers, to edit the contents of memory on the DE1 board, and to perform similar operations.

In the example system of Figure 3, we will connect eight switches, SW7-0, to turn the eight green LEDs, LEDG7-0, of the DE1 board on or off. The switches are connected to the Nios II through a parallel I/O interface configured to act as an input port. The LEDs are driven by the signals of another parallel I/O interface configured to act as an output port. To perform the desired operation, the eight-bit pattern corresponding to the state of the switches has to be sent to the output port to activate the LEDs. This is done by a program stored in the on-chip memory, with the Nios II responsible for executing it.

The SOPC Builder will be used to design the hardware described in Figure 3. Next, the pins of the Cyclone II FPGA are assigned to make the connections between the parallel interfaces and the switches and LEDs that serve as I/O devices. The designed system is compiled and downloaded to the development board and, finally, the software tool called the Altera Monitor Program is used to compile, link, download, and run the program on the Nios II hardware to perform the desired task.

The steps of this implementation can be summarized as follows:

• Use the SOPC Builder to design a Nios II-based system

• Integrate the designed Nios II system into a Quartus II project

• Implement the designed system on the DE1 board

• Run an application program on the Nios II processor

A.1 - Altera’s SOPC Builder

Figure 3 – Simple computer system to be implemented.

The SOPC Builder is the tool used together with the Quartus II CAD software. It allows the user to easily create a Nios II-based system by simply selecting the desired functional units and specifying their parameters. To implement the system of Figure 3, we have to instantiate the following functional units:

• The Nios II, referred to as the Central Processing Unit (CPU).

• On-chip memory, consisting of memory blocks in the Cyclone II chip used on the DE1 board; we will specify 4 Kbytes of memory arranged in 32-bit words.

• Two parallel I/O interfaces.

• A JTAG UART interface for communication with the host computer.

To define the desired system, start the Quartus II software and perform the following steps:

1. Create a new Quartus II project for your system, named pratica1, in a directory called curso_FPGA. Choose the DE1 board's FPGA, namely the Cyclone II EP2C20F484C7.

2. Select Tools > SOPC Builder. Enter nios_system as the system name; this will be the name of the system that the SOPC Builder generates. Choose Verilog. Click OK to reach the window in Figure 4.

3. Figure 4 shows the system assembly tab of the SOPC Builder, which is used to add components to the system and to configure the selected components to meet the design requirements. The available components are listed on the left side of the window.

4. The Nios II processor runs under the control of a clock. For this course we will use the 50 MHz clock supplied on the DE1 board. As shown in Figure 4, the names and frequencies of clock signals can be specified in the SOPC Builder display. If it is not already included in this list, specify a clock named clk_0 with an external source frequency of 50.0 MHz.

5. Next, specify the processor as follows: on the left side of the window in Figure 6, expand Processors, select Nios II Processor, and click Add, which leads to the window in Figure 5. Choose Nios II /e, which is the simplest version of the processor. Click Finish to return to the window of Figure 4, which now shows the specified Nios II processor, as indicated in Figure 6. Some warning or error messages may be displayed in the SOPC Builder window (Messages pane at the bottom of the screen) because some parameters have not yet been specified. Ignore these messages; we will provide the needed data later.

6. To specify the on-chip memory, do the following:

• Select Memories and Memory Controllers > On-Chip > On-Chip Memory (RAM or ROM) and click Add.

• In the On-Chip Memory window, shown in Figure 7, set the Data width to 32 bits and the total memory size to 4 Kbytes (4096 bytes) (press Enter).

• Do not change the other default settings; click Finish.

Figure 4 – System assembly interface in the SOPC Builder.


Figure 5 – Nios II processor configuration.

Figure 6 – Processor definition for the DE1 board.


Figure 7 – On-chip memory definition.

7. Specify the parallel input I/O interface as follows:

• Select Peripherals > Microcontroller Peripherals > PIO (Parallel I/O) and click Add to open the PIO configuration window of Figure 8.

• Specify the port width as 8 bits and choose the port direction as input, as shown in Figure 8. Click Finish.

8. Similarly, specify the parallel output I/O interface:

• Select Peripherals > Microcontroller Peripherals > PIO (Parallel I/O) and click Add to open the PIO configuration window again.

• Specify the port width as 8 bits and choose the port direction as output.

• Click Finish to return.

Figure 8 – Definition of the parallel input interface.

9. We want to connect to a host computer and provide a means of communication between the Nios II system and the host. For this we must instantiate the JTAG UART interface as follows:

• Select Interface Protocols > Serial > JTAG UART and click Add to open the JTAG UART configuration window of Figure 9.

• Do not change the default settings.

• Click Finish to return.


Figure 9 – Definition of the JTAG UART interface.

10. The complete system is shown in Figure 10. Note that the SOPC Builder automatically chooses names for the various components. The names are not necessarily descriptive enough to be easily associated with the design, but they can be changed. In Figure 3, we used the names Switches and LEDs for the parallel input and output interfaces, respectively. These names can be used in the implemented system. Right-click on the name pio_0 and select Rename. Change the name to Switches. Similarly, change pio_1 to LEDs.

11. The base and end addresses of the various components of the designed system can be assigned by the user, but they can also be assigned automatically by the SOPC Builder. We will choose the latter option. So, using the menus at the top of the SOPC Builder window, select the command System > Auto-Assign Base Addresses, so that the SOPC Builder sets the addresses as in Figure 10.

12. The behavior of the Nios II processor when it is reset is defined by its reset vector: the location in the memory device from which the processor fetches the next instruction when it is reset. Similarly, the exception vector is the memory address to which the processor goes when an interrupt is raised. To specify these two parameters, do the following:

• Right-click on the cpu_0 item and select Edit (Figure 10).

• Select onchip_memory2_0 as the memory device for both the reset and exception vectors, as shown in Figure 11.

Figure 10 – The final specification for the DE1 board.


• Do not change the offset settings.

• Click Finish to return to the System Contents tab.

13. Having specified all the components needed to implement the desired system, the computer system can now be generated. Select the System Generation tab, which leads to the window in Figure 12.

Figure 11 – Definition of the reset and exception vectors.

Figure 12 – System generation.


Turn off Simulation - Create Project simulator files. Click Generate at the bottom of the SOPC Builder window; at this point the project can be saved under the name pratica1. The generation process produces the messages displayed in Figure 12. When the message "SUCCESS: SYSTEM GENERATION COMPLETED" appears, click Exit to return to the main Quartus II window.

Changes to the designed system can easily be made at any time by reopening the SOPC Builder: any component on the System Contents tab of the SOPC Builder can be selected and deleted, a new component can be added, and the system generated again.

A.2 - Integrating the Nios II System into a Quartus II Project

To complete the hardware design, we have to do the following:

• Instantiate the module generated by the SOPC Builder in the Quartus II project.

• Assign the FPGA pins.

• Compile the designed circuit.

• Program and configure the Cyclone II device on the DE1 board.

A.2.1 - Instantiating the Module Generated by the SOPC Builder

All we need to do is instantiate the Nios II system and connect the inputs and outputs of the parallel I/O ports, as well as the clock and reset inputs, to the appropriate pins on the Cyclone II device. The Verilog module generated by the SOPC Builder is in the file nios_system.v in the project directory. Note that the name of the Verilog module is the same as the system name specified in the SOPC Builder.

Figure 13 shows the top-level Verilog module that instantiates the Nios II system. This module is called pratica1, because that is the name specified for the Quartus II project. Note that the module's input and output ports use specific names, such as the pushbuttons KEY, the switches SW, the green LEDs LEDG, and the 50 MHz clock CLOCK_50. This is because we will use a pin-assignment file in which the names of the input and output pins are already specified for the Altera DE1 board. Type the code into an editor as in Figure 13 and save it as pratica1.v. Add this file and all the *.v files produced by the SOPC Builder to the Quartus II project (Project > Add/Remove Files in Project).
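Since the figure itself is not reproduced here, the top-level module it describes can be sketched roughly as follows. This is a reconstruction under assumptions: the port names of the generated nios_system module depend on the SOPC Builder output and must match your generated file, so treat this as a sketch rather than the literal figure contents.

```verilog
// Hypothetical reconstruction of the pratica1 top-level module (Figure 13).
// The DE1 pin names (CLOCK_50, KEY, SW, LEDG) come from the board's
// pin-assignment file; the nios_system port names are assumed.
module pratica1 (
    input  wire        CLOCK_50,   // 50 MHz board clock
    input  wire [3:0]  KEY,        // pushbuttons; KEY[0] assumed as reset
    input  wire [7:0]  SW,         // slide switches (input PIO)
    output wire [7:0]  LEDG        // green LEDs (output PIO)
);
    // Instantiate the system generated by the SOPC Builder.
    nios_system NiosII (
        .clk_0                   (CLOCK_50),
        .reset_n                 (KEY[0]),
        .in_port_to_the_Switches (SW),
        .out_port_from_the_LEDs  (LEDG)
    );
endmodule
```

If your generated nios_system.v uses different port names, adjust the port connections accordingly.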

To set up the pins, we must import the pin map from a file (Assignments > Import Assignments...). The Altera University Program provides a file for each board; to simplify the design, look inside the University Program directories, for your Quartus II version and device type, for the file DE1_pin_assignments.qsf.

Figure 13 – Instantiating the NIOS II system in Quartus II.

The SOPC Builder provides an example instantiation file, in our case nios_system_inst.v. It must be deleted or commented out entirely so that the code can be compiled with the newly created file pratica1.v.

After making the necessary settings, compile the code. You may see some warning messages associated with the Nios II system, such as unused signals or vector-length mismatches; these warnings can be ignored.

A.2.2 - Programming and Configuration

Program and configure the Cyclone II FPGA in JTAG programming mode as follows:

1. Connect the DE1 board to the host computer through a USB cable plugged into the USB-Blaster connector. Turn on the power to the DE1 board. Check that the RUN/PROG switch is in the RUN position. If necessary, install the USB-Blaster driver from the directory Altera > 10.1 > Quartus > Drivers > USB_Blaster.

2. Select Tools > Programmer to open the window of Figure 14.

3. If it is not already chosen by default, select JTAG in the Mode box. Also, if the USB-Blaster is not chosen by default, press the Hardware Setup... button and select the USB-Blaster in the window that appears.

4. The configuration file pratica1.sof should be listed in the window. If the file is not listed, click Add and select it.

5. Check the Program/Configure box to set the action.


6. At this point the window looks like the one in Figure 14. Press Start to configure the FPGA.

A.3 - Running the Application Program

Having configured the required hardware in the FPGA device, it is now necessary to create and run a program that performs the desired operation. This can be done in the Nios II assembly language or in a high-level language such as C. We will illustrate programming in assembly.

A.3.1 – Using the Nios II Assembly Language

The assembly-language program (Figure 15) loads the addresses of the PIO data registers into two registers, r2 and r3. Then, in an infinite loop, it transfers the data from the input PIO, Switches, to the output PIO, LEDs.

Figure 15 – Assembly code to control the lights.
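Since the figure is not reproduced here, a sketch of the program it describes follows. The PIO addresses below are placeholders: the actual values are the base addresses auto-assigned by the SOPC Builder (visible in Figure 10) and must be substituted accordingly.

```asm
/* Hypothetical reconstruction of the Figure 15 program.
   Switches and LEDs stand for the PIO data-register addresses
   auto-assigned by the SOPC Builder (see Figure 10). */
.equ Switches, 0x00002000   /* assumed input PIO address  */
.equ LEDs,     0x00002010   /* assumed output PIO address */

.global _start
_start:
    movia  r2, Switches     /* r2 <- address of the input PIO  */
    movia  r3, LEDs         /* r3 <- address of the output PIO */
loop:
    ldbio  r4, 0(r2)        /* read the switch state           */
    stbio  r4, 0(r3)        /* drive the LEDs with that state  */
    br     loop             /* repeat forever                  */
.end
```

The ldbio/stbio instructions bypass the cache when accessing I/O registers, which is the usual choice for memory-mapped peripherals on the Nios II.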

For a detailed explanation of the Nios II assembly-language instructions, see the tutorial Introduction to the Altera Nios II Soft Processor [14].

Type this code into a file pratica1.s and place the file in the working directory. The program has to be assembled and converted into an S-Record file, pratica1.srec, suitable for downloading into the implemented Nios II system.

Altera provides monitor software, called the Altera Monitor Program, for use with the DE boards. This software offers a simple means of compiling, assembling, and downloading programs into a Nios II system implemented on a DE board. It also makes it possible for the user to perform debugging tasks. A description of this software is available in the Altera Monitor Program tutorial.

Open the Altera Monitor Program (Figure 16). This software needs to know the characteristics of the designed system, which are given in a .ptf file, in this case nios_system.ptf. Click File > New Project to open the New Project Wizard and follow these steps:

1. Enter the curso_FPGA directory as the project directory, by typing it directly or by browsing with the Browse field.

2. Enter pratica1 as the project name and click Next >.

3. From the Select a System box, select <Custom System>.

4. Click Browse... next to the System Description field to display the selection window and choose the file nios_system.ptf. Note that this file is in the project directory curso_FPGA.

Figure 14 – The Programmer window.


Figure 16 – Altera Monitor Program window.

Figure 17 – Program memory specification window.


5. Specify the .sof file (pratica1.sof) in the Quartus II Programming (SOF) File field to allow the user to download the program to the board from the Altera Monitor Program. Click Next >.

6. Select Assembly Program as the program type and click Next >.

7. Click Add... to open the file selection window, choose the file pratica1.s, click Select, and then click Next >.

8. Check that the host connection is set to USB-Blaster, the processor is set to cpu_0, and the Terminal Device is set to JTAG UART, then click Next >.

9. The Altera Monitor Program also needs to know where to load the application program. In this case it will be the FPGA's memory block, which the SOPC Builder named onchip_memory2_0. As shown in Figure 17, the Monitor Program has already selected the correct memory devices.

10. Click Finish to confirm the system configuration.

Next, click Actions > Compile & Load. The Altera Monitor Program will invoke the assembler and then a linker. After the program has been downloaded to the board, it is displayed in the Disassembly window of the Altera Monitor Program, as illustrated in Figure 18.

Click Actions > Continue to run the program. With the program running, you can now test the design by toggling the switches SW7 to SW0; the LEDs should respond accordingly.

B. Second Part – DE1 Basic Computer

In this second part we will implement a complete basic computer. The block diagram of the DE1 Basic Computer in Figure 19 shows all the resources found on the DE1 board. Its main components include the Altera Nios II processor, memory for program and data storage, parallel ports connected to switches and lights, a timer module, and a serial port. As shown in Figure 19, the processor and its interfaces to I/O devices are implemented inside the Cyclone II FPGA chip on the DE1 board.

We will use the Altera University Program, specifying the DE1 board and the DE1 Basic Computer system. This package and the complete tutorial can be found on Altera's website [15].

Figure 18 – View of the window after the program has been downloaded to the board.


CONCLUSIONS AND FUTURE WORK

With this introductory course on FPGA-embedded systems, the student can implement a simple computer system using the SOPC Builder with the aid of the NIOS II, as well as a basic computer. It is thus expected that in the future the student will be able to design a computer system embedded in an FPGA and configure it according to their needs. As future work, the authors intend to offer more advanced versions of embedded-systems courses using FPGAs, DSPs, and dedicated computers.

ACKNOWLEDGMENTS

The authors are grateful to the "Instituto Nacional de Ciência e Tecnologia em Sistemas Embarcados Críticos (INCT-SEC)" for the financial support.

REFERENCES

[1] J. O. Hamblen, T. S. Hall, and M. D. Furman, "Rapid Prototyping of Digital Systems, SOPC Edition", Springer, 2008.

[2] Z. Navabi, "Digital Design and Implementation with Field Programmable Devices", Kluwer Academic Publishers, 2005.

[3] D. A. Patterson and J. L. Hennessy, "Organização e Projeto de Computadores - A Interface Hardware/Software", Editora Campus, 2005.

[4] A. S. Tanenbaum, "Organização Estruturada de Computadores", Pearson: Prentice-Hall, 2007.

[5] W. Stallings, "Arquitetura e Organização de Computadores", Pearson, 2010.

[6] R. J. Tocci, "Sistemas Digitais - Princípios e Aplicações", Pearson: Prentice Hall, 1994.

[7] E. C. Pedrino, "Arquitetura pipeline reconfigurável através de instruções geradas por programação genética para processamento morfológico de imagens digitais utilizando FPGAs" (in Portuguese), Doctoral thesis, University of São Paulo - USP-EESC, 220 pp., 2008.

[8] "Introduction to the Altera SOPC Builder Using Verilog Designs", accessed April 27, 2012. [Online]. Available: ftp://ftp.altera.com/up/pub/Altera_Material/9.1/Tutorials/Verilog/Introduction_to_the_Altera_SOPC_Builder.pdf

[9] J. O. Hamblen, "Altera DE2 Board Resources for Students", http://users.ece.gatech.edu/~hamblen/DE2/, 2011.

[10] "Altera University Program – IP Cores for Education", accessed Oct 14, 2010. [Online]. Available: http://www.altera.com/education/univ/materials/ip-cores/unv-ip-cores.html

[11] "Altera's Embedded Processors", accessed Oct 14, 2010. [Online]. Available: http://www.altera.com/products/ip/processors/nios2/ni2-index.html

[12] "Nios II Community FTP", accessed Oct 14, 2010. [Online]. Available: http://www.niosftp.com/pub/

[13] J. O. Hamblen, T. S. Hall, and M. D. Furman, "Tutorial IV: Nios II Processor Hardware Design", in Rapid Prototyping of Digital Systems, SOPC Edition, Springer, pp. 352-370, 2008.

[14] "Introduction to the Altera Nios II Soft Processor", accessed April 27, 2012. [Online]. Available: ftp://ftp.altera.com/up/pub/Tutorials/DE2/Computer_Organization/tut_nios2_introduction.pdf

[15] "Basic Computer System for the Altera DE1 Board", accessed April 27, 2012. [Online]. Available: ftp://ftp.altera.com/up/pub/Altera_Material/11.0/Examples/DE1/NiosII_Computer_Systems/DE1_Basic_Computer.pdf

Figure 19 – Block diagram of the DE1 Basic Computer.


Prof. Dr. Edilson Reis Rodrigues Kato
Universidade Federal de São Carlos (UFSCar)
Rod. Washington Luis, Km 235 – São Carlos, SP
[email protected]

Associate Professor at the Department of Computing (DC) of the Universidade Federal de São Carlos, under a full-time exclusive-dedication (DE) regime. He holds a degree in Electrical Engineering from the Universidade de São Paulo - USP (1988), a master's degree in Mechanical Engineering from USP (1994), a doctorate in Mechanical Engineering from USP (1999), and a post-doctorate in Automation and Artificial Intelligence from the Universidade Federal de São Carlos - UFSCar (2001). He has experience in Electrical and Computer Engineering, with emphasis on electronic automation of electrical and industrial processes, working mainly on the following topics: modeling of automated systems, intelligent systems applied to manufacturing, artificial intelligence, systems architecture, and programmable logic devices.

http://lattes.cnpq.br/8517698122676145

Prof. Dr. Emerson Carlos Pedrino
Universidade Federal de São Carlos (UFSCar)
Rod. Washington Luis, Km 235 – São Carlos, SP
[email protected]

He holds a bachelor's degree in Computational Physics from the Universidade de São Paulo - IFSC (2000), a specialization in Geoprocessing from the Universidade Federal de São Carlos - DECiv (2003), a master's degree in Electrical Engineering from the Universidade de São Paulo - EESC (2003), and a doctorate in Electrical Engineering from the Universidade de São Paulo - EESC (2008). He is currently an Associate Professor at the Department of Computing of the Universidade Federal de São Carlos. He has experience in Computer Science, Electrical Engineering, and Geoprocessing, working mainly on the following topics: development of fast, intelligent architectures for real-time image processing using high-capacity programmable logic devices, microprocessor-based instrumentation, genetic programming, mathematical morphology, remote sensing, and robotic vision.

http://lattes.cnpq.br/6481363465527189


Autonomic Wireless Sensor Networks

A.R. Pinto1, G.M. Araújo2, J.M. Machado1, Adriano Cansian1, Carlos Montez2

1State University of São Paulo - UNESP, São José do Rio Preto-SP, Brazil

{arpinto,jmachado,adriano}@ibilce.unesp.br

2PGEAS – Universidade Federal de Santa Catarina – UFSC, Brazil
{araujo,montez}@das.ufsc.br

Abstract

Wireless Sensor Networks (WSN) can be used to monitor hazardous and inaccessible areas. In these situations, the power supply (e.g. battery) in each node cannot be easily replaced. One solution is to deploy a large number of sensor nodes, since the lifetime and dependability of the network can be increased through cooperation among nodes. In addition to energy consumption, applications for WSN may also have other concerns, such as meeting deadlines and maximizing the quality of information. The large number of WSN nodes and the harsh or inaccessible areas where WSN are generally deployed increase the effort of WSN management. Thus, autonomic computing approaches are necessary to maintain networks that must remain active over a long period of time. In this chapter, we present the characteristics of WSN and of autonomic computing. Two autonomic approaches for dense WSN are also presented. The first is a Genetic Machine Learning Algorithm (GMLA) aimed at applications that make use of trade-offs between different metrics. Simulations were performed on random topologies assuming different fault levels, and GMLA showed a significant improvement over the plain IEEE 802.15.4 protocol. Moreover, an approach that autonomically provides QoS for dense WSN, called VOA (Variable Offset Algorithm), is presented. Experimental results showed that VOA can significantly improve communication efficiency in dense WSN.

1. Introduction

Wireless sensor networks (WSN) is a denomination that covers several variations in the composition and deployment of network nodes. These networks are composed of small communicating nodes, each containing a sensing unit, a wireless communication module, a processor, memory, and a power supply, typically a battery [9].

The nodes that compose these networks are able to collect scalar readings and to communicate with each other. The set of nodes can be homogeneous, or some of them may have special characteristics. Some WSN include a base station that has more computational power than the other nodes; the base station is responsible for collecting, processing, and storing the data sent by slave nodes [10].

Resources in WSN technology (processor, memory, and battery) are generally restricted. Some networks are deployed in hazardous or inaccessible places where battery replacement is prohibitive [6]. For this reason, multiple research efforts are currently underway to increase system lifetime, adopting approaches that minimize the duration of processing and communication tasks and that also minimize context switches. Moreover, due to battery depletion, faults in wireless communication, and faults in node hardware, the network topology becomes very dynamic [1].

Some approaches consider a large number of nodes (a dense network) deployed near the phenomenon that needs to be monitored. Sometimes, because the network is deployed quickly and its nodes are scattered over a large area in a random fashion, the position of the nodes cannot be predetermined [6].

The strategy behind the deployment of a large number of non-reliable nodes has several advantages: (i) better fault tolerance through distributed operation; (ii) uniform coverage of the monitored environment; (iii) easy deployment; (iv) reduced energy consumption; and (v) longer network lifetime.

The high degree of interaction that a WSN may have with the environment where its sensors are deployed imposes multiple implicit and explicit time constraints. For instance, the concept of data freshness implies that some data in the system have a short validity time [3]. In security applications, for example, whenever someone accesses a predetermined room, the system must localize the potential intruder within a maximum period of time (a deadline).

Page 71: Redes de sensores sem fio autonômicas: abordagens, aplicações e desafios

Due to the high fault rate, the inherent non-determinism, the surrounding noise and the resource restrictions, it is extremely hard to guarantee real-time properties in WSNs. For this reason, applications with hard deadline constraints are generally not considered.

Data fusion approaches are used in dense networks in order to increase the dependability of sensor readings, to produce a more accurate estimation of the monitored environment and to achieve a longer network lifetime [5,6]. In these approaches, sensed scalars are sent to a base station that fuses the data, with the objective of extracting useful information from a set of readings. This way, dependable information may be generated even in the presence of faulty sensors. This is one of the most important outcomes of data fusion: dependable applications no longer need to rely on a single sensor reading.
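The fault-masking argument can be illustrated with a minimal sketch (illustrative only, not the chapter's actual fusion rule): a median over redundant readings masks a minority of faulty sensors.

```python
import statistics

def fuse_readings(readings):
    """Fuse redundant readings with the median, which masks a
    minority of faulty (outlier) sensors."""
    return statistics.median(readings)

# Four sensors agree around 25 degrees; one faulty sensor reports 90.
print(fuse_readings([24.8, 25.1, 90.0, 25.0, 24.9]))   # 25.0
```

The mean, by contrast, would be pulled to about 38 by the single faulty reading.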

Even though dense WSNs present several advantages, self-management characteristics are required in order to deal with the management of a large number of nodes. Self-management techniques are part of autonomic computing methodologies, which can also be used to manage WSNs with conflicting goals (energy efficiency, self-organization, time constraints and fault tolerance). The main goal of self-management is the development of a computing system that does not need human intervention to operate. This way, computing systems are able to self-organize and self-optimize, as long as they follow global objectives dictated by a system administrator.

A dense WSN composed of several sensor nodes and a base station in a star topology conforms to the IEEE 802.15.4 standard, which is becoming a de facto standard in WSNs [2]. IEEE 802.15.4 offers support for different kinds of applications, but many issues remain open when the goals are conflicting (for instance, increasing dependability and energy efficiency while meeting time constraints). The IEEE 802.15.4 protocols do not seem able to deal with such complexities.

For example, when the number of nodes in a network is increased to achieve better reliability, the WPAN may become congested, and fewer messages arrive at the base station on time. To illustrate this situation, we performed experiments using the TrueTime simulator1. Two metrics, called Ef (efficiency) and QoF (quality of fusion), were adopted. Efficiency measures the ratio between sent and received messages. QoF represents, roughly, the average number of messages received by the base station per round, and thus provides a quality measurement. Figure 1 shows that when the network density is increased, QoF increases slowly, but communication efficiency quickly decreases.
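The two metrics can be sketched as follows; this is a simplified reading of the informal definitions above, with hypothetical message counts.

```python
def comm_efficiency(received_on_time, sent):
    """Ef: ratio between messages received on time by the base
    station and messages sent by the nodes."""
    return received_on_time / sent if sent else 0.0

def qof(received_per_round):
    """QoF (rough definition): average number of messages received
    by the base station per round."""
    return sum(received_per_round) / len(received_per_round)

# Hypothetical dense-network figures: many sends, many collisions.
print(comm_efficiency(120, 400))   # 0.3
print(qof([10, 14, 12, 12]))       # 12.0
```

The example mirrors the trend in Figure 1: a reasonable QoF can coexist with a poor Ef once collisions dominate.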

1 It is freely available at http://www.control.lht.se/truetime.

Figure 1 – IEEE 802.15.4 behavior.

In this chapter, two autonomic approaches are presented. The first one is a self-organizing approach for WSNs based on the use of a genetic machine learning algorithm. Genetic Machine Learning Algorithms (GMLA) are a machine learning approach based on genetic algorithms (GA).

GAs are optimization algorithms based on the natural selection principles proposed by Charles Darwin. They are efficient in solving multi-goal optimization problems, but their overhead may be cumbersome. GMLA approaches partially reduce this overhead, since the genotype evolution takes place only after a number of classifier consultations. The GMLA presented here tries to achieve a trade-off between communication efficiency (Ef) and quality of fusion (QoF). The second approach, VOA (Variable Offset Algorithm), uses a random offset before each transmission in order to decrease collisions in the wireless medium.

This chapter is organized as follows: Section 2 presents WSN characteristics and challenges. Autonomic computing principles are briefly described in Section 3. Genetic machine learning algorithms are presented in Section 4, and related work in Section 5. The system model is presented in Section 6, and GMLA simulation results in Section 7. The VOA approach is presented in Section 8 and its experimental tests in Section 9. Finally, Section 10 presents the final remarks.

2. Wireless Sensor Networks

WSNs are generally composed of a large number of tiny nodes that can sense the environment and communicate through a wireless medium. The nodes of a WSN are embedded systems with severe hardware and software constraints [15]. Figure 1 shows the scheme of a WSN node.


Figure 1 – WSN Node Scheme

The small size and wireless communication of WSN technology allow the quick deployment of the network over a monitoring area. The deployment can be previously determined or the sensors can be randomly placed, and the placement can be done manually or autonomically; the pros and cons of each deployment type are discussed next. Deployment by human beings is shown in Figure 2. The placement can be done in a predetermined way or in a random fashion. The main advantage of this kind of deployment is its simplicity; however, the placement precision is lower than that of robotic deployment.

Figure 2: Manual WSN Deployment

WSN deployment can also be done autonomically by robots (Figure 3). There are many advantages behind this strategy: the WSN topology is precisely formed, and it is possible to deploy a WSN over a harsh or inaccessible environment without risking human lives, at a deployment cost lower than that of manual deployment. However, the complexity of robot development and the cost of robot management can be prohibitive.

Figure 3: Robotic WSN Deployment

WSN nodes can also be dropped by a UAV (unmanned aerial vehicle) or an airplane during flight (Figure 4). In this case, sensor nodes are placed in completely unpredictable locations, and it is necessary to deal with this uncertainty through special self-organization techniques. Moreover, the sensors must be extremely cheap, since many of them can be lost or damaged during the aerial deployment. Nevertheless, this strategy is suitable for battlefields and contaminated sites (for example, chemical or radiation contamination).

Figure 4: WSN nodes dropped by Aerial Vehicles.

The sensor nodes can also have mobility capabilities (robots). In this case, each node can move to a specific location in order to form the network. Figure 5 shows a WSN formed by mobile sensors. This strategy presents several challenges, such as group coordination, self-localization and energy consumption (the mobility itself generally consumes more energy than wireless communication). On the other hand, this kind of WSN is more flexible, since nodes can move to a specific geographic location when necessary.

Figure 5: Autonomic Deployment

After deployment, the WSN nodes are turned on in order to sense and report environmental values such as temperature, humidity and pH.

Since WSNs are suitable for sensing large areas, there are three main network topologies that can be used to cover the monitoring area and deliver data to the users: star, cluster-tree and mesh. The advantages, disadvantages and characteristics of each topology are discussed next.

The star topology is formed by two main kinds of nodes: the base station, which is in charge of collecting data from the sensors, managing the WSN and performing data fusion; and the common sensor nodes, which just sample data from their sensors and send the information to the base station in a single hop. The star topology scheme is shown in Figure 6.

Figure 6: Star Topology

The base station generally has more powerful hardware, special software and a larger energy budget in order to deal with the WSN management.

The common sensor nodes must be placed in such a way that their distance to the base station is smaller than the base station's maximum radio range; only nodes covered by the base station can be considered members of the WSN. Figure 7 shows the case where nodes are located outside the base station's radio coverage.

Figure 7: Star Topology Problem

There are several advantages in a star topology WSN formation:

Simple network communication: since the communication is single-hop (base station to sensors and sensors to base station), sensor nodes can remain in send mode for long periods of time. Moreover, the base station can mostly receive messages, switching to send mode only in special situations, such as querying a specific group of nodes or sending checkpoint packets. Because the star topology is based on one-hop communication, routing algorithms, which are generally complex and energy-hungry, are not necessary.

Faulty nodes do not affect the network communication: the one-hop communication of the star topology is much simpler than the multi-hop approaches used in cluster-tree and mesh WSNs. Faults in nodes do not interrupt the data delivery to the base station (in multi-hop topologies, when nodes used as routers fail, the entire WSN can be affected). Moreover, faulty nodes can be easily replaced, or another WSN node can assume their role.

Simple software implementation: the star communication pattern needs a simpler software implementation than other topologies. Moreover, the development of routing algorithms (which are more complex) is not necessary.

Low-cost WSN: base stations are more expensive than common nodes (they must have more powerful hardware and software and long-life batteries). Topologies like mesh or cluster-tree may need more than one base station, so their cost is also higher.

The simple scheme of the star topology can also cause problems, some of which are listed below.


Limited coverage area: the WSN coverage is limited by the radio antenna. Other topologies rely on routing algorithms, which can increase the monitored area.

Larger number of nodes per base station: the nodes of a star topology are disposed over a single cluster. Thus, the number of nodes managed by a single base station is generally larger than in other topologies.

Single point of failure: the star topology relies on one single base station that receives messages and manages the entire WSN; when this base station fails, the whole WSN collapses.

Heavy wireless traffic: all the information collected by the sensor nodes is sent to a single node. Thus, the heavy network traffic can cause congestion.

Cluster-tree Topology

The nodes of a cluster-tree topology are divided into clusters. Each cluster has a special node, called the cluster head (CH), that is responsible for managing all the nodes of its cluster. The base station manages the whole WSN and generally receives messages from the CHs. Thus, a cluster-tree WSN is composed of three main actors: base station, sensor nodes and cluster heads (Figure 8).

Figure 8: Cluster-tree Topology

There are several challenges when a cluster-tree topology is used:

Cluster formation: forming clusters in a randomly or strategically deployed WSN is a complex task. The choice of the nodes of each cluster is generally guided by goals such as cluster size, increased WSN coverage, reduced overlapping of CHs (see Figure 9) or minimal power consumption.

Figure 9: Overlapping Problem

CH election: the election of a CH in a cluster is based on metrics such as battery charge, geographical location and special hardware characteristics (mobility, wireless module, processing power or storage). Since a CH spends more energy than common nodes, CH rotation is also used in order to increase the WSN lifetime.

Management of a larger WSN: because the nodes are divided into clusters, the monitored area is bigger than in a star topology. Besides, there are more nodes to manage.

Routing techniques: the unpredictability of the wireless medium and the low dependability of nodes cause several changes in the WSN topology, so the state of the routes must be periodically updated. The resulting high number of exchanged messages also increases the energy consumption. Moreover, routing tables cannot hold all the routes, due to the limited memory size of WSN nodes.
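As an illustration of CH election, a weighted score over metrics of this kind might look like the sketch below; the metric names, normalization and weights are hypothetical, not taken from any specific protocol.

```python
def elect_cluster_head(nodes, weights=(0.5, 0.3, 0.2)):
    """Elect the node with the best weighted score of battery charge,
    centrality and hardware capability (all normalized to [0, 1])."""
    wb, wc, wh = weights
    def score(n):
        return wb * n["battery"] + wc * n["centrality"] + wh * n["hardware"]
    return max(nodes, key=score)

nodes = [
    {"id": 1, "battery": 0.9, "centrality": 0.4, "hardware": 0.5},
    {"id": 2, "battery": 0.7, "centrality": 0.9, "hardware": 0.5},
    {"id": 3, "battery": 0.3, "centrality": 0.8, "hardware": 1.0},
]
print(elect_cluster_head(nodes)["id"])   # node 2 scores highest
```

Rotating the CH role then amounts to re-running the election with updated battery values.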

The third kind of WSN topology is the mesh, which is used when there is no central management node (see Figure 10). Every WSN node can act as a router, so there are several possible paths along the WSN. The main advantage of a mesh WSN is fault tolerance, mainly due to the distributed nature of the network: since all nodes can route messages and there is no central node, when some node fails another can assume its duties. The main disadvantage is that all nodes must be prepared to deal with uncertainties that are not previously known (for example, route changes). Moreover, mesh WSN approaches must be carefully implemented in order to overcome hardware and software constraints.


Figure 10: Mesh WSN

Finally, there are several advantages in using WSN technology:

Non-intrusive monitoring: the small size of WSN nodes allows non-intrusive environmental monitoring. Moreover, the wireless communication decreases the deployment effort.

Low-cost technology: WSN technology is much cheaper than wired solutions.

Larger monitoring areas: the wireless communication and the low cost of WSN nodes allow the deployment of large-scale networks. Thus, it is possible to cover larger monitoring areas than with other technologies.

On the other hand, the small size, the wireless medium and the severe hardware and software constraints introduce new challenges in the development of WSN approaches:

Energy consumption: WSNs are often deployed over harsh or inaccessible areas, so battery replacement is generally prohibitive.

Resource constraints: as noted above, WSNs face severe resource constraints, mainly a limited energy budget and restricted CPU clock, memory and network bandwidth. This characteristic demands new solutions. The fact that WSN topologies are composed of a huge number of nodes is also an issue that was not usually considered in simple ad-hoc networks. For instance, trade-off approaches that aim at energy economy and real-time requirements become necessary [20].

Self-*: one of the biggest challenges is how to create a WSN vision in the network application layer. Since WSNs are deployed to operate with little or no human intervention, self-* characteristics like self-organization, self-optimization and self-healing become necessary [18]. These characteristics are easily listed as challenges, but they are extremely difficult to achieve.

High scale/density: several WSN approaches consider a large number of nodes in order to overcome hardware or software faults, so there is a minimum number of nodes necessary to guarantee the WSN service. The main challenges include processing the large amount of generated data, assuring that the WSN achieves the minimum desirable density, and developing solutions that require the least density and energy consumption, in order to maximize the WSN lifetime. WSNs based on a huge number of nodes deployed over large areas are considered large-scale systems. Due to the high density, these systems are subject to faults, noise (sometimes caused by the WSN itself) and other uncertainties. Moreover, once deployed, a WSN must be self-operational and self-maintaining, since human intervention is sometimes very expensive or even impossible. All these characteristics impose several conflicting goals, and the challenges tend to grow with the miniaturization trend in the industry (nanometric WSNs are being considered) [17].

Real-time: WSNs operate in the real world, so real-time features are necessary to guarantee correct functioning. These systems present implicit real-time constraints: response time matters, and system tasks must finish as fast as possible. Several WSNs also present explicit real-time constraints; for example, a structural monitoring application imposes explicit deadlines for the data sensing [19]. However, due to the large number of nodes, the non-determinism and the noise, it is extremely hard to guarantee real-time properties.

Security: WSNs can be used in critical applications, so security is an essential issue. Denial-of-service attacks can be easily executed against a WSN. Moreover, coordination and real-time communication approaches usually do not consider security issues, so an intruder can easily exploit these security faults. The great dilemma is how to implement security techniques that need large computational resources in a technology that has severe hardware constraints.

3. Autonomic Computing Principles

Computer systems have achieved such a high level of complexity that human efforts to keep them operational have become inadequate. A similar problem took place in telephony in the 1920s. At that time, human operators were required to operate switchboards by hand, and the rapid popularization of the telephone caused serious concerns about the number of trained operators needed to meet the demand. The introduction of automatic switching machines eliminated the need for human intervention [21].

The term autonomic computing was introduced by IBM in 2001 to describe computer systems able to self-manage [22]. The main "self-x" properties proposed by IBM are self-configuration, self-optimization, self-healing and self-protection. Each of them is detailed as follows [21]:

Self-configuration: the system's ability to configure itself according to high-level objectives.

Self-optimization: the system can proactively decide to start a change in itself, in order to optimize performance or quality of service.

Self-healing: the system detects and diagnoses problems, which can range from faulty bits in a memory chip to a software error.

Self-protection: the system is able to defend itself against malicious attacks or unauthorized changes.

The idea of autonomic computing is heavily inspired by biological systems, which are the result of years of evolution and have features desirable in autonomic systems [23], some of which are mentioned below:

• Adaptation to environmental changes;
• Robustness to failures caused by internal or external factors;
• Ability to achieve complex behaviors, usually based on a limited set of basic rules;
• Ability to learn and evolve as new conditions are applied;
• Ability to self-organize in a distributed manner, achieving an effective balance in a collaborative way;
• Intelligent management of limited resources through a global intelligence;
• Survivability in harsh environments.

The features of biological systems mentioned above are exhibited by evolutionary computation techniques [24]. Moreover, WSNs introduce an explicit need for self-organization [23], especially with the tendency towards nanoscale devices [17].

4. Genetic machine learning algorithms

Classifier systems are machine learning algorithms based on genetic algorithms. These systems are able to learn syntactically simple rules. In this chapter we call them Genetic Machine Learning Algorithms (GMLA). Classifier systems are composed of three main components: (i) the rule and message system; (ii) the apportionment system; and (iii) the genetic algorithm.

The rule and message system is a computational scheme that uses simple rules to guide the system in a certain environment. Rules generally have the following form:

if <condition> then <action>

The meaning of this production rule is that the action has to be imposed on the system when the condition is satisfied. Classifiers are generally composed of three characters {0,1,#}, where # is a wildcard that can mean 0 or 1. A message received by the system can activate one or more classifiers.

Table 1 - Example of a classifier population.

Condition   Action
10#01#      100
11#1#0      111
0#1111      001
100001      110

When the system receives the message “101011” from the environment, the first classifier is activated and the action “100” is executed. Classifier systems are able to adapt their classifiers so that actions that enhance the performance of the system are privileged. This way, a classifier system is able to adapt itself to an unknown environment.
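The wildcard matching just described can be sketched directly, reproducing the Table 1 example:

```python
def matches(condition, message):
    """A condition matches a message when every position is equal
    or the condition holds the '#' wildcard."""
    return all(c == m or c == "#" for c, m in zip(condition, message))

# Classifier population from Table 1: (condition, action) pairs.
POPULATION = [("10#01#", "100"), ("11#1#0", "111"),
              ("0#1111", "001"), ("100001", "110")]

def activated(message):
    """Actions of every classifier whose condition matches."""
    return [action for cond, action in POPULATION if matches(cond, message)]

print(activated("101011"))   # only the first classifier fires: ['100']
```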

At the startup of the classifier system, all classifiers receive the same budget, which is an adaptation measure of the classifier. When a classifier is chosen by the system, it has to pay a predetermined amount of its budget to the apportionment system; this amount is previously set by the manager system. When more than one classifier satisfies a condition, the one with the largest budget is chosen. The budget accumulated by the apportionment system is paid to a classifier that improves the system performance. On the other hand, if the last chosen classifier did not improve the system, it loses part of its budget as a payment for its bad action. This way, the most adapted classifiers increase their budgets. After a number of consultations, the genetic algorithm (GA) evolves the classifier population in order to obtain better solutions to the problem.
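A minimal sketch of this budget (apportionment) mechanism follows; the bid fraction and the penalty are chosen arbitrarily for illustration, not taken from the chapter.

```python
def consult(budgets, matching_ids, bid_fraction=0.1):
    """Among matching classifiers, pick the one with the largest
    budget; it bids (pays) a fraction of its budget to the pot."""
    winner = max(matching_ids, key=lambda i: budgets[i])
    bid = budgets[winner] * bid_fraction
    budgets[winner] -= bid
    return winner, bid

def settle(budgets, winner, pot, improved):
    """Pay the pot back to the winner if its action improved the
    system; otherwise take a further penalty from its budget."""
    if improved:
        budgets[winner] += pot
    else:
        budgets[winner] -= budgets[winner] * 0.1

budgets = {0: 100.0, 1: 80.0}
winner, bid = consult(budgets, [0, 1])   # classifier 0 wins, bids 10.0
settle(budgets, winner, bid, improved=True)
print(winner, budgets[winner])           # 0 100.0
```

Over many consultations, classifiers whose actions keep improving the system accumulate budget and win future ties.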

A GA considers a population of answers to a given problem, in which individuals are represented by their genotypes, usually a set of bits or characters. This population is evolved by the GA at every evolution cycle: at each generation, a new set of artificial creatures (strings of characters) is generated, based on fragments of the most adapted previous individuals. The main focus of GAs is robustness: the more robust a system is, the less it needs the intervention of programmers or redefinitions; moreover, it achieves higher levels of adaptation and is able to perform better for longer.
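A generic single-generation GA step (tournament selection, one-point crossover, bit-flip mutation) can be sketched as below; this is a textbook GA on bit strings, not the specific algorithm used in the chapter.

```python
import random

def evolve(population, fitness, rng, p_mut=0.01):
    """One GA generation: tournament selection, one-point crossover
    and bit-flip mutation over fixed-length bit strings."""
    def select():
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b
    offspring = []
    for _ in range(len(population)):
        p1, p2 = select(), select()
        cut = rng.randrange(1, len(p1))           # one-point crossover
        genes = p1[:cut] + p2[cut:]
        genes = "".join(g if rng.random() >= p_mut else str(1 - int(g))
                        for g in genes)           # bit-flip mutation
        offspring.append(genes)
    return offspring

rng = random.Random(42)
pop = ["0000", "1100", "1010", "1111"]
ones = lambda g: g.count("1")                     # fitness: number of 1s
for _ in range(10):
    pop = evolve(pop, ones, rng)
print(max(pop, key=ones))
```

With the number-of-ones fitness, selection pressure quickly concentrates the population on genotypes close to "1111".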

The main difference between classical GA approaches and the GMLA approach is that a GMLA optimizes the answer to the problem on-line, giving answers instantly, whereas a GA needs more time to reach a solution.


4.1. GMLA: dynamic adjustment of the sending probability

The main target of the proposed approach is to dynamically adapt the sending probability Sp, so as to obtain a trade-off between QoF and Ef. The configuration of the classifiers is shown in Table 2.

Table 2. Classifiers configuration.

Classifier Part   Bits   Meaning
C1                1      0 = decrease, 1 = increase
C2                3      000 = [0%;12%], 001 = (12%;24%], 010 = (24%;36%], 011 = (36%;48%], 100 = (48%;64%], 101 = (64%;72%], 110 = (72%;84%], 111 = (84%;100%]
A1                1      0 = decrease, 1 = increase
A2                3      000 = 12%, 001 = 24%, 010 = 36%, 011 = 48%, 100 = 64%, 101 = 72%, 110 = 84%, 111 = 100%

According to Table 2, a classifier is composed of four parts, C1, C2, A1 and A2, in the form <C1+C2>:<A1+A2>. C1 indicates whether the efficiency has increased or decreased since the last checkpoint, and C2 indicates the level of the efficiency variation. A1 indicates whether Sp will be increased or decreased, and A2 indicates the level of the change in Sp. The overhead imposed by the GMLA is much smaller than the overhead of a traditional GA, although the evolution requires more system execution time: the key of the GMLA is that the evolution is done during execution time, whereas a traditional GA evolves candidate solutions before executing them. This is one of the reasons why we consider GMLA-based solutions more suitable for dynamic systems like WSN applications. The efficiency variation is calculated as follows:

ΔEf_i = 100 × (1 − Ef_{i−1} / Ef_i)    (1)

where Ef_i and Ef_{i−1} are the efficiencies measured at the current and previous checkpoints, respectively.

5. Related work

The proposed model has its roots in previous research work by the authors and in existing wireless network standards. The adopted star topology is part of the ZigBee technology (based on IEEE 802.15.4). The approaches presented in [4,5,7] also use star topologies, where sensor nodes reach the base station in just one hop.

The round concept is presented in [8]. The main goal of this concept is to discretize the time intervals at which decisions are made. In that work, the monitoring phase (equivalent to our session concept) is divided into equal-duration rounds.

A metric similar to the proposed QoF concept is presented in [3], within the context of real-time databases. The metric, called QoD (database freshness), considers the deadline miss ratio and the data freshness (QoD levels) as the relevant metrics.

A parallel data fusion scenario is considered in [5], where the master node is not aware of the number of sensor nodes. The data fusion rule (referred to as the counting rule) imposes that the number of packets must be greater than a predefined threshold in order to make a decision.

A serial fusion approach based on genetic algorithms is presented in [1], where a mobile agent approach for target detection is used in order to validate the multi-objective genetic algorithm.

A node grouping approach called H-NAME is presented in [11], where the authors show that such a technique can improve the network throughput, reliability and energy efficiency, while decreasing the transfer delay. The impact of hidden nodes was highlighted through simulation in an IEEE 802.15.4 star topology, and two test beds were used to demonstrate the performance of the proposed approach. However, when the network is grouped into clusters, the network density is decreased. Moreover, the tests only considered an IEEE 802.15.4 network with 18 nodes in beacon mode.

Q-DAP is a QoS data aggregation and processing approach that is executed at the intermediate nodes of a cluster-tree network [12]. It increases energy efficiency and network lifetime while decreasing end-to-end latency and data loss. The main effort in Q-DAP is to determine when and where to execute data aggregation based only on local information. Q-DAP was evaluated through simulation and mathematical modeling. The main concern about this approach is that it considers a static cluster-tree topology with predetermined routes; therefore, scalability issues may hinder its quality.

MMSPEED is a routing protocol for probabilistic QoS guarantees in WSNs. It provides two quality domains, timeliness and reliability [13]. MMSPEED guarantees multiple packet delivery speed options in the timeliness domain, and provides various reliability levels by multipath forwarding. End-to-end requirements can be guaranteed in a localized way, which is desirable for scalability and adaptability in large-scale dynamic sensor networks. Nevertheless, the use of geographic routing requires nodes to be aware of their positions; the authors assume that WSN nodes have GPS or distributed location services. However, GPS devices are expensive and do not work well indoors, and distributed location services impose extra packet-exchange overhead (a node must periodically broadcast its location).

5.1. IEEE 802.15.4 standard

IEEE 802.15.4 [2] was proposed in 2003 and is becoming a de facto standard for low-power, low-rate wireless networks. The physical layer can operate at a maximum transmission rate of 250 kbps. The MAC supports two operational modes that can be selected by a central node called the PAN coordinator: (1) beaconless mode, a non-slotted CSMA/CA; and (2) beacon mode, where beacons are sent periodically by the PAN coordinator and nodes are synchronized by a superframe structure.

An IEEE 802.15.4 network can address up to about 65,000 nodes. Three types of topologies are supported: star, mesh and cluster tree. The star topology is the simplest scheme, where nodes communicate in just one hop.

CSMA/CA in beaconless mode is used when the coordinator does not send a periodic beacon. Thus, the backoff periods of one device are not related in time to the backoff periods of any other device in the network [2].

Two variables are maintained by each device in beaconless mode: NB, the number of times the CSMA/CA algorithm was required to back off, and BE, the backoff exponent, which determines how many backoff periods a device shall wait before attempting to assess the channel.

The first step of the algorithm is the initialization of NB and BE. After this step, the MAC sublayer shall delay for a random number of complete backoff periods in the range 0 to 2^BE − 1 and request the physical layer to perform a CCA (Clear Channel Assessment). If the channel is assessed to be busy, the MAC sublayer increments NB and BE by one (ensuring that BE does not exceed macMaxBE). If the value of NB becomes greater than macMaxCSMABackoffs, the CSMA/CA algorithm terminates with a channel access failure status [2].
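The steps above can be sketched as follows; timing is abstracted away and `channel_clear` stands in for the CCA primitive, so this is an algorithmic sketch rather than a protocol implementation.

```python
import random

# Default IEEE 802.15.4 beaconless-mode parameters.
MAC_MIN_BE = 3
MAC_MAX_BE = 5
MAC_MAX_CSMA_BACKOFFS = 4

def csma_ca(channel_clear, rng=random):
    """Unslotted CSMA/CA sketch: random backoff, CCA, retry with a
    growing backoff exponent until success or too many backoffs."""
    nb, be = 0, MAC_MIN_BE
    while True:
        rng.randint(0, 2 ** be - 1)   # backoff periods to wait (not simulated)
        if channel_clear():           # CCA: Clear Channel Assessment
            return "SUCCESS"
        nb, be = nb + 1, min(be + 1, MAC_MAX_BE)
        if nb > MAC_MAX_CSMA_BACKOFFS:
            return "CHANNEL_ACCESS_FAILURE"

print(csma_ca(lambda: True))    # idle channel -> SUCCESS
print(csma_ca(lambda: False))   # always busy -> CHANNEL_ACCESS_FAILURE
```

With the default parameters, an always-busy channel is abandoned after five CCA attempts, which is exactly the behavior the next paragraphs discuss.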

Thus, three main parameters influence beaconless CSMA/CA performance: macMaxBE (default value 5), macMaxCSMABackoffs (default value 4) and macMinBE (default value 3) [3].

These default values can decrease battery consumption (a device tries at most five times before aborting the transmission); however, when the number of nodes in the network increases, the communication efficiency decreases (see Figure 1). Thus, IEEE 802.15.4 does not seem adequate for applications requiring dense networks.

6. Communication model

The adopted communication model considers one master node (base station) and N slave nodes (Figure 2), where the slave nodes periodically sense scalar data [16]. The signal is considered to be homogeneous in the monitoring area. Data collected by the slaves is sent to the master node, which performs the data fusion. All the slave nodes reach the master in just one hop; that is, a parallel data fusion is performed in the master node.

Figure 2 - System Architecture.

The concept of a monitoring session is adopted. A monitoring session is a time interval in which all slave nodes periodically send sensed data to the master node. A session S is composed of N_TS rounds of length R; rounds therefore start at times 0, R, 2R, 3R, ..., (N_TS − 1)R. The round concept is used to synchronize the nodes, and it also represents the periodicity of the data fusion task. In each round, a slave node can send zero or one message M containing the sensed data to the master node.

All slave nodes are synchronized by the WSN round concept. Each message M sent by a slave node has an absolute deadline D, which is the maximum time by which it must be delivered to the master node; otherwise, it is no longer useful for the data fusion task. This absolute deadline is computed from a relative deadline d. We consider a homogeneous architecture where all slave nodes have the same relative deadline, whose value is sent by the master node at the beginning of the session. The absolute deadline of a message sent by a slave node in round n is D = nR + d, where R is the round length.

The master node performs the data fusion operation considering just the messages that arrived on time. In this work, the master node only fuses data that arrived within the same round; that is, the relative deadline of a message sent in round n always satisfies 0 < d < R, and consequently the absolute deadline satisfies nR < D < (n+1)R.
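The deadline computation D = nR + d can be illustrated directly (the round length and deadline values are hypothetical):

```python
def absolute_deadline(n, R, d):
    """Absolute deadline of a message sent in round n: D = n*R + d,
    with the relative deadline constrained to 0 < d < R."""
    assert 0 < d < R
    return n * R + d

# Hypothetical values: round length 100 ms, relative deadline 40 ms.
R, d = 100, 40
print(absolute_deadline(3, R, d))   # 340: must arrive before round 4 fuses
```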

Page 79: Redes de sensores sem fio autonômicas: abordagens, aplicações e desafios

Figure 3: Network Behavior

Figure 4 - Slave Node Algorithm.

A sending probability (Sp) parameter is considered in the model, and all slave nodes have the same Sp. This parameter controls the messages sent by slave nodes within each round. For instance, if Sp is configured to 0.1, each slave has a 10% probability of sending its message. The signal is considered homogeneous and redundant in the monitoring area, so a well-configured Sp saves network energy by reducing the number of packets in the WSN. The network behavior is presented in Figure 3.

The sending probability, the round time and the relative deadline parameters are sent by the master node at the beginning of each session. Some of these parameters may remain valid during the whole monitoring session, or they can be changed at a checkpoint C. A checkpoint is a special round in which resynchronization of parameters is imposed based on the network condition. Slave nodes do not send messages in a checkpoint round; they just receive new parameter values. The first round of every monitoring session is a checkpoint round, and slave nodes always wait for parameter values in the first round.
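The per-round sending decision can be sketched as follows (a sketch under our assumptions, not the authors' implementation):

```python
import random

def should_send(sp: float, rng: random.Random) -> bool:
    """Per-round decision of a slave: send with probability Sp (e.g. Sp = 0.1 -> 10%)."""
    return rng.random() < sp

# Rough check: with Sp = 0.1, roughly 10% of 10,000 rounds carry a message.
rng = random.Random(42)
sent = sum(should_send(0.1, rng) for _ in range(10_000))
```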

Figure 5 - Master Node Algorithm.

The master node calculates performance metrics during a checkpoint round in order to tune the WSN. In the proposed model, two metrics are considered: Quality of Fusion (QoF) and Efficiency (Ef). Ef is the relation between timely received messages (messages received by the master node before the deadline) and sent messages. It is calculated according to:

Ef = (Σ_{i=1}^{N} Mr_i) / EMs    (2)

where N is the number of rounds since the previous checkpoint C, Mr_i is the number of messages received in round i, and EMs is an estimation of the number of messages sent by slave nodes (3). This metric indicates how many of the sent messages are used in the data fusion task:

EMs = Sp × De × N    (3)


where De is the density of slave nodes in the WSN deployment. Finally, QoF is the average number of messages received by the master node per round during a monitoring session, which is evaluated according to:

QoF = (Σ_{i=1}^{N_TS} Mr_i) / N_TS    (4)

The basic idea of the QoF metric is to represent the quality of information in the data fusion. A higher number of messages used in the data fusion task results in more reliable information. Figures 4 and 5 present activity models of the slave and master algorithms, respectively.
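Equations (2)-(4) can be illustrated with a small computation (the helper names and sample values below are ours):

```python
# Illustrative computation of Ef (eqs. 2-3) and QoF (eq. 4).

def gmla_efficiency(mr_per_round: list[int], sp: float, de: float) -> float:
    """Ef = sum(Mr_i) / EMs, with EMs = Sp * De * N (N = rounds since the last checkpoint)."""
    n = len(mr_per_round)
    ems = sp * de * n          # estimated number of messages sent by the slaves
    return sum(mr_per_round) / ems

def gmla_qof(mr_per_round: list[int]) -> float:
    """QoF = average number of timely messages received per round in the session."""
    return sum(mr_per_round) / len(mr_per_round)

# e.g. 10 rounds, density De = 50 slaves, Sp = 0.2 -> EMs = 100 expected messages
mr = [8, 9, 7, 10, 8, 9, 9, 8, 10, 7]
ef = gmla_efficiency(mr, sp=0.2, de=50)   # fraction of estimated messages received on time
qof = gmla_qof(mr)                        # average timely messages per round
```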

7. GMLA Simulation results

Our approach was evaluated using the TrueTime simulator. We considered a star topology in IEEE 802.15.4, fixed the position of the master node in the center of a 70 × 70 m square, and randomly deployed the slave nodes in this square so that their antennas are able to reach the master node. Therefore, each experiment has a different network topology. Besides this, our fault injection scheme increases the network topology uncertainty.

We used a fault injection scheme to add uncertain behavior to the WSN. In this way, at each round some slave nodes have a probability of failing to communicate. Therefore, the network topology may change at each round, which justifies the use of the GMLA.
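A hedged sketch of such a fault-injection step (the failure probability and helper below are ours, not the chapter's exact scheme):

```python
import random

# Each round, every slave fails to communicate with some probability,
# so the effective topology differs from round to round.

def surviving_slaves(n_slaves: int, p_fail: float, rng: random.Random) -> list[int]:
    """Return the indices of slaves whose radio link works this round."""
    return [i for i in range(n_slaves) if rng.random() >= p_fail]

rng = random.Random(7)
alive_round_1 = surviving_slaves(30, 0.2, rng)
alive_round_2 = surviving_slaves(30, 0.2, rng)  # typically a different subset
```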

The main target of this simulation is to check GMLA performance in a randomly deployed IEEE 802.15.4 WSN. The learning capability of GMLA has also been checked. For this, the simulation time was varied (500, 1000, 1500 and 2000 seconds), and we performed 33 simulations for each simulation time. Tables 3 and 4 present, respectively, the IEEE 802.15.4 and GMLA parameters.

Table 3 - IEEE 802.15.4 parameters.

Data rate | Transmit power | Receiver signal threshold | Pathloss exponent | Ack timeout | Retry limit
250 kbps  | -10 dBm        | -90 dBm                   | 3.5               | 0.864 ms    | 3

Table 4 - GMLA Parameters.

Checkpoint time | Reposition rate | Evolution | Population size | Action tax
10              | 2               | 15        | 16              | 100

The main goal of the GMLA approach is to improve the communication efficiency in a communication environment where the network topology is unknown to the master node (the node that tunes Pe). We can notice in Figure 6 that the communication efficiency remains at the same level when IEEE 802.15.4 is used. However, when GMLA is used, it is possible to notice a gain of almost 10% in communication efficiency.

[Figure: Efficiency (%) versus simulation length in rounds (500, 1000, 1500, 2000) for IEEE 802.15.4 and GMLA.]

Figure 6 - Comparison of GMLA and IEEE 802.15.4.

We collected the maximum communication efficiency in 33 simulations. Our approach always achieved higher levels of communication efficiency as well. We noticed that IEEE 802.15.4 presents a static behavior, and that it cannot learn better communication patterns when topology changes are faced.

Table 5 presents the average and standard deviation over 33 simulations. GMLA presented higher standard deviation values than IEEE 802.15.4. This is due to the learning characteristics of GMLA: it has to test different values of Sp in order to reach higher levels of Ef. However, IEEE 802.15.4 is not able to optimize Ef, so it maintains the same reduced level of communication efficiency. The highest level of Ef was achieved in the 1000-rounds simulation, where GMLA presented 39% efficiency (average). Figure 7 presents the maximum Ef achieved in the 33 simulations. It is possible to notice that GMLA always achieves a higher level of Ef.

Table 5 - Average of Efficiency.

Rounds | GMLA   | IEEE 802.15.4
500    | 37 ± 2 | 34 ± 0.7
1000   | 39 ± 3 | 34 ± 0.5
1500   | 38 ± 2 | 34 ± 0.2
2000   | 37 ± 2 | 34 ± 0.1

[Figure: Maximum Efficiency (%) versus simulation length in rounds (500, 1000, 1500, 2000) for IEEE 802.15.4 and GMLA.]

Figure 7 - Comparison of Maximum Efficiency.


[Figure: Efficiency and QoF (%) versus simulation length in rounds (500, 1000, 1500, 2000) for GMLA.]

Figure 8 - GMLA Efficiency and QoF values.

An analysis of Figure 8 indicates that QoF maintains almost the same level in all simulations. However, the highest level of Ef was achieved in the 1,000-rounds simulation. This can be explained by GMLA's learning behavior, which tries different values of Sp when longer simulation times are run.

[Figure: number of sent (SM) and received (RM) messages for 500, 1000, 1500 and 2000 rounds, for IEEE 802.15.4 and GMLA.]

Figure 9: Sent (SM) and Received Messages (RM).

The relation between Received Messages (RM) and Sent Messages (SM) is shown in Figure 9. It is possible to notice that GMLA always presents a higher Ef than IEEE 802.15.4. Thus, a WSN running the proposed GMLA approach sends fewer messages than an ordinary IEEE 802.15.4 network. We can therefore conclude that the proposed GMLA approach is able to trade off QoF and Ef. Moreover, GMLA expends less energy than IEEE 802.15.4. However, this approach is only suitable for applications where the signal is homogeneous throughout the monitoring area.

8. Variable Offset Algorithm (VOA)

The VOA algorithm (Variable Offset Algorithm) targets the optimization of the communication efficiency in dense WSNs with a star topology. VOA can be easily implemented upon IEEE 802.15.4 devices, as it is a light middleware implemented at the application layer. The main idea of VOA is to improve the communication efficiency through the use of random offsets before slave node transmissions.

A round is triggered by the master through the broadcasting of a checkpoint message. This message synchronizes the beginning of the session among all slaves, and conveys 4 parameters: a) SL: the session length; b) RL: the round length; c) MO: the maximum offset that a slave can use during a session; and d) K: the number of messages that each slave should transmit in a session.

Therefore, a checkpoint imposes a resynchronization of parameters based on the network condition. If the checkpoint message is not received by a slave, the device will wait in listen mode until it receives the next checkpoint message (in the next session).

The maximum offset (MO) is a parameter used to compute a random delay in the range [0, MO) using a uniform distribution. This delay is later used to desynchronize the transmission instants of different slaves.

During a session each slave should transmit K messages. This QoS requirement is defined by the data fusion application executed in the master node. Message transmissions proceed as follows. After the random delay, each slave should transmit one message per round, until K messages are transmitted or the session finishes (whichever occurs first). Message transmissions are always acknowledged. The default behavior of IEEE 802.15.4 is assumed regarding medium access, collisions, retransmissions, timeouts, etc.

In conclusion, a K-out-of-N model is proposed, where slaves have a QoS requirement of sending K messages during a session (i.e., N rounds). This approach minimizes the probability of several slaves trying to transmit a message at the same instant, which reduces the number of collisions and allows the transmission of a higher number of messages when compared with the standard solution (i.e., plain IEEE 802.15.4). Moreover, this model introduces a trade-off between QoS and energy consumption in the WSN. VOA enables the tuning of the number of sent messages, which increases both the network lifetime and the QoS, even in randomly deployed networks.
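A minimal sketch of this K-out-of-N schedule, assuming a uniform offset in [0, MO) and one transmission per round (the helper name is ours):

```python
import random

# A slave waits a uniform random offset in [0, MO), then sends one message
# per round until K messages are sent or the session ends.

def voa_send_times(K: int, RL: float, SL: float, MO: float,
                   rng: random.Random) -> list[float]:
    """Instants (seconds after the checkpoint) at which one slave transmits."""
    offset = rng.uniform(0, MO)   # desynchronizes this slave from the others
    times = []
    t = offset
    while len(times) < K and t < SL:
        times.append(t)
        t += RL                   # one message per round
    return times

rng = random.Random(1)
times = voa_send_times(K=5, RL=0.1, SL=2.9, MO=0.1, rng=rng)
```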

The master performs the data fusion operation considering just the messages that arrived on time. In this case, the master only fuses data that arrived in the previous session. In order to tune the operation of the network, the master computes the QoF and Efficiency metrics at the end of each session. Efficiency is the relation between timely received messages (messages received by the master in the previous session) and the required messages. It is computed as follows:

Ef = (Σ_{i=1}^{N} Mr_i) / EMs    (5)


where Mr_i is the number of messages received from slave i and EMs is an estimation of the number of messages sent by slave nodes (Eq. 6). This metric indicates how many messages are used in the data fusion task:

EMs = K × N    (6)

where K is the QoS requirement and N is the number of slaves. Quality of Fusion is the average number of messages received by the master node during all the sessions, and is evaluated as in equation (4).
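Equations (5)-(6) reduce to a one-line computation (the helper name and sample values below are ours):

```python
# Illustrative VOA efficiency: Ef = sum(Mr_i) / EMs with EMs = K * N,
# where Mr_i counts timely messages from slave i and N is the number of slaves.

def voa_efficiency(mr_per_slave: list[int], K: int) -> float:
    n = len(mr_per_slave)        # number of slaves
    ems = K * n                  # required messages in the session
    return sum(mr_per_slave) / ems

# e.g. 4 slaves, QoS requirement K = 5 -> 20 required messages, 19 arrived on time
ef = voa_efficiency([5, 5, 4, 5], K=5)
```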

In the following section, the VOA algorithm is described in detail using a pseudo-code language.

8.1. VOA Algorithm

The VOA master and slave algorithms are presented below:

% Master Algorithm
% K  : number of transmissions
% MO : maximum offset
% SL : session length
% RL : round length

If new_Session then
    K = number_of_transmissions()
    MO = maximum_offset()
    end_Session = Start_timer(SL)
    Broadcast(SL, RL, K, MO)
Endif

While !end_Session
    msg = Receive_msg()
    Store(msg)
Endwhile

If end_Session then
    Calculate_efficiency()
    Calculate_QoF()
    Data_fusion()
    Stop_timers()
    new_Session = True
Endif

% Slave Algorithm
% nMT : number of messages transmitted
% SOF : slave offset

If new_Session then
    If Receive_checkpoint(SL, RL, K, MO) then
        SOF = Slave_offset(MO)
        start_Transmission = Start_timer(SOF)
        end_Session = Start_timer(SL)
        new_Session = False
        nMT = 0
    Else
        Listen()
    Endif
Endif

If start_Transmission then
    msg = Sense_data()
    Send_data(msg)
    nMT++
    If !end_Session then
        If nMT < K then
            start_Transmission = Start_timer(RL)
        Endif
    Else
        Stop_timers()
        new_Session = True
    Endif
Endif

A final remark about the random number generation: since this task could be computationally and time intensive, we adopt a simple approach for its implementation. Instead of performing a direct calculation using a specific algorithm, we use pre-computed values stored in the slave's ROM.
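The lookup idea can be sketched as follows (the table contents below are hypothetical; a real deployment would store values matched to the offset range):

```python
# Instead of computing pseudo-random offsets on the node, a table of
# pre-computed values is stored in (simulated) ROM and read cyclically.

PRECOMPUTED = [0.013, 0.072, 0.041, 0.095, 0.027, 0.058, 0.084, 0.006]  # hypothetical

class OffsetTable:
    """Cycles through a fixed table of offsets stored in ROM."""
    def __init__(self, table: list[float]) -> None:
        self.table = table
        self.idx = 0

    def next_offset(self) -> float:
        value = self.table[self.idx]
        self.idx = (self.idx + 1) % len(self.table)  # wrap around the table
        return value

rom = OffsetTable(PRECOMPUTED)
first = rom.next_offset()
```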

9. VOA Experimental results

The experimental setup is composed of 30 MicaZ nodes [14], each featuring an Atmel ATmega128L 8-bit microcontroller with 128 kB of in-system programmable memory and an IEEE 802.15.4 radio. However, the deployment area is now a 1.3 × 1.3 m square (Figure 10).

The checkpoint message has 19 bytes, and messages sent by the slaves have 18 bytes. The memory occupied by the VOA code is the following:
o Master algorithm: 572 bytes (RAM) and 15,902 bytes (ROM);
o Slave algorithm: 333 bytes (RAM) and 12,088 bytes (ROM).

Figure 10: Experimental setup.

As in the simulation, two sets of results were obtained: with and without VOA (Table 6).


Table 6: Experimental results.

VOA
K | Received Messages | Required Messages | Ef (%) | QoF
1 | 28,825  | 29,000  | 99.4 | 2.8
2 | 57,617  | 58,000  | 99.3 | 5.7
3 | 86,260  | 87,000  | 99.2 | 8.6
4 | 115,209 | 116,000 | 99.3 | 11.5
5 | 139,979 | 145,000 | 96.5 | 13.9
6 | 156,088 | 174,000 | 89.7 | 15.6
7 | 161,585 | 203,000 | 79.6 | 16.1
8 | 159,421 | 232,000 | 68.7 | 15.9
9 | 169,653 | 261,000 | 65.0 | 16.9

IEEE 802.15.4
K | Received Messages | Required Messages | Ef (%) | QoF
- | 80,127  | 261,000 | 30.7 | 8.0

It is possible to notice that, when K increases, the efficiency decreases. However, VOA obtained good levels of efficiency due to the fact that the round duration is just 0.1 second. Moreover, a decrease in the level of efficiency is only seen when K is higher than 4. When K is in the range 1 to 4, the levels of efficiency achieved were higher than 99%.

The relationship between received and required messages is presented in Figure 11. It is possible to notice a gap between required and received messages when K > 4. The lowest level of efficiency was achieved when K = 9 (65%).

Figure 11: Required and received messages.

The relationship between QoF and Efficiency is presented in Figure 12. When K < 4, the level of Efficiency is almost at its maximum. However, when K is close to the number of microcycles, the level of Efficiency decreases and the QoF increases. This is possibly due to the fact that the wireless medium is very busy.

Figure 12: QoF and Efficiency.

A second experiment was also performed by varying the number of slaves (Figure 13). The goal was to evaluate the influence of the number of nodes on the Ef and QoF metrics. When compared with VOA, IEEE 802.15.4 presents similar results in just one case: a network with 4 slaves. As the number of slaves increases, the difference between VOA and IEEE 802.15.4 becomes larger. The difference in efficiency between VOA and IEEE 802.15.4 when considering 29 slaves is more than 100%. These results show that VOA has a satisfactory performance and maintains a minimum QoS level even with a high number of slaves.

[Figure: Quality of Fusion (QoF) and Efficiency (%) versus number of nodes (4, 9, 14, 19, 24, 29) for VOA and IEEE 802.15.4.]

Figure 13: Variable number of slaves.

10. Final remarks

In this chapter we have shown the challenges of WSN technology. Moreover, we have shown how autonomic computing can support large-scale WSNs. The GMLA approach for WSN data fusion applications was presented as a case study. GMLA achieves higher levels of Ef than IEEE 802.15.4 even when facing random topologies


and communication faults. Thus, the proposed approach seems to be suitable for non-predictable WSNs. Moreover, GMLA presented a trade-off between QoF and Ef, with almost a 13% gain over IEEE 802.15.4 in the 1000-rounds simulation. The VOA algorithm, which enhances the communication efficiency in dense wireless sensor networks, has also been presented. A set of comparisons with bare IEEE 802.15.4 nodes showed an impressive enhancement in terms of communication efficiency and QoS. The VOA algorithm was assessed with the help of an experimental setup based on MicaZ motes. The obtained results showed a clear improvement in the efficiency attained by the proposed algorithm. Moreover, both the VOA and GMLA algorithms can be implemented as a light middleware at the application layer, so no network stack modifications are necessary.

Acknowledgement. The authors acknowledge the support granted by CNPq and FAPESP to the INCT-SEC (National Institute of Science and Technology - Critical Embedded Systems - Brazil), processes 573963/2008-9 and 08/57870-9.

References

[1] Q. Wu, N.S.V. Rao, J. Barhen, S.S. Iyergen, V.K. Vaishnavi, H. Qi, K. Chakrabarty, On Computing Mobile Agent Routes for Data Fusion in Distributed Sensor Networks, IEEE Trans. on Knowledge and Data Engineering, Vol 16, No. 6, 2004.

[2] 802.15.4 Part 15.4: Wireless Medium Access Control (MAC) and Physical Layer (PHY) Specifications for Low-Rate Wireless Personal Area Network (LR- WPAN), IEEE-SA Standards Board 802.15.4, 2006

[3] K. Kang, H.S. Son, J.A. Stankovic, Managing Deadline Miss Ratio and Sensor Data Freshness in Real-Time Databases, IEEE Trans. on Knowledge and Data Engineering, Vol. 16, No. 10, 2004.

[4] K. Morita, K. Watanabe, N. Hayashibara, T. Enokido, M. Takanizawa, Efficient Data Transmission in a Lossy and Resource Limited Wireless Sensor-Actuator Network, Proc. of the ISORC'07, 2007.

[5] R. Niu, P.K. Varshney, Q. Cheng, Distributed Detection in a Large Wireless Sensor Network, Information Fusion, Vol. 7, July 2006.

[6] G. Werner-Allen, J. Johnson, M. Ruiz, J. Lees, M. Welsh, Monitoring Volcanic Eruptions with a wireless Sensor Network, Proc. of the Second European Workshop on Wireless Sensor Networks, 2005, 108-120.

[7] A.A. Somasundara, A. Rammorthy, M.B. Srivastava, Mobile Element Scheduling with Dynamic Deadlines, IEEE Trans. on Mobile Computing, Vol 6, no 4, 2007.

[8] T. Yan, T. He, J.A. Stankovic, Differentiated Surveillance for Sensor Networks, Proc. of the First Int. Conf. on Embedded Networked Sensor Systems, 2003.

[9] I.F. Akyildiz, W. Su, Y. Sankarasubramaniam, E. Cayirci, A Survey on Sensor Networks, IEEE Communications Magazine, 2002, 102-114.

[10] J.A. Stankovic, T.F. Abdelzaher, C. Lu, L. Sha, J.C Hou, Real-Time Communication and Coordination in Embedded Sensor Networks. Proceedings of The IEEE, Vol. 91, No. 7, Jul. 2003, 1002-1022.

[11] A. Koubaa, R. Severino, M. Alves, E. Tovar, Improving Quality-of-Service in Wireless Sensor Networks by mitigating “Hidden-Node Collisions”, IEEE Transactions on Industrial Informatics, Special Issue on Real-Time and Embedded Networked Systems, Volume 5, Number 3, August 2009

[12] J. Zhu, S. Papavassiliou, J. Yang, Adaptative Localized QoS-Constrained Data Aggregation and Processing in Distributed Sensor Networks, IEEE Transactions on Parallel and Distributed Systems, vol 17 no 9, September 2006, 923-933.

[13] E. Felemban, Chang-Gun Lee,E. Ekici,MMSPEED: Multipath Multi-SPEED Protocol for QoS Guarantee of Reliability and Timeliness in Wireless Sensor Networks, IEEE Transactions on Mobile Computing, Vol.5, No. 6, june 2006, 738-754.

[14] MicaZ Mote Datasheets [Online]. Available at: http://www.xbow.com

[15] I.F. Akyildiz, W. Su, Y. Sankarasubramaniam, E. Cayirci, A Survey on Sensor Networks, IEEE Communications Magazine, 2002, 102-114.

[16] A. R. Pinto, C. Montez, Autonomic Approaches for Enhancing Communication QoS in Dense Wireless Sensor Networks with Real Time Requirements, 2010 IEEE Int. Test Conference, 2010. 1-10.

[17] Akyildiz I., Brunetti, F., Blazquez C., Nanonetworks: A new communication paradigm, Computer Networks, no. 52, 2008, 2260–2279.

[18] Huebscher M.C, McCann J.A. A survey of autonomic computing degrees, models, and applications. ACM Comput Surveys 2008;40(3):1-28.

[19] Kim, S., Pakzad, S., Culler, D., Demmel, J., Fenves, G., Glaser, S., Turon, M., Health Monitoring of Civil Infrastructures Using Wireless Sensor.

[20] Yick, J., Mukherjee, B., Ghosal. D., Wireless sensor network survey, Computer Networks, no. 52, 2008, 2292–2330.

[21] Huebscher M.C, McCann J.A. A survey of autonomic computing degrees, models, and applications. ACM Comput Surveys 2008;40(3):1-28.

[22] Kephart , J.O., Chess, D.M. ,The Vision of Autonomic Computing, IEEE Computer, 2003.

[23] Dressler, F., A study of self-organization mechanisms in ad hoc and sensor networks, Computer Communications, no. 31, 2008, 3018–3029.

[24] Miorandi, D., Yamamoto, L., Pellegrini, F., A survey of evolutionary and embryogenic approaches, Computer Networks, no. 54, 2010, 944–959.