salil r akerkar - arizona state university · 2019. 12. 21. · salil r akerkar _____ a thesis...

ANALYSIS AND VISUALIZATION OF TIME-VARYING DATA

USING THE CONCEPT OF ‘ACTIVITY MODELING’

by

Salil R Akerkar

______________________________________

A Thesis Submitted to the Faculty of the DEPARTMENT OF ELECTRICAL AND COMPUTER ENGINEERING

In Partial Fulfillment of the Requirements For the Degree of

MASTER OF SCIENCE

WITH A MAJOR IN ELECTRICAL & COMPUTER ENGINEERING In the Graduate College

THE UNIVERSITY OF ARIZONA

2004

STATEMENT BY AUTHOR

This thesis has been submitted in partial fulfillment of requirements for an advanced degree at The University of Arizona. Brief quotations from this thesis are allowable without special permission, provided that accurate acknowledgement of source is made. Requests for permission for extended quotation form or reproduction of this manuscript in whole or in part may be granted by the head of the major department or the Dean of the Graduate College when in his or her judgment the proposed us of the material is in the interests of scholarship. In all other instances, however, permission must be obtained from the author. SIGNED_____________________

APPROVAL BY THESIS DIRECTORS

This thesis has been approved on the date shown below:

_________________________________________ _____________________

Dr Bernard Zeigler Date

Professor of Electrical and Computer Engineering

ACKNOWLEDGEMENT

I would like to thank my parents for the belief and confidence they had me. Their happiness was a major driving force for me to excel in my educational career. I would like to thank my sister for always setting challenging (and realistic) goals for me throughout my schooling days. I always followed her career path, and I am happy to do so. I would like to thank my advisor, Dr Zeigler for his immense support throughout the course of my Masters. It was a pleasure working under him and I truly have learned a lot under his tutelage. I would like to thank Dr. Hariri and Dr. Nutaro for being in my thesis defense panel. I would like to thank Hans-Berhard Broeker and Cristina Siegerist for their help with my thesis and Glenn Jorden for his wonderful mentoring during my Co-Op tour at AMD. I would like to thank everyone at ACIMS lab for creating a wonderful environment during my research assistantship over here. Thanks to Xiaolin Hu and Alex for providing data for my thesis. Lastly, thanks to all my friends here at Tucson, for making my stay a memorable one. A special thanks to Harish Krishnamoorthy and Harinarayan Venkatramani for being the best roommates ever.

TABLE OF CONTENTS

LIST OF CHARTS AND FIGURES........................................................................... 6 LIST OF TABLES ........................................................................................................ 8 ABSTRACT................................................................................................................... 9

1 INTRODUCTION..................................................................................................... 1 1.1 CURRENT TECHNIQUES IN VISUALIZATION ...................................... 1 1.2 RESEARCH MOTIVATION .......................................................................... 2 1.3 THESIS OUTLINE........................................................................................... 3

2 ACTIVITY MODELING ......................................................................................... 5 2.1 ACTIVITY MODELING: RELATED CONCEPTS .................................... 5 2.2 ACTIVITY: A DEVS CONCEPT ................................................................... 6

3 IMPLEMENTATION .............................................................................................. 9 3.1 AN OVERVIEW OF THE ‘ACTIVITY-MODELING’ SYSTEM.............. 9 3.2 PRE-VISUALIZATION PROCESSING BY PERL MODULES .............. 10

3.2.1 REGULAR STRUCTURED FORMAT ............................................... 11 3.2.2 CORRECTION LOGIC (PRE-VISUALIZATION) ........................... 13 3.2.3 DATA GENERATION USING PERL SCRIPTS................................ 15

4 THE ACTIVITY ENGINE .................................................................................... 16 4.1 THE ACTIVITY ENGINE- AN OVERVIEW............................................. 16 4.2 THE DATA ENGINE..................................................................................... 17 4.3 THE ACTIVITY GENERATOR .................................................................. 17

4.3.1 ACTIVITY FACTOR............................................................................. 18 4.3.2 SIGNED INSTANTANEOUS ACTIVITY........................................... 19

4.4 THE STATISTIC ANALYZER .................................................................... 20 4.4.1 LIVING FACTOR.................................................................................. 21 4.4.2 HISTOGRAM OF TIME ADVANCES................................................ 21

4.5 THE PATTERN PREDICTOR..................................................................... 23 4.5.1 PEAK-BASE METHOD ........................................................................ 24 4.5.2 REGION OF IMMINENCE (ROI)....................................................... 26 4.5.3 TRACE LOCATOR ............................................................................... 33 4.5.4 PEAK LOCATOR .................................................................................. 34 4.5.5 PREDICTING THE PATTERN FOR 1D SPACE.............................. 38

5 VISUALIZATION .................................................................................................. 44 5.1 GNUPLOT....................................................................................................... 44

5.1.1 INTRODUCTION................................................................................... 44 5.1.2 SPLOT FUNCTION ............................................................................... 45 5.1.3 GNUPLOT SCRIPTS............................................................................. 46 5.1.4 GNUPLOT ON WINDOWS .................................................................. 47

5.2 AVS-EXPRESS ............................................................................................... 48 5.2.1 INTRODUCTION................................................................................... 48 5.2.2 READERS ............................................................................................... 49 5.2.3 AVS-MODULES..................................................................................... 51

5.3 VISUALIZATION CONCEPTS ................................................................... 52 5.3.1 VISUALIZATION OF 1D TIME-VARYING DATA ......................... 53 5.3.2 ZERO PADDING AND BINARY VISUALIZATION........................ 55

5.3.3 VISUALIZATION OF 2D TIME-VARYING DATA ......................... 58 6 OBSERVATIONS................................................................................................... 61

6.1 DATA FOR 1D SPACE.................................................................................. 61 6.1.1 TEST DATA GENERATED BY dataGenerator SCRIPT .................. 61 6.1.2 1D DIFFUSION PROCESS FOR N=10, N=100, N =200 .................... 66 6.1.3 ROBOT ACTIVITY – (MODELING TRANSITION OF STATE)... 69

6.2 DATA FOR 2D SPACE.................................................................................. 71 6.2.1 2D DIFFUSION PROCESS ................................................................... 71 6.2.2 FIRE FRONT MODEL.......................................................................... 74

7 AUTOMATION AND OPTIMIZATION............................................................. 77 7.1 AUTOMATION OF THE SYSTEM............................................................. 77 7.2 OPTIMIZATION OF THE SYSTEM .......................................................... 81

8 SUMMARY AND FUTURE WORK .................................................................... 85 8.1 SUMMARY ..................................................................................................... 85 8.2 FUTURE WORK............................................................................................ 87

REFERENCES................................................................................................................ 90

LIST OF CHARTS AND FIGURES

Figure 1: Definition of Activity .......................................................................................... 6 Figure 2: ‘Activity-Modeling’ System............................................................................... 10 Figure 3: Regular structured format for 1D data .............................................................. 12 Figure 4: Operation of 2D formatter ................................................................................. 13 Figure 5: The Activity Engine ........................................................................................... 16 Figure 6: signed Instantaneous Activity ............................................................................ 19 Figure 7: The Statistic Analyzer ....................................................................................... 23 Figure 8: Peak-Base Method to find Accumulated Activity............................................. 25 Figure 9: False-Peak problem ........................................................................................... 26 Figure 10: Region of Imminence ...................................................................................... 27 Figure 11: Scanning IA curve – to find ROI in 1D........................................................... 29 Figure 12: Scanning IA surface – to find ROI in 2D........................................................ 31 Figure 13: Visual Comparison of scanning algorithms for 2D diffusion problem........... 33 Figure 14: Difference between Trace Locator and Peak Locator ..................................... 34 Figure 15: False Peak problem for 1D diffusion problem ............................................... 37 Figure 16: Approximated ROI using Algorithm 4.4......................................................... 37 Figure 17: Concept of order in the Predict Pattern module............................................ 40 Figure 18: Attributes of the Pattern data-structure........................................................... 40 Figure 19: Rotating the vertical angle............................................................................... 47 Figure 20: Rotating the horizontal angle .......................................................................... 47 Figure 21: File Import module for AVS-Express ............................................................. 51 Figure 22: Visualization of 2D spatio-temporal process .................................................. 54 Figure 23: Visualization using multi-plot graphs ............................................................. 55 Figure 24: Binary Visualization for Max-Locator module ............................................... 57 Figure 25: Surface Plot Images for 2D spatial visualization ............................................ 59 Figure 26: Test data 1,2 in value domain.......................................................................... 62 Figure 27: Results from Statistical Analyzer for test-data 1,2.......................................... 63 Figure 28: Results from Pattern Predictor module for test-data1,2 .................................. 64 Figure 29: IA/AA curve for test data-3............................................................................. 64 Figure 30: Results from Pattern Predictor module for test-data3 ..................................... 65 Figure 31: Check for Peak-Locator and Max-Locator module for test data-3 ................. 66 Figure 32: IA curve for 1D diffusion process (N=100, N=200) ....................................... 67 Figure 33: Activity Factor plotted for 1D diffusion (N=100. N=200).............................. 68 Figure 34: ROI for 1D diffusion (N=100, N=200) ........................................................... 68 Figure 35 Histogram of Time-Advances for 1D diffusion (N=10, N=200) ..................... 69 Figure 36: Activity Factor for robot activity..................................................................... 71 Figure 37: Imminent Factor and Living Factor for robot activity .................................... 71 Figure 38: Histogram of Time advances for the 2D diffusion problem ........................... 72 Figure 39: Standard Deviation for the 2D diffusion problem........................................... 72 Figure 40: ROI and Peak Bars for 2D diffusion process (t=2, t=20)................................ 73 Figure 41: Living Factor for Fire-Front data (100 x 100, t=300) ..................................... 75 Figure 42: Sum of IA values for all cells and imminent cells .......................................... 76 Figure 43: Instantaneous Activity Surface for Fire-Front model...................................... 76

Figure 44: Directory structure of system .......................................................................... 78 Figure 45: Command-line arguments for automation....................................................... 79 Figure 46: Automation Process for the ‘Activity Modeler’ system.................................. 80 Figure 47: Percentage Error in the Predicted and the actual ROI for 1D process ............ 89

LIST OF TABLES

Table 1: Computation time and Imminence Factor for scanning algorithms ................... 33 Table 2: Type of First order patterns ................................................................................ 41 Table 3: Type of Second order patterns............................................................................ 41 Table 4: Data types supported by AVS-Express............................................................... 50 Table 5: Visualization of results for 1D space.................................................................. 57 Table 6: Visualization of results for 2D space.................................................................. 58 Table 7: Process Characteristics for 1D/2D diffusion ...................................................... 67 Table 8: Results for Fire-Front model (100 x 100, T =300) ............................................. 74 Table 9: Optimization in time using the integer flag ........................................................ 84 Table 10: DEV v/s DTSS based on activity concept ........................................................ 87 Table 11: Percentage error in predicted and actual ROI for 1D processes. ...................... 88

ABSTRACT

Scientific visualization, which transforms raw data into vivid 2D or 3D images

and movies, has been recognized as an effective way to understand the large-scale

datasets. However, most existing visualization methods do not scale well with growing

data size. At present there are a number of techniques to analyze data belonging to a

particular time-step, but not much research has been made into analysis of data with

respect to correlation between time-steps. An attempt is made in this thesis to develop a

system, extending the concept of Activity in DEVS, based on a simplified theory of

finding the active regions in a cellular space of a spatio-temporal process. The process of

analyzing and visualizing the time-varying data using the system is called Activity

Modeling. Monitoring of activity would aid in analyzing the process with respect to it’s

computationally efficiency, dynamically allocating resources to the then-active regions.

Analysis of data using Activity Modeling gives a different perspective of the process

under consideration and focuses only on the active regions in the cellular domain. An

overall analysis of the process is presented, in the form of images and movies, of various

results computed in the cellular and temporal domain The ‘Activity modeling’ system,

apart from detecting active regions in the time-varying datasets, also computes statistical

results and introduces a concept to predict a possible pattern based on the temporal

correlation extracted from the data analyzed during the present time step.

�

1

1 INTRODUCTION 1.1 CURRENT TECHNIQUES IN VISUALIZATION

Several disciplines, including medicine, computational fluid dynamics, molecular

dynamics and oceanography need to analyze large time-varying data-sets using effective

visualization and data analysis methods. However, visual exploration of the data is a

complicated task because often data produced is large, remote, multi-dimensional and

featureless.

There are organizations like ITR [1] which focus on improving the interactivity

and explorability of large-scale, time-varying data visualization through the study and

development of innovative data reduction methods. ICASE and NASA LaRC [2] have

come up with novel methods to visualize time-varying data and organize frequent

symposiums to tackle the problem of analyzing large time-varying data. There are papers

published [3] to develop an end-to-end low-cost solution for visualizing time-varying

data on a parallel computer located at a remote site, in essence making use of large

storage space and parallelization techniques to increase the processing power.

Current techniques are more focused on scalability issues and interactive

techniques of visualization like volume rendering (ray-tracing algorithms for volume

data) and Iso-surfaces. At present the amount of data from scientific and engineering

simulations is so large that overwhelming percentage of it is never inspected. There are a

few efforts being actually made on doing the analysis of data at the right locations. One

such effort [4] deals with developing a client/server model for visual analysis of iso-

surfaces and cross-sections in time-varying data and to set a control plane based on it.

2

The control plane would guide the user with possible starting locations for inspection of

data.

It is also found that most of the present techniques are focused in visualization of

a single time-step at a time giving more importance to the spatial correlation in the data.

There is some research [5] going on in progress to de-correlate the data into a range of

spatial and temporal levels of details using a concept called as Wavelet-Based Time-

Space Partitioning tree for large-scale time-varying data.

1.2 RESEARCH MOTIVATION

At present there a number of simulations in the scientific and engineering

community to produce time-varying data which needs to be visualized in an efficient

way. Many applications producing data actually are interested in visualizing the change

going on in a particular cell in the temporal domain. It would be possible to find this

change, if we have a minimum of at least two time-steps of data values at any given time.

The concept of Activity was introduced in DEVS [6] to compute the number of

threshold crossings for a cell. It computes the activity of the cell for a pre-defined

quantum by using the correlation found in data, in the temporal domain. If the concept of

Activity is extended to visualization of time-varying data to generate the change in data

values for successive time-steps, it would be possible to visualize the activity of the

process.

Visualizing activity for a process would help in tracing the region, which needs to

be brought in focus for the analysis of large time-varying data-sets. It is also possible for

the system to detect the shift in activity found in the cellular domain, which would be of

3

immense help for resource allocating algorithms.

Presently most of the techniques concentrate on visualizing data in only the

cellular domain (one image frame consists of one time-step). The concept of Activity

Modeling would make it possible to visualize temporal information for an image frame in

a cellular domain.

Based on this concept, we can not only generate the activity based information but

make use of it to find out other interesting results related to the pattern of activity in a

process, the shift of activity found with respect to time and the behavior of the process

with respect to its activity.

1.3 THESIS OUTLINE Chapter 2 introduces the term ‘Activity Modeling’ with respect to various

disciplines in today’s world. It overviews the ‘Activity’ concept in DEVS and finally

describes how ‘Activity Modeling’ would be used in visualization.

Chapter 3 provides an overview of the system (Activity Modeler). It explains in

brief the three stages of implementation generic to all the models. The first stage: Pre-

visualization processing is described in the same chapter.

The concept of ‘The Activity Engine’ is introduced in Chapter 4, which forms the

second stage of our system. Various concepts related to the visualization of the Activity

information are introduced here.

Chapter 5 dives into the Visualization modules implemented in the third (final)

stage of our system. Visualization modules developed in GNUPLOT and AVS-Express,

used to generate images and movies, are explained here.

4

Chapter 6 presents observations found from a variety of time-varying data

obtained from scientific simulations. It introduces a way to analyze the process from a

different perspective (Activity Domain).

Automation and Optimization methods used are described briefly in Chapter 7.

Benchmarking of the modules with respect to CPU time and memory usage is also

covered in this chapter.

Chapter 8 concludes the thesis discussing the future work, present limitations and

suggests enhancements, which could be made to the present Activity Modeler system.

5

2 ACTIVITY MODELING

2.1 ACTIVITY MODELING: RELATED CONCEPTS ‘Activity Modeling’ in essence is a very generic term, which can be applied to a

variety of topics in different disciplines. Before applying the concept of Activity

Modeling in the field of visualization, I would like to discuss a few fields where the

concept has already been deeply rooted.

Most of the early work in modeling activity comes from the field of Artificial

Intelligence [7]. Many uncertainty reasoning models have been actively pursued in AI

and image understanding literature, including Belief networks [9] and Dempster-Shafer

Theory.

There have been few ‘Activity Modeling’ algorithms in the field of Computer

vision for applications such as video surveillance. The basis depends on generating data

using various types of networks and formulating algorithms to recognize events. There

has been an interesting research [10] in Activity Modeling and Recognition in this field

using Shape Theory.

The ‘Interaction Lab’ [11] in University of Southern California is dedicated to

research in modeling human and robot group activity ranging from one-to-one interaction

to small groups to large crowd. Motion capture mechanisms including vision-based and

laser-based methods are used to develop robot-tracking system, which is used to gather

activity data over long periods. These data are used to derive interaction features and

patterns for ‘Activity Modeling’.

‘Activity modeling’ is a term very common in the ‘Work-Flow management

6

systems’ [12] and other Business related topics. However, in this case the term activity is

more related to an ‘organizational unit performing a specific function’.

2.2 ACTIVITY: A DEVS CONCEPT

The DEVS specification introduces a way to model systems as mathematical

objects using DEVS formalisms. It is used for discrete event modeling and simulation

and has considerable advantages over the conventional discrete time (DTSS) approach.

DEVS introduces a concept called Activity [13]. A cell is said to more active than

another cell if its value crosses the quantum greater number to times than the other one.

A mathematical definition of Activity over a continuous segment follows:

Figure 1: Definition of Activity

…(i)

…(ii)

…(iii)

t1 0 ti

m1

mi

mn

q

T

TActivityTyAvgActivit /)( =

||)( 1 ii mmTActivity −=� +

),)...(,)...(,()( 11 nnii tmtmtmTtternActivityPa =

7

…(iv)

…(v)

From the above equations we can conclude the following:

��Activity at any given time step cannot be negative.

��Activity during a given time interval is directly proportional to the number of threshold

crossings in that interval.

A quantum (q) is defined to provide quantization supported by DEVS simulator,

which is used to detect the change in output (event). In other words, quantization is a

method that allows DEVS simulator to track the activity of the cells. From equation (iv),

we find that the quantum size is inversely proportional to the number of threshold

crossings. It has been proved that to achieve computational efficiency, the number of

threshold crossings should be roughly equal to the number of transitions in DEVS.

2.3 VISUALIZATION WITH ACTIVITY MODELING

The concept of ‘Activity Modeling’ introduced here is with reference to

visualization of time-varying data. Two terms related to Activity; Instantaneous Activity

and the Accumulated Activity are introduced here. Both the terms are defined with respect

to a particular cell spanning the range of temporal domain.

The Instantaneous Activity of a cell for a particular time-step is defined as the

absolute difference between values of that cell for successive time steps and is illustrated

in the equation below.

iiiiii ttmmttivativeAverageDer −−= +++ 111 /||),(

qTActivityqTssresholdCroNumberOfTh /)(),( =

8

Instantaneous Activity (t) (IA) = )1()( −− tValuetValue

The Accumulated Activity of a cell for a particular time-step is defined as the sum

of Instantaneous Activities of that cell for that particular time-step as well as for all the

preceding time-steps.

Accumulated Activity (T) (AA) = � −−T

t

tValuetValue |)1()(|

The data is initially said to be present in Value domain. When the data is

represented in the form of Activity related information, it is said to be present in Activity

domain. The data can be processed either for a particular cell at a time (Cellular domain)

or for a particular time-step (Temporal domain).

The Activity concept is based on certain guidelines which all the processes

normally abide to. The most common of such guidelines are:

��Spatial coherency: If a cell is active, there is a high probability that its neighboring

cells are active. Similarly, if a cell is passive, there is a high probability that its

neighboring cells are passive.

��Temporal Coherency: If a cell is active for some time-step, there is a high probability

that it would remain active for the next time-step.

9

3 IMPLEMENTATION

3.1 AN OVERVIEW OF THE ‘ACTIVITY-MODELING’ SYSTEM

The ‘Activity-Modeling’ system can be considered as a black-box, which takes

raw time-varying data, generated from some process, as an input and provides us with a

detailed analysis of the process, with regards its ‘activity’. The results generated are

visualized using the visualization modules, which are part of the system. An effort has

been made to automate the whole process so that the system would run without human

intervention.

The system consists of three stages, which are illustrated in Figure 2. The first

stage, which is described later in this chapter, has a primary purpose of checking the

integrity of the data and converting the raw-data to a format known to its successive

stages. It consists of a formatter written in PERL, which also provides results related to

the structure of the data-file used as input in the next stage.

The second stage comprises the ‘Activity Engine’ generates the activity related

information and analyzes the data-pattern of the process. This stage is implemented with

C++ modules and also provides optimization flags and command-line arguments, which

help in the automation of the process as described in Chapter 7.

The third stage is dedicated to visualization of the results using pre-built

visualization modules in GNUPLOT and AVS-Express.

The system is embedded with benchmarking modules to trace the computation

time for various modules and also has a pre-defined directory structure and a parent

‘Activity_modeler’ script which provides automation of the system via an easy user

10

interface. At present the system supports time-varying data which is 1D or 2D in the

spatial domain.

Figure 2: ‘Activity-Modeling’ System

3.2 PRE-VISUALIZATION PROCESSING BY PERL MODULES The raw-data provided at the input of the system can take various formats and

needs to be processed for using in our system. It may be corrupted during its transfer

from some remote-location. The user may not be provided with any information related

to the data file. Due to the large variety of formats the raw-data can be present in, there

are few limitations present for our PERL formatter.

��Data-files should be present in ASCII format.

Raw- Data

Formatted Data

Results

Stage-1

Results

Activity Data

Stage-2 Stage-3

PERL formatter

ACTIVITY ENGINE

(optional) PERL formatter

GNUPLOT MODULES

GNUPLOT MODULES

AVS- EXPRESS MODULES

11

��The cells should be of a regular structured format, which is described in 3.2.1.

��The comments should be present only before the start of data.

��In case of a 2D space, data for different time steps should be present in separate files

with (file) naming convention of the form ‘<fnamebase>#timeStep.fnameEnd’.

This stage has mainly three functions. Firstly, it provides information regarding

the number of cells and time-steps to the Activity Engine module. Secondly, it formats the

data in desired format (required by the Activity Engine), for 1D and 2D space (cellular).

Third, it provides an error-checking module, which automatically provides correction

logic and corrects the data file incase it has been corrupted.

The PERL modules also perform auxiliary functions like extracting only a

specific part of the data file or generating a transpose of the file.

3.2.1 REGULAR STRUCTURED FORMAT

For the sake of standardization, the data is required to be in a particular format as

described below. This format is called the ‘Regular Structured format’, following the

naming conventions of AVS-Express. It is characterized by the following features:

��Every value corresponds to a particular cell

��Every cell is same in size, is quad-shaped; has eight neighbors (4 on sides + 4 on

diagonals) for 2D and two neighbors for 1D.

��Connectivity information is implicit in the data and should not be explicitly mentioned.

With the above features as a base, there are some significant variations in the way

data is presented for 1D and 2D space.

12

For 1D space, all the data is specified in one single file, with rows representing

the cells and columns representing the time-steps as shown in Figure 3. The format was

designed for providing efficiency in computational speed considering the fact the C++

processes arrays in a row-major order and the activity information is present in time-

steps.

Figure 3: Regular structured format for 1D data

For 2D space, normally a data for each time-step is present in a separate file.

Since the Activity Engine requires data for successive time-steps, the PERL formatter

concatenates the data from every file to a single file with the data present in the Regular

structured format. This format essentially helps the Activity Engine to handle the trade-

off between CPU utilization and memory usage wisely. For larger time-steps it also has

the advantage of decreasing the computation time by eliminating the time spent in IO

operations for files. The disadvantage would be that this method would impose

restrictions to visualization for a large number of time-steps, however this disadvantage is

taken care of by the visualization scripts and post processing of resultant data using

PERL modules. The concatenation and expansion operation for 2D data, illustrated in

Figure 4, is carried out the pre-processing and post-processing modules in PERL.

Time-steps

Cells

Foo.txt

13

Figure 4: Operation of 2D formatter

3.2.2 CORRECTION LOGIC (PRE-VISUALIZATION)

The Correction logic provides an error-checking algorithm described below. It is

built on few assumptions about the structure of the data file and can be changed when

data files are of a different structure than our assumptions.

cells

Time-steps

Foo.txt

rows

cols

Foo0.txt

rows

cols

Foo2.txt

rows

cols

Foo1.txt

PERL formatter2D and transpose module

Concat operation

Expand operation

Time-Step 0 Time-Step 1 Time-Step n

14

By default, we assume that the data file abides by the rule that the rows of our

data files represent the cells and columns the increasing time-steps, as described in 3.2.1.

The correction-logic algorithm is as follows:

Algorithm 3.1: Step1:

Compute the number of lines (nCells) Compute the number of values (nValues) If(nValues%nCells is 0)

Correction logic not required, nTSteps = nValues / nCells Else

Correction logic required, goto Step2 Step2: (Correction Logic)

Generate a 1D diagnostic array: For(cell:= 0-nCells)

diagArr[cell] = number of time-steps found for that cell Generate a frequency table (freq_table(time-steps)) which calculates the number of occurrences of the total number of time-steps. Maximum(freq_table) := maxAt For (each line in data)

If(number of Elements eq maxAt) leave the row unmodified

Else if((number of Elements < maxAt) Replicate the last time-step value for the missing values

Else Truncate the values till maxAt.

The PERL scripts to implement the above logic are formatter.pl (1D) and

formatter2D.pl (2D) and have various command line options to make automation of the

process easier as discussed in Chapter 7. Extensive use of PERL’s pattern-matching,

regular expressions and array processing has been made in the formatter to increase the

computational speed.

15

3.2.3 DATA GENERATION USING PERL SCRIPTS

A PERL script called dataGenerator.pl is used to generate 1D and 2D data files,

which were used as inputs to the Activity Engine during the training stage of the system.

They produce test data generated by mathematical equations and have great flexibility in

terms of the nature of data to be produced.

The script generates data based on,

1. Dimensions of data (1D, 2D. 3D)

2. data-type to output – float OR integer

3. nature of data to output – random OR user-defined function f(x,t) and f(x,y,t)

The significance of this script was to generate known patterns of data and verify if

the Activity Engine produced desirable results during the training stage of development of

our system.

16

4 THE ACTIVITY ENGINE

4.1 THE ACTIVITY ENGINE- AN OVERVIEW

The Activity Engine, written in C++, is the heart of the visualization system. It is

used primarily to generate the activity related information from the data-file, which is

later on visualized using modules in GNUPLOT and AVS-Express. It is also used to

extract statistical information from the data and analyze the activity-pattern for the whole

process.

A functional block diagram of the Activity Engine is shown in Figure 5 with

regards to its interaction with the other modules in the system.

Figure 5: The Activity Engine

Activity Generator

Pattern Predictor

Statistic Analyzer

Activity Time-Services

Activity Log

The Activity Engine

PERL Formatter

--------------------

GNUPLOT scripts

AVS-Express modules

--------------------

--------------------

--------------------

--------------------

Data-file Pattern Information

Statistical Information

Activity Data

Data Engine

17

The Activity Engine was first conceived with the purpose of generating

Instantaneous Activity and Accumulated Activity. Later on two additional modules were

incorporated for a thorough analysis of the data in the Activity Domain.

As seen in Figure 5, the Data Engine module forms the input interface to the

Activity Engine. It passes the extracted data to the Activity Generator module, which

switches the mode of operation from the Value domain to the Activity domain. The

Statistic Analyzer and Pattern Predictor modules, then, act in the Activity domain to

produce a number of results in the Cellular and temporal domain.

4.2 THE DATA ENGINE

The Data Engine forms the interface to the Activity Engine and deals with file

handling, memory allocation, de-allocation and data-processing for the other modules in

the Activity Engine. Input data is read by this module either sequentially or via random

access logic (reading a particular row out-of-sequence). Apart from acting as a reader it

also performs the following functions:

��Generating transpose to switch between domains (Temporal and Cellular).

��Transformation between 2D and 1D arrays for analysis of 2D cellular data.

��Tracking the size of files in bytes to keep track of memory usage.

4.3 THE ACTIVITY GENERATOR

Activity Generator is the primary module in the system, used to switch the mode

of operation from Value domain to Activity domain. In Activity domain it is used to

18

calculate the Instantaneous Activity and the Accumulated Activity of the cells. Based on

the Instantaneous Activities in the process, it is also used to calculate the Time Advances

for a cell by finding the reciprocal of the Instantaneous Activities. Apart from that it is

also used to find out the Activity Factor.

4.3.1 ACTIVITY FACTOR

Activity Factor is defined as the fraction of total time-steps the Instantaneous

Activity (IA) remains greater than a particular threshold set by the user. The threshold is

generally a fraction of the maximum IA values of the process. Activity Factor is used to

find the process behavior in the Cellular domain. In other words, it denotes where the

activity is concentrated in the Cellular domain.

Activity Factor (AF) = tepsTotalTimeSthresholdtIAnTimeSteps ))(( >

In the above equation, the threshold value can also be used to eliminate

insignificant amounts of activity values in the process generated due to errors in

simulations, which generate the data. A meaningful threshold value, normally set

oblivious to the user, is the global average of Instantaneous Activity taken over both the

Temporal and Cellular domain.

Threshold (AF) = NT

txIAt x�� ),(

19

4.3.2 SIGNED INSTANTANEOUS ACTIVITY

The concept of signed Instantaneous Activity (sIA) is introduced here, which

violates the definition of Activity in some extent.

signed Instantaneous Activity (sIA) = ][][ tValuettValue −∆+

As seen above the sign is preserved. Signed Instantaneous Activity is used to

generate find information related to Peaks in the temporal domain using a simple

algorithm illustrated below.

Figure 6: signed Instantaneous Activity

Algorithm 4.1: if( (signed_activity[i-1] >=0) && (signed_activity[i] < 0) )

peaks++; // Increase the peak count else if( (signed_activity[i-1] <=0) && (signed_activity[i] > 0) )

bases++; // Increase the base count

The concept of signed Instantaneous Activity is used to simplify the problem of

finding the peaks. As seen from the figure it is evident that whenever the activity curve

Values

Signed Instantaneous Activity

Instantaneous Activity

Time

20

cuts the time axis, depending on the direction in which the curve cuts the axis, there is a

corresponding peak or base in the output values.

4.4 THE STATISTIC ANALYZER

The Statistic Analyzer module, as the name suggests is used to extract various

statistics from the data-file. This module is expandable and at present supports three

specific groups of statistics namely:

1. Maximum, Minimum, Range

2. Average, Standard deviation, Mean, Geometric mean, harmonic.

3. Living Factor

4. Histogram of Time Advances

A flag present in the command-line options of the Activity Engine, called the fast

flag, is used to increase the computational speed. One way to achieve the increase in

speed is by disabling the group of statistics present in the Statistic Analyzer module,

which tend to be compute-intensive for floating-point values.

The Statistic Analyzer acts in the Activity domain processing values for each cell

at a time (Cellular domain) or each time-step at a time (Temporal domain), depending on

which group of information it is generating. Statistics Analyzer is also used to provide the

visualization system with a set of known values, like the global and local average of IA

curves, which are provided as inputs to other modules in the system. For example the

Living Factor is normally provided by a local average of IA values while the Activity

21

Factor is provided by a global average of activity values.

4.4.1 LIVING FACTOR

Living Factor is defined as the fraction of total cells, the Instantaneous Activity

(IA) remains greater than a particular threshold for a particular time-step. The threshold is

generally a fraction of the maximum IA values of the process. Living Factor is similar to

Activity factor, only that the two act in different domains.

LivingFactor (LF) = TotalCells

thresholdtIAnCells ))(( >

The Living factor is used in the temporal domain and the gives results for a

particular time-step. In simple words, as the name suggests, it signifies the fraction of

cells living for a particular time-step.

In the above equation, a threshold value has been introduced to eliminate the

effect of possible noise in the data (insignificant activities). The process can be termed as

flattening the IA curve, considering that applying threshold would remove (flatten) the

insignificant maxima and minima in IA curve. A meaningful threshold would be the local

average of IA values taken over the cellular domain.

4.4.2 HISTOGRAM OF TIME ADVANCES

The Statistical Analyzer module also computes the histogram of time advances

(Tavd) for the process to give an overall picture of the speed of the process, with respect

to the time taken to produce the next event (Tadv). The user can give the number of

22

histogram steps at the input and the module will compute the distribution of cells over the

temporal domain. Since various simulations can take a very large range of time advances

the histogram is plotted over the logarithmic scale. Time advances are calculated from the

IA curve as they are nothing but the reciprocal of instantaneous activities. Algorithm 4.1

computes the histogram as shown below

Algorithm 4.2: (Histogram for tadv) For(cell:=0; cell<Cells; cell++) Tadv[cell] = 1/IA[cell] Initialize an array for Histogram (hist[STEP_SIZE]) to all zeros MaxLog := log10(Max(Tadv)/Min(Tadv)) MinLog := log10(Min(Tadv)/Min(Tadv)) = 1 If(Tadv[cell] == 0) hist[STEP_SIZE-1]++ Else hist[int (log10 (Tadv[cell]/Min) ) ]++

The formula in the above algorithm enables us to visualize a very high range of time

advances of the order of ( ) 910)( ≈tadvMintadvMax

A visual representation of the working of the Statistic Analyzer is shown in

Figure 7, with the help of the domain concept.

23

Figure 7: The Statistic Analyzer

The results generated for each group are present in a particular format, which are

fed along with other related information to the visualization scripts to visualize the results

in the form of images and movies.

4.5 THE PATTERN PREDICTOR

The Pattern Predictor module works only in the temporal domain and is most

intelligent of all the modules in the system. It analyzes the activity of the process using

the phenomenon of spatial coherence and makes and attempt to predict futuristic behavior

of the process depending on the phenomenon of temporal coherence.

Pattern predictor introduces two new concepts called as trace locator and Region

of Imminence (ROI), which are used to monitor the activity for the present time-step and

Value Domain Activity Domain

Activity Generator

Statistic Analyzer

Cellular Domain

Temporal Domain

--------------------

Data-file

--------------------

Data-file

--------------------

Group3

--------------------

Group1

--------------------

Group2

24

give predictive results regarding, what the activity at the next the time-step, would be in

the future. It also introduces a novel method to find out the Activity related information

using peaks, applied in the Value domain. The Pattern predictor module in essence is not

meant to generate data for visualization, but to predict futuristic behavior of the process

and analyze the present behavior in the temporal domain.

4.5.1 PEAK-BASE METHOD

The conventional method to find Accumulated Activity is to add the

Instantaneous Activity for successive time-steps as show below:

Accumulated Activity (T) = || 11

−−� i

T

i mm

As seen clearly, the operation would require 2T operations. Since accumulated

activity is normally required only for the final time-step, there can be a more efficient

way to find out the accumulated activity if we are presented with some additional

information related to the Instantaneous Activity (IA) curve. This additional information

should be provided in form of peaks and bases present in the IA curve.

As seen in Figure 8, a significant savings in computations can be achieved by the

presence of the Peak-Base information of the process.

25

Figure 8: Peak-Base Method to find Accumulated Activity

In Figure 8, we see that there are three pairs of Peak and Base in the

Instantaneous Activity curve. The Accumulated Activity can be calculated by the adding

the difference in the successive Peaks and Bases as shown by the formula below

Accumulated Activity (T) = |)||(| 1 iiii i BasePeakBasePeak −+− +�

The case for boundary condition is not shown in the above formula. For example

the formula denoted the computations going on for the intermediate Peaks and Bases,

while it has no idea whether the IA curve starts or ends in a Peak or Base.

Another constraint to the above formula is that the Peaks and Bases should always

alternate (NPeaks = NBases Or NBases +/- 1). The presence of false peaks would cause

an error as seen in Figure 9, since there is expected to be a Base in between Peak#2 and

Time Steps

Instantaneous Activity

T = 35

Peak#1 Peak#2

Peak#3

Base#3

Base#2

Base#1

Peak-Base Method

Conventional Method Computations = 2T = 70

Computations = 2 x 3 = 6

26

Peak#3.

Figure 9: False-Peak problem

The false peak problem can be eliminated by slightly modifying the algorithm to

find the Peaks. However, in the actual implementation of the Peak-Base formula, both the

problems mentioned above are taken care off by using a single array consisting of the

Peak-Base Values, which takes care of the boundary conditions as well as the false Peak

problem.

One thing to note is that the information provided by Peaks and Bases causes some

overhead in terms of memory usage. In processes where the IA curve is not characterized

by frequent fluctuations, the Peak-Base method would not be recommended to calculate

the Accumulated Activity. However, if the fluctuations in the IA curve are known to be

caused by noise, the concept of threshold can be used to flatten the IA curve as described

above.

4.5.2 REGION OF IMMINENCE (ROI)

Region of Imminence is the region of cells defined in the temporal domain, which

P#1 P#1

P#2

P#3 P#2

B#1 B#2

B#1 B#2

False Peak

27

is predicted to be active for the following time-step, considering normal process behavior.

In other words, ROI consists of the cell numbers whose Instantaneous Activity is

predicted to be greater than the threshold set for the following time-step.

ROI is mainly used to analyze process behavior and requires information

regarding the Peak locations and Values.

Figure 10 illustrates the concept of ROI and how it is related to the Peak

information of an IA curve in the temporal domain. Any process that attains a state of

equilibrium is defined to be normal. For such a process, the Instantaneous Activity should

approach zero as the process ends. This would mean that the ROI should comprise of all

the cells at t =T for even a single peak present.

Since the ROI information is extracted from data in the Cellular domain and

Activity domain, it makes use of both; spatial and temporal coherency described in

Chapter 2.

Figure 10: Region of Imminence

t = 0

t = n

t = T

28

The algorithm to find the ROI for a 2D process would be:

Algorithm 4.3: (ROI) Compute Peak Values and locations as described in Equations 4.1from IA curve For(peak := 1 – Number of Peaks)

Threshold = (e*PeakValue[peak]) leftNeighbor = rightNeighbor = PeakLocation[peak] ..For 1D left = right = Peaks2D[peak].y, up = down = Peaks2D[peak].x ..For 2D

Compute the neighboring locations based on the threshold conditions and the boundary conditions described in scanning algorithm4.4 For(cell := leftNeighbor - rightNeighbor) ..For 1D ROI[cell] = 1 For (row:=left - right) ..For 2D For (col := up - down) If(IA[row][col] > Threshold) ROI2D[row][col] = 1

The above algorithm requires a way to find the Peaks and to scan the IA curve

(1D) or the IA surface (2D) to find out the ROI. Finding the peaks is described below in

4.5.4, while scanning the IA curve (and IA surface) would be described now.

For 1D space, finding the ROI from the IA curve requires scanning the curve only

in 1 dimension since every cell has only 2 neighbors (and the ones at corners have 1). The

procedure of scanning the IA curve is described using the Figure 11, which uses the

following algorithm. Since every peak in the IA surface needs to be scanned for

imminent regions, this algorithm is compute-intensive and is more optimized for CPU

time then memory.

Algorithm 4.3 (a) For 1D: Scan to the left of Peak and compute leftNeighbor based on

Boundary condition (leftNeighbor > 0) and Threshold condition (IA[leftNeighbor] >= Threshold)

29

&& (IA[leftNeighbor-1] < Threshold) Scan to the right of Peak and compute rightNeighbor based on

Boundary condition (rightNeighbor < nCells-1) and threshold condition (IA[rightNeighbor] >= Threshold)

&& (IA[rightNeighbor-1] < Threshold.

Figure 11: Scanning IA curve – to find ROI in 1D

For 2D space, scanning the ROI surface uses a similar approach, however, since

the space is 2D, the number of neighbors present are 4 (if you consider only the edges)

and additional 4 (considering the neighbors diagonally at the corners). Also the boundary

conditions need to be taken care of at the four edges of the bounding box defining the 2D

space. The algorithm used to find the ROI is on the similar lines to that for 1D:

Algorithm 4.3 (b) For2D: While(IA[row][left] > threshold) and (left>0)

Decrement left While(IA[row][right] > threshold) and (right<Col)

Increment right While(IA[up][col] > threshold) and (up>0)

Decrement up While(IA[down][col] > threshold) and (down<Row-1)

Cells Threshold condition

Boundary condition

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33

Peak Under consideration: 2 Location of cell: 10 Initial Values: Left-neighbor = rightneighbor = 10 Final Values: Left-neighbor = 7 Right-neighbor = 12

30

Increment down While( (IA[row-upleft][col-upleft] > threshold) and

((row-upleft)>0) and ((col-upleft)>0)) ROI2D[row-upleft][col-upleft] = 1; upleft++;

While((Values2D[row+downleft][col-downleft] > threshold) and ((col-downleft)>0) and ((row+downleft)<Rows-1))

ROI2D[row+downleft][col-downleft] = 1; downleft++; While((Values2D[row-upright][col+upright] > threshold) and

((row-upright)>0) and ((col+upright)<Cols-1)) ROI2D[row-upright][col+upright] = 1;upright++; While((Values2D[row+downright][col+downright] > threshold)

and ((row+downright)<rows-1) and ((col+downright)<Cols-1)) ROI2D[row+downright][col+downright] = 1;downright++;

Figure 12 shows the scanning algorithm for IA surface. The cells colored in black

are the Peak values at that particular time-step. The cells colored in dark-grey are the

cells covered in the ROI and those colored light-grey are the cells not traced by the ROI

scanning algorithm, however theoretically speaking they are in the ROI. Lastly, the

uncolored cells are the also the cells not belonging to the ROI, however the should not be

mistaken to be inactive.

31

Figure 12: Scanning IA surface – to find ROI in 2D

The Step 4 in ROI algorithm can be modified to encompass the cells colored in

light-grey too, however that would be at the cost of the simplicity of the present

algorithm.

Finding the ROI is normally a compute-intensive process in case a significant

number of peaks present. There are some modifications made to the above scanning

algorithm for a 2D spatial data to speed up the algorithm depending on few assumptions.

There are in all four levels of scanning, based on how the algorithm is implemented.

Based on those algorithms there are four levels of tuning in ROI:

i. Coarse tuning: Only the horizontal and vertical neighbors are scanned.

ii. Normal tuning: In addition to the horizontal and vertical even the diagonal neighbors

are scanned. Described in Figure 12.

2

1 2 3 1

1

2 1 1 1 2

1 1

2

2 1 1 1

1 1 1 1 1

3

2

3

2 2 3

4 3

2 2 1 1

1 2 1 1

2 1

2 1 3 1

2

Cells not in ROI

ROI

Cells not covered by ROI

Peaks

32

iii. Fine tuning: After normal tuning, the region with in the horizontal and vertical limits is

scanned to cover the light-grey cells shown in Figure 12. This algorithm depends on

the assumption that most of the cells not covered in normal tuning are within the limits

of the horizontal and vertical ROI limits.

iv. Finest tuning: This is the most comprehensive algorithm of all and depends on a

recursive method to consider every imminent neighbor as a peak. However, this turns

out to be highly inefficient for data with large number of cells and produces results

slightly better than the previous algorithm.

A visual comparison of the first three scanning algorithms for a classical 2D

diffusion problem (time step -15) is given for in Figure 13 as well as a comparison with

respect to the Imminent Factor and computation speed is given in Table 1.

33

Figure 13: Visual Comparison of scanning algorithms for 2D diffusion problem

Scanning Algorithm

Computation Time (ms)

Imminence Factor (t=5)



Coarse 5393 0.070900 0.143200 0.097400 Normal 6224 0.098500 0.189600 0.139500 Fine 6456 0.150500 0.254700 0.332700

Table 1: Computation time and Imminence Factor for scanning algorithms

ROI not only serves as a predictive characteristic of the process but also provides

information regarding the region where bulk of Activity is going on for a particular time-

step. This information would be of use in resource allocation algorithms where the

system needs to find the active regions for a particular time-step.

4.5.3 TRACE LOCATOR

Trace Locator is an algorithm used to trace the locations of the cells having the

same values over the period of time. In many applications it may be useful to trace the

locations of the maximum value over a period of time-step, to see where the activity of a

process is concentrated.

We can get the similar information monitoring the Peak Values. However, it may

be possible that the peak values do not represent the actual concentration of activity when

there is a significant difference in the Peak Values for a particular time-step. Figure 14

shows an example when Trace Locator for maximum values can be more useful to

represent the activity of the process (as compared to a Peak locator)

34

Figure 14: Difference between Trace Locator and Peak Locator

As illustrated in Figure 14, if the values of peaks are differing significantly or if

the peaks with smaller values are appearing significantly more times, spread out over the

whole range of space, it would be more sensible to track only the values greater than a

particular threshold set in the Trace Locator modules (for e.g. 0.9 x maximum values).

It is mainly used to find the most active regions during a particular time-step. It

can be used for a number of applications, which require knowledge regarding the shift of

activity and to characterize the whole process on the basis of activity.

4.5.4 PEAK LOCATOR

The Pattern Predictor module computes information required for various modules

like ROI, based on the peaks found in the IA curve (or surface). It is one of most basic

functionality provided by the Pattern Predictor module. The Peak Locator as the name

suggests stores the values and locations of the peaks in dynamically formed arrays. This

requires two runs through the values, one for finding the number of peaks, and the other

to store the Peak Values and locations. For 1D space, the algorithm to find peaks should

make two comparisons since every cell has 2 neighbors, while in 2D the algorithm should

Cells

Peaks

Maximum Value

35

make eight comparisons since every cell has 8 neighbors. The cells present at the

boundaries should be treated differently with lesser comparisons being made as shown in

the equations below.

Equations 4.1: For 1D peak is present if i. [ ] [ ] [ ] [ ])1&(&)1( +≥−> cellValcellValcellValcellVal … for internal cells

ii. [ ] [ ]10 ValVal > … boundary condition 1 iii. [ ] [ ]11 −>− CellsValCellsVal … boundary condition 2 For 2D peak is present if

i.

[ ][ ] [ ][ ]( ) [ ][ ] [ ][ ]( )[ ][ ] [ ][ ]( ) [ ][ ] [ ][ ]( )[ ][ ] [ ][ ]( ) [ ][ ] [ ][ ]( )[ ][ ] [ ][ ]( ) [ ][ ] [ ][ ]( )1122&&1122

&&22&&1122&&1122&&122

&&122&&122

++>−+>+≥+−>

−−>−>−>+≥

colrowDValcolrowDValcolrowDValcolrowDVal




… for internal cells ii. [ ][ ] [ ][ ]( ) [ ][ ] [ ][ ]( ) [ ][ ] [ ][ ]( )112002&&012002&&102002 DValDValDValDValDValDVal >>>

… boundary conditions (corner)

iii.[ ][ ] [ ][ ]( ) [ ][ ] [ ][ ]( )[ ][ ] [ ][ ]( ) [ ][ ] [ ][ ]( )[ ][ ] [ ][ ]( )11202

&&1202&&11202&&10202&&10202

+>>−>

+>−>

colDValcolDVal

colDValcolDValcolDValcolDVal

colDValcolDValcolDValcolDVal

… boundary condition (edge)

The Peak Locator algorithm takes care of the false peak problem mentioned

above. One common problem faced in tracing peaks is detection of a region of cells,

which all have some maximum values and are located adjacent to each other. In such case

a center cell will have all its eight neighbors with the same values and the equalities in

the above equation will not hold true. In this case, we ignore the peak from the peak

locator module, however it is found out by the ROI module during another run of

comparison through the cells. The false peak problem is illustrated in Figure 16 below for

a 1D heat diffusion problem. Since the IA curve for our 1D diffusion problem, was found

36

to be characterized with a very small slope in the cellular domain, it was hard to detect

the true peaks values using Equation 4.1 (i-a)

[ ] [ ] [ ] [ ])1&(&)1( +≥−> cellValcellValcellValcellVal …Equation 4.1 (i)-a

The above equation detected peaks, biased to the cells with larger locations as shown in

Figure 15 (a). If we change the above equation with respect to its inequalities as

[ ] [ ]( ) [ ] [ ]( )1&&1 +>−≥ cellValcellValcellValcellVal …Equation 4.1 (i)-b

, we find that it detects false peaks biased to the cells with smaller locations as shown in

Figure 15 (b). If the above equation is changed to

[ ] [ ]( ) [ ] [ ]( )1&&1 +>−> cellValcellValcellValcellVal …Equation 4.1 (i)-c

, we find that detection algorithm becomes very conservative and some of the true peaks

are also missed out as seen in Figure 15 (c), while equation

[ ] [ ]( ) [ ] [ ]( )1&&1 +≥−≥ cellValcellValcellValcellVal …Equation 4.1 (i)-d

, detects the false peaks for the full range of cellular domain as seen in Figure 15 (d).

(a) (b)

37

(c) (d)

Figure 15: False Peak problem for 1D diffusion problem

The following algorithm takes care of the problem illustrated above and gives a true

representation of the ROI as seen in the Figure 16

Algorithm 4.4 Step 1: Find the ROI-a using Algorithm 4.3 and Equation 4.1 (i)-a Step 2: Find the ROI-b using Algorithm 4.3 and Equation 4.1 (i)-b Step 3: ROI: = ROI-a && ROI-b (over the cellular domain)

Figure 16: Approximated ROI using Algorithm 4.4

38

4.5.5 PREDICTING THE PATTERN FOR 1D SPACE

The Pattern Predictor module also has the logic to predict the activity pattern of

the process under consideration for a future time-step on the basis of the activity pattern

at the present time-step. The theory of operation is based on extracting possible patterns

in activity from two successive time-steps at an earlier time and cross-checking the

activity pattern with an activity pattern in the future.

The activity pattern fed to the input of this module is generated by the ROI

module with an epsilon (e) value in the range [0.9-0.95]. The reason why the e is set at

the higher end is to discard the unwanted peaks which tend to predict false patterns. Since

the data present in ROI module is in the form of 0s and 1s, it is preprocessed by a Linear

Span module which detects the region of 1s in the data and saves them in the form of

(start Cell, length of region) as illustrated in algorithm 4.4.

Algorithm 4.5: (Linear Span) startCell = 0, cell = 0, Intialize Lspan[cell] While(cell < nCells) If (ROI [cell]) startCell := cell while(ROI[cell] and (cell < nCells) ) Increment cell

Lspan[startCell]++ Else Increment cell

During the first step of the algorithm, the output of the Linear Span modules for

t=t0 and t=t1 are processed using the pattern predictor algorithm and the detected

patterns are saved in a data structure. In the second step of the algorithm, a possible

pattern is extracted for the time t=tn (for n- [t2-T]). In the third step of the algorithm a

39

sanity check is made by comparing the active region predicted by the module and the

actual active region computed by the ROI at t=t2. If the actual ROI is significantly

different than the predicted ROI, it is re-computed for the next sample (t=t1 and t=t2).

The process of predicting the temporal pattern can be illustrated with the algorithm.

Algorithm 4.6: (Predict Pattern) Step1: ta = t1, tb = t2, set ROI (e=0.9)

Compute ROI(ta) and ROI(tb) as illustrated in algorithm 4.2 Compute LinearSpan(ta) and LinearSpan(tb) as shown in algorithm 4.4

Step2: Start the prediction process by detecting patterns from LinearSpan(ta) and LinearSpan(tb) to produce predictedROI(t) as shown in equation 4.2.

Step3: Compare the ROI(tb+1) and the predictedROI(tb+1). If(results are satisfactory)

End Else

ta = ta+1, tb = tb+1, check boundary conditions and goto Step1

The prediction process consists of two steps. First step consists of scanning for

predefined pattern present in the repository of patterns and the second step consists of

computing the ROI for the later time-steps. The pattern found is stored in a data structure

characterized by the attributes such as its direction, offset in location and type of pattern

(increasing or decreasing). First order patterns are defined to be the ones in which either

of the neighboring cells at t=tb posses a non-zero value and may contribute to a pre-

defined pattern. Second order patterns are defined to be those which have all its

neighboring cells with zero values but has some high non-zero value at t=tb which may

contribute to a pre-defined pattern from our repository. The concept of the order in

patterns in illustrated in Figure 17 below.

40

Figure 17: Concept of order in the Predict Pattern module

Based on the order, direction and type of patterns there are 5 first order patterns

and 2 second order patterns. The attributes of a pattern are explained with the help of the

examples in Figure 18.

Figure 18: Attributes of the Pattern data-structure

Table 2 and Table 3 provide a detailed description of the pre-defined possible

pattern in the repository and its classification on the basis of order, type and direction of

n 0

0 n

Offset = +1

n 0

n 0

Offset = 0

0 n

n 0

Offset = -1

t = ta t = tb

n 0

0 m

diff = m - n

t = ta t = tb

1 1 1 0 1 0 0 00 0 1 1 0 1 0 0

3 0 0 0 1 0 0 00 0 2 0 0 1 0 0

t = ta t = tb

t = ta t = tb

ROI(ta) and ROI(tb)

Lspan(ta) and Lspan(tb)

First order pattern Second order pattern

41

the pattern. Note that the data structure in which the pattern is stored also has information

about the location (x0) and the activity value at the cell (m0), however since these values

are independent of the type of pattern detected they are not covered in the tables below.

Type

Example Constraint Direction (dir)

Offset (off) Difference (d)

1

no constraints Down 0 0

2

no constraints Left -1 0

3

|m - n| <= 2 Right 1 d = n – m

4

|m-n| <= 2 Right 0 d = m – n

5

|m-n| <= 2 Right-Left -1 d = m - n

Table 2: Type of First order patterns

Type

Example Constraint Direction (dir)

Offset (off) Difference (d)

1

|m - n| <= 2 Right L(n) – L(m) n – m

2

|m - n| <= 2 Left L(n) – L(m) n- m

Table 3: Type of Second order patterns

0 0 m n 0 0

m 0

m

m m

n n

n n

m 0 0 0 0 n

42

The second step in the module consists of generating the ROI for a time-step t=tn;

it can be generalized for all the patterns depending on attributes such as difference and

direction. The algorithm 4.6 describes the process of generating the ROI(t = tn).

Algorithm 4.6: If(difference eq 0) If(direction is down or right) cell = x0 + off*(tn-t0) while( (cell < Cells) and (cell < x0+off*(tn-t0)+m0) ) ROI[tn][cell] = 1, Increment cell Else If(direction is left) cell = x0 + off*(tn-t0) + m0 while( (cell >= 0) and (cell > x0 + off*(tn-t0)) ROI[tn][cell] = 1, Decrement cell Else If(difference > 0) If(direction is down or right) cell = x0 + off*(tn-t0) while( (cell < Cells) and (cell < x0+off*(tn-t0)+d*(tn-t0)) ) ROI[tn][cell] = 1, Increment cell Else If(direction is left) cell = x0 + off*(tn-t0) + d*(tn-t0) while( (cell >= 0) and (cell > x0 + off*(tn-t0)) ROI[tn][cell] = 1, Decrement cell Else If(direction is undefined) cell = x0 while( (cell >= 0) and (cell > x0 + off*(tn-t0)) ROI[tn][cell] = 1, Decrement cell cell = x0 while( (cell < Cells) and (cell < x0 + off*(tn-t0) + d*(tn-t0)) ROI[tn][cell] = 1, Increment cell Else //If(difference < 0) If(direction is down or right) cell = x0 + off*(tn-t0) while( (cell < Cells) and (cell < x0+off*(tn-t0)+d*(tn-t0) + m0) ) ROI[tn][cell] = 1, Increment cell Else If(direction is left) cell = x0 + off*(tn-t0) + d*(tn-t0) + m0 while( (cell >= 0) and (cell > x0 + off*(tn-t0)) ROI[tn][cell] = 1, Decrement cell Else If(direction is undefined) cell = x0 + off*(tn-t0) + m0 + d*(tn-t0) while( (cell >= 0) and (cell > x0 + off*(tn-t0)) ROI[tn][cell] = 1, Decrement cell

43

It is found that for processes whose ROI is characterized by a non-linear curve,

the predictor module doesn’t work well. However, it gives significantly close results for

processes with more or less linear ROI curves.

44

5 VISUALIZATION

Two visualization softwares- GNUPLOT and AVS-Express were used for

visualization purpose. Both have their advantages and reasons why they are used during

that particular stage. A brief description follows regarding the two softwares and how

they were used to visualize the data files.

5.1 GNUPLOT

5.1.1 INTRODUCTION

GNUPLOT is a portable command-line driven interactive data-file (text or binary)

and function plotting utility for UNIX, IBM OS/2, MS Windows, DOS, Apple

Macintosh, VMS, Atari and many other platforms. The software is copyrighted but freely

distributed.

It was originally intended as graphical program which would allow scientists and

students to visualize mathematical functions and data. In 2D, it can draw line, point, dot,

box, histogram graphs or vector fields. In 3D, it supports line, point and dot surfaces,

with or without hidden line removal. It supports color or grayscale surfaces and maps.

The GNUPLOT version 3.7 for 32-bit Windows was used for visualization.

GNUPLOT was selected for the preliminary and intermediate visualization of the data

files due to the following reasons:

��Simplicity of use and portable over variety of platforms

��Presence of interactive mode and batch mode to provide flexibility of usage

45

��GNUPLOT scripting which allows automation of the system and customization for a

particular data-file.

��GNUPLOT v3.7 onwards has an additional flag to the ‘splot’ function called the

‘matrix’ flag, which is used to visualize data represented in a 2D format as a matrix.

Incidentally, this is the same format used as input to the Activity Engine module.

��Variety of features can be enabled and disabled by a simple ‘set’ commands.

��Lastly, the visualized images can be saved in a variety of different formats.

5.1.2 SPLOT FUNCTION

The splot (surface plot) function is used for the preliminary visualization of data-

files generated by the Activity Engine. splot can display a surface as a collection of

points, or by connecting those points. The points may be read from a data file (as in our

case) or result from evaluation of a function at specified intervals. The surface may be

approximated by connecting the points with straight line segments, in which case the

surface can be made opaque with set hidden3d. The orientation from which the 3d

surface is viewed can be changed with set view.

Additionally, for points in a grid format, splot can interpolate points having

common amplitude and can then connect those new points to display contour lines, either

directly with straight-line segments or smoothed lines. Functions are already evaluated in

a grid format, determined by set isosamples and set samples, while file data must either

be in a grid format, as described in data-file, or be used to generate a grid. Contour lines

may be displayed either on the surface or projected onto the base. The base projections of

the contour lines may be written to a file, and then read with plot, to take advantage of

46

plot's additional formatting capabilities.

The matrix flag in splot function is used to visualize the data, which is represented

in the same format used as input to the Activity Engine module. The matrix flag indicates

that the ASCII data are stored in matrix format. The z-values are read in a row at a time

as shown below

z11 z12 z13 z14 ...

z21 z22 z23 z24 ...

z31 z32 z33 z34 ...

and so forth. The row and column indices are used for the x- and y-values.

5.1.3 GNUPLOT SCRIPTS

Sometimes, several commands are typed to create a particular plot, and it is easy

to make a typographical error while entering a command. To stream- line our plotting

operations, several GNUPLOT commands may be combined into a single script file.

Consider a case in which we have to change the viewing angles to get a closer

look of a particular cell or a specific time-step. This would require a means to rotate the

image around the central axis to an angle which would give a more clear perspective of

that required portion of the image. A script file can be written for exactly that purpose,

which would use the ‘set view’ command from GNUPLOT to change the default viewing

angle such that all the hidden portions of the image a brought to light.

Figure 19 and Figure 20 show how the hidden portions of the image are made

47

visible by changing the viewing angles.

Figure 19: Rotating the vertical angle

Another useful aspect of the writing scripts is automating the visualization system.

Figure 20: Rotating the horizontal angle

5.1.4 GNUPLOT ON WINDOWS

As mentioned above the Activity Modeler system is implemented on the Windows

48

platform and GNUPLOT was essentially built for the UNIX platform. To enable

intelligent scripting in GNUPLOT on Windows, we have to open a pipe to GNUPLOT

and stream-line commands to it using some scripting language like PERL. The Windows

version of GNUPLOT has a utility program called pgnuplot, which supports piping of

commands.

Since 2D visualization requires movie-making from image files, there should be an

efficient way to generate a number of images from the resultant data files. One of the

auxiliary PERL script perform the function of generating images by opening a pipe to the

pgnuplot program. While generating a large number of images (> 20), the script allows

automatically closes the pipe and re-opens it after GNUPLOT finishes the processing to

prevent buffer overflow problem.

5.2 AVS-EXPRESS

5.2.1 INTRODUCTION

AVS/Express is an object-oriented, visual development tool that enables you to

build reusable objects, application components, and sophisticated data visualization

applications. It has the following features:

��Object Oriented - AVS/Express's development approach is object-oriented; it supports

the encapsulation of data and methods; class inheritance; templates and instances;

object hierarchies; and polymorphism.

49

��Visual development - The Network Editor is AVS/Express' main interface. It is a visual

development environment that is used to connect, define, assemble, and manipulate

objects through mouse-driven operations.

��Visualization application - AVS/Express provides a number of predefined application

components (objects) that process, display, and manipulates data. The objects and

application components that you connect and assemble in the Network Editor control

how data is processed and how it is displayed. If you choose, you can compile and

package those objects and even add a user interface to create a complete application

that can be delivered as a stand-alone application.

AVS-Express has a Network Editor, which is the primary development tool

consisting of libraries of objects and a workspace where we can build visualization

modules. The underlying code in which objects are written is called the V code, which

can be modified using a text editor or the Network editor. AVS-Express is built on

component technology, which is defined as the ability to create an application from

reusable, modular pieces of components. Lastly, the most important is that it gives you

the flexibility to create our own components and make them re-usable modules adding

them to the pre-existing libraries. AVS-Express is by far more advanced then GNUPLOT

and used for developing movies and fancy visualization images, while GNUPLOT is

used for making quick 1D and 2D graphs with limited visualization modules like contour

and surface plot.

5.2.2 READERS

50

AVS-Express supports a generalized data-structure called a field which supports

many different types of data sets. Almost all of the visualization modules in AVS-

Express supports field as an input. To gain the capabilities of these AVS-Express

visualization modules we have to write Reader for our data-sets. This process is called

‘Importing the data’. Apart from an import module there are other module called the

readfilename, animfilename and trigger which comprise the reader for our data.

AVS-Express supports four different types of data based on the grid

(connectivity) information. They are summarized in Table 4.

MESH TYPE GRID CONNECTIVITY Uniform Supply max/min nodes only Implicit based on dimension Rectilinear Specify axis nodes only Implicit based on dimension Structured Specify all nodes Implicit based on dimension Unstructured Specify all nodes Specify connectivity using

cell-sets

Table 4: Data types supported by AVS-Express

As seen in Table 4 our data falls in the Uniform Category and essentially there is

no need to even process it before importing. However, since our data is present in the

matrix format and Express expects the data to be in a column arrays, an Import module is

written which formats the data in the desired format and later the AVS-Express field

mappers are used to output the desired data in the field format. Figure 21 shows a 3D data

with 2000 values being imported to form a field structure. Note that the import module

which is the heard of the Reader is not shown in the figure.

51

Figure 21: File Import module for AVS-Express

The readfilename, animfilename and trigger modules are developed to use the

Animator and ImageCapture modules, which contains the movie-making capabilities.

5.2.3 AVS-MODULES

AVS-Express has a set of libraries. The Data Visualization library was used for

our data. The Data visualization library has a rich set of modules which allow the user to

use some of the most advanced visualization techniques. Some of the techniques used are

listed below:

��Contours, Isolines, Isosurfaces: Dealing with your data as a volume. Best way to

render a volumetric data making sense to human eye.

��Slices and cross-sections: Examining volume data as a separate slice.

��Colormaps: Attaching meaningful graphical cues to the data-sets

��CitySpaces and Surface plots: Method designed to effectively show histogramic data-

sets in 3D.

52

��Animation: The best way to visualize time-varying data having a spatial dimension

greater than two.

There are a number of other modules present, however explaining those is beyond

the scope of this thesis.

5.3 VISUALIZATION CONCEPTS

Apart from using the built-in capabilities of the visualization softwares, there are

a few things to be added to generate images and develop movies from the resultant data.

The most common feature which needs to be developed is a Reader for the data to be

visualized. The implementation of Reader for AVS-Express is described above.

GNUPLOT does not require any reader module and reads directly from files, which are

already present in the required format (Activity Engine generates data-files in those

formats to eliminate the overhead in visualization)

Briefly, any visualization system consists of three stages. The first is the Reader

stage, which reads the file whose data, needs to be visualized. Internally the reader stage

is software-specific; some softwares make copies of the data, which others (AVS-

Express) create soft-links to the files. The second stage forms the deals with the core

visualization process and consists of the built-in modules like Surface Plot, Iso-Surface,

volume rendering, arbitrary slicer etc. This stage also consists of the bulk of computation

in the visualization process. The third stage is the writing stage; in which the generated

results can be stored in the form of images and movies. AVS-Express has some

sophisticated modules to make movies in different formats, however GNUPLOT does not

have any known method to make movies and they are made in a two-step process;

53

generating images for every time-step and using some other utility program to generate

movies from the image-files (for 3D visualization).

In visualization of time-varying data, the dimension of the data needs to be

increased by one, as time itself contributes to one dimension. In short, 1D spatial data is

actually 2D in the spatio-temporal domain, while 2D spatial data is 3D in the spatio-

temporal domain. There are always two ways to visualize any data, either using movies

or using images. A movie consists of frames corresponding to time-steps and is an

interactive way to monitor the changes going on per time-step for the variable visualized.

Images are static in nature. One can generate an image for a particular time-step or for a

range of time-steps where the variable is visualized in the spatio-temporal domain as

described above. However, one should note that generating an image consisting of data

for all time-steps is only possible for 1D space. In our system, for 2D spatial data,

visualization will be supported mainly in the form movies in the cellular domain and

images in the temporal domain. For 1D spatial data, visualization will be supported in the

form of images for both the domains (cellular and temporal).

5.3.1 VISUALIZATION OF 1D TIME-VARYING DATA

For 1D data, the whole process can be captured by a single image; time forms one

axis of the system as shown in Figure 22. In this case, the cellular space is 1D consisting

of 50 cells and the process has 50 time-steps. An easy and sensible way to visualize the

whole process would be adding a time axis to the graph as shown in the Figure 20. These

54

types of images are used for results generated from the Activity Generator module, which

are 2D in nature.

Figure 22: Visualization of 2D spatio-temporal process

For 1D data, results like Living Factor and Activity Factor reduce a dimension of

the (spatio-temporal) data as the results compute a single value for all the cells (in

temporal domain) and a single value for all the time-steps (in cellular domain). Such

results are visualized as 1D graph using the plot and multiplot function from GNUPLOT.

Figure 23 shows the Statistic Analyzer results for a test data using the multiplot

command. These are the second type of results, which are in essence 1D in nature and

normally generated by Statistic Analyzer for 1D data.

55

Figure 23: Visualization using multi-plot graphs

The third type of result to be visualized is 2D in nature and spans the whole

spatio-temporal domain. These results are generated mainly by the Pattern Predictor

module and use a technique called zero padding and binary visualization. Some of the

results which fall in this category are that for the Max-Locator, Peak locator and Region

of Imminence.

5.3.2 ZERO PADDING AND BINARY VISUALIZATION

56

The concept of zero padding and binary visualization was introduced specifically

for two purposes:

��Focusing on the results and eliminating unwanted data.

��Significant reduction of resultant data-files.

Zero padding is essentially replacing the unwanted data with zeros, and retaining

the values for the data of interest. Binary visualization is an enhanced form of zero

padding which replaces the data of interest by 1s and unwanted data with 0s. However, it

should be noted that almost all the visualization softwares visualize zero values by

placing a dot at that location. To overcome this unwanted effect, the range of the z-axis

can be changed to a value greater than zero and less than the minimum value to be

visualized. This can be accomplished by a simple command in GNUPLOT; set zrange

[0.5:], which sets the minimum value on z axis to 0.5 for visualizing the ROI.

Since, the zero can be printed to the file as a short integer data-type there is a

considerable savings in the file-size, which serves our second purpose. Also eliminating

the visualization of unwanted data by changing the range enables us to focus on the

region of interest and serves our first purpose. Figure 22 shows the image generated from

the data obtained by the Max-Locator module. Clearly, the image focuses only on the

maxima found in the data, which is an indication of the most active cells in the spatio-

temporal domain.

57

Figure 24: Binary Visualization for Max-Locator module

The three types of resultant data generated by the Activity Engine are summarized

below in Table 5.

Type of result Domain Dim Visualization technique

Instantaneous Activity Spatio-temporal 2D Surface Plot images (GNUplot) Accumulated Activity Spatio-temporal 2D Surface Plot images (GNUplot) Activity Factor Temporal 1D 1D single graph (GNUplot) Statistics Temporal 1D 1D multi graphs (GNUplot) Living Factor Cellular 1D 1D single graph (GNUplot) Maxima Locator Temporal 1D 1D multi graph (GNUplot) Peak Locator Cellular 1D 1D multi graph (GNUplot) Region Of Imminence Spatio-temporal 2D Binary visualization (GNUplot) Peak Bars Spatio-temporal 2D Zero padding (GNUplot) Max Bars Spatio-temporal 2D Zero Padding (GNUplot)

Table 5: Visualization of results for 1D space

58

5.3.3 VISUALIZATION OF 2D TIME-VARYING DATA

The types of resultant data generated by the Activity Engine from 2D time-varying

data are similar to that generated from 1D data. They are summarized in below in the

Table 6.

The only difference is that the dimension has increased by one correspondingly

and the techniques of visualization have changed accordingly. The visualization

techniques change with respect to the dimension of the data as seen from Table 6.

In case of 3D data, it should be noted that every cell is defined in 3D space with

three variables, two dictating its position in the 2D cellular space, while an additional

variable for the time-step.

Type of result Domain Dim Visualization technique

Instantaneous Activity Spatio-temporal 3D Surface Plot images (GNUplot) and movies (AVS-Express)

Accumulated Activity Spatio-temporal 3D Surface Plot images (GNUplot) and movies (AVS-Express)

Activity Factor Temporal 2D Surface plot images (GNUplot) Statistics Temporal 2D Surface plot images (GNUplot) Living Factor Cellular 2D Surface plot images (GNUplot) Maxima Locator Temporal 2D Surface plot images (GNUplot) Peak Locator Cellular 2D Surface plot images (GNUplot) Region Of Imminence Spatio-temporal 3D Binary visualization (GNUplot) Peak Bars Spatio-temporal 3D Zero padding (GNUplot) Max Bars Spatio-temporal 3D Zero Padding (GNUplot)

Table 6: Visualization of results for 2D space

Visualization of 2D data uses the z-axis to represent the value of the variable to be

visualized. However, in 3D data if we need to represent the whole process in one image,

59

the z-axis needs to represent the time-step in this case. Thus, we can represent values of

the variable to be visualized by colors. This would however, make the image opaque

restricting the visibility only to the boundary variables. There are techniques in

sophisticated visualization software like AVS-Express to increase the transparency of the

image or volume rendering of the image, but it would not give a clear insight to the inner

details. A better way to visualize the 3D data is using movies, with each time-frame

corresponding to a time-step. Activity Generator and Pattern Predictor produces 3D data,

which needs to be visualized by movies as described above or by surface plot images for

a particular time-step of interest as shown in Figure 25.

Figure 25: Surface Plot Images for 2D spatial visualization

60

Figure 25 above is a surface plot for the Instantaneous Activity of a 2D diffusion

process at a particular time-step. Contours are drawn on the base to show the regions of

equal values.

The data generated by the Pattern Predictor modules (ROI, Peak Bars and Max

Bars) are also 3D and are visualized in the same fashion as described above, however

since zero padding and binary visualization is used for such data, it is a good idea to

visualize the values as discrete point as shown in Figure 24.

The Statistic Analyzer for 2D time-varying data produces resultant files which

itself is 2D in nature. These results are visualized with GNUPLOT surface plot function

as images.

61

6 OBSERVATIONS

This chapter presents the observations found by processing data obtained by a

number of different types of simulations. For data in 1D space we have:

��data generated by PERL script as shown in 3.2.3

��1D heat diffusion

��activity of state of robots

For data in 2D space we have:

��2D heat diffusion

��Fire spread model

6.1 DATA FOR 1D SPACE

6.1.1 TEST DATA GENERATED BY dataGenerator SCRIPT

For 1D space, first the Activity Engine was tested with data generated by the

mathematical equations shown in Equations 6.1, to model the behavior of the process in

terms of a function represented by f(x,t) (N=100, T=100):

Equation 6.1:

Type 1 ( )

( ) ( )tx

NCells*2.0sin*1.0

1*2.0 + … (a)

Type 2 ( )

( ) ( ) ( )ttxNCells

cos*sin1

*5.0+ … (b)

Type 3 ( )

( ) ��

��

�

++ 1sin

1*2.0 xt

xNCells

… (c)

62

The first two types of test data have activity concentrated in the same cellular area

throughout the temporal domain; they differ in two aspects, the rate of decrease of

activity and the number of peaks at t=0 as seen in Figure 26.

Figure 26: Test data 1,2 in value domain

As seen from the figure the type– 2 stabilizes sooner than type - 1. The IA curves

are found to be similar to plots above, while the AA curve for the first type has a gradual

rise and the second type of data is characterized by a steeper rise of AA curves as

expected.

The Activity Factor is found to form a similar pattern fairly independent of the

threshold set for it, due to the periodic nature of the function over the cellular domain.

The plots obtained by the statistical analyzer are seen to be characterized distinctly by the

periodic behavior of the sine and cosine functions in the Equation 6.1, as seen in Figure

27.

63

Figure 27: Results from Statistical Analyzer for test-data 1,2

The Pattern predictor module computes the peaks present in the IA curve and

predicts the ROI based on it. As seen from Figure 28, the first type of data is

characterized by lesser peaks and consequently the ROI is much smaller than that of

second type.

(a) (b)

64

(c) (d)

Figure 28: Results from Pattern Predictor module for test-data1,2

Finally, to test the results obtained from the Pattern Predictor module, a graph is

plotted in the temporal domain, with the sum of IA values for the whole cellular domain,

the Max locations and the imminent locations. If the graph of the imminent locations

closely follows the first graph, the ROI is a true measure of the active region.

The first two types of data were not characterized by any shift of activity in the

process and the active regions pretty much remained the same throughout the time of the

process. The third type of test data was generated to test the case of activity shift in the

temporal domain. Figure 29 shows the plot of IA curve and AA curve for the 3rd type:

Figure 29: IA/AA curve for test data-3

65

As seen from Figure 29 there is a distinct shift in activity from one side to other

during the course of time. The peaks are not intuitive from the Activity curves nor is the

shift of activity seen clearly. Once the IA values are processed by the Activity Engine we

can clearly see the presence of Peaks and the imminent region signifying the shift of

activity in the process. The results are shown in Figure 30 below.

Figure 30: Results from Pattern Predictor module for test-data3

Figure 31 shows that activity detected by the Max Locator module would not be

an appropriate measure of the Activity for this process since the impulses, signify the

sum of the IA values for the cells detected by the Max-Locator module, the line denotes

the IA values computed by the Peak Locator module and the points is the actual

representation of the activity of the process.

It is clearly seen that the Peak Locator gives a much true measure of the activity

of the process in this case.

66

Figure 31: Check for Peak-Locator and Max-Locator module for test data-3

6.1.2 1D DIFFUSION PROCESS FOR N=10, N=100, N =200

Data obtained from 1D diffusion process was analyzed by the system with the

parameters shown in Table 7.It also shows the results such as the global average activity

and the total accumulated activity for the whole process.

67

Parameters N=10 N=100 N=200 N=200

T 100 100 100 50 ROI (e) 0.9 0.9 0.9 0.9 LFac (e) Local Local Local Local AFac (e) Global

(0.002186) Global (0.000375)

Global (0.00194)

Global (0.004098)

Global Max(Tadv)

1001624 1000225 1000225. 1016801

Global Min(Tadv)

3.800186 1.102626 1.037858 3.8713

Global Max(IA)

0.263145 0.90693 0.9635 0.2538

Global Min(IA) 9.98377e-7 9.99e-7 9.99e-7 9.8e-7 Total AA 2.428327 3.8296 3.92825 2048.887

Table 7: Process Characteristics for 1D/2D diffusion

The IA plot for the diffusion process is shown in the Fig; as seen from the figure not

much information about the process can be obtained from this plot. However the results

obtained from the Activity Factor module shown in Fig distinctly signifies the region of

cells which have been more active the

Figure 32: IA curve for 1D diffusion process (N=100, N=200)

68

Figure 33: Activity Factor plotted for 1D diffusion (N=100. N=200)

The ROI computed by Pattern Predictor module is shown in Figure 34. Three distinct

regions are observed from the module signifying the shift of activity in three different

directions. Finally, Figure 35 plots the histogram of the process for N=10 and N=200.

From the plot the speed of the process can be computed based and it is found that the

process for N=200 is characterized by larger time advances, clearly signifying that it

takes more time to schedule events. In other words, the IA curve is characterized by a

steeper curve.

(a) N=100 (b) N=200

Figure 34: ROI for 1D diffusion (N=100, N=200)

69

(a) N=10 (b) N = 200

Figure 35 Histogram of Time-Advances for 1D diffusion (N=10, N=200)

6.1.3 ROBOT ACTIVITY – (MODELING TRANSITION OF STATE)

Data was visualized for robot activity modeled at ACIMS [14] using DEVS

simulation, where a robot is either said to be active (moving) or passive (stopped). Every

robot communicates with its successor and predecessor in form of events. The whole

process is dictated by an algorithm, which defines some rules regarding the state of a

robot. For example, if a robot is lost all the other robots stop moving. To get an overall

picture of the activity of the process, the data generated by the simulation was processed

by the activity modeler system.

The data obtained from modeling the activity of robots had to go through a

separate pre-processing stage, as the data was obtained was present with a simulation

time instead of real-time. Thus, the data had to be converted to real-time with equal

intervals in time domain. The data present was binary, with a 1 present for a moving

robot and a 0 present for a stopped robot. Since the data is just binary we don’t have to

70

compute the thresholds for the various factors and can simply set them to any value in

between (0,1) (since maximum(IA) = 1, minimum(IA) = 0 for binary data). The threshold

was set to 0.5. The data was present for around 2000 simulation time steps (around 2300

real-time steps) and was the data was analyzed using the Activity Modeler system.

The activity in this case would be the change in state for the robot. Also the cells were

represented by the robots (N=30, the data was presented for 30 robots). The total number

of transitions was found to be 5324. It was found that the average IA for all the robots

was pretty much the same (< 20%) and equal to their Activity Factor as shown Figure 36.

The imminent factor and living factor were found to be the same (as seen Figure 37),

since for binary data, the ROI for a particular time step is same as the IA curve for that

time step.

71

Figure 36: Activity Factor for robot activity

Figure 37: Imminent Factor and Living Factor for robot activity

6.2 DATA FOR 2D SPACE

6.2.1 2D DIFFUSION PROCESS

The parameters set for the system for 2D diffusion and the results obtained for the

process can be found from Table 7. The histogram for time advances was found to be

concentrated in the first two steps of the histogram as seen from Figure 38, signifying

clearly that the process was not characterized by steep changes in the IA curve and the

diffusion process was gradual unlike the 1D diffusion. The results from the statistic

analyzer are now for a 2D space and have to be represented as surface plot images for

72

each of the result (For example the standard deviation for the IA values is represented by

the surface plot image in Figure 39)

Figure 38: Histogram of Time advances for the 2D diffusion problem

Figure 39: Standard Deviation for the 2D diffusion problem

Most of the results from the pattern predictor module were in the form of movies

73

and were characterized by two concentric circles moving out over from the center

towards the boundary over the period of time. This happens because the first derivative of

temperature (rate of change of the temperature) was found to be higher at those locations.

Based on the IA surface, it was found that the peaks formed a concentric circle moving in

the direction of diffusion and so did the ROI move out specifying the imminent cells in

the process.

Figure 40 shows the peaks in the IA surface and the ROI calculated from them at

t=2 and t = 20.

(a) t =2 (peaks) (b) t =20 (peaks)

(c) t = 2 (ROI) (d) t = 20 (ROI)

Figure 40: ROI and Peak Bars for 2D diffusion process (t=2, t=20)

74

6.2.2 FIRE FRONT MODEL

A fire-front model [15] was simulated to model the propagation of fire in forests.

There was a need to study the behavior of two phenomena, namely, ignition and burning.

By theory we know that as a fire-front approaches a cell, it raises the temperature of the

cell to a certain ignition threshold, beyond which the burning process begins. The burning

process is characterized by a steep increase in temperature till some maximum value,

followed by a steep decrease in temperature. If the temperature of the cell did not reach

that threshold, it doesn’t go through the burning process.

With the above knowledge of a burning procedure, it is possible to analyze the

fire-front model. The data present was for 300 time steps. The results for the process are

tabularized in Table 8.

Global average IA (AFac (e) ) 1.791918 Global Max (Tadv) 1.00 Global Min (Tadv) 0.004673 Global Max (IA) 213.995 Global Min (IA) 1 Total Accumulated Activity 5321979

Table 8: Results for Fire-Front model (100 x 100, T =300)

The data was processed by the Activity Modeler system to get an overview of the

whole process. It was found that around t=180, the fire fronts reached the cellular

boundary, which was distinctly seen by a drop in the Living Factor curve from Figure 41.

It is also seen from Figure 41 that no more than 20% of the cells were ever active

throughout the duration of the process. Figure 42 shows that although the ROI module

75

did not gave a true representation of the active regions in the process from t [50-150] for

e=0.7, it wasn’t accurate enough, since the total activity of the imminent cells is

considerable lower than the actual activity. This was the time when the fire-fronts reaches

the boundary.

Figure 41: Living Factor for Fire-Front data (100 x 100, t=300)

76

Figure 42: Sum of IA values for all cells and imminent cells

The movie of the data generated by the Activity generator, for the instantaneous

activity surfaces, was characterized by two distinct activity patterns along the fire fronts.

The first activity pattern is due to the igniting process, while the second pattern following

it is due to the burning process. There were distinct oscillations around the high activity

patterns found on the igniting fronts. Figure 43 shows the activity patterns for two time-

steps (t=136, t=148), after the fire breaks in four linear fronts.

(a) t = 136 (b) t = 148

Figure 43: Instantaneous Activity Surface for Fire-Front model

Finally, to detect only the igniting process, the Accumulated Activity was

modeled, setting a lower and a higher threshold to focus only on the desired temperature

range of burning [373K]. This helped to characterize the igniting process in terms of the

direction of ignition and length of fire-front.

77

7 AUTOMATION AND OPTIMIZATION

7.1 AUTOMATION OF THE SYSTEM

Since the System consists of three distinct stages, starting from the formatting of

raw-data to the visualization of fancy images, with a number of scripts and executables

processing data at various stages, there is a need for automation in the process.

It is possible to automate the whole system using a script making system calls to

other scripts and executables of the system. The following features are required to aid the

automation process:

��Standardization of filenames for 1D and 2D data.

��Setting correct path for the executables and data generated from intermediate results.

��Use of command line arguments to customize the system for the data under

consideration.

Apart from standardization of name for filenames generated by the modules, the

raw-data files also need to be named in specific manner for 2D data files. They should of

the form ‘filenameBase+<time-step>+.filenamebase’.

Regarding the path of the executables, the system has a pre-defined directory

structure as shown in Figure 44. Note that the root path is relocatable. The relative paths

of the code and data should not be changed at any time.

78

Figure 44: Directory structure of system

Command-line arguments for the formatter and activity engine are given below

(can be found by a ‘–h’ command). The parent script, which facilitates automation, needs

to have command-line arguments, which in turn are mapped to appropriate commands of

the internal scripts and executables. For example, for visualizing 2D spatial data the

Activity Engine (karma) needs to turn on its ‘-2D’ flag and the formatter2D PERL script

should be used instead of the formatter script. Thus, the user needs to provide sufficient

enough information to parent script for correct automation process.

/

data bin

exe PERL scripts

viz

GNU scripts

imagemovies

79

Figure 45: Command-line arguments for automation

At present automation is present only for GNUPLOT visualization modules on

Windows platforms, due to cross-platform development issues discussed which will be

discussed in the Chapter 8.

The parent script, which provides automation, is called the Activity Modeler script

and its flow path is given in Figure 46.

Formatter.pl For correction -f fname For extraction -f fname –e rows Formatter2D.pl -C [<concat>||<expand>] To concatenate -f fnamebase -s time-steps To expand -f fnamebase -r <row-size> -c <col-size> Karma.exe -extract [<stats>||<activity>||<pattern>||<all>] -file <filename> -2D -fast -transpose -noactivity -predict <#steps> -cells <#cells> -steps <#steps> -rows <#rows> -cols <#cols>

80

Figure 46: Automation Process for the ‘Activity Modeler’ system

Formatter1D

Activity Engine

nSteps nCells

file

Karma –file __ -cells __ -steps __ -extract all

GNUPLOT scripts

Tmp\tmp.txt

data\*.txt

Bin\PERL_scripts\formatter1D

Bin\exe\karma

viz\GNUPLOT\scripts

viz\img\*.gif

Formatter2D

Activity Engine

ncols nrows

file

Karma –file _ -2D –rows _ -cols _ -steps _ extract all

GNUPLOT scripts & AVS-Express modules

Tmp\tmp.txt

data\*.txt

Bin\PERL_scripts\formatter2D

Bin\exe\karma

viz\GNUPLOT\scripts

viz\img\*.gif

nSteps

Concatenate

Formatter2D

Expand

Activity Modeler script

2D 1D

81

7.2 OPTIMIZATION OF THE SYSTEM

Run-time visualization of time-varying data creates a necessity for the modules of

the system to process data as quick as possible. Although, the visualization system

described here is not strictly run-time, it can be used to process time-varying data at

periodic intervals, say every n time-steps dumped by the simulation generating the data

(discussed in Chapter 9). This would impose a certain restriction on the data-processing

of our visualization system, which in turn would create a necessity to make a more

efficient use of the computation power, thus necessitating the optimizing of the code.

Most of the optimization techniques discussed here are inspired by those used in the real-

time systems and embedded software. They are discussed below.

��Local Variables: Minimize local variables so that the compiler can fit them in registers

(improving performance over accessing them from memory) avoiding frame pointer

operations on local variables that are kept on stack (reducing overhead of restoring

frame pointer). Declare local variables in innermost loop.

��Function Calls: Reducing the number of parameters due to expensive parameter pushes

on the stack call. Passing parameters by reference using pointers and references.

Specifying a return values only when needed. Declare small functions inline.

��Data types: Prefer the use of int over char and short as C++ perform arithmetic

operation and parameter passing at integer level.

��Constructors: Defining light weight constructors specially helps for an array of objects.

Using constructor initialization lists helps in increasing the performance by eliminating

82

the slower assignment operations over the initialization operations.

��Loop optimization: Eliminate loop independent expressions in the body of loop. Merge

loops if possible. Use loop un-rolling for smaller number of iterations. If possible

decrement the for loops as the condition (i!=0) is executed much faster than (i<max)

Use early loop breaking if condition is satisfied.

It is a well-known fact that there is always a trade-off between speed and memory

usage. Since, we have to process large time-varying data-sets optimization not only in

CPU-time but also memory usage should be considered equally important. Although, the

memory bottleneck depends significantly on the RAM capacity, significant optimization

of memory usage can be realized by the following techniques:

��Eliminate memory leak problems by de-allocating the memory whenever necessary and

pointing unused pointers to NULL to combat the problem with dangling pointers.

��For Statistic Analyzer and Pattern Predictor modules, where the number of peaks or

maxima are not known until run-time, there two ways to allocate space for the results

• Using linked-lists by adding a peak as it is found. The advantage is that we have

to traverse the Instantaneous Activity array only once.

• Using dynamically allocated arrays. In this case we would have to traverse twice

through the array; first to find out the number of Peaks (for eg) and second time

to store the values at those particular locations.

It is found that although one had to traverse the array twice, it was much more

efficient to use dynamically-allocated arrays over linked lists for even a small number

83

of peaks due to simpler logic and random access of array structures while using them at

a later use.

��Allocating space for the array under consideration only once by the Data Engine

module and referencing it later by other modules in the system.

Apart from the standard techniques described above, there are a few other

techniques implemented to provide optimization in time and memory which are turned on

by the ‘-fast’ flag in the Activity Engine module.

1. If it is known to the user that the data values are in integer format or it is known if the

fractional part of the floating point can be ignored, he can turn on the –fast flag and

the resultant data processing would be done in integer arithmetic as much as possible.

Since this would eliminate the use of floating point unit, there would be a

considerable increase in the performance of the system as well as reduction in the

memory usage during run-time. This point has been illustrated by comparing the time

taken to generate test data for integer data-type and floating point data-type for the

same function.

The Table 9 illustrates the time taken by the dataGenerator module to compute a

simple function ( ) yxyxfz 2, == with the integer flag turned on and later with the flag

turned off. The code is benchmarked using PERL’s benchmark module.

84

Cell Dimension (x, y)

Integer Flag (ON/OFF)

Wall Clock time(sec)

System Time (sec)

CPU Time (sec)

ON 1 0.01 0.04 100 X 100 OFF 1 0.01 0.07 ON 3 0.09 1.61 500 X 1000 OFF 7 0.12 3.44

Table 9: Optimization in time using the integer flag

2. The user can also turn on the -fast flag to save considerable amount of time that the

Statistic Analyzer spends in generating results like standard deviation. Since standard

deviation needs to compute the square root of a number, which is compute-intensive

operation, it can be bypassed, for larger data-sets, by turning on the –fast flag.

85

8 SUMMARY AND FUTURE WORK

8.1 SUMMARY

The Activity Modeler system gives a new perspective for analysis of data based on

the concept of Activity in DEVS. The system was used to generate results from a variety

of data produced by DTSS and DEVS simulations. The results were found to give a true

representation of the process and helped to study the behavior of the process in the

activity domain. Results were produced in both, the temporal and the cellular domain to

give a better insight of the process as a whole as well as for a particular range of time

steps or cells.

The statistical analyzer is used to compute a number of results, which were

inherently static in nature, like the global and local average, maxima and minima of

instantaneous activity and time advances. The pattern predictor module is based on the

fundamental that the whole process can be characterized on the basis of its activity curve

(1D) or surface (2D), by detecting its localized peaks. It introduces a concept to predict

the imminent regions based on the detected peaks. The pattern predictor also introduces a

concept to predict the imminent regions in the future based on the present active regions,

by detecting the direction and shape of activity pattern.

For 1D diffusion process, it was found that the living factor curve was similar for

N=100 and N=200, and it peaked to 0.4 around t=10 and decreased gradually, while the

activity factor was characterized by a distinct curve immaterial of the size of cells, and

was found to never exceed 20%.

86

While modeling the robot activity, it was found that the activity factor for all the

robots was more or less equal to each other (approximately 10%). The living factor

modeled the activity for all the robots in the temporal domain and showed the time-steps

of high active regions (t=500, around 40% of the robots were active) as well as the time-

steps when a robot got lost when the living factor dropped to zero. It was also found from

the linear span and ROI module that the maximum group of imminent robots was never

greater than 6.

Data analyzed by fire-front model showed the time of maximum activity (t –

[70,140]) when around 20% of the cells were active. It successfully modeled the regions

where the fire first ignited the cells and also brought into focus the oscillatory behavior of

the ignition.

The concept of activity introduced from DEVS also proves from the following

calculations that choosing a proper quantum for the simulation, DEVS can be much

efficient than DTSS simulations. Based on the concept of Activity, the number of output

transitions in DEVS, is given by the number of threshold crossings as

Number of transitions = q

txIAx t�� ],[

For DTSS simulations the number of output transitions is given by

Number of transitions = tNT

∆

The maximum instantaneous activity in the cellular and temporal domain, which

is the maximum slope (derivative) of the output values of a simulation in DEVS, is given

by

Maximum (IA) = tq

∆

87

From the above three equations the we can compare the DEVS and DTSS

simulations on the basis of the number of transitions taken by the process.

)(

],[

IAMaximumNT

txIA

DTSSDEVS

x t×=

��

The above ratio is computed for a number of processes and the results are

presented in the Table 10. Clearly, it is seen that DEVS is much more efficient than

DTSS simulations.

Model Cells Time Max (IA) Total AA DEVS / DTSS

1D diffusion (N=10)

10 100 0.263184 2.4283 0.0093

1D diffusion (N=100)

100 100 0.9069 3.8296 4.22e-4

1D diffusion (N=200)

200 100 0.9635 3.9285 0.0002

2D diffusion 10000 50 0.2583 2048.77 0.819 Fire-Front 10000 297 213.995 5321979 0.0083

Table 10: DEV v/s DTSS based on activity concept

8.2 FUTURE WORK

At present, the activity modeler system supports data only for 1D and 2D spatial

domain. It can be extended in the future for 3D space. It would impose some problems

for visualization stage, as it is would be a certain problem to visualize data in 4D (spatio-

temporal process).

The activity engine presented in this thesis has the data engine module developed

in C++ with MFC classes. Hence it would not work on a UNIX platform. The data

engine module needs to be coded using a cross-platform compatible library to enable the

88

use of the system on both, Windows and UNIX platforms. Also, at present the third stage

(visualization) of the system at present is implemented only using the GNUPLOT scripts.

If the visualization of 3D data is desired, the visualization stage of the system can be re-

built with more sophisticated visualization software.

The ability of the pattern predictor module to compute the imminent regions for

the whole process before hand, on the basis of only two time steps needs is unrealistic to

some extent. At present, it is implemented only for 1D space and was tested for 1D

diffusion process and test data-3 from the datagenerator script. The percentage error in

the predicted and the actual ROI is given in the Table 11 and plotted in the Figure 47.

Cell in error (%Error) Time Steps (t)

Cells = 10 (1D diffusion)

Cells = 100 (1D diffusion) (1Dtest data3)

Cells = 200 (1D diffusion)

8 1 (10) 13 (13) 39(39) 26 (13) 15 1 (10) 21 (21) 13(13) 22 (11) 22 3 (30) 19 (19) 13(13) 24 (12) 29 5 (50) 27 (27) 13(13) 48 (24) 36 3 (30) 12 (12) 12(12) 23 (11) 43 2 (20) 15 (15) 18(18) 26 (13) 50 1 (10) 16 (16) 7(7) 28 (14) 57 1 (10) 8 (8) 7(7) 18 (9) 64 6 (60) 8 (8) 2(2) 14 (7) 71 2 (20) 10 (10) 2(2) 14 (7) 78 2 (20) 8 (8) 0(0) 12 (6) 85 2 (20) 8 (8) 0(0) 12 (6) 92 2 (20) 8 (8) 0(0) 12 (6)

Table 11: Percentage error in predicted and actual ROI for 1D processes.

For process whose activity curve is characterized by a non-linear curve (second

order processes) the predictor module doesn’t not work well. However, even for second

order processes it is found that as the number of cells increases, the percentage error

89

decreases, even though the region of imminence increases.

The dotted curve in the Figure 47 clearly shows that the percentage error for the

test data-3 (characterized by linear activity curve) is much lesser and diminishes to zero

as time progresses

Figure 47: Percentage Error in the Predicted and the actual ROI for 1D process

90

REFERENCES

[1] Guangeng Ji and Han-Wei Shen, “Time-Varying Iso-surface tracking with global optimization”, IEEE Visualization 2004 Conference

[2] Pak Chung Wong and R. Daniel Bergeron, “A tool for hierarchical representation

and visualization of Time-Varying data”, ICASE/LaRC Symposium, 1995 [3] Kwa-Liu Ma and David Camp, “High Performance Visualization of Time-Varying

volume data over Wide-Area network”, Proceedings of Super Computing 2000 [4] Jack Snoeyink, “A geometric basis for visualizing time varying volume data”, IEEE

Visualization 2003 Conference [5] Chaoli Wang, Jinzhu Gao and Han-Wei Shen, “A Framework for rendering large

time-varying data Using Wavelet based Time-Space Partitioning Tree (WTSP)”, proceedings of Eurographics symposium on Visualization, 2004

[6] Zeigler, Bernard P., Herbert Praehofer, Tag Gon Kim, Theory of Modeling and

Simulation: Second Edition. San Diego: Academic Press, 2000 [7] J. Davis and A. Bobick, “The representation and recognition of Action using

temporal templates”, in CVPR 1997 [8] B. F. Bremond and M. Thonnat, “Analysis of Human Activities described by Image

sequences”, in Proc. Intl. Florida AI Research Symp, 1997 [9] J. Pearl, Probabilistic Reasoning in Intelligent Systems, Morgan Kauffman, 1976 [10] Rama Challapa, Namrata Vaswani and Amit Roy Choudhary, “Activity Modeling

and Recognition using Shape Theory”, in CVPR 2003 [11] Kristina Lerman and Aram Galystan, “Agent Memory and Adaptation in multi-agent

system”, AAMAS, 2003, Melbourne, Australia [12] R. Bastos, D. Dubugras and A. Ruiz, “Extending Activity Diagram for Worflow

Modeling”, 35th Annual Hawaii International Conference of System-Sciences [13] J. Nutaro, B.P. Zeigler, R. Jammalammadaka, S. Akerkar, “Discrete Event Solution

of Gas Dynamics with the DEVS framework: Exploiting Spatio-temporal heterogeneity”, ICCS, Melbourne, Australia, 2003

[14] Xiaolin Hu, B.P. Zeigler, “Model Continuity to Support Software Development for

Distributed Robotic Systems: A Team Formation Example”. [15] Alex Muzy, “Fire Front Model”

salil r akerkar - arizona state university · 2019. 12. 21. · salil r akerkar _____ a thesis...

Documents