Implementing TopModel in Nova

Justin Dee
B.Sc. Computer Science
31st March 2013



I certify that the material contained in this dissertation is my own work and does not contain unreferenced or unacknowledged material. I also warrant that the above statement applies to the implementation of the project and all associated documentation. Regarding the electronically submitted version of this submitted work, I consent to this being stored electronically and copied for assessment purposes, including the Department's use of plagiarism detection systems in order to check the integrity of assessed work.

I agree to my dissertation being placed in the public domain, with my name explicitly included as the author of the work.

Date: 2013-05-31

Signed:



Abstract

Environmental computational science uses a variety of platforms to create models upon which predictions about future behaviour are based. Modellers and scientists lack an easy-to-use, free application with which to create complex models without prior knowledge of programming languages, and this lack inhibits creativity and hinders efficiency. Nova is a platform created by Prof. R. Salter of Oberlin College to fill this niche. In this project, Nova is analysed with regard to its fitness for purpose in this role, by way of an implementation of the well-known hydrological model, TopModel. From this process, it is made clear that while Nova is a strong candidate to be considered when choosing a platform for a new computational model, there are ways in which it currently cannot replace existing methods.

Table of Contents

Chapter 1 Introduction and Objectives
Chapter 2 Background on Nova
    Modelling Applications
    How does Nova work?
    The Lotka-Volterra Model
    Beyond Lotka-Volterra: other components and functionality
Chapter 3 Background on TopModel
Chapter 4 Implementing TopModel
    Challenge 1: Understanding the source code
    Challenge 2: How to begin tackling such a large problem
    Challenge 3: Inputs and outputs
    Challenge 4: Flows
    Challenge 5: Inputting data sets
    Challenge 6: Obtaining meaningful output
    Reflections on my method
Chapter 5 Conclusion
    Was Nova itself an inherently poor choice for the implementation of TopModel?
    Does Nova meet the criteria laid out above?
    Future Developments
Bibliography
Acknowledgements



Chapter 1 Introduction and Objectives

One of the most important tasks in the environmental sciences is modelling the environment. Through models, assumptions can be tested, and predictions can be made. The environmental sciences rely heavily on these predictions, assumptions, and models, because the environment is a complex and multi-faceted system; often, the only way to predict what effect some change will have, many months or years down the line, is to construct a model.

For example, take rainfall; when it rains, water enters the ground, soaks through the earth, and makes its way to the seas via streams and rivers. One way flooding occurs is when too much of that water reaches a river at the same time; but how much water is that, exactly? How does this flood danger point change when it rains more, or less? What about if it only rains sparsely, but over a long period of time? All of these are questions that, while they can be answered in principle through measurements, demonstrations, and thought experiments, need models to be answered accurately for a given set of parameters. A model of rainfall dynamics can answer these questions, and provide local councils, flood protection companies, and insurance firms, for example, with the data they need to plan effectively for the future. This type of research, known as Computational Science, is a massive market that spans many disciplines, and how models are written or generated is an important part of the discussion surrounding them.

Currently, in the environmental sciences, many models are implemented in programming languages like C, R, Java, Matlab, or Fortran. The reasons for this are several:

Models are complicated. They often involve multiple iterations of calculations, and programming languages provide the tools to perform these calculations with multiple variables.

New models often reuse parts of existing models. Because older modules were written in lower-level programming languages like Fortran, new models which wish to refer to them have to be written either in the same language, or in a language that the older code can be imported or converted into.

Because much of the theoretical environmental science research is done in academia, rather than by corporations, the price of software must be considered. Most of these programming languages are freely available (Matlab, a commercial product, being the exception), which is a significant advantage over costly alternatives (covered in more detail below) when funding may be limited (for instance, for speculative research projects).

Models are run on server machines. Because of the complex multidimensional calculations needed to run models (for instance, with Monte Carlo simulations, where models with random factors are run many times to generate an idea of the result), heavy-duty computers are often needed to run these simulations in a reasonable length of time. These server computers often only support programs written in these languages. Recently, a lot of work and research into running these sorts of models has focussed on the strength of parallel cloud computing, using many smaller machines to emulate one large one in order to speed up and optimise highly parallel computing like the Monte Carlo method. Being written in a format that allows this parallelisation is a big advantage to any model.
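As a minimal sketch of the Monte Carlo idea mentioned above, the Python below runs a toy model many times with a random input and summarises the results. The model `toy_peak_flow`, its `absorption` parameter, and the rainfall range are hypothetical and chosen only for illustration; they are not taken from any real hydrological model.

```python
import random
import statistics


def toy_peak_flow(rainfall_mm, absorption=0.6):
    """Hypothetical toy model: the share of rainfall that is not absorbed."""
    return rainfall_mm * (1.0 - absorption)


def monte_carlo(runs, seed=0):
    """Run the toy model `runs` times with randomly drawn rainfall."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    results = []
    for _ in range(runs):
        rainfall = rng.uniform(0.0, 50.0)  # assumed rainfall range, in mm
        results.append(toy_peak_flow(rainfall))
    return results


results = monte_carlo(1000)
print("mean peak flow:", statistics.mean(results))
print("maximum peak flow:", max(results))
```

Because each run is independent of the others, the runs could in principle be shared out across many machines, which is what makes this kind of simulation a natural fit for parallel computing.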

Nevertheless, such a reliance on traditional programming languages for these models generates problems; not all Environmental Scientists are programmers, or come from a Computer Science background. Having to learn a programming language (or several) could be considered a hidden cost to the discipline, and one that delays and hinders projects. If a scientist wants to create a new model, he or she can do so in whatever language they feel most comfortable with (or that which best fits the computational need of the model), but if they want to change an existing one, they need to be familiar with the language in which it is written. This highlights another problem: models written in programming languages are not intuitive. It is hard to look at a block of code and understand intuitively what it represents in terms of the model. Large amounts of commenting can make a computational model easier to read in practice, but it may still require one to read through the whole program in order to understand the role any one section has, or to gain an understanding of the structure of the model.

The solution to these problems is higher-level languages, domain-specific tools that provide graphical or module-based interfaces where models can be created, changed, viewed, and executed. Graphical modelling tools should be easy to use, with little to no knowledge of formal programming languages, and should facilitate the use of computational models without having to worry about the method by which they are implemented.

One such graphical modelling platform is Nova, created by Professor Richard M. Salter, PhD, from Oberlin College, as a scientific and educational tool for computational science. Its main feature is a graphical user interface, where components are dropped onto a canvas, and properties added, to formulate a model without the need to code anything at all besides the simple mathematical equations that comprise the functional parts of the model.

The objective of my project is to analyse the Nova platform with regard to its fitness for purpose, taking into account the following primary criteria:

Usability; In terms of basic operations, how easy is Nova to use? The goal of any such higher-level application should be to increase efficiency, and if it is harder to do things in Nova than otherwise, then it won't be an improvement at all.

Functionality; Can you use Nova to do everything that is needed from such a platform? Computational models can be quite complex, and Nova will need to be able to duplicate every option a traditional language offers in order to be a viable replacement.

Simplicity; How much less complicated is it to use Nova instead of any other option? If Nova takes a long time and a lot of effort to learn, then there is no advantage to learning it over any of the programming languages already used for computational science.

Additionally, I will evaluate whether Nova fulfils some other important criteria that it must be compared against, criteria taken from the reasons I listed for why programming languages are currently the norm:

Cost; How cheap is Nova, compared to alternatives?

Speed; Can Nova run simulations in a reasonable length of time?

Cloud computing; Is it possible to adapt Nova to run in the cloud and take advantage of the benefits it offers?

Nova's modelling functionality is primarily aimed at the natural sciences, so, in order to analyse its use for the intended purpose, I shall convert several well-known computational models from these fields into Nova. The first of these shall be the Lotka-Volterra model, serving as a simple model with few terms. This model (otherwise known as the Predator-Prey model) is a biological population model, but has broader applications. Additionally, it is used in several of the online Nova tutorials (created by Dr. Andy Lyons, PhD), and thus will serve as an introduction for me to Nova and its tools. The second model I will implement is TopModel, a hydrological model used to describe the way rainfall is absorbed by the ground. This is a much more complex model, and will serve to test the limits of Nova and its capabilities. It is a real, current model that is used in practical applications today, something that I feel is important in providing relevant analysis.

In Chapter 2, Background on Nova, I will explain a little of the history of Nova and its main competitors, and I will discuss how it differs from them, and why it is important. I will also describe Nova, and the components and features I will use, through the example of the Lotka-Volterra model. In Chapter 3, Background on TopModel, I will describe TopModel and how it works. In Chapter 4, Implementing TopModel, I will describe the process by which I attempted to implement TopModel in Nova, and the challenges I faced in doing so. In Chapter 5, Conclusion, I will analyse any problems I had, evaluate Nova with respect to the criteria listed above, discuss the methods I used for the implementation and what effect they had on the project's success, and consider any potential improvements, either to those methods or to the Nova platform in general.



Chapter 2 Background on Nova

Modelling Applications

On Richard Salter's mini-biography on the NovaModeller website, he explains that Nova was originally developed as a teaching tool because of the limitations of existing solutions. He elaborates on this in "Nova: A modern platform for system dynamics, spatial, and agent-based modeling", explaining that the existing good solutions only support one specific view of a model, or are too broad, with little domain-specific support for dynamic models.

The two main competitors in this field are both commercial products: Vensim and STELLA. Vensim, produced by Ventana Systems, describes itself as "... simulation software for improving the performance of real systems used for developing, analyzing, and packaging dynamic feedback models." STELLA, by ISEE Systems, aims itself at the education sector as well as the commercial; from their website: "STELLA offers a practical way to dynamically visualize and communicate how complex systems and ideas really work. Whether they are first-time or experienced modelers, teachers, students, and researchers use STELLA to explore and answer endless questions."

Both of these seem to offer similar functionality to Nova (primarily, a stock-and-flow graphical interface), in very polished packages, but, as seen above, are intended primarily for use in the government and commercial sector. STELLA does advertise itself for teachers and other educators, but the practicality of the situation is that at the time of writing, a single-person STELLA license costs $1899, and the professional version of Vensim, $1195. Admittedly, Vensim will provide a free educational evaluation license, but not with the full functionality of the larger versions (such as Monte Carlo simulation). In this way, Nova does have a niche to fill; it's a domain-specific tool for educators and scientists, without the associated overhead of license costs.

How does Nova work?

Nova is a Java application, which uses an internal scripting language, NovaScript, based on JavaScript. It features two primary canvases, one for displaying the model components, such as stocks and flows, which will be described in further detail below, and one for displaying results, usually in the form of tables or graphs. Below is a screenshot of a blank project in Nova, with a key to the various UI elements overlaid:

Figure 2.1

The key overlaid on the screenshot reads as follows:

This is the primary component display canvas. Terms, flows, and stocks are dropped here to create the model, along with other components.

This is the output display canvas. Tables and graphs appear here when they are created.

Above are the clock controls, adjusting the model's running time and granularity.

Above those are the execution commands, to run or stop the model.

This is the console, for manual output or (usually) error messages.

This list contains references to submodels, used to create more complex interactions. In this instance, "Untitled" is just the name of the blank project shown here.

This window contains global definitions and code to be used with the whole model. Above it are menus to add components.

The Lotka-Volterra Model

The Lotka-Volterra model of population density dependence was first proposed by Alfred J. Lotka in 1910. It takes the form of two equations, which describe the growth rates of two populations, conventionally called "Predator" and "Prey", although the model is also applicable to other situations. The growth rate of each population (dx/dt and dy/dt) is dependent on the size of both populations (x and y), and a number of model parameters (here, α, β, δ, γ) that define how the populations interact; β and δ specifically relating to the mechanics of predation, and α and γ defining how fast the populations would grow/decline in absence of the other.
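For reference, the Lotka-Volterra model in its standard textbook form is given below. The symbol names follow the common convention and are an assumption here, since the equations in the original appear only as a figure: x is the prey population, y the predator population, α the prey growth coefficient, β the predation rate coefficient, δ the coefficient converting predation into predator births, and γ the predator death coefficient.

```latex
\begin{align}
\frac{dx}{dt} &= \alpha x - \beta x y \\
\frac{dy}{dt} &= \delta x y - \gamma y
\end{align}
```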



These equations naturally create a periodic solution, where the populations vary in harmonic motion; as the prey population begins growing, so do the predators, and as the predators grow more populous, the prey begin to decline again. This decline gradually causes the predators to decline, which reduces the pressure on the prey population, which begins growing again. Figure 2.2 is an output from Nova showing this pattern.

Implementing the Lotka-Volterra Model in Nova

I chose the Lotka-Volterra model to implement initially, because I felt that as a simple model (only two equations, and two values), it would serve as an excellent introduction to the components of Nova, both to me, through the online tutorials (www.novamodeller.com), and in this project report.

The first, and primary, component of Nova, around which most of the others revolve, is the Stock. This represents a value, a variable, a quantity which is expected to change through the evolution of the system. In this model (hereafter called the LV model), Stocks will be used to represent the two population quantities. Figure 2.3 shows some Stocks in Nova.

Figure 2.3

These green squares are dropped onto the drawing canvas from a menu. Right-clicking on one brings up a dialogue where the initial value can be set, and two important characteristics defined; there are checkboxes for "discrete" and "non-negative". The latter does simply what it describes; it prevents the stock from having a negative value. In this case that makes perfect sense, so it is left checked. In other models, say, financial models, it would be useful to allow a stock to go negative. Checking "discrete" means the value is treated as a sequence rather than a stock; its value is defined at every time step, but not in between (the evaluation of the flow is treated as a "next value" rather than a "change in value"; more on flows below). This might seem to make sense for this model: you can't, after all, have a fraction of an eagle or a fox. However, this model is units-independent: a value of 1 prey doesn't necessarily equate to only one fieldmouse, and, as mentioned above, the model can also apply to situations which are more intuitively continuous in nature rather than discrete, like economics. Additionally, the equations we have for LV are in the form dx/dt=f(x), which indicates a continuous function, rather than E(x)=f(x-1), which is the form one would need for a discrete function. So this checkbox is left unchecked.
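The difference between the two checkboxes can be sketched in a few lines of Python. This is only an illustration of the behaviour described above, not Nova's actual implementation; the function names and the simple one-step update are assumptions.

```python
def step_continuous(value, net_flow, dt, non_negative=True):
    """Continuous stock: the flow is a rate of change, scaled by dt."""
    new_value = value + net_flow * dt
    return max(new_value, 0.0) if non_negative else new_value


def step_discrete(net_flow):
    """Discrete stock (a sequence): the flow is simply the next value."""
    return net_flow


# A continuous stock of 10 with a rate of -4 per unit time, dt = 0.5.
print(step_continuous(10.0, -4.0, 0.5))   # 10 + (-4 * 0.5) = 8.0
# A large outflow is clamped at zero when non-negative is checked ...
print(step_continuous(1.0, -4.0, 0.5))    # max(-1.0, 0.0) = 0.0
# ... but not when it is unchecked.
print(step_continuous(1.0, -4.0, 0.5, non_negative=False))  # -1.0
# A discrete stock just takes the flow's value as its next value.
print(step_discrete(7.0))                 # 7.0
```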

The next important component in Nova is the Term.

Figure 2.4

These components are used to store values (or expressions based on other parts of the model) for use in calculations. For this example, they're just going to be used to store the parameters, PreyBirthCoefficient and PredDeathCoefficient, that define how the populations grow or decline in absence of the other. As with Stocks, right-clicking them accesses their properties window, where the expression indicating their value is defined.

Lastly, we have the most important basic component of all; the Flow.

Figure 2.2



Figure 2.5

Flows are how Stocks change. The cloud symbols can be dragged independently of the hourglass, which represents the flow itself. When dropped onto a stock, that side of the flow is connected to it. On the flow's right-click window, the expression detailing how much the stock should change is input, and additionally, there is a radio button choice between "uniflow" and "biflow", as seen in Figure 2.6. Uniflow creates the blue arrows seen in Figure 2.5, whereas biflow would add an additional arrow to the left, indicating that it would allow movement both ways through the flow. For this example, I am going to be tracking births and deaths with separate flows, so only uniflows are needed. If I were to use only one flow, then I'd set it to biflow, so that the value of the stock would correctly increase when the population change rate is positive, and decrease when negative.
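One way to picture the difference is sketched below in Python. It assumes that a uniflow simply disallows movement against the arrow (treated here as clamping a negative rate to zero), which is an assumption rather than a documented detail of Nova; the function name `flow_amount` is hypothetical.

```python
def flow_amount(rate, dt, biflow=False):
    """Amount moved through a flow in one time step.

    Assumption: a uniflow ignores negative rates, a biflow allows them.
    """
    if not biflow and rate < 0:
        rate = 0.0
    return rate * dt


population = 100.0
net_change_rate = -6.0  # births minus deaths, negative in a decline

# With a single uniflow the decline would be lost ...
print(population + flow_amount(net_change_rate, 1.0))               # 100.0
# ... whereas a single biflow lets the stock decrease as well.
print(population + flow_amount(net_change_rate, 1.0, biflow=True))  # 94.0
```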

Figure 2.6

The annotation on Figure 2.6 reads: In this pane, the expression for evaluating the magnitude of the flow is entered, as can be seen here. Since this will be the flow for determining how many prey are born, that part of the equation is entered here.

Both sides of the flow need not be connected to stocks; in the example, we don't need to track the populations' dead members, nor do the new ones need to come from any sort of pre-existing pool, so we can just leave the clouds on one end of each flow hanging. Figure 2.7 shows the LV model after connecting the flows correctly. The red arrows indicate where the formula relies on a certain value; for instance, the birth rate of the predators depends on the death rate of the prey (in this model, prey only die when predated by a predator, and predators need to actually predate prey in order to breed, so their birth rate is dependent on their predation rate). I've also added some additional terms that are needed; PreyBirthCoefficient and PredatorDeathCoefficient represent α and γ from the original equations, but terms are also needed to represent β and δ, the predation rate coefficient and the predation to predator birth conversion coefficient, although I've named them to be consistent with the scheme I already had here:

Figure 2.7

So, simply by looking at this diagram, even without knowing the formulae behind each flow, it can be intuitively seen that PreyBirth depends on Prey and PreyBirthCoeff, PreyDeath depends on Prey, PreyDeathCoeff, and Predator, PredBirth is based on PreyDeath and PredBirthCoeff, and so on. The clouds are the open beginning and ends of the population flow, so it's easy to see that births flow into the stock from nowhere, and deaths flow out, into nowhere. The only thing still missing is a way to examine this data, and for that, a graph is ideal. We'd like to compare the growths of both populations over time, and while we could use two graphs for that, in Nova, we can actually plot them both to the same graph:

Figure 2.8

The little square on the component canvas represents the graph there; the line to the two stocks is not drawn by the user; rather, on the properties window of that component, all the possible graphable values are listed, and here I have chosen both "prey" and "predator" from that list. We could also graph the values of the flows, or the terms (although that would be pointless here, as they are constant). Also on that properties window are various options to do with the scale of the graph, graph type, and so forth. The actual graph window is created automatically, in the results canvas, where it can be moved around. As the model has not actually been run yet, the graph is blank.

Tables, another Nova component, work very similarly to graphs, but present the data as raw tabulated results instead.

As shown in Figure 2.1, above, and Figure 2.9, below, the controls for running the model are above the canvases:



Figure 2.9

The Start and End input boxes define how long the model should run for, and Dt defines the granularity. Method can be changed to a couple of different integration methods, presumably useful for some models. To run the model, first it must be captured, with the Capture button. This converts all of the components on the canvas into NovaScript form (which can be viewed with the button). NovaScript is the internal script representation of the model, as seen in Figure 2.10, which shows the part of the NovaScript with the model definitions in it, including the values for the terms I entered, and all the flow equations. There is more in that window, but a lot of the remaining NovaScript relates to the display of all the components (their position on the canvas, for example), which I feel it unnecessary to showcase and explain, as it is not important for the model's functionality.

NovaScript is actually a domain-specific form of JavaScript, which allows for the use of JavaScript maths functions, more complex formulae including if statements (particularly their inline variant), any JavaScript library, and so forth. Having this underlying tool will be very useful later on when more complex goals arise.

After this data has been captured into NovaScript, the next thing to do is Load it, either from the NovaScript window or the main one. This loads and compiles the NovaScript code, and is important for catching any syntax errors one may have entered into the formulae. Finally, the model must be executed, either via the Exec buttons, which run the whole model from start to end, or via the Init and Step buttons, the former of which begins the model without running it, the latter of which advances the model one time step. The Stop button can be used to cease calculation mid-simulation, very useful if one has misjudged the length of time a simulation will take to complete and set an unnecessarily long simulation time.

The results of this LV model execution can be seen in Figure 2.9 above; the two lines, red and blue, on the graph, indicate the development of the populations over time. Although at first glance it may seem that they achieve a similar maximum population, note that Nova plots each on its own vertical scale, so as to retain the highest resolution view of each graph. It is clear to see, though, the periodic motion expected from the equations that we started with, and the harmonic relationship between the two populations.
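For comparison with a traditional language, the same stock-and-flow structure can be sketched in Python. The coefficient values, the initial populations, and the use of a simple Euler step are assumptions made for illustration; they are not the values or the integration method used in the Nova model above.

```python
def simulate_lv(prey, predator, start=0.0, end=50.0, dt=0.01,
                prey_birth_coeff=1.0, prey_death_coeff=0.1,
                pred_birth_coeff=0.5, pred_death_coeff=0.5):
    """Euler integration of the Lotka-Volterra stocks and flows."""
    history = [(start, prey, predator)]
    steps = int(round((end - start) / dt))
    for i in range(1, steps + 1):
        # Flows, evaluated from the current values of the stocks and terms.
        prey_birth = prey_birth_coeff * prey
        prey_death = prey_death_coeff * prey * predator
        pred_birth = pred_birth_coeff * prey_death  # depends on predation
        pred_death = pred_death_coeff * predator
        # Stocks change by (inflow - outflow) * dt and are kept non-negative.
        prey = max(prey + (prey_birth - prey_death) * dt, 0.0)
        predator = max(predator + (pred_birth - pred_death) * dt, 0.0)
        history.append((start + i * dt, prey, predator))
    return history


history = simulate_lv(prey=10.0, predator=5.0)
peak_prey = max(h[1] for h in history)
peak_predator = max(h[2] for h in history)
print("peak prey:", round(peak_prey, 2))
print("peak predator:", round(peak_predator, 2))
```

The `start`, `end`, and `dt` arguments play the role of the clock controls, and each pass through the loop corresponds to one press of the Step button.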

Beyond Lotka-Volterra: other components and functionality

This model is relatively simple, and so uses little of the extended functionality of Nova. Some of that I will explore in Chapter 4, as I attempt to implement a much more complex model, but I will outline some more components here briefly:

Commands are components that solely contain code to be executed each time step.

Sliders and Spinners are very much like Terms which can't hold expressions, only values, with the addition that the representation of them in the component canvas includes a graphical slider or spinner which can be used to adjust the value without entering the properties pane.

Labels are just that; labels. Used to provide annotations to diagrams.

Chips and pins are components used within the context of submodels, something that seems very important to Nova's power. Pins act as either outputs or inputs, and if a model has at least one output, then it can be used as a submodel within a larger supermodel, as a chip, a black box model that can be hooked up via the pins to components in the supermodel.

Agent containers; these are beyond the scope of this project, but in essence, allow for models where individual submodels (agents) act with individual scope relative to other submodels; for instance, simulating an actual population of animals with some element of randomised activity.

Batch processing. Each term can be configured as a batch element, which allows its value to iterate through a range of values over a simulation.

Code chips (confusingly named; unrelated to chips and pins) which contain raw code to be executed.

Integration with the statistical programming language R.
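Of the features listed here, batch processing has a direct analogue in ordinary code: a parameter sweep. The Python sketch below reruns a toy model once per value of one term; the model, the term name `growth_rate`, and the range of values are hypothetical, and this is not a description of how Nova implements batch elements.

```python
def run_model(growth_rate, initial=100.0, steps=10):
    """Hypothetical toy model: compound growth over a fixed number of steps."""
    value = initial
    for _ in range(steps):
        value += value * growth_rate
    return value


def batch(values):
    """Run the model once for each value of the batch term."""
    return {rate: run_model(rate) for rate in values}


sweep = batch([0.0, 0.05, 0.10])
for rate, final in sweep.items():
    print(f"growth_rate={rate:.2f} -> final value {final:.2f}")
```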

Figure 2.10



Chapter 3 Background on TopModel

My primary source on TopModel is "A dynamic TOPMODEL" (K. Beven & J. Freer, Hydrological Processes 2001), although it is somewhat over my head as a non-environmental-scientist. I shall here attempt to describe TopModel as best I can. All diagrams are courtesy of Prof. Gordon S. Blair and Dr. Yehia El-Khatib ("A Cloud-based Virtual Observatory for Environmental Science", OpenWater symposium 2011/4/19, and "Building a Cloud Infrastructure for a Virtual Environmental Observatory", The American Geophysical Union (AGU) Fall Meeting, 2012/12).

TopModel describes the way that rainfall falls onto an area of terrain, and, based on the surface of that terrain, including its inclination, and the saturation of the ground, how that rain either permeates to the water table, or flows through the ground, or over it, to the channel or basin.

Figure 3.1

At the beginning of a simulation, the ground is dry, the water table low.

Figure 3.2

In Figure 3.2, the rain begins to fall, permeating the so-called root zone. This begins to saturate the soil.

Figure 3.3 Figure 3.4

Figures 3.3 and 3.4 show how the water permeates through to the water table, causing it to rise, and in the shallow terrain, rise above the terrain.

Figures 3.5 and 3.6 show the overland flow, streamlets forming, but then ceasing as the saturated ground begins to transfer through to the channel, and the water table recedes.


Eventually, as the rain stops, the root zone begins to dry out again as the saturation begins to return to normal.

TopModel was first published in 1979 by Beven & Kirkby ("A physically based, variable contributing area model of basin hydrology"). It has had a profound impact on hydrology, particularly on the way rainfall and runoff are modelled and studied, and is still in use today.



Chapter 4 Implementing TopModel

Challenge 1: Understanding the source code

I began this project without a great deal of understanding about how TopModel works, how it is structured, or what its various elements were; not being an actual environmental scientist, I could not approach this aspect of the project as an actual user could. Nevertheless, what I did have access to was the source code for a C implementation of TopModel (available at https://source.ggy.bris.ac.uk/wiki/Topmodel). My hypothesis is that a translation from C to Nova, while not emulating the actual implementation process a target scientist would use, would nevertheless expose me to many of the same challenges and tests of the system as they would face. I am, after all, attempting to implement a real hydrological model, the same goal, and so the end result shouldn't differ too greatly. I will take into account in my conclusions the fact that I am approaching this from a different initial standpoint than the target audience.

I began by reading through the source code of the main methods (in topmodel.c and core_topmodel.c). I noted that there are a lot of parameters to TopModel (an overview of them can be found on https://source.ggy.bris.ac.uk/wiki/Running_Topmodel). I also noted that a large portion of the calculation was done in a loop over an array of area index classes (nidxclass). Realising this, I decided that this was an ideal candidate for the submodel functionality of Nova; I could design a submodel with everything inside that loop in it, and run many copies of that submodel to simulate the loop, with input pins to adjust the behaviour for each iteration.
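The idea of replacing the loop with copies of a submodel can be sketched as follows. This Python is purely illustrative: `index_class_step`, its arguments, and its arithmetic are hypothetical stand-ins and do not reproduce the calculations in core_topmodel.c. The point is only the shape of the translation, in which the loop body becomes one reusable unit and the per-iteration values become its inputs (pins).

```python
def index_class_step(area_fraction, rainfall):
    """Stand-in for the loop body: one 'submodel' with two input pins."""
    return area_fraction * rainfall  # placeholder arithmetic only


def loop_version(area_fractions, rainfall):
    """The C-style approach: one loop over every index class."""
    total = 0.0
    for fraction in area_fractions:
        total += index_class_step(fraction, rainfall)
    return total


def submodel_version(area_fractions, rainfall):
    """The Nova-style approach: one submodel copy per index class."""
    copies = [lambda r, f=fraction: index_class_step(f, r)
              for fraction in area_fractions]
    return sum(copy(rainfall) for copy in copies)


fractions = [0.2, 0.3, 0.5]  # assumed values for three index classes
print(loop_version(fractions, 4.0))
print(submodel_version(fractions, 4.0))
```

Both versions compute the same total; the difference is that the second makes each iteration a separate, individually configured object, which is closer to how copies of a submodel would sit on a canvas.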

I also noted that there was a fairly large function, get_f(), which seems simply to be a mathematical calculation (albeit based on many inputs), and which could easily be turned into a submodel (or simply a code chip).

I noted that on top of the large number of parameters, TopModel also requires a large dataset to operate on (the rainfall measurements, particularly). This I obtained, along with suggested values for the parameters, from one of my project supervisors, Dr. Yehia El-Khatib, who had worked with TopModel before.

Challenge 2: How to begin tackling such a large problem

At this point, I did some research into how to deal with models that had so much going on in them as TopModel did, as I was somewhat intimidated by TopModel's complexity, and could not see any way to begin an implementation. A breakthrough in my understanding came when I read through an example model on the Nova website; a model of a drunken random walk (http://www.novamodeler.com/model-library/drunken-random-walk/). Although simple, the way this model used stocks to represent variables rather than quantities (in this case, components of a velocity vector) helped me grasp my way towards an understanding of how I might begin to tackle this problem. I had already created Terms to represent all the model's parameters, since that was a comparatively simple task. Understanding that I could represent all the variables used in the C methods with Stocks, I decided to do just that, and created a whole screen's worth of Stocks, matching the variable names to those within the C code; note the use of a Nova label component to reference descriptions for all the parameters:

Figure 4.1


For the most part, this was an easy task: because the C code was well written, all the important variables were defined in a structure rather than scattered through the code.

Challenge 3: Inputs and outputs

I already had a large amount of data to be used as input, in the form of a .csv file with five columns. Two of these, date and time, I knew could be ignored for these purposes: Nova is time-unit agnostic, so all I needed to do was make sure that the value for dt (in the Term) was correct with reference to what I set one timestep to be (in this case, 15 minutes, since that was the granularity of the data I had). The other three columns were the important input data: Flow, Rain, and PET. Although I was not aware of what these were, I knew they would be important in the model, so I created Terms to represent them (note: these could have been Pins, since I was already looking at this module being a submodel; handily, Nova includes a menu function to convert Terms to Pins). Already I was foreseeing a problem: how was I going to get data from a CSV file into Nova? I set that aside for now and focussed on the output; it looked like (from output.c) there were five output variables measured, so I attached graph components to those:

Figure 4.2

Challenge 4: Flows

I knew that every variable was important to the calculation in some way; the next challenge was to pore through the code and discover how each variable interacted with the input, and with the other variables. I began in a methodical manner, reading through the C code top to bottom, focussing on each expression one by one and implementing it as a flow in the diagram. I quickly came to the conclusion that doing it in this manner only made sense using Nova's discrete mode. These stocks should all be sequences instead, because the C code was implemented as one grand loop per time step, in which a discrete operation on each variable occurred each iteration, rather than as expressions describing the relationship between time and change, as was demonstrated in the Lotka-Volterra model. So as I went, I changed the type of each Stock to a sequence (which changes it to purple, as can be seen below). I also added a placeholder for the get_f() nested function, which I reasoned could be implemented later.


Figure 4.3
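The difference between a continuous Stock and a discrete sequence, as described above, can be illustrated with a short Python sketch. It assumes a simple Euler update for the continuous case; this is an assumption made for illustration, not a description of how Nova integrates its Stocks, and the function names and rates are hypothetical.

```python
def euler_stock(initial: float, rate, dt: float, steps: int) -> list[float]:
    """Continuous-style stock: value changes by rate(value) * dt each step."""
    values = [initial]
    for _ in range(steps):
        values.append(values[-1] + rate(values[-1]) * dt)
    return values


def discrete_sequence(initial: float, update, steps: int) -> list[float]:
    """Discrete sequence: the next value is computed directly from the last."""
    values = [initial]
    for _ in range(steps):
        values.append(update(values[-1]))
    return values


# Stock draining at 10% of its level per unit time, integrated with dt = 0.5.
stock = euler_stock(100.0, rate=lambda s: -0.1 * s, dt=0.5, steps=2)

# Sequence that loses 10% of its level on every iteration; no dt is involved.
sequence = discrete_sequence(100.0, update=lambda s: 0.9 * s, steps=2)
```

The stock's expression describes a rate of change that is scaled by dt, whereas the sequence's expression gives the next value outright, which is closer to how a C loop body assigns to its variables once per iteration.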

At this point, the model was beginning to take shape. I had all of the dependencies implemented for some of the output variables, such as fex. Here, I felt the time was right to begin testing the model, to see what sort of output it gave. I wasn't expecting anything near correct, as I had not yet finished implementing each expression one by one. Nevertheless, this is where I butted heads with the input problem again, and it had me well and truly stumped. I had a CSV file with over 3000 records, but there seemed to be no way to import it into Nova: no version of a Term that allowed one to reference a file, or even to hold multiple values. Based on what I already knew, the only solution I could come up with was to manually create 3000 Terms for each input column, which would have taken an excessive amount of time and screen space; clearly not feasible. I felt at this stage that I had discovered a major flaw in Nova.

Challenge 5: Inputting data sets

After some thought, research, and a meeting with my supervisors, I hypothesised three paths of action. One, to find some way of importing the CSV file through NovaScript, since I expected JavaScript to have IO capabilities. Two, to programmatically convert the CSV file in some manner to a NovaScript array and paste it into the Lambda view of the model. Three, to seek out the Plugin API for Nova, which is referenced in its documentation, and write a plugin that would allow me to read in a file and access it through a special component. The Nova application itself is written in Java, a language I would have been confident writing such a plugin in.
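The second option, converting a CSV column into an array that could be pasted into a script, might look something like the following Python sketch. The header names, the sample rows and the helper `column_to_js_array` are all made up for illustration; the output is ordinary JavaScript array syntax, and whether NovaScript would accept it unchanged is an assumption.

```python
import csv
import io


def column_to_js_array(csv_text: str, column: str, var_name: str) -> str:
    """Render one CSV column as a JavaScript array declaration."""
    reader = csv.DictReader(io.StringIO(csv_text))
    values = [float(row[column]) for row in reader]
    body = ", ".join(repr(v) for v in values)
    return f"var {var_name} = [{body}];"


# Made-up sample rows in a five-column layout (date, time, Flow, Rain, PET).
sample = (
    "Date,Time,Flow,Rain,PET\n"
    "2013-01-01,00:00,1.5,0.0,0.01\n"
    "2013-01-01,00:15,1.4,0.2,0.01\n"
)
snippet = column_to_js_array(sample, "Rain", "raindata")
```

With over 3000 records per column the generated line would be long, which is one reason a file-based solution is preferable to pasting data into the model.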

However, before going ahead with any of these possibilities, I emailed the developers (via http://www.novamodeler.com/contact/), detailing my problem. I was happy to receive a rapid response from Dr. Andy Lyons, who said he'd pass it on to their programming team. The next day, a new stable build of Nova was published (http://www.novamodeler.com/blog-release4/), the release notes of which hinted at a new data type called a Run, which sounded like it could solve my problem. It was not long until I heard from the Nova team again, this time from Professor Richard Salter, who detailed a general solution to my problem using the new functionality added in this release. It required splitting my CSV file into three separate files, but beyond that was not complicated:


Figure 4.4

As shown above, the Term rain_i that I had created as a placeholder was changed to refer instead to the index of raindata equal to the current time step. Raindata itself was created in the program window, simply by creating a variable definition and invoking the new newRunData() function, which does all of the background work of loading the values from the comma-separated file.
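The one preparatory step, splitting the five-column CSV into separate files, can be done with a few lines of Python. This is a sketch under stated assumptions: the header names, the output file names and the one-value-per-line layout are guesses for illustration, since the exact format that newRunData() expects is not given here, and the sample rows are made up.

```python
import csv
import tempfile
from pathlib import Path


def split_columns(source: Path, columns: list[str], out_dir: Path) -> list[Path]:
    """Write each named column of `source` to its own one-value-per-line file."""
    with source.open(newline="") as handle:
        rows = list(csv.DictReader(handle))
    written = []
    for name in columns:
        target = out_dir / f"{name.lower()}.csv"
        target.write_text("".join(f"{row[name]}\n" for row in rows))
        written.append(target)
    return written


# Demonstration on made-up data in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    source = tmp_dir / "input.csv"
    source.write_text(
        "Date,Time,Flow,Rain,PET\n"
        "2013-01-01,00:00,1.5,0.0,0.01\n"
        "2013-01-01,00:15,1.4,0.2,0.01\n"
    )
    outputs = split_columns(source, ["Flow", "Rain", "PET"], tmp_dir)
    output_names = [path.name for path in outputs]
    rain_lines = (tmp_dir / "rain.csv").read_text().splitlines()
```

The date and time columns are simply left out, which matches the earlier observation that the time step is fixed by dt rather than read from the data.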

Challenge 6: Obtaining meaningful output

With my data input functioning, and most of the variable calculations implemented, I was beginning to worry that my model refused to give any meaningful output; all output graphs consistently showed values of zero across the whole period. I spent a long time trying to hunt down bugs in my code, in the areas where I thought problems could be, but increasingly I found that with so many variables and interrelations, it was very hard to keep an overview of what everything was doing. This was made worse by my poor understanding of the model's structure at the beginning of my implementation.

Reflections on my method

Simply put, I do not think I approached the implementation of TopModel from the correct perspective. Partially, this is due to my coming at it from a coding angle rather than a modelling one; for instance, attempting to translate every single expression in the C code into an expression in Nova. Had I the time now, I'd start over with a fresh approach. It's clear, I think, that many of the variables represent a quantity of water in some way, and that the expressions and flows indicate water changing state as the model develops. If I had a clearer view of exactly what is happening here, I feel I could much more easily represent this with the system of Stocks and Flows Nova provides, rather than treating the stocks as variables to which the same expressions could be applied as in the original C. My hypothesis at the beginning of this implementation chapter was that implementing it from a code-first perspective would result in essentially the same model as an idea-first approach, whereas I think the tangled mess I have ended up with shows that hypothesis to be false.

On the other hand, I do believe the problem I had importing data into the model was a genuine one, as shown by the developers' need to release new functionality in order to accommodate it! So in some ways, I do think I succeeded in confronting Nova with some of the real problems it would face when used in a practical environmental science project.


Chapter 5 Conclusion

Was Nova itself an inherently poor choice for the implementation of TopModel?

Without a successful implementation of TopModel to demonstrate, the question must be posed: if Nova cannot be used to implement a well-known and important model, is it inherently a failure as a platform for modelling in general? The answer to this is no. I feel that my failure to get results from my TopModel implementation must be due to an error in my code, or to a section I implemented incorrectly somewhere. Had I the opportunity to begin afresh, or were I instead approaching it as someone who understood the underlying assumptions and structure of the model, I do believe Nova would have been able to provide a suitable environment for TopModel. As such, I do feel that Nova can be regarded as a strong candidate when choosing a modelling platform.

Does Nova meet the criteria laid out above?

In Chapter 1, I outlined three primary criteria against which I would analyse Nova, and three secondary criteria which are important for the domain-specific use of Nova, that being educational and research use in the environmental sciences. I repeat them here:

Usability. Functionality. Simplicity. Cost. Speed. Adaptability for Cloud networks.

Through the process of implementing TopModel, I feel I can comment well on all of these criteria; in that respect, the project has been a great success.

First I will address the simpler secondary criteria, then the more important primary ones.

Cost

Nova is free for educational and research uses. As shown in chapter 2, this is significantly less than its major graphical competitors, and no more expensive than more technically difficult solutions like C or R.

Speed

I generally found Nova to be fast and efficient. With the Lotka-Volterra model, it could render a hundred time steps fast enough that you could use the slider functionality to see changes in results almost live, and with TopModel, it only took around ten seconds to process over three thousand records. Maybe with more complex relationships this would slow to unfeasible times, but I do not think that it is any slower than a native-code-based solution.

Cloud access

While Nova has no native cloud functionality, it is itself a Java application, and one that parses a script to create a runnable module. I am sure that with the right Java middleware a Nova interpreter could be created that takes advantage of cloud parallel processing.

Usability

On a basic level, Nova is very easy to use. The ability to drop components down and connect them to create simple relationships is intuitive, and the visual presentation of the model lends itself well to a modelling platform. Moreover, as I found while experimenting with the Lotka-Volterra model, it is a big advantage to be able to change parameters and instantly see them update on a graph. Very little of the core functionality is obscured, and with only a selection of very simple components, complex systems, like the framework I implemented for TopModel, can be quickly created. The Stock and Flow representation of states is not unique to Nova (it is also featured in the primary competitors I mentioned in chapter 2), but I find it an excellent choice for a modelling platform like this.

Additionally, the simple method of converting Terms to Pins in order to drop one model into another as a submodel is ingenious, and opens up many solutions that would otherwise be unfeasible in this platform. My initial plan for the final stages of my TopModel implementation is one example: had I needed to copy and paste the whole submodel I was creating for every index class, I would not have considered Nova at all.


Functionality

Here Nova had some problems; notably, the inability to use a large data set as input values. The fact that the developers released a patch supporting this functionality soon after I contacted them about it must be considered a strong point in their favour, and I am very grateful for the work they did. This also demonstrates that Nova is a tool still under constant development, which means that any further functionality which is missing might well be upcoming. In practical terms, it wasn't the lack of functionality which prevented me from gaining any results from my model, but rather an incorrect approach to the creation of the model, which made maintaining it very hard.

Simplicity

Unfortunately, this is the big point on which Nova fails, for me. I felt that, while simple operations and simple models were very easy to create, intuitively going beyond a certain point was, semantically, very hard. Much can be done with NovaScript to create interesting and exciting functionality within models, but in essence it still requires knowledge of a programming language to do so (in this case, JavaScript). The solution to the data input problem I faced involved some JavaScript. Doing much beyond basic maths inside expressions required knowledge of (or access to documentation for) JavaScript functions. Now, while these are all readily available, I'm not sure it really comes across as an advantage over just using a pure language, given that the idea behind platforms like Nova is to replace such general solutions.

In conclusion, it did not take me very long to pick up the basics of Nova, but the grander scale of TopModel, and the grasp of the platform I needed to have a better chance of implementing it successfully, required, I think, a longer learning curve. Looking at some of the more complex models available for Nova, I can see that this extends even further; in essence, while Nova has very simple front-end functionality (ideal for educational uses), a lot of the power that makes it a research-level tool comes from the powerful back-end, which unfortunately I can't see as being much easier than learning an existing domain-specific language like R.

That said, there are still some major advantages it has to offer; it is simpler to view models in Nova. Even with the result I obtained with TopModel, I could, simply by looking at it, gain a more intuitive understanding of what was going on than with the C code. This was also true of the Lotka-Volterra model; two equations which might not mean very much to someone without an understanding of calculus are represented as two very simple flows of population, flowing in, then flowing out, with clear arrows to indicate dependencies.

Additionally, I feel it is a lot easier to change a Nova model, to take it and deconstruct it. The powerful submodels feature means chips can effectively act as black boxes, where the contents can be changed independently of the main model. Compare this to the C code for TopModel, where even individual methods like get_f() refer to globally stored variables, so that changing any one section would require extensive examination of the consequences through the whole code.

These strengths lead me to recommend Nova as a great tool for implementing a model from scratch. My experience with TopModel has taught me a lot about understanding where you're coming from before you can do something with a model, and I'd definitely NOT recommend using any graphical language for a conversion of procedural code.

Future Developments

Overall, I feel that Nova is a strong candidate for use within the computational sciences. However, I can only conclude that, as a tool still under development, too much of the functionality is hidden behind NovaScript, which, while strong, has no definite advantages over existing solutions aside from existing inside the Nova framework. I was impressed to be presented with a solution to missing functionality so soon after discovering it, and can only hope that the Nova developers continue to bring that functionality forward to the intuitive graphical interface that is Nova's forte.

I began this report by discussing the problem of environmental scientists having to learn procedural languages in order to use models in their work. I am ending it with the understanding that, if Nova continues to garner interest and support (as, for example, with a recent Google Tech Talk on it: http://www.novamodeler.com/pres/20130510_google-tech-talk/), then while it may not entirely obviate the need for new environmental scientists to learn a more formal language eventually (primarily because so much existing content is written in R, C, Matlab, and so on), Nova is a good intermediate tool for scientists wanting a primary platform to work with on their own new projects.


Bibliography

http://www.novamodeler.com/

including:

Nova: An Interactive Graphics-Scripting Platform for Education and Computational Research, by W. Getz, from Nova, A Google Tech Talk, Mountain View, CA, May 10, 2013

Nova: A modern platform for system dynamics, spatial, and agent-based modelling, by Richard M. Salter, International Conference on Computational Science, ICCS 2013

http://cran.r-project.org/web/packages/topmodel/topmodel.pdf

A dynamic TOPMODEL, by K. Beven & J. Freer, Hydrological Processes 2001

A physically based, variable contributing area model of basin hydrology, by K. Beven & J. Kirkby, Hydrological Sciences 1979

A Cloud-based Virtual Observatory for Environmental Science, by Gordon S Blair & Yehia El-khatib, OpenWater symposium 2011/4/19

Building a Cloud Infrastructure for a Virtual Environmental Observatory, by Yehia Elkhatib et al, The American Geophysical Union (AGU) Fall Meeting, 2012/12

Acknowledgements

Many thanks to my project supervisors, Prof. Gordon Blair and Dr. Yehia El-Khatib of Lancaster University, and also to Prof. Richard Salter and Dr. Andy Lyons for their assistance.

Working Documents can be found at http://www.lancaster.ac.uk/ug/deej1/fyp2/