COPYRIGHT
Copyright and Reprint Permission: Abstracting is permitted with credit to the source. Libraries are permitted to photocopy for private use. Instructors are permitted to photocopy, for private use, isolated articles for non-commercial classroom use without fee. For other copies, reprint, or republication permission, write to IIIS Copyright Manager, 13750 West Colonial Dr Suite 350 – 408, Winter Garden, Florida 34787, U.S.A. All rights reserved. Copyright 2013. © by the International Institute of Informatics and Systemics. The papers of this book comprise the proceedings of the conference mentioned on the title and the cover page. They reflect the authors' opinions and, with the purpose of timely dissemination, are published as presented and without change. Their inclusion in these proceedings does not necessarily constitute endorsement by the editors. ISBN: 978-1-936338-96-2
International Conference on Complexity, Cybernetics, and Informing Science and Engineering: CCISE 2013
ADDITIONAL REVIEWERS (Reviewers who contributed by reviewing at least one paper)
Affenzeller, Michael Heuristic and Evolutionary Algorithms Laboratory Austria Anuar, Nor Badrul University of Malaya Malaysia Arteaga Bejarano, José R. University of the Andes Colombia Aveledo, Marianella Simon Bolivar University Venezuela Bangert, Patrick Algorithmica Technologies USA Bangyal, Waqas Iqra University Islamabad Pakistan Bönke, Dietmar Reutlingen University Germany Bulegon, Ana Marli Federal University of Rio Grande do Sul Brazil Carnes, Patrick Kirtland Air Force Base USA Caron-Pargue, Josiane University of Poitiers France Chen, Jingchao DongHua University China Chen, Zengqiang Nankai University China Cheng, Zhengdong Texas A&M University USA Cho, Vincent Hong Kong Polytechnic University Hong Kong Coffman, Michael G. Southern Illinois University Carbondale USA Cunha Lima, Guilherme Rio de Janeiro State University Brazil Darsey, Jerry A. University of Arkansas at Little Rock USA Debono, Carl James University of Malta Malta Demestichas, K. National Technical University of Athens Greece El Kashlan, Ahmed Arab Academy for Science and Technology Egypt Fallah, M. Hosein Stevens Institute of Technology USA Fikret Ercan, M. Singapore Polytechnic Singapore González Soriano, Juncal University Complutense of Madrid Spain Gonzalo-Ruiz, Alicia University of Valladolid Spain Gosukonda, Ramana Fort Valley State University USA Gotoh, Noriko University of Tokyo Japan Grasso, Giovanni University of Palermo Italy Grau, Juan B. Technical University of Madrid Spain Hanakawa, Noriko Hannan University Japan Hasenclever B., Carlos C. 
National Laboratory for Scientific Computation Brazil Hawke, Gary Victoria University of Wellington New Zealand Hespel, Christiane National Institute of Applied Sciences of Rennes France Hudson, Clemente Charles Valdosta State University USA Jia, Lei New York University USA Jinwala, Deveshkumar Sardar Vallabhbhai National Institute of Technology India Jirina, Marcel Academy of Sciences of the Czech Republic Czech Republic Johnson, Mark Army Research USA Jong, Din Chung Hwa University of Medical Technology Taiwan Kaivo-Oja, Jari Turku School of Economics Finland Karpukhin, Oleksandr Kharkiv National University of Radio and Electronics Ukraine Kasapoglu, Ercin Hacettepe University Turkey Kess, Pekka University of Oulu Finland
Lalchandani, Jayprakash Indian Institute of Technology Kharagpur India Langin, Chet Southern Illinois University USA Lau, Newman Hong Kong Polytechnic University Hong Kong Lunsford, Suzanne Wright State University USA Matsuda, Michiko Kanagawa Institute of Technology Japan Matsuno, Akira Teikyo University Japan McGowan, Alan H. Eugene Lang College the New School for Liberal Arts USA McIlvried, Howard G. National Energy Technology Laboratory USA Minoro Abe, Jair Paulista University Brazil Mussoi, Eunice Maria Universidade Federal do Rio Grande do Sul Brazil Neyra Belderrain, Mischel Instituto Tecnologico de Aeronautica Brazil Normand, Alain Brampton Flower City Canada Ostrowski, David Ford Motor Company USA Parker, Brenda C. Middle Tennessee State University USA Rajan, Amala V. S. Higher Colleges of Technology UAE Rodríguez, Mª Dolores University of Alcala Spain Rodríguez-M., Antonio Autonomous University of the State of Morelos Mexico Rosete, Juan Technological Institute of Queretaro Mexico Rutherfoord, Rebecca H. Southern Polytechnic State University USA Safia, Nait Bahloul University of Oran Algeria Sathyamoorthy, Dinesh Science Malaysia Savva, Andreas University of Nicosia Cyprus Schumacher, Jens University of Applied Sciences Vorarlberg Austria Segall, Richard S. Arkansas State University USA Shing, Chen-Chi Radford University USA Siemieniuch, Carys Loughborough University UK Stasytyte, Viktorija Vilnius Gediminas Technical University Lithuania Su, J. L. Shanghai University China Sun, Baolin Wuhan University China Tam, Wing K. Swinburne University of Technology Australia Woodthorpe, John The Open University UK Zeidman, Robert Zeidman Consulting USA Zmazek, Blaž IMFM Slovenia Zyubin, Vladimir Institute of Automation and Electrometry Russian Federation
ADDITIONAL REVIEWERS FOR THE NON-BLIND REVIEWING
(Reviewers who contributed by reviewing at least one paper)
Acharya, Sushil Robert Morris University USA Ahmed, Mahmoud National Authority for Remote Sensing Egypt Alhayyan, Khalid N. University of South Florida USA Andersson, Jonas Chalmers University of Technology Sweden Arabnia, Hamid R. University of Georgia USA Behr, Franz-Josef Stuttgart University of Applied Sciences Germany Beukes, Denzil R. Rhodes University South Africa Bots, Jan Nyenrode Netherlands Dobronravin, Nicolay St. Petersburg State University Russian Federation Dodig-Crnkovic, Gordana Malardalen University Sweden Effat, Hala National Authority for Remote Sensing and Space Sciences Egypt Erkollar, Alptekin University of Applied Sciences Wiener Neustadt Austria Feng, Yaokai Kyushu University Japan Foster, Harold The University of Akron USA Gallerano, Gianpiero ENEA Italy Gómez Santillán, Claudia Instituto Tecnológico de Ciudad Madero Mexico Hegazy, Mohamed Nagib National Authority for Remote Sensing and Space Sciences East Timor Jurik, Lubos Slovak Agricultural University in Nitra Slovakia Knoll, Matthias Darmstadt University of Applied Sciences Germany Laarni, Jari Technical Research Centre of Finland Finland Landero Nájera, Vanesa Universidad Politécnica de Apodaca Mexico Laux, Friedrich Reutlingen University Germany Niewiadomska-S., Ewa Warsaw University of Technology Poland Nikolic, Hrvoje Rudjer Boskovic Institute Croatia Otterstad, Ann Merete Oslo Akershus University College Norway Ramírez-Díaz, Humberto CICATA Mexico Rehak, Stefan Water Research Institute Slovakia Reis, Arsénio Universidade de Trás-os-Montes e Alto Douro Portugal Samant, Bhupesh Rhodes University South Africa Shoham, Snunith Bar-Ilan University Israel Simeonov, Plamen I. INBIOSA Germany Skrinar, Andrej Faculty of Civil Engineering Slovakia Smith, Debbie Poland High School USA Soshnikov, Dmitry Microsoft Russia Russian Federation Sowilem, Mohamed National Authority for Remote Sensing and Space Science Egypt Strikwerda, Johannes University of Amsterdam Netherlands Usmanov, Zafar D. 
Tajik Academy of Sciences Tajikistan Wolfengagen, Vyacheslav Institute for Contemporary Education JurInfoR-MSU Russian Federation Wu, Yingjie Fuzhou University China Zimmermann, Alfred Reutlingen University Germany
Foreword

Complexity, Cybernetics, and Informing Science/Engineering are increasingly being related on the conceptual, methodological, and practical dimensions. T. Grandon Gill's book (Informing Business: Research and Education on a Rugged Landscape) shows the strong and important relationships between Complexity and Informing Science (specifically academic informing), and the potential of these relationships in supporting the integration of academic activities: Research, Education, and Consulting or Real-Life Problem Solving. On the other hand, the concepts and tools of Cybernetics (Communication and Control) provide increasingly effective support for more adequate integrative processes in the context of Informing Science and Engineering, as well as in relating academic activities, more effectively and synergistically, among themselves and with professional practice and Society at large. The following diagram schematizes the reciprocal relationships among Complexity, Cybernetics, and Informing Science/Engineering, which, in turn, are supported by Informatics and Communications/Control technologies and tools.
References

Ershov, A.P., 1959, "Academician A.I. Berg on cybernetics and the perestroika in 1959", Microprocessor Devices and Systems, 1987, No. 3, p. 3 (in Russian); quoted by Ya. Fet in the foreword of The History of Cybernetics, ed. Ya. Fet, Novosibirsk: "Geo" Academic Publishers, 2006, 301 pp. (in Russian). Accessed September 14, 2009 at http://www.ithistory.org/resources/russia-from-the-history.pdf

Gill, T. G., 2010, Informing Business: Research and Education on a Rugged Landscape, Santa Rosa, California: Informing Science Press.

Hoefler, M., 2002, International Informatics Society Launched in Santa Fe; accessed August 16, 2009 at http://www.lascruces.com/~rfrye/complexica/d/IIS%20Launch%20PR.doc

Michlmayr, E., 2007, Ant Algorithms for Self-Organization in Social Networks; Ph.D. thesis, Vienna University of Technology, Faculty of Informatics, May 14, 2007; accessed August 16, 2009 at http://wit.tuwien.ac.at/people/michlmayr/publications/dissertation_elke_michlmayr_FINAL.pdf
Consequently, the purpose of the Organizing Committee of the International Conference on Complexity, Cybernetics, and Informing Science and Engineering: CCISE 2013 was to bring together scholars and professionals from the three fields (including scholars/professionals in their supporting tools and technologies) in order to promote and foster inter-disciplinary communication and interactions among them, oriented to foster the formation of the intellectual humus required for inter-disciplinary synergies, inter-domain cross-fertilization, and the production of creative analogies. There are many good disciplinary, specific, and focused conferences on any one of the major themes of CCISE 2013. There are also good general conferences, which have a wider scope and are more comprehensive. Each of these kinds of conferences has its typical audience. The CCISE 2013 Organizing Committee's purpose was to bring together both kinds of audiences, so that participants with a disciplinary and focused research orientation would be able to interact with participants from other related disciplines for inter-disciplinary communication and potential inter-disciplinary collaborative research. CCISE 2013 was organized in the context of the larger event "InSITE 2013: Informing Science + IT Education Conferences," organized by the Informing Science Institute in collaboration with Universidade Fernando Pessoa (UFP) in Porto, Portugal (a UNESCO designated World Heritage Site), through the proactive support of the conference Chairs, UFP Rector Salvato Trigo and Associate Professor Luis Borges Gouveia. The venue of the conference was the campus of Universidade Fernando Pessoa. The Organizing Committee received 33 submissions to be considered for presentation at the conference. 92 reviewers from 35 countries evaluated and commented on the submissions according to the traditional double-blind method, and 40 reviewers from 21 countries evaluated and commented on submissions according to a non-anonymous reviewing method.
Submissions were accepted if, and only if, they were recommended for acceptance by the majority of the reviewers of both methods. Acceptance under each method was a necessary condition but not a sufficient one: a submission had to be accepted as a result of each of the two methods. A total of 224 reviews were made, with an average of 1.70 reviews per reviewer and 6.79 reviews per submission. These proceedings include 13 accepted papers, which is 39.39% of the number of submitted articles. The following table summarizes the numbers given in this section.
# of submissions received: 33
# of reviewers who made at least one review: 132
# of reviews made: 224
Average reviews per reviewer: 1.70
Average reviews per submission: 6.79
# of papers included in the proceedings: 13
% of submissions included in the proceedings: 39.39%
We would like to extend our gratitude to:

1. The Program Committee members, who upheld the quality of this conference through their standing as scholars/researchers and their active support.
2. The 132 reviewers from 46 countries who supported the organizers in the selection process, by means of their evaluations and recommendations, and the authors by means of the constructive comments they made to the respective articles they reviewed.
3. The co-editors of these proceedings, for the work, energy and eagerness they displayed in
their respective activities.
4. Professor T. Grandon Gill for Chairing the Program Committee and for delivering a great plenary keynote address to the audience of all collocated conferences.
5. Professors Paulo Fonseca Matos Silva Ramos, Luis Borges-Gouveia, and Linda V. Knight for their keynote addresses.
6. Dr. Eli Cohen as General Co-Chair of CCISE 2013 who conceived and, with Betty Boyd, made possible the collocation of CCISE 2013 in the context of the main event by means of thinking and implementing the necessary adaptation between the main conference and CCISE 2013.
7. Betty Boyd for contributing with the design and implementation of the required organizational adaptation for this joint event.
8. Belkis Sánchez Callaos for chairing the Organizing Committee and for co-implementing the required organizational adaptation.
9. María Sánchez, Dalia Sánchez, Keyla Guedez, and Marcela Briceño, for their knowledgeable
effort in supporting the organizational process and for producing these proceedings.
Dr. Nagib C. Callaos, CCISE 2013 General Co-Chair
CCISE 2013 International Conference on Complexity, Cybernetics, and Informing Science and Engineering
CONTENTS
Contents i
Ammann, Eckhard (Germany): ''Knowledge Development Taxonomy and Application Scenarios'' 1
Błaszczyk, Jacek *; Malinowski, Krzysztof *; Allidina, Alnoor ** (* Poland, ** Canada): ''Optimal Pump Scheduling by Non-Linear Programming for Large Scale Water Transmission System'' 7
Balvetti, R.; Botticelli, A.; Bargellini, M. L.; Battaglia, M.; Casadei, G.; Filippini, A.; Pancotti, E.; Puccia, L.; Zampetti, C.; Bozanceff, G.; Brunetti, G.; Guidoni, A.; Rubini, L.; Tripodo, A. (Italy): ''Towards the Construction of a Cybernetic Organism: The Place of Mental Processes'' 13
Braseth, Alf Ove; Øritsland, Trond Are (Norway): ''Seeing the Big Picture: Principles for Dynamic Process Data Visualization on Large Screen Displays'' 16
Djuraev, Simha; Yitzhaki, Moshe (Israel): ''Factors Associated with Digital Readiness in Rural Communities in Israel'' 22
Koolma, Hendrik M. (Netherlands): ''Information and Adaptation in a Public Service Sector: The Example of the Dutch Public Housing Sector'' 25
Monat, André S.; Befort, Marcel (Germany): ''The Usage of ISOTYPE Charts in Business Intelligence Reports - The Impact of Otto Neurath Work in Visualizing the Results of Information Systems Queries'' 31
Normantas, Vilius (Tajikistan): ''Statistical Properties of Ordered Alphabetical Coding'' 37
Schroeder, Marcin J. (Japan): ''The Complexity of Complexity: Structural vs. Quantitative Approach'' 41
Turrubiates-López, Tania; Schaeffer, Satu Elisa (Mexico): ''Studying the Effects of Instance Structure in Algorithm Performance'' 47
Yukech, Christine M. (USA): ''Paradigm Shifting through Socio-Ecological Inquiry: Interdisciplinary Topics & Global Field Study Research'' 53
Zvonnikov, Victor; Chelyshkova, Marina (Russian Federation): ''The Optimization of Formative and Summative Assessment by Adaptive Testing and Zones of Students Development'' 58
Zykov, Sergey V. (Russian Federation): ''Pattern-Based Enterprise Systems: Models, Tools and Practices'' 62
Authors Index 69
Knowledge Development Taxonomy and Application Scenarios
Eckhard Ammann School of Informatics, Reutlingen University
72762 Reutlingen, Germany
ABSTRACT

Knowledge development in an enterprise is about the approaches, methods, techniques, and tools that support the advancement of individual and organizational knowledge for the purpose of improving the business. A modeling basis for knowledge development is provided by a new conception of knowledge and of knowledge conversions, which introduces three dimensions of knowledge and general conversions between knowledge assets. This modeling basis guides the definition of a taxonomy of knowledge development scenarios. In this taxonomy, constructive and analytic scenarios are distinguished as the main categories and subsequently refined into more specific ones. To indicate the usefulness of this taxonomy, example implementations of two knowledge development scenarios are briefly outlined: a modeling notation for knowledge-intensive business processes as a constructive scenario, and a rule-processing system based on a knowledge ontology as an analytic scenario.

Keywords: Knowledge Development, Taxonomy, Application Scenarios, Constructive and Analytic Scenarios, Knowledge-Intensive Business Processes, Semantic Knowledge Development.
1. INTRODUCTION

Knowledge development in an enterprise is about the approaches, methods, techniques, and tools that support the advancement of knowledge for the purpose of improving the business. This notion includes individual knowledge as well as group and organizational knowledge. It can be seen as an integral part of knowledge management; see [1], [9] and [11] for a description of several existing approaches to knowledge management. While the management aspect of knowledge management seems to be rather well understood and practiced in many companies [11], there is no common concept and understanding of knowledge and of knowledge development as its basis. In this paper we investigate and classify possible application scenarios for knowledge development. This leads to a taxonomy of knowledge development scenarios. The taxonomy is based on a new conception of knowledge and knowledge development, which is briefly described in this paper (see [2] for a complete description). The conception of knowledge is represented by a knowledge cube, a three-dimensional model of knowledge with types, kinds, and qualities. Using this conception we introduce general knowledge conversions between the various knowledge variants as a model for knowledge dynamics and development in the enterprise. First, a basic set of such conversions is defined,
which extends the set of the four conversions of the well-known SECI model [12]. Building on this set, general knowledge conversions can be defined, which reflect knowledge transfers and development more realistically and do not suffer from the restrictions of the SECI model. Building on this conception, application scenarios for knowledge development are classified. Application scenarios are understood as typical processes that lead to an advancement of individual and organizational knowledge in the enterprise. Two main categories of application scenarios are identified: constructive and analytic scenarios. Constructive scenarios build knowledge development processes; for example, knowledge dynamics in knowledge-intensive business processes can be modeled. Analytic scenarios can be represented by general nets of general knowledge conversions, which are introduced in this paper. They are characterized by gaps, i.e., by unknown knowledge or conversion parts in these nets. Important knowledge development requirements in an enterprise can be covered by analytic scenarios. Assume, for example, that the knowledge requirements for a project are known, as are the learning options in the company. From that, one would try to identify minimal knowledge requirements for a new employee who will work on the project and should be able to fulfil the requirements of this scenario, at least after some learning effort. At least for simple cases, analytic scenarios can be supported by a rule-processing system based on a knowledge ontology, which has been built as a representation of our knowledge and knowledge dynamics concept. A set of corresponding rules for addressing these scenarios and their representations has been developed. Thus, possible solutions for those scenarios, i.e. filling the gaps in the scenarios, can be obtained. The structure of the paper is as follows.
After this introduction and section II on related work, sections III and IV introduce the knowledge conception and general knowledge conversions between knowledge and information assets, respectively. Section V discusses knowledge development scenarios and presents a taxonomy of these scenarios, while section VI outlines example implementations of the two main scenario categories. Finally, section VII summarizes and concludes the paper.
2. RELATED WORK

One specific approach to enterprise knowledge development is EKD (Enterprise Knowledge Development), which aims at articulating, modeling, and reasoning about knowledge, and which supports the process of analyzing, planning, designing, and changing a business; see [8] and [5] for a description of
EKD. EKD does not, however, provide a conceptual description of knowledge and knowledge development. For the conception part, there is one well-known approach by Nonaka/Takeuchi [12], which is built on the distinction between tacit and explicit knowledge and on four knowledge conversions between these knowledge types (the SECI model). However, there is ongoing discussion about whether the explicit part of knowledge should be interpreted as still bound to the human being or as already detached from the person. The linear spiral model of knowledge development has also turned out to be limiting. Another important work is the introduction of the type/quality dimensions of knowledge in [7]. Finally, important distinctions of implicit knowledge are given in [10].
3. CONCEPTION OF KNOWLEDGE

General Understanding of Knowledge

In this section we briefly provide a conception of knowledge and of knowledge types, kinds, and qualities. More details can be found in [2]. As our base notion, knowledge is understood as justified true belief, which is (normally) bound to the human being, with a dimension of purpose and intent, identifying patterns in its validity scope, brought to bear in action, and with a generative capability of new information; see [7], [8] and [12]. It is a perspective of "knowledge-in-use" [7], chosen because of its importance for utilisation in companies and for knowledge management. In contrast, information is understood as data in relation with a semantic dimension, but lacking the pragmatic and pattern-oriented dimension which characterises knowledge. We distinguish three main dimensions of knowledge, namely types, kinds, and qualities, and describe them in the following three sub-sections. The whole picture leads to the three-dimensional knowledge cube, which is introduced at the end of this section.

Type Dimension of Knowledge

The type dimension is the most important one for knowledge management in a company. It categorizes knowledge according to its presence and availability. Is it available only to the owning human being, can it be communicated, applied, or transferred to the outside, or is it externally available in the company's organisational memory, detached from the individual human being? It is crucial for the purposes of the company, and hence a main goal of knowledge management activities, to make as much knowledge as possible available, i.e. to let it be converted from internal to more external types of knowledge. Our conception of the type dimension of knowledge follows a distinction between internal and external knowledge types, seen from the perspective of the human being.
As a third and intermediary type, explicit knowledge is seen as an interface for human interaction and for the purpose of knowledge externalisation, the latter ending up in external knowledge. Internal (or implicit) knowledge is bound to the human being. It is everything a person has "in the brain" due to experience, history, activities, and learning. Explicit knowledge is "made explicit" to the outside world, e.g. through spoken language, but is still bound to the human being. External knowledge, finally, is detached from the human being and may be kept in appropriate storage media as part of the organisational memory. Fig. 1 depicts the different knowledge types.
Fig. 1 Conception of knowledge types

Internal knowledge can be further divided into tacit, latent, and conscious knowledge, where these subtypes partly overlap with each other; see [10]. Conscious knowledge is conscious and intentional, is cognitively available, and may be made explicit easily. Latent knowledge has typically been learned as a by-product and is not consciously available. It may, however, be made explicit, for example in situations similar to the original learning situation. Tacit knowledge is built up through experiences and (cultural) socialisation situations; it is specific in its context and based on intuition and perception. Statements like "I don't know that I know it" and "I know more than I am able to tell" (adapted from Polanyi [13]) characterise it.

Kind Dimension of Knowledge

In the second dimension of knowledge, four kinds of knowledge are distinguished: propositional, procedural, and strategic knowledge, and familiarity. This dimension resembles to a certain degree the type dimension as described in [7]. Propositional knowledge is knowledge about content, facts in a domain, semantic interrelationships, and theories. Experience, practical knowledge, and "how-to-do" knowledge constitute procedural knowledge. Strategic knowledge is meta-cognitive knowledge about optimal strategies for structuring a problem-solving approach. Finally, familiarity is acquaintance with certain situations and environments; it also resembles aspects of situational knowledge, i.e. knowledge about situations which typically appear in particular domains [7].

Quality Dimension of Knowledge

The quality dimension introduces five characteristics of knowledge, each with an appropriate qualifier, and is independent of the kind dimension; see [7]. The level characteristic distinguishes overview from deep knowledge; structure distinguishes isolated from structured knowledge.
The automation characteristic of knowledge ranges from step-by-step performance by a beginner in a domain of work to automated, fast acting by an expert. These three qualities measure along an axis and can be subject to knowledge conversions (see section IV). Modality, the fourth quality, asks for the representation of knowledge, be it words versus pictures in situational knowledge kinds, or propositions versus pictures in procedural knowledge kinds. Finally, generality differentiates general from domain-specific knowledge. Knowledge qualities apply to each knowledge asset.
Fig. 2 The knowledge cube
The Knowledge Cube

Bringing all three dimensions of knowledge together, we gain an overall picture of our knowledge conception. It can be represented by the knowledge cube, as shown in Fig. 2. Note that the dimensions in the knowledge cube behave differently: in the type and kind dimensions, the categories are mostly distinct (with the mentioned exception of overlapping sub-types), while in the quality dimension each of the five given characteristics is always present for each knowledge asset.
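As an illustration only (the paper defines the cube conceptually, not as software), the knowledge cube can be sketched as a data structure in which an asset carries exactly one type, one kind, and all five quality characteristics. All names in this sketch are hypothetical, not taken from the paper's implementation.

```python
from dataclasses import dataclass, field
from enum import Enum

class KnowledgeType(Enum):
    TACIT = "tacit"          # internal sub-type
    LATENT = "latent"        # internal sub-type
    CONSCIOUS = "conscious"  # internal sub-type
    EXPLICIT = "explicit"
    EXTERNAL = "external"

class KnowledgeKind(Enum):
    PROPOSITIONAL = "propositional"
    PROCEDURAL = "procedural"
    STRATEGIC = "strategic"
    FAMILIARITY = "familiarity"

@dataclass
class Qualities:
    """All five characteristics are always present for an asset."""
    level: str = "overview"       # overview vs. deep
    structure: str = "isolated"   # isolated vs. structured
    automation: str = "stepwise"  # step-by-step vs. automated
    modality: str = "words"       # words vs. pictures
    generality: str = "general"   # general vs. domain-specific

@dataclass
class KnowledgeAsset:
    """One point in the knowledge cube: a type, a kind, five qualities."""
    type: KnowledgeType
    kind: KnowledgeKind
    qualities: Qualities = field(default_factory=Qualities)

asset = KnowledgeAsset(KnowledgeType.TACIT, KnowledgeKind.PROCEDURAL)
print(asset.type.value, asset.kind.value, asset.qualities.level)
# -> tacit procedural overview
```

The sketch mirrors the remark above: type and kind are distinct category choices, while the quality record is always fully populated for every asset.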
4. KNOWLEDGE CONVERSIONS

In this section we give a conception of knowledge conversions. The transitions between the different knowledge types, kinds, and qualities are responsible to a high degree for knowledge development in an organisation. More details can be found in [2]. Most important for knowledge management purposes are conversions between the knowledge types, and they will be the focus in the following. Among them, especially the conversions that make individual and internal knowledge of employees usable for a company are crucial for knowledge management. The explicitation and externalisation conversions described in this section achieve this. Implicitly, socialisations between the tacit knowledge of different people may also contribute to this goal. Conversions in the kind dimension of knowledge are rare; normally the kind dimension remains unchanged in a knowledge conversion that changes the type dimension. Conversions in the quality dimension are mostly knowledge developments aiming at quality improvement and do not change the type and kind dimensions of the involved knowledge assets. Five basic knowledge conversions (in the type dimension) are distinguished here: socialisation, explicitation, externalisation, internalisation, and combination. Basic conversion means that exactly one source knowledge asset is converted into exactly one destination knowledge asset and that only one knowledge dimension is changed during the conversion. More complex conversions may easily be gained by building on this set, as described later in this section; they consist of m-to-n conversions and include information assets in addition. Socialisation converts tacit knowledge of one person into tacit knowledge of another person. For example, this succeeds by exchange of experience or in a learning-by-doing situation
under the supervision of an experienced person. Explicitation is the internal process of a person making internal knowledge of the latent or conscious type explicit, e.g. by articulation and formulation (in the conscious case) or by using metaphors, analogies, and models (in the latent case). Externalisation is a conversion from explicit knowledge to external knowledge or information; it leads to knowledge detached from the human being, which can be kept in organisational memory systems. Internalisation converts either external or explicit knowledge into internal knowledge of the conscious or latent types; it leads to an integration of experiences and competences into one's own mental model. Finally, combination combines existing explicit or external knowledge in new forms. These five basic knowledge conversions are shown in Fig. 3. As a generalisation of basic knowledge conversions, general knowledge conversions are modeled, converting several source assets (possibly of different types, kinds, and qualities) to several destination assets (also possibly different in their knowledge dimensions). In addition, information assets are considered as possible contributing or generated parts of general knowledge conversions. For example, in a supervised learning-by-doing situation seen as a complex knowledge conversion, a new employee may extend his tacit and conscious knowledge by working on and extending an external knowledge asset in a general conversion, using and being assisted by the tacit and conscious knowledge of an experienced colleague. A piece of relevant information on the topic may also be available on the source side of the conversion. Here, on the source side of the general conversion, we have two tacit, two conscious, and one external knowledge assets plus one information asset, while on the destination side one tacit, one explicit, and one external knowledge asset (i.e. the resulting enriched external knowledge) arise.
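The supervised learning-by-doing example can be made concrete with a minimal sketch, in which a general conversion simply records its m source assets and n destination assets. The names below (Asset, GeneralConversion, the owner labels) are illustrative assumptions; the paper does not prescribe an implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Asset:
    owner: str  # a person, or "org" for the organisational memory
    type: str   # tacit | conscious | explicit | external | information

@dataclass
class GeneralConversion:
    """m-to-n conversion: several source assets (knowledge and
    information) are converted into several destination assets."""
    sources: list
    destinations: list

# Source side: two tacit, two conscious, one external knowledge asset,
# plus one information asset. Destination side: one tacit, one explicit,
# and one (enriched) external knowledge asset, as in the text above.
learning_by_doing = GeneralConversion(
    sources=[
        Asset("new_employee", "tacit"), Asset("new_employee", "conscious"),
        Asset("colleague", "tacit"), Asset("colleague", "conscious"),
        Asset("org", "external"), Asset("org", "information"),
    ],
    destinations=[
        Asset("new_employee", "tacit"), Asset("new_employee", "explicit"),
        Asset("org", "external"),
    ],
)
print(len(learning_by_doing.sources), "->", len(learning_by_doing.destinations))
# -> 6 -> 3
```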
Completing this section, we briefly mention knowledge conversions in the quality dimension of knowledge. In three of the five quality measures, basic conversions can be identified, which work gradually. These are, firstly, a deepening conversion, which converts overview knowledge into a deeper form of this knowledge. Secondly, there may be a structuring conversion, performing an improvement on the singular-versus-structure scale of the structural measure. Finally, conscious and step-by-step-applicable knowledge may convert into automated knowledge in an automation conversion, which describes the progression from beginner to expert in a certain domain. The remaining two quality measures of knowledge, namely modality and generality, do not lend themselves to knowledge conversions; they just describe unchangeable knowledge qualities.
Fig. 3 Knowledge conversions in the type dimension
Proceedings of International Conference on Complexity, Cybernetics, and Informing Science and Engineering
5. KNOWLEDGE DEVELOPMENT SCENARIOS

In this section, application scenarios for knowledge development are classified. Application scenarios are understood as typical processes which lead to an advancement of individual and organizational knowledge in the enterprise. Two main categories of application scenarios are identified: constructive and analytic scenarios. Both can be reduced to single or multiple general knowledge conversions. While constructive scenarios build knowledge development processes, analytic scenarios are characterized by gaps, i.e., by unknown knowledge or conversion parts in knowledge development nets. The two categories are described in the following two sub-sections; in the third sub-section, a taxonomy of knowledge development scenarios is provided and depicted in Fig. 4.

Constructive Scenarios

Constructive scenarios build knowledge development processes; for example, knowledge dynamics in knowledge-intensive business processes can be modeled. The set of constructive scenarios includes (pure) knowledge development processes, with the advancement of knowledge as the main and single goal; normal business processes, which lead to knowledge development effects as a kind of “by-product”, for example by making process participants more experienced for future process deployments; and finally knowledge-intensive business processes, where the advancement of knowledge is an integral part of the process, see our example of supervised learning-by-doing in section 4.

Analytic Scenarios

Analytic scenarios can be represented by general nets of general knowledge conversions, which have been introduced in section 4. They are characterized by gaps, i.e., by unknown knowledge or conversion parts in these nets. Important knowledge development requirements in an enterprise can be covered by analytic scenarios. Assume, for example, that the knowledge requirements for a project are known, as well as the learning options in the company.
From this, one would try to identify minimal knowledge requirements for a new employee who is to work in the project and should be able to fulfil the requirements of this scenario, at least after some learning effort.
This is in fact a simple scenario, a sub-category of analytic scenarios, as explained below. Analytic scenarios can be specialized. Let us start from the bottom. Basic scenarios are represented by exactly one basic knowledge conversion; for example, a socialization conversion will convert tacit knowledge of one employee into tacit knowledge of another. Basic scenarios are specialisations of simple scenarios, which can be described by single general knowledge conversions. The next higher level of generality is a sequential chain of general knowledge conversions. Here, as an example, a step-wise knowledge development process of an employee may be modeled, where in each step the appropriate new knowledge from others comes in and is utilized. Chains of simple scenarios are one important sub-category of the general nets, which establish the category of analytic scenarios. At least for simple cases, analytic scenarios can be supported by a rule-processing system based on a knowledge ontology, which has been built as a representation of our knowledge and knowledge dynamics concept. A set of corresponding rules for addressing these scenarios and their representations has been developed. Therefore, possible solutions for those scenarios, i.e. filling the gaps in the scenarios, can be obtained; see section 6 for an example and [4] for a detailed description.

Taxonomy of Knowledge Development Scenarios

In this sub-section, the findings of the section are summarized and categorized in a taxonomy of knowledge development scenarios. This is a model-based taxonomy, because it relies heavily on the conceptual model of knowledge and knowledge development given in sections 3 and 4. Fig. 4 depicts this taxonomy.

6. IMPLEMENTATION EXAMPLES OF KNOWLEDGE DEVELOPMENT SCENARIOS

Two implementation examples, one from each of the two main scenario categories, are described in this section.

Example of a Constructive Scenario

As an example of constructive scenarios, a modeling approach
Fig. 4 Taxonomy of knowledge development scenarios
Fig. 5 Expanded process “Propose Product Idea”
for knowledge-intensive business processes with human interactions is described. It uses our knowledge development conception and represents a constructive knowledge development scenario. We introduce an integrated model for knowledge management, which covers task-driven, knowledge-driven and human-driven processes in an organisation. It is based on seven very general entities (Process, People, Topic, Implicit, Explicit and External Knowledge, and Document) and the various interconnections between them. The model covers process-oriented approaches, reflects the human role in various forms (as individuals, groups, or knowledge communities, plus the interaction between those) and the various types of knowledge with their mutual conversions. It is an extension of the model in [1] and reflects the new knowledge conception. As notation for our model we propose an expressional extension of the Business Process Modeling Notation BPMN [6], which we call BPMN-KEC2 (KEC stands for knowledge, employees, and communities; 2 indicates the second version). BPMN is widely used for business process modeling, and there exists a whole body of tools to support the visual modeling procedure, to integrate it into service-oriented architectures and to map models to execution environments for appropriate IT support. For a detailed description of BPMN-KEC2 see [3]. The most important notational objects may be categorized as objects for knowledge and information, for knowledge conversions, for associations between knowledge and persons, and for persons. Knowledge objects are tagged with type/kind information according to the two knowledge dimensions introduced in Section 3. The quality dimension of knowledge is not reflected in this approach; quality characteristics of knowledge assets may be implicitly denoted in the knowledge name if necessary. General knowledge conversions are denoted with an elliptical symbol. As an example, we model a business process for product renewal planning.
The product is assumed to be knowledge-intensive and complex; the existing version of it may possibly be replaced by a new version. The overall process is modeled as a sequence of four activities in BPMN notation: propose product idea, define product characteristics, plan product development and, finally, decide on renewal. Here we will focus on the first one, which is genuinely knowledge-intensive and requires human interactions. The expansion of this process using the BPMN-KEC2 notation is shown in Fig. 5. The main human actors are the product manager responsible for the product in the company, a knowledge community named Expert Community, and finally a product strategist. The expanded sub-process relies on two knowledge conversions: Generate Product Idea is a general and complex knowledge conversion, Formulate Product Idea a basic externalisation conversion. The main inputs to Generate Product Idea are, on the one side, explicit knowledge on new technologies (of the propositional knowledge kind) and conscious knowledge on currently relevant research themes, both available in the Expert Community. On the other side, knowledge on market trends and on the market position of the existing product is available at the product manager as conscious and explicit knowledge, respectively.
Thirdly, the product strategist applies his internal knowledge (of the conscious and tacit types and of the strategic kind). Relevant information (Market Information) is also available. Bringing all this together via the knowledge conversion Generate Product Idea results in a general product idea, explicit knowledge associated with the product manager. This explicit knowledge is then externalised in the second conversion, ending up as external knowledge: the documented product idea.

Example of an Analytic Scenario

A knowledge ontology with reasoning support and a rule-processing system have been built. Fig. 6 shows the main procedure for the handling of analytic scenarios: they are represented by general knowledge conversions with gap(s), processed with the help of the rule system, and finally interpreted as scenarios with all parts known. This work is already completed with respect to basic scenarios; the following shows a rule resolving a basic scenario with the gap at the source side, externalisation as the known conversion and a known
Fig. 6 Rule support of analytic scenarios
destination knowledge piece. The rule is formulated with the Semantic Web Rule Language (SWRL, see [14]):
Knowledge(?k2) ^ Externalisation(?e) ^ hasDestination(?e, ?k2) ^ swrlx:makeOWLThing(?k1, ?k2) → Explicit_Knowledge(?k1) ^ hasSource(?e, ?k1)
Here, given knowledge k2 and the externalisation e, where k2 is the destination knowledge of conversion e, a new piece of knowledge k1 is generated, which is of the explicit type and is the source knowledge of conversion e. As a result, the rule produces a new source knowledge of type explicit knowledge, which fills the gap in the basic scenario. The next step, the support of simple scenarios, is currently under development. Because of the rapidly increasing complexity of general knowledge conversions compared to basic ones, rule processing can no longer lead to unique solutions; instead, heuristics have to be introduced to support the scenario handling. Support of chains or nets of simple scenarios will then be straightforward, once the simple ones can be handled.
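The effect of this rule can be mimicked outside a reasoner. The following Python sketch reproduces only the gap-filling step; all class and function names are our own invention, and the real implementation uses the ontology and SWRL engine described above:

```python
import itertools
from dataclasses import dataclass
from typing import Optional

_ids = itertools.count(1)  # fresh names for generated individuals

@dataclass
class Knowledge:
    name: str
    type_: str  # "explicit", "external", ...

@dataclass
class Externalisation:
    source: Optional["Knowledge"]  # None marks the gap in the scenario
    destination: Knowledge

def fill_source_gap(e: Externalisation) -> Externalisation:
    """Mirror of the SWRL rule: if an externalisation has a known
    destination but no source, generate a new explicit-knowledge
    individual and attach it as the source."""
    if e.source is None and e.destination is not None:
        e.source = Knowledge(f"k{next(_ids)}", "explicit")
    return e

scenario = Externalisation(source=None,
                           destination=Knowledge("documented idea", "external"))
fill_source_gap(scenario)
print(scenario.source.type_)  # the gap is filled with explicit knowledge
```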
7. SUMMARY AND CONCLUSION
A new conception of knowledge and knowledge conversions has been given, which serves as a modeling basis for knowledge development in an enterprise. Investigation and classification of possible applications leads to a taxonomy of knowledge development scenarios. The main categories in this taxonomy are constructive and analytic scenarios; important sub-categories derived from them are described. Two implementation examples are given. First, a modeling notation for knowledge-intensive business processes is introduced, which serves for constructive scenarios. It extends the potential of business process modeling by further recognition of the knowledge that is needed, generated and transferred through those processes. Second, a semantic approach with rule processing is described, which can handle analytic scenarios. It offers the potential to fill gaps in knowledge chains by semantic reasoning. Further research is needed to address hybrid scenarios with both constructive and analytic characteristics. This would include cases where only an incomplete model of a knowledge-intensive business process can be reached, in the sense that there are gaps in the modelled topology of activities.
REFERENCES
[1] Ammann, E., “Enterprise Knowledge Communities and Business Process Modeling”, in: Proc. of the 9th ECKM Conference, Southampton, UK, 2008, pp. 19-26.
[2] Ammann, E., “The Knowledge Cube and Knowledge Conversions”, in: World Congress on Engineering 2009, Int. Conf. on Data Mining and Knowledge Engineering, London, UK, 2009, pp. 319-324.
[3] Ammann, E., “BPMN-KEC2 – An Extension of BPMN for Knowledge-Related Business Process Modeling”, Internal Report, Reutlingen University, 2011.
[4] Ammann, E., Ruiz-Montiel, M., Navas-Delgado, I., Aldana-Montes, J., “A Knowledge Development Conception and its Implementation: Knowledge Ontology, Rule System and Application Scenarios”, in: Proceedings of the 2nd International Conference on Advanced Cognitive Technologies and Applications (COGNITIVE 2010), Lisbon, Portugal, November 21-25, 2010, pp. 60-65.
[5] Bubenko, J.A., Jr., Brash, D., Stirna, J., EKD User Guide, Dept. of Computer and Systems Science, KTH and Stockholm University, Elektrum 212, S-16440, Sweden.
[6] “Business Process Modeling Notation Specification”, OMG Final Adopted Specification, http://www.omg.org/spec/BPMN/1.1/, 2008.
[7] De Jong, T., Ferguson-Hessler, M.G.M., “Types and Qualities of Knowledge”, Educational Psychologist, 31(2), 1996, pp. 105-113.
[8] EKD – Enterprise Knowledge Development, skd.dsv.su.se/home.html
[9] Gronau, N., Fröming, J., “KMDL® – Eine semiformale Beschreibungssprache zur Modellierung von Wissenskonversionen” (in German), Wirtschaftsinformatik, Vol. 48, No. 5, 2006, pp. 349-360.
[10] Hasler Rumois, U., Studienbuch Wissensmanagement (in German), UTB orell füssli, Zürich, 2007.
[11] Lehner, F., Wissensmanagement (in German), 2nd ed., Hanser, München, 2008.
[12] Nonaka, I., Takeuchi, H., The Knowledge-Creating Company, Oxford University Press, London, 1995.
[13] Polanyi, M., The Tacit Dimension, Routledge and Kegan Paul, London, 1966.
[14] SWRL: A Semantic Web Rule Language Combining OWL and RuleML, http://www.w3.org/Submission/SWRL/
Optimal Pump Scheduling by Non-Linear Programming for Large Scale Water Transmission System
Jacek Błaszczyk∗1, Krzysztof Malinowski†1,2, and Alnoor Allidina‡3
1Research and Academic Computer Network (NASK), ul. Wawozowa 18, 02-796 Warsaw, Poland
2Institute of Control and Computation Engineering, Faculty of Electronics and Information Technology, Warsaw University of Technology, ul. Nowowiejska 15/19, 00-665 Warsaw, Poland
3IBI-MAAK Inc., 9133 Leslie Street, Suite 201, Richmond Hill, Ontario, Canada L4B 4N1
Abstract The large-scale potable water transmission system considered in this paper is the Toronto Water System (TWS), one of the largest potable water supply networks in North America. The main objective of the ongoing Transmission Operations Optimizer (TOO) project is to develop an advanced tool for providing pumping schedules for 153 pumps such that all quantitative requirements with respect to the system operation are met while the energy costs are minimized. We describe here, in general, the concept of the TOO system and, in detail, a large-scale non-linear, so-called Full Model (FM), based on a system of hydraulic equations, which is solved over a 24-hour horizon and delivers optimal aggregated flows and pressure gains for all pumping stations.
Keywords large-scale nonlinear programming, minimum cost operative planning, pump scheduling, water supply
1. INTRODUCTION
The City of Toronto water transmission system is a large, complex, integrated system consisting of pumping, storage and transmission (water mains, meters, and valves). The City of Toronto water supply system capacity is the largest in Canada and the fifth largest in North America. The Water Supply function is responsible for providing services 24 hours per day, seven days per week. The system comprises treated-water pumping at four filtration plants and pumping stations, floating storage at reservoirs and elevated tanks, and approximately 500 km of large transmission mains, ranging from 400 to 2500 mm in diameter, that transport treated water from the lake up through the system. Water is pumped through a hierarchy of pressure districts with elevated storage facilities (reservoirs and tanks).
Within each district, there are a number of water supply connections from the transmission water mains to the local water distribution systems. Combinations of the pumping stations and floating storage facilities provide water to the City’s local water distribution systems. The system serves a population of approximately 3,000,000, of which 2,500,000 are in the City of Toronto and 500,000 are in the Region of York. The system service area is about 630 square kilometres. The Water Transmission System facilities are spread throughout the City of Toronto and the Region of York. The Region of York Water Transmission System (in the southern part of the Region of York) consists of pumping stations, ground-level storage reservoirs, elevated tanks, a standpipe, and wells.
At present a large part of the system within the City of Toronto is essentially manually operated: an operator decides, for example, when to turn a pump on or off. The Region of York part of the system works automatically, with the pumps turned on or off based on measured tank levels; however, the level set-points are set manually. Even when there are no abnormal situations (pumping units out of service, hydro failure, plant down-time, etc.), manual decision making within the City of Toronto system is a complex process. The problem is further aggravated when the operators have to deal with abnormal situations.
With this background, the City of Toronto and the Region of York decided to develop the Optimizer, which automatically determines control strategies for the Water Transmission System based on certain criteria, including meeting service delivery levels (pressures, reservoir levels, water quality), and the Simulator, which allows simulating and predicting the system performance under various what-if situations. The Optimizer works on-line alongside the City of Toronto’s and Region of York’s SCADA (Supervisory Control and Data Acquisition) systems, while the Simulator is an off-line tool.
∗Email: [email protected]
†Email: [email protected]
‡Email: [email protected]
2. OVERVIEW OF TOO SYSTEM
The primary objective of the Optimizer (TOO) is to ensure that required water delivery standards are met while minimizing electrical power costs. The TOO ensures that fundamental service delivery standards, including pressure, flow, and storage, are not compromised and that water quality is optimized. The pumping strategies must safeguard meeting the prevailing water quality requirements.
• The TOO ensures that pre-set minimum (critical) storage levels are not violated.
• The TOO ensures that optimal strategies are achieved for different seasonal, weekday/weekend and peak-day demands, as well as when abnormal events occur (pumping station/filtration plant/reservoir cell out of service).
• The TOO includes the capability for evaluating situations for buying and selling electricity, to examine the impact of Hydro spot market prices.
• The TOO considers the production cost of water, which varies from plant to plant, in developing the optimal solution.
• The Transmission Operations Optimizer (TOO) is based on a water demand forecast model, a system hydraulic and water quality model, and control strategies and practices that enable optimization of water pumping and water quality in the Transmission System.
• The hydraulic and water quality model, defined in EPANET format, is used by the Optimizer and Simulator.
• A water demand forecast model has been developed to forecast and input short-term demands for use with the Optimizer.
• A full hydraulic-model-based approach for determining optimal strategies has been developed for use with the Optimizer.
In general, the Optimizer runs as follows:
1. Collect external factors (weather, energy rates) and system status and data. This includes, but is not limited to, reservoir levels, equipment out of service, equipment auto/manual modes, and production costs.
2. Run the demand model to predict demand.
3. Determine potential optimal strategies.
4. Run the hydraulic/quality model to check the strategies.
5. Analyse the results.
6. If the results are acceptable, apply the strategy to the SCADA systems; otherwise re-run the Optimizer with a revised objective and/or constraints.
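The six steps above can be sketched as a simple control loop. This is a schematic only; every function below is a toy stand-in for the subsystems the paper describes (demand model, NLP solver, EPANET check, SCADA), and the names are our own:

```python
def run_optimizer(max_iterations=3):
    """Schematic TOO cycle: gather data, forecast demand, propose and
    verify a strategy, and retry with relaxed constraints if needed."""
    state = collect_system_state()          # step 1: reservoir levels, outages, costs
    demand = forecast_demand(state)         # step 2: demand model
    for _ in range(max_iterations):
        strategy = optimize(state, demand)              # step 3
        results = simulate_hydraulics(strategy)         # step 4: hydraulic/quality check
        if acceptable(results):                         # step 5
            return strategy                             # step 6: apply via SCADA
        state = relax_constraints(state)                # otherwise adjust and re-run
    return None

# --- toy stand-ins so the sketch runs ---
def collect_system_state():          return {"relaxed": 0}
def forecast_demand(state):          return 100.0
def optimize(state, demand):         return {"pumping": demand, "relaxed": state["relaxed"]}
def simulate_hydraulics(strategy):   return {"ok": strategy["relaxed"] >= 1}
def acceptable(results):             return results["ok"]
def relax_constraints(state):        return {"relaxed": state["relaxed"] + 1}

print(run_optimizer())  # succeeds on the second attempt in this toy setting
```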
Figure 1 depicts the overall approach/architecture for the Optimizer.
Figure 1: Transmission Operations Optimizer (TOO) architecture
3. OPTIMAL NETWORK SCHEDULING PROBLEM
The optimization methods described in this paper are model-based and, as such, require a hydraulic model of the network to be optimized. This hydraulic model is provided as an Epanet INP file and consists of three main components: boundary conditions (water sources and demands), a non-linear hydraulic network made up of pipes, pumps and valves, and reservoir dynamics. The NLP algorithm that has been used to compute the continuous schedules, and also the schedule discretization method, require the Epanet simulator of the hydraulic network. In the TOO system the Epanet Toolkit has been used to provide the initial feasible solution to the NLP solver by simulating the network, and the Toolkit has also been utilized by the TOO scheduler for discretization of the continuous solution.
The main objective is to minimize the pumping cost subject to the hydraulic network equations and operating constraints over a given time horizon with hourly discretization. The relationships between the different components of cost and the operating constraints in the whole hydraulic network have a very complex nature; thus the problem can only be solved by use of an advanced nonlinear programming solver and optimal scheduler. The goal of optimal network scheduling is to calculate the least-cost operational schedules for pumps, valves, and water treatment plants for a given period, typically 24 hours or one week. The optimization problem is given as:
1. minimize an objective function consisting of pumping cost and water treatment cost,
2. subject to hydraulic network equations, and
3. operational constraints.
These three parts of the problem are discussed in the following subsections. The optimization problem is expressed in discrete time, i.e., in the FM model an hourly time-step is used.
3.1. Objective function
The objective function is the sum of two costs associated with the system: the pumping cost and the water treatment cost:

J = J_P + J_T.  (1)

The main component of the objective function is the pumping cost (considered over a given time horizon [0, N−1]), given by the following equation:

J_P = β·Δt·∑_{k=0}^{N−1} ∑_{l=1}^{N_LPS} c^u_l(k) · Q_l(k)·Δh_l(k) / η_l,  (2)
where:
k – hourly interval index, k = 0, …, N,
N – total number of hourly intervals (typically N = 24),
l – logical pumping station (LPS) index, l = 1, …, N_LPS,
N_LPS – total number of LPSs,
p – physical pumping station (PPS) index, p = 1, …, N_PPS,
N_PPS – total number of PPSs,
i – pump index, i = 1, …, N_P^l,
N_P^l – number of pumps at the l-th logical PS,
c^u_l(k) – electricity tariff at time period k for pumping station l (usually a function of time),
Q_l(k) – average aggregated flow (in MLD) for the l-th PS at the k-th hour,
Δh_l(k) – average head gain (in meters) for the l-th PS at the k-th hour,
η_l – average aggregated efficiency for the l-th PS (parameter),
β – unit conversion coefficient,
Δt – length of the time period in hours (by default equal to 1).
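Equation (2) translates directly into code. A minimal sketch, assuming tariffs, flows and head gains are given as plain per-station, per-hour lists (the symbols follow the list above; β and Δt are left at 1, and the numbers are illustrative only):

```python
def pumping_cost(tariff, flow, head_gain, eta, beta=1.0, dt=1.0):
    """Pumping cost J_P of equation (2): tariff[l][k], flow[l][k] (MLD)
    and head_gain[l][k] (m) are indexed by logical pumping station l and
    hour k; eta[l] is the station's aggregated efficiency; beta is the
    unit-conversion coefficient (its value is system-specific)."""
    n_hours = len(flow[0])
    return beta * dt * sum(
        tariff[l][k] * flow[l][k] * head_gain[l][k] / eta[l]
        for l in range(len(flow))
        for k in range(n_hours)
    )

# Toy data: one station, two hours with different tariffs.
J_P = pumping_cost(tariff=[[0.10, 0.20]],
                   flow=[[50.0, 40.0]],
                   head_gain=[[30.0, 30.0]],
                   eta=[0.8])
print(J_P)
```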
The water treatment cost for each water treatment plant (WTP) is proportional to the flow output from the l-th WTP with the unit price c^t_l(k):

J_T = Δt·∑_{k=0}^{N−1} ∑_{l=1}^{N_WTP} c^t_l(k)·Q_l(k),  (3)

where:
l – water treatment plant (WTP) index, l = 1, …, N_WTP,
N_WTP – total number of WTPs,
c^t_l(k) – treatment tariff at time period k for the l-th WTP.
In the case of the TOO system, a water treatment plant is also treated as a pumping station. The term

P_l(k) = f_l(Q_l(k), Δh_l(k)) = Q_l(k)·Δh_l(k) / η_l,  (4)

in the objective function (2) represents the electrical power consumed by pumping station l. The mechanical power of the water is obtained by multiplying the aggregated flow Q_l(k) and the aggregated head gain Δh_l(k) across the pumping station. The consumed electrical power is then determined by dividing the mechanical power of the water by the average aggregated efficiency η_l of the pumping station, which is computed as the weighted average of the maximum (best) efficiencies of all pumps included in the l-th pumping station:
η_l = ( ∑_{i=1}^{N_P^l} Q_i^BEP·η_i^BEP ) / ( ∑_{i=1}^{N_P^l} Q_i^BEP ),  (5)

where:
Q_i^BEP – flow at the best efficiency point (BEP) for the i-th pump,
η_i^BEP – maximum (best) efficiency for the i-th pump.
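The flow-weighted average of equation (5) is straightforward to compute; a small sketch with made-up BEP data:

```python
def aggregated_efficiency(q_bep, eta_bep):
    """Equation (5): flow-weighted average of the pumps' best-point
    efficiencies, using the best-efficiency-point (BEP) flows as weights."""
    assert len(q_bep) == len(eta_bep)
    return sum(q * e for q, e in zip(q_bep, eta_bep)) / sum(q_bep)

# Two pumps: the large, efficient one dominates the weighted average.
eta_l = aggregated_efficiency(q_bep=[100.0, 25.0], eta_bep=[0.85, 0.65])
print(round(eta_l, 3))
```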
The pumping cost depends on the efficiency of the pumps used and the electricity power tariff over the pumping duration. The tariff is usually a function of time with alternating cheap and more expensive energy periods. In the case of the TOO system the unit energy price is computed as follows:

c^u_l(k) = 0.22·c^con_l + 0.78·c^spot_l(k),  (6)

where the first unit-price component is fixed according to a long-term contract with the electrical power supplier, and the second term is related to the local energy market spot price c^spot_l(k) at the given location and hour k. This price is unknown to the decision maker before its actual real-time occurrence, and so it has to be forecasted prior to performing the optimization.
For the TOO system the cost is based on monthly usage, and the total monthly pumping cost for a physical station p is defined as:

J_p = CC_p + (DCR_p − TAR_p)·MaxKVA_p + TCNR_p·PeakKW_p + TCCR_p·MaxKW_p + (DRCR_p + WOCR_p·LFactor_p)·PKWHtotal_p,  (7)

where:
CC_p – Commodity Charge, per kWh; flat or increasing block tariff charge,
DCR_p – Distribution Charge, per maximum kVA through the month,
TAR_p – Transmission Allowance, per maximum kVA through the month,
TCNR_p – Transmission Charge–Network, per maximum kW from t_b^TCNR (e.g., 7:00 a.m.) to t_e^TCNR (e.g., 11:00 p.m.) on weekdays (referred to as "peak kW"), through the month,
TCCR_p – Transmission Charge–Connection, per maximum kW from t_b^TCCR (e.g., 11:00 p.m.) to t_e^TCCR (e.g., 7:00 a.m.), through the month,
DRCR_p – Debt Retirement Charge, per kWh in the month,
WOCR_p – Wholesale Operation Charge, per kWh in the month; the cost is multiplied by a loss factor LFactor_p (e.g., 1.0376),
and

PKWHtotal_p = β·Δt·∑_{k=0}^{N−1} ∑_{l: l∈p} P_l(k).  (8)
The commodity charge CC_p is variable, dependent on the time of day and the energy rate structure, i.e., on the values of the unit energy prices c^u_l(k) over the control horizon:

CC_p = β·Δt·∑_{k=0}^{N−1} ∑_{l: l∈p} c^u_l(k)·P_l(k).  (9)
The maximum kVA through the month is:

MaxKVA_p = max_{k=0,…,N−1} { ∑_{l: l∈p} P_l^VA(k) },   P_l^VA(k) = P_l(k) / PF_p,  (10)

where PF_p is the power factor for the p-th physical pumping station (e.g., 0.92).
The peak kW through the month is:

PeakKW_p = max { ∑_{l: l∈p} P_l(k) : k = t_b^TCNR to t_e^TCNR on weekdays, k = 0, …, N−1 }.  (11)

The maximum kW through the month is:

MaxKW_p = max { ∑_{l: l∈p} P_l(k) : k = t_b^TCCR to t_e^TCCR, k = 0, …, N−1 }.  (12)
The cost function (7) thus depends on maximum values over the time period of optimization:

JMDC_p = (DCR_p − TAR_p)·MaxKVA_p + TCNR_p·PeakKW_p + TCCR_p·MaxKW_p.  (13)

This component can be converted into a conventional optimization form by introducing auxiliary variables z_p^1, z_p^2 and z_p^3 to represent the peak factors. We express the transformed model as:

JMDC_p = (DCR_p − TAR_p)·z_p^1 + TCNR_p·z_p^2 + TCCR_p·z_p^3,  (14)

subject to the constraints:

∑_{l: l∈p} P_l^VA(k) ≤ z_p^1,  k = 0, …, N−1,
∑_{l: l∈p} P_l(k) ≤ z_p^2,  k = t_b^TCNR to t_e^TCNR on weekdays and k = 0, …, N−1,
∑_{l: l∈p} P_l(k) ≤ z_p^3,  k = t_b^TCCR to t_e^TCCR and k = 0, …, N−1.  (15)
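The step from (13) to (14)-(15) is the standard trick of replacing a maximum by an auxiliary variable bounded below by every term: at the optimum, the tightest feasible z values equal the maxima, so the two forms agree. A numerical check with made-up charges and load profiles (all numbers illustrative, one station, a three-hour "month"):

```python
# Demand-charge component (13) computed two ways: directly from the
# monthly maxima, and via the auxiliary variables z1, z2, z3 of (14)-(15)
# set to their smallest feasible values.
DCR, TAR, TCNR, TCCR = 5.0, 1.0, 3.0, 2.0

P_va = [100.0, 140.0, 120.0]   # apparent power (kVA) per hour
P = [90.0, 130.0, 110.0]       # real power (kW) per hour
peak_hours = [1]               # hours inside the TCNR peak window
conn_hours = [0, 2]            # hours inside the TCCR window

# Direct form (13):
J_direct = ((DCR - TAR) * max(P_va)
            + TCNR * max(P[k] for k in peak_hours)
            + TCCR * max(P[k] for k in conn_hours))

# Epigraph form (14)-(15): the tightest z's satisfying the constraints
# are exactly the maxima, so the two forms coincide at the optimum.
z1 = max(P_va)
z2 = max(P[k] for k in peak_hours)
z3 = max(P[k] for k in conn_hours)
J_epigraph = (DCR - TAR) * z1 + TCNR * z2 + TCCR * z3

print(J_direct, J_epigraph)
```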
In the objective function (2) we should take into account only Δh_l(k) ≥ 0, i.e., the case when the pumping station provides a flow (Q_l(k) > 0) by use of its pumps (for some pumping stations there may be a flow through a bypass pipe, but in that case Δh_l(k) ≤ 0). Thus, equation (2) can be reformulated as:

J_P = β·Δt·∑_{k=0}^{N−1} ∑_{l=1}^{N_LPS} c^u_l(k) · Q_l(k)·max(0, Δh_l(k)) / η_l(Δh_l(k)),  (16)

and finally replaced by the well-known formulation for a “min-max” objective function:

J_P = β·Δt·∑_{k=0}^{N−1} ∑_{l=1}^{N_LPS} c^u_l(k)·P_l^+(k),
P_l^+(k) ≥ 0,
P_l^+(k) ≥ Q_l(k)·Δh_l(k) / η_l(Δh_l(k)),  (17)

where P_l^+(k) is an auxiliary variable defined for each pumping station and for each k = 0, …, N−1.
In general, the pumping cost may be reduced by decreasing the water quantity pumped, decreasing the total system head, increasing the overall efficiency of the pumping station by proper pump selection, or using reservoirs and elevated tanks to maintain uniform, highly efficient pump operations. In most instances, efficiency can be improved by using an optimization algorithm to select the most efficient combination of pumps to meet a given demand. Additional cost savings may be achieved by shifting pump operations to off-peak water-demand periods through proper filling and draining of reservoirs and elevated tanks. Off-peak pumping is particularly beneficial for systems operating under a variable electric-rate schedule.
3.1.1. Decision variables

The decision variables in the resulting aggregated nonlinear optimization problem are the average aggregated flows and average head gains for all logical pumping stations at each hour of the control horizon. Further decision variables may be the settings of some throttled valves (minor losses or valve openings) and the settings of pressure-reducing valves (set-point pressures) in the hydraulic system. The indirect decision variables in the optimization problem are:
• flows and head losses for every pipe and valve,
• heads at every junction and demand node,
• heads, volumes and water levels for every reservoir and elevated tank.
For all these variables there are simple bound constraints. All variables are mutually related through the hydraulic model.
3.2. Hydraulic model

Each network element has a hydraulic equation. In the optimal scheduling problem it is required that all calculated variables satisfy the hydraulic model equations. The network equations are usually non-linear and are embedded as inequality and equality constraints in the optimization problem. In the following subsections we describe the network equations used in modelling:
• flow continuity at connection nodes,
• mass balance, average head and volume curves for reservoirs and elevated tanks,
• head loss for pipes,
• head loss for TCV valves,
• check valves,
• PRV valves,
• pumping stations.
3.2.1. Flow continuity equations at connection nodes

For each network node i, the flow continuity equation (resulting from Kirchhoff's first law) must be met:

∑_{j∈L: Λ^c_{ij} ≠ 0} Λ^c_{ij}·Q_j(k) = d_i(k),  k = 0, …, N−1,  (18)

where:
Λ^c – node-component incidence matrix for connection nodes,
Q_j(k) – flow through the j-th link at the k-th hour,
d_i(k) – i-th node demand at the k-th hour (nonzero for a demand node, zero for a connecting node),
L – set of network links.
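Equation (18) is a per-node balance between signed link flows and nodal demand. A toy illustration in pure Python, with an invented three-node, three-link network; the sign convention used here (+1 for a link entering a node, −1 for a link leaving it) is an assumption for the sketch:

```python
# Flow continuity (18): the sum of signed link flows at each node
# equals its demand (negative entries mark net injection at a source).
Lambda_c = [[-1,  0, -1],   # source node: links 0 and 2 leave it
            [ 1, -1,  0],   # pure connecting node: zero net demand
            [ 0,  1,  1]]   # demand node: links 1 and 2 deliver into it
Q = [30.0, 30.0, 20.0]      # link flows (MLD) at some hour k

d = [sum(Lambda_c[i][j] * Q[j] for j in range(len(Q)))
     for i in range(len(Lambda_c))]
print(d)  # net balance per node; the connecting node must balance to zero
```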
3.2.2. Mass-balance state eq uations for reservoirs and elevated tanks
For each r-th reservoir or elevated tank the following mass-balance state
equation must be fulfilled:
Vr(k+1) =Vr(k)+ ∑j∈L:Λr
r j 6=0
Λrr j ·Q j(k) ·∆t, k = 0, . . . ,N −1, (19 )
where:
r – reservoir or elevated tank index, r = 1, …, N_R,
N_R – total number of reservoirs and elevated tanks,
V_r(k), V_r(k+1) – r-th reservoir or elevated tank volume at the k-th and (k+1)-th hours,
Λ^r – node-component incidence matrix for reservoir and elevated tank nodes,
Δt – time step (equal to one hour in the FM model).
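The state equation (19) is a simple hourly volume update. A minimal sketch for one hypothetical tank with a single inflow and a single outflow link (Δt = 1 hour, as in the FM model; all values illustrative):

```python
# Tank mass balance per equation (19): V(k+1) = V(k) + net inflow * dt.
def tank_volume_trajectory(V0, inflow, outflow, dt=1.0):
    """Return [V(0), ..., V(N)] given hourly inflow/outflow rates."""
    V = [V0]
    for q_in, q_out in zip(inflow, outflow):
        V.append(V[-1] + (q_in - q_out) * dt)
    return V

traj = tank_volume_trajectory(10.0, inflow=[4.0, 4.0, 2.0], outflow=[3.0, 5.0, 2.0])
# -> [10.0, 11.0, 10.0, 10.0]
```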
3.2.3. Average head equations for reservoirs and elevated tanks
For the k-th hour and for each r-th reservoir or elevated tank, average head
Hr(k) required for flow modeling is computed as:
$H_r(k) = E_r + \tfrac{1}{12}\left(-x_r(k-1) + 8 \cdot x_r(k) + 5 \cdot x_r(k+1)\right),$
$x_r(k-1) = f^{-1}(V_r(k-1)), \quad x_r(k) = f^{-1}(V_r(k)), \quad x_r(k+1) = f^{-1}(V_r(k+1)), \qquad k = 1, \ldots, N-1,$ (20)
where:
E_r – reservoir or elevated tank elevation,
x_r(k) – reservoir or elevated tank level,
V_r(k−1), V_r(k), V_r(k+1) – reservoir or elevated tank volumes for the previous, current, and next hour,
f(.) – level-volume curve, i.e., at each time V_r(k) = f(x_r(k)).
In equation (20) we used the two-interval extended Simpson's rule because it was more numerically stable for the resulting non-linear optimization problem.
3.2.4. Volume curves for reservoirs and elevated tanks
A volume curve determines how storage tank volume (in ML) varies as a function of water level (in meters). It is used when it is necessary to accurately represent tanks whose cross-sectional area varies with height. The lower and upper water levels supplied for the curve must contain the lower and upper levels between which the tank operates.
In the FM model a volume curve is approximated by a linear curve or a cubic polynomial. For the r-th reservoir or elevated tank, at the k-th hour, we have:
$V_r(k) = a_r \cdot x_r(k)^3 + b_r \cdot x_r(k)^2 + c_r \cdot x_r(k) + d_r, \qquad k = 0, \ldots, N,$ (21)
where:
Vr(k) – volume,
xr(k) – level,
ar,br,cr,dr – cubic polynomial coefficients.
Most of the reservoirs and elevated tanks in the FM model have a simple
linear volume curve:
$V_r(k) = c_r \cdot x_r(k) + d_r, \qquad k = 0, \ldots, N.$ (22)
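Equations (20) and (21) can be combined into a small numeric sketch: a cubic level-volume curve f, its numerical inverse f⁻¹ (here by bisection, since the curve is monotone on the operating range), and the two-interval extended Simpson average head. The coefficients and elevation below are illustrative, not taken from the FM model.

```python
# Hypothetical cubic volume curve V = a x^3 + b x^2 + c x + d (eq. (21)),
# monotone increasing on the operating range [0, 10] m.
a, b, c, d = 0.5, 1.0, 2.0, 0.0
E_r = 100.0   # illustrative tank elevation [m]

def volume(x):
    """Level -> volume, equation (21)."""
    return ((a * x + b) * x + c) * x + d

def level(V, lo=0.0, hi=10.0, tol=1e-10):
    """Volume -> level, f^{-1} in equation (20), by bisection."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if volume(mid) < V:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def average_head(V_prev, V_now, V_next):
    """H_r(k) per equation (20): E_r + (-x(k-1) + 8 x(k) + 5 x(k+1)) / 12."""
    x_prev, x_now, x_next = level(V_prev), level(V_now), level(V_next)
    return E_r + (-x_prev + 8.0 * x_now + 5.0 * x_next) / 12.0
```

As a sanity check, a constant volume gives H_r = E_r + x_r, since the Simpson weights (−1 + 8 + 5)/12 sum to 1.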
3.2.5. Head-loss equations for pipes
A pipe segment, with heads h_1(k) and h_2(k) at bordering nodes 1 and 2, and flow Q(k) considered positive when directed from node 1 to node 2, is described by the Hazen-Williams (HW) empirical head-loss formula:
$h_1(k) - h_2(k) = A \cdot \mathrm{sgn}\,Q(k) \cdot |Q(k)|^{\alpha}, \qquad k = 0, \ldots, N-1,$ (23)
where:
A – resistance coefficient for the pipe,
α – flow exponent (α = 1.852).
Equation (23) models pressure loss in water pipes due to friction. For each pipe it uses a single constant A to characterize the pipe's resistance, which depends on the diameter, length and roughness of the pipe (the roughness depends only on the material the pipe is made of). Introduced in 1902, the Hazen-Williams equation is an accepted model for fully turbulent flow in water networks and, because of its simplicity, is widely used in hydraulic computations.
Because of the numerical difficulty with the absolute value term in the HW formula (i.e., its non-differentiability when the flow is 0), we use a smooth approximation on the interval [−δ, +δ]:
$h_1(k) - h_2(k) = \left( \tfrac{3}{8}\,\delta^{\alpha-5} + \tfrac{1}{8}(\alpha-1)\,\alpha\,\delta^{\alpha-5} - \tfrac{3}{8}\,\alpha\,\delta^{\alpha-5} \right) Q(k)^5$
$+ \left( -\tfrac{5}{4}\,\delta^{\alpha-3} - \tfrac{1}{4}(\alpha-1)\,\alpha\,\delta^{\alpha-3} + \tfrac{5}{4}\,\alpha\,\delta^{\alpha-3} \right) Q(k)^3$
$+ \left( \tfrac{15}{8}\,\delta^{\alpha-1} + \tfrac{1}{8}(\alpha-1)\,\alpha\,\delta^{\alpha-1} - \tfrac{7}{8}\,\alpha\,\delta^{\alpha-1} \right) Q(k)$ (24)
Outside of the interval we use the original HW formula.
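The piecewise head-loss law of equations (23)-(24) can be sketched directly; the odd quintic polynomial matches the power law in value at Q = ±δ, so the approximation is continuous there. A and δ below are illustrative values, not FM-model data.

```python
ALPHA = 1.852   # Hazen-Williams flow exponent

def hw_headloss(Q, A=1.0, delta=0.1, alpha=ALPHA):
    """Head loss per (23) outside [-delta, +delta], smoothed per (24) inside."""
    if abs(Q) > delta:
        # original HW formula (23): A * sgn(Q) * |Q|^alpha
        return A * (1.0 if Q > 0 else -1.0) * abs(Q) ** alpha
    d = delta
    # polynomial coefficients from equation (24)
    c5 = (3/8 + (alpha - 1) * alpha / 8 - 3 * alpha / 8) * d ** (alpha - 5)
    c3 = (-5/4 - (alpha - 1) * alpha / 4 + 5 * alpha / 4) * d ** (alpha - 3)
    c1 = (15/8 + (alpha - 1) * alpha / 8 - 7 * alpha / 8) * d ** (alpha - 1)
    return A * (c5 * Q**5 + c3 * Q**3 + c1 * Q)
```

One can verify algebraically that the coefficients also match the first and second derivatives of A·Q^α at Q = δ, which is what makes the smoothed formula well behaved for the NLP solver.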
3.2.6. Head-loss equations for TCVs
The throttle valves (TCVs) are modeled in a similar way to a pipe segment, with α = 2. For TCVs we also use the smoothing approximation for the function:
$f(x) = \begin{cases} x^2, & x \geq 0, \\ -x^2, & x < 0. \end{cases}$ (25)
3.2.7. CV model
A pipe can contain a check valve (CV) restricting flow to one direction, always from the start node to the end node:
$\Delta h^+(k) \cdot Q(k) \leq 0,$
$A \cdot Q(k)^{\alpha} - (h_1(k) - h_2(k)) - \Delta h^+(k) = 0, \qquad k = 0, \ldots, N-1,$ (26)
where:
h_1(k) – head at the start node of the CV,
h_2(k) – head at the end node of the CV,
Q(k) – flow through the CV, Q(k) ≥ 0,
Δh^+(k) – auxiliary variable, Δh^+(k) ≥ 0.
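Since Q(k) ≥ 0 and Δh⁺(k) ≥ 0, the product constraint in (26) forces complementarity: either the valve is open (Δh⁺ = 0 and the HW loss applies) or closed (Q = 0 and Δh⁺ absorbs the adverse head difference). A small feasibility checker, a sketch rather than the FM implementation, with illustrative heads:

```python
# Check-valve constraints (26) for one hour.
def cv_feasible(h1, h2, Q, dh_plus, A=1.0, alpha=1.852, tol=1e-9):
    return (Q >= -tol and dh_plus >= -tol
            and dh_plus * Q <= tol                                  # complementarity
            and abs(A * Q ** alpha - (h1 - h2) - dh_plus) <= tol)   # head-loss balance

# Open valve: flow follows the head drop, auxiliary variable is zero.
assert cv_feasible(h1=105.0, h2=104.0, Q=1.0, dh_plus=0.0)
# Closed valve: reverse head difference, zero flow, dh_plus = h2 - h1.
assert cv_feasible(h1=104.0, h2=105.0, Q=0.0, dh_plus=1.0)
```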
3.2.8. PRV model
A Pressure Reducing Valve (PRV) limits the pressure at a point in the pipe network. EPANET computes which of three different states a PRV can be in:
1. partially opened (i.e., active) to achieve its pressure setting on its downstream side when the upstream pressure is above the setting,
2. fully open if the upstream pressure is below the setting,
3. closed if the pressure on the downstream side exceeds that on the upstream side (i.e., reverse flow is not allowed).
A pressure reducing valve will throttle the flow to prevent the downstream
pressure or hydraulic grade from exceeding a user-defined pre-set value. In
order to achieve its pressure reducing ability, a specific head-loss will be
induced through the PRV , such that the resulting downstream pressure obeys
the setting.
The valve can be in one of three states:
1. Valve is CLOSED if downstream pressure exceeds the pressure setting or is greater than the upstream pressure (to prevent reverse flow).
2. Valve is OPEN if upstream pressure is less than the setting and downstream pressure is less than upstream pressure.
3. Valve CONTROLS if upstream pressure is greater than the setting and downstream pressure equals the setting.
For the purpose of optimization, the modeling of PRVs has to differ from the description of this component used in the EPANET simulator. Thus, PRVs in the FM model are modeled by a set of nonlinear constraints involving multiplication of two or three variables:
$(h_1(k) - h_2(k)) \cdot Q(k) \geq 0,$
$(h_s(k) - h_2(k)) \cdot Q(k) \geq 0,$
$(h_1(k) - h_s(k)) \cdot Q(k) \cdot (h_1(k) - h_2(k)) \geq 0,$
$(h_2(k) - h_s(k)) \cdot Q(k) \cdot (h_1(k) - h_2(k)) \geq 0, \qquad k = 0, \ldots, N-1,$ (27)
where:
h_1(k) – head at the upstream side of the PRV,
h_2(k) – head at the downstream side of the PRV,
h_s(k) – PRV setting (downstream elevation + pressure setting),
Q(k) – flow through the PRV.
3.2.9. Constraints for pumping stations
We formulate the following set of constraints for the l-th pumping station:
$\Delta h_l(k) - (h_2(k) - h_1(k)) = 0,$ (28)
$\left(h_2(k) - (E_2 + p^{\min}_l)\right) \cdot Q_l(k) \geq 0,$ (29)
$\left(\Delta h_l(k) - \Delta h^{\min}_l\right) \cdot Q_l(k) \geq 0,$ (30)
$\left(h_1(k) - h^{\mathrm{NPSH}}_l\right) \cdot Q_l(k) \geq 0,$ (31)
$Q_l(k) - \sum_{i \in \mathrm{MON}_l} Q^+_i(k) \geq 0,$ (32)
$Q_l(k) - \sum_{i=1}^{N^P_l} \frac{1}{1 + e^{-L \cdot Q^+_i(k)}} \cdot Q^+_i(k) \leq 0, \qquad k = 0, \ldots, N-1,$ (33)
where:
Δh_l(k) – average head gain for the PS,
h_1(k) – head at the suction side of the PS,
h_2(k) – head at the discharge side of the PS,
Q_l(k) – flow through the PS, Q_l(k) ≥ 0,
E_2 – elevation of the discharge node of the PS,
p^min_l – requested minimum pressure at the PS discharge,
Δh^min_l – requested minimum head gain for the PS,
h^NPSH_l – Net Positive Suction Head (NPSH),
MON_l – set of pumps in manual ON mode at the l-th logical PS,
N^P_l – number of pumps at the l-th logical PS,
A_i, B_i, C_i – coefficients of the exponential H-Q curve for the i-th pump:
$\Delta h_i(Q_i(k)) = A_i - B_i \cdot Q_i(k)^{C_i}, \qquad i = 1, \ldots, N^P_l;$ (34)
the Q-H pump curve is then given as:
$Q_i(k) = \left( \frac{A_i - \Delta h_i(k)}{B_i} \right)^{\frac{1}{C_i}}, \qquad i = 1, \ldots, N^P_l,$ (35)
Q^+_i(k) – flow through the i-th pump for a given Δh_l(k) (Q^+_i(k) ≥ 0):
$Q^+_i(k) = \left( \frac{\max\{0, A_i - \Delta h_l(k)\}}{B_i} \right)^{\frac{1}{C_i}},$ (36)
L – sufficiently large scaling parameter (L = 40).
The non-smooth function max{0, g(x)} in equation (36) is replaced by the smoothed reformulation $\frac{g(x) + \sqrt{g(x)^2 + \varepsilon}}{2}$. For sufficiently small ε > 0 this function provides a reasonable approximation of the max operator.
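The two smoothings used in the pumping-station model can be sketched briefly: the smoothed max replacing max{0, g} in equation (36), and the logistic gate from equation (33), which switches a pump's flow contribution on only when its feasible flow is positive (L = 40 as in the FM model).

```python
import math

def smooth_max0(g, eps=1e-6):
    """Smooth replacement for max{0, g}: (g + sqrt(g^2 + eps)) / 2."""
    return (g + math.sqrt(g * g + eps)) / 2.0

def gate(q, L=40.0):
    """Logistic switch from equation (33): ~0 for q < 0, ~1 for q > 0."""
    return 1.0 / (1.0 + math.exp(-L * q))

# Far from zero the smoothed max is nearly exact; near zero it is smooth.
assert abs(smooth_max0(2.0) - 2.0) < 1e-6
assert abs(smooth_max0(-2.0)) < 1e-6
assert gate(0.5) > 0.999 and gate(-0.5) < 0.001
```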
Equation (28) defines the average head gain for a PS as the difference between the heads at its discharge and suction sides. Equation (29) requests a predefined minimum pressure at the discharge side only when there is flow through the PS, i.e., Q_l(k) > 0, and equation (30) is a constraint on the minimum average head gain for the PS (again only when Q_l(k) > 0). Equation (31) requests the minimum head at the suction side of the PS (the NPSH), only when Q_l(k) > 0. Equation (32) defines the minimum nonzero flow for a PS when a pump is switched manually ON, and equation (33) defines the maximum flow for a PS, taking into account the sum of flows of the pumps, counting each pump only when it provides a feasible flow for the given Δh_l(k).
3.3. Operational constraints
The operational constraints have the form of simple inequalities and are applied to keep the system state within its feasible operating range. Thus, we must take into account time-varying minimum and maximum reservoir and elevated tank volumes:
$V^{\min}_r(k) \leq V_r(k) \leq V^{\max}_r(k), \qquad k = 1, \ldots, N,$ (37)
where V^min_r(k) and V^max_r(k) are the specified minimum and maximum storage volumes (typically constant with respect to time k). The reservoir and elevated tank volumes (state variables) should remain within the prescribed simple bounds in order to prevent emptying or overflowing, and to maintain sufficient storage for emergency purposes.
Similar constraints must be applied to the heads at critical nodes (SYPs) in
order to maintain required pressures throughout the water network:
$h^{\min}_s \leq h_s(k) \leq h^{\max}_s, \qquad k = 0, \ldots, N-1,$ (38)
where h^min_s and h^max_s are the minimum and maximum heads specified for the SYP nodes.
The other variables, such as:
• flows for all links (pipes including CVs, TCV and PRV valves, and pumping stations),
• head-losses for pipes and valves, and head-gains for pumping stations,
• heads at all nodes (connection junctions, demand nodes, suctions and
discharges of pumping stations),
• water levels for reservoirs and elevated tanks,
are also constrained by lower and upper bounds determined by the features of particular network elements.
Other important constraints are on the final water level (and final water volume) of reservoirs and elevated tanks, such that the final level is not smaller than the initial level:
$x_r(N) \geq x_r(0), \qquad r = 1, \ldots, N_R.$ (39)
Without such constraints the least-cost optimization would result in emptying all reservoirs. In the case of the TOO system this constraint is applied over a long horizon (up to 7 days) when solving a mass-balance optimization problem.
3.4. Maximum Demand Charges (MDC) constraints
In the calculation of pumping cost, two types of electricity pricing are applicable:
1. unit electricity tariff,
2. maximum demand tariff.
The second is difficult to handle and is not widely used by water companies. The maximum demand charge (MDC) is calculated for the power peak (in kW or kVA) which occurred during the month. These calculations are made independently for each physical pumping station (i.e., for each electrical facility), and the total charge is:
$J_{\mathrm{MDC}} = \sum_{p=1}^{N_{\mathrm{PPS}}} c^{\mathrm{MDC}}_p \cdot \max_{k=0,\ldots,N-1} J_p(k),$ (40)
where:
Proceedings of International Conference on Complexity, Cybernetics, and Informing Science and Engineering
c^MDC_p – maximum demand charge for the p-th electrical facility,
J_p(k) – sum of power consumed by all logical pumping stations included in the p-th physical pumping station:
$J_p(k) = \sum_{l:\, l \in p} P_l(k).$ (41)
The operational monthly cost of running the water supply system can finally
be expressed as:
$J = J_P + J_T + J_{\mathrm{MDC}}$ (42)
The terms J_P and J_T in equation (42) are separable in time and can be used to formulate a control problem over any period of time shorter than one month. The maximum demand charge J_MDC expressed by equation (40) is not separable and causes problems if the control horizon is shorter than one month. Thus, the common approach is to ignore the MDC term and optimize only the unit charge and treatment cost.
However, in the case of the TOO system the MDC could affect the optimal solution significantly; thus a special mechanism is incorporated into the one-day network scheduling problem (i.e., for N = 24). We formulate the objective function for the i-th day's scheduling (during the MDC period) as:
$J = J_P + J_T + J_{\mathrm{MDC}},$ (43)
where:
$J_{\mathrm{MDC}} = \sum_{p=1}^{N_{\mathrm{PPS}}} c^{\mathrm{MDC}}_p \cdot w_i \cdot \max\{MD_p(\mathrm{prev}), MD_p(i)\},$ (44)
and where the following notation is employed:
w_i – weight coefficient representing the rate of MDC on different days (i.e., dependent on the day of the month under consideration); if l_MDC is the length of the MDC period (say 30 days), then w_i = i / l_MDC,
MD_p(prev) – previous maximum demand for the p-th electrical facility until the i-th day,
MD_p(i) – current maximum demand for the p-th electrical facility on the i-th day:
$MD_p(i) = \max_{k=0,\ldots,N-1} J_p(k).$ (45)
By a proper choice of the values of w_i as the month progresses, we can preserve a balance between the term (J_P + J_T) and the term J_MDC, and achieve a solution which is close to the optimal solution over a one-month horizon.
The objective function (44) may be reformulated as:
$J_{\mathrm{MDC}} = \sum_{p=1}^{N_{\mathrm{PPS}}} c^{\mathrm{MDC}}_p \cdot w_i \cdot \left( MD_p(\mathrm{prev}) + MD^+_p(i) \right),$
$MD^+_p(i) \geq 0,$
$MD^+_p(i) \geq MD^{++}_p(i) - MD_p(\mathrm{prev}),$
$MD^{++}_p(i) \geq 0,$
$MD^{++}_p(i) \geq J_p(k), \qquad k = 0, \ldots, N-1,$ (46)
where MD^+_p(i) and MD^{++}_p(i) are auxiliary variables defined for each electrical facility.
3.5. Flexible final states for reservoirs and elevated tanks
The objective function, representing the total operating cost to be minimized, usually comprises the energy cost for pumping water and the cost of treating water, although other costs, such as penalties for deviation from the final reservoir (and elevated tank) target levels, are sometimes included. The final penalty charge is associated with the cost imposed on the state variables for deviation from the specified final reservoir levels.
In the case of the TOO system, the final reservoir states for the 24-hour FM problem are taken from the solution of the full mass-balance model (FMBM). The FMBM model is a large-scale linear programming model, usually solved over a one-week horizon, which emerges directly from the FM model when all pressure-dependent variables (heads, levels, head losses and head gains) and constraints are omitted or substituted by parameters.
Thus, the objective function, representing operative cost, is the sum of the
pumping, treatment, maximum demand charges, and final state penalty costs:
$J = J_P + J_T + J_{\mathrm{MDC}} + J_F,$ (47)
where the final term J_F is a penalty function associated with the final levels of reservoirs and elevated tanks x_r(N), r = 1, …, N_R. In the FM model, J_F is modeled by use of the slack variables x^+_r(N) ≥ 0 and x^−_r(N) ≥ 0 in the following way:
$J_F = \sum_{r=1}^{N_R} \rho_r \cdot \left( x^+_r(N) + x^-_r(N) \right),$ (48)
and by an additional equation for each reservoir:
$\bar{x}_r(N) = x_r(N) + x^+_r(N) - x^-_r(N),$ (49)
where:
ρ_r – penalty coefficient (equal to a large value, e.g., 1000),
$\bar{x}_r(N)$ – desired final level for the r-th reservoir or elevated tank (in the TOO system obtained from the solution of the FMBM problem).
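At the optimum of the minimization, only one of the two slacks in (48)-(49) is nonzero, so J_F reduces to ρ_r · |target − final level| per reservoir. A minimal sketch with illustrative levels (in meters) and the penalty value 1000 mentioned above:

```python
# Final-state penalty per equations (48)-(49) for a single reservoir.
def final_penalty(x_final, x_target, rho=1000.0):
    x_plus = max(0.0, x_target - x_final)    # shortfall below the target
    x_minus = max(0.0, x_final - x_target)   # excess above the target
    # equation (49) holds with these tight slack values
    assert abs(x_final + x_plus - x_minus - x_target) < 1e-12
    return rho * (x_plus + x_minus)

# A 0.5 m deviation in either direction costs the same.
assert final_penalty(3.0, 3.5) == 500.0
assert final_penalty(4.0, 3.5) == 500.0
```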
4. CONCLUSIONS AND FUTURE WORK
We described, in general, the concept of the TOO system and, in detail, a large-scale non-linear model, the so-called Full Model, based on a system of hydraulic equations, which is solved over a 24-hour horizon and delivers optimal aggregated flows and pressure gains for all pumping stations. The resulting NLP model is a truly large-scale nonlinear optimization problem: the basic 24-hour version involves over 90,000 variables and nearly 100,000 equality and inequality constraints. For the solution of such an NLP problem we use the IPOPT solver [9], implementing a primal-dual interior-point algorithm with line-search minimization based on the filter method. The IPOPT solver was found to provide very good performance, stability and robustness when solving real-time NLP problems generated by the TOO system.
References
[1] J. Błaszczyk, A. Karbowski, K. Krawczyk, K. Malinowski, and A. Allidina. Optimal pump scheduling for large scale water transmission system by linear programming. Journal of Telecommunications and Information Technology (JTIT), 2012(3):91–96, 2012.
[2] J. Błaszczyk, K. Malinowski, and A. Allidina. Aggregated pumping station operation planning problem (APSOP) for large scale water transmission system. In K. Jónasson, editor, Applied Parallel and Scientific Computing, 10th International Conference, PARA 2010, Reykjavik, Iceland, June 6-9, 2010, Revised Selected Papers, Part I, volume 7133 of Lecture Notes in Computer Science, pages 260–269, Berlin / Heidelberg, 2012. Springer-Verlag Inc.
[3] M. A. Brdys and B. Ulanicki. Operational Control of Water Systems: Structures, algorithms and applications. Prentice Hall, New York, 1994.
[4] J. Burgschweiger, B. Gnädig, and M. C. Steinbach. Nonlinear programming techniques for operative planning in large drinking water networks. The Open Applied Mathematics Journal, 3:14–28, 2009.
[5] J. Burgschweiger, B. Gnädig, and M. C. Steinbach. Optimization models for operative planning in drinking water networks. Optimization and Engineering, 10(1):43–73, 2009.
[6] L. W. Mays. Optimal Control of Hydrosystems. Marcel Dekker, New York, first edition, 1997.
[7] Haestad Methods. Advanced Water Distribution Modeling and Management. Haestad Press, Waterbury, CT, USA, first edition, 2003.
[8] L. A. Rossman. EPANET 2 users manual. Technical Report EPA/600/R-00/057, U.S. Environmental Protection Agency, National Risk Management Research Laboratory, Office of Research and Development, Cincinnati, Ohio, USA, 2000.
[9] A. Wächter and L. T. Biegler. On the implementation of a primal-dual interior-point filter line-search algorithm for large-scale nonlinear programming. Mathematical Programming, 106(1):25–57, 2006.
Towards the construction of a cybernetic organism: the place of mental processes
R. Balvetti, A. Botticelli, M.L. Bargellini, M. Battaglia, G. Casadei, A. Filippini, E. Pancotti, L. Puccia, C. Zampetti.
ENEA C.R. Frascati
Roma Italy
and
G. Bozanceff, G. Brunetti, A. Guidoni, L. Rubini, A.Tripodo
ENEA guest C.R. Frascati Roma Italy
ABSTRACT
Observing the process that generates actions inside our
Biological System, we ask ourselves the question: where is the
mind located?
The paper describes the cybernetic model GIASONE: a
synthetic emulator of the Biological Intelligent System and its
mental processes. Starting from the GIASONE model, several
intelligent applications have been realized, both as aids for
people and as implementations in already existing machines,
making some processes "intelligent" or improving functions of
existing, technologically advanced processes.
Keywords: cybernetic model, mind, brain, GIASONE, synthetic
intelligence
1. INTRODUCTION
Observing the process that generates actions inside the
Biological System, we wondered where the wellspring of the
mind was. How does the environmental clue reach the mind?
How does the germ expand to build action sequences? And,
finally, how do these sequences become interior sequences of
postures and trajectories of motion made in the environment?
2. THE MENTAL PROCESSES
The mind, considered as the place of expansion of mental
processes [1], is usually studied separately from the rest of the
body that includes it. It is difficult to consider the mind as a set
of elements distributed throughout the body, but if we analyze
the body we find scattered "places" of mind, all of which
concur in mental events.
This is the achievement of technology as Synthetic Intelligence:
the study of mental processes that led to the conception of the
GIASONE model [2], a synthetic emulator of the Biological
Intelligent System and of its mental processes. By synthetic
emulator we mean a system that can perceive the environment
and autonomously perform consistent actions on it. It is based
on a new technology philosophy called Olocontrollo emulativo,
a technology that comes directly from the cybernetic GIASONE
model [3]. The idea is to provide a new place (the emulative
space) inside a machine that already exists. In this place occurs
the interference of temporal reconstructions, or emulates, of
concrete reality, which includes the machine itself. The
interference between emulates creates new emulates that still
interfere with each other. The process of interference is
reiterated without limits; this generates an architecture that is
spread over several levels, in the present and in the past. The
differential (∆) produced drives the machine to perform actions
on the environment through actuators. All this is achieved by
zeroing the differential.
The Olocontrollo emulativo schema for the INTELLIGENT
PROSTHETIC application [4] is represented in Fig.1.
Fig.1 Representation of the Olocontrollo emulative schema
The Olocontrollo emulativo technology uses different tools and
methods of investigation, such as Physics, Engineering,
Physiology, Philosophy, etc., that are harmonized in a single
integrated and organic cognitive approach: the cybernetic
approach.
The objective of our project is a technological synthetic
emulation of the Intelligent Biological System and its
elementary processes [5].
By intelligent system we mean a system that is able to perceive
the environment and autonomously perform actions coherent
with it.
The project is developed along a search path that starts from the
observation of the behaviour of an Intelligent Biological System
in order to get to know, and artificially emulate, the process that
leads from perception to action.
3. THE GIASONE MODEL
The representation of the brain (Fig.2), according to the
GIASONE model (1972) [6], is confirmed by the most advanced
brain images produced by Magnetic Resonance Imaging.
Fig.2 Representation of the brain according to the GIASONE model
However, the GIASONE model provides a network
configuration that goes beyond the cerebral area and is spread
over the entire biological system. The network has variable
thickening of its meshes, tailored to the different body functions.
The entire network contributes to the mental processes, the
whole network is the place of mind, and the images evoked by
the memory are composed following a holographic paradigm
(Fig.3).
Fig.3 ENEA Frascati laboratory: an experiment with the hologram
A holographic brain model of cognitive function was also
suggested by Pribram and David Bohm: the "holonomic model
of Bohm-Pribram" [7].
Just as the holographic plate, lit by coherent laser light, rebuilds
the image contained in the grid, so, according to our thesis, the
network, urged by an environmental clue coherent with
elementary memories already stored, lights up in a configuration
coherent with the clue and the memories themselves.
As in a hologram, in which even a small portion carries the
information of the entire image and can reconstruct it, a small
portion of the "mental network" recalls a picture able to play the
role of a source for the reconstruction of the entire scene. This
reconstruction is reconfigured inside the biological system, but
it is also projected onto the outside world by the mind, creating
a sort of hallucination. For example, in the process of
recognizing and catching an object, it is the search for the
perceived object that generates the displacement vector that
drives the movement of arm and hand.
The biological system responds as a resonant cavity where the
coherent information travels and reflects. The superposition of
all reflected waves generates a stationary field, and the energy
contained in the cavity is conveyed towards particular portions
of the executive network. This energy passes into the external
environment as implementation energy: the mental process has
generated the action.
In short, the sediments of memory located in the mind, which
we consider as dark memory, emerge when lighted by the clue
and, by resonance, spread across the whole three-dimensional
body network, re-projecting their content; the entire
proprioceptive network, due to the resonance, responds and is
configured in a manner consistent with the excited sediments,
and the body gets ready for action. All of this is supported by
the chemical network, which is triggered at the same time. The
network nodes begin to vibrate at their natural frequency, like a
guitar string through its sound box; in that resonant cavity,
which is the body, the whole organism participates, rekindled in
the same state. The system is able to produce a virtual
reconstruction through the projection in the three-dimensional
network of what was already present inside the mind. Therefore
"what is inside is the bijective representation of the outside
world that the newly reconfigured body is now searching for".
4. THE INDUSTRIAL APPLICATIONS
Starting from the GIASONE model we have realized several
applications, where our Olocontrollo emulativo technology has
been converted into machines or machine modules of synthetic
intelligence.
Some of these machines are used as physical aids for people:
• VISIO, for tactile perception at a distance, dedicated to blind
people and tested by more than 200 blind users [8];
• INTELLIGENT PROSTHETIC, dedicated to trans-femoral
prosthesis wearers.
Other implementations make intelligent some processes, or
functions of processes, that are technologically advanced:
• TRANSFER, a multi-tool machine. This complex machine is
equipped with several functional tools: seven stations, five
horizontal and two vertical, and a station for loading/unloading
of semi-finished pieces. The piece switches automatically from
one tool to another until the end of the processing cycle. In a
virtual dimension (the emulator), which is the intelligent stage
of the machine, the machine takes possession of, and rebuilds
inside itself, its own volume and the volume of the environment,
including the raw piece. Simultaneously, in the same virtual
dimension, the machine owns the volume of the ideal
configuration of the environment,
including the finished piece. Now, in the machine there are two
different configurations of the same environment. Inside the
machine, these volumes of the environment interfere. This
interference drives the machine, which transforms the raw piece
into the finished piece.
• SECURCRANE, an anti-sway module [9]. The module's aim
is to solve the most important problem of container movement
in ports: the swaying of the crane load during all phases of
loading and unloading of the container (caused also by
unforeseeable events, for example the wind). The sway is
predicted and avoided before it appears. The "anti-sway"
module is part of the European SECURCRANE project.
• A ROBOT VISION laser welding system. The system was
designed with an innovative type of control that allows
autonomous tracking of the seam and on-line parallel control of
the quality of the weld. The system is equipped with an
intelligent agent, the emulator, which emulates the welding and
manages the welding run over the seam line in a dynamic way:
the emulator emulates the next welding spot and drives the
laser head along the welding run. This system, with an artificial
vision robot, was used for flat sheets up to 16 meters in length.
5. CONCLUSION
To the question posed earlier, where is the place of the mind, we
answer that the place of the mind and of the mental processes
cannot be confined to the brain; rather, the whole body, through
its resonant net, is the place of the mental process, which
resonates from the imaginative phase to the pre-actuative phase.
6. REFERENCES
[1] M. Battaglia, A. Botticelli, G. Gazzi, A. Guidoni, G. La Rosa, N. Pacilio, S. Taglienti, C. Zampetti. Processi mentali (Mental Processes). ENEA, 2003.
[2] R. Balvetti, M.L. Bargellini, M. Battaglia, A. Botticelli, G. Casadei, A. Filippini, A. Guidoni, E. Pancotti, L. Puccia, L. Rubini, C. Zampetti, F. Bernardo, A. Grandinetti, B. Mussi, G. Bozanceff, C. Iencenelli. From Natural Intelligence To Synthetic Intelligence Through Cybernetic Model. In The International Symposium on Design and Research in the Artificial and the Natural Sciences: Proceedings DRANS 2010, Orlando, Florida.
[3] http://www.frascati.enea.it/UTAPRAD/olem.htm
[4] R. Balvetti, M.L. Bargellini, M. Battaglia, A. Botticelli, G. Casadei, A. Filippini, E. Pancotti, L. Puccia, C. Zampetti, G. Bozanceff, G. Brunetti, A. Chiapparelli, A. Guidoni, L. Rubini, A. Tripodo, M. Traballesi, S. Brunelli, F. Paradisi, A. Grandinetti, E. Di Stanislao, R. Rosellini. The Cybernetic can improve the quality of life: An Intelligent Limb Prosthesis. In Proceedings BMIC 2012, Orlando, Florida.
[5] A. Botticelli, N. Pacilio. Cento Lavagne. Edizioni Controluce, 2008.
[6] http://antonio.controluce.it/giasone
[7] http://en.wikipedia.org/wiki/Holonomic_brain_theory
[8] Il SOLE-24 ORE (newspaper), Friday 12 May 1995. Informatica-Robotica: Visio sostituisce la vista con la sensibilità tattile (Visio replaces sight with tactile sensitivity).
[9] http://cordis.europa.eu/search/index.cfm?fuseaction=result.document&RS_LANG=FR&RS_RCN=12385922&q=
ENEA: Italian National Agency for New Technologies, Energy
and Sustainable Economic Development
The Agency's activities are targeted at research, technological
innovation and advanced services in the field of energy.
ENEA performs research activities and provides agency
services in support of public administrations, public and private
enterprises, and citizens. (www.enea.it)
Seeing the Big Picture: Principles for dynamic process data visualization on Large Screen Displays
Alf Ove BRASETH
Institute for Energy Technology, OECD Halden Reactor Project Halden, Norway
and
Trond Are ØRITSLAND, PhD Norwegian University of Science and Technology, Interaction Design
Trondheim, Norway
ABSTRACT
Control room operators in time-constrained situations easily lose track of what is happening in complex large-scale processes, coping with thousands of variables and control-loops. Information-Rich Design (IRD) is an industry-tested approach to Large Screen Display (LSD) design that aims to close the gap through an easily perceivable big picture. This paper develops a theoretical basis and proposes design principles for IRD, focusing on visualization of dynamic behaviour of complex large-scale processes.
The theoretical basis is discussed in light of initial evaluations of the IRD concept, other display concepts for complex processes, scientific findings on visualization and perception of displays, and psychological literature on rapid, intuitive information perception. Design principles are discussed using the case of an on-going installation of a third generation IRD large screen display for a nuclear research reactor.
Keywords: Large Screen Display, Complex Processes, fast Visual Perception
1. INTRODUCTION & MOTIVATION
Control room operators face a huge challenge in monitoring thousands of variables and control-loops in large industrial processes. They may experience difficulty in seeing the greater picture if complexity goes too far. Endsley [1] noted that operators have difficulties developing satisfactory Situation Awareness (SA) in complex processes because of the necessity to perceive critical factors, comprehend them in a meaningful context in relation to goals and to support projection of future status.
Display technology suitable for control room installations has evolved rapidly in recent years. High-definition video projectors and flat screen power-walls have enabled the display of process information on much larger surfaces than in the past. Andrews et al. [2] refer to studies showing that high-resolution Large Screen Displays (LSDs) can positively affect user performance for spatial visualizations. Thus it is plausible that LSDs can contribute to improving the operator´s SA, presenting much more information than on smaller desktop displays.
Unfortunately, larger scale displays in control rooms are often only up-scaled traditional schematic process and instrumentation type pictures, using traditional process symbols, numbers and bar graphs. Andrews et al. [2] suggest, however, that designing effective large displays is not a matter of scaling up existing visualizations; designers should adopt a human-centric perspective on these matters, taking limited human capabilities into consideration.
Endsley [1] refers to studies showing that experts use pattern-matching mechanisms to draw upon long-term memory structures, enabling them to quickly understand a given situation. This mechanism is recognized by the US nuclear regulator, which has worked with issues related to information presentation in control rooms for many years. For example, NUREG-0700 [3] section 6: Group-view display system states that: "An overview display should provide a characterization of the situation as a whole in a concise form that can be recognized at a glance"; it also refers to object categorization schemes and pattern matching cues to reduce demands on attention. There is, however, a scarcity of scientific literature or design approaches that attempt to answer the question: "How should one display process information on LSDs to support fast information perception for complex large-scale processes?"
The IRD approach discussed in this paper is a scientifically based LSD concept developed at the Norwegian Institute for Energy Technology. It has so far been applied for industrial and research purposes through 13 live applications in the petroleum, mining and nuclear domains. Its objective is to give the big picture of the process state, and to support rapid visual perception of data.
Figure 1 illustrates qualitatively how the process operator experiences a reduction in information-acquisition capacity in increasingly fast-paced, data-driven situations. IRD addresses fast information acquisition, inspired by Rasmussen’s Skills-Rules-Knowledge (SRK) model [4].
Figure 1: Positioning IRD, modified from [9] (axis labels: information acquisition; maximum information load; self-paced, top-down situations through task-paced, tight, bottom-up situations; example roles: researcher, analyst, pilot, fire-fighter; Skills, Rules, Knowledge levels)
Proceedings of International Conference on Complexity, Cybernetics, and Informing Science and Engineering
The IRD approach incorporates graphical process objects inspired by Tufte's concepts of high data-ink ratio and colour layering [5, 6]. The objective is to reduce cognitive workload through explicit information visualization, inspired by Norman's [7] concept of placing information in the world rather than in the head. Gestalt grouping principles are used to reduce complexity in larger data sets; see Lidwell, Holden and Butler [8].
The left side of Figure 2 shows an example of three process variables visualized through horizontally aligned IRD generic mini-trends, using mathematical normalization of the measuring scale. These generic objects are used to visualize process data such as liquid level, pressure, temperature and flow. The green arrow represents the target value (set-point), and darker areas indicate high and low alarm limits. The IRD mini-trend can also integrate controller output, valve position and explicit alarm information.
Figure 2: IRD mini-trends (part-wise mathematically normalized scale) on the left side; a traditional true scale on the right side. Labels: trended value, low and high alarms, set-point.
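The part-wise normalization behind the mini-trends can be sketched as a piecewise-linear mapping. The function below is an illustrative sketch under our own assumptions, not the IRD implementation; the parameter names (low_alarm, set_point, high_alarm) are hypothetical, chosen so that every variable's set-point lands at mid-scale regardless of its true engineering range.

```python
def normalize(value, low_alarm, set_point, high_alarm):
    """Piecewise-linear (part-wise) normalization sketch: map
    [low_alarm, set_point] onto [0.0, 0.5] and [set_point, high_alarm]
    onto [0.5, 1.0], so the set-point of every variable aligns at
    mid-scale however different the true engineering units are."""
    if value <= set_point:
        return 0.5 * (value - low_alarm) / (set_point - low_alarm)
    return 0.5 + 0.5 * (value - set_point) / (high_alarm - set_point)

# Two variables on very different true scales both align at 0.5
# when they sit exactly on their set-points:
level = normalize(3.2, low_alarm=3.0, set_point=3.2, high_alarm=3.6)
pressure = normalize(50.0, low_alarm=0.0, set_point=50.0, high_alarm=100.0)
```

With a mapping of this kind, horizontally aligned mini-trends can be compared at a glance: any trace drifting away from mid-scale is off target, whatever its unit.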
With the exception of the SRK model, the IRD theoretical framework is, however, mostly influenced by information visualization theory for a static state, such as printed paper. For this reason, there is a need to expand the theoretical foundation of the concept, most notably by creating a stronger relation to dynamic process plant behaviour and to findings from studies focused on display-based visualization. Typical questions are:
• Which means are suitable to support rapid-search attention to dynamic data in LSDs?
• How can one visualize dynamic process plant behavior in LSDs?
Outline: We first examine what others have accomplished on display concepts for large complex processes, before introducing some recent findings on visualization and perception in displays. We then discuss psychological literature on an ecological approach to interface design. This is applied to extend the IRD theoretical foundation, focusing on the dynamic behaviour of complex processes. From this, design principles are proposed.
An example of applying the design principles, together with earlier findings on IRD displays, is discussed through the case of a third-generation IRD display implemented for a live nuclear research reactor process. Finally, relevant issues for further research are described.
Earlier work: This paper extends our earlier work discussing the need for a design concept that supports rapid visual perception, drawing on Rasmussen's SRK model and Tufte's high data-ink ratio and colour layering; see Braseth et al. [9]. More recent publications focus on realizing the concept on LSDs; see Braseth et al. [10, 11]. Two user tests have been conducted in the nuclear domain; see Laarni et al. [12] and Braseth et al. [13, in press].
2. DISPLAY CONCEPTS FOR COMPLEX PROCESSES
Even though not much has been done on visualization concepts for LSDs, we find it relevant to look at related concepts intended for smaller desktop displays. Well-known approaches regarded as state-of-the-art are discussed: the ASM Honeywell approach, Function-Oriented Design (FOD), the Parallel Coordinates concept, grid control displays and Ecological Interface Design (EID).
Reising & Bullemer [14] suggest that direct perception displays are needed to provide an overview at a glance supporting SA. The Abnormal Situation Management (ASM) consortium explores the concept on smaller desktop overview displays in the petroleum industry. They suggest displaying process data through generic qualitative indicators such as normalized dials, and vertical and horizontal bars. These overview displays use a functional tabular layout instead of the more common schematic layout with lines to connect process objects.
A user test by Tharanathan et al. [15] found an ASM functional overview display more effective in supporting SA than ordinary schematic displays with traditional data coding. The results suggested that a transition to a functional display is not overly problematic. In an ASM-sponsored paper, Bullemer et al. [16] discuss the advantage of new technology not restricted by colour limitations, recommending a grey background with respect to situation awareness, alertness, eyestrain and fatigue.
FOD is an innovative approach to human-system interfaces intended for use in large complex nuclear systems on a display system called FITNESS (not specific to LSDs). The concept originates from work by Pirus [17] and his colleagues at Electricité de France. The objective is to “control the complexity of the plants and their operation by introducing structuring elements”. FOD reduces plant complexity by applying a hierarchical display structure; see Figure 3.
Figure 3: FOD reduces complexity through a display hierarchy, based on Pirus [17]. Levels: top level (functional purpose), sub-functions, detailed level.
In a large-scale user test by Andresen et al. [18], the FOD concept was given positive feedback by the test subjects on process overview, disturbances, and alarm visualization. On the negative side, there was an extensive need for button pushing and navigation in the display hierarchy.
The Parallel Coordinates concept excels in displaying high-density graphics, visualizing large data sets on a single display. Each variable is assigned its own axis, and lines are drawn as patterns of values for the variables at different instances of time; deviation from normal plant modes can be spotted as lines falling outside earlier clusters of lines. Inselberg [19] popularized the concept; a later paper by Wegman [20] initiated computerized applications of parallel coordinates. The concept is used in industrial applications as demonstrated by Brooks et al. [21], illustrated in Figure 4.
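The cluster-deviation idea can be illustrated numerically: if earlier normal-operation samples define a per-axis envelope, a new polyline falls outside the cluster of lines exactly when one of its values leaves that envelope. This is a minimal sketch of the idea under a simple min-max assumption, not the algorithm of any particular tool such as PPCL:

```python
def outside_envelope(history, sample):
    """Flag, per axis, whether a new sample's polyline would fall
    outside the min-max envelope of earlier normal-operation samples.
    history: list of samples, each a list of variable values."""
    lows = [min(col) for col in zip(*history)]
    highs = [max(col) for col in zip(*history)]
    return [not (lo <= v <= hi) for v, lo, hi in zip(sample, lows, highs)]

# Three normal samples of (temperature, pressure); a new sample whose
# pressure escapes the cluster is spotted on that axis:
normal = [[80.1, 2.0], [81.5, 2.2], [79.8, 2.1]]
flags = outside_envelope(normal, [80.5, 3.0])  # [False, True]
```

Real applications typically use cluster geometry rather than a plain per-axis min-max, but the perceptual principle is the same: abnormal states appear as lines that visibly leave the familiar bundle.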
Figure 4: Parallel Coordinates data in PPCL software (Brooks, pers. communication, 2012)
Comfort et al. [22] performed a case study determining the effectiveness of parallel coordinates for supporting operators in mitigating hazard events through historical process data. They found the concept excellent for general explorative data analysis.
Hoff and Hauser [23] presented a new design approach to improve display interfaces of grid control in energy management systems. They argued that traditional display approaches are not tuned to our natural ecological perceptual system. They suggested an approach that supports rapid information pick-up in line with ecological psychology and Rasmussen's SRK model. They offered some display examples of easy-to-perceive analogue diagrams. Hoff and Øvergård [24] outlined eight properties referring to the degree of directness in display interfaces, in a taxonomy named Ecological Interaction Properties.
EID is a theoretical framework that covers the work domain description and how to assign information to displays according to Rasmussen's SRK taxonomy; see Vicente and Rasmussen [25, 26]. The main objective of EID is to support operators in unfamiliar, unanticipated events. The concept is well described in the scientific literature, but has had few industrial applications, and it does not focus on a specific display size or type. A more recent book by Burns and Hajdukiewicz [27] describes suitable EID graphical objects; some are quite similar to the generic qualitative objects used in the ASM consortium approach.
Applicable for IRD: Even though these concepts are not designed specifically for LSDs, the ASM consortium, grid-control, EID, and Parallel Coordinates approaches suggest that generic qualitative process objects are suitable for fast visual perception of complex processes. The results from FOD suggest that a hierarchical display structure might result in extensive and time-consuming navigation. The work by Hoff and Hauser and EID suggests that ecological psychology is a suitable approach for describing a complex dynamic work domain.
3. VISUALIZATION & PERCEPTION IN DISPLAYS
The following section focuses on rapid visual perception in computer displays.
Ware [28] focused on how to create displays that support human pattern-recognition skills through efficient top-down search strategies and bottom-up, data-driven pop-out effects. He suggested relying on external visual aids in the process of visual thinking due to limited human visual memory, noting that the real power rests in pattern finding. Ware explained that it is better to re-establish visual cognitive operations through rapid eye movements than to remember or navigate for information. He identified the strongest pop-out effects, or features, to be colour, orientation, size and motion (omitting depth here). Motion is extremely powerful, and a gentler motion can be used instead of abrupt flashing and blinking, which can overly irritate the user. He suggested as a rule of thumb that the most important and common queries in displays should be given most weight: “if all the world is grey, a patch of vivid color pops out”. Ware suggests visualizing large- and small-scale structures to support efficient visual top-down search. Lines and connectors are suitable to describe relationships between concepts.
Healey and Enns [29] have written a comprehensive article on attention and visual memory in visualization and computer graphics; see also Healey's web page [30]. They described how seeing is done through a dynamic fixation-saccade cycle, 3-4 times each second, through bottom-up data-driven and top-down search processes. Only a limited number of visual features can be detected within a single glance in a saccade cycle.
They suggested that visual features should be suited to the viewers' needs and not produce interference effects that mask information, referring to Duncan and Humphreys' [31] similarity theory. To avoid masking primary data, the most important information should be given the most salient features (feature hierarchies). In their discussion of change blindness, the phenomenon of people missing information due to limited visual memory, Healey and Enns [29] noted that larger-format displays increase this problem in comparison to smaller computer screens. They suggested reducing the problem by designing displays that support both top-down and bottom-up processes.
Applicable for IRD: Although this work is not specifically focused on the visualization of dynamic process plant data on LSDs, it indicates that a dynamic process display should allow rapid visual scans for information due to limitations in visual memory. LSDs should support effective means for top-down search, including large- and small-scale structures. Lines are appropriate to connect concepts. Data-driven processes should be visualized through pop-out effects. Feature hierarchies can help avoid masking of primary data.
4. AN ECOLOGICAL APPROACH TO INTERFACE DESIGN
Gibson [32] is one of the founders of ecological psychology, an approach that views humans and other animals from an organism-environment reciprocity perspective. Gibson described how the values and meaning of things in the physical environment are directly perceivable by humans and animals, contrary to sensation-based perception triggered by stimuli and to approaches describing cognition through mental models.
Gibson described the world and its behaviour through substances, mediums, surfaces, events and their affordances. A substance is resistant to outer forces. Bodies can move through mediums, which are homogeneous and without sharp transitions; examples are air and water. Events are changes in our environment resulting from shock or force: ripples on water, evaporation, etc. Events are typically observed on the surfaces that divide substances and mediums. Affordance describes how the physical environment provides immediate actionable properties, such as walking on a floor or sitting on a chair; constraints describe limitations.
Ecological psychology aims in general to address human behaviour in our complex, multisensory, dynamic, physical world, not abstract displays visualizing a process plant's behaviour. It offers, however, several useful concepts when considering the process control operator as an integral, mutual part of a complex process plant. Most notably, it enables us to explore direct perception of a complex domain.
Direct perception suggests that information should be presented in a manner appropriate for rapid visual perception, for intuitive pick-up, in-line with: substances, mediums, surfaces, events and affordances.
Figure 5 suggests that only the left vessel visualizes process plant disturbances in a manner directly perceivable through surface movement. The number, bar and dial to the right only afford an immediate description of the actual value and its constraints (measuring-scale bar and dial). The events happen inside a physical structure of unchangeable substance.
Figure 5: Vessels with three variables. Dynamic events are directly perceivable through the left-side trend-lines (surfaces).
The use of affordance in HCI is, however, debated. Norman [33] stated that the concept has taken on a life far beyond its original meaning. He suggested instead perceived affordance when applied to screen-based interfaces. Hartson [34] extended this further for use in the context of interaction design and evaluation, proposing cognitive, physical, sensory, and functional affordances.
Applicable for IRD: We conclude that LSDs should be rich in perceived affordances, providing many clues to the complex process plant and enabling the operator to detect and see the big picture with enough detail to comprehend the whole situation. Dynamic process disturbances can be described through events, directly perceived through trended surfaces and their constraints. Physical vessels and structures in the process plant can be visualized as substances.
5. DISCUSSION
Due to the large scale and high complexity of LSDs, we find the approach of addressing our limited visual memory particularly interesting. That work gives us further insight into how to support fast top-down search in large displays. It suggests including both large- and small-scale structures in a process display, for which we have earlier used the term landmarks. However, we had not previously considered that they should be given different sizes and shapes (typically large vessels) to better support rapid top-down search. This is somewhat contradictory to our intention of creating displays that focus on dynamic information and reduce static clutter. The problem can, however, be minimized through the use of colour layering to avoid masking primary dynamic information.
Furthermore, it is interesting that the use of lines to connect shapes is encouraged. In earlier displays we have been very cautious in the use of lines, only using grey colours for fear of generating unnecessary clutter. This could be a reason why it has proven challenging to make IRD displays easily interpretable. On the whole, this suggests that we need to focus more on connecting process objects in the display to enhance top-down search.
Early IRD displays were found overly information-dense, so introducing more space, as open areas, might also be beneficial. More research is needed to determine the right balance for fast top-down search between static large- and small-scale structures, lines and information density.
Attention to dynamic, data-driven processes is a challenge in LSDs, and we find the work on pop-out effects particularly appropriate here. In many ways, our earlier work on colour layering supports this, but we have given limited attention to masking issues. This work suggests that we must introduce greater differences in features between information classes in the display than we have done in the past. Users have also complained that IRD displays are too dim, with too little contrast: “everything is grey, nothing stands out”. Ware [28] suggested, however, being cautious with blinking, applying a gentler motion instead. This indicates that the IRD dynamic alarm-spot is an appropriate solution to visualize new, unacknowledged alarms; see Figure 6.
Figure 6: Pop-out effect: an incoming unacknowledged alarm visualized through a dynamic alarm-spot on a green valve (new alarm, stable spotlight after approximately 2 seconds, acknowledged)
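A gentle pulsation of the kind the alarm-spot uses can be sketched as a raised-cosine opacity cycle rather than an on/off blink. This is an illustrative sketch only; the period, opacity range and function name are our assumptions, not the IRD implementation:

```python
import math

def alarm_spot_opacity(t, period=2.0, acknowledged=False):
    """Opacity of a hypothetical alarm-spot at time t (seconds).
    Unacknowledged: a smooth raised-cosine pulse between 0.3 and 1.0,
    a gentler motion than abrupt blinking. Acknowledged: the spot
    settles into a steady state."""
    if acknowledged:
        return 1.0
    return 0.65 + 0.35 * math.cos(2.0 * math.pi * t / period)
```

Unlike a square-wave blink, the opacity changes continuously, so the spot attracts attention through motion without the irritation of abrupt flashing.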
There seems to be a consensus that qualitative indicators as process objects are suitable for rapid visual perception. In IRD displays, we have used mathematically normalized bar graphs, polar diagrams and mini-trends to make them even easier to perceive, also on LSDs. The ecological surfaces and events suggest, however, that the mini-trend is probably best suited to visualize dynamic process plant behaviour.
In summary, we find the theory and approaches described here relevant for the IRD concept, and we propose the following design principles for dynamic process data on LSDs:
• Display graphics should support direct perception of the system situation. One should design dynamic graphics rather than lists and numbers. Data should be rich in perceived affordances, presented as graphics designed to visualize substances, mediums, surfaces, and their constraints.
• The design should include large- and small-scale structuring elements that support top-down visual search. One should lay out the system using lines, grouping, and open space.
• Data should be given lower-level pop-out effects to provide cognitive support through rapid eye movements. One should apply graphics orientation, colour, size, and motion, substituting a gentler animation for blinking. A grey background is suitable for pop-out effects.
• Colour layering should be used for a visual hierarchy rather than display hierarchies, while avoiding too low contrast.
In our opinion, what separates IRD from smaller desktop-oriented concepts is: firstly, a stronger focus on simplification of visual complexity; secondly, its use of animated objects (the dynamic alarm-spot); and finally, its focus on visual search in
larger displays, retaining a relatively traditional schematic layout.
6. A THIRD GENERATION IRD DISPLAY
Figure 7 illustrates how we have applied the proposed design principles to a new, third-generation IRD display. The display is installed in the Halden research reactor control room, using two rear-mounted projectors and mirrors. It was designed by expert operators and the first author, and it replaced older hardwired panels during 2012.
The objective is to address some of the problems encountered in our earlier first-generation [10] and second-generation [11] LSDs. The first-generation display compared well with a traditional overview display, but it had significant potential for improvement in readability: it was too dense and abstract, and it was inconsistent in alarm visualization. A follow-up second-generation display had improved alarm visualization; however, it still suffered readability problems, as it was too dim with low contrast. Both were reported to be unfamiliar and abstract.
The largest structure in the new display is the reactor tank with nuclear control rods; other liquid-filled vessels use a 3D-shaded background. The brown lines are primary radioactive coolant circuits; the green and blue lines are the second and third outer non-radioactive coolant circuits. Mini-trends at the lower right monitor experimental loops.
To avoid challenging limited human visual memory, the display layout is flat, without any hierarchy, in accordance with the last design principle. Instead, a colour hierarchy is used, and dynamic data-driven events such as alarms are visualized through salient pop-out effects. Saturated red is reserved for alarms, avoiding masking problems. To limit visual clutter, we have used the grey background colour on equipment that is not running or is closed. Green is used on active, running equipment.
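The colour-layering rules described above can be summarised as a small decision function. The sketch below is illustrative; the state names and colour strings are our own labels, not the actual implementation:

```python
def equipment_colour(state, has_new_alarm=False):
    """Colour-layering sketch for the described palette: saturated red
    is reserved for alarms, green marks active running equipment, and
    stopped or closed equipment recedes into the grey background."""
    if has_new_alarm:
        return "saturated red"
    if state == "running":
        return "green"
    return "background grey"  # not running, or closed
```

Ordering the rules this way encodes the feature hierarchy directly: the alarm test comes first, so saturated red always wins over the calmer layers beneath it.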
Early in the development phase we used a functional tabular layout of display elements, but it was considered too unfamiliar and abstract by process operators. The final display combines a traditional schematic layout of large process elements, and a functional tabular layout of other monitored process variables (right and upper left). The central section of the display is quite similar to the replaced older analogue panels. This might contribute to a display that is not too unfamiliar and abstract.
To ease top-down navigation, large- and small-scale structures (substances) are visualized; examples are the large reactor tank and other liquid-filled vessels. Space, in the form of open areas, has been introduced to avoid the earlier overly dense appearance. Major flow lines, coloured to visualize the medium, are included to connect related objects, giving a livelier colour palette than in earlier displays and avoiding the “everything is grey” appearance.
We have used aligned and grouped IRD mini-trend objects to display pressures, temperatures and liquid levels (surfaces). Alarm limits (constraints) are visualized, where applicable, as darker areas in the mini-trends. Unfortunately, the mini-trends are quite abstract-looking; using physical structures (substances) as a background might help put them into context.
To keep the display rich in cues (perceived affordances), graphical objects are kept dynamic. Examples are the use of thick flow lines when valves are open and thin lines when closed. A circle indicates pump speed: a full circle is full speed, and a half circle is half speed. A problem reported from earlier IRD displays is that analogue data presentation does not afford high enough accuracy. This has encouraged us to include digital numbers on key parameters in the new display.
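These dynamic cues can be sketched as simple mappings from process state to graphics parameters. The function names and the specific line widths below are illustrative assumptions, not the display's actual code:

```python
def pump_arc_degrees(speed_fraction):
    """Arc angle for the pump-speed circle: full speed draws a full
    circle (360 degrees), half speed a half circle (180 degrees).
    The speed fraction is clamped to [0, 1]."""
    return 360.0 * max(0.0, min(1.0, speed_fraction))

def flow_line_width(valve_open, thick_px=4, thin_px=1):
    """Thick flow line while the valve is open, thin when closed."""
    return thick_px if valve_open else thin_px
```

Because both mappings are continuous functions of plant state, the graphics change with the process itself, keeping the display rich in perceived affordances rather than static symbols.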
7. CONCLUSIONS & FURTHER WORK
This paper approaches complex processes through effective LSD design. The IRD concept described here has a human-centric perspective, resulting in graphical process objects and design principles. We have found the mini-trend object suited to displaying dynamic process response in a natural way. To our knowledge, IRD is positioned quite uniquely as an LSD concept. User tests of earlier nuclear research displays indicate, however, that the concept has not yet achieved an acceptable level of user experience. From this and our initial discussions, we suggest focusing on the following in further research work:
• Measure Situation Awareness levels; does IRD increase levels and reduce information overload problems through easily perceivable process objects and their layout?
• Measure user experience; is IRD acceptable for real-world installations?
Other issues include consistency problems between the IRD LSDs and other control room information sources.
Figure 7: Third-generation nuclear IRD large screen display, 1.4 m × 4.5 m
8. ACKNOWLEDGEMENT
Thanks to M. Gustavsen, M. Louka, S. Nilsen, S. Collier and G. Skraaning for valuable discussions and comments. We are grateful to J. Laarni and J. Andersson for thoroughly reviewing the paper.
9. REFERENCES
[1] M.R. Endsley, Situation Awareness, In J.D Lee & A. Kirlik (Eds.), The Oxford Handbook of Cognitive Engineering, pp. 89, 99, Oxford University Press, 2013.
[2] C. Andrews, A. Endert, B. Yost, C. North, Information visualization on large, high-resolution displays: Issues, challenges, and opportunities, Information Visualization SAGE, pp. 341-355, 2011.
[3] NUREG-0700 Rev. 2, Human-System Interface Design Review Guidelines, U.S. Nuclear Regulatory Commission, Office of Nuclear Regulatory Research, Washington DC, pp. 309-329.
[4] J. Rasmussen, Skills, rules, and knowledge; signals, signs, and symbols, and other distinctions in human performance models, IEEE Trans. Syst., Man, Cybern., SMC-13, pp. 257-266, 1983.
[5] E. Tufte, The Visual Display of Quantitative Information, Graphics Press, pp. 123-137, 1983.
[6] E. Tufte, Envisioning Information. Graphics Press, pp. 53-65, 1990.
[7] D.A. Norman, The psychology of everyday things, Basic Books, pp. 52-55, 1988.
[8] W. Lidwell, K. Holden, J. Butler, Universal Principles of Design, Rockport Publishers Inc., pp. 44-45, 116-117, 144-145, 196-197, 2010.
[9] A.O. Braseth, Ø. Veland, R. Welch, Information Rich Display Design, Paper at NPIC & HMIT, Columbus, 2004.
[10] A.O. Braseth, V. Nurmilaukas, J. Laarni, Realizing the Information Rich Design for the Loviisa Nuclear Power Plant, Paper at NPIC & HMIT, Knoxville, 2009.
[11] A.O. Braseth, T. Karlsson, H. Jokstad, Improving alarm visualization and consistency for a BWR large screen display using the Information Rich Concept, Paper at NPIC & HMIT, Las Vegas, 2010.
[12] J. Laarni, H. Koskinen, L. Salo, L. Norros, A.O. Braseth, V. Nurmilaukas, Evaluation of the Fortum IRD Pilot, Paper at NPIC & HMIT, Knoxville, 2009.
[13] A.O. Braseth, T.A. Øritsland, (in press, 2013), Information Rich Design: A Theoretical Discussion and in-depth User Test of a Nuclear Large Screen Display.
[14] D.V. Reising, P.T. Bullemer, A Direct Perception, Span-of-Control Overview Display to Support a Process Control Operator's Situation Awareness: A Practice-oriented Design Process, Proc. HF 52 meeting, SAGE, 2008.
[15] A. Tharanathan, P.T. Bullemer, J. Laberge, D.V. Reising, R. McLain, Impact of Functional and Schematic Overview Displays on Console Operators' Situation Awareness, Journal of Cogn. Eng. and Dec. Making, Vol. 6, No. 2, 2012.
[16] P. Bullemer, D.V. Reising, J. Laberge, Why Gray Backgrounds for DCS Operating Displays? The Human Factors Rationale for an ASM Consortium Recommended Practice, ASM-sponsored paper, 2011, http://www.asmconsortium.net/, accessed 29 May 2012.
[17] D. Pirus, Future trends in Computerized Operation. Proc. 2002 IEEE, 7th. Conference on human factors and power plants, Arizona, 2002
[18] G. Andresen, M. Friberg, A. Teigen, D. Pirus, Function-Oriented Display System, First Usability Test, HWR-789, OECD Halden Reactor Project, 2005.
[19] A. Inselberg, The plane with parallel coordinates, The Visual Computer, Springer-Verlag, pp. 69-91, 1985.
[20] E.J. Wegman, Hyperdimensional Data Analysis Using Parallel Coordinates, Journal of the American Statistical Association, Vol. 85, No. 411, pp. 664-675, 1990.
[21] R. Brooks, J. Wilson, R. Thorpe, Geometry Unifies Process Control, Production and Alarm Management, IEE Comp. & Contr. Eng., 2004.
[22] J.R. Comfort, T.R. Warner, E.P. Vargo, E.J. Bass, Parallel Coordinates Plotting as a Method in Process Control Hazard Identification, Proc. IEEE Syst. & Info. Eng. Design Symposium, USA, 2011.
[23] T. Hoff, A. Hauser, Applying a Cognitive Engineering Approach to Interface Design of Energy Management Systems, PsychNology Journal, Vol. 6, No. 3, 2008.
[24] T. Hoff, K.I. Øvergård, Explicating the Ecological Interaction Properties. In T. Hoff & C. A. Bjørkli (Eds.), Embodied Minds – Technical Environments, pp. 147-160. Trondheim, Norway: Tapir Academic Press, 2008.
[25] K.J. Vicente, Ecological Interface Design: Theoretical Foundations, IEEE Trans. on Syst., Man, and Cybern., Vol. 22, No. 4, July/August 1992.
[26] J. Rasmussen, K.J. Vicente, Coping with human error through system design: Implications for ecological inter-face design, Int. J. Man-Machine Studies, vol. 31, pp. 517-534, 1989.
[27] C.M. Burns, J.R. Hajdukiewicz, Ecological Interface Design, CRC Press, Florida, 2004.
[28] C. Ware, Visual Thinking for Design, Elsevier, Morgan Kaufmann Publishers, pp. 10-17, 29, 36-41, 58-59, 74, 84, 2008.
[29] C.G. Healey, J.T. Enns, Attention and Visual Memory in Visualization and Computer Graphics, IEEE Trans. on Visualization and Comp. Graphics, 2011, accessed June 2012.
[30] C.G. Healey, Perception in Visualization, http://www.csc.ncsu.edu/faculty/healey/PP/
[31] J. Duncan, G.W. Humphreys, Visual search and stimulus similarity, Psychological Review, vol. 96, no. 3, pp. 433-458, 1989.
[32] J.J. Gibson, The Ecological Approach To Visual Perception, Psychology Press, pp. 8, 16-24, 93-96, 127-132, 147-148, 1979.
[33] D.A. Norman, Affordances and Design, www.jnd.org, 2004, accessed Dec. 2012.
[34] R.H. Hartson, Cognitive, physical, sensory, and functional affordances in interaction design, Behaviour & Information Technology, 22, pp. 315-338, 2003.
Factors Associated with Digital Readiness
in Rural Communities in Israel
Simha DJURAEV and Moshe YITZHAKI
Department of Information Studies, Bar-Ilan University
Ramat-Gan, Israel 52900
Abstract
In the age of the information and knowledge society, digital readiness has become an important element of economic and social development. The purpose of the study was to assess the level of "digital readiness" in rural communities in Israel and to find factors associated with it. A closed questionnaire designed to measure six different aspects of digital readiness was filled out by 200 people living in four rural settlements. A digital readiness index was composed of these six measures. Additionally, an open questionnaire was administered to each governing council representative in these communities. The main findings were: the reported rate of domestic internet connection and use (60%) was lower than the national average, probably due to a combination of two factors known in the literature as deterrents of technological progress, namely the rural nature of the settlements and their remoteness from the center, and the unique religious character of the residents. Demographic features found to be associated with at least three measures of digital readiness were age group, income, education and level of religious observance. Ultra-orthodox respondents showed the lowest level of digital readiness, the modern orthodox sector having adapted to the internet relatively better than the ultra-orthodox one.
Keywords: Digital Readiness; Rural Communities; Israel
Introduction
In the age of the information and knowledge society, digital readiness has become an important element of economic and social development. A country wishing to maintain its competitive capacity aspires to elevate the level of its digital readiness nationwide and to reduce digital gaps between different sectors. Assessment of digital readiness in quantitative terms is essential for monitoring and forming an efficient public policy regarding digital gaps. We assumed that obtaining such information about rural communities, a hitherto unaddressed population sector, would contribute to the reduction of existing digital gaps. Additionally, the study might act as an impetus to community development through ICT, promoting both digital readiness and community development goals.
Purpose of the study
To assess the level of "digital readiness" in rural communities in Israel, and to reveal factors associated with the "digital readiness" level of these rural communities.
Research procedure
Definitions
"Digital readiness" was defined as "the extent of ability and willingness to make use of a local site as a tool for personal and community development". Practically, the "digital readiness" of the studied communities was defined as a multi-faceted variable composed of the digital readiness of the residents and that of their governing councils.
Population
The population studied included the adult residents of four rural settlements as well as the representatives of their governing councils.
Research Tools and Sample
The research included two parts. A closed questionnaire was disseminated to 200 randomly chosen residents, forming a representative sample of the entire population living in the four rural settlements chosen as the target population of the study. The questionnaire was designed to measure six different aspects of the residents' digital readiness: (1) domestic computer and internet infrastructure; (2) extent of internet use; (3) level of internet proficiency; (4) perception of internet importance; (5) inhibitions to internet use; and (6) level of interest in community internet. These six aspects comprised the "digital readiness" index. In addition, the following demographic features were examined: residence, sex, age, status, number of family members, religious level, occupation, income, education, disability, and ethnic group. The answers were analyzed using SPSS. Additionally, an open questionnaire was issued to each governing council representative in the studied communities, aiming to determine two measures of community digital readiness: (1) the extent of current internet activity and infrastructure for community development, and (2) the representatives' perception of the internet as a community development tool. Responses to the open questionnaires were analyzed by qualitative means.
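The paper states that the six aspect measures were combined into a single "digital readiness" index but does not report the exact construction. As an illustration only, one common approach is to rescale each aspect score to a common range and take an unweighted mean; the function name and the equal weighting below are assumptions, not the study's actual procedure.

```python
import numpy as np

def readiness_index(aspect_scores):
    """Composite index as the unweighted mean of six aspect scores.

    aspect_scores: array-like of shape (n_respondents, 6); each column is
    one aspect (infrastructure, extent of use, proficiency, perceived
    importance, inhibitions reversed, interest in community internet),
    assumed already rescaled to the 0..1 range.
    """
    scores = np.asarray(aspect_scores, dtype=float)
    if scores.shape[1] != 6:
        raise ValueError("expected six aspect columns")
    return scores.mean(axis=1)

# Two hypothetical respondents: fully ready vs. not ready at all.
idx = readiness_index([[1, 1, 1, 1, 1, 1],
                       [0, 0, 0, 0, 0, 0]])  # -> [1.0, 0.0]
```

An actual study might instead weight the aspects unequally or standardize them; the averaging step is the only part the text implies.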
Findings and discussion
The main findings were as follows.
Internet ownership and use
Proceedings of International Conference on Complexity, Cybernetics, and Informing Science and Engineering
About 56% of the respondents reported having a computer at home and using a domestic internet connection, while 21% had a home computer not connected to the internet and another 23% did not own a computer at all. These numbers were 10-20 percent lower than the then-prevailing national average in the Jewish sector. Former studies have indicated that peripheral location and rural character are known inhibitors of technological progress, slowing the pace of internet penetration in such communities and refuting the common assumption that internet use serves to alleviate isolation among geographically remote populations. The lower rates of domestic internet connection and use may also be attributed to the unique religious character of the respondents, 97% of whom defined themselves as religious or ultra-orthodox, unlike the Jewish sector in general, where these sectors comprise only 20 percent. While 40% of respondents reported not using the internet for other than job-related purposes, the majority (60%) used it on a daily basis, with the extent ranging from less than one hour (60%) to over 3 hours a day (11%). One-third of the subjects considered the internet important or very important, with the remainder attaching to it slight importance (33%) or none (33%). The vast majority (76%) expressed some interest in developing a community internet, while 24% were indifferent. About half the respondents assessed themselves as lacking internet skills, 37% reported having low skills, and only 17% ascribed to themselves a medium to high level of internet skills. Interestingly, while the vast majority (82%) reported having no technological block regarding the internet, the religious block, or inhibition, was much more prominent, reported by 42% of the respondents.
Factors associated with "digital readiness"
The demographic features (excluding residence) found to be related to at least three measures of digital readiness were: age group, income, education and level of religiousness.
Gender
Female residents of the four settlements were found to be more interested in establishing and developing a community internet than men.
Age group
Predictably, older age groups (40 and up) used the internet less for personal, non-work-related purposes, felt less technically proficient, and attributed lower importance to it than members of the younger age groups (19-29 and 30-39). This suggests that despite the high prevalence of the internet in Israel, a generation gap remains between older and younger age groups in the ability and desire to use the internet. Lower rates of internet use were also found among unemployed people and housewives as opposed to working people and students.
Number of children
Contrary to our hypothesis, digital readiness was found to be associated with family size: households with 2-4 children reported greater internet use than those having 5-6 children. Apparently, child care demands leave less spare time for leisure internet use.
Income
As in previous surveys, income was found to be related to both the extent of internet use and perceived internet importance: respondents with a higher monthly income (above $2200) compared to those earning $2200 or less, were found to have the highest level of digital readiness in the following measures: number of internet uses, perceived importance of the internet and general digital readiness index. These findings reemphasize the long-proven connection between income and education on one hand and digital gaps on the other hand, which still exists in Israel and in most developed countries.
Education level
Similarly, education level was also found to be positively associated with digital readiness. Higher-educated (rabbinic or academic) respondents scored higher on the digital readiness index, made more use of the internet, and attached greater importance to it than those with only an elementary or high school education.
Religiousness level
Unlike previous surveys, our findings do not indicate a consistent inverse relation between one's religiousness level and digital readiness. Respondents who defined themselves as modern-orthodox reported higher internet use, displayed more interest in developing a community internet, and scored above average on the digital readiness index, compared to those who defined themselves as ultra-orthodox or other. Ownership of a personal computer with an internet connection was reported by most modern-orthodox residents (61%), but by only 32% of ultra-orthodox residents. Respondents defining themselves as ultra-orthodox had the lowest level of digital readiness. This finding can be explained by a religious inhibition against using the internet, reported by 68% of the ultra-orthodox but only 37% of the modern-orthodox. Strict religiousness apparently creates a certain psychological block against use of the internet and other digital readiness measures. The gap between the above-mentioned Jewish religious groups may result from the relatively better adaptation of the so-called "modern-orthodox" sector to the internet and other advanced forms of IT, unlike the more conservative ultra-orthodox one. No indication was found, however, of a negative connection between religiousness level and extent of internet use within each of the religious groups analyzed.
Differences between studied communities
The four studied communities were found to significantly differ in demographic features of income and religiousness level. Apparently, the significant differences found between the four communities
concerning the extent of internet use and digital readiness index measures are related to these two demographic variables.
Community digital readiness
Analysis of the questionnaires filled out by the governing council representative in each of the four settlements revealed significant differences in the level of community digital readiness and local use of ICT. However, all four representatives considered the internet an important means of promoting community quality of life. Nevertheless, some of them complained about a lack of resources, or a lack of keen interest on the part of their colleagues in the local council or of many local residents.
Further study
In light of the findings regarding the connection between religious level and the consequent reluctance to use the internet, a further study within the various religious sectors is recommended, in order to better clarify the connection between those variables. Such research may help to formulate a policy aimed at reducing the digital gap among certain religious groups. Regarding the practical aspects of community digital development, the settlement best suited to establish a community internet project was obviously that whose residents and governing council had the highest digital readiness measures and maintained a community site.
Information and adaptation in a public service sector:
The example of the Dutch Public housing sector
Hendrik M. Koolma
Faculty of Social Sciences, VU University Amsterdam, 1081 HV The Netherlands
ABSTRACT
A public service sector can be conceived as a multi-agent system subordinated to a principal, usually a department of a national government. The agents are relatively autonomous and have decisional discretion, as long as they respect the boundaries set by law and regulation. The hierarchy is less compulsory than in a command-and-control structure. Opponents of central control presume greater adaptational capacities in semi-autonomous organizations; the line of thinking is that distributed intelligence structures can cope better with variance in circumstances, so that such a multi-agent system would be more suitable for handling environmental complexity. The paper gives insight into the way such a multi-agent system makes decisions on issues of adaptation. Empirical evidence from the case of the Dutch housing sector shows that the expectations of scientists and policy makers are exaggerated. The agents use strategies which reduce decisional complexity, whereby the adaptation to environmental circumstances is low and arbitrary, and the rationality of the adaptation is limited by self-reference and overconfidence. This observation provides new input to the ongoing rationality debate.
Keywords: adaptation, MAS, portfolio management, nonprofit, KDD, overconfidence, self-reference, rationality
1. INTRODUCTION
The case in this paper, the Dutch public housing sector, is an example of private nonprofit organizations that execute public tasks. The number of organizations decreased from 552 at year-end 2002 to 400 at year-end 2010. The organizations are foundations or associations. Their private status provides them the protection of property rights, while public law gives them certain privileges in comparison to for-profit housing organizations [1]. Law and legislation prescribe the objectives, although ambiguously, given the vagueness of the multiple goals. One goal can be read in the text of the core regulation: housing corporations are supposed to deal with differing local circumstances, tracking and serving the target groups in their working area, and taking into account the market situation. This expectation is neither operationalized nor instrumentalized, so the text is no more than an intentional instruction given by the public legislator to the private agents. The implementation is left to the responsibility and discretion of the local agents.
2. MAIN QUESTIONS
The theoretical question is how multi-agent systems involve information in decisions on adaptation to the environment. Empirically, the question is elaborated as: Which kinds of information correlate with the decisions made? Which information-processing strategies are reflected in the observations from the case of the Dutch public housing sector?
The two questions need conceptual elaboration. All kinds of information could be discerned, depending on the chosen point of view. For this purpose the following two taxonomies are crossed (see Figure 1). Firstly, the decision process is divided into blocks, each containing types of information:
• Static information, such as retrospective statistics on processes and facts in the environment, and statutory objectives.
• Dynamic information, reflecting the working processes in action.
• Conditions for decision making.
• Forecasts of action programs and the expected effects of these programs.
This conception is an example of the construction of an ontology [2] of decision-making by housing corporations and owners of real estate portfolios in general. The second taxonomy concerns the level of information, considering the sector to be an open, social, and anthropogenic system [3]:
• The environment of the organization.
• The organizational level.
• The level of the decision maker.
Because of the aggregation level of the data set, the level of separate decisions is not available. The level of the decision makers requires some explanation.
Although organizations ought to be considered as impersonal systems of human effort, the function of the chief executive has personal aspects, according to Barnard [4]. The chief executive influences which information is admitted to the decision making, as clearly illustrated in situations of groupthink [5]. Assuming this, the CEO has a determining role in the complexity of the decision making. The CEO is also a major 'node' in the linkage between organization and sector networks, bringing in ideas for innovation. The two taxonomies are crossed in the next figure and filled with information items which are possibly relevant to a service-delivering organization.
Figure 1. Conceptual model: a matrix crossing the information types (static information, process information, conditions, forecasts) with the levels (environment, organization, decision makers), feeding into the decision on adaptation.
Not all cells of the table are filled, due to limitations of the data set. Transactions in the working area could comprise relevant information. Most missing cells are on the decision maker's level. Items like education, tenure, incentive compensation contracts, reputation, personal values, heuristics, beliefs, scores
on overconfidence scales, etcetera would be interesting and should get attention in further research. Yet, we make do with
what we have at this moment.
Information strategies
The second question is which information strategies can be observed. Simon [6] coined the concept of bounded rationality. March [7] applies this concept to organizational decision making. Bounded rationality implies that decision makers cannot process all incoming and available information. They intend to be rational but, being human, are restricted by computational limitations. However, limited use of information can also be attributed to an adverse choice of heuristics [8], for instance when decision makers rely on intuition in issues that would be better handled with cold-headed arithmetic. Barnard [9] addresses this phenomenon in an essay in which he introduces a useful taxonomy of types of information:
• Precise information, like business registers.
• Hybrid information, neither complete nor unambiguous, but probably most relevant to the problem issue of the decision maker.
• Speculative information.
Regarding the subject of rationality, Luhmann [10] provides a differentiation. Rationality is usually conceived as guided by objectives. In his opinion, however, three kinds of rationality have to be considered:
• Guided by objectives and rules.
• Attracted by perceptions of chance.
• Assigned to a problem-solving approach.
Luhmann reveals strategies of complexity reduction. Firstly, if the relations between situation, objectives, actions, and effects are undetermined, the inner complexity of the system will be low. In other words, the degree to which collected information, outspoken objectives, and decisions are related varies. Secondly, the connection of social systems to their environment is indirect and mediated by self-reference [11]. The implication of this theory is that the necessity and the success of adjustments to the environment will be judged by preconceived beliefs and opinions inside the organizations.
If success of action is believed to be predetermined, we encounter the phenomenon of overconfidence. In this paper overconfidence is put in an informational frame: in spite of uncertainty and risks, not all available and relevant information is used for decision making, and the decision relies on a delusion of success [12]. Charness and Gneezy [13] show that when complexity-reducing techniques like portfolio analysis are applied to highly complex problems, the perception of risks decreases substantially. Presumably, information regarding the risks is avoided rather than digested.
3. DESIGN AND OPERATIONALIZATION
The research is an example of knowledge discovery in databases (KDD) [14] applied to an open multi-agent system of public housing providers.
Adaptation
Adaptation is conceived as a deliberate adjustment of the housing portfolios to norms, demands, and market forecasts. Portfolio analysis implies separate decisions on investments and divestments [15]. However, project decisions are aggregated in the data set to the level of organizations and report years. Portfolio adjustments are realized by means of:
• Acquisition of houses from other owners and landlords.
• Building of rental houses.
• Building of houses for sale.
• Sale of existing rental houses.
• Demolition and joining of small houses into a smaller number of large houses.
The last two bullets represent divestments. The last item combines two measures due to an aggregation in the provided original data set.
Operationalization of the input variables
A selection is made of variables in the data set to cover the scheme of potentially relevant information items.
Figure 2. Scheme with input items: the same grid of information types (static information, process information, conditions, forecasts) against levels (environment, organization, decision makers), filled with items such as regulatory objectives, local arrangements, the long-term market, local stock composition, local field position, demographics, statutory objectives, actual supply, size, own stock composition, actual demand, assets, future assets, fit between demand and supply, hidden assets, and prominence in fields.
Some remarks on the variables are in order. The regulatory objectives on the environmental level are equal for all organizations, for which reason they are not selected as a variable. Housing corporations have to comply with these objectives in order to be admitted, so there is no differentiation in the statutory objectives either. Analysis of 144 of the 522 annual reports does not show differences between the objectives prescribed by the state on the one hand and the expressed operational objectives on the other hand. This smaller sample also includes the variable of local arrangements on objectives and agreed performance. The items are elaborated into 26 variables. One of the variables is the size of the organizations, because of regulatory practices expressed in the number of rental houses and other objects in exploitation. The point of view determines whether the size of an organization is a (human) resource or an attribute of its position in the local field. Study of the motivations for mergers in the annual reports shows both points of view. In this paper organizational size is conceived as an attribute of resources. The organization's field position is measured as 'prominence' by a combination of two variables, one measuring the activity of the chief officers as speakers at national symposia and the other measuring the participation of chief officers in committees of the national sector organization. The first variable is weighted twice.
Hypotheses
Each independent variable has an implicit hypothesis: it is assumed that all variables are correlated with adaptation decisions. If not, the null hypotheses cannot be rejected. For the sake of readability, the paper leaves the null hypotheses unmentioned and only presents values if a variable has a significant correlation.
Hypotheses on informational strategies
The question on informational strategies is elaborated in sub-questions:
1. Do organizations use all available information (full use)? Full use is measured as true if all cells are covered by significant variables. This hypothesis is tested for each dependent variable and for the aggregate.
2. Or are they limited by computational limitations (bounded rationality)? Bounded rationality is found if only one or two cells are covered by significant values, assuming that information use is restricted to simple one-to-one or two-to-one causations.
3. Which of the three kinds of rationality is reflected by the patterns of information selection: a) guidance by objectives, b) chance orientation, or c) a problem-solving approach? Problem-solving rationality is ascertained if one of three variables (fit between supply and demand, vacancy, satisfying low-income households' demand) has the highest partial correlation value in the estimation models. In those cases it is assumed that adaptation is a reaction to directly observable disturbances in the public service delivery. Chance orientation is difficult to measure; in the data the variable for hidden assets is used as an indicator of an orientation to chances. Guidance by objectives is not operationalized directly but assessed through the disassociation between objectives and action (see 8).
4. Which type of information is preferred: a) precise, b) hybrid, or c) speculative? From the information-type taxonomy only precise information is selected. Precise information is determining adaptation if only static information items have significant correlation values.
5. Can self-reference be ascertained in the balance between information inside and outside the organizations? Self-reference is tested by comparing the correlation values in the organization row to the values on the level of the environment. If two cells in the organization row contain the highest correlation scores, self-reference is ascertained.
6. Are information-reducing scopes like self-reference and closure attributes of the process or of the position of the organization? Self-reference on the action is the case if the level of precedent investment activity determines the decision on future adaptation. This is called inertia, and is established if the investment level in the preceding years has the highest t-value in the estimation model. Process closure is ascertained if only the organization row contains significant correlation values. Position closure is the case if only the conditions column holds significant values.
7. Do the observations show signs of overconfidence? Overconfidence is not tested by a hypothesis but deduced by argumentation. Indicators are low use of presumably relevant information items in relation to the decision.
8. Are there indications of disassociation between objectives and actions? This question is also answered by argumentation.
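The decision rules in sub-questions 1, 2, 5, and 6 all operate on the same 3×4 grid of significant t-values (levels × information types) and can be sketched as a small classifier. The grid encoding, function name, and example values below are illustrative, not taken from the paper.

```python
# Levels (rows) and information types (columns) of the conceptual model.
ROWS = ("environment", "organization", "decision_makers")
COLS = ("static", "process", "conditions", "forecasts")

def classify_strategy(sig):
    """sig maps (row, col) -> t-value of a significant variable in that cell."""
    cells = set(sig)
    # Cells ranked by absolute t-value, highest first.
    ranked = sorted(sig, key=lambda c: abs(sig[c]), reverse=True)
    return {
        # Sub-question 1: every cell holds a significant variable.
        "full_use": len(cells) == len(ROWS) * len(COLS),
        # Sub-question 2: at most two cells are covered.
        "bounded_rationality": 0 < len(cells) <= 2,
        # Sub-question 5: the two highest scores sit in the organization row.
        "self_reference": len(ranked) >= 2
            and all(row == "organization" for row, _ in ranked[:2]),
        # Sub-question 6: significant values only in the organization row.
        "process_closure": bool(cells)
            and all(row == "organization" for row, _ in cells),
        # Sub-question 6: significant values only in the conditions column.
        "position_closure": bool(cells)
            and all(col == "conditions" for _, col in cells),
    }

# Illustrative grid: three significant values, all at the organization level.
example = {
    ("organization", "static"): 1.70,
    ("organization", "process"): 3.82,
    ("organization", "conditions"): -6.41,
}
result = classify_strategy(example)
```

For this hypothetical grid the rules yield self-reference and process closure, but neither full use, bounded rationality (three cells), nor position closure, mirroring the style of reasoning used in the result sections below.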
Methods
Empirical evidence comes from a data set (2002) provided by the Dutch regulatory agency and supplemented with demographic and market information. On most items the data set covers the whole population of agents (N = 552).
Testing is done by multiple regression according to the Tobit method. This method is suited to data sets where the dependent variable is restricted; the restriction in this inquiry is that there are both zero values and a descending range of values approximating zero. The sets of independent variables are checked for collinearity by a series of Pearson bivariate correlations. Variables with a significant correlation to the dependent variable are selected and used as input for the Tobit regression analysis. The measured t-values are presented in the result schemas insofar as the selected variables correlate significantly. Thresholds for significance are, in descending order, 1%, 5%, and 10%.
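Tobit estimation is maximum likelihood on a censored normal model: observations at the censoring point contribute the normal CDF to the likelihood, uncensored observations the density. A minimal left-censored-at-zero sketch follows; the paper does not publish its estimation code, so the function names, optimizer choice, and simulated data are assumptions.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def tobit_negloglik(params, X, y):
    """Negative log-likelihood, dependent variable left-censored at 0."""
    beta, log_sigma = params[:-1], params[-1]
    sigma = np.exp(log_sigma)          # parameterize via log to keep sigma > 0
    xb = X @ beta
    ll = np.where(
        y <= 0,
        norm.logcdf(-xb / sigma),                       # censored observations
        norm.logpdf((y - xb) / sigma) - np.log(sigma),  # observed values
    )
    return -ll.sum()

def fit_tobit(X, y):
    """Fit beta and sigma by maximum likelihood; an intercept is added here."""
    X = np.column_stack([np.ones(len(y)), X])
    start = np.zeros(X.shape[1] + 1)   # betas = 0, log sigma = 0
    res = minimize(tobit_negloglik, start, args=(X, y), method="BFGS")
    return res.x[:-1], np.exp(res.x[-1])

# Simulated example: latent y* = 1 + 2x + noise, observed only where positive.
rng = np.random.default_rng(0)
x = rng.normal(size=2000)
y = np.maximum(1.0 + 2.0 * x + rng.normal(size=2000), 0.0)
beta, sigma = fit_tobit(x.reshape(-1, 1), y)
```

On the simulated data the estimates recover the latent coefficients despite roughly a third of the observations being censored at zero, which is exactly why ordinary least squares would be biased here and the Tobit method is used instead.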
4. ANALYSES
The use of information is described by means of a filled version of the input scheme. Subsequently the hypotheses on information strategies are tested. The procedure is repeated for each of the five items of portfolio adaptation.
Acquisition
The first dependent variable concerns the acquisition of houses from other owners and landlords.
Figure 3. Scheme with results on acquisition, Tobit regression (n = 521). Significant variables: supply to low-income households (1.9901**) and size (1.9309*) at the organization level; prominence (1.8489*) at the decision-makers level.
1. Full use of information is not the case.
2. Bounded rationality is tested false, because three significant values are measured.
3. Problem-solving rationality is ascertained, because the supply variable presents the highest t-value. The variable for hidden assets does not have a significant score, so chance orientation is not found.
4. Static information items do not show significant correlations. Therefore there is no preference for precise information.
5. Self-reference on the organizational level requires the two highest scores in the organization row. This is the case, so the hypothesis is accepted.
6. Inertia is not found, because the investment level in preceding years has no significant score. Process closure is not observed, because of significant values in rows other than the organization row. Position closure is tested false, because of significant values outside the conditions column.
7. As acquisition is a market-dependent operation, the absence of correlations to market forecast variables indicates overconfidence.
8. Actual performance in the supply to low-income households is related to the level of acquisitions. The reason for this observation is not clear. There is no clear argument to state a disassociation between objectives and action, so the hypothesis is rejected.
Building new rental housing
Building rental houses is a core business of housing corporations, because it achieves a renewal of their stock. The following significant correlations are found.
Figure 4. Scheme with results on building rental houses, Tobit regression (n = 447). Significant variables, all at the organization level: share of low-quality houses (1.6965*), investments in preceding years (3.8194***), and ∆ solvability (-6.4146***).
1. The full information hypothesis is easily rejected.
2. Bounded rationality is tested false.
3. Problem-solving rationality is not ascertained, because low quality was not chosen as an indicator of a problem-solving approach. The variable for hidden assets does not have a significant score, so chance orientation cannot be stated.
4. One static information item has a significant correlation, but there are other kinds of information items as well.
5. Self-reference on the organizational level requires the two highest scores in the organization row, so the hypothesis is accepted.
6. Inertia is found, because the investment level in preceding years has a significant score. Process closure occurs when significant information is only found in the organization row; this is the case. There are no significant scores in the conditions column, so position closure is not ascertained.
7. Although low information dependency is observed, there are at first glance no arguments for stating overconfidence. However, taking into account the negative correlation to the solvability trend, building new rental houses should give financial concerns. The decisions are indifferent to the actual financial state, although the financial impact of the decision is remarkably negative, so it is warranted to indicate overconfidence. The indifference to market forecasts also supports the claim of overconfidence.
8. The results suggest that another objective plays a role, namely a concern about technical quality. The regulation comprises a quality objective, namely the maintenance of the level of quality. Therefore there is no ground for stating objective disassociation.
Building houses for sale
Building houses for sale is not a traditional activity of housing corporations. However, a shortage of affordable owner-occupied houses could bring housing corporations to invest in such houses. The next figure shows on which items the choice between activity and no activity depends.
Figure 5. Scheme with results on building houses for sale, Tobit regression (n = 429). Significant variables: share of owner-occupied houses (-2.1967**) at the environment level; investments in preceding years (7.7763***) and size (6.9819***) at the organization level.
1. The full information hypothesis is rejected.
2. Bounded rationality is tested false, because more than two cells are covered.
3. Problem-solving rationality is not ascertained. The variable for hidden assets does not have a significant score, so the hypothesis on chance orientation is rejected.
4. One static information item shows a significant correlation. However, the correlation is not exclusive, so a preference for precise information is not proven.
5. Self-reference on the organizational level requires the two highest scores in the organization row. This requirement is met.
6. An inertia effect is found, because the investment level in preceding years has the highest t-value. Process closure occurs when significant correlations are only found in the organization row; this is not the case. The significant correlation scores are not restricted to the conditions column, so position closure is absent.
7. Considering that building houses for sale is not a core business and housing corporations have to operate in a demanders' market, it is astonishing to observe indifference to market forecasts. Therefore overconfidence can be attributed to the decision to develop houses for sale. The inertia observation enhances the risk of building for periods without demand.
8. There is no correlation to forecasted demand from buyers of low-priced houses. The correlation to the share of owner-occupied housing stock in the municipalities could reflect considerations of local housing policy. However, the type of housing is private and the potential demand of low-income households has no influence, so a disassociation between objectives and action is stated.
Divestment by sale of existing rental houses
Sale of existing rental houses can serve several objectives. Common practice is to combine investment in new rental houses with the sale of existing houses. The analyses provide the following results.

Figure 6: Scheme with results on the sale of existing rental houses (Tobit regression, n=475). Significant scores: low-income households (environment, -2.0787**); vacancy of rental houses (3.6062***), Size (4.1785***) and Solvability (-1.8133**) (organization).

1. The full information hypothesis is rejected.
2. Bounded rationality is also tested false.
3. Problem-solving rationality is ascertained because of the significant correlation of vacancy to the sales. However, the t-value of Size is higher, so the problem orientation is not leading. The variable for hidden assets has no significant score, so chance orientation is absent.
4. One static information item shows a significant correlation. However, the correlation is not exclusive, so the preference for precise information is not proven.
5. Self-reference on the organizational level requires the two highest scores in the row. This requirement is met, so the decision is characterized by organizational self-reference.
6. Inertia effects are absent, because the investment level in preceding years has no significant score. Process closure occurs when significant information is found only in the organization row. This is not the case. The significant
28
Proceedings of International Conference on Complexity, Cybernetics, and Informing Science and Engineering
correlation scores are not restricted to the condition column, so position closure is absent.
7. Decisions on the sale of rental houses are made without sensitivity to market forecasts. Consequently, the profits of a sale program might be overestimated, so overconfidence is observed for this decision as well.
8. The decision to sell houses has a negative correlation to the share of low-income households in the local situation. This indicates a reverse response. However, the regulation also mentions the aspect of livability, an objective that in state documents is associated with a policy against geographical concentration of low-income groups. Ambiguity of regulatory objectives is a reason to reject the hypothesis of disassociation between objectives and actions.
Divestments by means of demolition and joining of houses
The number of houses in stock decreases through demolition and through joining small houses into larger ones. Analyses of the decisions on these divestments have the following results.

Figure 7: Scheme with results on demolition and joining of houses (Tobit regression, n=521). Significant scores: share of owner-occupied houses (environment, -1.6930*); share of apartments (-2.0613**), vacancy of rental houses (3.1094***), Size (5.7346***) and hidden assets (-1.9598*) (organization).
1. Here too, the full information hypothesis is rejected.
2. Bounded rationality is tested false, because more than two cells are covered.
3. Problem-solving rationality could be ascertained, were the t-value of Size not higher. The variable for hidden assets does have a significant score, so chance orientation is found. The negative value implies that corporations with hidden assets are more reluctant to demolish or join their houses.
4. Two static information items show significant correlations. However, the correlations are not exclusive, so the preference for precise information is not proven.
5. Self-reference on the organizational level requires the two highest scores in the row. This requirement is met, so the decision is characterized by organizational self-reference.
6. The inertia effect is absent, because the investment level in the preceding year does not have a significant score. Process closure occurs when significant information is found only in the organization row. This is not the case. The significant correlation scores are not restricted to the condition column, so position closure is absent.
7. There are no arguments to state overconfidence.
8. The divestment decisions depend on characteristics of the housing stock. As mentioned before, quality aspects are connected to a second statutory objective. So there is no ground for stating a disassociation between objectives and action.
Results summarized
The paper deals with two questions: firstly, which information is related to organizations' adaptation, and secondly, which information strategies can explain the observed sensitivity to information items. The answer to the first question is presented in the next table.
Figure 8: Information items correlated to portfolio decisions (aggregation of portfolio adjustments). Environment row: demographics, local field position, long-term market, local stock; organization row: own stock, vacancy, Size, ∆ solvability, quality, actual supply, assets; decision makers row: prominence. Legend: ● significant at 1%, • significant at 5%, - significant at 10%.

The used items are spread over the scheme. The implicit hypothesis was that the selected information items would be relevant to the decision making. Market forecasts make no difference to portfolio decisions, and local field position is irrelevant. Remarkably, Size is the most prominent variable: it has more influence on adaptation than public housing issues. It could be that more human resources are available. It might be a matter of position as well: bigger corporations presumably have more power to impose portfolio adjustments upon actors in their environment.

Several hypotheses were postulated regarding the information strategies, based on rationality types and heuristics. The following table summarizes the results of the analyses.

Figure 9: Results of testing the information strategies
The hypotheses tested (full use, bounded rationality, problem orientation, chance orientation, preference for precise information, self-reference at the organizational level, inertia, process closure, position closure, overconfidence, task avoidance) are evaluated per action (acquisition, investment in rental houses, investment for sale, sale of rental stock, divestments by demolition) and in total; the outcomes are summarized in the text.
All five analyses are negative on the full use of information items. The hypothesis of bounded rationality is also rejected. The hypothesis of problem-solving rationality is both rejected and accepted: items indicating problem orientation are found, but they have lower t-values than other items. Chance-oriented rationality is found in the acquisition and divestment decisions; in the building and sales decisions chance orientation is absent. The hypothesis of a preference for precise information is rejected overall. The hypothesis of organizational self-reference is accepted in all five issues of portfolio adjustment, resting for a large part on the influence of Size in the regression analyses. The inertia hypothesis is only confirmed in the decision to build houses for sale. This is an intriguing result for a commercial activity in a competitive market. For that reason the observation is related to overconfidence. Observations inside the organizations demonstrate that housing corporations set up departments for commercial real estate development. Probably
production becomes a goal in itself as soon as such departments are in operation.
Informational closure of processes is only observed in the decision to build new rental houses. A probable explanation is that it is a self-reinforcing routine of housing corporations. Informational closure related to position is rejected overall. The overconfidence hypothesis is not tested directly, but by argumentation. On four portfolio issues the argumentation indicates overconfidence, so overconfidence is a rather prevalent attribute of portfolio decisions by housing corporations.
Disassociation between regulatory objectives and actions is indicated by argumentation on two issues. The confirmation concerns building of houses for sale and the sale of existing rental houses. A tentative explanation for this observation is that these commercial activities trigger goal displacement.
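The schemes above report t-values from Tobit regressions, which handle the many corporations with a zero level of investment or divestment. As background, here is a minimal sketch of the log-likelihood of a Tobit model left-censored at zero; this is our illustration of the technique, not the authors' estimation code, and the function name and setup are assumptions.

```python
import math

def tobit_loglik(beta, sigma, X, y):
    """Log-likelihood of a Tobit model left-censored at zero (sketch)."""
    ll = 0.0
    for xi, yi in zip(X, y):
        # Linear index of the latent variable: mu = x'beta
        mu = sum(b * x for b, x in zip(beta, xi))
        if yi <= 0.0:
            # Censored observation: P(y* <= 0) = Phi(-mu / sigma)
            ll += math.log(0.5 * math.erfc(mu / (sigma * math.sqrt(2.0))))
        else:
            # Uncensored observation: normal density of the residual
            z = (yi - mu) / sigma
            ll += -math.log(sigma) - 0.5 * math.log(2.0 * math.pi) - 0.5 * z * z
    return ll
```

Maximizing this likelihood over beta and sigma yields coefficients and, from their standard errors, t-values of the kind reported in the schemes.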
5. CONCLUSION AND DISCUSSION
The paper deals with two questions: firstly, which information correlates to organizational adaptation, and secondly, which information strategies can explain the observed sensitivity to the information items. Adaptation is operationalized as portfolio adjustments by Dutch housing corporations. Empirical evidence is based on a data set of annual report information (2002), supplemented with demographic statistics and market forecasts. Answering the first question, it is remarkable that the Size of the organization dominates the portfolio decisions. Also notable is the insignificance of market forecasts and financial concerns.
Self-reference and overconfidence appear to be the most significant information strategies of housing corporations. The operationalization of three strategies, namely full use, bounded rationality and preference for precise information, might be reconsidered in future research. The research was designed to provide insight into the relation between information and adaptation in a public service sector. The starting point was the expectation of policy makers that a privatized multi-agent system would have more capacity to deal with local differences. The paper is not set up to prove or reject the expectation of Dutch policy makers; however, it gives rise to many second thoughts. Do the results converge or diverge with other inquiries into the
Dutch housing sector? De Kam [16] concludes that the building of houses for sale has a weak correlation to market circumstances and a strong relation to internal attributes of the organizations. Nieboer [17] indicates, on the basis of interior observations in case studies, that although general portfolio policy comprises market information and demand of target groups, actual investment decisions are decoupled from this policy. The research design obviously has blind spots, so more and
other information on the environmental and personal level could shed new light on the relation between information and adaptation. A data set which allows longitudinal research would be welcome. More comprehensive information could provide higher correlation values of multiple regression models, although the scores presented are not uncommon for research into social systems and institutionalized environments.
In the end one question remains: can the results of the paper be generalized to other sectors? The Dutch housing corporations are an example of nonprofit organizations. Nonprofit organizations have drives other than profit maximization, even if they expand into commercial activities. So a generalization to other nonprofit sectors might be considered, especially if their public task comprises investments and divestments. Although a transfer of results to profit sectors is not advisable, applying the research design to these sectors would be a challenge. Can we find the homo economicus, or do we find organizations which apply complexity-reducing strategies when they have to decide on adaptation?
6. REFERENCES
[1] European Commission, State aid No E 2/2005 and N 642/2009 - The Netherlands: Existing and special project aid to housing corporations, Brussels: European Commission, 2009.
[2] B.F. Fomin & T.L. Kachanova, Physics of Open Systems: Generation of System Knowledge, 2012, http:www.iiis.org/CDs2012/CD2012I<C/IMCIC_2012/PapersPdf/ZA477TJ.pdf
[3] Idem.
[4] C.I. Barnard, The Functions of the Executive, Cambridge, MA: Harvard University Press, 1938, p. 216.
[5] G.R. Whyte, "Recasting Janis' groupthink model - The key role of collective efficacy in decision fiascos", Organizational Behavior and Human Decision Processes, Vol. 54, pp. 185-209.
[6] H.A. Simon, Models of Man - Social and Rational, New York: John Wiley & Sons, 1957, p. 198.
[7] J.G. March, "Understand How Decisions Happen in Organizations", in J.G. March, The Pursuit of Organizational Intelligence, Malden, MA: Blackwell Publishers, 2000, p. 16.
[8] K.E. Stanovich & R.F. West, "Individual Differences in Reasoning - Implications for the Rationality Debate?", Behavioral and Brain Sciences, Vol. 23, No. 5, 2000, pp. 645-665.
[9] C.I. Barnard, "Mind in Everyday Affairs", in C.I. Barnard, The Functions of the Executive, Cambridge, MA: Harvard University Press, 1936, p. 309.
[10] N. Luhmann, Zweckbegriff und Systemrationalität - über die Funktion von Zwecken in sozialen Systemen, Tübingen: J.C.B. Mohr, 1968, p. 179.
[11] N. Luhmann, Social Systems, Stanford, CA: Stanford University Press, 1995; D.D. Reneman, Self-Reference and Policy Success - An exploration into the role of self-referential conduct of organizations in the effectiveness of policies, Amsterdam: VU University of Amsterdam Press, 1998.
[12] D. Lovallo & D. Kahneman, "Delusions of Success: How Optimism Undermines Executives' Decisions", Harvard Business Review, July 2003, product 4279.
[13] G. Charness & U. Gneezy, Portfolio Choice and Risk Attitudes - An Experiment (working paper), Santa Barbara, CA: University of California, 2003.
[14] E.E. Vityaev & B.Y. Kovalerchuk, "Relational methodology for data mining and knowledge discovery", Intelligent Data Analysis, Vol. 12, 2008, pp. 189-210.
[15] H.M. Markowitz, Portfolio Selection - Efficient Diversification of Investments, New York: Wiley, 1959.
[16] G. de Kam, Bouwgrond voor de Volkshuisvesting, Almere: Nestas Communicatie, 2012, p. 164.
[17] N.E. Nieboer, Het lange koord tussen portefeuillebeleid en investeringen van woningcorporaties, Amsterdam: IOS Press, 2009, p. 241.
The usage of ISOTYPE Charts in Business Intelligence Reports - The Impact of Otto Neurath's Work in Visualizing the Results of Information Systems Queries
André S. Monat Post Doctorate Researcher - sponsored by CNPq-Brazil Scholarship
Marcel Befort Dipl. Des.
Program of Industrial Design in the field of design theory for methodology, planning and strategy
Wuppertal University, North Rhine-Westphalia, Germany
ABSTRACT
Business Intelligence (BI) systems are designed to provide managers with a user-friendly way to build and analyze reports. Nowadays, BI systems make a large range of graphic tools available for displaying such reports. Nevertheless, these systems have so far disregarded the immense potential of the ISOTYPE approach to the graphic display of statistics. ISOTYPE was created by the Austrian social scientist Otto Neurath (1882-1945); it is the acronym for International System of TYpographic Picture Education. ISOTYPE aims to create a system for communicating the analysis of social and management data to a broad audience that includes both laymen and experts. The reason ISOTYPE is not used in BI systems may lie in the difficulty of building algorithms that realize what Neurath described as the transformation phase of working over collected data. In this phase, data must be grouped in a way that facilitates its further display and the understanding of what we can conclude from it. In this article, we propose that BI systems include ISOTYPE-based tools for visualization. To illustrate our ideas, we built a BI system that displays statistics on charts built according to the ISOTYPE approach.
1. INTRODUCTION
Business Intelligence (BI) systems are designed to provide managers with a user-friendly way to build and analyze reports. These reports are usually meant to facilitate management and decision making in enterprises where a huge amount of data is generated and must somehow be analyzed. Business Intelligence systems are normally built in a multi-dimensional approach in which the business enterprise is analyzed according to dimensions; typical dimensions are date, product, enterprise branch, client and sales, among others. A pivotal factor for a successful BI system is a graphic system that displays BI reports in an interactive and easy-to-understand way. Furthermore, BI reports must guide the user to what is most interesting about the data being reported: the user's attention must be captured by what is special, different or exceptional about the data being portrayed. Nowadays, BI systems make a large range of graphic tools available for displaying reports. These include the well-known tools available in spreadsheet systems and also more sophisticated business-oriented tools provided by the Processing language project conducted by the MIT Media Lab [3]. Nevertheless, BI systems so far disregard the immense potential of the ISOTYPE approach to the graphic display of statistics. ISOTYPE is a graphical language created by the Austrian social scientist Otto Neurath (1882-1945); it is the acronym for International System of TYpographic Picture Education. ISOTYPE aims to create a system for communicating the analysis of social and management data to a broad audience that includes both laymen and experts in the specific field being studied. Otto Neurath's ideas were first applied to an exhibition of social data held in Vienna in 1930 (and later printed as an atlas) called Gesellschaft und Wirtschaft, or Society and Economics in English [1]. Later, all these ideas were applied in several other contexts [8]. Despite the wide acceptance of Neurath's work, information systems tend not to list ISOTYPE-based tools among those they make available. The reason may lie in the difficulty of building algorithms that realize what Neurath described as the transformation phase of working over collected data. In this phase, data must be grouped in a way that facilitates its further display and the understanding of what we can conclude from it. In this article, we propose that BI systems include ISOTYPE-based tools for visualization. We suggest the usage of ISOTYPE especially in BI systems that serve the broad public. The BI system must use a graphic framework built according to ISOTYPE principles and display the quantities and statistics over it. The numerical values may vary according to the parameters submitted to the BI system, but the graphics will always exhibit the values according to Neurath's ideas.
To illustrate these concepts we built a BI system that displays information about the usage of the Rio de Janeiro underground transport system. This example system is intended to be available to the general public in underground stations or on the Internet.
2. THE ISOTYPE SYSTEM
In [10], the authors suggest that design performs, in its relation to science, a role similar to the one performed by the classical view of philosophy. The latter is interested in analyzing knowledge generated in the sciences and evaluating the impact of this knowledge on our society. Design is interested in interpreting this same type of knowledge and creating objects that can be useful for people. Under this perspective, both design and philosophy may be regarded as meta-knowledge. Otto Neurath was an Austrian-born sociologist who developed a graphic system to present statistics and numeric data in a way understandable to a large proportion of society. He thought it was fundamental to interpret data and present it in such a way that even laymen would be able to understand the message it conveys. His work therefore concerned how to portray knowledge that came from science in a way that is useful for ordinary people, so his graphic system can also be regarded as meta-knowledge. Neurath's ideas had their first great impact when he became head of the Museum of Economy and Society in Vienna and organized a major exhibition in 1930. The exhibition had the same name as the museum, Gesellschaft und Wirtschaft in German. His graphic system was mainly used to show data and statistics about societies and countries, and at this time it came to be called the Viennese method. Later, in England, where Neurath lived during the Second World War, the system was renamed ISOTYPE. In Figure 1 we have an illustration of an ISOTYPE work. Neurath's intention was to show the rates of deaths and births in Germany, and to make very clear the years in which the former surpasses the latter. In this work we can find several characteristics of Neurath's system, described comprehensively in [8]. First, symbols, called pictograms, are used to represent quantities.
In this picture we can see two main pictograms: babies, representing births, and coffins, representing deaths. To give a notion of the amount of births and deaths, Neurath suggests repetition of pictograms. He believed such repetition had a much better educative impact than making the symbol size proportional to the amount being represented [12]. Moreover, pictograms should be exhibited with equal spaces among them. Other suggestions made by Neurath can also be gleaned from this picture: preferably, time should be shown on the vertical axis and amounts and statistics on the horizontal axis; pictograms should be two-dimensional pictures; and the usage of perspective should be avoided. Neurath had a team to work on ISOTYPE graphics. At some moments this team comprised 25 members and included Gerd Arntz, a graphic designer responsible for many ISOTYPE pictograms and solutions. Neurath divided his team into three main groups. The first one, called data collectors, was involved in collecting the data to be portrayed. The second group, called transformers, was involved in the process of analyzing, selecting, ordering and then making visual the information, data, ideas and implications involved (Lima, 2008). Finally, there was the artistic group, involved in creating the graphics. In this work, we believe data stored in databases may take the role of the first group's work. A designer is still essential to perform the transformer phase, but with a further requirement: the basic idea of how to display the results should be scalable according to the amounts being exhibited. For instance, how could we remake Figure 1 to show the same data for Austria rather than Germany? We believe an information system can be programmed to adapt a basic idea of displaying data to several different contexts. Finally, designers could perform as the artistic group as well.
Figure 1 - Deaths and births in Germany from 1911 to 1926. Extracted from [8].
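Neurath's repetition principle, a fixed-size symbol repeated once per unit and equally spaced, is easy to state algorithmically. Below is a minimal sketch; the function name and the per-pictogram unit are our assumptions for illustration, not values from Neurath's charts.

```python
def pictogram_row(label, value, unit, symbol="o"):
    """Render a quantity as a row of repeated, equally spaced pictograms.

    Following Neurath: the amount is encoded by *repeating* a fixed-size
    symbol, never by scaling it; `unit` is the quantity one symbol stands for.
    """
    count = round(value / unit)  # partial symbols are simply rounded here
    return f"{label:<6}" + " ".join(symbol for _ in range(count))

# e.g. one year's births, one symbol per 250,000 people (assumed unit):
print(pictogram_row("1911", 1_800_000, 250_000))
```

Scaling the same row for a different country or year then only requires recomputing the symbol count, which is exactly the adaptability argued for above.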
3. INTRODUCTION TO BUSINESS INTELLIGENCE
The evolution of enterprise database systems followed two main approaches. The first was mainly interested in storing, updating and querying data in order to fulfill the operational needs of enterprises. The second wanted to provide companies with decision support systems. This difference in purpose caused a difference in the way databases have been conceived and built. In [2], Edgar Frank Codd coined the terms OLTP (Online Transaction Processing) and OLAP (Online Analytical Processing) to refer respectively to the operational database type and the one for decision making. OLTP systems were created based on set theory and matrix calculus. OLAP systems were designed to serve business models and the analytical approach this type of decision requires [13]. Thomsen [14] says OLAP systems are valuable for enterprises due to their capacity for making comparisons and simulations over variables that represent the business environment; as illustrations of these variables we may mention variances, ratios and tendencies in sales. The term analytical in OLAP is related to the capacity to slice and dice data in order to take business decisions. Variables such as productivity and profit margin are typically used in this process. During the last two decades there were several great advances in both OLAP and OLTP systems. Specifically for OLAP systems, Howard Dresner, from the Gartner Institute, coined the term Business Intelligence to aggregate the whole set of tools and equipment that make it possible to decide on solid foundations. In [11] we can find the whole evolution of BI systems. The main feature of BI systems is the way they make possible the construction of business reports without the need for any programming language or deeper knowledge of computing. These systems also portray data in panels called dashboards, which synthesize hundreds of relevant pieces of information about the enterprise. Normally this data is
presented in a way familiar to business managers, using well-known graphic styles. The advantage of using BI systems becomes clear given their widespread usage across all sectors of the economy. Examples of such usage may be found in Kimball and Ross (2002), covering areas such as sales, human resources, production, and telecommunications, among others.
4. THE STAR SCHEMA
In BI systems, it is very common to analyze data according to the facts stored and the several dimensions we can use to categorize those facts. We may illustrate this using the Rio de Janeiro underground transport system. This system, shown in Figure 2, has two main lines, simply called line 1 and line 2, and each of these lines has two directions. Line 1 has directions Saens Peña and Ipanema, and line 2 has directions Botafogo and Pavuna.

Figure 2: Rio de Janeiro underground map. Extracted from [9].

The Rio de Janeiro underground transport system, called Metrô Rio, is able to count the passengers that enter each station. Through the turnstiles, it is also possible to monitor the direction each passenger takes. In BI terms, we call each trip a passenger performs between two stations of the Metrô a fact. Each passenger may be recorded by his or her ticket number. Table 1 shows the variables involved in this type of control. Since each row in Table 1 represents a passenger on a trip, we have a row for each fact we are analyzing. The system we use as illustration is supposed to communicate to the general public information about the intensity of the flow of passengers at different moments and dates. The public could rely on the reports generated by this system to decide the best moments and stations for their trips.
TRIP IDENTIFIER: Id_TICKET
TRIP ATTRIBUTES: STATION_ORIGIN, STATION_DESTINY, LINE, LINE_DIRECTION, DATE, TRIP_START, TRIP_END, TIME_ELAPSED, OCCUPANCY_RATE
Table 1 – Variables associated with a trip. Each fact associates the passenger with the origin station of the trip and the destination station. It also records the line direction and the line itself, as well as the date and time when the trip occurred. In BI terms, we say that stations, lines, date and hour are dimensions for the data in Table 1. The time spent on the trip is called a measure of the system. All measures must portray numerical values associated with the fact being stored. The occupancy rate is regarded as a calculated variable: it shows an estimate of the occupancy of the average wagon on that trip. Figure 3 synthesizes what should be regarded as dimensions and measures for Table 1. Figure 4 shows the so-called Star Schema for the situation depicted in Figure 3. The Star Schema is widely used in modeling BI systems. It describes how data is going to be regarded by those who will manipulate it. We have a central table, called the fact table, where we store the measures. Around the fact table we have the dimensions that allow us to categorize and organize the data. Kimball and Ross [7] introduce the Star Schema concept in detail. Star Schemas are logical models of data and may be implemented in several ways. A possible physical model to store data accordingly is the ROLAP model, in which dimensions and facts are stored as tables or relations. Figure 5 illustrates how this model could be adapted to the Metrô Rio situation. For each dimension in Figure 4 we establish a table with a row for each possible value the dimension may assume. Each dimension has an identifier (in our case called id_dimension) to link the row from the dimension table with
rows in the fact table. For those familiar with database theory, id_dimension works as a foreign key for the fact table.
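As a concrete illustration, the ROLAP layout can be sketched with a few relational tables. The table and column names follow the paper's figures, but the DDL, the sample rows and the query are our own illustration, not the Metrô Rio production schema.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE station (id_station INTEGER PRIMARY KEY,
                      name_station TEXT, capacity INTEGER);
CREATE TABLE line (id_line INTEGER PRIMARY KEY, line_name TEXT);
-- Fact table: one row per trip, measures plus foreign keys to dimensions.
CREATE TABLE trip (id_ticket INTEGER,
                   id_station_origin INTEGER REFERENCES station(id_station),
                   id_station_destiny INTEGER REFERENCES station(id_station),
                   id_line INTEGER REFERENCES line(id_line),
                   time_elapsed REAL,      -- measure
                   occupancy_rate REAL);   -- calculated measure
""")
con.execute("INSERT INTO station VALUES (1, 'Maracanã', 10000), (2, 'Ipanema', 8000)")
con.execute("INSERT INTO line VALUES (1, 'line 1')")
con.execute("INSERT INTO trip VALUES (42, 1, 2, 1, 25.0, 0.8)")
# The dimension keys let us slice the facts, e.g. count trips per origin:
rows = con.execute("""SELECT s.name_station, COUNT(*)
                      FROM trip t JOIN station s
                        ON s.id_station = t.id_station_origin
                      GROUP BY s.name_station""").fetchall()
```

The join through id_station is exactly the foreign-key role of id_dimension described above.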
Figure 3: Dimensions and measures for the Rio de Janeiro underground BI system (dimensions: STATION_ORIGIN, STATION_DESTINY, LINE, LINE_DIRECTION, DATE, TRIP_START, TRIP_END; measures: TIME_ELAPSED, OCCUPANCY_RATE)

Figure 4: Star Schema for the Rio de Janeiro underground BI system example

Figure 5: Relational model for the Rio de Janeiro underground example (dimension tables STATION with id, name and capacity; LINE with id and name; DATE with id, date and date type; HOUR with id and hour; plus a fact table holding the dimension keys and the measures)
A BI system built around one Star Schema is called a Data Mart. When we have several Star Schemas somehow interconnected, the system is called a Data Warehouse. Both types of systems are used to store huge amounts of data, and it is not uncommon to deal with BI systems with trillions of rows in their fact tables. Despite this storage capacity, the system is conceived to deliver reports and aggregated values in a very fast and efficient way. How this can be achieved, and the techniques applied for it, are beyond the scope of this work; anyone interested in this aspect of BI systems is referred to [14].
5. PIVOT TABLES
In this work we are interested in showing an alternative way, based on Otto Neurath's work, of displaying data stored in a DM or DW. For this, aspects of data storage are not relevant. Actually, BI systems tend to keep the database structure transparent to their users. It is also very common for users to be encouraged to access data through a spreadsheet system they are familiar with. For instance, the Microsoft suite for BI, called Analysis Services, uses Excel as the interface to access BI systems. Therefore, users access and manipulate data as if they were using a simple spreadsheet rather than a gigantic database. The main tool for promoting interaction with BI systems is the Pivot Table. According to [6], the concept of the Pivot Table was first introduced by Pito Salas for the 1991 release of the Lotus Improv system. In the popular Microsoft Excel spreadsheet system, Pivot Tables were made available in 1993, in the Excel 5 version. Currently, it is regarded as the most used form of visualization for data retrieved from BI systems. Pivot Tables are also used to build business reports in a friendly way. There are several good introductions to using and applying Pivot Tables; those interested in how Pivot Tables are made available in a spreadsheet system could read [5] or [4]. In this work we are not concerned with how to build Pivot Tables but with how to employ them as a data visualization tool. One of the main advantages of Pivot Tables is the drag-and-drop procedure for selecting the rows and columns of the table or report we are interested in. In Figure 6 we show a simple report built this way: the average number
of passengers that enter and leave Maracanã Station between 14:15 and 14:30. These quantities should be multiplied by 100. Nevertheless, the type of chart shown in this picture may be regarded as unfriendly for use by the general public. People tend to see charts of statistics as mathematical visualizations and consider them hard to understand. Automatic and dynamic sizing of the axes makes this problem even worse. In an underground transport system such as Metrô Rio, a friendlier way of portraying reports is necessary.
Figure 6: Chart showing entrance to and exit from an underground station
6. ISOTYPE AND BI SYSTEMS
A system oriented to providing numeric information to a broad audience must consider the difficulties people face in understanding and analyzing the quantities involved. Therefore we cannot disregard people's resistance to dealing with charts, graphs and other mathematical tools for visualizing numeric data. ISOTYPE is a good option for dealing with this problem. It can even be applied to communicate statistics that come out of an information system. For instance, Figure 7 shows how the data shown in Figure 6 could be portrayed using ISOTYPE. Basically, this type of solution displays quantities in a way more familiar to the layman who uses the Metro system. In order to use ISOTYPE together with an information system, we need to provide scalable solutions. Therefore, we are no longer interested in building a solution for a limited and previously specified numeric set. We must find a framework that can fit the range of possible numeric data associated with the context being analyzed. Therefore, the solution shown in Figure 7 must be adaptable to data from other underground stations and to all moments of the day. In this solution, the ordinary user of Metrô Rio can draw several conclusions more easily than by observing Figure 6. In Figure 7 it becomes quite clear that if a typical passenger waits fifteen more minutes, he or she is going to face a less busy station than now. The volume of the flow of passengers is also more easily grasped through the pictograms shown. Figure 8 shows a possible ISOTYPE solution for data concerning the flow of passengers using one of the lines available in Metrô Rio. Again, the solution provided was designed to portray data regardless of the stations involved or the moments selected. From this solution, users of Metrô Rio can easily realize that waiting fifteen more minutes would allow a much more comfortable trip. In order to design this solution we imagined a typical 280-passenger wagon having 40 seats available.
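A scalable ISOTYPE rendering reduces to one rule: repeat a pictogram once per fixed quantity, with a part-size icon for the remainder. The sketch below illustrates that rule only; the scale of 40 passengers per icon and the characters standing in for pictograms are illustrative assumptions, not the paper's actual design.

```python
def isotype_row(quantity, per_icon=100, icon="X"):
    """Render a quantity as a row of repeated pictograms (ISOTYPE style).

    Each icon stands for `per_icon` units; the remainder is shown as a
    half-size icon (here the character "x") when it is at least half
    of the scale.
    """
    full, rest = divmod(quantity, per_icon)
    row = icon * full
    if rest >= per_icon / 2:
        row += "x"
    return row

# A full 280-passenger wagon at a scale of one icon per 40 passengers
print(isotype_row(280, per_icon=40))  # -> "XXXXXXX"
```

Because the icon count is computed from the data, the same rendering adapts to any station and any moment of the day, which is the scalability requirement stated above.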
Figure 7: Flow of passengers in an underground station
7. CONCLUSIONS
When we want to make a presentation displaying numeric data, we can face two main types of audience. The audience may be a specialized one, familiar with statistical jargon and well versed in mathematical tools for visualizing data. A second type of audience is the general public. In this case, we have to figure out solutions that enable the layman to interpret, draw conclusions and make decisions from the data being displayed. Otto Neurath introduced a very useful solution for this second type of audience. Although his ideas were introduced in a pre-computer era, they are still valuable resources for communicating data generated by modern BI systems to broad audiences. Some problems may occur when we try to apply ISOTYPE ideas to the first type of audience, the specialized one. Sometimes pictograms are not as accurate as the numbers they represent. This is more evident when we are dealing with fractions or with numbers that are not exact multiples of the scale we adopted. Nevertheless, ISOTYPE is an excellent tool for exhibiting tendencies within data and produces a powerful communication effect.
[Chart residue from Figure 6: vertical axis 0 to 70; series Entrance and Exit; time intervals 2:00-2:15 and 2:15-2:30.]
Figure 8: Occupancy rate for the Rio de Janeiro Underground example
References

[1] Bibliographisches Institut AG (1930). Gesellschaft und Wirtschaft – Bildstatistisches Elementarwerk. Leipzig: Bibliographisches Institut AG.
[2] Codd, E. F., Salley, C. T. (1993). Providing OLAP to User-Analysts: An IT Mandate. Manchester: Hyperion Solutions Europe, E. F. Codd & Associates.
[3] Fry, B. (2007). Visualizing Data. USA: O'Reilly Media.
[4] Frye, C. (2011). Excel 2010: Pivot Tables in Depth. USA: lynda.com publisher. ISBN-13 9768-1596717237.
[5] Hill, T. (2011). Excel 2010 Pivot Tables. USA: Questing Vole Press. ISBN-13 978-0789734358.
[6] Jelen, B., Alexander, M. (2005). Pivot Table Data Crunching. Que Publishing.
[7] Kimball, R., Ross, M. (2002). The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling. 2nd edition. New York, USA: Wiley Publishing.
[8] Lima, Ricardo C. (2008). Otto Neurath e o legado do ISOTYPE. InfoDesign – Revista Brasileira de Design da Informação, 5(2), 36-49.
[9] Metrô Rio (2013). Metrô Rio official site. Retrieved April 12, 2013, from http://www.metrorio.com.br/mapas.htm.
[10] Monat, A. S., Campos, J. L., Lima, R. C. (2008). Metaconhecimento: Um esboço para o design e seu conhecimento próprio. BOCC - Biblioteca on-line de ciências da comunicação, v. 1, p. 1518.
[11] Mundy, J., Thornthwaite, W., Kimball, R. (2006). The Microsoft Data Warehouse Toolkit: With Microsoft Business Intelligence Toolset. 1st edition. Indianapolis, USA: Wiley Publishing.
[12] Neurath, M., Kinross, R. (2009). The Transformer: Principles of Making Isotype Charts. Hyphen Press, first edition.
[13] Peter, R., Coronel, C. (2011). Sistemas de banco de dados: Projeto, implementação e administração. 8th edition. São Paulo: Cengage Learning. 711 p.
[14] Thomsen, E. (1997). OLAP Solutions: Building Multidimensional Information Systems. 1st edition. New York, USA: Wiley Computer Publishing. 576 p.
Statistical Properties of Ordered Alphabetical Coding
Vilius NORMANTAS
Institute of Mathematics, Academy of Sciences of the Republic of Tajikistan, Dushanbe
ABSTRACT
The paper presents a type of text coding, called αβ-coding. The essence of αβ-coding is that the letters of every word of a given text are arranged in a specific way to create a code of that word. The list of codes obtained by scanning text corpora is stored in a database together with the words that could be transformed into each code. Word frequencies are stored as well. Decoding is performed by transforming possibly scrambled words according to the algorithm of the coding and finding in the database the most frequent word corresponding to the resulting code. As more than one word may result in the same code, decoding is inherently ambiguous. However, a study on corpora of five languages has shown that about 95% of word tokens can be correctly decoded.
Keywords: Ordered Alphabetical Coding, Coding, Ambiguous Decoding, Anagram, Corpora.
1. INTRODUCTION
It is widely known [1] that human readers are able to understand most words of a text in which all letters of every word, except the first one and the last one, are scrambled in random order. This study is an attempt to describe this phenomenon as a type of text coding and to demonstrate that a simple computer program is also able to perform decoding even when all letters are scrambled.
Ordered alphabetical coding (or αβ-coding) was introduced by Z.D. Usmanov in [4].
In this paper the coding was studied on corpora of four natural languages, English, Lithuanian, Russian and Tajik, and one artificially created language, Esperanto. These particular languages were chosen partially for subjective reasons (the author already possessed the corpora). Another reason was the intention to test the coding on sufficiently diverse languages. Results of the statistical study are presented in two tables.
This type of coding could be used to introduce resilience against certain distortions of text, for example typing errors, in applications where indexing of textual data is needed. Concrete examples are spelling checkers and textual database search engines.
2. DEFINITION OF ORDERED ALPHABETICAL CODING
Let L be a natural language with alphabet A, and let W = a1 a2 ... an be a word of length n in that language, consisting of letters ak ∈ A, k ∈ 1..n. Let us introduce a string of letters CW = as1 as2 ... asn consisting of the same letters as the word W, but arranged in the alphabetical order of the alphabet A.
Definition 1. We will call the image F : W → CW an ordered alphabetical coding (αβ-coding) of the word W, and the string of letters CW its αβ-code.
Example. Given the Latin alphabet A, F : W = spring → CW = ginprs.
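The image F is a direct transcription of Definition 1 and can be sketched in one line of Python (not part of the original paper):

```python
def ab_code(word):
    """Ordered alphabetical coding F: the letters of the word in sorted order."""
    return "".join(sorted(word))

print(ab_code("spring"))  # -> "ginprs"
```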
Assuming that all written natural languages have totally ordered alphabets (comparison of two distinct letters always results in "less" or "greater" [3]), the image of any word W under F will always be a single αβ-code CW. However, the inverse image F^{-1} : CW → W may be ambiguous, because more than one word may have the same αβ-code. For example, CW = eimst may be an image of several different words {W : W = times, items, mites, smite}.
A finite set of at least two words consisting of the same letters arranged in different ways will be called an anagram in this paper. The words belonging to the anagram will be called anagram elements.
Every anagram may be described by a single αβ-code. However, the inverse image F^{-1} of the anagram's αβ-code always corresponds to at least two elements of the set {W}; therefore decoding of an anagram's image is always ambiguous.
In order to evaluate the ability of αβ-coding to restore preimages of αβ-codes, it is necessary to evaluate the cardinality of the anagram set in natural languages. A statistical study of corpora of four natural languages (English, Lithuanian, Russian, Tajik) and the artificially created language Esperanto has been made in [5] and [6]. The results suggest that the ratio between the number of word tokens belonging to the anagram set and the total number of word tokens in the corpus fluctuates around the value of 0.5. This suggests that about half of the word tokens in the studied corpora are elements of anagrams. This fact raises serious doubts
regarding the usefulness of αβ-coding, as about 50% of words could potentially be decoded incorrectly.
3. VARIATIONS OF ORDERED ALPHABETICAL CODING
In order to reduce the ambiguity of decoding, two modified versions of αβ-coding have been presented in [6]: F^f and F^{f,l}. Just as F, the modified images are defined on a set of words {W} in a natural language L.
Definition 2. The image F^f of the word W is the string α1 C(W/α1), where α1 is the first letter of the word W and C(W/α1) is the αβ-code of the word W without the first letter.
This image, unlike F, leaves the first letter of the word in its original position and arranges the remaining letters by the alphabetical order of A.
Following is another variation of αβ-coding.
Definition 3. F^{f,l} : W → α1 C(W/{α1, αn}) αn.
In this image both the first letter of the original word, α1, and the last one, αn, remain intact. The remaining letters W/{α1, αn} are arranged according to the alphabetical order of A.
Example. Under the image F^f, the elements of the anagram {W : W = times, items, mites, smite} would be encoded as teims, iemst, meist, seimt.
This should be an improvement over the image F, as fewer words are represented by the same code. However, codes of some words would still be ambiguous when the first letters match; for example, both elements of the anagram {W : W = protein, pointer} would be encoded as peinort. In the case of the image F^{f,l}, even more words would have unambiguous codes. For example, the elements of the last anagram would result in two distinct codes: peiortn and peinotr.
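Both variants can be sketched in Python alongside the original coding, as direct transcriptions of Definitions 2 and 3 (the short-word fallback in F^{f,l} simply returns words of fewer than three letters unchanged, which follows from keeping the first and last letters intact):

```python
def ab_code(word):
    """Original alphabetical coding F."""
    return "".join(sorted(word))

def ab_code_f(word):
    """F^f: keep the first letter in place, sort the rest."""
    return word[0] + "".join(sorted(word[1:]))

def ab_code_fl(word):
    """F^{f,l}: keep the first and last letters in place, sort the middle."""
    if len(word) < 3:
        return word
    return word[0] + "".join(sorted(word[1:-1])) + word[-1]

print(ab_code_f("protein"), ab_code_f("pointer"))    # -> peinort peinort
print(ab_code_fl("protein"), ab_code_fl("pointer"))  # -> peiortn peinotr
```

The output reproduces the ambiguity discussed above: F^f maps both anagram elements to peinort, while F^{f,l} separates them into two distinct codes.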
In comparison to the original αβ-coding F, both variations F^f and F^{f,l} offer some improvement in terms of accuracy of decoding.
4. STATISTICAL STUDY OF WORD FREQUENCY LIST
Corpora of the following sizes have been used to gather the data:
• English 11,252,496 words;
• Lithuanian 34,165,084 words;
• Russian 19,175,074 words;
• Tajik 2,323,965 words;
• Esperanto 5,080,195 words.
Language   Coding    Unique    Unambiguous   Ambiguous
code [2]             codes     codes, in %   codes, in %
En         F         119,055      89.74         10.26
           F^f       130,644      95.19          4.81
           F^{f,l}   135,618      98.49          1.51
Lt         F         605,039      90.28          9.72
           F^f       654,475      94.96          5.04
           F^{f,l}   675,208      97.44          2.56
Ru         F         462,886      93.01          6.99
           F^f       488,286      96.32          3.68
           F^{f,l}   500,433      98.39          1.61
Tg         F          80,080      93.05          6.95
           F^f        84,220      96.77          3.23
           F^{f,l}    85,805      98.48          1.52
Eo         F         147,220      90.92          9.08
           F^f       158,310      95.94          4.06
           F^{f,l}   162,940      98.45          1.55

Table 1: Anagram statistics
Results of the statistical study of the corpora are presented in Table 1. First of all, a list of all unique words with the absolute frequencies of their occurrence was created. The size of the list, which is equal to the total number of unique words found in the corpus, is presented in the second column of the table.
The second step was to encode every word from the list according to the algorithm of the particular modification of αβ-coding and to group words resulting in the same code. The codes were divided into two groups. One includes codes generated by a single word, called unambiguous codes. The other group contains codes generated by at least two distinct words. The third and fourth columns of Table 1 represent the numbers of unambiguous and ambiguous codes, respectively, as percentages of the total number of distinct codes. The results show that about 90 to 98% of word types can be unambiguously decoded for all languages. As expected, the modified versions of αβ-coding performed better than the original version, and F^{f,l} performed best on all languages.
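The two steps just described, encoding every unique word and then grouping by code to count unambiguous versus ambiguous codes, can be sketched as follows (the word list is a toy stand-in for a real frequency list):

```python
from collections import defaultdict

def code_stats(words, coding):
    """Group unique words by their code; count unambiguous vs ambiguous codes."""
    groups = defaultdict(list)
    for word in words:
        groups[coding(word)].append(word)
    unambiguous = sum(1 for ws in groups.values() if len(ws) == 1)
    return len(groups), unambiguous, len(groups) - unambiguous

# Toy word list, not a real corpus
words = ["times", "items", "mites", "smite", "spring", "protein", "pointer"]
coding_F = lambda w: "".join(sorted(w))

total, unamb, amb = code_stats(words, coding_F)
print(total, unamb, amb)  # -> 3 1 2
```

Here F collapses the seven words into three codes, of which only one (ginprs) is unambiguous; the real study applies the same counting to full corpora.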
However, this evaluation does not take into account the frequencies of the word occurrences.
5. UNAMBIGUOUS INVERSE IMAGES
As discussed above, the images F, F^f and F^{f,l} assign a single code to every word. However, the inverse images in the general case do not provide unambiguous decoding. To overcome this limitation, the images F̄, F̄^f and F̄^{f,l} are presented.
Definition 4. The images F̄, F̄^f and F̄^{f,l} have the following properties:
38
Proceedings of International Conference on Complexity, Cybernetics, and Informing Science and Engineering
• they are defined on a set of words {W} of language L;
• they encode words in the same way as the corresponding codings F, F^f and F^{f,l};
• the inverse images F̄^{-1}, F̄^{f-1} and F̄^{f,l-1} match the inverse images F^{-1}, F^{f-1} and F^{f,l-1} when the word can be decoded unambiguously;
• when decoding is ambiguous, the inverse images F̄^{-1}, F̄^{f-1} and F̄^{f,l-1} return a single word W*, which has the largest frequency of all words with the same code.
Example. Two words have been found which are encoded as eenprst by the image F̄. The absolute frequency of the first word, present, is 4129; the other word, serpent, occurred 22 times in the corpus. As the former word has the larger frequency, the inverse image F̄^{-1} would decode eenprst as present.

It is clear that this type of decoding cannot avoid mistakes when the less frequent element of an anagram is expected. However, the statistical study of the corpora presented in the following section suggests that the percentage of correctly decoded words may be high enough for some applications.
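This frequency-based decoding can be sketched in Python; the frequency list below is just the two-word example above, not a real corpus:

```python
from collections import defaultdict

def ab_code(word):
    """Ordered alphabetical coding F."""
    return "".join(sorted(word))

# Hypothetical frequency list extracted from a corpus
frequencies = {"present": 4129, "serpent": 22}

# Build the inverse image: each code maps to its most frequent word
by_code = defaultdict(list)
for word, freq in frequencies.items():
    by_code[ab_code(word)].append((freq, word))

decode = {code: max(candidates)[1] for code, candidates in by_code.items()}
print(decode["eenprst"])  # -> present
```

The `max` call picks the candidate with the largest frequency, so the rarer word serpent would always be misdecoded as present, which is exactly the unavoidable error discussed above.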
6. STATISTICAL STUDY OF INVERSE UNAMBIGUOUS IMAGES
The fourth column of Table 2 presents the ratio of occurrences of words which could be decoded unambiguously by each image, as a percentage of the total number of word tokens in the corpus. Unlike the data presented in Table 1, this table takes into account the word frequencies.
It is important to notice that for all studied languages only about 50% of word tokens are not elements of an anagram, and thus can be unambiguously decoded by the inverse image F̄^{-1}. The Russian language has slightly fewer occurrences of anagrams: it has about 54% non-anagram words. Esperanto particularly stands out by having a very low number of non-anagram words compared to the other studied languages, just about 35% of word occurrences.
The images F^f and F^{f,l} offer significantly higher percentages of unambiguously decoded word tokens. The results tend to cluster around about 75% and 90%, respectively.
The last column of Table 2 gives a quantitative evaluation of the effectiveness of decoding by every αβ-coding. For all five languages the error rates are within 1% when decoding by the images F̄^{f-1} and F̄^{f,l-1}. Error rates of decoding by the image F̄^{-1} stay within 3% for all languages except Esperanto, where the error rate is about 6%.
Language   Number of     Coding    Frequency of        Correctly
code [2]   words in                unambiguous         decoded
           corpus                  codes, in %         words, in %
En         11,252,496    F             42.11              97.42
                         F^f           73.60              99.35
                         F^{f,l}       96.25              99.75
Lt         34,165,084    F             45.77              97.17
                         F^f           69.48              99.03
                         F^{f,l}       84.88              99.60
Ru         19,175,074    F             54.31              97.65
                         F^f           75.57              99.42
                         F^{f,l}       85.79              99.85
Tg         2,323,965     F             49.59              98.12
                         F^f           75.70              99.37
                         F^{f,l}       86.99              99.67
Eo         5,080,195     F             35.21              94.14
                         F^f           82.39              99.09
                         F^{f,l}       95.16              99.77

Table 2: Efficiency of decoding
7. EXAMPLE OF DECODING
The following two sentences (taken from Wikipedia's article about the English language) were scrambled so that the letters of every word were ordered randomly.
English is a West Germanic language that was first spoken in early medieval England and is now the most widely used language in the world. It is spoken as a first language by the majority populations of several sovereign states, including the United Kingdom, the United States, Canada, Australia, Ireland, New Zealand and a number of Caribbean nations.
The text looks as follows when scrambled. Capitalization of letters is removed; the punctuation marks and white spaces are left intact:
hleigsn si a wtse arngmeic gaaleung taht saw ftirs knsope in lerya dmaelive nlanegd dan is now eth stmo wdleyi sude gnauglae ni het orlwd. it is pekons sa a tsfri aaengulg by hte ytrmajoi otunlpsioap of eervlsa nresvegoi ettass, cgiidnuln the euidnt nmdokgi, teh unedti ssetta, cdaana, uasrtaial, ndairle, enw eaazdnl nda a nmerub of eicarbanb ntsoina.
Text restored by the F̄^{-1} image:

english is a west germanic language that was first spoken in early medieval england and is now the most widely used language in the world. it is spoken as a first language by the majority populations of several sovereign states, including the united kingdom, the united states, canada, australia, ireland, new zealand and a number of caribbean nations.
In this example, the text has been completely restored to the original state, excluding the capital letters.
8. DISCUSSION AND CONCLUSIONS
Although the problem of ambiguous decoding is unavoidable in αβ-coding, the results of the statistical study presented above suggest that this type of coding may be accurate enough for some applications. For example, if we assume that the average length of a sentence of some language is 20 words, decoding by F̄ would on average make about one error per every 1.6 sentences. In most cases this level of accuracy would make the resulting text easily legible compared to the scrambled input (for an example see section 7).
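The estimate follows from simple arithmetic, assuming the roughly 3% word error rate reported for F̄-decoding in section 6:

```python
# Roughly 3% of word tokens are misdecoded; sentences average 20 words
errors_per_sentence = 0.03 * 20          # 0.6 errors per sentence
sentences_per_error = 1 / errors_per_sentence

print(round(sentences_per_error, 2))     # -> 1.67
```

That is, roughly one error per every 1.6 to 1.7 sentences on average.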
Although the linguistic aspects of the chosen languages were not the subject of this study, it is interesting to notice that the statistical properties of αβ-coding remain consistent across the languages, even though the languages were sufficiently distinct. It is also interesting to notice that the artificially created language Esperanto stands out from the other languages by having a significantly larger number of words belonging to anagrams.
This study may also provide an explanation for the widely known phenomenon whereby people can successfully read text with scrambled letters, except the first and the last ones (this corresponds to the F^{f,l} coding). Human readers may have this peculiar ability because most words do in fact have distinct sets of letters. Even when several words belong to an anagram, it is very likely that the frequencies of their occurrence differ by as much as several orders of magnitude.
References
[1] M. Davis, http://www.mrc-cbu.cam.ac.uk/people/matt.davis/Cmabrigde/, 2003, accessed 2013-05-04.
[2] Library of Congress, "Codes for the Representation of Names of Languages", http://www.loc.gov/standards/iso639-2/php/English_list.php, accessed 2013-05-04.
[3] B. Smyth, Computing Patterns in Strings, Es-sex, England: Pearson Addison-Wesley, 2003, p. 6.
[4] З.Д. Усманов, "Об упорядоченном алфавитном кодировании слов естественных языков", ДАН Республики Таджикистан, т. 55, № 7, 2012.
[5] З.Д. Усманов and V. Normantas, "Статистические свойства αβ-кодирования слов естественных языков", ДАН Республики Таджикистан, т. 55, № 8, 2012.
[6] З.Д. Усманов and V. Normantas, "О множестве анаграмм и распознавании их элементов", Proceedings of the 16th seminar Новые информационные технологии в автоматизированных системах, Moscow, 2013, pp. 287-292.
The Complexity of Complexity: Structural vs. Quantitative Approach
Marcin J. Schroeder Akita International University Akita, 010-1211 Akita, Japan
ABSTRACT
The first part of this paper is devoted to a short critical review of the concepts frequently used to characterize complexity. It will be followed by a choice of the conceptual framework for this study, based on methodological reflection of the author from his earlier publications, in which the concept of information plays a fundamental role. Finally, a more comprehensive approach will be proposed for the study of complexity in which both selective and structural manifestations of information are considered. The former is associated with the quantitative characterization of complexity, the latter with the structural one. The choice of the conceptual framework of information, and the use of the formalism developed for the study of information in the study of complexity, is justified by their explanatory power regarding several aspects of complex systems.

Keywords: Complexity, Information, Information integration, Selective and structural manifestations of information, State of a system, Formalism for information and its integration.
1. INTRODUCTION
Complexity is more complex than it was described by Warren Weaver in his very influential early study of the concept and its role in scientific inquiry [1].
Weaver distinguished three levels of complexity. Simple systems, devoid of complexity, involve a small number of variables easily separable in analysis.
Systems of disorganized complexity involve a numerically intimidating number of variables, but because of the limited interaction of the components, they may be successfully analyzed using statistical methods. An example here is a gas consisting of so large a number of molecules that tracking their individual states is impossible, but whose collective behavior can be easily analyzed in terms of macro variables.
Finally, systems of organized complexity reflect "the essential feature of organization" of a big number of components, and escape statistical analysis. In systems of this type the components interact in an organized way, making statistical analysis ineffective.
In spite of its importance for the initiation of the study of complexity, Weaver's categorization has several deficiencies. The simplest is in not taking into account systems involving a small number of variables but exhibiting some form of complexity. An example can be a pair of entangled elementary particles. A crystal gives us another example, which can be used as an argument against the merit of Weaver's categorization. The system is not much different from a gas in the number of components, and its organization is of a much higher level, yet it can be well described and analyzed in terms of symmetry. In this case, the components (atoms or molecules) interact in a highly organized manner, and it is this organized interaction which allows analysis.
A more serious deficiency is in the conceptual ambiguity of Weaver's approach. This objection can also be applied to the majority of other attempts to define, describe and analyze complexity that refer to variables, systems, organization, interaction, or causation. Even the assumption that a big number of components is a prerequisite of complexity can be objected to. It is easy to predict the preferences of a big number of customers in a shop, while it is virtually impossible for a single customer.
Finally, Weaver's study of complexity does not address the issues of hierarchical levels of organization and of the complexity arising in the mutual relationship of the levels. This type of complexity can be found in the study of life, which in turn is a paradigmatic object of the study of complexity.
The first part of this paper will be devoted to a critical review of the concepts used to characterize complexity. It will be followed by a choice of the conceptual framework for this study, based on methodological reflection of the author from his earlier publications [2]. Finally, a more comprehensive approach will be proposed for the study of complexity, together with its justification.
2. UNDERSTANDING COMPLEXITY
Complexity is an abstraction derived from the adjective "complex", and therefore there is a natural question about the term it qualifies. Here is the first source of ambiguity. An obvious answer is that complexity characterizes a system, but typically there is no explanation of the meaning of this concept.
It is a wild-card term, even in the context of highly formalized disciplines such as mechanics. The same term is used when the system under consideration has an epistemic status and is simply that which is considered, and when it is used in the ontological meaning of an independently existing and objectively distinguished object of study.
In physics, there is frequently an additional qualification of a closed or isolated system. This indicates an idealization of complete independence of the system from external influence; however, typically there is no explanation of topological expressions such as "external" or "closed." Moreover, an isolated system can be under the influence of controlled external forces, or can fill all of space, for instance in the case of a force field. It can be a vacuum in a specified region of space. In any case, a physical system is capable of assuming many states, at least potentially, as otherwise it could not be a subject of study.
Systems are studied not only in physics, and it is not only in physics that they are so enigmatic. The use of the term "system" always presupposes some form of identity and some level of potential or actual multiplicity, of either membership type or part-whole type, or in the form of the variability of qualitative or quantitative
characteristics. But this does not go much beyond an exclusive characterization in terms of two opposite, very general concepts: unity (identity, whole) and potential or actual multiplicity.
Weaver's categorization of the levels of complexity makes an important distinction between a simple increase in the multiplicity characterizing a system and the interdependence of the elements of this multiplicity [1].
This interdependence is described in a temporal fashion as interaction. But again, what does "interaction" mean? In physics, interaction is through the action of forces, and the presence of a force simply means a change of state. A mechanical system does not change its state in the absence of forces. If observed in a reference frame associated with a system influenced by some forces, the observed system will exhibit a virtual change of state due to pseudo-forces.
At this point, we should be aware of a possible terminological confusion in the use of the term state, which is frequently used in a different meaning, synonymous with coordinate configuration.
An important characteristic of physical interactions, introduced already by Isaac Newton as the Third Principle of Mechanics, is that they are always mutual and symmetric. If one object is acting on the other with some force, then the other is acting on the first one with a force of the same magnitude, but in the opposite direction. This is in strong contrast to the human perception of one-way actions and has important consequences for the underlying concept of causality.
At the level of mechanical interactions, the dynamics of systems are symmetric with respect to the inversion of the time coordinate, which makes the concept of a causal relationship between a cause and its effect inadequate. However, in compound systems with a sufficiently large number of components, the Second Law of Thermodynamics breaks the symmetry with respect to time inversion and the causal relationship becomes relevant.
Thus, from the point of view of mechanics, it is impossible to decide whether I am pushing the wall, causing its crumbling, or the wall is pushing me and crumbles due to my resistance. However, thermodynamic analysis shows that only my body can be a source of the energy necessary to destroy the wall. Thus, my body was the active side, and the wall the passive one. My action was the cause of the wall's destruction. It is the complexity of the system which allows for the transition from interaction to causal relationship.
Physical interaction through forces is associated with a change of the state of a system. However, the concept of a state is not clear either. Uri Abraham observed: "Considering its central place, it is surprising that the general notion state has received so little attention." [3] Even in physics, where the term "state" is one of the most frequently used, its meaning, or rather the interpretation of the mathematical concepts from the formalisms used by physical theories associated with this term, is sometimes ambiguous, in particular in classical mechanics.
Textbook explanations of the state of a system consisting of n point masses (particles) identify the state with a point in a 6n-dimensional space (phase space), with dimensions representing the three possible spatial coordinates of each particle and its three possible momentum coordinates. Of course the values of the coordinates depend on the choice of the observer or reference frame, but this dependence of the state on coordinatization can be avoided when we consider the state to be a vector in a vector space.
A bigger problem is that this vector changes in time even when there are no forces acting on the system, for instance when we have a single free particle moving along a straight line. A change of position without any change of momentum is a matter of the choice of reference frame and of the identification of the system in this frame, not of a change of the state. Thus, the interpretation of what in the mechanical formalism constitutes a description of the mechanical state is different from the question of what we need to make predictions regarding the future of the system.
When we consider more general systems, a state of the system can be identified with a selection of accidental properties of the system. It is necessary to make a distinction between the properties which identify the system (usually we would say object) and can be considered essential properties, i.e. those which are necessary for the existence of the object, and accidental or variable properties, which are involved in interactions. Using this approach, it would be appropriate to distinguish in mechanical systems the spatial description (position vector) as a description of the system's identity, and the vector of momentum as a description of its state.
In the general case, the properties can have a qualitative form or can be associated with a quantitative representation. Interaction between systems can here also be understood as a change of state. Of course there is a question under what conditions changes of the states of systems can be understood as interaction. Simultaneity of the changes can be coincidental, so it is not a reliable criterion. If we can identify physical interactions of the components of the systems, the answer is easy. It is more difficult to answer the question in a more general context. An attempt to answer this question will be given later, in the proposed conceptual framework for the study of complexity.
One more concept used by Weaver in his original study of complexity should be considered: that of organization. Certainly, organization is associated with some structural characteristics. Actually, Weaver uses the adjective "organized" to qualify complexity.
Disorganized complexity means that the components can be considered in separation, and the only problem in the analysis of a complex system of this type is their large number. We could observe in the example of the crystal mentioned above that the large number of components may not be a problem if the structure which describes the organization, in this case the group of transformations corresponding to the symmetries of the crystal, has a simple description. Thus, complexity becomes a source of problems when the structural characteristics of the system are highly involved.
3. METHODS TO RESOLVE PROBLEMS OF COMPLEXITY
We can trace attempts to overcome the difficulties arising in handling complexity even in the very remote past of humanity. Two early examples can be found in the use of numbers and in language. Humans have a very limited capacity for the direct comprehension of the number of objects. The classical article of George Miller, "The Magical Number Seven, Plus or Minus Two", sets the limit surprisingly low [4]. It tells us that in this
Proceedings of International Conference on Complexity, Cybernetics, and Informing Science and Engineering
respect we are not much better than other animals, for instance ravens or parrots [5, 6].
To avoid problems in dealing with larger numbers of objects, various numerical systems were introduced which represented groups of several objects by a single, possibly compound, symbol. The next step was the positional numerical system, which allows the construction of a numeral representing an arbitrary number from a fixed set of fundamental symbols (digits). The positional numerical system has an additional advantage: the arithmetical operations representing compounding operations on sets (addition corresponding to the disjoint sum and multiplication corresponding to the direct product of sets) can be performed as a sequence of elementary operations involving the manipulation of single-digit numerals only. Thus, a process of a high level of complexity could be decomposed into elementary steps involving a relatively small number of symbolic manipulations whose outcomes could easily be memorized. This decomposition was the original form of an algorithm.
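The reduction of addition to single-digit manipulations with a carry can be sketched in a few lines of code (an illustrative fragment; the function name and digit representation are our own, not part of the original discussion):

```python
def add_positional(a_digits, b_digits, base=10):
    """Add two numbers given as digit lists (least significant first),
    using only single-digit additions and a carry, as in the classical
    paper-and-pencil algorithm."""
    result, carry = [], 0
    for i in range(max(len(a_digits), len(b_digits))):
        d = carry
        if i < len(a_digits):
            d += a_digits[i]
        if i < len(b_digits):
            d += b_digits[i]
        result.append(d % base)   # single-digit outcome, easily memorized
        carry = d // base         # carry passed to the next position
    if carry:
        result.append(carry)
    return result

# 478 + 256 = 734, with digits stored least-significant first
print(add_positional([8, 7, 4], [6, 5, 2]))  # [4, 3, 7]
```

Each step of the loop touches only single-digit numerals, which is exactly the decomposition of a complex process into elementary symbolic manipulations described above.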
Alan Turing detached this process of the manipulation of symbols from the human brain in his model of an a-machine [7]. Moreover, he showed that it is possible to describe a universal machine which can perform any algorithmic process when that process is appropriately described on the input tape. The universal character of this type of Turing machine had great practical consequences, but it was also a bridge to another ancient method of dealing with complexity.
This other method of overcoming complexity has much more remote sources, in the use of language preceding the introduction of numerical systems. Here too, relatively simple words representing complex systems were constructed from a small number of units, first sounds, later written characters. We can observe that, probably without any intentional intervention, there was some correlation between the complexity of the meaning of words and that of their linguistic representations. Simple concepts, whose word representations were frequently used, acquired simple, short forms.
Greek Antiquity brought the recognition of another type of relationship among words, reflecting the structure of comprehended reality. Aristotle used differences in the generality of universals, i.e. terms representing classes of objects, to develop a structure which allowed the creation of new concepts out of those defined earlier. Moreover, he provided a system of syllogistic which allowed the derivation of sentences expressing relationships between the scopes of terms from other sentences (premises).
This first form of logic, syllogistic, required a specific form of sentences. The Stoics developed a system of logical consequences based on the analysis of the connectives used to combine simple sentences into compound ones. Much later, predicate logic was developed, combining both earlier logical systems.
Deduction was a tool to reduce complexity. Originally, the Greeks believed that the truth of a complicated statement can be shown by deriving it from a set of axioms so simple and so obviously true that nobody would question them. Today we know that no axioms are obviously true, and that their selection is either arbitrary or dictated by induction from the results of an empirical process. However, we can still agree that axiomatic systems reduce complexity.
Aristotle created the first axiomatic system for his syllogistic, but it was Euclid's formulation of geometry in this way in the "Elements" which became a paradigm of the method. It is interesting that the Greeks had their own "Turing machine" in their studies of geometry. The logical process was frequently represented by a sequence of operations made with the use of the ruler (straightedge) and compass. Almost two thousand years later, Rene Descartes, in his "La Géométrie", published in 1637 as an appendix to the "Discourse on the Method", showed that these geometric constructions are equivalent to operations on numbers when geometry is formulated in analytical form. But the relationship was with real numbers, not integers.
The revolution of Turing's approach consisted in the realization that all algorithmic processes, including logical proofs, can be modeled with a very simple mechanism which can be interpreted as a device working with natural numbers.
Moreover, several different measures of complexity were introduced with the use of the concept of computation, although only after Turing's death. From the present perspective they seem quite obvious, but their invention was a great achievement. For each process carried out by a universal Turing machine we can count the number of steps leading to the result, the minimal length of the program producing the outcome, or the minimal size of the tape (memory) necessary for its implementation. Each gives us an evaluation of some quantitative aspect of complexity.
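The first and third of these measures, steps taken and tape cells used, can be illustrated with a toy simulator (a hypothetical sketch; the machine encoding and function names are ours, and the size of the transition table serves only as a crude stand-in for program length):

```python
def run_tm(transitions, tape, state="q0", halt="halt"):
    """Run a one-tape Turing machine and report three quantities:
    steps executed (time), cells visited (space), and the size of the
    transition table (a rough proxy for program length)."""
    tape = dict(enumerate(tape))          # sparse tape, blank = "_"
    head, steps, visited = 0, 0, {0}
    while state != halt:
        symbol = tape.get(head, "_")
        write, move, state = transitions[(state, symbol)]
        tape[head] = write
        head += {"R": 1, "L": -1}[move]
        visited.add(head)
        steps += 1
    return steps, len(visited), len(transitions)

# Append a single '1' to a unary string: move right to the first blank.
trans = {
    ("q0", "1"): ("1", "R", "q0"),    # skip over existing 1s
    ("q0", "_"): ("1", "R", "halt"),  # write the new 1 and halt
}
print(run_tm(trans, list("111")))  # (4, 5, 2)
```

On the input "111" the machine takes 4 steps and visits 5 cells; longer inputs cost proportionally more, which is the quantitative aspect of complexity the measures capture.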
The fact that we can measure something is frequently dangerous for the development of our understanding of the subject of study, as it creates an illusion that our knowledge is complete. What do we know about complexity as a consequence of the availability of these measures? Do they help us understand what "complexity" means?
4. CONCEPT OF INFORMATION
Information is an apparently equally enigmatic concept. However, a suitable definition of information can reduce the study of complexity to that of information. This is not a surprise: even without deeper reflection on the two concepts, it is quite convincing that complexity is a characteristic of information, or of information systems. Moreover, the second of the measures of complexity mentioned above is considered a variant of a measure of information. Certainly, for this purpose information has to be defined in a very general way.
The concept of information, introduced and studied by the author in his earlier publications [8], is understood as an identification of a variety, which presupposes only the categorical opposition of one and many, and nothing else. The variety in this definition, corresponding to the "many" side of the opposition, is a carrier of information. Its identification is understood as anything which makes it one, i.e. which moves it into or towards the other side of the opposition. The preferred word "identification" (rather than the simpler, but possibly misleading, word "unity") indicates that information gives an identity to a variety. However, this identity is considered an expression of unity or "oneness". We could interpret this formulation of the concept of information as a resolution of the one-many opposition.
There are two basic forms of identification. One consists in the selection of one out of the many in the variety (possibly with a limited degree of determination, described for instance by a probability), the other in a structure binding the many into one (with a variable degree of binding, reflected in the decomposability of the structure). This brings two manifestations of information,
the selective and the structural. The two possibilities do not divide information into two types, as the occurrence of one is always accompanied by the other, although not on the same variety, i.e. not on the same information carrier.
It is easy to recognize in the selective aspect of information the subject of Shannon's information theory, in which the probability distribution of the selection is used to define the measure of information in terms of entropy [9]. It is usually overlooked that in this approach it is the probability distribution of the choice which describes information; its entropy is only a secondary concept which characterizes the distribution. It is naïve to expect that the concept of information appears out of nothing between the probability distribution and the calculated value of entropy.
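The secondary character of entropy can be seen directly in the computation: the distribution is the input, the entropy merely a derived number (a minimal illustrative sketch; the function name is ours):

```python
from math import log2

def entropy(dist):
    """Shannon entropy (in bits) of a probability distribution.
    The distribution itself is what describes the selective
    information; entropy only summarizes it in one number."""
    return -sum(p * log2(p) for p in dist if p > 0)

print(entropy([0.5, 0.5]))   # 1.0 bit: one fair binary choice
print(entropy([1.0]))        # 0.0: a fully determined selection
print(entropy([0.25] * 4))   # 2.0 bits: four equally likely outcomes
```

Many different distributions share the same entropy value, which is one way to see that the number cannot stand in for the information itself.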
The structural aspect of information can be identified in the attempts to link the concept of information with topological or geometric structures defined on the objects functioning as carriers of information.
For a concept of information formulated in so general a way, the formalism has to be equally general [10]. The concept of information requires a variety, which here is understood as an arbitrary set S (called a carrier of information). An information system is this set S equipped with a family ℑ of subsets satisfying the conditions: S belongs to ℑ, and the intersection of every subfamily of ℑ belongs to ℑ; i.e. ℑ is a Moore family.
Information itself is a distinction of a subset ℑ0 of ℑ which is closed with respect to (pairwise) intersection and such that, with each subset belonging to ℑ0, all subsets of S including it belong to ℑ0 (i.e., in mathematical terminology, ℑ0 is a filter).
This description can be translated into a simpler explanation. The Moore family ℑ represents a variety of structures (e.g. geometric, topological, algebraic, etc.) of a particular type which can be defined on the subsets of S. This corresponds to the structural manifestation of information. The filter ℑ0, in turn, serves identification, i.e. the selection of an element within the family ℑ and, under some conditions, in the set S.
For instance, in the context of Shannon's selective information based on the probability distribution of the choice of an element of S, ℑ0 consists of those subsets of S which have probability measure 1, while ℑ is simply the set of all subsets of S. This approach clearly combines both manifestations of information, selective and structural.
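For very small carriers, the defining conditions of a Moore family can be checked mechanically (an illustrative sketch with hypothetical names; the brute-force check over all subfamilies is feasible only for small families):

```python
from itertools import combinations

def is_moore_family(S, F):
    """Check the defining conditions of a Moore family on S: the whole
    set S belongs to F, and the intersection of every non-empty
    subfamily of F again belongs to F (checked exhaustively)."""
    F = [frozenset(x) for x in F]
    if frozenset(S) not in F:
        return False
    for r in range(1, len(F) + 1):
        for sub in combinations(F, r):
            if frozenset(S).intersection(*sub) not in F:
                return False
    return True

S = {1, 2, 3}
chain = [set(), {1}, {1, 2}, {1, 2, 3}]     # a chain is intersection-closed
print(is_moore_family(S, chain))             # True
print(is_moore_family(S, [{1}, {2}, S]))     # False: {1} ∩ {2} = ∅ is missing
```

The empty intersection convention (intersection over no sets equals S) is covered by the requirement that S itself belongs to the family.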
Since every Moore family ℑ of subsets of a set S corresponds to the family of closed subsets of some closure operator defined on S, each information system can be characterized in terms of an algebraic structure £ introduced on the family ℑ, called a complete lattice. This structure is a generalization of the concept of a Boolean algebra, and at the same time it assumes the role of a generalized logic, going beyond its special instance of the traditional logic of linguistic information systems [11].
The analysis of the logic £ of an information system gives us a description of the level of information integration. Direct-product reducibility, or factorizability (i.e. decomposability into a product of simpler component structures), of this lattice can be used as a characterization of the level of information integration.
If the logic, i.e. the lattice representing it in algebraic form, cannot be decomposed into a product, we have completely integrated information. If it can be decomposed into trivially indecomposable two-element structures (the case of a Boolean algebra), it is not integrated at all. Between these two extreme possibilities we have a wide range of partially decomposable logics.
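The Boolean extreme can be made concrete: the powerset of a three-element set, ordered by inclusion, decomposes as the direct product of three two-element lattices, since inclusion coincides with the componentwise order on 0/1 indicator vectors (an illustrative sketch; the names are ours):

```python
from itertools import product

def subset_to_bits(subset, universe):
    """The standard bijection between subsets and indicator vectors."""
    return tuple(int(x in subset) for x in universe)

universe = ["a", "b", "c"]
subsets = [set(), {"a"}, {"b"}, {"c"},
           {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]

# Inclusion of subsets coincides with the componentwise order on bit
# vectors, i.e. with the product order on {0,1} x {0,1} x {0,1}.
for A, B in product(subsets, repeat=2):
    inclusion = A <= B
    pointwise = all(x <= y for x, y in
                    zip(subset_to_bits(A, universe),
                        subset_to_bits(B, universe)))
    assert inclusion == pointwise
print("powerset lattice on 3 atoms = product of three 2-element lattices")
```

A lattice admitting no such product factorization, by contrast, would represent completely integrated information in the sense described above.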
Moreover, it is possible to describe a mathematical model of a theoretical device (gate) integrating information, and therefore the verb form of the term "integration" is fully justified [12].
Finally, in the formalism for information in terms of closure operators, a symbolic representation can be defined as a mapping from one information system to another which preserves the structural characteristics of information [13]. Thus, instead of the difficult-to-comprehend relationship between objects of different ontological status (a symbol and its denotation), we can think of a symbol as an image under a function which assigns to elements of one information system elements of the other.
5. COMPLEXITY AND INFORMATION
In the earlier part of this paper, complex systems were characterized in terms of identity and multiplicity. This opens a connection between complexity and information understood as an identification of a variety. Complex systems are simply information systems with a high level of complexity, which can be understood in terms of selective or structural information.
Systems whose complexity Weaver described as disorganized can be associated with the former manifestation, with a high value of the measure of information; those with organized complexity, with the latter, with a high level of integration. It can easily be observed that we can now include complex systems with a small number of components, as a high level of information integration is not necessarily related to the number of components.
In the present conceptual framework, we can consider many different types of complexity related to information in geometric, topological or other structures. Also, the study of hierarchic information systems, such as systems describing life, gives us a new perspective on complexity involving multiple levels of organization [14].
6. WHY INFORMATION?
There is a legitimate question regarding the choice of the concept of information, as defined by the author, as a framework for the study of complexity. What are the reasons? What are the advantages of this approach?
Some reasons were already given above. Complexity was associated with information already in its earlier studies, for instance when the quantitative characterizations of complexity were formulated in terms of algorithms or computation. But this association, although important for practical reasons, does not help in understanding what complexity is, or how to overcome the limitations imposed by it.
Thus, it is not so important that complexity can be associated with information, but that complexity can be described using structural characteristics of information and its integration.
As stated above, the level of information integration in the formalism proposed by the author is essentially the level of decomposability of an algebraic structure identified as the logic of information. Decomposability of algebraic or, more generally,
mathematical structures is probably among the most important subjects of all mathematical theories, and an immense amount of work has been done in mathematics on it, especially in relation to the task of classifying the simple, indecomposable structures.
It may seem an irony of the history of human endeavor, but it should not actually be a surprise, that the issue of the decomposability of mathematical structures into a small number of simple structures has its source in the attempt to reduce the complexity of the study of these structures.
Typically, the paradigm of this study is explained using the not very well fitting example of prime numbers. Every natural number greater than one can be written in a unique way as a product (here simply the result of multiplication) of prime numbers (numbers greater than one which are divisible only by one and themselves) or their powers. Thus, prime numbers serve as indecomposable, simple components. Very often, proofs of theorems in number theory are carried out first for prime numbers, and only then extended to all numbers built from them; or it is known that some theorems can easily be proven for all numbers if only we can find a proof for prime numbers.
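The paradigm can be stated in code: a trial-division sketch recovering the unique multiset of prime factors (illustrative only; trial division is far from the state of the art for large numbers):

```python
def prime_factorization(n):
    """Decompose n > 1 into its unique multiset of prime factors by
    trial division. Simple, but computationally costly for the large
    numbers arising in practice, as the text goes on to note."""
    factors, d = [], 2
    while d * d <= n:
        while n % d == 0:    # divide out each prime as often as possible
            factors.append(d)
            n //= d
        d += 1
    if n > 1:                # whatever remains is itself prime
        factors.append(n)
    return factors

print(prime_factorization(360))  # [2, 2, 2, 3, 3, 5], i.e. 2^3 * 3^2 * 5
```

The returned list is the decomposition into indecomposable, simple components on which proofs for all numbers can be built.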
The example is not perfect because, unfortunately, there are as many prime numbers as there are natural numbers (a questionable reduction of complexity), the long-standing problem of finding an efficient algorithm generating the n-th prime number has not been solved, and the algorithms for decomposing natural numbers into products of primes are notorious for their high computational complexity.
However, we have another example, one of the greatest mathematical achievements: the classification of all finite simple groups, completed in 2008 with the publication of the work of about one hundred authors carried out over the second half of the 20th century.
For our purpose of studying complexity in terms of information integration, we can focus on the decomposability not of all mathematical or algebraic structures, but of complete lattices, as they serve as the structures characterizing every logic of information. This study also has a very extensive literature, from a similar period of time as in the case of finite groups, although, probably due to the much smaller interest in the domain of lattices, the results are less advanced.
More important than the already existing results are the highly developed methods of inquiry, which can easily be adapted for the purpose of the study of information.
Another advantage of this formalism is the possibility of considering the semantics of information, programmatically neglected in the orthodox, quantitative analysis of information in terms of entropy, or more exactly in the analysis of information transmission.
As stated in earlier sections, some of the most effective methods of dealing with complexity consisted in the use of symbolic representation and the manipulation of symbols, i.e. the use of abstraction, or the algorithmic decomposition of complex processes into simple manipulations of symbols in computation or logical reasoning. But the concept of meaning has for centuries created more philosophical problems than solutions.
Meaning as a mapping between the objects and states of reality and some structure of symbols (for instance a language) is a quite obvious idea, explored in many works. However, in the earlier attempts there was nothing to explain which of the large variety of possible mappings should be associated with meaning, and how the choice of a function generates the mechanism of meaning.
The completely different status of the elements of the domain of the function and of the set of its values made the choice impossible without involving external and arbitrary concepts, such as, for instance, a mind with distinctive mental characteristics (Brentano) or a more abstract interpreter (Peirce).
In the present approach, meaning is defined by functions which preserve the closure structure of the two information systems. In mathematical language, we can say that such a function is continuous with respect to the respective closure operators.
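The continuity condition, f(cl(A)) ⊆ cl(f(A)) for every subset A, can be checked exhaustively on a toy example (a hypothetical sketch; the carriers, closed-set families, and maps below are illustrative only):

```python
from itertools import chain, combinations

def closure(A, closed_sets, S):
    """Closure of A: the intersection of all closed sets containing A."""
    sup = [frozenset(C) for C in closed_sets if A <= set(C)]
    return frozenset(S) if not sup else frozenset.intersection(*sup)

def is_continuous(f, S1, closed1, S2, closed2):
    """f is continuous w.r.t. the two closure operators when
    f(cl(A)) is contained in cl(f(A)) for every subset A of S1."""
    subsets = chain.from_iterable(
        combinations(sorted(S1), r) for r in range(len(S1) + 1))
    for A in map(set, subsets):
        image_of_closure = {f[x] for x in closure(A, closed1, S1)}
        closure_of_image = closure({f[x] for x in A}, closed2, S2)
        if not image_of_closure <= closure_of_image:
            return False
    return True

S1, closed1 = {1, 2}, [set(), {1}, {1, 2}]
S2, closed2 = {"a", "b"}, [set(), {"a"}, {"a", "b"}]
f = {1: "a", 2: "b"}   # matches the closure structure of both systems
g = {1: "b", 2: "a"}   # swaps the elements, breaking that structure
print(is_continuous(f, S1, closed1, S2, closed2))  # True
print(is_continuous(g, S1, closed1, S2, closed2))  # False
```

Only the structure-preserving map qualifies as a candidate for meaning in this sense; the swap fails because the image of cl({2}) escapes the closure of the image of {2}.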
Someone could object: "What? Is a cow an information system when I use the word 'cow'?" Of course! Every cow is a much more sophisticated information system than a mainframe computer. Try to milk one. This is not necessarily a joke, considering for instance the immunological information in milk, and taking into account that every existing computer processes information at only two levels of the hierarchical structure of information dynamics, while life requires a multi-level hierarchy [14]. Using more serious terminology to defend against the objection of frivolity: all objects of our experience are information systems or their elements. This does not require the adoption of John A. Wheeler's "it from bit" philosophy of information, which gives this concept primary ontological status. The definition of information used here is neutral with respect to ontological issues.
7. WHY COMPLEXITY?
We have to consider another legitimate question: "Why complexity?" What is the reason for extending the quantitative study of algorithmic or computational complexity to a structural study involving information? Do we need more than what we knew about complexity before?
There are many reasons, going beyond the most obvious one: that the quantitative methods apply to very special instances of complexity and do not give us any answer to the question "What is complexity?" It is a very typical illusion that the assignment of numbers is a measurement of something, and that this "something" is an existing entity [15]. But is this question just an expression of curiosity?
Complexity has recently become the main obstacle to progress in many disciplines. Probably the clearest is the recognition of the problem in the study of the foundations of life [2]. But other disciplines are exposed to similar difficulties.
Programs to solve the mystery of consciousness are blocked by the extreme complexity of the human brain. Some research centers are trying to reduce complexity by limiting their ambitions to mapping the brain of the mouse, which has "only" about 70 million neurons, instead of that of the human, with about a thousand times as many. Even this is an expression of extreme optimism, as the similar task of explaining the functioning of the brain of the roundworm Caenorhabditis elegans, with its 302 neurons and about 8,000 synapses, has turned out to be extremely difficult.
The human brain is very likely the most complex system in the Universe, so perhaps we should not start from this end. However, issues of complexity pop up everywhere. The fact that we know algorithms to solve some relatively simple problems is very often of no practical relevance, since the algorithms require an execution time exceeding the age of the Universe.
Probably the most important challenge for the human intellect at present is finding a way to curb complexity; without knowing what complexity is and what its structural characteristics are, the task would be hopeless.
Acknowledgements. The author would like to express his gratitude for the valuable comments and suggestions to improve this paper received from Gordana Dodig-Crnkovic, Plamen L. Simeonov, and other, anonymous reviewers.
8. REFERENCES
[1] W. Weaver, "Science and Complexity", American Scientist, Vol. 36, No. 4, 1948, pp. 536-544.
[2] M. J. Schroeder, "The Role of Information Integration in Demystification of Holistic Methodology", in P. L. Simeonov, L. S. Smith, A. C. Ehresmann (Eds.), Integral Biomathics: Tracing the Road to Reality, Berlin: Springer, 2012, pp. 283-296.
[3] U. Abraham, "What is a State of a System? (an outline)", in M. Droste, Y. Gurevich (Eds.), Semantics of Programming Languages and Model Theory, Algebra, Logic and Applications Series, Vol. 5, Newark, NJ: Gordon and Breach, 1993, pp. 213-244.
[4] G. Miller, "The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for Processing Information", Psychological Review, Vol. 63, 1956, pp. 81-97 (reprinted in Vol. 101, No. 2, pp. 343-352).
[5] O. Koehler, "The Ability of Birds to 'Count'", in J. R. Newman (Ed.), The World of Mathematics, New York, NY: Simon and Schuster, 1956, pp. 489-496.
[6] H. Davis, J. Memmott, "Counting behavior in animals: A critical evaluation", Psychological Bulletin, Vol. 92, 1982, pp. 547-571.
[7] A. M. Turing, "On computable numbers, with an application to the Entscheidungsproblem", Proc. London Math. Soc., Ser. 2, Vol. 42, 1936, pp. 230-265.
[8] M. J. Schroeder, "Philosophical Foundations for the Concept of Information: Selective and Structural Information", in Proceedings of the Third International Conference on the Foundations of Information Science, Paris, 2005, http://www.mdpi.org/fis2005.
[9] C. E. Shannon, "A mathematical theory of communication", Bell Sys. Tech. J., Vol. 27, 1948, pp. 379-423; 623-656.
[10] M. J. Schroeder, "From Philosophy to Theory of Information", Intl. J. Information Theor. and Appl., Vol. 18, No. 1, 2011, pp. 56-68.
[11] M. J. Schroeder, "Search for Syllogistic Structure of Semantic Information", J. of Applied Non-Classical Logic, Vol. 22, 2012, pp. 101-127.
[12] M. J. Schroeder, "Quantum Coherence without Quantum Mechanics in Modeling the Unity of Consciousness", in P. Bruza et al. (Eds.), QI 2009, LNAI 5494, Berlin: Springer, 2009, pp. 97-112.
[13] M. J. Schroeder, "Semantics of Information: Meaning and Truth as Relationships between Information Carriers", in C. Ess, R. Hagengruber (Eds.), The Computational Turn: Past, Presents, Futures? Proceedings IACAP 2011, Aarhus University, July 4-6, 2011, Münster: Monsenstein und Vannerdat Wiss., 2011, pp. 120-123.
[14] M. J. Schroeder, "Dualism of Selective and Structural Manifestations of Information in Modelling of Information Dynamics", in G. Dodig-Crnkovic, R. Giovagnoli (Eds.), Computing Nature, SAPERE 7, Berlin: Springer, 2013, pp. 125-137.
[15] M. J. Schroeder, "Crisis in science: In search for new theoretical foundations", Progress in Biophysics and Molecular Biology, 2013, http://dx.doi.org/10.1016/j.pbiomolbio.2013.03.003.
Studying the Effects of Instance Structure in Algorithm Performance
Tania TURRUBIATES-LOPEZ
Computer Systems Engineering, Instituto Tecnologico Superior de Alamo Temapache, Alamo, Veracruz 92750, Mexico
Satu Elisa SCHAEFFER
Faculty of Mechanical and Electrical Engineering, Universidad Autonoma de Nuevo Leon, San Nicolas de los Garza, Nuevo Leon 66450, Mexico
ABSTRACT
Classical computational complexity studies the asymptotic relationship between instance size and the amount of resources consumed in the worst case. However, it has become evident that the instance size by itself is an insufficient measure and that the worst-case scenario is often uninformative in practice. As a complementary analysis, we propose the examination of structural properties present in the instances and the effects they have on algorithm performance; our goal is to characterize complexity in terms of instance structure. We propose a framework for identifying and characterizing hard instances based on algorithm behaviour, as well as a case study applying the framework to the graph coloring problem.
Keywords. Algorithm performance, computational complexity, instance difficulty, structural effects.
1. INTRODUCTION
It is intuitive that the difficulty of a problem instance varies with its size: large instances are usually harder to solve than small ones. However, in practice, it is becoming recognized that measuring complexity only in terms of the instance size implies overlooking any structural property of the instance that could affect the problem complexity [5]. Individual problem instances can be inherently hard, independent of any particular algorithm used to solve the problem [12].
In this work, we attempt a move towards a practical theory of structural computational complexity for graph problems that permits characterizing the inherent difficulty of instances of equal size but different structure. To achieve this, we propose a framework for detecting structural properties and their influence on algorithm performance. We focus on graph optimization problems and iterative (typically heuristic) algorithms, and propose a measure of algorithm performance to classify instances as easy or hard in terms of the algorithm behaviour when working towards the optimum. It is important to emphasize that we attempt neither to rank nor to compare one algorithm to another in any sense, but instead seek to rank and to compare the problem instances themselves in terms of the difficulty that their solution presents to a set of algorithms.
The most successful strategies to solve this problem employ knowledge of the search space.¹ Several efforts have been made to characterize the search space in order to extract information for designing more efficient algorithms, choosing the best algorithms or heuristics, and understanding the behavior of the algorithms. These studies indicate that there exists a relation between the topology of the search space and the graph structure [8, 13], and consequently an important
¹ The search space is the set of all possible solutions given a formulation of the objective functions and a set of constraints.
impact on algorithm performance [15]. Much work remains to be done to understand how algorithm performance depends on the characteristics and structure of the graph instance; the existing literature points out that there is in fact an important influence, but with few experimental results to validate it.
The remainder of this paper is organized as follows. Section 2 describes the proposed framework to study the effects of instance structure on algorithm performance, as well as the proposed measure of performance. In Section 3, the proposed framework is applied to graph coloring. Section 4 presents an analysis of the results obtained. Finally, future research directions and conclusions are drawn in Section 5.
2. FRAMEWORK TO STUDY THE EFFECTS OF INSTANCE STRUCTURE ON ALGORITHM PERFORMANCE
We propose a framework for identifying the structural properties of instances that affect algorithmic performance. We attempt to define the necessary steps, summarized in the following subsections, together with important considerations on how each should be carried out.
Problem selection
The selection of the computational problem fixes the information required in each instance and also the set of algorithms available. It is recommended to choose a thoroughly studied problem for which theoretical and practical results are abundant and benchmark instances exist. Also, studying a problem to which many other problems are reducible will be informative, as the results extend, at least to some degree, to the other problems that reduce to the selected object of study.
Instance collection
Importantly, we must be able to control, or at least measure, the size of the instances as well as the structural properties present in them, in order to be able to distinguish between the effect of instance size (the traditional complexity measure) and the structure of the instance (the object of our proposed study). We implemented the following five popular graph-generation models: Erdos-Renyi (ER), Watts-Strogatz (WS), Barabasi-Albert (BA), Kleinberg (KL), and the geometric random graph (RGG). An important consideration in applying the framework is that the resulting graph needs to be connected.
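As an illustration of this step, the following sketch generates an Erdos-Renyi instance and resamples until it is connected (hypothetical pure-Python code; the paper's actual generators and parameter choices are not specified here):

```python
import random
from collections import deque

def erdos_renyi(n, p, rng):
    """G(n, p): each of the n*(n-1)/2 possible edges is present
    independently with probability p; returned as an adjacency dict."""
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj

def is_connected(adj):
    """Breadth-first search from vertex 0 must reach every vertex."""
    seen, queue = {0}, deque([0])
    while queue:
        for w in adj[queue.popleft()]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen) == len(adj)

# Resample until the generated instance is connected, as the framework
# requires; for p well above the connectivity threshold ln(n)/n this
# loop terminates quickly.
rng = random.Random(42)
while True:
    g = erdos_renyi(50, 0.2, rng)
    if is_connected(g):
        break
print(len(g), is_connected(g))  # 50 True
```

The same resample-until-connected wrapper applies unchanged to the other generation models.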
Instance characterization
For each instance to be used in the study, a series of measurements needs to be made in order to identify the structural properties present in the instance. The minimal information usually gathered on graphs includes the degree measures, but degree-based information alone is insufficient for characterizing the graph structure; we need to employ a rich family of "structural metrics". See the work of da F. Costa et al. [4] for a survey and our previous work [14, 9] for more details.
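As a small illustration of going beyond degree measures, the following sketch computes degree statistics together with a local clustering coefficient, one member of the richer family of structural metrics (hypothetical code; the metrics actually used in the study are those surveyed in [4] and in our previous work [14, 9]):

```python
def degree_stats(adj):
    """Minimum, maximum, and mean degree of an adjacency dict."""
    degs = [len(adj[v]) for v in adj]
    return min(degs), max(degs), sum(degs) / len(degs)

def clustering_coefficient(adj, v):
    """Fraction of pairs of neighbours of v that are themselves
    adjacent -- structural information invisible to degree counts."""
    nbrs = list(adj[v])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if nbrs[j] in adj[nbrs[i]])
    return 2 * links / (k * (k - 1))

# A triangle with a pendant vertex: edges 0-1, 1-2, 2-0, 2-3
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(degree_stats(adj))               # (1, 3, 2.0)
print(clustering_coefficient(adj, 2))  # 1/3: only the pair (0, 1) is linked
```

Two graphs with identical degree sequences can differ sharply in such metrics, which is precisely why degree-based information alone is insufficient.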
Algorithm selection
As our goal is to detect the properties of the instances that make them difficult or easy, we should not limit ourselves to one single algorithm, as the particularities of the algorithm could overshadow the effects of the instance structure in any subsequent experiment. Hence we need a set of algorithms, preferably the state of the art and with different approaches to solving the problem at hand.
It is also important to remember that the algorithms themselves are not in any way the object of study, but rather the means. We do not aim to rank the algorithms from better to worse, nor do we wish to punish or reward the algorithms for their performance on a particular instance.
Measuring algorithm performance
Our goal is to be able to run two instances through a set of algorithms and then say for which instance the algorithms had a harder time reaching a good solution. This cannot be measured only in terms of solution time, nor only in terms of the resulting solution quality, as executing more iterations would usually improve the solution in any case.
We wish to measure how the algorithm converges towards its final solution, iteration by iteration. Does it stall, is there constant improvement, are there sudden jumps in the solution quality, and so forth?
In Figure 1, a single algorithm with a single parameter setup was used on several instances of the same size but different structure; even for
Proceedings of International Conference on Complexity, Cybernetics, and Informing Science and Engineering
Figure 1: Performance profiles (objective function value per iteration) for five instances (Instances 1–5) of the same size but different structure under a single algorithm.
those instances that have similar final values, the way in which the algorithm arrives at those values through the iterations differs.
We call the curve formed by plotting the value of the objective function at each iteration during an execution a performance profile of the algorithm on a particular instance. We settled on the area beneath the curve as a preliminary measure, easy to compute from the performance profile and easy to compare among instances.
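As a minimal sketch of the measure (the objective values below are made up for illustration), the area beneath a performance profile recorded at unit-spaced iterations can be computed with the trapezoid rule:

```python
def profile_area(values):
    """Trapezoid-rule area under a performance profile.

    `values` holds the objective function value recorded at each
    iteration (unit spacing); a smaller area means the algorithm
    descended towards good solutions faster.
    """
    return sum((a + b) / 2.0 for a, b in zip(values, values[1:]))

# Hypothetical profile: objective value per iteration for one run.
profile = [35, 28, 22, 20, 20]
area = profile_area(profile)  # 97.5
```

This makes the comparison concrete: of two runs that end at the same final value, the one that stalls at high objective values accumulates a larger area and is counted as having had a harder time.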
Experimental design
Now we need an experimental design that permits us to reach valid conclusions on the problem under study: what makes an instance hard? The goal of the experiment is to determine whether, and which, structural properties significantly affect the algorithm performance, which in turn is characterized by the area below the performance profile.
3. CASE STUDY
In this section we apply the proposed framework, along with the proposed performance measure, to a specific problem. We briefly discuss each of the six steps and the special considerations that were necessary in each, and then, in the following section, the results of the characterization.
Problem selection: Graph coloring
The selection of our particular computational problem, the graph coloring problem, is justified by the vast amount of literature on the topic; a great variety of algorithms have been proposed [6] and several theoretical studies [3] have been published over the past decades.
Instance generation and characterization
We seek to characterize the inherent difficulty of instances of equal size but different structure. This is an obstacle to the use of benchmark instances of the graph-coloring problem, as there are no sufficiently large sets of instances that fulfill these criteria. Hence we turn to generation models. In our first experiment, graphs of medium to high density were generated using the generation models, yielding a total of 1,800 instances in this first set. Also, as real-life networks tend to be of rather low density and of much higher order [11, 1], we generated a set of lower-density graphs for each model; this second set comprises a total of 1,050 instances.
The following data was recorded for each graph instance: graph order and size, degree distribution and degree dispersion coefficient, average path length, diameter, radius, global and local efficiency, and clustering coefficient. As most of these measures are in fact computed for each vertex, we recorded not only the distribution itself but also computed its average, minimum, maximum, range, kurtosis, asymmetry, variance, and other standard statistical descriptors.
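A sketch of this measurement step for one instance, using networkx for the distance- and clustering-based measures; defining the degree dispersion coefficient as the variance-to-mean ratio of the degrees is our assumption, as the paper does not spell it out:

```python
import statistics
import networkx as nx

def characterize(G):
    """Record a few of the structural measures for one graph instance."""
    degrees = [d for _, d in G.degree()]
    mean_deg = statistics.mean(degrees)
    return {
        "order": G.number_of_nodes(),
        "size": G.number_of_edges(),
        "degree_mean": mean_deg,
        "degree_stdev": statistics.pstdev(degrees),
        # assumed definition: variance-to-mean ratio of the degrees
        "degree_dispersion": statistics.pvariance(degrees) / mean_deg,
        "avg_path_length": nx.average_shortest_path_length(G),
        "diameter": nx.diameter(G),
        "radius": nx.radius(G),
        "global_efficiency": nx.global_efficiency(G),
        "local_efficiency": nx.local_efficiency(G),
        "clustering": nx.average_clustering(G),
    }

metrics = characterize(nx.petersen_graph())  # small 3-regular test graph
```

The per-vertex distributions (degrees, local clustering) can then be summarized with the statistical descriptors listed above (minimum, maximum, range, kurtosis, and so on) in the same fashion.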
Algorithm selection and performance evaluation
We selected algorithms that were considered in contemporary literature as state of the art for the k-coloring problem [2] — TabuCol, React-TabuCol, PartialCol, and React-PartialCol — all of them tabu searches [7]. These algorithms employ different formulations of the objective function and the solution neighborhood. This gives us reason to believe that observations on the difficulty of solving a particular instance are possible: if an instance is difficult under all formulations, then it is more likely the instance than the formulation that causes the difficulty. For each instance generated in the previous step, all four algorithms were executed in iterative fashion for each of the chosen values of k, with 30 replicas of each execution. For each profile we recorded the area beneath the curve, as well as the quadratic and exponential regression results. We studied the average, standard deviation, minimum, and maximum values over the 30 replicas.
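The actual TabuCol-family implementations are not reproduced here, but a minimal tabu-search sketch over the usual conflict-counting objective (the tenure and iteration limit are illustrative choices) shows where the per-iteration performance profile comes from:

```python
import random

def conflicts(adj, coloring):
    """Number of edges whose endpoints share a color."""
    return sum(1 for u in adj for v in adj[u]
               if u < v and coloring[u] == coloring[v])

def tabucol_sketch(adj, k, max_iters=500, tenure=7, seed=0):
    """Minimal tabu search for k-coloring; returns (coloring, profile)."""
    rng = random.Random(seed)
    coloring = {v: rng.randrange(k) for v in adj}
    tabu = {}      # (vertex, color) -> iteration at which the move is freed
    best_cost = conflicts(adj, coloring)
    profile = []   # objective value per iteration: the performance profile
    for it in range(max_iters):
        cost = conflicts(adj, coloring)
        profile.append(cost)
        if cost == 0:
            break
        moves = []
        # neighborhood: recolor a vertex currently involved in a conflict
        for v in (u for u in adj
                  if any(coloring[u] == coloring[w] for w in adj[u])):
            for c in range(k):
                if c == coloring[v]:
                    continue
                old = coloring[v]
                coloring[v] = c
                new_cost = conflicts(adj, coloring)
                coloring[v] = old
                # non-tabu moves, plus an aspiration criterion
                if tabu.get((v, c), 0) <= it or new_cost < best_cost:
                    moves.append((new_cost, v, c))
        if not moves:
            continue
        new_cost, v, c = min(moves)
        tabu[(v, coloring[v])] = it + tenure  # forbid undoing the move
        coloring[v] = c
        best_cost = min(best_cost, new_cost)
    return coloring, profile

# Demo on a 4-cycle, which is 2-colorable.
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
coloring, profile = tabucol_sketch(cycle, k=2)
```

The `profile` list returned here is exactly the kind of per-iteration record from which the area beneath the curve is computed; the four algorithms studied differ in how the objective and neighborhood are formulated, not in this overall scheme.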
Experimental Design
We classify the graphs into classes based on their structural properties and compare the resulting classes with the generation models employed. The experimental design is focused on determining which, if any, of the structural properties significantly affect the algorithm performance. The instances are handled as two groups: those of high density and those of low density. The hypothesis to examine is the same for both sets: “The structure of graphs of the same order and size has an impact on algorithm performance”.
High-density graphs: Figure 2 shows the results for the high-density graphs, with 64 vertices each. It is visible in the figure that higher-density graphs (which correspond to larger instances, as they have more edges) are on average harder to color, which is expected [10]; also, allowing more colors makes coloring easier, as is likewise expected [5]. This is intuitively pleasing and confirms that the proposed measure adequately reflects established complexity concerns.
Figure 2: Average performance measure for the four algorithms for 64-vertex graphs. On the x axis, the graphs are grouped by the generation model, the y axis represents the density, and the z axis the number of colors k given as a parameter. Each dot has a diameter linearly proportional to the area under the profile curve.
More interestingly, Figure 2 also indicates that the generation model has a similar effect on algorithm performance across the four algorithms, indicating that the graph structure affects the difficulty of solution independently of the formulations of the neighborhood and the objective function. The initial conclusion from this first set of instances is the following: the graphs generated by the ER and BA models seem easier to solve than those of KL and RGG.
Low-density graphs: For low-density graphs, the effect of the structure on the performance measure is much more evident. In general the BA and ER models seem to produce graphs in the low-density regimen that are easier to color, as seen in Figure 3. Peculiarly, increasing k is less beneficial for the BA graphs than for the others: their difficulty does not decrease as rapidly as it does for the other models. The RGG, KL, and WS models tend to produce graphs that are more difficult to solve for the values of k employed in this study.
Figure 3: Average performance measure for 100-vertex graphs with den(G) < 0.06, for (a) React-PartialCol, (b) PartialCol, (c) React-TabuCol, and (d) TabuCol. On the x axis, the graphs are grouped by the generation model; the y axis represents the area under the profile curve.
Table 1: Results of the ANOVA test.

Algorithm          F0
PartialCol         1,340.99
React-PartialCol   1,146.09
TabuCol            1,307.60
React-TabuCol      1,647.60
Statistical results: As mere graphical observation is not a sufficient tool when the differences are slight, we also apply statistical tests. We use the model
y_ij = μ + τ_i + ε_ij,    (1)
where i identifies the generation model and j the instance, y_ij is the area under the profile curve, μ a global average, τ_i the effect of the graph structure (identified by the generation model), and ε_ij a random error. There are N = 30 replicas per observation. We study the hypothesis that the effect of the graph structure is nil, meaning that the algorithm performance is statistically equal for all generation models. We use ANOVA for each algorithm. The rejection criterion is set as F0 > F(α; a−1, N−a), with a significance level α = 0.05 and a power of 0.916.
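A sketch of the F statistic for this one-way model, in plain Python with hypothetical area values (the study's actual data and tooling are not reproduced here):

```python
def one_way_anova_F(groups):
    """F0 for the one-way model y_ij = mu + tau_i + eps_ij.

    `groups` holds the areas under the profile curve, one list of
    replicas per generation model (treatment).
    """
    a = len(groups)                  # number of treatments (models)
    N = sum(len(g) for g in groups)  # total number of observations
    grand = sum(sum(g) for g in groups) / N
    means = [sum(g) / len(g) for g in groups]
    ss_treat = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_error = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_treat / (a - 1)) / (ss_error / (N - a))

# Tiny worked example with two hypothetical treatment groups:
F0 = one_way_anova_F([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]])  # 1.5
```

The null hypothesis (all τ_i equal to zero) is rejected when the resulting F0 exceeds the critical value F(α; a−1, N−a).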
The analysis of variance for 100-vertex graphs with den(G) < 0.06 and k = 3 is shown in Table 1. For each algorithm we have F(0.05; 4, 25) = 2.76. This allows us to conclude that there is a significant effect on algorithm performance caused by the instance structure, regardless of the algorithm used in the experiment.²
This type of analysis was carried out for all groups of instances. When the density is increased, the values of the F0 statistic decrease, indicating that the differences in algorithm behaviour become less and less evident. For all four algorithms, it was found that the instances generated with KL, WS, and RGG were more difficult to solve than those generated with the BA and ER models.
Classification of instances: Aiming to identify the structural properties that differentiate the easy from the difficult instances, we executed a clustering algorithm; the goal is to explore which characteristics differ between the easy and the hard instances (see Figure 4).
The most difficult instances were those with a high value of the clustering coefficient, whereas the easiest were those with much lower values of the clustering coefficient; the standard deviation of the degree is also, for the graph coloring problem, a relevant structural property.

²We examined the assumptions of normality required for the analysis of variance to be valid.
Figure 4: A projection onto the structural metrics of the standard deviation of the degree distribution (x axis) and the clustering coefficient (y axis), with instances marked by their generation model (ER, BA, WS, KL, RGG).
4. CONCLUSIONS AND FUTURE WORK
The experimental results from the case study indicate that the proposed measure adequately reflects the effect of the instance structure on algorithm performance in the graph-coloring problem, which we believe to be representative of many interesting classes of graph problems and of combinatorial optimization in general.
In our case study, the graphs that turned out to be difficult were those of RGG, KL, and WS, which have higher values of the clustering coefficient. It is of future interest to statistically verify the relationship between the clustering coefficient, the standard deviation of the degree, and the algorithm performance.
A line of future work is structural optimization: given a graph and a computational problem, as well as an indication of whether one desires the problem to be easy or hard, gradually modify the structure of the graph until it falls into the desired regimen of values of the (approximated) structural properties. One can also impose a budget that limits the amount and extent of modification imposed on the structure.
5. ACKNOWLEDGMENTS
This work has been supported by the UANL under grant PAICYT IT553-10 and by CONACyT (2010–2011) under grant 49130.
6. REFERENCES
[1] Réka Zsuzsanna Albert. Statistical mechanics of complex networks. PhD thesis, University of Notre Dame, Notre Dame, IN, USA, 2001.
[2] Ivo Blöchliger and Nicolas Zufferey. A graph coloring heuristic using partial solutions and a reactive tabu scheme. Computers & Operations Research, 35(3):960–975, 2008.
[3] G. Chartrand and P. Zhang. Chromatic Graph Theory. Chapman & Hall/CRC, Boca Raton, FL, USA, 2008.
[4] L. da F. Costa, Francisco A. Rodrigues, G. Travieso, and P. R. Villas Boas. Characterization of complex networks: A survey of measurements. Advances in Physics, 56(1):167–242, January 2007.
[5] J. Flum and M. Grohe. Parameterized Complexity Theory. Springer-Verlag, Secaucus, NJ, USA, 2006.
[6] Philippe Galinier and Alain Hertz. A survey of local search methods for graph coloring. Computers & Operations Research, 33(9):2547–2562, 2006.
[7] Fred Glover and Manuel Laguna. Tabu Search. Kluwer Academic Publishers, Norwell, MA, USA, 1997.
[8] J.-P. Hamiez and J.-K. Hao. An analysis of solution properties of the graph coloring problem. Applied Optimization, 86:325–346, 2003.
[9] Tania Turrubiates López. Clasificación de redes complejas usando funciones de caracterización que permitan discriminar entre redes aleatorias, power-law y exponenciales. Master's thesis, Instituto Tecnológico de Ciudad Madero, November 2007.
[10] R. Mulet, A. Pagnani, M. Weigt, and R. Zecchina. Coloring random graphs. Physical Review Letters, 89(26):268701, 2002.
[11] M. E. J. Newman. The structure and function of complex networks. SIAM Review, 45:167–256, 2003.
[12] P. Orponen, K. Ko, U. Schöning, and O. Watanabe. Instance complexity. Journal of the ACM, 41(1):96–121, 1994.
[13] Daniel Cosmin Porumbel, Jin-Kao Hao, and Pascale Kuntz. A search space “cartography” for guiding graph coloring heuristics. Computers & Operations Research, 37(4):769–778, 2010.
[14] Satu Elisa Schaeffer. Algorithms for nonuniform networks. Research Report A102, Helsinki University of Technology, Laboratory for Theoretical Computer Science, Espoo, Finland, April 2006. Doctoral dissertation.
[15] K. A. Smith-Miles, R. J. W. James, J. W. Giffin, and Y. Tu. Understanding the relationship between scheduling problem structure and heuristic performance using knowledge discovery. Lecture Notes in Computer Science, 5851:89–103, 2009.
ABSTRACT
This paper explores the notion of scholarly inquiry from a variety of science education perspectives. Such a perspective allows the individual to view scientific phenomena from a variety of epistemologies when solving socio-ecological problems in the field. Concept models are used as communication tools to simplify relationships towards understanding social-ecological systems. Models quantitatively deliver empirical data, define questions and concepts, generate hypotheses, make predictions, and determine the relationship between the whole and its parts. Models explore ways to create new paths and determine what we know and what we don't. Proper translation through conceptual modeling can improve implications when making social and interdisciplinary connections. A common language for interdisciplinary research includes a place for humans and ecology, and questions about processes and functions using scales. Scientists and social scientists, when communicating dynamically about their topics in field studies or in their disciplines, should be able to shift gears when relating to paradigms, such as from pragmatism to interpretivism or constructivism, or from postmodernism to bracketing as in case study. Paradigm shifting is necessary when collaborating on and solving complex worldwide interdisciplinary field problems regarding natural resource management. It can also advance the design of new courses and interdisciplinary programs at universities.
Keywords: interdisciplinary concept models, paradigm shifting, socio-ecological inquiry, methodologies, epistemologies, natural resource management, global field study research.
INTRODUCTION
How do epistemologies and methodologies from other disciplines connect to make sense, and how does such a lens zoom in on the research as it is applied to multiple contexts and disciplines?
In the vast arena of contextual interpretation and the uni-dialect that currently exists, one needs to be able to communicate across variable economic, social, historical, physical, and natural systems. The interpretations lack concrete evidence and cause-and-effect strategies, which often get disguised and transposed to look at an entirely different set of meanings and constructs. For me this takes time and life experiences that teach one to understand and translate. Reflexivity can get one there faster.
When addressing knowledge-based research, the problem arises when focusing on context and meaning. Proper translation and conceptual modeling can have implications for social and interdisciplinary connections. The structure of scholarly inquiry from a science education perspective allows the individual to view scientific phenomena from a variety of epistemologies to solve problems.
Educational philosopher John Dewey suggests that we solve problems by using our past experiences and connecting them to things we currently know about. Modern philosopher Thomas Kuhn discusses the structure of scientific revolutions through the experience of a paradigm shift. If we look at the disciplines of nature, culture, and religion, we may reveal a deeper understanding of social issues and science. When thinking of the ability to frame-shift within and among paradigms in science and history, Thomas Kuhn's book The Structure of Scientific Revolutions comes to mind.
THOMAS KUHN'S SUGGESTIONS

Thomas Kuhn is well known for his attempts to vindicate the nature and existence of scientific and social paradigm revolutions. He speaks of using illustrations and examples to visualize the revolutions that appear as scientific knowledge. The reason for this view is that scientists take their image of science from an authoritative source (such as a textbook, a philosophical work, or a popular representation). The text usually disguises the existence of the scientific revolution. Textbooks use popular representation and philosophical work to address problems of data and theory, committed to the set of paradigms current when they were written. Textbooks try to keep up with the scientific terminology of the day. Popular models attempt to explain applications in a language closer to everyday life. The philosophy of science analyzes the structure of the body of knowledge. All three (textbooks, philosophy, popular models) display the outcome of past revolutions and thus display the bases of the normal scientific tradition.
DEVELOPMENTAL PATTERNS
A dominant mature text will differentiate its developmental patterns from those of other fields. Textbooks, the pedagogical vehicle for normal science, have to be rewritten after each scientific revolution. Once rewritten, they hide the role or existence of the revolution that produced them. Unless the author has lived through the revolution, he or she writes only about the most recent revolution experienced. Textbooks then replace the discipline's history and supply a substitute for what was eliminated or estimated to be true. Scientific textbooks refer only to the work that contributes to the statements and solutions of paradigm shifts.
ORIENTATION
Scientists skew the history because the results of scientific research show no dependence on the historical context of the inquiry, and, except during crisis or revolution, the scientist's position is accepted. More scientific detail could highlight the things that were meant to be deleted. Scientists traditionally look deeply into scientific facts and misrepresent historical facts. Whitehead wrote that "a science that hesitates to forget its founders is lost" (p. 138). Kuhn says that science does need its heroes. What results are consistent tendencies to make the history of science seem linear, which affects scientists even when looking back at their own work. For example, Dalton was interested early on in the chemical problems of combining proportions that he later became famous for solving. All of Dalton's work omits the revolutionary effects of applying to chemistry questions previously restricted to physics and meteorology. This new orientation taught chemists how to solve new problems and draw new conclusions from old data. It is this sort of change that accounts for empirical discoveries and the transition from Aristotelian to Galilean to Newtonian dynamics.

Paradigm Shifting through Socio-ecological Inquiry: Interdisciplinary Topics & Global Field Study Research

Christine M. YUKECH
Secondary Science Education, Curriculum & Instruction, The University of Akron, Akron, Ohio 44325, U.S.A.
MISCONCEPTIONS RELATED TO PARADIGM SHIFTING
Misconstructions render revolutions invisible. The arrangement of material in a science text implies a process that, if it existed, would deny the revolution a function. The aim tends to be to quickly orient the science student to what the scientific community thinks it knows, so the various textbooks, concepts, laws, and theories of current normal science present the science separately from the true nature of the events connecting them. Information becomes presented linearly instead of dynamically.
Science is portrayed as having taken place through a series of individual discoveries and inventions that, when gathered together, present the body of modern technical information; scientists add information to the paradigms like bricks in a wall, one after the other. Kuhn says this is not the way science develops. Many of the puzzles of contemporary normal science did not exist until after the most recent scientific revolution, and it is hard to trace back their scientific origins. The problems haven't changed, but rather the whole nature of fact and theory has shifted. Dalton fitted his associations of theory and fact to earlier chemical experience as a whole, changing that experience in the process. Theories do not evolve to fit the facts; fact and theory merge together to form a revolutionary reformation of the preceding tradition.

CONCEPT DEVELOPMENT NOTIONS THROUGH THE IMAGE OF SCIENTIFIC EXPERTS

Another example of the impact of textbooks on the image of scientific development concerns elementary chemistry texts, which must discuss the concept of a chemical element. This concept is almost always attributed to the seventeenth-century chemist Robert Boyle, whose Sceptical Chymist provides the reader with a definition close to that used in science today.
Like 'time', 'energy', 'force', or 'particle', the concept of element is often not invented or discovered at all. Boyle's definition can be traced back to Aristotle and forward through Lavoisier into modern texts. Both Lavoisier and Aristotle changed the chemical significance of 'element', but they did not change the verbal formula that serves as its definition; similarly, Einstein did not have to redefine 'space' and 'time' in order to give them meaning within his work. Robert Boyle, then, was a leader in a revolution that changed the relation of 'element' to chemical manipulation and chemical theory. He transformed the notion into a tool different from what it was before, and this in turn changed chemists and the world of chemistry. Other revolutions like this one, centered on Lavoisier, gave the concept form and function. Boyle's case exemplifies the stages of development of what happens to existing knowledge when it is put into text. That pedagogic form has determined our image of the nature of science and of the role of discovery and invention in its advance.
My Opinion: If we did fit the facts to the context of the subject matter (of the time), then the concepts that arise would create a deeper understanding of the material being studied. It seems as though Kuhn was trying to pinpoint and address pedagogical issues of science literacy. He found good examples in Boyle, Dalton, and Aristotle.
HOW DO WE FIT EXISTING KNOWLEDGE WITH PAST KNOWLEDGE AND STILL HAVE IT MAKE SENSE?

That is the puzzle when dealing with text. If you take a term out of context, then how do you reconnect it or fit it in with its process, function, or meaning, for that matter? Revolutions become invisible when we don't recognize that this takes place.
Thomas Kuhn explains that historical and scientific paradigms need to synergize in order for meaning making to occur across them. He tries to use a formula to help the reader shift gears when understanding the meaning of the new normal or accepted constructs. He seems to be explaining how the scientific community interprets the new.
CONSTRUCTIVE INTERPRETIVE PARADIGM AND INTERDISCIPLINARY CONCEPT MODELS
To branch further from Thomas Kuhn's ideas of scientific social revolutions to a scientific social constructive-interpretive paradigm shift, I decided to discuss the article entitled 'Insight: Conceptual Models as Tools for Communication Across Disciplines' by M. Heemskerk, K. Wilson, and M. Pavao-Zuckerman. The article explores systems and determines their parts and processes. To understand complex interdisciplinary science concepts, models were used as communication tools. In this article, models are used to construct meaning across disciplines. Constructivism tries to construct and deconstruct meaning. I believe the concept models tried to get at the root of what the scientists knew. The interpretive part came through processing the knowledge and communicating ideas across disciplines. The conceptual models were interpreted among interdisciplinary scholars at various field research sites. Concept models were used to simplify relationships towards understanding social-ecological systems.
The conceptual models, delivered more or less quantitatively through abstract or empirical data, define questions and concepts, which generate hypotheses and predictions and determine the relationships between wholes and parts. Models explore behavior, which helps to explore new paths and to determine what we know and what we don't know.
MODEL BUILDING PROFESSIONAL DEVELOPMENT
Professional development groups were established and taught in courses designed to run 2.5 hours. The groups consisted of interdisciplinary teams of young scientists of social-ecological systems using metadata from long-term ecological research sites across the United States. This type of collaboration helps to understand human intentions and behaviors and ties ecology and social science together. The model-building professional development workshops helped the participants develop questions, determine system boundaries and gaps in current data, and provide thoughts and predictions from the group. The workshop organizers were on board with IGERT, the Integrative Graduate Education and Research Traineeship program.
RESEARCH AND TRAINING UNIVERSITY PROJECT SITES
The research and training took place through a long-term project made up of 26 participants from 11 universities throughout the United States. Most of the participants were graduate students, but a few were professionals and postdoctoral researchers. The research took place over six L.T.E.R. sites, in Michigan, Oregon, Puerto Rico, Israel, and the Everglades. The goals were to share the pros and cons learned about model building and processing models as a tool for interdisciplinary work.
SCIENTIST PARTICIPANTS
Scientists from different fields are more likely to be able to communicate and interpret information about diverse social-ecological systems, including forests, deserts, northern lakes, agricultural systems, and urban landscapes. Workshop participants were divided by their skilled backgrounds, whether quantitative or qualitative, human or natural systems. They were also sorted into applied or theoretical, social scientist or natural scientist. The lessons learned concerned a common language for interdisciplinary research, the place for humans and ecology, and questions about processes and functions using scales.
COMMON LANGUAGE
Common language became a process so that restoration was not just lip service or a patch for an ecological problem but a real solution to a real-world problem in the field. The ideas of community and scale at each L.T.E.R. site needed differentiation. The processing also allowed values, ideas, opinions, and beliefs into one box representing human mental ideas, which was discrepant with the social scientists' way of thinking. The social scientists discuss how human behavior is caused by human values and behaviors acted out. For example, a hydrologist needs to know the requirements of the people who live at different vantage points of a mountain in order to design the right systems to collect and utilize a watershed.
PARTICIPANT CONCEPT MODELS AND MAPS
Concept maps include the following types of symbols: energy pathways and flow, consumer transformation of energy, dialectic fields, and propaganda to promote something.
The products of the workshop reveal suggestions for solving interdisciplinary problems, distinguishing between types of people (fishers or farmers), types of behaviors (political or economic), and mental processes (values and attitudes). Ecological systems problems are easier to reveal, as daphnia and fish do not complain when their behavior is underrepresented. Each field representation explains the socio-ecological suggestions for solving the management of natural resources.
CONCEPTUAL MODELS & SCALES DETERMINE NATURAL RESOURCE MANAGEMENT
The groups discussed how scales can determine the management decisions that social and interdisciplinary research problem solving reveals. The models' main internal drivers are economic, ethical, political, social, and ecological sustainability. The models convey that by collecting and publishing data, scientists can influence regional development and ecosystem management.
The models were good for producing interpretive discussion, which helped determine the things that the research scientists agreed or disagreed about. They were also constructed by the interdisciplinary research teams so that the teams could interpret data and rationalize results. There is hope that this type of communication will help sort out the anthropogenic and biological factors that push for ecological change. The communication needs to be synergetic in that it needs to cross many boundaries; it can, however, clarify research questions and designs. For a true interpretive shift to take place, the policy makers, concept modelers, anthropologists, ecologists, biologists, and social scientists need to look beyond mere details and agendas and listen to the problems at the site as communicated by multiple entities.
THE RESEARCH FIELD STUDIES
The Florida Coastal Everglades field site analyzes the regional forces that control population diversity; the study researches how human use of water affects the aquatic biological communities. The Israel ecological research site shows how human decision making affects the social and ecological factors that influence grazing conditions in the semi-arid shrub lands of southern Israel. The Kellogg Biological Station problem was to show how land use has changed over time and how these changes feed back into linked social-ecological systems; the model suggests that population growth will create extra demands on water resources. The Luquillo Experimental Forest long-term ecological research site discusses the cause and effect of increased tourism at the Luquillo, Puerto Rico site: future development might cause problems for the coastal forests and wetlands, which include habitat for endangered species, nesting beaches for leatherback sea turtles, and coral reef communities. The Andrews Experimental Forest ecological research site shows how public perceptions of research influence local resource use and management. The Central Arizona-Phoenix long-term ecological research site documents the long-term change in use and role in shaping the urban, recreational, agricultural, and desert landscapes of today.
My Opinion: I think the article needed to discuss how the concept models helped to explain and interpret ways to communicate and to define and diagnose problems in the field. There was more talk about resource management than about the way the concept tools were used to diagnose communication problems and then ways to engage the new knowledge and employ it in the field.
BRACKETING THROUGH CASE STUDIES/STORIES THAT REVEAL SOLUTIONS USING A PHENOMENOLOGICAL PARADIGM: GETTING AT THE TRUTH THROUGH CONTEXTUAL STORIES

The article 'Barriers and Facilitators to Integration among Scientists in Transdisciplinary Landscape Analyses: A Cross-Country Comparison,' by Christine Jakobsen, Tove Hels, and William McLaughlin, falls within the phenomenological paradigm. This paradigm, which I chose to write about, speaks of bracketing: holding the paradigm constant and removing the influence of possible distractions. These ideas transmute through a cascading effect in which the bracketing causes a phenomenological shift that reveals the underlying reasons for the problems presented and ways to solve them. The study consisted of two groups, one group of scientists and the other made up of scientists, government agencies, and policy makers. In order to get to the roots of the interdisciplinary communication case studies, bracketing was used to help differentiate what kinds of things became themes or strands that the groups agreed with or came to a consensus about.
Proceedings of International Conference on Complexity, Cybernetics, and Informing Science and Engineering
My Opinion: The case studies became a good way to diagnose how the groups began to understand each other's interdisciplinarity. However, I didn't feel the meaning of the research problems was clearly defined. Instead of solutions related to the field sites and regulating policies, the study became a meaning-making of the communication among the groups and their various disciplines.
PRAGMATISM TO SOLVING PROBLEMS
THROUGH A POSTMODERN PARADIGM
The final article chosen for analysis is called Bridges and Barriers to Developing and Conducting Interdisciplinary Graduate-Student Team Research, by Morse, Nielsen-Pincus, Force, and Wulfhorst. This article chooses a frame of research questions addressing the impact of land-use change or payments for environmental services. Problem definition and pragmatism helped to determine what type of integration, or ways to understand the research questions and dilemmas, was needed. As the integration ranged from disciplinary to transdisciplinary, there was a shift in gears from a pragmatic definition or frame of the research questions to a wider disciplinary/interdisciplinary postmodern lens. A transdisciplinary case study was then used to diagnose solutions created by the teams about the field sites. This brought the research to life, giving it new meaning in a phenomenological frame. To truly be a change agent one needs to be able to shift gears and follow through with a critical awareness of putting the solutions into action. This takes time, money, collaboration, cooperation, and the ability to understand the socio-ecological issues in need of remediation and repair.
The article spells out how the interdisciplinary research teams began to expand beyond traditional research barriers. The research group, IGERT (the NSF's Integrative Graduate Education and Research Traineeship), is an interdisciplinary, internationally funded grant initiative run through the University of Idaho (UI) in partnership with CATIE, the Tropical Agricultural Research and Higher Education Center in Costa Rica, with the theme of biodiversity conservation and sustainable production in anthropogenically fragmented landscapes. IGERT consisted of 18 doctoral students in 4 cohorts over 2 years. Their backgrounds included botany, economics, entomology, forest ecology, wildlife and plant genetics, hydrology, remote sensing, rural sociology, soil science, and wildlife biology. Students' coursework took place in the College of Natural Resources, the College of Agriculture and Life Science, and Environmental Science. Five research teams operated in agricultural and forested landscapes in Costa Rica and the USA (3 in Costa Rica and 2 in northern Idaho). IGERT fellows were involved in an interdisciplinary effort to study conservation biology and sustainability issues. Four cohorts of students worked closely with their UI and CATIE advisors, as well as other UI and CATIE graduate students. Students from different disciplinary backgrounds, including the biological, physical, and social sciences, worked together in the cohorts and conducted comparative and cooperative research projects across ecological settings and disciplines.
BARRIERS
Barriers include the fact that disciplinary specialization, while needed to address complex scientific dilemmas, does not guarantee the ability to solve complex problems.
Crossing the barriers requires:
A. Funding/time - joint proposal writing/doctoral preliminary exams
B. Cross-disciplinary cooperation/integrated technical training
D. Getting around turfism
E. Getting around egos
F. Getting past differences in methodologies
DESIGN OF THE RESEARCH
The design of the research questions integrates theoretical knowledge with practical problem solving. The research outcomes need to impact the knowledge structures of each represented discipline, constructing ways to create a critical awareness of the need for interdisciplinary research. The project wanted to produce graduate students with interdisciplinary backgrounds who had accurate knowledge in their chosen disciplines, along with the technical, professional, and personal skills to help them become their own career leaders and creative agents for change. Issues with the methods, such as individual, disciplinary, and programmatic themes, become a bridge or a barrier. The spectrum of integration of the project required coordination, collaboration, combined inquiry, sharing, creation, and synthesis of the knowledge among the researchers from various disciplines.
TRAINING AND RESOURCES ISSUES
Training and resources provided both bridges and barriers. Technical training was provided through integrated networks. Funding took place through a three-year graduate stipend that also covered professional travel. The groups were given time for joint proposal writing, doctoral exams, and coordinated proposal writing.
Recommendations for accountability and communication strategies:
• Develop formal and informal communication strategies
• Select team members thoughtfully and strategically to address temporal and spatial scale issues
• Recognize and respect timing issues
• Define focal themes and research questions jointly and clearly
• Emphasize problem definition and team proposal writing
• Target interdisciplinary training and identify mentors on team integration
My Opinion: I really like the dynamic approach when trying to solve such a large scope of social, biodiverse, ecological problems. I think starting with a spelled-out pragmatic approach connecting theory to practice, and then shifting gears to a postmodern way of applying the ideas and putting things into action in the field, helps make the study more meaningful. Considering I would love to work with such a program and have experienced many similar field research projects, I think this approach works when narrowing in on the socio-cultural issues at each field site. When working with the integrated bioscience program at Akron, the missing link is experiencing the project in the field.
Integrated Bioscience and Field Study Experiences:
This semester 11 doctoral students of various disciplinary specialty backgrounds at the University of Akron were asked to communicate across boundaries to form research topics that
have been addressed before. There were natural tendencies for groups to sort into comfort zones pertaining to research specialties. Our group ended up with three main topics: atrazine in bioremediation zones, slugs with chloroplast-processing abilities, and a new science puzzle-logic teaching method that helps to show the pathways students use when reasoning and solving real-world problems in an Anatomy & Physiology course. I thought this article did a good job of crossing and shifting paradigms by switching from pragmatism to solving postmodern problems with a phenomenological case-study approach. It was hard to narrow this article down because, as the research grew, it gained steam in its potential impact to help cross bridges and barriers related to interdisciplinary research topics.
CONCLUSION
The articles included in this paper present a dynamic way of defining, describing, interpreting, and constructing ways of putting theory into practice to solve problems by being able to see them through different paradigms. For a true interpretive shift to take place, the policy makers, anthropologists, ecologists, biologists, and social scientists need to look beyond just details and agendas and listen to the problems at the field sites communicated from multiple entities. I felt it necessary to include the article about forest policy, as it gets to the roots of transferring meaning across many social issues and disciplines by using bracketing in a case study to look closely at the details and values behind the need for bioremediation and policy making. In order for socio-ecological change to take place there needs to be a platform for creating the space to dialogue and for using conceptual models that find a way to put the ideas into motion.
REFERENCES
[1] Morse, W.C., Nielsen-Pincus, M., Force, J.E., & Wulfhorst, J.D. 2007. Bridges and Barriers to Developing and Conducting Interdisciplinary Graduate-Student Team Research. Ecology & Society 12(2): 8.
[2] Heemskerk, M., Wilson, K., & Pavao-Zuckerman, M. 2003. Conceptual Models as Tools for Communication Across Disciplines. Conservation Ecology 7(3): 8.
[3] Jakobsen, C., Hels, T., & McLaughlin, W. 2004. Barriers and Facilitators to Integration among Scientists in Transdisciplinary Landscape Analyses: A Cross-Country Comparison. Forest Policy and Economics 6: 15-31.
[4] Kuhn, T. 1962. The Structure of Scientific Revolutions. University of Chicago Press.
The Optimization of Formative and Summative Assessment by Adaptive Testing
and Zones of Students' Development
Victor Zvonnikov
The Ministry of Education and Science, State University of Management
Moscow, Russia
and
Marina Chelyshkova
The Ministry of Education and Science, State University of Management
Moscow, Russia
ABSTRACT
In this article an approach to the optimization of formative and summative assessment, based on optimum scoring of item difficulties during adaptive testing, is described. The optimization rests on a connection between the concept of zones of students' development and the mathematical models of the modern theory of tests, Item Response Theory. From this connection some inequalities are derived which make it possible to allocate items that support development during formative assessment and to minimise the measurement error when carrying out summative assessment. These inequalities are supplemented with the personal characteristic curves illustrating the zones of development. The steepness of the personal characteristic curve, constructed by means of the two-parameter model, is interpreted as a parameter of the structure of knowledge.

Keywords: zone of development, Item Response Theory, ability parameter, difficulty parameter, formative assessment, summative assessment, one-parameter model, two-parameter model, structure of knowledge, likelihood function, adaptive test.
THEORETICAL PRECONDITIONS

The differences among individuals have a strong effect on learning results. The study of the important relation between individual differences and learning results has a very long history in Russia. In the 1930s and 1940s the well-known Russian psychologist L. Vygotsky was engaged in research on this relation. He suggested the concept of three zones of learner development: the zone of actual development, the zone of nearest (proximal) development, and the zone of the learner's perspective development. In Soviet schools of the 1940s-60s it was assumed that the cognitive abilities of all pupils could be developed to the same degree. If this did not occur for a particular learner, the teacher was held to blame for not giving sufficient attention to that learner's development. In the 1970s and the following years certain progress was made. It was recognized that the challenge of improving learning and performance largely depends on correctly identifying the characteristics of a particular learner. So theorists and teachers began to analyze the reasons for differences in achievements and to interpret them not only in the context of training methods and the quality of teachers' work, but also in connection with individual differences in abilities [6].
Vygotsky's concept was further developed in Zankov's research. He introduced the principle of training at a high level of difficulty, whereby training and control for each learner are organized by means of the most difficult items. In the 1960s-80s some Russian schools began to create experimental didactic systems in which this principle of training with high-difficulty items had the central place. However, these theoretical postulates could not be effectively realized in training practice, as the shortcomings of traditional training and quality-monitoring systems did not allow the principle to be implemented. The selection of high-difficulty items for each learner was carried out intuitively, which quite often led to item difficulties so excessive that they did not support development [7].
Research in the sphere of Item Response Theory (IRT), and the mathematical models developed within this theory, have made it possible to compare the learner's level of ability and the level of item difficulty. This idea of comparison has been realized in adaptive testing on the basis of item selection by equating the scores of the ability parameter and the scores of the difficulty parameter [3]. These scores are associated with the zone of actual development. The equating helps to optimize item selection for summative assessment because it provides highly reliable ability estimates. But it does not provide for the organization of formative assessment, in which learner development is realized on the basis of the performance of more difficult items [4]. Such scores are associated with the zone of the learner's nearest development.
THE RULES FOR OPTIMIZING ITEM SELECTION IN FORMATIVE AND SUMMATIVE ASSESSMENT BY ADAPTIVE TESTING AND IRT
We tried to solve the problem of optimum item selection for current scores in formative and summative assessment together with the ideas of L. Vygotsky, and to formalize these individualization zones by the mathematical models of IRT. So we suggest some inequalities that correspond to the various zones of development, on the basis of the Rasch model of IRT, for summative and formative assessment. In adaptive testing the value of the probability of correct item performance for summative assessment is set by the inequality |Pij(θi − βj) − 0.5| < 0.1, where θi is the ability level of the i-th learner, βj is the difficulty of the items, and all items are locally independent.
Within the one-parameter model of G. Rasch [1] it is possible to write the probabilities Pij in the form of Eq. (1):

Pij = e^{1.7(θi − βj)} / (1 + e^{1.7(θi − βj)}),   Qij = 1 − Pij = 1 / (1 + e^{1.7(θi − βj)}),   (1)

where θi is the ability of the i-th learner and βj is the difficulty of the j-th item.
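As a rough illustration, the Rasch probability of Eq. (1) can be computed in a few lines of Python; the function name and example values here are ours, not the paper's:

```python
import math

def rasch_p(theta: float, beta: float, scale: float = 1.7) -> float:
    """Probability of a correct answer under the Rasch model, Eq. (1).

    theta: learner ability, beta: item difficulty (both in logits).
    The 1.7 factor matches the paper's scaling of the logistic
    curve toward the normal-ogive metric.
    """
    z = scale * (theta - beta)
    return math.exp(z) / (1.0 + math.exp(z))

# When ability equals difficulty, the success probability is exactly 0.5:
print(rasch_p(0.0, 0.0))  # 0.5
```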
After some transformations of this inequality for the probability of a correct answer with respect to the difficulty parameter, we obtain values in the range −0.24 < θi − βj < 0.24, or θi − 0.24 < βj < θi + 0.24, taking into account the borders of a confidence interval for the parameter estimations at a significance level α = 0.05. Such items correspond to the zone of actual development, and they are optimum for summative assessment on the basis of adaptive testing.
The probability of correct item performance corresponding in difficulty to the zone of nearest development is defined by the inequality 0.2 < Pij < 0.4. From here, as before, it is easy to obtain the range −0.80 ≤ θi − βj ≤ −0.20. After some transformations, taking into account the borders of a confidence interval (±1.96·Se), we have the range of difficulty estimations corresponding to the zone of nearest development: θi + 0.20 ≤ βj ≤ θi + 0.80. Such items correspond to the zone of nearest development, and they are optimum for formative assessment on the basis of adaptive testing.
The last inequality helps to rethink the connection between the principle of training at a high level of difficulty and the principle of availability in training, and to introduce a formalized characteristic. If a learner carries out items from the interval θi + 0.20 ≤ βj ≤ θi + 0.80, it is possible to realize the principles of availability and of optimally high item difficulty simultaneously in adaptive testing during formative assessment. For instruction to be maximally effective, formative assessment should use such items.
Subsequent possible values of the probability of correct item performance, with Pij < 0.2 (that is, βj > θi + 0.80), correspond to situations when items are too difficult for the organisation of adaptive training during formative assessment. From this inequality, the corresponding range of item difficulty parameter values can be correlated with the zone of further perspective development. Such assessments can provide the basis for planning subsequent instruction. These conclusions are shown in Figure 1.
Figure 1. Geometrical interpretation of the intervals of item difficulties, with the personal curve of an examinee.

The first interval corresponds to too-easy items, for which activity proceeds on the basis of already completed cycles of development. The second interval corresponds to the zone of actual development. The third interval contains difficult items, which correspond to the zone of nearest development. The fourth interval contains too-difficult items, concerning the zone of further perspective development.
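The four intervals of Figure 1 can be summarized in a small classification sketch. The border values are taken from the inequalities above; since the text gives 0.24 as the upper border of the actual-development zone but 0.20 as the lower border of the nearest-development zone, the boundary is resolved at 0.24 here purely for illustration:

```python
def difficulty_zone(theta: float, beta: float) -> str:
    """Classify item difficulty beta relative to learner ability theta
    into the four intervals of Figure 1 (borders are illustrative)."""
    d = beta - theta
    if d < -0.24:
        return "too easy (completed cycles of development)"
    if d <= 0.24:
        return "zone of actual development (summative assessment)"
    if d <= 0.80:
        return "zone of nearest development (formative assessment)"
    return "too difficult (further perspective development)"
```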
It is more interesting to delimit the inequalities for the zones of development by the two-parameter model of IRT [2], where the probability of correct performance for learners with ability θi is given by Eq. (2):

Pj(θi) = (1/√(2π)) ∫_{−∞}^{aj(θi − bj)} e^{−z²/2} dz,   aj = r_bis / √(1 − r_bis²),   (2)

where aj is the discrimination parameter of the j-th item, bj its difficulty, and r_bis the biserial correlation coefficient.
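Eq. (2) can be sketched directly, using the error function in place of the normal integral; the function names are ours, and `math.erf` stands in for the paper's integral form of the standard normal CDF:

```python
import math

def discrimination(r_bis: float) -> float:
    """a_j = r_bis / sqrt(1 - r_bis^2), from the biserial correlation (Eq. 2)."""
    return r_bis / math.sqrt(1.0 - r_bis ** 2)

def ogive_p(theta: float, b: float, a: float) -> float:
    """Two-parameter normal-ogive probability: the standard normal CDF
    evaluated at a*(theta - b), equivalent to the integral in Eq. (2)."""
    z = a * (theta - b)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
```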
As before, the probability of correct item performance corresponding in difficulty to the level of actual development is defined by the inequality |Pj[ai(θi − bj)] − 0.5| < 0.1, with the product ai(θi − bj) falling into a narrow interval around zero. Unlike the situation considered above within the one-parameter model, the range of estimations of item difficulty parameter values corresponding to the zone of the i-th learner's actual development will now be defined not only by θi, but also by the value ai. So for the i-th learner, on the assumption of positive values of ai, the inequality looks like θi − 1/ai ≤ bj ≤ θi + 1/ai.
KNOWLEDGE STRUCTURES

For the two-parameter model of IRT, the structure of knowledge can be estimated by the steepness of the personal characteristic curve [5]. As the value of ai depends on the number of errors in the response pattern, it is quite reasonable to correlate it with the quality of the knowledge structure. From this correlation it is simple enough to draw a general conclusion: the higher the steepness of the personal curve, the better the structure of knowledge. This is shown in Figure 2.

Figure 2. Personal curves of examinees with equal levels of ability, but various values of the parameter ai.
The inequality θi − 1/ai ≤ bj ≤ θi + 1/ai allows us to draw an interesting conclusion about the length of the interval on the axis of item difficulties. When the i-th learner's structure of knowledge is of high quality, corresponding to large values of ai, the borders of the interval θi − 1/ai ≤ bj ≤ θi + 1/ai for organising effective adaptive summative assessment draw closer. If the values of the parameter ai begin to decrease, the width of the interval increases. The noted effect shows the influence of the value of the structure parameter on the borders of the zone of actual development, which is quite clear and consistent with the training experience accumulated by every teacher.
ALGORITHMS FOR FORMATIVE AND SUMMATIVE ASSESSMENT BY ADAPTIVE TESTING

Algorithms of assessment demand rescoring of ability after the performance of every item of the adaptive test. If we use the new symbol Tj(θ) instead of the previously accepted probability of the right answer Pj, and designate the observable dichotomous results of the examinee's answers on the adaptive test by the symbols {x1, x2, ..., xk} (j = 1, ..., k), we can introduce the likelihood function for Rasch model scores at the k-th step of adaptive testing, Eq. (3):
Lk(θ) = ∏_{j=1}^{k} [Tj(θ)]^{xj} [1 − Tj(θ)]^{1 − xj},   (3)

where Lk(θ) is the likelihood function.
The a posteriori estimation of the ability parameter after the performance of k items looks like Eq. (4):

θk = ( Σ_{q=1}^{Q} tq · Lk(tq) · W(tq) ) / ( Σ_{q=1}^{Q} Lk(tq) · W(tq) ),   (4)

where the tq are quadrature points dividing the interval of the possible distribution of the measured variable, from −4 to +4 logits. For the chosen number of quadrature points, tq+1 − tq = 0.1 and q = 1, ..., Q. The W(tq) are weights at the quadrature points, recalculated after the performance of each next item of the adaptive test, such that Eq. (5) holds:

Σ_{q=1}^{Q} W(tq) = 1,   (5)

and Lk(tq) are the values of the likelihood function at the quadrature points.
The a posteriori estimation of the standard deviation for θ looks like Eq. (6):

Sap = [ ( Σ_{q=1}^{Q} (tq − θk)² · Lk(tq) · W(tq) ) / ( Σ_{q=1}^{Q} Lk(tq) · W(tq) ) ]^{1/2},   (6)

where Sap is the a posteriori estimation of the standard deviation.
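Eqs. (3)-(6) together describe an expected a posteriori (EAP) style estimate over a quadrature grid. A minimal sketch, assuming uniform initial weights W(tq) summing to 1 (a simplification, since the paper recalculates the weights after each item):

```python
import math

def rasch_p(theta, beta, scale=1.7):
    """Rasch probability of a correct answer, Eq. (1)."""
    z = scale * (theta - beta)
    return math.exp(z) / (1.0 + math.exp(z))

def eap_estimate(difficulties, responses):
    """Ability estimate and posterior SD after k items, Eqs. (3)-(6).

    difficulties: item difficulties beta_j; responses: 0/1 answers x_j.
    Quadrature points t_q cover -4..+4 logits in steps of 0.1.
    Returns (theta_k, S_ap).
    """
    tq = [-4.0 + 0.1 * q for q in range(81)]
    w = [1.0 / len(tq)] * len(tq)               # W(t_q), uniform, Eq. (5)
    lk = []                                      # likelihood L_k(t_q), Eq. (3)
    for t in tq:
        l = 1.0
        for beta, x in zip(difficulties, responses):
            p = rasch_p(t, beta)
            l *= p if x == 1 else (1.0 - p)
        lk.append(l)
    denom = sum(l * wi for l, wi in zip(lk, w))
    theta_k = sum(t * l * wi for t, l, wi in zip(tq, lk, w)) / denom   # Eq. (4)
    var = sum((t - theta_k) ** 2 * l * wi
              for t, l, wi in zip(tq, lk, w)) / denom
    return theta_k, math.sqrt(var)               # S_ap, Eq. (6)
```

After each administered item, the test would append the new (difficulty, response) pair and re-run the estimate, which is the rescoring step the algorithm demands.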
THE ANALYSIS OF RESULTS

It is possible to formulate some conclusions creating the necessary preconditions for the optimization of formative and summative assessment by adaptive testing and zones of students' development:
- item selection for adaptive testing with difficulty θi − 0.24 < βj < θi + 0.24, corresponding to the zone of actual development, allows optimisation of summative assessment;
- item selection for adaptive testing with difficulty θi + 0.20 ≤ βj ≤ θi + 0.80, corresponding to the zone of nearest development, allows optimisation of formative assessment;
- there exists a mutual influence between the possible values of the parameter ai and the width of the zone of nearest development: at small values of the parameter (ai < 1), the width of the zone depends, basically, on the value of the fraction 1/ai.
REFERENCES

1. Bond, T.G., & Fox, C.M. Applying the Rasch Model: Fundamental Measurement in the Human Sciences. Lawrence Erlbaum Associates, 2007.
2. Baker, F.B. Item Response Theory: Parameter Estimation Techniques. 2004.
3. Computerized Adaptive Testing: Theory and Practice / Ed. by Wim J. van der Linden and Cees A.W. Glas. London: Kluwer Academic Publishers, 2003.
4. Measurement and Assessment in Schools / Ed. by B.R. Worthen, K.R. White, Xitao Fan, R.R. Sudweeks. 2004.
5. Weiss, D.J. (Ed.). New Horizons in Testing. New York: Academic Press, 1983.
6.
7.
Pattern-Based Enterprise Systems: Models, Tools and Practices
Dr. Sergey V. Zykov, Ph.D.
National Research University Higher School of Economics, Moscow, Russia
szykov@hse.ru
Abstract
Building enterprise software is a dramatic challenge due to data size, complexity, and the rapid growth of both over time. The issue becomes even more dramatic when it comes to integrating heterogeneous applications. Therefore, a uniform approach is required which combines formal models and CASE tools. The suggested methodology is based on extracting common ERP module-level patterns and applying them to a series of heterogeneous implementations. The approach includes an innovative lifecycle model, which extends conventional models by formal data representation/management models and by DSL-based CASE tools supporting the formalisms. The approach has been implemented as a series of portal-based ERP systems in the ITERA International Oil and Gas Corporation, and in a number of trading/banking enterprise applications elsewhere. The works in progress include a semantic network-based airline dispatch system, and a 6D-model-driven nuclear power plant construction methodology.
1. Introduction
The paper outlines a new technology for large-scale integration of heterogeneous applications. Currently, multinational enterprises possess large, geographically distributed infrastructures. Each of these enterprises accumulates a huge and rapidly increasing data burden. In certain cases the data bulk exceeds petabyte size, and it tends to double every five years. Managing such data is an issue. The problem is even more complex due to heterogeneous data, which varies from well-structured relational databases to non-normalized trees and lists and weakly structured multimedia data. The technology presented is focused on more efficient heterogeneous enterprise management and uniform data management procedures. It involves a set of novel mathematical models, methods, and supporting CASE tools for object-based representation and manipulation of heterogeneous enterprise systems data. The architecture is portal-based.
2. Managing the enterprise systems
Unfortunately, brute-force application of the so-called "industrial" enterprise software development methodologies (such as IBM RUP, Microsoft MSF, Oracle CDM, etc.) to heterogeneous enterprise data management, without an object-based, model-level theoretical basis, results either in unreasonably narrow "single-vendor" solutions or in inadequate time-and-cost expenses. On the other hand, the existing generalized approaches to information systems modeling and integration (such as category- and ontology-based approaches, and the Cyc and SYNTHESIS projects [2,7,8,10,12,13]) do not result in practically applicable (scalable, robust, ergonomic) implementations, since they are separated from state-of-the-art industrial technologies. A number of international and federal research programs prove that the technological problems of heterogeneous enterprise data management are critical [11].
Thus, the suggested technology for the integrated development and maintenance of heterogeneous internet-based enterprise software systems has been created. The approach is based on rigorous mathematical models, and it is supported by software engineering tools which provide integration with the standard enterprise-scale CASE tools commonly used with software development methodologies. The approach eliminates data duplication and contradiction within the integrated modules, thus increasing the robustness of the enterprise software systems (ESS). The technology takes into consideration a set of interrelated ESS development levels, such as data models, software applications, "industrial" methodologies, CASE, architecture, and database management.
The technology elements include: conceptual
framework of ESS development; a set of object models
for ESS data representation and management; engineering
tools, which support semantic-oriented ESS development
and intelligent content management, i.e., the
ConceptModeller tool and the intelligent content
management system (ICMS) [18,19]; portal architecture,
ESS prototypes and full-scale implementations [16,19].
3. Modeling the enterprise lifecycle
For adequate modeling of heterogeneous ESS, a
systematic approach has been developed, which includes
object models for both data representation and data
management [18-20]. The general technological
framework of ESS development provides closed-loop,
two-way construction with re-engineering. The latter feature is critical for ESS verification, which substantially increases system robustness and reliability.
The general technological framework of ESS
development contains stages, which correspond to data
representation forms for heterogeneous software system
components, communicating in the global environment.
Such data representation forms include natural language,
mathematical models, engineering tools integration, and
content management. The data representation forms are
further detailed by the representation levels.
A content-oriented approach to ESS data management allows data/metadata generalization on a common model basis, uniform management of heterogeneous objects, and adequate modeling of the internet environment, which is critical for ESS robustness and reliability.
The object nature of the "class-object-value" model framework provides compatibility with traditional OOAD, as well as with certain other promising approaches ([15], [17]), and helps to extend the mentioned approaches to model the ESS internet-based environments. The following technological transformation sequence, according to the models developed, is suggested: (i) a finite sequence object, such as a λ-calculus term [1]; (ii) a logical predicate, for which higher-order logic is used; (iii) a frame as a graphical representation [14]; (iv) an XML object, with the class declaration generated by the ConceptModeller engineering tool [18]; (v) a UML diagram, where the data scheme is a part of the ESS (meta)data warehouse.
Therewith, the warehouse content representation is based on a semantic network situation model, which provides intuitive transparency for problem domain analysts when they construct the problem domain description. The model can be ergonomically visualized through a frame-based notation. Warehouse content management is modeled as a state-based abstract machine with role assignments, which naturally generalizes the processes of similar engineering tools, such as portal page template generation, the portal page publication cycle, role/access management, etc. Therewith, the major content management operations are modeled by the abstract machine language. The language has a formal syntax and a denotational semantics in terms of variable domains.
4. Managing SSDL: sequential elaboration
The ConceptModeller engineering tool [18] assists in
semantically-oriented visualized development of
heterogeneous ESS data warehouse scheme. Therewith, a
semantic network-based model is suggested, which works
in nearly natural-language terms, intuitively transparent to
problem domain analysts. Model visualization is based on
frame representation of the warehouse data scheme.
Thus, due to deep integration with mathematical
models and state-of-the-art CASE toolkits, the
ConceptModeller tool provides a closed-loop, continuous
ESS development cycle (from formal model to data
warehouse scheme) with a re-engineering feature.
Therewith, frames are mapped into specific ordered lists.
The ICMS tool is based on an abstract machine model, and it is used for problem-oriented, visualized heterogeneous ESS content management and the portal publication cycle. The ICMS tool features a flexible content management cycle and role-based mechanisms, which allow personalized content management based on dynamically adaptive access profiles and portal page templates. Due to scenario-oriented content management, the ICMS provides a unified portal representation of heterogeneous data and metadata objects, flexible content processing by various user groups, high data security, a higher ergonomics level, and intuitively transparent complex data object management. Therewith, the data object classes of the ESS warehouse are represented by ordered lists of <attribute, type> format, and templates by ordered lists of <attribute, type, value> format.
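The two ordered-list formats can be sketched directly; all names here (DataObjectClass, Template, instantiate, the sample attributes) are hypothetical, since the paper specifies only the tuple formats themselves:

```python
# Hypothetical sketch of the ICMS tuple formats.
DataObjectClass = list[tuple[str, str]]     # ordered <attribute, type> pairs
Template = list[tuple[str, str, object]]    # ordered <attribute, type, value>

article_class: DataObjectClass = [
    ("title", "string"),
    ("published", "date"),
]

def instantiate(cls: DataObjectClass, values: list) -> Template:
    """Fill a class's ordered attribute list with values, preserving order."""
    return [(attr, typ, val) for (attr, typ), val in zip(cls, values)]

page_template = instantiate(article_class, ["Quarterly report", "2013-07-09"])
```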
5. Pattern-based development for enterprise
systems
The general ESS development framework [19,20] potentially allows the application of a "spiral-like" lifecycle, which includes sequential elaboration of the ESS warehouse scheme after each iteration of the development cycle. Another benefit is ESS "tuning", specifically, component-wise improvement of the ESS software and data warehouse, by applying a "spiral-like" lifecycle and subsequent verification. Also, requirement "tracing" is possible through reverse engineering and/or verification, followed by correction and/or optimization. By building a repository of ESS "meta-snapshots", the system can be "reincarnated" to virtually any previous state using a component-wise strategy. Also, a "pattern catalogue" [6] for heterogeneous ESS can be built, based on the integrated repository of various ESS state "meta-snapshots". Further, developing a repository of "branches" makes it possible to "clone" slight ESS variations from the "basis". As for DSLs, it is possible to develop a formal language specification [3] for ESS requirement specification; let us call it Requirement Specification Language, or RSL. Finally, the existing ESS "meta-snapshot" repository components can be adjusted to match new requirements, and the desired components can be reused.
Thus, the ESS development framework implies software lifecycle variations according to the waterfall, spiral, evolutionary, and incremental approaches. Though the ESS development framework tends to be iterative, in certain cases the waterfall model is possible and reasonable.
An essential feature of the general ESS development framework is its two-way organization. The approach provides a reverse engineering possibility both for ESS in general and for their components in particular. The practical value of the approach is provided by the verifiability of heterogeneous ESS components at the uniform level of the problem domain model, which is practically independent of the hardware and software environment of the particular component. Therewith, a major theoretical generalization is the possibility of mathematically rigorous verification of the heterogeneous ESS components by a function-based model. A critical issue for the engineering practice of huge and complex ESS is that the models suggested are oriented at a very promising "pure" objects approach, which is the strategy of the state-of-the-art enterprise-level component technologies of Microsoft .NET and Oracle Java, where any program entity is an object.
An essential benefit of the approach suggested is a
possibility of adaptive, sequential “fine tuning” of ESS
heterogeneous component management schemes in order
to match the rapidly changing business requirements.
This benefit is enabled by the reverse engineering feature of the integrated, general, iterative ESS development framework. Reverse engineering is possible
down to model level, which allows rigorous component-
wise ESS verification. Thus, conventional reengineering
and verification can be enhanced by flexible correction
and “optimization” of the target ESS in strict accordance
with the specified business requirements. This is possible
due to the suggested model-level generalization of the
iterative, evolutionary ESS development framework.
Another benefit of the suggested ESS development
framework is a possibility of building a “catalogue of
templates for heterogeneous ESS”, which is based on an
integrated metadata warehouse, i.e., a “meta-snapshot”
repository. Thus, software development companies get a solution for storing relatively stable or frequently used configurations of heterogeneous ESS. The solution avoids the integration problems of “standard” ESS components and combinations. The approach yields serious project savings for clients, since the ESS developer’s “meta-snapshot” repository already contains a number of integrated solutions similar to the required system.
The above considerations pave the way for “meta-snapshot” repository development, which stores the
chronological sequence of ESS solutions as a tree with
the “baseline” version and slightly different “branches”
for ESS variations. This is analogous to version control tools in software engineering. The approach allows a reasonable selection of the most valuable deliverables of the ESS lifecycle phases, and the organization of similar solution “cloning”. Therewith, the “clones”
may be created both for different client enterprises, and
for different companies of a single enterprise.
Further discussion could cover the prospective areas of
“meta-snapshot” repository development. First of all, to
describe the metadata warehouses and the related
enterprise-level business requirements it seems
reasonable to develop new DSL-type problem-oriented
meta-languages. Let us call them the MetaWarehouse Description Language (MWDL) and the Requirement Specification Language (RSL), respectively. Further, the formal models outlined in the paper and covered in more detail in [19,20] allow interrelating the RSL and MWDL entities. Semantic-oriented search mechanisms based on semantic networks with frame visualization will assist in revealing the components of the ESS “meta-snapshot” repository that provide the closest match to the new requirements. The approach potentially allows time- and cost-effective, adequate transformation of existing ESS components to match the new requirements with minimum correction effort and, consequently, minimum labor expenses. Therewith, from a global perspective, it becomes possible to reuse certain ESS components for current or new clients. Selection criteria for such “basic” components may include percentage of reuse, ease of maintenance, client satisfaction, degree of matching business requirements, etc.
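One simple way to operationalize the selection criteria just listed is a weighted score over candidate “basic” components. The sketch below is a minimal illustration; the weights, component names, and per-criterion scores are invented for the example, not taken from the paper.

```python
# Hypothetical weighting of the selection criteria named above;
# weights and all candidate data are illustrative assumptions.
CRITERIA_WEIGHTS = {"reuse": 0.3, "maintenance": 0.2,
                    "satisfaction": 0.2, "req_match": 0.3}

def component_score(scores):
    """Weighted sum of per-criterion scores, each in [0, 1]."""
    return sum(CRITERIA_WEIGHTS[c] * scores[c] for c in CRITERIA_WEIGHTS)

candidates = {
    "billing-core": {"reuse": 0.9, "maintenance": 0.7,
                     "satisfaction": 0.8, "req_match": 0.6},
    "hr-legacy":    {"reuse": 0.4, "maintenance": 0.3,
                     "satisfaction": 0.6, "req_match": 0.9},
}
# Pick the repository component that scores highest overall.
best = max(candidates, key=lambda name: component_score(candidates[name]))
print(best)  # billing-core (0.75 vs 0.57)
```

In practice the criteria would likely be measured rather than estimated, and the semantic-network search described above would pre-filter the candidate set before scoring.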
6. Portal-Based Implementation: ITERA Oil-And-Gas Group
The methodology has been validated by the internet and intranet portals of the ITERA International Group of Companies. In terms of system architecture, the portals assign users certain content management rights, e.g., to view, modify, analyze, and generate reports. A problem-oriented form designer, a report writer, online documentation, and administration tools make up an interactive interface toolkit. The enterprise warehouse
supports integrated storage of data and metadata.
During the design stage, problem domain model specifications are transformed by the ConceptModeller SDK into UML diagrams, then by the Oracle Developer/2000 integrated CASE tool into ER diagrams and, finally, into the target IS and enterprise content warehouse storage schemes. The portal implementation process included fast
prototyping and full-scale integrated Oracle-based
implementation. The fast portal prototype has been
designed to prove adequacy of the content-based data
models, methods and algorithms. Upon prototype testing,
a full-scale ESS portal-based toolkit has been
implemented. Web pages automatically generated by the
enterprise content management system are published at
ITERA Group intranet portal and official internet site.
Portal architecture has been designed, implemented
and customized according to technical specifications
outlined by the author and tested for several years in a
heterogeneous enterprise environment. Implementation terms and costs have been reduced by about 40% compared to available commercial software, while the feature range has been substantially improved. Advanced personalization and differentiation of content access levels substantially reduce the risk of enterprise data damage or loss.
Upon customizing theoretical methods of finite
sequences, categories, semantic networks, computations
and abstract machines, a set of models has been
constructed including problem domain conceptual model
for enterprise content dynamics and statics as well as a
model for development tools and computational
environment in terms of state-based abstract machines,
which provide integrated object-based content
management in heterogeneous enterprise portals. For the
model collection, a generalized development toolkit
choice criteria set has been suggested for information
system prototyping, design and implementation.
A set of SDKs has been implemented, including the ConceptModeller visual problem-oriented CASE tool and the content management system. According to the
conceptual approach, a generalized interface solution has been designed for the Internet portal, based on a content-oriented architecture with explicit division into front-end and back-end sides. To solve the task of
building the architecture for enterprise content
management, a fast event-driven prototype has been
developed using ConceptModeller toolkit and
PowerScript and Perl languages. After prototype testing, a
full-scale object-oriented enterprise content management
portal-based architecture has been implemented. The full-
scale enterprise portal has been customized for content
management and implemented in a 10,000 staff
enterprise. The results obtained confirmed reduced implementation terms and costs compared to available commercial software, and demonstrated high scalability, mobility, expandability, and ergonomics.
The portal design scheme is based on a set of data models integrating object-oriented management of (meta)data.
The data models used integrate methods of finite
sequences, category theory, computation theory and
semantic networks and they provide enterprise content
management in heterogeneous interoperable globally
distributed environments. Due to the approach, costs of
enterprise content management, maintenance and
integrity support have been essentially reduced, while
portal modernization, customization and performance
optimization procedures have been simplified. The results
obtained have been used for development of a number of
portals in ITERA Group: CMS, intranet/internet portals.
The models, methods, and SDKs form the foundation for
portal-based enterprise content management in ITERA
International Group of Companies. According to ITERA
experts, the portal implementation has resulted in a
substantial annual cost reduction, while content
management efficiency has increased essentially.
7. Domain-Driven Messaging System for a Distributed Trading Company
A trading corporation used to commercially operate a
proprietary Microsoft .NET-based message delivery
system for information exchange between the
headquarters and the local shops. The system was client-server based: the client side included a local database and a Windows-based messaging service, while the server side consisted of a Web service and a central database. The operation and maintenance challenges were: complicated client-side code refactoring; difficult error localization and reduction; inadequate documentation; and decentralized configuration monitoring and management for remote shops. To solve these problems, an
approach based on domain-driven development [5] and
Domain Specific Languages (DSL) has been suggested.
The approach included problem domain modeling and
DSL development for managing problem domain objects.
The DSL-based model helped to conquer problem
domain complexity, to filter and to structure the problem-
specific information. It also provided a uniform approach
to data representation and manipulation. We used an external XML-based DSL, which extended the scope of the enterprise application programming language [9]. The methodology instance included the following steps: DSL scope detection, problem domain modeling, DSL notation development, DSL restriction development, and DSL testing. The approach focused on the client side, since that is the most changeable and challenging part. The lifecycle model is iterative, and the solution is based on a redesigned architecture pattern. The Windows service is the constant part of the application and contains a DSL parser. The DSL parser input is the current message transfer map.
The DSL scope included the rules and parameters of message transfer, and new types of messages. Different shops may have different configuration instances, which shaped the client-side message processing and transfer structure.
The next methodology stage was building DSL-based
semantic object model [9]. We obtained three object types:
messages, message transfer channels and message
transfer templates. DSL describes object metadata, i.e.,
configurations and manipulation rules. Templates were
core model elements, and channels were links between
template instances. Templates and channels together
made message maps. DSL described the maps, i.e. the
static part of the model, while messages referred to
system dynamics and states.
Templates define actions on messages, i.e., transform or route them. Templates were grouped under the IMessageProcessingPattern interface. The standard routing templates were: content-based router, filter, receiver list, aggregator, splitter, and sorter. We also produced a number of domain-specific templates for system reconfiguration, server interaction, etc. Channels were used for message management. In the messaging map graph, templates are represented as nodes, while channels are arcs between templates. In our case, two types of channels were implemented: a “peer-to-peer” channel and an error message channel. Based on the DSL classes, messaging maps were built, which were later used by the parser to generate the system configuration. At this stage, the DSL syntax and semantics were built. Each messaging map, generally a script, was instantiated as a file. A messaging map was built as an XML document, which defined the system configuration and contained templates for routing and message processing, transfer channels, and their relationships.
While parsing a messaging map, the parser creates channel objects based on the DSL channel descriptions. Then it configures the messaging system by creating message processing objects in a similar way. Finally, the parser instantiates the I/O channels and creates the required relationships between the channels and the message processors.
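The parsing and wiring steps just described can be sketched in miniature. The following Python example invents a tiny XML messaging-map dialect (element and attribute names are assumptions, not the original DSL notation) and shows the parser's order of work: create channels, build the routing templates, then deliver a message through a content-based router.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of the XML messaging-map DSL: channels are the
# graph arcs, templates the nodes; all names here are illustrative only.
MAP_XML = """
<map>
  <channel name="in"  type="peer-to-peer"/>
  <channel name="hq"  type="peer-to-peer"/>
  <channel name="err" type="error"/>
  <template kind="content-router" input="in">
    <route when="order" to="hq"/>
    <route to="err"/>
  </template>
</map>
"""

def parse_map(xml_text):
    """Parse the map: create channel objects first, then the
    message-processing (template) objects, mirroring the parser order."""
    root = ET.fromstring(xml_text)
    channels = {c.get("name"): [] for c in root.iter("channel")}
    routers = [(t.get("input"),
                [(r.get("when"), r.get("to")) for r in t.iter("route")])
               for t in root.iter("template")]
    return channels, routers

def deliver(channels, routers, source, message):
    """Content-based routing: the first matching rule wins; a rule
    with no 'when' condition acts as the default (error) route."""
    for inp, rules in routers:
        if inp != source:
            continue
        for when, to in rules:
            if when is None or when in message:
                channels[to].append(message)
                return to

channels, routers = parse_map(MAP_XML)
print(deliver(channels, routers, "in", "order #42"))  # hq
print(deliver(channels, routers, "in", "ping"))       # err
```

The real system additionally wires error channels and domain-specific templates; the point here is only the map-to-configuration translation step.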
The resulting DSL-based system configuration was functionally identical to the initial, C#-based one.
DSL-based refactoring resulted in an enterprise trade
management system with transparent configuration and a
standard object-based model. The DSL developed solved
the problem of messaging management. Since changes are chiefly localized within the transfer configuration, change management has been dramatically simplified.
The DSL-based methodology conquered complexity,
made the proprietary system an open, scalable, and
maintainable solution. The approach is easily customized
to fit a broad class of similar proprietary systems.
8. The Air Transportation Planning System for Russian Central Transportation Agency
The air traffic planning system is a work-in-progress area. The problem is to develop remote access to the planning data. An operating solution currently exists. However, it is based on the outdated TAXXI-Baikonur technology, which has not evolved since the early 2000s. The technology involves component-based visual assembly of the server application. Ready-made VCL library components from Borland had been integrated with proprietary TAXXI components. The client side is an XML browser, i.e., a “thin” client. The TAXXI technology is limited to the Microsoft Windows framework, which is the only possible basis for both client- and server-side applications. According to the State Program of Planning System Updates, the Main Air Traffic Management Centre is going to create a new remote access solution. The internet-based architecture is to be implemented in Java technology and to operate on the Apache web server platform. The solution is to query the Oracle-based data centre, process the query output, and deliver the planned air traffic capacities to an intuitive, user-friendly GUI.
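The planned query-process-present flow can be sketched as three small stages. The target implementation is stated to be Java on Apache; the Python below is only a structural sketch, and the function names, airport codes, and data shape are all assumptions for illustration.

```python
def query_data_centre():
    """Stand-in for a query to the Oracle-based data centre,
    returning (airport, slot, planned capacity) rows."""
    return [("UUEE", "10:00", 42),
            ("UUDD", "10:00", 55),
            ("UUEE", "11:00", 38)]

def process(rows):
    """Aggregate planned capacities per airport."""
    totals = {}
    for airport, _slot, capacity in rows:
        totals[airport] = totals.get(airport, 0) + capacity
    return totals

def render(totals):
    """Produce the lines a GUI layer would display."""
    return [f"{a}: planned capacity {c}" for a, c in sorted(totals.items())]

print(render(process(query_data_centre())))
```

The separation mirrors the stated architecture: a data-centre query layer, server-side processing, and a thin presentation layer.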
The practical application of the solution is a global, enterprise-scale integrated system providing uniform and equal information access to all international air traffic participants. Similar globalization processes are underway in Europe and the U.S.A. The suggested pattern-based, component-wise approach is going to unify the architecture-level update and application migration issues in Russia. The methodology will also simplify the integration challenges of the global air traffic management software solution.
9. 6D-modeling for nuclear power plants
Another challenging aspect of the methodology
implementation is related to high-level template-based
software re-engineering for nuclear power plants (NPP).
To remain competitive worldwide in nuclear power plant production, it is necessary to meet quality standards throughout the lifecycle, ensure high security under long-term operation, and reduce the terms and costs of developing new-generation facilities. These conditions can be satisfied only under a systematic approach, which combines state-of-the-art production potential, advanced control methods, and software engineering tools. Each stage of the NPP lifecycle is mapped into a set of business processes, where people and ESSs interact. By identifying operation sequences, the systems form business process automation standards. For example, workflow mechanisms can assist in building enterprise standards for electronic document validation and approval. Throughout the NPP lifecycle, the enterprise systems acquire information about the plant. Finally, each of the enterprise systems reveals certain NPP aspects: design, technology, economics, etc. Thus, viewing various aspects, the systems together describe the NPP as one huge object. The heterogeneous nature of the data objects, and the millions of units involved, make the NPP an information object of high complexity.
A major competitiveness criterion in the nuclear power industry is a set of electronic manuals, which help to assemble, troubleshoot, and repair the NPP. Such a manual set provides transparent information models of the NPP and its units, which allow obtaining information on the object without directly contacting it. Such a versatile description, combined into a single data model, is often referred to as a 6D model, which adds time and resources for operating the plant to the 3D geometry. Since mechanisms for
information searching, scaling, filtering, and linking should provide complete and non-contradictory results, the information models must have well-defined semantics. Unique data entry assumes that the enterprise systems acquire the information model data throughout the lifecycle. While a single information model can be derived from a single system, the 6D model should combine the information models of a number of systems. The 6D model methodology suggests portal-based system integration, which can rest on a “platform” capable of supporting the entire lifecycle.
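How per-system information models might be merged into one 6D unit record (3D geometry plus time plus resources) can be sketched as follows. The field names, unit identifier, and source systems below are assumptions for illustration, not the actual NPP data model.

```python
# Hypothetical merge of per-system information models into a single
# "6D" unit record; every name and value here is illustrative.
def merge_6d(unit_id, system_models):
    """Combine the aspects each enterprise system knows about one unit."""
    unit = {"id": unit_id}
    for aspect, model in system_models.items():
        if unit_id in model:
            unit[aspect] = model[unit_id]
    return unit

cad      = {"pump-101": {"geometry": (1.2, 0.8, 2.1)}}  # 3D design data
schedule = {"pump-101": {"install": "2014-03"}}          # time aspect
erp      = {"pump-101": {"cost": 18000, "crew": 3}}      # resource aspect

pump = merge_6d("pump-101", {"design": cad, "time": schedule,
                             "resources": erp})
print(sorted(pump))  # ['design', 'id', 'resources', 'time']
```

Well-defined semantics, as the text requires, would mean that each aspect key and its schema are agreed across systems before merging, so that searches over the combined model stay complete and non-contradictory.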
Further information model development assumes monitoring system state changes and their influence on other parts of the system. This helps to react immediately to critical issues in NPP construction and supports decision making; otherwise, wrong decisions could be made under incomplete or incorrect information. Among the major nuclear industry initiatives is the concept of a typical optimized nuclear reactor. The idea is to select typical invariant units for rapid “template-based” development of a set of slightly varying versions. Applying the suggested methodology to the 6D information model of the reactor is a promising approach to pattern-based, component-wise development of NPP series.
11. Conclusion
Implementation of the suggested approach allowed developing a unified ESS, which integrates a number of heterogeneous components: state-of-the-art Oracle-based ERP modules for financial planning and management, a legacy HR management system, and a weakly structured multimedia archive. The internet and intranet portals that manage the heterogeneous ESS warehouse content have provided a number of successful implementations in the diversified ITERA International Group of Companies, which has around 10,000 employees in nearly 150 companies across over 20 countries. The
systematic approach to ESS framework development
provides integration with a wide range of state-of-the-art
CASE tools and ESS development standards.
Other implementations and work-in-progress areas
include: air transportation planning system, messaging
system for a trading enterprise, a nuclear power plant and
banking solutions. Each of the implementations is a
domain-specific one, so the system cloning process is not
straightforward, and it requires certain analytical and
CASE re-engineering efforts. However, in most cases the approach reveals patterns for building similar implementations in series, which results in a substantial term-and-cost reduction of 30% or more. The series approach can be applied both to subsidiaries, as has been done in ITERA and is being done in Renaissance, and to different enterprises, as in the case of the other clients.
References
[1] Barendregt H.: The Lambda Calculus (rev. ed.), Studies in Logic, Vol. 103, North Holland, Amsterdam, 1984.
[2] Birnbaum L., Forbus K., Wagner E. et al.: Combining Analogy, Intelligent Information Retrieval, and Knowledge Integration for Analysis: A Preliminary Report. In: ICIA 2005, McLean, Virginia, USA, 2005.
[3] Cook S., Jones G., Kent S., Wills A.C.: Domain-Specific Development with Visual Studio DSL Tools, Pearson Education, Inc., 2008, 524 pp.
[4] Curry H., Feys R.: Combinatory Logic, Vol. 1, North Holland, Amsterdam, 1958.
[5] Evans E.: Domain-Driven Design: Tackling Complexity in the Heart of Software. Addison Wesley, 2003, 560 pp.
[6] Fowler M.: Analysis Patterns: Reusable Object Models, Addison Wesley, 1997, 223 pp.
[7] Guha R., Lenat D.: Building Large Knowledge-Based Systems: Representation and Inference in the Cyc Project. Addison-Wesley, 1990.
[8] Güngördü Z., Masters J.: Structured Knowledge Source Integration: A Progress Report. In: IKIMS 2003, Cambridge, MA, USA, 2003.
[9] Hohpe G., Woolf B.: Enterprise Integration Patterns: Designing, Building, and Deploying Messaging Solutions. Addison Wesley, 2003, 736 pp.
[10] Kalinichenko L., Stupnikov S.: Heterogeneous Information Model Unification as a Pre-requisite to Resource Schema Mapping. In: ITAIS 2009, Springer, 2009, pp. 373-380.
[11] Kanazawa S., Fujiwara M. et al.: R&D Trends for Future Networks in the USA, the EU, and Japan. NTT Technical Review, Vol. 7, No. 5, May 2009, pp. 1-6.
[12] Lenat D., Reed S.: Mapping Ontologies into Cyc. In: AAAI CWOSW 2002, Edmonton, Canada, 2002.
[13] Panton K., Reed S., et al.: Automated OWL Annotation Assisted by a Large Knowledge Base. In: ISWC 2004, Hiroshima, Japan, 2004, pp. 71-80.
[14] Roussopulos N.: A Semantic Network Model of Databases. Toronto Univ., 1976.
[15] Scott D.: Lectures on a Mathematical Theory of Computation. Oxford Computing Laboratory Technical Monograph PRG-19, 1981, 148 pp.
[16] Sushkov N., Zykov S.: Message System Refactoring Using DSL. In: CEE-SECR’09, Moscow, Russia, 2009, pp. 153-158.
[17] Wolfengagen V.: Event Driven Objects. In: CSIT’99, Moscow, Russia, 1999, pp. 88-96.
[18] Zykov S.: Concept Modeller: A Frame-Based Toolkit for Modeling Complex Software Applications. In: IMCIC 2010, Orlando, FL, USA, pp. 468-473.
[19] Zykov S.: Pattern-Based Development of Enterprise Systems: from Conceptual Framework to Series of Implementations. In: ICEIS 2011, Beijing, China, 2011, SciTePress, Vol. 4, pp. 475-478.
[20] Zykov S.: Pattern Development Technology for Heterogeneous Enterprise Software Systems. Journal of Communication and Computer, Vol. 7, No. 4, David Publishing Co., 2010, pp. 56-61.