Data Warehousing in Fisheries: A Case Study of The National
Fisheries Resources Research Institute (FIRRI)
By
Onyango Gerald
B.Sc. (Mak), M.Sc. (Mak).
Department of Information Systems,
Faculty of Computing and Information Technology, Makerere University.
Email: [email protected] / Phone: +256782523740
A Project Report Submitted to School of Graduate Studies in
Partial Fulfillment of the Requirements for the Award of the Degree of
Master of Science in Computer Science of Makerere University
OPTION: Management Information Systems
Supervisor: Dr. Ogao Patrick
Department of Information Systems,
Faculty of Computing and Information Technology, Makerere University.
Email: [email protected], Phone: +256-41-540628, Fax: +256-41-540620
October 2006
Declaration
I, Onyango Gerald, do hereby declare that this Project Report is original and has not been published
and/or submitted for any other degree award to any other University before.
Signed: .......................................................... Date: ...........................................
Gerald Onyango,
B.Sc., M.Sc.
Approval:
This Project Report has been submitted for examination with my approval as the supervisor.
Signed: .......................................................... Date: ...........................................
Dr. Patrick Ogao, Ph.D.
Department of Information Systems,
Faculty of Computing and Information Technology.
Dedication
To Dad and Mum who made it possible.
“Good things in life do not come by easily”
Acknowledgments
There are a number of people who made all this possible. Thanks be to God for the strength and wis-
dom he gave me throughout the study, and to various people who assisted me in one way or another
that enabled me see the fruits of this project.
My sincere appreciation goes to my supervisor, Dr. Patrick Ogao, without whose help this work would
not be as it is.
Without the support of my parents and siblings, this study would not have been nurtured to fruition.
I also acknowledge all my friends and classmates for having made my academics at Makerere joyous
and fruitful. Special thanks go to my coursemates Mr. Ssemwogerere Tom and Mr. Ndyanabo Antony
who provided me with the core software I used in this project.
MAY THE ALMIGHTY GOD BLESS YOU ALL ABUNDANTLY
Abstract
Information on the status and trends in fisheries is key to sustainable exploitation and management of
fisheries resources. In Uganda, the organisation charged with research and dissemination of fisheries
information is The National Fisheries Resources Research Institute (FIRRI). FIRRI packages this
information in brochures, posters, videos, and press releases. However, preparing this information
within FIRRI is an uphill task because most of it is scattered across files and other storage media
held by the Institute's different research disciplines. This called for a system that can centralise the
storage and dissemination of the information generated within FIRRI. A Data Warehousing system
was chosen to remedy the situation.
Work on the development of the Data Warehousing System commenced with the development of a
Data Mart for one of the research disciplines. A Data mart for the Fish Biology and Ecology disci-
pline was designed and developed using Microsoft SQL Server 2005. SQL Server Integration Services
(SSIS) was used to develop the Extract, Transform, and Load (ETL) tools, while SQL Server Analysis
Services (SSAS) was used to develop the dimensional data cubes. Microsoft Excel, fitted with the
Cube Analysis add-in, was chosen as the end-user interface. Validation proved the system to be
functioning as required. The results of the study show that it is possible to centralise information
storage and retrieval in fisheries using a data warehouse, and provide evidence that centralised data
storage, information retrieval, and reporting in FIRRI are both possible and attainable.
Time constraints did not allow for the development of a fully fledged Data Warehouse, complete with
a web interface. Since it was only possible to build a data mart for one of the disciplines within FIRRI,
it is proposed that future work comprise the development of an enterprise-wide Data Warehouse that
can also be accessed through the World Wide Web.
Contents
1 Introduction
1.1 Background to The Study
1.1.1 The Case for a Data Warehouse (DW) and Data Mining
1.2 Problem Statement
1.3 Objectives
1.3.1 General Objective
1.3.2 Specific Objectives
1.4 Justification of the study
1.5 Scope
2 Literature Review
2.1 Introduction
2.2 The National Fisheries Resources Research Institute (FIRRI)
2.3 Data Warehousing
2.3.1 The Data Warehousing Concept
2.3.2 Data Warehouse Design Model
2.3.3 Data Warehouse Structure and Tools
2.3.4 Data Mart
2.3.5 Current Approaches to Data Warehouse (DW) Development
2.3.6 Analysis of the Current Approaches to Data Warehouse Development
2.4 Data Mining
2.5 Some Fisheries Data Warehousing Projects
3 Methodology
3.1 Introduction
3.2 System Study and Analysis
3.3 System Design
3.4 System Development
3.5 System Validation
4 Implementation
4.1 Introduction
4.2 System Analysis
4.2.1 Fisheries Data
4.2.2 Usage of FIRRI's Information System
4.2.3 Functional Requirements
4.2.4 Non-functional Requirements
4.2.5 User Requirements
4.2.6 System Requirements
4.3 System Design
4.3.1 Logical Models
4.3.2 Facts
4.3.3 Dimensions
4.4 System Development
4.4.1 Database Development
4.4.2 Data Extraction, Transformation, and Load (ETL)
4.4.3 Analysis Cubes
4.4.4 Enduser Application
4.4.5 System Validation
4.5 Conclusions, Limitations, and Future Work
4.5.1 Conclusions
4.5.2 Limitations
4.5.3 Future Work
List of Figures
4.1 Warehouse Architecture for the FIRRI Fisheries Data Warehouse
4.2 Fish Catch Model
4.3 Fish Prey Model
4.4 Fish Biology Model
4.5 Fish Gonad Model
4.6 Fish Catch-Length Model
4.7 Fish Catch Fact
4.8 Fish Catch-Length Fact
4.9 Fish Biology Fact
4.10 Fish Prey Fact
4.11 Fish Gonad Fact
4.12 Date Dimension
4.13 Water Body Dimension
4.14 Catch Type Dimension
4.15 Species Dimension
4.16 Fishing Gear Dimension
4.17 Sex and Maturity Dimension
4.18 Prey Type Dimension
4.19 Length Dimension
4.20 Excel Source Adapter Extraction
4.21 Flatfile Source Adapter
4.22 Unique Identifiers for Rows of Data
4.23 Dimension and Fact Table Load
4.24 Foreach Loop Containers
4.25 Centralised Running / Execution of Packages
4.26 Execute SQL Task Editor
4.27 Data Cleaning and Surrogate Key Generation Control Flow
4.28 Cleaning the Dimension-Data Flow and Generating Surrogate Keys
4.29 Fuzzy Lookup
4.30 Correcting Spelling Mistakes and Adding Missing Data Entries
4.31 Sorting The Data Flows
4.32 Surrogate Key Generation
4.33 Inner Joining Two Data Flows
4.34 Mapping Source Data to The Destination Table
4.35 Fact-Data Cleaning and Transformation Data Flow Task
4.36 Surrogate Key Assignment
4.37 Loading Data into the Warehouse
4.38 Example of Warehouse Dimension Table Data Flow Task
4.39 Example of Warehouse Fact Table Data Flow Task
4.40 Structure of Analysis Data Cube
4.41 Length Frequency Distribution of Oreochromis niloticus
4.42 Maximum Weight of Selected Fish Species Across 4 Quarters
4.43 Check of Rows Written to the Data Warehouse
4.44 System Validation
4.45 Rows Written During Data Load
List of Acronyms
1. CMR - CSIRO Marine Research
2. DW - Data Warehouse
3. EDW - Enterprise Data Warehouse
4. FAO - Food and Agriculture Organization
5. FIRRI - Fisheries Resources Research Institute
6. GUI - Graphical User Interface
7. IT - Information Technology
8. MOLAP - Multidimensional Online Analytical Processing
9. NARS - National Agricultural Research System
10. ODBC - Open Database Connectivity
11. OLAP - Online Analytical Processing
12. PARI - Public Agricultural Research Institute
13. SSAS - SQL Server Analysis Services
14. SSIS - SQL Server Integration Services
Chapter 1
Introduction
1.1 Background to The Study
Fisheries is the industry or occupation devoted to the catching, processing, or selling of fish, shellfish,
or other aquatic animals (The Free Dictionary, 2006) [40]. The fisheries and aquaculture sector is ex-
tremely important in terms of food security, revenue generation and employment (Sugiyama, 2005)
[39]. Sugiyama (2005) [39] noted that catching or farming aquatic resources makes an integral
contribution to rural livelihoods in many parts of the Pacific region. Although fisheries resources are
renewable, they can be depleted through unsustainable exploitation. It is therefore important to ensure
that there is guided development and management of this asset so that it can continue contributing to
the livelihood of the people who depend on it.
Sugiyama (2005) [39] argues that knowledge of the status and trends of fisheries, including
socio-economic information on fishing communities, is key to using aquatic resources in a sustainable
way. Adequate, timely, and reliable fisheries data and information provide a basis for sound policy
development, better decision-making, and responsible fisheries management. This information is
required at the national level for maintaining food security, for describing the social and economic
benefits of fisheries, for assessing the validity of fisheries policy, and for tracking the performance of
fisheries management. Sugiyama (2005) [39] also observed an increasing need for fisheries information
outside the government sector. Consequently, information is a priority for the sustainable exploitation
and management of fish stocks (FIRRI, 2003) [11].
In Uganda, the national institution mandated to undertake, promote and streamline fisheries research
and to ensure dissemination and application of research results, is The National Fisheries Resources
Research Institute (FIRRI) (FIRRI, 2000; FIRRI, 2001; FIRRI, 2002; FIRRI, 2003; FIRRI, 2004;
FIRRI, 2005; FIRRI, 2006a) [8], [8], [9], [10], [11], [12], [13]. FIRRI contributes to the fisheries sub-sector
developmental objective by providing information to guide sustainable management of capture fisheries
resources and development of aquaculture (FIRRI, 2003) [11]. Therefore, the final products of FIRRI’s
outputs are Technical Guidelines containing technologies, methods and advice to guide development
and management of the fisheries of different aquatic systems, and development of aquaculture. The in-
formation packages are produced in the form of books, booklets, fact sheets, brochures, posters, video
films and press releases to service providers and resource users. FIRRI disseminates this information to
fishing communities and other end-users through community barazas, workshops, radio and TV shows
(FIRRI, 2006) [14].
The information system within FIRRI was originally manual and paper based. With the advent of com-
puters, different functional areas within the institute developed their own file management systems.
This independent keeping of files by the individual functional areas created data redundancy and incon-
sistency, program-data dependence, inflexibility, poor security, and lack of data sharing and availability.
Inmon (1993) [18] argues that factors such as the same data being present on different systems in
different departments; difficulty in getting timely, meaningful information; multiple systems giving
different answers to the same business questions; and limited analysis by decision makers and policy
planners, due to the non-availability of sophisticated tools and of easily decipherable, timely and
comprehensive information, call for a data warehouse.
Having noted that lack of effective and timely information from research to fishing communities and
other stakeholders is a major constraint to sustainable fish production and utilisation, FIRRI is devoted
to the development of a Fisheries Database and Information centre (FIRRI, 2006) [14]. FIRRI hopes
that the development of a Fisheries Database and Information centre will facilitate timely acquisition
and exchange of information on all water bodies in the country and also create a central station from
which this information can be obtained. However, Mahadik (2002) [26] claims that as the quantities
of information and data handled by organisations increase, traditional means of analysing the data,
such as reports and query tools, prove inadequate. Mahadik (2002) [26] believes that powerful system
navigation and information exploration tools that use hypermedia, dynamic visual querying and tree
maps should be availed. He asserts that organisations should ensure that employees are free to
communicate with each other and share data and information across the organisation, that data
dictionaries are created and regulated, and that online data is reformatted before being inserted into
company-wide databases. Mahadik (2002) [26] claims that the latest development in analytical tools
that enables organisations to find meaning in their data is data mining.
Therefore, to enhance the availability of information in the Ugandan fisheries sector, there is need to
enhance the processing efficiency of the data analysed in FIRRI, and also enhance the dissemination
capacity. The optimal solution to this problem would be to build a data warehouse in FIRRI and add
data mining tools to the data warehouse to improve on data analysis, and information dissemination,
efficiency.
1.1.1 The Case for a Data Warehouse (DW) and Data Mining
A data warehouse is a subject-oriented, integrated, time-variant, nonvolatile database or repository of
information collected from many different sources and centrally stored, usually in a single location.
Information from multiple sources in different locations, applications or files, be it in different operating
units or departments, can be standardised and stored in a single repository. This consolidation of the
data store eliminates the reconciliation of inconsistent data, avoids lengthy ad hoc manipulation of
data from different sources, and improves data quality. Data can be retrieved in a matter of minutes.
In a data warehousing system, users can create most of their own queries and reports themselves. A
user identifies the information they want, makes a request (a query) to the data warehouse, and the
data or information stored in the warehouse is delivered to them. Tools such as Online Analytical
Processing (OLAP) and data mining improve end-user analysis capabilities and shrink the time
between the occurrence of an event and the subsequent alerting of managers.
A data warehouse contains only "trusted" data, that is, data that has been cleaned. This guarantees the
accuracy and reliability of the data and information in and from a warehouse. Historical data is also
stored within the data warehouse, and this information can be used to carry out trend analysis and
"what-if" analyses.
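The kind of end-user trend query described in this section can be sketched in a few lines. The sketch below is illustrative only: the fact rows and column names (species, year, quarter, catch weight) are hypothetical stand-ins, not FIRRI's actual warehouse schema.

```python
from collections import defaultdict

# Hypothetical cleaned fact rows as they might sit in a warehouse:
# (species, year, quarter, catch_weight_kg). Names are illustrative only.
fact_rows = [
    ("Oreochromis niloticus", 2004, 1, 120.5),
    ("Oreochromis niloticus", 2004, 2, 98.0),
    ("Lates niloticus",       2004, 1, 310.2),
    ("Oreochromis niloticus", 2005, 1, 150.75),
    ("Lates niloticus",       2005, 1, 280.0),
]

def yearly_trend(rows, species):
    """Ad hoc trend query: total catch weight per year for one species."""
    totals = defaultdict(float)
    for sp, year, _quarter, weight in rows:
        if sp == species:
            totals[year] += weight
    return dict(sorted(totals.items()))

print(yearly_trend(fact_rows, "Oreochromis niloticus"))
# {2004: 218.5, 2005: 150.75}
```

Because the warehouse keeps history, the same query supports a "what-if" style question simply by filtering on a different period or species.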
1.2 Problem Statement
FIRRI scientists are required to produce field reports after every field trip. The institute is also required
to come up with quarterly reports, and an annual report, detailing the activities performed within the
period, as well as packaged information for the stakeholders in the fisheries sector. Under the current
setup, in which information and data are scattered among different functional areas, integration of data,
compilation of reports and packaging of information for stakeholders is an uphill task. Dissemination
of information through community barazas, workshops, radio and TV shows does not enable real-time
provision of information, as one has to wait until such an event is organised before one can get access
to the information. In addition, the current information system cannot handle complicated ad hoc
enquiries such as cross-tabulation.
1.3 Objectives
1.3.1 General Objective
To develop a Data Warehousing information system that supports fisheries data and information storage,
and retrieval, from a centralised location.
1.3.2 Specific Objectives
i. To review work similar to, and literature related to, Data Warehousing in fisheries
ii. To design a Data Warehousing system for centralised storage and retrieval of fisheries data
iii. To implement a Data Warehousing system for centralised storage and retrieval of fisheries data
iv. To validate the fisheries Data Warehousing system developed
1.4 Justification of the study
A data warehouse system will provide a centralised location for data and information storage and re-
trieval and a range of ad hoc and standardised query tools, analytical tools, and graphical reporting
facilities for data mining. These tools will perform high-level analyses of hidden patterns, relation-
ships, or trends, and will drill into more detail where needed. The patterns inferred from the data could
be used to predict future behaviour and guide decision-making. The data warehouse will increase
information availability and the efficiency, scope and accuracy of scientific reporting, and provide
new opportunities for reaching out and passing on information to the fisher community via the
Internet. Sugiyama (2005) [39] believes that with more accurate and timely information at the community
level, the public is likely to be better informed and supportive of efforts to manage fisheries and aquatic
resources in a responsible manner. She claims that disseminating timely and readily understandable in-
formation on the status and trends of fisheries should help ensure transparency in fisheries management,
as called for by the Code of Conduct for Responsible Fisheries.
1.5 Scope
Conceptually, the study will focus on the design, development, and implementation of a data warehouse
and data mining system that can enhance data analysis and information dissemination from FIRRI.
Geographically, the study will focus on the Fisheries Resources Research Institute.
Chapter 2
Literature Review
2.1 Introduction
This chapter reviews the works of various writers that are deemed relevant to the study.
2.2 The National Fisheries Resources Research Institute (FIRRI)
Established in 1947, the National Fisheries Resources Research Institute (FIRRI) is a semi-autonomous
Public Agricultural Research Institute (PARI) of Uganda operating under the National Agricultural
Research System (NARS) (FIRRI, 2006) [14]. As the fisheries research arm of NARO, FIRRI's
research currently focuses on providing information for increasing and sustaining fish production and
utilisation (FIRRI, 2004; FIRRI, 2006 [14]).
FIRRI has its headquarters in Jinja and an outstation at Kajjansi, where the scientific work is organised
according to disciplines such as: stock assessment (fish biomass, exploitation rates, etc.); fish biology
and ecology (biodiversity and conservation); fish habitat quality and quantity, distribution, food webs,
physico-chemical characteristics and primary production (water quality); invertebrate studies and food
webs; wetlands; aquatic weeds such as water hyacinth; socio-economics (livelihood analysis and
co-management); and aquaculture/fish farming (seed production, feeds (live and commercial feeds),
and pond management/commercial aquaculture) (FIRRI, 2003 [11]; FIRRI, 2006 [14]).
2.3 Data Warehousing
2.3.1 The Data Warehousing Concept
Data warehousing is the process of collecting data to be stored in a managed database in which the data
are subject-oriented and integrated, time variant, and nonvolatile for the support of decision making
(Chan, 1999) [3]. Data from the different operations of a corporation are reconciled and stored in a
central repository (a data warehouse) from where analysts extract information that enables better deci-
sion making (Cho and Ngai, 2003) [4]. Data can then be aggregated or parsed, and sliced and diced as
needed in order to provide information (Fox, 2004) [15].
According to Inmon (1993) [18], a Data Warehouse is a subject-oriented, integrated, time-variant,
nonvolatile collection of data used in support of decision-making processes. "Subject-oriented" means
that a data warehouse focuses on the high-level entities of the business (Chan, 1999) [3] and that the
data are organised according to subject (Zeng et al., 2003 [43]; Ma et al., 2000 [24]).
For example, fisheries data would be organised by fish species, water body, or type of fishing gear.
"Integrated" means that the data are stored in consistent formats, naming conventions, measurement
of variables, encoding structures, physical attributes, and domain constraints (Ma et al., 2000 [24];
Chan, 1999 [3]; O'Leary, 1999 [30]). For example, whereas an organisation may have four or
five unique coding schemes for ethnicity, in a data warehouse there is only one coding scheme (Chan,
1999) [3].
"Time-variant" means that warehouses provide access to a greater volume of more detailed information
over a longer period (Zeng et al., 2003) [43] and that the data are associated with a point in time
(Chan, 1999 [3]; O'Leary, 1999 [30]), such as a month, quarter, or year. Warehouse data are
"nonvolatile" in that data that enter the database are rarely, if ever, changed once they are entered into
the warehouse (Zeng et al., 2003 [43]; Chan, 1999 [3]). The data in the warehouse are read-only;
updates or refreshes of the data occur on a periodic, incremental, or full-refresh basis (Zeng et al.,
2003) [43].
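The "integrated" property can be sketched as a small transformation step: inconsistent source-system codes are mapped to the single coding scheme used in the warehouse. The gear codes and names below are hypothetical examples, not FIRRI's actual coding schemes.

```python
# Hypothetical source-system codes mapped to one canonical warehouse scheme.
GEAR_CODES = {
    "GN": "gillnet",      "gill net": "gillnet",        # source systems A and B
    "BS": "beach_seine",  "beach-seine": "beach_seine",
    "LL": "longline",     "long line": "longline",
}

def integrate_gear(raw_code):
    """Return the single warehouse code for a raw source value, or 'unknown'."""
    key = raw_code.strip()
    if key in GEAR_CODES:
        return GEAR_CODES[key]
    return GEAR_CODES.get(key.lower(), "unknown")

print(integrate_gear("GN"))        # gillnet
print(integrate_gear("Gill Net"))  # gillnet
print(integrate_gear("??"))        # unknown
```

In the report's implementation this role is played by SSIS transformations; the sketch only illustrates the idea of one coding scheme per concept.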
2.3.2 Data Warehouse Design Model
Data Warehouses typically use multidimensional and relational storage structures (Bose, 2006) [2],
whose models are developed using cube and star schemas (Velasquez et al., 2005) [42]. The
multidimensional structure physically stores the data in array-like structures that are similar to a data
cube. In the relational structure the data is stored in a relational database using the star and snowflake
schemas (Bose, 2006 [2]; O'Leary, 1999 [30]).
Bose (2006) [2] observed that summary data are modelled as a multidimensional data cube consisting
of measure and dimension attributes. With the support of OLAP for multidimensional analysis, users
can synthesise enterprise information through comparative customised viewing, and analyse historical
and projected data (Ma et al., 2000) [24]. Bose (2006) [2] noted that at the instance level the values
of the dimension attributes are assumed unique to determine the values of all measure attributes. He
affirms that a multidimensional data cube consisting of dimension and measure attributes is called a
fact table. In addition, a multidimensional data cube contains a dimension table for each dimension
attribute in the star schema (Bose, 2006) [2]. O’Leary (1999) [30] observes that at the centre of the
star is the event table (the fact table), and surrounding the event, at the points of the star, are dimension
tables containing the resources, time, and location dimensions.
O’Leary (1999) [30] noted that fact tables hold particular measures of the event, and include foreign
key references to dimension tables at each of the points on the star. He also observed that the par-
ticular process being modelled influences which resources, events, agents or locations are included in
the dimensions and the number of tables used to represent each. O’Leary (1999) [30] observed that
dimension tables describe the properties of the dimensions at hand, and are kept on each dimension that
decision makers would like to either rollup or drill down. He noted that in some situations there is a
need to generate additional tables from some of the dimensions, resulting in a snowflake schema.
The snowflake schema is a star schema whose dimension tables are normalised (Bose, 2006) [2], and
whose dimensions have embedded foreign keys so that dimension tables have relationships with other
dimension tables, creating tables for attributes within a dimension table (O’Leary, 1999) [30]. A DW
design is often built around a time dimension so that the DW contains data over several periods of
time. This feature allows users to perform extensive yearly, quarterly, and monthly analyses that help
enable the identification of patterns and trends (Theodoratos and Sellis, 1999) [41]. O’Leary (1999)
[30] noted that use of the star or snowflake schemas is aimed at limiting access and query problems in
a data warehouse environment.
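The star schema just described can be sketched with one fact table and two surrounding dimension tables. The table and column names below are assumed for illustration and are not the report's actual FIRRI design.

```python
# Minimal star-schema sketch: a fact table holds measures plus foreign keys
# (surrogate keys) into dimension tables. All names are hypothetical.
date_dim = {          # surrogate key -> (year, quarter)
    1: (2005, 1), 2: (2005, 2),
}
species_dim = {       # surrogate key -> species name
    10: "Oreochromis niloticus", 11: "Lates niloticus",
}
catch_fact = [        # (date_key, species_key, catch_weight_kg)
    (1, 10, 120.0), (1, 11, 300.0), (2, 10, 95.5),
]

def rollup_by_quarter(species_name):
    """Join the fact to its dimensions and roll catch weight up to quarter."""
    out = {}
    for date_key, species_key, weight in catch_fact:
        if species_dim[species_key] == species_name:
            quarter = date_dim[date_key]          # (year, quarter) grain
            out[quarter] = out.get(quarter, 0.0) + weight
    return out

print(rollup_by_quarter("Oreochromis niloticus"))
# {(2005, 1): 120.0, (2005, 2): 95.5}
```

Surrounding the fact with a date dimension is what enables the yearly and quarterly analyses mentioned above: rolling up simply means grouping by a coarser attribute of that dimension.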
2.3.3 Data Warehouse Structure and Tools
Ma et al. (2000) [24] observed that the data warehouse has a distinct structure whose components
include current detail data, older detail data, lightly summarised data, highly summarised data, and
meta-data. The current detail data reflect the most recent happenings, are stored on disk, and are
accessed by end-user analysts (Ma et al., 2000) [24]. Meta-data are data that describe the meaning
and structure of business data, as well as how they are created, accessed, and used (Devlin, 1997) [6].
Meta-data describe what is in the data warehouse, specify what comes into and goes out of it, schedule
extracts based on a business-events schedule, document and monitor data synchronisation requirements,
and measure data quality (Ma et al., 2000) [24].
Chan (1999) [3] observed that Internet-based Decision Support Systems (DSS) and Executive
Information Systems (EIS) can be built on data warehouses to support distributed decision processes.
Web-based multidimensional on-line analytical processing (MOLAP) systems enable users to view
summary data, zooming in on details by column, by row, or by cell displayed on multi-layer
spreadsheets (Chan, 1999) [3]. According to Chan (1999) [3], this "slice and dice" capability enables
users to examine data horizontally, and changes in aggregated performance data can be traced back
to unit-level productivity. He observed that in a networked environment, this means that decision
makers can link forecasting with operational data in a dynamic manner.
According to Pipe (1997) [32], a warehousing system has: design tools to design warehouse databases;
source data acquisition tools to capture data from source tables and databases, and clean, enhance, trans-
port and apply it to data warehouse databases; a data manager to manage and access warehouse data;
graphic user interface (GUI) and Web-based data access tools to provide end-users with tools they need
to access and analyse warehouse data; a delivery manager to distribute warehouse data and other infor-
mation objects to other data warehouses, desktop applications, and Web servers on a corporate Intranet;
middleware to connect data access tools to warehouse databases, and the delivery manager to target
systems; an information directory to provide administrators and business users with information about
the contents and meaning of data stored in warehouse databases; and warehouse management tools to
administer data warehouse operations. GUIs use multimedia to enhance the impact of information and decision-making support generated through data warehousing (Ma et al., 2000) [24].
2.3.4 Data Mart
A data mart is a subset of the enterprise-wide data warehouse (O'Leary, 1999 [30]; Poe et al., 1998 [33]; Singh, 1998 [38]). Unlike the data warehouse, which is traditionally meant to address the needs of the organisation from an enterprise perspective, a data mart has a limited scope and performs the role of a departmental, regional or functional data warehouse (Bose, 2006 [2]; Singh, 1998 [38]; Poe et al., 1998 [33]). According to Bose (2006) [2], the difference between an enterprise DW and a data mart is
essentially a matter of scope. Because data marts are developed for specific business purposes, system
design, implementation, testing and installation are less costly than for data warehouses (O’Leary,
1999) [30]. O’Leary (1999) [30] observed that where data warehouses can take years to develop, data
marts can be developed in a few months, at a much smaller cost. A data mart often uses aggregation or
summarisation of the data to enhance query performance.
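The effect of such summarisation on query work can be sketched with a small illustrative example; the table and column names below are invented for illustration and are not part of FIRRI's design. End-user queries scan the small pre-computed summary rather than the detail table:

```python
import sqlite3

# Hypothetical detail table and pre-aggregated summary, illustrating how
# a data mart can summarise detail records to speed up common queries.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Detail-level catch records (illustrative columns only).
cur.execute("CREATE TABLE catch_detail (year INTEGER, species TEXT, weight_kg REAL)")
cur.executemany(
    "INSERT INTO catch_detail VALUES (?, ?, ?)",
    [(2004, "Nile perch", 12.5), (2004, "Nile perch", 8.0),
     (2004, "Tilapia", 3.2), (2005, "Tilapia", 4.1)],
)

# The data mart stores a pre-computed summary, so end-user queries
# scan far fewer rows than the detail table.
cur.execute(
    """CREATE TABLE catch_summary AS
       SELECT year, species, SUM(weight_kg) AS total_kg, COUNT(*) AS samples
       FROM catch_detail GROUP BY year, species"""
)

for row in cur.execute("SELECT * FROM catch_summary ORDER BY year, species"):
    print(row)
```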
2.3.5 Current Approaches to Data Warehouse (DW) Development
A number of ways can be used to build a DW. An organisation may either build a single DW, or have a multi-tier data warehouse system (Bose, 2006) [2]. In the single data warehouse architecture there is one centralised data warehouse into which source systems feed their data directly, and from which the end users obtain data and/or information. In the multi-tier warehousing system, an enterprise data warehouse coexists with several data marts (Bose, 2006 [2]; Pipe, 1997 [32]). In this system one can have either an independent or a dependent data mart architecture (Zeng et al., 2003 [43]; Bose, 2006 [2]). In the independent data mart architecture, the source systems feed the data marts, and the warehouse is fed by the data marts (Bose, 2006 [2]; Zeng et al., 2003 [43]). According to Bose (2006) [2] and Zeng et al. (2003) [43], the dependent data mart architecture has a central data warehouse that contains the "corporate view of the data" and supplies the departmental data marts with the specific data they require.
Bose (2006) [2] observed that the variations on the multi-tier approach that have been implemented in
organisations are top-down, bottom-up and hybrid. With the top-down approach, data marts are seen
as follow-on to the construction of an Enterprise Data Warehouse (EDW) (Atkinson, 2001 [1]; Pipe,
1997 [32]). In this implementation approach, data flows from the source to enterprise warehouse to
data marts, and the implementation follows the waterfall approach (Pipe, 1997) [32].
The bottom-up approach is to first build data marts and then an EDW (Atkinson, 2001 [1]; Pipe, 1997 [32]). The enterprise warehouse evolves bottom-up as a new layer on top of existing data marts: the data marts are loaded directly from source systems, and the enterprise warehouse is loaded from the data marts (Bose, 2006) [2]. In this case the corporate data warehouse project begins with a small pilot
project for a specific subject area. Bose (2006) [2] affirms that in so doing, both a data mart and the
first data warehouse are created simultaneously.
Bose (2006) [2] and Pipe (1997) [32] noted that the hybrid approach, or parallel strategy, might include
elements of both the top-down and bottom-up approaches. They argue that in this approach the enter-
prise model is developed first and documented at a high level, so certain subject areas may be modelled
in more detail as warehouse development proceeds. In this approach, therefore, the data warehouse is
developed incrementally (Pipe, 1997) [32].
2.3.6 Analysis of the Current Approaches to Data Warehouse Development
Bose (2006) [2] says that the implementation of a single Data Warehouse establishes a single, reliable
source for data and provides a more integrated solution for reporting and decision support across functional areas. However, a single warehouse is not well suited to highly specialised data needs (Bose, 2006) [2]. Bose
(2006) [2] believes that the data mart solution, where a data warehouse coexists with data marts, may
be well suited for highly specialised data needs. Pipe (1997) [32] affirms that in the multi-tier ware-
house architecture involving an EDW and underlying data marts, data is located where it can deliver
the highest availability and performance, without sacrificing integrity or control over the management
of corporate data for business decision-making. Bose (2006) [2] and Pipe (1997) [32] claim that in the long run, a multi-tier warehouse architecture/system is the optimal one.
Both the top-down and bottom-up approaches to data warehouse development have their strengths and
weaknesses. The advantage of the top-down implementation approach is that it leads to a planned, integrated multi-tier solution, and improves the consistency of information in the data marts (Bose, 2006
[2]; Atkinson, 2001 [1]; Pipe, 1997 [32]). However, a top-down approach can create problems when
the data marts are added later and it cannot deliver solutions fast enough for an organisation to quickly
exploit new business opportunities (Atkinson, 2001) [1]. Bose (2006) [2] and Pipe (1997) [32] argue
that this approach usually takes more time and is relatively costly.
Bose (2006) [2] and Pipe (1997) [32] point out that the bottom-up approach gives quick results and a
high return on investment. However, if the spread of data marts in this approach is not controlled, there
can be integration problems between the data marts and the future EDW (Atkinson, 2001) [1]. Bose (2006) [2] believes that the bottom-up approach eventually yields a disintegrated warehouse because the
data marts often do not conform to a common model. Pipe (1997) [32] concludes that an ideal solution
to the top-down and bottom-up approaches would be a synergistic marriage of the two approaches
to maximise the strengths, and minimise the weaknesses, of each approach. Pipe (1997) [32] claims
this strategy supports incremental and evolutionary data warehouse development. Bose (2006) [2] advises that during development, too much must not be taken on at once, as this can leave users feeling abandoned and the development team overwhelmed. He believes that an incremental approach yields
the best results.
2.4 Data Mining
Data mining is the process of applying artificial intelligence techniques to large data sets in order to determine data patterns (Ma et al., 2000) [24] and extract previously unknown but significant information (Singh, 1998) [38]. On the front-end (client side), data-mining tools allow users to analyse the contents of the data warehouse via graphical, tabular, geographic, and syntactic reports. The front-end data-mining tool provides the user with an intuitive, graphical tool for creating new analyses and navigating the data warehouse. This helps focus the user's analysis so that relevant information can be obtained faster and more effectively (Ma et al., 2000) [24].
Data mining applications utilise information stored in the warehouse to generate business-oriented, end-user-customised information (Ma et al., 2000) [24], and statistical summaries from different views of data (Cho and Ngai, 2003) [4]. They can be applied in conjunction with OLAP to form an integrated business solution (Cho and Ngai, 2003) [4]. Data mining is critical to the enterprise that wants to exploit operational and other available data to improve the quality of decision-making and gain critical competitive advantages (Ma et al., 2000) [24]. Accurate data identification and analysis improve the quality of decision-making; strong navigation, computation, and synthesis capabilities make it possible to gain critical competitive advantages; and relevant information is obtained faster and time is used more effectively (Ma et al., 2000) [24].
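As a toy illustration of the kind of pattern extraction described above, a least-squares trend line can be fitted to yearly catch totals; the figures below are invented for the example and are not FIRRI data:

```python
# Toy illustration of extracting a pattern from warehouse data: fitting a
# least-squares trend line to yearly catch totals (figures are invented).
yearly_catch = {2001: 120.0, 2002: 110.0, 2003: 95.0, 2004: 90.0}

xs = list(yearly_catch)             # years
ys = [yearly_catch[x] for x in xs]  # catch totals
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Ordinary least-squares slope of catch against year.
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))

print(slope)  # a negative slope indicates declining catches
```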
2.5 Some Fisheries Data Warehousing Projects
A number of countries have embraced the data warehousing and data mining concepts in their fisheries
sector. NetCoast (2001) [29] observed that the European Union implemented COASTBASE, a virtual
coastal and marine data warehouse for integrated, distributed information search, access and feedback.
The CoastBase client (based on HTML and Java) provides uniform, multilingual and interactive access
to all CoastBase services. The ultimate aim of the project was to improve marine and coastal research,
assessment, policy making and cooperation along Europe’s coasts.
Rees and Finney (2000) [34] report that CSIRO Marine Research (CMR), Australia, developed a data
warehouse using ORACLE 8i. The client software is written in Java and uses Java’s Remote Method
Invocation (RMI) and Java Database Connectivity (JDBC) to connect to the underlying ORACLE data
store. The database schema has marine, biological, chemical and physical oceanographic parameters,
and is designed so that sampled parameters are primarily referenced via spatial coordinates and a time
stamp. This allows users to examine integrated datasets according to spatial and temporal constraints. For
example, a biologist interested in species distribution in a particular geographic area can also acquire
any available habitat data (e.g. water column parameters and seafloor sediment composition) for that
region and time period, a feature that may be important if one is looking for any correlations between
habitat type and species distribution. This is also useful if models are to be employed in order to in-
terpolate between known data points to produce a species distribution map, or to plot the potential
distribution of a species based on its known habitat or other environmental surrogate. Users invoke the
data warehouse interface via CMR’s web page.
Kupca (2004) [22] reports that in Iceland, the Marine Research Institute, Reykjavik, developed a fish-
eries data warehouse structured around 48 tables that include biological sample data, catch data, stom-
ach data, tagging data and incomplete data that do not fit the common DW structure. To make the DW
portable and platform independent, the Linux operating system and PostgreSQL RDBMS were cho-
sen. PHP was used as the programming language to develop a web-based interface and the upload and
extraction parts. An SQL command sent to the database retrieves and presents its result in an HTML
table. To ease the use of the DW, there are predefined table aliases and groups of useful joins that enable
composition of complex multiline queries within seconds. Metadata are split into five topics (biological
samples data, stomach data, catch data, acoustic data and tagging data) and include the information on
time, position and species in each topic.
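The interface style described above, an SQL command whose result set is presented as an HTML table, can be sketched as follows. The sketch is in Python rather than the PHP actually used at the Marine Research Institute, and the table and column names are invented:

```python
import sqlite3

# Minimal sketch of rendering an SQL query result as an HTML table for a
# web-based interface (illustrative data, not the Icelandic DW schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (species TEXT, length_cm REAL)")
conn.executemany("INSERT INTO samples VALUES (?, ?)",
                 [("Cod", 54.0), ("Haddock", 41.5)])

def sql_to_html(conn, sql):
    """Run a query and return its result set as a simple HTML table."""
    cur = conn.execute(sql)
    header = "".join(f"<th>{c[0]}</th>" for c in cur.description)
    body = "".join(
        "<tr>" + "".join(f"<td>{v}</td>" for v in row) + "</tr>"
        for row in cur
    )
    return f"<table><tr>{header}</tr>{body}</table>"

html = sql_to_html(conn, "SELECT species, length_cm FROM samples")
print(html)
```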
Fisheries data warehouses have been put to a variety of uses. Scottish Executive Publications (2006) [36] asserts that the IFISH data warehouse brings together fisheries data as a shared resource and has resulted in a substantial reduction in the burden on each department to produce data for the other. The
Government of British Columbia (2006) [16] reports that their webpage FishInfo BC provides on-line access to the British Columbia Fisheries Data Warehouse and to federal-provincial fisheries datasets, where all data are linked to "active" maps and to standard tables and reports that allow users to choose exactly what they want to know about any location, and then print their own personal reports. In support of India's development of a DW that includes fisheries among other agricultural disciplines, Sharma et al. (2006) [37] claimed that a DW can improve the quality of research and planning, reduce the
duplication of research efforts, encourage dissemination of research findings, and facilitate qualitative
research supported by agricultural databases. Therefore, development of a data warehouse that has data
mining capabilities would go a long way in improving fisheries management in the Ugandan fisheries
sector.
Chapter 3
Methodology
3.1 Introduction
This chapter presents the approach used to undertake the fisheries data warehouse development project. The case study method was used because case studies normally involve cross-sectional research focused on subject variables, which helps capture the gist of the study. According to Olsen and Marie (2004) [31], cross-sectional research gives better subject selection and measurements. The project was implemented
in three main phases: system study and analysis; system design; and system development. The means
used to validate the system under development is also included.
3.2 System Study and Analysis
According to Kakinda (2000) [19], research design is the structure or nature of research, which may
either be qualitative or quantitative. A qualitative approach was used to evaluate the information system, datasets, and procedures pertaining to the management of research work in FIRRI. Fact finding was
based on:
1. Interviews carried out with the staff of the National Fisheries Resources Research Institute (FIRRI). Sample interview questions are presented in Appendices 1, 2, and 3.
2. Document analysis: a number of documents were analysed so as to gain more understanding of
the type and contents of the reports required of the data warehousing system. Documents studied
included FIRRI’s Annual reports for the years 1997 - 1998, 1999 - 2000, 2000 - 2001, 2002 -
2003, 2003 - 2004, and 2004 - 2005; field reports; as well as FIRRI’s Survey Report on its study
of the Upper Victoria Nile River under the Bujagali Hydroelectric Power Project (NARO/FIRRI,
2001) [28].
3.3 System Design
The methodology used to design the dimensional model was adapted from Kimball (1996) [20] and
Connolly and Begg (2001) [5]. First, the subject matter for the data mart was identified and the grain
of the fact table (what a fact table record represents) decided. The grain of the fact table determined the
minimum level at which data was referenced, and also enabled the identification of the dimensions, as
well as the grain of each of the dimension tables. The dimensions were then identified and conformed.
For each dimension chosen, all dimensional attributes that filled out each dimensional table were de-
scribed.
Next, the facts that populate each fact table record were chosen. Facts comprised numeric additive quantities, and were expressed at the level implied by the grain. Once fact tables had been selected, each fact table was re-examined to determine whether there were opportunities to use precalculations. This applied to those values that might be incorrectly derived by users. As many text descriptions as possible were then added to the dimension tables. The duration of the database, that is, how far back in time the fact table goes, was then chosen. All records related to an old attribute name were linked to that old attribute name, and those related to the new attribute name were accordingly linked to it, so as to track slowly changing dimensions. Query priorities and the query modes were then decided.
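The ordering of the design steps above (grain first, then dimensions, then facts) can be illustrated with a minimal star schema for the catch subject area. The table and column names below are simplified assumptions for illustration, not the schema actually implemented:

```python
import sqlite3

# Illustrative star schema: one fact table at the chosen grain, joined to
# two of the dimensions. Names are assumptions, not FIRRI's actual design.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER, month INTEGER);
CREATE TABLE dim_species (species_key INTEGER PRIMARY KEY, scientific_name TEXT, common_name TEXT);

-- Grain: one row per species per catch sample.
CREATE TABLE fact_catch (
    date_key    INTEGER REFERENCES dim_date(date_key),
    species_key INTEGER REFERENCES dim_species(species_key),
    weight_kg   REAL,      -- additive measure
    fish_count  INTEGER    -- additive measure
);
""")

conn.execute("INSERT INTO dim_date VALUES (1, 2005, 2, 4)")
conn.execute("INSERT INTO dim_species VALUES (1, 'Lates niloticus', 'Nile perch')")
conn.execute("INSERT INTO fact_catch VALUES (1, 1, 15.0, 3)")

# A typical dimensional query: totals by year and common name.
row = conn.execute("""
    SELECT d.year, s.common_name, SUM(f.weight_kg), SUM(f.fish_count)
    FROM fact_catch f
    JOIN dim_date d    ON f.date_key = d.date_key
    JOIN dim_species s ON f.species_key = s.species_key
    GROUP BY d.year, s.common_name
""").fetchone()
print(row)
```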
3.4 System Development
System development involves the implementation, testing and refinement of the system. The data ware-
house was developed iteratively using the Data Warehouse Lifecycle based on Zachman’s approach. A
multi-tier warehouse architecture involving an EDW and underlying data marts was developed using
the hybrid/parallel approach to data warehouse development (Bose, 2006 [2]; Atkinson, 2001 [1]; Pipe,
1997 [32]). The project started with the development of a data mart for the Fish biology and ecology
research discipline in FIRRI. The steps to developing a data warehouse/mart as advocated by Roland
and Leonard (2005), Velasquez et al. (2005) [42], and Chan (1999) [3] were considered during the
development process. Sample data was run through the system to establish whether it was functioning
as required.
3.5 System Validation
Validation entails the confirmation by examination and provision of objective evidence that an infor-
mation system has been implemented correctly and that it conforms to user needs and intended uses.
During design and development planning, the validation plan was developed to identify required vali-
dation tasks and procedures for reporting anomalies and their resolution. In the requirements definition
phase, testable user and functional requirements for the data warehouse were established.
During the design phase, care was taken to ensure that the software development and management
procedures were consistent with accepted practices. At the implementation phase, functional testing
was performed to check if the system performs functions as specified in the functional specifications.
To facilitate tracking and problem resolution processes, each batch of input data extracted was assigned
a unique identifier linking it back to the source. The system was also fitted with a log file as an indirect
link between the source and the input transaction. The mapping utilised by the ETL tool was reviewed, with care taken to ensure that the data loaded into each data element was in fact sourced from the right tables in the source systems. Each data element was given a formal description and
a mapping back to the source table(s) used to populate it during the ETL process. Simple database
queries were run on the tables in the warehouse to count the number of records in the data warehouse.
These counts were then compared with the number of data entries in the source systems. Equality of
these counts led to the assumption that records were not left out due to an error during the ETL or
simple load process. This was further verified by the lack of errors (not necessarily warnings) in the
exception reporting by the ETL tool. For additional verification, actual rows from both the source and
data warehouse tables were randomly selected, printed, and listed side by side for comparison.
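The count-comparison check described above can be sketched as follows, with purely illustrative data:

```python
# Minimal sketch of the count-based ETL check: the number of rows loaded
# into a warehouse table is compared with the number of entries in its
# source, and a mismatch flags a possible load error.
def counts_match(source_rows, warehouse_rows):
    """Return True when source and warehouse row counts agree."""
    return len(source_rows) == len(warehouse_rows)

source = [("2005-04-01", "Nile perch", 12.5), ("2005-04-01", "Tilapia", 3.2)]
loaded = [("2005-04-01", "Nile perch", 12.5), ("2005-04-01", "Tilapia", 3.2)]

assert counts_match(source, loaded)           # all records arrived
assert not counts_match(source, loaded[:1])   # a dropped record is caught
```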
Chapter 4
Implementation
4.1 Introduction
This chapter presents the basis for understanding and implementing the fisheries data warehousing project. The findings of the requirements elicitation, the analyses of the findings, and the subsequent use of the findings to develop the system are presented.
4.2 System Analysis
4.2.1 Fisheries Data
Fisheries data typically comprises information on the activity of fisherfolk and their catches, plus results
of scientific surveys aimed at learning more about the biology, population dynamics, and movements
of the species concerned. This information is then used by the fisherfolk and fisheries managers to
anticipate the most favourable conditions and locations for fishing, and thereby maximise the catches
while reducing effort. The data can also be used to conduct independent assessments of stocks and modelling of the resource dynamics, so as to be able to either support, confirm or dispute the soundness of decisions made by the relevant fishery managers.
Fisheries data includes: reports and information summaries on catch and landings data; scientific observational data; and environmental data. Summary (and derived) data includes aggregated statistics by region, season, lengths of fish caught, etc. Catch-and-effort data includes information on the fishing activity of the fishermen (boat movements; hours, locations and depths fished; gear type used, etc.). Scientific survey data is similar to the catch-and-effort data from commercial operations but is less biased towards areas where catches would be expected to be highest. Biological data may be collected on commercial boats or scientific surveys. Environmental data is ancillary data, such as water temperatures and other hydrologic conditions, which may provide insight into the biological patterns observed.
4.2.2 Usage of FIRRI’s Information System
Information gathering, analysis and dissemination in FIRRI are shared among its eight disciplines. The most important discipline is, reportedly, the Fisheries Biology and Ecology Discipline. Field data is
mainly obtained on a quarterly basis, though sometimes data is obtained monthly. This data is stored
and used to perform routine analyses and produce standardised reports such as: field reports; quarterly
reports; and annual reports. Workshop papers and papers meant for scientific publications are also pre-
pared from the reports generated from the system.
Reports prepared by FIRRI's Fish Biology and Ecology discipline are aimed at answering questions on the structure of fish stocks and how this varies with location; and the life history of fish species,
particularly with regard to age and growth, recruitment to the fishery, reproductive biology, migration
and other movement patterns, diet and place in the ecosystem, and natural mortality in the absence
of fishing pressure. This information is used to prepare brochures, enact legislation aimed at fisheries
management and conservation, monitor the environmental conditions and fish habitats in the different
water bodies, regulate the fishing effort, and recommend to stakeholders the best fishing practices that
may lead to sustainable exploitation of the fish resources.
The stakeholders in the fisheries sector, who need and make use of the information generated and pack-
aged in FIRRI, include: the fisherfolk; The National Agricultural Research Institute (NARO); Interna-
tional and regional collaborators such as research institutions around Lake Victoria [The Kenya Marine
Fisheries Research Institute (KMFRI) in Kenya, The Tanzania Fisheries Research Institute (TAFIRI)in
Tanzania], and The Lake Victoria Fisheries Organisation of the East African Community; the Uganda
Fisheries Department; several departments at Makerere University, such as Zoology department and
The Makerere University’s Institute of Environment and Natural Resources; NGOs such as the Uganda
Fisheries and Fish Conservation Association (UFFCA); legislators in Uganda’s parliament; schools;
and the general public.
Currently, data is mainly stored in Excel files on desktop computers scattered among the different functional areas of the institute. Some historical data are still being stored in paper files, though efforts are being made to transform them into an electronic form for storage in a relational database. Data analysis is being carried out using Microsoft Excel, SPSS, and other statistical packages. ArcView GIS is being used to present some of the results from the analysis. Most reports are written using Microsoft Word.
4.2.3 Functional Requirements
There are a number of functionalities expected of any information system aimed at improving the
current information management in FIRRI. The system should:
1. be able to extract data from various files in different storage areas and store them in a centralised
location from where data and information can be retrieved;
2. be able to generate fisheries reports directly from the system;
3. be able to store a massive amount of data over a long period of time so as to enable trend analysis;
4. have an allowance for occasional loading of lump-sum data in the event that a lot of data is accumulated during a given quarter.
4.2.4 Non-functional Requirements
The four major non-functional requirements are: (i) system accessibility, (ii) system security, (iii) software operability, and (iv) system performance.
1. System accessibility: any end-user should be in a position to access dynamic reports that have resulted from the analysis of fisheries data.
2. System security: depending on repository content, the system should provide for differing levels
of access to repository content.
3. Software operability: the initial system should be able to make use of the software environment within FIRRI, and therefore be able to run on the Windows operating system.
4. System performance: the system should be able to handle at least 40 concurrent end-users.
4.2.5 User Requirements
The users require a system with:
1. A facility for generation of fisheries reports;
2. Ability to centralise data and information retrieval;
3. A provision for aggregations and generating summaries;
4. Ability to carry out trend analysis;
5. Ability to project trends;
6. Reliability of at least 98 percent uptime.
4.2.6 System Requirements
Since the warehouse database software should run on the Windows operating system platform, Microsoft SQL Server 2005 is recommended. Microsoft SQL Server 2005 has the SQL Server Management Studio and SQL Business Intelligence Development Studio, which provide the SQL Server 2005 Integration Services (SSIS) and SQL Server 2005 Analysis Services (SSAS) that are ideal for warehouse development. To run SQL Server 2005, the following hardware and software are required.
1. VGA or higher resolution;
2. A Microsoft mouse or compatible pointing device;
3. Microsoft Internet Explorer 6.0 SP1 or later;
4. Internet Information Services (IIS) 5.0 or later;
5. ASP.NET 2.0;
6. Windows Installer 3.1 or later;
7. Microsoft Data Access Components (MDAC) 2.8 SP1 or later;
8. Itanium processor or higher;
9. Minimum Processor speed of 1 GHz;
10. Memory (RAM) of at least 512 MB;
11. Windows 2003, or higher, Operating system.
4.3 System Design
In light of the informational content and the nature of the analysis required to produce information, the system that best redresses the shortcomings of the information system in FIRRI is a data warehousing system. The architectural design of the new system, which shows how data flows throughout the system, is presented in Figure 4.1. The two processes a data warehouse undergoes are data loading (entry) and access. Loading is carried out using Extract, Transform and Load (ETL) tools, while warehouse data can be accessed using OLAP tools. Therefore, data will be entered into the FIRRI data warehouse using ETL tools that extract data already entered into operational systems.
In FIRRI’s architectural design, data is extracted from operational data sources that include the opera-
tional system in FIRRI, flat files, the internet, or decentralised databases located in the district fisheries
offices within the country. The extracted data will be loaded into the staging area, where it will be
cleaned and loaded into the data warehouse. The data in the warehouse will be in the form of meta-data, summary data, and raw data. The warehouse has a provision for archiving and backing up the data.
From the data warehouse, the information and data are made available to the data marts. The data marts are tailored around the different functional units within FIRRI, such as Aquaculture and Socioeconomics (Figure 4.1), or FIRRI's partners in fisheries information usage and delivery. End-users in FIRRI's different disciplines and partner institutions interact with the data marts and are then able to analyse or mine the data and produce their reports.
Figure 4.1: Warehouse Architecture for the FIRRI Fisheries Data Warehouse
The trigger for the ETL process will be changes and additions to source data, which will bring about a processing requirement for the data. The data profile for FIRRI's Fisheries Data Warehouse includes quarterly extractions of fisheries data and dimensional updates, and occasional monthly input of the
data. Therefore, in FIRRI, the Data Warehouse ETL will have a set of quarterly processing require-
ments, where changes and additions to source data will be extracted and processed through the system
quarterly.
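The quarterly extract-transform-load flow described above, with each batch tagged by an identifier that links it back to its source, can be sketched as follows. The file layout, column names, and identifier scheme are all assumptions made for the example:

```python
import csv
import io
import sqlite3

# Sketch of the quarterly ETL flow: extract rows from a source file,
# transform (clean) them, and load them into the warehouse with a batch
# identifier that links each record back to its extraction source.
source_csv = io.StringIO("species,weight_kg\nNile perch,12.5\ntilapia,3.2\n")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_catch (batch_id TEXT, species TEXT, weight_kg REAL)")

batch_id = "2005-Q2-catch"  # assumed identifier scheme, one per extraction
for rec in csv.DictReader(source_csv):                        # extract
    species = rec["species"].strip().capitalize()             # transform: tidy names
    weight = float(rec["weight_kg"])
    conn.execute("INSERT INTO fact_catch VALUES (?, ?, ?)",   # load
                 (batch_id, species, weight))

rows = conn.execute("SELECT * FROM fact_catch").fetchall()
print(rows)
```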
4.3.1 Logical Models
Being the most important discipline, the Fish Biology and Ecology discipline was chosen for develop-
ment of a data mart that will eventually lead to an Enterprise wide Data Warehouse (EDW) for FIRRI.
The data mart models consist of five fact tables and eight dimensions, found in the fish catch dimensional model (Figure 4.2), fish prey dimensional model (Figure 4.3), biology dimensional model (Figure 4.4), gonad dimensional model (Figure 4.5), and catch-length dimensional model (Figure 4.6). A given
fish species was taken as the grain of the catch fact table, while an individual fish specimen was taken
as the grain for the biology, gonad, catch-length and prey fact tables. The conformed dimensions are
date, geography, species, water body, catch type, project, and fishing gear dimensions.
4.3.2 Facts
The fact tables are the Catch, Catch-Length, Biology, Prey, and Gonad tables. The Catch fact table stores catch sample data. It comprises the additive measures weight of fish, number of fish, number of boat crew, and number of fishing gear (Figure 4.7). The Catch-Length fact table stores the length measurements of the sampled catch, and comprises the semi-additive fact length (Figure 4.8). The Biology
Figure 4.2: Fish Catch Model
Figure 4.3: Fish Prey Model
Figure 4.4: Fish Biology Model
Figure 4.5: Fish Gonad Model
Figure 4.6: Fish Catch-Length Model
Figure 4.7: Fish Catch Fact
fact table stores biological facts about the fish sampled, and comprises: the additive fact fish weight; the semi-additive fact total length; and the non-additive fact serial number (Figure 4.9). The Prey fact table contains data about the prey ingested by the fish. It comprises the additive facts predator-weight, prey-weight, total food weight, total food count, and prey count; the semi-additive fact predator total length; and the non-additive fact digestive state (Figure 4.10). The Gonad fact table stores gonadal statistics. It stores the semi-additive facts fish weight, number of gonads, gonadal weight, total length, and number of eggs counted, and the non-additive fact serial number (Figure 4.11). The attribute SourceID has been included in all fact tables to link their data back to the source of extraction.
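The practical difference between the additive and semi-additive facts can be shown with a small example (figures invented): weight may be summed across records, whereas length is only meaningful when averaged, not summed.

```python
# Sketch of why the additive/semi-additive distinction matters when
# aggregating: fish weight can be summed across any dimension, but
# summing lengths is meaningless, so length is averaged instead.
samples = [
    {"species": "Nile perch", "weight_kg": 12.5, "length_cm": 85.0},
    {"species": "Nile perch", "weight_kg": 8.0,  "length_cm": 70.0},
]

total_weight = sum(s["weight_kg"] for s in samples)                # additive
mean_length = sum(s["length_cm"] for s in samples) / len(samples)  # semi-additive

print(total_weight, mean_length)
```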
Figure 4.8: Fish Catch-Length Fact
Figure 4.9: Fish Biology Fact
Figure 4.10: Fish Prey Fact
Figure 4.11: Fish Gonad Fact
Figure 4.12: Date Dimension
4.3.3 Dimensions
Eight dimensions were identified among the five fact tables. The dimensions are Date, Water Body,
Catch Type, Species, Fishing Gear, Sex-Maturity, Prey Type, and Length dimensions.
Date Dimension
This dimension contains attributes that detail the time the data was collected. It has the levels Year, Half, Quarter, Month, and Date (Figure 4.12). The Month level has the attributes Month and MonthName. The attribute SourceID points to the data source.
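How these levels can be derived from a calendar date when the dimension is populated is sketched below; the attribute names are illustrative, not the implemented column names:

```python
import datetime

# Sketch of deriving the Date-dimension levels (Year, Half, Quarter,
# Month, MonthName, Date) from one calendar date.
def date_dimension_row(d):
    """Derive illustrative Date-dimension attributes for one date."""
    return {
        "date": d.isoformat(),
        "year": d.year,
        "half": 1 if d.month <= 6 else 2,
        "quarter": (d.month - 1) // 3 + 1,
        "month": d.month,
        "month_name": d.strftime("%B"),
    }

row = date_dimension_row(datetime.date(2005, 8, 17))
print(row)
```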
Water Body Dimension
This dimension contains attributes about the water body where the fish was caught. It includes attributes such as waterbody type, waterbody name, zone, station, location, and fishing area (Figure 4.13). The attribute SourceID points to the source of data extraction.
Catch Type Dimension
This dimension contains attributes detailing the nature of the sample, or catch type, and the fishing
Figure 4.13: Water Body Dimension
Figure 4.14: Catch Type Dimension
conditions prevailing at the time when the fish was caught. It includes the attributes catchtype, fish-
ing time, season, and moon (Figure 4.14). The attribute SourceID ties the data to its extraction source.
Species Dimension
Contains attributes detailing the hierarchy and levels in the nomenclature of a given fish species. It includes the scientific names, abbreviations of scientific names, and common names of attributes such as Kingdom, Phylum, Class, Order, Family, Genus and Species (Figure 4.15). The attribute SourceID
points to the source of the data.
Fishing Gear Dimension
Contains attributes detailing the characteristics of the fishing gear used to catch the fish sampled.
Attributes such as gear type, size, fleet, ply, and operation are included (Figure 4.16). The attribute
SourceID ties the data to its extraction source.
SexMaturity Dimension
Contains attributes detailing the sexual characteristics and the maturity state of the fish sampled. At-
tributes such as sex, maturity, gonad state, fat content, and stomach fullness are included (Figure 4.17).
The attribute SourceID ties the data entry to its origin or extraction source.
Prey Type Dimension
Contains attributes that depict the type of prey eaten by a given fish. The attributes PreyNameShort
Figure 4.15: Species Dimension
Figure 4.16: Fishing Gear Dimension
Figure 4.17: Sex and Maturity Dimension
Figure 4.18: Prey Type Dimension
Figure 4.19: Length Dimension
and PreyName are included (Figure 4.18). The attribute SourceID points to the data source.
Length Dimension
Contains a single attribute, length, detailing the length of the fish sampled (Figure 4.19).
The attribute SourceID ties the data to its extraction source.
4.4 System Development
This section covers the physical design, the data staging design, and the development of the system.
Owing to the need for the data warehouse to run smoothly within the current software environment at
FIRRI, Microsoft SQL Server 2005 was chosen for the initial development of the data warehousing
system; it was also the only database technology readily available to the researcher. Microsoft SQL
Server 2005 provides SQL Server Management Studio and SQL Server Business Intelligence
Development Studio, which were used to develop the databases and the ETL tools, respectively. These
studios host SQL Server 2005 Integration Services (SSIS) and SQL Server 2005 Analysis Services
(SSAS). SSIS offers a set of built-in tasks, containers, transformations, and data adapters that can
remove the need to write code during warehouse development; these SSIS features were used
throughout.
4.4.1 Database Development
The staging database and the data warehouse database for the fisheries data warehouse were created
using SQL Server Management Studio. The staging area was divided into two parts: one for data
transformation and one for validation. The second part of the staging area is used to verify that the right transformations have
Figure 4.20: Excel Source Adapter Extraction
Figure 4.21: Flatfile Source Adapter
been carried out on the data before the data is loaded into the data warehouse. Dimensional tables were
then created in each of the databases.
4.4.2 Data Extraction, Transformation, and Load (ETL)
Data extraction and load tools were developed using SSIS, found in SQL Server Business Intelligence
Development Studio. The extraction, transformation, and load of data into the warehouse were divided
into four stages: extraction of data from the source tables into the staging database; data transformation
and cleaning before load into the data warehouse; data load into the warehouse; and development of
cubes before deployment to the warehouse server. The ETL tools were developed in the form of
packages within SSIS projects, and these packages can connect to a wide variety of data sources. With
the appropriate drivers attached, a package can extract data from flat files, Excel spreadsheets, XML
documents, or tables and views in relational databases. A package connects to relational databases
using .NET and OLE DB providers, and to legacy databases using ODBC drivers. Figures 4.20 and
4.21 show data flow tasks that have Excel, flat files, or databases as their source and destination systems.
Extracting data from the source systems into the Staging Area
Before extracting data from the source tables, each row of data in the source tables was assigned a
unique identifier (e.g. FecundityID in Figure 4.22) that was mapped to the SourceID column in the
staging database. This unique identifier tied the data to its source table or file, so every data entry in
the warehouse can be traced back to its source table, file, or folder.
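The row-tagging step can be sketched as follows. This Python helper is hypothetical and only illustrates the idea of carrying a per-row identifier into the SourceID column; in the study the identifiers were created in the source tables themselves.

```python
def tag_rows(rows: list, table_name: str) -> list:
    """Assign each source row a unique identifier (analogous to FecundityID)
    that is later mapped to the staging SourceID column, so every warehouse
    record can be traced back to its source table or file."""
    return [
        dict(row, SourceID=f"{table_name}-{i}")  # e.g. "Fecundity-1"
        for i, row in enumerate(rows, start=1)
    ]
```

Each tagged row then carries its lineage through staging and into the warehouse.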
The first step in creating the packages was to create an Integration Services project. The project included
templates for objects (data sources, data source views, and packages) used in the data transformation
solution. Connection managers, which connect packages to data sources and destinations (such as the
Excel, OLE DB, and Flat File connection managers), were then added to the package. After creating
connection managers for the source and destination data, Data Flow tasks were added to the package.
An example of a package with data flow tasks added to the control flow is
Figure 4.22: Unique Identifiers for Rows of Data
Figure 4.23: Dimension and Fact Table Load
presented in Figure 4.23. The Data Flow tasks encapsulate data flow engines that move data between
sources and destinations, and provide the functionality for transforming, cleaning, and modifying data
as it is moved. Most of the extract, transform, and load (ETL) processes occur in the Data Flow tasks.
Source and destination adapters that point to source and destination tables were then defined, with
a connector joining the two as shown in Figures 4.20 and 4.21. The data flows between abstracted
sources and destinations that do not themselves contain connectivity information but instead hold
references to connection managers (e.g. localhost.BiodiversityStaging, Excel Connection Manager,
Prey Connection Manager) that define where the data sources and destinations physically reside. The data flows for
extracting the data from the source systems and populating the dimension and fact tables in the staging
area are similar. To ensure high-speed data copying, transformations were not designed to be performed
on the data while it is moving from the source file to the staging destination table. Packages used
Figure 4.24: Foreach Loop Containers
to populate the dimension and fact tables were developed, as outlined above, for each of the dimension
models.
The package that populates the Biology dimension model is designed to demonstrate the ability of a
package to iterate through any number of files in a folder and extract data from multiple file sources. It
uses the Foreach Loop container (Figure 4.24). When the package is run, the Foreach Loop Container
iterates through a collection of files in a folder. Each time a file is found that matches the set criteria,
the Foreach Loop Container updates a variable with the file name. This causes the connection manager
to connect to a different file, and the data flow task processes a different data set and loads it into the
staging area.
To centralise the extraction of data from the source systems, a master package was created that runs
all of the packages that extract data from the source systems into the staging area (Figure 4.25). At
runtime, SQL queries that truncate the dimension and fact tables are executed first using Execute SQL
tasks (Figure 4.26). Package execution tasks, which run the packages that populate the different
dimensional models, then execute (Figure 4.25).
Data Cleaning and Transformation
Before being loaded into the data warehouse, data extracted from the multiple files was cleaned or trans-
formed using built-in transformations contained in SSIS. Surrogate keys for the fact tables are generated
and assigned before the data is loaded into the warehouse. The control flow for the data cleaning and
surrogate key generation is presented in Figure 4.27.
Figure 4.25: Centralised Running / Execution of Packages
Figure 4.26: Execute SQL Task Editor
Figure 4.27: Data Cleaning and Surrogate Key Generation Control Flow
Figure 4.28: Cleaning the Dimension-Data Flow and Generating Surrogate Keys
At runtime, before the tasks that clean the dimension tables and assign surrogate keys are executed,
an SQL task is used to find the prevailing maximum dimension (surrogate) key. The task passes the
maximum surrogate key on to the data flow as a variable.
In the dimension load tasks, the Fuzzy Lookup transformation is used to find data rows with spelling
mistakes and correct them (Figure 4.28). The fuzzy lookup adapter looks up the correct spelling in a
reference table and replaces the column entry that is misspelled or missing (Figures 4.29 and 4.30).
After correction, the data flow is passed through a Slowly Changing Dimension editor that compares
the data sets in the flow with those in the destination table (Figure 4.28). New data rows are passed
through, while rows with changes are routed to an OLE DB Command adapter that updates the
changed entry in the destination table.
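The spelling-correction idea can be sketched in Python using the standard library's approximate string matching; this is an illustration only, as the SSIS Fuzzy Lookup uses its own proprietary similarity scoring rather than difflib.

```python
from difflib import get_close_matches

def fuzzy_correct(value: str, reference: list, cutoff: float = 0.8) -> str:
    """Replace a possibly misspelled column entry with its closest match
    from a reference table, or leave it unchanged when nothing is similar
    enough. difflib stands in here for the Fuzzy Lookup's matching."""
    if value in reference:
        return value  # already a correct spelling
    matches = get_close_matches(value, reference, n=1, cutoff=cutoff)
    return matches[0] if matches else value
```

For example, a misspelled species entry such as "Oreochromis nilotcus" would be replaced by "Oreochromis niloticus" when that name appears in the reference table.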
Figure 4.29: Fuzzy Lookup
Figure 4.30: Correcting Spelling Mistakes and Adding Missing Data Entries
Figure 4.31: Sorting The Data Flows
For new data sets, a dimension key is generated. The source data is split into two paths by a "multicast"
adapter (Figure 4.28). One path only performs a "sort" (Figure 4.31) to prepare for the "merge join".
The other path first sorts and removes rows with duplicate sort values, then a Script Component is used
to generate and assign the surrogate keys (Figure 4.32). The maximum dimension key value, which was
passed to the flow as a variable, is incremented by the Script Component for every row that passes
through, adding a surrogate key value to the data flow. The two data flows are then inner-joined on the
sort keys using the Merge Join transformation (Figure 4.33), resulting in an updated data flow with new
surrogate keys. The data flow from the source is then mapped onto the destination dimension table
(Figure 4.34). This is done for all dimension tables.
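The Script Component's key-generation step amounts to incrementing a counter seeded with the current maximum key. A minimal Python sketch, with a hypothetical `DimKey` column name:

```python
def assign_surrogate_keys(new_rows: list, max_existing_key: int) -> list:
    """Start from the current maximum dimension key (found by the Execute
    SQL task and passed in as a variable) and assign the next integer key
    to every new row that passes through, as the Script Component does."""
    key = max_existing_key
    for row in new_rows:
        key += 1
        row["DimKey"] = key  # new surrogate key added to the data flow
    return new_rows
```

If the dimension's current maximum key is 41, two new rows receive keys 42 and 43.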
In the fact-table load tasks, the Conditional Split adapter is used to find rows with missing column
entries and divert them to an error table for further management and possible cleaning or addition of
the missing data (Figure 4.35). Before the bad data is loaded into the error table, the Audit
transformation adds information about the task and package where the error was detected, to enable
corrections. Good data is passed on to a Slowly Changing Dimension editor that compares the data
flow with the destination fact table. Any data that has the same unique ID as an entry in the destination
table is not passed through, while data with changes is passed to an OLE DB Command for update
of the corresponding entry in the destination table.
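The split-and-audit step can be sketched as follows; the column and parameter names are hypothetical, and the real work is done by the SSIS Conditional Split and Audit transformations.

```python
from datetime import datetime, timezone

def split_missing(rows: list, required: list, package_name: str):
    """Divert rows missing a required column to an error flow, stamped
    with the package name and detection time (the Audit step); complete
    rows continue on to the fact-table load."""
    good, errors = [], []
    for row in rows:
        if any(row.get(col) is None for col in required):
            errors.append(dict(row,
                               AuditPackage=package_name,
                               AuditTime=datetime.now(timezone.utc).isoformat()))
        else:
            good.append(row)
    return good, errors
```

The audit columns let an operator trace each rejected row back to the task that rejected it.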
New data flows are passed on to Lookup transformations that look up surrogate keys in the dimension
tables and assign them to the corresponding foreign keys in the fact tables (Figure 4.36) before the
data is inserted into the destination fact table. In the event of an error, the error is passed
Figure 4.32: Surrogate Key Generation
Figure 4.33: Inner Joining Two Data Flows
Figure 4.34: Mapping Source Data to The Destination Table
Figure 4.35: Fact-Data Cleaning and Transformation Data Flow Task
over to an error flow, and audit information is added to it. A union of all the error data flows is then
taken before the combined error flow is inserted into an error table (Figure 4.35). This process is the
same for all the fact tables.
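The Lookup step can be sketched in Python for a single dimension; the `Species`/`SpeciesKey` column names and the shape of the lookup table are hypothetical.

```python
def resolve_foreign_keys(fact_rows: list, species_keys: dict):
    """Look up the surrogate key for each fact row's natural species name
    and attach it as the foreign key; rows whose key cannot be found are
    diverted to the error flow instead of being loaded."""
    loaded, errors = [], []
    for row in fact_rows:
        key = species_keys.get(row["Species"])
        if key is None:
            errors.append(row)  # no matching dimension member
        else:
            loaded.append(dict(row, SpeciesKey=key))
    return loaded, errors
```

In the packages described above, one such lookup runs for every dimension referenced by the fact table.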
Loading Data into the Warehouse
The package developed for loading data into the warehouse is presented in Figure 4.37. The data flows
that load the warehouse dimensions (Figure 4.38) are first passed to a sort editor that removes any
duplicated rows before reaching a Slowly Changing Dimension adapter, whereas the flows that
populate the fact tables (Figure 4.39) are passed to the Slowly Changing Dimension adapter directly.
In both cases, the Slowly Changing Dimension transformation compares incoming data with that
already in the warehouse. If the unique identifier of an incoming row matches one in the destination
table and no changes have been made to the column entries, the row is not passed through. If the
unique identifier matches but one or more of the columns has changed, the row is directed to an
OLE DB Command adapter that updates the affected row in the destination table. If the unique
identifier has no match in the destination table, the row is passed on as new output, and the destination
adapter inserts it as a new record in the warehouse table. This process is replicated for all fact and
dimension tables.
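The three-way comparison just described can be sketched as an in-memory upsert; this is an illustration only, with the warehouse represented as a hypothetical dictionary keyed on SourceID.

```python
def apply_changes(incoming: list, warehouse: dict):
    """Match incoming rows to warehouse rows on SourceID: identical rows
    are skipped, changed rows overwrite the stored record (the OLE DB
    Command update), and unmatched rows are inserted as new.
    Returns (inserted, updated, skipped) counts."""
    inserted = updated = skipped = 0
    for row in incoming:
        existing = warehouse.get(row["SourceID"])
        if existing is None:
            warehouse[row["SourceID"]] = dict(row)   # new record
            inserted += 1
        elif existing != row:
            warehouse[row["SourceID"]] = dict(row)   # in-place update
            updated += 1
        else:
            skipped += 1                             # unchanged, not passed through
    return inserted, updated, skipped
```

Overwriting the stored row rather than keeping history corresponds to a Type 1 slowly changing dimension, which matches the update-in-place behaviour described above.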
4.4.3 Analysis Cubes
The analysis cubes were developed using SQL Server Analysis Services (SSAS). Analysis Services is
a middle-tier server for online analytical processing (OLAP) and data mining. The Analysis Services
Figure 4.36: Surrogate Key Assignment
Figure 4.37: Loading Data into the Warehouse
Figure 4.38: Example of Warehouse Dimension Table Data Flow Task
Figure 4.39: Example of Warehouse Fact Table Data Flow Task
system includes a server that manages multidimensional cubes of data for analysis and provides rapid
client access to cube information. Analysis Server is the server component of Analysis Services that
is specifically designed to create and maintain multidimensional data structures and provide multidi-
mensional data in response to client queries. The structure of the multidimensional cubes developed,
showing the data view of all fact and dimension tables, is presented in Figure 4.40.
4.4.4 Enduser Application
Endusers will access the data warehouse through Microsoft Excel. Microsoft Excel was chosen because
most of the endusers in FIRRI are already well versed in the use of Excel, and because Excel has an
add-in, the Excel Add-in for Analysis Services, that works well with SQL Server Analysis Services
cubes. The Microsoft Office Excel Add-in for SQL Server Analysis Services provides analysis
capabilities and flexible reporting for data imported into Excel from Analysis Services cubes. By
invoking the add-in from within Excel, the enduser can import data from the Analysis Services cubes,
use Analysis Services techniques to analyse the data, and then, leveraging their existing Excel skills,
use Excel functionality to manipulate and present the data in reports. From within Excel, endusers can
use Excel formatting and calculation features, combine data from multiple dimensions, use
drillthrough to see source data, drill up and drill down, expand and collapse, isolate and eliminate, and
pivot the data.
The enduser has the option of using a pivot table to generate a report like the one presented in Figure
4.41, or of using the cube analysis add-in to create a report such as that presented in Figure 4.42. The
enduser can then format the report and produce charts such as that presented in Figure 4.42.
4.4.5 System Validation
This section presents the results of the different mechanisms put in place to validate the data warehouse
system developed. The system can extract data from multiple sources and centralise it in one location
as planned, as evidenced by data from all the source systems being found in the warehouse.
Comparison of the unique identifiers for the data in the source systems and in the data warehouse
shows that they are the same. Additional verification by listing actual rows from randomly selected
tables in both the source (Figure 4.22) and the data warehouse (Figure 4.43) also shows an
Figure 4.40: Structure of Analysis Data Cube
Figure 4.41: Length Frequency Distribution of Oreochromis niloticus
Figure 4.42: Maximum Weight of Selected Fish Species Across 4 Quarters
Figure 4.43: Check of Rows Written to the Data Warehouse
exact match, further confirming that the data warehouse is functioning as required. The unique
identifier, referred to as FecundityID in the source system, corresponds to the SourceID in the data
warehouse.
SSIS has a progress log that shows when the execution of a package started and ended, the data source
and its destination, the number of rows in the source table, and the number of rows written (extracted
successfully) (Figure 4.44). This inbuilt mechanism validates the execution of the package. In
addition, the adapters and transformations in a package are colour-coded as it executes: yellow
indicates that the package is executing, red shows that there is an error and execution was not
successful, and green indicates that execution has succeeded, with the number of rows written shown
on the connector between the source and destination (Figure 4.45). Examination of the system logs
shows that the number of records in the source systems exactly matches that in the data warehouse.
Equality of these counts indicates that no records were left out due to an error during the ETL or load
process. This was further verified by the absence of errors (as opposed to warnings) in the exception
reporting by the ETL tool.
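The count check described above can be expressed as a small comparison routine; the table names and dictionary shapes below are hypothetical, since in the study the counts were read from the SSIS logs rather than computed in code.

```python
def counts_match(source_counts: dict, warehouse_counts: dict):
    """Compare per-table row counts from the source systems with those in
    the warehouse; any mismatch suggests records were lost or duplicated
    during ETL. Returns (all_equal, mismatches)."""
    mismatches = {
        table: (n_source, warehouse_counts.get(table, 0))
        for table, n_source in source_counts.items()
        if n_source != warehouse_counts.get(table, 0)
    }
    return len(mismatches) == 0, mismatches
```

A non-empty mismatch map pinpoints exactly which table needs its ETL run re-examined.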
Figure 4.44: System Validation
Figure 4.45: Rows Written During Data Load
4.5 Conclusions, Limitations, and Future Work
4.5.1 Conclusions
This project focused on verifying whether designing and implementing a data warehousing system
in FIRRI would bring about centralised storage and retrieval of fisheries data and information.
The results show that a data warehouse can greatly improve the storage, retrieval, and dissemination of
fisheries information. Given that the data warehouse enables aggregation of data and information from
different source systems, easy execution of complex queries, and real-time dissemination and retrieval
of data and information, one cannot underestimate its power in positively affecting the operations of
the fisheries sector. The project provides evidence that centralised data storage, information retrieval,
and reporting in FIRRI are both possible and attainable.
4.5.2 Limitations
In the course of this study, a number of problems were encountered. Time and financial constraints
were the major ones. Owing to these limitations, it was not possible to develop a web interface for the
warehouse, as the preferred software, WebIntelligence, could not be procured.
4.5.3 Future Work
The data warehouse provides a mechanism for extracting data from any type of database management
system on any networked information system, to the extent of facilitating data transmission and
exchange over the Internet. This is vital for the dissemination of information to fisheries stakeholders
outside FIRRI. Since in this study it was only possible to design and develop a data mart, future work
should focus on developing an enterprise-wide data warehouse that can be accessed by endusers even
via the World Wide Web.
Appendix 1: Lead Scientists’ Questionnaire
(i). INTRODUCTION
I am carrying out a study on the use of data warehousing in fisheries. This interview is aimed at finding
out the kind of information you would want out of the warehouse and the way it should be formatted
and presented.
(ii). RESPONSIBILITIES
• Describe FIRRI and its relationship to the rest of the fisheries sector.
• What are your primary responsibilities?
(iii). RESEARCH OBJECTIVES AND ISSUES
• What are the objectives of FIRRI? What are its top priority research goals?
• What functions and departments within FIRRI are most crucial to ensuring that these key success
factors are achieved? What role do they play? How do they work together to ensure success?
• What are the key research issues you face today? Is there anything that prevents you from meeting
your research objectives?
• Where does FIRRI stand in the use of information technology?
(iv). ANALYSES REQUIREMENTS
• What role does data analysis play in decisions made and by fisheries managers?
• What key information is required to make or support the decisions you make in the process of
achieving your goals and overcoming obstacles? How do you get this information today?
• Is there other information which is not available to you today that you believe would have signif-
icant impact on helping meet your goals?
• Which reports do you currently use? What data on the report is important? How do you use the
information? If the report were dynamic, what would the report do differently?
• What analytic capabilities would you like to have?
Thank You
Appendix 2: Information System Audit Questionnaire
(i). INTRODUCTION
I am carrying out a study on the use of data warehousing in fisheries. This interview is aimed at finding
out the kind of information you would want out of the warehouse and the way it should be formatted
and presented.
(ii). RESPONSIBILITIES
• Describe FIRRI and its relationship to the rest of the fisheries sector.
• What are its primary responsibilities?
• Which interest groups does it support?
(iii). USER SUPPORT / ANALYSES AND DATA REQUIREMENTS
• What is the current process used to disseminate information?
• What tools are used to access/analyse information today? Who uses them?
• Are you asked to perform routine analyses? Do you create standardised reports?
• Describe typical ad hoc requests. How long does it take to fulfil these requests?
• What is the technical and analytical sophistication of the users?
• What is the biggest bottleneck/issue with the current data access process?
(iv). DATA AVAILABILITY AND QUALITY
• Which source systems are used for frequently-requested information?
• How often is the data updated? Availability following update?
• How much history is available?
• What are the known bottlenecks in current source systems?
• Do you currently have common source files? Who maintains the source files?
• How are changes captured?
• What else should I know about FIRRI and its information systems?
• What must this project accomplish to be deemed successful?
Thank You
Appendix 3: End-User Questionnaire
(i). INTRODUCTION
I am carrying out a study on the use of data warehousing in fisheries. This interview is aimed at finding
out the kind of information you would want out of the warehouse and the way it should be formatted
and presented.
(ii). RESPONSIBILITIES
• Describe FIRRI and its relationship to the rest of the fisheries sector.
• What are your primary responsibilities?
(iii). RESEARCH OBJECTIVES AND ISSUES
• What are the objectives of FIRRI? What are its top priority research goals?
• What are the key research issues you face today?
• Describe your research disciplines. How do you distinguish between research disciplines? How
do you categorise research disciplines?
(iv). ANALYSES REQUIREMENTS
• What type of routine analysis do you currently perform? What data is used? How do you cur-
rently get the data?
• What do you do with the information once you get it?
• What analysis would you like to perform? Are there potential improvements to your current
method/process?
• Which reports do you currently use? What data on the report is important? How do you use the
information? If the report were dynamic, what would the report do differently?
• What analytic capabilities would you like to have?
• Are there specific bottlenecks to getting at information?
• How much historical information is required?
• What must this project accomplish to be deemed successful?
Thank You