Data Warehousing in Fisheries: A Case Study of The National
Fisheries Resources Research Institute (FIRRI)
By
Onyango Gerald
B.Sc. (Mak), M.Sc. (Mak).
Department of Information Systems,
Faculty of Computing and Information Technology, Makerere University.
Email: [email protected] / Phone: +256782523740
A Project Report Submitted to School of Graduate Studies in
Partial Fulfillment of the Requirements for the Award of the Degree of
Master of Science in Computer Science of Makerere University
OPTION: Management Information Systems
Supervisor: Dr. Ogao Patrick
Department of Information Systems,
Faculty of Computing and Information Technology, Makerere University.
Email: [email protected], Phone: +256-41-540628, Fax: +256-41-540620
October 2006
Declaration
I, Onyango Gerald, do hereby declare that this Project Report is original and has not been published
and/or submitted for any other degree award to any other University before.
Signed: .......................................................... Date: ...........................................
Gerald Onyango,
B.Sc., M.Sc.
Approval:
This Project Report has been submitted for examination with my approval as the supervisor.
Signed: .......................................................... Date: ...........................................
Dr. Patrick Ogao, Ph.D.
Department of Information Systems,
Faculty of Computing and Information Technology.
Dedication
To Dad and Mum who made it possible.
“Good things in life do not come by easily”
Acknowledgments
There are a number of people who made all this possible. Thanks be to God for the strength and wis-
dom he gave me throughout the study, and to various people who assisted me in one way or another
that enabled me see the fruits of this project.
My sincere appreciation goes to my supervisor, Dr. Patrick Ogao, without whose help this work would
not be as it is.
Without the support of my parents and siblings, this study would not have been nurtured to fruition.
I also acknowledge all my friends and classmates for having made my academics at Makerere joyous
and fruitful. Special thanks go to my coursemates Mr. Ssemwogerere Tom and Mr. Ndyanabo Antony
who provided me with the core software I used in this project.
MAY THE ALMIGHTY GOD BLESS YOU ALL ABUNDANTLY
Abstract
Information on the status and trends in fisheries is key to sustainable exploitation and management of
fisheries resources. In Uganda, the organisation charged with research and dissemination of fisheries
information is The National Fisheries Resources Research Institute (FIRRI). FIRRI packages this
information in brochures, posters, videos, and press releases. However, preparing this information
within FIRRI is an uphill task because most of it is scattered across files and other storage media
held by the Institute's different research disciplines. This called for a system that can centralise the
storage and dissemination of the information generated within FIRRI. A Data Warehousing system
was chosen to remedy the situation.
Work on the development of the Data Warehousing System commenced with the development of a
Data Mart for one of the research disciplines. A Data mart for the Fish Biology and Ecology disci-
pline was designed and developed using Microsoft SQL Server 2005. SQL Server Integration Services
(SSIS) was used to develop the Extract, Transform, and Load (ETL) tools, while SQL Server Analysis
Services (SSAS) was used to develop the dimensional data cubes. Microsoft Excel, fitted with the
Cube Analysis add-in, was chosen as the end-user interface. Validation proved the system to be
functioning as required. The results of the study show that it is possible to centralise information
storage and retrieval in fisheries using a data warehouse, and provide evidence that centralised data
storage, information retrieval, and reporting in FIRRI are both possible and attainable.
Time constraints did not allow for the development of a fully fledged Data Warehouse, complete with
a web interface. Since it was only possible to build a data mart for one of the disciplines within FIRRI,
it is proposed that future work comprise the development of an enterprise-wide Data Warehouse that
can also be accessed through the World Wide Web.
Contents
1 Introduction
1.1 Background to The Study
1.1.1 The Case for a Data Warehouse (DW) and Data Mining
1.2 Problem Statement
1.3 Objectives
1.3.1 General Objective
1.3.2 Specific Objectives
1.4 Justification of the study
1.5 Scope
2 Literature Review
2.1 Introduction
2.2 The National Fisheries Resources Research Institute (FIRRI)
2.3 Data Warehousing
2.3.1 The Data Warehousing Concept
2.3.2 Data Warehouse Design Model
2.3.3 Data Warehouse Structure and Tools
2.3.4 Data Mart
2.3.5 Current Approaches to Data Warehouse (DW) Development
2.3.6 Analysis of the Current Approaches to Data Warehouse Development
2.4 Data Mining
2.5 Some Fisheries Data Warehousing Projects
3 Methodology
3.1 Introduction
3.2 System Study and Analysis
3.3 System Design
3.4 System Development
3.5 System Validation
4 Implementation
4.1 Introduction
4.2 System Analysis
4.2.1 Fisheries Data
4.2.2 Usage of FIRRI's Information System
4.2.3 Functional Requirements
4.2.4 Non-functional Requirements
4.2.5 User Requirements
4.2.6 System Requirements
4.3 System Design
4.3.1 Logical Models
4.3.2 Facts
4.3.3 Dimensions
4.4 System Development
4.4.1 Database Development
4.4.2 Data Extraction, Transformation, and Load (ETL)
4.4.3 Analysis Cubes
4.4.4 Enduser Application
4.4.5 System Validation
4.5 Conclusions, Limitations, and Future Work
4.5.1 Conclusions
4.5.2 Limitations
4.5.3 Future Work
List of Figures
4.1 Warehouse Architecture for the FIRRI Fisheries Data Warehouse
4.2 Fish Catch Model
4.3 Fish Prey Model
4.4 Fish Biology Model
4.5 Fish Gonad Model
4.6 Fish Catch-Length Model
4.7 Fish Catch Fact
4.8 Fish Catch-Length Fact
4.9 Fish Biology Fact
4.10 Fish Prey Fact
4.11 Fish Gonad Fact
4.12 Date Dimension
4.13 Water Body Dimension
4.14 Catch Type Dimension
4.15 Species Dimension
4.16 Fishing Gear Dimension
4.17 Sex and Maturity Dimension
4.18 Prey Type Dimension
4.19 Length Dimension
4.20 Excel Source Adapter Extraction
4.21 Flatfile Source Adapter
4.22 Unique Identifiers for Rows of Data
4.23 Dimension and Fact Table Load
4.24 Foreach Loop Containers
4.25 Centralised Running / Execution of Packages
4.26 Execute SQL Task Editor
4.27 Data Cleaning and Surrogate Key Generation Control Flow
4.28 Cleaning the Dimension-Data Flow and Generating Surrogate Keys
4.29 Fuzzy Lookup
4.30 Correcting Spelling Mistakes and Adding Missing Data Entries
4.31 Sorting The Data Flows
4.32 Surrogate Key Generation
4.33 Inner Joining Two Data Flows
4.34 Mapping Source Data to The Destination Table
4.35 Fact-Data Cleaning and Transformation Data Flow Task
4.36 Surrogate Key Assignment
4.37 Loading Data into the Warehouse
4.38 Example of Warehouse Dimension Table Data Flow Task
4.39 Example of Warehouse Fact Table Data Flow Task
4.40 Structure of Analysis Data Cube
4.41 Length Frequency Distribution of Oreochromis niloticus
4.42 Maximum Weight of Selected Fish Species Across 4 Quarters
4.43 Check of Rows Written to the Data Warehouse
4.44 System Validation
4.45 Rows Written During Data Load
List of Acronyms
1. CMR - CSIRO Marine Research
2. DW - Data Warehouse
3. EDW - Enterprise Data Warehouse
4. FAO - Food and Agriculture Organization
5. FIRRI - Fisheries Resources Research Institute
6. GUI - Graphical User Interface
7. IT - Information Technology
8. MOLAP - Multidimensional Online Analytical Processing
9. NARS - National Agricultural Research System
10. ODBC - Open Database Connectivity
11. OLAP - Online Analytical Processing
12. PARI - Public Agricultural Research Institute
13. SSAS - SQL Server Analysis Services
14. SSIS - SQL Server Integration Services
Chapter 1
Introduction
1.1 Background to The Study
Fisheries is the industry or occupation devoted to the catching, processing, or selling of fish, shellfish,
or other aquatic animals (The Free Dictionary, 2006) [40]. The fisheries and aquaculture sector is ex-
tremely important in terms of food security, revenue generation and employment (Sugiyama, 2005)
[39]. Sugiyama (2005) [39] noted that catching or farming aquatic resources makes an integral
contribution to rural livelihoods in many parts of the Pacific region. Although fisheries resources are
renewable, they can be depleted through unsustainable exploitation. It is therefore important to ensure
that there is guided development and management of this asset so that it can continue contributing to
the livelihood of the people who depend on it.
Sugiyama (2005) [39] argues that knowledge of the status and trends of fisheries, including
socio-economic information on fishing communities, is key to using aquatic resources in a sustainable
way. Adequate, timely, and reliable fisheries data and information provide a basis for sound policy
development, better decision-making, and responsible fisheries management. This information is
required at the national level for maintaining food security, for describing the social and economic
benefits of fisheries, for assessing the validity of fisheries policy, and for tracking the performance of
fisheries management. Sugiyama (2005) [39] also observed an increasing need for fisheries information
outside the government sector. Consequently, information is a priority for the sustainable exploitation
and management of fish stocks (FIRRI, 2003) [11].
In Uganda, the national institution mandated to undertake, promote and streamline fisheries research
and to ensure dissemination and application of research results, is The National Fisheries Resources
Research Institute (FIRRI) (FIRRI, 2000; FIRRI, 2001; FIRRI, 2002; FIRRI, 2003; FIRRI, 2004;
FIRRI, 2005; FIRRI, 2006a) [8], [8], [9], [10], [11], [12], [13]. FIRRI contributes to the fisheries sub-sector
developmental objective by providing information to guide sustainable management of capture fisheries
resources and development of aquaculture (FIRRI, 2003) [11]. Therefore, the final products of FIRRI’s
outputs are Technical Guidelines containing technologies, methods and advice to guide development
and management of the fisheries of different aquatic systems, and development of aquaculture. The in-
formation packages are produced in the form of books, booklets, fact sheets, brochures, posters, video
films and press releases to service providers and resource users. FIRRI disseminates this information to
fishing communities and other end-users through community barazas, workshops, radio and TV shows
(FIRRI, 2006) [14].
The information system within FIRRI was originally manual and paper based. With the advent of com-
puters, different functional areas within the institute developed their own file management systems.
This independent keeping of files by the individual functional areas created data redundancy and incon-
sistency, program-data dependence, inflexibility, poor security, and lack of data sharing and availability.
Inmon (1993) [18] argues that factors such as the same data being present on different systems in
different departments; difficulty in getting timely, meaningful information; multiple systems giving
different answers to the same business questions; and limited analysis by decision makers and policy
planners, due to the non-availability of sophisticated tools and of easily decipherable, timely and
comprehensive information, call for a data warehouse.
Having noted that lack of effective and timely information from research to fishing communities and
other stakeholders is a major constraint to sustainable fish production and utilisation, FIRRI is devoted
to the development of a Fisheries Database and Information centre (FIRRI, 2006) [14]. FIRRI hopes
that the development of a Fisheries Database and Information centre will facilitate timely acquisition
and exchange of information on all water bodies in the country and also create a central station from
which this information can be obtained. However, Mahadik (2002) [26] claims that as the quantities
of information and data handled by organisations increase, traditional means of analysing the data,
such as reports and query tools, prove inadequate. Mahadik (2002) [26] believes that powerful system
navigation and information exploration tools that use hypermedia, dynamic visual querying and tree
maps should be availed. He asserts that organisations should ensure that employees are free to
communicate with each other and share data and information across the organisation, that data
dictionaries are created and regulated, and that online data is reformatted before being inserted into
company-wide databases. Mahadik (2002) [26] claims that the latest development in analytical tools
that enables organisations to find meaning in their data is data mining.
Therefore, to enhance the availability of information in the Ugandan fisheries sector, there is need to
enhance the processing efficiency of the data analysed in FIRRI, and also enhance the dissemination
capacity. The optimal solution to this problem would be to build a data warehouse in FIRRI and add
data mining tools to the data warehouse to improve on data analysis, and information dissemination,
efficiency.
1.1.1 The Case for a Data Warehouse (DW) and Data Mining
A data warehouse is a subject-oriented, integrated, time-variant, nonvolatile database or repository of
information collected from many different sources and centrally stored, usually in a single location.
Information from multiple sources in different locations, applications or files, be it in different operating
units or departments, can be standardised and stored in a single repository. This consolidation of the
data store eliminates the reconciliation of inconsistent data, avoids lengthy ad hoc manipulation of
data from different sources, and improves data quality. Data can be retrieved in a matter of minutes.
In a data warehousing system, users can create most of their own queries and reports themselves. A
user identifies the information they want, makes a request (a query) to the data warehouse, and the
data or information stored in the warehouse is delivered to them. Tools such as Online Analytical
Processing (OLAP) and data mining improve end-user analysis capabilities and shrink the time
between the occurrence of an event and the subsequent alerting of managers.
A data warehouse contains only "trusted" data, that is, data that has been cleaned. This guarantees the
accuracy and reliability of the data and information in and from a warehouse. Historical data is also
stored within the data warehouse, and this information can be used to carry out trend analysis and
"what-if" analyses.
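The kind of end-user trend query described in this section can be sketched in a few lines. The sketch below is illustrative only: the fact rows and column names (species, year, quarter, catch weight) are hypothetical stand-ins, not FIRRI's actual warehouse schema.

```python
from collections import defaultdict

# Hypothetical cleaned fact rows as they might sit in a warehouse:
# (species, year, quarter, catch_weight_kg). Names are illustrative only.
fact_rows = [
    ("Oreochromis niloticus", 2004, 1, 120.5),
    ("Oreochromis niloticus", 2004, 2, 98.0),
    ("Lates niloticus",       2004, 1, 310.2),
    ("Oreochromis niloticus", 2005, 1, 150.75),
    ("Lates niloticus",       2005, 1, 280.0),
]

def yearly_trend(rows, species):
    """Ad hoc trend query: total catch weight per year for one species."""
    totals = defaultdict(float)
    for sp, year, _quarter, weight in rows:
        if sp == species:
            totals[year] += weight
    return dict(sorted(totals.items()))

print(yearly_trend(fact_rows, "Oreochromis niloticus"))
# {2004: 218.5, 2005: 150.75}
```

Because the warehouse keeps history, the same query supports a "what-if" style question simply by filtering on a different period or species.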
1.2 Problem Statement
FIRRI scientists are required to produce field reports after every field trip. The institute is also required
to come up with quarterly reports, and an annual report, detailing the activities performed within the
period, as well as packaged information for the stakeholders in the fisheries sector. Under the current
setup, in which information and data are scattered among different functional areas, integration of data,
compilation of reports and packaging of information for stakeholders is an uphill task. Dissemination
of information through community barazas, workshops, radio and TV shows does not enable real-time
provision of information, as one has to wait until such an event is organised before one can get access
to the information. In addition, the current information system cannot handle complicated ad hoc
enquiries such as cross-tabulation.
1.3 Objectives
1.3.1 General Objective
To develop a Data Warehousing information system that supports fisheries data and information storage,
and retrieval, from a centralised location.
1.3.2 Specific Objectives
i. To review work similar to, and literature related to, Data Warehousing in fisheries
ii. To design a Data Warehousing system for centralised storage and retrieval of fisheries data
iii. To implement a Data Warehousing system for centralised storage and retrieval of fisheries data
iv. To validate the fisheries Data Warehousing system developed
1.4 Justification of the study
A data warehouse system will provide a centralised location for data and information storage and re-
trieval and a range of ad hoc and standardised query tools, analytical tools, and graphical reporting
facilities for data mining. These tools will perform high-level analyses of hidden patterns, relation-
ships, or trends, and will drill into more detail where needed. The patterns inferred from the data could
be used to predict future behaviour and guide decision-making. The data warehouse will increase
information availability and the efficiency, scope and accuracy of scientific reporting, and provide
new opportunities for reaching out and passing on information to the fisher community via the
Internet. Sugiyama (2005) [39] believes that with more accurate and timely information at the community
level, the public is likely to be better informed and supportive of efforts to manage fisheries and aquatic
resources in a responsible manner. She claims that disseminating timely and readily understandable in-
formation on the status and trends of fisheries should help ensure transparency in fisheries management,
as called for by the Code of Conduct for Responsible Fisheries.
1.5 Scope
Conceptually, the study will focus on the design, development, and implementation of a data warehouse
and data mining system that can enhance data analysis and information dissemination from FIRRI.
Geographically, the study will focus on the Fisheries Resources Research Institute.
Chapter 2
Literature Review
2.1 Introduction
This chapter reviews the works of various writers that are deemed relevant to the study.
2.2 The National Fisheries Resources Research Institute (FIRRI)
Established in 1947, the National Fisheries Resources Research Institute (FIRRI) is a semi-autonomous
Public Agricultural Research Institute (PARI) of Uganda operating under the National Agricultural
Research System (NARS) (FIRRI, 2006) [14]. As the fisheries research arm of NARO, FIRRI's
research currently focuses on providing information for increasing and sustaining fish production and
utilisation (FIRRI, 2004; FIRRI, 2006 [14]).
FIRRI has its headquarters in Jinja and an outstation at Kajjansi, where the scientific work is organised
according to disciplines such as: stock assessment (fish biomass, exploitation rates, etc.); fish biology
and ecology (biodiversity and conservation); fish habitat quality and quantity, distribution, food webs,
physico-chemical characteristics and primary production (water quality); invertebrate studies and food
webs; wetlands; aquatic weeds such as water hyacinth; socio-economics (livelihood analysis and
co-management); and aquaculture/fish farming (seed production, feeds (live and commercial feeds),
and pond management/commercial aquaculture) (FIRRI, 2003 [11]; FIRRI, 2006 [14]).
2.3 Data Warehousing
2.3.1 The Data Warehousing Concept
Data warehousing is the process of collecting data to be stored in a managed database in which the data
are subject-oriented and integrated, time variant, and nonvolatile for the support of decision making
(Chan, 1999) [3]. Data from the different operations of a corporation are reconciled and stored in a
central repository (a data warehouse) from where analysts extract information that enables better deci-
sion making (Cho and Ngai, 2003) [4]. Data can then be aggregated or parsed, and sliced and diced as
needed in order to provide information (Fox, 2004) [15].
According to Inmon (1993) [18], a Data Warehouse is a subject-oriented, integrated, time-variant,
nonvolatile collection of data used in support of decision-making processes. "Subject-oriented" means
that a data warehouse focuses on the high-level entities of the business (Chan, 1999) [3] and that the
data are organised according to subject (Zeng et al., 2003 [43]; Ma et al., 2000 [24]).
For example, fisheries data would be organised by fish species, water body, or type of fishing gear.
"Integrated" means that the data are stored in consistent formats, naming conventions, measurement
of variables, encoding structures, physical attributes, and domain constraints (Ma et al., 2000 [24];
Chan, 1999 [3]; O'Leary, 1999 [30]). For example, whereas an organisation may have four or
five unique coding schemes for ethnicity, in a data warehouse there is only one coding scheme (Chan,
1999) [3].
"Time-variant" means that warehouses provide access to a greater volume of more detailed information
over a longer period (Zeng et al., 2003) [43] and that the data are associated with a point in time
(Chan, 1999 [3]; O'Leary, 1999 [30]), such as a month, quarter, or year. Warehouse data are
"nonvolatile" in that data that enter the database are rarely, if ever, changed once they are entered into
the warehouse (Zeng et al., 2003 [43]; Chan, 1999 [3]). The data in the warehouse are read-only;
updates or refreshes of the data occur on a periodic, incremental, or full-refresh basis (Zeng et al.,
2003) [43].
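The "integrated" property can be sketched as a small transformation step: inconsistent source-system codes are mapped to the single coding scheme used in the warehouse. The gear codes and names below are hypothetical examples, not FIRRI's actual coding schemes.

```python
# Hypothetical source-system codes mapped to one canonical warehouse scheme.
GEAR_CODES = {
    "GN": "gillnet",      "gill net": "gillnet",        # source systems A and B
    "BS": "beach_seine",  "beach-seine": "beach_seine",
    "LL": "longline",     "long line": "longline",
}

def integrate_gear(raw_code):
    """Return the single warehouse code for a raw source value, or 'unknown'."""
    key = raw_code.strip()
    if key in GEAR_CODES:
        return GEAR_CODES[key]
    return GEAR_CODES.get(key.lower(), "unknown")

print(integrate_gear("GN"))        # gillnet
print(integrate_gear("Gill Net"))  # gillnet
print(integrate_gear("??"))        # unknown
```

In the report's implementation this role is played by SSIS transformations; the sketch only illustrates the idea of one coding scheme per concept.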
2.3.2 Data Warehouse Design Model
Data Warehouses typically use multidimensional and relational storage structures (Bose, 2006) [2],
whose models are developed using cube and star schemas (Velasquez et al., 2005) [42]. The
multidimensional structure physically stores the data in array-like structures that are similar to a data
cube. In the relational structure the data is stored in a relational database using the star and snowflake
schemas (Bose, 2006 [2]; O'Leary, 1999 [30]).
Bose (2006) [2] observed that summary data are modelled as a multidimensional data cube consisting
of measure and dimension attributes. With the support of OLAP for multidimensional analysis, users
can synthesise enterprise information through comparative customised viewing, and analyse historical
and projected data (Ma et al., 2000) [24]. Bose (2006) [2] noted that at the instance level the values
of the dimension attributes are assumed unique to determine the values of all measure attributes. He
affirms that a multidimensional data cube consisting of dimension and measure attributes is called a
fact table. In addition, a multidimensional data cube contains a dimension table for each dimension
attribute in the star schema (Bose, 2006) [2]. O’Leary (1999) [30] observes that at the centre of the
star is the event table (the fact table), and surrounding the event, at the points of the star, are dimension
tables containing the resources, time, and location dimensions.
O’Leary (1999) [30] noted that fact tables hold particular measures of the event, and include foreign
key references to dimension tables at each of the points on the star. He also observed that the par-
ticular process being modelled influences which resources, events, agents or locations are included in
the dimensions and the number of tables used to represent each. O’Leary (1999) [30] observed that
dimension tables describe the properties of the dimensions at hand, and are kept on each dimension that
decision makers would like to either rollup or drill down. He noted that in some situations there is a
need to generate additional tables from some of the dimensions, resulting in a snowflake schema.
The snowflake schema is a star schema whose dimension tables are normalised (Bose, 2006) [2], and
whose dimensions have embedded foreign keys so that dimension tables have relationships with other
dimension tables, creating tables for attributes within a dimension table (O’Leary, 1999) [30]. A DW
design is often built around a time dimension so that the DW contains data over several periods of
time. This feature allows users to perform extensive yearly, quarterly, and monthly analyses that help
enable the identification of patterns and trends (Theodoratos and Sellis, 1999) [41]. O’Leary (1999)
[30] noted that use of the star or snowflake schemas is aimed at limiting access and query problems in
a data warehouse environment.
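The star schema just described can be sketched with one fact table and two surrounding dimension tables. The table and column names below are assumed for illustration and are not the report's actual FIRRI design.

```python
# Minimal star-schema sketch: a fact table holds measures plus foreign keys
# (surrogate keys) into dimension tables. All names are hypothetical.
date_dim = {          # surrogate key -> (year, quarter)
    1: (2005, 1), 2: (2005, 2),
}
species_dim = {       # surrogate key -> species name
    10: "Oreochromis niloticus", 11: "Lates niloticus",
}
catch_fact = [        # (date_key, species_key, catch_weight_kg)
    (1, 10, 120.0), (1, 11, 300.0), (2, 10, 95.5),
]

def rollup_by_quarter(species_name):
    """Join the fact to its dimensions and roll catch weight up to quarter."""
    out = {}
    for date_key, species_key, weight in catch_fact:
        if species_dim[species_key] == species_name:
            quarter = date_dim[date_key]          # (year, quarter) grain
            out[quarter] = out.get(quarter, 0.0) + weight
    return out

print(rollup_by_quarter("Oreochromis niloticus"))
# {(2005, 1): 120.0, (2005, 2): 95.5}
```

Surrounding the fact with a date dimension is what enables the yearly and quarterly analyses mentioned above: rolling up simply means grouping by a coarser attribute of that dimension.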
2.3.3 Data Warehouse Structure and Tools
Ma et al. (2000) [24] observed that the data warehouse has a distinct structure whose components
include current detail data, older detail data, lightly summarised data, highly summarised data, and
meta-data. The current detail data reflect the most recent happenings, are stored on disk, and are
accessed by end-user analysts (Ma et al., 2000) [24]. Meta-data are data that describe the meaning
and structure of business data, as well as how they are created, accessed, and used (Devlin, 1997) [6].
Meta-data describe what is in the data warehouse, specify what comes into and goes out of it, schedule
extracts based on a business-events schedule, document and monitor data synchronisation requirements,
and measure data quality (Ma et al., 2000) [24].
Chan (1999) [3] observed that Internet-based Decision Support Systems (DSS) and Executive
Information Systems (EIS) can be built on data warehouses to support distributed decision processes.
Web-based multidimensional on-line analytical processing (MOLAP) systems enable users to view
summary data, zooming in on details by column, by row, or by cell displayed on multi-layer
spreadsheets (Chan, 1999) [3]. According to Chan (1999) [3], this "slice and dice" capability enables
users to examine data horizontally, and changes in aggregated performance data can be traced back
to unit-level productivity. He observed that in a networked environment, this means that decision
makers can link forecasting with operational data in a dynamic manner.
According to Pipe (1997) [32], a warehousing system has: design tools to design warehouse databases;
source data acquisition tools to capture data from source tables and databases, and clean, enhance, trans-
port and apply it to data warehouse databases; a data manager to manage and access warehouse data;
graphic user interface (GUI) and Web-based data access tools to provide end-users with tools they need
to access and analyse warehouse data; a delivery manager to distribute warehouse data and other infor-
mation objects to other data warehouses, desktop applications, and Web servers on a corporate Intranet;
middleware to connect data access tools to warehouse databases, and the delivery manager to target
systems; an information directory to provide administrators and business users with information about
the contents and meaning of data stored in warehouse databases; and warehouse management tools to
administer data warehouse operations. GUIs use multimedia to enhance the impact of information and decision-making support generated through data warehousing (Ma et al., 2000) [24].
2.3.4 Data Mart
A data mart is a subset of the enterprise-wide data warehouse (O'Leary, 1999 [30]; Poe et al., 1998 [33]; Singh, 1998 [38]). Unlike the data warehouse, which is traditionally meant to address the needs of the organisation from an enterprise perspective, a data mart has a limited scope and performs the role of a departmental, regional or functional data warehouse (Bose, 2006 [2]; Singh, 1998 [38]; Poe et al., 1998 [33]). According to Bose (2006) [2], the difference between an enterprise DW and a data mart is
essentially a matter of scope. Because data marts are developed for specific business purposes, system
design, implementation, testing and installation are less costly than for data warehouses (O’Leary,
1999) [30]. O’Leary (1999) [30] observed that where data warehouses can take years to develop, data
marts can be developed in a few months, at a much smaller cost. A data mart often uses aggregation or
summarisation of the data to enhance query performance.
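The effect of such summarisation on query work can be sketched with a small illustrative example; the table and column names below are invented for illustration and are not part of FIRRI's design. End-user queries scan the small pre-computed summary rather than the detail table:

```python
import sqlite3

# Hypothetical detail table and pre-aggregated summary, illustrating how
# a data mart can summarise detail records to speed up common queries.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Detail-level catch records (illustrative columns only).
cur.execute("CREATE TABLE catch_detail (year INTEGER, species TEXT, weight_kg REAL)")
cur.executemany(
    "INSERT INTO catch_detail VALUES (?, ?, ?)",
    [(2004, "Nile perch", 12.5), (2004, "Nile perch", 8.0),
     (2004, "Tilapia", 3.2), (2005, "Tilapia", 4.1)],
)

# The data mart stores a pre-computed summary, so end-user queries
# scan far fewer rows than the detail table.
cur.execute(
    """CREATE TABLE catch_summary AS
       SELECT year, species, SUM(weight_kg) AS total_kg, COUNT(*) AS samples
       FROM catch_detail GROUP BY year, species"""
)

for row in cur.execute("SELECT * FROM catch_summary ORDER BY year, species"):
    print(row)
```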
2.3.5 Current Approaches to Data Warehouse (DW) Development
A number of ways can be used to build a DW. An organisation may either build a single DW, or have a multi-tier data warehouse system (Bose, 2006) [2]. In the single data warehouse architecture there is one centralised data warehouse into which source systems feed their data directly, and from which the end users obtain data and/or information. In the multi-tier warehousing system, an enterprise data warehouse coexists with several data marts (Bose, 2006 [2]; Pipe, 1997 [32]). In this system one can have either an independent or a dependent data mart architecture (Zeng et al., 2003 [43]; Bose, 2006 [2]). In the independent data mart architecture, the source systems feed the data marts, and the warehouse is fed by the data marts (Bose, 2006 [2]; Zeng et al., 2003 [43]). According to Bose (2006) [2] and Zeng et al. (2003) [43], the dependent data mart architecture has a central data warehouse that contains the "corporate view of the data" and supplies the departmental data marts with the specific data they require.
Bose (2006) [2] observed that the variations on the multi-tier approach that have been implemented in
organisations are top-down, bottom-up and hybrid. With the top-down approach, data marts are seen
as follow-on to the construction of an Enterprise Data Warehouse (EDW) (Atkinson, 2001 [1]; Pipe,
1997 [32]). In this implementation approach, data flows from the source to enterprise warehouse to
data marts, and the implementation follows the waterfall approach (Pipe, 1997) [32].
The bottom-up approach is to first build data marts and then an EDW (Atkinson, 2001 [1]; Pipe, 1997 [32]). The enterprise warehouse evolves bottom-up as a new layer on top of existing data marts: the data marts are loaded directly from source systems, and the enterprise warehouse is loaded from the data marts (Bose, 2006) [2]. In this case the corporate data warehouse project begins with a small pilot
project for a specific subject area. Bose (2006) [2] affirms that in so doing, both a data mart and the
first data warehouse are created simultaneously.
Bose (2006) [2] and Pipe (1997) [32] noted that the hybrid approach, or parallel strategy, might include
elements of both the top-down and bottom-up approaches. They argue that in this approach the enter-
prise model is developed first and documented at a high level, so certain subject areas may be modelled
in more detail as warehouse development proceeds. In this approach, therefore, the data warehouse is
developed incrementally (Pipe, 1997) [32].
2.3.6 Analysis of the Current Approaches to Data Warehouse Development
Bose (2006) [2] says that the implementation of a single Data Warehouse establishes a single, reliable
source for data and provides a more integrated solution for reporting and decision support across functional areas. However, a single warehouse is not well suited to highly specialised data needs (Bose, 2006) [2]. Bose
(2006) [2] believes that the data mart solution, where a data warehouse coexists with data marts, may
be well suited for highly specialised data needs. Pipe (1997) [32] affirms that in the multi-tier ware-
house architecture involving an EDW and underlying data marts, data is located where it can deliver
the highest availability and performance, without sacrificing integrity or control over the management
of corporate data for business decision-making. Bose (2006) [2] and Pipe (1997) [32] claim that in the long run, a multi-tier warehouse architecture/system is the optimal one.
Both the top-down and bottom-up approaches to data warehouse development have their strengths and
weaknesses. The advantage of the top-down implementation approach is that it leads to a planned, integrated multi-tier solution, and improves the consistency of information in the data marts (Bose, 2006
[2]; Atkinson, 2001 [1]; Pipe, 1997 [32]). However, a top-down approach can create problems when
the data marts are added later and it cannot deliver solutions fast enough for an organisation to quickly
exploit new business opportunities (Atkinson, 2001) [1]. Bose (2006) [2] and Pipe (1997) [32] argue
that this approach usually takes more time and is relatively costly.
Bose (2006) [2] and Pipe (1997) [32] point out that the bottom-up approach gives quick results and a
high return on investment. However, if the spread of data marts in this approach is not controlled, there
can be integration problems between the data marts and the future EDW (Atkinson, 2001) [1]. Bose (2006) [2] believes that the bottom-up approach eventually yields a disintegrated warehouse because the
data marts often do not conform to a common model. Pipe (1997) [32] concludes that an ideal solution
to the top-down and bottom-up approaches would be a synergistic marriage of the two approaches
to maximise the strengths, and minimise the weaknesses, of each approach. Pipe (1997) [32] claims
this strategy supports incremental and evolutionary data warehouse development. Bose (2006) [2] advises that during development, too much must not be taken on at once, as this can leave users feeling abandoned and the development team overwhelmed. He believes that an incremental approach yields
the best results.
2.4 Data Mining
Data mining is the process of applying artificial intelligence techniques to large data sets in order to determine data patterns (Ma et al., 2000) [24] and extract previously unknown but significant information (Singh, 1998) [38]. On the front-end (client side), data-mining tools allow users to analyse the contents of the data warehouse via graphical, tabular, geographic, and syntactic reports. The front-end data-mining tool provides the user with an intuitive, graphical tool for creating new analyses and navigating the data warehouse. This helps focus the user's analysis so that relevant information can be obtained faster and more effectively (Ma et al., 2000) [24].
Data mining applications utilise information stored in the warehouse to generate business-oriented, end-user-customised information (Ma et al., 2000) [24], and statistical summaries from different views of data (Cho and Ngai, 2003) [4]. They can be applied in conjunction with OLAP to form an integrated business solution (Cho and Ngai, 2003) [4]. Data mining is critical to the enterprise that wants to exploit operational and other available data to improve the quality of decision-making and gain critical competitive advantages (Ma et al., 2000) [24]. Accurate data identification and analysis improve the quality of decision-making; strong navigation, computation, and synthesis capabilities make it possible to gain critical competitive advantages; and relevant information is obtained faster and time is used more effectively (Ma et al., 2000) [24].
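As a toy illustration of the kind of pattern extraction described above, a least-squares trend line can be fitted to yearly catch totals; the figures below are invented for the example and are not FIRRI data:

```python
# Toy illustration of extracting a pattern from warehouse data: fitting a
# least-squares trend line to yearly catch totals (figures are invented).
yearly_catch = {2001: 120.0, 2002: 110.0, 2003: 95.0, 2004: 90.0}

xs = list(yearly_catch)             # years
ys = [yearly_catch[x] for x in xs]  # catch totals
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Ordinary least-squares slope of catch against year.
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))

print(slope)  # a negative slope indicates declining catches
```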
2.5 Some Fisheries Data Warehousing Projects
A number of countries have embraced the data warehousing and data mining concepts in their fisheries
sector. NetCoast (2001) [29] observed that the European Union implemented COASTBASE, a virtual
coastal and marine data warehouse for integrated, distributed information search, access and feedback.
The CoastBase client (based on HTML and Java) provides uniform, multilingual and interactive access
to all CoastBase services. The ultimate aim of the project was to improve marine and coastal research,
assessment, policy making and cooperation along Europe’s coasts.
Rees and Finney (2000) [34] report that CSIRO Marine Research (CMR), Australia, developed a data
warehouse using ORACLE 8i. The client software is written in Java and uses Java’s Remote Method
Invocation (RMI) and Java Database Connectivity (JDBC) to connect to the underlying ORACLE data
store. The database schema has marine, biological, chemical and physical oceanographic parameters,
and is designed so that sampled parameters are primarily referenced via spatial coordinates and a time
stamp. This allows users to examine integrated datasets according to spatial and temporal constraints. For
example, a biologist interested in species distribution in a particular geographic area can also acquire
any available habitat data (e.g. water column parameters and seafloor sediment composition) for that
region and time period, a feature that may be important if one is looking for any correlations between
habitat type and species distribution. This is also useful if models are to be employed in order to in-
terpolate between known data points to produce a species distribution map, or to plot the potential
distribution of a species based on its known habitat or other environmental surrogate. Users invoke the
data warehouse interface via CMR’s web page.
Kupca (2004) [22] reports that in Iceland, the Marine Research Institute, Reykjavik, developed a fish-
eries data warehouse structured around 48 tables that include biological sample data, catch data, stom-
ach data, tagging data and incomplete data that do not fit the common DW structure. To make the DW
portable and platform independent, the Linux operating system and PostgreSQL RDBMS were cho-
sen. PHP was used as the programming language to develop a web-based interface and the upload and
extraction parts. An SQL command sent to the database retrieves and presents its result in an HTML
table. To ease the use of the DW, there are predefined table aliases and groups of useful joins that enable
composition of complex multiline queries within seconds. Metadata are split into five topics (biological
samples data, stomach data, catch data, acoustic data and tagging data) and include the information on
time, position and species in each topic.
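The interface style described above, an SQL command whose result set is presented as an HTML table, can be sketched as follows. The sketch is in Python rather than the PHP actually used at the Marine Research Institute, and the table and column names are invented:

```python
import sqlite3

# Minimal sketch of rendering an SQL query result as an HTML table for a
# web-based interface (illustrative data, not the Icelandic DW schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (species TEXT, length_cm REAL)")
conn.executemany("INSERT INTO samples VALUES (?, ?)",
                 [("Cod", 54.0), ("Haddock", 41.5)])

def sql_to_html(conn, sql):
    """Run a query and return its result set as a simple HTML table."""
    cur = conn.execute(sql)
    header = "".join(f"<th>{c[0]}</th>" for c in cur.description)
    body = "".join(
        "<tr>" + "".join(f"<td>{v}</td>" for v in row) + "</tr>"
        for row in cur
    )
    return f"<table><tr>{header}</tr>{body}</table>"

html = sql_to_html(conn, "SELECT species, length_cm FROM samples")
print(html)
```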
Fisheries data warehouses have been put to a variety of uses. Scottish Executive Publications (2006) [36] asserts that the IFISH data warehouse brings together fisheries data as a shared resource and has resulted in a substantial reduction in the burden on each department to produce data for the other. The
Government of British Columbia (2006) [16] reports that their webpage FishInfo BC provides on-line access to the British Columbia Fisheries Data Warehouse and to federal-provincial fisheries datasets, where all data are linked to "active" maps and to standard tables and reports that allow users to choose exactly what they want to know about any location, and then print their own personal reports. In support of India's development of a DW that includes fisheries among other agricultural disciplines, Sharma et al. (2006) [37] claimed that a DW can improve the quality of research and planning, reduce the
duplication of research efforts, encourage dissemination of research findings, and facilitate qualitative
research supported by agricultural databases. Therefore, development of a data warehouse that has data
mining capabilities would go a long way in improving fisheries management in the Ugandan fisheries
sector.
Chapter 3
Methodology
3.1 Introduction
This chapter presents the approach used to undertake the fisheries data warehouse development project. The case study method was used because case studies normally involve cross-sectional research focused on subject variables, which helps capture the gist of the study. According to Olsen and Marie (2004) [31], cross-sectional research gives better subject selection and measurements. The project was implemented
in three main phases: system study and analysis; system design; and system development. The means
used to validate the system under development is also included.
3.2 System Study and Analysis
According to Kakinda (2000) [19], research design is the structure or nature of research, which may
either be qualitative or quantitative. A qualitative approach was used to evaluate the information system, datasets, and procedures pertaining to the management of research work in FIRRI. Fact finding was
based on:
1. Interviews carried out with the staff of the National Fisheries Resources Research Institute (FIRRI). Sample interview questions are presented in Appendices 1, 2, and 3.
2. Document analysis: a number of documents were analysed so as to gain more understanding of
the type and contents of the reports required of the data warehousing system. Documents studied
included FIRRI’s Annual reports for the years 1997 - 1998, 1999 - 2000, 2000 - 2001, 2002 -
2003, 2003 - 2004, and 2004 - 2005; field reports; as well as FIRRI’s Survey Report on its study
of the Upper Victoria Nile River under the Bujagali Hydroelectric Power Project (NARO/FIRRI,
2001) [28].
3.3 System Design
The methodology used to design the dimensional model was adapted from Kimball (1996) [20] and
Connolly and Begg (2001) [5]. First, the subject matter for the data mart was identified and the grain
of the fact table (what a fact table record represents) decided. The grain of the fact table determined the
minimum level at which data was referenced, and also enabled the identification of the dimensions, as
well as the grain of each of the dimension tables. The dimensions were then identified and conformed.
For each dimension chosen, all dimensional attributes that filled out each dimensional table were de-
scribed.
Next, the facts that populate each fact table record were chosen. Facts comprised numeric additive quantities, and were expressed at the level implied by the grain. Once fact tables had been selected, each fact table was re-examined to determine whether there were opportunities to use precalculations. This applied to those values that might be incorrectly derived by users. As many text descriptions as possible were then added to the dimension tables. The duration of the database, that is, how far back in time the fact table goes, was then chosen. All records related to an old attribute name were linked to that old attribute name, and those related to the new attribute name were accordingly linked to it, so as to track slowly changing dimensions. Query priorities and the query modes were then decided.
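The ordering of the design steps above (grain first, then dimensions, then facts) can be illustrated with a minimal star schema for the catch subject area. The table and column names below are simplified assumptions for illustration, not the schema actually implemented:

```python
import sqlite3

# Illustrative star schema: one fact table at the chosen grain, joined to
# two of the dimensions. Names are assumptions, not FIRRI's actual design.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER, month INTEGER);
CREATE TABLE dim_species (species_key INTEGER PRIMARY KEY, scientific_name TEXT, common_name TEXT);

-- Grain: one row per species per catch sample.
CREATE TABLE fact_catch (
    date_key    INTEGER REFERENCES dim_date(date_key),
    species_key INTEGER REFERENCES dim_species(species_key),
    weight_kg   REAL,      -- additive measure
    fish_count  INTEGER    -- additive measure
);
""")

conn.execute("INSERT INTO dim_date VALUES (1, 2005, 2, 4)")
conn.execute("INSERT INTO dim_species VALUES (1, 'Lates niloticus', 'Nile perch')")
conn.execute("INSERT INTO fact_catch VALUES (1, 1, 15.0, 3)")

# A typical dimensional query: totals by year and common name.
row = conn.execute("""
    SELECT d.year, s.common_name, SUM(f.weight_kg), SUM(f.fish_count)
    FROM fact_catch f
    JOIN dim_date d    ON f.date_key = d.date_key
    JOIN dim_species s ON f.species_key = s.species_key
    GROUP BY d.year, s.common_name
""").fetchone()
print(row)
```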
3.4 System Development
System development involves the implementation, testing and refinement of the system. The data ware-
house was developed iteratively using the Data Warehouse Lifecycle based on Zachman’s approach. A
multi-tier warehouse architecture involving an EDW and underlying data marts was developed using
the hybrid/parallel approach to data warehouse development (Bose, 2006 [2]; Atkinson, 2001 [1]; Pipe,
1997 [32]). The project started with the development of a data mart for the Fish biology and ecology
research discipline in FIRRI. The steps to developing a data warehouse/mart as advocated by Roland
and Leonard (2005), Velasquez et al. (2005) [42], and Chan (1999) [3] were considered during the
development process. Sample data was run through the system to establish whether it was functioning
as required.
3.5 System Validation
Validation entails the confirmation by examination and provision of objective evidence that an infor-
mation system has been implemented correctly and that it conforms to user needs and intended uses.
During design and development planning, the validation plan was developed to identify required vali-
dation tasks and procedures for reporting anomalies and their resolution. In the requirements definition
phase, testable user and functional requirements for the data warehouse were established.
During the design phase, care was taken to ensure that the software development and management
procedures were consistent with accepted practices. At the implementation phase, functional testing
was performed to check if the system performs functions as specified in the functional specifications.
To facilitate tracking and problem resolution processes, each batch of input data extracted was assigned
a unique identifier linking it back to the source. The system was also fitted with a log file as an indirect
link between the source and the input transaction. The mapping utilised by the ETL tool was reviewed, with care taken to ensure that the data loaded into each data element was in fact sourced from the right tables in the source systems. Each data element was given a formal description and
a mapping back to the source table(s) used to populate it during the ETL process. Simple database
queries were run on the tables in the warehouse to count the number of records in the data warehouse.
These counts were then compared with the number of data entries in the source systems. Equality of
these counts led to the assumption that records were not left out due to an error during the ETL or
simple load process. This was further verified by the lack of errors (not necessarily warnings) in the
exception reporting by the ETL tool. For additional verification, actual rows from both the source and
data warehouse tables were randomly selected, printed, and listed side by side for comparison.
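The count-comparison check described above can be sketched as follows, with purely illustrative data:

```python
# Minimal sketch of the count-based ETL check: the number of rows loaded
# into a warehouse table is compared with the number of entries in its
# source, and a mismatch flags a possible load error.
def counts_match(source_rows, warehouse_rows):
    """Return True when source and warehouse row counts agree."""
    return len(source_rows) == len(warehouse_rows)

source = [("2005-04-01", "Nile perch", 12.5), ("2005-04-01", "Tilapia", 3.2)]
loaded = [("2005-04-01", "Nile perch", 12.5), ("2005-04-01", "Tilapia", 3.2)]

assert counts_match(source, loaded)           # all records arrived
assert not counts_match(source, loaded[:1])   # a dropped record is caught
```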
Chapter 4
Implementation
4.1 Introduction
This chapter presents the basis for understanding and implementing the fisheries data warehousing project. The findings of the requirements elicitation, the analyses of the findings, and the subsequent use of the findings to develop the system are presented.
4.2 System Analysis
4.2.1 Fisheries Data
Fisheries data typically comprises information on the activity of fisherfolk and their catches, plus results
of scientific surveys aimed at learning more about the biology, population dynamics, and movements
of the species concerned. This information is then used by the fisherfolk and fisheries managers to
anticipate the most favourable conditions and locations for fishing, and thereby maximise the catches
while reducing effort. The data can also be used to conduct independent assessments of stocks and modelling of the resource dynamics, so as to be able to either support, confirm or dispute the soundness of decisions made by the relevant fishery managers.
Fisheries data includes: reports and information summaries on catch and landings data; scientific observational data; and environmental data. Summary (and derived) data includes aggregated statistics by region, season, lengths of fish caught, etc. Catch-and-effort data includes information on the fishing activity of the fishermen (boat movements; hours, locations and depths fished; gear type used, etc.). Scientific survey data is similar to the catch-and-effort data from commercial operations but is less biased towards areas where catches would be expected to be highest. Biological data may be collected on commercial boats or scientific surveys. Environmental data is ancillary data, such as water temperatures and other hydrologic conditions, which may provide insight into the biological patterns observed.
4.2.2 Usage of FIRRI’s Information System
Information gathering, analysis and dissemination in FIRRI are shared among its eight disciplines. The most important discipline is, reportedly, the Fisheries Biology and Ecology Discipline. Field data is
mainly obtained on a quarterly basis, though sometimes data is obtained monthly. This data is stored
and used to perform routine analyses and produce standardised reports such as: field reports; quarterly
reports; and annual reports. Workshop papers and papers meant for scientific publications are also pre-
pared from the reports generated from the system.
Reports prepared by FIRRI's Fish Biology and Ecology discipline are aimed at answering questions on the structure of fish stocks and how this varies with location; and the life history of fish species,
particularly with regard to age and growth, recruitment to the fishery, reproductive biology, migration
and other movement patterns, diet and place in the ecosystem, and natural mortality in the absence
of fishing pressure. This information is used to prepare brochures, enact legislation aimed at fisheries
management and conservation, monitor the environmental conditions and fish habitats in the different
water bodies, regulate the fishing effort, and recommend to stakeholders the best fishing practices that
may lead to sustainable exploitation of the fish resources.
The stakeholders in the fisheries sector, who need and make use of the information generated and pack-
aged in FIRRI, include: the fisherfolk; The National Agricultural Research Institute (NARO); Interna-
tional and regional collaborators such as research institutions around Lake Victoria [The Kenya Marine
Fisheries Research Institute (KMFRI) in Kenya, The Tanzania Fisheries Research Institute (TAFIRI)in
Tanzania], and The Lake Victoria Fisheries Organisation of the East African Community; the Uganda
Fisheries Department; several departments at Makerere University, such as Zoology department and
The Makerere University’s Institute of Environment and Natural Resources; NGOs such as the Uganda
Fisheries and Fish Conservation Association (UFFCA); legislators in Uganda’s parliament; schools;
and the general public.
Currently, data is mainly stored in Excel files on desktop computers scattered among the different functional areas of the institute. Some historical data are still being stored in paper files, though efforts are being made to transform them into an electronic form for storage in a relational database. Data analysis is being carried out using Microsoft Excel, SPSS, and other statistical packages. ArcView GIS is being used to present some of the results from the analysis. Most reports are written using Microsoft Word.
4.2.3 Functional Requirements
There are a number of functionalities expected of any information system aimed at improving the
current information management in FIRRI. The system should:
1. be able to extract data from various files in different storage areas and store them in a centralised
location from where data and information can be retrieved;
2. be able to generate fisheries reports directly from the system;
3. be able to store a massive amount of data over a long period of time so as to enable trend analysis;
4. have an allowance for occasional loading of lump-sum data in the event that a lot of data is accumulated during a given quarter.
4.2.4 Non-functional Requirements
The four major non-functional requirements are: (i) system accessibility, (ii) system security, (iii) software operability, and (iv) system performance.
1. System accessibility: any end-user should be in a position to access dynamic reports that have resulted from the analysis of fisheries data.
2. System security: depending on repository content, the system should provide for differing levels
of access to repository content.
3. Software operability: the initial system should be able to make use of the software environment within FIRRI, and therefore be able to run on the Windows operating system.
4. System performance: the system should be able to handle at least 40 concurrent end-users.
4.2.5 User Requirements
The users require a system with:
1. A facility for generation of fisheries reports;
2. Ability to centralise data and information retrieval;
3. A provision for aggregations and generating summaries;
4. Ability to carry out trend analysis;
5. Ability to project trends;
6. Reliability of at least 98 percent uptime.
4.2.6 System Requirements
Since the warehouse database software should run on the Windows operating system platform, Microsoft SQL Server 2005 is recommended. Microsoft SQL Server 2005 has the SQL Server Management Studio and SQL Business Intelligence Development Studio, which provide the SQL Server 2005 Integration Services (SSIS) and SQL Server 2005 Analysis Services (SSAS) that are ideal for warehouse development. To run SQL Server 2005, the following hardware and software are required.
1. VGA or higher resolution;
2. A Microsoft mouse or compatible pointing device;
3. Microsoft Internet Explorer 6.0 SP1 or later;
4. Internet Information Services (IIS) 5.0 or later;
5. ASP.NET 2.0;
6. Windows Installer 3.1 or later;
7. Microsoft Data Access Components (MDAC) 2.8 SP1 or later;
8. Itanium processor or higher;
9. Minimum Processor speed of 1 GHz;
10. Memory (RAM) of at least 512 MB;
11. Windows 2003, or higher, Operating system.
4.3 System Design
In light of the informational content and the nature of the analysis required to produce information, the system that best redresses the shortcomings of the information system in FIRRI is a data warehousing system. The architectural design of the new system, which shows how data flows throughout the system, is presented in Figure 4.1. The two processes a data warehouse undergoes are data loading (entry) and access. Loading is carried out using Extract, Transform and Load (ETL) tools, while warehouse data can be accessed using OLAP tools. Therefore, data will be entered into the FIRRI data warehouse using ETL tools that extract data already entered into operational systems.
In FIRRI’s architectural design, data is extracted from operational data sources that include the opera-
tional system in FIRRI, flat files, the internet, or decentralised databases located in the district fisheries
offices within the country. The extracted data will be loaded into the staging area, where it will be
cleaned and loaded into the data warehouse. The data in the warehouse will be in the form of meta-data, summary data, and raw data. The warehouse has a provision for archiving and backing up the data.
From the data warehouse, the information and data are made available to the data marts. The data marts are tailored around the different functional units within FIRRI, such as Aquaculture and Socioeconomics (Figure 4.1), or FIRRI's partners in fisheries information usage and delivery. End-users in FIRRI's different disciplines and partner institutions interact with the data marts and are then able to analyse or mine the data and produce their reports.
Figure 4.1: Warehouse Architecture for the FIRRI Fisheries Data Warehouse
The trigger for the ETL process will be changes and additions to source data, which will bring about a processing requirement for the data. The data profile for FIRRI's Fisheries Data Warehouse includes quarterly extractions of fisheries data and dimensional updates, and occasional monthly input of the
data. Therefore, in FIRRI, the Data Warehouse ETL will have a set of quarterly processing require-
ments, where changes and additions to source data will be extracted and processed through the system
quarterly.
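The quarterly extract-transform-load flow described above, with each batch tagged by an identifier that links it back to its source, can be sketched as follows. The file layout, column names, and identifier scheme are all assumptions made for the example:

```python
import csv
import io
import sqlite3

# Sketch of the quarterly ETL flow: extract rows from a source file,
# transform (clean) them, and load them into the warehouse with a batch
# identifier that links each record back to its extraction source.
source_csv = io.StringIO("species,weight_kg\nNile perch,12.5\ntilapia,3.2\n")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_catch (batch_id TEXT, species TEXT, weight_kg REAL)")

batch_id = "2005-Q2-catch"  # assumed identifier scheme, one per extraction
for rec in csv.DictReader(source_csv):                        # extract
    species = rec["species"].strip().capitalize()             # transform: tidy names
    weight = float(rec["weight_kg"])
    conn.execute("INSERT INTO fact_catch VALUES (?, ?, ?)",   # load
                 (batch_id, species, weight))

rows = conn.execute("SELECT * FROM fact_catch").fetchall()
print(rows)
```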
4.3.1 Logical Models
Being the most important discipline, the Fish Biology and Ecology discipline was chosen for develop-
ment of a data mart that will eventually lead to an Enterprise wide Data Warehouse (EDW) for FIRRI.
The data mart models consist of five fact tables and eight dimensions, found in the fish catch dimensional model (Figure 4.2), fish prey dimensional model (Figure 4.3), biology dimensional model (Figure 4.4), gonad dimensional model (Figure 4.5), and catch-length dimensional model (Figure 4.6). A given
fish species was taken as the grain of the catch fact table, while an individual fish specimen was taken
as the grain for the biology, gonad, catch-length and prey fact tables. The conformed dimensions are
date, geography, species, water body, catch type, project, and fishing gear dimensions.
4.3.2 Facts
The fact tables are the Catch, Catch-Length, Biology, Prey, and Gonad tables. The Catch fact table stores catch sample data. It comprises the additive measures weight of fish, number of fish, number of boat crew, and number of fishing gear (Figure 4.7). The Catch-Length fact table stores the length measurements of the sampled catch, and comprises the semi-additive fact length (Figure 4.8). The Biology
Figure 4.2: Fish Catch Model
Figure 4.3: Fish Prey Model
Figure 4.4: Fish Biology Model
Figure 4.5: Fish Gonad Model
Figure 4.6: Fish Catch-Length Model
Figure 4.7: Fish Catch Fact
fact table stores biological facts about the fish sampled, and comprises: the additive fact fish weight; the semi-additive fact total length; and the non-additive fact serial number (Figure 4.9). The Prey fact table contains data about the prey ingested by the fish. It comprises the additive facts predator-weight, prey-weight, total food weight, total food count, and prey count; the semi-additive fact predator total length; and the non-additive fact digestive state (Figure 4.10). The Gonad fact table stores gonadal statistics. It stores the semi-additive facts fish weight, number of gonads, gonadal weight, total length, and number of eggs counted, and the non-additive fact serial number (Figure 4.11). The attribute SourceID has been included in all fact tables to link their data back to the source of extraction.
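The practical difference between the additive and semi-additive facts can be shown with a small example (figures invented): weight may be summed across records, whereas length is only meaningful when averaged, not summed.

```python
# Sketch of why the additive/semi-additive distinction matters when
# aggregating: fish weight can be summed across any dimension, but
# summing lengths is meaningless, so length is averaged instead.
samples = [
    {"species": "Nile perch", "weight_kg": 12.5, "length_cm": 85.0},
    {"species": "Nile perch", "weight_kg": 8.0,  "length_cm": 70.0},
]

total_weight = sum(s["weight_kg"] for s in samples)                # additive
mean_length = sum(s["length_cm"] for s in samples) / len(samples)  # semi-additive

print(total_weight, mean_length)
```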
Figure 4.8: Fish Catch-Length Fact
Figure 4.9: Fish Biology Fact
Figure 4.10: Fish Prey Fact
Figure 4.11: Fish Gonad Fact
Figure 4.12: Date Dimension
4.3.3 Dimensions
Eight dimensions were identified among the five fact tables. The dimensions are Date, Water Body,
Catch Type, Species, Fishing Gear, Sex-Maturity, Prey Type, and Length dimensions.
Date Dimension
This dimension contains attributes that detail the time the data was collected. It has the levels Year, Half, Quarter, Month, and Date (Figure 4.12). The Month level has the attributes Month and MonthName. The attribute SourceID points to the data source.
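How these levels can be derived from a calendar date when the dimension is populated is sketched below; the attribute names are illustrative, not the implemented column names:

```python
import datetime

# Sketch of deriving the Date-dimension levels (Year, Half, Quarter,
# Month, MonthName, Date) from one calendar date.
def date_dimension_row(d):
    """Derive illustrative Date-dimension attributes for one date."""
    return {
        "date": d.isoformat(),
        "year": d.year,
        "half": 1 if d.month <= 6 else 2,
        "quarter": (d.month - 1) // 3 + 1,
        "month": d.month,
        "month_name": d.strftime("%B"),
    }

row = date_dimension_row(datetime.date(2005, 8, 17))
print(row)
```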
Water Body Dimension
This dimension contains attributes about the water body where the fish was caught. It includes attributes such as waterbody type, waterbody name, zone, station, location, and fishing area (Figure 4.13). The attribute SourceID points to the source of data extraction.
Catch Type Dimension
This dimension contains attributes detailing the nature of the sample, or catch type, and the fishing
Figure 4.13: Water Body Dimension
Figure 4.14: Catch Type Dimension
conditions prevailing at the time when the fish was caught. It includes the attributes catchtype, fish-
ing time, season, and moon (Figure 4.14). The attribute SourceID ties the data to its extraction source.
Species Dimension
Contains attributes detailing the hierarchy and levels in the nomenclature of a given fish species. It includes the scientific names, abbreviations of scientific names, and common names of attributes such as Kingdom, Phylum, Class, Order, Family, Genus and Species (Figure 4.15). The attribute SourceID
points to the source of the data.
Fishing Gear Dimension
Contains attributes detailing the characteristics of the fishing gear used to catch the fish sampled.
Attributes such as gear type, size, fleet, ply, and operation are included (Figure 4.16). The attribute
SourceID ties the data to its extraction source.
SexMaturity Dimension
Contains attributes detailing the sexual characteristics and the maturity state of the fish sampled. At-
tributes such as sex, maturity, gonad state, fat content, and stomach fullness are included (Figure 4.17).
The attribute SourceID ties the data entry to its origin or extraction source.
Prey Type Dimension
Contains attributes that depict the type of prey eaten by a given fish. The attributes PreyNameShort
Figure 4.15: Species Dimension
Figure 4.16: Fishing Gear Dimension
Figure 4.17: Sex and Maturity Dimension
Figure 4.18: Prey Type Dimension
Figure 4.19: Length Dimension
and PreyName are included (Figure 4.18). The attribute SourceID points to the data source.
Length Dimension
Contains a single attribute, length, detailing the length of the fish sampled (Figure 4.19).
The attribute SourceID ties the data to its extraction source.
4.4 System Development
This section covers the physical design, the data staging design, and the development of the system.
Owing to the need for the data warehouse to run smoothly within the current software environment at
FIRRI, Microsoft SQL Server 2005 was chosen for the initial development of the data warehousing
system; it was also the only database technology readily available to the researcher. Microsoft SQL
Server 2005 provides SQL Server Management Studio and SQL Server Business Intelligence
Development Studio, which were used to develop the databases and the ETL tools, respectively. These
studios host SQL Server 2005 Integration Services (SSIS) and SQL Server 2005 Analysis Services
(SSAS). SSIS offers a set of built-in tasks, containers, transformations, and data adapters that can
remove the need to write code during warehouse development; these SSIS features were used
throughout.
4.4.1 Database Development
The staging database and the data warehouse database for the fisheries data warehouse were created
using SQL Server Management Studio. The staging area was divided into two parts: one for data
transformation and one for validation. The second part of the staging area is used to verify that the right transformations have
Figure 4.20: Excel Source Adapter Extraction
Figure 4.21: Flatfile Source Adapter
been carried out on the data before the data is loaded into the data warehouse. Dimensional tables were
then created in each of the databases.
4.4.2 Data Extraction, Transformation, and Load (ETL)
Data extraction and load tools were developed using SSIS, found in SQL Server Business Intelligence
Development Studio. The extraction, transformation, and load of data into the warehouse were divided
into four stages: extraction of data from the source tables into the staging database; data transformation
and cleaning before load into the data warehouse; data load into the warehouse; and development of
cubes before deployment to the warehouse server. The ETL tools were developed in the form of
packages within SSIS projects, and these packages can connect to a wide variety of data sources. With
the appropriate drivers attached, a package can extract data from flat files, Excel spreadsheets, XML
documents, or tables and views in relational databases. A package connects to relational databases
using .NET and OLE DB providers, and to legacy databases using ODBC drivers. Figures 4.20 and
4.21 show data flow tasks that have Excel, flat files, or databases as their source and destination systems.
Extracting data from the source systems into the Staging Area
Before extracting data from the source tables, each row of data in the source tables was assigned a
unique identifier (e.g. FecundityID in Figure 4.22) that was mapped to the SourceID column in the
staging database. This unique identifier tied the data to its source table or file, so every data entry in
the warehouse can be traced back to its source table, file, or folder.
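The row-tagging step can be sketched as follows. This Python helper is hypothetical and only illustrates the idea of carrying a per-row identifier into the SourceID column; in the study the identifiers were created in the source tables themselves.

```python
def tag_rows(rows: list, table_name: str) -> list:
    """Assign each source row a unique identifier (analogous to FecundityID)
    that is later mapped to the staging SourceID column, so every warehouse
    record can be traced back to its source table or file."""
    return [
        dict(row, SourceID=f"{table_name}-{i}")  # e.g. "Fecundity-1"
        for i, row in enumerate(rows, start=1)
    ]
```

Each tagged row then carries its lineage through staging and into the warehouse.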
The first step in creating the packages was to create an Integration Services project. The project included
templates for objects (data sources, data source views, and packages) used in the data transformation
solution. Connection managers, which connect packages to data sources and destinations (such as the
Excel, OLE DB, and Flat File connection managers), were then added to the package. After creating
connection managers for the source and destination data, Data Flow tasks were added to the package.
An example of a package with data flow tasks added to the control flow is
Figure 4.22: Unique Identifiers for Rows of Data
Figure 4.23: Dimension and Fact Table Load
presented in Figure 4.23. The Data Flow tasks encapsulate data flow engines that move data between
sources and destinations, and provide the functionality for transforming, cleaning, and modifying data
as it is moved. Most of the extract, transform, and load (ETL) processes occur in the Data Flow tasks.
Source and destination adapters that point to source and destination tables were then defined, with
a connector joining the two as shown in Figures 4.20 and 4.21. The data flows between abstracted
sources and destinations that do not themselves contain connectivity information but instead hold
references to connection managers (e.g. localhost.BiodiversityStaging, Excel Connection Manager,
Prey Connection Manager) that define where the data sources and destinations physically reside. The data flows for
extracting the data from the source systems and populating the dimension and fact tables in the staging
area are similar. To ensure high-speed data copying, transformations were not designed to be performed
on the data while it is moving from the source file to the staging destination table. Packages used
Figure 4.24: Foreach Loop Containers
to populate the dimension and fact tables were developed, as outlined above, for each of the dimension
models.
The package that populates the Biology dimension model is designed to demonstrate the ability of a
package to iterate through any number of files in a folder and extract data from multiple file sources. It
uses the Foreach Loop container (Figure 4.24). When the package is run, the Foreach Loop Container
iterates through a collection of files in a folder. Each time a file is found that matches the set criteria,
the Foreach Loop Container updates a variable with the file name. This causes the connection manager
to connect to a different file, and the data flow task processes a different data set and loads it into the
staging area.
To centralise the extraction of data from the source systems, a master package was created that runs
all of the packages that extract data from the source systems into the staging area (Figure 4.25). At
runtime, SQL queries that truncate the dimension and fact tables are executed first using Execute SQL
tasks (Figure 4.26). Package execution tasks, which run the packages that populate the different
dimensional models, then execute (Figure 4.25).
Data Cleaning and Transformation
Before being loaded into the data warehouse, data extracted from the multiple files was cleaned or trans-
formed using built-in transformations contained in SSIS. Surrogate keys for the fact tables are generated
and assigned before the data is loaded into the warehouse. The control flow for the data cleaning and
surrogate key generation is presented in Figure 4.27.
Figure 4.25: Centralised Running / Execution of Packages
Figure 4.26: Execute SQL Task Editor
Figure 4.27: Data Cleaning and Surrogate Key Generation Control Flow
Figure 4.28: Cleaning the Dimension-Data Flow and Generating Surrogate Keys
At runtime, before the tasks that clean the dimension tables and assign surrogate keys are executed,
an SQL task is used to find the prevailing maximum dimension (surrogate) key. The task passes the
maximum surrogate key on to the data flow as a variable.
In the dimension load tasks, the Fuzzy Lookup transformation is used to find data rows with spelling
mistakes and correct them (Figure 4.28). The fuzzy lookup adapter looks up the correct spelling in a
reference table and replaces the column entry that is misspelled or missing (Figures 4.29 and 4.30).
After correction, the data flow is passed through a Slowly Changing Dimension editor that compares
the data sets in the flow with those in the destination table (Figure 4.28). New data rows are passed
through, while rows with changes are routed to an OLE DB Command adapter that updates the
changed entry in the destination table.
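The spelling-correction idea can be sketched in Python using the standard library's approximate string matching; this is an illustration only, as the SSIS Fuzzy Lookup uses its own proprietary similarity scoring rather than difflib.

```python
from difflib import get_close_matches

def fuzzy_correct(value: str, reference: list, cutoff: float = 0.8) -> str:
    """Replace a possibly misspelled column entry with its closest match
    from a reference table, or leave it unchanged when nothing is similar
    enough. difflib stands in here for the Fuzzy Lookup's matching."""
    if value in reference:
        return value  # already a correct spelling
    matches = get_close_matches(value, reference, n=1, cutoff=cutoff)
    return matches[0] if matches else value
```

For example, a misspelled species entry such as "Oreochromis nilotcus" would be replaced by "Oreochromis niloticus" when that name appears in the reference table.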
Figure 4.29: Fuzzy Lookup
Figure 4.30: Correcting Spelling Mistakes and Adding Missing Data Entries
Figure 4.31: Sorting The Data Flows
For new data sets, a dimension key is generated. The source data is split into two paths by a "multicast"
adapter (Figure 4.28). One path only performs a "sort" (Figure 4.31) to prepare for the "merge join".
The other path first sorts and removes rows with duplicate sort values, then a Script Component is used
to generate and assign the surrogate keys (Figure 4.32). The maximum dimension key value, which was
passed to the flow as a variable, is incremented by the Script Component for every row that passes
through, adding a surrogate key value to the data flow. The two data flows are then inner-joined on the
sort keys using the Merge Join transformation (Figure 4.33), resulting in an updated data flow with new
surrogate keys. The data flow from the source is then mapped onto the destination dimension table
(Figure 4.34). This is done for all dimension tables.
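The Script Component's key-generation step amounts to incrementing a counter seeded with the current maximum key. A minimal Python sketch, with a hypothetical `DimKey` column name:

```python
def assign_surrogate_keys(new_rows: list, max_existing_key: int) -> list:
    """Start from the current maximum dimension key (found by the Execute
    SQL task and passed in as a variable) and assign the next integer key
    to every new row that passes through, as the Script Component does."""
    key = max_existing_key
    for row in new_rows:
        key += 1
        row["DimKey"] = key  # new surrogate key added to the data flow
    return new_rows
```

If the dimension's current maximum key is 41, two new rows receive keys 42 and 43.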
In the fact-table load tasks, the Conditional Split adapter is used to find rows with missing column
entries and divert them to an error table for further management and possible cleaning or addition of
the missing data (Figure 4.35). Before the bad data is loaded into the error table, the Audit
transformation adds information about the task and package where the error was detected, to enable
corrections. Good data is passed on to a Slowly Changing Dimension editor that compares the data
flow with the destination fact table. Any data that has the same unique ID as an entry in the destination
table is not passed through, while data with changes is passed to an OLE DB Command for update
of the corresponding entry in the destination table.
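The split-and-audit step can be sketched as follows; the column and parameter names are hypothetical, and the real work is done by the SSIS Conditional Split and Audit transformations.

```python
from datetime import datetime, timezone

def split_missing(rows: list, required: list, package_name: str):
    """Divert rows missing a required column to an error flow, stamped
    with the package name and detection time (the Audit step); complete
    rows continue on to the fact-table load."""
    good, errors = [], []
    for row in rows:
        if any(row.get(col) is None for col in required):
            errors.append(dict(row,
                               AuditPackage=package_name,
                               AuditTime=datetime.now(timezone.utc).isoformat()))
        else:
            good.append(row)
    return good, errors
```

The audit columns let an operator trace each rejected row back to the task that rejected it.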
New data flows are passed on to Lookup transformations that look up surrogate keys in the dimension
tables and assign them to the corresponding foreign keys in the fact tables (Figure 4.36) before the
data is inserted into the destination fact table. In the event of an error, the error is passed
Figure 4.32: Surrogate Key Generation
Figure 4.33: Inner Joining Two Data Flows
Figure 4.34: Mapping Source Data to The Destination Table
Figure 4.35: Fact-Data Cleaning and Transformation Data Flow Task
over to an error flow, and audit information is added to it. A union of all the error data flows is then
taken before the combined error flow is inserted into an error table (Figure 4.35). This process is the
same for all the fact tables.
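The Lookup step can be sketched in Python for a single dimension; the `Species`/`SpeciesKey` column names and the shape of the lookup table are hypothetical.

```python
def resolve_foreign_keys(fact_rows: list, species_keys: dict):
    """Look up the surrogate key for each fact row's natural species name
    and attach it as the foreign key; rows whose key cannot be found are
    diverted to the error flow instead of being loaded."""
    loaded, errors = [], []
    for row in fact_rows:
        key = species_keys.get(row["Species"])
        if key is None:
            errors.append(row)  # no matching dimension member
        else:
            loaded.append(dict(row, SpeciesKey=key))
    return loaded, errors
```

In the packages described above, one such lookup runs for every dimension referenced by the fact table.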
Loading Data into the Warehouse
The package developed for loading data into the warehouse is presented in Figure 4.37. The data flows
that load the warehouse dimensions (Figure 4.38) are first passed to a sort editor that removes any
duplicated rows before reaching a Slowly Changing Dimension adapter, whereas the flows that
populate the fact tables (Figure 4.39) are passed to the Slowly Changing Dimension adapter directly.
In both cases, the Slowly Changing Dimension transformation compares incoming data with that
already in the warehouse. If the unique identifier of an incoming row matches one in the destination
table and no changes have been made to the column entries, the row is not passed through. If the
unique identifier matches but one or more of the columns has changed, the row is directed to an
OLE DB Command adapter that updates the affected row in the destination table. If the unique
identifier has no match in the destination table, the row is passed on as new output, and the destination
adapter inserts it as a new record in the warehouse table. This process is replicated for all fact and
dimension tables.
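The three-way comparison just described can be sketched as an in-memory upsert; this is an illustration only, with the warehouse represented as a hypothetical dictionary keyed on SourceID.

```python
def apply_changes(incoming: list, warehouse: dict):
    """Match incoming rows to warehouse rows on SourceID: identical rows
    are skipped, changed rows overwrite the stored record (the OLE DB
    Command update), and unmatched rows are inserted as new.
    Returns (inserted, updated, skipped) counts."""
    inserted = updated = skipped = 0
    for row in incoming:
        existing = warehouse.get(row["SourceID"])
        if existing is None:
            warehouse[row["SourceID"]] = dict(row)   # new record
            inserted += 1
        elif existing != row:
            warehouse[row["SourceID"]] = dict(row)   # in-place update
            updated += 1
        else:
            skipped += 1                             # unchanged, not passed through
    return inserted, updated, skipped
```

Overwriting the stored row rather than keeping history corresponds to a Type 1 slowly changing dimension, which matches the update-in-place behaviour described above.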
4.4.3 Analysis Cubes
The analysis cubes were developed using SQL Server Analysis Services (SSAS). Analysis Services is
a middle-tier server for online analytical processing (OLAP) and data mining. The Analysis Services
Figure 4.36: Surrogate Key Assignment
Figure 4.37: Loading Data into the Warehouse
Figure 4.38: Example of Warehouse Dimension Table Data Flow Task
Figure 4.39: Example of Warehouse Fact Table Data Flow Task
system includes a server that manages multidimensional cubes of data for analysis and provides rapid
client access to cube information. Analysis Server is the server component of Analysis Services that
is specifically designed to create and maintain multidimensional data structures and provide multidi-
mensional data in response to client queries. The structure of the multidimensional cubes developed,
showing the data view of all fact and dimension tables, is presented in Figure 4.40.
4.4.4 Enduser Application
Endusers will access the data warehouse through Microsoft Excel. Microsoft Excel was chosen because
most of the endusers in FIRRI are already well versed in the use of Excel, and because Excel has an
add-in, the Excel Add-in for Analysis Services, that works well with SQL Server Analysis Services
cubes. The Microsoft Office Excel Add-in for SQL Server Analysis Services provides analysis
capabilities and flexible reporting for data imported into Excel from Analysis Services cubes. By
invoking the add-in from within Excel, the enduser can import data from the Analysis Services cubes,
use Analysis Services techniques to analyse the data, and then, leveraging their existing Excel skills,
use Excel functionality to manipulate and present the data in reports. From within Excel, endusers can
use Excel formatting and calculation features, combine data from multiple dimensions, use
drillthrough to see source data, drill up and drill down, expand and collapse, isolate and eliminate, and
pivot the data.
The enduser has the option of using a pivot table to generate a report like the one presented in Figure
4.41, or of using the cube analysis add-in to create a report such as that presented in Figure 4.42. The
enduser can then format the report and produce charts such as that presented in Figure 4.42.
4.4.5 System Validation
This section presents the results of the different mechanisms put in place to validate the data warehouse
system developed. The system can extract data from multiple sources and centralise it in one location
as planned, as evidenced by data from all the source systems being found in the warehouse.
Comparison of the unique identifiers for the data in the source systems and in the data warehouse
shows that they are the same. Additional verification by listing actual rows from randomly selected
tables in both the source (Figure 4.22) and the data warehouse (Figure 4.43) also shows an
Figure 4.40: Structure of Analysis Data Cube
Figure 4.41: Length Frequency Distribution of Oreochromis niloticus
Figure 4.42: Maximum Weight of Selected Fish Species Across 4 Quarters
Figure 4.43: Check of Rows Written to the Data Warehouse
exact match, further confirming that the data warehouse is functioning as required. The unique
identifier, referred to as FecundityID in the source system, corresponds to the SourceID in the data
warehouse.
SSIS has a progress log that shows when the execution of a package started and ended, the data source
and its destination, the number of rows in the source table, and the number of rows written (extracted
successfully) (Figure 4.44). This inbuilt mechanism validates the execution of the package. In
addition, the adapters and transformations in a package are colour-coded as it executes: yellow
indicates that the package is executing, red shows that there is an error and execution was not
successful, and green indicates that execution has succeeded, with the number of rows written shown
on the connector between the source and destination (Figure 4.45). Examination of the system logs
shows that the number of records in the source systems exactly matches that in the data warehouse.
Equality of these counts indicates that no records were left out due to an error during the ETL or load
process. This was further verified by the absence of errors (as opposed to warnings) in the exception
reporting by the ETL tool.
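The count check described above can be expressed as a small comparison routine; the table names and dictionary shapes below are hypothetical, since in the study the counts were read from the SSIS logs rather than computed in code.

```python
def counts_match(source_counts: dict, warehouse_counts: dict):
    """Compare per-table row counts from the source systems with those in
    the warehouse; any mismatch suggests records were lost or duplicated
    during ETL. Returns (all_equal, mismatches)."""
    mismatches = {
        table: (n_source, warehouse_counts.get(table, 0))
        for table, n_source in source_counts.items()
        if n_source != warehouse_counts.get(table, 0)
    }
    return len(mismatches) == 0, mismatches
```

A non-empty mismatch map pinpoints exactly which table needs its ETL run re-examined.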
Figure 4.44: System Validation
Figure 4.45: Rows Written During Data Load
4.5 Conclusions, Limitations, and Future Work
4.5.1 Conclusions
This project focused on verifying whether designing and implementing a data warehousing system
in FIRRI would bring about centralised storage and retrieval of fisheries data and information.
The results show that a data warehouse can greatly improve the storage, retrieval, and dissemination of
fisheries information. Given that the data warehouse enables aggregation of data and information from
different source systems, easy execution of complex queries, and real-time dissemination and retrieval
of data and information, one cannot underestimate its power in positively affecting the operations of
the fisheries sector. The project provides evidence that centralised data storage, information retrieval,
and reporting in FIRRI are both possible and attainable.
4.5.2 Limitations
In the course of this study, a number of problems were encountered. Time and financial constraints
were the major ones. Owing to these limitations, it was not possible to develop a web interface for the
warehouse, as the preferred software, WebIntelligence, could not be procured.
4.5.3 Future Work
The data warehouse provides a mechanism for extracting data from any type of database management
system on any networked information system, to the extent of facilitating data transmission and
exchange over the Internet. This is vital for the dissemination of information to fisheries stakeholders
outside FIRRI. Since in this study it was only possible to design and develop a data mart, future work
should focus on developing an enterprise-wide data warehouse that can be accessed by endusers even
via the World Wide Web.
Appendix 1: Lead Scientists’ Questionnaire
(i). INTRODUCTION
I am carrying out a study on the use of data warehousing in fisheries. This interview is aimed at finding
out the kind of information you would want out of the warehouse and the way it should be formatted
and presented.
(ii). RESPONSIBILITIES
• Describe FIRRI and its relationship to the rest of the fisheries sector.
• What are your primary responsibilities?
(iii). RESEARCH OBJECTIVES AND ISSUES
• What are the objectives of FIRRI? What are its top priority research goals?
• What functions and departments within FIRRI are most crucial to ensuring that these key success
factors are achieved? What role do they play? How do they work together to ensure success?
• What are the key research issues you face today? Is there anything that prevents you from meeting
your research objectives?
• Where does FIRRI stand in the use of information technology?
(iv). ANALYSES REQUIREMENTS
• What role does data analysis play in decisions made and by fisheries managers?
• What key information is required to make or support the decisions you make in the process of
achieving your goals and overcoming obstacles? How do you get this information today?
• Is there other information which is not available to you today that you believe would have signif-
icant impact on helping meet your goals?
• Which reports do you currently use? What data on the report is important? How do you use the
information? If the report were dynamic, what would the report do differently?
• What analytic capabilities would you like to have?
Thank You
Appendix 2: Information System Audit Questionnaire
(i). INTRODUCTION
I am carrying out a study on the use of data warehousing in fisheries. This interview is aimed at finding
out the kind of information you would want out of the warehouse and the way it should be formatted
and presented.
(ii). RESPONSIBILITIES
• Describe FIRRI and its relationship to the rest of the fisheries sector.
• What are its primary responsibilities?
• Which interest groups does it support?
(iii). USER SUPPORT / ANALYSES AND DATA REQUIREMENTS
• What is the current process used to disseminate information?
• What tools are used to access/analyse information today? Who uses them?
• Are you asked to perform routine analyses? Do you create standardised reports?
• Describe typical ad hoc requests. How long does it take to fulfil these requests?
• What is the technical and analytical sophistication of the users?
• What is the biggest bottleneck/issue with the current data access process?
(iv). DATA AVAILABILITY AND QUALITY
• Which source systems are used for frequently-requested information?
• How often is the data updated? Availability following update?
• How much history is available?
• What are the known bottlenecks in current source systems?
• Do you currently have common source files? Who maintains the source files?
• How are changes captured?
• What else should I know about FIRRI and its information systems?
• What must this project accomplish to be deemed successful?
Thank You
Appendix 3: End-User Questionnaire
(i). INTRODUCTION
I am carrying out a study on the use of data warehousing in fisheries. This interview is aimed at finding
out the kind of information you would want out of the warehouse and the way it should be formatted
and presented.
(ii). RESPONSIBILITIES
• Describe FIRRI and its relationship to the rest of the fisheries sector.
• What are your primary responsibilities?
(iii). RESEARCH OBJECTIVES AND ISSUES
• What are the objectives of FIRRI? What are its top priority research goals?
• What are the key research issues you face today?
• Describe your research disciplines. How do you distinguish between research disciplines? How
do you categorise research disciplines?
(iv). ANALYSES REQUIREMENTS
• What type of routine analysis do you currently perform? What data is used? How do you cur-
rently get the data?
• What do you do with the information once you get it?
• What analysis would you like to perform? Are there potential improvements to your current
method/process?
• Which reports do you currently use? What data on the report is important? How do you use the
information? If the report were dynamic, what would the report do differently?
• What analytic capabilities would you like to have?
• Are there specific bottlenecks to getting at information?
• How much historical information is required?
• What must this project accomplish to be deemed successful?
Thank You