school of ict, griffith university - common data …bernus/publications/articles/... · web...

Common Data Definitions as a Basis of Business to Business Communication

Introduction

B2B systems need common definitions of the business data exchanged in B2B transactions. Many efforts have been made to establish common definitions in the past. In 1974, a US General Accounting Office report to the Congress emphasised the need for government efforts to standardise data elements for computer systems (US General Accounting Office 1974, pp.67-69). Yet, 27 years after that report was submitted to the Congress, we are still facing the same problem. A Federal Information Processing Standards Task Group has been organised to study data dictionaries (US Department of Commerce 1997, p.1). Since 1997 this group has been surveying the data dictionaries that are operational within the government, but it has not been able to develop a methodology for developing data dictionaries that are free of ambiguity. IBM tried to develop a corporate data dictionary in 1970 and 1980, but failed. The problem of establishing common B2B definitions is similar to, but even harder than, what has been experienced in the past. The reason is that many stakeholders are involved in B2B transactions, and because of the differences in their backgrounds it is difficult to get them to agree on common definitions.

Most attempts to establish common definitions have failed. The manufacturing industry uses the Standard for Exchange of Product Model Data (ISO 10303) to establish common definitions. The objective of the Standard for Exchange of Product Model Data (STEP) is to provide a complete, unambiguous, computer-interpretable definition of the physical and functional characteristics of a product throughout its life cycle. Success has been achieved in STEP by taking the context of the product definitions into account. Most other attempts to establish common definitions have not taken the context of data definitions into account. By taking this context into account, we expect that agreement can be achieved on the data definitions used for business to business communication.

The following sections of this article review the efforts that have been made to establish common definitions, identify potential techniques that can be used to establish data definitions, and propose a methodology for establishing common data definitions.

Analysis of the problem

This section will review the work done in different areas that can be helpful in establishing common definitions.

‘Situation semantics theory’ states that talking, listening, reading and writing are the activities pertaining to language (Barwise and Perry 1983). According to Barwise and Perry (1983), these activities are situated, i.e. they occur in situations and they are about situations. “When uttered at different times by different speakers, a statement can convey different information to a hearer and hence can have different meanings” (Barwise and Perry 1983). According to Austin, communication depends on the context-providing situation: the speaker, the addressee, the time and place of the utterance, and the expression uttered. “Since speakers are always in different situations, having different causal connections to the world and different information, the information conveyed by an utterance will be relative to its speaker and hearer” (Austin 1961, pp.117-133). Therefore, to define common definitions in an unambiguous manner we should not only define what the definition means, we should also define the situational elements associated with the definition (i.e. the context in which the data elements are used, the background of the user of the definitions, etc.).
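The idea of storing the situational elements alongside the meaning can be illustrated with a minimal Python sketch. The class names (`Context`, `Definition`, `ContextualDictionary`), the two context fields and the sample entries are assumptions made for illustration only; they are not part of any cited work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Context:
    """Hypothetical situational elements attached to a definition."""
    usage: str            # where the data element is used
    user_background: str  # who reads or writes the data element


@dataclass(frozen=True)
class Definition:
    term: str
    meaning: str
    context: Context


class ContextualDictionary:
    """Stores one meaning per (term, context) pair instead of one per term."""

    def __init__(self):
        self._entries = {}

    def add(self, definition):
        key = (definition.term.lower(), definition.context)
        self._entries[key] = definition

    def lookup(self, term, context):
        # Returns None when the term has no definition in this context.
        return self._entries.get((term.lower(), context))


# Illustrative entries: the same term read by two different parties.
supplier = Context(usage="purchase order", user_background="supplier")
retailer = Context(usage="purchase order", user_background="retailer")

dictionary = ContextualDictionary()
dictionary.add(Definition("Delivery date", "Date the goods leave the warehouse", supplier))
dictionary.add(Definition("Delivery date", "Date the goods arrive at the store", retailer))
```

Keying each entry on the term together with its context means that a lookup without a matching context returns nothing, rather than silently returning a meaning that was intended for a different reader.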


The Standard for Exchange of Product Model Data (ISO 10303) is an international standard for the computer-interpretable representation and exchange of product data. The information generated about a product during its design, manufacture, use, maintenance and disposal is used for many purposes during its life cycle. This use may involve many computer systems, including some located in different organisations. To support such uses, organisations need to be able to represent their product information in a common computer-interpretable form that remains complete and consistent when exchanged among different computer systems. The objective of the Standard for Exchange of Product Model Data (STEP) is to provide a mechanism that is capable of describing product data in an unambiguous form throughout the life cycle of a product (analysis, design, manufacture, quality control testing, inspection and product support functions). A key concept in the STEP methodology is that it does not try to model the complete information related to a product in one big step. It uses an activity model to identify clearly the scope of the activities that it will incorporate in one step. Even though the information produced in one step/cycle does not model the complete product, it can be used for communication between the various stakeholders. The scope of the product information modelled using the STEP methodology can be increased at any time by adding extra activities to the activity model. As noted in the introduction, the STEP methodology specifies the context in which the data elements can occur. It specifies the product information using a formal data specification language, EXPRESS. It should be noted that an English dictionary follows the same concept: it defines each word based on the context in which the word occurs. From the success achieved by the STEP methodology we can state that, to establish common definitions, we need a methodology similar to STEP.

“An enterprise data model is a consistent definition of all the data elements common to the business. These data elements might range from a high-level business view to a generic logical data design including links to the physical data designs of individual applications” (IBM 2000). Because the enterprise data model represents the data for the entire organisation, a standard definition must be agreed for every data item when an enterprise data model is developed. According to Sperley (1999, p.18), an enterprise model should be implemented in steps and each step should produce a deliverable that brings value to the organisation. We can apply a similar argument to the development of “common” definitions. If we want to build maintainable definitions, we should not try to model the complete data dictionary in a single step. We should break its development into cycles, and each cycle should produce a deliverable that is of value to the organisation.

The German Savings Bank Organisation (GSBO) is the biggest banking organisation in the world, consisting of more than 600 savings banks, 13 state banks and a number of partners. During the exchange of information between different branches within GSBO there were problems such as incompatible system architectures and different terminology for the same data item (Krahl & Kittlaus, p.667). GSBO established a large enterprise-wide data model, known as the SIZ Banking data model (SIZ), as a standard for the IT departments within different branches. The objectives for the development of the SIZ model were as follows:

Consistent interpretations of data definitions

Minimisation of data redundancy

Flexible data structures

The basic structure of the SIZ data model is based upon the philosophy of IBM’s Financial Services Data Model (FSDM). The model is divided into three levels: Level-A, Level-B and Level-C.

The SIZ data model illustrates the principle of abstraction. It focuses first on agreement on common definitions at an industry level. These standard definitions are documented as Level-A. Once this agreement is achieved, SIZ focuses on the formation of the enterprise-wide Level-B. This is achieved by agreement on definitions among the different departments of an organisation. “Because Level-B does not contain details specific to the individual projects carried by different departments, it is possible to gain common agreement on data definitions at this level” (Krahl & Kittlaus, p.679). Therefore, by breaking the model into different levels of granularity, the SIZ model provides a mechanism by which common agreement can be achieved amongst the involved parties. Using the SIZ model, pilot projects were performed and the Information Technology (IT) centres of different GSBO branches designed database models conforming to the SIZ data model. It was observed that, because of the agreement on a unified data model at the conceptual level, agreement on the structure of almost 70% of the Level-C data elements was achieved (Krahl & Kittlaus, p.680). Definitions created according to the concepts of the SIZ method are classified either as industry-wide terminology or as data definitions specific to the concepts of a particular group of people.


RosettaNet is a consortium of more than four hundred of the world's leading Information Technology (IT), Electronic Components (EC) and Semiconductor Manufacturing (SM) companies working to create, implement and promote open e-business process standards (Yen 2000, p.2). The objective of RosettaNet is to establish common definitions for schema structures so that RosettaNet users in particular domains can exchange data unambiguously. To achieve this objective, RosettaNet is developing e-business Partner Interface Process (PIP) specifications that prescribe how applications should interoperate to execute collaborative business processes. These specifications include dictionaries for technical specifications and business properties. To develop business dictionaries, RosettaNet specifies an e-business process model. RosettaNet defines a Business Operational View (BOV) model that specifies a standard set of business data entities as UML class diagrams. To create an organisation-specific BOV, RosettaNet workshops are conducted and brainstorming sessions are held between RosettaNet experts and business experts. Once the BOV is formed, a Functional Service View (FSV) is created. The FSV captures the syntax and semantics of the business actions and their flow (exchange) between the network components that provide business services. The flow of business actions is represented using flow diagrams (similar to flow charts). Finally, an Implementation Framework View is formed. The Implementation Framework View consists of UML class diagrams representing the business entities and the associations between them. If we want to develop a methodology to create maintainable business definitions for certain business activities, we need to capture the dynamics by modelling the business activities taking place in an organisation.

From the work done in other fields we can state that:

To establish common definitions we need to scope the terms on which common agreement needs to be obtained.

It is hard (if not impossible) to create a complete data dictionary that defines all the data definitions pertaining to a concept: it takes a long time, data definitions change, and it is difficult to assemble a group of stakeholders who have all the information and can agree. No 'big bang' approach seems possible.

Interpretation of definitions is context dependent. To develop common definitions we need to understand and represent the context (fix the contextual elements for both the developers and the users of the data definitions: skill, knowledge, standards and intended use).

We need to select appropriate languages, tools and standard reference models to establish common definitions.

Candidate methods and technologies to be used in a methodology

Existing methods and technologies can be used for establishing common data definitions. Some of these methods are discussed in the following section:

Integration Definition for Function Modelling (IDEF0) is "a modelling technique based on combined graphics and text that are presented in an organised and systematic way to gain understanding, support analysis, provide logic for potential changes, specify requirements, or support systems level design and integration activities" (IDEF 2001). IDEF0 models information based on activities (functions) and ICOMs. ICOM stands for Inputs (the data or objects to be transformed by the function), Controls (the conditions required to produce the correct output), Outputs (the data or objects produced by the function) and Mechanisms (the means used to perform the function). According to Feldmann (1998, p.25), one of the major advantages of IDEF0 diagrams is the classification of interfaces into four types and the association of each type with a different side of a function box. Feldmann believes that this classification and designation enhances the readability of a graphical model. The US Air Force supports this argument and uses IDEF0 diagrams for functional modelling (IDEF 2001). To establish common agreement on business definitions, we want to represent the common definitions pertaining to a business in a manner that is readable and understandable by people who do not necessarily have a very good knowledge of information modelling. Because IDEF0 models are readable and understandable, we can use them to represent the enterprise information. Another advantage of an IDEF0 diagram is that, compared to text, it represents information concisely. This is evident from the experiments done by Doug Ross on the ICAM and ITT projects (1977, p.34). In these projects, between ten and fourteen pages of text were needed for each page of IDEF0 model containing equivalent information. This conciseness is an advantage when we want to model information about common definitions.
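The ICOM classification described above can be sketched as a small Python data structure. This is an illustrative sketch only: the class name `Idef0Function`, the helper `data_passed` and the two sample functions are assumptions, not part of the IDEF0 standard or of any IDEF0 tool.

```python
from dataclasses import dataclass, field


@dataclass
class Idef0Function:
    """One IDEF0 function box with its four kinds of interface (ICOM)."""
    name: str
    inputs: set = field(default_factory=set)      # data or objects to be transformed
    controls: set = field(default_factory=set)    # conditions governing the function
    outputs: set = field(default_factory=set)     # data or objects produced
    mechanisms: set = field(default_factory=set)  # means used to perform the function


def data_passed(source, target):
    """Data elements produced by `source` that `target` uses as input or control."""
    return sorted(source.outputs & (target.inputs | target.controls))


# Hypothetical functions from an order-handling activity model.
receive_order = Idef0Function(
    name="Receive order",
    inputs={"customer order"},
    controls={"credit policy"},
    outputs={"validated order"},
    mechanisms={"sales clerk"},
)
ship_goods = Idef0Function(
    name="Ship goods",
    inputs={"validated order", "stock"},
    controls={"shipping schedule"},
    outputs={"delivery note"},
    mechanisms={"warehouse staff"},
)
```

Data elements that cross from one function box to another, such as those returned by `data_passed(receive_order, ship_goods)`, are natural candidates for common definitions, because two parties must interpret them in the same way.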

UML represents a collection of the best engineering practices that have proven successful in the modelling of large and complex systems. Two UML techniques that can be used for establishing common definitions are the use case diagram and the class diagram. A use case diagram shows a set of use cases, actors and their relationships (Booch, Jacobson & Rumbaugh 1999, p.25). Use cases are used to define the scope of the system. “Use case model shows what a system does, but it does not show how the system does that” (Abel 2001). According to Bernus (2001), even though both the use case diagram and the IDEF0 diagram specify the scope of a system, each model has its own advantages and disadvantages; use case diagrams are better than IDEF0 diagrams for decomposing a system based on the tasks performed by different individuals in an organisation. To establish common data definitions we can employ use case diagrams to gain an insight into which activities are performed by which actor.

Class diagrams are used for visualising, specifying and documenting data models. According to Booch, Jacobson and Rumbaugh (1999, p.110), UML class diagrams are a superset of entity relationship diagrams. Whereas an entity relationship diagram focuses only on modelling data, a class diagram goes a step further by modelling behaviour as well. Because UML is fast becoming an industry standard (Abel 2001), UML class diagrams can be used to document the relationships between the common data definitions.

The GRAI Grid focuses on the decisional aspects of the management of systems and makes it possible to build models at a high level of globality (Doumeingts et al. 2000, p.245). It identifies the points where decisions are made in order to manage a system: the decision centres. It is a graphical technique that gives a hierarchical representation of the structure of the decision centres within a given organisation. It is represented by a table of rows and columns and is constructed through a top-down analysis approach (Al-Ahmari and Ridgway 1999, pp. 225-238). A decision centre of a GRAI grid can be decomposed into the functions that it carries out, and these functions can be modelled in the IDEF0 language. As discussed earlier, it is not possible to model all the data elements of an organisation in one big step. To build common agreement on data definitions, we need a technique that helps in scoping the data elements on which common agreement needs to be gained. The GRAI grid encapsulates similar activities and bundles them in one cell of the grid. When developing a data dictionary we can use the GRAI grid to identify a set of activities that are tightly coupled (closely related) with each other and model them in one cycle of the establishment of common definitions.
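The use of grid cells to scope development cycles can be sketched in Python as follows. The row labels, column labels and activities are invented for illustration, and `plan_cycles` is a hypothetical helper; a real GRAI grid is built through the analysis procedure described in the cited literature, not generated by code.

```python
# A grid cell is addressed by (row, column); each cell bundles closely
# related activities. All labels below are illustrative assumptions.
grid = {
    ("strategic", "purchasing"): ["select suppliers", "negotiate contracts"],
    ("operational", "purchasing"): ["raise purchase order", "receive goods"],
    ("operational", "sales"): ["accept customer order", "issue invoice"],
}


def plan_cycles(grid):
    """Assign one data dictionary cycle to each cell, in the order given."""
    cycles = []
    for number, (cell, activities) in enumerate(grid.items(), start=1):
        cycles.append({
            "cycle": number,
            "cell": cell,
            "activities": list(activities),
        })
    return cycles


cycles = plan_cycles(grid)
for cycle in cycles:
    print(cycle["cycle"], cycle["cell"], ", ".join(cycle["activities"]))
```

Treating one cell as the scope of one cycle is only one possible policy. The order in which cells are taken would in practice be agreed with the organisation for which the dictionary is built.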

Extensible Markup Language (XML) is a subset of ISO 8879, the Standard Generalised Markup Language (SGML). According to Graham and Quam (1999, pp.1-10), an XML document is a sequence of characters that encodes the text of the document (e.g. the words in a dictionary), plus the logical structure of the document and meta-information related to the document structure. Like Hypertext Markup Language (HTML), XML is concerned with the definition and structure of web pages. The key difference between XML and HTML is that XML focuses on the meaning or the context of the elements within the tags, whereas HTML is predominantly concerned with the layout and cosmetic appearance of the web page. According to David Marco (2000, p.72), even though the number of possible tag types within the XML protocol is infinite in theory, the control required over the proliferation of tag types is exercised at the metadata level (in the Document Type Definition, DTD). According to him (2000, p.73), managing the proliferation of tag types at the metadata level is an inherently less complex and less costly activity than managing tag proliferation at the data level. Biggar and Laurent (1999, p.154) support this point by stating that, because the DTD does not include the actual data, the task of understanding the structure of an XML document becomes less complex. According to Marco (2000, p.73), XML is a true metadata tool, since XML documents can easily be extended to include metadata fields that identify the person responsible for creating, maintaining, transmitting, receiving and processing an XML document. According to Martin (1999, p.46), the most important feature of XML is that, because XML documents consist of individual pieces of information, they can be reshaped and reassembled in many different ways and yet retain their underlying structure. XML enables the same information to be presented to different types of users in different ways. This is achieved by the separation of content and presentation. Content is included in the web page/document itself, written in a relevant dialect of XML, and the presentation to be applied to this content is held separately in a style sheet, which is defined in the Extensible Stylesheet Language (XSL).


This two-tier arrangement ensures not only that the XML protocol can be almost infinitely extended to incorporate different content types, but also that “artificial” differences of interpretation can be avoided by allowing different views on the same content. If we document business definitions in an XML data dictionary, then this property of XML can be utilised to present different views of the same data dictionary to different data dictionary users. The representation of the view will depend upon the context in which the user needs to view the data definition. For example, a data dictionary can document all the definitions pertaining to an organisation. Using different XSL files, different users will be able to view different definitions in the data dictionary based on their context and their needs.
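The idea of context-dependent views over a single XML data dictionary can be sketched in Python. Note the substitution: the Python standard library has no XSL processor, so the function `view_for` below stands in for an XSL style sheet and simply filters by a `context` attribute. The element names, attribute names and sample definitions are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# A hypothetical XML data dictionary; the tag and attribute names are
# illustrative and do not follow any published schema.
DICTIONARY_XML = """
<dataDictionary>
  <definition term="Involved Party" context="sales">
    An organisation or individual that buys from us.
  </definition>
  <definition term="Involved Party" context="purchasing">
    An organisation or individual that supplies us.
  </definition>
  <definition term="Order" context="sales">
    A request from a customer for goods.
  </definition>
</dataDictionary>
"""


def view_for(xml_text, context):
    """Return the term-to-meaning view of the dictionary for one context."""
    root = ET.fromstring(xml_text)
    view = {}
    for node in root.findall("definition"):
        if node.get("context") == context:
            # Collapse the whitespace introduced by the XML layout.
            view[node.get("term")] = " ".join(node.text.split())
    return view


sales_view = view_for(DICTIONARY_XML, "sales")
purchasing_view = view_for(DICTIONARY_XML, "purchasing")
```

Both views are derived from the same stored content, so a change to a definition is made once and is reflected in every view that includes it.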

Methodology for developing a maintainable pool of data definitions

This section discusses the methodology proposed to develop common definitions. The common definitions will be documented in a data dictionary. The following definitions are used in the sections below.

Cycle - The methodology presented in this article suggests that, instead of developing the complete data dictionary in one step, its development should be broken into iterations. An iteration of the data dictionary development is referred to as a cycle or data dictionary cycle.

Data dictionary client/client - The organisation for which the data dictionary is developed.

COMMON DEFINITION DEVELOPMENT METHODOLOGY

Develop Original Problem Statement - This activity is initiated as a result of a need for the development of the data dictionary. The activity involves development of the original problem statement (OPS). The OPS is formed after initial brainstorming sessions with the key stakeholders of the data dictionary client. The OPS captures the results of initial brainstorming session with the client and describes the background of the industry for which the data dictionary is being developed. Even though the OPS does not provide a complete understanding of the needs that the data dictionary is supposed to address, it is important that the data dictionary development starts with the development of the OPS. Because the OPS captures the results of the brainstorming sessions with the client, it serves as the basis for the development of the detailed requirement statement. If the data dictionary team does not document the results of the brainstorming session, then they may forget some of the requirements when developing the final requirement statement, and hence may miss some requirements in the implemented data dictionary.

Obtain Human Resources - Data dictionary development cannot be completely automated; it requires human involvement. This activity involves procurement of the human resources required for the development of the data dictionary. For example, if the organisation developing the data dictionary does not have any prior experience with data dictionary development, then it may hire an external consultant to guide it through the development of the data dictionary.

Obtain Technical Resources - This activity involves procurement of the technical resources required for the development of the data dictionary. Examples of technical resources required for the development of a data dictionary include:

Data dictionary development tools (e.g. CASE tools required for developing the artefacts that will be produced during data dictionary development)

Information on a formal language (like XML) to specify the data dictionary

Technical documents that will guide the data dictionary team during data dictionary development (e.g. standards required for developing the artefacts that will be produced during data dictionary development)

Procure Industry Specific Definitions - The OPS may contain some terms that are specific to the industry for which the data dictionary is being developed. The team responsible for creating the data dictionary may not be familiar with these terms. This activity involves procurement of standard industry definitions so that the team developing the data dictionary understands the terms and jargon used by the client. For example, if the data dictionary team is creating a data dictionary for the supply chain industry, then the data dictionary client may use terms like supply chain integration, retailer, supplier etc. If the data dictionary team is unfamiliar with these terms, then it needs to acquire the meaning of these industry specific terms so that it can carry out the development of the data dictionary.

Approve Data Dictionary Notations - This activity involves approval of the notations that will be used when documenting the various artefacts produced during the data dictionary development. One of the stages of the data dictionary development (discussed later in the article) is the development of the Information Diagram. The data dictionary team can agree, for example, that a circle represents a business event in the Information Diagram. The concept of notations has been borrowed from the Unified Modelling Language (UML) notations and the RosettaNet notations. Both UML and RosettaNet define a list of notations to be followed when information is represented. The data dictionary methodology uses notations to represent the data dictionary information because notations have been used successfully to model information in UML (Booch, Jacobson & Rumbaugh 1999b, p.43) and RosettaNet.

The use of notations in the data dictionary is similar to the use of the enterprise data model. Just as an enterprise data model provides a consistent definition of all the data elements common to the business, the data dictionary notations provide a consistent representation of the different types of data stores and data elements used during the development of the data dictionary. The author(s) studied some sample data dictionaries. One of the problems identified in these data dictionaries was the absence of visual clues. A common set of notations will help in providing visual clues in the deliverables produced during the development of the data dictionary. These notations will serve as a palette of tools that the data dictionary author can use to describe the meaning of the terms captured by the data dictionary. Consider the entity ‘Involved Party’. Let us assume that an ‘Involved Party’ can be an organisation or an individual with whom an organisation does transactions. If the entity ‘Involved Party’ is represented as a circle in one of the artefacts produced during data dictionary development, then we can refer to the notations to determine what a circle represents. If the circle represents an organisation, then we can assume that the term ‘Involved Party’ is used in the context of an organisation.

Identify Ambiguities and Develop Assumptions - This activity involves identifying ambiguities in the OPS. For each identified ambiguity, the data dictionary team develops a list of assumptions (possible solutions from the perspective of the data dictionary team) and gets the client to validate these assumptions. Because the OPS specifies the requirements at a very high level, ambiguities will arise when the data dictionary team analyses it. Every time the data dictionary team encounters an ambiguity, it can either suspend further analysis of the requirement or come up with an assumption for that ambiguity. If the requirements of the data dictionary are linked with each other, then suspending analysis at each ambiguity means stopping the requirement analysis process until the data dictionary team gets the ambiguity clarified by the data dictionary client. The data dictionary client may not be available frequently to answer the queries of the data dictionary team. Instead, the data dictionary team can come up with assumptions for the identified ambiguity, document them and continue the analysis process. Once all the assumptions have been identified, the data dictionary team can get the data dictionary client to validate them. The client may be unaware of the solution to the ambiguities identified by the data dictionary team. In this case, the assumptions developed by the data dictionary team will provide the client with a list of possible solutions for the ambiguity.

Map Assumptions and Ambiguities - If the data dictionary team has identified many ambiguities in the OPS, then it should verify that it has developed an assumption for each of the identified ambiguities. This can be achieved by mapping the identified ambiguities to the assumptions developed for them. The result of the mapping can be captured in the following table (Table 1).

Ambiguity | Assumption

Table 1 Ambiguity - Assumption Mapping Table


Mapping ambiguities to assumptions ensures that the data dictionary team has developed an assumption for each ambiguity identified in the OPS, i.e. that the team has not left any of the ambiguities it identified unresolved.
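The completeness check that the ambiguity - assumption mapping provides can be sketched in Python. The function `unmapped_ambiguities` and the sample entries are illustrative assumptions; the methodology itself only prescribes the mapping table.

```python
def unmapped_ambiguities(ambiguities, mapping):
    """Return the ambiguities that have no assumption recorded against them.

    `mapping` plays the role of the mapping table: it maps each ambiguity
    to the list of assumptions developed for it.
    """
    return [item for item in ambiguities if not mapping.get(item)]


# Hypothetical ambiguities found in an original problem statement.
ambiguities = [
    "Does 'customer' include internal departments?",
    "Is 'order date' the date of entry or of approval?",
    "Which currency is 'price' expressed in?",
]

mapping = {
    "Does 'customer' include internal departments?": [
        "Only external organisations are customers",
    ],
    "Is 'order date' the date of entry or of approval?": [
        "Order date is the date of entry",
        "Order date is the date of approval",
    ],
}

missing = unmapped_ambiguities(ambiguities, mapping)
```

An ambiguity may carry several candidate assumptions, which matches the case described earlier in which the client is offered a list of possible solutions to choose from.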

Develop and Validate Rewritten Problem Statement - Based on the identified assumptions and further brainstorming sessions with the client, the data dictionary team rewrites the problem statement and gets the client to validate the rewritten problem statement (RPS). The RPS should describe the needs of the data dictionary client completely and unambiguously. The RPS will be the basis for the development of the data dictionary. Because the RPS contains the data dictionary team's interpretation of the problem, it is very important that the team gets the client to validate it. Getting the RPS validated by the client will help in identifying any errors present in the requirements. According to Jalote (1997, p.77), the cost of fixing an error increases as time progresses. That is, a requirements error that is detected and removed after the system has been developed will cost much more to fix than one removed in the requirements phase. By getting the client to validate the RPS, the methodology tries to remove the errors introduced during the development of the RPS in the same phase in which they are introduced.

Map RPS, OPS and Assumptions - This activity involves mapping the OPS, the RPS and the assumptions. The aim of this activity is to demonstrate that the data dictionary team has not missed any OPS when developing the RPS. The result of this mapping can be captured in the following table (Table 2).

Original Problem Statement | Assumptions | Rewritten Problem Statement

Table 2 RPS - OPS - Assumption Table

Mapping the OPS to the RPS and the assumptions ensures that the data dictionary team is developing the data dictionary to satisfy the problems listed in the OPS. The data dictionary team can view the list of identified OPS and ensure that each one maps to an RPS in the mapping table. If an OPS has no corresponding RPS in the mapping table, then some of the requirements have not been captured in the RPS and there are gaps in the definition of the problem. In that case the data dictionary team needs to modify the RPS to include the unmapped OPS and so define the problem completely.
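A minimal sketch of this gap check is given below. The row structure `MappingRow`, the function `unmapped_ops` and the sample statements are hypothetical and only illustrate how rows of Table 2 could be examined for OPS entries that have no RPS.

```python
from typing import NamedTuple, Optional, Tuple


class MappingRow(NamedTuple):
    """One row of Table 2 (field names are assumptions made for this sketch)."""
    ops: str                     # statement taken from the original problem statement
    assumptions: Tuple[str, ...]  # assumptions applied to that statement
    rps: Optional[str]           # corresponding rewritten statement, if any


def unmapped_ops(all_ops, rows):
    """Return every OPS entry that has no rewritten statement in the table."""
    mapped = {row.ops for row in rows if row.rps}
    return [ops for ops in all_ops if ops not in mapped]


all_ops = ["OPS-1", "OPS-2", "OPS-3"]
table_2 = [
    MappingRow("OPS-1", ("A-1",), "RPS-1"),
    MappingRow("OPS-2", (), None),  # listed in the table but not yet rewritten
]

# OPS-2 has no RPS and OPS-3 does not appear in the table at all.
print(unmapped_ops(all_ops, table_2))
```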

Formation of GRAI grid - Based on the RPS and further brainstorming sessions with the data dictionary client, the data dictionary team develops a GRAI grid. For details on the procedure for the development of the GRAI grid, refer to the article ‘GRAI Grid Decisional modelling’ by Doumeingts (Doumeingts 1998).

Following are the main reasons for including GRAI grid in the development of the data dictionary:

GRAI grid identifies the key business areas of an organisation. This information will be useful for the procurement of industry wide schema and the development of Information Diagram (discussed later in the article).

GRAI grid encapsulates similar activities of an organisation and bundles them in one cell of the GRAI grid. As discussed earlier in the article, it is not possible to model all the data elements of an organisation in one big step. One of the problems with the present data dictionaries is that they try to model all the data elements of the organisation in one big step. When developing a data dictionary, the data dictionary team can use the GRAI grid to identify a set of activities that are tightly coupled (closely related) with each other and model them in an iteration of data dictionary development.


The STEP methodology relies on modelling information requirements in iterations or cycles (ISO/TC 184/SC4 N534 1997, p. 36). Before the information requirements for a cycle are modelled, the STEP methodology defines the scope of the requirements to be modelled in that cycle. Defining the scope of the data dictionary can be difficult because the definitions of data elements in the data dictionary can rely on other data definitions. It can therefore be difficult to identify which data elements should be modelled in a given cycle of data dictionary development. The GRAI grid helps us to define the scope of the data dictionary by carving out data elements that can be modelled in one iteration of data dictionary development.

Development of the GRAI grid for data dictionary client involves identification of:

Those organisations or departments with whom the data dictionary client will exchange information

What information will be exchanged with these organisations or departments

Later in the article we will observe that the identification of organisations with which information will be exchanged is important in getting common agreement on data definitions present in the data dictionary.

Rewritten Problem Statement - GRAI Grid Validation - This activity involves ensuring that the GRAI grid captures the RPS. This is achieved by mapping RPS and the GRAI grid elements. The result of the mapping can be captured in the following table (Table 3).

Rewritten Problem Statement | GRAI Grid Element

Table 3 RPS - GRAI Grid Mapping Table

Mapping RPS with the GRAI grid ensures that the data dictionary team has considered all the problems in the RPS, when developing the GRAI grid.

Enterprise Activity Model Development - The next activity in the development of the data dictionary is the development of the Enterprise Activity Model or Enterprise Activity Diagram (EAM/EAD). EAM is developed to define the scope of the data dictionary. The model is presented as a set of figures that contains activity diagrams, a set of definitions of the activities in the activity diagram and information about the data used by these activities. One of the problems identified with the traditional data dictionaries is that they do not provide the scope of the information that is modelled by them. The defense data dictionary tried to model the complete information pertaining to the organisation in one step. Enterprise Activity Model defines the scope of the information that will be modelled by the data dictionary in one cycle.

The reasons why the data dictionary should be developed in iterations are as follows:

According to Alderman and Moss, “initially when a data dictionary system is build the user requirements are subject to change” (Alderman & Moss 2001, p.1). This means that when the data dictionary is first developed it is subject to constant change. The larger the system, the longer it takes to change it. This idea is supported by Dromey (2001, p.53), who states that the larger the scope of a system, the harder it is to manage and the higher its risk of failure.

According to Bernus (2001), one of the major problems with developing the complete data dictionary in one step is that it takes a long time to develop, and by the time the data dictionary is “complete” the data requirements of the enterprise have changed.


Because of the factors discussed above, the proposed methodology builds the data dictionary system in cycles. Each cycle models a “small” set of data dictionary requirements. By following this approach we ensure that even if a cycle of the data dictionary system fails (does not meet the user requirements), the cost of that development is much less than the cost of developing the complete data dictionary.

The STEP methodology relies on modelling information requirements in iterations (ISO/TC 184/SC4 N534 1997, p. 36). The use of the EAM is borrowed from the STEP methodology. Before the information for a cycle is modelled, the STEP methodology defines the scope of the requirements that will be modelled in that cycle. This scoping is achieved by the development of an Application Activity Model (AAM). The Enterprise Activity Model corresponds to the AAM in the STEP methodology and has been included in the methodology because of the success achieved by using the AAM to scope information requirements in the STEP methodology.

GRAI-EAM validation – This activity involves validating that the information present in the EAM is in sync with the information modelled by the GRAI grid. Mapping GRAI grid with the EAM ensures that all the information requirements identified in the GRAI grid have been considered when the EAM is developed.

Development of Use Case Diagram - This activity involves the development of the use case diagram. For developing the use case diagram brainstorming sessions between business experts and the data dictionary development team are performed. For details on the procedure for the development of the use case diagram, refer to ‘The UML language reference manual’ (Booch, Jacobson & Rumbach 1999, p.63).

The main advantage of the development of use case diagram is to aid in specifying the scope of a system (Abel 2001). As discussed in the previous section, the scope of the data dictionary is specified using the IDEF0 diagram. It should be noted that developing use case diagram would not provide the data dictionary team with any information that is not supplied by the EAM. The main purpose of including the use case diagram in the data dictionary methodology is similar to the inclusion of EAM i.e. both aid in identifying the scope of the data dictionary. According to Bernus (2001), even though both the use case diagram and the EAM specify the scope of the system, both of these models should be used while developing the data dictionary. The reason for this is that they have their own advantages and disadvantages. For example, the EAM decomposes the system based on the activities that the system will perform. During data dictionary development the data dictionary team may be interested in decomposition of the system based on the mechanisms performing the activities.

Development of Process Model - This activity involves illustrating the business processes performed by an organisation. This is achieved by the development of a process model. A process model represents the activities performed by a business organisation procedurally (to the extent that the activities can be made procedural). According to Bernus (2001), not all activities can be modelled procedurally. Consider the activity ‘write a thriller book’. It may be argued that we cannot model the activities involved in ‘writing a thriller’ procedurally. Bernus (2001) states that the activities ‘take shower’, ‘have breakfast’ and ‘write a book’ describe the activity of ‘writing a thriller’. Note that this representation does not capture the essence of writing a thriller, because the activity ‘write a book’ encapsulates that essence. According to Bernus (2001), even though a process model does not always represent the actual essence of an activity, its development helps to identify the important protocols that take place in performing an activity. To develop the process model, brainstorming sessions are held between the data dictionary team and the mechanisms performing the activities in the EAM.

The process model can be documented as a UML activity diagram (Booch, Jacobson & Rumbach 1999, p.81), an IDEF3 diagram (IDEF 2001) or a RosettaNet flow diagram (RosettaNet 2001). Any notation can be used as long as the process model clusters the activities by the organisation that performs them. For details on the procedure for the development of an activity diagram, refer to ‘The UML language reference manual’ (Booch, Jacobson & Rumbach 1999, p.81).


The following figure (Figure 1) demonstrates a process model in UML activity diagram notation. The process model demonstrates that organisation X assesses stock quantity based on the ‘Inventory Request Information’ that it receives from ‘Organisation Y’. After assessing the stock quantity, ‘Organisation X’ generates ‘Stock Availability Information’ that is used by the activity ‘Distribute Product Information’. ‘Organisation X’ sends the ‘Product Information’ to the activity ‘Receive Product Information’ performed by Organisation Y.

Figure 1 Process Model

The development of the process model to model business information has been borrowed from the RosettaNet framework. RosettaNet framework uses flow diagram (a category of process model) to illustrate the information that is exchanged between different organisations (RosettaNet 2001).

The main reasons for including the development of a process model in data dictionary development are as follows:

Activity diagrams (a category of process model) have been successfully used in object oriented analysis to model business processes (Booch, Jacobson & Rumbach 1999, p.81). The researcher expects that the development of a process model will aid in understanding the business processes associated with an organisation. This will aid in modelling information about the organisation.

Later in the article it is proposed that to gain common understanding of data element definitions, the data dictionary team gets the data definitions approved by different stakeholders involved with the data element. The development of process model will aid the data dictionary team in identifying different stakeholders involved with the data element.


Identify Organisations/Departments with Whom Information is Exchanged - This activity involves following the process logic identified in the process model and identifying the organisations (or departments) with whom information exchange will take place. We acquired a first idea of these organisations when the GRAI grid was developed. After the process model is developed, we can specify this information in detail by listing which business process information will be exchanged with which organisation or department. For example, after this activity is performed we can state that the information pertaining to ‘purchase orders’ will be exchanged with the accounting department of organisation ‘X’ and the sales department of organisation ‘Y’.

As discussed earlier, the methodology aims to gain a common understanding of data element definitions by getting each data definition approved by the different stakeholders involved with the data element. The information generated by this activity is required to validate the data dictionary after it is developed.
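The following sketch illustrates how the list of exchange partners could be derived from a process model. It encodes the information flows described for Figure 1 as plain records. The `Flow` structure and the function `exchange_partners` are assumptions made for this example, not part of the methodology.

```python
from collections import defaultdict
from typing import NamedTuple


class Flow(NamedTuple):
    """One information flow taken from the process model (hypothetical structure)."""
    information: str
    from_org: str
    to_org: str


# Flows as described for Figure 1.
flows = [
    Flow("Inventory Request Information", "Organisation Y", "Organisation X"),
    Flow("Stock Availability Information", "Organisation X", "Organisation X"),
    Flow("Product Information", "Organisation X", "Organisation Y"),
]


def exchange_partners(flows, client):
    """Map each partner organisation to the information exchanged with the client."""
    partners = defaultdict(set)
    for flow in flows:
        if flow.from_org == flow.to_org:
            continue  # internal flow, nothing is exchanged across organisations
        if flow.from_org == client:
            partners[flow.to_org].add(flow.information)
        elif flow.to_org == client:
            partners[flow.from_org].add(flow.information)
    return dict(partners)


print(exchange_partners(flows, "Organisation X"))
```

The resulting list of partners and exchanged information is the input needed later, when definitions are sent to stakeholders for approval.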

Acquire Industry Wide Data Elements (IWD) - The next activity in the development of the data dictionary is to acquire industry wide data elements. This activity consists of acquiring the industry wide schemas, industry wide definitions and industry wide data dictionaries. We need to acquire the industry wide data elements before we develop the requirements for the organisation specific data elements, so that we can refer to the industry wide data elements when developing the requirements of the data dictionary. This will bring our data dictionary into conformity with the industry wide data dictionary. Acquiring industry wide data elements consists of the following steps:

STEP1 - Obtain Industry Wide Schemas: Industry wide schemas (IWS) are standard data constructs used in the creation of organisation specific schemas. Industry wide schemas capture and support the common requirements of different enterprise areas. For example, there can be an industry wide schema for each of the following enterprise areas:

Purchase order management

Inventory management

Shipping products

The GRAI grid and the EAM identify the domain in which the enterprise is operating. They serve as the basis for determining the domain of the industry wide schemas that the data dictionary team needs to acquire. For example, if the GRAI grid identifies that an organisation performs activities relating to transporting products, then the data dictionary team can acquire industry wide schemas pertaining to shipping products, product information and other areas relating to product transportation.

IWS are developed using XML and are specified in a context that is independent of any particular organisation. IWS are used as the basis for developing organisation specific schemas and are not intended for direct implementation, i.e. they define reusable components that are intended to be refined to meet specific organisational needs. IWS consist of generic data constructs, i.e. we do not specify whether the data elements occurring in the industry wide schemas are attributes, relationships or entities. The reason for this is that the classification of a data element as an attribute, relationship or entity will vary from one organisation to another. Suppose that we want to represent the fact that a ‘man’ is ‘married’ to a ‘woman’. This can be represented as an attribute attached to the entity ‘man’ or ‘woman’ that identifies the man or woman to whom the entity is married. Another possibility is to represent ‘married’ as a relationship between the entities ‘man’ and ‘woman’. Depending upon its needs, one organisation may model the data element as an attribute of the entity ‘man’ or ‘woman’, and another organisation may model it as a relationship between the entities ‘man’ and ‘woman’ (Figure 2).


Figure 2 Man – Woman Married Relationship
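The two modelling choices for ‘married’ can be made concrete with a small sketch. The class names below (`PersonWithSpouse`, `Person`, `Married`) and the sample names are hypothetical and serve only to contrast the attribute form with the relationship form.

```python
from dataclasses import dataclass
from typing import Optional


# Option A: 'married' modelled as an attribute of the entity itself.
@dataclass
class PersonWithSpouse:
    name: str
    married_to: Optional[str] = None  # name of the spouse, if any


# Option B: 'married' modelled as a relationship between two entities.
@dataclass(frozen=True)
class Person:
    name: str


@dataclass(frozen=True)
class Married:
    man: Person
    woman: Person


# The same fact expressed in both forms.
as_attribute = PersonWithSpouse("John", married_to="Mary")
as_relationship = Married(man=Person("John"), woman=Person("Mary"))
```

Both forms carry the same fact, which is why an industry wide schema can leave the choice to the organisation that refines it.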

Because of the advantages of using XML as a data modelling language, XML is used to document the industry wide schemas. For example, the following ELEMENT declaration states that a product has a unique product identifier, information about the product catalogue and an (optional) effective date¹.

<!ELEMENT Product (
    ProductIdentifier,
    ProductCatalog,
    effectiveDate?
) >
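As an illustration of what this declaration permits, the sketch below checks an XML instance against the same content model by hand. It uses Python's standard `xml.etree.ElementTree` module, which parses XML but does not validate against a DTD, so the check is written out explicitly. The function name `conforms_to_product_model` is an assumption made for this example.

```python
import xml.etree.ElementTree as ET

# Child elements of Product, in the order given by the declaration.
REQUIRED = ["ProductIdentifier", "ProductCatalog"]
OPTIONAL = ["effectiveDate"]  # marked with '?' in the declaration


def conforms_to_product_model(xml_text):
    """Return True if the document is a Product with the declared children."""
    root = ET.fromstring(xml_text)
    if root.tag != "Product":
        return False
    children = [child.tag for child in root]
    # The content model is a sequence: both required children in order,
    # optionally followed by effectiveDate.
    return children == REQUIRED or children == REQUIRED + OPTIONAL


sample = (
    "<Product>"
    "<ProductIdentifier>P-100</ProductIdentifier>"
    "<ProductCatalog>Spring</ProductCatalog>"
    "</Product>"
)
print(conforms_to_product_model(sample))
```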

STEP2 - Obtain Industry Wide Data Dictionary (IWDD): Industry wide data dictionary describes the definitions of the data elements present in the data dictionary. Because of the advantages of using XML as a language to document data definitions, XML is used to document the definitions of the tags present in the industry wide schemas.

STEP3 - Acquire Common Schema Definitions: The industry wide data dictionary will refer to some standard set of definitions. For example, IWDD may refer to a common schema definition “e-mail address”. Common schema definitions are the definitions on which common industry agreement is already present or these definitions are used in more than one industry wide schema in the same context. Even though agreement is already present on these definitions it is important to document these common definitions, because the data dictionary developed using the methodology will use these definitions to define the meaning of terms contained in it. If someone unfamiliar with the common industry definitions (this can occur due to several reasons like lack of domain knowledge) tries to understand the organisation specific data dictionary, then they can refer to the common schema definitions to understand the meaning of common terms. Common schema definitions will also be useful when someone tries to understand the meaning of the terms present in the industry wide data dictionary.

The author(s) applied the concept of the enterprise data model to a data dictionary. The researcher came to the conclusion that a data dictionary based on the concept of enterprise data model should have separation of levels, and there should be an enterprise wide level that gives definitions of elementary data concept.
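One way to picture this separation of levels is as a layered lookup, in which an organisation specific definition takes precedence, followed by the industry wide data dictionary and then the common schema definitions. The sketch below uses Python's `collections.ChainMap` for this purpose. The three dictionaries, their definitions and the function `define` are hypothetical.

```python
from collections import ChainMap

# Hypothetical definitions at each level.
common_schema_definitions = {
    "e-mail address": "An address to which electronic mail can be delivered.",
}
industry_wide_dictionary = {
    "ProductIdentifier": "Identifier that distinguishes one product from another.",
}
organisation_specific_dictionary = {
    "ProductIdentifier": "Identifier that distinguishes one product from another, "
                         "as printed in the organisation's own catalogue.",
}

# ChainMap searches the mappings from left to right and returns the first hit.
dictionary = ChainMap(
    organisation_specific_dictionary,
    industry_wide_dictionary,
    common_schema_definitions,
)


def define(term):
    """Return the most specific definition available, or None if there is none."""
    return dictionary.get(term)


print(define("e-mail address"))
```

A reader unfamiliar with the domain can resolve any common term through the same lookup, without needing to know at which level it is defined.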

1 In a DTD content model, ‘?’ after an element name means that the element is optional.


The use of industry wide data elements is borrowed from the concept of Integrated Resources in the STEP methodology and the concept of the Partner Interface Process (PIP) in the RosettaNet framework. The STEP methodology customises generic industry constructs called Integrated Resources to model organisation specific requirements. The RosettaNet framework customises standard PIPs to model the requirements of an organisation. The use of generic standard industry constructs to model organisation specific requirements has been successful in both the STEP methodology and the RosettaNet framework. The researcher expects that by using industry wide data elements to model organisation specific information in the data dictionary methodology, the same success will be achieved.

The concept of having existing libraries exists in programming languages like JAVA (Sun Microsystems 2001). When we develop a JAVA application we can import a standard set of classes and use these predefined classes in the context of our application.

Earlier in the article, the author(s) presented the method by which the SIZ bank was able to achieve consistent interpretation of data definitions. The author(s) came to a conclusion that a data dictionary created according to the concepts of the SIZ method will contain an industry wide terminology that will be the basis for the development of organisation specific terminologies. The use of industry wide data elements is in correspondence with the SIZ method. The main aim of the industry wide data elements is to guide the development of the organisation specific data dictionary and give it conformity to the standard industry data dictionary.

Development of Information Requirements - This activity involves the development of Information Requirements. Information Requirements specify information about the data elements that should be present or that already exist in the organisational data dictionary. The Information Requirements are specified in a textual format. The development of Information Requirements for a data dictionary is similar to the development of conceptual requirements to model a database. For details on the development of conceptual requirements, refer to ‘Fundamentals of Database Systems’ (Elmashri & Navatale 2000, pp.173-200). Each Information Requirement must be given a unique identification number so that it can be uniquely traced to the deliverables produced by other activities of the methodology. Following are examples of Information Requirements that may be developed when developing a data dictionary for a hospital:

A city has a unique name and is in a country.

A Patient may live in a city.

Map Information Requirement – EAM construct - This activity consists of mapping Information Requirements to the EAM constructs. The result of the mapping can be captured in the following table (Table 4).

EAM Name | EAM Construct | Information Requirement

Table 4 EAM - Information Requirement Mapping Table

One of the major problems with traditional data dictionaries is finding data elements. As discussed later in the methodology, each data element will be mapped to the Information Requirement that originates the need for it. Finding a data element will therefore be easy if we can find the Information Requirement that originates the need for it. This table is useful for searching for an Information Requirement based on the EAM constructs. For example, we can use it to find all the Information Requirements that correspond to a particular EAM activity. To find a particular Information Requirement, we identify the EAM to which it should belong and use this table to list all the Information Requirements belonging to that EAM. This helps in identifying the appropriate Information Requirement without going through all the Information Requirements of the data dictionary.
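The search described above can be sketched as a simple filter over the rows of Table 4. The identifiers ‘IR-1’ and ‘IR-2’, the EAM name ‘Patient Admission’, its constructs and the function `requirements_for` are all hypothetical. Only the two requirement texts come from the hospital example.

```python
# Information Requirements keyed by their unique identification number.
information_requirements = {
    "IR-1": "A city has a unique name and is in a country.",
    "IR-2": "A Patient may live in a city.",
}

# Rows of Table 4: (EAM name, EAM construct, Information Requirement ID).
table_4 = [
    ("Patient Admission", "Register Patient", "IR-1"),
    ("Patient Admission", "Register Patient", "IR-2"),
    ("Patient Admission", "Assign Ward", "IR-2"),
]


def requirements_for(eam_name, construct, rows):
    """Return the IDs of all requirements mapped to one EAM construct."""
    return sorted({ir_id for name, c, ir_id in rows
                   if name == eam_name and c == construct})


for ir_id in requirements_for("Patient Admission", "Register Patient", table_4):
    print(ir_id, information_requirements[ir_id])
```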

Information Diagram Preparation - This activity consists of the following steps:

STEP 1 - Walk through Information Requirements: In this step the consultant provides an overview of the Information Requirements to the members involved with the development of the Information Diagram. These members include business experts, data dictionary consultant and experts with industry wide data elements. Some of these members may not understand the Information Requirements of the data dictionary client completely. A walk through and analysis of the Information Requirements is required to provide an understanding of the Information Requirements to the members involved with the development of the Information Diagram.

STEP 2 - Plan for the next activity: In this step planning for the development of Information Diagram is performed. Planning may include activities like scheduling of the Information Diagram development workshops, identification of participants and determination of the roles that they will play in the development of the Information Diagram.

Preparation phase ensures that individuals involved with the development of the Information Diagram understand the organisation specific requirements. This understanding is essential for performing activities described later in the methodology.

Information Diagram Development - This activity consists of the following steps:

STEP 1- Creation of Preliminary Information Diagram - This step involves creation of an Information Diagram for each business domain. Information Diagram provides a detailed analysis of the data elements that must be present in the data dictionary. Based on the Information Requirements, the Information Diagram specifies the business definitions, entities, attributes, relationships etc. that will meet the Information Requirements of an enterprise. It must be noted that the Information Diagram must follow the notations that are approved during data dictionary development. The GRAI grid developed during development of data dictionary identifies the chief business areas of the data dictionary client. An Information Diagram must be developed for each business area of the client organisation.

If one Information Diagram is used to document all the business areas in the organisation, then the Information Diagram will become very large, which will make it difficult for its users to understand. Having a separate Information Diagram for each business domain keeps each diagram small (as it models information relating to one area) and thus makes it easy to understand and easy to represent on paper. If we are maintaining the data dictionary of an organisation (let's say adding some entities to it), then some existing data elements will be present in the Information Diagram. When the Information Diagram is developed during the maintenance of the data dictionary, we need to describe both the existing data elements and the new data elements that we wish to add. Information Diagrams are presented as a set of figures that contain a set of entities and the relationships that exist between these entities. An Information Diagram can be documented in a syntax similar to the UML class diagram, the entity relationship diagram or the IDEF1X diagram.

STEP2 - Gain Agreement on Information Diagram interfaces and modify Information Diagram (if needed): After Information Diagram for a business domain is developed its interface with other Information Diagrams should be identified. The following scenarios illustrate the importance of this step.

Consider the scenario (Scenario 1): Information Diagram of 'Business Domain X' has an entity 'SALARY', representing the salary of employees of the organisation before tax. Information Diagram of 'Business Domain Y' has an entity 'SALARY', representing the salary of employees of the organisation after tax. Information Diagram of 'Business Domain Z' has an entity 'SALARY', representing the salary of employees' dependants. If individuals from different business domains talk about the term 'salary', then problems are bound to occur, because they will be using the same term to describe different concepts.

Consider another scenario (Scenario 2): Information Diagram of 'Business Domain X' has an entity 'Business Party' that means any organisation with which 'Business Domain X' does business. 'Business Domain Y' has an entity ‘Involved Party’ which means any natural person, organisation person or organisation with which 'Business Domain Y' does business. If an interaction takes place between 'Business Domain X' and 'Business Domain Y', then problems are bound to happen, because individuals from different business domains may use different terms to describe the same business scenario.

Gaining agreement on the interfaces involves identifying interfaces with other organisations and coming up with a common agreement on the data elements at interfaces². For example, to gain common agreement on the term 'salary' in Scenario 1, we can divide the entity 'salary' into three entities - ‘salary before tax’, ‘salary after tax’ and ‘total salary’. In Scenario 2, we may take an abstraction of the entity 'Business Party'³ of 'Business Domain X' and represent it on the Information Diagram, so that agreement on the entity definitions takes place.

Note: 'Business Domain X' can continue to use the term 'Business Party' when dealing in its own domain. When transactions or communications take place with 'Business Domain Y', which uses the term 'Involved Party', the relationship between the terms ‘Involved Party’ and ‘Business Party’ can be resolved using the Information Diagram.
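The kind of conflict described in Scenario 1 can be detected mechanically once the entity definitions of each Information Diagram are listed side by side. The sketch below is a minimal illustration. The function `conflicting_terms` and the dictionary layout are assumptions made for this example, and comparing definitions as plain text is a deliberate simplification.

```python
from collections import defaultdict

# Entity definitions per Information Diagram, taken from Scenario 1.
diagrams = {
    "Business Domain X": {"SALARY": "salary of employees before tax"},
    "Business Domain Y": {"SALARY": "salary of employees after tax"},
    "Business Domain Z": {"SALARY": "salary of employees' dependants"},
}


def conflicting_terms(diagrams):
    """Return terms that carry different definitions in different domains."""
    by_term = defaultdict(dict)
    for domain, entities in diagrams.items():
        for term, definition in entities.items():
            by_term[term][domain] = definition
    # A term is in conflict when more than one distinct definition exists.
    return {term: defs for term, defs in by_term.items()
            if len(set(defs.values())) > 1}


for term, definitions in conflicting_terms(diagrams).items():
    print(term, "is defined differently in", sorted(definitions))
```

Each term reported here is a candidate for discussion when agreement on the interfaces is sought.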

Development of Information Diagram has been included in the data dictionary methodology because it helps in identifying the information that will be present in the data dictionary. The concept of Information Diagram has been borrowed from the STEP methodology. STEP methodology specifies development of a diagram that identifies the data elements that will be used to model the enterprise information and specifies relationship between these data elements. This diagram is called the Application Reference Model. It is very important to keep track of existing data elements in the Information Diagram because they will play an important role in the architectural design of the data dictionary.

Information Diagram – Information Requirements Validation - This activity involves ensuring that the Information Requirements have been represented in the Information Diagram. This can be achieved by mapping the Information Requirements with the constructs in the Information Diagram. The following table (Table 5) can be used to capture the result of this mapping.

Information Requirements ID | Information Diagram Construct

Table 5 Information Requirement - Information Diagram Construct Mapping Table

Mapping Information Diagram with the Information Requirements ensures that the data dictionary team has considered all the Information Requirements when developing the Information Diagram. If some Information Requirements do not have a corresponding Information Diagram construct in the mapping table, then it means that the data dictionary team did not represent the Information Requirement in the Information Diagram.

2 Interface data elements are the data elements used by more than one organisation.

3 This abstraction will not constrain the definition of the term 'Business Party' to represent an organisation with whom 'Business Domain X' does business. The meaning of this abstraction will be the same as the meaning of the term 'Involved Party' used by 'Business Domain Y'.


Architectural Design of Data Dictionary - The architectural design phase consists of two main steps:

STEP 1 - Determination of the terms that exist in the client data dictionary and determination of the terms that need to be implemented in the data dictionary. If we are developing a data dictionary from scratch, then we do not need to perform this step, as the data dictionary will not have any existing data elements.

STEP 2 - Determination of how the modelled data elements will be stored in the data dictionary. For example, depending on the organisational needs, some data elements may be stored as an Excel file and other data elements may be stored as an XML file.

It is important to keep track of the existing data elements and of the data elements that we wish to add in a cycle of data dictionary development. The reason for this is to ensure that the data elements we add are not already present in the existing data dictionary. If the data dictionary team does not identify the existing data elements in the Information Diagram, then they may lose track of some of them and add new data elements that duplicate data elements already present in the data dictionary. Re-adding these data elements will create multiple copies of the same data element, and different people or applications may then use different names for the same data element. Modelling both the existing and the new data elements in the Information Diagram ensures that if a new data element similar to an existing data element needs to be added to the data dictionary, we can keep these data elements bundled together in the same schema. Bundling data elements together makes related data elements easier to find.
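The first step can be pictured as a comparison between the modelled data elements and the contents of the existing dictionary. The following Python sketch is an illustration only; the element names and the name-normalisation rule (ignore case, spaces and underscores) are assumptions, and a real team would need a richer notion of "same data element" than name matching.

```python
def normalise(name):
    """Crude name normalisation (an assumption): ignore case, spaces, underscores."""
    return name.replace("_", "").replace(" ", "").lower()


def partition_elements(modelled, existing):
    """Split modelled element names into those already stored and those to add."""
    existing_keys = {normalise(name) for name in existing}
    already_present, to_add = [], []
    for name in modelled:
        if normalise(name) in existing_keys:
            already_present.append(name)
        else:
            to_add.append(name)
    return already_present, to_add


# Hypothetical contents of the existing dictionary and of the Information Diagram.
existing_dictionary = ["ClientName", "Order_Date"]
modelled_elements = ["client name", "OrderDate", "DeliveryAddress"]

already_present, to_add = partition_elements(modelled_elements, existing_dictionary)
print(already_present)
print(to_add)
```

Only the names in `to_add` would be created as new data elements in this cycle; the others are treated as existing entries, which avoids storing a second copy under a slightly different name.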

Development of Organisation Specific Schema - The Organisation Specific Schema (OSS) defines the Information Requirements of an organisation in XML. To develop the OSS, the data dictionary team takes the Information Diagram and converts it into an XML file according to the rules defined in a DTD. The rules specified in this DTD should cater for modelling constructs such as generalisation, association and any other constructs that can occur in the Information Diagram. The following example demonstrates how an Information Diagram may be represented:

If the Information Diagram has two schemas with an association between them, as illustrated below:

Schema1 - with two String attributes ‘A1’ and ‘A2’
Schema2 - with two String attributes ‘A3’ and ‘A4’
Relationship: 1 ‘Schema1’ ‘related_to’ 1 ‘Schema2’

then it may be represented in XML as:

<InformationDiagram>
  <Schema NAME="Schema1">
    <Attribute VISIBILITY="public" TYPE="string" NAME="A1"/>
    <Attribute VISIBILITY="public" TYPE="string" NAME="A2"/>
    <Association PEER="Schema2">
      <AssocRole MULTIPLICITY="1"/>
      <PeerAssocRole MULTIPLICITY="1" ROLENAME="related_to"/>
    </Association>
  </Schema>
  <Schema NAME="Schema2">
    <Attribute VISIBILITY="public" TYPE="string" NAME="A3"/>
    <Attribute VISIBILITY="public" TYPE="string" NAME="A4"/>
  </Schema>
</InformationDiagram>

Using an XML DTD, the conversion of the Information Diagram to an XML schema can be automated, because, under the rules of the DTD, a given Information Diagram can be represented as an XML schema in only one way.
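To illustrate why this conversion lends itself to automation, the following Python sketch generates XML for the two-schema example above from a small in-memory description of the diagram. It is a sketch under assumptions, not the conversion tool used by the methodology: the dictionary layout of `diagram` and the function name `diagram_to_xml` are hypothetical, and the root element is written as `InformationDiagram` without a space because XML element names cannot contain spaces.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory form of the example Information Diagram.
diagram = {
    "schemas": [
        {"name": "Schema1", "attributes": ["A1", "A2"]},
        {"name": "Schema2", "attributes": ["A3", "A4"]},
    ],
    "associations": [
        {"source": "Schema1", "peer": "Schema2", "role": "related_to",
         "source_multiplicity": "1", "peer_multiplicity": "1"},
    ],
}


def diagram_to_xml(diagram):
    """Build the XML tree for a diagram, one Schema element per schema."""
    root = ET.Element("InformationDiagram")
    for schema in diagram["schemas"]:
        schema_el = ET.SubElement(root, "Schema", NAME=schema["name"])
        # Every attribute in the example is a public string.
        for attr_name in schema["attributes"]:
            ET.SubElement(schema_el, "Attribute",
                          VISIBILITY="public", TYPE="string", NAME=attr_name)
        # An association is written inside the schema it starts from.
        for assoc in diagram["associations"]:
            if assoc["source"] == schema["name"]:
                assoc_el = ET.SubElement(schema_el, "Association",
                                         PEER=assoc["peer"])
                ET.SubElement(assoc_el, "AssocRole",
                              MULTIPLICITY=assoc["source_multiplicity"])
                ET.SubElement(assoc_el, "PeerAssocRole",
                              MULTIPLICITY=assoc["peer_multiplicity"],
                              ROLENAME=assoc["role"])
    return root


root = diagram_to_xml(diagram)
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Because each diagram construct maps to exactly one XML construct, the function contains no choices to be made by a person, which is the property the paragraph above relies on.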

Gather Information About Data Definitions - This activity consists of obtaining detailed information about the data element from the data dictionary users. The detailed information includes:

- Information about any reference documents that specify details of the data element
- Person who will be responsible for maintaining the data element
- Example of data elements
- Tracing Information Requirement to corresponding IDEF0 constructs
- Tracing data elements to the process diagram etc.

The information generated in this phase will be used to document definitions for the data elements in the following phase.
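The information gathered for each data element can be pictured as a simple record. The following Python sketch is one possible shape for such a record; the class name, field names and sample values are assumptions made for illustration and are not prescribed by the methodology.

```python
from dataclasses import dataclass, field


@dataclass
class DataElementInfo:
    """Details collected from data dictionary users for one data element."""
    name: str
    maintainer: str
    reference_documents: list = field(default_factory=list)
    examples: list = field(default_factory=list)
    idef0_constructs: list = field(default_factory=list)
    process_diagram_refs: list = field(default_factory=list)

    def missing_fields(self):
        """Names of the detail fields that are still empty."""
        return [key for key, value in vars(self).items() if not value]


# Hypothetical record, partly filled in after a first round of interviews.
info = DataElementInfo(
    name="ClientName",
    maintainer="Data steward, Sales (hypothetical)",
    examples=["Acme Pty Ltd"],
)
print(info.missing_fields())
```

A record with empty fields signals that more information has to be gathered before the definition is documented in the next phase.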

Documenting data dictionary - This activity involves documenting the OSS constructs in the data dictionary. It is recommended that XML be used to document these constructs. Because the data dictionary contains the information about the data elements used in an organisation, it is important that all the data elements in the OSS are documented in the data dictionary. The following are some of the reasons for recommending XML for documenting data definitions in the data dictionary:

- Flexible language: One of the problems identified with traditional dictionaries is that they do not support features such as displaying images in the data dictionary. XML is a flexible language and supports features such as displaying images and adding hyperlinks to images and other information.
- Easily extended: XML can be extended easily to include meta data about the data elements.
- Context: XML focuses on the context in which the data elements occur (Biggar & Laucent 1999).
- Separation of content and presentation: One of the problems identified with traditional dictionaries is that we cannot present different views of the same data dictionary to different departments or organisations. Using XML, we can present different views of the same data dictionary to different users. Depending on the XSL (Extensible Stylesheet Language) style sheet associated with the user, different users can view different definitions present in the same data dictionary.
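The separation of content and presentation can be illustrated with a small Python sketch in which one XML dictionary yields a different view for each audience. This is a stand-in only: a plain Python function plays the role that a style sheet would play, and the element names (`DataDictionary`, `DataElement`, `Definition`), the `AUDIENCE` attribute and the sample definitions are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical dictionary content; one file holds the definitions for all audiences.
DICTIONARY_XML = """
<DataDictionary>
  <DataElement NAME="ClientName">
    <Definition AUDIENCE="marketing">Name of the retailer targeted by a campaign.</Definition>
    <Definition AUDIENCE="manufacturing">Name of the customer a product batch is built for.</Definition>
  </DataElement>
  <DataElement NAME="OrderDate">
    <Definition AUDIENCE="marketing">Date on which the order was placed.</Definition>
  </DataElement>
</DataDictionary>
"""


def view_for(audience, dictionary_xml=DICTIONARY_XML):
    """Return {element name: definition} restricted to one audience."""
    root = ET.fromstring(dictionary_xml.strip())
    view = {}
    for element in root.findall("DataElement"):
        for definition in element.findall("Definition"):
            if definition.get("AUDIENCE") == audience:
                view[element.get("NAME")] = definition.text
    return view


print(view_for("marketing"))
print(view_for("manufacturing"))
```

The stored content is never changed; only the selection applied when presenting it differs from user to user.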

Validate Data Dictionary - This activity involves validating that the data dictionary created in the previous stage meets the requirements of the organisation for which it is developed. This is achieved by performing the following steps:

STEP 1 - Get the data definitions approved by the different stakeholders involved with each data element. The data dictionary team identifies the organisations (or departments) with which information pertaining to the different business processes will be exchanged. This step involves sending the definitions of the data elements pertaining to the different business processes to the appropriate stakeholders and obtaining their approval of the definitions of these “common” terms. For example, if the data definition 'ClientName' is used in the business processes retailer management, marketing information management and product manufacture, then we obtain approval of the data definition from the key stakeholders in the retailer management department, the marketing information department and the product manufacture department. This ensures that all personnel who deal with the term ‘ClientName’ share the same interpretation of its definition.

STEP 2 - Get the data dictionary approved by the individuals involved with the different business processes. This step involves sending the data definitions pertaining to a business process to a key stakeholder in the department that manages that business process. A list of the major business processes and the individuals involved with them can be obtained from the activities and the mechanisms specified in the EAM. For example, if an organisation creates electronic circuits and performs the following activities:

- Inventory Management
- Sales Order Management
- Manufacturing

then we can send the data definitions pertaining to:

- inventory management, to the key stakeholders in the inventory management department
- sales order management, to the key stakeholders in the sales order management department
- manufacturing electronic circuits, to the key stakeholders in the manufacturing department

This will ensure that the data dictionary has not missed any key terms relating to a given business process.
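The two validation steps amount to simple bookkeeping, which the following Python sketch illustrates. It is a sketch under assumptions: the mapping from data elements to business processes, the recorded approvals and the function names are hypothetical, and in practice the process list would be taken from the activities in the EAM.

```python
from collections import defaultdict

# Hypothetical mapping from each data element to the business processes that use it.
element_processes = {
    "ClientName": ["Retailer Management",
                   "Marketing Information Management",
                   "Product Manufacture"],
    "StockLevel": ["Inventory Management"],
}

# Hypothetical approvals received so far, as (data element, business process) pairs.
approvals = {
    ("ClientName", "Retailer Management"),
    ("StockLevel", "Inventory Management"),
}


def pending_approvals(element_processes, approvals):
    """First step: per element, the processes whose stakeholders have not approved."""
    pending = {}
    for element, processes in element_processes.items():
        waiting = [p for p in processes if (element, p) not in approvals]
        if waiting:
            pending[element] = waiting
    return pending


def review_packets(element_processes):
    """Second step: the definitions to send to the stakeholders of each process."""
    packets = defaultdict(list)
    for element, processes in element_processes.items():
        for process in processes:
            packets[process].append(element)
    return dict(packets)


print(pending_approvals(element_processes, approvals))
print(review_packets(element_processes))
```

A definition counts as agreed only when `pending_approvals` no longer lists it, and each packet from `review_packets` gives one department the complete set of terms for its own business process, so that missing terms can be spotted.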

Overview and evaluation of the methodology

None of the efforts to develop a maintainable data dictionary has been successful in achieving its objectives. As discussed earlier, the STEP methodology has described product data in an unambiguous manner. STEP achieved this success by taking into account the context in which the data elements occur. The methodology proposed in this article is based on the STEP methodology: the proposed data dictionary takes the context in which the definitions occur into account when documenting data definitions. To date, efforts to develop a maintainable data dictionary have not taken the context of the data definitions into account. STEP models product definitions in iterations. Even though the information produced in one iteration does not model the complete product, it can be used for communication between the various stakeholders. The methodology proposed in this article suggests that, instead of developing the complete data dictionary in a single step (as done by traditional data dictionary development approaches), its development should be broken into iterations. The methodology proposes the development of an IDEF0 model (EAM) that specifies the scope of the information to be modelled in an iteration of data dictionary development.

The German Savings Bank developed the SIZ data model and successfully achieved consistent interpretation of data elements. The SIZ data model focuses first on agreement on common definitions at an industry level. Once this agreement is achieved, it focuses on agreement on organisation specific definitions at departmental level. The proposed data dictionary methodology uses industry wide data elements for the development of an organisation specific data dictionary. The role of industry wide data elements in data dictionary development is similar to the role of the common definitions used by the SIZ data model at an industry level. The main aim of industry wide data elements is to guide the development of the organisation specific data dictionary and to make it conform to the standard industry data dictionary. Experience with the SIZ data model demonstrates that, because of the agreement on common industry definitions, agreement on almost 70% of detailed data definitions was achieved (Krahl & Kittlaus 1997, p.680). The author(s) expect that a similar outcome will be achieved by using industry wide data elements for developing data dictionaries. Most efforts to achieve common agreement on data definitions have not used this approach.

The methodology proposed in this article is based on the prior achievements in establishing consistent data definitions. None of the existing methods that have achieved success in obtaining common agreement on definitions subsumes the others; each approach has something unique to offer. The methodology proposed in this research encapsulates concepts from the existing methods and adds the notions that are essential for establishing common definitions, but that are not addressed by any of the existing approaches.

The following diagram summarises the methodology:

[Figure: overall picture / model]

Conclusion

The methodology proposed in this article does not attempt to curb the ability of an individual to make decisions. The results of the methodology are intended to establish circumstances which, while variable, have enough constraints to create predictable results. The aim of the research was to produce an ‘engineering-like highly generalisable’ process for creating a data dictionary that can be used in any enterprise. The author(s) acknowledge that we cannot eliminate the social process involved in creating and maintaining a data dictionary by completely automating its generation. The methodology proposed in this article regulates some of these processes. The methodology does not provide a unique answer as to how the data elements in the data dictionary should be specified. It provides strategies for the development of data dictionaries so that the data elements present in them can be used and interpreted by different individuals in a consistent manner.

References

Abel, D. 2001, Personal Discussions, Brisbane, 12 March 2001.

Austin, L. & Truth, J. O. U. 1963, Philosophical Papers of J. L. Austin, Oxford University Press, London.

Barwise, J. & Perry, J. 1983, Situations and Attitudes, MIT Press, Cambridge.

Bernus, P. 2001, Personal Discussions, Brisbane, March 2001 - October 2001.

Bernus, P., Nemes, L. & Morris, R. 1996, The Meaning of an Enterprise Model, in Bernus, P. & Nemes, L. (eds), Modelling and Methodologies for Enterprise Integration, Chapman and Hall, Great Britain.

Booch, G., Jacobson, I. & Rumbaugh, J. 1999, The Unified Modelling Language Reference Manual, Addison-Wesley, New York.

Booch, G., Jacobson, I. & Rumbaugh, J. 1999, The Unified Modelling Language User Manual, Addison-Wesley, New York.

Dromey, G. 2001, Software Requirements, [online], Available: http://www.cit.gu.edu.au/teaching/CIT2160/lectures/lect-6.doc (Sep 28, 2001).

Feldmann, G. 1998, The Practical Guide to Business Process Reengineering Using IDEF0, Dorset House Publishing Co, New York.

Graham, I. & Quin, L. 1999, XML Specification Guide, John Wiley & Sons, New York.

IBM 2000, Enterprise Data Model, [online], Available: http://www106.ibm.com/developerworks/patterns/glossary/enterprise-data-model.html (Jun 12, 2001).

IDEF 2001, A structured approach to enterprise modelling and analysis, [online], Available: http://www.idef.com/ (Aug 1, 2001).

ISO/TC 184/SC4 N534 1997, Guidelines for application interpreted construct development.

Jalote, P. 1997, An integrated approach to software engineering, Springer, New York.

Khanna, A. 2001, ‘Building maintainable data dictionary’, unpublished B. Software Engg (Hons.) dissertation, Griffith University, Brisbane.

Krahl, D. & Kittlaus, H. B. 1997, The SIZ Banking Data Model, Springer, Germany.

Marco, D. 2000, Building and Managing Meta Data Repository: A full life cycle guide, John Wiley & Sons, London.

RosettaNet 2001, RosettaNet: Lingua franca for e-Business, [online], Available: http://www.rosettanet.org/rosettanet/Rooms/DisplayPages/LayoutInitial (Oct 2, 2001).

Ross, D. 1997, ‘SA: A Language for Communicating Ideas’, Transactions on SE, vol. 3, no. 1 (September), 17-34.

Sun Microsystems 2001, JAVA Reference, [online], Available: www.sun.com (Sep 28, 2001).

US Department of Commerce 1997, ‘US Publication of Seventh Data Element Dictionary System’, National Bureau of Standards Publication, Belkis.

US General Account Office 1974, ‘Emphasis needed on Government effort to standardise data elements and code for Computer Systems’, National Bureau of Standards Publication, Belkis.