
Improving data quality: Growing in Maturity

Frank Boterenbrood MSc, 10-03-2010

Thesis: Improving data quality in higher education

Management Summary

Windesheim aims to become a near zero-latency organization: an organization that is able to respond

promptly to events in its environment. However, unexpected errors hinder the implementation of near

zero latency business process technologies. These errors are caused by poor data quality. The main

business problem triggering this research is that poor data quality inhibits Windesheim’s ability to

become a near real-time organization. Closer examination reveals a serious business impact of poor

data quality, which is defined by student (customer) dissatisfaction, inefficient process execution, loss

of image and loss of control. Earlier research revealed that poor data quality is caused by applications

not checking input values, and information objects having different values and definitions in different

business domains, which in turn is caused by a departmental view on data instead of a more holistic

business process wide view on information.

The migration of a departmental view on data towards a holistic view on information is characterized

as a growth in maturity. Not only does the impact of rapid changes in technology force Windesheim to

grow in maturity, the migration from the data processing era to the information era is driven by

international developments too. As part of this migration, a natural crisis, the technological

discontinuity, has to be overcome. In this research, the relation between organizational maturity and

data quality is sculpted in an instrument predicting required organizational change. The (CMMI-

based) instrument defines five levels of data quality maturity, ranging from 1) Initial through 2)

Managed, 3) Defined, 4) Quantitatively Managed and 5) Optimizing. For each level, based on proven

theories, process areas, goals and metrics are defined.

Using this instrument, and for Windesheim’s main business process, study management, current and

required data quality and corresponding organizational maturity were investigated. Current and

required maturity levels were assessed by observing process areas and goals currently implemented

and linking required goals with business rules of study management. It was found that currently, in

this domain, Windesheim has reached data quality maturity level one, and to satisfy both the business

rules and become a near-zero latency organization, data quality maturity level three is a minimal

prerequisite. Some data quality maturity level four goals will have to be reached as well.

To reach the goals required, a three-stage migration path is recommended1:

1. Reach data quality maturity level two (Managed) first by repairing the current database and

creating reports for data quality monitoring purposes by means of well-defined projects;

2. Reach data quality maturity level three (Defined) by putting a lasting programme in place,

adapting Educator (preventing errors by checking data quality at the input functions and

simultaneously reducing complexity by simplifying functionalities), empowering staff,

making teachers responsible for the complete process cycle, creating near real time interfaces

based on standard application interfaces, and handling the technological discontinuity;

3. Implement required level four (Quantitatively Managed) goals by establishing and

communicating strict deadlines within the study process.

This will clear the way for Windesheim to become a near-zero latency organization, improve study

management process efficiency, reduce cost of error detection and recovery, and improve customer

(student) satisfaction. Taking into account the benefits of this outcome for Windesheim, I advise

management to implement the recommendations made in this research.

1 Detailed advice is available in paragraph 5.6.2

15-Apr-23 F. Boterenbrood Page 2


The Organization

Windesheim is a university of professional education, located in Zwolle, and currently serving more

than 17,000 students. The organization is controlled by the Board of Directors, which directs the

departmental management. Windesheim employs 1,800 staff, 900 of whom are teachers. At

board level, Windesheim and VU University Amsterdam are closely related. As a result of this

cooperation, Windesheim offers some master's programmes in Zwolle, and has recently started the

Honours College Zwolle, a college aimed at serving international and ‘high potential’ students2.

[Figure content: the Board of Directors, the VU-Windesheim cooperation, 11 schools, 6 service departments, accreditation, students, business partners and collaborating schools.]

Figure 01: Windesheim Context Diagram

2 Instellingsplan 2007 – 2012, Besluit nummer 441 College van Bestuur van Windesheim


Parties involved

Author: Frank Boterenbrood

Waardeel 1f

8332 BB Steenwijk

E-mail: [email protected]

Supervisor: Albert Paans

E-mail: [email protected]

Supervisor: Rob Keemink

E-mail: [email protected]


Preface

Surely, it is hard to find anything less inspiring than data quality. But look at it this way: there the

data sits in the application's database, waiting to be retrieved, combined, processed and

transformed into useful information. This is its moment of glory, the moment when it shines at the

user interface, or perhaps even a management dashboard, delivered by information services

in conformance with service level agreements and processed by applets and modules, following well-established

and glorious architectural patterns and styles, only to find that it is in error, flawed, outdated,

misplaced…

Data, most literally, is the foundation on which information systems are built, like piling creates a

foundation for (Dutch) houses. There is nothing sexy about a concrete pillar. It is hammered into the

ground and remains invisible for eons to come. However, if it isn’t there, or if there is something

wrong with it, the construction it is supposed to support will inevitably come tumbling down.

Today, every business operation relies on its information systems. And with these information

systems, organizations create and consume immense amounts of data. If the data are flawed, time and

money may be lost in equally large quantities, causing at least embarrassment and loss of reputation.

Today, every business, every leader, every consumer has a vested interest in the quality of data.

This is true for Windesheim too. This research investigates the relation between data quality and

maturity of an organization, in particular the maturity of a higher education organization. Yet, the

results are not confined to education. What has been found here, may well be applicable in other

organizations. It is my hope therefore, that this research may contribute to improved data quality in a

much broader context. For, when data is flawed, no investment in modern and exciting technologies

can undo the damage, while once data is fit for use – or has a quality even beyond that – the

capabilities of data to support and improve business are hard to overestimate.

Acknowledgements

First, I thank my beloved spouse Carin, who over the past years has supported me in my study efforts

by enduring many hours of loneliness and reduced attention.

I would like to thank Rob Keemink, who has invested a large amount of time and money into my

study, and defended this investment, despite many financial cut-backs and management

discussions.

There are thanks for Albert Paans too, who was assigned the burden of being the official constituent

for this research, and invested a lot of his time in studying and debating the results I put forward,

which greatly contributed to the quality of the research.

I would like to thank Maarten Westerduin, for trusting me not to lose track of the Windesheim School

of Information Sciences priorities.

Also, I would like to extend my gratitude towards my colleagues of Bedrijfskundige Informatica, who

at so many occasions enabled my study and graduation by taking on extra duties where I was not able

to fulfill them.


And last but most certainly not least, I would like to thank Marlies van Steenbergen, Theo Thiadens

and Arjen de Graaf for their time invested in, and light shone on, the WDQM and data quality in higher

education in general.


1. Table of contents

Management Summary...........................................................................................................2

The Organization.....................................................................................................................3

Parties involved.......................................................................................................................4

Preface.....................................................................................................................................5

1. Table of contents............................................................................................................6

2. Exploring data quality in higher education....................................................................9

2.1 Project Introduction........................................................................................9

2.1.1 Windesheim’s Mission..............................................................9

2.1.2 Windesheim’s Information Technology.................................10

2.2 Business Problem description......................................................................11

2.2.1 Indications...............................................................................11

2.2.2 Consequences..........................................................................11

2.2.3 Business Problem....................................................................12

2.3 Cause analysis..............................................................................................12

2.3.1 Technical / functional causes..................................................13

2.3.2 Process design causes.............................................................13

2.3.3 Organizational causes.............................................................13

2.3.4 Growing pains.........................................................................14

2.3.5 Perspective..............................................................................15

2.3.6 Past, current and future situation............................................15

2.3.7 Summary.................................................................................17

2.4 Research Problem.........................................................................................17

2.5 Stakeholder Analysis....................................................................................18

2.6 Project Relevance.........................................................................................20

2.6.1 Stakeholder Relevance............................................................20

2.6.2 Business Relevance.................................................................20

2.6.3 Relevance to Science..............................................................20


3. Conceptual Research Design........................................................................................21

3.1 Theoretical approach and focus....................................................................21

3.1.1 Focus.......................................................................................21

3.1.2 Maturity revisited....................................................................21

3.1.3 A vision on Maturity...............................................................22

3.1.4 What is data quality?...............................................................22

3.1.5 A vision on Data Quality........................................................24

3.2 Research Goal...............................................................................................24

3.3 Research Model............................................................................................24

3.4 Research Questions......................................................................................25

3.4.1 Main questions........................................................................25

3.4.2 Sub questions for main question 1..........................................25

3.4.3 Sub questions for main question 2..........................................26

3.4.4 Sub questions for main question 3..........................................26

3.4.5 Sub questions for main question 4..........................................26

3.5 Concepts used...............................................................................................27

4. Technical Research Design..........................................................................................28

4.1 Research Material.........................................................................................28

4.2 Research Strategy.........................................................................................29

4.2.1 Strategy...................................................................................29

4.2.2 Reliability................................................................................29

4.2.3 Validity...................................................................................29

4.2.4 Scope.......................................................................................29

5. Research Execution......................................................................................................30

5.1 Correlation between data quality and maturity............................................30

5.1.1 Maturity, a brief history..........................................................30

5.1.2 Maturity levels........................................................................30

5.1.3 Process Areas..........................................................................31


5.1.4 Identifying relevant process areas...........................................32

5.1.5 Windesheim Data Quality Maturity Model............................37

5.1.6 Alternative views on data quality maturity.............................40

5.1.7 Conclusion..............................................................................43

5.2 Data Quality Attributes................................................................................43

5.2.1 Dimensions of data quality.....................................................43

5.2.2 Data Quality Dimensions Discussed.......................................45

5.2.3 WDQM Goals.........................................................................50

5.2.4 (Time)related dimensions.......................................................52

5.3 Business rules...............................................................................................53

5.3.1 Business rules, a definition.....................................................53

5.3.2 Study management..................................................................54

5.3.3 Business rule mining...............................................................55

5.4 Current data quality maturity level study management domain...................55

5.4.1 Interview results......................................................................56

5.4.2 Current Maturity.....................................................................56

5.4.3 Current data quality dimension’s attribute values..................57

5.4.4 Conclusion..............................................................................59

5.5 Required data quality maturity level study management domain................60

5.5.1 Workshop results....................................................................60

5.5.2 Discussion...............................................................................61

5.5.3 Initial Research Problem.........................................................61

5.5.4 A data quality maturity level three (Defined) organization....62

5.5.5 Level 4 (quantitatively managed) requirements.....................62

5.6 Growing from current to required maturity..................................................63

5.6.1 Gap analysis............................................................................63

5.6.2 Migration.................................................................................65

5.7 Concluding...................................................................................................68


5.7.1 Conclusion..............................................................................68

5.7.2 Recommendations...................................................................69

5.7.3 Stakeholder Value...................................................................70

5.7.4 Achieved Reliability and Validity..........................................70

5.7.5 Scientific Value and Innovativeness.......................................71

5.7.6 Generalisation.........................................................................71

5.7.7 Research Questions Answered................................................71

5.7.8 Recommendation on further research.....................................73

5.7.9 Reflection................................................................................74

6. Appendices...................................................................................................................75

6.1 Interview Report Windesheim Integration Team.........................................75

6.2 Interview Report WDQM Marlies van Steenbergen....................................77

6.3 Interview Report Data Quality in Education Th. J.G. Thiadens.................78

6.4 Interview WDQM dimensions Report Arjen de Graaf................................80

6.5 Interview report Current Data Quality Educator Gerrit Vissinga................81

6.6 Interview report Current Data Quality Educator Gert IJszenga...................83

6.7 Interview report Current Data Quality Educator Gert IJszenga Continued. 84

6.8 Interview report Current Data Quality Educator Klaas Haasjes..................86

6.9 Interview report Current Data Quality Educator Louis Klomp....................87

6.10 Interview report Current Data Quality Educator Viola van Drogen............89

6.11 Data Quality Workshop................................................................................91

6.12 Business rules according to the Windesheim Educational Standards..........95

6.13 Detailed Business Rules...............................................................................96

6.14 Project Flow.................................................................................................98

6.15 Literature......................................................................................................99

6.16 List of figures and tables............................................................................103

6.17 Glossary......................................................................................................103


2. Exploring data quality in higher education

2.1 Project Introduction

2.1.1 Windesheim’s Mission

Windesheim's mission statement: “As an institution in higher education in the Netherlands,

Windesheim offers a broad choice and is foremost a social venture. Windesheim is a community in

which active and knowledgeable individuals meet. Windesheim is an innovative knowledge and

expertise centre, challenging individuals to develop themselves towards valuable and self-confident

professionals.

Integration of three primary processes, education, research and social entrepreneurship results in

excellent opportunities for dispersion of knowledge.

Windesheim offers tailored education and supports individual study careers. Competences and

personal planning are the foundations for each individual student.

In the area of research and social entrepreneurship Windesheim distinguishes itself by the

implementation of knowledge exchange centers in Zwolle and participation in regional knowledge

networks3.”

3 Instellingsplan 2007 – 2012, Besluit nummer 441 College van Bestuur van Windesheim


2.1.2 Windesheim’s Information Technology

As indicated by figure 2, the Windesheim application landscape has become rather intertwined over

the years.

[Figure content: several dozen internal and external systems (among them ERP, HRM, Untis, Vubis, Blackboard and Decos) exchanging person, enrollment, schedule and financial data through database couplings, manual transfers and authorization links; the legend distinguishes internal systems, external parties and phased-out systems.]

Figure 02: the Windesheim application landscape (Windesheim, 2004)

The figure demonstrates that it is the interfaces between (clusters of) applications that cause

complexity. Almost every connection requires manual intervention; therefore every data transfer represents a delay

in business processes. To reduce integration complexity and increase business service levels, in 2005

the implementation of a service-oriented architecture was initiated. One of the drivers was that

Windesheim aims to become a near real-time organization. An example of this is given by the

enrollment process: as soon as students are enrolled for a study, access to all campus-wide and study-

related student information services is to be granted quickly. Today, this process takes days;

implementing real-time, event-driven communication patterns is believed to reduce processing time to

minutes, and perhaps mere seconds.
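The event-driven pattern referred to here can be sketched as a publish/subscribe mechanism: downstream services react the moment an enrollment event is published, instead of waiting for the next batch transfer. This is a minimal illustration, not Windesheim's actual integration architecture; all names (EventBus, StudentEnrolled, the subscriber actions) are hypothetical.

```python
# Minimal publish/subscribe sketch of event-driven enrollment.
# All names are illustrative assumptions, not real Windesheim interfaces.
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, DefaultDict, List

@dataclass
class StudentEnrolled:
    student_id: str
    study: str

class EventBus:
    def __init__(self) -> None:
        self._subscribers: DefaultDict[type, List[Callable]] = defaultdict(list)

    def subscribe(self, event_type: type, handler: Callable) -> None:
        self._subscribers[event_type].append(handler)

    def publish(self, event) -> list:
        # Every subscriber reacts as soon as the event is published,
        # instead of waiting for a nightly batch file.
        return [handler(event) for handler in self._subscribers[type(event)]]

bus = EventBus()
bus.subscribe(StudentEnrolled, lambda e: f"account created for {e.student_id}")
bus.subscribe(StudentEnrolled, lambda e: f"library access granted to {e.student_id}")

results = bus.publish(StudentEnrolled("s123456", "Informatica"))
```

In such a design, granting campus-wide access becomes a matter of subscribing each student information service to the enrollment event, reducing the latency from days to the time the handlers take to run.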

To design and implement the service-based interfaces, a System Integration task force has been installed.

This task force, currently employing three professionals, is part of the IT department, yet it is

governed by the Windesheim Information Manager.

[Figure content: the CIO directing both the IT department and Information Management, with the System Integration task force positioned between them.]


Figure 03: IT service department and system integration organization

2.2 Business Problem description

2.2.1 Indications

In the past, the system integration task force encountered integration problems, caused by unexpected

and puzzling values in data fields. Triggered by these observations, in 2007 the quality of the database

of one information system was investigated4. The investigation revealed that in some cases values of fields

could not be explained, or were used to indicate specific situations. Business rules defining these

situations and explaining the odd values were not documented. As a result, accounting of the costs of

facilities delivered was uncertain at best.

Upon completion, operations had corrected the issues found and Windesheim was advised to

document business rules, formalize data management accordingly and implement a closed-loop data

quality process.

Surprisingly, shortly after this result was reached, the integration team encountered the same errors all

over again. And in addition to these existing issues, every new data source added to the integration

architecture introduced new and unexpected data quality problems5. Issues found today include (but are

not limited to):

- Enrolment of students results in duplicate accounts;

- Painful mistakes like sending notifications to deceased students;

- Due to database corruption, management reports are rendered useless;

- Sometimes fields contain text strings stating that ‘Debbie has to solve this problem’;

- Names of students are completely missing, student addresses are incorrect, and information is entered in wrong fields;

- Location (room) numbers are missing or contain special, unexpected codes;

- Data is outdated, or is valid in / refers to different time periods between information systems;

- In at least one instance, lack of data quality caused a class to be scheduled in a staircase6.
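Checks for defects of this kind can in principle be automated. The sketch below is a minimal illustration; the field names and rules are assumptions made for the example, not the actual schema of any Windesheim system.

```python
# Hedged sketch: automated audit for the kinds of defects listed above
# (duplicate accounts, missing names, free-text markers in data fields).
# Field names and rules are ASSUMED, not taken from a real schema.
from collections import Counter

def audit(records: list[dict]) -> list[str]:
    findings = []
    # Duplicate accounts: the same student_id occurring more than once.
    ids = Counter(r["student_id"] for r in records)
    findings += [f"duplicate account: {i}" for i, n in ids.items() if n > 1]
    for r in records:
        # Missing names.
        if not r.get("name"):
            findings.append(f"missing name: {r['student_id']}")
        # Free-text problem markers hiding in a structured field.
        if "solve this" in str(r.get("room", "")).lower():
            findings.append(f"free-text marker in room field: {r['student_id']}")
    return findings

sample = [
    {"student_id": "s1", "name": "A. Jansen", "room": "C1.23"},
    {"student_id": "s1", "name": "A. Jansen", "room": "C1.23"},   # duplicate
    {"student_id": "s2", "name": "", "room": "Debbie has to solve this"},
]
findings = audit(sample)
```

Running such an audit periodically would produce exactly the kind of data quality monitoring report recommended for reaching maturity level two.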

2.2.2 Consequences

Consequences of errors in data may be severe. In Enterprise Knowledge Management, David Loshin

links the election problems in 2000 in Florida, USA directly to poor data quality (Loshin, Enterprise

Knowledge Management, the data quality approach, 2001). Loshin identifies operational, tactical

and strategic impact domains suffering from poor data quality.

In the operational domain, costs are associated with detection, correction, rollback, rework, and

prevention of errors, warranty, reduction of business and loss of customers (Loshin, Enterprise

Knowledge Management, the data quality approach, 2001).

In the tactical and strategic domain, decisions may be delayed or based on external or alternative data

sources, hampering change processes. Business opportunities may be lost, business units get

4 Adviesbrief gegevenskwaliteit database facility office 2007

5 See appendix 1: Interview report system integration team Windesheim 2009

6 Fact Finding Roostersysteem Windesheim.doc, versie 1.2, 26 september 2007


misaligned, and management loses confidence in its management information systems (Loshin,

Enterprise Knowledge Management, the data quality approach, 2001).

2.2.3 Business Problem

The initial problem, triggering this research, is that a lack of data quality threatens the implementation

of a service oriented architecture. The main business problem is that poor data quality inhibits

Windesheim’s ability to become a near real-time organization.

Looking further, and along the lines of David Loshin's observations, in both the operational and the

tactical domain, areas where poor data quality has an impact on Windesheim's business goals may be

identified:

Operational domain:

1. Today, students expect any organization encountered to be a real-time organization. Banking,

insurance companies, web shops, they all offer near zero-latency business services. So why can’t

Windesheim? Not being able to live up to modern expectations may cause Windesheim to obtain

a reduced score on rankings published by the HBO-raad7. A reduced score may cause students to

decide to study elsewhere (loss of customers), resulting in a decline in income.

2. Currently, batch files transferring data between applications are checked manually on a daily

basis. And yet, from time to time errors are propagated between applications. Detection,

correction, rollback and rework associated with poor data quality cause serious overhead,

reducing the organization’s efficiency.

3. Poor data quality is a cause for mistakes in Windesheim’s external relations. Some mistakes are

more painful than others, yet all of these mistakes cause damage to Windesheim’s image of being

a trustworthy knowledge partner in the region. This may be a cause of business opportunities

being lost. And even where this is less the case: as an institution largely funded by public money,

Windesheim has a responsibility to be precise and correct in interacting with customers,

constituents and society in general.

Tactical domain:

4. Business intelligence retrieved from questionable data is uncertain at best. As a consequence,

Business Activity Monitoring is hampered, which in turn means that the margin of error in daily

processes is unknown. It also means that monitoring progress on achieving business targets is

impaired.

2.3 Cause analysis

The initial research in 20078 had a narrow scope in exploring data quality. The research was

confined to exploring data quality in only one application. However, the application observed

supported (and still supports) facility management, in education a very relevant secondary process,

directly supporting and influencing education itself. And secondly, having a rather narrow scope, the

research dug very deep into the problem, extracting interesting conclusions from the application’s

database. Based on this research, technical and process design causes were identified. However,

organizational causes remained untouched. Therefore, in this paragraph, technical/functional and

7 HBO-Monitor, http://www.hbo-raad.nl/onderwijs/kwaliteit

8 Adviesbrief gegevenskwaliteit database facility office 2007

15-Apr-23 F. Boterenbrood Page 14


process design causes found are mentioned, and organizational causes are explored in more depth. At the end of this section, a summary is presented.

2.3.1 Technical / functional causes

The first observation was that the COTS9 application used for supporting facility management was indeed a complicated one. It was found that, to implement specific requirements, special database fields were used for which the application offered no input checks. The content of these fields therefore depended on input being checked for correctness manually. In some cases those fields were used to signal special situations, i.e. they were overloaded (Loshin, Enterprise Knowledge Management, the data quality approach, 2001). In other cases, values were missing or inexplicable. The investigation revealed that the database structure of the application was not fully utilized.

As a result, in many cases checks on correctness and consistency were not present, allowing errors in input data to exist. Not only did manual input cause flaws in data quality; processing batch files received from adjacent applications introduced errors as well.
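The missing input checks described above can, in principle, be delegated to the database itself. The sketch below is purely illustrative (the table, fields and allowed values are hypothetical, not the facility management application's actual schema): declarative constraints reject an overloaded 'magic' value at the point of entry, instead of relying on manual checking.

```python
import sqlite3

# A minimal sketch of letting the database guard correctness instead of
# relying on manual checks. Table and constraints are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE reservation (
        room      TEXT NOT NULL,
        capacity  INTEGER NOT NULL CHECK (capacity > 0),
        status    TEXT NOT NULL CHECK (status IN ('planned', 'confirmed', 'cancelled'))
    )
""")

conn.execute("INSERT INTO reservation VALUES ('A1.23', 30, 'planned')")  # accepted

try:
    # An overloaded 'magic' value, of the kind found in the research, is now rejected.
    conn.execute("INSERT INTO reservation VALUES ('A1.24', -1, 'see remarks!')")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

The same idea generalizes to NOT NULL and foreign key constraints: the database management system, rather than specialized personnel, becomes the guardian of data quality.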

In the Windesheim application landscape, business objects have different names and formats in different applications. A course, for instance, is named course, module, (variant) onderwijs eenheid, or vak. In various applications, data with respect to a course is entered, enriched, updated and transferred to the next application. The dispersed nature of the underlying information landscape obstructs the actual view on the current status of a course (Windesheim, 2004).

2.3.2 Process design causes

During the initial research it was found that operations (functioneel beheer) had a very narrow view on its scope. Instead of using the application's standard reporting facilities, some reports were created using self-made front-end applications, compensating for (correcting) errors in data. In storing and processing data, business rules were known and applied, yet not documented. To prevent input errors, only specialized personnel were allowed to perform certain tasks.

In general, data management was found to be characterized by a departmental view, lacking a more holistic view on (the role of data in) the Windesheim business process. Within the boundaries of the individual department, measures were taken to compensate for the lack of data quality, hiding the issue from local management: 'My data are OK'. As a result, technical issues are not dealt with, since to management they are invisible.

2.3.3 Organizational causes

Why is it that, when research reveals that bills sent for facilities delivered are uncertain at best, the news is seemingly accepted in stoic fashion? Does Windesheim management have a disregard for accountability? That is not very likely. To understand Windesheim's position on information processing, a historical view is needed.

In 1986, Windesheim university started as a merger of 12 regional institutions in higher education (Broers, 2007). At the time, in order to gain support for the merger, an agreement was made that management of faculties and facilities was to be decentralized (Broers, 2007). It took a management crisis in 1992 for the new institution to realize that the benefits of a merger could be harvested only if old individual values were replaced by new, common goals. A more centralized model was introduced in

9 Commercial Off The Shelf


1995; staff and technological support were organized in centralized service centers. In the years that followed, the walls that divided the once so mighty and independent faculties were steadily reduced, while the independence (and size) of the service centers grew (Broers, 2007).

2.3.4 Growing pains

In 1979 Nolan argued that an organization and its use of information have to grow in maturity (Nolan, March-April 1979). Nolan defined six stages (initially four) of maturity. In his vision, no stage could be skipped, and in every stage a predictable type of crisis would signal the transition to the next stage. In a more recent publication, Architecture and Demassing took the place of the original stages 5 and 6 (Data Administration and Maturity) (Tan, 2003).

Figure 04: Nolan's stage model. In the Data Processing Era (technology driven, focus on costs) the stages are initiation, contagion and control; in the Information Era (information driven, focus on effectiveness) they are integration, architecture and demassing. The stages run from a limited number of stand-alone information systems, via an increasing number of systems with simple (hard-wired) connections and clusters of systems organized by task, characterized by incompatible technologies and data formats, to a redesign of systems aimed at corporate-wide integration, the development of systems supporting external partners, and finally complete integration of the internal and external systems of business units and partners.

When we take a closer look, it can be argued that the Windesheim faculties are currently well on their way into the integration phase, or in a broader perspective, that higher education is moving from the data processing era to the information era. This can be observed in institutions striving for integration, developing shared minors, i.e. education crossing the borders of a faculty.

However, the supporting service centers are still lingering in the control phase, as indicated by an ongoing isolated view on information systems10. The Windesheim application landscape is still very task oriented, with separate systems supporting individual business functions. The focus lies on supporting individual tasks, not on the integrated business process.

With faculties tearing down their walls and service centers staying put, the organization is in danger of losing its balance. It is foreseeable that service centers will have to make the transition from the control stage to the integration stage as well. In 1992, Richard P. Marble applied Nolan's stage model to the transition East German industry went through, and described the transition as follows:

“… management realizes a need to emphasize central planning. The attention that the

computer resource finally receives leads to a change in management thinking – now

regarding their task as one of managing the data resources of the organization, not the

computer resources.” (Marble, 1992)

With this transition, a firm crisis is to be expected. This should not be seen as a loss of control, but rather as a change of paradigms. During this change of paradigms, organizations take a step back and

10 As shown in Figure 2: the Windesheim application landscape (Windesheim, 2004)


have to rethink many of their existing strategies and principles. This backwards movement is called a

discontinuity (Zee, 2001).

In figure 5 Van der Zee extends the Nolan model with a third era (the network era) accompanied by

two crises: a technological discontinuity and an organizational discontinuity.

Figure 05: Eras and discontinuities (Zee, 2001)

The technological discontinuity is observed by English:

“… many CIOs, by and large are NOT Chief Information Officers — but Chief Information Technology

Officers. Falling into the techno-trap of believing their job was to put in place the information technology

infrastructure, their job was then to build or acquire and deploy hardware, networks, and applications,

period. Few CIOs saw and understood the Information-Age paradigm. … The Information-Age purpose

of the CIO has always been to deliver Quality, Just-In-Time Information” (English, 2009).

Currently, Windesheim is exploring Nolan’s phase 4, and with it, the technological discontinuity crisis

must be overcome. In this discontinuity, van der Zee (2001) places the focal point on both technology

and organizational changes, such as the emergence of IT-governance, with the need for ICT to be

present in the boardroom (represented by the CIO) (Zee, 2001).

2.3.5 Perspective

It seems that at Windesheim, the perceived data quality problem is not merely a technological issue, nor is it an issue of just getting the processes right. If it were, the earlier data quality project would have had more lasting results. It seems to be a case of aiding Windesheim's service centers through the technological discontinuity.

2.3.6 Past, current and future situation

Are the changes Windesheim is currently experiencing a phase that will quickly blow over, or are they part of a greater scheme? Will Windesheim continue to grow in maturity, or is it likely that the organization will fall back into the control stage again? To find out what is going on, not only the recent past of Windesheim is of interest; the whole picture of higher education in Europe needs attention.


With Europe not divided by national borders and Latin being the language of choice, medieval universities attracted Wanderstudenten from all over Europe:

‘Until the eighteenth century the European university was an European institution, reflecting European

values of intellectual freedom and of a borderless community’ (Vught & Huisman, 2009).

This all changed when territorial states arose, installing national frameworks. From the eighteenth century up until the dawn of the twenty-first century, national borders and policies effectively resulted in 'national science'. It was not until the 1980s that the first EU policy initiatives appeared. In the second half of the 1990s, a myriad of programmes and declarations were spawned, aimed at 'making Europe the most competitive and dynamic knowledge based economy in the world' (European Council, Lisbon, 2000) and at creating 'the European Higher Education Area' (Sorbonne Declaration 1998 & Bologna Declaration 1999). Currently, 46 European nations are involved in this process, including the Netherlands (Vught & Huisman, 2009).

This process is causing a landslide in the area of hogescholen (universities of applied science). The clear-cut distinction between hogescholen and universiteiten has begun to blur since hogescholen started to offer both Bachelor and Master degrees and started conducting scientific research11, where previously only Bachelor degrees were offered and scientific research was strictly reserved for universities. More importantly, a search for transparency was spawned:

‘This, coinciding with increasing pressure from professional organizations and external regulatory bodies to control what was being taught … led towards the standardization of curricula’. (Vught & Huisman, 2009)

In the future, it is to be expected that a generally defined common (curriculum and study logistics)

framework will both ensure transparency and yet acknowledge diversification (Vught & Huisman,

2009).

How does all this translate to Windesheim? Porter's Five Forces model may help find an answer to that question. The Five Forces model is an outside-in business unit strategy tool used to analyze the attractiveness (value...) of an industry structure (Porter & Millar, 1985). When we project this model on Windesheim, we find the following forces shaping an institution like Windesheim:

First of all, institutions compete with each other for the attention of the student (Rivalry among Existing Firms). This is shown by the constant attention institutions pay to quality statistics released by the HBO-raad.

Secondly, the student has a great deal of influence (Bargaining Power of Buyers). His opinion about the quality of education is constantly measured and published, and in response courses and schedules are revised. As a result of European and national developments, students are highly mobile, strongly increasing their bargaining power.

Thirdly, commercial 'substitutes' do exist. Commercial organizations offer certificates that rival recognized titles. In IT, for instance, employees holding Microsoft certificates are in high demand, rivaling employees holding bachelor or master degrees.

11 By means of the lectorate. (HBO-raad Lectorenplatform, 2006)


And finally, potential entrants (like DNV-CIBIT) fill niche markets. Although the titles they offer are internationally recognized, the courses they offer do not meet the criteria for governmental approval and are therefore not subsidized.12

Windesheim, being a university of professional education, is in the midst of this turmoil. Windesheim faculties are aligning themselves with European strategies: implementing a Minor/Major educational model, jointly developing minors and even offering those minors to students of other institutions (trying to 'lure' them away), and for some minors introducing English as the general language used in classes. It seems that the Wanderstudent is reinstated, but this time in unmatched masses, forcing the institution to synchronize education in an international setting while trying to be as attractive as possible.

Pan-European developments force institutions to prepare for intimate inter-institutional cooperation.

In this volatile environment, Windesheim does not have the luxury not to grow in maturity.

2.3.7 Summary

In this chapter, what has been found is that:

Windesheim strives to become a near zero latency organization;

Surprising errors hamper technical initiatives to implement near zero latency business process

technologies;

These errors are caused by poor data quality;

Closer examination reveals a serious business impact of poor data quality, which is defined by

student (customer) dissatisfaction, inefficient process execution, loss of image and loss of control;

Poor data quality is caused by applications not checking input values, and information objects

having different values and definitions in different business domains;

Which in turn is caused by a departmental view on data instead of a more holistic business

process wide view on information;

International developments force Windesheim to grow in maturity, migrating from the data

processing era to the information era;

As part of this migration, a natural crisis, the technological discontinuity has to be overcome;

In this crisis, the organization is to develop a holistic view on information.

2.4 Research Problem

In the past, causes of data quality issues have been identified and countermeasures described at the technical, functional and process design level. This vision needs to be extended by exploring the relation between structures defining maturity and data quality within the context of a Dutch institution of higher education, in particular Windesheim, and more precisely the Windesheim service centers. The focus here is on crossing the border between the control and integration stages in Nolan's stage model (Nolan, March-April 1979) (Tan, 2003), overcoming the technological discontinuity (Zee, 2001).

Extending the technical/functional vision on data quality raises a myriad of questions. What impact on data quality will overcoming the technological discontinuity have? Will a growth in maturity be enough to solve the data quality issues identified? What exactly does 'growing in

12 Interestingly, the fifth force, Threat of New Entrants, is rather unknown in education. The emergence of new institutions is highly regulated, and care is taken that new institutions do not compete with existing institutions in the region.


maturity' mean? What are the consequences for the organization of Windesheim? Do the consequences found align with Windesheim's strategic developments? What will the response of Windesheim's management be? What arguments will spawn interest in improving data quality? Is there a danger of falling back into the comfort of the data processing era?

By extending the research beyond the technical and functional domain, the research enters the domain of information as a subject of organizational and political forces, and of using information as a strategic instrument. It has become a problem of strategic alignment. The research problem may therefore be summarized as:

At Windesheim, what defines the border between the control and integration stages? What are the positive and negative correlations between structures defining organizational maturity and attributes defining data quality, enabling Windesheim to become a near zero-latency organization?

2.5 Stakeholder Analysis

Stakeholder: Board
Role: to set and guard Windesheim's strategy.
Concern: control of finance, quality of the institution and strategy; alignment of the institution with national and international developments.
Relation to the problem: loss of image and loss of students will hint at loss of control; inefficient business processes impose a financial drain on the organization; changing from a localized to an integral view on data may be a cause of concern.

Stakeholder: Information Manager
Role: to implement and guard a coherent view on information.
Concern: correctness of data.
Relation to the problem: poor data quality may cause inefficient business processes.

Stakeholder: Science
Role: to extend the human knowledge base.
Concern: validity and reliability of knowledge.
Relation to the problem: new knowledge may be discovered, existing theories validated.

Stakeholder: Security Manager
Role: to prevent unauthorised disclosure or manipulation of information.
Concern: availability, integrity and confidentiality of data.
Relation to the problem: poor data quality obstructs integrity and availability.

Stakeholder: CIO
Role: to safeguard undisturbed and reliable information delivery in business processes.
Concern: secure and correct use of data; enabling future change.
Relation to the problem: poor data quality may cause inefficient business processes, loss of image and loss of students.

Stakeholder: Management
Role: to implement change and control daily business processes.
Concern: budgeting and effectiveness (Baida, 2002); reliability of data.
Relation to the problem: poor data quality cripples effective, reliable management; changing from a localized to an integral view on data may be a cause of concern.

Stakeholder: Students
Role: to be educated.
Concern: findability, security, reliability, availability, timeliness.
Relation to the problem: poor data quality causes student names to be misspelled or missing altogether, resulting in loss of trust.

Stakeholder: Staff
Role: to educate.
Concern: security, reliability, timeliness.
Relation to the problem: poor data quality results in students complaining, and complicates registration & planning processes.

Stakeholder: Operations
Role: to ensure operational IT.
Concern: manageability and correctness of data.
Relation to the problem: poor data quality causes applications to abort and time to be spent on debugging.

Stakeholder: Functional Support
Role: to ensure operational applications.
Concern: correctness of data.
Relation to the problem: poor data quality makes it necessary to manually identify, correct and roll back errors daily.

Stakeholder: System Integration
Role: to ensure near real-time service-based system integration.
Concern: correctness and availability of data.
Relation to the problem: poor data quality causes application interfaces to abort and time to be spent on debugging.

Table 01: Stakeholder analysis

For management (Board, CIO, general management, information management) solving the data

quality problem will be based on a cost/benefit assessment. Operations and Functional Support will be

willing to participate in solving the problem, if care is being taken where personal interests are

involved.

Figure 6 presents a graphical representation of stakeholders and their relation to the proposed data

quality research project.

Figure 06: Project stakeholders (Board, Management, CIO, Information Manager, Security Manager, Science, System Integration, Operations, Functional Support, Staff, Student)

Stakeholders being committed to this project are the CIO, Information Manager and Science.

The CIO and Information Manager are financier and constituent of this research respectively.


(To be) involved in the project are (members of) functional support, operations, system integration and the security manager, since the results of this research are likely to be of direct interest to these stakeholders and because of the specific knowledge within these groups.

Management holds a somewhat special place: IT management is likely to be involved, other management may be affected. Other stakeholders affected by any advice resulting from this research are students, staff and board.

2.6 Project Relevance

2.6.1 Stakeholder Relevance

Relevance for stakeholders is discussed in the previous section.

2.6.2 Business Relevance

Currently, education at Windesheim is embarking on a journey towards a higher level of maturity, and the service centers have to join this movement. However, the destination of this journey is not clear to everyone, and for others the road ahead is unknown. This research will shed light on this migration by offering knowledge on what Windesheim might look like when data processing is replaced by a more integral view on information. In the long run, this paradigm shift will enable Windesheim to stay in sync with (inter)national processes. In the short term it will increase efficiency, student satisfaction and management control, and prevent loss of image.

2.6.3 Relevance to Science

In the field of data quality, many publications, services and even tools are available. Publications look at data quality from a technical point of view, suggesting valid input checks and database constraints as a solution. Business processes are recognized to be part of the equation too, and efforts are made to point out that processes need to be implemented as a closed loop, automatically correcting errors (Batini & Scannapieco, 1998) (Loshin, Enterprise Knowledge Management, the data quality approach, 2001) (McGilvray, 2008) (Lee, Pipino, Funk, & Wang, 2006). However, in the field of education, academic research binding (loss of) business data quality to business maturity has not been identified.

The US National Center for Education Statistics has set up a data quality task force, offering advice to staff members of educational institutions on creating a Culture of Data Quality (Data Quality Task Force, 2004). This publication is aimed at the field of statistics and its underlying research is unknown, yet the recommendations presented in the report may prove useful.

One study dealing with data quality in e-business has been found: Data Quality and Data Alignment in E-business (Vermeer, 2001). This research defined a context for data quality and established a formal relation between data quality in EDI messages and business process quality. Finally, it presented a method for establishing data quality in business chains: DAL (Data Alignment through Logistics) (Vermeer, 2001). The definition of the context of data quality and its relation to business process quality delivers strong support for the research at hand.


3. Conceptual Research Design

3.1 Theoretical approach and focus

3.1.1 Focus

The field to explore as defined by the Research Problem is broad. This research will focus on

identifying the relation between organizational maturity and the required level of data quality, as this

has been identified as the root cause of the business problem at hand.

3.1.2 Maturity revisited

Before enthusiastically embarking on a journey into the unknown, can additional proof be obtained,

pointing towards a link between data quality and organizational maturity?

In “Data Quality and Data Alignment in E-business” Ir. Bas H.P.J. Vermeer (2001) addressed issues

resulting from distributed data management:

“…..two problems arise in a multiple database situation: a translation problem and a distribution

problem.

The translation problem arises because the same fact may be differently structured at different

locations. Therefore, schema translation is necessary to map the structure of the source schema to the

structure of the manufacturer’s schema. This results in a mapping schema between the source schema

and the receiver’s schema that is used every time a fact in the source database is updated.

The distribution problem arises because each fact update is first translated and then transported over a

network to a limited set of users, where it is finally interpreted and stored in the receiver’s database.

During translation and interpretation, mapping errors may occur, which results in loss of data quality.

During transportation, the data may get delayed, damaged, or delivered to the wrong recipient,

resulting in inconsistencies among different locations.” (Vermeer, 2001).

Thus, having a localized view on data, distributing and transforming data objects throughout an

application landscape, introduces a translation and a distribution problem.
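Vermeer's translation problem can be made concrete with a small sketch. All names and schemas below are invented for illustration: a mapping schema relates a source representation of a fact to the receiver's schema, and any field without an agreed mapping is silently lost, i.e. data quality degrades.

```python
# Illustrative only: the same fact (a course) structured differently
# in two hypothetical applications, with an explicit mapping schema.
SOURCE_TO_RECEIVER = {
    "vak_code": "course_id",
    "vak_naam": "course_name",
    # 'ects_punten' has no agreed mapping yet: facts using it are lost.
}

def translate(source_record: dict) -> tuple[dict, list[str]]:
    """Map a source record onto the receiver's schema.
    Returns the translated record plus the fields that could not be
    mapped, i.e. the potential data quality loss Vermeer describes."""
    translated, unmapped = {}, []
    for field, value in source_record.items():
        if field in SOURCE_TO_RECEIVER:
            translated[SOURCE_TO_RECEIVER[field]] = value
        else:
            unmapped.append(field)
    return translated, unmapped
```

In a real landscape this mapping schema is applied on every fact update; the `unmapped` list makes visible where the translation, and thus data quality, is incomplete.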

Then why not develop a single, integrated view on data? Why not just implement an ERP package? Dale L. Goodhue et al. (1992) question the common belief that data integration always results in positive benefits for any organization. It was shown that creating one integrated solution is simply not feasible in many organizations. Data integration may have positive effects in terms of improved efficiency where subunits are highly aligned, yet in unstable, volatile environments striving for data integration will not result in tangible benefits:

“…This model suggests that the benefits of data integration will outweigh costs only under certain

circumstances, and probably not for all the data the organization uses. Therefore, MIS researchers and

practitioners should consider the need for better conceptualization and methods for implementing

‘partial integration’ in organizations” (Goodhue, Wybo, & Kirsch, sept 1992).


Conclusions:

In terms of data quality, it is best if only one single view on corporate-wide data definitions exists;

Only organizations that are able to successfully align their subunits are likely to achieve business benefits from data integration;

Even with alignment, complete data integration is not likely to be achieved.

Even without striving for data integration, the research done by Vermeer and Goodhue et al. hints that aligning business units (i.e. observing the whole business value chain instead of localized departmental processes) is an important prerequisite for achieving improved data quality.

3.1.3 A vision on Maturity.

Maturity may be defined by stages (Nolan, March-April 1979). Yet more recent theories tend to embrace levels as the measure of maturity: BPMM (Object Management Group, 2008), CMMI (Software Engineering Institute, 2009), ISO 15504 / SPICE (Hendriks, 2000), where each level is defined by certain structures. In this research, maturity is defined as an attribute of an organizational process, organized in maturity levels, each defined by certain structures being in place, referred to as maturity structures.

3.1.4 What is data quality?

Indeed, what is data quality? Even though multiple definitions and standards exist for quality in general, this is less the case for data quality. It seems as if the idea that "the computer never lies" still holds some ground. Even T. William (Bill) Olle, in "The Codasyl Approach to Data Base Management" (Olle, 1978), did not make any remarks regarding the relationship between database management and data quality, which is remarkable, since a database management system may be regarded as the technical guardian of data quality!

Data, business rules and business processes are closely linked (Besouw, 2009). Over about four decades, in most businesses data and business rules have been materialized in the form of automated information systems. Those information systems aim to reflect reality as closely as possible. But what we find in the real world is that reality is in constant flux and information systems are trying to cope. There is a natural gap in time between a situation in reality and the registration of that situation in an information system. The problem this time lapse introduces was unwittingly recognized by T. William Olle with respect to the book he had written:

“The time factor is in itself a problem because the CODASYL specifications are changing inexorably as

the years go by. The book reflects as accurately as possible the most recently published specifications at

the time of writing.” (Olle, 1978)

What is true for the written word might be true for information systems too. The struggle of

information systems to stay aligned with reality is one of the topics in ‘De (on)betrouwbaarheid van

informatie’13 (Bakker, 2006). Take for instance the dynamics of the Dutch population:

“According to the CBS, in October 2004 the Netherlands housed 16.258.032 citizen, of which 8.045.914

male and 8.212.118 female……But what makes us believe that we are capable to assess the number of

13 The (un)reliability of information


citizen with this accuracy? In the year 2000 for instance, 206.619 people were born and 140.527

deceased. At what moment in time was that exact amount of citizen determined? Wait an hour and the

number has changed! “ (Bakker, 2006)14

Bakker not only demonstrated that it is impossible to make a headcount in a dynamically changing system with a high degree of accuracy, he also argued that, in fact, no data is ever exactly correct. When, for instance, one sets off to measure the coastline of Great Britain, one will find that using precise measurements results in a considerably longer coastline than the use of coarse methods (Bakker, 2006). And then again, every measurement has a certain degree of uncertainty, a measurement error; it is simply impossible to measure a physical object exactly (Bakker, 2006). Therefore, it is important to establish a threshold defining the acceptable degree of uncertainty.

To establish such a threshold and guard compliance with it, the Data Management Association introduces the Data Quality Management function:

“Data Quality Management – Planning, implementation and control activities that apply quality

management techniques to measure, assess, improve and ensure the fitness of data for use.” (Mosley,

2008)

This definition points the way to the right threshold: data should be fit for use. Arvix, a Dutch company dedicated to the improvement of data quality, seems to agree: “The quality (of data) is closely related to its use” (Arvix, 2009). In addition, Frans Besouw translates ‘fit for use’ into the ability of data to support business rules (Besouw, 2009).

In the vision of Arvix, data quality reveals the capability of data to be successfully utilized over a prolonged period of time (Arvix, 2009). Apparently, fit for use is a measure that is likely to change over time, as business rules evolve. An example can be found in banking. Two decades ago, it was regarded as acceptable for a bank to send its customers an account transaction overview once a week; the most recent transactions on that overview, including the account total shown, were about half a week old. A decade ago, private banking customers were enabled to monitor all transactions on-line, and a delay in actuality of no more than one day was seen as acceptable. Today, customers are able to monitor their accounts in real time. In the last ten years, the actuality of information that is perceived to be fit for use in private banking has shrunk from days to minutes.
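This evolving actuality threshold can be made concrete in a small sketch. The eras, thresholds and helper function below are hypothetical illustrations of the banking example above, not part of the original research:

```python
from datetime import datetime, timedelta

# Hypothetical actuality thresholds mirroring the banking example: what counts
# as "fit for use" has tightened from days (weekly overviews) to minutes.
ACTUALITY_THRESHOLDS = {
    1990: timedelta(days=4),     # weekly overview, transactions days old
    2000: timedelta(days=1),     # on-line banking, at most a day behind
    2010: timedelta(minutes=5),  # near real-time account monitoring
}

def is_fit_for_use(record_time: datetime, era: int, now: datetime) -> bool:
    """Return True if the record is recent enough for the era's business rules."""
    return now - record_time <= ACTUALITY_THRESHOLDS[era]

now = datetime(2010, 3, 10, 12, 0)
tx = now - timedelta(minutes=2)                              # two-minute-old transaction
print(is_fit_for_use(tx, 2010, now))                         # fine by 2010 standards
print(is_fit_for_use(now - timedelta(hours=6), 2010, now))   # six hours old: no longer fit
```

The same six-hour-old record would still have passed the hypothetical threshold of the year 2000, which is exactly the point: the data is unchanged, but the business rule has moved.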

Quality can be measured. ISO 9126 offers a standard for the evaluation of software quality; an extension of the ISO 9126 quality standard is the Quint quality model (Zeist, Hendriks, Paulussen, & Trieneken, 1996). However, these quality standards are aimed at measuring integrated information system quality. To specifically target data quality in a given situation, in “Kwaliteit van softwareprodukten, Praktijkervaringen met een kwaliteitsmodel”15 (Zeist, Hendriks, Paulussen, & Trieneken, 1996) the already extended ISO model was extended even further by adding two new quality attributes to the Quint model: Database Accuracy and Database Actuality. Verreck, de Graaf and van der Sanden even express the quality of data in terms of attributes: they propose to define quality as a function of Reliability and Relevance, Q = R², and redefine this as ‘lasting usability’ (Verreck, Graaf, & Sanden, 2005).
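As an aside, the Q = R² formula of Verreck et al. can be read as a simple product of two scores. The 0-to-1 scale assumed in this sketch is an illustration, not taken from their work:

```python
def quality(reliability: float, relevance: float) -> float:
    """Q = R x R: quality as the product of Reliability and Relevance
    (after Verreck, de Graaf & van der Sanden); both factors assumed in 0..1."""
    return reliability * relevance

# Highly reliable but barely relevant data still scores low overall:
print(round(quality(0.95, 0.20), 2))  # -> 0.19
```

The multiplicative form captures that neither factor can compensate for the other: data that is perfectly reliable but irrelevant is not lastingly usable, and vice versa.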

14 Translated from Dutch

15 Quality of software products, hands-on experiences with a quality model


3.1.5 A vision on Data Quality

We started off with the discovery that many problems in Windesheim’s IT were caused by poor data

quality. In many publications, data quality is treated as being purely a technological issue. What we

found was that this vision needs to be extended by exploring the relation between structures defining

maturity and data quality. Now we have discovered that data quality is not an absolute value, but a

question of defining the right threshold:

- Data is inaccurate by nature;
- When data inaccuracy exceeds a certain threshold, quality becomes flawed;
- The threshold is defined by data being fit for use;
- In general, fit for use can be seen as the ability of data to support business rules;
- Fit for use can be operationalized by means of quality attributes;
- For every specific situation, appropriate attributes are to be defined;
- For these attributes, and therefore for the data quality threshold, the values perceived as acceptable evolve over time, as business rules evolve.

3.2 Research Goal

The goal of this research is to contribute to the improvement of data quality at Windesheim by analyzing the gap between the current and required data quality threshold and the corresponding current and required maturity, and by identifying positive and negative correlations between data quality attributes and structures defining maturity.

3.3 Research Model

[Diagram: theories on data quality, maturity of organizations, maturity of business processes and ensuring quality in business processes, backed by an external benchmark (a), feed the maturity and data quality instrument (b). The stakeholders involved populate this instrument into a view on the required threshold and maturity (c). An assessment (d) yields the current data quality threshold and current maturity (e); confronting the two produces the advice (f).]

Figure 07: Research Model


An analysis of theories on data quality and maturity, backed by exploring an external implementation

(external benchmark) (a) results in a conceptual model (maturity and data quality instrument) (b),

which will be discussed by an expert group of stakeholders involved16. This will lead to a populated

conceptual model (view on required threshold and maturity) (c). An assessment of the current data

quality threshold and current maturity (d) results in a description of the current situation (e).

Confronting the validated view with the description of the current situation leads to a Gap Analysis

(f).

3.4 Research Questions

The main research questions are found by decomposition of the research model.

3.4.1 Main questions

1. Observing theories on maturity and data quality, and external benchmarks, what positive and negative correlations between structures defining maturity and data quality attributes may be found?

2. What values of data quality attributes will define the required data quality threshold, and therefore the required maturity structures, at Windesheim?

3. What are the current organizational maturity and the current values of data quality attributes?

4. Finally, the central research question: what is the gap between the current maturity structures & data quality threshold and the required maturity structures & data quality threshold, in the light of enabling Windesheim to become a near zero-latency organization?

Sub questions are found by examining the chart of concepts used, described in the next paragraph. To

avoid dispersion of research questions, the sub questions are described first, and concepts used later.

3.4.2 Sub questions for main question 1

By both decomposing the main question and interpreting the highlighted part of the chart of concepts used (next paragraph), the following sub questions are found:

1. What structures define maturity?

a. What levels of maturity do exist?

b. What maturity structures in the field of organizational structure, process, technology,

information and staff describe each level?

2. In higher education, what positive and negative correlations between maturity and data quality

may be found?

a. For this research, what is the relevant set of business rules?

b. How will this set of business rules evolve in time?

c. What data quality attributes are relevant for these business rules?

d. What values of data quality attributes correlate with each level of maturity?

e. What do process quality theories describe about positive correlations between quality and

maturity?

f. What do process quality theories describe about negative correlations between quality and

maturity?

g. Are those observations consistent?

16 As identified in stakeholder analysis: figure 6


3.4.3 Sub questions for main question 2

1. To support the business rules identified earlier, what values should data quality attributes have?

2. What level of maturity is required to enable those data quality attribute values?

3. What organizational structure, process, technology, information and staff criteria define the

maturity found?

3.4.4 Sub questions for main question 3

No further decomposition is required.

3.4.5 Sub questions for main question 4

1. What is the gap between the current and required organizational structure, process, technology,

information and staff criteria?

2. What conclusions and recommendations may be derived from this gap?


3.5 Concepts used

[Diagram: a correlation is sought between Maturity and Data Quality. Maturity is described by Maturity Levels, drawing on organizational maturity theories and process maturity theories, and is defined by Structure criteria, Staff criteria and Systems, the latter comprising Process, Technology and Information criteria. Data Quality is described by Data Quality Attribute Values, drawing on process quality theories and data quality theories. Data quality attribute values are fit for use when they offer Business Rule Support, a condition that evolves in Time.]

Figure 08: Concepts Used

The main concept in this research is that there is a correlation to be discovered between

Organizational Maturity and Data Quality. A quick scan of BPMM (Object Management Group,

2008), CMMI (Software Engineering Institute, 2009), ISO 15504 / Spice (Hendriks, 2000) reveals that

maturity levels seem to include criteria related to Structures, Systems and Staff of McKinsey’s 7-

factor model (Pascale, Peters, & Waterman, 2009). Processes, Technology and Information criteria all

define the Systems factor. At this stage, the Systems factor may be expected to offer a link between

maturity (information quality) and data quality attribute values.

Data Quality Attribute Values are fit for use if they offer support for business rules, a condition which

evolves in time.

The correlation may be derived from organizational maturity theories, process maturity theories,

process quality theories (six sigma, www.sixsigma.nl) and data quality theories. Process quality

theories are expected to offer a second link between maturity and data quality. A link between process

quality and process maturity has already been identified (Gack, 2009).

At this point, it is assumed that a certain level of maturity is defined by a set of structure, process,

technology, information and staff criteria. It is also assumed that information criteria and data quality


attribute values can be linked, and that data quality theories will support the links found. These

assumptions are to be validated in this research.


4. Technical Research Design

4.1 Research Material


Research question | Research object | Source | Retrieving method | Comment

What levels of maturity do exist? | Maturity | Literature | Desk research | Much has been published on this topic

What structures in the field of organizational structure, process, technology, information and staff describe each level? | Maturity | Literature | Desk research | Much has been published on this topic

At this moment, which business rules are affected by lack of data quality? | Affected Windesheim business rules | Stakeholders (operations, integration team); Windesheim documentation | Interviews; desk research | Integration team and operations have latent knowledge of business rules; research on data quality and Windesheim business rules had been done already

What data quality attributes are relevant for these business rules? | Relevant data quality attributes | Stakeholders (operations, integration team); literature | Interviews; desk research | Integration team and operations have latent knowledge of business rules and required data quality

What values of data quality attributes correlate with each level of maturity? | Correlation between maturity levels and data quality attribute values | Literature on maturity and on data quality; publications and research; external specialists | Desk research; interviews | Some research indicating a link between quality and maturity has been identified already. At Arvix, a company specialised in data quality, interest in this research may be raised; Dr Theo Thiadens, lector ICT Governance at Fontys, has already agreed to an interview

What do process quality theories describe about positive correlations between quality and maturity? | See previous question | Literature on process quality | See previous question | See previous question

What do process quality theories describe about negative correlations between quality and maturity? | See previous question | See previous question | See previous question | See previous question

Are those observations consistent? | Results from previous questions are compared and analysed | None | Analysis |

To support the business rules identified earlier, what values should the data quality attributes have? | Required data quality threshold | Stakeholders involved (figure 6) | Workshop |

What level of maturity is required to enable those data quality attribute values? | Maturity required | Correlation found will be used | Analysis |

What structure, process, technology, information and staff criteria define the maturity found? | Required values of maturity elements | Theories described earlier | Substitution |

What are the current organizational maturity and current values of data quality attributes? | Operational values of maturity elements and data quality | Stakeholders (operations, integration team); Windesheim documentation | Interviews; desk research | Observing both maturity and data quality improves reliability

What is the gap between the current and required structure, process, technology, information and staff criteria? | Results from previous questions are compared and analysed | None | Analysis |

What conclusions may be derived from this gap? | Result from previous questions is analysed | Theories identified earlier | Analysis |

What recommendations may be defined? | Result from previous questions is analysed | Theories identified earlier | Analysis |

Table 02: Research Material


4.2 Research Strategy

4.2.1 Strategy

This research is characterized by a grounded theory approach, based on desk research. To improve

reliability and validity, a survey was conducted by interviewing specialists in the field and within Windesheim. The subjects covered by the survey were maturity levels, process quality elements, data quality attribute values and the correlations between them. Interviewees were presented with statements and conclusions derived from publications and literature and were asked whether these were in line with their experience, using examples of real-world situations. The results were used to validate the hypothesis that maturity structures and data quality are related.

External participants were chosen based on their expertise in dealing with data quality in general, and with maturity. Internal participants in interviews and the workshop were chosen based on their

experience with data quality in the business domain, both from the viewpoint of operations and user

departments. Care was taken to include participants from a department where data quality was

perceived to be troublesome and a department where data quality issues were perceived to be

successfully resolved.

4.2.2 Reliability

To reliably discover a relation between variables (i.e. data quality and maturity structures), a quantitative approach is required. This research, however, was qualitative in nature. Multiple theories on maturity and quality were discussed and balanced, and the results were cross-checked by means of a survey amongst specialists. Quality attribute values were populated in a workshop involving Windesheim specialists, enabling them to reflect on the process and results. The rigor of the study and triangulation ensure reliability; however, the results are less detailed than those gained from a quantitative approach.

4.2.3 Validity

In this project plan, it has been found that multiple theories point towards a required gain in maturity.

It is therefore a valid approach to look for a relation between data quality attribute values and maturity

structures. In this research, literature and publications of theories and research were explored to

validate this hypothesis. This relation was discussed by specialists in a limited survey. Building on

multiple, accepted sources, reflection on results acquired and open discussion ensure internal validity,

while applying the grounded theory approach ensures external validity.

4.2.4 Scope

This research explored the gap between required and current maturity at Windesheim. This gap

analysis is focused on a specific business domain: study management. This business domain is chosen

in close cooperation with the CIO and the Information Manager. The main goal of study management

is to manage major, minor and course definitions, present those definitions to other business domains

like scheduling & study planning and to manage study progress.


5. Research Execution

This chapter presents the observations obtained by executing the research according to the research plan.

Multiple maturity models defined in publications and literature are compared. After combining,

normalizing and transforming the results, the Windesheim Data Quality Maturity model WDQM is

created. Dimensions of data quality are explored, leading to the description of the relation between

data quality maturity levels and data quality dimensions and attributes. Business rules are harvested

from Windesheim business and IT documents focused on the Windesheim business domain of study

design, education, assessment and grading. Based on these business rules, best fitting data quality

attribute values are defined, leading to an analysis of the required data quality maturity.

5.1 Correlation between data quality and maturity

The next paragraphs explore the first research question: what positive and negative correlations

between structures defining maturity and data quality attributes may be found? To find this relation,

theories on maturity and data quality are explored.

5.1.1 Maturity, a brief history

The first effort to formalize a maturity model was triggered by problems occurring with delivering

complex software systems for the US Department of Defense (DoD), mainly in connection with the

Strategic Defense Initiative (SDI). Originally, the Capability Maturity Model (CMM) was developed

as a tool to assess software suppliers. Development started in 1986 at the Software Engineering

Institute (SEI) of Carnegie Mellon University and led to the Software Process Maturity Framework in

1987. In 1991, this resulted in the publication of CMM as the Capability Maturity Model v.1.0. Based

on experience with the use of this model, a new version 1.1 was published in 1993 (Kneuper, 2008).

The five-stage maturity model immediately caught the attention of developers worldwide. In 2002, Brett Champlin, senior lecturer at Roosevelt University, counted over 120 maturity models, all derived

from or inspired by the initial CMM (Champlin, 2002). To integrate multiple viewpoints, in 2000 the

Capability Maturity Model for Integration (CMMI) version 1.0 was published. This model was

developed even further, resulting in CMMI version 1.2 in 2006, offering three constellations which

extend the area of applicability of CMMI to development (CMMI-DEV), acquisition (CMMI-ACQ)

and services (CMMI-SVC) (Kneuper, 2008).

5.1.2 Maturity levels

CMM, its successor CMMI and their derivatives are based on common structures, the most well-

known of which perhaps is the definition of Maturity Levels introduced by Crosby (1980).

Currently, five levels are agreed upon (Kneuper, 2008) (Curtis, Hefley, & Miller, 2009):

1. Initial: no structures are in place at all; activities are performed on an ad-hoc basis;
2. Managed: processes are characterized by the project;
3. Defined: processes are defined by the organization;
4. Quantitatively managed: processes are measured and controlled;
5. Optimizing: focus is on continuous process improvement.


Some maturity models recognize the five-level structure, yet assign different labels. An example is Master Data Management (Loshin, Master Data Management, 2008), in which the levels are labeled 1 Initial, 2 Reactive, 3 Managed, 4 Proactive and 5 Strategic Performance. In Automotive SPICE, six levels of maturity are recognized, starting at level 0 (0 Incomplete, 1 Performed, 2 Managed, 3 Established, 4 Predictable, 5 Optimizing) (Hoermann, Mueller, Dittmann, & Zimmer, 2008). This seems to compensate for the criticism that the step between CMMI level 1 and CMMI level 2 is too big (Kneuper, 2008). The Organizational Project Management Maturity Model (OPM3), however, skips the first level, Initial, altogether, so that four levels remain (SMCI: Standardize, Measure, Control and continuously Improve) (Project Management Institute, 2008).

In this research, however, the level structure of CMMI, currently accepted as the standard, is adopted.

5.1.3 Process Areas

The second important structure is the definition of Process Areas. A process area is a cluster of related

practices in an area that, when implemented collectively, satisfy a set of goals considered important

for making improvement in that area. Examples of process areas are project planning, organizational

training, and causal analysis & resolution (Kneuper, 2008). At maturity level 1, processes are

characterized as ad hoc or even chaotic. Therefore, no process areas are assigned to maturity level 1

(Kneuper, 2008). In successive levels, process areas accumulate. In order to reach managed maturity,

all process areas of level 2 have to be mastered. And all process areas of both levels 2 and 3 have to be

mastered in order to reach defined maturity (Kneuper, 2008).

Each process area is defined by Goals. Goals guide the implementation of process areas within the

context of each stage. For each goal, practices to reach those goals are associated. In total, CMMI

defines up to 48 goals and 512 practices (Kneuper, 2008). In addition, People CMM for instance

identifies 22 process areas, each defined by its own set of goals and practices to reach those goals

(Curtis, Hefley, & Miller, 2009).

This on its own poses a problem. Combining multiple maturity models to identify the relevant

maturity structures in the field of organizational structure, process, technology, information and staff,

may lead to a list of hundreds of process areas, goals and practices. Such a cluster of elements cannot

be analyzed in the time available. An alternative approach is required.

Caballero and Piattini have created a CMMI-based data quality maturity model: Caldea (Caballero & Piattini, 2003). This model recognizes five maturity levels (Initial, Definition, Integration,

Quantitative Management and Optimizing) and for levels two to five, data quality activities and goals

are defined. This model is aimed at constructing and supporting a Data Management Process within

an organization. At this point, it would be most helpful to simply adopt the Caldea model, implement

the data quality activities and operationalize associated variables. Unfortunately, the Caldea model is

described at a high abstraction level, omitting any implementation details, leaving out specifications

of maturity structures and dimensions. And, since its conception in 2003, many theories on data

quality have been published, incorporating recent developments in IT, not (fully) present at the time

Caldea was described. Therefore, Caldea simply is not specific enough to be directly applicable, and

is likely to be outdated. However, the guidelines Caldea offers may well lead the way in constructing a more specific and up-to-date data quality maturity model.


5.1.4 Identifying relevant process areas

How can maturity structures be identified efficiently without overlooking important elements? The

approach adopted here is to:

1. Identify data quality improvement measures by literature study and interviews;
2. Assign those measures to organizational structure, process, technology, information and staff, thus creating a balanced view;
3. Assign the resulting set of measures to maturity levels by linking each measure with a specific process area and/or practice and, again, balance the result.

In the next paragraphs, the results of this approach are presented.

Data Quality Improvement measures

A wide range of measures is discussed in literature, ranging from proper database design to instating

data governance and data quality management.

It may easily be overlooked, yet it makes perfect sense: when the design is flawed, the system built according to that design can hardly be expected to deliver high-quality output. To prevent data quality issues from arising in the first place, Batini, Scannapieco and others stress the importance of good database design (standardization / normalization) and data integration (Batini & Scannapieco, 1998) (Fishman, 2009). Design and development call for a separation between development, test and production environments, for one would not want test and development activities to interfere with production processes and data. Such an environment is characterized by the ROTAP17 abbreviation.
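As a minimal illustration of such design-level prevention (not taken from the thesis), the sketch below uses SQLite NOT NULL and CHECK constraints so that invalid values are rejected before they can enter the database. The table and column names are hypothetical, not Windesheim's actual schema:

```python
import sqlite3

# A normalized table whose constraints reject invalid input at the database
# level, so flawed values never enter the system. Names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE grade (
        student_id INTEGER NOT NULL,
        course_id  TEXT    NOT NULL,
        grade      REAL    NOT NULL CHECK (grade BETWEEN 1.0 AND 10.0),
        PRIMARY KEY (student_id, course_id)
    )
""")
conn.execute("INSERT INTO grade VALUES (1001, 'DB101', 7.5)")       # accepted
try:
    conn.execute("INSERT INTO grade VALUES (1002, 'DB101', 11.0)")  # out of range
except sqlite3.IntegrityError as err:
    # The constraint violation is reported and the row is not stored.
    print("rejected:", err)
```

The point is that the check lives in the schema rather than in each application, so every program writing to the table is bound by the same rule.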

Another characteristic of building proper information systems is the elimination of manual activities.

As pointed out by Thiadens in his interview (Appendix 6.3), manual interaction may account for up to

5 percent of data quality faults (Starreveld, Leeuwen, & Nimwegen, 2004). When improving data

quality, reducing manual intervention therefore is paramount.

When data quality issues arise, a problem solving approach is required including root cause analysis,

data profiling and cleaning, source rating, schema matching and cleaning, business rule matching and

new data acquisition (Verreck, Graaf, & Sanden, 2005) (Besouw, 2009) (McGilvray, 2008) (Batini &

Scannapieco, 1998).

17 Research, Ontwikkel (Design), Test, Acceptation and Production environments


Root cause analysis is a technique to identify the underlying root cause, the primary source resulting in the problems experienced.

Data profiling originated as a set of algorithms for statistical analysis and assessment of the quality of data values within a data set, as well as for exploring relationships that exist between value collections within and across data sets. For each column in a table, a data profiling tool provides a frequency distribution of the different values, offering insight into the type and use of each column. Cross-column analysis can expose embedded value dependencies, whereas intertable analysis explores overlapping value sets that may represent foreign key relationships between entities.

Source Rating has the goal of rating sources on the basis of the quality of data they provide to other sources.

Schema matching takes two schemas as input and produces a mapping between semantically correspondent elements of the two schemas.

Schema cleaning provides rules for transforming a conceptual schema in order to achieve or optimize a given set of qualities.

Business rule matching is the art of comparing data values found with valid values according to business rules. For instance, a person can either be male or female; therefore a database field named ‘gender’ containing a value other than male or female is suspect.

New data acquisition is an activity in which suspect data is replaced by newly retrieved data.
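Two of the techniques above, data profiling and business rule matching, can be sketched in a few lines. The records and the gender domain rule are illustrative assumptions, not data from this research:

```python
from collections import Counter

# Toy records; the 'gender' domain rule is an illustrative assumption.
records = [
    {"gender": "male"},
    {"gender": "female"},
    {"gender": "female"},
    {"gender": "M"},  # suspect: not a valid domain value
]

# Data profiling: a frequency distribution of the values in one column.
profile = Counter(r["gender"] for r in records)
print(profile)  # Counter({'female': 2, 'male': 1, 'M': 1})

# Business rule matching: flag values outside the set of valid values.
VALID_GENDERS = {"male", "female"}
suspect = [r for r in records if r["gender"] not in VALID_GENDERS]
print(len(suspect), "suspect record(s)")
```

Even this tiny profile already surfaces the anomaly: the value 'M' probably means 'male', but until it is matched against the business rule and corrected, it remains suspect.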


Solving individual data quality issues is referred to as ‘small q’ by (Besouw, 2009). Yet, it may be

easily understood that without addressing the causes leading up to data quality issues in the first

place, an organization will be problem solving continuously without ever reaching a more lasting

solution. What is needed is a holistic approach on data quality, referred to as ‘large Q’ by (Besouw,

2009). Yang W. Lee, Leo L. Pipino, James D. Funk and Richard Y. Wang propose that data be considered a product: an information product (IP). “An IP is a collection of data elements that

meet the specified requirements of a data consumer” (Lee, Pipino, Funk, & Wang, 2006). In the

vision of Yang W. Lee et al, treating information as a product requires the manipulation of data to be

organized as a production process and puts data quality on the board’s agenda. To reach this goal, data

quality roles and responsibilities are established, data quality management procedures are in place and

practical data standards are in use. (Lee, Pipino, Funk, & Wang, 2006).

Yang W. Lee et al identify four fundamentals (Lee, Pipino, Funk, & Wang, 2006):

1. Understand the consumer’s needs;

2. Manage information as a product of a well defined information product process;

3. Manage the life cycle of the information product;

4. Appoint an information Product Manager.

Instating those fundamentals is also known as Master Data Management (Loshin, 2008) (Besouw,

2009) or Data Governance (Fishman, 2009). Master Data Management (or: Data Governance)

includes data quality Service Level Agreements (SLAs), life cycle data management and end-to-end process control. Process control implies the presence of controls: elements in the dataflow where the

quality of data and process is ensured and monitored. Controls include data and specifications,

technology, processes, CRUD18-roles and people & organization (work instructions and employee

education) (McGilvray, 2008) (Besouw, 2009). Thiadens identified assigning responsibilities to the

right people as a major contributor to data quality:

“Problems in grade assignment may be solved by making the lecturer directly responsible for correct

and timely grading. Lecturers are corrected by students when grade assignment is late or questionable.

Registration of lecturer availability may be much improved if the lecturer is made responsible, and is

given the right tools to manage this information” (Interview Thiadens, Appendix 6.3).

Considering information to be a product opens the way to applying production quality frameworks to information. One widely accepted framework is Six Sigma, a product quality improvement framework that reduces defects by improving the production process. In monitoring product quality, technology, processes, organization and staff are viewed as a whole. In Six Sigma, sigma represents the standard deviation; Six Sigma means six times sigma, indicating 3.4 defects per million opportunities (Boer, Andharia, Harteveld, Ho, Musto, & Prickel, 2006).

The main instrument of Six Sigma is the continuous DMAIC quality improvement cycle (Define, Measure, Analyze, Improve, Control). In Six Sigma, Key Goal Indicators (KGIs) are defined and translated into Key Performance Indicators (KPIs) for the information manufacturing process. Controls are identified that influence the KPIs. Thus, KGIs are measured by KPIs and managed by controls. Finally, the process is executed and, in continuous DMAIC cycles, improved (Boer, Andharia, Harteveld, Ho, Musto, & Prickel, 2006).
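The Measure step of such a DMAIC cycle could, for instance, compute the standard Six Sigma defect metric for an information product. The record and defect counts below are invented for illustration:

```python
# Sketch of the Measure step in a DMAIC cycle applied to an information
# product: defects per million opportunities (DPMO), the Six Sigma yardstick.
def dpmo(defects: int, units: int, opportunities_per_unit: int) -> float:
    """Defects per million opportunities."""
    return defects / (units * opportunities_per_unit) * 1_000_000

# Invented example: 10,000 records with 8 checked fields each,
# in which 120 faulty field values were found.
print(round(dpmo(120, 10_000, 8), 1))  # -> 1500.0, far above the Six Sigma target of 3.4
```

A result like this would feed the Analyze step: the gap between the measured DPMO and the KGI tells the process owner how much improvement the Improve and Control steps must deliver.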

18 Create Read Update Delete

15-Apr-23 F. Boterenbrood Page 37


The notion of applying quality cycles to data is recognized by the Massachusetts Institute of Technology (MIT), which created the Total Data Quality Methodology (TDQM) (Lee, Pipino, Funk, & Wang, 2006). This approach is characterized by five stages:

1. Identify the problem,

2. Diagnose the problem,

3. Plan the solution,

4. Implement the solution,

5. Reflect and learn.

In addition to TDQM, Larry P. English has introduced the Total Information Quality Methodology (TIQM), identifying six processes ensuring continuous improvement of information quality (English, 2009):

P1: Assess Information Product Specification Quality,

P2: Assess Information Quality,

P3: Measure Poor Quality Information Costs & Risks,

P4: Improve Information Process Quality,

P5: Correct Data in Source and Control Redundancy,

P6: Establish the Information Quality Environment.

In TIQM, process six (P6) is an overarching process and is in fact the first to be executed. While both approaches contain a recursive quality loop, TIQM recognizes more explicitly that data (information) is a product. Although both TDQM and TIQM implement a closed quality improvement loop, Six Sigma remains the most widely recognized and adopted quality approach. Therefore, in this research, Six Sigma practices are positioned at WDQM level five.

To be able to fine-tune a process using quality cycles, process control has to be rigorous, leaving little room for workers in the process to deviate from their instructions. This is also known as operational excellence, in which the focus is on creating as efficient a process as feasible (Treacy & Wiersema, 1997).

Practices and structure, process, technology, information and staff

Now that practices improving data quality have been identified, in this paragraph they are assigned to structure, process, technology, information and staff. As defined in paragraph 5.5 Concepts Used, Structures, Systems and Staff are part of McKinsey's 7-factor model (Pascale, Peters, & Waterman, 2009). Structure deals with the way the organization is constructed (task management, coordination, hierarchy), while the Processes, Technology and Information criteria together define the Systems factor. Staff encompasses knowledge management, rewarding, education, morale, motivation and behavior (Pascale, Peters, & Waterman, 2009). Table 3 presents an overview.


Apply proper system design:
- Structure: Project based development, Project teams, Project management
- Process: Proper database design, Data integration
- Technology: A ROTAP environment is required
- Staff: Structured Data Modelling knowledge, Domain knowledge, Project Management competent

Problem Solving:
- Structure: Ad-hoc problem solving
- Process: Root cause analysis, data profiling and cleaning, source rating, schema matching and cleaning, business rule matching and new data acquisition
- Technology: Data Analysis and Cleaning tools
- Information: Unknown
- Staff: Analytical competent, Knowledge of technology, business rules and data sources

Information as a Product (IP):
- Structure: Information Product Manager, Demand and supply structure, Data Quality on the business agenda, Data Quality roles and responsibilities are established
- Process: Information is managed as a product of a well-defined information product process, supporting Data Life Cycle Management
- Technology: Data Quality Analysis and Reporting tools
- Information: Structured into an Information Product; subject to Life Cycle Management; practical data standards are in use
- Staff: Commercial skilled (the customer is the consumer), Understanding the customer needs, Proactive approach to changing data needs

Master Data Management:
- Structure: Deliver quality according to Service Level Agreement
- Process: End-to-end process control; defined, and role in life cycle (CRUD) documented
- Information: Data Quality Controls are present
- Staff: Working according to strict instructions

Six Sigma:
- Structure: Strict hierarchical
- Process: DMAIC, executed according to Key Goal Indicators, monitored by Key Performance Indicators
- Technology: Technology and information quality are observed as a whole
- Information: 3.4 defects per million opportunities
- Staff: Staff and information quality are observed as a whole

Table 03: Practices and structure, process, technology, information and staff

Please note that table 3 does not present a maturity model. In a maturity model, levels are organized

in a strict hierarchy, in which process areas accumulate over successive levels. To transform the

model found into a maturity model, further analysis is required. To do so, a view on maturity levels is

created by evaluating multiple level-based maturity models.

Table 4 combines process areas (or: best practices, capabilities and activities) of several maturity

models into one view: PeopleCMM (Curtis, Hefley, & Miller, 2009), CMMI (Kneuper, 2008),

Organizational Project Management Maturity Model OPM3 (Project Management Institute, 2008),

Master Data Management Maturity Model (Loshin, 2001) and Caldea (Caballero & Piattini, 2003).


Practices and maturity levels

Level 1, Initial (processes are ad hoc):
- CMMI Process Areas: -
- People CMM Process Areas: -
- OPM3 Best Practices: -
- MDM Capabilities: Limited enterprise consolidation of representative models, Collections of data dictionaries in various forms, Limited data cleansing
- Caldea Activities: -

Level 2, Managed (processes are characterized by the project):
- CMMI Process Areas: Requirements Management, Project Planning, Project Monitoring and Control, Supplier Agreement Management, Measurement and Analysis, Process and Product Quality Assurance, Configuration Management
- People CMM Process Areas: Compensation, Training & Development, Performance Management, Work Environment, Communication & Coordination, Staffing
- OPM3 Best Practices: Standardize Develop Project Charter process, Standardize Develop Project Management Plan process, Standardize project Collect Requirements process, Standardize project Define Scope process, …
- MDM Capabilities: Application architectures for each business application, Data dictionaries are collected into a single repository, Initial exploration into low-level application services, Review of options for information sharing, Introduction of data quality management for parsing, standardization and consolidation
- Caldea Activities: Data Management Project Management, Data Requirements Management, Data Quality Dimensions and Metrics Management, Data Sources and Data Targets Management, Database or data warehouse development or acquisition project management

Level 3, Defined (processes are defined by the organization):
- CMMI Process Areas: Requirements Development, Technical Solution, Product Integration, Verification, Validation, Organizational Process Focus, Organizational Process Definition, Organizational Training, Integrated Project Management, Risk Management, Decision Analysis and Resolution
- People CMM Process Areas: Participatory Culture, Workgroup Development, Competency-Based Practices, Career Development, Competency Development, Workforce Planning, Competency Analysis
- OPM3 Best Practices: Measure Develop Project Charter process, Measure Develop Project Management Plan process, Measure project Collect Requirements process, Measure project Define Scope process, …
- MDM Capabilities: Fundamental architecture for shared master data framework, Defined services for integration with master data asset, Data quality tools, Policies and procedures for data quality management, Data quality issues tracking, Data standards processes
- Caldea Activities: Data Quality Team Management, Data quality product verification and validation, Risk and poor data quality impact Management, Data quality standardization Management, Organizational Processes Management

Level 4, Quantitatively Managed (processes are measured and controlled):
- CMMI Process Areas: Organizational Process Performance, Quantitative Project Management
- People CMM Process Areas: Mentoring, Organizational Capability Management, Quantitative Performance Management, Competency-Based Assets, Empowered Workgroups, Competency Integration
- OPM3 Best Practices: Control Develop Project Charter process, Control Develop Project Management Plan process, Control project Collect Requirements process, Control project Define Scope process, …
- MDM Capabilities: SOA for application architecture, Centralized management of business metadata, Enterprise data governance program, Enterprise data standards and metadata management, Proactive monitoring for data quality control feeds into governance program
- Caldea Activities: Data Management Process Measurements Management

Level 5, Optimizing (continuous process improvement):
- CMMI Process Areas: Organizational Innovation & Deployment, Causal Analysis & Resolution
- People CMM Process Areas: Continuous Workforce Innovation, Organizational Performance Alignment, Continuous Capability Improvement
- OPM3 Best Practices: Improve Develop Project Charter process, Improve Develop Project Management Plan process, Improve project Collect Requirements process, Improve project Define Scope process, …
- MDM Capabilities: Transaction integration available to internal applications, Published APIs enable straight-through processing, Cross-organization data governance
- Caldea Activities: Causal Analysis for Defect Prevention, Organizational Development and Innovation

Table 04: A combined view on maturity.

In this view, all PeopleCMM process areas, all CMMI-COM and CMMI-DEV process areas and all Caldea activities are shown. With regard to OPM3 and MDM, a subset of best practices and capabilities is included, in order to present a workable overview. Using this view as a guideline, the practices identified in table 3 are assigned to specific maturity levels, resulting in table 5, the Windesheim Data Quality Maturity (WDQM) model.19 The assignment of practices to WDQM levels is discussed in the next paragraphs.

19 This table is validated in a discussion with M. van Steenbergen, lead architect at Sogeti (see appendix 6.2).


5.1.5 Windesheim Data Quality Maturity Model

Level 1, Initial (processes are ad hoc):
- Structure: -
- Process: -
- Technology: -
- Information: Unspecified
- Staff: -

Level 2, Managed (processes are characterized by the project):
- Structure: Project based development, Project teams, Ad-hoc problem solving
- Process: Data profiling and cleaning, source rating, schema matching and cleaning, business rule matching and new data acquisition
- Technology: Data Analysis and Cleaning tools. File Transfer data exchange pattern
- Information: Not trusted
- Staff: Analytical competent, Knowledge of technology, business rules and data sources, Data modeling knowledge

Level 3, Defined (processes are defined by the organization):
- Structure: Programme management
- Process: Root cause analysis, Requirements Development, Product Integration, Verification, Validation, Data integration
- Technology: Technical Solution, A ROTAP environment is available. Data integration through Remote Procedure Invocation
- Information: Fit for current use, A canonical data model supports data translations between domains
- Staff: Domain knowledge, Programme Management competent, Data responsible

Level 4, Quantitatively Managed (processes are measured and controlled):
- Structure: Information Product Manager, Data Quality on the business agenda, Data Quality roles and responsibilities are established. Quality is delivered according to Service Level Agreement.
- Process: Information is managed as a product of a well-defined information product process. Supporting Data Life Cycle Management. End-to-end process control.
- Technology: Data Quality Analysis and Reporting tools, Integration patterns. Message Bus or Message Broker pattern
- Information: Structured into an Information Product; subject to Life Cycle Management. Canonical data model defines data standards as a lingua franca. Data Quality Controls are present
- Staff: Commercial skilled (the customer is the consumer), Understanding the customer needs, Proactive approach to changing data needs

Level 5, Optimizing (continuous process improvement):
- Structure: Processes are executed in a strict hierarchy
- Process: DMAIC, executed according to Key Goal Indicators, monitored by Key Performance Indicators
- Technology: Defined, and role in life cycle (CRUD) documented. Technology and information quality are observed as a whole
- Information: 3.4 defects per million opportunities
- Staff: Working according to strict instructions. Staff and information quality are observed as a whole

Table 05: Windesheim Data Quality Maturity model (WDQM)
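The strict hierarchy of a level-based maturity model, in which a level counts as reached only when it and every level below it are satisfied, can be sketched as follows. The practice names are abbreviated placeholders for illustration, not the model's full content:

```python
# Minimal sketch of level-based maturity scoring: an organization's
# maturity is the highest level for which it satisfies all practices
# of that level and of every level below it.
WDQM_PRACTICES = {
    2: {"project teams", "data profiling", "cleaning tools"},
    3: {"root cause analysis", "canonical data model", "programme management"},
    4: {"information product manager", "SLA", "life cycle management"},
    5: {"DMAIC", "strict instructions"},
}

def maturity_level(observed: set) -> int:
    level = 1  # level 1 (Initial) requires nothing
    for lvl in sorted(WDQM_PRACTICES):
        if WDQM_PRACTICES[lvl] <= observed:
            level = lvl
        else:
            break  # strict hierarchy: a gap stops the climb
    return level

# Satisfying level 2 plus only part of level 3 still yields level 2.
print(maturity_level({"project teams", "data profiling", "cleaning tools",
                      "root cause analysis"}))  # -> 2
```

Note how an isolated level-5 practice (say, DMAIC without the underlying levels) does not raise the score: process areas accumulate, they do not substitute.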

Discussion

The structure column is characterized by a growth from an ad-hoc approach, via project based development and an integrated programme management approach, to the institution of product quality management and finally total quality management at level five. At this level, the modus operandi for process execution is operational excellence, requiring employees in the workforce to adhere to strict standards and instructions (Treacy & Wiersema, 1997).

The process column replaces the rather limited notion of proper database design with the CMMI level three process areas Requirements Development, Technical Solution, Product Integration, Verification and Validation, indicating that the issue at this level is to design, build and implement a well-functioning solution. The CMMI process area Technical Solution is positioned in the Technology column. The activities mentioned under problem solving in table 3 fit maturity level two, which is characterized by ad-hoc problem solving. Root cause analysis, however, does not fit at this level, since this activity leads to solving data errors at the root of the problem. Root cause analysis is positioned at maturity level three, where it supplements requirements development, enabling integrated, robust solutions.


The technology column reveals an evolution in system integration. At level two, system integration is still designed in an ad-hoc, individual manner. At level three, Caldea positions data standardization, while MDM mentions having defined services for integration; according to MDM, mastering level four is required for successfully building an SOA application architecture. This is reflected by the different system integration styles being utilized (Hohpe & Woolf, 2008). At level two, the File Transfer pattern is the dominant integration style, offering ease of integration and an excellent universal storage mechanism. At level three, the emergence of a canonical data model opens the way for a more standardized system integration, utilizing the Remote Procedure Invocation integration style (Hohpe & Woolf, 2008). Along the lines of MDM, at level four a common Messaging style supported by a Message Broker or Message Bus pattern (Hohpe & Woolf, 2008) results in a service-oriented application architecture.
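The level-four integration style described above can be sketched as a minimal Message Broker that translates each domain's own format into a shared canonical model at the domain border. The domain and field names below are hypothetical:

```python
# Minimal sketch of the Message Broker pattern with a canonical data
# model: each domain publishes in its own format, and the broker
# translates to the shared canonical form before delivery.
class Broker:
    def __init__(self):
        self.subscribers = []

    def subscribe(self, handler):
        self.subscribers.append(handler)

    def publish(self, domain_msg, to_canonical):
        canonical = to_canonical(domain_msg)  # translate at the border
        for handler in self.subscribers:
            handler(canonical)

broker = Broker()
received = []
broker.subscribe(received.append)

# A hypothetical student-administration domain uses its own field names;
# the translation function maps them onto the canonical model.
broker.publish({"stud_nr": 42, "cijfer": 8.5},
               lambda m: {"student_id": m["stud_nr"], "grade": m["cijfer"]})
print(received)  # -> [{'student_id': 42, 'grade': 8.5}]
```

The design point is that subscribers only ever see the canonical form, so adding a new source domain requires one new translation function, not point-to-point conversions between every pair of domains.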

In the information column, at level one, Initial, (the management of) the organization is oblivious with regard to data quality. All is assumed to be well; the state the data is in, however, remains unspecified. At the next level, data quality issues have triggered numerous attempts to repair and clean data, resulting in a decline of confidence in the reliability of the information. MDM positions a rather isolated view on data quality at level two, whereas at level three an integrated approach is supported by a fundamental architecture for shared data. Again, we may well see the emergence of a canonical data model at level three, enabling data to be transformed at the borders of each domain. At this level, data quality is fit for current use, as indicated by the presence of Caldea's risk and poor data quality management process area, whereas at level four data is seen as a product and data quality becomes future proof, and at level five data quality reaches Six Sigma.

Staff, finally, grows from being analytically competent (a good system programmer) to a commercially skilled worker, able to assess the data customer's needs. This reflects PeopleCMM's professional training at level two, competence and career development at level three and the institution of empowered workgroups at level four. PeopleCMM's definition of professional training at level two creates room to make the individual entering data responsible for the quality and ultimately the effects of the data entered. This, however, requires the organization to focus on the process as a whole, which initially is the case at level three. Therefore, an individual may be made responsible for the data entered at level three: data responsible.

Level five is characterized by a continuous improvement cycle. Current data quality theories do not include continuous improvement; it seems that data quality theories are focused on improving data quality to an acceptable level (fit for use). An alternative approach has to be adopted to shape level five. Both TDQM and Six Sigma are aimed at continuous process improvement. On closer inspection, however, TDQM is positioned as a project management approach for solving data quality problems in general (Lee, Pipino, Funk, & Wang, 2006, p. 64) (Kovac, Lee, & Pipino, 1997), ensuring that data errors are corrected at the data source, not at the place where they create havoc. This implies that a form of continuous improvement cycle has been defined at level two already. However, TDQM does not improve the production process itself: it springs into action once an obvious data error has been detected, and eliminates the root cause. Six Sigma, on the other hand, improves the data production process until data quality has reached an absolute maximum, surpassing the 'fit for use' boundary. Therefore, to populate level five, Optimizing, the Six Sigma fundamentals fit best.

In the remainder of the research, this model will be referred to as the Windesheim Data Quality

Maturity model, or WDQM.


Data Ownership

An issue that remains largely untouched so far is data ownership. Whose data is it, anyway? To be more precise, who owns the data at Windesheim? Take for instance grades assigned to assessments made by students. Who owns that grade? Is it the student, or the student administration, or perhaps the IT department? And what does this all mean for treating information as a product? If information is a product, and it is subject to Service Level Agreements, then who is selling what to whom?

In the literature, something is said on data ownership. At level 4, in the structure column, it is found that Data Quality roles and responsibilities are established and an Information Product Manager is instated (Lee, Pipino, Funk, & Wang, 2006). This issue is addressed more specifically by Danette McGilvray, who introduces the Data Steward (McGilvray, 2008) as a replacement for the data owner, since in her vision ownership results in a too rigid and inflexible position of stakeholders. Indeed, when interviewed, Thiadens, lector at Fontys University, identified ownership as an obstacle:

“The most difficult hurdle to be solved here is to overcome the notion that information is not owned by the decentralized business units.” (Interview dr. mr. ir. Th. J.G. Thiadens, Appendix 6.3)

A data steward on the other hand is a role, acting on behalf and in the best interest of someone else,

thus creating room to maneuver and flexibility to implement this role. Gartner seems to agree:

“A data owner owns the data, much like the queen owns the land, while a data steward takes care of the

data, much like a farmer takes care of the land” (Friedman, 2009).

To be able to take care of data, one must have the right tools and responsibilities. A data steward is able to be effective at level 4, Quantitatively Managed, since at this level information is managed as a product, for the quality of which one can be responsible (Lee, Pipino, Funk, & Wang, 2006). The Information Product Manager is therefore positioned at level 4 and assigned the role of Data Steward. It should be noted, however, that management involvement remains crucial:

“…the data steward … cannot fulfill his role as caretaker for data quality if the means to effectively influence data quality do not come with the job. Since data quality is related to organizational maturity, the means required are managerial rather than technical. To ensure data quality, one may have to be prepared to restructure the organization. Instating data stewardship without the preparedness of taking (perhaps drastic) managerial decisions, restructuring the fabric of an organization, may be in vain. There HAS to be a manager responsible for data quality with the authority to implement change.” (Interview de Graaf, appendix 6.4)


Graphical presentation

Figure 09 depicts the WDQM as a staircase of five levels:

1. Initial: Data quality has not yet been formally identified as the source of problems.

2. Managed: Aware of data quality problems, solving data quality issues on an ad-hoc basis.

3. Defined: Solving data quality issues through structured system development and rigorous testing.

4. Quantitatively Managed: Treating information as a product, handling data quality problems as product and process faults, controlling the process.

5. Optimizing: Constantly improving process and data quality in total quality cycles.

Figure 09: graphical representation of the WDQM

5.1.6 Alternative views on data quality maturity

In the previous paragraphs multiple maturity models were analyzed, resulting in the WDQM. The common denominator of those models is that they are all level-based maturity models, using process areas (Curtis, Hefley, & Miller, 2009), (Kneuper, 2008), best practices (Project Management Institute, 2008) or capabilities (Loshin, 2001) to achieve the goals defining each level of maturity. In the literature, other data quality maturity models are described that use a similar level-based description yet lack the definition of process areas (or best practices or capabilities) and goals. Therefore, using these models as a source for analysis is difficult, if not impossible. However, now that the WDQM has been defined, what can we learn from comparing the resulting data quality maturity model with the other maturity models described in the literature?

Data Quality Management Maturity Model

An alternative view on data quality maturity has been developed by Kyung-Seok Ryu, Joo-Seok Park, and Jae-Hong Park (figure 10).

Figure 10: A Data Quality Management Maturity Model (Ryu, Park, & Park, 2006)


In this view on data quality maturity, at each successive maturity level data management operates at an increased level of abstraction. Where initially data is managed from a rather operational point of view, the physical database scheme, at the second level a data model is present, resulting in a more integrated view on data. Next, this model is standardized using metadata standards, and finally a more holistic view is obtained by utilizing a data architecture. This view on maturity takes another approach, solely focusing on the information aspect and utilizing four levels instead of the CMMI five-level approach, whereas in the WDQM, at higher maturity levels, data transforms into a product and the focus shifts to improving the production process. However, similarities may be observed:

Data Quality Management Maturity → Windesheim Data Quality Maturity
- Level 1: Management of physical data → Level 2: Focus on repairing physical DQ issues
- Level 2: Management of data definitions → Level 3: Focus on requirements and data design
- Level 3: Management through data standards → Level 4: Data is an IP, data standards are in use
- Level 4: Holistic data management, architecture → Level 5: Continuously improved through DMAIC

As shown, when the maturity levels are aligned, levels 1 through 3 of the data quality management maturity model bear similarities to levels 2 through 4 of the Windesheim Data Quality Maturity model. The data quality management maturity model presented by Ryu et al. may enrich the information column of the WDQM (table 5).

Gartner Data Quality Maturity Model

Another data quality maturity model is defined by Gartner. Gartner recognizes five levels of maturity

(Gartner, 2007):

“Organizations at Level 1 have the lowest level of data quality maturity, with only a few people aware of

the issues and their impact. … Organizations at Level 2 are starting to react to the need for new

processes that improve the relevance of information for daily business. ….. Organizations at Level 3 are

proactive in their data quality efforts. They have seen the value of information assets as a foundation for

improved enterprise performance ….. At Level 4, information is part of the IT portfolio and considered an

enterprise wide asset, and the data quality process becomes part of an EIM program. …Companies at

Level 5 have fully evolved EIM programs for managing their information assets with the same rigor as

other vital resources, such as financial and material assets.” (Gartner, 2007).

Even though Gartner does not define process areas and goals for each level, characteristics defining each level are provided in a descriptive text. To analyze this description, table 6 was created, containing both the WDQM and the characteristics from Gartner's vision on data quality maturity (Gartner, 2007).

Again, similarities and differences can be observed. In Gartner's view, at maturity level three the organization is already moving beyond project based development, which leads to a somewhat confusing and less clear-cut distinction between the Managed and Optimized maturity levels.

Also, the distinction made between an organization responding in a reactive or a proactive mode to data quality issues is interesting. Being proactive and having Enterprise Information Management (EIM) operational at level three already might be a bit steep, considering the fact that at level three OPM3 positions projects as being measurable (not being in control), MDM defines data quality as traceable (and positions proactive monitoring at level four), and CMMI focuses on integration (and positions


quantitative management at level four) (Curtis, Hefley, & Miller, 2009), (Kneuper, 2008), (Project

Management Institute, 2008), (Loshin, 2001).


Level Focus Structure Process Technology Information Staff

1 initial, WDQM Processes are ad-hoc

- - - Unspecified -

1 Aware Gartner Lowest level of data quality maturity

Within the entire organization, no person, department or business function claims responsibility for data.

When a problem with data quality is obvious, there is a tendency to ignore it and to hope that it will disappear of its own accord

No formal initiative to cleanse data exists, and information emerging from computers is generally held to be "correct by default."

Business users are largely unaware of a variety of data quality problems, partly because they see no benefit for themselves in keeping data clean.

Only a few people aware of the issues and their impact

2 Managed WDQM

Processess are characterized by the project

Project based development, Project teams, Ad Hoc problem solving

Data profiling and cleaning, source rating, schema matching and cleaning, business rule matching and new data acquisition.

Data Analysis and Cleaning tools. File Transfer data exchange pattern

Not trusted Analytical competent, Knowledge of technology, business rules and data sources, Data modeling knowledge

2 Reactive Gartner

Reacting to the need for new processes

Although field or service personnel need access to accurate operational data to perform their roles effectively, businesses take a wait-and-see approach in relation to data quality

Starting to react to the need for new processes that improve the relevance of information for daily business.

Application developers implement simple edits and controls to standardize data formats, check on mandatory entry fields and validate possible attribute values.

Business decisions and system transactions are regularly questioned due to suspicions about data quality.

Employees have a general awareness that information provides a means for enabling greater business-process understanding and improvement.

3 Defined WDQM

Processes are defined by the organization

Programme management Root cause analysis, Requirements Development, Product Integration, Verification, Validation, Data integration

Technical Solution, A ROTAP environment is available. Data integration through Remote Procedure Invocation

Fit for current use, A canonical data model supports data translations between domains

Domain Knowledge, Programme Management competent, Data responsible

3 Proactive Gartner

Proactive data quality efforts

Organizations have seen the value of information as a foundation for enterprise performance and moved from project-level IM to a coordinated EIM strategy.

Major data quality issues are documented, but not completely remediated.

Data quality tools, for tasks such as profiling or cleansing, are used on a project-by-project basis, but housekeeping is typically performed by the IT department or data warehouse teams.

Business analysts feel data quality issues most acutely and data quality is part of the IT charter. Levels of data quality are considered "good enough" for most tactical and strategic decision-making.

Department managers and IT managers communicate data administration and data quality guidelines. The concept of "data ownership." is discussed.

4. Quantitatively managed WDQM

Processes are measured and controlled

Information Product Manager, Data Quality on the business agenda, Data Quality roles and responsibilities are established. Quality is delivered according to Service Level Agreement.

Information is managed as a product of a well defined information product process. Supporting Data Life Cycle Management. End-to-end process control.

Data Quality Analysis and Reporting tools, Integration patterns. Message Bus or Message Broker pattern

Structured into an Information Product; Subject to Life Cycle Management, Canonical data model defines data standards as a lingua franca. Data Quality Controls are present

Commercial skilled (the customer is the consumer ), Understanding the customer needs, Proactive approach to changing data needs

4. Managed Gartner

Information is an enterprisewide asset

The data quality process is part of an EIM program and is now a prime concern of the IT department and a major business responsibility.

Data quality is measured and monitored at enterprise level regularly. An impact analysis links data quality to business issues and process performance.

Commercial data quality software is implemented. Cleansing is performed either at the data integration layer or directly at the data source.

Information is part of the IT portfolio and considered an enterprisewide asset.

Multiple data stewardship roles are established within the organization.

5 Optimizing WDQM

Continuous Process Improvement

Processes are executed in a strict hierarchy

DMAIC, executed according to Key Goal Indicators, monitored by Key Performance Indicators

Defined and role in life-cycle (CRUD) documented. Technology and information quality are observed as a whole

3.4 defects per million opportunities.

Working according to strict instructions, Staff and information quality are observed as a whole.

5. Optimized Gartner

Fully evolved EIM programs

Fully evolved EIM programs for managing their information assets with the same rigor as other vital resources, such as financial and material assets.

Rigorous processes are in place: ongoing housekeeping exercises, continuous monitoring of quality levels.

Data is enriched in real time by third-party providers with additional credit, demographic, sociographic, household, geospatial or market data.

Unstructured mission-critical information, such as documents and policies, becomes subject to data quality controls.

Quality metrics are attached to the compensation plans of data stewards and other employees.

Table 06: A combined view on the WDQM and the Gartner Data Quality Maturity model

What can be observed is that at level two in Gartner’s model the emphasis lies on being able to

develop the right solution, and at level three the focus shifts towards (pro-)actively monitoring and

ensuring data quality (Gartner, 2007). In the WDQM however, at level two the emphasis is on

repairing data quality issues in an ad-hoc manner, whilst at level three the focus is shifted toward

developing more robust and better aligned solutions. Indeed, the order in which these things take place

15-Apr-23 F. Boterenbrood Page 47

Thesis: Improving data quality in higher education

may be different depending on one's viewpoint. One may argue that, in order to experience data quality issues, one must be able to develop applications first. The WDQM is based on sound theories on maturity, which state that a subject (data quality) is first discarded, then dealt with on an ad-hoc basis (i.e. 'repaired'), and only understood and implemented more robustly at maturity levels three and upwards (see figure 9). Therefore, in this research the WDQM will remain unchanged.

5.1.7 Conclusion

In the previous paragraphs a generic data quality maturity model has been found by:
1. Identifying data quality improvement practices by literature study and interview;
2. Assigning those practices to organizational structure, process, technology, information and staff, thus creating a balanced view;
3. Assigning the resulting set of practices to maturity levels by linking each measure with a specific process area, creating a maturity matrix.

Finally, the resulting Windesheim Data Quality Maturity model (WDQM) is compared with other data quality maturity models. Differences may be observed, and it is found that, depending on one's viewpoint, the order of process areas at levels two and three may vary, and is therefore open for discussion. Data ownership, an important issue when discussing data quality, has hardly been mentioned in literature. It is suggested to replace data ownership by data stewardship at level 4.

Now that a model for Data Quality Maturity has been developed, the data quality threshold is to be established. In the next paragraphs, data quality attributes are defined and the domain business rules are found.

5.2 Data Quality Attributes

In this paragraph, the search is on for answers to the following questions:

In higher education, what positive and negative correlations between maturity and data quality may be found? What values of data quality attributes correlate with each level of maturity?

What do process quality theories describe about positive correlations between quality and maturity?
What do process quality theories describe about negative correlations between quality and maturity?
Are those observations consistent?

5.2.1 Dimensions of data quality

In literature, data quality is defined by dimensions, and those dimensions in turn are measured by data

quality attributes (Loshin, 2008) (Batini & Scannapieco, 1998) (McGilvray, 2008). To find the right

data quality attributes, the dimensions have to be identified first. This paragraph establishes a view on

the dimensions of data quality.

What are the dimensions of data quality? When we examine literature on this topic, what we discover

is that many dimensions are defined but, unfortunately, naming and definitions vary between sources.

Table 7 presents an overview.


For each dimension, the table shows the definition, the source that supplied the definition and

relationships with other dimensions. This relationship is either specifically supplied by the source (for

instance, in the form of a formula) or it is found by comparing definitions (indicating that the

dimensions are actually synonyms).

Table 7 presents a non-normalized view on data quality dimensions found in literature. To create a more usable view, this set of dimensions will be compacted by removing duplicates and synonyms. In some cases, literature mentions dimensions that relate more to the quality of software than to the quality of data (Ease of use, Maintainability and Presentation Quality). These dimensions are omitted.

Dimension | Definition | Source | Related to
Accessibility | Ease of attainability of the data | Lee, Pipino, Funk, & Wang, 2006 | Accessibility = 1 - (delivery time - input time) / (outdated time - input time)
Accuracy, Database | Correctness of data in the database | Zeist, Hendriks, Paulussen, & Trieneken, 1996 |
Accuracy, Semantic | Closeness of value v to true value v' | Batini & Scannapieco, 1998 |
Accuracy, Syntactic | Closeness of value v to elements of the corresponding domain D | Batini & Scannapieco, 1998 |
Actuality, Database | Data in the database is in conformance with reality | Zeist, Hendriks, Paulussen, & Trieneken, 1996 | Accuracy, Timeliness
Completeness | The extent to which data are of sufficient breadth, depth and scope for the task at hand | Batini & Scannapieco, 1998 |
Completeness | The degree in which elements are not missing from a set | Lee, Pipino, Funk, & Wang, 2006 |
Consistency | Violation of semantic rules over (a set of) data-items | Batini & Scannapieco, 1998 |
Consistency | The degree in which values and formats of data elements are used in a univocal way | Lee, Pipino, Funk, & Wang, 2006 |
Consistency | A measure of the equivalence of information stored or used in various data stores, applications and systems | McGilvray, 2008 |
Currency | Concerns how promptly data are updated | Batini & Scannapieco, 1998 | Currency = delivery time - input time + age
Currency | | Lee, Pipino, Funk, & Wang, 2006 | Currency = delivery time - input time + age
Data Coverage | A measure of the availability and comprehensiveness of data compared to the total data universe or population of interest | McGilvray, 2008 | Completeness
Decay | A measure of the rate of negative change to the data | McGilvray, 2008 | Timeliness
Duplication | A measure of unwanted duplication | McGilvray, 2008 | Uniqueness
Ease of use | A measure of the degree to which data can be accessed and used | McGilvray, 2008 |
Format Compliance | The degree in which a modeled object conforms to the set of rules bounding its representation | Loshin, 2008 |
Integrity, Data | A measure of total data quality | McGilvray, 2008 |
Integrity, Referential | The degree in which related sets of data are consistent | Chen, 1976 |
Maintainability | The degree to which data can be updated, maintained and managed | McGilvray, 2008 |
Presentation Quality | A measure of how information is presented to and collected from those who utilize it | McGilvray, 2008 |
Reliability | Free Of Error | Lee, Pipino, Funk, & Wang, 2006 |
Reliability | The degree in which data represent reality | Verreck, Graaf, & Sanden, 2005 |
Specifications | A measure of the existence, completeness, quality and documentation of data standards | McGilvray, 2008 |
Timeliness | Timeliness expresses how current data are for the task at hand | Batini & Scannapieco, 1998 | Timeliness = 1 - volatility / currency
Timeliness | Timeliness can be measured as the time between when information is expected and when it is readily available for use | Loshin, 2008 |
Timeliness | Or Availability: a measure of the degree to which data are current and available for use | McGilvray, 2008 | Availability
Trust | A measure of the confidence in the data quality | McGilvray, 2008 | Reliability
Uniqueness | Refers to requirements that entities … are captured, represented, and referenced uniquely | Loshin, 2008 | Consistency = f(uniqueness)
Usability | The total fitness of data for use | Verreck, Graaf, & Sanden, 2005 | Usability = Reliability * Relevance, U = R2
Volatility | Characterizes the frequency with which data vary in time | Batini & Scannapieco, 1998 | Decay, Currency


Table 07: An overview of data quality dimensions

One data quality dimension is mentioned, yet rather loosely defined: integrity. Integrity is defined to be an overall measure of data quality. In this research, data is defined to have integrity once it is fit for use (see paragraph 3.1.4, What is data quality). This also means that usability and integrity are synonymous.

A dimension that is not explicitly mentioned in literature on data quality is security. Markus Schumacher et al identify four data quality dimensions related to security: Confidentiality, Integrity, Availability and Accountability (Schumacher, Fernandez-Buglioni, Hybertson, Buschmann, & Sommerland, 2006). We may recognize Integrity and Availability to be part of the model already. Integrity is defined to be an overall measure of data quality, acting as a container for all other dimensions of data quality. In data quality literature, Availability is commonly known as Timeliness. Confidentiality and Accountability are added to the list of data quality dimensions. Confidentiality is the property that data is disclosed only as intended by the enterprise, while Accountability is the property that actions affecting enterprise assets can be traced to the actor responsible for the action (Schumacher, Fernandez-Buglioni, Hybertson, Buschmann, & Sommerland, 2006).

Timeliness is defined by Batini & Scannapieco (1998) as 1 - volatility / currency. Volatility being a frequency and Currency being a timeframe, it is proposed in this research to replace this by a simpler equation: Timeliness = Volatility * Currency. In this case, when Currency < Volatility, Timeliness < 1; when Currency > Volatility, Timeliness > 1.

The analysis results in table 8: Dimensions of data quality.

Dimension | Definition | Related to
Accessibility | Ease of attainability of the data | Accessibility = 1 - (delivery time - input time) / (outdated time - input time)
Accountability | The property that actions affecting enterprise assets can be traced to the actor responsible for the action | Security
Accuracy | Closeness of value v to true value v' |
Completeness | The degree in which elements are not missing from a set |
Confidentiality | The property that data is disclosed only as intended by the enterprise | Security
Consistency | The degree in which values and formats of data elements are in line with semantic rules over this set of data-items | Consistency = f(Uniqueness)
Currency | Concerns how promptly data are updated | Currency = delivery time - input time + age
Integrity, Data | The degree in which data is fit for use |
Integrity, Referential | The degree in which related sets of data are consistent | Consistency
Reliability | The degree in which data is perceived to represent reality |
Specifications | A measure of the existence, completeness, quality and documentation of data standards |
Timeliness | Or Availability: a measure of the degree to which data are current and available for use | Timeliness = volatility * currency
Uniqueness | Refers to requirements that entities are captured, represented, and referenced uniquely |
Volatility | Characterizes the frequency with which data vary in time |

Table 08: Dimensions of data quality

Batini points out that dimensions could be conflicting: “For instance, a list of courses published on a

university web site must be timely though there could be accuracy or consistency errors and some

fields specifying courses could be missing” (Batini & Scannapieco, 1998).

5.2.2 Data Quality Dimensions Discussed

Now that the final set of dimensions of data quality has been identified, can individual dimensions be assigned to levels of the WDQM? In other words, can it be argued that a certain level of maturity has to be


mastered in order to be able to satisfy (a group of) data quality dimensions? In this paragraph, this question is explored by identifying the measures which establish each data quality dimension and comparing these measures to WDQM process areas, thus binding the dimension to the corresponding WDQM maturity level, and finally defining the corresponding data quality attribute(s).

Accessibility

Accessibility deals with the fact that data needs to be delivered before it becomes insignificant

(outdated). This makes for a rather complex, compound dimension. Accessibility is influenced by

Volatility (the rate at which data changes), Timeliness (the speed at which data is available for use)

and Currency (the speed at which data is updated in the system). Both Timeliness and Currency are

positioned at level 4, thus Accessibility can only be guaranteed at level 4, Quantitatively Managed.

Accessibility is measured by a ratio, indicating the ease of attainability of the data. Accessibility = 1 -

(delivery time - input time) / (outdated time - input time) (Lee, Pipino, Funk, & Wang, 2006).
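The ratio of Lee et al. can be sketched in a few lines of Python. The function name and the example timestamps are illustrative assumptions, not taken from the source; all arguments are moments on a common clock.

```python
def accessibility(input_time: float, delivery_time: float, outdated_time: float) -> float:
    """Accessibility = 1 - (delivery time - input time) / (outdated time - input time).

    A value near 1 means data is delivered long before it becomes outdated;
    a value near 0 means delivery barely precedes obsolescence.
    """
    return 1 - (delivery_time - input_time) / (outdated_time - input_time)

# Data entered at t=0, delivered at t=2, outdated at t=10 (e.g. hours):
print(accessibility(0, 2, 10))  # 0.8
```

Note that when delivery coincides with the moment the data becomes outdated, the ratio drops to zero.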

Accountability

Markus Schumacher et al identify a series of security patterns especially focused on maintaining Accountability. Security accounting is a service area that performs four functions: capture, store, review and report data about security events (Schumacher, Fernandez-Buglioni, Hybertson, Buschmann, & Sommerland, 2006). Patterns used to execute this process are security accounting, audit service, audit trail, and intrusion detection (Schumacher, Fernandez-Buglioni, Hybertson, Buschmann, & Sommerland, 2006).

To be able to implement these patterns, a view on data structures and data quality is required, as well as well-defined and independently operating ROTAP environments. It may therefore be argued that Accountability can be maintained no earlier than at maturity level 3, Defined.

Attributes involved are the actors involved, the assets affected, the time, date and place of the event, and the methods used (Schumacher, Fernandez-Buglioni, Hybertson, Buschmann, & Sommerland, 2006).

Accuracy

Accuracy is about getting the data right, being as close to reality as possible. Amongst the measures ensuring accuracy are data profiling and cleaning. This means that a certain level of Accuracy can be achieved at maturity level 2, Managed, be it in a reactive manner and at high cost in terms of time and labor. At this level, flaws in accuracy will keep returning, jeopardizing reliability. At maturity level 3, Defined, by implementing robust applications utilizing various types of input checks, an acceptable level of accuracy will be achieved in a more lasting fashion.

Accuracy is measured by the accuracy error, a ratio ranging between 0 and 1, indicating the number of characters, data elements or database tuples in error as a fraction of the total number of characters, data elements or database tuples (Batini & Scannapieco, 1998).
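As a sketch, the accuracy error can be computed by counting elements that fail a validity check. The helper name, the regular expression and the sample postal codes below are illustrative assumptions only.

```python
import re

def accuracy_error(values, is_correct) -> float:
    """Fraction of elements (characters, data elements or tuples) in error:
    a ratio ranging between 0 and 1, as in Batini & Scannapieco."""
    if not values:
        return 0.0
    errors = sum(1 for v in values if not is_correct(v))
    return errors / len(values)

# Hypothetical syntactic-accuracy check: Dutch postal codes, four digits
# followed by two capital letters.
valid = re.compile(r"^\d{4}[A-Z]{2}$").match
codes = ["8017CA", "8017", "1234AB", "12AB34"]
print(accuracy_error(codes, valid))  # 0.5
```

The same counting scheme applies at character or tuple granularity by changing what is passed in as `values`.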

Completeness

Completeness is about getting all the data. To get all data elements in time, the business process needs

to be well organized and scheduled, with all sub-processes delivering detailed information right on

time. Therefore, maturity level 3, Defined, is required to effectively organize for completeness. If


processes are not controlled efficiently, the process will either continue without the required data or

will come to a halt until the required data is delivered, and timeliness will be jeopardized.

Completeness cannot be ‘fixed’ with data profiling and cleaning techniques, since missing data will

only be available once processes responsible for this data have delivered their output.

Completeness is measured as a ratio ranging between 0 and 1, indicating the number of data elements

missing as a fraction of the total number of data elements (Lee, Pipino, Funk, & Wang, 2006).

Confidentiality

Confidentiality is about securing data from unauthorized access. To this end, a multitude of security patterns exists, each of which may be invoked and combined into security services according to the security levels required (Schumacher, Fernandez-Buglioni, Hybertson, Buschmann, & Sommerland, 2006). While security services do well in binding specific security patterns to specific situations, they do not help us position Confidentiality in a maturity model. However, Schumacher et al identify three basic security access types currently in use: the access matrix, the role-based access control model (RBAC) and the multilevel model (Schumacher, Fernandez-Buglioni, Hybertson, Buschmann, & Sommerland, 2006). The most basic type, the access matrix, provides access to resources by identifying which active entity in a system may access what resources and how. Role-based access simplifies access right management by grouping active entities into roles, and assigning generic rights for each role (Schumacher, Fernandez-Buglioni, Hybertson, Buschmann, & Sommerland, 2006). This way, once a new participant enters the organization, only the correct roles need to be added to his identification, instead of painstakingly assigning all individual rights. In multilevel security, sensitivity is defined at data level, not at resource level; users receive clearance, and access of users with specific clearance levels to data is based on policies. These access types may well help us position Confidentiality in the maturity model, since applying these styles requires different insight in stakeholders and processes.

The simplest style, the access matrix, already requires a view on the stakeholders involved in the business process and the individual resources required to perform their tasks. This requires processes to be defined by the organization, and therefore confidentiality can only be effectively guaranteed at maturity level 3, Defined. On top of this, more advanced modes of access management require processes to be measured and controlled, and therefore both role-based access and multilevel security can be deployed effectively once an organization has reached data quality maturity level 4, Quantitatively Managed.

Security may be expressed in security services (Schumacher, Fernandez-Buglioni, Hybertson,

Buschmann, & Sommerland, 2006).
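The contrast between the first two access styles can be made concrete in a minimal Python sketch. The actors, resources and rights below are hypothetical, and only the access matrix and RBAC styles are shown; the multilevel model would additionally attach sensitivity labels to the data itself.

```python
# Access matrix: rights are granted per (actor, resource) pair.
access_matrix = {
    ("alice", "grades"): {"read", "update"},
    ("bob", "grades"): {"read"},
}

def matrix_allows(actor: str, resource: str, right: str) -> bool:
    """Look up the explicit cell for this actor and resource."""
    return right in access_matrix.get((actor, resource), set())

# RBAC: rights attach to roles; actors are assigned roles.
role_rights = {
    "lecturer": {("grades", "read"), ("grades", "update")},
    "student": {("grades", "read")},
}
actor_roles = {"alice": {"lecturer"}, "bob": {"student"}}

def rbac_allows(actor: str, resource: str, right: str) -> bool:
    """An actor may act if any of its roles carries the (resource, right) pair."""
    return any((resource, right) in role_rights[role]
               for role in actor_roles.get(actor, set()))

print(matrix_allows("bob", "grades", "update"))  # False
print(rbac_allows("alice", "grades", "update"))  # True
```

A new participant only needs a role assignment under RBAC, whereas the access matrix requires a new cell per resource, which illustrates why the more advanced styles presuppose better-understood processes.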

Consistency

Consistency is the degree in which values and formats of data elements are in line with semantic rules

over the set of data-items. The first observation is that de-duplication of data may improve

consistency, since data stored in multiple locations is likely to get corrupted. There is a relation

between consistency and uniqueness, in that increasing uniqueness will support consistency.

Therefore, consistency will benefit from a holistic view on information processing, in which attention

is paid to the dispersion of data within an organization. Such a holistic view is referred to as an

Enterprise architecture (Lankhorst, 2005) (Boterenbrood, Hoek, & Kurk, 2005) and this would


seemingly fit maturity levels 4 and 5. However, consistency can be achieved using data profiling and

cleaning techniques, therefore consistency can be achieved at maturity level 2, Managed, already.

Consistency is measured as a ratio ranging between 0 and 1, indicating the number of data elements

violating a specific consistency type as a fraction of the total number of data elements (Lee, Pipino,

Funk, & Wang, 2006).

Currency

Currency describes how promptly data are updated and is a function of age (of the data), delivery time and input time: Currency = age + delivery time - input time. Currency is the sum of how old the

data was when it was received plus a term that measures how long data has been in the information

system (Batini & Scannapieco, 1998) (Lee, Pipino, Funk, & Wang, 2006). Currency is targeted by

straight-through processing, in which near real time service oriented technologies replace

cumbersome batch procedures (Pant & Juric, 2008). This is reflected in Master Data Management,

where development of a Service Oriented Architecture is firmly positioned at level 4 (Loshin, 2008).

Currency can be measured in days or milliseconds. However, to be able to reach any agreement on

currency, business processes need to be measured and controlled. Currency therefore, is indeed a data

quality dimension that can only effectively be implemented and discussed at maturity level 4:

quantitatively managed.

The measure for currency is Time.
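The currency formula above can be sketched directly, assuming all arguments are expressed in the same time unit; the function name and the example values are illustrative.

```python
def currency(age: float, delivery_time: float, input_time: float) -> float:
    """Currency = age + (delivery time - input time): how old the data was
    on receipt, plus how long it lingered in the information system
    before delivery."""
    return age + delivery_time - input_time

# Data already 3 days old when received, entered on day 0, delivered on day 2:
print(currency(age=3, delivery_time=2, input_time=0))  # 5
```

Straight-through processing shrinks the (delivery time - input time) term, which is exactly why it lowers Currency.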

Data Integrity

Data integrity is an indication of the degree in which data is fit for use. The data have integrity once they are deemed fit for use. This data quality dimension therefore acts as a container dimension, covering all other aspects of data quality.

Referential Integrity

Referential integrity refers to the degree in which related sets of data are consistent. It is therefore a special instance of consistency. Referential integrity was introduced by Chen (1976) and is, within one database, easily enforced by the implementation of referential constraints. Referential integrity may therefore well be achieved at maturity level 2, Managed, where data rules enforce referential integrity.

Referential Integrity is measured as a ratio ranging between 0 and 1, indicating the number of database tuples violating a specific relation type as a fraction of the total number of database tuples (Lee, Pipino, Funk, & Wang, 2006).

Reliability

Reliability is the measure in which the data is perceived to represent reality. A synonym is trust

(McGilvray, 2008). According to the WDQM model, reliability is first achieved at level 3, Defined.

Reliability is binary: data is trusted, or it is not.


Specifications

Specifications is a measure of the existence, completeness, quality and documentation of data

standards (McGilvray, 2008). As such, specifications are required for processes like source rating,

schema matching and cleaning, business rule matching and new data acquisition. De Graaf mentions

Insight as an important dimension of data quality (see appendix 6.4):

‘Insight in data means that it is clear for an organization what data attributes are required or available,

where and why these data attributes are created, what sources were used, where these attributes are

used, who guards and tests the attribute, when these attributes are outdated and, once obsolete, how they

are dealt with’ (Interview de Graaf, appendix 6.4).

Insight can be seen as a result from valid specifications, and is an important prerequisite for further

data quality improvement. Therefore we may expect Specifications to be present at level 2, Managed.

Specifications is binary: they are either present or absent. Incomplete, faulty or outdated specifications

fall in the absent category, since they do not contribute to reliable Insight.

Timeliness

Timeliness, or Availability is a measure of the degree to which data are current and available for use

(Batini & Scannapieco, 1998) (Loshin, 2001) (McGilvray, 2008).

Timeliness is measured as a ratio, indicating the availability for use of the data. It is expressed as a function of Volatility and Currency: T = V * C. If Currency is larger than the volatility 'wavelength', Timeliness becomes larger than one, meaning the data is becoming less fitting. Volatility is a fixed parameter; therefore, to improve Timeliness, Currency needs to be reduced.

Since Currency is positioned at level 4, Quantitatively managed, an effective implementation of

Timeliness requires an organization to have reached level 4 as well.
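The proposed equation T = V * C can be illustrated as follows. Volatility is taken as a change frequency (changes per day in this example), so its product with Currency (in days) is dimensionless; the function name and values are illustrative.

```python
def timeliness(volatility: float, currency_value: float) -> float:
    """Thesis proposal: Timeliness = Volatility * Currency.

    T < 1: data reaches stakeholders before the next real-world change;
    T > 1: the data is already stale on delivery."""
    return volatility * currency_value

# Data that changes once every 10 days (volatility 0.1 per day):
print(timeliness(0.1, 5))   # 0.5 -> still current on delivery
print(timeliness(0.1, 20))  # 2.0 -> stale on delivery
```

With volatility fixed by the real world, only reducing Currency brings T back under one, which is the argument made above.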

Uniqueness

Uniqueness refers to requirements that entities are captured, represented, and referenced uniquely

(Loshin, 2008). In the definition given by Loshin (2008), uniqueness is bound to data in a database or

file system: “The dimension of uniqueness is characterized by stating that no entity exists more than

once within the data set” (Loshin, 2008). This implementation of uniqueness is available at level 2,

Managed, already, since data profiling tools and database constraints simply enforce this rule.

For uniqueness, no attribute has been published. Therefore, it is proposed to measure uniqueness as a

ratio ranging between 0 and 1, indicating the number of data elements being duplicated as a fraction

of the total number of data elements in a database or file.
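The proposed ratio can be sketched as follows. The source leaves the counting convention open, so this sketch makes the assumption that every element occurring more than once counts as 'duplicated'; the helper name and sample identifiers are illustrative.

```python
from collections import Counter

def duplication_ratio(elements) -> float:
    """Proposed uniqueness attribute: duplicated data elements as a fraction
    of all data elements in a database or file (0 = fully unique)."""
    if not elements:
        return 0.0
    counts = Counter(elements)
    duplicated = sum(c for c in counts.values() if c > 1)
    return duplicated / len(elements)

# Hypothetical student-number file with two duplicated identifiers:
student_ids = ["s100", "s101", "s101", "s102", "s103", "s103"]
print(round(duplication_ratio(student_ids), 3))  # 0.667
```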

Volatility

Volatility characterizes the frequency with which data vary in time (Batini & Scannapieco, 1998). A

synonym is decay (McGilvray, 2008). Volatility is actually not so much a data quality dimension, it is

more a dimension of data itself. Data IS volatile. Therefore, volatility is present at maturity level 1,

Initial, be it that it is recognized by just a few specialists within the organization (Interview de Graaf,

appendix 6.4). At maturity level 2, Managed, volatility is recognized by business management to be a

characteristic of data. At maturity level 3, Defined, systems are built with volatility in mind.


The measure for volatility is Frequency.

Level 5, Optimizing

Surprisingly, in literature on data quality, no specific data quality attributes are defined

operationalizing level 5, optimizing. At this level, all data quality process areas have already been

mastered20. Therefore, an additional theory is required, extending the reach of data quality into the

field of continuous improvement. Six Sigma is such a theory. Six Sigma results in data quality being

constantly improved, by implementing the DMAIC cycle, controlled by Key Goal Indicators

measured by Key Performance Indicators. The whole data life cycle is observed, leading to 3.4 defects

per million opportunities, in accordance with Service Level Agreements (Boer, Andharia, Harteveld,

Ho, Musto, & Prickel, 2006). Thus, metrics at this level are process oriented, not strictly data quality

oriented. To find metrics for this level, Six Sigma leads the way.

According to Six Sigma, it is primarily the spread of errors (unpredictability of a process) that

contributes to costs (Boer, Andharia, Harteveld, Ho, Musto, & Prickel, 2006, p. 36). Therefore, a

measure of quality on this level is a statistical one: the standard deviation (sigma, σ). For this, a mean

and a variance are set as goals. KPIs are defined by Controls like External Critical to Quality (Ext

CTQ), Internal Critical to Quality (Int CTQ), Unit, Defects and Opportunities, and Population.

This level relates to the overall data quality dimension, Data Integrity. Data Integrity will approach six sigma (6σ).
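The 3.4 defects-per-million-opportunities yardstick cited above can be made concrete with a small helper. The function name and the example figures are illustrative assumptions.

```python
def dpmo(defects: int, units: int, opportunities_per_unit: int) -> float:
    """Defects per million opportunities, the Six Sigma yardstick.
    A six-sigma process (with the conventional 1.5-sigma shift)
    allows 3.4 DPMO."""
    return defects / (units * opportunities_per_unit) * 1_000_000

# Hypothetical example: 17 faulty fields found across 50,000 student
# records, each with 100 checked fields (opportunities):
print(round(dpmo(17, 50_000, 100), 2))  # 3.4
```

The process-level framing matters: the metric counts opportunities for error, not records, so the same defect count scores differently as the number of checked fields changes.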

5.2.3 WDQM Goals

The assignment of data quality dimensions and attributes to maturity levels results in the definition of

goals for the WDQM process areas. Now, for each level, data quality process areas, goals and metrics

are available. However, it proved impossible to base every decision on published and well-accepted theories. In many cases, similarities in definitions gave all the information available and

sometimes, only rigor of reasoning could shed light on where to position a dimension. Therefore, to

increase validity, this model is discussed with an external expert (see appendix 6.4). Table 9 presents

the goals for each maturity level in the WDQM model.

20 It is to be noted however, that both TDQM (Lee, Pipino, Funk, & Wang, 2006) and TIQM (English, 2009) support the six sigma DMAIC-style quality improvement cycle


Level | Dimension | Data Quality Practice | Data Quality Attribute
1, Initial | Volatility | Data IS volatile; volatility is not yet recognized | Frequency
2, Managed | Accuracy | Data profiling and cleaning | Ratio between 0 and 1: characters, data elements or database tuples in error as a fraction of the total number of characters, data elements or database tuples
2, Managed | Consistency | Data profiling and cleaning | Ratio between 0 and 1: data elements violating a specific consistency type as a fraction of the total number of data elements
2, Managed | Integrity, Referential | Establish referential database constraints | Ratio between 0 and 1: database tuples violating a specific consistency type as a fraction of the total number of database tuples
2, Managed | Specifications | Specifications engineering | Binary: specifications are either present or absent
2, Managed | Uniqueness | Data profiling and establishment of database constraints | Ratio between 0 and 1: duplicated data elements as a fraction of the total number of data elements in a database or file
2, Managed | Volatility | Volatility is recognized as a characteristic of data | Frequency
3, Defined | Accountability | Event history management | Binary: updates are accounted for, or they are not
3, Defined | Accuracy | Engineering of robust applications, utilizing various types of input checks | See Accuracy, level 2
3, Defined | Completeness | Business processes need to be well organized and scheduled, with all sub-processes delivering detailed information right on time | Ratio between 0 and 1: missing data elements as a fraction of the total number of data elements
3, Defined | Confidentiality | Basic patterns, i.e. access matrix authorization | Security Service Level
3, Defined | Reliability | Level 3, Defined, is to be achieved | Binary: data is trusted, or it is not
3, Defined | Volatility | Build systems with volatility in mind | Frequency
4, Quantitatively managed | Accessibility | Optimize Timeliness (i.e. Currency) | Accessibility = 1 - (delivery time - input time) / (outdated time - input time)
4, Quantitatively managed | Confidentiality | Advanced patterns, i.e. role-based access or multilevel security | Security Service Level
4, Quantitatively managed | Consistency | Create an Enterprise Architecture | Ratio between 0 and 1: data elements violating a specific relation type as a fraction of the total number of data elements
4, Quantitatively managed | Currency | Design for straight-through processing; business processes need to be measured and controlled | Currency (Time) = delivery time - input time + age
4, Quantitatively managed | Timeliness | Volatility is a fixed parameter, therefore to improve timeliness, currency needs to be reduced | Ratio as a function of volatility and currency: Timeliness = V * C
5, Optimizing | Integrity, Data | Instituting DMAIC, SLA, KGI, KPI, data life cycle management | External Critical to Quality (Ext CTQ), Internal Critical to Quality (Int CTQ), Unit, Defects and Opportunities, and Population; Data Integrity reaches six sigma

15-Apr-23 F. Boterenbrood Page 56

Table 09: WDQM Goals expressed in Data Quality Dimensions, Practices and Attributes
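Several entries in Table 09 define data quality attributes as ratios between 0 and 1 (violations or gaps as a fraction of all data elements). As a minimal sketch of how two such ratios could be computed, assuming a hypothetical record set (the field names `student_id` and `grade` are invented for illustration):

```python
# Sketch of the ratio-style metrics from Table 09 (uniqueness and
# completeness), computed over a small, hypothetical record set.
from collections import Counter

records = [
    {"student_id": "s001", "grade": 7.5},
    {"student_id": "s002", "grade": None},   # missing data element
    {"student_id": "s001", "grade": 6.0},    # duplicated key value
]

def uniqueness_ratio(records, key):
    """Fraction of records whose key value occurs more than once."""
    counts = Counter(r[key] for r in records)
    duplicated = sum(1 for r in records if counts[r[key]] > 1)
    return duplicated / len(records)

def completeness_gap(records, field):
    """Fraction of records missing a value for `field` (missing / total)."""
    missing = sum(1 for r in records if r.get(field) is None)
    return missing / len(records)

print(uniqueness_ratio(records, "student_id"))  # 2 of 3 records share a key
print(completeness_gap(records, "grade"))       # 1 of 3 grades is missing
```

In a real setting these counts would come from data profiling tooling rather than in-memory lists; the sketch only makes the ratio definitions concrete.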


5.2.4 Time-related dimensions

Volatility, Currency, Timeliness and Accessibility describe the interaction of time and data. These

dimensions are firmly related, one dimension may actually determine the value of another.

Volatility describes at which frequency data changes in the real world, while Currency describes how

promptly data are updated in an information system. Currency is age + delivery time – input time,

meaning that data, before it is finally delivered, has been lingering both inside and outside the

information system for a certain period. Figure 11 shows two different values for Currency: A and B.

Timeliness describes the relation between Volatility and Currency. T=V*C, meaning that when

Currency is smaller than Volatility (A), Timeliness is smaller than one (T<1) and stakeholders have

access to data before the next change occurs. When delivered, data is current. If Currency is larger

than Volatility (B), Timeliness becomes larger than one (T >1) and data is changed in the real world

before stakeholders have access to this data. When delivered, data is no longer current. Whether this is a

problem is determined by Accessibility. Accessibility deals with the fact that data needs to be

delivered before it becomes insignificant. It is a ratio: Accessibility = 1 - (delivery time - input time) /

(outdated time - input time), in which we may well recognize the relation between Accessibility and

Currency.
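The three relations above can be sketched numerically. The functions below mirror the formulas of this paragraph; all time values are hypothetical and in arbitrary units, and Volatility is treated as an update frequency, so that T < 1 corresponds to the 'data is current' case (value A):

```python
# Sketch of the time-related dimensions of paragraph 5.2.4.
# All numeric values are purely illustrative.

def currency(age, delivery_time, input_time):
    # Currency = age + delivery time - input time
    return age + delivery_time - input_time

def timeliness(volatility, currency_value):
    # Timeliness = V * C; T < 1: data is delivered before the next change
    return volatility * currency_value

def accessibility(delivery_time, input_time, outdated_time):
    # Accessibility = 1 - (delivery time - input time) / (outdated time - input time)
    return 1 - (delivery_time - input_time) / (outdated_time - input_time)

c = currency(age=1, delivery_time=2, input_time=0)                  # C = 3
t = timeliness(volatility=0.1, currency_value=c)                    # T = 0.3 < 1
a = accessibility(delivery_time=2, input_time=0, outdated_time=10)  # A = 0.8
print(c, t, a)
```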

Figure 11 shows Accessibility for Currency value A. Note that Outdated time does not necessarily

have a relationship with Volatility nor Currency.

Figure 11: Related Dimensions

[Figure 11 sketches the three dimensions on a shared time axis: Volatility as the frequency of update and rest periods, Currency for the two values A and B, and Accessibility as delivered versus not delivered between input time, age, delivery time and outdated time.]


5.3 Business rules

This paragraph focuses on the following questions:

In higher education, what positive and negative correlations between maturity and data quality may be

found?

For this research, what is the relevant set of business rules?

How will this set of business rules evolve in time?

What data quality attributes are relevant for these business rules?

In this paragraph a view on business rules will be established first. Secondly, the business domain this

research focuses on will be defined and scoped. Based on design documents, relevant business rules

are identified. Finally, the business rules lead to the selection of relevant data quality dimensions, the

variables of which are populated using a workshop.

5.3.1 Business rules, a definition

In order to find and populate the right data quality attributes, the business rules that need to be met

have to be defined first. In literature, views on what business rules are differ slightly. Business rules are “a written definition of a business’s policies and practices” (Agrawal, Calo, Lee, Lobo, & Verma, 2008) or “… requirements of the business that must be adhered to in order for the business to function properly” (Johnson & Jones, 2008). They encompass “…the controls, processes, mechanisms, and standard operating procedures (SOPs) that need to be followed” (Conway & Conway, 2008).

In the view of D. Agrawal et al, business rules are high level descriptions, guiding the behavior of an

organization. Described at this level, business rules might prove not to be specific enough to obtain corresponding data quality attributes. The more specific notion, that of business rules being

requirements of the business (Johnson & Jones, 2008), encompassing controls, processes, mechanisms

and operating procedures (Conway & Conway, 2008) seems to be more fitting. At this level, they are

referred to as production rules by D. Agrawal et al. In this research, the operational notion of business

rules, as defined by (Johnson & Jones, 2008) (Conway & Conway, 2008) will be used.

To be meaningful, the notation of a business rule is to adhere to certain semantics:

“A business rule is a compact, atomic, well-formed, declarative statement about an aspect of a business

that can be expressed in terms that can be directly related to the business and its collaborator, using

simple, unambiguous language that is accessible to all interested parties: business owner, business

analyst, technical architect, customer, and so on. This simple language may include domain-specific

jargon” (Graham, 2007).

The interesting aspects of this definition of a business rule are that it is atomic (self-contained), well-

formed (written according to specific rules) and declarative (written in a statement style vocabulary).

A well-formed business rule is written in a when – then type of construct (Davis, 2009). Business rules

are about making decisions, and for good decisions, valid information is required, also referred to as

facts (Davis, 2009).
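As an illustration of the atomic, declarative when-then style, a hypothetical grading rule (invented for this sketch, not taken from the WOS) can be written as a small predicate over facts:

```python
# A hypothetical business rule in when-then form (Davis, 2009):
# "When a grade is registered, then it must lie between 1.0 and 10.0
#  and refer to a published course." The rule content is illustrative only.

def grade_rule(fact):
    """Return True when the 'then' part holds for a grade-registration fact."""
    when = fact.get("event") == "grade_registered"
    if not when:
        return True  # the rule does not apply to other events
    then = (1.0 <= fact["grade"] <= 10.0) and fact["course_published"]
    return then

print(grade_rule({"event": "grade_registered", "grade": 7.5, "course_published": True}))
print(grade_rule({"event": "grade_registered", "grade": 11.0, "course_published": True}))
```

Expressed this way, a rule is self-contained and testable: the 'when' part selects the facts the rule applies to, and the 'then' part states the condition that valid data must satisfy.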


5.3.2 Study management

A brief history

Study management is without a doubt the most important business domain within an educational

institution. In a response to the emergence of the European Higher Education Area (EHEA) (Vught &

Huisman, 2009), Windesheim developed a new view on study management (Broers, 2007). This new

view, identified as student centered education, would offer the student more freedom in selecting

education of his own choice (Broers, 2007). This, together with the adoption of the European Credit

Transfer System, spawned a redesign of the curricula of the various Schools at Windesheim (Broers,

2007). At the basis of this redesign of curricula, a new didactical process was designed and standards

were put in place, guiding the change process. These standards were described and accepted in the

Windesheim Onderwijs Standaards21 WOS (Iersel, Loo, Serail, & Smulders, 2009). In 2006, a domain

architecture was designed, guiding the development and implementation of new information

technology (Jansen, 2006). The domain architecture incorporated the field of education, i.e.

management of the education catalogue, the study process itself (minor selection, study process and

assessments), and management of grades (manage study progress), as shown in figure 12 (Jansen,

2006).

[Figure 12 shows the didactical process (request information, apply, orientate, select, contract, study, assess student, discuss progress, graduate, engage alumni), surrounded by control processes (develop education, manage education catalogue, organize education, manage assessments, manage study progress, schedule process, create management information, planning & control cycle) and supporting processes such as learning process counselling, study career counselling, library management, student data administration, funding administration, ELO management, internationalization and foreign exchange support, psychological and dean's office support, school and business office support, and work-based learning.]

Figure 12: Domain architecture student centered education Windesheim (Jansen, 2006)

In 2007, the COTS22 application Educator was selected to support study management, and implementation of this system is currently ongoing.

Looking ahead

In study management, potentially interesting experiences are becoming available. After investigating

the business rules, two case studies will be performed, collecting experiences from the Windesheim

School of Build, Environment and Transport and the Windesheim School of Business and Economics

respectively. Based on the importance of this domain to Windesheim, and the availability of

potentially interesting experiences in this field, this research will focus on the domain of study

definition, education, assessment and grading, supported by the information system Educator.

21 Windesheim Educational Standards

22 Commercial Off The Shelf


5.3.3 Business rule mining

For study management, the WOS (Iersel, Loo, Serail, & Smulders, 2009) identifies a set of business

rules in the form of high level descriptions, guiding the behavior of an organization (Agrawal, Calo,

Lee, Lobo, & Verma, 2008). These rules are presented in appendix 6.12.

However, the abstraction level of most of these rules is too high. In order to be able to define a data

quality threshold, a translation to more specific requirements (Johnson & Jones, 2008) is needed. This

translation is offered by the domain architecture (Jansen, 2006), European rules on higher education (European Commission, 2005) and Educator operating instruction notes. Information on scheduling is provided by (Riet, 2009). Finally, to be useful for further analysis, the business rule notation needs to adhere to the definitions of (Graham, 2007) and (Davis, 2009).

In order to identify the business rules more clearly, the business rules are arranged according to the

business processes identified in figure 12, Domain architecture student centered education

Windesheim (Jansen, 2006). These, more detailed business rules are documented in appendix 6.13.

Now that the relevant business rules in the domain of study management have been identified, the current and required data quality maturity levels of the Educator domain at Windesheim may be defined.

5.4 Current data quality maturity level study management domain

In this paragraph, the following research question is answered:

What are the current organizational maturity and current values of data quality attributes?

Current data quality maturity at Windesheim can be established by finding data quality practices

currently invoked and trying to establish a view on the current values of the maturity dimensions.

However, it should be noted that it is easier to ascertain whether a practice is in place, which is essentially something that either is or is not being done, than it is to establish the value of a data quality dimension, which in most cases requires analysis tools to obtain a measurement.

Therefore, discussing data quality dimensions was mainly used as a check on completeness,

improving research quality, making sure no issue has been overlooked. Nevertheless, the values of

data quality attributes, populating the data quality dimensions, are discussed at the end of this

paragraph.

In interviews with stakeholders current data quality practices and dimension values have been

discussed. Stakeholders include representatives of operations, functional support and process design,

as well as teaching and management staff from within schools. In total, five members of staff have

been interviewed. Each interview collected information on experiences and solutions first, and

discussed current values of maturity dimensions later. These interviews are documented in the

appendices 6.5 through 6.10.


5.4.1 Interview results

It was found that stakeholders do not always agree on the topics presented. Some think rather positively of accessibility, while others point out that some functions of Educator seem unnecessarily complex and over-engineered, hampering accessibility. Also, accountability and confidentiality are regarded as fitting by some, while others reveal breaches in security functions, compromising confidentiality. Interesting is the observation on the role-based access mechanism of Educator, which is deemed far too complex by one and perfectly flexible by another.

Issues interviewees agreed upon are that Educator offers far too few reporting options for monitoring data, and that people who enter faulty data into Educator, creating havoc down the line, should be confronted with these problems and made to solve them themselves. However, some also noted that process

execution and process control require separate functions. An example is the support offices checking

milestones, study plans and course definitions.

It was also found that the School of Business and Economics has experienced the most problems, being among the first to use Educator, while the School of Build, Environment and Transport, entering the Educator arena later, has learned from these experiences and strengthened its processes first, before deciding to implement Educator.

Surprisingly, even though some improvements in data validity input checks were mentioned, no interviewee believed that input checks could prevent data errors altogether. This is supported by the widespread desire to check data using reports (reactive data quality management), rather than to rely on data being checked before it is stored (proactive data quality management). The School of BET has put procedures in place guarding data quality prior to entering data into Educator. Yet data may get corrupted unnoticed as a result of software bugs or human error. In these cases, issues are corrected once students complain.

It is commonly believed that the definition of courses is complex. The current product structure (OE

and VOE combined with Semester plan and Semester variant plan) is mentioned not to be used as

originally intended (educational process and course definitions are not in conformity). In at least one

case, verification and validation was implemented using manual processes outside Educator.

Timeliness and Completeness are noticed to be conflicting dimensions. In at least one case it was

mentioned that, in order to satisfy timeliness, completeness of data could be sacrificed. Important

milestones driving Timeliness are:

1. Validation and finalizing of course descriptions,

2. Validating and finalizing student activity plans,

3. Grading,

4. Valuating the outcome of the propaedeutics phase,

5. And, in the (near) future, printing diplomas and certificates.

5.4.2 Current Maturity

When we examine the data quality process areas of table 5, the Windesheim Data Quality Maturity model, we may observe that for 1. Structure, 2. Process, 3. Technology, 4. Information and 5. Staff, some process areas are in place:


1. Level 2 process areas Project based development, Project teams and Ad Hoc problem solving

are present;

2. Level 2 process areas Data profiling and cleaning, Source rating, Schema matching and

cleaning, Business rule matching and New data acquisition are NOT present;

3. Level 2 process areas Data Analysis and Cleaning tools are NOT present. The File Transfer

data exchange pattern however IS present.

4. Awareness of the relevance of data quality is present and information is not trusted indeed.

5. Level 2 process areas Analytical competent, Knowledge of technology, business rules and

data sources, Data modeling knowledge are present.

Additionally, we may recognize data quality process areas from higher levels being discussed as well:

6. Level 3 process area Technical Solution is being discussed.

7. Level 3 process area Data Responsible is being discussed and implemented.

Given the fact that the collection of level two process areas is only partially met, we may conclude that currently, in the Educator domain, the data quality maturity of Windesheim still remains at level one (Initial).

5.4.3 Current data quality dimension’s attribute values

In this paragraph, Table 9: WDQM Goals expressed in Data Quality Dimensions, Practices and

Attributes is used as a basis for triangulating the current data quality maturity in the study

management domain. This evaluation offers another view on current data quality maturity, validating

the observations in the previous paragraph.

It should be noted that, since none of the process areas Data profiling and cleaning, Source rating,

Schema matching & cleaning and Business rule matching are available, exact values could not be

assigned to data quality dimension’s attributes. However, in some cases the interviews indicated

dimensions to be ‘in control’ while other dimensions needed more attention.

WDQM Level five dimensions

That Data Integrity does not come close to meeting six sigma is no surprise. Level five of the WDQM has indeed not been met.

WDQM Level four dimensions

One dimension interviewees agreed upon was Volatility. It was commonly believed that volatility of

data in the Educator domain was low. Changes in data occurred once every few weeks, months and

even years. And even then, in Educator data does not actually change, in most cases new data is

added, extending the information already available. It was found that study information is altered

annually, or every half year in some cases. Grades are created quarterly, amounting to about 230,000

grades being registered at Windesheim each study period. Study plans are extended every six months.

That said, for the school of Build, Environment and Transport, current volatility of course definitions

was still deemed to be too high; it seems course information is adapted every few months, while this

type of data should be stable for at least three years.

A low-frequency Volatility should be good news for Currency. Currency however was reported to

have been troublesome, caused by instability of Educator, and the manual part of the process


consuming too much time. This last issue was solved by making the stakeholder entering the data

responsible for dealing with the consequences of long waiting times. As a result, Currency had been

improved.

Currency, Volatility and Timeliness are all related dimensions, therefore with Volatility being

comfortably low and Currency improving, Timeliness of data may be expected to be in control.

However, at this moment Timeliness is still mentioned to be problematic. When new education is to

be developed, development has to start well in advance of the targeted study period in order to deliver

study information in time. For many, this aspect of the educational process planning is perceived as

being complex, and activities are commonly initiated too late. Accessibility was little understood during the interviews, yet when Timeliness and Currency are not in control, Accessibility is not in control either.

An exception to this rule may be found at the school of Build, Environment and Transport. Here, the

business process served by Educator is strictly managed manually, having data being entered into the

information system only after elaborate checks. The study process itself is highly standardized,

resulting in more clarity for stakeholders involved. As a result, the school of Build, Environment and

Transport reports Accessibility to be in control.

The last dimensions populating level four are Consistency and Confidentiality. As a result of the

system design, offering a comprehensive role based access mechanism, Confidentiality was perceived

to be fitting. One interviewee noted that Educator offered some back-door entries, suggesting possible

breaches in Confidentiality. Therefore, even while Educator offers level four compliant authorization

mechanisms, Confidentiality is in doubt. On Consistency, it seems the situation has grown from Bad

to Better. Educator is said to generate data codes automatically, replacing more and more manual data

code definitions, thus improving Consistency. At the school of Build, Consistency was improved by

rigid process design. Even so, it has been mentioned that course definitions are not consistently described throughout the system; therefore Consistency too is not met at WDQM level four.

Currently, based on data quality dimension values, data quality has not reached WDQM level four.

WDQM level three dimensions

Accountability is believed to be adequate, albeit that in one situation it was found that an audit trail may be omitted. Since this concerns a manual correction of erroneous datasets, and this practice is recognized to be in decline, Accountability may be regarded as fit for current business rules.

At level three, Accuracy is guarded using application input checks. Currently, this is not the case in

most instances. Again, the school of Build, Environment and Transport is less pessimistic, using strict

data input procedures. Yet, even here it is recognized that there still is room for improvement.

Completeness is regarded to be in control by many. However, the current deadlines in Educator's process implementation (Timeliness) are said to have a negative impact on Completeness.

Confidentiality at this level is implemented by access matrices. Even though Educator offers role-based access, reported back-door threats may render confidentiality inadequate.

Reliability is reported as being absent. Many inexplicable data quality issues were mentioned, reducing reliability. It was mentioned that using Educator only once in a while, together with inadequate training and documentation, may well be at the root of these doubts. Teachers often make mistakes and blame the system. The absence of basic reporting facilities was mentioned as another cause of the lack

of reliability. The school of Build reports relying on its process design.

Volatility. It is not feasible to assess whether Educator was built with volatility in mind.

With the exception of Completeness, WDQM level three dimensions have not been met.

WDQM level two dimensions

At level two, Accuracy, Consistency, Referential Integrity and Uniqueness are instated using data

profiling and cleaning tools and database referential integrity constraints. The absence of these

process areas bodes ill for these dimensions. Accuracy is indeed reported to have been a problem in the past; however, by making the stakeholder entering the data responsible for any problems caused further down the process, Accuracy is said to have improved greatly. And

Uniqueness too has been reported to be in control. Consistency has been reported to be greatly

improved by replacing manual activities by automated procedures. Therefore, new data being entered

into Educator may well be more accurate, consistent and unique. However, historical errors are said to

still create havoc in data exchange processes. And Referential Integrity is found to be a problem,

partly because the Course Catalogue structure is perceived to be complex. Therefore, until current

faults in the database have been corrected, these dimensions are still not met.

On a score of 1 to 10, where 1 equals non-existent and 10 equals excellent, Specifications scores a 1.5,

or 2 at most. It is safe to say that this dimension is not met.

Whether volatility is recognized as a characteristic of data is unknown.

The WDQM level two dimensions have not been met completely, and therefore this level has not been reached.

5.4.4 Conclusion

Since no data quality maturity level was found to have all related dimensions properly instated, the evaluation of data quality dimension attribute values confirms that the current data quality in the study management domain is at WDQM level one (Initial). We may recognize some improvement at

the school of Build, Environment and Transport, due to a rather strict definition of the Educator

business process. It must be noted that improvements here came at the cost of creating an entirely new, manually managed information system and management process shielding Educator from calamity.

Table 10 offers an overview.


Data Quality Dimension | Past | Current | Level | Level met?
Data Integrity | > Six Sigma | > Six Sigma | 5 | No
Accessibility | Problematic | Improved | 4 | No
Confidentiality | In doubt | In doubt | 4 | No
Consistency | Bad | Better | 4 | Yes
Currency | Low | Improved | 4 | No
Timeliness | Problematic | Improved | 4 | No
Accountability | Adequate | Adequate | 3 | Yes
Accuracy | Problematic | Improved | 3 | No
Completeness | In Control | In Control | 3 | Yes
Confidentiality | In doubt | In doubt | 3 | No
Reliability | Absent | Absent | 3 | No
Accuracy | Problematic | Improved | 2 | Yes
Consistency | Bad | Better | 2 | Yes
Referential Integrity | Problematic | Problematic | 2 | No
Specifications | Absent | Absent | 2 | No
Uniqueness | Unknown | Adequate | 2 | Yes
Volatility | Low | Low | |

Table 10: Current data quality dimension values

5.5 Required data quality maturity level study management domain

In this paragraph, the following main research question and sub questions are answered:

What values of data quality attributes will define the required data quality threshold and therefore the

required maturity structures at Windesheim?

a. To support the business rules identified earlier, what values should data quality attributes have?

b. What level of maturity is required to enable those data quality attribute values?

c. What organizational structure, process, technology, information and staff criteria define the

maturity found?

The required data quality maturity level will be identified by analyzing the outcome of the data quality

workshop (see appendix 6.11) and confronting this outcome with the initial research problem (see par

2.4):

At Windesheim, what defines the border between the control and integration stage? What are positive

and negative correlations between structures defining organizational maturity and attributes defining

data quality, enabling Windesheim to become a near zero-latency organization?

5.5.1 Workshop results

To assess the data quality required, a workshop was organized, enabling stakeholders from various

departments to translate their knowledge on the Educator domain and business rules into

requirements23.

In this workshop, specialists were requested to assign data quality dimensions to one of the four

phases of the study management process, based on the requirements posed by the business rules

involved. To create a functional selection process, the data quality dimensions were valued according

23 See appendix 6.11


to their position in the WDQM (see table 9), and the workshop participants were supplied with a limited amount of 'credits'. The underlying WDQM model, however, was not revealed. For most dimensions, participants had the opportunity to choose between a 'must have' implementation, paying the full price tag for this dimension, or a 'should have' implementation, paying less but accepting a less satisfying situation. The results of this workshop are summarized in table 11. Based on table 10, the last column reveals whether a dimension has already been met at the maturity level specified.

For each dimension, the entries record the WDQM level and required status ('Must have' or 'Should have') per sub-process, in the order Manage catalogue, Create study plan, Study, Manage progress; the last entry is the Overall result, and the final column states whether the dimension is already met. Where a dimension was not assigned to every sub-process, only the recorded entries are listed.

Data Quality Dimension | Per sub-process | Overall | Dim. met?
Accountability | 3 | 3 | Yes
Accuracy | 3 Should have; 3 Should have; 2 Should have; 3 Must have | 3 Must have | No
Completeness | 3 Must have; 3 Should have; 3 Should have; 3 Must have | 3 Must have | Yes
Confidentiality | 3 Must have | 3 Must have | No
Consistency | 4 Should have | 4 Should have | Yes
Currency | 4 Should have; 4 Must have; 4 Must have | 4 Must have | No
Referential Integrity | 2 Should have | 2 Should have | No
Reliability | 3; 3; 3 | 3 | No
Specifications | 2 Should have | 2 Should have | No
Timeliness | 4 Must have; 4 Should have | 4 Must have | No

Table 11: data quality dimension assessment workshop results

Table 11 reveals that the study management process is divided into four sub processes:

1. Manage catalogue, resulting in courses being published;
2. Create study plan, resulting in an updated personal activity plan;
3. Study, resulting in grades being assigned;
4. Manage progress, resulting in students receiving certificates, or study rejection letters.

The final column represents the overall score. If, in any sub-process, a dimension is labeled ‘Must have’, this dimension becomes mandatory for the whole domain. The reason for this is that the

process is one seamless cycle and in each step in this cycle all organizational units play an equal role.

It is simply not possible to have one step assigned to a single unit that could be more mature than

others. For some dimensions, participants could choose between implementations on different levels

in the WDQM. This is the case for Accuracy, which can be implemented both at level 2 and level 3. In

that case, the highest level required prevails.

5.5.2 Discussion

It is interesting to see that, even though these are ‘expensive’ dimensions, the workshop reveals a massive interest in WDQM level 3 data quality dimensions. All level 3 dimensions (Accountability,

Accuracy, Completeness, Confidentiality and Reliability) are labeled to be ‘must have’ requirements.

In the current situation, timing poses many problems. It is therefore no surprise that Currency and

Timeliness are mentioned as ‘Must have’ dimensions. The high demand for data being timely and

current implies that data should be delivered before it gets updated (Timeliness < 1, see paragraph

5.2.4). Currency describes how promptly data are updated and is a function of age (of the data),

delivery time and input time: Currency = age + delivery time – input time. Timeliness is measured as

a ratio, indicating the availability for use of the data. It is expressed as a function of Volatility and

Currency: T = V*C. Since Volatility is constant, Timeliness is improved by reducing Currency. And


Currency is reduced by minimizing the age of data, or the gap between input time and delivery time.

Therefore, the result of the workshop can be interpreted as a demand to reduce waiting times (age and

(delivery time – input time)). In the interviews, multiple references are made to data being entered into the system well beyond all deadlines. This is not so much a technical issue: the interviews reveal that it is related to the age of data before it is entered into the system. Therefore, actions here should aim at reducing waiting times in the manual part of the study management process.
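A hypothetical worked example illustrates this interpretation: with Volatility fixed, reducing the age of data before entry lowers Currency and brings Timeliness back under one. All numbers below are invented.

```python
def currency(age, delivery_time, input_time):
    # Currency = age + delivery time - input time (paragraph 5.2.4)
    return age + delivery_time - input_time

volatility = 0.05  # hypothetical: one real-world change per 20 time units

# Data lingers 30 units before entry: T = 0.05 * (30 + 12 - 2) = 2.0 (> 1, stale)
t_before = volatility * currency(age=30, delivery_time=12, input_time=2)

# Manual waiting time cut to 5 units: T = 0.05 * (5 + 12 - 2) = 0.75 (< 1, current)
t_after = volatility * currency(age=5, delivery_time=12, input_time=2)

print(t_before, t_after)
```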

Having Consistency defined at level four as a ‘Should have’ is a bit surprising. It seems that the Dutch

awareness of costs has played a role here, buying a high-level dimension at a fraction of the price.

Table 10, paragraph 5.4.4, identified Consistency to be available already.

However, the workshop leaves no room for misinterpretation. If Educator is to succeed in fully

supporting the study management process, the organization needs to reach WDQM level three

(defined), and for some time related aspects, WDQM level four (quantitatively managed).

5.5.3 Initial Research Problem

This research was started as a result of Windesheim experiencing surprising problems while, in its

quest to become a near zero latency organization, implementing near real-time integration solutions. Is an organization operating at level three of the WDQM sufficiently equipped to address this initial problem? Or does a simpler, less far-reaching solution suffice? Or is a WDQM level three implementation still not mature enough, and does real-time integration call for an even more robust solution?

In paragraph 2.2.1, data quality errors were identified:

- Enrolment of students results in duplicate accounts;
- Painful mistakes like sending notifications to deceased students;
- Due to database corruption, management reports are rendered useless;
- Sometimes fields contain text strings stating that ‘Debbie has to solve this problem’;
- Names of students are completely missing, student addresses are incorrect, information is entered in invalid fields;
- Location (room) numbers are missing or contain special, unexpected codes;
- Data is outdated or is valid in / refers to different time periods between information systems;
- It was found that in at least one instance, lack of data quality caused a class to be scheduled in a staircase.
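Several of the faults listed above (duplicate accounts, missing names and addresses) lend themselves to mechanical detection. A minimal sketch of such checks, with a hypothetical record layout and field names:

```python
# Minimal sketch of automated checks for two of the error types listed
# above. The record layout and field names are hypothetical illustrations.

def find_duplicate_accounts(students):
    """Flag records sharing the same (name, birth_date) pair (Uniqueness)."""
    seen, duplicates = {}, []
    for s in students:
        key = (s["name"].strip().lower(), s["birth_date"])
        if key in seen:
            duplicates.append((seen[key], s["id"]))
        else:
            seen[key] = s["id"]
    return duplicates

def find_incomplete(students):
    """Flag records with a missing name or address (Completeness)."""
    return [s["id"] for s in students if not s.get("name") or not s.get("address")]

students = [
    {"id": 1, "name": "A. Jansen", "birth_date": "1990-01-01", "address": "Zwolle"},
    {"id": 2, "name": "a. jansen", "birth_date": "1990-01-01", "address": "Zwolle"},
    {"id": 3, "name": "", "birth_date": "1991-05-05", "address": "Kampen"},
]
print(find_duplicate_accounts(students))  # [(1, 2)]
print(find_incomplete(students))          # [3]
```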

These errors are mainly faults in Accuracy and Completeness. To solve these issues, Accuracy and Completeness have to be addressed. Completeness is addressed at data quality maturity level three (defined). Accuracy is already addressed at level two (managed), albeit in a rather reactive manner: errors are repaired once they appear in the database. This is too late, since by then these errors have propagated through the automated interfaces, causing havoc in other applications. This means that a data quality maturity level three (defined) implementation of Accuracy is required. As table 10, paragraph 5.4.4, identifies, this is currently not the case.

The Master Data Management maturity model, too, positions the definition of services for data integration at level three, yet requires organizations to reach level four before implementing a Service Oriented Architecture (Loshin, 2008).


Addressing the initial research problem does indeed call for Windesheim to organize at WDQM level three, while further growth to level four is required if a fully-fledged Service Oriented Architecture is to be developed.

5.5.4 A data quality maturity level three (Defined) organization

An organization acting at data quality maturity level three (Defined):

- Has a business-wide process view instead of a localized departmental view;
- Conducts effective programme management;
- Develops systems based on formal requirements;
- Has an integrated view on its products and the (quality of its) product development cycle;
- Has an integrated view on its corporate data;
- Has learned to identify and address root causes when problems emerge;
- Has data quality pro-actively guarded by technical barriers (input checks);
- Implements new functionality after rigorous testing and acceptance in separate environments;
- Connects systems using available application interfaces;
- Is provided with data which is fit for use;
- Monitors data quality using a canonical data model;
- Is serviced by staff having deep domain knowledge and being responsible for data quality.

5.5.5 Level 4 (quantitatively managed) requirements

To satisfy both the research goal and the study management process, Windesheim does not have to fully implement data quality maturity level four (quantitatively managed). However, Currency and Timeliness were mentioned as required data quality dimensions. In paragraph 5.5.2, age is discussed as influencing Timeliness and Currency. Addressing age causes an organization to enter the realm of data quality maturity level four (quantitatively managed). There may be more compelling reasons to implement data quality at this level.

When interviewed, de Graaf (see appendix 6.4) made a strong case for organizations to try and reach

data quality maturity level four:

“Especially beyond level three, data quality becomes a matter of special interest to organizations,

opening up a whole new realm of possibilities. What we can see beyond level three in practice today are

cloud computing for data quality initiatives, new business generated and successful one-on-one business

models based on reliable data” (Interview de Graaf, appendix 6.4)

Data simply becomes more valuable once an organization manages to reach data quality maturity

level four (quantitatively managed).

5.6 Growing from current to required maturity

Now that data quality, organizational maturity and the relation between the two are understood, an instrument based on this relation has been developed, and the current and required maturity levels have been identified, this paragraph can address the final research question:

What is the gap between current maturity structures & data quality threshold and required

maturity structures & data quality threshold in the light of enabling Windesheim to become a

near zero latency organization?


a. What is the gap between the current and required organizational structure, process,

technology, information and staff criteria?

b. What conclusions and recommendations may be derived from this gap?

5.6.1 Gap analysis

Paragraph 5.4, Current data quality maturity level study management domain, determined the current data quality maturity level to be one (Initial), while paragraph 5.5, Required data quality maturity level study management domain, identified the required data quality maturity level to be three (Defined). This level of data quality maturity is required both to operate the Educator business domain and to deploy near real-time system integrating technologies successfully.

In the next paragraphs, the process areas in the field of structure, process, technology, information and

staff identified to be missing are discussed. This discussion is based on table 5, paragraph 5.1.4.

Structure

At level two, it is expected that an organization has implemented a structured, project-based development approach. In this research, Windesheim’s project management capabilities have not been evaluated. It is therefore unknown to what extent Windesheim has mastered project-based development. Properly assessing an organization’s project management capabilities requires a separate research effort, for which resources were unavailable in this project.

At data quality maturity level three, projects are to be managed in relation to each other, as a programme: a portfolio of projects serving a common goal. In this research, Windesheim’s programme management capabilities have not been evaluated. It is therefore unknown to what extent Windesheim has mastered programme-based change. Again, properly assessing an organization’s programme management capabilities requires a separate research effort, for which resources were unavailable in this project.

Process

At data quality maturity level two (managed), data quality is repaired reactively by implementing data profiling and cleaning activities, source rating, schema matching and cleaning, business rule matching and new data acquisition, resulting in improved accuracy, consistency, referential integrity and up-to-date specifications. Referential Integrity and Specifications in particular are dimensions mentioned to be problematic and absent, respectively; therefore these process areas require attention.

Requirements development, Product Integration, Verification and Validation are all level three related process areas aimed at constructing a product from different components and assuring that the product complies with requirements (Ahern, Clouse, & Turner, 2008). In at least one case, verification and validation was implemented using manual processes outside Educator.

Currently, it is found that the Educator process is perceived to be complex. Some structures are not

used as originally intended (the current educational process and original requirements are not in

conformity). This may point towards a change in business rules. The required data quality threshold is related to business rule demands; therefore, when business rules change, data quality dimensions may well change with them. Even though workshop attendees formally agreed upon the business rules presented, during interviews the digital course catalogue was identified as an area where, in practice, these business rules may well have been redesigned.


Timeliness and Completeness are found to be conflicting. Timeliness is expressed to be required,

however Educator requires the description for a course to be complete before it is accepted. Course

information is not always complete that early in the process. In many cases, information on

assessments is finalized later, seemingly conflicting with timeliness. However, scheduling and

selection of courses might not have a strong relation with the final modes of assessing students’ capabilities, perhaps leaving room for entering assessment information later.

Technology

To monitor data quality, data quality analysis tools are required. These tools are not available. During

interviews, the absence of insight in data quality is mentioned as one of the main obstacles towards

improvement.

Whether the current ROTAP environments are sufficient has not been a subject of this research. Properly assessing an organization’s research, development, test, acceptance and production (ROTAP) environments and strategy requires a separate research effort, for which resources were unavailable in this project.

Information

Currently, information is known not to be trusted. This is unlikely to change until WDQM level three (defined) is reached. The main Key Performance Indicator of a data quality improvement programme may be that, in the end, Educator data has become fit for use and is therefore trusted.

Staff

At data quality maturity level two, staff is analytically competent, has knowledge of technology, business rules and data sources, and has data modeling knowledge. There have been no indications that these competences were missing; it is therefore assumed that these competences are currently present at Windesheim.

At level three, staff is responsible for data quality, extending the view from a single process step to

the end-to-end business process. During interviews, it was mentioned that the process in the Educator

domain was perceived as complex and difficult to oversee. Activities, like entering course definitions, have to be planned well ahead of execution, enabling both scheduling and the students’ choice of minors. Teachers were unaware of deadlines or unable to finalize educational content this early in the process.

This discussion may signal a problem. When crossing the barrier between having a local view on

matters and having a more holistic business process wide view, the technological discontinuity

presents itself (paragraph 2.3.4) (Zee, 2001). This discontinuity is experienced as a setback. New structures replace trusted old ones and are, for the time being, (perceived as) not as good as the ones being replaced. A discussion on losing the perceived freedom of changing educational definitions up to the last moment may well be one of these new-versus-old structure discussions. The fact that these new structures are required for coping with future challenges may not be recognized by all. The effects of the technological discontinuity may, strictly speaking, not be part of the WDQM model, yet when not taken into account, they may well prevent an organization from reaching data quality maturity level three (defined).


5.6.2 Migration

In the field of organizational maturity, as in real life, no organization can skip levels. This means

that in the Educator domain, Windesheim has to master level two and three data quality maturity

process areas successively, as defined by table 5, paragraph 5.1.4.

The recommendations presented here aim to enable Windesheim to bridge the gap between the current and the required situation, building on best practices identified, strengthening and accelerating the change process.

The transition is defined as a two-staged process. First, data quality maturity level two (Managed) is

to be implemented. Once this level is established, the second step can be initiated, moving

Windesheim from data quality maturity level two (Managed) to data quality maturity level three

(Defined).

Data Quality Level two (Managed)

Level two is characterized by project-based structures, reactive data cleansing processes and technologies, and staff having local knowledge of business rules, data sources and data modeling.

Structure and Process

It is recommended to evaluate and (re)confirm Windesheim’s project management capabilities and

strategies.

It is recommended to initiate an Educator data quality improvement project. In this project:

- Extending appendix 6.13, business rules are re-established and described in great detail, identifying areas in which business rules have changed over time. An area of concern is the digital course catalogue;
- Using these detailed business rule descriptions, the Educator database is profiled;
- Data sources are rated and new data may be acquired;
- The database is cleaned, i.e. data not matching established business rules is repaired;
- Up-to-date data specifications are written.

These actions will establish Referential Integrity and Specifications, and will improve the data quality maturity level two dimensions Accuracy, Consistency and Uniqueness.
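The profiling step above can be sketched as rule-based checking: each business rule becomes a predicate, and records violating a rule are reported for repair. The rule names, record layout and thresholds below are hypothetical illustrations, not actual Educator business rules:

```python
# Sketch of rule-based data profiling. Business rules are expressed as
# predicates; each record is checked against all of them and violations
# are collected per rule, forming the input for the cleaning step.
# Rule names and record layout are hypothetical illustrations.

RULES = {
    "room_number_is_numeric": lambda r: r["room"].isdigit(),
    "ects_in_valid_range":    lambda r: 0 < r["ects"] <= 30,
}

def profile(records):
    """Return, per rule, the ids of records violating it."""
    violations = {name: [] for name in RULES}
    for r in records:
        for name, rule in RULES.items():
            if not rule(r):
                violations[name].append(r["id"])
    return violations

courses = [
    {"id": "C1", "room": "A123", "ects": 5},   # room contains letters
    {"id": "C2", "room": "204",  "ects": 60},  # ects out of range
]
print(profile(courses))
```

Once business rules are captured this way, the same predicates can drive both the one-off cleaning project and the recurring analysis reports.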


Technology

It is recommended to have data quality analysis reports developed, based on the business rules established in the previous step. This will improve Referential Integrity, Specifications, Accuracy, Consistency and Uniqueness.

Information

In this level, information is likely to remain Not Trusted.

WDQM level two (Managed) data quality dimensions Accuracy, Consistency, Referential Integrity, Specifications and Uniqueness should all be satisfied now, albeit possibly still in a reactive manner, using reports and data cleaning tools for repairs.

Data Quality Level three (Defined)

Once level two has been established, a formal transition to level three can be initiated. In this transition, the focus shifts from a local view to a more holistic, process-wide view.

Structure

It is recommended to evaluate and (re)confirm Windesheim’s programme management capabilities

and strategies.

It is recommended to define a lasting Educator enrollment programme, aimed at supporting Educator at Windesheim. It may be noted that programme management is monitored by Key Business Indicators (Ahern, Clouse, & Turner, 2008). This programme management is to be guided by valid Windesheim Key Business Indicators, enabling Windesheim management to stay in control of the ongoing programme. Key Business Indicators may be found by observing the study management baselines, i.e. Catalogue Management, Study Planning, Study & Grading and Progress Management. Key Business Indicators may express the number of data errors in the catalogue, study plans, grade assignments, rejection letters and certificates.
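As a sketch of how such Key Business Indicators might be monitored, the error counts per study management baseline can be compared against agreed thresholds. The baseline names follow the text; the thresholds and counts are hypothetical illustrations:

```python
# Sketch of Key Business Indicator monitoring for the programme: the
# number of data errors observed per study-management baseline is
# compared against an agreed threshold, signalling whether management
# is in control. Thresholds and counts are hypothetical illustrations.

THRESHOLDS = {
    "Catalogue Management": 10,
    "Study Planning": 10,
    "Study & Grading": 5,
    "Progress Management": 5,
}

def kbi_status(error_counts):
    """Per baseline: 'in control' while errors stay within the threshold."""
    return {
        baseline: ("in control" if error_counts.get(baseline, 0) <= limit
                   else "action needed")
        for baseline, limit in THRESHOLDS.items()
    }

print(kbi_status({"Catalogue Management": 3, "Study & Grading": 12}))
```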

Furthermore, the programme includes all recommendations mentioned below.

Process

It is recommended that, where experiences gave rise to new insights and changed business rules, new

Educator requirements are developed and that the functionality of Educator is changed accordingly.

An area of concern is the digital course catalogue and the way data is shielded from unauthorized

access. During interviews, the ability to access data via back-door entries was mentioned. This action

will establish maturity level three Confidentiality.

It is recommended that, when data quality related issues arise, a formal root cause analysis is initiated.

This root cause analysis will identify the source of the data quality issues at hand.

It is recommended to implement formal requirements development. Based on the root cause

identified, new requirements will be developed and prioritized. These requirements present a

foundation for system adaptations and further development.


It is recommended to improve Educator’s support of verification and validation. Examples are input

checks and referential integrity checks, as well as improved process support, reminding lecturers of

upcoming timeframes and baselines. Verification and validation are to be based upon the business rules described earlier. The actions described above will establish maturity level three Accuracy and

Reliability.
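The pro-active checks recommended here differ from level-two repair in that faulty data is rejected before it is stored, so it can never propagate over the automated interfaces. A minimal sketch, with hypothetical field names and rules:

```python
# Sketch of pro-active verification: input is validated against business
# rules *before* it is stored, instead of being repaired afterwards.
# Field names, the grade scale and the rules are hypothetical illustrations.

class ValidationError(Exception):
    pass

def validate_grade_entry(entry, known_students):
    """Referential-integrity and input checks for a grade record."""
    if entry["student_id"] not in known_students:
        raise ValidationError("unknown student (referential integrity)")
    if not (1.0 <= entry["grade"] <= 10.0):
        raise ValidationError("grade outside 1.0-10.0 (input check)")
    return entry

def store(entry, known_students, database):
    """Only validated entries ever reach the database."""
    database.append(validate_grade_entry(entry, known_students))

db, students = [], {"S1", "S2"}
store({"student_id": "S1", "grade": 7.5}, students, db)        # accepted
try:
    store({"student_id": "S9", "grade": 7.5}, students, db)    # rejected
except ValidationError as e:
    print("rejected:", e)
print(len(db))  # 1
```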

Technology

It is recommended to have Educator accept changes in course information up to the moment grades are actually assigned, thus reducing the conflict between Completeness and Timeliness of data. Data objects that must not change after a course has been selected by students are to be made mandatory during course definition; other data objects may well be made optional.
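The stage-dependent distinction between mandatory and optional data objects can be sketched as follows; the stage names and field sets are hypothetical illustrations:

```python
# Sketch of stage-dependent mandatory fields: which course attributes
# must be present depends on how far the course has progressed through
# the study management process. Before students select the course, only
# scheduling-relevant fields are required; assessment details may follow
# later, up to the moment grades are assigned. Stage and field names are
# hypothetical illustrations.

MANDATORY_BY_STAGE = {
    "definition": {"code", "title", "ects"},
    "selected":   {"code", "title", "ects", "schedule"},
    "grading":    {"code", "title", "ects", "schedule", "assessment"},
}

def missing_fields(course: dict, stage: str) -> set:
    """Mandatory fields for the stage that are absent or empty."""
    present = {k for k, v in course.items() if v}
    return MANDATORY_BY_STAGE[stage] - present

course = {"code": "INF101", "title": "Data Quality", "ects": 5,
          "schedule": "Q1", "assessment": None}
print(missing_fields(course, "selected"))  # set()
print(missing_fields(course, "grading"))   # {'assessment'}
```

This way Timeliness is served (a course is accepted early) without giving up Completeness where it matters (the record must be complete before grading starts).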

It is recommended to have the current ROTAP environments and practices evaluated and formalized.

It is recommended to continue data integration using near real-time system interfaces. Starting a full Service Oriented Architecture, however, would require further growth in data quality maturity. These actions will support the implementation of the maturity level four dimensions Currency and Timeliness.

Information

At level three, information should have become fit for use. The WDQM level three (defined) data

quality dimensions Accountability, Accuracy, Completeness, Confidentiality and Reliability should

be satisfied. Presence of Reliability in particular signals the success of the data quality programme.

It is recommended to develop a canonical data model, supporting a corporate-wide view on data being exchanged between systems. This action will improve Uniqueness, Referential Integrity and Specifications.
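A canonical data model can be sketched as one shared record format to which each system maps its local representation, so that cross-system consistency is checked against a single definition. The local formats and field names below are hypothetical illustrations:

```python
# Sketch of exchanging data via a canonical model: each system maps its
# local record layout onto one shared, corporate-wide representation, so
# monitoring and integration see a single definition of the data.
# The local layouts and field names are hypothetical illustrations.

def from_enrolment_system(rec):
    """Map the (hypothetical) enrolment-system layout to canonical form."""
    return {"student_id": rec["studentnr"], "name": rec["naam"]}

def from_scheduling_system(rec):
    """Map the (hypothetical) scheduling-system layout to canonical form."""
    return {"student_id": rec["id"], "name": rec["full_name"]}

def consistent(canonical_a, canonical_b):
    """Cross-system consistency check, performed on the canonical form."""
    return canonical_a == canonical_b

a = from_enrolment_system({"studentnr": "S1", "naam": "A. Jansen"})
b = from_scheduling_system({"id": "S1", "full_name": "A. Jansen"})
print(consistent(a, b))  # True
```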

Staff

When implementing data quality maturity level three, the focus shifts from a localized, departmental view to an integrated, Windesheim-wide view, and care has to be taken to overcome the technological discontinuity (Zee, 2001) (paragraph 2.3.4, Growing Pains).

Interviews revealed that good results were gained from making personnel responsible for data quality throughout the data life cycle. It is therefore recommended to:

- For the variable, student-centered study period, make lecturers entering course information responsible for assigning semester (variant) plans to students’ personal activity planning, as opposed to having these activities assigned to support offices instead;
- Make lecturers entering course information responsible for solving conflicts when course definition and course execution differ.

These practices help in making the process transparent to both the lecturer and the student.

They may also help in overcoming the technological discontinuity.

Staff may need to be trained in the field of root cause analysis and requirements analysis.

One issue specifically targeted is the technological discontinuity. The programme (see: structure)

should give special attention to communicating the reasons of the change, have stakeholders

participate and recognize the difficulties associated with it. It should be recognized that education is a


very diverse environment, with personnel ranging from highly technically competent to deeply social and artistically engaged. A programme that merely rolls out a new system is a recipe for disaster. A short exploration of this specific element will reveal the multi-colored nature of Windesheim:

In ‘Leren veranderen’24, Caluwé and Vermaak (2006) group people into different modes of thinking, labeled by colors (Yellow for power-based, Blue for process-based, Red for relation-based, Green for learning-based and White for freedom-based thinking). For each color, people appreciate and respond to change differently, need different guidance and require a specific approach. Now, what colors define Windesheim? Let us allow ourselves a little freedom of thinking in exploring the situation:

- It is interesting to see that education itself very much used to be a Blue process. A student defined a goal (‘I want to become a plumber’), chose an institution and suddenly found himself part of a fixed process, in which he would transform from novice to educated plumber in a given period. Today, this process is (at least partially) transformed from Blue into Green, in which students are encouraged to evaluate and re-set their own goals, make their own choices along the way, and are offered tailored learning situations and intensified personal coaching. We may recognize this practice by the existence of the activity plan in which semester plans are selected.

- Schools themselves may be described as White, in which teachers are professionals, only needing space (and removal of obstacles) to prosper and follow their calling to create and transfer knowledge.

- Then there are the supporting departments, like Finance, Personnel, Facility Management and IT. These departments are very Yellow by nature. Rules are defined by which personnel is hired, assessed and rewarded, financial rules for accountancy are strictly observed, and instrumentation for education is highly standardized.

- Instead of Yellow, IT is more a Blue type of organization, trying to organize work in terms of fixed and predictable processes.

- Management is a main stakeholder too. Management of supporting departments tends to adopt a Yellow point of view, keeping control over their business, while management of education is more Green in nature, enabling teachers to learn, reflect and grow. Management, therefore, is a rather multi-colored stakeholder.

This short exploration of the field of change reveals that at Windesheim, we may discover at least Blue, Green, White and Yellow oriented stakeholders: a clear indication that implementing change using only one (perhaps Blue-oriented), technology-driven approach will NOT succeed in overcoming the technological discontinuity. It is therefore recommended to implement a broad programme to support change, including as many stakeholders as possible.

Data Quality Level four (Quantitatively Managed)

In order to improve Timeliness and Currency even further, it is recommended to communicate the educational process baselines and associated deadlines as clearly as possible, for example by means of posters and leaflets, distribution of up-to-date Educator manuals, and training on the job.

Finally, it is recommended, once level three maturity is reached, not to stop, but to start a discussion on extending data quality initiatives into WDQM level four.

24 Learning to adapt


5.7 Concluding

5.7.1 Conclusion

In this research, it is found that data quality is related to organizational maturity. This relation is

defined in the Windesheim Data Quality Maturity (WDQM) model. The model defines five separate

maturity levels, each defined by the presence of specific process areas (best practices) and resulting in

data having specific quality dimensions (characteristics). Levels range from Initial, through Managed, Defined and Quantitatively Managed, to, finally, Optimized.

Interviews with specialists within Windesheim have revealed that currently, in the field of data quality

maturity, Windesheim is still at maturity level one (initial). This is indicated by the incidents found to

be occurring in the Educator domain, the way these incidents are being dealt with and the tools

available to monitor and correct data quality faults. This indication is then confirmed by observing the

process areas and data quality dimensions implemented.

In order to execute the educational process as supported by Educator in a reliable and efficient manner, and to be able to implement near real-time message-based application interfaces, it is discovered that mastering data quality maturity level three (defined) is required, while getting data through the process in time requires some level four process areas to be implemented. It is to be noted that the business value of data increases dramatically once an organization succeeds in implementing data quality maturity level four (quantitatively managed).

5.7.2 Recommendations

A two-phased approach is recommended, implementing data quality maturity level two first, and level

three later.

Data quality maturity level two (managed) is reached by solving the immediate data quality problems.

This is done by starting an Educator data quality improvement project. This project will describe the

study management business rules in great detail, investigate Educator database quality using these

business rules and repair errors found. At the end of the project, data managed by Educator is

documented (creating insight) and reports are present, enabling operations to compare actual data

quality with the required data quality as documented.

Data quality maturity level three (defined) is reached by creating a holistic view. This means that change is managed as a coherent programme, rather than in the form of multiple isolated projects, and that when problems arise, a formal root cause analysis is performed first, resulting in requirements being designed and results being tested against these requirements. With respect to the study management process, the process as a whole is observed. This view may trigger new requirements and changes in the current implementation of Educator, examples of which may be the assignment of process responsibilities to teachers, organizing for more flexibility in the process, adding process schedule support and simplifying structures.

At data quality maturity level two, data quality is guarded in a rather reactive manner, using reports,

analysis tools and repair tools to correct issues. At data quality maturity level three, data quality is

guarded in a pro-active manner, using input checks and integrity checks to guard quality before it is

stored. Extending the documentation created at level two, at data quality level three the data communicated between systems is described using a canonical data model.


Once level three is mastered, reaching for a full implementation of data quality maturity level four

(quantitatively managed) is not required. However, timing issues demand some level four process

areas to be implemented and, again, it is noted that once an organization arrives at level four, it starts to yield great benefits from its data. It is therefore additionally recommended to communicate the

educational process baselines and associated deadlines as clearly as possible, and once level three

maturity is reached, not to stop, but to extend data quality initiatives into data quality maturity level

four.

It is most important that, when crossing the border between having a local departmental view and having a Windesheim-wide view, it is recognized that the organization may experience a crisis, known as the technological discontinuity (see paragraph 2.3.4, Growing Pains). Care has to be taken to overcome this discontinuity. Activities here are people-oriented: give special attention to communicating the reasons for change, recognize the difficulties associated with it, involve stakeholders and respect the different concerns each group of stakeholders has, and communicate process milestones using bi-annual calendars in poster format.

Issues not addressed in this research are the current status of project management, programme management and ROTAP environments at Windesheim. Investigating these issues properly requires separate research projects, for which time and resources were not available during this graduation project.

5.7.3 Stakeholder Value

At the start of the project, three groups of stakeholders were identified (paragraph 2.5).

Committed stakeholders are the CIO, Information Manager and Science. The scientific value of this

project is discussed in the next paragraph. For the CIO, this research has provided an instrument

guiding project portfolio management, linking change required with business objectives. The trigger

and initial problem for this research were the difficulties experienced whilst trying to become a near

zero latency organization. During the research it became clear that the actual business benefit reaches

much further than that. The instrument enables the CIO to fine-tune investments in improving the

study management process. For the Information Manager, the instrument acts as a guide in reducing errors in data processing, increasing efficiency, managing responsibilities, improving business intelligence and saving costs by reducing rework.

Involved stakeholders are Management, Operations, Functional Support, System Integration and the Security Manager. At the end of the process, management will be provided with reliable data, supporting business process management. Operations and Functional Support will spend less time on rework and error correction. System Integration will be able to produce reliable and stable near real-time integration services. The Security Manager will notice an increased integrity and availability of data.

Affected stakeholders are the Board, Staff and Students. The Board will notice improved image,

student satisfaction and process efficiency. Students will notice prompt responses when assessments

are graded, reduced complexity and a reduction in errors. Staff will experience simplified

administrative tasks, a clear-cut study management process and direct communication with students.


5.7.4 Achieved Reliability and Validity

Many theories on maturity and quality were discussed and balanced. The results were checked by a

survey amongst specialists. Those specialists were chosen based on their experience with data quality

and organizational maturity. Population of quality attribute values was performed by a workshop

involving Windesheim specialists, enabling them to reflect on the process and results. Windesheim

specialists were chosen based on their experience and involvement in data management in study

management. Care was taken to involve participants from a department known to have had trouble

ensuring data quality and a department known for successfully solving data quality issues. In many

cases, triangulation was used to cross-check results. This was done by comparing multiple aspects of

an outcome, or by evaluating an observation using multiple theories. Examples of this are the

observation of both process areas and maturity dimensions to ascertain the current maturity,

confronting the WDQM model with multiple theories, and explicitly validating business rules found

during interviews and the workshop. Building on multiple, accepted sources, reflection on results

acquired and open discussion ensured internal validity, while applying the grounded theory approach

ensured external validity.

During interviews, experts involved in this research agreed on the business rules and the model as presented, with only a few modifications to be made. This consensus may, in fact, make one a bit wary: where is the discussion? The WDQM model has been crafted and used only once, and one may wonder whether this is enough proof of its qualities. Whether it is balanced enough and incorporates the right process areas and goals should be ascertained through multiple assignments and open discussion, which calls for an entirely new research project.

5.7.5 Scientific Value and Innovativeness

With this research, based on recent theories on data quality, an up-to-date instrument has been created

and used, pinpointing required data quality dimensions to satisfy business process’ needs, and

translating these data quality requirements into organizational measures to be taken. The instrument is

based upon well established general theories on production quality improvement and specific theories

on data quality and combines these theories into one framework.

5.7.6 Generalisation

The instrument developed has been successfully used at Windesheim, yet it is not bound to the study management domain, or even to one type of organization. Even though the business rules require the study process to be implemented at WDQM level three, solving the initial business problem requires a WDQM level three implementation too, and this alone involves ALL Windesheim processes. The instrument is 'solution independent', since it is based on open and well-established models and theories, and thus can be used in other domains within Windesheim, other higher education institutions, organizations in other branches and even other nations. The theories behind it apply to all organizations – as long as these organizations rely on data being processed.

5.7.7 Research Questions Answered

Main Q1: Observing theories on maturity and data quality, and external benchmarks, what positive

and negative correlations between structures defining maturity and data quality attributes may be

found?


1. What structures define maturity?

a. What levels of maturity do exist?

Five levels of maturity have been defined, ranging from Initial through Managed, Defined,

Quantitatively Managed to Optimizing.

b. What maturity structures in the field of organizational structure, process, technology,

information and staff describe each level?

The maturity structures in the field of organizational structure, process, technology,

information and staff are documented in table 5, paragraph 5.1.4.

2. In higher education, what positive and negative correlations between maturity and data quality

may be found?

a. For this research, what is the relevant set of business rules?

The relevant set of business rules is documented in appendix 6.13.

b. How will this set of business rules evolve in time?

It is found that some parts of the educational process are perceived to be complex. A

reduction of complexity may well be in order. An area mentioned is the digital course

catalogue.

c. What data quality attributes are relevant for these business rules?

The data quality attributes relevant to the set of business rules are Accountability, Accuracy,

Completeness, Confidentiality, Consistency, Currency, Referential Integrity, Reliability,

Specifications and Timeliness.

d. What values of data quality attributes correlate with each level of maturity?

The relation between data quality attribute values and maturity levels is documented in table

9, paragraph 5.2.2.

e. What do process quality theories describe about positive correlations between quality and

maturity?

In most cases, Process Quality theories are derived from CMMI and therefore tend to

describe a common picture: structured initiatives prevail over individualistic initiatives,

holistic initiatives prevail over structured initiatives, repetitive processes including feedback

loops prevail over holistic initiatives. An exception has been found in the Data Quality

Management Maturity Model (paragraph 5.1.5), adding a higher abstraction level to data

management in each successive maturity level.

f. What do process quality theories describe about negative correlations between quality and

maturity?

In literature, this item is not addressed explicitly.

g. Are those observations consistent?

This question has become redundant.

Main Q2: What values of data quality attributes will define the required data quality threshold and

therefore the required maturity structures at Windesheim?

1. To support the business rules identified earlier, what values should data quality attributes have?


Required Accuracy is to be pro-active and Confidentiality is required to be basic. Furthermore, data should be Current and Timely.

2. What level of maturity is required to enable those data quality attribute values?

Windesheim is required to implement data quality maturity level three (defined) completely, and

for Timeliness and Currency some level four (quantitatively managed) process areas.

3. What organizational structure, process, technology, information and staff criteria define the

maturity found?

The minimal list of structure, process, technology, information and staff criteria defining the

maturity found are defined by the process areas of maturity level two and three of table 5,

paragraph 5.1.4.

Main Q3: What are the current organizational maturity and current values of data quality attributes?

The current organizational maturity and values of data quality attributes correspond with data quality

maturity level one (initial).

Main Q4 (Central research question): What is the gap between current maturity structures & data

quality threshold and required maturity structures & data quality threshold in the light of enabling

Windesheim to become a near zero latency organization?

1. What is the gap between the current and required organizational structure, process, technology,

information and staff criteria?

This gap is documented in paragraph 5.6.1.

2. What conclusions and recommendations may be derived from this gap?

Detailed conclusions and recommendations are documented in paragraph 5.6.2. This paragraph

is summarized as follows:

In order to execute the educational process as supported by Educator in a reliable and efficient manner, and to be able to implement near real-time message-based application interfaces, it was discovered that mastering data quality maturity level three (defined) is required, while getting data through the process in time requires some level four process areas to be implemented.

A two-phased approach is recommended, implementing data quality maturity level two first, and

level three later.

At data quality maturity level two, data quality is guarded in a rather reactive manner, using

reports, analysis and repair tools to correct data quality issues.

At data quality maturity level three, data quality is guarded in a pro-active manner, using input

checks and integrity checks to guard quality before data is stored. Extending the documentation

created at level two, in data quality level three data being communicated between systems is

described using a canonical data model.
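The contrast between level two (repair after the fact) and level three (guard before storing) can be sketched in a few lines of code. This is a minimal illustration only: the field names and rules below (an empty student name, a four-zeros placeholder room id, a malformed e-mail address) are assumptions inspired by the issues reported in the interview appendices, not Windesheim's actual validation logic.

```python
from dataclasses import dataclass

@dataclass
class Enrollment:
    student_name: str
    room_id: str
    email: str

def validate(record: Enrollment) -> list:
    """Collect rule violations; hypothetical rules for illustration."""
    errors = []
    if not record.student_name.strip():
        errors.append("student_name is missing")
    if record.room_id == "0000":  # placeholder value, not a real room
        errors.append("room_id is a placeholder")
    if "@" not in record.email:
        errors.append("email is malformed")
    return errors

def store(record: Enrollment, db: list) -> bool:
    """Level-three behaviour: reject flawed input at the gate,
    instead of repairing it later with reports and repair tools."""
    if validate(record):
        return False
    db.append(record)
    return True
```

At level two, the same `validate` rules would instead run periodically over already-stored data, producing an error report for manual repair.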

Reaching for a full implementation of data quality maturity level four (quantitatively managed) is not required. However, to resolve timing issues and increase benefits, it is additionally recommended to extend data quality initiatives into WDQM level four (quantitatively managed).


It is most important that care is being taken to overcome the technological discontinuity by

involving all stakeholders in the migration process.

5.7.8 Recommendation on further research

In this research, three elements have remained virtually untouched:

• Current and required status of project management;
• Current and required status of programme management;
• Current and required status of ROTAP environment management.

Addressing these issues is imperative for reaching WDQM levels two and three. It is therefore recommended to investigate the current and required status of these process areas and to advise on a migration strategy where applicable.

Now that the route towards a fitting level of data quality has been designed, this route will have to be travelled. An intervention-based research project may be started, evaluating the progress of the growth in maturity, looking for problems during implementation and delivering advice on solving them.

5.7.9 Reflection

At the start of this graduation project, I set myself three goals. The first goal was to deliver 'value for money': to achieve a result that would justify the investment my employer has made in my education. The second goal, equally (or perhaps even more) important to me, was to reach the end of the project with good (not just satisfactory) results. And the third goal was to make a difference, to learn and to add new knowledge to the IT profession.

During the execution of the project, I have witnessed reactions and gained insights to contemplate

upon. Let’s start with the last goal, and work our way back to the first.

I have discussed the WDQM with experts on data quality and maturity. On two occasions, interest in the WDQM as an instrument was raised, and I was invited to publish my experiences in the form of an article in the future. In one instance, I was even invited to join in writing a book on this matter. Therefore, I feel confident that I have stumbled upon something interesting here. The box on goal number three is ticked.

To reach the end of this graduation with good results is a more difficult goal to predict. I feel

confident that the results will be satisfactory, but are they going to be good? There is a lot of

uncertainty here at this moment. However, what I know is that I have done my absolute best – I have

enjoyed this graduation project and could simply not have done things any better than this. To me, the

diagnosis and research part of this project is covered to my very best abilities. Therefore, I consider

this box to be ticked too.

Yet, on the first goal, it was more difficult to get a grip on the matter. It was quite uncertain whether the answers to the questions asked would produce results specific enough to deliver usable advice. And in the end, the subject proved to be very comprehensive. A migration in data quality maturity involves many aspects, some of which could only be addressed briefly with the resources available for this research. Indeed, issues like project management, programme management and ROTAP management require a research project of their own, and deserve more attention than they have received here.


What would I do differently next time? In this project, the development of an instrument to measure data quality maturity was required before the business problem could be analyzed and advice given on solving it. In fact, we may well have executed two research projects here: a design-oriented research resulting in an instrument, and a diagnostic research resulting in advice. The absence of a detailed and up-to-date instrument forced this research to create one, and the need to supply valuable advice required the instrument to be applied. There was no escaping the combination of the two types of research. It may well be that in the second part of the project I could have involved the current organization more, exploring more deeply areas that, for now, have only been touched upon at the surface.


6. Appendices

6.1 Interview Report Windesheim Integration Team

Interview report system integration team Windesheim

Attendees:

Tonny Butterhoff, System Integration

Gerben Meuleman, System Integration

Gerben de Wolf, System Integration

Albert Paans, Information Management

Alex Geerts, IT Front office

Windesheim, 11/11/2009

What are the responsibilities of the system integration team?

Currently, the system integration team (formerly known as KOAVRA: Koppelen onder architectuur voor Vraagsturing25) is connecting systems at Windesheim in a service-oriented architecture. The first process supported by real-time coupling is the HRM process. When a new employee is hired, account information is sent in real time, granting the new employee immediate access to information systems.

A similar process, aimed at proliferating student account information, is currently being tested and is planned to be accepted for production shortly.

Finally, real-time service based information exchange processes concerning study information and

supporting study processes are being built.

What technologies are being utilized?

Systems being integrated are all standard packages: CATS student information system, Oracle HRM,

Planon Facility Management, Decos document management, Educator Learning Environment,

Blackboard Learning Environment. The Enterprise Service Bus is delivered by Cordys.

For some systems, building a service interface layer was quite simple. Decos for instance is an up-to-

date system supporting the use of web services. Planon and Oracle HRM are at the other end of the

scale, offering no support for web services at all. For these packages, an interface utilizing database

injection code had to be developed. In the near future however, Planon at least promises to offer a

more modern solution.

What issues in relation to data quality are found?

25 Coupling Under Architecture for student-centric education


Data quality related issues are commonly found. An example of a recurring problem is that female students enroll themselves multiple times, using either their maiden name or the family name of their spouse by mistake. Other well-known and regular issues are cases in which the name of a student is completely missing, student addresses are incorrect, or information is entered in the wrong fields.
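Duplicates of the maiden-name/married-name kind can be screened for with a fuzzy comparison of names between records that share another stable attribute. The sketch below uses Python's standard `difflib`; the (name, birth date) record layout and the 0.6 threshold are illustrative assumptions, not the actual CATS record structure.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between two names (0.0-1.0)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def possible_duplicates(students, threshold=0.6):
    """Flag record pairs with an identical birth date and similar names.

    `students` is a list of (name, birth_date) tuples. Catches cases
    such as a maiden name versus a hyphenated married name.
    """
    flagged = []
    for i in range(len(students)):
        for j in range(i + 1, len(students)):
            (name1, born1), (name2, born2) = students[i], students[j]
            if born1 == born2 and similarity(name1, name2) >= threshold:
                flagged.append((name1, name2))
    return flagged
```

Such a screen only produces candidate pairs; a human still has to confirm whether two records really describe the same student.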

In the facility management system, new personnel are assigned to a room using four zeros as a room identification. When new personnel arrive, manual processes ensure the assignment of the correct room number. Unfortunately, in many cases these processes fail to correct the number in time (or fail to correct it at all).

In Oracle HRM, sometimes missing information is replaced by text-strings stating that ‘Debbie has to

solve this problem’.

DECOS seems to have become a victim of migration efforts. Even though it is one of the most modern systems, interfacing with it resulted in a myriad of errors and inexplicable results. Upon closer inspection, the database seems to be corrupted, as if multiple migration attempts had been made, all of which at some point failed, leaving about 10% of the DECOS database in ruins.

Recently, management focus was on correctly clearing information in the case of a student's decease. Even though hard evidence is missing, it seems that in the recent past information was sent to deceased students' addresses.

It has also been found that sometimes the timeframes of the business processes themselves cause problems. In some cases students receive no formal clearing, for instance because study fees have not been paid. Even though those students may start their study, they do not yet receive a student account. This situation may be misinterpreted as a data quality error.

A final issue is that the cooperating systems all store data in different time frames. Oracle HRM is very good at keeping a historical track of all data, whereas the student information system CATS seems able to store only the present situation.
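One common way to bridge such differences is to keep validity intervals with the data, so that a current-state-only view can always be derived from the history. A minimal sketch, with fictional addresses and dates:

```python
from datetime import date

# Address history as (valid_from, address) pairs; a future move can be
# registered in advance without overwriting the current address.
history = [
    (date(2009, 1, 1), "Old Street 1, Zwolle"),
    (date(2010, 6, 1), "New Lane 2, Kampen"),
]

def address_on(history, when):
    """Return the address valid on `when`: the latest entry whose
    valid_from date is not after `when`. This prevents, for example,
    sending mail to a new address before the actual moving date."""
    current = None
    for valid_from, addr in sorted(history):
        if valid_from <= when:
            current = addr
    return current
```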

What consequences arise as a result of these issues?

Information about duplicate accounts is propagated into adjacent information systems (currently using file-oriented interfaces), and removing those duplicate accounts takes considerable effort. Perhaps even worse, the existence of duplicate accounts may lead to errors in the student head-count, leading to uncertain financial budgets.

Propagation of errors is an effect that is related to almost every issue found. Consequences are that

information systems produce incorrect information, resulting in loss of confidence. Secondly, it is

difficult and time-consuming (costly!) to find and repair those errors.

Painful mistakes like sending the wrong mail to deceased students may lead to serious image damage.

In the standard file-based integration processes, time is spent by operations every day manually checking the batch files for errors. In fact, manual inspection and correction processes are found everywhere. DECOS, for instance, is used as a source for management information. Because the DECOS database is corrupted, those management information reports are checked manually. If suspicious information is found, the numbers are corrected manually. Not only does this ad-hoc practice consume time, it effectively renders the management reports useless.


Where data quality errors in HRM lead the way to the solution ('Debbie should solve it' – so let's ask Debbie), problems related to facility management tend to keep everyone in the dark. When material (like a computer) is ordered for new personnel, getting the equipment delivered proves to be a challenge, since information regarding a valid location is not available. Not only does this practice lead to lost time, new personnel are also not served professionally on their first day on the job, damaging Windesheim's image.

Differences in storing data mutations over time may lead to incorrect system responses, like sending

information to a student’s new address prior to the actual date of moving. However, it is unknown if

anything like this has happened already.

Are there any causes and solutions identified already?

The use of Commercial Off-The-Shelf applications seems to contribute to the inventive use of data fields. Packages do not always support a flexible and proper implementation of a business process, and sometimes an inventive implementation has to be found. Then again, standard solutions do not always offer the input checks one would like to see. Even the national student portal Studielink (www.studielink.nl) allows students to fill out their application forms incorrectly. It would help if correctness of data was enforced 'at the source'.

The distinction between correct and flawed data is not always clear. To solve this problem,

development of a canonical data model is planned.
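A canonical data model usually takes the form of one agreed record shape plus a translation routine per source system at its border. The sketch below is a hypothetical illustration: the CANONICAL_FIELDS names and the CATS/Educator field names are invented for the example, not the model planned at Windesheim.

```python
# Hypothetical canonical student record shape.
CANONICAL_FIELDS = ("student_id", "full_name", "email")

def from_cats(row):
    """Translate a (fictional) CATS row into the canonical shape."""
    return {
        "student_id": row["stud_nr"],
        "full_name": f'{row["first"]} {row["last"]}',
        "email": row["mail"].lower(),
    }

def from_educator(row):
    """Translate a (fictional) Educator row into the same shape."""
    return {
        "student_id": row["id"],
        "full_name": row["name"],
        "email": row["email"].lower(),
    }

def is_canonical(record):
    """Every translator must emit exactly the agreed canonical fields,
    making 'correct versus flawed' decidable at the system borders."""
    return sorted(record) == sorted(CANONICAL_FIELDS)
```

With such translators in place, the integration layer exchanges only canonical records, and each source system's quirks stay behind its own border.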

6.2 Interview Report WDQM Marlies van Steenbergen

Discussion: Marlies van Steenbergen MSc, Lead Architect, Sogeti
Subject: Validity of the WDQM model
Date: 12 March 2010

First of all, it is noted that at level 4, emphasis lies on being able to manage a process quantitatively,

which implies the presence of a measurement mechanism.

In the WDQM, having no process areas defined at level 1 is recognized to be correct.

At level 2, the initial positioning of root cause analysis in the process column is questionable. When

properly conducted, a root cause analysis leads to identification of underlying problems and enables a

more lasting solution. Therefore, root cause analysis may be positioned at level 3 instead.

Information being unspecified at level 1, not trusted at level 2 and structured at level 3 is not directly based on evidence in literature. The reasoning behind these labels became clear during the discussion and is recognized, but may need some further explanation.

At level 3, the focus is on being able to manage multiple changes in harmony and creating synergy.

Therefore, the term project management may better be replaced by Program management. And in

many cases, portfolio management is used to indicate synergy. Yet, portfolio management is often

used in conjunction with (IT) Governance, utilizing frameworks like COBIT, BiSL and ITIL. These frameworks may fit level 4 (quantitatively managed) better, since their focus is on supporting the whole process / product life cycle. Therefore, using program management instead of project management at level 3 seems to be appropriate.


The explanation of the process activities at level 3 may be made more explicit. Technical solution might be more appropriate at level 2 in the technological column. And is data integration an activity that may better be positioned in the technical column at level 3? What are the relevant data integration patterns here? It may be argued that at this level, data is integrated with other sources using translation routines at the borders of each source. Supporting these translations may well be translation scripts, resulting in the emergence of a canonical data model: a bottom-up description of data being transferred between sources. In the information column at level 3, we may thus see the emergence of a canonical data model.

At level 3, in the staff column, data modeling knowledge is positioned. This raises the question of how personnel are able to develop information systems and solve data quality problems at level 2 in the first place. It is therefore recommended to reposition data modeling knowledge at level 2. In this cell, project management skill may be replaced by better fitting programme management skills. One may argue that at this level, staff is 'synergetically' competent, since staff has learned to create synergy by combining multiple transformations (projects).

At level 4, data is approached as a product. The presence of an information product manager at this

level makes good sense. But at level 3, data may be recognized to be raw material, building blocks, a

commodity perhaps. At level 3, who is responsible for this material?

Since level 4 incorporates end-to-end business process management, all measurement and analysis

instruments to enable level 5 may be present at level 4 already. In the technology column, which

integration patterns apply here? In the information column, the canonical data model may well be

used to define a common information language to which all data sources adhere.

At level 5, the absence of general theories on data quality is not completely surprising. It seems that data quality theories are focused on improving data quality to an acceptable level (fit for use). Applying six sigma may work, yet in some cases level 5 has been discarded altogether, since the organization in question had no intention of reaching this level. Delivering quality according to a service level agreement does seem to fit level 4 better. It is advised to reposition this process area at level 4. The organization being structured in a strict top-down hierarchy is based on theories from Treacy and Wiersema. This should be explained in more detail.

A rather interesting issue may be that from level 3 onwards, it is implicitly described that data errors are corrected at the data source, not at the place where they create havoc. This means that a continuous improvement cycle has been defined at level 3 already. What does this mean for level 5?

6.3 Interview Report Data Quality in Education Th. J.G. Thiadens

Attendees: dr. mr. ir. Th. J.G. Thiadens, lector IT Governance, Fontys University of Applied Sciences

F. Boterenbrood

Doorn, 15-03-2010

This discussion is about data strategies in higher education. Issues discussed are the historical

perspective on IT Governance, the current status, regular problems and common solutions, and the

future of IT in higher education.

Fontys University of Applied Sciences is characterized by 35 separate schools. The decentralized structure of the organization has resulted in the presence of about 600 simultaneous projects, all resulting


in an IT solution. 10-15 of these projects are centrally managed. The remaining projects are local initiatives within the schools. The governance of the 10-15 centrally managed projects is transparent, while the remaining projects are executed without central guidance. One feels that a portfolio of IT projects should deal with all 600 projects.

This is in fact a position many universities are experiencing today. The Dutch universities of applied sciences are the result of mergers of many smaller institutions in higher education. The resulting institutions are large, albeit rather decentralized, organizations. Currently, a move towards more centralized modes of governance is visible. However, data quality may not always benefit from centralizing. Procedures involving data being transferred between systems manually are prone to errors. The books of Starreveld mention that manual record keeping can lead to up to 5% errors in data quality.

In many cases, data quality may be improved by shifting responsibilities as low as possible down the

hierarchy. Examples are:

• Problems in grade assignment may be solved by making the lecturer directly responsible for

correct and timely grading. Lecturers are corrected by students when grade assignment is late

or questionable.

• Registration of lecturer availability may be much improved if the lecturer is made personally

responsible for this information, and is given the right tools to manage this information. The

effects of not having registered the right information on time (the lecturer finds himself

scheduled at undesired moments) may be a fitting incentive to have this information up to

date.

• Within schools, items are ordered and these items will have to be billed. Billing processes should make the school which placed the order responsible for paying the bills. In this way, schools are directly confronted with the financial consequences of choices, and not at the end of the year by means of an error-prone budgeting process. This may be implemented by positioning financial controllers at decentralized positions.

• Monitoring study progress is a responsibility which could be both centralized and

decentralized simultaneously. Student centered education requires for study progress

monitoring to be decentralized, allowing for study coaches to closely monitor individual

student’s progress, while business intelligence processes supply management with over-all

corporate controls.

• Examples of responsibilities that should remain centralized are strategic management and

setting the rules for employee benefits.

Transferring responsibilities to the individual is in line with the current use of technological developments like the internet, in which the individual has gained influence. Information is perceived to be an individual asset. This will lead to an individual approach to information. An example is given by Harvard University, where students are presented with individual schedules every day, including proposals for alternative classes the student may wish to attend that day.

In many cases, information systems are not trusted. Often, managers rely on information acquired

from alternative sources or different indices. The number of employees working at an organization for

instance can be found by looking at the number of monthly salary deposits.

In the future, it is to be expected that information processing is centralized even further. Private cloud

computing has a role to play, enabling multiple institutions to share services. Virtualization too

supports the emergence of shared service centers, while respecting decentralized needs. The most


difficult hurdle to be overcome here concerns the notion of ownership of information by the decentralized business units. This requires excellence at an academic level to be present in both management and workforce.

6.4 Interview WDQM dimensions Report Arjen de Graaf

Attendees: Arjen de Graaf, Founder / CEO Arvix
Frank Boterenbrood
Subject: Validity of the WDQM model
Date: 9 April 2010

Introduction

As founder and CEO of Arvix, a company focused on safeguarding and improving data quality, Arjen

de Graaf has deep knowledge of data quality and its relation with organizational maturity. In this

meeting, the WDQM goals as described in table 9 are discussed.

Ownership, stewardship and a business case for data quality.

In many organizations, an employee is assigned responsibility for the quality of data. However, once

asked for the means available to monitor and correct this data, the answer is not always satisfactory.

Effective means to influence data quality are absent in many cases. The absence of means results in a situation where one can feel responsible for data quality but, in reality, cannot actually be responsible. In other words, the data steward, as mentioned in this research, cannot fulfill his role as caretaker of data quality if the means to effectively influence data quality do not come with the job.

Since data quality is related to organizational maturity, the means required are managerial rather than

technical. To ensure data quality, one may have to be prepared to restructure the organization.

Instating data stewardship without the preparedness to take (perhaps drastic) managerial decisions, restructuring the fabric of an organization, may be in vain. There HAS to be a manager responsible for data quality with the authority to implement change.

What can be observed is that organizations assign data quality governance not to a single employee or role, but instead institute a business intelligence or data quality department. This department is assigned the task of providing the organization with valid business indicators, directly influencing operational processes and management decisions. In this case, data quality and business performance are visibly connected, displaying a clear business case for data quality.

Talking of business cases: businesses are confronted with the situation that customers have direct access to operational data and demand near real-time responsiveness. Today, when data is flawed, an organization has neither the means nor the time to correct this data in internal processes and procedures, and the business runs the risk of finding itself on one of the prime-time consumer platform television shows, explaining why it all went so horribly wrong.

Value of data quality

It is important to be able to express data quality as a valuable asset of an organization. Data quality has value, and it can and must be expressed in terms that have meaning to the business. In the current model, this approach towards data quality seems rather instrumental, and the business view seems to be missing. The value of data quality can be expressed in terms of financial value or business urgency. The business management view may include elements such as recognizing new patterns, generating new business based on data mining, turning data into new money. Alternatively, costs can be reduced by, for instance, recognizing patterns that indicate cases of fraud. Reasons for attention turning to data quality are competitiveness (creating new business), being master of business data and therefore able not only to manage but also to lead an organization, and exploring client demand (instead of sending a mailing 'to all').

Insight

One main dimension of data quality, seemingly missing from the current model, is therefore Insight. Does the organization, the data steward, or the manager responsible for data quality have insight into its data and the quality thereof? Insight into data means that it is clear to the organization which data attributes are required or available, where and why these attributes are created, which sources were used, where these attributes are used, who guards and tests each attribute, when these attributes become outdated and, once obsolete, how they are dealt with.

Accreditation

Data quality is becoming recognized as a major contributor to business success (or, when absent, an inhibitor of it). We may expect a data quality standard to emerge in the near future, and organizations may become accredited against this standard. Needless to say, Insight is one of the first dimensions that must be put in place.

For now, an organization may well embark on a journey towards data quality improvement because new management has just entered the organization and doubts the reliability of its data: it is not sure whether the data is right or not. In this case, the new entrant acts as a maverick: unobstructed by corporate rules and customs, data quality is doubted and questions are asked, demanding unambiguous answers.

Volatility

In the current model, volatility is said not to be recognized at WDQM level one. This does not seem right, since in operations the importance of data quality is recognized right from the start. The experts in operations, however, have a hard time communicating the importance of data quality; at level one, it is mostly management that is unaware of the importance of data quality.

Beef

Where is the beef? The current model is technically correct, yet it seems to lack real-world business appeal. For instance, the current level labels are quite technical and difficult to understand. What is meant by 'quantitatively managed'? Who is supposed to understand this? It is not likely to generate management attention instantly. Please describe a 'WDQM for Dummies' in terms of management benefits. Especially beyond level three, data quality becomes a matter of special interest to organizations, opening up a whole new realm of possibilities. What we can see beyond level three in practice today are cloud computing initiatives for data quality, new business being generated, and successful one-on-one business models based on reliable data. Make data quality more sexy!

6.5 Interview report Current Data Quality Educator Gerrit Vissinga

Attendees: Gerrit Vissinga, process engineer Educator


Frank Boterenbrood

Windesheim, 17-03-2010

Introduction

The Educator project is in turmoil. It has taken the best part of three years now, and full implementation may well take another three. In the future, issuing diplomas and certificates will become part of Educator.

The current graphical representation of Educator's scope of influence is not quite right: the process of education development is not within Educator's scope.

Issues, causes and solutions

Management of study definitions in the catalogue is difficult. In particular, updating course definitions is tricky, since the user has to identify the type of update up front. If the update is identified as 'complex', a new version of the study definitions is generated. If the update is identified as 'simple', current data is updated in place and no new version is created. To the user, the distinction between 'simple' and 'complex' updates is not made perfectly clear, and the consequences of a 'complex' update remain unknown to many. One of the consequences is that, once entered, a new version of the study definitions needs to be linked to semester variant plans. Often this step is overlooked, resulting in study information not being made available to the student, since students add semester variant plans to their activity plans, never individual courses. Indeed, orphaned study definitions may be found in the catalogue.

Errors like these are caused by an over-engineered and complex solution. Currently, a simpler, more straightforward system design is being discussed.

Another issue is caused by the fact that a student may enroll himself in a study that differs from the one agreed upon with the study coordinators. This mistake is prevented by having the supporting offices assign semester plans to students' activity plans, or by having the study coordinator check activity plans in great detail.

In any case, some issues remain unsolved, since the focus is still on supporting the primary process. Other issues are checked by functional support or head lecturers. These tasks, however, are delegated to support offices. One may question the quality of the checks performed.

Discussion on Data Quality Dimensions

Accessibility. Seems to be OK.

Accountability. This seems to have a relationship with confidentiality. This seems to be OK.

Confidentiality. The number of roles available in Educator is rather large, resulting in complex role management.

Consistency. The technical implementation of Educator may not be adequate to prevent data from

becoming inconsistent. An example is the issue regarding definition updates.

Currency. -


Integrity, Referential. See Consistency. Integrity amongst different information systems seems to be a problem, since data integration with Educator is to a large extent manual.

Reliability. In some cases, once grades were assigned, courses were removed from students' activity plans, causing grades to disappear. This was caused by a notion that the study plans were in error: the reliability of the data was in question. It is not known whether any solution preventing this type of error has been implemented.

Specification. Leaves room for improvement.

Timeliness. -

Uniqueness. -

Volatility. In particular, course definitions are prone to alteration. It seems that lecturers designing their courses change their minds too often on how to execute or assess their education.

6.6 Interview report Current Data Quality Educator Gert IJszenga

Attendees: Gert IJszenga, manager education School of Build, Environment & Transport

Frank Boterenbrood

Windesheim, 15-03-2010

The School of Build, Environment & Transport (BET) is in the process of migrating all student information from the old Student Information System (SIS), CATS, to the new SIS, Educator. Starting from the 2008-2009 academic year, the digital course catalogue, students' personal study planning and student grades are registered in Educator. Currently, grade information of students who started in preceding years is being migrated from CATS to Educator. When this process is concluded, the School of BET plans to utilize Educator's portfolio capabilities.

In implementing Educator, the School of BET applies a gradual approach. First, three years ago, the processes of education development and definition, student activity planning, assessment and grade registration were formalized more strictly, creating a situation in which the School of BET was in control of these processes. Second, once these processes operated reliably on the current infrastructure, process support was switched from CATS to Educator.

The challenge was to create a process in which:

• education, including assessment rules, is defined correctly;

• students create their study plans on time and correctly the first time round;

• freedom of choice is balanced against predictability (of resource claims);

• registration of grades is completed within a two-week window without major disruptions.

The issue here was to create a situation in which information stored in Educator could be checked against baseline documents, resulting in usable data quality controls and enabling well-informed choices when errors have to be corrected. The specific question addressed was: "What process design ensures that every student is linked to the right courses, supporting the assignment of the right grades?"


The leading principle at the School of BET implementation is that control over data entered into the

system is mandatory. This principle is implemented in three areas: The Digital Educational Catalogue

DOC (Digitale Onderwijs Catalogus), the student personal activity plan PAP, and grading.

DOC process control

A curriculum does not spring into existence by accident. Leading up to the registration of course information in DOC, a process of design and discussion is executed. These activities are reflected in the planning and design documentation produced, resulting in a baseline enabling control over the definitions in DOC. The School of BET therefore requires course planning and design documents to be present prior to entering course definitions in DOC. These documents are an instrument for guiding and monitoring the quality of the course catalogue.

Personal Activity Planning

Once the student completes the propaedeutic phase, the School of BET offers a variable study programme in which the student has freedom of choice. One of the problems here is that if the student does not use Educator to enroll himself in time in the courses he is attending, grades cannot be assigned. Secondly, it is hard to plan education execution efficiently if the participation of students is uncertain up to the very last moment. Therefore, the student is required to create a complete plan for his study career early in his studies. To support decision making, three alternative study paths are available for each study, each offering limited additional freedom of choice. The School of BET has structured the choices available in a study planning chart, visualizing the different routes. Finally, if a student does not complete his personal activity planning in time, he will not be allowed to participate for one semester.

This personal activity planning results in a set of study plans, which are easily converted into files and

imported into Educator, linking students to courses, groups and classes. To enable this import, the use

of free format data structures in Educator (known as labels) is standardized. And again, if problems

are detected, the individual study plans are a benchmark against which data in Educator can be

checked.

The Windesheim Educational Standard (WOS: Windesheim Onderwijs Standaard) refers to the use of semester variant plans. These semester variant plans are in fact an educational planning tool encompassing a twenty-week period. The School of BET may not use the semester variant planning structure literally, yet the process in use has exactly the same effect.

Grading

Once 1) the digital course catalogue is correct and 2) the students are enrolled in the right courses in time, assigning grades does not pose any problems. The issues the School of BET meets here are performance issues (i.e. the speed at which the system reacts to input), bugs for which workarounds have to be used, and reporting facilities which are not yet available. These issues indicate that during development and implementation Educator was still in an experimental state; they are currently being dealt with in the Educator development project.

The major issue at this moment is getting a grip on the time it takes to assess student results and assign a grade. Ideally, this should be completed within a two-week period; however, instruments to control this service level are not yet available.

Key moments


Important deadlines in these processes are:

1. The moment courses are published in the digital education catalogue;

2. The moment the student submits his personal activity planning;

3. The moment grades are assigned and finalized.

Conclusion

In this discussion, not all information relevant to the research project was covered, since one hour proved to be insufficient. A second date was set in order to continue this meeting.

6.7 Interview report Current Data Quality Educator Gert IJszenga Continued

Attendees: Gert IJszenga, manager education School of Build, Environment & Transport

Frank Boterenbrood

Windesheim, 25-03-2010

In this interview, the current values of data quality dimensions are discussed.

Accessibility. At this moment, reports enabling control over Accessibility are missing. Some rather elaborate manual checks are available. However, due to the process design, it is believed that Accessibility is sufficient for most data. There may be an issue with the assignment of grades: an estimated 80% or so is believed to be made accessible to students within 10 days after an assessment. It is mainly the lecturers' motivation that keeps Accessibility within limits.

Accountability. Educator offers built-in mechanisms to safeguard accountability. An audit trail is available, logging all data updates. In the real world, however, only exams are stored; student reports and other end products are handed back to the student after examination. It is therefore not feasible to reproduce the product that was assessed. Another issue is the absence of a fall-back administration in case errors cause Educator to fail. In one instance, the deletion of courses that were already being graded caused the deletion and loss of all grades, leaving the organization without a backup. The system should prevent this.

Accuracy. Course information is described in many documents outside Educator. As a result, the information entered into the system is (wrongfully) regarded as being of minor importance. This information is often less detailed than it should be. This is an area with room for improvement. How serious are we about the data in our study support systems?

Completeness. Educator requiring all course data to be available before one fixed deadline is perceived to be a problem. There is no room for a more gradual approach, in which required data is stored first and additional, more optional data is added later. The current binary method causes course information to be entered as late as possible, jeopardizing currency and timeliness.

Confidentiality. Is well taken care of. It is nearly impossible to adjust a grade, since this function is protected using a token (strong authentication). Using social hacking techniques, one may gain access to student grades, albeit read-only.

Consistency. Due to strict process design, consistency is believed to be managed at a pro-active level.


Currency. Some Educator functions are troubled; Educator does show hiccups from time to time. An example is the limited choice of web browsers supported by the system, making it difficult to gain access to the data in time from different devices and locations.

Integrity, Data. Course data is considered to be about 75% correct. The integrity of student activity plans and grade management may well approach a score of 100%.

Integrity, Referential. Is guarded by strengthening the process.

Reliability. At the school of Build, Environment and Transport the data within Educator is qualified

as reliable as a result of well defined business processes.

Specification. The specification is not quite ready yet. Currently, much knowledge is still confined to Gert alone; the situation is not yet transparent, i.e. ready to be shared. There is still room for improvement here, too.

Timeliness. This is an issue that is being worked on as we speak. It is perceived to be troublesome due to the fact that processes are started reactively. It is the process stakeholder who decides on starting a process, and time and again it proves difficult to start processes in time. Time is poorly planned.

Uniqueness. The design process creates a barrier against data being duplicated. Lecturers work in teams on course development. Course data, however, is described in multiple documents between which discrepancies are possible.

Volatility. Currently, course data may well be too volatile. The organization, learning to use the

system, is changing course information much too often. A good course definition should last for a

minimum of three years, and for many courses this may well be eight years. Yet, courses are updated

multiple times each year now.

6.8 Interview report Current Data Quality Educator Klaas Haasjes

Attendees: Klaas Haasjes, operational support Educator

Frank Boterenbrood

Windesheim, 17-03-2010

Introduction

Klaas Haasjes, as a member of operations, is responsible for the correct operation of Educator and the exchange of data between Educator and its adjacent information systems. Data exchange between Educator and Blackboard, for instance, is a labour-intensive process. In Educator, executing a query results in a comma-separated file, which is then imported into Blackboard, an information system supporting secured document exchange between students and teachers.
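The manual query-export-import step described above is a place where data quality problems slip through unnoticed. As an illustration only (the column names and file layout below are assumptions, not Educator's actual export format), a pre-import check could separate complete rows from rows needing manual review before anything reaches Blackboard:

```python
import csv
import io

# Hypothetical column names; Educator's actual export format is not documented here.
REQUIRED_FIELDS = ("course_code", "course_name", "teacher_id")

def validate_export(csv_text):
    """Split a CSV export into rows safe to import and rows needing manual review."""
    ok, review = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # A row is importable only if every required field has a non-blank value.
        if all(row.get(f, "").strip() for f in REQUIRED_FIELDS):
            ok.append(row)
        else:
            review.append(row)
    return ok, review

# Example export: the second course has no responsible teacher attached.
export = """course_code,course_name,teacher_id
VOE101,Logistics 1,T042
VOE102,Transport Law,
"""
ok, review = validate_export(export)
```

Such a check would surface the "no teacher linked" cases discussed below at export time, instead of leaving them to be discovered and corrected manually in Blackboard.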

Issues, causes and solutions


It is found that data entered in Educator causes problems in Blackboard. For instance, for each course (VOE) in Educator, a module is generated in Blackboard. In this process, only one teacher is linked to each module, namely the teacher responsible for the course. In some cases, multiple teachers or groups of teachers are linked to a course in Educator, which leads to one random teacher, or no teacher at all, being linked to the module in Blackboard. Operations does not correct this problem; it is found that these issues are corrected manually by users in Blackboard.

In the past, courses in Educator could be renamed. When this happened, course names in Educator and Blackboard became different, rendering course selection a mission impossible for students. To prevent this confusion, Educator has been modified to prevent course names from being altered. However, ghosts from the past still remain, causing 472 errors during data integration runs.

Life cycle management of data is a problem in many cases. At www.studielink.nl, students can select a study. Once a student selects Windesheim and www.studielink.nl submits their information, an account is created at Windesheim. However, students are free to un-enroll themselves and indeed frequently do so. Their account at Windesheim is not terminated, leading to literally thousands of ghost accounts. In many cases, these ghost accounts are assigned to the mandatory part of the programme of the study the students initially enrolled for. Once that has happened, removing these accounts becomes difficult, since they have become intertwined with educational registrations. A solution to this problem is currently being investigated. Student ghost accounts may also cause havoc with software licensing strategies: when a licensing strategy is based on the maximum number of enrolled students, ghost accounts may cause maximum thresholds to be exceeded.
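The ghost-account problem described above amounts to a life-cycle query: find accounts whose owner has un-enrolled and that have no educational registrations attached, and which are therefore still safe to remove. A minimal sketch, with purely illustrative record fields and a hypothetical grace period (neither reflects Windesheim's actual account administration):

```python
from datetime import date, timedelta

# Illustrative account records; the field names are assumptions for this sketch.
accounts = [
    {"id": "s1", "enrolled": True,  "registrations": 3, "created": date(2009, 9, 1)},
    {"id": "s2", "enrolled": False, "registrations": 0, "created": date(2009, 9, 1)},
    {"id": "s3", "enrolled": False, "registrations": 2, "created": date(2009, 9, 1)},
]

def ghost_accounts(accounts, today, grace_days=90):
    """Accounts whose owner un-enrolled and that have no educational
    registrations attached, so removal does not touch intertwined data."""
    cutoff = today - timedelta(days=grace_days)
    return [a["id"] for a in accounts
            if not a["enrolled"] and a["registrations"] == 0 and a["created"] <= cutoff]

print(ghost_accounts(accounts, date(2010, 3, 1)))  # → ['s2']; s3 is intertwined
```

The key design point, matching the interview, is that accounts already intertwined with educational registrations (s3) are deliberately excluded; they need the more involved solution still being investigated.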

Discussion on Data Quality Dimensions

Accessibility. Many students may still not be aware of the existence of the digital catalogue. Indeed, many lecturers may not be aware of its existence. It may be observed that the educational process is seemingly not fully understood by many. Whether actions are planned or taken to improve the situation is unknown.

Accuracy. Values entered in the catalogue are checked against generally agreed-upon guidelines. However, these guidelines do not seem to be known by many.

Consistency. In the past, the meaning of grades could be defined by the lecturer designing the course. This led to a plethora of grade value interpretations. One issue in particular caught the attention: grade values indicating a score being insufficient or sufficient, or a course being exempted altogether. These scores were represented by a 4 and a 6 respectively, much to the dissatisfaction of students graduating cum laude, who, to their surprise, were presented with one or more sixes among the row of 'straight A's'. Now, grade value definitions are set by Educator automatically. How values for sufficient, insufficient and exempted are currently processed is unknown.
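The grade-interpretation problem described above is essentially a normalization issue: qualitative results (sufficient, insufficient, exempted) should never be stored or printed as bare numbers such as 4 or 6. A sketch of such a normalization step (the codes and mapping are invented for illustration and are not Educator's actual grade definitions):

```python
# Hypothetical mapping from qualitative results to explicit codes,
# so a qualitative 'sufficient' is never rendered as a numeric 6 on a transcript.
QUALITATIVE_GRADES = {
    "insufficient": "ONV",   # onvoldoende
    "sufficient": "VOL",     # voldoende
    "exempted": "VRIJ",      # vrijstelling
}

def normalize_grade(raw):
    """Classify a raw entry as either an explicit qualitative code or a numeric grade."""
    key = str(raw).strip().lower()
    if key in QUALITATIVE_GRADES:
        return ("qualitative", QUALITATIVE_GRADES[key])
    return ("numeric", float(raw))
```

With a single system-wide mapping like this, a cum laude transcript can render qualitative results as text rather than as misleading sixes.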

Currency. In many cases, information is entered into the system too late. This is not primarily a fault of the information system; it is the human factor causing delays. Examples are grades and course definitions being entered too late. Student plans tend to be finalized in time, since being late with student planning results in the student not being able to attend classes for one semester.


Integrity. In Educator, courses with no credits attached have been defined. It is apparently not feasible to assign checks to every data attribute entered.

Reliability. Data is reliable as long as it is entered correctly.

Specification. Documentation supporting Educator is rather thin. However, documentation is being improved.

Timeliness. The human factor proves to be a large contributor to information being available late. Knowledge on how processes rely on information being timely seems to be missing. Implementation of Educator seems to be left in the hands of the individual schools.

Uniqueness is a dimension which is strictly observed and guarded.

Volatility. Information in the world of Educator does not change very frequently. Peaks are found when new students enroll themselves at Windesheim.

6.9 Interview report Current Data Quality Educator Louis Klomp

Attendees: Louis Klomp, ICTO Coordinator school of Business & Economics

Frank Boterenbrood

Windesheim, 18-03-2010

Introduction

Louis Klomp is a teacher and the ICTO coordinator (Information and Communication Technology in Education) at the school of Business & Economics (BE). Louis was engaged in the use of the first version of the digital education catalogue and has participated in the migration to the current catalogue. As a teacher, Louis has hands-on experience with Educator.

Educator does not support the development of courses; the focus of the model currently presented is too wide. Since BE has used Educator from the very first moment, printing of diplomas and grade certificates is supported by Educator this year.

Now BE is focusing on defining and registering standards and thresholds, such as the 45 EC threshold associated with the propaedeutics phase. In time, these thresholds will be assigned to student’s personal activity plan (semi) automatically.


Issues and actions

Much to anyone’s surprise, during grading teachers were confronted with the fact that when the definition of assessments of a course in the catalogue did not align with the way a course was assessed in real-life, grading of that course was difficult, if not impossible at all. At first, teachers were supported by coordinators removing the course from student’s activity plans, correcting the course definitions and re-inserting the course in the activity plans, restoring previously earned grades manually. Later on, this support was dropped and teachers had to deal with the issues themselves. This rather rigid support policy proved to be beneficial for data quality: teachers became much more aware of getting the definitions in the catalogue right the first time round. Now, the mindset has been transformed from a deadline being debatable and final being questionable to a deadline being the limit and final being definite.

Course definitions used to be entered by personnel of the BE supporting office. Communication regarding course definitions between lecturers and supporting personnel was based on notes and print-outs. These went missing regularly, causing mistakes and miscommunication. Now, it has become the lecturer's responsibility to enter the course definitions.

It proved to be impossible to link student requests for a re-assessment to the exact moment a course had been scheduled in the past: in Educator, the moment a course was scheduled is not registered. In order to be able to create useful management reports and to assign student rework to the correct course, all BE courses in Educator are copied and renamed each year, with the current year inserted into the name of the course. The lecturer responsible for the course has to confirm that the course definition is still valid. This procedure has improved course information and enables student requests to be assigned to the right, historical, course definitions.

Many reports enabling management of Educator data are still missing. Currently, it is hard to get a view on study progress, since relevant reports are not available. Migration of grades between information systems in the past has introduced errors; the lack of reports does not help in identifying these errors. The annual duplication of all course descriptions results in growth of the database, adding to the need for management reports.

Now that Educator has been in use for three years, initial assumptions about how education is organized are being re-evaluated. A redesign seems beneficial, in which the structure of the catalogue may be greatly simplified, improving the availability and understandability of the catalogue. It now seems that items like OE and VOE (Onderwijs Eenheid and Variant Onderwijs Eenheid) may best be combined into one course entity, while the entity Semester Plan seems to be completely redundant. Having used Educator for three years also means that next year, the first cohort of students will have their diplomas printed by Educator.

For reasons unknown, the calculation of final grades does not function properly. In rare cases, students are presented with insufficient grades, while final re-assessments resulting in sufficient grades should have shown a more positive result. Reports created by the software manufacturer did not clarify this mystery. Now, an approach is used in which problems are investigated once students complain.

In short, many issues are related to the absence of proper management reports.


In some student activity plans, courses and the grades students earned were migrated manually from the previous study support system. In these cases, when the courses were attended by the student and when the grades were earned was not registered in Educator. This information is now unavailable.

Discussion on Data Quality Dimensions

Accessibility. Currently, the system is over-engineered and too complex, limiting accessibility.

Accountability. Is OK.

Accuracy. Initially accuracy proved to be a problem. By assigning responsibilities to the right functions, and confronting stakeholders with the consequences of their actions, accuracy has been improved greatly.

Completeness. See Accuracy

Confidentiality. This is OK, Educator offers comprehensive role management functions.

Consistency. It is found that the level of detail in which courses are explained in additional descriptions is not consistent. Some teachers describe their courses in great detail, while others spend only a few words. No actions have been defined to correct this situation.

Currency. Grading may well be a problem. No reports exist monitoring the grading process.

Integrity, Data. Data integrity is questioned: because many relevant management reports are missing, the real quality of the data is unknown.

Integrity, Referential. The relations between VOE, OE, Semester plans and Variant Semester plans are questionable and in many cases, absent. Simplifying the digital catalogue would greatly improve this situation.

Reliability. Even though Louis is positive about the reliability of Educator, many colleague teachers may disagree. Using Educator only once in a while, together with inadequate training and documentation, may well be at the source of this attitude. In Louis's experience, teachers often make mistakes and blame the system.

Specification. On a scale of 1 to 10, where 10 equals excellent and 1 is non-existent, specification scores a poor 1.5, or 2 at most.


Timeliness. For many, the planning of the educational process is perceived as complex. When new education is to be developed, development has to start well in advance of the targeted study period in order to deliver study information in time.

Uniqueness. Is OK.

Volatility. Study information is altered annually, or every six months in some cases. Grades are created quarterly, amounting to about 230,000 grades being registered at Windesheim as a whole each study period. Study plans are extended every six months.

6.10 Interview report Current Data Quality Educator Viola van Drogen

Attendees: Viola van Drogen, Functional support Educator

Frank Boterenbrood

Windesheim, 16-03-2010

Introduction.

This research focuses on the business domain supported by Educator. To visualize this domain, the domain architecture designed by (Jansen, 2006) is used as an information source. Currently, this domain architecture is under discussion.

Issues, causes and solutions

In Educator, data may be entered and updated by many stakeholders, while in many cases Educator does not offer input checks, resulting in data being in error the moment it is stored in the system. Causes identified by functional support are:

• no workflow has been defined for the specific data set;

• on individual fields, no data checks are available;

• the stakeholder operating Educator lacks vital knowledge on the effects of erroneous input;

• time to develop fitting reports is lacking.

In the experience of functional support, many stakeholders agree that data needs to be correct; however, this attitude seems not to extend to their own actions.

Errors in data are revealed once grade certificates are printed. On these certificates, it becomes clear that grades are not assigned to the right courses and that descriptions of courses are in error. In many cases, errors are caused by inadequate data entry. An example of inadequate data entry is the situation in which grades are entered twice. This may seem an innocent mistake, since in the end the student receives the right grade. It seems that no harm is done, yet a student is granted only one chance to redo an assessment when a grade is insufficient, and entering a grade twice counts as rework!
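A check for such double entries is straightforward to sketch. The tuple layout below is a hypothetical simplification of Educator's grade registrations:

```python
from collections import Counter

def duplicate_grade_entries(entries):
    """Return (student, course) pairs that were graded more than once.

    entries: list of (student_id, course_code, grade) tuples.
    """
    counts = Counter((student, course) for student, course, _ in entries)
    return {key for key, n in counts.items() if n > 1}

dups = duplicate_grade_entries([
    ("s1001", "OE1", 7.5),
    ("s1001", "OE1", 7.5),  # entered twice: silently consumes the resit
    ("s1002", "OE1", 6.0),
])
```

A nightly run of such a check would catch the "innocent" double entry before it costs a student a resit.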

Currently, Educator produces grade certificates for all first-year students and, at some Schools, second-year students. In the study process, the digital education catalogue needs to be finalized first; the student's personal activity plan (PAP) and the schedule may be created next. If either the catalogue or a student's activity plan is incomplete, teachers may not be able to assign grades. It is found that the digital catalogue is used as an experimental course development stage, instead of a catalogue of predefined and finalized course definitions, resulting in frequent change requests on previously accepted definitions.

Whether or not the semester plans and semester variant plans are actually being used is unknown.

In order to get a grip on changes in the digital education catalogue (DOC), enable smooth scheduling of classes, and support the PAP creation process, the option of freezing the DOC in April is currently being discussed. Preceding this lock-down, a mechanism using red and green 'traffic lights' may be implemented, reminding stakeholders of the effects of changes in the DOC.

Discussion on Data Quality Dimensions

The research has defined a list of data quality dimensions. Which data quality dimensions are currently of importance?

When looking from the student's point of view, Accessibility of data in Educator leaves room for improvement. Finding the right course in the catalogue proves to be a challenge at times, since naming conventions may exist yet are rarely used. This results in a plethora of course identification codes, many looking rather identical. An action planned to improve this situation is the implementation of automatic course code generation, replacing the manual assignment of a course code by a code generated automatically from course parameters. Another action is to attach timing information to course information, indicating the semester and period in which the course is going to be scheduled.
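One way such generation might work is to derive the code deterministically from a few course parameters, so that identical parameters always yield the same code. The parameter names and format below are illustrative, not Educator's actual scheme:

```python
def course_code(school, subject, semester, period, ec):
    """Build a course code from course parameters (illustrative format):
    SCHOOL.SUBJ.S<semester>P<period>.<ec>EC."""
    return f"{school.upper()}.{subject[:4].upper()}.S{semester}P{period}.{ec}EC"

code = course_code("ict", "databases", semester=2, period=1, ec=5)
```

Because the code is a pure function of the parameters, two stakeholders entering the same course can no longer produce two near-identical codes.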

Accountability is almost 100% implemented. All actions modifying data sets in Educator are logged, creating an audit trail binding stakeholders to actions performed. Unfortunately, no audit trail is created when an instance of a course is removed from activity plans and re-attached to those plans once the course is modified. This practice is under debate, however, since under normal circumstances it should not be required, and it seems to create problems when entering grades.

Accuracy seems to be under control, yet in some cases student data is found to be corrupted. The source of these problems is believed to be a data migration between the old and new student information systems. However, students entering faulty data in the online admission system Studielink (www.studielink.nl) are a likely source of data corruption too. Students confronted with data quality problems may have their data corrected at the student administration. Their data will be corrected in the main student information system first, and then transferred to secondary systems.


Confidentiality is under discussion. Functional support is able to modify student grades, and other stakeholders are able to view these datasets. It is rather likely that this is an undesired situation.

Currency is of importance when entering grades. At Windesheim, it is agreed that grades become available within two weeks after an assessment. However, no instruments currently exist to monitor this period.
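Such an instrument could be as simple as a report flagging assessments whose grade is still missing after the agreed two weeks. A minimal sketch, with a hypothetical record layout:

```python
from datetime import date

GRADE_DEADLINE_DAYS = 14  # Windesheim agreement: grades within two weeks

def overdue_grades(assessments, today):
    """List courses whose grade arrived late or is still missing.

    assessments: list of (course, assessment_date, grade_date_or_None).
    """
    late = []
    for course, assessed, graded in assessments:
        done = graded or today  # ungraded items age until 'today'
        if (done - assessed).days > GRADE_DEADLINE_DAYS:
            late.append(course)
    return late

late = overdue_grades(
    [("OE1", date(2010, 3, 1), date(2010, 3, 10)),   # graded in time
     ("OE2", date(2010, 3, 1), None)],                # still ungraded
    today=date(2010, 3, 20),
)
```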

Integrity of data seems to be under control, although in the past entities were discovered in which required data fields were missing. Specialists from both Windesheim and the software supplier were unable to find a cause for this anomaly. The situation has been corrected, yet the cause remains unknown. A report has been defined, offering a control for monitoring integrity.

Currently, the reliability of Educator is being questioned. Educator does not offer basic reports; reports are created using Business Objects. Unexpected collapses of Business Objects result in unavailable or unreliable reports. Re-instating Educator's reporting capabilities is currently being discussed.

To solve data quality issues, an Educator database quality taskforce has been created.

6.11 Data Quality Workshop

On Tuesday the 30th of March 2010, a workshop establishing future data quality requirements was conducted. In this section, the outcome of this workshop is documented.

Date and Time: Tuesday, 30-03-2010, 14:00 – 16:00

Location: IT services, Windesheim

Attendees present: G. Spoelman (Teamleader Software Development), G. Kwakkel (Software Development), K. Haasjes (Operations), A. Polderdijk (Information Security), G. Vissinga (Process Design), G. IJszenga (Education Management), R. Slagter (Project Management), M. van den Berg (Operations), A. Paans (Information Management), H. Tellegen (Operations), A. Jaspers (Operations), F. Boterenbrood (Research).

Workshop Schedule:

14:00 Welcome, Problem Definition (A. Paans) and Workshop discussion (F. Boterenbrood)

14:30 Discussing Educator process and business rules (All)

15:00 Explanation on Data Quality Dimensions (F. Boterenbrood)

15:15 Selection of future Data Quality Dimensions (All)

15:45 Discussion of results (All)


16:00 Wrap-Up

Workshop Preparation

For this workshop, a large room providing both free space for workshop activities and a large table for a 'round table' discussion was arranged.

For each attendee, the Educator process and a set of business rules was printed.

The Educator process was divided into four main sections, each sub process resulting in a baseline as established during interviews.

For each section, a paper sheet was taped to the wall, enabling workshop attendees to visibly choose data quality dimensions suitable for the sub process discussed.

For each section, a set of A4-sized sheets was printed, each sheet defining one data quality dimension. Every data quality dimension was given a value according to its position in the WDQM (value = 2^(level - 1)). An option was offered to assign a reduced value to a dimension, resulting in the data quality dimension being partly met. The value of a dimension was expressed in 'credits'.

For each section, a set of 20 green and 10 red labels was provided, limiting the number of data quality dimensions to be selected.

Prior to workshop execution, the selection of data quality dimensions was tested on colleagues within the School of Information Sciences. Based on experiences collected from these tests, the data quality dimension definitions were improved.
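The credit values printed on the workshop sheets (1, 2, 4, 8, 16) double with each step up the WDQM, so the valuation step can be sketched as follows (assuming that doubling reading of the credit table):

```python
def credit_value(wdqm_level):
    """Credits assigned to a data quality dimension: doubling per WDQM
    level, matching the 1/2/4/8/16 values on the workshop sheets."""
    return 2 ** (wdqm_level - 1)

values = [credit_value(level) for level in range(1, 6)]  # levels 1..5
```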

Workshop Execution

In a round-table setting, the Educator process and business rules were discussed. This discussion resulted in some business rules being dropped, while others were altered.

The data quality dimensions were discussed. Care was taken not to reveal the WDQM yet.

The participants were grouped into four groups. During 15 minutes, each group discussed a sub process, assigning 20 green labels to data quality dimensions, each label corresponding with one 'credit'.

After this initial round, groups switched and validated the data quality dimensions assigned to a sub process by another group. Alterations were indicated by red labels. The total number of labels was not to exceed 20.

Finally, the results were discussed. The participants showed confidence in the results gained, yet expressed doubts regarding the way these were to be interpreted. The WDQM was discussed.

Workshop Results

The business rules were validated, in some cases altered, and agreed upon.

For sub processes, data quality dimensions were assigned (sub process, DQ dimension, requirement, WDQM level):

o Manage Digital Education Catalogue (milestone: courses are published)
  Completeness (Must have) 3
  Currency (Should have) 4
  Accuracy (Should have) 3
  Reliability 3
  Specifications (Should have) 2
  Consistency (Should have) 4

o Orientate, Select, Apply and Contract (milestone: student's activity plan is agreed upon)
  Timeliness (Must have) 4
  Reliability 3
  Completeness (Should have) 3
  Accuracy (Should have) 3
  Accountability 3

o Schedule, Study and Assess (milestone: grades are assigned)
  Accuracy (Should have) 2
  Referential Integrity (Should have) 2
  Completeness (Should have) 3
  Currency (Must have) 4
  Timeliness (Should have) 4

o Discuss Progress and Manage Study Progress (milestone: student receives a certificate)
  Completeness (Must have) 3
  Accuracy (Must have) 3
  Reliability 3
  Confidentiality (Must have) 3
  Currency (Must have) 4


Workshop Sheets Used

Availability (Beschikbaarheid)
Availability describes how long it takes before data becomes available to the participants in a business process.
Unit: time, B = delivery time - input time + age
High: measurable with a clock
Low: measurable with a calendar
Credits: 8 Required (high), 4 Desired (less high)
Depends on: -

Reliability (Betrouwbaarheid)
Reliability describes the degree to which all data managed by an information system is trusted by decision makers.
Unit: binary 1 or 0; data is trusted, or it is not
Trusted: 1
Not trusted: 0
Credits: 4 Data is trusted
Depends on: Accuracy, Completeness

Consistency, reactive (Consistentie, reactief)
Consistency describes the extent to which all data elements describe / mean the same thing. Consistency can be achieved by correcting data afterwards.
Unit: ratio 0-1, deviations relative to the total number of elements; C = deviating / total
High: 1
Low: 0
Credits: 2 Required (low), 1 Desired (less low)
Depends on: -

Consistency, proactive (Consistentie, proactief)
Consistency describes the extent to which all data elements describe / mean the same thing. Consistency can be achieved by conforming systems to an enterprise architecture.
Unit: ratio 0-1, deviations relative to the total number of elements; C = deviating / total
High: 1
Low: 0
Credits: 8 Required (low), 4 Desired (less low)
Depends on: -

Accountability (Herleidbaarheid)
Accountability describes the degree to which it can be traced who is responsible for which change to the value of data.
Unit: binary 1 or 0; changes are traceable, or they are not
Traceable: 1
Not traceable: 0
Credits: 4 Changes are traceable
Depends on: -

Integrity (Integriteit)
The term integrity here means that data must be of the very highest quality. Data has integrity if fewer than 3.2 errors occur per million data items (Six Sigma).
Unit: sigma (σ)
High: 6σ, 3.2 errors per million
Low: 3σ, 67K errors per million (93% error-free)
Credits: 16 Required (6σ), 8 Desired (3σ)
Depends on: various process indicators

Accuracy, proactive (Nauwkeurigheid, proactief)
Accuracy describes the extent to which data corresponds to reality. Accuracy can be achieved by screening data in advance.
Unit: ratio 0-1, errors relative to the total number of elements; N = erroneous / total
High: 1
Low: 0
Credits: 4 Required (low), 2 Desired (less low)
Depends on: -

Accuracy, reactive (Nauwkeurigheid, reactief)
Accuracy describes the extent to which data corresponds to reality. Accuracy can be achieved by correcting data afterwards.
Unit: ratio 0-1, errors relative to the total number of elements; N = erroneous / total
High: 1
Low: 0
Credits: 2 Required (low), 1 Desired (less low)
Depends on: -

Referential integrity (Referentiële integriteit)
Referential integrity describes the extent to which related sets are recorded in accordance with their formal relation. Referential integrity is guarded by database constraints.
Unit: ratio 0-1, errors relative to the total number of relations; R = erroneous / total
High: 1
Low: 0
Credits: 2 Required (low), 1 Desired (less low)
Depends on: -


Workshop sheets used (continued)

Specification (Specificatie)
Specification describes whether the data set and the business rules are sufficiently documented.
Unit: binary 0-1
Compliant: 1
Not compliant: 0
Credits: 2 Required (compliant), 1 Desired (almost compliant)
Depends on: -

Timeliness (Tijdigheid)
Timeliness describes the degree to which data is available and suitable for use.
Unit: undetermined, T = Volatility × Availability
High: B << wavelength of Volatility
Low: B >= wavelength of Volatility
Credits: 8 Required (high), 4 Desired (less high)
Depends on: Volatility, Availability

Accessibility (Toegankelijkheid)
Accessibility describes the degree to which access to data is obtained before the data becomes irrelevant.
Unit: ratio, T = 1 - (delivery time - input time) / (outdated time - input time)
High: 1
Low: 0
Credits: 8 Required (high), 4 Desired (less high)
Depends on: Availability

Uniqueness (Uniciteit)
Uniqueness describes the degree to which data is unambiguously obtained, stored and presented.
Unit: ratio 0-1, duplicates relative to the total number of entities; U = duplicate / total
High: 1
Low: 0
Credits: 2 Required (low), 1 Desired (less low)
Depends on: -

Confidentiality (Vertrouwelijkheid)
Confidentiality describes the degree to which all data is shielded from unauthorized use.
Unit: confidentiality rests on many measures; classification criterion: BIV coding
High: essential
Middle: important
Low: desirable
Credits: 8 Essential, 4 Important, 2 Desirable
Depends on: Availability and Integrity

Volatility (Vluchtigheid)
Volatility describes the speed at which data in the business domain changes.
Unit: frequency
High: daily (f > 5/week)
Fair: weekly (f > 5/month)
Moderate: monthly (f > 5/semester)
Low: per semester
Credits: 0 Required, 0 Desired
Depends on: -

Completeness (Volledigheid)
Completeness describes the degree to which all data required for the process has been recorded.
Unit: ratio 0-1, missing data relative to the total; V = missing / total
High: 1
Low: 0
Credits: 4 Required (low), 2 Desired (less low)
Depends on: Completeness may be at odds with Timeliness
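Several of the sheet metrics above are simple ratios. A minimal sketch of how they might be computed (function and parameter names are ours, not from the sheets):

```python
def completeness(missing, total):
    """V = missing / total (sheet 'Completeness'); required value is low."""
    return missing / total

def accuracy(errors, total):
    """N = erroneous / total (sheet 'Accuracy'); required value is low."""
    return errors / total

def accessibility(input_t, delivery_t, outdated_t):
    """T = 1 - (delivery - input) / (outdated - input) (sheet
    'Accessibility'); values near 1 mean data arrives well before it
    becomes irrelevant."""
    return 1 - (delivery_t - input_t) / (outdated_t - input_t)

v = completeness(missing=5, total=100)          # 5% of records missing
t = accessibility(input_t=0, delivery_t=2, outdated_t=10)
```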


6.12 Business rules according to the Windesheim Educational Standards

The Windesheim Educational Standards (Iersel, Loo, Serail, & Smulders, 2009) identify a set of business rules in the form of high level descriptions, guiding the behavior of an organization (Agrawal, Calo, Lee, Lobo, & Verma, 2008).

The educational model is student centered and competence based.
Students will be offered a broad set of choices.
Students will be guided in acquiring internationally accepted qualifications (CROHO26 competences).
Students will be guided in acquiring nationally accepted generic domain competences.
Students will be coached during their study.
A school offers one or more educational programmes.
The effort an educational programme requires is measured in EC (European Credits).
A programme will be constructed using one major and two minors.
A major is a set of courses and workshops.
A major defines the mandatory part of a programme of education.
A major is 180 EC in size.
A minor is a set of courses and workshops.
A minor defines the optional part of a programme of education.
A minor is 30 EC in size.
At least one minor will result in the student having completed the first cycle (bachelor level).
A course is defined as an onderwijseenheid (OE).
The maximum size of an onderwijseenheid is 30 EC.
The minimum size of an onderwijseenheid is advised to be 3 EC.
Every onderwijseenheid will result in at least one variant (VOE).
Onderwijseenheden are clustered into a semesterplan.
Variants of an onderwijseenheid are clustered into a semestervariantplan.
Students are free to choose minors from within their programme of education, from another programme of education, or from another institution, nationally or internationally.
Programmes may restrict the choice of minors, based on their contribution to the competences to be acquired.
Assessments are competence based.
Competence based assessments observe students' knowledge, insights, skills and attitude.
Every onderwijseenheid is concluded by an assessment.
An onderwijseenheid is either project-based or theoretical in nature.
Every programme has a propaedeutic phase.
The propaedeutic phase has a size of 60 EC.
The propaedeutic phase is concluded with a propaedeutic assessment.
A student is advised whether or not to continue his study, based on the results of the propaedeutic assessment.
The advice is a mandatory opinion.
Windesheim supports the Associate degree (Ad).
The effort to acquire an Associate degree is at least 120 EC.
Windesheim supports the second cycle (Master degree).
Education in the second cycle has no major/minor structure.
During his study, the student will receive personal guidance.
Effort required for personal development will amount to 8 EC at least and 16 EC at most.

26 Centraal Register Opleidingen Hoger Onderwijs: Central Registration of Schools in Higher Education


Personal development will be assessed.
Windesheim offers part-time studies and courses.
A part-time study does not necessarily have a major/minor structure.

6.13 Detailed Business Rules

Manage Digital Education Catalogue

1. When the development of a course is completed, it will be described in the Digital Education Catalogue.
2. When a course is described in the Digital Education Catalogue, it will be assigned to a semesterplan.
3. When a course is described in the Digital Education Catalogue, it will be assigned to a major or a minor.
4. When a course is described in the Digital Education Catalogue, for each type of education (daytime education, part-time education) a variant will be described.
5. When a variant of a course is described in the Digital Education Catalogue, it will be assigned to a variant semesterplan.

Orientate

6. When a student engages a new semester, he will work on his Personal Activity Plan (PAP).
7. When a student works on his PAP, he may use the Digital Education Catalogue as a source to choose from.

Select

8. When a student works on his PAP, he may choose semester variant plans from the Educational Catalogue and add them to his PAP, thus creating an individual study programme.
9. When a student is enlisted in a study, the mandatory major of his programme will have to be executed first.
10. A student's personal activity plan in Educator may not be managed by the student himself. It may actually be managed by the back-office of a School27.

Apply

11. When a student adds a semester variant plan to his PAP, including only minors offered by the programme the student initially enlisted for, the addition is agreed upon automatically.
12. When a student adds a semester variant plan to his PAP, including minors offered by programmes other than the one the student initially enlisted for, an examination committee will have to agree first.
13. When a minor is either full or cancelled, the student may have to choose another semester variant plan for his PAP.

27 As identified in the workshop of 30-03-2010, see appendix 6.11


Contract

14. When a PAP is agreed upon, and the minor(s) selected by the student is/are still available and not booked already, the PAP is finalized.

Schedule

15. When the execution of minors is agreed upon, a schedule is created by individual or collaborating Schools.
16. When a schedule is created, it takes into account the number of students attending a course, the specific characteristics and educational needs of a course (type and size of classrooms and equipment), the availability of teaching staff assigned to the course, and the order in which courses are to be scheduled.
17. When the schedule is finalized, it is published.

Study

18. When the student is working on his study, he will create a portfolio.
19. When the student is working on his study, he may work with other students on a project.
20. When students work in projects, they will share items in their portfolio.

Assess

21. When an item in a portfolio is ready for assessment, the student will transfer ownership of that item to the teacher.
22. When an item is assessed, a grade will be assigned to it.
23. When a grade is assigned to an item, it may no longer be changed.
24. When all assessments of a course are finalized, the end result will be calculated.
25. When an end result is calculated, rules as defined in the Digital Course Catalog for the course at hand are executed.
26. When all results exceed the minimal requirements as defined in the Digital Course Catalog for the course, the student is granted the European Credits (EC) associated with this course and as defined in the Digital Course Catalog.
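Rules 24-26 can be sketched in code. The averaging used for the end result is purely illustrative, since the actual calculation rules live in the Digital Course Catalog per course:

```python
def grant_credits(results, minimum, ec):
    """Rules 24-26, sketched: calculate a course's end result and grant
    its EC only when every assessment result meets the minimum defined
    in the Digital Course Catalog (averaging is an illustrative stand-in
    for the catalogue's actual calculation rules)."""
    end_result = sum(results) / len(results)           # rule 24/25
    granted = ec if all(r >= minimum for r in results) else 0  # rule 26
    return end_result, granted

end, credits = grant_credits(results=[7.0, 8.0], minimum=5.5, ec=5)
```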

Discuss Progress

27. When a semester is finished, the student's progress is discussed.
28. When a student fails to collect the required ECs during the propaedeutic phase within a limited period, the student is not allowed to continue his study at Windesheim.
29. In some cases, when the student has collected 120 EC, an Associate degree may be assigned.
30. When the student has collected 210 EC, the (final) graduation minor may be started.
31. When the student has executed the graduation minor successfully, the first cycle is completed and a Bachelor's degree is assigned.
32. When a student wishes to earn a Master degree, he may engage in a study for the second cycle.
33. When, while studying in the second cycle, the student collects a minimum of 60 EC, the second cycle is completed and the Master degree is granted.


Manage Study Progress

34. When a product from a student is assessed and graded, the grade is stored digitally and made available to the student.
35. When credits are granted to a student, these credits are stored digitally and made available to the student.
36. When a first attempt to be assessed is not successful, a second assessment will be offered during the same study year.
37. When a course is changed between assessments, the rules and number of credits associated with the course the student originally attended apply.

6.14 Project Flow

While discussing the project's progress, the constituent of the project drew a map representing the flow of the project as he visualized it. Being an accurate description of this project, this flow was agreed to be documented in order to be able to discuss progress in the future. This appendix contains the constituent's vision on the flow of the project.

(Figure: project flow diagram, connecting Theory, the WDQM model, Data Quality practices, Practice, current Issues, desired Solutions, and Data Quality Maturity.)

What can be seen is that, based on theories on data quality and maturity, a data quality maturity model is created. This model is used to investigate issues as experienced in current practice. Data quality practices, defined by the model, describe a desired situation in terms of solutions. This, finally, leads to new information adding to the body of knowledge (theory).


6.15 Literature

Agrawal, D., Calo, S., Lee, K.-W., Lobo, J., & Verma, D. (2008). Policy Technologies for Self-

Managing Systems. Boston: IBM Press.

Ahern, D. M., Clouse, A., & Turner, R. (2008). CMMI® Distilled: A Practical Introduction to

Integrated Process Improvement, Third Edition. Boston: Pearson Education, Inc.

Arvix. (2009). Wacht u tot de rookmelder afgaat. Retrieved november 8, 2009, from www.arvix.com:

http://www.arvix.com/user_files/file/wacht_u_tot_de_rookmelder_af_gaat_v12_web.pdf

Baida, Z. S. (2002). Architecture Visualization, Master Thesis in Computer Science. Amsterdam: VU

University.

Bakker, J. G. (2006). De (on)betrouwbaarheid van informatie, je staande houden in het

informatiegeweld. Benelux: Pearson Education.

Batini, C., & Scannapieco, M. (1998). Data Quality, Concepts, Methodologies and Techniques. New

York: Springer Berlin Heidelberg.

Besouw, F. v. (2009). Samenhang tussen bedrijfsregels, bedrijfsprocessen en gegevenskwaliteit.

Retrieved november 8, 2009, from Arvix:

http://www.arvix.com/user_files/file/samenhang_bedrijfsregels_bedrijfsprocessen_gk.pdf

Boer, S. d., Andharia, R., Harteveld, M., Ho, L. C., Musto, P. L., & Prickel, S. (2006). Six Sigma for

IT Management. Zaltbommel: Van Haren Publishing.

Boterenbrood, F., Hoek, J. W., & Kurk, J. (2005). De Informatievoorzieningsarchitectuur als

scharnier. Den Haag: Academic Service.

Broers, H. (2007). Onrust in de wijngaard, de wording van Windesheim. Zwolle: Waanders.

Caballero, I., & Piattini, M. (2003). CALDEA: A Data Quality Model Based on Maturity Levels.

Proceedings of the Third International Conference On Quality Software (pp. 380-387).

Washington: IEEE Computer Society.

Caluwé, L. d., & Vermaak, H. (2006). Leren veranderen, Een handbioek voor de veranderkundige.

Deventer: Kluwer.

Champlin, B. (2002, 01 14). Beyond The CMM: Why Implementing the SEI's Capability Maturity

Model Is Insufficient To Deliver Quality Information Systems in Real-World Corporate IT

Organizations. Retrieved 02 07, 2010, from DAMA Michigan: www.dama-michican.org

Chen, P. (1976). The Entity Relationship Model: Toward a Unified View on Data. ACM Transactions

on database systems , 166 - 193.

Conway, S. D., & Conway, M. E. (2008). Essentials of Enterprise Compliance. Hoboken, New

Jersey: John Wiley & Sons.

Curtis, B., Hefley, W. E., & Miller, S. A. (2009). People CMM: A Framework for Human Capital

Management, Second Edition. Boston, MA: Pearson Education, Inc.

15-Apr-23 F. Boterenbrood Page 11015-Apr-23 F. Boterenbrood Page 110

Research Improving data quality in higher educationThesisImproving data quality in higher education

Data Quality Task Force. (2004, 12). Forum Guide to Building a Culture of Quality Data. Retrieved

11 30, 2009, from ies national center for educational statistics:

http://nces.ed.gov/forum/pub_2005801.asp

Davis, J. (2009). Open Source SOA. Greenwich: Manning Publications.

English, L. P. (2009). Information Quality Applied: Best Practices for Improving Business

Information, Processes and Systems. Indianapolis: John Wiley & Sons.

European Commission. (2005). The Framework of Qualifications for the European Higher Education

Area. Retrieved 03 10, 2010, from The official Bologna Process Website:

http://www.ond.vlaanderen.be/hogeronderwijs/bologna

Fishman, N. A. (2009). Viral Data in SOA: An Enterprise Pandemic. Boston: Pearson plc publishing

as IBM Press.

Friedman, T. (2009, 09 09). Gartner Webinar: Data Quality Do’s and Don'ts. Retrieved 02 10, 2010,

from Gartner: www.gartner.com

Gack, G. A. (2009). Connecting Six Sigma to CMMI Measurement and Analysis. Retrieved 12 9,

2009, from i Six Sigma: http://software.isixsigma.com/library/content/c050316b.asp

Gartner. (2007, 02 07). Gartner's Data Quality Maturity Model. Retrieved 02 10, 2010, from Gartner

Research: http://my.gartner.com

Goodhue, D. L., Wybo, M. D., & Kirsch, L. J. (sept 1992). The Impact of Data Integration on the

Costs and Benefits of Information Systems. MIS Quarterly, Vol. 16, No. 3 , 293-311.

Graham, I. (2007). Business Rules Management and Service Oriented Architecture. Hoboken: John

Wiley & Sons.

HBO-raad Lectorenplatform. (2006). Lectoren bij hogescholen. Diemen: Villa Grafica.

Hendriks, P. (2000). De noodzaak van een nieuwe norm voor procesverbetering? Wat behelst ISO

15504 - SPICE? Retrieved 12 9, 2009, from Esprit project no 27700:

http://www.serc.nl/espinode/informatie/SPICE.htm

Hoermann, K., Mueller, M., Dittmann, L., & Zimmer, J. (2008). Automotive SPICE in Practice:

Surviving Interpretation and Assessment. Santa Barbara: Rocky Nook.

Hope, G., & Woolf, B. (2008). Enterprise Integration Patterns. Boston: Pearson Education, Inc.

Iersel, J. v., Loo, F. v., Serail, I., & Smulders, L. (2009). Windesheim Onderwijs Standaard versie 5.0.

Zwolle: Windesheim.

Jansen, J. (2006). Domeinarchitectuur vraaggestuurd onderwijs Windesheim. Zwolle: Windesheim.

Johnson, E., & Jones, J. (2008). A Developer’s Guide to Data Modeling for SQL Server: Covering

SQL Server 2005 and 2008. Boston: Pearson Education, Inc.

Kneuper, R. (2008). CMMI: Capability Maturity Model Integration A Process Improvement

Approach. Santa Barbara, CA: Rocky Nook.

15-Apr-23 F. Boterenbrood Page 11115-Apr-23 F. Boterenbrood Page 111

Research Improving data quality in higher educationThesisImproving data quality in higher education

Kovac, R., Lee, Y. W., & Pipino, L. L. (1997, 10). Total Data Quality Management: The Case of IRI.

Retrieved 02 24, 2010, from The MIT Total Data Quality Management Program:

http://web.mit.edu

Lankhorst, M. (2005). Enterprise Architecture At Work. Berlin: Springer-Verlag Berlin and

Heidelberg GmbH & Co. KG.

Lee, Y. W., Pipino, L. L., Funk, J. D., & Wang, R. Y. (2006). Journey to Data Quality. Cambridge,

Massachusetts: The MIT Press.

Loshin, D. (2001). Enterprise Knowledge Management, the data quality approach. San Diego:

Academic Press.

Loshin, D. (2008). Master Data Management. Burlington: Morgan Kaufmann OMG Press.

Marble, R. P. (1992). A stage theoretic approach of information system planning in existing entities of

recently established market economies. Retrieved 11 11, 2009, from System Dynamics

Society: http://www.systemdynamics.org/conferences/1992/proceed/pdfs/marbl405.pdf

McGilvray, D. (2008). Executing Data Quality Projects. Burlington, MA: Elsevier, Inc.

Mosley, M. (2008). DM BOK: Data Management Body of Knowledge. Retrieved 11 07, 2009, from

Data Management International: www.dama.org

Nolan, R. (1979, March-April). Managing the crisis in data processing. Harvard Business Review, no.

79206.

Object Management Group. (2008, 06 01). Business Process Maturity Model (BPMM). Retrieved 12

9, 2009, from Object Management Group: http://www.omg.org/spec/BPMM/

Olle, T. W. (1978). The Codasyl Approach to Data Base Management. New York: John Wiley &

Sons.

Pant, K., & Juric, M. (2008). Business Process Driven SOA using BPMN and BPEL: From Business

Process Modeling to Orchestration and Service Oriented Architecture. Birmingham: Packt

Publishing.

Pascale, R., Peters, T., & Waterman, R. (2009). McKinsey's 7-s framework model. Retrieved 12 9,

2009, from Value Based Management.net:

http://www.valuebasedmanagement.net/methods_7s.html

Porter, M., & Millar, V. (1985, July-August). How information gives you competitive advantage.

Harvard Business Review.

Project Management Institute. (2008). Organizational Project Management Maturity Model OPM3.

Newtown Square, Pennsylvania: Project Management Institute.

Riet, P. v. (2009, 10). Knelpunten in de plannings- en roosteringsprocessen van de hogescholen.

Retrieved 02 18, 2010, from Lectoraat ICT en Onderwijsinnovatie: www.licto.nl

Ryu, K.-S., Park, J.-S., & Park, J.-H. (2006). A Data Quality Management Maturity Model. ETRI

Journal, vol. 28, no. 2, 191-204.

Schumacher, M., Fernandez-Buglioni, E., Hybertson, D., Buschmann, F., & Sommerlad, P. (2006).

Security Patterns, Integrating Security and System Engineering. Chichester: John Wiley & Sons

Ltd.

Software Engineering Institute. (2009). Capability Maturity Model Integration Overview. Retrieved

12 9, 2009, from Software Engineering Institute / Carnegie Mellon:

http://www.sei.cmu.edu/cmmi/

Starreveld, R., Leeuwen, O. v., & Nimwegen, H. v. (2004). Bestuurlijke informatieverzorging deel 2a

- Fasen van de waardekringloop. Leiden: Stenfert Kroese.

Tan, D. (2003). Van Informatie management naar Informatie Infrastructuur management. Leiderdorp:

Lansa Publishing.

Treacy, M., & Wiersema, F. (1997). The Discipline of Market Leaders: Choose Your Customers,

Narrow Your Focus, Dominate Your Market. New York: Perseus Books.

Vermeer, B. H. (2001). Data Quality and Data Alignment in E-business. Eindhoven: CIP-Data

Library Technische Universiteit Eindhoven.

Verreck, O., Graaf, A. d., & Sanden, W. v. (2005, August). Meten en verbeteren van

gegevenskwaliteit. Tiem, 9, pp. 36-42.

Vught, F. A., & Huisman, J. (2009). Mapping the Higher Education Landscape. Dordrecht: Springer

Science + Business Media B.V.

Windesheim. (2004). IT Architectuur 2004 ICT v3.doc. Zwolle: Windesheim dienst ICT.

Zee, P. d. (2001). Business Transformatie en IT, Vervlechting en ontvlechting van ondernemingen en

informatietechnologie. Retrieved 11 11, 2009, from Management en Consulting:

http://managementconsult.profpages.nl/man_bib/ora/vanderzee01.pdf

Zeist, B. v., Hendriks, P., Paulussen, R., & Trieneken, J. (1996). Kwaliteit van Softwareprodukten -

Ervaringen met een kwaliteitsmodel. Deventer: Kluwer Bedrijfswetenschappen.

6.16 List of figures and tables

Figure 01: Windesheim Context Diagram...............................................................................................4

Figure 02: The Windesheim application landscape...............................................................................11

Figure 03: IT service department and system integration organization.................................................11

Figure 04: Nolan’s stage model.............................................................................................................15

Figure 05: Eras and discontinuities (Zee, 2001)...................................................................16

Figure 06: Project stakeholders..............................................................................................................20

Figure 07: Research Model....................................................................................................................25

Figure 08: Concepts Used......................................................................................................................28

Figure 09: Graphical representation of the WDQM...................................................................40

Figure 10: A Data Quality Management Maturity Model (Ryu, Park, & Park, 2006)...........................41

Figure 11: Related Dimensions..............................................................................................................53

Figure 12: Domain architecture student centered education Windesheim.............................................55

Table 01: Stakeholder analysis...............................................................................................................20

Table 02: Research Material..................................................................................................................29

Table 03: Practices and structure, process, technology, information and staff......................................36

Table 04: A combined view on maturity................................................................................................37

Table 05: Windesheim Data Quality Maturity model WDQM..............................................................38

Table 06: A combined view on the WDQM and the Gartner Data Quality Maturity model.................43

Table 07: An overview of data quality dimensions................................................................................45

Table 08: Dimensions of data quality....................................................................................................46

Table 09: WDQM Goals expressed in Data Quality Dimensions, Practices and Attributes..................52

Table 10: Current data quality dimension values...................................................................................60

Table 11: Data quality dimension assessment workshop results...........................................................61

6.17 Glossary

Accessibility Ease of attainability of the data

Accountability The property that actions affecting enterprise assets can be traced to the actor responsible for the action

Accuracy Closeness of the value of the data to the value in the real world

Business Rules A set of high level descriptions, guiding the behavior of an organization

Business rule matching Comparing data values found in a database with valid values according to business rules

Canonical Data Model A thesaurus of all data being exchanged between systems

Completeness The degree to which elements are not missing from a set

Confidentiality The property that data is disclosed only as intended by the enterprise

Consistency The degree to which values and formats of data elements are in line with semantic rules over this set of data items

Correctness The degree to which values and formats of data elements are in line with the current state of an object in the physical world represented by the data

COTS Commercial Off-The-Shelf. An acronym for packaged, ready-to-use applications

CRUD services Create, Retrieve, Update and Delete data manipulation services

Currency Concerns how promptly data are updated

Data profiling A set of algorithms for statistical analysis and assessment of the quality of data values within a data set, as well as for exploring relationships that exist between value collections within and across data sets

Discontinuity A change in values perceived to be a setback

DMAIC Quality improvement cycle including Define, Measure, Analyze, Improve and Control phases

Endless loop See loop, endless

Information Data, fit for use, available in a context

Input check A control guarding data quality when entered

IP Information Product

Integrity, Data The degree to which data is fit for use

Integrity, Referential The degree to which related sets of data are consistent

Latency Idle time in a process

Loop, endless See endless loop

MDM Master Data Management maturity model

New data acquisition An activity in which suspect data is replaced by newly retrieved data

Overloading Assigning values to a variable, indicating a system state the variable was not originally intended to signal

OPM3 Organizational Project Management Maturity Model

Process A set of business rules, started by a single trigger, when executed results in a predictable outcome

Process area A cluster of related practices, part of a maturity level

Reliability The degree to which data is perceived to represent reality

Root cause analysis A technique to identify the underlying root cause: the primary source of the problems experienced

ROTAP Research, Ontwikkel (Develop), Test, Accept and Production environments

Schema cleaning Transforming a conceptual schema in order to achieve or optimize a given set of qualities

Schema matching To create a mapping between semantically correspondent elements of two database schemas

Service center Department within an organization supporting the main business processes

SOA Service Oriented Architecture

Source Rating Assessing sources on the basis of the quality of the data they provide to other sources

Specifications A measure of the existence, completeness, quality and documentation of data standards

Staff Personnel involved in a process

Structure Describes the way an organization is structured

Technology Tooling required to execute a process

TDQM Total Data Quality Management

Timeliness Or Availability: A measure of the degree to which data are current and available for use

TIQM Total Information Quality Management

Uniqueness Refers to requirements that entities are captured, represented, and referenced uniquely

Volatility Characterizes the frequency with which data vary in time

WDQM Windesheim Data Quality Maturity Model
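Two of the concepts above, input checks and business rule matching, lend themselves to a brief illustration. The sketch below is not part of the thesis; the rule names and field values are hypothetical. It shows an input check implemented as business rule matching: data values entered into a record are compared with valid values according to a small set of business rules.

```python
# Hypothetical business rules: each field name maps to a predicate that
# returns True when the value is valid according to the rule.
RULES = {
    "study_points": lambda v: isinstance(v, int) and 0 <= v <= 240,
    "email": lambda v: isinstance(v, str) and "@" in v,
    "enrolment_year": lambda v: isinstance(v, int) and 1986 <= v <= 2010,
}

def check_record(record):
    """Input check: return the fields whose values violate a business rule."""
    return [field for field, rule in RULES.items()
            if field in record and not rule(record[field])]

# A record with an out-of-range value fails the input check on that field.
violations = check_record({"study_points": 300, "email": "s123456@windesheim.nl"})
```

In a near zero-latency setting such checks would run at the moment of data entry, preventing poor-quality values from propagating between business domains.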
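Likewise, data profiling and referential integrity can be sketched in a few lines (again hypothetical, not taken from the thesis): a simple statistical profile of a value collection, and a check that related sets of data are consistent by detecting child records that reference no existing parent.

```python
from collections import Counter

def profile(values):
    """Data profiling: basic statistics over a collection of values."""
    counts = Counter(values)
    return {
        "count": len(values),
        "distinct": len(counts),
        "nulls": sum(1 for v in values if v is None),
        "most_common": counts.most_common(1)[0][0] if values else None,
    }

def referential_violations(child_keys, parent_keys):
    """Referential integrity: child keys without a matching parent key."""
    return sorted(set(child_keys) - set(parent_keys))

# Profile a column of country codes, then look for orphan references.
stats = profile(["BE", "NL", "NL", None, "DE"])
orphans = referential_violations([1, 2, 5], [1, 2, 3, 4])
```

A profile like this exposes nulls and unexpected value distributions before a migration; the orphan check is the simplest form of consistency between related data sets.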
