Organizational maturity and data quality
Research Thesis: Improving data quality in higher education
Management Summary
Windesheim aims to become a near zero-latency organization: an organization that is able to respond
promptly to events in its environment. However, unexpected errors hinder the implementation of near
zero-latency business process technologies. These errors are caused by poor data quality. The main
business problem triggering this research is that poor data quality inhibits Windesheim’s ability to
become a near real-time organization. Closer examination reveals a serious business impact of poor
data quality, characterized by student (customer) dissatisfaction, inefficient process execution, loss
of image and loss of control. Earlier research revealed that poor data quality is caused by applications
not checking input values, and by information objects having different values and definitions in different
business domains, which in turn is caused by a departmental view on data instead of a more holistic,
business-process-wide view on information.
The migration of a departmental view on data towards a holistic view on information is characterized
as a growth in maturity. Not only does the impact of rapid changes in technology force Windesheim to
grow in maturity, the migration from the data processing era to the information era is driven by
international developments too. As part of this migration, a natural crisis, the technological
discontinuity has to be overcome. In this research, the relation between organizational maturity and
data quality is captured in an instrument that predicts the required organizational change. The (CMMI-
based) instrument defines five levels of data quality maturity, ranging from 1) Initial through 2)
Managed, 3) Defined and 4) Quantitatively Managed to 5) Optimizing. For each level, process areas,
goals and metrics are defined, based on proven theories.
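As an illustration, the level/process-area/goal structure of such a CMMI-style instrument can be sketched as a simple data structure with an assessment function. The process areas and goals below are hypothetical examples chosen for the sketch, not the actual contents of the instrument.

```python
# A minimal sketch of a CMMI-style data quality maturity model.
# The level names follow CMMI; the process areas and goals shown
# here are hypothetical illustrations, not the instrument's actual contents.

LEVELS = {
    1: "Initial",
    2: "Managed",
    3: "Defined",
    4: "Quantitatively Managed",
    5: "Optimizing",
}

# level -> {process area -> list of goals}
MODEL = {
    2: {"Data Quality Monitoring": ["Reports on data quality exist"]},
    3: {"Input Validation": ["Applications check input values"],
        "Definition Management": ["Information objects share one definition"]},
    4: {"Quantitative Control": ["Deadlines are set and measured"]},
}

def assess(achieved_goals: set[str]) -> int:
    """Return the highest maturity level whose goals, and those of all
    lower levels, are all achieved; level 1 (Initial) requires nothing."""
    level = 1
    for lvl in sorted(MODEL):
        goals = [g for area in MODEL[lvl].values() for g in area]
        if all(g in achieved_goals for g in goals):
            level = lvl
        else:
            break
    return level
```

In this sketch an organization that only produces data quality reports would assess at level 2 (Managed), mirroring the staged logic of the instrument: each level presupposes the goals of the levels beneath it.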
Using this instrument, and for Windesheim’s main business process, study management, current and
required data quality and corresponding organizational maturity were investigated. Current and
required maturity levels were assessed by observing process areas and goals currently implemented
and linking required goals with the business rules of study management. It was found that, currently, in
this domain, Windesheim has reached data quality maturity level one, and that to both satisfy the business
rules and become a near zero-latency organization, data quality maturity level three is a minimal
prerequisite. Some data quality maturity level four goals will have to be reached as well.
To reach the goals required, a three-stage migration path is recommended1:
1. Reach data quality maturity level two (Managed) first, by repairing the current database and
creating reports for data quality monitoring purposes by means of well-defined projects;
2. Reach data quality maturity level three (Defined) by putting a lasting programme in place,
adapting Educator (preventing errors by checking data quality at the input functions and
simultaneously reducing complexity by simplifying functionalities), empowering staff,
making teachers responsible for the complete process cycle, creating near real-time interfaces
based on standard application interfaces, and handling the technological discontinuity;
3. Implement required level four (Quantitatively Managed) goals by establishing and
communicating strict deadlines within the study process.
This will clear the way for Windesheim to become a near zero-latency organization, improve study
management process efficiency, reduce the cost of error detection and recovery, and improve customer
(student) satisfaction. Taking into account the benefits of this outcome for Windesheim, I advise
management to implement the recommendations made in this research.
1 Detailed advice is available in paragraph 5.6.2
15-Apr-23 F. Boterenbrood Page 2
The Organization
Windesheim is a university of professional education, located in Zwolle, currently serving more
than 17,000 students. The organization is controlled by the Board of Directors, which directs the
departmental management. The number of employees is 1,800, 900 of whom are teaching staff. At
board level, Windesheim and VU University Amsterdam are closely related. As a result of this
cooperation, Windesheim offers some master’s programmes in Zwolle, and has recently started the
Honours College Zwolle, a college aimed at serving international and ‘high potential’ students2.
[Context diagram elements: Board of Directors, VU-Windesheim cooperation, 11 schools, 6 service departments, accreditation, students, business partners, collaborating schools]
Figure 01: Windesheim Context Diagram
2 Instellingsplan 2007 – 2012, Besluit nummer 441 College van Bestuur van Windesheim
Parties involved
Author: Frank Boterenbrood
Waardeel 1f
8332 BB Steenwijk
E-mail: frank@boterenbrood.com
Supervisor: Albert Paans
E-mail: a.paans@windesheim.nl
Supervisor: Rob Keemink
E-mail: r.keemink@windesheim.nl
Preface
Surely, it is hard to find anything less inspiring than data quality. But look at it this way: there the
data sits in the application’s database, waiting to be retrieved, combined, processed and
transformed into useful information. This is its moment of glory, the moment when it shines at the
user interface, or perhaps even on a management dashboard, delivered by information services
in conformance with service level agreements and processed by applets and modules, following well-established
and glorious architectural patterns and styles, only to find that it is in error, flawed, outdated,
misplaced ….
Data, most literally, is the foundation on which information systems are built, much as piling creates a
foundation for (Dutch) houses. There is nothing sexy about a concrete pillar. It is hammered into the
ground and remains invisible for eons to come. However, if it is not there, or if there is something
wrong with it, the construction it is supposed to support will inevitably come tumbling down.
Today, every business operation relies on its information systems. And with these information
systems, organizations create and consume immense amounts of data. If the data are flawed, time and
money may be lost in equally large quantities, causing at the very least embarrassment and loss of
reputation. Today, every business, every leader, every consumer has a vested interest in the quality of data.
This is true for Windesheim too. This research investigates the relation between data quality and the
maturity of an organization, in particular the maturity of a higher education organization. Yet the
results are not confined to education. What has been found here may well be applicable in other
organizations. It is my hope, therefore, that this research may contribute to improved data quality in a
much broader context. For when data is flawed, no investment in modern and exciting technologies
can undo the damage, while once data is fit for use, or has a quality even beyond that, the
capabilities of data to support and improve business are hard to overestimate.
Acknowledgements
First, I thank my beloved spouse Carin, who in the past years has supported me in my study efforts
by enduring many hours of loneliness and reduced attention.
I would like to thank Rob Keemink, who has invested a large amount of time and money in my
study, and defended this investment despite many financial cut-backs and management
discussions.
I also thank Albert Paans, who was assigned the burden of being the official constituent
for this research, and invested a lot of his time in studying and debating the results I put forward,
which greatly contributed to the quality of the research.
I would like to thank Maarten Westerduin, for trusting me not to lose track of the Windesheim School
of Information Sciences priorities.
Also, I would like to extend my gratitude to my colleagues of Bedrijfskundige Informatica, who
on so many occasions enabled my study and graduation by taking on extra duties where I was not able
to fulfill them.
And last but most certainly not least, I would like to thank Marlies van Steenbergen, Theo Thiadens
and Arjen de Graaf for the time they invested in, and the light they shone on, the WDQM and data quality
in higher education in general.
1. Table of contents
Management Summary...........................................................................................................2
The Organization.....................................................................................................................3
Parties involved.......................................................................................................................4
Preface.....................................................................................................................................5
1. Table of contents............................................................................................................6
2. Exploring data quality in higher education....................................................................9
2.1 Project Introduction........................................................................................9
2.1.1 Windesheim’s Mission..............................................................9
2.1.2 Windesheim’s Information Technology.................................10
2.2 Business Problem description......................................................................11
2.2.1 Indications...............................................................................11
2.2.2 Consequences..........................................................................11
2.2.3 Business Problem....................................................................12
2.3 Cause analysis..............................................................................................12
2.3.1 Technical / functional causes..................................................13
2.3.2 Process design causes.............................................................13
2.3.3 Organizational causes.............................................................13
2.3.4 Growing pains.........................................................................14
2.3.5 Perspective..............................................................................15
2.3.6 Past, current and future situation............................................15
2.3.7 Summary.................................................................................17
2.4 Research Problem.........................................................................................17
2.5 Stakeholder Analysis....................................................................................18
2.6 Project Relevance.........................................................................................20
2.6.1 Stakeholder Relevance............................................................20
2.6.2 Business Relevance.................................................................20
2.6.3 Relevance to Science..............................................................20
3. Conceptual Research Design........................................................................................21
3.1 Theoretical approach and focus....................................................................21
3.1.1 Focus.......................................................................................21
3.1.2 Maturity revisited....................................................................21
3.1.3 A vision on Maturity...............................................................22
3.1.4 What is data quality?...............................................................22
3.1.5 A vision on Data Quality........................................................24
3.2 Research Goal...............................................................................................24
3.3 Research Model............................................................................................24
3.4 Research Questions......................................................................................25
3.4.1 Main questions........................................................................25
3.4.2 Sub questions for main question 1..........................................25
3.4.3 Sub questions for main question 2..........................................26
3.4.4 Sub questions for main question 3..........................................26
3.4.5 Sub questions for main question 4..........................................26
3.5 Concepts used...............................................................................................27
4. Technical Research Design..........................................................................................28
4.1 Research Material.........................................................................................28
4.2 Research Strategy.........................................................................................29
4.2.1 Strategy...................................................................................29
4.2.2 Reliability................................................................................29
4.2.3 Validity...................................................................................29
4.2.4 Scope.......................................................................................29
5. Research Execution......................................................................................................30
5.1 Correlation between data quality and maturity............................................30
5.1.1 Maturity, a brief history..........................................................30
5.1.2 Maturity levels........................................................................30
5.1.3 Process Areas..........................................................................31
5.1.4 Identifying relevant process areas...........................................32
5.1.5 Windesheim Data Quality Maturity Model............................37
5.1.6 Alternative views on data quality maturity.............................40
5.1.7 Conclusion..............................................................................43
5.2 Data Quality Attributes................................................................................43
5.2.1 Dimensions of data quality.....................................................43
5.2.2 Data Quality Dimensions Discussed.......................................45
5.2.3 WDQM Goals.........................................................................50
5.2.4 (Time)related dimensions.......................................................52
5.3 Business rules...............................................................................................53
5.3.1 Business rules, a definition.....................................................53
5.3.2 Study management..................................................................54
5.3.3 Business rule mining...............................................................55
5.4 Current data quality maturity level study management domain...................55
5.4.1 Interview results......................................................................56
5.4.2 Current Maturity.....................................................................56
5.4.3 Current data quality dimension’s attribute values..................57
5.4.4 Conclusion..............................................................................59
5.5 Required data quality maturity level study management domain................60
5.5.1 Workshop results....................................................................60
5.5.2 Discussion...............................................................................61
5.5.3 Initial Research Problem.........................................................61
5.5.4 A data quality maturity level three (Defined) organization....62
5.5.5 Level 4 (quantitatively managed) requirements.....................62
5.6 Growing from current to required maturity..................................................63
5.6.1 Gap analysis............................................................................63
5.6.2 Migration.................................................................................65
5.7 Concluding...................................................................................................68
5.7.1 Conclusion..............................................................................68
5.7.2 Recommendations...................................................................69
5.7.3 Stakeholder Value...................................................................70
5.7.4 Achieved Reliability and Validity..........................................70
5.7.5 Scientific Value and Innovativeness.......................................71
5.7.6 Generalisation.........................................................................71
5.7.7 Research Questions Answered................................................71
5.7.8 Recommendation on further research.....................................73
5.7.9 Reflection................................................................................74
6. Appendices...................................................................................................................75
6.1 Interview Report Windesheim Integration Team.........................................75
6.2 Interview Report WDQM Marlies van Steenbergen....................................77
6.3 Interview Report Data Quality in Education Th. J.G. Thiadens.................78
6.4 Interview WDQM dimensions Report Arjen de Graaf................................80
6.5 Interview report Current Data Quality Educator Gerrit Vissinga................81
6.6 Interview report Current Data Quality Educator Gert IJszenga...................83
6.7 Interview report Current Data Quality Educator Gert IJszenga Continued. 84
6.8 Interview report Current Data Quality Educator Klaas Haasjes..................86
6.9 Interview report Current Data Quality Educator Louis Klomp....................87
6.10 Interview report Current Data Quality Educator Viola van Drogen............89
6.11 Data Quality Workshop................................................................................91
6.12 Business rules according to the Windesheim Educational Standards..........95
6.13 Detailed Business Rules...............................................................................96
6.14 Project Flow.................................................................................................98
6.15 Literature......................................................................................................99
6.16 List of figures and tables............................................................................103
6.17 Glossary......................................................................................................103
2. Exploring data quality in higher education
2.1 Project Introduction
2.1.1 Windesheim’s Mission
Windesheim’s mission statement: “As an institution in higher education in the Netherlands,
Windesheim offers a broad choice and is foremost a social venture. Windesheim is a community in
which active and knowledgeable individuals meet. Windesheim is an innovative knowledge and
expertise centre, challenging individuals to develop themselves towards valuable and self-confident
professionals.
Integration of three primary processes, education, research and social entrepreneurship results in
excellent opportunities for dispersion of knowledge.
Windesheim offers tailored education and supports individual study careers. Competences and
personal planning are the foundations for each individual student.
In the area of research and social entrepreneurship, Windesheim distinguishes itself by the
implementation of knowledge exchange centers in Zwolle and participation in regional knowledge
networks3.”
3 Instellingsplan 2007 – 2012, Besluit nummer 441 College van Bestuur van Windesheim
2.1.2 Windesheim’s Information Technology
As indicated by figure 2, the Windesheim application landscape has become rather intertwined over
the years.
[Application landscape diagram: dozens of internal and external systems (among them Untis, IBG, Vubis, ERP, HRM, Decos, Blackboard, HBO-Raad and various finance and facility systems) connected through database links, manual transfers and batch interfaces; the legend distinguishes internal systems, external parties and phased-out systems]
Figure 02: the Windesheim application landscape (Windesheim, 2004)
The figure demonstrates that it is the interfaces between (clusters of) applications that cause complexity.
Almost every connection requires manual intervention; therefore every data transfer represents a delay
in business processes. To reduce integration complexity and increase business service levels, the
implementation of a service-oriented architecture was initiated in 2005. One of the drivers was that
Windesheim aims to become a near real-time organization. An example is given by the
enrollment process: as soon as students are enrolled for a study, access to all campus-wide and study-related
student information services is to be granted quickly. Today, this process takes days;
implementing real-time, event-driven communication patterns is believed to reduce processing time to
minutes, and perhaps mere seconds.
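The event-driven pattern referred to above can be sketched with a minimal in-process publish/subscribe mechanism: an enrollment event is published once, and every subscribed service reacts immediately instead of waiting for a batch transfer. The service and event names below are hypothetical illustrations, not Windesheim's actual interfaces.

```python
# A minimal in-process sketch of event-driven, near real-time integration.
# Service names, event names and payload fields are hypothetical examples.

from collections import defaultdict

class EventBus:
    """A tiny synchronous publish/subscribe bus for illustration."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._subscribers[event_type].append(handler)

    def publish(self, event_type, payload):
        # Every interested service is notified as soon as the event occurs.
        for handler in self._subscribers[event_type]:
            handler(payload)

granted = []

bus = EventBus()
# Campus-wide services register their interest in enrollment events.
bus.subscribe("student_enrolled",
              lambda e: granted.append(("account", e["student_id"])))
bus.subscribe("student_enrolled",
              lambda e: granted.append(("library", e["student_id"])))

# As soon as the enrollment is confirmed, access is granted in one pass.
bus.publish("student_enrolled", {"student_id": "S1234"})
```

In a production setting the bus would be a message broker rather than an in-process object, but the design choice is the same: the producing process publishes once, and latency is bounded by event delivery rather than by a batch schedule.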
To design and implement the service-based interfaces, a System Integration task force was installed.
This task force, currently employing three professionals, is part of the IT department, yet it is
governed by the Windesheim Information Manager.
[Organization chart: the CIO directs the IT department and Information Management; the System Integration task force sits within the IT department]
Figure 03: IT service department and system integration organization
2.2 Business Problem description
2.2.1 Indications
In the past, the system integration task force encountered integration problems caused by unexpected
and puzzling values in data fields. Triggered by these observations, in 2007 the quality of the database
of one information system was investigated4. The investigation revealed that in some cases values of
fields could not be explained, or were used to indicate specific situations. Business rules defining these
situations and explaining the odd values were not documented. As a result, the accounting of costs of
facilities delivered was uncertain at best.
Upon completion, operations had corrected the issues found and Windesheim was advised to
document business rules, formalize data management accordingly and implement a closed-loop data
quality process.
Surprisingly, shortly after this result was reached, the integration team encountered the same errors all
over again. And in addition to these existing issues, every new data source added to the integration
architecture introduced new and unexpected data quality problems5. Issues found today include (but are
not limited to):
- Enrolment of students results in duplicate accounts;
- Painful mistakes, such as sending notifications to deceased students;
- Due to database corruption, management reports are rendered useless;
- Sometimes fields contain text strings stating that ‘Debbie has to solve this problem’;
- Names of students are completely missing, student addresses are incorrect, and information is entered
in wrong fields;
- Location (room) numbers are missing or contain special, unexpected codes;
- Data is outdated, or is valid in / refers to different time periods in different information systems;
- It was found that in at least one instance, lack of data quality caused a class to be scheduled in a
staircase6.
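Several of the issues above lend themselves to simple rule-based detection. The sketch below, a minimal illustration with hypothetical field names rather than the actual database schema, shows how missing names, placeholder notes left in data fields, and duplicate accounts could be flagged automatically:

```python
# A minimal sketch of rule-based data quality checks for the kinds of
# issues listed above. Field names and rules are hypothetical examples.

def find_issues(records):
    """Scan student records and return (student_id, issue) pairs."""
    issues = []
    seen_ids = set()
    for r in records:
        # Missing mandatory value, e.g. the student's name.
        if not r.get("name"):
            issues.append((r.get("student_id"), "missing name"))
        # Free-text notes left in a structured field.
        if "solve this" in str(r.get("room", "")).lower():
            issues.append((r.get("student_id"), "placeholder text in field"))
        # The same identifier enrolled twice: a duplicate account.
        if r.get("student_id") in seen_ids:
            issues.append((r.get("student_id"), "duplicate account"))
        seen_ids.add(r.get("student_id"))
    return issues

records = [
    {"student_id": "S1", "name": "A. Jansen", "room": "C2.15"},
    {"student_id": "S2", "name": "", "room": "Debbie has to solve this problem"},
    {"student_id": "S1", "name": "A. Jansen", "room": "C2.15"},
]
problems = find_issues(records)
```

Checks like these only detect symptoms; as the thesis argues, lasting improvement requires documented business rules and input validation at the source, so that such reports eventually run empty.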
2.2.2 Consequences
Consequences of errors in data may be severe. In Enterprise Knowledge Management, David Loshin
links the 2000 election problems in Florida, USA, directly to poor data quality (Loshin, Enterprise
Knowledge Management, the data quality approach, 2001). Loshin identifies operational, tactical
and strategic impact domains suffering from poor data quality.
In the operational domain, costs are associated with the detection, correction, rollback, rework and
prevention of errors, warranty, reduction of business and loss of customers (Loshin, 2001).
In the tactical and strategic domain, decisions may be delayed or based on external or alternative data
sources, hampering change processes. Business opportunities may be lost, business units get
4 Adviesbrief gegevenskwaliteit database facility office 2007
5 See appendix 1: Interview report system integration team Windesheim 2009
6 Fact Finding Roostersysteem Windesheim.doc, versie 1.2, 26 september 2007
misaligned, and management loses confidence in its management information systems (Loshin, 2001).
2.2.3 Business Problem
The initial problem, triggering this research, is that a lack of data quality threatens the implementation
of a service oriented architecture. The main business problem is that poor data quality inhibits
Windesheim’s ability to become a near real-time organization.
Looking further, and along the lines of David Loshin’s observations, in both the operational and
tactical domain areas where poor data quality has an impact on Windesheim’s business goals may be
identified:
Operational domain:
1. Today, students expect any organization they encounter to be a real-time organization. Banks,
insurance companies, web shops: they all offer near zero-latency business services. So why can’t
Windesheim? Not being able to live up to modern expectations may cause Windesheim to obtain
a reduced score in rankings published by the HBO-raad7. A reduced score may lead students to
study elsewhere (loss of customers), resulting in a decline in income.
2. Currently, batch files transferring data between applications are checked manually on a daily
basis. And yet, from time to time errors are propagated between applications. Detection,
correction, rollback and rework associated with poor data quality cause serious overhead,
reducing the organization’s efficiency.
3. Poor data quality is a cause of mistakes in Windesheim’s external relations. Some mistakes are
more painful than others, yet all of them damage Windesheim’s image as a trustworthy knowledge
partner in the region. This may be a cause of business opportunities being lost. And even where
this is less the case, being an institution largely funded by public money, Windesheim has a
responsibility to be precise and correct in interacting with customers, constituents and society in
general.
Tactical domain:
4. Business intelligence retrieved from questionable data is uncertain at best. As a consequence,
Business Activity Monitoring is hampered, which in turn means that the margin of error in daily
processes is unknown. It also means that monitoring progress towards business targets is impaired
as well.
2.3 Cause analysis
The initial research in 20078 had a narrow scope in exploring data quality: it was
confined to exploring data quality in only one application. However, the application observed
supported (and still supports) facility management, in education a very relevant secondary process,
directly supporting and influencing education itself. Secondly, having a rather narrow scope, the
research dug very deep into the problem, extracting interesting conclusions from the application’s
database. Based on this research, technical and process design causes were identified. However,
organizational causes remained untouched. Therefore, in this paragraph, the technical/functional and
7 HBO-Monitor, http://www.hbo-raad.nl/onderwijs/kwaliteit
8 Adviesbrief gegevenskwaliteit database facility office 2007
process design causes found are mentioned, and organizational causes are explored more deeply. At the
end of this paragraph, a summary is presented.
2.3.1 Technical / functional causes
The first observation was that the COTS9 application used for supporting facility management was a
complicated one indeed. It was found that, to implement specific requirements, special database fields
were used for which the application offered no input checks. The content of these fields was therefore
dependent on input being checked for correctness manually. It was found that in some cases those fields
were used to signal special situations, i.e. they were overloaded (Loshin, 2001). In other cases, values
were missing or inexplicable. The investigation revealed that the database structure of the application
was not fully utilized.
As a result, in many cases checks on correctness and consistency were not present, allowing errors in
input data to exist. Not only manual input caused flaws in data quality, processing batch files received
from adjacent applications introduced errors as well.
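Defects of this kind can typically be intercepted by explicit input checks before a record reaches the database. The following sketch illustrates the idea; the field names, signal values and rules are hypothetical, not taken from the actual facility management application:

```python
# Illustrative sketch: validating a record before it is stored.
# Field names and rules are hypothetical, not taken from the actual application.

def validate_room_booking(record: dict) -> list[str]:
    """Return a list of validation errors; an empty list means the record is acceptable."""
    errors = []

    # Mandatory fields must be present and non-empty.
    for field in ("room_id", "start_date", "booked_by"):
        if not record.get(field):
            errors.append(f"missing value for '{field}'")

    # A free-text field must not be overloaded with signal values.
    remarks = record.get("remarks", "")
    if remarks.strip().upper() in {"SPECIAL", "SEE NOTES", "!!"}:
        errors.append("'remarks' is overloaded with a signal value")

    # Domain check: capacity must be a positive integer.
    capacity = record.get("capacity")
    if not isinstance(capacity, int) or capacity <= 0:
        errors.append("'capacity' must be a positive integer")

    return errors


booking = {"room_id": "A1.23", "start_date": "2010-03-01",
           "booked_by": "", "remarks": "SPECIAL", "capacity": 0}
print(validate_room_booking(booking))  # three validation errors are reported
```

Checks of this kind cost little, yet they prevent exactly the class of overloaded and inexplicable values described above from entering the database in the first place.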
In the Windesheim application landscape, business objects have different names and formats between
applications. A course, for instance, is named course, module, (variant) onderwijs eenheid, or vak. In
various applications, data with respect to a course is entered, enriched, updated and transferred to the
next application. The dispersed nature of the underlying information landscape obstructs an accurate
view of the current status of a course (Windesheim, 2004).
2.3.2 Process design causes
What was found during the initial research was that operations (functioneel beheer) had a very
narrow view of its scope. Instead of using the application’s standard reporting facilities, some reports
were created using self-made front-end applications, compensating for (correcting) errors in data. In
storing and processing data, business rules were known and applied, yet not documented. To prevent
input errors, only specialized personnel were allowed to perform certain tasks.
In general, data management was found to be characterized by a departmental view, lacking a more
holistic view on (the role of data in) the Windesheim business process. Within the boundaries of the
individual department, measures were taken to compensate for the lack of data quality, hiding the
issue from local management: ‘My data are OK’. As a result, technical issues are not dealt with, since
to management they are invisible.
2.3.3 Organizational causes
Why is it that, when research reveals that bills sent for facilities delivered are uncertain at best, the
news is seemingly accepted in stoic fashion? Does Windesheim management have a disregard for
accountability? That is not very likely. To understand the position of Windesheim on information
processing, a historical view is needed.
In 1986, Windesheim university started as a merger of 12 regional institutions in higher education
(Broers, 2007). At the time, in order to gain support for the merger, an agreement was made that
management of faculties and facilities was to be decentralized (Broers, 2007). It took a management
crisis in 1992 for the new institution to realize that the benefits of the merger could be harvested only
if old individual values were replaced by new, common goals. A more centralized model was
introduced in 1995, and staff and technological support were organized in centralized service centers.
In the years that followed, the walls that divided the once so mighty and independent faculties were
steadily reduced, while the independence (and size) of the service centers grew (Broers, 2007).
9 Commercial Off The Shelf
2.3.4 Growing pains
In 1979 Nolan argued that an organization and its use of information has to grow in maturity (Nolan,
march-april 1979). Nolan defined six stages (initially four) of maturity. In his vision, no stage could be
skipped, and in every stage a predictable type of crisis would signal the transition to the next stage. In
a more recent publication, Architecture and Demassing took the place of the original stages 5 and 6
(Data Administration and Maturity) (Tan, 2003).
Figure 04: Nolan’s stage model. The figure contrasts the technology-driven Data Processing Era
(focus on costs) with the information-driven Information Era (focus on effectiveness) across six
stages: initiation (a limited number of stand-alone information systems), contagion (an increasing
number of systems, introducing simple hard-wired connections), control (clusters of systems
organized by task, characterized by incompatible technologies and data formats), integration
(redesign of systems aimed at corporate-wide integration), architecture (development of systems
supporting external partners) and demassing (complete integration of internal and external systems
of business units and partners).
When we take a closer look, it can be argued that the Windesheim faculties are currently well on their
way into the integration stage, or, in a broader perspective, that higher education is transferring from
the data processing era to the information era. This can be observed in institutions striving for
integration and developing shared minors, i.e. education crossing the borders of a faculty.
However, the supporting service centers are still lingering in the control phase, as indicated by a
persistently isolated view on information systems¹⁰. The Windesheim application landscape is still
very task oriented, with separate systems supporting individual business functions: the focus lies on
supporting individual tasks rather than the integrated business process.
With faculties tearing down their walls and service centers staying put, the organization is in danger
of losing its balance. It is foreseeable that the service centers will have to make the transition from the
control stage to the integration stage as well. In 1992, Richard P. Marble applied Nolan’s stage model
to the transition East German industry went through, and described the transition as follows:
“… management realizes a need to emphasize central planning. The attention that the
computer resource finally receives leads to a change in management thinking – now
regarding their task as one of managing the data resources of the organization, not the
computer resources.” (Marble, 1992)
With this transition, a firm crisis is to be expected. This should not be seen as a loss of control, but
merely as a change of paradigms. During this change of paradigms, organizations take a step back and
have to rethink many of their existing strategies and principles. This backwards movement is called a
discontinuity (Zee, 2001).
10 As shown in Figure 2: the Windesheim application landscape (Windesheim, 2004)
In figure 5 Van der Zee extends the Nolan model with a third era (the network era) accompanied by
two crises: a technological discontinuity and an organizational discontinuity.
Figure 05: Eras and discontinuities (Zee, 2001)
The technological discontinuity is observed by English:
“… many CIOs, by and large are NOT Chief Information Officers — but Chief Information Technology
Officers. Falling into the techno-trap of believing their job was to put in place the information technology
infrastructure, their job was then to build or acquire and deploy hardware, networks, and applications,
period. Few CIOs saw and understood the Information-Age paradigm. … The Information-Age purpose
of the CIO has always been to deliver Quality, Just-In-Time Information” (English, 2009).
Currently, Windesheim is exploring Nolan’s stage 4, and with it, the technological discontinuity crisis
must be overcome. In this discontinuity, Van der Zee (2001) places the focal point on both
technological and organizational changes, such as the emergence of IT governance, with the need for
ICT to be present in the boardroom (represented by the CIO) (Zee, 2001).
2.3.5 Perspective
It seems that at Windesheim, the perceived data quality problem is not merely a technological issue,
nor is it just a matter of getting the processes right. If it were, the earlier data quality project would
have had more lasting results. Rather, it is a case of aiding Windesheim’s service centers through the
technological discontinuity.
2.3.6 Past, current and future situation
Are the changes Windesheim is currently experiencing a phase that will quickly blow over, or are they
part of a greater scheme? Will Windesheim continue to grow in maturity or is it likely that the
organization will experience a fall-back into the control stage? To find out what is going on, not only
Windesheim’s recent past is of interest; the whole picture of higher education in Europe needs
attention as well.
With Europe not divided by national borders and Latin being the language of choice, medieval
universities attracted Wanderstudenten from all over Europe:
‘Until the eighteenth century the European university was an European institution, reflecting European
values of intellectual freedom and of a borderless community’ (Vught & Huisman, 2009).
This all changed when territorial states arose, installing national frameworks. From the eighteenth
century up until the dawn of the twenty-first century, national borders and policies effectively resulted
in ‘national science’. It was not until the 1980s that the first EU policy initiatives appeared. In the
second half of the 1990s, a myriad of programmes and declarations was spawned, aimed at ‘making
Europe the most competitive and dynamic knowledge based economy in the world’ (European
Council, Lisbon, 2000) and at creating ‘the European Higher Education Area’ (Sorbonne Declaration
1998 & Bologna Declaration 1999). Currently, 46 European nations are involved in this process,
including the Netherlands (Vught & Huisman, 2009).
This process causes a landslide in the area of hogescholen (universities of applied sciences). The
clear-cut distinction between hogescholen and universiteiten has begun to blur since hogescholen
started to offer both Bachelor and Master degrees and started conducting scientific research¹¹, where
previously only Bachelor degrees were offered and scientific research was strictly reserved for
universities. But more importantly, a search for transparency was spawned:
‘This, coinciding with increasing pressure from professional organizations and external regulatory
bodies to control what was being taught … led towards the standardization of curricula’ (Vught &
Huisman, 2009)
In the future, it is to be expected that a generally defined common framework (for curriculum and
study logistics) will ensure transparency while still acknowledging diversification (Vught & Huisman,
2009).
How does all this translate to Windesheim? Porter’s Five Forces model may help to answer that
question. This model is an outside-in business unit strategy tool used to analyse the attractiveness
(value) of an industry structure (Porter & Millar, 1985). When we project it on Windesheim, we find
the following forces shaping an institution like Windesheim:
First of all, institutions compete with each other for the attention of the student (Rivalry among
existing firms). This is shown by institutions’ constant attention to quality statistics released by the
HBO-raad.
Secondly, the student has a great deal of influence (Bargaining Power of Buyers). The student’s
opinion about the quality of education is constantly measured and published, and in response courses
and schedules are revised. As a result of European and national developments, students are highly
mobile, strongly increasing their bargaining power.
Thirdly, commercial ‘substitutes’ do exist: commercial organizations offer qualifications that rival
recognized titles. In IT, for instance, employees holding Microsoft certificates are in high demand,
rivaling employees holding bachelor or master degrees.
And finally, potential entrants (like DNV-CIBIT) fill niche markets. Although the titles they offer are
internationally recognized, the courses they offer do not qualify for governmental approval and are
therefore not subsidized¹².
11 By means of the lectorate. (HBO-raad Lectorenplatform, 2006)
Windesheim, being a university of professional education, is in the midst of this turmoil. Windesheim
faculties are aligning themselves with European strategies: implementing a Minor/Major educational
model, jointly developing minors, offering those minors to students of other institutions (trying to
‘lure’ them away) and, for some minors, introducing English as the general language used in classes.
It seems that the Wanderstudent is reinstated, but this time in unmatched masses, forcing the
institution to synchronize education in an international setting while trying to be as attractive as
possible.
Pan-European developments force institutions to prepare for intimate inter-institutional cooperation.
In this volatile environment, Windesheim does not have the luxury not to grow in maturity.
2.3.7 Summary
In this chapter, it has been found that:
Windesheim strives to become a near zero latency organization;
Unexpected errors hamper technical initiatives to implement near zero latency business process
technologies;
These errors are caused by poor data quality;
Closer examination reveals a serious business impact of poor data quality, which is defined by
student (customer) dissatisfaction, inefficient process execution, loss of image and loss of control;
Poor data quality is caused by applications not checking input values, and by information objects
having different values and definitions in different business domains, which in turn is caused by a
departmental view on data instead of a more holistic, business process wide view on information;
International developments force Windesheim to grow in maturity, migrating from the data
processing era to the information era;
As part of this migration, a natural crisis, the technological discontinuity has to be overcome;
In this crisis, the organization is to develop a holistic view on information.
2.4 Research Problem
In the past, causes of data quality issues have been identified and countermeasures described on the
technical, functional and process design level. This vision needs to be extended by exploring the
relation between structures defining maturity and data quality within the context of a Dutch
institution of higher education, in particular Windesheim, and even more precisely, the Windesheim
service centers. The focus here is on crossing the border between the control and integration stages in
Nolan’s stage model (Nolan, march-april 1979) (Tan, 2003), overcoming the technological
discontinuity (Zee, 2001).
Extending the technical/functional vision on data quality raises a myriad of questions. What impact
on data quality will overcoming the technological discontinuity have? Will a growth in maturity be
enough to solve the data quality issues identified? What exactly does ‘growing in maturity’ mean?
What are the consequences for the organization of Windesheim? Do the consequences found align
with Windesheim’s strategic developments? What will the response of Windesheim’s management
be? What arguments will spawn interest in improving data quality? Is there a danger of falling back
into the comfort of the data processing era?
12 Interestingly, the fifth force, Threat of new entrants, is rather unknown to education. The emergence of new institutions is highly regulated and care is taken that new institutions do not compete with existing institutions in the region.
By extending the research beyond the technical and functional domain, the research enters the domain
of information as a subject of organizational and political forces, and using information as a strategic
instrument. It has become a problem of strategic alignment. The research problem may therefore be
summarized as:
At Windesheim, what defines the border between the control and integration stage? What are positive
and negative correlations between structures defining organizational maturity and attributes defining
data quality, enabling Windesheim to become a near zero-latency organization?
2.5 Stakeholder Analysis
Board. Role: to set and guard Windesheim’s strategy. Concern: control on finance, quality of the
institution and strategy; alignment of the institution with national and international developments.
Relation to the problem: loss of image and loss of students will hint at loss of control; inefficient
business processes impose a financial drain on the organization; changing from a localized to an
integral view on data may be a cause of concern.

Information Manager. Role: to implement and guard a coherent view on information. Concern:
correctness of data. Relation to the problem: poor data quality may cause inefficient business
processes.

Science. Role: to extend the human knowledge base. Concern: validity and reliability of knowledge.
Relation to the problem: new knowledge may be discovered and existing theories validated.

Security Manager. Role: to prevent unauthorised disclosure or manipulation of information. Concern:
availability, integrity and confidentiality of data. Relation to the problem: poor data quality obstructs
integrity and availability.

CIO. Role: to safeguard undisturbed and reliable information delivery in business processes. Concern:
secure and correct use of data; enabling future change. Relation to the problem: poor data quality may
cause inefficient business processes, loss of image and loss of students.

Management. Role: to implement change and control daily business processes. Concern: budgeting
and effectiveness (Baida, 2002); reliability of data. Relation to the problem: poor data quality cripples
effective, reliable management; changing from a localized to an integral view on data may be a cause
of concern.

Students. Role: to be educated. Concern: findability, security, reliability, availability and timeliness.
Relation to the problem: poor data quality causes student names to be misspelled or missing
altogether, resulting in loss of trust.

Staff. Role: to educate. Concern: security, reliability and timeliness. Relation to the problem: poor
data quality results in students complaining and in complicated registration and planning processes.

Operations. Role: to ensure operational IT. Concern: manageability and correctness of data. Relation
to the problem: poor data quality causes applications to abort and time to be spent on debugging.

Functional Support. Role: to ensure operational applications. Concern: correctness of data. Relation
to the problem: poor data quality leads to manually identifying, correcting and rolling back errors
daily.

System Integration. Role: to ensure near real-time service-based system integration. Concern:
correctness and availability of data. Relation to the problem: poor data quality causes application
interfaces to abort and time to be spent on debugging.

Table 01: Stakeholder analysis
For management (Board, CIO, general management, information management), solving the data
quality problem will be based on a cost/benefit assessment. Operations and Functional Support will
be willing to participate in solving the problem, provided care is taken where personal interests are
involved.
Figure 6 presents a graphical representation of stakeholders and their relation to the proposed data
quality research project.
Figure 06: project stakeholders (a diagram relating the Board, Management, CIO, Information
Manager, Security Manager, Science, Students, Staff, Operations, Functional Support and System
Integration to the research project)
Stakeholders committed to this project are the CIO, the Information Manager and Science. The CIO
and the Information Manager are the financier and constituent of this research, respectively.
(To be) involved in the project are (members of) functional support, operations, system integration
and the security manager, since the results of this research are likely to be of direct interest to these
stakeholders and because of specific knowledge within these groups.
Management holds a somewhat special place: IT management is likely to be involved, while other
management may be affected. Other stakeholders affected by any advice resulting from this research
are students, staff and the board.
2.6 Project Relevance
2.6.1 Stakeholder Relevance
Relevance for stakeholders is discussed in the previous paragraph.
2.6.2 Business Relevance
Currently, education at Windesheim is embarking on a journey towards a higher level of maturity and
service centers have to join this movement. However, the destination of this journey is not clear to
everyone, and for others the road ahead is unknown. This research will shed light on this migration,
by offering knowledge on what Windesheim might look like when data processing is replaced by a
more integral view on information. In the long run, this paradigm shift will enable Windesheim to stay
in sync with (inter)national processes. In the short term it will increase efficiency, student satisfaction
and management control and prevent loss of image.
2.6.3 Relevance to Science
In the field of data quality, many publications, services and even tools are available. Publications tend
to look at data quality from a technical point of view, suggesting input checks and database
constraints as a solution. Business processes are recognized to be part of the equation too, and efforts
are made to point out that processes need to be implemented as a closed loop, automatically correcting
errors (Batini & Scannapieco, 1998) (Loshin, Enterprise Knowledge Management, the data quality
approach, 2001) (McGilvray, 2008) (Lee, Pipino, Funk, & Wang, 2006). However, in the field of
education, academic research binding (loss of) business data quality to business maturity has not been
identified.
The US National Center for Education Statistics has set up a data quality task force, offering advice to
staff members of educational institutions on creating a Culture of Data Quality (Data Quality Task
Force, 2004). This publication is aimed at the field of statistics and its underlying research is
unknown, yet the recommendations presented by the report may prove useful.
One research project dealing with data quality in e-business has been found: Data Quality and Data
Alignment in E-business (Vermeer, 2001). This research defined a context for data quality and
established a formal relation between data quality in EDI messages and business process quality.
Finally, it presented a method for establishing data quality in business chains: DAL (Data Alignment
through Logistics) (Vermeer, 2001). The definition of the context of data quality and its relation to
business process quality provides strong support for the research at hand.
3. Conceptual Research Design
3.1 Theoretical approach and focus
3.1.1 Focus
The field to explore as defined by the Research Problem is broad. This research will focus on
identifying the relation between organizational maturity and the required level of data quality, as this
has been identified as the root cause of the business problem at hand.
3.1.2 Maturity revisited
Before enthusiastically embarking on a journey into the unknown, can additional proof be obtained,
pointing towards a link between data quality and organizational maturity?
In “Data Quality and Data Alignment in E-business” Ir. Bas H.P.J. Vermeer (2001) addressed issues
resulting from distributed data management:
“…..two problems arise in a multiple database situation: a translation problem and a distribution
problem.
The translation problem arises because the same fact may be differently structured at different
locations. Therefore, schema translation is necessary to map the structure of the source schema to the
structure of the manufacturer’s schema. This results in a mapping schema between the source schema
and the receiver’s schema that is used every time a fact in the source database is updated.
The distribution problem arises because each fact update is first translated and then transported over a
network to a limited set of users, where it is finally interpreted and stored in the receiver’s database.
During translation and interpretation, mapping errors may occur, which results in loss of data quality.
During transportation, the data may get delayed, damaged, or delivered to the wrong recipient,
resulting in inconsistencies among different locations.” (Vermeer, 2001).
Thus, having a localized view on data, distributing and transforming data objects throughout an
application landscape introduces both a translation and a distribution problem.
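The translation problem can be made concrete with a small sketch. The application and field names below are invented for illustration; they only loosely mirror the course/vak naming differences observed in the Windesheim landscape:

```python
# Illustrative sketch of the translation problem: the same business object
# ("a course") is structured differently in two hypothetical applications,
# so every update must pass through a mapping schema.

# The source application stores a course as 'vak' with Dutch field names.
source_record = {"vaknaam": "Databases 1", "studiepunten": 5, "code": "DB1"}

# Mapping schema: source field -> target field in the receiving application.
mapping = {"vaknaam": "course_name", "studiepunten": "credits", "code": "course_code"}

def translate(record: dict, schema: dict) -> dict:
    """Translate a record from the source structure to the target structure.
    Unmapped fields are silently dropped, which is itself a source of data loss."""
    return {schema[field]: value for field, value in record.items() if field in schema}

target_record = translate(source_record, mapping)
print(target_record)
```

A missing or wrong entry in the mapping schema silently loses or corrupts a fact on every update, which is exactly the loss of data quality Vermeer describes.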
Then why not develop a single, integrated view on data? Why not just implement an ERP package?
Dale L. Goodhue et al. (sept 1992) question the common belief that data integration always results in
positive benefits for any organization. It was shown that creating one integrated solution is simply not
feasible in many organizations. Data integration may have positive effects in terms of improved
efficiency where subunits are highly aligned, yet in unstable, volatile environments striving for data
integration will not result in tangible benefits:
“…This model suggests that the benefits of data integration will outweigh costs only under certain
circumstances, and probably not for all the data the organization uses. Therefore, MIS researchers and
practitioners should consider the need for better conceptualization and methods for implementing
‘partial integration’ in organizations” (Goodhue, Wybo, & Kirsch, sept 1992).
Conclusions:
In terms of data quality, it is best if a single, corporate wide view on data definitions exists;
Only organizations that are able to successfully align their subunits are likely to achieve business
benefits from data integration;
Even with alignment, complete data integration is not likely to be achieved.
Even without striving for data integration, the research done by Vermeer and Goodhue et al. hints that
aligning business units (i.e. observing the whole business value chain instead of localized
departmental processes) is an important prerequisite for achieving improved data quality.
3.1.3 A vision on Maturity
Maturity may be defined by stages (Nolan, march-april 1979). Yet more recent theories tend to
embrace levels as the measure of maturity: BPMM (Object Management Group, 2008), CMMI
(Software Engineering Institute, 2009), ISO 15504 / SPICE (Hendriks, 2000), where each level is
defined by certain structures. In this research, maturity is defined as an attribute of an organizational
process, organized in maturity levels that are defined by certain structures being in place, referred to
as maturity structures.
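The notion that a maturity level is defined by certain structures being in place can be sketched as follows. The levels and structure names used here are illustrative placeholders only, not the instrument developed in this research:

```python
# Illustrative sketch: a maturity level is reached only when all structures
# defining that level (and all lower levels) are in place. The structure
# names below are placeholders, not the actual instrument of this research.

MATURITY_LEVELS = [
    (1, "Initial", set()),  # no structures required
    (2, "Managed", {"data owner appointed", "input checks defined"}),
    (3, "Defined", {"corporate data definitions", "documented business rules"}),
]

def assess_maturity(structures_in_place: set[str]) -> int:
    """Return the highest maturity level whose required structures
    (accumulated over all lower levels) are all present."""
    achieved = 1
    required = set()
    for level, _name, structures in MATURITY_LEVELS:
        required |= structures
        if required <= structures_in_place:
            achieved = level
    return achieved

print(assess_maturity({"data owner appointed", "input checks defined"}))  # prints 2
```

The cumulative check reflects the idea, shared by CMMI-style models, that levels cannot be skipped: level 3 structures only count once the level 2 structures are also in place.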
3.1.4 What is data quality?
Indeed, what is data quality? Even though multiple definitions and standards exist for quality in
general, for data quality this is less the case. It seems as if the idea that “the computer never lies” still
holds some ground. Even T. William (Bill) Olle, in “The Codasyl Approach to Data Base
Management” (Olle, 1978), did not make any remarks regarding the relationship between database
management and data quality, which is remarkable, since a database management system may be
regarded as the technical guardian of data quality!
Data, business rules and business processes are closely linked (Besouw, 2009). Over about four
decades, in most businesses data and business rules have been materialized in the form of automated
information systems. Those information systems aim to reflect reality as closely as possible. But what
we find in the real world is that reality is in constant flux, and information systems struggle to keep
up. There is a natural gap in time between a situation in reality and the registration of that situation in
an information system. The problem this time lapse introduces was unwittingly recognized by
T. William Olle with respect to the book he had written:
“The time factor is in itself a problem because the CODASYL specifications are changing inexorably as
the years go by. The book reflects as accurately as possible the most recently published specifications at
the time of writing.” (Olle, 1978)
What is true for the written word might be true for information systems too. The struggle of
information systems to stay aligned with reality is one of the topics in ‘De (on)betrouwbaarheid van
informatie’¹³ (Bakker, 2006). Take for instance the dynamics of the Dutch population:
“According to the CBS, in October 2004 the Netherlands housed 16.258.032 citizen, of which
8.045.914 male and 8.212.118 female… But what makes us believe that we are capable to assess the
number of citizen with this accuracy? In the year 2000 for instance, 206.619 people were born and
140.527 deceased. At what moment in time was that exact amount of citizen determined? Wait an
hour and the number has changed!” (Bakker, 2006)¹⁴
13 The (un)reliability of information
Bakker not only demonstrated that it is impossible to make a headcount in a dynamically changing
system with a high degree of accuracy, he also argued that in fact, no data at all is ever exactly
correct. When, for instance, one sets off to measure the coastline of Great Britain, one will find that
using precise measurements will result in a considerably longer coastline being measured compared to
the use of coarse methods (Bakker, 2006). Moreover, every measurement has a certain degree of
uncertainty, a measurement error; it is simply impossible to measure a physical object exactly
(Bakker, 2006). Therefore, it is important to establish a threshold defining the acceptable degree of
uncertainty.
To establish such a threshold and guard compliance with it, the Data Management Association
introduces the Data Quality Management function:
“Data Quality Management – Planning, implementation and control activities that apply quality
management techniques to measure, assess, improve and ensure the fitness of data for use.” (Mosley,
2008)
This definition points the way towards the right threshold: data should be fit for use. Arvix, a Dutch
company dedicated to the improvement of data quality, seems to agree: “The quality (of data) is
closely related to its use” (Arvix, 2009). In addition, Frans Besouw translates fit for use into the
ability of data to support business rules (Besouw, 2009).
In the vision of Arvix, data quality reveals the capability of data to be successfully utilized over a
prolonged period of time (Arvix, 2009). Apparently, fit for use is a measure that is likely to change as
business rules evolve over time. An example can be found in banking. Two decades ago, banks
sending us an account transaction overview once a week was regarded as acceptable. The most
recent transactions included on this overview were about half a week old, including the account total
shown. A decade ago, private banking customers were enabled to monitor all transactions on-line. On-
line access implies on-time information, and a delay in actuality of not more than one day was seen as
acceptable. Today however, customers are able to monitor their accounts in real time. In the last ten
years, in private banking actuality of information that is perceived to be fit for use has shrunk from
days to minutes.
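The idea of a time-dependent fit-for-use threshold can be illustrated with a small sketch. The era labels, thresholds and timestamps below are hypothetical, chosen only to mirror the banking example; they are not part of the original research:

```python
from datetime import datetime, timedelta

# Hypothetical actuality thresholds for one quality attribute (timeliness),
# illustrating how "fit for use" tightens as business rules evolve.
THRESHOLDS = {
    "weekly statement (1990s)": timedelta(days=4),
    "on-line banking (2000s)": timedelta(days=1),
    "real-time banking (today)": timedelta(minutes=5),
}

def is_fit_for_use(last_updated: datetime, era: str, now: datetime) -> bool:
    """Data is fit for use if its age does not exceed the era's threshold."""
    return now - last_updated <= THRESHOLDS[era]

now = datetime(2010, 1, 1, 12, 0)
stamp = datetime(2010, 1, 1, 9, 0)  # data is three hours old
print(is_fit_for_use(stamp, "on-line banking (2000s)", now))   # True
print(is_fit_for_use(stamp, "real-time banking (today)", now)) # False
```

The same three-hour-old record passes one era's threshold and fails the next, which is exactly the point: the data did not change, the business rule did.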
Quality can be measured. ISO 9126 offers a standard for the evaluation of software quality. An
extension of the ISO 9126 quality standard is the Quint quality model (Zeist, Hendriks, Paulussen, &
Trieneken, 1996). However, these quality standards are aimed at measuring integrated information
system quality. To specifically target data quality in a given situation, in “Kwaliteit van
softwareprodukten, Praktijkervaringen met een kwaliteitsmodel”15 (Zeist, Hendriks, Paulussen, &
Trieneken, 1996), the Quint model was extended even further by adding two new quality attributes:
Database Accuracy and Database Actuality. Verreck, de Graaf and van der Sanden even express the
quality of data in terms of attributes. They propose to define quality as a function of Reliability and
Relevance: Q = R², and redefine this as ‘lasting usability’ (Verreck, Graaf, & Sanden, 2005).
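Verreck et al.'s Q = R² can be read as quality being the product of the two R-factors. A minimal sketch of that reading follows; the 0-to-1 scoring scale is an illustrative convention introduced here, not something the cited authors prescribe:

```python
def data_quality(reliability: float, relevance: float) -> float:
    """Q = R^2: quality as the product of Reliability and Relevance.

    Both factors are scored on a 0..1 scale (an assumed convention for
    illustration). Quality collapses when either factor approaches zero,
    which matches the 'lasting usability' reading: data must be both
    dependable and still relevant to the business rules it supports.
    """
    if not (0.0 <= reliability <= 1.0 and 0.0 <= relevance <= 1.0):
        raise ValueError("scores must lie in [0, 1]")
    return reliability * relevance

print(data_quality(0.9, 0.9))  # high on both factors: high quality
print(data_quality(1.0, 0.0))  # perfectly reliable but irrelevant: no quality
```

The multiplicative form is the interesting design choice: unlike an average, it makes reliability and relevance non-substitutable, so no amount of accuracy compensates for data nobody needs.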
14 Translated from Dutch
15 Quality of software products, hands-on experiences with a quality model
15-Apr-23 F. Boterenbrood Page 25
Thesis: Improving data quality in higher education
3.1.5 A vision on Data Quality
We started off with the discovery that many problems in Windesheim’s IT were caused by poor data
quality. In many publications, data quality is treated as being purely a technological issue. What we
found was that this vision needs to be extended by exploring the relation between structures defining
maturity and data quality. Now we have discovered that data quality is not an absolute value, but a
question of defining the right threshold:
- Data is inaccurate by nature;
- When data inaccuracy exceeds a certain threshold, quality becomes flawed;
- The threshold is defined by data being fit for use;
- In general, fit for use can be seen as the ability of data to support business rules;
- Fit for use can be operationalized by means of quality attributes;
- For every specific situation, appropriate attributes are to be defined;
- What is perceived as acceptable values for these attributes, and therefore for the data quality threshold, evolves over time, as business rules do.
3.2 Research Goal
The goal of this research is to contribute to the improvement of data quality at Windesheim by
analyzing the gap between the current and required data quality threshold and corresponding current
and required maturity, identifying positive and negative correlations between data quality attributes
and structures defining maturity.
3.3 Research Model
Figure 07: Research Model [diagram: theories on data quality, theories on maturity of organizations, theories on maturity of business processes and theories on ensuring quality in business processes, backed by an external benchmark (a), lead to the maturity and data quality instrument (b); stakeholders involved populate this into a view on required threshold and maturity (c); an assessment of the current data quality threshold and current maturity (d) yields a description of the current situation (e); confronting both leads to the advice (f); steps are numbered 1 through 4]
An analysis of theories on data quality and maturity, backed by exploring an external implementation
(external benchmark) (a) results in a conceptual model (maturity and data quality instrument) (b),
which will be discussed by an expert group of stakeholders involved16. This will lead to a populated
conceptual model (view on required threshold and maturity) (c). An assessment of the current data
quality threshold and current maturity (d) results in a description of the current situation (e).
Confronting the validated view with the description of the current situation leads to a Gap Analysis
(f).
3.4 Research Questions
The main research questions are found by decomposition of the research model.
3.4.1 Main questions
1. Observing theories on maturity and data quality, and external benchmarks, what positive and
negative correlations between structures defining maturity and data quality attributes may be
found?
2. What values of data quality attributes will define the required data quality threshold and therefore
the required maturity structures at Windesheim?
3. What are the current organizational maturity and current values of data quality attributes?
4. Finally, the central research question: what is the gap between the current maturity structures &
data quality threshold and the required maturity structures & data quality threshold, in the light of
enabling Windesheim to become a near zero-latency organization?
Sub questions are found by examining the chart of concepts used, described in the next paragraph. To
avoid dispersion of research questions, the sub questions are described first, and concepts used later.
3.4.2 Sub questions for main question 1
By decomposing the main question and interpreting the highlighted part of the concepts used (next
paragraph), the following sub questions are found:
1. What structures define maturity?
a. What levels of maturity do exist?
b. What maturity structures in the field of organizational structure, process, technology,
information and staff describe each level?
2. In higher education, what positive and negative correlations between maturity and data quality
may be found?
a. For this research, what is the relevant set of business rules?
b. How will this set of business rules evolve in time?
c. What data quality attributes are relevant for these business rules?
d. What values of data quality attributes correlate with each level of maturity?
e. What do process quality theories describe about positive correlations between quality and
maturity?
f. What do process quality theories describe about negative correlations between quality and
maturity?
g. Are those observations consistent?
16 As identified in stakeholder analysis: figure 6
3.4.3 Sub questions for main question 2
1. To support the business rules identified earlier, what values should data quality attributes have?
2. What level of maturity is required to enable those data quality attribute values?
3. What organizational structure, process, technology, information and staff criteria define the
maturity found?
3.4.4 Sub questions for main question 3
No further decomposition is required.
3.4.5 Sub questions for main question 4
1. What is the gap between the current and required organizational structure, process, technology,
information and staff criteria?
2. What conclusions and recommendations may be derived from this gap?
3.5 Concepts used
Figure 08: Concepts Used [diagram: the research seeks a correlation between Maturity and Data Quality; Maturity is described by maturity levels and defined by structure criteria, systems (process, technology and information criteria) and staff criteria, grounded in organizational maturity and process maturity theories; Data Quality is described by data quality attribute values and defined by fit for use (business rule support, evolving in time), grounded in process quality and data quality theories]
The main concept in this research is that there is a correlation to be discovered between
Organizational Maturity and Data Quality. A quick scan of BPMM (Object Management Group,
2008), CMMI (Software Engineering Institute, 2009), ISO 15504 / Spice (Hendriks, 2000) reveals that
maturity levels seem to include criteria related to Structures, Systems and Staff of McKinsey’s 7-
factor model (Pascale, Peters, & Waterman, 2009). Processes, Technology and Information criteria all
define the Systems factor. At this stage, the Systems factor may be expected to offer a link between
maturity (information quality) and data quality attribute values.
Data Quality Attribute Values are fit for use if they offer support for business rules, a condition which
evolves in time.
The correlation may be derived from organizational maturity theories, process maturity theories,
process quality theories (six sigma, www.sixsigma.nl) and data quality theories. Process quality
theories are expected to offer a second link between maturity and data quality. A link between process
quality and process maturity has already been identified (Gack, 2009).
At this point, it is assumed that a certain level of maturity is defined by a set of structure, process,
technology, information and staff criteria. It is also assumed that information criteria and data quality
attribute values can be linked, and that data quality theories will support the links found. These
assumptions are to be validated in this research.
4. Technical Research Design
4.1 Research Material
Research question | Research object | Source | Retrieving method | Comment
What levels of maturity do exist? | Maturity | Literature | Desk research | Much has been published on this topic
What structures in the field of organizational structure, process, technology, information and staff describe each level? | Maturity | Literature | Desk research | Much has been published on this topic
At this moment, which business rules are affected by lack of data quality? | Affected Windesheim business rules | Stakeholders: operations, integration team; Windesheim documentation | Interviews; desk research | Integration team and operations have latent knowledge of business rules; research on data quality and Windesheim business rules has been done already
What data quality attributes are relevant for these business rules? | Relevant data quality attributes | Stakeholders: operations, integration team; literature | Interviews; desk research | Integration team and operations have latent knowledge of business rules and required data quality
What values of data quality attributes correlate with each level of maturity? | Correlation between maturity levels and data quality attribute values | Literature on maturity and on data quality; publications and research; external specialists | Desk research; interview | Some research indicating a link between quality and maturity has been identified already. At Arvix, a company specialised in data quality, interest in this research may be raised; Dr Theo Thiadens, lector ICT Governance at Fontys, has already agreed to an interview
What do process quality theories describe about positive correlations between quality and maturity? | See previous question | Literature on process quality | See previous question | See previous question
What do process quality theories describe about negative correlations between quality and maturity? | See previous question | See previous question | See previous question | See previous question
Are those observations consistent? | Results from previous questions are compared and analysed | None | Analysis |
To support the business rules identified earlier, what values should the data quality attributes have? | Required data quality threshold | Stakeholders involved (figure 6) | Workshop |
What level of maturity is required to enable those data quality attribute values? | Maturity required | Correlation found will be used | Analysis |
What structure, process, technology, information and staff criteria define the maturity found? | Required values of maturity elements | Theories described earlier | Substitution |
What are the current organizational maturity and current values of data quality attributes? | Operational values of maturity elements and data quality | Stakeholders: operations, integration team; Windesheim documentation | Interviews; desk research | Observing both maturity and data quality improves reliability
What is the gap between the current and required structure, process, technology, information and staff criteria? | Results from previous questions are compared and analysed | None | Analysis |
What conclusions may be derived from this gap? | Result from previous questions is analysed | Theories identified earlier | Analysis |
What recommendations may be defined? | Result from previous questions is analysed | Theories identified earlier | Analysis |
Table 02: Research Material
4.2 Research Strategy
4.2.1 Strategy
This research is characterized by a grounded theory approach, based on desk research. To improve
reliability and validity, a survey was conducted by interviewing specialists in the field and within
Windesheim. The subjects covered by the survey were maturity levels, process quality elements, data
quality attribute values and the correlations between them. Interviewees were presented with
statements and conclusions derived from publications and literature, and asked whether these were in
line with their experience, using examples of real-world situations. The results were used to validate
the hypothesis that maturity structures and data quality are related.
External participants were chosen for their expertise in dealing with data quality and maturity.
Internal participants in the interviews and the workshop were chosen for their experience with data
quality in the business domain, from the viewpoints of both operations and user departments. Care
was taken to include participants from a department where data quality was perceived to be
troublesome and from a department where data quality issues were perceived to be successfully
resolved.
4.2.2 Reliability
To reliably discover a relation between variables (i.e. data quality and maturity structures), a
quantitative approach is required. This research, however, was qualitative in nature. Multiple theories
on maturity and quality were discussed and balanced, and the results were cross-checked by means of
a survey amongst specialists. Quality attribute values were populated in a workshop involving
Windesheim specialists, enabling them to reflect on the process and its results. The rigor of the
study and triangulation ensure reliability; however, the results are less detailed than those gained
from a quantitative approach.
4.2.3 Validity
In this project plan, it has been found that multiple theories point towards a required gain in maturity.
It is therefore a valid approach to look for a relation between data quality attribute values and maturity
structures. In this research, literature and publications of theories and research were explored to
validate this hypothesis. This relation was discussed by specialists in a limited survey. Building on
multiple, accepted sources, reflection on results acquired and open discussion ensure internal validity,
while applying the grounded theory approach ensures external validity.
4.2.4 Scope
This research explored the gap between required and current maturity at Windesheim. This gap
analysis is focused on a specific business domain: study management. This business domain is chosen
in close cooperation with the CIO and the Information Manager. The main goal of study management
is to manage major, minor and course definitions, present those definitions to other business domains
like scheduling & study planning and to manage study progress.
5. Research Execution
This chapter presents the observations obtained by executing the research according to the research
plan.
Multiple maturity models defined in publications and literature are compared. After combining,
normalizing and transforming the results, the Windesheim Data Quality Maturity model WDQM is
created. Dimensions of data quality are explored, leading to the description of the relation between
data quality maturity levels and data quality dimensions and attributes. Business rules are harvested
from Windesheim business and IT documents focused on the Windesheim business domain of study
design, education, assessment and grading. Based on these business rules, best fitting data quality
attribute values are defined, leading to an analysis of the required data quality maturity.
5.1 Correlation between data quality and maturity
The next paragraphs explore the first research question: what positive and negative correlations
between structures defining maturity and data quality attributes may be found? To find this relation,
theories on maturity and data quality are explored.
5.1.1 Maturity, a brief history
The first effort to formalize a maturity model was triggered by problems occurring with delivering
complex software systems for the US Department of Defense (DoD), mainly in connection with the
Strategic Defense Initiative (SDI). Originally, the Capability Maturity Model (CMM) was developed
as a tool to assess software suppliers. Development started in 1986 at the Software Engineering
Institute (SEI) of Carnegie Mellon University and led to the Software Process Maturity Framework in
1987. In 1991, this resulted in the publication of CMM as the Capability Maturity Model v.1.0. Based
on experience with the use of this model, a new version 1.1 was published in 1993 (Kneuper, 2008).
The five-stage maturity model immediately caught the attention of developers worldwide. In 2002,
Brett Champlin, senior lecturer at Roosevelt University, counted over 120 maturity models, all derived
from or inspired by the initial CMM (Champlin, 2002). To integrate multiple viewpoints, in 2000 the
Capability Maturity Model for Integration (CMMI) version 1.0 was published. This model was
developed even further, resulting in CMMI version 1.2 in 2006, offering three constellations which
extend the area of applicability of CMMI to development (CMMI-DEV), acquisition (CMMI-ACQ)
and services (CMMI-SVC) (Kneuper, 2008).
5.1.2 Maturity levels
CMM, its successor CMMI and their derivatives are based on common structures, the most well-
known of which perhaps is the definition of Maturity Levels introduced by Crosby (1980).
Currently, five levels are agreed upon (Kneuper, 2008) (Curtis, Hefley, & Miller, 2009):
1. Initial, no structures are in place at all; activities are performed on an ad-hoc basis;
2. Managed, processes are characterized by the project;
3. Defined, processes are defined by the organization;
4. Quantitatively managed, processes are measured and controlled;
5. Optimizing, focus is on continuous process improvement.
Some maturity models recognize the five-level structure, yet assign different labels. An example is
Master Data Management (Loshin, 2008), in which the levels are labeled 1) initial, 2) reactive,
3) managed, 4) proactive and 5) strategic performance. In Automotive Spice,
6 levels of maturity are recognized, starting at level 0 (0 Incomplete, 1 Performed, 2 Managed, 3
Established, 4 Predictable, 5 Optimizing) (Hoermann, Mueller, Dittmann, & Zimmer, 2008). This
seems to compensate for criticism that the step between CMMI level 1 and CMMI level 2 is too big
(Kneuper, 2008). The Organizational Project Management Maturity Model (OPM3) however, skips
the first level initial altogether and four levels remain (SMCI - Standardize, Measure, Control and
continuously Improve) (Project Management Institute, 2008).
In this research, however, the level structure of CMMI, currently accepted as the standard, will be
adopted.
5.1.3 Process Areas
The second important structure is the definition of Process Areas. A process area is a cluster of related
practices in an area that, when implemented collectively, satisfy a set of goals considered important
for making improvement in that area. Examples of process areas are project planning, organizational
training, and causal analysis & resolution (Kneuper, 2008). At maturity level 1, processes are
characterized as ad hoc or even chaotic. Therefore, no process areas are assigned to maturity level 1
(Kneuper, 2008). In successive levels, process areas accumulate. In order to reach managed maturity,
all process areas of level 2 have to be mastered. And all process areas of both levels 2 and 3 have to be
mastered in order to reach defined maturity (Kneuper, 2008).
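The cumulative rule described above, where a level is reached only when all process areas of that level and of every level below it are mastered, can be sketched as follows. The process-area names are hypothetical placeholders, not CMMI's actual process areas:

```python
# Hypothetical process areas per maturity level (illustrative names only);
# level 1 has none, since level-1 processes are ad hoc by definition.
PROCESS_AREAS = {
    2: {"requirements management", "project planning", "measurement & analysis"},
    3: {"organizational training", "risk management"},
    4: {"quantitative project management"},
    5: {"causal analysis & resolution"},
}

def maturity_level(mastered: set) -> int:
    """Return the highest level for which the process areas of that level
    and of all lower levels are fully mastered (the cumulative rule)."""
    level = 1
    required = set()
    for lvl in sorted(PROCESS_AREAS):
        required |= PROCESS_AREAS[lvl]  # process areas accumulate per level
        if required <= mastered:
            level = lvl
        else:
            break
    return level

# Mastering only level-3 areas does not help while level-2 areas are missing:
print(maturity_level({"organizational training", "risk management"}))  # 1
```

Note that skipping a level is impossible by construction: the `required` set only grows, so an unmastered level-2 area blocks every higher level.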
Each process area is defined by goals, which guide the implementation of process areas within the
context of each stage. With each goal, practices to reach it are associated. In total, CMMI
defines up to 48 goals and 512 practices (Kneuper, 2008). People CMM, for instance, additionally
identifies 22 process areas, each defined by its own set of goals and associated practices
(Curtis, Hefley, & Miller, 2009).
This on its own poses a problem. Combining multiple maturity models to identify the relevant
maturity structures in the field of organizational structure, process, technology, information and staff,
may lead to a list of hundreds of process areas, goals and practices. Such a cluster of elements cannot
be analyzed in the time available. An alternative approach is required.
Caballero and Piattini have created a CMMI-based data quality maturity model: Caldea (Caballero &
Piattini, 2003). This model recognizes five maturity levels (Initial, Definition, Integration,
Quantitative Management and Optimizing) and for levels two to five, data quality activities and goals
are defined. This model is aimed at constructing and supporting a Data Management Process within
an organization. At this point, it would be most helpful to simply adopt the Caldea model, implement
the data quality activities and operationalize associated variables. Unfortunately, the Caldea model is
described at a high abstraction level, omitting any implementation details, leaving out specifications
of maturity structures and dimensions. And, since its conception in 2003, many theories on data
quality have been published, incorporating recent developments in IT, not (fully) present at the time
Caldea was described. Therefore, Caldea is not specific enough to be directly applicable, and
is likely to be outdated. However, the guidelines Caldea offers may well lead the way in constructing
a more specific and up-to-date data quality maturity model.
5.1.4 Identifying relevant process areas
How can maturity structures be identified efficiently without overlooking important elements? The
approach adopted here is to:
1. Identify data quality improvement measures by literature study and interview;
2. Assign those measures to organizational structure, process, technology, information and staff,
thus creating a balanced view;
3. Assign the resulting set of measures to maturity levels by linking each measure with a specific
process area and/or practice and, again, balance the result.
In the next paragraphs, the results of this approach are presented.
Data Quality Improvement measures
A wide range of measures is discussed in literature, ranging from proper database design to instating
data governance and data quality management.
It may easily be overlooked, yet it makes perfect sense: when the design is flawed, the system built
according to this design can hardly be expected to deliver high-quality output. To prevent data quality
issues from arising in the first place, Batini, Scannapieco and others stress the importance of good
database design (standardization / normalization) and data integration (Batini & Scannapieco, 1998)
(Fishman, 2009). Design and development call for a separation between development, test and
production environments, for one would not want test and development activities to interfere with
production processes and data. Such an environment is characterized by the ROTAP17 abbreviation.
Another characteristic of building proper information systems is the elimination of manual activities.
As pointed out by Thiadens in his interview (Appendix 6.3), manual interaction may account for up to
5 percent of data quality faults (Starreveld, Leeuwen, & Nimwegen, 2004). When improving data
quality, reducing manual intervention is therefore paramount.
When data quality issues arise, a problem solving approach is required including root cause analysis,
data profiling and cleaning, source rating, schema matching and cleaning, business rule matching and
new data acquisition (Verreck, Graaf, & Sanden, 2005) (Besouw, 2009) (McGilvray, 2008) (Batini &
Scannapieco, 1998).
17 Research, Ontwikkel (Development), Test, Acceptatie (Acceptance) and Productie (Production) environments
Root cause analysis is a technique to identify the underlying root cause, the primary source resulting in the problems experienced.
Data profiling originated as a set of algorithms for statistical analysis and assessment of the quality of data values within a data set, as well as for exploring relationships that exist between value collections within and across data sets. For each column in a table, a data profiling tool provides a frequency distribution of the different values, offering insight into the type and use of each column. Cross-column analysis can expose embedded value dependencies, whereas intertable analysis explores overlapping value sets that may represent foreign key relationships between entities.
Source Rating has the goal of rating sources on the basis of the quality of data they provide to other sources.
Schema matching takes two schemas as input and produces a mapping between semantically correspondent elements of the two schemas.
Schema cleaning provides rules for transforming a conceptual schema in order to achieve or optimize a given set of qualities.
Business rule matching is the art of comparing data values found with valid values according to business rules. For instance, a person can either be male or female, therefore a database field named ‘gender’ containing a value other than male or female is suspect.
New data acquisition is an activity in which suspect data is replaced by newly retrieved data.
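Two of these techniques, data profiling (frequency distributions per column) and business rule matching (comparing values against a valid domain), can be sketched as follows. The records and the gender rule are illustrative, echoing the example above; they are not taken from Windesheim data:

```python
from collections import Counter

# Illustrative data set with one deliberate data quality flaw.
records = [
    {"student": "A", "gender": "female", "city": "Zwolle"},
    {"student": "B", "gender": "male",   "city": "Zwolle"},
    {"student": "C", "gender": "x",      "city": "Kampen"},  # suspect value
]

def profile(rows, column):
    """Data profiling: frequency distribution of the values in one column,
    offering insight into the type and use of that column."""
    return Counter(row[column] for row in rows)

def rule_violations(rows, column, valid_values):
    """Business rule matching: flag rows whose value falls outside the
    valid domain defined by a business rule."""
    return [row for row in rows if row[column] not in valid_values]

print(profile(records, "gender"))  # the rare value 'x' stands out immediately
suspects = rule_violations(records, "gender", {"male", "female"})
print([row["student"] for row in suspects])  # ['C']
```

Profiling surfaces anomalies without prior knowledge of the rules (a rare value simply stands out in the distribution), while rule matching tests each value against an explicit domain; in practice the two complement each other.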
Solving individual data quality issues is referred to as ‘small q’ by Besouw (2009). Yet it is easily
understood that without addressing the causes leading up to data quality issues in the first place, an
organization will be solving problems continuously, without ever reaching a more lasting solution.
What is needed is a holistic approach to data quality, referred to as ‘large Q’ by Besouw (2009).
Yang W. Lee, Leo L. Pipino, James D. Funk and Richard Y. Wang propose that data be considered a
product: an information product (IP). “An IP is a collection of data elements that meet the specified
requirements of a data consumer” (Lee, Pipino, Funk, & Wang, 2006). In their vision, treating
information as a product requires the manipulation of data to be organized as a production process,
and puts data quality on the board’s agenda. To reach this goal, data quality roles and responsibilities
are established, data quality management procedures are put in place and practical data standards are
brought into use (Lee, Pipino, Funk, & Wang, 2006).
Yang W. Lee et al identify four fundamentals (Lee, Pipino, Funk, & Wang, 2006):
1. Understand the consumer’s needs;
2. Manage information as a product of a well defined information product process;
3. Manage the life cycle of the information product;
4. Appoint an information Product Manager.
Instating those fundamentals is also known as Master Data Management (Loshin, 2008) (Besouw,
2009) or Data Governance (Fishman, 2009). Master Data Management (or: Data Governance)
includes data quality Service Level Agreements (SLAs), data life cycle management and end-to-end
process control. Process control implies the presence of controls, elements in the dataflow where the
quality of data and process is ensured and monitored. Controls include data and specifications,
technology, processes, CRUD18-roles and people & organization (work instructions and employee
education) (McGilvray, 2008) (Besouw, 2009). Thiadens identified assigning responsibilities to the
right people as a major contributor to data quality:
“Problems in grade assignment may be solved by making the lecturer directly responsible for correct
and timely grading. Lecturers are corrected by students when grade assignment is late or questionable.
Registration of lecturer availability may be much improved if the lecturer is made responsible, and is
given the right tools to manage this information” (Interview Thiadens, Appendix 6.3).
Considering information to be a product opens the way to applying production quality frameworks to
information. One widely accepted framework is Six Sigma, a product quality improvement framework
that reduces defects by improving the production process. In monitoring product quality, technology,
processes, organization and staff are viewed as a whole. In Six Sigma, sigma represents the standard
deviation. Six Sigma means six times sigma, indicating 3.4 defects per million opportunities (Boer,
Andharia, Harteveld, Ho, Musto, & Prickel, 2006).
The main instrument of Six Sigma is the continuous DMAIC quality improvement cycle (Define,
Measure, Analyze, Improve, Control). In Six Sigma, Key Goal Indicators (KGIs) are defined and
translated into Key Performance Indicators (KPIs) for the information manufacturing process.
Controls influencing the KPIs are identified. Thus, KGIs are measured by KPIs and managed by controls.
Finally, the process is executed and, in continuous DMAIC cycles, improved (Boer, Andharia,
Harteveld, Ho, Musto, & Prickel, 2006).
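The 3.4-defects-per-million figure follows from the standard normal distribution combined with the conventional 1.5-sigma long-term shift. A short sketch of this arithmetic follows; the shift convention is standard Six Sigma practice, and the defect counts are invented for illustration:

```python
from statistics import NormalDist

def dpmo(defects: int, units: int, opportunities_per_unit: int) -> float:
    """Defects per million opportunities (DPMO)."""
    return defects / (units * opportunities_per_unit) * 1_000_000

def sigma_level(dpmo_value: float) -> float:
    """Short-term sigma level: the normal quantile of the process yield,
    plus the conventional 1.5 sigma long-term shift used in Six Sigma."""
    return NormalDist().inv_cdf(1 - dpmo_value / 1_000_000) + 1.5

# Illustrative figures: 17 defects over 1000 products with 5 opportunities each.
print(dpmo(17, 1000, 5))            # 3400.0 defects per million opportunities
print(round(sigma_level(3.4), 2))   # 6.0: 3.4 DPMO is the 'Six Sigma' level
```

Running the same function on 3.4 DPMO recovers a sigma level of six, confirming the figure quoted above: without the 1.5-sigma shift, six standard deviations would correspond to roughly two defects per billion rather than 3.4 per million.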
18 Create Read Update Delete
The notion of applying quality cycles to data is recognized by the Massachusetts Institute of
Technology (MIT), which created the Total Data Quality Methodology (TDQM) (Lee, Pipino, Funk, &
Wang, 2006). This approach is characterized by five stages:
1. Identify the problem,
2. Diagnose the problem,
3. Plan the solution,
4. Implement the solution,
5. Reflect and learn.
In addition to TDQM, Larry P. English has introduced the Total Information Quality Methodology,
(TIQM), identifying six processes ensuring continuous improvement of information quality (English,
2009):
P1: Assess Information Product Specification Quality,
P2: Assess Information Quality,
P3: Measure Poor Quality Information Costs & Risks,
P4: Improve Information Process Quality,
P5: Correct Data in Source and Control Redundancy,
P6: Establish the Information Quality Environment.
In TIQM, process six (P6) is an overall process, and is actually the first process executed. While
both approaches exhibit the recursive quality loop, TIQM more clearly recognizes data (information)
to be a product. Even though both TDQM and TIQM recognize the closed quality improvement loop,
it is Six Sigma that offers the most recognized and widely used quality-based approach. Therefore, in
this research, Six Sigma practices are positioned at WDQM level five.
To be able to fine-tune a process using quality cycles, process control has to be rigorous, leaving little
room for workers in the process to deviate from their instructions. This is also known as operational
excellence, in which the focus is on creating a process that is as efficient as feasible (Treacy &
Wiersema, 1997).
Practices and structure, process, technology, information and staff
Now that practices improving data quality have been found, in this paragraph they are assigned to
structure, process, technology, information and staff. As defined in paragraph 5.5 Concepts Used,
Structure, Systems and Staff are part of McKinsey's 7-S model (Pascale, Peters, & Waterman, 2009).
Structure deals with the way the organization is constructed (task management, coordination,
hierarchy), while the Processes, Technology and Information criteria together define the Systems
factor. Staff encompasses knowledge management, rewarding, education, morale, motivation and
behavior (Pascale, Peters, & Waterman, 2009). Table 3 presents an overview.
Identification: Apply proper system design
- Structure: Project based development, Project teams, Project management
- Process: Proper database design, Data integration
- Technology: A ROTAP environment is required
- Information: -
- Staff: Structured Data Modelling Knowledge, Domain Knowledge, Project Management competent

Identification: Problem Solving
- Structure: Ad hoc problem solving
- Process: Root cause analysis, data profiling and cleaning, source rating, schema matching and cleaning, business rule matching and new data acquisition
- Technology: Data Analysis and Cleaning tools
- Information: Unknown
- Staff: Analytical competent, Knowledge of technology, business rules and data sources

Identification: Information as a Product (IP)
- Structure: Information Product Manager, Demand and supply structure, Data Quality on the business agenda, Data Quality roles and responsibilities are established
- Process: Information is managed as a product of a well-defined information product process, Supporting Data Life Cycle Management
- Technology: Data Quality Analysis and Reporting tools
- Information: Structured into an Information Product; Subject to Life Cycle Management, Practical data standards are in use
- Staff: Commercial skilled (the customer is the consumer), Understanding the customer needs, Proactive approach to changing data needs

Identification: Master Data Management
- Structure: Deliver quality according to Service Level Agreement
- Process: End-to-end process control
- Technology: Defined and role in life-cycle (CRUD) documented
- Information: Data Quality Controls are present
- Staff: Working according to strict instructions

Identification: Six Sigma
- Structure: Strict hierarchical
- Process: DMAIC, executed according to Key Goal Indicators, monitored by Key Performance Indicators
- Technology: Technology and information quality are observed as a whole
- Information: 3.4 defects per million opportunities
- Staff: Staff and information quality are observed as a whole

Table 03: Practices and structure, process, technology, information and staff
Please note that table 3 does not present a maturity model. In a maturity model, levels are organized
in a strict hierarchy, in which process areas accumulate over successive levels. To transform the
model found into a maturity model, further analysis is required. To do so, a view on maturity levels is
created by evaluating multiple level-based maturity models.
Table 4 combines process areas (or: best practices, capabilities and activities) of several maturity
models into one view: PeopleCMM (Curtis, Hefley, & Miller, 2009), CMMI (Kneuper, 2008),
Organizational Project Management Maturity Model OPM3 (Project Management Institute, 2008),
Master Data Management Maturity Model (Loshin, 2001) and Caldea (Caballero & Piattini, 2003).
Practices and maturity levels
Level 1, Initial (processes are ad hoc):
- CMMI Process Areas: -
- People CMM Process Areas: -
- OPM3 Best Practices: -
- MDM Capabilities: Limited enterprise consolidation of representative models, Collections of data dictionaries in various forms, Limited data cleansing
- Caldea Activities: -

Level 2, Managed (processes are characterized by the project):
- CMMI Process Areas: Requirements Management, Project Planning, Project Monitoring and Control, Supplier Agreement Management, Measurement and Analysis, Process and Product Quality Assurance, Configuration Management
- People CMM Process Areas: Compensation, Training & Development, Performance Management, Work Environment, Communication & Coordination, Staffing
- OPM3 Best Practices: Standardize Develop Project Charter process, Standardize Develop Project Management Plan process, Standardize project Collect Requirements process, Standardize project Define Scope process, ….
- MDM Capabilities: Application architectures for each business application, Data dictionaries are collected into a single repository, Initial exploration into low-level application services, Review of options for information sharing, Introduction of data quality management for parsing, standardization, and consolidation
- Caldea Activities: Data Management Project Management, Data Requirements Management, Data Quality Dimensions and Metrics Management, Data Sources and Data Targets Management, Database or data warehouse development or acquisition project management

Level 3, Defined (processes are defined by the organization):
- CMMI Process Areas: Requirements Development, Technical Solution, Product Integration, Verification, Validation, Organizational Process Focus, Organizational Process Definition, Organizational Training, Integrated Project Management, Risk Management, Decision Analysis and Resolution
- People CMM Process Areas: Participatory Culture, Workgroup Development, Competency-Based Practices, Career Development, Competency Development, Workforce Planning, Competency Analysis
- OPM3 Best Practices: Measure Develop Project Charter process, Measure Develop Project Management Plan process, Measure project Collect Requirements process, Measure project Define Scope process, ….
- MDM Capabilities: Fundamental architecture for shared master data framework, Defined services for integration with master data asset, Data quality tools, Policies and procedures for data quality management, Data quality issues tracking, Data standards processes
- Caldea Activities: Data Quality Team Management, Data quality product verification and validation, Risk and poor data quality impact Management, Data quality standardization Management, Organizational Processes Management

Level 4, Quantitatively Managed (processes are measured and controlled):
- CMMI Process Areas: Organizational Process Performance, Quantitative Project Management
- People CMM Process Areas: Mentoring, Organizational Capability Management, Quantitative Performance Management, Competency-Based Assets, Empowered Workgroups, Competency Integration
- OPM3 Best Practices: Control Develop Project Charter process, Control Develop Project Management Plan process, Control project Collect Requirements process, Control project Define Scope process, ….
- MDM Capabilities: SOA for application architecture, Centralized management of business metadata, Enterprise data governance program, Enterprise data standards and metadata management, Proactive monitoring for data quality control feeds into governance program
- Caldea Activities: Data Management Process Measurements Management

Level 5, Optimizing (continuous process improvement):
- CMMI Process Areas: Organizational Innovation & Deployment, Causal Analysis & Resolution
- People CMM Process Areas: Continuous Workforce Innovation, Organizational Performance Alignment, Continuous Capability Improvement
- OPM3 Best Practices: Improve Develop Project Charter process, Improve Develop Project Management Plan process, Improve project Collect Requirements process, Improve project Define Scope process, ….
- MDM Capabilities: Transaction integration available to internal applications, Published APIs enable straight-through processing, Cross-organization data governance
- Caldea Activities: Causal Analysis for Defect Prevention, Organizational Development and Innovation

Table 04: A combined view on maturity.
In this view, all PeopleCMM, all CMMI-COM and CMMI-DEV process areas and all Caldea
activities are shown. With regard to OPM3 and MDM, a subset of best practices and capabilities is
included, in order to present a workable overview. Using this view as a guideline, the practices
identified in table 3 are assigned to specific maturity levels, resulting in table 5, the Windesheim Data
Quality Maturity (WDQM) model. 19 The assignment of practices to WDQM levels is discussed in the
next paragraphs.
19 This table is validated in a discussion with M. van Steenbergen, lead architect at Sogeti (see appendix 6.2).
5.1.5 Windesheim Data Quality Maturity Model
Level 1, Initial (processes are ad hoc):
- Structure: -
- Process: -
- Technology: -
- Information: Unspecified
- Staff: -

Level 2, Managed (processes are characterized by the project):
- Structure: Project based development, Project teams, Ad hoc problem solving
- Process: Data profiling and cleaning, source rating, schema matching and cleaning, business rule matching and new data acquisition
- Technology: Data Analysis and Cleaning tools, File Transfer data exchange pattern
- Information: Not trusted
- Staff: Analytical competent, Knowledge of technology, business rules and data sources, Data modeling knowledge

Level 3, Defined (processes are defined by the organization):
- Structure: Programme management
- Process: Root cause analysis, Requirements Development, Product Integration, Verification, Validation, Data integration
- Technology: Technical Solution, A ROTAP environment is available, Data integration through Remote Procedure Invocation
- Information: Fit for current use, A canonical data model supports data translations between domains
- Staff: Domain Knowledge, Programme Management competent, Data responsible

Level 4, Quantitatively Managed (processes are measured and controlled):
- Structure: Information Product Manager, Data Quality on the business agenda, Data Quality roles and responsibilities are established, Quality is delivered according to Service Level Agreement
- Process: Information is managed as a product of a well-defined information product process, Supporting Data Life Cycle Management, End-to-end process control
- Technology: Data Quality Analysis and Reporting tools, Integration patterns: Message Bus or Message Broker pattern
- Information: Structured into an Information Product, Subject to Life Cycle Management, Canonical data model defines data standards as a lingua franca, Data Quality Controls are present
- Staff: Commercial skilled (the customer is the consumer), Understanding the customer needs, Proactive approach to changing data needs

Level 5, Optimizing (continuous process improvement):
- Structure: Processes are executed in a strict hierarchy
- Process: DMAIC, executed according to Key Goal Indicators, monitored by Key Performance Indicators
- Technology: Defined and role in life-cycle (CRUD) documented, Technology and information quality are observed as a whole
- Information: 3.4 defects per million opportunities
- Staff: Working according to strict instructions, Staff and information quality are observed as a whole

Table 05: Windesheim Data Quality Maturity model (WDQM)
Discussion
The structure column is characterized by a growth from an ad-hoc approach, via project based
development and an integrated programme management approach, to the institution of product quality
management and finally total quality management at level five. At this level, the modus operandi for
process execution is operational excellence, requiring employees in the workforce to adhere to strict
standards and instructions (Treacy & Wiersema, 1997).
The process column replaces the rather limited notion of proper database design by the CMMI level
three Process Areas Requirements Development, Technical Solution, Product Integration, Verification
and Validation, indicating that the issue at this level is to design, build and implement a well-functioning
solution. The CMMI process area Technical Solution is positioned in the Technology
column. The activities mentioned under problem solving in table 3 fit maturity level two, which is
characterized by ad-hoc problem solving. Root cause analysis however does not fit on this level, since
this activity leads to solving data errors at the root of the problem. Root cause analysis is positioned at
maturity level three, where it supplements requirements development, enabling integrated, robust
solutions.
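The level-two profiling and cleaning activities mentioned above can be illustrated with a minimal data-profiling sketch. This is illustrative only; the record fields, sample values and the business rule are invented:

```python
from collections import Counter

# A minimal data-profiling pass over student records (hypothetical data).
records = [
    {"student_id": "S001", "email": "a@windesheim.nl", "grade": "8"},
    {"student_id": "S002", "email": "", "grade": "11"},               # missing email, grade out of range
    {"student_id": "S001", "email": "b@windesheim.nl", "grade": "7"}, # duplicate id
]

def profile(rows):
    """Report completeness, uniqueness and a simple business-rule check."""
    report = {}
    # Completeness: how many records lack an email value?
    report["missing_email"] = sum(1 for r in rows if not r["email"])
    # Uniqueness: how many surplus occurrences of a student_id exist?
    ids = Counter(r["student_id"] for r in rows)
    report["duplicate_ids"] = sum(c - 1 for c in ids.values() if c > 1)
    # Business rule (assumed for illustration): grades run from 1 to 10.
    report["invalid_grades"] = sum(
        1 for r in rows if not (r["grade"].isdigit() and 1 <= int(r["grade"]) <= 10)
    )
    return report

print(profile(records))  # {'missing_email': 1, 'duplicate_ids': 1, 'invalid_grades': 1}
```

At maturity level two such profiling output typically triggers ad-hoc cleaning; at level three the same findings would feed root cause analysis instead.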
The technology column reveals an evolution in system integration. At level two, system integration
is still designed in an ad-hoc, individual manner. At level three, Caldea positions data standardization,
while MDM mentions having defined services for integration, and according to MDM mastering level
four is required for successfully building a SOA application architecture. This is reflected by the
different system integration styles being utilized (Hohpe & Woolf, 2008). At level two, the File
Transfer pattern is the dominant integration style, offering ease of integration and an excellent
universal storage mechanism. At level three, the emergence of a canonical data model opens the way
for a more standardized system integration, utilizing the Remote Procedure Invocation integration
style (Hohpe & Woolf, 2008). Along the lines of MDM, at level four a common Messaging style
supported by a Message Broker or Message Bus pattern (Hohpe & Woolf, 2008) results in a
service-oriented application architecture.
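The decoupling that the Message Broker pattern provides can be sketched minimally as follows. This is an illustration of the idea only; the topic name and payloads are invented, and a production broker would add queuing, persistence and error handling:

```python
from collections import defaultdict

class MessageBroker:
    """Minimal publish/subscribe broker: producers and consumers are decoupled,
    as in the Message Broker / Message Bus integration patterns."""
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        # The publisher knows only the topic, never the consuming systems.
        for handler in self._subscribers[topic]:
            handler(message)

broker = MessageBroker()
received = []
# Two consuming domains subscribe to student mutations (hypothetical topic).
broker.subscribe("student.updated", received.append)
broker.subscribe("student.updated", lambda m: received.append({**m, "audited": True}))
broker.publish("student.updated", {"student_id": "S001", "email": "a@windesheim.nl"})
print(len(received))  # 2: both subscribers received the message
```

The point of the sketch is that adding a third consuming domain requires only another subscribe call, not a change to the publishing system.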
In the information column, at level one, Initial, (the management of) the organization is oblivious with
regard to data quality. All is assumed to be well; the state the data is in, however, remains unspecified.
At the next level, data quality issues have triggered numerous attempts to repair and clean data,
resulting in a decline of confidence in the reliability of the information. MDM positions a rather
isolated view on data quality at level two, whereas at level three an integrated approach is supported
by a fundamental architecture for shared data. Again, we may well see the emergence of a canonical
data model at level three, enabling data to be transformed at the borders of each domain. At this level,
data quality is fit for current use, as indicated by the presence of Caldea's risk and poor data quality
management process area, whereas at level four data is seen as a product and data quality becomes
future-proof, and at level five data quality reaches Six Sigma.
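The role of a canonical data model at the borders of each domain can be illustrated with a small sketch. The domain schemas and field names below are hypothetical; only the translate-at-the-border idea is taken from the text:

```python
# Each domain translates its local representation into a shared canonical model
# at its border, so domains never need to know each other's schemas.
CANONICAL_FIELDS = ("student_id", "family_name", "email")

def from_student_admin(record: dict) -> dict:
    """Translate the (hypothetical) student-administration schema to the canonical model."""
    return {
        "student_id": record["stud_nr"],
        "family_name": record["achternaam"],
        "email": record["email_adres"],
    }

def from_exam_board(record: dict) -> dict:
    """Translate the (hypothetical) exam-board schema to the canonical model."""
    return {
        "student_id": record["candidate"],
        "family_name": record["surname"],
        "email": record["mail"],
    }

admin_view = from_student_admin({"stud_nr": "S001", "achternaam": "Jansen", "email_adres": "j@x.nl"})
exam_view = from_exam_board({"candidate": "S001", "surname": "Jansen", "mail": "j@x.nl"})
print(admin_view == exam_view)  # True: both domains agree once translated
```

With n domains, only n translations to the canonical model are needed instead of n × (n - 1) pairwise mappings, which is what makes the model act as a lingua franca.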
Staff, finally, grows from being analytically competent (a good system programmer) to a commercially
skilled worker, able to assess the data customer's needs. This reflects PeopleCMM's
professional training at level two, competence and career development at level three and the
institution of empowered workgroups at level four. PeopleCMM's definition of professional training
at level two creates room for making the individual entering data responsible for the quality and
ultimately the effects of the data entered. This however requires the organization to focus on the
process as a whole, which initially is the case at level three. Therefore, an individual may be made
responsible for the data he enters at level three: data responsible.
Level five is characterized by a continuous improvement cycle. Current data quality theories do not
include continuous improvement; it seems that data quality theories are focused on improving the data
quality to an acceptable level (fit for use). An alternative approach is to be adopted to shape level five.
Both TDQM and Six Sigma are aimed at continuous process improvement. When taking a closer look,
however, TDQM is positioned as a project management approach for solving data quality problems in
general (Lee, Pipino, Funk, & Wang, 2006, p. 64) (Kovac, Lee, & Pipino, 1997), ensuring that data
errors are corrected at the data source, not at the place where they create havoc. This implies that a
form of continuous improvement cycle has been defined at level two already. However, TDQM does
not improve the production process itself: it springs into action once an obvious data error has been
detected, and eliminates the root cause. Six Sigma, on the other hand, improves the data production
process until data quality has reached an absolute maximum, surpassing the 'fit for use' boundary.
Therefore, to populate level five, Optimizing, the Six Sigma fundamentals fit best.
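The Six Sigma target of 3.4 defects per million opportunities follows from the standard DPMO (Defects Per Million Opportunities) metric. The formula is standard Six Sigma practice; the example figures below are invented:

```python
def dpmo(defects: int, units: int, opportunities_per_unit: int) -> float:
    """Defects Per Million Opportunities: the standard Six Sigma process metric.

    DPMO = defects / (units * opportunities per unit) * 1,000,000
    """
    return defects / (units * opportunities_per_unit) * 1_000_000

# Hypothetical example: 12 faulty field values found in 10,000 student records,
# each record offering 40 opportunities for a defect (40 checked fields).
print(dpmo(12, 10_000, 40))  # 30.0 DPMO: well above the Six Sigma level of 3.4
```

Applied to data quality, a "unit" would be a record and an "opportunity" a field in which a value can be wrong, so the metric measures the production process rather than an individual error.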
In the remainder of the research, this model will be referred to as the Windesheim Data Quality
Maturity model, or WDQM.
Data Ownership
An issue that remains largely untouched so far is data ownership. Whose data is it, anyway? To
be more precise, who owns the data at Windesheim? Take for instance grades assigned to
assessments made by students. Who owns such a grade? Is it the student, or the student administration,
or the IT department perhaps? And what does this all mean for treating information as a product? If
information is a product, and it is subject to Service Level Agreements, then who is selling what to
whom?
In literature, some attention is paid to data ownership. At level 4, in the structure column, it is found
that Data Quality roles and responsibilities are established and an Information Product Manager is
instated (Lee, Pipino, Funk, & Wang, 2006). This issue is more specifically addressed by Danette
McGilvray, who introduces the Data Steward (McGilvray, 2008) as a replacement for the data owner,
since in her vision ownership results in a too rigid and inflexible position of stakeholders. Indeed,
when interviewed, Thiadens, lector at Fontys university, identified ownership as an obstacle:
“The most difficult hurdle to be solved here is to overcome the notion that information is not owned by the
decentralized business units”. (Interview dr. mr. ir. Th. J.G. Thiadens, Appendix 6.3)
A data steward, on the other hand, is a role, acting on behalf of and in the best interest of someone
else, thus creating room to maneuver and the flexibility to implement this role. Gartner seems to agree:
“A data owner owns the data, much like the queen owns the land, while a data steward takes care of the
data, much like a farmer takes care of the land” (Friedman, 2009).
To be able to take care of data, one must have the right tools and responsibilities. A data steward is
able to be effective at level 4, Quantitatively Managed, since at this level information is managed as a
product, for the quality of which one can be responsible (Lee, Pipino, Funk, & Wang, 2006). The
Information Product Manager therefore is positioned at level 4 and assigned the role of Data Steward.
It should be noted, however, that management involvement remains crucial:
“…the data steward, .. cannot fulfill his role as caretaker for data quality if the means to effectively
influence data quality do not come with the job. Since data quality is related to organizational maturity,
the means required are managerial rather than technical. To ensure data quality, one may have to be
prepared to restructure the organization. Instating data stewardship without the preparedness of taking
(perhaps drastic) managerial decisions, restructuring the fabric of an organization, may be in vain. There
HAS to be a manager responsible for data quality with the authority to implement change”. (Interview de
Graaf, appendix 6.4)
Graphical presentation
1. Initial: Data Quality has not yet been formally identified as the source of problems.
2. Managed: Aware of data quality problems, solving data quality issues on an ad-hoc basis.
3. Defined: Solving data quality issues through structured system development and rigorous testing.
4. Quantitatively Managed: Treating information as a product, handling data quality problems as product and process faults, controlling the process.
5. Optimizing: Constantly improving process and data quality in total quality cycles.
Figure 09: graphical representation of the WDQM
5.1.6 Alternative views on data quality maturity
In the previous paragraphs multiple maturity models were analyzed, resulting in the WDQM. The
common denominator between those models is that they are all level-based maturity models using
process areas (Curtis, Hefley, & Miller, 2009), (Kneuper, 2008), best practices (Project Management
Institute, 2008) or capabilities (Loshin, 2001) to achieve goals defining each level of maturity. In
literature, other data quality maturity models are described using a similar level-based description, yet
lacking the definition of process areas (or best practices or capabilities) and goals. Therefore, using
these models as a source for analysis is difficult, if not impossible. However, now that the WDQM has
been defined, what can we learn from comparing the resulting data quality maturity model with the
other maturity models described in literature?
Data Quality Management Maturity Model
An alternative view on data quality maturity is developed by Kyung-Seok Ryu, Joo-Seok Park, and
Jae-Hong Park (figure 10).
Figure 10: A Data Quality Management Maturity Model (Ryu, Park, & Park, 2006)
In this view on data quality maturity, in each successive maturity level data management operates on
an increased level of abstraction. Where initially data is managed from a rather operational point of
view, the physical database schema, at the second level a data model is present, resulting in a more
integrated view on data. Next, this model is standardized using metadata standards, and finally a more
holistic view is obtained utilizing a data architecture. This view on maturity takes another approach,
solely focusing on the information aspect and utilizing four levels instead of the CMMI five-level
approach, whereas in the WDQM at higher maturity levels data transforms into a product and the
focus is on improving the production process. However, similarities may be observed:
Data Quality Management Maturity / Windesheim Data Quality Maturity
Level 1: Management of physical data / Level 2: Focus on repairing physical DQ issues
Level 2: Management of data definitions / Level 3: Focus on requirements and data design
Level 3: Management through data standards / Level 4: Data is an IP, data standards are in use
Level 4: Holistic data management, architecture / Level 5: Continuously improved through DMAIC
As shown, when maturity levels are aligned, levels 1 through 3 of the data quality management
maturity model bear similarities to levels 2 through 4 of the Windesheim Data Quality Maturity
model. The data quality management maturity model presented by Ryu et al. may enrich the
information column of the WDQM (table 5).
Gartner Data Quality Maturity Model
Another data quality maturity model is defined by Gartner. Gartner recognizes five levels of maturity
(Gartner, 2007):
“Organizations at Level 1 have the lowest level of data quality maturity, with only a few people aware of
the issues and their impact. … Organizations at Level 2 are starting to react to the need for new
processes that improve the relevance of information for daily business. ….. Organizations at Level 3 are
proactive in their data quality efforts. They have seen the value of information assets as a foundation for
improved enterprise performance ….. At Level 4, information is part of the IT portfolio and considered an
enterprise wide asset, and the data quality process becomes part of an EIM program. …Companies at
Level 5 have fully evolved EIM programs for managing their information assets with the same rigor as
other vital resources, such as financial and material assets.” (Gartner, 2007).
Even though Gartner does not define process areas and goals for each level, characteristics defining
each level are provided in a descriptive text. To analyze this description, table 6 is
created, containing both the WDQM and the characteristics from Gartner’s vision on data quality
maturity (Gartner, 2007).
Again, similarities and differences can be observed. In Gartner's view, at maturity level three the
organization is already moving beyond project-based development, which leads to a somewhat
confusing and less clear-cut distinction between the maturity levels Managed and Optimized.
Also, the distinction made between an organization responding in a reactive or proactive mode to data
quality issues is interesting. Being proactive and having Enterprise Information Management (EIM)
operational at level three already might be a bit steep, considering the fact that at level three OPM3
positions projects as being measurable (not yet in control), MDM defines data quality as traceable
(and positions proactive monitoring at level four), and CMMI focuses on integration (and positions
quantitative management at level four) (Curtis, Hefley, & Miller, 2009), (Kneuper, 2008), (Project
Management Institute, 2008), (Loshin, 2001).
WDQM level 1, Initial (processes are ad hoc):
- Structure: -
- Process: -
- Technology: -
- Information: Unspecified
- Staff: -

Gartner level 1, Aware (lowest level of data quality maturity):
- Structure: Within the entire organization, no person, department or business function claims responsibility for data.
- Process: When a problem with data quality is obvious, there is a tendency to ignore it and to hope that it will disappear of its own accord.
- Technology: No formal initiative to cleanse data exists, and information emerging from computers is generally held to be "correct by default."
- Information: Business users are largely unaware of a variety of data quality problems, partly because they see no benefit for themselves in keeping data clean.
- Staff: Only a few people are aware of the issues and their impact.

WDQM level 2, Managed (processes are characterized by the project):
- Structure: Project based development, Project teams, Ad hoc problem solving
- Process: Data profiling and cleaning, source rating, schema matching and cleaning, business rule matching and new data acquisition
- Technology: Data Analysis and Cleaning tools, File Transfer data exchange pattern
- Information: Not trusted
- Staff: Analytical competent, Knowledge of technology, business rules and data sources, Data modeling knowledge

Gartner level 2, Reactive (reacting to the need for new processes):
- Structure: Although field or service personnel need access to accurate operational data to perform their roles effectively, businesses take a wait-and-see approach in relation to data quality.
- Process: Starting to react to the need for new processes that improve the relevance of information for daily business.
- Technology: Application developers implement simple edits and controls to standardize data formats, check on mandatory entry fields and validate possible attribute values.
- Information: Business decisions and system transactions are regularly questioned due to suspicions about data quality.
- Staff: Employees have a general awareness that information provides a means for enabling greater business-process understanding and improvement.

WDQM level 3, Defined (processes are defined by the organization):
- Structure: Programme management
- Process: Root cause analysis, Requirements Development, Product Integration, Verification, Validation, Data integration
- Technology: Technical Solution, A ROTAP environment is available, Data integration through Remote Procedure Invocation
- Information: Fit for current use, A canonical data model supports data translations between domains
- Staff: Domain Knowledge, Programme Management competent, Data responsible

Gartner level 3, Proactive (proactive data quality efforts):
- Structure: Organizations have seen the value of information as a foundation for enterprise performance and moved from project-level IM to a coordinated EIM strategy.
- Process: Major data quality issues are documented, but not completely remediated.
- Technology: Data quality tools, for tasks such as profiling or cleansing, are used on a project-by-project basis, but housekeeping is typically performed by the IT department or data warehouse teams.
- Information: Business analysts feel data quality issues most acutely and data quality is part of the IT charter. Levels of data quality are considered "good enough" for most tactical and strategic decision-making.
- Staff: Department managers and IT managers communicate data administration and data quality guidelines. The concept of "data ownership" is discussed.

WDQM level 4, Quantitatively Managed (processes are measured and controlled):
- Structure: Information Product Manager, Data Quality on the business agenda, Data Quality roles and responsibilities are established, Quality is delivered according to Service Level Agreement
- Process: Information is managed as a product of a well-defined information product process, Supporting Data Life Cycle Management, End-to-end process control
- Technology: Data Quality Analysis and Reporting tools, Integration patterns: Message Bus or Message Broker pattern
- Information: Structured into an Information Product, Subject to Life Cycle Management, Canonical data model defines data standards as a lingua franca, Data Quality Controls are present
- Staff: Commercial skilled (the customer is the consumer), Understanding the customer needs, Proactive approach to changing data needs

Gartner level 4, Managed (information is an enterprisewide asset):
- Structure: The data quality process is part of an EIM program and is now a prime concern of the IT department and a major business responsibility.
- Process: Data quality is measured and monitored at enterprise level regularly. An impact analysis links data quality to business issues and process performance.
- Technology: Commercial data quality software is implemented. Cleansing is performed either at the data integration layer or directly at the data source.
- Information: Information is part of the IT portfolio and considered an enterprisewide asset.
- Staff: Multiple data stewardship roles are established within the organization.

WDQM level 5, Optimizing (continuous process improvement):
- Structure: Processes are executed in a strict hierarchy
- Process: DMAIC, executed according to Key Goal Indicators, monitored by Key Performance Indicators
- Technology: Defined and role in life-cycle (CRUD) documented, Technology and information quality are observed as a whole
- Information: 3.4 defects per million opportunities
- Staff: Working according to strict instructions, Staff and information quality are observed as a whole

Gartner level 5, Optimized (fully evolved EIM programs):
- Structure: Fully evolved EIM programs for managing information assets with the same rigor as other vital resources, such as financial and material assets.
- Process: Rigorous processes are in place: ongoing housekeeping exercises, continuous monitoring of quality levels.
- Technology: Data is enriched in real time by third-party providers with additional credit, demographic, sociographic, household, geospatial or market data.
- Information: Unstructured mission-critical information, such as documents and policies, becomes subject to data quality controls.
- Staff: Quality metrics are attached to the compensation plans of data stewards and other employees.

Table 06: A combined view on the WDQM and the Gartner Data Quality Maturity model
What can be observed is that at level two in Gartner's model the emphasis lies on being able to
develop the right solution, while at level three the focus shifts towards (pro)actively monitoring and
ensuring data quality (Gartner, 2007). In the WDQM, however, at level two the emphasis is on
repairing data quality issues in an ad-hoc manner, whilst at level three the focus shifts toward
developing more robust and better aligned solutions. Indeed, the order in which these things take place
may differ depending on one's viewpoint. One may argue that, in order to experience data
quality issues, one must be able to develop applications first. The WDQM is based on sound theories
on maturity, which state that a subject (data quality) is first discarded, then dealt with on an ad-hoc
basis (i.e. 'repaired'), and only understood and implemented more robustly at maturity levels three and
upwards (see figure 9). Therefore, in this research the WDQM will remain unchanged.
5.1.7 Conclusion
In the previous paragraphs a generic data quality maturity model has been found by:
1. Identifying data quality improvement practices by literature study and interview;
2. Assigning those practices to organizational structure, process, technology, information and staff, thus creating a balanced view;
3. Assigning the resulting set of practices to maturity levels by linking each measure with a specific process area, creating a maturity matrix.
Finally, the resulting Windesheim Data Quality Maturity model (WDQM) is compared with other data quality maturity models. Differences may be observed, and it is found that, depending on one's viewpoint, the order of process areas at levels two and three may vary and is therefore open for discussion. Data ownership, an important issue when discussing data quality, has hardly been mentioned in the literature. It is suggested to replace data ownership with data stewardship at level 4.
Now that a model for data quality maturity has been developed, the data quality threshold is to be established. In the next paragraphs, data quality attributes are defined and the domain business rules are identified.
5.2 Data Quality Attributes
In this paragraph, the search is on for answers to the following questions:
In higher education, what positive and negative correlations between maturity and data quality may be found?
- What values of data quality attributes correlate with each level of maturity?
- What do process quality theories describe about positive correlations between quality and maturity?
- What do process quality theories describe about negative correlations between quality and maturity?
- Are those observations consistent?
5.2.1 Dimensions of data quality
In literature, data quality is defined by dimensions, and those dimensions in turn are measured by data
quality attributes (Loshin, 2008) (Batini & Scannapieco, 1998) (McGilvray, 2008). To find the right
data quality attributes, the dimensions have to be identified first. This paragraph establishes a view on
the dimensions of data quality.
What are the dimensions of data quality? When we examine the literature on this topic, we discover that many dimensions are defined but, unfortunately, naming and definitions vary between sources.
Table 7 presents an overview.
For each dimension, the table shows the definition, the source that supplied the definition and
relationships with other dimensions. This relationship is either specifically supplied by the source (for
instance, in the form of a formula) or it is found by comparing definitions (indicating that the
dimensions are actually synonyms).
Table 7 presents a non-normalized view on the data quality dimensions found in literature. To create a more usable view, this set of dimensions will be compacted by removing duplicates and synonyms. In some cases, the literature mentions dimensions that relate more to the quality of software than to the quality of data (Ease of use, Maintainability and Presentation Quality). These dimensions are omitted.
Dimension | Definition | Source | Related to
Accessibility | Ease of attainability of the data | Lee, Pipino, Funk, & Wang, 2006 | Accessibility = 1 - (delivery time - input time) / (outdated time - input time)
Accuracy, Database | Correctness of data in the database | Zeist, Hendriks, Paulussen, & Trieneken, 1996 |
Accuracy, Semantic | Closeness of value v to true value v' | Batini & Scannapieco, 1998 |
Accuracy, Syntactic | Closeness of value v to elements of the corresponding domain D | Batini & Scannapieco, 1998 |
Actuality, Database | Data in the database is in conformance with reality | Zeist, Hendriks, Paulussen, & Trieneken, 1996 | Accuracy, Timeliness
Completeness | The extent to which data are of sufficient breadth, depth and scope for the task at hand | Batini & Scannapieco, 1998 |
Completeness | The degree in which elements are not missing from a set | Lee, Pipino, Funk, & Wang, 2006 |
Consistency | Violation of semantic rules over (a set of) data-items | Batini & Scannapieco, 1998 |
Consistency | The degree in which values and formats of data elements are used in a univocal way | Lee, Pipino, Funk, & Wang, 2006 |
Consistency | A measure of the equivalence of information stored or used in various data stores, applications and systems | McGilvray, 2008 |
Currency | Concerns how promptly data are updated | Batini & Scannapieco, 1998 | Currency = delivery time - input time + age
Currency | | Lee, Pipino, Funk, & Wang, 2006 | Currency = delivery time - input time + age
Data Coverage | A measure of the availability and comprehensiveness of data compared to the total data universe or population of interest | McGilvray, 2008 | Completeness
Decay | A measure of the rate of negative change to the data | McGilvray, 2008 | Timeliness
Duplication | A measure of unwanted duplication | McGilvray, 2008 | Uniqueness
Ease of use | A measure of the degree to which data can be accessed and used | McGilvray, 2008 |
Format Compliance | The degree in which a modeled object conforms to the set of rules bounding its representation | Loshin, 2008 |
Integrity, Data | A measure of total data quality | McGilvray, 2008 |
Integrity, Referential | The degree in which related sets of data are consistent | Chen, 1976 |
Maintainability | The degree to which data can be updated, maintained and managed | McGilvray, 2008 |
Presentation Quality | A measure of how information is presented to and collected from those who utilize it | McGilvray, 2008 |
Reliability | Free of error | Lee, Pipino, Funk, & Wang, 2006 |
Reliability | The degree in which data represent reality | Verreck, Graaf, & Sanden, 2005 |
Specifications | A measure of the existence, completeness, quality and documentation of data standards | McGilvray, 2008 |
Timeliness | Timeliness expresses how current data are for the task at hand | Batini & Scannapieco, 1998 | Timeliness = 1 - volatility / currency
Timeliness | Timeliness can be measured as the time between when information is expected and when it is readily available for use | Loshin, 2008 |
Timeliness | Or Availability: a measure of the degree to which data are current and available for use | McGilvray, 2008 | Availability
Trust | A measure of the confidence in the data quality | McGilvray, 2008 | Reliability
Uniqueness | Refers to requirements that entities .. are captured, represented, and referenced uniquely | Loshin, 2008 | Consistency = f(uniqueness)
Usability | The total fitness of data for use | Verreck, Graaf, & Sanden, 2005 | Usability = Reliability * Relevance, U = R2
Volatility | Characterizes the frequency with which data vary in time | Batini & Scannapieco, 1998 | Decay, Currency
Table 07: An overview of data quality dimensions
One data quality dimension is mentioned yet rather loosely defined: integrity. Integrity is defined as an overall measure of data quality. In this research, data is considered to have integrity once it is fit for use (see paragraph 3.1.4, What is data quality). This also means that usability and integrity are synonymous.
A dimension that is not explicitly mentioned in the data quality literature is security. Schumacher et al. identify four data quality dimensions related to security: Confidentiality, Integrity, Availability and Accountability (Schumacher, Fernandez-Buglioni, Hybertson, Buschmann, & Sommerland, 2006). Integrity and Availability are part of the model already: Integrity is defined as an overall measure of data quality, acting as a container for all other dimensions of data quality, while Availability is commonly known in the data quality literature as Timeliness. Confidentiality and Accountability are added to the list of data quality dimensions. Confidentiality is the property that data is disclosed only as intended by the enterprise, while Accountability is the property that actions affecting enterprise assets can be traced to the actor responsible for the action (Schumacher, Fernandez-Buglioni, Hybertson, Buschmann, & Sommerland, 2006).

Timeliness is defined by (Batini & Scannapieco, 1998) as 1 - volatility / currency. Since Volatility is a frequency and Currency a timeframe, this research proposes a simpler equation: Timeliness = Volatility * Currency. In this case, when Currency is shorter than the volatility period (1 / Volatility), Timeliness < 1; when Currency exceeds the volatility period, Timeliness > 1.
The analysis results in table 8: Dimensions of data quality.
Dimension | Definition | Related to
Accessibility | Ease of attainability of the data | Accessibility = 1 - (delivery time - input time) / (outdated time - input time)
Accountability | The property that actions affecting enterprise assets can be traced to the actor responsible for the action | Security
Accuracy | Closeness of value v to true value v' |
Completeness | The degree in which elements are not missing from a set |
Confidentiality | The property that data is disclosed only as intended by the enterprise | Security
Consistency | The degree in which values and formats of data elements are in line with semantic rules over this set of data-items | Consistency = f(Uniqueness)
Currency | Concerns how promptly data are updated | Currency = delivery time - input time + age
Integrity, Data | The degree in which data is fit for use |
Integrity, Referential | The degree in which related sets of data are consistent | Consistency
Reliability | The degree in which data is perceived to represent reality |
Specifications | A measure of the existence, completeness, quality and documentation of data standards |
Timeliness | Or Availability: a measure of the degree to which data are current and available for use | Timeliness = volatility * currency
Uniqueness | Refers to requirements that entities are captured, represented, and referenced uniquely |
Volatility | Characterizes the frequency with which data vary in time |
Table 08: Dimensions of data quality
Batini points out that dimensions could be conflicting: “For instance, a list of courses published on a
university web site must be timely though there could be accuracy or consistency errors and some
fields specifying courses could be missing” (Batini & Scannapieco, 1998).
5.2.2 Data Quality Dimensions Discussed
Now that the final set of data quality dimensions has been identified, can individual dimensions be assigned to levels of the WDQM? In other words, can it be argued that a certain level of maturity has to be
mastered in order to be able to satisfy (a group of) data quality dimensions? In this paragraph, this question is explored by identifying the measures that establish each data quality dimension and comparing these measures to WDQM process areas, thus binding the dimension to the corresponding WDQM maturity level, and finally defining the corresponding data quality attribute(s).
Accessibility
Accessibility deals with the fact that data needs to be delivered before it becomes insignificant
(outdated). This makes for a rather complex, compound dimension. Accessibility is influenced by
Volatility (the rate at which data changes), Timeliness (the speed at which data is available for use)
and Currency (the speed at which data is updated in the system). Both Timeliness and Currency are
positioned at level 4, thus Accessibility can only be guaranteed at level 4, Quantitatively Managed.
Accessibility is measured by a ratio, indicating the ease of attainability of the data. Accessibility = 1 -
(delivery time - input time) / (outdated time - input time) (Lee, Pipino, Funk, & Wang, 2006).
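This ratio can be sketched directly; a minimal illustration in which the function and variable names are mine, not taken from Lee et al.:

```python
def accessibility(input_time: float, delivery_time: float, outdated_time: float) -> float:
    """Accessibility = 1 - (delivery time - input time) / (outdated time - input time).

    Any consistent time unit works. A value near 1 means data is delivered
    almost immediately after input; a value near 0 means it arrives just
    before it becomes outdated (insignificant).
    """
    return 1 - (delivery_time - input_time) / (outdated_time - input_time)

# Data captured at t=0, delivered at t=2, outdated at t=10 (e.g. hours):
print(accessibility(0, 2, 10))  # 0.8
```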
Accountability
Schumacher et al. identify a series of security patterns especially focused on maintaining
Accountability. Security accounting is a service area that performs four functions: it captures, stores,
reviews and reports data about security events (Schumacher, Fernandez-Buglioni, Hybertson,
Buschmann, & Sommerland, 2006). Patterns used to execute this process are security accounting,
audit service, audit trail, and intrusion detection (Schumacher, Fernandez-Buglioni, Hybertson,
Buschmann, & Sommerland, 2006).
To be able to implement these patterns, a view on data structures and data quality is required, as well
as well-defined and independently operating ROTAP environments. It may therefore be argued that
Accountability can be maintained no earlier than at maturity level 3, Defined.
Attributes involved are actors involved, assets affected, time, date and place of the event, and methods
used (Schumacher, Fernandez-Buglioni, Hybertson, Buschmann, & Sommerland, 2006).
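The Capture and Review functions of such an audit trail, together with the attributes listed above, can be sketched as follows. This is a hypothetical illustration; the class and field names are mine, not taken from the Schumacher patterns:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class SecurityEvent:
    """One audit-trail record: who did what, to which asset, when, where, how."""
    actor: str           # actor involved
    asset: str           # asset affected
    method: str          # method used
    place: str           # place of the event
    timestamp: datetime  # time and date of the event

class AuditTrail:
    """Capture and Review functions of a security accounting service."""
    def __init__(self) -> None:
        self._events: list[SecurityEvent] = []

    def capture(self, event: SecurityEvent) -> None:
        self._events.append(event)

    def review(self, asset: str) -> list[SecurityEvent]:
        """Trace all actions affecting one asset back to the responsible actors."""
        return [e for e in self._events if e.asset == asset]

trail = AuditTrail()
trail.capture(SecurityEvent("j.doe", "student_record_42", "UPDATE", "app-server-1",
                            datetime(2010, 4, 15, 9, 30, tzinfo=timezone.utc)))
print([e.actor for e in trail.review("student_record_42")])  # ['j.doe']
```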
Accuracy
Accuracy is about getting the data right, being as close to reality as possible. Among the measures
ensuring accuracy are data profiling and cleaning. This means that a certain level of accuracy can be
achieved at maturity level 2, Managed, be it in a reactive manner and at high cost in terms of time
and labor. At this level, flaws in accuracy will keep returning, jeopardizing reliability. At
maturity level 3, Defined, by implementing robust applications utilizing various types of input
checks, an acceptable level of accuracy will be achieved in a more lasting fashion.
Accuracy is measured by the accuracy error, a ratio ranging between 0 and 1, indicating the number
of characters, data elements or database tuples in error as a fraction of the total number of
characters, data elements or database tuples (Batini & Scannapieco, 1998).
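Accuracy, completeness, consistency and referential integrity are all measured as the same kind of defect fraction, so a single helper suffices to illustrate the attribute. A minimal sketch; the function name is mine:

```python
def error_ratio(defects: int, total: int) -> float:
    """Units in error as a fraction of all units, where a unit may be a character,
    data element or database tuple. The same form measures accuracy (units in
    error), completeness (units missing), consistency (units violating a semantic
    rule) and referential integrity (tuples violating a relation)."""
    if total == 0:
        return 0.0
    return defects / total

# 25 of 1000 student records fail an input check:
print(error_ratio(25, 1000))  # 0.025
```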
Completeness
Completeness is about getting all the data. To get all data elements in time, the business process needs
to be well organized and scheduled, with all sub-processes delivering detailed information right on
time. Therefore, maturity level 3, defined, is required to effectively organize for completeness. If
processes are not controlled efficiently, the process will either continue without the required data or
will come to a halt until the required data is delivered, and timeliness will be jeopardized.
Completeness cannot be ‘fixed’ with data profiling and cleaning techniques, since missing data will
only be available once processes responsible for this data have delivered their output.
Completeness is measured as a ratio ranging between 0 and 1, indicating the number of data elements
missing as a fraction of the total number of data elements (Lee, Pipino, Funk, & Wang, 2006).
Confidentiality
Confidentiality is about securing data from unauthorized access. To this end, a multitude of
security patterns exists, each of which may be invoked and combined into security services according
to the security levels required (Schumacher, Fernandez-Buglioni, Hybertson, Buschmann, &
Sommerland, 2006). While security services do well in binding specific security patterns to specific
situations, they do not help us position Confidentiality in a maturity model. However, Schumacher
et al. identify three basic access control types currently in use: the access matrix, the role-based
access control model (RBAC) and the multilevel model (Schumacher, Fernandez-Buglioni,
Hybertson, Buschmann, & Sommerland, 2006). The most basic type, the access matrix, provides
access to resources by identifying which active entity in a system may access which resources and how.
Role-based access simplifies access right management by grouping active entities into roles and
assigning generic rights to each role. This way, once a new participant enters the organization, only
the correct roles need to be added to his identification, instead of painstakingly assigning all
individual rights. In multilevel security, sensitivity is defined at data level, not at resource level;
users receive clearance, and access of users with specific clearance levels to data is based on policies.
These access types may well help position Confidentiality in the maturity model, since applying
these styles requires different insight in stakeholders and processes.
The simplest style, the access matrix, already requires a view on the stakeholders involved in the
business process and the individual resources required to perform their tasks. This requires processes
to be defined by the organization, and therefore Confidentiality can only be effectively guaranteed at
maturity level 3, Defined. On top of this, the more advanced modes of access management require
processes to be measured and controlled, and therefore both role-based access and multilevel security
can be deployed effectively once an organization has reached data quality maturity level 4,
Quantitatively Managed.
Security may be expressed in security services (Schumacher, Fernandez-Buglioni, Hybertson,
Buschmann, & Sommerland, 2006).
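The difference between the access matrix and role-based access can be sketched as a toy model; the names and data are illustrative, not an implementation of the Schumacher patterns:

```python
# Access matrix: rights are listed per (subject, resource) pair.
access_matrix = {
    ("alice", "grades_db"): {"read", "write"},
    ("bob", "grades_db"): {"read"},
}

def matrix_allowed(user: str, resource: str, action: str) -> bool:
    return action in access_matrix.get((user, resource), set())

# RBAC: generic rights are assigned to roles, and users to roles; a new
# participant only needs role assignments, not per-resource entries.
role_rights = {"lecturer": {"read", "write"}, "student": {"read"}}
user_roles = {"alice": {"lecturer"}, "bob": {"student"}}

def rbac_allowed(user: str, action: str) -> bool:
    return any(action in role_rights.get(role, set())
               for role in user_roles.get(user, set()))

print(matrix_allowed("bob", "grades_db", "write"))  # False
print(rbac_allowed("alice", "write"))               # True
```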
Consistency
Consistency is the degree in which values and formats of data elements are in line with semantic rules
over the set of data-items. The first observation is that de-duplication of data may improve
consistency, since data stored in multiple locations is likely to get corrupted. There is a relation
between consistency and uniqueness, in that increasing uniqueness will support consistency.
Therefore, consistency will benefit from a holistic view on information processing, in which attention
is paid to the dispersion of data within an organization. Such a holistic view is referred to as an
Enterprise architecture (Lankhorst, 2005) (Boterenbrood, Hoek, & Kurk, 2005) and this would
seemingly fit maturity levels 4 and 5. However, since consistency can also be achieved using data
profiling and cleaning techniques, it can already be attained at maturity level 2, Managed.
Consistency is measured as a ratio ranging between 0 and 1, indicating the number of data elements
violating a specific consistency type as a fraction of the total number of data elements (Lee, Pipino,
Funk, & Wang, 2006).
Currency
Currency describes how promptly data are updated and is a function of age (of the data), delivery
time and input time: Currency = age + delivery time - input time. Currency is the sum of how old the
data was when it was received plus a term measuring how long the data has been in the information
system (Batini & Scannapieco, 1998) (Lee, Pipino, Funk, & Wang, 2006). Currency is targeted by
straight-through processing, in which near real time service oriented technologies replace
cumbersome batch procedures (Pant & Juric, 2008). This is reflected in Master Data Management,
where development of a Service Oriented Architecture is firmly positioned at level 4 (Loshin, 2008).
Currency can be measured in days or milliseconds. However, to be able to reach any agreement on
currency, business processes need to be measured and controlled. Currency therefore, is indeed a data
quality dimension that can only effectively be implemented and discussed at maturity level 4:
quantitatively managed.
The measure for currency is Time.
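A minimal sketch of the currency formula; the function and parameter names are mine, and any consistent time unit works:

```python
def currency(age: float, delivery_time: float, input_time: float) -> float:
    """Currency = age + (delivery time - input time).

    'age' is how old the data already was when it entered the system;
    the second term is how long it then lingered inside the system."""
    return age + delivery_time - input_time

# Data was 1 day old at entry (t=0) and was delivered at t=2 (days):
print(currency(age=1, delivery_time=2, input_time=0))  # 3
```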
Data Integrity
Data integrity indicates the degree in which data is fit for use: data have integrity once they
are deemed fit for use. This data quality dimension therefore acts as a container dimension, covering
all other aspects of data quality.
Referential Integrity
Referential integrity refers to the degree in which related sets of data are consistent. It is therefore a
special instance of consistency. Referential integrity was introduced by Chen (1976) and, within a
single database, is easily enforced by implementing referential constraints. Referential integrity may
therefore well be achieved at maturity level 2, Managed, where data rules enforce referential integrity.

Referential integrity is measured as a ratio ranging between 0 and 1, indicating the number of
database tuples violating a specific relation type as a fraction of the total number of database tuples
(Lee, Pipino, Funk, & Wang, 2006).
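This attribute can be sketched as an orphaned foreign key check over two key sets; a simplified illustration with invented data:

```python
def referential_violation_ratio(child_keys: list, parent_keys: list) -> float:
    """Fraction of child tuples whose foreign key has no matching parent tuple."""
    if not child_keys:
        return 0.0
    parents = set(parent_keys)
    orphans = [k for k in child_keys if k not in parents]
    return len(orphans) / len(child_keys)

# Enrolments referencing student ids; student 99 does not exist:
print(referential_violation_ratio([1, 2, 2, 99], [1, 2, 3]))  # 0.25
```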
Reliability
Reliability is the degree to which data is perceived to represent reality. A synonym is trust
(McGilvray, 2008). According to the WDQM model, reliability is first achieved at level 3, Defined.
Reliability is binary: data is trusted, or it is not.
Specifications
Specifications is a measure of the existence, completeness, quality and documentation of data
standards (McGilvray, 2008). As such, specifications are required for processes like source rating,
schema matching and cleaning, business rule matching and new data acquisition. De Graaf mentions
Insight as an important dimension of data quality (see appendix 6.4):
‘Insight in data means that it is clear for an organization what data attributes are required or available,
where and why these data attributes are created, what sources were used, where these attributes are
used, who guards and tests the attribute, when these attributes are outdated and, once obsolete, how they
are dealt with’ (Interview de Graaf, appendix 6.4).
Insight can be seen as a result of valid specifications and is an important prerequisite for further
data quality improvement. Therefore we may expect Specifications to be present at level 2, Managed.
The Specifications attribute is binary: specifications are either present or absent. Incomplete, faulty or
outdated specifications fall in the absent category, since they do not contribute to reliable insight.
Timeliness
Timeliness, or Availability is a measure of the degree to which data are current and available for use
(Batini & Scannapieco, 1998) (Loshin, 2001) (McGilvray, 2008).
Timeliness is measured as a ratio indicating the availability of the data for use. It is expressed as a
function of Volatility and Currency: T = V * C. If Currency exceeds the volatility period (1 / Volatility),
Timeliness becomes larger than one, meaning the data is becoming less fit for use. Volatility is a fixed
parameter; therefore, to improve Timeliness, Currency needs to be reduced.
Since Currency is positioned at level 4, Quantitatively managed, an effective implementation of
Timeliness requires an organization to have reached level 4 as well.
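The proposed attribute can be sketched as follows, assuming (as proposed above) that Volatility is a frequency and Currency a time span:

```python
def timeliness(volatility: float, currency_value: float) -> float:
    """Timeliness = volatility * currency (the form proposed in this research).

    volatility: real-world change frequency (e.g. changes per day);
    currency_value: lag of the stored copy behind reality (e.g. days).
    A result below 1 means data reaches users before the next change."""
    return volatility * currency_value

# The value changes once every 4 days (volatility 0.25/day), currency is 2 days:
print(timeliness(0.25, 2))  # 0.5 -> still current on delivery
```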
Uniqueness
Uniqueness refers to requirements that entities are captured, represented, and referenced uniquely
(Loshin, 2008). In the definition given by Loshin (2008), uniqueness is bound to data in a database or
file system: “The dimension of uniqueness is characterized by stating that no entity exists more than
once within the data set” (Loshin, 2008). This implementation of uniqueness is available at level 2,
Managed, already, since data profiling tools and database constraints simply enforce this rule.
For uniqueness, no attribute has been published. Therefore, it is proposed to measure uniqueness as a
ratio ranging between 0 and 1, indicating the number of data elements being duplicated as a fraction
of the total number of data elements in a database or file.
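The proposed ratio can be sketched directly; the function name is mine:

```python
def duplication_ratio(values: list) -> float:
    """Duplicated elements as a fraction of all elements in a database or file."""
    if not values:
        return 0.0
    duplicates = len(values) - len(set(values))
    return duplicates / len(values)

# Student number 1001 occurs twice in a file of four records:
print(duplication_ratio([1001, 1002, 1003, 1001]))  # 0.25
```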
Volatility
Volatility characterizes the frequency with which data vary in time (Batini & Scannapieco, 1998). A
synonym is decay (McGilvray, 2008). Volatility is actually not so much a data quality dimension as a
dimension of data itself: data IS volatile. Therefore, volatility is present at maturity level 1,
Initial, be it that it is recognized by just a few specialists within the organization (Interview de Graaf,
appendix 6.4). At maturity level 2, Managed, volatility is recognized by business management as a
characteristic of data. At maturity level 3, Defined, systems are built with volatility in mind.
The measure for volatility is Frequency.
Level 5, Optimizing
Surprisingly, the literature on data quality defines no specific data quality attributes
operationalizing level 5, Optimizing. At this level, all data quality process areas have already been
mastered20. Therefore, an additional theory is required, extending the reach of data quality into the
field of continuous improvement. Six Sigma is such a theory. Six Sigma results in data quality being
constantly improved by implementing the DMAIC cycle, controlled by Key Goal Indicators and
measured by Key Performance Indicators. The whole data life cycle is observed, leading to 3.4 defects
per million opportunities, in accordance with Service Level Agreements (Boer, Andharia, Harteveld,
Ho, Musto, & Prickel, 2006). Thus, metrics at this level are process oriented, not strictly data quality
oriented. To find metrics for this level, Six Sigma leads the way.
According to Six Sigma, it is primarily the spread of errors (the unpredictability of a process) that
contributes to costs (Boer, Andharia, Harteveld, Ho, Musto, & Prickel, 2006, p. 36). Therefore, a
measure of quality at this level is a statistical one: the standard deviation (sigma, σ). For this, a mean
and a variance are set as goals. KPIs are defined by controls like External Critical to Quality (Ext
CTQ), Internal Critical to Quality (Int CTQ), Unit, Defects and Opportunities, and Population.

This level relates to the overall data quality dimension, Data Integrity. Data Integrity will approach
six sigma (6σ).
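The defects-per-million-opportunities metric behind the 3.4 DPMO target can be sketched as follows; the numbers are invented for illustration:

```python
def dpmo(defects: int, units: int, opportunities_per_unit: int) -> float:
    """Defects per million opportunities, the Six Sigma yield metric.
    At six sigma performance a process produces at most 3.4 DPMO."""
    return defects / (units * opportunities_per_unit) * 1_000_000

# 17 faulty field values across 10,000 records with 50 checked fields each:
print(dpmo(17, 10_000, 50))  # roughly 34 DPMO, an order of magnitude above 3.4
```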
5.2.3 WDQM Goals
The assignment of data quality dimensions and attributes to maturity levels results in the definition of
goals for the WDQM process areas. Now, for each level, data quality process areas, goals and metrics
are available. However, it proved impossible to base every decision on published and well-accepted
theories. In many cases, similarities in definitions provided all the information available, and
sometimes only rigor of reasoning could shed light on where to position a dimension. Therefore, to
increase validity, this model was discussed with an external expert (see appendix 6.4). Table 9 presents
the goals for each maturity level in the WDQM model.
20 It is to be noted however, that both TDQM (Lee, Pipino, Funk, & Wang, 2006) and TIQM (English, 2009) support the six sigma DMAIC-style quality improvement cycle
Level | Dimension | Data Quality Practice | Data Quality Attribute
1, Initial | Volatility | Data IS volatile; volatility is not yet recognized | Frequency
2, Managed | Accuracy | Data profiling and cleaning | Ratio between 0 and 1: characters, data elements or database tuples in error as a fraction of the total
2, Managed | Consistency | Data profiling and cleaning | Ratio between 0 and 1: data elements violating a specific consistency type as a fraction of the total
2, Managed | Integrity, Referential | Establish referential database constraints | Ratio between 0 and 1: database tuples violating a specific consistency type as a fraction of the total
2, Managed | Specifications | Specifications engineering | Binary: specifications are either present or absent
2, Managed | Uniqueness | Data profiling and establishment of database constraints | Ratio between 0 and 1: duplicated data elements as a fraction of the total in a database or file
2, Managed | Volatility | Volatility is recognized as a characteristic of data | Frequency
3, Defined | Accountability | Event history management | Binary: updates are accounted for, or they are not
3, Defined | Accuracy | Engineering of robust applications, utilizing various types of input checks | See Accuracy, level 2
3, Defined | Completeness | Business processes well organized and scheduled, with all sub-processes delivering detailed information right on time | Ratio between 0 and 1: data elements missing as a fraction of the total
3, Defined | Confidentiality | Basic patterns, i.e. access matrix authorization | Security Service Level
3, Defined | Reliability | Level 3, Defined, is to be achieved | Binary: data is trusted, or it is not
3, Defined | Volatility | Build systems with volatility in mind | Frequency
4, Quantitatively Managed | Accessibility | Optimize Timeliness (i.e. Currency) | Accessibility = 1 - (delivery time - input time) / (outdated time - input time)
4, Quantitatively Managed | Confidentiality | Advanced patterns, i.e. role-based access or multilevel security | Security Service Level
4, Quantitatively Managed | Consistency | Create an Enterprise Architecture | Ratio between 0 and 1: data elements violating a specific relation type as a fraction of the total
4, Quantitatively Managed | Currency | Design for straight-through processing; business processes need to be measured and controlled | Currency (time) = delivery time - input time + age
4, Quantitatively Managed | Timeliness | Volatility is a fixed parameter; to improve timeliness, currency needs to be reduced | Ratio as a function of volatility and currency: Timeliness = V * C
5, Optimizing | Integrity, Data | Instituting DMAIC, SLA, KGI, KPI, data life cycle management | External Critical to Quality (Ext CTQ), Internal Critical to Quality (Int CTQ), Unit, Defects and Opportunities, and Population; Data Integrity reaches six sigma
Table 09: WDQM Goals expressed in Data Quality Dimensions, Practices and Attributes
5.2.4 Time-related dimensions
Volatility, Currency, Timeliness and Accessibility describe the interaction of time and data. These
dimensions are firmly related; one dimension may actually determine the value of another.
Volatility describes at which frequency data changes in the real world, while Currency describes how
promptly data are updated in an information system. Currency is age + delivery time – input time,
meaning that data, before it is finally delivered, has been lingering both inside and outside the
information system for a certain period. Figure 11 shows two different values for Currency: A and B.
Timeliness describes the relation between Volatility and Currency: T = V * C. When Currency is
smaller than the volatility period (A), Timeliness is smaller than one (T < 1) and stakeholders have
access to the data before the next change occurs: when delivered, the data is current. If Currency is
larger than the volatility period (B), Timeliness becomes larger than one (T > 1) and the data is
changed in the real world before stakeholders have access to it: when delivered, the data is no longer
current. Whether this is a problem is determined by Accessibility. Accessibility deals with the fact that
data needs to be delivered before it becomes insignificant. It is a ratio: Accessibility = 1 - (delivery
time - input time) / (outdated time - input time), in which we may well recognize the relation between
Accessibility and Currency.
Figure 11 shows Accessibility for Currency value A. Note that Outdated time does not necessarily
have a relationship with Volatility nor Currency.
Figure 11: Related Dimensions
5.3 Business rules
This paragraph focuses on the following questions:
In higher education, what positive and negative correlations between maturity and data quality may be
found?
For this research, what is the relevant set of business rules?
How will this set of business rules evolve in time?
What data quality attributes are relevant for these business rules?
In this paragraph a view on business rules will be established first. Secondly, the business domain this
research focuses on will be defined and scoped. Based on design documents, relevant business rules
are identified. Finally, the business rules lead to the selection of relevant data quality dimensions, the
variables of which are populated using a workshop.
5.3.1 Business rules, a definition
In order to find and populate the right data quality attributes, the business rules that need to be met
have to be defined first. In literature, the view on what business rules are differs slightly. Business
rules are “a written definition of a business’s policies and practices” (Agrawal, Calo, Lee, Lobo, &
Verma, 2008) or “… requirements of the business that must be adhered to in order for the business to
function properly” (Johnson & Jones, 2008). They encompass “…the controls, processes, mechanisms,
and standard operating procedures (SOPs) that need to be followed” (Conway & Conway, 2008).
In the view of D. Agrawal et al., business rules are high-level descriptions guiding the behavior of an
organization. Described at this level, the business rules might prove not to be specific enough to
derive corresponding data quality attributes. The more specific notion, that of business rules being
requirements of the business (Johnson & Jones, 2008), encompassing controls, processes, mechanisms
and operating procedures (Conway & Conway, 2008), seems more fitting. At this level, they are
referred to as production rules by D. Agrawal et al. In this research, the operational notion of business
rules, as defined by Johnson & Jones (2008) and Conway & Conway (2008), will be used.
To be meaningful, the notation of a business rule must adhere to certain semantics:
“A business rule is a compact, atomic, well-formed, declarative statement about an aspect of a business
that can be expressed in terms that can be directly related to the business and its collaborator, using
simple, unambiguous language that is accessible to all interested parties: business owner, business
analyst, technical architect, customer, and so on. This simple language may include domain-specific
jargon” (Graham, 2007).
The interesting aspects of this definition of a business rule are that it is atomic (self-contained), well-
formed (written according to specific rules) and declarative (written in a statement style vocabulary).
A well-formed business rule is written in a when-then type of construct (Davis, 2009). Business rules
are about making decisions, and good decisions require valid information, also referred to as facts
(Davis, 2009).
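An atomic, declarative when-then rule in this sense maps directly onto code. The rule below is a hypothetical example for the grading domain, written for illustration only; it is not one of the rules mined in this research:

```python
from dataclasses import dataclass

@dataclass
class Grade:
    student_id: str
    course_code: str
    value: float

def check_grade_rule(grade: Grade) -> list[str]:
    """Hypothetical business rule: WHEN a grade is registered,
    THEN its value must lie between 1.0 and 10.0 and it must refer
    to a course. Returns violations; an empty list means the rule holds."""
    violations = []
    if not (1.0 <= grade.value <= 10.0):
        violations.append(f"grade {grade.value} outside 1.0-10.0")
    if not grade.course_code:
        violations.append("missing course code")
    return violations

# The facts (student, course, value) are the valid information the rule decides on.
assert check_grade_rule(Grade("s123", "INF101", 7.5)) == []
assert check_grade_rule(Grade("s123", "", 11.0)) == [
    "grade 11.0 outside 1.0-10.0", "missing course code"]
```

The rule is atomic (one statement about one aspect), well-formed (a fixed when-then shape) and declarative (it states what must hold, not how to repair a violation).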
5.3.2 Study management
A brief history
Study management is without a doubt the most important business domain within an educational
institution. In a response to the emergence of the European Higher Education Area (EHEA) (Vught &
Huisman, 2009), Windesheim developed a new view on study management (Broers, 2007). This new
view, identified as student centered education, would offer the student more freedom in selecting
education of his own choice (Broers, 2007). This, together with the adoption of the European Credit
Transfer System, spawned a redesign of the curricula of the various Schools at Windesheim (Broers,
2007). At the basis of this redesign of curricula, a new didactical process was designed and standards
were put in place, guiding the change process. These standards were described and accepted in the
Windesheim Onderwijs Standaards (WOS)21 (Iersel, Loo, Serail, & Smulders, 2009). In 2006, a domain
architecture was designed, guiding the development and implementation of new information
technology (Jansen, 2006). The domain architecture incorporated the field of education, i.e.
management of the education catalogue, the study process itself (minor selection, study process and
assessments), and management of grades (manage study progress), as shown in figure 12 (Jansen,
2006).
Figure 12 depicts the didactical process (Request Information, Apply, Select, Contract, Oriëntate, Study, Assess Student, Discuss Progress, Graduate, Engage Alumni), the control processes (Manage Education Catalogue, Develop Education, Organize Education, Manage Assessments, Manage Study Progress, Create Management Information, Planning & Control Cycle) and various supporting processes, such as scheduling, student data administration, learning process and study career counselling, library management and funding administration.
Figure 12: Domain architecture student centered education Windesheim (Jansen, 2006)
In 2007, the COTS22 application Educator was selected to support study management, and
implementation of this system is currently an ongoing process.
Looking ahead
In study management, potentially interesting experiences are becoming available. After investigating
the business rules, two case studies will be performed, collecting experiences from the Windesheim
School of Build, Environment and Transport and the Windesheim School of Business and Economics
respectively. Based on the importance of this domain to Windesheim, and the availability of
potentially interesting experiences in this field, this research will focus on the domain of study
definition, education, assessment and grading, supported by the information system Educator.
21 Windesheim Educational Standards
22 Commercial Off The Shelf
5.3.3 Business rule mining
For study management, the WOS (Iersel, Loo, Serail, & Smulders, 2009) identifies a set of business
rules in the form of high level descriptions, guiding the behavior of an organization (Agrawal, Calo,
Lee, Lobo, & Verma, 2008). These rules are presented in appendix 6.12.
However, the abstraction level of most of these rules is too high. In order to define a data quality
threshold, a translation to more specific requirements (Johnson & Jones, 2008) is needed. This
translation is offered by the domain architecture (Jansen, 2006), European rules on higher education
(European Commission, 2005) and Educator operating instruction notes. Information on scheduling is
provided by (Riet, 2009). Finally, to be useful for further analysis, the business rule notation needs to
adhere to the definitions of (Graham, 2007) and (Davis, 2009).
In order to identify the business rules more clearly, the business rules are arranged according to the
business processes identified in figure 12, Domain architecture student centered education
Windesheim (Jansen, 2006). These, more detailed business rules are documented in appendix 6.13.
Now that the relevant business rules in the domain of study management have been identified, the
current and required data quality maturity levels of the Educator domain at Windesheim may be defined.
5.4 Current data quality maturity level study management domain
In this paragraph, the following research question is answered:
What are the current organizational maturity and current values of data quality attributes?
Current data quality maturity at Windesheim can be established by finding data quality practices
currently invoked and trying to establish a view on the current values of the maturity dimensions.
However, it should be noted that it is easier to ascertain whether a practice is in place, which is
essentially something that is either done or not, than it is to determine the value of a data
quality dimension, which in most cases requires analysis tools to establish a measurement.
Therefore, discussing data quality dimensions was mainly used as a check on completeness,
improving research quality, making sure no issue has been overlooked. Nevertheless, the values of
data quality attributes, populating the data quality dimensions, are discussed at the end of this
paragraph.
In interviews with stakeholders, current data quality practices and dimension values have been
discussed. Stakeholders include representatives of operations, functional support and process design,
as well as teaching and management staff from within schools. In total, five members of staff have
been interviewed. Each interview collected information on experiences and solutions first, and
discussed current values of maturity dimensions later. These interviews are documented in the
appendices 6.5 through 6.10.
5.4.1 Interview results
It is found that stakeholders do not always agree on the topics presented. Some think rather positively
of accessibility, while others point out that some functions of Educator are seemingly unnecessarily
complex and over-engineered, hampering accessibility. Also, accountability and confidentiality are
regarded as fitting by some, while others reveal breaches in security functions, compromising
confidentiality. Interesting is the observation on the role-based access mechanism of Educator, which
is deemed far too complex by one and perfectly flexible by the other.
Issues interviewees agreed upon are that Educator has far too few reporting options enabling data to
be monitored, and that people entering data into Educator that creates havoc down the line should be
confronted with these problems and made to solve them themselves. However, some also noted that process
execution and process control require separate functions. An example is the support offices checking
milestones, study plans and course definitions.
It is found too that the School of Business and Economics has experienced the most problems, being
amongst the first to use Educator, while the School of Build, Environment and Transport, entering the
Educator arena later, has learned from these experiences and strengthened its processes first, before
deciding to implement Educator.
Surprisingly, even though some improvements on data validity input checks were mentioned, no
interviewee believed that input checks could prevent data errors altogether. This is supported by the
widespread desire to check data using reports (reactive data quality management), rather than to rely on
data being checked before it is stored (proactive data quality management). The School of BET has put
procedures in place guarding data quality prior to entering data into Educator. Yet, data may get
corrupted unnoticed as a result of software bugs or human error. In these cases, issues are corrected
once students complain.
It is commonly believed that the definition of courses is complex. The current product structure (OE
and VOE combined with Semester plan and Semester variant plan) is mentioned not to be used as
originally intended (educational process and course definitions are not in conformity). In at least one
case, verification and validation was implemented using manual processes outside Educator.
Timeliness and Completeness are noticed to be conflicting dimensions. In at least one case it was
mentioned that, in order to satisfy timeliness, completeness of data could be sacrificed. Important
milestones driving Timeliness are:
1. Validation and finalizing of course descriptions,
2. Validating and finalizing student activity plans,
3. Grading,
4. Valuating the outcome of the propaedeutics phase,
5. And, in the (near) future, printing diplomas and certificates.
5.4.2 Current Maturity
When we observe the data quality process areas of table 5, Windesheim Data Quality Maturity model,
we may observe that for 1. Structure, 2. Process, 3. Technology, 4. Information and 5. Staff some
process areas are available:
1. Level 2 process areas Project based development, Project teams and Ad Hoc problem solving
are present;
2. Level 2 process areas Data profiling and cleaning, Source rating, Schema matching and
cleaning, Business rule matching and New data acquisition are NOT present;
3. Level 2 process areas Data Analysis and Cleaning tools are NOT present. The File Transfer
data exchange pattern however IS present.
4. Awareness of the relevance of data quality is present, and indeed information is not trusted.
5. Level 2 process areas Analytical competent, Knowledge of technology, business rules and
data sources, Data modeling knowledge are present.
Additionally, we may recognize data quality process areas from higher levels being discussed as well:
6. Level 3 process area Technical Solution is being discussed.
7. Level 3 process area Data Responsible is being discussed and implemented.
Given the fact that the collection of level two process areas is only partially met, we may conclude
that currently, in the Educator domain, the data quality maturity of Windesheim remains at level
one (Initial).
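The reasoning above, that a maturity level is reached only when all process areas of that level (and of the levels below it) are present, can be sketched as follows. The process-area names are an abbreviated, illustrative subset of table 5, with presence flags reflecting the interview findings:

```python
# Level 2 process areas (illustrative subset of table 5).
# True = found present in the Educator domain, False = absent.
level2_areas = {
    "project based development": True,
    "project teams": True,
    "ad hoc problem solving": True,
    "data profiling and cleaning": False,
    "source rating": False,
    "schema matching and cleaning": False,
    "business rule matching": False,
}

def maturity_level(areas_per_level: dict[int, dict[str, bool]]) -> int:
    """Return the highest level whose process areas, and those of all
    lower levels, are all present; level 1 (Initial) is the floor."""
    level = 1
    for lvl in sorted(areas_per_level):
        if all(areas_per_level[lvl].values()):
            level = lvl
        else:
            break
    return level

print(maturity_level({2: level2_areas}))  # -> 1: level two only partially met
```

Because several level two areas are absent, the computation stops at level one, mirroring the conclusion drawn above.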
5.4.3 Current data quality dimension attribute values
In this paragraph, Table 9: WDQM Goals expressed in Data Quality Dimensions, Practices and
Attributes is used as a basis for triangulating the current data quality maturity in the study
management domain. This evaluation offers another view on current data quality maturity, validating
the observations in the previous paragraph.
It should be noted that, since none of the process areas Data profiling and cleaning, Source rating,
Schema matching & cleaning and Business rule matching are available, exact values could not be
assigned to the attributes of the data quality dimensions. However, in some cases the interviews
indicated dimensions to be ‘in control’, while other dimensions needed more attention.
WDQM Level five dimensions
It does not come as a surprise that Data Integrity falls far short of six sigma. Therefore, level five of
the WDQM has indeed not been met.
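Six sigma corresponds to roughly 3.4 defects per million opportunities (DPMO). The check below illustrates how such a measurement would work; the defect count is hypothetical, since the profiling tools needed for a real measurement are absent in the Educator domain:

```python
def dpmo(defects: int, units: int, opportunities_per_unit: int) -> float:
    """Defects per million opportunities."""
    return defects / (units * opportunities_per_unit) * 1_000_000

SIX_SIGMA_DPMO = 3.4  # conventional long-term six sigma threshold

# Hypothetical example: 500 defective field values among 230,000 grades,
# with 5 checked attributes (opportunities) per grade.
rate = dpmo(defects=500, units=230_000, opportunities_per_unit=5)
print(rate, rate <= SIX_SIGMA_DPMO)  # ~434.8 DPMO, far above the threshold
```

Even this modest hypothetical error rate exceeds the six sigma bar by two orders of magnitude, which is why the conclusion above is unsurprising.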
WDQM Level four dimensions
One dimension interviewees agreed upon was Volatility. It was commonly believed that volatility of
data in the Educator domain was low. Changes in data occurred once every few weeks, months and
even years. And even then, data in Educator does not actually change; in most cases new data is
added, extending the information already available. It was found that study information is altered
annually, or every half year in some cases. Grades are created quarterly, amounting to about 230,000
grades being registered at Windesheim each study period. Study plans are extended every six months.
That said, for the school of Build, Environment and Transport, current volatility of course definitions
was still deemed to be too high; it seems course information is adapted every few months, while this
type of data should be stable for at least three years.
A low-frequency Volatility should be good news for Currency. Currency, however, was reported to
have been troublesome, caused by instability of Educator and by the manual part of the process
consuming too much time. This last issue was solved by making the stakeholder entering the data
responsible for dealing with the consequences of long waiting times. As a result, Currency had been
improved.
Currency, Volatility and Timeliness are all related dimensions; therefore, with Volatility being
comfortably low and Currency improving, Timeliness of data may be expected to be in control.
However, at this moment Timeliness is still mentioned to be problematic. When new education is to
be developed, development has to start well in advance of the targeted study period in order to deliver
study information in time. For many, this aspect of educational process planning is perceived as
complex, and activities are commonly initiated too late. Accessibility was little understood during
the interviews, yet when Timeliness and Currency are not in control, Accessibility is not in control
either.
An exception to this rule may be found at the school of Build, Environment and Transport. Here, the
business process served by Educator is strictly managed manually, having data being entered into the
information system only after elaborate checks. The study process itself is highly standardized,
resulting in more clarity for stakeholders involved. As a result, the school of Build, Environment and
Transport reports Accessibility to be in control.
The last dimensions populating level four are Consistency and Confidentiality. As a result of the
system design, offering a comprehensive role-based access mechanism, Confidentiality was perceived
to be fitting. One interviewee, however, noted that Educator offered some back-door entries, suggesting
possible breaches in Confidentiality. Therefore, even while Educator offers level four compliant
authorization mechanisms, Confidentiality is in doubt. On Consistency, it seems the situation has grown
from bad to better. Educator is said to generate data codes automatically, replacing more and more
manual data code definitions, thus improving Consistency. At the school of Build, Consistency was
improved by rigid process design. Even so, it has been mentioned that course definitions are not
consistently described throughout the system; therefore Consistency too is not met at WDQM level four.
Currently, based on data quality dimension values, data quality has not reached WDQM level four.
WDQM level three dimensions
Accountability is believed to be adequate, be it that in one situation it is found that an audit trail may
be omitted. Since this situation is considered a manual correction of erroneous datasets, and is
recognized to be in decline, Accountability may be regarded fit for current business rules.
At level three, Accuracy is guarded using application input checks. Currently, this is not the case in
most instances. Again, the school of Build, Environment and Transport is less pessimistic, using strict
data input procedures. Yet, even here it is recognized that there still is room for improvement.
Completeness is regarded to be in control by many. However, the current deadlines in Educator’s
process implementation (Timeliness) are said to have a negative impact on Completeness.
Confidentiality at this level is implemented by access matrices. Even though Educator offers role-
based access, reported back-door threats may render confidentiality inadequate.
Reliability is reported as being absent. Many inexplicable data quality issues were mentioned,
reducing reliability. It was mentioned that using Educator only once in a while, combined with
inadequate training and documentation, may well be at the source of these doubts. Often teachers make mistakes,
blaming the system. The absence of basic reporting facilities was mentioned as another cause of the
lack of reliability. The school of Build reports relying on its process design.
Volatility. It is not feasible to assess whether Educator was built with volatility in mind.
With the exception of Completeness, WDQM level three dimensions have not been met.
WDQM level two dimensions
At level two, Accuracy, Consistency, Referential Integrity and Uniqueness are instated using data
profiling and cleaning tools and database referential integrity constraints. The absence of these
process areas does not bode well for these dimensions. Accuracy is indeed reported to have been a
problem in the past; however, by making the stakeholder entering the data responsible for any
problems caused further down the process, Accuracy is said to have improved greatly. And
Uniqueness too has been reported to be in control. Consistency has been reported to be greatly
improved by replacing manual activities by automated procedures. Therefore, new data being entered
into Educator may well be more accurate, consistent and unique. However, historical errors are said to
still create havoc in data exchange processes. And Referential Integrity is found to be a problem,
partly because the Course Catalogue structure is perceived to be complex. Therefore, until current
faults in the database have been corrected, these dimensions are still not met.
On a scale of 1 to 10, where 1 equals non-existent and 10 equals excellent, Specifications scores a 1.5,
or 2 at most. It is safe to say that this dimension is not met.
Whether volatility is recognized as a characteristic of data is unknown.
The WDQM level two dimensions have not been met completely, and therefore this level has not
been reached.
5.4.4 Conclusion
Since no data quality maturity level was found having all related dimensions properly instated, the
evaluation of data quality dimension attribute values verifies that the current data quality values in the
study management domain are at WDQM level one (Initial). We may recognize some improvement at
the school of Build, Environment and Transport, due to a rather strict definition of the Educator
business process. It must be noted that improvements here came at the cost of creating an entirely new,
manually managed, information system and management process shielding Educator from calamity.
Table 10 offers an overview.
Data Quality Dimension | Past        | Current     | Level | Level met?
Data Integrity         | > Six Sigma | > Six Sigma | 5     | No
Accessibility          | Problematic | Improved    | 4     | No
Confidentiality        | In doubt    | In doubt    | 4     | No
Consistency            | Bad         | Better      | 4     | Yes
Currency               | Low         | Improved    | 4     | No
Timeliness             | Problematic | Improved    | 4     | No
Accountability         | Adequate    | Adequate    | 3     | Yes
Accuracy               | Problematic | Improved    | 3     | No
Completeness           | In Control  | In Control  | 3     | Yes
Confidentiality        | In doubt    | In doubt    | 3     | No
Reliability            | Absent      | Absent      | 3     | No
Accuracy               | Problematic | Improved    | 2     | Yes
Consistency            | Bad         | Better      | 2     | Yes
Referential Integrity  | Problematic | Problematic | 2     | No
Specifications         | Absent      | Absent      | 2     | No
Uniqueness             | Unknown     | Adequate    | 2     | Yes
Volatility             | Low         | Low         |       |

Table 10: Current data quality dimension values
5.5 Required data quality maturity level study management domain
In this paragraph, the following main research question and sub-questions are answered:
What values of data quality attributes will define the required data quality threshold and therefore the
required maturity structures at Windesheim?
a. To support the business rules identified earlier, what values should data quality attributes have?
b. What level of maturity is required to enable those data quality attribute values?
c. What organizational structure, process, technology, information and staff criteria define the
maturity found?
The required data quality maturity level will be identified by analyzing the outcome of the data quality
workshop (see appendix 6.11) and confronting this outcome with the initial research problem (see par
2.4):
At Windesheim, what defines the border between the control and integration stage? What are positive
and negative correlations between structures defining organizational maturity and attributes defining
data quality, enabling Windesheim to become a near zero-latency organization?
5.5.1 Workshop results
To assess the data quality required, a workshop was organized, enabling stakeholders from various
departments to translate their knowledge of the Educator domain and business rules into
requirements (see appendix 6.11).
In this workshop, specialists were requested to assign data quality dimensions to one of the four
phases of the study management process, based on the requirements posed by the business rules
involved. To create a functional selection process, the data quality dimensions were valued according
to their position in the WDQM (see table 9), and the workshop participants were supplied with a
limited amount of ‘credits’. The underpinning WDQM model however, was not revealed. For most
dimensions, participants had the opportunity to choose between a ‘must have’ implementation, paying
the full price tag for this dimension, or they had the opportunity to choose for a ‘should have’
implementation, paying less but gaining a less satisfying situation. The results of this workshop are
summarized in table 11. Based on table 10, he last column reveals wither a dimension has already
been met at the maturity level specified.
Data Quality Dimension | Sub-process requirements (level, priority)                        | Overall       | Dim. met?
Accountability         | 3                                                                 | 3             | Yes
Accuracy               | 3 Should have; 3 Should have; 2 Should have; 3 Must have          | 3 Must have   | No
Completeness           | 3 Must have; 3 Should have; 3 Should have; 3 Must have            | 3 Must have   | Yes
Confidentiality        | 3 Must have                                                       | 3 Must have   | No
Consistency            | 4 Should have                                                     | 4 Should have | Yes
Currency               | 4 Should have; 4 Must have; 4 Must have                           | 4 Must have   | No
Referential Integrity  | 2 Should have                                                     | 2 Should have | No
Reliability            | 3; 3; 3                                                           | 3             | No
Specifications         | 2 Should have                                                     | 2 Should have | No
Timeliness             | 4 Must have; 4 Should have                                        | 4 Must have   | No

Sub-process requirements are listed in process order (Manage catalogue, Create study plan, Study, Manage progress); where fewer than four entries appear, the remaining sub-processes posed no requirement in the original table.

Table 11: data quality dimension assessment workshop results
Table 11 reveals that the study management process is divided into four sub processes:
1. Manage catalogue, resulting in courses being published;
2. Create study plan, resulting in an updated personal activity plan;
3. Study, resulting in grades being assigned;
4. Manage progress, resulting in students receiving certificates, or study rejection letters.
The final column represents the overall score. If, in any sub-process, a dimension is labeled
mandatory, this dimension becomes mandatory for the whole domain. The reason for this is that the
process is one seamless cycle, and in each step of this cycle all organizational units play an equal role.
It is simply not possible to assign one step to a single unit that could be more mature than the
others. For some dimensions, participants could choose between implementations at different levels
of the WDQM. This is the case for Accuracy, which can be implemented both at level 2 and level 3. In
that case, the highest level required prevails.
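The two aggregation rules just described (a ‘Must have’ in any sub-process makes the dimension mandatory overall, and the highest required WDQM level prevails) can be sketched as a small function. The function name and data shape are illustrative:

```python
def overall_requirement(cells: list[tuple[int, str]]) -> tuple[int, str]:
    """cells: one (level, 'Must have' | 'Should have') pair per sub-process.
    Returns the overall (level, requirement) for the whole domain."""
    level = max(lvl for lvl, _ in cells)  # highest required level prevails
    req = ("Must have"
           if any(r == "Must have" for _, r in cells)
           else "Should have")            # any mandatory cell makes it mandatory
    return level, req

# Accuracy across the four sub-processes, as recorded in table 11:
accuracy = [(3, "Should have"), (3, "Should have"),
            (2, "Should have"), (3, "Must have")]
print(overall_requirement(accuracy))  # -> (3, 'Must have')
```

The computed overall value matches the Overall column of table 11 for Accuracy.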
5.5.2 Discussion
It is interesting to see that, even though these are ‘expensive’ dimensions, the workshop reveals a
massive interest in WDQM level 3 data quality dimensions. All level 3 dimensions (Accountability,
Accuracy, Completeness, Confidentiality and Reliability) are labeled ‘must have’ requirements.
In the current situation, timing poses many problems. It is therefore no surprise that Currency and
Timeliness are mentioned as ‘must have’ dimensions. The high demand for data being timely and
current implies that data should be delivered before it gets updated (Timeliness < 1, see paragraph
5.2.4). Currency describes how promptly data are updated and is a function of age (of the data),
delivery time and input time: Currency = age + delivery time - input time. Timeliness is measured as
a ratio, indicating the availability for use of the data. It is expressed as a function of Volatility and
Currency: Timeliness = Currency / Volatility. Since Volatility is constant, Timeliness is improved by reducing Currency. And
Currency is reduced by minimizing the age of data, or the gap between input time and delivery time.
Therefore, the result of the workshop can be interpreted as a demand to reduce waiting times (age and
(delivery time - input time)). In the interviews, multiple references are made to data being entered
into the system well beyond all deadlines. This is not so much a technical issue; the interviews reveal
that it is related to the age of data before it is entered into the system. Therefore, actions here
should aim at reducing waiting times in the manual part of the study management process.
Having Consistency defined at level four as a ‘should have’ is a bit surprising. It seems that the Dutch
awareness of costs has played a role here: buying a high-level dimension at a fraction of the price.
Table 10, paragraph 5.4.4, identified Consistency to be available already.
However, the workshop leaves no room for misinterpretation. If Educator is to succeed in fully
supporting the study management process, the organization needs to reach WDQM level three
(Defined), and for some time-related aspects, WDQM level four (Quantitatively Managed).
5.5.3 Initial Research Problem
This research was started as a result of Windesheim experiencing surprising problems while
implementing near real-time integration solutions in its quest to become a near zero-latency
organization. Is an organization operating at level three of the WDQM sufficiently equipped to
address this initial problem? Or is the initial research problem solved with a less far-reaching
solution: does a simpler solution fit? Or is a WDQM level three implementation still not mature
enough: does real-time integration call for an even more robust solution?
In paragraph 2.2.1, data quality errors were identified:
Enrolment of students results in duplicate accounts;
Painful mistakes like sending notifications to deceased students;
Due to database corruption, management reports are rendered useless;
Sometimes fields contain text-strings stating that ‘Debbie has to solve this problem’;
Names of students are completely missing, student addresses are incorrect, information is entered
in invalid fields;
Location (room) numbers are missing or contain special, unexpected codes;
Data is outdated or is valid in / refers to different time periods between information systems;
It was found that, at least in one instance, lack of data quality caused a class to be scheduled in a
staircase.
These errors are mainly faults in Accuracy and Completeness. To solve these issues, Accuracy and
Completeness have to be addressed. Completeness is addressed at data quality maturity level three
(Defined). Accuracy is available at level two (Managed) already, albeit in a rather reactive manner,
repairing errors once they appear in the database. This is too late, since by then these errors have
propagated through the automated interfaces, causing havoc in other applications. This means that a
data quality maturity level three (Defined) implementation of Accuracy is required. As table 10,
paragraph 5.4.4, identifies, this is currently not the case.
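The proactive, level three style of guarding Accuracy argued for here (input checks that reject faulty records before they are stored and propagate through interfaces) can be sketched as follows. The field names and checks are hypothetical illustrations, not Educator's actual validation rules:

```python
import re

def validate_student_record(record: dict) -> list[str]:
    """Proactive input check: reject a record before it is stored, instead
    of repairing it after it has propagated to other systems.
    Returns a list of violations; an empty list means the record may be stored."""
    errors = []
    if not record.get("name", "").strip():
        errors.append("name is missing")
    # Hypothetical Dutch postal code shape: four digits plus two letters.
    if not re.fullmatch(r"[0-9]{4}\s?[A-Z]{2}", record.get("postal_code", "")):
        errors.append("postal code is invalid")
    # Location fields must hold codes, not parked free-text problems
    # ('Debbie has to solve this problem').
    if "solve this" in record.get("room", "").lower():
        errors.append("room field contains free text instead of a location code")
    return errors

assert validate_student_record(
    {"name": "J. Jansen", "postal_code": "8017 CA", "room": "C1.08"}) == []
assert validate_student_record(
    {"name": "", "postal_code": "80 17",
     "room": "Debbie has to solve this problem"}) != []
```

Checks of this kind stop the duplicate-account, missing-name and free-text errors listed in paragraph 2.2.1 at the point of entry, before any automated interface can spread them.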
The Master Data Management model too positions the definition of services for data integration at
level three, yet requires organizations to reach level four for implementing a Service Oriented
Architecture (Loshin, 2008).
Addressing the initial research problem indeed calls for Windesheim to organize at WDQM level three, while further growth to level four is required if a fully-fledged Service Oriented Architecture is to be developed.
5.5.4 A data quality maturity level three (Defined) organization
An organization acting at data quality maturity level three (Defined):
- Has a business-wide process view instead of a localized departmental view;
- Conducts effective programme management;
- Develops systems based on formal requirements;
- Has an integrated view on its products and on (the quality of) its product development cycle;
- Has an integrated view on its corporate data;
- Has learned to identify and address root causes when problems emerge;
- Has data quality pro-actively guarded by technical barriers (input checks);
- Implements new functionality only after rigorous testing and acceptance in separate environments;
- Connects systems using available application interfaces;
- Is provided with data which is fit for use;
- Monitors data quality using a canonical data model;
- Is serviced by staff having deep domain knowledge and being responsible for data quality.
5.5.5 Level 4 (quantitatively managed) requirements
To satisfy both the research goal and the study management process, Windesheim does not have to fully implement data quality maturity level four (quantitatively managed). However, Currency and Timeliness were mentioned as required data quality dimensions. In paragraph 5.5.2, age is discussed as a factor influencing Timeliness and Currency. Addressing age causes an organization to enter the realm of data quality maturity level four (quantitatively managed). There may be more compelling reasons to implement data quality at this level.
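The influence of age on Currency and Timeliness can be sketched in a few lines. This is a minimal illustration only; the `last_updated` timestamp per record and the agreed volatility window are assumptions for this sketch, not part of Educator:

```python
from datetime import datetime, timedelta

# Illustrative volatility window: how long a course record is assumed to
# stay current before it must be refreshed (an assumption for this sketch).
VOLATILITY = timedelta(days=30)

def currency(last_updated: datetime, now: datetime) -> float:
    """Return remaining currency as a fraction of the volatility window.

    1.0 means just updated; 0.0 or less means the record has aged
    beyond the window and should be treated as stale.
    """
    age = now - last_updated
    return 1.0 - age / VOLATILITY

now = datetime(2011, 6, 1)
fresh = currency(datetime(2011, 5, 29), now)   # 3 days old: still current
stale = currency(datetime(2011, 3, 1), now)    # 92 days old: stale
assert fresh > 0.0 and stale < 0.0
```

Monitoring such an age-based measure per record, rather than repairing values after complaints, is what moves the dimension from a reactive to a quantitatively managed concern.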
When interviewed, de Graaf (see appendix 6.4) made a strong case for organizations to try and reach
data quality maturity level four:
“Especially beyond level three, data quality becomes a matter of special interest to organizations,
opening up a whole new realm of possibilities. What we can see beyond level three in practice today are
cloud computing for data quality initiatives, new business generated and successful one-on-one business
models based on reliable data” (Interview de Graaf, appendix 6.4)
Data simply becomes more valuable once an organization manages to reach data quality maturity
level four (quantitatively managed).
5.6 Growing from current to required maturity
Now that data quality, organizational maturity and the relation between the two are understood, an instrument based on this relation has been developed, and current and required maturity levels have been identified. In this paragraph, the final research question can be addressed:
What is the gap between current maturity structures & data quality threshold and required
maturity structures & data quality threshold in the light of enabling Windesheim to become a
near zero latency organization?
a. What is the gap between the current and required organizational structure, process,
technology, information and staff criteria?
b. What conclusions and recommendations may be derived from this gap?
5.6.1 Gap analysis
Paragraph 5.4, Current data quality maturity level study management domain, determined the current data quality maturity level to be one (Initial), while paragraph 5.5, Required data quality maturity level study management domain, identified the required data quality maturity level to be three (Defined). This level of data quality maturity is required both to operate the Educator business domain and to deploy near real-time system-integrating technologies successfully.
In the next paragraphs, the process areas in the field of structure, process, technology, information and
staff identified to be missing are discussed. This discussion is based on table 5, paragraph 5.1.4.
Structure
At level two, an organization is found to have implemented a structured, project-based development approach. In this research, Windesheim’s project management capabilities have not been evaluated. It is therefore unknown to what extent Windesheim has mastered project-based development. Properly assessing an organization’s project management capabilities requires a separate research project, for which resources were unavailable in this project.
At data quality maturity level three, projects are to be managed in relation to each other, as a programme: a portfolio of projects serving a common goal. In this research, Windesheim’s programme management capabilities have not been evaluated. It is therefore unknown to what extent Windesheim has mastered programme-based change. Again, properly assessing an organization’s programme management capabilities requires a separate research project, for which resources were unavailable in this project.
Process
At data quality maturity level two (managed), data quality is repaired reactively by implementing data profiling and cleaning activities, source rating, schema matching and cleaning, business rule matching and new data acquisition, resulting in improved accuracy, consistency, referential integrity and up-to-date specifications. Referential Integrity and Specifications in particular were mentioned as problematic and absent respectively; therefore, these process areas require attention.
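A data profiling pass of the kind listed here can be sketched as follows. The field names, records and rules are hypothetical illustrations, not Educator’s actual schema:

```python
# Minimal reactive profiling sketch: detect completeness and referential
# integrity violations in already-stored records. Field names and rules
# are hypothetical illustrations, not the actual Educator schema.
students = [
    {"id": 1, "name": "A. Jansen", "course_id": "C10"},
    {"id": 2, "name": "", "course_id": "C10"},             # missing name
    {"id": 3, "name": "B. de Vries", "course_id": "C99"},  # unknown course
]
courses = {"C10", "C20"}

def profile(records, known_courses):
    """Scan stored records and report data quality violations found."""
    issues = []
    for r in records:
        if not r["name"].strip():
            issues.append((r["id"], "completeness: name missing"))
        if r["course_id"] not in known_courses:
            issues.append((r["id"], "referential integrity: unknown course"))
    return issues

found = profile(students, courses)
assert found == [(2, "completeness: name missing"),
                 (3, "referential integrity: unknown course")]
```

Note that the scan runs after the fact, on data already in the database; this is precisely the reactive posture that level three replaces with checks at input time.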
Requirements Development, Product Integration, Verification and Validation are all level three process areas aimed at constructing a product from different components and assuring that the product complies with its requirements (Ahern, Clouse, & Turner, 2008). In at least one case, verification and validation was implemented using manual processes outside Educator.
Currently, the Educator process is perceived to be complex. Some structures are not used as originally intended (the current educational process and the original requirements are not in conformity). This may point towards a change in business rules. The required data quality threshold is related to business rule demands; therefore, when business rules change, data quality dimensions may well change with them. Even though workshop attendees formally agreed upon the business rules presented, during interviews the digital course catalogue was identified as an area where, in practice, these business rules may well have been redesigned.
Timeliness and Completeness are found to be in conflict. Timeliness is expressed as a requirement; however, Educator requires the description of a course to be complete before it is accepted, and course information is not always complete that early in the process. In many cases, information on assessments is finalized later, seemingly conflicting with timeliness. However, scheduling and selection of courses might not have a strong relation with the final modes of assessing students’ capabilities, perhaps leaving room for entering assessment information later.
Technology
To monitor data quality, data quality analysis tools are required. These tools are not available. During interviews, the absence of insight into data quality was mentioned as one of the main obstacles to improvement.
Whether the current ROTAP environments are sufficient has not been a subject of this research. Properly assessing an organization’s research, development, test, acceptance and production environments and strategy requires a separate research project, for which resources were unavailable in this project.
Information
Currently, information is known not to be trusted. This is unlikely to change until WDQM level three (defined) is reached. The main Key Performance Indicator of a data quality improvement programme may be that, in the end, Educator data has become fit for use and is therefore trusted.
Staff
At data quality maturity level two, staff is analytically competent, has knowledge of technology, business rules and data sources, and has data modeling knowledge. There have been no indications that these competences are missing; it is therefore assumed that these competences are currently present at Windesheim.
At level three, staff is responsible for data quality, extending their view from a single process step to the end-to-end business process. During interviews, it was mentioned that the process in the Educator domain is perceived as complex and difficult to oversee. Activities, like entering course definitions, have to be planned well ahead of execution, enabling both scheduling and students’ choice of minors. Teachers were unaware of deadlines or unable to finalize education this early in the process.
This discussion may signal a problem. When crossing the barrier between having a local view on matters and having a more holistic, business process wide view, the technological discontinuity presents itself (paragraph 2.3.4) (Zee, 2001). This discontinuity is experienced as a setback. New structures replace trusted old ones and are, for the time being, (perceived as) not as good as the ones being replaced. A discussion on losing the perceived freedom of changing educational definitions up to the last moment may well be one of these new-versus-old structure discussions. The fact that these new structures are required for coping with future challenges may not be recognized by all. Strictly speaking, the effects of the technological discontinuity may not be part of the WDQM model, yet when not taken into account, they may well prevent an organization from reaching data quality maturity level three (defined).
5.6.2 Migration
In the field of organizational maturity, as in real life, no organization can skip levels. This means that in the Educator domain, Windesheim has to master level two and level three data quality maturity process areas successively, as defined by table 5, paragraph 5.1.4.
The recommendations presented here aim to enable Windesheim to bridge the gap between the current and the required situation, building on best practices identified, strengthening and accelerating the change process.
The transition is defined as a two-staged process. First, data quality maturity level two (Managed) is
to be implemented. Once this level is established, the second step can be initiated, moving
Windesheim from data quality maturity level two (Managed) to data quality maturity level three
(Defined).
Data Quality Level two (Managed)
Level two is characterized by project-based structures, reactive data cleansing processes and technologies, and staff having local knowledge of business rules, data sources and data modeling.
Structure and Process
It is recommended to evaluate and (re)confirm Windesheim’s project management capabilities and
strategies.
It is recommended to initiate an Educator data quality improvement project. In this project:
- Extending appendix 6.13, business rules are re-established and described in great detail, identifying areas in which business rules have changed over time. An area of concern is the digital course catalogue;
- Using these detailed business rule descriptions, the Educator database is profiled;
- Data sources are rated and new data may be acquired;
- The database is cleaned, i.e. data not matching established business rules is repaired;
- Up-to-date data specifications are written.
These actions will establish Referential Integrity and Specifications, and will improve the data quality maturity level two dimensions Accuracy, Consistency and Uniqueness.
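The cleaning step in such a project (repairing data that does not match established business rules) might look like this minimal sketch; the room-number rule and the repair action are hypothetical examples, not actual Windesheim business rules:

```python
import re

# Hypothetical business rule for this sketch: a room number is one
# capital letter followed by three digits, e.g. 'A101'.
ROOM_PATTERN = re.compile(r"^[A-Z]\d{3}$")

def clean(records):
    """Repair records where possible; queue the rest for manual review."""
    repaired, review_queue = [], []
    for r in records:
        room = r.get("room", "").strip().upper()
        if ROOM_PATTERN.match(room):
            repaired.append({**r, "room": room})   # normalize casing/spacing
        else:
            review_queue.append(r)                 # cannot repair safely
    return repaired, review_queue

ok, queue = clean([{"id": 1, "room": "a101"}, {"id": 2, "room": "???"}])
assert ok == [{"id": 1, "room": "A101"}]
assert [r["id"] for r in queue] == [2]
```

The split between automatic repair and a manual review queue matters: values that cannot be repaired mechanically (‘Debbie has to solve this problem’) should reach a person, not silently disappear.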
Technology
It is recommended to have data quality analysis reports developed, based on the business rules established in the previous step. This will improve Referential Integrity, Specifications, Accuracy, Consistency and Uniqueness.
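Such a report can start out as no more than per-rule violation counts; a minimal sketch, in which the rule names are hypothetical examples:

```python
from collections import Counter

# Sketch: a data quality report that aggregates rule violations found by
# profiling into per-rule counts. Rule names are hypothetical examples.
violations = [
    ("R1: grade between 1 and 10", "student 7"),
    ("R1: grade between 1 and 10", "student 12"),
    ("R2: course has a schedule", "course C10"),
]

def quality_report(violations):
    """Count violations per business rule, most frequent first."""
    counts = Counter(rule for rule, _ in violations)
    return {rule: n for rule, n in counts.most_common()}

report = quality_report(violations)
assert report == {"R1: grade between 1 and 10": 2,
                  "R2: course has a schedule": 1}
```

Tracked over time, these counts give operations the insight into data quality whose absence was mentioned as a main obstacle during the interviews.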
Information
At this level, information is likely to remain Not Trusted.
The WDQM level two (Managed) data quality dimensions Accuracy, Consistency, Referential Integrity, Specifications and Uniqueness should all be satisfied now, albeit possibly still in a reactive manner, using reports and data quality cleaning tools for repairs.
Data Quality Level three (Defined)
Once level two has been established, a formal transition to level three can be initiated. In this
transition, the focus shifts from a local view to a more holistic, process wide view.
Structure
It is recommended to evaluate and (re)confirm Windesheim’s programme management capabilities
and strategies.
It is recommended to define a lasting Educator enrollment programme, aimed at supporting Educator at Windesheim. It may be noted that programme management is monitored by Key Business Indicators (Ahern, Clouse, & Turner, 2008). This programme is to be guided by valid Windesheim Key Business Indicators, enabling Windesheim management to stay in control of the ongoing programme. Key Business Indicators may be found by observing the study management baselines, i.e. Catalogue Management, Study Planning, Study & Grading and Progress Management. Key Business Indicators may express the number of data errors in the catalogue, study plans, grade assignments, rejection letters and certificates.
Furthermore, the programme includes all recommendations mentioned below.
Process
It is recommended that, where experience gave rise to new insights and changed business rules, new Educator requirements are developed and the functionality of Educator is changed accordingly. An area of concern is the digital course catalogue and the way data is shielded from unauthorized access: during interviews, the ability to access data via back-door entries was mentioned. This action will establish maturity level three Confidentiality.
It is recommended that, when data quality related issues arise, a formal root cause analysis is initiated.
This root cause analysis will identify the source of the data quality issues at hand.
It is recommended to implement formal requirements development. Based on the root cause
identified, new requirements will be developed and prioritized. These requirements present a
foundation for system adaptations and further development.
It is recommended to improve Educator’s support for verification and validation. Examples are input checks and referential integrity checks, as well as improved process support reminding lecturers of upcoming timeframes and baselines. The verification and validation is to be based upon the business rules described earlier. These actions will establish maturity level three Accuracy and Reliability.
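Proactive verification of this kind rejects invalid values before they are stored, rather than repairing them afterwards. A minimal sketch, in which the specific checks (known courses, grade range) are hypothetical examples:

```python
# Sketch of proactive input validation: a record is checked against input
# and referential integrity rules before it is stored, so errors never
# enter the database. The checks shown are hypothetical examples.
KNOWN_COURSES = {"C10", "C20"}

def validate_grade_entry(entry):
    """Return a list of rule violations; an empty list means accept."""
    errors = []
    if entry.get("course_id") not in KNOWN_COURSES:
        errors.append("unknown course")              # referential integrity
    grade = entry.get("grade")
    if not isinstance(grade, (int, float)) or not 1 <= grade <= 10:
        errors.append("grade out of range")          # input check
    return errors

assert validate_grade_entry({"course_id": "C10", "grade": 7.5}) == []
assert validate_grade_entry({"course_id": "C99", "grade": 11}) == \
    ["unknown course", "grade out of range"]
```

The same rule definitions that drive the level two profiling reports can drive these checks, which is why describing the business rules in detail first pays off twice.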
Technology
It is recommended to have Educator accept changes in course information up to the moment grades are actually assigned, thus reducing the conflict between Completeness and Timeliness of data. Data objects that may not be changed after a course has been selected by students are to be made mandatory during course definition; other data objects may well be made optional.
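This recommendation amounts to letting the set of mandatory fields depend on the stage of the course life cycle. A minimal sketch, in which the field and stage names are assumptions for illustration:

```python
# Sketch: mandatory fields depend on the course life-cycle stage, so
# fields frozen after student selection must be complete at definition
# time, while other fields (e.g. the assessment form) may be completed
# later. Field and stage names are illustrative assumptions.
MANDATORY = {
    "definition": {"title", "credits", "period"},          # frozen later
    "grading":    {"title", "credits", "period", "assessment_form"},
}

def missing_fields(course, stage):
    """Return the mandatory fields not yet filled in for this stage."""
    provided = {k for k, v in course.items() if v}
    return sorted(MANDATORY[stage] - provided)

draft = {"title": "Data Quality", "credits": 5, "period": "S1"}
assert missing_fields(draft, "definition") == []
assert missing_fields(draft, "grading") == ["assessment_form"]
```

A course like `draft` can thus be accepted at definition time and still be forced to completeness before grades are assigned, relaxing the Completeness/Timeliness conflict.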
It is recommended to have the current ROTAP environments and practices evaluated and formalized.
It is recommended to continue data integration using near real-time system interfaces. Starting a full Service Oriented Architecture, however, would require further growth in data quality maturity. These actions will support the implementation of maturity level four Currency and Timeliness.
Information
At level three, information should have become fit for use. The WDQM level three (defined) data
quality dimensions Accountability, Accuracy, Completeness, Confidentiality and Reliability should
be satisfied. Presence of Reliability in particular signals the success of the data quality programme.
It is recommended to develop a canonical data model, supporting a corporate-wide view on the data exchanged between systems. This action will improve Uniqueness, Referential Integrity and Specifications.
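A canonical data model gives every system one shared representation to translate to and from. A minimal sketch, in which the canonical fields and the per-system mappings are hypothetical:

```python
from dataclasses import dataclass

# Sketch of a canonical data model: each source system maps its own
# field names onto one shared representation used on the integration
# layer. Canonical fields and mappings are hypothetical illustrations.
@dataclass(frozen=True)
class CanonicalStudent:
    student_number: str
    full_name: str

def from_enrolment_system(rec: dict) -> CanonicalStudent:
    # Hypothetical source schema with Dutch field names.
    return CanonicalStudent(rec["stud_nr"], rec["naam"])

def from_grading_system(rec: dict) -> CanonicalStudent:
    # Hypothetical source schema with different field names.
    return CanonicalStudent(rec["id"], rec["display_name"])

a = from_enrolment_system({"stud_nr": "s123", "naam": "A. Jansen"})
b = from_grading_system({"id": "s123", "display_name": "A. Jansen"})
assert a == b   # both systems agree once mapped to the canonical model
```

Because every interface translates to and from the canonical form, definitions diverging between business domains become visible immediately, instead of surfacing as unexplained mismatches downstream.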
Staff
When implementing data quality maturity level three, focus shifts from a localized, departmental view to an integrated, Windesheim-wide view, and care has to be taken to overcome the technological discontinuity (Zee, 2001) (paragraph 2.3.4, Growing Pains).
Interviews revealed that good results were gained from making personnel responsible for data quality throughout the data life cycle. It is therefore recommended to:
- For the variable, student-centered study period, make the lecturers entering course information responsible for assigning semester (variant) plans to students’ personal activity planning, as opposed to having these activities assigned to support offices;
- Make the lecturers entering course information responsible for solving conflicts when course definition and course execution differ.
These practices help make the process transparent to both the lecturer and the student. They may also help in overcoming the technological discontinuity.
Staff may need to be trained in the field of root cause analysis and requirements analysis.
One issue specifically targeted is the technological discontinuity. The programme (see: Structure) should give special attention to communicating the reasons for the change, have stakeholders participate and recognize the difficulties associated with it. It should be recognized that education is a very diverse environment, with personnel ranging from hard-core technically competent to deeply social and artistically engaged. A programme that merely rolls out a new system is a recipe for disaster. A short exploration of this specific element reveals the multi-colored nature of Windesheim:
In ‘Leren veranderen’24, Caluwé and Vermaak (2006) group people into different modes of thinking, labeled by colors (Yellow for power-based, Blue for process-based, Red for relation-based, Green for learning-based and White for freedom-based thinking). For each color, people appreciate and respond to change differently, need different guidance and require a specific approach. Now, what colors define Windesheim? Let us allow ourselves a little freedom of thinking in exploring the situation:
It is interesting to see that education in itself very much used to be a Blue process. A student defined a goal (‘I want to become a plumber’), chose an institution and suddenly found himself part of a fixed process, in which he would transform from novice to educated plumber in a given period. Today, this process is (at least partially) transformed from Blue into Green, in which the student is encouraged to evaluate and re-set his own goals, make his own choices along the way, and is offered tailored learning situations and intensified personal coaching. We may recognize this practice in the existence of the activity plan in which semester plans are selected.
Schools themselves may be described as White, in which teachers are professionals, only needing
space (and removal of obstacles) to prosper and follow their calling to create and transfer
knowledge.
Then there are the supporting departments, like Finance, Personnel, Facility Management and IT.
These departments are very Yellow by nature. Rules are defined by which personnel is hired,
assessed and rewarded, financial rules for accountancy are strictly observed, and instrumentation
for education is highly standardized.
Instead of Yellow, IT is more a Blue type of organization, trying to organize work in terms of
fixed and predictable processes.
Management is a main stakeholder too. Management of supporting departments tends to adopt a Yellow point of view, keeping control over their business, while management of education is more Green in nature, enabling their teachers to learn, reflect and grow. Management, therefore, is a rather multi-colored stakeholder.
This short exploration of the field of change reveals that at Windesheim, we may discover at least Blue, Green, White and Yellow oriented stakeholders: a clear indication that implementing change using only one (perhaps Blue oriented) technology-driven approach will NOT succeed in overcoming the technological discontinuity. It is therefore recommended to implement a broad programme to support change, including as many stakeholders as possible.
Data Quality Level four (Quantitatively Managed)
In order to improve Timeliness and Currency even further, it is recommended to communicate the educational process baselines and associated deadlines as clearly as possible, for example by means of posters and leaflets, distribution of up-to-date Educator manuals and training on the job.
Finally, it is recommended, once level three maturity is reached, not to stop, but to start a discussion on extending data quality initiatives into WDQM level four.
24 Learning to adapt
5.7 Concluding
5.7.1 Conclusion
In this research, it is found that data quality is related to organizational maturity. This relation is defined in the Windesheim Data Quality Maturity (WDQM) model. The model defines five separate maturity levels, each defined by the presence of specific process areas (best practices) and resulting in data having specific quality dimensions (characteristics). Levels range from Initial, through Managed, Defined and Quantitatively Managed, to Optimizing.
Interviews with specialists within Windesheim have revealed that currently, in the field of data quality
maturity, Windesheim is still at maturity level one (initial). This is indicated by the incidents found to
be occurring in the Educator domain, the way these incidents are being dealt with and the tools
available to monitor and correct data quality faults. This indication is then confirmed by observing the
process areas and data quality dimensions implemented.
In order to execute the educational process supported by Educator in a reliable and efficient manner, and to be able to implement near real-time, message-based application interfaces, it was discovered that mastering data quality maturity level three (defined) is required, while getting data through the process in time requires some level four process areas to be implemented. It is to be noted that the business value of data increases dramatically once an organization succeeds in implementing data quality maturity level four (quantitatively managed).
5.7.2 Recommendations
A two-phased approach is recommended, implementing data quality maturity level two first, and level
three later.
Data quality maturity level two (managed) is reached by solving the immediate data quality problems.
This is done by starting an Educator data quality improvement project. This project will describe the
study management business rules in great detail, investigate Educator database quality using these
business rules and repair errors found. At the end of the project, data managed by Educator is
documented (creating insight) and reports are present, enabling operations to compare actual data
quality with the required data quality as documented.
Data quality maturity level three (defined) is reached by creating a holistic view. This means that change is managed as a coherent programme rather than as multiple isolated projects, and that when problems arise, a formal root cause analysis is performed first, resulting in requirements being designed and results being tested against these requirements. With respect to the study management process, the process as a whole is observed. This view may trigger new requirements and changes in the current implementation of Educator, examples of which may be the assignment of process responsibilities to teachers, organizing for more flexibility in the process, adding process schedule support and simplifying structures.
At data quality maturity level two, data quality is guarded in a rather reactive manner, using reports, analysis tools and repair tools to correct issues. At data quality maturity level three, data quality is guarded in a pro-active manner, using input checks and integrity checks to guard quality before data is stored. Extending the documentation created at level two, at level three the data communicated between systems is described using a canonical data model.
Once level three is mastered, a full implementation of data quality maturity level four (quantitatively managed) is not required. However, timing issues demand that some level four process areas be implemented and, again, it is noted that once level four is reached, organizations start to yield great benefits from their data. It is therefore additionally recommended to communicate the educational process baselines and associated deadlines as clearly as possible and, once level three maturity is reached, not to stop, but to extend data quality initiatives into data quality maturity level four.
Most importantly, when crossing the border between having a local, departmental view and having a Windesheim-wide view, it must be recognized that the organization may experience a crisis, known as the technological discontinuity (see paragraph 2.3.4, Growing Pains). Care has to be taken to overcome this discontinuity. Activities here are people-oriented: give special attention to communicating the reasons for change, recognize the difficulties associated with it, involve stakeholders and respect the different concerns each group of stakeholders has, and communicate process milestones using bi-annual calendars in poster format.
Issues not addressed in this research are the current status of project management, programme management and ROTAP environments at Windesheim. Investigating these issues properly requires separate research projects, for which time and resources were not available during this graduation project.
5.7.3 Stakeholder Value
At the start of the project, three groups of stakeholders were identified (paragraph 2.5).
Committed stakeholders are the CIO, Information Manager and Science. The scientific value of this
project is discussed in the next paragraph. For the CIO, this research has provided an instrument
guiding project portfolio management, linking change required with business objectives. The trigger
and initial problem for this research were the difficulties experienced whilst trying to become a near
zero latency organization. During the research it became clear that the actual business benefit reaches
much further than that. The instrument enables the CIO to fine-tune investments in improving the
study management process. For the Information Manager, the instrument acts as a guide in reducing errors in data processing, increasing efficiency, managing responsibilities, improving business intelligence and saving costs by reducing rework.
Involved stakeholders are Management, Operations, Functional Support, System Integration and the Security Manager. At the end of the process, management will be provided with reliable data, supporting business process management. Operations and Functional Support will spend less time on rework and error correction. System Integration will be able to produce reliable and stable near real-time integration services. The Security Manager will notice an increased integrity and availability of data.
Affected stakeholders are the Board, Staff and Students. The Board will notice improved image,
student satisfaction and process efficiency. Students will notice prompt responses when assessments
are graded, reduced complexity and a reduction in errors. Staff will experience simplified
administrative tasks, a clear-cut study management process and direct communication with students.
5.7.4 Achieved Reliability and Validity
Many theories on maturity and quality were discussed and balanced. The results were checked by a
survey amongst specialists. Those specialists were chosen based on their experience with data quality
and organizational maturity. Population of quality attribute values was performed by a workshop
involving Windesheim specialists, enabling them to reflect on the process and results. Windesheim
specialists were chosen based on their experience and involvement in data management in study
management. Care was taken to involve participants from a department known to have had trouble
ensuring data quality and a department known for successfully solving data quality issues. In many
cases, triangulation was used to cross-check results. This was done by comparing multiple aspects of
an outcome, or by evaluating an observation using multiple theories. Examples of this are the
observation of both process areas and maturity dimensions to ascertain the current maturity,
confronting the WDQM model with multiple theories, and explicitly validating business rules found
during interviews and the workshop. Building on multiple, accepted sources, reflection on results
acquired and open discussion ensured internal validity, while applying the grounded theory approach
ensured external validity.
During interviews, experts involved in this research agreed on the business rules and the model as presented, with only a few modifications to be made. This, in fact, may make one a bit wary, for where is the discussion? The WDQM model has been crafted and used once, and one may wonder whether this is enough proof of its qualities. Whether it is balanced enough and incorporates the right process areas and goals should be ascertained in multiple assignments and open discussion, which calls for a whole new research project.
5.7.5 Scientific Value and Innovativeness
With this research, based on recent theories on data quality, an up-to-date instrument has been created and used, pinpointing the data quality dimensions required to satisfy a business process’ needs, and translating these data quality requirements into organizational measures to be taken. The instrument is based upon well-established general theories on production quality improvement and specific theories on data quality, and combines these theories into one framework.
5.7.6 Generalisation
The instrument developed has been successfully used at Windesheim, yet it is not bound to the study management domain, or even to one type of organization. Even though the business rules require the study process to be implemented at WDQM level three, solving the initial business problem requires a WDQM level three implementation too, and this alone includes ALL Windesheim processes. The instrument is ‘solution independent’, since it is based on open and well-established models and theories, and thus can be used in other domains within Windesheim, other higher education institutions, organizations in other branches and even other nations. The theories behind it apply to all organizations – as long as these organizations rely on data being processed.
5.7.7 Research Questions Answered
Main Q1: Observing theories on maturity and data quality, and external benchmarks, what positive
and negative correlations between structures defining maturity and data quality attributes may be
found?
15-Apr-23 F. Boterenbrood Page 78
1. What structures define maturity?
a. What levels of maturity do exist?
Five levels of maturity have been defined, ranging from Initial through Managed, Defined,
Quantitatively Managed to Optimizing.
b. What maturity structures in the field of organizational structure, process, technology,
information and staff describe each level?
The maturity structures in the field of organizational structure, process, technology,
information and staff are documented in table 5, paragraph 5.1.4.
2. In higher education, what positive and negative correlations between maturity and data quality
may be found?
a. For this research, what is the relevant set of business rules?
The relevant set of business rules is documented in appendix 6.13.
b. How will this set of business rules evolve in time?
It is found that some parts of the educational process are perceived to be complex. A
reduction of complexity may well be in order. An area mentioned is the digital course
catalogue.
c. What data quality attributes are relevant for these business rules?
The data quality attributes relevant to the set of business rules are Accountability, Accuracy,
Completeness, Confidentiality, Consistency, Currency, Referential Integrity, Reliability,
Specifications and Timeliness.
d. What values of data quality attributes correlate with each level of maturity?
The relation between data quality attribute values and maturity levels is documented in table
9, paragraph 5.2.2.
e. What do process quality theories describe about positive correlations between quality and
maturity?
In most cases, Process Quality theories are derived from CMMI and therefore tend to
describe a common picture: structured initiatives prevail over individualistic initiatives,
holistic initiatives prevail over structured initiatives, repetitive processes including feedback
loops prevail over holistic initiatives. An exception has been found in the Data Quality
Management Maturity Model (paragraph 5.1.5), adding a higher abstraction level to data
management in each successive maturity level.
f. What do process quality theories describe about negative correlations between quality and
maturity?
In literature, this item is not addressed explicitly.
g. Are those observations consistent?
This question has become redundant.
Main Q2: What values of data quality attributes will define the required data quality threshold and
therefore the required maturity structures at Windesheim?
1. To support the business rules identified earlier, what values should data quality attributes have?
Required Accuracy is to be pro-active and Confidentiality is required to be basic. Furthermore,
data should be Current and Timely.
2. What level of maturity is required to enable those data quality attribute values?
Windesheim is required to implement data quality maturity level three (defined) completely, and
for Timeliness and Currency some level four (quantitatively managed) process areas.
3. What organizational structure, process, technology, information and staff criteria define the
maturity found?
The minimal list of structure, process, technology, information and staff criteria defining the
maturity found are defined by the process areas of maturity level two and three of table 5,
paragraph 5.1.4.
Main Q3: What are the current organizational maturity and current values of data quality attributes?
The current organizational maturity and values of data quality attributes correspond with data quality
maturity level one (initial).
Main Q4 (Central research question): What is the gap between current maturity structures & data
quality threshold and required maturity structures & data quality threshold in the light of enabling
Windesheim to become a near zero latency organization?
1. What is the gap between the current and required organizational structure, process, technology,
information and staff criteria?
This gap is documented in paragraph 5.6.1.
2. What conclusions and recommendations may be derived from this gap?
Detailed conclusions and recommendations are documented in paragraph 5.6.2. This paragraph
is summarized as follows:
In order to execute the educational process as supported by Educator in a reliable and efficient
manner, and to be able to implement near real-time message-based application interfaces, mastering
data quality maturity level three (defined) is required, while getting data through the process in time
requires some level four process areas to be implemented.
A two-phased approach is recommended, implementing data quality maturity level two first, and
level three later.
At data quality maturity level two, data quality is guarded in a rather reactive manner, using
reports, analysis and repair tools to correct data quality issues.
At data quality maturity level three, data quality is guarded in a pro-active manner, using input
checks and integrity checks to guard quality before data is stored. Extending the documentation
created at level two, in data quality level three data being communicated between systems is
described using a canonical data model.
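The contrast between these two levels can be made concrete in a small sketch: at level two the same quality rule runs reactively over already-stored data, while at level three it guards the input channel. This is an illustrative assumption only; the field names and the postcode rule are invented for the example and are not Windesheim's actual data definitions.

```python
# Hypothetical sketch contrasting the two WDQM levels described above.
# Field names and rules are illustrative assumptions, not an actual schema.
import re

POSTCODE = re.compile(r"^\d{4} ?[A-Z]{2}$")  # Dutch postcode format

def is_valid(record: dict) -> bool:
    """Shared quality rule: a student record needs a name and a valid postcode."""
    return bool(record.get("name")) and bool(POSTCODE.match(record.get("postcode", "")))

# Level 2 (managed): reactive -- data is already stored; a report flags records to repair.
def quality_report(stored_records: list) -> list:
    return [r for r in stored_records if not is_valid(r)]

# Level 3 (defined): pro-active -- the same rule guards the input channel,
# so flawed data is rejected before it is stored.
def store(record: dict, database: list) -> None:
    if not is_valid(record):
        raise ValueError("rejected at input: %r" % record)
    database.append(record)
```

The point of the sketch is that the rule itself is identical at both levels; only the moment of enforcement moves from after storage to before it.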
Reaching for a full implementation of data quality maturity level four (quantitatively managed) is
not required. However, to resolve timing issues and increase benefits, it is additionally
recommended to extend data quality initiatives into WDQM level four (quantitatively managed).
It is most important that care is being taken to overcome the technological discontinuity by
involving all stakeholders in the migration process.
5.7.8 Recommendation on further research
In this research, three elements have remained virtually untouched:
• Current and required status of project management;
• Current and required status of programme management;
• Current and required status of ROTAP environment management.
Addressing these issues is imperative for reaching WDQM levels two and three. It is therefore
recommended to investigate the current and required status of these process areas and to advise on a
migration strategy where applicable.
Now that the route towards a fitting level of data quality has been designed, this route will have to be
travelled. An intervention-based research project may be started, evaluating the progress of the growth
in maturity, looking for problems during implementation and delivering advice on solving these
problems.
5.7.9 Reflection
At the start of this graduation project, I set myself three goals. The first goal was to deliver
'value for money': to achieve a result that would justify the investment my employer has made in my
education. The second goal, equally (or perhaps even more) important to me, was to reach the end of
the project with good (not just satisfactory) results. And the third goal was to make a difference, to
learn and to add new knowledge to the IT profession.
During the execution of the project, I have witnessed reactions and gained insights to contemplate
upon. Let’s start with the last goal, and work our way back to the first.
I have discussed the WDQM with experts on data quality and maturity. On two occasions, interest in
the WDQM as an instrument was raised, and I was invited to publish my experiences in the form of an
article in the future. In one instance, I was even invited to join in writing a book on this matter.
Therefore, I feel confident that I have stumbled upon something interesting here. The box for goal
number three is ticked.
To reach the end of this graduation with good results is a more difficult goal to predict. I feel
confident that the results will be satisfactory, but are they going to be good? There is a lot of
uncertainty here at this moment. However, what I know is that I have done my absolute best – I have
enjoyed this graduation project and could simply not have done things any better than this. To me, the
diagnosis and research part of this project is covered to my very best abilities. Therefore, I consider
this box to be ticked too.
Yet the first goal was more difficult to get a grip on. It was quite uncertain whether the
answers to the questions asked would produce results specific enough to deliver usable advice. And in
the end, the subject proved to be very comprehensive. A migration in data quality maturity involves
many aspects, some of which could only be addressed briefly with the resources available for this
research. Indeed, issues like project management, programme management and ROTAP management
require a research project of their own, and deserve more attention than they have received here.
What would I do differently next time? In this project, an instrument to measure data quality maturity
had to be developed before the business problem could be analyzed and advice given on solving it. In
fact, we may well have executed two research projects here: a design-oriented research project resulting
in an instrument, and a diagnostic research project resulting in advice. The absence of a detailed and
up-to-date instrument forced this research to create one, and the need to supply valuable advice
required the instrument to be applied. There was no escaping the combination of the two types of
research. It may well be that in the second part of the project, I could have involved the current
organization more, exploring areas more deeply that, right now, may only have been touched on the
surface.
6. Appendices
6.1 Interview Report Windesheim Integration Team
Interview report system integration team Windesheim
Attendees:
Tonny Butterhoff, System Integration
Gerben Meuleman, System Integration
Gerben de Wolf, System Integration
Albert Paans, Information Management
Alex Geerts, IT Front Office
Windesheim, 11/11/2009
What are the responsibilities of the system integration team?
Currently, the system integration team (formerly known as KOAVRA: Koppelen onder architectuur
voor Vraagsturing [25]) is connecting systems at Windesheim in a service oriented architecture. First of
all, the process being supported by real-time coupling is the HRM process. When a new employee is
hired, account information is sent in real time, granting the new employee immediate access to
information systems.
A similar process, aimed at proliferating student account information is currently being tested and is
planned to be accepted for production shortly.
Finally, real-time service based information exchange processes concerning study information and
supporting study processes are being built.
What technologies are being utilized?
Systems being integrated are all standard packages: CATS student information system, Oracle HRM,
Planon Facility Management, Decos document management, Educator Learning Environment,
Blackboard Learning Environment. The Enterprise Service Bus is delivered by Cordys.
For some systems, building a service interface layer was quite simple. Decos, for instance, is an
up-to-date system supporting the use of web services. Planon and Oracle HRM are at the other end of
the scale, offering no support for web services at all. For these packages, an interface utilizing database
injection code had to be developed. In the near future, however, Planon at least promises to offer a
more modern solution.
What issues in relation to data quality are found?
[25] Coupling Under Architecture for student-centric education
Data quality related issues are commonly found. An example of a recurring problem is that female
students enroll themselves multiple times, using either their maiden name or the family name of their
spouse by mistake. Other well-known and regular issues are cases in which the name of a student is
completely missing, student addresses are incorrect, or information is entered in the wrong fields.
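The maiden-name scenario described above lends itself to a simple automated screening step. The sketch below is hypothetical: the field names and the matching rule (same birth date and given name, different family name) are assumptions for illustration, not the actual CATS logic.

```python
# Hypothetical duplicate-enrollment screen. Two records sharing a birth date and
# given name but differing in family name (e.g. maiden vs. married name) are
# flagged for manual review. Field names are illustrative assumptions.
from itertools import combinations

def suspected_duplicates(students: list) -> list:
    """Return pairs of records that may describe the same student twice."""
    pairs = []
    for a, b in combinations(students, 2):
        same_person_hint = (a["birth_date"] == b["birth_date"]
                            and a["given_name"] == b["given_name"])
        if same_person_hint and a["family_name"] != b["family_name"]:
            pairs.append((a, b))
    return pairs
```

Such a screen would not repair anything by itself; it merely surfaces candidate duplicates before they propagate into adjacent systems.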
In the facility management system, new personnel is assigned to a room using four zeros as a room
identification. When new personnel arrives, manual processes ensure the assignment of the correct
room number. Unfortunately, in many cases these processes fail to correct the number in time (or
fail to correct it at all).
In Oracle HRM, sometimes missing information is replaced by text-strings stating that ‘Debbie has to
solve this problem’.
DECOS seems to have become a victim of migration efforts. Even though it is one of the most
modern systems, interfacing resulted in a myriad of errors and unexplainable results. Upon closer
inspection, the database seems to be corrupted, as if multiple migration attempts had been made, all of
which at some point failed, leaving about 10% of the DECOS database in ruins.
Recently, management focus was on correctly clearing information in the case of a student's decease.
Even though hard evidence is missing, it seems that in the recent past information was sent to deceased
students' addresses.
What has also been found is that sometimes the timeframes of business processes themselves cause
problems. In some cases students receive no formal clearing, for instance because study fees have not
been paid. Even though those students may start their study, they do not receive a student account yet.
This situation may be misinterpreted as a data quality error.
A final issue is that the cooperating systems store data over different time frames. Oracle HRM is very
good at keeping a historic track of all data, whereas the student information system CATS seems to be
able to store the present situation only.
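The difference in time frames described here can be illustrated with a minimal history-keeping store: each value carries the date from which it is valid, so past states remain answerable. This is a sketch under assumptions; the address example and method names are invented, and neither Oracle HRM nor CATS works exactly this way internally.

```python
# Minimal sketch of history-keeping storage (as attributed to Oracle HRM above),
# in contrast to a current-state-only store that simply overwrites values.
# Names and the address example are illustrative assumptions.
from datetime import date
from typing import Optional

class HistoryStore:
    """Keeps every address together with the date it became valid."""
    def __init__(self) -> None:
        self.versions = []  # list of (valid_from, address) tuples

    def set_address(self, valid_from: date, address: str) -> None:
        self.versions.append((valid_from, address))
        self.versions.sort()

    def address_on(self, day: date) -> Optional[str]:
        """The address valid on a given day -- a question a current-state-only
        store can no longer answer once the value has been overwritten."""
        current = None
        for valid_from, address in self.versions:
            if valid_from <= day:
                current = address
        return current
```

The risk named later in this report, sending mail to an address before the actual date of moving, is exactly the kind of error the validity date in this sketch is meant to prevent.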
What consequences arise as a result of these issues?
Information about duplicate accounts is propagated into adjacent information systems (currently
using file oriented interfaces), and removing those duplicate accounts takes considerable effort.
Perhaps even worse, the existence of duplicate accounts may lead to errors in the student head-count,
leading to uncertain financial budgets.
Propagation of errors is an effect that is related to almost every issue found. Consequences are that
information systems produce incorrect information, resulting in loss of confidence. Secondly, it is
difficult and time-consuming (costly!) to find and repair those errors.
Painful mistakes like sending the wrong mail to deceased students may lead to serious image damage.
In the standard file-based integration processes, time is spent every day by operations checking the
batch files for errors manually. In fact, manual inspection and correction processes are found
everywhere. DECOS, for instance, is used as a source for management information. Due to the fact that
the DECOS database is corrupted, those management information reports are checked manually. If
suspicious information is found, the numbers are corrected manually. Not only does this ad-hoc
practice consume time, it effectively renders the management reports useless.
Where data quality errors in HRM point the way to the solution ('Debbie should solve it', so let's ask
Debbie), problems related to facility management tend to keep everyone in the dark. When material
(like a computer) is ordered for new personnel, getting the equipment delivered proves to be a
challenge, since information regarding a valid location is not available. Not only does this practice
lead to time lost, new personnel are not served professionally on their first day on the job, damaging
Windesheim's image.
Differences in storing data mutations over time may lead to incorrect system responses, like sending
information to a student’s new address prior to the actual date of moving. However, it is unknown if
anything like this has happened already.
Are there any causes and solutions identified already?
The use of Commercial Off-The-Shelf applications seems to contribute to the inventive use of data
fields. Packages do not always support a flexible and proper implementation of a business process, and
sometimes an inventive implementation has to be found. Then again, standard solutions do not
always offer the input checks one would like to see. Even the national student portal Studielink
(www.studielink.nl) allows students to fill out their application forms incorrectly. It would help if
correctness of data were enforced 'at the source'.
The distinction between correct and flawed data is not always clear. To solve this problem,
development of a canonical data model is planned.
6.2 Interview Report WDQM Marlies van Steenbergen
Discussion Marlies van Steenbergen MSc Lead Architect Sogeti
Subject Validity WDQM model
Date 12 March 2010
First of all, it is noted that at level 4, emphasis lies on being able to manage a process quantitatively,
which implies the presence of a measurement mechanism.
In the WDQM, having no process areas defined at level 1 is recognized to be correct.
At level 2, the initial positioning of root cause analysis in the process column is questionable. When
properly conducted, a root cause analysis leads to identification of underlying problems and enables a
more lasting solution. Therefore, root cause analysis may be positioned at level 3 instead.
Information being unspecified at level 1, not trusted at level 2 and structured at level 3 is not directly
based on evidence in literature. The reasoning behind these labels became clear during the discussion
and is recognized, but may need some further explanation.
At level 3, the focus is on being able to manage multiple changes in harmony and on creating synergy.
Therefore, the term project management may be better replaced by programme management. In
many cases, portfolio management is used to indicate synergy. Yet portfolio management is often
used in conjunction with (IT) governance, utilizing frameworks like COBIT, BiSL and ITIL. These
frameworks may fit level 4 (quantitatively managed) better, since their focus is on supporting the
whole process / product life cycle. Therefore, using programme management instead of project
management at level 3 seems appropriate.
The explanation of the process activities at level 3 may be made more explicit. Technical solution
might be more appropriate at level 2 in the technology column. And is data integration an activity
that may be better positioned in the technical column at level 3? What are the relevant data integration
patterns here? It may be argued that at this level, data is integrated with other sources using
translation routines at the borders of each source. Supporting these translations may well be
translation scripts, resulting in the emergence of a canonical data model: a bottom-up description of
data being transferred between sources. In the information column at level 3, we may thus see the
emergence of a canonical data model.
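The border-translation pattern suggested here can be sketched roughly as follows. The source layouts (`stud_nr`, `achternaam`, and so on) are invented for illustration; only the pattern itself, one translation routine per source yielding one shared canonical shape, reflects the discussion above.

```python
# Hypothetical sketch of border translation: each source keeps its own field
# names, and a small routine at its border maps records onto one shared
# (canonical) shape. The source layouts below are invented assumptions.

CANONICAL_FIELDS = ("student_id", "family_name", "email")

def from_cats(record: dict) -> dict:
    """Border translation for a hypothetical CATS record layout."""
    return {"student_id": record["stud_nr"],
            "family_name": record["achternaam"],
            "email": record["mail"]}

def from_educator(record: dict) -> dict:
    """Border translation for a hypothetical Educator record layout."""
    return {"student_id": record["id"],
            "family_name": record["surname"],
            "email": record["email_address"]}

def to_canonical(source: str, record: dict) -> dict:
    translated = {"cats": from_cats, "educator": from_educator}[source](record)
    # Every border routine must yield the same canonical shape.
    assert tuple(translated) == CANONICAL_FIELDS
    return translated
```

Collecting these per-source routines is what makes the canonical model 'bottom-up': the shared shape emerges from the translations rather than being designed first.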
At level 3, in the staff column, data modeling knowledge is positioned. This raises the question of how
personnel is able to develop information systems and solve data quality problems at level 2 in the first
place. It is therefore recommended to reposition data modeling knowledge at level 2. In this cell,
project management skill may be replaced by the better fitting programme management skills. One
may argue that at this level, staff is 'synergetically' competent, since staff has learned to create synergy
by combining multiple transformations (projects).
At level 4, data is approached as a product. The presence of an information product manager at this
level makes good sense. But at level 3, data may be recognized to be raw material, building blocks, a
commodity perhaps. At level 3, who is responsible for this material?
Since level 4 incorporates end-to-end business process management, all measurement and analysis
instruments to enable level 5 may be present at level 4 already. In the technology column, which
integration patterns apply here? In the information column, the canonical data model may well be
used to define a common information language to which all data sources adhere.
At level 5, the absence of general theories on data quality is not completely surprising. It seems that
data quality theories are focused on improving data quality to an acceptable level (fit for use).
Applying six sigma may work, yet in some cases level 5 has been discarded altogether, since the
organization in question had no intention of reaching this level. Delivering quality according to a
service level agreement does seem to fit level 4 better. It is advised to reposition this process area at
level 4. The organization being structured in a strict top-down hierarchy is based on theories from
Treacy and Wiersema. This should be explained in more detail.
A rather interesting issue may be that from level 3 onwards, it is implicitly described that data errors
are corrected at the data source, not at the place where they create havoc. This means that a continuous
improvement cycle has been defined at level 3 already. What does this mean for level 5?
6.3 Interview Report Data Quality in Education Th. J.G. Thiadens
Attendees: dr. mr. ir. Th. J.G. Thiadens, lector IT Governance, Fontys University of Applied Sciences
F. Boterenbrood
Doorn, 15-03-2010
This discussion is about data strategies in higher education. Issues discussed are the historical
perspective on IT Governance, the current status, regular problems and common solutions, and the
future of IT in higher education.
Fontys University of Applied Sciences is characterized by 35 separate schools. The decentralized
structure of the organization has resulted in the presence of about 600 simultaneous projects, all resulting
in an IT solution. 10 to 15 of these projects are centrally managed. The remaining projects are local
initiatives within the 38 schools. The governance of the centrally managed projects is
transparent, while the remaining projects are executed without central guidance. One feels that a
portfolio of IT projects should deal with all 600 projects.
This is in fact a position many universities find themselves in today. The Dutch universities of applied
science are the result of mergers of many smaller institutions in higher education. The resulting
institutions are large organizations, albeit rather decentralized ones. Currently, a move towards more
centralized modes of governance is visible. However, data quality may not always benefit from
centralization. Procedures involving data being transferred between systems manually are prone to
errors. The books of Starreveld mention that manual record keeping can lead to up to 5% errors in
data quality.
In many cases, data quality may be improved by shifting responsibilities as low as possible down the
hierarchy. Examples are:
• Problems in grade assignment may be solved by making the lecturer directly responsible for
correct and timely grading. Lecturers are corrected by students when grade assignment is late
or questionable.
• Registration of lecturer availability may be much improved if the lecturer is made personally
responsible for this information, and is given the right tools to manage this information. The
effects of not having registered the right information on time (the lecturer finds himself
scheduled at undesired moments) may be a fitting incentive to have this information up to
date.
• Within schools, items are ordered and these items will have to be billed. Billing processes
should make the school which placed the order responsible for paying the bills. In this way,
schools are directly confronted with the financial consequences of their choices, and not at the
end of the year by means of an error-prone budgeting process. This may be implemented by
positioning financial controllers at decentralized positions.
• Monitoring study progress is a responsibility which could be both centralized and
decentralized simultaneously. Student centered education requires for study progress
monitoring to be decentralized, allowing for study coaches to closely monitor individual
student’s progress, while business intelligence processes supply management with over-all
corporate controls.
• Examples of responsibilities that should remain centralized are strategic management and
setting the rules for employee benefits.
Transferring responsibilities to the individual is in line with the current use of technological
developments like the internet, in which the individual has gained influence. Information is perceived
to be an individual asset. This will lead to an individual approach to information. An example is given
by Harvard University, where students are presented with individual schedules every day, including
proposals for alternative classes the student may wish to attend that day.
In many cases, information systems are not trusted. Often, managers rely on information acquired
from alternative sources or different indices. The number of employees working at an organization for
instance can be found by looking at the number of monthly salary deposits.
In the future, it is to be expected that information processing is centralized even further. Private cloud
computing has a role to play, enabling multiple institutions to share services. Virtualization too
supports the emergence of shared service centers, while respecting decentralized needs. The most
difficult hurdle to be taken here is to overcome the notion that information is not owned by the
decentralized business units. This requires academic excellence to be present in both
management and workforce.
6.4 Interview WDQM dimensions Report Arjen de Graaf
Attendees Arjen de Graaf, Founder / CEO Arvix
Frank Boterenbrood
Subject Validity WDQM model
Date 09 April 2010
Introduction
As founder and CEO of Arvix, a company focused on safeguarding and improving data quality, Arjen
de Graaf has deep knowledge of data quality and its relation with organizational maturity. In this
meeting, the WDQM goals as described in table 9 are discussed.
Ownership, stewardship and a business case for data quality.
In many organizations, an employee is assigned responsibility for the quality of data. However, once
asked for the means available to monitor and correct this data, the answer is not always satisfactory.
Effective means to influence data quality are absent in many cases. Absence of means results in a
situation where one can feel responsible for data quality, but in reality, one can not actually be
responsible. In other words, the data steward, as mentioned in this research, cannot fulfill his role as
caretaker for data quality if the means to effectively influence data quality do not come with the job.
Since data quality is related to organizational maturity, the means required are managerial rather than
technical. To ensure data quality, one may have to be prepared to restructure the organization.
Instating data stewardship without the preparedness to take (perhaps drastic) managerial decisions,
restructuring the fabric of an organization, may be in vain. There HAS to be a manager responsible for
data quality with the authority to implement change.
What can be observed is that organizations assign data quality governance not to one employee or
role, but instate a business intelligence department or data quality department. This department is
assigned the task of providing the organization with valid business indicators, directly influencing
operational processes and management decisions. In this case, data quality and business performance
are visibly connected, displaying a clear business case for data quality.
Talking of business cases: businesses are confronted with the situation that customers have direct
access to operational data and demand near real-time responsiveness. Today, when data is flawed, an
organization has neither the means nor the time to correct this data in internal processes and
procedures, and the business runs the risk of finding itself on one of the prime-time consumer
platform television shows, explaining why it all went so horribly wrong.
Value of data quality
It is important to be able to express data quality as a valuable asset of an organization. This means that
data quality has value: it can and must be expressed in terms that have meaning to the business. In the
current model, this approach towards data quality seems to be rather instrumental, and the business
view seems to be missing. The value of data quality can be expressed in terms of financial value or
business urgency. The business management view may include elements of recognizing new patterns,
generating new business based on data mining, turning data into new money. Or costs can be reduced
by, for instance, recognizing patterns indicating cases of fraud. Reasons for attention turning to data
quality are competitiveness (creating new business), being master of business data and therefore able
not only to manage but also to lead an organization, and exploring client demand (instead of sending
a mailing 'to all').
Insight
One main dimension of data quality, seemingly missing from the current model, is therefore Insight.
Do the organization, the data steward and the manager responsible for data quality have insight into
their data and the quality thereof? Insight into data means that it is clear to the organization which data
attributes are required or available, where and why these attributes are created, what sources were
used, where the attributes are used, who guards and tests each attribute, when attributes become
outdated and, once obsolete, how they are dealt with.
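The questions listed above map naturally onto a registry record per data attribute. The sketch below is purely illustrative; the field choices are assumptions derived from the questions in this paragraph, not an existing instrument.

```python
# Hypothetical registry record for the 'Insight' dimension described above:
# per data attribute, where it is created, from what source, who guards it,
# and when it becomes outdated. All field choices are assumptions.
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

@dataclass
class AttributeInsight:
    name: str                               # which data attribute
    created_in: str                         # where / by which process it is created
    source: str                             # what source was used
    used_by: List[str] = field(default_factory=list)  # where the attribute is used
    steward: str = ""                       # who guards and tests the attribute
    outdated_after: Optional[date] = None   # when the attribute is outdated

    def is_outdated(self, today: date) -> bool:
        return self.outdated_after is not None and today > self.outdated_after
```

An organization with such records per attribute could answer the Insight questions directly, which is what would make a future accreditation against a data quality standard assessable.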
Accreditation
Data Quality is becoming recognized as a major contributor to business success (or, when absent, an
inhibitor of it). We may expect a data quality standard to emerge in the near future, and
organizations may become data quality accredited against this standard. Needless to say, Insight is
one of the first dimensions required to be instated.
For now, an organization may well embark on a journey towards data quality improvement because
new management has just entered the organization and is in doubt about the reliability of its data: it
is not sure whether the data is right or not. In this case, the new entrant acts as a maverick: not
obstructed by any corporate rules and customs, data quality is doubted and questions are asked,
demanding unambiguous answers.
Volatility
In the current model, volatility is mentioned as not being recognized at WDQM level one. This does
not seem to be right, since at the operational level, the importance of data quality is recognized right
from the start. The experts from operations, however, have a hard time communicating the importance
of data quality, and at level one it is mostly management which is unaware of its importance.
Beef
Where is the beef? The current model is technically correct, yet it seems to lack real-world business attention. For instance, the current level labels are quite technical and difficult to understand. What is meant by 'quantitatively managed'? Who is to understand this? It is not very likely to generate management attention instantly. Please describe a 'WDQM for Dummies' in terms of management benefits. Especially beyond level three, data quality becomes a matter of special interest to organizations, opening up a whole new realm of possibilities. What we can see beyond level three in practice today are cloud computing initiatives for data quality, new business being generated, and successful one-on-one business models based on reliable data. Make data quality more sexy!
6.5 Interview report Current Data Quality Educator Gerrit Vissinga
Attendees: Gerrit Vissinga, process engineer Educator
15-Apr-23 F. Boterenbrood Page 89
Thesis: Improving data quality in higher education
Frank Boterenbrood
Windesheim, 17-03-2010
Introduction
The Educator project is in turmoil. It has taken the best part of three years now, and full implementation may well take another three. In the future, issuing diplomas and certificates will become part of Educator.
The current graphical representation of Educator's scope of influence is not quite right: the education development process is not within Educator's scope.
Issues, causes and solutions
Management of study definitions in the catalogue is difficult. In particular, updating course definitions is tricky, since the user has to identify the type of update up front. If the update is identified as 'complex', a new version of the study definitions is generated. If the update is identified as 'simple', current data is updated in place and no new version is created. To the user, the distinction between 'simple' and 'complex' updates is not made perfectly clear, and the consequences of a 'complex' update remain unknown to many. One of the consequences is that, once entered, a new version of study definitions needs to be linked to semester variant plans. Often this step is overlooked, resulting in study information not being made available to the student, since students add semester variant plans to their activity plans, never individual courses. Indeed, orphaned study definitions may be found in the catalogue.
Errors like these are caused by an over-engineered and complex solution. Currently, a simpler, more straightforward system design is being discussed.
Another issue is caused by the fact that a student may enroll himself into a study that differs from the one agreed upon with study coordinators. This mistake is prevented by having the supporting offices assign semester plans to students' activity plans, or by having the study coordinator check activity plans in great detail.
Anyway, some issues remain unsolved, since the focus is still on supporting the primary process.
Other issues are checked by functional support or head lecturers. These tasks however are delegated to
support offices. One may question the quality of the checks performed.
Discussion on Data Quality Dimensions
Accessibility. Seems to be OK.
Accountability. This seems related to confidentiality, and seems to be OK.
Confidentiality. The number of roles available in Educator is rather large, resulting in complex role management.
Consistency. The technical implementation of Educator may not be adequate to prevent data from
becoming inconsistent. An example is the issue regarding definition updates.
Currency. -
Integrity, Referential. See Consistency. Integrity across different information systems seems to be a problem, since data integration with Educator is to a large extent manual.
Reliability. In some cases, once grades were assigned, courses were removed from students' activity plans, causing grades to disappear. This was caused by a notion that the study plans were in error: the reliability of the data was in question. It is not known whether any solution preventing this type of error has been implemented.
Specification. Leaves room for improvement.
Timeliness. -
Uniqueness. -
Volatility. In particular, course definitions are prone to alteration. It seems that lecturers designing their courses change their minds about how to execute or assess their education too often.
6.6 Interview report Current Data Quality Educator Gert IJszenga
Attendees: Gert IJszenga, manager education School of Build, Environment & Transport
Frank Boterenbrood
Windesheim, 15-03-2010
The School of Build, Environment & Transport (BET) is in the process of migrating all student information from the old Student Information System (SIS) CATS to the new SIS Educator. Starting from the year 2008-2009, the digital course catalogue, students' personal study planning and student grades are registered in Educator. Currently, grade information of students who started in preceding years is being migrated from CATS to Educator. When this process is concluded, the School of BET plans to utilize Educator's portfolio capabilities.
In implementing Educator, the School of BET applies a gradual approach. First, three years ago, the
processes of education development and definition, student activity planning, assessment and grade
registration were formalized more strictly, creating a situation in which the School of BET was in
control of these processes. Secondly, once these processes operated reliably on the current
infrastructure, process support was switched from CATS to Educator.
The challenge was to create a process in which:
• education, including assessment rules, was defined correctly;
• students created their study plans on time and correctly the first time round;
• freedom of choice was balanced against predictability (of resource claims);
• registration of grades was completed within a two-week window without major disruptions.
The issue here was to create a situation in which information stored in Educator could be checked against baseline documents, resulting in usable data quality controls and enabling well-informed choices when errors have to be corrected. The specific question addressed was: "What process design ensures that every student is linked to the right courses, supporting the assignment of the right grades?"
The leading principle at the School of BET implementation is that control over data entered into the
system is mandatory. This principle is implemented in three areas: The Digital Educational Catalogue
DOC (Digitale Onderwijs Catalogus), the student personal activity plan PAP, and grading.
DOC process control
A curriculum does not spring into existence by accident. Leading up to the registration of course information in DOC, a process of design and discussion is executed. These activities are reflected in planning and design documentation, resulting in a baseline enabling control over definitions in DOC. The School of BET therefore requires course planning and design documents to be present prior to entering course definitions in DOC. These documents are an instrument for guiding and monitoring the quality of the course catalogue.
Personal Activity Planning
Once the student completes the propaedeutic phase, the School of BET offers a variable study programme in which the student has freedom of choice. One of the problems here is that if a student does not use Educator to enroll in the courses he is attending in time, grades cannot be assigned. Secondly, it is hard to plan education execution efficiently if the participation of students is uncertain up to the very last moment. Therefore, the student is required to create a complete plan for his study career early in his studies. To support decision making, three alternative study paths are available for each study, each offering limited additional freedom of choice. The School of BET has structured the available choices in a study planning chart, visualizing the different routes. Finally, if a student does not complete his personal activity planning in time, he will not be allowed to participate for one semester.
This personal activity planning results in a set of study plans, which are easily converted into files and
imported into Educator, linking students to courses, groups and classes. To enable this import, the use
of free format data structures in Educator (known as labels) is standardized. And again, if problems
are detected, the individual study plans are a benchmark against which data in Educator can be
checked.
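The conversion of study plans into import files described above could be sketched as follows. The column names and file layout are assumptions for illustration, not Educator's actual import format:

```python
import csv

def export_study_plan_links(plans, path):
    """Write student-to-course links as a comma-separated file.

    'plans' is a list of dicts carrying a student id, a standardized group
    label, and the courses in the student's plan (all names assumed)."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["student_id", "course_code", "group_label"])
        for plan in plans:
            for course in plan["courses"]:
                writer.writerow([plan["student_id"], course, plan["group_label"]])

plans = [{"student_id": "s1001", "group_label": "BET-2A",
          "courses": ["VOE-101", "VOE-102"]}]
export_study_plan_links(plans, "links.csv")
```

Because the file is a plain benchmark of who should be linked to what, it can also serve as the reference against which data in Educator is later checked.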
The Windesheim Educational Standard (WOS: Windesheim Onderwijs Standaard) refers to the use of semester variant plans. These semester variant plans are in fact an educational planning tool encompassing a twenty-week period. The School of BET may not use the Semester Variant Planning structure literally, yet the process in use has exactly the same effect.
Grading
Once 1) the digital course catalogue is correct and 2) students are enrolled in the right courses in time, assigning grades does not pose any problems. The issues the School of BET meets here are performance issues, i.e. the speed at which the system reacts to input, bugs for which workarounds have to be used, and reporting facilities that are not yet available. These issues indicate that during development and implementation Educator was still in an experimental state; they are currently being dealt with in the Educator development project.
The major issue at this moment is getting a grip on the time it takes to assess student results and assign a grade. Ideally, this should be completed within a two-week period; however, instruments to control this service level are not yet available.
Key moments
Important deadlines in these processes are:
1. The moment courses are published in the digital education catalogue;
2. The moment the student submits his personal activity planning;
3. The moment grades are assigned and finalized.
Conclusion
In this discussion, not all information relevant to the research project was discovered, since one hour proved to be insufficient. A second date was set in order to continue this meeting.
6.7 Interview report Current Data Quality Educator Gert IJszenga Continued
Attendees: Gert IJszenga, manager education School of Build, Environment & Transport
Frank Boterenbrood
Windesheim, 25-03-2010
In this interview, the current values of data quality dimensions are discussed.
Accessibility. At this moment, reports enabling control over Accessibility are missing. Some rather elaborate manual checks are available. However, due to process design, it is believed that Accessibility is for the most part sufficient. There may be an issue with the assignment of grades: an estimated 80% is believed to be made accessible to students within 10 days after an assessment. It is mainly the lecturers' motivation that keeps Accessibility within limits.
Accountability. Educator offers built-in mechanisms to safeguard accountability. An audit trail is available, logging all data updates. In the real world, however, only exams are stored; student reports and other end products are handed back to the student after examination. It is therefore not feasible to reproduce the product that was assessed. Another issue is the absence of a fall-back administration in case errors cause Educator to fail. In one instance, deletion of courses that were already graded caused the deletion and loss of all grades, leaving the organization without a backup. The system should prevent this.
Accuracy. Course information is described in many documents outside Educator. As a result, the information entered into the system is (wrongfully) regarded as being of minor importance. This information is often less detailed than it should be. This is an area with room for improvement. How serious are we about the data in our study support systems?
Completeness. Educator's requirement that all course data be available before one fixed deadline is perceived as a problem. There is no room for a more gradual approach in which required data is stored first and additional, more optional data is added later. The current binary method causes course information to be entered as late as possible, jeopardizing currency and timeliness.
Confidentiality. Is well taken care of. It is hardly possible to adjust a grade, since this function is protected using a token (strong authentication). Using social hacking techniques, one may gain access to student grades, be it read-only.
Consistency. Due to strict process design, consistency is believed to be managed at a pro-active level.
Currency. Some Educator functions are troubled; Educator shows hiccups from time to time. An example is the limited choice of web browsers supported by the system, making it difficult to access the data in time from different devices and locations.
Integrity, Data. Course data is considered to be about 75% correct. The integrity of student activity plans and grade management may well approach a score of 100%.
Integrity, Referential. Is guarded by strengthening the process.
Reliability. At the school of Build, Environment and Transport the data within Educator is qualified
as reliable as a result of well defined business processes.
Specification. Specification is not quite ready yet. Currently, much knowledge is still confined to Gert alone; the situation is not yet transparent, i.e. ready to be shared. There is still room for improvement here, too.
Timeliness. This is an issue that is being worked on as we speak. It is perceived as troublesome because processes are started reactively. It is the process stakeholder who decides on starting a process, and time and again it proves difficult to start processes in time. Time is poorly planned.
Uniqueness. The design process creates a barrier against data being duplicated. Lecturers work in teams on course development. Course data, however, is described in multiple documents between which discrepancies are possible.
Volatility. Currently, course data may well be too volatile. The organization, learning to use the
system, is changing course information much too often. A good course definition should last for a
minimum of three years, and for many courses this may well be eight years. Yet, courses are updated
multiple times each year now.
6.8 Interview report Current Data Quality Educator Klaas Haasjes
Attendees: Klaas Haasjes, operational support Educator
Frank Boterenbrood
Windesheim, 17-03-2010
Introduction
Klaas Haasjes, as a member of operations, is responsible for the correct operation of Educator and the exchange of data between Educator and its adjacent information systems. Data exchange between Educator and Blackboard, for instance, is a labour-intensive process. In Educator, executing a query results in a comma-separated file, which is then imported into Blackboard, an information system supporting secured document exchange between students and teachers.
Issues, causes and solutions
It is found that data entered in Educator results in problems in Blackboard. For instance, for each course (VOE) in Educator, a module in Blackboard is generated. In this process, only one teacher is linked to each module, being the teacher responsible for the course. In some cases, multiple teachers or groups of teachers are linked to a course in Educator, which leads to one random teacher, or no teacher at all, being linked to the module in Blackboard. Operations does not correct this problem; it is found that these issues are corrected manually by the user in Blackboard.
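A pre-export check of the kind missing here could flag every course whose teacher links would not survive the one-teacher-per-module import. The data shapes and names below are hypothetical, not Educator's actual export:

```python
def flag_teacher_link_problems(courses):
    """Return (course_code, teacher_count) for every course whose teacher
    links are ambiguous for a one-teacher-per-module Blackboard import."""
    return [(c["code"], len(c["teachers"]))
            for c in courses if len(c["teachers"]) != 1]

sample = [
    {"code": "VOE-101", "teachers": ["A. Jansen"]},               # exactly one: fine
    {"code": "VOE-102", "teachers": ["B. de Vries", "C. Smit"]},  # ambiguous
    {"code": "VOE-103", "teachers": []},                          # no teacher at all
]
print(flag_teacher_link_problems(sample))  # [('VOE-102', 2), ('VOE-103', 0)]
```

Run before the export, such a report would let operations resolve the ambiguity instead of leaving the correction to users in Blackboard.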
In the past, courses in Educator could be renamed. When this happened, course names in Educator and Blackboard became different, making course selection a mission impossible for students. To prevent this confusion, Educator has been modified to prevent course names from being altered. However, ghosts from the past still remain, causing 472 errors during data integration runs.
Life cycle management of data is a problem in many cases. At www.studielink.nl, students can select a study. Once a student selects Windesheim and www.studielink.nl submits their information, an account is created at Windesheim. However, students are free to un-enroll themselves and indeed frequently do so. Their account at Windesheim is not terminated, leading to literally thousands of ghost accounts. In many cases, these ghost accounts are assigned to the mandatory part of the programme of the study the students initially enrolled for. Once that has happened, removing these accounts becomes difficult, since they have become intertwined with educational registrations. A solution for this problem is currently being investigated. Student ghost accounts may also cause havoc with software licensing strategies: when a license strategy is based on the maximum number of enrolled students, ghost accounts may cause maximum thresholds to be exceeded.
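Under the assumption that an account only becomes hard to remove once it carries educational registrations, a periodic clean-up sweep could be sketched as follows (all names and data shapes are hypothetical):

```python
def find_removable_ghost_accounts(all_accounts, enrolled_students,
                                  accounts_with_registrations):
    """Return accounts whose owner has un-enrolled and that carry no
    educational registrations yet, so they can still be removed safely."""
    return sorted(a for a in all_accounts
                  if a not in enrolled_students
                  and a not in accounts_with_registrations)

accounts = {"s1", "s2", "s3", "s4"}
enrolled = {"s1"}            # still enrolled: keep
entangled = {"s3"}           # ghost, but already intertwined with registrations
print(find_removable_ghost_accounts(accounts, enrolled, entangled))
# ['s2', 's4']
```

Running such a sweep shortly after each enrollment period would keep ghost accounts from ever reaching the entangled state described above.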
Discussion on Data Quality Dimensions
Accessibility. Many students may still not be aware of the existence of the digital catalogue. Indeed, many lecturers may not be aware of its existence. It may be observed that the educational process is seemingly not fully understood by many. Whether actions are planned or taken to improve the situation is unknown.
Accuracy. Values entered in the catalogue are checked against general agreed upon guidelines. However, these guidelines do not seem to be known by many.
Consistency. In the past, the meaning of grades could be defined by the lecturer designing the course. This led to a plethora of grade value interpretations. One issue in particular caught the attention: grade values indicating a score being insufficient or sufficient, or a course being dispensated altogether. These scores were represented by a 4 and a 6 respectively, much to the dissatisfaction of students graduating cum laude who, much to their surprise, were presented with one or more sixes among the row of 'straight A's'. Now, grade value definitions are defined by Educator automatically. How values for sufficient, insufficient and dispensated are currently processed is unknown.
Currency. In many cases, information is entered into the system too late. This is not primarily a fault of the information system; it is the human factor causing delays. Examples are grades and course definitions being entered too late. Student plans tend to be finalized in time, since being late with student planning results in the student not being able to attend classes for one semester.
Integrity. In Educator, courses with no credits attached have been defined. It is apparently not feasible to assign checks to every data attribute entered.
Reliability. Data is reliable as long as it is entered correctly.
Specification. Documentation supporting Educator is rather thin. However, documentation is being improved.
Timeliness. The human factor proves to be a large contributor to information being available late. Knowledge on how processes rely on information being timely seems to be missing. Implementation of Educator seems to be left in the hands of the individual schools.
Uniqueness is a dimension which is strictly observed and guarded.
Volatility. Information in the world of Educator does not change very frequently. Peaks are found when new students enroll themselves at Windesheim.
6.9 Interview report Current Data Quality Educator Louis Klomp
Attendees: Louis Klomp, ICTO Coordinator school of Business & Economics
Frank Boterenbrood
Windesheim, 18-03-2010
Introduction
Louis Klomp is a teacher and ICTO coordinator (Information and Communication Technology in Education) at the school of Business & Economics (BE). Louis was engaged in the use of the first version of the digital education catalogue and has participated in the migration to the current catalogue. As a teacher, Louis has hands-on experience with Educator.
Educator does not support the development of courses; the focus of the model currently presented is too wide. Since BE has used Educator from the very first moment, printing diplomas and grade certificates is supported by Educator this year.
Now BE is focusing on defining and registering standards and thresholds, such as the 45 EC threshold associated with the propaedeutic phase. In time, these thresholds will be assigned to students' personal activity plans (semi-)automatically.
Issues and actions
Much to everyone's surprise, during grading teachers were confronted with the fact that when the definition of a course's assessments in the catalogue did not align with the way the course was assessed in real life, grading that course was difficult, if not impossible. At first, teachers were supported by coordinators, who removed the course from students' activity plans, corrected the course definitions, re-inserted the course into the activity plans, and restored previously earned grades manually. Later on, this support was dropped and teachers had to deal with the issues themselves. This rather rigid support policy proved to be beneficial for data quality: teachers became much more aware of getting the definitions in the catalogue right the first time round. Now, the mindset has been transformed from a deadline being debatable and final being questionable to a deadline being the limit and final being definite.
Course definitions were entered by personnel of the BE supporting office. Communication regarding course definitions between lecturers and supporting personnel was based on notes and print-outs. These went missing regularly, causing mistakes and miscommunication. Now, it has become the lecturer’s responsibility to enter the course definitions.
It proved to be impossible to link student requests for a re-assessment to the exact moment a course had been scheduled in the past; in Educator, the moment a course was scheduled is not registered. In order to be able to create useful management reports and to assign student rework to the correct course, all BE courses in Educator are copied and renamed each year, inserting the current year into the name of the course. The lecturer responsible for the course has to confirm that the course definition is still valid. This procedure improved course information and enabled student requests to be assigned to the right, historical, course definitions.
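The annual copy-and-rename procedure could be sketched as below; the field names and the exact renaming convention are assumptions for illustration, not BE's actual procedure:

```python
def copy_courses_for_year(courses, year):
    """Copy each course definition, appending the study year to the name so
    that re-assessment requests can be matched to the historical definition.
    The original name is kept in a 'source' field for traceability."""
    return [dict(c, name=f"{c['name']} {year}", source=c["name"])
            for c in courses]

courses = [{"name": "Marketing 1"}, {"name": "Statistics"}]
copies = copy_courses_for_year(courses, 2010)
print(copies[0]["name"])  # Marketing 1 2010
```

A side effect noted in the interview is that this duplication grows the database every year, which is one reason the missing management reports matter.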
Many reports enabling management of Educator data are still missing. Currently, it is hard to get a view on study progress, since relevant reports are not available. Migration of grades between information systems in the past has introduced errors; however, the lack of reports does not help in identifying these errors. The annual duplication of all course descriptions results in growth of the database, adding to the need for management reports.
Now that Educator has been used for three years, initial assumptions about how education is organized are being re-evaluated. A redesign seems beneficial, in which the structure of the catalogue may be greatly simplified, improving the availability and understandability of the catalogue. It now seems that items like OE and VOE (Onderwijs Eenheid and Variant Onderwijs Eenheid) may best be combined into one course entity, while the entity Semester Plan seems to be completely redundant. Having used Educator for three years also means that next year, the first group of students will have their diplomas printed by Educator.
For reasons unknown, the calculation of final grades does not function properly. In rare cases, students are presented with insufficient grades, while final re-assessments, resulting in sufficient grades, should have shown a more positive result. Reports created by the software manufacturer did not clarify this mystery. Now, an approach is used in which problems are investigated once students complain.
In short, many issues are related to the absence of proper management reports.
In some student activity plans, courses and the grades students earned were migrated from the previous study support system manually. Again, when these courses were attended by the student and when the grades were earned was not registered in Educator. Now this information is unavailable.
Discussion on Data Quality Dimensions
Accessibility. Currently, the system is over-engineered and too complex, limiting accessibility.
Accountability. Is OK.
Accuracy. Initially accuracy proved to be a problem. By assigning responsibilities to the right functions, and confronting stakeholders with the consequences of their actions, accuracy has been improved greatly.
Completeness. See Accuracy
Confidentiality. This is OK, Educator offers comprehensive role management functions.
Consistency. It is found that the level of detail in which courses are explained in additional descriptions is not consistent. Some teachers describe their courses in great detail, while others spend only a few words. No actions are defined to correct this situation.
Currency. Grading may well be a problem. No reports exist monitoring the grading process.
Integrity, Data. Data integrity is questioned: because many relevant management reports are missing, the real quality of the data is unknown.
Integrity, Referential. The relations between VOE, OE, Semester plans and Variant Semester plans are questionable and in many cases, absent. Simplifying the digital catalogue would greatly improve this situation.
Reliability. Even though Louis is positive about the reliability of Educator, many colleague teachers may disagree. Using Educator only once in a while, as well as inadequate training and documentation, may well be at the source of this attitude. In Louis's experience, teachers often make mistakes and blame the system.
Specification. On a scale of 1 to 10, where 10 equals excellent and 1 is non-existent, specification scores a poor 1.5 or 2 at most.
Timeliness. For many, the planning of the educational process is perceived as being complex. When new education is to be developed, development has to start well in advance of the targeted study period in order to deliver study information in time.
Uniqueness. Is OK.
Volatility. Study information is altered annually, or every six months in some cases. Grades are created quarterly, amounting to about 230,000 grades being registered at Windesheim as a whole each study period. Study plans are extended every six months.
6.10 Interview report Current Data Quality Educator Viola van Drogen
Attendees: Viola van Drogen, Functional support Educator
Frank Boterenbrood
Windesheim, 16-03-2010
Introduction.
This research focuses on the business domain supported by Educator. To visualize this domain, the domain architecture designed by (Jansen, 2006) is used as an information source. Currently, this domain architecture is under discussion.
Issues, causes and solutions
In Educator, data may be entered and updated by many stakeholders, while in many cases Educator does not offer input checks, resulting in data being in error the moment it is stored in the system. Causes identified by functional support are:
• no workflow has been defined for the specific data set;
• no data checks are available on individual fields;
• the stakeholder operating Educator lacks vital knowledge of the effects of erroneous input;
• time to develop fitting reports is missing.
In the experience of functional support, many stakeholders agree that data needs to be correct; however, this attitude seems to be missing with regard to one's own actions.
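The per-field input checks Educator is said to lack could, as a minimal illustration, take the shape of a rule table applied to each record before it is stored. The field names and rules below are hypothetical:

```python
def validate_record(record, rules):
    """Apply per-field checks to a record and return the failing field names."""
    return [field for field, check in rules.items()
            if not check(record.get(field))]

# Illustrative rules: a course must have a non-empty code and positive credits
# (cf. the courses found elsewhere in Educator with no credits attached).
course_rules = {
    "code":    lambda v: isinstance(v, str) and v.strip() != "",
    "credits": lambda v: isinstance(v, (int, float)) and v > 0,
}

print(validate_record({"code": "VOE-101", "credits": 0}, course_rules))
# ['credits']
```

Such a layer rejects erroneous input at entry time, instead of relying on stakeholders to know the downstream effects of their mistakes.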
Errors in data are revealed once grade certificates are printed. On these certificates, it becomes clear that grades are not assigned to the right courses and that descriptions of courses are in error. In many cases, errors turn out to be caused by inadequate data entry. An example of inadequate data entry is the situation in which grades are entered twice. This may seem to be an innocent mistake, since in the end the student receives the right grade. It seems that no
harm is done, yet a student is granted only one chance to redo an assessment when a grade is insufficient – and entering a grade twice counts as rework!
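Since a grade entered twice counts as rework, a simple detection pass over the grade entries could make this mistake visible before it harms a student. The data shape below is a hypothetical simplification:

```python
from collections import Counter

def find_duplicate_entries(grade_entries):
    """grade_entries: one (student_id, course_code) tuple per recorded grade
    attempt. Returns the pairs recorded more than once, i.e. candidates for
    the 'grade entered twice counts as rework' problem."""
    counts = Counter(grade_entries)
    return [pair for pair, n in counts.items() if n > 1]

entries = [("s1", "VOE-101"), ("s1", "VOE-101"), ("s2", "VOE-101")]
print(find_duplicate_entries(entries))  # [('s1', 'VOE-101')]
```

Flagged pairs would then need human review, since a second entry can also be a legitimate re-assessment.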
Currently, Educator produces grade certificates for all first-year students and, at some Schools, second-year students. In the study process, the digital education catalogue needs to be finalized first; the student's personal activity plan (PAP) as well as the schedule may be created next. If either the catalogue or a student's activity plan is incomplete, teachers may not be able to assign grades. It is found that the digital catalogue is used as an experimental course development stage, instead of as a catalogue of predefined and finalized course definitions, resulting in frequent change requests on previously accepted definitions.
Whether or not the semester plans and semester variant plans are actually being used is unknown.
In order to get a grip on changes in DOC and enable smooth scheduling of classes, and to support the PAP creation process, the option of freezing the digital education catalogue in April is currently being discussed. Preceding this lock-down of the digital catalogue, a mechanism using red and green ‘traffic lights’ may be implemented, reminding stakeholders of the effects of changes in DOC.
Discussion on Data Quality Dimensions
The research has defined a list of data quality dimensions. Which data quality dimensions are currently of importance?
When looking from the student's point of view, Accessibility of data in Educator leaves room for improvement. Finding the right course in the catalogue proves to be a challenge at times, since naming conventions may exist yet are rarely used. This results in a plethora of course identification codes, many of which look rather identical. An action planned to improve this situation is the implementation of automatic course code generation, replacing the manual assignment of a course code with a code generated automatically from course parameters. Another action is to assign timing information to course information, indicating the semester and period in which the course is going to be scheduled.
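Automatic course code generation from course parameters, as planned here, could be sketched as follows. The format shown is illustrative, not Windesheim's actual convention:

```python
def generate_course_code(school, subject, year, sequence):
    """Derive a course code deterministically from course parameters,
    replacing manual assignment and its near-identical ad-hoc codes."""
    return f"{school.upper()}-{subject.upper()}-{year}-{sequence:02d}"

print(generate_course_code("be", "mark", 2010, 3))  # BE-MARK-2010-03
```

Because the code is a pure function of the parameters, two courses can only share a code if they share all parameters, which directly supports the Uniqueness dimension.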
Accountability is almost fully implemented. All actions modifying data sets in Educator are logged, creating an audit trail binding stakeholders to the actions performed. Unfortunately, no audit trail is created when an instance of a course is removed from activity plans and re-attached to those plans once the course has been modified. This practice is under debate, however, since it should not be required under normal circumstances and appears to create problems when entering grades.
Accuracy seems to be under control, yet in some cases student data is found to be corrupted. The source of these problems is believed to be the data migration between the old and new student information systems. However, students entering faulty data in the online admission system Studielink (www.studielink.nl) are a likely source of data corruption too. Students confronted with data quality problems may have their data corrected at the student administration: their data is corrected in the main student information system first and transferred to secondary systems later.
15-Apr-23 F. Boterenbrood Page 100
Thesis: Improving data quality in higher education
Confidentiality is under discussion. Functional support staff are able to modify student grades, and other stakeholders are able to view these data sets. This is rather likely an undesired situation.
Currency is important when entering grades. At Windesheim, it is agreed that grades become available within two weeks after an assessment. However, no instruments are currently in place to monitor this period.
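Such an instrument need not be complex. A minimal sketch is given below; the two-week norm is taken from the agreement above, while the record layout and all names are assumptions for illustration only:

```python
from datetime import date, timedelta

# Agreed norm at Windesheim: grades become available within two weeks.
GRADE_DEADLINE = timedelta(days=14)

def overdue_grades(records):
    """Return the records whose grade was entered late, or not at all.

    Each record is a (student_id, assessment_date, grade_entry_date) tuple;
    grade_entry_date is None while the grade is still missing.
    """
    today = date.today()
    late = []
    for student_id, assessed, entered in records:
        # A still-missing grade is judged against today's date.
        effective = entered if entered is not None else today
        if effective - assessed > GRADE_DEADLINE:
            late.append((student_id, assessed, entered))
    return late

records = [
    ("s1001", date(2010, 3, 1), date(2010, 3, 10)),  # within two weeks
    ("s1002", date(2010, 3, 1), date(2010, 3, 20)),  # entered too late
]
print(overdue_grades(records))
```

Run periodically, such a check would list every assessment whose grade is late or still missing.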
Integrity of data seems to be under control, although in the past entities were discovered in which required data fields were missing. Specialists from both Windesheim and the software supplier were unable to find a cause for this anomaly. The situation has been corrected, yet the cause remains unknown. A report has been defined, offering a control that monitors integrity.
Currently, the reliability of Educator is being questioned. Educator does not offer basic reports; reports are created using Business Objects. Unexpected collapses of Business Objects result in unavailable or unreliable reports. Re-instating Educator's own reporting capabilities is currently being discussed.
To solve data quality issues, an Educator database quality taskforce has been created.
6.11 Data Quality Workshop
On Tuesday 30 March 2010, a workshop establishing future data quality requirements was conducted.
In this section, the outcome of this workshop is documented.
Date and Time: Tuesday, 30-03-2010, 14:00 – 16:00
Location: IT services, Windesheim
Attendees present: G. Spoelman (Teamleader Software Development), G. Kwakkel (Software
Development), K. Haasjes (Operations), A. Polderdijk (Information
Security), G. Vissinga (Process Design), G. IJszenga (Education
Management), R. Slagter (Project Management), M. van den Berg
(Operations), A. Paans (Information Management), H. Tellegen (Operations),
A. Jaspers (Operations), F. Boterenbrood (Research).
Workshop Schedule:
14:00 Welcome, Problem Definition (A. Paans) and Workshop (F. Boterenbrood)
discussion.
14:30 Discussing Educator process and business rules (All)
15:00 Explanation on Data Quality Dimensions (F. Boterenbrood)
15:15 Selection of future Data Quality Dimensions (All)
15:45 Discussion of results (All)
16:00 Wrap-Up
Workshop Preparation
For this workshop, a large room providing both free space for workshop activities and a large
table for a ‘round table’ discussion was arranged.
For each attendee, the Educator process and a set of business rules was printed.
The Educator process was divided into four main sections, each sub process ending in a
baseline, as established during the interviews.
For each section, a paper sheet was taped to the wall, enabling workshop attendees to visibly
choose data quality dimensions suitable for the sub process discussed.
For each section, a set of A4-sized sheets was printed, each sheet defining one data quality
dimension. Every data quality dimension was given a value according to its position in the
WDQM (value = 2^(level - 1), matching the credit values 2, 4, 8 and 16 printed on the sheets).
An option was offered to assign a reduced value to a dimension, indicating that the data
quality dimension is only partly met. The value of a dimension was expressed in 'credits'.
For each section, a set of 20 green and 10 red labels was provided, limiting the number of data
quality dimensions to be selected.
Prior to workshop execution, selection of data quality dimensions was tested on colleagues
within the School of Information Sciences. Based on experiences collected from these tests, data
quality dimension definitions were improved.
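The credit weighting used on the workshop sheets can be sketched as follows. This assumes that the credit value doubles per WDQM level, which is consistent with the credit values printed on the sheets (2, 4, 8, 16), and that a partly met dimension is worth half its credits:

```python
def dimension_credits(wdqm_level, partly_met=False):
    """Credit value of a data quality dimension, derived from its WDQM level.

    Assumption: the value doubles per level (2, 4, 8, 16 for levels 2-5),
    matching the credit values on the workshop sheets; a partly met
    dimension is worth half the credits.
    """
    if not 1 <= wdqm_level <= 5:
        raise ValueError("the WDQM defines levels 1 through 5")
    credits = 2 ** (wdqm_level - 1)
    return credits // 2 if partly_met else credits

# e.g. Timeliness at level 4: 8 credits required, 4 when partly met
print(dimension_credits(4), dimension_credits(4, partly_met=True))
```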
Workshop Execution
In a round-table setting, the Educator process and business rules were discussed. This discussion
resulted in some business rules being dropped, while others were altered.
The data quality dimensions were discussed. Care was taken not to reveal the WDQM yet.
The participants were divided into four groups. For 15 minutes, each group discussed a sub
process, assigning 20 green labels to data quality dimensions, each label corresponding with one
'credit'.
After this initial round, groups switched and validated data quality dimensions assigned to a sub
process by another group. Alterations were indicated by red labels. The total number of labels
was not to exceed 20.
Finally, the results were discussed. The participants showed confidence in the results gained yet
expressed doubts regarding the way these were to be interpreted. The WDQM was discussed.
Workshop Results
The business rules were validated, in some cases altered, and agreed upon.
For sub processes, data quality dimensions were assigned:
(Sub process, DQ dimension, Required, WDQM level)
o Manage Digital Education Catalogue
Milestone: courses are published
Completeness (Must have) 3
Currency (Should have) 4
Accuracy (Should have) 3
Reliability 3
Specifications (Should have) 2
Consistency (Should have) 4
o Orientate, Select, Apply and Contract
Milestone: Student’s activity plan is agreed upon
Timeliness (Must have) 4
Reliability 3
Completeness (Should have) 3
Accuracy (Should have) 3
Accountability 3
o Schedule, Study and Assess
Milestone: grades are assigned
Accuracy (Should have) 2
Referential Integrity (Should have) 2
Completeness (Should have) 3
Currency (Must have) 4
Timeliness (Should have) 4
o Discuss Progress and Manage Study Progress
Milestone: Student receives a Certificate
Completeness (Must have) 3
Accuracy (Must have) 3
Reliability 3
Confidentiality (Must have) 3
Currency (Must have) 4
Workshop Sheets Used
Currency (Beschikbaarheid)
Currency describes how long it takes before data becomes available to the participants in a business process.
Credits: 8 (required: high) / 4 (desired: less high)
Depends on: -
Unit: time, B = delivery time - input time + age
High: measurable with a clock
Low: measurable with a calendar
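The currency metric on the sheet above (B = delivery time - input time + age) can be sketched directly; the function name and the time encoding (all arguments expressed in the same unit) are assumptions for illustration:

```python
def currency(delivery_time, input_time, age):
    """Currency of a data item: how old the data is when it reaches the user.

    delivery_time: moment the data is delivered to the process participant
    input_time:    moment the data was entered into the system
    age:           age of the data at the moment it was entered
    All arguments are in the same time unit, e.g. hours.
    """
    return (delivery_time - input_time) + age

# Data entered 2 hours after the real-world event (age = 2) and
# delivered 10 hours after entry has a currency of 12 hours.
print(currency(delivery_time=34, input_time=24, age=2))
```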
Reliability (Betrouwbaarheid)
Reliability describes the degree to which all data managed by an information system is trusted by decision makers.
Unit: binary 1 or 0; data is trusted, or it is not trusted
Trusted: 1
Not trusted: 0
Credits: 4 (data is trusted)
Depends on: Accuracy, Completeness
Consistency, reactive (Consistentie)
Consistency describes the extent to which all data elements describe / mean the same thing. Consistency can be achieved by correcting data afterwards.
Unit: ratio 0 - 1, deviations relative to the total number of elements.
C = deviating / total
High: 1
Low: 0
Credits: 2 (required: low) / 1 (desired: less low)
Depends on: -
Consistency, proactive (Consistentie)
Consistency describes the extent to which all data elements describe / mean the same thing. Consistency can be achieved by conforming systems to an enterprise architecture.
Unit: ratio 0 - 1, deviations relative to the total number of elements.
C = deviating / total
High: 1
Low: 0
Credits: 8 (required: low) / 4 (desired: less low)
Depends on: -
Accountability (Herleidbaarheid)
Accountability describes the degree to which it can be traced who is responsible for which change to the value of data.
Depends on: -
Unit: binary 1 or 0; modifications are traceable, or they are not
Traceable: 1
Not traceable: 0
Credits: 4 (modifications are traceable)
Integrity (Integriteit)
The term integrity here means that data must be of the very highest quality. Data has integrity when fewer than 3.4 errors occur per million data elements (Six Sigma).
Credits: 16 (required: 6σ) / 8 (desired: 3σ)
Depends on: various process indicators
Unit: sigma (σ)
High: 6σ, 3.4 errors per million
Low: 3σ, 67K errors per million (93% error free)
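The sigma figures on this sheet follow from the normal distribution with the conventional 1.5σ long-term shift. A sketch reproducing them, using only the Python standard library:

```python
from statistics import NormalDist

def defects_per_million(sigma_level):
    """Defects per million opportunities for a given sigma level,
    applying the conventional 1.5-sigma long-term shift."""
    p_defect = 1.0 - NormalDist().cdf(sigma_level - 1.5)
    return p_defect * 1_000_000

print(round(defects_per_million(6), 1))  # roughly 3.4 defects per million
print(round(defects_per_million(3)))     # roughly 67K defects per million
```

At 6σ this yields about 3.4 defects per million opportunities, and at 3σ about 67K, matching the values on the sheet.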
Accuracy, proactive (Nauwkeurigheid)
Accuracy describes the extent to which data is in accordance with reality. Accuracy can be achieved by screening data beforehand.
Unit: ratio 0 - 1, errors relative to the total number of elements.
N = erroneous / total
High: 1
Low: 0
Credits: 4 (required: low) / 2 (desired: less low)
Depends on: -
Accuracy, reactive (Nauwkeurigheid)
Accuracy describes the extent to which data is in accordance with reality. Accuracy can be achieved by correcting data afterwards.
Unit: ratio 0 - 1, errors relative to the total number of elements.
N = erroneous / total
High: 1
Low: 0
Credits: 2 (required: low) / 1 (desired: less low)
Depends on: -
Referential integrity (Referentiële integriteit)
Referential integrity describes the extent to which related data sets are recorded in accordance with their formal relationship. Referential integrity is guarded by database constraints.
Unit: ratio 0 - 1, errors relative to the total number of relationships.
R = erroneous / total
High: 1
Low: 0
Credits: 2 (required: low) / 1 (desired: less low)
Depends on: -
Workshop Sheets Used - continued
Specification (Specificatie)
Specification describes whether the data set and the business rules are sufficiently documented.
Unit: binary 0 - 1
Compliant: 1
Not compliant: 0
Credits: 2 (required: compliant) / 1 (desired: almost compliant)
Depends on: -
Timeliness (Tijdigheid)
Timeliness describes the degree to which data is available and suitable for use.
Credits: 8 (required: high) / 4 (desired: less high)
Depends on: Volatility, Currency
Unit: undetermined, T = Volatility * Currency
High: B << wavelength of Volatility
Low: B >= wavelength of Volatility
Accessibility (Toegankelijkheid)
Accessibility describes the degree to which access to data is obtained before the data has become irrelevant.
Credits: 8 (required: high) / 4 (desired: less high)
Depends on: Currency
Unit: ratio, T = 1 - (delivery time - input time) / (outdated time - input time)
High: 1
Low: 0
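The accessibility ratio on the sheet above can be sketched as follows; clamping the result to the 0 - 1 range for deliveries after the outdating moment is an assumption added here:

```python
def accessibility(input_time, delivery_time, outdated_time):
    """Accessibility ratio: 1 when data is delivered immediately after
    input, 0 when it is only delivered once it has become outdated.
    All arguments are in the same time unit.
    """
    ratio = 1.0 - (delivery_time - input_time) / (outdated_time - input_time)
    # Assumption: a delivery after the outdating moment simply scores 0.
    return max(0.0, min(1.0, ratio))

# Data entered at t=0, outdated at t=10, delivered at t=2: score 0.8
print(accessibility(0, 2, 10))
```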
Uniqueness (Uniciteit)
Uniqueness describes the degree to which data is unambiguously obtained, stored and presented.
Unit: ratio 0 - 1, duplicates relative to the total number of entities. U = duplicate / total
High: 1
Low: 0
Credits: 2 (required: low) / 1 (desired: less low)
Depends on: -
Confidentiality (Vertrouwelijkheid)
Confidentiality describes the degree to which all data is shielded from unauthorized use.
Unit: confidentiality rests on many measures. Classification criterion: BIV coding (the Dutch availability / integrity / confidentiality classification)
High: essential
Middle: important
Low: desirable
Credits: 8 (essential) / 4 (important) / 2 (desirable)
Depends on: Availability and Integrity
Volatility (Vluchtigheid)
Volatility describes the rate at which data in the business domain changes.
High: daily (f > 5/week)
Fair: weekly (f > 5/month)
Moderate: monthly (f > 5/semester)
Low: once per semester
Unit: frequency
Credits: 0 (required) / 0 (desired)
Depends on: -
Completeness (Volledigheid)
Completeness describes the degree to which all data required for the process has been recorded.
Unit: ratio 0 - 1, number of missing data elements relative to the total. V = missing / total
High: 1
Low: 0
Credits: 4 (required: low) / 2 (desired: less low)
Depends on: Completeness may be at odds with Timeliness
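The completeness ratio on the sheet above can be sketched over a set of records; the student record layout and field names are assumptions for illustration:

```python
def completeness(records, required_fields):
    """Fraction of required field values that are missing (None or empty),
    as on the sheet: missing / total. 0 is best, 1 is worst."""
    total = len(records) * len(required_fields)
    missing = sum(
        1 for rec in records for field in required_fields
        if rec.get(field) in (None, "")
    )
    return missing / total if total else 0.0

# Hypothetical student records: two of six required values are missing.
students = [
    {"student_id": "s1", "name": "A", "programme": "ICT"},
    {"student_id": "s2", "name": "",  "programme": None},
]
print(completeness(students, ["student_id", "name", "programme"]))
```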
6.12 Business rules according to the Windesheim Educational Standards
The Windesheim Educational Standards (Iersel, Loo, Serail, & Smulders, 2009) identify a set of
business rules in the form of high level descriptions, guiding the behavior of an organization
(Agrawal, Calo, Lee, Lobo, & Verma, 2008).
The educational model is student centered and competence based.
Students will be offered a broad set of choices.
Students will be guided in acquiring internationally accepted qualifications (CROHO26-
competences).
Students will be guided in acquiring nationally accepted generic domain competences.
Students will be coached during their study.
A school offers one or more educational programmes.
The effort an educational programme requires is measured in EC (European Credits).
A programme will be constructed using one major and two minors.
A major is a set of courses and workshops.
A major defines the mandatory part of a programme of education.
A major is 180 EC in size.
A minor is a set of courses and workshops.
A minor defines the optional part of a programme of education.
A minor is 30 EC in size.
At least one minor will result in the student having completed the first cycle (bachelor level).
A course is defined as an onderwijseenheid (OE).
The maximum size of an onderwijseenheid is 30 EC.
The minimum size of an onderwijseenheid is advised to be 3 EC.
Every onderwijseenheid will result in at least one variant (VOE).
Onderwijseenheden are clustered into a semesterplan.
Variants of an onderwijseenheid are clustered into a semestervariantplan.
Students are free to choose minors from within their programme of education, from another
programme of education, or from another institution, nationally or internationally.
Programmes may restrict the choice of minors, based on their contribution to the
competences to be acquired.
Assessments are competence based.
Competence based assessments observe students knowledge, insights, skills and attitude.
Every onderwijseenheid is concluded by an assessment.
An onderwijseenheid is either project-based or theoretical in nature.
Every programme has a propaedeutics phase.
The propaedeutics phase has a size of 60 EC.
The propaedeutics phase is concluded with a propaedeutics assessment.
A student is advised whether or not to continue his study, based on the results of the
propaedeutics assessment.
The advice is a mandatory opinion.
Windesheim supports the Associate degree (Ad).
The effort to acquire an Associate degree is at least 120 EC.
Windesheim supports the second cycle (Master degree).
Education in the second cycle has no major/minor structure.
During his study, the student will receive personal guidance.
Effort required for personal development will amount to 8 EC at least and 16 EC at most.
26 Centraal Register Opleidingen Hoger Onderwijs: Central Register of Higher Education Programmes
Personal development will be assessed.
Windesheim offers part-time studies and courses.
A part-time study does not necessarily have a major/minor structure.
6.13 Detailed Business Rules
Manage Digital Education Catalogue
1. When the development of a course is completed, it will be described in the Digital Education
Catalogue.
2. When a course is described in the Digital Education Catalogue, it will be assigned to a
semesterplan.
3. When a course is described in the Digital Education Catalogue, it will be assigned to a major
or a minor.
4. When a course is described in the Digital Education Catalogue, for each type of education
(daytime education, part-time education) a variant will be described.
5. When a variant of a course is described in the Digital Education Catalogue, it will be
assigned to a variant semesterplan.
Orientate
6. When a student starts a new semester, he will work on his Personal Activity Plan (PAP).
7. When a student works on his PAP, he may use the Digital Education Catalogue as a source to
choose from.
Select
8. When a student works on his PAP, he may choose semester variant plans from the
Educational Catalogue and add them to his PAP, thus creating an individual study
programme.
9. When a student is enlisted in a study, the mandatory major of his programme will have to be
executed first.
10. A student's personal activity plan in Educator may not be managed by the student himself;
it may actually be managed by the back-office of a School27.
Apply
11. When a student adds a semester variant plan to his PAP, including only minors offered by the
programme the student initially enlisted for, the addition is agreed upon automatically.
12. When a student adds a semester variant plan to his PAP, including minors offered by
programmes other than the one the student initially enlisted for, an examination committee
will have to agree first.
13. When a minor is either full or cancelled, the student may have to choose another semester
variant plan for his PAP.
27 As identified in the workshop of 30-03-2010, see appendix 6.11
Contract
14. When a PAP is agreed upon, and the minor(s) selected by the student is/are still available and
not booked already, the PAP is finalized.
Schedule
15. When the execution of minors is agreed upon, a schedule is created by individual or
collaborating Schools.
16. When a schedule is created, it takes into account the number of students attending a
course, the specific characteristics and educational needs of a course (type and size of
classrooms and equipment), the availability of teaching staff assigned to the course, and the
order in which courses are to be scheduled.
17. When the schedule is finalized, it is published.
Study
18. When the student is working on his study, he will create a portfolio.
19. When the student is working on his study, he may work with other students on a project.
20. When students work in projects, they will share items in their portfolio.
Assess
21. When an item in a portfolio is ready for assessment, the student will transfer ownership of
that item to the teacher.
22. When an item is assessed, a grade will be assigned to it.
23. When a grade is assigned to an item, it may no longer be changed.
24. When all assessments of a course are finalized, the end result will be calculated.
25. When an end result is calculated, rules as defined in the Digital Course Catalog for the course
at hand are executed.
26. When all results exceed the minimal requirements as defined in the Digital Course Catalog
for the course, the student is granted the European Credits (EC) associated with this course
and as defined in the Digital Course Catalog.
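Rules 24-26 can be sketched as a single check. Note that the actual end-result calculation rules live in the Digital Course Catalog and differ per course, so the averaging used below is only an assumed placeholder:

```python
def conclude_course(assessment_grades, minimum_grade, course_ec):
    """Sketch of rules 24-26: once all assessments of a course are
    finalized, calculate the end result and grant the course's EC when
    every result meets the minimum defined in the Digital Course Catalog.

    Assumption: the end result is the average of the assessment grades;
    the real calculation rules are course-specific.
    """
    if any(grade is None for grade in assessment_grades):
        raise ValueError("not all assessments are finalized yet")
    end_result = sum(assessment_grades) / len(assessment_grades)
    passed = all(grade >= minimum_grade for grade in assessment_grades)
    granted_ec = course_ec if passed else 0
    return end_result, granted_ec

print(conclude_course([7.0, 8.0], minimum_grade=5.5, course_ec=5))
print(conclude_course([7.0, 4.0], minimum_grade=5.5, course_ec=5))
```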
Discuss Progress
27. When a semester is finished, the student’s progress is discussed.
28. When a student fails to collect the required ECs during the propaedeutics phase within a
limited period, the student is not allowed to continue his study at Windesheim.
29. In some cases, when the student has collected 120 EC, an Associate degree may be assigned.
30. When the student has collected 210 EC, the (final) graduation minor may be started.
31. When the student has executed the graduation minor successfully, the first cycle is completed
and a Bachelor’s degree is assigned.
32. When a student wishes to earn a Master degree, he may engage in a study for the second
cycle.
33. When, while studying in the second cycle, the student collects a minimum of 60 EC, the
second cycle is completed and the Master degree is granted.
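The EC thresholds in rules 29-31 can be sketched as a milestone check. Whether a degree is actually granted depends on conditions not modelled here (the propaedeutics advice, successful execution of the graduation minor); the 240 EC total follows from a 180 EC major plus two 30 EC minors:

```python
def study_milestones(collected_ec):
    """Sketch of rules 29-31: milestones reachable at a given number of
    collected European Credits. Thresholds are from the business rules;
    actual granting of a degree depends on further conditions.
    """
    milestones = []
    if collected_ec >= 120:
        milestones.append("Associate degree possible")      # rule 29
    if collected_ec >= 210:
        milestones.append("graduation minor may start")     # rule 30
    if collected_ec >= 240:
        milestones.append("first cycle (Bachelor) completed")  # rule 31
    return milestones

print(study_milestones(215))
```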
Manage Study Progress
34. When a product from a student is assessed and graded, the grade is stored digitally and made
available to the student.
35. When credits are granted to a student, these credits are stored digitally and made available to
the student.
36. When a first attempt to be assessed is not successful, a second assessment will be offered
during the same study year.
37. When a course is changed between assessments, the rules and number of credits associated
with the course the student originally attended to, apply.
6.14 Project Flow
While discussing the project's progress, the project's constituent drew a map representing the
flow of the project as he visualized it. Since this is indeed an accurate description of the project,
it was agreed to document this flow in order to be able to discuss progress in the future. This
appendix contains the constituent's vision on the flow of the project.
[Figure: project flow, connecting Theory, the WDQM model, Data Quality practices, Practice, Issues, Current Solutions and Desired Data Quality Maturity]
As can be seen, a data quality maturity model is created based on theories on data quality and
maturity. This model is used to investigate issues as experienced in current practice. Data quality
practices, defined by the model, describe a desired situation in terms of solutions. This, finally,
leads to new information adding to the body of knowledge (theory).
6.15 Literature
Agrawal, D., Calo, S., Lee, K.-W., Lobo, J., & Verma, D. (2008). Policy Technologies for Self-
Managing Systems. Boston: IBM Press.
Ahern, D. M., Clouse, A., & Turner, R. (2008). CMMI® Distilled: A Practical Introduction to
Integrated Process Improvement, Third Edition. Boston: Pearson Education, Inc.
Arvix. (2009). Wacht u tot de rookmelder afgaat. Retrieved November 8, 2009, from www.arvix.com:
http://www.arvix.com/user_files/file/wacht_u_tot_de_rookmelder_af_gaat_v12_web.pdf
Baida, Z. S. (2002). Architecture Visualization, Master Thesis in Computer Science. Amsterdam: VU
University.
Bakker, J. G. (2006). De (on)betrouwbaarheid van informatie, je staande houden in het
informatiegeweld. Benelux: Pearson Education.
Batini, C., & Scannapieco, M. (1998). Data Quality, Concepts, Methodologies and Techniques. New
York: Springer Berlin Heidelberg.
Besouw, F. v. (2009). Samenhang tussen bedrijfsregels, bedrijfsprocessen en gegevenskwaliteit.
Retrieved November 8, 2009, from Arvix:
http://www.arvix.com/user_files/file/samenhang_bedrijfsregels_bedrijfsprocessen_gk.pdf
Boer, S. d., Andharia, R., Harteveld, M., Ho, L. C., Musto, P. L., & Prickel, S. (2006). Six Sigma for
IT Management. Zaltbommel: Van Haren Publishing.
Boterenbrood, F., Hoek, J. W., & Kurk, J. (2005). De Informatievoorzieningsarchitectuur als
scharnier. Den Haag: Academic Service.
Broers, H. (2007). Onrust in de wijngaard, de wording van Windesheim. Zwolle: Waanders.
Caballero, I., & Piattini, M. (2003). CALDEA: A Data Quality Model Based on Maturity Levels.
Proceedings of the Third International Conference On Quality Software (pp. 380-387).
Washington: IEEE Computer Society.
Caluwé, L. d., & Vermaak, H. (2006). Leren veranderen, Een handboek voor de veranderkundige.
Deventer: Kluwer.
Champlin, B. (2002, 01 14). Beyond The CMM: Why Implementing the SEI's Capability Maturity
Model Is Insufficient To Deliver Quality Information Systems in Real-World Corporate IT
Organizations. Retrieved 02 07, 2010, from DAMA Michigan: www.dama-michican.org
Chen, P. (1976). The Entity Relationship Model: Toward a Unified View on Data. ACM Transactions
on database systems , 166 - 193.
Conway, S. D., & Conway, M. E. (2008). Essentials of Enterprise Compliance. Hoboken, New
Jersey: John Wiley & Sons.
Curtis, B., Hefley, W. E., & Miller, S. A. (2009). People CMM: A Framework for Human Capital
Management, Second Edition. Boston, MA: Pearson Education, Inc.
Data Quality Task Force. (2004, 12). Forum Guide to Building a Culture of Quality Data. Retrieved
11 30, 2009, from ies national center for educational statistics:
http://nces.ed.gov/forum/pub_2005801.asp
Davis, J. (2009). Open Source SOA. Greenwich: Manning Publications.
English, L. P. (2009). Information Quality Applied: Best Practices for Improving Business
Information, Processes and Systems. Indianapolis: John Wiley & Sons.
European Commission. (2005). The Framework of Qualifications for the European Higher Education
Area. Retrieved 03 10, 2010, from The official Bologna Process Website:
http://www.ond.vlaanderen.be/hogeronderwijs/bologna
Fishman, N. A. (2009). Viral Data in SOA: An Enterprise Pandemic. Boston: Pearson plc publishing
as IBM Press.
Friedman, T. (2009, 09 09). Gartner Webinar: Data Quality Do’s and Don'ts. Retrieved 02 10, 2010,
from Gartner: www.gartner.com
Gack, G. A. (2009). Connecting Six Sigma to CMMI Measurement and Analysis. Retrieved 12 9,
2009, from i Six Sigma: http://software.isixsigma.com/library/content/c050316b.asp
Gartner. (2007, 02 07). Gartner's Data Quality Maturity Model. Retrieved 02 10, 2010, from Gartner
Research: http://my.gartner.com
Goodhue, D. L., Wybo, M. D., & Kirsch, L. J. (sept 1992). The Impact of Data Integration on the
Costs and Benefits of Information Systems. MIS Quarterly, Vol. 16, No. 3 , 293-311.
Graham, I. (2007). Business Rules Management and Service Oriented Architecture. Hoboken: John
Wiley & Sons.
HBO-raad Lectorenplatform. (2006). Lectoren bij hogescholen. Diemen: Villa Grafica.
Hendriks, P. (2000). De noodzaak van een nieuwe norm voor procesverbetering? Wat behelst ISO
15504 - SPICE? Retrieved 12 9, 2009, from Esprit project no 27700:
http://www.serc.nl/espinode/informatie/SPICE.htm
Hoermann, K., Mueller, M., Dittmann, L., & Zimmer, J. (2008). Automotive SPICE in Practice:
Surviving Interpretation and Assessment. Santa Barbara: Rocky Nook.
Hohpe, G., & Woolf, B. (2008). Enterprise Integration Patterns. Boston: Pearson Education, Inc.
Iersel, J. v., Loo, F. v., Serail, I., & Smulders, L. (2009). Windesheim Onderwijs Standaard versie 5.0.
Zwolle: Windesheim.
Jansen, J. (2006). Domeinarchitectuur vraaggestuurd onderwijs Windesheim. Zwolle: Windesheim.
Johnson, E., & Jones, J. (2008). A Developer’s Guide to Data Modeling for SQL Server: Covering
SQL Server 2005 and 2008. Boston: Pearson Education, Inc.
Kneuper, R. (2008). CMMI: Capability Maturity Model Integration A Process Improvement
Approach. Santa Barbara, CA: Rocky Nook.
Kovac, R., Lee, Y. W., & Pipino, L. L. (1997, 10). Total Data Quality Management: The Case of IRI.
Retrieved 02 24, 2010, from The MIT Total Data Quality Management Program:
http://web.mit.edu
Lankhorst, M. (2005). Enterprise Architecture At Work. Berlin: Springer-Verlag Berlin and
Heidelberg GmbH & Co. KG .
Lee, Y. W., Pipino, L. L., Funk, J. D., & Wang, R. Y. (2006). Journey to Data Quality. Cambridge,
Massachusetts: The MIT Press.
Loshin, D. (2001). Enterprise Knowledge Management, the data quality approach. San Diego:
Academic Press.
Loshin, D. (2008). Master Data Management. Burlington: Morgan Kaufmann OMG Press.
Marble, R. P. (1992). A stage theoretic approach of information system planning in existing entities of
recently established market economies. Retrieved 11 11, 2009, from System Dynamics
Society: http://www.systemdynamics.org/conferences/1992/proceed/pdfs/marbl405.pdf
McGilvray, D. (2008). Executing Data Quality Projects. Burlington, MA: Elsevier, Inc.
Mosley, M. (2008). DM BOK: Data Management Body of Knowledge. Retrieved 11 07, 2009, from
Data Management International: www.dama.org
Nolan, R. (march-april 1979). Managing the crisis in data processing. Harvard Business Review no
79206 .
Object Management Group. (2008, 06 01). Business Process Maturity Model (BPMM). Retrieved 12
9, 2009, from Object Management Group: http://www.omg.org/spec/BPMM/
Olle, T. W. (1978). The Codasyl Approach to Data Base Management. New York: John Wiley &
Sons.
Pant, K., & Juric, M. (2008). Business Process Driven SOA using BPMN and BPEL: From Business
Process Modeling to Orchestration and Service Oriented Architecture. Birmingham: Packt
Publishing.
Pascale, R., Peters, T., & Waterman, R. (2009). McKinsey's 7-s framework model. Retrieved 12 9,
2009, from Value Based Management.net:
http://www.valuebasedmanagement.net/methods_7s.html
Porter, M., & Millar, V. (1985, July-August). How information gives you competitive advantage.
Harvard Business Review.
Project Management Institute. (2008). Organizational Project Management Maturity Model OPM3.
Newtown Square, Pennsylvania: Project Management Institute.
Riet, P. v. (2009, 10). Knelpunten in de plannings- en roosteringsprocessen van de hogescholen.
Retrieved 02 18, 2010, from Lectoraat ICT en Onderwijsinnovatie: www.licto.nl
Ryu, K.-S., Park, J.-S., & Park, J.-H. (2006). A Data Quality Management Maturity Model. ETRI
Journal vol.28, no.2, Apr. 2006 , 191 - 204.
Schumacher, M., Fernandez-Buglioni, E., Hybertson, D., Buschmann, F., & Sommerlad, P. (2006).
Security Patterns, Integrating Security and System Engineering. Chichester: John Wiley & Sons
Ltd.
Software Engineering Institute. (2009). Capability Maturity Model Integration Overview. Retrieved
12 9, 2009, from Software Engineering Institute / Carnegie Mellon:
http://www.sei.cmu.edu/cmmi/
Starreveld, R., Leeuwen, O. v., & Nimwegen, H. v. (2004). Bestuurlijke informatieverzorging deel 2a
- Fasen van de waardekringloop. Leiden: Stenfert Kroese.
Tan, D. (2003). Van Informatie management naar Informatie Infrastructuur management. Leiderdorp:
Lansa Publishing.
Treacy, M., & Wiersema, F. (1997). The Discipline of Market Leaders: Choose Your Customers,
Narrow Your Focus, Dominate Your Market. New York: Perseus Books.
Vermeer, B. H. (2001). Data Quality and Data Alignment in E-business. Eindhoven: CIP-Data
Library Technische Universiteit Eindhoven.
Verreck, O., Graaf, A. d., & Sanden, W. v. (2005, augustus). Meten en verbeteren van
gegevenskwaliteit. Tiem - 9 , pp. 36 - 42.
Vught, F. A., & Huisman, J. (2009). Mapping the Higher Education Landscape. Dordrecht: Springer
Science + Business Media B.V.
Windesheim. (2004). IT Architectuur 2004 ICT v3.doc. Zwolle: Windesheim dienst ICT.
Zee, P. d. (2001). Business Transformatie en IT, Vervlechting en ontvlechting van ondernemingen en
informatietechnologie. Retrieved 11 11, 2009, from Management en Consulting:
http://managementconsult.profpages.nl/man_bib/ora/vanderzee01.pdf
Zeist, B. v., Hendriks, P., Paulussen, R., & Trieneken, J. (1996). Kwaliteit van Softwareprodukten -
Ervaringen met een kwaliteitsmodel. Deventer: Kluwer Bedrijfswetenschappen.
6.16 List of figures and tables
Figure 01: Windesheim Context Diagram...............................................................................................4
Figure 02: The Windesheim application landscape...............................................................................11
Figure 03: IT service department and system integration organization.................................................11
Figure 04: Nolan’s stage model.............................................................................................................15
Figure 05: Eras and discontinuities (Zee, 2001)......................................................................................16
Figure 06: Project stakeholders..............................................................................................................20
Figure 07: Research Model....................................................................................................................25
Figure 08: Concepts Used......................................................................................................................28
Figure 09: Graphical representation of the WDQM.................................................................................40
Figure 10: A Data Quality Management Maturity Model (Ryu, Park, & Park, 2006)...........................41
Figure 11: Related Dimensions..............................................................................................................53
Figure 12: Domain architecture student centered education Windesheim.............................................55
Table 01: Stakeholder analysis...............................................................................................................20
Table 02: Research Material..................................................................................................................29
Table 03: Practices and structure, process, technology, information and staff......................................36
Table 04: A combined view on maturity................................................................................................37
Table 05: Windesheim Data Quality Maturity model WDQM..............................................................38
Table 06: A combined view on the WDQM and the Gartner Data Quality Maturity model.................43
Table 07: An overview of data quality dimensions................................................................................45
Table 08: Dimensions of data quality....................................................................................................46
Table 09: WDQM Goals expressed in Data Quality Dimensions, Practices and Attributes..................52
Table 10: Current data quality dimension values...................................................................................60
Table 11: Data quality dimension assessment workshop results...........................................................61
6.17 Glossary
Accessibility Ease of attainability of the data
Accountability The property that actions affecting enterprise assets can be traced to the actor responsible for the action
Accuracy Closeness of the value of the data to the value in the real world
Business Rules A set of high level descriptions, guiding the behavior of an organization
Business rule matching Comparing data values found in a database with valid values according to business rules
Canonical Data Model A thesaurus of all data being exchanged between systems
Completeness The degree to which elements are not missing from a set
Confidentiality The property that data is disclosed only as intended by the enterprise
Consistency The degree to which values and formats of data elements are in line with semantic rules over this set of data items
Correctness The degree to which values and formats of data elements are in line with the current state of the object in the physical world represented by the data
COTS Commercial Off The Shelf. An acronym for packaged, ready-to-use applications
CRUD services Create, Retrieve, Update and Delete data manipulation services
Currency Concerns how promptly data are updated
Data profiling A set of algorithms for statistical analysis and assessment of the quality of data values within a data set, as well as for exploring relationships that exist between value collections within and across data sets
Discontinuity A change in values perceived to be a setback
DMAIC Quality improvement cycle including Define, Measure, Analyze, Improve and Control phases
Endless loop See loop, endless
Information Data, fit for use, available in a context
Input check A control guarding data quality when entered
IP Information Product
Integrity, Data The degree to which data is fit for use
Integrity, Referential The degree to which related sets of data are consistent
Latency Idle time in a process
Loop, endless See endless loop
MDM Master Data Management maturity model
New data acquisition An activity in which suspect data is replaced by newly retrieved data
Overloading Assigning values to a variable, indicating a system state the variable was not originally intended to signal
OPM3 Organizational Project Management Maturity Model
Process A set of business rules which, started by a single trigger, results in a predictable outcome when executed
Process area A cluster of related practices, part of a maturity level
Reliability The degree to which data is perceived to represent reality
Root cause analysis A technique to identify the underlying root cause, the primary source resulting in the problems experienced
ROTAP Research, Ontwikkel (Develop), Test, Accept and Production environments
Schema cleaning Transforming a conceptual schema in order to achieve or optimize a given set of qualities
Schema matching To create a mapping between semantically correspondent elements of two database schemas
Service center Department within an organization supporting the main business processes
SOA Service Oriented Architecture
Source Rating Assessing sources on the basis of the quality of the data they provide to other sources
Specifications A measure of the existence, completeness, quality and documentation of data standards
Staff Personnel involved in a process
Structure Describes the way an organization is structured
Technology Tooling required to execute a process
TDQM Total Data Quality Methodology
Timeliness Or Availability: A measure of the degree to which data are current and available for use
TIQM Total Information Quality Methodology
Uniqueness Refers to requirements that entities are captured, represented, and referenced uniquely
Volatility Characterizes the frequency with which data vary in time
WDQM Windesheim Data Quality Maturity Model
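Several of the controls defined above, notably input checks, business rule matching and referential integrity, lend themselves to automation. The following is a minimal sketch in Python, using hypothetical student records, a hypothetical set of valid programmes and two illustrative rules; it is not an implementation from the thesis, only an illustration of the glossary terms.

```python
# Hypothetical student records and enrolments; field names are assumptions.
students = [
    {"id": 1, "name": "Anna", "programme": "ICT"},
    {"id": 2, "name": "Bob", "programme": "Nursing"},
    {"id": 3, "name": "", "programme": "Unknown"},
]
enrolments = [{"student_id": 1}, {"student_id": 2}, {"student_id": 9}]

# A business rule expressed as a set of valid values (illustrative only).
VALID_PROGRAMMES = {"ICT", "Nursing", "Journalism"}

def business_rule_violations(records):
    """Business rule matching: compare stored values with valid values."""
    errors = []
    for r in records:
        if not r["name"]:
            errors.append((r["id"], "name missing"))       # completeness
        if r["programme"] not in VALID_PROGRAMMES:
            errors.append((r["id"], "invalid programme"))  # consistency
    return errors

def referential_violations(children, parents):
    """Referential integrity: each enrolment must reference a known student."""
    parent_ids = {p["id"] for p in parents}
    return [c["student_id"] for c in children
            if c["student_id"] not in parent_ids]

print(business_rule_violations(students))   # student 3 violates two rules
print(referential_violations(enrolments, students))  # enrolment 9 is orphaned
```

The same checks, applied at data entry rather than retrospectively, constitute the input checks whose absence was identified earlier as a cause of poor data quality.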