Zhongwen Youxi He
A dissertation submitted to The University of Manchester for the degree
of Master of Science in the Faculty of Engineering and Physical Sciences
2010
By
Melville McDonald
School of Computer Science
Table of Contents
Table of Contents
Table of Figures
List of Abbreviations
Abstract
Declaration
Copyright
Acknowledgements
Chapter 1: Introduction
1.2 Dissertation Overview
Chapter 2: Background
2.1 The Chinese System
2.1.1 Overview
2.1.2 Strokes
2.1.3 Radicals
2.1.4 Components
2.1.5 Conclusion
2.2 The Computerised System
2.2.1 Database Technology
2.2.2 The Editor
2.2.3 Supplementary tools
2.2.4 Reflection
Chapter Summary
Chapter 3: Research Methods and Design Considerations
3.1 Project Overview
3.2 Project Objectives
3.3 Project Plan
Chapter Summary
Chapter 4: Design
4.1 Database
4.1.1 Division of Work
4.1.2 Entity Relationship Model
4.2 The Editor
4.2.1 The Model
4.2.2 The Controller
4.2.3 The View
4.2.4 The AboutEditor and TextParser
4.2.4 The SVG Panel
Chapter Summary
Chapter 5: Implementation and Testing
5.1 The Tools
5.2 The Editor
5.2.1 Radicals Table Test
5.2.2 Repository Manager Test
5.2.3 View Controller Test
5.3 The Visual Elements
5.3.1 Adding a New Record
5.3.2 Updating a record
5.3.3 Removing a Record
5.3.4 The SVG Panel
5.3.5 The AboutEditor and TextParser
5.3.6 Discovered Issues
5.4 The Database
一 yī: One
丿 piě: Hook or left-falling stroke
八 bā: Eight
冫 bīng: Ice
卜 bǔ: To Divine
勹 bāo: Wrap
几 jī: Table
门 mén: Door
人 rén: Human being
工 gōng: Work
Chapter Summary
Chapter 6: Evaluation
6.1 The Editor
6.2 The Database
Chapter Summary
Chapter 7: Conclusions and Future Work
Chapter Summary
References
Appendix
TestBase
RadicalTableTest
RepositoryManagerTest
ViewControllerTest
Word count: 13,377
Table of Figures
Figure 1: English Alphabet and Chinese character nǐ (you)
Figure 2: Stroke order [3]
Figure 3: Top to bottom
Figure 4: Left to right
Figure 5: Horizontal strokes before vertical strokes
Figure 6: Centre strokes before left and right strokes
Figure 7: Outside then inside
Figure 8: Fill up before closing
Figure 9: The Eight Trigrams [5]
Figure 10: Example of component breakdown [3]
Figure 11: Example of an entity relationship for Chinese characters and components (aggregation)
Figure 12: Database entity relationship diagram
Figure 13: Example of the TICCC
Figure 14: Example of the TICCC and database fields
Figure 15: Initial Model View Controller design pattern
Figure 16: RadicalTable class
Figure 17: RepositoryManager class
Figure 18: ViewController class
Figure 19: Editor Framework
Figure 20: MainView and TabView class
Figure 21: RadicalView input form
Figure 22: RadicalViewer class diagram
Figure 23: The AboutEditor
Figure 24: AboutEditor class diagram
Figure 25: Basic flow of control for AboutEditor
Figure 26: The TextParser class diagram
Figure 27: Radical Table Test passed
Figure 28: Repository Manager Test passed
Figure 29: View Controller Test passed
Figure 30: Main View
Figure 31: RadicalViewer input screen
Figure 32: Adding a new record
Figure 33: Save confirmation
Figure 34: Save confirmation code
Figure 35: New record added
Figure 36: No radical to update
Figure 37: Remove from database confirmation
Figure 38: Setup of the SVG Canvas
Figure 39: The SVG Panel and SVG Field
Figure 40: AboutEditor rawEditor pane input
Figure 41: The parsed text
List of Abbreviations
TICCC – Table of Indexing Chinese Character Components [12]
DBMS – Database Management System
JDBC – Java Database Connectivity
JPA – Java Persistence API
SVG – Scalable Vector Graphics
XML – Extensible Markup Language
HTML – HyperText Markup Language
SQL – Structured Query Language
GB – Gigabyte
TB – Terabyte
API – Application Programming Interface
SAX – Simple API for XML
VB.NET – Visual Basic .NET
URL – Uniform Resource Locator
MVC – Model-View-Controller architecture
Abstract
As an initial stage of the Zhongwen Youxi He Project, this part of the project investigates the foundations of building a tool that allows a native English speaker to learn about Chinese characters. The tool will be composed of a database storing the characters together with their main components, phonetics and radicals, much like the official Chinese indexing and classification system found in the literature, the Chinese Dictionary and the “Table of Indexing Chinese Character Components” (TICCC). The project will also investigate an architecture for storing and indexing the radicals, and ways to infer semantic links between the radicals and the characters and components.
Declaration
No portion of the work referred to in the dissertation has been
submitted in support of an application for another degree or
qualification of this or any other university or other institute of
learning.
Copyright
i. Copyright in text of this dissertation rests with the Author. Copies (by
any process) either in full, or of extracts, may be made only in
accordance with instructions given by the Author. Details may be
obtained from the appropriate Graduate Office. This page must form
part of any such copies made. Further copies (by any process) of copies
made in accordance with such instructions may not be made without the
permission (in writing) of the Author.
ii. The ownership of any intellectual property rights which may be
described in this thesis is vested in the University of Manchester, subject
to any prior agreement to the contrary, and may not be made available
for use by third parties without the written permission of the University,
which will prescribe the terms and conditions of any such agreement.
iii. Further information on the conditions under which disclosures and
exploitation may take place is available from the Head of the School of
Computer Science.
Acknowledgements
I would like to thank my supervisor Dr Richard Banach who
masterminded this project. His guidance helped me to understand the
scope of this project and its potential benefits, but also helped me to
overcome a number of difficulties throughout the course of the project.
I would also like to thank my family; my mother, father and brothers
who have supported and encouraged me in my pursuit of further
education. None of this would have been possible without their help.
Chapter 1: Introduction
When a person attempts to learn another language, they must overcome the challenge of learning new words, new grammar systems and sometimes completely different writing systems. The Chinese and English writing systems differ drastically, from the direction of reading to the composition of words. It is a substantial task for an English speaker to learn this new system, as it originated from a completely different cultural perspective. It is this perspective, together with a person's native language and educational system, that influences their learning processes.
The English and Chinese writing systems are fundamentally different: English employs an alphabet, whereas Chinese uses a logographic system. In an alphabetic system, letters represent phonemes (sounds) and usually have no meaning individually. In English, an alphabet of 26 meaningless letters can be combined in various legal permutations to create multisyllabic units of sound and meaning (words). A reader can decompose such words to identify their letters, or even spell them according to how they sound. The Chinese system, by contrast, is composed of core components which create monosyllabic words. These words can also be combined to form new words, though understanding them requires some knowledge of shared basic concepts. Some Chinese characters are pictograms that can be identified by their similarity to real-world objects, but others are not.
Figure 1 shows the entire English alphabet of 26 letters. The letters can
be memorised so that the speaker can combine them to form the word
“you”, whereas the Chinese character nǐ is a logogram which requires the
user to know the character to recognise its meaning.
The Chinese language also has the concept of tones. In English, tone is used to convey an emotional context, such as the rising inflection of a question. Compare “You have one” with “You have one?”: the first is a statement, possibly a reply, usually given with a flat tone, whereas the intonation of the second, with the pitch rising near the end, suggests that it is a question. (Incidentally, the two together could form a very short conversation.) In Chinese, a word's meaning depends not only on the phonemes which give the word its sound or pronunciation, but also on the tone associated with it. As a result, there are formally documented tones and tonal signs in the written language.
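The radical entries listed in this dissertation carry tone-marked pinyin (for example 一 yī and 丿 piě), so it may help to see how tone numbers map onto tonal signs. The Java class below is a minimal, hypothetical sketch: the name PinyinTones, the numbered input format such as "ni3", and the use of "v" for ü are assumptions, not part of the project. It applies the conventional placement rule: mark "a" or "e" if present, the "o" of "ou", otherwise the last vowel.

```java
import java.util.Map;

/**
 * Hypothetical helper (not part of the project): converts numbered pinyin
 * such as "ni3" into tone-marked pinyin such as "nǐ".
 * Assumes lower-case input and "v" as a stand-in for "ü".
 */
final class PinyinTones {
    // Tone-marked forms of each vowel for tones 1 to 4 (index 0 = tone 1).
    private static final Map<Character, String> MARKS = Map.of(
            'a', "āáǎà",
            'e', "ēéěè",
            'i', "īíǐì",
            'o', "ōóǒò",
            'u', "ūúǔù",
            'ü', "ǖǘǚǜ");

    static String mark(String syllable) {
        if (syllable.isEmpty()) return syllable;
        char last = syllable.charAt(syllable.length() - 1);
        String letters = (Character.isDigit(last)
                ? syllable.substring(0, syllable.length() - 1)
                : syllable).replace('v', 'ü');
        if (!Character.isDigit(last)) return letters;
        int tone = last - '0';
        if (tone < 1 || tone > 4) return letters; // 5 or 0: neutral tone, no mark
        int pos = markPosition(letters);
        if (pos < 0) return letters;
        char marked = MARKS.get(letters.charAt(pos)).charAt(tone - 1);
        return letters.substring(0, pos) + marked + letters.substring(pos + 1);
    }

    // Conventional rule: "a" or "e" if present, the "o" of "ou", otherwise the last vowel.
    private static int markPosition(String s) {
        if (s.indexOf('a') >= 0) return s.indexOf('a');
        if (s.indexOf('e') >= 0) return s.indexOf('e');
        if (s.contains("ou")) return s.indexOf("ou");
        for (int i = s.length() - 1; i >= 0; i--) {
            if (MARKS.containsKey(s.charAt(i))) return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"yi1", "pie3", "bao1", "gong1", "ni3", "lv4"}) {
            System.out.println(s + " -> " + mark(s));
        }
    }
}
```

Storing the tone as a number and rendering the diacritic only for display would keep the data easy to search and sort; whether the project's database should do this is left open here.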
These differences also extend to the indexing systems of the two languages: in many dictionaries Chinese is indexed by the radical system, whereas English is indexed alphabetically by initial letter. The scarcity of reliable information makes it more difficult for a native English speaker to learn about the Chinese system. Recently, however, the Chinese Government published an official Table of Indexing Chinese Character Components (TICCC). The main aim of this stage of the project is to create the foundations of an indexing system for Chinese characters which can be developed further into a tool to help an English speaker learn about Chinese characters. This system will be built using the officially published character data. In addition, the project aims to investigate ways to allow semantic links to be inferred between the components and the radicals.
Figure 1: English Alphabet and Chinese character nǐ (you)
1.2 Dissertation Overview
The dissertation structure will be roughly analogous to the chronological
development of the project:
Chapter 2 will introduce the background to the project. It will
discuss the Chinese writing system and the history and development
of the Chinese character radicals, together with research into
technologies which may be suitable for the current project.
Chapter 3 will discuss other project considerations with regard to
the main project objectives, including my contribution and the
basic project plan.
Chapter 4 will describe my contribution to the design process for
the database and the editor.
Chapter 5 outlines the implementation and testing of the project.
Chapter 6 evaluates the success of the project and suggests
possible improvements.
Chapter 7 concludes the project and suggests possibilities for the
future of the Zhongwen Youxi He project.
Chapter 2: Background
This chapter will explore the fundamentals of the Chinese writing
system as well as the history and development of the Chinese character
radical indexing system. This chapter will also outline some of the
research conducted with regard to tools which could be used for the
development of this project.
2.1 The Chinese System
2.1.1 Overview
The modern Chinese writing system uses Han characters (Hànzì). This logographic system contains over 50,000 characters. The characters evolved from pictograms and hieroglyphs into the more abstract ideographs and the characters known today; over the same period, phonetic elements began to be included in the character structure [3]. The modern characters are a mixture of new and old characters, composed mostly of pictograms, ideograms, phonetic loans and phonetic-semantic compounds; there are also some characters whose meanings arise from regional differences. They can be divided into two very broad categories: simple and compound. Simple characters account for about 4% of the characters and, unlike compound characters, cannot be divided [3].
In 1956 the government of the People's Republic of China introduced the first draft of simplified characters. This process simplified the Traditional Chinese characters in an effort to increase national literacy. According to the government, a literate person should know 2,000 characters. Reproducing the characters is nevertheless a difficult task, as even the simplified characters contain diverse shapes. It has been claimed that at least five years of formal schooling are needed to achieve literacy [3], though in reality at least six years are needed. Most native English speakers would quite possibly not want formal schooling, or to wait this long before feeling confident enough with the language to begin using it.
2.1.2 Strokes
The main components of the hànzì characters are the strokes, radicals and components. The smallest structural unit is the stroke (loosely analogous to a letter in the English alphabetic system), which represents a single movement of the brush or pen on the page. This system of strokes is, however, more formalised: it is a specific set of techniques used recurrently to create all Chinese characters [3]. The strokes can be divided into eight main categories: horizontal (一), vertical (丨), left-downward (丿), right-downward (㇏), dot (丶), hook (亅), turning (乛, 乚, 乙) and rising (㇀). There are also 30 supplementary strokes [3], but these are only variants of the main categories.
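The eight stroke categories lend themselves to a simple enumeration, from which a character's stroke count (the quantity radical dictionaries use to order characters) follows directly. The sketch below is hypothetical: the class StrokeDemo, the Stroke enum and the small hand-entered table of stroke sequences are assumptions for illustration, not the project's data model.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: stroke categories and stroke sequences for a few characters. */
final class StrokeDemo {
    /** The eight main stroke categories from the text, each with a representative glyph. */
    enum Stroke {
        HORIZONTAL("一"), VERTICAL("丨"), LEFT_FALLING("丿"), RIGHT_FALLING("㇏"),
        DOT("丶"), HOOK("亅"), TURNING("乛"), RISING("㇀");

        final String glyph;

        Stroke(String glyph) {
            this.glyph = glyph;
        }
    }

    // Hypothetical sample table: stroke sequences in standard writing order.
    static final Map<String, List<Stroke>> SEQUENCES = Map.of(
            "十", List.of(Stroke.HORIZONTAL, Stroke.VERTICAL),
            "人", List.of(Stroke.LEFT_FALLING, Stroke.RIGHT_FALLING),
            "大", List.of(Stroke.HORIZONTAL, Stroke.LEFT_FALLING, Stroke.RIGHT_FALLING),
            "木", List.of(Stroke.HORIZONTAL, Stroke.VERTICAL,
                    Stroke.LEFT_FALLING, Stroke.RIGHT_FALLING));

    /** Stroke count of a character in the sample table. */
    static int strokeCount(String character) {
        return SEQUENCES.get(character).size();
    }

    /** Tallies how many strokes of each category a character uses. */
    static Map<Stroke, Integer> categoryCounts(String character) {
        Map<Stroke, Integer> counts = new EnumMap<>(Stroke.class);
        for (Stroke s : SEQUENCES.get(character)) {
            counts.merge(s, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        for (Stroke s : SEQUENCES.get("木")) System.out.print(s.glyph + " ");
        System.out.println("= " + strokeCount("木") + " strokes " + categoryCounts("木"));
    }
}
```

Recording the ordered sequence rather than just a count keeps stroke-order information (as in the figures below) available to a future learning tool.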
Figure 2: Stroke order [3]
There is an order to the strokes for writing Chinese characters [2]:
Figure 3: Top to bottom
Figure 4: Left to right
Figure 5: Horizontal strokes before vertical strokes
Figure 6: Centre strokes before left and right strokes
Figure 7: Outside then inside
Figure 8: Fill up before closing
It is however the radicals and components that form the logical units of
the hànzì system.
2.1.3 Radicals
Radicals are the smallest meaningful unit in the hànzì writing system.
They are used both as independent simple characters and as part of
more complex characters. In modern Chinese dictionaries radicals are
used as section headers (bùshǒu) with characters indexed according to
the radical they most closely match or contain. One of the biggest issues regarding radicals is that there is no formal, exact way to describe what they are, since they can be used in so many ways. As a result, there is some disagreement as to their exact role and how they should be used.
Many Chinese dictionaries use radicals as section headers. This system is said to have been introduced with the Shuōwén Jiězì by Xǔ Shèn around the 2nd century AD, during the Han Dynasty. Xǔ distinguished between two types of characters: wén 文 (pictograms) and zì 字 (characters). The dictionary contained 9352 characters “as distinct entries and 1163 in variant form” [4], organised by their visual components under 540 headings [4]. Though radical in thought at the time, this method has since become the most widely used system of organisation, as it made the process of locating characters more methodical and convenient [4].
Prior to this, dictionaries were organised differently. The earliest known major character dictionary is the Erya, which is argued to have been created between the 8th and 2nd centuries BC [4]. It was not created as a dictionary but rather as an encyclopaedia and literary reference; however, “It was the first work to collect arrange and define words in a systematic fashion” [4], spanning 20 chapters with over 2000 entries organised into categories such as common terms and kinship terms [4]. This collection was, however, seen as too difficult to read, and scholars felt that the book was less a reference for consultation and more a study text. It was this that led to the creation of the Shuōwén Jiězì.
The idea of organising characters into categories may, however, be argued to have originated even earlier. Chinese lore claims that Chinese characters were created by a Great Emperor who drew the idea from observing nature and how each object seemed to fit into a category. He noted that marks left by animals, e.g. claw prints, could be used as a tool of lasting communication, and decided to create the characters to reflect this. The characters were then placed into eight categories called the Eight Trigrams. Though these categories were broad, they had some practicality, as any text indexed by this system would have been searchable to some degree.
Figure 9: The Eight Trigrams [5]
The most prominent use of the radical system, however, was the Kāngxī Dictionary (Kāngxī zìdiǎn) of the 18th century AD. Containing nearly 50,000 characters, considerably more than the Shuōwén Jiězì, the dictionary was able to reduce the number of radicals to 214. The dictionary indexes characters by radical as well as by number of strokes [6] and contains information about variant forms and pronunciation. Although there have been some obvious changes over the centuries, these 214 radicals are still the basis for all modern radical dictionaries.
Today there are numerous radical indexing systems in use, with different numbers of radicals, sometimes with secondary radicals indexed by stroke count. With more than 80,000 characters in the Chinese language there are many variations [6], such as the TICCC, which lists 201 main radicals.
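The radical-then-stroke-count ordering described above can be sketched as a two-level index. In the minimal Java sketch below, the Entry record and RadicalIndex class are hypothetical; radical numbers follow the Kāngxī 214-radical numbering (人 is radical 9, 木 is radical 75), and residualStrokes counts the strokes outside the radical. A TICCC-based index would substitute its own list of 201 radicals.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of a radical index: sections by radical, then by residual strokes. */
final class RadicalIndex {
    /** One dictionary entry: character, radical number, radical glyph, strokes outside the radical. */
    record Entry(String character, int radicalNumber, String radical, int residualStrokes) {}

    /** Groups entries under their radical (in radical-number order), then sorts each section. */
    static Map<Integer, List<Entry>> build(List<Entry> entries) {
        Map<Integer, List<Entry>> index = new TreeMap<>();
        for (Entry e : entries) {
            index.computeIfAbsent(e.radicalNumber(), k -> new ArrayList<>()).add(e);
        }
        for (List<Entry> section : index.values()) {
            section.sort(Comparator.comparingInt(Entry::residualStrokes));
        }
        return index;
    }

    public static void main(String[] args) {
        List<Entry> sample = List.of(
                new Entry("你", 9, "人", 5),
                new Entry("林", 75, "木", 4),
                new Entry("他", 9, "人", 3),
                new Entry("村", 75, "木", 3),
                new Entry("休", 9, "人", 4));
        build(sample).forEach((number, section) -> {
            System.out.print("Radical " + number + " " + section.get(0).radical() + ":");
            section.forEach(e -> System.out.print(" " + e.character() + "(+" + e.residualStrokes() + ")"));
            System.out.println();
        });
    }
}
```

Running main lists the 人 section as 他, 休, 你 (3, 4 and 5 residual strokes) before the 木 section as 村, 林. The TreeMap keeps sections in radical-number order, mirroring the layout of a printed radical dictionary.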
There is some debate as to the extent of the connection a radical has to the characters under it, and whether there is a semantic relationship with all of the characters under a given radical. One argument is that the term radical implies semantic links, since it derives from the Latin for “root” [7], and words in languages with Latin-derived vocabulary, such as English, can be broken up into “root and termination” [1]. Although this does not translate exactly into the Chinese system, the radical should still be considered “the meaning part” [1].
“采 cǎi ‘to pick, pluck’ is an associative compound comprising two
elements or components, a hand 爫 (zhǎo or zhuǎ) picking items from a
tree 木 (mù); that is, it is originally a two-part graph” [7].
However, the phonetic elements of characters also need to be taken into consideration, and there is disagreement over how to categorise radicals and phonetics in semantic-phonetic compounds, which have become increasingly common.
2.1.4 Components
Components seem to have arisen out of a need to “reconstruct characters into more manoeuvrable units” [3] for the modern age of computing. Characters are divided into logical units based on their shape and make-up. These components are based purely on graphic qualities, since in a computer system the semantics or stroke composition would be of little consequence. The Information Processing Standard Components for GB 13000.1 Character Set has 560 basic components [3], and The Specification of Common Modern Chinese Character Components and Component Names lists 514. There are some issues regarding how the characters are represented and whether all the necessary characters have been included; however, this standard will probably be reviewed or replaced as the character sets change and the technology for representing them improves.
Figure 10: Example of component breakdown [3]
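A component breakdown of this kind maps naturally onto a recursive structure: each compound lists its immediate components, and anything not listed is treated as a basic component. The sketch below is a hypothetical illustration (the class ComponentTree and its sample table are assumptions); 采 uses the hand 爫 plus tree 木 decomposition quoted earlier, and the other entries are simple examples. It decomposes purely by shape, as the component standards do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of recursive component decomposition. */
final class ComponentTree {
    // Hypothetical sample data: each compound maps to its immediate components;
    // anything not listed is treated as a basic (indivisible) component.
    static final Map<String, List<String>> PARTS = Map.of(
            "采", List.of("爫", "木"),
            "林", List.of("木", "木"),
            "森", List.of("木", "林"),
            "休", List.of("亻", "木"));

    /** Recursively expands a character into its basic components, in listed order. */
    static List<String> basicComponents(String ch) {
        List<String> parts = PARTS.get(ch);
        if (parts == null) return List.of(ch); // basic component: stop recursing
        List<String> result = new ArrayList<>();
        for (String p : parts) {
            result.addAll(basicComponents(p));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("森 -> " + basicComponents("森"));
        System.out.println("采 -> " + basicComponents("采"));
    }
}
```

Because 森 is stored as 木 over 林 rather than as three separate 木, the intermediate component 林 remains available for indexing. The table must be acyclic, or the recursion would not terminate.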
2.1.5 Conclusion
The system should use the most logical scheme based on the available documentation and index the radicals accordingly. The most obvious solution is to follow the TICCC and the most recently published character and component data, as these are the most accurate and up to date, and the system's organisation index will reflect this.
2.2 The Computerised System
The system will comprise two broad parts: a database and an editor. The database will store and index characters together with radical, phonetic, character and component data, as well as semantics, and the editor will provide the means of input to and retrieval from the database. Software tools that allow this will be investigated so that the most suitable can be chosen.
2.2.1 Database Technology
The database will be the main repository for storage and retrieval of information for the entire project. It is the most fundamental and one of the most critical areas of the system; as a result, the platform used needs to suit both current needs and future advances, and the system architecture will need to be robust. The expected platforms are Linux and Windows, and a number of technologies work on either. Databases use query languages to allow manipulation of their data; the current standard is SQL, and there are a number of implementations in different database systems with different features. This is especially true of the free versions, which often limit their functionality or speed in some way. Most of the implementations here conform to the SQL-92 standard [8].
DB2: This is a relational database management system created by IBM which runs on Linux and Windows. The free version, IBM DB2 Express-C¹, can be installed for the development of database systems for a small number of users, and is limited in the number of processor cores it can use and to 2GB of memory. It has both a command-line and a GUI interface.
Informix: Another IBM-owned product, this is similar to DB2 in many ways, except that it is only available for 32bit operating systems and is limited to 4GB of memory.
H2: This is an open source RDBMS written in Java. Using JaQu² (Java Query), it can be integrated directly into Java applications, and it boasts an impressive number of features compared with some other database platforms, including in-memory databases, which allow non-persistent data to be manipulated. This is useful for testing and prototyping, and may be an interesting feature to have for this new system. The platform can run in embedded mode (within the same JVM as the application), server mode (as a database server for client applications) and mixed mode (embedded in one application while also serving other clients), which gives it some flexibility. There also seem to be a number of tutorials for getting started.
¹ About DB2 Express-C; IBM; http://www-01.ibm.com/software/data/db2/express/about.html; Last Accessed 10/05/10.
² H2 Database Engine; H2; http://www.h2database.com/html/jaqu.html; Last Accessed 10/05/10.
MySQL: “The most popular open source database software”³ is available for both Windows and Linux. Although it does not come with an integrated GUI, third-party products and the MySQL Workbench can be downloaded to provide this functionality. Most programming languages can interface with it via the Open Database Connectivity (ODBC) API (or JDBC for Java). One of the most interesting features of this system, apart from its wide acceptance, is that it supports multiple storage engines [11], allowing different engine technologies to be used to implement individual tables within a database, e.g. InnoDB for an Employees table and MyISAM for a Payroll table.
Microsoft SQL Server Express Edition: The Microsoft implementation of SQL⁴ limits the size of databases to 4GB and the hardware to a single CPU with 1GB RAM. It provides native support for XML data and can manipulate it using XQuery. It provides an easily accessible backend for applications written with the Microsoft .NET Framework.
³ About MySQL; Oracle; http://www.mysql.com/about/; Last Accessed 10/05/10.
⁴ Microsoft SQL Server 2008 Express; Microsoft Corporation; http://www.microsoft.com/sqlserver/2008/en/us/express.aspx; Last Accessed 10/05/10.
PostgreSQL: This is released under the PostgreSQL licence, which in effect makes it free to use and distribute. It is widely used and available on a range of platforms including Windows and Linux, and can be used for enterprise-sized systems⁵, with an operational limit in excess of 4TB of data.
Oracle: The free version of Oracle's database limits the size of the database to 4GB, with a single processor and 1GB RAM. It is compatible with both Windows and Linux and supports a number of programming languages, however only in a 32bit environment.
2.2.2 The Editor
Databases such as those above can be manipulated through command-line and graphical interfaces. For the purposes of this system, the tools will need to store data on Chinese characters, components, phonetics and radicals. The integrated offerings from the DBMS developers may in many cases not be flexible enough to handle this character data fully. As a result, an editor will be created which allows creation, manipulation and viewing of the character data within the database. This tool can be written in a number of languages, such as:
Java: This would be compatible with many databases through JDBC and can be fully integrated with the H2 database engine. As a widely accepted object-oriented language, it is mature and has many useful features in its API, such as the DOM Level 3 and SAX interfaces for processing XML. It is also widely available in the university and is compatible with a number of environments such as Windows and Linux, meaning that an application should need to be changed little, if at all, between platforms.
5 PostgreSQL; PostgreSQL Global Development Group; http://www.postgresql.org/about/; Last Accessed 10/05/10.
Microsoft .NET Framework: This Microsoft framework, whose family of languages includes VB.NET and C#, can interface with databases and web applications through a set of base libraries. The framework is in theory cross-platform, although in practice this is not as simple as with a Java implementation. An editor application created with this framework would, however, be very easy to integrate with MS SQL Server database implementations.
Web Ontology Language6: Otherwise known as OWL, this is a family of languages and tools which allow data to be serialised and semantic conclusions to be drawn from it. The technology uses the idea of axioms and assertions, in which axioms (rules) categorise the data into related groups based on the assertions. Web-based technologies and standards such as XML and RDF are used to encode meaning in the data so that further facts can later be inferred.
2.2.3 Supplementary tools
In some cases the editor language may need an interface to access the database. There are a number of ways to enable this connectivity. The better known of these include:
JDBC7: JDBC provides connectivity between Java applications and different types of databases, including relational databases, allowing SQL-based data access. It is widely used and provides database implementation independence and flexibility.
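To make the JDBC option concrete, the following is a minimal sketch. It assumes the MySQL JDBC driver is on the classpath and that a running MySQL server is available for the connection methods. It also assumes the table and column names (RADICAL_TABLE, Radical_id, Radical_Name) follow the design in Section 4.1.2. The class and method names are hypothetical, and this is not the project's own code.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical helper class illustrating plain JDBC access to a MySQL database.
class RadicalJdbcExample {

    // Builds a MySQL JDBC URL of the form jdbc:mysql://host:port/database.
    static String mysqlUrl(String host, int port, String database) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database;
    }

    // Opens a connection; needs a running MySQL server and its JDBC driver on the classpath.
    static Connection connect(String url, String user, String password) throws SQLException {
        return DriverManager.getConnection(url, user, password);
    }

    // Looks up a radical's pinyin name by id using a parameterised query.
    // Table and column names are assumed from the RADICAL_TABLE design.
    static String findRadicalName(Connection conn, int radicalId) throws SQLException {
        String sql = "SELECT Radical_Name FROM RADICAL_TABLE WHERE Radical_id = ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setInt(1, radicalId);
            try (ResultSet rs = ps.executeQuery()) {
                // Returns null when no row matches the id.
                return rs.next() ? rs.getString("Radical_Name") : null;
            }
        }
    }
}
```

The PreparedStatement keeps the id as a bound parameter rather than concatenating it into the SQL string. The same pattern would apply to the other tables.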
6 OWL Web Ontology Language Guide; World Wide Web Consortium; http://www.w3.org/TR/owl-guide/; Last Accessed 10/05/10.
7 Java SE Technologies – Database; Oracle Corporation; http://www.oracle.com/technetwork/java/javase/tech/index-jsp-136101.html
JPA8: The Java Persistence API (JPA) is a specification for Object-Relational Mapping (ORM); it provides an interface through which an ORM implementation interacts with a database. The specification allows the data source properties of a database to be abstracted into a configuration file (persistence.xml) which, when packaged as part of an application, makes referencing and configuration a more centralised process.
The JPA specification maps database tables to Java classes called entities, which can then be manipulated as Java objects. This allows a Java application to update the database by interacting with these entities. The specification requires an implementation to manage the interface between the Java application, the entities and the database. Candidate persistence frameworks include Hibernate9, iBatis10 and EclipseLink11. Of these, Hibernate and EclipseLink implement the JPA, while iBatis is a SQL mapping framework. The former two have versions for both Java and .NET Framework languages, and Hibernate, being written for the Java Virtual Machine, is platform independent. EclipseLink is derived from Oracle's TopLink JPA implementation and is made for Java.
8 The Java Persistence API - A Simpler Programming Model for Entity Persistence; Oracle Corporation; http://www.oracle.com/technetwork/articles/javaee/jpa-137156.html
9 Hibernate; JBoss Community; http://www.hibernate.org/; Last Accessed 11/05/10.
10 iBatis Homepage; Apache Foundation; http://ibatis.apache.org/; Last Accessed 11/05/10.
11 Introducing EclipseLink; DZone, Inc; http://eclipse.dzone.com/articles/introducing-eclipselink
12 About SVG; SVG Working Group; http://www.w3.org/Graphics/SVG/About; Last Accessed 11/05/10.
Graphics: A graphical element in the editor would allow characters to be input and presented. Scalable Vector Graphics (SVG), an XML-based drawing platform12, allows shapes to be described in XML and drawn via an API from scripting languages such as ECMAScript. The standard has three types of graphic objects: vector graphic shapes, raster images and text. It is a scalable standard which can be used on both mobile and desktop systems, and it is supported by most web browsers. An alternative is PostScript, which can describe shapes in a similar way, though SVG has much wider support for fonts using Unicode character encoding. SVG also supports multi-directional text, allowing characters to flow from right to left and from top to bottom, which makes it suitable for the current project. SVG files can also be compressed, which may be necessary for database space efficiency. Libraries exist to render SVG; one example is Batik13, which can be used within a Java application to render SVG documents to a Swing-based SVG canvas.
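The point about compressing SVG for database space efficiency can be illustrated with the JDK alone: gzip-compressed SVG is the format used by .svgz files. This sketch is only an illustration and not part of the project's editor. The class name and the sample SVG for the radical 一 are hypothetical. For documents as small as this one, the gzip header overhead means the savings are negligible; they become worthwhile only for larger drawings.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical utility: gzip round trip for SVG text, as might be stored in a database column.
class SvgCompression {

    // Minimal SVG for the radical 一 (U+4E00): a single horizontal stroke plus a text label.
    static final String RADICAL_SVG =
        "<svg xmlns=\"http://www.w3.org/2000/svg\" width=\"100\" height=\"100\" viewBox=\"0 0 100 100\">"
        + "<path d=\"M 10 50 L 90 50\" stroke=\"black\" stroke-width=\"6\"/>"
        + "<text x=\"50\" y=\"95\" font-size=\"12\" text-anchor=\"middle\">\u4e00</text>"
        + "</svg>";

    // Compresses UTF-8 encoded SVG text with gzip.
    static byte[] compress(String svg) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(bytes)) {
            gzip.write(svg.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Closing the gzip stream above wrote the trailer, so the byte array is complete.
        return bytes.toByteArray();
    }

    // Decompresses gzip bytes back into the original SVG text.
    static String decompress(byte[] data) {
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(data))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = gzip.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```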
2.2.4 Reflection
Other programming languages could have been used to create the editor; however, those listed seemed the most useful for this particular project. The Java platform is familiar and cross-platform. Although the Microsoft .NET Framework seems less so, both are object oriented, have been developed to allow interaction with databases, and manage application memory themselves, which simplifies application development. OWL is a useful technology for the semantic requirements of the project, and it may be possible to use it in future iterations of the project to enhance the inference of semantics.
The JPA specification seemed to offer a simple and robust solution for accessing a database from a Java application. In conjunction with an implementation such as EclipseLink, and with MySQL (a well-established, open source DBMS) as the database, it seemed the most suitable tool to use.
13 Batik SVG Toolkit; The Apache Software Foundation, 2010; http://xmlgraphics.apache.org/batik/; Last Accessed 11/05/10.
Chapter Summary
This chapter looked at some of the fundamental concepts of the Chinese
writing system. The history of Chinese indexing was also explored with
the development of the Chinese character radicals. Some of the tools
that would aid the progress of the project were also outlined with the
most feasible being selected.
Chapter 3: Research Methods and Design Considerations
This chapter will examine some of the issues that required consideration
in order to ensure effective project development. The project objectives
will be explained and a basic project plan outlined with reference to the
relevant sections in the dissertation. The development process will also
be explained with a description of the process used to evaluate the
project.
3.1 Project Overview
The long-term aim of the Zhongwen Youxi He project is a fully implemented interactive learning tool for native English speakers. The scope of the current project, however, has been narrowed to creating the foundations of the system to allow the indexing of the main elements of the Chinese written language. Conceptually this can be thought of as the system back end: a purely logical and functional foundation with little consideration for the higher-level or front-end systems.
3.2 Project Objectives
The project was divided as fairly as possible to allow each member to make a significant contribution, and both members were assigned areas of priority research. The actual division of work, however, was not simply a case of each member taking half the responsibility, as some areas of the system were shared. Collaboration and group management were necessary to avoid one member's progress being too dependent on that of the other. The areas of priority were assigned as follows:
Radicals (Melville)
Syllables (Melville)
Phonetics (Xing)
This allocation provided the basis for the division of work throughout the project. This report concentrates on the Chinese character radicals and the development of the project in relation to them. This individual section of the project aimed to record information about the radicals and to identify some of the semantic themes among the characters indexed by them. It therefore required a database to:
Index radicals.
Describe some basic uses and common semantic character themes.
The project also required an editor to allow data to be input and
retrieved from the database. This editor application would therefore
need to communicate with the underlying database, to query and update
the database as necessary.
3.3 Project Plan
The project followed a basic development plan. To allow for variation in the methods of research and in the execution of individual group members' work, regular meetings helped to ensure coordination and a common direction. The main project stages below were milestones, each of which set a foundation for the next stage.
Database technology decision: MySQL was judged the most suitable technology for this project, as described in the background chapter. This was agreed by both team members.
Editor development platform: Java was chosen as the programming language due to its availability on the University machines and because both members of the group had developed applications with it previously. The editor application would need to use the database connectivity provided by the Java API. The JPA was chosen for the reasons given in Chapter 2.
Database design: This project involves three main areas of development: the database, the editor and the translation link between them. To ensure robust system design as well as flexible and efficient use of development time, the design and development process would have to be iterative. The database had to be designed and implemented before the editor, as it is arguably the most important component for both the current and future Zhongwen Youxi He developments.
Database design typically consists of system and requirements analysis followed by three modelling stages [9]: conceptual, logical and physical modelling. As this was a new system, there was no existing architecture to analyse, so the requirements were gathered and engineered to ensure an understanding of how the proposed system would operate. The process included modelling use cases and brainstorming.
Conceptual Modelling: This stage modelled the conceptual schema based on the system analysis. Entity Relationship modelling was used to identify the main system objects and how they should be related to each other, such as the relationship between characters and components or between characters and radicals. Each member paid particular attention to the design of their own area of responsibility.
Logical Modelling: This stage expressed the conceptual model in terms of the database technology, mapping it into a logical schema, such as the tables which stored the radicals and their relationships to each other. Normalisation could then be applied.
Physical Modelling: This stage “involves the selection of indexes (access methods), partitioning and clustering of data” [9]. It was, however, outside the scope of the current project.
Editor design: On completion of the database design the initial editor
design could also begin.
Implementation: Once the database architecture had been designed it would be created, and the editor could then be implemented. In the meantime, the radicals table of the database could be populated with the data which would form the basis of the Zhongwen Youxi He project. This is discussed further in Chapter 5.
Testing: The editor would be tested modularly as functionality was added. Each method would be treated as a functional sub-unit, to ensure that previously working functionality was unaffected. This approach was borrowed from the test-driven development methodology [10]. These tests would cover simple functionality such as branching and looping. They would also check whether the functions operated correctly with the translation tools, i.e. whether the database data was correctly affected. Each unit would then be integrated into the overall system in accordance with the group management policy, which is discussed mainly in the group report.
Figure 11: Example of an entity relationship for Chinese characters and components (aggregation): Character and Components related by “Composed of” (cardinalities N and 1)
The application could then be tested as a whole, with the system populated with test data in usability tests covering:
entry of radicals
retrieval of radicals
input of details
retrieval of details
deleting data
editing of data
Evaluation of System: Usability tests would determine the extent to which the system satisfied the project objectives and identify areas of the system requiring improvement. The design processes, research methods, the database and some of the conclusions drawn from the research would also be evaluated.
Chapter Summary
This chapter outlined some of the considerations that affected the course of project development. The scope and objectives of the project were established, the research and development processes were described, and a project plan with regard to the Chinese character radicals was explained.
Chapter 4: Design
The project was divided into database and editor layers. The individual projects were sub-objectives of these layers, with both members of the group attempting to share responsibility and contribute equally. Overlapping areas of implementation made this difficult, but attempts were continually made to ensure an effective distribution of workload. This chapter describes my contribution to the design of various parts of the project.
4.1 Database
4.1.1 Division of Work
As mentioned in the group report, much of the database design was reviewed by the project supervisor to ensure that, as the basis of future Zhongwen Youxi He projects, its foundations were solid. Once the basic architecture had been established, alterations were made to the areas of the database concerning the Chinese character radicals, as this was my area of responsibility. My changes were then combined with those of the group.
4.1.2 Entity Relationship Model
Figure 12: Database entity relationship diagram
The database architecture was as shown in figure 12. The
RADICAL_TABLE was designed to store data about the Chinese character
radicals as indexed by the TICCC [13].
Figure 13: Example of the TICCC
Though initially confusing, many of the fields in the RADICAL_TABLE map directly from the information in the TICCC. Figure 13 shows a small part of the TICCC, which can be used as a working example.

The radical 匚 has number 8 in the list, and there are two radicals without numbers beneath it. These radicals have the stroke count that would usually be indexed here. However, they are variants of another form, a main (written) form, under which they are indexed. The number of this main form is given in square brackets [] and corresponds to the RadicalLeadSequence_Id. In this example [9] corresponds to 卜, the radical at number 9. At that index the variant is shown in round brackets ().

Its position there corresponds to the RadicalSubsidSequence_Id, which would be 1 as it is the first entry in the brackets. The RadicalLastSubSequence_Id corresponds to the total number of radicals in the round brackets, here 1 since there is only one. The RadicalTICCCSubsid_Id corresponds to the position of a sub-indexed radical (one without a number) in the sub-index: in this example [9] occurs at position 1 and [22] at position 2.
Figure 14: Example of the TICCC and database fields
However, for the purposes of the database, Radical_Id corresponds to the strict order of occurrence of a radical in the TICCC, whether it is sub-indexed or not. In the above example, radical [9] would therefore be given Radical_Id 9 and [22] Radical_Id 10, as they occur strictly after radical 8. The radical beneath them numbered 9 would therefore have Radical_Id 11. The RadicalLeadSequence_Id is adjusted accordingly, so for radical [9] RadicalLeadSequence_Id = 11.
Radical_id = 8: RadicalLeadSequence_Id = 0, RadicalTICCCSubsid_Id = 0, RadicalSubsidSequence_Id = 0, RadicalLastSubSequence_Id = 0
Radical_id = 9: RadicalLeadSequence_Id = 11, RadicalTICCCSubsid_Id = 1, RadicalSubsidSequence_Id = 1, RadicalLastSubSequence_Id = 1
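The renumbering rules above can be checked with a small sketch. It assumes a hypothetical input format: the TICCC entries in page order, written as "8" for a numbered radical and "[9]" for a sub-indexed variant of main form 9. The sketch does three things:
- assigns Radical_Id by strict order of occurrence;
- records each sub-indexed entry's position in the sub-index (the RadicalTICCCSubsid_Id of the text);
- resolves the main form's TICCC number to that form's Radical_Id (the adjusted value shown for RadicalLeadSequence_Id in Figure 14).

It is an illustration only, not the project's data-entry code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the TICCC-to-Radical_Id renumbering.
class TicccNumbering {

    static final class Entry {
        final String label;   // "8" = numbered radical, "[9]" = variant indexed under main form 9
        int radicalId;        // strict order of occurrence in the TICCC
        int subsidPosition;   // position in the sub-index under the preceding numbered radical (0 if numbered)
        Integer mainFormId;   // for "[n]" entries: Radical_Id of TICCC number n, or null if not in the input

        Entry(String label) {
            this.label = label;
        }

        boolean isSubIndexed() {
            return label.startsWith("[");
        }

        int ticccNumber() {
            return Integer.parseInt(label.replace("[", "").replace("]", ""));
        }
    }

    static List<Entry> number(List<String> labelsInPageOrder, int firstId) {
        List<Entry> entries = new ArrayList<>();
        Map<Integer, Integer> idByTicccNumber = new HashMap<>();
        int nextId = firstId;
        int position = 0;
        for (String label : labelsInPageOrder) {
            Entry e = new Entry(label);
            e.radicalId = nextId++;
            if (e.isSubIndexed()) {
                e.subsidPosition = ++position;
            } else {
                position = 0; // a numbered radical starts a new sub-index
                idByTicccNumber.put(e.ticccNumber(), e.radicalId);
            }
            entries.add(e);
        }
        // Second pass: a main form usually appears after the variants that point to it.
        for (Entry e : entries) {
            if (e.isSubIndexed()) {
                e.mainFormId = idByTicccNumber.get(e.ticccNumber());
            }
        }
        return entries;
    }
}
```

Starting the count at 8 reproduces the worked example. The entries 8, [9], [22] and 9 receive Radical_Id 8, 9, 10 and 11, and [9] resolves to main form 11.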
Radical_Name stores the pinyin name of the radical.
Stroke_Number is the number of strokes as indexed in the TICCC.
Unicode stores the Unicode representation where possible.
If_Char records whether the radical is also a character.
About stores notes about the radical and some examples of characters indexed by it. This field represents one of the main features of this stage of the project and is described further in the implementation section.
SVG stores an SVG representation of the radical.
4.2 The Editor
With the database architecture finalised, work could begin on the editor. The Java platform and the JPA specification were to be used to interact with the database, and the model-view-controller architecture was deemed the most appropriate development model.
This would allow a modular development process, with database operations separated from the view via the controller. It would also allow development and testing to occur separately: the model would be built and tested first, followed by the controller. The view would be placed on top, receiving data from the model via the controller.
Figure 15: Initial Model View Controller design pattern (Model, View and Controller; labelled flows: user input, update model, query model, data presentation)
To enable the members to develop the editor independently, it was decided that the core shared elements should be implemented first. This framework would provide basic functionality for interaction between the database, the controller and the view. Both members of the team could then develop further functionality to satisfy their own project objectives. I took responsibility for the design of this framework.
4.2.1 The Model
The JPA enables Java applications to interact with relational database tables through entity objects. Each entity object corresponds to a row of the corresponding table, and an entity class is required for each table accessed by the application; this resulted in an entity class for every table in the database. Entities require a no-argument constructor. A constructor taking a radical_Id was also added, so that new “empty” radicals could be added to the database and their details filled in later. Entities also require a field for each column in the table, with get and set methods for each of these fields. In the case of the RADICAL_TABLE, the entity was constructed as in Figure 16.
RadicalTable
Fields: Radical_id int, RadicalLeadSequence_Id int, RadicalTICCCSubsid_Id int, RadicalSubsidSequence_Id int, RadicalLastSubSequence_Id int, Radical_Name String, Stroke_Number int, Unicode String, If_Char boolean, About String, SVG String
Methods: RadicalsTable(), RadicalsTable(Radical_id), and a get and set method for each field (e.g. setRadical_id, getRadical_id() int; setSVG, getSVG() String)
Figure 16: RadicalTable class
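For readers unfamiliar with JPA entities, the following plain-Java sketch shows the shape of the class in Figure 16: a no-argument constructor, a constructor taking the id, and a get/set pair per field. Some simplifications keep it compilable without a JPA library:
- the @Entity/@Id annotations a real entity would carry appear only in comments;
- only some fields are shown, with Java-style names;
- the four-argument constructor's parameter order (id, unicode, name, ifChar) is inferred from the test code in Section 5.2, so it is an assumption.

```java
// Hypothetical sketch of the RadicalsTable entity; a real JPA entity would be annotated
// with @Entity (class) and @Id (radicalId), with column names matching RADICAL_TABLE.
class RadicalsTable {
    private int radicalId;        // Radical_id
    private String radicalName;   // Radical_Name (pinyin)
    private int strokeNumber;     // Stroke_Number
    private String unicode;       // Unicode code point as hex text, e.g. "4E00"
    private boolean ifChar;       // If_Char
    private String about;         // About
    private String svg;           // SVG
    // The four sequence-id columns would follow the same pattern as radicalId.

    // JPA requires a no-argument constructor.
    public RadicalsTable() { }

    // Creates an "empty" radical whose details are filled in later.
    public RadicalsTable(int radicalId) {
        this.radicalId = radicalId;
    }

    // Assumed parameter order, inferred from new RadicalsTable(2, "4E57", "yi1", true).
    public RadicalsTable(int radicalId, String unicode, String radicalName, boolean ifChar) {
        this(radicalId);
        this.unicode = unicode;
        this.radicalName = radicalName;
        this.ifChar = ifChar;
    }

    public int getRadicalId() { return radicalId; }
    public void setRadicalId(int radicalId) { this.radicalId = radicalId; }
    public String getRadicalName() { return radicalName; }
    public void setRadicalName(String radicalName) { this.radicalName = radicalName; }
    public int getStrokeNumber() { return strokeNumber; }
    public void setStrokeNumber(int strokeNumber) { this.strokeNumber = strokeNumber; }
    public String getUnicode() { return unicode; }
    public void setUnicode(String unicode) { this.unicode = unicode; }
    public boolean getIfChar() { return ifChar; }
    public void setIfChar(boolean ifChar) { this.ifChar = ifChar; }
    public String getAbout() { return about; }
    public void setAbout(String about) { this.about = about; }
    public String getSvg() { return svg; }
    public void setSvg(String svg) { this.svg = svg; }
}
```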
The JPA interacts with each of these entity classes via an Entity Manager, which is created by an Entity Manager Factory provided by the JPA implementation. The Entity Manager queries the database and creates and updates entities as appropriate, providing row-level access to the database. One of the basic functions of the Entity Manager is querying the database for entities:
Query q = entityManager.createQuery("SELECT r FROM RadicalTable r");
This query retrieves all records from the radical table. The results can then be obtained from the query q via q.getResultList(), which returns a list of RadicalTable objects. This list can be thought of as a list of rows from the RADICAL_TABLE, with each column mapped to a field of the entity; e.g. the first RadicalsTable object in the list would likely have radical_id = 1. These objects could then be updated or deleted, and new objects created and saved to the database. The framework would require this Entity Manager functionality.
4.2.2 The Controller
The Entity Manager functionality was encapsulated in a controller class,
Repository Manager.
The remove and save methods are both used to alter records in the database, relying on the update() and persist() methods to save the state of the entities. The life cycle of an Entity Manager here is usually the length of a transaction, e.g. saving an object to, or retrieving an object from, the database. During this life cycle the entities are managed: the Entity Manager is connected to the database, new objects can be saved with persist(), and changes to managed objects are written to the database when the transaction commits. Once the entities have been retrieved by the application, the Entity Manager is closed, severing the connection between these entities and the database. Such entities are known as detached. The application can change them, but for the changes to be saved, a new Entity Manager must re-establish the connection with the database and merge the modified object with the database version. The update() method provided this function.
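The difference between managed and detached entities can be illustrated without a database. This is a deliberately simplified toy model and not the JPA API. ToyEntityManager and its methods are hypothetical, the "database" is a map from radical id to pinyin name, and commit() stands in for committing a transaction. It shows why the update() method needs a fresh entity manager to merge changes made to a detached object.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model (not JPA) of the managed/detached entity life cycle.
class ToyEntityManager {

    static final class Radical {
        final int id;
        String name;

        Radical(int id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    private final Map<Integer, String> database;                 // shared "table": id -> pinyin name
    private final Map<Integer, Radical> managed = new HashMap<>(); // objects tracked by this manager

    ToyEntityManager(Map<Integer, String> database) {
        this.database = database;
    }

    // Returns a managed object for the row; repeated calls return the same instance.
    Radical find(int id) {
        return managed.computeIfAbsent(id, key -> new Radical(key, database.get(key)));
    }

    // Copies a detached object's state onto the managed instance (compare EntityManager.merge).
    Radical merge(Radical detached) {
        Radical target = find(detached.id);
        target.name = detached.name;
        return target;
    }

    // Writes every managed object back to the "database" (compare committing a transaction).
    void commit() {
        for (Radical r : managed.values()) {
            database.put(r.id, r.name);
        }
    }

    // Ends the life cycle: objects handed out earlier become detached and are no longer tracked.
    void close() {
        managed.clear();
    }
}
```

Changing a Radical after close() leaves the map untouched. Only a new manager's merge() followed by commit() writes the change.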
To mediate between the Repository Manager and the view, another controller was designed. The View Controller would initiate database access through the Repository Manager and process the results into a form suitable for the view.
RepositoryManager
Fields: entityManager EntityManager
Methods: getRadicals() List, getPhonetics() List, getPhoneticsbyId(int), getRadicalsbyID(int), removeRadical(RadicalTable), removePhonetic(PhoneticTable), saveRadical(RadicalTable), savePhonetic(PhoneticTable), persist(Object), update(Object)
Figure 17: RepositoryManager class
The methods were intended to be as general as possible, so that later additions to the Repository Manager could be used here. The getAll() method switches between types of entity (given by currentEntity) and queries the repositoryManager for all records of that type (table) in the database. getbyId() does the same but calls the corresponding get-by-id method. The save and remove methods correspond to those in the Repository Manager. The asTable() method takes the currentList variable and returns it in a format suitable for a JTable in the view.
These two classes formed the controller in the model-view-controller architecture, allowing the view to interact with the model and thus the database. This was the basic editor framework.
ViewController
Fields: repositoryManager RepositoryManager, currentList List, currentEntity String
Methods: getAll() List, getbyId(int) List, removeObject(Object), saveObject(Object), updateObject(Object), updateView(), asTable() Object[][]
Figure 18: ViewController class
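A simplified sketch of the getAll() and asTable() behaviour described above follows. The Repository interface is a hypothetical stand-in for the RepositoryManager. Rows are simplified to String arrays rather than entity objects so that the sketch runs without a database; only the switching and the table conversion are illustrated.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the RepositoryManager; each row is a String array.
interface Repository {
    List<String[]> getRadicals();
    List<String[]> getPhonetics();
}

// Sketch of the ViewController's getAll()/asTable() logic.
class ViewControllerSketch {
    private final Repository repositoryManager;
    private List<String[]> currentList = new ArrayList<>();
    private String currentEntity = "Radicals";

    ViewControllerSketch(Repository repositoryManager) {
        this.repositoryManager = repositoryManager;
    }

    void setCurrentEntity(String entity) {
        currentEntity = entity;
    }

    // Switches on the selected entity type and fetches all records of that table.
    List<String[]> getAll() {
        switch (currentEntity) {
            case "Radicals":
                currentList = repositoryManager.getRadicals();
                break;
            case "Phonetics":
                currentList = repositoryManager.getPhonetics();
                break;
            default:
                throw new IllegalArgumentException("Unknown entity: " + currentEntity);
        }
        return currentList;
    }

    // Converts currentList into the Object[][] row data a JTable can display.
    Object[][] asTable() {
        Object[][] rows = new Object[currentList.size()][];
        for (int i = 0; i < rows.length; i++) {
            String[] row = currentList.get(i);
            rows[i] = Arrays.copyOf(row, row.length, Object[].class);
        }
        return rows;
    }
}
```

The resulting array can be passed, together with an array of column names, to the JTable(Object[][], Object[]) constructor.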
4.2.3 The View
The view is the user interface for the system. The user uses the Main View panel to select the table in the database they wish to read or edit, and these requests are sent to the controller. The Main View, which is discussed further in the group report, consists of a main panel with a JTabbedPane, the TabView, as shown below.
The TabView consists of a JTable, which gives a grid view of the records from the table as passed from the ViewController. It also contains a switchable set of JPanels (radicalPanel, phoneticPanel, syllablePanel and phoneticSyllable) managed by the CardLayout layout manager. I was responsible for the design of the RadicalPanel.
Figure 19: Editor Framework (View, ViewController, RepositoryManager and Model; labelled flows: user input and view update, query RepositoryManager, query and update model)
MainView
Fields: tabView TabView, comboTables JComboBox, textIDfield JTextField
Methods: search(), update()
TabView
Fields: tableDBView JTable, panelEditRecordView JPanel, radicalPanel RadicalViewer, phoneticPanel PhoneticViewer, syllablePanel SyllableViewer, phoneticSyllable PhonSyllViewer, viewController ViewController
Methods: TabView(ViewController), updateViews(), setPanel()
Figure 20: MainView and TabView class
Figure 21 shows the basic design of the Radical Viewer input form, which is displayed from the Main View user input screen. All fields are JTextFields, with three exceptions:
- the if_char JComboBox, which gives the choice of true or false;
- the SVG Panel, an SVG Canvas which uses the Batik API to render SVG to a JPanel;
- the About field, a JTextArea which cannot be edited directly.

Instead, the About field has a click listener which opens a JEditorPane that can be used to edit the text; this is described in the next section. The save, clear and cancel buttons are self-explanatory.
Figure 21: RadicalView input form (text fields Radical_Id, RadicalLeadSequence_Id, RadicalTICCCSubsid_Id, RadicalSubsidSequence_Id, RadicalLastSubSequence_Id, unicode and Stroke number; an If_char combo box; an SVG Panel (SVG Canvas) with its SVG Text; an About field (JTextArea, not editable); save, clear and cancel buttons)
The RadicalViewer class is passed a ViewController, which is used to retrieve and update values in the database. The populateForm() method is called when a row in the TabView's tableDBView is selected. The row number is passed to the ViewController, which gets the corresponding RadicalTable object from its currentList field. The fields on the form are then populated with the data from that object.

The isDuplicate() method is a check performed when saving changes to the database. If a user attempts to save a new record, the ViewController's currentList is checked to ensure that no radical with the same Radical_Id already exists. Before acting, the saveRadical() and removeFromDB() methods both check either that a record is active (selected in tableDBView) or that no record with the same Radical_Id already exists.
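The save-time check can be sketched as follows. For illustration only, it assumes the ViewController's currentList has been reduced to a list of Radical_Id values. The class name and canSave() are hypothetical; only the isDuplicate() name comes from the design.

```java
import java.util.List;

// Hypothetical sketch of the RadicalViewer's save-time checks.
class RadicalSaveGuard {

    // True if a radical with this Radical_Id is already in the controller's current list.
    static boolean isDuplicate(List<Integer> currentIds, int radicalId) {
        return currentIds.contains(radicalId);
    }

    // An existing (selected) record may be updated; a new record may be saved only if its id is unused.
    static boolean canSave(List<Integer> currentIds, int radicalId, boolean recordSelected) {
        return recordSelected || !isDuplicate(currentIds, radicalId);
    }
}
```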
4.2.4 The AboutEditor and TextParser
The About field on the RadicalViewer panel is not populated directly from the database. Instead, the value is passed to the TextParser class, which allows formatting to be applied to the text before it is output to the screen. For the purposes of testing, the About field displays only plain text, but when clicked it opens an editor, the AboutEditor.
RadicalViewer
Fields: myController ViewController
Methods: RadicalViewer(ViewController), populateForm(int), isDuplicate(int) boolean, clearAll(), saveRadical(RadicalTable), removeFromDB(RadicalTable)
Figure 22: RadicalViewer class diagram
The AboutEditor appears when the About field in the RadicalViewer panel is clicked. The TextParser is passed to the editor to allow the text to be parsed. The showEditor() method initialises the view by switching the switchPanel to the rawEditor, populated with the contents of myParser (the data from the About field in the database).

A JEditorPane can display text with HTML mark-up as a web browser would, which allows basic HTML formatting in the pane. The rawEditor is a plain JEditorPane which displays the text in raw form, whereas the formatEditor shows the result of the HTML mark-up. Pressing the “Test” button switches from the rawEditor to the formatEditor; both are contained in the switchPanel, which uses a CardLayout. The Reset button switches the view back to the rawEditor. Saving updates the TextParser with the contents of the rawEditor pane, which then updates the database if the save button in the RadicalViewer is pressed.
Figure 23: The AboutEditor (a JEditorPane within a JScrollPane, with Test, Reset, Save and Cancel buttons)
AboutEditor
Fields: rawEditor JEditorPane, formatEditor JEditorPane, switchPanel JPanel, myParser TextParser
Methods: AboutEditor(TextParser), showAbout(), saveAndClose(), test(), closeDiscard(), reset()
Figure 24: AboutEditor class diagram
Figure 25 shows the basic flow of control between the RadicalViewer, the AboutEditor and the TextParser.
- When the RadicalViewer populates its fields from the tableDBView JTable, the ViewController passes the About information directly to the TextParser, which displays the data in the About field.
- When the About field is clicked, the AboutEditor opens, initialised with the raw contents of the About field.
- When the text is tested, the text from the rawEditor is sent to the TextParser, which parses it and returns it formatted with HTML for the HTML-aware formatEditor.
- If the editor is saved, the contents of the rawEditor are saved to the TextParser, to be written to the database once the RadicalViewer save button is pressed; otherwise no changes are made.

In both cases the contents of the TextParser are passed to the ViewController.
Figure 25: Basic flow of control for AboutEditor (RadicalView, AboutEditor, TextParser and ViewController; labelled flows: About field text, get format or raw text, save raw text, display in About field, save to DB)
The TextParser class allows user-defined tags to be parsed into HTML. In this editor a number of user tags were defined, corresponding to different types of database information:
<k keyword></k> keyword marker
<c unicode></c> character unicode
<r id></r> radical id
<s id></s> syllable id
<f id></f> phonetic id
<fs id id> </fs> phonetic-syllable id
<g></g> for grammar
<ie></ie> for english idioms
<ic></ic> for chinese idioms
<c-col ></c-col> colour: col = r,o,g,b
<bf></bf> bold face
<it></it> italic
<sf></sf> sans serif
<tt></tt> teletype / courier
<e example id></e> example id
When parsed by the TextParser, these codes would be converted into various forms of HTML. For the purposes of this project only a few styles were implemented:
<c-red></c-red> becomes <font color="red"></font>
<c-yellow></c-yellow> becomes <font color="yellow"></font>
<c-blue></c-blue> becomes <font color="blue"></font>
<bf></bf> becomes <b></b>
<it></it> becomes <i></i>
The parser was designed to handle overlapping tags, and these tags were deemed sufficient for the purpose of testing. The framework was, however, set up so that the remaining tags could be added or altered at a later date. The special characters “\n”, “\r” and “\r\n” were mapped to <br> to maintain page formatting.
The TextParser setRawText() method sets the rawText member variable, storing the value of the About field passed from the ViewController. getRawText() returns the value passed back when the RadicalViewer is closed. The parse() method takes a String from the rawEditor pane and returns a formatted string of HTML. The algorithm iterates over the start tags and splits the string. It then formats the text between each start tag and its end tag by replacing the tags with the corresponding HTML.
TextParser
Fields: rawText String, startTags String[], endTags String[]
Methods: setRawText(), getRawText() String, parse(String) String
Figure 26: The TextParser class diagram
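The replacement approach described above can be sketched as follows. This is a simplified illustration rather than the project's TextParser, and the class name is hypothetical. It handles only the five tags implemented above and emits <font color="..."> for the colour tags. It uses plain string replacement, which copes with nested or overlapping tags because each tag is replaced independently.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified sketch of a TextParser that maps user-defined tags to HTML.
class TextParserSketch {

    // User tag -> HTML replacement; start and end tags are listed separately.
    private static final Map<String, String> TAGS = new LinkedHashMap<>();

    static {
        TAGS.put("<bf>", "<b>");
        TAGS.put("</bf>", "</b>");
        TAGS.put("<it>", "<i>");
        TAGS.put("</it>", "</i>");
        for (String colour : new String[] {"red", "yellow", "blue"}) {
            TAGS.put("<c-" + colour + ">", "<font color=\"" + colour + "\">");
            TAGS.put("</c-" + colour + ">", "</font>");
        }
    }

    private String rawText = "";

    // Stores the raw About text passed from the ViewController.
    void setRawText(String text) {
        rawText = text;
    }

    String getRawText() {
        return rawText;
    }

    // Converts raw About text into HTML for an HTML-aware JEditorPane.
    String parse(String raw) {
        // Replace "\r\n" before the single characters so a Windows line break yields one <br>.
        String html = raw.replace("\r\n", "<br>").replace("\r", "<br>").replace("\n", "<br>");
        for (Map.Entry<String, String> tag : TAGS.entrySet()) {
            html = html.replace(tag.getKey(), tag.getValue());
        }
        return html;
    }
}
```

The returned string can be displayed by a JEditorPane whose content type has been set to "text/html".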
4.2.5 The SVG Panel
Using the Batik library, the contents of the SVG field would be rendered to this panel as well as passed to the SVG JTextArea, allowing changes to be made to the SVG and viewed in the SVG Panel. The Batik SVG canvas, a Swing component derived from JComponent, provides this functionality. The SVG text is passed to the canvas for display either via a URL or through a reader.
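Before edited SVG text is handed to the canvas, it may help to check that it is well-formed XML with an svg root element. This sketch does that with the JDK's own DOM parser, reading from a Reader in the same way as the reader-based route above. It does not show the Batik calls themselves, and the class name is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical pre-rendering check for SVG text taken from the SVG JTextArea.
class SvgTextCheck {
    static final String SVG_NS = "http://www.w3.org/2000/svg";

    // True if the text parses as XML and its root is an <svg> element in the SVG namespace.
    static boolean isWellFormedSvg(String svgText) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors and ignores warnings,
            // so malformed input is reported by exception rather than printed.
            builder.setErrorHandler(new DefaultHandler());
            Document doc = builder.parse(new InputSource(new StringReader(svgText)));
            String ns = doc.getDocumentElement().getNamespaceURI();
            return "svg".equals(doc.getDocumentElement().getLocalName()) && SVG_NS.equals(ns);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            return false;
        }
    }
}
```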
Chapter Summary
This chapter documented the design of the system. The database design was outlined and a justification for the design of the RADICAL_TABLE was provided. The design of the editor framework and the model-view-controller pattern was described, together with the additions specific to the RADICAL_TABLE.
Chapter 5: Implementation and Testing
Once the database architecture and editor framework had been designed, implementation and testing of the core functionality began, so that both members of the group could then undertake their own individual tasks. These included designing and implementing their own input forms, such as the Radical Viewer, and adding data to the database. As I assumed responsibility for implementing the framework, it is described here together with the testing results. Some of the tools used in the development of the system are also mentioned.
5.1 The Tools
The project was implemented using a MySQL database, engineered with the help of MySQL Workbench¹⁴, which enabled modelling of the Entity-Relationship model. The Eclipse IDE¹⁵ was used to develop the editor, using the JPA specification with the EclipseLink JPA implementation, and the Batik library, which was used to render SVG documents to the screen. JUnit, which is included as part of the Eclipse IDE, was used as a testing framework for some of the core functionality; as this was the most important part of the project, testing helped ensure that its implementation was robust.
14 Welcome to MySQL Workbench 5.2; MySQL Workbench Team; MySQL Inc; http://wb.mysql.com/; Last Accessed 10/05/10.
15 Explore the Eclipse Universe 2010; The Eclipse Foundation 2010; http://www.eclipse.org/; Last Accessed 12/06/10.
5.2 The Editor
The implementation of the core framework was carried out using methods borrowed from Agile development. The JUnit framework was used to create test classes which tested the main functionality, or any functionality which could feasibly be tested without user input. The test setup for the core functionality followed the same pattern for each class, with the main test cases being an array of three RadicalsTable objects:
RadicalsTable r1 = new RadicalsTable(1);
RadicalsTable r2 = new RadicalsTable(2, "4E57", "yi1", true);
RadicalsTable r3 = new RadicalsTable(3);
The objects have radical_id 1-3 respectively, with r2 testing the four-argument constructor. Each test class had a number of methods to test different properties. The full test code can be found in the Appendix.
5.2.1 Radicals Table Test
This class tested the RadicalsTable entity using an Entity Manager:
generateRadicalTables(): This method creates the radicalTestCases above. To test the set methods for each of the fields, the third test case (r3) has its fields set at runtime.
insertAndRetrieve(): This method creates an Entity Manager to attempt to persist the entity objects to the database, then tests whether they can be correctly retrieved.
findAndDelete(): Retrieves the entity objects, deletes them and tests that they have been deleted.
Figure 27: Radical Table Test passed
5.2.2 Repository Manager Test
This class tested database access using the Repository Manager to create an Entity Manager and manipulate the entities.
insertAndRetrieve(): This method creates a Repository Manager to attempt to persist the entity objects to the database, then tests whether they can be correctly retrieved.
findAndUpdate(): Tests whether changes made to a detached entity object can be merged with the database.
updateById(): This method retrieves an entity with a given id from the database, sets its radical_Name field to a new value and tests that the change has been merged with the database.
findAndDelete(): Retrieves the entity objects, deletes them and tests that they have been deleted.
Figure 28: Repository Manager Test passed
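The insert, update-by-id and delete checks above follow a common pattern, sketched below in plain Java. JPA and JUnit are not used here: RadicalStub and InMemoryRepository are hypothetical stand-ins, loosely modelled on an entity and an EntityManager, so that the sequence (persist, find, change a detached copy, merge, remove) can run on its own. It is not the project's RepositoryManager.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the RadicalsTable entity: only an id and a name.
class RadicalStub {
    final int radicalId;
    String radicalName;

    RadicalStub(int radicalId, String radicalName) {
        this.radicalId = radicalId;
        this.radicalName = radicalName;
    }

    // Returns a detached copy, mimicking an entity used outside a persistence context.
    RadicalStub detachedCopy() {
        return new RadicalStub(radicalId, radicalName);
    }
}

// Hypothetical in-memory repository loosely modelled on persist/find/merge/remove.
class InMemoryRepository {
    private final Map<Integer, RadicalStub> store = new HashMap<>();

    void persist(RadicalStub r) {
        if (store.containsKey(r.radicalId)) {
            throw new IllegalStateException("duplicate radical_id " + r.radicalId);
        }
        store.put(r.radicalId, r.detachedCopy());
    }

    // Returns a detached copy, or null if no record has this id.
    RadicalStub find(int id) {
        RadicalStub stored = store.get(id);
        return stored == null ? null : stored.detachedCopy();
    }

    // Copies the state of a detached object onto the stored record.
    void merge(RadicalStub detached) {
        RadicalStub stored = store.get(detached.radicalId);
        if (stored == null) {
            store.put(detached.radicalId, detached.detachedCopy());
        } else {
            stored.radicalName = detached.radicalName;
        }
    }

    void remove(int id) {
        store.remove(id);
    }
}

class RepositoryRoundTrip {
    // Runs the insert, update-by-id and delete checks; returns true if all pass.
    static boolean run() {
        InMemoryRepository repo = new InMemoryRepository();
        repo.persist(new RadicalStub(1, "yi"));

        // insertAndRetrieve: the record comes back with the same state.
        RadicalStub found = repo.find(1);
        if (found == null || !"yi".equals(found.radicalName)) return false;

        // updateById: a change to the detached copy is not stored until merged.
        found.radicalName = "one";
        if (!"yi".equals(repo.find(1).radicalName)) return false;
        repo.merge(found);
        if (!"one".equals(repo.find(1).radicalName)) return false;

        // findAndDelete: after removal the id can no longer be found.
        repo.remove(1);
        return repo.find(1) == null;
    }
}
```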
5.2.3 View Controller Test
This class tests some of the functionality of the ViewController, mainly the interaction with the Repository Manager, but none of the input from the view.
insertAndRetrieve(): This method creates a Repository Manager to attempt to persist the entity objects to the database, then tests whether they can be correctly retrieved.
insertAndRetrieveById(): This method creates a Repository Manager to attempt to persist the entity objects to the database, then retrieves an object by its radical_Id.
asArrayTest(): This method tests the asArray() and getTableData() methods used to pass the entity properties to the JTable in the TabView. The Repository Manager retrieves all the records; the getTableData() helper method places them into an array, which is passed to the asArray() method. The lengths of these arrays are then checked.
testUpdate(): Tests whether changes made to a detached entity object can be merged with the database.
testRemove(): This method attempts to delete entities, testing that they have been deleted.
Figure 29: View Controller Test passed
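The conversion checked by asArrayTest() can be sketched as follows, assuming each retrieved record becomes one Object[] row and the result is wrapped in a javax.swing.table.DefaultTableModel for the JTable. The Row class and the three column names are hypothetical; the real RADICALS_TABLE has more fields, and this combines the roles of getTableData() and asArray() in one method.

```java
import java.util.List;
import javax.swing.table.DefaultTableModel;

// Sketch of turning retrieved records into the Object[][] a JTable model expects.
// Row and COLUMNS are hypothetical stand-ins for RadicalsTable and its fields.
class TableDataSketch {
    static final String[] COLUMNS = {"radical_id", "unicode", "pinyin"};

    static final class Row {
        final int id;
        final String unicode;
        final String pinyin;

        Row(int id, String unicode, String pinyin) {
            this.id = id;
            this.unicode = unicode;
            this.pinyin = pinyin;
        }
    }

    // One inner Object[] per record, one element per column.
    static Object[][] asArray(List<Row> rows) {
        Object[][] data = new Object[rows.size()][];
        for (int i = 0; i < rows.size(); i++) {
            Row r = rows.get(i);
            data[i] = new Object[] {r.id, r.unicode, r.pinyin};
        }
        return data;
    }

    // Wraps the array in a model that a JTable can display.
    static DefaultTableModel toModel(List<Row> rows) {
        return new DefaultTableModel(asArray(rows), COLUMNS);
    }
}
```

Checking the row and column counts of the model, as in the test below, corresponds to the array-length checks described for asArrayTest().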
5.3 The Visual Elements
Testing the visual elements of the editor could not feasibly be carried out with JUnit tests and datasets, so manual tests of the running application were used, testing the usability and functionality of the system simultaneously. The implementation of the Main View was the starting point for the visual tests and would be used as part of the integration tests conducted by the group.
The editor framework presents the user with a Main View screen.
Figure 30: Main View
With radicals_table selected, the view switches to allow editing of the
database table.
Figure 31: RadicalViewer input screen
5.3.1 Adding a New Record
Adding a new record with radical id 20:
Figure 32: Adding a new record
When the Save button is clicked, the Radical Viewer checks whether a record is selected in the table view. In the case of a new record (no selection), the ViewController checks that no radical with that radical_id is already stored; if none is, the user is presented with a save confirmation dialogue:
Figure 33: Save confirmation
If No is clicked, the view returns to the input screen as before; if Yes is clicked, the save process is initiated. The code below illustrates this functionality.
Figure 34: Save confirmation code
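The decision logic described above can also be sketched in Java, separated from Swing so that it runs without a display. This is not the code in Figure 34: the Outcome values, the storedIds set and the behaviour on a duplicate radical_id are assumptions, and confirm stands in for the Yes/No dialogue.

```java
import java.util.Set;
import java.util.function.BooleanSupplier;

// Hypothetical outline of the checks made when Save is clicked.
class SaveFlowSketch {
    enum Outcome { HANDLE_AS_UPDATE, DUPLICATE_ID, CANCELLED, SAVE }

    static Outcome onSave(boolean rowSelected, int radicalId,
                          Set<Integer> storedIds, BooleanSupplier confirm) {
        if (rowSelected) {
            // A selected row means the user is editing an existing record.
            return Outcome.HANDLE_AS_UPDATE;
        }
        if (storedIds.contains(radicalId)) {
            // New record whose radical_id is already stored: do not ask to save.
            return Outcome.DUPLICATE_ID;
        }
        // Ask for confirmation; only a "Yes" starts the save process.
        return confirm.getAsBoolean() ? Outcome.SAVE : Outcome.CANCELLED;
    }
}
```

In the form itself, confirm could be supplied as a lambda that calls JOptionPane.showConfirmDialog(this, "Save this record?", "Confirm", JOptionPane.YES_NO_OPTION) and compares the result with JOptionPane.YES_OPTION.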
The save process involves ensuring that the data from the fields is in the correct format for the database. A number of try/catch clauses encapsulate this, with incorrectly formatted text being reported to the user in an error box. Once the record is saved, the database can be queried to refresh the view; the refreshed view can be seen in the following figure.
Figure 35: New record added
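A minimal sketch of such format checks, assuming a radical_id field that must parse as an integer and a Unicode field holding a hexadecimal code point such as 4E00; the class name, field handling and messages are hypothetical. Collecting the messages in a list keeps each try/catch local to one field and lets all problems be reported at once.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical validation of two form fields before a save is attempted.
class FieldValidationSketch {
    static List<String> validate(String radicalIdText, String unicodeText) {
        List<String> errors = new ArrayList<>();
        try {
            Integer.parseInt(radicalIdText.trim());
        } catch (NumberFormatException e) {
            errors.add("radical_id must be a whole number, got: " + radicalIdText);
        }
        try {
            int codePoint = Integer.parseInt(unicodeText.trim(), 16);
            if (!Character.isValidCodePoint(codePoint)) {
                errors.add("Unicode value is out of range: " + unicodeText);
            }
        } catch (NumberFormatException e) {
            errors.add("Unicode must be hexadecimal, e.g. 4E00, got: " + unicodeText);
        }
        return errors;
    }
}
```

A non-empty list could then be shown in one error box, for example with JOptionPane.showMessageDialog(parent, String.join("\n", errors), "Invalid input", JOptionPane.ERROR_MESSAGE).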
5.3.2 Updating a record
Updating a record follows similar logic to adding a new record; however, the RadicalViewer class checks that a record is selected in the JTable, as it is assumed that the user will select the record there before attempting to edit it. The populateForm() method is called to fill the RadicalViewer with data from the RadicalsTable object in the ViewController. If no record is selected, the user is presented with the following message box:
Figure 36: No radical to update
Otherwise, the logic follows that of adding a new record, with the ViewController updating the existing record instead of saving a new one.
5.3.3 Removing a Record
To remove a record, the same check is made to ensure that a record is selected in the table view; this ensures that there is no attempt to delete a record that doesn't exist. If no record is selected, the user is presented with a dialogue informing them of this, as above; otherwise, they are greeted with a confirmation dialogue:
Figure 37: Remove from database confirmation
5.3.4 The SVG Panel
The SVG field from the RADICALS_TABLE is passed simultaneously to
both the SVG Panel and SVG field when the populateForm() method is
called.
Figure 38: Setup of the SVG Canvas
For test purposes, the SVG for the character 採 (cǎi) [13] was used. Its complexity and colour would test the effectiveness of the SVG Canvas, as shown in the figure below.
Figure 39: The SVG Panel and SVG Field
5.3.5 The AboutEditor and TextParser
The About editor and TextParser were tested by entering data into the
About field of a new record to test both parsing and saving. The
following input was used:
<c unicode>myTest</c>
<c-b>the colour blue</c-b>
<c-r>the colour red</c-r>
<it>test the italics</it>
normal
<c unicode>This is<c-b> a nested</c-b> example of colour<it>with
<c-r>some italics </c-r> </it> </c>
Clicking the About field on the RadicalViewer form causes the AboutEditor to appear. The text added to the rawEditor pane is shown in the figure below.
Figure 40: AboutEditor rawEditor pane input
The TextParser parses the input when the Test button is pressed, as Figure 41 shows. Note that for the purposes of this test, text in the <c unicode> tag was given a yellow font.
Figure 41: The parsed text
5.3.6 Discovered Issues
Issues were found throughout the course of testing; some, though missed by the initial test setup, were found through usability testing. Some of these are outlined here.
SVG Canvas glitch: On some occasions when saving an image to the database, the canvas would glitch, covering the entire RadicalViewer input form. An attempt was made to remedy this by placing the SVG canvas inside a JScrollPane; however, this only served to make the enlarged image scrollable.
MainView refresh: When changes are committed to the database, the user needs to query the database again, using the search button, to refresh the view. For usability, the view should refresh automatically when changes are made, to keep the user abreast of the state of the table.
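One way to provide this automatic refresh is for the controller to notify listeners after each commit, sketched here with java.beans.PropertyChangeSupport. RefreshingController and its in-memory list are hypothetical stand-ins for the ViewController and the database; in the editor, the listener would re-run the same query as the search button.

```java
import java.beans.PropertyChangeListener;
import java.beans.PropertyChangeSupport;
import java.util.ArrayList;
import java.util.List;

// Hypothetical controller that tells registered views when a change is committed.
class RefreshingController {
    private final PropertyChangeSupport support = new PropertyChangeSupport(this);
    private final List<String> rows = new ArrayList<>(); // stands in for the database table

    void addPropertyChangeListener(PropertyChangeListener listener) {
        support.addPropertyChangeListener(listener);
    }

    // Stands in for a committed save, update or remove.
    void commit(String row) {
        rows.add(row);
        // A null old value means the event fires even if the new value repeats.
        support.firePropertyChange("tableChanged", null, row);
    }

    // Stands in for re-running the search query.
    List<String> query() {
        return new ArrayList<>(rows);
    }
}
```

Listeners are called synchronously after the change is applied, so a view that re-queries inside its listener always sees the committed state; in a Swing application that refresh would be run on the event dispatch thread.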
5.4 The Database
The database was populated with data about the Chinese radicals as
described in the design section. Further information about the radicals
and any semantic relationships between the radical and the characters
indexed by it was researched and entered. This section will illustrate a
few examples of this information.
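Each entry below records the radical's Unicode value as a hexadecimal string. Assuming the field stores bare hexadecimal such as 4E00, an application could convert between that value and the displayable character as follows (the class name is hypothetical).

```java
// Converts between the hexadecimal form used in the Unicode field and the character.
class UnicodeFieldSketch {
    static String toCharacter(String hex) {
        int codePoint = Integer.parseInt(hex.trim(), 16);
        // toChars also handles code points beyond the BMP as surrogate pairs.
        return new String(Character.toChars(codePoint));
    }

    // Upper-case, at least four-digit hex code point of the first character.
    static String toHex(String character) {
        return String.format("%04X", character.codePointAt(0));
    }
}
```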
一 yī : One
Unicode: 4E00
This radical also known as “one horizontal”, lies at the root of many of
the numbers including:
二 èr (two)
三 sān (three)
五 wǔ (five)
卅 sà (thirty)
It has many meanings associated with measurement and the identification of uniqueness, such as 每 ‘each’ or 各 ‘every time’, and 统一 ‘unitary’ or ‘unified’.
The characters indexed by this radical include:
屯 tún, which is used in terms such as 屯兵 tún bīng (to garrison or station troops) and 屯聚 tún jù (to amass or assemble).
再 zài, which is used in terms such as 再版 zài bǎn (second edition) and 再次 zài cì (again), e.g. 再一次 (one more time).
丿 piě: Hook or left-falling stroke
Unicode: 4E3F
Other forms: 乀 (fu2), 乁 (yi2)
This radical indexes many different characters. It is used by onomatopoeic characters such as 乓 pāng and 乒 pīng, which are both used to describe the sound of a discharging firearm.
The character 卵 luǎn (egg) is used to describe parts of an egg, such as 卵黃 luǎn huáng (egg yolk), or objects that share some physical similarity with an egg, such as 卵形 luǎn xíng (oval-shaped) and 卵石 luǎn shí (a pebble, from its rounded shape).
Another character, 乘 chéng, is used in words which express the idea of increasing one's lot and taking advantage of opportunity. The term 乘法 chéng fǎ (multiplication) [14] can be seen as reflecting a core theme of gaining more, or increasing something. This can also be seen in phrases such as 乘机 chéng jī (to “jump at a chance”) and 乘势 chéng shì (“to strike while the iron is hot”).
八 bā: Eight
Unicode: 516B
Other forms: 丷
The number eight seems to hold some symbolic importance within the
Chinese language with 八德 bā dé: the eight virtues, the eight points of
the compass: 八方 bā fāng, the eight immortals 八仙 bā xiān and the
eight trigrams: 八卦 bā guà. The Chinese horoscope also consists of
eight characters. This radical indexes numerous characters with an equally wide range of meanings.
The character 公 gōng is used in a number of words and phrases regarding the public and being in the open, including 公安 gōng ān, which is used in the term public safety, and 公私 gōng sī, used to describe public and private interests. In a similar vein, the character 分 fēn can be seen in phrases such as 分散 fēn sàn (disperse) or 分给 fēn gěi (distribute). It is also used in words which describe relationships between people, such as 生分 shēng fen (estranged or distant).
兴 xīng appears in phrases meaning to begin or set up a task, such as 兴办 xīng bàn, with the intention of achieving some goal, such as 兴兵 xīng bīng (to start a war).
冫 bīng: Ice
Unicode: 51AB
This radical appears on the left side of numerous characters which share a relation to the cold, most notably 冰 bīng (ice), which combines the radical 冫 bīng (ice) with the character 水 shuǐ (water). Phrases which include 冰 follow this theme further, such as 冰山 bīng shān (ice mountain, or iceberg) and 冰窖 bīng jiào (an icehouse or ice cellar). Other uses include 冰霜 bīng shuāng, with connotations of high moral integrity.
The character 准 zhǔn is indexed by this radical and has a number of meanings relating to precision and strictness of order, which can be seen in phrases such as 准将 zhǔn jiàng, used to describe a number of military ranks, and 准确 zhǔn què (precise or exact). A degree of certainty and expectation of outcome is expressed here, with the regimented nature and rigour of military service an example of this. 准头 zhǔn tou is another example, used to describe accuracy.
净 jìng has some similar uses, with an implication of attention to detail and neatness, as in 干净 gān jìng (clean); it can also be found in terms combining cleanliness with meticulousness, such as 溜干二净 liū gān 'èr jìng.
卜 bǔ: To Divine
Unicode: 535C
This radical has its roots in the oracle bone inscriptions on turtle shells from the Shang Dynasty. The art of divination, 卜课 bǔ kè, involved attempting to predict (卜问 bǔ wèn) the outcome of major events such as the harvest or a battle. Many phrases including this radical make some reference to this, such as 卜骨 bǔ gǔ, the bone used for inscription, and 卜甲 bǔ jiǎ, the divination shells.
The character 占 zhàn is indexed by this radical and is used in many phrases which suggest the act of owning or asserting presence, including 占压 zhàn yā, which is used in phrases alluding to occupation. 占有 zhàn yǒu has associations with ownership, such as to possess, occupy or have, even by force (攻占 gōng zhàn).
In contrast, 卧 wò alludes to relaxation or rest; 卧车 wò chē is used in the description of a sleeper carriage, and other terms include 卧床 wò chuáng (bed) and 卧房 wò fáng (bedroom).
勹 bāo: Wrap
Unicode: 52F9
The radical bāo is a character used to describe the act of wrapping or placing an object in a bag. It also indexes a number of characters with different connotations.
够 gòu is used in words and phrases with a wide variety of meanings, from the good, 够意思 gòu yì si (great), to the not so good, 够戗 gòu qiàng (badly or horribly). The character is associated with words which give an impression of enough or sufficiency, such as 够格 gòu gé, which describes competence, or 够本 gòu běn, which can be used to express covering one's expenses.
匀 yún is another character indexed by this radical. It is used in words which denote balance, such as 匀称 yún chen. Other terms which conform to this include 匀净 yún jing, which describes uniformity and could, perhaps spuriously, be linked to the act of being “wrapped” or collected into a group. The term 匀溜 yún liu is used to describe properties of an object such as its texture and consistency.
几 jī: Table
Unicode: 51E0
This radical, meaning table, is used to describe flat surfaces, as the meaning suggests. There is also some implication of counting or measurement, with terms such as 几何 jǐ hé, used in phrases related to geometry and quantification, and 几种 jǐ zhǒng (several kinds). It is also used to ask about an amount or a time: 几时 jǐ shí (what time) and 几多 jǐ duō (how many).
The character 凡 fán is indexed by this radical and has a general connotation of commonality, or lacking any special quality, as in 凡夫 fán fū, which describes an ordinary person, and 凡是 fán shì, which expresses everything collectively. Terms such as 凡庸 fán yōng (commonplace) also affirm this theme, with an implication of mediocrity.
咒 zhòu has connotations of the slightly mystic or occult, with terms which describe the recitation of incantations and prayers, such as 咒文 zhòu wén. As in English, cursing or swearing is also linked with the supernatural or more mysterious practices, and words to this effect also contain this character, such as 咒骂 zhòu mà.
门 mén: Door
Unicode: 95E8
Other forms: 門
This radical character means door. It has meanings which relate to doors and openings, such as a door knob: 门把 mén bà. The idea of a direction or a way to progress is also expressed by this radical, as in 门道 mén dào (capability). Another main theme associated with this radical is that of status or importance, as the radical seems to have historic links to positions of power such as the aristocracy (门阀 mén fá), government positions and buildings.
The character 闭 bì, which is indexed by this radical, has associations with barriers and cordoning off an area. Terms such as 闭谷 bì gǔ and 闭关 bì guān are used to describe isolation and a life of seclusion.
In contrast, 闻 wén has connotations of being well known, with a hint of celebrity status (闻达 wén dá). Other terms are related to the spread of news and hearing or getting wind of gossip. The character 阐 chǎn has a related meaning, as it is used in words which describe explanation or clarification, such as 阐明 chǎn míng.
人 rén: Human being
Unicode: 4EBA
Other forms: 亻
This character radical's main association is that of a person, with numerous words of a personal nature such as 人工 rén gōng (man-made) and 人格 rén gé, which is used to describe a person's character. Other terms relate to crowds and populations.
The character 舒 shū has connotations of the relaxed and blissful with terms
such as 舒服 shū fu, which describes satisfaction and a sense of well-being, and
舒畅 shū chàng which expresses having no cares or worries.
从 cóng is a character which describes the act of people grouping together. The character can be seen as the combination of two 人 (person) characters, forming a group. 从命 cóng mìng denotes the idea of conforming, and 从军 cóng jūn describes the act of joining or enrolling.
工 gōng: Work
Unicode: 5DE5
The main semantic theme of this radical is that of employment and
work. This includes places of work such as industrial areas (工厂 gōng
chǎng) and craftsmen and artisans such as 工匠 gōng jiàng. Other
examples include 工交 gōng jiāo (industry) and 工区 gōng qū (business
area).
Characters indexed by this radical include 巫 wū, which can be found in terms related to the mysterious and magical, such as 巫婆 wū pó (sorceress) and 巫师 wū shī (wizard). It should be noted, however, that medical doctor, witch doctor and wizard are described using the same terms, which suggests a widespread belief in the healing power of the supernatural, even if this was in a bygone era.
攻 gōng has uses of a more confrontational nature, with terms which denote the act of attacking or defending, such as 攻打 gōng dǎ. This could relate to the theme of work and business through competition, with businesses battling one another for supremacy.
Chapter Summary
This Chapter described the implementation and testing process,
including the JUnit testing of the core system functionality and the
usability testing of the graphical user interface components. Some
examples of database entries were also shown to give a flavour of the
type of information contained in the RADICALS_TABLE About field.
Chapter 6: Evaluation
This stage of the Zhongwen Youxi He project required the collaboration of a group to conduct research, record their findings and develop a system to enable them to do this. This dissertation has documented the stages of the project under my responsibility. The success of this process will be assessed in this chapter. This involves measuring the extent to which the final products meet their objectives, suggesting some improvements, and critically appraising the approaches taken at each stage of the project.
6.1 The Editor
The database editor application's success can be measured by its ability to meet the objectives described in the early stages of the project. The requirement to allow data in the RADICALS_TABLE to be added, removed and updated was met, as the system provided this functionality, with a user interface to make the operations more user friendly. There are, however, a number of areas which could have been improved, and these are addressed here.
The user interface could have been improved by allowing the main screen to be resized, giving better adaptability to users with different screens and size preferences. More flexible layout managers, such as FlowLayout, could have been chosen; this would have made positioning elements more involved but would have provided a much more fluid user experience.
The system handled user errors, such as attempts to save incorrect data types to the database, mainly by ignoring them. The try/catch clauses ensured that no incorrect data was sent to the database, with output to the console reporting the error “correction”, but a more informative approach would have been a message more suited to the user interface, such as a message box informing the user of their mistake and allowing them to rectify it before continuing. This would improve the user's involvement with the system, as the mistake might otherwise be assumed to be the system's.
The TextParser class utilised custom tags to render text as HTML in a JEditorPane. These tags each consisted of a start tag and an end tag enveloping some text, e.g. <c unicode> my text </c>. This was initially designed to enable users to nest tags with different properties inside each other, as demonstrated in the testing. This also allowed users to overlap tags, applying a certain style from the start tag onwards. This, however, requires concentration on the user's part to ensure that start tags are matched with the correct end tag in order to end the formatting. A simpler approach may have been to allow the presence of a close tag “>” to denote the end of the most recently applied formatting. Though this would prevent overlapping, it could be argued that overlapping is a needless complication in styling text, which will rarely require overlapping tags. A more automated system may also have helped, with the About Editor automatically adding the end tag once a start tag is added, allowing the user to add their text between the two.
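The simpler scheme suggested above can be sketched with a stack: each start tag pushes its matching HTML close tag, and a generic close tag pops the most recent one, so tags can nest but never overlap. The generic close is written here as </>, an assumption chosen so that an ordinary > in the text is not consumed; the tag table is also hypothetical. Closing any tags still open at the end mirrors the idea of the editor adding end tags automatically.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;

// Hypothetical stack-based parser where a generic close ends the latest formatting.
class StackParserSketch {
    // Hypothetical mapping from custom start tags to HTML open/close pairs.
    private static final Map<String, String[]> TAGS = Map.of(
        "<c-b>", new String[] {"<font color=\"blue\">", "</font>"},
        "<c-r>", new String[] {"<font color=\"red\">", "</font>"},
        "<it>", new String[] {"<i>", "</i>"});
    // Hypothetical generic close marker.
    private static final String CLOSE = "</>";

    static String parse(String input) {
        StringBuilder out = new StringBuilder();
        Deque<String> open = new ArrayDeque<>(); // HTML close tags, innermost on top
        int i = 0;
        while (i < input.length()) {
            if (input.startsWith(CLOSE, i)) {
                if (!open.isEmpty()) {
                    out.append(open.pop()); // stray closes are ignored
                }
                i += CLOSE.length();
                continue;
            }
            String matched = null;
            for (String tag : TAGS.keySet()) {
                if (input.startsWith(tag, i)) {
                    matched = tag;
                    break;
                }
            }
            if (matched != null) {
                String[] html = TAGS.get(matched);
                out.append(html[0]);
                open.push(html[1]);
                i += matched.length();
            } else {
                out.append(input.charAt(i));
                i++;
            }
        }
        // Auto-close anything left open so the HTML stays well-formed.
        while (!open.isEmpty()) {
            out.append(open.pop());
        }
        return out.toString();
    }
}
```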
It could be argued that the JPA architecture, though providing a simple foundation for a database system, limited the functionality of the system by only allowing the table definitions specified in the configuration files. It is possible that using JDBC directly would allow a more flexible application, with the database tables specified more dynamically at runtime.
6.2 The Database
The database was designed to allow information to be entered and retrieved by higher level applications. The design therefore placed minimal foreign key constraints on the database. The applications developed to interact with the database will infer the relationships between the tables and collate data from the database to be presented to the user. Though it could be argued that this may result in some data redundancy in areas of the database with arbitrary primary keys, this should not occur in the RADICALS_TABLE, as its format follows that of the TICCC.
The main objective of the RADICALS_TABLE was to index the Chinese character radicals. This objective was met, as explained in the design; however, there was also a desire to include some additional information about the radicals, including some explanation of the semantic connotations of each radical and the characters indexed by it. The quality of this information is a subjective measure, but as the information was gathered from a range of sources with my own input, it could be argued that some value has been added to the original information. Due to the distributed and sparse nature of official data, sources such as dictionaries and textbooks were used as a starting point, and sources from the internet were used to supplement these. The selection of these sources is open to criticism, as I chose those I could understand, i.e. those in English. More official or prestigious sources could have been used had I a greater understanding of the Chinese language, or had I enlisted the aid of a native Chinese speaker to help me understand this information.
Chapter Summary
My contribution to the project was appraised in this chapter by
evaluating the extent to which the deliverables met their objectives and
some possible improvements to various areas of the project.
Chapter 7: Conclusions and Future Work
This stage of the Zhongwen Youxi He project aimed to create the
foundation for an interactive learning tool, to allow non-native English
speakers to learn about the Chinese characters. This stage concentrated
on the creation of the database to store the data, with a means to enter
this data. This dissertation has documented this process, and my
individual attempt to contribute to meeting these objectives.
This stage of the project can be generally seen as a prototype for future
iterations of the Zhongwen Youxi He project. The database design was
functionally adequate for the current project, but perhaps in the future
requirements may change and the design may be refined. The data
stored in the database, more specifically the RADICALS_TABLE can be
seen as one part of a vast repository of data from which information can
be drawn. When the cumulative efforts of the group are combined the
project contains data on the Chinese character radicals, phonetics and
syllables. Though no character or component data was stored at this stage, there has been sufficient contribution and sourcing of information for the other areas to be addressed at a later time.
The database editor can be seen as an experimental prototype for future iterations of the higher level functionality that the Zhongwen Youxi He project will require. The SVG panel from the RadicalViewer will provide a platform-independent viewing window for the characters. The use of SVG also lends itself to programmatic functions, such as searching by path, stroke or shape. These images can also be manipulated programmatically, eventually allowing users with the correct access rights to add and update the images stored in the system. Colours could be used to differentiate the radical or components that make up various characters, or the radical components could be colour coded by their general connotation, e.g. positive or negative.
The system may also utilise some form of analysis on external sources
of data such as characters copied and pasted by users into the system
from an external website, whether in Unicode or SVG format, allowing
the character to be matched and information retrieved to give examples
of other uses.
The future applications may be able to recognise audio input, matching
user spoken words to the pinyin entries in the system and retrieving the
information accordingly. Audio feedback on correct pronunciation could
also be included. This would help to ensure that future applications are
more rounded and complete, encompassing a vast number of sources
and utilising a number of forms of interaction.
Chapter Summary
The deliverables have been discussed with regard to their relative success and their place in the wider Zhongwen Youxi He project. By commenting on the outcome of the project as the foundation of a language tool, and suggesting some possible future enhancements, this chapter concludes both the dissertation and this stage of the Zhongwen Youxi He project.
References
[1]. Chinese Characters: Their Origin, Etymology, History,
Classification and Signification; A thorough study from Chinese
documents; by Léon Wieger; Publisher: Dover Publications; New
issue of 1927 edition (Jun 1 1965); ISBN-10: 0486213218;
ISBN-13: 978-0486213217. Page 14.
[2]. Success With Chinese, Level 1, Reading & Writing: A
Communicative Approach for Beginners (Paperback); by De-An
Wu Swihart; Paperback: 224 pages; Publisher: Cheng & Tsui; 2nd
edition (Jan 2007); Language English; ISBN-10: 0887276016;
ISBN-13: 978-0887276019. pp 6 – 8.
[3]. Planning Chinese Characters: Reaction, Evolution or Revolution?
(Language Policy) (Hardcover); by Shouhui Zhao (Author), Richard
B. Jr. Baldauf (Author); Hardcover: 420 pages; Publisher:
Springer (27 Dec 2007); Language English; ISBN-10:
0387485740; ISBN-13: 978-0387485744; pp 10 -15.
[4]. Lexicography Crit Concepts V2: 002 (Hardcover); by HARTMANN
R; Hardcover: 3 pages; Publisher: Routledge; illustrated edition
(29 Sep 2000); Language English; ISBN-10: 0415253675;
ISBN-13: 978-0415253673; pp 159, 161.
[5]. A History of Chinese Calligraphy (Hardcover); by Yuho Tseng;
Hardcover: 446 pages; Publisher: The Chinese University Press;
2nd edition (31 Dec 1998); Language English; ISBN-10:
9622014267; ISBN-13: 978-9622014268; p12.
[6]. Unicode Demystified: A Practical Programmers Guide to the
Encoding Standard (Paperback); by Richard Gillam; Paperback:
896 pages; Publisher: Addison Wesley (27 Sep 2002); Language
English; ISBN-10: 0201700522; ISBN-13: 978-0201700527; p357.
[7]. Chinese Radicals; Wikipedia; Wikimedia Foundation, Inc;
http://en.wikipedia.org/wiki/Radical_%28Chinese_character%29#Semantic_Elements; Last Accessed 10/05/10.
[8]. Comparison of different DB Technologies; by Troels Arvin;
http://troels.arvin.dk/db/rdbms/; Last updated 06/04/10; Last
Accessed 10/05/10.
[9]. Database Design: Know It All (Morgan Kaufmann Know It All)
(Hardcover); by Toby J. Teorey, Stephen Buxton, Lowell Fryman,
Ralf Hartmut Güting, Terry Halpin, Jan L. Harrington, William H.
Inmon, Sam S. Lightstone, Jim Melton, Tony Morgan, Thomas P.
Nadeau, Bonnie O'Neil, Elizabeth O'Neil, Patrick O'Neil, Markus
Schneider, Graeme Simsion, Graham Witt; Hardcover: 368 pages;
Publisher: Morgan Kaufmann (12 Nov 2008); Language English;
ISBN-10: 0123746302; ISBN-13: 978-0123746306; pp 1-10.
[10]. Test Driven Development (The Addison-Wesley Signature Series); by Kent Beck;
Paperback: 240 pages; Publisher: Addison Wesley (20 Nov 2002);
Language English; ISBN-10: 0321146530; ISBN-13: 978-0321146533.
[11]. MySQL; Wikipedia; Wikimedia Foundation, Inc.;
http://en.wikipedia.org/wiki/MySQL; Last Accessed 10/05/10.
[12]. MEPRC (2009): Ministry of Education of the People's Republic of China, Table of Indexing Chinese Character Components, 中华人民共和国教育部, 2009.
[13]. File:Chinese character 採 cai3 pick with ROOT colored.svg;
Wikipedia.org;
http://en.wikipedia.org/wiki/Radical_(Chinese_character); Last
Accessed 15/07/10.
[14]. Chinese Tools.com; 2008; http://www.chinese-tools.com/tools/sinograms.html?q=%E4%B9%98; Last Accessed 01/08/10.
Appendix
TestBase
RadicalTableTest
RepositoryManagerTest
ViewControllerTest