KBDS - A Knowledge based concept for user-friendly selection of data in decision support systems
Søren Selmar, Institute of Production, University of Aalborg, Denmark
Keywords: Information Retrieval, Decision Support Systems, Relational Data Bases, Distributed Databases, Application Generation, End-user Computing, Knowledge Based Systems, Thesauri, Data Modelling and Prototyping.
The KBDS concept (Knowledge Based Data Selection) provides a new and more user-friendly approach to the retrieval of data from databases. A system based on the KBDS concept can act as an intermediary system between decision support tools and one or more connected databases. Such a system makes it possible for the manager himself to carry out, in a few minutes, any data selection across one or more databases.
In the first part of this paper, the problems related to data selection from databases will be dealt with. Only relational databases are considered.
In the next part, the KBDS system will be described. The KBDS system exists as a tested prototype, implemented in an ORACLE Fourth-Generation System. In this paper, however, the KBDS system will be described as it would appear in a window-based interface connected to distributed databases.
The third part of the paper gives examples of the use of the KBDS system, emphasizing its ease of use and its generality. Conclusions are drawn at the end of the paper.
Context of the KBDS Task
Data selection is a critical and problematic task, no matter whether your decision support is based simply on a traditional Fourth-Generation System or your company uses more advanced integrated DSS or EIS systems.
In the future, increased needs for on-line data and ad hoc analysis can be expected. Very large distributed databases will also increasingly be used as data sources for decision support. At the same time, much effort is being made to move decision support tools from the EDP professional's desk to the desks of managers and other end-users. That is why more advanced and intelligent tools are needed for selecting valuable decision support information.
Managers must be given advanced data selection tools if they are to meet the challenges of the future. They will need tools which, within minutes, can combine data from one or more databases. It will often be preferable that the task can be performed interactively and by the manager himself.
The KBDS system meets these future demands. Before documenting how the KBDS system works, some essential problems of data selection from databases will be discussed.
Data Selection Problems
When a manager needs data for decision support and no standard application proves appropriate, he has a problem. The company's data related to a specific object, for instance a specific employee, will often be spread across several tables and perhaps across more than one database. An exhaustive search can easily involve opening and combining (joining) several tables. If you have only a few minutes for selecting such a composite set of data, you would normally be advised to give up. Even database professionals need hours or days to perform advanced data selections of that kind, especially if an appropriate interactive search application must be built at the same time.
A fundamental problem for the end-user is that it is normally impossible to remember in which tables and in which databases data is stored. Data-dictionary data is only helpful if you already know what is hidden behind the often meaningless abbreviations used as table names and column names in the databases.
An Entity/Relationship data model of a database contains valuable information if you are going to perform advanced data selection. However, data models of that kind presume that you also know the complex relations between the data models and the actual database designs related to them. Unfortunately, data models are quite difficult to understand, and they are normally not integrated with the database. That makes it almost impossible for end-users to use the information kept in the data models when performing data selection.
Another problem is that the standardized query language SQL (Structured Query Language) is too complicated for managers and other ordinary end-users. If you need to select data across more than one database table or more than one database, you normally need SQL-skilled specialists to assist you.
The problems mentioned above are only some of those you will face if you want to select data directly from one or more databases. Other problems arise if you want to create, in a hurry, an application that can help you select the needed data set interactively. In fact, such data selections will very often be too expensive to carry out, especially if they are only needed in an ad hoc situation.
In figure 1, some of the problems in selecting data from data bases are illustrated. As you see in the figure, the user's head must hold and operate on a great deal of knowledge if he wants to carry out advanced data selection with no help from professionals.
[Figure 1: the user's head holding User's Models, Data Models, Data Base Design Models, Data Base Allocations, SQL-language or Tool Skills, and Application Design Skills.]
Figure 1. The user's head must hold a great deal of knowledge and skills if he wants to select data directly from one or more relational data bases.
The KBDS System
The KBDS system solves the data selection problems discussed above. It could be seen as "the missing link" between users and databases.
The KBDS system consists of two subsystems, which may be connected only by a data network. The first is the interface, through which the user interacts on the basis of his own well-known models. The second is the knowledge base, which holds all the knowledge necessary for supporting the users. The principles of the KBDS system are illustrated in figure 2.
[Figure 2: User's Models, the KBDS system and the KBDS KNOWLEDGE BASE.]
Figure 2. The interface and the knowledge base of the KBDS system. The KBDS system serves as an intermediary system between the manager and one or more relational databases.
The KBDS interface is a dynamic and database-neutral interface, which automatically produces a customized data selection application as the user asks for information on a specific object. At the same time, the application selects and merges the needed set of data.
The KBDS knowledge base is implemented in a relational database, and it represents and integrates all information and knowledge necessary for supporting the user through the search sessions. The knowledge base keeps the major part of the knowledge that figure 1 placed inside the head of the manager. This means that the knowledge base keeps information about how to search, where to search, and how to combine and merge data. The user only needs to specify, in one of his own models, which objects he is interested in.
The interface and the knowledge base are discussed in detail below. (If you are only interested in how to use the KBDS system, you can continue reading at the heading "The Use of Models in the KBDS System".)
The KBDS Interface
The KBDS interface is best implemented as a graphical, window-based interface, and the best way to interact with the system is with a mouse. In this way you can take full advantage of the KBDS system.
The KBDS interface is unique because it is general, self-generative, dynamic and database-neutral. These characteristics are explained below.
The interface is general because it gives you access to any data set in all of the data bases connected to the KBDS system.
The interface is self-generative because the system automatically builds up a customized data selection application during the search session. With this application, the user can select and extract the needed data sets interactively, without writing a single line of code. A few clicks with the mouse, and the system does the work for you.
The user (and possibly the programmer) can also use the KBDS interface as a prototyping tool when trying to design a standard application suitable for selecting a specific set of data.
The interface is dynamic because data models, database designs and data in all the databases connected to the KBDS system are allowed to change without affecting the functionality of the KBDS interface. You only need to update some descriptions in the knowledge base whenever data models or design models change in the connected databases.
The interface is database-neutral because it communicates in SQL and behaves in the same way no matter which relational database is used to implement the knowledge base. If a database supports ANSI SQL and can be reached over the network, it can be connected to the KBDS system. You are also free to store the knowledge base of the KBDS system itself in such a database.
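Because the interface only talks SQL, the code that issues a query can be written once against a generic connection object. The following is a minimal sketch, assuming that Python's DB-API 2.0 (PEP 249) stands in for the network connection; `sqlite3` is used only because it ships with Python, and the function `run_selection` and the table `employee` are hypothetical. Note that the parameter placeholder style differs between drivers, so a real system would also have to account for that.

```python
import sqlite3
from typing import Any, Sequence


def run_selection(conn: Any, sql: str, params: Sequence[Any] = ()) -> list[tuple]:
    """Execute one SQL statement on any PEP 249 connection and return all rows.

    Nothing here is specific to one database product; only the connection
    object (and its placeholder style) changes between databases.
    """
    cur = conn.cursor()
    try:
        cur.execute(sql, params)
        return cur.fetchall()
    finally:
        cur.close()


# Hypothetical demo data in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (surname TEXT, telephone TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [("Smith", "1111"), ("Adams", "2222")],
)
rows = run_selection(
    conn, "SELECT surname FROM employee WHERE telephone = ?", ("1111",)
)
print(rows)  # [('Smith',)]
```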
Some examples illustrating the facilities of the interface are described later in this paper.
Most of the special characteristics of the KBDS system are due to its advanced, integrated knowledge base, which is described next.
The KBDS Knowledge Base
The KBDS knowledge base is a database that captures much more meaning than traditional databases. It supports inheritance and a very high level of integrity. These characteristics are described in more detail below.
Inheritance means that you operate with superterms and subterms in a tree-like structure, often called a thesaurus. The ISO Standard 2788 "Documentation - Guidelines for the establishment and development of monolingual thesauri" is followed when building thesauri in the KBDS system. Inheritance enables the KBDS system to deal with generalization and specialization in dialogues with the users, and it supports a powerful data selection method used by the system.
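A thesaurus of this kind can be held as pairs of narrower and broader terms. The sketch below, with hypothetical term names modelled on the staff example used later in the paper, shows how a search on a superterm can be widened to all its subterms (specialization) and how the superterms of a term can be collected (generalization). It illustrates the inheritance idea only; it is not the KBDS storage format.

```python
from collections import defaultdict

# Hypothetical (narrower term, broader term) pairs; a term may have
# more than one broader term.
BROADER = [
    ("Managers", "Employees"),
    ("Sales Managers", "Managers"),
    ("Project Managers", "Managers"),
    ("Secretaries", "Employees"),
]

narrower = defaultdict(set)
broader = defaultdict(set)
for nt, bt in BROADER:
    narrower[bt].add(nt)
    broader[nt].add(bt)


def _closure(term: str, edges: dict) -> set:
    """All terms reachable from `term` along `edges`, excluding term itself."""
    seen, stack = set(), [term]
    while stack:
        for nxt in edges[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def specialize(term: str) -> set:
    """The term plus all its subterms (used to widen a search)."""
    return {term} | _closure(term, narrower)


def generalize(term: str) -> set:
    """All superterms of the term."""
    return _closure(term, broader)


print(sorted(specialize("Managers")))
# ['Managers', 'Project Managers', 'Sales Managers']
print(sorted(generalize("Sales Managers")))
# ['Employees', 'Managers']
```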
Integrity in the knowledge base is a crucial feature of the system. The knowledge base is designed according to the fully integrated meta-model shown in figure 3.
[Figure 3: model levels, from the Entity-type level (Thesaurus) down to the Entity level. Signatures: Entity-type, Entity, Relationship, Data, SQL-statement.]
Figure 3. The meta-model of the KBDS knowledge base.
The meta-model shown in figure 3 is coherent from the entity-type level down to the data level. The entity-types constitute the inheritance structure, which is also the system's thesaurus. It serves as an effective data selection tool for users, and every user may customize his own part and version of the system's thesaurus for use in his personal KBDS interface.
The entity-types of the system's thesaurus are at the same time suitable for term indexing of all sorts of documents, files, etc. The thesaurus can then be used as a controlled indexing vocabulary, giving the user an effective tool for retrieving all sorts of large knowledge representations such as documents and computer programs. This is indeed what thesauri were originally intended for, according to ISO 2788.
At the bottom of the meta-model in figure 3, you can choose between saving data directly in the knowledge base or saving an SQL statement to be executed over the data network when external databases must be contacted. These SQL statements can be used to bring on-line data into the KBDS system, or to download data from external databases to the KBDS knowledge base.
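The choice at the bottom of the meta-model, storing a value or storing an SQL statement, can be pictured as a data item that either carries its value or knows how to fetch it. This is a minimal sketch under stated assumptions: the class `DataItem` and the external table `phone_book` are hypothetical, and an in-memory `sqlite3` database stands in for an external database reached over the network.

```python
import sqlite3
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class DataItem:
    """A hypothetical data-level element: a stored value or a stored SQL statement."""
    value: Any = None
    sql: Optional[str] = None

    def resolve(self, conn: Optional[sqlite3.Connection] = None) -> Any:
        if self.sql is None:
            return self.value                      # data kept in the knowledge base
        if conn is None:
            raise ValueError("an SQL item needs a connection")
        row = conn.execute(self.sql).fetchone()    # on-line fetch
        return None if row is None else row[0]

    def download(self, conn: sqlite3.Connection) -> None:
        """Copy the external value into the knowledge base (a local snapshot)."""
        self.value = self.resolve(conn)
        self.sql = None


external = sqlite3.connect(":memory:")
external.execute("CREATE TABLE phone_book (surname TEXT, phone TEXT)")
external.execute("INSERT INTO phone_book VALUES ('Smith', '1111')")

local = DataItem(value="Sales")
remote = DataItem(sql="SELECT phone FROM phone_book WHERE surname = 'Smith'")
print(local.resolve(), remote.resolve(external))  # Sales 1111
remote.download(external)
print(remote.sql, remote.value)  # None 1111
```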
If you want to download data from, or carry out an on-line search in, other relational databases, you have to tell the knowledge base where the data sets are logically stored. The KBDS system will then be able to use such data sets as if they were stored in the knowledge base itself.
Besides the meta-model shown in figure 3, some meta-rules are used to control and maintain the KBDS knowledge base. Some of these rules are shown in figure 4. Together, the meta-model and the meta-rules make it possible to build up and maintain a very high degree of integrity and consistency in the knowledge base.
A. Every entity-type must be included in the same coherent thesaurus
B. Every individual entity must be connected to at least one entity-type
C. Every knowledge statement must include exactly one relationship
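The three meta-rules lend themselves to mechanical checking. The sketch below assumes a hypothetical in-memory layout (a set of entity-types, a set of narrower/broader pairs, a map from entities to entity-types, and knowledge statements held as lists of `(kind, name)` elements) and reports violations. How KBDS actually enforces the rules inside its relational knowledge base is not described in this paper.

```python
from collections import defaultdict


def check_meta_rules(entity_types, broader_pairs, classification, statements):
    """Return human-readable meta-rule violations (an empty list if none).

    entity_types   -- set of entity-type names
    broader_pairs  -- set of (narrower, broader) entity-type pairs
    classification -- dict: entity -> set of entity-types
    statements     -- list of statements, each a list of (kind, name)
    """
    problems = []

    # Rule A: all entity-types belong to one connected thesaurus.
    adjacent = defaultdict(set)
    for nt, bt in broader_pairs:
        adjacent[nt].add(bt)
        adjacent[bt].add(nt)
    unvisited, parts = set(entity_types), 0
    while unvisited:
        parts += 1
        stack = [unvisited.pop()]
        while stack:
            for nxt in adjacent[stack.pop()]:
                if nxt in unvisited:
                    unvisited.remove(nxt)
                    stack.append(nxt)
    if parts > 1:
        problems.append(f"A: entity-types form {parts} separate thesauri")

    # Rule B: every entity is connected to at least one entity-type.
    for entity, types in sorted(classification.items()):
        if not types & entity_types:
            problems.append(f"B: entity {entity!r} has no entity-type")

    # Rule C: every knowledge statement has exactly one relationship.
    for i, stmt in enumerate(statements):
        n = sum(1 for kind, _ in stmt if kind == "relationship")
        if n != 1:
            problems.append(f"C: statement {i} has {n} relationships")
    return problems


types = {"Employees", "Managers", "Sales Managers", "Products"}
pairs = {("Managers", "Employees"), ("Sales Managers", "Managers")}
classes = {"Smith": {"Sales Managers"}, "Jones": set()}
stmts = [
    [("entity", "Smith"), ("relationship", "manages"), ("entity", "Project A")],
    [("entity", "Smith"), ("entity", "Jones")],
]
for p in check_meta_rules(types, pairs, classes, stmts):
    print(p)
# A: entity-types form 2 separate thesauri
# B: entity 'Jones' has no entity-type
# C: statement 1 has 0 relationships
```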
Figure 4. Some of the most important meta-rules of the KBDS knowledge base.
Meta-rule A in figure 4 ensures that you will always find one coherent system thesaurus holding all the entity-types of the knowledge base. Typically this thesaurus could be a shared company thesaurus, and you will normally find most of the needed entity-types in the company's Entity/Relationship models.
Multiple inheritance in the thesaurus (an entity-type with more than one superterm) causes no problems as long as you avoid inconsistency in the inheritance structure. Loops are not allowed.
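Since several superterms are allowed but loops are not, the inheritance structure must stay a directed acyclic graph. A minimal sketch of a check that could be run before a new narrower/broader link is accepted; the function name `creates_loop` and the term names are hypothetical.

```python
from collections import defaultdict


def creates_loop(broader_pairs, new_pair):
    """True if adding new_pair = (narrower, broader) would close a loop.

    A loop appears exactly when `narrower` can already be reached by going
    upward from `broader`, i.e. when `narrower` is a superterm of `broader`
    (or the two are the same term).
    """
    up = defaultdict(set)
    for nt, bt in broader_pairs:
        up[nt].add(bt)
    narrower, broader = new_pair
    seen, stack = set(), [broader]
    while stack:
        term = stack.pop()
        if term == narrower:
            return True
        for nxt in up[term]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


pairs = {
    ("Managers", "Employees"),
    ("Sales Managers", "Managers"),
    ("Project Managers", "Managers"),
}
# A second superterm for Sales Managers is fine...
print(creates_loop(pairs, ("Sales Managers", "Project Managers")))  # False
# ...but making Employees a subterm of Sales Managers closes a loop.
print(creates_loop(pairs, ("Employees", "Sales Managers")))  # True
```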
Meta-rule B ensures that all individual entities are classified according to the thesaurus. Of course, many entities belong to more than one entity-type. For example, a specific employee can belong to both "Sales Managers" and "Project Managers". But if you specify that he belongs to "Sales Managers", you do not need to tell the system that he also belongs to "Managers": because of the built-in inheritance structure, the system already knows that all "Sales Managers" are "Managers". That is one of the ways inheritance is used in the KBDS system.
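The "Sales Managers" argument amounts to a transitive closure: an entity belongs to every entity-type it is explicitly classified under and to all their superterms. A minimal sketch with hypothetical names, in which only the direct classification is stored and everything else is derived.

```python
from collections import defaultdict

# Hypothetical thesaurus links (narrower, broader) and direct classifications.
BROADER = [
    ("Managers", "Employees"),
    ("Sales Managers", "Managers"),
    ("Project Managers", "Managers"),
]
DIRECT = {"Smith": {"Sales Managers", "Project Managers"}, "Jones": {"Employees"}}

up = defaultdict(set)
for nt, bt in BROADER:
    up[nt].add(bt)


def entity_types_of(entity: str) -> set:
    """Direct entity-types of an entity plus all inherited superterms."""
    result, stack = set(), list(DIRECT.get(entity, ()))
    while stack:
        et = stack.pop()
        if et not in result:
            result.add(et)
            stack.extend(up[et])
    return result


def members_of(entity_type: str) -> set:
    """All entities that belong to entity_type, directly or by inheritance."""
    return {e for e in DIRECT if entity_type in entity_types_of(e)}


print(sorted(entity_types_of("Smith")))
# ['Employees', 'Managers', 'Project Managers', 'Sales Managers']
print(sorted(members_of("Managers")))  # ['Smith']
```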
Meta-rule C in figure 4 states how any "knowledge representation element" kept in the KBDS knowledge base is formulated. In fact, every knowledge representation element is formulated in readable natural language in the knowledge base, which makes it easier to read and manipulate the knowledge representation directly.
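Meta-rule C and the natural-language formulation fit together if each knowledge representation element is held as a subject, exactly one relationship, and an object. A minimal sketch; the `Statement` class, the relationship wordings and the example names are assumptions, not the KBDS format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    """A hypothetical knowledge representation element: subject, ONE relationship, object.

    The single `relationship` field enforces meta-rule C structurally;
    the check below only rejects an empty relationship.
    """
    subject: str
    relationship: str
    obj: str

    def __post_init__(self):
        if not self.relationship.strip():
            raise ValueError("a statement needs exactly one relationship")

    def as_sentence(self) -> str:
        return f"{self.subject} {self.relationship} {self.obj}."


kb = [
    Statement("Sales Managers", "is a subterm of", "Managers"),
    Statement("Smith", "belongs to", "Sales Managers"),
    Statement("Smith", "is responsible for", "Project A"),
]
for s in kb:
    print(s.as_sentence())
# Sales Managers is a subterm of Managers.
# Smith belongs to Sales Managers.
# Smith is responsible for Project A.
```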
At the lowest logical level of the knowledge base, surrogates are used intensively for performance purposes. However, the user never has to use these internal identification numbers.
The KBDS knowledge base also has some other remarkable characteristics. The system's knowledge base can keep itself free of redundancy, and it is automatically kept fully normalized and fully indexed. These features are not discussed further in this paper.
The Use of Models in the KBDS System
The KBDS system is model-based, which means that the knowledge base contains and operates on different sorts of models or knowledge structures defined by the users.
Tree structures or tree-like structures can be handled dynamically by the KBDS system: the system uses such models graphically in the interface, and it is able to generate and maintain them. Figure 5 shows some examples of models that can be used.
[Figure 5 panels: Inheritance Structure, Activity Network, Plant Lay-out.]
Figure 5. All types of models can be used in the KBDS system. Users define their own models and use them for object identification and as starting points for data selection.
In the upper left corner of figure 5 you can see an inheritance structure. Referring to figure 3, this kind of model belongs to the entity-type level, because it consists purely of entity-types.
Figure 5 also shows an organizational structure, which is quite another sort of tree structure. This structure is built only of individual entities, and there is no inheritance in such a structure. It therefore corresponds to the entity level in figure 3.
The activity network and the plant lay-out in figure 5 are made of individual entities too, but as they are not tree-like structures, they cannot be generated and maintained automatically by the system. However, they can be used like other entity-level models for identifying entities in the interface of the KBDS system.
How to Select Data with the KBDS System
Thanks to the integrity and coherence of the KBDS system's meta-model (figure 3), you can start your search wherever you like.
As the top menu bar of the KBDS interface lay-out in figure 6 shows, you can choose different starting points for your search session. If you click on MODELS, the system lets you choose between Thesauri, Trees and ~ models.
Traditional data search in the data tables of the KBDS knowledge base is possible too, but it is not recommended.
Data Selection Based on the User's Thesaurus: An Example
The following example shows how the KBDS interface supports the search strategy in a user-friendly and effective way. The Staff Thesaurus from figure 5 is used as the starting point. See figure 6.
[Figure 6 screen: list of thesauri: Company Thesaurus, Private Thesaurus, Staff Thesaurus, Product Thesaurus.]
Figure 6. Search based on the user's Staff Thesaurus. The tree-structured thesaurus is ready for a mouse-click on the actual object.
The screen in figure 6 illustrates how MODELS and then Thesauri have been chosen. The system then shows a list of different thesauri. The user chooses Staff Thesaurus, and this thesaurus is then shown, ready for a click on one of the six types of Employees.
When you make a choice in a thesaurus, you are advised to make it as specific as possible, since this allows the system to optimize the search. To be specific, choose a suitable entity-type as low as possible in the tree structure.
If you click on Managers, the next screen automatically shows a distinct list of column names and names of relationships. Some of these names correspond to columns in different database tables. See the list of columns and relationships in figure 7.
[Figure 7 screen (MODELS menu): list of columns and relationships, numbered in click order: 1 Surnames, 2 Name, 3 Telephone, 4 Staff, 5 Projects; Salary not selected.]
Figure 7. All columns and relationships connected to Managers in the knowledge base and in connected databases are shown. The user is free to click on whichever columns he wants data from.
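The list in figure 7 can be thought of as the union of the columns of every table that the knowledge base links to the chosen entity-type, plus the names of its relationships, with duplicates removed. A minimal sketch; the location map `COLUMN_LOCATIONS` and all database, table and column names are hypothetical.

```python
# Hypothetical knowledge-base entries: entity-type -> where its data lives.
COLUMN_LOCATIONS = {
    "Managers": [
        ("hr_db", "emp", "Surnames"),
        ("hr_db", "emp", "Name"),
        ("hr_db", "pay", "Salary"),
        ("phone_db", "ext", "Telephone"),
        ("phone_db", "ext", "Name"),   # same column name in another table
    ],
}
RELATIONSHIPS = {"Managers": ["Staff", "Projects"]}


def selectable_names(entity_type: str) -> list[str]:
    """Distinct column and relationship names offered to the user, first-seen order."""
    names = [col for _db, _tbl, col in COLUMN_LOCATIONS.get(entity_type, [])]
    names += RELATIONSHIPS.get(entity_type, [])
    return list(dict.fromkeys(names))  # drop duplicates, keep order


print(selectable_names("Managers"))
# ['Surnames', 'Name', 'Salary', 'Telephone', 'Staff', 'Projects']
```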
To get the data output from the system, you click on the columns whose data you want included. The order in which you click also determines the order of the columns in the resulting data table. When you have finished pointing out columns, you click on DATATABLE in the top menu bar. See the screen in figure 7.
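Since the click order fixes the column order of the output, a recorded list of clicks is enough to generate the projection of a query. A minimal sketch that assumes, for simplicity, that all chosen columns live in one table; the functions `quote_ident` and `build_select` and the table name are hypothetical, and identifiers are quoted ANSI-style with double quotes.

```python
def quote_ident(name: str) -> str:
    """Quote an SQL identifier ANSI-style, doubling any embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_select(table: str, clicked: list[str]) -> str:
    """SELECT the clicked columns in the order they were clicked.

    A repeated click on the same column is ignored (a hypothetical choice).
    """
    ordered = list(dict.fromkeys(clicked))
    if not ordered:
        raise ValueError("no columns selected")
    cols = ", ".join(quote_ident(c) for c in ordered)
    return f"SELECT {cols} FROM {quote_ident(table)}"


print(build_select("managers", ["Surnames", "Name", "Telephone"]))
# SELECT "Surnames", "Name", "Telephone" FROM "managers"
```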
The KBDS system then selects and merges data from one or more connected databases, or possibly only from the knowledge base itself. A few seconds later, the system shows your data set in a data table, as illustrated in figure 8.
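Selecting and merging from more than one database can be sketched as running one query per source and joining the partial results on a shared entity key. The sketch below uses two separate in-memory SQLite databases as stand-ins for networked sources and joins on a hypothetical employee number in Python, assuming at most one matching row per key. How KBDS performs the merge internally is not specified in this paper.

```python
import sqlite3

# Two hypothetical, independent databases.
hr = sqlite3.connect(":memory:")
hr.execute("CREATE TABLE emp (emp_no INTEGER, surname TEXT, name TEXT)")
hr.executemany("INSERT INTO emp VALUES (?, ?, ?)",
               [(1, "Smith", "Ann"), (2, "Adams", "Bo")])

phones = sqlite3.connect(":memory:")
phones.execute("CREATE TABLE ext (emp_no INTEGER, telephone TEXT)")
phones.executemany("INSERT INTO ext VALUES (?, ?)", [(1, "1111"), (2, "2222")])


def merge_sources(left, right):
    """Left outer join of two row lists whose first field is the entity key.

    Assumes at most one row per key in `right`; unmatched rows in `left`
    are padded with None.
    """
    width = max((len(r) for r in right), default=1) - 1
    lookup = {row[0]: row[1:] for row in right}
    return [row + lookup.get(row[0], (None,) * width) for row in left]


left_rows = hr.execute("SELECT emp_no, surname, name FROM emp ORDER BY emp_no").fetchall()
right_rows = phones.execute("SELECT emp_no, telephone FROM ext").fetchall()
for row in merge_sources(left_rows, right_rows):
    print(row[1:])  # the key acts as an internal surrogate and is not shown
# ('Smith', 'Ann', '1111')
# ('Adams', 'Bo', '2222')
```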
[Figure 8 screen: top menu bar MODELS | RELATIONSHIPS | DATATABLE | LANGUAGE | MACROES | DATAEXC.; a data table headed "Data on: Managers", with Query and Score fields and the columns Surnames, Name, Telephone, Staff and Projects, one row per manager.]
Figure 8. The KBDS system shows a virtual data table as data output. This data table consists of data selected and merged from one or more data bases or from the knowledge base itself.
At this point, the most critical and problematic part of the data selection has been carried out. The user has been able to perform an advanced data selection, possibly based on several tables and several databases.
Some of the data elements in figure 8 are prefixed by small squares, which indicate that only one of several data elements in the field is shown. If you click on the square, the system shows a list of all the values in the data field.
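A manager may, for example, be related to several projects, so some fields of the virtual data table hold more than one value. A minimal sketch of the display rule described above: show the first value with a marker when more exist, and return the full list when the marker is clicked. The ASCII marker `[]` stands in for the small square, and the function names and data are hypothetical.

```python
from typing import Sequence

MARKER = "[]"  # stands in for the small on-screen square


def display(values: Sequence[str]) -> str:
    """Cell text: the single value, or the first value prefixed by a marker."""
    if not values:
        return ""
    if len(values) == 1:
        return values[0]
    return f"{MARKER} {values[0]}"


def expand(values: Sequence[str]) -> list[str]:
    """What a click on the marker reveals: every value in the field."""
    return list(values)


projects = {"Smith": ["Project A", "Project B"], "Adams": ["Project C"]}
for who, vals in projects.items():
    print(who, display(vals))
# Smith [] Project A
# Adams Project C
```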
If you only need part of the data set shown in figure 8, you can use traditional Query By Example to reduce the number of records.
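In Query By Example the user types example values or simple conditions into the columns of the table, and only matching records are kept. A minimal sketch of that filtering step over rows held as dictionaries; the condition syntax (a leading comparison operator such as `>`, otherwise an exact match) is a simplified assumption, not the QBE dialect of any particular product, and the salary figures are made up.

```python
import operator

OPS = {">=": operator.ge, "<=": operator.le, ">": operator.gt,
       "<": operator.lt, "=": operator.eq}


def matches(value, condition: str) -> bool:
    """Test one field against a QBE-style condition such as '>5000' or 'Smith'."""
    for symbol in (">=", "<=", ">", "<", "="):   # longest operators first
        if condition.startswith(symbol):
            target = condition[len(symbol):].strip()
            if isinstance(value, (int, float)):
                target = type(value)(target)     # compare numbers as numbers
            return OPS[symbol](value, target)
    return str(value) == condition


def qbe_filter(rows, example: dict):
    """Keep rows that satisfy every filled-in column of the example row."""
    return [r for r in rows
            if all(matches(r[col], cond) for col, cond in example.items())]


rows = [
    {"Surnames": "Smith", "Salary": 5200},
    {"Surnames": "Adams", "Salary": 4100},
    {"Surnames": "Grace", "Salary": 6100},
]
print([r["Surnames"] for r in qbe_filter(rows, {"Salary": ">5000"})])
# ['Smith', 'Grace']
```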
As you can see, any user will be able to handle data selection if given an intelligent and user-friendly tool like the KBDS system. The user does not need to learn anything about data models, database design, database languages and so on. The system ensures that he gets exactly the data he needs for further analysis and decision support.
For each column you want to add to the output data table, you only have to click on one more column in the screen shown in figure 7.
It is hard to imagine a simpler or more user-friendly way of performing advanced data selection.
If you had recorded your mouse clicks from the beginning, you could afterwards run the application again as a macro application. The application would then automatically select the needed data set, based on on-line data, every time you need it.
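Recording the mouse clicks as a macro means storing the sequence of user choices rather than the data, so that replaying it later re-runs the selection against current on-line data. A minimal sketch; the `Macro` class, the action names and the `run` callback are hypothetical stand-ins for the interface, and JSON is used only as a convenient storage format.

```python
import json
from typing import Callable


class Macro:
    """Records interface choices and replays them later."""

    def __init__(self) -> None:
        self.steps: list[tuple[str, str]] = []

    def record(self, action: str, argument: str = "") -> None:
        self.steps.append((action, argument))

    def save(self) -> str:
        return json.dumps(self.steps)

    @classmethod
    def load(cls, text: str) -> "Macro":
        macro = cls()
        macro.steps = [tuple(step) for step in json.loads(text)]
        return macro

    def replay(self, run: Callable[[str, str], None]) -> None:
        for action, argument in self.steps:
            run(action, argument)


m = Macro()
m.record("MODELS")
m.record("Thesauri")
m.record("select", "Staff Thesaurus")
m.record("select", "Managers")
for col in ("Surnames", "Name", "Telephone"):
    m.record("column", col)
m.record("DATATABLE")

replayed = []
Macro.load(m.save()).replay(lambda a, arg: replayed.append((a, arg)))
print(len(replayed), replayed[-1])  # 8 ('DATATABLE', '')
```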
Search Based on Non-Thesaurus Models
If you choose a model that is not a thesaurus, you will see that the interface looks and acts exactly as before. The only difference is that the system does not use inheritance, because inheritance does not exist between individual entities.
There is no need to show another data selection session. Any of the user models shown in figure 5 could be used as the starting point for a data selection session, and the KBDS interface would support the data selection in exactly the same way.
Conclusion
If knowledge based techniques are used together with graphical, window-based interfaces, almost any user can be enabled to carry out advanced data selection across one or more databases. At the same time, the user can develop an interactive data selection application for future use.
Prototypes of the KBDS system have shown that it is possible to develop and use a data selection system like the one described in this paper.
The next step for the KBDS system must be implementation and tests in industrial environments.
That is why we are looking for co-operation with companies or software houses to help us bring this interesting tool from the laboratory into real-life use.
Please contact the author for questions, or if you are interested in co-operation.
Søren Selmar, Ph.D., M. of Science, University of Aalborg
Fibigerstraede 13, DK-9220 Aalborg Ø
Denmark. Phone: 45-98-158522, Fax: 45-98-153030