[ieee 2010 seventh international conference on information technology: new generations - las vegas,...


Trend Ontology for Knowledge-based Trend Mining in textual Information

Olga Streibel
Corporate Semantic Web, Institute for Computer Science, Free University Berlin, Berlin, Germany
[email protected]

Malgorzata Mochol
Networked Information Systems, Institute for Computer Science, Free University Berlin, Berlin, Germany
[email protected]

Abstract—Providing ontologies for automatic trend detection enhances the quality of trend predictions. However, in the case of dynamic and fuzzy expert knowledge, such as the knowledge used in trend detection, it is difficult to formalize knowledge unambiguously and in a static way. In this paper we report on our experiences in modeling and formalizing a trend ontology for automatic knowledge-based trend detection, using market research as an example: we describe the knowledge-based trend mining approach and the requirements for a trend ontology, discuss obstacles in modeling trend knowledge, and outline the three lightweight trend ontologies we modeled.

Keywords-information retrieval; knowledge modeling; ontology engineering; trend mining

I. INTRODUCTION

Trend Mining is a term describing trend detection in general. It can refer either to the detection of emerging topics from text analysis, as described in Emerging Trend Detection in Text Mining (ETDiTM) [5], or to the detection of trends based on numeric data analysis, as in the case of time series analysis. In this work we refer to the former. The research described in this paper was conducted in the context of the TREMA (Trend Mining, Analysis and Fusion of Multimodal Data) project1, which delivered unique practical and theoretical experience in mining trends from hybrid data (numeric data and texts). In the TREMA project we mainly focused on exploring different solutions based on Data Mining, Text Mining and Semantic Web, working on use cases from two business fields: market research and financial markets. In the context of the project we collected crucial requirements and developed a general concept of a platform for mining trends from hybrid data. One of the TREMA goals was to offer formalized knowledge based on the significant concepts that analysts use for detecting trends in text streams of a particular business field, and to apply this knowledge to statistical learning methods in order to improve the text clustering used for automatic trend detection. The TREMA knowledge bases have been implemented as trend ontologies.

In this paper, we present work on one of the trend ontologies developed during the TREMA project: the trend ontology for market research. Although this paper focuses strongly on trend ontology, the scope of our trend mining research is sophisticated information retrieval realized by combining statistical machine learning methods with semantic technologies.

1 This project was funded by Investitionsbank Berlin and realized in cooperation with neofonie GmbH, JRC GmbH, and Metrinomics GmbH.

The rest of the paper is organized as follows: in Section II we describe the market research case study, which serves as a use case for the knowledge-based trend mining idea outlined in Section III. Section IV gives an insight into a particular trend ontology as an example of a general approach for modeling knowledge relevant for detecting trends. We conclude the paper in Section V with findings on trend knowledge modeling and an issue for future work.

II. MARKET RESEARCH: CASE STUDY

The objectives of market research projects are to identify market trends and to analyze consumer preferences and behavior in the market2. In general, a market research study is carried out in projects focused on a certain topic and based on two main types of questions: (i) quantitative questions (scaled questions): single choice and multiple choice questions, and (ii) open ended questions: results of primary research (e.g. reasons or motivations, comments, etc.) and results of secondary research (e.g. results based upon Internet research in order to analyze general trends in a specific market). Both question types are crucial for trend detection. While the analysis of quantitative questions is based on the examination of numeric data and can be done automatically using appropriate statistical tools, the analysis of open ended questions still requires human involvement, since it is based on opinion analysis: text analysis in which the steps of categorization, generalization, and interpretation of information are mostly conducted manually3. Categorization involves the analysis of customer comments that are tagged as positive or negative and written as unstructured text. Furthermore, the secondary research in market studies generally includes the analysis of Internet sources such as reports, comments, and news articles that are relevant to the topic of the market research study. Secondary research, like primary research, aims at identifying customers' opinion trends by categorizing news with regard to the customer sentiments hidden in texts, according to the categories given by the project topic.

2 In our trend ontology example, we refer to market studies in the high tech market.

3 As for Metrinomics GmbH, 2007-2008

2010 Seventh International Conference on Information Technology

978-0-7695-3984-3/10 $26.00 © 2010 IEEE

DOI 10.1109/ITNG.2010.232

In general, the limitations of current approaches stem mainly from the difficulty of automatic trend discovery in textual information (customer opinions, articles, reports, news). In our case study, the main goal of market research is the analysis of market and buying patterns by processing a large amount of text-based information. The core task in such text processing is the evaluation of customer opinion, which is based on enhanced text analysis. This includes the detection of relevant statements, the evaluation of statements, and text categorization according to the given project category list, taking into account the dependencies between sentiments and categories.

III. KNOWLEDGE-BASED TREND MINING

Different methods for trend detection are shown in an overview of emerging trend detection systems (ETDS) [5]. All ETDS are based on statistical algorithms, while some of them apply user feedback to the trend detection process. Our knowledge-based approach for mining trends in texts [7] applies formalized expert knowledge to the process of automatic trend detection and, in turn, extends the statistical trend detection process.

In order to realize knowledge-based trend mining, trend knowledge has to be identified and formalized. Following the Semantic Web ontology approach [4], we propose using a trend ontology as the knowledge base for automatic trend mining.

A trend, in terms of market research, is the evolution of customer opinion on a specific topic that can be described by its categories or labels. Customer opinion is closely tied to the sentiments that customers use to express linguistically their emotional viewpoint on specific issues. In general, automatic trend mining for market research should increase process efficiency in the analysis of the textual market research data generated in the primary and secondary research described in the previous section. This includes the automatic categorization and evaluation of open ended questions, the filtering of relevant information, and the identification of trends.

A trend ontology should support the analysis process by providing knowledge about the main market research concepts that occur in the texts of a market research project (e.g. what is image in terms of market research, what is product quality). This also includes:

main keywords and terms used by customers to describe their opinion (nouns, verbs, adjectives; e.g. "this brand fits to me", "I like the nice logo"),

customer opinion categories in terms of market research studies (overall satisfaction, level of commitment),

categorization of customer opinion based on a given list of categories that are relevant for the respective project,

knowledge about trend indicating features of any given keyword or term (positive, negative and neutral description keywords).

Considering these requirements, the trend ontology has to be defined as a knowledge model that contains: (i) the meta-level knowledge about market research concepts (commonly used in market research), (ii) common keywords used in market research projects (based on specific market research projects) and (iii) knowledge about trend indicating terms and relations in market research. Furthermore, the trend ontology should be used as a knowledge base that can be applied in different phases of the trend mining process: the feature extraction, selection and learning stages (cf. Fig. 1):

Figure 1. A general process of statistical learning for trend mining on texts, applying trend ontology to every stage of the process.4

IV. TREND ONTOLOGY FOR MARKET RESEARCH

According to [3], Ontology Engineering (OE) is defined as the "set of activities that concern the ontology development process, the ontology life cycle, and the methodologies, tools and languages for building ontologies". In recent years OE has evolved from a pure research topic common in scientific domains to real world applications, as demonstrated by the wide range of projects with major industry involvement and by the increasing interest of SMEs requesting consultancy in this domain. Initially, knowledge engineers managed and controlled the ontology authoring process. However, as ontologies became larger and covered more specific domains, the involvement of domain experts became indispensable, and ontology development could be tackled only through intensive cooperation between ontology engineers and domain experts in large, spatially distributed teams. The authors of [2] state that the ontology authoring process requires not only the active participation of domain experts; they should even lead the entire process, providing the relevant domain and conceptual knowledge. Furthermore, a number of other aspects, such as dealing with context or data and web integration, become crucial. In order to build and deploy ontologies on a large scale beyond the boundaries of the academic community, there is still a need for technologies that assist the implementation process. Most OE methodologies rely on specialized knowledge engineers, but in real world settings the need for maintenance of domain ontologies emerges in the daily work of their users [1]. During our research in the TREMA project, we experienced the difficulty of applying common OE methodologies developed by academia to the practical problem of trend ontology development. Therefore we used an agile, practical and expert-based method; the prototypes of trend ontologies for market research were developed with the active participation of market research experts5 on the basis of three knowledge models. Our aim was to define a lightweight knowledge base that can be used in real time to enhance a statistical learning method; therefore our trend ontologies do not include any rules. The ontologies are modeled for the German language. We describe them as follows:

4 Ontology image from: http://accuracyandaesthetics.com/wp-content/uploads/2006/12/ontology.jpg

A. Keyword/concept based trend ontology

Relying on the experience of experts from the market research domain, we identified and modeled, with Protégé using RDFS6, an initial keyword set categorized by the main concepts of market research (our case considered only the high tech market). The main categories of the set are: Image (image), Produktqualität (product quality), Kundenbeziehung (customer relation), Service (service), Stimmungsbild/Wahrnehmung/Entscheidung (public opinion/customer's view/decision). Each category is implemented as a class consisting of relevant concepts that describe the category. For the product quality category, the concept set includes, e.g., Zuverlässigkeit (reliability), Performanz/Leistung (performance/power), etc. We defined the class property included_in in order to express semantically the category membership of a given keyword/concept. In addition to the categorized concept sets, we modeled synonyms for several keywords/concepts and added the trend-indicating property to each concept that had been classified by experts as trend indicating. The keyword/concept based trend ontology is built on a very simple schema and can be easily applied, for instance, to extend word based feature vector creation for machine learning methods.

B. Term field based trend ontology

While extending the keyword-based trend ontology, we observed the emergence of so-called term fields in market research, which correspond to the semantic fields of Semantic Field Theory [6]. Relying on the semantic field idea, extending the concept definition by adding term fields to the concept seemed reasonable. However, deciding which term belongs to the concept field and whether a given term is trend indicating becomes more difficult the more terms are used for the term field definition. We therefore searched for an exact definition of trend indicating features in the texts of market research.

Applying statistical methods (e.g. term frequency in documents) supported by manual expertise, we identified adjectives that, according to experts, were significant for description of customer opinion. The most relevant adjectives were: vertrauenswürdig (reliable), kompetent (competent), vielseitig (all-round), aktuell (up-to-date).
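The statistical part of this step can be pictured with a small Python sketch. It is only an illustration under stated assumptions: the paper does not say which statistic or tooling was used, so document frequency is assumed here, and the function name document_frequency as well as the three sample comments are invented. Only the four candidate adjectives come from the text above.

```python
import re
from collections import Counter


def document_frequency(documents, candidates):
    """Count in how many documents each candidate term occurs at least once."""
    counts = Counter()
    for text in documents:
        # Lowercased word tokens; inflected forms are NOT normalized here.
        tokens = set(re.findall(r"\w+", text.lower()))
        for term in candidates:
            if term in tokens:
                counts[term] += 1
    return counts


# Hypothetical customer comments, invented for illustration only.
comments = [
    "Der Anbieter ist kompetent und vertrauenswürdig.",
    "Sehr kompetent, aber nicht immer aktuell.",
    "Das Angebot ist vielseitig.",
]
# Candidate adjectives named in the paper.
candidates = ["vertrauenswürdig", "kompetent", "vielseitig", "aktuell"]

df = document_frequency(comments, candidates)
ranked = df.most_common()  # most frequent candidates first
```

A ranking like this would only be a starting point; in the approach described here, the final selection of significant adjectives was made by domain experts.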

5 For modeling we used the Protégé tool, versions 3.0 - 3.3, http://protege.stanford.edu/

6 http://www.w3.org/TR/rdf-schema/

Searching for the semantic fields of these adjectives and their relevance to the main concepts of the market research domain, we detected the appearance of so-called satisfier, dissatisfier and sensitive7. We defined each main concept as a category with its semantic field and its own identifier that consists of diversificator. Identifiers are adjectives belonging to the concept and describing its features, e.g. entertainment has an entertainment identifier which is described by the adjectives: abwechslungsreich (varied), ansprechend (attractive), entspannend (relaxing), etc. Diversificator defines satisfier, dissatisfier and sensitive, which are adjectives grouped by meaning that refer to positive, negative and neutral customer opinion about a given concept. Each identifier consists of a diversificator that refers to more or less positive customer opinion. Customer satisfaction refers to a positive trend indication and customer dissatisfaction to a negative one.

The trend ontology based on term fields adds the meta-level concepts identifier, diversificator, sensitive, satisfier and dissatisfier to the keyword based trend ontology and extends the concept sets with term fields.
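As a rough illustration of how such a term field structure might feed a feature vector, consider the following Python sketch. It is not the RDFS model built in Protégé; the class Concept, the function ontology_features and the feature naming scheme are hypothetical. The concept Zuverlässigkeit, its category Produktqualität, the included_in relation and the three adjective groups are taken from the text, whereas the assignment of concrete adjectives to the groups is invented for the example.

```python
from collections import Counter
from dataclasses import dataclass, field

GROUPS = ("satisfier", "dissatisfier", "sensitive")


@dataclass
class Concept:
    """A concept with its category membership and its adjective term field."""
    name: str
    included_in: str  # category the concept belongs to
    satisfier: set = field(default_factory=set)
    dissatisfier: set = field(default_factory=set)
    sensitive: set = field(default_factory=set)

    def diversificator(self, adjective):
        """Return the group an adjective belongs to, or None if unknown."""
        for group in GROUPS:
            if adjective in getattr(self, group):
                return group
        return None


def ontology_features(tokens, concepts):
    """Count term-field hits per (category, group) pair for one text."""
    features = Counter()
    for token in tokens:
        for concept in concepts:
            group = concept.diversificator(token)
            if group is not None:
                features[f"{concept.included_in}:{group}"] += 1
    return features


# Adjective groupings below are assumptions made for this example.
reliability = Concept(
    name="Zuverlässigkeit",
    included_in="Produktqualität",
    satisfier={"zuverlässig", "vertrauenswürdig"},
    dissatisfier={"fehleranfällig"},
    sensitive={"durchschnittlich"},
)

tokens = "das gerät ist zuverlässig aber die software ist fehleranfällig".split()
features = ontology_features(tokens, [reliability])
```

Counts of this kind could be appended to a word based feature vector before clustering, which is one way to read the idea of applying the ontology in the feature extraction stage.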

C. (Temporal) invariant scheme based trend ontology

The adjective groups used as satisfier, dissatisfier and sensitive are important for the proper sentiment interpretation of a given set of texts. Sentiment interpretation supports trend detection. However, the validity of diversificators often expires after some time. Assuming that the adjectives used for describing customer satisfaction change with time, we searched for an invariant part of trend knowledge. The semi-automatic analysis of relevant market research news done by experts resulted in a structure that seemed to be valid for a long period of time and to be used intuitively by experts for the analysis of market research texts. This (temporal) invariant scheme based trend ontology consists of three meta-level classes: general, quantification and classification. The general class includes groups of the most important concepts, such as suppliers and companies. Suppliers, which are important extraction features, are always used in market research projects (regarding our case study) in order to classify the relevance of the texts. The quantification part of our structure contains the idea of identifiers and diversificators, and it adds the amplifier8 as a new meta-concept.

Classification consists of different categories that define the context for the quantifier. Its character is dynamic since it strongly depends on the context at a given point in time. An interesting subcategory of classification is the so-called structure, which defines the basic structure for the context. We observed that this category particularly refers to the economic model of the given market.

7 These terms are used in marketing satisfaction research. In our case we used them to define adjectives that express satisfaction or dissatisfaction in language.

8 Amplifiers are adverbs, e.g.: sehr (much), viel (many), wenig (few), etc.

Even if we know that the trend-indicating keywords and concepts change over time, and that their positive or negative value differs and depends on the context, we assume that there is an invariant trend structure which contains the three main trend detection parts: general concepts, the trend value concepts, and the classification structure that models the context of the trend.
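The quantification part of this scheme, with amplifiers modifying the value of an adjective, might be sketched as follows in Python. This is a simplified reading, not the authors' implementation: the numeric polarities, the amplifier weights, the rule that an amplifier affects only the directly following token, and the adjective groupings are all assumptions introduced for illustration. Only the amplifier words themselves and the three group names come from the paper.

```python
# Assumed numeric polarity per diversificator group.
POLARITY = {"satisfier": 1.0, "dissatisfier": -1.0, "sensitive": 0.0}

# Amplifier adverbs from the paper; the weights are invented.
AMPLIFIER = {"sehr": 1.5, "viel": 1.5, "wenig": 0.5}

# Hypothetical adjective groupings for the example.
ADJECTIVES = {
    "kompetent": "satisfier",
    "unzuverlässig": "dissatisfier",
    "aktuell": "sensitive",
}


def trend_value(tokens):
    """Sum adjective polarities, scaled by a directly preceding amplifier."""
    score = 0.0
    weight = 1.0
    for token in tokens:
        if token in AMPLIFIER:
            weight = AMPLIFIER[token]  # applies to the next token only
            continue
        group = ADJECTIVES.get(token)
        if group is not None:
            score += weight * POLARITY[group]
        weight = 1.0  # reset after any non-amplifier token
    return score


strong = trend_value("der berater ist sehr kompetent".split())
weak = trend_value("wenig kompetent".split())
mixed = trend_value("sehr unzuverlässig und kompetent".split())
```

A linear weighting like this is deliberately crude; for instance, "wenig kompetent" arguably expresses a negative opinion rather than a damped positive one, which is the kind of context dependence the classification part of the scheme is meant to capture.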

V. CONCLUSIONS AND FUTURE WORK

We observed the following crucial issues that, in our opinion, need to be considered in further research on trend knowledge modeling:

Language: Texts relevant for market research include domain-specific language mixed with common words expressing emotional assessment. Customer comments depend on the target group addressed in the market research study; even though domain experts often use different descriptions than non-experts, both kinds of description have to be considered for trend mining. The synonyms of market research concepts are often hard to describe and have to be weighted, since they often have slightly different connotations: Engagement = {Involvement, Commitment, etc.}, i.e. commitment to service quality sounds different from engagement in service quality.

Market research studies are conducted in different languages. The semantic relations used for defining concept dependencies rely strongly on the language used in the trend ontology model. Realizing our trend ontology for the German language, we faced problems such as: which relations are appropriate for defining the dependency between Kaufkraft (buying power/value) and Kaufentscheidung (buying decision) in terms of Stimmungsbild (the market mood)? The modeling of synonyms and the use of relations like included_in or belongs_to imply the emergence of concept groups rather than taxonomic relations. Trend ontologies have a more "fuzzy" structure than ontologies created, e.g., for the life sciences.

Time: Concepts used for modeling a market research ontology, and their relevance, change over time. There is a need to define life cycles for the categories modeled in a trend ontology in order to detect whether a given instance (e.g. a satisfier instance) still belongs to the positive sentiments or has drifted to the neutral sentiment. For example, air bags were presented as a positive feature in car descriptions in the 1980s, whereas today they are a common part of a car without the positive connotation they had in the past. Some concepts may "die" with time, while others can change their meaning, or a new concept may replace an old one.
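One conceivable way to represent such life cycles is to attach validity periods to the group membership of a term, as in the Python sketch below. This is a suggestion for illustration, not something implemented in the trend ontology prototypes; the class Validity, the function group_at and the year boundaries are assumptions. Only the air bag example and the group names come from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Validity:
    """A period during which a term belongs to one diversificator group."""
    group: str                      # 'satisfier', 'dissatisfier' or 'sensitive'
    valid_from: int                 # first year of validity
    valid_to: Optional[int] = None  # last year of validity, None = still valid


# Year boundaries are invented; the paper only contrasts the 1980s with today.
LIFE_CYCLES = {
    "airbag": [
        Validity("satisfier", 1980, 1989),
        Validity("sensitive", 1990),
    ],
}


def group_at(term, year):
    """Return the group a term belonged to in a given year, or None."""
    for period in LIFE_CYCLES.get(term, []):
        ends_later = period.valid_to is None or year <= period.valid_to
        if period.valid_from <= year and ends_later:
            return period.group
    return None


then = group_at("airbag", 1985)
now = group_at("airbag", 2010)
```

With this representation, a text from 1985 that mentions air bags would count toward a positive trend indication, while the same mention in 2010 would be treated as neutral.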

Context: Keywords and terms used in customer comments always depend on context. Regarding the project context, description concepts should refer to the project topic; e.g. talking about a radio in terms of the Internet may imply services like www.last.fm, while radio in the context of a car may imply a concrete hardware device. In texts, the context of a sentiment depends on the keywords used for its description. Picking up a concept definition without considering the concept's term field may lead to false conclusions in trend mining.

Dynamics: A trend ontology covers very dynamic knowledge. The aspects of time and context affect the ontology structure: the meta ontology can be based on the temporal invariant scheme (which is invariant only for a given time period), the middle ontology depends on the market research topic and has to be adapted for every new study, and the lowest level of the trend ontology is the most dynamic one. When modeling trend knowledge, the aspect of dynamics should be considered at both the knowledge level (concepts and their meaning change over time) and the abstract level (in terms of knowledge formalization).

In conclusion, our work on an ontology for trend detection in text collections of market research brought interesting results for modeling trend ontologies in general. Continuing the research on knowledge-based trend mining, we will focus on collaborative techniques for knowledge acquisition, for instance extreme tagging [8], and aim at applying them in order to extend the trend ontology prototypes.

ACKNOWLEDGMENT

The authors thank all TREMA project members for contributing to this work. Special thanks to Ruwen Poljak and Véronique Kaploun from Metrinomics.

REFERENCES

[1] S. Braun, A. Schmidt, and A. Walter, Ontology Maturing: A Collaborative Web 2.0 Approach to Ontology Engineering, in Proceedings of the Social and Collaborative Construction of Structured Knowledge Workshop co-located with the 16th International World Wide Web Conference (WWW2007)

[2] V. Dimitrova, R. Denaux, G. Hart, C. Dolbear, I. Holt, and A. G. Cohn, Involving Domain Experts in Authoring OWL Ontologies, in Proceedings of the 7th International Semantic Web Conference (ISWC2008), 1-16, 2008

[3] A. Gómez-Pérez, M. Fernández-López, and O. Corcho, Ontological Engineering, Springer 2003

[5] A. Kontostathis, L. Galitsky, W.M. Pottenger, S. Roy, and D. J. Phelps, A Survey of Emerging Trend Detection in Textual Data Mining, in A Comprehensive Survey of Text Mining, Springer-Verlag 2003

[6] A. Lehrer, C.D. Jones, and E.F. Roberts, Semantic fields and lexical structure, North-Holland linguistic series: 11, American Elsevier 1974

[7] O. Streibel, Semantic-Based Learning Method for Trend Recognition in Simple Hybrid Information Systems, in online Proceedings of Doctoral Consortium at Conference on Advanced Information Systems Engineering CAISE2008, pages 106-113

[8] V. Tanasescu and O. Streibel, Extreme Tagging: Emergent Semantics through the Tagging of Tags, in Proceedings of the First International Workshop on Emergent Semantics and Ontology Evolution, 2007, ESOE2007, at ISWC and ASWC 2007, pages 84-94
