Recent advances in data mining

Engineering Applications of Artificial Intelligence 19 (2006) 361–362
www.elsevier.com/locate/engappai

Editorial

Data mining methods have been successfully introduced in many fields. Data mining remains an active research topic, and it is also of tremendous interest to industry as a means of solving real-world problems. Consequently, research in data mining is not driven by theoretical aspects alone. More than any other field, it is influenced by practical problems, and researchers are taking up the special needs of industry and incorporating them into their research.

This special issue therefore presents new research on distributed data clustering, incremental clustering and pattern mining, which should inspire the community to further developments.

Clustering is still a topic of tremendous interest. Distributed clustering that preserves data privacy is becoming more and more important. Under recent trends in computer technology, data sources for one specific problem are created in different places, each representing the data of its own workplace. Such sources are valuable as they are, but would be more valuable still if they could be set into a much broader context. Combining several data sources that represent the same problem at different places would therefore yield results that are more useful in application. A particular field of application is medicine, where data on a disease are collected and kept in one hospital, while combining the data from different hospitals would yield a much larger database and could provide more valuable results. To ensure that the owners of the data will allow their data to be used for a specific analysis, we need to guarantee the privacy of the data without losing accuracy or the explanation capability of the results. The paper by da Silva and Klusch (2006) deals with the development of clustering methods that can work under these requirements.

Data streams continuously created by the World Wide Web or by automatic data acquisition systems, such as image scanners in medicine or quality monitoring systems in manufacturing processes, require incremental data-analysis methods that can analyse the data as they arrive in temporal sequence, without restarting the analysis from scratch after each new sample. Bouguila and Ziou (2006) present an online clustering method based on the Dirichlet distribution and the minimum message length principle. Incremental graph-clustering methods for case base maintenance are presented by Perner (2006). Eick et al. (2006) present the use of clustering to learn distance functions for supervised similarity assessment.

Mining patterns in large collections of data is becoming more and more important. The problem of finding paired itemsets with high correlation in one database is known as Discovery of Correlation and has already been studied, since highly correlated itemsets are characteristic of the database. However, even non-characteristic paired itemsets are meaningful, provided the degree of correlation increases significantly in the local database as compared with the global one. This problem is studied by Taniguchi and Haraguchi (2006).

Medical applications are still of great interest to the data mining community as well as to practitioners. The use of data mining methods for the segmentation of medical images is consequently being developed further. Shuo Li et al. (2006) present a method based on principal component analysis and support vector machines.

New applications in intrusion detection and medical literature mining extend the reach of data mining.

The explosion of knowledge in many fields leads to a huge amount of literature and records, and retrieving the desired information from a literature database requires concept knowledge. This concept knowledge can be built automatically by using concept mining methods, as described by Bichindaritz and Akkineni (2006).

Network intrusion detection is an emerging topic, driven by the tremendous need to ensure the security of networks and data. Automatic detection methods are necessary to observe the huge amount of traffic data and to identify novel situations. Perdisci et al. (2006) present recent results in their paper.

All the papers in this special issue are selected papers from the Industrial Conference on Data Mining ICDM-Leipzig 2005 (www.data-mining-forum.de) and the International Conference on Data Mining MLDM 2005 (www.mldm.de). The programs of these two events show once more that they have developed over the years into the leading meeting places for data mining researchers in pattern recognition and industry.

0952-1976/$ - see front matter © 2006 Elsevier Ltd. All rights reserved.
doi:10.1016/j.engappai.2006.01.015



References

Bichindaritz, I., Akkineni, S., 2006. Concept mining for indexing medical literature. Engineering Applications of Artificial Intelligence, in this special issue, doi:10.1016/j.engappai.2006.01.009.

Bouguila, N., Ziou, D., 2006. Online clustering via finite mixtures of Dirichlet and minimum message length. Engineering Applications of Artificial Intelligence, in this special issue, doi:10.1016/j.engappai.2006.01.012.

Eick, Chr. F., Rouhana, A., Bagherjeiran, A., Vilalta, R., 2006. Using clustering to learn distance functions for supervised similarity assessment. Engineering Applications of Artificial Intelligence, in this special issue, doi:10.1016/j.engappai.2006.01.004.

Perdisci, R., Giacinto, G., Roli, F., 2006. Alarm clustering for intrusion detection systems in computer networks. Engineering Applications of Artificial Intelligence, in this special issue, doi:10.1016/j.engappai.2006.01.003.

Perner, P., 2006. Case base maintenance by conceptual clustering of graphs. Engineering Applications of Artificial Intelligence, in this special issue, doi:10.1016/j.engappai.2006.01.014.

Shuo Li, Fevens, Th., Krzyzak, A., Li, S., 2006. Automatic clinical image segmentation using pathological modelling, PCA and SVM. Engineering Applications of Artificial Intelligence, in this special issue, doi:10.1016/j.engappai.2006.01.011.

da Silva, J.C., Klusch, M., 2006. Inference in distributed data clustering. Engineering Applications of Artificial Intelligence, in this special issue, doi:10.1016/j.engappai.2006.01.013.

Taniguchi, T., Haraguchi, M., 2006. Discovery of hidden correlations in a local transaction database based on differences of correlations. Engineering Applications of Artificial Intelligence, in this special issue, doi:10.1016/j.engappai.2006.01.006.

Petra Perner
Institute of Computer Vision and Applied Computer Sciences, IBaI, Kornerstr. 10, 04107 Leipzig, Germany

E-mail addresses: [email protected],[email protected].