©2010 International Journal of Computer Applications (0975 – 8887)
Volume 1 – No. 12
Performance Assessment using Text Mining
Radha Shankarmani
Asst. Prof, SPIT Sardar Patel Institute of Technology
Munshi Nagar, Andheri (W) Mumbai - 400 058
Nikhil Kedar
Student, SPIT
903, Sai Darshan, Versova, Andheri (W)
Mumbai - 400 058
Khandelwal
Student, SPIT
B-401, Mahesh Tower, Sector-2, Charkop, Kandivli (W)
Mumbai - 400 067
ABSTRACT
Existing search engines have many remarkable capabilities, but
deduction is not among them: they cannot synthesize an answer to
a query from bodies of information that reside in various parts of
the World Wide Web. Web Intelligence is an area of research that
attempts to provide this capability.
In this paper we use Text Mining, a feature of Web Intelligence,
to derive information from unstructured textual data on the web
and to devise a consensus-based strategy for business decisions.
This has a twofold advantage: it mitigates risk early, and it
supports our understanding and decision making. The concept is
explained with the example of evaluating a player’s performance
based on minute-by-minute commentary of a match. Parameters such
as his position on the field (in football, for example: defender,
midfielder, forward or goalkeeper), his past performance, and his
present fitness and form are considered. A weight or value is
decided for each parameter, and information is derived for
analyzing the player’s performance. During analysis we also read
fan forums, blogs, newspaper reviews of the play, expert
commentators’ views, and so on. These are used as a correction
factor to enhance the credibility of the model.
The whole procedure involves four main stages: web crawling (that
is, identifying information resources), information retrieval and
extraction, text mining, and finally converting unstructured data
to structured data.
Keywords
Web Intelligence, NLP, Text Mining, Information Extraction,
Information Retrieval, GATE.
1. IDENTIFYING INFORMATION RESOURCES
The first phase is to gather information. Information cannot be
gathered from the entire web because of its vastness. To reduce
search time, static sources (web sites) are chosen from which to
extract information. The authenticity, correctness and currency of
such sources are very important for information retrieval. Apart
from these static sources, there may be sources that provide
additional information but are unknown to the user. A web crawler
(also known as a web spider) is a program or automated script that
browses the World Wide Web to find such sources.
The static sites that give the performance of all the players and
their rankings or values are identified, and the relevant
information is retrieved. In addition to performance, the
popularity or fan following of a player can be judged from blog
sites.
2. INFORMATION RETRIEVAL
An information retrieval (IR) system identifies the documents and
pages in a collection that match the user’s query. It allows us to
narrow down the set of documents to those relevant to the
particular problem.
In our example, minute-by-minute commentary, news flows and blogs
are analyzed using text mining.
Because text mining applies computationally intensive algorithms
to large document collections, it is best limited to the documents
that information retrieval returns. Information retrieval can
speed up analysis considerably by reducing the number of documents
to be analyzed. Although most of the resources used to gather
information are static, not every web page they contain is
required for further analysis.
In the case study considered, information is derived from sites
that provide minute-by-minute descriptions of football matches.
The project intends to evaluate players based on expert opinions
combined with reviews from the general public obtained from blogs.
IR systems allow us to retrieve such documents, which eases the
process of text mining. These documents are then passed to
information extraction systems, which are the systems used for
text mining.
3. INFORMATION EXTRACTION
After the required web pages have been shortlisted, text mining is
applied. The following three phases (information extraction, text
mining, and converting unstructured data to structured data) are
closely related to each other.
Information extraction is the process of automatically obtaining
structured data from unstructured natural language documents.
Often this involves defining the general form of the information
that we are interested in as one or more templates, which are then
used to guide the extraction process. Information extraction
relies heavily on the data generated by NLP systems.
The role of natural language processing (NLP) in text mining is to
provide linguistic data to the next phase. This is done by
annotating documents with information such as sentence boundaries,
parts of speech and parsing results. NLP may be deep (parsing
every part of every sentence and attempting to account
semantically for every part) or shallow (parsing only certain
passages or phrases within sentences, or producing only limited
semantic analysis), and may even use statistical means to
disambiguate word senses or multiple parses of the same sentence.
Tasks that information extraction systems can perform include:
A) Term analysis: identification of the terms in a document,
which may consist of one or more words (multi-word terms),
for example in papers and PDF files.
B) Named entity recognition: identification of names, dates,
expressions of time, quantities, and associated percentages
and units.
C) Fact extraction: identification of relationships between
entities or events.
The data generated during the information extraction phase is
structured information derived from unstructured textual data.
Information extraction yields information in the form of
relationships: analysis of words and phrases tells us about an
entity and its relationship with other entities. In our case this
information is used to design a database that serves as a tool for
evaluating a player’s performance. The database contains
attributes such as goals scored, red cards and assists, which are
used for allocating points.
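As an illustration of such a database, the following is a minimal sketch using Python's built-in sqlite3 module. The table name, the column names and the helper functions are assumptions made for this example; the paper names only goals scored, red cards and assists as attributes and does not give its schema.

```python
import sqlite3

# Hypothetical schema: only the three attributes named in the text.
ATTRIBUTES = {"goals", "assists", "red_cards"}

def create_stats_table(conn):
    """Create the per-player statistics table if it does not exist."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS player_stats ("
        "player TEXT PRIMARY KEY, "
        "goals INTEGER DEFAULT 0, "
        "assists INTEGER DEFAULT 0, "
        "red_cards INTEGER DEFAULT 0)"
    )

def record_event(conn, player, attribute):
    """Increment one attribute for a player, creating the row if needed."""
    # Column names cannot be bound as SQL parameters, so whitelist them.
    if attribute not in ATTRIBUTES:
        raise ValueError(f"unknown attribute: {attribute}")
    conn.execute(
        "INSERT OR IGNORE INTO player_stats (player) VALUES (?)", (player,)
    )
    conn.execute(
        f"UPDATE player_stats SET {attribute} = {attribute} + 1 "
        "WHERE player = ?",
        (player,),
    )

def stats_for(conn, player):
    """Return (goals, assists, red_cards) for a player, or None."""
    return conn.execute(
        "SELECT goals, assists, red_cards FROM player_stats WHERE player = ?",
        (player,),
    ).fetchone()
```

Each extracted fact (for example, "Player One scored") would then become one call to record_event, and the points calculation can read the totals back with stats_for.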
Text mining using NLP is one of the approaches that can be used.
Other approaches, such as the Semantic Web and OWL, can also be
used to provide a solution.
In the Semantic Web approach we search for keywords. A keyword
that can be described further (i.e. it has attributes) is
considered a class; a keyword with no further description is
considered an attribute. For example, a player name can be a class
and the number of goals scored can be one of its attributes.
In OWL, we create a data dictionary that lists all the possible
replacements for a particular attribute; e.g. a player could be
referred to by his jersey number or by his nickname. The
information extracted using NLP also contains similar replacements
for the attribute. In this case we can create our own data
dictionary to use as a reference.
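The data dictionary idea can be sketched as a simple alias table that maps every known way of referring to a player back to one canonical name. The player names, aliases and function names below are hypothetical and serve only to illustrate the lookup.

```python
# Hypothetical data dictionary: canonical name -> alternative references.
DATA_DICTIONARY = {
    "Player One": ["Number 10", "Ace"],
    "Player Two": ["Number 1", "The Wall"],
}

def build_alias_index(data_dictionary):
    """Map each lower-cased name or alias to its canonical player name."""
    index = {}
    for canonical, aliases in data_dictionary.items():
        for name in [canonical, *aliases]:
            index[name.lower()] = canonical
    return index

def resolve(mention, index):
    """Return the canonical name for a mention, or None if unknown."""
    return index.get(mention.strip().lower())
```

With this index, a commentary line that mentions "Ace" and another that mentions "Number 10" are both credited to the same player.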
The contents of the tables are updated at the end of every match.
This structured data (e.g. the number of goals scored by a
particular player) is used to determine performance.
To get information about a player’s performance in a particular
match, we use the textual data available as minute-by-minute
commentary on the web. To get information about a player’s
popularity, we can use fan forums and blogs. For a player’s
performance in a recent match, the blogs give the people’s
verdict.
Natural language processing is then used to process this
information, and the results of the analysis are used in
performance evaluation.
4. IMPLEMENTATION
In our project the process of text mining is implemented using an
open source toolkit, GATE (General Architecture for Text
Engineering). The IE system in GATE is called ANNIE, A Nearly-New
Information Extraction System (developed by Hamish Cunningham,
Valentin Tablan, Diana Maynard, Kalina Bontcheva, Marin Dimitrov
and others).
ANNIE relies on finite state algorithms and the JAPE (Java
Annotation Patterns Engine) language. ANNIE has two parts:
Language Resources and Processing Resources.
Language Resources consist of GATE Documents and Corpora. The
source of a GATE Document can be specified either by giving the
path of a locally stored file on the hard disk or by specifying
the URL of a web page. Document formats supported by GATE are
XML, HTML, SGML, plain text, RTF, email, PDF and Microsoft Word.
A Corpus in GATE is a Java Set whose members are Documents. ANNIE
can run only on a corpus.
Processing Resources are responsible for performing text mining
on the corpus. There are many plugins available that provide
various functionalities. Some of the important ones applicable in
our project are Tokeniser, Gazetteer and JAPE Transducer.
The Tokeniser splits the text into simple tokens. There are five
types of token – word, number, symbol, punctuation and
spacetoken. The aim is to limit the work of the Tokeniser to
maximize efficiency, and enable greater flexibility by placing the
burden on the grammar rules (JAPE), which are more adaptable.
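To make the five token types concrete, here is a rough sketch of a tokeniser in Python. It is not the GATE Tokeniser and does not reproduce its rules; the regular expressions and the function name are assumptions chosen to show how text can be split into word, number, space, punctuation and symbol tokens, leaving anything more specific to later rules.

```python
import re

# Patterns are tried in order; "symbol" is the catch-all for one character.
TOKEN_TYPES = [
    ("word", re.compile(r"[^\W\d_]+")),          # runs of letters
    ("number", re.compile(r"\d+")),              # runs of digits
    ("space", re.compile(r"\s+")),               # runs of whitespace
    ("punctuation", re.compile(r"[.,;:!?()'\-]")),
    ("symbol", re.compile(r".", re.DOTALL)),     # anything else
]

def tokenise(text):
    """Split text into (type, string) pairs covering every character."""
    tokens, pos = [], 0
    while pos < len(text):
        for kind, pattern in TOKEN_TYPES:
            match = pattern.match(text, pos)
            if match:
                tokens.append((kind, match.group()))
                pos = match.end()
                break
    return tokens
```

Keeping the tokeniser this simple mirrors the design choice described above: it only classifies characters, and the interpretation of a sequence such as a score line is left to the grammar rules.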
The Gazetteer Lists used are plain text files, with one entry per
line. An index file (lists.def) is used to access these lists; for each
list, a major type is specified and, optionally, a minor type. These
lists are compiled into finite state machines. Any text tokens that
are matched by these machines will be annotated with features
specifying the major and minor types. Grammar rules (JAPE) then
specify the types to be identified in particular circumstances. Each
gazetteer list should reside in the same directory as the index file.
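The gazetteer mechanism can be approximated in a few lines of Python. The sketch below assumes index lines of the form `file.lst:majorType:minorType` (minor type optional) and replaces GATE's compiled finite state machines with a plain dictionary and substring search, so it illustrates the idea rather than GATE's implementation. The file names, list contents and function names are hypothetical.

```python
def parse_index(index_text):
    """Parse index lines of the form 'file.lst:majorType[:minorType]'."""
    entries = {}
    for line in index_text.splitlines():
        line = line.strip()
        if not line:
            continue
        parts = line.split(":")
        minor = parts[2] if len(parts) > 2 else None
        entries[parts[0]] = (parts[1], minor)
    return entries

def build_lookup(index_text, list_contents):
    """Map each lower-cased list entry to its major and minor type."""
    lookup = {}
    for filename, (major, minor) in parse_index(index_text).items():
        for entry in list_contents[filename].splitlines():
            entry = entry.strip()
            if entry:
                lookup[entry.lower()] = {"majorType": major, "minorType": minor}
    return lookup

def annotate(text, lookup):
    """Return (entry, features) for each entry found, in text order."""
    lowered = text.lower()
    found = [
        (lowered.find(entry), entry, features)
        for entry, features in lookup.items()
        if entry in lowered
    ]
    found.sort(key=lambda item: item[0])
    return [(entry, features) for _, entry, features in found]
```

Here list_contents stands in for the plain text list files, with one entry per line, so the example runs without touching the file system.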
JAPE provides finite state transduction over annotations based on
regular expressions. A JAPE grammar consists of a set of phases,
each of which consists of a set of pattern/action rules. The phases
run sequentially and constitute a cascade of finite state transducers
over annotations. The left-hand side (LHS) of a rule consists of
an annotation pattern that may contain regular expression
operators (e.g. *, ?, +). The right-hand side (RHS) consists of
annotation manipulation statements. Annotations matched on the
LHS of a rule may be referred to on the RHS by means of labels
that are attached to pattern elements. The RHS of the rule
contains information about the annotation to be created: the
matched text is carried over from the LHS using the label just
described and is annotated with the entity type that follows the
label. Finally, attributes and their corresponding values are
added to the annotation. Alternatively, the RHS of the rule can
contain Java code to create or manipulate annotations. JAPE
grammars are written as files with the extension ".jape", which
are parsed and compiled at run time and executed over the GATE
document(s).
In our project we have created Gazetteer Lists consisting of the
names of players, grouped by the club to which they belong and by
their position. The Tokeniser is used to annotate the entries in
the Gazetteer List. The JAPE rules use these annotations on the
LHS to identify the player, and the Java code on the RHS assigns
points to that player. Different rules are written to identify
various actions such as a goal scored, a red or yellow card shown,
a save made, a penalty missed, a foul committed, a clearance made
and so on.
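The rule-based allocation of points described above can be sketched outside GATE as follows. This Python example is not the project's JAPE grammar: the trigger phrases, the point values and the convention of crediting the first player named in a line are all assumptions made for illustration, since the paper does not publish its rules or scoring table.

```python
import re
from collections import defaultdict

# Hypothetical rules: (pattern that identifies an action, points awarded).
RULES = [
    (re.compile(r"\bscores\b", re.I), 5),
    (re.compile(r"\bsaves?\b", re.I), 2),
    (re.compile(r"\bclears\b", re.I), 1),
    (re.compile(r"\bfouls\b", re.I), -1),
    (re.compile(r"\byellow card\b", re.I), -2),
    (re.compile(r"\bmisses (the|a) penalty\b", re.I), -3),
    (re.compile(r"\bred card\b", re.I), -4),
]

def score_commentary(lines, players):
    """Sum points per player over a list of commentary lines."""
    totals = defaultdict(int)
    for line in lines:
        lowered = line.lower()
        mentioned = [p for p in players if p.lower() in lowered]
        if not mentioned:
            continue
        # Simplification: credit the player who is named first in the line.
        actor = min(mentioned, key=lambda p: lowered.find(p.lower()))
        for pattern, points in RULES:
            if pattern.search(line):
                totals[actor] += points
    return dict(totals)
```

A line such as "Player Two fouls Player One and is shown a yellow card" fires two rules for Player Two under this convention. A fuller treatment would need the grammar to decide which entity is the subject of each action.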
The following two figures illustrate the implementation.
Figure 3. A screen shot of GATE, showing the various Language Resources and Processing Resources in GATE and ANNIE being loaded
to run on the Corpus.
Figure 4. A screen shot of the database, showing the allocation of points to a particular player based on his performance in the match.
5. OTHER APPLICATIONS
In the above example, the information derived was used to provide
a performance analysis of players in the form of points, i.e.
structured data. But crawlers combined with text mining can
support a number of other applications. The only things that
change are the information resources and the analysis of the
information.
To track the vote count during presidential elections, or to track
news about the stock market and the performance of firms, the
sites to be crawled will be different (e.g. news channel web
sites, stock exchange web sites), and the output of the analysis,
instead of a database, can be a dynamic application showing
textual updates. One can get information about the quarterly
reports of a company, or about the status of a particular stock
and expert opinions on that stock from various web sites and news
channels, all on a single screen. Such an analysis can help a
prospective trader decide which shares to buy, when to buy and
when to sell.
Other applications include analyzing reviews and the performance
of different products in a comparative study, to identify the best
product for the individual needs of a customer. For example, a
customer who intends to buy a digital camera would want a
comparative report on the products of various companies in order
to decide on the best one. For this, the web sites of companies
such as SONY and NIKON, as well as those of retail outlets, need
to be crawled.
6. CONCLUSION
The goal of web intelligence is to retrieve information about the
customer decision process, customer needs and customer behavior.
Retrieving this information gives marketing intelligence the
opportunity to improve its predictive models and to build a
thorough view of the customer. In this article, we used the
example of a football league to explain web intelligence using
text mining techniques for player assessment.
7. ACKNOWLEDGEMENT
We are very grateful and indebted to our project guide, Prof.
Radha Shankarmani, for her enduring patience, guidance and
invaluable suggestions. She was a constant source of inspiration
for us and took the utmost interest in our project. We would also
like to thank all the IT staff members for their invaluable
co-operation. We are also thankful to all the students for their
useful advice and immense co-operation. Lastly, we would like to
convey our regards to the developers of GATE, as well as its other
users worldwide, for their constant support and guidance through
the online mailing lists.
8. REFERENCES
[1] GATE . www.gate.ac.uk. General Architecture for Text
Engineering or GATE is a Java software toolkit originally
developed at the University of Sheffield since 1995.
[2] Web Intelligence: kis.maebashi-it.ac.jp/wi01/ www.web-
intelligence.com/
[3] Muslea, I. (Ed.). (2004). Papers from the AAAI-2004
Workshop on Adaptive Text Extraction and Mining (ATEM-
2004) Workshop, San Jose, CA. AAAI Press.
[4] Weiguo Fan, et. al., ―Tapping the Power of Text Mining,‖
Communications of the ACM, 49(9), 2006.
[5] Tan, A.-H. (1999), ―Text Mining: The state of the art
and the challenges‖, in Proceedings, PAKDD‘99 workshop
on Knowledge Discovery from Advanced Databases, Beijing,
April, 1999.
[6] Intelligence on the Web: www.fas.org/irp/intelwww.html
WIN: home WEB INTELLIGENCE
NETWORK,smarter.net/
[7] J. Srivastava et al., ―Web Usage Mining: Discovery
andApplications of Usage Patterns from Web Data,‖
SIGKDD Explorations, vol. 1, no. 2, 2000, pp. 12
[8] Fayyad, U., Piatetsky-Shapiro, G. & Smyth, P. (1996),
―From data mining to knowledge discovery: An overview‖,
in U. Fayyad et al. (eds.) Advances in Knowledge
Discovery and Data Mining, MIT Press, Cambridge, Mass.
[9] Fayyad, U., Piatetsky-Shapiro, G. & Smyth, P. (1996),
―From data mining to knowledge discovery: An overview‖,
in U. Fayyad et al. (eds.) Advances in Knowledge
Discovery and Data Mining, MIT Press, Cambridge, Mass., 1
[10] IEEE 2000b. IEEE Standard for Modelling and Simulation
(M&S) High Level Architecture (HLA) –Federate Interface
Specification. IEEE Std 1516.1-2000. IEEE Computer
Society, New York, NY.
Existing search engines have remarkable capabilities, but what they lack is deductive capability: the ability to synthesize an answer to a query from the bodies of information that reside in various parts of the World Wide Web. Web intelligence is an area of research that attempts to provide this capability.
In this paper, Text Mining, a feature of web intelligence, is used to obtain information from unstructured textual data on the web and to devise a consensus-based strategy for business decisions. This gives a twofold advantage: first, it mitigates risk early, and second, it supports understanding and decision making. The concept is explained with the example of evaluating a player’s performance based on minute-by-minute commentary of a match. Parameters such as the player’s position on the field (in football, for example: defenders, midfielders and goalkeeper), his past performance, his present fitness and other parameters are considered. A value is decided for each parameter, and information can be derived for analyzing the player’s performance. During analysis, we view the comments and read fan forums, blogs, newspaper reviews of the play, the views of expert commentators, etc. This is used as a correction factor to enhance the credibility of the model.
The whole procedure involves four main stages: web crawling, i.e. identifying information resources; information retrieval and extraction; text mining; and finally converting unstructured data to structured data.
The first phase is to gather information. But information cannot be gathered from the entire web because of its vastness. To reduce search time, static sources (web sites) are chosen from which to extract information. The authenticity, correctness and currency of these sources are very important for information retrieval. Apart from these static sources, there may be sources that provide additional information but are unknown to the user. A web crawler (also known as a web spider) is a program or automated script that browses the World Wide Web to find such sources.
An information retrieval system identifies the documents and pages in a collection that match the user’s query. It allows us to narrow down the set of documents relevant to a particular problem.
After the required web pages have been shortlisted, text mining is applied. The following three phases (information extraction, text mining, and converting unstructured data to structured data) are closely related to one another.
In the above example, the information obtained was used to provide a performance analysis of players in the form of points, i.e. structured data. But crawlers together with text mining can support a number of applications. The only things that change are the information resources and the analysis of the information.
The goal of web intelligence is to retrieve information about the customer decision process, customer needs and customer behavior. Retrieving this information gives marketing intelligence the opportunity to improve its predictive models and to create a serious customer view. In this article, the example of a football league was used to explain web intelligence using text mining techniques for player assessment.
National Conference on Innovative Paradigms in Engineering & Technology (NCIPET-2012)
Proceedings published by International Journal of Computer Applications® (IJCA)
Multicore Processing for Classification and Clustering
Algorithms
V. Vaitheeshwaran
School of Computing, KL University
Vijayawada, India
Kapil Kumar Nagwanshi
School of Computing, KL University
Vijayawada, India
T. V. Rao
School of Computing, KL University
Vijayawada, India
ABSTRACT
Data mining algorithms such as classification and clustering are
the future of computation, but they require multidimensional
data processing. Multicore processors with GPUs are now in common
use, yet most programming languages do not provide
multiprocessing facilities, so processing resources are wasted.
Clustering and classification algorithms are particularly
resource-intensive. In this paper we show strategies to overcome
these deficiencies using the multicore processing platform
OpenCL.
Keywords
Parallel Processing, Clustering, Classification, OpenCL, CUDA,
NVIDIA, AMD, GPU.
1. INTRODUCTION
Clustering is an unsupervised learning technique that separates
data items into a number of groups, such that items in the same
cluster are more similar to each other than to items in
different clusters, according to some measure of similarity or
proximity. Data clustering is an important task in the area of
data mining. Clustering algorithms are computationally
intensive, particularly when they are used to analyze large
amounts of data. A possible approach to reducing the processing
time is to implement clustering algorithms on scalable parallel
computers. Pizzuti and Talia [1] present P-AutoClass, a
technique for scalable parallel clustering for mining large data
sets. Their paper describes the design and implementation of
P-AutoClass, a parallel version of the AutoClass system based
upon the Bayesian model for determining optimal classes in large
data sets. The P-AutoClass implementation divides the clustering
task among the processors of a multicomputer so that each
processor works on its own partition and exchanges intermediate
results with the other processors. They present the system
architecture, its implementation, and experimental performance
results on different processor numbers and data sets, and
compare these with theoretical performance. In particular, the
experimental and predicted scalability and efficiency of
P-AutoClass versus the sequential AutoClass system are evaluated
and compared.
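The partition-and-exchange pattern described for P-AutoClass can be illustrated with a much simpler algorithm. The sketch below is not P-AutoClass and does not use its Bayesian model; it swaps in one k-means update step, where each worker computes per-cluster sums on its own partition and the partial results are then combined. Python threads stand in for the processors of a multicomputer, and all function names are assumptions for this example.

```python
from concurrent.futures import ThreadPoolExecutor

def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean)."""
    return min(
        range(len(centroids)),
        key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centroids[k])),
    )

def partial_sums(partition, centroids):
    """Per-cluster coordinate sums and counts for one data partition."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for point in partition:
        k = nearest(point, centroids)
        counts[k] += 1
        for d, value in enumerate(point):
            sums[k][d] += value
    return sums, counts

def parallel_kmeans_step(data, centroids, workers=2):
    """One centroid update, with the assignment work split across workers."""
    partitions = [data[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(
            pool.map(lambda part: partial_sums(part, centroids), partitions)
        )
    new_centroids = []
    for k, old in enumerate(centroids):
        total = sum(counts[k] for _, counts in results)
        if total == 0:
            new_centroids.append(list(old))  # keep an empty cluster in place
            continue
        new_centroids.append(
            [sum(sums[k][d] for sums, _ in results) / total
             for d in range(len(old))]
        )
    return new_centroids
```

Only the small per-cluster sums and counts are exchanged, not the data points themselves, which is what keeps communication cheap relative to computation in this style of parallel clustering.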
Unlike supervised learning, where training
examples are associated with a class label that expresses the
membership of every example in a class, clustering assumes no
information about the distribution of the objects; its task is
both to discover the classes present in the data set and to
assign objects among such classes in the best way. A large
number of clustering methods have been developed in several
different fields, with different definitions of clusters and
similarity among objects. The variety of clustering techniques is
reflected by the variety of terms used for cluster analysis such as
clumping, competitive learning, unsupervised pattern
recognition, vector quantization, partitioning, and winner-take-
all learning.
Most of the early cluster analysis algorithms come
from the area of statistics and were originally designed for
relatively small data sets. Fayyad et al. [2] report that
clustering algorithms have since been extended to work
efficiently for knowledge discovery in large databases and,
therefore, to classify large data sets with high-dimensional
feature items. Clustering algorithms are computationally very
demanding and thus require high-performance machines to produce
results in a reasonable amount of time. Hunter and States [3]
describe a classification algorithm applied to protein
databases, and reports of clustering algorithms taking one week
or about 20 days of computation time on sequential machines are
not rare. Scalable parallel computers provide an appropriate
setting in which to execute clustering algorithms efficiently
for extracting knowledge from large-scale databases, and
recently there has been increasing interest in parallel
implementations of data clustering algorithms. A variety of
parallel approaches to clustering have been proposed [4], [5],
[6], [7].
Classification: Classification is one of the primary
data mining tasks [8]. The input to a classification system
consists of example tuples, called a training set, with each tuple
having several attributes. Attributes can be continuous, coming
from an ordered domain, or categorical, coming from an
unordered domain. A special class attribute indicates the label
or category to which an example belongs. The goal of
classification is to induce a model from the training set that
can be used to predict the class of a new tuple. Classification
has applications in diverse fields such as retail target
marketing, fraud detection, and medical diagnosis (Michie,
1994). Among the many classification methods proposed over the
years [9], [10], decision trees are particularly suited to data
mining, since they can be built relatively fast compared to
other methods and they are easy to interpret [11]. Trees can
also be converted into SQL statements that can be used to access
databases efficiently [12]. Finally, decision-tree classifiers
obtain similar, and often better, accuracy compared to other
methods [10]. Before classification became of interest for
database-centric data mining, it was tacitly assumed that the
training sets could fit in memory. Recent work has targeted the
massive training sets usual in data mining. Developing
classification models using larger training sets can yield
higher-accuracy models, as various studies have confirmed [13].
Recent classifiers that can handle disk-resident data include
SLIQ [14], SPRINT [15], and CLOUDS [16]. As data continue
to grow in size and complexity, high-performance scalable data
mining tools must necessarily rely on parallel computing
techniques. Past research on parallel classification has
focused on distributed-memory (also called shared-nothing)
machines. Examples include parallel ID3 [17], which assumed
that the entire dataset could fit in memory; the Darwin toolkit
with parallel CART [18] from Thinking Machines, whose details
are not available in the published literature; parallel SPRINT
on the IBM SP2 [15]; and ScalParC [19] on a Cray T3D. While
distributed-memory machines provide massive parallelism,
shared-memory machines (also called shared-everything systems)
are also capable of delivering high performance for a low to
medium degree of parallelism at an economically attractive
price. Increasingly, shared-memory multiprocessor (SMP) machines
are being networked together via high-speed links to form
hierarchical clusters. Examples include the SGI Origin 2000 and
the IBM SP2 system, which can have an 8-way SMP as one high
node. A shared-memory system offers a single memory address
space that all processors can access. Processors communicate
through shared variables in memory. Synchronization is used to
co-ordinate processes. Any processor can also access any disk
attached to the system. The SMP architecture offers new
challenges and trade-offs that are worth investigating in their
own right.
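The earlier remark that decision trees can be converted into SQL statements can be made concrete with a small sketch. The tree representation used here (nested dictionaries with a continuous attribute, a threshold and two children) and the table and column names are assumptions for illustration, not the method of the cited work. Each root-to-leaf path becomes one SELECT statement whose WHERE clause is the conjunction of the tests on that path.

```python
def tree_to_sql(tree, table):
    """Return (class label, SELECT statement) for every leaf of the tree."""
    statements = []

    def walk(node, conditions):
        if "label" in node:  # leaf: emit the accumulated path conditions
            where = " AND ".join(conditions) or "1 = 1"
            statements.append(
                (node["label"], f"SELECT * FROM {table} WHERE {where}")
            )
            return
        attribute, threshold = node["attribute"], node["threshold"]
        walk(node["left"], conditions + [f"{attribute} <= {threshold}"])
        walk(node["right"], conditions + [f"{attribute} > {threshold}"])

    walk(tree, [])
    return statements

# Hypothetical tree over two continuous attributes.
EXAMPLE_TREE = {
    "attribute": "age",
    "threshold": 30,
    "left": {"label": "high_risk"},
    "right": {
        "attribute": "income",
        "threshold": 50000,
        "left": {"label": "high_risk"},
        "right": {"label": "low_risk"},
    },
}
```

Running the statements for one class label against the database retrieves exactly the tuples that the tree assigns to that class, which is why the conversion is convenient for database-centric data mining.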
2. THE GPU ARCHITECTURE
Initially intended as a fixed many-core processor dedicated
to transforming 3-D scenes into a 2-D image composed of pixels,
the GPU architecture has undergone several innovations to meet
the computationally demanding needs of supercomputing
research groups across the globe. The traditional GPU pipeline
designed to serve its original purpose came with several
disadvantages. Shortcomings such as the limited data reuse in
the pipeline, excessive variations in hardware usage, and the
lack of integer instructions coupled with weak floating-point
precision rendered the traditional GPU a weak candidate for
HPC. In November 2006 [20], NVIDIA introduced the GeForce 8800
GTX with a novel unified pipeline and shader architecture. In
addition to overcoming the limitations of the traditional GPU
pipeline, the GeForce 8800 GTX architecture added the concept
of the streaming processor (SP), which is highly pertinent to
current GP-GPU programming. SPs can work together in close
proximity with extremely high parallel processing power. The
outputs produced can be stored in fast cache and used by other
SPs. SPs have instruction decoder units and execution logic
performing similar operations on the data. This architecture
allows SIMD instructions to be efficiently mapped across groups
of SPs. The streaming processors are accompanied by units for
texture fetch (TF), texture addressing (TA), and caches. The
structure is maintained and scaled up to 128 SPs in the GeForce
8800 GTX. The SPs operate at 1.35 GHz in the GeForce 8800 GTX,
a clock separate from the core clock operating at 575 MHz.
Several GP-GPUs used thus far for HPC applications have
architectures similar to that of the GeForce 8800 GTX.
However, the introduction of Fermi by NVIDIA in September
2009 [21] has radically changed the contours of the GP-GPU
architecture, as we will explore in the next subsection.
The GPU’s rapid evolution in both computational capability
and functionality extends the application of GPUs to
non-graphics computations, known as general-purpose
computation on GPUs (GPGPU) [22]. The design and development
of GPGPU applications are becoming significant for the
following reasons:
1. Cost-performance: Using only commodity hardware is
important for achieving high-performance computing at a low
cost, and GPUs have become commonplace even in low-end
PCs. Because the hardware architecture is designed to
exploit the parallelism of graphics, even today’s low-end
GPUs exhibit high performance for data-parallel
computing. In addition, the GPU has much higher sequential
memory access performance than the CPU, because one of
the GPU’s key tasks is filling regions of memory with
contiguous texture data. That is, the GPU’s dedicated
memory can provide data to the GPU’s processing units at
high memory bandwidth.
2. Evolution speed: GPU performance, such as the
number of floating-point operations per second, has been
growing at a rapid pace. Rapidly evolving GPU
capabilities may enable the GPU implementation of a task
to outperform its CPU implementation in the future.
Modern GPUs have two kinds of programmable processors,
the vertex shader and the fragment shader, on the graphics
pipeline to render an image.
Figure 1 illustrates a block diagram of the programmable
rendering pipeline of these processors. The vertex shader
manipulates transformation and lighting of vertices of
polygons to transform them into the viewing coordinate
system. Polygons projected into the viewing coordinate
system are then decomposed into fragments each
corresponding to a pixel on the screen. Subsequently, the
color and depth of a fragment are computed by the
fragment shader. Finally, composition operations such as
tests using depth, alpha and stencil buffers are applied to
the outputs of the fragment shader to determine the final
pixel colors to be written to the frame buffer.
It is emphasized here that vertex and fragment shaders
are developed to utilize multi-grain parallelism in the
rendering processes: the coarse-grain vertex/fragment level
parallelism and the fine-grain vector component level
parallelism. To exploit the coarse-grain parallelism at the
GPU level, individual vertices and fragments can be
processed in parallel. The fragment shaders (vertex
shaders) of recent GPUs have several processing units for
parallel-processing multiple fragments (vertices). For
example, NVIDIA’s high-end GPU, GeForce 6800 Ultra,
has 16 processing units in the fragment shader, and
therefore can compute colors and depths of up to 16
fragments at the same time. On the other hand, to exploit
the fine-grain parallelism involved in all vector operations,
they have SIMD instructions that can simultaneously
operate on four 32-bit floating-point values within a 128-
bit register. For example, one of the powerful SIMD
instructions, the “multiply and add” (MAD) instruction,
performs a component-wise multiply of two registers each
storing four floating-point components, and then does a
component-wise addition of the product to another register;
the MAD instruction performs these eight floating-point
operations in a single cycle. Due to its application-specific
architecture, however, GPU does not work well
universally. To exhibit high performance for a non-
graphics application, hence, we ought to consider how to
bind it to GPU’s programmable rendering pipeline.
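The MAD instruction described above can be illustrated with a minimal sketch in Python. The function name `mad` and the tuple representation of a 128-bit register are assumptions made for illustration only; a real shader executes this as a single hardware instruction.

```python
def mad(a, b, c):
    """Component-wise multiply-add on 4-component vectors: a * b + c.

    Each argument stands in for one 128-bit register holding four
    32-bit floating-point values (an illustrative assumption).
    """
    if not (len(a) == len(b) == len(c) == 4):
        raise ValueError("expected three 4-component vectors")
    return tuple(x * y + z for x, y, z in zip(a, b, c))


# Four multiplications plus four additions per MAD instruction.
FLOPS_PER_MAD = 4 + 4

result = mad((1.0, 2.0, 3.0, 4.0), (0.5, 0.5, 0.5, 0.5), (1.0, 1.0, 1.0, 1.0))
print(result, FLOPS_PER_MAD)
```

The eight floating-point operations counted in the text correspond to the four products and four sums computed here.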
National Conference on Innovative Paradigms in Engineering & Technology (NCIPET-2012)
Proceedings published by International Journal of Computer Applications® (IJCA)
Fig 1 Overview of a programmable rendering pipeline
The most critical restriction in GPU programming for non-
graphics applications is due to the restricted data flows in
and between the vertex shader and the fragment shader.
Arrows in Fig. 1 show typically-permitted data flows. Both
vertex and fragment shader programs have to write their
outputs to write-only dedicated registers; random access
writes are not provided. This is a severe impediment to
the effective implementation of many data structures and
algorithms. In addition, the lack of loop controls,
conditionals, and branching is a serious limitation for most
practical applications. Although the latest GPUs with
Shader Model 3.0 support dynamic control flow,
flow-control operations carry some overhead and can
limit the GPU’s performance. If an application violates the
restrictions of the GPU programming model
described above, it is not a good idea to implement the
application entirely on GPUs. For such an application,
collaboration between CPU and GPU often leads to better
performance, though it is essential to keep the time-consuming
data transfer between them to a minimum. From the viewpoint
of data accessibility, the fragment shader is superior to the
vertex shader because the fragment shader can randomly
access the video memory and fetch data as texture colors.
Furthermore, the fragment shader usually has more
processing units than the vertex shader, and thereby the
fragment shader has a great potential to exploit
data parallelism more effectively. Consequently, this paper
presents an implementation of data clustering accelerated
effectively using multi-grain parallel processing on the
fragment shader.
2.1 GPU Computing with AMD/ATi Radeon 5870
The AMD/ATi Radeon 5870 architecture [23] is very different
from NVIDIA’s Fermi architecture. The AMD/ATi
Radeon 5870 used in our study has 1600 ALUs organized in a
different fashion compared to the Fermi. The ALUs are grouped
into five-ALU Very Long Instruction Word (VLIW) processor
units. While all five of the ALUs can issue the basic arithmetic
operations, only the fifth ALU can additionally execute
transcendental operations. The five-ALU groups along with the
branch execution unit and general-purpose registers form
another group called the stream core. This translates to 320
stream cores in all, which are further grouped into compute
units. Each compute unit has 16 stream cores, resulting in 20
total compute units in the ATi Radeon 5870. One thread can be
executed on one stream core, thus 16 threads can be run on a
single compute unit. In order to hide the memory latency, 64
threads are assigned to a single compute unit. When one 16-
thread group accesses memory, the other 16-thread group
executes on the ALU. Therefore, a theoretical throughput of
16 threads per cycle is possible on the Radeon architecture.
Each ALU can execute at most 2 single-precision floating-point
operations per cycle, one multiply and one add. The clock rate of the
Radeon GPU is 850 MHz; for 1600 ALUs this translates to a
peak throughput of 2.72 TFLOPS. The Radeon 5870 has a memory
hierarchy that is similar to the Fermi’s memory hierarchy. The
hierarchy includes a global memory, L1 and L2 cache, shared
memory, and registers. The 1 GB global memory has the peak
bandwidth of 153 GB/s and is controlled by eight memory
controllers. Each compute unit has 8 KB L1 cache having an
aggregate bandwidth of 1 TB/s. Multiple compute units share a
512 KB L2 cache with 435 GB/s of bandwidth between L1 and
L2 cache. Each compute unit also has 32 KB of shared
memory, providing a total 2 TB/s aggregate bandwidth. The
registers have the highest bandwidth, 48 bytes per cycle in each
stream core (aggregate bandwidth of 48 ∗ 320 ∗ 850 MB/s, i.e.,
13 TB/s). The 256 KB register space is available per compute
unit, totaling 5.1 MB for the entire GPU.
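The figures quoted in this subsection follow from a few multiplications, which the short sketch below reproduces. The constant names are illustrative assumptions; the numeric values are the ones stated in the text, and decimal units (1 TB/s = 10^6 MB/s, 1 MB = 1000 KB) are assumed.

```python
# Values taken from the description of the Radeon 5870 above.
ALUS = 1600
ALUS_PER_STREAM_CORE = 5
STREAM_CORES_PER_COMPUTE_UNIT = 16
CLOCK_MHZ = 850
FLOPS_PER_ALU_PER_CYCLE = 2          # one multiply and one add
REGISTER_BYTES_PER_CYCLE = 48        # per stream core
REGISTER_KB_PER_COMPUTE_UNIT = 256

stream_cores = ALUS // ALUS_PER_STREAM_CORE
compute_units = stream_cores // STREAM_CORES_PER_COMPUTE_UNIT

# MFLOPS -> TFLOPS and MB/s -> TB/s both divide by 10^6.
peak_tflops = ALUS * FLOPS_PER_ALU_PER_CYCLE * CLOCK_MHZ / 1e6
register_bandwidth_tb_s = REGISTER_BYTES_PER_CYCLE * stream_cores * CLOCK_MHZ / 1e6
register_space_mb = REGISTER_KB_PER_COMPUTE_UNIT * compute_units / 1000

print(stream_cores, compute_units)
print(peak_tflops, register_bandwidth_tb_s, register_space_mb)
```

Under these assumptions the results agree with the text: 320 stream cores, 20 compute units, 2.72 TFLOPS, roughly 13 TB/s of register bandwidth, and roughly 5.1 MB of register space.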
3. THE K-MEANS ALGORITHM
In data clustering [24], multivariate data units are grouped
according to their similarity or dissimilarity. MacQueen used
the term k-means to denote the process of assigning each data
unit to that cluster (of k clusters) with the nearest centroid. That
is, k-means clustering employs the Euclidean distance between
data units as the dissimilarity measure; a partition of data units
is assessed by the squared error:
$$E = \sum_{i=1}^{m} \min_{1 \le j \le k} \lVert x_i - y_j \rVert^2 \qquad (3.1)$$
where $x_i \in \mathbb{R}^d$, $i = 1, 2, \ldots, m$, is a data unit and $y_j \in \mathbb{R}^d$,
$j = 1, 2, \ldots, k$, denotes a cluster centroid.
Although there is a wide variety of k-means algorithms
[5], for simplicity of explanation this paper focuses on
a simple and standard k-means algorithm, summarized as
follows:
1. Begin with any desirable initial states, e.g. initial
cluster centroids may be drawn randomly from a
given data set.
2. Allocate each data unit to the cluster with the nearest
centroid. The centroids remain fixed through the
entire data set.
3. Calculate centroids of new clusters.
4. Repeat Steps 2 and 3 until a convergence condition is
met, e.g. no data units change their membership at
Step 2, or the number of repetitions exceeds a
predefined threshold.
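The four steps above can be sketched in plain Python as follows. This is a minimal CPU-only illustration, not the GPU implementation discussed in this paper; the function names, the `seed` parameter, and the choice to leave an empty cluster's centroid unchanged are assumptions made for the example.

```python
import random


def squared_distance(x, y):
    """Squared Euclidean distance between two d-dimensional points."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def nearest_centroid(x, centroids):
    """Index of the centroid closest to x (the nearest neighbor search)."""
    return min(range(len(centroids)),
               key=lambda j: squared_distance(x, centroids[j]))


def kmeans(data, k, max_iters=100, seed=0):
    rng = random.Random(seed)
    # Step 1: draw the initial centroids randomly from the data set.
    centroids = [list(p) for p in rng.sample(data, k)]
    labels = [None] * len(data)
    d = len(data[0])
    for _ in range(max_iters):              # Step 4: repetition threshold
        # Step 2: assign every data unit while the centroids stay fixed.
        new_labels = [nearest_centroid(x, centroids) for x in data]
        if new_labels == labels:            # Step 4: no membership change
            break
        labels = new_labels
        # Step 3: recompute the centroid of each cluster.
        sums = [[0.0] * d for _ in range(k)]
        counts = [0] * k
        for x, j in zip(data, labels):
            counts[j] += 1
            for c in range(d):
                sums[j][c] += x[c]
        for j in range(k):
            if counts[j] > 0:               # guard against divide-by-zero
                centroids[j] = [s / counts[j] for s in sums[j]]
    return centroids, labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(points, k=2)
print(sorted(centroids), labels)
```

On this small data set the two well-separated groups of three points end up in different clusters regardless of which two points are drawn as initial centroids.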
At each repetition, the assignment of m data units to k
clusters in Step 2 requires km distance computations (and
(k - 1)m distance comparisons) for finding the nearest cluster
centroids, the so-called nearest neighbor search. The cost of
each distance computation increases in proportion to the
dimension of the data, i.e. the number of vector elements in a data
unit, d. The nearest neighbor search consists of approximately
3dkm floating-point operations, and thus the computational cost of
the nearest neighbor search grows as O(dkm). In practical
applications, the nearest neighbor search consumes most of the
execution time for k-means clustering because m and/or d often
become very large. However, the nearest neighbor search
involves massive SIMD parallelism; the distance between every
pair of a data unit and a cluster centroid can be computed in
parallel, and the distance computation can further be
parallelized according to their vector components.
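The operation counts quoted above can be tabulated with a small helper. The breakdown of one squared-distance computation into d subtractions, d multiplications, and d additions is an assumption used to arrive at the approximate figure of 3d operations per distance; the function name and the example sizes are hypothetical.

```python
def nearest_neighbor_search_cost(d, k, m):
    """Approximate cost of one Step 2 pass for m data units,
    k centroids, and d-dimensional data."""
    flops_per_distance = 3 * d  # d subtractions, d multiplications, d additions
    return {
        "distance_computations": k * m,
        "distance_comparisons": (k - 1) * m,
        "flops": flops_per_distance * k * m,
    }


# Hypothetical problem size chosen only for illustration.
cost = nearest_neighbor_search_cost(d=16, k=8, m=1000)
print(cost)
```

Each of the km distance computations is independent of the others, which is the coarse-grain parallelism, and the d component-wise operations inside one distance form the fine-grain parallelism.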
This motivates us to implement the distance computation on
recent programmable GPUs as multi-grain SIMD-parallel
coprocessors. On the other hand, there is no necessity to
consider the acceleration of Steps 1 and 4 using GPU
programming, because they require little execution time and
further include almost no parallelism. In Step 3, cluster centroid
recalculation consists of dm additions and dk divisions of
floating-point values. Although most of these calculations can
be performed in parallel, conditionals and random access writes
are required for effective implementation of individually
summing up vectors within each cluster. In addition, the
divisions also require conditional branching to prevent divide-
by-zero errors. Since the execution time for Step 3 is much less
than that of Step 2, there is no room for performance
improvement that outweighs the overheads derived from the
lack of random access writes and conditionals in GPU
programming. Therefore, we decide to implement Steps 1, 3,
and 4 as CPU tasks.
4. DISCUSSION
Fig. 2 Parallel processing of data clustering that uses
GPU as a co-processor to exploit two kinds of data
parallelism in the nearest neighbor search
Our preliminary analysis shows that the data transfer from CPU
to GPU at each rendering pass is not a bottleneck in the nearest
neighbor search. This is because the large data set has already
been placed in the GPU-side video memory in advance; only
the geometry data of a polygon, including a cluster centroid
encoded as texture coordinates, are transferred at each rendering pass. On the
other hand, the data transfer from the GPU-side video memory
to the main memory induces a certain overhead even when
using the PCI-Express interface. Therefore, we should be
judicious about reading data back from GPU, even in the cases
of using GPUs connected via the PCI-Express interface. In our
implementation scheme, the overhead of the data transfer is
negligible except for trivial-scale data clustering because the
data placed on the GPU-side video memory are transferred only
once in each repetition. Accordingly, our implementation
scheme of data clustering with GPU co-processing can exploit
GPU’s computing performance without critical degradation
attributable to the data transfer between CPU and GPU.
5. CONCLUSION
Further research tasks include using a cluster of
GPUs for texture classification. This could be done by including
multiple GPUs in the same machine or by dividing computations
across a cluster of PCs, and would significantly increase the
applicability of the architecture to complex industrial
applications. With this approach, a deeper analysis of the
parallelization strategy for multiple GPUs is needed.
Undoubtedly, the increase in performance will not be
proportional to the number of GPUs and will be limited by
factors such as data communication and synchronization among
different hosts. We have proposed a three-level hierarchical
parallel processing scheme for the k-means algorithm using a
modern programmable GPU as a SIMD-parallel co-processor.
Based on the divide-and-conquer approach, the proposed
scheme divides a large-scale data clustering task into subtasks
of clustering small subsets, and the subtasks are executed on a
PC cluster system in an embarrassingly parallel manner. In the
subtasks, a GPU is used as a multigrain SIMD-parallel co-
processor to accelerate the nearest neighbor search, which
consumes a considerable part of the execution time in the k-
means algorithm. The distances from one cluster centroid to
several data units are computed in parallel. Each distance
computation is parallelized by component-wise SIMD
instructions. As a result, the parallel data clustering with GPU
co-processing significantly improves the computational
efficiency of massive data clustering. Experimental results
clearly show that the proposed hierarchical parallel processing
scheme remarkably accelerates massive data clustering tasks.
In particular, acceleration of the nearest neighbor search by GPU
co-processing significantly reduces the total execution time in
spite of the overhead of the data transfer from the GPU-side
video memory to the CPU-side main memory. GPU co-processing
is also effective in retaining the scalability of the
proposed scheme by accelerating the aggregation stage, which is a
non-parallelized part of the proposed scheme.
This paper has discussed the GPU implementation of the nearest
neighbor search, compared with the CPU implementation to
clarify the performance gain of GPU co-processing. However,
the multi-threading approach has a possibility to allow both
CPU and GPU to execute the nearest neighbor search in parallel
without interrupting each other. In such an implementation,
hence, GPU co-processing will always bring additional
computing power even in the case where only a low-end GPU is
available. The multi-threading implementation with effective
load balancing between CPU and GPU will be investigated in
our future work.
6. REFERENCES
[1] C. Pizzuti and D. Talia, "P-AutoClass: Scalable Parallel
Clustering for Mining Large Data Sets," IEEE
Transactions on Knowledge and Data Engineering, vol.
15, no. 3, pp. 629-641, 2003.
[2] U. Fayyad, G. Piatesky-Shapiro and P. Smith, From Data
Mining to Knowledge Discovery: An Overview, NY:
AAAI/MIT Press, 1996.
[3] L. Hunter and D. States, "Bayesian Classification of
Protein Structure," Expert, vol. 7, no. 4, pp. 67-75, 1992.
[4] C. Olson, " Parallel Algorithms for Hierarchical
Clustering," Parallel Computing, vol. 21, pp. 1313-1325,
1995.
[5] D. Judd, P. McKinley and A. Jain, "Large-Scale Parallel
Data Clustering," in Int'l Conf. Pattern Recognition,
New York, 1996.
[6] J. Potts, Seeking Parallelism in Discovery Programs,
Arlington: Master Thesis : Univ. of Texas, 1996.
[7] K. Stoffel and A. Belkoniene, "Parallel K-Means
Clustering for Large Data Sets," in Parallel Processing,
UK, 1999.
[8] R. Agrawal, T. Imielinski and A. Swami, "Database
mining: A performance perspective," vol. 5, no. 6, p.
914–925, Dec 1993.
[9] S. Weiss and C. Kulikowski, Computer Systems that
Learn, vol. 1, New York: Morgan Kaufmann, 1991.
[10] D. Michie, Machine Learning, Neural and Statistical
Classification, vol. I, NJ: Ellis Horwood, 1994.
[11] J. Quinlan, Programs for Machine Learning, vol. I, New
York: Morgan Kaufman, 1999.
[12] R. Agrawal, "An interval classifier for database mining
applications," in VLDB Conference, New York, Aug
1992.
[13] J. Catlett, Megainduction Machine Learning on Very
Large Databases. PhD thesis,, vol. I, Sydney: Univ. of
Sydney, 1991.
[14] M. Mehta, R. Agrawal and J. Rissanen, "SLIQ: A fast
scalable classifier for data mining," in 5th Intl. Conf. on
Extending Database Technology, NJ, March 1996.
[15] J. Shafer, R. Agrawal and M. Mehta., "SPRINT: A
scalable parallel classifier for data mining," in 22nd
VLDB Conference, NJ, Sept 1996.
[16] K. Alsabti, S. Ranka and V. Singh, "CLOUDS: A
decision tree classifier for large datasets," in 4th Intl.
Conf. on Knowledge Discovery and DataMining, Aug
1998.
[17] D. Fifield, Distributed tree construction from large data-
sets: Bachelor Thesis,, Australian Natl. Univ., 1992.
[18] L. Breiman, Classification and Regression Trees,
Belmont: Wadsworth, 1984.
[19] M. Joshi, G. Karypis and V. Kumar, ScalParC: A
scalable and parallel classification algorithm for mining
large datasets, Intl. Parallel Processing Symp, 1998.
[20] "Technical Brief: NVIDIA GeForce 8800 GPU
architecture overview," [Online]. Available:
www.nvidia.com.
[21] "NVIDIA’s next generation CUDA compute
architecture: Fermi," [Online]. Available:
http://www.nvidia.com/content/
PDF/fermi_white_papers/NVIDIAFermiComputeArchit
ectureWhitepaper.pdf.
[22] Z. Fan, F. Qiu, A. Kaufman and S. Yoakum-Stover,
"GPU cluster for high performance computing," NY,
2004.
[23] "ATI Mobility Radeon HD 5870 GPU specifications,"
[Online]. Available:
http://www.amd.com/us/products/notebook/graphics/ati-
mobility-hd-5800/Pages/hd-5870-specs.aspx.
[24] D. Judd, P. McKinley and A. Jain, "Performance
Evaluation on Large-Scale Parallel Clustering in NOW
Environments," in Eighth SIAM Conf. Parallel
Processing for Scientific Computing, Mar 1997.
FEMBI REKRISNA GRANDEA PUTRA M0513019
BASIC COMPETENCY EXAM 2, INTRODUCTION TO COMPUTER ORGANIZATION
Data mining algorithms such as classification and clustering are the future of computing,
although they require the processing of multidimensional data. People use multicore processors
together with GPUs. Most programming languages do not provide multiprocessing facilities, and
processing resources are therefore wasted. Clustering and classification algorithms are
especially resource-hungry. This paper presents a strategy for overcoming these shortcomings
using OpenCL, a multicore processing platform.
CLUSTERING is an unsupervised learning technique that separates data items into several
groups, so that items in the same cluster are more similar to one another and items in
different clusters tend to differ, according to some measure of similarity or proximity.
Pizzuti and Talia present the P-AutoClass technique for scalable parallel clustering for
mining large data sets. Data clustering is an important task in the field of data mining.
Clustering is the unsupervised classification of data items into homogeneous groups called
clusters. Clustering methods partition a set of data items into clusters, so that items in the
same cluster are more similar to one another than to items in different clusters according to
some defined criterion. Clustering algorithms are computationally intensive, particularly when
they are used to analyze large amounts of data. One possible approach to reducing the
processing time is to run the clustering algorithms on scalable parallel computers. Their
paper describes the design and implementation of P-AutoClass, a parallel version of the
AutoClass system, which is based on a Bayesian model for determining optimal classes in large
data sets. The P-AutoClass implementation divides the clustering task among the processors of
a multicomputer so that each processor works on its own partition and exchanges intermediate
results with the other processors. The system architecture, its implementation, and
experimental performance results for different numbers of processors and data sets are
presented and compared with theoretical performance. In particular, the experimental and
predicted scalability and efficiency of P-AutoClass are evaluated and compared against the
sequential AutoClass system.
Originally intended as a fixed many-core processor dedicated to transforming 3-D scenes
into 2-D images composed of pixels, the GPU architecture has undergone several innovations
to meet the computational demands of supercomputing research groups around the world. The
traditional GPU pipeline, designed to serve its original purpose, came with several
drawbacks. Shortcomings such as limited data reuse in the pipeline, excessive variation in
hardware usage, and the lack of integer instructions coupled with weak floating-point
precision made the traditional GPU a weak candidate for HPC. In November 2006, NVIDIA
introduced the GeForce 8800 GTX with a new unified pipeline and shader architecture. In
addition to overcoming the limitations of the traditional GPU pipeline, the GeForce 8800 GTX
architecture added
the concept of the streaming processor (SMP) architecture, which is highly relevant to
current GP-GPU programming. The SMPs can work together closely and deliver very high parallel
processing power. The output they produce can be stored in a fast cache and used by other
SMPs. The SMPs have instruction decoder units and execution logic that perform similar
operations on the data. This architecture allows SIMD instructions to be mapped efficiently
across groups of SMPs. The streaming processors are accompanied by units for texture fetch
(TF), texture addressing (TA), and caches. This structure is maintained and scaled up to 128
SMPs in the GeForce 8800 GTX. The SMPs operate at 2.35 GHz in the GeForce 8800 GTX,
separately from the 575 MHz core clock. Several GPGPUs used so far for HPC applications have
architectures similar to that of the GeForce 8800 GTX. However, the introduction of Fermi by
NVIDIA in September 2009 [21] has radically changed the contours of GP-GPU architecture, as
we explore in the next subsection.
Our preliminary analysis shows that the data transfer from CPU to GPU at each rendering
pass is not a bottleneck in the nearest neighbor search. This is because the large data set
has already been placed in the GPU-side video memory in advance; only the geometry data of a
polygon, including a cluster centroid encoded as texture coordinates, are transferred at each
rendering pass. On the other hand, the data transfer from the GPU-side video memory to the
main memory induces a certain overhead even when using the PCI-Express interface. Therefore,
we should be judicious about reading data back from the GPU, even when the GPU is connected
via the PCI-Express interface. In our implementation scheme, the overhead of the data transfer
is negligible except for trivial-scale data clustering, because the data placed in the
GPU-side video memory are transferred only once in each repetition. Accordingly, our
implementation scheme of data clustering with GPU co-processing can exploit the GPU's
computing performance without critical degradation attributable to the data transfer between
CPU and GPU.
Further research tasks include using a cluster of GPUs for texture classification. This
could be done by including multiple GPUs in the same machine or by dividing computations
across a cluster of PCs, and would significantly increase the applicability of the
architecture to complex industrial applications. With this approach, a deeper analysis of the
parallelization strategy for multiple GPUs is needed. Undoubtedly, the increase in
performance will not be proportional to the number of GPUs and will be limited by factors
such as data communication and synchronization among different hosts. We have proposed a
three-level hierarchical parallel processing scheme for the k-means algorithm using a modern
programmable GPU as a SIMD-parallel co-processor.
To cite this Article: Karabiber, F., Sertbas, A., Ozdemir, S. and Cam, H. (2008) 'An efficient memory allocation algorithm and hardware design with VHDL synthesis', International Journal of Electronics, 95: 2, 125–138
To link to this Article: DOI: 10.1080/00207210701828085
URL: http://dx.doi.org/10.1080/00207210701828085
International Journal of Electronics,
Vol. 95, No. 2, February 2008, 125–138
An efficient memory allocation algorithm and hardware design with
VHDL synthesis
F. KARABIBER*†, A. SERTBAS†, S. OZDEMIR‡ and H. CAM§
†Computer Engineering Department, Istanbul University, 34320, Avcılar, Istanbul
‡Computer Engineering Department, Gazi University, 06570, Maltepe, Ankara
§Computer Science and Engineering Department, Arizona State University, Tempe, AZ 85287
(Received 27 November 2006; in final form 21 November 2007)
This paper presents a hardware-efficient memory allocation technique, called EMA, that detects the existence of any free block of requested size in memory. EMA can allocate a free memory block of any number of chunks in any part of memory without having any internal fragmentation. The gate-level design of the hardware unit, along with its area-time measurements, is given in this paper. Simulation results indicate that EMA is fast and flexible enough to allocate/deallocate a free block in any part of memory, resulting in efficient utilization of memory space. In addition, the VHDL synthesis with FPGA implementation shows that EMA has less complicated hardware, and is faster than the known hardware techniques.
Keywords: Memory allocation; Buddy system; Digital system design; VHDL synthesis; Simulation; FPGA
1. Introduction
High-performance algorithms for dynamic memory allocation (DMA) have been of considerable interest in the literature, as DMA is a critical factor for improving a computer system’s performance. The common allocation techniques can be divided into four categories: sequential fits, segregated fits, buddy system and bitmapped fits. The sequential fits and segregated fits algorithms keep a free list of all available memory chunks and scan the list for allocation. These two methods produce good memory utilization (storage), but incur a time penalty because of scanning the free space list. The bitmapped fits method uses two bitmaps for the allocation process, one for the requested allocation size and another one for encoding the allocated block boundaries, i.e. the size of allocated blocks. If the object sizes are small, it can have space advantages; otherwise it causes long search times.
The buddy system, introduced by Knowlton (1965), is a fast and simple memory allocation technique which allocates memory in blocks whose lengths are powers of 2. If the requested block size is not a power of 2, then the size is rounded up to the next power of 2. This may leave a big chunk of unused space at the end of an allocated block (Puttkamer 1975), thereby resulting in internal fragmentation that occurs in
*Corresponding author. Email: [email protected]
International Journal of Electronics
ISSN 0020–7217 print/ISSN 1362–3060 online © 2008 Taylor & Francis
http://www.tandf.co.uk/journals
DOI: 10.1080/00207210701828085
the case of more memory allocation than requested. The buddy system also suffers from external fragmentation, which arises when a request for memory allocation cannot be satisfied even though the total amount of free space is adequate.
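The internal fragmentation caused by rounding a request up to the next power of 2 can be illustrated with a short sketch. The function names are hypothetical, and sizes are counted in abstract allocation units rather than in any particular system's bytes or chunks.

```python
def buddy_block_size(requested):
    """Smallest power of 2 that is >= requested (the buddy-system rounding)."""
    if requested < 1:
        raise ValueError("requested size must be positive")
    return 1 << (requested - 1).bit_length()


def internal_fragmentation(requested):
    """Units allocated but not requested under the buddy-system rounding."""
    return buddy_block_size(requested) - requested


for size in (5, 8, 9):
    print(size, buddy_block_size(size), internal_fragmentation(size))
```

A request of 9 units, for example, is served with a 16-unit block, so 7 units are wasted, whereas a request that is already a power of 2 wastes nothing.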
The performance of the buddy system can be improved using hardware techniques (Puttkamer 1975, Page and Hagins 1986, Chang and Gehringer 1991). A modified hardware-based buddy system which eliminates internal fragmentation is proposed in Chang and Gehringer (1996). In this allocation technique, the memory is divided into fixed-size word groups called chunks, and their statuses are recorded in a bit vector. Each bit of the vector represents a leaf of the or-gate tree given in figure 1. The method in Chang and Gehringer (1996) can detect a free block of size $j$ only if the starting address of the free block is a multiple of $j$, i.e. $k \times j$, where $k \ge 0$ and $j$ is a power of 2. Even though a block with the requested free space is available in the memory, their hardware design may not be able to detect it due to the limitations of the or-gate tree structure. Another memory allocation technique that detects free blocks of size $2^{\lceil \log_2 k \rceil}$ is proposed in Cam, Abd-El-Barr and Sait (1999).
In recent years, multiple buddy systems that predefine the size set have been introduced in order to reduce the external fragmentation (Chang, Srisa-an, Dan Lo and Gehringer 2002). However, the modified buddy system still performs better than the multiple buddy system. The active memory management algorithm based on the modified buddy system is implemented in a hardware unit (AMMU) and embedded into an SoC design in Agun and Chang (2001). The dynamic memory management unit allows easy integration into any processor, but it requires a large amount of memory for large object sizes and counts, since the total amount of memory space for the needed bit maps is proportional to the object size and the number of objects. Finally, a bitmap-based memory allocator designed in combinational logic works in conjunction with an application-specific instruction set extension (Chang et al. 2002).
This paper presents a fast and efficient memory allocation (EMA) algorithm to detect any available free block of requested size and to minimize internal and external fragmentation. The proposed technique can allocate a free memory block of any length located in any part of a memory. EMA can allocate free blocks of size $2^{\lfloor \log_2 k \rfloor} + 2^{\lceil \log_2 (k - 2^{\lfloor \log_2 k \rfloor}) \rceil}$ and it is capable of detecting any available free block of
Figure 1. Generic structure of or-gate prefix circuit.
requested size using a proposed or-gate prefix circuit. Furthermore, the proposed circuits for implementing EMA are less complicated than those introduced in Chang and Gehringer (1996) and Lo, Srisa-an and Chang (2001). The simulation results show that EMA occupies approximately 9.2% less memory space than the modified buddy system (Chang and Gehringer 1996). In addition, the EMA hardware has been synthesized with VHDL, tested for several measurements, such as the mean allocation and deallocation time and the total area, and compared to the memory management system in Agun and Chang (2001). Simulation results show that EMA results in a significant improvement in memory allocation behaviour compared to Agun and Chang (2001) by offering 22% less external fragmentation.
The rest of the paper is organised as follows: Section 2 describes the proposed memory allocation algorithm; Section 3 presents the detailed hardware design of the allocator/deallocator proposed in this work; Section 4 includes EMA's performance analysis with the simulation, the VHDL analysis and the FPGA implementation results; concluding remarks are made in Section 5.
2. Efficient memory allocation algorithm
Consider a memory partitioned into a number of chunks that have the same number of words, where a memory block consists of one or more chunks. The status of every memory chunk is represented in a bit-vector by either a 0 or a 1, depending on whether the chunk is free or used, respectively. The bit-vector thus holds the memory allocation information. The bits of the bit-vector are labelled from left to right in ascending order, starting with 0. Each bit of the bit-vector has an address register containing its label as the address. Figure 2 shows a simplified flowchart of Algorithm EMA, which is presented next.
Algorithm:
Input:
Allocation: the size value k of the requested blocks for allocation
Deallocation: the starting address of the block to be deallocated
Output:
Allocation:
(i) the starting address of the allocated block,
(ii) the bits corresponding to the allocated blocks are inverted from 0 to 1.
Deallocation: the bits corresponding to the deallocated blocks are inverted from 1 to 0.
In order to implement the above algorithm, the logic structures of the or-gate-prefix, the search-free block and the detect-free block circuits are given in Section 3. Step 1 determines whether the request is for memory allocation or deallocation. Step 2 detects the free memory chunks of size $2^{\lfloor \log_2 k \rfloor}$. If k is not a power of 2, then Step 3 detects the free memory chunks of size $2^{\lfloor \log_2 k \rfloor} + 2^{\lceil \log_2 (k - 2^{\lfloor \log_2 k \rfloor}) \rceil}$. Step 4 selects the highest address among all detected free blocks. Step 5 allocates the blocks and inverts from 0 to 1 the k bits of the bit-vector corresponding to the chunks of the allocated blocks. Step 6 frees the blocks and inverts from 1 to 0 the k bits corresponding to the chunks of the deallocated blocks.
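The bit-vector bookkeeping of the algorithm can be sketched in software as follows. This is only an illustration under stated assumptions: a linear scan stands in for the or-gate-prefix detection hardware, the class and method names are hypothetical, and `deallocate` takes the block size k as an explicit argument, whereas the algorithm above lists only the starting address as input.

```python
class BitVectorAllocator:
    """Software illustration of chunk allocation over a bit-vector
    (0 = free chunk, 1 = used chunk)."""

    def __init__(self, num_chunks):
        self.bits = [0] * num_chunks

    def allocate(self, k):
        """Allocate k contiguous free chunks; return the starting address,
        or None when no free run of k chunks exists."""
        if k < 1:
            raise ValueError("k must be positive")
        # Scan from the highest candidate address downwards, mirroring
        # Step 4, which selects the highest address among detected blocks.
        for start in range(len(self.bits) - k, -1, -1):
            if not any(self.bits[start:start + k]):
                for i in range(start, start + k):
                    self.bits[i] = 1        # Step 5: invert bits from 0 to 1
                return start
        return None

    def deallocate(self, start, k):
        """Free k chunks beginning at start (size passed in by assumption)."""
        for i in range(start, start + k):
            self.bits[i] = 0                # Step 6: invert bits from 1 to 0


allocator = BitVectorAllocator(8)
first = allocator.allocate(3)
second = allocator.allocate(5)
print(first, second, allocator.bits)
```

Because exactly k bits are set for a request of k chunks, no chunk is marked used beyond what was requested, which is the sense in which the scheme avoids internal fragmentation.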
An efficient memory allocation algorithm 127
Downloaded By: [TÜBTAK EKUAL] At: 15:29 5 July 2010
3. Hardware design of the memory allocator
In order to implement the above algorithm, the search-free-block and detect-free-block circuits are designed. Details of the circuits and of the algorithm steps are given below.
3.1 Search of free blocks (steps 2 and 3)
To detect all free blocks of size $2^{\lfloor \log_2 k \rfloor} + 2^{\lceil \log_2 (k - 2^{\lfloor \log_2 k \rfloor}) \rceil}$, we use the or-gate-prefix circuit whose logic structure was given in Cam et al. (1999). The or-gate-prefix circuit designed for a memory of N chunks is shown in figure 1. Each node at level $L_i$ represents an OR gate. As seen from figure 1, there are $n + 1$ level selectors labelled $S_0, S_1, \ldots, S_n$ for a $2^n$-bit vector. For any free block of size $2^i$, there will be exactly one corresponding or-gate node with value 0 at level $L_i$ of the
Figure 2. Block diagram of EMA algorithm.
128 F. Karabiber et al.
or-gate-prefix circuit. The outputs of all or-gates are inverted and then become the inputs of the tri-state buffers. When level selector $S_i$ is asserted, the outputs of those tri-state buffers which correspond to free blocks drive their associated vertical lines V(j), $j \ge 0$, depicted in figure 3. These vertical lines (called the V-vector) generate the addresses associated with the first chunk of the available blocks of the requested size. When a block of size k is requested, only one level selector $S_i$ is asserted at a time: $i = \lfloor \log_2 k \rfloor$ in Step 2 and $i = \lceil \log_2 (k - 2^{\lfloor \log_2 k \rfloor}) \rceil$ in Step 3.
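A functional model may help to picture what one level of the or-gate-prefix circuit reports. The sketch below assumes that a node at level $L_i$ is the OR of $2^i$ consecutive bit-vector bits and that one such node exists for every starting address; the exact wiring is the one in Cam et al. (1999) and is not reproduced here. The function names are hypothetical.

```python
def or_gate_prefix_levels(bits):
    """levels[i][a] is the OR of bits[a : a + 2**i], for every full window."""
    levels = [list(bits)]
    width = 1
    while 2 * width <= len(bits):
        prev = levels[-1]
        # Each new node ORs two nodes of the previous level that lie `width` apart,
        # so it covers 2 * width consecutive chunks.
        levels.append([prev[a] | prev[a + width]
                       for a in range(len(prev) - width)])
        width *= 2
    return levels


def v_vector(bits, i):
    """Model of asserting level selector S_i.

    V[a] is 1 when a free block of 2**i chunks starts at address a,
    i.e. when the corresponding OR node is 0 (outputs are inverted).
    """
    return [1 - node for node in or_gate_prefix_levels(bits)[i]]


bits = [1, 0, 0, 1, 0, 0, 0, 0]    # chunks 0 and 3 are in use
free_pairs = v_vector(bits, 1)     # start addresses of free blocks of 2 chunks
free_quads = v_vector(bits, 2)     # start addresses of free blocks of 4 chunks
```

In this model the free block of four chunks at addresses 4 to 7 is found even though other layouts of the same bit-vector would place it at an address that is not a multiple of 4.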
With this technique, the or-gate-prefix circuit can detect any free block whose size is a power of 2, no matter where the free block is located in the memory. If k is a power of 2, the technique uses the or-gate-prefix circuit only once, in step 2. In step 2, free blocks of size $2^{\lfloor \log_2 k \rfloor}$ are detected: k is passed through a high-priority encoder, and the decoded bits are then compared with the requested block size. If they are the same, the requested size is a power of 2, and S1 is selected as the input size (S) of the or-gate-prefix circuit. However, if k is not a power of 2, the or-gate-prefix circuit is used twice (step 2 and step 3). In step 3, the NANDed V-vector bits, shown in figure 4, are used instead of the bit-vector bits. Using this new bit-vector, the algorithm detects the free blocks of size $2^{\lceil \log_2 (k - 2^{\lfloor \log_2 k \rfloor}) \rceil}$, which are equivalent to free blocks of size $2^{\lfloor \log_2 k \rfloor} + 2^{\lceil \log_2 (k - 2^{\lfloor \log_2 k \rfloor}) \rceil}$ in the original bit-vector.
As seen in figure 4, the subtractor subtracts S1 from the requested block size k. This difference (k − S1) determines the size $2^{\lceil \log_2 (k - 2^{\lfloor \log_2 k \rfloor}) \rceil}$ of the free blocks searched for in step 3. As in step 2, a (high-priority) encoder followed by a decoder produces a block size, which is loaded into a shift register that holds the content of S2. If the decoded size bits are not equal to the difference, the shift register is enabled to shift one bit position to the left, which rounds the size up to the next power of 2; otherwise the decoded size bits are loaded into the register without any shift.
Figure 3. The or-gate-prefix logic circuit.
Example 1: Assume that the requested memory size is 38 chunks. In step 2, using the or-gate-prefix circuit, EMA detects free blocks of size $32 = 2^{\lfloor \log_2 38 \rfloor}$ and activates the V-vector address bits of those blocks by setting $S_5 = 1$. The requested size is not a power of 2, therefore EMA employs the or-gate-prefix circuit again to detect all the free blocks of size $2^{\lfloor \log_2 38 \rfloor} + 2^{\lceil \log_2 (38 - 2^{\lfloor \log_2 38 \rfloor}) \rceil} = 32 + 8 = 40$. Since, in step 2, the address registers which hold the V-vector are activated from level $S_5$ of the or-gate-prefix circuit, each active register represents a free block of size 32. Moreover, 9 consecutive active address registers are equivalent to a free block of size 40, because the or-gate-prefix circuit has the following property: if the address register a represents n chunks of memory, say bit-vector bits 1 to n, then the address register a + 1 represents the bit-vector bits 2 to n + 1. EMA takes advantage of this property to detect the free blocks of size 40 by using the V-vector as the bit-vector of the or-gate-prefix circuit. After the NAND operation, the free block of size 40 is represented by 8 consecutive active bits of the V1 register, which can be detected by the or-gate-prefix circuit. The results of the NAND operations are inserted into the bit-vector by inverting each result, so that the active bits of the new bit-vector are represented by 0s, as free chunks are in the original bit-vector. Finally, in step 3, EMA detects the free blocks of size 40 by finding the 8 consecutive active bits in the new bit-vector.
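The two-pass search of Example 1 can be checked with a small behavioural model. This is a sketch, not the circuit: `free_window_starts` stands in for one level of the or-gate-prefix circuit, the NAND stage is modelled as a pairwise NAND of neighbouring V-vector bits, and both function names are hypothetical.

```python
def free_window_starts(vec, width):
    """Addresses a where vec[a : a + width] is all 0."""
    return [a for a in range(len(vec) - width + 1)
            if not any(vec[a:a + width])]


def detect_two_pass(bits, k):
    p = 1 << (k.bit_length() - 1)        # 2^floor(log2 k), 32 for k = 38
    r = k - p
    if r == 0:                           # power of 2: one pass is enough
        return free_window_starts(bits, p)
    q = 1 << (r - 1).bit_length()        # 2^ceil(log2 r), 8 for k = 38

    # Step 2: V[a] = 1 when a free block of p chunks starts at address a
    active = set(free_window_starts(bits, p))
    v = [1 if a in active else 0 for a in range(len(bits))]

    # NAND of neighbours: 0 only where V[a] and V[a + 1] are both active
    nand = [1 - (v[a] & v[a + 1]) for a in range(len(v) - 1)]

    # Step 3: q consecutive 0s in the NANDed vector correspond to q + 1
    # consecutive active V bits, i.e. a free block of p + q chunks
    return free_window_starts(nand, q)


memory = [0] * 64
memory[50] = 1                           # a single used chunk
two_pass = detect_two_pass(memory, 38)
direct = free_window_starts(memory, 40)  # brute-force search for 40 free chunks
```

With this layout both searches report the same starting addresses, which is the equivalence the example relies on.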
3.2 The free block detection with the highest address (step 4)
This step determines the free block whose first chunk address is the greatest; the address of the first chunk of a block is called its starting address. The bits of the V-vector generated by the or-gate-prefix circuit indicate whether the requested block can be allocated: a bit set to 1 marks the starting address of a free block that can satisfy a request of size k. When more than one bit is set in the V-vector, the highest address corresponding to these bits is selected by a high-priority encoder circuit.
Figure 4. The Search and Detect-free block circuit.
Example 2: Consider a memory partitioned into $2^4 = 16$ chunks, each with an equal number of words. Each memory chunk is represented by one bit of the bit-vector, as shown in figure 5. Assume that a free block of size 2 is requested. The level selector S shown in figure 5 is activated and the V-vector bits with addresses 0010, 0101, 1001 and 1010 are set. The function of the (high-priority) encoder is to select the free block of size 2 that has the highest address. So, the output of the encoder is obtained as '1010'.
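The selection in Example 2 amounts to finding the highest-numbered set bit. A minimal sketch follows, assuming the V-vector is held as a list indexed by address; `high_priority_encode` is a hypothetical name and the loop stands in for the encoder circuit.

```python
def high_priority_encode(v):
    """Return the highest address whose V-vector bit is set, or None."""
    for address in range(len(v) - 1, -1, -1):
        if v[address]:
            return address
    return None


v = [0] * 16
for address in (0b0010, 0b0101, 0b1001, 0b1010):   # bits set in Example 2
    v[address] = 1

selected = high_priority_encode(v)
print(format(selected, "04b"))   # prints 1010
```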
3.3 Bit inversion (step 5)
Let SA and EA denote the starting and ending addresses of the selected block, respectively. For a requested block of size k, EA is computed as EA = SA + k − 1. Since the subvector V represents the bits corresponding to the allocated block, SA and EA are the addresses of its first and last bits, respectively. The bits of V must be inverted to 1 to indicate that they are allocated.
This is achieved in two steps by the circuit illustrated in figure 6. The objective of the first step is to set the gates connected to the (EA + 1)th bit in such a way that the bits with the addresses EA + 1, EA + 2, and so on cannot be inverted in the second step. The goal of the second step is to invert all the bits of the subvector V.
As soon as the free block with the highest address is determined in step 4, the bit-inversion operation is started to update the status of the bit-vector. Therefore, the bit-inversion circuit does not contribute any delay to the current memory allocation request. The only constraint imposed by the bit-inversion circuit is that the next memory allocation request cannot start until the circuit completes the inversion of the bits in V.
In the first step of the bit-inversion operation, the reset line shown in figure 6 is asserted high, and EA + 1 is put on the address lines of the address-control unit. For each bit of the bit-vector, there is a corresponding match cell, illustrated by a dashed-line box in figure 6. The comparator in each match cell compares the address on the address lines with the contents of the address register in its match cell.
Only the comparator associated with bit EA + 1 shows a match, because the address lines carry the address EA + 1. Therefore, only the output of the comparator of match cell EA + 1 is 1, and the outputs of all other comparators are 0. The contents of the bit-latch in a match cell can be changed only when the reset line is 1, so that the reset line acts like an enable line for the bit-latch. Since the reset line is 1 during the first step only, the number 0 is stored in the bit-latch of match cell EA + 1, and all the other bit-latches of the match cells store the number 1. Note that the output of the bit-latch of match cell EA + 1 is an input to an AND gate whose output automatically becomes 0. Finally, the reset line is made 0 and the first step ends. The settings of the circuit in the first step ensure that the bits with the addresses EA + 1, EA + 2, and so on cannot be inverted in the second step.
Figure 5. The contents of the bit-vector and V-vector for the example.
In the second step of the bit-inversion operation, all the bits of the subvector V are inverted. To accomplish this, the address lines of the address-control unit in figure 6 are set to the starting address SA, and the set line is asserted high. The comparator of each match cell again compares its address with the address SA carried by the address lines. Consequently, only the output of the comparator of match cell SA becomes 1. Because the set line is also 1, both inputs of one of the AND gates in match cell SA become 1, thereby resulting in an output of 1. This drives the output of the OR gate connected to that AND gate to 1. The output of this OR gate is connected to the AND gate which enables the bit-inverter with the address SA. Because the bit-latches of the match cells with addresses SA to EA have been set to 1 in the first step, both inputs of the AND gates located between the OR gates are 1s. Therefore, the outputs of the OR gates connected to the bit-inverters SA to EA inclusive become 1 sequentially. Hence, all the bit-inverters from SA to EA are enabled, and all the bits of the subvector V are inverted during the second step. At the end of the second step, the set line is made 0.
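The two-step inversion can be traced with a gate-level-style model. The sketch below is an assumption-laden simplification of figure 6: each match cell is reduced to one latch bit and one enable bit, and the enable signal ripples from cell SA through an AND/OR chain until it meets the latch that was cleared at EA + 1. The name `invert_block` is hypothetical.

```python
def invert_block(bits, sa, k):
    """Return a copy of `bits` with the k bits starting at SA inverted."""
    n = len(bits)
    ea = sa + k - 1                       # EA = SA + k - 1

    # First step (reset line high): the latch of match cell EA + 1 stores 0,
    # every other latch stores 1.
    latch = [0 if a == ea + 1 else 1 for a in range(n)]

    # Second step (set line high): the comparator of cell SA matches, and the
    # enable signal propagates to the following cells while their latch is 1.
    enable = []
    carry = 0
    for a in range(n):
        match = 1 if a == sa else 0
        carry = match | (carry & latch[a])
        enable.append(carry)

    # Enabled bit-inverters flip their bit; all other bits are unchanged.
    return [b ^ e for b, e in zip(bits, enable)]


allocated = invert_block([0] * 16, 0b1010, 2)   # allocate 2 chunks at 1010
released = invert_block(allocated, 0b1010, 2)   # the same circuit frees them
```

Running the same routine on already allocated bits inverts them back to 0, which is how the model covers deallocation as well.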
Figure 6. Bit-inversion circuit.
3.4 Memory deallocation (step 6)
In the case of memory deallocation, the starting address SA and the size k of the block to be deallocated are given. Since the ending address EA of the block is then known (EA = SA + k − 1), in step 6 the bit inverters invert the bits from 1 to 0 to indicate that they are free.
4. Simulation results and VHDL synthesis
This section consists of two parts. In the first part, a comparative performance analysis of EMA and previous memory allocation techniques is given. The second part presents how EMA is implemented in hardware using VHDL synthesis. This part also compares EMA with the active memory management unit (AMMU) (Agun and Chang 2001) using the criteria of fragmentation, execution time and chip area.
4.1 Simulation results
A simulator was written to compare the memory utilization of EMA with that of the following three memory allocators: Chang and Gehringer’s memory allocator (CGMA) (Chang and Gehringer 1996), the complete multiple buddy system (CMBS) (Chang 1997) and the multiple buddy system (MBS) (Lo et al. 2001). As in the simulations of Lo et al. (2001), the MBS in our simulations uses the size set $(2^N, 3 \times 2^N)$, that is, MBS is able to allocate any requested block of size $2^n$ or $3 \times 2^n$. It is stated in Chang (1997) that CMBS yields the best memory utilization among all previously known buddy systems; therefore CMBS is also used as a benchmark in the simulations.
The simulator uses memory allocation/deallocation traces collected from eight Java benchmark programs of SPECjvm98 (1998). These benchmark programs are designed to measure the performance of Java virtual machine (JVM) implementations. The Kaffe (2002) JVM version 1.0.7 is customized to generate a trace file of memory allocation/deallocation requests during the execution of each Java program of SPECjvm98.
As in Chang and Gehringer (1996), the performance metric used in the simulations is the highest address allocated (HAA) in the memory. The simulator takes a previously generated memory trace file and generates the highest address values for the allocation/deallocation of the requested memory blocks by each of the above four memory allocators. The simulation results in table 1 indicate that CMBS has the best memory utilization, generating the lowest HAAs, and is followed by EMA, CGMA and MBS, respectively.
Also, table 1 shows that the average memory utilization of EMA is 9.2% and 13.8% better than that of CGMA and MBS, respectively. This is due to the following facts.
• EMA is capable of allocating an available block of the requested size wherever it is in the memory.
• In EMA, the size of the free memory block that has to be detected to allocate a request is smaller than the ones in CGMA and MBS.
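The quoted averages can be approximated from the HAA values of table 1. The paper does not state how the averages were formed; the sketch below assumes they are relative differences of the HAA totals summed over the eight traces, which gives figures close to the quoted 9.2% and 13.8%.

```python
# HAA values from table 1, per trace program: (CMBS, EMA, CGMA, MBS)
HAA = {
    "Check":     (109895, 139239, 171769, 195625),
    "Compress":  (104808, 133085, 162738, 174129),
    "Jess":      (355718, 405967, 457114, 533299),
    "Db":        (192531, 224888, 260012, 273879),
    "Javac":     (313403, 342351, 391277, 374762),
    "Mpegaudio": (138698, 169043, 214083, 239170),
    "Mtrt":      (741602, 902414, 928932, 950535),
    "Jack":      (522589, 619558, 650956, 666731),
}

# Column totals over all traces (assumed aggregation, see the note above)
cmbs_total, ema_total, cgma_total, mbs_total = (
    sum(column) for column in zip(*HAA.values())
)


def improvement_percent(other_total, ema):
    """How much lower EMA's total HAA is, relative to the other allocator."""
    return 100.0 * (other_total - ema) / other_total


vs_cgma = improvement_percent(cgma_total, ema_total)
vs_mbs = improvement_percent(mbs_total, ema_total)
print(f"EMA vs CGMA: {vs_cgma:.2f}%  EMA vs MBS: {vs_mbs:.2f}%")
```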
Although this memory utilization improvement of 9.2% may not seem to be a big number, it is indeed a significant improvement, knowing that the best buddy-type memory allocator, CMBS, is shown to be only around 20% better than the modified buddy system (Lo et al. 2001). Moreover, the memory utilization improvement of EMA may be increased further when the circular prefix circuit is used.
Table 2 shows the number of allocation/deallocation requests (NADR) and the average allocation/deallocation size (AADS) of each simulation program, together with the percentage of memory overhead associated with EMA, CGMA (Chang and Gehringer 1996) and MBS (Lo et al. 2001) compared to CMBS (Chang 1997).
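The overhead percentages of table 2 appear to follow from the HAA values of table 1. The formula below is inferred from the published numbers rather than stated in the text: the overhead of an allocator is taken as the gap between its HAA and the CMBS HAA, relative to the allocator's own HAA. Two rows are reproduced as a check.

```python
def overhead_percent(cmbs_haa, other_haa):
    """Memory overhead of an allocator relative to CMBS (inferred formula)."""
    return 100.0 * (other_haa - cmbs_haa) / other_haa


# (CMBS, EMA, CGMA, MBS) HAA values copied from table 1
rows = {
    "Check": (109895, 139239, 171769, 195625),
    "Javac": (313403, 342351, 391277, 374762),
}

for name, (cmbs, ema, cgma, mbs) in rows.items():
    values = [overhead_percent(cmbs, x) for x in (ema, cgma, mbs)]
    print(name, " ".join(f"{v:.2f}" for v in values))
```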
As seen from tables 1 and 2, CMBS has the best memory utilization, whereas EMA clearly outperforms CGMA and MBS, resulting in lower HAAs for all benchmark programs. However, the implementation cost of CMBS is much higher than that of EMA, because the number of nodes in CMBS’s or-gate tree is $O(N^2)$ whereas EMA’s or-gate tree has only $O(N \log N)$ nodes. CGMA and MBS both have $O(N)$ nodes in their or-gate trees. The implementation costs of the above memory allocators in terms of the number of nodes in their or-gate trees are presented in figure 7 for various bit-vector sizes. Figure 7 indicates that CMBS is extremely expensive to implement compared to EMA, CGMA and MBS.
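To get a feel for these growth rates, the following sketch tabulates rough node counts for a few bit-vector sizes. The counts are illustrative assumptions, not the values plotted in figure 7: a plain binary OR tree over N leaves has N − 1 nodes, the or-gate-prefix count assumes one OR node per window of $2^i$ consecutive bits at each level, and $N^2$ is used only as a growth reference for CMBS.

```python
def tree_nodes(n):
    """Internal nodes of a binary OR tree over n leaves (O(N) growth)."""
    return n - 1


def prefix_nodes(n):
    """OR nodes when level i holds one node per window of 2**i bits.

    Assumes n is a power of 2; level i then has n - 2**i + 1 nodes,
    for i = 1 .. log2(n), which grows as O(N log N).
    """
    return sum(n - (1 << i) + 1 for i in range(1, n.bit_length()))


for n in (64, 128, 256, 512):
    print(n, tree_nodes(n), prefix_nodes(n), n * n)
```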
4.2 VHDL synthesis and test results
Table 1. Highest Address Allocated (HAA) values in number of blocks when block size is eight.

Trace program   CMBS      EMA       CGMA      MBS
Check           109,895   139,239   171,769   195,625
Compress        104,808   133,085   162,738   174,129
Jess            355,718   405,967   457,114   533,299
Db              192,531   224,888   260,012   273,879
Javac           313,403   342,351   391,277   374,762
Mpegaudio       138,698   169,043   214,083   239,170
Mtrt            741,602   902,414   928,932   950,535
Jack            522,589   619,558   650,956   666,731

Table 2. Percentage overhead of memory allocators compared to CMBS, NADR and AADS.

Trace program   EMA overhead   CGMA overhead   MBS overhead   NADR         AADS
Check           21.07          36.02           43.82          86,738       54
Compress        21.25          35.60           39.81          77,233       53
Jess            12.38          22.18           33.30          21,184,857   46
Db              14.39          25.95           29.70          6,550,585    25
Javac           8.46           19.90           16.37          822,007      41
Mpegaudio       17.95          35.21           42.01          92,525       53
Mtrt            17.82          20.17           21.98          13,843,281   32
Jack            15.65          19.72           21.62          28,039,374   36

In this section, the chip area and execution time of the proposed hardware unit are examined. Test results are obtained for different parameters, such as the bit-vector size and the maximum object size. The bit-vector records the status and capacity of the memory. The maximum object size is defined as the maximum number of chunks that can be allocated at once. In this application, the bit-vector sizes range from 64 bytes up to 512 bytes, whereas the maximum allocation sizes vary from 16 to 128 blocks. The Xilinx ISE (2005) 6.2.03i tool is used to generate a gate-level representation of the memory allocator/deallocator hardware design. The hardware unit is implemented in a Virtex-II Pro FPGA (2005) with device type XC2VP100, package type ‘FF’, 1696 pins and speed grade -5. The analysis was performed on a desktop computer with an Intel Pentium IV running at 2 GHz with 512 Mbyte of RAM.
The results of the FPGA implementation of the allocator/deallocator unit are given in table 3. As seen from table 3, only 16% of the total FPGA is used for the largest bit-vector size selected in the EMA implementation. It should be noted that the bit-vector size can be increased to suit the application. These results also show that the amount of FPGA used in the implementation is only slightly affected by the maximum object size.
Table 4 shows the number of cycles needed to perform an allocation/deallocation process. Each allocation request takes 5 clock cycles if its size is a power of 2, and 6 cycles otherwise. Each deallocation request needs only 2 cycles.
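The cycle counts can be combined into a small latency estimate. This is a sketch under assumptions: the per-step cycles come from table 4, the sixth cycle for requests that are not a power of 2 is attributed here to the second pass through the or-gate-prefix circuit, and the clock frequency passed to `latency_ns` is a free parameter (the 69.07 MHz used below is the maximum clock frequency reported in table 5, which the unit need not actually run at).

```python
SEARCH_CYCLES = 2          # search of free blocks (steps 2-3), table 4
HIGH_ADDRESS_CYCLES = 1    # high address detection (step 4)
INVERSION_CYCLES = 2       # bit inversion (step 5)
DEALLOCATION_CYCLES = 2    # deallocation (step 6)


def allocation_cycles(k):
    """5 cycles when k is a power of 2, 6 otherwise (k must be positive)."""
    is_power_of_two = (k & (k - 1)) == 0
    extra_pass = 0 if is_power_of_two else 1   # assumed cost of the second search
    return SEARCH_CYCLES + HIGH_ADDRESS_CYCLES + INVERSION_CYCLES + extra_pass


def latency_ns(cycles, clock_mhz):
    """Time for a number of clock cycles at the given frequency."""
    return cycles * 1000.0 / clock_mhz


for k in (32, 38):
    cycles = allocation_cycles(k)
    print(k, cycles, round(latency_ns(cycles, 69.07), 1))
```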
In addition, a timing analysis of the EMA hardware unit is performed, and the results in terms of the mean allocation and deallocation times are reported in figure 8. As seen from figure 8, larger bit-vector sizes increase both the mean allocation and the mean deallocation time. However, when the bit-vector length is held constant, increasing the maximum allocation size does not add any delay to the allocation and deallocation times.
Figure 7. The implementation costs of CMBS, EMA, CGMA and MBS in terms of number of nodes.
The active memory management algorithm, based on the modified buddy system, is implemented in a hardware unit (AMMU) embedded into an SoC design using VHDL in Agun and Chang (2001). The total fragmentation, maximum clock frequency and number of used slices are taken from Agun and Chang (2001). The comparison of EMA and AMMU is given in table 5. EMA reduces the total fragmentation by about 22%. As AMMU allocates memory blocks whose sizes are powers of two, it suffers from both internal and external fragmentation, whereas in EMA only external fragmentation can exist. Furthermore, EMA can perform the allocation process in a shorter time, since the allocation time is inversely proportional to the maximum clock frequency. On the other hand, the implementation cost of EMA in terms of the number of used slices is higher than that of AMMU.
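The 22% figure can be recovered from the values reported in table 5 (where EMA is labelled FEMA). The short calculation below assumes the reduction is measured relative to AMMU's total fragmentation; the clock-frequency ratio is included only to show the size of the gap, not as a measured speed-up.

```python
# Values from table 5, per memory size: (total fragmentation, max clock in MHz)
TABLE5 = {
    256: {"EMA": (0.682, 69.07), "AMMU": (0.878, 9.649)},
    512: {"EMA": (0.683, 63.135), "AMMU": (0.884, 7.922)},
}


def fragmentation_reduction(size):
    """Relative reduction of total fragmentation, in percent of AMMU's value."""
    ema = TABLE5[size]["EMA"][0]
    ammu = TABLE5[size]["AMMU"][0]
    return 100.0 * (ammu - ema) / ammu


def clock_ratio(size):
    """Ratio of the maximum clock frequencies, EMA over AMMU."""
    return TABLE5[size]["EMA"][1] / TABLE5[size]["AMMU"][1]


for size in TABLE5:
    print(size, round(fragmentation_reduction(size), 1), round(clock_ratio(size), 2))
```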
5. Conclusions
Table 3. Memory Alloc./Dealloc. hardware implementation results on Virtex II pro FPGA.

         Total    64-16       128-16      128-32      256-16      256-32      256-64      512-32        512-64
Slices   44,096   704 (1%)    1479 (3%)   1539 (3%)   3086 (6%)   3220 (7%)   3414 (7%)   6954 (15%)    7451 (16%)
LUTs     88,192   1224 (1%)   2671 (3%)   2717 (3%)   5509 (6%)   5754 (6%)   6130 (6%)   11,722 (13%)  12,695 (14%)

Table 4. Propagation delays for the proposed hardware unit.

Steps                                Clock cycles
Search of free blocks (steps 2–3)    2
High address detection (step 4)      1
Bit inversion (step 5)               2
Deallocation (step 6)                2

Figure 8. Mean allocation and deallocation time.

The paper presented a memory allocation and deallocation technique, EMA, together with its hardware design. When a free block of k chunks is requested, an or-gate-prefix circuit detects all free blocks of $2^{\lfloor \log_2 k \rfloor} + 2^{\lceil \log_2 (k - 2^{\lfloor \log_2 k \rfloor}) \rceil}$ chunks in memory. The free block with the highest address is determined by an encoder circuit. The starting address of this free block is returned to the memory manager to indicate that the first k chunks of the block are allocated. While these k chunks are being used by the memory manager, the k bits corresponding to them are inverted in the background. Thus, the bit-inversion operation does not contribute any delay to the memory allocation request.
Consequently, EMA is fast and flexible enough to allocate/deallocate a free block in any part of memory. This leads to better utilization of memory space, thereby allowing more memory blocks to remain free than is possible with the known hardware memory allocators. Simulation results have indicated that EMA occupies approximately 9.2% less memory space than the modified buddy system (Chang and Gehringer 1996). Although this memory occupancy improvement of 9.2% may not seem to be a substantial improvement, it is still significant, knowing that the ideal memory occupancy improvement is shown to be around 20% better than the modified buddy system (Lo et al. 2001).
The gate-level design of the unit, the hardware implementation and the test results, along with area-time measurements versus the bit-vector size and the maximum allocation block size, are presented in this work. The proposed allocator unit (EMA) is compared with AMMU, which is one of the recent dynamic memory allocation techniques. The results show that, in comparison with AMMU, EMA reduces the total fragmentation by up to 22%. Furthermore, VHDL synthesis results show that the maximum clock frequency of EMA is higher than that of AMMU. Hence, EMA can perform the allocation process in a shorter time than AMMU.
References
S.K. Agun and J.M. Chang, ‘‘Design of a reusable memory management system’’, in Proc. of 14th Annual IEEE Int’l ASIC/SOC Conf., 2001, pp. 369–373.
H. Cam, M. Abd-El-Barr and S.M. Sait, ‘‘Design and analysis of a high-performance hardware-efficient memory allocation technique’’, in Proc. of the ICCD’99 Int. Conf. on Computer Design, October 10–13, Austin, USA, 1999, pp. 274–276.
J.M. Chang and E.F. Gehringer, ‘‘Object caching for performance in object-oriented systems’’, in Proc. IEEE Int’l Conf. Computer Design, Cambridge, MA, pp. 379–385, 1991.
J.M. Chang and E.F. Gehringer, ‘‘A high performance memory allocator for object-oriented systems’’, IEEE Trans. Comp., 45, pp. 357–366, 1996.
J.M. Chang, ‘‘Design and evaluation of a submesh allocation scheme for two-dimensional mesh-connected parallel computers’’, in Proc. 1997 Int’l Symposium on Parallel Architectures, Algorithms and Networks (I-SPAN), 1997, pp. 303–309.
Table 5. Comparison results of FEMA with AMMU.

Memory size                   256               512
Techniques                    FEMA     AMMU     FEMA     AMMU
Total fragmentation           0.682    0.878    0.683    0.884
Max. clock frequency (MHz)    69.07    9.649    63.135   7.922
Used slice number             3414     1071     7735     2104
J.M. Chang, W. Srisa-an, C.T. Dan Lo and E.F. Gehringer, ‘‘DMMX: dynamic memory management extensions’’, J. Sys. Software, 63, pp. 187–199, 2002.
Kaffe 1.0.7, Released under GPL license by GNU. Available online at http://www.kaffe.org (accessed 01 February 2002).
K.C. Knowlton, ‘‘A fast storage allocator’’, Comm. ACM, 8, pp. 623–625, 1965.
C.-T.D. Lo, W. Srisa-an and J.M. Chang, ‘‘Performance analyses on the generalised buddy system’’, IEE Proc. Comput. Digit. Tech., 148, pp. 167–175, 2001.
I.P. Page and J. Hagins, ‘‘Improving the performance of buddy systems’’, IEEE Trans. Comput., 35, pp. 441–447, 1986.
E.V. Puttkamer, ‘‘A simple hardware buddy system memory allocator’’, IEEE Trans. Comput., 24, pp. 953–957, 1975.
SPECjvm98, Release 1.0. Available online at: http://www.spec.org/jvm98/jvm98/doc/index.html
Xilinx ISE tool, Available online at: http://www.xilinx.com (accessed 12 May 2003).
Xilinx, Virtex-II Pro and Virtex-II Pro X Platform FPGAs: Complete Data Sheet, Available online: http://direct.xilinx.com/bvdocs/publications/ds083.pdf (accessed 04 August 2003).
An Efficient Memory Allocation Algorithm and Hardware Design with VHDL Synthesis
1. Introduction
High-performance algorithms for dynamic memory allocation (DMA) have been of considerable interest in the literature, as DMA is an important factor in improving the performance of a computer system. The common allocation techniques can be divided into four categories: sequential fit, segregated fit, buddy systems and bitmap fit. The sequential-fit and segregated-fit algorithms keep a free list of all available memory chunks and scan the list for allocation. Both methods yield good memory (storage) utilization, but incur a time penalty because of the scanning of the free-space lists. The bitmap-fit method uses two bitmaps for the allocation process, one for the requested allocation sizes and another for encoding the allocated block boundaries, i.e. the sizes of the allocated blocks. If the object sizes are small, it can have a space advantage; otherwise it leads to long search times.
The buddy system, introduced by Knowlton (1965), is a fast and simple memory allocation technique that allocates memory in blocks whose lengths are powers of 2. If the requested block size is not a power of 2, the size is rounded up to the next power of 2. This may leave a large portion of unused space at the end of an allocated block (Puttkamer 1975), resulting in internal fragmentation, which occurs when more memory is allocated than requested. The buddy system also suffers from external fragmentation, which arises when a memory allocation request cannot be satisfied even though the total amount of free space is adequate.
The performance of the buddy system can be improved by using hardware techniques (Puttkamer 1975, Page and Hagins 1986, Chang and Gehringer 1991). A modified hardware-based buddy system that eliminates internal fragmentation is proposed in Chang and Gehringer (1996). In this allocation technique, memory is divided into fixed-size groups of words called chunks, and their statuses are recorded in a bit-vector. Each bit of the vector is a leaf of the or-gate tree given in figure 1. The method in Chang and Gehringer (1996) can detect a free block of size j only if the starting address of the free block is a multiple of j, i.e. k × j, where k ≥ 0 and j is a power of 2. Even when a free block of the requested size is available in memory, their hardware design may not be able to detect it because of the limitations of the or-gate tree structure. Another memory allocation technique, which can detect free blocks of size $2^{\lfloor \log_2 k \rfloor}$, is proposed in Cam, Abd-El-Barr and Sait (1999).
In recent years, several buddy systems with predefined size sets have been introduced in order to reduce external fragmentation (Chang, Srisa-an, Dan Lo and Gehringer 2002). However, the modified buddy system still performs better than the multiple buddy systems. An active memory management algorithm based on the modified buddy system is implemented in a hardware unit (AMMU) and embedded into an SoC design in Agun and Chang (2001). The dynamic memory management unit allows easy integration into any processor, but requires a large amount of memory for large object sizes and counts, because the total memory space needed for the bitmaps is proportional to the object size and the number of objects. Finally, a bitmap-based memory allocator is designed in combinational logic, working together with application-specific instruction set extensions (Chang et al. 2002).
This paper presents a fast and efficient memory allocation (EMA) algorithm that detects any available free block of the requested size and minimizes internal and external fragmentation. The proposed technique can allocate a free memory block of any length located in any part of the memory. EMA can allocate free blocks of size $2^{\lfloor \log_2 k \rfloor} + 2^{\lceil \log_2 (k - 2^{\lfloor \log_2 k \rfloor}) \rceil}$, and it is able to detect any available free block of the requested size using the proposed or-gate-prefix circuit. In addition, the circuit proposed to implement EMA is less complex than those introduced in Chang and Gehringer (1996) and Lo, Srisa-an and Chang (2001). Simulation results show that EMA occupies approximately 9.2% less memory space than the modified buddy system (Chang and Gehringer 1996). Furthermore, the EMA hardware has been synthesized with VHDL, tested for several measurements, such as mean allocation and deallocation time and total area, and compared with the memory management system in Agun and Chang (2001). The simulation results show that EMA yields a significant improvement in memory allocation behaviour compared with Agun and Chang (2001), offering 22% less external fragmentation.
The rest of the paper is organised as follows: §2 describes the proposed memory allocation algorithm; §3 presents the detailed hardware design of the allocator/deallocator proposed in this work; §4 covers the performance analysis of EMA through simulation, together with the VHDL analysis and the FPGA implementation results; concluding remarks are made in §5.
2. Efficient memory allocation algorithm
Consider a memory partitioned into a number of chunks, each containing the same number of words; a memory block consists of one or more chunks. The status of every chunk is recorded in a bit-vector, which holds the memory allocation information: a bit is 0 if the corresponding chunk is free and 1 if it is used. The bits of the bit-vector are labelled from left to right in ascending order, starting with 0. Each bit of the bit-vector has an address register containing its label as the address. Figure 2 shows a simplified flowchart of Algorithm EMA, which is presented next.
Algorithm:
Input:
Allocation: the size k of the requested block
Deallocation: the starting address of the block to be deallocated
Output:
Allocation: (i) the starting address of the allocated block; (ii) the bits corresponding to the allocated block are inverted from 0 to 1.
Deallocation: the bits corresponding to the deallocated block are inverted from 1 to 0.
The logic structures of the or-gate-prefix, search-free-block and detect-free-block circuits that implement the above algorithm are given in §3. Step 1 determines whether the request is a memory allocation or a deallocation. Step 2 detects the free memory blocks of size $2^{\lfloor \log_2 k \rfloor}$. If k is not a power of 2, then Step 3 detects the free memory blocks of size $2^{\lfloor \log_2 k \rfloor} + 2^{\lceil \log_2 (k - 2^{\lfloor \log_2 k \rfloor}) \rceil}$. Step 4 selects the highest address among all detected free blocks. Step 5 allocates the block and inverts from 0 to 1 the k bits of the bit-vector corresponding to its chunks. Step 6 frees a block and inverts from 1 to 0 the k bits corresponding to the chunks of the deallocated block.
3. Hardware design of the memory allocator
In order to implement the above algorithm, the search-free-block and detect-free-block circuits are designed. Details of the circuits and of the algorithm steps are given below.
4. Simulation results and VHDL synthesis
This section consists of two parts. In the first part, a comparative performance analysis of EMA and previous memory allocation techniques is given. The second part presents how EMA is implemented in hardware using VHDL synthesis. This part also compares EMA with the active memory management unit (AMMU) (Agun and Chang 2001) using the criteria of fragmentation, execution time and chip area.
5. Conclusions
The paper presented a memory allocation and deallocation technique, EMA, together with its hardware design. When a free block of k chunks is requested, an or-gate-prefix circuit detects all free blocks of $2^{\lfloor \log_2 k \rfloor} + 2^{\lceil \log_2 (k - 2^{\lfloor \log_2 k \rfloor}) \rceil}$ chunks in memory. The free block with the highest address is determined by an encoder circuit. The starting address of this free block is returned to the memory manager to indicate that the first k chunks of the block are allocated. While these k chunks are being used by the memory manager, the k bits corresponding to them are inverted in the background. Thus, the bit-inversion operation does not contribute any delay to the memory allocation request.
Consequently, EMA is fast and flexible enough to allocate/deallocate a free block in any part of memory. This leads to better utilization of memory space, thereby allowing more memory blocks to remain free than is possible with the known hardware memory allocators. Simulation results have indicated that EMA occupies approximately 9.2% less memory space than the modified buddy system (Chang and Gehringer 1996). Although this memory occupancy improvement of 9.2% may not seem to be a substantial improvement, it is still significant, knowing that the ideal memory occupancy improvement is shown to be around 20% better than the modified buddy system (Lo et al. 2001).
The gate-level design of the unit, the hardware implementation and the test results, along with area-time measurements versus the bit-vector size and the maximum allocation block size, are presented in this work. The proposed allocator unit (EMA) is compared with AMMU, which is one of the recent dynamic memory allocation techniques. The results show that, in comparison with AMMU, EMA reduces the total fragmentation by up to 22%. Furthermore, VHDL synthesis results show that the maximum clock frequency of EMA is higher than that of AMMU. Hence, EMA can perform the allocation process in a shorter time than AMMU.
International Journal of Wireless & Mobile Networks (IJWMN) Vol. 4, No. 3, June 2012
DOI : 10.5121/ijwmn.2012.4311
ONTOLOGY FOR MOBILE PHONE OPERATING
SYSTEMS
Hasni Neji and Ridha Bouallegue
Innov’COM Lab, Higher School of Communications of Tunis, Sup’Com
University of Carthage, Tunis, Tunisia.
ABSTRACT
This ongoing study addresses one part of a challenging line of research.
It is an initial investigation into the development of a Holistic Framework for Cellular
Communication (HFCC). The main purpose is to establish mechanisms by which existing wireless
cellular communication components and models can work together holistically. It demonstrates that
establishing a mathematical framework that allows existing cellular communication technologies (and
tools supporting those technologies) to interact seamlessly is technically feasible.
The longer-term goals are to improve interoperability, the efficiency of mobile
communication, call quality, and reliability by applying the framework to specific development efforts.
KEYWORDS
Interoperability, Ontology, Feature modeling, HFCC, OWL.
1. INTRODUCTION
Operating system interoperability is the ability of independent software systems (tailored to
mobile phones) to work with each other, exchange information, and initiate actions from each
other without needing to know the details of how each one works. However,
to reach this objective, the domain of study must be explicitly understood without any
ambiguities. The approach in this portion of the investigation is to analyze the structures and
architectures of a small set of individual tools: the Symbian Operating System (Symbian OS), a
mobile operating system and computing platform designed for smart phones and currently
maintained by Accenture (a global management consulting, technology services and
outsourcing company headquartered in Dublin, Republic of Ireland), and the Android
Operating System (Android OS), a Linux-based operating system for mobile devices such as
smart phones and tablet computers, developed by the Open Handset
Alliance led by Google. This includes performing a domain analysis (of this subset of
tools) and building a feature model of the domain [1]. Next, the selected main artifact attributes
will be considered in the context of the objects needed for establishing an object federation.
Using this context, the characteristics will be defined within a mobile phone operating system
Ontology [2]. The latter would serve as a pilot start-up framework that can be extended by
incorporating new mobile operating systems.
2. TOWARDS MOBILE PHONE OPERATING SYSTEM ONTOLOGY
2.1 Ontology overview:
The term “Ontology” is widely used in Knowledge Engineering and Artificial Intelligence
where what "exists" are those entities which can be "represented." It is used to refer to the
shared understanding of some domain of interest. It may be used as a unifying framework to
solve problems in that domain [2].
Ontologies capture knowledge about some domain of interest. They describe the concepts in
that domain and also the relationships that hold between those concepts.
To be effective, an Ontology must entail or embody a world view with respect to a given domain.
This world view is often conceived as a set of concepts along with their definitions and their
inter-relationships.
2.2 Methodology for building the Ontology
One way to overcome the drawbacks generated by the lack of interoperability in heterogeneous
mobile phone operating systems is to establish a unifying contextual and architectural
framework for the domain. As this contextual and architectural framework, or “Ontology”,
emerges, mobile phone operating systems will be able to communicate more efficiently.
In constructing the specific tool Ontologies (one for each mobile phone operating system), the
focus is on identifying the classes (and methods) that are needed to pass objects from one tool
to another.
Because there is currently no usable interoperability Ontology for the domain of mobile phone’s
operating systems, it was not possible to rely on previous works in designing a federation
Ontology. Instead, this Ontology had to be constructed "from scratch."
Fortunately, there were existing methodologies for designing Ontologies. The methodology
presented by [3] is one of them. It is possible to tailor this methodology to develop a specific
Ontology for constructing the federation one. The Ontology development process consists of the
following steps:
(1) identify the purpose and scope of the Ontology,
(2) perform a feature analysis for the domain of mobile phone operating systems’ tools,
(3) collect similar characteristics between different feature models, establish affinity
relationships, and group commonalities between the two tools in order to build a
federation Ontology representing these commonalities and enter this Ontology into
Protégé 4.1,
(4) construct the more detailed Ontologies for each tool in Protégé 4.1,
(5) use the Unified Modeling Language (UML) to represent the relationships between the
three Ontologies, and
(6) document the Ontologies.
1. Step 1 -- Purpose and Scope of the Ontology
In terms of purpose and scope, the federation Ontology must be broad enough to accommodate
all potential mobile phone’s operating systems, as well as being extensible in case new
Ontology terms and relationships have to be added later. The specific development tool
Ontologies must be detailed enough to account for the structures and architectures actually
forming the basis of mobile phone’s operating systems.
2. Step 2 -- Feature Modeling
The second step in the Ontology design methodology is to perform a domain analysis of mobile
operating systems by constructing and then considering the feature models of Symbian OS and
Android OS.
Feature modeling is a method used to help define software product lines and system families, to
identify and manage commonalities and variabilities between products and systems [1].
Defining a feature model for existing mobile phone operating systems provides a means to
explore, identify, and define their key architectural aspects so that these aspects can then be
described more fully in an Ontology. It is this Ontology that can be used to establish
interoperability between the existing mobile phone operating systems.
As shown in Figure 1, the feature model is defined around concepts and not around objects. The
objective is to model features of elements and structures of a domain, not just objects in that
domain. For more detail on how to construct a feature model, the reader may
consult [1], [4], or [3].
Figure 1. Feature Model of the Symbian high level concepts.
In his book, Czarnecki [1] provides a methodology for
gathering the information needed to construct a feature tree. He identifies the sources of features
as the following:
• Existing and potential stakeholders,
• Domain experts and domain literature,
• Existing systems,
• Pre-existing models (e.g., use-case models, object models…), and
• Models created during development process (i.e., features gotten during design).
Moreover, he identifies the following strategies for identifying and capturing features:
• Look for important domain terminology that implies variability.
• Use feature starter sets to start the analysis.
• Update feature models during the entire development cycle.
• Identify more features than you initially intend to implement.
In addition, according to [1], the following general steps illustrate the feature modeling process:
• Record similarities between instances (i.e. common features).
• Record differences between instances (i.e. variable features).
• Analyze feature combinations and interactions.
• Record all the additional information regarding features.
• Organize the features in hierarchical feature tree with classification (mandatory, optional,
alternative, and/or optional alternative features).
  • Mandatory. A mandatory child feature must be included in all the products in which its
    parent feature is included.
  • Optional. An optional child feature may be included in any product in which
    its parent feature appears.
  • Alternative. A set of child features is defined as alternative if exactly one of them can be
    selected when its parent feature is part of the product. As an example, a mobile phone
    may use a Symbian or an Android operating system, but not both at the same time.
  • Or-relation. A set of child features is said to have an or-relation with its parent when one or
    more of them can be included in the products in which the parent feature appears. For instance, a
    mobile phone can provide connectivity support for Bluetooth, Universal Serial Bus (USB),
    Wireless Fidelity (Wi-Fi), or any combination of the three.
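The four feature classifications above can be made concrete with a small validity check. The following Python sketch is only an illustration under stated assumptions: the `Feature` class, the `violations` function, and the toy `MobilePhone` tree are hypothetical names introduced here, not part of the feature models built in this work. It encodes mandatory and optional children, alternative groups (exactly one child), and or-groups (at least one child), and reports which rules a proposed product breaks.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Feature:
    """One node of a (hypothetical) feature tree."""
    name: str
    kind: str = "mandatory"   # "mandatory" or "optional"; only used when the parent group is "and"
    group: str = "and"        # how the children are grouped: "and", "alternative" or "or"
    children: List["Feature"] = field(default_factory=list)


def violations(feature: Feature, selected: Set[str]) -> List[str]:
    """Return the rules broken by a product, given as a set of selected feature names."""
    problems = []
    chosen = [c.name for c in feature.children if c.name in selected]
    if feature.name not in selected:
        # A child may only appear in products that also contain its parent.
        problems += [f"{n} requires parent {feature.name}" for n in chosen]
    elif feature.group == "alternative":
        if len(chosen) != 1:
            problems.append(f"{feature.name}: exactly one alternative required, got {len(chosen)}")
    elif feature.group == "or":
        if not chosen:
            problems.append(f"{feature.name}: at least one option required")
    else:
        for child in feature.children:
            if child.kind == "mandatory" and child.name not in selected:
                problems.append(f"{child.name} is mandatory under {feature.name}")
    for child in feature.children:
        problems += violations(child, selected)
    return problems


# Toy model mirroring the two examples in the text (illustrative only).
phone = Feature("MobilePhone", children=[
    Feature("OperatingSystem", group="alternative",
            children=[Feature("Symbian"), Feature("Android")]),
    Feature("Connectivity", group="or",
            children=[Feature("Bluetooth"), Feature("USB"), Feature("WiFi")]),
    Feature("Camera", kind="optional"),
])

valid_product = {"MobilePhone", "OperatingSystem", "Android",
                 "Connectivity", "Bluetooth", "WiFi"}
print(violations(phone, valid_product))                 # no rule is broken
print(violations(phone, valid_product | {"Symbian"}))   # two operating systems selected
```

Selecting both Symbian and Android breaks the alternative rule, while any non-empty combination of Bluetooth, USB and WiFi satisfies the or-relation.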
3. Step 3 – Establishing Commonalities
After producing a feature model for Symbian and Android, the next step required is to isolate
and annotate the commonalities (or the inferred commonalities) that exist between the two
feature models. These common features then formed the basis for the basic Ontology
terminology of the mobile phone’s operating systems federation. The approach in this step is to
brainstorm and reason about the two feature diagrams, develop lists of potential terms from the
feature diagrams, identify common terms between the two lists, and then construct affinity
diagrams of these common terms. Affinity diagrams are hierarchical Venn diagrams that
provide groupings of related terms. Figure 2 illustrates how an affinity diagram is constructed;
in this case the related terms all deal with "Application_Framework" as a part of the Android
Operating System architecture.
Figure 2. Construction of an Affinity Diagram.
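The step of finding common terms and grouping them can be sketched in a few lines of Python. This is a simplified illustration, not the procedure actually carried out by hand in this work: the term names, the group labels, and the helper functions `common_terms` and `affinity_groups` are all assumptions made for the example, and the mapping from terms to groups stands in for the result of the brainstorming session.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def common_terms(first: Iterable[str], second: Iterable[str]) -> Set[str]:
    """Terms that appear in both feature-term lists."""
    return set(first) & set(second)


def affinity_groups(terms: Iterable[str], group_of: Dict[str, str]) -> Dict[str, List[str]]:
    """Group related terms under a shared label; unassigned terms go to 'Ungrouped'."""
    groups = defaultdict(set)
    for term in terms:
        groups[group_of.get(term, "Ungrouped")].add(term)
    return {label: sorted(members) for label, members in sorted(groups.items())}


# Hypothetical, already normalized term lists taken from two feature diagrams.
symbian_terms = {"Kernel", "Telephony", "Bluetooth", "Window_Manager",
                 "Multimedia", "Comms_Framework"}
android_terms = {"Kernel", "Telephony", "Bluetooth", "Window_Manager",
                 "Multimedia", "Activity_Manager"}

# Hypothetical outcome of the brainstorming step: which group each term belongs to.
group_of = {
    "Kernel": "Kernel_Services",
    "Telephony": "Connectivity",
    "Bluetooth": "Connectivity",
    "Window_Manager": "Application_Framework",
}

shared = common_terms(symbian_terms, android_terms)
for label, members in affinity_groups(shared, group_of).items():
    print(label, members)
```

The resulting mapping from group label to member terms is one possible starting point for a class hierarchy, with each label as a parent class and each member as a subclass.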
The groupings of terms in the affinity diagrams then provided the basis for the hierarchy of
terms in the mobile phone’s operating systems federation Ontology. These terms and their
hierarchy were then entered and stored in Protégé 4.1 (http://protege.stanford.edu) using the Web Ontology Language (OWL).
4. Step 4 – Tool Ontologies
As the terms of each tool's Ontology were identified, they were entered into Protégé 4.1 using the
Web Ontology Language (OWL). Different Ontology languages provide different facilities; OWL fulfills
the requirements of this work. The most recent development in standard Ontology
languages is OWL from the World Wide Web Consortium (W3C). OWL makes it possible to
describe concepts, and it has a rich set of operators, e.g. intersection, union and negation. It is
based on a logical model which makes it possible for concepts to be defined as well as
described. Complex concepts can therefore be built up in definitions out of simpler concepts.
Furthermore, the logical model allows the use of a reasoner which can check whether or not all
of the statements and definitions in the Ontology are mutually consistent and can also recognize
which concepts fit under which definitions. The reasoner can therefore help to maintain the
hierarchy correctly. This is particularly useful when dealing with cases where classes can have
more than one parent.
OWL classes are interpreted as sets that contain individuals. They are described using formal
(mathematical) descriptions that state precisely the requirements for membership of the class.
One of the main services offered by a reasoner is to test whether or not one class is a subclass of
another class. By performing such tests on the classes in an Ontology, it is possible for a reasoner to
compute the inferred Ontology class hierarchy. Another standard service that is offered by
reasoners is consistency checking. Based on the description (conditions) of a class, the reasoner
can check whether or not it is possible for the class to have any instances. A class is deemed to
be inconsistent if it cannot possibly have any instances. In Protégé 4.1, the class hierarchy that is
constructed manually is called the asserted hierarchy. The class hierarchy that is automatically
computed by the reasoner is called the inferred hierarchy. Moreover, the tool automatically
classifies the Ontology (and checks for inconsistencies).
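The two reasoner services described above, subclass testing and consistency checking, can be illustrated with a deliberately simplified Python sketch. It is not how an OWL reasoner works internally: real reasoners handle full class expressions, whereas this example only follows asserted subclass links and declared disjointness. The class names and the functions `inferred_superclasses`, `is_subclass` and `inconsistent_classes` are hypothetical and introduced only for illustration.

```python
from typing import Dict, Iterable, Set, Tuple


def inferred_superclasses(asserted: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """From direct (asserted) parents, compute every ancestor of every class."""
    def collect(cls: str, seen: Set[str]) -> Set[str]:
        for parent in asserted.get(cls, ()):
            if parent not in seen:
                seen.add(parent)
                collect(parent, seen)
        return seen
    return {cls: collect(cls, set()) for cls in asserted}


def is_subclass(sub: str, sup: str, inferred: Dict[str, Set[str]]) -> bool:
    """Subclass test: every class is a subclass of itself and of all its ancestors."""
    return sub == sup or sup in inferred.get(sub, set())


def inconsistent_classes(inferred: Dict[str, Set[str]],
                         disjoint: Iterable[Tuple[str, str]]) -> Set[str]:
    """A class that falls under two disjoint classes cannot have any instances."""
    pairs = list(disjoint)
    bad = set()
    for cls, ancestors in inferred.items():
        members = ancestors | {cls}
        if any(a in members and b in members for a, b in pairs):
            bad.add(cls)
    return bad


# Asserted hierarchy (hypothetical); "Hybrid" has two parents and is deliberately contradictory.
asserted = {
    "Thing": set(),
    "OperatingSystem": {"Thing"},
    "Symbian": {"OperatingSystem"},
    "Android": {"OperatingSystem"},
    "Hybrid": {"Symbian", "Android"},
}
disjoint = [("Symbian", "Android")]

inferred = inferred_superclasses(asserted)
print(is_subclass("Hybrid", "OperatingSystem", inferred))
print(inconsistent_classes(inferred, disjoint))
```

In this toy hierarchy the class with two disjoint parents is the only one flagged, which mirrors the case mentioned earlier where classes with more than one parent benefit most from automated checking.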
For these reasons, Protégé 4.1 with OWL was
chosen in order to make the descriptions of concepts, attributes, and instances formal, so that the
knowledge is machine-readable and reasoning can be automated. Figure 3 is a screen capture of
the Protégé 4.1 OWL environment.
Figure 3. Screen shot of the Protégé 4.1 tool.
5. Step 5 - UML Representation of the Domain
The relationships between all three Ontologies are identified and annotated. The reason for this
is to formulate inter-relationships between the Ontologies.
Like an Ontology, a well-defined class diagram, part of UML, can describe concepts and
relationships in a certain domain.
Both approaches have much to offer and can work together most effectively in an integrated
environment. Figure 4 illustrates the general UML structure that was used to annotate how all
three Ontologies were related.
Figure 4. Ontology Inter-relationship.
6. Step 6 -- Documentation
The three developed Ontologies are self-documenting and are stored in Protégé 4.1 project files.
Such documentation makes it possible for future researchers to modify the Ontologies or add
additional tools to them. As a final note about the methodology used to develop these initial
mobile phone’s operating systems Ontologies, a modified version of this methodology should
be used when adding additional tools to the HFCC federation. The modified methodology
includes the following:
• Confirm that the purpose is still valid; expand the scope to include the new tool
Ontologies. Remove invalid ones from the framework, since they are no longer needed.
• Only perform feature modeling of the new tools so that needed constructs for the
federation Ontology are identified. Since the federation Ontology is already established,
it is only necessary to extend and modify it, not re-build it entirely.
• Modify the federation Ontology to account for the new/modified Ontology terms.
• Modify each UML relationship diagram as needed to account for the new tool
Ontologies and the changes to the federation Ontology.
3. ONTOLOGY DESIGN
3.1 Domain analysis and feature models
A domain analysis of the mobile operating system’s domain was undertaken. The analysis was
accomplished by examining two specific mobile phone’s operating systems, building feature
models of those tools, and then identifying key terminology of the feature models. There are
two reasons why this domain analysis cannot be considered to be a complete analysis of the
domain of mobile phone’s operating systems. First, only two tools (out of many possibilities)
were analyzed. Secondly, a domain analysis is not an “additive” activity; simply analyzing
additional tools (beyond those two) by themselves does not completely add to the overall
analysis. The ways in which the new additions affect and change the previously established
analysis must also be considered. Therefore, the limited domain analysis conducted as part of
this research can be considered a necessary, but not sufficient, analysis towards establishing a
unified framework for mobile phone’s operating systems.
3.1.1 Symbian Operating System
The first tool analyzed in the domain analysis was Symbian, a mobile operating system and
computing platform designed for smart phones. The feature model was developed by identifying
software features from the Symbian OS Architecture Source book [5] and by actual day-to-day
use of the tool. Figure 5 illustrates a single excerpt from the overall feature model for Symbian.
This excerpt illustrates the features associated with the architecture of Symbian operating
system services, which is just a small portion of the total architecture. Each of these features then
becomes a candidate for possible inclusion in the federation Ontology. The complete list of
Symbian features, generated by the Protégé 4.1 OWL tool, is shown below in Figure 6.
Figure 5. Operating System Services as an excerpt from the overall Symbian Feature Model.
Figure 6. List of Symbian Operating System features.
This feature list is taken directly from the Symbian feature model. The features aligned at the first
level at the left of the figure are high-level "parent" features, while those at the second, third, and
deeper levels represent more detailed "atomic" features.
Figure 7. Symbian visualization ontology generated by the Protégé 4.1 tool.
Figure 7 is taken from the Ontology editor tool. It gives an overview of the hierarchy of
Ontological concepts of the Symbian operating system. Meanwhile, Figure 8 presents the Ontological
graph generated automatically from the OWL Ontology as layers of classes in a top-down arrangement.
Figure 8. Ontological graph automated by OWL.
3.1.2 Android OS
The second tool analyzed during the domain analysis was the Android OS. It targets low-powered,
battery-operated devices that are equipped with plenty of hardware, such as cameras, light and
orientation sensors, Wi-Fi and Universal Mobile Telecommunications System (UMTS, 3G telephony)
connectivity, and a touch-screen. Unlike on other mobile operating systems, Android applications are written
in Java and run in virtual machines.
The Android OS feature model was developed by identifying essential features from the
Android OS user guide [6], the report Analysis of the Android Architecture [7], and by actual day-to-day
use of the suite. Figure 9 below illustrates an excerpt (a condensed feature diagram) from the
overall feature model for Android OS.
Figure 9. Condensed Android OS feature diagram.
From the complete feature model of Android OS, it was possible to extract relevant features
(with their descriptions). Each of these features then becomes a candidate for possible inclusion
in the federation Ontology. The complete list of Android OS features is shown below in Figure
10.
This feature list is taken directly from the Android OS feature model and was generated by
Protégé 4.1 as OWL Ontology classes. As in the case of the Symbian features list, the features
aligned at the first level at the left of the figure are high-level "parent" features, while those at the
second, third, and deeper levels represent more detailed "atomic" features.
Figure 10. List of Android features generated by the Protégé 4.1.
Figure 11. Android visualization Ontology generated by the Protégé 4.1 tool.
Figure 11 is taken from the Ontology editor tool. It gives an overview of the hierarchy of
Ontological concepts of the Android operating system. Meanwhile, Figure 12 presents the Ontological
graph generated automatically from the OWL Ontology as layers of classes in a top-down arrangement.
Figure 12. Android Ontological graph automated by OWL.
3.2 Federation Ontology
After performing a domain analysis using an in-depth investigation of Symbian OS and Android
OS, the two lists were considered together to identify commonalities, which
would also likely be shared with other mobile phone operating systems. These
commonalities begin to form the list of terms that eventually will make up the federation
Ontology.
After identifying these common terms in the domain of mobile phone’s operating systems, the
words were organized into logical groupings using an "affinity diagram" technique (recall
Figure 2). From these affinity diagrams, it was then straight-forward to establish the hierarchical
structure of the federation Ontology. The completed federation Ontology classes are shown
below in Figure 13.
Figure 13. Federation Ontology classes.
3.3 Ontology Inter-relationships
The relationships between the three Ontologies were identified and annotated. This was done
using UML. Both a top down and bottom up approach were taken to identify the relationships
between the three Ontologies and record those relationships in static class diagrams.
Figure 14 shows one example of how the three Ontologies are related.
Figure 14. UML representation of the relationships between the three Ontologies.
4. CONTRIBUTIONS
While the most important original contribution to the field of wireless cellular communication is
to build a mobile phone’s operating system Ontology, there are several other contributions:
1. The application of the HFCC to a sample set of mobile phone’s operating system
(i.e., Symbian and Android) increases the interoperability between the selected set.
2. The methodology was adapted from other sources (notably [3]), but was tailored for
identifying and capturing the unique characteristics of mobile phone’s operating
systems. This methodology can be used to add other mobile phone’s operating systems
to the tool Ontology. The methodology is as important as the Ontology itself.
3. Three separate Ontologies (and their inter-relationships) are presented: a high-level
mobile phone’s operating system Ontology, an Ontology that describes the Common
Object Model architecture of Symbian, and an Ontology that describes important (from
an interoperability viewpoint) classes from Android OS architecture.
4. The mobile operating system Ontology is a modular Ontology. This division into
modules has two major advantages: firstly, it facilitates the future introduction of new
domain Ontologies, and secondly, it makes the domain (mobile operating system)
Ontology more reusable.
5. The main purpose for developing the Ontology is to overcome some of the obstacles (such
as limited interoperability, lack of communication, and poor shared
understanding).
6. Feature models are used as a key asset to manage the commonalities and the
variabilities of the mobile phone’s operating systems.
5. CONCLUSION
This article presented the methodology of a research effort devoted to establishing the set of
mobile phone’s operating systems Ontologies for integration into the HFCC. It summarized the
results of the domain analysis undertaken to produce the federation Ontology. In addition, it
presented the details of the federation Ontology as well as the two specific tool Ontologies.
Finally, the article presented the results of how the three Ontologies inter-relate by using UML to
annotate the inter-relationships.
While this methodology and strategy were useful in constructing the feature trees for Symbian
and Android, they only provide a guide for the actual work.
During this Ontology construction process, [8]'s guidelines for Ontology construction were
adhered to as much as possible: clarity, coherence, extensibility, minimal Ontological
commitment, and minimal encoding bias. However, because it was necessary to adhere closely
to the actual class constructs of the tools themselves, it was often not possible to satisfy each of
these guidelines.
The mobile phone’s operating systems federation Ontology is left open for future
enhancement and extension.
REFERENCES
[1] Czarnecki, K. and Eisenecker, U., Generative Programming: Methods, Tools, and Applications,
Addison-Wesley, 2000.
[2] Uschold, M. and Gruninger, M., "Ontologies: Principles, Methods and Applications,"
Knowledge Engineering Review, Vol. 11, No. 2, June 1996.
[3] Hasni, Neji, "Towards an interoperability Ontology for software development tools," Master’s
Thesis, Computer Science Department, Naval Postgraduate School, Monterey, CA, March 2003.
[4] Geyer, L., "Feature Modeling Using Design Spaces," Proceedings of the 1st German Workshop on
Product Line Software Engineering, Kaiserslautern, Germany, November 2000.
[5] Morris, B., The Symbian OS Architecture Sourcebook: Design and Evolution of a Mobile Phone
OS, John Wiley & Sons, Ltd, 2007.
[6] Google, "Android 3.0 User’s Guide," February 23, 2011.
[7] Brahler, S., "Analysis of the Android Architecture," Karlsruhe Institute of Technology,
October 2010.
[8] Gruber, T. R., "Toward Principles for the Design of Ontologies Used for
Knowledge Sharing," International Journal of Human-Computer Studies, Vol. 43, 1995, pp.
907-928.
Ontology for Mobile Phone Operating Systems
Abstract
This study addresses one part of a challenging line of research. It is an initial investigation into the development of a Holistic Framework for Cellular Communication (HFCC). The main purpose is to establish mechanisms by which existing wireless cellular communication components and models can work together holistically. It demonstrates that establishing a mathematical framework that allows existing cellular communication technologies (and the tools supporting those technologies) to interact seamlessly is technically feasible. The longer-term goals are to improve interoperability, the efficiency of mobile communication, call quality, and reliability by applying the framework to specific development efforts.
1. Introduction
Operating system interoperability is the ability of independent software systems (tailored to mobile phones) to work with each other, exchange information, and initiate actions from each other without needing to know the details of how each one works. However, to reach this objective, the domain of study must be explicitly understood without ambiguity. The approach in this portion of the investigation is to analyze the structures and architectures of a small set of individual tools: the Symbian Operating System (Symbian OS), a mobile operating system and computing platform designed for smart phones and currently maintained by Accenture (a global management consulting, technology services and outsourcing company headquartered in Dublin, Republic of Ireland), and the Android Operating System (Android OS), a Linux-based operating system for mobile devices such as smart phones and tablet computers, developed by the Open Handset Alliance led by Google.
2. Towards a Mobile Phone Operating System Ontology
2.1 Ontology overview:
The term "Ontology" is widely used in Knowledge Engineering and Artificial Intelligence, where what "exists" are those entities which can be "represented." It is used to refer to the shared understanding of some domain of interest. It may be used as a unifying framework to solve problems in that domain.
2.2 Methodology for building the Ontology
One way to overcome the drawbacks generated by the lack of interoperability in heterogeneous mobile phone operating systems is to establish a unifying contextual and architectural framework for the domain. As this contextual and architectural framework, or "Ontology", emerges, mobile phone operating systems will be able to communicate more efficiently.
Fortunately, there are existing methodologies for designing Ontologies. The methodology presented by Neji Hasni is one of them. It is possible to tailor this methodology to develop a specific Ontology for constructing the federation Ontology. The Ontology development process consists of the following steps:
(1) identify the purpose and scope of the Ontology,
(2) perform a feature analysis for the domain of mobile phone operating systems' tools,
(3) collect similar characteristics between different feature models, establish affinity relationships, and group commonalities between the two tools in order to build a federation Ontology representing these commonalities and enter this Ontology into Protégé 4.1,
(4) construct the more detailed Ontologies for each tool in Protégé 4.1,
(5) use the Unified Modeling Language (UML) to represent the relationships between the three Ontologies, and
(6) document the Ontologies.
3. Ontology Design
3.1 Domain analysis and feature models
A domain analysis of the mobile operating system domain was undertaken. The analysis was accomplished by examining two specific mobile phone operating systems, building feature models of those tools, and then identifying key terminology of the feature models. There are two reasons why this domain analysis cannot be considered a complete analysis of the domain of mobile phone operating systems. First, only two tools (out of many possibilities) were analyzed. Second, a domain analysis is not an "additive" activity; simply analyzing additional tools (beyond those two) by themselves does not completely add to the overall analysis. The ways in which the new additions affect and change the previously established analysis must also be considered. Therefore, the limited domain analysis conducted as part of this research can be considered a necessary, but not sufficient, analysis towards establishing a unified framework for mobile phone operating systems.
3.1.1 Symbian Operating System
The first tool analyzed in the domain analysis was Symbian, a mobile operating system and computing platform designed for smart phones. The feature model was developed by identifying software features from the Symbian OS Architecture Sourcebook and by actual day-to-day use of the tool. Figure 5 illustrates a single excerpt from the overall feature model for Symbian. This excerpt illustrates the features associated with the architecture of Symbian operating system services, which is just a small portion of the total architecture. Each of these features then becomes a candidate for possible inclusion in the federation Ontology. The complete list of Symbian features, generated by the Protégé 4.1 OWL tool, is shown in Figure 6.
3.1.2 Android Operating System
The second tool analyzed during the domain analysis was the Android OS. It targets low-powered, battery-operated devices that are equipped with plenty of hardware, such as cameras, light and orientation sensors, Wi-Fi and Universal Mobile Telecommunications System (UMTS, 3G telephony) connectivity, and a touch-screen. Unlike on other mobile operating systems, Android applications are written in Java and run in virtual machines.
3.2 Federation Ontology
After performing a domain analysis using an in-depth investigation of Symbian OS and Android OS, the two lists were considered together to identify commonalities, which would also likely be shared with other mobile phone operating systems. These commonalities begin to form the list of terms that eventually will make up the federation Ontology.
3.3 Ontology Inter-relationships
The relationships between the three Ontologies were identified and annotated. This was done using UML. Both a top-down and a bottom-up approach were taken to identify the relationships between the three Ontologies and record those relationships in static class diagrams.
4. Contributions
While the most important original contribution to the field of wireless mobile communication is the construction of this mobile phone operating system ontology, there are several other contributions:
1. The application of HFCC to a sample set of mobile phone operating systems (namely, Symbian and Android) improves interoperability among the selected set.
2. The methodology was adapted from other sources, but it was designed to identify and capture the unique characteristics of mobile phone operating systems. It can be used to add further mobile phone operating systems to the tool ontologies. The methodology is as important as the ontology itself.
3. Three separate ontologies (and the relationships among them) are presented: a high-level mobile phone operating system ontology, an ontology describing the Common Object Model architecture of Symbian, and an ontology describing the classes of the Android OS architecture that are important from an interoperability standpoint.
4. The mobile operating system ontology is modular. This division into modules has two main advantages: first, it facilitates the future introduction of new domain ontologies; second, it makes the domain (mobile operating system) ontology more reusable.
5. The main purpose of developing the ontology is to overcome several obstacles, such as limited interoperability, lack of communication, and poor shared understanding.
6. Feature models are used as the primary asset for managing the commonalities and variabilities of mobile phone operating systems.
5. Conclusion
This article presented the methodology of a research effort aimed at building a set of mobile phone operating system ontologies for integration into HFCC. It summarized the results of the domain analysis performed to produce the federated ontology. In addition, it presented the details of the federated ontology as well as the two tool-specific ontologies. Finally, it showed how the three ontologies are inter-related, using UML to annotate the inter-relationships.