Basic Competency Exam: Introduction to Computer Organization


2010 International Journal of Computer Applications (0975-8887), Volume 1, No. 12

Performance Assessment using Text Mining

Radha Shankarmani
Asst. Prof., SPIT, Sardar Patel Institute of Technology
Munshi Nagar, Andheri (W), Mumbai - 400 058

Nikhil Kedar
Student, SPIT
903, Sai Darshan, Versova, Andheri (W), Mumbai - 400 058

Khandelwal
Student, SPIT
B-401, Mahesh Tower, Sector-2 Charkop, Kavdivli (W), Mumbai - 400 067

ABSTRACT

In this paper we use text mining, a feature of Web Intelligence, to derive information from unstructured textual data on the web and devise a consensus-based strategy for business decisions. This has a twofold advantage: first, it mitigates risk early; second, it supports our understanding and decision making. The concept is explained with the example of evaluating a player's performance based on minute-to-minute commentary of a match. Parameters such as the player's position on the field (for example, in football: defender, midfielder, forward, or goalkeeper), past performance, present fitness and form, and other such parameters are considered. A weightage/value for each parameter is decided, and information can be derived for analyzing a player's performance. During analysis we view the comments and read through fan forums, blogs, newspaper reviews of the play, expert commentator views, etc. These are then used as a correction factor to enhance the credibility of the model.

Existing search engines have many remarkable capabilities, but what is not among them is deduction capability: the capability to synthesize an answer to a query from bodies of information residing in various parts of the World Wide Web. Web Intelligence is an area of research which attempts to provide this capability. The whole procedure involves four main stages: web crawling, i.e. identifying information resources; information retrieval and extraction; text mining; and finally converting unstructured data to structured data.

Keywords
Web Intelligence, NLP, Text Mining, Information Extraction, Information Retrieval, GATE.

1. IDENTIFYING INFORMATION RESOURCES

The first phase is to gather information. Information cannot be gathered from the entire web due to its vastness, so to reduce search time, static sources (web sites) are chosen for extracting information. The authenticity, correctness, and up-to-dateness of such sources are very important for information retrieval. Apart from these static sources, there may be sources that provide additional information but are unknown to the user. A web crawler (also known as a web spider) is a program or automated script that browses the World Wide Web to find such sources.
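As a rough illustration of the crawling idea (this sketch is ours, not part of the original project; the seed URL, page limit, and reliance on Python's standard library are all assumptions), a minimal breadth-first crawler can be written as:

    from html.parser import HTMLParser
    from urllib.parse import urljoin
    from urllib.request import urlopen

    class LinkParser(HTMLParser):
        """Collects the href targets of all <a> tags on a page."""
        def __init__(self):
            super().__init__()
            self.links = []

        def handle_starttag(self, tag, attrs):
            if tag == "a":
                for name, value in attrs:
                    if name == "href" and value:
                        self.links.append(value)

    def crawl(seed, max_pages=20):
        """Visit pages breadth-first from a seed URL, returning the URLs seen."""
        seen, frontier = set(), [seed]
        while frontier and len(seen) < max_pages:
            url = frontier.pop(0)
            if url in seen:
                continue
            seen.add(url)
            try:
                html = urlopen(url, timeout=5).read().decode("utf-8", "replace")
            except (OSError, ValueError):
                continue  # skip unreachable pages and non-HTTP links
            parser = LinkParser()
            parser.feed(html)
            # resolve relative links against the current page
            frontier.extend(urljoin(url, link) for link in parser.links)
        return seen

A production crawler would additionally respect robots.txt, throttle its requests, and restrict itself to the identified static sources.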

Static sites that give the performance of all the players and their rankings/values are identified, and the relevant information is retrieved. In addition to performance, information regarding the popularity or fan following of a player can be judged from blog sites.

2. INFORMATION RETRIEVAL

An information retrieval system identifies the documents and pages in a collection that match a user query, allowing us to narrow down the set of documents relevant to the particular problem. In our example, the analysis of minute-to-minute commentary, news flows, and blogs will be done using text mining.

As text mining involves applying very computationally intensive algorithms to large document collections, it can be limited to the documents returned by information retrieval. Information retrieval can therefore speed up analysis considerably by reducing the number of documents to be analyzed. As most of the resources used to gather information are static, not all the information in the form of web pages is required for further analysis.

In the case study considered, information is derived from sites where a minute-by-minute description of football matches is available. The project intends to evaluate players based on expert opinions combined with reviews from common people obtained from blogs. IR systems allow us to retrieve such documents, which eases the process of text mining. These documents are then fed to the information extraction systems, which are the systems used for text mining.
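To make the narrowing step concrete, here is a small sketch (our own illustration with an invented overlap score, not a description of any particular IR system) of filtering a document collection down to those relevant to a query:

    def relevance(doc_tokens, query_tokens):
        """Crude overlap score: fraction of query terms found in the document."""
        doc = set(doc_tokens)
        return sum(1 for t in query_tokens if t in doc) / max(len(query_tokens), 1)

    def retrieve(docs, query, threshold=0.5):
        """Keep only documents that mention at least half of the query terms."""
        q = query.lower().split()
        return [d for d in docs if relevance(d.lower().split(), q) >= threshold]

    # retrieve(commentary_pages, "goal penalty save")  # hypothetical usage

Real IR systems use inverted indexes and weighting schemes such as TF-IDF, but the effect is the same: only a small, relevant subset of pages reaches the expensive text mining stage.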

3. INFORMATION EXTRACTION

After shortlisting the required web pages, text mining is applied. The following three phases - information extraction, text mining, and converting unstructured data to structured data - are closely related to each other.

Information extraction is the process of automatically obtaining structured data from unstructured natural-language documents. Often this involves defining the general form of the information we are interested in as one or more templates, which are then used to guide the extraction process. Information extraction relies heavily on the data generated by NLP systems.

The role of natural language processing in text mining is to provide linguistic data to the next phase. This is done by annotating documents with information such as sentence boundaries, parts of speech, and parsing results. NLP may be deep (parsing every part of every sentence and attempting to account semantically for every part) or shallow (parsing only certain passages or phrases within sentences, or producing only a limited semantic analysis), and it may even use statistical means to disambiguate word senses or multiple parses of the same sentence.

Tasks that information extraction systems can perform include:

A) Term analysis: analysis of one or more words and multi-word terms, in documents such as papers, PDFs, etc.

B) Named entity recognition: identification of names, dates, expressions of time, quantities, and associated percentages and units.

C) Fact extraction: relationships between entities or events.

The data generated during the information extraction phase is structured information derived from unstructured textual data. Information extraction gives information in the form of relationships: the analysis of words and phrases gives us information about an entity and the relationship of that entity with other entities. In our case this information is used to design a database which can serve as a useful tool to evaluate a player's performance. The database contains attributes such as goals scored, red cards, assists, etc., for allocating points.
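A minimal sketch of such a points database (the schema, table name, and point weight below are our assumptions for illustration; the paper does not specify them) could look like this in Python with SQLite:

    import sqlite3

    conn = sqlite3.connect("players.db")
    conn.execute("""CREATE TABLE IF NOT EXISTS performance (
                        player    TEXT PRIMARY KEY,
                        goals     INTEGER DEFAULT 0,
                        assists   INTEGER DEFAULT 0,
                        red_cards INTEGER DEFAULT 0,
                        points    INTEGER DEFAULT 0)""")

    def record_goal(player, weight=5):
        """Credit one goal to the player and add its weighted points."""
        conn.execute("""INSERT INTO performance (player, goals, points)
                        VALUES (?, 1, ?)
                        ON CONFLICT(player) DO UPDATE SET
                            goals  = goals + 1,
                            points = points + ?""", (player, weight, weight))
        conn.commit()

Each extracted event (goal, card, assist) would map to one such update, so the table always reflects the cumulative points of every player.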

Text mining using NLP is one approach that can be used; other approaches, such as the Semantic Web and OWL, can also provide a solution.

In the Semantic Web approach we search for keywords; each keyword is considered a class if it can be further described (i.e. it has attributes), and an attribute if it has no further description. For example, a player's name can be a class, and the number of goals scored can form an attribute.

In OWL, we create a data dictionary which has all the possible replacements that can be used for a particular attribute, e.g. a player could be referred to by his jersey number or by his nickname. The information extracted using NLP also contains similar replacements for an attribute; in this case we can create our own data dictionary to use as a reference.

The contents of the tables are updated at the end of every match. This structured data (e.g. the number of goals scored by a particular player) is used to determine performance. To get information about a player's performance in a particular match, we make use of the textual data available as minute-to-minute commentary on the web. To get information about a player's popularity, we can make use of fan forums and blogs; considering the performance of a player in a recent match, the blogs can give the people's verdict. Natural language processing can then be used to process this information, and analysis of this information can be used in performance evaluation.

4. IMPLEMENTATION

In our project, the process of text mining is implemented using an open-source toolkit, GATE (General Architecture for Text Engineering). The IE system in GATE is called ANNIE, i.e. A Nearly-New Information Extraction System (developed by Hamish Cunningham, Valentin Tablan, Diana Maynard, Kalina Bontcheva, Marin Dimitrov, and others).

The functioning of ANNIE relies on finite state algorithms and the JAPE (Java Annotation Patterns Engine) language. ANNIE has two parts: language resources and processing resources. Language resources consist of GATE documents and corpora. The source of a GATE document can be specified either by giving the path of a locally stored file on the hard disk or by specifying the URL of a web page. Document formats supported by GATE are XML, HTML, SGML, plain text, RTF, email, PDF, and Microsoft Word. A corpus in GATE is a Java Set whose members are documents; ANNIE can run only on a corpus.

Processing resources are responsible for performing the text mining on the corpus. Many plugins are available that provide various functionalities; the important ones applicable in our project are the Tokeniser, the Gazetteer, and the JAPE Transducer.

The Tokeniser splits the text into simple tokens. There are five types of token: word, number, symbol, punctuation, and space token. The aim is to limit the work of the Tokeniser in order to maximize efficiency, and to enable greater flexibility by placing the burden on the grammar rules (JAPE), which are more adaptable.
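The following sketch mimics this kind of tokenisation in Python (the regular expressions are our own simplification; GATE's actual Tokeniser is rule-driven and more refined):

    import re

    # token classes loosely mirroring the five GATE token types
    TOKEN_RE = re.compile(r"""(?P<word>[A-Za-z]+)
                             |(?P<number>\d+)
                             |(?P<spacetoken>\s+)
                             |(?P<punctuation>[.,;:!?])
                             |(?P<symbol>\S)""", re.VERBOSE)

    def tokenise(text):
        """Yield (token_type, token_text) pairs for each simple token."""
        for m in TOKEN_RE.finditer(text):
            yield m.lastgroup, m.group()

    # list(tokenise("Goal in minute 42!")) yields word, spacetoken, number,
    # and punctuation tokens in document order.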

The gazetteer lists used are plain text files, with one entry per line. An index file (lists.def) is used to access these lists; for each list, a major type is specified and, optionally, a minor type. These lists are compiled into finite state machines. Any text tokens matched by these machines will be annotated with features specifying the major and minor types. Grammar rules (JAPE) then specify the types to be identified in particular circumstances. Each gazetteer list should reside in the same directory as the index file.

JAPE provides finite state transduction over annotations based on regular expressions. A JAPE grammar consists of a set of phases, each of which consists of a set of pattern/action rules. The phases run sequentially and constitute a cascade of finite state transducers over annotations. The left-hand side (LHS) of a rule consists of an annotation pattern that may contain regular-expression operators (e.g. *, ?, +). The right-hand side (RHS) consists of annotation manipulation statements. Annotations matched on the LHS of a rule may be referred to on the RHS by means of labels attached to the pattern elements. The RHS of a rule contains information about the annotation: information about the matched annotation is carried over from the LHS using the label just described and annotated with the entity type (which follows it); finally, attributes and their corresponding values are added to the annotation. Alternatively, the RHS of a rule can contain Java code to create or manipulate annotations. JAPE grammars are written as files with the extension .jape, which are parsed and compiled at run time to be executed over the GATE document(s).

In our project we have created gazetteer lists consisting of the names of players, organized by the club to which they belong and by their position. The Tokeniser is used to annotate the entries in the gazetteer lists. The JAPE rules we wrote use these annotations on the LHS to identify the player, and the Java code in the RHS assigns points to that player. Different rules are written to identify various actions such as a goal scored, a red/yellow card shown, a save made, a penalty missed, a foul committed, a clearance made, and so on (a plain-Python sketch of this pipeline is given after the figure captions below). The following two figures describe the implementation.

Figure 3. A screenshot of GATE, showing the various language resources and processing resources in GATE, with ANNIE loaded to run on the corpus.

Figure 4. A screenshot of the database, showing the allocation of points to a particular player based on his performance in the match.
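The GATE/JAPE machinery itself cannot be reproduced in a few lines, but the overall effect of our gazetteer-plus-rules pipeline can be sketched in plain Python (the player names, patterns, and point weights below are invented for illustration and are not the project's actual rules):

    import re

    # stand-in for a gazetteer list: player name -> club
    GAZETTEER = {"PlayerA": "ClubX", "PlayerB": "ClubY"}

    # event patterns and the points each event is worth
    RULES = [
        (re.compile(r"goal by (\w+)"), 5),
        (re.compile(r"yellow card for (\w+)"), -2),
        (re.compile(r"save by (\w+)"), 3),
    ]

    def score_commentary(lines, points):
        """Apply every rule to every commentary line; credit known players."""
        for line in lines:
            for pattern, value in RULES:
                for name in pattern.findall(line):
                    if name in GAZETTEER:  # only gazetteer entries are annotated
                        points[name] = points.get(name, 0) + value
        return points

    # score_commentary(["goal by PlayerA", "yellow card for PlayerB"], {})
    #   -> {'PlayerA': 5, 'PlayerB': -2}

In the real implementation, the matching is done by JAPE's finite state transducers over annotations, and the point assignment by Java code on the RHS of each rule.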

5. OTHER APPLICATIONS

In the above example, the information derived was used to provide a performance analysis of players in the form of points, i.e. structured data. But crawlers along with text mining can support a number of applications; the only things that change are the information resources and the analysis of the information.

To track vote counts during presidential elections, or to track news about the stock market and the performance of firms, the sites to crawl will be different (e.g. news channels' websites, stock exchange websites), and the analysis, instead of a database, can be a dynamic application showing textual updates. One can get information regarding the quarterly reports of a company, or about the status of a particular stock and expert opinions on that stock, from various websites/news channels, all on a single screen. Such an analysis can help a prospective trader decide which shares to buy, when to buy, and when to sell.

Other applications include analyzing reviews and the performance of different products through a comparative study, to identify the best product based on the individual needs of a customer. For example, if a customer intends to buy a digital camera, he would want a comparative report across the various companies to decide the best product for himself. For this, the websites of companies like SONY, NIKON, etc., as well as those of retail outlets, need to be crawled.


6. CONCLUSION

The goal of web intelligence is to retrieve information about the customer decision process, customer needs, and customer behavior. Retrieving this information gives marketing intelligence the opportunity to improve its predictive models and to create a serious customer view. In this article, we used the example of a football league to explain web intelligence using text mining techniques for player assessment.

7. ACKNOWLEDGEMENT

We are very grateful and indebted to our project guide, Prof. Radha Shankarmani, for her enduring patience, guidance, and invaluable suggestions. She was a constant source of inspiration for us and took the utmost interest in our project. We would also like to thank all the IT staff members for their invaluable co-operation. We are also thankful to all the students for their useful advice and immense co-operation. Lastly, we would like to convey our regards to the developers of GATE, as well as its other worldwide users, for their constant support and guidance through the online mailing lists.

8. REFERENCES

[1] GATE, www.gate.ac.uk. General Architecture for Text Engineering (GATE) is a Java software toolkit developed at the University of Sheffield since 1995.

[2] Web Intelligence: kis.maebashi-it.ac.jp/wi01/, www.web-intelligence.com/

[3] Muslea, I. (Ed.). (2004). Papers from the AAAI-2004 Workshop on Adaptive Text Extraction and Mining (ATEM-2004), San Jose, CA. AAAI Press.

[4] Weiguo Fan, et al., Tapping the Power of Text Mining, Communications of the ACM, 49(9), 2006.

[5] Tan, A.-H. (1999), Text Mining: The State of the Art and the Challenges, in Proceedings, PAKDD'99 Workshop on Knowledge Discovery from Advanced Databases, Beijing, April 1999.

[6] Intelligence on the Web: www.fas.org/irp/intelwww.html; WIN: Web Intelligence Network, smarter.net/

[7] J. Srivastava et al., Web Usage Mining: Discovery and Applications of Usage Patterns from Web Data, SIGKDD Explorations, vol. 1, no. 2, 2000, pp. 12-23.

[8] Fayyad, U., Piatetsky-Shapiro, G. & Smyth, P. (1996), From Data Mining to Knowledge Discovery: An Overview, in U. Fayyad et al. (eds.), Advances in Knowledge Discovery and Data Mining, MIT Press, Cambridge, Mass.

[9] IEEE 2000. IEEE Standard for Modeling and Simulation (M&S) High Level Architecture (HLA) Federate Interface Specification. IEEE Std 1516.1-2000. IEEE Computer Society, New York, NY.

Existing search engines have remarkable capabilities, but what they lack is deduction capability: the ability to synthesize an answer to a query from bodies of information residing in various parts of the World Wide Web. Web intelligence is a research area that attempts to provide this capability.

This paper uses text mining, a feature of web intelligence, to derive information from unstructured textual data on the web and to devise a consensus-based strategy for business decisions. This yields two advantages: first, it mitigates risk early; second, it supports understanding and decision making. The concept is explained with the example of evaluating a player's performance based on minute-to-minute match commentary. Parameters such as the player's position on the field (for example, in football: defenders, midfielders, and goalkeeper), previous performance, current fitness, and other parameters are considered. A value for each parameter is determined, and information can be drawn to analyze the player's performance. During analysis we look at comments and read fan forums, blogs, newspaper reviews of the play, expert commentators' views, etc. These are used as a correction factor to improve the credibility of the model.

The whole procedure involves four main stages: web crawling, i.e. identifying information sources; information retrieval and extraction; text mining; and finally converting unstructured data into structured data.

The first stage is gathering information. Information cannot be collected from the entire web because of its vastness. To reduce search time, static sources (websites) are chosen for extracting information. The authenticity, correctness, and currency of such sources are very important for information retrieval. Apart from these static sources, there may be sources that can provide additional information but are unknown to the user. A web crawler (also known as a web spider) is a program or automated script that browses the World Wide Web to find such sources.

An information retrieval system identifies the documents and pages in a collection that match a user's query. Information retrieval systems allow us to narrow down the set of documents relevant to a particular problem.

After shortlisting, text mining is applied to the web pages. The following three phases - information extraction, text mining, and converting unstructured data into structured data - are closely related to one another.

In the example above, the information obtained was used to provide a performance analysis of players in the form of points, i.e. structured data. But crawlers together with text mining can support a number of applications; the only things that change are the information sources and the analysis process.

The goal of web intelligence is to retrieve information about the customer decision process, customer needs, and customer behavior. Retrieving this information gives marketing intelligence the opportunity to improve its predictive models and to create a serious customer view. This article used the example of a football league to explain web intelligence using text mining techniques for player assessment.

National Conference on Innovative Paradigms in Engineering & Technology (NCIPET-2012)
Proceedings published by International Journal of Computer Applications (IJCA)

Multicore Processing for Classification and Clustering Algorithms

V. Vaitheeshwaran, School of Computing, KL University, Vijayawada, India
Kapil Kumar Nagwanshi, School of Computing, KL University, Vijayawada, India
T. V. Rao, School of Computing, KL University, Vijayawada, India

ABSTRACT

Data mining algorithms such as classification and clustering are the future of computation, though they require multidimensional data processing. People are using multicore processors with GPUs, yet most programming languages don't provide multiprocessing facilities, and processing resources are therefore wasted. Clustering and classification algorithms are especially resource-consuming. In this paper we show strategies to overcome such deficiencies using the multicore processing platform OpenCL.

Keywords
Parallel Processing, Clustering, Classification, OpenCL, CUDA, NVIDIA, AMD, GPU.

1. INTRODUCTION

CLUSTERING is an unsupervised learning technique that separates data items into a number of groups, such that items in the same cluster are more similar to each other and items in different clusters tend to be dissimilar, according to some measure of similarity or proximity. Pizzuti and Talia [1] present P-AutoClass, a scalable parallel clustering technique for mining large data sets. Data clustering is an important task in the area of data mining. Clustering is the unsupervised classification of data items into homogeneous groups called clusters. Clustering methods partition a set of data items into clusters, such that items in the same cluster are more similar to each other than items in different clusters according to some defined criteria. Clustering algorithms are computationally intensive, particularly when used to analyze large amounts of data. A possible approach to reducing the processing time is the implementation of clustering algorithms on scalable parallel computers. Their paper describes the design and implementation of P-AutoClass, a parallel version of the AutoClass system based upon the Bayesian model for determining optimal classes in large data sets. The P-AutoClass implementation divides the clustering task among the processors of a multicomputer so that each processor works on its own partition and exchanges intermediate results with the other processors. The system architecture, its implementation, and experimental performance results for different processor counts and data sets are presented and compared with theoretical performance. In particular, the experimental and predicted scalability and efficiency of P-AutoClass versus the sequential AutoClass system are evaluated and compared.

Unlike supervised learning, where training examples are associated with a class label that expresses the membership of every example in a class, clustering assumes no information about the distribution of the objects; it has the task of both discovering the classes present in the data set and assigning objects among those classes in the best way. A large number of clustering methods have been developed in several different fields, with different definitions of clusters and of similarity among objects. The variety of clustering techniques is reflected by the variety of terms used for cluster analysis, such as clumping, competitive learning, unsupervised pattern recognition, vector quantization, partitioning, and winner-take-all learning.

Most of the early cluster analysis algorithms come from the area of statistics and were originally designed for relatively small data sets. Fayyad et al. [2] found that clustering algorithms have been extended to work efficiently for knowledge discovery in large databases and, therefore, to classify large data sets with high-dimensional feature items. Clustering algorithms are very computationally demanding and thus require high-performance machines to get results in a reasonable amount of time. Hunter and States [3] describe classification on protein databases, where clustering runs taking one week or about 20 days of computation time on sequential machines are not rare. Scalable parallel computers can provide an appropriate setting in which to efficiently execute clustering algorithms for extracting knowledge from large-scale databases, and recently there has been increasing interest in parallel implementations of data clustering algorithms. A variety of parallel approaches to clustering have been explored [4], [5], [6], [7].

Classification: Classification is one of the primary data mining tasks [8]. The input to a classification system consists of example tuples, called a training set, with each tuple having several attributes. Attributes can be continuous, coming from an ordered domain, or categorical, coming from an unordered domain. A special class attribute indicates the label or category to which an example belongs. The goal of classification is to induce a model from the training set that can be used to predict the class of a new tuple. Classification has applications in diverse fields such as retail target marketing, fraud detection, and medical diagnosis (Michie, 1994). Among the many classification methods proposed over the years [9][10], decision trees are particularly suited for data mining, since they can be built relatively fast compared to other methods and are easy to interpret [11]. Trees can also be converted into SQL statements that can be used to access databases efficiently [12]. Finally, decision-tree classifiers obtain similar, and often better, accuracy compared to other methods [10]. Prior to the interest in classification for database-centric data mining, it was tacitly assumed that training sets could fit in memory; recent work has targeted the massive training sets usual in data mining. Developing classification models using larger training sets can enable the development of higher-accuracy models, and various studies have confirmed this [13].

Recent classifiers that can handle disk-resident data include SLIQ [14], SPRINT [15], and CLOUDS [16]. As data continue to grow in size and complexity, high-performance scalable data mining tools must necessarily rely on parallel computing techniques. Past research on parallel classification has focused on distributed-memory (also called shared-nothing) machines. Examples include parallel ID3 [17], which assumed that the entire dataset could fit in memory; the Darwin toolkit with parallel CART [18] from Thinking Machines, whose details are not available in the published literature; parallel SPRINT on the IBM SP2 [15]; and ScalParC [19] on a Cray T3D. While distributed-memory machines provide massive parallelism, shared-memory machines (also called shared-everything systems) are also capable of delivering high performance for low to medium degrees of parallelism at an economically attractive price. Increasingly, SMP machines are being networked together via high-speed links to form hierarchical clusters; examples include the SGI Origin 2000 and the IBM SP2 system, which can have an 8-way SMP as one high node. A shared-memory system offers a single memory address space that all processors can access. Processors communicate through shared variables in memory, and synchronization is used to co-ordinate processes. Any processor can also access any disk attached to the system. The SMP architecture offers new challenges and trade-offs that are worth investigating in their own right.

2. THE GPU ARCHITECTURE

Initially intended as a fixed many-core processor dedicated to transforming 3-D scenes into a 2-D image composed of pixels, the GPU architecture has undergone several innovations to meet the computationally demanding needs of supercomputing research groups across the globe. The traditional GPU pipeline, designed to serve its original purpose, came with several disadvantages. Shortcomings such as the limited data reuse in the pipeline, excessive variation in hardware usage, and the lack of integer instructions coupled with weak floating-point precision rendered the traditional GPU a weak candidate for HPC. In November 2006 [20], NVIDIA introduced the GeForce 8800 GTX with a novel unified pipeline and shader architecture. In addition to overcoming the limitations of the traditional GPU pipeline, the GeForce 8800 GTX architecture added the concept of the streaming processor (SMP) architecture that is highly pertinent to current GP-GPU programming. SMPs can work together in close proximity with extremely high parallel processing power. The outputs produced can be stored in fast cache and used by other SMPs. SMPs have instruction decoder units and execution logic performing similar operations on the data. This architecture allows SIMD instructions to be efficiently mapped across groups of SMPs. The streaming processors are accompanied by units for texture fetch (TF), texture addressing (TA), and caches. This structure is maintained and scaled up to 128 SMPs in the GeForce 8800 GTX. The SMPs operate at 1.35 GHz in the GeForce 8800 GTX, separate from the core clock operating at 575 MHz. Several GP-GPUs used thus far for HPC applications have architectures similar to the GeForce 8800 GTX architecture. However, the introduction of Fermi by NVIDIA in September 2009 [21] has radically changed the contours of the GP-GPU architecture, as we will explore in the next subsection.

The GPU's amazing evolution in both computational capability and functionality extends the application of GPUs to the field of non-graphics computations, so-called general-purpose computation on GPUs (GPGPU) [22]. Design and development of GPGPU are becoming significant for the following reasons:

1. Cost-performance: Using only commodity hardware is important for achieving high-performance computing at low cost, and GPUs have become commonplace even in low-end PCs. Due to a hardware architecture designed for exploiting the parallelism of graphics, even today's low-end GPU exhibits high performance for data-parallel computing. In addition, the GPU has much higher sequential memory access performance than the CPU, because one of the GPU's key tasks is filling regions of memory with contiguous texture data. That is, the GPU's dedicated memory can provide data to the GPU's processing units at high memory bandwidth.

2. Evolution speed: GPU performance, such as the number of floating-point operations per second, has been growing at a rapid pace. These rapidly evolving GPU capabilities may enable the GPU implementation of a task to outperform its CPU implementation in the future. Modern GPUs have two kinds of programmable processors, the vertex shader and the fragment shader, on the graphics pipeline to render an image.

Figure 1 illustrates a block diagram of the programmable rendering pipeline of these processors. The vertex shader handles the transformation and lighting of the vertices of polygons to transform them into the viewing coordinate system. Polygons projected into the viewing coordinate system are then decomposed into fragments, each corresponding to a pixel on the screen. Subsequently, the color and depth of a fragment are computed by the fragment shader. Finally, composition operations, such as tests using the depth, alpha, and stencil buffers, are applied to the outputs of the fragment shader to determine the final pixel colors written to the frame buffer.

It is emphasized here that vertex and fragment shaders are designed to exploit multi-grain parallelism in the rendering process: coarse-grain vertex/fragment-level parallelism and fine-grain vector-component-level parallelism. To exploit the coarse-grain parallelism at the GPU level, individual vertices and fragments can be processed in parallel; the fragment shaders (and vertex shaders) of recent GPUs have several processing units for parallel-processing multiple fragments (vertices). For example, NVIDIA's high-end GPU, the GeForce 6800 Ultra, has 16 processing units in the fragment shader and can therefore compute the colors and depths of up to 16 fragments at the same time. On the other hand, to exploit the fine-grain parallelism involved in all vector operations, the shaders have SIMD instructions that can simultaneously operate on four 32-bit floating-point values within a 128-bit register. For example, one of the powerful SIMD instructions, the multiply-and-add (MAD) instruction, performs a component-wise multiply of two registers, each storing four floating-point components, and then does a component-wise addition of the product to another register; the MAD instruction thus performs these eight floating-point operations in a single cycle. Due to its application-specific architecture, however, the GPU does not work well universally. To achieve high performance for a non-graphics application, we must therefore consider how to map it onto the GPU's programmable rendering pipeline.

Fig. 1. Overview of a programmable rendering pipeline.

The most critical restriction in GPU programming for non-graphics applications is the restricted data flow within and between the vertex shader and the fragment shader. The arrows in Fig. 1 show the typically permitted data flows. Both vertex and fragment shader programs have to write their outputs to write-only dedicated registers; random-access writes are not provided. This is a severe impediment to the effective implementation of many data structures and algorithms. In addition, the lack of loop controls, conditionals, and branching is a serious problem for most practical applications. Although the latest GPUs with Shader Model 3.0 support dynamic control flow, flow-control operations carry some overhead and can limit the GPU's performance. If an application violates the restrictions of the GPU programming model mentioned above, it is not a good idea to implement the application entirely on GPUs. For such an application, collaboration between CPU and GPU often leads to better performance, though it is essential to keep the time-consuming data transfer between them to a minimum. From the viewpoint of data accessibility, the fragment shader is superior to the vertex shader because the fragment shader can randomly access the video memory and fetch data as texture colors. Furthermore, the fragment shader usually has more processing units than the vertex shader, and thereby has greater potential to exploit data parallelism effectively. Consequently, this paper presents an implementation of data clustering accelerated effectively using multi-grain parallel processing on the fragment shader.

2.1 GPU Computing with the AMD/ATi Radeon 5870

The AMD/ATi Radeon 5870 architecture [23] is very different from NVIDIA's Fermi architecture. The AMD/ATi Radeon 5870 used in our study has 1600 ALUs organized in a different fashion than the Fermi. The ALUs are grouped into five-ALU Very Long Instruction Word (VLIW) processor units. While all five ALUs can issue the basic arithmetic operations, only the fifth ALU can additionally execute transcendental operations. The five-ALU groups, along with the branch execution unit and general-purpose registers, form another group called the stream core. This translates to 320 stream cores in all, which are further grouped into compute units. Each compute unit has 16 stream cores, resulting in 20 total compute units in the ATi Radeon 5870. One thread can be executed on one stream core, so 16 threads can run on a single compute unit. To hide memory latency, 64 threads are assigned to a single compute unit: when one 16-thread group accesses memory, another 16-thread group executes on the ALUs. Theoretically, therefore, a throughput of 16 threads per cycle is possible on the Radeon architecture. Each ALU can execute a maximum of 2 single-precision flops (multiply and add instructions) per cycle. The clock rate of the Radeon GPU is 850 MHz; for 1600 ALUs this translates to a throughput of 2.72 TFLOPS. The Radeon 5870 has a memory hierarchy similar to Fermi's, including global memory, L1 and L2 caches, shared memory, and registers. The 1 GB global memory has a peak bandwidth of 153 GB/s and is controlled by eight memory controllers. Each compute unit has an 8 KB L1 cache with an aggregate bandwidth of 1 TB/s. Multiple compute units share a 512 KB L2 cache, with 435 GB/s of bandwidth between the L1 and L2 caches. Each compute unit also has 32 KB of shared memory, providing a total aggregate bandwidth of 2 TB/s. The registers have the highest bandwidth: 48 bytes per cycle in each stream core (an aggregate bandwidth of 48 bytes x 320 stream cores x 850 MHz, i.e. about 13 TB/s). A 256 KB register space is available per compute unit, totaling 5.1 MB for the entire GPU.

3. THE K-MEANS ALGORITHM

In data clustering [24], multivariate data units are grouped according to their similarity or dissimilarity. MacQueen used the term k-means to denote the process of assigning each data unit to that cluster (of k clusters) with the nearest centroid. That is, k-means clustering employs the Euclidean distance between data units as the dissimilarity measure; a partition of data units is assessed by the squared error

$$E = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - y_j \rVert^2, \qquad (3.1)$$

where $x_i \in \mathbb{R}^d$, $i = 1, 2, \ldots, m$ is a data unit, $y_j \in \mathbb{R}^d$, $j = 1, 2, \ldots, k$ denotes a cluster centroid, and $C_j$ is the set of data units assigned to cluster $j$.

Although there is a vast variety of k-means algorithms [5], for simplicity of explanation this paper focuses on a simple, standard k-means algorithm, summarized as follows (a sketch in code follows the list):

1. Begin with any desirable initial state; e.g. initial cluster centroids may be drawn randomly from the given data set.

2. Allocate each data unit to the cluster with the nearest centroid. The centroids remain fixed through the entire data set.

3. Calculate the centroids of the new clusters.

4. Repeat Steps 2 and 3 until a convergence condition is met, e.g. no data units change their membership at Step 2, or the number of repetitions exceeds a predefined threshold.
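A compact sketch of these four steps (our own illustration in Python/NumPy, independent of the paper's GPU implementation):

    import numpy as np

    def kmeans(x, k, max_iter=100, seed=0):
        """Standard k-means as summarized above; x has shape (m, d)."""
        rng = np.random.default_rng(seed)
        # Step 1: draw initial centroids randomly from the data set
        y = x[rng.choice(len(x), size=k, replace=False)]
        for _ in range(max_iter):
            # Step 2: squared Euclidean distances, shape (m, k), by broadcasting
            d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=2)
            labels = d2.argmin(axis=1)  # nearest centroid per data unit
            # Step 3: recompute centroids (keep the old one if a cluster empties)
            y_new = np.array([x[labels == j].mean(axis=0) if np.any(labels == j)
                              else y[j] for j in range(k)])
            # Step 4: stop when no centroid moves
            if np.allclose(y_new, y):
                break
            y = y_new
        return labels, y

The distance matrix in Step 2 corresponds exactly to the km distance computations discussed next, which is why that step dominates the running time.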

At each repetition, the assignment of m data units to k clusters in Step 2 requires km distance computations (and (k - 1)m distance comparisons) to find the nearest cluster centroids, the so-called nearest neighbor search. The cost of each distance computation increases in proportion to the dimension of the data, i.e. the number of vector elements in a data unit, d. The nearest neighbor search consists of approximately 3dkm floating-point operations, and thus its computational cost grows as O(dkm). In practical applications, the nearest neighbor search consumes most of the execution time of k-means clustering, because m and/or d are often tremendous. However, the nearest neighbor search involves massive SIMD parallelism: the distance between every pair of a data unit and a cluster centroid can be computed in parallel, and each distance computation can further be parallelized over its vector components.

This motivates us to implement the distance computation on recent programmable GPUs as multi-grain SIMD-parallel coprocessors. On the other hand, there is no need to consider accelerating Steps 1 and 4 with GPU programming, because they require little execution time and contain almost no parallelism. In Step 3, cluster centroid recalculation consists of dm additions and dk divisions of floating-point values. Although most of these calculations can be performed in parallel, conditionals and random-access writes are required for an effective implementation of individually summing up the vectors within each cluster. In addition, the divisions require conditional branching to prevent divide-by-zero errors. Since the execution time of Step 3 is much less than that of Step 2, there is no room for a performance improvement that outweighs the overheads arising from the lack of random-access writes and conditionals in GPU programming. Therefore, we decide to implement Steps 1, 3, and 4 as CPU tasks.
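Since the paper's abstract names OpenCL as the target platform, the offloaded Step 2 might look like the following host-side sketch using the pyopencl bindings (the kernel, buffer layout, and one-work-item-per-data-unit mapping are our assumptions, not the authors' code; in a real implementation the context and buffers would be created once and reused across repetitions):

    import numpy as np
    import pyopencl as cl

    KERNEL = """
    __kernel void dist2(__global const float *x, __global const float *y,
                        __global float *out, const int d, const int k) {
        int i = get_global_id(0);          /* one work-item per data unit */
        for (int j = 0; j < k; ++j) {
            float s = 0.0f;
            for (int c = 0; c < d; ++c) {  /* component-wise squared distance */
                float diff = x[i * d + c] - y[j * d + c];
                s += diff * diff;
            }
            out[i * k + j] = s;            /* distance to centroid j */
        }
    }
    """

    def gpu_distances(x, y):
        """Squared distances between every data unit and every centroid."""
        m, d = x.shape
        k = y.shape[0]
        ctx = cl.create_some_context()
        queue = cl.CommandQueue(ctx)
        mf = cl.mem_flags
        xb = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR,
                       hostbuf=np.ascontiguousarray(x, dtype=np.float32))
        yb = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR,
                       hostbuf=np.ascontiguousarray(y, dtype=np.float32))
        out = np.empty((m, k), dtype=np.float32)
        ob = cl.Buffer(ctx, mf.WRITE_ONLY, out.nbytes)
        prog = cl.Program(ctx, KERNEL).build()
        prog.dist2(queue, (m,), None, xb, yb, ob, np.int32(d), np.int32(k))
        cl.enqueue_copy(queue, out, ob)    # the read-back discussed in Sec. 4
        return out

The host then takes the argmin over each row (Step 2's comparisons) and performs Steps 1, 3, and 4 on the CPU, as decided above.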

4. DISCUSSION

Fig. 2. Parallel processing of data clustering that uses the GPU as a co-processor to exploit two kinds of data parallelism in the nearest neighbor search.

Our preliminary analysis shows that the data transfer from CPU to GPU at each rendering pass is not a bottleneck in the nearest neighbor search. This is because the large data set has already been placed in the GPU-side video memory in advance; only the geometry data of a polygon, including texture coordinates as a cluster centroid, are transferred at each rendering pass. On the other hand, the data transfer from the GPU-side video memory to the main memory induces a certain overhead even when using the PCI-Express interface. Therefore, we should be judicious about reading data back from the GPU, even when the GPU is connected via the PCI-Express interface. In our implementation scheme, the overhead of the data transfer is negligible except for trivial-scale data clustering, because the data placed in the GPU-side video memory are transferred only once in each repetition. Accordingly, our implementation scheme of data clustering with GPU co-processing can exploit the GPU's computing performance without critical degradation attributable to the data transfer between CPU and GPU.

5. CONCLUSION

Further research tasks include using a cluster of GPUs for texture classification. This could be done by including several GPUs in the same machine or by dividing the computation across a cluster of PCs, and it would significantly increase the applicability of the architecture to complex industrial applications. With this approach, a deeper analysis of the parallelization strategy for the multi-GPU setting introduced here is needed. Undoubtedly, the increase in performance will not be proportional to the number of GPUs, and will be hurt by factors such as data communication and synchronization among different hosts.

We have proposed a three-level hierarchical parallel processing scheme for the k-means algorithm using a modern programmable GPU as a SIMD-parallel co-processor. Based on the divide-and-conquer approach, the proposed scheme divides a large-scale data clustering task into subtasks of clustering small subsets, and the subtasks are executed on a PC cluster system in an embarrassingly parallel manner. In the subtasks, a GPU is used as a multi-grain SIMD-parallel co-processor to accelerate the nearest neighbor search, which consumes a considerable part of the execution time of the k-means algorithm. The distances from one cluster centroid to several data units are computed in parallel, and each distance computation is parallelized by component-wise SIMD instructions. As a result, parallel data clustering with GPU co-processing significantly improves the computational efficiency of massive data clustering. Experimental results clearly show that the proposed hierarchical parallel processing scheme remarkably accelerates massive data clustering tasks. In particular, acceleration of the nearest neighbor search by GPU co-processing saves significant total execution time in spite of the overhead of the data transfer from the GPU-side video memory to the CPU-side main memory. GPU co-processing is also effective in retaining the scalability of the proposed scheme by accelerating the aggregation stage, a non-parallelized part of the proposed scheme.

This paper has discussed the GPU implementation of the nearest neighbor search and compared it with the CPU implementation to clarify the performance gain of GPU co-processing. However, a multi-threading approach could allow both the CPU and the GPU to execute the nearest neighbor search in parallel without interrupting each other. In such an implementation, GPU co-processing would always bring additional computing power, even when only a low-end GPU is available. A multi-threading implementation with effective load balancing between CPU and GPU will be investigated in our future work.

6. REFERENCES

[1] C. Pizzuti and D. Talia, "P-AutoClass: Scalable Parallel Clustering for Mining Large Data Sets," IEEE Transactions on Knowledge and Data Engineering, vol. 15, no. 3, pp. 629-641, 2003.

[2] U. Fayyad, G. Piatetsky-Shapiro and P. Smyth, From Data Mining to Knowledge Discovery: An Overview, NY: AAAI/MIT Press, 1996.

[3] L. Hunter and D. States, "Bayesian Classification of Protein Structure," IEEE Expert, vol. 7, no. 4, pp. 67-75, 1992.

[4] C. Olson, "Parallel Algorithms for Hierarchical Clustering," Parallel Computing, vol. 21, pp. 1313-1325, 1995.

[5] D. Judd, P. McKinley and A. Jain, "Large-Scale Parallel Data Clustering," in Int'l Conf. Pattern Recognition, New York, 1996.

[6] J. Potts, Seeking Parallelism in Discovery Programs, Master's thesis, Univ. of Texas, Arlington, 1996.

[7] K. Stoffel and A. Belkoniene, "Parallel K-Means Clustering for Large Data Sets," in Parallel Processing, UK, 1999.

[8] R. Agrawal, T. Imielinski and A. Swami, "Database mining: A performance perspective," vol. 5, no. 6, pp. 914-925, Dec 1993.

[9] S. Weiss and C. Kulikowski, Computer Systems that Learn, vol. 1, New York: Morgan Kaufmann, 1991.

[10] D. Michie, Machine Learning, Neural and Statistical Classification, vol. I, NJ: Ellis Horwood, 1994.

[11] J. Quinlan, Programs for Machine Learning, vol. I, New York: Morgan Kaufmann, 1999.

[12] R. Agrawal, "An interval classifier for database mining applications," in VLDB Conference, New York, Aug 1992.

[13] J. Catlett, Megainduction: Machine Learning on Very Large Databases, PhD thesis, Univ. of Sydney, 1991.

[14] M. Mehta, R. Agrawal and J. Rissanen, "SLIQ: A fast scalable classifier for data mining," in 5th Intl. Conf. on Extending Database Technology, NJ, March 1996.

[15] J. Shafer, R. Agrawal and M. Mehta, "SPRINT: A scalable parallel classifier for data mining," in 22nd VLDB Conference, NJ, Sept 1996.

[16] K. Alsabti, S. Ranka and V. Singh, "CLOUDS: A decision tree classifier for large datasets," in 4th Intl. Conf. on Knowledge Discovery and Data Mining, Aug 1998.

[17] D. Fifield, Distributed Tree Construction from Large Data Sets, Bachelor's thesis, Australian Natl. Univ., 1992.

[18] L. Breiman, Classification and Regression Trees, Belmont: Wadsworth, 1984.

[19] M. Joshi, G. Karypis and V. Kumar, "ScalParC: A scalable and parallel classification algorithm for mining large datasets," in Intl. Parallel Processing Symp., 1998.

[20] "Technical Brief: NVIDIA GeForce 8800 GPU architecture overview," [Online]. Available: www.nvidia.com.

[21] "NVIDIA's next generation CUDA compute architecture: Fermi," [Online]. Available: http://www.nvidia.com/content/PDF/fermi_white_papers/NVIDIAFermiComputeArchitectureWhitepaper.pdf.

[22] Z. Fan, F. Qiu, A. Kaufman and S. Yoakum-Stover, "GPU cluster for high performance computing," NY, 2004.

[23] "ATI Mobility Radeon HD 5870 GPU specifications," [Online]. Available: http://www.amd.com/us/products/notebook/graphics/ati-mobility-hd-5800/Pages/hd-5870-specs.aspx.

[24] D. Judd, P. McKinley and A. Jain, "Performance Evaluation on Large-Scale Parallel Clustering in NOW Environments," in Eighth SIAM Conf. Parallel Processing for Scientific Computing, Mar 1997.

FEMBI REKRISNA GRANDEA PUTRA
M0513019

BASIC COMPETENCY EXAM 2: INTRODUCTION TO COMPUTER ORGANIZATION

    Data mining algorithms such as classification and clustering are the future of computing, even though they require multidimensional data processing. Users today have multicore processors paired with GPUs, yet most programming languages provide no multiprocessing facilities, and processing resources are therefore wasted. Clustering and classification are among the algorithms that consume the most resources. The paper demonstrates a strategy for overcoming this shortcoming by using OpenCL as a multicore processing platform.
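
    To make the gap concrete: the data-parallel part of such an algorithm maps naturally onto an OpenCL kernel. The sketch below, in OpenCL C (a C dialect), shows an assignment step for k-means; the kernel name, buffer layout and argument list are illustrative assumptions, not code from the paper.

    /* Each work-item assigns one data point to its nearest centroid. */
    __kernel void assign_points(__global const float *points,    /* n x d, row-major */
                                __global const float *centroids, /* k x d, row-major */
                                __global int *labels,
                                const int n, const int d, const int k)
    {
        int i = get_global_id(0);
        if (i >= n) return;

        int best = 0;
        float best_dist = FLT_MAX;               /* predefined in OpenCL C */
        for (int c = 0; c < k; ++c) {
            float dist = 0.0f;
            for (int j = 0; j < d; ++j) {
                float diff = points[i * d + j] - centroids[c * d + j];
                dist += diff * diff;             /* squared Euclidean distance */
            }
            if (dist < best_dist) { best_dist = dist; best = c; }
        }
        labels[i] = best;
    }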

    CLUSTERING is an unsupervised learning technique that separates data items into groups, such that items in the same cluster are more similar to one another and items in different clusters tend to differ, according to some measure of similarity or proximity. Pizzuti and Talia present P-AutoClass, a technique for scalable parallel clustering of large data sets. Data clustering is an important task in the field of data mining. Clustering is the unsupervised classification of data items into homogeneous groups called clusters. Clustering methods partition a set of data items into clusters such that items in the same cluster are more similar to each other than items in different clusters, according to some defined criterion. Clustering algorithms are computationally intensive, especially when they are used to analyze large amounts of data. One possible approach to reducing the processing time is based on running the clustering algorithm on a scalable parallel computer. The paper describes the design and implementation of P-AutoClass, a parallel version of the AutoClass system based on a Bayesian model for determining the optimal classes in large data sets. The P-AutoClass implementation divides the clustering task among the processors of a multicomputer so that each processor works on its own partition and exchanges its results with the other processors. The system architecture, its implementation, and experimental performance results for different numbers of processors and data sets are presented and compared with the theoretical performance. In particular, the measured and predicted scalability and efficiency of P-AutoClass versus the sequential AutoClass system are evaluated and compared.
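
    A minimal sketch of the partition-and-exchange pattern attributed to P-AutoClass above, written in C with MPI; the routine accumulate_local_stats and the flat statistics layout are our own illustrative assumptions, not P-AutoClass's actual interface.

    #include <stdlib.h>
    #include <mpi.h>

    /* Assumed user routine: accumulates per-class sufficient statistics
     * (counts, sums, ...) over this processor's partition of the data. */
    void accumulate_local_stats(const float *local_data, int local_n,
                                int k, int stat_len, double *stats);

    void clustering_round(const float *local_data, int local_n,
                          int k, int stat_len, double *global_stats)
    {
        double *local_stats = calloc((size_t)k * stat_len, sizeof *local_stats);

        /* Each processor works only on its own partition of the data set. */
        accumulate_local_stats(local_data, local_n, k, stat_len, local_stats);

        /* Exchange results with the other processors: every rank ends up
         * with the sum of all partial statistics. */
        MPI_Allreduce(local_stats, global_stats, k * stat_len,
                      MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

        free(local_stats);
    }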

    Originally intended as fixed many-core processors dedicated to transforming 3-D scenes into 2-D images composed of pixels, GPU architectures have undergone several innovations to meet the computing demands of supercomputing research groups around the world. The traditional GPU pipeline, designed to serve that original purpose, came with several drawbacks: limited reuse of data within the pipeline, excessive variation in hardware utilization, and a lack of integer instructions coupled with weak floating-point precision made the traditional GPU a weak candidate for HPC. In November 2006, NVIDIA introduced the GeForce 8800 GTX with a new unified pipeline and shader architecture. Besides overcoming the limitations of the traditional GPU pipeline, the GeForce 8800 GTX architecture added


    the concept of the streaming processor (SMP) architecture that is highly relevant to today's GP-GPU programming. SMPs can work closely together, yielding very high parallel processing power. The output they produce can be stored in a fast cache and used by other SMPs. An SMP has an instruction decoder unit and execution logic that perform similar operations on data. This architecture allows SIMD instructions to be mapped efficiently across groups of SMPs. The streaming processors are accompanied by units for texture fetch (TF), texture addressing (TA), and caching. This structure was retained and scaled up to 128 SMPs in the GeForce 8800 GTX. The SMPs in the GeForce 8800 GTX operate at 1.35 GHz, separately from the 575 MHz core clock. Several of the GP-GPUs used so far for HPC applications have architectures similar to that of the GeForce 8800 GTX. However, NVIDIA's introduction of Fermi in September 2009 [21] has radically changed the contours of GP-GPU architecture, as we explore in the next subsection.

    Our initial analysis shows that the transfer of data from CPU to GPU in each rendering pass is not the bottleneck in the nearest-neighbour search. This is because the large data set has been placed in GPU-side video memory in advance; only the polygon geometry data, including the texture coordinates serving as the cluster centroids, is transferred in each rendering pass. On the other hand, transferring data from GPU-side video memory to main memory induces a certain overhead even when using the PCI-Express interface. We must therefore be judicious about reading data back from the GPU, even when the GPU is connected through a PCI-Express interface. In our implementation scheme, the data transfer overhead is negligible, except when clustering trivially small data sets, because the data placed in GPU-side video memory is transferred only once in each iteration. Thus, our implementation scheme for data clustering with GPU co-processing can exploit the computing performance of the GPU without critical degradation from data transfers between the CPU and the GPU.
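
    A host-side sketch in C, using standard OpenCL API calls, of the transfer discipline described above: the large point array crosses the bus once, only the small centroid array moves per iteration, and read-back is kept to the compact label vector. All names are illustrative, and context, program and clSetKernelArg setup are assumed to have been done beforehand.

    #include <CL/cl.h>

    void kmeans_gpu_loop(cl_command_queue queue, cl_kernel kernel,
                         cl_mem d_points, cl_mem d_centroids, cl_mem d_labels,
                         const float *points, float *centroids, int *labels,
                         size_t n, size_t k, size_t d, int iters)
    {
        size_t global = n;

        /* One-time upload: the large data set becomes resident on the GPU. */
        clEnqueueWriteBuffer(queue, d_points, CL_TRUE, 0,
                             n * d * sizeof(float), points, 0, NULL, NULL);

        for (int it = 0; it < iters; ++it) {
            /* Only the k x d centroid array crosses the bus each iteration. */
            clEnqueueWriteBuffer(queue, d_centroids, CL_TRUE, 0,
                                 k * d * sizeof(float), centroids, 0, NULL, NULL);

            clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global, NULL,
                                   0, NULL, NULL);

            /* Read-back is the costly direction, so fetch only the labels,
             * then recompute the centroids on the host. */
            clEnqueueReadBuffer(queue, d_labels, CL_TRUE, 0,
                                n * sizeof(int), labels, 0, NULL, NULL);

            /* ... update `centroids` from `labels` on the CPU ... */
        }
    }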

    Further research tasks for this work include using a group of GPUs for texture classification. This could be done either by installing several GPUs in the same machine or by dividing the computation across a cluster of PCs, and it would significantly increase the applicability of the architecture to complex industrial applications. With this approach, a deeper analysis of the parallelization strategies introduced for multiple GPUs becomes necessary. Undoubtedly, the performance gain will not be proportional to the number of GPUs and will be eroded by factors such as data communication and synchronization between different hosts. We have proposed a three-level hierarchical parallel processing scheme for the k-means algorithm that uses a modern programmable GPU as a SIMD-parallel co-processor.


  • International Journal of Electronics, Vol. 95, No. 2, February 2008, pp. 125-138
    DOI: 10.1080/00207210701828085

    An efficient memory allocation algorithm and hardware design with VHDL synthesis

    F. KARABIBER*, A. SERTBAS, S. OZDEMIR and H. CAM

    Computer Engineering Department, Istanbul University, 34320, Avcılar, Istanbul; Computer Engineering Department, Gazi University, 06570, Maltepe, Ankara; Computer Science and Engineering Department, Arizona State University, Tempe, AZ 85287

    (Received 27 November 2006; in final form 21 November 2007)

    This paper presents a hardware-efficient memory allocation technique, called EMA, that detects the existence of any free block of requested size in memory. EMA can allocate a free memory block of any number of chunks in any part of memory without any internal fragmentation. The gate-level design of the hardware unit, along with its area-time measurements, is given in this paper. Simulation results indicate that EMA is fast and flexible enough to allocate/deallocate a free block in any part of memory, resulting in efficient utilization of memory space. In addition, the VHDL synthesis with FPGA implementation shows that EMA has less complicated hardware, and is faster, than the known hardware techniques.

    Keywords: Memory allocation; Buddy system; Digital system design; VHDL synthesis; Simulation; FPGA

    1. Introduction

    High-performance algorithms for dynamic memory allocation (DMA) have been of considerable interest in the literature, as DMA is a critical factor for improving a computer system's performance. The common allocation techniques can be divided into four categories: sequential fits, segregated fits, buddy systems and bitmapped fits. The sequential fits and segregated fits algorithms keep a free list of all available memory chunks and scan the list for allocation. These two methods produce good memory utilization (storage), but incur a time penalty because of scanning the free-space list. The bitmapped fits method uses two bitmaps for the allocation process, one for the requested allocation size and another for encoding the allocated block boundaries, i.e. the size of allocated blocks. If the object sizes are small, it can have space advantages; otherwise it causes long search times.

    The buddy system, introduced by Knowlton (1965), is a fast and simple memory allocation technique which allocates memory in blocks whose lengths are powers of 2. If the requested block size is not a power of 2, then the size is rounded up to the next power of two. This may leave a big chunk of unused space at the end of an allocated block (Puttkamer 1975), thereby resulting in internal fragmentation that occurs in


    the case of more memory being allocated than requested. The buddy system also suffers from external fragmentation, which arises when a request for memory allocation cannot be satisfied even though the total amount of free space is adequate.
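
    A small plain-C illustration (helper name ours) of the rounding rule just described: the request is rounded up to the next power of two, and the difference is the internal fragmentation.

    #include <stdio.h>

    static unsigned next_pow2(unsigned k)
    {
        unsigned p = 1;
        while (p < k)
            p <<= 1;
        return p;
    }

    int main(void)
    {
        unsigned k = 38;                  /* requested size in chunks      */
        unsigned alloc = next_pow2(k);    /* a buddy system allocates 64   */
        printf("request %u -> allocate %u (%u chunks of internal waste)\n",
               k, alloc, alloc - k);      /* prints 38 -> 64 (26 wasted)   */
        return 0;
    }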

    The performance of the buddy system can be improved using hardware techniques (Puttkamer 1975, Page and Hagins 1986, Chang and Gehringer 1991). A modified hardware-based buddy system which eliminates internal fragmentation is proposed in Chang and Gehringer (1996). In this allocation technique, the memory is divided into fixed-size word groups called chunks, and their statuses are determined by using a bit vector. Each bit of the vector represents a leaf of the or-gate tree given in figure 1. The method in Chang and Gehringer (1996) can detect a free block of size j only if the starting address of the free block is a multiple of j, i.e. k·j where k ≥ 0 and j is a power of 2. Even though a block with the requested free space is available in the memory, their hardware design may not be able to detect it due to the limitations of the or-gate tree structure. Another memory allocation technique, which detects free blocks of size 2^⌈log2 k⌉, is proposed in Cam, Abd-El-Barr and Sait (1999).

    In recent years, multiple buddy systems that predefine the size set have been introduced in order to reduce external fragmentation (Chang, Srisa-an, Dan Lo and Gehringer 2002). However, the modified buddy system still performs better than the multiple buddy system. An active memory management algorithm based on the modified buddy system is implemented in a hardware unit (AMMU) and embedded into an SoC design in Agun and Chang (2001). The dynamic memory management unit allows easy integration into any processor, but it requires a large amount of memory for big object sizes and numbers, since the total amount of memory space for the needed bitmaps is proportional to the object size and the number of objects. Finally, a bitmap-based memory allocator designed in combinational logic works in conjunction with an application-specific instruction set extension (Chang et al. 2002).

    This paper presents a fast and efficient memory allocation (EMA) algorithm to detect any available free block of requested size and to minimize internal and external fragmentation. The proposed technique can allocate a free memory block of any length located in any part of a memory. EMA can allocate free blocks of size 2^⌊log2 k⌋ + 2^⌈log2(k - 2^⌊log2 k⌋)⌉ and it is capable of detecting any available free block of

    Figure 1. Generic structure of or-gate prefix circuit.


    requested size using a proposed or-gate prefix circuit. Furthermore, the proposed circuits for implementing EMA are less complicated than those introduced in Chang and Gehringer (1996) and Lo, Srisa-an and Chang (2001). The simulation results show that EMA occupies approximately 9.2% less memory space than the modified buddy system (Chang and Gehringer 1996). In addition, the EMA hardware has been synthesized with VHDL, tested for several measurements, such as the mean allocation and deallocation time and the total area, and compared to the memory management system in Agun and Chang (2001). Simulation results show that EMA achieves a significant improvement in memory allocation behaviour compared to Agun and Chang (2001) by offering 22% less external fragmentation.

    The rest of the paper is organised as follows: § 2 describes the proposed memory allocation algorithm; § 3 presents the detailed hardware design of the allocator/deallocator proposed in this work; § 4 includes EMA's performance analysis with the simulation, the VHDL analysis and the FPGA implementation results; concluding remarks are made in § 5.

    2. Efficient memory allocation algorithm

    Consider that the memory is partitioned into a number of chunks which have the same number of words, and that a memory block consists of one or more chunks. The status of each memory chunk is represented in a bit-vector by either a 0 or a 1, depending on whether the chunk is free or used, respectively. The bit-vector thus holds the memory allocation information. The bits of the bit-vector are labelled from left to right in ascending order, starting with 0. Each bit of the bit-vector has an address register containing its label as the address. Figure 2 shows a simplified flowchart of Algorithm EMA, which is presented next.

    Algorithm:

    Input:
      Allocation: the size value k of the requested block.
      Deallocation: the starting address of the block to be deallocated.

    Output:
      Allocation: (i) the starting address of the allocated block; (ii) the bits corresponding to the allocated blocks are inverted from 0 to 1.
      Deallocation: the bits corresponding to the deallocated blocks are inverted from 1 to 0.

    In order to implement the above algorithm, the logic structures of the or-gate-prefix, search-free block and detect-free block circuits are given in § 3. Step 1 determines whether the request is a memory allocation or a deallocation. Step 2 detects the free memory chunks of size 2^⌊log2 k⌋. If k is not a power of 2, then Step 3 detects the free memory chunks of size 2^⌊log2 k⌋ + 2^⌈log2(k - 2^⌊log2 k⌋)⌉. Step 4 selects the highest address among all detected free blocks. Step 5 allocates the blocks and inverts the k bits of the bit-vector corresponding to the chunks of the allocated blocks from 0 to 1. Step 6 frees the blocks and inverts the k bits corresponding to the chunks of the deallocated blocks from 1 to 0.
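
    The following plain-C sketch (helper names ours) models the block-size decomposition behind steps 2 and 3: a request for k chunks is served as one block of 2^⌊log2 k⌋ chunks plus, when k is not a power of two, an adjacent block of 2^⌈log2(k - 2^⌊log2 k⌋)⌉ chunks. It is a behavioural model of the algorithm, not of the hardware.

    #include <stdio.h>

    static unsigned floor_pow2(unsigned k)   /* 2^floor(log2 k) */
    {
        unsigned p = 1;
        while (p * 2 <= k)
            p *= 2;
        return p;
    }

    static unsigned ceil_pow2(unsigned k)    /* 2^ceil(log2 k) */
    {
        unsigned p = 1;
        while (p < k)
            p *= 2;
        return p;
    }

    int main(void)
    {
        unsigned k = 38;
        unsigned s1 = floor_pow2(k);                     /* 32 */
        unsigned s2 = (k == s1) ? 0 : ceil_pow2(k - s1); /*  8 */
        printf("k = %u -> search for %u + %u = %u chunks\n",
               k, s1, s2, s1 + s2);   /* k = 38 -> 32 + 8 = 40 */
        return 0;
    }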


    3. Hardware design of the memory allocator

    In order to implement the above algorithm, the search-free block and detect-free block circuits are designed. Details of the circuits and the algorithm steps are given in the following.

    3.1 Search of free blocks (steps 2 and 3)

    To detect all free blocks of size 2^⌊log2 k⌋ + 2^⌈log2(k - 2^⌊log2 k⌋)⌉, we use the or-gate-prefix circuit whose logic structure was given in Cam et al. (1999). The or-gate-prefix circuit designed for a memory of N chunks is shown in figure 1. Each node at level Li represents an OR gate. As seen from figure 1, there are n + 1 level selectors labelled S0, S1, ..., Sn for a 2^n-bit vector. For any free block of size 2^i, there will be exactly one corresponding or-gate node with value 0 at level Li of the

    Figure 2. Block diagram of EMA algorithm.


    or-gate-prefix circuit. The outputs of all or-gates are inverted and then become the inputs of the tri-state buffers. When level selector Si is asserted, the outputs of those tri-state buffers which correspond to free blocks drive their associated vertical lines V(j), j ≥ 0, depicted in figure 3. These vertical lines (called the V-vector) generate the address associated with the first chunk of the available blocks of the requested size. When a block of size k is requested, depending on k, in Step 2 or Step 3 only level selector Si is asserted, with i = ⌊log2 k⌋ or i = ⌈log2(k - 2^⌊log2 k⌋)⌉, respectively.

    In this technique, the or-gate-prefix circuit can detect any free block if its size is a power of 2, no matter where the free block is located in the memory. If k is a power of 2, the technique uses the or-gate-prefix circuit only once, in step 2. In step 2, free blocks of size 2^⌊log2 k⌋ are detected by a high-priority encoder; the decoded bits are then compared with the requested block size, and if they are equal it can easily be seen that the requested size is a power of 2, and S1 is selected as the input size of the or-gate-prefix circuit (S). However, if k is not a power of 2, the or-gate-prefix circuit is used twice (step 2 and step 3). In step 3, instead of the bit-vector bits, the NANDed V-vector bits are used, as shown in figure 4. Using this new bit-vector, the algorithm detects the free blocks of size 2^⌈log2(k - 2^⌊log2 k⌋)⌉, which are equivalent to free blocks of size 2^⌊log2 k⌋ + 2^⌈log2(k - 2^⌊log2 k⌋)⌉ in the original bit-vector.

    As seen in figure 4, the subtractor subtracts S1 from the requested block size k. This difference (k - S1) corresponds to the free blocks of size 2^⌈log2(k - 2^⌊log2 k⌋)⌉. Following a procedure similar to step 2, in step 3 the obtained block size passes through a high-priority encoder and then a decoder circuit and is loaded into a shift register which holds the content of S2. If the decoded size bits are not equal to the difference size, the shift register is enabled to shift one bit position to the left; otherwise the decoded size bits are loaded into the register without any shift.

    Figure 3. The or-gate-prefix logic circuit.


    Example 1: Assume that the requested memory size is 38 chunks. In step 2, using the or-gate-prefix circuit, EMA detects free blocks of size 32 = 2^⌊log2 38⌋ and activates the V-vector address bits of those blocks by setting S5 = 1. The requested size is not a power of 2, therefore EMA employs the or-gate-prefix circuit again to detect all the free blocks of size 2^⌊log2 38⌋ + 2^⌈log2(38 - 2^⌊log2 38⌋)⌉ = 32 + 8 = 40. Since, in step 2, the address registers which hold the V-vector are activated from level S5 of the or-gate-prefix circuit, each active register represents a free block of size 32. Moreover, 9 consecutive active address registers are equivalent to a free block of size 40, because the or-gate-prefix circuit has the following property: if address register a represents n chunks of memory, say bit-vector bits 1 to n, then address register a + 1 represents bit-vector bits 2 to n + 1. EMA takes advantage of this property of the or-gate-prefix circuit to detect the free blocks of size 40 by using the V-vector as the bit-vector input of the or-gate-prefix circuit. After the NAND operation, the free block of size 40 is represented by 8 consecutive active bits of the V1 register, which can be detected by the or-gate-prefix circuit. The results of the NAND operations are inserted into the bit-vector by inverting each result, so that the active bits of the new bit-vector are represented by 0s in the original bit-vector. Finally, in step 3, EMA detects the free blocks of size 40 by finding the 8 consecutive active bits in the new bit-vector.
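
    In software terms, the level-Li output of the or-gate-prefix circuit is a window-OR over the bit-vector: position a is active exactly when the 2^i bits starting at a are all 0. The plain-C sketch below (names ours) models this detection step, which Example 1 applies twice, with window sizes 32 and then 8.

    /* v[a] = 1 when bits a .. a + size - 1 of bv are all free (0). */
    static void detect_free(const unsigned char *bv, int n,
                            int size, unsigned char *v)
    {
        for (int a = 0; a + size <= n; ++a) {
            int used = 0;
            for (int j = 0; j < size; ++j)
                used |= bv[a + j];        /* the OR-gate tree, in software */
            v[a] = !used;
        }
    }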

    3.2 The free block detection with the highest address (step 4)

    This step determines the free block whose first chunk address is the greatest; the address of a block's first chunk is called its starting address. The bits of the V-vector generated by the or-gate-prefix circuit indicate whether the requested blocks can be allocated or not. If the corresponding k bits are 1s, a free block of size k can be allocated there. When more than one bit is set in the V-vector, the selection of the highest address corresponding to these bits is achieved by using a high-priority encoder circuit.

    Example 2: Consider a memory partitioned into 2^4 chunks that each have an equal number of words. Each memory chunk is represented by one bit of the bit-vector, as shown in figure 5. Assume that a free block of size 2 is requested. The level selector S shown in figure 5 is activated and the V-vector bits with addresses 0010,

    Figure 4. The Search and Detect-free block circuit.


    0101, 1001 and 1010 are set. The function of the high-priority encoder is to select the free block of size 2 having the highest address, so the output of the encoder is obtained as 1010.
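
    A plain-C analogue (function name ours) of the high-priority encoder used in step 4: among all set V-vector bits it returns the highest address. For Example 2 it would return 10, i.e. binary 1010.

    static int highest_free_address(const unsigned char *v, int n)
    {
        for (int a = n - 1; a >= 0; --a)
            if (v[a])
                return a;   /* highest starting address of a suitable block */
        return -1;          /* no free block of the requested size exists   */
    }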

    3.3 Bit inversion (step 5)

    Let us denote the starting and ending addresses of the determined block by SA and EA, respectively. Using the formula EA = SA + k - 1, EA can easily be computed for the size k of the requested block. Since the V-vector represents the bits corresponding to the allocated block, SA and EA correspond to the addresses of the first and last bits of this vector, respectively. The bits of the V-vector must be inverted to 1 to indicate that they are allocated.

    This is achieved in two steps by the circuit illustrated in figure 6. The objective of the first step is to set the gates connected to the (EA + 1)th bit in such a way that the bits with addresses EA + 1, EA + 2, and so on cannot be inverted in the second step. The goal of the second step is to invert all the bits of the subvector V.

    As soon as the free block with the highest address is determined in step 4, the bit-inversion operation is done simultaneously to update the status of the bit-vector. Therefore, the bit-inversion circuit does not contribute any delay to the current memory allocation request. The only constraint imposed by the bit-inversion circuit is that the next memory allocation request cannot start until the inversion of the bits in V is complete.

    In the first step of the bit-inversion operation, the reset line shown in figure 6 is asserted high, and EA + 1 is put on the address lines of the address-control unit. For each bit of the bit-vector, there is a corresponding match cell, illustrated by a dashed-line box in figure 6. The comparator in each match cell compares the address on the address lines with the contents of the address register in its match cell.

    Obviously, only the comparator associated with bit EA + 1 shows a match, because the address lines carry the address EA + 1. Therefore, only the output of the comparator of match cell EA + 1 is 1, and the outputs of all other comparators are 0. The contents of the bit-latch in a match cell can be changed only when the reset line is 1, so that the reset line acts like an enable line for the bit-latch. Since the reset line is also 1 during the first step only, the number 0 is stored in the bit-latch of the

    Figure 5. The contents of the bit-vector and V-vector for the example.


    match cell EA + 1, and all the other bit-latches of the match cells store the number 1. Note that the output of the bit-latch of match cell EA + 1 is an input to an AND gate whose output automatically becomes 0. Finally, the reset line is made 0 and the first step ends. The settings of the circuit in the first step ensure that the bits with addresses such as EA + 1 and EA + 2 cannot be inverted in the second step.

    In the second step of the bit-inversion operation, all the bits of the subvector V are inverted. To accomplish this, the address lines of the address-control unit in figure 6 are set to the starting address SA, and the set line is asserted high. The comparator of each match cell again compares its address with the address SA carried by the address lines. Consequently, only the output of the comparator of match cell SA becomes 1. Because the set line is also 1, both inputs of one of the AND gates in match cell SA become 1, thereby producing an output of 1. This leads the output of the OR gate connected to that AND gate to be 1. The output of this OR gate is connected to the AND gate which enables the bit-inverter with address SA. Because the bit-latches of the match cells with addresses SA to EA have been set to 1 in the first step, both inputs of the AND gates located between the OR gates are 1s. Therefore, the outputs of the OR gates connected to the bit-inverters SA to EA inclusive become 1 sequentially. Hence, all the bit-inverters from SA to EA are enabled. This amounts to saying that all the bits of the subvector V are inverted during the second step. At the end of the second step, the set line is made 0.
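
    Behaviourally, steps 5 and 6 both reduce to inverting the k bit-vector bits from SA to EA = SA + k - 1; a plain-C model (name ours) of what the two-step hardware circuit accomplishes is simply:

    /* Invert bits SA .. SA + k - 1: 0 -> 1 allocates (step 5),
     * 1 -> 0 deallocates (step 6). */
    static void invert_bits(unsigned char *bv, int sa, int k)
    {
        for (int a = sa; a < sa + k; ++a)
            bv[a] ^= 1;   /* one bit per memory chunk */
    }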

    Figure 6. Bit-inversion circuit.


    3.4 Memory deallocation (step 6)

    In the case of memory deallocation, the starting address SA and the size k of the block to be deallocated are given. Since the ending address EA of the block is then known, in step 6 the bit inverters invert the bits from 1 to 0 to indicate that they are free.

    4. Simulation results and VHDL synthesis

    This section consists of two parts. In the first part, a comparative performance analysis of EMA and previous memory allocation techniques is given. The second part presents how EMA is implemented in hardware using VHDL synthesis. Also in this part, a comparison of EMA and the active memory management unit (AMMU) (Agun and Chang 2001) is given using the criteria of fragmentation, execution time and chip area.

    4.1 Simulation results

    A simulator was written to compare the memory utilization of EMA with the following three memory allocators: Chang and Gehringer's memory allocator (CGMA) (Chang 1996), the complete multiple buddy system (CMBS) (Chang 1997) and the multiple buddy system (MBS) (Lo et al. 2001). As in the simulations in Lo et al. (2001), the MBS in our simulations has the size set (2^n, 3·2^n); that is, MBS is able to allocate any requested block of size 2^n or 3·2^n. It is stated in Chang (1997) that CMBS yields the best memory utilization among all previously known buddy systems, therefore CMBS is also used as a benchmark in the simulations.

    The simulator uses memory allocation/deallocation traces collected using the eight Java benchmark programs of SPECjvm98 (1998). These benchmark programs are designed to measure the performance of Java virtual machine (JVM) implementations. The Kaffe (2002) JVM, version 1.0.7, is customized to generate a trace file of memory allocation/deallocation requests during the execution of each Java program of SPECjvm98.

    As in Chang and Gehringer (1996), the performance metric used in the simulations is the highest address allocated (HAA) in the memory. The simulator takes a previously generated memory trace file and generates the highest address values for the allocation/deallocation of the requested memory blocks by each of the above four memory allocators. The simulation results in table 1 indicate that CMBS has the best memory utilization by generating the lowest HAAs, followed by EMA, CGMA and MBS, respectively.

    Also, table 1 shows that the average memory utilization of EMA is 9.2% and 13.8% better than that of CGMA and MBS, respectively. This is due to the following facts.

    - EMA is capable of allocating an available block of the requested size wherever it is in the memory.

    - In EMA, the size of the free memory block that has to be detected to satisfy a request is smaller than the ones in CGMA and MBS.

    Although this memory utilization improvement of 9.2% may not seem to be a big number, it is indeed a significant improvement, knowing that the best buddy-type memory allocator, CMBS, is shown to be only around 20% better than the modified


    buddy system (Lo et al. 2001). Moreover, the memory utilization improvement of EMA may be increased further when the circular prefix circuit is used.

    Table 2 shows the number of allocation/deallocation requests (NADR) and the average allocation/deallocation size (AADS) of each simulation program, and the percentage of memory overhead associated with EMA, CGMA (Chang and Gehringer 1996) and MBS (Lo et al. 2001) compared to CMBS (Chang 1997).

    As seen from tables 1 and 2, CMBS has the best memory utilization, whereas EMA clearly outperforms CGMA and MBS, resulting in lower HAAs for all benchmark programs. However, the implementation cost of CMBS is much higher than EMA's, because the number of nodes in CMBS's or-gate tree is O(N^2), whereas EMA's or-gate tree has only O(N log N) nodes. CGMA and MBS both have O(N) nodes in their or-gate trees. The implementation costs of the above memory allocators, in terms of the number of nodes in their or-gate trees, are presented in figure 7 for various bit-vector sizes. Figure 7 indicates that CMBS is extremely expensive to implement compared to EMA, CGMA and MBS.

    4.2 VHDL synthesis and test results

    In this section, the chip area and execution time of the proposed hardware unit are examined. Test results are obtained for different parameters, such as the bit-vector size and the maximum object size.

    Table 1. Highest Address Allocated (HAA) values in number of blocks when block size is eight.

    Trace program    CMBS      EMA       CGMA      MBS
    Check            109,895   139,239   171,769   195,625
    Compress         104,808   133,085   162,738   174,129
    Jess             355,718   405,967   457,114   533,299
    Db               192,531   224,888   260,012   273,879
    Javac            313,403   342,351   391,277   374,762
    Mpegaudio        138,698   169,043   214,083   239,170
    Mtrt             741,602   902,414   928,932   950,535
    Jack             522,589   619,558   650,956   666,731

    Table 2. Percentage overhead of memory allocators compared to CMBS, with NADR and AADS.

    Trace program    EMA overhead   CGMA overhead   MBS overhead   NADR         AADS
    Check            21.07          36.02           43.82          86,738       54
    Compress         21.25          35.60           39.81          77,233       53
    Jess             12.38          22.18           33.30          21,184,857   46
    Db               14.39          25.95           29.70          6,550,585    25
    Javac            8.46           19.90           16.37          822,007      41
    Mpegaudio        17.95          35.21           42.01          92,525       53
    Mtrt             17.82          20.17           21.98          13,843,281   32
    Jack             15.65          19.72           21.62          28,039,374   36


    The bit-vector is used to represent the status and capacity of the memory. The maximum object size is defined as the maximum number of chunks that can be allocated at once. In this application, for the different allocation requests, the bit-vector sizes range from 64 bytes up to 512 bytes, whereas the maximum allocation sizes vary from 16 to 128 blocks. For this purpose, the Xilinx ISE (2005) 6.2.03i tool is used to generate a gate-level representation of the memory allocator/deallocator hardware design. The hardware unit is implemented on a Virtex II Pro FPGA (2005) with device type XC2VP100, package type FF, 1696 pins and speed grade -5. The analysis was performed on a desktop computer with an Intel Pentium IV running at 2 GHz with 512 Mbyte of RAM.

    The results regarding the FPGA implementation of the allocator/deallocator unit are given in table 3. As seen from table 3, only 16% of the total FPGA is used for the maximum bit-vector size selected in the EMA implementation. It should be noted that the bit-vector size can be increased according to the application. These results also show that the amount of FPGA used in the implementation is only slightly affected by the maximum object size parameter.

    Table 4 shows the number of cycles needed to perform an allocation/deallocation process. Each allocation request takes 5 clock cycles if its size is a power of 2, and 6 cycles otherwise. Each deallocation request needs only 2 cycles.

    In addition, a timing analysis of the EMA hardware unit was performed, and the results in terms of the mean allocation and deallocation times are reported in figure 8. As seen from figure 8, large bit-vector sizes increase both the mean allocation and the mean deallocation delay. However, when the bit-vector length is held constant, increasing the maximum allocation size does not add any delay to the allocation and deallocation times.

    The active memory management algorithm based on the modified buddy system is implemented in a hardware unit (AMMU) and embedded into an SoC design using VHDL in Agun and Chang (2001). The total fragmentation, maximum clock frequency and number of slices used are reported in Agun (2001). The comparison of EMA and AMMU is given in table 5.

    Figure 7. The implementation costs of CMBS, EMA, CGMA and MBS in terms of the number of nodes.


    EMA reduces total fragmentation by 22%. As AMMU allocates memory blocks whose sizes are each a power of two, it suffers from both internal and external fragmentation, whereas in EMA only external fragmentation can exist. Furthermore, EMA can perform the allocation process in a shorter time, since the allocation time is proportional to the inverse of the maximum clock frequency. On the other hand, the implementation cost of EMA in terms of the number of slices used is higher than that of AMMU.

    5. Conclusions

    The paper presented a memory allocation and deallocation technique, EMA, together with its hardware design. When a free block of k chunks is requested, an or-gate-prefix circuit detects all free blocks of 2^⌊log2 k⌋ + 2^⌈log2(k - 2^⌊log2 k⌋)⌉ chunks in memory. The free

    Table 3. Memory Alloc./Dealloc. hardware impl