gpu it2fls

Speed up of Interval Type-2 Fuzzy Logic Systems based on GPU for Robot Navigation

Long Thanh Ngo, Dzung Dinh Nguyen, Long The Pham, Cuong Manh Luong

Department of Information Systems, Le Quy Don Technical University, Hanoi, Vietnam

E-mail: [email protected], [email protected], [email protected], [email protected]

Abstract—As the number of rules and the sample rate of Type-2 Fuzzy Logic Systems (T2FLS) increase, the speed of calculation becomes a problem. The T2FLS has a large degree of inherent algorithmic parallelism that modern CPU architectures do not exploit. In the T2FLS, many rules and algorithms can be sped up on a Graphics Processing Unit (GPU) as long as the majority of the computation at the various stages and components does not depend on each other. This paper demonstrates how to implement an Interval Type-2 Fuzzy Logic System (IT2FLS) on the GPU and reports experiments on the obstacle avoidance behavior of robot navigation. GPU-based calculation is a high-performance solution and frees up the CPU. The experimental results show that, for large systems, the GPU is up to about 30 times faster than the CPU.

Index Terms—Graphics Processing Units, high performance computing, Interval Type-2 Fuzzy Logic System, Type-2 Fuzzy Sets.

I. INTRODUCTION

Graphics Processing Units (GPUs) offer a new way to perform general-purpose computing on hardware that is better suited to complicated fuzzy logic systems. However, implementing these systems on GPUs is difficult because many algorithms are not designed in a parallel format conducive to GPU processing. In addition, there may be too many dependencies between the stages of an algorithm, which slows down GPU processing.

Type-2 fuzzy logic has been developed in theory and practice and has been applied successfully to real applications [10]-[19]. A review of the methods used in the design of interval type-2 fuzzy controllers is given in [6]. However, the complexity of T2FLS is still high, and much research focuses on reducing it through algorithmic or hardware approaches. Some proposals for implementing type-2 FLS focus on the design and software development of a high-speed defuzzification stage based on the average of two type-1 FLS [7], or on the optimization of an incremental fuzzy PD controller based on a genetic algorithm [8]. In more recent work, an interval type-2 FIS with Karnik-Mendel type reduction is designed, tested and implemented in hardware [5]. Using GPUs for general-purpose computing has recently been reported in many studies as a way to speed up complicated algorithms by parallelizing them for a suitable GPU architecture, especially for applications of fuzzy logic. Anderson et al. [1] presented a GPU solution for Fuzzy C-Means (FCM). This solution used OpenGL and Cg to achieve a computational speed-up of approximately two orders of magnitude for some clustering profiles on an nVIDIA 8800 GPU. They later generalized the system to non-Euclidean metrics [2]. Further, Sejun Kim [22] describes a method for adapting a multi-layer tree structure composed of fuzzy adaptive units to the CUDA platform. Iurie Chiosa et al. [23] present a framework for mesh clustering implemented solely on the GPU with a new generic multi-level clustering technique. Chia-Feng [25] proposes the implementation of a zero-order TSK fuzzy neural network (FNN) on GPUs to reduce training time. Harvey et al. [4] present a GPU solution for fuzzy inference. Anderson et al. [3] present a parallel implementation of fuzzy inference on a GPU using CUDA. Again, a speed improvement of over two orders of magnitude can be achieved for this naturally parallel algorithm under particular inference profiles. One problem with this system, as well as with the FCM GPU implementation, is that both rely on OpenGL and Cg (graphics libraries), which makes the system and its generalization difficult for newcomers to GPU programming.

Therefore, we analyzed fuzzy logic systems in order to take advantage of the processing capabilities of GPUs. The algorithm must be altered in order to run fast on a GPU. In this paper, we explore the use of nVIDIA's Compute Unified Device Architecture (CUDA) for the implementation of an Interval Type-2 Fuzzy Logic System (IT2FLS). CUDA exposes the functionality of the GPU in C/C++, a language that most programmers are familiar with, so it can be integrated more easily into applications that otherwise have no need to interface with a graphics API. Experiments are carried out on the obstacle avoidance behavior of robot navigation on an nVIDIA platform, with summarized reports on run time.

The paper is organized as follows: Section II presents an overview of GPUs and CUDA; Section III introduces Interval Type-2 Fuzzy Logic Systems; Section IV proposes a speed-up of the IT2FLS using the GPU and CUDA; Section V presents experimental results of the IT2FLS implemented on the GPU in comparison with the CPU; Section VI gives the conclusion and future work.

II. GRAPHICS PROCESSOR UNITS AND CUDA

Traditionally, graphics operations such as mathematical transformations between coordinate spaces, rasterization, and shading were performed on the CPU. GPUs were invented in order to offload these specialized procedures to advanced hardware better suited to the task at hand.

Because of the popularity of gaming, movies and computer-aided design, these devices are advancing at an impressive rate. Classically, before the advent of CUDA, general-purpose programming on a GPU (GPGPU) was performed by translating a computational procedure into a graphics format that could be executed in the standard graphics pipeline. This refers to the process of encoding data into a texture format, identifying sampling procedures to access this data, and converting the algorithms into a process that utilized rasterization (the mapping of array indices to graphics fragments) and Frame Buffer Objects (FBO) for multi-pass rendering. GPUs are specialised stream processing devices.

This processing model takes batches of elements and applies a similar, independent calculation to all elements in parallel. Each calculation is performed by a program, typically called a kernel. GPUs are growing at a faster rate than CPUs, and their architecture and stream processing design make them a natural choice for many algorithms that can be parallelised, such as computational intelligence algorithms.

nVIDIA's CUDA is a data-parallel computing environment that does not require the use of a graphics API, such as OpenGL, or a shader language. CUDA applications are created using the C/C++ language. The CPU and GPU programs are developed in the same environment (i.e., a single C/C++ program), and the GPU code is later translated from C/C++ to instructions to be executed by the GPU. nVIDIA has even gone as far as providing a CUDA Matlab plug-in. A C/C++ program using CUDA can interface with one GPU, or multiple GPUs can be identified and used in parallel, allowing for unprecedented processing power on a desktop or workstation.

CUDA allows multiple kernels to be run simultaneously on a single GPU. CUDA refers to each kernel as a grid. A grid is a collection of blocks. Each block runs the same kernel but is independent of the others (this has significance in terms of access to memory types). A block contains threads, which are the smallest divisible unit on a GPU. This architecture is shown in Fig. 1.

The next critical component of a CUDA application is the memory model. There are multiple types of memory, and each has a different access time. GPU memory is divided into read-write per-thread registers, read-write per-thread local memory, read-write per-block shared memory, read-write per-grid global memory, read-only per-grid constant memory, and read-only per-grid texture memory. This model is shown in Fig. 2.

Texture and constant memory have relatively small access latency, while global memory has the largest access latency. Applications should minimize the number of global memory reads and writes. This is typically achieved by having each thread read its data from global memory and store it in shared memory (a block-level memory structure with smaller access latency than global memory). Threads in a block synchronize after this step. Memory is allocated on the GPU using a mechanism similar to malloc in C, through the functions cudaMalloc and cudaMallocArray. GPU functions that can be called by the host (the CPU) are prefixed with the symbol __global__, GPU functions that can only be called by the GPU are prefixed with __device__, and standard functions that are callable from the CPU and executed on the CPU are prefixed with __host__ (or the symbol can be omitted, as it is the default). GPU functions can take parameters, as in C. When the CPU needs to pass only a few variables to the GPU, parameters are a good choice. Otherwise, such as in the case of large arrays, the data should be stored in global, constant, or texture memory, and a pointer to this memory is passed to the GPU function. Whenever possible, data should be kept on the GPU and not transferred back and forth to the CPU.

Fig. 1. CUDA processing model design [26]

Fig. 2. CUDA GPU memory model design [26]

III. INTERVAL TYPE-2 FUZZY LOGIC SYSTEMS

A. Type-2 Fuzzy Sets

A type-2 fuzzy set in $X$ is denoted $\tilde{A}$, and its membership grade of $x \in X$ is $\mu_{\tilde{A}}(x, u)$, $u \in J_x \subseteq [0, 1]$, which is a type-1 fuzzy set in $[0, 1]$. The elements of the domain of $\mu_{\tilde{A}}(x, u)$ are called primary memberships of $x$ in $\tilde{A}$, and the memberships of the primary memberships in $\mu_{\tilde{A}}(x, u)$ are called secondary memberships of $x$ in $\tilde{A}$.

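Definition 3.1: A type-2 fuzzy set, denoted $\tilde{A}$, is characterized by a type-2 membership function $\mu_{\tilde{A}}(x, u)$ where $x \in X$ and $u \in J_x \subseteq [0, 1]$, i.e.,

$$\tilde{A} = \{((x, u), \mu_{\tilde{A}}(x, u)) \mid \forall x \in X, \forall u \in J_x \subseteq [0, 1]\} \qquad (1)$$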

or

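$$\tilde{A} = \int_{x \in X} \int_{u \in J_x} \mu_{\tilde{A}}(x, u) / (x, u), \quad J_x \subseteq [0, 1] \qquad (2)$$

in which $0 \le \mu_{\tilde{A}}(x, u) \le 1$.

At each value of $x$, say $x = x'$, the 2-D plane whose axes are $u$ and $\mu_{\tilde{A}}(x', u)$ is called a vertical slice of $\mu_{\tilde{A}}(x, u)$. A secondary membership function is a vertical slice of $\mu_{\tilde{A}}(x, u)$. It is $\mu_{\tilde{A}}(x = x', u)$ for $x' \in X$ and $\forall u \in J_{x'} \subseteq [0, 1]$, i.e.,

$$\mu_{\tilde{A}}(x = x', u) \equiv \mu_{\tilde{A}}(x') = \int_{u \in J_{x'}} f_{x'}(u) / u, \quad J_{x'} \subseteq [0, 1] \qquad (3)$$

in which $0 \le f_{x'}(u) \le 1$.

In terms of embedded fuzzy sets, a type-2 fuzzy set [10] is the union of its type-2 embedded sets, i.e.,

$$\tilde{A} = \sum_{j=1}^{n} \tilde{A}_e^j \qquad (4)$$

where $n \equiv \prod_{i=1}^{N} M_i$ and $\tilde{A}_e^j$ denotes the $j$th type-2 embedded set of $\tilde{A}$, i.e.,

$$\tilde{A}_e^j \equiv \{(u_i^j, f_{x_i}(u_i^j)),\ i = 1, 2, \ldots, N\} \qquad (5)$$

where $u_i^j \in \{u_{ik},\ k = 1, \ldots, M_i\}$.

A type-2 fuzzy set is called an interval type-2 fuzzy set if the secondary membership function $f_{x'}(u) = 1$ $\forall u \in J_x$, i.e., an interval type-2 fuzzy set is defined as follows: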

Definition 3.2: An interval type-2 fuzzy set $\tilde{A}$ is characterized by an interval type-2 membership function $\mu_{\tilde{A}}(x, u) = 1$ where $x \in X$ and $u \in J_x \subseteq [0, 1]$, i.e.,

$$\tilde{A} = \{((x, u), 1) \mid \forall x \in X, \forall u \in J_x \subseteq [0, 1]\} \qquad (6)$$

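The uncertainty of $\tilde{A}$ is described by its footprint of uncertainty (FOU), which is the union of all primary memberships, i.e., $FOU(\tilde{A}) = \bigcup_{x \in X} J_x$. The upper and lower membership functions (UMF/LMF) of $\tilde{A}$, denoted $\overline{\mu}_{\tilde{A}}(x)$ and $\underline{\mu}_{\tilde{A}}(x)$, are two type-1 membership functions that bound the FOU.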
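As an illustration of how the UMF and LMF bound the FOU, the following C++ sketch evaluates the membership interval of an interval type-2 set at a point x. The trapezoidal shape, the `Trapezoid`, `Interval` and `IT2Set` types and all parameter names are assumptions made for this example and are not taken from the implementation described in this paper.

```cpp
#include <algorithm>
#include <cassert>

// Type-1 trapezoid with corners a <= b <= c <= d and plateau height h
// (hypothetical parameter layout chosen for this sketch).
struct Trapezoid {
    double a, b, c, d, h;
    double operator()(double x) const {
        assert(a <= b && b <= c && c <= d);
        if (x <= a || x >= d) return 0.0;          // outside the support
        if (x < b) return h * (x - a) / (b - a);   // rising edge
        if (x <= c) return h;                      // plateau
        return h * (d - x) / (d - c);              // falling edge
    }
};

// Membership interval [lower, upper] of x, i.e. the vertical extent of the FOU at x.
struct Interval {
    double lower, upper;
};

// Interval type-2 set described only by its two bounding type-1 functions.
struct IT2Set {
    Trapezoid umf;  // upper membership function
    Trapezoid lmf;  // lower membership function
    Interval grade(double x) const {
        const double up = umf(x);
        const double lo = std::min(lmf(x), up);  // keep the LMF below the UMF
        return {lo, up};
    }
};
```

For instance, with a UMF of corners (0, 2, 4, 6) and height 1 and an LMF of corners (1, 2.5, 3.5, 5) and height 0.8, `grade(3.0)` gives the interval [0.8, 1.0] and `grade(1.0)` gives [0, 0.5].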

Fig. 3. Diagram of type 2 fuzzy logic system [13].

B. Interval Type-2 Fuzzy Logic Systems (IT2FLS)

The general type-2 fuzzy logic system is shown in Fig. 3. The output block of a type-2 fuzzy logic system consists of two blocks: the type-reducer and the defuzzifier. The type-reducer maps a type-2 fuzzy set to a type-1 fuzzy set, and the defuzzifier maps a fuzzy set to a crisp value. The membership function of an interval type-2 fuzzy set is described by its FOU, which is bounded by two type-1 membership functions, the UMF and the LMF.

Fig. 4. The membership function of an interval type 2 fuzzy set [10].

The combination of antecedents in a rule of the IT2FLS is called the firing strength process and is shown in Fig. 5.

Fig. 5. The combination of antecedents in a rule for IT2FLS [20].

In the IT2FLS, the calculation involves five steps to obtain the output: fuzzification, combining the antecedents (applying fuzzy operators), implication, aggregation, and defuzzification.

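Because each pattern has a membership interval bounded by the upper membership $\overline{\mu}_{\tilde{A}}(x)$ and the lower membership $\underline{\mu}_{\tilde{A}}(x)$, each centroid of a cluster is represented by the interval between $c_L$ and $c_R$. The following iterative algorithm finds $c_L$ and $c_R$.

Step 1: Calculate $\theta_i$ by the following equation:

$$\theta_i = \frac{1}{2}\left[\overline{\mu}(x_i) + \underline{\mu}(x_i)\right] \qquad (7)$$

Step 2: Calculate $c'$:

$$c' = c(\theta_1, \theta_2, \ldots, \theta_N) = \frac{\sum_{i=1}^{N} x_i \theta_i}{\sum_{i=1}^{N} \theta_i} \qquad (8)$$

Step 3: Find $k$ such that $x_k \le c' \le x_{k+1}$.

Step 4: Calculate $c''$ by the following equations. In case $c''$ is used for finding $c_L$:

$$c'' = \frac{\sum_{i=1}^{k} x_i \overline{\mu}(x_i) + \sum_{i=k+1}^{N} x_i \underline{\mu}(x_i)}{\sum_{i=1}^{k} \overline{\mu}(x_i) + \sum_{i=k+1}^{N} \underline{\mu}(x_i)} \qquad (9)$$

In case $c''$ is used for finding $c_R$:

$$c'' = \frac{\sum_{i=1}^{k} x_i \underline{\mu}(x_i) + \sum_{i=k+1}^{N} x_i \overline{\mu}(x_i)}{\sum_{i=1}^{k} \underline{\mu}(x_i) + \sum_{i=k+1}^{N} \overline{\mu}(x_i)} \qquad (10)$$

Step 5: If $c' = c''$, go to Step 6; otherwise set $c' = c''$ and go back to Step 3.

Step 6: Set $c_L = c'$ or $c_R = c'$.

Finally, compute the mean of the centroid, $y$, as:

$$y = \frac{c_R + c_L}{2} \qquad (11)$$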
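The six steps and the final averaging can be sketched in C++ as follows. This is a minimal CPU reference under stated assumptions, not the code used in this paper: the function names `kmEndpoint` and `defuzzify`, the tolerance `eps` used in place of the exact equality test of Step 5, the iteration cap `maxIter`, and the fallback for an all-zero set are all hypothetical. The samples x are assumed to be sorted in ascending order.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Returns cL (left == true) or cR (left == false) for samples x[0..N-1]
// with lower/upper membership grades given per sample.
double kmEndpoint(const std::vector<double>& x,
                  const std::vector<double>& lower,
                  const std::vector<double>& upper,
                  bool left, double eps = 1e-9, int maxIter = 100) {
    const std::size_t n = x.size();
    assert(n > 0 && lower.size() == n && upper.size() == n);

    // Steps 1-2: initial centroid c' from theta_i = (upper_i + lower_i) / 2.
    double num = 0.0, den = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double theta = 0.5 * (upper[i] + lower[i]);
        num += x[i] * theta;
        den += theta;
    }
    if (den == 0.0) return 0.5 * (x.front() + x.back());  // assumption: fallback for an all-zero set
    double c = num / den;

    for (int it = 0; it < maxIter; ++it) {
        // Steps 3-4: the samples with x[i] <= c' are those with index i <= k.
        num = den = 0.0;
        for (std::size_t i = 0; i < n; ++i) {
            const bool leftOfSwitch = x[i] <= c;
            // cL: upper grades up to k, lower grades after k; cR: the opposite.
            const double w = (leftOfSwitch == left) ? upper[i] : lower[i];
            num += x[i] * w;
            den += w;
        }
        if (den == 0.0) break;
        const double cNew = num / den;
        const bool converged = std::fabs(cNew - c) < eps;  // Step 5, with a tolerance
        c = cNew;
        if (converged) break;
    }
    return c;  // Step 6
}

// Crisp output as the midpoint of the interval [cL, cR].
double defuzzify(const std::vector<double>& x,
                 const std::vector<double>& lower,
                 const std::vector<double>& upper) {
    return 0.5 * (kmEndpoint(x, lower, upper, true) +
                  kmEndpoint(x, lower, upper, false));
}
```

For five samples x = 0, 1, 2, 3, 4 with a lower grade of 0.5 and an upper grade of 1 everywhere, the sketch converges after two passes to cL = 11/7 and cR = 17/7, so y = 2.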

IV. SPEED-UP OF IT2FLS USING GPU AND CUDA

The first step in implementing an IT2FLS on the GPU is the selection of memory types and sizes. This is a critical step, because the choice of format and type dictates performance. Memory should be allocated such that read and write access is as sequential as the algorithm permits.

Let the number of inputs be N, the number of parameters that define a membership function be P, the number of rules be R and the discretization rate be S. Inputs are stored on the GPU as a one-dimensional array of size N.

Fig. 6. Input Vector.

The consequents are stored on the CPU as a two-dimensional array of size R × P. They are used only on the CPU when calculating the discrete fuzzy set membership values.

Fig. 7. Consequent.

Fig. 8. Antecedent.

The antecedents are a two-dimensional array on the GPU of size R × (N × P).

The fired antecedents are a one-dimensional array of size R on the GPU, which stores the result of combining the antecedents of each rule. The last memory layout is the discretized consequent, which is an S × R matrix created on the GPU.
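The array sizes above translate into simple flat-index arithmetic. The sketch below assumes row-major storage, which is not stated in the text, and the `Layout` type with its member names is hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Flat offsets into the arrays described above, for N inputs, P parameters
// per membership function, R rules and S discretization samples.
struct Layout {
    std::size_t N, P, R, S;

    // Antecedents: R x (N * P); parameter p of input n in rule r.
    std::size_t antecedent(std::size_t r, std::size_t n, std::size_t p) const {
        assert(r < R && n < N && p < P);
        return r * (N * P) + n * P + p;
    }

    // Discretized consequent: S x R; sample s of rule r.
    std::size_t consequent(std::size_t s, std::size_t r) const {
        assert(s < S && r < R);
        return s * R + r;
    }
};
```

With N = 2, P = 4, R = 24 and S = 256, for example, the antecedent array holds 192 values and the discretized consequent holds 6144 values. With this ordering, the values that belong to the same sample point lie next to each other, which is one way to keep the aggregation over rules sequential in memory.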

Fig. 9. Discretized Consequent

The inputs and antecedents are stored in texture memory because they do not change during the firing of an FLS, but they may change between consecutive firings and then need to be updated. We propose the GPU program flow for a CUDA application computing an IT2FLS in Fig. 10.

In the IT2FLS, we have to calculate two values for the two membership functions, the UMF and the LMF. The first step is a kernel that fuzzifies the inputs and combines the antecedents. The next steps are implication and a process that is responsible for aggregating the rule outputs. The last GPU kernel is the defuzzification step.

Fig. 10. IT2FLS Diagram for a CUDA application on the GPU.

The first kernel reads from the input and antecedent textures and stores its results in the fired antecedents section of global memory. All inputs are sampled for each rule: the rth rule samples the rth row of the antecedent memory, the membership values are calculated, and the minimum over the antecedents is computed and stored in the rth entry of the fired antecedents memory region. This kernel uses B blocks, partly because there is a limit on the number of threads that can be created per block (the current maximum is 512 threads). One must also consider the number of threads together with the amount of register and local memory required by a kernel, in order to avoid memory overflow. This information is available for each GPU. We limited the number of threads per block to 128 (an empirical value found by trying different block and thread profiles for a system that has two inputs and trapezoidal membership functions). In general, a kernel should fetch a small number of data points and have high arithmetic intensity. This is why only a few memory fetches are made per thread and why the membership calculation and the combination step are performed in a single kernel.
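The work of this first kernel can be emulated on the CPU with the following C++ sketch, in which the two nested loops stand in for the CUDA blocks and threads and one "thread" owns one rule. It covers a single bound only, so it would be run once with the UMF parameters and once with the LMF parameters. The function names, the four-corner trapezoid with unit height and the row-major antecedent layout are assumptions of this sketch; only the limit of 128 threads per block and the use of the minimum come from the text.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kThreadsPerBlock = 128;  // value reported in the text
constexpr std::size_t kParams = 4;             // assumption: trapezoid corners a, b, c, d

// Number of blocks B needed so that every rule gets one thread.
std::size_t blockCount(std::size_t rules) {
    return (rules + kThreadsPerBlock - 1) / kThreadsPerBlock;
}

// Unit-height trapezoid given by four corners p[0] <= p[1] <= p[2] <= p[3].
float trapezoid(const float* p, float x) {
    if (x <= p[0] || x >= p[3]) return 0.0f;
    if (x < p[1]) return (x - p[0]) / (p[1] - p[0]);
    if (x <= p[2]) return 1.0f;
    return (p[3] - x) / (p[3] - p[2]);
}

// Body executed by the thread that owns rule r: fuzzify every input
// and combine the antecedents with the minimum.
float fireRule(const std::vector<float>& inputs,
               const std::vector<float>& antecedents, std::size_t r) {
    float strength = 1.0f;
    for (std::size_t i = 0; i < inputs.size(); ++i) {
        const float* p = &antecedents[(r * inputs.size() + i) * kParams];
        strength = std::min(strength, trapezoid(p, inputs[i]));
    }
    return strength;
}

// Emulates the launch of B blocks with kThreadsPerBlock threads each.
std::vector<float> fireAll(const std::vector<float>& inputs,
                           const std::vector<float>& antecedents,
                           std::size_t rules) {
    assert(antecedents.size() == rules * inputs.size() * kParams);
    std::vector<float> fired(rules, 0.0f);
    for (std::size_t block = 0; block < blockCount(rules); ++block) {
        for (std::size_t thread = 0; thread < kThreadsPerBlock; ++thread) {
            const std::size_t r = block * kThreadsPerBlock + thread;
            if (r < rules) fired[r] = fireRule(inputs, antecedents, r);  // guard the last block
        }
    }
    return fired;
}
```

Under these assumptions, 512 rules need 4 blocks, and a rule base of 24 rules fits into a single block.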

The next steps are the implication and rule aggregation kernels. At first, one might imagine that using two kernels to calculate the implication results and the rule aggregation would be desirable. However, the implication kernel, which simply calculates the minimum between the respective combined antecedent result and the discretized consequent, is inefficient. As stated above, the ratio of arithmetic operations to memory operations is important: we want more arithmetic intensity than memory access in a kernel. To minimize the number of global memory samples, we perform implication in the first step of the reduction. Reduction, in this context, is the repeated application of an operation to a series of elements to produce a single scalar result. In the case of rule aggregation, this is the application of the maximum operator over each discrete consequent sample point for each rule. The advantage of GPU reduction is that it uses the parallel processing units to perform a divide-and-conquer strategy. The last step is the defuzzification kernel. As described above, reduction is used for both rule output aggregation and defuzzification in the IT2FLS on the GPU.
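The fusion of implication into the first reduction step can be sketched on the CPU as below. For each sample point, the minimum with the fired strength is taken while the working buffer is filled, and the maximum over rules is then found by a pairwise (tree) reduction of the kind a GPU block would run in parallel. The function name `aggregate`, the S × R row-major layout and the sequential emulation of the tree are assumptions of this sketch; it handles one bound (UMF or LMF) per call.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// fired:      R combined antecedent results (one per rule)
// consequent: S x R discretized consequent, row-major (sample s, rule r)
// returns:    S aggregated membership values, one per sample point
std::vector<float> aggregate(const std::vector<float>& fired,
                             const std::vector<float>& consequent,
                             std::size_t samples) {
    const std::size_t rules = fired.size();
    assert(rules > 0 && consequent.size() == samples * rules);
    std::vector<float> out(samples);
    std::vector<float> work(rules);
    for (std::size_t s = 0; s < samples; ++s) {
        // Implication (min) folded into the load step of the reduction.
        for (std::size_t r = 0; r < rules; ++r)
            work[r] = std::min(fired[r], consequent[s * rules + r]);
        // Pairwise max-reduction: each pass halves the number of active elements.
        for (std::size_t active = rules; active > 1;) {
            const std::size_t half = (active + 1) / 2;
            for (std::size_t i = 0; i + half < active; ++i)
                work[i] = std::max(work[i], work[i + half]);
            active = half;
        }
        out[s] = work[0];
    }
    return out;
}
```

On a GPU, the inner loop of each pass would be executed by separate threads with a synchronization between passes, so R values are reduced in about log2(R) steps.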

The output of rule output aggregation is two rows in the discretized consequent global memory array. The defuzzification step is done by the Karnik-Mendel algorithm, which takes two inputs, Rule Combine UMF and Rule Combine LMF, and produces two outputs, yl and yr. The crisp output y is calculated by the formula y = (yl + yr)/2.

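The steps for finding $y_l$ and $y_r$ on the GPU are as follows (notation: Rule Combine UMF(i) $= \overline{\mu}_{\tilde{A}}(x_i)$, Rule Combine LMF(i) $= \underline{\mu}_{\tilde{A}}(x_i)$, $N$ = sample rate).

Step 1: Calculate $\theta_i$ on the GPU by the following equation:

$$\theta_i = \frac{1}{2}\left[\overline{\mu}_{\tilde{A}}(x_i) + \underline{\mu}_{\tilde{A}}(x_i)\right] \qquad (12)$$

Step 2: Calculate $c'$ on the GPU:

$$c' = c(\theta_1, \theta_2, \ldots, \theta_N) = \frac{\sum_{i=1}^{N} x_i \theta_i}{\sum_{i=1}^{N} \theta_i} \qquad (13)$$

Next, copy $c'$ to host memory.

Step 3: Find $k$ such that $x_k \le c' \le x_{k+1}$ (calculated on the CPU).

Step 4: Calculate $c''$ on the GPU by the following equations. In case $c''$ is used for finding $y_l$:

$$c'' = \frac{\sum_{i=1}^{k} x_i \overline{\mu}_{\tilde{A}}(x_i) + \sum_{i=k+1}^{N} x_i \underline{\mu}_{\tilde{A}}(x_i)}{\sum_{i=1}^{k} \overline{\mu}_{\tilde{A}}(x_i) + \sum_{i=k+1}^{N} \underline{\mu}_{\tilde{A}}(x_i)} \qquad (14)$$

In case $c''$ is used for finding $y_r$:

$$c'' = \frac{\sum_{i=1}^{k} x_i \underline{\mu}_{\tilde{A}}(x_i) + \sum_{i=k+1}^{N} x_i \overline{\mu}_{\tilde{A}}(x_i)}{\sum_{i=1}^{k} \underline{\mu}_{\tilde{A}}(x_i) + \sum_{i=k+1}^{N} \overline{\mu}_{\tilde{A}}(x_i)} \qquad (15)$$

Next, copy $c''$ to host memory.

Step 5: If $c' = c''$, go to Step 6; otherwise set $c' = c''$ and go back to Step 3 (calculated on the CPU).

Step 6: Set $y_l = c'$ or $y_r = c'$.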

V. EXPERIMENTS

A. Problems

We implement an IT2FLS for the collision avoidance behavior of robot navigation. The fuzzy logic system has two inputs, the extended fuzzy directional relation (FDR) [21] and the range to the obstacle, and one output, the angle of deviation (AoD). The fuzzy rules have the following form:

IF FDR is Ai AND Range is Bi THEN AoD is Ci

where Ai, Bi and Ci are type-2 fuzzy sets of the antecedents and the consequent, respectively.

Fig. 11. Membership grades of FDR.

Fig. 12. Membership grades of Range.

Fig. 13. Membership grades of AoD.

The fuzzy directional relation has six linguistic values (NLarge, NMedium, NSmall, PSmall, PMedium and PLarge). The range from the robot to the obstacle is divided into four subsets: VNear, Near, Medium and Far. The output of the fuzzy if-then rules is a linguistic variable representing the angle of deviation. It has the same six linguistic values as the fuzzy directional relation but with different membership functions. The linguistic values are interval type-2 fuzzy subsets whose membership functions are shown in Figs. 11, 12 and 13. The rule base consists of 24 rules, given in Table I.

TABLE I
THE RULE BASE OF COLLISION AVOIDANCE BEHAVIOR

FDR   Range   AoD     FDR   Range   AoD
NS    VN      PL      PS    VN      NL
NS    N       PL      PS    N       NL
NS    M       PM      PS    M       NM
NS    F       PS      PS    F       NS
NM    VN      PM      PM    VN      NM
NM    N       PM      PM    N       NM
NM    M       PM      PM    M       NM
NM    F       PS      PM    F       NS
NL    VN      PM      PL    VN      NM
NL    N       PM      PL    N       NM
NL    M       PS      PL    M       NS
NL    F       PS      PL    F       NS

B. Experiments

The performance of the GPU implementation of the IT2FLS was compared with that of a CPU implementation. The program was written in C/C++ as a console application and built with Microsoft Visual Studio 2008. It was run on a computer with the 32-bit Windows 7 operating system and nVIDIA CUDA support, with the following specifications:

The CPU was a Core i3-2310M at 2.1 GHz, and the system had 2 GB of system RAM (DDR3).

The GPU was an nVIDIA GeForce GT 540M graphics card with 96 CUDA cores, 1 GB of texture memory and PCI Express x16.

The number of inputs was fixed to 2, the number of rules was varied between 32, 64, 128, 256 and 512, and the sample rate was varied between 128, 256, 512, 1024, 2048, 4096 and 8192 (the values listed in Table II).

We take the ratio of CPU to GPU run time. A value below 1 indicates that the CPU performs better, and a value above 1 indicates that the GPU performs better. The CPU/GPU performance ratios for the IT2FLS are given in Table II, and the run-time graph of the implementation is shown in Fig. 14.
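As a worked reading of the ratio, assume it is the CPU run time divided by the GPU run time for the same number of rules R and sample rate S. The symbols $t_{CPU}$ and $t_{GPU}$ are introduced here for illustration and do not appear elsewhere in the paper. The two corner entries of Table II then read:

```latex
\[
\rho(R, S) = \frac{t_{CPU}(R, S)}{t_{GPU}(R, S)}
\]
\[
\rho(512, 8192) = 29.3 \;\Rightarrow\; t_{GPU} \approx \frac{t_{CPU}}{29.3} \approx 0.034\, t_{CPU},
\qquad
\rho(32, 128) = 0.1 \;\Rightarrow\; t_{GPU} = 10\, t_{CPU}
\]
```

Under this reading, the largest configuration runs in roughly 3.4% of the CPU time on the GPU, while the smallest configuration is ten times slower on the GPU than on the CPU.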

TABLE II
CPU/GPU PERFORMANCE RATIO

R \ S   128    256    512    1024    2048    4096    8192
32      0.1    0.24   0.41   0.76    1.47    2.64    3.22
64      0.21   0.39   0.73   1.58    2.7     4.69    8.99
128     0.32   0.76   1.25   3.07    3.95    9.01    18.26
256     0.72   1.19   2.95   5.35    10.7    17.56   21.3
512     0.77   2.43   3.32   12.57   18.43   20.51   29.3

VI. CONCLUSION

As demonstrated in this paper, an interval type-2 FLS can be implemented on a GPU without the use of a graphics API, so the implementation can be used by any researcher with knowledge of C/C++. We have demonstrated that the CPU outperforms the GPU for small systems. As the number of rules and the sample rate grow, the GPU outperforms the CPU. There is a switch point in the performance ratio matrix (Table II) that indicates when the GPU becomes more efficient than the CPU. In the case where the sample rate is 8192 and the number of rules is 512, the GPU runs approximately 30 times faster than the CPU on the test computer.

Future work will extend the interval type-2 FLS to the generalised type-2 FLS and apply it to various applications.

ACKNOWLEDGMENT

This paper is sponsored by the Intelligent Robot Project at LQDTU and the Research Fund RFit@LQDTU, Faculty of Information Technology, Le Quy Don University.

REFERENCES

[1] Anderson, D., Luke, R., Keller, J., "Speedup of Fuzzy Clustering Through Stream Processing on Graphics Processing Units", IEEE Trans. on Fuzzy Systems, Vol. 16:4, pp. 1101-1106, 2007.

[2] Anderson, D., Luke, R., Keller, J., "Incorporation of Non-Euclidean Distance Metrics into Fuzzy Clustering on Graphics Processing Units", Proc. IFSA, 41, pp. 128-139, 2007.

[3] Anderson, Parallelisation of Fuzzy Inference on a Graphics Processor Unit Using the Compute Unified Device Architecture, The 2008 UK Workshop on Computational Intelligence, UKCI 2008, pp. 1-6, 2008.

Fig. 14. Run-time graph of the problem implementation.

[4] Harvey, N., Luke, R., Keller, J., Anderson, D., "Speedup of Fuzzy Logic through Stream Processing on Graphics Processing Units", Proc. IEEE Congress on Evolutionary Computation, 2008.

[5] Roberto Sepúlveda, Oscar Montiel, Oscar Castillo, Patricia Melin: Embedding a high speed interval type-2 fuzzy controller for a real plant into an FPGA. Appl. Soft Comput. 12(3): 988-998 (2012).

[6] Oscar Castillo, Patricia Melin: A review on the design and optimization of interval type-2 fuzzy controllers. Appl. Soft Comput. 12(4): 1267-1278 (2012).

[7] Roberto Sepúlveda, Oscar Montiel, Oscar Castillo, Patricia Melin: Modelling and Simulation of the defuzzification Stage of a Type-2 Fuzzy Controller Using VHDL Code. Control and Intelligent Systems 39(1): (2011).

[8] Yazmín Maldonado, Oscar Castillo, Patricia Melin: Optimization of Membership Functions for an Incremental Fuzzy PD Control Based on Genetic Algorithms. Soft Computing for Intelligent Control and Mobile Robotics 2011: 195-211.

[9] J. D. Owens, et al., A Survey of General-Purpose Computation on Graphics Hardware, Eurographics 2005, State of the Art Reports, August, 2005.

[10] N. Karnik, J.M. Mendel and Liang Q., "Type-2 Fuzzy Logic Systems", IEEE Trans. on Fuzzy Systems, 7(6), pp. 643-658, 1999.

[11] N. Karnik and J.M. Mendel, "Centroid of a Type-2 Fuzzy Set", Information Sciences, 132, pp. 195-220, 2001.

[12] Liang Q. and J.M. Mendel, "Interval Type-2 Fuzzy Logic Systems: Theory and Design", IEEE Trans. on Fuzzy Systems, 8(5), pp. 535-550, 2000.

[13] J.M. Mendel, John R. I., Feilong Liu, "Interval Type-2 Fuzzy Logic Systems Made Simple", IEEE Trans. on Fuzzy Systems, 14(6), pp. 808-821, 2006.

[14] Feilong Liu, An efficient centroid type-reduction strategy for general type-2 fuzzy logic system, Information Sciences, Vol. 178(9), pp. 2224-2236, 2008.

[15] Long Thanh Ngo, Long The Pham, Phuong Hoang Nguyen, K. Hirota, On approximate representation of type-2 fuzzy sets using triangulated irregular network, Foundations of Fuzzy Logic and Soft Computing, Lecture Notes in Computer Science, LNCS 4529, Springer, pp. 584-593, 2007.

[16] Long Thanh Ngo, Long The Pham, Phuong Hoang Nguyen, K. Hirota, Refinement geometric algorithms for type-2 fuzzy set operations, Proceedings, IEEE-FUZZ 09, pp. 866-871, 2009.

[17] Long Thanh Ngo, Refinement CTIN for General Type-2 Fuzzy Logic Systems, Proceedings, IEEE-FUZZ 2011, pp. 1225-1232, 2011.

[18] Janusz T. Starczewski, Efficient triangular type-2 fuzzy logic systems, International Journal of Approximate Reasoning, Vol. 12, No. 5, pp. 799-811, 2009.

[19] H. Hagras, A Hierarchical Type-2 Fuzzy Logic Control Architecture for Autonomous Mobile Robots, IEEE Trans. on Fuzzy Systems, Vol. 12, No. 4, pp. 524-539, 2004.

[20] R. Imam, B. Kharisma, Design of Interval Type-2 Fuzzy Logic Based Power System Stabilizer, International Journal of Electrical and Electronics Engineering, Vol. 3, No. 10, pp. 593-600, 2009.

[21] L. T. Ngo, L. T. Pham, P. H. Nguyen, Extending fuzzy directional relationship and applying for mobile robot collision avoidance behavior, International Journal of Advanced Computational Intelligence & Intelligent Informatics, Vol. 10, No. 4, 2006.

[22] Sejun K., Donald C., "A GPU based Parallel Hierarchical Fuzzy ART Clustering", Proceedings of International Joint Conference on Neural Networks, vol. 1, pp. 2778-2782, 2011.

[23] Iurie, C., Andreas, K., "GPU-Based Multi level Clustering", Visualization and Computer Graphics, IEEE Transactions on, vol. 17, no. 2, pp. 132-145, February 2011.

[24] Sharanyan S., Vasiya K., Arvind K., "Online Accelarated Implementation of the Fuzzy C-means algorithm with the use of the GPU platform", ICCCT, 2011.

[25] Chia-F., Teng-C., Wei-Y, "Speedup of Implementing Fuzzy Neural Networks With High-Dimensional Inputs Through Parallel Processing on Graphic Processing Units", Fuzzy Systems, IEEE Transactions on, vol. 19, no. 4, pp. 717-728, August 2011.

[26] Harris, M., Optimizing Parallel Reduction in CUDA, NVIDIA Whitepaper, March 18, 2008, http://www.nvidia.com/object/cuda/sample/dat/a− parallel.html

[27] NVIDIA CUDA Programming Guide, http://www.nvidia.com/
