
On the Performance of Parametric Polymorphism in Maple

Laurentiu Dragan    Stephen M. Watt

Ontario Research Centre for Computer Algebra
University of Western Ontario

Maple Conference 2006

Outline

Parametric Polymorphism
SciMark
SciGMark
A Maple Version of SciGMark
Results
Conclusions

Parametric Polymorphism

Type polymorphism – allows a single definition of a function to be used with different types of data

Parametric polymorphism
– A form of polymorphism where the code does not use any specific type information
– Instances are created with type parameters

Increasing popularity – C++, C#, Java
Code reusability and reliability
Generic libraries – STL, Boost, NTL, LinBox, Sum-IT (Aldor)

SciMark

National Institute of Standards and Technology
– http://math.nist.gov/scimark2

Consists of five kernels:

1. Fast Fourier transform
– One-dimensional transform of 1024 complex numbers
– Each complex number occupies 2 consecutive entries in the array
– Exercises complex arithmetic, non-constant memory references and trigonometric functions

SciMark

2. Jacobi successive over-relaxation
– 100 × 100 grid, represented by a two-dimensional array
– Exercises basic "grid averaging" – each A(i, j) is assigned the average weighting of its four nearest neighbors

3. Monte Carlo
– Approximates the value of π by computing the area of the quarter unit circle
– Random points inside the unit square – compute the ratio of those that fall within the circle
– Exercises random-number generators and function inlining

SciMark

4. Sparse matrix multiplication
– Uses an unstructured sparse matrix representation stored in compressed-row format
– Exercises indirect addressing and non-regular memory references

5. Dense LU factorization
– LU factorization of a dense 100 × 100 matrix using partial pivoting
– Exercises dense matrix operations

SciMarkSciMark

The kernels are repeated until the time spent The kernels are repeated until the time spent in each kernel exceeds a certain threshold (2 in each kernel exceeds a certain threshold (2 seconds in our case)seconds in our case)

After the threshold is reached, the kernel is After the threshold is reached, the kernel is run once more and timedrun once more and timed

The time is divided by number of floating The time is divided by number of floating point operationspoint operations

The result is reported in MFlops (or Million The result is reported in MFlops (or Million Floating-point instructions per second)Floating-point instructions per second)

SciMark

There are two data-set sizes for the tests: large and small
Small uses small data sets to reduce the effect of cache misses
Large is the opposite of small
For our Maple tests we used only the small data set

SciGMark

Generic version of SciMark (SYNASC 2005)
– http://www.orcca.on.ca/benchmarks

Measures the difference in performance between generic and specialized code
Kernels rewritten to operate over a generic numerical type supporting basic arithmetic operations (+, -, ×, /, zero, one)
The current version implements a wrapper for numbers using a double-precision floating-point representation

Parametric Polymorphism in Maple

Module-producing functions
– Functions that take one or more modules as arguments and produce modules as their result
– The resulting modules use operations from the parameter modules to provide abstract algorithms in a generic form

Example

MyGenericType := proc(R)
    module ()
        export f, g;
        # Here f and g can use u and v from R
        f := proc(a, b) foo(R:-u(a), R:-v(b)) end;
        g := proc(a, b) goo(R:-u(a), R:-v(b)) end;
    end module:
end proc:
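A usage sketch (our addition, not from the slides): instantiating MyGenericType with a concrete parameter module. The module IntOps and the operations it exports are hypothetical, and foo (left abstract on the slide) is given a trivial definition so that f is callable.

# Hypothetical parameter module supplying the u and v operations
IntOps := module()
    export u, v;
    u := x -> x + 1;
    v := x -> 2*x;
end module:

# foo is a placeholder in the slide's example; any binary function works
foo := (x, y) -> x + y:

T := MyGenericType(IntOps):
T:-f(3, 4);   # foo(IntOps:-u(3), IntOps:-v(4)) = 4 + 8 = 12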

Approaches

Object-oriented
– Data and operations together
– A module for each value
– Closer to the original SciGMark implementation

Abstract data type
– Each value is some data object
– Operations are implemented separately in a generic module
– The same module is shared by all the values belonging to each type

Object-Oriented Approach

DoubleRing := proc(val::float)
    local Me;
    Me := module()
        export v, a, s, m, d, gt, zero, one,
               coerce, absolute, sine, sqroot;
        v := val;  # Data value of object
        # Implementations for +, -, *, /, >, etc.
        a := (b) -> DoubleRing(Me:-v + b:-v);
        s := (b) -> DoubleRing(Me:-v - b:-v);
        m := (b) -> DoubleRing(Me:-v * b:-v);
        d := (b) -> DoubleRing(Me:-v / b:-v);
        gt := (b) -> Me:-v > b:-v;
        zero := () -> DoubleRing(0.0);
        coerce := () -> Me:-v;
        . . .
    end module:
    return Me;
end proc:
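A brief usage sketch (our addition), assuming the elided exports are filled in as above; note that each arithmetic operation allocates a fresh module, which is the source of the overhead measured later.

x := DoubleRing(2.0):
y := DoubleRing(3.0):
z := x:-a(y):     # x + y, wrapped in a new DoubleRing module
z:-coerce();      # 5.0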

Object-Oriented Approach

The previous example simulates the object-oriented approach by storing the value in the module
The exports a, s, m, d correspond to the basic arithmetic operations
We chose names other than the standard +, -, ×, / for two reasons:
– The code looks similar to the original SciGMark (Java does not have operator overloading)
– It is not very easy to overload operators in Maple

Functions like sine and sqroot are used by the FFT algorithm to replace complex operations

Abstract Data Type Approach

DoubleRing := module()
    export a, s, m, d, gt, zero, one,
           coerce, absolute, sine, sqroot;
    # Implementations for +, -, *, /, >, etc.
    a := (a, b) -> a + b;
    s := (a, b) -> a - b;
    m := (a, b) -> a * b;
    d := (a, b) -> a / b;
    gt := (a, b) -> a > b;
    zero := () -> 0.0;
    one := () -> 1.0;
    coerce := (a::float) -> a;
    absolute := (a) -> abs(a);
    sine := (a) -> sin(a);
    sqroot := (a) -> sqrt(a);
end module:
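Again a usage sketch (our addition): here values are plain floats and the single shared module supplies the operations, so no per-value module is created.

x := DoubleRing:-coerce(2.0):
y := DoubleRing:-coerce(3.0):
# x*x + y*y through the generic interface
DoubleRing:-coerce(DoubleRing:-a(DoubleRing:-m(x, x), DoubleRing:-m(y, y)));   # 13.0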

Abstract Data Type Approach

The module does not store data; it provides only the operations
As a convention, one must coerce the float type to the representation used by this module
In this case the representation is exactly float
The DoubleRing module is created only once for each kernel

Kernels

Each SciGMark kernel exports an implementation of its algorithm and a function to compute the estimated number of floating-point operations
Each kernel is parametrized by a module R that abstracts the numerical type

Kernel Structure

gFFT := proc(R)
    module()
        export num_flops, transform, inverse;
        local transform_internal, bitreverse;
        num_flops := . . .;
        transform := . . .;
        inverse := . . .;
        transform_internal := . . .;
        bitreverse := . . .;
    end module:
end proc:

Kernels

The high-level structure is the same for the object-oriented and the abstract data type versions
The implementation inside the functions is different

Model                Code

Specialized          x*x + y*y
Object-oriented      (x:-m(x):-a(y:-m(y))):-coerce()
Abstract data type   R:-coerce(R:-a(R:-m(x,x), R:-m(y,y)))

Kernel Sample (Abstract Data)

GenMonteCarlo := proc(DR::`module`)
    local m;
    m := module ()
        export num_flops, integrate;
        local SEED;
        SEED := 113;
        num_flops := (Num_samples) -> Num_samples * 4.0;
        integrate := proc (numSamples)
            local R, under_curve, count, x, y, nsm1;
            R := Random(SEED);
            under_curve := 0;
            nsm1 := numSamples - 1;
            for count from 0 to nsm1 do
                x := DR:-coerce(R:-nextDouble());
                y := DR:-coerce(R:-nextDouble());
                if DR:-coerce(DR:-a(DR:-m(x,x), DR:-m(y,y))) <= 1.0 then
                    under_curve := under_curve + 1;
                end if;
            end do;
            return (under_curve / numSamples) * 4.0;
        end proc;
    end module:
    return m;
end proc:

Kernel Sample (Object-Oriented)

GenMonteCarlo := proc(r::`procedure`)
    local m;
    m := module ()
        export num_flops, integrate;
        local SEED;
        SEED := 113;
        num_flops := (Num_samples) -> Num_samples * 4.0;
        integrate := proc (numSamples)
            local R, under_curve, count, x, y, nsm1;
            R := Random(SEED);
            under_curve := 0;
            nsm1 := numSamples - 1;
            for count from 0 to nsm1 do
                x := r(R:-nextDouble());
                y := r(R:-nextDouble());
                if (x:-m(x):-a(y:-m(y))):-coerce() <= 1.0 then
                    under_curve := under_curve + 1;
                end if;
            end do;
            return (under_curve / numSamples) * 4.0;
        end proc;
    end module:
    return m;
end proc:

Kernel Sample (Contd.)

measureMonteCarlo := proc(min_time, R)
    local Q, cycles;
    Q := Stopwatch();
    cycles := 1;
    while true do
        Q:-strt();
        GenMonteCarlo(R):-integrate(cycles);
        Q:-stp();
        if Q:-rd() >= min_time then break; end if;
        cycles := cycles * 2;
    end do;
    return GenMonteCarlo(R):-num_flops(cycles) / Q:-rd() * 1.0e-6;
end proc;
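Usage sketch (our addition), assuming the authors' Stopwatch and Random helpers are in scope: with the 2-second threshold described earlier, the MFlops score for the Monte Carlo kernel over the abstract-data-type DoubleRing is obtained as

score := measureMonteCarlo(2.0, DoubleRing);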

Results (MFlops)

Test                            Specialized   Abstract Data Type   Object-Oriented
Fast Fourier Transform          0.123         0.088                0.0103
Successive Over-Relaxation      0.243         0.166                0.0167
Monte Carlo                     0.092         0.069                0.0165
Sparse Matrix Multiplication    0.045         0.041                0.0129
LU Factorization                0.162         0.131                0.0111
Composite                       0.133         0.099                0.0135
Ratio                           100%          74%                  10%

Note: Larger means faster.

Results

The abstract data type version is very close in performance to the specialized version – about 75% as fast
The object-oriented model simulates the original SciGMark closely, but it produces many modules, and this leads to significant overhead – only about 10% as fast
It is useful to separate the instance-specific data from the shared methods module – values are then formed as composite objects from the instance data and the shared methods module (see the sketch below)
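A minimal sketch of that composite-object idea (our illustration, not the authors' code): each value is a pair of instance data and a reference to one shared methods module, so arithmetic no longer allocates a module per result.

# Shared methods module, created once (hypothetical names)
DoubleMethods := module()
    export a, m, coerce;
    a := (x, y) -> [x[1] + y[1], x[2]];   # slot 1: data, slot 2: methods
    m := (x, y) -> [x[1] * y[1], x[2]];
    coerce := x -> x[1];
end module:

# Each value is a small composite: [data, shared methods]
mkDouble := v -> [v, DoubleMethods]:

x := mkDouble(2.0):
y := mkDouble(3.0):
ops := x[2]:   # retrieve the shared methods from the value itself
ops:-coerce(ops:-a(ops:-m(x, x), ops:-m(y, y)));   # x*x + y*y = 13.0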

Conclusions

The performance penalty should not discourage writing generic code
– It provides code reusability that can simplify libraries
– Writing generic programs in a mathematical context helps programmers operate at a higher level of abstraction

Generic code optimization is possible, and we have proposed an approach that specializes the generic type according to the instances of the type parameters

Conclusions (Contd.)

Parametric polymorphism does not introduce an excessive performance penalty
– This is possible because of the interpreted nature of Maple: few optimizations are performed on the specialized code (even specialized code uses many function calls)

Object-oriented use of modules is not well supported in Maple; simulating subclassing polymorphism in Maple is very expensive and should be avoided
Better support for overloading would help programmers write more generic code in Maple

More info about SciGMark at: http://www.orcca.on.ca/benchmarks/

Acknowledgments

ORCCA members
MapleSoft