haskell vs. f# vs. scala
DESCRIPTION
A High-level Language Features and Parallelism Support ComparisonTRANSCRIPT
Haskell vs. F# vs. ScalaA High-level Language Features and Parallelism Support Comparison1
Prabhat Totoo, Pantazis Deligiannis, Hans-Wolfgang Loidl
Denpendable Systems GroupSchool of Mathematical and Computer Sciences
Heriot-Watt University, UK
FHPC’12
Copenhagen, Denmark
15 September 2012
1URL:http://www.macs.hw.ac.uk/~dsg/gph/papers/abstracts/fhpc12.html
Goal
I comparison of parallel programming support in the 3 languages- performance- programmability
I we look at:I language features and high-level abstractions for parallelismI experimental results from n-body problem on multi-coresI discussion about performance, programmability and pragmatic aspects
Totoo, Deligiannis, Loidl (HWU) Haskell vs. F# vs. Scala 15-Sep-2012 1 / 29
Motivation
I promises of functional languages for parallel programmingI referential transparency due to the absence of side effectsI high-level abstractions through HOFs, function compositionI skeletons encapsulating common parallel patterns
I ”specify what instead of how to compute something ”
I SPJ: ”The future of parallel is declarative ”
Totoo, Deligiannis, Loidl (HWU) Haskell vs. F# vs. Scala 15-Sep-2012 2 / 29
Recent trends in language design
I mainstream languages integrating functional features in their designI e.g. Generics (Java), lambda expression (C#, C++), delegates (C#)I improves expressiveness in the language by providing functional
constructs
I Multi-paradigm languages - make functional programming moreapproachable
I F#- Microsoft .NET platform- functional-oriented; with imperative and OO constructs
I Scala- JVM- emphasize on OO but combined with powerful functional features
I library support for parallel patterns (skeletons)I e.g. TPL
Totoo, Deligiannis, Loidl (HWU) Haskell vs. F# vs. Scala 15-Sep-2012 3 / 29
THE LANGUAGES
Summary table
Table: Summary of language features
Key Features Parallelism Support
Haskell pure functional, lazy evaluation, par and pseqstatic/inferred typing Evaluation strategies
F# functional, imperative, object oriented, Async Workflowsstrict evaluation, static/inferred typing, TPL.NET interoperability PLINQ
Scala functional, imperative, object oriented, Parallel Collectionsstrict evaluation, strong/inferred typing, ActorsJava interoperability
Totoo, Deligiannis, Loidl (HWU) Haskell vs. F# vs. Scala 15-Sep-2012 4 / 29
Haskell2
I pure functional language, generally functions have no side-effects
I lazy by default
I typing: static, strong, type inference
I advanced type system supporting ADTs, typeclasses, typepolymorphism
I monads used to:I chain computation (no default eval order)I IO monad separates pure from side-effecting computations
I main implementation: GHC- highly-optimised compiler and RTS
2http://haskell.org/Totoo, Deligiannis, Loidl (HWU) Haskell vs. F# vs. Scala 15-Sep-2012 5 / 29
Haskell - parallel support
I includes support for semi-explicit parallelism through the Glasgowparallel Haskell (GpH) extensions
I GpHI provides a single primitive for parallelism: par
1 par : : a −> b −> b
to annotate expression that can usefully be evaluated in parallel(potential NOT mandatory parallelism)
I and a second primitive to enforce sequential ordering which is neededto arrange parallel computations: pseq
1 pseq : : a −> b −> b
Totoo, Deligiannis, Loidl (HWU) Haskell vs. F# vs. Scala 15-Sep-2012 6 / 29
Haskell - parallel support
I GpH e.g.
1 f = s1 + s2 −− s e q u e n t i a l2 f = s1 ‘ par ‘ s2 ‘ pseq ‘ ( s1 + s2 ) −− p a r a l l e l
I Evaluation strategiesI abstractions built on top of the basic primitivesI separates coordination aspects from main computation
I parallel map using strategies
1 parMap s t r a t f x s = map f xs ‘ u s i n g ‘ p a r L i s t s t r a t
I parList Spark each of the elements of a list using the given strategyI strat e.g. rdeepseq Strategy that fully evaluates its argument
Totoo, Deligiannis, Loidl (HWU) Haskell vs. F# vs. Scala 15-Sep-2012 7 / 29
Haskell - parallel support
I Other abstractions for parallelism:I Par monad
- more explicit approach- uses IVars, put and get for communication- abstract some common functions e.g. parallel map
I DPH- exploits data parallelism, both regular and nested-data
I Eden- uses process abstraction, similar to λ-abstraction- for distributed-memory but also good performance on shared-memorymachines3
3Prabhat Totoo, Hans-Wolfgang Loidl. Parallel Haskell implementations of then-body problem.
Totoo, Deligiannis, Loidl (HWU) Haskell vs. F# vs. Scala 15-Sep-2012 8 / 29
F#4
I functional-oriented language
I SML-like language with OO extensions
I implemented on top of .NET, interoperable with languages such asC#
I can make use of arbitrary .NET libraries from F#
I strict by default, but has lazy values as well
I advanced type system: discriminated union (=ADT), object types(for .NET interoperability)
I value vs. variable
4http://research.microsoft.com/fsharp/Totoo, Deligiannis, Loidl (HWU) Haskell vs. F# vs. Scala 15-Sep-2012 9 / 29
F# - parallel support
I uses the .NET Parallel Extensions library- high-level constructs to write/execute parallel programs
I Tasks Parallel Library (TPL)- hide low-level thread creation, management, scheduling details
I Tasks- main construct for task parallelismTask: provides high-level abstraction compared to working directlywith threads; does not return a valueTask<TResult>: represents an operation that calculates a value oftype TResult eventually i.e. a future
Totoo, Deligiannis, Loidl (HWU) Haskell vs. F# vs. Scala 15-Sep-2012 10 / 29
F# - parallel support
I Tasks Parallel Library (TPL)
I Parallel Class - for data parallelism- basic loop parallelisation using Parallel.For andParallel.ForEach
I PLINQ - declarative model for data parallelism- uses tasks internally
I Async Workflows- use the async {...} keyword; doesn’t block calling thread -provide basic parallelisation- intended mainly for operations involving I/O e.g. run multipledownloads in parallel
Totoo, Deligiannis, Loidl (HWU) Haskell vs. F# vs. Scala 15-Sep-2012 11 / 29
Scala5
I designed to be a “better Java”...
I multiparadigm language; integrating OO and functional model
I evaluation: strict; typing: static, strong and inferred
I strong interoperability with Java (targeted for JVM)
I still very tied with the concept of objects
I language expressiveness is extended by functional features
5http://www.scala-lang.org/Totoo, Deligiannis, Loidl (HWU) Haskell vs. F# vs. Scala 15-Sep-2012 12 / 29
Scala - parallel support
I Parallel Collections FrameworkI implicit (data) parallelismI use par on sequential collection to invoke parallel implementationI subsequent operations are parallelI use seq to turn collection back to sequential
1 xs . map( x => f x ) // s e q u e n t i a l2
3 xs . par . map( x => f x ) . seq // p a r a l l e l
I built on top of Fork/Join frameworkI thread pool implementation schedules tasks among available processors
Totoo, Deligiannis, Loidl (HWU) Haskell vs. F# vs. Scala 15-Sep-2012 13 / 29
Scala - parallel support
I ActorsI message-passing model; no shared stateI similar concurrency model to ErlangI lightweight processes communicates by exchanging asynchronous
messagesI messages go into mailboxes; processed using pattern matching
1 a ! msg // send message2
3 r e c e i v e { // p r o c e s s m a i l b o x4 c a s e m s g p a t t e r n 1 => a c t i o n 15 c a s e m s g p a t t e r n 2 => a c t i o n 26 c a s e m s g p a t t e r n 3 => a c t i o n 37 }
Totoo, Deligiannis, Loidl (HWU) Haskell vs. F# vs. Scala 15-Sep-2012 14 / 29
IMPLEMENTATION
N-body problem
I problem of predicting and simulating the motion of a system of Nbodies that interact with each other gravitationally
I simulation proceeds over a number of time stepsI in each step
I calculate acceleration of each body wrt the others,I then update position and velocity
I solving methods:I all-pairs: direct body-to-body comparison, not feasible for large NI Barnes-Hut algorithm: efficient approximation method, more advanced
consists of 2 phases:- tree construction- force calculation (most compute-intensive)
Totoo, Deligiannis, Loidl (HWU) Haskell vs. F# vs. Scala 15-Sep-2012 15 / 29
Sequential Implementation and optimisations
I Approach: start off with an efficient seq implementation, preservingopportunities for parallelism
I generic optimisations:I deforestation
- eliminating multiple traversal of data structuresI tail-call eliminationI compiler optimisations
Totoo, Deligiannis, Loidl (HWU) Haskell vs. F# vs. Scala 15-Sep-2012 16 / 29
Haskell
I OptimisationsI eliminates stack overflow by making functions tail recursiveI add strictness annotations where requiredI use strict fields and UNPACK pragma in datatype definitionsI shortcut fusion e.g. foldr/build
I Introduce parallelism at the top-level map function
I Core parallel code for Haskell
1 c h u n k s i z e = ( l e n g t h bs ) ‘ quot ‘ ( n u m C a p a b i l i t i e s ∗ 4)2 new bs = map f bs ‘ u s i n g ‘ p a r L i s t C h u n k c h u n k s i z e
r d e e p s e q
I Parallel tuningI chunking to control granularity, not too many small tasks; not too few
large tasks
Totoo, Deligiannis, Loidl (HWU) Haskell vs. F# vs. Scala 15-Sep-2012 17 / 29
F# (1)
I translate Haskell code into F#
I keeping the code functional, so mostly syntax changes
I some general optimisations apply+ inlining
1 (∗ Async Workf lows ∗)2 l e t pmap async f xs =3 seq { f o r x i n xs −> a sy nc {
r e t u r n f x } }4 |> Async . P a r a l l e l5 |> Async . RunSynchronous ly6 |> Seq . t o L i s t
1 (∗ TPL ∗)2
3 (∗ E x p l i c i t t a s k s c r e a t i o n . ∗)4 l e t p m a p t p l t a s k s f ( xs : l i s t < >)
=5 L i s t . map ( fun x −>6 Task< >. F a c t o r y . StartNew ( fun
( ) −> f x ) . R e s u l t7 ) xs
1 (∗ PLINQ − u s e s TPL i n t e r n a l l y ∗)2 l e t pm ap p l i nq f ( xs : l i s t < >) =3 xs . A s P a r a l l e l ( ) . S e l e c t ( fun x −> f x ) |> Seq . t o L i s t
Totoo, Deligiannis, Loidl (HWU) Haskell vs. F# vs. Scala 15-Sep-2012 18 / 29
F# (2)
I imperative style
1 (∗ P a r a l l e l . For ∗)2
3 l e t p m a p t p l p a r f o r f ( xs : a r r a y < >) =4 l e t new xs = Array . z e r o C r e a t e xs . Length5 P a r a l l e l . For ( 0 , xs . Length , ( fun i −>6 new xs . [ i ] <− f ( x s . [ i ] ) ) ) |> i g n o r e7 new xs
I Parallel tuningI maximum degree of parallelism
- specifies max number of concurrently executing tasksI chunking/partitioning
- to control the granularity of tasks
Totoo, Deligiannis, Loidl (HWU) Haskell vs. F# vs. Scala 15-Sep-2012 19 / 29
Scala
I translate Haskell and F# implementations into Scala
I keeping the code as much functional as possible
I optimisations- unnecessary object initialisations removal
1 // P a r a l l e l C o l l e c t i o n s .2 b o d i e s . par . map ( ( b : Body ) => new Body ( b . mass , updatePos ( b ) ,
u p d a t e V e l ( b ) ) ) . seq
1 // P a r a l l e l map u s i n g F u t u r e s .2 d e f pmap [T ] ( f : T => T) ( xs : L i s t [T ] ) : L i s t [T] = {3 v a l t a s k s = xs . map ( ( x : T) => F u t u r e s . f u t u r e { f ( x ) })4 t a s k s . map( f u t u r e => f u t u r e . a p p l y ( ) )5 }
Totoo, Deligiannis, Loidl (HWU) Haskell vs. F# vs. Scala 15-Sep-2012 20 / 29
RESULTS
Experimental setup
I Platforms and language implementations:
Linux (64-bit) Windows (32-bit)2.33GHz (8 cores) 2.80GHz (4 cores with HT)
Haskell GHC 7.4.1 GHC 7.4.1
F# F# 2.0 / Mono 2.11.1 F# 2.0 / .NET 4.0
Scala Scala 2.10 / JVM 1.7 Scala 2.10 / JVM 1.7
I Input size: 80,000 bodies, 1 iteration
Totoo, Deligiannis, Loidl (HWU) Haskell vs. F# vs. Scala 15-Sep-2012 21 / 29
Sequential Runtimes
Haskell (Linux) 479.94sF# (Win) 28.43sScala (Linux) 55.44s
before
25.2821.1239.04
after
Linux Windows
Haskell 25.28 17.64F# 118.12 21.12Scala 39.04 66.65
Totoo, Deligiannis, Loidl (HWU) Haskell vs. F# vs. Scala 15-Sep-2012 22 / 29
Sequential Runtimes
Haskell (Linux) 479.94sF# (Win) 28.43sScala (Linux) 55.44s
before
25.2821.1239.04
after
Linux Windows
Haskell 25.28 17.64F# 118.12 21.12Scala 39.04 66.65
Totoo, Deligiannis, Loidl (HWU) Haskell vs. F# vs. Scala 15-Sep-2012 22 / 29
Sequential Runtimes
Haskell (Linux) 479.94sF# (Win) 28.43sScala (Linux) 55.44s
before
25.2821.1239.04
after
Linux Windows
Haskell 25.28 17.64F# 118.12 21.12Scala 39.04 66.65
Totoo, Deligiannis, Loidl (HWU) Haskell vs. F# vs. Scala 15-Sep-2012 22 / 29
Sequential Runtimes
Haskell (Linux) 479.94sF# (Win) 28.43sScala (Linux) 55.44s
before
25.2821.1239.04
after
Linux Windows
Haskell 25.28 17.64F# 118.12 21.12Scala 39.04 66.65
Totoo, Deligiannis, Loidl (HWU) Haskell vs. F# vs. Scala 15-Sep-2012 22 / 29
Parallel Runtimes
Haskell (EvalStrat) F# (PLINQ) Scala (Actors)
seq 25.28 118.12 39.041 26.38 196.14 40.012 14.48 120.78 22.344 7.41 80.91 14.888 4.50 70.67 13.26
Table: Linux (8 cores)
Haskell (EvalStrat) F# (PLINQ) Scala (Actors)
seq 17.64 21.12 66.651 18.05 21.39 67.242 9.41 17.32 58.664 6.80 10.56 33.84
8 (HT) 4.77 8.64 25.28
Table: Windows (4 cores)
Totoo, Deligiannis, Loidl (HWU) Haskell vs. F# vs. Scala 15-Sep-2012 23 / 29
SpeedupsLinux (8 cores)
Totoo, Deligiannis, Loidl (HWU) Haskell vs. F# vs. Scala 15-Sep-2012 24 / 29
SpeedupsWindows (4 cores)
Totoo, Deligiannis, Loidl (HWU) Haskell vs. F# vs. Scala 15-Sep-2012 25 / 29
Observations
I PerformanceI Haskell outperforms F# and Scala on both platforms
- has been possible after many optimisationsI poor performance - F#/Mono (vs. F#/.NET runtime)I poor performance - Scala on Windows
I ProgrammabilityI Haskell - implication of laziness - hard to estimate how much work is
involvedI Scala - verbose, unlike other functional languagesI in general, initial parallelism easy to specifyI chunking required to work for Haskell, but PLINQ and ParColl are
more implicitI F# and Scala retain much control in the implementation
- not easy to tune parallel program
Totoo, Deligiannis, Loidl (HWU) Haskell vs. F# vs. Scala 15-Sep-2012 26 / 29
Observations
I PragmaticsI Haskell
- good tool support e.g. for space and time profiling- threadscope: visualisation tool to see work distribution
I F#- Profiling and Analysis tools available in VS2011 Beta Pro -Concurrency Visualizer
I Scala- benefits from free tools available for JVM
Totoo, Deligiannis, Loidl (HWU) Haskell vs. F# vs. Scala 15-Sep-2012 27 / 29
Conclusions
I employ skeleton-based, semi-explicit and explicit approaches toparallelism
I (near) best runtimes using highest-level of abstraction
I start with an optimised sequential program
I but use impure features only after parallelisation
I Haskell provides least intrusive mechanism for parallelisation
I F# provides the preferred mechanism for data-parallelism with theSQL-like PLINQ
I extra programming effort using actor-based code in Scala not justified
Totoo, Deligiannis, Loidl (HWU) Haskell vs. F# vs. Scala 15-Sep-2012 28 / 29
Future Work: Eden on Clusters6
4
8
16
32
64
128
256
512
1 2 4 8 16 32 64 128
Tim
e (
s)
Processors
Eden Iteration Skeletons: Naive NBody with 15000 bodies, 10 iteration
localLoop/allToAllIterloopControl/allToAllRD
linear speedup
6Mischa Dieterle, Thomas Horstmeyer, Jost Berthold and Rita Loogen: IteratingSkeletons - Structured Parallelism by Composition, IFL’12
Totoo, Deligiannis, Loidl (HWU) Haskell vs. F# vs. Scala 15-Sep-2012 29 / 29
Thank you for listening!
I Full paper and sources:http://www.macs.hw.ac.uk/~dsg/gph/papers/abstracts/fhpc12.html
I Email:{pt114, pd85, H.W.Loidl}@hw.ac.uk