libHPC: Software sustainability and reuse through metadata preservation
Jeremy Cohen, John Darlington, Brian Fuchs
London e-Science Centre / Department of Computing, Imperial College London
David Moxey, Chris Cantwell, Pavel Burovskiy, Spencer Sherwin
Department of Aeronautics, Imperial College London
Neil Chue Hong
Software Sustainability Institute, University of Edinburgh
First Workshop on Maintainable Software Practices in e-Science, Chicago
Tuesday 9th October 2012
Introduction
• Decision making – building scientific software can be hard
• Abstraction – hide the complexity
• Efficiency – achieve the performance
• Aim for a universal technology that spans all application domains, machines and metrics
• Coordination forms – a different approach to task specification
• Components – encapsulated building blocks
[Figure: the three dimensions the technology aims to span. Machines: Cluster, Cloud, Multi-core, GPU, FPGA. Applications: Num. Intensive, Data Intensive, Bioinformatics, CFD. Metrics: Time, Cost, Energy.]
Information and decisions
Why is software development and re-use hard?
• A particular piece of code is the result of many development decisions
• Developers invest significant knowledge about the task to be solved
…however…
• Decisions made by developers cannot be reconstructed from the code
• Loss of original information and structure invested by developer(s)
Information and decisions
Understanding the code structure, the options available and the decisions made during development is important for:
• Portability; optimisation on different architectures
• Long-term sustainability
Need an explicit representation of decisions and alternatives:
• Decision tree used to represent this (structure)
• Metadata used to annotate decision tree (information)
• Modifications can be made to the decision tree (based on metadata analysis), which can then be mapped to modified code
Information and decisions
e.g. code that uses a solver:
• Many options to select suitable solver – abstract components
• Choice dependent on problem being addressed, parameters, etc.
• Represent solver choice on a tree of component alternatives: leaf nodes are concrete implementations, higher-level nodes are abstract
[Figure: decision tree of component alternatives. Linear Solver (inputs: Matrix, Vector, Vector) branches into Jacobi and LU. Jacobi: Sequential Jacobi, Parallel Jacobi (UPC). LU: Sequential LU, Parallel LU (OpenMP), Parallel LU (MPI).]
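The tree of alternatives can be sketched in a few lines of Python. This is only an illustration of the idea, not the libHPC data model: the `Node` class, the `parallelism` metadata key and the `select` helper are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A component in the decision tree (hypothetical structure)."""
    name: str
    metadata: dict = field(default_factory=dict)
    children: list = field(default_factory=list)

    @property
    def is_concrete(self):
        # Leaf nodes are concrete implementations; inner nodes are abstract.
        return not self.children


def leaves(node):
    """Yield every concrete implementation below an abstract component."""
    if node.is_concrete:
        yield node
    else:
        for child in node.children:
            yield from leaves(child)


def select(node, **wanted):
    """Return the names of the leaves whose metadata matches every key."""
    return [leaf.name for leaf in leaves(node)
            if all(leaf.metadata.get(k) == v for k, v in wanted.items())]


# The tree from the slide; the metadata values are assumptions.
tree = Node("Linear Solver", children=[
    Node("Jacobi", children=[
        Node("Sequential Jacobi", {"parallelism": None}),
        Node("Parallel Jacobi (UPC)", {"parallelism": "UPC"}),
    ]),
    Node("LU", children=[
        Node("Sequential LU", {"parallelism": None}),
        Node("Parallel LU (OpenMP)", {"parallelism": "OpenMP"}),
        Node("Parallel LU (MPI)", {"parallelism": "MPI"}),
    ]),
])

mpi_solvers = select(tree, parallelism="MPI")
```

Selecting on metadata rather than on names is what would let a tool swap one leaf for another without touching the code that asks for a "Linear Solver".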
Abstractions
• Encapsulation
  • Encapsulate functions as components (reuse)
  • Allow alternatives
• Functional properties
  • Referentially transparent → Encapsulation
  • Church-Rosser → Alternative behaviours
Abstractions – alternative behaviours
i.e. Church-Rosser: either reduction order reaches the same result
(4 + 3) – (2 + 1)
7 – (2 + 1)   or   (4 + 3) – 3
7 – 3
4
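The arithmetic example can be checked mechanically. The following is a minimal sketch, assuming expressions are represented as nested tuples; `step` and `normalise` are hypothetical helper names. It reduces one sub-expression at a time, preferring either the left or the right operand, and both orders end at the same value.

```python
from operator import add, sub

# An expression is either a number or a tuple (operator, left, right).
EXPR = (sub, (add, 4, 3), (add, 2, 1))


def is_value(e):
    return not isinstance(e, tuple)


def step(e, prefer_left):
    """Perform exactly one reduction, choosing which operand goes first."""
    op, left, right = e
    if is_value(left) and is_value(right):
        return op(left, right)
    reduce_left = not is_value(left) and (prefer_left or is_value(right))
    if reduce_left:
        return (op, step(left, prefer_left), right)
    return (op, left, step(right, prefer_left))


def normalise(e, prefer_left):
    """Keep reducing until only a number is left."""
    while not is_value(e):
        e = step(e, prefer_left)
    return e


left_first = normalise(EXPR, prefer_left=True)    # via 7 - (2 + 1)
right_first = normalise(EXPR, prefer_left=False)  # via (4 + 3) - 3
```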
Application flow and specification
We represent application elements using two techniques:
• Data processing – core code that forms application building blocks
  → Components (first-order functions)
• Control flow, orchestration
  → Higher-order functions
  → Coordination Forms, e.g. Pipe, Parallel, Map / Reduce, …
• A functional/mathematical approach to job specification
• Based on work by Darlington, et al.
• Applied to components – define application flow
• May be:
• General – applicable to most applications – e.g. PIPE, PAR
• Iterative patterns – e.g. FARM, ITERATE
• Domain-specific higher-level forms – e.g. Monte Carlo
• Extensible – new patterns can be introduced
Coordination Forms
J. Darlington, Y. Guo, H. W. To and J. Yang. Functional skeletons for parallel coordination. In Proceedings of EURO-PAR '95 Parallel Processing, LNCS 966, pp. 55-66. Springer, Berlin/Heidelberg, 1995.
• A given form may have multiple underlying implementations
• E.g. PAR may provide sequential, multi-threaded and MPI parallel implementations
• Forms aim to be as lightweight as possible
• They result in code that can be run
• They intelligently glue together component building blocks
• PIPE as an example – functions f1 to fn with initial input a:
PIPE [f1, f2, …, fn] a = (f1 ∘ f2 ∘ … ∘ fn) a
                       = f1(f2(…(fn(a))))
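The definition above reads as ordinary function composition, which might be sketched in Python as follows. This is an illustration of the formula only, not the prototype's code; `double` and `increment` are made-up example components.

```python
from functools import reduce


def PIPE(functions, a):
    """Compute f1(f2(...fn(a))): the last function in the list runs first."""
    return reduce(lambda value, f: f(value), reversed(functions), a)


def double(x):
    return 2 * x


def increment(x):
    return x + 1


# double(increment(5))
piped = PIPE([double, increment], 5)
```

Because the composition is right to left, reversing the list changes the result: `PIPE([increment, double], 5)` computes `increment(double(5))` instead.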
Coordination Forms
PIPE ([component list], initial input)
PAR ([component list], [(input1), (input2), …, (inputn)])
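A form with several interchangeable implementations could look like the sketch below. It assumes the `PAR` signature shown above and adds a hypothetical `implementation` argument to choose between a sequential version and a multi-threaded one built on the standard `concurrent.futures` module; an MPI version is left out.

```python
from concurrent.futures import ThreadPoolExecutor
from operator import add


def par_sequential(components, inputs):
    """Run each component on its own input tuple, one after another."""
    return tuple(f(*args) for f, args in zip(components, inputs))


def par_threaded(components, inputs):
    """Same result, but each component is submitted to a thread pool."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(f, *args) for f, args in zip(components, inputs)]
        return tuple(future.result() for future in futures)


# Hypothetical registry of the available implementations of the form.
PAR_IMPLEMENTATIONS = {"sequential": par_sequential, "threaded": par_threaded}


def PAR(components, inputs, implementation="sequential"):
    return PAR_IMPLEMENTATIONS[implementation](components, inputs)


sums_seq = PAR([add, add], [(245, 34), (6, 8)])
sums_thr = PAR([add, add], [(245, 34), (6, 8)], implementation="threaded")
```

Both calls return the results in component order, so the caller does not need to know which implementation was selected.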
Coordination Forms – Implementation
• Prototype implementation in Python
• Class wrappers for component and parameter metadata – concrete implementation code selectable
PIPE – Compose a series of components in the order specified
PAR – Run a series of components independently (perhaps in parallel)
Additional parameters can be added in component list
E.g. for components add, multiply, divide:
2 * ( (245+34) / (6+8) )
PIPE([(multiply, 2), divide, PAR([add,add],[(245,34),(6,8)])])
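The arithmetic example can be reproduced with a small stand-alone sketch. It is not the prototype's implementation and makes three assumptions: the `PAR` result is passed as the explicit initial input of `PIPE` (following the two-argument signature, rather than appearing as the last list entry), a `(component, extra)` tuple places the extra parameters first, and a tuple produced by one stage is unpacked into the arguments of the next. The standard `operator` functions stand in for the add, multiply and divide components.

```python
from operator import add, mul, truediv


def _as_args(value):
    """Treat a tuple as an argument list and anything else as one argument."""
    return value if isinstance(value, tuple) else (value,)


def PAR(components, inputs):
    """Apply each component to its own input tuple (sequentially here)."""
    return tuple(f(*_as_args(x)) for f, x in zip(components, inputs))


def PIPE(stages, initial):
    """Right-to-left composition: the last stage in the list runs first."""
    value = initial
    for stage in reversed(stages):
        if isinstance(stage, tuple):
            func, *extra = stage      # e.g. (mul, 2) carries an extra parameter
        else:
            func, extra = stage, []
        value = func(*extra, *_as_args(value))
    return value


# 2 * ((245 + 34) / (6 + 8))
result = PIPE([(mul, 2), truediv], PAR([add, add], [(245, 34), (6, 8)]))
```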
Bioinformatics: Genome Read Pre-Processing/Mapping
[Figure: workflow. Reference Genome (FASTA file) → bwa index → FASTA file + index file. Short Read Set (Paired), single FASTQ file → FASTQ split → SR_1 and SR_2 → bwa aln (one per read set) → bwa sampe - generate alignment (paired ended) → SAM file → samtools import → BAM file → samtools sort → sorted BAM file → samtools index → OUTPUT]
Input files:
• Reference Genome – FASTA file
• Reads from sequencing machine – FASTQ
((sr1,sr2), u) = PAR([fastq_split, bwa_index], [(short_read_file, None, None),(ref_genome_file,)])
(v, w) = PAR([bwa_aln, bwa_aln], [(ref_genome_file, sr1, None), (ref_genome_file, sr2, None)])
result = PIPE([samtools_index, samtools_sort, (samtools_import, ref_genome_file), bwa_sampe],
[ref_genome_file, [v,w], [sr1, sr2], None])
LibHPC Project
• LibHPC
• Two year project under EPSRC HPC Software Programme
• Imperial College London (Computing (LeSC), Aeronautics, ICT)
• SSI, Edinburgh
• Implementing/demonstrating framework with main supporting application (Nektar++) + other exemplars
Example
Optimising FEM Codes
[Figure: framework architecture. High-level Application Description / Job Specification (Co-ordination Forms, DSLs, etc.); Job Specification Analysis/Processing; Discovery & Metadata; Software Component Library & Metadata Resource; Hardware Resources; Domain-specific Application Support Libraries]
Nektar++ - Hybrid Assembly
• Nektar++ operates on matrices based on input mesh
• Each element of input mesh is mapped to an (elemental) matrix
• There are two matrix assembly strategies:
• Local
• Global
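The difference between the two strategies can be illustrated on a toy problem. The sketch below is not Nektar++ code. It assumes the usual finite element reading of the terms: "global" assembles one matrix from the elemental matrices and then multiplies, while "local" multiplies element by element and sums the contributions at shared nodes. The two-element mesh and the elemental matrix are made up for the example.

```python
# Assumed toy mesh: two 1D elements that share the middle node.
ELEMENT_MATRIX = [[1.0, -1.0], [-1.0, 1.0]]  # same elemental matrix for both
LOCAL_TO_GLOBAL = [[0, 1], [1, 2]]           # element -> global node indices
NUM_NODES = 3


def matvec(matrix, vector):
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def global_apply(x):
    """Assemble one global matrix first, then multiply."""
    A = [[0.0] * NUM_NODES for _ in range(NUM_NODES)]
    for nodes in LOCAL_TO_GLOBAL:
        for i, gi in enumerate(nodes):
            for j, gj in enumerate(nodes):
                A[gi][gj] += ELEMENT_MATRIX[i][j]
    return matvec(A, x)


def local_apply(x):
    """Multiply element by element, then sum contributions at shared nodes."""
    y = [0.0] * NUM_NODES
    for nodes in LOCAL_TO_GLOBAL:
        x_local = [x[g] for g in nodes]
        for g, value in zip(nodes, matvec(ELEMENT_MATRIX, x_local)):
            y[g] += value
    return y


x = [1.0, 2.0, 4.0]
y_global = global_apply(x)
y_local = local_apply(x)
```

Both functions give the same vector, so the choice between them is a performance decision rather than a change in the result, which is the kind of alternative the decision tree is meant to record.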
Nektar++ - Hybrid Assembly
[Figure: Local Assembly and Global Assembly]
Nektar++ - Hybrid Assembly
[Figure: Hybrid Assembly]
Thank You