TRANSCRIPT
Robust memory-aware mappings for parallel multifrontal factorizations
Joint work with P. R. Amestoy, E. Agullo, A. Buttari, A. Guermouche, J.-Y. L’Excellent
François-Henry Rouet, Lawrence Berkeley National Laboratory
MUMPS Users Group Meeting, May 29-30, 2013
Introduction
Motivation
• Memory is often a bottleneck for direct solvers.
• Physical memory per core tends to decrease.
• Estimating memory consumption in a dynamic scheduling context is difficult (cf. the infamous error code “-9” in MUMPS).
Context
• We want to design mapping algorithms that enforce some memory constraints and provide better memory estimates.
• We focus on multifrontal methods (e.g., MUMPS).
The multifrontal method [Duff & Reid ’83]

[Figure: a 5×5 sparse matrix A and its factors, shown as A = ... and L+U−I = ..., with the fill-in created during factorization.]

Storage is divided into two parts:
• Factors
• Active memory: the active frontal matrix, plus a stack of contribution blocks

[Figure: elimination tree of the 5×5 example; the animation steps through the tree, showing the active frontal matrix and the stack of contribution blocks grow and shrink while the factors accumulate.]
• Factors are incompressible and usually scale fairly; they can optionally be written to disk.
• In sequential, the traversal that minimizes active memory is known [Liu ’86].
• In parallel, active memory becomes dominant. Example: share of active storage on the AUDI matrix using MUMPS 4.10.0: 11% on 1 processor, 59% on 256 processors.
The problem
Metric: memory efficiency
e(p) = Sseq / (p × Smax(p))

We would like e(p) ≈ 1, i.e., Sseq/p on each processor.

Common mappings/schedulings provide a poor memory efficiency:
• Proportional mapping: lim_{p→∞} e(p) = 0 on regular problems.
• MUMPS relies primarily on a relaxed proportional mapping; typical efficiency: between 0.10 and 0.40. Memory estimates are unreliable.
Proportional mapping
Proportional mapping: top-down traversal of the tree, where processors assigned to a node are distributed among its children proportionally to the weight of their respective subtrees.
[Figure: example tree mapped with proportional mapping; the root gets 64 processors, its three children (subtree weights 40%, 10%, 50%) get 26, 6 and 32 processors, and the distribution continues recursively down the tree.]
• Targets run time, but poor memory efficiency.
• Usually a relaxed version is used: more memory-friendly, but unreliable estimates.
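To make the recursion concrete, here is a minimal Python sketch of proportional mapping (a toy Node class and simple rounding, not the MUMPS implementation):

# Minimal sketch of proportional mapping (illustrative only, not MUMPS code).
# Each node carries the total weight (e.g., work) of its subtree.

class Node:
    def __init__(self, weight, children=()):
        self.weight = weight            # weight of the whole subtree rooted here
        self.children = list(children)
        self.procs = 0                  # number of processors assigned

def proportional_mapping(node, procs):
    """Top-down traversal: split `procs` among the children proportionally
    to their subtree weights (rounding kept deliberately simple)."""
    node.procs = procs
    if not node.children:
        return
    total = sum(c.weight for c in node.children)
    remaining = procs
    for k, child in enumerate(node.children):
        if k == len(node.children) - 1:
            share = remaining           # give whatever is left to the last child
        else:
            share = max(1, round(procs * child.weight / total))
            remaining -= share
        proportional_mapping(child, share)

# The 3-child example from the slide: weights 40%, 10%, 50%, 64 processors.
root = Node(1.0, [Node(0.4), Node(0.1), Node(0.5)])
proportional_mapping(root, 64)
print([c.procs for c in root.children])   # -> [26, 6, 32]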
Proportional mapping
Proportional mapping: assuming that the sequential peak is 5 GB,

Smax(p) ≥ max{ 4 GB/26, 1 GB/6, 5 GB/32 } = 0.16 GB  ⇒  e(p) ≤ 5 / (64 × 0.16) ≤ 0.5

[Figure: the same 64-processor tree, with per-subtree peaks of 4 GB, 1 GB and 5 GB on the children mapped on 26, 6 and 32 processors, and a sequential peak of 5 GB.]
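As a quick numeric check of this bound (plain Python arithmetic over the values on the slide; note the slide rounds Smax down to 0.16 GB):

# Quick numeric check of the efficiency bound from the slide.
peaks_gb = [4.0, 1.0, 5.0]    # per-subtree sequential peaks
procs    = [26, 6, 32]        # processors assigned by proportional mapping
s_seq_gb = 5.0                # sequential peak of the whole tree
p        = 64

s_max = max(s / q for s, q in zip(peaks_gb, procs))   # 1/6 ≈ 0.167 GB
e_bound = s_seq_gb / (p * s_max)                      # ≈ 0.47
print(f"Smax(p) >= {s_max:.3f} GB  =>  e(p) <= {e_bound:.2f}")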
A more memory-friendly strategy...
All-to-all mapping: postorder traversal of the tree, where all the processors work at every node:

[Figure: the same tree with all 64 processors assigned to every node.]
Optimal memory scalability (Smax(p) = Sseq/p), but no tree parallelism and prohibitive amounts of communication.
A class of “memory-aware” mappings
“Memory-aware” mapping [Agullo ’08]: aims at enforcing a given memory constraint (M0, maximum memory per processor):

1. Try to apply proportional mapping.
2. Enough memory for each subtree? If not, serialize them, and update M0: processors stack equal shares of the CBs from previous nodes.

[Figure: on the example tree (children peaks 4 GB, 1 GB, 5 GB), the proportional mapping violates the constraint, so the subtrees are serialized, each one mapped on all 64 processors.]
• Ensures the given memory constraint and provides reliable estimates.
• Tends to assign many processors to nodes at the top of the tree ⇒ performance issues on parallel nodes.
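A minimal Python sketch of the memory-aware decision, under simplifying assumptions: each node carries its subtree's sequential peak, and the update of M0 with stacked CB shares is left out.

# Sketch of the memory-aware mapping decision (simplified, illustrative only;
# the real algorithm also charges each processor an equal share of the CBs of
# already-processed children when updating M0).

from dataclasses import dataclass, field

@dataclass
class Node:
    weight: float                  # weight of the subtree (work)
    peak: float                    # sequential peak of the subtree (memory)
    children: list = field(default_factory=list)
    procs: int = 0

def proportional_shares(children, procs):
    total = sum(c.weight for c in children)
    shares = [max(1, round(procs * c.weight / total)) for c in children[:-1]]
    return shares + [procs - sum(shares)]

def memory_aware(node, procs, M0):
    node.procs = procs
    if not node.children:
        return
    shares = proportional_shares(node.children, procs)
    if all(c.peak / p <= M0 for c, p in zip(node.children, shares)):
        for c, p in zip(node.children, shares):    # proportional mapping fits
            memory_aware(c, p, M0)
    else:
        for c in node.children:                    # serialize: all procs on each child
            memory_aware(c, procs, M0)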
A class of “memory-aware” mappings
A finer “memory-aware” mapping? Serializing all the children at once is very constraining: more tree parallelism can be found.

[Figure: the children are gathered into groups; proportional mapping is applied within each group, and the groups are serialized.]

Find groups on which proportional mapping works, and serialize these groups. Heuristic: follow a given order (e.g., the serial postorder) and form groups as large as possible.
Memory-aware with groups
The algorithm maps each child i on pi processors so that, on every processor working on i, two conditions are enforced:

(Cstk) there is enough memory to hold Pseq(i)/pi.
(Casm) the assembly of the CB of i into its parent is feasible.

while not all children collected do
    i = current node
    if i cannot be alone then
        Reset previous children to an all-to-all mapping (this should improve balance)
    end if
    Starting from i: collect as many nodes as possible, as long as (Cstk) and (Casm) are ensured
    Use proportional mapping on the group; serialize with previous ones
end while

[Figure: as the loop progresses, the children are gathered into groups mapped on p1, p2, ..., pf processors.]
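A rough Python sketch of the grouping heuristic, reusing proportional_shares and the Node class from the sketch above; folding the (Casm) condition into the same peak-based test is a simplification of mine.

# Rough sketch of the grouping heuristic (illustrative; simplified conditions).

def form_groups(children, procs, M0):
    """Follow a fixed order (e.g., the serial postorder) and grow each group
    while proportional mapping over the group respects the memory budget;
    the groups are then serialized with one another."""
    groups, current = [], []
    for child in children:
        if group_fits(current + [child], procs, M0):
            current.append(child)          # the current group can still grow
        else:
            if current:
                groups.append(current)     # close the group, start a new one
            current = [child]              # a child that does not fit even alone
                                           # triggers the all-to-all reset fallback
    if current:
        groups.append(current)
    return groups   # each group gets a proportional mapping; groups run in sequence

def group_fits(group, procs, M0):
    """(Cstk)-style test: with procs split proportionally inside the group,
    every processor can hold its share of each subtree's sequential peak."""
    shares = proportional_shares(group, procs)
    return all(c.peak / p <= M0 for c, p in zip(group, shares))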
Experiments
• Matrix: finite-difference model of acoustic wave propagation, 27-point stencil, 192 × 192 × 192 grid; Seiscope consortium. N = 7 M, nnz = 189 M, factors = 152 GB (METIS). Sequential peak of active memory: 46 GB.
• Machine: 64 nodes with two quad-core Xeon X5560 per node. We use 256 MPI processes.
• Perfect memory scalability: 46 GB/256 = 180 MB.
Experiments
                       Prop. mapping   MA, M0 = 225 MB   MA, M0 = 380 MB (w/o groups)   MA, M0 = 380 MB (w/ groups)
Max stack peak (MB)    1932            227               330                            381
Avg stack peak (MB)    626             122               177                            228
Time (s)               1323            2690              2079                           1779
The root node has 3 children that cannot be mapped using a proportional mapping:

46 GB + 32 GB + 30 GB > 380 MB × 256

• The “basic” memory-aware mapping serializes the three children.
• The variant with groups puts two children together and one alone.

[Figure: the root (peak 46 GB) with three children of peaks 45 GB, 32 GB and 30 GB.]
Experiments
We assess how the different strategies map nodes, compared to the proportional mapping.
[Figure: average ratio #procs / #procs_PM per node, for each layer of the tree (depth 0 to 14), for MA with M0 = 225 MB, MA with M0 = 380 MB w/o groups, and MA with M0 = 380 MB w/ groups.]

• Relaxing M0 enhances tree parallelism.
• The variant with groups enhances tree parallelism.
Performance issues
The new mapping tends to assign many processes to nodes at the top of the tree. One of the communication patterns became a bottleneck: the non-blocking broadcast of a block of factors from the master process, a kind of ibcast (unsymmetric case).

[Figure: a front with master process M and slave processes S1, S2, S3; M broadcasts blocks of factors.]

Baseline implementation: a loop of non-blocking sends from the master to the processors. Communication speed is constant. Hyperion (CICT): 1.7 GB/s; Hopper (NERSC): 4.5 GB/s.

Tree-based implementation: communication speed is almost proportional to the number of processors. Hyperion, 128 cores: 28 GB/s; Hopper, 768 cores: 122 GB/s.

[Figure: the baseline is a flat loop of sends from M to S1, ..., Sp; the tree-based version organizes the slaves into a broadcast tree of width w.]
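A minimal sketch of the tree-based idea, written as pure rank arithmetic for a w-ary broadcast tree (illustrative only; the actual MUMPS ibcast implementation differs):

# Sketch of a w-ary broadcast tree over ranks 0..p-1 (rank 0 is the root).
# Each process forwards to at most w children, so several links are active
# at once instead of a single flat loop of sends from the master.

def children_of(rank, p, w):
    """Children of `rank` in a w-ary tree rooted at rank 0."""
    return [c for c in range(rank * w + 1, rank * w + w + 1) if c < p]

def parent_of(rank, w):
    return None if rank == 0 else (rank - 1) // w

# Example: 13 processes, tree width 2.
p, w = 13, 2
for r in range(p):
    print(r, "->", children_of(r, p, w))
# Rank 0 sends to 1 and 2, which forward to 3,4 and 5,6, and so on.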
Non-blocking broadcast – implementation
MUMPS follows an (almost) fully dynamic scheduling scheme, where a process can be asked on the fly to work on virtually any task.

Main loop:
while ... do
    Probe for message
    if message received then
        Read tag, execute associated task (e.g., Facto panel)
    end if
end while

Facto panel:
Do BLAS stuff on panel
Try to send panel to some processes:
if Send buffer is full then
    Call Main loop
end if
Free stuff

[Timeline: successive tasks alternate BLAS and SEND phases on each panel; the buffer of a panel is freed (FREE) only once its sends complete, possibly several tasks later.]
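The re-entrant pattern can be sketched as follows; schematic Python with hypothetical state/probe/try_isend helpers, a skeleton rather than the MUMPS source:

# Schematic of the re-entrant send pattern (hypothetical names, stubbed MPI).

def main_loop(state):
    while not state.done():
        msg = state.probe()              # probe for an incoming message
        if msg is not None:
            dispatch(state, msg)         # read the tag, run the associated task

def dispatch(state, msg):
    if msg.tag == "FACTO_PANEL":
        facto_panel(state, msg.panel)
    # ... other tags / tasks ...

def facto_panel(state, panel):
    state.blas_on(panel)                 # dense kernel on the panel
    for dest in state.destinations(panel):
        while not state.try_isend(panel, dest):
            # Send buffer full: re-enter the main loop so that incoming
            # messages keep being processed; this recursive re-entry is
            # exactly where overtaking and deadlocks can appear.
            main_loop(state)
    state.free_completed_sends()         # free buffers of completed sends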
Non-blocking broadcast – implementation
Difficulties:
• Messages representing different blocks of factors can overtake one another.
• Deadlocks...

Example of messages overtaking (broadcast tree is a chain):

[Figure: timeline of processes P0, P1, P2 at times ta, tb, tc; a recursive call on the sending process delays the forwarding of the first panel.]

P2 receives the second panel before the first one!
Non-blocking broadcast – experiments
Experiments on a single node of size 64000 with 64 processors:
Strategy     Time (s), 1-way threaded BLAS   Time (s), 4-way threaded BLAS
Baseline     596                              468
Tree-based   401                              167
Better overlap between communications and computations ⇒ more potential for exploiting multithreaded BLAS / faster cores.

Quite critical in a memory-aware context, since the algorithm tends to increase the number of processes per subtree.
Memory-aware mapping – more results
Back to experiments with the memory-aware mapping; matrices:
Matrix name   Order (millions)   Entries (millions)   Factors (GB)   Sseq (GB)   Description; origin
cage13        0.4                7.5                  30.7           17.9        Directed weighted graph; Utrecht U.
as-Skitter    1.7                23.9                 17.7           6.2         Internet topology graph; SNAP
HV15R         2.0                283.1                366.4          87.8        CFD, 3D engine fan; FLUOREM
MORANSYS1     2.7                81.3                 63.5           19.1        Model Order Reduction; CADFEM
meca_raff6    3.3                130.2                63.5           8.8         Thermo-mechanical coupling; EDF
Memory-aware mapping – more results
Memory-aware mapping vs. default strategy (MUMPS 4.10.0):
Matrix        Mapping             Smax (MB)   Savg (MB)   Time (s)
cage13        Default             1069        727         285
              MA, M0 = 380 MB     305         302         498
as-Skitter    Default             1484        584         578
              MA, M0 = 130 MB     135         70          285
HV15R         Default             9063*       8773*       N/A
              MA, M0 = 2600 MB    2225        1803        7169
MORANSYS1     Default             1246        834         373
              MA, M0 = 380 MB     379         305         651
meca_raff6    Default             1078        796         324
              MA, M0 = 320 MB     329         209         367
Conclusion – on-going work
Conclusion
• The memory-aware mapping ensures a given constraint...
• ...and its variant with groups tries to add more parallelism.
• Increasing the number of processes per node led to performance problems ⇒ new (dense!) algorithms.
Future work
• Memory-aware mapping with groups: assess some heuristics.
• We have performance models that provide the optimal number of processors to use on a node, and we are working on how to inject these models into the mapping.
• More dynamic scheduling.
• Compressing CBs with low-rank approximation techniques (cf. work by Clément Weisbecker and the MUMPS team).
End
Thank you for your attention!
Any questions?
Acknowledgements
• Matrices: Stéphane Operto (SEISCOPE), Olivier Boiteau (EDF).
• Computational resources: Nicolas Renon, CICT, Toulouse.