Claude Tadonki
Laboratoire de l’Accélérateur Linéaire/IN2P3/CNRS
University of Orsay
Orsay / France
1st Workshop on Applications for Multi and Many Core Architectures
22nd International Symposium on Computer Architecture and High Performance Computing (SBAC PAD 2010)
October 27-30, 2010, Petrópolis, Rio de Janeiro, Brazil.
Ring pipelined algorithm for the algebraic path problem on the CELL Broadband Engine
C. TADONKI
The Algebraic Path Problem
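The body of this slide did not survive extraction. For orientation, the usual textbook statement of the algebraic path problem (APP) is given below; it is a standard formulation and not necessarily the notation used in the talk. The symbols $S$, $\oplus$, $\otimes$, $w$ and $P(i,j)$ are assumptions introduced here.

```latex
% Weighted graph G = (V, E, w), with weights w(e) taken in a semiring (S, \oplus, \otimes).
% P(i,j) denotes the set of paths from vertex i to vertex j.
\[
  d(i,j) \;=\; \bigoplus_{\pi \in P(i,j)} w(\pi),
  \qquad
  w(\pi) \;=\; w(e_1) \otimes w(e_2) \otimes \cdots \otimes w(e_k)
  \quad \text{for } \pi = (e_1, e_2, \dots, e_k).
\]
```

With $(\oplus, \otimes) = (\min, +)$ this is the all-pairs shortest path problem, and with $(\lor, \land)$ it is the transitive closure of a graph.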
The Warshall-Floyd Algorithm
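The slide body is missing here as well. As a minimal sketch of the sequential Warshall-Floyd scheme over a generic semiring, the following may help. It assumes an idempotent setting in which the closure (star) step can be skipped, for example shortest paths without negative cycles and a zero diagonal; the function and variable names are illustrative and do not come from the talk.

```python
import math


def warshall_floyd(a, plus, times):
    """Triple-loop Warshall-Floyd over a semiring (plus, times).

    Simplified sketch: the closure (star) of the pivot is omitted, which is
    valid for idempotent cases such as (min, +) without negative cycles.
    """
    n = len(a)
    d = [row[:] for row in a]  # work on a copy of the input matrix
    for k in range(n):          # pivot (intermediate vertex)
        for i in range(n):
            for j in range(n):
                # Combine the current value with the path going through k.
                d[i][j] = plus(d[i][j], times(d[i][k], d[k][j]))
    return d


INF = math.inf

# Shortest paths: (plus, times) = (min, +). Edges: 0->1 (3), 1->2 (1), 2->0 (2).
weights = [[0, 3, INF],
           [INF, 0, 1],
           [2, INF, 0]]
shortest = warshall_floyd(weights, min, lambda x, y: x + y)

# Transitive closure: (plus, times) = (or, and). Edges: 0->1, 1->2.
adjacency = [[True, True, False],
             [False, True, True],
             [False, False, True]]
reach = warshall_floyd(adjacency, lambda x, y: x or y, lambda x, y: x and y)
```

The same triple loop serves both instances; only the two semiring operations change, which is what makes a single parallel schedule reusable across APP instances.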
Shift-toroidal Reindexation (Kung-Lo-Lewis, 1987)
The CELL Broadband Engine
Ring Pipelined Algorithm for the APP (algorithm)
Interesting properties of our algorithm
Can run with any number of processors p <= N (natural LPGS)
Generic tiling applies (LSGP by blocking)
Each processor requires only a buffer of size bN (b is the block size)
Fully pipelined process with local synchronization only
Perfect computation-communication overlap
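To make the blocking and buffer-size properties concrete, here is a small sketch. The slides do not show the actual mapping, so it assumes a cyclic (round-robin) assignment of row blocks of height b to the p ring processors, one common way to organize such a schedule. `ring_layout` and `buffer_elements` are hypothetical helper names, not part of the author's code.

```python
def ring_layout(n, p, b):
    """Assign row blocks of height b of an n x n matrix to p ring processors.

    Assumption (not from the slides): blocks are dealt out cyclically, so
    block number blk goes to processor blk % p during pass blk // p.
    """
    if not (1 <= p <= n) or b < 1:
        raise ValueError("need 1 <= p <= n and b >= 1")
    num_blocks = -(-n // b)  # ceiling division
    layout = []
    for blk in range(num_blocks):
        first = blk * b
        last = min(first + b, n)  # the last block may be shorter
        layout.append({
            "block": blk,
            "rows": (first, last),   # half-open row range [first, last)
            "processor": blk % p,
            "pass": blk // p,
        })
    return layout


def buffer_elements(n, b):
    """Per-processor buffer size quoted on the slide: b rows of n entries."""
    return b * n


layout = ring_layout(n=8, p=3, b=2)
processors = [entry["processor"] for entry in layout]
passes = [entry["pass"] for entry in layout]
per_processor_buffer = buffer_elements(n=8, b=2)
```

With n = 8, p = 3 and b = 2 there are four blocks: the first three are handled in pass 0 and the fourth wraps around to processor 0 in pass 1, while each processor only ever holds b * n = 16 matrix entries at a time.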
Ring Pipelined Algorithm for the APP (implementation on the CELL BE)
PPE-DMA is issued only by the first and the last processor
Inner SPEs communicate and synchronize locally
Computation-communication overlap occurs for all communications
Can run on more SPEs or CELL Blades by natural extension
Performance
Conclusion and Perspectives
Our ring SPMD algorithm suits the CELL BE and scales well
Communication and synchronization account for less than 5% overhead
Absolute performance can be improved by optimizing the APP kernel
Close to 80% of peak performance is expected
Our scheduling can be applied to similar problems
END & QUESTIONS