Teraflux
EC Project Teraflux
- Start date: 01.01.2010
- End date: 31.03.2014
- Funded by: EC (European Community)
- Local head of project: Prof. Dr. Theo Ungerer
- Local scientists: Sebastian Weis
- External scientists / cooperations:
Roberto Giorgi (Project Leader), Sandro Bartolini (University of Siena, Italy)
Mateo Valero, Nacho Navarro, Yoav Etsion (Barcelona Supercomputing Center, Spain)
François Bodin (CAPS entreprise, France)
Paolo Faraboschi, Eduardo Argollo (Hewlett-Packard Labs Barcelona, Spain)
Albert Cohen (INRIA, France)
Avi Mendelson (Microsoft R&D Israel)
Eric Lenormand, Philippe Bonnot, Teodora Petrisor (THALES)
Paraskevas Evripidou, Pedro Trancoso (University of Cyprus)
Abstract
In the future, parallel systems will be widely available in the form of multi-/many-core building blocks with hundreds or thousands of cores on a chip.
To address the programmability challenges of such many-cores, we combine an underlying dataflow-based thread execution model with advanced programming models such as transactional memory.
The second challenge addressed by this project is the definition of an appropriate architecture that matches the proposed execution model and meets the reliability challenges. The architectural explorations will cover the concepts of data-driven and decoupled thread execution, provide architectural support for the parallel programming model, introduce dedicated hardware scheduling units able to manage different levels of thread granularity, handle code or data migration based on information passed by the virtual layer, and take power, thermal, and fault information into account. The underlying architectural elements will essentially form a heterogeneous architecture that aims to reuse existing "off-the-shelf" or well-known components.
The third challenge, which is the particular focus of the University of Augsburg, concerns reliability: the large number of cores, together with the high density of the components integrated on the chip, will inevitably result in systems that suffer from failures at runtime. These failures may be transient or permanent. The system must provide mechanisms to detect such failures and to resume execution with reconfigured core, link, and memory assignments so that the execution completes successfully.
We evaluate the research proposals using a many-core simulator model based on the COTSon simulation framework provided by TERAFLUX partner HP Labs.
Partners in the TERAFLUX project are the University of Siena, the Barcelona Supercomputing Center, CAPS entreprise, Hewlett-Packard, INRIA, Microsoft, THALES, the University of Cyprus, the University of Manchester, and the Chair of Systems and Networking of the University of Augsburg.
Publications
2014
- TERAFLUX: Harnessing Dataflow in Next Generation Teradevices
Roberto Giorgi, Rosa M. Badia, François Bodin, Albert Cohen, Paraskevas Evripidou, Paolo Faraboschi, Bernhard Fechner, Guang R. Gao, Arne Garbade, Rahul Gayatri, Sylvain Girbal, Daniel Goodman, Behram Khan, Souad Koliai, Joshua Landwehr, Nhat Minh Lê, Feng Li, Mikel Luján, Avi Mendelson, Laurent Morin, Nacho Navarro, Tomasz Patejko, Antoniu Pop, Pedro Trancoso, Theo Ungerer, Ian Watson, Sebastian Weis, Stéphane Zuckerman, Mateo Valero
Journal of Microprocessors and Microsystems: Embedded Hardware Design (MICPRO), April 2014
2013
- Fault Localization in NoCs Exploiting Periodic Heartbeat Messages in a Many-Core Environment
Arne Garbade, Sebastian Weis, Bernhard Fechner, Theo Ungerer
Proceedings of the 27th International Symposium on Parallel & Distributed Processing Workshops and PhD Forum (CASS 2013), Boston, USA, pages 791-795
- Impact of Message-Based Fault Detectors on a Network on Chip
Arne Garbade, Sebastian Weis, Sebastian Schlingmann, Bernhard Fechner, Theo Ungerer
Proceedings of the 21st International Euromicro Conference on Parallel, Distributed and Network-based Processing (PDP 2013), pages 470-477
2012
- Fault Coverage of a Timing and Control Flow Checker for Hard Real-Time Systems
Julian Wolf, Bernhard Fechner, Theo Ungerer
Proceedings of the 18th IEEE International On-Line Testing Symposium (IOLTS '12), Sitges, Spain, pages 161-163
- Fine-Grained Timing and Control Flow Error Checking for Hard Real-Time Task Execution
Julian Wolf, Bernhard Fechner, Sascha Uhrig, Theo Ungerer
Proceedings of the 7th IEEE International Symposium on Industrial Embedded Systems (SIES '12), Karlsruhe, Germany, pages 257-266
- Simulating the Future kilo-x86-64 core Processors and their Infrastructure
Antoni Portero, Alberto Scionti, Zhibin Yu, Paolo Faraboschi, Caroline Concatto, Luigi Carro, Arne Garbade, Sebastian Weis, Theo Ungerer, Roberto Giorgi
2012 Spring Simulation Multiconference (SpringSim 2012)
- Fault Localization in NoCs by Timed Heartbeats
Bernhard Fechner, Arne Garbade, Sebastian Weis, Theo Ungerer
Proceedings of the 8th Workshop on Dependability and Fault Tolerance (ARCS / VERFE 2012), LNI 200, pages 191-200
2011
- A Fault Detection and Recovery Architecture for a Teradevice Dataflow System
Sebastian Weis, Arne Garbade, Julian Wolf, Bernhard Fechner, Avi Mendelson, Roberto Giorgi, Theo Ungerer
Proceedings of the First Workshop on Data-Flow Execution Models for Extreme Scale Computing (DFM 2011), pages 38-44
- OC Techniques Applied to Solve Reliability Problems in Future 1000-core Processors
Arne Garbade, Sebastian Weis, Sebastian Schlingmann, Theo Ungerer
Organic Computing: A Paradigm Shift for Complex Systems, pages 575-577
- Towards Fault Detection Units as an Autonomous Fault Detection Approach for Future Many-Cores
Sebastian Weis, Arne Garbade, Sebastian Schlingmann, Theo Ungerer
Proceedings of the 1st Workshop on Software-Controlled, Adaptive Fault-Tolerance in Microprocessors (SCAFT 2011) at the 24th International Conference on Architecture of Computing Systems (ARCS 2011), pages 20-23
- Connectivity-sensitive Algorithm for Task Placement on a Many-core Considering Faulty Regions
Sebastian Schlingmann, Arne Garbade, Sebastian Weis, Theo Ungerer
Proceedings of the 19th International Euromicro Conference on Parallel, Distributed and Network-based Processing (PDP 2011)
2010
- Fault Detection and Reliability Techniques for Future Many-Cores
Sebastian Weis, Arne Garbade, Faruk Bagci, Theo Ungerer
Poster Abstracts of the 6th International Summer School on Advanced Computer Architecture and Compilation for High-Performance and Embedded Systems (ACACES 2010), pages 175-178