Gesellschaft für Informatik e.V.

Lecture Notes in Informatics

ARCS 2012 Workshops P-200, 191-200 (2012).

Gesellschaft für Informatik, Bonn

Copyright © Gesellschaft für Informatik, Bonn


Fault localization in NoCs by timed heartbeats

Bernhard Fechner, Arne Garbade, Sebastian Weis and Theo Ungerer


Future computing systems will contain more and more cores on a single die. Permanent faults occur not only during manufacturing but may also arise at runtime. To detect these faults, a group of cores is monitored by a single unit that receives heartbeats from all cores. In this paper, we present a simple method to localize permanent faults in a 2D mesh-based NoC by using heartbeats and by measuring their travel time from source (core) to destination (monitoring unit). We introduce a heartbeat network alongside the normal application message network to guarantee deterministic heartbeat timing and to avoid interference with application messages. If the time for a heartbeat exceeds a given interval, it can be concluded that the heartbeat is missing or delayed, e.g. because of a faulty core, link or router. As this alone is not sufficient to localize a fault, we introduce the concept of Timed Heartbeats, which uses routing directions contrary to the intended routing to introduce a fixed, additional delay for rerouted heartbeats. This delay helps to localize the fault without any additional bandwidth consumption.
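
The timing idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the per-hop latency, the fixed detour delay, the timeout, the Manhattan-distance hop count, and all names below are assumptions chosen only for illustration. The sketch shows how a monitoring unit might compare an observed heartbeat arrival time with the fault-free expectation and read the surplus as a number of reroutes.

```python
# Hypothetical parameters; the abstract does not give concrete values.
HOP_CYCLES = 2        # assumed latency per router hop
DETOUR_CYCLES = 5     # assumed fixed extra delay added per reroute
TIMEOUT_CYCLES = 100  # assumed interval after which a heartbeat counts as missing


def expected_latency(core, monitor):
    """Fault-free latency, assuming a shortest path in a 2D mesh
    (hop count = Manhattan distance between the (x, y) positions)."""
    hops = abs(core[0] - monitor[0]) + abs(core[1] - monitor[1])
    return hops * HOP_CYCLES


def classify_heartbeat(core, monitor, observed):
    """Classify one heartbeat by its observed arrival time in cycles.

    Returns (status, reroutes); reroutes is None when it cannot be derived.
    """
    if observed is None or observed > TIMEOUT_CYCLES:
        return ("missing", None)
    extra = observed - expected_latency(core, monitor)
    if extra == 0:
        return ("on_time", 0)
    if extra > 0 and extra % DETOUR_CYCLES == 0:
        # The surplus is a whole multiple of the fixed detour delay.
        return ("rerouted", extra // DETOUR_CYCLES)
    return ("unexpected", None)


if __name__ == "__main__":
    monitor = (0, 0)
    for observed in (10, 15, None):
        print(observed, classify_heartbeat((3, 2), monitor, observed))
```

Because the detour delay is fixed in this sketch, the surplus over the expected latency maps directly to a reroute count. Turning that count into the position of a faulty link or router would additionally require the routing rules described in the full paper.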


ISBN 978-3-88579-294-9
