A New Approach to Reduce Control Messages in Failure Detectors / UMA NOVA ABORDAGEM PARA REDUÇÃO DE MENSAGENS DE CONTROLE EM DETECTORES DE DEFEITOS

AUTHOR(S)
PUBLICATION DATE

2006

ABSTRACT

An unreliable failure detector is a basic building block widely used to implement fault tolerance techniques in asynchronous distributed systems. Failure detectors are needed because deterministic agreement protocols cannot be implemented in these environments, since a crashed process cannot be distinguished from a very slow one. However, the massive use of distributed computational resources calls for solutions that apply to large-scale distributed systems. In these systems, traditional failure detector algorithms can present scalability problems such as control message explosion: a large number of control messages can compromise both the quality of service of the failure detectors and the scalability of the system. The goal of this dissertation is to minimize the control message explosion generated by failure detector algorithms when monitoring processes at large scale. To this end, we propose a new approach that reduces the number of control messages by reusing messages. Our approach manipulates the interrogation period or the heartbeat period to maximize message reuse, and it comprises two strategies: ATF (Frequency Rate Adaptation), which reuses failure detector messages to suppress control messages, and AMA (Reusing of Application Message), which reuses client application messages to suppress control messages. The resulting approach is generic, in the sense that it can be applied to any failure detector algorithm, and practical, in the sense that traditional failure detector algorithms only need to change the semantics of their control messages to adopt it. Our experimental results show that the approach reduces the number of control messages, mitigating the message explosion problem, without compromising the quality of service of the failure detector.
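To make the idea of suppressing control messages through reuse more concrete, here is a minimal sketch in the spirit of AMA for a heartbeat-style detector. It is not the dissertation's algorithm. It assumes that the monitoring side treats any message from a process as evidence that the process is alive, so an application message sent to a peer within the current heartbeat period makes the explicit heartbeat redundant. The class `ReusingHeartbeatSender` and its methods `send_app` and `tick` are hypothetical names introduced only for illustration, and time is modeled as plain integers.

```python
from dataclasses import dataclass, field


@dataclass
class ReusingHeartbeatSender:
    """Hypothetical heartbeat sender that reuses application messages.

    Assumption: the monitor accepts any message from this process as a
    liveness signal, so a recent application message can replace a heartbeat.
    """
    peers: list[str]
    period: int  # heartbeat period, in hypothetical integer time units
    last_sent: dict[str, int] = field(default_factory=dict)
    heartbeats_sent: int = 0

    def send_app(self, peer: str, now: int) -> None:
        # An application message also refreshes the peer's view of this process.
        self.last_sent[peer] = now

    def tick(self, now: int) -> list[str]:
        """Called once per period; returns the peers that received a heartbeat."""
        targets = []
        for peer in self.peers:
            last = self.last_sent.get(peer)
            # Send a heartbeat only if nothing reached this peer in the last period.
            if last is None or now - last >= self.period:
                self.last_sent[peer] = now
                self.heartbeats_sent += 1
                targets.append(peer)
        return targets


sender = ReusingHeartbeatSender(peers=["p2", "p3"], period=10)
print(sender.tick(10))   # ['p2', 'p3']: nothing sent yet, so both get heartbeats
sender.send_app("p2", 15)
print(sender.tick(20))   # ['p3']: the application message at t=15 covers p2
sender.send_app("p2", 25)
print(sender.tick(30))   # ['p3']
print(sender.heartbeats_sent)  # 4, versus 6 without reuse
```

Because heartbeats are only checked at each tick, the gap between two messages to the same peer can approach two periods in this sketch (for example, an application message just after one tick and none afterwards). A real detector would either account for this in its timeout or schedule the next heartbeat relative to the last message actually sent.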

SUBJECT(S)

message explosion, failure detectors, reuse of messages, fault tolerance, production engineering, asynchronous distributed systems
