Neighbour replica affirmative adaptive failure detection and autonomous recovery
Main Author: | |
---|---|
Format: | Thesis |
Language: | English |
Published: | 2012 |
Subjects: | |
Online Access: | http://eprints.uthm.edu.my/2475/1/24p%20AHMAD%20SHUKRI%20MOHD%20NOOR.pdf http://eprints.uthm.edu.my/2475/2/AHMAD%20SHUKRI%20MOHD%20NOOR%20COPYRIGHT%20DECLARATION.pdf http://eprints.uthm.edu.my/2475/3/AHMAD%20SHUKRI%20MOHD%20NOOR%20WATERMARK.pdf http://eprints.uthm.edu.my/2475/ |
Summary: | High availability is an important property of current distributed systems. The trend
in current distributed systems such as grid computing and cloud computing is the
delivery of computing as a service rather than a product. Thus, current distributed
systems rely increasingly on highly available systems. The potential for fail-stop failures
in distributed computing systems is a significant disruptive factor for highly
available distributed systems. Hence, a new failure detection approach in a
distributed system called Affirmative Adaptive Failure Detection (AAFD) is
introduced. AAFD uses heartbeats for node monitoring. Subsequently, Neighbour
Replica Failure Recovery (NRFR) is proposed for autonomous recovery in distributed
systems. AAFD can be classified as an adaptive failure detector, since it adapts to
unpredictable network conditions and CPU loads. NRFR combines the advantages
of the neighbour replica distributed technique (NRDT) with weighted
priority selection in order to achieve high availability, since automatic failure
recovery through continuous monitoring is essential in current highly
available distributed systems. The environment is continuously monitored by
AAFD, while automatic reconfiguration of the environment for failure recovery is
managed by NRFR. NRFR and AAFD are evaluated through a virtualisation
implementation. The results showed that AAFD is 30% better than other
detection techniques. For recovery performance, NRFR outperformed the
others, with the sole exception of recovery in the two-replica distributed technique (TRDT).
Subsequently, a realistic logical structure is modelled in a complex and interdependent
distributed environment for NRDT and TRDT. The model predicted that
NRDT availability is 38.8% better than that of TRDT. Thus, the model proved that NRDT is
the ideal replication environment for practical failure recovery in complex distributed
systems. Hence, with the ability to minimise the Mean Time To Repair (MTTR)
significantly and maximise the Mean Time Between Failures (MTBF), this research has
accomplished the goal of providing a highly available, self-sustaining distributed system. |
---|
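
The summary links high availability to a lower MTTR and a higher MTBF. A minimal sketch of that relationship, assuming the standard steady-state definition availability = MTBF / (MTBF + MTTR), is given here. The function name `availability` and the figures used are hypothetical and chosen only for illustration; they are not taken from the thesis and do not reproduce its model for NRDT or TRDT.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: the long-run fraction of time a system is up.

    Assumes the textbook definition MTBF / (MTBF + MTTR); the thesis may use
    a more detailed model.
    """
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR must be non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical figures, for illustration only (not results from the thesis).
baseline = availability(mtbf_hours=1000.0, mttr_hours=10.0)
faster_repair = availability(mtbf_hours=1000.0, mttr_hours=1.0)

print(f"baseline availability:      {baseline:.4f}")
print(f"with shorter repair time:   {faster_repair:.4f}")
```

Under this definition, shortening repair time raises availability even when the failure rate is unchanged, which is why automated detection and recovery target MTTR.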