IN-FEED-AD

Fault Tolerance In Cloud Computing | Cloud Tutorial | Pywix Classes

What is fault Tolerance?

Fault Tolerance is a methodology to system design that permits a system to keep performing actually when one of its parts falls flat or it can be defined as capacity of a system to react fastly to an unexpected equipment or programming break down. If not fully operational, fault tolerance solutions may allow a system to continue operating at reduced capacity rather than shutting down completely.

Metrics for Fault Tolerance in Cloud Computing

The existing fault tolerance technique in cloud computing considers various parameters: throughput, response-time, scalability, performance, availability, usability, reliability, security and associated over-head.


Types of fault tolerance:-

  • Reactive fault tolerance:- Reactive fault tolerance techniques are used to reduce the impact of failures on a system when the failures have actually occurred. Techniques based on this policy are checkpoint/Restart and retry and so on.
    • Check pointing/Restart- The failed task is restarted from the recent checkpoint rather than from the beginning. It is an efficient technique for large applications.
    • Replication: In order to make the execution succeed, various replicas of task are run on different resources until the whole replicated task is not crashed. Hadoop and Amazon EC2 are used for implementing replication.
    • Job migration: On the occurrence of failure, the job is migrated to a new machine. HAProxy can be used for migrating the jobs to other machines.
    • Retry: This task level technique is simplest among all. The user resubmits the task on the same cloud resource.
    • Task Resubmission: The failed task is submitted again either to the same machine on which it was operating or to some other machine.

Related Other Post

  • Proactive Fault Tolerance: -Proactive fault tolerance predicts the faults proactively and replace the suspected components by other working components thus avoiding recovery from faults and errors.
    • Software Renew:-the system is planned for periodic reboots and every time the system starts with a new state.
    • self-healing:- Failure of an instance of an application running on multiple virtual machines is controlled automatically.
    • Preemptive Migration:- In this technique an application is constantly observed and analyzed. Preemptive migration of a task depends upon feed-back-loop control mechanism.

Ask question #Pywix

Please Like, Comment, Share and Subscribe THANK YOU!

Post a Comment

2 Comments

  1. This is really an important blog with many helpful information. I have been searching for a long time for this types of content. Keep up posting more and thanks for your great staff.

    Check The Details About CCAC Course Here

    ReplyDelete

if u have any doubts please let me know,