It's common for software systems to make remote calls to software running in different processes, probably on different machines across a network. One of the big differences between in-memory calls and remote calls is that remote calls can fail, or hang without a response until some timeout limit is reached. What's worse, if you have many callers on an unresponsive supplier, you can run out of critical resources, leading to cascading failures across multiple systems. In his excellent book Release It!, Michael Nygard popularized the Circuit Breaker pattern to prevent this kind of catastrophic cascade.

The basic idea behind the circuit breaker is very simple. You wrap a protected function call in a circuit breaker object, which monitors for failures. Once the failures reach a certain threshold, the circuit breaker trips, and all further calls to the circuit breaker return with an error, without the protected call being made at all. Usually you'll also want some kind of monitor alert if the circuit breaker trips.
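
The wrapping described above can be sketched in a few lines of plain Java. This is a minimal, single-threaded illustration and not the API of any library listed below: the names SimpleCircuitBreaker, CircuitOpenException, call, and reset are hypothetical, the threshold is assumed to count consecutive failures, and the alert is just a line written to standard error.

```java
import java.util.function.Supplier;

/** Minimal circuit breaker sketch; all names here are hypothetical, not a library API. */
class SimpleCircuitBreaker {

    /** Returned as an error to callers while the breaker is open. */
    static class CircuitOpenException extends RuntimeException {
        CircuitOpenException() {
            super("circuit breaker is open; protected call was not made");
        }
    }

    private final int failureThreshold; // assumption: consecutive failures needed to trip
    private int consecutiveFailures = 0;
    private boolean open = false;

    SimpleCircuitBreaker(int failureThreshold) {
        if (failureThreshold < 1) {
            throw new IllegalArgumentException("failureThreshold must be at least 1");
        }
        this.failureThreshold = failureThreshold;
    }

    /** Runs the protected call, or fails fast without running it once the breaker has tripped. */
    <T> T call(Supplier<T> protectedCall) {
        if (open) {
            throw new CircuitOpenException();
        }
        try {
            T result = protectedCall.get();
            consecutiveFailures = 0; // a success clears the failure count
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (consecutiveFailures >= failureThreshold) {
                open = true;
                // Stand-in for a real monitoring alert.
                System.err.println("ALERT: circuit breaker tripped after "
                        + consecutiveFailures + " consecutive failures");
            }
            throw e; // the caller still sees the original failure
        }
    }

    boolean isOpen() {
        return open;
    }

    /** Manual reset; this sketch has no automatic recovery. */
    void reset() {
        open = false;
        consecutiveFailures = 0;
    }
}
```

A caller would create one breaker per remote dependency, for example new SimpleCircuitBreaker(3), and route every request through call(...). The sketch is not thread-safe and only closes again through an explicit reset(); production libraries typically handle both concerns for you.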

Continue reading Martin Fowler's article on circuit breakers for details on implementing fault tolerance.

1 Fault tolerant libraries

There are several fault tolerant implementations in Java including:

  • Netflix Hystrix, a latency and fault tolerance library designed to isolate points of access to remote systems, services and 3rd party libraries, stop cascading failure and enable resilience in complex distributed systems where failure is inevitable.
  • Resilience4j, a lightweight fault tolerance library created by Robert Winkler, inspired by Netflix Hystrix but designed for Java 8 and functional programming.
  • Istio, a service mesh rather than a Java library, which supports managing traffic flows between microservices, enforcing access policies, and aggregating telemetry data, all without requiring changes to the microservice code.

2 Circuit breaker libraries

  • Failsafe, a lightweight, zero-dependency library for handling failures

3 Retry libraries

  • Retry4j, a library to assist with retrying transient failure situations or unreliable code
  • Java retry, which lets developers make their applications more resilient by adding robust transient fault handling logic
  • Guava retrying, a small extension to Google's Guava library to allow for the creation of configurable retrying strategies for an arbitrary function call, such as something that talks to a remote service with flaky uptime
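
To make concrete what these libraries automate, here is a minimal retry sketch in plain Java. It does not use the API of any library above: the SimpleRetry class, the retry method, and its parameters are hypothetical, and doubling the delay after each failed attempt is an assumed strategy chosen for illustration.

```java
import java.util.function.Supplier;

/** Minimal retry sketch with exponential backoff; names are hypothetical, not a library API. */
class SimpleRetry {

    /**
     * Calls the action up to maxAttempts times, sleeping between failed attempts.
     * The delay starts at initialDelayMillis and doubles after each failure (an assumption).
     */
    static <T> T retry(int maxAttempts, long initialDelayMillis, Supplier<T> action) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long delay = initialDelayMillis;
        RuntimeException lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.get(); // success: stop retrying
            } catch (RuntimeException e) {
                lastFailure = e;
                if (attempt < maxAttempts) {
                    sleep(delay);
                    delay *= 2; // back off before the next attempt
                }
            }
        }
        throw lastFailure; // every attempt failed; surface the last error
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new IllegalStateException("interrupted while waiting to retry", ie);
        }
    }
}
```

A call such as SimpleRetry.retry(3, 100, () -> fetchQuote()), where fetchQuote is a placeholder for your own remote call, would try up to three times with pauses of 100 ms and 200 ms. Retrying suits transient faults; when a supplier stays down, combining it with a circuit breaker avoids piling more load onto the failing service.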