
Reactive Systems

 

In July 2013 a group of software engineers led by Jonas Bonér published the Reactive Manifesto, which has since been updated to version 2.0. The design of reactive systems is a growing trend and a recurring topic at software engineering conferences. But what is a reactive system? What does the manifesto proclaim? Let's take it one step at a time.

Increasingly demanding systems

Application requirements have changed dramatically in recent years. Not so long ago, an application regarded as large might have run on dozens of servers, stored gigabytes of data and responded in seconds. These systems also required hours of offline maintenance.

Nowadays, applications are deployed on a huge variety of devices, from cell phones to cloud computing clusters with thousands of multi-core processors. Data is measured in petabytes, and users expect millisecond response times and 100% uptime.

In this situation, the demands made of today's systems are not adequately met by existing software architectures.

Reactive Manifesto

The manifesto proclaims that a new, more coherent approach to systems architecture is needed, one that ensures systems are:

·        Responsive: The system responds in a timely manner whenever possible, so that problems are detected quickly and handled effectively. Responsive systems focus on providing rapid, consistent response times and a consistent quality of service, which are the cornerstones of usability.

·        Resilient: The system remains responsive in the face of failure thanks to the use of replication, containment, isolation and delegation techniques. Failures are contained within each component, isolating components from each other and therefore ensuring that parts of the system can fail and recover without compromising the entire system. Recovery of each component is delegated to other external components and high availability is ensured by replication where necessary. The client of a component is not burdened with the responsibility for handling its failures.

·        Elastic: The system remains responsive under varying workloads, dynamically increasing or decreasing the resources allocated (CPU, memory, storage, etc.) for the service. This implies designs that have no bottlenecks, resulting in the ability to replicate components and distribute the workload between them. The system must also provide relevant performance measures in real time so that scaling algorithms can be applied predictively and/or reactively.

·        Message driven: The system relies on asynchronous message-passing between components to ensure loose coupling and isolation. The system also provides the means to delegate errors as asynchronous messages.
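The message-driven property can be sketched in a few lines. The example below is only an illustration under stated assumptions: the `brewer` component, its mailbox, the `failures` queue and the order names are all hypothetical, and Python's standard `asyncio.Queue` stands in for whatever messaging layer a real system would use. It shows a sender posting messages without waiting for the receiver, and a failure being delegated as a message rather than raised at the sender.

```python
import asyncio


async def brewer(mailbox: asyncio.Queue, failures: asyncio.Queue, served: list) -> None:
    """Hypothetical component that only reacts to messages in its own mailbox."""
    while True:
        order = await mailbox.get()
        if order is None:  # sentinel message: stop the component
            break
        if order == "decaf":
            # Assumed fault for the demo: delegate it as a message
            # instead of raising an exception at the sender.
            await failures.put(("brewer", order, "decaf hopper is empty"))
            continue
        served.append(order)


async def main():
    mailbox, failures, served = asyncio.Queue(), asyncio.Queue(), []
    task = asyncio.create_task(brewer(mailbox, failures, served))
    for order in ("espresso", "decaf", "latte"):
        mailbox.put_nowait(order)  # the sender does not wait for a reply
    mailbox.put_nowait(None)
    await task
    reported = []
    while not failures.empty():
        reported.append(failures.get_nowait())
    return served, reported


served, reported = asyncio.run(main())
print(served)    # ['espresso', 'latte']
print(reported)  # [('brewer', 'decaf', 'decaf hopper is empty')]
```

The sender knows nothing about the brewer beyond its mailbox, which is what keeps the two loosely coupled; whoever reads the `failures` queue decides what to do about the fault.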

Reactive systems are much more tolerant of failure, and when failure occurs they handle it more elegantly. They are also highly adaptable to the surrounding conditions and can provide effective feedback to users.

An example: the coffee machine (or how to handle errors with elegance)

Error handling is usually treated as a secondary concern and left until the end in most applications. This approach typically creates two problems: errors are poorly isolated and contained, and they are passed directly back to the client for resolution.

Let's suppose a person wants to buy a coffee from a vending machine. The coffee costs 40 cents. If the person inserts only a 20-cent coin, nothing will happen because he has not fulfilled his part of the service contract. So instead of dispensing coffee, the machine will display an error message: “Please insert another 20-cent coin.” This is what you would expect. The user of the coffee machine is responsible for fulfilling his part of the service contract. Most applications do a good job of presenting error messages and handling failures at this level (validation errors).

But what if the person inserts another 20-cent coin and the machine doesn't work because the coffee beans are stuck? You wouldn't expect the machine to return a message telling the user to open and dismantle it in order to fix the problem. That is not the user's responsibility (it is an application error). Instead, ideally, the machine would send a notification to the maintenance service requesting the presence of someone to repair it.
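The two kinds of error can be made concrete with a small sketch. Everything here is a hypothetical illustration: the `CoffeeMachine` class, the `maintenance_inbox` list (a stand-in for a real notification channel) and the wording of the messages are assumptions, and only the 40-cent price and the stuck beans come from the example above.

```python
from dataclasses import dataclass, field

PRICE_CENTS = 40  # price from the example above


@dataclass
class CoffeeMachine:
    beans_stuck: bool = False
    inserted: int = 0
    # Assumed stand-in for a notification channel to the maintenance service.
    maintenance_inbox: list = field(default_factory=list)

    def insert_coin(self, cents: int) -> str:
        self.inserted += cents
        missing = PRICE_CENTS - self.inserted
        if missing > 0:
            # Validation error: the user can fix this, so the user is told.
            return f"Please insert another {missing} cents"
        if self.beans_stuck:
            # Application error: the user cannot fix this, so the details
            # go to maintenance and the user only sees a neutral notice.
            self.maintenance_inbox.append("coffee beans are stuck")
            return "Out of order; maintenance has been notified"
        self.inserted -= PRICE_CENTS
        return "Here is your coffee"


machine = CoffeeMachine(beans_stuck=True)
print(machine.insert_coin(20))    # Please insert another 20 cents
print(machine.insert_coin(20))    # Out of order; maintenance has been notified
print(machine.maintenance_inbox)  # ['coffee beans are stuck']
```

The point of the design is the routing: each error goes to the party that is able to act on it.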

The reactive focus: Let it crash

Instead of simply sending error messages to the user (who is not in a position to resolve the problem), a reactive system isolates and contains the error, avoiding a complete failure of the application. It then tries to resolve the error automatically by sending a message to the most suitable receiver: the supervisor of the component that has failed.

In this design-for-failure model, sometimes called “embrace failure” or “let it crash”, errors are not regarded as exceptional but as part of the normal message workflow between the different components of an application.
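A supervisor of this kind might look roughly like the sketch below. It is a deliberately simplified, synchronous illustration and not how any particular framework implements supervision: the `Grinder` worker, the `Supervisor` class, the `dead_letters` list and the "pebble" message that triggers the fault are all hypothetical names chosen for the demo.

```python
class Grinder:
    """Hypothetical worker whose internal state is corrupted by a jam."""

    def __init__(self):
        self.jammed = False

    def handle(self, message: str) -> str:
        if message == "pebble":
            self.jammed = True
        if self.jammed:
            raise RuntimeError("grinder jammed")
        return f"ground {message}"


class Supervisor:
    """Owns a worker and is the single place where its failures are handled."""

    def __init__(self, worker_factory):
        self.worker_factory = worker_factory
        self.worker = worker_factory()
        self.restarts = 0
        self.dead_letters = []

    def deliver(self, message: str):
        try:
            return self.worker.handle(message)
        except Exception:
            # Let it crash: keep the failed message for inspection, discard
            # the broken worker and its state, and start a fresh one.
            self.dead_letters.append(message)
            self.worker = self.worker_factory()
            self.restarts += 1
            return None


supervisor = Supervisor(Grinder)
results = [supervisor.deliver(m) for m in ("arabica", "pebble", "robusta")]
print(results)              # ['ground arabica', None, 'ground robusta']
print(supervisor.restarts)  # 1
```

The worker contains no recovery code at all; the failure is contained to one message, and the component that sent the messages is never asked to repair the grinder.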

Conclusion

The separation between validation errors and application errors is very important and yet it is still missing or at best confused in most applications.

The fact is that application errors should never be sent to the user, but most languages (e.g. Java) push you towards doing just that: exceptions are thrown synchronously back up the call stack to the caller, and try-catch statements are the only tool for handling them. This forces you to program very defensively and be prepared for any method or service call to fail at any time and return an application error.

As a result of this approach, we often see applications in which the error management code is scattered all over the application and tangled with the business logic in an incomprehensible mess.
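For contrast, the tangled style tends to look something like the following sketch, in which Python's try/except plays the role of Java's try-catch. The helpers `charge` and `brew` are hypothetical; the caller has to guard every call, and the application error ends up in front of the user anyway.

```python
def charge(cents: int) -> None:
    """Hypothetical payment step: rejects anything under the 40-cent price."""
    if cents < 40:
        raise ValueError("not enough money")


def brew(beans_stuck: bool) -> str:
    """Hypothetical brewing step that fails when the beans are stuck."""
    if beans_stuck:
        raise RuntimeError("coffee beans are stuck")
    return "coffee"


def buy_coffee(cents: int, beans_stuck: bool = False) -> str:
    # Business logic and error handling are interleaved at every step.
    try:
        charge(cents)
    except ValueError as exc:
        return f"error: {exc}"  # validation error: fine to show the user
    try:
        drink = brew(beans_stuck)
    except RuntimeError as exc:
        # Application error leaks to the user, who cannot act on it.
        return f"error: {exc}"
    return drink


print(buy_coffee(20))        # error: not enough money
print(buy_coffee(40, True))  # error: coffee beans are stuck
```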