Error Handling in Distributed Systems II: Coming Back from the Dead

Here's yet more blather about exception handling. The last one, I promise! (For now...)

My last post described how I like to coordinate exception handling inside the components of a distributed system. A component handles the failure of the remote services it uses - database servers, for example - by sending back appropriate error responses or rolling back distributed transactions. That's all well and good when a component runs within an application server, because the server handles all the messy details of fail-over and reconnection for you. But what about "main apps": plain old Java programs that run from a main method?

It can take a lot of code to correctly handle connection failure and reconnection, back off connection attempts exponentially, clean up long-lived objects that hold onto connections, and hide all the messy technical details from the business logic behind domain-term interfaces. It's also hard to get all the corner cases right.

It's much easier to just not bother with reconnection at all.

When a main app catches an EnvironmentException, it should roll back transactions, send back response codes, or do whatever else it needs to do, and then, instead of trying to reconnect, just die: exit the process. Launch the app from a supervisor process that restarts it whenever it dies. The Java Service Wrapper does the job very nicely.
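Here is a minimal sketch of what that might look like in a main app. Everything in it is made up for illustration: the Worker interface, the exit status of 1 and the demo worker are assumptions, and EnvironmentException is a stand-in for the exception class from the previous post.

```java
public class MainApp {

    /** Hypothetical stand-in for the EnvironmentException from the previous post. */
    public static class EnvironmentException extends RuntimeException {
        public EnvironmentException(String message) {
            super(message);
        }
    }

    /** Hypothetical seam between the technical plumbing and the business logic. */
    public interface Worker {
        void connect();             // start-up code: the only place connections are opened
        boolean processNext();      // returns false once there is no more work to do
        void abandonCurrentWork();  // roll back transactions, send error responses
    }

    /** Runs the worker and returns the exit status the process should die with. */
    public static int run(Worker worker) {
        try {
            worker.connect();
            while (worker.processNext()) {
                // keep processing until the work runs out or the environment fails
            }
            return 0;
        } catch (EnvironmentException e) {
            System.err.println("Environment failure, giving up: " + e.getMessage());
            try {
                worker.abandonCurrentWork();
            } catch (RuntimeException cleanupFailure) {
                // We are about to exit anyway, so just report it.
                System.err.println("Clean-up also failed: " + cleanupFailure);
            }
            return 1; // assumed convention: non-zero tells the supervisor to restart us
        }
    }

    public static void main(String[] args) {
        // Demo worker that loses its "database" while handling the third request.
        Worker demo = new Worker() {
            private int handled = 0;

            public void connect() {
                System.out.println("connected");
            }

            public boolean processNext() {
                if (handled == 2) {
                    throw new EnvironmentException("database connection lost");
                }
                handled++;
                return true;
            }

            public void abandonCurrentWork() {
                System.out.println("rolled back request " + (handled + 1));
            }
        };
        // No reconnection attempt: report the failure through the exit status and die.
        System.exit(run(demo));
    }
}
```

Note that there is no retry loop anywhere. The only code that opens connections is connect(), and it only ever runs at start-up.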

Now you don't need to write any reconnection logic at all. The application's start-up code is enough.
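To show how little the supervisor side has to do, here is a toy restart loop. It is only a sketch of the idea and not how the Java Service Wrapper works or is configured: the Supervisor class, the Launcher interface, the restart limit and the fixed delay are all assumptions made for this example.

```java
import java.io.IOException;
import java.util.Arrays;

public class Supervisor {

    /** Hypothetical abstraction: start the app, wait for it to die, return its exit status. */
    public interface Launcher {
        int launchAndWait() throws IOException, InterruptedException;
    }

    /**
     * Relaunches the app until it exits cleanly (status 0) or the restart
     * limit is reached. Returns the exit status of the final run.
     */
    public static int supervise(Launcher launcher, int maxRestarts, long restartDelayMillis)
            throws IOException, InterruptedException {
        int restarts = 0;
        while (true) {
            int status = launcher.launchAndWait();
            if (status == 0 || restarts == maxRestarts) {
                return status;
            }
            restarts++;
            System.err.println("App died with status " + status + ", restart #" + restarts);
            Thread.sleep(restartDelayMillis); // fixed pause so a broken app cannot spin the CPU
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("usage: java Supervisor <command> [args...]");
            return;
        }
        // Run the given command as a child process, sharing this process's stdin/stdout/stderr.
        final ProcessBuilder builder = new ProcessBuilder(Arrays.asList(args)).inheritIO();
        int finalStatus = supervise(() -> builder.start().waitFor(), 10, 1000L);
        System.exit(finalStatus);
    }
}
```

It could be invoked as something like `java Supervisor java -cp app.jar MainApp`, where app.jar is a placeholder. A real supervisor also has to deal with hung processes, logging and running as a system service, which is why an existing tool is the better choice.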

This greatly simplifies writing distributed Java apps. And a simple system is more reliable, easier to secure and easier to change.

Mistaeks I Hav Made
