How to Handle Failed Transactions in Microservices

Many challenges occur when you migrate to a microservices architecture from a monolithic application. These are related to the complexities of a distributed system.

A database operation that a monolithic application handles with a single local transaction becomes a complicated distributed transaction problem when you move to a microservices architecture. In this article, we investigate what causes this, look at some possible solutions, and outline some best practices to follow.

Monolithic applications mainly rely on the ACID transactions provided by a relational database management system (RDBMS). ACID stands for the following four properties.

Atomicity: All operations are either executed successfully or all of them fail together.

Consistency: Every transaction takes the database from one valid state to another, respecting its constraints, such as referential integrity.

Isolation: Separate transactions running concurrently do not interfere with one another. Each transaction runs as if in its own isolated environment, and other transactions cannot see its changes until it is committed.

Durability: Once a transaction is committed, the changes are stored in a durable medium such as a disk, so that any temporary crash of the database server will not cause data loss.
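As a minimal sketch of what atomicity looks like with a local transaction, the example below uses Python's built-in sqlite3 module. The accounts table, the transfer function, and the account names are hypothetical and only serve as an illustration. Both updates are committed together, or neither is.

```python
import sqlite3


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> None:
    """Move `amount` between two rows of a hypothetical `accounts` table."""
    # Used as a context manager, the connection commits if the block
    # succeeds and rolls back if the block raises an exception.
    with conn:
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
        )
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
        )
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE id = ?", (src,)
        ).fetchone()
        if balance < 0:
            # Raising here undoes both updates above.
            raise ValueError("insufficient funds")


# Illustrative setup: an in-memory database with two accounts.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()

try:
    transfer(conn, "alice", "bob", 500)  # more than alice has
except ValueError:
    pass  # the failed transfer left both balances unchanged
```

Once the two accounts live in the databases of two different services, no single connection can provide this all-or-nothing behavior, which is the problem the rest of this article deals with.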

In a monolithic system, these ACID transactions help to maintain the integrity and correctness of the data. In most typical monolithic systems, the data and its processing reside on a single database server. However, when you need to scale up the system because of growing requirements for data access, read/write scaling, or storage capacity, this architecture model falls apart.

To address these challenges and avoid performance bottlenecks in microservices, many organizations are turning to patterns like Circuit Breakers. Circuit Breakers act as a safeguard against failures in distributed systems by preventing cascading issues across services. When a service becomes unresponsive or encounters an error, the Circuit Breaker can "trip" and halt further calls to the problematic service, allowing the system to maintain stability and recover without overloading the network or causing further delays.

Data Modeling in Microservices Architecture

A microservices architecture model has to be both loosely coupled and cohesive. Therefore, microservices should not share databases in a strict sense. If a microservice cannot have a database of its own, then this microservice should be merged with another one.

A key principle in maintaining loose coupling and cohesion in a microservices architecture is ensuring that microservices communicate through well-defined interfaces, rather than sharing a common database. Microservices communication typically happens via API calls, message queues, or event streaming, enabling each service to operate independently while still collaborating within the system.

For data consistency across microservices, solutions based on two-phase commit (2PC) are sometimes used. They should be used with caution, because a typical 2PC transaction holds locks in the backend databases for longer, and the lock time grows with every extra network hop between the participants. Fortunately, a majority of real-life workflows do not need ACID guarantees. In such workflows, an incorrect or incomplete action can be reversed by applying its opposite, a compensating action (crediting a payment back to a credit card, adding the product count back to the product inventory, etc.).
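The idea of reversing completed work can be sketched as follows. This is a simplified illustration, not a full implementation: run_with_compensation and the inventory/payment functions are hypothetical names, and a real system would also have to persist progress and cope with a compensation that itself fails.

```python
from typing import Callable, List, Tuple

# Each step pairs an action with the action that undoes it.
Step = Tuple[Callable[[], None], Callable[[], None]]


def run_with_compensation(steps: List[Step]) -> bool:
    """Run the actions in order; if one fails, undo the completed ones in reverse."""
    completed: List[Callable[[], None]] = []
    for action, compensate in steps:
        try:
            action()
        except Exception:
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensate)
    return True


# Hypothetical order workflow: reserve stock, then charge the card.
state = {"stock": 5, "charged": 0}


def reserve_stock() -> None:
    state["stock"] -= 1


def release_stock() -> None:
    state["stock"] += 1  # adds the product count back to the inventory


def charge_card() -> None:
    raise RuntimeError("card declined")  # simulated payment failure


def refund_card() -> None:
    state["charged"] = 0


ok = run_with_compensation([(reserve_stock, release_stock), (charge_card, refund_card)])
```

Because the payment step fails, the already completed stock reservation is released again, and the workflow reports failure instead of leaving the data half-updated.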

What Can Go Wrong

A microservices architecture has many moving components and therefore more points of failure. Failures can have many causes, such as the release of new code, errors and exceptions in code, bad deployments, data center or hardware failures, poor architecture, communication over an unreliable network, a lack of unit tests, and failures in dependent services.

Making Services Resilient

Distributed applications have an inherent problem in that they communicate over a network, which by nature may be unreliable. Hence, it is vital to design your microservices so that they are fault-tolerant and handle failures gracefully. In a microservices architecture model, there may be several services talking with one another and hence you have to ensure that a failed service does not bring down the entire microservices system.

Other Ways of Handling Partial Failures

Using Asynchronous Communication

A common mistake is to create long chains of synchronous HTTP calls across internal microservices. Such a design will eventually cause bad outages, because a single slow or failed service stalls every caller further up the chain. Instead, you could use message-based communication across internal microservices. Apart from the front-end communication between the client applications and the first level of microservices or fine-grained API Gateways, it is recommended to use only asynchronous (message-based) communication across the internal microservices (i.e., once past the initial request/response cycle). Designing for eventual consistency and event-driven architectures can help to minimize ripple effects. These approaches give the microservices a greater level of autonomy and prevent the problem described above.
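The difference from a synchronous call chain can be sketched with an in-process queue. In this illustration, asyncio.Queue merely stands in for a real message broker, and the service names and the OrderPlaced event are hypothetical.

```python
import asyncio
from typing import List, Tuple


async def order_service(bus: asyncio.Queue) -> str:
    # Publish an event and return right away, without waiting for
    # downstream services to finish their work.
    await bus.put({"type": "OrderPlaced", "order_id": 42})
    return "accepted"


async def inventory_service(bus: asyncio.Queue, reserved: List[int]) -> None:
    # Consume events at its own pace, independently of the publisher.
    while True:
        event = await bus.get()
        if event["type"] == "OrderPlaced":
            reserved.append(event["order_id"])
        bus.task_done()


async def demo() -> Tuple[str, List[int]]:
    bus: asyncio.Queue = asyncio.Queue()  # stand-in for a message broker
    reserved: List[int] = []
    consumer = asyncio.create_task(inventory_service(bus, reserved))
    status = await order_service(bus)
    await bus.join()  # demo only: wait until the event has been handled
    consumer.cancel()
    try:
        await consumer
    except asyncio.CancelledError:
        pass
    return status, reserved


status, reserved = asyncio.run(demo())
```

The order service answers as soon as the event is queued. If the inventory service were slow or temporarily down, the event would wait in the queue rather than fail the original request.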

Using Retries with Exponential Backoff

This technique is used to recover from short, intermittent failures by retrying a call a limited number of times, waiting longer before each new attempt. It is appropriate when the service was unavailable only for a short period, for example because of intermittent network issues or because a service/container was moved to a different node in a cluster. If retries are not designed carefully and combined with circuit breakers, they can aggravate ripple effects and may even cause an eventual Denial of Service (DoS).
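A minimal sketch of the technique is shown below. The function name, the default values, and the choice of ConnectionError as the retryable error are assumptions made for illustration. The random jitter is an addition that is commonly paired with backoff so that many clients do not retry at the same instant.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 4,      # assumed default, tune per service
    base_delay: float = 0.1,    # seconds before the first retry (upper bound)
    max_delay: float = 2.0,     # cap so that delays do not grow without limit
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `call` on ConnectionError, doubling the delay bound each time."""
    for attempt in range(max_attempts):
        try:
            return call()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            bound = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, bound))  # random jitter within the bound
    raise ValueError("max_attempts must be at least 1")
```

Only errors that are likely to be transient should be retried, and the number of attempts should stay small. Retrying a service that is really down only adds load, which is where the circuit breaker comes in.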

Working Around Network Timeouts

In general, clients should be designed to always use timeouts when waiting for a response and not to block indefinitely. Using timeouts makes sure that resources are not tied up indefinitely.
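As one way to illustrate this, the sketch below bounds a call with asyncio.wait_for from Python's standard library. The slow_service coroutine and the timeout values are hypothetical stand-ins for a real remote call.

```python
import asyncio


async def slow_service() -> str:
    # Hypothetical remote call that takes 0.2 seconds to answer.
    await asyncio.sleep(0.2)
    return "response"


async def call_with_timeout(timeout_s: float) -> str:
    try:
        # wait_for cancels the pending call once the timeout expires,
        # so the caller never blocks indefinitely.
        return await asyncio.wait_for(slow_service(), timeout=timeout_s)
    except asyncio.TimeoutError:
        return "timed out"


fast_result = asyncio.run(call_with_timeout(1.0))    # enough time to answer
slow_result = asyncio.run(call_with_timeout(0.05))   # gives up early
```

The timeout value is a trade-off: too short and healthy but busy services are treated as failed, too long and threads or connections stay tied up while the caller waits.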

Using the Circuit Breaker Pattern

The circuit breaker pattern helps to make your service more resilient. You wrap a protected function call inside a circuit breaker object that looks out for failures. When the failures reach a specific threshold, the circuit breaker trips, and any further call to the circuit breaker returns an error, an alternative service, or a default message, without the protected function call being made in the first place. This keeps the system responsive, because callers fail fast instead of waiting on a service that is known to be unhealthy. The circuit breaker has three distinct states: closed, open, and half-open.

Closed

The circuit breaker remains ‘closed’ when all is normal, and all calls pass through to the services. Once the number of failures exceeds a predetermined threshold, the breaker trips and moves to the ‘open’ state.

Open

The circuit breaker returns an error for function calls without executing the function.

Half-Open

After a specified timeout period, the circuit switches to ‘half-open’ to test if the underlying problem persists. If even a single call fails in this state, the circuit breaker trips once again. If the call succeeds, the circuit breaker resets and returns to the ‘closed’ state.
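The three states described above can be sketched as a small class. This is an illustrative sketch rather than a production implementation: the class and parameter names are hypothetical, it counts consecutive failures (one possible choice), and it is not thread-safe.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 3,     # assumed default
        reset_timeout: float = 30.0,    # seconds to stay open, assumed default
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.state = "closed"
        self.failures = 0
        self.opened_at = 0.0

    def call(self, func: Callable[[], T]) -> T:
        if self.state == "open":
            if self.clock() - self.opened_at < self.reset_timeout:
                # Fail fast without invoking the protected function.
                raise CircuitOpenError("circuit is open")
            self.state = "half-open"  # timeout elapsed: allow a trial call
        try:
            result = func()
        except Exception:
            self._record_failure()
            raise
        # A success closes the circuit and clears the failure count.
        self.state = "closed"
        self.failures = 0
        return result

    def _record_failure(self) -> None:
        self.failures += 1
        # A failed trial call, or reaching the threshold, trips the breaker.
        if self.state == "half-open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = self.clock()
```

A caller would wrap each remote call, for example breaker.call(lambda: client.get_price(product_id)) with a hypothetical client object, and handle CircuitOpenError by returning an error or a fallback value.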

Rather than writing this logic yourself, you can use latency and fault tolerance libraries that are designed to isolate services, points of access to remote systems, and 3rd-party libraries in a distributed environment.

Providing Fallbacks

Here, the client process is designed to perform fallback logic when a request fails, such as returning cached data or a default value. This approach suits queries, where some response has to be given, but it becomes more complex for updates/commands.
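For a query, the fallback logic can be sketched as below. The function name, the in-memory dict used as a cache, and the price lookup are hypothetical and only serve as an illustration.

```python
from typing import Callable, Dict, Optional


def get_price_with_fallback(
    fetch: Callable[[str], float],   # hypothetical call to a remote price service
    cache: Dict[str, float],         # last known good values
    product_id: str,
    default: Optional[float] = None,
) -> Optional[float]:
    try:
        price = fetch(product_id)
    except Exception:
        # The request failed: serve the cached value if there is one,
        # otherwise the default.
        return cache.get(product_id, default)
    cache[product_id] = price  # remember the fresh value for later fallbacks
    return price


def failing_fetch(product_id: str) -> float:
    raise ConnectionError("price service unavailable")  # simulated outage


cache = {"sku-1": 9.99}
cached_price = get_price_with_fallback(failing_fetch, cache, "sku-1")
default_price = get_price_with_fallback(failing_fetch, cache, "sku-2", default=0.0)
```

The same trick does not carry over directly to updates/commands, since a cached or default answer cannot stand in for a write that never happened.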

Limiting the Number of Queued Requests

Clients should also be designed to impose an upper limit on the number of outstanding requests that a client microservice can send to a particular service. When this limit has been reached, the client should not be able to make additional requests, and such attempts should fail immediately. The Bulkhead Isolation policy of Polly, a .NET resilience library, can be used to implement this requirement. It is essentially a parallelization throttle that can also permit a "queue" outside the bulkhead. This lets you proactively shed excess load before execution even begins. It can respond faster than a circuit breaker in certain failure scenarios, because a circuit breaker waits for failures to happen.
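The same idea can be sketched in a few lines of Python. This is not Polly's API: the Bulkhead class below is a hypothetical illustration built on a standard-library semaphore, and it has no queue, so excess calls are rejected immediately.

```python
import threading
from typing import Callable, TypeVar

T = TypeVar("T")


class BulkheadFullError(Exception):
    """Raised when the limit of outstanding requests has been reached."""


class Bulkhead:
    def __init__(self, max_concurrent: int) -> None:
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def call(self, func: Callable[[], T]) -> T:
        # Non-blocking acquire: reject at once instead of waiting for a slot.
        if not self._slots.acquire(blocking=False):
            raise BulkheadFullError("too many outstanding requests")
        try:
            return func()
        finally:
            self._slots.release()  # free the slot whether the call succeeded or not
```

Using one bulkhead per downstream service keeps a single slow dependency from using up all of the client's threads or connections.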

Conclusion

Transactions in distributed systems bring their own challenges, and solving them can introduce a new set of problems.

How SayOne can help in Microservices Development

At SayOne, we design and implement microservices systems that do not have complex architectural layers, and this enables the services to deliver exceptionally fast performance. Moreover, we provide services that are significantly decoupled, allowing you to launch independent services and not end up with the usual inter-dependent microservices that work more or less like a monolith.

We design the microservices with the margin needed to transition your organization’s legacy architecture into the new system, as well as to expand into the cloud. Our microservices comprise lightweight code, and we provide competitive pricing options for our clients.

Our microservices are built according to the latest international security guidelines that ensure the complete safety of all the data. We also ensure that we deliver the services within stipulated deadlines and we always assure a quick turnaround time for our clients. Equipped with the best infrastructure and the latest tools and technologies, our expert developers will provide you with the best microservices that are easily scalable, enabling a good ROI in the shortest period.