Event-Driven Architecture with the Saga Pattern Approach in AWS

Collin Smith
Feb 15, 2023

Event-Driven Architecture is a key concept when designing modern cloud applications with a microservices approach. The core idea is that events, such as an order being placed, drive change within a solution. Services are not tied together in a monolithic manner. Instead, calls are made asynchronously and results are achieved in an eventually consistent manner.

Components of an Event-Driven Architecture

Event Producers, Event Routers, Event Consumers in AWS

Event Producers: Services which generate a stream of events. Event producers can be implemented as straightforward microservices with Lambda or as Kubernetes-based microservices in EKS.

Event Consumers: Services which listen for events. Event consumers can be implemented as straightforward microservices with Lambda or as Kubernetes-based microservices in EKS.

Event Routers: Routing services can be implemented with SQS, SNS, MSK, Step Functions and EventBridge. Event routers define the path and flow of each message within an event-driven solution.

By decoupling producer and consumer services, each can be scaled, updated and deployed independently. This helps control costs and improve performance. An event producer does not need to know what work will eventually be required to process an event, so it does not wait for that processing to finish: it publishes the event and relies on eventual consistency. Essentially, eventual consistency guarantees that if no new events are produced for a given data item, that item will eventually settle to its expected value.
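To make this decoupling concrete, here is a minimal in-process sketch. It assumes a toy router class (`InMemoryEventRouter`, hypothetical and not an AWS API) standing in for SNS, SQS or EventBridge. The producer publishes and returns immediately; the consumer's view of the data only catches up once delivery happens, which is eventual consistency in miniature.

```python
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, List

Event = Dict[str, str]
Handler = Callable[[Event], None]


class InMemoryEventRouter:
    """Toy stand-in for a router such as SNS, SQS or EventBridge (hypothetical, not an AWS API)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)
        self._pending: Deque[Event] = deque()  # published but not yet delivered

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._subscribers[event_type].append(handler)

    def publish(self, event: Event) -> None:
        # The producer only hands the event to the router and returns at once;
        # it neither knows about nor waits for any consumer.
        self._pending.append(event)

    def deliver_all(self) -> None:
        # Stands in for asynchronous delivery that happens some time later.
        while self._pending:
            event = self._pending.popleft()
            for handler in self._subscribers[event["type"]]:
                handler(event)


# Hypothetical consumer: keeps a read model of how many orders each customer placed.
order_counts: Dict[str, int] = defaultdict(int)


def count_order(event: Event) -> None:
    order_counts[event["customer"]] += 1


router = InMemoryEventRouter()
router.subscribe("OrderPlaced", count_order)

router.publish({"type": "OrderPlaced", "customer": "alice"})
router.publish({"type": "OrderPlaced", "customer": "alice"})
print(order_counts["alice"])  # 0: producer is done, consumer has not run yet
router.deliver_all()
print(order_counts["alice"])  # 2: the read model has converged
```

In a real deployment the delivery step is performed by the managed router, typically with at-least-once semantics, so consumers should be written to tolerate duplicate events.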

Key benefits of an Event-Driven Architecture

Agile Development

Event-driven architectures promote loose coupling between publishers and subscribers. Event subscription and routing are independent of the actual producers and consumers. This decoupled approach allows you to develop features faster by reducing complexity.

Cost savings

Event-driven architectures are push-based, which removes the need for continuous polling to check for events. This means less network bandwidth consumption, lower CPU utilization, less idle fleet capacity, and fewer SSL/TLS handshakes. There is also less guesswork involved in capacity planning.

Scale and fail independently

In an event-driven architecture, services are not aware of one another; they are only aware of the events they receive. This allows each service to scale independently without being affected by the load on other services, and a failure in one service does not directly bring down the others. Resource sizing therefore becomes a per-service decision, and each service can adjust dynamically to its own demand.

Auditability

The event routing mechanism is a centralized location where your application can restrict who can publish and subscribe to a router and control which users and resources have permission to access your data. Events can be encrypted at rest and in transit and configured with security policies.

Key Design Patterns for Event-driven solutions

Saga Pattern

The Saga Pattern is an important pattern when implementing event-driven architectures. A saga is a sequence of local transactions, each of which updates data within a single service. This pattern helps manage data consistency across microservices in distributed transaction scenarios. If a step fails, the saga should execute compensating transactions to counteract the preceding transactions.
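The core saga mechanic can be sketched in a few lines. This is a minimal illustration, not a production framework: each step is a hypothetical (name, action, compensation) triple, and the sketch assumes a step that fails has already rolled back its own local transaction, so only the previously completed steps are compensated, in reverse order.

```python
from typing import Callable, List, Tuple

# (name, action, compensation) for one local transaction; all names are hypothetical.
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step], log: List[str]) -> bool:
    """Run local transactions in order; on failure, compensate completed ones in reverse."""
    completed: List[Step] = []
    for name, action, compensate in steps:
        try:
            action()
            log.append(f"done:{name}")
            completed.append((name, action, compensate))
        except Exception:
            log.append(f"failed:{name}")
            # Undo the effects of every step that already committed, newest first.
            for done_name, _, undo in reversed(completed):
                undo()
                log.append(f"compensated:{done_name}")
            return False
    return True


def charge_card() -> None:
    raise RuntimeError("card declined")  # simulated failure of the third step


log: List[str] = []
ok = run_saga(
    [
        ("create_order", lambda: None, lambda: None),
        ("reserve_stock", lambda: None, lambda: None),
        ("charge_card", charge_card, lambda: None),
    ],
    log,
)
print(ok, log)
```

Here the card charge fails, so the stock reservation is released and then the order is cancelled, leaving the system consistent without a distributed lock or two-phase commit.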

There are two ways to coordinate a saga: Choreography or Orchestration.

Saga Choreography

Choreography is a Saga pattern where participants exchange events without a centralized point of control. Each local transaction publishes events for other services to consume.

Choreography is good for simple workflows that involve only a few participants. Its disadvantages are that the flow can become confusing as new steps are added, and that cyclic dependencies between participants can make integration testing difficult.

Saga Choreography (in green)

The figure above is a Saga Choreography as implemented in Santa’s Workshop. Note the lack of an orchestrator and the relatively simple flow.
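The choreographed style can be sketched without any central controller. In this hypothetical example (the services, event names and the "amount over 100 is declined" rule are all illustrative assumptions), each participant only subscribes to the events it cares about and emits new ones. The compensating step, cancelling the order, is simply another participant reacting to a `PaymentFailed` event.

```python
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, List

Event = dict
subscriptions: Dict[str, List[Callable[[Event], None]]] = defaultdict(list)
pending: Deque[Event] = deque()
history: List[str] = []  # every event type delivered, in order


def on(event_type: str):
    """Decorator: register a participant's handler for one event type."""
    def register(handler: Callable[[Event], None]) -> Callable[[Event], None]:
        subscriptions[event_type].append(handler)
        return handler
    return register


def emit(event_type: str, **data) -> None:
    pending.append({"type": event_type, **data})


def deliver_all() -> None:
    while pending:
        event = pending.popleft()
        history.append(event["type"])
        for handler in subscriptions[event["type"]]:
            handler(event)


# Payment service: reacts to new orders and knows nothing about shipping.
@on("OrderCreated")
def take_payment(event: Event) -> None:
    outcome = "PaymentFailed" if event["amount"] > 100 else "PaymentSucceeded"
    emit(outcome, order_id=event["order_id"])


# Shipping service: only acts once payment has succeeded.
@on("PaymentSucceeded")
def ship(event: Event) -> None:
    emit("OrderShipped", order_id=event["order_id"])


# Order service: compensates by cancelling the order when payment fails.
@on("PaymentFailed")
def cancel_order(event: Event) -> None:
    emit("OrderCancelled", order_id=event["order_id"])


emit("OrderCreated", order_id="o-1", amount=50)
emit("OrderCreated", order_id="o-2", amount=500)
deliver_all()
print(history)
```

Note that the overall flow only exists implicitly in the subscriptions, which is exactly why choreography gets harder to follow as more participants are added.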

Saga Orchestration

Saga Orchestration uses a centralized controller, or orchestrator, which manages and coordinates the entire transaction lifecycle. In AWS, this can be achieved with AWS Step Functions.

Saga Orchestration with AWS Step Functions

When building out the Saga Orchestration, consideration must be given to compensating transactions that counteract preceding transactions. If a local transaction fails, a sequence of compensating transactions must be executed to undo the changes made by the preceding local transactions.
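As a rough sketch of how this wiring might look in Step Functions, the following Python builds an Amazon States Language definition and prints it as JSON. The state names and Lambda ARNs are hypothetical. Each forward `Task` state has a `Catch` that routes any error (`States.ALL`) to the compensation for the steps already completed, and the compensations run in reverse order before ending in a `Fail` state.

```python
import json

# Hypothetical Lambda ARN pattern; substitute your own functions.
FN = "arn:aws:lambda:us-east-1:123456789012:function:{}"


def task(name, next_state=None, on_error=None, end=False):
    """Build a Task state; a Task has either Next or End, never both."""
    state = {"Type": "Task", "Resource": FN.format(name)}
    if end:
        state["End"] = True
    else:
        state["Next"] = next_state
    if on_error:
        # Keep the original input and store the error under $.error for the compensations.
        state["Catch"] = [{"ErrorEquals": ["States.ALL"], "ResultPath": "$.error", "Next": on_error}]
    return state


definition = {
    "Comment": "Order saga sketch with compensating transactions",
    "StartAt": "ReserveInventory",
    "States": {
        "ReserveInventory": task("ReserveInventory", "ChargePayment", on_error="SagaFailed"),
        "ChargePayment": task("ChargePayment", "ShipOrder", on_error="ReleaseInventory"),
        "ShipOrder": task("ShipOrder", end=True, on_error="RefundPayment"),
        # Compensating transactions, executed in reverse order of the forward steps.
        "RefundPayment": task("RefundPayment", "ReleaseInventory"),
        "ReleaseInventory": task("ReleaseInventory", "SagaFailed"),
        "SagaFailed": {
            "Type": "Fail",
            "Error": "SagaFailed",
            "Cause": "A local transaction failed and earlier steps were compensated",
        },
    },
}

print(json.dumps(definition, indent=2))
```

In practice the compensating tasks themselves would usually also get `Retry` policies, since a compensation that gives up leaves the saga half undone.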

Compensating transactions

Transactions that undo the effects of previously completed local transactions when the overall saga cannot complete and the system needs to be returned to its initial state. See the gray elements in the Saga Orchestration above. Compensating transactions are a non-trivial part of the development effort, because they are what allow the Saga Pattern to reach eventual consistency after a failure.

Pivot transactions

The transaction which determines whether a saga will succeed. It is the go/no-go point: if the pivot transaction commits, the saga will run to completion. The pivot transaction can be the last compensatable transaction, the first retryable transaction, or a transaction that is neither. See the green elements with question marks in the diagram above.
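The difference the pivot makes can be sketched as follows, assuming each step is tagged as `"compensatable"`, `"pivot"` or `"retryable"` (a hypothetical labelling for illustration). A failure at or before the pivot triggers compensation; once the pivot has committed, later steps are retried instead, because the saga must move forward. The retry count is bounded here only to keep the sketch finite.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Step:
    name: str
    action: Callable[[], None]
    kind: str  # "compensatable", "pivot" or "retryable"
    compensate: Optional[Callable[[], None]] = None


def run(steps: List[Step], max_retries: int = 5) -> List[str]:
    log: List[str] = []
    done: List[Step] = []
    for step in steps:
        if step.kind == "retryable":
            # Past the pivot: keep retrying rather than compensating.
            for attempt in range(1, max_retries + 1):
                try:
                    step.action()
                    log.append(f"done:{step.name}")
                    break
                except Exception:
                    log.append(f"retry:{step.name}#{attempt}")
            else:
                raise RuntimeError(f"{step.name} kept failing; needs manual intervention")
            continue
        try:
            step.action()
            log.append(f"done:{step.name}")
            done.append(step)
        except Exception:
            # At or before the pivot: undo completed steps, newest first.
            for prev in reversed(done):
                if prev.compensate:
                    prev.compensate()
                log.append(f"compensated:{prev.name}")
            log.append(f"aborted:{step.name}")
            return log
    return log


def nop() -> None:
    pass


def declined() -> None:
    raise RuntimeError("card declined")


attempts = {"n": 0}


def flaky_notify() -> None:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("email service unavailable")


ok_log = run([Step("reserve_stock", nop, "compensatable", nop),
              Step("charge_card", nop, "pivot"),
              Step("send_confirmation", flaky_notify, "retryable")])
failed_log = run([Step("reserve_stock", nop, "compensatable", nop),
                  Step("charge_card", declined, "pivot"),
                  Step("send_confirmation", nop, "retryable")])
print(ok_log)
print(failed_log)
```

In the first run the confirmation email fails twice but the saga still completes; in the second the pivot (the card charge) fails, so the stock reservation is compensated and the saga is aborted.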

Retryable transactions

Transactions within the saga should implement some form of retry in case the associated service is unavailable or subject to intermittent outages. This is especially important for third-party APIs that may be unavailable for some time. Availability problems and failure conditions can be mitigated with the Circuit Breaker Pattern or Exponential Backoff.

Circuit Breaker Pattern & Exponential Backoff

The Circuit Breaker Pattern detects failures and encapsulates the logic that prevents a failure from recurring while a service is unavailable. When failures are detected, the circuit is opened, which stops calls from reaching the failing downstream service. After a predetermined period, a call is retried; if it succeeds, the circuit is closed again and events are processed normally. This prevents resource exhaustion and stops failures from rippling through the system.

Circuit Breaker Pattern
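A minimal single-threaded circuit breaker might look like the sketch below. The threshold, timeout and class names are illustrative assumptions, and the clock is injectable so the behaviour can be demonstrated without real waiting. It uses the common three-state model: closed (calls pass), open (calls are rejected immediately) and half-open (after the timeout, a trial call is allowed through).

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means the circuit is closed

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "half-open"  # timeout elapsed: let a trial call through
        return "open"

    def call(self, fn: Callable[[], T]) -> T:
        if self.state == "open":
            raise CircuitOpenError("downstream service marked unavailable")
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.state == "half-open" or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # (re)open the circuit
            raise
        # Success: close the circuit and reset the failure count.
        self.failures = 0
        self.opened_at = None
        return result


now = [0.0]  # fake clock for the demonstration
breaker = CircuitBreaker(failure_threshold=2, reset_timeout=10.0, clock=lambda: now[0])


def down() -> str:
    raise ConnectionError("timeout")


for _ in range(2):
    try:
        breaker.call(down)
    except ConnectionError:
        pass
print(breaker.state)  # open
now[0] = 11.0
print(breaker.state)  # half-open
print(breaker.call(lambda: "ok"), breaker.state)  # ok closed
```

While the circuit is open, `call` fails fast with `CircuitOpenError` instead of tying up resources waiting on a service that is known to be down.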

This pattern can also be used in conjunction with an Exponential Backoff approach, where the time between retries grows with each attempt. For example, the initial retry period might be 1 second, then 2 seconds, then 4 seconds, and so on. The exact numbers depend on the use case, but the principle helps enforce a fair distribution of access to resources and prevents network congestion.
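The 1, 2, 4 second progression can be expressed as a small helper. This is a sketch with illustrative defaults (base of 1 second, factor of 2, a 30 second cap); the sleep function is injectable so the waits can be observed. The optional "full jitter" mode, which waits a random time between zero and the computed delay, is a common refinement not described above, and it is off by default here.

```python
import random
import time
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


def backoff_delays(base: float = 1.0, factor: float = 2.0, max_delay: float = 30.0,
                   attempts: int = 5) -> List[float]:
    """Delay before each retry: base, base*factor, base*factor**2, ... capped at max_delay."""
    return [min(base * factor ** i, max_delay) for i in range(attempts)]


def retry_with_backoff(fn: Callable[[], T], delays: List[float],
                       sleep: Callable[[float], None] = time.sleep,
                       jitter: bool = False) -> T:
    """Call fn immediately, then once more after each delay until it succeeds."""
    last_error: Optional[Exception] = None
    for delay in [0.0] + delays:
        if delay:
            # Optional full jitter spreads retries from many clients over time.
            sleep(random.uniform(0, delay) if jitter else delay)
        try:
            return fn()
        except Exception as err:
            last_error = err
    raise last_error  # every attempt failed


print(backoff_delays(attempts=4))  # [1.0, 2.0, 4.0, 8.0]

waits: List[float] = []
calls = {"n": 0}


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("service unavailable")
    return "ok"


print(retry_with_backoff(flaky, backoff_delays(attempts=4), sleep=waits.append), waits)
# ok [1.0, 2.0]
```

Combined with a circuit breaker, the backoff governs how often a single caller retries, while the breaker decides whether calls should be attempted at all.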

Semantic Locks

A semantic lock is an application-level lock that acts as a flag (semaphore) indicating that an update to a record is in progress. This prevents multiple sagas from acting on the same record and leaving its data inconsistent. The lock can be a field in the data store record indicating that it is in some pending state (ORDER_PENDING, PAYMENT_PENDING, etc.) or otherwise part of an in-flight transaction.

Sagas lack the isolation of a traditional database transaction, and semantic locks help the system avoid the problems this can cause:

  • Lost updates: two or more sagas update the same data, and one overwrites the changes made by another
  • Dirty reads: a saga reads data that another, still-running saga has not finished updating
  • Non-repeatable reads: rereading the same record within a saga does not produce the same result
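A semantic lock boils down to an atomic check-and-set on the record's status. The sketch below uses a hypothetical in-memory `OrderStore` with a mutex to make that check-and-set atomic; against a real database the same idea would be a conditional write (for example, a DynamoDB update with a condition expression), but the class and field names here are illustrative only.

```python
import threading
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Order:
    order_id: str
    status: str = "APPROVED"
    saga_id: Optional[str] = None  # which saga currently holds the semantic lock


class OrderStore:
    """In-memory store whose check-and-set mimics a database conditional write."""

    def __init__(self) -> None:
        self._orders: Dict[str, Order] = {}
        self._mutex = threading.Lock()  # makes each check-and-set atomic

    def put(self, order: Order) -> None:
        with self._mutex:
            self._orders[order.order_id] = order

    def get(self, order_id: str) -> Order:
        with self._mutex:
            return self._orders[order_id]

    def try_lock(self, order_id: str, saga_id: str, pending_status: str) -> bool:
        """Set a *_PENDING status only if no other saga holds the record."""
        with self._mutex:
            order = self._orders[order_id]
            if order.saga_id is not None and order.saga_id != saga_id:
                return False  # another saga is mid-flight on this record
            order.saga_id, order.status = saga_id, pending_status
            return True

    def release(self, order_id: str, saga_id: str, final_status: str) -> None:
        """Clear the lock and record the final status, but only for the lock holder."""
        with self._mutex:
            order = self._orders[order_id]
            if order.saga_id == saga_id:
                order.saga_id, order.status = None, final_status


store = OrderStore()
store.put(Order("o-1"))
print(store.try_lock("o-1", "saga-A", "PAYMENT_PENDING"))  # True
print(store.try_lock("o-1", "saga-B", "PAYMENT_PENDING"))  # False: saga-B must wait or retry
store.release("o-1", "saga-A", "PAID")
print(store.try_lock("o-1", "saga-B", "ORDER_PENDING"))    # True
```

What a saga does when it finds a record locked (fail, wait, or retry with backoff) is a design decision that depends on the business process.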

Applying Event-Driven Architecture to Modernization Efforts

The event-driven architecture and microservices approach lends itself to the Strangler Fig Pattern when migrating and modernizing applications. Modernizing an existing monolithic application starts by placing an API Gateway or AWS Migration Hub Refactor Spaces in front of it, so that all requests are forwarded to the monolith through this new “front door”. New microservices can then be created to gradually replace parts of the monolith, as described by the Strangler Fig Pattern. This allows a microservices-based event-driven architecture to be achieved with lower risk, and shows value earlier than a big-bang approach.
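Conceptually, the “front door” is a routing table that starts out sending everything to the monolith and gains one entry per migrated capability. The sketch below illustrates that idea with longest-prefix matching on the request path; it is not the API Gateway or Refactor Spaces API, and all URLs and paths are hypothetical.

```python
from typing import Dict

MONOLITH = "http://monolith.internal"  # hypothetical upstream URLs throughout

# Routes migrated so far; everything else still goes to the monolith.
migrated: Dict[str, str] = {
    "/orders": "https://orders-service.example.com",
    "/payments": "https://payments-service.example.com",
}


def route(path: str) -> str:
    """Return the upstream for a request path using longest-prefix match."""
    matches = [p for p in migrated if path == p or path.startswith(p + "/")]
    if not matches:
        return MONOLITH
    return migrated[max(matches, key=len)]


print(route("/orders/42"))    # https://orders-service.example.com
print(route("/customers/7"))  # http://monolith.internal

# Strangling one more piece of the monolith is a single new route.
migrated["/customers"] = "https://customers-service.example.com"
print(route("/customers/7"))  # https://customers-service.example.com
```

Because clients only ever talk to the front door, each route can be switched (or switched back) independently, which is what keeps the migration low risk.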

Placing an existing Monolith application behind an API Gateway with AWS Migration Hub Refactor Spaces could look like the following:

Demonstrating a Strangler Fig implementation with AWS Migration Hub Refactor Spaces

In the above implementation, the Monolith application is left in its entirety, but individual web services (or monolith calls) can be replaced one by one with new microservices, built either as serverless Lambda functions or as EKS (Elastic Kubernetes Service) microservices. For more information, please see Safely Modernizing a Monolith with AWS Migration Refactor Spaces.

Conclusion

This article has presented an overview of Event-Driven Architecture and the Saga Pattern along with some of the different nuances to be considered when building out solutions with these techniques.

Do not hesitate to reach out to us at Accolite Digital for more information about your organization’s cloud and digital transformation needs and how we can help you.

Collin Smith

AWS Ambassador/Solutions Architect/Ex-French Foreign Legion