October 16, 2022 07:27 am GMT

Using AWS Step Functions To Implement The SAGA Pattern

Introduction

In this post, I will walk you through how to leverage AWS Step Functions to implement the Saga pattern.

Put simply, the Saga pattern is a failure management pattern that helps us establish semantic consistency in distributed applications. Whenever more than one collaborating service or function is involved, every transaction gets a compensating transaction that undoes it.

For our use case, imagine we have a workflow that goes as follows:

  • The user books a hotel
  • If that succeeds, we want to book a flight
  • If booking a flight succeeds, we want to book a rental
  • If booking a rental succeeds, we consider the flow a success.

As you may have guessed, this is the happy scenario, where everything goes right (shockingly...).

However, if any of the steps fails, we want to undo the changes introduced by the failed step, and then undo all the prior steps, if any.

What if the hotel booking step fails? How do we proceed? What if booking the hotel succeeds but booking the flight fails? We need to be able to revert the changes.

Example:

  1. User books a hotel successfully
  2. Booking the flight failed
  3. Cancel the flight (assuming the failure happened after we saved the flight record in the database)
  4. Cancel the hotel record
  5. Fail the state machine

AWS Step Functions can help us here. We can implement each of these operations as a step (or task), and Step Functions orchestrates the transitions between them, including the compensation paths.
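To make the compensation logic concrete, here is a minimal in-process sketch in plain JavaScript of what the state machine in this post does. It does not show how Step Functions works internally. The `runSaga` helper, the step names, and the prefix-based failures are hypothetical stand-ins. Mirroring the article's assumption, the failed step is compensated along with the earlier ones.

```javascript
// Minimal in-process saga runner (illustrative only; the article uses
// AWS Step Functions for this). Each step has an action and a compensation.
// As in the article, a failed step is compensated too, because its record
// is assumed to have been written before it failed.
async function runSaga(steps, input) {
  const log = [];
  for (let i = 0; i < steps.length; i++) {
    try {
      log.push(`book:${steps[i].name}`);
      await steps[i].action(input);
    } catch (err) {
      // Undo the failed step and every earlier step, newest first.
      for (let j = i; j >= 0; j--) {
        log.push(`cancel:${steps[j].name}`);
        await steps[j].compensate(input);
      }
      return { status: "FAILED", log, error: err.message };
    }
  }
  return { status: "SUCCEEDED", log };
}

// Hypothetical steps: each booking fails when the confirmation_id starts
// with its trigger prefix, like the article's Lambda functions.
function makeStep(name, failPrefix) {
  return {
    name,
    action: async (input) => {
      if (input.confirmation_id.startsWith(failPrefix)) {
        throw new Error(`${name} failed`);
      }
    },
    compensate: async () => {},
  };
}

const steps = [
  makeStep("hotel", "11"),
  makeStep("flight", "22"),
  makeStep("rental", "33"),
];

runSaga(steps, { confirmation_id: "22-abc" }).then((result) => {
  // FAILED book:hotel -> book:flight -> cancel:flight -> cancel:hotel
  console.log(result.status, result.log.join(" -> "));
});
```

Unlike Step Functions, this sketch keeps no durable execution state and does not retry a compensation that throws. That durability and retry handling is a large part of the reason to hand the orchestration to a managed service.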

Deploying The Resources

You will find the code repository here.

Please refer to this section to deploy the resources.

For the full list of the resources deployed, check out this table.

DynamoDB Tables

In our example, we are deploying 3 DynamoDB tables:

  • BookHotel
  • BookFlight
  • BookRental

The following is the code responsible for creating the BookHotel table:

module "book_hotel_ddb" {
  source          = "./modules/dynamodb"
  table_name      = var.book_hotel_ddb_name
  billing_mode    = var.billing_mode
  read_capacity   = var.read_capacity
  write_capacity  = var.write_capacity
  hash_key        = var.hash_key
  hash_key_type   = var.hash_key_type
  additional_tags = var.book_hotel_ddb_additional_tags
}

Lambda Functions

We will be relying on 6 Lambda functions to implement our example:

  • BookHotel
  • BookFlight
  • BookRental
  • CancelHotel
  • CancelFlight
  • CancelRental

The functions are pretty simple and straightforward.

BookHotel Function

exports.handler = async (event) => {
  ...
  const {
    confirmation_id,
    checkin_date,
    checkout_date
  } = event
  ...
  try {
    // Save the booking record; a failure here is an unexpected error
    await ddb.putItem(params).promise();
    console.log('Success')
  } catch (error) {
    console.log('Error: ', error)
    throw new Error("Unexpected Error")
  }

  // Simulated business failure, raised after the record has been saved
  if (confirmation_id.startsWith("11")) {
    throw new BookHotelError("Expected Error")
  }

  // Pass the input through so the next state receives it
  return {
    confirmation_id,
    checkin_date,
    checkout_date
  };
};

For the full code, please check out the index.js file.

As you can see, the function expects an input of the following format:

  • confirmation_id
  • checkin_date
  • checkout_date

The function creates an item in the BookHotel table and returns its input as its output.
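As a rough illustration of the item this writes, the sketch below builds putItem parameters in DynamoDB's low-level attribute-value format. The attribute names and the `buildBookHotelItem` helper are assumptions, because the article elides the real `params`. The `id` hash key is inferred from the key that CancelHotel deletes by.

```javascript
// Hypothetical sketch of the putItem parameters BookHotel might build.
// Assumptions (not shown in the article): the hash key is "id" (the key
// CancelHotel deletes by), and the dates are stored as strings.
function buildBookHotelItem(tableName, event) {
  const { confirmation_id, checkin_date, checkout_date } = event;
  return {
    TableName: tableName,
    Item: {
      // The low-level DynamoDB API wraps every value in a type descriptor.
      id: { S: confirmation_id },
      checkin_date: { S: checkin_date },
      checkout_date: { S: checkout_date },
    },
  };
}

const params = buildBookHotelItem("BookHotel", {
  confirmation_id: "11-123",
  checkin_date: "2022-10-20",
  checkout_date: "2022-10-22",
});
console.log(JSON.stringify(params.Item.id)); // {"S":"11-123"}
```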

To trigger an error, pass a confirmation_id that starts with '11'. The function will then throw a custom BookHotelError, which the step function will catch.
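For the step function to catch the custom error, it has to see the right error name. As far as I know, the Node.js Lambda runtime reports an error's `name` as its error type. A sketch of `BookHotelError` should therefore set `this.name` explicitly. The `assertHotelBookable` helper is hypothetical and stands in for the check inside the handler.

```javascript
// Custom error whose name a Catch block can match with
// "ErrorEquals": ["BookHotelError"]. Assumption: the Node.js Lambda
// runtime reports error.name as the errorType, so set it explicitly
// (a subclass of Error would otherwise still be named "Error").
class BookHotelError extends Error {
  constructor(message) {
    super(message);
    this.name = "BookHotelError";
  }
}

// Hypothetical helper mirroring the article's trigger:
// confirmation IDs starting with "11" fail.
function assertHotelBookable(confirmationId) {
  if (confirmationId.startsWith("11")) {
    throw new BookHotelError("Expected Error");
  }
}

try {
  assertHotelBookable("11-456");
} catch (err) {
  console.log(err.name, err instanceof BookHotelError); // BookHotelError true
}
```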

CancelHotel Function

const AWS = require("aws-sdk");
const ddb = new AWS.DynamoDB({ apiVersion: '2012-08-10' });
const TABLE_NAME = process.env.TABLE_NAME;

exports.handler = async (event) => {
  // Delete the record BookHotel created, keyed by the confirmation_id
  const params = {
    TableName: TABLE_NAME,
    Key: {
      'id': { S: event.confirmation_id }
    }
  };
  try {
    await ddb.deleteItem(params).promise();
    console.log('Success');
    return {
      statusCode: 201,
      body: "Cancel Hotel success",
    };
  } catch (error) {
    console.log('Error: ', error);
    throw new Error("ServerError");
  }
};

This function simply deletes the item that was created by the BookHotel function using the confirmation_id as a key.

We could have checked whether the item was actually created. To keep things simple, though, I am assuming that a Book function always fails after its record has been written to the table.

NOTE: The same logic goes for all the other Book and Cancel functions.
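If you do want that check, one DynamoDB detail helps. DeleteItem without a condition expression is idempotent and does not fail when the key is missing, so the Cancel functions are already safe to run on a record that was never written. The sketch below uses hypothetical helper names. It asks DynamoDB to return the deleted item with `ReturnValues: "ALL_OLD"`, and the response includes an `Attributes` map only when something was actually deleted. With the v2 SDK used here, you would pass the result of `ddb.deleteItem(params).promise()` to `recordWasDeleted`.

```javascript
// Hypothetical CancelHotel parameters that ask DynamoDB to return the
// deleted item, so the caller can tell whether a record was removed.
function buildCancelParams(tableName, confirmationId) {
  return {
    TableName: tableName,
    Key: { id: { S: confirmationId } },
    ReturnValues: "ALL_OLD",
  };
}

// With ReturnValues "ALL_OLD", the response carries an Attributes map
// only if an item with that key existed and was deleted.
function recordWasDeleted(response) {
  return Boolean(response && response.Attributes);
}

// Example response shapes (no AWS call is made here).
console.log(recordWasDeleted({ Attributes: { id: { S: "22-1" } } })); // true
console.log(recordWasDeleted({})); // false
```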

Reservation Step Function

# Step Function
module "step_function" {
  source = "terraform-aws-modules/step-functions/aws"

  name = "Reservation"
  definition = templatefile("${path.module}/state-machine/reservation.asl.json", {
    BOOK_HOTEL_FUNCTION_ARN    = module.book_hotel_lambda.function_arn,
    CANCEL_HOTEL_FUNCTION_ARN  = module.cancel_hotel_lambda.function_arn,
    BOOK_FLIGHT_FUNCTION_ARN   = module.book_flight_lambda.function_arn,
    CANCEL_FLIGHT_FUNCTION_ARN = module.cancel_flight_lambda.function_arn,
    BOOK_RENTAL_LAMBDA_ARN     = module.book_rental_lambda.function_arn,
    CANCEL_RENTAL_LAMBDA_ARN   = module.cancel_rental_lambda.function_arn
  })

  service_integrations = {
    lambda = {
      lambda = [
        module.book_hotel_lambda.function_arn,
        module.book_flight_lambda.function_arn,
        module.book_rental_lambda.function_arn,
        module.cancel_hotel_lambda.function_arn,
        module.cancel_flight_lambda.function_arn,
        module.cancel_rental_lambda.function_arn,
      ]
    }
  }

  type = "STANDARD"
}

This is the code that creates the step function. I am relying on a Terraform module to create it.

This code creates a step function that uses the reservation.asl.json file as its definition. templatefile substitutes the Lambda function ARNs into the placeholders in that file. The service_integrations block gives the step function permission to invoke the Lambda functions, since they are all part of the workflow.
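For readers new to `templatefile`, here is a rough JavaScript analogue of the substitution it performs on reservation.asl.json. Each `${NAME}` placeholder is replaced with the value from the map. This is only an illustration. Terraform's template language also supports expressions and directives, which this regex-based, hypothetical `renderTemplate` helper ignores. The ARN below is made up.

```javascript
// Rough analogue of templatefile(): replace each ${NAME} placeholder with
// the value from vars, failing loudly if a variable is missing.
function renderTemplate(template, vars) {
  return template.replace(/\$\{([A-Z_]+)\}/g, (match, name) => {
    if (!(name in vars)) {
      throw new Error(`Missing template variable: ${name}`);
    }
    return vars[name];
  });
}

// A plain (non-template-literal) string, so ${...} stays literal here.
const snippet = '{ "Type": "Task", "Resource": "${BOOK_HOTEL_FUNCTION_ARN}" }';
const rendered = renderTemplate(snippet, {
  // Hypothetical ARN for illustration.
  BOOK_HOTEL_FUNCTION_ARN: "arn:aws:lambda:us-east-1:123456789012:function:BookHotel",
});
console.log(JSON.parse(rendered).Resource);
// arn:aws:lambda:us-east-1:123456789012:function:BookHotel
```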

Below is the full diagram for the step function:

Step Function Diagram

The reservation.asl.json file is written in the Amazon States Language (ASL).

If you open the file, you will notice "StartAt": "BookHotel" on the second line. This tells the step function to start at the BookHotel state.

Happy Scenario

"BookHotel": {
    "Type": "Task",
    "Resource": "${BOOK_HOTEL_FUNCTION_ARN}",
    "TimeoutSeconds": 10,
    "Retry": [
        {
            "ErrorEquals": [
                "States.Timeout",
                "Lambda.ServiceException",
                "Lambda.AWSLambdaException",
                "Lambda.SdkClientException"
            ],
            "IntervalSeconds": 2,
            "MaxAttempts": 3,
            "BackoffRate": 1.5
        }
    ],
    "Catch": [
        {
            "ErrorEquals": [
                "BookHotelError"
            ],
            "ResultPath": "$.error-info",
            "Next": "CancelHotel"
        }
    ],
    "Next": "BookFlight"
},

The BookHotel state is a Task whose "Resource" is resolved by Terraform to the ARN of the BookHotel Lambda function.

As you might have noticed, I am using a Retry block. The step function will retry the BookHotel function up to 3 times (after the first attempt) if it fails with any of the following errors:

  • "States.Timeout"
  • "Lambda.ServiceException"
  • "Lambda.AWSLambdaException"
  • "Lambda.SdkClientException"
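The waits between attempts follow from IntervalSeconds and BackoffRate. The first retry waits IntervalSeconds, and each later retry multiplies the previous wait by BackoffRate. The hypothetical `retryDelays` helper below computes that schedule for the values used in every Task state. It ignores optional fields such as MaxDelaySeconds and JitterStrategy, which these states do not set.

```javascript
// Wait (in seconds) before each retry of a Step Functions Retry rule:
// IntervalSeconds first, then multiplied by BackoffRate, MaxAttempts times.
function retryDelays({ IntervalSeconds, MaxAttempts, BackoffRate }) {
  const delays = [];
  let wait = IntervalSeconds;
  for (let attempt = 1; attempt <= MaxAttempts; attempt++) {
    delays.push(wait);
    wait *= BackoffRate;
  }
  return delays;
}

// The values used by the Task states in reservation.asl.json.
console.log(retryDelays({ IntervalSeconds: 2, MaxAttempts: 3, BackoffRate: 1.5 }));
// [ 2, 3, 4.5 ]
```

A task that keeps failing with one of these errors is therefore attempted 4 times, with 2 + 3 + 4.5 = 9.5 seconds of waiting in between. Each attempt is bounded by the 10-second TimeoutSeconds.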

You can ignore the "Catch" block for now, we will get back to it in the unhappy scenario section.

After the BookHotel task completes, the step function transitions to the BookFlight state, as specified in the "Next" field.

"BookFlight": {
    "Type": "Task",
    "Resource": "${BOOK_FLIGHT_FUNCTION_ARN}",
    "TimeoutSeconds": 10,
    "Retry": [
        {
            "ErrorEquals": [
                "States.Timeout",
                "Lambda.ServiceException",
                "Lambda.AWSLambdaException",
                "Lambda.SdkClientException"
            ],
            "IntervalSeconds": 2,
            "MaxAttempts": 3,
            "BackoffRate": 1.5
        }
    ],
    "Catch": [
        {
            "ErrorEquals": [
                "BookFlightError"
            ],
            "ResultPath": "$.error-info",
            "Next": "CancelFlight"
        }
    ],
    "Next": "BookRental"
},

The BookFlight state follows the same pattern: we retry invoking the BookFlight function if it fails with any of the errors specified in the Retry block. If no error is thrown, the step function transitions to the BookRental state.

"BookRental": {
    "Type": "Task",
    "Resource": "${BOOK_RENTAL_LAMBDA_ARN}",
    "TimeoutSeconds": 10,
    "Retry": [
        {
            "ErrorEquals": [
                "States.Timeout",
                "Lambda.ServiceException",
                "Lambda.AWSLambdaException",
                "Lambda.SdkClientException"
            ],
            "IntervalSeconds": 2,
            "MaxAttempts": 3,
            "BackoffRate": 1.5
        }
    ],
    "Catch": [
        {
            "ErrorEquals": [
                "BookRentalError"
            ],
            "ResultPath": "$.error-info",
            "Next": "CancelRental"
        }
    ],
    "Next": "ReservationSucceeded"
},

The BookRental state follows the same pattern again: we retry invoking the BookRental function on any of the errors in the Retry block. If no error is thrown, the step function transitions to the ReservationSucceeded state.

"ReservationSucceeded": {
    "Type": "Succeed"
},

ReservationSucceeded is a state of type Succeed. Here, it terminates the state machine successfully.

Happy scenario

Unhappy Scenarios

Oh no BookHotel failed

As you may recall, the BookHotel state includes a Catch block. In the BookHotel function, if the confirmation_id starts with 11, a custom error of type BookHotelError is thrown. The Catch block catches it and transitions to the state named in its "Next" field, which is CancelHotel in this case. Because its "ResultPath" is "$.error-info", the error details are added to the original input instead of replacing it, so CancelHotel still receives the confirmation_id.

"CancelHotel": {
    "Type": "Task",
    "Resource": "${CANCEL_HOTEL_FUNCTION_ARN}",
    "ResultPath": "$.output.cancel-hotel",
    "TimeoutSeconds": 10,
    "Retry": [
        {
            "ErrorEquals": [
                "States.Timeout",
                "Lambda.ServiceException",
                "Lambda.AWSLambdaException",
                "Lambda.SdkClientException"
            ],
            "IntervalSeconds": 2,
            "MaxAttempts": 3,
            "BackoffRate": 1.5
        }
    ],
    "Next": "ReservationFailed"
},

CancelHotel is also a Task, with a Retry block that retries invoking the function in case of an unexpected error. Its "Next" field instructs the step function to transition to the "ReservationFailed" state.

"ReservationFailed": {
    "Type": "Fail"
}

The "ReservationFailed" state is of type Fail: it terminates the state machine and marks the execution as "Failed".

BookHotel failed
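The effect of "ResultPath" in this scenario can be sketched in a few lines of JavaScript. The hypothetical `applyResultPath` helper below handles only simple dotted paths such as "$.error-info" and "$.output.cancel-hotel". It copies the state's input and writes the result at that path. This is how the Catch block's error object, with its `Error` and `Cause` fields, ends up next to the original booking data. The `Cause` string is abbreviated.

```javascript
// Hypothetical helper: applies a simple dotted ResultPath such as
// "$.error-info" or "$.output.cancel-hotel" (no arrays, no bare "$").
function applyResultPath(input, resultPath, result) {
  const keys = resultPath.replace(/^\$\./, "").split(".");
  // State input is JSON, so a JSON round trip is enough for a deep copy.
  const output = JSON.parse(JSON.stringify(input));
  let node = output;
  for (const key of keys.slice(0, -1)) {
    // Create intermediate objects when the path does not exist yet.
    if (typeof node[key] !== "object" || node[key] === null) {
      node[key] = {};
    }
    node = node[key];
  }
  node[keys[keys.length - 1]] = result;
  return output;
}

const input = {
  confirmation_id: "11-789",
  checkin_date: "2022-10-20",
  checkout_date: "2022-10-22",
};
// Shape of the error object a Catch block produces (Cause abbreviated).
const caught = {
  Error: "BookHotelError",
  Cause: '{"errorType":"BookHotelError","errorMessage":"Expected Error"}',
};

const next = applyResultPath(input, "$.error-info", caught);
console.log(next.confirmation_id, next["error-info"].Error); // 11-789 BookHotelError
```

Calling `applyResultPath(next, "$.output.cancel-hotel", result)` in the same way shows where CancelHotel's own result lands. The booking fields and "error-info" are kept alongside it.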

BookFlight is failing

We can make the BookFlight Lambda function throw an error by passing a confirmation_id that starts with 22.

The BookFlight task has a Catch block that catches the BookFlightError and instructs the step function to transition to the CancelFlight state.

"CancelFlight": {
    "Type": "Task",
    "Resource": "${CANCEL_FLIGHT_FUNCTION_ARN}",
    "ResultPath": "$.output.cancel-flight",
    "TimeoutSeconds": 10,
    "Retry": [
        {
            "ErrorEquals": [
                "States.Timeout",
                "Lambda.ServiceException",
                "Lambda.AWSLambdaException",
                "Lambda.SdkClientException"
            ],
            "IntervalSeconds": 2,
            "MaxAttempts": 3,
            "BackoffRate": 1.5
        }
    ],
    "Next": "CancelHotel"
},

Like CancelHotel, the CancelFlight state invokes the CancelFlight Lambda function to undo the changes and then moves on to the next step, CancelHotel. As we saw earlier, CancelHotel undoes the changes introduced by BookHotel and then transitions to ReservationFailed to terminate the state machine.

BookFlight Failed

BookRental is failing

The BookRental Lambda function throws a BookRentalError if the confirmation_id starts with 33.

This error is caught by the Catch block in the BookRental task, which instructs the step function to go to the CancelRental state.

"CancelRental": {
    "Type": "Task",
    "Resource": "${CANCEL_RENTAL_LAMBDA_ARN}",
    "ResultPath": "$.output.cancel-rental",
    "TimeoutSeconds": 10,
    "Retry": [
        {
            "ErrorEquals": [
                "States.Timeout",
                "Lambda.ServiceException",
                "Lambda.AWSLambdaException",
                "Lambda.SdkClientException"
            ],
            "IntervalSeconds": 2,
            "MaxAttempts": 3,
            "BackoffRate": 1.5
        }
    ],
    "Next": "CancelFlight"
},

Similar to CancelFlight, the CancelRental state invokes the CancelRental Lambda function to undo the changes and then moves on to CancelFlight. After cancelling the flight, CancelFlight's "Next" field transitions the step function to the CancelHotel state. CancelHotel undoes the hotel booking and transitions to ReservationFailed to terminate the state machine.

BookRental failed
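To tie the three unhappy scenarios together, here is a simplified JavaScript model of the transitions in reservation.asl.json. It is a sketch, not a Step Functions emulator. Retries, timeouts, and ResultPath are left out, and the hypothetical `failPrefix` check stands in for the Lambda functions' 11/22/33 triggers. It prints the path each kind of confirmation_id takes through the state machine.

```javascript
// Simplified model of reservation.asl.json. Booking states go to "next"
// on success and to "onCatch" when their custom error is thrown.
// failPrefix is a hypothetical stand-in for the Lambdas' triggers.
const states = {
  BookHotel: { next: "BookFlight", onCatch: "CancelHotel", failPrefix: "11" },
  BookFlight: { next: "BookRental", onCatch: "CancelFlight", failPrefix: "22" },
  BookRental: { next: "ReservationSucceeded", onCatch: "CancelRental", failPrefix: "33" },
  CancelRental: { next: "CancelFlight" },
  CancelFlight: { next: "CancelHotel" },
  CancelHotel: { next: "ReservationFailed" },
  ReservationSucceeded: { type: "Succeed" },
  ReservationFailed: { type: "Fail" },
};

// Follows transitions from StartAt until a Succeed or Fail state.
function runReservation(confirmationId, startAt = "BookHotel") {
  const path = [];
  let current = startAt;
  for (;;) {
    path.push(current);
    const state = states[current];
    if (state.type === "Succeed") return { status: "SUCCEEDED", path };
    if (state.type === "Fail") return { status: "FAILED", path };
    // A booking state "throws" when the ID matches its prefix, and the
    // Catch block routes to its compensating state instead of "next".
    const failed =
      state.failPrefix !== undefined && confirmationId.startsWith(state.failPrefix);
    current = failed ? state.onCatch : state.next;
  }
}

for (const id of ["99-000", "11-000", "22-000", "33-000"]) {
  const { status, path } = runReservation(id);
  console.log(id, status, path.join(" -> "));
}
```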

Conclusion

In this post, we saw how we can leverage AWS Step Functions to orchestrate and implement a failure management strategy that establishes semantic consistency in our distributed reservation application.

I hope you found this article helpful. Thank you for reading!


Original Link: https://dev.to/eelayoubi/using-aws-step-functions-to-implement-the-saga-pattern-14o7
