This article explains how to make the workflows you generate more fault tolerant.
Customers use this solution to deploy, operate and govern multi-account AWS estates. These estates span several regions and hundreds of accounts.
When building a workflow using this solution, you can use dependencies to specify that an action should run only after another action has completed successfully. By default, every sub action of the dependency must complete successfully before the dependent action is instructed to start.
When inflating your accounts, you may want to enable AWS Config before enabling AWS Config rules. By default, if enabling Config fails in one region of one account, no Config rules will be deployed anywhere. This is great for rapid feedback (fail fast), but in production it can leave regions of accounts without their guardrails.
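The default behaviour described above can be modelled with a small sketch. This is not the solution's own code: the function name `rules_to_deploy_default`, the `(account, region)` target tuples and the sample account IDs are all hypothetical and exist only to illustrate the all-or-nothing gate.

```python
from typing import Dict, List, Tuple

# Hypothetical representation of a deployment target: (account_id, region).
Target = Tuple[str, str]


def rules_to_deploy_default(config_results: Dict[Target, bool]) -> List[Target]:
    """Model of the default gate: the dependent action starts only if
    every sub action of the dependency succeeded."""
    if all(config_results.values()):
        return sorted(config_results)
    return []


# Illustrative data: enabling Config failed in one region of one account.
results = {
    ("111111111111", "eu-west-1"): True,
    ("111111111111", "us-east-1"): False,
    ("222222222222", "eu-west-1"): True,
}

print(rules_to_deploy_default(results))  # prints []
```

A single failure leaves every target without rules, which is the fail-fast trade-off the paragraph above describes.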
The rest of this section explains your options for improving fault tolerance along with the costs of each option.
Using spoke execution mode is the quickest way to improve fault tolerance. When you configure an action to be performed using spoke execution mode, you are delegating execution to the spoke. This solution will fan out the spoke actions to each spoke in parallel in a single batch. Spokes do not affect one another, so a failure in one spoke has no impact on the others. However, within a spoke, a failure enabling Config in one region means no rules will be enabled in that spoke's other regions. To further improve fault tolerance you will need to use fine grain depends on.
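The isolation between spokes can be sketched in the same spirit. Again, this is an illustrative model rather than the solution's implementation: `rules_to_deploy_per_spoke` and the sample data are hypothetical, and each account is assumed to be one spoke.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical representation of a deployment target: (account_id, region).
Target = Tuple[str, str]


def rules_to_deploy_per_spoke(config_results: Dict[Target, bool]) -> List[Target]:
    """Model of spoke execution: each spoke (account) is gated on its own
    results only, but a failure in any of its regions blocks all of them."""
    by_account: Dict[str, Dict[str, bool]] = defaultdict(dict)
    for (account, region), ok in config_results.items():
        by_account[account][region] = ok

    deployable: List[Target] = []
    for account, regions in by_account.items():
        if all(regions.values()):
            deployable.extend((account, region) for region in regions)
    return sorted(deployable)


# Illustrative data: enabling Config failed in one region of one account.
results = {
    ("111111111111", "eu-west-1"): True,
    ("111111111111", "us-east-1"): False,
    ("222222222222", "eu-west-1"): True,
}

print(rules_to_deploy_per_spoke(results))  # prints [('222222222222', 'eu-west-1')]
```

The failing account still receives no rules in any region, but the healthy account is no longer held back by it.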
Enabling spoke execution mode means errors will not be visible in your pipeline output. To see what went wrong, you will need to either log in to the spoke account and view the execution there, or use AWS Systems Manager OpsCenter in the hub account. This can increase operational complexity.
Enabling spoke execution mode typically reduces execution time considerably. We have seen it reduce execution time by more than a factor of 8.
When you say AWS Config rules depend on AWS Config enablement, the framework will enable AWS Config across all regions of all accounts before attempting to enable any AWS Config rules. Using fine grain depends on, you can change that behaviour. You can say:
This can be useful when deploying or configuring critical services within your multi-account estate, but it can increase the execution time of your workflow. We have seen it add 30 to 60 minutes of execution time to estates with 6 regions, 200 accounts and 8 actions.
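As a rough model of what a fine-grained dependency buys you, the sketch below gates each region of each account on its own result. The granularity shown here is an assumption made for illustration, and `rules_to_deploy_fine_grained` and the sample data are hypothetical rather than part of the solution.

```python
from typing import Dict, List, Tuple

# Hypothetical representation of a deployment target: (account_id, region).
Target = Tuple[str, str]


def rules_to_deploy_fine_grained(config_results: Dict[Target, bool]) -> List[Target]:
    """Model of a per-target gate: rules go to every region of every account
    where the dependency succeeded, regardless of failures elsewhere."""
    return sorted(target for target, ok in config_results.items() if ok)


# Illustrative data: enabling Config failed in one region of one account.
results = {
    ("111111111111", "eu-west-1"): True,
    ("111111111111", "us-east-1"): False,
    ("222222222222", "eu-west-1"): True,
}

print(rules_to_deploy_fine_grained(results))
# prints [('111111111111', 'eu-west-1'), ('222222222222', 'eu-west-1')]
```

Only the single failed target misses out on rules. The cost is that the dependency has to be tracked per target rather than once for the whole action, which is consistent with the longer execution times noted above.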