AWS Well-Architected Framework
Software design is one level below software architecture. Before discussing the design principles, let's briefly discuss software architecture. Engineering enterprise software solutions is in many ways like building civil engineering structures such as bridges. If the foundation is not architected, designed, and built in a proper engineering way, structural problems may undermine the integrity and function of the structure, or make extension, modification, and repair expensive. Software systems have two types of requirements: functional and non-functional. Software architecture addresses the non-functional requirements, for example, performance, reliability, scalability, and security.
Cloud systems also have functional and non-functional requirements. When building software systems on the cloud, we need to consider quality attributes in order to build well-architected solutions. The question, then, is which quality attributes we need to consider when building software solutions on a cloud platform.
Architecting Software Solutions on the AWS Cloud
Screenshot from: https://aws.amazon.com/architecture/well-architected/
According to a blog post on the AWS Partner Network (https://aws.amazon.com/blogs/apn/the-6-pillars-of-the-aws-well-architected-framework/), there are six pillars of the Well-Architected Framework: operational excellence, security, reliability, performance efficiency, cost optimization, and sustainability. Architecting systems with a focus on these six pillars helps produce efficient and stable systems. So, let's discuss these quality attributes for building well-architected solutions on the AWS cloud platform.
Operational Excellence
The Operational Excellence pillar of the AWS Well-Architected Framework covers supporting the development team and running workloads effectively. Both are critical to a successful cloud platform: your engineers will use the AWS cloud platform for all their development work and, depending on your business area, you will run different workloads such as analytics jobs, machine learning models, and many other operations. The key point is that you need to continuously gain insight into your operations so you can improve processes and procedures to deliver business value.
There is a saying attributed to the Greek philosopher Heraclitus: “The only constant in life is change.” Things change: your customers' requirements and business context may change. Therefore, it is essential to design operations so that they can evolve quickly and incorporate changes based on those insights.
Key design principles to consider for the Operational Excellence pillar:
• Perform operations as code
• Make frequent, small, reversible changes
• Refine operations procedures frequently
• Anticipate failure
• Learn from all operational failures
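As a minimal sketch of "perform operations as code" and "make frequent, small, reversible changes", the Python below models an operational change as a pair of apply and rollback steps guarded by a health check. All names here (`Change`, `run_change`, the `config` dictionary, the worker limit) are hypothetical illustrations and not part of any AWS API; in practice this role is usually played by infrastructure-as-code and deployment tooling.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Change:
    """A small operational change with an explicit way to undo it."""
    name: str
    apply: Callable[[Dict[str, int]], None]
    rollback: Callable[[Dict[str, int]], None]


def run_change(config: Dict[str, int], change: Change,
               health_check: Callable[[Dict[str, int]], bool]) -> bool:
    """Apply a change, verify it, and roll back if the check fails."""
    change.apply(config)
    if health_check(config):
        return True
    change.rollback(config)  # the change is reversible by design
    return False


# Hypothetical example: raise a worker count, but reject unsafe values.
config = {"workers": 2}
scale_up = Change(
    name="scale workers 2 -> 50",
    apply=lambda c: c.update(workers=50),
    rollback=lambda c: c.update(workers=2),
)
ok = run_change(config, scale_up, health_check=lambda c: c["workers"] <= 10)
print(ok, config)  # the oversized change is rejected and undone
```

Because every change carries its own rollback, a failed check leaves the configuration exactly as it was, which is what makes small, frequent changes low-risk.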
Security
Security is another essential pillar to consider for well-architected solutions on the AWS platform. The Security pillar covers the ability to protect data and IT assets. You can leverage AWS security services such as IAM (Identity and Access Management) and KMS (Key Management Service) to secure your solutions. You should also have proper procedures in place to manage security incidents. Strong security and well-practiced incident handling help mitigate financial loss and comply with regulatory obligations.
AWS has a concept called the Shared Responsibility Model. Under this model, AWS is responsible for protecting the physical infrastructure that runs its services, such as servers and other components of a data center, while you remain responsible for securing what you build and configure on top of it. As a result, you can focus on using AWS services to achieve your business goals without being responsible for the security of the physical infrastructure.
Key design principles to consider for the Security pillar:
• Implement a strong identity foundation
• Enable traceability
• Apply security at all layers
• Automate security best practices
• Protect data in transit and at rest
• Keep people away from data
• Prepare for security events
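To make "implement a strong identity foundation" and "automate security best practices" more concrete, here is a small sketch that scans an IAM-style JSON policy for overly broad permissions. The policy follows the standard IAM policy document layout (`Version`, `Statement`, `Effect`, `Action`, `Resource`), but the checker function, its name, and the bucket name are assumptions made for illustration. It also assumes `Statement` is a list, and it is not a substitute for AWS's own policy analysis tools.

```python
from typing import Any, Dict, List


def as_list(value: Any) -> List[str]:
    """IAM allows a single string or a list for Action and Resource."""
    return [value] if isinstance(value, str) else list(value)


def overly_broad_statements(policy: Dict[str, Any]) -> List[int]:
    """Return indexes of Allow statements that use a bare '*' wildcard."""
    flagged = []
    for index, statement in enumerate(policy.get("Statement", [])):
        if statement.get("Effect") != "Allow":
            continue
        actions = as_list(statement.get("Action", []))
        resources = as_list(statement.get("Resource", []))
        if "*" in actions or "*" in resources:
            flagged.append(index)
    return flagged


# Hypothetical policy: the bucket name is a placeholder.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "s3:GetObject",
         "Resource": "arn:aws:s3:::example-bucket/*"},
        {"Effect": "Allow", "Action": "*", "Resource": "*"},
    ],
}
print(overly_broad_statements(policy))  # only the second statement is flagged
```

A check like this could run automatically before a policy is deployed, so that least-privilege review does not depend on someone remembering to do it.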
Reliability
Reliability is another pillar of well-architected solutions on the cloud. Reliability emphasizes the ability of a system to perform its intended function correctly and consistently. Before running your workloads in production, testing what compute, storage, and network resources they require helps them run reliably. The cloud offers, in theory, nearly unlimited resources. That means you should readily find the resources and services you need to build reliable solutions, for example AWS Auto Scaling, so that workloads keep running without failure or outage.
To build a reliable system, you need to anticipate changes such as a spike in workload, a change in the environment (what if the server running a workload fails?), or extra resources needed when deploying new feature releases. You also need to take steps such as fault isolation, automated failover to healthy resources, and a disaster recovery strategy to implement resiliency.
Keep these in mind to help you increase reliability:
- Automatically recover from failure
- Test recovery procedures
- Scale horizontally to increase aggregate workload availability
- Stop guessing capacity
- Manage change in automation
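The idea of automatically recovering from failure, including failover to healthy resources, can be sketched in a few lines. The example below retries a request with exponential backoff and then moves on to the next endpoint. The function names, the endpoint labels, and the simulated failure are all hypothetical; a real system would typically rely on load balancers, health checks, and managed failover rather than client code alone.

```python
import time
from typing import Callable, Sequence


def call_with_failover(endpoints: Sequence[str],
                       request: Callable[[str], str],
                       retries_per_endpoint: int = 2,
                       base_delay: float = 0.01) -> str:
    """Try each endpoint in order, backing off between failed attempts."""
    last_error = None
    for endpoint in endpoints:
        for attempt in range(retries_per_endpoint):
            try:
                return request(endpoint)
            except ConnectionError as error:
                last_error = error
                time.sleep(base_delay * (2 ** attempt))  # exponential backoff
    raise RuntimeError("all endpoints failed") from last_error


def fake_request(endpoint: str) -> str:
    """Simulated service call: the primary endpoint is always down."""
    if endpoint == "primary":
        raise ConnectionError("primary is down")
    return f"served by {endpoint}"


result = call_with_failover(["primary", "standby"], fake_request)
print(result)  # served by standby
```

Running a simulated outage like `fake_request` on a schedule is also one simple way to act on the "test recovery procedures" principle.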
Performance Efficiency
The Performance Efficiency pillar covers the ability to use computing resources efficiently to meet current demand and to maintain that efficiency as requirements change. In essence, it means meeting SLAs (service level agreements) by using compute resources efficiently.
The question is how we can ensure we are using resources efficiently. First, we can review our AWS solution to find out whether resources are being used efficiently; logs and monitoring are a good help here. We can also review whether there is an alternative way to use the system more efficiently. For example, we can make trade-offs such as compression (spending CPU time to save storage and bandwidth) or caching (spending memory to avoid repeated work) to use resources more efficiently.
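The caching and compression trade-offs mentioned above can be illustrated with Python's standard library. This is only a sketch: `load_profile` is a hypothetical stand-in for a slow database or network lookup, and the payload is synthetic, highly repetitive data chosen so that it compresses well.

```python
import zlib
from functools import lru_cache

calls = {"count": 0}  # counts how often the slow lookup really runs


@lru_cache(maxsize=128)
def load_profile(user_id: int) -> str:
    """Stand-in for a slow lookup such as a database or network call."""
    calls["count"] += 1
    return f"profile-{user_id}"


# Five requests, but only two distinct users: the rest are cache hits.
for user_id in [1, 2, 1, 1, 2]:
    load_profile(user_id)
print(calls["count"], load_profile.cache_info())

# Compression trades CPU time for fewer bytes stored or transferred.
payload = b"status=ok;" * 1000
compressed = zlib.compress(payload)
print(len(payload), len(compressed))
```

Whether either trade-off pays off depends on the workload, which is why the review should be driven by logs and monitoring data rather than assumptions.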
The following design principles can help you achieve and maintain efficient workloads in the cloud.
• Democratize advanced technologies
• Go global in minutes
• Use serverless architectures
• Experiment more often
• Consider mechanical sympathy
Cost Optimization
The Cost Optimization pillar deals with the system's ability to deliver business value at the lowest cost. The point is not to compromise service-level agreements to save costs. Instead, we must review our choices, and if there is an alternative way to provide the same business value at lower cost, go for it. That's the essence of the Cost Optimization pillar.
Some key design principles to manage cost optimization:
- Implement cloud financial management
- Adopt a consumption model
- Measure overall efficiency
- Stop spending money on undifferentiated heavy lifting
- Analyze and attribute expenditure
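As one way to picture "analyze and attribute expenditure", the sketch below groups cost line items by a tag so that spend can be traced back to an owner. The record layout, resource names, tag key, and amounts are all made up for illustration; real billing data would come from your cost reports and has a different shape.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

LineItem = Tuple[str, Dict[str, str], float]  # (resource, tags, cost)


def cost_by_tag(line_items: Iterable[LineItem], tag_key: str) -> Dict[str, float]:
    """Sum cost per value of the given tag; missing tags go to 'untagged'."""
    totals: Dict[str, float] = defaultdict(float)
    for _resource, tags, cost in line_items:
        totals[tags.get(tag_key, "untagged")] += cost
    return dict(totals)


# Hypothetical monthly line items.
line_items = [
    ("i-web-1", {"team": "storefront"}, 120.0),
    ("i-web-2", {"team": "storefront"}, 80.0),
    ("db-analytics", {"team": "data"}, 300.0),
    ("vol-orphan", {}, 25.0),
]
print(cost_by_tag(line_items, "team"))
```

The "untagged" bucket is often the most useful output, since it points at resources nobody has claimed and that may no longer deliver business value.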
Sustainability
The Sustainability pillar addresses the long-term environmental, economic, and societal impact of your business activities, with a focus on minimizing the environmental impact of running cloud workloads.
The following are the key design principles when architecting your cloud workloads to maximize sustainability and minimize impact.
• Understand your impact
• Establish sustainability goals
• Maximize utilization
• Anticipate and adopt new, more efficient hardware and software offerings
• Use managed services
• Reduce the downstream impact of your cloud workloads
Well-Architected Framework General Design Principles
The Well-Architected Framework identifies a set of general design principles to facilitate good design in the cloud:
• Stop guessing your capacity needs: If you make a poor capacity decision before deploying an application, you often end up paying for expensive idle resources or dealing with limited capacity. With cloud computing, you can use as much or as little capacity as you need, and you can scale up and down quickly as required. Cloud computing helps you stop guessing capacity.
• Test systems at production scale: In the cloud, you can create a production-scale test environment on demand, run the complete testing of your application, and then release the resources. Simulating a live production environment this way is much cheaper because you pay only for the resources you use while testing.
• Automate to make architectural experimentation easier: Automation saves time and money on repetitious tasks and avoids the expense of manual effort when you have to do the same thing next time. In addition, automation helps you track changes, audit the impact, and revert to previous parameters when necessary.
• Allow for evolutionary architectures: In traditional enterprise architecture, architectural decisions are often implemented as static, one-time events, with a few major versions of a system during its lifetime. This is a slow and sometimes bureaucratic process. However, as a business and its context continue to evolve, these initial decisions might hinder the system's ability to deliver on changing business requirements. In the cloud, the capability to automate and test on demand lowers the risk of impact from design changes. This allows systems to evolve so that businesses can take advantage of innovations as a standard practice.
• Drive architectures using data: In the cloud, you can log and collect data about how your architectural choices affect the behavior of your workload, such as its cost and performance. This helps you make more fact-based decisions on how to improve your architecture. Your cloud infrastructure is code, so you can use that data to inform your architecture choices and improvements over time.
• Improve through game days: Try simulating events in production by regularly scheduling game days. This will help you understand where improvements can be made and can also help develop organizational experience in dealing with different types of events.
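To illustrate how "stop guessing your capacity needs" and "drive architectures using data" fit together, the sketch below derives an instance count from an observed utilization metric instead of a fixed up-front estimate. It uses a simplified proportional rule with assumed bounds and parameter names; it is not the algorithm of any particular AWS service.

```python
import math


def desired_capacity(current_instances: int,
                     observed_utilization: float,
                     target_utilization: float,
                     minimum: int = 1,
                     maximum: int = 20) -> int:
    """Scale the fleet so utilization moves toward the target, within bounds."""
    if target_utilization <= 0:
        raise ValueError("target_utilization must be positive")
    raw = math.ceil(current_instances * observed_utilization / target_utilization)
    return max(minimum, min(maximum, raw))


# Hypothetical readings against a 60% utilization target.
print(desired_capacity(4, 90.0, 60.0))    # busy fleet: scale out to 6
print(desired_capacity(4, 15.0, 60.0))    # idle fleet: scale in to 1
print(desired_capacity(10, 180.0, 60.0))  # extreme spike: capped at 20
```

The minimum and maximum bounds are a deliberate design choice: they keep a bad metric reading from scaling the system to zero or to an unexpectedly large bill.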
AWS Well-Architected Tool
The AWS Well-Architected Tool helps you review the state of your workloads and compares them to AWS architectural best practices. The tool is based on the AWS Well-Architected Framework, which was developed to help cloud architects build secure, high-performing, resilient, and efficient application infrastructure.
To use the AWS Well-Architected Tool, which is available in the AWS Management Console, first define your workload and then answer a set of questions regarding operational excellence, security, reliability, performance efficiency, cost optimization, and sustainability. The AWS Well-Architected Tool then provides a plan on how to architect for the cloud using established best practices.
The AWS Well-Architected Tool gives you access to knowledge and best practices used by AWS architects whenever you need it. You answer a series of questions about your workload, and the tool delivers an action plan with step-by-step guidance on how to build better workloads for the cloud.
References
- https://aws.amazon.com/blogs/apn/the-6-pillars-of-the-aws-well-architected-framework/
- https://docs.aws.amazon.com/wellarchitected/latest/framework/sec-design.html
- https://docs.aws.amazon.com/wellarchitected/latest/operational-excellence-pillar/design-principles.html
- https://docs.aws.amazon.com/wellarchitected/latest/performance-efficiency-pillar/design-principles.html
- https://docs.aws.amazon.com/wellarchitected/latest/reliability-pillar/design-principles.html
- https://docs.aws.amazon.com/wellarchitected/latest/cost-optimization-pillar/design-principles.html
SK Singh is the founder and a software, cloud, and data engineer. He has been involved in the software industry for around 25 years. He has a bachelor's degree in computer science and engineering from India and a master's degree in software engineering from the Pennsylvania State University. SK has been involved in a wide range of software projects for government agencies, private companies, start-ups, and large public companies in various software engineering roles. He holds many professional certifications in areas such as AWS, Hadoop, Kafka, Oracle, Unix, Java, Java-related frameworks, and other related technologies.