Google Cloud Certified Professional Cloud Architect Study Guide. Dan Sullivan
about the current solution. You might want to ask technical questions so that you can start eliminating options. You may even think that you've solved this kind of problem before and you just have to pick the right architecture pattern. Resist those inclinations if you have them. All architecture design decisions must be made in the context of business requirements.
Business requirements define the operational landscape in which you will develop a solution. Example business requirements are as follows:
The need to reduce capital expenditures
Accelerating the pace of software development
Reporting on service-level objectives
Reducing time to recover from an incident
Improving compliance with industry regulations
Business requirements may be about costs, customer experience, or operational improvements. A common trait of business requirements is that they are rarely satisfied by a single technical decision.
Reducing Operational Expenses
Operational expenses can be reduced by using managed services instead of operating services yourself, by accepting different service commitments, such as those of preemptible virtual machines and Pub/Sub Lite, and by using services that automatically scale to load.
Managed services reduce the workload on systems administrators and DevOps engineers because they eliminate some of the work required when managing your own implementation of a platform. Note that while managed services can reduce costs, that is not always the case; if cost is a key driver for selecting a managed service, it is important to verify that managed services will actually cost less. A database administrator, for example, would not have to spend time performing backups or patching operating systems if they used Cloud SQL instead of running a database on Compute Engine instances or in their own data center. BigQuery is a widely used data warehouse and analytics managed service that can significantly reduce the cost of data warehousing by eliminating many database administrator tasks, such as managing storage infrastructure.
Some services have the option of trading some availability, scalability, or reliability features for lower costs. Preemptible VMs, for example, are low-cost instances that can be shut down at any time but can run up to 24 hours before they will be preempted, that is, shut down and no longer available to you. They are a good option for batch processing and other tasks that are easily recovered and restarted. Pub/Sub Lite can be an order of magnitude less expensive than Pub/Sub but comes with lower availability and durability. Pub/Sub Lite is recommended only when the cost savings justify additional operational work to reserve and manage resource capacity.
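One way to picture why batch work suits preemptible VMs is a job that records its progress so that a restart resumes where it left off. The following is a minimal sketch under stated assumptions, not a Google Cloud API: the `Preempted` exception, the `run_batch` function, the `preempt_at` parameter, and the JSON checkpoint file are all hypothetical names introduced for illustration, and the preemption is simulated rather than delivered by Compute Engine.

```python
import json
import os
import tempfile


class Preempted(Exception):
    """Hypothetical signal standing in for a VM preemption."""


def run_batch(items, checkpoint_path, process, preempt_at=None):
    """Process items in order, saving progress after each one.

    Returns the number of items handled in this run.
    preempt_at simulates a preemption just before that index (assumption).
    """
    start = 0
    # Resume from the last saved position, if a checkpoint exists.
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            start = json.load(f)["next_index"]

    for i in range(start, len(items)):
        if preempt_at is not None and i == preempt_at:
            raise Preempted(f"preempted before item {i}")
        process(items[i])
        # Persist progress so a replacement VM can pick up from here.
        with open(checkpoint_path, "w") as f:
            json.dump({"next_index": i + 1}, f)

    return len(items) - start


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
    handled = []
    try:
        run_batch(list(range(5)), path, handled.append, preempt_at=3)
    except Preempted:
        pass  # the instance went away; a new one restarts the job
    run_batch(list(range(5)), path, handled.append)
    print(handled)
```

In practice the checkpoint would need to live somewhere that survives the instance, such as a storage bucket or a database, rather than on a local temporary directory as in this sketch.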
Autoscaling enables engineers to deploy only the resources needed to meet the current load on a system. In a Compute Engine Managed Instance Group, additional virtual machines are added to the group when demand is high; when demand is low, the number of instances is reduced. With autoscaling, organizations can stop pre-purchasing infrastructure to meet peak capacity and can instead scale their infrastructure to meet the immediate need. With Cloud Run, when a service is not receiving any traffic, the revision of that service is scaled to zero and no costs are incurred.
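The scaling decision described above can be approximated with a simple target-utilization rule: keep enough instances that average utilization stays near a chosen target. This is a simplified sketch, not the actual algorithm used by Managed Instance Groups or Cloud Run; the function name `desired_instances`, its parameters, and the default limits are assumptions made for illustration.

```python
def desired_instances(current_instances, current_pct, target_pct,
                      min_instances=1, max_instances=10):
    """Return the instance count that brings average utilization near the target.

    current_pct and target_pct are whole-number percentages (for example, 90).
    """
    if target_pct <= 0:
        raise ValueError("target_pct must be positive")
    # Total load, measured in instance-percent units.
    load = current_instances * current_pct
    # Ceiling division with integers avoids floating-point rounding surprises.
    needed = -(-load // target_pct)
    # Clamp to the configured bounds of the group.
    return max(min_instances, min(max_instances, needed))


if __name__ == "__main__":
    # High demand: 4 instances at 90% with a 60% target.
    print(desired_instances(4, 90, 60))
    # Low demand: 6 instances at 20% with a 60% target.
    print(desired_instances(6, 20, 60))
    # No traffic and a minimum of zero loosely mimics scale-to-zero.
    print(desired_instances(3, 0, 60, min_instances=0))
```

Setting `min_instances` to zero is what lets the idle case drop to no instances at all; a group with a minimum of one keeps paying for that instance even when there is no load.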
Accelerating the Pace of Development
Successful businesses are constantly innovating. Agile software development practices are designed to support rapid development, testing, deployment, and feedback.
A business that wants to accelerate the pace of development may turn to managed services to reduce the operational workload on their operations teams. Managed services also allow engineers to implement services, such as image processing and natural language processing, which they could not do on their own if they did not have domain expertise on the team.
Continuous integration and continuous delivery are additional practices within software development. The idea is that it's best to integrate small amounts of new code frequently so that it can be tested and deployed rather than trying to release many changes at one time. Small releases are easier to review and debug. They also allow developers to get feedback from colleagues and customers about features, performance, and other factors.
As an architect, you may have to work with monolithic applications that are difficult to update in small increments. In that case, there may be an implied business requirement to consider decomposing the monolithic application into a microservice architecture. If there is an interest in migrating to a microservice architecture, then you will need to decide if you should migrate the existing application into the cloud as is, known as lift and shift, or you should begin transforming the application during the cloud migration. Alternatively, you could also rebuild on the cloud using cloud-native design without migrating, which is known as rip and replace.
There is no way to decide about this without considering business requirements. If the business needs to move to the cloud as fast as possible to avoid a large capital expenditure on new equipment or to avoid committing to a long-term lease in a co-location data center or if the organization wants to minimize change during the migration, then lift and shift is the better choice. Most importantly, you must assess if the application can run in the cloud with minimal modification. Otherwise, you cannot perform a lift-and-shift migration.
If the monolithic application is dependent on deprecated components and written in a language that is no longer supported in your company, then rewriting the application or using a third-party application is a reasonable choice.
Reporting on Service-Level Objectives
The operational groups of a modern business depend on IT applications. A finance department needs access to accounting systems. A logistics analyst needs access to data about how well the fleet of delivery vehicles is performing. The sales team constantly queries and updates the customer management system. Different business units will have different business requirements around the availability of applications and services.
A finance department may only need access to accounting systems during business hours. In that case, upgrades and other maintenance can happen during off-hours and would not require the accounting system to be available during that time. The customer management system, however, is typically used 24 hours a day, every day. The sales team expects the application to be available all the time. This means that support engineers need to find ways to update and patch the customer management system while minimizing or even avoiding downtime.
Requirements about availability are formalized in service-level objectives (SLOs). SLOs can be defined in terms of availability, such as being available 99.9 percent of the time. A database system may have SLOs around durability or the ability to retrieve data. For example, the human resources department may have to store personnel data reliably for seven years, and the storage system must guarantee that there is less than a 1 in 10 billion chance of an object being lost. Interactive systems have performance-related SLOs. A web application SLO may require an average page load time of 2 seconds with a 95th percentile of 4 seconds.
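It can help to turn these objectives into numbers. The sketch below shows how much downtime a given availability target allows and how a set of page load times could be checked against the average and 95th percentile limits from the example. The function names, the 30-day window, and the nearest-rank percentile method are assumptions chosen for illustration; a monitoring system may calculate percentiles differently.

```python
def allowed_downtime_minutes(availability_pct, period_days=30):
    """Minutes of downtime permitted by an availability SLO over a period."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (100 - availability_pct) / 100


def percentile(samples, p):
    """Nearest-rank percentile; p is a whole number between 1 and 100."""
    ordered = sorted(samples)
    # Ceiling of p * n / 100, computed with integers.
    rank = max(1, -(-(p * len(ordered)) // 100))
    return ordered[rank - 1]


def meets_latency_slo(samples, mean_limit=2.0, p95_limit=4.0):
    """True if the mean and the 95th percentile are both within their limits."""
    mean = sum(samples) / len(samples)
    return mean <= mean_limit and percentile(samples, 95) <= p95_limit


if __name__ == "__main__":
    # 99.9 percent over 30 days allows roughly 43.2 minutes of downtime.
    print(round(allowed_downtime_minutes(99.9), 1))
    load_times = [1.0] * 18 + [3.5, 6.0]  # seconds, made-up sample data
    print(meets_latency_slo(load_times))
```

Note that a single slow request (6.0 seconds here) does not break the objective, because the SLO constrains the average and the 95th percentile rather than the maximum.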
Logging and monitoring data are used to demonstrate compliance with SLOs. The Cloud Logging service collects information about significant events, such as a disk running out of space. Cloud Monitoring collects metrics from infrastructure, services, and applications such as average CPU utilization during a particular period of time or the number of bytes written to a network in a defined time span. Developers can create reports and dashboards using logging details and metrics to monitor compliance with SLOs. These metrics are known as service-level indicators (SLIs).
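As a rough illustration of how collected data becomes an SLI, the sketch below derives an availability indicator from a list of request records and compares it with an SLO target. It does not use the Cloud Logging or Cloud Monitoring APIs; the `RequestRecord` type, the rule that HTTP status codes below 500 count as successful, and the 99.9 percent default target are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class RequestRecord:
    """Hypothetical record, as might be extracted from request logs."""
    status: int       # HTTP status code
    latency_s: float  # response time in seconds


def availability_sli(records):
    """Percentage of requests that succeeded (status below 500, by assumption)."""
    if not records:
        raise ValueError("no records to evaluate")
    good = sum(1 for r in records if r.status < 500)
    return 100 * good / len(records)


def slo_met(records, target_pct=99.9):
    """Compare the measured SLI with the SLO target."""
    return availability_sli(records) >= target_pct


if __name__ == "__main__":
    records = [RequestRecord(200, 0.3)] * 998 + [RequestRecord(503, 1.2)] * 2
    print(availability_sli(records))  # measured indicator (the SLI)
    print(slo_met(records))           # whether the objective (the SLO) holds
```

The distinction the code makes is the one the text draws: the SLI is the measurement, and the SLO is the target the measurement is held to.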
Reducing Time to Recover from an Incident
An incident, in the context of IT services, is a disruption that causes a service to be degraded or unavailable. An incident can be caused by a single factor, such as an incorrect configuration. Often, however, there is no single root cause of an incident. Instead, a series of failures and