Elastic Cloud Infrastructure: Scaling and Automation - Summary Notes - Part 2
Bhaskar S | 11/09/2019
Load Balancing and Auto-Scaling
Cloud Load Balancing gives a customer the ability to distribute load across compute resources in a single Region or in multiple Regions to meet their high availability requirements, to put their resources behind a single anycast IP address, and to scale their resources up or down with intelligent autoscaling
Using Cloud Load Balancing, one can serve content as close as possible to their users on a system that can respond to over one million queries per second
Cloud Load Balancing is a fully distributed, software-defined, managed service that is not instance- or device-based, so one does not need to manage a physical load balancing infrastructure
The following illustration shows the different types of Load Balancers:
GCP offers different types of Load Balancers that can be divided into two categories - Global and Regional
The Global Load Balancers are the HTTP(S), SSL Proxy, and TCP Proxy Load Balancers
The Regional Load Balancers are the Internal TCP/UDP, Network TCP/UDP, and Internal HTTP(S) Load Balancers
The Global Load Balancers leverage the Google front ends which are software defined, distributed systems that sit in Google's PoPs and are distributed globally
The Regional Load Balancers are the internal and network Load Balancers and they distribute traffic to instances that are in a single GCP Region
The Internal Load Balancer uses Andromeda, which is GCP's Software Defined Network (SDN) virtualization stack, and the Network Load Balancer uses Maglev, which is a large distributed software system
The Internal HTTP(S) Load Balancer is a proxy-based, Regional Layer 7 Load Balancer that enables one to run and scale services behind a private load balancing IP address that is accessible only in the Load Balancer's Region in their VPC network
Managed Instance Groups
The following illustration talks about Managed Instance Groups:
A Managed Instance Group is a collection of identical VM instances that one can control as a single entity using an Instance Template
One can easily update all the instances in the group by specifying a new Template in a rolling update
When an application requires additional compute resources, Managed Instance Groups can automatically scale the number of instances in the group
Managed Instance Groups can work with Load Balancing services to distribute network traffic to all instances in the group
If an instance in the group stops, crashes, or is deleted by an action other than an instance group command, the Managed Instance Group automatically recreates the instance so it can resume its processing tasks. The recreated instance uses the same name and the same Instance Template as the previous instance
Managed Instance Groups can automatically identify and recreate unhealthy instances in a group to ensure that all the instances are running optimally
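This recreate-on-failure behavior amounts to a simple reconciliation loop over the group. The following is a minimal sketch of the concept in Python, not GCP's actual implementation; the Instance class, instance names, and template name are hypothetical:

```python
# Minimal sketch of Managed Instance Group autohealing as a reconciliation loop.
# Illustrative only; this is not how GCP implements it internally.

from dataclasses import dataclass

@dataclass
class Instance:
    name: str
    template: str
    healthy: bool = True

class ManagedInstanceGroup:
    def __init__(self, template: str, target_size: int):
        self.template = template
        self.instances = {f"instance-{i}": Instance(f"instance-{i}", template)
                          for i in range(target_size)}

    def reconcile(self):
        # Recreate any instance that stopped, crashed, or was deleted,
        # reusing the same name and the same Instance Template
        for name, inst in list(self.instances.items()):
            if not inst.healthy:
                self.instances[name] = Instance(name, self.template)

mig = ManagedInstanceGroup(template="web-template-v1", target_size=3)
mig.instances["instance-1"].healthy = False   # simulate a crash
mig.reconcile()                               # instance-1 is recreated
```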
Regional Managed Instance Groups are generally recommended over Zonal Managed Instance Groups because they allow one to spread the application load across multiple Zones within a Region instead of confining the application to a single Zone or a user having to manage multiple Instance Groups across different Zones. This type of replication protects against Zonal failures and unforeseen scenarios where an entire group of instances in a single Zone malfunctions
The following illustration shows the creation of an Instance Template:
To create a Managed Instance Group, a user first needs to create an Instance Template. Then, they create a Managed Instance Group with a specific number of instances
The following illustration shows the creation of an Instance Group:
When creating an Instance Group, a user defines specific rules for the Instance Group: decide whether the Instance Group will be single or multi Zoned and where those locations will be, choose the ports to allow and load balance across, select the Instance Template to use, decide whether to autoscale and under what circumstances, and finally consider creating a Health Check to determine which instances are healthy and should receive traffic
The following illustration shows auto scaling with Managed Instance Groups:
Managed Instance Groups offer autoscaling capabilities that allow one to automatically add or remove instances from a Managed Instance Group based on increase or decrease in load
Autoscaling helps an application gracefully handle increases in traffic and reduces cost when the need for resources is lower
One just defines the autoscaling policy and the autoscaler performs automatic scaling based on the measured load
Applicable autoscaling policies include scaling based on CPU Utilization, Load Balancing Capacity, Monitoring Metrics, or by a Queue-based Workload like Cloud Pub/Sub
For example, assume we have two instances that are at 100 percent and 85 percent CPU utilization as shown in Fig.5 above. If the target CPU utilization is 75 percent, the autoscaler will add another instance to spread out the CPU load and stay below the 75 percent target CPU utilization. Similarly if the overall load is much lower than the target, the autoscaler will remove instances as long as that keeps the overall utilization below the target
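The scaling decision in this example reduces to simple arithmetic: the autoscaler sizes the group so that the total measured utilization divided by the number of instances stays at or below the target. A minimal sketch of that calculation (the function name is hypothetical):

```python
import math

def recommended_size(cpu_utilizations: list[float], target: float) -> int:
    """Smallest instance count that brings average CPU at or below target."""
    total = sum(cpu_utilizations)
    return max(1, math.ceil(total / target))

# The example above: two instances at 100% and 85% CPU, target 75%
print(recommended_size([1.00, 0.85], target=0.75))  # -> 3 (adds one instance)

# Much lower load: the group can shrink while staying below the target
print(recommended_size([0.20, 0.15], target=0.75))  # -> 1
```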
The following illustration shows the creation of a Health Check:
A Health Check is very similar to an Uptime Check in Stackdriver. A user just defines a protocol, port, and health criteria as shown in the Fig.6 above. Based on this configuration, GCP computes a health state for each instance
The health criteria define how often to check whether an instance is healthy (Check Interval), how long to wait for a response (Timeout), how many successful attempts are decisive (Healthy Threshold), and how many failed attempts are decisive (Unhealthy Threshold)
From the Fig.6 above, the Health Check would have to fail twice over a total of 15 seconds before an instance is considered unhealthy
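The threshold behavior can be modeled as a small state machine: consecutive failed probes flip an instance to unhealthy once the Unhealthy Threshold is reached, and consecutive successful probes flip it back once the Healthy Threshold is reached. A minimal sketch of that logic, assuming an Unhealthy Threshold of 2 as in the discussion above and an assumed Healthy Threshold of 2:

```python
class HealthCheckState:
    """Tracks one instance's health based on consecutive probe results."""

    def __init__(self, healthy_threshold=2, unhealthy_threshold=2):
        self.healthy_threshold = healthy_threshold
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy = True
        self.streak = 0  # consecutive probe results contradicting current state

    def record(self, probe_succeeded: bool):
        if probe_succeeded == self.healthy:
            self.streak = 0          # current state confirmed; reset the streak
            return
        self.streak += 1
        needed = self.unhealthy_threshold if self.healthy else self.healthy_threshold
        if self.streak >= needed:
            self.healthy = not self.healthy
            self.streak = 0

state = HealthCheckState()
state.record(False)          # first failed probe: still considered healthy
state.record(False)          # second consecutive failure: now unhealthy
print(state.healthy)         # -> False
```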
HTTP(S) Load Balancing
HTTPS Load Balancing acts at Layer 7 of the OSI model, which is the Application Layer that deals with the actual content of each message, allowing for routing decisions based on the URL
The following illustration talks about HTTPS Load Balancing:
HTTPS Load Balancing provides Global Load Balancing for HTTP/HTTPS requests destined for a customer's instances, which are available at a single anycast IP address (this simplifies DNS setup)
HTTPS Load Balancing balances HTTP (port 80 or 8080) and HTTPS (port 443) traffic across multiple backend instances and across multiple Regions
HTTPS Load Balancing supports both IPv4 and IPv6 clients, is scalable, requires no pre-warming, and enables content-based and cross-Regional load balancing
One can configure their own URL Maps that route some URLs to one set of instances and other URLs to other instances
Requests are generally routed to the Instance Group that is closest to the client
If the closest Instance Group does not have sufficient capacity, the request is sent to the next closest Instance Group that does have the capacity
The following illustration shows the architecture of an HTTPS Load Balancer:
A Global Forwarding Rule directs incoming requests from the Internet to a target HTTP Proxy. The target HTTP Proxy checks each request against a URL Map to determine the appropriate backend service for the request. The backend service directs each request to an appropriate backend based on serving capacity and Zone
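A minimal sketch of this request path in Python, modeling how the target HTTP Proxy checks a request against the URL Map to pick a backend service; the URL prefixes and backend service names are hypothetical:

```python
# Sketch of the HTTP(S) Load Balancer request path:
# forwarding rule -> target HTTP proxy -> URL map -> backend service

URL_MAP = [
    ("/video/", "video-backend-service"),    # path prefix -> backend service
    ("/static/", "static-backend-service"),
]
DEFAULT_SERVICE = "web-backend-service"

def target_http_proxy(request_path: str) -> str:
    """Check the request against the URL Map to pick a backend service."""
    for prefix, service in URL_MAP:
        if request_path.startswith(prefix):
            return service
    return DEFAULT_SERVICE

print(target_http_proxy("/video/intro.mp4"))  # -> video-backend-service
print(target_http_proxy("/index.html"))       # -> web-backend-service
```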
The following illustration talks about Backend services:
The backend services contain a Health Check, Session Affinity, a Timeout setting, and one or more Backends
A Health Check polls instances attached to the Backend service at configured intervals. Instances that pass the Health Check are allowed to receive new requests. Unhealthy instances are not sent requests until they are healthy again
By default, HTTPS Load Balancing uses a Round Robin algorithm to distribute requests among available instances
One can configure HTTPS Load Balancing to use a Session Affinity algorithm. Session Affinity attempts to send all requests from the same client to the same VM instance
Backend services also have a Timeout setting which is set to 30 seconds by default. This is the amount of time the Backend service will wait on the Backend before considering the request a failure
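The difference between the two request distribution modes described above (Round Robin and Session Affinity) is easy to see in a sketch: Round Robin cycles through the instances, while Session Affinity hashes the client address so the same client keeps landing on the same VM instance. Client-IP hashing here is one possible affinity key, and the instance names are hypothetical:

```python
import itertools
import hashlib

INSTANCES = ["vm-a", "vm-b", "vm-c"]
_rr = itertools.cycle(INSTANCES)

def round_robin() -> str:
    # Default mode: each new request goes to the next instance in turn
    return next(_rr)

def client_ip_affinity(client_ip: str) -> str:
    # Session Affinity: hash the client IP so requests from the same
    # client consistently reach the same VM instance
    digest = hashlib.sha256(client_ip.encode()).digest()
    return INSTANCES[int.from_bytes(digest[:4], "big") % len(INSTANCES)]

print(round_robin(), round_robin())    # vm-a vm-b
print(client_ip_affinity("10.0.0.7"))  # same output every call
print(client_ip_affinity("10.0.0.7"))  # ... for the same client
```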
The Backends themselves contain an Instance Group, a Balancing Mode, and a Capacity Scalar
An Instance Group contains VM Instances. The Instance Group may be a Managed Instance Group with or without autoscaling or an Unmanaged Instance Group
A Balancing Mode tells the load balancing system how to determine when the Backend is at full usage
If all the Backends for the backend service in a Region are at the full usage, new requests are automatically routed to the nearest Region that can still handle requests
The Balancing Mode can be based on CPU utilization or requests per second
A Capacity setting is an additional control that interacts with the Balancing Mode setting
As an example, if a user normally wants their instances to operate at a maximum of 80 percent CPU utilization, they would set the Balancing Mode to 80 percent CPU utilization and the Capacity to 100 percent. If they want to cut the instance utilization in half, they could leave the Balancing Mode at 80 percent CPU utilization and set the Capacity to 50 percent
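The interaction between the two settings is plain multiplication: the effective utilization target is the Balancing Mode target scaled by the Capacity setting. A quick sketch of the example above:

```python
def effective_target(balancing_mode_target: float, capacity: float) -> float:
    """Utilization at which the Backend is considered at full usage."""
    return balancing_mode_target * capacity

# Normal operation: 80% CPU Balancing Mode, 100% Capacity
print(effective_target(0.80, 1.00))  # -> 0.8 (instances fill up to 80% CPU)

# Cutting utilization in half: same Balancing Mode, Capacity set to 50%
print(effective_target(0.80, 0.50))  # -> 0.4 (instances fill up to 40% CPU)
```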
The following illustration talks about SSL around HTTPS Load Balancing:
An HTTPS Load Balancer requires at least one Signed SSL certificate to be installed on the target HTTPS Proxy for the load balancer. The client SSL session terminates at the load balancer
One can configure the target Proxy with up to 10 SSL certificates
The HTTPS Load Balancer supports the QUIC transport layer protocol
QUIC is a transport layer protocol that allows faster client connection initiation, eliminates head of line blocking in multiplexed streams, and supports connection migration when a client's IP address changes
SSL Proxy Load Balancing
SSL Proxy is a Global Load Balancing service for encrypted, non-HTTP traffic. This load balancer terminates user SSL connections at the load balancing layer, then balances the connections across a user's VM instances using the SSL or TCP protocols
The following illustration talks about SSL Proxy Load Balancing:
The customer's VM instances can be in multiple Regions and the Load Balancer automatically directs traffic to the closest Region that has capacity
SSL Proxy Load Balancing supports both IPv4 and IPv6 addresses for client traffic and provides Intelligent Routing, Certificate Management, Security Patching, and SSL Policies
Intelligent Routing means that this load balancer can route requests to backend locations where there is capacity
From a Certificate Management perspective, the user only needs to update their customer-facing certificate in one place
If vulnerabilities arise in the SSL or TCP stack, GCP will apply patches at the Load Balancer automatically in order to keep the user instances safe
TCP Proxy Load Balancing
TCP Proxy is a Global Load Balancing service for unencrypted, non-HTTP traffic. This Load Balancer terminates the customer's TCP sessions at the load balancing layer, then forwards the traffic to the user's VM instances using TCP or SSL
The following illustration talks about TCP Proxy Load Balancing:
The customer's VM instances can be in multiple Regions and the Load Balancer automatically directs traffic to the closest Region that has capacity
TCP Proxy Load Balancing supports both IPv4 and IPv6 addresses for client traffic
TCP Proxy Load Balancer provides Intelligent Routing and Security Patching
Network Load Balancing
Network Load Balancing is a Regional non-proxied load balancing service
The following illustration talks about Network Load Balancing:
All traffic is passed through the Load Balancer instead of being proxied and traffic can only be balanced between VM instances that are in the same Region unlike a Global Load Balancer
The Network Load Balancing service uses Forwarding Rules to balance the load on a user's system based on incoming IP protocol data, such as address, port, and protocol type
One can use the Network Load Balancing service to load balance UDP, TCP, and SSL traffic on ports that are not supported by the TCP proxy and SSL Proxy Load Balancers
The Backends of a Network Load Balancer can be a template-based Instance Group or a Target Pool resource
The following illustration talks about the Target Pool resource:
A Target Pool resource defines a group of instances that receive incoming traffic from Forwarding Rules. When a Forwarding Rule directs traffic to a Target Pool, the Load Balancer picks an instance from the Target Pool based on a hash of the source IP and port and the destination IP and port (see the sketch below)
A Target Pool can only be used with Forwarding Rules that handle TCP and UDP traffic
Each Project can have up to 50 Target Pools and each Target Pool can have only one Health Check
All the instances of a Target Pool must be in the same Region (same limitation as for the Network Load Balancer)
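The instance selection described above can be sketched as a hash over the connection tuple, which keeps all packets of one TCP/UDP flow on the same instance. This only illustrates the idea; the hash function and instance names are hypothetical, and it is not Maglev's actual algorithm:

```python
import hashlib

POOL = ["vm-a", "vm-b", "vm-c"]  # instances in the Target Pool (same Region)

def pick_instance(src_ip, src_port, dst_ip, dst_port) -> str:
    # Hash the source IP/port and destination IP/port so that every
    # packet of a given flow is delivered to the same instance
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    return POOL[int.from_bytes(digest[:4], "big") % len(POOL)]

# The same flow always maps to the same instance
print(pick_instance("198.51.100.7", 52344, "203.0.113.9", 443))
print(pick_instance("198.51.100.7", 52344, "203.0.113.9", 443))  # identical
```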
Internal Load Balancing
Internal Load Balancing is a Regional, private IP address load balancing service for TCP and UDP based traffic
The following illustration talks about Internal Load Balancing:
Internal Load Balancing is only accessible through the internal IP address of VM instances that are in the same Region
Use Internal Load Balancing to configure an internal IP address load balancer to act as the front end to the private backend instances
Internal Load Balancing often results in lower latency because all the load balanced traffic stays within Google's network
The following illustration talks about Software Defined Networking:
Internal Load Balancing is not based on a device or a VM instance. Instead, it is a Software-Defined, fully distributed load balancing solution
From Fig.16 above, the diagram on the left is a traditional Proxy model of Internal Load Balancing in which one configures an internal IP address on a Load Balancing device or VM instances and the client instance connects to this IP address. Traffic coming to the IP address is terminated at the Proxy Load Balancer and the Load Balancer selects a backend to establish a new connection to. Essentially, there are two connections - one between the client and the Load Balancer, and the other between the Load Balancer and the backend
From Fig.16 above, the diagram on the right shows the Internal Load Balancing that distributes the client requests to the different Backend instances. It uses a lightweight Load Balancer built on top of Andromeda, Google's network virtualization stack, to provide software-defined load balancing that directly delivers the traffic from the client to a backend instance
The following illustration shows an example 3-tier web services architecture using Internal Load Balancing:
The example in Fig.17 above illustrates a 3-tier web services architecture that uses an external-facing HTTPS Load Balancer, which provides a single global IP address for clients. The backends (Web Tier) of this load balancer are located in 2 Regions. These backends then access an Internal Load Balancer in each Region as the application tier (Internal Tier). The benefit of this 3-tier approach is that neither the Database Tier nor the Application Tier is exposed externally, which simplifies security and network pricing
The following illustration talks about IPv6 termination:
One differentiator between the different GCP Load Balancers is the support for IPv6 clients. Only the HTTPS, SSL Proxy, and TCP Proxy Load Balancing services support IPv6 clients. IPv6 termination for these load balancers enables one to handle IPv6 requests from clients and proxy them over IPv4 to the backends
The following illustration shows a decision tree for choosing a Load Balancer:
To choose which Load Balancer best suits a need, one needs to consider the following aspects of Cloud Load Balancing: Global versus Regional load balancing, External versus Internal load balancing, and the traffic type
If the traffic type is HTTP or HTTPS, choose the HTTPS Load Balancing service, which is a Layer 7 load balancer
If the traffic type is TCP or UDP, determine whether the SSL Proxy, TCP Proxy, or Network Load Balancing service meets the need
For internal load balancing needs, choose the Internal Load Balancing service which supports both TCP and UDP traffic
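This decision process can be captured in a few lines. The following is a hedged sketch that mirrors the decision points above; it is illustrative, not an official GCP tool:

```python
def choose_load_balancer(internal: bool, traffic: str) -> str:
    """Pick a Cloud Load Balancing option from the decision points above.
    traffic is one of: 'http(s)', 'tcp', 'udp'."""
    if internal:
        # Internal Load Balancing is Regional and supports TCP and UDP
        return "Internal Load Balancing"
    if traffic == "http(s)":
        return "HTTPS Load Balancing"  # Global, Layer 7
    if traffic == "tcp":
        # SSL Proxy for encrypted non-HTTP traffic, TCP Proxy for
        # unencrypted non-HTTP traffic, Network Load Balancing for
        # ports the proxy load balancers do not support
        return "SSL Proxy / TCP Proxy / Network Load Balancing"
    return "Network Load Balancing"  # Regional, non-proxied (UDP)

print(choose_load_balancer(internal=False, traffic="http(s)"))  # HTTPS Load Balancing
print(choose_load_balancer(internal=True, traffic="udp"))       # Internal Load Balancing
```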
The following illustration summarizes the Load Balancing options:
References
Coursera - Elastic Cloud Infrastructure: Scaling and Automation
Elastic Cloud Infrastructure: Scaling and Automation - Summary Notes - Part 1