One of the most exciting and underrated things to come out of AWS re:Invent 2020 is also one of the most foundational: GP3 EBS volumes are here! Why should anyone care? It turns out that they can have a dramatic effect on how you optimize cost in AWS...
One of the most versatile AWS storage types is the Elastic Block Store (EBS) volume, the AWS equivalent of SAN (Storage Area Network) storage. EBS is the most common (and sometimes only...) way to add storage to an EC2 compute instance (the AWS equivalent of a virtual machine). EBS is so foundational and ubiquitous as a service that it is typically the second-largest expense on the monthly cloud bill for new adopters, behind only EC2, the compute instance itself.
Within EBS, there are several storage types that vary in performance, scale, and use case, and General Purpose SSD (GP2) has been the typical default for many deployments. GP2 is SSD-based storage with a burst feature…a way to absorb "spiky" reads and writes by spending a balance of I/O credits that refills while the volume is quiet. GP2 is simple to deploy, highly reliable, simple to manage, and works for a wide variety of typical use cases.
All is not perfect with GP2, as it also has some important limitations:
Usage of the burst allowance is measured in AWS CloudWatch through the BurstBalance metric, which shows how often a volume "leans" on its burst credits to maintain disk performance. This is one of the first metrics that we measure in Cloud Cost Optimization services, because it is such an important measurement of actual usage and user experience. To highlight how dramatically the burst balance can affect the sustained performance of an application, consider a 20 GB GP2 volume: it can burst to 3,000 IOPS, but once the balance is exhausted, the volume drops to 100 IOPS of sustained performance. After roughly 30 minutes of sustained 3,000 IOPS performance, the burst balance would be exhausted and sustained performance would drop to only 3% of the "norm". Not 3% less performance, 3% of "typical" performance...
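To make the 20 GB example concrete, here is a minimal sketch of the burst arithmetic. It assumes the commonly documented GP2 model, in which a full bucket holds 5.4 million I/O credits and refills at the volume's baseline rate. The function names are illustrative and not part of any AWS SDK.

```python
# Back-of-the-envelope model of a GP2 burst bucket (names are illustrative).
GP2_CREDIT_BUCKET = 5_400_000  # assumption: I/O credits in a full GP2 bucket
GP2_BURST_IOPS = 3_000         # burst ceiling quoted in the text


def minutes_of_burst(baseline_iops: int, demand_iops: int = GP2_BURST_IOPS) -> float:
    """Minutes a full bucket lasts while the workload keeps asking for demand_iops."""
    demand = min(demand_iops, GP2_BURST_IOPS)  # GP2 cannot burst past 3,000 IOPS
    drain_per_second = demand - baseline_iops  # credits refill at the baseline rate
    if drain_per_second <= 0:
        return float("inf")                    # demand never dips into the bucket
    return GP2_CREDIT_BUCKET / drain_per_second / 60


def share_of_burst_after_exhaustion(baseline_iops: int) -> float:
    """Fraction of burst performance that remains once the bucket is empty."""
    return baseline_iops / GP2_BURST_IOPS


if __name__ == "__main__":
    # The 20 GB volume from the text has a 100 IOPS baseline.
    print(f"{minutes_of_burst(100):.1f} minutes at full burst")
    print(f"{share_of_burst_after_exhaustion(100):.1%} of burst performance afterwards")
```

Under these assumptions the bucket lasts about 31 minutes, which lines up with the 30-minute figure above, and the volume then settles at roughly 3% of its burst speed.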
In GP2, the total performance that the volume can provide is a function of the size of the volume provisioned (in GB). The allocation directly sets baseline performance at 3 IOPS per GiB, with a floor of 100 IOPS, while the burst balance discussed earlier "smooths" out performance spikes. An example table highlights this:
Note that the burst performance doesn't change across the volume allocation spectrum. What do you do if you need 1,500 baseline IOPS on a 100 GB volume? Sadly, you provision 500 GiB and let 400 GiB go to waste…
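The sizing rule behind that example can be sketched in a few lines. This assumes the usual GP2 formula of 3 IOPS per GiB, a 100 IOPS floor, and a 16,000 IOPS ceiling. The helper names are made up for illustration.

```python
import math

GP2_IOPS_PER_GIB = 3    # baseline IOPS earned per provisioned GiB
GP2_MIN_IOPS = 100      # floor for small volumes
GP2_MAX_IOPS = 16_000   # assumption: per-volume ceiling for GP2
GP2_BURST_IOPS = 3_000  # same burst ceiling regardless of size


def gp2_baseline_iops(size_gib: int) -> int:
    """Baseline IOPS a GP2 volume of the given size sustains."""
    return max(GP2_MIN_IOPS, min(GP2_MAX_IOPS, GP2_IOPS_PER_GIB * size_gib))


def gp2_size_for_baseline(target_iops: int) -> int:
    """Smallest GP2 size (GiB) whose baseline meets target_iops."""
    if target_iops > GP2_MAX_IOPS:
        raise ValueError("GP2 cannot sustain more than 16,000 IOPS")
    if target_iops <= GP2_MIN_IOPS:
        return 1  # every GP2 volume gets the 100 IOPS floor
    return math.ceil(target_iops / GP2_IOPS_PER_GIB)


def wasted_gib(needed_gib: int, target_iops: int) -> int:
    """Capacity provisioned purely to buy IOPS rather than to store data."""
    return max(0, gp2_size_for_baseline(target_iops) - needed_gib)


if __name__ == "__main__":
    for size in (20, 100, 500):
        print(f"{size:>4} GiB  baseline {gp2_baseline_iops(size):>5} IOPS  "
              f"burst {GP2_BURST_IOPS} IOPS")
    print("GiB wasted for 1,500 IOPS on 100 GiB of data:", wasted_gib(100, 1500))
```

The loop prints a few sample sizes to show the baseline growing while the burst ceiling stays flat, and the last line reproduces the 400 GiB of waste from the example.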
Not all EC2 instances can communicate with the EBS service at the EBS volume's full potential. In fact, unless you are operating on EBS-optimized instances, EBS volume traffic contends with ordinary network traffic over the same network interface. In the case of a t2.micro, network performance can be constrained to between 50 Mbit/s and 300 Mbit/s…for both storage AND network traffic. In these situations, the end user ends up moving to a higher-cost EC2 instance, compounding monthly waste. Only EBS-optimized instances carve out dedicated bandwidth for EBS volumes on a separate network path.
GP3 was designed to address all three of these problems…for 20% lower cost than GP2. Let's look at how…The 3,000 IOPS that GP2 could only reach in bursts is now the default sustained level on GP3, with no burst balance to exhaust. Users who need more can provision additional performance up to 16,000 IOPS for an additional fee.
Volume performance can now be configured from 3,000 IOPS (note: this is the baseline, not a burst ceiling like GP2) up to 16,000 IOPS! Throughput can be configured from 125 MiB/s to 1,000 MiB/s, independent of volume size.
GP3 volumes now communicate over a separate network path instead of sharing the allocated network bandwidth.
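Because size and performance are now priced separately, the cost impact is easy to estimate. The sketch below is illustrative only: the per-unit prices are assumptions modeled on us-east-1 list prices from around the GP3 launch, so check current AWS pricing before relying on them. The function names are hypothetical, and additional GP3 limits (such as IOPS-to-size ratios) are not modeled.

```python
# Assumed monthly prices in USD; verify against current AWS pricing.
GP2_PER_GIB = 0.10           # GP2 storage, per GiB-month
GP3_PER_GIB = 0.08           # GP3 storage, per GiB-month (20% lower)
GP3_PER_EXTRA_IOPS = 0.005   # per provisioned IOPS above the included 3,000
GP3_PER_EXTRA_MIBPS = 0.04   # per MiB/s above the included 125

GP3_INCLUDED_IOPS = 3_000
GP3_INCLUDED_MIBPS = 125


def gp2_monthly_cost(size_gib: int) -> float:
    """GP2 is billed on capacity alone; performance comes bundled with size."""
    return size_gib * GP2_PER_GIB


def gp3_monthly_cost(size_gib: int, iops: int = 3_000, throughput_mibps: int = 125) -> float:
    """GP3 bills capacity, extra IOPS, and extra throughput separately."""
    if not 3_000 <= iops <= 16_000:
        raise ValueError("GP3 IOPS must be between 3,000 and 16,000")
    if not 125 <= throughput_mibps <= 1_000:
        raise ValueError("GP3 throughput must be between 125 and 1,000 MiB/s")
    return (size_gib * GP3_PER_GIB
            + (iops - GP3_INCLUDED_IOPS) * GP3_PER_EXTRA_IOPS
            + (throughput_mibps - GP3_INCLUDED_MIBPS) * GP3_PER_EXTRA_MIBPS)


if __name__ == "__main__":
    # 100 GiB of data needing 1,500 sustained IOPS:
    print(f"GP2 (500 GiB to buy the IOPS): ${gp2_monthly_cost(500):.2f}/month")
    print(f"GP3 (100 GiB, default 3,000):  ${gp3_monthly_cost(100):.2f}/month")
```

With these assumed prices, the earlier "1,500 IOPS on 100 GB" workload no longer needs the extra 400 GiB, so the saving is far larger than the headline 20%.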
Existing volumes can be migrated in place through an API call without downtime or disruption! This has created a no-risk scenario with no apparent downsides. Additionally, all snapshot capabilities are fully supported under existing lifecycle management policies. Did I mention that it is 20% lower in cost than GP2? This is proof that a few great things did come out of 2020. For the vast majority of workloads that were either over-provisioned for performance or migrated to IO2 (the most performant, and most expensive, tier of EBS), GP3 is now a lower-cost, no-compromise solution.
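For readers who want to see what that API call might look like, here is a hedged sketch using boto3's EC2 `modify_volume` operation. The volume ID and region are placeholders, `build_gp3_request` and `migrate_to_gp3` are hypothetical helper names, and running the second function requires boto3 plus valid AWS credentials.

```python
def build_gp3_request(volume_id: str, iops: int = 3_000, throughput_mibps: int = 125) -> dict:
    """Build the keyword arguments for an EC2 ModifyVolume call targeting GP3."""
    if not 3_000 <= iops <= 16_000:
        raise ValueError("GP3 IOPS must be between 3,000 and 16,000")
    if not 125 <= throughput_mibps <= 1_000:
        raise ValueError("GP3 throughput must be between 125 and 1,000 MiB/s")
    return {
        "VolumeId": volume_id,
        "VolumeType": "gp3",
        "Iops": iops,
        "Throughput": throughput_mibps,
    }


def migrate_to_gp3(volume_id: str, region: str, **performance) -> dict:
    """Request an in-place change to GP3; the volume stays attached and usable."""
    import boto3  # third-party; imported here so the helper above works without it

    ec2 = boto3.client("ec2", region_name=region)
    return ec2.modify_volume(**build_gp3_request(volume_id, **performance))


if __name__ == "__main__":
    # Placeholder ID: print the request only; nothing is sent to AWS here.
    print(build_gp3_request("vol-0123456789abcdef0"))
```

Splitting the request builder from the call makes it easy to review or log exactly what will change before anything is sent to AWS.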
Of course, there will still be a few of the highest-performance use cases that need more than 16,000 IOPS; they can stay on IO2…but for everyone else...GP3 is here, and it is an important new tool for Cloud Cost Optimization.