Version 2 of AWS' cloud-native relational database brings significant improvements
SPONSORED FEATURE Application development has transformed in the last few years, with software built for the cloud serving highly volatile workloads from globally distributed users at scale. Serverless computing evolved to support these applications, eliminating traditional performance and capacity management concerns altogether.
The venerable relational database model, which still supports millions of applications across the world today, must keep up with advances in serverless computing. That's why Amazon created a serverless version of its Aurora cloud-native relational database system to support evolving customer demands.
"More customers want to build applications that matter to their end users instead of focusing on managing infrastructure and database capacity," explains Chayan Biswas, principal technical product manager at Amazon Web Services. "Serverless computing is a way they can achieve that very easily."
A history of serverless computing
Amazon first introduced its Lambda serverless computing concept in 2014. It was a natural evolution of the virtualization trend that had gone before it, which eliminated the need to run each application on a separate physical server by abstracting operating systems away from the hardware.
Virtual machines are far more efficient than dedicated hardware servers, compressing applications' computational footprint. But many applications don't run constantly, only needing to operate in response to other events. This is especially true when you break monolithic applications into container-based services.
Lambda serverless computing uses Amazon's Firecracker microVM framework under the hood. It enables developers to call a function and retrieve the result without provisioning or managing a server to run it. The underlying framework takes care of the rest.
This offers at least two benefits. The first is that not running a dedicated virtual machine or container for the function reduces the cost of operation. The second is that the underlying container-based infrastructure can quickly scale the function's capacity, maintaining performance even as the volume of events that call it increases.
The birth of the serverless database
Lambda supports cloud-based applications, but customers wanted the next logical step: support for serverless databases. AWS took a serverless version of Aurora MySQL into general availability in 2018, followed by a version for PostgreSQL in 2019.
Amazon Aurora Serverless translates the cost and performance benefits to the database, enabling customers to scale their relational data workloads quickly without disruption. They also pay only for the database resources they use, which is especially useful for applications that don't call the database frequently, like a low-volume blog or development and testing databases.
Beyond this, serverless database operations also save DBAs from having to provision and manage database capacity. This is part of a workload that Amazon calls 'undifferentiated heavy lifting', which is mundane work that doesn't make full use of a DBA's skills. Using serverless automation to abstract this enables DBAs to concentrate on more important tasks like database optimization and data governance.
Before the move to Aurora Serverless, customers would have to scale their databases by manually changing the type of virtual machine that their system ran on. That created an additional management overhead and also took the database down for up to 30 seconds, which was unacceptable for many users.
Instead, customers would constantly provision their databases for peak workloads. This was expensive and wasted resources, causing them to pay for large VMs that would sit partly idle for large periods of time.
How Aurora Serverless works
Amazon Aurora Serverless v1 changed everything by enabling customers to resize their VMs without disrupting the database. It would look for gaps in transaction flows that would give it time to resize the VM. It would then freeze the database, move to a different VM behind the scenes, and then start the database again.
This was a great starting point, explains Biswas, but finding transaction gaps isn't always easy. "When we have a very chatty database, we are running a bunch of concurrent transactions that overlap," he explains. "If there's no gap between them, then we can't find the point where we can scale."
Consequently, the scaling process could take between five and 50 seconds to complete. It could sometimes end up disrupting the database if an appropriate transaction gap could not be found. That restricted Aurora Serverless instances to sporadic, infrequent workloads.
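The gap-finding problem can be illustrated with a minimal sketch. This is not AWS's implementation; the `first_gap` function, the interval representation, and the sample timings are all assumptions made for illustration. It treats each transaction as a (start, end) pair and looks for the first idle window long enough to fit a resize.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: one transaction as (start, end), in seconds.
Interval = Tuple[float, float]


def first_gap(transactions: List[Interval], needed: float) -> Optional[float]:
    """Return the start of the first idle window of at least `needed` seconds.

    Returns None when overlapping transactions leave no such window.
    """
    busy_until = None  # latest end time seen so far
    for start, end in sorted(transactions):
        if busy_until is not None and start - busy_until >= needed:
            return busy_until
        busy_until = end if busy_until is None else max(busy_until, end)
    return None


# A quiet workload leaves room to resize; a chatty one does not.
quiet = [(0.0, 1.0), (4.0, 5.0)]
chatty = [(0.0, 3.0), (2.0, 5.0), (4.0, 7.0)]
print(first_gap(quiet, needed=2.0))
print(first_gap(chatty, needed=2.0))
```

With overlapping transactions, as in the chatty case, the function never finds a window, which mirrors the situation Biswas describes.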
"One piece of feedback that we heard from customers was that they wanted us to make Aurora Serverless databases suitable for their most demanding, most critical workloads," explained Biswas. That included those with strict service level agreements and high-availability needs.
Improving serverless database services
With that in mind, version two of Aurora Serverless brings some significant improvements, including a new approach that lets it scale to thousands of transactions in seconds. AWS achieved this by providing the database process with more resources, over-provisioning them under the hood. That eliminates the need to find a gap in database traffic because the serverless process doesn't move between different VMs to scale.
That might seem like a losing proposition on Amazon's side, because the company has to absorb the cost of that over-provisioning. AWS is used to finding new internal efficiencies using its economy of scale, though. To improve scalability in Aurora Serverless v2, it got smarter about workload placements.
The company can now place workloads with complementary profiles on the same machine. A reporting workload that runs at night could run on the same VM as a business application that operates during the day, for example. That's a benefit of the cloud's multi-tenant operating model.
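A rough sketch shows why complementary profiles help. The hourly load figures and the `combined_peak` helper below are invented for illustration and say nothing about how AWS actually places workloads; they only show that two workloads peaking at different times need less shared headroom than the sum of their individual peaks.

```python
def combined_peak(profile_a, profile_b):
    """Peak of the hour-by-hour sum of two 24-hour load profiles."""
    return max(a + b for a, b in zip(profile_a, profile_b))


# Hypothetical load per hour, in arbitrary units.
night_report = [8 if hour < 6 else 1 for hour in range(24)]
day_app = [8 if 8 <= hour < 18 else 1 for hour in range(24)]

sum_of_peaks = max(night_report) + max(day_app)  # capacity if both peaked together
shared_peak = combined_peak(night_report, day_app)  # capacity actually needed

print(sum_of_peaks, shared_peak)
```

Under these assumed numbers, the shared machine needs capacity for the combined peak rather than for both peaks at once.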
Serverless v2 also scales in finer-grained increments. V1 customers could only double their provisioned amounts of the database computing unit, known as the Aurora Capacity Unit (ACU), when usage exceeded a set threshold. Aurora Serverless v2 allows increases in 0.5 ACU increments.
There are also improvements in other areas, including availability. Although high availability for storage is standard, Aurora Serverless v1 doesn't offer high availability for compute. V2 offers configuration across multiple availability zones. It will also support read replicas across those instances for faster record retrieval, along with Aurora Global Database support for read-only workloads. This means faster data replication across regions and failover times of under a minute for increased reliability in an Aurora Serverless v2 environment.
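The difference between the two stepping schemes is easy to see with some arithmetic. The sketch below is illustrative only: the function names, the starting capacity of 1 ACU, and the sample requirements are assumptions, and real deployments also have minimum and maximum capacity settings that are ignored here.

```python
import math


def v1_capacity(required_acu: float, current_acu: int = 1) -> int:
    """Double capacity until it covers the requirement (v1-style steps)."""
    capacity = current_acu
    while capacity < required_acu:
        capacity *= 2
    return capacity


def v2_capacity(required_acu: float, step: float = 0.5) -> float:
    """Round the requirement up to the next 0.5 ACU step (v2-style steps)."""
    return math.ceil(required_acu / step) * step


for need in (2.3, 9.0, 17.2):
    print(need, v1_capacity(need), v2_capacity(need))
```

For a workload needing 17.2 ACUs, doubling lands on 32 while half-unit steps land on 17.5, so the finer increments track demand much more closely.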
RDS Proxy
Amazon has also introduced technology that reconciles a fundamental difference between the serverless operating model and relational database principles.
DynamoDB, Amazon's managed NoSQL key-value database, is already serverless because of its underlying architecture. You can easily introduce auto-scaling rules directly in the web interface when setting up DynamoDB tables.
Things are different with Aurora because of the way that relational databases set up connections, Biswas explains.
"In a serverless environment, a Lambda function runs and then it's done," he points out. "Relational databases tend to be persistent."
Relational databases are typically stateful, maintaining a single connection to an application over time so that they don't have to waste time setting things up again every time the application makes a query. Serverless computing is a stateless concept that creates and rips down connections as needed.
Applications using modern container architectures are designed to scale quickly. If every container-based function opens a connection to a database, the relational engine will spend all its time managing connections rather than serving queries.
At AWS re:Invent 2019, Amazon launched RDS Proxy, a service to solve the connection problem. The service, which entered general availability in June 2020, sits between the application and database and pools connections. Instead of bombarding the database server, container-based applications connect to the proxy, which can hand out connections from the pool. It supports serverless Lambda functions, Kubernetes containers, or any other stateless application that doesn't natively support connection pooling.
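The pooling idea itself can be sketched in a few lines. This is a generic in-process pool, not a description of how RDS Proxy is built; the `ConnectionPool` class and the `fake_connect` stand-in are hypothetical names used only to show many short-lived callers sharing a small, fixed set of connections.

```python
import queue
from contextlib import contextmanager


class ConnectionPool:
    """Hand out a fixed set of pre-opened connections to short-lived callers."""

    def __init__(self, connect, size):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())  # open every connection once, up front

    @contextmanager
    def connection(self, timeout=None):
        conn = self._idle.get(timeout=timeout)  # wait until a connection is free
        try:
            yield conn
        finally:
            self._idle.put(conn)  # return it to the pool instead of closing it


opened = []  # records every connection that was actually opened


def fake_connect():
    """Stand-in for a real database driver's connect call."""
    conn = object()
    opened.append(conn)
    return conn


pool = ConnectionPool(fake_connect, size=2)
for _ in range(100):  # 100 short-lived callers, as stateless functions might be
    with pool.connection() as conn:
        pass  # a query would run here

print(len(opened))
```

However many callers come and go, the database behind the pool only ever sees the two connections opened at start-up.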
Lambda integration
AWS doesn't just support efficient access to Aurora Serverless from AWS Lambda functions; it supports the reverse. Lambda integration lets customers invoke serverless functions from within the database. That lets developers write business logic in a Lambda function, which supports various languages, rather than writing stored procedures in a procedural dialect of SQL.
Lambda integration does more than give developers more flexibility. It also puts compute power outside the database, allowing it to concentrate on queries rather than impeding its performance by running embedded application logic. Finally, it simplifies application workflows. For example, a developer can have Aurora call a machine learning model directly as a Lambda function rather than coding that request into their application.
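As an illustration of business logic living outside the database, here is a minimal Python Lambda handler of the kind a database could invoke. The `event` fields and the discount rule are hypothetical; the article does not describe the payload Aurora sends, so the shape used here is an assumption.

```python
def lambda_handler(event, context):
    """Apply a hypothetical discount rule to an order total passed in the event.

    Assumes the caller sends a payload like {"order_total": 120.0}.
    """
    order_total = float(event["order_total"])
    discount = 0.1 if order_total >= 100 else 0.0  # assumed business rule
    return {
        "order_total": order_total,
        "discounted_total": round(order_total * (1 - discount), 2),
    }


# Local invocation with a sample payload; no AWS services are involved.
print(lambda_handler({"order_total": 200}, None))
```

Logic like this would otherwise sit in a stored procedure; keeping it in a function lets it be written and tested in a general-purpose language.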
Amazon continues to make advances with its serverless database applications. Amazon Aurora Serverless v2 with PostgreSQL and MySQL compatibility GA'd last week at AWS Summit in San Francisco. "We will essentially support all of the features in Aurora with Aurora Serverless v2," Biswas concludes. Soon, for many customers, the concept of a database server could be an anachronism.