Databases on AWS
- AWS provides a wide range of database services, including the traditional relational databases (RDBMS), NoSQL databases, in-memory cache, etc.
- Database services shall choose provisioned IOPS SSD (
io1
) as its underlying storage layer.
Relational Database Service (RDS)
- Relational Database Service (RDS) mainly has two important features: multi-AZ (for disaster recovery) & read replica (for performance).
- There are 3 typical use cases of RDS:
- Online transaction processing (OLTP): real-time processing for business data, where queries usually run on a single row of record;
- Online analytical processing (OLAP): post-event analytics to help with decision making, where queries usually run on a large set of records;
- Data warehouse: used for business intelligence (BI), could contain very complex and extremely large datasets, could help to perform analysis for historical data.
- AWS RDS provide a wide variety of database engines:
- For OLTP databases: MySQL, PostgreSQL, Microsoft SQL Server, MariaDB,Oracle, Aurora (invented by AWS);
- For OLAP databases and data warehouses: Redshift (invented by AWS).
- RDS is not a serverless technology. It is simply an AWS-managed database.
- Patching & upgrading of RDS underlying OS and database software is controlled by AWS.
- However, Aurora Serverless is serverless.
- RDS provides two backup mechanisms:
- Automated backups: store daily snapshots for the retention period (1 to 35 days) in S3. When there is a need for disaster recovery, RDS picks the last snapshot and applies transaction logs until the point of failure;
- Daily backup happens in a daily maintainence window, and will continue if it does not finish when the maintainence window closes.
- Database snapshots: snapshots initiated by users manually, stored in S3 even after the RDS instance is deleted;
- No matter via automated backup or manual snapshot, the restored database will be on a new RDS instance with a new DNS endpoint.
- All RDS engines support encryption at rest, done using AWS KMS.
- Multi-AZ achieves data replication synchronously, while read replica achieves that asynchronously.
- Automated backups must be turned on in order to deploy a read replica (so that replication can be done via transaction logs).
- Read replicas can be deployed in different regions, and can be promoted to master when necessary.
Redshift
- AWS Redshift is a fast, fully-managed, PB-scale data warehouse service in the cloud.
- Data warehouse databases apply a different architecture from both database layer and infrastructure layer.
- Redshift can have 2 possible configurations:
- Single node mode, can store up to 160GB of data;
- Multi-node mode, consisting of 1 leader node and up to 128 compute nodes.
- Redshift uses advanced column-based compression techiques to save storage space. It automatically samples the data and chooses the most appropriate compression schema when data is loaded to an empty table.
- Redshift supports massively parallel programming (MPP), which automatically distributes data and query load across all nodes (similar to the concept of map-reduce).
- Redshift comes with backups enabled by default with a retention period of 1 - 35 days (by default 1 day).
- The snapshots can be additionally replicated to another region for DR.
- Redshift always attempts to store at least 3 copies of data (the original copy, the replica on compute nodes, and a backup in S3).
- The pricing model of Redshift is as follows:
- Compute node hours:
# of hours
* # of compute nodes
(leader node does not incur cost);
- Backup (similar to that for RDS);
- Data transfer (within the VPC).
- Redshift currently is only available within the same AZ, but its snapshots can be restored to another AZ.
Aurora
- Aurora is a cloud-based RDBMS, compatible with MySQL/PostgreSQL, 5 times faster than MySQL, 10 times cheaper than commercial databases.
- Aurora maintains 2 copies of the data, each stored in 3 AZs.
- The loss of up to 2 copies of the data will not affect write availability;
- The loss of up to 3 copies of the data will not affect read availablity.
- Aurora's storage layer is self-healing. Its engine continously scans the underlying blocks and repairs automatically.
- Aurora can have up to 15 Aurora replicas (allow automated failovers) and up to 5 MySQL replicas.
- Aurora always enable automated backups and it does not affect its performance.
ElasticCache
- ElasticCache is a web service that provide cloud-hosted, AWS-managed in-memory cache. It provides two caching engines, MemCached and Redis.
- Using ElasticCache and read replicas are two major methods to improve database performance on AWS.
- Redis supports multi-AZ and can do backups & restores.
NoSQL Databases
- DynamoDB is the NoSQL database invented by AWS.
- Data in DynamoDB are stored in SSD and spread across at least 3 geographically distinct data centers.
- It supports both document model and key-value pair model.
- DynamoDB could be configured to have 2 consistency models:
- Eventual consistent read (default): could reach best performance, consistency reached within 1 second;
- Strong consistent read: could degrade performance, returns result only after all writes succeed.