I recently published a benchmark of EC2 vs RDS MySQL performance. This article is a follow-up. Specifically, I’m comparing EC2 and RDS for the situation where one EC2 instance serves as a combined app server and database server, and it is time to upgrade because of load.
When a single EC2 server hosting both the app and the db is getting too much load, here are some options for scaling up:
- Scale vertically (upgrade to the next larger instance). This is very simple, and it can sometimes be done in place if you can handle the downtime.
- Split the application server and the database server into two EC2 instances.
Amazon RDS introduces a third option:
- Migrate the database to RDS.
With options 2 and 3, the 2-tier approach introduces network lag when querying the DB, though hopefully this is negligible. Adding an extra server without a high availability configuration reduces uptime, since there is now an additional single point of failure. On the other hand, multiple specialized servers pave the way for greater scaling down the road and provide flexibility.
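To judge whether the network lag really is negligible, a rough estimate is to multiply the number of queries a request makes by the round-trip time to the database. The sketch below is only an illustration: the function name and the sample numbers are hypothetical, not measurements, and it assumes queries run one after another rather than in parallel.

```python
def added_latency_ms(queries_per_request: int, round_trip_ms: float) -> float:
    """Estimate the extra latency per request once the DB lives on another host.

    Assumes the queries run sequentially, so each one pays a full round trip.
    """
    if queries_per_request < 0 or round_trip_ms < 0:
        raise ValueError("inputs must be non-negative")
    return queries_per_request * round_trip_ms


if __name__ == "__main__":
    # Hypothetical inputs: 40 queries per page view, 0.5 ms round trip.
    print(added_latency_ms(40, 0.5))
```

Measure the real round-trip time between your own instances before relying on an estimate like this. Chatty applications that issue many small queries per request feel the split the most.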
The hosting cost is nearly equivalent:
With AWS, the cost of scaling vertically, adding an extra EC2 server, or adding an RDS instance is roughly the same (which makes sense because the cost boils down to RAM/CPU resources).
So what it really comes down to is the cost of your time to set up and maintain all this.
Option #1 is the simplest to implement: plop in more resources via the control panel. There is a limit to how far vertical scaling can be pushed. For systems that have very high load, or anticipate very high load in 1-2 years, this probably isn’t appropriate. I still like this option for teams that don’t have a dedicated sysadmin or devops position.
Adding another EC2 instance is the most complex route since you are on your own in terms of configuration, management, monitoring and patching. Then again, you have total freedom to configure it however you want.
That leaves RDS in the middle, perhaps the sweet spot, but as always it depends... RDS comes with automatic backups and maintenance out of the box. RDS also showed a 15% performance boost over an EC2 MySQL instance, as I found in my benchmark article. There are a few drawbacks to RDS if you are doing very complex operations that require SUPER privileges or you have high security needs. Don’t use RDS if your data must be encrypted at rest, since RDS is effectively shared database hosting. I would not store PHI (protected health information), credit cards, or social security numbers in RDS at this point.
What about redundancy and backups? It is not IF but WHEN a hardware failure will occur.
Running mysqldump on a regular schedule, encrypting the dump, and pushing it to a geographically distant server (or S3) is essential. I prefer to copy my encrypted dumps completely outside of AWS when feasible.
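A minimal sketch of that dump, encrypt, and upload routine, assuming `mysqldump`, `gpg`, and the AWS command line tool are installed, that MySQL credentials come from an option file such as `~/.my.cnf`, and that the recipient's public key is already in the GPG keyring. The function names, the `/tmp` paths, and the bucket layout are hypothetical choices for illustration.

```python
import os
import subprocess
from datetime import datetime, timezone


def backup_paths(database, stamp):
    """Return the (plaintext dump, encrypted dump) paths for one backup run."""
    dump_path = f"/tmp/{database}-{stamp}.sql"
    return dump_path, dump_path + ".gpg"


def build_backup_commands(database, recipient, bucket, stamp):
    """Return the three commands in order: dump, encrypt, upload."""
    dump_path, encrypted_path = backup_paths(database, stamp)
    dump_cmd = [
        "mysqldump",
        "--single-transaction",  # consistent snapshot for InnoDB tables
        "--result-file=" + dump_path,
        database,
    ]
    encrypt_cmd = [
        "gpg", "--batch", "--yes",
        "--recipient", recipient,
        "--output", encrypted_path,
        "--encrypt", dump_path,
    ]
    upload_cmd = [
        "aws", "s3", "cp", encrypted_path,
        f"s3://{bucket}/{database}/{stamp}.sql.gpg",
    ]
    return [dump_cmd, encrypt_cmd, upload_cmd]


def run_backup(database, recipient, bucket):
    """Run the whole pipeline once; raises if any step fails."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    try:
        for cmd in build_backup_commands(database, recipient, bucket, stamp):
            subprocess.run(cmd, check=True)  # stop at the first failing step
    finally:
        # Never leave the plaintext or encrypted dump lying around on disk.
        for path in backup_paths(database, stamp):
            if os.path.exists(path):
                os.remove(path)
```

Something like `run_backup("shop", "ops@example.com", "my-backups")` could then be called from cron. To copy the dump outside of AWS, the upload command would be swapped for whatever transfer tool reaches the other host.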
However, even if mysqldump is run every hour, it might not be good enough for the business case at hand. What if the goal is to have as little data loss as possible given a failure? In my ideal world, the data is replicated to geographically distant data centers as changes happen.
MySQL replication can be used for this, but doing it over a WAN might not be the best idea since the connection cannot be guaranteed. MySQL replication adds load to the server, eats some bandwidth, and requires a slave to be set up. Enabling replication may require application code changes to ensure all SQL is replication safe. For example, INSERTs that use UUID() are not safe with statement-based replication, because the function returns a different value on the slave. MySQL replication needs to be monitored to ensure the data stays in sync. Plans need to be made to handle situations where drift occurs between the master and slave, and when the master fails. At this point, you really need a DBA to handle this.
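As a rough illustration of the monitoring part, the sketch below inspects one row of `SHOW SLAVE STATUS` output, passed in as a dict so it does not depend on any particular MySQL driver. The column names are the ones MySQL reports; the function name and the 60 second threshold are hypothetical.

```python
def replication_problems(status, max_lag_seconds=60):
    """Return a list of human-readable problems found in one status row.

    `status` is a dict holding one row of SHOW SLAVE STATUS output.
    An empty list means both threads are running and lag is acceptable.
    """
    problems = []
    if status.get("Slave_IO_Running") != "Yes":
        problems.append("I/O thread is not running")
    if status.get("Slave_SQL_Running") != "Yes":
        problems.append("SQL thread is not running")

    lag = status.get("Seconds_Behind_Master")
    if lag is None:
        # MySQL reports NULL here when the slave cannot measure its lag.
        problems.append("lag is unknown (Seconds_Behind_Master is NULL)")
    elif int(lag) > max_lag_seconds:
        problems.append(f"slave is {int(lag)}s behind the master")
    return problems
```

A check like this only shows that replication is running and roughly caught up. It does not detect drift in the data itself, which needs a separate comparison of the tables on both servers.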
Yet again, RDS offers a third option, called Multi-AZ:
Multi-AZ stands for multiple availability zones. RDS with Multi-AZ syncs to a slave server in an adjacent availability zone. Multi-AZ costs 2x and handles failover for you. Availability zones are basically next door, not geographically distant. Replication is file system based and does not use MySQL replication.
As of May 2013, I wish RDS had Multi-Region support! That would be sweet, but it would no doubt clog the pipes between Amazon’s regions because all the data redundancy geeks like me would jump on it.
Multi-AZ is probably worth it for most businesses, even if there is only a 90% chance it will actually work correctly. The July 2012 outage brought up a lot of complaints from a few noisy customers.
Amazon’s official response [http://aws.amazon.com/message/67457/] said that only a fraction of users experienced the failure, and most users failed over without a problem.
If you do go with Multi-AZ, make sure to understand the way Multi-AZ fails over:
The failover relies on DNS-based host remapping and a short TTL (3 minutes). There will be downtime in this situation. It is also possible that different servers in your cluster could be talking to different databases for a short time during the failover. Assuming the master is completely down, that might not be a problem, but who knows? The uncertainty around this sort of question is why database failovers are normally a human-initiated action, not a decision made by a computer. As always it depends…
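Because the failover is a DNS change, the application has to look up the database hostname again rather than hold on to an old address. The sketch below shows one way to retry a connection with a fresh lookup before each attempt. It is an assumption-laden illustration: `connect` is a hypothetical callable supplied by the caller (for example a thin wrapper around your MySQL driver) that is expected to raise `OSError` or a subclass on failure, and the retry counts are arbitrary.

```python
import socket
import time


def connect_with_failover(host, port, connect, resolve=socket.gethostbyname,
                          attempts=10, delay_seconds=5.0, sleep=time.sleep):
    """Retry a connection, resolving the hostname again before each attempt.

    `connect(ip, port)` is a hypothetical caller-supplied function that
    returns a connection or raises OSError (or a subclass) on failure.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            ip = resolve(host)  # fresh lookup so a remapped endpoint is picked up
            return connect(ip, port)
        except OSError as exc:  # includes DNS errors and refused connections
            last_error = exc
            if attempt < attempts - 1:
                sleep(delay_seconds)
    raise ConnectionError(
        f"could not reach {host} after {attempts} attempts"
    ) from last_error
```

Note that the operating system or a local resolver may still cache the old address until the TTL expires, so a loop like this shortens the outage seen by the app but cannot remove it. Long-lived connection pools also need to drop and reopen their connections for the new address to take effect.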