As a follow-up to my last post about PHP memory consumption, I wanted to get some ideas out there about memory utilization. This post explores:
- An equation for the maximum number of users an application can support on a given server.
- What can happen when the maximum number of users is exceeded.
- How memory consumption impacts the cost of scaling to huge numbers of users.
Rough equation for simultaneous user limit based on memory:
(Memory Available / Memory Needed Per Request) = Memory Based Simultaneous User Limit
Memory Available != Total Memory. The operating system and other processes consume memory too. Even though a server might have 2GB of RAM, maybe only 60% of that is available to the application when everything else is idle. The same idea applies to the JVM, where classes and singletons occupy memory permanently, leaving only a portion of the memory allocated to the JVM available for processing incoming requests.
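To make the equation concrete, here is a minimal sketch in PHP using made-up numbers (a 2GB box with roughly 60% of its RAM actually available, and 32MB needed per request):

```php
<?php
// Back-of-the-envelope capacity estimate -- the numbers are illustrative only.
$totalMemoryMb      = 2048;  // what the server has on paper
$availableFraction  = 0.60;  // roughly what's left after the OS and other processes
$memoryPerRequestMb = 32;    // average peak memory needed per request

$availableMemoryMb = $totalMemoryMb * $availableFraction;
$userLimit = floor($availableMemoryMb / $memoryPerRequestMb);

echo "Memory-based simultaneous user limit: {$userLimit}\n"; // 38 with these numbers
```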
What happens when the limit is reached?
Keep in mind this means simultaneous users all hitting the site within, say, 20ms of each other.
- The worst case is that free memory will be exhausted and the operating system will start swapping to disk. Performance will degrade, every user will have to wait for the server to catch up, and eventually the server will crash completely. All users will be impacted negatively.
- A better setup is to configure the application with a maximum user limit. When the limit is reached, new visitors get a ‘page not available’ warning of some kind (see the sketch after this list for one way to set such a cap). This is better than crashing, and you can weather the rare surge of traffic that exceeds the limit.
- The best case is to anticipate this limit, monitor the system constantly, and always stay comfortably ahead of maximum capacity. This is what you pay your top-notch systems administrators to keep tabs on.
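As one hedged example of putting a cap in place: if the application happens to run under PHP-FPM, the pool’s pm.max_children setting can be derived from the same memory math, so the server turns away excess requests instead of swapping. The values below are purely illustrative:

```ini
; PHP-FPM pool config (illustrative values only)
; ~1228MB available / 32MB per request ≈ 38 workers
pm = static
pm.max_children = 38
; recycle workers periodically in case of slow memory leaks
pm.max_requests = 500
```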
This is not a purely technical decision. Some business models do not tolerate downtime. Other business leaders love taking risks and are not satisfied until something breaks. The business leaders on the team should be made aware of the issue and decide which approach they want.
The cost of scaling:
If your site is low-traffic (less than 50 simultaneous users), then memory consumption probably doesn’t matter much. Using a framework and leveraging plug-ins saves a lot of development time. Down the road it would be possible to turn on caching or employ other techniques to manage additional traffic.
However, if the site needs to scale to millions of users, memory utilization should be at the top of your list. Scaling cheaply is in part about using memory effectively. At a certain point it becomes worth it to refactor to lightweight frameworks, which might take more up-front development time, but the site will perform better.
As a very simple example, let’s consider three sites, A, B, and C, that use 4MB, 8MB, and 32MB on average per page request. Based on the findings in my previous post, it might be that site A uses FatFree, site B uses Symfony, and site C uses Drupal.
The AWS cost calculator reports that one large EC2 instance costs about $250/month. Site A has a huge competitive advantage over B and C when it comes to scaling. As an example, let’s say site A is reasonably successful and needs 10 servers to meet demand. An equivalent amount of traffic would require 20 servers for site B and 80 servers for site C. In this example, the monthly server costs for A, B, and C are $2,500, $5,000, and $20,000 respectively. The difference between A and C is $17,500 per month. For a small business, that’s a fair amount of money. Now imagine the traffic continues to grow, say ten times over; the difference between A and C is now $175,000 per month.
Did C paint themselves into a corner? Perhaps not. I don’t think comparing FatFree to Drupal is really fair since they are different animals built for different purposes, and time to market is extremely important. Not to mention, how many sites really get 10,000 simultaneous users? It is important to be aware of the difference and use each tool for its best purpose.
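For anyone who wants to check the arithmetic, here is the same back-of-the-envelope math as a quick PHP sketch. It assumes $250/month per server and that server count scales linearly with per-request memory, which is the simplification this whole example rests on:

```php
<?php
// Simplified scaling-cost comparison -- assumes servers are sized purely by memory.
$costPerServerPerMonth = 250;                        // rough large EC2 instance price
$memoryPerRequestMb    = ['A' => 4, 'B' => 8, 'C' => 32];
$serversForSiteA       = 10;                         // site A's assumed need

foreach ($memoryPerRequestMb as $site => $mb) {
    // server count scales with per-request memory relative to site A
    $servers = $serversForSiteA * ($mb / $memoryPerRequestMb['A']);
    $monthly = $servers * $costPerServerPerMonth;
    echo "Site {$site}: {$servers} servers, \${$monthly}/month\n";
}
// Site A: 10 servers, $2500/month
// Site B: 20 servers, $5000/month
// Site C: 80 servers, $20000/month
```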
Memory is just one variable:
Focusing just on memory oversimplifies the picture. The real user limit could be lower depending on CPU, disk I/O, or other factors (like a user triggering a huge report!). Request processing time is also important to pay attention to: response time and bounce rate are positively correlated, so the longer a page takes to load, the more likely users are to leave the site. I have worked on applications where the response time had to be below one second, which influences the architecture, to say the least. Disk I/O is another factor, especially in the cloud. The only real way to get to the bottom of all this is to run load tests and analyze the data.
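A simple way to start gathering that data is something like ApacheBench; the URL and numbers below are placeholders, but even a crude run like this surfaces response times under concurrent load:

```
# 1,000 total requests, 50 at a time (placeholder URL)
ab -n 1000 -c 50 http://www.example.com/
```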