I’ve been thinking about this idea for a while, and I thought I would put a name to it. I brought it up while giving my “Real Life Scaling” presentation at the Utah Open Source Conference in 2009. Here is the problem I think most people in web development face:
Hopefully, at some point your website gets a lot of traffic. Yay, you’ve reached your goal of getting good traffic, but it is soon followed by performance and load issues. I like to call these the growing pains of a website. So, as a web developer, you suddenly have the epiphany: “Hey, I need to scale my website!” What follows next is the biggest mistake a web developer can make:
You start reading articles on how Google scales, or how Facebook manages all of its traffic.
This is a mistake! To be brutally honest, you are not Google. You are not Facebook. You are not Twitter. You are a website that receives less than 0.000001% of the traffic that some major websites receive.
Why is this dangerous for web developers? Google, Twitter, Facebook, and others like them are solving complicated problems at a very large scale. I remember a presentation by a Twitter engineer who built a unique ID generator that can generate millions of IDs per second. The probability of you needing this type of solution is about the same as being struck by lightning. Applying these same practices at a much smaller scale is not realistic. If a locally owned grocery store wanted to open a second store, it would not adopt the practices Walmart uses to manage its 8,970 stores.
A Little Reality Check
I’m sure most of my readers know of StackExchange.com. It powers the popular website StackOverflow and several others. They have about two million visitors per day. That is a lot of traffic. StackOverflow is ranked #123 on Alexa. So you would imagine they need a very large infrastructure to serve all of this traffic, right?
Earlier this year, Stack Exchange wrote an article about their production environment. I was surprised by what they were using, in particular the number of production servers:
- 12 Web Servers (Windows Server 2008 R2)
- 2 Database Servers (Windows Server 2008 R2 and SQL Server 2008 R2)
- 2 Load Balancers (Ubuntu Server and HAProxy)
- 2 Caching Servers (Redis on CentOS)
- 1 Router / Firewall (Ubuntu Server)
- 3 DNS Servers (Bind on CentOS)
That is 22 servers for 2 million visits per day, serving 800 HTTP requests per second. Stack Exchange did clarify that they have other servers for management and failover, but these 22 servers handle their production load. And this is the 123rd most visited website in the world.
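To put those figures in perspective, here is a rough back-of-the-envelope calculation in Python. It assumes traffic is spread evenly over the day and across the 12 web servers, which real traffic never is, and the article figures above don’t say whether 800 requests per second is an average or a peak.

```python
# Back-of-the-envelope capacity math for the Stack Exchange figures above.
# Assumes load is spread evenly over the day and across the web servers.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

visits_per_day = 2_000_000
requests_per_second = 800
web_servers = 12

# Average visit rate over the whole day (real traffic peaks well above this).
visits_per_second = visits_per_day / SECONDS_PER_DAY

# Requests each web server handles if the load balancer splits load evenly.
requests_per_web_server = requests_per_second / web_servers

print(f"{visits_per_second:.1f} visits/s on average")
print(f"{requests_per_web_server:.1f} requests/s per web server")
```

That works out to roughly 23 visits per second on average and about 67 requests per second for each web server, which is well within what a single well-tuned server can handle.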
Honestly, most websites could be run on half a dozen servers if designed and configured correctly, including redundancy. Some really busy websites could run off a dozen servers. Unless you’re in the top 5,000 websites on the web, you really shouldn’t be worried about large-scale techniques.
So when your website starts to grow and you leave small scale behind, you’ll enter the phase of “Middle-Scale.”
What is Middle-Scale?
Middle-Scale is like being an awkward teenager:
You know that you can’t be the only one suffering through this, but you’re unsure how to proceed. It feels like you’re missing out on things everyone else must already know but isn’t talking about, as if everyone else were an awesome vampire or something.
But the reality is this: they don’t have some awesome secret! They are just normal teenagers.
This same idea applies to Middle-Scale websites.
At Middle-Scale, best practices are still the most important thing; the difference is that now you feel the consequences when you deviate from them. When you only had 100 users, a couple of nested queries and missing indexes didn’t cause much of a problem, because your database was powerful enough to hide the inefficiencies. When you get to 10,000 users, it no longer can.
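As an illustration of how nested queries bite, here is a small Python sketch using SQLite as a stand-in for MySQL. It assumes “nested queries” means issuing one query per row inside a loop (often called the N+1 query problem); the `users`/`profiles` tables and the function names are hypothetical.

```python
import sqlite3

def make_db(n_users):
    """Build a throwaway in-memory database with one profile per user."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE profiles (user_id INTEGER, bio TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)",
                     [(i, f"user{i}") for i in range(n_users)])
    conn.executemany("INSERT INTO profiles VALUES (?, ?)",
                     [(i, f"bio{i}") for i in range(n_users)])
    return conn

def load_with_nested_queries(conn):
    """One query for the user list, then one more query per user (N+1)."""
    queries = 1
    rows = []
    for user_id, name in conn.execute(
            "SELECT id, name FROM users ORDER BY id").fetchall():
        (bio,) = conn.execute(
            "SELECT bio FROM profiles WHERE user_id = ?", (user_id,)).fetchone()
        queries += 1
        rows.append((name, bio))
    return rows, queries

def load_with_join(conn):
    """The same data fetched with a single JOIN."""
    rows = conn.execute(
        "SELECT u.name, p.bio FROM users u "
        "JOIN profiles p ON p.user_id = u.id ORDER BY u.id").fetchall()
    return rows, 1

conn = make_db(100)
nested_rows, nested_count = load_with_nested_queries(conn)
joined_rows, joined_count = load_with_join(conn)
print(nested_count, joined_count)
```

Both functions return the same rows, but the looped version issues 101 queries for 100 users where the JOIN issues one. With no index on `profiles.user_id`, each of those extra queries also scans the whole `profiles` table, which is how the two problems compound as the user count grows.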
Middle-Scale is when simply separating your web server and database server isn’t enough. You’ll probably need to add some sort of cache like memcached. You’ll need to start tweaking your MySQL, Apache, and PHP configurations.
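Here is a minimal sketch of the cache-aside pattern that a cache like memcached is typically used for, written in Python with an in-process dictionary standing in for memcached. The `TTLCache` class, `get_or_load` helper, and `render_front_page` function are hypothetical illustrations, not the API of any memcached client.

```python
import time

class TTLCache:
    """In-process stand-in for memcached: key -> (value, expiry timestamp)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._data = {}

    def get(self, key):
        """Return the cached value, or None if missing or expired."""
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._data[key]  # expired: evict and report a miss
            return None
        return value

    def set(self, key, value):
        self._data[key] = (value, self.clock() + self.ttl)

def get_or_load(cache, key, loader):
    """Cache-aside: serve from the cache, or run the expensive loader and
    store its result. A loader result of None is never cached here."""
    value = cache.get(key)
    if value is None:
        value = loader()
        cache.set(key, value)
    return value

# Usage: the second call is served from the cache, so the loader runs once.
calls = []

def render_front_page():  # hypothetical expensive, database-backed page
    calls.append(1)
    return "<html>front page</html>"

cache = TTLCache(ttl_seconds=60)
get_or_load(cache, "front_page", render_front_page)
get_or_load(cache, "front_page", render_front_page)
print(len(calls))
```

The time-to-live trades freshness for load: within 60 seconds every request reuses the same rendered page, and after that the next request pays the cost once and refills the cache.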
Then, after you’ve ironed out your inefficiencies, you’ll start to use multiple servers. You’ll probably add a load balancer in front of multiple web servers. After that, you’ll probably set up some sort of master-slave replication for your database, for backups and failover.
You start to leave the “Middle-Scale” classification when you move to multiple data centers and start load balancing at the DNS layer. This is when you’ll start to have a dedicated sys-admin team.
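A load balancer’s simplest strategy is round robin, and master-slave replication usually means sending writes to the master and reads to the slaves. The Python sketch below shows both ideas under those assumptions; the server names and classes are hypothetical, and in practice a load balancer such as HAProxy handles the round robin for you.

```python
import itertools

class RoundRobin:
    """Hand out backends in turn, the simplest load-balancing strategy."""

    def __init__(self, backends):
        self._cycle = itertools.cycle(backends)

    def pick(self):
        return next(self._cycle)

# Statements that change data and therefore must go to the master.
WRITE_VERBS = {"INSERT", "UPDATE", "DELETE", "REPLACE"}

class ReplicatedDatabase:
    """Route writes to the master and spread reads across the slaves."""

    def __init__(self, master, slaves):
        self.master = master
        self.slaves = RoundRobin(slaves)

    def route(self, sql):
        verb = sql.split(None, 1)[0].upper()  # first keyword of the statement
        return self.master if verb in WRITE_VERBS else self.slaves.pick()

web = RoundRobin(["web1", "web2", "web3"])  # hypothetical server names
print([web.pick() for _ in range(4)])

db = ReplicatedDatabase("db-master", ["db-slave1", "db-slave2"])
print(db.route("UPDATE users SET name = 'x' WHERE id = 1"))
print(db.route("SELECT * FROM users WHERE id = 1"))
```

A real read/write split also has to cope with replication lag: a read sent to a slave right after a write may not see that write yet.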
Okay, I’m Middle-Scale! So what should I do? Where do I look?
First off, you must adhere to best practices. If you are working with PHP, research PHP performance and best practices. Do the same for each of your technologies, like Apache and MySQL. You will need to stop treating your application as one big app and start to understand all of its moving parts.
Second, you must understand your specific problems. Scaling isn’t a problem, nor is it a solution. It is a generic term for many different types of solutions. Without understanding why your website is running slowly, or why it cannot handle the load, you will not be able to create an effective solution.
So you don’t have a scaling problem. You have a MySQL performance issue, or an Apache problem, or a PHP problem. Most likely, it is something extremely specific: you have a high volume of MySQL write operations (i.e. UPDATE, INSERT, DELETE, REPLACE), or perhaps you are missing some indexes and have too many full table scans.
Third, Googling for help will only get you so far. You are entering a phase where it is harder and harder to find answers to your broad issues. Talking with experienced people who have gone through the Middle-Scale pains before will help immensely. I cannot recommend going to user groups highly enough. Being able to communicate with someone, whether face to face, on the phone, over IRC, etc., is invaluable. While I’ve learned a lot at conference and user group presentations, I’ve learned even more just by talking with the people attending and at the social gatherings.
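To see whether a query is doing a full table scan, ask the database for its query plan (in MySQL that is `EXPLAIN`, where a `type` of `ALL` indicates a full table scan). The Python sketch below uses SQLite’s `EXPLAIN QUERY PLAN` as a self-contained stand-in; the `messages` table and index name are hypothetical.

```python
import sqlite3

def query_plan(conn, sql, params=()):
    """Return the 'detail' text of each step in SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # detail is the last column

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (id INTEGER PRIMARY KEY, user_id INTEGER, body TEXT)")
sql = "SELECT body FROM messages WHERE user_id = ?"

before = query_plan(conn, sql, (42,))  # no index on user_id yet
conn.execute("CREATE INDEX idx_messages_user_id ON messages (user_id)")
after = query_plan(conn, sql, (42,))   # the planner can now use the index

print("before:", before)
print("after: ", after)
```

Before the index exists, the plan reports a `SCAN` of the whole table; after `CREATE INDEX`, it reports a `SEARCH` using `idx_messages_user_id`. Checking plans like this is a far more specific question to research than “scaling”.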
Profile & Performance will Naturally Lead to Scaling
When you want to scale, it can feel like a very daunting task. It seems like this big unknown complicated solution. What in the world am I going to do? I remember feeling these worries when I first started to investigate load balancing and sharding for some websites I was working on.
The thing is, if you start to profile your application, you will discover its inefficiencies. I remember spending a solid week, working 12-16 hours a day, profiling and optimizing Dating DNA’s database. I found a lot of bad queries, and I was able to cut our load times from 2-5 seconds to under 0.1 seconds. CPU utilization on the database server went from 80-90% to under 10%. It was incredible, and then I promptly took the entire next week off. When we migrated to new servers, I was able to move to a less powerful database server and still get the same great performance. So by profiling and optimizing our database, I didn’t need to worry about spinning up multiple master databases and sharding our data.
With Clipish, we faced almost the opposite scaling problem. The database was rarely an issue, but our web server CPUs were. We do a lot of ImageMagick image manipulation, and at high volumes on virtual servers this can be a big issue. So over the last year we’ve introduced some load balancing and CDN tools to help serve all 10 TB of bandwidth for Clipish.
The thing is, when you start to profile your application, you understand its slow areas better, so you have a much better idea of what to do. Even if you don’t know the solution yet, it is much easier to find one with a sound understanding of the problem. For example, searching Google for “scaling mysql” yields much less helpful results than searching for “mysql full table scans”.
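The post doesn’t say which tools were used for Dating DNA, so this is only a hypothetical illustration of the idea: time each kind of query, add up where the time goes, and fix the most expensive ones first. It is a minimal Python sketch that assumes you can wrap your query calls; for MySQL itself, the slow query log serves a similar purpose.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

class QueryProfiler:
    """Accumulate wall-clock time per query label, like a tiny slow-query log."""

    def __init__(self, clock=time.perf_counter):
        self.clock = clock
        self.total_seconds = defaultdict(float)
        self.calls = defaultdict(int)

    @contextmanager
    def timed(self, label):
        start = self.clock()
        try:
            yield
        finally:
            # Record the time even if the query raised an exception.
            self.total_seconds[label] += self.clock() - start
            self.calls[label] += 1

    def worst(self, n=5):
        """The n labels with the most total time, most expensive first."""
        return sorted(self.total_seconds.items(),
                      key=lambda item: item[1], reverse=True)[:n]

# Usage with hypothetical query labels:
profiler = QueryProfiler()
with profiler.timed("search_members"):
    time.sleep(0.01)  # stands in for a slow query
with profiler.timed("load_profile"):
    pass
for label, seconds in profiler.worst():
    print(f"{label}: {seconds:.3f}s over {profiler.calls[label]} call(s)")
```

Ranking by total time rather than per-call time matters: a fast query that runs thousands of times per page can cost more than one slow report query.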
So should I ignore what Facebook and Google do for scaling?
Of course not! First off, they do cool stuff. Just because I’ll watch NASA launch a space shuttle doesn’t mean I’ll try to build a rocket system for my broken lawn mower. But you have to put what they are doing into context. People from large websites have published several good “best practices” articles on techniques that help any website, especially on the client/browser side. Just use caution. I cringe when I hear someone say “we’re trying to use Cassandra to solve XYZ problem at work” when it is severe overkill.
Most of the time when I talk about performance and scaling with other people, it is when they are in “critical mode.” Their website is down, slow, unusable, etc., and they are looking to fix the problem. I will say, it is much more difficult to profile in “critical mode” than beforehand, because you are desperately focused on getting the site working again instead of understanding the problem.
I’ll be giving a presentation this Thursday at UPHPU on Profiling PHP Applications. I’ll post the slides, and most likely write some articles on the subject afterwards. As always, feel free to email me or leave a comment.
5 thoughts on “Working with Middle-Scale Websites”
Figuring out the right scaling techniques for your situation is important, and it is a question that often doesn’t get asked enough. As you point out, many people just say “I’ll use X because that is what Google|Facebook|Twitter uses”. But their scaling issues are unlikely to be exactly the same as yours.
Another related issue is high availability. Even if a site doesn’t need to scale from a purely usage point of view (a single box can handle the load), there are times when you employ scaling-style techniques to achieve higher availability (multiple web servers, DB replication, etc.). High availability and scaling techniques aren’t exactly the same, but there is some overlap.
It was a good article, but I struggled to get through it. You should proofread your posts before publishing them to the web. A little polishing and this would be fantastic.
I enjoyed reading this article. In the past, I too was looking at these Google-scale technologies and thinking about using them. But you are right, we don’t really need them; using middle-scale components makes a lot more sense.
Alright, first off, thankfully, I’m not in critical mode, and I think I have the PHP and queries working great. I use ocPortal as my framework, and if you do away with most of the stock “blocks” it runs super fast. I’m basically betting my future on what I’m doing and want to make sure I’m doing it right from the start. I’d rather not share my idea with the world on your website, but I really could use someone good to talk to about how best to go about what I’m about to do, hopefully on the phone or somewhere conversation happens more easily. My email should be in your logs for this post. I’m pretty blind, so finding your email address on this page is a little tricky with my screen reader.
For now though, I’ll ask this question here, as maybe your answer will help others in my boat…
Let’s say I’m going to set up a box with nginx to do my load balancing. How powerful does this box need to be? I’m planning to do this solely to reduce the risk of downtime when I upgrade servers and then have to wait for my DNS to propagate to the new IP. To be clear to others who might read this, these problems don’t really apply to people who run their own data centers, but I host with 1and1, and they only allow one server per contract. You can have as many contracts (and thus servers) as you wish, but to upgrade to a different server you have to put it on a completely different contract. If your domains are at GoDaddy, as mine are, you then have to drop your current DNS settings and re-import each domain into the new contract, which means at least 24 hours of downtime on a website that is doing so well it’s time to upgrade servers.
Alright I’m starting to confuse myself now…
The main question here is… Am I wrong in assuming that a load balancer doesn’t need to be all that powerful as far as cpu and ram are concerned since all it is doing is basically routing traffic?
Furthermore, am I also wrong for assuming that it might be okay to use a virtual private server (VPS) to do the load balancing since I’m again assuming that the bandwidth the load balancer would consume would not be all that much?
My concerns about power are these… The biggest VPS that 1and1 offers has at most 8 GB of RAM and 2-4 CPU cores. Right now I’d probably be fine with the $5 a month version, which has at most 1 GB of RAM (I’m doing 0 traffic and nginx only uses 100-200 MB of RAM), and unlike the dedicated boxes I mentioned above, the VPS plans offer “1 click upgrades” to higher tiers that let me keep all of my DNS and server configurations. Now I know what you might be thinking… ‘Dude, if you’re doing “0 traffic”, then why are you worried about load balancing?’ Good question. Like I said above, at this point I’m simply trying to future-proof my setup and make it possible to quickly throw another server into the mix if need be. For now, I’d be doing nothing more than “load balancing” to one server. I might grab one more $5 a month VPS just to test that everything works correctly and have it in place, synced, and ready for that one-click upgrade, but I’m not fooling myself into thinking I really need any of this yet. I’m just trying to do things right from the gate, you know, e.g. a load balancer plus one dedicated database server.
Just writing this is helping me clarify things in my own head, but I still have the question… Does a load balancer eat up bandwidth? The VPS plans are limited in this respect… For example, on the $5 a month version the first 500 GB of bandwidth is free, but then it’s 49 cents a GB after that. I’d also like to know how much traffic I’d have to be seeing to outgrow an 8 GB RAM, 2,500 GB traffic load balancer VPS.
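One way to reason about the bandwidth question: a reverse proxy such as nginx relays every response back to the client, so the data the balancer sends out is roughly the site’s total outbound traffic. The Python sketch below turns that into a monthly estimate and applies the pricing quoted above (500 GB free, then $0.49 per GB). The traffic inputs are hypothetical, and whether the host also counts traffic between the balancer and the backend servers is an assumption to check with the provider.

```python
def monthly_transfer_gb(requests_per_day, avg_response_kb, days=30):
    """Approximate data the balancer relays to clients in a month, in GB
    (1 GB = 1024 * 1024 KB). Request bodies and headers are ignored."""
    total_kb = requests_per_day * avg_response_kb * days
    return total_kb / (1024 * 1024)

def overage_cost(transfer_gb, free_gb=500, price_per_gb=0.49):
    """Charge for transfer beyond the free allowance quoted in the comment."""
    return max(0.0, transfer_gb - free_gb) * price_per_gb

# Hypothetical traffic: 100,000 requests/day averaging 100 KB per response.
gb = monthly_transfer_gb(requests_per_day=100_000, avg_response_kb=100)
print(f"{gb:.0f} GB/month, overage ${overage_cost(gb):.2f}")
```

Serving large static files such as images from a CDN rather than through the balancer is the usual way to keep this number down.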
While I’m at it, I should ask this as well… the VPS I set up as my db is tiny and still under a 30 day money back guarantee. I can always “1 click upgrade” that all the way to the 8gb of ram version for $60 a month if need be and then on to an actual box when the time comes, but with only 2gb of ram, at the current point, do you think I’ll run into problems even with low traffic?
I’m still a ways away from putting everything online, but any advice you have to offer will be greatly appreciated.
Thanks a million.