A Little Background
It was at the LT Pact conference in the summer of 2008 that I first drank the “Cloud Kool-Aid.” As you can see from my notes, I was pretty excited about the whole concept. Excited enough, in fact, that one of my clients signed up for Layered Tech’s grid layer (which is what they were demoing at the conference).
One interesting thing I heard at the LT Pact conference was a keynote given by an IT researcher from Gartner. He outlined the phases the industry would go through with “cloud computing”:
- First, several early adopters would have great success implementing the technology.
- Second, mainstream hype would “over-hype” cloud computing.
- Due to the hype, many people would start “using the cloud” as a silver bullet, thinking it would be perfect for everything.
- A “cloud crash” would occur as disillusioned IT managers realized the cloud had its drawbacks and wasn’t the perfect solution for every problem. Many of the over-hyped adopters would start to move away from the cloud.
- After the dust settled and the hype faded, the actual cloud technology would find its place in everyday IT.
Currently, I think different people and groups are at different stages of the last three phases listed above. Some are still raving and hyping the cloud for everything. Some are bitter about their disillusionment. Many are now looking beyond the hype at the real concepts behind “cloud computing” and finding that “the services are wildly different. While many parts of Web hosting are pretty standard, the definition of ‘cloud computing’ varies widely.” (InfoWorld)
After a few hours, the fog of hype starts to lift and it becomes apparent that the clouds are pretty much shared servers just as the Greek gods are filled with the same flaws as earthbound humans. Yes, these services let you pull more CPU cycles from thin air whenever demand appears, but they can’t solve the deepest problems that make it hard for applications to scale gracefully. Many of the real challenges lie at the architectural level, and simply pouring more server cycles on the fire won’t solve fundamental mistakes in design.
In a rebuttal blog post, Don MacAskill defended EC2 against accusations that it was over 50% slower than advertised:
… let me explain what I think is happening: Amazon’s done a poor job at setting user expectations around how much compute power an instance has. And, to be fair, this really isn’t their fault – both AMD and Intel have been having a hard time conveying that very concept for a few years now…
Bottom line? EC2 is right on the money. Ted’s 2.0GHz Pentium 4 performed the benchmark almost exactly as fast as the Small (aka 1.7GHz old Xeon) instance. My 866MHz Pentium 3 was significantly slower, and my modern Opteron was significantly faster.
So what about that guy with the Ruby benchmark? Can you see what I missed, now? See, he’s using a Core 2 Duo. The Core line of processors has completely revolutionized Intel’s performance envelope, and thus, the Core processors perform much better for each clock cycle than the older Pentium line of CPUs. This is akin to AMD, which long ago gave up the GHz race, instead choosing to focus on raw performance (or, more accurately, performance per watt).
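The clock-speed-versus-per-clock-performance point is easy to see for yourself. Here is a minimal sketch of that kind of benchmark in Python (the workload, function name, and iteration count are my own illustrative choices, not taken from any of the benchmarks quoted above):

```python
import time

def benchmark(iterations=2_000_000):
    """Time a fixed CPU-bound workload (a naive arithmetic loop).

    The wall-clock result depends on how much work the CPU does per
    clock cycle, not just its advertised GHz -- which is why a Core 2
    Duo can finish far sooner than a Pentium 4 at a similar clock.
    """
    start = time.perf_counter()
    total = 0
    for i in range(iterations):
        total += i * i
    elapsed = time.perf_counter() - start
    return elapsed, total

elapsed, _ = benchmark()
print(f"workload took {elapsed:.2f}s")
```

Run the same script on two machines with similar GHz ratings but different microarchitectures, and the elapsed times can differ dramatically.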
Virtualized Servers can be awesome. They have some great advantages, while also having clear disadvantages (if you look beyond the hype). While I know I’ve quoted a lot about Amazon EC2, I have only experimented with it. However, I have a good deal of experience with 3Tera’s AppLogic, and have recently moved a website to Rackspace Cloud.
While Rackspace Cloud was fairly straightforward, learning 3Tera’s AppLogic is a large task. I spent several months learning a lot (and tearing my hair out some days), and then another year learning to deal with its quirks. It is pretty cool once you get it down, but in all honesty its features didn’t line up with what the company using it needed. Every now and then we spin up a custom server for a website or something else, but AppLogic is built more for people running a large volume of similar instances. So while it works, it probably isn’t the best fit for us.
Also, AppLogic’s components are engineered for very specific tasks. Doing things outside of their scope *is* possible, but it took a vast amount of tinkering, reading, blood, sweat, and tears to get things to work. There are still things, like the way AppLogic handles network routing, that are unique at best.
Bare Metal to The Cloud
Every time I’ve deployed to the cloud, whether Rackspace Cloud, Amazon EC2, or AppLogic, the project itself started on a dedicated server that we then moved to a virtualized one. Ever since virtualized servers started to pop up, even in the earlier days of simple VPS solutions, the typical move has been from dedicated bare metal to a virtual server. The inherent problem is that virtual servers, especially on shared hardware, have *fewer* resources available to them, especially CPU power. On bare metal dedicated servers, I rarely run into CPU issues. When the CPU power is being shared with several other “virtual servers,” you suddenly have only a quarter of what was available.
Then you have your neighbors. While moving Clipish to the Rackspace Cloud, the little 1GB RAM server felt peppy, and I was confident in our choice. Clipish isn’t complicated, and its web services are pretty easygoing. The only intense part is when you clear the image cache: when a web request comes in and the image isn’t there, the server pulls the image from the database and performs the ImageMagick operations to resize, watermark, create a thumbnail, and so on, on demand. During the early morning, that isn’t a problem at all. However, one night I “cleared the cache” during peak times, and within minutes the server had locked up.
It turns out that on our old server, the web services could easily burst to all 8 cores for a few minutes while regenerating hundreds of cached images. Even the Rackspace server could burst and repopulate the cache during off-peak hours. However, during peak times, when the dozen or so “neighbors” were using their allotted CPU, our server couldn’t “burst” like before. So what did we have to do? We wrote a script that loops through our image library one image at a time and regenerates it. This works, but after the move, while we were experiencing these problems, we had to wonder: did we make the right move going to the cloud?
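A minimal sketch of that kind of throttled regeneration loop (the function names and pause length here are illustrative, not the actual script; `regenerate` stands in for the real database fetch plus ImageMagick work):

```python
import time

def regenerate(image_id):
    """Stand-in for the real work: pull the source image from the
    database, then run the ImageMagick resize/watermark/thumbnail
    steps and write the result back to the cache."""
    return f"cache/{image_id}.jpg"

def rebuild_cache(image_ids, pause=0.25):
    """Regenerate cached images one at a time, pausing between each,
    so the job never bursts across all cores and starves the live
    web requests on a shared host."""
    rebuilt = []
    for image_id in image_ids:
        rebuilt.append(regenerate(image_id))
        time.sleep(pause)  # throttle instead of fanning out workers
    return rebuilt

print(rebuild_cache([101, 102, 103], pause=0.01))
# ['cache/101.jpg', 'cache/102.jpg', 'cache/103.jpg']
```

The trade-off is obvious: the rebuild takes much longer, but the server stays responsive while it runs.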
Cloud to Bare Metal
I think the approach to the “cloud” needs to be this: you have your baby project, so throw it up on the cloud. Let it cost $15 to $20 a month. Take advantage of the daily backup snapshots and just worry about development. Let it grow, and at some point you’re going to hit a fork in the road:
- Bare Metal benefits outweigh Cloud benefits – Bare metal is typically cheaper than cloud power. If you are not spinning servers up and down on a daily basis, and your footprint is more fixed in size, then quickly scaling isn’t an issue. Cost can and should be a huge priority for hosting your application. If your application needs more CPU, memory, or bandwidth than fits the cloud model, then going bare metal is a good idea.
- Cloud benefits outweigh Bare Metal benefits – If, however, your application can fit on the cloud and in the budget at the same time, then it can be beneficial to stay on the cloud. Just remember you can’t expect a cloud server to outperform a bare metal server of the same specs. But you can take advantage of things like spinning up new servers in minutes, taking quick backups of virtual machines, and so on.
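To make the cost side of that fork concrete, here is a back-of-the-envelope comparison (every number below is a hypothetical round figure for illustration, not a quote from any provider):

```python
# Hypothetical monthly prices, for illustration only.
CLOUD_INSTANCE = 80     # one mid-size cloud instance
BARE_METAL_BOX = 150    # one dedicated box with several times the raw CPU

def monthly_cost(servers_needed, price_per_server):
    """Total monthly hosting cost for a fixed-size workload."""
    return servers_needed * price_per_server

# A steady workload that saturates one dedicated box might need
# several cloud instances to match its raw power:
print(monthly_cost(4, CLOUD_INSTANCE))   # cloud: 320
print(monthly_cost(1, BARE_METAL_BOX))   # bare metal: 150
```

The exact numbers will vary wildly by provider and year; the point is only that, for a fixed-size workload, matching bare metal’s raw power in the cloud can cost noticeably more.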
I guess my biggest advice when deciding between bare metal and the cloud is to understand the pros and cons. Remember that while you can burst to more CPU, you shouldn’t rely on that always being the case. The cloud does afford some cool flexibility, but it isn’t perfect, and it can be a lot more expensive than going bare metal: if you’re on bare metal and moving to the cloud, the cost of raw power is a great deal higher.