Given how often you hear “cloud computing” mentioned at conferences and in the tech press, you might believe it’s just a tired buzzword that salesmen use to get boring people excited about buying enterprise services.
James Hamilton wants to convince you otherwise. He approaches technical innovation from a skeptical perspective, with a flair for hands-on empiricism. As a Vice President and Distinguished Engineer on the Amazon Web Services (AWS) team, he has been hard at work building the basic technology of cloud computing since before the first impeccably coiffed consultant uttered the phrase.
I’ve learned that it’s super-easy to do great work on a problem that’s just not that relevant. You’ve got to work on the problems that are the most important.
Hamilton says that he “loves cost models” for just this reason. High expenses are signals that indicate opportunities for innovation. This is what Amazon homed in on when it sought to upend the traditional data center business model. The company saw that enterprise services were earning 80% profit margins or greater, and had been doing so for years, using relatively stagnant hardware and organizational principles.
Using prices as guides, AWS has managed to provide a vastly improved service to an enlarged customer base while driving those margins down well below 5%.
What You Can Accomplish By Just ‘Not Believing’ In Consensus
Throughout his career, Hamilton has been struck by the irrational methods that people use to compose their mental models of how the world works. While those models are perhaps necessary for ordinary social function, they impede the empirical process of innovation.
Models teach us that people don’t look hard enough at things. What happens is people start to say something, and then you say something and then three of us say something, and after a while it becomes true. It’s not true.
It’s phenomenal how many things I’ve managed to innovate by just not believing things. Just checking to see if it’s true. It’s almost criminal.
It’s easier for most people to take what other people say as a representation of reality than it is to get into the field, look at how the world is actually put together, and build something new that solves real problems. It’s easier to roam with the herd than it is to be an iconoclast. Humans tend to favor inertia over change, even when the change represents a technological improvement.
To make this point more concrete, Hamilton explores a basic data center cost model. Conventional wisdom says that power is the number one cost driver in data centers. Now remember, we’re talking about big data centers: think 50,000 servers in a single build. Everyone is seemingly obsessed with optimizing power. But the basic cost model disagrees. When you dive into the model, power is not number one, not number two, but number three. And even in third position, it’s not a big number three.
How Reducing Administration Costs Enables Greater Scale in Computing
In breaking down the costs of running conventional data centers, AWS found that administrative costs were really what limited the scale of computing. If you need to manage 10,000 MySQL servers to accomplish your task, no company can hire and manage enough administrators to do it profitably.
According to Hamilton, people tend to make mistakes at least “20% of the time,” particularly in boring tasks like server administration. Automating many of those processes eliminates opportunities for clumsy humans to make costly errors. This means that computing tasks that could never have been accomplished using the conventional methods for building data centers can now be achieved economically.
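To get a feel for why that error rate matters at scale, here is a rough sketch in Python. It reads Hamilton's "20% of the time" as an independent per-task error probability, which is an assumption made for illustration and not something the talk specifies; the function name and the task counts are likewise hypothetical.

```python
# Assumption: each manual task fails independently with the same probability.
HUMAN_ERROR_RATE = 0.20  # Hamilton's "20% of the time", read as a per-task rate


def chance_of_any_error(error_rate: float, task_count: int) -> float:
    """Probability that at least one of `task_count` independent tasks goes wrong."""
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be between 0 and 1")
    if task_count < 0:
        raise ValueError("task_count must not be negative")
    # Chance that every task succeeds, subtracted from certainty.
    return 1.0 - (1.0 - error_rate) ** task_count


if __name__ == "__main__":
    for tasks in (1, 5, 10):  # hypothetical numbers of manual steps
        risk = chance_of_any_error(HUMAN_ERROR_RATE, tasks)
        print(f"{tasks:>2} manual tasks: {risk:.0%} chance of at least one mistake")
```

Under that reading, a job made of ten manual steps has roughly an 89% chance of containing at least one mistake, which is the kind of math that makes automation attractive long before you reach 10,000 servers.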
All those administrators who were running in-house data centers (and still are) don’t scale in the way that a well-engineered system can. No matter how adept they become at running their own systems, they’re still prone to error, and prone to ever-increasing management costs. People have a greater need for middle-managers than machines do.
It’s these kinds of cost centers in businesses that provide ample opportunities for innovation. Where are there too many people performing tedious tasks at an enormous cost to their employers? How much can you automate what they’re doing?
How Cloud Computing Smooths Equipment Usage
The central innovation that makes cloud computing actually valuable is that it eliminates a lot of slack capacity that would otherwise be present in servers. One example that Hamilton uses is that of a tax preparation software company. The tax prep people experience a spike in their computing needs once a year, followed by a low level of steady number crunching for the rest of the year.
The hypothetical tax preparer needs to purchase enough equipment to handle its solitary spike in requirements, and for most of the rest of the year it remains idle. This is still the standard non-cloud model. A lot of people were neglecting the waste in equipment, power, and administration. That lazy consensus created a multi-billion dollar opportunity for AWS to exploit.
"A way that’s productive to look at it is you’re wasting 87% [of your investment], because you still bought the servers, you still bought the network, you still bought the power distribution gear and didn’t use it. You still bought the mechanical gear but you’re not using it, and you still bought the shell that it all exists in. What this tells you is if I could run a workload that was worth more than the marginal cost of power, it’d be a win. Yes, I shouldn’t be shutting these servers off, I should be turning them all on. If any workload in your company is worth more than 13% turn it on."
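Hamilton's rule of thumb can be written down as a tiny decision function. The sketch below is illustrative only: the 13% and 87% figures come from the quote above, while the function names and the dollar amounts in the example are hypothetical.

```python
# From the quote: power is about 13% of the cost; the other 87% is already spent.
MARGINAL_POWER_FRACTION = 0.13


def sunk_fraction(marginal_fraction: float = MARGINAL_POWER_FRACTION) -> float:
    """Share of the investment that is spent whether or not the server runs."""
    return 1.0 - marginal_fraction


def should_run(workload_value: float, full_cost: float,
               marginal_fraction: float = MARGINAL_POWER_FRACTION) -> bool:
    """True when a workload is worth more than the marginal cost of powering
    an otherwise idle server (both amounts in the same currency unit)."""
    marginal_cost = full_cost * marginal_fraction
    return workload_value > marginal_cost


if __name__ == "__main__":
    full_cost = 100.0  # hypothetical fully loaded cost of a server-hour
    for value in (10.0, 20.0):  # hypothetical workload values
        verdict = "turn it on" if should_run(value, full_cost) else "leave it off"
        print(f"workload worth {value:.0f} vs full cost {full_cost:.0f}: {verdict}")
```

The point of the comparison is that the bar is the marginal cost, not the full cost: a job that would never justify buying a server can still be worth running on one that is already paid for.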
Idle hands may be the devil’s playthings, but according to Hamilton idle servers are even worse: They’re wasteful. Aggregating those processing needs through a service like AWS makes it possible to actually get far closer to full utilization out of all that equipment at all times.
Hunting for that sort of slack capacity in any industry is also an example of finding those “almost criminal” startup opportunities. In every industry, there’s a spreadsheet somewhere that lacks enough eyes on it, where losses are accumulating because the people running these companies don’t have the time or know-how to even realize that they’re missing something important.
The cost of supporting the efficiency of the aggregate workload is wildly better than any individual workload. Super cool. As a cloud provider, I win before I even start thinking.
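A toy calculation shows why the aggregate wins. The three customers and their monthly demand figures below are entirely made up for illustration; the only idea taken from the talk is that each company on its own must buy for its peak, while a shared provider buys for the peak of the sum.

```python
def peak_provisioned_utilization(demand):
    """Average utilization when capacity is sized to the largest value in `demand`."""
    peak = max(demand)
    return sum(demand) / (peak * len(demand))


# Hypothetical monthly demand in servers, January through December.
tax_prep = [10, 10, 10, 100, 10, 10, 10, 10, 10, 10, 10, 10]        # April spike
back_to_school = [10, 10, 10, 10, 10, 10, 10, 100, 10, 10, 10, 10]  # August spike
holiday_retail = [10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 100]  # December spike

customers = [tax_prep, back_to_school, holiday_retail]

# What each company must buy alone versus what a shared provider must buy.
separate_capacity = sum(max(demand) for demand in customers)
pooled_demand = [sum(month) for month in zip(*customers)]
pooled_capacity = max(pooled_demand)

if __name__ == "__main__":
    for demand in customers:
        print(f"stand-alone utilization: {peak_provisioned_utilization(demand):.1%}")
    print(f"pooled utilization:      {peak_provisioned_utilization(pooled_demand):.1%}")
    print(f"servers needed: {separate_capacity} separately, {pooled_capacity} pooled")
```

Because the made-up spikes fall in different months, the pooled fleet is far smaller than the three separate fleets combined and sits idle far less often. Real workloads overlap more than this, so treat the size of the gain as illustrative.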
The cloud turns what was originally a capital expense (all that machinery) into a variable expense that can be altered as business requirements change. The costs and management overhead that needed to go towards running servers can now all be someone else’s problem: Hamilton’s. Bringing all those costs down can now be AWS’s specialty, rather than being spread across all kinds of non-specialist companies that are never going to be capable of devoting as much focus to the problem as AWS is.
For the companies buying the service, the shift has several effects:
- Capital costs don’t block business formation anymore.
- There’s no longer any need to over-buy server capacity.
- It transfers capital expense (new server equipment) to variable expense (cloud fees).
- It allows companies to apply capital to business investments rather than infrastructure.
"When we started AWS, I figured we would be growing at about an Amazon.com in the year 2000 every month and if that was true, I thought it would be an impressive number. I was wrong; we’re doing that every day. That means just today (one 24 hour period) we roll enough gear into data centers to support Amazon.com in the year 2000. We’ll do it again tomorrow, and then we’ll do it the next day and the next day."
The Amazon cloud also allows for an entirely different kind of redundancy.
"Most companies approach redundancy by putting one data center in Phoenix and one data center in New York and if there’s a failure, then all the smart people get together when Phoenix is down, 'Should we fail over to New York? If we fail over to New York we lose data, that’s tough. But if we don’t fail over to New York, we lose availability.' All the smartest guys think hard and say, 'Is it going to be down for an hour or a week?' We guess and we guess wrong every time. We think it’s always going to be up in an hour and never is, so we lose availability."
Amazon solves this problem by creating what they call “Availability Zones.” Each availability zone is in fact a data center. The way they get redundancy and reliability is to run a workload in multiple data centers in the same region. So, as a company, you should choose your region to get close to your customers and once you make that decision, you replicate within the same region.
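The two-step decision described here (pick the region nearest your customers, then spread copies across zones inside it) can be sketched in a few lines. This is not an AWS API: the region names, zone names, and latency figures are hypothetical placeholders, and a real deployment would use the provider's own tooling.

```python
# Hypothetical catalog; names and latencies are invented for illustration.
REGIONS = {
    "region-east": {"latency_ms": 20, "zones": ["east-a", "east-b", "east-c"]},
    "region-west": {"latency_ms": 70, "zones": ["west-a", "west-b"]},
}


def choose_region(regions):
    """Pick the region with the lowest measured latency to your customers."""
    return min(regions, key=lambda name: regions[name]["latency_ms"])


def place_replicas(regions, replica_count):
    """Choose one region, then spread replicas round-robin across its zones."""
    region = choose_region(regions)
    zones = regions[region]["zones"]
    placement = [zones[i % len(zones)] for i in range(replica_count)]
    return region, placement


if __name__ == "__main__":
    region, placement = place_replicas(REGIONS, replica_count=3)
    print(f"region: {region}")
    for index, zone in enumerate(placement):
        print(f"replica {index} -> {zone}")
```

Keeping every replica in one region is the design choice that matters: the zones are close enough to each other that the fail-over debate from the Phoenix and New York example never has to happen.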
There Are Still Similar Opportunities For AWS-Style Innovation
During Hamilton’s presentation at the First Round CTO Summit in October 2012, he teed up some of the other industries he sees as bound to 1980s and 1990s enterprise orthodoxy.
There are tons of startup opportunities that are circling around networking right now. It’s hard for the big players to fundamentally change the way they conduct their business and still maintain 80% margins.
The issues facing the people whose job it is to stare at those spreadsheets are quite different from the ones that technologists are capable of solving. Accountants are unlikely to develop solutions to hard technological problems, but technologists are unlikely to hunt for the problems that the numbers are screaming to have solved. Bean-counters are rarely innovative, but the beans they count can point entrepreneurs toward advantages waiting to be discovered.
Hamilton considers software-defined networking to be a promising field for this reason.
Another area that he sees as relatively stagnant is data storage. Hard drives are taking the former place of tape drives in technology infrastructure, and are increasingly being replaced by flash drives, which are becoming more reliable and cheaper over time.
Finding areas where customers are paying too much for a service and then tinkering relentlessly with ways to reduce those costs seems to have worked for AWS. The opportunity to build a billion-dollar business is likely lurking in the under-scrutinized cells of a spreadsheet somewhere. The typical human response to an ordinary practice, like handing over egregious sums to a vendor that has been earning massive profits with ossified methods for close to a generation, is to accept it. By simply refusing to believe that what has been so will always be so, you can find paths to opportunity.