March 3, 2005 4:00 AM PST
Google's secret of success? Dealing with failure
On Wednesday, Urs Hoelzle, a vice president of engineering and of operations at the search giant, shed some light on how Google's data centers operate. Many people consider the company's operations expertise more valuable than the actual search algorithms that launched the enterprise.
Hoelzle spoke at EclipseCon, a conference for application programmers that runs through Thursday.
According to Hoelzle, Google has inexpensively built out its computing infrastructure by using thousands of "commodity" servers instead of fewer high-end, high-priced machines. The trick is to make these racks of hardware operate in tandem and to ensure that the failure of one machine does not derail an operation, such as returning a search query or serving up an ad.
Consider a home PC, Hoelzle said. Optimistically, a consumer PC might crash once in three years from a software glitch or hardware problem.
"At Google scale...if you have thousands of PCs, you can expect one (failure) a day," he said. "So you better deal with that in an automated way, or you will have service outages."
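Hoelzle's arithmetic can be checked with a rough sketch. It assumes each machine fails independently and, on average, once every three years, which is the optimistic home-PC figure he cites. The function names and the fleet sizes are illustrative only and do not come from Google.

```python
# Back-of-the-envelope failure arithmetic for a fleet of commodity PCs.
# Assumption: each machine fails independently, on average once every
# three years (the optimistic home-PC figure quoted in the article).

DAYS_PER_YEAR = 365
YEARS_BETWEEN_FAILURES = 3  # per machine, from the article


def expected_failures_per_day(machines: int) -> float:
    """Expected number of machine failures per day across the whole fleet."""
    per_machine_daily_rate = 1 / (YEARS_BETWEEN_FAILURES * DAYS_PER_YEAR)
    return machines * per_machine_daily_rate


def machines_for_one_failure_per_day() -> int:
    """Fleet size at which the expected rate reaches one failure per day."""
    return YEARS_BETWEEN_FAILURES * DAYS_PER_YEAR


if __name__ == "__main__":
    # Hypothetical fleet sizes, chosen only to show how the rate scales.
    for fleet_size in (1, 1_000, 10_000):
        rate = expected_failures_per_day(fleet_size)
        print(f"{fleet_size:>6} machines: {rate:.2f} failures/day")
```

Under these assumptions, a fleet of roughly a thousand machines already sees about one failure a day, which matches the figure in the quote.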
Google, known for its rigorous hiring practices aimed at attracting the brightest minds in computer science, has created a number of software tools to handle its computing installation.
The company wrote its own file system, called Google File System, which is optimized for handling large, 64-megabyte blocks of data. Significantly, the file system was designed to assume that a failure, such as a failed disk or unplugged network cable, can happen at any time.
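As a rough illustration of that design, the sketch below splits a file into 64-megabyte blocks, keeps several copies of each block on different machines, and reads from whichever copy is still reachable. It is not Google's code: the round-robin placement policy, the function names, and the machine names are all hypothetical, and the number of copies is left as a parameter (the article reports three).

```python
BLOCK_SIZE = 64 * 1024 * 1024  # 64 megabytes, the block size cited in the article


def block_count(file_size: int) -> int:
    """Number of 64 MB blocks needed to hold file_size bytes (rounded up)."""
    return (file_size + BLOCK_SIZE - 1) // BLOCK_SIZE


def place_replicas(block_index: int, machines: list[str], copies: int) -> list[str]:
    """Pick `copies` distinct machines for a block.

    Round-robin placement is a hypothetical policy used only for illustration.
    """
    if copies > len(machines):
        raise ValueError("need at least as many machines as copies")
    start = block_index % len(machines)
    return [machines[(start + i) % len(machines)] for i in range(copies)]


def read_block(replicas: list[str], failed: set[str]) -> str:
    """Return the first machine holding the block that is still up.

    The read fails only if every copy is on a failed machine.
    """
    for machine in replicas:
        if machine not in failed:
            return machine
    raise RuntimeError("all replicas unavailable")


if __name__ == "__main__":
    machines = ["m0", "m1", "m2", "m3"]  # hypothetical machine names
    replicas = place_replicas(0, machines, copies=3)
    # One machine failing does not stop the read from succeeding.
    print(read_block(replicas, failed={"m0"}))
```

The point of the sketch is that failure is treated as a normal input to every read rather than as an exception to be handled by an operator.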
Data is replicated in three places, and there is a "master" machine that


It seems that between suddenly dropped sites and too much emphasis on link analysis, it is becoming more of an effort to get good results for some searches.
- Not all that new news...
March 7, 2005 2:16 AM PST
One of their engineers was talking about this four months ago. The video is here: http://www.uwtv.org/programs/displayevent.asp?rid=2459