BigDaddy Means Big Changes at Google
Jan 25, 2006 11:19 AM, By Brian Quinton
One of the most popular forms of exercise among many search engine optimizers—both the third-party firms that do it for others and the advertisers who spiff up their own Web pages for better natural search rankings—is a periodic workout called “chasing the algorithm”. The race begins when Google or Yahoo! updates some portion of the software that determines how they look at Web pages and decide which are most relevant and valuable to a searcher. The engine makes that change; Web operators see their rankings rise or fall as a result; and they, or their outside search engine optimization (SEO) firm, scramble to get back the old rank by providing the new elements the search engine now needs. After a few months, the engines make another change, and it’s off to the races again.
Well, optimizers on Google are lacing up their running shoes for another race. Only this one promises to be more a marathon than the usual sprint. Google is testing a new data center infrastructure, a feat much bigger and more comprehensive than an algorithm change. Dubbed “Big Daddy” both in the search marketing blogs and forums and by the friendly folks at Google, this new data center, still in shakedown mode, will reportedly add new ground-level capabilities to the Google search function and drive those powers deep into all the algorithms with which Google searches, studies and indexes the Web.
First, a bit of big-picture talk. Google’s examination of the Web relies on a global network of data centers with different IP addresses. These decentralized servers speed the job of sending specialized Google services to users in different regions; they also share the workload of spidering the Web and comparing those discoveries to Web pages that are already in Google’s index.
The new BigDaddy data center contains new code for examining and sorting the Web and, once it has been fully tested, will become the default source for Web results, according to Google engineer Matt Cutts. In a January 6 post on his blog, Cutts said that might happen in early February or March of this year.
But what is BigDaddy intended to do? According to Rob Sullivan, head organic search strategist at search marketing firm Enquiro, “If an algorithm update is like putting new tires on a car or installing a new stereo system, this BigDaddy is like putting in a whole new motor. They’re totally revamping how Google works and resolving some long-standing issues with getting sites indexed properly.”
One of those issues is “canonicalization”. That’s a fancy Google term for instructing a search engine how to decide which of a series of related URLs is the proper one to insert into the Google index. Say your Web site has a number of different home page URLs, including “stuff.com”, “www.stuff.com”, “www.stuff.com/index.html” and “stuff.com/home.asp”. This can happen because Web servers are often set up to accept aliases for Web pages and to know that a request for “stuff.com” means someone’s looking for “www.stuff.com”. That’s a concession to users who get tired of getting error messages when they don’t type in “www”.
The problem is that while these URLs may pull up the same page content, they’re technically four different pages. That could skew the page count Google gets for the Web site: a site with 1,000 pages, each reachable at two different URLs, might look twice its real size to Google.
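As a rough illustration, collapsing aliases like the stuff.com variants can be sketched in Python. Google has not published its rules, so everything here is an assumption: the function name `canonicalize`, the hypothetical `DEFAULT_DOCUMENTS` set, and the choices to lowercase the host, treat the bare and www. domains as one site, drop query strings, and map default documents such as index.html and home.asp back to the directory URL.

```python
from urllib.parse import urlsplit

# Hypothetical: file names this server is assumed to serve as a directory's default page.
DEFAULT_DOCUMENTS = {"index.html", "index.htm", "home.asp", "default.asp"}


def canonicalize(url: str, preferred_host: str) -> str:
    """Collapse common aliases of a URL into one canonical form (illustrative rules only)."""
    # urlsplit only recognizes a host when a scheme is present, so add one if missing.
    if "://" not in url:
        url = "http://" + url
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()

    # Treat the bare domain and the www. domain as the same site.
    bare = preferred_host[4:] if preferred_host.startswith("www.") else preferred_host
    if host in (bare, "www." + bare):
        host = preferred_host

    # An empty path means the site root.
    path = parts.path or "/"

    # Map default documents such as /index.html back to their directory URL.
    directory, _, filename = path.rpartition("/")
    if filename.lower() in DEFAULT_DOCUMENTS:
        path = directory + "/"

    # Query strings and fragments are dropped to keep the sketch short.
    return f"http://{host}{path}"


variants = ["stuff.com", "www.stuff.com", "www.stuff.com/index.html", "stuff.com/home.asp"]
print({canonicalize(u, "www.stuff.com") for u in variants})  # {'http://www.stuff.com/'}
```

Counting distinct canonical URLs rather than raw URLs is what keeps a site whose pages each answer at two addresses from looking twice its real size.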
It’s also possible that those aliases could inadvertently contain different content or different incoming links. In that case the Google index, which looks at the value of the content and the quality of the links, could give those four pages different rankings.
Finally, a Google search that turns up multiple entries for what is essentially the same content makes the results page that much less valuable to users. Better to select one of the URLs as the most representative and make room for other results.
“If you want to go to the Seattle Seahawks page on the NFL Web site, you’ll get this long, horrendous URL,” Sullivan says. “But the site also has another URL that’s just ‘Seattle Seahawks’. It pulls the content from the first page and just displays it under a prettier URL. So Google wants to be able to say that second page is the one people really want, and they’ll attribute all the traffic, links and value to the shorter URL.”
BigDaddy is also intended to solve another long-standing Google problem: the abuse of temporary redirects, known as “302 redirects” after the HTTP status code they use. Nefarious Webmasters can “hijack” a page by setting up a URL of their own that redirects to it, so that the search engine lists the correct page’s content under the hijacker’s URL. The searcher sees what looks like the correct result, but when it is clicked, the listing can redirect the searcher to any page the hijacker wants, including adult content or false storefronts set up to capture personal information. If a Web site suffers enough hijackings, Google will consider all the pages contaminated and drop it from the index.
“302 redirects are a big hole in the system,” Sullivan says. “People are using 302 redirects to hijack content and pages and many other things. By fixing this, Google will be eliminating a lot of problems.”
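Whatever BigDaddy actually does, the shape of the problem can be shown with a purely hypothetical rule an indexer might apply when a redirect is found. The `Redirect` class, the `url_to_credit` function and the same-host heuristic are illustrative assumptions, not Google's method; the only established facts are the HTTP meanings of the codes (301 means moved permanently, 302 means a temporary redirect).

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass
class Redirect:
    source: str  # URL the crawler requested
    target: str  # URL named in the response's Location header
    status: int  # HTTP status: 301 (moved permanently) or 302 (found, i.e. temporary)


def url_to_credit(r: Redirect) -> str:
    """Hypothetical rule: which URL should be listed for the content found at r.target?"""
    if r.status == 301:
        # A permanent redirect says the content now lives at the target.
        return r.target
    if r.status == 302:
        same_host = urlsplit(r.source).hostname == urlsplit(r.target).hostname
        # A temporary redirect inside one site may keep the original URL, but a
        # cross-site 302 is credited to the target, so an outside URL cannot
        # claim another site's content in the listings.
        return r.source if same_host else r.target
    raise ValueError(f"not a redirect status handled here: {r.status}")


hijack = Redirect("http://hijacker.example/page", "http://www.victim.example/", 302)
print(url_to_credit(hijack))  # http://www.victim.example/
```

The trade-off in this sketch is that legitimate cross-site 302s (a tracking link, say) also lose credit to their target, which is the conservative outcome when the alternative is letting a stranger's URL stand in for your page.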
Of course, how BigDaddy will fix these issues is a closely held secret. As with many other questions surrounding the compiling and ranking of its index, Google refuses to be specific for fear that too much information will only teach the bad guys how to get around the system.
And there’s something else new about BigDaddy. Search optimizers often know where to find a Google test data center and usually try to check it to see how the pages they’re working on are being searched and indexed, but those IP addresses change often, sometimes within a single day.
But for BigDaddy, Google’s thrown open the doors. In early January, Cutts published a pair of IP addresses (66.249.93.104 and 64.233.179.104, for those who want a look) and actively called for feedback from Webmasters about problems and issues they perceived with the new system and its indexing.
Some of these changes will bring Google’s indexing technology up to par with its competitors; for example, Yahoo! and MSN have been handling 302 redirects for a year or more, although perhaps not as effectively as BigDaddy will eventually do. But other aspects of BigDaddy will help position Google to measure up to the search requirements of the future in some interesting ways, Sullivan says.
“This will lay the groundwork for more advanced algorithms, larger databases, and being able to index different types of content more effectively,” he says. For example, Google has also begun using a search crawler built on a Mozilla browser. The new search bot is more flexible, seems faster and can read non-text content more readily; that should mean that in time, it will be able to read links within images and even within Flash video, content that gets ignored by bots that can’t speak JavaScript.
“As Web technology develops and we get richer and more interactive Web sites, [the search engines] can’t just stick with indexing hyperlinks and text,” Sullivan says. “They’re going to have to do everything.”
Case Study: mysupersales.com & BigDaddy
- Fred Palmerino, President, Lancer Media
Summation
The changes that Google will employ in its algorithm through Big Daddy will not affect www.mysupersales.com at all. That's the most important point. Second, the factors that rank your site on the front page of Google, Yahoo and MSN have not changed. The only ways to place your site on the front pages of the search engines' organic listings are to:
- optimize source code
- optimize or add content: become the true authority for what you do
- provide sophisticated, high-quality links to your site
About Canonicalization
Big Daddy is a data center, and Google is leveraging it to "canonicalize" its index. That means that when several URLs point to your homepage, Google chooses the best one to index. Canonicalization also refers to instructing a search engine how to decide which of a series of related URLs is the proper one to insert into the Google index.
Here's an example. Most people would consider these URLs the same:
- www.mysupersales.com
- mysupersales.com/
- www.mysupersales.com/index.html
- mysupersales.com/home.asp
Technically, all of the above URLs are different. A web server could return different content for each of them, and this actually happens. When Google “canonicalizes” a URL, it tries to pick the URL that seems like the best representative from that set. Different links may also point to different URLs, so when the page is canonicalized, all of those links are consolidated and the one preferred URL, i.e., www.mysupersales.com, is credited with all of the link popularity. This is great news for www.mysupersales.com.
Example:
Before Canonicalization
- www.mysupersales.com -- has 5 links pointed to it
- mysupersales.com/ -- has 4 links pointed to it
- www.mysupersales.com/index.html -- has 3 links pointed to it
- mysupersales.com/home.asp -- has 2 links pointed to it
- Total 14 incoming links
After Canonicalization
- www.mysupersales.com -- has 14 links pointed to it
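The before-and-after arithmetic (5 + 4 + 3 + 2 = 14) can be sketched as a small Python function. The `CANONICAL` alias table is an assumption standing in for whatever process decides that all four variants are the same page; this illustrates the consolidation idea, not how Google actually computes link popularity.

```python
from collections import Counter
from typing import Dict

# Incoming-link counts from the "Before Canonicalization" example.
links_before = {
    "www.mysupersales.com": 5,
    "mysupersales.com/": 4,
    "www.mysupersales.com/index.html": 3,
    "mysupersales.com/home.asp": 2,
}

# Assumed alias table: every variant is taken to be the same home page.
CANONICAL = {url: "www.mysupersales.com" for url in links_before}


def consolidate(links: Dict[str, int], canonical: Dict[str, str]) -> Counter:
    """Credit each incoming link to the canonical form of the URL it points to."""
    totals: Counter = Counter()
    for url, count in links.items():
        # URLs missing from the alias table are kept as they are.
        totals[canonical.get(url, url)] += count
    return totals


print(consolidate(links_before, CANONICAL))  # Counter({'www.mysupersales.com': 14})
```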
How does Google know which URL you want? It seems that Google already treats all of these mysupersales.com URLs the same. If you type in any of the following URLs, you'll notice that the same PageRank (PR7) and the same number of links apply:
- www.mysupersales.com
- mysupersales.com
- www.mysupersales.com/
- mysupersales.com/
Webmasters should always choose one URL and use it consistently across the entire site. For example, don’t have half of your links go to http://mysupersales.com/ and the other half go to http://www.mysupersales.com. Instead, pick the URL you prefer and always use that format for your internal links.
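One way to audit this is a short script that scans a page's HTML for absolute internal links that use the non-preferred host. This is a minimal sketch using only Python's standard library; `PREFERRED_HOST`, `SITE_HOSTS`, `LinkCollector` and `inconsistent_links` are illustrative names, and relative links such as /about are assumed to be fine because they inherit whatever host the page was served from.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

PREFERRED_HOST = "www.mysupersales.com"
SITE_HOSTS = {"mysupersales.com", "www.mysupersales.com"}


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def inconsistent_links(html: str) -> list:
    """Return absolute links to this site that do not use the preferred host."""
    parser = LinkCollector()
    parser.feed(html)
    bad = []
    for href in parser.hrefs:
        # Relative links have no hostname and are skipped.
        host = urlsplit(href).hostname or ""
        if host in SITE_HOSTS and host != PREFERRED_HOST:
            bad.append(href)
    return bad


page = '<a href="http://mysupersales.com/">Home</a> <a href="/about">About</a>'
print(inconsistent_links(page))  # ['http://mysupersales.com/']
```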
Also, suppose you want your default URL to be http://www.mysupersales.com/, which it already is. You can configure your web server so that if someone requests http://mysupersales.com/, it does a 301 (permanent) redirect to http://www.mysupersales.com/. That helps Google know which URL you prefer as the canonical one. Adding a 301 redirect is a good idea if your site changes often (e.g., dynamic content, a blog, etc.).
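Many sites set up this redirect in their web server's configuration. As a self-contained illustration of the same behavior, here is a minimal Python WSGI sketch; the `redirect_bare_host` wrapper and the stand-in `site` app are hypothetical, and only requests for the bare domain are redirected, with the path and query string preserved.

```python
# Host names for the site; the bare domain is the one that gets redirected.
PREFERRED_HOST = "www.mysupersales.com"
BARE_HOST = "mysupersales.com"


def redirect_bare_host(app):
    """Wrap a WSGI app so requests for the bare domain get a 301 to the www host."""
    def wrapper(environ, start_response):
        # HTTP_HOST may include a port, e.g. "mysupersales.com:80".
        host = environ.get("HTTP_HOST", "").split(":")[0].lower()
        if host == BARE_HOST:
            path = environ.get("PATH_INFO") or "/"
            query = environ.get("QUERY_STRING", "")
            location = f"http://{PREFERRED_HOST}{path}" + (f"?{query}" if query else "")
            start_response("301 Moved Permanently",
                           [("Location", location), ("Content-Length", "0")])
            return [b""]
        return app(environ, start_response)
    return wrapper


def site(environ, start_response):
    """Stand-in for the real site: always returns a plain-text home page."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Home page"]


application = redirect_bare_host(site)
```

To try it locally, `application` can be passed to `wsgiref.simple_server.make_server` from the standard library. Using 301 rather than 302 matters here: it tells crawlers the move is permanent, so the www URL is the one to keep.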
A final benefit of canonicalization: it prevents the hijacking of one's site. Hijacking doesn't mean taking control of one's site. It means inheriting the properties of another’s site through illegitimate means; the things inherited are 1) links and 2) PageRank. No one will be able to hijack www.mysupersales.com in the future with this new process.
Call us today at 818.995.7861 or email us at info@LancerMedia.com for a free Website Optimization or Reputation Management analysis and discussion.

