Friday, May 1, 2009

Google Dance

Google, the greatest search engine, maintains an index of all the pages that it crawls. This index is rebuilt roughly once a month, although the exact dates fluctuate and are not disclosed. The update process of this mega search engine is popularly known as the “Google Dance”. Google also happens to be one of the very few search engines that offer free submission, which is one reason it has over 3.4 billion pages in its database. The SEO industry seems to revolve around this giant engine, as nearly 77% of search results on the internet are powered by Google. It is therefore extremely important for a webmaster to understand the Google Dance in order to plan the optimization of a website.

Every dance starts with a deep crawl, which entails spidering the whole web from scratch and takes many days. Google uses around 15,000 servers spread across its data centers worldwide, most of them located in the US. Obviously, an index update cannot be carried out on all those servers at the same time; one server after another has to be updated with the new index. It would be possible for Google to receive all queries centrally and then distribute them to the data centers, but this would obviously be inefficient, so a query goes to whichever data center DNS resolves to at that moment. During the Google Dance, the data centers do not receive the new index at the same time; they receive it one after another. So a user who queries Google during the Google Dance may get results from a data center that still has the old index and, a few minutes later, results from a data center that already has the new index, because the DNS lookup now resolves to that data center.
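The effect of a staggered update can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: the function names are made up for illustration, each data center holds either the "old" or the "new" index, and DNS is modeled as picking a data center at random for every query, which is not necessarily how Google balances its traffic.

```python
import random


def rolling_update(num_datacenters, updated_count):
    """Index version held by each data center once `updated_count`
    of them have received the new index (hypothetical model)."""
    return ["new" if i < updated_count else "old"
            for i in range(num_datacenters)]


def query(datacenters, rng):
    """Model DNS as resolving each query to one data center at random
    and return the index version that data center serves."""
    return rng.choice(datacenters)


def observed_versions(datacenters, num_queries, seed=0):
    """Set of index versions a user sees over `num_queries` queries."""
    rng = random.Random(seed)
    return {query(datacenters, rng) for _ in range(num_queries)}


# Before, during, and after a dance across eight data centers.
before = observed_versions(rolling_update(8, 0), 200)
during = observed_versions(rolling_update(8, 4), 200)
after = observed_versions(rolling_update(8, 8), 200)
```

Before and after the dance the user sees a single index version; while only some of the data centers have been updated, repeated queries return a mix of old and new results.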

A Google Dance can be detected by querying the data centers directly, each of which has its own IP address. Generally, queries sent to the IP addresses of the data centers are redirected to www.google.com, but Google has special domains that resolve to the individual data centers. These domains and IP addresses are:

Domain             IP Address
www-ex.google.com  216.239.33.100
www-fi.google.com  216.239.41.100
www-ab.google.com  216.239.51.100
www-in.google.com  216.239.53.100
www-zu.google.com  216.239.55.100
www-va.google.com  216.239.37.100
www-dc.google.com  216.239.39.100
www-cw.google.com  216.239.57.100
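One way to use these domains is to send the same query to every data center and compare the answers. The sketch below assumes a hypothetical caller-supplied function `fetch_count(host, query)` that returns the number of results a given data center reports; how that number is obtained is deliberately left open, and the hostnames are simply those listed in the table above and may no longer resolve.

```python
# Data-center hostnames and IP addresses as listed in the table above.
DATA_CENTERS = {
    "www-ex.google.com": "216.239.33.100",
    "www-fi.google.com": "216.239.41.100",
    "www-ab.google.com": "216.239.51.100",
    "www-in.google.com": "216.239.53.100",
    "www-zu.google.com": "216.239.55.100",
    "www-va.google.com": "216.239.37.100",
    "www-dc.google.com": "216.239.39.100",
    "www-cw.google.com": "216.239.57.100",
}


def poll_data_centers(query, fetch_count):
    """Ask every data center for the result count of `query`.

    `fetch_count(host, query)` is a hypothetical function supplied by
    the caller; it is not part of any real library.
    """
    return {host: fetch_count(host, query) for host in DATA_CENTERS}


def dance_in_progress(counts):
    """Treat differing result counts as a sign that some data centers
    hold the old index and others the new one."""
    return len(set(counts.values())) > 1
```

If all data centers report the same count for a query, the index is probably stable; differing counts suggest that an update is being rolled out.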

