Google MediaBot Indexes Web Pages

Google’s Matt Cutts has confirmed that the AdSense Mediapartners bot, commonly known as “mediabot,” is indexing webpages for Google’s Big Daddy index, according to some well-known bloggers. What that means for webmasters: serving two versions of a webpage (one for each bot) can lead to duplicate content issues.

WebGuerilla’s Greg Boser learned this lesson the hard way when setting up 301 redirects for the Googlebot crawler. Boser says he neglected to redirect the mediabot as well. The end result: Google indexed and cached both the old and the new URLs with identical content.

“The interesting thing to note about this page,” writes Boser, “is that the post was originally made in January. And for quite sometime it had a cached page that was a representation of what Googlebot was given. But then the Mediapartner bot visited on April 7th. And the page it was served on that date ended up replacing the Googlebot version in the cache.”
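The gap Boser describes can be sketched in a few lines of Python. This is a hedged illustration only, not his actual setup: the `REDIRECTS` map, the paths, and the `respond` function are hypothetical names invented for the example. It shows how a 301 rule keyed to one crawler's user agent leaves the old URL serving content to every other crawler.

```python
# Hypothetical redirect map: old path -> new path (illustrative values only).
REDIRECTS = {"/old-post": "/new-post"}


def respond(path, user_agent, redirect_only_for=None):
    """Return a (status, location) pair for a request.

    redirect_only_for is an optional tuple of user-agent substrings.
    When it is given, only matching clients receive the 301; everyone
    else still gets the old URL with a 200.
    """
    target = REDIRECTS.get(path)
    if target is None:
        # No redirect configured for this path: serve it as-is.
        return 200, path

    if redirect_only_for is not None:
        agent = user_agent.lower()
        if not any(token.lower() in agent for token in redirect_only_for):
            # The old URL keeps serving content to this client.
            return 200, path

    return 301, target


# Redirect keyed to Googlebot only: the mediabot still sees the old URL.
partial = respond("/old-post", "Mediapartners-Google", redirect_only_for=("Googlebot",))

# Redirect applied regardless of user agent: both crawlers are sent onward.
uniform = respond("/old-post", "Mediapartners-Google")
```

Under these assumptions, `partial` is a 200 on the old path while `uniform` is a 301 to the new one, which is why a redirect that ignores the user agent is the safer default.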

The ShoeMoney blog relays that Cutts confirmed the news:

“Matt said it is a bandwidth saving feature to have GoogleBot and MediaBot both contributing to big daddy. Matt also stated that you will gain zero advantage in search listings however if you are serving different content to MediaBot then [sic] to Googlebot then you could be in trouble.”

Beyond duplicate content issues and past speculation that AdSense clients might enjoy favorable treatment if indexed by the mediabot, the full implications of this change are unclear.

“It will be interesting to see if other consequences arise for webmasters, such as excluding pages for googlebot via robots.txt that end up being indexed via the mediabot,” writes Jennifer Slegg.
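The scenario Slegg raises can be checked locally with Python's standard `urllib.robotparser`. The sketch below is an assumption-laden illustration: the robots.txt contents, the `example.com` URL, and the `allowed_agents` helper are all hypothetical, and the result reflects how Python's parser reads the file, not necessarily how Google's crawlers behave.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that names only Googlebot.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/
"""


def allowed_agents(robots_txt, url, agents):
    """Map each user agent to whether the rules let it fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


result = allowed_agents(
    ROBOTS_TXT,
    "https://example.com/private/page.html",
    ["Googlebot", "Mediapartners-Google"],
)
```

Here `result` marks the page as off-limits to `Googlebot` but fetchable by `Mediapartners-Google`, because the rule group names only the former. A webmaster who wants a page kept away from both crawlers would presumably need a rule group covering each user agent, or a `User-agent: *` group.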

Until now, however, Google has maintained that the mediabot and the Googlebot were entirely separate technologies with separate purposes, citing a desire to support freedom of expression without bias towards AdSense clients.

From the AdSense Help Center:

“Adding the Google AdSense ad code or AdSense for search code to your site will not queue your pages for crawling by our main index bots. While our bot (starting with ‘Mediapartners-Google’) does crawl content pages for the purpose of targeting ads, participation in AdSense does not increase the number of pages from a site in our main index.”
