Google Discussion Forum Google Discussion forum is for topics specifically related to Google. There is a subforum dedicated to AdSense/AdWords subjects.

  #1 (permalink)  
Old 07-12-2004, 09:23 PM
WebProWorld 1,000+ Club
 
Join Date: Dec 2003
Location: Toronto, Ontario, Canada
Posts: 2,181
ADAM Web Design RepRank 1
Google vs. MSN/Yahoo crawler and indexing observations

As many of you may or may not know, I relaunched my company site a little more than 2 weeks ago. Since then I've added two articles to it, and here's what I've measured:

Average Google crawler time (from the time I uploaded a change): approx. 27 hours.
Fastest Google crawler time: approx. 21 hours, 45 minutes (discovered earlier today).
Slowest Google crawler time: approx. 31 hours.
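For anyone who wants to take the same kind of measurement, here is a minimal sketch of how the average, fastest and slowest figures can be computed from pairs of (upload time, first crawl time). The timestamps below are made up purely for illustration (they are not the poster's actual log data), and the function names are hypothetical.

```python
from datetime import datetime
from statistics import mean


def crawl_latencies(events):
    """Return the hours elapsed between each upload and its first crawl."""
    return [
        (crawled - uploaded).total_seconds() / 3600
        for uploaded, crawled in events
    ]


def summarize(latencies_hours):
    """Summarize a list of latencies (in hours)."""
    return {
        "average": mean(latencies_hours),
        "fastest": min(latencies_hours),
        "slowest": max(latencies_hours),
    }


# Hypothetical (upload, first crawler visit) pairs, NOT real measurements.
events = [
    (datetime(2004, 7, 1, 9, 0), datetime(2004, 7, 2, 16, 0)),
    (datetime(2004, 7, 5, 10, 0), datetime(2004, 7, 6, 14, 15)),
    (datetime(2004, 7, 11, 20, 0), datetime(2004, 7, 12, 17, 45)),
]

summary = summarize(crawl_latencies(events))
for label, hours in summary.items():
    print(f"{label}: {hours:.2f} hours")
```

The first crawl time would normally come from the server access log, so the numbers are only as accurate as the recorded upload times.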

Now...MSN and Yahoo!?

Yahoo!/Inktomi began crawling the site over the course of the last week (it started Wednesday, but I'm not sure of the time), yet has not finished crawling.

MSN, on the other hand...nothing. The bot has been by repeatedly but for some reason, the old content is still in their index.
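A rough way to check that a bot really "has been by" is to count crawler hits in the server access log by user-agent string. The sketch below assumes the log lines include the user-agent (as in the common "combined" log format) and matches on the tokens Googlebot, msnbot and Slurp (Yahoo!/Inktomi's crawler). The sample lines, IP addresses and paths are invented for illustration. Note that this only shows visits; it says nothing about when an engine's index actually gets updated.

```python
from collections import Counter

# User-agent substrings mapped to the engine they belong to.
CRAWLER_TOKENS = {
    "googlebot": "Google",
    "msnbot": "MSN",
    "slurp": "Yahoo!/Inktomi",
}


def count_crawler_hits(log_lines):
    """Count access-log lines per search engine, based on user-agent tokens."""
    hits = Counter()
    for line in log_lines:
        lowered = line.lower()
        for token, engine in CRAWLER_TOKENS.items():
            if token in lowered:
                hits[engine] += 1
                break  # count each line at most once
    return hits


# Invented sample lines (documentation-range IPs, hypothetical paths).
sample_log = [
    '192.0.2.10 - - [12/Jul/2004:18:10:00 -0400] "GET /articles/one.html HTTP/1.1" 200 5120 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '192.0.2.20 - - [12/Jul/2004:19:02:11 -0400] "GET /index.html HTTP/1.0" 200 4096 "-" "msnbot/0.11 (+http://search.msn.com/msnbot.htm)"',
    '192.0.2.20 - - [12/Jul/2004:19:02:15 -0400] "GET /articles/one.html HTTP/1.0" 200 5120 "-" "msnbot/0.11 (+http://search.msn.com/msnbot.htm)"',
    '192.0.2.30 - - [12/Jul/2004:20:45:30 -0400] "GET /index.html HTTP/1.0" 200 4096 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"',
    '192.0.2.40 - - [12/Jul/2004:21:00:00 -0400] "GET /index.html HTTP/1.1" 200 4096 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"',
]

hits = count_crawler_hits(sample_log)
for engine, count in hits.items():
    print(f"{engine}: {count} hit(s)")
```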

Now here's the tricky part...my primary gauge of the crawlers (my site articles) has been indexed on other sites that I submitted the articles to after putting them up on my own site, yet not under my domain name (i.e. the articles on adamwebdesign.ca don't show up, but the same ones on other sites do).

The site doesn't appear to be crawler-unfriendly, since Google's pretty well got all of it and Yahoo! has part of it. But the question becomes: why is Google so quick to index things permanently (no more Freshbot, it seems) while Yahoo!/MSN work much more slowly?
  #2 (permalink)  
Old 07-13-2004, 12:55 AM
WebProWorld Pro
 
Join Date: Apr 2004
Posts: 144
wavedancing RepRank 0

Google uses PageRank to choose which pages it will crawl (it will not crawl every page it encounters). Apparently, the higher the PageRank, the more likely and the more frequently a page will be crawled by Googlebot. That's why it may crawl your article on a site other than your own before it crawls your site. Also, the crawler should crawl in a parallel fashion.
  #3 (permalink)  
Old 07-13-2004, 02:42 AM
WebProWorld 1,000+ Club
 
Join Date: Dec 2003
Location: Toronto, Ontario, Canada
Posts: 2,181
ADAM Web Design RepRank 1

That's all well and good but my PR is only 4 (not especially low, but not all that high either) and Google crawls the new stuff within 24 hours.

It's MSN and to a lesser extent Yahoo! that are slow. Google seems to be on the ball.
  #4 (permalink)  
Old 07-13-2004, 04:04 AM
WebProWorld Veteran
 
Join Date: Apr 2004
Posts: 447
HardCoded RepRank 0

Quote:
Google uses PageRank to choose which pages it will crawl (it will not crawl every page it encounters).
No it doesn't, and yes it does.

In my experience, the biggest factor in Google's frequency is how often you update the site. And when it gets around to it, it crawls every last scrap it can find. If you find that it does not crawl your whole site, I can almost guarantee that it's because of gunk in the URL, as has been discussed here in countless threads.
  #5 (permalink)  
Old 07-13-2004, 04:13 AM
WebProWorld Member
 
Join Date: Jun 2004
Location: Seattle WA USA
Posts: 86
robinev RepRank 0

It is kind of unfair to include MSN along with Yahoo and Google since MSN doesn't yet have its own search engine. MSN is still using results supplied to it by Yahoo's crawling process.

MSN's own bots and the SE reports they help create are called "technology previews". That's not even "beta" quality.

At some point a long time from now, MSN will release a search engine that gives results based on Microsoft's crawls and MS search/indexing algorithms. But they're not there yet.
  #6 (permalink)  
Old 07-13-2004, 12:02 PM
WebProWorld 1,000+ Club
 
Join Date: Dec 2003
Location: Toronto, Ontario, Canada
Posts: 2,181
ADAM Web Design RepRank 1

I'm not totally sure about that, robinev. The thing that's weird about the "beta" bot is that it has only just now picked up on stuff I added to the site last week. It sees the links to the new item, but hasn't added them yet. I know it can see them because if I do a search for the title of the new item, it shows up in the results.

If the present incarnation of MSN is still using results from Inktomi, which it appears to be, then it's certainly not retrieving the updated info as frequently as, say, Yahoo!, which has appeared to update on a 2-week basis from the time the original content was added.

I guess what I'm trying to ultimately figure out is what the update schedule is.
  #7 (permalink)  
Old 07-13-2004, 02:05 PM
WebProWorld Member
 
Join Date: Jun 2004
Location: Seattle WA USA
Posts: 86
robinev RepRank 0

Microsoft says that the results of msnbot's crawling are not being added to the indexes that show up at http://search.msn.com/. Those indexes are supplied by Yahoo and its associates, and then (apparently) tweaked in various ways by Microsoft.

The results from the "technical preview" indexes built by MSN's own crawlers are likely to be far different than what Yahoo supplies. How different will only become apparent when Microsoft releases something that they're at least willing to call a "beta". (Yahoo's searching algos are so buggy they shouldn't even be called "beta", but Yahoo doesn't seem to care.)
  #8 (permalink)  
Old 07-13-2004, 03:31 PM
WebProWorld Pro
 
Join Date: Apr 2004
Posts: 144
wavedancing RepRank 0

Quote:
Originally Posted by HardCoded
Quote:
Google uses PageRank to choose which pages it will crawl (it will not crawl every page it encounters).
No it doesn't, and yes it does.

In my experience, the biggest factor in Google's frequency is how often you update the site. And when it gets around to it, it crawls every last scrap it can find. If you find that it does not crawl your whole site, I can almost guarantee that it's because of gunk in the URL, as has been discussed here in countless threads.
To be more precise, Google will not crawl every page right after it finds it. It might crawl it later. If a new page is found by following a link from a page with high PageRank, it's more likely to be crawled, and to be crawled earlier.
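The idea described above (newly found pages are queued, and pages discovered from higher-ranked pages get crawled sooner) can be sketched as a priority queue. This is only an illustration of the general technique, not Google's actual algorithm, which isn't public. The class name, the `source_rank` parameter and the example URLs are all hypothetical.

```python
import heapq
import itertools


class CrawlFrontier:
    """Queue of URLs to crawl, ordered by the rank of the page that linked to them."""

    def __init__(self):
        self._heap = []
        self._seen = set()
        self._order = itertools.count()  # tie-breaker: first found, first crawled

    def add(self, url, source_rank):
        """Queue a newly discovered URL unless it has been seen before."""
        if url in self._seen:
            return
        self._seen.add(url)
        # heapq is a min-heap, so negate the rank to pop the highest rank first.
        heapq.heappush(self._heap, (-source_rank, next(self._order), url))

    def pop(self):
        """Return the next URL to crawl (highest source rank first)."""
        _, _, url = heapq.heappop(self._heap)
        return url

    def __len__(self):
        return len(self._heap)


frontier = CrawlFrontier()
frontier.add("http://example.com/article-on-my-site", source_rank=4)
frontier.add("http://example.org/same-article-elsewhere", source_rank=7)
frontier.add("http://example.net/obscure-page", source_rank=2)
frontier.add("http://example.com/article-on-my-site", source_rank=9)  # duplicate, ignored

crawl_order = [frontier.pop() for _ in range(len(frontier))]
print(crawl_order)
```

Under this model, a copy of an article linked from a higher-ranked site would be fetched before the original on a lower-ranked site, which matches the behaviour described in the first post.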

Google will not know that you have updated your site until it revisits it. Google seems to revisit every site every day. It has to be done in a parallel fashion.
  #9 (permalink)  
Old 07-14-2004, 02:38 AM
WebProWorld 1,000+ Club
 
Join Date: Dec 2003
Location: Toronto, Ontario, Canada
Posts: 2,181
ADAM Web Design RepRank 1

If MSN isn't using the MSNBot, and I'm inclined to believe that as well, it has to still be using Inktomi. Yahoo! has updated from the Inktomi database. Based on the experiment I conducted, MSN still hasn't.

If MSN is using the Inktomi bot as suspected, then the question still remains: what's taking it so long and what's the update schedule that others have seen as far as new Inktomi results appearing on MSN? In other words, if Yahoo can do it, why can't MSN?
  #10 (permalink)  
Old 07-14-2004, 03:15 AM
WebProWorld 1,000+ Club
 
Join Date: Feb 2004
Location: Australia
Posts: 1,255
Dave Hawley RepRank 0

IMO, this is one of the main reasons Google is number 1. Its technology is beyond that of all other SEs. This gives Google the largest database in the world from which to pick the most relevant results. Googlebot seems to get everywhere that it is allowed to go.

When/if other SEs can freely deep crawl the WWW like Google, there will be some serious competition. Until then....
All times are GMT -4.


