View Full Version : Yahoo spider frequency
morpheus.100
05-27-2004, 08:34 AM
Does anyone here know the average timespan between site indexes by Yahoo's bot?
They spidered one of my sites approx. 2 months ago, messed up the listings, and haven't spidered it since. I have used their feedback form and also re-submitted the URL, but I don't want to be omitted for spamming. I hear Yahoo is currently working on a new bot and engine, and I wish to remain listed before it becomes paid inclusion only. Any ideas appreciated. Also, does anyone know the name of Yahoo's bot? I wish to modify my alert script to inform me when it has crawled my pages.
Well, the Yahoo bot seems to hit my site about twice a day, but I don't really know what controls the frequency of visits for this bot. It seems that Yahoo is doing periodic updates of its index, with the last two five to six weeks apart.
Yahoo's new search engine and bot have been live for a couple of months now, and they assure us that there will always be a free submit option available; at present some 99% of the index is free content.
michalm
05-27-2004, 08:52 AM
Yahoo's bot has lately been identifying itself as Yahoo! Slurp.
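For the alert script, a minimal sketch of the check might look like the following. It assumes the script has access to the visitor's user-agent string; the function name and the sample user-agent value are illustrative only, and matching on the word "slurp" rather than the full name is an assumption made so that variants of the name still match.

```python
def is_yahoo_crawler(user_agent: str) -> bool:
    """Return True if the user-agent string looks like Yahoo's crawler."""
    # Case-insensitive substring match on "slurp"; this is a heuristic,
    # not an official identification method.
    return "slurp" in user_agent.lower()


# Illustrative user-agent value; check your own logs for the real string.
sample = "Mozilla/5.0 (compatible; Yahoo! Slurp)"
if is_yahoo_crawler(sample):
    print("Yahoo crawler visit detected")
```

A user-agent string can be faked by anyone, so treat a match as a hint rather than proof.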
morpheus.100
05-27-2004, 02:51 PM
Thanks for your help guys.
If anyone else feels like adding to the above, please do. Even if I can't make use of it myself, I feel others surely could.
tempotantrum
05-28-2004, 06:15 AM
I'm pretty sure that the Slurp crawler, like the other major crawlers, doesn't have a set pattern as such. I'd say the regularity of its visits differs from site to site, and I'm guessing that it builds some kind of 'pattern' for each site depending on how often the site is updated. You'll probably find that sites updated daily (blogs, news sites) get visited more often.
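To make that guess concrete, here is a toy sketch of how a per-site revisit interval could adapt to how often a page changes. It is purely illustrative and is not a description of how Slurp or any real crawler works; the function name, the halve/double rule, and the 1-day and 60-day limits are all assumptions.

```python
def next_interval(current_days: float, page_changed: bool,
                  shortest: float = 1.0, longest: float = 60.0) -> float:
    """Halve the wait when the page changed, double it when it did not."""
    proposed = current_days / 2 if page_changed else current_days * 2
    # Keep the result inside the assumed shortest/longest limits.
    return max(shortest, min(longest, proposed))


# Start at a two-week interval and feed in three hypothetical crawl results.
interval = 14.0
for changed in [True, True, False]:
    interval = next_interval(interval, changed)
    print(interval)
```

Under a rule like this, a site that changes on every visit would drift toward daily crawls, while a static one would drift toward the longest interval.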
These links might explain more!
www.robotstxt.org/wc/faq.html
www.tamingthebeast.net/articles2/search-engine-spiders.htm
www.iplists.com/
Ade
Mac 5
05-28-2004, 08:28 AM
My stats have the Yahoo robot identified as "Inktomi Slurp."
MSNBot, Inktomi Slurp, Googlebot and BaiDuSpider all visited yesterday May 27.
Jeeves- May 25
Unknown robot (identified by 'crawl')- May 26
Turn It In- May 14
GigaBot- May 3
WISENutbot (Looksmart)- May 13
Alexa (IA Archiver)- May 19
I don't know what a few of these bots are. If you have long spans between visits try adding more links to your site and changing content regularly. There was a lengthy discussion on a different post about adding links to your signature in this forum, which is regularly indexed by the search engines.
http://www.webproworld.com/viewtopic.php?t=19547&highlight=signature+links
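If your stats package doesn't break visits down by bot, a rough sketch like the one below can pull the most recent visit per crawler out of a raw access log. It assumes Apache-style log lines with the user-agent included; the tokens in `BOT_TOKENS` and the sample lines are made up for illustration and should be adjusted to what your own logs show.

```python
import re
from datetime import datetime

# Substrings used to recognise each crawler (assumed; adjust to your logs).
BOT_TOKENS = {
    "slurp": "Yahoo (Inktomi Slurp)",
    "googlebot": "Googlebot",
    "msnbot": "MSNBot",
}

# Timestamp in Apache-style logs, e.g. [27/May/2004:08:28:00 -0500].
# The timezone offset is matched but ignored for simplicity.
TIMESTAMP = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2}) [+-]\d{4}\]")


def last_visits(log_lines):
    """Map each recognised bot name to the datetime of its latest visit."""
    latest = {}
    for line in log_lines:
        match = TIMESTAMP.search(line)
        if not match:
            continue  # skip lines without a recognisable timestamp
        seen = datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S")
        lowered = line.lower()
        for token, name in BOT_TOKENS.items():
            if token in lowered and (name not in latest or seen > latest[name]):
                latest[name] = seen
    return latest


# Made-up sample lines for illustration.
sample_log = [
    '192.0.2.10 - - [26/May/2004:09:15:00 -0500] "GET / HTTP/1.0" 200 5120 "-" "Googlebot/2.1"',
    '192.0.2.11 - - [27/May/2004:08:28:00 -0500] "GET / HTTP/1.0" 200 5120 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp)"',
    '192.0.2.10 - - [27/May/2004:11:02:00 -0500] "GET /about.html HTTP/1.0" 200 2048 "-" "Googlebot/2.1"',
]
for name, seen in last_visits(sample_log).items():
    print(name, seen.date())
```

Comparing those dates against today's date gives the span since each bot's last visit.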
morpheus.100
05-30-2004, 08:49 AM
Thanks for all your responses guys. Yahoo has just indexed the site in question again. Maybe when they update the listings they will get it right this time.
rocky1
05-31-2004, 06:13 AM
Greetings Morpheus!
It's at least reassuring to see someone else is having this problem with Yahoo! Although they haven't messed up listings on the site I'm currently working on, they haven't updated it in cache or results in over 6 weeks.
When I began SEO work on my latest project, a Honey - Honey Sales (www.thomashoney.com) site some six weeks ago, Yahoo, Google, and several others reflected the initial changes to the index page within 30 hours of posting them to the web. The site screamed from totally unknown to page one for Honey Sales overnight. All engines then reverted back to the old page in cache within 14 hours, and thereafter Google flip-flopped back and forth between new page revisions and old page in cache for about a week. I finally had to quit with revisions and allow the site to settle into a spot, or so it seemed.
Yahoo likewise had the newly revised page in cache the morning after I posted it, but upon returning to the old page has not updated anything on the site since. The old page remains in cache after nearly 6 weeks. None of the site revisions have been seen in results, and none of the new pages introduced have been indexed. Much as you indicated, the URL has been submitted more than once, pages that have been substantially revised have been submitted more than once, the new pages have been submitted more than once, but Yahoo! has not updated and/or indexed any of these changes to the site in over 6 weeks.
During this same period of time Google has updated the site's position at least a dozen times, has fully indexed the site on at least 2 occasions, and has picked up all but one of the new pages introduced. On Google I have submitted only the new pages; Google has otherwise been on top of every change to the site.
The site is listed page 1 for Honey Sales, Florida Honey Sales, North Florida Honey Sales, Florida Honey Production - Producers, North Florida Honey Production - Producers, North Dakota Honey Production - Producers, 4 specific varieties of honey sales, and a host of other honey related finds on Google and numerous other engines. On Mama.com the site sees 2 to 5 page-one listings for targeted search strings, showing results from Google, Gigablast, MSN, Teoma, and others. On Yahoo!, however, the only two finds are the company- and URL-specific searches for Thomas Honey (http://search.yahoo.com/search?ei=UTF-8&fr=sfp&p=Thomas+Honey) and Gallberry Honey (http://search.yahoo.com/search?p=Gallberry+Honey), both of which were found prior to my work.
In my humble opinion, there is a reason Google is #1 in the search engine market, and a reason they will stay that way! Yahoo truly needs to get up to speed if they want to continue to act like they pose a threat to Google's #1 seat in the search engine market. If they can't re-index a site after six weeks that they know is changing, having seen those changes in their cache files, then they have serious problems! Outdated results do no one any good. Maybe Yahoo! needs to go back to using Google's info; at least their results were up to speed then!!!
trsiyengar
06-03-2004, 03:08 PM
Rocky1 wrote:
When I began SEO work on my latest project, a Honey - Honey Sales site some six weeks ago, Yahoo, Google, and several others reflected the initial changes to the index page within 30 hours of posting them to the web.
Rocky, IMO, getting your listings within thirty hours might just have been a coincidence with the weekly programmed bot visit.
FYI, when I revised my web site and modified its entire look and appearance, it took barely 12 hours for Google to list the newly revised site, whereas other SEs took some weeks to list it and their caches showed only the old version. Later on, after a month or so, Yahoo too showed the new version in cache.
ronniethedodger
06-03-2004, 04:52 PM
tempotantrum wrote:
I'm pretty sure that the Slurp crawler, like the other major crawlers, doesn't have a set pattern as such. I'd say the regularity of its visits differs from site to site, and I'm guessing that it builds some kind of 'pattern' for each site depending on how often the site is updated. You'll probably find that sites updated daily (blogs, news sites) get visited more often.
No pattern at all.
As for blogs, I have one and Slurp has not crawled any of the archive pages or the single article pages that come with it. It has been going for 9 weeks now and still no indexing of those pages. They did index the main page, the atom.xml file, as well as the RSS file (which is provided by an external service).
They do seem to keep current on the main page though, with updates every couple of days. The XML files are always up to date, but only because I ping Yahoo to let them know that they have been updated.
A few hypotheses come to mind in this regard, and one is that it is a Blogger blog. It is not hosted at Blogger though; it is hosted on my own website. So I am not leaning toward that explanation (albeit there is the atom.xml file there).
All of the files are static files with an .html extension. It could be that they just do not like the way they are filed, for they appear to be several directories deep at first glance.
If it isn't that, then they are not indexing more than one click from the Home Page. Take your pick on which one to go with, because I am still waiting. They have indexed roughly only 10% of the pages that Google has for this site.
Another thing I have noticed about Slurp is that it does not index dynamic URLs past the first query parameter. It will stop at the ampersand sign (&), and it also stops at the number sign (#). This was noticed on a forum that I am part of. In this case, Slurp has indexed only one out of 6 pages that Google has, and those pages are very old and out of date.
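To see what that would mean for a given address, here is a small sketch that cuts a URL at the first ampersand or number sign. It only models the behaviour I observed, not anything Yahoo has documented; the function name and the example.com address are made up for illustration.

```python
import re


def truncated_url(url: str) -> str:
    """Return the part of the URL before the first '&' or '#'."""
    # Split once on either character and keep only the leading piece.
    return re.split(r"[&#]", url, maxsplit=1)[0]


print(truncated_url("http://www.example.com/viewtopic.php?t=19547&highlight=signature"))
# prints: http://www.example.com/viewtopic.php?t=19547
```

If several of your pages collapse to the same truncated address, that would be consistent with only one of them turning up in the index.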
My conclusion is that Yahoo is either not interested in fresh content, or is incapable of indexing it.
rocky1
06-03-2004, 09:59 PM
tempotantrum wrote:
I'm pretty sure that the Slurp crawler, like the other major crawlers, doesn't have a set pattern as such. I'd say the regularity of its visits differs from site to site, and I'm guessing that it builds some kind of 'pattern' for each site depending on how often the site is updated. You'll probably find that sites updated daily (blogs, news sites) get visited more often.
Ade
trsiyengar wrote:
Rocky, IMO, getting your listings within thirty hours might just have been a coincidence with the weekly programmed bot visit.
FYI, when I revised my web site and modified its entire look and appearance, it took barely 12 hours for Google to list the newly revised site, whereas other SEs took some weeks to list it and their caches showed only the old version. Later on, after a month or so, Yahoo too showed the new version in cache.
Guys, the Honey - Honey Sales (http://www.thomashoney.com) site I'm working on has been undergoing revisions and additions for the entire 6 weeks. After the first week of Google results bouncing in and out on the site, I backed off for a week or so to allow the revised pages to settle on Google. Since then I have been tweaking pages constantly, making revisions, changes, and additions, and improving results on the engines that are indexing the site 3 - 4 times a week.
Yahoo has not changed a single page in cache, (aside from the first appearance of the newly revised page within 30 hours, which was removed later that same day, to never be seen again). The cache page (http://216.109.117.135/search/cache?p=Thomas+Honey&ei=UTF-8&cop=mss&u=www.thomashoney.com/&w=thomas+honey&d=18EA5A87FC&c=482&yc=57868&icp=1) since is marked "Beta", meaning what? That this page in cache is being tested to see how long a result can remain the same before someone notices? That this page in cache is being tested to see how long before the media it is written on falls apart and they have to reindex the site?
I understand it often takes time to get a new site listed... but this is not a new site. It's been up for several years. It was carrying a PR 3 on Google; with the recent work and the very recent update it's now carrying a PR 4. It's not like this site just happened. It's as if Yahoo! is just ignoring that it exists in hopes someone will pay $300 to get it listed.
And honestly, I'm okay with that. We'll just politely forget Yahoo! exists, because if they continue to do business in that fashion they won't; Google will waste them, plain and simple. Why would I pay $300 to get listed on Yahoo! when the #1 Search Engine will list me for nothing, AND come back and update that listing 3 - 4 times a week, for nothing!!!
The problem I have is that Yahoo! made all the noise about how they were going to be a contender for Google's #1 crown, and now they're nowhere to be seen. They aren't even in the same league, let alone the same ballpark, when it comes to keeping fresh data in their results. I think we all need to let Yahoo! know that: "Hey, you aren't Google, guys! You are not #1... you never will be if you can't index an established site that has been submitted for review repeatedly and has been in a constant state of change for over six weeks, and still is." It's that simple!