WebProWorld
Insider Reports Anyone is welcome to reply and discuss but starting new topics is reserved for WebProWorld staff and MVPs.

  #1 (permalink)  
Old 03-08-2004, 12:40 PM
Garrett's Avatar
WebProWorld Veteran
 
Join Date: Jun 2003
Location: Lexington, KY. USA
Posts: 316
Garrett RepRank 0
Default Crawler Tips From Google and Yahoo!

My favorite sessions at the SES conference were those where Google and Yahoo appeared on the same panels. You could almost always count on some crackling tension between the two search giants.

The "Meet The Crawlers" session was no exception.

Craig Neville-Manning, senior research scientist at Google, had some great advice for webmasters (and a pointed barb for Yahoo - we'll get to that though).

"We don't like pointing users to pages that you change," said Craig, describing the practice of cloaking. Avoid cloaking at all costs. As Tim Mayer of Yahoo pointed out during our lunch chat, there's a legitimate use for every potential spam technique.

You can use cloaking to show search engines an optimized page and then, say, a Flash-intensive page when users arrive. Despite cloaking's legitimate uses, though, authorities recommend that you don't do it.

Concerned you might be cloaking? Read this page cloaking article.

Craig revealed a good rule of thumb for optimizers: Google's algorithm values text and links that your site visitors can see more highly than anything they can't see. This means you should focus your efforts more on explicit, helpful, and keyword-focused links, as well as copy that informs your visitors.

For those concerned that the Google bot uses too much bandwidth, he mentioned that it can detect when your server is slowing down, and it backs off. Also, the Google bot follows the robots.txt file to the letter.

If you have content you don't want the bot to find, be sure to put the robots.txt file up to keep it out. The bot, says Craig, can find content that's unlinked. That's right, the Google bot can find single pages dangling unlinked in space. He didn't explain how this happens.
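
As a rough illustration of how a well-behaved crawler applies a robots.txt file, here is a minimal Python sketch using the standard library's urllib.robotparser. The rules, the /private/ path, and the example.com URLs are made up for the example; this is not Googlebot's actual code.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; the /private/ path is an assumption.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "http://www.example.com/private/draft.html",
        "http://www.example.com/index.html",
    ):
        print(url, is_allowed(ROBOTS_TXT, "Googlebot", url))
```

A crawler that honors the file would skip the first URL and fetch the second.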

The question and answer session revealed a bit of how Google looks at keywords in the URL. Someone from the audience asked how Google views words in the URL, and whether you should hyphenate them for added relevancy and ranking.

Craig said that Google does index words from the url, but they don't have as much weight as text links. He added quickly though that you should not engineer your links for the algorithm - it's better to have your url meaningful to your visitors than use it to affect your ranking.

Tim Mayer of Yahoo seconded this. He said focusing too much on url engineering can get you into the realm of over-optimization. "As a user," said Tim, "if I see a domain with lots of hyphens it's usually a low quality site." He advised that you not push your filenames too hard, and that you have intuitive directory structures.

In the Link Building Strategies session, prominent SEO guru Greg Boser said "hyphenated domains have come and gone."

The big talk at this conference was the new Yahoo paid inclusion program, which allows webmasters to pay to show up in Yahoo's primary search results. At the close of Google-employed Craig's presentation he declared, in a comment obviously leveled at fellow presenter Tim Mayer of Yahoo, "our search results are not for sale."

Tim, ever the gentleman, let it slide.
__________________
Garrett French
Editor, WebProNews.com
http://www.WebProNews.com
  #2 (permalink)  
Old 03-08-2004, 08:17 PM
haystack's Avatar
WebProWorld New Member
 
Join Date: Jan 2004
Location: Minneapolis, MN
Posts: 9
haystack RepRank 0
Default Re: Crawler Tips From Google and Yahoo!

Quote:
Originally Posted by Garrett
If you have content you don't want the bot to find, be sure to put the robots.txt file up to keep it out.
True, but you might want to take things a step further and password protect the content, because some people will check your robots.txt file to find out what you don't want people to see. In a sense, listing content you don't want spidered in your robots.txt file is like telling people where the safe is.
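
To make the "where the safe is" point concrete: robots.txt is a public file, so anyone can fetch it and read off the paths you asked crawlers to avoid. A minimal Python sketch of that, where the file contents and paths are invented for illustration:

```python
from typing import List

# Hypothetical robots.txt; both paths are made-up examples.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/          # back office
Disallow: /clients/drafts/
"""


def disallowed_paths(robots_txt: str) -> List[str]:
    """Collect every path named in a Disallow line, in file order."""
    paths = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if line.lower().startswith("disallow:"):
            path = line.split(":", 1)[1].strip()
            if path:  # an empty Disallow means "nothing is disallowed"
                paths.append(path)
    return paths


if __name__ == "__main__":
    print(disallowed_paths(ROBOTS_TXT))
```

Password protection keeps those directories closed even though the list itself stays readable.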
  #3 (permalink)  
Old 03-08-2004, 09:34 PM
Mel Mel is offline
WebProWorld 1,000+ Club
 
Join Date: Jul 2003
Posts: 1,903
Mel RepRank 2Mel RepRank 2
Default

To add to what Ed said, not all bots comply with the robots.txt file (email harvesting bots, for example), and those that ignore it can still use it as a roadmap to the goodies they want to gather.
__________________
Mel Nelson
Expert SEO | Cheap used cars
  #4 (permalink)  
Old 03-08-2004, 10:32 PM
minstrel's Avatar
WebProWorld 1,000+ Club
 
Join Date: Jul 2003
Location: Ottawa, Canada
Posts: 2,554
minstrel RepRank 2minstrel RepRank 2
Default

Some people take the trouble to track down lists of "bad bots" and add them to the robots.txt file with a Disallow: / instruction (meaning ignore everything).

However, since the email harvesters and other bad bots are probably going to ignore the robots.txt file anyway, what is the point of going to the trouble of creating a file that lists a whole bunch of such 'bots and tells them not to spider your site? Seems to me this just makes it more difficult for the legitimate spiders who have to "read" through the list...
  #5 (permalink)  
Old 03-09-2004, 10:19 AM
Mel Mel is offline
WebProWorld 1,000+ Club
 
Join Date: Jul 2003
Posts: 1,903
Mel RepRank 2Mel RepRank 2
Default

The reason is that many bots do follow the robots.txt protocol, but you might not like them chewing up your bandwidth as though they were your customers.

A good example is the bot which Zeus uses to crawl pages, which follows the robots.txt protocol but which was visiting my site in one or more of its guises about twenty times a week, and taking about ten pages each visit. Since I disallowed it in the robots.txt file, it has not taken a page. There are lots of other bots similar to this one, and disallowing them in the robots.txt file will save you bandwidth.
Others which you might or might not want to disallow:

Googlebot-Image, which indexes your images.

ia_archiver, which would like to have an image of every variation of your pages in its index since your site began.

Web Image Collector...
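
For anyone who wants to check the effect of per-bot rules before uploading them, here is a small Python sketch using the standard library's urllib.robotparser. The robots.txt below is only an example built from the two bot names above, and the photo URL is made up. It shows which user agents would be turned away, assuming they honor the file at all.

```python
from typing import Iterable, List
from urllib.robotparser import RobotFileParser

# Example rules: block two specific bots, leave everyone else alone.
ROBOTS_TXT = """\
User-agent: Googlebot-Image
Disallow: /

User-agent: ia_archiver
Disallow: /
"""


def blocked_agents(robots_txt: str, agents: Iterable[str], url: str) -> List[str]:
    """Return the user agents that these rules forbid from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


if __name__ == "__main__":
    agents = ["Googlebot", "Googlebot-Image", "ia_archiver"]
    print(blocked_agents(ROBOTS_TXT, agents, "http://www.example.com/photos/a.jpg"))
```

With these rules the regular Googlebot is still allowed in; only the two named bots are listed as blocked.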
__________________
Mel Nelson
Expert SEO | Cheap used cars
  #6 (permalink)  
Old 03-09-2004, 11:58 AM
minstrel's Avatar
WebProWorld 1,000+ Club
 
Join Date: Jul 2003
Location: Ottawa, Canada
Posts: 2,554
minstrel RepRank 2minstrel RepRank 2
Default

You're missing the point, Mel. I'm not talking about bots like Zeus or Google-Image which DO read and pay attention to the robots.txt file.

I'm talking about instructions to harvester bots which likely will be ignored by those bots anyway - extending the size of the robots.txt file by including instructions to those bots seems to me to be (1) pointless since they will likely be ignored anyway (sort of like posting a sign on your door saying "please do not steal my stuff"; people who are on an illegal mission anyway are hardly likely to read it and say, "oh, righto! I'll steal from the guy next door instead because he doesn't HAVE a sign."); and (2) disadvantageous to googlebot and the other legitimate spiders who now have to wade through more text to find any instructions which might apply to them.
  #7 (permalink)  
Old 03-09-2004, 02:54 PM
janeth's Avatar
WebProWorld 1,000+ Club
WebProWorld MVP
 
Join Date: Jul 2003
Location: Colombia S.A
Posts: 5,709
janeth RepRank 7janeth RepRank 7janeth RepRank 7janeth RepRank 7janeth RepRank 7janeth RepRank 7janeth RepRank 7janeth RepRank 7
Default

Quote:
That's right, the Google bot can find single pages dangling unlinked in space. He didn't explain how this happens.
How would the bot find a single page with no links pointing to it?
  #8 (permalink)  
Old 03-09-2004, 03:56 PM
WebProWorld Veteran
 
Join Date: Feb 2004
Location: Lodz, Poland
Posts: 328
adore RepRank 0
Default

Maybe some kind of file names simulation based on global statistics of file names?
__________________
http://www.twojecentrum.pl - Polish e-shopping center
http://dzwonki-loga.pl - Ringtones for mobile phones
  #9 (permalink)  
Old 03-09-2004, 05:00 PM
cbp cbp is offline
WebProWorld 1,000+ Club
 
Join Date: Oct 2003
Posts: 4,938
cbp RepRank 1
Default

Quote:
How would the bot find a single page with no links pointing to it?
Via someone with the Google toolbar clicking on it?

CBP
  #10 (permalink)  
Old 03-09-2004, 06:21 PM
WebProWorld New Member
 
Join Date: Sep 2003
Location: Vancouver
Posts: 3
jestersi RepRank 0
Default Google Toolbar

That's exactly it, and the Alexa toolbar too.
I noticed my AWStats pages ended up in Google one time from a server that's only used inside my network.
I looked at the stats one day, and the only way Google could have known about them was through the toolbar sending "anonymous statistics". I haven't looked too much further into this, but an Ethereal packet capture would reveal all, I'm sure.
  #11 (permalink)  
Old 03-09-2004, 06:38 PM
Mel Mel is offline
WebProWorld 1,000+ Club
 
Join Date: Jul 2003
Posts: 1,903
Mel RepRank 2Mel RepRank 2
Default

Quote:
Originally Posted by minstrel
You're missing the point, Mel. I'm not talking about bots like Zeus or Google-Image which DO read and pay attention to the robots.txt file.

I'm talking about instructions to harvester bots which likely will be ignored by those bots anyway - extending the size of the robots.txt file by including instructions to those bots seems to me to be (1) pointless since they will likely be ignored anyway (sort of like posting a sign on your door saying "please do not steal my stuff"; people who are on an illegal mission anyway are hardly likely to read it and say, "oh, righto! I'll steal from the guy next door instead because he doesn't HAVE a sign."); and (2) disadvantageous to googlebot and the other legitimate spiders who now have to wade through more text to find any instructions which might apply to them.
First, there seems to be some sort of assumption here that the bots need to read your robots.txt file to navigate your site. They do not; they do just fine with links, and within .2 seconds of arriving at your site they will have a complete directory listing if they ignore the robots.txt.

Secondly, they either obey the robots.txt or they do not. For those that do, you have saved them and yourself some time and bandwidth, and for those that don't, you have lost nothing. ;-))

As for Google having to "wade through all that text", they don't read your robots.txt file every time they visit, and even a 6k file is really nothing for a bot that indexes 40k pages by the bushel basket all day long.

But if you have no objections to various bots eating up your bandwidth, or to having, say, all your copyrighted photographs downloaded and indexed, or to letting your email directory be spidered, the answer is to leave the robots.txt file off or just put in a welcome sign for all bots to come and go as they please.

Or do as Dan suggests and p/w protect your private directories.
__________________
Mel Nelson
Expert SEO | Cheap used cars
  #12 (permalink)  
Old 03-09-2004, 06:45 PM
Cyber Gypsy's Avatar
WebProWorld Member
 
Join Date: Jul 2003
Location: Reno, Nevada
Posts: 25
Cyber Gypsy RepRank 0
Default Dangling pages

I think Google crawls everything on your server. They have indexed pages that I know no one with any kind of search engine toolbar has clicked on even once.
I don't keep anything on my server that I don't want them to index. A couple of times they indexed sites I was working on that never had a link to them. This caused me problems because MSN picked them up, and since MSN is 6 months behind, it had the wrong address when I finally got the site done.
__________________
CyberGypsy
http://www.artmatrixwebdesign.com
  #13 (permalink)  
Old 03-09-2004, 06:55 PM
WebProWorld Member
 
Join Date: Jul 2003
Location: Springfield, IL
Posts: 42
hupp25 RepRank 0
Default crawling

When the bots crawl, do they crawl httpsdoc pages or just httpdoc pages? I am just curious. Actually, I have issues with the https pages not being crawled, even with text links. Do all of those pages need to be in httpdoc form? Thanks, jlh.
  #14 (permalink)  
Old 03-09-2004, 06:56 PM
WebProWorld New Member
 
Join Date: Feb 2004
Location: Rye, Mornington Peninsula, Victoria, Australia
Posts: 24
aucomp RepRank 0
Default Crawlers

Over the years I have had many development sites in a temp directory under the main site.

Google and MSN have picked them up, and I did not want them to.

I now use robots.txt to stop this.
__________________
mailto:info@netbookings.com.au
http://www.netbookings.com.au
  #15 (permalink)  
Old 03-09-2004, 07:03 PM
WebProWorld Veteran
 
Join Date: Jul 2003
Location: Mass, U.S.A.
Posts: 399
Conficio RepRank 0
Default Re: Dangling pages

Quote:
Originally Posted by cyber Gypsy
I think Google crawls everything on your server. They have indexed pages that I know no one with any kind of search engine toolbar has clicked on even once.
I don't keep anything on my server that I don't want them to index. A couple of times they indexed sites I was working on that never had a link to them. This caused me problems because MSN picked them up, and since MSN is 6 months behind, it had the wrong address when I finally got the site done.
Hi Cyber Gypsy,
I doubt your theory. If Google were guessing at URLs on your server, you'd see lots of failed requests (404 errors) in your logs. And hopefully you don't. I think CBP's theory with the Google toolbar is much more logical.
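
If you want to check this against your own logs, here is a rough Python sketch that counts 404 responses served to a given crawler. It assumes the common Apache "combined" log format; the sample lines, IP addresses, and paths are invented for the example.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the request, status, size, referrer and user-agent fields of a
# combined-format access log line.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

# Made-up sample lines for illustration only.
SAMPLE = [
    '192.0.2.10 - - [09/Mar/2004:19:03:00 -0500] "GET /index.html HTTP/1.0" '
    '200 5120 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"',
    '192.0.2.10 - - [09/Mar/2004:19:03:05 -0500] "GET /guess.html HTTP/1.0" '
    '404 312 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"',
    '192.0.2.77 - - [09/Mar/2004:19:04:00 -0500] "GET /missing.html HTTP/1.1" '
    '404 312 "-" "Mozilla/4.0"',
]


def count_404s(lines: Iterable[str], agent_substring: str = "Googlebot") -> Counter:
    """Count 404 responses per path for user agents containing agent_substring."""
    misses = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if (
            match
            and match.group("status") == "404"
            and agent_substring in match.group("agent")
        ):
            misses[match.group("path")] += 1
    return misses


if __name__ == "__main__":
    print(count_404s(SAMPLE))
```

A long list of never-existing paths in the result would support the guessing theory; an empty or near-empty one points toward the toolbar explanation.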

K<o>
  #16 (permalink)  
Old 03-09-2004, 07:34 PM
TrafficProducer's Avatar
WebProWorld 1,000+ Club
WebProWorld MVP
 
Join Date: Jul 2003
Location: United Kingdom
Posts: 1,643
TrafficProducer RepRank 4TrafficProducer RepRank 4TrafficProducer RepRank 4
Default You crawlers

robots.txt

Are there any bots or crawlers which use:

robot.txt (note: no "s")?
  #17 (permalink)  
Old 03-09-2004, 08:07 PM
janeth's Avatar
WebProWorld 1,000+ Club
WebProWorld MVP
 
Join Date: Jul 2003
Location: Colombia S.A
Posts: 5,709
janeth RepRank 7janeth RepRank 7janeth RepRank 7janeth RepRank 7janeth RepRank 7janeth RepRank 7janeth RepRank 7janeth RepRank 7
Default

The toolbar would be the only way I could see it working.
  #18 (permalink)  
Old 03-09-2004, 09:00 PM
minstrel's Avatar
WebProWorld 1,000+ Club
 
Join Date: Jul 2003
Location: Ottawa, Canada
Posts: 2,554
minstrel RepRank 2minstrel RepRank 2
Default

Let me try one last time...

Quote:
Originally Posted by minstrel
I'm not talking about bots like Zeus or Google-Image which DO read and pay attention to the robots.txt file. I'm talking about instructions to harvester bots which likely will be ignored by those bots anyway
Quote:
Originally Posted by Mel
First there seems to be some sort of assumption here that the bots need to read your robots.txt file to navigate your site, they do not, they do just fine with links and within .2 seconds of arriving at your site they will have a complete directory listing if they ignore the robots.txt.
Where on earth did you get that from? I neither said it nor believe it - look at all the sites that don't even HAVE a robots.txt file. But if it's there, googlebot will read and respect it, as will spiders from the other major search engines - if you doubt that, go and have a look at what Google, Yahoo!, and others say in their instructions to webmasters...

Quote:
Secondly they either obey the robots.txt or they do not. For those that do you have saved them and your self some time and bandwidth and for those that don't you have lost nothing.
See above - I have said twice now that I am talking about the ones that DON'T, not the ones that do... *sigh*

Quote:
But if you have no objections to various bots eating up your bandwidth, or to having say all you copyrighted photographs downloaded and indexed, or to let your email directory be spidered, the answer is to leave the robots.txt file off or just put in a welcome sign for all bots to come and go as they please.
Yet another example of something I didn't say...
  #19 (permalink)  
Old 03-09-2004, 09:40 PM
minstrel's Avatar
WebProWorld 1,000+ Club
 
Join Date: Jul 2003
Location: Ottawa, Canada
Posts: 2,554
minstrel RepRank 2minstrel RepRank 2
Default

Quote:
Originally Posted by cbp
Quote:
How would the bot find a single page with no links pointing to it?
Via someone with the Google toolbar clicking on it?
Possibly. Or if the page is on an existing site but simply isolated in terms of links, it may be that without a Disallow instruction Google will spider everything on that site (once it's found the site), including subdirectories?
  #20 (permalink)  
Old 03-10-2004, 02:50 AM
Mel Mel is offline
WebProWorld 1,000+ Club
 
Join Date: Jul 2003
Posts: 1,903
Mel RepRank 2Mel RepRank 2
Default

No, Minstrel, you didn't say it; I did. It is just possible that others will express independent opinions here from time to time, and possibly even that others will be interested in those opinions. ;-))
__________________
Mel Nelson
Expert SEO | Cheap used cars
  #21 (permalink)  
Old 03-10-2004, 03:51 AM
WebProWorld 1,000+ Club
 
Join Date: Feb 2004
Location: Australia
Posts: 1,255
Dave Hawley RepRank 0
Default

Minstrel, it's a lot like pulling teeth isn't it :o)

Quote:
That's right, the Google bot can find single pages dangling unlinked in space
Quote:
Via someone with the Google toolbar clicking on it?
But how would someone (with the toolbar) arrive at the page in the first place?
  #22 (permalink)  
Old 03-10-2004, 05:06 AM
WebProWorld New Member
 
Join Date: Mar 2004
Location: Calcutta, India
Posts: 21
Ajigrmtech RepRank 1
Default Toolbar can do it

Dave, I have personal experience with it; Google does that.

We had a page which we removed completely after a month or so (though it had been crawled once by then), and there were certainly no links to it, as our site was new. But when we viewed our log files, we saw Googlebot requesting the unlinked page too.

I will agree with minstrel: when we asked a few experts about it, they all answered 'G Toolbar'. I do not know the tech behind it. Robots.txt can be used, though the page remains in the cache if it has been crawled once.

Aji
__________________
Human Edited Directory - hedir.com is human edited directory with a difference
Aji - My New Blog
  #23 (permalink)  
Old 03-10-2004, 06:57 AM
WebProWorld 1,000+ Club
 
Join Date: Feb 2004
Location: Australia
Posts: 1,255
Dave Hawley RepRank 0
Default

Hi Aji

But my question still stands unanswered.
  #24 (permalink)  
Old 03-10-2004, 07:10 AM
WebProWorld Pro
 
Join Date: Jul 2003
Location: UK
Posts: 127
PhilC RepRank 1
Default

Neither Googlebot nor any other spider can find an isolated page in a site (on the web) unless they are told that it is there. It doesn't matter what Craig from Google said - they just can't do it.

There are several ways that Google can be told that the page is there:

The toolbar is one that has been mentioned. Uploading a page and simply viewing it in a browser that has the toolbar working is enough.

Putting AdSense on the page, however briefly, is another way.

If the page has a link from it to another page, and if the other page's site has log analysing software that makes the analysis public, then the page may be listed as a referrer.

Naming it in the robots.txt file is another obvious way for it to be found.

I can't think of any more ways offhand, but, without such ways, no spider can find a page that is not linked to from another page, or its URL written in another page. They must be told that the page exists one way or another, otherwise they cannot find it.
__________________
PhilC
SEO articles, information and seo forums
  #25 (permalink)  
Old 03-10-2004, 08:43 AM
WebProWorld New Member
 
Join Date: Mar 2004
Location: Tokyo, Japan
Posts: 20
ChrisBowd RepRank 0
Default Google, hanging files and the joys of seo

Possibly the Google chappie meant that Google 'can' index hanging files as opposed to Google 'does' index them as a matter of course.

I think it's probably reasonable to state that given a dedicated server running a secure Web server hosting a single URL on a unique IP address, setup with secure directory permissions and administered by a competent sysadmin Google will not find orphan pages.

Now, in the real world xx% of URLs are hosted on shared servers with questionable security policies, unknown OS update policies, layers of virtual hosting software, web host management software, virtual directories, multiple resellers selling off the same virtual root and inexperienced webmasters who are oblivious to the meaning of directory permissions etc. etc. Web server directory traversal exploits are well documented and I can well believe that Googlebot inadvertently finds an enormous amount of data that it's not supposed to.

Anyway, I just love Googlebot - I spent months optimizing my main site to try to achieve a decent SERP in the relatively uncompetitive Japanese business consulting area with 0 results and Google only ever indexing the index.htm file. Then 2 weeks ago I put up a 2 page mini-site which I have never submitted to any search-engine (let alone Google) and here it is today sitting in Google's #3 slot for one of my keyphrases and that specific phrase only appears once in the 2 pages!!!

Not of course that I am complaining :-)

Thanks for a great forum - it has given me some great pointers over the months.
__________________
Chris Bowd
doing business in Japan
  #26 (permalink)  
Old 03-10-2004, 08:45 AM
WebProWorld Member
 
Join Date: Jul 2003
Location: Springfield, IL
Posts: 42
hupp25 RepRank 0
Default http vs https

Can anyone tell me if it matters: will Google or any of the others crawl an httpsdoc page, or does it need to be in httpdoc form? I need to know this and am very curious if anyone can help me. Thanks, jlh.
  #27 (permalink)  
Old 03-10-2004, 08:54 AM
WebProWorld Pro
 
Join Date: Jul 2003
Location: UK
Posts: 127
PhilC RepRank 1
Default

Security exploits could sometimes find pages that could not be found any other way, but there's no way that Google would indulge in that sort of activity - isn't it illegal where they are?
__________________
PhilC
SEO articles, information and seo forums
  #28 (permalink)  
Old 03-10-2004, 08:59 AM
WebProWorld New Member
 
Join Date: Mar 2004
Location: Tokyo, Japan
Posts: 20
ChrisBowd RepRank 0
Default httpsdocs

Yes - Google can index pages in your httpsdocs/ folder.
__________________
Chris Bowd
doing business in Japan
  #29 (permalink)  
Old 03-10-2004, 09:08 AM
WebProWorld New Member
 
Join Date: Mar 2004
Location: Tokyo, Japan
Posts: 20
ChrisBowd RepRank 0
Default Google - illegal - never!

I don't suppose for one moment that Google deliberately indexes or looks for orphan pages. I would think they have enough problems just keeping up with the growth of the pages we want them to see!

My real point was/is that given some of the questionable security practices, especially of smaller 'notebook' hosting companies, it is likely that Google inadvertently indexes orphan pages.

Sorry if I caused any confusion.

As to whether it's illegal, that's an interesting question. In principle unauthorized access to any file is illegal unless you explicitly state that access is allowed - a similar legal principle to that of trespass, i.e. thieves should not need to see a notice on your front door stating "Private" to know that it is so! On the Web, though, the inverse applies - if you do not use a robots.txt file the assumption is that everything is public.
__________________
Chris Bowd
doing business in Japan
  #30 (permalink)  
Old 03-10-2004, 09:46 AM
WebProWorld New Member
 
Join Date: Oct 2003
Location: Columbus Ohio
Posts: 6
spamfork RepRank 0
Default

Quote:
That's right, the Google bot can find single pages dangling unlinked in space
Quote:
Via someone with the Google toolbar clicking on it?
But how would someone (with the toolbar) arrive at the page in the first place?

Hello Dave-

To help further answer your question....

Typically Google arrives at a dangling page because someone within your company who knows about the page goes to it with the Google Toolbar installed. One other way that Google finds dangling pages is by dropping back through directory structures. For example, let's say you have a direct link from your front page to www.yoursite.com/your_photos/vacation/mary.gif.

Let's say you don't have a link to a folder within your your_photos directory but you didn't protect that directory....Then Google may end up indexing everything by looking first to see if your vacation directory is indexable and then to the your_photos directory which you didn't protect so they could index everything in it.
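
To picture the directory walk described above, here is a short Python sketch that lists the parent directory URLs a crawler could try, starting from one linked file. Whether Googlebot really does this is my guess from what I have seen, not something Google has confirmed, and the function name is just for illustration.

```python
import posixpath
from typing import List
from urllib.parse import urlsplit, urlunsplit


def parent_directories(url: str) -> List[str]:
    """List each directory URL above a file, nearest first, ending at the site root."""
    parts = urlsplit(url)
    path = posixpath.dirname(parts.path)
    parents = []
    while True:
        dir_path = path if path.endswith("/") else path + "/"
        parents.append(urlunsplit((parts.scheme, parts.netloc, dir_path, "", "")))
        if path in ("", "/"):
            break  # reached the site root
        path = posixpath.dirname(path)
    return parents


if __name__ == "__main__":
    for candidate in parent_directories(
        "http://www.yoursite.com/your_photos/vacation/mary.gif"
    ):
        print(candidate)
```

If any of those directory URLs returns an open file listing instead of an error or an index page, everything in that listing becomes reachable by following links.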

~Spamfork
  #31 (permalink)  
Old 03-10-2004, 11:22 AM
freelancemom's Avatar
WebProWorld Member
 
Join Date: Aug 2003
Location: New England - USA
Posts: 75
freelancemom RepRank 0
Default Thanks

Thanks Garrett! Your posts are gold.

Lori
  #32 (permalink)  
Old 03-10-2004, 02:24 PM
WebProWorld New Member
 
Join Date: Feb 2004
Location: Lansing, MI
Posts: 3
mlwatson RepRank 0
Default

It's amazing what Google will find. For a week or so I put a redirect from a free website to my main site in an attempt to upload link pages from the Zeus robot. I found that for anyone to trade links, I would have to have my links page in the same domain.

I then changed the free site back to the original design (it's basically abandoned). Yet I still see results when I search for my company name Excalibur Brothers referencing the free website but not my main site and it no longer redirects. Perhaps I should put a link on that main page to my website so anyone clicking through can find my main site.

Anyway although I am now achieving first or second page ranking on about 30 terms similar to what I was targeting I still am not getting the ranking I'd like on my main search terms. Go figure.

I am curious if anyone knows for sure if Googlebot reads the "revisit-after" META tag? I have mine set to revisit every 2 days (I make a lot of changes trying to fine tune the site) and it seems to be doing so but I'm wondering if this is normal practice or is the META tag working?
  #33 (permalink)  
Old 03-11-2004, 03:00 AM
Mel Mel is offline
WebProWorld 1,000+ Club
 
Join Date: Jul 2003
Posts: 1,903
Mel RepRank 2Mel RepRank 2
Default

Regarding the revisit-after tag, IMO it is not likely that it is honored by the spiders, for many reasons, but chiefly because there is no way that any spider could operate efficiently if its spidering were controlled by millions of different unrelated tags around the web.

What you should be doing, IMO, is to forget about the revisit-after tag and instead make sure that your server responds correctly to the If-Modified-Since GET which Googlebot (and other spiders) use to ensure they only read pages which have changed since their last spidering.

This will ensure that your page is read by the spider on its next visit after you have made changes, but will not waste its time and your bandwidth re-sending pages it already has in its index.
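
For anyone unsure what "responds correctly" means here, this is a minimal Python sketch of the decision a server makes for a conditional GET: answer 304 Not Modified when the page has not changed since the date the spider sent, otherwise answer 200 with the full page. The function name and the dates are made up for illustration; most web servers already do this for static files.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def response_status(last_modified: datetime, if_modified_since: Optional[str]) -> int:
    """Return 304 if the page is unchanged since the header date, else 200.

    last_modified must be a timezone-aware datetime.
    """
    if if_modified_since is None:
        return 200  # ordinary GET: always send the page
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable header: ignore it and send the page
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)  # assume GMT, as HTTP dates are
    # HTTP dates have one-second resolution, so drop any microseconds.
    if last_modified.replace(microsecond=0) <= since:
        return 304
    return 200


if __name__ == "__main__":
    page_changed = datetime(2004, 3, 9, 12, 0, 0, tzinfo=timezone.utc)
    print(response_status(page_changed, "Wed, 10 Mar 2004 08:00:00 GMT"))
    print(response_status(page_changed, "Mon, 08 Mar 2004 08:00:00 GMT"))
```

The first call represents a spider that last visited after the page changed, so nothing needs to be sent again; the second represents one whose copy is older than the change.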
__________________
Mel Nelson
Expert SEO | Cheap used cars
All times are GMT -4.