01-01-2009, 01:46 AM
Webnauts
 
Join Date: Aug 2003
Location: Worldwide
Posts: 8,167
Re: Internal and external links and the rel="nofollow" attribute

Quote:
Originally Posted by deepsand
I'm presently typing with one hand, owing to a pinched nerve rendering my right arm & shoulder very much in pain & less than fully functional, such that scrolling through lengthy pages is presently no mean task.

I've revisited said blog, and still find nothing there stated by Matt that relates to the issue at hand.

Kindly explicitly state your point.
I am talking about OBLs (outbound links) that should not pass PR (e.g., paid links, affiliate links, etc.).

Quote:
Originally Posted by Matt Cutts
At SES New York, someone asked “Why don’t you provide a parameter, like ‘?googlebot=nocrawl’ to say ‘Googlebot, don’t index this page’?” That was a pretty good question. The short answer would be that on pages you don’t want indexed by spiders, you can add this meta tag to the page:

<META NAME="ROBOTS" CONTENT="NOINDEX">

You can read more about the noindex and nofollow meta tags on our webmaster pages.

But the user specifically wanted a url parameter. I mentioned that because the parameter “id” is often used for session IDs, Googlebot used to avoid urls with “?id=(let’s say a five digit or larger number)” but that I didn’t know if that was still true. I think someone else nearby asked “Isn’t that kind of an ugly hack though?” and I had to fall back on “You asked for something that worked, not something that was pretty.” The questioner persisted, but I was out of other ways to do it, so I said I’d pass the feedback on, namely “someone wants a url parameter that keeps Googlebot from indexing the page.”

That question came up again today, and I wanted to mention one more way to block Googlebot by using wildcards in robots.txt (Google supports wildcards like ‘*’ in robots.txt). Here’s how:
1. Add the parameter like ‘http://www.mattcutts.com/blog/some-random-post.html?googlebot=nocrawl’ to pages that you don’t want fetched by Googlebot.
2. Add the following to your robots.txt:

User-agent: Googlebot
Disallow: *googlebot=nocrawl

That’s it. We may see links to the pages with the nocrawl parameter, but we won’t crawl them. At most, we would show the url reference (the uncrawled link), but we wouldn’t ever fetch the page.
Etc.

Source: Googlebot: Keep out!
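
For anyone who wants to check which of their URLs a wildcard rule like the one quoted above would catch, here is a minimal sketch in Python. It assumes the commonly described matching rules (a pattern is matched from the start of the path plus query string, '*' matches any run of characters, and a trailing '$' anchors the end). The function names `pattern_to_regex` and `is_disallowed` are my own, not anything Google publishes, and this only approximates how Googlebot actually evaluates robots.txt. As far as I know, Python's built-in `urllib.robotparser` does plain prefix matching and does not handle '*', which is why the matching is done by hand here.

```python
import re
from typing import Iterable
from urllib.parse import urlsplit


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a robots.txt path pattern into a compiled regex.

    Assumed rules: '*' matches any run of characters, a trailing '$'
    anchors the end, everything else is matched literally.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each wildcard.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(url: str, disallow_patterns: Iterable[str]) -> bool:
    """Return True if the URL's path (plus query) matches any pattern."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # re.match anchors at the start of the string, like a path prefix rule.
    # Empty patterns are skipped (an empty Disallow blocks nothing).
    return any(pattern_to_regex(p).match(target) for p in disallow_patterns if p)


if __name__ == "__main__":
    rules = ["*googlebot=nocrawl"]  # the Disallow value from the quoted post
    for url in (
        "http://www.mattcutts.com/blog/some-random-post.html?googlebot=nocrawl",
        "http://www.mattcutts.com/blog/some-random-post.html",
    ):
        print(url, "->", "blocked" if is_disallowed(url, rules) else "allowed")
```

Keep in mind that this only tells you whether a URL would be kept from being crawled. As the quote says, the uncrawled URL may still show up as a bare reference.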

----------------------------------

HAPPY NEW YEAR EVERYBODY!!! My first post of 2009!
__________________
"Being an expert isn't telling other people what you know. It's understanding what questions to ask, and flexibly applying your knowledge to the specific situation at hand. Being an expert means providing sensible, highly contextual direction." Jeff Atwood
SEO Workers - Search Engine Optimization Consulting Company | SEO Analysis Tool | Webnauts Net SEO

Last edited by Webnauts; 01-01-2009 at 01:49 AM.