WebProWorld
Search Engine Optimization Forum SEO is much easier with help from peers and experts! The WebProWorld SEO forum is for the discussion and exploration of various search engine optimization topics. Any non (engine) specific SEO or SEM topics should go here.

#1
11-27-2008, 07:52 PM
watto
WebProWorld Veteran
Join Date: Jun 2004
Location: Australia
Posts: 531
Robots.txt Question

I am trying to block pages at my site which are generated via a database. Example:

http://www.business-trader .com.au/buy_sell_business_result/23157/Delatite-Apartments-Merrijig-Timbertop.html

http://www.business-trader .com.au/buy_sell_business_result/23135/Contours-Woodville-.html

Will adding this code to my robots.txt work??

Disallow: /*buy_sell_business_result

Regards

watto
#2
11-28-2008, 10:31 AM
wige
WebProWorld Moderator
Join Date: Jun 2006
Location: United States
Posts: 2,648
Re: Robots.txt Question

I think this would be better:

Disallow: /buy_sell_business_result/

The asterisk acts as a wild card, but there is nothing between the / and the start of the folder name so the asterisk should not be needed.
__________________
The best way to learn anything, is to question everything.
#3
11-28-2008, 05:03 PM
watto
WebProWorld Veteran
Join Date: Jun 2004
Location: Australia
Posts: 531
Re: Robots.txt Question

Thanks wige!

watto
#4
11-28-2008, 11:58 PM
muckle.martin
WebProWorld New Member
Join Date: Nov 2008
Posts: 11
Re: Robots.txt Question

You can also disable this for some particular crawler by using:

User-agent: <botname>
Disallow: /buy_sell_business_result/

I normally stick to User-agent: *

Do you want to block this only for bots, or also for people browsing the website?

Please note that Disallow: /buy_sell_business_result/* may not work because it's not standard.
#5
11-29-2008, 01:24 AM
Webnauts
WebProWorld 1,000+ Club
WebProWorld MVP
Join Date: Aug 2003
Location: Worldwide
Posts: 8,167
Re: Robots.txt Question

Quote:
Originally Posted by muckle.martin
Please note that Disallow: /buy_sell_business_result/* may not work because it's not standard.
It is not part of the standard, but Google, Yahoo and MSN support it.
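To make the wildcard question concrete, here is a minimal Python sketch of how a crawler that supports the extension might match a URL path against a pattern. It assumes Google-style semantics (`*` matches any run of characters, a trailing `$` anchors the end of the path). The function names and the /archive/ path are made up for illustration, and other crawlers may behave differently.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    Assumed semantics (the wildcard extension, not the original standard):
    '*' matches any sequence of characters, and a trailing '$' anchors
    the match at the end of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with '.*' for every '*'.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def matches(pattern, path):
    """Return True if the rule pattern matches the URL path from its start."""
    return pattern_to_regex(pattern).match(path) is not None


listing = "/buy_sell_business_result/23135/Contours-Woodville-.html"
nested = "/archive/buy_sell_business_result/1.html"  # hypothetical path

print(matches("/buy_sell_business_result/", listing))  # plain prefix rule
print(matches("/*buy_sell_business_result", listing))  # wildcard rule
print(matches("/buy_sell_business_result/", nested))   # plain prefix rule
print(matches("/*buy_sell_business_result", nested))   # wildcard rule
```

Under these assumptions only the third call returns False: the plain prefix rule covers the listing pages just as well as the wildcard rule, and the leading `*` only makes a difference when the folder name appears deeper in the path.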
__________________
"Being an expert isn't telling other people what you know. It's understanding what questions to ask, and flexibly applying your knowledge to the specific situation at hand. Being an expert means providing sensible, highly contextual direction." Jeff Atwood
SEO Workers - Search Engine Optimization Consulting Company | SEO Analysis Tool | Webnauts Net SEO
#6
11-29-2008, 03:15 AM
watto
WebProWorld Veteran
Join Date: Jun 2004
Location: Australia
Posts: 531
Re: Robots.txt Question

I am only trying to disallow bots. I actually removed this URL from the SERPs using my GWT, but now they have re-indexed these pages. Not sure how this happened??? I did notice that in the URL removal section of GWT it now says "expired" next to it!

Does anyone know what this means?

watto
#7
11-29-2008, 03:41 AM
muckle.martin
WebProWorld New Member
Join Date: Nov 2008
Posts: 11
Re: Robots.txt Question

Quote:
Originally Posted by Webnauts
It is not part of the standard, but Google, Yahoo and MSN support it.
Thanks for sharing. Is there any reference where these SEs say they support this non-standard syntax?
#8
11-29-2008, 11:42 AM
Jean-Luc
WebProWorld Pro
Join Date: Dec 2007
Location: Brussels, Belgium
Posts: 164
Re: Robots.txt Question

Quote:
Originally Posted by wige
I think this would be better:

Disallow: /buy_sell_business_result/
This line will disallow access to any URL starting with /buy_sell_business_result/. In other words, all of the following URLs will be disallowed:
- http://www.business-trader.com.au/buy_sell_business_result/
- http://www.business-trader.com.au/buy_sell_business_result/23135/Contours-Woodville-.html
- http://www.business-trader.com.au/buy_sell_business_result/blahblah/index.php?color=green

Not sure that this is what you want.
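A quick way to check the prefix behaviour described above is Python's standard-library `urllib.robotparser`. This is only a sketch: the `ExampleBot` name is made up, /fees.php is used simply as a URL outside the folder, and this parser does plain prefix matching only (it does not implement the `*` wildcard extension).

```python
from urllib.robotparser import RobotFileParser

# The rule suggested in this thread, applied to every crawler.
ROBOTS_TXT = """\
User-agent: *
Disallow: /buy_sell_business_result/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

BASE = "http://www.business-trader.com.au"
urls = [
    BASE + "/buy_sell_business_result/",
    BASE + "/buy_sell_business_result/23135/Contours-Woodville-.html",
    BASE + "/buy_sell_business_result/blahblah/index.php?color=green",
    BASE + "/fees.php",  # outside the disallowed folder
]


def blocked(url, agent="ExampleBot"):  # "ExampleBot" is a made-up crawler name
    """Return True if the parsed robots.txt disallows this URL for the agent."""
    return not parser.can_fetch(agent, url)


for url in urls:
    print("blocked" if blocked(url) else "allowed", url)
```

The three URLs under the folder are reported as blocked and /fees.php as allowed, so the single Disallow line already covers every database-generated listing page.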

Jean-Luc
__________________
Checking redirects made easy | | Professional AWStats Services
#9
11-30-2008, 02:05 AM
Webnauts
WebProWorld 1,000+ Club
WebProWorld MVP
Join Date: Aug 2003
Location: Worldwide
Posts: 8,167
Re: Robots.txt Question

Quote:
Originally Posted by muckle.martin
Thanks for sharing. Is there any reference where these SEs say they support this non-standard syntax?
Google: Using a robots.txt file to control access to your site - Webmaster Help Center

Yahoo: Yahoo! Search Blog: Yahoo! Search Crawler (Yahoo! Slurp) - Supporting wildcards in robots.txt

MSN: Live Search Webmaster Center Blog : Robots Exclusion Protocol: Joining Together to Provide Better Documentation
#10
11-30-2008, 02:39 AM
Webnauts
WebProWorld 1,000+ Club
WebProWorld MVP
Join Date: Aug 2003
Location: Worldwide
Posts: 8,167
Re: Robots.txt Question

Quote:
Originally Posted by watto
I am only trying to disallow bots. I actually removed this URL from the SERPs using my GWT, but now they have re-indexed these pages. Not sure how this happened???
If you removed the URLs from the SERPs using your GWT, that means that at that point your robots.txt was edited and disallowing properly. Otherwise your GWT would have returned a "DENIED" message!!!

That means you screwed up your robots.txt, and not someone else. Just to make that clear here... (thanks for telling the truth after all).

Quote:
Originally Posted by watto
I did notice that in the URL removal section of GWT it now says "expired" next to it!

Does anyone know what this means?

watto
When you use the URL removal request tool to remove content from the Google index, your content is removed for a minimum of 90 days. However, you can reinclude your content at any time during the 90-day period. After that period, the button "Re-include" next to the removed URL changes to "Expired".

Last edited by Webnauts; 11-30-2008 at 02:53 AM.
#11
11-30-2008, 05:37 PM
watto
WebProWorld Veteran
Join Date: Jun 2004
Location: Australia
Posts: 531
Re: Robots.txt Question

webnauts, what do you mean by (thanks for telling the truth after all)?????? I don't recall telling any lies in this thread.

Also, how did I screw up my robots.txt file? All I did was re-include my category pages in GWT. My robots.txt file has stayed the same. If you take a look, you will see!

Why don't you actually have a look at my robots.txt and you tell me why all of the urls starting with http://www.business-trader.com.au/bu...siness_result/ were re-indexed? Visit site:business-trader.com.au and you will see.....

(Disallow: /buy_sell_business_result/) I added this code myself the other day hoping it will fix the problem.

Any suggestions?

watto

Last edited by watto; 11-30-2008 at 06:01 PM. Reason: Because I want to add more info?
#12
11-30-2008, 10:39 PM
Webnauts
WebProWorld 1,000+ Club
WebProWorld MVP
Join Date: Aug 2003
Location: Worldwide
Posts: 8,167
Re: Robots.txt Question

Quote:
Originally Posted by watto
webnauts, what do you mean by (thanks for telling the truth after all)?????? I don't recall telling any lies in this thread.
I am not talking about what you said in this thread. But since you want to play innocent, I think I must make something clear here, one last time and for good!

I have been a member here since 2003, and countless members are thankful for my free support. As many respected members here can confirm, almost none of my more than 7,500 posts were questions asking for help.

But my experience here motivated me to improve my skills, and as thanks I have always shared what I knew and learned from my many years of studying and experimenting.

Now! Since members here are not aware that you usually blame others when you screw up your site, and specifically those who try to help, I think it would be fair to the community to share some facts:

A while ago I was involved in a thread here about web site security using .htaccess. I am sure Wige, who is also in this thread, remembers that very well.

I posted a copy of the .htaccess I had created and shared it with the community here. You picked it up and added it to your own .htaccess file. At the same time, Google was having problems with its Webmaster Tools (which it still has), and you claimed here that the .htaccess rules I shared had screwed up your site. After some back-and-forth arguing, it turned out that my rules had not caused any damage; it was one of those endless glitches/bugs in Google Webmaster Tools.

After that, you asked whether I could provide my PageRank sculpting services. I gave you a quote, and you said you could not afford it and asked for a possible discount.

As an exception, I agreed to offer you a 50% discount, and I not only provided that service but did far more for you than I had to.

You were very happy with my service, and the site was a brilliant success (Incrediblehelp can confirm this, since I shared the success story with him). You then sent me a testimonial to add to my testimonials page, which I published immediately.

Here is the message you sent me:

Quote:
Hi John, below is my testimonial of you would like to use this for your site.

*I highly recommend using John and his team at SeoWorkers.com for all Search Engine Optimization work required. After analyzing our site, John put together a plan and implemented these techniques within 1 day of making payment. To our surprise we experienced fantastic results within 1 week! John's knowledge of SEO and attention to detail surpasses any other SEO 'guru' I have ever come across. *
You even added a link to my web site on your page at http://www.business-trader.com.au/fees.php, which you removed after this story! I am glad you did, though. And never do that again.

At some point you began messing around with the robots.txt and modified it, and then you contacted me asking for help because something had gone wrong. When I told you that I am not paid to monitor your site but could give you a quote, you got pissed off and sent me the following messages by email:

Quote:
I'll get someone to do it at Elance.com for peanuts. It only needs to be
basic, so you would be over qualified for this. :- )
Quote:
"Looks like you made a big mistake with my robots.txt. Don't worry John; your secret is safe with me. :-)
I'm just glad that I found it without having to pay you a ridiculous amount of money for you to fix your own mistake."
Quote:
Please remove my testimonial from your site SEO Workers Clients & Partners Testimonials
Quote:
Originally Posted by watto
Also, how did I screw up my robots.txt file? All I did was re-include my category pages in GWT. My robots.txt file has stayed the same. If you take a look, you will see!
I think I see what you screwed up. Most probably it is not a robots.txt issue, especially if you changed the parts you modified back to what I did. But do you think I will tell you what I found? Hell no. Everyone here would say that I am nuts if I did. No, Watto, sorry.

Quote:
Originally Posted by watto
Why don't you actually have a look at my robots.txt and you tell me why all of the urls starting with http://www.business-trader.com.au/bu...siness_result/ were re-indexed? Visit site:business-trader.com.au and you will see.....

(Disallow: /buy_sell_business_result/) I added this code myself the other day hoping it will fix the problem.

Any suggestions?

watto
I will not do that for free, or for cash either. I think that is obvious, isn't it?

I felt like sharing all this with our members here before they, too, get blamed when they try to help you out and you screw things up. I can probably take such BS once. Twice is far too much.

I would like to share with you the tip you shared with me:
Go to Elance.com. You can get help for peanuts.

Last edited by Webnauts; 11-30-2008 at 10:53 PM.
#13
12-01-2008, 01:17 AM
watto
WebProWorld Veteran
Join Date: Jun 2004
Location: Australia
Posts: 531
Re: Robots.txt Question

Wow John! Let's read your first sentence "I am not talking about what you said in this thread".....Um, ok then, sure.

Then what in the hell are you going on about? I'm not quite sure where all of this is coming from, but unless you have some suggestions in relation to my question, please stay out of this thread.

Take care!

watto
#14
12-01-2008, 01:33 AM
Webnauts
WebProWorld 1,000+ Club
WebProWorld MVP
Join Date: Aug 2003
Location: Worldwide
Posts: 8,167
Re: Robots.txt Question

Quote:
Originally Posted by watto
Wow John! Let's read your first sentence "I am not talking about what you said in this thread".....Um, ok then, sure.
Fine.

Quote:
Originally Posted by watto
Then what in the hell are you going on about? I'm not quite sure where all of this is coming from,
Should I post here the emails including the headers? Are you claiming that I am lying? Should I post here the entire mails including their headers as evidence? Give it up buddy.

Quote:
Originally Posted by watto
but unless you have some suggestions in relation to my question, please stay out of this thread.
I prefer to leave the thread.

Quote:
Originally Posted by watto
Take care! Thanks.

watto
I will.

John
#15
12-01-2008, 11:07 AM
crankydave
WebProWorld Moderator
Join Date: Aug 2004
Location: Playing with fire!
Posts: 4,243
Re: Robots.txt Question

watto and John...

If you'd like to continue your discussion, I'd appreciate it if you did so privately.

Thanx!

Dave
#16
12-01-2008, 01:46 PM
Webnauts
WebProWorld 1,000+ Club
WebProWorld MVP
Join Date: Aug 2003
Location: Worldwide
Posts: 8,167
Re: Robots.txt Question

Quote:
Originally Posted by crankydave
watto and John...

If you'd like to continue your discussion, I'd appreciate it if you did so privately.

Thanx!

Dave
You are right Dave, though I was done with this thread anyway.
#17
12-01-2008, 03:04 PM
weegillis
WebProWorld Moderator
Join Date: Oct 2003
Location: Alberta, Canada
Posts: 878
Re: Robots.txt Question

That is one serious robots.txt file. Wouldn't some of the more obvious entries be better handled in .htaccess or your server config? 7573 bytes... isn't this a little excessive?

My knowledge of robots.txt is limited, so if I appear ignorant here, so be it... Isn't the whole point of robots.txt one of disallowing robots from indexing certain directories or files? Where does ALLOW come into this?
#18
12-01-2008, 04:28 PM
wige
WebProWorld Moderator
Join Date: Jun 2006
Location: United States
Posts: 2,648
Re: Robots.txt Question

The default is to allow everything. The Allow directive acts as an override if you have disallowed something at a higher level. Suppose you want to disallow your entire /cgi-bin/ folder except for the file /cgi-bin/important-content.pl. This would be done with Allow.

User-Agent: *
Disallow: /cgi-bin/
Allow: /cgi-bin/important-content.pl

Although I do agree that 7573 bytes is hefty for a robots.txt file, putting directives in the .htaccess file will add to the server's load, as the server has to process each directive every time a page is requested. Compare that with a single download of the robots.txt file per day, per bot, and using the robots.txt file for certain exclusions may be preferable.
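How a crawler resolves an Allow that overlaps a Disallow, as in the /cgi-bin/ example above, is not identical everywhere. The sketch below assumes the precedence Google documents (the longest matching rule wins, and Allow wins a tie); some parsers instead apply the first matching rule in file order, Python's `urllib.robotparser` among them. The `is_allowed` helper and the extra sample paths are hypothetical, and wildcards are left out to keep it short.

```python
def is_allowed(rules, path):
    """Decide whether `path` may be crawled under a list of rules.

    `rules` is a list of (directive, prefix) pairs, where directive is
    "allow" or "disallow". Assumed precedence: the longest matching
    prefix wins, and "allow" wins when two matches have equal length.
    """
    allowed = True   # nothing matched yet: the default is to allow
    best_len = -1
    for directive, prefix in rules:
        if not path.startswith(prefix):
            continue
        is_allow = directive == "allow"
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len = len(prefix)
            allowed = is_allow
    return allowed


rules = [
    ("disallow", "/cgi-bin/"),
    ("allow", "/cgi-bin/important-content.pl"),
]

for path in ("/cgi-bin/important-content.pl", "/cgi-bin/other.pl", "/index.html"):
    print(path, "->", "allowed" if is_allowed(rules, path) else "blocked")
```

Under this precedence the order of the two lines in the file does not matter. A first-match parser would need the Allow line placed before the Disallow line for the exception to take effect.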
#19
12-02-2008, 12:34 AM
Webnauts
WebProWorld 1,000+ Club
WebProWorld MVP
Join Date: Aug 2003
Location: Worldwide
Posts: 8,167
Re: Robots.txt Question

Quote:
Originally Posted by weegillis
That is one serious robots.txt file. Wouldn't some of the more obvious entries be better handled in .htaccess or your server config? 7573 bytes... isn't this a little excessive?

My knowledge of robots.txt is limited, so if I appear ignorant here, so be it... Isn't the whole point of robots.txt one of disallowing robots from indexing certain directories or files? Where does ALLOW come into this?
I am creating and optimizing a CubeCart-based Greek online store. How else could I write the rules below and still get the same results?

User-agent: Googlebot
Disallow: /*?
Noindex: /*?
Disallow: /*/*?
Noindex: /*/*?
Disallow: /*i-i*
Noindex: /*i-i*
Disallow: /*-i-i*
Noindex: /*-i-i*
Disallow: /*%23content-skip
Noindex: /*%23login-skip
Allow: /about/info_1.html
Allow: /contact/info_2.html
Allow: /sitemap/info_5.html
Allow: /affiliates/info_6.html
Disallow: /*/info_*.html
Noindex: /*/info_*.html
Allow: /*/cat_*.html
Allow: /*/*/cat_*.html
Disallow: /*/*/*/prod_*.html
Noindex: /*/*/*/prod_*.html
Disallow: /tellafriend/tell_*.html
Noindex: /tellafriend/tell_*.html
Disallow: /discl.htm
Noindex: /discl.htm
Disallow: /dialogue.htm
Noindex: /dialogue.htm
Disallow: /pchase_verify.htm
Noindex: /pchase_verify.htm
Disallow: /orderform.htm
Noindex: /orderform.htm
Disallow: /search.htm
Noindex: /search.htm

Please note that the "nofollow" attribute is not used anywhere on the site, and I definitely don't want to use it.

Note: I have only included here the rules I am using for Googlebot.
#20
12-02-2008, 12:58 AM
weegillis
WebProWorld Moderator
Join Date: Oct 2003
Location: Alberta, Canada
Posts: 878
Re: Robots.txt Question

May I take it that you're asking Wige, John? You can't be asking me...

Question, though, concerning the Noindex: directive. Isn't it redundant if the directory is already disallowed? And another, if you will bear with my naivete, didn't I just read that ALLOW would follow DISALLOW? What am I missing here?

Good point about server load, Wige. I'll keep that in mind, from now on.
#21
12-02-2008, 05:59 AM
Webnauts
WebProWorld 1,000+ Club
WebProWorld MVP
Join Date: Aug 2003
Location: Worldwide
Posts: 8,167
Re: Robots.txt Question

Quote:
Originally Posted by weegillis
Question, though, concerning the Noindex: directive. Isn't it redundant if the directory is already disallowed?
Believe it or not, I removed URLs indexed by Google on some sites of mine with their removal tool. The Disallow rule was there the whole time, and suddenly Google indexed them again. Scary, but true.

Quote:
Originally Posted by weegillis
And another, if you will bear with my naivete, didn't I just read that ALLOW would follow DISALLOW? What am I missing here?
It is not your naivete; you are right. I was not finished with the robots.txt yet, and I noticed that myself while editing the last rules that were missing, just as you posted it. Thanks though, because if I had considered it finished, it would not have done what I wanted.
#22
12-02-2008, 06:03 AM
Webnauts
WebProWorld 1,000+ Club
WebProWorld MVP
Join Date: Aug 2003
Location: Worldwide
Posts: 8,167
Re: Robots.txt Question

Quote:
Originally Posted by watto
I am trying to block pages at my site which are generated via a database. Example:

http://www.business-trader .com.au/buy_sell_business_result/23157/Delatite-Apartments-Merrijig-Timbertop.html

http://www.business-trader .com.au/buy_sell_business_result/23135/Contours-Woodville-.html

Will adding this code to my robots.txt work??

Disallow: /*buy_sell_business_result

Regards

watto
Change this part of your robots.txt

Code:
Allow: /buy_sell_business_search/australia/South%20Australia/index.html     
Allow: /buy_sell_business_search/australia/Victoria/index.html     
Allow: /buy_sell_business_search/australia/New%20South%20Wales/index.htm
Allow: /buy_sell_business_search/australia/Queens%20Land/index.html 
Allow: /buy_sell_business_search/0/australia/Western%20Australia/index.html 
Allow: /buy_sell_business_search/australia/Northern%20Territory/index.html
Allow: /buy_sell_business_search/australia/Tasmania/index.html 
Allow: /buy_sell_business_search/Cafes/australia/0/index.html     
Allow: /buy_sell_business_search/Hotels/australia/0/index.html     
Allow: /buy_sell_business_search/Pubs/australia/0/index.html     
Allow: /buy_sell_business_search/Restaurant/australia/0/index.html     
Allow:/buy_sell_business_search/Retail General/australia/0/index.html     
Allow: /buy_sell_business_search/Tourism/australia/0/index.html
to this

Code:
Disallow: /buy_sell_business_search/
Allow: /buy_sell_business_search/australia/South%20Australia/index.html     
Allow: /buy_sell_business_search/australia/Victoria/index.html     
Allow: /buy_sell_business_search/australia/New%20South%20Wales/index.htm
Allow: /buy_sell_business_search/australia/Queens%20Land/index.html 
Allow: /buy_sell_business_search/0/australia/Western%20Australia/index.html 
Allow: /buy_sell_business_search/australia/Northern%20Territory/index.html
Allow: /buy_sell_business_search/australia/Tasmania/index.html 
Allow: /buy_sell_business_search/Cafes/australia/0/index.html     
Allow: /buy_sell_business_search/Hotels/australia/0/index.html     
Allow: /buy_sell_business_search/Pubs/australia/0/index.html     
Allow: /buy_sell_business_search/Restaurant/australia/0/index.html     
Allow: /buy_sell_business_search/Retail%20General/australia/0/index.html
Allow: /buy_sell_business_search/Tourism/australia/0/index.html
and add an HTML file named index.html to the http://www.business-trader .com.au/buy_sell_business_result/ directory.

Let me know if that worked.

Last edited by Webnauts; 12-02-2008 at 06:11 AM.
#23
12-02-2008, 06:19 AM
Webnauts
WebProWorld 1,000+ Club
WebProWorld MVP
Join Date: Aug 2003
Location: Worldwide
Posts: 8,167
Re: Robots.txt Question

Quote:
Originally Posted by watto
webnauts, what do you mean by (thanks for telling the truth after all)?????? I don't recall telling any lies in this thread.

Also, how did I screw up my robots.txt file? All I did was re-include my category pages in GWT. My robots.txt file has stayed the same. If you take a look, you will see!

Why don't you actually have a look at my robots.txt and you tell me why all of the urls starting with http://www.business-trader.com.au/bu...siness_result/ were re-indexed? Visit site:business-trader.com.au and you will see.....

(Disallow: /buy_sell_business_result/) I added this code myself the other day hoping it will fix the problem.

Any suggestions?

watto
If I understood correctly, you want to remove that entire directory, and you had no success with that rule. Is that right?
If so, add an HTML file named index.html to that directory.

Then add the following rules in your robots.txt:
Disallow: /buy_sell_business_result/
Noindex: /buy_sell_business_result/

I know many will say that the Noindex is not necessary, but as I mentioned in a previous post, in my experience Google misbehaves.

Let me know if that worked.
#24
12-02-2008, 12:02 PM
wige
WebProWorld Moderator
Join Date: Jun 2006
Location: United States
Posts: 2,648
Re: Robots.txt Question

Noindex seems to be one of those quirky things. It is in the specification for robots.txt files, or at least it was in one of the early specifications, but it is poorly supported. If you ask Google how it is handled, the usual answer is "There's a noindex statement? I didn't know that..."

Although Google shows that they process the noindex directive as though it were a Disallow, I would be somewhat conservative with its use since there is no way to know how other bots may react. You may want to put this directive toward the bottom of the robots.txt, just in case a spider hangs on it.
#25
12-02-2008, 12:36 PM
wige
WebProWorld Moderator
Join Date: Jun 2006
Location: United States
Posts: 2,648
Re: Robots.txt Question

John, looking at the robots.txt you asked about...

Wouldn't these four lines be redundant? The * should match a slash I believe, so if it matches /*/*? it should also match /*?...
Disallow: /*?
Noindex: /*?
Disallow: /*/*?
Noindex: /*/*?

Again, same idea for these, if it matches the pattern /*-i-i*, it must match the previous and less specific /*i-i* rule.
Disallow: /*i-i*
Noindex: /*i-i*
Disallow: /*-i-i*
Noindex: /*-i-i*

Also, should these be different?
Disallow: /*%23content-skip
Noindex: /*%23login-skip
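The redundancy argument can be spot-checked with a few lines of Python. This sketch assumes `*` matches any run of characters, slashes included, and uses made-up sample paths. Passing on a handful of samples is a sanity check, not a proof.

```python
import re


def to_regex(pattern):
    """Compile a robots.txt pattern, treating '*' as 'any characters'."""
    return re.compile(".*".join(re.escape(part) for part in pattern.split("*")))


def covered_by(samples, general, specific):
    """True if every sample matched by `specific` is also matched by `general`."""
    general_re, specific_re = to_regex(general), to_regex(specific)
    return all(general_re.match(path) is not None
               for path in samples if specific_re.match(path))


# Hypothetical paths, chosen to exercise each pattern.
samples = [
    "/index.php?act=viewCat",
    "/shop/index.php?page=2",
    "/a-i-i-b.html",
    "/xi-iy.html",
    "/about/info_1.html",
]

print(covered_by(samples, "/*?", "/*/*?"))       # is /*/*? redundant?
print(covered_by(samples, "/*i-i*", "/*-i-i*"))  # is /*-i-i* redundant?
print(covered_by(samples, "/*/*?", "/*?"))       # the reverse direction
```

The first two checks print True and the third prints False: on these samples the longer patterns add nothing beyond the shorter ones, while /*? still catches a path that /*/*? misses.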
__________________
The best way to learn anything, is to question everything.
Reply With Quote
  #26 (permalink)  
Old 12-02-2008, 09:11 PM
Webnauts's Avatar
WebProWorld 1,000+ Club
WebProWorld MVP
 
Join Date: Aug 2003
Location: Worldwide
Posts: 8,167
Webnauts RepRank 9
Default Re: Robots.txt Question

Quote:
Originally Posted by wige View Post
Noindex seems to be one of those quirky things. It is in the specification for robots.txt files, or at least it was in one of the early specifications, but it is poorly supported. If you ask Google how it is handled, the usual answer is "There's a noindex statement? I didn't know that..."
Google supports the "noindex".

Quote:
Originally Posted by wige View Post
Although Google shows that they process the noindex directive as though it were a Disallow, I would be somewhat conservative with it's use since there is no way to know how other bots may react. You may want to put this directive toward the bottom of the robots.txt, just in case a spider hangs on it.
I never had any problems using noindex. I only had positive experiences.

I would like to show you the entire robots.txt I have so far:
http://gameshop.seoworkers.com/robots.txt


Any room for improvement?
__________________
"Being an expert isn't telling other people what you know. It's understanding what questions to ask, and flexibly applying your knowledge to the specific situation at hand. Being an expert means providing sensible, highly contextual direction." Jeff Atwood
SEO Workers - Search Engine Optimization Consulting Company | SEO Analysis Tool | Webnauts Net SEO

Last edited by Webnauts; 12-02-2008 at 09:15 PM.
Reply With Quote
  #27 (permalink)  
Old 12-02-2008, 09:20 PM
kgun's Avatar
WebProWorld 1,000+ Club
WebProWorld MVP
 
Join Date: May 2005
Location: Norway
Posts: 5,684
kgun RepRank 9
Default Re: Robots.txt Question

Quote:
Originally Posted by Webnauts View Post
I would like to show you the entire robots.txt I have so far:
http://gameshop.seoworkers.com/robots.txt
John, a fairly impressive file. You block bad bots in .htaccess?
Reply With Quote
  #28 (permalink)  
Old 12-02-2008, 09:43 PM
Webnauts's Avatar
WebProWorld 1,000+ Club
WebProWorld MVP
 
Join Date: Aug 2003
Location: Worldwide
Posts: 8,167
Webnauts RepRank 9
Default Re: Robots.txt Question

Quote:
Originally Posted by kgun View Post
You block bad bots in .htaccess?
Not in the .htaccess file on this testing environment. But when the site goes live on the client's server, yes.
Reply With Quote
  #29 (permalink)  
Old 12-02-2008, 10:06 PM
Webnauts's Avatar
WebProWorld 1,000+ Club
WebProWorld MVP
 
Join Date: Aug 2003
Location: Worldwide
Posts: 8,167
Webnauts RepRank 9
Default Re: Robots.txt Question

Let's look at a scenario that is common for me.

You have a page that you did not want to show up in Google's search results.
When you noticed it there, you wanted to exclude it from their search results, so you disallowed it in the robots.txt.

You were not aware that you could use the noindex meta tag and/or you did not know that Google has a removal tool.

Do you believe that Google will remove that page when they crawl your site and find that page?

Definitely not!

What I have noticed, having already sculpted PR for many sites, is that when they see that disallow directive, they do not crawl the page, and they also don't remove it.

When I add the noindex directive, and they crawl that page, they remove it from their index.

What more do I need to see, to be able to tell how Google deals with the noindex directive?
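
The disallow half of that scenario can be sketched roughly as follows. The sketch assumes a well-behaved crawler that only reads a page's meta tags after robots.txt permits fetching it. ExampleBot, the example.com URL, the helper names and the page content are all hypothetical, and this is not a description of Google's actual pipeline.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

class MetaRobotsParser(HTMLParser):
    """Collects the directives of any <meta name="robots" content="..."> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            self.directives.update(d.strip().lower() for d in content.split(","))

def crawler_sees_noindex(robots_txt, url, html, agent="ExampleBot"):
    """True only if the page may be fetched and carries a meta noindex."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    if not rules.can_fetch(agent, url):
        return False  # the page is never fetched, so the tag is never read
    meta = MetaRobotsParser()
    meta.feed(html)
    return "noindex" in meta.directives

PAGE = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
URL = "http://example.com/old-page.html"

seen_when_disallowed = crawler_sees_noindex(
    "User-agent: *\nDisallow: /old-page.html\n", URL, PAGE)
seen_when_crawlable = crawler_sees_noindex(
    "User-agent: *\nDisallow: /tmp/\n", URL, PAGE)

print(seen_when_disallowed, seen_when_crawlable)
```

In this sketch a disallowed page keeps its noindex tag hidden from the crawler, which fits the observation that a Disallow on its own does not get an already-listed page removed.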

Last edited by Webnauts; 12-02-2008 at 10:10 PM.
Reply With Quote
  #30 (permalink)  
Old 12-07-2008, 09:56 PM
full house's Avatar
WebProWorld Veteran
 
Join Date: Sep 2007
Posts: 522
full house RepRank 2
Default Re: Robots.txt Question

Is there a robots.txt for increasing the PR?
Reply With Quote
  #31 (permalink)  
Old 12-07-2008, 11:05 PM
Moderator
WebProWorld Moderator
 
Join Date: Oct 2003
Location: Alberta, Canada
Posts: 878
weegillis RepRank 6
Default Re: Robots.txt Question

Would you have us increase PR (a proprietary aspect) by the restrictions we place on robots? Well, if you want maximum PR why not just forbid everyone except G?
Reply With Quote
  #32 (permalink)  
Old 12-08-2008, 09:19 PM
full house's Avatar
WebProWorld Veteran
 
Join Date: Sep 2007
Posts: 522
full house RepRank 2
Default Re: Robots.txt Question

How exactly do you forbid everyone except Google? You also need the help of robots other than Googlebot.
Reply With Quote
  #33 (permalink)  
Old 12-08-2008, 10:08 PM
Moderator
WebProWorld Moderator
 
Join Date: Oct 2003
Location: Alberta, Canada
Posts: 878
weegillis RepRank 6
Default Re: Robots.txt Question

PR is Google only, as far as I know. We're really only speaking hypothetically...
Reply With Quote
  #34 (permalink)  
Old 12-11-2008, 01:09 PM
Webnauts's Avatar
WebProWorld 1,000+ Club
WebProWorld MVP
 
Join Date: Aug 2003
Location: Worldwide
Posts: 8,167
Webnauts RepRank 9
Default Re: Robots.txt Question

Quote:
Originally Posted by full house View Post
How exactly do you forbid everyone except Google?
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
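
A quick way to sanity-check those rules is to run them through Python's standard-library urllib.robotparser. This is only a sketch: SomeOtherBot and the example.com URL are made up, and real crawlers may interpret robots.txt somewhat differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The rules from the post above, verbatim.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "http://example.com/page.html"
googlebot_ok = parser.can_fetch("Googlebot", url)    # matched by the first group
other_ok = parser.can_fetch("SomeOtherBot", url)     # falls through to the * group

print("Googlebot:", googlebot_ok)
print("SomeOtherBot:", other_ok)
```

Keep in mind that robots.txt is advisory, so this only keeps out bots that choose to obey it.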

Reply With Quote