| Search Engine Optimization Forum SEO is much easier with help from peers and experts! The WebProWorld SEO forum is for the discussion and exploration of various search engine optimization topics. Any non (engine) specific SEO or SEM topics should go here. |
I am trying to block pages on my site which are generated from a database. Example:
http://www.business-trader .com.au/buy_sell_business_result/23157/Delatite-Apartments-Merrijig-Timbertop.html
http://www.business-trader .com.au/buy_sell_business_result/23135/Contours-Woodville-.html
Will adding this code to my robots.txt work?
Disallow: /*buy_sell_business_result
Regards
watto |
|
|||
|
Thanks wige!
watto |
|
|||
|
You can also disable this for a particular crawler by using:
User-agent: <botname>
Disallow: /buy_sell_business_result/
I normally stick to User-agent: *
Do you want to disable this only for bots, or also for people browsing the website?
Please note that Disallow: /buy_sell_business_result/* may not work, because the trailing wildcard is not part of the standard. |
|
||||
|
That is not a standard, but Google, Yahoo and MSN support it.
__________________
"Being an expert isn't telling other people what you know. It's understanding what questions to ask, and flexibly applying your knowledge to the specific situation at hand. Being an expert means providing sensible, highly contextual direction." Jeff Atwood SEO Workers - Search Engine Optimization Consulting Company | SEO Analysis Tool | Webnauts Net SEO |
|
|||
|
I am only trying to disallow bots. I actually removed this URL from the SERPs using my GWT, but now they have re-indexed these pages. Not sure how this happened. I did notice that in the URL removal section of GWT it now says "expired" next to it!
Does anyone know what this means? watto |
|
|||
|
Thanks for sharing. Any reference where these SEs say they support this non standard syntax?
|
|
|||
|
This line will disallow access to any URL starting with /buy_sell_business_result/. In other words, all the following URLs will be disallowed:
- http://www.business-trader.com.au/buy_sell_business_result/
- http://www.business-trader.com.au/buy_sell_business_result/23135/Contours-Woodville-.html
- http://www.business-trader.com.au/buy_sell_business_result/blahblah/index.php?color=green
Not sure that this is what you want.
Jean-Luc |
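To illustrate the prefix matching described above, here is a minimal Python sketch. It assumes plain prefix matching with no wildcard support, as in the original robots.txt convention; the helper name `is_disallowed` and the extra `/buy_sell_business_search/` URL are made up for this example.

```python
from urllib.parse import urlsplit

def is_disallowed(url: str, disallow_prefix: str) -> bool:
    """Return True if the URL's path (plus query) starts with the Disallow prefix."""
    parts = urlsplit(url)
    path = parts.path or "/"
    # Rules are compared against the path and query string, not the host.
    if parts.query:
        path += "?" + parts.query
    return path.startswith(disallow_prefix)

RULE = "/buy_sell_business_result/"

urls = [
    "http://www.business-trader.com.au/buy_sell_business_result/",
    "http://www.business-trader.com.au/buy_sell_business_result/23135/Contours-Woodville-.html",
    "http://www.business-trader.com.au/buy_sell_business_result/blahblah/index.php?color=green",
    "http://www.business-trader.com.au/buy_sell_business_search/index.html",  # hypothetical URL
]

for url in urls:
    print(is_disallowed(url, RULE), url)
```

Only the last URL falls outside the prefix, so it is the only one a compliant crawler would still be free to fetch.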
|
||||
|
Quote:
Yahoo: Yahoo! Search Blog: Yahoo! Search Crawler (Yahoo! Slurp) - Supporting wildcards in robots.txt
MSN: Live Search Webmaster Center Blog: Robots Exclusion Protocol: Joining Together to Provide Better Documentation
|
||||
|
Quote:
That means, you screwed up your robots.txt, and not someone else. Just to make that clear here... (thanks for telling the truth after all). When you use the URL removal request tool to remove content from the Google index, your content is removed for a minimum of 90 days. However, you can reinclude your content at any time during the 90-day period. After that period, the button "Re-include" next to the removed URL changes to "Expired".
Last edited by Webnauts; 11-30-2008 at 02:53 AM. |
|
|||
|
webnauts, what do you mean by (thanks for telling the truth after all)?????? I don't recall telling any lies in this thread.
Also, how did I screw up my robots.txt file? All I did was re-include my category pages in GWT. My robots.txt file has stayed the same. If you take a look, you will see! Why don't you actually have a look at my robots.txt and you tell me why all of the urls starting with http://www.business-trader.com.au/bu...siness_result/ were re-indexed? Visit site:business-trader.com.au and you will see..... (Disallow: /buy_sell_business_result/) I added this code myself the other day hoping it will fix the problem. Any suggestions? watto Last edited by watto; 11-30-2008 at 06:01 PM. Reason: Because I want to add more info? |
|
|||||||
|
Quote:
I have been a member here since 2003, and countless members are thankful for my free support. As many respected members here can confirm, almost none of my more than 7500 posts were requests for help. The experience I gained here motivated me to improve my skills, and in return I have always shared what I knew and learned from many years of studying and experimenting.
Now! Since members here are not aware that you usually blame others when you screw up your site, and specifically those who try to help, I think it is fair to the community to share some facts:
A while ago I was involved in a thread here about web site security using .htaccess. I am sure Wige, who is also in this thread, remembers it very well. I posted a copy of the .htaccess I had created and shared it with the community. You picked it up and added it to your .htaccess file. At the same time, Google had problems with their Webmaster Tools, which they still do, and you claimed here that the .htaccess rules I shared had screwed up your site. After some back-and-forth arguing, it turned out that my rules had caused no damage; rather, it was one of those endless glitches/bugs in Google Webmaster Tools.
After that, you asked me whether I could provide my PageRank sculpting services. I gave you a quote, and you said you couldn't afford it and asked for a discount. As an exception I agreed to a 50% discount, and I not only provided that service but did far more than I had to. You were very happy with my service and the site was a brilliant success (Incrediblehelp can confirm, since I shared this success story with him). Then you sent me a testimonial for my testimonial page, which I published immediately. Here is the message you sent me: Quote:
At some point you began messing around with the robots.txt, which you modified, and then you contacted me asking for help because something had gone wrong. When I told you that I am not paid to monitor your site but could give you a quote, you got pissed off and sent me the following messages by email: Quote:
Quote:
Quote:
Quote:
Quote:
I felt like sharing all this with our members here before they, too, get blamed when they try to help you out and you screw things up. Once I can probably take such BS. Twice is far too much. I would like to share with you the tip you shared with me: Go to Elance.com. You can get help for peanuts.
Last edited by Webnauts; 11-30-2008 at 10:53 PM. |
|
|||
|
Wow John! Let's read your first sentence: "I am not talking about what you said in this thread"... Um, OK then, sure.
Then what in the hell are you going on about? I'm not quite sure where all of this is coming from, but unless you have some suggestions in relation to my question, please stay out of this thread. Take care! watto |
|
||||
|
Quote:
Quote:
Quote:
I will. John
|
||||
|
You are right Dave, though I was done with this thread anyway.
|
|||
|
That is one serious robots.txt file. Wouldn't some of the more obvious entries be better handled in .htaccess or your server config? 7573 bytes... isn't this a little excessive?
My knowledge of robots.txt is limited, so if I appear ignorant here, so be it... Isn't the whole point of robots.txt to disallow robots from indexing certain directories or files? Where does ALLOW come into this? |
|
||||
|
The default is to allow everything. The Allow directive acts as an override if you have disallowed something at a higher level. Suppose you want to disallow your entire /cgi-bin/ folder except for the file /cgi-bin/important-content.pl. This would be done with Allow.
User-Agent: *
Disallow: /cgi-bin/
Allow: /cgi-bin/important-content.pl
Although I do agree that 7573 bytes is hefty for a robots.txt file, putting directives in the .htaccess file will add to the server's load, as it has to process each directive each time a page is requested. Compare that with a single download of the robots.txt file per day, per bot, and using the robots.txt file for certain exclusions may be preferable.
__________________
The best way to learn anything, is to question everything. |
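The Allow override from the /cgi-bin/ example can be sketched in Python. This is only an illustration and assumes the "most specific (longest) matching rule wins, Allow wins a tie" precedence that Google documents and that RFC 9309 later wrote down; some crawlers instead apply the first matching line, in which case the Allow line would have to come first. The name `is_allowed` and the rule-list layout are hypothetical.

```python
# Rules from the example, as (kind, path-prefix) pairs.
RULES = [
    ("disallow", "/cgi-bin/"),
    ("allow", "/cgi-bin/important-content.pl"),
]

def is_allowed(path, rules=RULES):
    """Decide access using longest-prefix precedence (an assumption, see above)."""
    # Keep every rule whose prefix matches, as (prefix length, is_allow).
    matches = [(len(prefix), kind == "allow")
               for kind, prefix in rules
               if path.startswith(prefix)]
    if not matches:
        return True  # nothing matched: the default is to allow
    # max() compares length first; on equal length True (allow) beats False.
    return max(matches)[1]

for path in ("/cgi-bin/important-content.pl", "/cgi-bin/other.pl", "/index.html"):
    print(path, is_allowed(path))
```

Under that assumption the order of the two lines in the file does not matter, because the longer Allow path always outranks the shorter Disallow path.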
|
||||
|
Quote:
User-agent: Googlebot
Disallow: /*?
Noindex: /*?
Disallow: /*/*?
Noindex: /*/*?
Disallow: /*i-i*
Noindex: /*i-i*
Disallow: /*-i-i*
Noindex: /*-i-i*
Disallow: /*%23content-skip
Noindex: /*%23login-skip
Allow: /about/info_1.html
Allow: /contact/info_2.html
Allow: /sitemap/info_5.html
Allow: /affiliates/info_6.html
Disallow: /*/info_*.html
Noindex: /*/info_*.html
Allow: /*/cat_*.html
Allow: /*/*/cat_*.html
Disallow: /*/*/*/prod_*.html
Noindex: /*/*/*/prod_*.html
Disallow: /tellafriend/tell_*.html
Noindex: /tellafriend/tell_*.html
Disallow: /discl.htm
Noindex: /discl.htm
Disallow: /dialogue.htm
Noindex: /dialogue.htm
Disallow: /pchase_verify.htm
Noindex: /pchase_verify.htm
Disallow: /orderform.htm
Noindex: /orderform.htm
Disallow: /search.htm
Noindex: /search.htm
Please notice that the "nofollow" attribute is not used sitewide at all, and I definitely don't want to use it. Notice: I have only included here the rules I am using for Googlebot.
|
|||
|
May I take it that you're asking Wige, John? You can't be asking me...
Question, though, concerning the Noindex: directive. Isn't it redundant if the directory is already disallowed? And another, if you will bear with my naivete, didn't I just read that ALLOW would follow DISALLOW? What am I missing here? Good point about server load, Wige. I'll keep that in mind, from now on. |
|
||||
|
Quote:
Quote:
|
||||
|
Quote:
Code:
Allow: /buy_sell_business_search/australia/South%20Australia/index.html
Allow: /buy_sell_business_search/australia/Victoria/index.html
Allow: /buy_sell_business_search/australia/New%20South%20Wales/index.htm
Allow: /buy_sell_business_search/australia/Queens%20Land/index.html
Allow: /buy_sell_business_search/0/australia/Western%20Australia/index.html
Allow: /buy_sell_business_search/australia/Northern%20Territory/index.html
Allow: /buy_sell_business_search/australia/Tasmania/index.html
Allow: /buy_sell_business_search/Cafes/australia/0/index.html
Allow: /buy_sell_business_search/Hotels/australia/0/index.html
Allow: /buy_sell_business_search/Pubs/australia/0/index.html
Allow: /buy_sell_business_search/Restaurant/australia/0/index.html
Allow:/buy_sell_business_search/Retail General/australia/0/index.html
Allow: /buy_sell_business_search/Tourism/australia/0/index.html
Code:
Disallow: /buy_sell_business_search/
Allow: /buy_sell_business_search/australia/South%20Australia/index.html
Allow: /buy_sell_business_search/australia/Victoria/index.html
Allow: /buy_sell_business_search/australia/New%20South%20Wales/index.htm
Allow: /buy_sell_business_search/australia/Queens%20Land/index.html
Allow: /buy_sell_business_search/0/australia/Western%20Australia/index.html
Allow: /buy_sell_business_search/australia/Northern%20Territory/index.html
Allow: /buy_sell_business_search/australia/Tasmania/index.html
Allow: /buy_sell_business_search/Cafes/australia/0/index.html
Allow: /buy_sell_business_search/Hotels/australia/0/index.html
Allow: /buy_sell_business_search/Pubs/australia/0/index.html
Allow: /buy_sell_business_search/Restaurant/australia/0/index.html
Allow:/buy_sell_business_search/Retail General/australia/0/index.html
Allow: /buy_sell_business_search/Tourism/australia/0/index.html
Let me know if that worked.
Last edited by Webnauts; 12-02-2008 at 06:11 AM. |
|
||||
|
Quote:
If yes, add an HTML file in that directory and name it index.html
Then add the following rules in your robots.txt:
Disallow: /buy_sell_business_result/
Noindex: /buy_sell_business_result/
I know many will say that the Noindex is not necessary, but as I mentioned in a previous post, I have experienced Google misbehaving. Let me know if that worked.
|
||||
|
Noindex seems to be one of those quirky things. It is in the specification for robots.txt files, or at least it was in one of the early specifications, but it is poorly supported. If you ask Google how it is handled, the usual answer is "There's a noindex statement? I didn't know that..."
Although Google shows that they process the noindex directive as though it were a Disallow, I would be somewhat conservative with its use, since there is no way to know how other bots may react. You may want to put this directive toward the bottom of the robots.txt, just in case a spider hangs on it.
|
||||
|
John, looking at the robots.txt you asked about...
Wouldn't these four lines be redundant? The * should match a slash, I believe, so if a URL matches /*/*? it should also match /*?...
Disallow: /*?
Noindex: /*?
Disallow: /*/*?
Noindex: /*/*?
Again, same idea for these: if a URL matches the pattern /*-i-i*, it must also match the previous and less specific /*i-i* rule.
Disallow: /*i-i*
Noindex: /*i-i*
Disallow: /*-i-i*
Noindex: /*-i-i*
Also, should these be different?
Disallow: /*%23content-skip
Noindex: /*%23login-skip
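One way to check the redundancy question is to translate the wildcard patterns into regular expressions and try a few paths. The Python sketch below assumes the wildcard semantics that Google and Yahoo describe: `*` matches any run of characters (including `/`), a trailing `$` anchors the end of the URL, every other character (including `?`) is literal, and matching starts at the beginning of the path. `pattern_to_regex` and `matches` are made-up helpers, and the sample paths are invented.

```python
import re

def pattern_to_regex(pattern):
    """Translate a robots.txt wildcard pattern into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with '.*' for every '*'.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def matches(pattern, path):
    # re.match anchors at the start of the path, like a robots.txt rule.
    return pattern_to_regex(pattern).match(path) is not None

samples = ["/item?id=3", "/shop/item?id=3", "/xi-iy.html", "/a-i-i-b.html"]
for rule in ("/*?", "/*/*?", "/*i-i*", "/*-i-i*"):
    print(rule, [p for p in samples if matches(rule, p)])
```

With these samples, every path caught by `/*/*?` is also caught by `/*?`, and every path caught by `/*-i-i*` is also caught by `/*i-i*`, which is consistent with the narrower rules being redundant under the stated assumptions.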
|
||||
|
Quote:
Quote:
I would like to show you the entire robots.txt I have so far: http://gameshop.seoworkers.com/robots.txt Any space for improvement?
Last edited by Webnauts; 12-02-2008 at 09:15 PM. |
|
||||
|
Quote:
|
|
||||
|
Not in the .htaccess file on this testing environment. But when the site goes live on the client's server, yes.
|
||||
|
Let's look at a scenario that is common for me.
You have a page that you do not want to show up in Google's search results. When you noticed it there, you wanted to exclude it, so you disallowed it in robots.txt. You were not aware that you could use the noindex meta tag, and/or you did not know that Google has a removal tool. Do you believe that Google will remove that page the next time they crawl your site and find it? Definitely not!
What I have noticed, having sculpted PR for many sites already, is that when they see the Disallow directive they do not crawl the page, and they also don't remove it. When I add the Noindex directive and they crawl that page, they remove it from their index. What more do I need to see to be able to tell how Google deals with the Noindex directive?
Last edited by Webnauts; 12-02-2008 at 10:10 PM. |
|
||||
|
Is there a robots.txt for increasing PR?
__________________
Hawaii Events|Oahu Events|Honolulu Events |led signs|outdoor led sign |
|
||||
|
How exactly do you forbid everyone except Google? You also need the help of robots other than Googlebot.
|
||||
|
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
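These rules can be sanity-checked offline with Python's standard `urllib.robotparser` module. Treat it as a rough check only: this parser does plain prefix matching and is not guaranteed to behave exactly like any real crawler. The bot name `SomeOtherBot` and the example.com URL are placeholders.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
# parse() takes an iterable of lines, so no network access is needed.
parser.parse(ROBOTS_TXT.splitlines())

url = "http://example.com/page.html"  # placeholder URL
googlebot_ok = parser.can_fetch("Googlebot", url)
other_ok = parser.can_fetch("SomeOtherBot", url)  # placeholder bot name

print("Googlebot allowed:", googlebot_ok)
print("SomeOtherBot allowed:", other_ok)
```

Keep in mind that robots.txt is advisory: it only affects crawlers that choose to honor it.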