Best Robots.txt wordpress
innominds
07-17-2011, 08:52 AM
I recently set up a WordPress blog at www.domain.com/blog/.
With the recent changes to Google's algorithm, I'd like to take concrete steps to make it more SEO-friendly by removing low-quality pages such as tag archives.
In other words, I want to stop tag pages and other low-quality pages, which dilute the value of the main blog posts, from being indexed.
For this, I need to write a good robots.txt.
Could anyone help me put together the best robots.txt for my WordPress blog at www.domain.com/blog/?
Thanks in advance
mjtaylor
07-17-2011, 09:10 AM
There is a basic instruction page and an automated tool from Google here: http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=40360.
innominds
07-18-2011, 10:48 AM
Hey! Thanks for the reference. I've gone through it.
After some research, I've found this robots.txt:
====================
User-agent: Googlebot
Disallow: /wp-content/
Disallow: /trackback/
Disallow: /wp-admin/
Disallow: /feed/
Disallow: /archives/
Disallow: /sitemap.xml
Disallow: /index.php
Disallow: /*?
Disallow: /*.php$
Disallow: /*.js$
Disallow: /*.inc$
Disallow: /*.css$
Disallow: */feed/
Disallow: */trackback/
Disallow: /page/
Disallow: /tag/
Disallow: /category/
User-agent: Googlebot-Image
Disallow: /wp-includes/
User-agent: Mediapartners-Google*
Disallow:
User-agent: ia_archiver
Disallow: /
User-agent: duggmirror
Disallow: /
========================
As I said above, I'm installing the blog at www.domain.com/blog/, so do I need to change the robots.txt above?
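Crawlers only request robots.txt from the root of the host (http://www.domain.com/robots.txt), never from /blog/robots.txt, so the file has to sit at the root and every Disallow path is read from the root as well. A rule like Disallow: /tag/ therefore does not cover tag archives at /blog/tag/. A quick check with Python's standard urllib.robotparser, assuming the tag archives live under /blog/tag/ (the helper blocked and the example URLs are hypothetical):

```python
from urllib.robotparser import RobotFileParser


def blocked(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt text disallows url for agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


# Rule written as if WordPress lived at the site root.
root_rules = """\
User-agent: *
Disallow: /tag/
"""

# The same rule with the /blog/ prefix added.
blog_rules = """\
User-agent: *
Disallow: /blog/tag/
"""

url = "http://www.domain.com/blog/tag/seo/"
print(blocked(root_rules, url))  # False: /tag/ is not a prefix of /blog/tag/seo/
print(blocked(blog_rules, url))  # True
```

The same applies to every other line in the file: /wp-admin/, /feed/, /category/ and so on each need the /blog/ prefix.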
One more question: I want to block robots from accessing a particular page, www.domain.com/XYXABC.html. How do I include that in the file?
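Disallow values are matched as path prefixes from the site root, so a single page needs one line with its full path. A small check with Python's urllib.robotparser, using the page name from the question (other-page.html is a made-up comparison URL):

```python
from urllib.robotparser import RobotFileParser

# One Disallow line with the page's full path from the site root.
# Disallow values are prefixes, so this also covers any URL that merely
# starts with "/XYXABC.html".
rules = """\
User-agent: *
Disallow: /XYXABC.html
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

for url in ("http://www.domain.com/XYXABC.html",
            "http://www.domain.com/other-page.html"):
    status = "allowed" if parser.can_fetch("Googlebot", url) else "blocked"
    print(url, status)
```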
AboutWeb
07-18-2011, 12:00 PM
LoL! Disallow: /sitemap.xml
Why would you disallow your sitemap? The whole purpose of sitemap.xml is to tell search engines about your pages, and you block Google from it?
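Rather than disallowing it, the usual approach is to advertise the sitemap with a Sitemap: line, which (per the sitemaps.org protocol) is independent of any User-agent group. A sketch with Python's urllib.robotparser (site_maps() needs Python 3.8+); the sitemap location and the helper load_rules are assumptions, so use whatever URL your sitemap plugin actually generates:

```python
from urllib.robotparser import RobotFileParser  # site_maps() needs Python 3.8+

SITEMAP = "http://www.domain.com/sitemap.xml"  # assumed location


def load_rules(text: str) -> RobotFileParser:
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


# The rule from the posted file: Googlebot may not fetch the sitemap.
before = load_rules("User-agent: Googlebot\nDisallow: /sitemap.xml\n")

# No Disallow for the sitemap, plus a Sitemap line pointing at it.
after = load_rules(
    "User-agent: *\n"
    "Disallow: /blog/wp-admin/\n"
    "\n"
    f"Sitemap: {SITEMAP}\n"
)

print(before.can_fetch("Googlebot", SITEMAP))  # False
print(after.can_fetch("Googlebot", SITEMAP))   # True
print(after.site_maps())                       # ['http://www.domain.com/sitemap.xml']
```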
deepsand
07-18-2011, 11:17 PM
And, right below that we see
Disallow: /index.php
:confused: :confused: :confused:
innominds
07-19-2011, 12:54 AM
I'm sorry! Is this OK now?
User-agent: Googlebot
Disallow: /wp-content/
Disallow: /trackback/
Disallow: /wp-admin/
Disallow: /feed/
Disallow: /archives/
Disallow: /*?
Disallow: /*.php$
Disallow: /*.js$
Disallow: /*.inc$
Disallow: /*.css$
Disallow: */feed/
Disallow: */trackback/
Disallow: /page/
Disallow: /tag/
Disallow: /category/
User-agent: Googlebot-Image
Disallow: /wp-includes/
User-agent: Mediapartners-Google*
Disallow:
User-agent: ia_archiver
Disallow: /
User-agent: duggmirror
Disallow: /
deepsand
07-19-2011, 01:16 AM
First of all, why block only Google?
And do you really want to block crawling of every page with a .php extension?
AboutWeb
07-19-2011, 01:36 AM
I agree with deepsand that you must use User-agent: * so the rules apply to all search engine spiders.
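In robots.txt, a group only applies to the crawlers named in its User-agent line; a crawler with no matching group falls back to the User-agent: * group, and if there is none, it may fetch everything. A quick check with Python's standard urllib.robotparser (the /blog/wp-admin/ rule and the Bingbot agent are just examples, and the helper allowed is hypothetical):

```python
from urllib.robotparser import RobotFileParser


def allowed(rules: str, agent: str, url: str) -> bool:
    """Return True if the robots.txt text lets agent fetch url."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


# Rules in a Googlebot-only group.
google_only = """\
User-agent: Googlebot
Disallow: /blog/wp-admin/
"""

# The same rules for every crawler.
everyone = """\
User-agent: *
Disallow: /blog/wp-admin/
"""

url = "http://www.domain.com/blog/wp-admin/options.php"
for agent in ("Googlebot", "Bingbot"):
    print(agent, allowed(google_only, agent, url), allowed(everyone, agent, url))
# Googlebot False False
# Bingbot True False
```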
As for the second point: on a WordPress site, articles and pages don't have a .php extension, so blocking *.php would only block the WordPress core files.
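Python's urllib.robotparser only does plain prefix matching and does not implement the * and $ wildcards that Google supports, so below is a hypothetical helper, rule_matches, sketching Google's documented semantics: * matches any run of characters, a trailing $ anchors the end of the URL, and everything else is a prefix match. Assuming pretty permalinks such as /blog/my-first-post/ (a made-up example), it shows why /*.php$ catches core files like wp-login.php but not post URLs, and why /*? blocks every URL with a query string, including default ?p=123 post links:

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Hypothetical helper: match a robots.txt path pattern Google-style.

    '*' matches any run of characters, a trailing '$' anchors the end,
    and anything else is a literal prefix match from the start of path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with ".*" for each '*'.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match only anchors at the start, which gives prefix matching.
    return re.match(regex, path) is not None


for pattern, path in [
    ("/*.php$", "/blog/wp-login.php"),    # True: a core file
    ("/*.php$", "/blog/my-first-post/"),  # False: a pretty-permalink post
    ("/*?", "/blog/?p=123"),              # True: any URL with a query string
    ("/*?", "/blog/my-first-post/"),      # False
]:
    print(pattern, path, rule_matches(pattern, path))
```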
You don't need to disallow /archives/; just set them to nofollow using All in One SEO Pack.
Blocking /category/ and /page/ is a bad idea: I have PageRank 2 on most of my pages and categories.
deepsand
07-19-2011, 01:39 AM
He's got other PHP pages on his site.
innominds
07-19-2011, 05:40 AM
Sorry for the mistakes!
Actually, I'm not very good at coding (PHP).
If you don't mind, could you give me the best robots.txt for my WordPress blog at www.domain.com/blog/?
deepsand
07-19-2011, 07:26 PM
You need to first identify which directories/files you do and do not want indexed.
Bear in mind that directories used to hold non-content files, such as CSS and JS, need not be blocked: they have no effect on a search engine's ability to discover, and have its crawler fetch, the files you do want indexed.
Webnauts
07-20-2011, 04:37 AM
Trying to keep bots away from pages via robots.txt is not the ultimate or best solution. You can prevent bots from accessing the chosen pages, but you cannot prevent search engines from showing snippets of them in their results if someone links to your blocked pages. Nor can you prevent PageRank from leaking.
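The point here is that robots.txt controls crawling, not indexing. The usual way to keep a page out of the index is the opposite approach: let it be crawled and mark it with a robots meta tag such as noindex,follow, which SEO plugins of this kind can add to tag archives. A crawler can only see that tag if the page is not also disallowed in robots.txt. Below is a minimal Python sketch of that decision; the function name robots_meta and the choice of tag/ as the only noindexed section are hypothetical, and this is not how SEO Ultimate is actually implemented:

```python
from typing import Optional

# Sections treated as low quality; an assumption based on the thread
# (tag archives). Add "category/" etc. if you decide those should go too.
NOINDEX_SECTIONS = ("tag/",)

NOINDEX_TAG = '<meta name="robots" content="noindex,follow">'


def robots_meta(path: str, blog_root: str = "/blog/") -> Optional[str]:
    """Return the robots meta tag a page under blog_root should carry, or None."""
    if not path.startswith(blog_root):
        return None
    rest = path[len(blog_root):]
    if rest.startswith(NOINDEX_SECTIONS):
        # noindex: keep this archive page out of the index;
        # follow: still let crawlers follow its links to the posts.
        return NOINDEX_TAG
    return None


print(robots_meta("/blog/tag/seo/"))        # the noindex,follow tag
print(robots_meta("/blog/my-first-post/"))  # None
```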
My best advice is to install and configure the "SEO Ultimate" WordPress plugin: http://wordpress.org/extend/plugins/seo-ultimate/
Good luck.
Quote:
Recently I've got a wordpress blog at www.domain.com/blog/
With the recent changes in Google algorithm, I would like to take concrete steps in making it more SEO friendly by removing the low quality pages with tags.
My bolding.
The best advisor I can recommend for you is this fairly up-to-date book, which is cheap in PDF format: http://www.packtpub.com/wordpress-3-search-engine-optimization/book
I have tried Antispam Bee and am very satisfied with that plugin. It should be even better for English- or German-speaking people, since it has an option to restrict comments to English or German.
mjtaylor
07-20-2011, 10:27 AM
It's also worth taking a look at what Yoast has to say:
http://yoast.com/prevent-site-being-indexed/. Yoast's SEO plugin is another good choice for you: http://yoast.com/wordpress/seo/
Webnauts is right, and his plugin suggestion is also a good one.