WordPress archive pages, robots.txt and AdSense (all together)



Emark2009
02-19-2008, 11:03 AM
hi,

as suggested by a lot of wordpress users, it's a good thing to disallow the archive pages in the robots.txt file, to avoid duplicate content.

I did this as well on my blog, but now an annoying thing happens: I have an AdSense wide skyscraper in my sidebar on every page of the blog, including the archive pages (have a look at e.g. Lifestyle | Emigrant.be (http://www.emigrant.be/blog/category/lifestyle/)). The problem with these archive pages is that they show irrelevant AdSense ads, just because the pages are not indexed by Google..
(all the other pages that are indexed show relevant ads)

My question:
can I remove the archive pages from my robots.txt file so they get indexed as well? Is this really going to give me a penalty in Google? As I'm using excerpts in the archives, personally I think it can't be a problem..

what do you think, WordPress seniors? hehe..

incrediblehelp
02-20-2008, 06:21 PM
Can you post a link to an archive page so we can see? It's really hard for me to tell whether the ads are irrelevant since I don't speak the language. Not sure why the ads would all of a sudden become irrelevant by disallowing the spider access.

Emark2009
02-21-2008, 08:55 AM
archive pages:

Categories:
Bedenkingen & Emoties | Emigrant.be (http://www.emigrant.be/blog/category/bedenkingen-emoties/)
Film | Emigrant.be (http://www.emigrant.be/blog/category/film/)
Immobiliën | Emigrant.be (http://www.emigrant.be/blog/category/immobilien/)
Levensstandaard | Emigrant.be (http://www.emigrant.be/blog/category/levensstandaard/)
Lifestyle | Emigrant.be (http://www.emigrant.be/blog/category/lifestyle/)
Salondans | Emigrant.be (http://www.emigrant.be/blog/category/salondans/)
Sfeerbeelden | Emigrant.be (http://www.emigrant.be/blog/category/sfeerbeelden/)
Uitgaan | Emigrant.be (http://www.emigrant.be/blog/category/uitgaan/)
Windsurf | Emigrant.be (http://www.emigrant.be/blog/category/windsurf/)
Zon, Zee & Strand | Emigrant.be (http://www.emigrant.be/blog/category/zon-zee-strand/)

The category pages show irrelevant AdSense ads like:
Grant Farm, International Super Fund, Win a $10K scholarship, ..
Some of the ads may appear relevant because they pick up a single word on the page and show ads about that word, but that doesn't make them relevant to the page as a whole. The ads on the individual (indexed) post pages are much more relevant.

Same goes for:
Months:
2007 september | Emigrant.be (http://www.emigrant.be/blog/2007/09/)
2007 oktober | Emigrant.be (http://www.emigrant.be/blog/2007/10/)
2007 november | Emigrant.be (http://www.emigrant.be/blog/2007/11/)
2007 december | Emigrant.be (http://www.emigrant.be/blog/2007/12/)
2008 januari | Emigrant.be (http://www.emigrant.be/blog/2008/01/)
2008 februari | Emigrant.be (http://www.emigrant.be/blog/2008/02/)

Some posts (that already got indexed):

Investeren in immobilien in Natal (http://www.emigrant.be/blog/investeren-in-immobilien-loont-het-nog-de-moeite/)
This is a post about real estate in the city of Natal, Brazil.
Adsense ads: Brazil Travel, hotels Natal, real estate, .. => relevant ads

Auto-ongeval in Natal en verzekering (http://www.emigrant.be/blog/auto-ongeval-en-verzekering/)
This is a post about a car crash and car insurance.
Adsense ads: mainly about insurances => relevant ads

Interview met een Aalstenaar in Brazilie (http://www.emigrant.be/blog/interview-met-een-aalstenaar-in-brazilie/)
This is a post about an interview with a Belgian in Brazil
Adsense Ads: living abroad, hotels Brazil, move to Brazil => relevant ads

...

The next question I could ask is:
let's say I have my archive pages indexed.. should I provide them with relevant titles (and meta tags)? Because actually there is no reason at all for them to show up in the search results..

Jean-Luc
02-21-2008, 09:33 AM
Hello Gert,

Your robots.txt file is not valid. First of all, you should remove all blank lines. Secondly, I would recommend you look at the robots.txt specification here (http://www.robotstxt.org/robotstxt.html).

Also note that Google uses several web robots: Googlebot is for the Google search engine and Mediapartners-Google is for AdSense.

To disallow all bots but Mediapartners-Google, use something like this:

User-agent: *
Disallow: /dir1/
Disallow: /dir2/
Disallow: /dir3/

User-agent: Mediapartners-Google
Disallow:
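
One way to sanity-check a rule set like the one above before uploading it is Python's standard-library robots.txt parser. This is only a sketch: the Disallow paths below are assumptions inferred from the archive URLs quoted in this thread (the actual robots.txt is not shown here), and Python's parser is not guaranteed to behave exactly like Google's crawlers in every edge case.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: the Disallow paths are guesses based on the
# category and monthly archive URLs quoted earlier in the thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /blog/category/
Disallow: /blog/2007/
Disallow: /blog/2008/

User-agent: Mediapartners-Google
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

archive_url = "http://www.emigrant.be/blog/category/lifestyle/"
post_url = "http://www.emigrant.be/blog/auto-ongeval-en-verzekering/"


def may_fetch(agent, url):
    """Return True if `agent` may fetch `url` under ROBOTS_TXT."""
    return parser.can_fetch(agent, url)


# Print one line per (bot, URL) pair so the effect of each record is visible.
for agent in ("Googlebot", "Mediapartners-Google"):
    for url in (archive_url, post_url):
        verdict = "allowed" if may_fetch(agent, url) else "blocked"
        print(f"{agent:22} {verdict:8} {url}")
```

With these assumed rules, the archive URL should come out as blocked for Googlebot and allowed for Mediapartners-Google, while the individual post is allowed for both.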


Regards,

Jean-Luc

Emark2009
02-24-2008, 06:28 PM
Hello Jean-Luc,

thanks for the tip about the Mediapartners bot, I didn't think about that, and it will (probably) resolve my problem.

about the robots.txt file not being valid, I'm not sure about that.. Google Webmaster Tools doesn't report any problem and the file does what it's supposed to do.

I haven't changed the robots.txt yet. I'll return in a few days to report the changes in this thread.

thanks for the tip!

regards,
Gert

uberanimal
02-25-2008, 04:05 PM
I find that it helped to leave the archives visible to the crawler. I would allow it. It does not really hurt or count as "duplicate content".

incrediblehelp
02-25-2008, 06:18 PM
Quote (uberanimal): I find that it helped to leave the archives visible to the crawler. I would allow it. It does not really hurt or count as "duplicate content".

For some people it does and for others it doesn't. It's a very finicky part of optimizing a website.

Emark2009
02-29-2008, 05:56 PM
I have added the following lines to my robots.txt:

User-agent: Mediapartners-Google
Disallow:

The archive pages remain disallowed for all bots but Mediapartners-Google.
Now let's wait and see what happens with the AdSense ads..

Emark2009
03-04-2008, 03:37 PM
ok problem solved !!

the archives remain disallowed for all bots but Mediapartners-Google
they now show relevant ads !!

thanks Jean-Luc for the tip !