WebProWorld Part of WebProNews.com
  #1 (permalink)  
Old 02-28-2006, 05:24 PM
WebProWorld Veteran
 

Join Date: Apr 2005
Posts: 628
dutter RepRank 0
Default Duplicate Content Jeopardizes Your Site

Those all-important search engine rankings you desire for your website could be in peril if you utilize duplicate content that runs afoul of search engine guidelines.

Mike McDonald observed this session from SES 2006's day two agenda, and passed along some very good information for our readers.

Plenty of websites try to pull tricks that will move them up in the rankings. People like Matt Cutts take gleeful delight in exposing the likes of BMW and Ricoh and booting them out of Google's index. Anyone thinking Google or Yahoo wouldn't ban their site should think again.

Presenters at the SES 2006 New York session, Duplicate Content Issues, discussed the dangers of duplicate content. It comes in many forms, like multiple domains for the same homepage content; multiple links to several domains for one site; and "doorway" pages, according to Anne Kennedy, managing partner at Beyond Ink.

Yahoo does not want those multiple sites, and neither does the Open Directory Project. Since these are both places where search engines tend to start looking for content to index, being kicked out of them would be a bad development. Google's webmaster guidelines specifically state one should not create duplicate content.

The robots.txt file and 301 redirects are the webmaster's friends here. Use robots.txt to keep search engines from indexing landing pages, and use 301 redirects to point all domains owned by the business to a single site. Beyond Ink provides tips on doing redirects on their site.
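
As a rough illustration of the robots.txt half of that advice, the sketch below uses Python's standard urllib.robotparser to check whether a set of rules would keep a crawler away from a page. The /print/ and /landing/ paths and the example.com URLs are assumptions made up for the example; they are not paths mentioned in the session.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules: the /print/ and /landing/ paths are
# illustrative assumptions, not paths named in the session.
ROBOTS_TXT = """\
User-agent: *
Disallow: /print/
Disallow: /landing/
"""


def is_crawlable(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text allows user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "http://www.example.com/articles/duplicate-content.html",
        "http://www.example.com/print/duplicate-content.html",
    ):
        print(url, "->", "crawlable" if is_crawlable(ROBOTS_TXT, url) else "blocked")
```

Running a check like this against the pages you expect to be blocked is a cheap way to confirm the rules say what you think they say before a crawler reads them.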

Shari Thurow from GrantasticDesigns recommended hunting down and removing "boilerplate" code that has been duplicated throughout a website. She also recommended reading up on Andrei Broder's papers on shingles, another way of classifying documents by their unique signature or fingerprint. Overlapping words or phrases look like shingles on a roof and can be found via a mathematical approach, like that used by a search engine algorithm.
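
To make the shingle idea concrete, here is a minimal sketch: each document is turned into the set of its overlapping w-word sequences, and two documents are compared by the share of shingles they have in common (the Jaccard ratio, which Broder's papers call resemblance). The window size of 4 and the simple whitespace tokenizer are assumptions for illustration; nothing in the session says how any search engine actually configures or implements this.

```python
def shingles(text: str, w: int = 4) -> set[tuple[str, ...]]:
    """Return the set of overlapping w-word shingles in text."""
    words = text.lower().split()
    if len(words) < w:
        # Very short texts become a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}


def resemblance(a: str, b: str, w: int = 4) -> float:
    """Jaccard similarity of the two documents' shingle sets (0.0 to 1.0)."""
    sa, sb = shingles(a, w), shingles(b, w)
    if not sa and not sb:
        return 1.0  # two empty documents are trivially identical
    return len(sa & sb) / len(sa | sb)


if __name__ == "__main__":
    page = "duplicate content can jeopardize your search engine rankings"
    printer = "printer friendly version duplicate content can jeopardize your search engine rankings"
    print(round(resemblance(page, printer), 2))
```

A printer-friendly page that only adds a few words of boilerplate shares most of its shingles with the original, which is why it scores as a near-duplicate.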

Jake Baillie, TrueLocal's president, described the top six duplicate content mistakes:
  • Circular navigation - having different paths through a site should be avoided. Publishers should define a consistent way of addressing page content no matter what navigation path a user takes through a site.

  • Printer friendly pages - if these are HTML pages, robots.txt should be used to block search engines from indexing them.

  • Inconsistent linking - calling directory pages in an inconsistent manner, like /directory and /directory/, should be avoided.

  • Product-only pages - it is not good for a site to have product pages and SKU pages; they should be consolidated if possible.

  • Transparent serving domains - use 301 redirection instead of DNS aliasing to get users to a canonical site from multiple domains.

  • Bad cloaking - Don't use cloaking scripts you didn't write. Make sure your cloaking script is returning separate content for each URL being cloaked.

Rajat Mukherjee, director of product management at Yahoo!, emphasized sites should try not to make the same content available through multiple URLs.
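
Baillie's "inconsistent linking" item is easy to check mechanically. The sketch below, which assumes you already have a list of internal hrefs collected from your own pages, groups paths that differ only by a trailing slash. The function name and sample links are made up for the example.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def find_inconsistent_links(hrefs):
    """Group internal link paths that differ only by a trailing slash.

    Returns a dict mapping the slash-less path to the sorted variants found,
    including only paths that were linked in more than one way.
    """
    variants = defaultdict(set)
    for href in hrefs:
        path = urlsplit(href).path or "/"
        key = path.rstrip("/") or "/"
        variants[key].add(path)
    return {key: sorted(paths) for key, paths in variants.items() if len(paths) > 1}


if __name__ == "__main__":
    links = ["/directory", "/directory/", "/about/", "/about/"]
    print(find_inconsistent_links(links))
```

Any path the function reports is being addressed two ways on the site; picking one form and linking to it everywhere removes that source of duplicate URLs.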

"Search engines are not trying to penalize content," Mukherjee said. "We're trying to find the right content to promote. Independent of how large our indexes get, there will always be capacity constraints."

"Honest site owners often worry about duplicate content when they don't really have to," Google's Cutts said. "There are also people that are a little less conscientious." He also noted that different top level domains, like x.com, x.ca, are not a concern.

A site that has an article broken into multiple sections, along with a printer-friendly version that contains all of those sections, likewise isn't cause for worry.

Like Baillie, Cutts emphasized the importance of consistent internal linking. For example, if a site uses www., it should be used everywhere on a site, or nowhere on the site. He also observed that using absolute links instead of relative ones reduces the likelihood of a site being "scraped" for its content. Cutts recommended using copyright notices throughout a site.

For those users who have heard about Google's "Bigdaddy" datacenter update, Cutts said it is rolling out every 7 to 10 days and it should be done in the next 6 weeks.
  #2 (permalink)  
Old 02-28-2006, 08:40 PM
WebProWorld Member
 

Join Date: Nov 2003
Location: Auckland, New Zealand
Posts: 52
T2DMan RepRank 0
Default There should be a worry? Wasted opportunities!

Quote:
Originally Posted by Cutts
A site that has an article broken into multiple sections, and also a printer-friendly version that contains all of those sections, likewise aren't cause for worry.
The other people quoted sound more like the following lines of reasoning:

Any duplicate content being multiple pages for the same/similar information:
- is a watering down of the link popularity available for any one page
- is internal competition within your site of what is the most important page for a search phrase

So:
- I use rel=nofollow on links to pages such as printer-friendly versions, as well as robots.txt. Alternatively, you could clearly mention "Printer friendly version" in the title, meta description and opening paragraph of the page to differentiate it from your main copy, although we then start running into the shingles issue mentioned.
- make sure that there is some extra text to differentiate between the first and subsequent pages. Like I add "Page 2" etc to the title, meta description, and opening paragraph.
- make sure that links from subsequent pages link back to the main page with the search phrase, and so add great link popularity back into the first page. Makes that first page receive far more links than any other similar page on your site.

As much as anything, the extra text helps users to know where they are in an article/print version, and on the search engines it means you can clearly see you have found page x, and that you should go to page one to get the start. It also means the page is less likely to be seen as duplicate content, given the extra words added to the key parts of the page.

Having the extra text in the title also waters down the title/major part of ranking formula generally enough that even if you get some external links into the second page, your first page can still rank top.

Then make sure that you give that first page enough link popularity in your site, that you have given preference to that first page over the second page.

Done well, you can also get that second page as an indented result.

So while printer friendly pages and subsequent pages should not be a cause for concern, they are wasted opportunities unless they are dealt with correctly on your site.
__________________
T2DMan - Michael Brandon

Search Engine Marketing SEO - specializing in SEO vBulletin
  #3 (permalink)  
Old 02-28-2006, 10:02 PM
WebProWorld Member
 

Join Date: Nov 2003
Location: Auckland, New Zealand
Posts: 52
T2DMan RepRank 0
Default When will Webproworld follow the dutter advice?

So when is this webproworld forum going to follow the advice talked about???

- No robots.txt to keep the bots away from printer pages, new topic, reply to topic, report post...
- No rel=nofollow's on irrelevant links to allow the proper pages to get higher link popularity per link.
- Titles with the actual subject mentioned after the generic phrase "View Topic" - the subject should be first
- No <h1> or <h2> to direct the search engines from the generic page navigation to the actual content.
- No meta description - although this does mean that the snippet becomes the first time that phrase is mentioned on the page.


How many url's can we find for this page???

The proper url:
http://www.webproworld.com/viewtopic.php?t=61246

http://www.webproworld.com/archive.p...your-site.html
and the new next topic and previous topic that has the potential of adding still more url's for a page

http://www.webproworld.com/viewtopic.php?p=287594
http://www.webproworld.com/viewtopic...asc&highlight= with the highlight being any number of terms!

Bound to be a few others.
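
One way to see how many of those URLs collapse into a single page is to normalize them. The sketch below drops query parameters that are assumed not to change the content and sorts what remains. "highlight" is taken from the URLs listed above; "postorder" and "postdays" are hypothetical parameter names, and example.com stands in for the real host. It cannot tell that a topic URL (t=) and a post URL (p=) show the same thread; that mapping needs knowledge of the forum software.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Parameters assumed not to change the page content. "highlight" comes from
# the URLs listed in the post; the other names are hypothetical examples.
IGNORED_PARAMS = {"highlight", "postorder", "postdays"}


def canonical_url(url: str) -> str:
    """Return url with ignorable query parameters removed and the rest sorted."""
    parts = urlsplit(url)
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in IGNORED_PARAMS
    )
    # Lower-case the host, keep the path as-is, and drop any #fragment.
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path, urlencode(kept), "")
    )


if __name__ == "__main__":
    for url in (
        "http://www.example.com/viewtopic.php?t=61246",
        "http://www.example.com/viewtopic.php?t=61246&postorder=asc&highlight=seo",
        "http://WWW.example.com/viewtopic.php?highlight=&t=61246",
    ):
        print(canonical_url(url))
```

All three sample URLs normalize to the same string, which is the form a site would want to link to internally.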

Archive Version

I am not keen on the archive version. People are not likely to link to the archive version, and so you are creating competition within your own site between various versions of the page.

Summary

There are so many issues raised in the dutter post that are not followed on this forum.

My expertise happens to be in forums, and I personally like how the combination of vbulletin.com and vbseo.com work together to put right all of the issues mentioned. This advice has been generally used with the http://forums.seochat.com/ and many other sites to great effect. But you do need to be careful not to go overboard with the rel=nofollows, and allow live sig links to members and members profiles.

Whichever way you go, following your own advice for this forum would certainly improve traffic via search engines.
__________________
T2DMan - Michael Brandon

Search Engine Marketing SEO - specializing in SEO vBulletin
  #4 (permalink)  
Old 02-28-2006, 11:34 PM
WebProWorld Veteran
 

Join Date: Oct 2005
Posts: 529
aaron2005 RepRank 0
Default LOL!

LOL...T2, I often think of this, WMW could be owning in the engines for many phrases but they don't because of this and other issues. :) Maybe they don't care? ;-o

What's up with that dutter?
__________________
SEO Blog
  #5 (permalink)  
Old 03-01-2006, 01:31 AM
Administrator
 

Join Date: Jun 2003
Location: In the back, off the side and far away
Posts: 1,809
mike RepRank 11
Default

First, David wrote the article based on information I sent him. David is also not the forum admin. Again, that would be me.

As such, it isn't exactly fair to put him on the defensive over either the content of the article or the status of WPW's SEO-worthiness.

Second, the information presented in the article/post is simply what I gathered from a session on duplicate content. There is some solid advice there and I simply submit it here for your consideration. Whether or not you do it or find it valuable based on the forum's compliance with it is entirely up to you.

There are some things we can do to enhance this forum's seo, sure, but at the same time, I have to work within the constraints of the forum's software. Some things I can do, some things I can't. Some of these things I knew prior to yesterday's session, some I didn't.

At the end of the day there are a couple of things to keep in mind. Search engines are not on a witch hunt looking to penalize sites for duplicate content. There isn't anything going on here that is going to raise any flags to the point of us being penalized for it. There are some things we could do better, and there are some things I can't do much about without rewriting massive chunks of the forum's base code (and that ain't happening).

Just as an FYI, Brett (at WMW) disallowed ALL robots from his forum a few months ago. Is that good seo? Well, probably not. But he had his reasons and that suited his purposes. I saw him this afternoon actually, and he seemed to be getting along just fine.

We're here as a resource for SEO related information and advice. I never claimed that WPW was any shining example of SEO perfection but that really has little to do with the guidelines presented.
__________________
WebProNews Videos
  #6 (permalink)  
Old 03-01-2006, 02:06 AM
Moderator
WebProWorld Moderator
 

Join Date: Jan 2004
Location: Live in Cincy Now
Posts: 7,597
incrediblehelp RepRank 4
Default

I agree that duplicate content is a big no-no and should be avoided, for the sake of design alone if not for SEO as well. One thing I have learned from working with many websites that were guilty of some of the duplicate-content mistakes above is that they usually do not get demoted, banned or penalized; rather, the wrong content/URL may show in the search engine's index. Usually the Googles of the world are forced to pick one of the pages to show (which is usually the one you don't want), but they don't actually penalize you for duplicate content within the same website. Do others feel the same way?

Now, duplicate content from one website to another is a whole other story altogether. Consider article distribution, which many preach as a good way to gain IBLs. Sure, syndicating your article is a good idea, but wait a couple of weeks after you have published it on your website so all the search engines can recognize you as the original author. Most are so eager to get it out in public that they forget this.
  #7 (permalink)  
Old 03-01-2006, 02:12 AM
Moderator
WebProWorld Moderator
 

Join Date: Jan 2004
Location: Live in Cincy Now
Posts: 7,597
incrediblehelp RepRank 4
Default

Ongoing thread at WPW on this.
  #8 (permalink)  
Old 03-01-2006, 11:17 AM
WebProWorld 1,000+ Club
 

Join Date: Dec 2003
Location: Houston
Posts: 5,715
greeneagle RepRank 0
Default

This whole dupe content issue seems to be in the forefront in many threads coming in right now, as we continue through another grueling 6 week? update.

When that happens, we have to look back and try and determine if a recent update has spurred the issues surfacing in many threads. That alone is a great statistical indicator.

IMO - A very timely article.

Maybe it should have been entitled; "Do You Have Shingles?".

Ken
  #9 (permalink)  
Old 03-02-2006, 12:34 AM
WebProWorld Member
 

Join Date: Jul 2003
Location: Humboldt Bay
Posts: 64
downstrike RepRank 0
Default Re: Duplicate Content Jeopardizes Your Site

Quote:
Originally Posted by dutter
For example, if a site uses www., it should be used everywhere on a site, or nowhere on the site.
Yes, and how is a webmaster who can't even get other people linking to his site to spell his file names correctly going to get them to abide by the www. convention he chose?

I recently realized that these clowns had manipulated my relative links into linking to duplicates of my content. I introduced <base href> tags on all my pages so that I can at least control my own links. However, for the incoming links, I frequently can't even find the websites with the offending links (sometimes I think it's the search engines themselves), and when I do find them, I can rarely get both the attention and the comprehension of the offending "webmaster".
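
The effect described here can be reproduced with Python's urllib.parse.urljoin, which resolves relative links the way a browser or crawler does. In this sketch (the function name and example.com URLs are assumptions for illustration), a relative link inherits whatever host the page was reached through, while a base href pins it to the chosen host.

```python
from typing import Optional
from urllib.parse import urljoin


def resolve_links(page_url: str, hrefs: list[str], base_href: Optional[str] = None) -> list[str]:
    """Resolve hrefs against the page URL, or against <base href> if one is set."""
    base = urljoin(page_url, base_href) if base_href else page_url
    return [urljoin(base, href) for href in hrefs]


if __name__ == "__main__":
    # Page reached through a non-www link from some other site.
    page = "http://example.com/articles/page.html"
    links = ["../contact.html", "part2.html"]

    # Without <base href>, relative links stay on the non-www host.
    print(resolve_links(page, links))

    # With <base href>, they resolve to the www host the site owner chose.
    print(resolve_links(page, links, base_href="http://www.example.com/articles/"))
```

This only controls the site's own outgoing links; it does nothing about incoming links that point at the wrong hostname.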
  #10 (permalink)  
Old 03-02-2006, 12:48 AM
WebProWorld Member
 

Join Date: Oct 2005
Location: Australia
Posts: 57
ppanwar RepRank 0
Default

Quote:
Originally Posted by incrediblehelp

Now dup content on one website to another is another whole story all together. Consider article distribution like many preach as a good way to gain IBL's. Sure syndicating your article is a good idea, but wait a couple of weeks after you have published it on your website so all the search engines can recognize you as the original author. Most are so eager to get it out in the public that they forget this.
So what you are saying is first publish the article on your website and then publish the article on other article websites for a one-way link.

Don't you think it is still going to create a problem of duplicate content?
__________________
Parveen Panwar
http://www.uvouch.com
  #11 (permalink)  
Old 03-02-2006, 10:05 AM
JKomp's Avatar
WebProWorld 1,000+ Club
 

Join Date: Dec 2004
Location: UK
Posts: 1,044
JKomp RepRank 0
Default

No, not if you wait for your page to be spidered by all the major SEs and then submit. Every time the site is indexed, a date/time is associated with it, which means that duplicate content is not your problem.
__________________
My Albatross - Indie Music Myspace Stuff - Wii News and Reviews
  #12 (permalink)  
Old 03-02-2006, 04:38 PM
WebProWorld Pro
 

Join Date: Oct 2004
Posts: 117
stretch dog RepRank 0
Default Re: Duplicate Content Jeopardizes Your Site

Quote:
Originally Posted by downstrike
Yes, and how is a webmaster who can't even get other people linking to his site to spell his file names correctly going to get them to abide by the www. convention he chose?
Wouldn't the "Canonical Hostname Redirect (non-www to www)" permanent 301 redirect look after the problem of IBL's not following the "www" convention of choice... ?

We generally set it up so anyone pointing to http://dom.com is redirected to http://www.dom.com, that way when the search engines follow the link they are redirected to the "www" version of the page...
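
The posters do this in a .htaccess file; purely as an illustration of the same logic, here is a minimal Python WSGI sketch. The hostname www.dom.com comes from the example above, and everything else (function names, sending every non-canonical host to the www host, ignoring the port) is an assumption for the sketch.

```python
from typing import Optional

CANONICAL_HOST = "www.dom.com"  # the "www" convention chosen in the post's example


def redirect_target(host: str, path: str, query: str = "") -> Optional[str]:
    """Return the 301 target for a non-canonical host, or None if none is needed."""
    hostname = host.lower().split(":")[0]  # ignore any port for this sketch
    if hostname == CANONICAL_HOST:
        return None
    url = f"http://{CANONICAL_HOST}{path}"
    return f"{url}?{query}" if query else url


def app(environ, start_response):
    """WSGI app: permanently redirect non-canonical hosts, otherwise serve a stub."""
    target = redirect_target(
        environ.get("HTTP_HOST", ""),
        environ.get("PATH_INFO", "/"),
        environ.get("QUERY_STRING", ""),
    )
    if target:
        start_response("301 Moved Permanently", [("Location", target)])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"canonical host\n"]
```

Because the redirect is permanent (301) and preserves the path and query string, a crawler following a non-www inbound link ends up at the matching www page.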

It can be a real problem for some sites... there is a client's site I'm working on now where Google has 2 versions of every page cached, visited them both at different times and attributes different PR to the "www" and "non-www" versions of every page....

Personally I've never been able to wrap my head around the fact that Google would be so stupid as to not be able to tell they were the same pages, instead of treating them as distinctly different pages with the same content... makes no sense to me, but when I see two different URLs cached for the same page and they each have different PR, one can only assume that Google is seeing them as two pages...

Please, can anyone shed an honest light on this for me...

Also, unlike our own sites, where we simply do a Canonical Hostname Redirect (non-www to www) in a .htaccess file, with some clients it's not so easy depending on their hosting and server access... any professional input would be appreciated, thank you!

stretched and confused!
__________________
WebFoot Creative - Website Design, Marketing and SEO.
Debt Help USA | Bankruptcy USA - For Help with Debt and Bankruptcy.
  #13 (permalink)  
Old 03-02-2006, 04:44 PM
Moderator
WebProWorld Moderator
 

Join Date: Jan 2004
Location: Live in Cincy Now
Posts: 7,597
incrediblehelp RepRank 4
Default Re: Duplicate Content Jeopardizes Your Site

Quote:
Originally Posted by stretch dog
Please, can anyone shead an honest light on this for me please...
Computers (algo's) are not smart. People are.
  #14 (permalink)  
Old 03-02-2006, 05:08 PM
WebProWorld Pro
 

Join Date: Oct 2004
Posts: 117
stretch dog RepRank 0
Default another question or two please, cause i'm confused...

Just reviewing the original post again, and i'm not clear on a couple of other things discussed...

1) "Product-only pages - it is not good for a site to have product pages and SKU pages; they should be consolidated if possible."

K, we have a number of 'product" data base driven shopping sites, and I wonder if this is something I should be clear on... maybe I'm dimmer than I give myself credit for, but even the guy that builds the sites couldn't explain what is meant by the above statement, so if anyone can please do... lol

2) Do you really mean to tell me that the search engines can't tell that a "printer friendly" page is simply there for the convenience of the visitor... ?

...and that we are expected to screw with robots.txt files in order to "manipulate" the SEs into indexing what we want instead of simply letting them decide what they want to clutter their indices with.

And using meta robots and robots.txt to stop the SEs from seeing that you have duplicate sites, each optimized for specific search engines... I mean come on, do you really think they don't take a look anyway, if for no other reason than to see why you don't want them going there...

Or am i just being cynical...

3) I certainly understand what dutter means by "multiple domains for the same homepage content", but I am not clear what he means by "multiple links to several domains for one site"... dutter, can you please clarify, for once again I am confused!

4) "Circular navigation - having different paths through a site should be avoided. Publishers should define a consistent way of addressing page content no matter what navigation path a user takes through a site."...

K, now please don't get mad at me, but this simply makes no sense to me. What is wrong with having as many different ways to navigate throughout a site as possible. Aren't we all individuals with our own personal preference for how we like to navigate a site...

... and the last part "Publishers should define a consistent way of addressing page content no matter what navigation path a user takes through a site." I'm not even sure what is being implied...

can someone help me to understand or simply tell me its not important, but then i figure if it wasn't important it wouldn't have been posted in the first place... lol.

5) "Bad cloaking - Don't use cloaking scripts you didn't write. Make sure your cloaking script is returning separate content for each URL being cloaked."...

Hmmm, I thought cloaking was not exactly within the se's guidelines... anyone care to comment...

6) "Honest site owners often worry about duplicate content when they don't really have to," Google's Cutts said."...

"Search engines are not trying to penalize content," Mukherjee said. "We're trying to find the right content to promote."...

So with comments like these coming directly from the horse's mouth, why all the kafuffle over unintentional duplicate content?

And then Cutts says... "...different top level domains, like x.com, x.ca, are not a concern."...

K, well now I'm confused again, because wouldn't the same site at www.domain.com and also at www.domain.ca be duplicate or mirrored sites... unless of course they are in different languages or something...

obviously still dazed and confused, if anyone can clarify some of these points for me, it would be appreciated...
__________________
WebFoot Creative - Website Design, Marketing and SEO.
Debt Help USA | Bankruptcy USA - For Help with Debt and Bankruptcy.
  #15 (permalink)  
Old 03-02-2006, 05:30 PM
Moderator
WebProWorld Moderator
 

Join Date: Jan 2004
Location: Live in Cincy Now
Posts: 7,597
incrediblehelp RepRank 4
Default

1)No idea here. I assume they mean you should not have the same content on two different pages maybe?

2)Like I said above search engines are not smart. We must help them and guide them through our websites. This is just proper management and we should expect to have to do it if we want to show up in the SE properly. If you don't care how you show up in the SE's then ignore these things.

4) I think linking to pages in multiple ways is fine as long it makes sense. I don't think you want to do it in a spammy nature which link to every page, on every page....gross

5) Just don't cloak.

6) I have said this a million times. Duplicate content is not really penalized. Google just shows what it feels is the original page/content first and sometimes sticks the rest of the duplicate pages in its supplemental index. Google only needs one copy of it, so why have many in the index? I can't say it enough: you are not getting penalized for duplicate content within the same website. Google just picks one to show. It could be the one you want, it could be the printer page, it could be one with session IDs, it could be one with...etc.

IT IS UP TO YOU to organize your content in the manner in which you want it to show up in the search engines.

Quote:
K, well now I'm confused again, because wouldn't the same site at www.domain.com and also at www.domain.ca be duplicate or mirrored sites... unless of course they are in different languages or something...
I have seen them getting better on separating these two domains themselves, by the TLD only, even with the same content on each. This used to be an issue in the past, but not so much for Google anymore. You should still try to make each website as customized for each region as possible.
  #16 (permalink)  
Old 03-03-2006, 12:56 PM
Moderator
WebProWorld Moderator
 

Join Date: Jan 2004
Location: Live in Cincy Now
Posts: 7,597
incrediblehelp RepRank 4
Default

Good snippet of info here:

http://www.searchengineguide.com/sea...ws/006921.html

Quote:
I find that Google works to filter out the duplicate content and display the best page. That may not always be your site. Such would be the case where you authored an article but then had it republished on a major ezine. The ezine shows up first because it is considered more of an authority site than yours, even though the article may have been originally published on your own site. While you cannot always control what Google decides to display, a webmaster can be assured that Google will not penalize for duplicate content unless it is some kind of extreme case (such as having 2500 mirror sites).

Yahoo on the other hand does penalize for duplicate content. I believe this is a part of their algorithm that was inherited with the integration of Inktomi technology into their own search engine. That is why it is my advice to always try to avoid duplicate content. If you have things like PDF files that are the same as html versions or "print friendly" pages that represent the html version, block access to them using the robots.txt file. If you are redesigning a site and your file structure is changing, set up 301 redirects from old pages to new ones. The same is true if re-branding and using a new domain for site – 301 redirect old domain to new one while removing files at old domain. Do not spread duplicate content across multiple domains. I see this happen with affiliate sites where multiple affiliates use the same language on their sites provided by the very company they are an affiliate of.. I also see it happen when people assume that keywords in domains will help them rank better so they register multiple domains targeting specific keywords hoping to rank better but it only ends up biting them in the end.

I don't think MSN has matured enough to really deal with duplicate content yet so you will probably find it abounding in their index. And as for Ask, their index is more selective anyway meaning that they are not out to build the biggest index, but rather one of quality. Therefore they probably work to index quality content and can easily identify duplicates of the original.
David is right on here!
  #17 (permalink)  
Old 03-04-2006, 12:24 AM