| Search Engine Optimization Forum SEO is much easier with help from peers and experts! The WebProWorld SEO forum is for the discussion and exploration of various search engine optimization topics. Any non (engine) specific SEO or SEM topics should go here. |
Those all-important search engine rankings you desire for your website could be in peril if you utilize duplicate content that runs afoul of search engine guidelines.
Mike McDonald observed this session from SES 2006's day two agenda, and passed along some very good information for our readers.

Plenty of websites try to pull tricks that will move them up in the rankings. People like Matt Cutts take gleeful delight in exposing the likes of BMW and Ricoh and booting them out of Google's index. Anyone thinking Google or Yahoo wouldn't ban their site should think again.

Presenters at the SES 2006 New York session, Duplicate Content Issues, discussed the dangers of duplicate content. It comes in many forms, like multiple domains for the same homepage content; multiple links to several domains for one site; and "doorway" pages, according to Anne Kennedy, managing partner at Beyond Ink.

Yahoo does not want those multiple sites, and neither does the Open Directory Project. Since these are both places where search engines tend to start looking for content to index, being kicked out of them would be a bad development.

Google's webmaster guidelines specifically state one should not create duplicate content. The robots.txt file and 301 redirects are the webmaster's friends here. Use robots.txt to keep search engines from indexing landing pages, while using 301 redirects to point all domains owned by the business to a single site. Beyond Ink provides tips on doing redirects on their site.

Shari Thurow from GrantasticDesigns recommended hunting down and removing "boilerplate" code that has been duplicated throughout a website. She also recommended reading up on Andrei Broder's papers on shingles, another way of classifying documents by their unique signature or fingerprint. Overlapping words or phrases look like shingles on a roof and can be found via a mathematical approach, like that used by a search engine algorithm.

Jake Baillie, TrueLocal's president, described the top six duplicate content mistakes:

"Search engines are not trying to penalize content," Mukherjee said. "We're trying to find the right content to promote. Independent of how large our indexes get, there will always be capacity constraints."

"Honest site owners often worry about duplicate content when they don't really have to," Google's Cutts said. "There are also people that are a little less conscientious."

He also noted that different top level domains, like x.com and x.ca, are not a concern. A site that has an article broken into multiple sections, and also a printer-friendly version that contains all of those sections, likewise isn't cause for worry.

Cutts did emphasize the importance of consistent internal linkage, as Baillie did. For example, if a site uses www., it should be used everywhere on the site, or nowhere on the site. He also made the interesting observation that using absolute links instead of relative ones reduces a site's likelihood of being "scraped" for its content. Cutts recommended using copyright notices throughout a site.

For those users who have heard about Google's "Bigdaddy" datacenter update, Cutts said it is rolling out every 7 to 10 days and it should be done in the next 6 weeks.
|
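To make the shingles idea from the article concrete, here is a minimal sketch in Python. It assumes 4-word shingles and a plain set-overlap (Jaccard) score. The function names, the window size, and the sample sentences are all illustrative choices, and nothing here claims to match how any search engine actually implements it.

```python
import re


def shingles(text: str, w: int = 4) -> set:
    """Return the set of overlapping w-word shingles found in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < w:
        # Text shorter than one window: treat it as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}


def resemblance(a: str, b: str, w: int = 4) -> float:
    """Share of distinct shingles the two texts have in common (0.0 to 1.0)."""
    sa, sb = shingles(a, w), shingles(b, w)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


# Hypothetical sample texts, for illustration only.
article = "Use 301 redirects to point all domains owned by the business to a single site"
printer = "Printer friendly version: " + article
unrelated = "Shingles on a roof overlap so that rain runs off instead of leaking through"

print(resemblance(article, printer))    # high: only the added label differs
print(resemblance(article, unrelated))  # no 4-word runs in common
```

A printer-friendly copy with a short label added still shares most of its shingles with the original, which is why a few extra words alone may not make two pages look different.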
||||
|
Quote:
Any duplicate content, meaning multiple pages for the same or similar information:
- is a watering down of the link popularity available for any one page
- is internal competition within your site over which page is the most important for a search phrase

So:
- I use rel=nofollow for pages such as printer-friendly versions, as well as robots.txt. Or you could very clearly mention "Printer friendly version" in the title, meta description and opening paragraph on the page, and so differentiate it from your main copy. Here we do start running into the shingles issue mentioned.
- make sure that there is some extra text to differentiate between the first and subsequent pages. For example, I add "Page 2" etc. to the title, meta description, and opening paragraph.
- make sure that links from subsequent pages link back to the main page with the search phrase, and so add great link popularity back into the first page. That makes the first page receive far more links than any other similar page on your site.

As much as anything, the extra text helps users know where they are in an article or print version, and on the search engines it means you can clearly see that you have found page x, and that you should go to page one to get the start. It also means the page is less likely to be seen as duplicate content, given the extra words added to the key parts of the page.

Having the extra text in the title also waters down the title, a major part of the ranking formula, generally enough that even if you get some external links into the second page, your first page can still rank top. Then make sure that you give that first page enough link popularity in your site, so that you have given preference to that first page over the second page. Done well, you can also get that second page as an indented result.

So while printer-friendly pages and subsequent pages should not be a cause for concern, they are wasted opportunities unless they are dealt with correctly on your site.
__________________
T2DMan - Michael Brandon Search Engine Marketing SEO - specializing in SEO vBulletin |
|
||||
|
So when is this webproworld forum going to follow the advice talked about???
- No robots.txt to keep the bots away from printer pages, new topic, reply to topic, report post...
- No rel=nofollow's on irrelevant links to allow the proper pages to get higher link popularity per link.
- Titles with the actual subject mentioned after the generic phrase "View Topic" - the subject should be first
- No <h1> or <h2> to direct the search engines from the generic page navigation to the actual content.
- No meta description - although this does mean that the snippet becomes the first time that phrase is mentioned on the page.

How many URLs can we find for this page???

The proper url:
http://www.webproworld.com/viewtopic.php?t=61246
http://www.webproworld.com/archive.p...your-site.html

and the new next topic and previous topic links, which have the potential of adding still more URLs for a page:
http://www.webproworld.com/viewtopic.php?p=287594
http://www.webproworld.com/viewtopic...asc&highlight=
with the highlight being any number of terms! Bound to be a few others.

Archive Version
I am not keen on the archive version. People are not likely to link to the archive version, and so you are creating competition within your own site between various versions of the page.

Summary
There are so many issues raised in the dutter post that are not followed on this forum. My expertise happens to be in forums, and I personally like how the combination of vbulletin.com and vbseo.com work together to put right all of the issues mentioned. This advice has been generally used with http://forums.seochat.com/ and many other sites to great effect. But you do need to be careful not to go overboard with the rel=nofollows, and allow live sig links to members and members' profiles.

Whichever way you go, following your own advice for this forum would certainly improve traffic via search engines.
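As a rough illustration of how several URLs can collapse to one, here is a small Python sketch. The host `forum.example.com` and the list of ignorable parameters are assumptions made for the example, not the forum's real URL scheme, and mapping a post id (`p=`) to its topic (`t=`) would need a database lookup that is not shown.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: these parameters change presentation only, not the content shown.
IGNORED_PARAMS = {"highlight", "postorder", "postdays"}


def canonical_url(url: str) -> str:
    """Drop presentation-only parameters and sort the rest into a stable order."""
    parts = urlsplit(url)
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in IGNORED_PARAMS
    )
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path, urlencode(kept), "")
    )


# Hypothetical variants of one topic page.
variants = [
    "http://forum.example.com/viewtopic.php?t=61246",
    "http://forum.example.com/viewtopic.php?t=61246&highlight=",
    "http://forum.example.com/viewtopic.php?t=61246&postorder=asc&highlight=seo",
]
print({canonical_url(u) for u in variants})
```

All three variants reduce to the same address, which is the one you would want every internal link (and any redirect) to use.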
__________________
T2DMan - Michael Brandon Search Engine Marketing SEO - specializing in SEO vBulletin |
|
|||
|
LOL...T2, I often think of this, WMW could be owning in the engines for many phrases but they don't because of this and other issues. :) Maybe they don't care? ;-o
What's up with that dutter?
__________________
SEO Blog |
|
||||
|
First, David wrote the article based on information I sent him. David is also not the forum admin. Again, that would be me.
As such, it isn't exactly fair to put him on the defensive over either the content of the article or the status of WPW's SEO-worthiness.

Second, the information presented in the article/post is simply what I gathered from a session on duplicate content. There is some solid advice there and I simply submit it here for your consideration. Whether or not you do it or find it valuable based on the forum's compliance with it is entirely up to you.

There are some things we can do to enhance this forum's SEO, sure, but at the same time, I have to work within the constraints of the forum's software. Some things I can do, some things I can't. Some of these things I knew prior to yesterday's session, some I didn't.

At the end of the day there are a couple of things to keep in mind. Search engines are not on a witch hunt looking to penalize sites for duplicate content. There isn't anything going on here that is going to raise any flags to the point of us being penalized for it. There are some things we could do better, and there are some things I can't do much about without rewriting massive chunks of the forum's base code (and that ain't happening).

Just as an FYI, Brett (at WMW) disallowed ALL robots from his forum a few months ago. Is that good SEO? Well, probably not. But he had his reasons and that suited his purposes. I saw him this afternoon actually, and he seemed to be getting along just fine.

We're here as a resource for SEO related information and advice. I never claimed that WPW was any shining example of SEO perfection, but that really has little to do with the guidelines presented.
__________________
WebProNews Videos |
|
||||
|
Ongoing thread at WPW on this.
|
|
||||
|
This whole dupe content issue seems to be in the forefront in many threads coming in right now, as we continue through another grueling 6 week? update.
When that happens, we have to look back and try to determine whether a recent update has spurred the issues surfacing in many threads. That alone is a great statistical indicator.

IMO - A very timely article. Maybe it should have been entitled "Do You Have Shingles?".

Ken
|
|||
|
Quote:
I recently realized that these clowns had manipulated my relative links into linking to duplicates of my content. I introduced "<base href" tags to all my pages so that I can at least control my own links. However, for the incoming links, I frequently can't even find the web sites with the offending links (sometimes I think it's the search engines themselves), and when I do find them, I rarely can get both the attention and the comprehension of the offending "webmaster".
|
|||
|
Quote:
Don't you think it is still going to create a problem of duplicate content?
|
|||
|
Quote:
We generally set it up so anyone pointing to http://dom.com is redirected to http://www.dom.com; that way, when the search engines follow the link, they are redirected to the "www" version of the page...

It can be a real problem for some sites... there is a client's site I'm working on now where Google has 2 versions of every page cached, visited them both at different times, and attributes different PR to the "www" and "non-www" versions of every page....

Personally I've never been able to wrap my head around the fact that Google would be so stupid as to not be able to tell they were the same pages, instead of treating them as distinctly different pages with the same content... makes no sense to me, but when I see two different URLs cached for the same page and they each have different PR, one can only assume that Google is seeing them as two pages...

Please, can anyone shed an honest light on this for me...

Also, unlike our own sites, where we simply do a Canonical Hostname Redirect (non-www to www) in a .htaccess file, with some clients it's not so easy, depending on their hosting and server access... any professional input would be appreciated, thank you!

stretched and confused!
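For anyone who wants to see the non-www to www decision spelled out, here is a minimal Python sketch of the logic only. The post describes doing this in a .htaccess file; this is not that file. The function name is hypothetical, and `www.dom.com` is simply the example host used above.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def redirect_target(url: str, canonical_host: str = "www.dom.com") -> Optional[str]:
    """Return the URL a 301 redirect should point to, or None if no redirect applies."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    # The bare host is the canonical host without its leading "www." label.
    bare_host = canonical_host[4:] if canonical_host.startswith("www.") else canonical_host
    if host == canonical_host:
        return None  # already on the canonical hostname
    if host == bare_host:
        # Same path and query, only the hostname changes.
        return urlunsplit(
            (parts.scheme, canonical_host, parts.path, parts.query, parts.fragment)
        )
    return None  # some other hostname: left alone in this sketch


print(redirect_target("http://dom.com/page.html?id=3"))
print(redirect_target("http://www.dom.com/page.html?id=3"))
```

The point is that every request for the bare hostname gets a permanent redirect to the same path on the "www" hostname, so only one version of each page is left to be crawled and linked.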
__________________
WebFoot Creative - Website Design, Marketing and SEO. Debt Help USA | Bankruptcy USA - For Help with Debt and Bankruptcy. |
|
|||
|
Just reviewing the original post again, and I'm not clear on a couple of other things discussed...

1) "Product-only pages - it is not good for a site to have product pages and SKU pages; they should be consolidated if possible." K, we have a number of "product" database-driven shopping sites, and I wonder if this is something I should be clear on... maybe I'm dimmer than I give myself credit for, but even the guy that builds the sites couldn't explain what is meant by the above statement, so if anyone can, please do... lol

2) Do you really mean to tell me that the search engines can't tell that a "printer friendly" page is simply there for the convenience of the visitor... ? ...and that we are expected to screw with robots.txt files in order to "manipulate" the SEs into indexing what we want, instead of simply letting them decide what they want to clutter their indices with. And using meta robots and robots.txt to stop the SEs from seeing that you have duplicate sites, each optimized for specific search engines... I mean come on, do you really think they don't take a look anyway, if for no other reason than to see why you don't want them going there... Or am I just being cynical...

3) I certainly understand what dutter means by "multiple domains for the same homepage content", but I am not clear what he means by "multiple links to several domains for one site"... dutter, can you please clarify, for once again I am confused!

4) "Circular navigation - having different paths through a site should be avoided. Publishers should define a consistent way of addressing page content no matter what navigation path a user takes through a site."... K, now please don't get mad at me, but this simply makes no sense to me. What is wrong with having as many different ways to navigate throughout a site as possible? Aren't we all individuals with our own personal preference for how we like to navigate a site... ... and the last part, "Publishers should define a consistent way of addressing page content no matter what navigation path a user takes through a site." I'm not even sure what is being implied... can someone help me to understand, or simply tell me it's not important, but then I figure if it wasn't important it wouldn't have been posted in the first place... lol.

5) "Bad cloaking - Don't use cloaking scripts you didn't write. Make sure your cloaking script is returning separate content for each URL being cloaked."... Hmmm, I thought cloaking was not exactly within the SEs' guidelines... anyone care to comment...

6) "Honest site owners often worry about duplicate content when they don't really have to," Google's Cutts said... "Search engines are not trying to penalize content," Mukherjee said. "We're trying to find the right content to promote."... So with comments like these coming directly from the horse's mouth, why all the kerfuffle over unintentional duplicate content? And then Cutts says... "...different top level domains, like x.com, x.ca, are not a concern."... K, well now I'm confused again, because wouldn't the same site at www.domain.com and also at www.domain.ca be duplicate or mirrored sites... unless of course they are in different languages or something...

obviously still dazed and confused, if anyone can clarify some of these points for me, it would be appreciated...
__________________
WebFoot Creative - Website Design, Marketing and SEO. Debt Help USA | Bankruptcy USA - For Help with Debt and Bankruptcy. |
|
||||
|
1) No idea here. I assume they mean you should not have the same content on two different pages maybe?

2) Like I said above, search engines are not smart. We must help them and guide them through our websites. This is just proper management and we should expect to have to do it if we want to show up in the SEs properly. If you don't care how you show up in the SEs then ignore these things.

4) I think linking to pages in multiple ways is fine as long as it makes sense. I don't think you want to do it in a spammy way that links to every page, on every page....gross

5) Just don't cloak.

6) I have said this a million times. Duplicate content is not really penalized. Google just shows what they feel is the original page/content first and sometimes sticks the rest of the dup pages in its supplemental index. Google only needs one copy of it, so why have many in the index? I can't say it enough: you are not getting penalized for duplicate content within the same website. Google just picks one to show. It could be the one you want, it could be the printer page, it could be one with session IDs, it could be one with...etc. IT IS UP TO YOU to organize your content in the manner in which you want it to show up in the search engine.

Quote:
|
|
||||
|
Good snippet of info here:
http://www.searchengineguide.com/sea...ws/006921.html Quote:
|
|
||||
|
Most of the issues seem to have been adequately addressed. But let's add my comments anyway. ;) Somehow the email reminders of replies did not get to me...

Within a site, it's not so much a penalty as a wasted opportunity. For every page that has 5 URLs for it, if that were one URL it would have 5 times the inbound links. That much better the ranking. You would be deciding which URL you wanted top, rather than Google deciding. Incrediblehelp stated it rather well, and the supplemental index is not a nice place for pages to be.

The opportunities for WebProWorld to make more $ - and help us find the info better

Quote:

Is WPW aware that there are solutions out there that are very well SEO'ed? I happen to have worked with vBulletin, vBSEO and my template mods enough that I know that you would be hard pressed to get a better combination. Add to that the amazing Google PR and number of links that this WPW site has.... you could certainly own the SEs for many more phrases.

I come here mainly because of the newsletters prompting me. When I am Googling for solutions on web issues, I seldom come across webproworld. That's a missed opportunity.

I thought that webproworld was not only about helping people, but was also about making $ from advertisers. There are enough adverts on newsletters and on this site. I would think that doubling site traffic from SEO'ing the site correctly would help with the $ side somewhat ;) The cost benefit analysis of SEO'ing a site such as this is well worth it. After all, that is what so much of the discussion on the site is all about. Obviously the time required for doing newsletters is worth it. Why not the SEO!

Robots.txt and Webmasterworld

I found the blog on the robots.txt at WMW. Rather cool. Interesting how all the search engines are disallowed from the site, yet Google still has current content in its index: http://www.google.com/search?q=site:...+other+file%22 - that thread was Feb 18th 2006 (not cached). I have not researched the issue, but I guess that they have used htaccess to make sure that Google can't see that robots.txt. Easier to disallow all search engines, then whitelist friendly ones.
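To show what "disallow all, then whitelist friendly ones" can look like in plain robots.txt terms, here is a minimal Python sketch using the standard library's `urllib.robotparser`. The rules, the bot names, and the `www.example.com` URL are made up for illustration. This is not WMW's actual file, and it says nothing about whether they used htaccess as guessed above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: one named crawler is allowed everywhere, all others nowhere.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

page = "http://www.example.com/forum/thread1.htm"
print(parser.can_fetch("Googlebot", page))     # True: empty Disallow allows everything
print(parser.can_fetch("SomeOtherBot", page))  # False: falls through to the * rule
```

Keep in mind that robots.txt is only a request to crawlers; a bot that ignores it has to be blocked at the server instead.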
__________________
T2DMan - Michael Brandon Search Engine Marketing SEO - specializing in SEO vBulletin |
|
|||
|
RELEVANT RESULTS! A search engine has to deliver RELEVANT RESULTS. If any search engine stops delivering relevant results the masses will go elsewhere.
I don't know what BMW and Ricoh did to be dropped by Google, but the fact they were dropped or penalized is nonsense! If someone is searching for BMW or Ricoh, and Google hands them results that don't include BMW or Ricoh, that person will find a different search engine.

To me search engine results seem to be getting worse and worse. A lot of my results are pages that are not unique content but instead a page full of generated links. What are these pages? How are they made? Why can't Google filter these sites out?

Someone will make a better search engine someday. Like Google used to be, it will be obscure and only the "in" people will know about it. You'll be "cool" if you are one of the people that know about it. More and more people will discover that if you don't want to waste 30 minutes looking at generated link pages then go to the new _ _ _ _ search engine.
WebProWorld is an iEntry, Inc. ® site - © 2009 All Rights Reserved Privacy Policy and Legal iEntry, Inc. 2549 Richmond Rd. Lexington KY, 40509 |