WebProWorld Part of WebProNews.com
WebProWorld > Search Engines > Google Discussion Forum

#1
10-04-2006, 10:27 PM
Webnauts
WebProWorld 1,000+ Club
Join Date: Aug 2003
Location: Worldwide
Posts: 7,399
Same content for HTML and PDF documents - Duplicated?

We are planning to offer some of our site's articles and tutorials in PDF format as well, with a download option on those pages for later reading.

Would that create a duplicate content issue?

#2
10-04-2006, 11:24 PM
Duncan Pollock
WebProWorld Veteran
Join Date: Jul 2003
Location: Niagara-on-the-Lake, Ontario, Canada
Posts: 895

I'm far from sure about the answer, but I think I'd be surprised if it was an issue.
For instance, I'm aware of how often the SERPs offer the choice of an html version if the listing shows as a .pdf file. This must mean that there are two URLs, each with the same content. What isn't so clear, of course, is whether the reverse (i.e. a .pdf alternative to an html page) applies.
I'll be interested to learn what one of our resident gurus has to say (and, ahem, you're usually one of them!).

Duncan
__________________
Acts as an Exclusive Buyer Broker for purchasers of residential, industrial, commercial, and investment properties in all parts of the Niagara Peninsula.
http://www.duncanpollock.com
http://www.iciniagara.com

#3
10-05-2006, 09:11 AM
SEOforGoogle
WebProWorld Veteran
Join Date: Jan 2005
Location: USA
Posts: 436

As long as your site is the originator of the content, you won't be penalized by Google - they understand that humans like content delivered in different formats.

#4
10-05-2006, 09:17 AM
greeneagle
WebProWorld 1,000+ Club
Join Date: Dec 2003
Location: Houston
Posts: 5,715

I am going to agree with SEOforGoogle.

I haven't had any problem with an industrial site I manage; in fact, it seems to reinforce positions to some degree.

Another good play there is to make sure that the verbiage on both is different. A good technical editor shouldn't have a problem with that, and it's good insurance just in case - not to mention wagging a longer tail.

Ken

#5
10-05-2006, 02:35 PM
sonjay
WebProWorld New Member
Join Date: Mar 2006
Location: Sunny Southwest Florida
Posts: 4

I agree that it's probably not an issue, SEO-wise, but I still like to put all my PDFs in a folder that's disallowed in robots.txt.

I don't want the pdf showing up in the SERPs. I'd prefer that people follow the link to the html page on the site, where they can either read the article online or download the pdf. This way, at least they've made it to the site, and they have an opportunity to see what else is available there. If they just click the pdf link in the SERPs, they never even make it to the site, and they might well have no clue where the pdf even came from.
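
For anyone who wants to double-check a setup like this, here is a minimal sketch in Python. It assumes a hypothetical /pdf/ folder and example.com URLs (neither comes from this thread) and uses the standard library's urllib.robotparser to confirm that the rules keep crawlers out of the PDFs while leaving the HTML pages open. The helper name is_crawlable is made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every crawler is asked to stay out of /pdf/.
# The folder name is an assumption, not something stated in the thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /pdf/
"""


def is_crawlable(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Placeholder URLs for the HTML article and its PDF copy.
    for url in (
        "http://www.example.com/articles/tutorial.html",
        "http://www.example.com/pdf/tutorial.pdf",
    ):
        status = "crawlable" if is_crawlable(ROBOTS_TXT, url) else "blocked"
        print(url, "->", status)
```

Keeping every PDF under one folder means a single Disallow line covers all of them, so nothing has to be listed file by file.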

#6
10-05-2006, 03:00 PM
greeneagle
WebProWorld 1,000+ Club
Join Date: Dec 2003
Location: Houston
Posts: 5,715

I absolutely do not put them in a disallowed folder. With clean code and verbiage differences, I have found they act as a second sentry in the SEs and many times are preferred by visitors.

Ken

#7
10-07-2006, 08:04 PM
Webnauts
WebProWorld 1,000+ Club
Join Date: Aug 2003
Location: Worldwide
Posts: 7,399

Quote:
Originally Posted by greeneagle
I absolutely do not put them in a disallowed folder. With clean code and verbiage differences, I have found they act as a second sentry in the SEs and many times are preferred by visitors.

Ken

You mean they will not fall into the supplemental results?

#8
10-07-2006, 10:28 PM
Hiops
WebProWorld Member
Join Date: Oct 2006
Location: Canada
Posts: 90

I don't think it's gonna be an issue, because the content is on the same URL, and I would put it into a disallowed folder, because maybe some visitors prefer to see content in PDF format.

#9
10-07-2006, 11:07 PM
Webnauts
WebProWorld 1,000+ Club
Join Date: Aug 2003
Location: Worldwide
Posts: 7,399

Quote:
Originally Posted by Hiops
I don't think it's gonna be an issue, because the content is on the same URL, and I would put it into a disallowed folder, because maybe some visitors prefer to see content in PDF format.

I don't understand. It is not a problem, but I should still put it in a disallowed folder, even though visitors might prefer the PDF format?

Sure, I will provide a PDF file. The question is whether I should exclude it from the SE index to avoid duplicate content issues.

So what do you mean?

#10
10-07-2006, 11:41 PM
Hiops
WebProWorld Member
Join Date: Oct 2006
Location: Canada
Posts: 90

I meant it should not be a problem with duplicate content, because they spot duplicate content on different sites. That's what I think, but I'm not sure about that. And if Google can penalize for dup content on the same site, that would be ridiculous, but absolutely possible, because you never know what is on their mind. Bearing this in mind, it would be better to put it in a disallowed folder to avoid dup content issues.
__________________
Alex

#11
10-08-2006, 06:57 AM
greeneagle
WebProWorld 1,000+ Club
Join Date: Dec 2003
Location: Houston
Posts: 5,715

John,

I have gone back and checked my results, comparing the HTML pages vs. their PDF counterparts, and I have just a couple in supplemental.

The ones in supplemental are almost identical to the HTML versions in content. The ones that didn't end up in the supplemental index have enough difference to avoid same-site duplicate content problems.

I would think there are 4 good approaches here, depending on your site's needs and your developers' level of expertise:

1) Use CSS printer profiles (print style sheets) on XHTML pages, avoiding "copies" altogether.

2) Differentiate the content between the parseable "printer friendly" (e.g. PDF) format and the HTML versions.

3) Keep the same content, but disallow bots/crawlers from the PDFs.

4) Use the PDF versions only and allow full indexing.

In my view, I have placed them in order of what I see as best practice.
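
For approach 2, one rough way to see how far apart the two versions really are is to compare their text. The sketch below is only an illustration: it uses Python's standard difflib module, assumes the text has already been extracted from the HTML page and from the PDF, and the function name similarity is made up here. It is not how any search engine measures duplication, and no particular score is known to be "safe".

```python
from difflib import SequenceMatcher


def similarity(html_text: str, pdf_text: str) -> float:
    """Return a word-level similarity ratio between 0.0 and 1.0.

    1.0 means the two texts contain the same words in the same order.
    Both arguments are assumed to be plain text that was already
    extracted from the HTML page and the PDF (extraction is not shown).
    """
    html_words = html_text.lower().split()
    pdf_words = pdf_text.lower().split()
    # autojunk=False switches off difflib's heuristic that ignores very
    # frequent items in long sequences, so common words still count.
    matcher = SequenceMatcher(None, html_words, pdf_words, autojunk=False)
    return matcher.ratio()


if __name__ == "__main__":
    # Made-up sample sentences, only to show the call.
    html_version = "Our valves are rated for high pressure steam service"
    pdf_version = "These valves are rated for use in high pressure steam lines"
    print(round(similarity(html_version, pdf_version), 2))
```

A ratio close to 1.0 suggests the PDF is a near copy of the page, which matches the pairs I described as almost identical; lower values mean the verbiage differs more.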

Ken

#12
10-09-2006, 04:57 PM
incrediblehelp
WebProWorld Moderator
Join Date: Jan 2004
Location: Live in Cincy Now
Posts: 7,654

I agree, not really an issue, but to be sure, do what sonjay has suggested.

Put them in a separate PDF folder and disallow the bots, to ensure no issues (supplemental or duplication) will ever arise. Remember, algos change all the time. Just because we have no evidence today that this is an issue doesn't mean it won't be one a year from now.