Matt Cutts' answer on duplicate detection
Emark2009
08-04-2006, 08:46 AM
So Matt Cutts says in his videos that Google does:
- exact duplicate detection
- near duplicate detection
His advice on duplicated content is: "make sure your pages are quite different from each other"
Ok, this doesn't bring us much further..
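To make "near duplicate" a bit more concrete: the thread does not say how Google actually measures it, so the following is only a rough sketch of one common textbook idea (word shingles compared with Jaccard similarity), not Google's method. The function names `shingles` and `jaccard` and the sample texts are made up for illustration.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word sequences ("shingles") in text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b, k=3):
    """Jaccard similarity of the shingle sets of two texts (0.0 to 1.0)."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0  # both texts too short to shingle: treat as identical
    return len(sa & sb) / len(sa | sb)


# Hypothetical sample texts: a tour description, a straight copy, a rewrite.
original = "Arrival at airport. Transfer to hotel. Late night city trip."
copied = "Arrival at airport. Transfer to hotel. Late night city trip."
rewritten = ("You land in the afternoon, a driver takes you to the hotel, "
             "and we tour the city after dark.")

print(jaccard(original, copied))
print(jaccard(original, rewritten))
```

The point of the sketch is only that a different layout around the same sentences does not change a text-based score like this one, while genuinely rewritten copy does.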
Question:
A friend of mine sells online travel packages in Brazil on his website. I have a site about Brazil as well and recently he asked me: "Why don't you sell my packages on your site?" So I agreed!
Something like:
Day 1
Arrival at airport. Transfer to hotel. Late night city trip.
Day 2
Early breakfast. Jeep safari with guide. Evening: Brazilian dinner.
Day 3
Visit to historical part of city
Day 4
Transfer to airport
(of course with a lot more details)
I am planning to copy these travel packages to my site. Same content, different layout.
Should I worry about duplicate content?
Did you ask this question on Matt Cutts' forum? I'd be interested in hearing the answer.
Use Pablo Picasso's (http://www.picasso.fr/anglais/) technique: "steal" and make it unrecognizable.
I read somewhere that he said: "If I see a motif / picture I like, I steal it and make it unrecognizable."
I think he is also supposed to have said: "I have not done anything else during my life than speculate in people's bad taste."
I disagree with the last sentence. He has some wonderful paintings, and his (good) paintings appreciate in real value year after year.
jackson992
08-04-2006, 04:18 PM
Interesting, since usually it's not a problem if you duplicate content within the same site.
shellared
08-04-2006, 05:57 PM
Many merchants use the vendor's copy for their marketing. I've seen advice that if you're thinking about doing this, don't.
Change the copy and you should be good to go.
What a pita. Especially if you sell widgets with some tech specs and the same selections and no sales pitch to the copy. How do you make the tech specs differ from your competitors'? Imagine selling hundreds of types of nails, screws, nuts and bolts and having to add copy to each just to be different so you're not dropped from G.
visio
08-04-2006, 06:03 PM
I would not give much importance to duplicate content, whether within the same site or not. I haven't yet come across any penalty because of duplicate content.
Emark2009
08-04-2006, 06:46 PM
Just to be clear: I'm talking about 2 different sites!!
Kgun, interesting point!
Although in some cases almost impossible to put into practice, like Shellared illustrates.
jacobwissler
08-04-2006, 06:55 PM
It would appear obvious that certain types of websites would almost have to contain duplicate content, such as hotel, rental car, airline or travel sites that ask for the date of departure. There are only so many different ways to phrase it.
mnsandy
08-04-2006, 07:03 PM
I agree, steal and change, that's the way to go. We also call it Article Rewriting and adding additional content :)
good luck my friend :)))
pemburung
08-04-2006, 07:07 PM
We have concerns about the same issue. We have various itineraries to show the different things we can do in an area. Naturally, some things are the same, and so we just cut and paste those days. As the itineraries are detailed, one 3 week itinerary can run 3 or 4 pages (for fast loading time as we use images to illustrate; we care about those good folks out there without DSL or who - gasp - actually surf from home for private things and not the office). This means it's possible for a few pages to be very similar, or even identical, if the breaks occur just so. Now, we've spent a fair amount of time polishing up these day descriptions to be just right, and don't want to go changing them, and why should we spend the time, which is money? Yes, we could use a noindex command, but then someone looking for detailed search strings may not find us. A page that happens to get linked to by others, and should appear high, may be just the one - in fact you know it will be - that is noindexed.
We also put clients' itineraries on our site, so they and friends can access them, and naturally a lot of these have very similar content. While we can, and do, also noindex these, we leave those to be indexed that have some different content, and do get inquiries based on those itineraries when people are searching. Again, these will have some very similar content to other parts of our website.
On the issue of duplicated material off-site, there is the issue of reselling someone else's tour, as the original poster described. We are international, and a lot of people prefer to buy a foreign tour in their own country rather than deal directly with an overseas operator; we also represent some foreign tours. Once again, the tour operator has spent time making the description just right; it's a complete waste of time to change it for the sake of it.
All of this is good for the searcher; they find the sort of results their longtail searches are looking for. Personally, I hate looking for something, and finding six different descriptions that all turn out to be the same product. It wastes my time. It's absolutely legitimate for different resellers to be selling the same thing, as many offer different services around the product. But if it's the same product, I want to know that up front. Google is making everyone waste time with this issue, as far as off-site duplication goes.
I guess it comes down to this: if Google wants to start dropping pages and sites from its index, and given its gazillion dollar income using other people's content, then it has a responsibility to make sure it is not harming those people - actually de facto business partners - by its actions. Few sites should be dropped by algo; most sites should be hand checked to see if they are in fact legitimately using dupe content. While there may be some obvious dupe content sites that could be identified by algo, at some stage there occurs a huge gray area. And so far Google has not shown much finesse with gray.
jacobwissler
08-04-2006, 09:06 PM
I doubt Google cares who they harm. Continental Airlines will have more links than a small travel agency, so the little guy has to buy PPC. Major bling.
pemburung
08-04-2006, 10:34 PM
Agreed about CO. However, it's not so much an issue about where you appear, but that you can appear, if the search is tailored well enough. Being dropped, and therefore not appearing even if the search terms - and the willingness of the searcher to dig deep on the pages - is perfectly tailored to what you offer, is the problem.
But that raises a good point. You don't buy PPC, but rely on organic, have duplicate content, and get nuked. Do you still get nuked if you do buy PPC? Can G take your money but not allow access to your page on their pages?
jacobwissler
08-04-2006, 11:01 PM
No, PPC does not nuke your organic listings. I think the most important issue is to have a page that looks good. A click is worthless if the page looks stupid, like a made-for-AdWords page.
pemburung
08-04-2006, 11:08 PM
Actually, what I meant was: do you still get nuked for duplicate content if you have PPC, under the same circumstances when you would get fried for dupe if you only had organic?
jacobwissler
08-04-2006, 11:32 PM
Who cares if you are nuked if your PPC is #1? I have never been nuked by Google. Organic is #1 (SEO Houston, Houston SEO, search engine optimization Houston), but I am not in an industry that requires duplicate content, such as travel sites.
edhan
08-04-2006, 11:58 PM
I do believe these types of duplication will not be penalized. In the travel industry, the itinerary from the agent / affiliate has to be the same, since the main site will be handling the tours. You can't possibly change the itinerary, but for the site overall you can make some differences in design and links. I have done this for a few other travel sites; most of the content is similar but they are not being penalized.
Cyclops
08-05-2006, 12:46 AM
Matt Cutts......I don't know why anyone bothers to read his blog, he's proved to be wrong on so many issues that his credibility has flown out the window.
On the duplicate content issue, read any SEO forum and you will find this same question asked over and over again. It's been proved by so many people (myself included) that duplicate content isn't an issue.
It's a myth, like most of the stuff Matt Cutts spouts.
Few sites should be dropped by algo; most sites should be hand checked to see if they are in fact legitimately using dupe content. While there may be some obvious dupe content sites that could be identified by algo, at some stage there occurs a huge gray area. And so far Google has not shown much finesse with gray.
If any SE company should have the resources, it should be Google, but I think you demand too much of a SE.
Matt Cutts......I don't know why anyone bothers to read his blog, he's proved to be wrong on so many issues that his credibility has flown out the window.
Disagree. The SEO business may be compared to finance. The person who makes fewer mistakes and handles downturns best is the winner. If there were a person who was always right, you could get his advice and you would soon have the global fortune. I can tell you, if you worked for IBM, you would not have a chance of being correct on everything you say about the company. We used to say IBM is not compatible with itself. Maybe the same should be said about Google.
Conclusion for SEO (Google): If you are less wrong than other people, people should listen to you.
DrTandem1
08-05-2006, 09:09 AM
I suspect two different sites with small portions of duplicate content, such as your ad, are not a problem.
jacobwissler
08-05-2006, 09:31 AM
Someone (I suspect a former girlfriend) cloned my home page at SEO Houston but left off all the interior pages, and inserted bogus phone numbers. The fake site even has Page Rank (1), but it has never hurt my Page Rank or SERPs.
I tried contacting the host, and they refused to take it down even though it was clearly 100% duplicate, except for the bogus phone numbers.
Someone (I suspect a former girlfriend) cloned my home page at SEO Houston but left off all the interior pages, and inserted bogus phone numbers. The fake site even has Page Rank (1), but it has never hurt my Page Rank or SERPs.
I tried contacting the host, and they refused to take it down even though it was clearly 100% duplicate, except for the bogus phone numbers
My underline.
Interesting. Should be tried in court. IMO the host and the girlfriend should lose and pay you the present value of all future net lost income (including degraded brand) + at least 20 % for inconvenience loss. But I am an economist and not a lawyer.
That loss is very difficult to compute, but it could amount to millions if your business is large enough and it has degraded your brand.
P.S. Note, when I write steal content, I write "steal".
Saying: If you "steal" from one person it is plagiarism. If you "steal" from two persons it is research. How much did Shakespeare "steal" from Dante and other authors?
jacobwissler
08-05-2006, 10:20 AM
I see it more as a prank because none of the internal pages or Flash were copied, and the phone numbers don't work, so no one can do business with the company. It was not an attempt to benefit from my brand; it was intended to upset me. That's her style. My site is www.seohouston.com the copied site is http://www.seo-explosion.com
Off topic or related subject?
I see it more as a prank because none of the internal pages or Flash were copied, and the phone numbers don't work, so no one can do business with the company. It was not an attempt to benefit from my brand; it was intended to upset me. That's her style. My site is www.seohouston.com the copied site is http://www.seo-explosion.com
There are fine, small distinctions in the English language. I do not know the exact meaning of the word prank.
That being said, when I have your site on the left screen and her site on the right screen, they are so similar (unless you use a public template) that I see it as plagiarism that degrades your brand. Even with a common public template the sites are so similar, because of identical content, that I see it as plagiarism that degrades your brand.
It may also degrade your brand if you let it stay there without further actions.
jacobwissler
08-05-2006, 11:11 AM
There is not much I can do, except complain to the host. I am not in the mood for a large legal bill.
DrTandem1
08-05-2006, 12:13 PM
Off topic or related subject?
There are fine, small distinctions in the English language. I do not know the exact meaning of the word prank.
A prank is a practical joke. Like taking a paper bag, filling it with dog crap, placing it on someone's doorstep, lighting the bag on fire, ringing the door bell and running away. They open the door and promptly stomp out the fire! Ah, to be a kid again!
Off topic or related subject?
There are fine, small distinctions in the English language. I do not know the exact meaning of the word prank.
A prank is a practical joke. Like taking a paper bag, filling it with dog crap, placing it on someone's doorstep, lighting the bag on fire, ringing the door bell and running away. They open the door and promptly stomp out the fire! Ah, to be a kid again!
Excellent explanation.
Then, IMO they are degrading / destroying his brand. In my view it is criminal activity.
jacobwissler
08-05-2006, 05:49 PM
A prank is when a Hummer driving, Rolex wearing, super beautiful bitch from Hell steals the remote to your Plasma TV but leaves the TV. Monetary loss, $14, anger factor, 100%. If she had taken the TV, I would have called the cops, but a $14 remote, I just ordered another.
Spoke to the hosting company for the 3rd time, they claim they will send a 24 hour warning and if the site is not changed, they will pull it down. Remains to be seen if they actually do this.
universalsid
08-06-2006, 12:34 AM
In my opinion,
1. This is an easy-to-use tool to check for content duplication.
You can try it:
www.horsesearchengine.com/hedir/sitechecker.php (http://www.horsesearchengine.com/hedir/sitechecker.php)
2. In this case, if you find content duplication, you can consider using good alternative words to give the same message without changing the major keyphrases. It does not take long to do that.
3. If you only copy and paste as much as you have mentioned, I don't think you are going to be penalised. Content duplication from another site may be a concern if you copy substantial matter (say, one or more paragraphs) from it.
4. For example, if you search for a specific product, you can see pages containing the same description (product features) appear in the Google SERPs. How would you change the product features of a mobile handset from one site to another? Google understands that.
5. Probably in the webmaster guidelines or inside Sitemaps, Google has indicated that content duplication is an issue and that pages with the same content on the same website should be carefully restricted from Googlebot's access. For example, the printer-friendly version of a web page should be restricted using robots.txt.
6. Matt Cutts' blog is probably the only reliable link between Google and the vast webmaster community. Each topic it covers on Google Search has been discussed so much because this is useful stuff that gives substantial guidance to web publishers.
Hope that helped.
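Point 5 above (keeping printer-friendly copies away from Googlebot with robots.txt) could look roughly like this. It is only a sketch: the `/print/` directory and the example.com URLs are assumptions for illustration, not something taken from Google's guidelines, and Python's standard `urllib.robotparser` is used just to check that the rule does what is intended.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: printer-friendly pages are assumed to live
# under /print/ and are blocked for Googlebot only.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /print/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The printer-friendly copy is blocked, the normal page is still crawlable.
print(parser.can_fetch("Googlebot", "http://www.example.com/print/tour-brazil.html"))
print(parser.can_fetch("Googlebot", "http://www.example.com/tours/tour-brazil.html"))
```

Note that this only tells the crawler not to fetch the printer-friendly copies; the regular pages stay untouched.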
If I remember correctly, the Google PageRank of SEO Houston (http://www.seohouston.com/) was 6; now it is 5. How do you explain that?
Is this an example of reduced eProperty value of your site because of that duplication? If the answer is yes, I will estimate the present value of your future losses to at least USD 100 000.
jacobwissler
08-06-2006, 11:02 AM
Happy to report the offending site has been removed, after my 3rd phone call to Host Rocket, the hosting company.
Yes my SEO Houston PR fell from a 6 to a 5 and my web design page fell from a 7 to a 6, but I still love my position in organic Google SERPS, so I do not worry about the little green bar.
To be more precise: if that is the reason the PageRank dropped from 6 to 5, that is a drop of about 17 %. Let us assume that this is the drop in eProperty value if the alternative is constant PageRank after Bigdaddy etc.
If the alternative is an increase in eProperty value from 6 to 7 (perhaps less probable), the drop in value is 29 %. Let us allow for this possibility and estimate the drop in value at 20 %.
I also require a 20 % markup to cover the inconvenience loss. Then we get this problem:
X = present value of your (web?) business.
(0.2 × X) × 1.2 = 100 000
X ≈ USD 416 667
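For anyone who wants to check the arithmetic, here is a small sketch that simply reruns the numbers from this post. It takes the post's own assumptions at face value (PageRank as a linear measure of value, a 20 % drop, a 20 % markup, USD 100 000 loss); the variable names are made up for illustration.

```python
# Relative drops in the toolbar PageRank number, as used in the post.
drop_vs_constant = (6 - 5) / 6   # PR stays 6 as the alternative
drop_vs_increase = (7 - 5) / 7   # PR would have risen to 7 as the alternative

# Solve (drop * X) * markup = loss for X, the present value of the business.
estimated_loss = 100_000   # USD, the poster's estimate
assumed_drop = 0.20        # the poster's compromise between 17 % and 29 %
markup = 1.20              # 20 % added for inconvenience

business_value = estimated_loss / (assumed_drop * markup)

print(f"{drop_vs_constant:.0%}, {drop_vs_increase:.0%}")
print(f"X = USD {business_value:,.2f}")
```

This only confirms that the numbers are internally consistent; it says nothing about whether PageRank is a sensible proxy for business value.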
If you use PageRank as a measure of eProperty value (do you have a better metric?) and combine that with the Modigliani-Miller Theorem (http://www.investopedia.com/terms/m/modigliani-millertheorem.asp) you may observe that my estimates are not so wrong.
Questions:
1. Do you say that criminal and I mean criminal activity on the internet (a relatively new industry) should not be punished?
2. If you do not have the resources to take them to court, OK.
Assumption: As far as I have understood from your postings, your business is not small.
P.S. Do you know what the world's best investor, Warren Buffett (hated on Wall Street) says:
You may lose money for my company, but if you destroy my brand I will be ruthless.
Maybe that is one of the reasons why he is hated on Wall Street.
I fully agree with him, and in my opinion they have destroyed / degraded your brand.
Good, I put that hosting company on my internal black list and the name (since it does not exist any longer) of the copy site in the category web criminals..
jacobwissler
08-06-2006, 01:03 PM
I like your style. Let's all move to Norway. Probably has better weather than Houston in August (terribly hot and humid).
schmeetz
08-06-2006, 03:54 PM
If generating dupcon did in fact result in penalties, it would be far too easy for the scum of the net to floor anyone of their choosing.
SEs may take it into consideration. It's just too difficult to determine.
Someone would truly have to sacrifice their site for the greater good and test this theory.
Any offers? Lol
and even then, it would only apply to that industry or topic specific site.
jacobwissler
08-06-2006, 04:24 PM
I do think that Google once stated, in an official statement, that who links TO you could never harm your site. I have already tested this and found it to be false. When I purchased major ($5,000 per month) advertising on an off topic site, my site disappeared from SERPS. When I pulled the advertising off, my site went back to #1.
Obviously, inbound links can hurt, and a person with malice could harm a site by linking to it from another site. In this case, it was a PR 9 major media site, which some Google filter determined to be attempted manipulation, and I disappeared from SERPS until the advertising was removed.
Obviously, inbound links can hurt, and a person with malice could harm a site by linking to it from another site. In this case, it was a PR 9 major media site, which some Google filter determined to be attempted manipulation, and I disappeared from SERPS until the advertising was removed.
Of course if a PR(9) link to your site is (eliminated) that will hurt your own PR.
What you are saying is that there is a strong relation between the anchor text (?) on a site with high PageRank and your SERPs.
Interesting, since it clearly demonstrates that there is, in some cases, a strong relation between the pagerank of your site or the site that links to you, the anchor text of that link and SERP position.
I do think that Google once stated, in an official statement, that who links TO you could never harm your site.
I would not take the Google guidelines as 100 % true in a fast developing SE world. How often are these guidelines updated?
But are you 100 % sure that is the true explanation?
Maybe they mean it cannot hurt your ranking if a PR(0) site or a banned site links to you. The link you mention is so important that it is a different case. It at least hurts your PageRank if high-PageRank (strong vote) sites link to your site with focused and relevant anchor text and these links are deleted. It is natural that there is a relation between PageRank and SERP position. It is a factor in the overall SERP algorithm. Maybe that factor can dominate in some situations.
Simplify the matter. You have only one link to your site, the PageRank 9 site. Then this link is deleted. Try to imagine the effect on your PageRank (and SERP position).
jacobwissler
08-07-2006, 12:44 PM
I am 100% certain that an inbound link from a PR9 media site wiped me out of SERPS, and when the major media advertising was removed, my site instantly went back to #1. Some Google filter saw the advertising as an attempt to manipulate SERPS, and I was gone.
I am 100% certain that an inbound link from a PR9 media site wiped me out of SERPS, and when the major media advertising was removed, my site instantly went back to #1. Some Google filter saw the advertising as an attempt to manipulate SERPS, and I was gone.
This indicates, as expected, that the Google algorithms get smarter and smarter every day.
pemburung
08-07-2006, 04:19 PM
Jacob, if I understand it what you are saying is you took out an ad on a major media site - let's say, eg, MSN. Google saw this as an unrelated inbound link, and penalized your site. Once you removed this ad/link, your site reappeared.
This can't be what happened. This is the same as being penalized for putting up a billboard on a highway, or an ad in a newspaper. If it were true, then a huge amount of revenue from ads across the web would be jeopardized, and/or tens of thousands of sites would disappear from the Google SERPs.
jacobwissler
08-07-2006, 04:26 PM
It happened. The ad was a lot more than a link, and was combined with total saturation of radio and TV advertising. The moment the link went up, my site fell off SERPS, when the link came down, within 24 hours, I was back to #1.
pemburung
08-07-2006, 04:29 PM
That is too scary - it doesn't suggest, as Kgun did, that G is getting smarter, but that its algo has become completely unreliable; hence so has G.
I am 100% certain that an inbound link from a PR9 media site wiped me out of SERPS, and when the major media advertising was removed, my site instantly went back to #1. Some Google filter saw the advertising as an attempt to manipulate SERPS, and I was gone.
My bolding.
That is too scary - it doesn't suggest, as Kgun did, that G is getting smarter, but that its algo has become completely unreliable; hence so has G.
I disagree. Even randomizing may be used as a last strategy to confuse a smart manipulator.
And in advanced control theory, it is shown that you need a chaotic (deterministic) process to control another chaotic process.
Example: A football lying in a saddle (saddlepoint control) that hangs from a rope in the roof. That is (close to) a chaotic movement. You can control that movement with a computer program, so that the football does not fall out of the saddle.
Do you see any possibility for Google to use advanced control theory?
And some relevant and good sites / pages (let us say 10 %) may suffer if that is used as a method to penalize the other 90 % manipulated sites / pages.
jacobwissler
08-07-2006, 04:56 PM
The answer makes for an easy marketing decision: do not accept links; come at the public from the air with massive radio and TV ads to drive traffic to your site. Google does not (yet) monitor the airwaves or impose a penalty for this form of advertising. Every $1 spent on airtime should bring $7 through the front door in gross sales.
This is what the big corporations do, and it works for small business, too.
pemburung
08-07-2006, 05:46 PM
Kgun, sorry, saddlesores or not, penalizing legitimate players in any field is not a smart technique. It's a dumb technique. Easy, simple, and brainless. What you are suggesting is that the IOC randomly bans 10% of cross country skiers at the Olympics to dissuade all from taking drugs. Had this been the rule, Norway would have a far skimpier historical medal collection than it currently does. I mean, every Olympics could have been 2006 for it......
Kgun, sorry, saddlesores or not, penalizing legitimate players in any field is not a smart technique. It's a dumb technique. Easy, simple, and brainless. What you are suggesting is that the IOC randomly bans 10% of cross country skiers at the Olympics to dissuade all from taking drugs. Had this been the rule, Norway would have a far skimpier historical medal collection than it currently does. I mean, every Olympics could have been 2006 for it......
Did you note my underline?
1. I do not think your comparison is relevant. You are talking about human action on a restricted population.
2. I am talking about automatic filters on millions of internet sites.
pemburung
08-08-2006, 09:47 AM
Kgun, the comparison is relevant. Random is random, whether picked out of a hat by a human hand or a machine generated number. The size of the population, and the criteria for inclusion in that population, are also irrelevant. All that is relevant is what happens once you are a member of that population, and your comment was that the random penalization of maybe 10% of the population as a deterrence could occur and this would be OK. As the population includes legitimate sites, and the selection is random, it must follow that, as you say, some of these sites may be penalized.
As far as the underlined may goes, you acknowledged that a percentage may be affected. Ethically, that legal sites even may be affected is OK in a population which has indicated its willingness to participate on that basis, but wrong in a system that is imposed without acknowledgement. If G says that this is the price of being searched by its bots, and also makes the public aware that a percentage of relevant sites may not be included, on a random basis, then all's OK. If it doesn't say these things, but does them while suggesting that it is conducting a complete and thorough search, with appropriate results, then it is acting unethically.
And also as far as the may goes: there's a saying justifying doctors "wasting" time being taught about rare events such as snakebite. When you are the person dying, you do not care about the rarity of the event. Similarly, when your site gets nuked and your income or just information output dies, you are unconcerned with the "fair" random nature of this event. Maybe we could shoot down one airliner per million flights to serve as a warning to hijackers not to try?
I hope this is my last post since I think we are a little off topic.
In any science there is uncertainty. That is one difference between science and religion. If Google is scientific in their applied information science, they should use a lower significance level than 10 %.