
Page 1 of 2
Results 1 to 10 of 13

Thread: Copy scape and duplicate content - more confusing than convincing.

  1. #1
    WebProWorld MVP kgun's Avatar
    Join Date
    May 2005
    Location
    Norway
    Posts
    7,999

    Question Copy scape and duplicate content - more confusing than convincing.

    These

    Marketing and encryption

    SEO blog content duplication?

    threads made me follow up with this post and two related articles. After reading these two threads and articles thoroughly you should ask yourself:
    1. Is it possible to identify duplicate content aside from trivial copy and paste (without citation)? Citing small sections of an article is not regarded as duplicate content.
    2. Is it possible to identify advanced article writing bots or software and article spinning?
    3. Can search engines that should have fairly advanced algorithms identify article spinning when the combinations are "infinite" for practical purposes?
    4. Is it possible to earn Ad money on your site if you compete with Ad pages produced by bots or article spinning software?
    5. How many (known and unknown) methods are there that mine the SERPs to make gateway and landing pages for an automatically generated site with thousands of pages filled with the same Ad code?
    6. Will good programmers that write automatic content creation software mined from the trillions of pages on the internet, take over at least the Ad space on the internet?
    7. How many affiliate and ad providers check and recheck their publishers on a regular basis?
    8. Should an online organization be set up on the internet to protect content writers?
    Last edited by kgun; 02-15-2012 at 08:45 AM.

  2. #2
    Senior Member SEOforGoogle's Avatar
    Join Date
    Jan 2005
    Posts
    506
    I'll try my best to add my $.02

    "Is it possible to identify duplicate content aside from trivial copy and paste (without citation)? Citing small sections of an article is not regarded as duplicate content."
    - Tough to say. I've seen content pass copyscape that was clearly used elsewhere.

    "Is it possible to identify advanced article writing bots or software and article spinning?"
    - I think there are certain patterns that reveal when the content is generated in this manner.

    "Can search engines that should have fairly advanced algorithms identify article spinning when the combinations are "infinite" for practical purposes?"
    - Again, I think they can detect certain patterns. I bet they have a "Mad Libs" attribute that looks for content that just swaps out adjectives while always keeping the rest of the content in the same place.
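
The "Mad Libs" idea above can be sketched in a few lines. This is only an illustration, not how any search engine is known to work: the function name `shared_skeleton` and the two sample sentences are made up for the example, and Python's standard `difflib.SequenceMatcher` stands in for whatever matching a real system might use.

```python
import re
from difflib import SequenceMatcher


def tokens(text):
    """Lowercase word tokens; punctuation is dropped."""
    return re.findall(r"[a-z0-9']+", text.lower())


def shared_skeleton(a, b):
    """Compare two texts word by word.

    Returns (ratio, swapped):
      ratio   -- share of words that sit in identical, same-order runs (0..1)
      swapped -- pairs of word runs that were replaced between those runs
    """
    ta, tb = tokens(a), tokens(b)
    sm = SequenceMatcher(None, ta, tb, autojunk=False)
    swapped = [
        (" ".join(ta[i1:i2]), " ".join(tb[j1:j2]))
        for tag, i1, i2, j1, j2 in sm.get_opcodes()
        if tag == "replace"
    ]
    return sm.ratio(), swapped


# Hypothetical spun pair: same skeleton, two adjectives swapped.
a = "Our amazing widget gives you fast results at a low price."
b = "Our fantastic widget gives you quick results at a low price."
ratio, swapped = shared_skeleton(a, b)
print(round(ratio, 2), swapped)
```

A high ratio together with a short list of swapped words is the "same place, different adjective" signal described above. Where to set the threshold is a judgment call that this sketch does not try to answer.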

    "Is it possible to earn Ad money on your site if you compete with Ad pages produced by bots or article spinning software?"
    - Absolutely.

    "How many (known and unknown) methods are there that mine the SERPs to make gateway and landing pages for an automatically generated site with thousands of pages filled with the same Ad code?"
    - I only know of a few, but I'm sure the black hatters have more advanced stuff that even we aren't aware of yet.

    "Will good programmers that write automatic content creation software mined from the trillions of pages on the internet, take over at least the Ad space on the internet?"
    - Partially. Long tail terms should be a piece of cake to get ranked for. Competitive terms will still need some link juice to get top rankings.

    "How many affiliate and ad providers check and recheck their publishers on a regular basis?"
    - Ha! I think the only time they check is when you sign up. After that, only if someone complains do they go back and look again.

    "Should an online organization be set up on the internet to protect content writers?"
    - Wow. Great question. I think Google is trying to do that with the author tag, giving people the chance to voluntarily proclaim that they are the producers of the content. A problem I see with an online organization is all the issues of validating identities, and relationships. What happens if I am an author for my own site, but get hired to write for another company? Who's the owner of the content? The quick answer is the company that hired the writer. But how will the search engines make that distinction?
    I think this last question deserves its own thread; it's that complex an issue.

  4. #3
    WebProWorld MVP kgun's Avatar
    Join Date
    May 2005
    Location
    Norway
    Posts
    7,999
    Thank you for your fast answer. Did you read the whole thread in the first link in my post? Look at the examples. Any comment on that?

  5. #4
    Banned
    Join Date
    Dec 2011
    Posts
    44
    Whatever you have written and assumed is possible, but will be detected, if not automatically, manually. Content syndication can’t be identified.

  6. #5
    WebProWorld MVP kgun's Avatar
    Join Date
    May 2005
    Location
    Norway
    Posts
    7,999
    This is not about content syndication.

    @SEOforGoogle

    You write about pattern recognition. As far as I know, software is very bad at pattern recognition. Software will have problems telling a bird from an aeroplane from a leaf from a cloud from a piece of paper from anything else moving in the air.

    If you walk on a populated street, you can easily identify a friend. Will software ever be able to do that? No, is my answer and that is the main reason why a computer will not repeatedly beat the best chess players. Read my article on ArticleNorway for additional information.

    And if you have infinite, or for practical purposes infinite, combinations, how is it possible to identify a pattern, a state or a section? How is it possible to identify a section from a seed article if you have enough articles? I would guess that you don't need many before the computing power of most computers is exhausted trying to identify the author or seed article. How can you identify a seed article behind closed doors? How can you identify sections that are drawn randomly from a semantic collection of articles? Are you sure that search engines have the best text mining and semantic indexing algorithms?
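
To put a rough number on "for practical purposes infinite": if a spun article has a number of slots and each slot can be filled independently, the number of variants is the product of the choices per slot. The figures below (40 slots, 3 synonyms each) are invented purely for illustration.

```python
from math import prod


def variant_count(choices_per_slot):
    """Number of distinct spun texts when every slot is filled independently."""
    return prod(choices_per_slot)


# Hypothetical article: 40 spots, each with 3 interchangeable phrasings.
slots = [3] * 40
n = variant_count(slots)
print(f"{n:,} variants (a {len(str(n))}-digit number)")
```

The count is finite but far too large to enumerate, which is the point being made here. It says nothing about whether a detector would have to enumerate the variants in the first place.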

    You did not answer my question above about the other posts, where search engine bots index sites like

    site.com/articlenorway.com

    that are automatically made by a PHP or other script and automatically populated with ad code.

    Isn't that proof that SE algorithms have problems identifying even the simplest form of SERP data mining?

    And since your nick indicates that you do search engine optimization for Google, and you mention Google in your answer, I would personally be careful about that. As you may know, nobody manipulates the organic SERPs more than Google, by delivering above-the-fold ads in a colour that is not easy to discern from organic hits. And I am sure that less than 5 % of random surfers know that they are ads. My conclusion is that Google will not be satisfied before they own the complete ad space on the internet. They are the biggest of all manipulators in my opinion. They push standards on others that they don't follow themselves.

    Own private copy, since this thread was removed from the front page of WPW for a reason unknown to me.
    Last edited by kgun; 02-16-2012 at 09:46 AM.

  8. #6
    Member SENtelligence's Avatar
    Join Date
    Feb 2011
    Posts
    96
    Hey kgun,

    Of course, a person will recognize a friend in the street, while software will, most likely, not recognize your friend (unless it has seen you two interact in the past). But I think no person can really do what Google is doing on a regular basis - find more or less relevant stuff in piles, upon piles, upon piles of pages of information.

    I'm no Google fan, I just don't think it makes sense to compare things that have too little in common to be compared effectively.

    Besides, Google's Panda is already taking care of the fact that Google can't really tell original content from spun content. And, as we are seeing the expansion of G+ into search, my guess is that, quite soon, it will become even harder for different tools to pretend they're people.

  9. #7
    Moderator chrisJumbo's Avatar
    Join Date
    Oct 2005
    Location
    Near Sacramento, CA
    Posts
    788
    1. Is it possible to identify duplicate content aside from trivial copy and paste (without citation)? Citing small sections of an article is not regarded as duplicate content. - Not with the amount of time we as humans have.
    2. Is it possible to identify advanced article writing bots or software and article spinning?
      - Sometimes. You often know it when you read it, but see examples. Also, it is a time issue.
    3. Can search engines that should have fairly advanced algorithms identify article spinning when the combinations are "infinite" for practical purposes?
      - Some. The cat walked across the street. The dog ran down the parkway. Are those duplicates of each other? Top CD Rates vs. Best Certificate of Deposit Yields. That pair seems more closely related, and a different processor could then take a second look. Of course, how much computing power would that second processor need?!?
    4. Is it possible to earn Ad money on your site if you compete with Ad pages produced by bots or article spinning software?
      - Yes. And here is where spammers and spinners can run into problems. It comes to authority. Their sites for the most part are rubbish. And I believe Google is getting better and better at figuring that out. The sites don't have authority. The sites aren't shared in any meaningful fashion. 99% of their links are probably from other auto generated sites. So I believe building authority is the key. Build solid relationships and link partners. That is the way to beat them. And this is where true professionals should unite.
    5. How many (known and unknown) methods are there that mine the SERPs to make gateway and landing pages for an automatically generated site with thousands of pages filled with the same Ad code?
    6. Will good programmers that write automatic content creation software mined from the trillions of pages on the internet, take over at least the Ad space on the internet? -- They may take over small portions, but again, build real authority and you will win.
    7. How many affiliate and ad providers check and recheck their publishers on a regular basis?
    8. Should an online organization be set up on the internet to protect content writers? -- Other than helping them build real authority it would seem like a never ending battle to locate and eradicate the spammers and spinners. Those efforts would be better spent on helping your authority.

  10. #8
    Junior Member
    Join Date
    Feb 2010
    Posts
    7
    A most interesting read! Thanks for the enlightenment and informative content.

  11. #9
    Senior Member NetProwler's Avatar
    Join Date
    Jan 2007
    Posts
    197
    Interesting and thought provoking post.
    Can search engines that should have fairly advanced algorithms identify article spinning when the combinations are "infinite" for practical purposes?
    Hardly. The human mind is vastly versatile and creative. For software, you need to identify a specific pattern. Computers may juxtapose and extend such patterns in their analysis. But the basic pattern still needs to be identified.

    A long time ago, I was a technical editor who had to oversee a dozen technical writers. I could tell who wrote a given article just by looking at the first paragraph. When it comes to writing, all initiates are advised to develop their own unique style. You can identify an author just by reading a few lines. There are subtle indications and clues by which a seasoned editor can easily identify a writer. Computers don't deal with subtlety in my view. For computers, everything has to be explicit, quantifiable and reducible to a set of concrete rules. Fuzzy logic is still way too fuzzy when it comes to human language.

  12. #10
    WebProWorld MVP williamc's Avatar
    Join Date
    Jul 2003
    Location
    On a really big hill in Kentucky
    Posts
    4,721
    1 Is it possible to identify duplicate content aside from trivial copy and paste (without citation)? Citing small sections of an article is not regarded as duplicate content.
    - I think it is possible to identify really simplistic attempts if by no other means than taking a series of snippets from an article and comparing that to others. I also think it would be a total waste of resources and expense for nearly no benefit.
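
The snippet comparison described above is close to what is usually called word shingling: cut each article into overlapping runs of k words and measure how many runs two articles share. The sketch below assumes k=3 and uses made-up sample sentences; the helper names `shingles` and `jaccard` are illustrative only.

```python
import re


def shingles(text, k=3):
    """Set of all overlapping k-word snippets in the text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b, k=3):
    """Shared snippets divided by all distinct snippets (0 = nothing shared, 1 = identical)."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 0.0  # both texts shorter than k words
    return len(sa & sb) / len(sa | sb)


# Hypothetical texts: an original, a copy with padding added, and something unrelated.
original = "Search engines try to filter duplicate content from their results."
copied = "As we know, search engines try to filter duplicate content from their results today."
unrelated = "The dog ran down the parkway chasing a red ball."

print(jaccard(original, copied))
print(jaccard(original, unrelated))
```

This catches copy and paste with light padding, which matches the "really simplistic attempts" mentioned above. Swapping a single word in every run of three would already push the score toward zero, and comparing every article against every other is where the resource cost comes in.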

    2 Is it possible to identify advanced article writing bots or software and article spinning?
    - in a nutshell, not from what I have seen from my own creations.

    3 Can search engines that should have fairly advanced algorithms identify article spinning when the combinations are "infinite" for practical purposes?
    - the algo may be made capable of doing it, however, again, it would be too much expense in resources better used elsewhere, IMO.

    4 Is it possible to earn Ad money on your site if you compete with Ad pages produced by bots or article spinning software?
    - of course it is, as you have the ability to write far more compelling copy to sway a user's reason to click.

    5 How many (known and unknown) methods are there that mine the SERPs to make gateway and landing pages for an automatically generated site with thousands of pages filled with the same Ad code?
    - how many coders are there in the world?

    6 Will good programmers that write automatic content creation software mined from the trillions of pages on the internet, take over at least the Ad space on the internet?
    - a good bit of it I think yes, again tho, it comes down to profitability.

    7 How many affiliate and ad providers check and recheck their publishers on a regular basis?
    - observation only, not many and not often.

    8 Should an online organization be set up on the internet to protect content writers?
    - there are dozens of them already.
    William Cross
    Web Development by Those Damn Coders
    Firearm Friendly Websites because our constitution matters
