| Search Engine Optimization Forum SEO is much easier with help from peers and experts! The WebProWorld SEO forum is for the discussion and exploration of various search engine optimization topics. Any non (engine) specific SEO or SEM topics should go here. |
|
First of all, apologies if this asks for something that is already out there. I couldn't find the answer I was looking for.
I sort of know that scraper sites exist and I've got a vague idea what they do. But I have no idea how to identify them, and I don't really know what the impact on our sites would be. I was playing with the Google Sitemap tool, in particular the allinurl: command, and I found a couple of sites that referenced our site and led to a 404 page with a "URL moved" description. It looked odd. Any pointers gratefully received.
|
|||
|
If I wanted to create a scraper page for "waterproof widgets" I would search on this phrase at any of the major search engines, copy the resulting SERPs page, post it to a page called www.domain.com/waterproof-widgets.html and then put AdSense on it.
By collecting the scraps from the SERPs I'm likely to get "content" that is highly relevant to the search engines, and the page can very likely rank for that phrase. If I wanted to get fancy I could go and pull more content from each of the links listed, or I could pull the results from all three major engines and mix them up to get more content, make that content look more original, and make it less detectable. To get even fancier, I could pull the results from the SERPs daily via an automated script so that the pages are updated regularly, giving them even more pull with the search engines.
|
||||
|
Hmmmm... I don't think we needed instructions on how to build a scraper site.
Yes, there are lots of scraper programs. They advertise on AdWords a lot; just search "scraper." I have tried a few as shareware trials, and the sophistication they employ to enable theft is chilling. These programs are expensive and likely to be obsolete in the next Google update. In general I think the game is about over for the scrapers anyway; Google is hot on their case (or so it would seem).
Andi
__________________
...the Rockies may tumble, Gibralter may crumble... G & I Gershwin, 1937 |
|
|||
|
http://www.copyscape.com/ - enter your url and it will display any pages that use your content.
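For anyone who wants to check a suspect page by hand, here is a rough sketch of one common way to measure copied text: compare overlapping runs of words ("shingles") between your page and theirs. This is not how Copyscape works (that isn't described anywhere in this thread), and the function names and the 5-word window are assumptions made purely for illustration.

```python
import re


def shingles(text: str, size: int = 5) -> set[tuple[str, ...]]:
    """Return the set of overlapping word n-grams ("shingles") in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Text shorter than one window: treat the whole thing as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def overlap(original: str, suspect: str, size: int = 5) -> float:
    """Fraction of the original's shingles that also appear in the suspect text."""
    own = shingles(original, size)
    if not own:
        return 0.0
    return len(own & shingles(suspect, size)) / len(own)
```

A score near 1.0 suggests the suspect page carries most of your copy verbatim; a score near 0.0 suggests it merely mentions or links to you. Where to draw the line is a judgment call.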
|
|
||||
|
I know there are sites that use my content. I find them through the "referrer" stats because they just copy and paste my entire code, which means they are hotlinking to my graphics.
The copyscape site that you listed only showed sites that link to mine, not any that are copying my stuff. Links are good. Copying is bad. The copyscape site was not helpful.
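The referrer trick above can be automated. Below is a minimal sketch in Python that counts image requests arriving with a Referer from someone else's host. It assumes the server writes Apache's "combined" log format; the image extensions and the `own_hosts` parameter are assumptions for illustration, not anything specified in this thread.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Matches the request, status, size and Referer fields of a combined-format line.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+)[^"]*" \d{3} \S+ "(?P<referer>[^"]*)"'
)
IMAGE_EXT = (".gif", ".jpg", ".jpeg", ".png")


def hotlinking_hosts(log_lines, own_hosts):
    """Count image requests whose Referer is a host other than our own."""
    hits = Counter()
    for line in log_lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # not a GET/HEAD line in the expected format
        path = match.group("path").split("?", 1)[0].lower()
        if not path.endswith(IMAGE_EXT):
            continue  # only image requests are of interest here
        host = urlparse(match.group("referer")).hostname
        if host and host not in own_hosts:
            hits[host] += 1
    return hits
```

Calling `hotlinking_hosts(open("access.log"), {"www.example.com"}).most_common(10)` would list the ten hosts pulling the most graphics; the file name and host are placeholders.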
__________________
The Weedy Lady at http://www.happydaycards.com Free E Cards for holidays and all occasions, fun pages and great recipes. |
|
||||
|
Quote:
Of course the page I used makes use of some free-to-use articles for websites. It was a good, valid test of how effective it is, though. Plus it fed my curiosity as to how many other sites had that same article, so there is yet another use for copyscape if you use free content articles on your sites. It even picked up websites using the same text-based affiliate copy from the merchant as I have used. =)
Heidi
|
||||
|
Quote:
I checked a couple of client URLs and it did turn up some duplicate content which could be confused with scrapers. For example, we have a standard company introduction for one client which is often used on third-party sites as a link text descriptor, as well as on the homepage of the client company's website. It seems a useful tool; thanks for sharing it.
__________________
If you've worked in the Adult SEO industry, please tell me... how do you get it up? My web designers |
|
||||
|
Yes, I am with fxstudios also (fantastic company!), but because of the java script I use on about 50 of my pages I can't use the hotlinking feature. I am hoping to find time after the holidays to completely redesign those pages to implement the hotlinking protection. However, the last time I turned it on as an experiment one of my friends could not get any of my graphics on my pages, and that does not include the ones with the Anfy java script on them. I change the "location" of my graphics every few days and that helps a bit.
__________________
The Weedy Lady at http://www.happydaycards.com Free E Cards for holidays and all occasions, fun pages and great recipes. |
|
|||
|
It certainly was useful to give me an overview of the how-to. That made the penny drop for me, to some degree. The copyscape site has thrown up a few instances where our copy has shown up on really poor quality sites. So this leads on naturally to a few more questions:
1> What action can be taken? Legal?
2> I'm assuming turning them in to Google for AdSense may be useful. True?
3> What's the potential impact on my SERPs?
Thanks for all your help, I'm a wiser man already.
|
|||
|
Quote:
http://www.webproworld.com/viewtopic...hlight=#262082 Read the others as well if you like. |
|
||||
|
Well...the first part, as google junky pointed out, was covered to death. So I won't go there again.
So, with that said, on to 2) and 3).
2) If Google has AdSense running on the scraped SERP pages, it is unlikely they'll act any time soon. That's extra money in their coffers, and they're a for-profit corporation. So they've got a vested interest in keeping the spammers in. Mind you, sooner or later they'll have to crack down on this type of thing...not because of irrelevant SERPs and scraper pages finding their way in, but because of advertisers who pull out because they're pissed off at the irrelevant scraper pages they find their ads on. It may take a while, but Google will act...eventually.
3) Minimal, in the worst case. You don't control the content of the scraper pages, or any other pages that you don't have FTP or other access to. If search engines penalized you for what those pages do, it would be all too easy for competitors to knock each other out of the SERPs. I'm on at least 20 scraper pages that I know about (and hundreds more I probably don't), and nothing has ever happened to me in this regard. Again, I don't control it.
If you want to see some of them, use Google and search for "ADAM Web Design" and then one of 5fish.net, 360mediaworx.com, elehost.com, or abacus.ca. I'm not going to provide direct links since linking to spam is bad, mmmkay?
__________________
Toronto Web Design | Search Engine Friendly, Standards-Compliant Layouts | Walk on my Path (my blog) |
|
|||
|
I used to work for a company that took a question set input by the user, then used Java to scrape every known car insurance site in the UK and return an insurance quote from each of them.
A lot of companies have got wise to this sort of thing, which is why you see those distorted images with a random series of alphanumeric characters that you have to type in before you can proceed. Automated scrapers can't get past this.
__________________
Cooking made simple http://blokeinthekitchen.blogspot.com - http://twitter.com/blokeinkitchen |
|
|||
|
Quote:
|
|
|||
|
Quote:
__________________
Cooking made simple http://blokeinthekitchen.blogspot.com - http://twitter.com/blokeinkitchen |
|
||||
|
It's one thing to steal your content, but your bandwidth too?
1. Syndicate your content with RSS.
2. Take control of your server.
http://www.webproworld.com/viewtopic.php?t=55223
|
|||
|
Quote:
If you do a search for people who backlink to you, you will find many splog scrapers. I think ADAM is correct: it really doesn't hurt you. Think about how stupid it would be for a search engine to punish you for something you can't control. And it's good to see more ladies in the house. :)
__________________
SEO Blog |
|
||||
|
What is the purpose of a scraper site? I don't understand why someone would bother to build one ... seems like a waste of time.
__________________
Shirley Bradbury - Woman-owned web business in the wilds of Western Colorado Web Site Design & Marketing Web conferencing services |
|
|||
|
We used copyscape and were amazed at how brazen people were about stealing our content outright, verbatim.
We contacted the worst offenders (one was even a paralegal!) and they all removed the content ASAP. Copyscape says you can contact the offender's hosting company, ICANN, and the search engines. It's pretty much a bad thing to get branded as a plagiarist.
We have a SERIOUS problem with people stealing our images on eBay. We've contacted eBay multiple times, and the offenders multiple times. And eBay so far hasn't even replied to us! It's a serious issue and I'd sure like to know how to protect our images. We're on an Apache server (Unix). Any great ideas that won't affect the coming Xmas traffic or greatly upset the website?
|
|||
|
htaccess for hotlinking protection
http://altlab.com/htaccess_tutorial.html |
|
||||
|
Yes. Use .htaccess to get control of your web server.
You may also use PHP to prevent hotlinking from other sites.
http://www.sitepoint.com/books/
http://www.sitepoint.com/books/phpan...2ebb8dedea92f3 Chapter 7.
Books that should be in every web designer's book collection. You may download the first chapters free.
http://www.sitepoint.com/forums/
|
|||
|
You can prevent your site from being downloaded by using bot traps. For example: http://www.google.com/search?q=site%...world.com+trap
You must arrive with Google as the referrer, because they are cloaking.
|
||||
|
"Setting a Spider-trap
The best method of identifying bad bots is to create what is known as a Spider-trap. Create a directory, block that directory to all agents using robots.txt and link to the directory from a page (usually as a small 1x1 pixel link). Only bad bots will access that directory (ie they've ignored our robots.txt exclusion). These bots can then be directed to a script that will immediately grab their IP address, User Agent or Referrer and add it to an .htaccess file - so that they're banned from the site".
http://www.webproworld.com/viewtopic...cd74b4cd6fb731
And you may again use PHP.
|
||||
|
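<?php
// Request details a spider-trap script would typically record.
// The referrer and user agent are sent by the client and may be missing.
$url = isset($_SERVER["HTTP_REFERER"]) ? $_SERVER["HTTP_REFERER"] : "";
$browser = isset($_SERVER["HTTP_USER_AGENT"]) ? $_SERVER["HTTP_USER_AGENT"] : "";
$ip = $_SERVER["REMOTE_ADDR"];
?>
http://www.w3schools.com/php/php_functions.asp
Related link: http://no2.php.net/getenv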
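Putting the spider-trap pieces together, the flow might look roughly like the sketch below. It is written in Python rather than PHP purely for illustration; the function names and the `trap_dir` parameter are hypothetical, and the `Deny from` line assumes the older Apache 2.2-style access directives (Apache 2.4 needs mod_access_compat for them, or a `Require not ip` rule instead).

```python
import ipaddress


def robots_txt(trap_dir):
    """robots.txt body telling every well-behaved crawler to skip the trap."""
    return f"User-agent: *\nDisallow: /{trap_dir.strip('/')}/\n"


def record_trap_hit(ip, banned):
    """Ban a client that requested the trap directory.

    Returns the .htaccess line to append, or None if the IP is already banned.
    """
    # Validate first so arbitrary text can never be written into .htaccess.
    addr = str(ipaddress.ip_address(ip))  # raises ValueError if not an IP
    if addr in banned:
        return None
    banned.add(addr)
    return f"Deny from {addr}"
```

The script that serves the trap directory would call `record_trap_hit` with the client's address (the `REMOTE_ADDR` value from the PHP snippet above) and append the returned line to .htaccess. Banning by IP alone is the simplest choice; user agent and referrer are easy for a bot to fake.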
|
|||
|
Quote:
__________________
Cooking made simple http://blokeinthekitchen.blogspot.com - http://twitter.com/blokeinkitchen |
|
|||
|
Quote:
Here's what you're looking for, http://www.webproworld.com/viewtopic.php?p=264747 |
|
|||
|
here is the best tool to find scraper sites, period.
http://www.linkhounds.com/link-harvester/backlinks.php
__________________
SEO Blog |
WebProNews
WebProWorld is an iEntry, Inc. ® site - © 2009 All Rights Reserved Privacy Policy and Legal iEntry, Inc. 2549 Richmond Rd. Lexington KY, 40509 |