08-03-2008, 09:15 AM
kgun
Re: My Site has been stolen

Quote:
Originally Posted by wilderness
How do they manage to scrape a whole site?
"A web crawler (also known as a web spider, web robot, or—especially in the FOAF community—web scutter[1]) is a program or automated script which browses the World Wide Web in a methodical, automated manner. Other less frequently used names for web crawlers are ants, automatic indexers, bots, and worms.[2]
This process is called web crawling or spidering. Many sites, in particular search engines, use spidering as a means of providing up-to-date data. Web crawlers are mainly used to create a copy of all the visited pages for later processing by a search engine that will index the downloaded pages to provide fast searches. Crawlers can also be used for automating maintenance tasks on a website, such as checking links or validating HTML code. Also, crawlers can be used to gather specific types of information from Web pages, such as harvesting e-mail addresses (usually for spam).
A web crawler is one type of bot, or software agent. In general, it starts with a list of URLs to visit, called the seeds. As the crawler visits these URLs, it identifies all the hyperlinks in the page and adds them to the list of URLs to visit, called the crawl frontier. URLs from the frontier are recursively visited according to a set of policies".

Source: Web crawler - Wikipedia, the free encyclopedia
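
To make the seeds and crawl frontier idea concrete, here is a minimal sketch in Python. It is not how any particular crawler is written: the `crawl` function, the `LinkExtractor` class, and the in-memory `SITE` dictionary (standing in for real HTTP requests) are all made up for illustration, and the only "policy" it applies is breadth-first order with a page limit.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seeds, fetch, max_pages=100):
    """Visit pages breadth-first, starting from the seed URLs.

    `fetch` is a caller-supplied function mapping a URL to its HTML,
    or None if the page cannot be retrieved (an assumption of this sketch).
    """
    frontier = deque(seeds)  # URLs still to visit (the crawl frontier)
    seen = set(seeds)        # URLs already queued, so nothing is queued twice
    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)  # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)
    return visited


# A tiny in-memory "site" used instead of real network requests.
SITE = {
    "http://example.com/": '<a href="/a.html">A</a> <a href="/b.html">B</a>',
    "http://example.com/a.html": '<a href="/b.html">B</a>',
    "http://example.com/b.html": '<a href="/">home</a>',
}

pages = crawl(["http://example.com/"], SITE.get)
```

Starting from one seed, the loop reaches every page that is linked from somewhere, which is why a whole site can be copied once a scraper has its front page.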

"Advanced Site Crawler 2003 4.2

Description

Allow you to search a website and download images, videos, documents, sounds..."
Brothersoft Editor: "Advanced Site Crawler 2003 is a Windows-based shareware that has two main functions. The first one is to search inside a website that you will choose and will follow one link after the other to search for information. The second function allows you to search a website and download images, videos, documents, sounds and much more! You can download files into separate categories or create a duplicate of the original website".


Source: Download Advanced Site Crawler 2003, Advanced Site Crawler 2003 4.2 Download

Search for "advanced site crawler" and you will find more information.

Unless you block bad bots in .htaccess (if you are on an Apache server) or set up a spider trap, your whole site can be copied in seconds.
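
The logic behind both defences can be sketched in a few lines of Python. This is only an illustration of the idea, not .htaccess syntax and not the scripts linked below: the `BotFilter` class, the user-agent string "ExampleScraper", and the trap path "/trap/" are all hypothetical. The assumption is that the trap URL is hidden from human visitors and disallowed in robots.txt, so only badly behaved bots ever request it.

```python
class BotFilter:
    """Decides whether a request should be served.

    Combines a user-agent blocklist with a spider trap: any client
    that requests the trap path has its IP address banned.
    """

    def __init__(self, blocked_agents, trap_path):
        # Lower-case once so matching is case-insensitive.
        self.blocked_agents = [agent.lower() for agent in blocked_agents]
        self.trap_path = trap_path
        self.banned_ips = set()

    def allow(self, ip, user_agent, path):
        if ip in self.banned_ips:
            return False
        agent = (user_agent or "").lower()
        if any(blocked in agent for blocked in self.blocked_agents):
            return False
        if path == self.trap_path:
            # Only a bot ignoring robots.txt should end up here.
            self.banned_ips.add(ip)
            return False
        return True


# "ExampleScraper" and "/trap/" are made-up values for this sketch.
bot_filter = BotFilter(["ExampleScraper"], "/trap/")
```

A user-agent check alone is easy to get around, because a scraper can send any user-agent string it likes. The trap catches those too, as long as they follow the hidden link.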

Related:

Scripts: Spider Blocking :: .htaccess, PHP, Block Bad Bots with .htaccess or PHP

http://evolt.org/article/Using_Apach...bots/18/15126/

A useful tool: Copyscape - Website Plagiarism Search - Web Site Content Copyright Protection

Last edited by kgun; 08-03-2008 at 09:27 AM.