lol I have to laugh MJ, because I remember on my very first day accidentally using FrontPage to scrape, and it did indeed do a great job!
Originally Posted by mjtaylor
Google, Yahoo, Bing and WikiLeaks definitely don't use FrontPage.
You could even view the source and copy and paste it into WordPad. Many great sites are made with WordPad: copy, paste, and modify until the content is unrecognizable. Some even advise a newbie to start with WordPad and an FTP program.
But I would not call that content (screen) scraping. And if the site has more than a few pages, that technique is inherently slow.
And if you want to separate content from markup etc., you need a scraper with parsing abilities, one that could even put the content directly into your database.
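To make that concrete, here is a rough sketch of what "parse, then store" could look like. It is only an illustration under my own assumptions: it uses Python's standard library (html.parser and sqlite3), and the names TextExtractor, extract_text, store_page and the pages table are made up for this example, not taken from any particular scraper.

```python
import sqlite3
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text and skips the bodies of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def extract_text(html):
    """Return the page text with all markup removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def store_page(conn, url, html):
    """Write the extracted text into a hypothetical 'pages' table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, content TEXT)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO pages (url, content) VALUES (?, ?)",
        (url, extract_text(html)),
    )
    conn.commit()


if __name__ == "__main__":
    sample = (
        "<html><head><style>p{color:red}</style></head>"
        "<body><h1>Title</h1><p>Hello <b>world</b></p></body></html>"
    )
    conn = sqlite3.connect(":memory:")  # in-memory database, demo only
    store_page(conn, "http://example.com/", sample)
    print(conn.execute("SELECT content FROM pages").fetchone()[0])
```

A real scraper would also have to fetch the pages, cope with broken markup and character encodings, and respect the site's terms, none of which is shown here.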
For raw speed, inline assembly and/or plain C is fastest.
We may have different opinions of what a great job is. I would use neither FrontPage nor Microsoft's C/C++ platform.
Originally Posted by morestar
Last edited by kgun; 12-31-2010 at 11:51 AM.