Thread: I Need A Way To Scrape A Site's Content

  1. #21
    WebProWorld MVP morestar
    Join Date
    Jun 2007
    Location
    Toronto, Ontario
    Posts
    4,163
    Quote Originally Posted by mjtaylor
    FrontPage does it well, IMO.
    lol, I have to laugh, MJ, because I remember accidentally using FrontPage to scrape on my very first day, and it did indeed do a great job!

  2. #22
    WebProWorld MVP kgun
    Join Date
    May 2005
    Location
    Norway
    Posts
    7,711
    Google, Yahoo, Bing and WikiLeaks definitely don't use FrontPage.

    You could even view the page source and copy and paste it into WordPad. Many great sites are made with WordPad by copying, pasting and modifying until the content is unrecognizable. Some even advise a newbie to start with WordPad and an FTP program.

    But I would not call that content (screen) scraping. And if the site has more than a few pages, that technique is inherently slow.

    And if you want to separate content from markup etc. you need a scraper with parsing abilities that could even put the content directly into your database.
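
    To make that concrete, here is a minimal sketch in Python using only the standard library. It is an illustration, not a production scraper: the names TextExtractor, extract_text and store_page, the "pages" table and the example URL are all made up for this post, and it works on an HTML string you have already downloaded (for instance with urllib.request.urlopen).

```python
import sqlite3
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text and ignores the bodies of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.parts.append(text)


def extract_text(html):
    """Return the text content of an HTML string with the markup removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def store_page(conn, url, html):
    """Parse one page and write its text into a (hypothetical) pages table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, content TEXT)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO pages (url, content) VALUES (?, ?)",
        (url, extract_text(html)),
    )
    conn.commit()


if __name__ == "__main__":
    sample = (
        "<html><head><style>p{}</style></head>"
        "<body><h1>Hello</h1><p>World <b>now</b></p>"
        "<script>var x=1;</script></body></html>"
    )
    conn = sqlite3.connect(":memory:")  # in-memory database for the demo
    store_page(conn, "http://example.com/", sample)
    print(conn.execute("SELECT url, content FROM pages").fetchall())
```

    A real site usually needs more than this (character encodings, broken markup, politeness delays, robots.txt), so treat it as a starting point only.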

    Inline assembly and/or plain C is fastest.

    Quote Originally Posted by morestar
    ... it did indeed do a great job!
    We may have different opinions of what a great job is. Neither would I use FrontPage nor Microsoft's C / C++ platform.
    Last edited by kgun; 12-31-2010 at 11:51 AM.
    Conversations creates communities and conversions create profit.
