Page 1 of 2
Results 1 to 10 of 22

Thread: I Need A Way To Scrape A Site's Content

  1. #1
    WebProWorld MVP morestar's Avatar
    Join Date
    Jun 2007
    Location
    Toronto, Ontario (Burlington)
    Posts
    4,249

    I Need A Way To Scrape A Site's Content

    Hello members and non-members of WebProWorld.com. I need your help today finding a way to scrape/grab all the textual (and possibly image) content, and maybe even the page names, of a website.

    I'm going to be moving the new site we created for a client over to their new domain shortly, and I'd like to find a tool that will grab the content so I can reuse it later, after we launch.

    If anyone knows of a reliable, and hopefully free, tool that'll do this scraping, that would be fabulous! Thanks, everyone.

    Join a free dating site and meet single people in your area.
    Submit your content at my content publishing site and promote your business, services or opinions.

  2. #2
    WebProWorld MVP crankydave's Avatar
    Join Date
    Aug 2004
    Posts
    4,732
    You don't have/can't get FTP access to the current site?

  3. #3
    WebProWorld MVP morestar's Avatar
    Join Date
    Jun 2007
    Location
    Toronto, Ontario (Burlington)
    Posts
    4,249
    Nope, not yet Dave...and I don't want to wait.

    I'm seeing programs everywhere (well, scripts), but it's 2010; there must be at least one site that offers this as a free service.

    *still looking
    Last edited by morestar; 12-30-2010 at 10:45 AM.

  4. #4
    Member julien_simon's Avatar
    Join Date
    Dec 2010
    Location
    Vancouver
    Posts
    76
    This one isn't too bad. It will get you the files for a website. It won't grab any database or anything password-protected, of course. Let me know what you think: httrack.com

  5. #5
    WebProWorld MVP morestar's Avatar
    Join Date
    Jun 2007
    Location
    Toronto, Ontario (Burlington)
    Posts
    4,249
    Thank you Julien, that program httrack is more than great and will now change the dynamic of the work I do in different areas too!

  6. #6
    Member julien_simon's Avatar
    Join Date
    Dec 2010
    Location
    Vancouver
    Posts
    76
    Quote Originally Posted by morestar View Post
    Thank you Julien, that program httrack is more than great and will now change the dynamic of the work I do in different areas too!

    Glad I could help

  7. #7
    WebProWorld MVP kgun's Avatar
    Join Date
    May 2005
    Location
    Norway
    Posts
    7,999

  8. #8
    WebProWorld MVP morestar's Avatar
    Join Date
    Jun 2007
    Location
    Toronto, Ontario (Burlington)
    Posts
    4,249
    With httrack, be wary. There's probably a setting for this, but I just went with the defaults (kgun knows how much I love default settings, re: Opera), and httrack seemed to also start scraping content from external links. I may have ticked Google and YouTube off a moment ago...

    Somehow I was in YouTube's bin folder?
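
For anyone scripting their own crawl instead, the "wandering off to external links" problem can be avoided by only following links whose host matches the starting page. Here is a minimal Python sketch of that idea. It is not how httrack works internally, and the function name `same_site_links` and the example.com URLs are made up for illustration.

```python
from urllib.parse import urljoin, urlparse


def same_site_links(page_url, hrefs):
    """Return absolute http(s) URLs from hrefs that stay on page_url's host."""
    base_host = urlparse(page_url).netloc.lower()
    kept = []
    for href in hrefs:
        # Resolve relative links against the page they were found on.
        absolute = urljoin(page_url, href)
        parts = urlparse(absolute)
        # Keep only web links on exactly the same host (assumption: no
        # subdomains should be followed; loosen this check if they should).
        if parts.scheme in ("http", "https") and parts.netloc.lower() == base_host:
            kept.append(absolute)
    return kept


# Hypothetical links found on a hypothetical page.
links = same_site_links(
    "http://www.example.com/about/",
    ["team.html", "/contact", "http://www.youtube.com/watch", "mailto:me@example.com"],
)
print(links)
```

Only the first two links survive the filter; the YouTube and mailto links are dropped because their host or scheme does not match.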

  9. #9
    Member julien_simon's Avatar
    Join Date
    Dec 2010
    Location
    Vancouver
    Posts
    76
    Quote Originally Posted by morestar View Post
    With httrack, be wary. There's probably a setting for this, but I just went with the defaults (kgun knows how much I love default settings, re: Opera), and httrack seemed to also start scraping content from external links. I may have ticked Google and YouTube off a moment ago...

    Somehow I was in YouTube's bin folder?
    Did you find anything good?

    I have not used it myself actually so I wasn't aware of that setting problem.

  10. #10
    WebProWorld MVP kgun's Avatar
    Join Date
    May 2005
    Location
    Norway
    Posts
    7,999
    This is one of the simpler PHP cURL tasks described in the first chapters of the book cited in my last post. You can download the code from the book without buying it.

    You won't appreciate cURL until you have learned it. Start with that book and PHP cURL, and you won't need a third-party site for this or similar tasks.
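
To give a rough idea of what such a script does, here is a minimal sketch of pulling the page title and visible text out of a page. It is written in Python with only the standard library rather than PHP cURL, and it is not the code from the book. The names `TextExtractor`, `extract` and `fetch` and the sample HTML are assumptions made up for illustration.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TextExtractor(HTMLParser):
    """Collects the <title> and the visible text of an HTML document."""

    def __init__(self):
        super().__init__()
        self.title_parts = []
        self.text_parts = []
        self._in_title = False
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)
        elif not self._skip_depth and data.strip():
            self.text_parts.append(data.strip())


def extract(html):
    """Return (title, list of text chunks) for an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.title_parts).strip(), parser.text_parts


def fetch(url):
    """Download a page as text (assumes UTF-8; not called in this sketch)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


# Hypothetical page content, so the sketch runs without network access.
sample = (
    "<html><head><title>About Us</title><style>p{color:red}</style></head>"
    "<body><h1>Our Team</h1><p>We build sites.</p></body></html>"
)
title, text = extract(sample)
print(title)
print(text)
```

For a real site, `extract(fetch(url))` would be run once per page, ideally with a pause between requests so the server is not hammered.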

    Related WPW thread: http://www.webproworld.com/webmaster...thical-WebBots.
    Last edited by kgun; 12-30-2010 at 12:36 PM.
