#1 webmasterjunkie, 12-12-2005, 10:20 AM
Parsing Another Website With PHP

My son, who plays a video game called SOCOM 3, came to me with an interesting question. They have a website at
Code:
http://socom3.scea.com/  <= Short URL
where their stats are stored in a database. You can go to the website and search for a player's stats:
Code:
http://socom3html-prod.svo.pdonline.scea.com:10070/SOCOM3_HTML/stats/Stats_CareerSearch_Submit.jsp?userName=INSANE.CASPER&gameMode=0
The only catch is that you have to be logged in to get to the page. He gave me his username and password for the site, and I've tried passing them in every way I could find.

My question is, how can I get to the page, and grab the info? Any suggestions?

#2 webmasterjunkie, 12-12-2005, 02:20 PM
Here's what I have so far

This is my code so far, but it doesn't work: the login verification never goes through.
Code:
<?php
$ch = curl_init();
$url = "http://socom3html-prod.svo.pdonline.scea.com:10070/";
$url = $url . "SOCOM3_HTML/stats/Stats_CareerSearch_Submit.jsp?userName=";
$stats_for = "INSANE.CASPER";
$url = $url . $stats_for . "&gameMode=0";
$user_name = "username";
$user_pass = "password";
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, 'PHP scraper 0.01');
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_USERPWD, "($user_name.:$user_pass)");
curl_setopt($ch, CURLOPT_POST, "10070");
curl_setopt($ch, CURLOPT_HEADER, 1);
$content = curl_exec($ch);
curl_close($ch);
// prefix root-relative links with the site's base URL (plain string match, no regex needed)
$content = str_replace('"/', '"http://socom3html-prod.svo.pdonline.scea.com:10070/', $content);
echo $content;
?>
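
The last step of the script above turns root-relative links into absolute ones so that they still resolve when the page is echoed from another host. A minimal sketch of that rewrite as a reusable function follows. It uses str_replace, since the pattern is a literal string and the ereg_* functions were removed in PHP 7. The function name and the sample link are made up for illustration.

```php
<?php
// Hypothetical helper: prefix root-relative links (href="/...", src="/...")
// with a base URL. Like the original one-liner, this is a plain string
// match on a double quote followed by a slash, not an HTML parser.
function absolutize_root_links(string $html, string $base): string
{
    // Normalise the base so it ends with exactly one slash.
    $base = rtrim($base, '/') . '/';
    return str_replace('"/', '"' . $base, $html);
}

// The link below is an invented example, not taken from the real site.
$page = '<a href="/SOCOM3_HTML/example.jsp">Stats</a>';
echo absolutize_root_links($page, 'http://socom3html-prod.svo.pdonline.scea.com:10070'), "\n";
```

Because the match is purely textual, it would also rewrite protocol-relative links such as "//host/path" and any other quoted string that starts with a slash, so treat it as a quick fix for display purposes only.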

#3 oomwrtu, 12-12-2005, 02:25 PM

cURL (command line or via PHP). Simple as that :D. It's a bit difficult to find resources on it, so if you need help using it, let me know. It allows you to request a page exactly as if the server were a regular user. You can set things like the cookie storage location, the user-agent, and GET and POST data (POST is what you would be looking for to log in). The result can be returned as a variable, allowing you to do whatever parsing you need on it. If you can't find a sample, let me know and I can work with you. cURL has been a life-saver for me, lol.

EDIT: You beat me to it, I will look at your code and see what I can see...

EDIT2: Try this and see how it works, obviously I am unable to try it myself :D. If there is something wrong with it and you can't figure it out, I will be back tonight and I can see what I can do again.
Code:
<?php

// login and store cookie data
$cookielocation = "socomcookies.txt";
$ch = curl_init();
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookielocation);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookielocation);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$url = "https://socom3html-prod.svo.pdonline.scea.com:10079/SOCOM3_HTML/account/Account_Login_Submit.jsp";
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, "userName=" . $usernamehere . "&passWord=" . $passwordhere);
$result = curl_exec($ch);
curl_close($ch); // closing the handle makes libcurl write the cookie jar to disk

// get actual data
$ch = curl_init(); 
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookielocation);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookielocation);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$url = "http://socom3html-prod.svo.pdonline.scea.com:10070/SOCOM3_HTML/stats/Stats_CareerSearch_Submit.jsp?userName=" . $usernamehere . "&gameMode=0";
curl_setopt($ch, CURLOPT_URL, $url);
$result = explode("\n", curl_exec($ch) );
curl_close($ch); 

// can parse here, currently outputting each line
foreach( $result as $currline ) { echo $currline . "\n"; }

?>
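
The login request above concatenates the credentials straight into the POST body, which breaks as soon as a password contains a character such as & or =. A minimal sketch of building the body and the stats URL with http_build_query follows. The field names userName, passWord and gameMode come from the thread; the function names and the sample credentials are hypothetical.

```php
<?php
// Hypothetical helpers; only the field names and the stats path come
// from the thread, everything else is illustrative.

// Build an application/x-www-form-urlencoded login body.
function build_login_body(string $user, string $pass): string
{
    // The explicit '&' avoids depending on the arg_separator.output ini setting.
    return http_build_query(['userName' => $user, 'passWord' => $pass], '', '&');
}

// Build the stats URL for the player being looked up, who does not have
// to be the same account that logs in.
function build_stats_url(string $base, string $player, int $gameMode = 0): string
{
    $query = http_build_query(['userName' => $player, 'gameMode' => $gameMode], '', '&');
    return rtrim($base, '/') . '/SOCOM3_HTML/stats/Stats_CareerSearch_Submit.jsp?' . $query;
}

echo build_login_body('some.user', 'p&ss=word'), "\n";
echo build_stats_url('http://socom3html-prod.svo.pdonline.scea.com:10070', 'INSANE.CASPER'), "\n";
```

The string returned by build_login_body can be passed to CURLOPT_POSTFIELDS in place of the hand-built one.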
__________________
Help Matt With College

Help a deserving student pay off his college loans!

#4 webmasterjunkie, 12-12-2005, 02:59 PM

Thanks so much for your help, and your offer for more. I implemented the changes you made, and all the output I get is the numeral 1. That's it: "1".

#5 oomwrtu, 12-13-2005, 05:45 AM

Sorry about not getting back to you last night like I said I would, I got overloaded with schoolwork. I can't wait until next year when I can go off to college and have more free time, yay! lol Anyways...

It would seem as though something got set wrong. I would first check the cookie file to see if it has anything in it. It might look something like this:

Code:
scea.com	FALSE	/SOCOM3_HTML/account/Account_Login_Submit.jsp	FALSE	0	username	*Username*
scea.com	FALSE	/SOCOM3_HTML/account/Account_Login_Submit.jsp	FALSE	0	password	*random characters*
If you don't have anything at all there, obviously it's an issue with the first part. I would probably also try changing the code to the following, so that you can further narrow down where the problem might be:

Code:
curl_setopt($ch, CURLOPT_POSTFIELDS, "userName=" . $usernamehere . "&passWord=" . $passwordhere);
// display actual output of the first page
echo curl_exec($ch);
echo "<hr />\n<hr />\n";
Code:
curl_setopt($ch, CURLOPT_URL, $url);
// display actual output of the second page
echo curl_exec($ch);

// can parse here, currently outputting each line
//foreach( $result as $currline ) { echo $currline . "\n"; }
Let me know what you get with those changes. I will try to be a little more prompt this time :D.

#6 webmasterjunkie, 12-13-2005, 10:14 AM

I added the HEADER option to both $ch handles, and the second one returned the following header:
Code:
HTTP/1.1 302 Moved Temporarily
Location: http://socom3html-prod.svo.pdonline....ESSION_EXPIRED
Content-Type: text/html;charset=UTF-8
Content-Length: 0
Date: Tue, 13 Dec 2005 15:42:43 GMT
Server: Apache-Coyote/1.1
The socomcookies.txt file I made is empty. Its permissions are set to 777.

http://www.eastcoastassassins.com/get_stats.php
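
The 302 response quoted above suggests the server does not see a valid session on the second request. A minimal sketch of detecting that case in code, rather than by reading the header by eye, follows. The Location URL is truncated in the post and ends in ESSION_EXPIRED; the sketch assumes the full marker is SESSION_EXPIRED. The function names and the sample header are hypothetical.

```php
<?php
// Hypothetical helper: split a raw header block (as returned when
// CURLOPT_HEADER is enabled) into the status code and a header map.
function parse_response_head(string $head): array
{
    $lines = preg_split('/\r\n|\n|\r/', trim($head));
    $status = 0;
    // The status line looks like "HTTP/1.1 302 Moved Temporarily".
    if (preg_match('~^HTTP/\S+\s+(\d{3})~', (string) array_shift($lines), $m)) {
        $status = (int) $m[1];
    }
    $headers = [];
    foreach ($lines as $line) {
        if (strpos($line, ':') === false) {
            continue; // skip anything that is not "Name: value"
        }
        [$name, $value] = explode(':', $line, 2);
        $headers[strtolower(trim($name))] = trim($value);
    }
    return ['status' => $status, 'headers' => $headers];
}

// True when the response is a redirect whose target mentions the
// assumed SESSION_EXPIRED marker.
function is_session_expired(array $response): bool
{
    $location = $response['headers']['location'] ?? '';
    return $response['status'] >= 300 && $response['status'] < 400
        && stripos($location, 'SESSION_EXPIRED') !== false;
}

// Example input; the host and path are made up for illustration.
$head = "HTTP/1.1 302 Moved Temporarily\r\n"
      . "Location: http://example.com/login.jsp?reason=SESSION_EXPIRED\r\n"
      . "Content-Length: 0\r\n";
var_dump(is_session_expired(parse_response_head($head)));
```

When CURLOPT_FOLLOWLOCATION is on, the returned text can contain one header block per hop, so this sketch would need to be applied to each block separately.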

#7 webmasterjunkie, 12-13-2005, 02:32 PM

OK, I made small changes to the script we've worked on. This is what I have now:
Code:
<?php

$user_socom_name = "";
$user_socom_pass = "";
$socom_stats_for = "INSANE.CASPER";

// login and store cookie data
$cookielocation = "socomcookies.txt";
$ch = curl_init();
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookielocation);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookielocation);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$url = "https://socom3html-prod.svo.pdonline.scea.com:10079/SOCOM3_HTML/account/Account_Login_Submit.jsp";
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, "userName=" . $user_socom_name . "&passWord=" . $user_socom_pass);
$result = curl_exec($ch);
curl_close($ch); // closing the handle makes libcurl write the cookie jar to disk

// get actual data
$ch = curl_init();
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookielocation);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookielocation);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$url = "http://socom3html-prod.svo.pdonline.scea.com:10070/SOCOM3_HTML/stats/Stats_CareerSearch_Submit.jsp?userName=" . $socom_stats_for . "&gameMode=0";
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, 0);
$content = curl_exec($ch);
curl_close($ch);
// prefix root-relative links with the site's base URL (plain string match, no regex needed)
$content = str_replace('"/', '"http://socom3html-prod.svo.pdonline.scea.com:10070/', $content);
echo $content;

?>
Now the socomcookies.txt file has something in it:
Code:
# Netscape HTTP Cookie File
# http://www.netscape.com/newsref/std/cookie_spec.html
# This file was generated by libcurl! Edit at your own risk.

socom3html-prod.svo.pdonline.scea.com	FALSE	/SOCOM3_HTML	FALSE	0	JSESSIONID	DFA9DF36522315F8775B6C2B0F58615C
It still will not allow me to access the page we are aiming for, though. I know you're busy, but I just wanted to update you.
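
The cookie file shown above is in the tab-separated Netscape format that libcurl writes: domain, subdomain flag, path, secure flag, expiry, name, value. A minimal sketch of checking it from PHP, instead of opening the file by hand, follows. The function names are hypothetical; the sample line is the one from the file above.

```php
<?php
// Hypothetical helper: read cookies from the text of a Netscape-format
// cookie file such as the one libcurl writes for CURLOPT_COOKIEJAR.
function parse_cookie_jar(string $text): array
{
    $cookies = [];
    foreach (preg_split('/\r\n|\n|\r/', $text) as $line) {
        // libcurl marks HttpOnly cookies with this prefix; strip it so
        // the line is not mistaken for a comment.
        if (strpos($line, '#HttpOnly_') === 0) {
            $line = substr($line, strlen('#HttpOnly_'));
        }
        if ($line === '' || $line[0] === '#') {
            continue; // blank line or comment
        }
        $fields = explode("\t", $line);
        if (count($fields) !== 7) {
            continue; // not a cookie line
        }
        [$domain, $subdomains, $path, $secure, $expires, $name, $value] = $fields;
        $cookies[] = [
            'domain'  => $domain,
            'path'    => $path,
            'secure'  => $secure === 'TRUE',
            'expires' => (int) $expires, // 0 means a session cookie
            'name'    => $name,
            'value'   => $value,
        ];
    }
    return $cookies;
}

// Return the value of the first cookie with the given name, or null.
function find_cookie(array $cookies, string $name): ?string
{
    foreach ($cookies as $cookie) {
        if ($cookie['name'] === $name) {
            return $cookie['value'];
        }
    }
    return null;
}

$jar = "# Netscape HTTP Cookie File\n"
     . "socom3html-prod.svo.pdonline.scea.com\tFALSE\t/SOCOM3_HTML\tFALSE\t0\tJSESSIONID\tDFA9DF36522315F8775B6C2B0F58615C\n";
var_dump(find_cookie(parse_cookie_jar($jar), 'JSESSIONID'));
```

In a real script the text would come from file_get_contents on the cookie file. Note that a JSESSIONID only shows that the server opened a session; it does not by itself show that the login was accepted.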

#8 oomwrtu, 12-13-2005, 03:01 PM

Wow, this is really stumping me. I only have a few more things you can try:
  • Add curl_setopt($ch, CURLOPT_VERBOSE, 1); to each of the executions (it should show all of cURL's output and such).
  • Change the follow location option to true (1) for both of them, like this: curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); That was one thing I just noticed, and it should take care of the redirect problem.
  • If those don't help, I would try posting on a PHP dedicated forum such as http://www.phpdn.net. Although I am experienced with it, sometimes it just takes a fresh look :D

Let me know what you get.

EDIT: lol, went to grab something to eat in the middle of typing this and saw that you responded again, let me check what you've got out.

EDIT2: That is very strange, it is accessing the first page fine and not the second... at this point, I would go ahead and post on the forum I suggested, since they are dedicated to PHP. Obviously you can refer to this thread so that you don't have to retype everything. Very sorry I couldn't be much of a help, but let me know what you find, it might be useful to me in case I ever run into a difficult site.

Now that I think about it, you could just go chew out the web design team for making such a difficult website... lol.

#9 webmasterjunkie, 12-13-2005, 03:29 PM

Please don't apologize, you've already taken the script leaps and bounds past where I had it.

I will post on the other forum you suggested. Thanks again for all of your help, and I'll post the fix - if I ever get it.

Also, if you think it would help, I'll get you a username and password to use. There's no financial information stored, just game stats. So the only thing that could be messed up is his score, but he'd probably hate me for life for that.

Let me know if you're interested.

#10 oomwrtu, 12-13-2005, 08:29 PM

Absolutely, I can't stand to leave a problem unfixed :D. I will send you a PM with contact info (although you could just send it that way, it's up to you). You can trust me with that information; as a teenage guy (almost an adult though, lol), I know what games mean to the people that play them 8).
Tags: parsing, php, website


