search bots and Javascript



pdstein
07-09-2007, 10:14 AM
I've read many times that the general consensus is that search bots can't read JavaScript. So I modified some of the code on my site in the hope that bots would see a direct link, while humans who click would go through a click tracking script:



<a href="http://www.otherSite.com" onClick="window.open('http://www.mysite.com/clickTrackingScript.php?bannerID=1234')">
Text
</a>


I'm finding, however, that some bots still request the tracking script. So is it a myth that bots can't read JavaScript, or does it really mean that bots can't read text/HTML generated by JavaScript?

Is there a better way to do what I'm trying to do?

wige
07-09-2007, 11:50 AM
In my experience, at least Googlebot is able to follow links in JavaScript, despite Google's claims to the contrary. I ran into this when Google started crawling pages from an AJAX system I was testing, where the only way to find the links was through the JavaScript. I'm not sure how thorough this method is, however. You might be able to scramble the address in a JavaScript function to prevent bots from crawling it.
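A minimal sketch of the scrambling idea, assuming the tracking URL is stored reversed in the page source and only reassembled when the link is clicked. The reversed-string scheme and the names `SCRAMBLED`, `unscramble` and `trackingUrl` are illustrative assumptions, not something from this thread. It only defeats simple pattern matching; a bot that actually executes the script would still see the real URL.

```javascript
// The tracking URL stored backwards, so no complete "http://..." string
// appears in the page source. (Illustrative scheme, not a standard.)
var SCRAMBLED = "=DIrennab?php.tpircSgnikcarTkcilc/moc.etisym.www//:ptth";

// Reverse the characters to recover the original URL prefix.
function unscramble(s) {
  return s.split("").reverse().join("");
}

// Build the full tracking URL for one banner ID.
function trackingUrl(bannerId) {
  return unscramble(SCRAMBLED) + encodeURIComponent(bannerId);
}
```

The onClick handler would then call something like `window.open(trackingUrl('1234'))` instead of containing the literal URL.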

seiretto
07-09-2007, 03:39 PM
Is there a better way to do what I'm trying to do?

Use AJAX. Here is how: add this
onClick="track_the_click('1234')"
to the relevant links and add the JavaScript below to each page:

<script type="text/javascript">
// Free with compliments from Seiretto ;-)
var url="http://www.mysite.com/clickTrackingScript.php?bannerID=";

// Return an XMLHttpRequest object, or false if the browser has none.
function getHTTPObject()
{
  if (typeof XMLHttpRequest != 'undefined')
  {
    return new XMLHttpRequest();
  }
  // Older versions of Internet Explorer only expose it as an ActiveX object.
  try
  {
    return new ActiveXObject("Msxml2.XMLHTTP");
  } catch (e) {
    try
    {
      return new ActiveXObject("Microsoft.XMLHTTP");
    } catch (e) {}
  }
  return false;
}

// Fire a background request to the tracking script for banner `id`.
function track_the_click(id)
{
  var http = getHTTPObject();
  if (!http) return; // no AJAX support: skip tracking, the link still works
  http.onreadystatechange = function()
  {
    if (http.readyState == 4) // 4 is complete
    {
      //alert(http.responseText); // uncomment to see the response!
    }
  };
  http.open("GET", url + encodeURIComponent(id), true);
  http.send(null);
}
</script>
If you do not want bots following the URL in your script, change this line
FROM:

var url="http://www.mysite.com/clickTrackingScript.php?bannerID=";
TO:

var url1="http://www.mysite";
var url2=".com/clickTrackingScript.php?bannerID=";
var url=url1+url2;
// Most bots don't put the two together, so they won't index it (at least currently).
Hope it helps.

Dave.
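One caveat with the asynchronous request in Dave's script: the click also navigates to the other site, and browsers may cancel a pending request when the page unloads, so some clicks can go uncounted. Current browsers offer `navigator.sendBeacon` for exactly this case. The sketch below is an alternative under stated assumptions, not part of Dave's script: the name `trackClick` is hypothetical, it tries `sendBeacon` first and falls back to XMLHttpRequest, and `nav`/`createXhr` are passed in (normally `window.navigator` and a function returning `new XMLHttpRequest()`) only so it can be tested outside a browser. Note that `sendBeacon` sends a POST, so the tracking script would have to accept POST requests; the banner ID still travels in the query string.

```javascript
// Tracking endpoint from the examples above.
var TRACK_URL = "http://www.mysite.com/clickTrackingScript.php?bannerID=";

// Record a click on banner `id` without delaying the navigation.
// `nav` is normally window.navigator; `createXhr` is normally
// function () { return new XMLHttpRequest(); }.
// Returns which transport was used: "beacon", "xhr" or "none".
function trackClick(id, nav, createXhr) {
  var target = TRACK_URL + encodeURIComponent(id);

  // sendBeacon queues a small POST that the browser tries to deliver even
  // while the page is unloading; it returns false if it could not queue it.
  if (nav && typeof nav.sendBeacon === "function" && nav.sendBeacon(target)) {
    return "beacon";
  }

  // Fallback: the asynchronous GET used in Dave's script.
  var xhr = createXhr ? createXhr() : null;
  if (!xhr) {
    return "none";
  }
  xhr.open("GET", target, true);
  xhr.send(null);
  return "xhr";
}
```

In the page, the link's handler would call `trackClick('1234', navigator, function () { return new XMLHttpRequest(); })`.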

imvain2
07-09-2007, 07:04 PM
seiretto's JavaScript method is a good one to use. I actually came across the exact same problem; my simple solution was to place the tracking script in a separate directory and disallow that directory in my robots.txt.

Of course my solution assumes the bots follow the robots.txt rules, which the legit ones do.
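For reference, a sketch of how a compliant crawler would apply such a rule. The directory name `/tracking/` is hypothetical (imvain2 didn't name it), and the matcher is deliberately simplified: it reads only the `User-agent: *` group and does plain prefix matching, ignoring wildcards, `Allow` lines and agent-specific groups that real crawlers honour.

```javascript
// Example robots.txt; /tracking/ is a hypothetical directory holding the
// click tracking script.
var ROBOTS_TXT = [
  "User-agent: *",
  "Disallow: /tracking/"
].join("\n");

// Collect the Disallow prefixes that apply to all agents ("User-agent: *").
function disallowedPrefixes(robotsTxt) {
  var prefixes = [];
  var inStarGroup = false;
  robotsTxt.split(/\r?\n/).forEach(function (rawLine) {
    var line = rawLine.replace(/#.*/, "").trim(); // drop comments
    var sep = line.indexOf(":");
    if (sep < 0) return;
    var field = line.slice(0, sep).trim().toLowerCase();
    var value = line.slice(sep + 1).trim();
    if (field === "user-agent") {
      inStarGroup = value === "*";
    } else if (field === "disallow" && inStarGroup && value !== "") {
      prefixes.push(value); // an empty Disallow means "allow everything"
    }
  });
  return prefixes;
}

// True if a well-behaved bot should not fetch `path`.
function isDisallowed(robotsTxt, path) {
  return disallowedPrefixes(robotsTxt).some(function (prefix) {
    return path.indexOf(prefix) === 0;
  });
}
```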

pdstein
07-09-2007, 09:01 PM
Dave, thanks for the suggestion. I'll give it a try.

- Paul

Conficio
07-10-2007, 05:47 PM
In my understanding, spiders (bots) do not interpret JavaScript per se, because to do that meaningfully they would need to build a DOM, and that is way too expensive in terms of computation.

However, because many webmasters use JavaScript to generate the links in their navigation, the spiders do a cursory scan of scripts for URL patterns. So they find your complete tracking URL, but not a URL that is composed of two string constants.
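Nobody outside the search engines knows exactly what their crawlers do, but the kind of cursory scan described here can be sketched as a regular expression run over the raw script text. The pattern and the name `findUrls` are assumptions for illustration only; the sketch shows why the complete URL literal is picked up while the two-fragment version from Dave's post yields only a useless fragment.

```javascript
// Very rough pattern: "http://" or "https://" followed by characters that are
// not whitespace, quotes or angle brackets. Not how any real crawler is
// documented to work; it only illustrates a cursory scan.
var URL_PATTERN = /https?:\/\/[^\s"'<>]+/g;

// Return every URL-looking substring of the script source.
function findUrls(scriptText) {
  return scriptText.match(URL_PATTERN) || [];
}

var literal = 'var url="http://www.mysite.com/clickTrackingScript.php?bannerID=";';
var split =
  'var url1="http://www.mysite";\n' +
  'var url2=".com/clickTrackingScript.php?bannerID=";\n' +
  'var url=url1+url2;';

console.log(findUrls(literal)); // finds the full tracking URL
console.log(findUrls(split));   // finds only the fragment http://www.mysite
```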

I think imvain2's solution is the best one for your problem. If you want to, you can also block clients that do not follow the robots.txt rules with a bit of redirect magic based on the referrer URL: for a real click it should be one of your pages, whereas most spiders send a blank referrer.
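A minimal sketch of the referrer check, written as a plain function so it doesn't depend on any particular server; the host `www.mysite.com` comes from the earlier examples and the name `isCountableClick` is hypothetical. A request whose Referer header is missing, malformed or points to another site is treated as a bot and would be redirected instead of counted. Keep in mind that the Referer header is supplied by the client, so it can be faked, and some browsers or privacy settings strip it; such legitimate clicks would be redirected too.

```javascript
// Decide whether a hit on the tracking script should be counted.
// `refererHeader` is the raw Referer request header (may be undefined or "").
// `ownHost` is the host serving the pages that contain the banners.
function isCountableClick(refererHeader, ownHost) {
  if (!refererHeader) {
    return false; // most spiders send no referrer at all
  }
  var ref;
  try {
    ref = new URL(refererHeader);
  } catch (e) {
    return false; // malformed header
  }
  return ref.hostname === ownHost;
}
```

In a request handler, a `false` result would trigger the redirect (for example straight to the banner's target) instead of logging the click.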

K<o>