How do you know if your site has been crawled or indexed
loseriam
08-07-2004, 11:06 AM
I was just wondering how you know when, or if, your site has been crawled or indexed by these search engines. I see people saying that their site was crawled many times, but how do they know, and how can I find out if mine has been crawled?
Thank you
kevin
minstrel
08-07-2004, 12:06 PM
Didn't I just see this question posted in another thread?
Anyway, the answer is: look at your website logs. Depending on which stats package your host uses, you may see spiders identified by name (Slurp, Googlebot, MSNbot, etc.), or you'll see the spider names or "refering" (sic) agents in the "agents" section, appended to an entry like "Internet Explorer... MSNbot".
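If your host gives you the raw access log rather than a stats page, the same check can be done with a few lines of script. The following is only a rough sketch: it assumes the log is in the common "combined" format, where the user agent is the last quoted field on each line, and the token list and sample lines are made up for illustration.

```python
import re
from collections import Counter

# Substrings that commonly appear in crawler User-Agent headers.
# Illustrative only; extend it with whatever bots you care about.
CRAWLER_TOKENS = {
    "Googlebot": "googlebot",
    "Slurp": "slurp",
    "MSNbot": "msnbot",
}

# Assumption: combined log format, so the user agent is the last quoted field.
LAST_QUOTED_FIELD = re.compile(r'"([^"]*)"\s*$')


def count_crawler_hits(lines):
    """Count requests per crawler name in an iterable of log lines."""
    hits = Counter()
    for line in lines:
        match = LAST_QUOTED_FIELD.search(line)
        if not match:
            continue  # line is not in the expected format
        agent = match.group(1).lower()
        for name, token in CRAWLER_TOKENS.items():
            if token in agent:
                hits[name] += 1
                break
    return hits


# Made-up sample lines, using documentation-only IP addresses.
SAMPLE_LOG = [
    '192.0.2.10 - - [07/Aug/2004:11:06:00 +0000] "GET / HTTP/1.1" 200 5120 '
    '"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.11 - - [07/Aug/2004:11:07:00 +0000] "GET /about.html HTTP/1.1" 200 2048 '
    '"-" "msnbot/0.11 (+http://search.msn.com/msnbot.htm)"',
    '192.0.2.12 - - [07/Aug/2004:11:08:00 +0000] "GET / HTTP/1.1" 200 5120 '
    '"-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"',
]

print(count_crawler_hits(SAMPLE_LOG))
```

To run it against a real file, pass the open file object instead of the sample list, e.g. count_crawler_hits(open("access.log")), where "access.log" stands in for wherever your host keeps the log.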
sfowler
08-07-2004, 03:47 PM
If you don't fancy wading through the logs or don't have easy access to them, then just copy a unique sentence from a page and paste it in as a search (in quotation marks, so it is matched as an exact phrase). If the page has been indexed by that SE, your page should come up at the top of the list.
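If you check a lot of pages this way, building the search link can be scripted. A minimal sketch, assuming the search engine accepts the query in a "q" parameter (true of Google's public search URL; other engines may differ); the function name and the example sentence are made up.

```python
from urllib.parse import urlencode


def exact_phrase_search_url(sentence, search_base="https://www.google.com/search"):
    """Build a search URL that looks for the sentence as an exact phrase."""
    # Collapse runs of whitespace and drop embedded double quotes,
    # since the whole phrase gets wrapped in quotes below.
    phrase = " ".join(sentence.split()).replace('"', "")
    return search_base + "?" + urlencode({"q": f'"{phrase}"'})


print(exact_phrase_search_url("My  purple widgets are hand-painted"))
```

Open the printed link in a browser and see whether your page is in the results.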
ronniethedodger
08-10-2004, 05:31 AM
C'mon people ... is this SE-101 stuff or what? =)
1. Use the site:www.domain.com query.
2. For server log analysis ... software like AWStats or Sawmill will do the trick. They will identify bot activity (although Sawmill does a better job of it). No need to wade thru the raw log files.
3. Another way is with scripts that you can attach to the footers of all your pages. Some of these scripts are designed to trigger an entry for bot activity (amongst other things). The scripts come in a variety of flavors: Perl, PHP, ASP, etc.
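For option 3, the idea behind such a footer script can be sketched as follows. This is not any particular published script, just an illustration in Python: the function names, the token list, and the "bot_visits.log" file name are all assumptions, and how you get the user agent and path depends on your server setup.

```python
from datetime import datetime, timezone

# Illustrative list of crawler tokens; not exhaustive.
BOT_TOKENS = ("googlebot", "slurp", "msnbot")


def bot_log_entry(user_agent, path, when=None):
    """Return a tab-separated log line if the agent looks like a known bot, else None."""
    agent = user_agent.lower()
    for token in BOT_TOKENS:
        if token in agent:
            when = when or datetime.now(timezone.utc)
            return f"{when.isoformat()}\t{token}\t{path}"
    return None


def record_bot_visit(user_agent, path, log_file="bot_visits.log"):
    """Append an entry to log_file when a bot requests the page; return True if logged."""
    entry = bot_log_entry(user_agent, path)
    if entry is None:
        return False
    with open(log_file, "a", encoding="utf-8") as handle:
        handle.write(entry + "\n")
    return True
```

You would call record_bot_visit() from whatever code renders the page footer, passing in the request's User-Agent header and the page path.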
sfowler
08-11-2004, 07:27 AM
Sure, but I find this way is the easiest way to check if an updated text is genuinely in the index, especially when I know that the page was there beforehand.
ronniethedodger
08-12-2004, 10:18 PM
Quote (sfowler): "Sure, but I find this way is the easiest way to check if an updated text is genuinely in the index, especially when I know that the page was there beforehand."
I wish I could remember the link, but there is a site that uses this technique for member profiles.
They have a system of encoding member profiles into teeny-tiny text on their pages. It looks something like this (almost barcode looking):
CQ10J KL90Y JK74Y IS72K SO34A SL45W AI78E
AO23A AI23A IC22I AU223 AY2343F OA232D IF839A
YA34A OD098I GA387E NX73D HA399U EW939U EY9449D
JK4940D FH3030A AK2928Y CA3930J AK399D HF3030A
Then they leverage Google search to find the closest matches for your own profile ... since this code is part of the page. Pretty ingenious use of Google.
So yep ... a unique string on your pages will do the trick. Except with Yahoo, of course, where there is a delay between the crawl and the actual indexing of the page.
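If you want a unique string without having to think one up for every page, you could derive it from the page address. A rough sketch, assuming a made-up "ZQ" prefix and a SHA-256 hash of the URL; this is not the scheme used by the profile site mentioned above, just one way to get a token that is unlikely to appear anywhere else.

```python
import hashlib


def page_marker(page_url, length=12):
    """Return a short, repeatable token derived from the page URL."""
    digest = hashlib.sha256(page_url.encode("utf-8")).hexdigest().upper()
    # "ZQ" is an arbitrary prefix (an assumption) to make the token stand out.
    return "ZQ" + digest[:length]


marker = page_marker("http://www.example.com/widgets.html")
print(marker)  # paste this into the page, then search for it later
```

Put the token somewhere in the visible text of the page, wait for the next crawl, then search for it. If the page comes up, that version of the page is in the index.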