| Search Engine Optimization Forum SEO is much easier with help from peers and experts! The WebProWorld SEO forum is for the discussion and exploration of various search engine optimization topics. Any non (engine) specific SEO or SEM topics should go here. |
I've been checking my stats and I've noticed that Slurp/Yahoo and Googlebot are crawling me like crazy...
In 6.5 days I've had:
- Slurp (Yahoo): 98
- Google: 30 (main site)
- Google: 131 (one of the subdomain sites)
- MSN: 13
----------------
- Szukacz.pl: 5
- WISEnutbot.com: 5
- KROptimusNetLinkTester: 3
- Almaden: 1
- Deltascan: 1
These are unique entries from crawlers (one entry per IP per day). Do you also get this many entries? :) |
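For anyone who wants to reproduce a tally like this from their own logs, here is a minimal Python sketch of the "one entry per IP per day" counting rule. The signature substrings, IP addresses, and dates are made-up illustrations, not values from the stats above, and pulling the (IP, day, User-Agent) fields out of a real log file is left out.

```python
from collections import Counter

# Hypothetical substrings used to recognise each crawler's User-Agent.
# Adjust these to whatever actually shows up in your own logs.
BOT_SIGNATURES = {
    "Slurp (Yahoo)": "slurp",
    "Google": "googlebot",
    "MSN": "msnbot",
}


def count_unique_entries(records):
    """Count one entry per (crawler, IP, day).

    `records` is an iterable of (ip, day, user_agent) tuples.
    Requests from unrecognised agents are ignored.
    """
    seen = set()
    for ip, day, user_agent in records:
        agent = user_agent.lower()
        for name, signature in BOT_SIGNATURES.items():
            if signature in agent:
                # The set collapses repeat hits from the same IP on the same day.
                seen.add((name, ip, day))
                break
    return Counter(name for name, _ip, _day in seen)


# Made-up sample records for illustration only.
sample = [
    ("66.249.64.1", "2004-10-01", "Googlebot/2.1 (+http://www.google.com/bot.html)"),
    ("66.249.64.1", "2004-10-01", "Googlebot/2.1 (+http://www.google.com/bot.html)"),
    ("66.249.64.1", "2004-10-02", "Googlebot/2.1 (+http://www.google.com/bot.html)"),
    ("66.196.90.5", "2004-10-01", "Mozilla/5.0 (compatible; Yahoo! Slurp)"),
    ("10.0.0.7", "2004-10-01", "Mozilla/4.0 (compatible; MSIE 6.0)"),
]
counts = count_unique_entries(sample)
print(counts)
```

Counting this way keeps one busy crawler IP from inflating the numbers, which is presumably why the stats package reports unique entries rather than raw hits.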
|
|||
|
I noticed that Alexa has been crawling my site daily. Not knowing what crawling was, I went to their website and read up on it. It's quite interesting, and they offer a lot of links that show what you can do about it. It has something to do with making a robots.txt file that specifies what the robot is and isn't allowed to crawl. I believe you can also stop anyone from crawling with the right .txt file. I'm not quite sure how it works, but I would go to the sites involved, check through them, and see what they say about it. It's your right to limit access if you want to.
I just wish all the funny entries in my stats would go away. I am sick of the pages of codes that keep appearing. I cannot delete the whole IP range of the person who is doing it; I can only block it on my own PC. |
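To make the robots.txt idea concrete, here is a small sketch that uses Python's standard `urllib.robotparser` to check what a given set of rules would allow. The rules are only an example: `ia_archiver` is the name Alexa's crawler announces, while the `/private/` path is a made-up placeholder. Keep in mind that robots.txt is a request that well-behaved robots honor, not an enforcement mechanism.

```python
from urllib.robotparser import RobotFileParser

# Example rules: keep Alexa's crawler out entirely, and keep every
# other robot out of a hypothetical /private/ directory.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent, path):
    """Return True if these rules let `agent` fetch `path`."""
    return parser.can_fetch(agent, path)


for agent, path in [
    ("ia_archiver", "/index.html"),
    ("Googlebot", "/index.html"),
    ("Googlebot", "/private/notes.html"),
]:
    print(agent, path, "allowed" if allowed(agent, path) else "disallowed")
```

The file itself goes in the root of the site as `robots.txt`; the Python part is only a way to test the rules before uploading them.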
|
|||
|
The crawling isn't bad, provided you don't have a low bandwidth limit and the spiders don't slow down your server.
I know of several webmasters who had to limit Google robots because of that.
__________________
http://www.twojecentrum.pl - Polish e-shopping center http://dzwonki-loga.pl - Ringtones for mobile phones |
|
|||
|
|
|
||||
|
I don't mind being crawled repeatedly, even by bots that produce little result--provided that the spider is associated with some legitimate engine.
These are the bots I allow: Google, Yahoo/Inktomi, MSN Search, AltaVista, AOL Search, AllTheWeb, Lycos, Compass Communications Inc, Excite, Fast Search Inc, IBM Almaden Research Center, iWon, LookSmart, Naver, Overture, SurfWax, WiseNut, InfoMinder, Walhello, Alexa. All others are blocked. A bot database and .htaccess deny list is here: http://hometown.aol.com/botlist22/botlist.txt There are many others but I maintain that myself. Andi
__________________
...the Rockies may tumble, Gibralter may crumble... G & I Gershwin, 1937 |
|
|||
|
Nice list, Andi, though I don't feel that denying Xenu's Link Sleuth by IP address is of much use, since it is a user program and is run from thousands of different IPs.
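One alternative to chasing IPs for a desktop tool like this is to match on the User-Agent string instead. The sketch below shows the idea in Python. The `"xenu"` substring is an assumption about what the program sends, so check your own logs first, and remember that a User-Agent can be changed by the person running the tool.

```python
# Assumed substring of the tool's User-Agent; verify against your own logs.
BLOCKED_AGENT_SUBSTRINGS = ("xenu",)


def is_blocked_agent(user_agent, blocked=BLOCKED_AGENT_SUBSTRINGS):
    """Return True if the User-Agent contains any blocked substring."""
    agent = user_agent.lower()
    return any(substring in agent for substring in blocked)


# Made-up User-Agent strings for illustration.
print(is_blocked_agent("Xenu Link Sleuth 1.2"))
print(is_blocked_agent("Googlebot/2.1 (+http://www.google.com/bot.html)"))
```

A single rule like this covers every IP the program is run from, at the cost of missing anyone who disguises the agent string.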
|
|
||||
|
Quote:
Quote:
http://www.andilinks.com/linkckg.htm But THOSE IPs listed are not people checking the validity of my andilinks.com links, but actually are people clueless enough to check the links on MY online pages directly with the Xenu program... I do want to block anyone whose idle curiosity about my site extends that far. These IPs may be of less value to most, but if the tab-delimited database is imported into a db program, it is easy enough to craft your own custom .htaccess file from it. I include my .htaccess for those who just want to say "ditto" or don't have the time or skill to make their own. An .htaccess file that gets too large does slow the site down; mine is up to 2.8k and I need to begin pruning out old IPs. Andi
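As a rough illustration of crafting a deny list from a tab-delimited database, here is a Python sketch. The column names (`name`, `ip`) and the sample rows are assumptions made for the example, not the actual layout of botlist.txt, and the output uses the older Apache `Order`/`Deny from` directives; Apache 2.4 and later use `Require` rules instead.

```python
import csv
import io


def build_deny_rules(tsv_text, ip_column="ip"):
    """Turn tab-delimited rows into Apache 2.2-style deny rules.

    Assumes a header row and a column named `ip_column`; both are
    hypothetical and should be adapted to the real database layout.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    # A set drops duplicate addresses, which also keeps the file small.
    ips = sorted({(row.get(ip_column) or "").strip() for row in reader} - {""})
    lines = ["Order Allow,Deny", "Allow from all"]
    lines += [f"Deny from {ip}" for ip in ips]
    return "\n".join(lines) + "\n"


# Made-up sample data using documentation-only IP ranges.
SAMPLE_TSV = (
    "name\tip\n"
    "SomeBot\t192.0.2.10\n"
    "OtherBot\t198.51.100.7\n"
    "SomeBot\t192.0.2.10\n"
)
rules = build_deny_rules(SAMPLE_TSV)
print(rules)
```

Regenerating the file from the database each time also makes pruning easy: drop the old rows from the database and rebuild, rather than editing .htaccess by hand.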
|
||||
|
There was an apparent rash of unclaimed spiders last month. It was downright ridiculous, and nobody was owning up to them, it seemed. USCity.NET finally shut the door on the AllTheWeb spider.
The one that is really confusing is the Fast Enterprise spider. This one now is a spider for hire, so to speak, from what I can tell ... and it showed up one day and went thru a page a minute which is what their FAQ says it would do. The thing was that it did that for 2-1/2 days solid. You could sit back and watch this thing on the logs methodically going thru the site ... it is by all rights a very smart link follower. Then it hit the end and started all over again, and that is when we closed the door on it. |
|
||||
|
Ah, if you liked that story ... try this one at WPW http://www.webproworld.com/viewtopic.php?p=65170#65170. That is my Googlebot PR River story. Mel liked that one a lot as I remember. ;0)
I am not sure, but the Almaden has a unique purpose for some reason that I cannot remember right off hand. I just read something about it the other day too. I don't blame you about cutting any of the others off. Especially the types that look more like they are data mining (as opposed to the spybot type). That was the case with the Fast Enterprise bot ... after it went thru the site once and proceeded to do it again, that is where I drew the line. Bandwidth is not an issue though...well not on the site mentioned above. If it were, that darn thing would have pushed your normal bandwidth allotments to the edge. |
|
||||
|
Quote:
Not just a search engine but "Advanced Text Analytic Solutions." Quote:
Like I said, so long as I don't have bandwidth issues, that is, using bandwidth beyond the budget for my hobby, I'm willing to trust that the information they use may well one day send an important visitor to my site--even if it's just one IBM scientist. By the time my bandwidth usage reaches that level though, I'm likely to have had some epiphany on how to turn my traffic into cash (no, not adsense). So the two lines on that graph may never meet. Andi
|
||||
|
Quote:
Anyway, the Googlebot article I meant was another one in Webstractions that I can't seem to find now. BTW do you edit Webstraction News yourself? I am going to add that link to my index.htm page so I remember to check it more often. Yes, I still read news the old fashioned way (non-RSS) not having found an RSS reader to my liking--I'm waiting for version 5.0 which is usually where an app finds its stride. I don't have time to "beta" the buggy versions 1.0 - 4.0. :) Andi
|
||||
|
Oh, that Googlebot story: "Googlebot taking random stabs at Atom/RDF files".
I usually skim probably about 100 or so stories around the net a day ... never in one particular area either. One day I will be into what is happening with Google (which I try to avoid...sheesh) and the next I will be flipping thru some CSS blogs and tutorial sites. My focus wanders. Mostly I write a couple of "Reader's Digest" type paragraphs, which is what most of the stories are about anyway before they go off into the tail-end Sergey and Larry life story that consumes 80% of the article ... but I will leave a link to that for those readers who have not heard it for the 100th time yet. Sometimes, though, I will combine three or four separate articles that I have been reading and try to put some more background to the story. That is what happened with the "googlebot story". That one in particular went beyond the initial story in the WPNews article that Garrett wrote.
I use one newsreader called Awasu, which is a desktop application. You should give that one a whirl. I think you will like it. The other one I use is the new service at My Yahoo, where you can add RSS feeds to your personal pages (I have different pages broken down by category -- channels). That is very convenient and works pretty well. Of course, I have mine in there as well and ping Yahoo about twice a day to pick up on it. Those pages are now listed in the search results with the RSS feed next to them (kewl).
I saw your entry today (or was it yesterday) about the new program you are trying out -- the Directory Extractor. Here is another one that you may want to look at. I have been playing around with it a little bit and it is pretty kewl: Loop Improvements Net Research Server, which crawls, indexes, and searches websites. They have a free version for you to play with too at http://www.loopimprovements.com/download.html. It only stores 1000 documents, but it is enough to get a feel for what it does. I noticed that you only have around 125 or so pages on your site ... so you could probably use this to completely index it and have your own built-in search engine for it. Just a thought. |
|
|||
|
My logs show that Google spiders my site a few times a day.
|
|
||||
|
Thanks Ron. I will investigate the Loop Improvements products, though the PJL Links Suite does everything I need to set up my current work.
I do not put anything on my site that I haven't personally reviewed, and this review is really the whole point--the most important part of what I extract remains only as wetware (brain coding). The Google "search andilinks" function works better and cheaper than any on-board search; I even use Google to find things on my own site. But finding intelligent agents is an important part of what I do, and I'd like to be able to identify their profiles and block them at the server level, though I won't be able to actually do this until I get a dedicated server. Andi
WebProWorld is an iEntry, Inc. ® site - © 2009 All Rights Reserved Privacy Policy and Legal iEntry, Inc. 2549 Richmond Rd. Lexington KY, 40509 |