View Full Version : Effectiveness of Sitemap/Crawler Pages
WildSeeker
11-14-2003, 10:00 AM
Greetings!
I read an article recently that made a point of endorsing the inclusion of a sitemap or crawler page within one's site. The theory being:
"the crawler page is a site map that lists all the pages on your site — it may be a bit too big for humans to read through, but it will be no problem for a search engine. Add an obscure link to the crawler page on one of your site's top-level pages, using a small amount of text ...
... the crawler page won't show up in search results. It does get pulled into the search engine's index, but because it has no text or tags to match a query, it isn't listed as a result. The pages it links to, however, will appear because the search engine's spider found them right after it visited the crawler page."
I would be interested in feedback on this concept and its effectiveness.
minstrel
11-14-2003, 10:21 AM
I think including a site map is a good idea and probably does help spiders to find things on your site...
But why hide it? I see it as a kind of backup navigation scheme for anyone who has trouble figuring out where to click, or an overview of your website for anyone else - I would give it some prominence (not suggesting the top of your page in an <H1> tag or anything but make it easy to find).
Also, if you do try to hide it, it is conceivable that some search engine somewhere may see you as trying to subvert the rules...
rlrouse
11-14-2003, 11:03 AM
Site maps are indeed valuable for users as well as the SE spiders.
Many experienced web surfers will skip the site navigation altogether and head straight for the Site Map, which helps them quickly find what they're looking for.
And minstrel is right, it's wise to place a link to the Site Map in a prominent location on every page.
And having the Site Map indexed by the search engines is a good thing.
Google specifically recommend a site map in their guidelines and I have seen a Google employee specifically say in a number of messages something like "include a site map" ... Considering that Google are so keen on it, I cannot help but wonder if Googlebot specifically searches for the words "site map" to help it crawl a site? ... If this is the case (and I really have no idea, so don't quote me), then having a site map may be putting the site at an advantage.
CBP
achronister
11-14-2003, 12:44 PM
I always use the site map. In case the spiders have trouble with the home page for some reason (JavaScript conflict, etc.) this is a surefire way to make it work.
I usually put it in a lower section of the site, but still easily seen by users. The site we are going live with is http://www.greattix.com (it's actually going to be http://www.ticketsolutions.com when complete). My point is that in the lower section, where the disclaimer and privacy statement reside, I'm putting a site map link.
Hope that helps!
Aaron
pcsndreams
11-14-2003, 12:56 PM
Is there an "easy" way to make a site map? I have been thinking about putting one on my site but I couldn't quite figure out how to make one. I use Dreamweaver, Fireworks and Photoshop to build but they don't seem to help any with the site map. Any input will be appreciated.
achronister
11-14-2003, 01:02 PM
the Apple site map is one of the best ones I've seen. Check it out at http://www.apple.com/find/sitemap.html.
Fairly easy using tables. Another thing I use is a custom 404 page. (my site has 100's of pages) Whenever my users hit a 404 error, they will automatically get redirected to the site map.
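For anyone wondering what "fairly easy using tables" might look like in practice, here is a minimal Python sketch that writes a table-based site map from a list of pages. The function name, the section headings and the URLs are all made up for illustration, and this is only one of many ways to do it.

```python
from html import escape


def build_sitemap_html(sections):
    """Return a simple table-based site map page as an HTML string.

    `sections` maps a heading (e.g. "Products") to a list of
    (title, url) pairs. All names here are hypothetical.
    """
    rows = []
    for heading, links in sections.items():
        # One table row per section: heading on the left, links on the right.
        items = "<br>".join(
            f'<a href="{escape(url, quote=True)}">{escape(title)}</a>'
            for title, url in links
        )
        rows.append(f"<tr><th>{escape(heading)}</th><td>{items}</td></tr>")
    return (
        "<html><head><title>Site Map</title></head><body>\n"
        "<h1>Site Map</h1>\n<table>\n" + "\n".join(rows) + "\n</table>\n"
        "</body></html>"
    )


# Hypothetical example data, not taken from any real site.
example_sections = {
    "Products": [("Widgets", "/widgets.html"), ("Gadgets", "/gadgets.html")],
    "Company": [("About Us", "/about.html")],
}
page = build_sitemap_html(example_sections)
```

Writing `page` out to a file such as sitemap.html and linking to it from the footer would give a plain, static page that both visitors and spiders can follow.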
Aaron
achronister
11-14-2003, 01:03 PM
PS: I accidentally put a period in the URL....
minstrel
11-14-2003, 10:50 PM
Another thing I use is a custom 404 page. (my site has 100's of pages) Whenever my users hit a 404 error, they will automatically get redirected to the site map.
Yes! If you don't yet have a custom 404 page, make it your #1 priority. If someone is looking for a specific page on your site and misspells it (e.g., even something as simple as .htm versus .html versus .shtml) or if you have changed the name or URL of one of your pages, you do not want the visitor thinking you have gone out of business...
I don't direct the 404 to a site map but it's not a bad idea - my preference is a custom error page with a search box: see here (http://www.psychlinks.ca/error.htm) for an example.
If you are using an NT/IIS server, you may need to contact the administrator or tech support to enable this feature. For other servers, you can use the .htaccess file - do a search in the WebProWorld forums for "htaccess" or let me know and I'll post specific instructions here.
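This is not the .htaccess setup itself, but as a rough illustration of what a custom error page could do with a mistyped address (the .htm versus .html case), here is a small Python sketch that suggests the closest known pages. The page list and the function name are hypothetical, and the similarity cutoff of 0.6 is just the standard library default, not a tuned value.

```python
import difflib


def suggest_pages(requested_path, known_pages, limit=3):
    """Return up to `limit` known pages that closely resemble the bad URL."""
    # get_close_matches ranks candidates by similarity, best match first.
    return difflib.get_close_matches(
        requested_path, known_pages, n=limit, cutoff=0.6
    )


# Hypothetical list of pages that really exist on the site.
known = ["/index.html", "/products.html", "/contact.shtml", "/sitemap.html"]

suggestions = suggest_pages("/products.htm", known)
no_match = suggest_pages("/zzzzzz", known)
```

When nothing is close enough, the list comes back empty and the error page can simply fall back to a search box or a link to the site map.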
Jurgen
11-15-2003, 12:13 AM
I am working on a sitemap page.
Now here is my question: I do have a few product pages which will be generated "on the fly" with my database based products. Should I put these pages in the sitemap as well? And will they be crawled by the spiders?
Thanks for your input.
Jurgen
www.absolutelyfabulousflowers.com
achronister
11-15-2003, 12:06 PM
I don't direct the 404 to a site map but it's not a bad idea - my preference is a custom error page with a search box:
Yea, there are a few options for 404 pages, I just like the site map so they can see everything and know for sure where they want to go.
As far as a site map on dynamic pages...I'd say not to do that. I have a dynamic site (500 or so dynamic pages) and the only things I have in our site map are the links to the static pages (which is quite a few). I rely on the spiders to follow on to the dynamic pages if they can once they hit the specific pages.
Dynamic pages are followed by most spiders to an extent. It really depends on how many variables you have. One and maybe two are okay (example: www.sample.asp?productID=93947). When you get into the longer versions it will stop the spider. Question marks are considered stop characters, so it will usually go past one and that's about it.
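To make the "how many variables" point concrete, here is a short Python sketch that counts the query-string variables in a URL. The limit of two comes only from the rule of thumb in this post, not from any documented search engine behaviour, and the example.com URLs are invented.

```python
from urllib.parse import urlsplit, parse_qsl


def count_query_params(url):
    """Count the name=value pairs in a URL's query string."""
    query = urlsplit(url).query
    return len(parse_qsl(query, keep_blank_values=True))


def looks_spider_friendly(url, max_params=2):
    """Apply the 'one, maybe two variables' rule of thumb (an assumption)."""
    return count_query_params(url) <= max_params


# Invented example URLs.
short_url = "http://www.example.com/product.asp?productID=93947"
long_url = (
    "http://www.example.com/product.asp"
    "?cat=5&productID=93947&session=abc&sort=price"
)

short_count = count_query_params(short_url)
long_count = count_query_params(long_url)
```

Running something like this over the links you plan to list is a quick way to spot the URLs most likely to be skipped.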
So...after my long-winded explanation, no, I wouldn't put dynamic stuff on your site map, just static links.
Aaron
bike4travel
11-18-2003, 11:14 AM
I have a sitemap and it was listed in Google quite fast. (faster than my other pages)
The only thing is:
The sitemap has a higher ranking than my other pages.
So people click on my sitemap when they are using a search engine.
My sitemap is an unattractive list, and I want people to see my real site. (This should be a product page with the product they are looking for.)
Does anybody have a tip?
Marco
PS I have a dynamic sitemap that contains a list of products. (allprod.php)
minstrel
11-18-2003, 11:25 AM
The sitemap has a higher ranking than my other pages. So people click on my sitemap when they are using a search engine. My sitemap is an unattractive list, and I want people to see my real site.
Why not make that sitemap page more attractive? If customers get there, they should also be able to get to anywhere else on your site - make it functional first, then make it pretty. If that's the first page they'll find in a search engine, make sure that page makes a good first impression.
achronister
11-18-2003, 12:16 PM
Why not make that sitemap page more attractive? If customers get there, they should also be able to get to anywhere else on your site - make it functional first, then make it pretty.
I agree completely. I always refer to the apple site map. http://www.apple.com/find/sitemap.html It's the best I've found and I model mine after its design. I try to do a little more design work on it, but still a good example.
Aaron
minstrel
11-18-2003, 10:05 PM
I always refer to the apple site map. It's the best I've found and I model mine after its design.
That is nice - clean and comprehensive. I think I'll adopt it :-)
anuj_pandit1
11-18-2003, 11:43 PM
Hi WildSeeker,
I hope you are doing well,
I agree with you: if you submit only the sitemap of the website, then you don't need to submit the website's pages individually.
With this method, if the site map has links to 50 pages, you can submit all 50 pages by submitting the site map only.
I hope it will help you....
Regards
Alok Kumar Upadhyay
SoberRecovery.com
11-19-2003, 02:05 AM
Crawl pages, or site maps, were one of the first tools I ever used when learning about spiders, SERPs, and navigation. And they worked beautifully. The main part of my site is over 140 pages, and each of those individual pages has a PageRank of 5.
I look at site maps as a way to present the search engine spiders with a road map that leads to all areas of my site that I want them to visit. And by combining the use of a site map with a robots.txt file, I can try to keep the spiders out of places I don't want them to go.
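As a hedged illustration of combining the two, the Python sketch below uses the standard library's robots.txt parser to check which site map links a well-behaved spider would be allowed to fetch. The robots.txt rules and the example.com URLs are invented for the example, and real spiders may interpret rules differently.

```python
from urllib.robotparser import RobotFileParser

# Invented robots.txt rules: keep all spiders out of two directories.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /cgi-bin/
"""


def crawlable_pages(robots_txt, page_urls, user_agent="*"):
    """Return the URLs that the robots.txt rules allow `user_agent` to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in page_urls if parser.can_fetch(user_agent, url)]


# Invented site map links.
sitemap_links = [
    "http://www.example.com/index.html",
    "http://www.example.com/forum/index.html",
    "http://www.example.com/admin/login.html",
]
allowed = crawlable_pages(ROBOTS_TXT, sitemap_links)
```

A check like this helps catch the case where the site map points spiders at pages that robots.txt tells them to avoid.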
Recently, I contracted with a script developer to create a "real-time" dynamically generated site map for our message boards. Now, over 6,000 of our message board posts are indexed by Google.
Hope this helps!
ppeter
11-19-2003, 04:55 AM
I am working on a sitemap page.
Now here is my question: I do have a few product pages which will be generated "on the fly" with my database based products. Should I put these pages in the sitemap as well? And will they be crawled by the spiders?
Jurgen,
I think it is really important for you to actually have STATIC versions of your product pages. If SEs don't index your product pages, searchers won't be able to find your products on the SEs, which is what they will be looking for.
This goes for any website. If your most important pages aren't indexed, people won't find them and you will LOSE out on business. Please feel free to ask me questions, I'm always happy to help.
Phil (ppeter@vector-networks.co.uk)
ppeter
11-19-2003, 04:59 AM
The sitemap has a higher ranking than my other pages. So people click on my sitemap when they are using a search engine. My sitemap is an unattractive list, and I want people to see my real site.
Why not make that sitemap page more attractive? If customers get there, they should also be able to get to anywhere else on your site - make it functional first, then make it pretty. If that's the first page they'll find in a search engine, make sure that page makes a good first impression.
If you make your important pages really prominent on the site map, you can make sure that the links to those pages are the first thing the user lays their eyes on.
Also, if you were to make the links more prominent by using [i] then the SE spiders should pick up on that and see the important pages as being more important than the rest of the site. ;o)
lakein
11-20-2003, 01:24 PM
Here's a functional crawler-and-a-half, referenced on my home page, and serving as well as a useful table of contents to visitors. Also has a "Site" tab on navigation bar:
<http://www.alanlakein.com/TM01page99.htm>
One more thing to add...the main reason for the sitemap is to get pages deep in the site crawled. If you have a site that is only 1 level deep, and many sites are, I don't really see a need for a map. You've probably figured that out by now anyway, just thought I'd mention it.
peace...Paul
minstrel
11-22-2003, 09:18 PM
While I think hierarchical or "cascading" navigational systems are sometimes the best, this question came up elsewhere recently:
Does anyone know if there is a limit to the number of links spiders will crawl on a site map?
rlrouse
11-22-2003, 09:23 PM
...the main reason for the sitemap is to get pages deep in the site crawled.
The main reason for a site map is to provide an easy way for your users to quickly find what they're looking for on your site.
While a site map is indeed valuable as an aid for the search engine bots, it's even more valuable for your users if it's designed properly.
Many experienced web surfers will skip your navigation menus and head straight for the site map to find what they're looking for in the shortest amount of time.
With few exceptions, sites designed with users in mind will usually do better than average in the search engines. That's why so many people who know nothing about SEO put up pages that end up ranking well anyway.
rlrouse
11-22-2003, 09:27 PM
Does anyone know if there is a limit to the number of links spiders will crawl on a site map?
Google guidelines recommend keeping the number of links on a page to under 100, but I've seen sitemaps with over 250 links get crawled on a regular basis with no problem.
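If you want to see where a page stands against that 100-link figure, a rough Python sketch like the one below counts the links on a page. The 120-link sample page is generated just for the example, and the 100 threshold is only the guideline number mentioned above, not a hard limit.

```python
from html.parser import HTMLParser


class LinkCounter(HTMLParser):
    """Collect the href of every <a> tag seen while parsing."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def count_links(html_text):
    """Return the number of <a href="..."> links in an HTML string."""
    counter = LinkCounter()
    counter.feed(html_text)
    return len(counter.links)


# Generated sample page with 120 links (hypothetical file names).
sample = "".join(f'<a href="/page{i}.html">Page {i}</a>\n' for i in range(120))
total = count_links(sample)
over_guideline = total > 100
```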
I have no evidence to back it up, but I believe the PageRank of the SiteMap page plays a role (i.e., the higher the PR of the page, the more links Googlebot will crawl).
minstrel
11-22-2003, 09:32 PM
Thanks, rlrouse... I was about to respond to your previous post when this one arrived.
So, to respond to the one before - I agree completely with your comments both about designing for the user and about sitemap pages. I've just finished re-designing my sitemap page and looking at it I'm thinking it's a lot more user friendly than my index page... now I'm looking at highlighting the sitemap for the user as a parallel or alternate navigation entry point.
rlrouse
11-22-2003, 09:53 PM
I checked out your SiteMap minstrel. Very nice. Everything is well organized and easy to find.
minstrel
11-22-2003, 10:06 PM
I checked out your SiteMap minstrel. Very nice. Everything is well organized and easy to find.
Thanks! I have to make a confession, though... someone recently (I can't recall who so my apologies if it was you) directed me to the apple.com sitemap - I swiped the basic design from them :-)
Google do suggest no more than 100 links on a page, but it's not mentioned specifically in the quality guidelines (i.e., the guidelines that breaking could lead to a ban) - but it's on the same page as these, so I think there is some misunderstanding that there is a penalty for >100 links per page. I think what Google are saying is "think of the user" - >100 links not good for user.
My site map has 100's of links (haven't bothered to count them) - it's there for the user. I did panic a bit earlier on when the Google cache cut off the bottom half of the page - my first reaction was that Google stopped crawling at 100 links down the page - I could not be bothered changing it. This stayed for a while - then as PR went up - the whole page was included in the cache. A year ago I checked the paths Googlebot took through the site via logs and it was following the link at the bottom of the page - I have no reason to doubt that Google crawls more than 100 links per page - BUT what does the user think? I think mine is laid out (in columns) so that it is easy for the user.
CBP
achronister
11-23-2003, 04:26 AM
Thanks! I have to make a confession, though... someone recently (I can't recall who so my apologies if it was you) directed me to the apple.com sitemap - I swiped the basic design from them :-)
Apology accepted ;) It gets confusing sometimes in here. As far as how many pages a SE will index, the way I understand it is the larger the index, the more pages it will crawl. Google being the largest, will index more pages than AltaVista, one of the smallest. As far as exact numbers go, it's hard to tell.
Another piece of advice I can give out, that you may already know, is that many search engines won't index past two directory levels. I keep my site directory structure as small as possible (root, then only one level up). I've heard this helps out getting more pages indexed as well.
rlrouse...what are your thoughts on the directory level issue?
Aaron
rlrouse
11-23-2003, 09:03 AM
I think what Google are saying is "think of the user" - >100 links not good for user.
I think you're right.
janeth
11-23-2003, 10:04 AM
I understand that we name the first page "site map". The Google bot sees this and knows it is the site map, but what do you call page 2 so the bot knows what it is?
rlrouse
11-23-2003, 10:15 AM
Hi Janeth. I use sitemap-2.html. It isn't really necessary to name it sitemap at all for the search engine spiders. They don't really care. All they see is a list of links to crawl.
But calling it sitemap is good for the users. That's what they expect to see and they know where they are when they see that file name.
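For a site with more links than fit comfortably on one page, the naming scheme above can be sketched in Python like this. The 100-per-page figure and the generated file names are assumptions for the example; as noted, the spiders themselves don't care what the pages are called.

```python
def split_into_sitemap_pages(links, per_page=100):
    """Split a list of links into sitemap.html, sitemap-2.html, and so on."""
    pages = {}
    for start in range(0, len(links), per_page):
        number = start // per_page + 1
        # First page keeps the plain name; later pages get a numeric suffix.
        name = "sitemap.html" if number == 1 else f"sitemap-{number}.html"
        pages[name] = links[start:start + per_page]
    return pages


# 250 hypothetical page links.
links = [f"/page{i}.html" for i in range(250)]
pages = split_into_sitemap_pages(links)
```

With 250 links this gives three site map pages, the last one holding the remaining 50 links.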
minstrel
11-23-2003, 11:32 AM
I have to make a confession, though... someone recently (I can't recall who so my apologies if it was you) directed me to the apple.com sitemap - I swiped the basic design from them :-)
Apology accepted ;) It gets confusing sometimes in here.
Ah! There you are! :-) Well, thanks again - as I said before, I really liked that Apple sitemap... nice and clean and easy to navigate, and, since the highest form of flattery is imitation, I swiped it! :-)
Another piece of advice I can give out, that you may already know, is that many search engines won't index past two directory levels. I keep my site directory structure as small as possible (root, then only one level up). I've heard this helps out getting more pages indexed as well.
I do that too, although it's as much to stop me getting confused as for any other reason. But it is certainly possible to have a hierarchical navigation structure even with a single or two level directory structure: You can group pages together by subtopic and create subtopic pages with links to the individual pages.
Example: you sell auto parts - you create a page for "Tires", which contains links to secondary pages for "Summer Tires", "Winter Tires", "All Season Tires", "Truck Tires", etc., etc. It simplifies and unclutters the sitemap or index pages which makes it easier for the visitor as well as perhaps more spider-friendly.
janeth
11-23-2003, 11:40 AM
If that is true then the second page of your site map if it runs off your first page would be no good.
Right or Wrong?
minstrel
11-23-2003, 12:12 PM
If that is true then the second page of your site map if it runs off your first page would be no good. Right or Wrong?
I don't think so, unless I'm confused here...
I have the following directories on my site:
root
root/pages
root/images
... etc.
That's root plus one level.
If I added root/pages/subpages I would be up to root plus two levels, and it has been suggested that some spiders might not get to the subpages level.
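To pin down what "root plus one level" means here, this small Python sketch counts the directory levels in a URL. It assumes the last path segment is a file name unless the path ends in a slash, and the example.com URLs are invented. Whether spiders actually stop at a given depth is a separate question that this code does not answer.

```python
from urllib.parse import urlsplit


def directory_depth(url):
    """Number of directory levels below the site root for a URL."""
    path = urlsplit(url).path
    parts = [p for p in path.split("/") if p]
    # Assumption: the last segment is a file name unless the path ends in "/".
    if parts and not path.endswith("/"):
        parts = parts[:-1]
    return len(parts)


root_depth = directory_depth("http://www.example.com/index.html")
one_level = directory_depth("http://www.example.com/pages/about.html")
two_levels = directory_depth("http://www.example.com/pages/subpages/detail.html")
```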
Is this correct, guys?
janeth
11-23-2003, 12:17 PM
Yes but if you had root/sitemap/sitemap2
it would not get past sitemap2 right?
rlrouse
11-23-2003, 12:29 PM
If I added root/pages/subpages I would be up to root plus two levels, and it has been suggested that some spiders might not get to the subpages level.
The directory structure makes no difference as far as I can tell. The linking structure in conjunction with the PR of the homepage does however.
I have a client site with a structure like this:
root/products/widgets/green-widgets/small-green-widgets.html
root/products/widgets/green-widgets/large-green-widgets.html
root/products/widgets/green-widgets/medium-green-widgets.html
These three pages are each linked to from:
root/products.html
The home page is PR6. The products.html page is PR5, and the .../...-widgets.html pages are all PR4 and all pages are crawled regularly.
I have another client site that initially had very similar directory and linking structures. The home page was PR4. The .../...-widgets.html weren't being crawled at all until I got some good links in and raised the homepage's PR.
minstrel
11-23-2003, 12:39 PM
Okay, now I am getting confused... The first part of your post, rlrouse, seems to say that the physical directory structure on your computer/server is irrelevant - what's relevant is the position of pages within the navigational structure or hierarchy. Based on that, my suggestion about simplifying navigation by creating a hierarchical navigation scheme would be bad advice.
However, in the later part of your post, you seem to be discussing ways to link back to the home page to improve the PR of pages lower in the navigational hierarchy.
So getting back to the question of limits on links for a sitemap page versus levels of directories to be spidered, what would be your advice?
rlrouse
11-23-2003, 01:56 PM
The gist of what I was saying is that PageRank appears to play a role in how deep Googlebot will crawl.
My client sitemap linking structures usually look something like this if the site consists of hundreds of pages:
index.html>sitemap.html>group of related pages
index.html>sitemap-2.html>another group of related pages
And every page on the site links back to the homepage to channel PR back to it.
This way no page on the site is more than two clicks away from the homepage.
The actual directory structure of the pages makes no difference as long as the linking structure keeps everything close to the homepage.
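The difference between directory depth and click depth can be checked with a short breadth-first search. The Python sketch below uses an invented link map modelled on the index > sitemap > pages structure described above and reports how many clicks each page is from the homepage. The page names are hypothetical.

```python
from collections import deque


def click_depths(links, start="index.html"):
    """Breadth-first search: fewest clicks from `start` to each reachable page."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


# Invented link map: each page lists the pages it links to.
site = {
    "index.html": ["sitemap.html", "sitemap-2.html"],
    "sitemap.html": [
        "products/widgets/green-widgets/small-green-widgets.html",
        "index.html",
    ],
    "sitemap-2.html": ["about/contact.html", "index.html"],
}
depths = click_depths(site)
deepest = max(depths.values())
```

Here the small-green-widgets page sits three directories deep but is still only two clicks from the homepage, which is the point being made about linking structure.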
janeth
11-23-2003, 02:00 PM
The problem is that if you have a site with over 1,000 pages, you could end up with ten links from your home page to your site map pages.
rlrouse
11-23-2003, 04:18 PM
The problem is that if you have a site with over 1,000 pages, you could end up with ten links from your home page to your site map pages.
This is true, and it's fairly common.
achronister
11-24-2003, 02:53 PM
It could possibly be a combination of PR and directory structure....meaning that if your PR is high, it will do a deep crawl. If your PR is low, it might only go one or two deep.
I'm out on a limb here though, this specific topic may be beyond my expertise. It will definitely be something I'll try and find out at the SE Strategies conference.
Aaron
rlrouse
11-24-2003, 03:35 PM
It could possibly be a combination of PR and directory structure....meaning that if your PR is high, it will do a deep crawl. If your PR is low, it might only go one or two deep.
I think you're right. Sites with a homepage PR of 5 or better seem to get "deeper" crawls more often than lower ranked sites. I see evidence of this with my own site and my clients' sites.
I built a sitemap for one of my sites a couple of months ago. Nearly the entire site was dynamic and I built a simple static page for the sitemap and integrated it into the site because Google wasn't spidering many of the dynamic page urls for some reason.
There were only 2 pages on the entire site that had a Google PR of 4; the rest had no ranking at all.
Several weeks later ALL of the pages are listed on Google and the (simple) sitemap page http://www.lockpickshop.com/sitemap.htm has a PR of 4, and many of the individual product pages also have a PR where they never did before.
The boost in traffic to the site has been incredible: a minimum of 300 more visitors per day as a result.
In short, I am a big proponent of site maps (: