A Twitchy Little Sitemap Problem



Matteo
07-12-2007, 08:27 PM
Does anyone know of a "quality" sitemap generator?
I'll tell you what my problem is. On one of my sites I have breadcrumb navigation, and it's causing the sitemap generators to lead themselves into an endless loop:


Home | Second Page | Third Page

The sitemap generator will always loop back from the second page to the first page and rarely index the third, and I end up with 42 results for the second page.

Now I know I could do this by hand and type in 15,000 (okay, maybe half that) entries, but who wants to do that?

So I thought I'd shout out to the group, because someone has to know of a decent sitemap generator that's smarter than a dirt clod. I hope... my fingers thank you.

hostBrain
07-13-2007, 03:29 PM
It's not the site in your sig, is it? Because it doesn't need a sitemap, imo.
While I'm there, your footer links don't work.

Have you tried Google's Sitemap?

dak888
07-13-2007, 05:42 PM
Try this. And it's free.

XML Sitemap Tool for Google Yahoo MSN (http://www.auditmypc.com/xml-sitemap.asp)

DaK

bj
07-13-2007, 08:28 PM
If it's a GOOD sitemap generator, it won't repeat the EXACT same URL.

Now, if the breadcrumb is being generated with dynamic URL parameters of some sort, most sitemap generators let you set up filters to avoid this sort of problem. So the trick is to write the filter. A directory script I used was doing something similar: it had an option to sort links by alpha or PageRank, so it snagged the same page links three times (ordinary, alpha and PR), but it also appended a short string to the end of the URL to note that it came from an alpha sort or a PR sort. So I just built the filter around that short string, so the sort URLs wouldn't show, only the URLs with no short string added to them.
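
For illustration only, a filter of the kind described above might look like the Python sketch below. The marker strings `-alpha` and `-pr` and the function names are assumptions made up for the example; the thread does not say what string the directory script actually appended, and real sitemap generators each have their own filter syntax.

```python
from urllib.parse import urlsplit

# Hypothetical markers a script might append for sorted views.
# These are placeholders, not the strings from the post above.
SORT_MARKERS = ("-alpha", "-pr")


def is_sort_variant(url, markers=SORT_MARKERS):
    """Return True if the URL path ends with one of the sort markers."""
    path = urlsplit(url).path.rstrip("/")
    return path.endswith(tuple(markers))


def filter_urls(urls, markers=SORT_MARKERS):
    """Keep URLs without a sort marker, in order, dropping exact repeats."""
    seen = set()
    kept = []
    for url in urls:
        if is_sort_variant(url, markers) or url in seen:
            continue
        seen.add(url)
        kept.append(url)
    return kept


if __name__ == "__main__":
    crawled = [
        "http://example.com/links",
        "http://example.com/links-alpha",
        "http://example.com/links-pr",
        "http://example.com/links",
    ]
    for url in filter_urls(crawled):
        print(url)
```

Matching on the end of the path rather than anywhere in the URL keeps the filter from throwing out an ordinary page that merely contains the marker text somewhere in its name.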

Clarrie
07-14-2007, 06:54 AM
I use a free crawler called G-Site crawler (gsitecrawler.com). It works like a search engine bot and crawls the site from the visible links, de-duplicating and saving the URLs to a database. It then generates the Google and URL lists. It can even be scheduled to crawl and to automatically upload the resulting sitemaps.
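
As a rough sketch of why de-duplication stops the breadcrumb loop (this is not G-Site crawler's actual code, which the thread does not show), a crawl that remembers every URL it has already queued visits each page once, however many breadcrumbs point back to it. The `get_links` callback and the `SITE` dictionary are stand-ins invented for the example so it runs without touching a real server.

```python
from collections import deque
from urllib.parse import urldefrag, urljoin


def normalize(base, href):
    """Resolve href against the page it was found on and drop any #fragment."""
    url, _fragment = urldefrag(urljoin(base, href))
    return url


def crawl(start, get_links):
    """Breadth-first crawl that queues each normalized URL only once."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        page = queue.popleft()
        order.append(page)
        for href in get_links(page):
            url = normalize(page, href)
            if url not in seen:  # breadcrumb links back to visited pages stop here
                seen.add(url)
                queue.append(url)
    return order


# Stand-in for a real site: Home | Second Page | Third Page, each linking back.
SITE = {
    "http://example.com/": ["/second"],
    "http://example.com/second": ["/", "/third"],
    "http://example.com/third": ["/", "/second", "#top"],
}

if __name__ == "__main__":
    for url in crawl("http://example.com/", lambda page: SITE.get(page, [])):
        print(url)
```

In a real crawler, `get_links` would fetch the page and parse its anchors; the loop protection comes entirely from the `seen` set.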

As well as being a good sitemap generator, I find it a useful tool to check sites for potential problems as it also generates reports on crawl errors (broken links etc).

A word of warning though - like any crawler it will use bandwidth bigtime, and that can be a problem for big sites (particularly e-commerce sites), but it does have a good range of controls for you to limit the scope of the crawl (including reading the robots.txt file).
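
To show what limiting a crawl by robots.txt can mean in practice, here is a small sketch using Python's standard `urllib.robotparser`. It is an assumption about the general approach, not a description of how G-Site crawler implements it, and the `allowed_urls` helper and the sample rules are made up for the example.

```python
from urllib.robotparser import RobotFileParser


def allowed_urls(robots_txt, urls, user_agent="*"):
    """Return the URLs that the given robots.txt text permits for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(user_agent, url)]


if __name__ == "__main__":
    # Example rules only; a real crawler would download the site's own robots.txt.
    rules = "User-agent: *\nDisallow: /cart/\n"
    candidates = [
        "http://example.com/",
        "http://example.com/products",
        "http://example.com/cart/view",
    ]
    for url in allowed_urls(rules, candidates):
        print(url)
```

Skipping disallowed sections such as a shopping cart before fetching them is one way to cut the bandwidth a crawl uses on a large site.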

tomcatuk
07-15-2007, 06:48 PM
I could be off the mark here, but if sitemap generators have trouble navigating your links, the same is most likely true of Googlebot. Google may get to know about your pages from your sitemap, but if the link structure of the site doesn't create internal links to those pages, Google will never give them any weight - unless you somehow carry out an external linking campaign for each page.

Also, all the pages from the footer links seem to contain basically the same text content. I think it might be worth the time to make them different - as a human reader I see them as duplicate content, and I'm sure Googlebot is a lot smarter than I am at spotting that kind of thing ;)