WebProWorld
Search Engine Optimization Forum SEO is much easier with help from peers and experts! The WebProWorld SEO forum is for the discussion and exploration of various search engine optimization topics. Any non (engine) specific SEO or SEM topics should go here.

  #1 (permalink)  
Old 09-24-2007, 05:30 PM
WebProWorld New Member
 
Join Date: Oct 2003
Location: Texas
Posts: 22
emersonworldwide RepRank 0
Default Sitemap duplicate URLS

I have used the Google Sitemap Generator and had the result validated. I cannot find an answer anywhere to these questions:
Before I submit it, do I need to go into the XML code and delete:
1. duplicate URLs
2. a page that we no longer link to on our site
3. pages that do not need to be spidered, like Guarantee.htm, ReturnPolicy.htm, etc.? Or do I just list these exclusions in robots.txt (which I have yet to create)?
4. What is the significance of some of the URLs being in bold and most not being in bold?

I thank any and all who can answer these questions for me.
__________________
Emerson WorldWide distributes quality products to relieve pain and improve quality of life. www.emersonww.com
  #2 (permalink)  
Old 09-25-2007, 06:38 AM
TrafficProducer's Avatar
WebProWorld 1,000+ Club
 
Join Date: Jul 2003
Location: United Kingdom
Posts: 1,642
TrafficProducer RepRank 3
Default Re: Sitemap duplicate URLS

The best is GSiteCrawler:

GSiteCrawler will help you generate the best Google Sitemap file for your website. GSiteCrawler uses different ways to find all the pages in your website and can generate all sorts of files, statistics and more. The Sitemaps file format has lately also been adopted by Yahoo!, and even MSN/Live.com is pledging its support.

Google Sitemap Generator for Windows :: GSiteCrawler

This will crawl your pages. You may tell it to ignore pages as you wish, etc.

It is also able to create more sitemap types: HTML, Google, Yahoo, ROR, etc.

Other Info:
Google Site Map
Yahoo Site Map, Yahoo Site Explorer
Site Maps
  #3 (permalink)  
Old 09-25-2007, 08:25 AM
Webnauts's Avatar
WebProWorld 1,000+ Club
WebProWorld MVP
 
Join Date: Aug 2003
Location: Worldwide
Posts: 8,164
Webnauts RepRank 9
Default Re: Sitemap duplicate URLS

Quote:
Originally Posted by TrafficProducer View Post
The best is GSiteCrawler:

GSiteCrawler will help you generate the best Google Sitemap file for your website. GSiteCrawler uses different ways to find all the pages in your website and can generate all sorts of files, statistics and more. The Sitemaps file format has lately also been adopted by Yahoo!, and even MSN/Live.com is pledging its support.
VIGOS Gsitemap - Free Google Sitemap Generator for Windows

This will crawl your pages. You may tell it to ignore pages as you wish, etc.

It is also able to create more sitemap types: HTML, Google, Yahoo, ROR, etc.

Other Info:
Google Site Map
Yahoo Site Map, Yahoo Site Explorer
Site Maps
Does GSiteCrawler obey robots.txt? I have not had a very good experience with it so far.
I am now using VIGOS Web Acceleration and HTTP Compression Software
It is easy to use and I love it.
__________________
"Being an expert isn't telling other people what you know. It's understanding what questions to ask, and flexibly applying your knowledge to the specific situation at hand. Being an expert means providing sensible, highly contextual direction." Jeff Atwood
SEO Workers - Search Engine Optimization Consulting Company | SEO Analysis Tool | Webnauts Net SEO

Last edited by Webnauts; 09-25-2007 at 09:35 AM.
  #4 (permalink)  
Old 09-25-2007, 09:27 AM
TrafficProducer's Avatar
WebProWorld 1,000+ Club
 
Join Date: Jul 2003
Location: United Kingdom
Posts: 1,642
TrafficProducer RepRank 3
Default Re: Sitemap duplicate URLS

Quote:
Does GSiteCrawler obey robots.txt?
To be honest, I'm not 100% sure about this. I know it is able to check both local and remote links, and it can produce a robots.txt from the banned list you give it.

As far as I know, it's free to use.

Personally, I find it can be a bit of a nuisance to set up (don't use links at first, etc.), but once that is done the project(s) may be saved for later use.

It includes FTP upload to your host and pinging Google/Yahoo, and it can also make compressed .gz files, among others.

(I use it for HTML site maps as well.)

FAQ: FAQs and Documentation :: GSiteCrawler


All I can say is it's the best I've found so far.

This looks to be more about compression of HTML and not so much sitemaps?
  #5 (permalink)  
Old 09-25-2007, 09:36 AM
Webnauts's Avatar
WebProWorld 1,000+ Club
WebProWorld MVP
 
Join Date: Aug 2003
Location: Worldwide
Posts: 8,164
Webnauts RepRank 9
Default Re: Sitemap duplicate URLS

Sorry buddy. I just corrected the link above: VIGOS Gsitemap - Free Google Sitemap Generator for Windows
  #6 (permalink)  
Old 09-25-2007, 09:42 AM
TrafficProducer's Avatar
WebProWorld 1,000+ Club
 
Join Date: Jul 2003
Location: United Kingdom
Posts: 1,642
TrafficProducer RepRank 3
Default Re: Sitemap duplicate URLS

The name seems almost the same as GSiteCrawler.
  #7 (permalink)  
Old 09-25-2007, 05:50 PM
WebProWorld New Member
 
Join Date: Jun 2005
Location: Colorado
Posts: 20
hyperdog RepRank 0
Default Re: Sitemap duplicate URLS

I recommend you keep as many pages indexed as possible. Be sure to redirect old pages if you just can't think of anything else to do with them. Your guarantee page can certainly be optimized for potential customers seeking a guarantee with one of the products you sell. Granted, it isn't a high traffic situation!
__________________
Colorado Web Development
  #8 (permalink)  
Old 09-25-2007, 05:59 PM
Webnauts's Avatar
WebProWorld 1,000+ Club
WebProWorld MVP
 
Join Date: Aug 2003
Location: Worldwide
Posts: 8,164
Webnauts RepRank 9
Default Re: Sitemap duplicate URLS

Quote:
Originally Posted by hyperdog View Post
I recommend you keep as many pages indexed as possible.
Can you explain why?
  #9 (permalink)  
Old 09-25-2007, 06:28 PM
craigmn3's Avatar
WebProWorld Veteran
 
Join Date: Jan 2004
Location: California
Posts: 335
craigmn3 RepRank 1
Default Re: Sitemap duplicate URLS

Sitemap generators come in many flavors. Some I have used got caught in a breadcrumb navigation loop and kept repeating the same links. A better sitemap generator would serve you better.

As to the pages you don't link to: somewhere on your site, you do link to them. These tools can't pull links out of thin air; they follow them.

Always designate the pages you don't want crawled in your robots.txt.

The bold vs. non-bold is some aspect of the program you are using; it means nothing in Google Sitemaps.

The best freebie sitemap generator I have run into is:

Create your Google Sitemap Online - XML Sitemaps

I think they have a 500 page limit, so for the bigger sites it's not very useful.
  #10 (permalink)  
Old 09-25-2007, 06:30 PM
kgun's Avatar
WebProWorld 1,000+ Club
WebProWorld MVP
 
Join Date: May 2005
Location: Norway
Posts: 5,678
kgun RepRank 9
Default Re: Sitemap duplicate URLS

Quote:
Originally Posted by emersonworldwide View Post
I have used the Google Sitemap Generator and had the result validated. I cannot find an answer anywhere to these questions:
Before I submit it, do I need to go into the XML code and delete:
1. duplicate URLs
2. a page that we no longer link to on our site
3. pages that do not need to be spidered, like Guarantee.htm, ReturnPolicy.htm, etc.? Or do I just list these exclusions in robots.txt (which I have yet to create)?
4. What is the significance of some of the URLs being in bold and most not being in bold?

I thank any and all who can answer these questions for me.
If Google is consistent, overlapping information and broken links should be filtered out. But I am unsure.

Maybe the best option is to make your own XML sitemap. It is explained in detail in the following book:

Thomas Myer (latest edition): "No Nonsense XML Web Development With PHP"

You get the necessary code with the book.

There are various methods to transform an XML file into a new XML file from which the information you don't want in the target XML sitemap has been automatically eliminated.
  • The simplest is an XSL(T) stylesheet; that may be all you need.
  • The most advanced, where you have (nearly) full control of the output, is using PHP DOM functions.
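
The same idea, filtering one XML sitemap into a cleaner one, can be sketched in a few lines. This is only an illustration under stated assumptions: it uses Python's standard library rather than the XSLT or PHP DOM routes named above, it assumes the sitemaps.org 0.9 namespace (an older generator may emit a different one), and the function name clean_sitemap and its excluded parameter are made up for the example.

```python
import xml.etree.ElementTree as ET

# Assumed namespace; check the xmlns attribute of your own sitemap file.
NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def clean_sitemap(xml_text, excluded=()):
    """Return sitemap XML without duplicate or excluded <loc> URLs.

    `excluded` is a tuple of URL endings to drop, e.g. ("Guarantee.htm",).
    """
    ET.register_namespace("", NS)  # keep the output free of ns0: prefixes
    root = ET.fromstring(xml_text)
    seen = set()
    # Iterate over a copy, because entries are removed from root in the loop.
    for url in list(root.findall(f"{{{NS}}}url")):
        loc = url.findtext(f"{{{NS}}}loc", default="").strip()
        if not loc or loc in seen or loc.endswith(tuple(excluded)):
            root.remove(url)  # empty, duplicate, or excluded entry
        else:
            seen.add(loc)
    return ET.tostring(root, encoding="unicode")
```

Calling clean_sitemap(text, excluded=("Guarantee.htm", "ReturnPolicy.htm")) keeps the first occurrence of each URL and drops the rest. Comparison is by exact string, so two spellings of the same address still count as different URLs.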
  #11 (permalink)  
Old 09-25-2007, 06:33 PM
Orion's Avatar
WebProWorld Veteran
WebProWorld MVP
 
Join Date: Sep 2003
Location: Halton Hills, ON
Posts: 702
Orion RepRank 4
Default Re: Sitemap duplicate URLS

Yes, GSiteCrawler (if you check the box) will obey your robots.txt file and then create your XML sitemap based on those settings. It will also create a Yahoo sitemap for you!

Great product; I've been using it for two years or more now.
  #12 (permalink)  
Old 09-25-2007, 06:37 PM
WebProWorld Member
 
Join Date: Sep 2007
Posts: 47
DoneInStyle RepRank 0
Default Re: Sitemap duplicate URLS

Most of the better sitemap generators allow you to set filters. Most of the dynamic programs use similar conventions in the urls that allow you to figure out filters you can use. The better ones also respect robots.txt so you can stop hits on your admin access pages and other restricted areas.
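
What "respecting robots.txt" amounts to can be sketched as follows. This is a rough illustration, not how any particular generator does it: it uses Python's standard urllib.robotparser module, the robots.txt content is an assumed example built from the page names mentioned earlier in the thread, and /admin/ is a hypothetical path.

```python
from urllib.robotparser import RobotFileParser

# Assumed example rules; /admin/ is a hypothetical restricted area.
ROBOTS_TXT = """\
User-agent: *
Disallow: /Guarantee.htm
Disallow: /ReturnPolicy.htm
Disallow: /admin/
"""


def allowed_urls(urls, robots_txt, user_agent="*"):
    """Keep only the URLs that the given robots.txt text allows."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if parser.can_fetch(user_agent, u)]
```

Running the candidate sitemap URLs through allowed_urls before writing the file keeps disallowed pages out of the sitemap. Paths in robots.txt are case-sensitive, so /Guarantee.htm and /guarantee.htm are different rules.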
  #13 (permalink)  
Old 09-25-2007, 06:38 PM
WebProWorld New Member
 
Join Date: Oct 2003
Location: Texas
Posts: 22
emersonworldwide RepRank 0
Default Re: Sitemap duplicate URLS

Thank you for this information. I will have to "sift" through it for my best options. But now I have another question on the same topic:
I have 10 strange URLs in the sitemap that all link back to my index.html page, and I do not know where they originate! They all show the ~ mark:
/index.html?Elbow.htm~Documents
/index.html?Gloves.htm~Documents
/index.html?knee.htm~Documents
/index.html?Lumbar.htm~Documents
/index.html?wrist8.htm~Documents
/index.html?wristProds.htm~Documents
/index.html?wristsupport2.htm~Documents
  #14 (permalink)  
Old 09-25-2007, 06:55 PM
kgun's Avatar
WebProWorld 1,000+ Club
WebProWorld MVP
 
Join Date: May 2005
Location: Norway
Posts: 5,678
kgun RepRank 9
Default Re: Sitemap duplicate URLS

You know this page:

https://www.google.com/webmasters/to.../protocol.html

More precisely this:

Is there an XML schema that I can validate my XML Sitemap against?

https://www.google.com/webmasters/to...faq_xml_schema

with a link to the following XML Schema that defines the xsd namespace.


That is the professional way to validate a well-formed XML sitemap document.

P.S. You do not need to go into the namespace thread at W3 Schools. It is there if you need deeper information.

Last edited by kgun; 09-25-2007 at 07:06 PM.
  #15 (permalink)  
Old 09-25-2007, 07:02 PM
WebProWorld Member
 
Join Date: Sep 2007
Posts: 47
DoneInStyle RepRank 0
Default Re: Sitemap duplicate URLS

That URL form looks like the kind of bird tracks a dynamic program leaves in the URL. If no other URLs in your sitemap have any part of that end structure, then you may be able to use a portion of it for a filter. I'd suggest trying ~Documents first as a filter and see what happens.
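
The effect of such a filter can be sketched like this. It is a plain substring match written in Python as an assumption about what a generator's filter does; the function name drop_matching is hypothetical.

```python
def drop_matching(urls, marker="~Documents"):
    """Return the URLs that do not contain the marker substring."""
    return [u for u in urls if marker not in u]


# Sample entries, including two of the strange URLs quoted in this thread.
urls = [
    "/index.html",
    "/Elbow.htm",
    "/index.html?Elbow.htm~Documents",
    "/index.html?Gloves.htm~Documents",
]
kept = drop_matching(urls)
```

Here kept contains only /index.html and /Elbow.htm. If a legitimate page ever had ~Documents in its address it would be dropped too, which is why it is worth checking first that no other URLs share that ending.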
  #16 (permalink)  
Old 09-25-2007, 07:44 PM
WebProWorld New Member
 
Join Date: Dec 2005
Posts: 11
heavener RepRank 0
Default Re: Sitemap duplicate URLS

Quoted:
"I recommend you keep as many pages indexed as possible."

"Can you explain why?"

The last thing a website needs, IMHO, is to shut the door on even a single customer/visitor. They might be the one in a hundred guy whose performance hangs in the balance and is willing to reward the organization that helps.

Example: I worked with a VP of Marketing to develop a commercial website. He wanted it to "look just like Moen Faucet". Why? Because Moen replaced a busted faucet for free when he got their phone number off the website. Note that he didn't use the website to make the support request, just got the phone number. And yet he bought all Moen faucets for the house he built later because "they have the best website in the world."

Don't discredit parts of your site that may admittedly not bear high traffic - they might be exactly the mine of gold one of your customers needs. I can't tell you the number of sites I've been to where information is so buried it's useless. Let it be indexed and don't worry about it.

P.S. - I once found a company's contact address buried in the middle of their privacy policy page - the only place on their whole site (I had to use QuadWeb Sucker to grab the whole site before I could find it).

Michael Heavener
heavener@heavenr.com
  #17 (permalink)  
Old 09-27-2007, 05:30 PM
WebProWorld New Member
 
Join Date: Feb 2007
Posts: 10
NYChris RepRank 0
Default Re: Sitemap duplicate URLS

Does anybody know if this thing can generate from daily logfiles?
Currently, I use the python script from Google to generate my sitemaps from my log files.
I have to do it this way because the site is mostly database driven and contains nearly 100,000 URLs.

I checked out this GSiteCrawler you are talking about and I don't see an option to generate from a log file.
However, I do see that it has a 1GB limitation for the DB and that raises another flag because my daily log files are very close to that size.
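
A log-driven approach like the one described above can be sketched roughly as follows. This is not Google's Python script and not a GSiteCrawler feature; it is a minimal illustration that assumes access logs in the common/combined log format, and the names REQUEST and urls_from_log are made up for the example.

```python
import re

# Matches the request and status fields of a common/combined log line,
# e.g. "GET /page.htm HTTP/1.1" 200 1234
REQUEST = re.compile(r'"(?:GET|HEAD) (\S+) HTTP/[\d.]+" (\d{3}) ')


def urls_from_log(lines, base_url):
    """Collect unique URLs that were served with status 200, in first-seen order."""
    seen = {}  # dict keys keep insertion order and drop duplicates
    for line in lines:
        match = REQUEST.search(line)
        if match and match.group(2) == "200":
            seen.setdefault(base_url.rstrip("/") + match.group(1), None)
    return list(seen)
```

Because the lines are read one at a time, an open file object can be passed straight in (for example urls_from_log(open("access.log"), "http://www.example.com"), with a hypothetical file name), so memory use grows with the number of distinct URLs rather than with the size of the log.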

Chris
NYC Real Estate
NYC Real Estate Brokerage with Listings of Manhattan Apartments & Lofts for Rent & Sale

Last edited by NYChris; 09-27-2007 at 05:35 PM. Reason: My spelling stinks
  #18 (permalink)  
Old 09-29-2007, 11:56 AM
WebProWorld Member
 
Join Date: Oct 2003
Location: Athens Greece
Posts: 30
Tolis RepRank 0
Default Re: Sitemap duplicate URLS

We run the site easytraveller.gr and were among the first to use GSiteCrawler; since then there have been many releases, each better than the previous one. I think this is the best way to make a sitemap. Before you create the sitemap, you can specify the types of files you do not want included, and once the crawl has finished you can edit the list in the URLs tab to remove duplicate URLs and also adjust the change frequency.
  #19 (permalink)  
Old 10-01-2007, 05:57 PM
WebProWorld New Member
 
Join Date: Oct 2003
Location: Texas
Posts: 22
emersonworldwide RepRank 0
Default Re: Sitemap duplicate URLS

Thank you all for this information! I am ready to go from here!
  #20 (permalink)  
Old 10-01-2007, 06:12 PM
WebProWorld Member
 
Join Date: Sep 2007
Posts: 47
DoneInStyle RepRank 0
Default Re: Sitemap duplicate URLS

Let us all know what is working for you. Others looking for the information at some future date may possibly benefit.
  #21 (permalink)  
Old 10-05-2007, 02:40 PM
WebProWorld Member
 
Join Date: Sep 2005
Location: Detroit, MI
Posts: 77
zycon5000 RepRank 0
Default Re: Sitemap duplicate URLS

Quote:
Originally Posted by NYChris View Post
...I checked out this GSiteCrawler you are talking about and I don't see an option to generate from a log file.
However, I do see that it has a 1GB limitation for the DB and that raises another flag because my daily log files are very close to that size.
GSiteCrawler is an excellent program. We use it frequently to generate our sitemaps. If you have a concern about size, download their SQLDB version if you have that installed on your machine. Unfortunately, they don't have the new beta version in SQL format, only Access format. The 1GB limitation is probably an Access limitation, not a limitation of the application. Also, GSiteCrawler even found some duplicate URLs while crawling and automatically excluded those from the sitemap.
__________________
Zycon -- The source for engineers and technical buyers to locate and contact manufacturers and suppliers of industrial products and services from abrasives to valves.