| Google Discussion Forum Google Discussion forum is for topics specifically related to Google. There is a subforum dedicated to AdSense/AdWords subjects. |
|
I'm hoping someone can help me out with this interesting dilemma. I've been using Google's Webmaster Tools to check the status of my website. As time goes on, additional links are showing up under "duplicate tags", "duplicate titles", and/or "error pages", etc. The problem is that the pages in question are nowhere in my sitemap, nor can they be reached by traversing the site's navigation. For instance, several of the links have a query string after them that we use for our AdWords ads, i.e. src=google. Where are these extraneous links coming from? Some of the links even appear to be made up and aren't valid URLs. Other links are all lowercase, whereas on the site they are Proper Case. Is Google storing these URLs from people with the Google Toolbar typing them in?
|
|
|||
|
http://www.zycon.com/Announcements/2007%20NEWS%20RELEASES/HFPC%20%20%20HFPS/www.dwyer-inst.com
http://www.zycon.com/Announcements/2007%20NEWS%20RELEASES/HFPC%20%20%20HFPS/lit@dwyer-inst.com
http://www.zycon.com/Announcements/Products/TFLS/www.dwyer-inst.com
http://www.zycon.com/products/us-wv-west-virginia/laser-cutting.html
http://www.zycon.com/products/industrial-heat-exchangers.html

Upon further research, I see that the top 3 URLs are pulled from incorrectly formatted links provided to us in the actual product announcement (Word .DOC format). Not sure how to tell Google to get rid of them. The documents in question are located at http://www.zycon.com/Announcements/P...22631/TFLS.doc and http://www.zycon.com/Announcements/P...FPC%20HFPS.doc. I also noticed that some of the other posted announcements have incorrectly formatted links, but they do not show in Google tools like these do.

As for the lowercase URLs, not sure where they're coming from. They are not listed as such in our sitemap. There are more in the Webmaster Tools than are listed here.

Does Google treat URLs differently based on case sensitivity? If not, then a simple coding change to allow case-insensitivity for dynamic URLs should remedy this problem. Right now, the lowercase versions return a 404. |
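For what it's worth, the first three URLs look like what you get when a link is written without its scheme and a crawler resolves it relative to the document that contains it. Here is a minimal Python sketch of that behavior. The document location on example.com is hypothetical (the real .DOC paths are truncated above), and `looks_like_missing_scheme` is an illustrative helper rather than anything from a library.

```python
from urllib.parse import urljoin, urlparse


def resolve_link(document_url: str, href: str) -> str:
    """Resolve a link the way a crawler would, relative to its containing document."""
    return urljoin(document_url, href)


def looks_like_missing_scheme(href: str) -> bool:
    """Heuristic (an assumption, not a standard): flag links that were probably
    meant to be absolute but lack 'http://' or 'mailto:'."""
    if urlparse(href).scheme or href.startswith(("/", "#", "?", ".")):
        return False
    first_segment = href.split("/", 1)[0].lower()
    return first_segment.startswith("www.") or "@" in first_segment


# Hypothetical location of a posted press release.
doc = "http://www.example.com/Announcements/Products/TFLS/release.doc"

for href in ("www.dwyer-inst.com", "lit@dwyer-inst.com", "http://www.dwyer-inst.com"):
    print(href, "->", resolve_link(doc, href), looks_like_missing_scheme(href))
```

Running a check like this over the links in each posted announcement would flag the badly formatted ones before a crawler finds them.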
|
|||
|
What forms are you referring to?
|
|
|||
|
The Word .DOC files are press releases posted on our site from 3rd parties. They are not included in our generated sitemap, but they are also not excluded in robots.txt, so spiders can still read them and follow their links. I was unaware that there were invalid URLs inside the .DOC files, but more importantly, I was also unaware that Google can follow links from inside a .DOC file!
That still doesn't explain where these lowercase URLs originated, however. |
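If the goal is to keep spiders out of the posted press releases, one option would be a Disallow rule covering their directory. Below is a rough sketch that checks such a rule with Python's standard `urllib.robotparser`. It assumes the releases live under /Announcements/ (as the reported URLs suggest), and the example.com host and file names are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Assumed rule set: block every crawler from the announcements directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /Announcements/
"""


def is_crawlable(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical URLs for illustration only.
print(is_crawlable(ROBOTS_TXT, "http://www.example.com/Announcements/release.doc"))
print(is_crawlable(ROBOTS_TXT, "http://www.example.com/products/page.html"))
```

The standard-library parser only does simple path-prefix matching, which is why the sketch blocks the whole directory instead of matching on the .doc extension.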
|
||||
|
If you allow for case insensitivity, you'll end up with duplicate pages getting indexed because of the different URLs. Been there, done that. Dave |
|
||||
|
Right, Dave. And if I felt like scraping your content and linking to your website in lower case, upper case, fake URLs, whatever, then Google would follow those links to see if they exist.
|
|
|||
|
So it sounds like what we have currently, where the lowercase URLs return a 404 and the Proper Case URLs return a 200, is correct then. That's good. Now to find out where these lowercase URLs are coming from in the first place.
|
|
|||
|
You have to be really careful with upper and lower case as others have said.
|
|||
|
Perhaps you should do a 301 redirect from the incorrect URLs to the correct ones? If you can't get the originating URLs changed, you may as well tell Google where it should be heading.
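A minimal sketch of that idea in Python, assuming the site can enumerate its canonical (Proper Case) paths. The path shown and the function names are hypothetical, and how the result is wired into the actual web server is left out. Only the canonical URL answers 200, any other casing gets a 301 to it, and unknown paths still get a 404, so no duplicate pages are served.

```python
from typing import Dict, Iterable, Optional, Tuple


def build_case_index(canonical_paths: Iterable[str]) -> Dict[str, str]:
    """Map each lowercased path to its single canonical spelling."""
    return {path.lower(): path for path in canonical_paths}


def resolve(path: str, index: Dict[str, str]) -> Tuple[int, Optional[str]]:
    """Return (status, redirect_target) for a requested path."""
    canonical = index.get(path.lower())
    if canonical is None:
        return 404, None        # no such page under any casing
    if canonical == path:
        return 200, None        # exact canonical URL: serve the page
    return 301, canonical       # wrong casing: permanent redirect


# Hypothetical canonical path for illustration only.
index = build_case_index(["/Products/Industrial-Heat-Exchangers.html"])

print(resolve("/products/industrial-heat-exchangers.html", index))
print(resolve("/Products/Industrial-Heat-Exchangers.html", index))
print(resolve("/products/no-such-page.html", index))
```

One caveat with this approach: if two real pages differ only by case, the index keeps just one of them, so that would need checking first.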
|
|
|||
|
If this continues to be a problem, I may have to implement this. I'd still like to know where the lowercase URLs are coming from in the first place. The only thing I can think of is that someone with the Google Toolbar is manually typing them in and Google is making note of the newly typed URL (can it do this?!?). Other than that, I don't have a clue.
|
WebProWorld is an iEntry, Inc. ® site - © 2009 All Rights Reserved Privacy Policy and Legal iEntry, Inc. 2549 Richmond Rd. Lexington KY, 40509 |