#1 | MarkGatESS (WebProWorld Member) | 08-06-2008, 12:48 PM
'Force' Google to remove non-existent pages


Every time I open Google Webmaster Tools and check the Web Crawl errors, Google lists the same seven pages and subfolders, none of which has existed on our website for at least two years.



These are the 404 error pages it lists (removed 'http://www.' to prevent links):
  1. endoscopy.com/ess01/sx030002.htm
  2. endoscopy.com/ind/choosing_a_borescope.htm
  3. endoscopy.com/index.html
  4. endoscopy.com/med/index.htm
  5. endoscopy.com/products/ess_ent-p4.htm
  6. endoscopy.com/products/forceps_graspers_retrievers.htm
  7. endoscopy.com/products/medcartscabinets/
Is there any way I can get Google to stop trying to index these pages/folders? I tried adding them to robots.txt, with no luck. As I said, they haven't existed on our web host's server for at least two years, maybe more.

Any ideas/links I can try to use to correct this? It's getting quite frustrating!
__________________
~Mark G.
Graphic Designer - Endoscopy Support Services, Inc.

#2 | wige (WebProWorld Moderator) | 08-06-2008, 03:00 PM
Re: 'Force' Google to remove non-existent pages

Most likely there are still links to these pages somewhere. As long as someone links to a page and the page returns a 404, Google will keep trying to crawl it. A 404 does not indicate that the page is gone for good. It only says the document was not found, without saying whether that is temporary or permanent, so the spider keeps coming back in case the page returns. To indicate that the document is gone permanently, your server needs to return a 410 (Gone) status code. Unfortunately, most webmasters do not know how to return a 410. You would put the following line in your .htaccess file:

Redirect 410 /ess01/sx030002.htm

(Repeating for as many pages as are Gone.)

However, if other sites still link to these old pages, you may not want to waste the link juice. In most cases, it would be preferable to use a 301 redirect to send the traffic to another appropriate page on your server. This preserves PageRank and also keeps any visitors who follow the outdated link.

Redirect 301 /old/page.htm http://site.tld/new/page.htm
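
As a minimal sketch of how the two directives might be combined for some of the pages listed in the first post: the split between 410 and 301 below is arbitrary, and the 301 target reuses the site.tld placeholder from above rather than a real replacement page. Treat it as an illustration under those assumptions, not a ready-made file.

```apache
# Sketch only: which URLs get 410 vs 301, and the 301 target, are assumptions.

# Gone for good, with no replacement: answer 410 (no target URL is given).
Redirect 410 /ess01/sx030002.htm
Redirect 410 /ind/choosing_a_borescope.htm

# Redirect matches by path prefix, so this line also covers anything that
# used to live under the folder.
Redirect 410 /products/medcartscabinets/

# Old page with a suitable replacement: send visitors and spiders there.
# http://site.tld/new/page.htm is a placeholder, not a real address.
Redirect 301 /products/ess_ent-p4.htm http://site.tld/new/page.htm
```

After uploading the file, requesting one of the old URLs and looking at the response headers should show the new status code instead of 404.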
__________________
The best way to learn anything, is to question everything.

#3 | MarkGatESS (WebProWorld Member) | 08-07-2008, 10:55 AM
Re: 'Force' Google to remove non-existent pages

Thanks Wige.

Since my hosting provider kinda sucks, I've contacted them and asked them either to let me edit the .htaccess file myself or to apply the instructions you gave me, along with the pages on our site that the old links should redirect to. If you're right that those links may still be worth something, it would probably be better to redirect them rather than block them.

Question:

One of the "problem pages" is the index page - the old page was "index.html" and the new/current page is "index.htm".

Is there some way to add a line or two to the .htaccess file so that, whether the page is index.htm or index.html, browsers treat them as one and the same? That way, if down the road it were "accidentally" changed to index.html, those with links to index.htm wouldn't get a 404 error, and vice versa.

Another question:

Can I use Google's tools or some other method to "backtrack" which sites link to those pages? I've never done it before and I don't know how, or whether it's even possible. For one of the pages, I don't even know what it originally was, so I don't know where to redirect it.
__________________
~Mark G.
Graphic Designer - Endoscopy Support Services, Inc.

Last edited by MarkGatESS; 08-07-2008 at 10:59 AM.

#4 | wige (WebProWorld Moderator) | 08-07-2008, 12:50 PM
Re: 'Force' Google to remove non-existent pages

As far as index.html versus index.htm goes, the best you can do is redirect one to the other. You do not want the page to be reachable at both URLs, as search engines would then see the pages as duplicates. Generally, I prefer to redirect all possible index.ext files to the root of the folder (so folder/index.ext always becomes folder/). Your server should automatically serve index.html or index.htm or index.something, depending on which extension you have. Assuming that is the case, you would use the following rule to force browsers to the root:

RedirectMatch 301 ^(.*)/index\.(.*)$ http://url.tld$1/

This will match any folder structure, and remove index.ext from the end, redirecting the user and spider to the directory root. For best results, make sure your internal links point to the directory root.

Also, please note that if you will be passing query strings (I am assuming this is not the case since the pages are .html) this method will not work.
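
One caveat with the rule above: the server answers a request for folder/ by internally looking up folder/index.htm, so a blanket redirect of index.* can, on some Apache setups, turn into a redirect loop, and `index\.(.*)` also catches files such as index.php. A commonly used alternative checks the request line the client actually sent. The sketch below assumes mod_rewrite is available on the host (nothing in this thread confirms that) and that the file sits in the site root; url.tld is the same placeholder as above.

```apache
# Sketch, assuming mod_rewrite is enabled and this .htaccess is in the site root.

# File names to serve when a folder is requested, in order of preference.
# Only needed if the host's default list lacks one of them.
DirectoryIndex index.htm index.html

RewriteEngine On

# THE_REQUEST is the original request line, e.g. "GET /med/index.htm HTTP/1.1".
# The server's internal lookup of index.htm for "/med/" does not change it,
# so a request for the folder itself never satisfies this condition.
RewriteCond %{THE_REQUEST} ^[A-Z]+\s/([^?\s]*/)?index\.html?[?\s]

# Only index.htm and index.html are redirected; $1 is the folder part
# (empty for the site root, "med/" for /med/index.htm).
RewriteRule ^(.*/)?index\.html?$ http://url.tld/$1 [R=301,L]
```

With this in place, a request for /med/index.htm would be sent to http://url.tld/med/, while a request for /med/ is served directly. Any query string on the original request is carried over to the redirect target by default.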
__________________
The best way to learn anything, is to question everything.

#5 | MarkGatESS (WebProWorld Member) | 08-12-2008, 04:51 PM
Re: more questions Re: .htaccess

Okay, they got me FTP access. In "my" directory, I see a BAK folder, a data folder, and a www folder. The only other things there are two text files for our SSL certs and a zipped file of our shopping cart software. The BAK folder has a copy of all our site images, the data folder holds our database files, and the www folder contains the same files and folders that I can access through FrontPage: basically, our website files (www is our "root").

Nowhere did I see an .htaccess file. Is it possible to run a website without the file, or is it that I don't have access to where the file is?

Also, can a web server have a single .htaccess file that runs multiple domains/websites or do I have a specific .htaccess file for our website on that server?
__________________
~Mark G.
Graphic Designer - Endoscopy Support Services, Inc.

Last edited by MarkGatESS; 08-12-2008 at 04:53 PM.

#6 | chiron (WebProWorld Pro) | 08-12-2008, 05:23 PM
Re: 'Force' Google to remove non-existent pages

Take this advice with suspicion: .htaccess is not something to be trifled with, and I am not a master of its full potential.

However, usually I create or see them created in the /www/ path.

If you have root access (unlikely on a shared rather than a dedicated server), you can have one at the root and, if you like, one in each (IP) domain as well. They cascade down the path, IIRC, so do only as little as you need to, especially with things like 301 redirects.

Wige is a whiz on this kind of thing so I will shaddup now.
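
To illustrate the cascading described above, here is a sketch of two .htaccess files on a hypothetical layout. The /www/shop/ folder and every path in it are made up for illustration, and site.tld is a placeholder. Apache reads the .htaccess file in each directory on the way down to the requested file, so the folder-level file adds to the site-level one.

```apache
# --- File 1: /www/.htaccess (site root, applies to every request) ---
Redirect 301 /old/page.htm http://site.tld/new/page.htm

# --- File 2: /www/shop/.htaccess (hypothetical folder) ---
# Read only for requests under /shop/, in addition to the root file.
# Redirect paths are still written from the site root, even in a subfolder.
Redirect 410 /shop/discontinued.htm
```

The file is optional on Apache: a site with no .htaccess at all is served using only the server's main configuration.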

#7 | spiderbait (WebProWorld Pro) | 08-12-2008, 06:45 PM
Re: more questions Re: .htaccess

Quote (originally posted by MarkGatESS):
"Nowhere did I see an .htaccess file. Is it possible to run a website without the file, or is it that I don't have access to where the file is?"

It's possible that your permissions do not allow you to see or modify the .htaccess file.

It's more likely that your FTP program is filtering the files you see. In your FTP program's setup, you should be able to experiment with filters that will allow you to see the .htaccess file.

In CuteFTP I have to choose "enable filters" and then tell CuteFTP to let the server do the filtering rather than CuteFTP itself. That's usually enough.
__________________
Jade Burnside, Ahead of the Web
What good is your web site if no one can find it?
SEO & Optimized Web Site Design
All times are GMT -4.