Re: 'Force' Google to remove non-existent pages
As for index.html versus index.htm, the best you can do is redirect one to the other. You do not want the same page to be reachable at both URLs, because search engines would then see the two versions as duplicate content. Generally, I prefer to redirect every possible index.ext file to the root of its folder (so folder/index.ext always goes to folder/). Your server should automatically serve index.html, index.htm, or index.whatever, depending on which extension you have. Assuming that is the case, you would use the following rule to send browsers to the folder root:
RedirectMatch 301 ^(.*)/index\.(.*)$ http://url.tld$1/
This will match any folder depth, strip index.ext from the end, and send both visitors and search engine spiders to the directory root with a 301 (permanent) redirect. For best results, make sure your internal links also point to the directory root rather than to index.ext.
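If you want to check which URLs the rule will catch before putting it live, here is a small Python sketch that applies the same regular expression and $1 substitution to a few sample paths. It only approximates Apache's behaviour (Python's `re` instead of Apache's regex engine, and only the URL path, with no query string). `redirect_target` and the sample paths are hypothetical, and `http://url.tld` is the same placeholder host used in the rule.

```python
import re

# Same pattern as the RedirectMatch rule; "http://url.tld" is only the
# placeholder host from the rule, not a real site.
INDEX_PATTERN = re.compile(r"^(.*)/index\.(.*)$")
TARGET_HOST = "http://url.tld"


def redirect_target(path):
    """Return the 301 target for a request path, or None if the rule
    would not fire. Mimics RedirectMatch replacing $1 with group 1."""
    match = INDEX_PATTERN.match(path)
    if match is None:
        return None  # no "/index.<ext>" in the path, so no redirect
    return f"{TARGET_HOST}{match.group(1)}/"


if __name__ == "__main__":
    samples = ["/index.html", "/blog/index.htm", "/blog/2009/index.php",
               "/blog/", "/about.html"]
    for path in samples:
        print(path, "->", redirect_target(path))
```

Paths that already end in a slash, or pages not named index, fall through untouched, which is what keeps the redirect from looping on the folder root itself.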
Also note that if you will be passing query strings (I am assuming you are not, since the pages are .html), this method will not work.