Is it safe to delete pages that have been 301 redirected?
NoHarleys
08-16-2011, 10:50 PM
Since one of my sites "fell" last week I decided to play around good and hard within webmaster tools. Found some odd stuff that I feel I should clean up.
Under crawl errors, I have over 3000 pages "not found". This is mainly due to some of my old pages that were html based are linked to my old cgi based cart. I have not used that cart in years so all the links are dead. Even though I have all these old pages forwarded to my new cart with 301 Redirects (via .htaccess), it is still reporting the errors. Is it safe to just remove the old pages from my server that haven't been in use for several years? Or what should be my plan of attack?
SteveGerencser
08-17-2011, 12:40 AM
At this point, yes. I'd kill the pages and forget them.
weegillis
08-17-2011, 11:31 AM
Try using a custom 404 page. That really helps.
Also, it is always good and safe to redirect your traffic when website hosting is renewed, even if you only change the hierarchy or remove a few pages.
It is safe to remove old pages once the traffic stops.
404 should only be served out if you want the page removed from the search index. Otherwise it NEVER really helps. Having one [custom 404 page] in place is only useful for serving up links to pages that MAY be helpful (on one's own site) to the visitor who has requested a missing page.
If we think about it a minute, having 301'd a page is as good as deleting it, anyway. It can never be reached again. 301 is permanent. Yes, the pages can be removed, as stated multiple times in this thread.
bhartzer
08-17-2011, 11:40 AM
If you can remove the old pages and the old cgi-based cart then I would do that--just make sure they are truly doing a 301 redirect to the new location (check it with a server header check tool).
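For anyone without a header check tool handy, here is a minimal Python sketch of the same check, using only the standard library. It sends one HEAD request without following the redirect and reports the status code and Location header. The function names `classify` and `check_headers` are made up for illustration, and the labels only single out 301 because that is what this thread is about.

```python
import http.client
from urllib.parse import urlsplit


def classify(status, location):
    """Describe what a response status means for a retired URL."""
    if status == 301 and location:
        return "permanent redirect to " + location
    if status in (302, 303, 307):
        return "temporary redirect (not a 301)"
    if status in (404, 410):
        return "gone or not found"
    return "other status " + str(status)


def check_headers(url):
    """Send one HEAD request and return (status, Location) without following redirects."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


# Example call (needs network access; the URL is the placeholder used in this thread):
# status, location = check_headers("http://www.mysite.com/oldhtmlpage1.htm")
# print(classify(status, location))
```

If the old page reports anything other than a permanent redirect, the .htaccess rule is probably not matching that URL.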
I would also do some backlinks checks to see if there are any links from other sites to those URLs and see if you can get them updated.
Finally, spider your own site and see if those URLs and links to those old URLs still exist. There may be some internal links that need to be cleaned up, as well.
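The internal link check can be roughly sketched in Python as well. This is not a full spider: it only scans one page's HTML for links that contain an old-URL marker. The marker `/cgi-bin/oldcart.cgi` follows the example URL given later in this thread, and `LinkCollector` and `find_stale_links` are hypothetical names.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def find_stale_links(html, stale_markers):
    """Return the hrefs that contain any of the given old-URL markers."""
    collector = LinkCollector()
    collector.feed(html)
    return [href for href in collector.links
            if any(marker in href for marker in stale_markers)]


# Sample page content, invented for illustration.
page = ('<a href="/cgi-bin/oldcart.cgi?product=ypvs02_oldproduct">old cart</a>'
        '<a href="/shop/new-product">new cart</a>')
print(find_stale_links(page, ["/cgi-bin/oldcart.cgi"]))
```

Running this over each page still on the server would show which ones still point at the old cart.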
cbosleeds
08-17-2011, 03:38 PM
The above advice is good, so at the risk of repeating it: check for IBLs (inbound links) to your dead pages, and if there are none, kill these pages. If there's a good source of links to any of them, try to get the links moved. If you can't, maybe keep the redirects from the pages that receive those links live, and even put full a href links from these pages to live pages on your site to pass the benefit of the links along to your live site.
office7
08-17-2011, 07:24 PM
NoHarleys, If you simply delete the pages from your server you will continue to get 404 errors until you fix the source of the problem.
In fact, if the files are still on your server, why are you getting 404 errors at all? Are the URLs being requested malformed? Sometimes inbound links are not correct. If that is the source of your problem, find the most common errors and redirect those malformed URLs to where you want them to go.
If the urls are correct, then find out why the redirects are not working. Check your .htaccess for correct syntax, correct urls, etc. Just deleting the files will not fix the problem.
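One way to find the most common errors is to group the not-found URLs by path, ignoring the query string. The sketch below assumes the crawl errors have already been exported as a plain list of URLs; the sample URLs are invented, modeled on the ones in this thread, and `group_errors_by_path` is a hypothetical helper name.

```python
from collections import Counter
from urllib.parse import urlsplit


def group_errors_by_path(urls):
    """Count not-found URLs by path, ignoring the query string."""
    return Counter(urlsplit(url).path for url in urls)


# Invented sample of exported crawl-error URLs.
errors = [
    "http://www.mysite.com/cgi-bin/oldcart.cgi?product=a&detail=1",
    "http://www.mysite.com/cgi-bin/oldcart.cgi?product=b&detail=1",
    "http://www.mysite.com/links/oldpage.htm",
]
for path, count in group_errors_by_path(errors).most_common():
    print(count, path)
```

If most of the errors collapse onto one or two paths, a handful of redirect rules may cover them instead of one rule per URL.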
In fact, once you have the real source of your problem fixed, you can even leave the old files on your server, unless you are short on space!
In addition to the above, you can ask webmasters to link to the new urls. I find that for low value pages, this is not really worth the effort.
Once you fix the real problem and delete the pages from your server, I recommend you also use the "Remove from index" option in Google Webmaster Tools. I trust this helps.
NoHarleys
08-19-2011, 06:50 PM
Let me further explain.
mysite.com/oldhtmlpage1.htm has (had) links to my old shopping cart mysite.com/cgi-bin/oldcart.cgi?product=ypvs02_oldproduct&detail=1 (I went ahead and removed those old pages from the server).
The old shopping cart site (cgi-bin cart) hasn't been used for years. Pages like mysite.com/oldhtmlpage1.htm have been 301 redirected for years to a relevant page (and the redirects work). Google is still indexing those pages, though, which is why all the irrelevant links are still showing up. There are LOTS of these links to the old cart on these old htm pages. Hopefully now that the pages are removed it will stop looking for all those old cgi script links to my old cart.
Google is also showing links from pages in mysite.com/links/oldpage.htm but I don't even have a /links/ folder on my server. If you type that link in you will be taken to a 404 page. So I'm not real sure why it is saying the link to the old cart is on that page, since it is a 404 page. It was discovered in 2008 so I'm sure that /links/ folder was removed a long time ago. I suppose I need to remove that folder from Google.
In any case my first update removed about 500 crawl errors so I only have about 2700 to go. Hopefully on Google's next crawl it will determine several other irrelevant pages have been removed and reduce that number even further.
weegillis
08-19-2011, 08:25 PM
The indexing system is slow to curate its archives. This is a well known fact. What are the last attempt dates on the crawl errors? Are they recent? If the redirects are in place and work, then that's what the crawlers should see. They follow 301 and 302 very nicely, if I'm not mistaken.
Are we sure these are even in the index, and not IBL's their crawler is finding?
[meant to add]
Have we ruled out canonicalization as a possible issue?
[/]
NoHarleys
08-19-2011, 11:09 PM
The errors were recent: 8/16/11. What confuses me is why the crawlers haven't picked up on the redirects, since they have been in place for years. I don't have redirects for the cgi-cart links because there are thousands of those. But the pages that Google is finding those errors on have been redirected for many years. All the pages that it is showing those links on are the ones I just deleted (according to the information in Webmaster Tools), not IBLs.
As for canonicalization; I switched everything to www last year to make sure all was the same. The errors are all www related.
At least I'm getting some of these errors removed.