Re: Subdomain Leaked Linking causing Duplicate Content Filter on SERP?
Beyond what you are already doing to get the search engines to respond to your removal of the duplicated links, I would suggest you consider changing how you answer requests for the duplicated pages: instead of returning 404 errors, use 301 redirects that point to the original content. There are a few reasons this may be preferable:
1) Increased speed. Search engines may leave 404 pages in the index for up to six months. That is defensible behavior under the HTTP specification: a 404 only says that the resource was not found, with no indication of whether the condition is temporary or permanent, so the cause could well be a passing one - the file was removed for maintenance, a backend server is down, etc. A 301 redirect, on the other hand, says that the resource has moved permanently to a new URL and that the new URL should be used from now on.
2) Matches search engine behavior. Search engines try to be as adaptive as possible to keep up with changes to the web. Part of that adaptation is spotting redirects and new file locations and "merging" records in the index accordingly. By redirecting the duplicated pages to the original page, you explicitly tell the search engines that the content was duplicated and that the problem has been corrected. Without the redirect, the index has to discover the change on its own - it has detected that three different pages are identical, and with 404s you have to wait for it to notice that a page has been removed and then update the original page's record accordingly. With a 301, you explicitly show the search engine the link between the two URLs, which should merge the records and prompt an update of the data for the surviving version, and that may remove black marks more quickly.
3) Increased crawl rate. When the search engine sees the redirects, it may not recrawl the target page immediately, but it will likely schedule the target to be crawled soon to check for spam and make sure the redirect is valid. This will again cause the index entry for the target page to be updated, possibly multiple times.
4) Maintaining link and bookmark value. As these duplicated URLs seem to have existed for some time, it is possible that they have attracted inbound links and that users have bookmarked them. By redirecting to the non-duplicated pages, you keep any PageRank from those links flowing, and you avoid a negative user experience by showing users the content you know they are looking for instead of an error message.
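To make the suggestion concrete, here is a rough sketch of the 301 approach in Python. I am assuming the duplicates live on separate hostnames that mirror the paths of the main site; the hostnames, the `redirect_target` helper and the `serve` function are all made up for illustration. In practice you would more likely set this up in your web server's own configuration, but the logic is the same.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical hostnames: replace with your own canonical and duplicate hosts.
CANONICAL_HOST = "www.example.com"
DUPLICATE_HOSTS = {"dev.example.com", "staging.example.com"}


def redirect_target(host, path):
    """Return the canonical URL for a request to a duplicate host, else None."""
    hostname = host.split(":")[0].lower()  # drop any port, ignore case
    if hostname in DUPLICATE_HOSTS:
        # Assumes the duplicate mirrors the original's paths one-to-one.
        return "http://" + CANONICAL_HOST + path
    return None


class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        target = redirect_target(self.headers.get("Host", ""), self.path)
        if target is None:
            # Not a known duplicate; a real site would serve its content here.
            self.send_error(404)
            return
        # 301 = moved permanently; Location names the surviving URL.
        self.send_response(301)
        self.send_header("Location", target)
        self.end_headers()

    do_HEAD = do_GET  # crawlers may send HEAD requests as well


def serve(port=8080):
    """Start the sketch server on localhost (blocks until interrupted)."""
    HTTPServer(("127.0.0.1", port), RedirectHandler).serve_forever()
```

Calling `serve()` and then requesting a page with a `Host` header of one of the duplicate hostnames should return a 301 whose `Location` header points at the same path on the canonical host.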
I am not by any means saying that you must do it this way, just that it is something to consider. My comments are based on my own experience and on various tidbits of SE behavior I have seen posted by search engine sources, and others may have had different experiences. Other members may be able to tell you whether this would be a good idea in your situation, but I suspect it might help.
__________________
The best way to learn anything, is to question everything.