WebProWorld
Google Discussion Forum: for topics specifically related to Google. There is a subforum dedicated to AdSense/AdWords subjects.

#1
03-13-2007, 09:43 PM
WebProWorld Pro
Join Date: Nov 2004
Location: Westmoreland County, PA
Posts: 218
noel_x99
Duplicate Content Question

In order to manage source control, I'm working with a programmer to set up development, staging (testing), and production (live) versions of our sites. All of these will be on the same server. The development site will always be password protected. Essentially, the staging site is an exact (testing) copy of the production site.

The programmer would rather not password protect the staging area.

My question is: will the SEs see it as duplicate content? Is there anything special that can be done to prevent the staging version from being seen as a duplicate?

In theory, there would never be a link to the staging site from any website, so the SEs shouldn't find it. (But I don't want to leave it up to chance.)

My first thought was to block all access to the site with a robots.txt, but when that file is moved from staging to production along with all of the other pages, the SEs would be blocked from the production site too.

We're running these sites on an IIS server.

Any ideas/comments/suggestions are appreciated.
__________________
Jane Noel
http://www.InWestmoreland.com
Westmoreland County PA's Business Directory

#2
03-14-2007, 04:04 AM
WebProWorld Veteran
Join Date: Jul 2004
Posts: 913
activeco
Re: Duplicate Content Question

Quote:
Originally Posted by noel_x99
My first thought was to block all access to the site with a robots.txt - but when that file is moved from staging to production along with all of the other pages - then the SEs would be blocked from the production site too.
Why is that?

'Disallow: /staging' will always remain the same.
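To illustrate why this rule never changes: a Disallow value is matched as a prefix of the URL path on the host that served the robots.txt file. A minimal PHP sketch of that matching (the function name `is_path_disallowed` is hypothetical, and real crawlers also handle Allow lines, wildcards and per-agent groups, which this ignores):

```php
<?php
// Hypothetical helper: true if $path starts with any non-empty Disallow value.
// Simplified: ignores Allow lines, wildcards and User-agent groups.
function is_path_disallowed(string $path, array $disallow): bool
{
    foreach ($disallow as $rule) {
        // An empty "Disallow:" line blocks nothing.
        if ($rule !== '' && strpos($path, $rule) === 0) {
            return true;
        }
    }
    return false;
}

$rules = ['/staging'];
var_dump(is_path_disallowed('/staging/index.html', $rules)); // bool(true)
var_dump(is_path_disallowed('/index.html', $rules));         // bool(false)
```

Because it is a plain prefix, `/staging` would also match a page like `/staging-old.html`. And since rules only ever see paths, a Disallow line cannot name a different host such as staging.oursite.com.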
__________________
Impossible? You just underestimate the time.

#3
03-14-2007, 07:21 PM
WebProWorld Pro
Join Date: Oct 2003
Location: Phoenix, AZ
Posts: 224
chowell

The last post is accurate... simply use your robots.txt file to block the spiders from the beta area.

Set a reminder to remove the Disallow line before you go live and/or use separate robots.txt files for the beta & live sites.

You could also use a Meta Robots tag, but you'd have to remove that for launch also.
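One way around having to remove the meta tag at launch is to emit it only when the page is served from the staging host name, so the same file can be deployed unchanged. A minimal sketch, assuming PHP and a staging host whose name begins with `staging.` (both assumptions; `robots_meta_tag` is a hypothetical helper):

```php
<?php
// Hypothetical helper: returns a noindex meta tag only on the staging host,
// so identical page files can go to staging and production.
function robots_meta_tag(string $host): string
{
    // Assumption: the staging site is served as staging.<domain>.
    if (strpos(strtolower($host), 'staging.') === 0) {
        return '<meta name="robots" content="noindex, nofollow">';
    }
    return ''; // production: emit nothing, normal indexing applies
}

// In the page template, inside <head>:
// echo robots_meta_tag($_SERVER['SERVER_NAME'] ?? '');
```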

#4
03-14-2007, 08:03 PM
WebProWorld Pro
Join Date: Nov 2004
Location: Westmoreland County, PA
Posts: 218
noel_x99

Maybe I'm not explaining it quite correctly. We're using a source control program called TortoiseSVN. This program controls the source, and as we commit changes, it creates versions of the site.

Suppose we were to put a version, say version 12, on staging.oursite.com to test it. If everything tested OK, then version 12 would be put into production on the live site: www.oursite.com.

We wouldn't have the opportunity to change a single file, or it would become version 13.

While search engines shouldn't ever find staging.oursite.com, it will still be there so that we can test versions 14, 15, 16, etc. before those versions of our site go into production.

My concern is that I don't want the SEs to see that as a duplicate copy of the same site.

I don't think disallow /staging will work because it's not a subfolder. Can I disallow "staging.oursite.com"?

Thanks,
Jane

#5
03-14-2007, 08:41 PM
WebProWorld Pro
Join Date: Mar 2005
Posts: 121
subsystems

What about password protecting HTTP access to the staging site?
IIS lets you set up a username and password that will require a visitor to log in before they can access a site, folder and/or file.
So just have your hosting company or web server admin add the login requirement to the staging site.

The spiders won't have the login info so they won't be able to spider anything.
The staging site would be for testing and limited access anyway. You can give out the login info only to those that need to test the staging site.
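The same login requirement can also be enforced in application code instead of through IIS's own settings (a different technique from the IIS configuration described above). A minimal PHP sketch of HTTP Basic authentication; the user name, password and the `staging_credentials_ok` / `require_staging_login` helpers are hypothetical, and it assumes the server passes PHP_AUTH_USER / PHP_AUTH_PW through to PHP, which depends on how PHP is installed under IIS:

```php
<?php
// Hypothetical credentials for testers; keep real ones out of source control.
const STAGING_USER = 'tester';
const STAGING_PASS = 'change-me';

// Constant-time comparison of the supplied credentials.
function staging_credentials_ok(?string $user, ?string $pass): bool
{
    return $user !== null && $pass !== null
        && hash_equals(STAGING_USER, $user)
        && hash_equals(STAGING_PASS, $pass);
}

// Call at the top of every staging page.
function require_staging_login(): void
{
    $user = $_SERVER['PHP_AUTH_USER'] ?? null;
    $pass = $_SERVER['PHP_AUTH_PW'] ?? null;
    if (!staging_credentials_ok($user, $pass)) {
        header('WWW-Authenticate: Basic realm="Staging"');
        http_response_code(401); // spiders without credentials stop here
        echo 'Login required';
        exit;
    }
}
```

Basic auth only base64-encodes the credentials, so it is best used over HTTPS.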

Hope this works for you.
Mike

#6
03-14-2007, 09:38 PM
WebProWorld Veteran
Join Date: Dec 2005
Location: In Your Mind
Posts: 788
SemAdvance

This should work

Place the robots.txt in the subdomain's root

# For domain: http://staging.oursite.com

User-agent: *
Disallow: /

Hope this helps

#7
03-14-2007, 09:40 PM
WebProWorld Veteran
Join Date: Jan 2004
Location: California
Posts: 335
craigmn3
Other Thoughts

Considering that you can get hosting for next to nothing these days, why don't you simply run your test pages on a separate site? Block it seven ways from Sunday, and when you're ready to migrate your test pages to your live site, download/upload.

This way has less finesse, but it's more secure.

#8
03-14-2007, 09:43 PM
WebProWorld Veteran
Join Date: Apr 2004
Posts: 349
imvain2

Quote:
Originally Posted by SemAdvance
This should work

Place the robots.txt in the subdomain's root

# For domain: http://staging.oursite.com

User-agent: *
Disallow: /

Hope this helps
To add to this post, the robots.txt file is looked for separately on every subdomain. So staging.oursite.com/robots.txt is looked for and www.oursite.com/robots.txt is also looked for. So this solution should work for you.
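Put another way, a crawler derives the robots.txt location from the scheme and host (and port, if any) of each URL it wants to fetch, so staging and www are checked independently. A minimal PHP sketch (the `robots_txt_url` helper is hypothetical):

```php
<?php
// Hypothetical helper: the robots.txt URL that governs a given page URL.
// Each scheme + host (+ port) combination has its own robots.txt.
function robots_txt_url(string $pageUrl): ?string
{
    $parts = parse_url($pageUrl);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return null; // not an absolute URL
    }
    $url = strtolower($parts['scheme']) . '://' . strtolower($parts['host']);
    if (isset($parts['port'])) {
        $url .= ':' . $parts['port'];
    }
    return $url . '/robots.txt'; // the page's own path is irrelevant
}

echo robots_txt_url('http://staging.oursite.com/products/page.html'), "\n";
// http://staging.oursite.com/robots.txt
echo robots_txt_url('http://www.oursite.com/products/page.html'), "\n";
// http://www.oursite.com/robots.txt
```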

#9
03-14-2007, 09:43 PM
WebProWorld Member
Join Date: Apr 2004
Location: Chicago, IL
Posts: 48
tacimala

Or only allow certain IPs on the staging site.
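IP restrictions can be set in IIS itself; if it is done in code instead, a minimal PHP sketch might look like this. The addresses are placeholders from documentation-only ranges and `staging_ip_allowed` is a hypothetical helper. It compares exact addresses only (no subnet ranges), and REMOTE_ADDR will be a proxy's address if the site sits behind one:

```php
<?php
// Hypothetical allowlist; 192.0.2.x and 198.51.100.x are documentation-only addresses.
const STAGING_ALLOWED_IPS = ['192.0.2.10', '198.51.100.25'];

function staging_ip_allowed(string $ip): bool
{
    return in_array($ip, STAGING_ALLOWED_IPS, true); // exact, strict match
}

// At the top of each staging page:
// if (!staging_ip_allowed($_SERVER['REMOTE_ADDR'] ?? '')) {
//     http_response_code(403);
//     exit('Forbidden');
// }
```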

#10
03-14-2007, 10:10 PM
WebProWorld 1,000+ Club
WebProWorld MVP
Join Date: Aug 2003
Location: Worldwide
Posts: 8,164
Webnauts

Quote:
Originally Posted by SemAdvance
This should work

Place the robots.txt in the subdomain's root

# For domain: http://staging.oursite.com

User-agent: *
Disallow: /

Hope this helps
I think this would look better. And it will work!

User-agent: Googlebot
Disallow: /

User-agent: *
Disallow: /
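Under the usual matching rules, a crawler obeys the record whose User-agent line matches its own name and falls back to `*` only when none does, so in this file the Googlebot record gives Googlebot the same `Disallow: /` it would already get from the `*` record. A simplified PHP sketch of that selection; `disallow_rules_for` is hypothetical, it ignores Allow lines and wildcards, and it takes the first record naming the crawler rather than the most specific one:

```php
<?php
// Hypothetical, simplified robots.txt reader: returns the Disallow values
// that apply to $agent. Ignores Allow lines and wildcards.
function disallow_rules_for(string $robotsTxt, string $agent): array
{
    $groups = [];            // each: ['agents' => [...], 'disallow' => [...]]
    $current = null;         // index of the group being filled
    $lastWasAgent = false;   // consecutive User-agent lines share one group

    foreach (preg_split('/\r\n|\r|\n/', $robotsTxt) as $line) {
        $line = trim(preg_replace('/#.*/', '', $line)); // drop comments
        if ($line === '' || strpos($line, ':') === false) {
            continue;
        }
        [$field, $value] = array_map('trim', explode(':', $line, 2));
        $field = strtolower($field);

        if ($field === 'user-agent') {
            if (!$lastWasAgent) { // a new record starts
                $groups[] = ['agents' => [], 'disallow' => []];
                $current = count($groups) - 1;
            }
            $groups[$current]['agents'][] = strtolower($value);
            $lastWasAgent = true;
            continue;
        }
        $lastWasAgent = false;
        if ($field === 'disallow' && $current !== null) {
            $groups[$current]['disallow'][] = $value;
        }
    }

    // A record naming this crawler wins; '*' is only the fallback.
    $agent = strtolower($agent);
    $fallback = null;
    foreach ($groups as $group) {
        foreach ($group['agents'] as $name) {
            if ($name === '*') {
                $fallback = $fallback ?? $group;
            } elseif ($name !== '' && strpos($agent, $name) !== false) {
                return $group['disallow'];
            }
        }
    }
    return $fallback['disallow'] ?? [];
}
```

With the file above, both `disallow_rules_for($robots, 'Googlebot')` and `disallow_rules_for($robots, 'msnbot')` return `['/']`.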
__________________
"Being an expert isn't telling other people what you know. It's understanding what questions to ask, and flexibly applying your knowledge to the specific situation at hand. Being an expert means providing sensible, highly contextual direction." Jeff Atwood
SEO Workers - Search Engine Optimization Consulting Company | SEO Analysis Tool | Webnauts Net SEO

#11
03-14-2007, 11:18 PM
WebProWorld Member
Join Date: Jan 2006
Location: Sydney Australia
Posts: 60
Christiaan

And then there are bots that don't care about the robots.txt file!
__________________
Chris
There is no failure until you give up.

#12
03-14-2007, 11:27 PM
WebProWorld Veteran
Join Date: Aug 2003
Location: Singapore
Posts: 716
edhan

Quote:
Originally Posted by Webnauts
I think this would look better. And it will work!

User-agent: Googlebot
Disallow: /

User-agent: *
Disallow: /
Yes, I agree with Webnauts. This will prevent all bots from crawling. I normally create password protection for all testing sites.

#13
03-14-2007, 11:53 PM
WebProWorld Pro
Join Date: May 2004
Location: Montreal > Canada
Posts: 174
just-trying-to-help
Re: Duplicate Content Question

Quote:
Originally Posted by activeco
Quote:
Originally Posted by noel_x99
My first thought was to block all access to the site with a robots.txt - but when that file is moved from staging to production along with all of the other pages - then the SEs would be blocked from the production site too.
Why is that?

'Disallow: /staging' will always remain the same.
activeco has got it right.
The spiders will be disallowed from the /staging subfolder but not from the rest of the site, so using the same robots.txt for both the subfolder and the main site will be fine.

Quote:
While search engines shouldn't ever find staging.oursite.com, it will still be there so that we can test version 14, 15, 16...etc before those version of our site go into production.
If it's on the server, chances are Google will find it sooner or later.

Quote:
Considering that you can get hosting for next to nothing these days, why don't you simply run your test pages on a separate site.
She would still have the same problem of blocking site #2.

Quote:
SemAdvance wrote:
This should work

Place the robots.txt in the subdomain's root

# For domain: http://staging.oursite.com

User-agent: *
Disallow: /

Hope this helps


To add to this post, the robots.txt file is looked for separately on every subdomain. So staging.oursite.com/robots.txt is looked for and www.oursite.com/robots.txt is also looked for. So this solution should work for you.
Actually, it wouldn't because once transferred to the live site, it would disallow spiders there as well.
Remember: "We wouldn't have the opportunity to change a single file"

Ken

#14
03-15-2007, 05:12 AM
WebProWorld New Member
Join Date: Dec 2005
Posts: 4
hoggy
Re: Duplicate Content Question

In your code, check Request.ServerVariables to establish the domain/URL in use.

If it's a staging URL, look at the client IP address.
If it's not one of your IPs, send a 301 redirect to the live URL.

In the same code you can also change something visual for the staging site (e.g. add a reference to staging.css), so nobody gets them mixed up.

...it's all code-based, so it will transfer gracefully from staging to live.

...we've been doing it this way for years.
(Including IIS).
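hoggy's approach is described for ASP's Request.ServerVariables; a rough PHP equivalent uses `$_SERVER`. A minimal sketch, where the host names, office IPs and the `staging_redirect_target` helper are all hypothetical, and where REQUEST_URI may not be populated under IIS depending on how PHP is installed:

```php
<?php
// Hypothetical values: adjust to your own host names and office IPs.
const LIVE_HOST    = 'www.oursite.com';
const STAGING_HOST = 'staging.oursite.com';
const OFFICE_IPS   = ['192.0.2.10', '192.0.2.11'];

// Returns the live URL to 301 to, or null if the request may proceed.
function staging_redirect_target(string $host, string $clientIp, string $uri): ?string
{
    if (strcasecmp($host, STAGING_HOST) !== 0) {
        return null;                     // not the staging host: serve normally
    }
    if (in_array($clientIp, OFFICE_IPS, true)) {
        return null;                     // one of our own IPs: allow staging
    }
    return 'http://' . LIVE_HOST . $uri; // everyone else, including spiders
}

// At the top of every page (identical code on staging and live):
// $target = staging_redirect_target($_SERVER['SERVER_NAME'] ?? '',
//                                   $_SERVER['REMOTE_ADDR'] ?? '',
//                                   $_SERVER['REQUEST_URI'] ?? '/');
// if ($target !== null) {
//     header('Location: ' . $target, true, 301);
//     exit;
// }
```

Because the decision depends only on the request, the same file behaves correctly on both hosts, which is what lets it move from staging to live untouched.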

#15
03-15-2007, 05:36 AM
WebProWorld Veteran
Join Date: Jul 2004
Posts: 913
activeco
Re: Duplicate Content Question

Quote:
Originally Posted by just-trying-to-help
Quote:
So staging.oursite.com/robots.txt is looked for and www.oursite.com/robots.txt is also looked for. So this solution should work for you.
Actually, it wouldn't because once transferred to the live site, it would disallow spiders there as well.
Remember: "We wouldn't have the opportunity to change a single file"
Correct.

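If the OP insists on keeping staging on a subdomain, one solution could be to use a server-side language such as PHP or ASP to check the requested host name and serve robots.txt accordingly (the server would also need to route requests for /robots.txt to that script).
E.g. in PHP something like this:

<?php
header('Content-Type: text/plain');
if (strpos($_SERVER['SERVER_NAME'], 'staging') !== false) { // !== false: strpos() may return 0
    echo "User-agent: *\nDisallow: /\n";   // staging: block all crawlers
} else {
    echo "User-agent: *\nDisallow:\n";     // live: empty Disallow allows everything
}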

#16
03-15-2007, 05:49 AM
WebProWorld Veteran
Join Date: Jul 2004
Posts: 913
activeco

Hoggy,

Either we posted at the same time or I missed your reply. If the latter, my apologies.

#17
03-15-2007, 09:29 AM
WebProWorld Pro
Join Date: Nov 2004
Location: Westmoreland County, PA
Posts: 218
noel_x99

Thanks all! I like serving the robots.txt like hoggy and activeco suggest. I think password protection is an easier idea, but the programmer preferred not to do it. I need to find out why.

#18
03-15-2007, 03:56 PM
WebProWorld Member
Join Date: Aug 2005
Location: Seminole, FL
Posts: 54
Seminole386
Duplicate Content

I have several duplicate sites. The URLs are different, and so are the titles and a single information page two clicks away from the index page. Other than that they are identical, and all are on the same server. All of them are linked together with a "visit my other sites" link at the bottom of each page. I rank number 1 on Yahoo, Google, MSN and AOL for all my keywords. Hope that helps.
__________________
Life is short; enjoy the journey.

All times are GMT -4.