Page 1 of 5 123 ... LastLast
Results 1 to 10 of 41

Thread: WWW vs non-WWW - understanding the physical file and directory background

  1. #1
    Senior Member
    Join Date
    Sep 2005
    Posts
    188

    WWW vs non-WWW - understanding the physical file and directory background

    Hi,
    Sorry to resurrect this issue, but I've spent a few days reading extensively on "Rel=canonical" without resolving my issues.

    I've taken on an SEO project for a client whose website is seriously underperforming in search. I am working on the on- and off-page factors, and TBH there is lots of scope there: very few backlinks ( as far as I could establish without the trusty Yahoo SiteExplorer - any recommendations for a good alternative also appreciated... ), and a lot of the targeted terms and phrases being very sparsely used. So, I'm optimistic that the techniques I've used successfully in the past will work here.

    My issue is with the starting point: the site, an opticians, has a URL like joebloggseyecentre.com. OK, so no points for having the primary keyword in the URL, but there are options there, if needed. The difficulty is that in the (very low) SERPs where the site appears, it does so without the 'www.' prefix. As far as I can tell, this is because the most significant link they have is from Golden Pages, and it doesn't have the 'www'.

    I had understood from what I'd read thus far that, either by design or by default, some or all sites have two iterations, one with and one without the www "subdomain". But I had assumed that this meant that all of the site pages must therefore exist in two separate locations on the server: you locate the one you don't want to focus on, and insert a 'rel=canonical' on each page, pointing to its counterpart.

    It may be that the access I have is limited. When I connect to the site I see five folders - 'primary', 'etc', 'logs', 'stats' and 'www'. I notice the icons for all except 'primary' are different, so check the Directory info and find:

    dr-x--x--- 3 bloggs003 web 4096 Jan 12 2010 .
    dr-x--x--- 3 bloggs003 web 4096 Jan 12 2010 ..
    lrwxrwxrwx 1 bloggs003 web 13 Jan 12 2010 etc -> ./primary/etc
    lrwxrwxrwx 1 bloggs003 web 14 Jan 12 2010 logs -> ./primary/logs
    dr-x--x--x 7 bloggs003 web 4096 Jan 12 2010 primary
    lrwxrwxrwx 1 bloggs003 web 15 Jan 12 2010 stats -> ./primary/stats
    lrwxrwxrwx 1 bloggs003 web 13 Jan 12 2010 www -> ./primary/www

    From long-ago days as a Unix Admin I recognize the 'soft links', and concentrate on the 'primary' directory, which has subdirectories for 'etc', 'logs', 'stats' and 'www' as expected. I upload the page changes to the /primary/www directory, then check and confirm that the new files are also in the linked /www directory. Also check both online versions in the browser, and both have successfully changed.
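
    For anyone who wants to confirm the same thing on their own account, here is a minimal Python sketch of the check described above. It rebuilds the 'primary/www' plus soft-linked 'www' layout in a throwaway temp directory (the file name index.html and the helper names are only for illustration) and asks the OS whether both paths end up at one physical file:

```python
import os
import tempfile


def build_linked_tree(root):
    """Recreate the layout from the listing: 'www' is a soft link to ./primary/www."""
    primary_www = os.path.join(root, "primary", "www")
    os.makedirs(primary_www)
    # Relative link target, resolved from the directory that holds the link.
    os.symlink("./primary/www", os.path.join(root, "www"))
    with open(os.path.join(primary_www, "index.html"), "w") as page:
        page.write("<h1>test page</h1>\n")


def is_single_copy(root, filename="index.html"):
    """Return True when both visible paths resolve to one physical file."""
    via_primary = os.path.join(root, "primary", "www", filename)
    via_link = os.path.join(root, "www", filename)
    return os.path.samefile(via_primary, via_link)


def demo():
    """Build the tree in a temporary directory and run the check."""
    with tempfile.TemporaryDirectory() as root:
        build_linked_tree(root)
        return is_single_copy(root)


if __name__ == "__main__":
    print(demo())
```

    If the check comes back true, the 'www' entry is only a second name for the same files, which matches what the upload test showed.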

    So, from what I can ascertain, I only have one set of files to work with.

    When I signed up for GWT, I added both www and non-www versions of the site. The non-www was accepted straight away, but the www version took a day or two. Bing WT seems to only accept the 'www' version, and is still showing 'pending' after 3 days.

    I would appreciate if someone could explain how, in this scenario, there can appear to be two versions of the pages, and also how to reconcile this with Google and Bing.

    I have started on the link-building campaign, and have been using the 'www' version, as I had intended to canonicalize this, but given that I only see one physical ( non-unix-linked ) version of each page, may not be able to do this, and possibly should focus on the 'non-www' URL in the process of acquiring backlinks. As always, all help will be gratefully appreciated.

    PJ

  2. #2
    Administrator weegillis's Avatar
    Join Date
    Oct 2003
    Posts
    5,818
    In the absence of a physical pointer to an actual subdomain folder, the server defaults to the root for an index page. Too bad SE's don't have the ability to work within this scope. If they find a link with the subdomain in the URL, and then find another without, they treat it as two URL's, like it or not.

    The ideal solution would be to register one with GWT and Bing, etc., and if possible go with a server configured redirect to that, be it www or not. You can still tell the SE's which you prefer, but your server redirects will set them straight soon enough if you don't.

    The [rel="canonical"] attribute is better reserved for situations where you have many pages which duplicate everything in the source code, but which pull in their content with scripts based on hashes or query strings. Then the thing would be to direct the SE's to the root page on that path, in the template for that page.

    I'm only scratching the surface of the OP, and also only throwing in my 2 cents. Better answers will follow, I'm sure.
    Last edited by weegillis; 02-05-2012 at 01:50 PM. Reason: [..] correction

  3. #3
    Senior Member
    Join Date
    Sep 2005
    Posts
    188
    Thanks a lot for the response Weegillis. Unfortunately, I don't have access to the 'public_html' folder, and the .htaccess file. The original creator/designer of the site is unavailable, the login credentials I was given only give me access to the folders as outlined in my post.

    I know I should probably have just 'sat it out' for a week or so, the sitemap hasn't been picked up yet ( at least the changed Title tags etc. are not showing ), but I really wanted to crack on with the link-building part of the project, and was concerned that the longevity ( about 8 years ) of the non-www site in Google's index might jump up and bite me if I continue to target links to the www version. I considered putting absolute links ( http://www.joebloggs.... ) in the sitemap, but can't remember hearing that as a suggestion here before?

  4. #4
    WebProWorld MVP williamc's Avatar
    Join Date
    Jul 2003
    Location
    On a really big hill in Kentucky
    Posts
    4,721
    Murphy, if it is simply the www vs non-www, simply make a Google Webmaster Tools account and set your preferred domain to www.
    William Cross
    Web Development by Those Damn Coders
    Firearm Friendly Websites because our constitution matters

  5. #5
    Senior Member
    Join Date
    Sep 2005
    Posts
    188
    Thanks William. I have set up the GWT account, but was holding off on changing this setting until the sitemap is processed, as it seems that only the non-www home page is in Google's index ( still pending ) and I was unsure what effect, if any, changing the setting would have on the existing SERPs - the non-www site does show up on Page 1 for 'branded' searches ( bloggs optician mytown ). I'm guessing there wouldn't be any effect, but didn't want to take the chance.

    From previous projects, my recollection was that both Google and Bing Webmaster Tools only took about a day to process the submitted sitemap, if this isn't the case then maybe I just need to be more patient.

    William, are you saying that changing the setting in GWT removes the need for resolving the www/non-www conflict with canonical tags in all cases, or is it just that in my case, with only one version of each page, I don't need to use 'rel=canonical' ? Sorry to labour the point, particularly if it's not required for my current case, but I can't get my head around whether the two versions ( www/non-www ) of a page are sometimes two physically separate files, rather than two virtual views of the same page, as I'm currently seeing. Appreciate the help!

  6. #6
    WebProWorld MVP williamc's Avatar
    Join Date
    Jul 2003
    Location
    On a really big hill in Kentucky
    Posts
    4,721
    rel=canonical was developed to combat a certain type of thing, namely pages that were seen as duplicated content at different URL's, such as domain.com/screams.php and domain.com/screams.php?page=1, where both of those pages show identical content. When the issue is simply one of www vs non-www, the better choice as far as Google is concerned is to simply use their own internal mechanism in GWT to handle it. If you are worried about diverging link equity, however, it is a different matter entirely, and should be handled through .htaccess using mod_rewrite to have the server automatically force all of one version to the other.
    William Cross
    Web Development by Those Damn Coders
    Firearm Friendly Websites because our constitution matters

  7. #7
    Senior Member deepsand's Avatar
    Join Date
    May 2004
    Location
    State College, PA
    Posts
    16,650
    Technically, "WWW" is not a sub-domain, but a service prefix. Just as the "FTP" prefix serves to identify a hostname that presumably supports the FTP protocol, the "WWW" identifies one that presumably supports the HTTP protocol. As such prefixes are de facto conventions, said presumptions do not necessarily hold true.

    Properly established DNS records gracefully handle the mappings between a hostname with and without any prefix by way of Aliases carried in CNAME - Canonical Name - DNS records. Thus, for users, no problem exists. In fact, if one observes carefully, he will see many different variant prefixes, such as "WW2," "WW3," "WWW2," "WWW3," "WW23," etc., all of which work quite nicely thanks to DNS.

    That Google and some others have a problem here is one of their own making; they simply don't bother checking to see if "WWW.DN.TLD" and "DN.TLD" reference the same hostname, or if "WWW.DN.TLD/something" and "DN.TLD/something" reference the same content.

    If your concern is simply how listings are displayed in Google's SERPs, then the aforementioned GWMT preference setting will do the trick.

    However, to ensure that Google treats the two canonical forms as being the same entity for indexing purposes, you need to do its work for it. And, this is just what the Canonical tag was designed for.

    While the same can be accomplished with a 301 redirect, such is actually a kludge, as the DNS already automatically executes that function. Returning a 301 Header code to Google is actually a lie, for the purpose of bringing its attention to the fact that both canonical forms do in fact reference the same entity.

  9. #8
    Senior Member
    Join Date
    Sep 2005
    Posts
    188
    Thanks once again, William and Deepsand. If I've gotten it right, the duality of the www and non-www is not due to two physical versions of the page files, but Google 'virtually' viewing them as two entities, if it has followed links to both versions. Deepsand, can I confirm that, even though I only have one version of each page, I should add <link rel="canonical" href="http://www.sitename.com/thispage.html" /> to each page anyway?
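
    To make the question concrete, here is a minimal Python sketch of how that tag could be generated for each page, assuming 'www' is the preferred host. The function name and the PREFERRED_HOST constant are illustrative only, not part of any SEO tool; the idea is just that the www and non-www URL of a page should both produce the same tag:

```python
from html import escape
from urllib.parse import urlsplit

# Assumption: the www form is the one chosen as canonical.
PREFERRED_HOST = "www.joebloggseyecentre.com"


def canonical_link_tag(page_url, preferred_host=PREFERRED_HOST):
    """Build a rel=canonical tag that points at the preferred-host URL of a page."""
    parts = urlsplit(page_url)
    # Swap in the preferred host and drop any #fragment; keep path and query.
    canonical = parts._replace(netloc=preferred_host, fragment="").geturl()
    return f'<link rel="canonical" href="{escape(canonical, quote=True)}" />'


if __name__ == "__main__":
    print(canonical_link_tag("http://joebloggseyecentre.com/thispage.html"))
    print(canonical_link_tag("http://www.joebloggseyecentre.com/thispage.html"))
```

    Since there is only one physical file per page, the same tag is served whichever hostname the visitor or crawler used, which is the point of the exercise.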

    If I could get back to the physical directory structure from my original post, I am still confused as to what the site creator was trying to achieve. Firstly, is there a default, or a system file, which tells the browsers ( and SE's ) whereabouts in the physical structure the website files are to be found? For most previous projects I've worked on, the index.html and other web pages were located in the /public_html directory ( altho' looking back, in one case they were in '/webspace/httpdocs/urlname.ie/' ). In the current situation, all files were placed in a directory structure of /primary/www. But then a 'soft-linked' directory called 'www' was created and pointed to /primary/www ( similar for /etc ; /logs ; /stats ).

    Not expecting anyone here to know what was going on in the mind of whoever set this up! But it may be a familiar pattern, and I'm feeling a bit sheepish that I don't actually know how, when you type in a URL, the host knows in which directory to find the files to serve, whether there is a range of alternatives, or if it is recorded in a system file somewhere. Assuming the 'www' virtual directory is where the files will be served from, it would seem like an unnecessary layer, and possibly a delaying factor, to store them in a separate directory, and then link the two directories.
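
    As a rough illustration of that lookup, and not a description of how this particular host is configured: web servers generally keep a table in their own configuration that maps the hostname in the request to a document root directory, and several hostnames can share one root. The Python sketch below models that idea with a hypothetical table; the '/primary/www' root is simply the path as seen from my login, and the real absolute path on the server is unknown:

```python
import posixpath

# Hypothetical stand-in for the server's configuration: hostname -> document root.
# Both names share one root, so there is only one set of files behind them.
VIRTUAL_HOSTS = {
    "joebloggseyecentre.com": "/primary/www",
    "www.joebloggseyecentre.com": "/primary/www",
}


def resolve_file(host_header, url_path, index_name="index.html"):
    """Map a request's Host header and URL path to a file path on disk."""
    host = host_header.lower().split(":")[0]  # ignore case and any :port suffix
    doc_root = VIRTUAL_HOSTS[host]
    if url_path.endswith("/"):
        url_path += index_name  # a directory request falls back to the index page
    return posixpath.join(doc_root, url_path.lstrip("/"))


if __name__ == "__main__":
    print(resolve_file("joebloggseyecentre.com", "/"))
    print(resolve_file("www.joebloggseyecentre.com", "/"))
```

    Under that model the two hostnames are just two keys pointing at the same directory, which would explain seeing two "versions" online but one set of files on the server.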

    I have set the www as the preferred domain - Google has not yet picked up the new versions of the pages, so will just have to wait it out and see what returns in search when all files are indexed. I'll ask the client if they can amend their link on Golden Pages to 'www.' and, if I get the thumbs-up here, will add the canonical tag to all pages. Appreciate the help and advice here, as always.

    PJ

  10. #9
    Administrator weegillis's Avatar
    Join Date
    Oct 2003
    Posts
    5,818
    If you can access the site root, then you can install a redirect (assuming your server supports .htaccess). For simply working around the domain canonicalization, (www or not), this is the optimum approach. It will work for all SE's, not just those who support the rel='canonical' attribute. As mentioned above, the attribute is best used in situations where there are multiple pages (not just the 'service' overlap) drawing upon a single source document as their root.

    If you have sorted out with GWT (and Bing, etc.) which URL configuration to use, then the attributes are unnecessary.

  11. #10
    Senior Member
    Join Date
    Sep 2005
    Posts
    188
    Thanks Weegillis. I presume there is no problem with a 'belt & braces' approach: put the canonical code in the pages and set up ( if allowed ) a .htaccess file with 301 redirects. There currently is no .htaccess file in what I see as the root ( not sure if the login I was given is an admin ), but if I can create one, can you recommend the actual code I should use to redirect every non-www html and htm file in the /primary/www folder and the virtual /www folder ( or just site-wide, if that's an option )? Every time I google this I get various and differing example code. It would be a huge favour if yourself or any of the experts here could show me the code I should use, given my directory structures.
    PJ
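
    For reference, the site-wide non-www to www redirect that keeps coming up in this thread is usually written as three mod_rewrite lines. The Python sketch below only generates that text for a given domain so the pattern is easy to see; it assumes an Apache server with mod_rewrite enabled and .htaccess overrides allowed, none of which has been confirmed for this host, and the function name is made up for the example:

```python
def build_www_redirect_rules(bare_domain, scheme="http"):
    """Return .htaccess text that 301-redirects bare_domain to its www form."""
    # Dots are regex wildcards in RewriteCond patterns, so escape them.
    escaped = bare_domain.replace(".", r"\.")
    lines = [
        "RewriteEngine On",
        # Only fire when the request came in on the non-www hostname.
        f"RewriteCond %{{HTTP_HOST}} ^{escaped}$ [NC]",
        # Send every path to the same path on the www hostname, permanently.
        f"RewriteRule ^(.*)$ {scheme}://www.{bare_domain}/$1 [R=301,L]",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_www_redirect_rules("joebloggseyecentre.com"), end="")
```

    Because the rule matches every path, it covers all html and htm files without listing them. If /primary/www really is the document root, a single .htaccess there would also be the one seen through the soft-linked /www folder, so no second copy should be needed.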
