

Thread: index a page but not its session duplicates

  1. #1
    weegillis (Administrator)
    Join Date: Oct 2003
    Posts: 5,788

    index a page but not its session duplicates

    Code:
    Disallow: /courses/register.php?
    Disallow: /courses/reserve.php?
    Allow: /courses/register.php
    Allow: /courses/reserve.php

    Is this set of directives doing what it is supposed to? Will all the bots interpret it the same way?

    The obvious "Disallow: /*?" will not work, because it would disallow every URL containing a query string, including all the session-id URLs we do want indexed.
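    To answer "Will all the bots interpret it the same way?", it helps to spell out the matching rules that RFC 9309 (the Robots Exclusion Protocol) and Google document: each rule is a prefix match, `*` matches any run of characters, a trailing `$` anchors the end of the URL, the longest matching rule wins, and Allow wins a tie. The sketch below is a simplified model of those rules (it ignores user-agent groups and percent-encoding), not any crawler's actual code; older or simpler crawlers may not support `*` or `Allow` at all. The sample URLs, the `session=12` query string and `details.php` are hypothetical stand-ins.

```python
import re

def rule_to_regex(path: str) -> "re.Pattern[str]":
    """Translate a robots.txt path: '*' = any characters, trailing '$' = end of URL."""
    anchored = path.endswith("$")
    if anchored:
        path = path[:-1]
    body = ".*".join(re.escape(part) for part in path.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_allowed(rules, url_path: str) -> bool:
    """Longest matching rule wins; Allow wins a tie; no matching rule means allowed."""
    best_len, allowed = -1, True
    for directive, path in rules:
        if not path:
            continue  # an empty value matches nothing
        # re.match anchors at the start of the URL, so every rule is a prefix match.
        if rule_to_regex(path).match(url_path):
            is_allow = directive.lower() == "allow"
            if len(path) > best_len or (len(path) == best_len and is_allow):
                best_len, allowed = len(path), is_allow
    return allowed

# The four lines from the post above.
SPECIFIC_RULES = [
    ("Disallow", "/courses/register.php?"),
    ("Disallow", "/courses/reserve.php?"),
    ("Allow", "/courses/register.php"),
    ("Allow", "/courses/reserve.php"),
]
# The "obvious" wildcard alternative.
WILDCARD_RULES = [("Disallow", "/*?")]

# Hypothetical URLs: "session=12" and details.php stand in for the real
# query string and for the other session-dependent pages in /courses/.
URLS = [
    "/courses/register.php",
    "/courses/register.php?session=12",
    "/courses/reserve.php?session=12",
    "/courses/details.php?session=12",
]

for url in URLS:
    specific = "allowed" if is_allowed(SPECIFIC_RULES, url) else "blocked"
    wildcard = "allowed" if is_allowed(WILDCARD_RULES, url) else "blocked"
    print(f"{url:34} specific rules: {specific:8} /*? rule: {wildcard}")
```

    Under this model the specific rules block only the two query-string variants, while the `/*?` rule also blocks details.php?session=12, which is exactly the problem described above.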

    The above two pages are the only ones with duplicate content issues (title, etc.). The session id is matched against a table and becomes the SELECTED item in a pull-down menu. It is also passed to the Submitted page on confirmation of submission.

    We do want the Register and Reserve pages indexed, just not their individual session variants. This is further complicated by the fact that there are other pages in the same directory that depend entirely on session ids, and we want all of them indexed.

  2. #2
    wige (WebProWorld MVP)
    Join Date: Jun 2006
    Posts: 3,138

    Re: index a page but not its session duplicates

    The method you have should work, although you don't strictly need the Allow lines. You also have two other options: you could dynamically add the canonical tag, or add a meta noindex tag when a session id is present in the URL.
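    A minimal sketch of the meta noindex option, written in Python rather than the site's PHP. It assumes the session id arrives as a query parameter named `session` (hypothetical; substitute the real name) and only touches the two duplicate-prone pages, so the other session-dependent pages stay indexable. One caveat: this is an alternative to the robots.txt rules, not an addition, because a crawler that robots.txt blocks never fetches the page and so never sees the tag.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical parameter name for the course session id.
SESSION_PARAM = "session"
# Only these two pages have the duplicate-content problem.
DUPLICATE_PRONE = {"/courses/register.php", "/courses/reserve.php"}

def robots_meta_for(url: str) -> str:
    """Return a robots meta tag for the page <head>, or "" to leave indexing alone."""
    parts = urlsplit(url)
    has_session = SESSION_PARAM in parse_qs(parts.query)
    if parts.path in DUPLICATE_PRONE and has_session:
        # noindex keeps this variant out of the index; follow lets spiders use its links.
        return '<meta name="robots" content="noindex, follow">'
    return ""

print(robots_meta_for("http://www.example.com/courses/register.php?session=12"))
```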
    The best way to learn anything, is to question everything.
    WigeDev - Freelance web and software development

  3. #3
    weegillis (Administrator)
    Join Date: Oct 2003
    Posts: 5,788

    Re: index a page but not its session duplicates

    Am I correct in surmising from your comment that a URL without a '?' will not be caught by the Disallow lines?

    I have included a rel="nofollow" attribute in the dynamically generated referring link. I'm hoping that if these pages (sessions) have been indexed in the past six months, they will eventually drop off the radar with this addition to robots.txt.

    Is this the canonical approach you were referring to?
    HTML Code:
    <link rel="canonical" href="http://www.example.com/courses/register.php" />
    <link rel="canonical" href="http://www.example.com/courses/reserve.php" />

    Now the 'dumb and dumber' question: which page should this tag go into? The target page(s), or the dynamic page that links to them?

    Our dynamic referring page uses a flag that toggles the link text and the target between the two pages above.

  4. #4
    wige (WebProWorld MVP)
    Join Date: Jun 2006
    Posts: 3,138

    Re: index a page but not its session duplicates

    To clarify my previous comment: Disallow rules are prefix matches, so if you disallow file.php?, anything that starts with file.php? will be blocked. However, file.php itself would still be allowed, so the Allow lines are not technically needed (though they won't cause a problem if you leave them in).
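    The prefix-matching point can be checked mechanically. This minimal sketch assumes plain prefix matching with the longest-match rule from RFC 9309 (no wildcards, no user-agent groups) and uses hypothetical `session=12` URLs; it evaluates the rules from post #1 with and without the two Allow lines.

```python
def is_allowed(rules, url_path: str) -> bool:
    """Prefix matching: longest matching rule wins, Allow wins ties, no match = allowed."""
    best_len, allowed = -1, True
    for directive, prefix in rules:
        if prefix and url_path.startswith(prefix):
            is_allow = directive == "Allow"
            if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
                best_len, allowed = len(prefix), is_allow
    return allowed

DISALLOW_ONLY = [
    ("Disallow", "/courses/register.php?"),
    ("Disallow", "/courses/reserve.php?"),
]
WITH_ALLOW = DISALLOW_ONLY + [
    ("Allow", "/courses/register.php"),
    ("Allow", "/courses/reserve.php"),
]

# Hypothetical sample URLs; "session=12" stands in for the real query string.
SAMPLES = [
    "/courses/register.php",
    "/courses/register.php?session=12",
    "/courses/reserve.php",
    "/courses/reserve.php?session=12",
]

for path in SAMPLES:
    a, b = is_allowed(DISALLOW_ONLY, path), is_allowed(WITH_ALLOW, path)
    print(f"{path:34} disallow-only={a!s:5} with-allow={b!s:5} same={a == b}")
```

    Every sample gets the same verdict either way. Dropping the Allow lines has a side benefit: with no Allow rules there is no Allow/Disallow conflict to resolve, so crawlers that handle Allow differently, or not at all, should still reach the same verdicts.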

    I don't think the rel="nofollow" attribute on the links will have any effect here. As far as I can tell, it simply tells the spider not to pass PageRank through the link; it does not necessarily prevent the target page from being crawled.

    I would consider the canonical tag, as shown, but only if the pages are actually duplicates or very similar. If there are large sections of unique text, Google may decide to disregard the canonical tag. If you do implement it, though, you would put the tag on all of the pages: the first tag you gave would go on register.php and on every one of its session variants, and the second on reserve.php and its variants. This tells the spider that the pages form a single logical unit, but again, it will only work if there is limited unique content.
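    Following that advice, every variant of register.php, including the bare URL, carries the same canonical tag, and likewise for reserve.php. A minimal Python sketch (the site itself is PHP) that derives the tag by stripping the query string; the host comes from the example tags in post #3, while the `session=12` URLs and `details.php` are hypothetical. Like any on-page tag, it is only seen on URLs that robots.txt lets the crawler fetch.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit

# The two pages from this thread whose session variants are near-duplicates.
CANONICAL_PAGES = {"/courses/register.php", "/courses/reserve.php"}

def canonical_link_for(url: str) -> str:
    """Return a rel=canonical tag pointing at the query-free URL, or "" for other pages."""
    scheme, netloc, path, _query, _fragment = urlsplit(url)
    if path not in CANONICAL_PAGES:
        return ""  # other session-dependent pages keep their own URLs
    canonical = urlunsplit((scheme, netloc, path, "", ""))
    return f'<link rel="canonical" href="{escape(canonical, quote=True)}" />'

print(canonical_link_for("http://www.example.com/courses/register.php?session=12"))
```

    Emitting the tag on the bare URL too (a self-referencing canonical) keeps the rule simple: the same head template serves every variant.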

  5. #5
    weegillis (Administrator)
    Join Date: Oct 2003
    Posts: 5,788

    Re: index a page but not its session duplicates

    Limited unique content: very much so. The only differences are the SELECTED text in the pull-down and the page h1. I could have made a dynamic title but couldn't see the sense in it back then. I'm kind of kicking myself for letting this duplicate issue slide for so long; it has only now begun showing up in Google Webmaster Tools (GWT).

    Looking at the rel="nofollow" thing a second time, it makes no sense to worry about link juice if the page isn't being crawled. I really don't know what gave me the idea to use it in the first place, besides a knee-jerk reaction. Thanks for pointing this out.

  6. #6
    Junior Member
    Join Date: Jun 2009
    Posts: 3

    Re: index a page but not its session duplicates

    So there is a way to do that, huh! I'm still learning those tags and codes for my site.
    Please keep it going.

    Thanks a lot.

