Thread: How do you redirect a URL with "%0A" splitting up the .html extension? (.htm%0Al)

  1. #11
    danlefree (WebProWorld MVP)
    Join Date: Jun 2005 | Posts: 387

    You are most welcome.

    The tricky part is that this rule:

    Code:
    RewriteRule ^/?(.*)[\x00-\x1F\x7F-\x90\xA0](.*)$ http://www.domain.com/$1$2 [N,L]
    ... will remove every matching escaped character in the URL, however many there are (e.g. "%A0f%A0o%A0o%A0" will become "foo").
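
To check which characters that class covers without touching a live server, the same byte ranges can be tried in a short script. This is only a minimal PHP sketch: it assumes the pattern is tested against the percent-decoded path (which is how mod_rewrite matches a RewriteRule pattern), and strip_rule_bytes() is a made-up helper name, not part of Apache or PHP. It also removes every match in one pass, whereas the rule above removes one character per pass and relies on the [N] flag to loop.

```php
<?php
// Hypothetical helper: removes the same bytes that the character class
// [\x00-\x1F\x7F-\x90\xA0] in the RewriteRule matches.
function strip_rule_bytes(string $encoded_path): string {
    // Decode %XX sequences first, so the class is tested against raw bytes.
    $decoded = rawurldecode($encoded_path);
    return preg_replace('/[\x00-\x1F\x7F-\x90\xA0]/', '', $decoded);
}

echo strip_rule_bytes('/%A0f%A0o%A0o%A0'), "\n"; // prints "/foo"
echo strip_rule_bytes('/page.htm%0Al'), "\n";    // prints "/page.html"
```

Characters outside those ranges, such as %20, %3C and %3E, pass through unchanged.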

  2. #12
    Clint1 (WebProWorld MVP)
    Join Date: Jun 2003 | Location: Sitting down in a chair | Posts: 2,225

    Coincidentally, I see an oddball URL in my Google Webmaster Tools (WMT) that has never existed. Obviously from some moron incorrectly linking to the page. The redirect isn't working for it. The page appears with a bunch of garbage on the end of the URL:

    .html%3E%3Cem%3Ekeyword%3C/em%3E%20keyword%20%3Cem%3Ekeyword%20keyword%20keyword%20keyword%3C/em%3E.%20keyword%20keyword%20keyword%20%3Cem%3Ekeyword%20keyword%3C/em

    Everything after ".html" is what doesn't really exist, and everywhere you see "keyword" is part of my title tag on the page! Should that redirect be working for something like that?
    Thanks.

  3. #13
    danlefree (WebProWorld MVP)
    Join Date: Jun 2005 | Posts: 387

    Quote, originally posted by Clint1:
        The redirect isn't working for it. The page appears with a bunch of garbage on the end of the URL:

        .html%3E%3Cem%3Ekeyword%3C/em%3E%20keyword%20%3Cem%3Ekeyword%20keyword%20keyword%20keyword%3C/em%3E.%20keyword%20keyword%20keyword%20%3Cem%3Ekeyword%20keyword%3C/em

    The %3C and %3E characters are not covered by the rules we put together, so they pass through the existing filter untouched. Here's an update to the filter that removes them, if you wish to do so:

    Code:
    RewriteRule ^/?(.*)[\x00-\x1F\x3C\x3E\x7F-\x90\xA0](.*)$ http://www.domain.com/$1$2 [N,L]

    Quote, originally posted by Clint1:
        Obviously from some moron incorrectly linking to the page.

        ...

        Should that redirect be working for something like that?

    Unfortunately, at the end of the day, there is nothing you can do to prevent people from linking to nonexistent things on your site, though you can fix the problem such links create: build friendly 404 pages and create script-based 301 redirects for the malformed links you are aware of.

    I've put together a few scripts to handle 301 redirects for one-off problems like the links you've described. There's no need to add mod_rewrite directives for them: the parser overhead is an acceptable tradeoff for links that will only rarely receive a hit. If you'd like the code (PHP), I can put it up somewhere for you. You can then push every hit that fails to match an existing file to the script, and the script can determine whether the user just got the URL wrong or a 404 needs to be served.

  4. #14
    Clint1 (WebProWorld MVP)
    Join Date: Jun 2003 | Location: Sitting down in a chair | Posts: 2,225

    Quote, originally posted by danlefree:
        The %3C and %3E characters are not being parsed out with the rules we put together, so they will be passed through the existing filter - here's an update on the filter to remove them if you desire to do so:

        Code:
        RewriteRule ^/?(.*)[\x00-\x1F\x3C\x3E\x7F-\x90\xA0](.*)$ http://www.domain.com/$1$2 [N,L]

    I tried it like that, and it only removed the %3C and %3E; the %20 and /em were still in the URL. So, looking at how you did this, I changed the line to this:

    Code:
    RewriteRule ^/?(.*)[\x00-\x1F\x20\x3C\x3E\x7F-\x90\xA0](.*)$ http://www.domain.com/$1$2 [N,L]

    And that removed the %20. But I'm seeing a pattern where I don't think this will work, because even if the /em can be removed next, all of those words will still remain appended to the end of the affected URL.


    Quote, originally posted by danlefree:
        Unfortunately, at the end of the day, there is nothing you can do to prevent users from creating links to nonexistent things on your site... though you can fix the problem such links create: build friendly 404 pages and create script-based 301 redirects for the malformed links which you are aware of.

    Yeah I know, my .htaccess file is loaded with 301s, because site owners are too stupid to check their freakin' links, and they never reply to emails when you ask them to fix them! I've always had a custom 404 page.

    Quote, originally posted by danlefree:
        I've put together a few scripts to handle 301 redirects for one-off problems like the links you've described (there's no need to add mod_rewrite directives - the parser overhead is an acceptable tradeoff for links which will only receive a hit rarely) - if you'd like the code (PHP) I can put it up somewhere for you, then you can just push all hits which fail to match an existing file to the script and the script can determine whether the user just got the URL wrong or a 404 needs to be served.

    That would only work on PHP pages, right? Mine are HTML.

  5. #15
    Clint1 (WebProWorld MVP)
    Join Date: Jun 2003 | Location: Sitting down in a chair | Posts: 2,225

    BTW, that \x20 I added works great and will come in handy, because that's another way people screw up URLs: adding spaces to them. I can go to any of my pages, put a [space] anywhere in the URL, and the valid page will display.

    I would guess that rewrite line, or another rewrite line, could be custom-created to fix that one specific invalid URL, but it's not worth it just for one invalid inbound link (IBL).
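
For a URL like that one, stripping characters one class at a time leaves the stray words behind, so another option is to cut off whatever follows the first ".html". The following is only a minimal PHP sketch under stated assumptions: every real page on the site ends in ".html", no page relies on a query string, and trim_after_html() is a hypothetical helper name. It needs PHP 7.1 or later for the nullable return type.

```php
<?php
// Hypothetical helper: returns the path up to and including the first ".html"
// when extra text trails the extension, or null when there is nothing to trim.
function trim_after_html(string $request_uri): ?string {
    // Work on the decoded path so %3E, %20 and friends are ordinary characters.
    $decoded = rawurldecode($request_uri);
    // Lazy group stops at the first ".html"; ".+" requires at least one trailing character.
    if (preg_match('/^(.+?\.html).+$/s', $decoded, $matches)) {
        return $matches[1];
    }
    return null;
}

$clean = trim_after_html('/page.html%3E%3Cem%3Ekeyword%3C/em%3E%20keyword');
echo $clean, "\n"; // prints "/page.html"
```

A non-null result could then be sent as the target of a 301 redirect; a null result means the request should be handled some other way.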

  6. #16
    danlefree (WebProWorld MVP)
    Join Date: Jun 2005 | Posts: 387

    Here is a simplified version of the script I like to use:

    PHP Code:
    <?php

    // Static page served when no redirect matches.
    // DOCUMENT_ROOT normally has no trailing slash, so the separator is added here.
    define( 'FILE_ERROR_404', $_SERVER['DOCUMENT_ROOT'] . '/404.html' );

    // Perform a redirect
    function http_redirect( $target ) {
        header( "HTTP/1.1 301 Moved Permanently" );
        header( "Location: " . $target );
        die();
    }

    function parse_redirect_array( $redirect_array, $target_prepend = '' ) {

        // Stop with an error if the argument is not an array
        if ( !is_array( $redirect_array ) ) {
            die( 'Error: Redirect list is not an array' );
        }

        // Loop through known source URIs and redirect on match
        foreach ( $redirect_array as $source_uri => $target_uri ) {
            if ( $source_uri == $_SERVER['REQUEST_URI'] ) {
                http_redirect( $target_prepend . $target_uri );
            }
        }

    }

    // Offsite redirects array - takes the form of:
    // '/uri_on_site/' => 'http://external-domain.com/'
    $redirects_offsite = array(
        '/old-blog-folder/' => 'http://www.blog-name.com/',
        '/another-old-folder/' => 'http://www.another-name.com/'
    );
    parse_redirect_array( $redirects_offsite );

    // Onsite redirects array - takes the form of:
    // '/uri_on_site/' => '/correct_uri_on_site/'
    $redirects_onsite = array(
        '/old-file-name1.html' => '/new_file_name.html',
        '/another-old-file-name.html' => '/another_new_file_name.html'
    );
    parse_redirect_array( $redirects_onsite, 'http://' . $_SERVER['HTTP_HOST'] );

    // Unknown source URI - serve 404 page with proper HTTP response code
    header( "HTTP/1.1 404 Not Found" );
    header( "Status: 404 Not Found" );
    include( FILE_ERROR_404 );

    ?>

    With a redirect rule like this one you can use the script to handle every 404 error:

    Code:
    RewriteEngine on
    
    # Send requests for nonexistent files and directories to the 404 handler
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule (.*) /404.php [L]
    Your visitors will never actually see 404.php unless they request it directly. If they request a page which does not exist but has a redirect rule, they'll be redirected; if they request a page which does not exist and has no redirect rule, they will remain at the same URI and see the contents of the 404.html file.

    The script could easily be extended to perform regular expression matching on $_SERVER['REQUEST_URI'], pull the redirects from another file, etc. It could come in handy if you're seeing a lot of malformed requests and don't wish to spend lots of time on mod_rewrite directives, which have the potential to cause unexpected behavior on valid requests and which add overhead to every page request.
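
The regular expression extension mentioned above might look something like the sketch below. It is not danlefree's code: match_redirect_pattern(), the pattern table, and the "/blog/" and "/products/" paths are all hypothetical, chosen only to illustrate the idea. The function returns the rewritten URI instead of redirecting, so the result can be passed to http_redirect() from the script above.

```php
<?php
// Hypothetical pattern table: regular expression => replacement.
// $1 in the replacement refers to the first capture group.
$redirect_patterns = array(
    '#^/old-blog-folder/(.*)$#'  => '/blog/$1',
    '#^/products/(\d+)\.htm$#'   => '/products/$1.html',
);

// Returns the rewritten URI for the first pattern that matches,
// or null when no pattern applies (the caller would then serve the 404).
function match_redirect_pattern(array $patterns, string $request_uri): ?string {
    // Decode first so the patterns see the same characters a visitor typed.
    $uri = rawurldecode($request_uri);
    foreach ($patterns as $pattern => $replacement) {
        if (preg_match($pattern, $uri)) {
            return preg_replace($pattern, $replacement, $uri);
        }
    }
    return null;
}

echo match_redirect_pattern($redirect_patterns, '/products/42.htm'), "\n"; // prints "/products/42.html"
```

Because the patterns only run for requests that already failed to match a file, valid pages never pay for them.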

  7. #17
    Clint1 (WebProWorld MVP)
    Join Date: Jun 2003 | Location: Sitting down in a chair | Posts: 2,225

    So Dan, will that work on a site made up of HTML pages?

    I'll read over all that you posted tonight. On that problem URL and the code not working for it, FWIW I looked closely at the browser title bar when the page was loading and all those "em" areas in the URL are shown as <em> and </em> tags. Of course I do not use those in my title meta tag on the page, so those are apparently added from the site where the screwed up URL originates by an incorrect link code. I would bet the site owner probably left out a " mark somewhere in the tag, or didn't use a </a> in the right place, or left it out.

  8. #18
    danlefree (WebProWorld MVP)
    Join Date: Jun 2005 | Posts: 387

    Quote, originally posted by Clint1:
        So Dan, will that work on an HTML pages site?

    Your web host would need to have PHP installed, though if you have access to add mod_rewrite directives I'd suspect that there is script access of some sort. The script itself is agnostic to the type of content in the 404.html file it includes and at the URIs it redirects to.

  9. #19
    Clint1 (WebProWorld MVP)
    Join Date: Jun 2003 | Location: Sitting down in a chair | Posts: 2,225

    Quote, originally posted by danlefree:
        Your visitors will never actually see 404.php unless they request it directly - if they request a page which does not exist but has a redirect rule, they'll be redirected and if they request a page which does not exist and has no redirect rule they will remain at the same URI and see the contents of the 404.html file.

    OK, so that part is the same way things are set up now. But this part below:

    Quote, originally posted by danlefree:
        The script could easily be extended to perform regular expression matching on $_SERVER['REQUEST_URI'] or pull the redirects from another file, etc - it could come in handy if you're seeing a lot of malformed requests and don't necessarily wish to spend lots of time with mod_rewrite directives which have the potential to cause unexpected behavior on valid requests and consistently create overhead on every page request

    Sounds like that would be the additional feature.

  10. #20
    Clint1 (WebProWorld MVP)
    Join Date: Jun 2003 | Location: Sitting down in a chair | Posts: 2,225

    Quote, originally posted by danlefree:
        Your web host would need to have PHP installed, though if you have access to add mod_rewrite directives I'd suspect that there is script access of some sort - the script itself is agnostic to the type of content which resides in the 404.html file which it includes and at the URI's it redirects to.

    I checked and I see listed in cPanel: "PHP", "PHP Pear", and "PHP.ini Quick Config". So I guess that means it's installed and this is something I could do myself?
