.htaccess to block image requests?



CLBridges
11-08-2003, 08:56 AM
I checked the .htaccess help/tutorial links Sualdam recommended in a previous post, and although it's great information (especially the first link!), I've got a couple of problems that weren't covered in enough detail, plus questions about the limits of what can and can't be done. I'm hoping someone here can help!

I don't know the "techie jargon" to use so will describe things as best I can.. Please bear with me! :)

I am currently blocking certain sites from loading images directly off my site using the following rule in my htaccess file:

=======

# begin rewrite rule to block .gif/.jpg from certain referrers
RewriteEngine on
## add new referrers below this line (include the [OR]!)
RewriteCond %{HTTP_REFERER} firstsite\.com [OR]
RewriteCond %{HTTP_REFERER} lastsite\.com
RewriteCond %{REQUEST_URI} \.(gif|jpg) [NC]
RewriteRule .* - [F]
# end rule

=======

The list is getting too long so I'd like to use it in reverse, blocking ALL sites EXCEPT the ones listed. When I asked my ISP's tech support department how to go about doing this, he sent me this revised version of the same rule:

=======

# begin rewrite rule to block .gif/.jpg from certain referrers
RewriteEngine on
## add new referrers below this line
RewriteCond %{HTTP_REFERER} !firstsite\.com
RewriteCond %{HTTP_REFERER} !lastsite\.com
RewriteCond %{REQUEST_URI} \.(gif|jpg) [NC]
RewriteRule .* - [F]
# end rule

=======

I'm a little confused on the new rule. The only difference I notice is adding a ! before the URL and not including the [OR] at the end of each line?

Writing it this way would mean that only requests for image files coming from firstsite.com and lastsite.com would display with all other requests forbidden?

When I asked tech support these same questions, this was his reply:


There's quite a logical difference there, adding NOT (!) and replacing OR with AND (AND is assumed between lines if [OR] is not present.)

It could be read, "If the referrer is neither firstsite.com nor lastsite.com, and the request is for a GIF or JPG file, block the request"

Ummmm... Oooookaaaay.. Logical to whom? I know he doesn't mean "logical" as in "statement"?!?

His reply sounded a little "crisp" if you know what I mean, so I didn't want to keep bugging him. Could someone here please explain the difference in non technical terms?

Questions:
Can I use wildcards? Example: google.* which would allow image requests coming from any google address to be displayed?
What would the format be?
Are there limitations?
Your help is definitely appreciated! (poking everyone in the forehead, hoping to tap into all that knowledge!)

Carrie**

cyanide
11-08-2003, 06:49 PM
Try this:

RewriteEngine on
RewriteCond %{HTTP_REFERER} !^http://domain1.com/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://www.domain1.com/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://domain2.com/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://www.domain2.com/.*$ [NC]
RewriteRule .*\.(jpg|jpeg|gif|png|bmp)$ - [F,NC]

Make sure for each domain that you allow, you put it twice with and without www
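
A more compact variant of the same idea, offered only as a sketch: the optional group (www\.)? lets one line cover both the www and non-www forms, and the dots are escaped so they match only a literal dot. The names domain1.com and domain2.com are the same placeholders used above, and the rule assumes referrers arrive as plain http:// URLs.

```apache
RewriteEngine on
# Each allowed domain needs only one line: (www\.)? matches with or without "www."
RewriteCond %{HTTP_REFERER} !^http://(www\.)?domain1\.com/ [NC]
RewriteCond %{HTTP_REFERER} !^http://(www\.)?domain2\.com/ [NC]
# Forbid image requests whose referrer matched none of the allowed domains
RewriteRule \.(jpg|jpeg|gif|png|bmp)$ - [F,NC]
```

Your own site's domain has to be one of the allowed lines, otherwise your own pages will be refused their images too.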

CLBridges
11-09-2003, 09:56 PM
Make sure for each domain that you allow, you put it twice with and without www

The current rule (to block URLs) is written without specifying http:// at all and blocks requests coming from the domain name with or without the preceding www. For instance, written like this:

RewriteCond %{HTTP_REFERER} domain1\.com [OR]

requests from all of the following are blocked:

www.domain1.com
domain1.com
anyprefix.domain1.com\any_trailing_directory_or_filename\

Wouldn't it work the same way?


Try this:
RewriteEngine on
RewriteCond %{HTTP_REFERER} !^http://domain1.com/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://www.domain1.com/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://domain2.com/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://www.domain2.com/.*$ [NC]
RewriteRule .*\.(jpg|jpeg|gif|png|bmp)$ - [F,NC]

Isn't there supposed to be a preceding \ before the . in the URL name? What does the ^ do (vs. using just the ! without the ^ that follows it)? What does the /.*$ at the end of the URL name do (vs. not using it)? What does [NC] mean?
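
For reference, here is one line of that kind with each piece annotated. This is a sketch rather than a correction of the rule quoted above; the dots are escaped here, and www.domain1.com is still just a placeholder.

```apache
# !      negate the test: the condition is true when the pattern does NOT match
# ^      anchor: the match must begin at the very start of the referrer
# \.     a literal dot (an unescaped . matches any single character)
# /.*$   a slash, then any run of characters up to the end of the referrer
# [NC]   "no case": upper and lower case are treated as the same
RewriteCond %{HTTP_REFERER} !^http://www\.domain1\.com/.*$ [NC]
```

Without the ^, a pattern may match anywhere inside the referrer, which is why a bare domain1\.com catches the domain with or without www. Unescaped dots still work, but more loosely, since each one accepts any character in that position.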

Sorry for all the what, whys and hows. Trying to get a basic understanding of how/why it works was part of my original question.

My main concern with blocking ALL image requests is that my images won't display when using the image search feature in most search engines. Google's image search (all countries) makes up a large part of my top referrers. The URLs come through in this format:

http://images.google.com/imgres (my top referrer!)
http://images.google.ca/imgres
http://images.google.de/imgres (etc)

What is the wildcard character or string recognized by htaccess, so the following line would allow requests from all 3 of the URLs shown above?

RewriteCond %{HTTP_REFERER} !images\.google\.(wildcard)/imgres
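
One way that line might be written, as a sketch: mod_rewrite patterns are regular expressions, so there is no single wildcard character. Instead, "." stands for any one character and "+" or "*" repeats whatever precedes it. The character class [a-z.]+ below is my own choice for the country part, not something taken from the thread.

```apache
# [a-z.]+ matches one or more letters or dots, e.g. com, ca, de, co.uk
# so this covers images.google.com, images.google.ca, images.google.de, ...
RewriteCond %{HTTP_REFERER} !^http://images\.google\.[a-z.]+/imgres [NC]
```

This line would sit alongside the other negated conditions, before the RewriteRule that forbids the image request.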

Thanx for all your help!

Carrie**

OSFan
11-10-2003, 02:31 PM
AND, OR and NOT are logical terms, and they are used in computing just the same as in English.

I am awake OR I am asleep.
I am NOT Female.
I am Male AND I am White.

In computing NOT is often represented by one exclamation mark (!).

When you are not allowing certain sites you are saying:

(If it's X OR if it's Y OR if it's Z) AND it's a JPG, block the request.

When you are allowing certain sites you are saying:

If it's NOT X AND it's NOT Y AND it's NOT Z AND it's a JPG, block it.

Hence the change.
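
The two readings map onto the rules roughly like this. It is a sketch using the firstsite.com / lastsite.com placeholders from the first post, and the two blocks are alternatives: use one or the other, not both.

```apache
# Alternative 1, block the listed sites:
# (referrer is firstsite OR referrer is lastsite) AND request is an image
RewriteCond %{HTTP_REFERER} firstsite\.com [OR]
RewriteCond %{HTTP_REFERER} lastsite\.com
RewriteCond %{REQUEST_URI} \.(gif|jpg) [NC]
RewriteRule .* - [F]

# Alternative 2, allow only the listed sites:
# referrer is NOT firstsite AND referrer is NOT lastsite AND request is an image
RewriteCond %{HTTP_REFERER} !firstsite\.com
RewriteCond %{HTTP_REFERER} !lastsite\.com
RewriteCond %{REQUEST_URI} \.(gif|jpg) [NC]
RewriteRule .* - [F]
```

Note that in the second form a request with no referrer at all also counts as "not firstsite and not lastsite", so it is forbidden as well.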

CLBridges
11-10-2003, 06:11 PM
In computing NOT is often represented by one exclamation mark

OK! So, now I know that the ! means NOT.. :) Didn't know that before! Do you perhaps know where I can find more detailed info on how htaccess works or recommend a book or manual of some sort?

Thanx! Carrie**

OSFan
11-11-2003, 11:31 AM
You are actually making use of Apache's rewrite module (mod_rewrite) there, and it is quite complex. Here are the Apache manual pages for it:


http://httpd.apache.org/docs/mod/mod_rewrite.html - This is a reference.

http://httpd.apache.org/docs/misc/rewriteguide.html - This is a page of explained examples.

As with many things, NOT can be represented differently. In my Pascal programs it's "NOT"; in most other languages it's !. Perl regular expressions confusingly use ^ too: inside square brackets, as in [^abc], it means "not any of these characters", while elsewhere it anchors the start of the string.

And then NOT EQUAL TO can be broken down further into <> for Pascal, and != for most others. Although that is beyond this ;-)

CLBridges
11-11-2003, 06:33 PM
And then NOT EQUAL TO can be broken down further into <> for pascal, and != for most others. Although that is beyond this ;-)
The <> seems familiar.. from mathematics I presume. With ! representing NOT in many cases, the != makes sense as well (now!)

Thanx for the references.. I'll be sure to check 'em out!

Do you know the wildcard character recognized by htaccess?

Carrie**