Block Unwanted Visitors by IP Address or UserAgent in Apache using mod_rewrite

Use .htaccess rules to block unwanted bots, spiders, and other UserAgents that either never fetch robots.txt or fetch it and then ignore it.

Blocking visitors by IP address in an .htaccess file:

# deny specific IP addresses, and allow all others
order allow,deny
deny from 123.45.6.7
deny from 123.45.6.8
deny from 123.45.6.9
allow from all
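
The order/allow/deny directives above are the Apache 2.2 style; on Apache 2.4 they are deprecated and only work through mod_access_compat. As a rough sketch, assuming Apache 2.4 with mod_authz_core and an AllowOverride setting that permits authorization directives in .htaccess, the same three addresses could be blocked like this:

```apache
# Apache 2.4+ sketch: allow everyone except the listed addresses
<RequireAll>
    Require all granted
    Require not ip 123.45.6.7
    Require not ip 123.45.6.8
    Require not ip 123.45.6.9
</RequireAll>
```

Avoid mixing the old and new styles in the same .htaccess file, since the combined result can be hard to predict.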


Block specific UserAgent using mod_rewrite

   # Block Google Images Bot from Indexing your Copyrighted Images
   # Hopefully someday Google will publish a "supported way" of
   # Disallowing the Google Image Bot when necessary, but until then...
   RewriteEngine on
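   # [NC] matches the user agent regardless of letter case
   RewriteCond %{HTTP_USER_AGENT} ^Googlebot-Image [NC]
   # Redirect every matching request; [R=302,L] makes the redirect explicit
   RewriteRule ^(.*)$ http://images.google.com/ [R=302,L]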
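
The same pattern can be extended to several user agents at once, and the request can be refused outright instead of redirected. The following is only a sketch: "BadBot" and "EvilScraper" are placeholder names made up for illustration, not real crawlers, so substitute the user agent strings you actually see in your logs.

```apache
RewriteEngine on
# Placeholder names; [OR] chains the conditions, [NC] ignores letter case
RewriteCond %{HTTP_USER_AGENT} ^BadBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^EvilScraper [NC]
# "-" means no substitution; [F] answers with 403 Forbidden
RewriteRule ^ - [F]
```

Returning 403 keeps the refusal on your own server rather than sending the client to a third-party URL.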


The catch-22 with this method is that “sneaky” program developers can simply masquerade as “normal” visitors by using common web browser user agent strings. This reinforces the fact that all three of these methods (robots.txt, the robots meta tag, and .htaccess rules) are USEFUL, but are in no way a complete or secure solution, even when all three are used precisely.


Also see:

  1. robots.txt
  2. Robots Meta Tag

Robots Meta Tag

Use a meta tag embedded in the <head> of a specific page to tell search engine spiders and robots what to index and what to leave out:

  1. Pages including “noindex, nofollow” indicate that they are NOT to be indexed, NOT to be included in listings, and that the links on them are NOT to be followed.
  2. Pages including “index, nofollow” indicate that they are to be indexed and listed, but that the links on them are not to be followed.
  3. Pages including “index, follow” indicate that they are to be fully indexed, included in all applicable listings, and that the links on them are to be followed.

DO NOT index, DO NOT include in listings, and DO NOT follow links

<meta name="robots" content="noindex, nofollow" />

Index, include in listings, but DO NOT follow links

<meta name="robots" content="index, nofollow" />

Index, include in listings, and follow links

<meta name="robots" content="index, follow" />
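
A meta tag can only be placed in an HTML page, so it cannot cover images, PDFs, or other non-HTML files. One possible approach, sketched here under the assumption that mod_headers is enabled and that the crawler in question honors the X-Robots-Tag response header (Google documents support for it), is to send the same instruction from .htaccess. The list of file extensions is only an example.

```apache
# Sketch: send "noindex, nofollow" as an HTTP header for non-HTML files
<IfModule mod_headers.c>
    <FilesMatch "\.(pdf|jpe?g|png|gif)$">
        Header set X-Robots-Tag "noindex, nofollow"
    </FilesMatch>
</IfModule>
```

As with the meta tag, this is a request to well-behaved crawlers and not an access control.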

Also see:

  1. Robots.txt
  2. .htaccess and mod_rewrite

Validate Your Website’s HTML and xHTML Markup!

W3C Markup Validation Service

W3C CSS Validation Service

As browser vendors increasingly comply with (some of) the W3C document and CSS standards, it is ever more important to use markup (tags), attributes, and coding practices that adhere to these standards as well.

Specifying the document type and character encoding, and using the tags and attributes supported by your version of HTML (4.01) or xHTML (1.0 Transitional or Strict, or 1.1), not only increases the likelihood of your pages rendering properly and bug-free across multiple browsers and operating platforms, but also aids your website’s ability to be indexed by spiders and crawlers and is the most important step in organic search engine optimization.

Web Design Tutorials & Articles

A List Apart
