An Introduction to Redirecting URLs on an Apache Server

For mod_rewrite beginners

         

DaveAtIFG

4:01 am on Dec 16, 2002 (gmt 0)

I'm just an average webmaster. Admittedly, I'm better at the technical aspects than the creative side, but not much better! :) And I managed to learn enough about this stuff to use it. So can you! Let's start with something easy.

The Quick and Dirty Solution
A meta refresh tag added to the <head> section of a web page will forward a visitor to another page. It looks like this:

<meta http-equiv="Refresh" content="8;URL=http://www.example.com/somepage.html">

This tag simply sends a visitor FROM the page they're at, TO somepage.html, after an 8 second delay. W3C says not to use it because it's non-standard. "Fast refreshes" are frowned on by SEs due to abuse and, although most spiders follow the tag, your URLs usually won't be updated in the SEs' indexes. A delay of five seconds or longer has never caused me SE problems, but your mileage may vary. Frankly, meta refresh is a pretty lame solution, but it will do when you need something "quick and dirty" until you can implement something better.

The Real Deal
Hosting companies usually install mod_alias and mod_rewrite with the Apache server and these are the "Swiss Army Knives" of redirects. But you'll need to know a little about a lot of other things to use them effectively. You will need to become familiar with Regular Expressions, the .htaccess file, and the directives available in each module. Keep readin'! Don't give up! I'm gonna explain all that in plain English too, I promise!

Caution!
Save yourself some headaches! Before trying to use either mod_alias or mod_rewrite, contact your hosting company and ask which of them are installed and available for use on YOUR server. As usual, there are special migraines for FrontPage users. FrontPage extensions are installed on many commercial hosting accounts and most non-FrontPage users simply ignore them. But if you are USING FrontPage extensions (to process forms for example), mod_rewrite will NOT work correctly! This may be true for mod_alias as well, I don't know. Ask your host.

Some hosts hide system files such as .htaccess. Since overwriting an .htaccess file can take your site down, don't assume it's not there simply because you don't see it. Ask your host!

What's An .htaccess File?
An .htaccess file is just a plain text file. It has one directive (that's what Apache folks call the instructions used in this file) per line, like this:

RewriteEngine on

The "RewriteEngine" portion is the directive and "on" is a parameter that describes what "RewriteEngine" should do. More on this directive in a minute.

The .htaccess file usually lives in the root directory of a site and allows each site to uniquely configure how Apache delivers its content. Its directives apply to the entire site, but any subdirectory can contain its own .htaccess, which applies to that subdirectory and all of its subdirectories, and so on down through all of your sub sub sub sub subdirectories... You could have a different .htaccess in every subdirectory and make each one behave a little differently. You get the idea.
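To make the "each sub behaves a little differently" idea concrete, here is a minimal sketch showing two separate .htaccess files. The /archive/ directory and the notfound.html pages are hypothetical names chosen for illustration, and it assumes your host lets you use the ErrorDocument directive in .htaccess:

```apache
# File 1: /.htaccess (site root) - applies to the whole site
ErrorDocument 404 /notfound.html

# File 2: /archive/.htaccess - applies to /archive/ and everything beneath it,
# taking over from the root setting for requests inside that directory
ErrorDocument 404 /archive/notfound.html
```

With both files in place, a missing page under /archive/ would get the archive's own "not found" page, while the rest of the site keeps using the root one.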

Your site may or may not have had an .htaccess from day one; it depends on how your host configured Apache. If it did, MAKE A BACKUP! Treat the backup as read only! Do NOT edit, delete, revise, bend, fold, staple, or mutilate any of the lines it contains! Ever! Changing something critical can take your site down. Until you know exactly what you are doing, ADD code to the copied file or revise your OWN code, but keep your paws off the rest of it! And don't fool with the backup! If you didn't have an .htaccess when you acquired hosting (and still don't) and your host said you can use mod_alias or mod_rewrite, we'll make one. Not just yet though.

Mod_alias Or Mod_rewrite?
Yes! Mod_alias is available at most hosts, often when mod_rewrite is not. Mod_alias is the easier module to use. It has fewer directives to select from and most are less complex than mod_rewrite directives. It's also much less powerful and there will be times when only mod_rewrite will do.

Let's assume we have page1.html and page2.html in the root of our site and we want to redirect all visits from page1 to page2. Either module can handle it but let's keep it as simple as we can. Mod_alias is perfect for this job! For example:

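RedirectMatch page1\.html http://www.example.com/page2.html

"RedirectMatch" is the directive and the remainder of the line contains parameters that tell the directive what to match and where to send the requests that match. The first parameter, "page1\.html", is a "pattern" of characters to match. The second parameter, "http://www.example.com/page2.html", is the URL to redirect to. Mod_alias wants a full URL here (newer Apache versions also accept a path that begins with a slash), so a bare "page2.html" won't do.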

Not so fast! "Pattern" of characters? Yup. In the *nix world this is called a Regular Expression. If you've used "wildcards" with M$ DOS or Windoze, somefile.* or some?.exe for example, you already understand the concept. But regex (regular expressions) are much more powerful. And complex.

A Few Words About Regex
In order to do nearly any redirects, you'll need to learn a little about regular expressions. If you expect to deal with *nix servers routinely, it's time well spent. You'll encounter hundreds of situations where this skill will be VERY useful.

I know enough about regex to get the job done (after a few false starts!), but I'm far from a master. I'll explain each example in this post as best I can, but YOU will need to learn more about this on your own. We have a few regex masters that frequent this forum that are usually willing to help. Here are a few good tutorials to get you started:
[etext.lib.virginia.edu...]
[gnosis.cx...]

Back To Our First Redirect Example
Here it is again:

RedirectMatch page1\.html http://www.example.com/page2.html

We could have used a different mod_alias directive to solve THIS problem. But RedirectMatch introduces concepts that are also used by mod_rewrite so... The Apache mod_alias docs are at [httpd.apache.org...]
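For comparison, the "different mod_alias directive" is most likely plain Redirect, which takes a URL-path instead of a regular expression. A minimal sketch, assuming the site answers at www.example.com:

```apache
# Redirect compares the start of the requested URL-path with /page1.html;
# no regex is involved, so nothing needs escaping
Redirect /page1.html http://www.example.com/page2.html
```

For a one-page move like ours, this is about as simple as it gets. RedirectMatch earns its keep once a single pattern has to cover many URLs.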

The "pattern" portion of our example (page1\.html) is pretty simple. It says to match the literal characters "page1.html", not page1a.html or page1.shtml. The backslash after the "1" and before the period in our pattern is an escape character. Regex uses a few "special" characters to describe which characters and how many characters to match. Unfortunately, the period in our example is one of those special regex characters: left unescaped, it matches ANY single character. So we use a backslash to "escape the period" and tell regex that this is indeed a period that we want to match, and not a regex "special" character.
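Here is a small side-by-side sketch of what the escape buys us. The first directive is commented out on purpose, and the www.example.com target is an assumption for illustration:

```apache
# Unescaped period: "." stands for ANY single character, so this pattern
# would also catch requests such as /page1xhtml or /page1-html
# RedirectMatch page1.html http://www.example.com/page2.html

# Escaped period: only a real "." matches between "page1" and "html"
RedirectMatch page1\.html http://www.example.com/page2.html
```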

What Will This Redirect Do To My Primo SE Listings?
The Apache docs show the complete format (syntax) for RedirectMatch like this:

Syntax: RedirectMatch [status] regex URL

The square brackets surrounding "status" indicate that this is an optional parameter (a value that MAY be included) and is not required. When used, it should be included WITHOUT the square brackets, exactly as shown in our example below. The status parameter is usually one of four keywords: "permanent", "temp", "seeother", or "gone" (a numeric status code is accepted too). For example:

RedirectMatch permanent page1\.html http://www.example.com/page2.html

This example performs our redirect and answers the request for page1.html with a "301 moved permanently" status code that points to page2.html. From this status code, SEs will know that this URL has changed, and usually update their index in response. Voila! Using "temp" as the status parameter will return a "302 moved temporarily" header instead, and that's the default if no status parameter is included in your directive. Read the darn manual if you want to know more about server headers and status codes!
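The four status keywords can be sketched side by side like this. The file names sale.html, thanks.html and retired.html are made up for illustration, as is the www.example.com host:

```apache
# 301: the page has moved for good
RedirectMatch permanent page1\.html http://www.example.com/page2.html

# 302: the move is only temporary (same result as leaving the status out)
RedirectMatch temp sale\.html http://www.example.com/page2.html

# 303: "see other", the answer lives at a different URL
RedirectMatch seeother thanks\.html http://www.example.com/page2.html

# 410: the page is gone and has no replacement, so no target URL is given
RedirectMatch gone retired\.html
```

Note that "gone" is the odd one out: since there is nowhere to send the visitor, the URL parameter is left off.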

The Big Gun - Mod_rewrite!
I'm a simple guy so let's continue with our simple example:

Options +FollowSymLinks
RewriteEngine on
RewriteRule ^page1\.html$ http://www.example.com/page2.html [R=301,L]

My understanding of the first directive, "Options", and its parameters is limited. But for this post, I've assumed that your site is remotely hosted. This directive instructs Apache to follow symbolic links within your site. Symbolic links are filesystem "shortcuts" that point to other files or directories, and following them is often disabled by default. Mod_rewrite won't run from an .htaccess file unless this option is on, so we must turn it on.

The "RewriteEngine on" directive does exactly what it says. Mod_rewrite is normally disabled by default and this directive enables the processing of subsequent mod_rewrite directives. Someday, it might be handy to insert RewriteEngine "offs" and "ons" at different points in a complex set of rules to help isolate which rules are failing.

In this example, we have a caret at the beginning of the pattern, and a dollar sign at the end. These are regex special characters called anchors. The caret tells regex that the match must begin at the very start of the string, so the first character has to be the one that immediately follows the caret, in this case a "p". The dollar sign anchor tells regex that this is the end of the string we want to match. In our simple examples, "page1\.html" and "^page1\.html$" both match a request for page1.html, but they are not equivalent: "page1\.html" matches any string containing "page1.html" (apage1.html for example) anywhere in the URL, while "^page1\.html$" matches only a string which is exactly equal to "page1.html". In a more complex redirect, anchors (and other special regex characters) are often essential.
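The difference the anchors make can be sketched like this. It assumes the rules sit in the .htaccess in the site root and that the site answers at www.example.com. The unanchored rule is commented out so that only one rule is live:

```apache
RewriteEngine on

# Unanchored: would also fire for apage1.html or page1.html.old
# RewriteRule page1\.html http://www.example.com/page2.html [R=301,L]

# Anchored at both ends: fires only for page1.html itself
RewriteRule ^page1\.html$ http://www.example.com/page2.html [R=301,L]
```

One wrinkle worth knowing: in a root .htaccess, mod_rewrite tests the pattern against the path WITHOUT its leading slash, while mod_alias's RedirectMatch sees the slash. An anchored RedirectMatch pattern for the same page would therefore be "^/page1\.html$".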

In our example, we also have an "[R=301,L]". These are called flags in mod_rewrite and they're optional parameters. "R=301" instructs Apache to answer the request with a 301 status code and redirect the visitor to the new URL. When no code is given, as in [R,L], it defaults to 302. Unlike mod_alias, mod_rewrite REQUIRES square brackets around its flags, as in our example, and the R flag can return any status code that you specify in the 300-399 range.

The "L" flag tells Apache that this is the last rule that it needs to process. It's not required in our simple example but, as your rules grow in complexity, it will become very useful. As your understanding of mod_rewrite deepens, you may add conditions to your rules (RewriteCond directive). When a RewriteRule's conditions and pattern all match, the "L" flag tells mod_rewrite to stop after performing that rewrite instead of running the result through the rules that follow. Experts suggest that you get in the habit of including the "L" flag with every RewriteRule to avoid unpleasant surprises. As you gain experience, you may encounter situations where it's not needed. Experts assure me that these are very rare.
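To give a feel for how a condition and the "L" flag work together, here is a minimal sketch that is not part of our page1/page2 example. It assumes a site reachable as both example.com and www.example.com, and it assumes Options +FollowSymLinks is already in effect:

```apache
RewriteEngine on

# Condition: applies only to the RewriteRule directly below it, and only
# passes when the request came in for example.com (no "www")
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
# Rule: send the visitor to the same path on www.example.com, then stop
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

# Thanks to the L flag above, this rule is skipped for requests that were
# just redirected; it still runs for requests that arrive on the www host
RewriteRule ^page1\.html$ http://www.example.com/page2.html [R=301,L]
```

The parentheses in "^(.*)$" capture the requested path, and "$1" drops it back into the new URL.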

The Apache docs for mod_rewrite are at [httpd.apache.org...] and a variety of "real world" rewrite examples are included in the Apache URL Rewriting Guide. It's written by Ralf S. Engelschall, the genius who created mod_rewrite, and you can find it at [httpd.apache.org...]

Build An .htaccess File
To experiment a little and perhaps build some confidence, create two pages on your site named page1.html and page2.html. Each should have different content. If you have an .htaccess file on your server, use a plain text editor (Notepad for example on Windoze) and add one of our examples to it. If you have no .htaccess, simply fire up the old editor, paste an example into it, then save it. (If you're using Windoze and working locally, Windoze will insist that you need a file name; "test.htaccess" should work. Upload "test.htaccess" to your server using ASCII transfer, not binary, then rename it to ".htaccess".) Now, try surfing to page1.html and see if your redirect sends you to page2.html. It should. Then surf to page2.html and ensure that it displays correctly. Congratulations!

One last point. If you have everything working as described in the last paragraph, rename or delete page1.html, then surf to it again. Mod_rewrite can redirect from non-existent URLs ("phantom pages") to existing ones. This could be very useful for sites that create pages dynamically and often have search engine hostile URLs.
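As a rough sketch of the "phantom page" idea for a dynamic site, suppose a script named product.php (a hypothetical name, as is the products/ path) builds pages from an id in the query string. A rule along these lines would let visitors and spiders use a friendlier URL that doesn't exist on disk:

```apache
RewriteEngine on

# A request for products/123.html is quietly handed to product.php?id=123.
# There is no R flag, so no redirect is sent and the visitor's address bar
# keeps showing the friendly URL.
RewriteRule ^products/([0-9]+)\.html$ /product.php?id=$1 [L]
```

Unlike our earlier examples, this is an internal rewrite and not a redirect: the browser never learns that product.php exists.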

In Conclusion (finally!)
There are too many webmasters that need too many redirects for WebmasterWorld to create them for you... But if you make an honest effort to learn, we'll do our best to help! And when you get lost in the Apache docs or *nix manuals perhaps you will review this post and get going in the right direction again... I hope some of you find it useful!

[edited by: DaveAtIFG at 12:03 am (utc) on Feb. 15, 2003]

Marcia

10:45 am on Dec 17, 2002 (gmt 0)

Dynamite post, Dave! Thanks for the basic primer, and especially for explaining those little squidgets. ;)

heini

10:54 am on Dec 17, 2002 (gmt 0)

Have been waiting for something quite like this - big thanks, Dave!

Rumbas

11:41 am on Dec 17, 2002 (gmt 0)

That's a real nugget. Even I can understand it now. Thanks Dave!

DaveAtIFG

3:05 pm on Dec 17, 2002 (gmt 0)

I appreciate the kind words! :)

Let's extend our thanks to andreasfriedrich and jdMorgan for their invaluable suggestions, contributions, clarifications, and yes Jim, even the nitpicks. ;) We passed this back and forth many times, "polishing it up."

Thanks Guys!