...making Linux just a little more fun!
By Pete Savage
In today's fast-paced world, the importance of Search Engine Optimisation and usability has been thrust into the limelight. More and more companies and employers are hearing the buzz that is SEO, and want reassurance that their new Web site will deliver. They all want to be top of the search engines, and we all know that that will rarely happen. There are, however, things we can do to help the issue, and make life a little easier for those lovable bots and spiders who crawl our pages.
Whether or not you believe that search engines discriminate against ID numbers in URL strings, the fact of the matter is that
[1] https://www.a_mucky_page.com/products?id=00234987&category=990003429
will never look as nice to the end user, or to search engines, as
[2] https://www.a_nice_place_to_be.com/products-modems/speed-baud-5000.html
Getting rid of GET parameters is also an advantage, as some search tools will discriminate against pages based on the number of parameters.
This article is a primer to help overcome this problem, using PHP/MySQL/Apache with a pinch of mod_rewrite thrown in for some spice. In some cases, this may not be the best solution, but it is always worth considering, as mod_rewrite is an extremely powerful tool in your PHP Web development toolkit.
To begin with, you will need a Web server running Apache with mod_rewrite installed. If you own the server, this is going to be easy. If not, you may have trouble getting the administrators to install mod_rewrite. The reasons for this are simple; the administrators are not trying to annoy you, but one false move inside mod_rewrite code, and the CPU load goes through the roof. This article assumes you already have some knowledge of LAMP systems (Linux/Apache/MySQL/PHP), and that you have mod_rewrite installed.
Structuring your database is an important step in the process, and must be thought about carefully; however, it may be that your database is well enough structured already, in which case, well done. Keeping your data tidy not only helps you, but will also automatically make for a more SEO (Search Engine Optimised) system. We are going to use the example of a single-tier categorisation system. By that, I mean that each product is sorted into the single category that fits it best. We will have two tables, one for categories and one for products. Each table has id, title, and description fields, and the product table is linked to the category table through its category_id field.
Let us write some PHP code to pull out the record information based on the URL line [1] above. We have the two parameters we need: id and category. The customer wants the category information listed on the page with the product. Code for this may look like the example below:
<?php
// Set up the database (the mysql_* functions need PHP 5; they were removed in PHP 7)
mysql_connect('127.0.0.1', 'blark_inc', 'my_password');
mysql_select_db('blark_inc');

// Get the GET parameters, escaped so they are safe to place in a query
$category = mysql_real_escape_string($_GET['category']);
$id = mysql_real_escape_string($_GET['id']);

// Call in data for category and products
$category = mysql_fetch_array(mysql_query('SELECT * FROM category WHERE id="'.$category.'"'));
$product = mysql_fetch_array(mysql_query('SELECT * FROM products WHERE id="'.$id.'"'));

// Output the information
echo 'Category Name : '.$category['title'].'<br>';
echo 'Category Description : '.$category['description'].'<br>';
echo 'Product Name : '.$product['title'].'<br>';
echo 'Product Description : '.$product['description'].'<br>';
?>
So, in our simple example, the IDs of both tables are called with the GET method, and their relevant information pulled from the database and output to the end user. As stated previously, though, this method uses a URL string that does not look pretty in the least; rather, it's hideous. You may argue that the category ID does not need to be stated, and while this is true for a single-tier categorisation system, other more complicated types of system may well need this capability. Just imagine if a product were to appear in more than one category.
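The mysql_* functions used above were removed in PHP 7, so on a current server the same lookup has to go through mysqli or PDO. What follows is only a minimal sketch, assuming the PDO and pdo_sqlite extensions are available. It uses an in-memory SQLite database so that it can run on its own; the column types, the sample rows, and the find_by_id() helper are my assumptions, not part of the original site. For MySQL, you would swap the DSN for something like mysql:host=127.0.0.1;dbname=blark_inc and pass the user name and password to the PDO constructor.

```php
<?php
// Hypothetical stand-in for the real database: an in-memory SQLite DB.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Assumed schema, inferred from the fields the article's code reads.
$db->exec('CREATE TABLE category (id INTEGER PRIMARY KEY, title TEXT, description TEXT)');
$db->exec('CREATE TABLE products (id INTEGER PRIMARY KEY, category_id INTEGER, title TEXT, description TEXT)');
$db->exec("INSERT INTO category VALUES (1, 'Modems', 'Dial-up modems')");
$db->exec("INSERT INTO products VALUES (7, 1, 'Speed Baud 5000', 'A fast modem')");

// Fetch one row by id. The value is bound as a parameter, never pasted
// into the SQL string. Returns null when no row matches.
function find_by_id(PDO $db, string $table, string $id): ?array
{
    // Table names cannot be bound, so only allow the two we know about.
    if (!in_array($table, ['category', 'products'], true)) {
        throw new InvalidArgumentException("Unknown table: $table");
    }
    $stmt = $db->prepare("SELECT * FROM $table WHERE id = :id");
    $stmt->execute([':id' => $id]);
    $row = $stmt->fetch(PDO::FETCH_ASSOC);
    return $row === false ? null : $row;
}

// In the real page these values would come from $_GET['category'] and $_GET['id'].
$category = find_by_id($db, 'category', '1');
$product  = find_by_id($db, 'products', '7');

// Escape on output so that stored text cannot inject markup.
echo 'Category Name : ' . htmlspecialchars($category['title']) . '<br>';
echo 'Product Name : ' . htmlspecialchars($product['title']) . '<br>';
```

Binding the value means no manual escaping is needed, which is the main reason to prefer this style over building the query by string concatenation.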
It is now that we pick up our pot of spice, with mod_rewrite scrawled in friendly letters on the side, and begin to add in that little bit extra. Up until now, we have not really discussed what mod_rewrite does - we have simply put it forward as an answer to all our problems. So, let us delve a little deeper. mod_rewrite is an Apache module that will rewrite URLs according to certain rules. In effect, a browser will request a file by name, e.g., my-life.html, and may be returned a completely different file, e.g., my-friends-life.php, but it all happens transparently: i.e., the user will still think he or she is viewing my-life.html. It is similar to the notion of symbolic links in Linux, but a lot more powerful. Let us look at a few examples of how this could be used. Please note this is not actual code, just basic examples.
Change all .html to .php: blark.html would become blark.php. This would give the effect of a static site, as all links would be .html.
Redirect requests for pages under maintenance: index.php would become maintain.php. This could modify a specific filename to point to a maintenance page.
Use the page name as a GET parameter for another page: my-information.php would become pages.php?page=my-information. This is useful for having one script to run the show.
Note:
mod_rewrite will only modify the URL if Apache is used to collect the file.
It makes no difference to files stored on your Web server. For example,
includes in PHP will remain unaffected, as they do not obtain the file
through the HTTP protocol but use the local file system.
It is the last one of these examples that is of interest to us. Let us make a simple .htaccess file to test some of the examples we have just written. The .htaccess file contains all the rules for mod_rewrite within that directory, an example of which is below.
RewriteEngine on
#First example - modify all .html to .php
RewriteRule ^(.*)\.html$ $1.php
#Second example - modify index.php to maintain.php
RewriteRule ^index\.php$ maintain.php
#Third example - use the page name as a GET parameter
RewriteCond %{REQUEST_URI} !/pages\.php$
RewriteRule ^(.*)\.php$ pages.php?page=$1
Note:
If you make a mistake in the .htaccess file, and the resulting code
that mod_rewrite finds is invalid, you will be alerted with an Internal
Server error page, Error 500. Do not worry; this is normal. Just alter
the line, and try again.
The expression syntax can take some time to understand, and indeed I had a rather long experience learning it, thanks to a tutorial that got the escaping wrong. The rules follow the format of
RewriteRule What_I_am_looking_for What_I_want_it_to_become
The '^' means start matching from the beginning of the URL string after the host. For example, with https://www.my-life.com/test.html, mod_rewrite will only look at what comes after https://www.my-life.com/, i.e., test.html. The '()' are used to catch data. Anything that matches and falls inside these brackets will be stored. This can then be recalled by using a '$1' in the rewritten expression, as can be seen in the first example. The '.*' within the brackets in the first example catches all characters, and the '$' denotes the end of the URL string. Because a bare '.' matches any single character, the literal dot in '.html' is escaped with a backslash and written '\.'. In this example, for a file to fit the criteria, it must be a set of characters, followed by '.html' with nothing trailing on the end. When mod_rewrite finds a match, it takes the value inside the brackets and puts it back to work in the rewrite expression '$1.php'. $1 means use the data from the first set of brackets. If you had another set of brackets in the matching expression, then using the data from it would mean using '$2' in the rewrite expression.
The second example should be obvious now. It matches the term index.php exactly, with no variations, and rewrites it to maintain.php.
The last example is the one that should prove of most interest to us. We are matching anything before the '.php' and rewriting this as the GET parameter of another file called pages.php. The '?' in the rewritten expression simply marks the start of the query string and does not need escaping. You may see this rule written with a '/' in front of the '?' (pages.php/?page=$1); the slash is not an escape character, and the escape character in the matching expression is the backslash. The RewriteCond line is there because pages.php itself ends in '.php': without it, the rewritten request would be matched by the same rule again. Rules are applied in order, each to the result of the one before, so it is easiest to try these examples one at a time.
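To see what the brackets and '$1' do without touching the server, the same three patterns (with their dots escaped) can be tried in PHP, whose preg functions use a closely related regular expression syntax. This is only an illustration under that assumption: mod_rewrite itself is not involved, and apply_rule() is a made-up helper name.

```php
<?php
// Hypothetical helper: apply one "pattern -> substitution" pair to a path,
// roughly the way a single RewriteRule would. The path comes back
// unchanged when the pattern does not match.
function apply_rule(string $pattern, string $substitution, string $path): string
{
    // '#' is used as the delimiter so '/' needs no escaping in the pattern.
    return preg_replace('#' . $pattern . '#', $substitution, $path, 1);
}

// First example: .html becomes .php
echo apply_rule('^(.*)\.html$', '$1.php', 'blark.html'), "\n";

// Second example: index.php becomes maintain.php
echo apply_rule('^index\.php$', 'maintain.php', 'index.php'), "\n";

// Third example: the page name becomes a GET parameter
echo apply_rule('^(.*)\.php$', 'pages.php?page=$1', 'my-information.php'), "\n";
```

Run from the command line, this prints blark.php, maintain.php, and pages.php?page=my-information, one per line.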
Note:
For full details on mod_rewrite, head over to
https://httpd.apache.org/docs/mod/mod_rewrite.html.
We now have everything we need to make our search engine-optimised product catalogue. All that is needed is to remove those harmful IDs and to replace them with something else. Why not make a separate field in the table that can hold a unique identifier of a product, but written in text instead of numbers? It may sound like a tiresome task and unnecessary, but it can help you out. We will add a separate field to each table in our database, and call it mod_page. This will hold a modified version of the product/category title, and this will be used as a unique identifier. For example the product 'Speed Baud 5000 Enhanced modem' may have a mod_page value of 'speed-baud-5000'. It is up to you how you create mod_page. It may be that you want to type each one in individually, or it may be that you use a simple PHP script to translate one into the other.
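As a rough idea of what such a translation script might look like, here is a minimal sketch. The function name make_mod_page() is my own invention, and the rule it applies (lowercase everything, turn anything that is not a letter or digit into a hyphen) is just one reasonable choice.

```php
<?php
// Hypothetical helper: turn a title into a lowercase, hyphen-separated
// string suitable for the mod_page field.
function make_mod_page(string $title): string
{
    $slug = strtolower($title);
    // Replace every run of characters other than a-z and 0-9 with one hyphen.
    $slug = preg_replace('/[^a-z0-9]+/', '-', $slug);
    // Drop any hyphens left at either end.
    return trim($slug, '-');
}

echo make_mod_page('Speed Baud 5000 Enhanced modem'), "\n";
// prints "speed-baud-5000-enhanced-modem"
```

Note that this keeps the whole title, whereas the hand-written value above was shortened to 'speed-baud-5000'. It also drops accented and other non-ASCII characters, and it does nothing to guarantee uniqueness, so two products with the same title would still need to be told apart, for instance by checking the table before saving.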
We now have to create a mod_rewrite rule that will interface with the code we wrote previously, as closely as possible. Obviously, now that we are using mod_page values instead of IDs to call records, there will have to be some changes, but the structure should remain the same. Referring to the above code, it should be clear that we are expecting two parameters: one called 'id' and the other 'category'. Let us sculpt a mod_rewrite expression that fulfils these criteria.
To make the URL user friendly, I have chosen the format www.blark_inc.com/products-(category name)/(product name).html.
RewriteRule ^products-(.*)/(.*)\.html$ products.php?category=$1&id=$2
This expression will take two pieces of data: The first is the word, or character string after the 'products-' and the second is the name of the page in this phantom directory. Remember that the /products-whatever/ directory does not even exist. Rather, it is being used to fool the user and search engines into thinking that the site is structured in that manner.
To take our example from before,
https://www.blark_inc.com/products-modems/speed-baud-5000.html
will be magically and transparently transformed into
https://www.blark_inc.com/products.php?category=modems&id=speed-baud-5000
See how easy it is!
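The pages of the site also have to link to the friendly form of the URL, since the rewrite only works in one direction. A small sketch of how that might be done follows. Both function names are hypothetical. The parsing function is there purely to show the split that the rule performs, and it uses the slightly stricter '[^/]+' (one or more characters that are not a slash) in place of '.*', so that a stray extra directory level does not end up inside the category name.

```php
<?php
// Hypothetical helper: build the friendly URL path for a product from the
// two mod_page values.
function product_url(string $categoryPage, string $productPage): string
{
    return '/products-' . rawurlencode($categoryPage)
         . '/' . rawurlencode($productPage) . '.html';
}

// Illustration only: split a friendly path the way the rule's two sets of
// brackets would, returning null when the path does not fit the format.
function parse_product_path(string $path): ?array
{
    if (preg_match('#^products-([^/]+)/([^/]+)\.html$#', $path, $m) !== 1) {
        return null;
    }
    return ['category' => $m[1], 'id' => $m[2]];
}

echo product_url('modems', 'speed-baud-5000'), "\n";
print_r(parse_product_path('products-modems/speed-baud-5000.html'));
```

Keeping the URL format in one function like this means that a later change to the rewrite rule only needs a matching change in one place in the PHP code.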
We now need to make a few changes to the products page code in order for it to pull the records out of the database. All that needs to be changed are the field names in the database query lines.
$category = mysql_fetch_array(mysql_query('SELECT * FROM category WHERE id="'.$category.'"'));
$product = mysql_fetch_array(mysql_query('SELECT * FROM products WHERE id="'.$id.'"'));
becomes
$category = mysql_fetch_array(mysql_query('SELECT * FROM category WHERE mod_page="'.$category.'"'));
$product = mysql_fetch_array(mysql_query('SELECT * FROM products WHERE mod_page="'.$id.'"'));
and thus the total code becomes:
<?php
// Set up the database (the mysql_* functions need PHP 5; they were removed in PHP 7)
mysql_connect('127.0.0.1', 'blark_inc', 'my_password');
mysql_select_db('blark_inc');

// Get the GET parameters, escaped so they are safe to place in a query
$category = mysql_real_escape_string($_GET['category']);
$id = mysql_real_escape_string($_GET['id']);

// Call in data for category and products, matched on mod_page
$category = mysql_fetch_array(mysql_query('SELECT * FROM category WHERE mod_page="'.$category.'"'));
$product = mysql_fetch_array(mysql_query('SELECT * FROM products WHERE mod_page="'.$id.'"'));

// Output the information
echo 'Category Name : '.$category['title'].'<br>';
echo 'Category Description : '.$category['description'].'<br>';
echo 'Product Name : '.$product['title'].'<br>';
echo 'Product Description : '.$product['description'].'<br>';
?>
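One thing the code above does not cover is a URL that fits the format but names a category or product that does not exist; because the directory is a phantom one, every such request still reaches products.php. A possible way to handle it, sketched here with a hypothetical product_response() helper and wording of my own, is to return a 404 status when either lookup comes back empty.

```php
<?php
// Hypothetical helper: decide the HTTP status and body for a product page.
// $category and $product are the rows fetched from the database, or
// false/null when no row matched the mod_page value from the URL.
function product_response($category, $product): array
{
    if (!$category || !$product) {
        return [404, 'Sorry, that product could not be found.'];
    }
    $body = 'Category Name : ' . htmlspecialchars($category['title']) . '<br>'
          . 'Product Name : ' . htmlspecialchars($product['title']) . '<br>';
    return [200, $body];
}

// Usage sketch: set the status before any output, then send the body.
[$status, $body] = product_response(false, false);
http_response_code($status);
echo $body, "\n";
```

Returning a proper 404 matters for the search engines as much as for users, since otherwise every mistyped address would look like a valid, empty product page.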
As mentioned previously, this is a nice way to make your site look well structured, to both user and search engine. It has been mentioned that mod_rewrite does take a little more CPU load to run than if there were no mod_rewrite at all, but I personally have never had a problem with it. Provided that it is used in the right way, and not used to solve every mis-extensioned file, it should not be discounted, and should form a part of your PHP toolkit.
Pete has been programming since the age of 10 on an old Atari 800 XE.
Though he took an Acoustical Engineering degree from the world-renowned
ISVR in Southampton UK, the call of programming brought him back and he
has been working as a Web developer ever since. He uses both Linux and
Windows platforms. He still lives in the UK, and is currently living
happily with his wife.