"Linux Gazette...making Linux just a little more fun!"

Better Web Page Design Under Linux

By Chris Gibbs

6-Jul-1999 revision: changes for the style sheet, also updated the URL for the Netscape multimedia plugin.

Note: The author does not have regular Internet access at this time and may be slow in responding to e-mails.

Wysiwyg Editors

The Advantage of Linux

Setting up Apache

Starting and Testing Apache

Search Engines

SGML Support

Introduction

Recently an article was published in Linux Gazette entitled Web Page Design Under Linux. This article drew some criticism in later issues. The main criticism seems to have been of the author's preference for hand coding HTML rather than using an HTML editor like the Windows HotDog editor. This is an argument I do not really want to get involved with. Neither do I want to spend much time on style. Whilst in most cases users want simple, fast-loading, clear pages, there will always be a place for garish eye candy, huge graphics and all kinds of complexities that take forever to download on a 28k modem. What I do want to address are the great things that Linux offers: great things that are free and would cost a fortune to implement on other operating systems. In particular I shall explain how to set your Linux box up to be your own intranet server, and thereby fully exploit the abilities Linux offers for designing applications for the Web.

One point I think needs making, and which does not fit in with the rest of this article, concerns the Plugger plug-in for Netscape Navigator. In the past many people have complained that Netscape plug-ins are not generally available for Linux. Plugger, from https://fredrik.hubbe.net/plugger.html, seeks to address this by providing support for many audio/video/image types.

Wysiwyg Editors

By way of introduction though, I will put my two pennies' worth into the 'editor argument'. I have never yet found an HTML editor that I like! I am writing this article in StarOffice 5.0. I have never used it to write HTML so this is something of a test. I expect I'll have to edit the source when I finish writing. Another editor that seems as good as any other I have tried is the Composer part of Netscape Communicator. I find this irritating, very very irritating. Why? Because I like my text to be fully justified. OK, I know that some people think that full justification 'goes against the spirit of HTML', but personally I would rather read text that is fully justified than text which is not. I do not believe I am alone in this preference.

What happens with Netscape is that after I have spent a couple of hours designing some pages until I am happy with them, I load them all into vi and change every plain paragraph tag into one that asks for full justification, which can take some time if I've written a lot of text. Now a little later I want to make some changes, so I load the pages into Netscape Composer and make them. But whilst Communicator understands the justified tag, Composer does not. In fact Composer does not allow it and changes each occurrence back to the plain tag.... Bummer... I have to re-edit all the source by hand again. If I thought there was some advantage to using Composer rather than hand writing my HTML, I guess I would write a little program to search HTML files for the plain tag and replace it with the justified one. But this is not the only shortcoming of HTML wysiwyg editors. They just don't seem able to do exactly what I want, how I want.
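The "little program" mentioned above might look something like the following minimal sketch. The exact tags are an assumption on my part (a bare P tag replaced by one carrying ALIGN="justify"), as are the function and variable names; adapt them to whatever your editor keeps rewriting.

```python
import re

# Assumption: the tags below are illustrative; substitute the markup
# that your own editor produces and the markup you want instead.
PLAIN_P = re.compile(r"<p\s*>", re.IGNORECASE)
JUSTIFIED_P = '<P ALIGN="justify">'


def justify_paragraphs(html: str) -> str:
    """Replace every bare <P> tag with a fully justified one."""
    return PLAIN_P.sub(JUSTIFIED_P, html)


sample = "<P>First paragraph.</P>\n<p>Second paragraph.</p>"
print(justify_paragraphs(sample))
```

Tags that already carry attributes are left alone, so running the function twice over the same file should do no harm.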

OK in fairness I am now impressed with StarOffice! Although there is no button to give full justification, it is easy to edit the Text Body style so that full justification is automatic. It is also easy to automatically indent the first line of a paragraph, set double line spacing etc. etc. Maybe I will be converted to using a wysiwyg editor for my HTML after all.

One feature that seems to be missing from StarOffice 5.0 is an easy way to define lists. Tables are well supported, but lists are not. I guess that it should be possible to define some new styles to allow the use of different kinds of list, but one would have thought that a button should be available for them. Also, given the different kinds of list available for HTML, one might find that the styles menu becomes cumbersome and more difficult than it should be.

OK, simple layouts are quicker with an HTML editor, but if you want full control you have to hand edit at some point. So to my way of thinking, if you want to write good HTML you must learn HTML. It is a very bad idea to think you can skip learning HTML by getting an editor that works like a word processor. You will not have the skills you need to produce good web pages. HTML is very easy to learn. Once you know it, you might find that Netscape or StarOffice provide useful tools to help you. But please do not think such tools replace the need to be able to hand code HTML.

The essential document to read if you want to produce great web pages efficiently is HTML 4.0 (W3C: HTML 4.0 Specification). It includes the full Document Type Definition for HTML, which is written in SGML. For once I have taken my own advice and read it! The problems I mention above regarding text formatting have all been solved for me! Looking at the HTML source StarOffice has given me, I am impressed but not happy. Again I think that an editor like vi or emacs really is better and more efficient than a wysiwyg editor.

The reason is that HTML 4.0 allows the use of Style Sheets. This article depends upon the use of a style sheet, special.css. This is a document that says how a browser should render my document. An important feature is that browsers that cannot display certain things (e.g. graphics) are not disadvantaged. All browsers can access this page in the way I intend them to. In the past authors have been forced to use techniques to format their pages that cannot be displayed correctly on all browsers. Proprietary HTML extensions, the conversion of text into graphics, the use of images for white space control, the use of tables for layout and even the use of programs have all been used to format text. All these methods cause difficulty for users and extra work for developers. The correct use of style sheets avoids these problems.

Once you are familiar with the use of style sheets, it will not matter how badly Netscape Composer performs, or how unfamiliar you are with StarOffice: using an editor like vi really can be simpler than using something like HotDog. Load my style sheet into your favorite editor and see for yourself how easy it is to change the look and feel of this document (this link and the one above are to an identical copy called special.txt, so that you can see the source without the browser parsing it).

STOP PRESS.....

Even as I am writing this document, I have found yet another web browser for Linux! This one is worth some attention since it is produced by the W3 Consortium, the same people who define the HTML specification. In fact this is the browser they use to test their specification. The following text is displayed when you start it for the first time:-

Amaya: is a Web client that acts both as a browser and as an authoring tool. It has been designed with the primary purpose of demonstrating new Web technologies in a WYSIWYG environment. The current version implements HTML, MathML, CSS, and HTTP.
Main Features: With Amaya, you can manipulate rich Web pages containing forms, tables and the most advanced features from HTML. You can create and edit complex mathematical expressions within Web pages. You can style your documents using Cascading Style Sheets. You can publish documents on local or remote servers with the HTTP Put method.
Browsing and authoring are integrated seamlessly. You can browse and edit Web pages at the same time. For that reason, a simple click just moves the caret to allow text editing; to follow a link, you have to double click.
Online Manual: A User's Manual is available online. You can browse it with the Help menu, which displays each section separately. You can also print it: just follow the Online Manual link below. You'll get the front page. Then build the whole book with the "Make book" entry from the Special menu and print the result.

This browser certainly has some advantages. The version I have is still beta (1.3b), so there are some shortcomings. I found that the File - Open Document dialog can resize its file box so that it becomes non-functional. Also, for some reason not all directories appear in the directory box. At least one can specify the required file in the URL box! The fact that the manual does not come with the package is a definite minus for me.

What is nice about the browser is the pleasant way it renders pages. This page, for instance, uses full text justification, and Amaya can actually split words in the traditional manner when required.

The really nice thing about this browser is the fact that you can edit files as you browse them. So if you are creating a document with many pages it is easy to switch between them. The down side of this is that there seems to be no way to edit or view document source. Something that I would like to see in other browsers is the ability to create a "Table of Contents"; with Amaya you can generate one based on the <H...> elements in your document. This will pop up as a separate window and allow you to easily navigate through a document that has no links of its own.

At about 4.5 Megabytes, this is probably a very good alternative to StarOffice if you do not have the disk space required for StarOffice. I am certainly interested in seeing how this browser develops in the future. If you want to give it a try you can obtain it from the Amaya homepage. Additionally, there was a review of an earlier release of Amaya in Linux Gazette some years ago; see issue 15. All I have to add to that review is that improvements must have been made. It seems the same in appearance as the screen shots show. Amaya displays the old style of Linux Gazette Contents pages quite well, but the new style in the last three or four issues is completely garbled. When Amaya starts up it no longer looks for a page on its home site, and I have not seen it seg fault as described. On the whole it does a very good job.

The Advantage of Linux

Now I've got that out of my system I'll get on to my main point. Drum roll please..... With linux it is simple to build a system you can gain http access to. Trumpet fanfare please.

Why is http access to your machine important?

Even if, like me, you have a standalone machine with no kind of network, it is easy to start up your favorite browser and http yourself. This means you can get into the wonderful worlds of cgi scripts, client/server applications, java, etc. Without the need to access a 'real' network you can test any network application you care to develop for the Internet. You can test every aspect of your web design without running up a phone bill. You can test applications safe in the knowledge that no matter what mistakes are in your code, only the machine you are using will be at risk; the "real" network will be unaffected until you decide your code is working correctly.

Web page design is not just about putting text, graphics and links onto the Internet. Increasingly it is about providing good user interfaces to network applications and providing an efficient means of communication. In the past only the largest corporations could afford to implement a WAN (Wide Area Network). Today anybody with a modem and a PC can join the Internet, or implement their own intranet (a private network that acts in the same way as the Internet).

To illustrate my points consider the following scenario. You own a small tobacconists and live in a village called Tiny. Because the village is small you do not have many customers, so you don't sell items in vast numbers. That means you do not buy in large quantities from your suppliers and you cannot get the kind of discounts larger shops would get. But you have many relations and friends in other, similar villages who also run small tobacconists. If you all clubbed together and ordered your supplies as one entity you could take the discount advantages of bulk buying from your suppliers. The only problem is knowing which shop needs what items at any given time. You know that the discounts you would get would allow you to employ a van driver to deliver to all the shops and still leave each shop a significant saving.

How can web design under linux help you solve this problem?

The 'man with a van' needs information: what to buy, in what quantity, and where to deliver it. This sounds like a classic database application. Linux offers many SQL database solutions. We want to keep costs to a minimum; we also want to maximize security and reliability. So good choices might be Ingres or PostgreSQL. If we look at these DBMSs we find that PostgreSQL comes with a java interface. So let's say we design a suitable database with PostgreSQL. This database will be held on a box that will be our server.
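To make the idea concrete, here is a rough sketch of the two questions the van driver needs answered. It is only an illustration: I use sqlite3 from the Python standard library as a stand-in for PostgreSQL, and the table, column, shop and item names are all hypothetical.

```python
import sqlite3

# Assumption: sqlite3 stands in for PostgreSQL here; the table, column,
# shop, and item names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (shop TEXT, item TEXT, quantity INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("Tiny", "pipe tobacco", 10),
        ("Tiny", "matches", 50),
        ("Little", "pipe tobacco", 15),
        ("Little", "matches", 30),
    ],
)


def bulk_order(db):
    """Total quantity of each item across all shops: what to buy."""
    rows = db.execute(
        "SELECT item, SUM(quantity) FROM orders GROUP BY item ORDER BY item"
    )
    return dict(rows.fetchall())


def deliveries(db, shop):
    """Items and quantities one shop asked for: what to deliver there."""
    rows = db.execute(
        "SELECT item, quantity FROM orders WHERE shop = ? ORDER BY item",
        (shop,),
    )
    return rows.fetchall()


print(bulk_order(conn))
print(deliveries(conn, "Tiny"))
```

The first query gives the single combined order for the supplier, the second gives the drop-off list for one shop. A real system would of course need more than one table.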

What we need is the ability for each shop to communicate with the server to tell it what stock we need to buy in. Shopkeepers do not have to be computer literate. They also do not want to spend much money on computer systems. At least at this time it is unlikely that they could be persuaded to learn a UNIX operating system like Linux. Cheap boxes already have Windows. An ideal solution is one where each shop can dial into the server, and the manager can start up his/her favorite browser and use it to enter information to the server. It should not matter what operating system each shop uses.

What does our server need to do?

The first thing is to get Apache set up and running. Apache is a web server and comes with most if not all linux distributions as standard. What is not always clear is how to set it up correctly. This is something an installation program cannot do (easily) and needs to be done by hand. It is Apache that allows us to http ourselves. Of course, we will also need to allow remote machines to dial into our server, but that is a matter outside the scope of this document.

Once Apache is running we can design a java application to act as a user interface to our database.

We can test both the client and the server parts of our application on our server until we are certain it performs as required.

Then all we need to do is allow the shopkeepers to be able to dial into the server and gain access via their browsers to the java database interface.

The wonderful thing is that at the test stage we only need to use one linux box which acts as both client and server at the same time.
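The "one box as both client and server" idea can be tried out even before Apache is configured. The sketch below is not Apache: it uses Python's built-in http.server as a stand-in, with a hypothetical handler, purely to show a request travelling from a client to a server on the same machine over the loopback address.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the real server-side application."""

    def do_GET(self):
        body = b"hello from the server side"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def loopback_demo():
    """Serve on 127.0.0.1 and fetch one page from the same machine."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: any free port
    port = server.server_address[1]
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    # An empty ProxyHandler makes sure the request is not sent via a proxy.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(f"http://127.0.0.1:{port}/") as response:
            return response.status, response.read().decode()
    finally:
        server.shutdown()
        server.server_close()


print(loopback_demo())
```

Nothing here ever leaves the machine, which is exactly the property that makes this kind of testing safe.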

Setting up Apache

In case you do not already know, Apache is one of the most common http servers in existence. A great many ISPs (Internet Service Providers) use Apache to give their clients (i.e. you) access to the world wide web.

This document does not attempt to address the requirements of a true Internet or intranet server. All I am concerned with here is getting Apache up and running on a standalone machine so that client/server software can be tested. In particular I am not concerned with security issues here. If you do not intend to have a permanent network connection then all should be well. If you intend other machines to have access to your http server then you should read all the relevant documentation. Complete configuration of Apache can be a very complex issue which does not fall within the scope of this document.

Modern Linux distributions, such as S.u.S.E., have special requirements for setting up Apache correctly. To avoid confusion please read the documentation that came with both your linux distribution and your Apache distribution. The following steps will work for any Linux distribution, but be warned, if your distribution has special requirements I cannot be responsible for getting your system startup files in a mess.

For instance, I shall describe how to start Apache automatically at boot time by adding a line to your /etc/inittab. Whilst some Slackware users will benefit from this approach, S.u.S.E. users should find that it is better to edit their /etc/rc.config file in the appropriate manner.

Preparing your machine for Apache

These steps will prepare your machine for the installation of Apache. You might find that Apache is already installed; following the steps below will not hurt such an installation.

Make certain you have set your /etc/HOSTNAME correctly. I call my machine Hawklord.
Create a new account for the httpd administrator. I use the user wwwrun, whose primary group is nogroup (65534).

Edit your /etc/hosts to reflect the name of your machine. I have the entries

        127.0.0.1 localhost   
        127.0.0.2 Hawklord.Varteg    Hawklord

Edit your /etc/hosts.allow. I have

        ALL:    127.0.0.1  
        ALL:    0.0.0.0
        ALL:    localhost
        ALL:    Hawklord.Varteg

If Apache is not already installed, find a pre-compiled version and install it as per the instructions. You should find that configuration files are placed under /etc/httpd, and other files are installed under /usr/local/httpd.

The directory /usr/local/httpd/htdocs should contain the Apache user manual in html format. Actually this directory will become the root directory of our http site, so you may want to move this documentation elsewhere, e.g. /usr/doc/Apache.
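If you want to double-check the /etc/hosts entries from the steps above, a few lines of code can show which address each name maps to. This is only a sketch of how I read the file format (address first, then one or more names, with # starting a comment); the function name is my own, and the sample text simply repeats my two entries.

```python
def parse_hosts(text):
    """Map every host name and alias in hosts-file text to its address."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            table.setdefault(name, address)  # keep the first entry seen
    return table


hosts_text = """
127.0.0.1 localhost
127.0.0.2 Hawklord.Varteg    Hawklord
"""

table = parse_hosts(hosts_text)
for name, address in table.items():
    print(name, "->", address)
```

To check a real machine, pass the contents of /etc/hosts to parse_hosts instead of the sample text.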

Plan your http site

When you log into an http site, e.g. https://linux.org, you find yourself at the root of what can be a very complicated directory structure. You can think of an http site as being a file system just like your own root file system. Whilst it is true that to a user the http site will look like a regular file system, the reality on the server's hard disk(s) can be very different. It is important to understand the differences and use them to your advantage.

On my system the document root is at /usr/local/httpd/htdocs, and this is the directory a user lands in when they access https://Hawklord.Varteg. But that directory holds only one file and no sub-directories on my hard disk: I keep only index.html in the physical location /usr/local/httpd/htdocs. All the documentation users can access is held in other locations on my hard disks.

Looking again at /usr/local/httpd you should find other sub-directories, in particular cgi-bin and icons. These directories should seem to be located under your document root because they will contain files that should be available to any html file on your site that requires them, though a user should not be able to access these directories directly. Much of my documentation is under /usr/doc, so I make that directory appear as /docs to the http server.

What this means is that you can store all your documentation on the server in locations that seem logical to you. You do not need to copy files or even make symbolic links to /usr/local/httpd/htdocs. Instead, plan how you want your documentation to appear to a user. You can also have directories that users cannot directly access, but which html documents can access.

For instance, the directory /usr/doc/ contains

   Linux_gazette    Howto    Ldp    java-documentation

I also want to access files under /usr/hobbies/literature and /usr/src/java/applets

I want my site to have the following structure:

     /    --->    cgi-bin   
                  docs   --->    Linux_gazette 
                                 Howto 
                                 Ldp 
                                 java-documentation  
                                 literature 
                  icons   
                  java_applets

Planning your http site in this way will save you headaches in the future!

httpd.conf

/etc/httpd/httpd.conf is the main configuration file for Apache. Some versions of Apache and/or Linux distributions recommend that all configuration information is kept in this file. Other versions recommend that you use all three files I shall mention below. If you want to keep all information in one file, simply put it all in httpd.conf; there is no real difference between the two methods. You will find that the example files contain sufficient comments to enable you to make the best choices for your system. I am only going to describe the changes you need to make to get Apache to work for you. Careful reading of the files will let you configure Apache better for your needs.

I am aware that a TCL configuration utility called Comanche exists for Apache. However, this is still in an early stage of development, so I do not recommend it for beginners. I found in practice the utility would not function correctly if you use only httpd.conf to configure your system. However it could prove useful for experimenting with different configurations.

For each line in the configuration files you can assume that your example file has a correct or sensible entry, unless I specifically mention it. Back up the examples before you make any changes!

ServerType standalone: Please use standalone unless you know exactly what you are doing.
Port 80: Unless you have changed something this is correct, so do not change it.
HostnameLookups on: Again, it is probably a mistake to change this unless you know otherwise.
User wwwrun: This entry should refer to the user we set up above to be the httpd administrator.
Group: This entry should refer to the primary group you defined for the httpd administrator.
ServerAdmin root@localhost: This is the address Apache will use to send e-mails with details about problems with the server. Using localhost rather than Hawklord.Varteg seems to be more reliable.
ServerRoot /usr/local/httpd: This should point to the location where you installed Apache's main files. By default this is /usr/local/httpd.
ServerName Hawklord.Varteg: This should be the fully qualified domain name of the server. It should be the same as the entry you made in /etc/hosts.allow and /etc/hosts above.
Logs: Entries concerning log files should probably be left as they are until you feel confident about changing them. Though you might want to experiment with the loglevel entry if you experience problems.

srm.conf

This file contains site specific information. It is where we define how our site will look to a user.

DocumentRoot

should refer to the directory on our hard disk that will be the root directory of our site. For our example this is /usr/local/httpd/htdocs

DirectoryIndex

is the name of the file that should be loaded by a browser when a user enters a directory without specifying a filename, e.g. https://Hawklord.Varteg/ or https://Hawklord.Varteg/docs/. index.html is a sensible default.

Alias .....

Each line starting Alias will define a virtual directory on our system. When several aliases share a prefix, Apache uses the first one that matches, so the more specific /docs/... entries have to come before the general /docs/ entry. For the example above this should include:

      Alias /cgi-bin/                  /usr/local/httpd/cgi-bin/
      Alias /docs/Linux_gazette/       /usr/doc/Linux_gazette/
      Alias /docs/Howto/               /usr/doc/Howto/
      Alias /docs/Ldp/                 /usr/doc/Ldp/
      Alias /docs/java-documentation/  /usr/doc/java-documentation/
      Alias /docs/literature/          /usr/hobbies/literature/
      Alias /docs/                     /usr/doc/
      Alias /icons/                    /usr/local/httpd/icons/
      Alias /java_applets/             /usr/src/java/compiled/
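To see how such a table turns a URL into a location on disk, here is a deliberately simplified model. It is not Apache's code: it only mimics the rule, described in the mod_alias documentation, that aliases are tried in configuration order and the first match wins. The function name and the sample file names are my own inventions.

```python
# Alias table in configuration order; the paths come from the example
# above. The specific /docs/literature/ prefix precedes the general /docs/.
ALIASES = [
    ("/cgi-bin/", "/usr/local/httpd/cgi-bin/"),
    ("/docs/literature/", "/usr/hobbies/literature/"),
    ("/docs/", "/usr/doc/"),
    ("/icons/", "/usr/local/httpd/icons/"),
    ("/java_applets/", "/usr/src/java/compiled/"),
]
DOCUMENT_ROOT = "/usr/local/httpd/htdocs"


def resolve(url_path, aliases=ALIASES, document_root=DOCUMENT_ROOT):
    """Translate a URL path to a file path; the first matching alias wins."""
    for prefix, target in aliases:
        if url_path.startswith(prefix):
            return target + url_path[len(prefix):]
    return document_root + url_path  # no alias matched


for path in ("/index.html", "/docs/Howto/index.html", "/docs/literature/poem.html"):
    print(path, "->", resolve(path))
```

If the general /docs/ entry were listed first, the literature URL would resolve under /usr/doc/ instead, which is why the order of the Alias lines matters.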

ErrorDocument

Error documents are the responses the server will give when the user types a wrong URL, tries to access a restricted file or directory, and so on. Apache gives good default error documents, but you can override this behavior and provide your own responses. I keep my error documents in the directory /usr/local/httpd/error.

access.conf

This file contains permissions for our site's directories. If, when you test your configuration by starting httpd and pointing your browser to (e.g.) https://Hawklord or https://localhost (both will work for the above example), you get a file access error, you will need to alter this file. Each directory in your site should have its own entry.

By default Apache has a very restricted set of permissions for the root directory, I have found that changing to:

   <Directory />
       Options All
       Order allow,deny
       Allow from all
       Options FollowSymLinks
   </Directory>

solved some problems for me. It is important to realize that a directory inherits its permissions from its parent directory. So if you want to allow outside access to your site you need to take great care when setting up your directory permissions.

Starting and Testing Apache

Once you are satisfied that you have correctly installed and configured Apache, you will want to test it! Log into your machine as root. At the prompt type:

     #:  httpd &

Now you can log into your machine as any user, start your favorite browser and enter the URL https://localhost. If all goes well you should load the Apache site file index.html, unless you moved the Apache documentation and provided your own index.html in /usr/local/httpd/htdocs.

Once you are satisfied that all is well, you will want to have httpd start at system boot time. Some Linux distributions, such as Red Hat or S.u.S.E., will have a script to start Apache in their init.d directory. If this is the case then you just need to enable the script for System V init in the normal manner.

As an alternative you can put the following line in your /etc/inittab:

      ap:45:once:/bin/su --command=/usr/sbin/httpd

'ap' must be a unique identifier. '45' refers to the runlevels for which the command will be executed. 'once' is probably safer to use than 'respawn', since with 'respawn' a mistake in this line will produce a lot of error messages ;-(

The final part of the line '/bin/su --command=/usr/sbin/httpd', is intended to start up Apache as a process owned by wwwrun. It would be wise to test this command before you put it in your /etc/inittab.

Search Engines

If you have Apache running, and a large Linux installation, then you might want to consider implementing a search engine. S.u.S.E. Linux provides htdig; in fact, to gain full benefit from the S.u.S.E. Help System you need to use something like htdig. The only problem is the disk space you will need. I have a 1 Gig partition devoted to documentation, which may seem a lot to many users! I have a lot of personal documentation, program documentation (increasingly this is HTML), all issues of Linux Gazette, Gimp documentation, java documentation etc. This takes about 500 Meg. The database htdig uses is between 200 and 300 Meg on my system. To update the database I need 200 to 300 Meg spare under /tmp. Actually when I update the database I change the location of /tmp since I do not have enough space on my root partition.

Now since I have arranged all the documentation to be available to Apache, it is all referenced in htdig's database. If I have a question about any aspect of Linux, or any of my personal subjects, all I have to do is formulate a suitable search pattern. I cannot adequately describe the savings in time this has given me! In the past I would have needed to access newsgroups to find answers to my problems. With htdig I can avoid this 99.9% of the time! Given the low cost of hard disk space, and the fact that current program documentation, like most documentation of any kind, is usually available as html, it makes good sense to use Apache in conjunction with a search engine in order to have a most efficient information retrieval system.

Htdig may not be perfect. If you are used to Infoseek or Lycos, it is a bit annoying because you cannot search for a phrase, e.g. "starting the x server". Instead, htdig looks for documents that contain all the words you enter. An advantage is that related words are searched for as well, e.g. if you search for 'god' you can also get results for 'gods' and 'godly'. Once you get used to htdig it becomes an indispensable tool. The time it saves you in looking for information is well worth the cost in terms of disk space (on my system the real cost is about 250 Meg, though I need another temporary 250 Meg when re-building the database).
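The difference between a phrase search and an "all the words" search is easy to show with a toy index. The sketch below is my own illustration of the general idea and makes no claim about how htdig works internally; the file names and contents are made up, and it does not attempt the related-word matching mentioned above.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each lower-cased word to the set of documents containing it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(name)
    return index


def search(index, query):
    """Return the documents that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


documents = {  # hypothetical file names and contents
    "xfree.html": "Starting the X server from the console",
    "apache.html": "Starting the Apache server at boot time",
    "gimp.html": "Drawing with the Gimp",
}
index = build_index(documents)
print(sorted(search(index, "starting the x server")))
print(sorted(search(index, "starting server")))
```

Because only the presence of each word is checked, the order of the words in the query makes no difference to the result.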

SGML Support

Finally I shall mention Linux's SGML (Standard Generalized Markup Language) support. This is not normally considered part of web page design, since most home users will simply want to be able to create their own HTML home pages and have no other use for such documents.

However, a great many people will want to produce documents in many formats. The same document might need to be available for publication as a book, or as an info page, as well as being available as web pages. The Linux Documentation Project contains many documents that are available in different formats according to users' needs.

SGML allows a single source to be used to produce many different kinds of text format. The following package descriptions are taken directly from the S.u.S.E. 6.0 distribution, though they should all be available for other distributions:

Package "sgmltool"

SGML-Tools - a text-formatting package
SGML-Tools is a text-formatting package based on SGML (Standard Generalized Markup Language), which allows you to produce LaTeX, HTML, GNU info, LyX, RTF, and plain ASCII (via groff) from a single source.

This system is tailored for writing technical software documentation, an example of which are the Linux HOWTO documents. It should be useful for all kinds of printed and online documentation.

SGML-Tools is not able to process arbitrary SGML documents; in such a case, give jade_dsl a try and write your own DSSSL scripts (take the docbk30 package as an example).

Package "jade_dsl"

DSSSL-Engine for SGML documents

Jade is an implementation of DSSSL (Document Style, Semantics and Specification Language); pronounce it as "dissl" -- it rimes with whistle.

It has backends for SGML, RTF, MIF, TeX, and HTML.

The parser "nsgmls" and helper tools like "sgmlnorm", "spam", "spent", and "sx" are now included in the separate package "sp".

You'll find the documentation at /usr/doc/packages/jade_dsl/.

Package "sp"

SGML parser tools

The tools of this package provide the possibility to manage SGML and XML documents.

It contains the parser `nsgmls' and the supporting programs `sgmlnorm', `spam', `spent', and `sx'. `sx' is useful as a converting tool from SGML to XML, the coming WWW standard. You'll find the documentation for all the programs under /usr/doc/packages/sp/.

Package "sp_libs"

Libraries required for sp and jade

Package "gf"

A "general formatter" for SGML documents

`gf' from Gary Houston is short for "general formatter", i.e., it can work on documents which use the ISO "general" document type definition (DTD). It can convert SGML documents conforming to a small number of DTDs into various output formats: LaTeX, ASCII, RTF and Texinfo. However not every output format can be generated for every DTD.

Apart from the general DTD, gf supports the HTML DTD used in the WWW project and Gary's Snafu DTD. `gf' is not intended as a flexible system for hacking up a formatter for a random DTD, but as a usable document production system for a few DTDs.

Package "jadetex"

JadeTeX - LaTeX macros to process TeX output from Jade (jade_dsl)

With Sebastian Rahtz' macro package `jadetex' it is possible to process the output of the TeX backend of Jade (jade_dsl). Resulting DVI files are viewable e.g., with `xdvi' or printable like any other DVI file.

I have no real experience with SGML so I will leave the appraisal of these packages to the reader. For some people these will prove indispensable tools for producing HTML pages.