"Linux Gazette...making Linux just a little more fun!"


Defining a Linux-based Production System

By Jurgen Defurne


Introduction

In a previous article ("Thoughts About Linux", LG, October) I touched on several topics concerning the use of Linux in business, not only for networking and communication, but also for real production systems. Several conditions need to be fulfilled, but it should be possible to define a basic database system that can be deployed rapidly and has everything such a system needs.
Over the past two months I have been increasing my knowledge of Linux and of the tools available for it. Several points need further elaboration, but I have a fairly good idea of what is needed.

Goal

The goal is to look at the parts needed to implement a reliable production database system, together with the tools needed for (rapid) (in-house) development, at a considerably lower cost than traditional platforms require. It must be reliable in the sense that all procedures necessary for minimum downtime are available, with the understanding that a little downtime can be tolerated.
I do need to add a remark here. I have worked on several projects where people tried to save time by asking for rapid development, or tried to save money by reusing parts that lay around or by converting systems. In all cases both money and time were lost, the main reason being a failure to understand all aspects of the problem.
This is a mistake I don't want to make. I think I now have enough experience to show a way to achieve the goal defined above, by describing a Linux-based production platform with lower deployment and operating costs.

Basic recommendations

These are general guidelines. The first step in creating and operating a successful production system is limiting the number of tools needed on the platform. This leads to the second step: understanding and knowing your tools. Experience is still the most valuable tool, but depending on the number and complexity of the tools, much time can be wasted getting to know the ones at hand. Good handbooks with clear examples and thorough cross-references are a great help, as are courses on the subjects that matter.

Hardware

At the moment I won't go very deep into hardware. The base platform should be a standard PC with an IDE hard drive on a PCI controller that is fast back-to-back capable. I tested the basic data rate of a Compaq Deskpro system (166 MHz, Triton II chipset) and got a raw data rate (unbuffered, using write()) of 2.5 MB/s (megabytes per second). I suppose this is fairly reasonable for a small entry platform. Further tests should be developed to measure the load on the machine under production circumstances.
The most important point, however, is that all machines running Linux with production data should be equipped with a UPS. This is because the e2fs file system (like most Un*x filesystems) is very vulnerable to an unexpected system shutdown. For the same reason, a tape drive is indispensable, together with good backup and restore procedures which must be run from day 0.
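The raw write test mentioned above can be sketched roughly as follows. This is only an illustration in Python, not the program used for the measurement: the function name raw_write_rate, the block size and the amount of data are assumptions chosen for the example. It writes through a plain file descriptor with os.write(), so there is no stdio buffering, and calls os.fsync() so that the time to reach the disk is included.

```python
import os
import tempfile
import time


def raw_write_rate(path, total_bytes=8 * 1024 * 1024, block_size=64 * 1024):
    """Write total_bytes to path with os.write() and return the rate in MB/s.

    Hypothetical helper for illustration; sizes are arbitrary defaults.
    """
    block = b"\0" * block_size
    # Plain file descriptor: no user-space buffering between us and the kernel.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    written = 0
    start = time.perf_counter()
    try:
        while written < total_bytes:
            written += os.write(fd, block)
        os.fsync(fd)  # count the time needed to push the data to the disk
    finally:
        os.close(fd)
    elapsed = time.perf_counter() - start
    return written / (1024 * 1024) / elapsed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        rate = raw_write_rate(os.path.join(tmp, "rawtest.bin"))
        print(f"raw write rate: {rate:.1f} MB/s")
```

A single run says little. Repeating the measurement with larger files, and while other jobs are running, gets closer to the production load tests asked for above.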

Production tools

Our main engine is the database management system. For production purposes, the following features must be available :

Fast query capability

This feature is especially necessary for interactive applications. Your clients shouldn't have to wait half a minute before the requested operation completes. This capability can be enhanced by buffering, faster CPUs, faster disk drives, faster bus systems and RAID.

Batch job entry

This is a very valuable production tool. Many jobs depend on the daily production changes but need a lot of processing time afterwards. These are mostly run at night, always at the same times: daily, weekly, monthly or yearly.

Printing

Printing is a very important activity in every production system and should be taken into account from the design phase onwards. This is because many companies have several documents that are official. Not only printing on laser or inkjet printers should be supported, but also printing with heavy-duty printing equipment for invoices, multi-sheet paper, etc.

Telecommunication

Telecommunication is not only about the Internet. There are still many systems out there that work with direct lines between them. The main reason is that this gives the people responsible for the services a much greater degree of control over access and implementation. In addition to TCP/IP, e-mail and fax, support for X.25 should also be an option in this area.
People should also have control over the messages and/or faxes they send. A queue of messages should be available, where everybody can see all messages globally (destination, time, etc.) and where they have access to their own messages.

Transaction monitoring

By transaction monitoring, I mean the ability to roll back pending updates on database tables. This feature is especially needed when one modification concerns several tables. These modifications must either all be committed at the same time, or all be rolled back to the previous state.
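A small sketch may make the all-or-nothing behaviour concrete. It uses Python's built-in sqlite3 module purely as a stand-in for a real production DBMS, and the tables (stock, shipped) and the function transfer_stock are invented for the example. One business operation touches two tables; if the check fails, neither update survives.

```python
import sqlite3


def make_db():
    """Create a throwaway in-memory database with two hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE stock (item TEXT PRIMARY KEY, qty INTEGER)")
    conn.execute("CREATE TABLE shipped (item TEXT PRIMARY KEY, qty INTEGER)")
    conn.execute("INSERT INTO stock VALUES ('widget', 10)")
    conn.execute("INSERT INTO shipped VALUES ('widget', 0)")
    conn.commit()
    return conn


def transfer_stock(conn, item, qty):
    """Move qty of item from stock to shipped: both updates or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE stock SET qty = qty - ? WHERE item = ?", (qty, item))
            conn.execute(
                "UPDATE shipped SET qty = qty + ? WHERE item = ?", (qty, item))
            (left,) = conn.execute(
                "SELECT qty FROM stock WHERE item = ?", (item,)).fetchone()
            if left < 0:
                raise ValueError("not enough stock")
        return True
    except ValueError:
        return False


def quantities(conn, item):
    """Return (stock qty, shipped qty) for item."""
    (s,) = conn.execute("SELECT qty FROM stock WHERE item = ?", (item,)).fetchone()
    (d,) = conn.execute("SELECT qty FROM shipped WHERE item = ?", (item,)).fetchone()
    return s, d
```

After a failed transfer_stock() call both tables still hold the values they had before the call, which is exactly the property asked for above.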

Journaling

This capability is needed to repair files and filesystems that were corrupted by a system failure. After the system is restarted, a special program is used to undo all changes which couldn't be committed. In this sense, journaling is closely related to transaction monitoring.
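The idea of undoing uncommitted changes can be illustrated with a toy before-image journal. This is a deliberately simplified, in-memory sketch with invented names (JournaledStore, recover); a real journal lives on disk and must itself survive the crash, which this example does not attempt to show.

```python
_MISSING = object()  # marks "key did not exist before the change"


class JournaledStore:
    """Toy key/value store with a before-image (undo) journal."""

    def __init__(self):
        self.data = {}
        self.journal = []  # (key, previous value) for uncommitted changes

    def put(self, key, value):
        # Log the old value first, then apply the change.
        self.journal.append((key, self.data.get(key, _MISSING)))
        self.data[key] = value

    def commit(self):
        # Changes are now permanent; their undo records are no longer needed.
        self.journal.clear()

    def recover(self):
        """Undo every logged change that was never committed, newest first."""
        while self.journal:
            key, old = self.journal.pop()
            if old is _MISSING:
                self.data.pop(key, None)
            else:
                self.data[key] = old
```

Calling recover() after an interrupted series of put() calls plays the role of the "special program" described above: the store returns to the state of the last commit.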

User interfacing

This is a tricky part, because it is part development and part production. On the production side, the interface system should give users rapid access to their applications and also partition all applications between departments. Most production systems I have seen do this with menus. There are several reasons. The main reason is that most production systems still work with character-based applications. There are many GUIs out there, but production systems will still be solely character based (except for graphics and printing, but I consider these niche markets), even on a GUI. The second reason is that a production system usually has lots and lots of large and small programs. You just can't turn them all into icons and put them in folders. You would then only have a graphical menu, with all the icons adding more to confusion than to clarity.

What's available ?

Note : When I name or specify products, I will only specify those with which I am already familiar. I presume that any one of you will have his/her own choices. They serve as basic examples, and do not imply any preferences on my side.

The only database system on Linux I personally know for the moment, is PostgreSQL. It supports SQL and transaction monitoring. Is it fast ? I don't know. One should have a backup of a complete production database, which can then be used to test against the real production machine, with interactive, background and batch jobs running like they do in the real world.
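How such a speed test might be started is sketched below. The timing function only assumes a Python DB-API style connection with an execute() method; the sqlite3 in-memory database and the orders table are stand-ins invented so that the example runs on its own, and say nothing about PostgreSQL's actual speed.

```python
import sqlite3
import statistics
import time


def time_queries(conn, sql, params_list):
    """Run sql once per parameter tuple and summarise the response times."""
    timings = []
    for params in params_list:
        start = time.perf_counter()
        conn.execute(sql, params).fetchall()
        timings.append(time.perf_counter() - start)
    return {
        "count": len(timings),
        "median_s": statistics.median(timings),
        "worst_s": max(timings),
    }


def build_sample_db(rows=1000):
    """Create a hypothetical test table; a real test would load a backup."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT)")
    conn.executemany(
        "INSERT INTO orders (customer) VALUES (?)",
        [(f"c{i % 50}",) for i in range(rows)],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    db = build_sample_db()
    report = time_queries(
        db,
        "SELECT * FROM orders WHERE customer = ?",
        [(f"c{i}",) for i in range(20)],
    )
    print(report)
```

The worst response time matters more than the average for interactive work, so it is reported separately. For a meaningful result the same measurement would have to run while batch and background jobs are active.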

For batch jobs, crond should always be started. In addition to this, the at and batch commands can be used to have a more structured implementation of business processes.

For printing, I know (and use) the standard Linux tools lpd, Ghostscript and teTeX. There might be a catch, however. In some places you need to merge documents with data. The main reason for this is that a word processing package offers more control over the format and contents of the document than printing it with a simple reporting application. At my current workplace, a migration to HP is under way. The solution there is WordPerfect. In the past, I have used this solution under DOS, to automatically start WP and produce a merged document. Is this possible too with StarOffice ?
Are there other print solutions which offer more interactive control over the printing process than lpd ? Users should have easier access to their print jobs and to the printing process.

Telecommunication is a really strong point of Linux. I won't enumerate all the options. Even if it doesn't support X.25, it is still possible to use direct dial-up lines using SLIP or PPP.

Journaling is the weakest point of Linux. I have worked with the following filesystems : FAT, HPFS, NTFS, e2fs, the Novell filesystem and the filesystem of the WANG VS minicomputer system. With all these systems, I have had power failures or crashes, but the only filesystem that gives trouble afterwards is e2fs. In all other cases, a filesystem check repairs all damage and the computer continues. On WANG VS, index-sequential files are available. When a crash occurs, the physical integrity of an indexed file can be compromised. There are two defences against this. The first is reorganizing the file, that is, copying the file on a record-by-record basis. This rebuilds the complete file and its indices, and inserts or deletes which were not committed are rejected. The second option is using the built-in transaction system. A file can be flagged as belonging to a database. Every modification to these files is logged until the transaction is completely committed. After a failure has occurred, the files can be restored to their original state using the existing journals. This is a matter of minutes.
I think that the only filesystem on the PC which offers comparable functionality is that of Novell.
The e2fs file system check works, but it does not offer enough explanation. When there is a really bad crash, the filesystem is just a mess.

Development tools

I will describe here the kind of tools that I needed when I was maintaining a production database in a previous job. The main theme is that programmers in a production environment should be productive. This means that they should be offered a complete package, with all tools and documentation necessary to start immediately (or in a relatively short time). It also means that for every package there should be a short, practical tutorial available.
I will divide this section into two parts, the first covering necessary tools, the second optional tools. A methodology is also necessary for development. This methodology should be the same across all delivered tools. The easiest way to achieve this is through an integrated development environment.

Compulsory development tools

Which tools are the minimum needed to start coding without much hassle ? I found these tools to be invaluable on several platforms :

Integrated development environment

Your IDE should give access to all your tools in an easy and consistent manner. It should be highly customisable, but be delivered in a configuration which gives direct access to all installed tools.

Editor

If you have a really good editor, it can act as an integrated development environment. Features which enhance productivity are powerful search-and-replace capabilities and macro features (even a simple record-only macro feature is better than no macro features). Syntax colouring is nice, but one can easily live without it. Syntax completion can be nice, but you have to learn to live with it. Besides, the editor cannot know which parts of a statement you don't need, so ultimately you will have more clutter in your source, or you waste your time erasing unnecessary source code.

Screen development

This is an area where big savings can be made. For powerful screen development you need the following parts in the development package :
  1. Standard screens which are drawn upon information in the data dictionary
  2. Easy passing of variables between screens and applications
  3. A standard way of activating a screen in an application
The savings come in several places. If you create a new screen, you should immediately get a default screen with all fields from the requested table. After this, only some repositioning and formatting to local business standards needs to be done. I worked with two such systems, FoxPro and WANG PACE, and the savings are tremendous in all parts of the software cycle (prototyping, implementation and maintenance).
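To give an impression of what "a default screen from the data dictionary" means, here is a minimal sketch. It does not reproduce how FoxPro or WANG PACE work; the dictionary layout (field name, label, width) and the function default_screen are assumptions made up for the example.

```python
# Hypothetical data dictionary entry for one table: (field name, label, width)
CUSTOMER_FIELDS = [
    ("cust_no", "Customer no.", 6),
    ("name", "Name", 30),
    ("city", "City", 20),
]


def default_screen(title, fields):
    """Return the lines of a default character-based entry screen."""
    # Align all entry boxes by padding labels to the longest label.
    label_width = max(len(label) for _, label, _ in fields)
    lines = [title, "=" * len(title), ""]
    for _, label, width in fields:
        lines.append(f"{label.ljust(label_width)} : [{'_' * width}]")
    return lines


if __name__ == "__main__":
    print("\n".join(default_screen("Customer", CUSTOMER_FIELDS)))
```

The programmer then only has to reposition fields and apply the local layout standards; adding a field to the table definition changes the default screen without any further coding.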

Data dictionary

A data dictionary is a powerful tool, from which much information can be extracted for all parts of the development process. The screen builder and the high-level language (HL) preprocessor should be able to extract their information from the data dictionary. The ability to define field- and record-checking functions in the data dictionary instead of in the application program eliminates the need to propagate changes in this code through all applications. With the proper reports, one should also be able to look at the structure of the database from different angles.
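The benefit of keeping checking functions in the data dictionary can be shown with a small sketch. The dictionary format, the field names and the function check_record are all hypothetical; the point is only that every application calls the same check, so a changed rule is changed in one place.

```python
# Hypothetical data dictionary: each field carries its own checking function.
ORDER_DICTIONARY = {
    "quantity": {
        "check": lambda v: isinstance(v, int) and v > 0,
        "message": "must be a positive whole number",
    },
    "currency": {
        "check": lambda v: v in ("BEF", "EUR", "USD"),
        "message": "unknown currency code",
    },
}


def check_record(dictionary, record):
    """Return a list of error strings; an empty list means the record is valid."""
    errors = []
    for field, rule in dictionary.items():
        value = record.get(field)
        if value is None:
            errors.append(f"{field}: missing")
        elif not rule["check"](value):
            errors.append(f"{field}: {rule['message']}")
    return errors
```

An entry screen, a batch import and a conversion program would all call check_record() with the same dictionary, instead of each carrying its own copy of the rules.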

High level language with DBMS preprocessor support

You can't do complete development without a high-level language. There are always functions needed which can't be implemented through the database system. To make development easier, it should be possible to embed database control statements in the source program. The compiler should be invoked automatically.

Scripting language

A scripting language is very useful in several aspects. Preparing batch jobs is part of it. I also found out that a business system consists of several reusable pieces, which can be easily strung together using a scripting language. Also, the overall steering and maintenance of the system can be greatly simplified.

Optional development tools

These are tools that were available on several platforms and can come in handy, but which aren't strictly necessary to deliver production applications. I found that they are little used.

Interactive query system

This is often designed to be used by people who are not programmers. Experience has taught me, however, that people in a business who are not programmers don't have the time to learn these tools. It is a useful tool for a programmer to test queries and views, but it isn't really useful as a production tool. Only in some cases, for really quick and dirty work, is it worth using.

Report editor

This is an even more overestimated tool. I shared thoughts about this with other programmers, and our conclusion was : bosses always ask for reports which are much more complicated than a simple report editor can handle. It would be far better to use a programming language specifically designed for reporting (does anyone know of such a thing ? Any experiences with Perl for extraction and reporting ?).

What's available ?

Note : I will direct my attention only at the compulsory development tools. The rest of the environment will be centered around the features of PostgreSQL.

As an integrated development environment, EMACS is probably the first which comes to mind. It integrates even with my second subject, a powerful editor. Is it even at all possible to draw a line between the two ? Is EMACS a powerful editor which serves as a development environment, or is it a development environment which is tightly integrated with its editor?

The data dictionary, the screen development package and the DBMS preprocessor are more tightly bound than other parts of the package. The screen editor and the DBMS preprocessor should get their information from the data dictionary, and the DBMS HL statements should also provide for interaction with screens. It should be possible to develop screens both for X-windows and for character-based terminals.
In the field of high-level languages, there are several options, but a business-oriented language is still missing. Yes, I am talking COBOL here, although an xBase dialect is also great for such applications. I have programmed for eight years in several languages, only the last two years in COBOL, and it IS better for expressing business programs than C/C++. If anyone asked me now to write business programs in C/C++, I think the first thing I would do is write a preprocessor so that I could express my programs in a COBOL-like syntax.
I don't know how well Ada suits business programs, but a combination of GNAT with a provision to copy embedded SQL statements to the translated C source, which is then passed through the SQL preprocessor, might work.
I have only had a brief look at Perl, and of Tcl and Python I know absolutely nothing. While interactive languages are fine for interactive programs, you should bear in mind that some programs must process a lot of data, and that access to a native code compiler is therefore essential.
There is another area in which only COBOL is good: financial mathematics. This is due to the use of packed decimal numbers up to 18 digits long, where the decimal point can be in any place. You should have compiler support for that too. On the x86 platform this capability exists in the numerical processor, which is capable of loading and storing 18-digit packed decimal numbers. Computations are carried out in the internal 80-bit floating point format of the co-processor.
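Why decimal arithmetic matters for money can be shown with a short example. It uses Python's standard decimal module as a stand-in for COBOL packed decimals, which it only resembles; the 18-digit precision mirrors the figure mentioned above, and the function invoice_total and the 21% rate are assumptions for the illustration.

```python
from decimal import Decimal, ROUND_HALF_UP, localcontext

CENT = Decimal("0.01")


def invoice_total(net, vat_rate):
    """Return net plus VAT, with the VAT rounded half-up to whole cents.

    Amounts are passed as strings so that no binary rounding sneaks in.
    """
    with localcontext() as ctx:
        ctx.prec = 18  # 18 significant digits, like the packed decimal format
        net = Decimal(net)
        vat = (net * Decimal(vat_rate)).quantize(CENT, rounding=ROUND_HALF_UP)
        return net + vat


if __name__ == "__main__":
    # Binary floating point cannot represent 0.1, 0.2 or 0.3 exactly.
    print(0.1 + 0.2 == 0.3)
    # Decimal arithmetic keeps the amounts exact.
    print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))
    print(invoice_total("19.99", "0.21"))
```

The explicit rounding rule is the important part: in financial programs it is usually prescribed, and it has to be applied at a defined point in the calculation rather than left to the floating point hardware.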

When you have a Linux system, the first scripting language you run into is probably that of the bash shell. This should be sufficient for most purposes, although my experience with scripting languages is that they benefit greatly from statements for simple interaction (prompting and simple field entry).

What should be delivered ?

As I said before, this list doesn't represent any endorsement from me of any of these products or programs. The list should be expanded with all products which fit into any of these categories, so all hints are welcome.
Another weak point in some areas of Linux is documentation. For a production environment, the Linux Documentation Project is probably a must, printed in advance from the PostScript sources. For the commercial products, good documentation is not a problem either. For other Linux tools, the books from O'Reilly & Associates are very valuable. HOWTOs are NOT suited to a production environment, but since they are more about implementation, they are suitable for the people who put the system together. The catch is this : when a system is delivered, all necessary documentation should be prepared and delivered too. I worked with several on-line documentation systems, but in a production environment, nothing beats printed documentation.
 
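Production system

  Component                    Programs and products
  ---------------------------  ----------------------------------------
  DBMS                         PostgreSQL, MySQL, mSQL, Adabas,
    - Fast query/update        c-tree Plus/Faircom Server, ...
    - Transaction processing
    - Journaling
  Communication                ppp, slip, efax, ...
  Batch job entry              crond, at, batch
  Printing                     lpd
  User interfacing             ?

Development system

  Component                    Programs and products
  ---------------------------  ----------------------------------------
  IDE                          EMACS
  Editor                       EMACS, vi
  Screen development           Depends on DBMS
  Data dictionary              Depends on DBMS
  Application language         C, C++, Cobol ?, Perl, Tcl(/Tk), Python,
                               Java
  Scripting language           bash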

Summary

I am still trying to drag Linux into business. If you want to do business using Linux, you should be able to deliver a complete system to the customer. In this article I outlined the components of such a system and some weaknesses which should be overcome. As a result, I created a table listing the components needed for such a system.
This table is by no means finished. I welcome all references to programs and products with which to update it. It should be possible to publish an update once a month. I should also extend the table with references to available documentation.
Another part which needs more attention is developing tests to assess the power of the database system, i.e. what can be expected in terms of throughput and response under several load scenarios.
 


Copyright © 1999, Jurgen Defurne
Published in Issue 36 of Linux Gazette, January 1999

