...making Linux just a little more fun!

<-- prev | next -->

Preparing For My Interviews Part 2: MySQL and Python

By Mark Nielsen

Preparing For My Interviews Part 2: MySQL and Python

By Mark Nielsen

  1. Introduction
  2. Mysql Master/Slave and Clusters
  3. The Python script and Python Module MySQL.py.
  4. Executing the Python script.
  5. Some commands to execute.
  6. Next Month: Stored Procedures in MySQL (5.0?)
  7. Conclusion


Introduction

This article is actually a lie. I am not preparing for interviews anymore. I started this article at the end of October, but then I got a 2 month contract right after that. Still, it's the thought that counts.

This article is to help you setup MySQL Master/Slave/Cluster combinations on one computer. Why would that be useful? Well, for starters, there isn't an abundance of Master/Slave/Cluster documentation. The Cluster technology is really new. Second, MySQL is a very very very hot database. There aren't too many good MySQL DBAs out there. If you want to secure your job in the future, learn MySQL. The Master/Slave and Cluster technologies are a must. If you don't know anything about the Master/Slave or Clusters, you can pretty much forget getting hired. In every MySQL interview (I had like 10 companies interviewing me in a six week period and half of them used MySQL for something) the Master/Slave questions popped up and topics of the Clustering technology came up as well. Third, it is a pain to setup Master/Slave or Cluster environments by hand. If my script works on one of your computers, copy over the software, copy over the config files, make slight changes to the config files, and you can get a real multi-computer environment setup in minutes.

The second purpose of this article is to show you how to use Python to write a simple application (compiling and installing MySQL). I tried to include a lot of stuff in the Python code which is common to most Python scripts/modules. Also, I am thinking about turning the Python code into a package in the Vaults of Parnassus - just for kicks, and so I can say that I have some sort of Python code published.

So, why did I use the programming language Python to execute all the commands? I love Python, it is object oriented (from the ground up), has good exception handling, has been compilable for a long time, it is hard to write ugly Python code, it is easy to understand other people's code, there is only one way to do things (usually), it's easy to manipulate, and it is a general tool from web programming to Unix scripting to database programming to GUI programming to mathematical programming (my favorite). There are so many reasons to love Python if you are a true object-oriented programmer. The other reason why I try to use Python for all personal projects is to convince people where I work (or will work) that Python rocks and should be used wherever appropriate. The sad thing is, most managers don't want to use Python because of the lack of people who can write good Python code, so it is my duty to convince them otherwise. Update: I got hired at Google, which uses a lot of Python so now I am a happy camper!

Why compile MySQL? Why not just use the RPMs? I don't think clustering is built into some of the RPMs. Also, I always like to compile and install software myself. If you can't compile it, odds are, there might be other problems. If you do decide to use RPMs, and you decide to use an RPM server, please use YUM and do not use commercial RPM servers. All the commercial RPM servers I have seen (just one, you know who) are way overpriced and are geared towards the executive staff or people who want to buy support so that they can blame someone if something goes wrong. The sales staff are really good at making you believe you can jump to the moon, when really, the software they present only manages the installation of RPMs. Any 1st year programmer could write a web interface to do the same thing. By using YUM (it is easy, simple, and free of restrictions), you empower yourself and you will help bring down the costs of the the overpriced commercial RPM services. If the commercial RPM services weren't so overpriced (they should be like $25 per computer max) and didn't have so many stupid restrictions (like installing thebind RPM requires a more expensive license), I wouldn't mind, but they are getting away with doing so little, it is ridiculous. Politics and the suits have gotten in the way of good technology at some of these companies.

One more thing, I bought a 2-gig RAM 64 bit AMD CPU with Serial ATA hard drives. I bought this hardware configuration for the simple purpose of using a 64 bit operating system. Why? Because many companies are using 64 bit AMD cpus for their databases and other things. Just at work the other day, I mentioned I got a MySQL cluster to work on my home computer, so I as asked to setup a test cluster at work. See how valuable it is to stay ahead of the market?

The nice thing about the 64 bit AMD CPU is that is will also support the 32bit operating systems. Thus, I was able to take the safe route by installing a 32 bit Linux OS before I risked using the 64 bit versions (which had a lot of problems in the past). Plus, don't kill me for saying it, but I still wanted to play some Windows games that weren't available for Linux yet. I am close to the point where I never need to use Windows again. If gaming companies would always make Linux versions of their products, I would never have use Windows for anything. That would really be nice.

Mysql Master/Slave and Clusters

What is MySQL? Well, if you mean the MySQL database server, it is an SQL database server available from mysql.com. It is comparable to PostgreSQL and Oracle in most ways. MySQL is one of the two most popular Open Source database servers out there (PostgreSQL being the other).

What are MySQL Master/Slaves? The Master MySQL server lets people read and write data to it. It copies all of its data to the MySQL slave computers. Thus, you end up with many database servers with the same data. This can be useful for load-balancing your webservers against many database servers or performing a failover in case the Master dies (you shutoff the Master and make one of the Slaves the new Master). Normally, the Slave computers are read-only. Since most websites have a 4-to-1 ratio of reads to writes, having many read-only MySQL slaves can be very useful. How it is useful? Your website can handle more data and use more database connections spread over multiple servers. All the writes still have to be done to the Master, but the website can choose which Slave it wants to read data from. Separating the write connections from the read connections can speed up your website a lot.

What is MySQL Clustering? MySQL Clustering lets you put multiple computers together which are all the same and have all the same data. You can connect to any computer in the cluster and perform read/write operations which are immediately available to the other mirrors. The main different between Clustering and Master/Slaves is that each computer in a Cluster can be written to while in a Master/Slave configuration you are only suppose to write to the Master. Also, in Clustering, when you write data to one machine, the computer will not respond back with an "OK" status until the data has been copied to all the other computers. With the Master/Slave configuration, the data is not necessarily copied to the Slaves immediately. Effectively, you can view the cluster as one single entity/database. The nice thing is, if one computer is the cluster crashes, it doesn't affect the other computers or the data in the cluster. This behavior doesn't exist in the Master/Slave replication. If the Master goes down, the replication stops.

Clustering is a little more complicated, but it has huge advantages and will most likely be used a lot by many companies. It is fairly new technology, so in my opinion, it is good time to start testing it. For small websites which love to be on the bleeding edge of technology and you don't mind being risky, go ahead and use MySQL Clustering. I am a little cautious about using it for heavy performance database needs, but I am anxious to at least try! You should be warned that there are a lot of limitations with MySQL Clustering. For example, your database can only be as big as the free RAM you have. Also, I have had a lot of problems getting more than just a two storage node cluster working.

The Python script and Python Module MySQL.py

The Python Script "Compile_MySQL.py" is meant to access the MySQL.py module to install, configure, and get MySQL running. I tried to make it as simple as possible so that a lot the options you want can be specified at the command line.

The MySQL.py module currently only has one class called "Installer" which is mostly finished. It has other classes I am working on, but nothing worth talking about yet. I included a lot of stuff in the module like:

  1. I broke down the main modules to approximate the steps you would use to install MySQL. In addition, I separated out the compiling from the configuring because I wanted you to be able to skip the compiling if you already installed MySQL once.
  2. Use of the try/except statements.
  3. The first string of the method(s) is used for automatic documentation extraction.
  4. The concept of classes.
  5. The init and del methods which are executed at creation and destruction of an object.
  6. Other internal methods to handle printing an object or when you try to compare this object against other objects.
  7. Escaping to shell to execute code, get its return status, and output data.
  8. Regular expressions and using them repeatedly. It's only necessary to create a regular expression once, so that you don't waste resources recreating regular expressions.
  9. Storing variables in an object.
  10. Line command arguments are passed.
  11. The script only creates one set of MySQL binaries. It uses different directories and ports for each instance of the MySQL server.
Personally, I think the module could use a lot of work. I see repetitive code that could be converted into methods, there need to be more command line options, and there needs to be a place to save the configuration for later runs (using xml) if we want to use this as an admin tool.

Here is sample script to use my MySQL module. Note, if you already ran this script and got mysql working, and you want to reconfigure it, then comment out "M_Obj.Execute_Compile_Script()" to reconfigure mysql. I tried to comment each below in the script.

#!/usr/bin/python
   ### Just a bunch of standard modules I load. 
import urllib2, urllib, string, re, os, struct
import base64, string, gzip, sys, time, commands
import getopt

   ### The module I created. 
import MySQL

   ## Initialize our installation object. 
M_Obj = MySQL.Installer()

#--------------------------------------------------------
### Build and install the mysql binaries. These binaries will be common
### with all the mysql services.

M_Obj.Write_Compile_Script()   ### This creates out bash compile script.
M_Obj.Execute_Compile_Script() ### This executes our bash script.
                               ### This creates one set of binaries for
                               ### all mysql instances. 

M_Obj.Stop_Instances()   ### In case there are any running.
M_Obj.Setup_Instances()  ### Setup and initialize the databases. 
M_Obj.Start_Instances()  ### Start the databases. 

Executing the Python script

Download these files:
  1. MySQL.py and save it as MySQL.py.
  2. Compile_MySQL.py and save it as Compile_MySQL.py.
  3. Config files. Save all the config files in this directory (or grab this tarball containing all of them) into a directory called "Config" where you execute Compile_MySQL.py.
Then execute this command:
python Compile_MySQL.py  

This will download and install MySQL. It will make a log file called "/tmp/mysql_install.log". If it doesn't download MySQL, download mysql-4.1.7.tar.gz manually from mysql.com and save it in the directory "/usr/local/src/mysql_compile".

NOTE: If you want to specify another database server, then execute this, substituting the appropriate URL:

python Compile_MySQL.py -d https://www.signal42.com/mirrors/mysql/Downloads/MySQL-5.0/mysql-5.0.2-alpha.tar.gz

Some commands to run

So, now we got the database server installed. Let us do some stuff to verify the Master/Slaves are working as well as the MySQL Cluster. I made things really easy for you (assuming the installation was good). I have made a bunch of scripts for you.

Here are a bunch of scripts in the home directory where mysql got installed.

ScriptUsagePurpose
bash_aliasessource bash_aliases This setups aliases to connect to each service. The aliases connect to the master service, slave services, ndb_mgm, and each cluster mysqld service.
Start./StartStarts the master, slaves, and cluster.
Stop./Stop Stops the master, slaves, and cluster.

The scripts listed below are created in the "scripts" directory where mysql got installed. If you did a "source bash_aliases", these commands will be in the bash path. You can execute these scripts as many times as you like.

ScriptPurpose
Master_Status.bash Gets the status of the master service.
Slave_Status.bash Gets the status of the slave services.
Master_Slave_Status.bash Gets the status of the master and slaves.
Master_Slave_Test.bash Inserts data and then the master and slaves perform the same sql query which should show the same results.
Cluster_Status.bash Gets the status of the cluster
Cluster_Test.bash Inserts data into the cluster. It shuts down each storage node and sees if each mysqld server can still access the data.

Try to learn what the scripts do step by step. If you can fully understand what those scripts are doing, then you know at least the basics about how to manage the mysql services.

Follow these steps:

cd /usr/local/mysql-cluster    ### Change to the mysql directory. 
source bash_aliases            ### Load some aliases into the bash shell.
./Start                        ### Start mysql master, slaves, cluster.

Cluster_Status.bash	       ### Get the status of the cluster
Master_Status.bash             ### Get the status of the master service.
Slave_Status.bash              ### Get the status of the slave service.

Master_Slave_Test.bash         ### Test the master/slave services.
Cluster_Test.bash              ### Test the cluster.

Next Month

I am probably going to use MySQL 5.0 at home from now on. So, I am probably not going to test MySQL 4.1.x anymore. Saying that, possible next months topics include:
  1. MySQL failover for Master/Slave cluster. The master has gone bad and we want to make a slave the new master. There are lots of issue to consider when doing this, like data corruption.
  2. Stored procedures.
  3. More cluster configurations.
  4. Scripts to check if your MySQL Master/Slave/Cluster computers are running smoothly.
  5. Maybe check out all the new tools you can download from MySQL?

Conclusion

The Python module was sort of an overkill. However, I wanted to make it really easy to compile and install MySQL every time there is an upgrade. I am going to use Python for everything I do and I plan on using it for all SysAdmin needs in the future (if I can help it). Every time there is a new version of Python, I am so impressed because it just gets easier, simpler, and more powerful. In Python 2.4 (which just came out), sets are really cool for intersection and union calculations.

MySQL is coming along nicely. With all the new features being put into MySQL, it can compete against some of the larger database companies. It is important to learn MySQL because new job opportunities are emerging and MySQL will be a hot item over the next few years (before the masses learn it). The Python module and the script I wrote make it really easy to get a fairly complicated set of MySQL instances installed on your computer. To use it in production, just copy over the binaries and the config files and make a few changes to the config files and you are done! Once you see how Master/Slave/Cluster configurations work, it really becomes simple to understand how it all works.

Afterword:
Get your certifications.

In the last few months, I believe the MySQL Professional Certification paid off. Most employers don't want just a DBA anymore. PostgreSQL and MySQL make database servers easy to use for most people. No longer do you have to worship the DBA from hell because he/she knows all the little tricks. All the docs and examples are online and you can fool around with the source code yourself. This is going to put pressure on DBAs to be more than just DBAs. I believe the MySQL jobs I applied for really liked my programming and Sys Admin skills. I know for a fact that one company used my MySQL Professional Certification as leverage to interview me because they could claim "he is certified". They really wanted a programmer, but because of politics, they couldn't directly get a programmer, so my certification let me slip in. Cool, huh? A lot of certifications really don't tell you whether someone is good or not. However, the LPI and MySQL Certifications have two things going for them. First, they are actually reasonable certifications. Most certifications are meant to just make money for the company, but LPI and MySQL don't seem to be like that; they really want you to be qualified. Second, managers and HR like certifications because it protects them. I know most techies hate certifications, but because of politics, you really need to get certified to fight against the machine. LPI and MySQL Certifications are fairly inexpensive, unlike a lot of the other stupid certifications, so it really isn't that bad.

 


[BIO] Mark Nielsen was enjoying his work at cnet.com as a MySQL DBA, but is moving to Google as a MySQL DBA. During his spare time, he uses Python heavily for mathematical and web projects.

Copyright © 2005, Mark Nielsen. Released under the Open Publication license unless otherwise noted in the body of the article. Linux Gazette is not produced, sponsored, or endorsed by its prior host, SSC, Inc.

Published in Issue 110 of Linux Gazette, January 2005

<-- prev | next -->
Tux