How to Build a Parallel Computing Cluster

This HOWTO describes how to build a diskless parallel computing cluster for computational physics. We first give a brief overview of the Linux operating system, in the hope that this background gives readers a better understanding of the whole setup procedure. The HOWTO is divided into the following sections.

Introduction
Overview of the Linux operating system
The detailed procedure for building the whole cluster
Reference

Introduction

Computer simulation is a powerful tool for the study of complex systems. With ample computational power and realistic animation, the traditional trial-and-error process of investigation can be sped up: parameters are changed through an interactive interface, and the simulated results are understood through computer graphics or animation. Simulation also gives investigators an opportunity to look at processes that are difficult to observe or too expensive to carry out in practice.

To solve a complex problem correctly and efficiently, computational physics has to address several concerns. One of them is how to speed up the simulation. One solution is to use or buy an expensive parallel supercomputer, which is not practical for most research groups. Thanks to advances in PC technology, a parallel computing cluster can be built entirely from commodity hardware and software on an affordable budget.

The original PC cluster project, known as the Beowulf project, was started at the Center of Excellence in Space Data and Information Sciences (CESDIS) at NASA in early 1994. A Beowulf system usually consists of one master (server) node and one or more client nodes connected via Ethernet. The master node controls the whole cluster and serves files to the client nodes. It is also the cluster's console and its gateway to the outside Internet.

The advantages of a Beowulf-like cluster are:

Hardware is available from multiple sources, which means low prices and easy maintenance.
Software, both the operating system (Linux) and the parallel programming packages (MPI, PVM, etc.), is available from the Internet community.
The software is usually based on computer industry standards.
Source code is freely available to everyone under the GNU General Public License, which means it can be modified and improved to suit individual needs.
A huge number of free documents and tutorials on building a Beowulf cluster can be found on the Internet.
In terms of performance-to-price ratio, it is very inexpensive.

Last summer, we built a Beowulf-like parallel computing cluster to test this idea. The cluster includes one master node, one NFS and NFS-root server node, and several diskless client nodes. All of the nodes are connected to switch hubs by Ethernet to provide the parallel computing capability. The hardware configuration of the cluster is shown in Figure 1. The detailed specification of the cluster is as follows:

Hardware Configuration

One Master Node:
Two Pentium III 1 GHz CPUs, 512 MB RAM, three 3Com 3c905c Ethernet cards, one 30 GB hard disk, one floppy drive, one VGA card, one monitor
One NFS and NFS-Root Server Node:
Two Pentium III 1 GHz CPUs, 512 MB RAM, two 3Com 3c905c Ethernet cards, one 30 GB hard disk, one floppy drive, one VGA card, one monitor
11 Client Nodes:
One Pentium III 1 GHz CPU, 512 MB RAM, two 3Com 3c905c Ethernet cards, no hard disk, one floppy drive, one VGA card (for debugging)
2 Hubs:
D-Link DES-1016R, D-Link DFE-916DX

Software Configuration

Operating system: RedHat 6.2
Network booting: etherboot-4.0
Parallel computing: Message Passing Interface mpich-1.2.1
Display: X window library, OpenGL library, Tcl/Tk

In this article, we would like to share our experience of building the cluster. It is difficult to give a precise step-by-step procedure for readers to follow, for a simple reason: Linux evolves so quickly that today's setup might be obsolete tomorrow. Instead of a detailed description, we give general guidelines, the basic principles, and pointers to additional information on the Internet. Our experience strongly suggests that some knowledge of Linux is necessary to build your own cluster successfully. Only with this knowledge will you have enough confidence to solve problems when something goes wrong. After all, most readers of this article are physicists, and physicists do not like doing things blindly.

Linux Operating System

The main challenge in building a diskless-client PC cluster is how to boot the kernel and how to mount the root file system from a remote server. Since a client node has no hard disk to hold its kernel and file system, both must be provided by a server over the network.

The server node needs a full installation of RedHat Linux OS.

The server must prepare a simplified kernel image for client nodes to download.

The server must prepare a minimum root file system for client nodes to mount.

The client node must have a network boot disk, so that it can boot from its floppy drive and then request the kernel image over the network.

The client node finds a kernel image on the server, downloads it, uncompresses it, and executes it.

The client kernel then mounts its root file system as an NFS-root file system.

Based on the above analysis, we need to know:

What is the Linux kernel?
How is a kernel image built for diskless clients?
How is a root filesystem built for diskless clients?
What is the boot procedure of Linux?
- Booting the server kernel.
- Booting the client kernel.

What is Linux Kernel?

The purpose of the Linux kernel is to insulate users from the complexity of the hardware. It provides a set of system calls so that users do not have to deal with hardware details. For example, to read a file from the hard disk, a program simply issues a read system call; the kernel handles details such as moving the disk's read/write arm to the correct (track, sector) position and returns the contents of the file. From this you can see what a nightmare it would be to work with a computer system that has no kernel.

The Linux kernel is a multi-process, multi-user system. It contains several components, such as process management, memory management, filesystems, device control, and networking. It responds to users' requests by allocating CPU, RAM, I/O devices, and networking resources in a fair way. In short, the kernel is a big chunk of executable code in charge of handling all such requests. For the system to become functional, the first step is to load and execute the kernel.

The kernel can be big or small, depending entirely on the application. For example, if you don't need PCMCIA, you don't need to include it in the kernel. In general, a bigger kernel provides more services but consumes more memory and CPU time, which can slow the system down. The server kernel is fairly big because we ask it to do many things. The client kernel is comparatively small because it simply executes the programs assigned by the server.

Build a Kernel Image

The procedure for building a kernel is described in the README file that comes with the kernel source and in the Kernel-HOWTO.

Installing the kernel:

If you install the full sources, do a
```
	cd /usr/src
	gzip -cd linux-2.2.XX.tar.gz | tar xvf -
```
to put everything in place. The options -cd tell gzip to decompress the file and send the output to stdout. Replace "XX" with the version number of the latest kernel.
Make sure you have no stale .o files and dependencies lying around:
```
	cd /usr/src/linux
	make mrproper
```

Configuring the kernel:

Do a make config to configure the basic kernel. make config needs bash to work: it will search for bash in $BASH, /bin/bash and /bin/sh (in that order), so one of those must be correct for it to work. To see further information about the kernel configuration, see Documentation/Configure.help.
Do not skip this step even if you are only upgrading one minor version.
Alternate configuration commands are:
- make menuconfig
- make xconfig
- make oldconfig : Default all questions based on the contents of your existing ./.config file.
Check the top-level Makefile for further site-dependent configuration.
Finally, do a make dep to set up all the dependencies correctly.

Compiling the Kernel:

Do a make zImage or make bzImage to create a compressed kernel image.

If you configured any of the parts of the kernel as modules, you will have to do make modules followed by make modules_install. Read Documentation/modules.txt for more information.
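
The build steps above can be collected into one script. The following is a minimal sketch, assuming the source tree is in /usr/src/linux as above. The names `kernel_build_steps`, `run`, `SRC`, and `DRY_RUN` are illustrative and not part of the kernel tooling. By default the script only prints the commands, so the sequence can be reviewed before it is run as root with DRY_RUN=0.

```shell
#!/bin/sh
# Sketch: print (or run) the 2.2-series kernel build sequence.
# DRY_RUN, SRC, run and kernel_build_steps are hypothetical names.
DRY_RUN=${DRY_RUN:-1}
SRC=${SRC:-/usr/src/linux}

run() {
    # Print the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@" || exit 1
    fi
}

kernel_build_steps() {
    run cd "$SRC"
    run make mrproper     # remove stale .o files and dependencies
    run make menuconfig   # or: make config / make xconfig / make oldconfig
    run make dep          # set up the dependencies
    run make bzImage      # compressed image under arch/i386/boot
}

kernel_build_steps
```

If parts of the kernel are configured as modules, make modules and make modules_install would follow as described above.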

Building a root filesystem

Besides the kernel, you also need a root file system to host programs, configurations, and data. Creating the root filesystem involves selecting the files necessary for the system to run.

A root filesystem must contain everything needed to support a full Linux system. To be able to do this, the disk must include the minimum requirements for a Linux system:

The basic filesystem structure,
Minimum set of directories: /dev, /proc, /bin, /etc, /lib, /usr, /tmp,
Basic set of utilities: sh, ls, cp, mv, etc.,
Minimum set of config files: rc, inittab, fstab, etc.,
Devices: /dev/hd*, /dev/tty*, /dev/fd0, etc.,
Runtime library to provide basic functions used by utilities.

In order to build such a root filesystem, you need a spare device that is large enough to hold all the files before compression. There are several choices; here we use a ramdisk (DEVICE=/dev/ram0), in which memory is used to simulate a disk drive. To learn how to use a ramdisk, see the article "How to Use a Ramdisk for Linux".

Prepare the DEVICE with:

dd if=/dev/zero of=/dev/ram0 bs=1k count=4096

This command zeros out the device. Zeroing the device is critical because the filesystem will be compressed later, so all unused portions should be filled with zeros to achieve maximum compression.
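
The effect of zeroing can be seen without touching a real device. The sketch below uses ordinary temporary files in place of /dev/ram0 (an assumption made so that it runs without root). It compresses a zero-filled 4096 KB image and a random-filled one of the same size, then prints both compressed sizes; the zero-filled image should come out far smaller.

```shell
#!/bin/sh
# Sketch: compare how well zero-filled and random-filled space compresses.
# Temporary files stand in for /dev/ram0 so that no root access is needed.
tmp=$(mktemp -d)

dd if=/dev/zero    of="$tmp/zero.img"   bs=1k count=4096 2>/dev/null
dd if=/dev/urandom of="$tmp/random.img" bs=1k count=4096 2>/dev/null

# Compressed size in bytes of each image.
zero_size=$(gzip -9 -c "$tmp/zero.img"   | wc -c | tr -d ' ')
rand_size=$(gzip -9 -c "$tmp/random.img" | wc -c | tr -d ' ')

echo "zero-filled image compresses to   $zero_size bytes"
echo "random-filled image compresses to $rand_size bytes"

rm -rf "$tmp"
```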

Next, create the filesystem.

mke2fs -m 0 -N 2000 /dev/ram0

Next, make a mounting point and mount the device.

mkdir -p /tmp/ramdisk 
mount -t ext2 /dev/ram0 /tmp/ramdisk

Populating the filesystem

Here is a reasonable minimum set of directories for your root filesystem.

/dev -- Device files, required to perform I/O
/proc -- Directory stub required by the proc filesystem
/etc -- System configuration files
/sbin -- Critical system binaries
/bin -- Essential binaries considered part of the system
/mnt -- A mount point for maintenance on other disks
/usr -- Additional utilities and applications

First, create the directories listed above.

cd /tmp/ramdisk
mkdir dev proc etc sbin bin mnt usr usr/lib
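
The directory skeleton can also be created and checked in one pass. This is a minimal sketch: `ROOT`, `make_skeleton`, and `check_skeleton` are hypothetical names, and `ROOT` defaults to a scratch directory from mktemp so the script is safe to try. For a real build it would be set to /tmp/ramdisk.

```shell
#!/bin/sh
# Sketch: create the minimal directory skeleton and verify it.
# ROOT is a hypothetical variable; use ROOT=/tmp/ramdisk for a real build.
ROOT=${ROOT:-$(mktemp -d)}
DIRS="dev proc etc sbin bin mnt usr usr/lib"

make_skeleton() {
    for d in $DIRS; do
        mkdir -p "$ROOT/$d" || return 1
    done
}

check_skeleton() {
    # Print every directory that is missing; succeed only if none are.
    missing=0
    for d in $DIRS; do
        if [ ! -d "$ROOT/$d" ]; then
            echo "missing: $ROOT/$d"
            missing=1
        fi
    done
    return $missing
}

make_skeleton && check_skeleton && echo "skeleton complete in $ROOT"
```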

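To populate /dev, copy the floppy and terminal device files from the server and create the console device:

cp -dpR /dev/fd[01]* /tmp/ramdisk/dev
cp -dpR /dev/tty[0-6] /tmp/ramdisk/dev

mknod /tmp/ramdisk/dev/console c 5 1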

For the detailed contents of the root filesystem, see ramdisk.tar.

Finally, after you set up all the libraries you need, run ldconfig to remake /etc/ld.so.cache on the root filesystem. The cache tells the loader where to find the libraries. You can do this with

ldconfig -r /tmp/ramdisk
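
Which libraries have to be copied depends on the utilities chosen. One way to find out is to parse the output of ldd for each binary. The sketch below is an assumption about workflow, not a record of how our root filesystem was built; `libs_from_ldd` is a hypothetical helper that reads ldd output on standard input and prints the library paths it finds.

```shell
#!/bin/sh
# Sketch: extract library paths from ldd output read on standard input.
# libs_from_ldd is a hypothetical helper name.
libs_from_ldd() {
    # Typical lines:
    #   libc.so.6 => /lib/libc.so.6 (0x4001a000)
    #   /lib/ld-linux.so.2 (0x40000000)
    LC_ALL=C awk '
        $2 == "=>" && $3 ~ /^\// { print $3; next }   # name => /path
        $1 ~ /^\//               { print $1 }         # bare /path (loader)
    ' | LC_ALL=C sort -u
}

# Example use against a real binary (the output depends on the system):
#   ldd /bin/ls | libs_from_ldd
# Each path printed would be copied under /tmp/ramdisk before running
# "ldconfig -r /tmp/ramdisk".
```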

When you have finished constructing the root filesystem, unmount it, copy it to a file and compress it:

umount /tmp/ramdisk
dd if=/dev/ram0 bs=1k | gzip -v9 > rootfs.gz

Transferring the root filesystem

dd if=rootfs.gz of=/dev/fd0 bs=1k seek=KERNEL_BLOCK
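
KERNEL_BLOCK in the command above is the offset, in 1 KB blocks, at which the compressed root filesystem starts on the diskette; it has to lie past the end of the kernel image written at the start of the disk. A minimal sketch for computing it, assuming the kernel was written from block 0 with bs=1k, is to round the kernel size up to whole blocks. `blocks_1k` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: number of 1 KB blocks needed to hold a file, rounded up.
# blocks_1k is a hypothetical helper name.
blocks_1k() {
    size=$(wc -c < "$1" | tr -d ' ')
    echo $(( (size + 1023) / 1024 ))
}

# Example (the paths are placeholders):
#   KERNEL_BLOCK=$(blocks_1k /usr/src/linux/arch/i386/boot/bzImage)
#   dd if=rootfs.gz of=/dev/fd0 bs=1k seek=$KERNEL_BLOCK
```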

The Booting Procedure of Linux OS

All PC systems start the boot process by executing code in ROM (specifically, the BIOS) to load the first sector (sector 0, cylinder 0) of the boot drive. The boot drive is usually the first floppy drive (/dev/fd0) or the first hard disk (/dev/hda). The BIOS then tries to execute this sector. On most bootable disks, sector 0, cylinder 0 contains either:

code from a boot loader such as LILO, which locates the kernel, loads it and executes it to start the boot proper; or
the start of an operating system kernel, such as Linux.

When the kernel is completely loaded, it initializes device drivers and its internal data structures. Once it is completely initialized, it consults a special location in its image called the ramdisk word. This word tells it how and where to find its root filesystem. A root filesystem is simply a filesystem that will be mounted as '/'. The kernel has to be told where to look for the root filesystem; if it cannot find a loadable image there, it halts.

In some boot situations, often when booting from a diskette, the root filesystem is loaded into a ramdisk, which is RAM accessed by the system as if it were a disk. The kernel can also load a compressed filesystem from the floppy and uncompress it onto the ramdisk, allowing many more files to be squeezed onto the diskette.

Once the root filesystem is loaded and mounted, you see a message like:

VFS: Mounted root (ext2 filesystem) readonly.

Once the system has loaded a root filesystem successfully, it tries to execute the init program (in /bin or /sbin). init reads its configuration file /etc/inittab, looks for a line designated sysinit (for example, /etc/rc.d/rc.sysinit), and executes the named script. This script is a set of shell commands that set up basic system services, such as running fsck on hard disks, loading necessary kernel modules, initializing swapping, initializing the network, and mounting the disks mentioned in /etc/fstab.

The script often invokes various other scripts to do modular initialization. For example, in the common SysVinit structure, the directory /etc/rc.d contains a complex structure of subdirectories whose files specify how to enable and shut down most system services. However, on a bootdisk the sysinit script is often very simple.

When the sysinit script finishes, control returns to init, which then enters the default runlevel, specified in /etc/inittab with the initdefault keyword.
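
To see which runlevel init will enter, the initdefault line can be read straight from the file. The sketch below relies on the inittab line format id:runlevels:action:process; `default_runlevel` is a hypothetical helper name, and the sample inittab is written to a temporary file for illustration only.

```shell
#!/bin/sh
# Sketch: print the default runlevel recorded in an inittab file.
# default_runlevel is a hypothetical helper name.
default_runlevel() {
    # Fields are id:runlevels:action:process; skip comment lines.
    awk -F: '!/^#/ && $3 == "initdefault" { print $2; exit }' "$1"
}

# Example with a small sample inittab written to a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
# Default runlevel
id:3:initdefault:
si::sysinit:/etc/rc.d/rc.sysinit
l3:3:wait:/etc/rc.d/rc 3
EOF
default_runlevel "$sample"   # prints 3
rm -f "$sample"
```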

Detailed Procedure for Building a Diskless PC Cluster

The procedure for building a PC cluster can be divided into two parts.

NFS and NFS-Root server setup.
client node setup.

NFS and NFS-Root server setup

Setting up the server is straightforward. There are several ways to install it, either from the installation disk (provided by RedHat) or over the network. For simplicity, let's assume you have a full installation of RedHat 6.2 on your server computer. If the installation went correctly, you should have a fully functional Linux system with a network connection.

Next, prepare a network boot disk for the client computers as follows:

Network booting from a floppy:

	   Download etherboot-4.0 and etherboot-4.7.24 from
	   http://www.slug.org.au/etherboot. Take the file floppyload.bin
	   from etherboot-4.0/bin and the file 3c905c-tpo.lzrom from
	   etherboot-4.7.24/src/bin32, then enter the following command as
	   superuser to make a network boot floppy:

	   # cat floppyload.bin 3c905c-tpo.lzrom > /dev/fd0

	   Note: To get 3c905c-tpo.lzrom, you must run make in
		 etherboot-4.7.24/src. For details, please see the INSTALL
		 instructions.
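
Before writing to the floppy, it can help to build the image as an ordinary file and confirm that it fits. The sketch below assumes a 1.44 MB diskette (the capacity is our assumption here); `make_netboot_image` and `FLOPPY_BYTES` are hypothetical names.

```shell
#!/bin/sh
# Sketch: build the netboot image as a file and check that it fits.
# FLOPPY_BYTES assumes a 1.44 MB diskette (2880 sectors of 512 bytes).
FLOPPY_BYTES=1474560

make_netboot_image() {
    # $1 = loader, $2 = ROM image, $3 = output file
    cat "$1" "$2" > "$3" || return 1
    size=$(wc -c < "$3" | tr -d ' ')
    if [ "$size" -gt "$FLOPPY_BYTES" ]; then
        echo "image is $size bytes, too large for the diskette" >&2
        return 1
    fi
    echo "image is $size bytes"
}

# Example (run as root to write the result to the floppy):
#   make_netboot_image floppyload.bin 3c905c-tpo.lzrom netboot.img &&
#       cat netboot.img > /dev/fd0
```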

Procedure for setting up the dhcp server

	  1. Prepare an /etc/dhcpd.conf file such as

-----------------------------------------------------------------------------
# Sample configuration file for ISC dhcpd
#
# Don't forget to set run_dhcpd=1 in /etc/init.d/dhcpd
# once you adjusted this file and copied it to /etc/dhcpd.conf.
#

default-lease-time            21600;
max-lease-time                21600;

option subnet-mask            255.255.255.0;
option broadcast-address      192.168.0.255;

shared-network WORKSTATIONS {
    subnet 192.168.0.0 netmask 255.255.255.0 {
    }
}
group   {
    use-host-decl-names       on;
    option log-servers        192.168.0.254;

    host pc1 {
        hardware ethernet     00:01:02:92:70:69;
        fixed-address         192.168.0.1;
        filename              "/tftpboot/pc1/vmlinuz.3c905nomodPc1";
    }
    host pc2 {
        hardware ethernet     00:01:02:91:43:0F;
        fixed-address         192.168.0.2;
        filename              "/tftpboot/pc2/vmlinuz.3c905nomodPc2";
    }
    host pc3 {
        hardware ethernet     00:01:02:92:70:18;
        fixed-address         192.168.0.3;
        filename              "/tftpboot/pc3/vmlinuz.3c905nomodPc3";
    }
    host pc4 {
        hardware ethernet     00:01:02:91:43:45;
        fixed-address         192.168.0.4;
        filename              "/tftpboot/pc4/vmlinuz.3c905nomodPc4";
    }
}
-----------------------------------------------------------------------------

	2. Edit the /etc/rc.d/init.d/dhcpd script file, find the line

        daemon /usr/sbin/dhcpd

	and change it to

        daemon /usr/sbin/dhcpd eth1

	(to serve on eth0, leave the line unchanged).

	3. Check whether the file /var/state/dhcp/dhcpd.leases exists. If not, run

	touch /var/state/dhcp/dhcpd.leases

	to create it.

	4. Add a soft link in /etc/rc.d/rc3.d

	ln -s ../init.d/dhcpd S65dhcpd

	Now you can test the DHCP server by putting the netboot floppy in
	a client PC and turning the power on.
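
With eleven clients, typing the host entries of /etc/dhcpd.conf by hand is error-prone. The sketch below prints one entry in the same layout as the sample configuration above. `dhcp_host_entry` is a hypothetical helper, and the MAC address in the example call is a placeholder, not one of our cards.

```shell
#!/bin/sh
# Sketch: print a dhcpd.conf host entry for one client.
# dhcp_host_entry is a hypothetical helper; the address and filename
# patterns follow the sample configuration above.
dhcp_host_entry() {
    # $1 = node number, $2 = MAC address of the client's boot interface
    n=$1
    mac=$2
    printf '    host pc%s {\n' "$n"
    printf '        hardware ethernet     %s;\n' "$mac"
    printf '        fixed-address         192.168.0.%s;\n' "$n"
    printf '        filename              "/tftpboot/pc%s/vmlinuz.3c905nomodPc%s";\n' "$n" "$n"
    printf '    }\n'
}

dhcp_host_entry 5 00:01:02:AA:BB:CC   # placeholder MAC address
```

The output would be pasted inside the group block of /etc/dhcpd.conf, one entry per client.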

Procedure for setting up tftp server


	1. Check the /etc/services to make sure the following line exists
	   tftp            69/udp
	
	2. Check /etc/inetd.conf to make sure the following line is not
	   commented out.

	tftp    dgram   udp     wait    root    /usr/sbin/tcpd  in.tftpd

	3. Restart inetd so that it reads the new configuration files.

	4. The tftp daemon is invoked by inetd through tcpd, so make sure
	   /etc/hosts.allow contains the following line

	ALL: 192.168.0.

	or more specifically the following lines

	#bootpd:    0.0.0.0 (for bootpd uncomment this line)
	in.tftpd:  192.168.0.
	portmap:   192.168.0.

	5. Add the host names to /etc/hosts; they must be consistent with
	   the contents of /etc/dhcpd.conf

-------------------------------------------------------------------------
192.168.0.1             pc1
192.168.0.2             pc2
192.168.0.3             pc3
192.168.0.4             pc4
-------------------------------------------------------------------------
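
Since /etc/hosts has to agree with /etc/dhcpd.conf, a small check can catch mismatches before the first boot attempt. The following is a minimal sketch that assumes each host entry in dhcpd.conf has the host and fixed-address keywords on separate lines, as in the sample above. `dhcp_pairs` and `check_hosts` are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: check that every host in dhcpd.conf has a matching hosts line.
# dhcp_pairs and check_hosts are hypothetical helper names.
dhcp_pairs() {
    # Print "IP NAME" for each host entry with a fixed-address line.
    awk '
        $1 == "host"          { name = $2 }
        $1 == "fixed-address" { ip = $2; sub(/;$/, "", ip); print ip, name }
    ' "$1"
}

check_hosts() {
    # $1 = dhcpd.conf, $2 = hosts file; list pairs absent from hosts.
    missing=$(dhcp_pairs "$1" | while read -r ip name; do
        awk -v ip="$ip" -v name="$name" '
            $1 == ip { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
            END      { exit !found }
        ' "$2" || echo "$ip $name"
    done)
    if [ -n "$missing" ]; then
        echo "not in $2:"
        echo "$missing"
        return 1
    fi
    echo "hosts file matches dhcpd.conf"
}

# Example: check_hosts /etc/dhcpd.conf /etc/hosts
```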

Create the pc1, pc2, pc3, and pc4 client root directories under the server's /tftpboot directory with the following commands.
```
	mkdir -p /tftpboot/pc1
	mkdir -p /tftpboot/pc2
	mkdir -p /tftpboot/pc3
	mkdir -p /tftpboot/pc4
```

Prepare the kernel for the client nodes. When configuring the kernel, make sure you specify the following.

	* No module support (for simplicity).
	* Support for your specific network card, for example, 3com 3c905c.
	* RAM disk support.
	* BOOTP support.
	* /proc filesystem support.
	* NFS filesystem support.
	* Root file system on NFS
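
After configuration, the resulting .config file can be checked against this list. The sketch below is hedged: the option names are as we recall them from 2.2-series kernels and should be verified against Documentation/Configure.help for your version; in particular, CONFIG_VORTEX is assumed to be the option for the 3c905 family. `check_kernel_config` and `REQUIRED` are hypothetical names.

```shell
#!/bin/sh
# Sketch: report required options that are not built into a kernel .config.
# Option names are as recalled from 2.2-series kernels (an assumption);
# verify them against Documentation/Configure.help.
REQUIRED="CONFIG_VORTEX CONFIG_BLK_DEV_RAM CONFIG_IP_PNP_BOOTP CONFIG_PROC_FS CONFIG_NFS_FS CONFIG_ROOT_NFS"

check_kernel_config() {
    # $1 = path to .config; returns non-zero if anything is wrong.
    bad=0
    for opt in $REQUIRED; do
        if ! grep -q "^$opt=y\$" "$1"; then
            echo "not built in: $opt"
            bad=1
        fi
    done
    # Module support should be off for the simple client kernel.
    if grep -q '^CONFIG_MODULES=y$' "$1"; then
        echo "module support is enabled: CONFIG_MODULES"
        bad=1
    fi
    return $bad
}

# Example: check_kernel_config /usr/src/linux/.config
```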

After you build a new kernel image from the kernel source (bzImage, for example), run the following command to make a network-bootable kernel (/usr/local/bin/mkNetKernelA.bat).
```
	# usage: mkNetKernelA.bat N  (N is the client number, e.g. 1 for pc1)
	./mknbi-linux --rootdir=/tftpboot/pc$1/pc$1root \
	    /usr/src/linux/arch/i386/boot/bzImage \
	    > /tftpboot/pc$1/vmlinuz.3c905nomodPc$1
```

Prepare root filesystem for each client.

	Copy the server root filesystem to /tftpboot/pc1 and delete any
	unnecessary files or packages to reduce the size of the client root
	filesystem. Modify the network setup, NFS setup, and other settings
	in the /tftpboot/pc1/etc directory.
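
For a client to mount its root directory over NFS, the server also has to export that directory. This step is not spelled out above, so the following is only a sketch under our own assumptions: it prints /etc/exports lines of the usual form for clients pc1 to pcN, using /tftpboot/pcN as the exported path. If the --rootdir value given to mknbi-linux differs, the path must be adjusted to match it. `export_lines` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: print /etc/exports lines for the client root directories.
# The path /tftpboot/pcN is taken from the text above; it must match the
# --rootdir value passed to mknbi-linux.
export_lines() {
    # $1 = first node number, $2 = last node number
    i=$1
    while [ "$i" -le "$2" ]; do
        echo "/tftpboot/pc$i pc$i(rw,no_root_squash)"
        i=$((i + 1))
    done
}

export_lines 1 4
```

After the lines are added to /etc/exports on the server, the NFS service has to reread the file before the clients can mount their roots.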

Reference

Linux Documentation Project (http://www.linuxdoc.org)
Contains many HOWTOs on various aspects of Linux. Further details on the topics of this article can be found here.
The Beowulf Project (http://www.beowulf.org)
The Beowulf Project official site.
Etherboot (http://www.slug.org.au/etherboot)
Boot a kernel image over an Ethernet network.