How to Build a Parallel Computing Cluster


This howto describes how to build a diskless parallel computing cluster for computational physics. We first give a brief overview of the Linux operating system, in the hope that this background gives readers a better understanding of the whole setup procedure. The howto is divided into the following sections.
  1. Introduction.
  2. Overview of the Linux operating system.
  3. The detailed procedure for building the whole cluster.
  4. Reference

Introduction

Computer simulation is a powerful tool for the study of complex systems. With ample computational power and realistic animation, the traditional trial-and-error process of investigation can be sped up by changing parameters through an interactive interface, and the simulated results become easier to understand through computer graphics or animation. Simulation also gives investigators an opportunity to look at processes that are difficult to observe or too expensive to carry out in reality.

Solving a complex problem correctly and efficiently raises several concerns in computational physics. One of them is how to speed up the simulation. One solution is to use or buy an expensive parallel supercomputer, which is not practical for most research groups. Thanks to advances in PC technology, a parallel computing cluster can now be built entirely from commodity hardware and software within an affordable budget.

The original PC cluster project, also called the Beowulf project, was started at NASA's Center of Excellence in Space Data and Information Sciences (CESDIS) in early 1994. A Beowulf cluster usually consists of one master (server) node and one or more client nodes connected via Ethernet. The master node controls the whole cluster and serves files to the client nodes. The master node is also the cluster's console and its gateway to the outside Internet.

The advantages of a Beowulf-like cluster are:

Last summer we built a Beowulf-like parallel computing cluster to test this idea. The cluster includes one master node, one NFS and NFS-root server node, and several diskless client nodes. All of the nodes are connected through Ethernet switching hubs to provide the parallel computing capability. The hardware configuration of the cluster is shown in Figure 1. The detailed specification of the cluster is listed below:

Hardware Configuration

Software Configuration

In this article, we would like to share our experience of building the cluster. It is difficult to give a precise step-by-step procedure for readers to follow, for a simple reason: Linux evolves so quickly that today's setup may be obsolete tomorrow. Instead of a detailed description, we give general guidelines, the basic principles, and pointers to additional information on the Internet. Our experience strongly suggests that some knowledge of Linux is necessary to build your own cluster successfully. Only with this knowledge will you have enough confidence to try to solve problems when something goes wrong. After all, the majority of the readers of this article are physicists, and physicists don't like doing things blindly.

Linux Operating System

The challenge in building a diskless-client PC cluster lies mainly in how to boot the kernel and how to mount the root filesystem from a remote server. Because a client node has no hard disk to hold its kernel and filesystem, both must be provided by another server over the network.

The server node needs a full installation of RedHat Linux.

The server must prepare a simplified kernel image for the client nodes to download.

The server must prepare a minimal root filesystem for the client nodes to mount.

Each client node must have a network boot disk so that it can boot from its floppy drive and then request the kernel image over the network.

The client node finds a kernel image on the server, downloads it, uncompresses it, and executes it.

The client kernel then mounts its root filesystem as an NFS-root filesystem.

Based on the above analysis, we need to know the following.

What is Linux Kernel?

The purpose of the Linux kernel is to insulate users from the complexity of the hardware. It provides a set of system calls so that users need not deal with hardware details. For example, to read a file from the hard disk, a program simply issues a read system call; the kernel handles details such as moving the disk's read/write arm to the correct (track, sector) position and returns the contents of the file. From this you can see what a nightmare it would be to work with a computer system that has no kernel.

The Linux kernel is a multi-process, multi-user system. It contains several components, such as process management, memory management, filesystems, device control, and networking. It responds to users' requests by allocating CPU time, RAM, I/O devices, and networking resources fairly. In short, the kernel is a large piece of executable code in charge of handling all such requests. Before the system can do anything useful, the kernel must first be loaded and executed.

The size of the kernel depends entirely on the application. For example, if you don't need PCMCIA support, you need not include it in the kernel. In general, a bigger kernel provides more services but consumes more resources, which can slow the system down. The server kernel is fairly big because we ask it to do many things. The client kernel is comparatively small because it simply executes the programs assigned by the server.
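For a diskless client, a few options cannot be left out however small the kernel is made. The following is a minimal sketch of a check for them, assuming the usual option names for NFS-root support (CONFIG_NFS_FS, CONFIG_ROOT_NFS, CONFIG_IP_PNP); the function name and the option list are our own illustration, and the driver for your network card is needed as well.

```shell
#!/bin/sh
# Sketch: report which options a diskless (NFS-root) client kernel is
# likely to need but that are not built into a given kernel .config file.
# The option list is an assumption; check it against the NFS-root
# documentation shipped with your kernel source.

check_nfsroot_config() {
    config_file=$1
    missing=0
    for opt in CONFIG_NFS_FS CONFIG_ROOT_NFS CONFIG_IP_PNP; do
        # The option must be built in (=y). A module (=m) is not enough,
        # because modules live on the root filesystem, which is not
        # mounted yet when these features are needed.
        if grep -q "^${opt}=y" "$config_file"; then
            echo "ok      $opt"
        else
            echo "MISSING $opt"
            missing=1
        fi
    done
    return $missing
}

# Usage (the path is an example):
#   check_nfsroot_config /usr/src/linux/.config
```

The function returns a non-zero status when any option is missing, so it can be used as a guard before compiling the client kernel.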

Build a Kernel Image

The procedure for building a kernel is described in the README file that comes with the kernel source, and in the Kernel-HOWTO.

Installing the kernel:

Configuring the kernel:

Compiling the Kernel:

Do a make zImage or make bzImage to create a compressed kernel image.

If you configured any of the parts of the kernel as modules, you will have to do make modules followed by make modules_install. Read Documentation/modules.txt for more information.
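Put together, the build might look like the sketch below. It assumes a 2.2/2.4-series source tree (where make dep is still required) unpacked in /usr/src/linux; the KSRC and RUN variables and the helper functions are our own illustration. By default the commands are only printed, so the sequence can be inspected before anything is run.

```shell
#!/bin/sh
# Sketch of the usual build sequence for a 2.2/2.4-series kernel tree.
# KSRC is an assumed location for the unpacked kernel source.
# Commands are only printed unless RUN=1 is set in the environment.

KSRC=${KSRC:-/usr/src/linux}

run() {
    echo "+ $*"
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    fi
}

build_kernel() {
    run make -C "$KSRC" menuconfig      || return 1  # choose options interactively
    run make -C "$KSRC" dep             || return 1  # rebuild dependencies
    run make -C "$KSRC" clean           || return 1  # remove stale object files
    run make -C "$KSRC" bzImage         || return 1  # compressed kernel image
    run make -C "$KSRC" modules         || return 1  # only if modules were configured
    run make -C "$KSRC" modules_install || return 1  # needs root
}

build_kernel
```

On i386 the compressed image normally ends up in arch/i386/boot/ under the source tree.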

Building a root filesystem

Besides the kernel, you also need a root filesystem to host programs, configuration files, and data. Creating the root filesystem involves selecting the files necessary for the system to run.

A root filesystem must contain everything needed to support a full Linux system. To be able to do this, the disk must include the minimum requirements for a Linux system:

To build such a root filesystem, you need a spare device large enough to hold all the files before compression. There are several choices; here we use a ramdisk.

Use a ramdisk (DEVICE=/dev/ram0). In this case, memory is used to simulate a disk drive. To learn how to use a ramdisk, see How to Use a Ramdisk for Linux.

Prepare the DEVICE with:

dd if=/dev/zero of=/dev/ram0 bs=1k count=4096

This command zeros out the device. Zeroing the device is critical because the filesystem will be compressed later, so all unused portions should be filled with zeros to achieve maximum compression.

Next, create the filesystem.

mke2fs -m 0 -N 2000 /dev/ram0

Next, make a mounting point and mount the device.

mkdir -p /tmp/ramdisk 
mount -t ext2 /dev/ram0 /tmp/ramdisk
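The same zero-and-format steps can be rehearsed on an ordinary file before touching /dev/ram0. This is a swapped-in variant, not the ramdisk method used in this howto; the image path is a hypothetical scratch location, and the sketch assumes mke2fs accepts a regular file when given -F.

```shell
#!/bin/sh
# Sketch: rehearse the zero-and-format steps on a regular file instead
# of /dev/ram0. IMAGE is a hypothetical scratch path.

IMAGE=${IMAGE:-/tmp/rootfs.img}
SIZE_KB=4096                  # same size as used for the ramdisk

# Fill the image with zeros, exactly as for the ramdisk.
dd if=/dev/zero of="$IMAGE" bs=1k count="$SIZE_KB" 2>/dev/null

# Format it if mke2fs is on the PATH. -F lets mke2fs work on a regular
# file and -q keeps it quiet.
if command -v mke2fs >/dev/null 2>&1; then
    mke2fs -q -F -m 0 -N 2000 "$IMAGE"
fi

# Report the image size in bytes (4096 * 1024).
wc -c < "$IMAGE"
```

Mounting such an image requires root and loop device support, for example mount -o loop /tmp/rootfs.img /tmp/ramdisk.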

Populating the filesystem

Here is a reasonable minimum set of directories for your root filesystem.

First, create the directories listed above.

cd /tmp/ramdisk
mkdir dev proc etc sbin bin mnt usr usr/lib

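To populate /dev, either copy the device nodes from the running system:

cp -dpR /dev/fd[01]* /tmp/ramdisk/dev
cp -dpR /dev/tty[0-6] /tmp/ramdisk/dev

or create them with mknod, for example:

mknod /tmp/ramdisk/dev/console c 5 1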
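The arguments to mknod are the node name, its type (c for character, b for block), and the major and minor device numbers. As an illustration, the sketch below prints mknod commands for a handful of nodes a minimal root filesystem commonly needs. The selection of devices and the function name are our assumption; the numbers are the standard Linux assignments.

```shell
#!/bin/sh
# Sketch: print the mknod commands for a few common device nodes.
# Nothing is created here. To make the nodes, run the printed commands
# as root from inside /tmp/ramdisk/dev.

print_mknod_commands() {
    # Each input line is: name type major minor
    while read -r name type major minor; do
        echo "mknod $name $type $major $minor"
    done <<'EOF'
console c 5 1
null c 1 3
zero c 1 5
tty0 c 4 0
tty1 c 4 1
ram0 b 1 0
fd0 b 2 0
EOF
}

print_mknod_commands
```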

For the detailed contents of the root filesystem, see ramdisk.tar.

Finally, after you set up all the libraries you need, run ldconfig to remake /etc/ld.so.cache on the root filesystem. The cache tells the loader where to find the libraries. You can do this with

ldconfig -r /tmp/ramdisk
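To find out which libraries the chosen programs need in the first place, ldd can be used. The sketch below is one way to reduce ldd output to a plain list of library paths to copy; the helper name is our own, and it assumes the common ldd output format (name => path (address)).

```shell
#!/bin/sh
# Sketch: turn ldd output into a sorted list of library paths.

extract_lib_paths() {
    # Skip the "file:" headers that ldd prints when given several
    # files, then print the first absolute path found on each line.
    awk '/:$/ { next }
         { for (i = 1; i <= NF; i++) if ($i ~ /^\//) { print $i; break } }' |
        sort -u
}

# Usage (the binaries are examples):
#   ldd /tmp/ramdisk/bin/sh /tmp/ramdisk/sbin/init | extract_lib_paths
```

Each path in the list, together with any file a symbolic link points to, can then be copied into the matching directory under /tmp/ramdisk before running ldconfig.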

When you have finished constructing the root filesystem, unmount it, copy it to a file and compress it:

umount /tmp/ramdisk
dd if=/dev/ram0 bs=1k | gzip -v9 > rootfs.gz

Transferring the root filesystem

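Copy the compressed root filesystem to the diskette, starting at the first block after the kernel image (KERNEL_BLOCK, counted in 1k blocks):

dd if=rootfs.gz of=/dev/fd0 bs=1k seek=KERNEL_BLOCK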
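KERNEL_BLOCK in the command above has to be large enough that the root filesystem does not overwrite the kernel already on the diskette. A minimal sketch, assuming the kernel image was written at the start of the diskette with bs=1k, is to round its size up to whole 1k blocks; the function name is our own.

```shell
#!/bin/sh
# Sketch: compute the first free 1k block after a kernel image.

kernel_blocks() {
    size=$(wc -c < "$1")               # image size in bytes
    echo $(( (size + 1023) / 1024 ))   # round up to whole 1k blocks
}

# Usage (paths are examples):
#   KERNEL_BLOCK=$(kernel_blocks /usr/src/linux/arch/i386/boot/bzImage)
#   dd if=rootfs.gz of=/dev/fd0 bs=1k seek="$KERNEL_BLOCK"
```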

The Booting Procedure of Linux OS

All PC systems start the boot process by executing code in ROM (specifically, the BIOS) to load the sector from sector 0, cylinder 0 of the boot drive. The boot drive is usually the first floppy drive (/dev/fd0) or first hard disk (/dev/hda). The BIOS then tries to execute this sector. On most bootable disks, sector 0, cylinder 0 contains either:

When the kernel is completely loaded, it initializes device drivers and its internal data structures. Once it is completely initialized, it consults a special location in its image called the ramdisk word. This word tells it how and where to find its root filesystem. A root filesystem is simply a filesystem that will be mounted as '/'. The kernel has to be told where to look for the root filesystem; if it cannot find a loadable image there, it halts.
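As we understand the ramdisk word layout, bits 0-10 hold the offset of the root filesystem in 1k blocks, bit 14 asks the kernel to load it into a ramdisk, and bit 15 asks for a prompt before loading. Under that assumption the value can be computed as in the sketch below; the function name is our own.

```shell
#!/bin/sh
# Sketch: compute a ramdisk word from a block offset, assuming the
# layout: bits 0-10 offset, bit 14 load flag, bit 15 prompt flag.

ramdisk_word() {
    offset=$1          # root filesystem offset in 1k blocks
    prompt=${2:-0}     # 1 = prompt before loading the root filesystem
    if [ "$offset" -gt 2047 ]; then
        echo "offset does not fit in 11 bits" >&2
        return 1
    fi
    word=$(( 16384 + offset ))          # 16384 = 2^14, the load flag
    if [ "$prompt" = 1 ]; then
        word=$(( word + 32768 ))        # 32768 = 2^15, the prompt flag
    fi
    echo "$word"
}

ramdisk_word 400
```

The resulting value is what gets written into the kernel image on the diskette, traditionally with rdev -r /dev/fd0 followed by the value.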

In some boot situations, often when booting from a diskette, the root filesystem is loaded into a ramdisk, which is RAM accessed by the system as if it were a disk. The kernel can also load a compressed filesystem from the floppy and uncompress it onto the ramdisk, allowing many more files to be squeezed onto the diskette.

Once the root filesystem is loaded and mounted, you see a message like:

VFS: Mounted root (ext2 filesystem) readonly.

Once the system has loaded a root filesystem successfully, it tries to execute the init program (in /bin or /sbin). init reads its configuration file /etc/inittab, looks for a line designated sysinit (/etc/rc.d/rc.sysinit), and executes the named script. This script is a set of shell commands that set up basic system services, such as running fsck on hard disks, loading necessary kernel modules, initializing swapping, initializing the network, and mounting the disks mentioned in /etc/fstab.

The script often invokes various other scripts to do modular initialization. For example, in the common SysVinit structure, the directory /etc/rc.d contains a complex structure of subdirectories whose files specify how to enable and shut down most system services. However, on a bootdisk the sysinit script is often very simple.

When the sysinit script finishes, control returns to init, which then enters the default runlevel, specified in /etc/inittab with the initdefault keyword.
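Each line of /etc/inittab has the form id:runlevels:action:process, so both the sysinit script and the default runlevel can be read straight from the file. The sketch below shows one way to do this for the SysVinit format; the function names are our own.

```shell
#!/bin/sh
# Sketch: read the default runlevel and the sysinit script from an
# inittab file in the SysVinit format id:runlevels:action:process.

default_runlevel() {
    awk -F: '!/^#/ && $3 == "initdefault" { print $2; exit }' "$1"
}

sysinit_script() {
    awk -F: '!/^#/ && $3 == "sysinit" { print $4; exit }' "$1"
}

# Usage (the path is an example for the client root filesystem):
#   default_runlevel /tmp/ramdisk/etc/inittab
#   sysinit_script   /tmp/ramdisk/etc/inittab
```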

Detailed Procedure for Building a Diskless PC Cluster

The procedure for building a PC cluster can be divided into two parts.
  1. NFS and NFS-Root server setup.
  2. Client node setup.

NFS and NFS-Root server setup

Setting up the server is straightforward. There are several ways to install the server: from the installation disk (provided by RedHat) or over the network. For simplicity, let's assume you have a full installation of RedHat 6.2 on your server computer. If the installation went correctly, you should have a fully functional Linux system with a working network connection.
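For the NFS-root part, the server also has to export a root directory to each client. The sketch below prints candidate /etc/exports lines. The /tftpboot/<ip> directory layout, the 192.168.1.x addresses, the node count, and the function name are all assumptions for illustration and are not taken from our actual configuration.

```shell
#!/bin/sh
# Sketch: print /etc/exports lines that give each diskless client its
# own root directory. Layout and addresses are hypothetical.

print_exports() {
    prefix=$1      # network prefix, for example 192.168.1
    first=$2       # host part of the first client address
    count=$3       # number of clients
    i=0
    while [ "$i" -lt "$count" ]; do
        ip="$prefix.$(( first + i ))"
        # no_root_squash lets root on the client act as root on its
        # own root filesystem.
        echo "/tftpboot/$ip $ip(rw,no_root_squash)"
        i=$(( i + 1 ))
    done
}

print_exports 192.168.1 11 3
```

After the lines are added to /etc/exports, the NFS server has to re-read the file, for example by restarting the NFS service.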

Next, prepare a network boot disk for the client computers as follows:

Reference

  1. Linux Documentation Project (http://www.linuxdoc.org)

    Contains many HOWTOs on various aspects of Linux. Further details on the topics in this article can be found here.

  2. The Beowulf Project (http://www.beowulf.org)

    The Beowulf Project official site.

  3. Etherboot (http://www.slug.org.au/etherboot)

    Boot a kernel image over an Ethernet network.