Raspberry Pi Cluster

Figure 1 : Pi Beowulf Cluster

The compulsory Raspberry Pi cluster: visually impressive, theoretically powerful and dubiously useful, as shown in figure 1. It originally had 37 nodes, a head node and 36 slaves; however, by the time I got around to taking a picture I had borrowed 7 for other things. This is the classic Multiple-Instruction Multiple-Data (MIMD) architecture, often referred to as a cluster, but I much prefer the traditional name: Beowulf cluster (Link). Beowulf is the main character in the Old English poem, a man with the strength of 30 men i.e. a Beowulf computer has the strength of 30 normal computers. My first introduction to parallel processing was in the 80s using the Inmos Transputer (Link), programmed using Occam (Link). Both the Transputer and the Occam programming language had an elegant simplicity about them, allowing you to construct and program complex parallel architectures. However, you soon find out that designing an algorithm to take advantage of this parallelism is easier said than done e.g. you would distribute a program over multiple nodes only to find out that context switches and communication delays meant that your program ran faster on a single Transputer than on the array. The same is true for the Raspberry Pi: just because you can configure them as a Beowulf cluster doesn't mean you should, or to put it another way, you need to find an application that takes advantage of this architecture's strengths.

Each Raspberry Pi is connected to a motherboard which provides its power, basic IO protection and eight LEDs, as shown in figures 2 and 3.

Figure 2 : Single node

Figure 3 : Pi Beowulf nodes

Figure 4 : dhcp / file server

To program this architecture you can use simple shell scripts i.e. run the same program on each machine, using a shared disk to pass data and results. Each computer is identified by its IP address, assigned using a DHCP server (an old Core 2 Duo box (figure 4) in my case, but it could be another Pi). To simplify configuration, the dhcpd.conf file is edited to assign each computer a fixed IP address based on its MAC address e.g.

host pi01 {
    hardware ethernet 00:11:22:33:44:55;
    fixed-address 192.168.50.1;
}
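For context, host entries like the one above normally sit alongside a subnet declaration in the DHCP server's configuration file (typically /etc/dhcp/dhcpd.conf on Debian-based systems). A minimal sketch for this network might look something like the following; the router address and the second MAC address are placeholders, not values from my set-up:

subnet 192.168.50.0 netmask 255.255.255.0 {
    option routers 192.168.50.254;         # placeholder gateway address
    option domain-name-servers 8.8.8.8;    # example DNS server

    host pi01 {
        hardware ethernet 00:11:22:33:44:55;
        fixed-address 192.168.50.1;
    }
    host pi02 {
        hardware ethernet 00:11:22:33:44:56;   # placeholder MAC
        fixed-address 192.168.50.2;
    }
}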

The computer's name can then be linked to its IP address. The computer's name is defined in the file /etc/hostname; on each node this file is edited with that computer's name e.g. pi01 etc. To link the names to the IP addresses, and define the names and IP addresses of every machine in the cluster, the /etc/hosts file is edited e.g.

127.0.0.1       localhost
192.168.50.1    pi01
192.168.50.2    pi02
192.168.50.3    pi03
192.168.50.4    pi04

Could have gone for static IP addresses, defined in /etc/network/interfaces, but having a centralised set-up looked like it would be more manageable. The hosts file is then used on each machine; being lazy, I manually copied this file to each computer using the secure copy command:

scp /etc/hosts pi@192.168.50.2:/etc
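Being even lazier, a short shell loop will push the file to every node. This is only a sketch: it assumes the node names pi01 - pi36 and the default pi user, and stages the file in /tmp because the pi user cannot normally write directly into /etc:

for i in $(seq -w 1 36); do
    scp /etc/hosts pi@pi$i:/tmp/hosts            # copy to a writable location
    ssh pi@pi$i 'sudo cp /tmp/hosts /etc/hosts'  # then move it into place as root
done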

A better solution would be to share one file between all of the machines, so that it's easier to update / manage. The Beowulf cluster also needs to pass data between the different nodes; this can be achieved by mounting an NFS share. On the head Pi, to export a folder, edit the /etc/exports file e.g.:

/home/shared    192.168.50.0/24(rw,fsid=0,insecure,no_subtree_check,async)
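After editing /etc/exports the export list needs to be re-read. Assuming the nfs-kernel-server package is installed on the head node, something like this does the job:

sudo exportfs -ra                          # re-read /etc/exports
sudo service nfs-kernel-server restart     # or: sudo systemctl restart nfs-kernel-server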

On each of the cluster nodes, to mount the exported folder, edit the /etc/fstab file e.g.:

pi01:/home/shared    /mnt/shared nfs auto 0 0
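On each node the mount point needs to exist before this entry can be used; the share can then be mounted without rebooting:

sudo mkdir -p /mnt/shared
sudo mount -a      # mount everything listed in /etc/fstab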

To remove the need to enter a password each time you log into a node, the system can be configured to use passwordless SSH. On the head node type (press Enter when prompted for a passphrase):

ssh-keygen

Then, to copy the generated key to a node, type:

ssh-copy-id [user]@[machine]
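For this cluster that boils down to, for example (repeated for each node, or wrapped in the same style of loop used earlier to copy the hosts file):

ssh-copy-id pi@pi02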

If you are logged into a node as pi, all you now need to do to log into a different node is (a password is no longer required):

ssh pi02

This also simplifies the process of executing programs on different nodes. To execute a program on a different node all you need to enter is either:

ssh [user]@[server] '[command]'
ssh [user]@[server] 'bash -s' < local_script.sh
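Putting the pieces together, the head node can fire the same command at every node; a quick sanity check might look something like this (node names and pi user assumed as before):

for i in $(seq -w 1 36); do
    ssh pi@pi$i 'hostname && uptime'    # print each node's name and load averages
done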

To allow finer-grained parallelism the Message Passing Interface (MPI) can be used (Link). This library supports a number of different languages, including C, Fortran and Java, allowing you to create communication channels between different nodes in the cluster. The Raspberry Pi Beowulf cluster was based on an existing Ubuntu-based machine (Link)(Local), which covers the basic set-up and testing. Another nice example of a Raspberry Pi based machine: (Local). The traditional Hello World program using the MPI library is shown below:

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv)
{
	int err, size, id, length;

	char name[BUFSIZ];

	err = MPI_Init(&argc, &argv);                  /* start the MPI runtime */
	err = MPI_Comm_rank(MPI_COMM_WORLD, &id);      /* this process's rank (0 .. size-1) */
	err = MPI_Comm_size(MPI_COMM_WORLD, &size);    /* total number of processes */
	err = MPI_Get_processor_name(name, &length);   /* name of the node it is running on */

	printf("%s: Hello World from process %i out of %i processes\n", name, id, size);

	err = MPI_Finalize();                          /* shut down the MPI runtime */
	exit(0);
}
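The set-up here is assumed to use the MPICH implementation (its mpiexec provides the -f hostfile flag used below). The program is compiled with the MPI compiler wrapper, and the resulting binary needs to exist at the same path on every node, e.g. copied out with scp or kept on the shared NFS folder:

sudo apt-get install mpich            # on every node (assuming MPICH rather than OpenMPI)
mpicc hello_world.c -o hello_world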

To execute this code on eight different machines the following command line is used:

mpiexec -f hosts -n 8 /home/pi/hello_world

Where the hosts file (below, figure 6) defines a pool of processing node names i.e. pi01 - pi36; the final result is shown in figure 5.

Figure 5 : Hello World screen shot

pi01
pi02
pi03
pi04
pi05
pi06
pi07
pi08
...

Figure 6 : hosts file (shortened)

Not an expert in MPI, but it seems to assume that you have one copy of the program running on each node. The program can identify which node it is running on and then select the correct function to execute. Communication is via point-to-point links or scatter and gather techniques. Links to a couple of good tutorials: (Link)(Link).

For open days I needed a visual demo to show the cluster in action, so decided to go for the Game of Life (Link), dividing the virtual world into a number of slices, each slice simulated on a different Raspberry Pi. Being lazy, I found some existing examples on the web (Local)(Local)(Local), sorry, lost the original links. Not sure about the level of parallelism in some of these examples, but all worked, with varying levels of success. To be fair, parallelising the Game of Life was always going to be tricky owing to the bottleneck of returning results to the head node to be displayed. As a result, this is not the fastest Game of Life you will see. A video of the Game of Life in action, running on eight nodes: (Local) (Local)

Figure 7 : Game of Life, start (left), end (right)
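As a rough illustration of the scatter / gather style (a minimal sketch, not the actual Game of Life code), the example below splits an array of cells across the nodes, lets each node work on its own slice and then gathers the results back on the head node:

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

#define CHUNK 4   /* elements handled by each process */

int main(int argc, char **argv)
{
	int size, id, i;
	int *world = NULL;    /* full data set, allocated on the root only */
	int slice[CHUNK];     /* local slice held by every process */

	MPI_Init(&argc, &argv);
	MPI_Comm_rank(MPI_COMM_WORLD, &id);
	MPI_Comm_size(MPI_COMM_WORLD, &size);

	if (id == 0) {        /* root builds the full data set */
		world = malloc(size * CHUNK * sizeof(int));
		for (i = 0; i < size * CHUNK; i++)
			world[i] = i;
	}

	/* scatter: each process receives its own CHUNK-sized slice */
	MPI_Scatter(world, CHUNK, MPI_INT, slice, CHUNK, MPI_INT, 0, MPI_COMM_WORLD);

	for (i = 0; i < CHUNK; i++)   /* do some work on the local slice */
		slice[i] *= 2;

	/* gather: root collects the processed slices back into world */
	MPI_Gather(slice, CHUNK, MPI_INT, world, CHUNK, MPI_INT, 0, MPI_COMM_WORLD);

	if (id == 0) {
		printf("first element after processing: %i\n", world[0]);
		free(world);
	}

	MPI_Finalize();
	return 0;
}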

To show the level of processing load on each Pi I added a crontab entry to launch the following Python program (below), which updates every two seconds. This program uses a thermometer scale to display the CPU load on the LED array, perhaps a little overkill i.e. the extra context switches reduce processing performance, but it does make a nice display. A full code listing can be found here: (Local).

import time
import psutil                  # CPU utilisation figures

# bargraph() drives the eight LEDs as a thermometer display - defined in the full listing
while True:
	# scale the CPU load (0 - 100%) onto the 0 - 255 range used by the LED bargraph
	loading = (psutil.cpu_percent() / 100) * 255
	bargraph(loading)

	#print str(loading)

	time.sleep(2)
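The crontab entry itself is not shown above; one possibility (the script name is a placeholder, the real listing is in the link above), added with crontab -e, is to start the monitor once at boot and let the loop take care of the two-second updates:

@reboot python /home/pi/led_load.py &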

The future of the cluster? Not sure, but I would like to find an application to show the Beowulf cluster off to its best.

Creative Commons Licence

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Contact details: email - mike.freeman@york.ac.uk, telephone - 01904 32(5473)
