RAID - adding disks to a Pi


Figure 1: RAID-0

I watched a YouTube video: LINK, that looked at using USB flash drives in a RAID-0 configuration to see if they could increase read/write speeds. In this video the presenter used a 10 port USB hub to construct different sized arrays: 2, 4 and 8 disks, as shown in figure 2. When I first watched this video I wasn't sure how this would work, as the whole premise of RAID-0 is to read and write data to multiple disks in parallel e.g. if a system wants to store a 200K file it is broken down into two 100K blocks (stripes), each written to a separate disk in parallel, therefore doubling your write performance (overlapping transfers). For a quick discussion of RAID-0: LINK. The problem with the USB hub approach is that each disk shares a single bus; USB is a serial bus, so you can't transfer multiple blocks of data to different disks at the same time. In the video there did seem to be some performance gains, which was interesting, so time to build my own system.


Figure 2: YouTube video: LINK
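As a toy illustration of the striping idea (my sketch, nothing like the real mdadm implementation; the mount points and file name are made up): writing the two 100K halves of a file to two separate disks concurrently would look something like this:

dd if=file.bin of=/mnt/diskA/stripe0 bs=100K count=1 &
dd if=file.bin of=/mnt/diskB/stripe1 bs=100K skip=1 count=1 &
wait

The & runs each dd in the background, so on separate buses the two transfers could genuinely overlap; on a shared USB bus they are serialised, which is the concern above.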

To replicate this experiment I had an old USB2 4 port hub and two USB2 16GB flash drives. The USB 2.0 standard defines three speeds: Low=1.5Mbps, Full=12Mbps and High=480Mbps. If you have a USB 2.0 device you cannot automatically assume it will be working at its max speed, as the USB ports may share a common controller, meaning the bus bandwidth may be shared between multiple devices e.g. mouse, keyboard, Ethernet etc. In theory, from a simplistic calculation, the bandwidth across a USB2 bus should be 60MBps, but this does not take into consideration protocol and system overheads. Reading around, there seems to be agreement that the typical max bandwidth of USB2 is 280Mbps or 35MBps, increasing to a theoretical maximum bandwidth of 53MBps (never seen). Therefore, for a RAID-0 USB flash drive system to work, the bottleneck needs to be inside the USB drive i.e. the process of reading / writing to the NAND flash memory cells; these RW processes need to be significantly slower than 35MBps, otherwise the communication time will become the bottleneck.
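The back-of-envelope numbers used above (my working):

# 480 Mbps high-speed signalling rate / 8 bits per byte
echo $((480 / 8))    # 60 MBps theoretical
# ~280 Mbps typical achievable payload after protocol overheads
echo $((280 / 8))    # 35 MBps typical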

To create a RAID array you have two choices: hardware or software. For max performance you will need a dedicated hardware RAID controller, reducing CPU load, but these are expensive and unsurprisingly not available for the Raspberry Pi :). The alternative is mdadm, a command line tool to create and monitor software RAID devices. The downside of using this approach to increase disk performance is that it does increase CPU load. To install:

sudo apt-get install mdadm

Next, connect the USB hub and plug in the two disks. To identify the devices, at the command line type: lsblk. You can also specify the attributes displayed e.g.:

sudo lsblk -o NAME,SIZE,FSTYPE,MOUNTPOINT

For this particular system the two USB disks were /dev/sda and /dev/sdb. To create the RAID-0 disk you need to use the mdadm command below; for more info the man page is quite informative i.e. man mdadm. To combine the two disks into a single RAID drive enter:

sudo mdadm --create --verbose /dev/md0 --level=0 --raid-devices=2 /dev/sda /dev/sdb

This will create a single drive md0. To check all is good enter:

cat /proc/mdstat
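A couple of extras I'd suggest here (my additions): mdadm --detail prints the array's layout and state, and appending the scan output to /etc/mdadm/mdadm.conf should let the array reassemble itself automatically at boot:

sudo mdadm --detail /dev/md0
sudo mdadm --detail --scan | sudo tee -a /etc/mdadm/mdadm.conf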

Next, we need to create a file system on this disk and mount it into the file hierarchy. As this array is on a Pi, a Linux system, I went for the standard ext4 file system. At the command line enter (to be honest, I can't remember if the force option (-F) is needed):

sudo mkfs.ext4 -F /dev/md0

To mount it, we need to create a directory in the existing file structure; I went for /mnt/md0, then mount:

sudo mkdir -p /mnt/md0
sudo mount /dev/md0 /mnt/md0
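To confirm the array is mounted and has the expected size (my addition):

df -h /mnt/md0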

That should be it, we now have a 32GB RAID-0 disk, in theory "doubling" the system's transfer speed and halving its reliability :). To mount the disk at boot, edit the file /etc/fstab as shown below. To get each disk's UUID string use the command blkid, as shown in figure 3.

proc                  /proc           proc    defaults          0       0
PARTUUID=434d1053-01  /boot           vfat    defaults          0       2
PARTUUID=434d1053-02  /               ext4    defaults,noatime  0       1

UUID=be1103c5-fb4b-476f-b7fd-09f0a4b747d6  /mnt/hdd  ext4  defaults  0  2
UUID=ed123b53-f863-45a9-a45d-38f25ef07dad  /mnt/md0  ext4  defaults  0  2

# a swapfile is not a swap partition, no line here
#   use  dphys-swapfile swap[on|off]  for that

Figure 3: blkid command
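To check the new fstab entries without rebooting, something like the following should do it (my addition):

sudo umount /mnt/md0
sudo mount -a
findmnt /mnt/md0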

To mount these components i.e. Pi, hub and disks, I went for my standard Pi case LINK, plus some extra bits. I also added some brackets to hold a USB hard disk (I was thinking about backups and a swap disk) and, like the other case, a reset and shutdown button, plus a status LED. The 3D models can be downloaded here: LINK. These models and the final assembled system are shown below:


Figure 4: 3D models






Figure 5: system

To test the performance of these disks i.e. the system's SDCard, a single USB flash drive, the USB RAID-0 and a USB hard disk, I used the command line tool dd to read and write "data" to the disks. One of the main problems with using this command to measure transfer speeds is that Linux uses page caches to accelerate access to these types of non-volatile storage i.e. when data is read or written it is buffered in memory, so when you read this data again you will access it from this buffer/cache, hiding the true transfer speed of your disks (which of course greatly helps the Pi since it is running from an SDCard). To view how memory is currently being used, at the command line type: free -ht, an example for this Pi system is shown below:


Figure 6: memory

To compare the speed of these disks with memory, a RAM drive was also created i.e. a temporary virtual disk held in RAM. To create this virtual disk the command below is used. The mount points for all the disks are shown in figure 7.

sudo mount -t tmpfs -o size=200M tmpfs /mnt/ram

Figure 7: disks
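One note on the RAM drive (my addition): the mount point has to exist before the mount command above will work, and df will confirm its size:

sudo mkdir -p /mnt/ram
df -h /mnt/ram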

To write and then read a 128MB block of data to each disk the following scripts were used. The read script first synchronises the disk, writing back buffered data, then flushes the cache to ensure data is read from the actual disk.

#!/bin/sh
dd if=/dev/zero of=/mnt/hdd/dump bs=256K count=512

Figure 8: write script
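One caveat with the write script (my note): without conv=fsync, or a trailing sync, dd reports the speed of writing into the page cache rather than to the disk itself, so a variant that includes the flush time would be:

#!/bin/sh
dd if=/dev/zero of=/mnt/hdd/dump bs=256K count=512 conv=fsync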

#!/bin/sh
sync; echo 3 | sudo tee /proc/sys/vm/drop_caches
dd if=/mnt/hdd/dump of=/dev/null bs=256K count=512

Figure 9: read script

In the above scripts the USB hard disk's write / read performance is benchmarked; this process was repeated for each disk (changing the mount point) and the results are shown below:

Unsurprisingly the RAM drive is the fastest. The performance of the SDCard was a little strange on writes: ignoring the 200MBps results, the average is around 22MBps for reads and writes. These high values were repeatable; not sure what this shows. I would jump to the conclusion of caching, but as this disk also hosts the root directory perhaps it is related to that? The USB drive was perhaps too cheap and cheerful, with significantly slower write performance, even though it was a branded disk. The hard disk drive showed good performance; I'm guessing that the faster write performance was due to caching hardware (buffers) on the external hard disk. In comparison there does seem to be an increase in performance using USB RAID-0, this having the highest read speed of the physical disks and, compared to the other USB drives, significantly better write performance.

However, this test only looks at transfer speed i.e. block transfers, rather than random access performance e.g. reading and writing small files across the file system. This is a problem for most USB flash drives as they are optimised for block transfers i.e. when you wish to write data you have to erase a complete block, rather than a single memory location. To determine what the block size is for the USB sticks I found this script on StackOverflow: LINK, trimmed it down a little, result below:

#!/bin/bash

TEST_FILE=dump
TEST_FILE_SIZE=134217728

# Block sizes of 64B 128B 256B 512B 1K 2K 4K 8K 16K 32K 64K 128K 256K 512K 1M
for BLOCK_SIZE in 64 128 256 512 1024 2048 4096 8192 16384 32768 65536 131072 262144 524288 1048576
do
  # Calculate number of segments required to copy
  COUNT=$(($TEST_FILE_SIZE / $BLOCK_SIZE))

  # Clear kernel cache to ensure more accurate test
  [ $EUID -eq 0 ] && [ -e /proc/sys/vm/drop_caches ] && echo 3 > /proc/sys/vm/drop_caches

  # Create a test file with the specified block size
  DD_RESULT=$(dd if=/dev/zero of=$TEST_FILE bs=$BLOCK_SIZE count=$COUNT conv=fsync 2>&1 1>/dev/null)

  # Extract the transfer rate from dd's STDERR output
  TRANSFER_RATE=$(echo $DD_RESULT | \grep --only-matching -E '[0-9.]+ ([MGk]?B|bytes)/s(ec)?')

  # Output the result
  echo $BLOCK_SIZE $TRANSFER_RATE
done

Figure 10: block transfer script
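To use it: the cache drop only happens when run as root, and the test file is created in the current directory, so run it from the disk under test (the script name is my own):

cd /mnt/md0
sudo bash blocksize.sh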

Running this script I got the following results:


Figure 11: block transfer sizes

From these tests it does look like block size has a significant impact on transfer speed. To be honest, I'm not sure of all the implications here i.e. what is actually happening on this disk. Transfer speeds seem to level out around 4K. I'm not sure how smaller files are stored in this flash memory e.g. do smaller files incur the overhead of erasing a full block (4K), or can the drive erase smaller regions e.g. half blocks? When writing smaller files, how are these blocks used i.e. is a 64B file allocated a whole 4K block in flash, which seems too wasteful, or will multiple 64B files be stored in a single block, in which case the USB stick will have to perform a series of operations: read - buffer - erase - update buffer - write back? This could explain why there is a significant drop in transfer speed for the smaller block sizes. For general usage this would be an issue, so the previous raw bandwidth figures do not give a complete picture of disk performance; you need to consider the types/sizes of files transferred to this disk.

Whilst testing the RAID-0 disk I recorded this video showing that both USB disks do actually work at the "same" time i.e. the LEDs flash at the same time, showing the stripes being transferred: LINK.
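A crude way to see the small-file penalty discussed above (my sketch): time writing a thousand 4K files, each flushed to disk, and compare against the raw block figures:

time sh -c 'for i in $(seq 1 1000); do
  dd if=/dev/zero of=/mnt/md0/small$i bs=4K count=1 conv=fsync 2>/dev/null
done'
rm /mnt/md0/small*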

After these tests I decided to benchmark some other USB flash drives to see if these results looked typical. There were some small variations, but all within the same ball park; again some of the disks displayed strange behaviours, seemingly locking up for a few seconds, not sure why, perhaps drivers/OS being loaded. Examples below:

The Raspberry Pi has 1GB of main memory. It does have a swap "partition", but as this is on the SDCard its transfer speed is limited to 20-ish MBps. Typically this virtual memory is not used; the only time I have seen it being used was when I was streaming video from another Pi (3D printer camera: LINK). The web browser must have had a memory leak: as you watched the video, you could see the available memory slowly decreasing, until finally the browser would crash. The size and location of the swap drive are defined in /etc/dphys-swapfile. To add a 512MB swap drive onto the external USB hard disk edit the following lines.

CONF_SWAPSIZE=512
CONF_SWAPFILE=/mnt/hdd/swap

Figure 12: /etc/dphys-swapfile
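As an alternative to the manual dd / mkswap steps below, once the config file is edited, dphys-swapfile can create and enable the swap file itself (my addition):

sudo dphys-swapfile swapoff
sudo dphys-swapfile setup
sudo dphys-swapfile swapon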

The reason for using the external hard disk is that it has good random access performance and good transfer speeds, combined with what I suspect is a reasonable hardware cache. To create this swap file, at the command line enter (the chmod sets the 0600 permissions that swapon expects on a swap file):

sudo dd if=/dev/zero of=/mnt/hdd/swap bs=1M count=512
sudo chmod 600 /mnt/hdd/swap

To enable this swap file enter the commands below. I thought that you would also need to declare this in the file /etc/fstab, but it looks like you don't have to; there is a comment to this effect in the default file.

sudo mkswap /mnt/hdd/swap
sudo swapon /mnt/hdd/swap
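To confirm the swap space is active (my addition):

swapon --show
free -ht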

To turn swap off use the command swapoff, same syntax. To test if this swap drive is being used I found this Python code online; basically it creates a list, then repeatedly appends blocks of 100MB. Initially the Pi has 600MB-ish of free memory; when all of this free memory has been used it will start to page out memory into the swap drive, as shown in figure 13 below. Well, when the Pi starts to swap, for some reason the free command stops working, but you can see the external USB disk's LED flashing away.

import os
import time
import psutil

a = []
for block in range(0, 8):
  # append a 100MB string to the list, slowly eating the free memory
  a.append(' ' * 100 * 10**6)
  # report this process's resident set size
  process = psutil.Process(os.getpid())
  print(process.memory_info().rss)
  time.sleep(1)
  # show overall memory and swap usage
  os.system("free -ht")
  time.sleep(1)

Figure 13: swap test script
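To run the swap test, psutil needs to be installed first; I'm assuming the script is saved as swaptest.py (the filename is my own):

sudo apt-get install python3-psutil
python3 swaptest.py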

The original aim of this creation was a demo tool for teaching, but it would be nice to find a role for this device, perhaps as a NAS?

Creative Commons Licence

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Contact details: email - mike.freeman@york.ac.uk, telephone - 01904 32(5473)
