
In this part we will discuss software RAID in Linux and set up a special mdadm mirror between the ephemeral SSD and the EBS volume. Setting the --write-mostly flag on the EBS device in the mirror ensures that the md driver avoids reading from EBS whenever possible and sends reads to the SSD instead. This option was originally added for mirroring over a slow network link, but it works equally well to concentrate reads on an SSD.
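
As an aside, the write-mostly flag does not have to be set at creation time; on a running array it can also be toggled through the md sysfs interface. The commands below are a rough sketch assuming an array named md1 with /dev/xvdd as a member, which is how we build the array later in this post:

# Mark the EBS member of a live array as write-mostly
$ echo writemostly | sudo tee /sys/block/md1/md/dev-xvdd/state
# Clear the flag again if needed
$ echo -writemostly | sudo tee /sys/block/md1/md/dev-xvdd/state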

In our test environment we have a c3.8xlarge instance running; this is its disk configuration:

$ lsblk
NAME    MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
xvda    202:0    0     8G  0 disk
└─xvda1 202:1    0     8G  0 part /
xvdb    202:16   0   320G  0 disk
xvdc    202:32   0   320G  0 disk
xvdd    202:48   0   400G  0 disk

/dev/xvdb and /dev/xvdc are the two 320 GB SSD ephemeral disks available to a c3.8xlarge instance, while /dev/xvdd is a Provisioned IOPS EBS volume with 4,000 IOPS.
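
If you are ever unsure which of these devices are instance-store (ephemeral) volumes and which are EBS, the EC2 instance metadata service exposes the block device mapping; shown here purely as a convenience:

# List the block device mappings the instance was launched with
$ curl -s http://169.254.169.254/latest/meta-data/block-device-mapping/
# Show the device name behind a particular mapping, e.g. the first ephemeral disk
$ curl -s http://169.254.169.254/latest/meta-data/block-device-mapping/ephemeral0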

Installing mdadm and fio on Ubuntu 14.04

mdadm is strangely not installed by default on Ubuntu 14.04, so the first step is to install it, and while we are at it, let's also install our I/O performance tester, fio.

$ sudo apt-get -y install mdadm fio

For some strange reason mdadm depends on postfix, and I got a prompt asking me to select a mail delivery option, for which I selected local only. I don't need this for now, but in production you might want to set up SES SMTP relaying with your new postfix installation.
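
If you are scripting the installation (for example from user data), the postfix prompt can be suppressed by running apt-get non-interactively; a minimal sketch:

# Install without the interactive postfix configuration dialog
$ sudo DEBIAN_FRONTEND=noninteractive apt-get -y install mdadm fio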

Setting up a software RAID-1 mirror

First we will set up a standard software mirror (RAID-1) device between /dev/xvdc (SSD) and /dev/xvdd (EBS), but tell md to treat the latter as write-mostly so that reads are served from the SSD.

$ sudo mdadm --create /dev/md1 --level=1 --raid-devices=2 \
> --bitmap=internal --write-behind=1024 \
> --assume-clean /dev/xvdc --write-mostly /dev/xvdd

mdadm: /dev/xvdc appears to contain an ext2fs file system
    size=335515648K  mtime=Thu Jan  1 00:00:00 1970
mdadm: Note: this array has metadata at the start and
    may not be suitable as a boot device.  If you plan to
    store '/boot' on this device please ensure that
    your boot-loader understands md/v1.x metadata, or use
    --metadata=0.90
mdadm: largest drive (/dev/xvdd) exceeds size (335384384K) by more than 1%
Continue creating array? y
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md1 started.

Ignore the note about a previous ext2 file system being detected on the SSD; cloud-init on Ubuntu formats ephemeral disks by default. Also ignore the mdadm message about the difference in device size. Next, let's verify what we have just configured by looking at /proc/mdstat:

$ sudo cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 xvdd[1](W) xvdc[0]
      335384384 blocks super 1.2 [2/2] [UU]
      bitmap: 0/3 pages [0KB], 65536KB chunk

unused devices: <none>

Note the (W) next to /dev/xvdd (EBS), indicating it is the write-mostly device. We should now have a raw device at /dev/md1 with no file system on it yet:

$ sudo file -s /dev/md1
/dev/md1: data

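Before creating a file system you can also inspect the array in more detail with mdadm --detail, which should show the internal bitmap and flag /dev/xvdd as write-mostly in its device state:

# Show array size, bitmap and per-device state
$ sudo mdadm --detail /dev/md1
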
Next, let's create an ext4 file system on the new RAID-1 device:

$ sudo mkfs.ext4 /dev/md1

mke2fs 1.42.9 (4-Feb-2014)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
20963328 inodes, 83846096 blocks
4192304 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
2559 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
  32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
  4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968

Allocating group tables: done
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

And mount it to /mnt:

$ sudo mount /dev/md1 /mnt/
$ sudo mount | grep /dev/md1
/dev/md1 on /mnt type ext4 (rw)
$ sudo df -h | grep /mnt
/dev/md1        315G   67M  299G   1% /mnt

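If you want the array and the mount to survive a reboot, the array definition can be recorded in mdadm.conf and the mount added to fstab. Keep in mind that the contents of the ephemeral SSD are lost when the instance is stopped, so the mirror would need to be rebuilt in that case; the snippet below is a sketch of the usual Ubuntu steps:

# Record the array so it is assembled at boot
$ sudo mdadm --detail --scan | sudo tee -a /etc/mdadm/mdadm.conf
$ sudo update-initramfs -u
# Mount it automatically; nofail avoids hanging the boot if the array is missing
$ echo '/dev/md1 /mnt ext4 defaults,nofail 0 2' | sudo tee -a /etc/fstab
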
Now we are ready to start testing with fio and examine the results.
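
The exact fio job definitions are not reproduced here, but a typical random-read job against the mounted array looks something like the following; the file name, size and queue depth are illustrative and should be adjusted to your workload:

# 4k random reads against a test file on the md array
$ sudo fio --name=randread --filename=/mnt/fio-test --size=4G \
    --rw=randread --bs=4k --direct=1 --ioengine=libaio --iodepth=32 \
    --numjobs=4 --runtime=60 --time_based --group_reporting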

Test results

We ran the standard I/O tests for sequential and random read and write workloads; here are the results for the md RAID-1 array compared to the native SSD and EBS volumes:

Analysis

  • Sequential and random write results are identical to those of EBS, which is expected, since every write has to be committed to the EBS leg of the mirror as well
  • Sequential reads are almost identical to the results of the SSD, which is great
  • Disappointing random read results: this is where SSDs excel, yet we only see roughly a 50% improvement over EBS, nowhere near the expected performance of the SSD (the iostat check below can help show where reads are actually being served from)
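
One way to dig into the random read numbers is to watch per-device traffic while a read-heavy fio job is running; if write-mostly is working as intended, nearly all read requests should land on xvdc (SSD) rather than xvdd (EBS). iostat comes from the sysstat package, which is not installed by default:

$ sudo apt-get -y install sysstat
# Refresh extended per-device statistics every 2 seconds while fio runs
$ iostat -x 2 xvdc xvdd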

Next is Part 3, where we will look at kernel block-level caching with bcache.
