
bcache is the Linux kernel's block layer cache (hence the name, block cache). It allows one or more fast storage devices, such as an SSD, to act as a cache for one or more slower drives, effectively creating a hybrid drive. Sounds like just the right tool for the job.

bcache has a few interesting features; the following are worth noting:

  • A single cache device can be used to cache multiple devices.
  • Recovers from unclean shutdown.
  • Multiple caching modes: writethrough, writeback and writearound.
  • Designed for SSDs: it avoids random writes by turning them into sequential writes instead.
  • It was merged into the Linux kernel mainline in kernel version 3.10.

flashcache and dm-cache are similar projects that offer more or less the same functionality.

Installation and configuration

On Ubuntu this is really straightforward:

$ sudo add-apt-repository ppa:g2p/storage
$ sudo apt-get update
$ sudo apt-get install bcache-tools

If you remember, this is what our disk structure looked like:

$ sudo lsblk
NAME    MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
xvda    202:0    0     8G  0 disk
└─xvda1 202:1    0     8G  0 part /
xvdb    202:16   0   320G  0 disk
xvdc    202:32   0   320G  0 disk
xvdd    202:48   0   400G  0 disk

We first need to make sure that our backing device /dev/xvdd (EBS) and our cache device /dev/xvdc (SSD) are both formatted with ext4:

$ sudo file -s /dev/xvdc
/dev/xvdc: Linux rev 1.0 ext4 filesystem data, UUID=d77daab7-66db-41aa-8532-3b9a9a94777b (extents) (large files) (huge files)
$ sudo file -s /dev/xvdd
/dev/xvdd: Linux rev 1.0 ext4 filesystem data, UUID=bf188b21-ee5d-4951-b87e-2a3db3ac31d7 (extents) (large files) (huge files)

Then we have to remove any non-bcache superblocks from each device, just in case:

$ sudo wipefs -a /dev/xvdd
2 bytes were erased at offset 0x438 (ext4)
they were: 53 ef
$ sudo wipefs -a /dev/xvdc
2 bytes were erased at offset 0x438 (ext4)
they were: 53 ef

Next we create our bcache devices using make-bcache, with -B for the backing device and -C for the cache device:

$ sudo make-bcache -B /dev/xvdd
UUID:         618a30c4-7af5-459f-bdf9-6a0ecd1d34f0
Set UUID:     5bd5353d-043c-45a5-86c6-74e9c8db3a64
version:      1
block_size:       1
data_offset:      16
$ sudo make-bcache -C /dev/xvdc
UUID:         7724af30-b499-4893-9fe1-68736062fcd9
Set UUID:     2c11aa0d-9f20-40d3-9067-1d1434330475
version:      0
nbuckets:     655304
block_size:       1
bucket_size:      1024
nr_in_set:        1
nr_this_dev:      0
first_bucket:     1

bcache-tools now ships udev rules, so on systems such as Ubuntu the bcache devices are known to the kernel immediately. The devices show up as /dev/bcache<N>, as well as (with udev) /dev/bcache/by-uuid/<uuid> and /dev/bcache/by-label/<label>.
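As a quick sanity check (not a required step), you can list the new nodes; note that the /dev/bcache/by-uuid path only appears when the udev rules are active, so your output may differ:

$ ls /dev/bcache0
$ ls /dev/bcache/by-uuid/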

Next we format the new device with ext4 as follows:

$ sudo mkfs.ext4 /dev/bcache0
mke2fs 1.42.9 (4-Feb-2014)
Discarding device blocks: done
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
26214400 inodes, 104857598 blocks
5242879 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
3200 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
  32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
  4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
  102400000

Allocating group tables: done
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

and mount it:

$ sudo mount /dev/bcache0 /mnt
$ sudo lsblk /dev/bcache0
NAME    MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
bcache0 251:0    0   400G  0 disk /mnt

And finally we need to attach the cache device to the backing device. This can be done by copying the cache set UUID from /sys/fs/bcache/ and running the following command:

$ echo 2c11aa0d-9f20-40d3-9067-1d1434330475 > /sys/block/bcache0/bcache/attach

Replace with your own UUID.
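If you are not sure which UUID to use, it is the name of the cache set's directory under /sys/fs/bcache/. A quick way to check, assuming the sysfs layout of recent kernels, is shown below; the state file should change from "no cache" to "clean" once the attach succeeds:

$ ls /sys/fs/bcache/
$ cat /sys/block/bcache0/bcache/state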

By default, bcache uses writethrough caching. With writethrough, only reads are cached and writes are written directly to the backing drive:

$ cat /sys/block/bcache0/bcache/cache_mode
[writethrough] writeback writearound none

We can get some serious improvements by enabling writeback caching:

$ echo writeback > /sys/block/bcache0/bcache/cache_mode

Caution: using writeback mode is not as reliable as writethrough.
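If you do go with writeback, it is worth keeping an eye on how much data is still waiting to be flushed. A minimal check, assuming the standard bcache sysfs layout:

$ cat /sys/block/bcache0/bcache/dirty_data   # data held in the cache that has not yet been written to the backing device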

By default, bcache doesn't cache everything. It tries to skip sequential IO, because you really want to be caching the random IO; if you copy a 10 gigabyte file you probably don't want it pushing 10 gigabytes of randomly accessed data out of your cache. But since we will be benchmarking reads from the cache, we want to disable that behaviour:

$ echo 0 > /sys/block/bcache0/bcache/sequential_cutoff
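Along the same lines, bcache also throttles traffic to the cache when it sees high latency on the SSD, which can skew a benchmark. Those thresholds live under the cache set in sysfs and can optionally be disabled too; the UUID below is the cache set UUID from earlier, so substitute your own:

$ echo 0 > /sys/fs/bcache/2c11aa0d-9f20-40d3-9067-1d1434330475/congested_read_threshold_us
$ echo 0 > /sys/fs/bcache/2c11aa0d-9f20-40d3-9067-1d1434330475/congested_write_threshold_us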

Test Results

Now we are ready for some testing. We will do two tests, one with writethrough and another with writeback. Note that we will perform the tests with a warm cache.
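As a rough sketch of the kind of test being run here, a 4k random read pass with fio against the mounted bcache device would look something like this (the file name and job parameters are illustrative, not the exact ones used for these results):

$ sudo fio --name=randread --filename=/mnt/bench.file --size=4G --bs=4k \
      --rw=randread --direct=1 --ioengine=libaio --iodepth=32 \
      --runtime=60 --time_based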

Analysis

  • Sequential and random read results after warming the cache are similar to the native SSD, which is fantastic.
  • Sequential and random write results when using writethrough caching are exactly the same as EBS, which is expected.
  • When turning on writeback caching we see double the performance of EBS, but once again nowhere near the performance of the SSD. I must be doing something wrong here, because the bcache performance testing is reporting faster random write speeds than the SSD on its own.

Nonetheless these are improved results compared to part 2. It is important to note that the cache device can be detached as required. Also note that under writeback caching there is a potential for data loss if the SSD ephemeral disk is lost before the data is flushed to the backing device.
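For reference, detaching also goes through sysfs. A minimal sketch, writing the cache set UUID to the backing device's detach file (with writeback enabled, dirty data is flushed to the backing device before the cache is released; check your kernel's bcache documentation for the exact semantics):

$ echo 2c11aa0d-9f20-40d3-9067-1d1434330475 > /sys/block/bcache0/bcache/detach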

Next is part 4, where we will look at ZFS.
