ZFS - an advanced file system and volume manager.

ZFS is complex if you dive deep, but in short it's a volume manager that can pool disks together in different layouts; think of it as software RAID (a pool-creation sketch for each layout follows the list below).

  • The smallest level is a two-way mirror (similar to RAID-1; striping several mirror vdevs together resembles RAID-10).
  • Next step is RAID-Z1 (similar to RAID-5), a single-parity layout with as few as 3 drives, where one drive's worth of space (1/3 with 3 drives) goes to parity.
  • Followed by RAID-Z2 (similar to RAID-6), a double-parity layout that uses two drives' worth of space for parity.
  • And lastly RAID-Z3, a triple-parity layout that uses three drives' worth of space for parity.
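
For reference, this is roughly how each layout is created. A minimal sketch with placeholder device names (sda, sdb, ...) and the pool name tank; a real setup should use stable /dev/disk/by-id paths:

# Two-way mirror (RAID-1 like):
zpool create tank mirror sda sdb
# RAID-Z1, single parity, e.g. 3 drives:
zpool create tank raidz1 sda sdb sdc
# RAID-Z2, double parity, e.g. 4 drives:
zpool create tank raidz2 sda sdb sdc sdd
# RAID-Z3, triple parity, e.g. 5 drives:
zpool create tank raidz3 sda sdb sdc sdd sde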

And it does not stop with just pooling of drives, it brings a bunch of great features to the filesystem level.

  • Copy-on-Write - data is always written to new blocks, so the original data is never corrupted if a write is interrupted.
  • Snapshots - thanks to Copy-on-Write, ZFS tracks changes at the block level, so snapshots need no extra space up front and are easy to restore (see the sketch after this list).
  • Integrity verification and automatic repairs - all written data is checksummed, verified when accessed, and repaired on mismatch where redundancy allows.
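
A quick sketch of how snapshots and integrity checks look in practice; tank/documents is a hypothetical dataset:

# Take a snapshot (instant, uses no extra space until data changes):
zfs snapshot tank/documents@before-cleanup
# List snapshots:
zfs list -t snapshot
# Roll the dataset back to that snapshot:
zfs rollback tank/documents@before-cleanup
# Verify all checksums in the pool and repair mismatches where possible:
zpool scrub tank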

From running ZFS on my NAS for a few years now, I’ve gathered some notes on basic tuning. As I’m still learning and experimenting I might add more to this later but for now - these are my humble tips.

Alignment Shift (only on creation of vdev) - if unsure: ashift=12

The Alignment Shift (ashift) option lets us choose the block allocation size on a vdev. Best performance is gained when this size matches the sector size of the physical drive. Older drives had a small sector size of just 512 B, while the most common size today is 4 KiB.
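
To see what a drive actually reports before deciding, the sector sizes can be checked on Linux (a quick sketch; sda is a placeholder device, and note that some 512e drives report 512 B logical but 4 KiB physical):

# Logical and physical sector sizes for all block devices:
lsblk -o NAME,LOG-SEC,PHY-SEC
# Or for a single drive:
cat /sys/block/sda/queue/physical_block_size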

When a vdev is created and a drive is added, ZFS will automatically try to determine the best block size if none is specified; most of the time the option should be ashift=12 (2^12 = 4096 B = 4 KiB).

Check current ashift:

# specific pool:
zpool get ashift tank
# all pools:
zpool get ashift
# if the above doesn't work:
zpool get all | grep ashift

Set the option when creating a new pool:

# Create a pool:
zpool create -o ashift=12 tank mirror sda sdb
# when adding more drives:
zpool add -o ashift=12 tank mirror sdc sdd

Record size

The recordsize property sets the maximum block size for a dataset. Changing it doesn't affect existing data, only newly written (or re-written) data.
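
To check the current value before changing anything (tank and tank/VMs match the examples further down):

# Current recordsize (default is 128K):
zfs get recordsize tank
zfs get recordsize tank/VMs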

General rule of thumb (source: klarasystems.com)

  • 1MiB for general-purpose file sharing/storage
  • 1MiB for BitTorrent download folders—this minimizes the impact of fragmentation!
  • 64KiB for KVM virtual machines using Qcow2 file-based storage
  • 16KiB for MySQL InnoDB
  • 8KiB for PostgreSQL
  • Standard: 128KiB

Set with:

sudo zfs set recordsize=128K tank
sudo zfs set recordsize=64K tank/VMs
sudo zfs set recordsize=1M tank/media

Access time - atime

The atime option lets us choose whether or not to update the access time of a file. ZFS keeps track of three timestamps per file: access, modified and changed (atime, mtime, ctime). With atime on, ZFS has to update the timestamp every time a file is read, which leads to more IO and less performance.

We can simply turn this off to save on wear and gain some performance:

# Check current:
zfs get atime tank
# Set atime to off:
zfs set atime=off tank

Compression

Compression usually speeds up writes, since data is compressed before it is written and less of it ends up hitting the disk. Look at this example from a dataset of VM images:

              off       on        gzip      lz4       zstd
Data [MiB]    101376    62976     52940.8   62976     54067.2
Compression   1.00      1.61      1.91      1.61      1.87
Time [s]      1005      611       973       616       552
Speed [MB/s]  100.9     165.9     104.2     164.6     183.7

Or this example of a dataset of documents:

              off       on        gzip      lz4       zstd
Data [MiB]    58163.2   47923.2   45465.6   47923.2   45363.1
Compression   1.00      1.21      1.27      1.21      1.28
Time [s]      794       756       780       753       750
Speed [MB/s]  73.3      76.9      74.6      77.2      77.6

Here on, the default, maps to lz4 compression. The rows show data size on disk, compression ratio, time to write, and write speed. zstd wins on most occasions and is a very good all-round choice thanks to its compression-to-CPU ratio.

While not shown above, lz4 can do better when the data is mostly static (e.g. media), since it is geared towards very fast reads (decompression).


Set compression:

zfs get compression tank
zfs set compression=lz4 tank
zfs set compression=zstd tank/VMs
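
Once some data has been written you can see what you actually gained; compressratio is a read-only property that ZFS tracks per dataset:

# Achieved compression ratio:
zfs get compressratio tank
zfs get compressratio tank/VMs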

Xattr (Linux Extended Attributes)

man xattr

Extended attributes are name:value pairs associated permanently with files and directories, similar to the environment strings associated with a process.
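
As a concrete example, on Linux the attr tools can set and read one manually (a small sketch; file.txt and the user.comment attribute are just examples):

# Attach a name:value pair to a file:
setfattr -n user.comment -v "holiday photos" file.txt
# Read the extended attributes back:
getfattr -d file.txt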

By default these attributes are written to disk as files in a hidden sub-directory, which means extra IO requests when accessing a file (and multiple attributes may be accessed). Changing this from on (dir) to sa (System Attributes) will instead store the attributes directly in the inodes!

Check current option:

# Specific pool:
zfs get xattr tank
# All pools/datasets:
zfs get xattr

Change the option to sa:

zfs set xattr=sa tank

That’s it for now - I’ll add to this if I explore something more soon!