Tips: ZFS Tuning
ZFS - an advanced file system and volume manager.⌗
ZFS gets complex if you dive deep, but in short it’s a volume manager that pools disks together in different layouts; think of it as software RAID.
- The simplest redundant layout is a two-way mirror (similar to RAID-1; several striped mirrors behave like RAID-10).
- Next step is RAID-Z1 (similar to RAID-5), a single-parity pool that works with as few as 3 drives, in which case 1/3 of the raw space goes to parity.
- Followed by RAID-Z2 (similar to RAID-6), a double-parity pool that spends the equivalent of 2 disks on parity.
- And lastly RAID-Z3, a triple-parity pool that spends the equivalent of 3 disks on parity.
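To get a feel for how much space the parity levels cost, here is a minimal sketch that estimates the usable capacity of a single RAID-Z vdev. The function name `raidz_usable` is made up for this example, and the estimate ignores padding, metadata and reserved space, so real pools will report somewhat less.

```shell
# Rough usable capacity of one RAID-Z vdev: (disks - parity) * disk size.
# Ignores padding, metadata and reserved space (an approximation only).
# usage: raidz_usable <disks> <parity> <disk_size>
raidz_usable() {
    disks=$1
    parity=$2
    size=$3
    # A RAID-Z vdev needs at least one data disk besides the parity disks.
    if [ "$disks" -le "$parity" ]; then
        echo "need more disks than parity" >&2
        return 1
    fi
    echo $(( (disks - parity) * size ))
}

raidz_usable 3 1 4   # RAID-Z1 with 3 x 4 TB, prints 8
raidz_usable 6 2 4   # RAID-Z2 with 6 x 4 TB, prints 16
```

The unit of the result is whatever unit the disk size was given in (TB above).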
And it does not stop at pooling drives; it also brings a bunch of great features to the filesystem level.
- Copy-on-Write - Modified data is always written to a new block instead of overwriting the old one, so an interrupted write doesn’t corrupt the original data.
- Snapshots - Thanks to Copy-on-Write, ZFS tracks changes at the block level. A snapshot takes almost no extra space when created and can easily be rolled back to.
- Integrity verification and automatic repairs - All written data is checksummed and verified when accessed, then repaired from a redundant copy if there is a mismatch.
From running ZFS on my NAS for a few years now, I’ve gathered some notes on basic tuning. As I’m still learning and experimenting I might add more to this later but for now - these are my humble tips.
Alignment Shift (only on creation of vdev) - if unsure: ashift=12⌗
The ashift (Alignment Shift) option lets us choose the block allocation size on a vdev, expressed as a power of two. Best performance is gained when this size matches the sector size of the physical drive. Older drives had a small sector size of just 512 bytes, while the most common size now is 4KiB.
When a vdev is created, ZFS tries to determine the best value automatically if none is specified. Most of the time the option should be ashift=12 (2^12 = 4096 bytes = 4KiB).
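Since ashift is just the exponent of the block size, the matching value for a given sector size can be worked out with a small loop. This is only an illustrative sketch; `ashift_for_sector` is a name made up here, not a ZFS command.

```shell
# Smallest ashift whose block size (2^ashift bytes) covers the given
# physical sector size in bytes.
# usage: ashift_for_sector <sector_size_bytes>
ashift_for_sector() {
    sector=$1
    shift_bits=0
    block=1
    # Double the block size until it is at least as large as the sector.
    while [ "$block" -lt "$sector" ]; do
        block=$(( block * 2 ))
        shift_bits=$(( shift_bits + 1 ))
    done
    echo "$shift_bits"
}

ashift_for_sector 512    # prints 9
ashift_for_sector 4096   # prints 12
```

On Linux, the sector sizes a drive reports can be listed with `lsblk -o NAME,PHY-SEC,LOG-SEC`. Drives don’t always report their real physical sector size, which is one more reason to go with ashift=12 if unsure.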
Check current ashift:
# specific pool:
zpool get ashift tank
# all pools:
zpool get ashift
# if the above doesn't work:
zpool get all | grep ashift
Set the option when creating a new pool:
# Create a pool:
zpool create -o ashift=12 tank mirror sda sdb
# when adding more drives:
zpool add -o ashift=12 tank mirror sdc sdd
Record size⌗
Record size is the maximum block size ZFS uses for files in a dataset. Changing it doesn’t affect existing data, only new (or re-written) data.
General rule of thumb (source: klarasystems.com)⌗
- 1MiB for general-purpose file sharing/storage
- 1MiB for BitTorrent download folders—this minimizes the impact of fragmentation!
- 64KiB for KVM virtual machines using Qcow2 file-based storage
- 16KiB for MySQL InnoDB
- 8KiB for PostgreSQL
- Default: 128KiB
Set with:
sudo zfs set recordsize=128K tank
sudo zfs set recordsize=64K tank/VMs
sudo zfs set recordsize=1M tank/media
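If you set up datasets from a script, the rule of thumb above can be wrapped in a small lookup. This is a sketch only: the function `recordsize_for` and its workload names are invented for this example, and `tank/pgdata` is a hypothetical dataset.

```shell
# Hypothetical helper: map a workload name to the rule-of-thumb recordsize
# from the list above. Unknown workloads fall back to the 128K default.
recordsize_for() {
    case "$1" in
        general|torrent) echo 1M ;;
        kvm-qcow2)       echo 64K ;;
        mysql-innodb)    echo 16K ;;
        postgresql)      echo 8K ;;
        *)               echo 128K ;;
    esac
}

recordsize_for postgresql   # prints 8K
# Example use (dataset name is an assumption):
# sudo zfs set recordsize="$(recordsize_for postgresql)" tank/pgdata
```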
Access time - atime⌗
The atime option lets us choose whether the access time of a file is updated or not. ZFS keeps track of three timestamps per file: access, modified and changed (atime, mtime, ctime). With atime on, ZFS has to update the timestamp every time a file is read, which turns reads into extra writes and costs performance.
We can easily turn this option off to save on wear and gain some performance:
# Check current:
zfs get atime tank
# Set atime to off:
zfs set atime=off tank
Compression⌗
Compression usually speeds up writing to the drives, because the data is compressed first and less of it has to be written. Look at this example from a dataset of VM images:
Metric | off | on | gzip | lz4 | zstd |
---|---|---|---|---|---|
Data [MiB] | 101376 | 62976 | 52940.8 | 62976 | 54067.2 |
Compression | 1.00 | 1.61 | 1.91 | 1.61 | 1.87 |
Time [s] | 1005 | 611 | 973 | 616 | 552 |
Speed [MiB/s] | 100.9 | 165.9 | 104.2 | 164.6 | 183.7 |
Or this example of a dataset of documents:
Metric | off | on | gzip | lz4 | zstd |
---|---|---|---|---|---|
Data [MiB] | 58163.2 | 47923.2 | 45465.6 | 47923.2 | 45363.1 |
Compression | 1.00 | 1.21 | 1.27 | 1.21 | 1.28 |
Time [s] | 794 | 756 | 780 | 753 | 750 |
Speed [MiB/s] | 73.3 | 76.9 | 74.6 | 77.2 | 77.6 |
Here on is the default setting, which uses lz4 compression.
The rows show data size on disk, compression ratio, time to write, and write speed.
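The ratio and speed rows follow from the size and time rows. Judging by the numbers, the ratio is the uncompressed size divided by the size on disk, and the speed is the uncompressed size divided by the write time. A small sketch to reproduce them (the function name `compression_stats` is made up for this example):

```shell
# Recompute compression ratio and write speed from raw measurements.
# usage: compression_stats <uncompressed_MiB> <on_disk_MiB> <seconds>
compression_stats() {
    awk -v raw="$1" -v disk="$2" -v secs="$3" 'BEGIN {
        # ratio: how many times smaller the data is on disk
        # speed: uncompressed MiB handled per second
        printf "ratio=%.2f speed=%.1f\n", raw / disk, raw / secs
    }'
}

# lz4 column of the VM-image table:
compression_stats 101376 62976 616   # prints ratio=1.61 speed=164.6
```

For a dataset that already exists, `zfs get compressratio tank` reports the ratio ZFS has achieved so far.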
zstd wins on most occasions and is a very good all-round choice thanks to its good compression/CPU ratio.
While not shown above, lz4 performs better when there’s more static data (e.g. media) due to its focus on read speed.
Set compression:
zfs get compression tank
zfs set compression=lz4 tank
zfs set compression=zstd tank/VMs
Xattr (Linux Extended Attributes)⌗
Extended attributes are name:value pairs associated permanently with files and directories, similar to the environment strings associated with a process.
By default these attributes may be written to disk in hidden sub-directories, which means more IO requests when accessing a file (several attributes may be read at once). Changing this from on (dir) to sa (System Attributes) stores the attributes directly in the inodes instead!
Check current option:
# Specific pool:
zfs get xattr tank
# All pools/datasets:
zfs get xattr
Change the option to sa:
zfs set xattr=sa tank
That’s it for now - I’ll add to this if I explore something more soon!