ZFS Snapshots: The case of the vanishing disk usage

3 Jun 2022

I have a home server that runs TrueNAS (which is built on FreeBSD), using ZFS for its storage filesystem. I recently ran into the common problem of not knowing where my disk usage was going, discovered that disk usage accounting for snapshots is more complex than for regular filesystems, and thought I’d write it up. Hopefully others find it helpful and/or enlightening.

The Problem

Disk usage generally seems like the sort of thing that should be easy to reason about. With modern filesystems, however, that isn’t really true anymore, particularly when it comes to copy-on-write snapshots like the ones ZFS supports.

On my server I have a directory called ‘backup’. Every now and then I zip up some files and dump them there in case I need them later. Here is what du claims is the amount of disk space consumed by that directory:

% du -d 0 -h /mnt/Vol_1/Home/Jacques/backup/
256G    /mnt/Vol_1/Home/Jacques/backup/

I also take monthly snapshots of this folder. Here is what zfs claims is the amount of disk space consumed by those snapshots:

% zfs list -t snapshot /mnt/Vol_1/Home/Jacques/backup/
NAME                                                              USED  AVAIL     REFER  MOUNTPOINT
Vol_1/Home/Jacques/backup@snap-2021-04-01_03:00                    72K      -     27.3G  -
Vol_1/Home/Jacques/backup@snap-2021-05-01_03:00                    72K      -      581G  -
Vol_1/Home/Jacques/backup@snap-2021-06-01_03:00-25monthlifetime   384K      -      581G  -
Vol_1/Home/Jacques/backup@snap-2021-07-01_03:00-25monthlifetime   384K      -      609G  -
Vol_1/Home/Jacques/backup@snap-2021-08-01_03:00-25monthlifetime   376K      -      609G  -
Vol_1/Home/Jacques/backup@snap-2021-09-01_03:00-25monthlifetime   376K      -      609G  -
Vol_1/Home/Jacques/backup@snap-2021-10-01_03:00-25monthlifetime   344K      -      609G  -
Vol_1/Home/Jacques/backup@snap-2021-11-01_03:00-25monthlifetime   352K      -      609G  -
Vol_1/Home/Jacques/backup@snap-2022-02-01_03:00-25monthlifetime   376K      -      637G  -
Vol_1/Home/Jacques/backup@snap-2022-03-01_03:00-25monthlifetime   360K      -      637G  -
Vol_1/Home/Jacques/backup@snap-2022-04-01_03:00-25monthlifetime   416K      -      637G  -
Vol_1/Home/Jacques/backup@snap-2022-05-01_03:00-25monthlifetime   432K      -      256G  -
Vol_1/Home/Jacques/backup@snap-2022-06-01_03:00-25monthlifetime   432K      -      256G  -

Not a whole lot! This is one of the excellent features of ZFS: until I change or delete files in the live filesystem, a snapshot of them consumes almost no disk space of its own.
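
As an aside: the REFER column shows how much data each snapshot can see, but if you want to know how much new data landed between one snapshot and the next, the standard written property reports exactly that. A quick sketch against one of the snapshots above:

% zfs get written Vol_1/Home/Jacques/backup@snap-2021-07-01_03:00-25monthlifetime

For a snapshot, written is the amount of referenced space written since the previous snapshot, which is a different quantity from both USED (exclusively referenced) and REFER (everything the snapshot can see).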

You can imagine my surprise, then, when I got a notification saying that I was running out of disk space, checked what zfs (rather than du) said about the usage, and got this:

% zfs list -t filesystem /mnt/Vol_1/Home/Jacques/backup/
NAME                        USED  AVAIL     REFER  MOUNTPOINT
Vol_1/Home/Jacques/backup   637G   279G      256G  /mnt/Vol_1/Home/Jacques/backup

637G used. So we’re just…missing…around 381G of disk space? That’s odd. A closer look at some man pages reveals a reasonable explanation, though. In particular, the zfsprops man page says this about the USED property of a snapshot:

The used space of a snapshot (...) is space that is referenced exclusively by this snapshot

This is actually quite sensible. The USED column tells you how much storage would be freed by destroying just that one individual snapshot, so any data that another snapshot (or the live filesystem) still references doesn’t count toward it. How, then, do you tell how much space would be freed by destroying multiple snapshots?
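
Here’s a hypothetical illustration of why the per-snapshot USED values can’t simply be summed (pool/fs and bigfile are made-up names, not from my server):

% zfs snapshot pool/fs@a     # @a now references bigfile's blocks
% zfs snapshot pool/fs@b     # so does @b
% rm /pool/fs/bigfile        # the live filesystem no longer does

The blocks behind bigfile are now held by both @a and @b, so they are exclusive to neither: each snapshot’s USED stays near zero, yet destroying both snapshots would free the whole file. Summing the USED column therefore tells you very little.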

The Solution

Ask zfs destroy while very carefully explaining to it that you don’t actually want to delete anything:

zfs destroy -nv filesystem@snapshot1%snapshot2

This asks ZFS to destroy every snapshot from snapshot1 through snapshot2 (inclusive). Note the -nv in there, though: -n tells it to do a “dry run” so that it doesn’t actually delete anything, and -v tells it to print verbose information about what it would do, including how much space it would reclaim.

Let’s give that a try:

% zfs destroy -nv Vol_1/Home/Jacques/backup@snap-2021-04-01_03:00%snap-2022-06-01_03:00-25monthlifetime
would destroy Vol_1/Home/Jacques/backup@snap-2021-04-01_03:00
would destroy Vol_1/Home/Jacques/backup@snap-2021-05-01_03:00
would destroy Vol_1/Home/Jacques/backup@snap-2021-06-01_03:00-25monthlifetime
would destroy Vol_1/Home/Jacques/backup@snap-2021-07-01_03:00-25monthlifetime
would destroy Vol_1/Home/Jacques/backup@snap-2021-08-01_03:00-25monthlifetime
would destroy Vol_1/Home/Jacques/backup@snap-2021-09-01_03:00-25monthlifetime
would destroy Vol_1/Home/Jacques/backup@snap-2021-10-01_03:00-25monthlifetime
would destroy Vol_1/Home/Jacques/backup@snap-2021-11-01_03:00-25monthlifetime
would destroy Vol_1/Home/Jacques/backup@snap-2022-02-01_03:00-25monthlifetime
would destroy Vol_1/Home/Jacques/backup@snap-2022-03-01_03:00-25monthlifetime
would destroy Vol_1/Home/Jacques/backup@snap-2022-04-01_03:00-25monthlifetime
would destroy Vol_1/Home/Jacques/backup@snap-2022-05-01_03:00-25monthlifetime
would destroy Vol_1/Home/Jacques/backup@snap-2022-06-01_03:00-25monthlifetime
would reclaim 382G

Assuming that the difference between this 382G and the 381G we computed earlier is just rounding, that explains the missing storage capacity. Interestingly, note that in the zfs list output above, the REFER value does match what du told us: REFER counts the data the live filesystem can currently see, which is exactly what du walks over, while USED also counts old blocks still pinned by snapshots.
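
As it happens, there’s also a shortcut for the “destroy everything” case that doesn’t involve zfs destroy at all: the usedbysnapshots property, shown as the USEDSNAP column of zfs list -o space:

% zfs list -o space Vol_1/Home/Jacques/backup

That breaks USED down into USEDSNAP (space that would be freed by destroying all snapshots of the dataset), USEDDS (the dataset’s own data), USEDCHILD (descendants) and USEDREFRESERV (reservations); on this dataset, USEDSNAP should come out to roughly the same 382G. For an arbitrary subset of snapshots, though, the dry-run trick above is the way to go.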

I should close by saying that I would certainly have spent far longer investigating this had I not stumbled upon one Matthew McDonald on GitHub, who built a tool to help solve this mystery and wrote a readme that helpfully explains what’s going on. Many thanks to Matthew!