btrfs deduplication is really something… it actually gives raw img images elastic storage usage…

(维格纳的朋友) #1

I use bees. When I went over there to ask whether nocow breaks dedup and whether virtual disk images can be deduplicated at all, the developer told me along the way: "Even qcow2 I can dedup for you, never mind raw disk images; subvolumes, snapshots, compression, all of those still get deduplicated. Encryption is the only iffy one. That said, I'd recommend a raw img for better performance, otherwise you get double copy-on-write, and btrfs has snapshots anyway."
Once he said that, it hit me: my UEFI-booted QEMU VM apparently can't take internal snapshots, and I had forgotten btrfs snapshots were an option. Because the default snapshots taken during system upgrades once nearly filled my disk, I'd always assumed that snapshotting a virtual disk image would stuff the disk full in no time.
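For reference, a btrfs snapshot of the subvolume holding the image gives much the same rollback that an internal VM snapshot would. A minimal sketch, assuming the images live in their own subvolume; all paths (`~/.vm`, `Win10.img`, the dated snapshot names) are illustrative, not from the thread:

```shell
# Assumes ~/.vm was created as its own subvolume (btrfs subvolume create ~/.vm).
# Take a read-only snapshot before a risky guest upgrade; thanks to CoW it
# costs almost nothing until the image diverges:
sudo btrfs subvolume snapshot -r ~/.vm ~/.vm-snap-$(date +%F)

# Roll back by reflinking the image out of the snapshot (instant, no data copy;
# the dated path below is a hypothetical earlier snapshot):
cp --reflink=always ~/.vm-snap-2024-01-01/Win10.img ~/.vm/Win10.img

# Delete the snapshot when done to release the diverged blocks:
sudo btrfs subvolume delete ~/.vm-snap-2024-01-01
```

Unlike qcow2 internal snapshots, these are taken from the host and work regardless of how the guest is booted.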

sudo btrfs filesystem usage /home
Overall:
    Device size:                   3.49TiB
    Device allocated:              1.22TiB
    Device unallocated:            2.27TiB
    Device missing:                  0.00B
    Used:                        972.49GiB
    Free (estimated):              2.54TiB      (min: 2.54TiB)
    Data ratio:                       1.00
    Metadata ratio:                   1.00
    Global reserve:              512.00MiB      (used: 0.00B)

Data,single: Size:1.21TiB, Used:969.53GiB
   /dev/nvme0n1p1          1.21TiB

Metadata,single: Size:5.01GiB, Used:2.95GiB
   /dev/nvme0n1p1          5.01GiB

System,single: Size:4.00MiB, Used:160.00KiB
   /dev/nvme0n1p1          4.00MiB

Unallocated:
   /dev/nvme0n1p1          2.27TiB

qemu-img convert -f qcow2 -O raw ~/.vm/Win10.qcow2 ~/.vm/Win10.img

rm -rf ~/.vm/Win10.qcow2
   
sudo btrfs filesystem usage /home
Overall:
    Device size:                   3.49TiB
    Device allocated:              1.07TiB
    Device unallocated:            2.42TiB
    Device missing:                  0.00B
    Used:                        971.70GiB
    Free (estimated):              2.54TiB      (min: 2.54TiB)
    Data ratio:                       1.00
    Metadata ratio:                   1.00
    Global reserve:              512.00MiB      (used: 80.00KiB)

Data,single: Size:1.06TiB, Used:968.77GiB
   /dev/nvme0n1p1          1.06TiB

Metadata,single: Size:5.01GiB, Used:2.94GiB
   /dev/nvme0n1p1          5.01GiB

System,single: Size:4.00MiB, Used:160.00KiB
   /dev/nvme0n1p1          4.00MiB

Unallocated:
   /dev/nvme0n1p1          2.42TiB

In my test the raw img actually seems to get a higher dedup hit rate!
Does raw img flat-out beat qcow2?
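Part of the "elastic" behaviour is plain sparseness: `qemu-img convert` skips zero ranges, so a raw image only allocates blocks that hold data, and dedup shrinks it further. You can see the gap between apparent size and real allocation with `du`; a minimal demonstration (the path is illustrative):

```shell
# A freshly truncated raw image is fully sparse: large apparent size,
# near-zero actual allocation.
truncate -s 10G disk.img

ls -lh disk.img                  # apparent size: 10G
du -h disk.img                   # blocks actually allocated: ~0
du -h --apparent-size disk.img   # back to 10G
```

The same comparison run on a deduplicated image shows how much of its nominal size is really backed by unique extents.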

Come to think of it, the cold data on my mechanical drive already comes to about 1 TiB after compression,
plus the virtual disk images and a pile of openwrt source trees.
Deduplicated down to this,
truly :water_buffalo::beer:
Still, keep it off HDDs: dedup apparently causes heavy fragmentation, so on an HDD it would probably drop dead on the spot…

Huh? A 3.49 TiB NVMe drive?
Don't mind that, it's just one of those secondhand imported enterprise drives you see everywhere~

Copy-on-write allows all writes to be continuous–since every write relocates data, all writes can be relocated to contiguous areas, even if the writes themselves are randomly ordered.

If a file is written randomly, then later sequential reads will be slower. The sequential logical order of the reads will not match the random physical order of data on the disk.

If a file is written continuously, then later sequential reads will be faster. This is how the btrfs ‘defrag’ feature works–it simply copies fragmented data into a contiguous area in order, so that future sequential reads are in logical and physical order at the same time.

If a file is read continuously, then performance will be proportional to the size of each non-consecutive fragment. There will be one seek per fragment, plus another seek to read a new metadata block on every ~100th fragment. On SSDs the seeks are replaced with IO transaction overheads, which are almost as expensive as physical head movements on SATA SSD devices.

If a file is read randomly (e.g. hash table lookups), then performance will be close to the worst-case rate all the time.

Data extent fragmentation makes random read performance a little worse, but metadata pages usually fit in RAM cache, so once the cache is hot, only the data block reads contribute significantly to IO load. If you have a really large file and the metadata pages don’t fit in RAM cache, then you’ll take a metadata page read hit for every data block, and on a fast SSD that can be an 80% performance loss (one 16K metadata page random read for each 4K data page random read). Slow disks only have a 50% performance loss (the seek time dominates, so the 16K random read cost is equivalent to the 4K one).

Double the RAM cache costs and/or performance losses from fragmentation if csums are used (each read needs another O(log(n)) metadata page lookup for the csum).

That's the bees developer's detailed explanation of btrfs copy-on-write~
……
Could one of the experts here do the reading comprehension for me?
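For what it's worth, the 80% and 50% figures in the quote fall out of simple accounting, under the stated assumption of one 16K metadata read per 4K data read:

```shell
# Fast SSD: cost is roughly proportional to bytes transferred. Each 4K data
# read drags in a 16K metadata page, so only 4K of every 20K read is payload:
echo "SSD throughput retained: $(( 100 * 4 / (4 + 16) ))%"   # prints 20% -> 80% loss

# Slow HDD: cost is dominated by seeks, and a 16K random read seeks just like
# a 4K one. Two seeks (metadata + data) instead of one:
echo "HDD throughput retained: $(( 100 * 1 / 2 ))%"          # prints 50% -> 50% loss
```

The last paragraph of the quote then doubles these metadata costs again when csums add their own per-read lookup.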


Once the relevant kernel patches are merged, the bees developer plans to add the ability to drop duplicate blocks before they are ever written to disk, to curb SSD write amplification and lower the write load, and, if possible, to run dedup and defragmentation together.

#2

Supported, though I don't understand a word of it,

(benren) #3

Is this deduplication done in real time?

(维格纳的朋友) #4

Yes, as you can see.
After I deleted the qcow2 image it released the storage space.
And the raw img image seems to get a higher dedup hit rate.

(维格纳的朋友) #5

For now, though, it works like this: after you write two copies of some duplicate data, it deletes one copy and links it to the other (shares the extents), which adds one extra write operation. That doesn't necessarily worsen SSD write amplification, though, since some files may get deduplicated while still in the page cache.
Keep it off HDDs for now: removing duplicate blocks by itself breaks file contiguity, and defragmenting then seems to re-trigger dedup… a loop.
The initial whole-disk pass is just one round of deleting all the duplicate data and linking it, which is fine.
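To check how much of that linking actually happened, btrfs can report shared versus exclusive bytes per subtree. A sketch, with `~/.vm` standing in for wherever the images live:

```shell
# 'Shared' counts extents referenced more than once (by dedup, reflinks, or
# snapshots); 'Exclusive' counts extents unique to the files under the path:
sudo btrfs filesystem du -s ~/.vm

# compsize (package: compsize) gives a subtree summary that also separates
# compression savings from extent sharing:
sudo compsize ~/.vm
```

A high Shared column after a bees pass is the direct evidence that duplicate extents were merged rather than merely deleted.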

(runapp) #6

Have you tested the IO performance inside the VM?