[GH-ISSUE #83] Clone should probably be ignored / size computed differently #33

Closed
opened 2026-06-08 11:25:23 +03:00 by zhus · 15 comments
Owner

Originally created by @Babwin on GitHub (Mar 21, 2020).
Original GitHub issue: https://github.com/bootandy/dust/issues/83

Hello,

First, thanks for this amazing tool.
Thanks to it, I was able to clean a lot of bull*** off my computer.

I'm opening this issue on a subject I don't really understand, but I hope it helps.

Thanks to the BSD/Apple clone feature, I am able to clone directories and files with `cp -c`.

As I understand it, a clone is a kind of weird hard link, except that when you write to it, only the changes are saved.

In my example, you can see that I cloned the dankest movie in my library a few times, and `df -h` doesn't report any difference in disk usage.

dust, however, reports each clone as if it took up additional space on disk.

```
[Movies] df -h
Filesystem      Size   Used  Avail Capacity     iused      ifree %iused  Mounted on
/dev/disk1s1   466Gi   10Gi   90Gi    11%      484283 4881968597    0%   /
devfs          403Ki  403Ki    0Bi   100%        1404          0  100%   /dev
/dev/disk1s2   466Gi  352Gi   90Gi    80%     3172969 4879279911    0%   /System/Volumes/Data
/dev/disk1s5   466Gi   12Gi   90Gi    12%          12 4882452868    0%   /private/var/vm
map auto_home    0Bi    0Bi    0Bi   100%           0          0  100%   /System/Volumes/Data/home
/dev/disk2s2   105Mi  105Mi    0Bi   100%           3 4294967276    0%   /Volumes/Install Google Drive File Stream
drivefs         30Gi  7.0Gi   23Gi    24% 18446744069414596880 4294967295 146880675765702656%   /Volumes/GoogleDrive
drivefs         30Gi  7.0Gi   23Gi    24% 18446744069414740697 4294967295 11796403584574832%   /Volumes/GoogleDrive
[Movies] cp -cR Sonic.the.Hedgehog.2020.720p.HDRip.XviD.MP3-STUTTERSHIT clone1
[Movies] cp -cR Sonic.the.Hedgehog.2020.720p.HDRip.XviD.MP3-STUTTERSHIT clone2
[Movies] cp -cR Sonic.the.Hedgehog.2020.720p.HDRip.XviD.MP3-STUTTERSHIT clone3
[Movies] cp -cR Sonic.the.Hedgehog.2020.720p.HDRip.XviD.MP3-STUTTERSHIT clone4
[Movies] cp -cR Sonic.the.Hedgehog.2020.720p.HDRip.XviD.MP3-STUTTERSHIT clone5
[Movies] cp -cR Sonic.the.Hedgehog.2020.720p.HDRip.XviD.MP3-STUTTERSHIT clone6
[Movies] df -h
Filesystem      Size   Used  Avail Capacity     iused      ifree %iused  Mounted on
/dev/disk1s1   466Gi   10Gi   90Gi    11%      484283 4881968597    0%   /
devfs          403Ki  403Ki    0Bi   100%        1404          0  100%   /dev
/dev/disk1s2   466Gi  352Gi   90Gi    80%     3172987 4879279893    0%   /System/Volumes/Data
/dev/disk1s5   466Gi   12Gi   90Gi    12%          12 4882452868    0%   /private/var/vm
map auto_home    0Bi    0Bi    0Bi   100%           0          0  100%   /System/Volumes/Data/home
/dev/disk2s2   105Mi  105Mi    0Bi   100%           3 4294967276    0%   /Volumes/Install Google Drive File Stream
drivefs         30Gi  7.0Gi   23Gi    24% 18446744069414596880 4294967295 146880675765702656%   /Volumes/GoogleDrive
drivefs         30Gi  7.0Gi   23Gi    24% 18446744069414740697 4294967295 11796403584574832%   /Volumes/GoogleDrive
[Movies] dust
  46G ─┬ .
  24G  ├─┬ Star.Wars.The.Clone.Wars.S01.1080p.BluRay.x264-FLHD[rartv]
 1.1G  │ ├── Star.Wars.The.Clone.Wars.S01E22.1080p.BluRay.x264-FLHD.mkv
 1.1G  │ ├── Star.Wars.The.Clone.Wars.S01E20.1080p.BluRay.x264-FLHD.mkv
 1.1G  │ └── Star.Wars.The.Clone.Wars.S01E16.1080p.BluRay.x264-FLHD.mkv
 2.9G  ├─┬ Sonic.the.Hedgehog.2020.720p.HDRip.XviD.MP3-STUTTERSHIT
 2.9G  │ └── Sonic.the.Hedgehog.2020.720p.HDRip.XviD.MP3-STUTTERSHIT.avi
 2.9G  ├─┬ clone1
 2.9G  │ └── Sonic.the.Hedgehog.2020.720p.HDRip.XviD.MP3-STUTTERSHIT.avi
 2.9G  ├─┬ clone2
 2.9G  │ └── Sonic.the.Hedgehog.2020.720p.HDRip.XviD.MP3-STUTTERSHIT.avi
 2.9G  ├─┬ clone3
 2.9G  │ └── Sonic.the.Hedgehog.2020.720p.HDRip.XviD.MP3-STUTTERSHIT.avi
 2.9G  ├─┬ clone4
 2.9G  │ └── Sonic.the.Hedgehog.2020.720p.HDRip.XviD.MP3-STUTTERSHIT.avi
 2.9G  ├─┬ clone5
 2.9G  │ └── Sonic.the.Hedgehog.2020.720p.HDRip.XviD.MP3-STUTTERSHIT.avi
 2.9G  ├─┬ clone6
 2.9G  │ └── Sonic.the.Hedgehog.2020.720p.HDRip.XviD.MP3-STUTTERSHIT.avi
 1.4G  └─┬ Zero.Dark.Thirty.2012.1080p.BluRay.x265.10bit-z97
 1.4G    └── Zero.Dark.Thirty.2012.1080p.BluRay.x265.10bit-z97.mkv
```

I had never written any Rust until 10 minutes ago, when I tried to see whether the Metadata and FileType structures could help here. It seems not.

In fact, I have no idea how to tell whether a file is a clone or not (see https://stackoverflow.com/questions/46417747/apple-file-system-apfs-check-if-file-is-a-clone-on-terminal-shell).

I'd be glad to help if you don't have a Mac to try things out on, but I don't think I'd be able to submit a PR.

Best Regards,

zhus closed this issue 2026-06-08 11:25:23 +03:00
Author
Owner

@bootandy commented on GitHub (Mar 21, 2020):

Hey,

Thanks, nice find.

From reading about Apple's copy, it appears that the Linux equivalent is `cp --reflink=always`. Sadly, my Linux install doesn't support this :-(. I do not have OS X.

From man cp:
--reflink[=always] is specified, perform a lightweight copy, where the data blocks are copied only when modified. If this is not possible the copy fails

Can you please run `ls -li` on the directories containing the original and the cloned object? I am curious whether the inodes (the number in the first column) are different for the cloned object.

@bootandy commented on GitHub (Mar 21, 2020):

Might be able to fix this issue using this:
http://m4rw3r.github.io/rust/std/os/macos/fs/trait.MetadataExt.html

@niboo-ave commented on GitHub (Mar 22, 2020):

Here is the output of `ls -li` (inside the Movies dir, the source dir, and one of the target dirs):

```
[Movies] ls -li
total 0
19238843 drwxr-xr-x   4 antoine  staff   128 Mar 11 23:14 Sonic.the.Hedgehog.2020.720p.HDRip.XviD.MP3-STUTTERSHIT
 8436613 drwxr-xr-x  27 antoine  staff   864 Jan  6 23:40 Star.Wars.The.Clone.Wars.S01.1080p.BluRay.x264-FLHD[rartv]
14311823 drwxr-xr-x   5 antoine  staff   160 Mar  9 11:21 TV
 7932834 drwx------   4 antoine  staff   128 Dec 20  2017 Zero.Dark.Thirty.2012.1080p.BluRay.x265.10bit-z97
21151541 drwxr-xr-x   4 antoine  staff   128 Mar 22 14:07 clone1
21151546 drwxr-xr-x   4 antoine  staff   128 Mar 22 14:07 clone2
21151556 drwxr-xr-x   4 antoine  staff   128 Mar 22 14:07 clone3
21151566 drwxr-xr-x   4 antoine  staff   128 Mar 22 14:07 clone4
21151572 drwxr-xr-x   4 antoine  staff   128 Mar 22 14:07 clone5
  603418 drwxr-xr-x  76 antoine  staff  2432 Mar 18 18:07 funnyWEBM
[Movies] ls -li Sonic.the.Hedgehog.2020.720p.HDRip.XviD.MP3-STUTTERSHIT
total 6104072
19242305 -rw-r--r--  1 antoine  staff          31 Mar 11 23:14 RARBG.txt
19238844 -rw-r--r--@ 1 antoine  staff  3123007576 Mar 11 23:22 Sonic.the.Hedgehog.2020.720p.HDRip.XviD.MP3-STUTTERSHIT.avi
[Movies] ls -li clone1
total 6104072
21151542 -rw-r--r--  1 antoine  staff          31 Mar 11 23:14 RARBG.txt
21151543 -rw-r--r--@ 1 antoine  staff  3123007576 Mar 11 23:22 Sonic.the.Hedgehog.2020.720p.HDRip.XviD.MP3-STUTTERSHIT.avi
```

I don't think it's exactly the "same", since it's based on an Apple File System feature, but it's probably the same idea.

Here is an extract from the man page on my Mac OS X; I can't find it anywhere online.

```
-c    copy files using clonefile(2)
```

I will check out MetadataExt.
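
For anyone who wants to reproduce clones from code rather than the shell, here is a minimal sketch of what `cp -c` (via `clonefile(2)`) and `cp --reflink=always` do. It assumes macOS's `int clonefile(const char *src, const char *dst, int flags)` from libc and, on Linux, the `FICLONE` ioctl from `<linux/fs.h>` (request value `0x40049409`); both are declared by hand so no crates are needed. `clone_file` is a hypothetical helper, not part of dust. Like `--reflink=always`, it fails instead of falling back to a full copy.

```rust
use std::io;
use std::path::Path;

/// Clone `src` to a new file `dst` without copying data blocks.
/// `dst` must not exist yet. Returns an error instead of doing a full copy.
#[cfg(target_os = "macos")]
pub fn clone_file(src: &Path, dst: &Path) -> io::Result<()> {
    use std::ffi::CString;
    use std::os::raw::{c_char, c_int};
    use std::os::unix::ffi::OsStrExt;

    extern "C" {
        // int clonefile(const char *src, const char *dst, int flags);
        fn clonefile(src: *const c_char, dst: *const c_char, flags: c_int) -> c_int;
    }
    let src = CString::new(src.as_os_str().as_bytes())?;
    let dst = CString::new(dst.as_os_str().as_bytes())?;
    // SAFETY: both pointers are valid NUL-terminated strings during the call.
    if unsafe { clonefile(src.as_ptr(), dst.as_ptr(), 0) } == 0 {
        Ok(())
    } else {
        Err(io::Error::last_os_error())
    }
}

#[cfg(target_os = "linux")]
pub fn clone_file(src: &Path, dst: &Path) -> io::Result<()> {
    use std::fs::{File, OpenOptions};
    use std::os::raw::{c_int, c_ulong};
    use std::os::unix::io::AsRawFd;

    extern "C" {
        fn ioctl(fd: c_int, request: c_ulong, ...) -> c_int;
    }
    // FICLONE = _IOW(0x94, 9, int), generic ioctl encoding (x86, ARM).
    const FICLONE: c_ulong = 0x4004_9409;

    let src_file = File::open(src)?;
    // create_new mirrors clonefile(2): refuse to overwrite an existing dst.
    let dst_file = OpenOptions::new().write(true).create_new(true).open(dst)?;
    // SAFETY: both descriptors are open for the duration of the call.
    if unsafe { ioctl(dst_file.as_raw_fd(), FICLONE, src_file.as_raw_fd()) } == 0 {
        Ok(())
    } else {
        let err = io::Error::last_os_error();
        let _ = std::fs::remove_file(dst); // we created it, so clean it up
        Err(err)
    }
}

#[cfg(not(any(target_os = "macos", target_os = "linux")))]
pub fn clone_file(_src: &Path, _dst: &Path) -> io::Result<()> {
    Err(io::Error::new(io::ErrorKind::Other, "cloning not supported here"))
}

fn main() {
    let args: Vec<String> = std::env::args().collect();
    if args.len() != 3 {
        eprintln!("usage: {} SRC DST", args[0]);
        std::process::exit(2);
    }
    match clone_file(Path::new(&args[1]), Path::new(&args[2])) {
        Ok(()) => println!("cloned {} -> {}", args[1], args[2]),
        Err(e) => eprintln!("clone failed: {}", e),
    }
}
```

On Linux the ioctl only succeeds on filesystems with reflink support (for example Btrfs or XFS); elsewhere it returns an error and the empty destination is removed.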

@bootandy commented on GitHub (Mar 22, 2020):

Those files have different inodes, so I assume they are different files. There might be something we can plug into in the macOS-specific MetadataExt (http://m4rw3r.github.io/rust/std/os/macos/fs/trait.MetadataExt.html) to detect this, but whoever fixes this will likely need a Mac.
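
The `ls -li` comparison can also be done in code. A minimal sketch, assuming a Unix target and the portable `std::os::unix::fs::MetadataExt` accessors `dev()` and `ino()`: two paths name the same file only when both the device ID and the inode number match. This catches hard links; as the listing above shows, a clone gets its own inode, so this check treats it as a separate file. `same_file` is a hypothetical helper name.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::MetadataExt;
use std::path::Path;

/// True if `a` and `b` are the same file (same device and inode),
/// e.g. two hard links. Clones have distinct inodes and return false.
fn same_file(a: &Path, b: &Path) -> io::Result<bool> {
    let (ma, mb) = (fs::metadata(a)?, fs::metadata(b)?);
    Ok(ma.dev() == mb.dev() && ma.ino() == mb.ino())
}

fn main() -> io::Result<()> {
    let args: Vec<String> = std::env::args().collect();
    if args.len() != 3 {
        eprintln!("usage: {} PATH_A PATH_B", args[0]);
        std::process::exit(2);
    }
    let (a, b) = (Path::new(&args[1]), Path::new(&args[2]));
    println!("same file: {}", same_file(a, b)?);
    Ok(())
}
```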

@niboo-ave commented on GitHub (Mar 22, 2020):

As I said, I'm not used to Rust at all.

I don't really understand how to use traits:

```rust
#![allow(unused)]
fn main() -> std::io::Result<()> {
    use std::os::macos::fs::MetadataExt;
    use std::fs;

    let metadata = fs::metadata("/Users/antoine/Movies/clone1")?;

    impl MetadataExt for Metadata;

    println!("{:?}", metadata);

    Ok(())
}
```

What am I supposed to do with this?

@bootandy commented on GitHub (Mar 23, 2020):

Remove this:
`impl MetadataExt for Metadata;`

Add this:
`println!("{:?}", metadata.as_raw_stat());`

Then print that data for a regular file, a cloned file, and an unrelated file, and see if you can work out the difference between them.

This might be a bit beyond you if you aren't a Rust user, so don't worry too much.

@niboo-ave commented on GitHub (Mar 23, 2020):

I get this while compiling:

```
 --> test.rs:8:22
  |
8 |     println!("{:?}", metadata.as_raw_stat());
  |                      ^^^^^^^^^^^^^^^^^^^^^^ `std::os::macos::raw::stat` cannot be formatted using `{:?}` because it doesn't implement `std::fmt::Debug`
  |
  = help: the trait `std::fmt::Debug` is not implemented for `std::os::macos::raw::stat`
  = note: required because of the requirements on the impl of `std::fmt::Debug` for `&std::os::macos::raw::stat`
  = note: required by `std::fmt::Debug::fmt`

error: aborting due to previous error

For more information about this error, try `rustc --explain E0277`.
```

I tried not formatting the value, but then I get this one:

```
 --> test.rs:8:22
  |
8 |     println(metadata.as_raw_stat());
  |                      ^^^^^^^^^^^
  |
  = note: `#[warn(deprecated)]` on by default

error: aborting due to previous error

For more information about this error, try `rustc --explain E0423`.
```

@bootandy commented on GitHub (Mar 25, 2020):

OK, so it doesn't implement Debug, so you'd have to print each of the fields out manually.
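
A sketch of printing the fields one by one, assuming the portable `std::os::unix::fs::MetadataExt` accessors (`dev()`, `ino()`, `nlink()`, `blocks()`, and so on) instead of the deprecated `as_raw_stat()`; these work on both macOS and Linux. macOS-only fields such as `st_flags` or `st_birthtime` would need `std::os::macos::fs::MetadataExt` instead. `stat_fields` is a hypothetical helper. Running it on an original file, its clone, and an unrelated file puts the values side by side.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::MetadataExt;

/// The stat(2) fields most relevant to disk-usage accounting.
fn stat_fields(path: &str) -> io::Result<Vec<(&'static str, u64)>> {
    let m = fs::metadata(path)?;
    Ok(vec![
        ("dev", m.dev()),
        ("ino", m.ino()),
        ("mode", u64::from(m.mode())),
        ("nlink", m.nlink()),
        ("uid", u64::from(m.uid())),
        ("gid", u64::from(m.gid())),
        ("size", m.size()),
        // st_blocks is in 512-byte units, independent of st_blksize
        ("blocks", m.blocks()),
        ("blksize", m.blksize()),
    ])
}

fn main() -> io::Result<()> {
    // e.g. ./stat original.avi clone1/original.avi other.mkv
    for path in std::env::args().skip(1) {
        println!("{}", path);
        for (name, value) in stat_fields(&path)? {
            println!("  st_{:<8} {}", name, value);
        }
    }
    Ok(())
}
```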

@Babwin commented on GitHub (Mar 25, 2020):

I don't know if this helps:

```
[test] ./test
st_dev :16777221
st_uid :502
st_mode :16877
st_nlink:4
st_ino:21151541
st_uid: 502
st_gid: 20
st_rdev:0
st_atime:1584882512
st_atime_nsec:325028357
st_mtime:1584882447
st_mtime_nsec:667178615
st_ctime:1584882447
st_ctime_nsec:667178615
st_birthtime:1584882447
st_birthtime_nsec:664890469
st_size:128
st_blocks:0
st_blksize:4096
st_flags:0
st_gen:0
st_lspare:0
st_qspare:[0,0]
```

But I get this warning while compiling (for each field):

```
warning: use of deprecated item 'std::os::macos::raw::stat::st_qspare': these type aliases are no longer supported by the standard library, the `libc` crate on crates.io should be used instead for the correct definitions
```

I will try to check more deeply tonight.

@bootandy commented on GitHub (Aug 23, 2022):

I'll close this issue unless I hear anything in the next few days.

@mdekstrand commented on GitHub (Sep 9, 2022):

It looks like the raw metadata structure does not provide info to detect cloned files on macOS.

I'm not sure what levels of cloning are supported on APFS, but on Linux XFS and Btrfs, cloning is a block- or extent-level operation, not a file-level operation; if part of a file has been modified, then some blocks may be cloned but others not. Fully detecting clones, if it is even possible, is likely to require scanning a structure listing the blocks in the file. From `clonefile(2)` on my Mac, it looks like APFS probably works similarly.
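
To make the extent-level point concrete, here is a toy model with hypothetical types (not dust code): each file is a list of physical extents, such as a tool might obtain from Linux's FIEMAP. Adding up every file's extents counts a shared block once per clone; taking the union of the physical byte ranges counts each block once, even after a clone has been partly rewritten.

```rust
/// A physical extent: `len` bytes starting at disk offset `phys`.
/// Hypothetical model; a real tool would obtain these from the filesystem.
#[derive(Clone, Copy, Debug)]
struct Extent {
    phys: u64,
    len: u64,
}

/// Naive accounting: add up every file's extents, ignoring sharing.
fn naive_usage(files: &[Vec<Extent>]) -> u64 {
    files.iter().flatten().map(|e| e.len).sum()
}

/// Shared-aware accounting: size of the union of all physical byte ranges,
/// so a block referenced by several clones is counted once.
fn deduplicated_usage(files: &[Vec<Extent>]) -> u64 {
    let mut ranges: Vec<(u64, u64)> = files
        .iter()
        .flatten()
        .map(|e| (e.phys, e.phys + e.len))
        .collect();
    ranges.sort_unstable();
    let mut total = 0;
    let mut current: Option<(u64, u64)> = None;
    for (start, end) in ranges {
        match current {
            // Overlapping or touching: extend the current merged range.
            Some((s, e)) if start <= e => current = Some((s, e.max(end))),
            // Gap: close the current range and start a new one.
            Some((s, e)) => {
                total += e - s;
                current = Some((start, end));
            }
            None => current = Some((start, end)),
        }
    }
    if let Some((s, e)) = current {
        total += e - s;
    }
    total
}

fn main() {
    let original = vec![Extent { phys: 0, len: 3000 }];
    let clone = original.clone();
    // A clone whose last 1000 bytes were rewritten (copy-on-write moved them).
    let modified = vec![Extent { phys: 0, len: 2000 }, Extent { phys: 10_000, len: 1000 }];
    let files = vec![original, clone, modified];
    println!("naive: {} bytes", naive_usage(&files)); // 9000
    println!("deduplicated: {} bytes", deduplicated_usage(&files)); // 4000
}
```

Because only the extents of the files passed in are merged, this also behaves the way a directory-scoped tool would need to: a clone whose original lies outside the scanned set still counts in full.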

It would be very helpful for dust to have better support for these files, but I expect doing so is rather difficult. [This blog post](https://eclecticlight.co/2021/04/02/how-can-you-tell-whether-a-file-has-been-cloned-in-apfs/) discusses one adventure in trying to detect clones; it looks like you can detect that a file *may* have been cloned, but not necessarily that it *is* cloned.

@bootandy commented on GitHub (Sep 11, 2022):

Thanks for the information, @mdekstrand; it is interesting.

Sadly, I think this is going to be too tricky to solve.

@mdekstrand commented on GitHub (Sep 12, 2022):

I'm going to leave this here in case someone comes along and does want to work on this issue: on Linux, it looks like the [`FIEMAP` ioctl](https://www.kernel.org/doc/html/latest/filesystems/fiemap.html) is the way to obtain the detailed extent data needed to detect sharing.
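
A minimal Linux-only sketch of calling FIEMAP from Rust without crates. It assumes the `struct fiemap` / `struct fiemap_extent` layouts from `<linux/fiemap.h>`, the request value `FS_IOC_FIEMAP = 0xC020660B` (generic ioctl encoding, as on x86 and ARM), and the `FIEMAP_EXTENT_SHARED` flag (`0x2000`). It asks for at most 64 extents in one call; a real implementation would loop, advancing `fm_start` past the last extent returned. The shared flag only says that an extent is shared, not with which file, so deduplicating within a directory tree would still mean comparing `fe_physical` ranges across files.

```rust
use std::fs::File;
use std::io;
use std::os::raw::{c_int, c_ulong};
use std::os::unix::io::AsRawFd;

// Constants from <linux/fs.h> and <linux/fiemap.h>.
const FS_IOC_FIEMAP: c_ulong = 0xC020_660B; // _IOWR('f', 11, struct fiemap)
const FIEMAP_FLAG_SYNC: u32 = 0x0000_0001; // flush dirty data before mapping
const FIEMAP_EXTENT_SHARED: u32 = 0x0000_2000; // extent shared with other files
const MAX_EXTENTS: usize = 64;

#[repr(C)]
#[allow(dead_code)]
#[derive(Clone, Copy, Default)]
struct FiemapExtent {
    fe_logical: u64,
    fe_physical: u64,
    fe_length: u64,
    fe_reserved64: [u64; 2],
    fe_flags: u32,
    fe_reserved: [u32; 3],
}

#[repr(C)]
#[allow(dead_code)]
struct Fiemap {
    fm_start: u64,
    fm_length: u64,
    fm_flags: u32,
    fm_mapped_extents: u32,
    fm_extent_count: u32,
    fm_reserved: u32,
    fm_extents: [FiemapExtent; MAX_EXTENTS],
}

extern "C" {
    fn ioctl(fd: c_int, request: c_ulong, ...) -> c_int;
}

/// (physical offset, length, shared?) for the first MAX_EXTENTS extents.
fn extents(file: &File) -> io::Result<Vec<(u64, u64, bool)>> {
    let mut map = Fiemap {
        fm_start: 0,
        fm_length: u64::MAX, // FIEMAP_MAX_OFFSET: map the whole file
        fm_flags: FIEMAP_FLAG_SYNC,
        fm_mapped_extents: 0,
        fm_extent_count: MAX_EXTENTS as u32,
        fm_reserved: 0,
        fm_extents: [FiemapExtent::default(); MAX_EXTENTS],
    };
    // SAFETY: `map` matches struct fiemap, has room for fm_extent_count
    // extents, and outlives the call.
    let rc = unsafe { ioctl(file.as_raw_fd(), FS_IOC_FIEMAP, &mut map as *mut Fiemap) };
    if rc != 0 {
        return Err(io::Error::last_os_error());
    }
    let n = (map.fm_mapped_extents as usize).min(MAX_EXTENTS);
    Ok(map.fm_extents[..n]
        .iter()
        .map(|e| (e.fe_physical, e.fe_length, (e.fe_flags & FIEMAP_EXTENT_SHARED) != 0))
        .collect())
}

fn main() -> io::Result<()> {
    for path in std::env::args().skip(1) {
        let file = File::open(&path)?;
        for (phys, len, shared) in extents(&file)? {
            println!("{}: phys={} len={} shared={}", path, phys, len, shared);
        }
    }
    Ok(())
}
```

Filesystems without FIEMAP support may return an error from the ioctl rather than an extent list.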

@niboo-ave commented on GitHub (Sep 13, 2022):

I'd like to continue the conversation; I think it's interesting.
Don't you think it should be the file system's responsibility to return the "real size" of a file, rather than each app having to "manage" every file system?

@mdekstrand commented on GitHub (Sep 13, 2022):

@antoineVerlant I don't think that would make sense for this problem. In the case of cloned extents, each file is its real size — stat returns a size, and that is the size of the file. The problem is that the two files together take less space than it looks like by adding their sizes. But only the user-space program knows what files are in the set it is considering. If you run dust on a directory, and some of the files are clones of files in other directories not included in the dust run, then their full sizes should be reported; only when there are clones within the set of files counted in a dust run should dust consider accounting for the shared space. The operating system has no way to account for that, unless it is augmented with complex system calls to obtain detailed space usage across directory trees.

Other seemingly-related problems are much easier to handle. Hard-links can be detected by comparing inodes, and only counting a file the first time its inode is seen. Sparse files have their actual space used reported by the operating system. It's just clones that are the tricky problem (at least among the kinds of problems a program like dust is likely to encounter).
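
A sketch of the hard-link handling described above, assuming a Unix target: walk the tree with `symlink_metadata`, remember each `(dev, ino)` pair in a `HashSet`, and add `st_blocks * 512` only the first time an inode is seen. `disk_usage` is a hypothetical name, not dust's actual implementation. Because each clone has its own inode, this does nothing for clones, which is exactly the gap discussed in this issue.

```rust
use std::collections::HashSet;
use std::fs;
use std::io;
use std::os::unix::fs::MetadataExt;
use std::path::Path;

/// Allocated bytes under `root`, counting each (device, inode) once so
/// hard links are not double-counted. Clones have distinct inodes, so
/// they are still counted in full.
fn disk_usage(root: &Path) -> io::Result<u64> {
    let mut seen = HashSet::new();
    let mut total = 0u64;
    let mut stack = vec![root.to_path_buf()];
    while let Some(path) = stack.pop() {
        // symlink_metadata: do not follow symlinks out of the tree
        let meta = fs::symlink_metadata(&path)?;
        if seen.insert((meta.dev(), meta.ino())) {
            total += meta.blocks() * 512; // st_blocks is in 512-byte units
        }
        if meta.is_dir() {
            for entry in fs::read_dir(&path)? {
                stack.push(entry?.path());
            }
        }
    }
    Ok(total)
}

fn main() -> io::Result<()> {
    let root = std::env::args().nth(1).unwrap_or_else(|| ".".to_string());
    println!("{} bytes", disk_usage(Path::new(&root))?);
    Ok(())
}
```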

Reference: bootandy/archived-dust#33