mirror of
https://github.com/bootandy/dust.git
synced 2026-06-08 11:29:05 +03:00
[GH-ISSUE #83] Clone should probably be ignore / size computed differently #33
Originally created by @Babwin on GitHub (Mar 21, 2020).
Original GitHub issue: https://github.com/bootandy/dust/issues/83
Hello,
First, thanks for this amazing tool.
Thanks to it I was able to clean a lot of bull*** off my computer.
I am opening this issue on a subject I don't really understand, but I hope it can help.
Thanks to the BSD/Apple clone feature, I am able to clone directories and files with cp -c.
Clones are, as I understand it, a kind of weird hard link, but when you write over one, it saves the diff.
In my example, you can see that I cloned the dankest movie in my library a few times and df -h doesn't report a disk usage difference. dust reports the clones as if they took more space on disk.
I had never developed in Rust until 10 minutes ago, when I tried to see whether the metadata and file type structures could help us here. It seems not.
Actually, I have no idea how to tell whether a file is a clone or not: (https://stackoverflow.com/questions/46417747/apple-file-system-apfs-check-if-file-is-a-clone-on-terminal-shell)
I would be glad to help if you don't have a machine running OS X to try things out, but I don't think I would be able to PR anything.
Best Regards,
@bootandy commented on GitHub (Mar 21, 2020):
Hey,
Thanks, nice find.
From reading about Apple's copy, it appears that the equivalent is cp --reflink=always on Linux. Sadly my Linux install doesn't support this :-(. I do not have OS X.
From man cp:
--reflink[=always] is specified, perform a lightweight copy, where the data blocks are copied only when modified. If this is not possible the copy fails
Can you please run 'ls -li' on the directory with the original and cloned object? I am curious to know if the inodes (first number in the column) are different on the cloned object.
@bootandy commented on GitHub (Mar 21, 2020):
Might be able to fix this issue using this:
http://m4rw3r.github.io/rust/std/os/macos/fs/trait.MetadataExt.html
@niboo-ave commented on GitHub (Mar 22, 2020):
Here is the result of ls -li (inside the Movie dir, the Source dir, and one of the target dirs):
I don't think it's the "same", since it's based on a feature of the Apple File System, but it's probably the same idea.
Here is an extract of the man page on my Mac OS X machine. I can't find it anywhere online.
Will check the MetadataExt
@bootandy commented on GitHub (Mar 22, 2020):
Those files have different inodes, so I assume they are different files. There might be something we can plug into in the macOS-specific MetadataExt: http://m4rw3r.github.io/rust/std/os/macos/fs/trait.MetadataExt.html in order to detect this, but whoever fixes this will likely need a Mac.
@niboo-ave commented on GitHub (Mar 22, 2020):
As I said, I'm not used to Rust at all.
I don't really understand how to use a trait:
What am I supposed to do with this?
@bootandy commented on GitHub (Mar 23, 2020):
remove this:
impl MetadataExt for Metadata;
add this:
println!("{:?}", metadata.as_raw_stat());
and print that data for a regular file, a cloned file, and a different file, and see if you can work out what the difference is between them.
This might be a bit beyond you if you aren't a rust user so don't worry too much.
@niboo-ave commented on GitHub (Mar 23, 2020):
I get this while compiling:
I tried not formatting the string, but then I get this one:
@bootandy commented on GitHub (Mar 25, 2020):
OK, so it doesn't implement Debug, which means you'd have to print each of the fields out manually.
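Printing the fields one by one could look roughly like the sketch below. It assumes the portable std::os::unix::fs::MetadataExt trait rather than the macOS-specific one linked earlier in the thread, and the helper name describe is hypothetical. It only shows the plain stat fields; nothing here is known to reveal whether a file is a clone.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::MetadataExt; // assumption: the Unix-wide trait, not std::os::macos
use std::path::Path;

/// Formats the stat fields that are most useful for telling files apart.
/// `describe` is a hypothetical helper name, not part of dust.
fn describe(path: &Path) -> io::Result<String> {
    let m = fs::metadata(path)?;
    Ok(format!(
        "{}: dev={} ino={} nlink={} size={} blocks={}",
        path.display(),
        m.dev(),    // device the file lives on
        m.ino(),    // inode number (what ls -li shows in the first column)
        m.nlink(),  // number of hard links
        m.size(),   // logical size in bytes
        m.blocks()  // allocated size in 512-byte units
    ))
}
```

Calling describe on a regular file, a cloned file, and an unrelated file and comparing the three lines is one way to run the experiment suggested above.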
@Babwin commented on GitHub (Mar 25, 2020):
I don't know if this helps,
but I get this warning while compiling (for each field):
I will try to check more deeply tonight.
@bootandy commented on GitHub (Aug 23, 2022):
I'll close this issue unless I hear anything in the next few days.
@mdekstrand commented on GitHub (Sep 9, 2022):
It looks like the raw metadata structure does not provide info to detect cloned files on macOS.
I'm not sure what levels of cloning are supported on APFS, but on Linux XFS & BTRFS, cloning is a block- or extent-level operation, not a file-level operation; if part of a file has been modified, then some blocks may be cloned but others not. Fully detecting clones, if it is even possible, is likely to require scanning a structure listing the blocks in the file. From clonefile(2) on my mac, it looks like APFS probably works similarly.
It would be very helpful for dust to have better support for these files, but I expect doing so is rather difficult. This blog post discusses one adventure in trying to detect clones; looks like you can detect that a file may have been cloned, but not necessarily that it is cloned.
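On Linux, one way to scan the list of extents in a file is the FIEMAP ioctl, which flags extents that are shared with another file. The following is a minimal sketch only, under several assumptions: it targets Linux with glibc, declares ioctl by hand instead of using a bindings crate, copies the constants and struct layouts from linux/fs.h and linux/fiemap.h, and uses a hypothetical function name shared_extent_bytes. Filesystems that do not support FIEMAP return an error, and nothing comparable is shown for APFS.

```rust
use std::fs::File;
use std::io;
use std::os::raw::{c_int, c_ulong};
use std::os::unix::io::AsRawFd;
use std::path::Path;

// Values copied from <linux/fs.h> and <linux/fiemap.h>.
const FS_IOC_FIEMAP: c_ulong = 0xC020_660B; // _IOWR('f', 11, struct fiemap)
const FIEMAP_FLAG_SYNC: u32 = 0x0001; // flush the file before mapping
const FIEMAP_EXTENT_LAST: u32 = 0x0001; // last extent in the file
const FIEMAP_EXTENT_SHARED: u32 = 0x2000; // extent is shared with other files
const BATCH: usize = 32; // extents requested per call (arbitrary choice)

#[repr(C)]
#[derive(Clone, Copy, Default)]
#[allow(dead_code)]
struct FiemapExtent {
    fe_logical: u64,
    fe_physical: u64,
    fe_length: u64,
    fe_reserved64: [u64; 2],
    fe_flags: u32,
    fe_reserved: [u32; 3],
}

#[repr(C)]
#[allow(dead_code)]
struct Fiemap {
    fm_start: u64,
    fm_length: u64,
    fm_flags: u32,
    fm_mapped_extents: u32,
    fm_extent_count: u32,
    fm_reserved: u32,
    fm_extents: [FiemapExtent; BATCH],
}

extern "C" {
    // Assumption: glibc's prototype, where the request is an unsigned long.
    fn ioctl(fd: c_int, request: c_ulong, ...) -> c_int;
}

/// Returns (bytes in extents flagged shared, bytes in all mapped extents).
/// Hypothetical helper, not part of dust.
fn shared_extent_bytes(path: &Path) -> io::Result<(u64, u64)> {
    let file = File::open(path)?;
    let (mut shared, mut total, mut start) = (0u64, 0u64, 0u64);
    loop {
        let mut req = Fiemap {
            fm_start: start,
            fm_length: u64::MAX - start,
            fm_flags: FIEMAP_FLAG_SYNC,
            fm_mapped_extents: 0,
            fm_extent_count: BATCH as u32,
            fm_reserved: 0,
            fm_extents: [FiemapExtent::default(); BATCH],
        };
        // SAFETY: `req` is a valid, writable struct fiemap with room for BATCH extents.
        let rc = unsafe { ioctl(file.as_raw_fd(), FS_IOC_FIEMAP, &mut req as *mut Fiemap) };
        if rc != 0 {
            return Err(io::Error::last_os_error());
        }
        let n = (req.fm_mapped_extents as usize).min(BATCH);
        if n == 0 {
            break; // nothing mapped beyond `start`
        }
        for e in &req.fm_extents[..n] {
            total += e.fe_length;
            if e.fe_flags & FIEMAP_EXTENT_SHARED != 0 {
                shared += e.fe_length;
            }
            if e.fe_flags & FIEMAP_EXTENT_LAST != 0 {
                return Ok((shared, total));
            }
            start = e.fe_logical + e.fe_length; // continue after this extent
        }
    }
    Ok((shared, total))
}
```

A nonzero first value only says that some extents are shared with something; it does not say which other file shares them, which is the part a disk usage tool would still have to work out.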
@bootandy commented on GitHub (Sep 11, 2022):
Thanks for the information @mdekstrand it is interesting.
Sadly, I think this is going to be too tricky to solve.
@mdekstrand commented on GitHub (Sep 12, 2022):
I'm going to leave this here, in case someone comes along and does want to try to work on this issue: on Linux, it looks like the FIEMAP ioctl is the way to obtain the detailed extent data needed to detect sharing.
@niboo-ave commented on GitHub (Sep 13, 2022):
I'll continue the conversation, since I think it's interesting.
Don't you think it should be the responsibility of the file system to return the "real size" of a file, rather than each app having to "manage" each file system?
@mdekstrand commented on GitHub (Sep 13, 2022):
@antoineVerlant I don't think that would make sense for this problem. In the case of cloned extents, each file is its real size: stat returns a size, and that is the size of the file. The problem is that the two files together take less space than it looks like by adding their sizes. But only the user-space program knows what files are in the set it is considering. If you run dust on a directory, and some of the files are clones of files in other directories not included in the dust run, then their full sizes should be reported; only when there are clones within the set of files counted in a dust run should dust consider accounting for the shared space. The operating system has no way to account for that, unless it is augmented with complex system calls to obtain detailed space usage across directory trees.
Other seemingly-related problems are much easier to handle. Hard-links can be detected by comparing inodes, and only counting a file the first time its inode is seen. Sparse files have their actual space used reported by the operating system. It's just clones that are the tricky problem (at least among the kinds of problems a program like dust is likely to encounter).
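The two easier cases mentioned above can be sketched in a few lines. This is an illustration under assumptions, not dust's actual code: the function name disk_usage is hypothetical, files are identified by their (device, inode) pair, and allocated space is taken from st_blocks, which is counted in 512-byte units.

```rust
use std::collections::HashSet;
use std::fs;
use std::io;
use std::os::unix::fs::MetadataExt;
use std::path::Path;

/// Sums the allocated bytes of `paths`, counting each (device, inode) once.
/// Hypothetical helper, not taken from dust.
fn disk_usage(paths: &[&Path]) -> io::Result<u64> {
    let mut seen = HashSet::new();
    let mut total = 0u64;
    for p in paths {
        // symlink_metadata: do not follow symlinks while walking.
        let m = fs::symlink_metadata(p)?;
        // Hard links share an inode, so only the first sighting is counted.
        if seen.insert((m.dev(), m.ino())) {
            // blocks() reports allocated space, so sparse files are not overcounted.
            total += m.blocks() * 512;
        }
    }
    Ok(total)
}
```

Clones pass straight through this check, because each clone has its own inode and reports its own blocks, which is why they end up counted in full.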