[GH-ISSUE #264] Add flag to ignore hard links #114

Closed
opened 2026-06-08 11:25:44 +03:00 by zhus · 5 comments
Owner

Originally created by @EmNudge on GitHub (Sep 9, 2022).
Original GitHub issue: https://github.com/bootandy/dust/issues/264

I'm trying to calculate the size of a directory after using tools that rely on hard links to reduce duplication across directories. Counting each hard link only once per run is not enough, because these hard links all point to files that live elsewhere. Those originals are outside the scanned directory, so `dust` and `du` never see them.

Not even `du` has this feature afaik, so it would be a nice bonus to have in `dust`.

Alternatively, we could follow the behavior of `du`: when multiple directories are listed, hard links already counted in earlier directories are ignored in later ones. For example, to filter out `pnpm` packages on macOS, we could run `dust -sh ./Library/pnpm ./our-folder`, and every hard link in `./our-folder` that was already counted under `./Library/pnpm` should be excluded.
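For reference, the `du`-style behavior described above amounts to keeping a single set of already-seen (device, inode) pairs for the whole run and walking the directories in the order given. The Rust sketch below is a minimal illustration under stated assumptions, not how `dust` is actually implemented: it is Unix-only (it relies on `std::os::unix::fs::MetadataExt`), it sums apparent file lengths rather than allocated blocks, it does not follow symlinks, and the names `dir_size`, `sizes_in_order` and `demo` are hypothetical.

```rust
use std::collections::HashSet;
use std::fs;
use std::io;
use std::os::unix::fs::MetadataExt;
use std::path::Path;

// Hypothetical helper: sum the apparent sizes of files under `dir`,
// skipping any (device, inode) pair already recorded in `seen`.
fn dir_size(dir: &Path, seen: &mut HashSet<(u64, u64)>) -> io::Result<u64> {
    let mut total = 0;
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        // symlink_metadata does not follow symlinks.
        let meta = fs::symlink_metadata(&path)?;
        if meta.is_dir() {
            total += dir_size(&path, seen)?;
        } else if seen.insert((meta.dev(), meta.ino())) {
            // First time this inode appears anywhere in the run.
            total += meta.len();
        }
    }
    Ok(total)
}

// Hypothetical: walk the roots in order with one shared `seen` set, so a
// hard link already counted under an earlier root adds nothing later.
fn sizes_in_order(roots: &[&Path]) -> io::Result<Vec<u64>> {
    let mut seen = HashSet::new();
    roots.iter().map(|root| dir_size(root, &mut seen)).collect()
}

// Hypothetical fixture: a/f (100 bytes), b/f hard-linked to a/f, b/g (10 bytes).
// Returns the per-root sizes for the orders [a, b] and [b, a].
fn demo() -> io::Result<(Vec<u64>, Vec<u64>)> {
    let base = std::env::temp_dir().join(format!("dedup_demo_{}", std::process::id()));
    let _ = fs::remove_dir_all(&base);
    let (a, b) = (base.join("a"), base.join("b"));
    fs::create_dir_all(&a)?;
    fs::create_dir_all(&b)?;
    fs::write(a.join("f"), vec![0u8; 100])?;
    fs::hard_link(a.join("f"), b.join("f"))?;
    fs::write(b.join("g"), vec![0u8; 10])?;

    let a_then_b = sizes_in_order(&[a.as_path(), b.as_path()])?;
    let b_then_a = sizes_in_order(&[b.as_path(), a.as_path()])?;
    fs::remove_dir_all(&base)?;
    Ok((a_then_b, b_then_a))
}

fn main() -> io::Result<()> {
    let (a_then_b, b_then_a) = demo()?;
    println!("a then b: {:?}", a_then_b); // [100, 10]
    println!("b then a: {:?}", b_then_a); // [110, 0]
    Ok(())
}
```

With this approach, listing `./Library/pnpm` first attributes the shared files to it, leaving `./our-folder` with only the space it does not share with the earlier directory.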

zhus closed this issue 2026-06-08 11:25:44 +03:00

@bootandy commented on GitHub (Sep 11, 2022):

So an option that ignores hard links? I imagine it should ignore soft links as well?

Do you think the links should be completely ignored, or should they be listed but shown as near-empty files?

@EmNudge commented on GitHub (Sep 11, 2022):

Yeah, an option to ignore all symbolic links would be nice. You could keep them but list the size of the symbolic link itself instead of the content it points to, so each one would show up as a very small file. I guess the option would more accurately be named "don't follow symlinks" rather than "ignore symlinks".
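In Rust terms, "don't follow symlinks" is the difference between `std::fs::metadata`, which follows a link and reports on its target, and `std::fs::symlink_metadata`, which describes the link itself. The sketch below is illustrative only and is not taken from `dust`; it assumes a Unix system (it uses `std::os::unix::fs::symlink`), and `link_sizes` is a hypothetical helper name. On Unix the size reported for the link itself is the length of the path it stores, which is why it would show up as a very small file.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::symlink;

// Hypothetical helper: returns (size when following the link,
// size of the link itself) for a symlink to a 4096-byte file.
fn link_sizes() -> io::Result<(u64, u64)> {
    let base = std::env::temp_dir().join(format!("symlink_demo_{}", std::process::id()));
    let _ = fs::remove_dir_all(&base);
    fs::create_dir_all(&base)?;
    let target = base.join("target.bin");
    let link = base.join("link");
    fs::write(&target, vec![0u8; 4096])?;
    symlink(&target, &link)?;

    // fs::metadata follows the link and reports the target's size.
    let followed = fs::metadata(&link)?.len();
    // fs::symlink_metadata describes the link itself.
    let own = fs::symlink_metadata(&link)?;
    assert!(own.file_type().is_symlink());
    let not_followed = own.len();

    fs::remove_dir_all(&base)?;
    Ok((followed, not_followed))
}

fn main() -> io::Result<()> {
    let (followed, not_followed) = link_sizes()?;
    println!("followed: {followed} bytes, not followed: {not_followed} bytes");
    Ok(())
}
```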

@bootandy commented on GitHub (Oct 20, 2022):

I don't think this is possible.

https://unix.stackexchange.com/questions/122333/how-to-tell-which-file-is-original-if-hard-link-is-created

It is not possible to know which is the hard link and which is the original.
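To see why, note that a hard link is simply another directory entry for the same inode: after linking, both names report the same (device, inode) pair and the same link count, and nothing in the metadata marks either one as the original. The Unix-only sketch below (the `inspect_hard_link` helper and the file names are hypothetical) checks this and shows that removing the first name only decrements the link count.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::MetadataExt;

// Hypothetical helper. Returns (same inode?, link count of `first`,
// link count of `second`, link count of `second` after deleting `first`).
fn inspect_hard_link() -> io::Result<(bool, u64, u64, u64)> {
    let base = std::env::temp_dir().join(format!("hardlink_origin_{}", std::process::id()));
    let _ = fs::remove_dir_all(&base);
    fs::create_dir_all(&base)?;
    let (first, second) = (base.join("first"), base.join("second"));
    fs::write(&first, b"shared data")?;
    fs::hard_link(&first, &second)?;

    let (m1, m2) = (fs::metadata(&first)?, fs::metadata(&second)?);
    // Both names resolve to one inode; neither carries an "original" flag.
    let same_inode = (m1.dev(), m1.ino()) == (m2.dev(), m2.ino());
    let (n1, n2) = (m1.nlink(), m2.nlink());

    // Deleting the first name only drops the link count; the data stays
    // reachable through `second`.
    fs::remove_file(&first)?;
    let n_after = fs::metadata(&second)?.nlink();

    fs::remove_dir_all(&base)?;
    Ok((same_inode, n1, n2, n_after))
}

fn main() -> io::Result<()> {
    let (same, n1, n2, after) = inspect_hard_link()?;
    println!("same inode: {same}, nlink: {n1}/{n2}, after removing first: {after}");
    Ok(())
}
```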

@EmNudge commented on GitHub (Nov 22, 2022):

That's unfortunate. Perhaps just a flag to ignore duplicate files, then? It would accidentally catch legitimate duplicates, but it would still solve part of the problem.

@bootandy commented on GitHub (Jan 5, 2023):

It already ignores duplicate files: if two files have the same inode, it only counts them once.

Reference: bootandy/archived-dust#114