[GH-ISSUE #444] Handle deleted files on Linux #193

Closed
opened 2026-06-08 11:26:05 +03:00 by zhus · 5 comments

Originally created by @tatref on GitHub (Oct 14, 2024).
Original GitHub issue: https://github.com/bootandy/dust/issues/444

Hi,

On Linux, if a file is deleted while a process still has a handle on it, the disk space is still used but the file is no longer visible on the filesystem (ls, find, etc. will not find it). This is a common sysadmin issue, so I think it would be great to add an option to search for deleted files.

The only way to know that such a file is still taking up space is to walk the file descriptors under /proc/$pid/fd/ and check whether the files still exist.
We can use lsof to show the deleted files:

[root@enterprise ~]# lsof -n | grep deleted
httpd    2357 apache   29u   REG 253,17 3926560     0  1499 /tmp/.NSPR-AFM-3457-9820130.0 (deleted)
mysqld   2588  mysql    4u   REG 253,17      52     0  1495 /tmp/ibY0cXCd (deleted)
mysqld   2588  mysql    5u   REG 253,17    1048     0  1496 /tmp/ibOrELhG (deleted)

Do you think it would be a worthy feature to add to dust?
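
For illustration, the /proc walk described above could look roughly like the following Python sketch. It is not how dust or lsof is implemented. The function name `find_deleted_fds` and the `proc_root` parameter are hypothetical, and it relies on the kernel appending " (deleted)" to the link target of an unlinked file, so a file that is literally named that way would be misreported.

```python
import os

DELETED_SUFFIX = " (deleted)"


def find_deleted_fds(proc_root="/proc"):
    """Yield (pid, fd, path) for open descriptors whose link target
    ends with ' (deleted)'. proc_root is a parameter only so the
    function can be pointed at a test directory."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric entries are process IDs
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # descriptor was closed in the meantime
            if target.endswith(DELETED_SUFFIX):
                # Strip the suffix to recover the original path
                yield int(entry), int(fd), target[: -len(DELETED_SUFFIX)]
```

Calling `list(find_deleted_fds())` as root on a Linux machine should return entries comparable to the lsof output above. Without root, only the caller's own processes are readable.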

zhus closed this issue 2026-06-08 11:26:05 +03:00

@bootandy commented on GitHub (Oct 18, 2024):

That is an interesting edge case.

I can imagine this being a common problem for sysadmins.

dust works by walking through the filesystem; if ls won't find a file, I don't think dust will either.

I'm currently not keen on adding this feature. Like you said, it would require walking the file descriptors, checking if that path matched the path dust was run for, then working out if it had already been included. Then if I were to show it I'd need a way of marking it as 'different' because it wouldn't be removed if you 'rm' the file.

I think we are probably better served with lsof -n | grep deleted
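
To make the extra bookkeeping mentioned above concrete, here is a rough Python sketch of the two checks: is the deleted path under the directory being scanned, and has its inode already been counted? It assumes the regular walk records (device, inode) pairs. The names `is_under`, `merge_deleted` and `seen_inodes` are made up for this example and do not come from dust.

```python
import os


def is_under(path, root):
    """True if path is root itself or lies somewhere below it."""
    root = os.path.abspath(root)
    path = os.path.abspath(path)
    return os.path.commonpath([path, root]) == root


def merge_deleted(deleted, scan_root, seen_inodes):
    """Return the deleted paths that should be added to the report.

    deleted     -- iterable of (path, st_dev, st_ino) for deleted-but-open files
    scan_root   -- the directory the scan was started from
    seen_inodes -- set of (st_dev, st_ino) already counted by the normal walk;
                   updated in place so each inode is counted only once
    """
    kept = []
    for path, dev, ino in deleted:
        if not is_under(path, scan_root):
            continue  # outside the tree the user asked about
        key = (dev, ino)
        if key in seen_inodes:
            continue  # already counted, e.g. another hard link or a second fd
        seen_inodes.add(key)
        kept.append(path)
    return kept
```

Comparing whole path components with `os.path.commonpath` avoids treating /database as being under /data, which a plain string-prefix check would do.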


@tatref commented on GitHub (Oct 18, 2024):

Yes, you are correct: dust or ls can't find the file with a syscall on the filesystem. The only way is through /proc/.

The workflow you describe is what I imagined. Yes, rm can't delete the file, but the space is still used on the FS. We could maybe add a flag --list-deleted or something, then display the deleted files like the others. Or don't add a flag, and list the deleted files with a different color/pattern.

The thing is, using lsof can be complicated: the size shown is not the space used on the FS (it does not take sparse files into account), and the same file can be listed multiple times. Also, there is no easy way of grouping the files by directory, as with dust.

I can help to make a demo implementation if you want.
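
As a rough illustration of those three lsof pain points (apparent size vs. allocated size, duplicate listings, no grouping by directory), a Python sketch could look like this. The helper names `allocated_bytes` and `usage_by_dir` are hypothetical. On Linux, `st_blocks` is counted in 512-byte units, so a sparse file reports fewer allocated bytes than its `st_size`.

```python
import os
from collections import defaultdict


def allocated_bytes(st):
    """Space actually allocated on disk for a stat result.
    st_blocks is in 512-byte units on Linux."""
    return st.st_blocks * 512


def usage_by_dir(entries):
    """Sum allocated bytes per parent directory.

    entries -- iterable of (path, stat_result); the same inode may appear
               several times (several fds or processes) and is counted once.
    """
    totals = defaultdict(int)
    seen = set()
    for path, st in entries:
        key = (st.st_dev, st.st_ino)
        if key in seen:
            continue  # same file listed again
        seen.add(key)
        totals[os.path.dirname(path)] += allocated_bytes(st)
    return dict(totals)
```

For a deleted file that is still open, the stat result can be obtained with `os.stat(f"/proc/{pid}/fd/{fd}")`, which follows the link to the still-open inode even though the path no longer exists.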


@bootandy commented on GitHub (Oct 21, 2024):

If you were to have this --list-deleted flag, do you think you would want the deleted files merged in with the regular files? I'm wondering if it should ONLY show the deleted files.

If you are hunting down lost disk space your procedure for actual files in the filesystem is going to be different than for processes that are holding on to deleted files.

So I'm proposing this:

dust --show-deleted 
./file_still_used_by_proc
./other_file_still_used_by_proc

<does not show files on the filesystem only deleted files>

What do you think about this?


@tatref commented on GitHub (Oct 24, 2024):

Hi,

I think it's better to merge the deleted files with the regular files, because in the end, both are taking up space.

The names of the deleted files are suffixed by the kernel with (deleted), so if these files are visible in the output (not too deep in the tree), they will show up like so:

100M   ┌── img.dd (deleted)                                        │████████████████████                                                                                                                            │  14%
 25M   │         ┌── s-h0x1gqqby2-1mdf3k6-24jxlf5lag4v0vlmktn2jvjdd│█████▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓                                                                                                                    │   3%
 25M   │       ┌─┴ procfs-2f2m88sfnv9m0                            │█████▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓                                                                                                                    │   3%
 34M   │       │ ┌── s-h0x19zmz8c-1btx6ww-8dz0hhsxz6pck6g30opy722lu│███████▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓                                                                                                                    │   5%

If you want to test, you can do the following:

# in 1st terminal
dd if=/dev/zero of=img.dd bs=1M count=100     # create a 100 MB file
less img.dd                                   # open the file (type y to confirm), and keep the terminal open

# in 2nd terminal
rm img.dd
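
The same situation can also be reproduced from a single process, without two terminals. The Python sketch below is only an illustration (the function name `demo_deleted_open_file` is made up): it creates a temporary file, unlinks it while the descriptor is still open, and then inspects the descriptor. The /proc lookup is Linux-specific, so the link target is reported as None where /proc/self/fd is unavailable.

```python
import os
import tempfile


def demo_deleted_open_file(size=1024 * 1024):
    """Create a file, delete it while it is still open, and report
    (path_exists, link_count, size_in_bytes, proc_link_target)."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"\0" * size)
        os.unlink(path)                 # the name is gone ...
        exists = os.path.exists(path)   # ... so ls/find no longer see it
        st = os.fstat(fd)               # ... but the open fd still has the data
        link = f"/proc/self/fd/{fd}"
        target = os.readlink(link) if os.path.islink(link) else None
        return exists, st.st_nlink, st.st_size, target
    finally:
        os.close(fd)                    # the space is only released here


if __name__ == "__main__":
    print(demo_deleted_open_file())
```

On Linux the last element of the returned tuple should be the original temporary path followed by " (deleted)", matching the suffix seen in the dust output above.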

@bootandy commented on GitHub (Jun 5, 2025):

I think this is too niche, so I'm closing it.


Reference: bootandy/archived-dust#193