mirror of
https://github.com/bootandy/dust.git
synced 2026-06-08 11:29:05 +03:00
[GH-ISSUE #277] Getting OOM killed with large number of inodes #122
Reference in New Issue
Block a user
Delete Branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Originally created by @WesleyAC on GitHub (Nov 30, 2022).
Original GitHub issue: https://github.com/bootandy/dust/issues/277
On a system with 1,500,000 inodes used and 2GB RAM (with ~500MB free),
dustis consistently OOM-killed. I'm not sure what the internals ofdustlook like, but it seems to me that it should be possible to operate with significantly less than that.It looks like what's happening is that every node name is stored up-front when the directory tree is being walked, and then the summary is only done at the end. I'm not sure how much surgery would be required to improve the situation. One thought is to not store the
PathBufin theNode, and instead only get the name for the node when assembling the final output, by walking the directory tree one more time and matching inode numbers. This would probably want to be behind a--low-memoryswitch or something like that, since it's a tradeoff — it would take about twice a much time (walking the file tree twice), but use significantly less memory.Another option, which is less of a tradeoff, is to store just the top section of the path, rather than the absolute path in the
Nodeobject, and only assemble absolute paths from the tree of nodes. I haven't looked deeply into how much the assumption of absolute paths is baked-in, so maybe this is a non-starter, though.Would love for
dustto be more useful on my file-filled and ram-constrained server :)@bootandy commented on GitHub (Jan 4, 2023):
I'll see what I can do about this.
We had to limit stack size on small machines before: https://github.com/bootandy/dust/issues/256
We keep a summary of all the nodes that we've visited to deal with duplicate inodes so we don't double count them so I want to keep the inode_device
We could try storing just a string instead of a PathBuf as you suggested, but I'm not sure that will make much difference.
We might be able to remove 'depth' from the Node - that might not be used anymore.
We might also be able to remove the 'top' part of the paths.
But I'm not sure the above would allow us to save much memory.
I don't want to walk the tree more than once, it will be too slow. And adding it as an option would cause the code to get too messy.
@WesleyAC commented on GitHub (Jan 5, 2023):
Thanks! I think that removing the top part of the paths is probably most impactful — running
find . | xargs dirname | wc -cas a approximation of how much RAM might be saved (the actual amount would probably be slightly more than this because of alignment, etc) in a directory that I had this problem on gives108593973, around 108MB, which is a respectable saving.Another thought is that the code could be more deeply aware of the
-dflag — my understanding is that it shouldn't need to keep track of the name at all if the file or directory is more nested than the number given by-d. I could also imagine a new flag to not show sizes of files at all, only directories, which would enable a similar optimization — not storing names of files, only of directories. For me, the output would be similarly useful in most cases — generally I'm usingdustto find directories that are large, rather than particular offending files.For that matter, you could probably omit the filename in all cases, and get names only for the files that need to be displayed by calling
statxon each file in the parent directory to check the inode number. That way, instead of re-crawling the whole tree, you just need to loop over the files in a directory for each file in the output (which is probably only few dozen files at most). I don't know if that tradeoff is worth it in the non-ram-constrained case, but it seems like it might be worthwhile.@bootandy commented on GitHub (Jan 18, 2023):
Shaving the top of the paths of proved to be problematic.
I don't think I can realisticly get the memory requirement down much, sorry.