mirror of
https://github.com/bootandy/dust.git
synced 2026-06-08 11:29:05 +03:00
[GH-ISSUE #457] Memory allocation issue when running on windows #203
Originally created by @hmarko on GitHub (Dec 31, 2024).
Original GitHub issue: https://github.com/bootandy/dust/issues/457
Hi.
I'm trying to count files on a 30M-file dataset over SMB.
Can anything be done to overcome this, or have I reached the maximum scale of dust?
Thanks !
.\dust -F -j -r -d 4 -n 100 -s 400000 -f \\server\share$\Groups
Indexing: \\server\share$\Groups 9949070 files, 9.5M ... /memory allocation of 262144 bytes failed
@hmarko commented on GitHub (Jan 1, 2025):
Just an update: running against the same repository from a Linux client completes successfully.
I suspect this issue affects only the Windows version.
@bootandy commented on GitHub (Jan 15, 2025):
Can you try running dust with more memory? For example:
dust -S 1073741824
-S lets you specify the stack size, so you can try increasing or decreasing the number and see if Windows sorts itself out.
@hmarko commented on GitHub (Jan 20, 2025):
C:\DUST>C:\DUST\dust.exe -S 1073741824 -D -p -j -r -f -n 100 -d 7 -z 200000 "\\srv\c$\folder"
Indexing: \\srv\c$\folder 12401021 files, 11M ... \memory allocation of 262144 bytes failed
@bootandy commented on GitHub (Jan 26, 2025):
I'm not sure I can do anything here. If Windows is failing to allocate enough memory to run dust, I don't think there is much I can change on my side.
I'd recommend repeatedly halving the number in -S and then repeatedly doubling it and seeing if you can get a good run.
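A minimal sketch of that halving-then-doubling sweep, assuming bash. candidate_stack_sizes is a hypothetical helper name, and 1073741824 is the starting value suggested above; the dust invocation is left as a commented usage example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the -S (stack size) values to try, first
# halving BASE STEPS times, then doubling it STEPS times.
candidate_stack_sizes() {
  local base=$1 steps=$2 v i
  v=$base
  for ((i = 0; i < steps; i++)); do
    v=$((v / 2))
    echo "$v"
  done
  v=$base
  for ((i = 0; i < steps; i++)); do
    v=$((v * 2))
    echo "$v"
  done
}

# Usage sketch: stop at the first run that exits successfully.
# TARGET is a placeholder for the path being scanned.
#   for s in $(candidate_stack_sizes 1073741824 3); do
#     echo "trying -S $s"
#     dust -S "$s" -d 4 "$TARGET" && break
#   done
```

For example, candidate_stack_sizes 1073741824 2 prints 536870912, 268435456, 2147483648 and 4294967296, one per line.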
@hmarko commented on GitHub (Jan 26, 2025):
I see the same on Linux too, on file systems with many millions of files.
I will try playing with -S, but as far as I can tell it is a general scalability issue.
BTW, did you try it on file systems with 20-30 million files or more?
@bootandy commented on GitHub (Jan 26, 2025):
The same on Linux? OK, let me try to recreate it on Linux.
Using these 2 scripts I made a large number of files on my ext4 filesystem:
Gives:
@bootandy commented on GitHub (Jan 26, 2025):
I think by the time you are tracking a few tens of millions of files, you are pushing the memory limits of your average system. htop certainly wasn't very happy when I ran the above ^
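To put a number on how close a run gets to the limit, rather than watching htop, one option (not something used in this thread) is GNU time's verbose report, which includes a "Maximum resident set size (kbytes)" line. parse_max_rss_kb is a hypothetical helper; /usr/bin/time must be GNU time on Linux, not the shell builtin, and TARGET is a placeholder path.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read GNU `time -v` output on stdin and print the
# peak resident set size in kilobytes.
parse_max_rss_kb() {
  awk -F': ' '/Maximum resident set size/ { print $2 }'
}

# Usage (GNU time only). time -v writes its report to stderr, so send
# stderr into the pipe and discard dust's normal output:
#   /usr/bin/time -v dust -d 4 "$TARGET" 2>&1 >/dev/null | parse_max_rss_kb
```

Comparing the peak for a few subtrees of increasing size gives a rough idea of how memory scales with file count on a given machine.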
@hmarko commented on GitHub (Jan 26, 2025):
I ran the same command you did and it worked.
In my use case there are a few differences that may be related:
Anyway, the servers I use have 32 GB of RAM and are doing nothing else.
Is there anything I can use to debug this?
Thanks!
@bootandy commented on GitHub (Jan 26, 2025):
I'm not sure I can offer much more.
Adding '-d' doesn't make it use less memory.
I can only suggest cd-ing into a subdirectory so it has less data to trawl through.
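One way to apply that suggestion across a whole share is to run dust once per top-level subdirectory, so each process only holds one subtree in memory and exits before the next starts. per_subdir is a hypothetical helper, and the dust flags in the usage line are only an example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command once for each immediate subdirectory
# of ROOT, passing the subdirectory as the last argument. Hidden (dot)
# directories are skipped by the default glob.
per_subdir() {
  local root=$1 dir
  shift
  for dir in "$root"/*/; do
    [ -d "$dir" ] || continue   # glob matched nothing: skip the literal pattern
    "$@" "$dir"
  done
}

# Usage: one dust process per subtree.
#   per_subdir /mnt/share dust -d 2 -n 20
```

The trade-off is that files sitting directly in the root are not counted, and there is no combined total across subtrees.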
@hmarko commented on GitHub (Jan 27, 2025):
Thanks !
I will learn some Rust and do some debugging myself.
I will let you know if something pops up.
@hmarko commented on GitHub (Mar 9, 2025):
Hi.
I can easily reproduce the dust crash with the following script. Part of the problem is the length of the file names and the number of subdirectories.
```bash
#!/bin/bash
```
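A sketch of a generator for the kind of tree described (long file names, many nested subdirectories). make_tree is a hypothetical helper, and its depth, fan-out, files-per-directory and name-length arguments are arbitrary; they are not the parameters of the script above.

```shell
#!/usr/bin/env bash
# Hypothetical stress-tree generator: every directory gets FILES empty
# files and, while DEPTH > 0, FANOUT subdirectories. All names start
# with a NAME_LEN-character filler to make paths long.
make_tree() {
  local dir=$1 depth=$2 fanout=$3 files=$4 name_len=$5
  local pad i
  mkdir -p "$dir"
  pad=$(printf '%*s' "$name_len" '' | tr ' ' 'x')   # long filler for names
  for ((i = 0; i < files; i++)); do
    : > "$dir/${pad}_$i"
  done
  if ((depth > 0)); then
    for ((i = 0; i < fanout; i++)); do
      make_tree "$dir/${pad}_d$i" $((depth - 1)) "$fanout" "$files" "$name_len"
    done
  fi
}
```

For example, make_tree /tmp/stress 4 10 100 200 creates 11,111 directories (1 + 10 + 100 + 1,000 + 10,000) holding 1,111,100 files, each name just over 200 characters.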
@bootandy commented on GitHub (Mar 15, 2025):
Is that only on Windows?
I tried the above on my Linux box and dust handled it OK.
@hmarko commented on GitHub (Mar 19, 2025):
Not only on Windows; it also happens on Linux, on a VM with 64 GB of RAM.
@eliphatfs commented on GitHub (Jun 19, 2025):
I have a 300 TB volume on Linux with billions of files. dust takes 30 GB RSS plus 170 GB kmem, and the container goes OOM.
I limited the depth to 3, so in theory it should need only enough memory to track the directories down to depth 3.
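That idea can be illustrated with a streaming aggregation: add each file's size to every ancestor directory at depth 3 or shallower, so memory grows with the number of such directories rather than the number of files. A minimal sketch, assuming GNU find (for -printf) and awk; depth_totals is a hypothetical helper, sizes are apparent sizes in bytes (du reports allocated blocks by default), and file names containing tabs or newlines would be mis-parsed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<bytes>\t<dir>" for ROOT (as ".") and every
# directory up to DEPTH levels below it, streaming over find's output and
# keeping one counter per directory instead of one entry per file.
depth_totals() {
  local root=$1 depth=$2
  find "$root" -type f -printf '%s\t%P\n' |
    awk -F'\t' -v d="$depth" '
      {
        n = split($2, parts, "/")
        total["."] += $1
        key = ""
        # Only the directory components (the last part is the file name).
        for (i = 1; i <= d && i < n; i++) {
          key = (i == 1) ? parts[1] : (key "/" parts[i])
          total[key] += $1
        }
      }
      END { for (k in total) printf "%.0f\t%s\n", total[k], k }'
}

# Usage: depth_totals /mnt/volume 3 | sort -n | tail -n 20
```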
I am using
parallel du -hs ::: */*/*
instead, and it works quite well (the catch is that the workload is not balanced between processes, and the last, largest directory takes a long time).
@bootandy commented on GitHub (Jul 5, 2025):
I don't think this is possible to fix.
du runs and dumps its output as it runs. dust loads it all into memory to make a decision. If there is too much to load, dust will run out of memory.
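The parallel du workaround stays within memory for exactly this reason: each du call streams a single total and exits. A sketch of the same approach using only find and xargs, assuming GNU parallel may not be installed everywhere; parallel_du is a hypothetical helper, and only directories exactly DEPTH levels below ROOT are summarized, so files at shallower levels are not counted.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a `du -sk` line (KiB, path) for every
# directory exactly DEPTH levels below ROOT, running up to JOBS du
# processes at once. Each du holds only its own subtree's state.
parallel_du() {
  local root=$1 depth=$2 jobs=$3
  find "$root" -mindepth "$depth" -maxdepth "$depth" -type d -print0 |
    xargs -0 -n 1 -P "$jobs" du -sk
}

# Usage: parallel_du /mnt/volume 3 16 | sort -n | tail -n 20
```

Like GNU parallel, xargs -P starts the next directory as soon as a slot frees up, so it does not remove the imbalance mentioned above: one very large directory still sets the wall-clock time. Running the helper again one level deeper inside that directory is one way to split it further.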