[GH-ISSUE #457] Memory allocation issue when running on windows #203

Closed
opened 2026-06-08 11:26:07 +03:00 by zhus · 15 comments

Originally created by @hmarko on GitHub (Dec 31, 2024).
Original GitHub issue: https://github.com/bootandy/dust/issues/457

Hi.

I'm trying to count files on a 30M-file dataset over SMB, and dust fails with a memory allocation error (output below).
Can anything be done to overcome this, or have I reached the maximum scale of dust?
Thanks!

.\dust -F -j -r -d 4 -n 100 -s 400000 -f \\server\share$\Groups
Indexing: \\server\share$\Groups 9949070 files, 9.5M ... /memory allocation of 262144 bytes failed

zhus closed this issue 2026-06-08 11:26:07 +03:00

@hmarko commented on GitHub (Jan 1, 2025):

Just an update: running against the same repository from a Linux client completes successfully.
I suspect this issue is specific to the Windows version.


@bootandy commented on GitHub (Jan 15, 2025):

Can you try running dust with more memory, e.g. `dust -S 1073741824`? `-S` lets you specify the stack size, so you can try increasing or decreasing the number and see if Windows sorts itself out.


@hmarko commented on GitHub (Jan 20, 2025):

C:\DUST>C:\DUST\dust.exe -S 1073741824 -D -p -j -r -f -n 100 -d 7 -z 200000 "\\srv\c$\folder"
Indexing: \\srv\c$\folder 12401021 files, 11M ... \memory allocation of 262144 bytes failed


@bootandy commented on GitHub (Jan 26, 2025):

If Windows is failing to assign enough memory to run dust, I'm not sure there is anything I can do here.

I'd recommend repeatedly halving the number passed to `-S`, then repeatedly doubling it, and seeing if you can get a good run.
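
The halve-then-double search over `-S` values can be scripted. This is a minimal sketch, assuming bash: `stack_size_candidates` is a hypothetical helper name, the start value is the one used earlier in this thread, and the path in the usage comment is a placeholder. The helper only prints candidate sizes.

```shell
#!/bin/bash
# Hypothetical helper: print candidate stack sizes (in bytes) for dust's -S flag.
# Starting from START, halve it STEPS times, then double START STEPS times.
stack_size_candidates() {
  local start=$1 steps=$2 size i
  size=$start
  for ((i = 0; i < steps; i++)); do
    size=$((size / 2))
    echo "$size"
  done
  size=$start
  for ((i = 0; i < steps; i++)); do
    size=$((size * 2))
    echo "$size"
  done
}

# Usage (placeholder path; stops at the first run that exits successfully):
#   for s in $(stack_size_candidates 1073741824 3); do
#     dust -S "$s" /path/to/scan && break
#   done
```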


@hmarko commented on GitHub (Jan 26, 2025):

I see the same on Linux too, on file systems with many millions of files.

I will try playing with `-S`, but as far as I can see it is a general scalability issue.

BTW, have you tried it on file systems with 20-30 million files or more?


@bootandy commented on GitHub (Jan 26, 2025):

The same on Linux? OK, let me try to recreate it on Linux.

Using these two scripts I made a large number of files on my ext4 filesystem:

cat ~/temp/many_files/make.sh 
#! /bin/bash
for n in {1..1000}; do
    dd if=/dev/urandom of=file$( printf %03d "$n" ).bin bs=1 count=$(( RANDOM + 1024 ))
done


cat ~/temp/many_files/silly4/make.sh 
#! /bin/bash
for n in {1..1000}; do
	mkdir $n
	touch $n/bspl{00001..09009}.$n
done


Gives:

 (collapse)andy:(0):~/dev/rust/dust$ dust -f ~/temp/ -n 10
    99,003     ┌── many_small │█░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ │   0%
   599,419     ├── many_small2│██░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ │   1%
   900,982     ├── silly2     │███░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ │   2%
   999,031     ├── silly      │████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ │   2%
 2,232,767     ├── silly3     │████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ │   5%
 9,009,001     ├── silly4     │██████████████████████████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ │  22%
 9,009,001     ├── silly5     │██████████████████████████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ │  22%
 9,009,001     ├── silly6     │██████████████████████████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ │  22%
 9,009,001     ├── silly7     │██████████████████████████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ │  22%
40,887,211   ┌─┴ many_files   │████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████ │ 100%
40,887,212 ┌─┴ temp           │████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████ │ 100%
 (collapse)andy:(0):~/dev/rust/dust$ 


@bootandy commented on GitHub (Jan 26, 2025):

I think by the time you are tracking a few tens of millions of files, you are pushing the memory limits of an average system. htop certainly wasn't very happy when I ran the above.


@hmarko commented on GitHub (Jan 26, 2025):

I ran the same command you did and it worked.
In my use case there are a few differences that may be related:

  1. I use SMB or NFS to access the file system over the network
  2. My directory structure is more complex (it can get deep and narrow)
  3. There are long directory and file names

Anyway, the servers I use have 32G of RAM and are doing nothing else.
Is there any way I can use them to debug this?

Thanks!


@bootandy commented on GitHub (Jan 26, 2025):

I'm not sure I can offer much more.

Adding `-d` doesn't make it use less memory.

I can only suggest cd-ing into a subdirectory so it has less data to trawl through.


@hmarko commented on GitHub (Jan 27, 2025):

Thanks!

I will learn some Rust and do some debugging myself.

I will let you know if something turns up.


@hmarko commented on GitHub (Mar 9, 2025):

Hi.

I could easily reproduce the dust crash with the following script. One problem is the length of the file names and the number of sub-directories.

```
#!/bin/bash

BASE_DIR="/files"

NUM_DIRS=10000
NUM_FILES=10000

FILENAME_LENGTH=50

generate_random_string() {
  local length=$1
  tr -dc A-Za-z0-9 </dev/urandom | head -c $length
}

create_structure() {
  local current_depth=$1
  local current_dir=$2

  if [ $current_depth -gt 10 ]; then
    return
  fi

  for ((i=0; i<$NUM_DIRS; i++)); do
    dir_name=$(generate_random_string $FILENAME_LENGTH)
    new_dir="$current_dir/$dir_name"
    mkdir -p "$new_dir"

    for ((j=0; j<$NUM_FILES; j++)); do
      file_name=$(generate_random_string $FILENAME_LENGTH)
      touch "$new_dir/$file_name"
    done

    create_structure $((current_depth + 1)) "$new_dir"
  done
}

create_structure 1 "$BASE_DIR"
```


@bootandy commented on GitHub (Mar 15, 2025):

Is that only on Windows?

I tried the above on my Linux box and dust handled it OK.


@hmarko commented on GitHub (Mar 19, 2025):

Not only on Windows. It also happens on Linux, on a VM with 64G of RAM.


@eliphatfs commented on GitHub (Jun 19, 2025):

I have a 300TB volume on Linux with billions of files. dust takes 30GB RESS + 170GB kmem and the container goes OOM.
I limited the depth to 3, so in theory the scan could be done in memory proportional to the number of directories no deeper than that limit.

I am using `parallel du -hs ::: */*/*` instead and it works quite well (the catch is that the workload is not balanced between processes, and the last, largest directory takes a long time).
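
The same per-entry fan-out can be done without GNU parallel. This is a sketch, assuming GNU find and xargs: `du_at_depth` is a hypothetical helper name, and the depth and job count are illustrative. Like the `parallel` command above, it runs one `du -sh` per entry at the chosen depth, so the imbalance between processes remains, and anything shallower than that depth is not reported. Unlike the shell glob, find also picks up hidden entries.

```shell
#!/bin/bash
# Hypothetical helper: run one `du -sh` per entry found exactly DEPTH levels
# below ROOT, with up to JOBS du processes running at once (default 4).
du_at_depth() {
  local root=$1 depth=$2 jobs=${3:-4}
  find "$root" -mindepth "$depth" -maxdepth "$depth" -print0 |
    xargs -0 -r -n 1 -P "$jobs" du -sh
}

# Usage: du_at_depth . 3 8
```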


@bootandy commented on GitHub (Jul 5, 2025):

I don't think this is possible to fix. `du` runs and dumps its output as it goes. `dust` loads everything into memory before deciding what to display. If there is too much to load, dust will run out of memory.
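
To illustrate the streaming approach, here is a sketch that keeps one running total per directory prefix up to a fixed depth instead of one entry per file, so memory grows with the number of shallow directories rather than the number of files. It assumes GNU find (for `-printf`) and awk. `stream_sizes` is a hypothetical name, it sums apparent file sizes rather than disk blocks, and it is not how dust or `du` is implemented.

```shell
#!/bin/bash
# Hypothetical helper: sum file sizes under ROOT, bucketed by the first DEPTH
# directory components of each file's path. Totals are per bucket (a parent's
# total does not include its child buckets).
# Caveat: file names containing tabs or newlines will be miscounted.
stream_sizes() {
  local root=$1 depth=$2
  # Emit "<size in bytes><TAB><path relative to ROOT>" for every regular file.
  find "$root" -type f -printf '%s\t%P\n' |
    awk -F '\t' -v depth="$depth" '
      {
        n = split($2, parts, "/")
        # The last component is the file name; keep at most DEPTH directories.
        k = (n - 1 < depth) ? n - 1 : depth
        key = "."
        for (i = 1; i <= k; i++) key = key "/" parts[i]
        total[key] += $1
      }
      END { for (key in total) printf "%.0f\t%s\n", total[key], key }
    '
}

# Usage: stream_sizes /path/to/scan 3 | sort -n
```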

Reference: bootandy/archived-dust#203