[GH-ISSUE #457] Memory allocation issue when running on Windows #203

zhus commented

2026-06-08 11:26:07 +03:00

Originally created by @hmarko on GitHub (Dec 31, 2024).
Original GitHub issue: https://github.com/bootandy/dust/issues/457

Hi.

I'm trying to count files on a 30M-file dataset over SMB, and dust fails with the memory allocation error shown below.
Can anything be done to overcome this, or have I reached the maximum scale dust can handle?
Thanks!

```
.\dust -F -j -r -d 4 -n 100 -s 400000 -f \\server\share$\Groups
Indexing: \\server\share$\Groups 9949070 files, 9.5M ... /memory allocation of 262144 bytes failed
```

zhus closed this issue

@hmarko commented on GitHub (Jan 1, 2025):

Just an update: running against the same repository from a Linux client completes successfully.
I suspect this issue affects only the Windows version.

@bootandy commented on GitHub (Jan 15, 2025):

Can you try running dust with more memory, e.g. `dust -S 1073741824`? `-S` lets you specify the stack size, so you can try increasing or decreasing the number and see if Windows sorts itself out.

@hmarko commented on GitHub (Jan 20, 2025):

```
C:\DUST>C:\DUST\dust.exe -S 1073741824 -D -p -j -r -f -n 100 -d 7 -z 200000 "\\srv\c$\folder"
Indexing: \\srv\c$\folder 12401021 files, 11M ... \memory allocation of 262144 bytes failed
```

@bootandy commented on GitHub (Jan 26, 2025):

If Windows is failing to assign enough memory to run dust, I'm not sure there is anything I can do here.

I'd recommend repeatedly halving the number passed to `-S`, then repeatedly doubling it, and seeing if you can get a good run.
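The halve-then-double search suggested above can be scripted. This is a minimal sketch, not part of dust: `stack_sizes` and `try_stack_sizes` are hypothetical helper names, the starting value 1073741824 is the one suggested earlier in the thread, the other dust flags (`-f`, `-d 4`) are only examples taken from this thread, and the target path is a placeholder.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: search for a dust -S (stack size) value that gives a good run.

# stack_sizes START STEPS
# Prints START halved STEPS times, then START doubled STEPS times.
stack_sizes() {
  local start=$1 steps=$2 size i
  size=$start
  for ((i = 0; i < steps; i++)); do
    size=$((size / 2))
    echo "$size"
  done
  size=$start
  for ((i = 0; i < steps; i++)); do
    size=$((size * 2))
    echo "$size"
  done
}

# try_stack_sizes TARGET
# Runs dust once per candidate size and prints the first size whose run exits 0.
try_stack_sizes() {
  local target=$1 size
  for size in $(stack_sizes 1073741824 3); do
    echo "Trying -S $size" >&2
    if dust -S "$size" -f -d 4 "$target" > /dev/null; then
      echo "$size"
      return 0
    fi
  done
  return 1
}
```

For example, `try_stack_sizes /mnt/share/Groups` tries 536870912, 268435456 and 134217728 first, then 2147483648, 4294967296 and 8589934592.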

@hmarko commented on GitHub (Jan 26, 2025):

I see the same thing on Linux too, on file systems with many millions of files.

I will try playing with `-S`, but as far as I can see this is a general scalability issue.

BTW, have you tried it on file systems with 20-30 million files or more?

@bootandy commented on GitHub (Jan 26, 2025):

The same on Linux? OK, let me try to recreate it on Linux.

Using these two scripts, I created a large number of files on my ext4 filesystem:

```
cat ~/temp/many_files/make.sh
#! /bin/bash
for n in {1..1000}; do
    dd if=/dev/urandom of=file$( printf %03d "$n" ).bin bs=1 count=$(( RANDOM + 1024 ))
done

cat ~/temp/many_files/silly4/make.sh
#! /bin/bash
for n in {1..1000}; do
    mkdir $n
    touch $n/bspl{00001..09009}.$n
done
```

Gives:

```
 (collapse)andy:(0):~/dev/rust/dust$ dust -f ~/temp/ -n 10
    99,003     ┌── many_small │█░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ │   0%
   599,419     ├── many_small2│██░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ │   1%
   900,982     ├── silly2     │███░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ │   2%
   999,031     ├── silly      │████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ │   2%
 2,232,767     ├── silly3     │████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ │   5%
 9,009,001     ├── silly4     │██████████████████████████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ │  22%
 9,009,001     ├── silly5     │██████████████████████████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ │  22%
 9,009,001     ├── silly6     │██████████████████████████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ │  22%
 9,009,001     ├── silly7     │██████████████████████████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ │  22%
40,887,211   ┌─┴ many_files   │████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████ │ 100%
40,887,212 ┌─┴ temp           │████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████ │ 100%
 (collapse)andy:(0):~/dev/rust/dust$
```

@bootandy commented on GitHub (Jan 26, 2025):

I think that once you are tracking a few tens of millions of files, you are pushing the memory limits of the average system. htop certainly wasn't very happy when I ran the above.

@hmarko commented on GitHub (Jan 26, 2025):

I ran the same command as you, and it worked.
In my use case there are a few differences that may be related:

1. I use SMB or NFS to access the file system over the network.
2. My directory structure is more complex (it can get deep and narrow).
3. Directory and file names are long.

Anyway, the servers I use have 32 GB of RAM and are doing nothing else.
Is there any way I can use them to debug this?
Thanks!
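One way to gather debugging data is to record dust's peak memory use on subtrees of increasing size. If GNU time is installed, `/usr/bin/time -v` reports a "Maximum resident set size" line. As a Linux-only fallback, the hypothetical `peak_rss_kb` helper below polls the kernel's `VmHWM` field (the peak resident set size, in kB) in `/proc/<pid>/status` while the command runs. It is a sketch: it samples every half second, so growth after the last sample is not seen, and it does not work on Windows.

```bash
#!/usr/bin/env bash
# Hypothetical helper (Linux only): run a command and report its peak RSS in kB.

# peak_rss_kb CMD [ARGS...]
# Prints the highest VmHWM value seen while CMD runs; returns CMD's exit status.
peak_rss_kb() {
  local pid kb status peak=0
  "$@" > /dev/null &   # the command's own output is discarded
  pid=$!
  # kill -0 only checks whether the process still exists.
  while kill -0 "$pid" 2>/dev/null; do
    # VmHWM is the kernel's high-water mark for resident memory, in kB.
    kb=$(awk '/^VmHWM:/ {print $2}' "/proc/$pid/status" 2>/dev/null)
    if [[ -n $kb && $kb -gt $peak ]]; then
      peak=$kb
    fi
    sleep 0.5
  done
  wait "$pid"
  status=$?
  echo "$peak"
  return "$status"
}
```

For example, `peak_rss_kb dust -f -d 4 /mnt/share/subdir` (placeholder path) run on progressively larger subtrees shows how memory grows with the number of files.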

@bootandy commented on GitHub (Jan 26, 2025):

I'm not sure I can offer much more.

Adding `-d` doesn't make it use less memory.

I can only suggest cd-ing into a subdirectory so it has less data to trawl through.
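Following the suggestion to work on one subdirectory at a time, a small loop can run dust separately on each top-level subdirectory, so each run only holds one subtree in memory. `per_subdir` is a hypothetical helper; the dust flags in the usage line are examples and the mount path is a placeholder.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command once per immediate subdirectory of ROOT.

# per_subdir ROOT CMD [ARGS...]
# Runs "CMD ARGS... SUBDIR" for each non-hidden subdirectory of ROOT, in glob order.
per_subdir() {
  local root=$1 dir
  shift
  for dir in "$root"/*/; do
    [ -d "$dir" ] || continue   # the glob stays literal when nothing matches
    echo "== ${dir%/}"
    "$@" "${dir%/}"
  done
}
```

Usage: `per_subdir /mnt/share/Groups dust -f -n 20 -d 3`. Files directly inside ROOT and hidden subdirectories are not covered, and the per-run results are not combined into a grand total.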

@hmarko commented on GitHub (Jan 27, 2025):

Thanks!

I will learn some Rust and do some debugging myself.

I will let you know if something pops up.

@hmarko commented on GitHub (Mar 9, 2025):

Hi.

I was able to reproduce the dust crash easily with the following script. The problem seems to be related to the length of the file names and the number of subdirectories.

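```bash
#!/bin/bash

BASE_DIR="/files"

NUM_DIRS=10000
NUM_FILES=10000

FILENAME_LENGTH=50

# Print a random alphanumeric string of the given length.
# LC_ALL=C makes tr treat the input as raw bytes (needed on macOS).
generate_random_string() {
  local length=$1
  LC_ALL=C tr -dc A-Za-z0-9 </dev/urandom | head -c "$length"
}

# Note: i and j are not declared local, so every level of the recursion shares
# them. When a recursive call returns, i already equals NUM_DIRS and the
# caller's loop ends. In practice the script creates one directory (with
# NUM_FILES files) at each of depths 1-9, then NUM_DIRS directories (each with
# NUM_FILES files) at depth 10: about 100 million files in total, in paths
# made of 50-character names nested 10 levels deep.
create_structure() {
  local current_depth=$1
  local current_dir=$2

  if [ "$current_depth" -gt 10 ]; then
    return
  fi

  for ((i=0; i<$NUM_DIRS; i++)); do
    dir_name=$(generate_random_string "$FILENAME_LENGTH")
    new_dir="$current_dir/$dir_name"
    mkdir -p "$new_dir"

    for ((j=0; j<$NUM_FILES; j++)); do
      file_name=$(generate_random_string "$FILENAME_LENGTH")
      touch "$new_dir/$file_name"
    done

    create_structure $((current_depth + 1)) "$new_dir"
  done
}

create_structure 1 "$BASE_DIR"
```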
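To probe where the failure starts without creating 100 million files, a scaled-down, parameterized variant can help. Because the loop counter `i` in the script above is not declared `local`, the recursion only fans out at the deepest level: it builds one directory at each of depths 1 to 9 and 10000 at depth 10. The hypothetical `make_tree` below reproduces that shape with configurable numbers, and `expected_files` computes how many files it creates. All helper names are illustrative.

```bash
#!/usr/bin/env bash
# Hypothetical, scaled-down variant of the reproduction script above.
# It builds the same shape: a chain of single directories, then a wide last level.

# random_name LEN: print a random alphanumeric string of length LEN.
random_name() {
  LC_ALL=C tr -dc A-Za-z0-9 </dev/urandom | head -c "$1"
}

# make_files DIR COUNT LEN: create COUNT empty files with random names in DIR.
make_files() {
  local dir=$1 count=$2 len=$3 j
  for ((j = 0; j < count; j++)); do
    : > "$dir/$(random_name "$len")"
  done
}

# make_tree BASE DEPTH WIDTH FILES LEN
# Creates one directory at each of depths 1..DEPTH-1, then WIDTH directories
# at depth DEPTH. Every created directory gets FILES empty files.
make_tree() {
  local dir=$1 depth=$2 width=$3 files=$4 len=$5 level i leaf
  for ((level = 1; level < depth; level++)); do
    dir="$dir/$(random_name "$len")"
    mkdir -p "$dir"
    make_files "$dir" "$files" "$len"
  done
  for ((i = 0; i < width; i++)); do
    leaf="$dir/$(random_name "$len")"
    mkdir -p "$leaf"
    make_files "$leaf" "$files" "$len"
  done
}

# expected_files DEPTH WIDTH FILES: number of files make_tree creates.
expected_files() {
  echo $(( ($1 - 1 + $2) * $3 ))
}
```

With the values from the original script (depth 10, 10000 directories, 10000 files), `expected_files 10 10000 10000` prints 100090000. Starting from small values, for example `make_tree "$(mktemp -d)" 10 100 1000 50`, and increasing WIDTH and FILES would show roughly where dust starts to fail on a given machine.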

@bootandy commented on GitHub (Mar 15, 2025):

Is that only on Windows?

I tried the above on my Linux box and dust handled it OK.

@hmarko commented on GitHub (Mar 19, 2025):

Not only on Windows. It also happens on Linux, on a VM with 64 GB of RAM.

@eliphatfs commented on GitHub (Jun 19, 2025):

I have a 300 TB volume on Linux with billions of files. Dust takes 30 GB of RSS plus 170 GB of kmem, and the container goes OOM.
I limited the depth to 3, so in theory it could run in memory proportional to the number of directories within depth 3.

I am using `parallel du -hs ::: */*/*` instead and it works quite well (the catch is that the workload is not balanced between processes, so the last, largest directory takes a long time).
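The depth-limit argument can be illustrated with standard tools. In this sketch (not how dust works internally), GNU `find` streams one relative path per file and `awk` adds each file to the counts of its ancestor directories, truncated at DEPTH, so the awk array holds one entry per directory within DEPTH no matter how many files lie below. `count_files_by_prefix` is a hypothetical name; it assumes GNU find (for `-printf`) and file names without embedded newlines.

```bash
#!/usr/bin/env bash
# Hypothetical helper: dust -f style file counts, limited to DEPTH levels,
# computed in a single streaming pass.

# count_files_by_prefix ROOT DEPTH
# Prints "COUNT PATH" lines, largest first; PATH is relative to ROOT ("." is ROOT).
count_files_by_prefix() {
  local root=$1 depth=$2
  # %P prints each path relative to ROOT, e.g. "a/b/c/file".
  find "$root" -type f -printf '%P\n' |
    awk -F/ -v depth="$depth" '
      {
        n = NF - 1                # directory components above the file
        if (n > depth) n = depth  # deeper files count toward their depth-DEPTH ancestor
        key = "."
        count[key]++
        for (i = 1; i <= n; i++) {
          key = key "/" $i
          count[key]++
        }
      }
      END { for (k in count) print count[k], k }' |
    sort -k1,1nr
}
```

For example, `count_files_by_prefix /mnt/volume 3` (placeholder path) prints a count for every directory up to three levels deep, while memory use depends only on how many such directories exist.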

@bootandy commented on GitHub (Jul 5, 2025):

I don't think this is possible to fix. `du` prints its output as it runs, whereas `dust` loads everything into memory before deciding what to display. If there is too much to load, dust will run out of memory.
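To get a dust-like "largest directories" view while keeping `du`'s streaming behaviour, `du`'s depth-limited output can be piped into `sort` and `head`. In this sketch, a hypothetical helper built on GNU `du`'s `-k` and `--max-depth` options, `du` still walks the whole tree but emits only one line per directory within the depth limit, so `sort` holds those lines rather than one entry per file.

```bash
#!/usr/bin/env bash
# Hypothetical helper: the N largest directories within DEPTH levels of ROOT,
# built from GNU du's streaming output.

# top_dirs ROOT DEPTH N
# Prints "SIZE_KB<TAB>PATH" lines, largest first.
top_dirs() {
  local root=$1 depth=$2 n=$3
  du -k --max-depth="$depth" "$root" | sort -k1,1nr | head -n "$n"
}
```

For example, `top_dirs /mnt/share 3 100` (placeholder path) lists the 100 largest directories up to three levels deep. Unlike dust, this reports disk usage rather than file counts and draws no tree.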

zhus referenced this issue

2026-06-08 11:27:26 +03:00

[PR #203] [MERGED] docs(readme): add pacstall installation method #352

Reference: bootandy/archived-dust#203