[GH-ISSUE #375] HDD performance is poor #166

Closed
opened 2026-06-08 11:25:58 +03:00 by zhus · 3 comments

Originally created by @Davester47 on GitHub (Mar 14, 2024).
Original GitHub issue: https://github.com/bootandy/dust/issues/375

Performance of dust compared to single-threaded coreutils du on my HDD is about 2x worse. I was testing on my home directory, which according to dust has 139,000 files for a total of 711 gigabytes. The filesystem is ext4.

$ echo 1 | sudo tee /proc/sys/vm/drop_caches
$ time dust
...
711G ┌─┴ .
dust  1.00s user 3.06s system 4% cpu 1:26.04 total

And for du:

$ echo 1 | sudo tee /proc/sys/vm/drop_caches
$ time du -sh .
712G	.
du -sh .  0.29s user 1.54s system 4% cpu 45.589 total

I'm not sure what is causing the difference here, but I'm guessing it has to do with the directory traversal order used by the two programs. du uses depth-first search, whereas dust appears to use breadth-first search through rayon. Perhaps depth-first search is more effective on mechanical HDDs?
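
To make the two traversal orders concrete, here is a minimal sketch on a toy in-memory tree. The `Dir` type, `sample_tree`, and both function names are hypothetical, and this is neither du's nor dust's actual code; it only shows how the visit order differs. Whether that difference explains the extra seek time on an HDD remains the guess stated above.

```rust
use std::collections::VecDeque;

/// A toy directory node: a name plus child directories (hypothetical type).
struct Dir {
    name: &'static str,
    children: Vec<Dir>,
}

/// Depth-first, pre-order: finish one subtree before starting its sibling.
fn depth_first(root: &Dir) -> Vec<&'static str> {
    let mut order = Vec::new();
    let mut stack = vec![root];
    while let Some(dir) = stack.pop() {
        order.push(dir.name);
        // Push in reverse so the first child is popped (visited) first.
        for child in dir.children.iter().rev() {
            stack.push(child);
        }
    }
    order
}

/// Breadth-first: visit every directory at one level before going deeper.
fn breadth_first(root: &Dir) -> Vec<&'static str> {
    let mut order = Vec::new();
    let mut queue = VecDeque::new();
    queue.push_back(root);
    while let Some(dir) = queue.pop_front() {
        order.push(dir.name);
        for child in &dir.children {
            queue.push_back(child);
        }
    }
    order
}

/// Build a directory without children.
fn leaf(name: &'static str) -> Dir {
    Dir { name, children: Vec::new() }
}

/// A small made-up tree: "." contains "a" (two subdirs) and "b" (one subdir).
fn sample_tree() -> Dir {
    Dir {
        name: ".",
        children: vec![
            Dir { name: "a", children: vec![leaf("a/x"), leaf("a/y")] },
            Dir { name: "b", children: vec![leaf("b/z")] },
        ],
    }
}

fn main() {
    let tree = sample_tree();
    println!("depth-first:   {:?}", depth_first(&tree));
    println!("breadth-first: {:?}", breadth_first(&tree));
}
```

In the depth-first order, `a/x` and `a/y` are visited right after `a`; in the breadth-first order, `b` is visited in between, so consecutive reads jump between unrelated parts of the tree.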

zhus closed this issue 2026-06-08 11:25:58 +03:00

@bootandy commented on GitHub (Mar 15, 2024):

du is a simpler tool.

It's also ancient, highly crafted C code.

Disks and folder structures are highly varied. I'm not surprised that du is sometimes faster.

If you run dust like this, you can see how long it takes in single-threaded mode:
$ export RAYON_NUM_THREADS=1; time dust /


@Davester47 commented on GitHub (Mar 15, 2024):

I ran it like you said in single-threaded mode, and it was almost as fast as du! I guess something about the access pattern really disagrees with HDDs when it's running multi-threaded. I wouldn't suppose it'd be easy to change the order you go through the filesystem, would it?

$ echo 1 | sudo tee /proc/sys/vm/drop_caches
$ export RAYON_NUM_THREADS=1; time dust .
...
711G ┌─┴ .
dust .  0.53s user 1.89s system 5% cpu 48.260 total

@bootandy commented on GitHub (Mar 15, 2024):

> I wouldn't suppose it'd be easy to change the order you go through the filesystem, would it?

Not really.

This breadth-first approach works quite well with Rayon. Removing the recursion might be sensible, but I can't really move to depth-first easily.
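
For illustration only, a traversal without recursion can be sketched with an explicit stack of pending directories. This is a single-threaded, std-only sketch and not dust's code: the `total_size` function name is hypothetical, it sums apparent file sizes rather than disk usage, and it does not use Rayon. Because the stack is last-in, first-out, the resulting order is roughly depth-first.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Sum the apparent sizes (in bytes) of all regular files under `root`,
/// using an explicit stack of pending directories instead of recursion.
fn total_size(root: &Path) -> io::Result<u64> {
    let mut total: u64 = 0;
    let mut pending: Vec<PathBuf> = vec![root.to_path_buf()];
    while let Some(dir) = pending.pop() {
        for entry in fs::read_dir(&dir)? {
            let entry = entry?;
            // `DirEntry::file_type` does not follow symlinks,
            // so symlinked directories are not traversed.
            let file_type = entry.file_type()?;
            if file_type.is_dir() {
                pending.push(entry.path());
            } else if file_type.is_file() {
                total += entry.metadata()?.len();
            }
        }
    }
    Ok(total)
}

fn main() {
    // Directory to scan: first command-line argument, or "." by default.
    let root = std::env::args().nth(1).unwrap_or_else(|| ".".to_string());
    match total_size(Path::new(&root)) {
        Ok(bytes) => println!("{} bytes under {}", bytes, root),
        Err(err) => eprintln!("failed to scan {}: {}", root, err),
    }
}
```

Swapping the `Vec` for a queue that pops from the front would turn the same loop into a breadth-first traversal, which is one way to compare the two orders on the same disk.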

Reference: bootandy/archived-dust#166