[PR #575] perf(walk): improvements to the walker hot path #560

Open
opened 2026-06-08 11:29:24 +03:00 by zhus · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/bootandy/dust/pull/575
Author: @arcuru
Created: 5/12/2026
Status: 🔄 Open

Base: masterHead: walker-perf


📝 Commits (6)

  • aa433ae perf(walk): flatten recursion via rayon::scope, drop -S stack-size knob
  • ecdc8ef perf(walk): cache per-dir stat to eliminate duplicate syscall
  • 24aff08 perf(walk): cache entry path and file_type in process_entry
  • 242f722 perf(walk): trim filter-path overhead and skip ignore_file when off
  • 44c4c3f perf(walk): swap FxHash for inode dedup HashSet
  • 81cf4eb perf(walk): skip per-file statx in --filecount mode

📊 Changes

15 files changed (+826 additions, -478 deletions)

View changed files

📝 Cargo.lock (+16 -141)
📝 Cargo.toml (+1 -1)
📝 README.md (+1 -1)
📝 completions/_dust (+2 -2)
📝 completions/_dust.ps1 (+2 -2)
📝 completions/dust.elv (+2 -2)
📝 completions/dust.fish (+1 -1)
📝 man-page/dust.1 (+1 -1)
📝 src/cli.rs (+2 -3)
📝 src/config.rs (+0 -9)
📝 src/dir_walker.rs (+584 -140)
📝 src/main.rs (+27 -52)
📝 src/node.rs (+62 -43)
📝 src/platform.rs (+98 -80)
📝 tests/tests_symlinks.rs (+27 -0)

📄 Description

More changes as a stack on top of my previous #574. Ideally this would be stacked commits but right now this will show the prev commit inside this PR...

Happy to split/rebase as requested, though I am also fine if you pull/modify yourself.

Five small reductions on the per-entry walker hot path. None of them change observable behavior they just remove redundant work. They compound for noticeable wins on tree-heavy and wide-directory walks.

  1. Per-dir stat cache. walk_dir() was statting each directory for its is_dir() check, then build_node() statted it again to populate the Node. Cache the parsed metadata tuple in PendingDir via OnceLock. One statx saved per directory.

  2. Cache entry.path() / entry.file_type(). Both were called 2-3× per entry, and entry.path() allocates a fresh PathBuf each call. Compute once in process_entry and thread through. Also gate is_ignored_path on a cheap empty-check so walks without --ignore-directory skip the HashSet probe entirely.

  3. Skip ignore_file on default walks. Precompute has_any_filter when constructing WalkData. In the hot loop, three-way branch: full ignore_file when filters are active, inlined dot-file check when only --ignore-hidden is on, otherwise skip. When filters are active, also coalesce the duplicate get_metadata calls inside ignore_file, replace path.is_file() stats with the cached file_type, hoist fs::canonicalize out of the ignore_directories loop, and thread the prefetched tuple into build_node so each entry is statted at most once on filter-active walks.

  4. FxHash for inode dedup. clean_inodes hashes 25M (inode, dev) pairs on my test /nix/store. SipHash's DoS resistance is pointless for primitive keys from our own syscalls. Adds rustc-hash = "2", swaps HashSet for FxHashSet. This can be inlined to avoid the dep by defining a simple hasher.

  5. Skip per-file statx in --filecount mode. Under -f, every file's size is overwritten with 1, so the per-file statx is removable. The only field the dedup still needs is (inode, dev): inode comes from DirEntry::ino() (filled by getdents64), dev from the parent directory's cached tuple. Gated to -f without -L and without metadata-needing filters, so output is byte-identical to the previous stat path. Unix-only; Windows DirEntry has no cheap d_ino analogue.

Benchmarks

Cumulative for this batch of changes. "Before" is inclusive of #574.

host: 24 CPU; hyperfine --warmup 2 --runs 10 for synthetics, --warmup 1 --runs 3-5 for large targets

target before after speedup
balanced (1000 dirs / 20k files) 9.3 ms 8.8 ms 1.06×
wide_flat (1 dir, 100k files) 76.9 ms 72.9 ms 1.06×
deep_narrow (1500-deep chain) 139.3 ms 98.9 ms 1.41×
-e "\.rs$" regex filter on dust src 35.7 ms 16.9 ms 2.11×
-x dust src 48.3 ms 30.4 ms 1.59×
-f balanced 9.6 ms 7.7 ms 1.25×
-f wide_flat 82.4 ms 74.8 ms 1.10×
~ (273k entries) 182 ms 173 ms 1.06×
/nix/store (7.5M+ entries, ZFS) 33.5 s 25.6 s 1.31×
-f /nix/store 25.6 s 18.8 s 1.36×

Default-walk synthetics (balanced, wide_flat, ~) are tighter on this run. Those workloads are already close to the syscall floor; the per-entry CPU savings show up in user time more than wall time. I also sampled the system times for these runs, and they are more dramatic where they apply: -e -75% sys, -x -41% sys, -f /nix/store -61% sys. strace confirms -f wide_flat goes from ~100k statx calls to ~10.

Further Work

One more potential perf change stacked on top of this:

  • AT_STATX_DONT_SYNC on Linux. Biggest win on network filesystems (~3.8× on an 11 TB NFS mount in my testing); helps cold-cache local in smaller amounts. dust's default std::fs::Metadata uses AT_STATX_SYNC_AS_STAT, which for a read-only walker is stricter coherence than we need.

That is a small behavior change though, which may or may not be acceptable.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

## 📋 Pull Request Information **Original PR:** https://github.com/bootandy/dust/pull/575 **Author:** [@arcuru](https://github.com/arcuru) **Created:** 5/12/2026 **Status:** 🔄 Open **Base:** `master` ← **Head:** `walker-perf` --- ### 📝 Commits (6) - [`aa433ae`](https://github.com/bootandy/dust/commit/aa433ae7ff7c1f23bfbdd13fba551f8074abff87) perf(walk): flatten recursion via rayon::scope, drop -S stack-size knob - [`ecdc8ef`](https://github.com/bootandy/dust/commit/ecdc8ef561958dd152085f8eff3d08e045a4b620) perf(walk): cache per-dir stat to eliminate duplicate syscall - [`24aff08`](https://github.com/bootandy/dust/commit/24aff081310f6232945b0c76c98a91039058b685) perf(walk): cache entry path and file_type in process_entry - [`242f722`](https://github.com/bootandy/dust/commit/242f722b1c1f55f6db775ab0508db6820ae29b81) perf(walk): trim filter-path overhead and skip ignore_file when off - [`44c4c3f`](https://github.com/bootandy/dust/commit/44c4c3f95689833d194ee08d02326ff74da8d681) perf(walk): swap FxHash for inode dedup HashSet - [`81cf4eb`](https://github.com/bootandy/dust/commit/81cf4eb4d93cd847ef1330a42b5eb1f4578f7111) perf(walk): skip per-file statx in --filecount mode ### 📊 Changes **15 files changed** (+826 additions, -478 deletions) <details> <summary>View changed files</summary> 📝 `Cargo.lock` (+16 -141) 📝 `Cargo.toml` (+1 -1) 📝 `README.md` (+1 -1) 📝 `completions/_dust` (+2 -2) 📝 `completions/_dust.ps1` (+2 -2) 📝 `completions/dust.elv` (+2 -2) 📝 `completions/dust.fish` (+1 -1) 📝 `man-page/dust.1` (+1 -1) 📝 `src/cli.rs` (+2 -3) 📝 `src/config.rs` (+0 -9) 📝 `src/dir_walker.rs` (+584 -140) 📝 `src/main.rs` (+27 -52) 📝 `src/node.rs` (+62 -43) 📝 `src/platform.rs` (+98 -80) 📝 `tests/tests_symlinks.rs` (+27 -0) </details> ### 📄 Description More changes as a stack on top of my previous #574. Ideally this would be stacked commits but right now this will show the prev commit inside this PR... Happy to split/rebase as requested, though I am also fine if you pull/modify yourself. Five small reductions on the per-entry walker hot path. None of them change observable behavior they just remove redundant work. They compound for noticeable wins on tree-heavy and wide-directory walks. 1. **Per-dir stat cache.** `walk_dir()` was statting each directory for its `is_dir()` check, then `build_node()` statted it again to populate the Node. Cache the parsed metadata tuple in `PendingDir` via `OnceLock`. One `statx` saved per directory. 2. **Cache `entry.path()` / `entry.file_type()`.** Both were called 2-3× per entry, and `entry.path()` allocates a fresh `PathBuf` each call. Compute once in `process_entry` and thread through. Also gate `is_ignored_path` on a cheap empty-check so walks without `--ignore-directory` skip the HashSet probe entirely. 3. **Skip `ignore_file` on default walks.** Precompute `has_any_filter` when constructing `WalkData`. In the hot loop, three-way branch: full `ignore_file` when filters are active, inlined dot-file check when only `--ignore-hidden` is on, otherwise skip. When filters _are_ active, also coalesce the duplicate `get_metadata` calls inside `ignore_file`, replace `path.is_file()` stats with the cached `file_type`, hoist `fs::canonicalize` out of the `ignore_directories` loop, and thread the prefetched tuple into `build_node` so each entry is statted at most once on filter-active walks. 4. **FxHash for inode dedup.** `clean_inodes` hashes 25M `(inode, dev)` pairs on my test `/nix/store`. SipHash's DoS resistance is pointless for primitive keys from our own syscalls. Adds `rustc-hash = "2"`, swaps `HashSet` for `FxHashSet`. This can be inlined to avoid the dep by defining a simple hasher. 5. **Skip per-file `statx` in `--filecount` mode.** Under `-f`, every file's size is overwritten with 1, so the per-file `statx` is removable. The only field the dedup still needs is `(inode, dev)`: inode comes from `DirEntry::ino()` (filled by `getdents64`), dev from the parent directory's cached tuple. Gated to `-f` without `-L` and without metadata-needing filters, so output is byte-identical to the previous stat path. Unix-only; Windows `DirEntry` has no cheap `d_ino` analogue. ## Benchmarks Cumulative for this batch of changes. "Before" is inclusive of #574. host: 24 CPU; hyperfine `--warmup 2 --runs 10` for synthetics, `--warmup 1 --runs 3-5` for large targets | target | before | after | speedup | | ------------------------------------- | -------: | ------: | --------: | | balanced (1000 dirs / 20k files) | 9.3 ms | 8.8 ms | 1.06× | | wide_flat (1 dir, 100k files) | 76.9 ms | 72.9 ms | 1.06× | | deep_narrow (1500-deep chain) | 139.3 ms | 98.9 ms | **1.41×** | | `-e "\.rs$"` regex filter on dust src | 35.7 ms | 16.9 ms | **2.11×** | | `-x` dust src | 48.3 ms | 30.4 ms | **1.59×** | | `-f` balanced | 9.6 ms | 7.7 ms | 1.25× | | `-f` wide_flat | 82.4 ms | 74.8 ms | 1.10× | | ~ (273k entries) | 182 ms | 173 ms | 1.06× | | /nix/store (7.5M+ entries, ZFS) | 33.5 s | 25.6 s | **1.31×** | | `-f` /nix/store | 25.6 s | 18.8 s | **1.36×** | Default-walk synthetics (`balanced`, `wide_flat`, `~`) are tighter on this run. Those workloads are already close to the syscall floor; the per-entry CPU savings show up in user time more than wall time. I also sampled the system times for these runs, and they are more dramatic where they apply: `-e` -75% sys, `-x` -41% sys, `-f /nix/store` -61% sys. strace confirms `-f wide_flat` goes from ~100k `statx` calls to ~10. ## Further Work One more potential perf change stacked on top of this: - `AT_STATX_DONT_SYNC` on Linux. Biggest win on network filesystems (~3.8× on an 11 TB NFS mount in my testing); helps cold-cache local in smaller amounts. dust's default `std::fs::Metadata` uses `AT_STATX_SYNC_AS_STAT`, which for a read-only walker is stricter coherence than we need. That is a small behavior change though, which may or may not be acceptable. --- <sub>🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.</sub>
zhus added the pull-request label 2026-06-08 11:29:24 +03:00
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: bootandy/archived-dust#560