mirror of
https://github.com/bootandy/dust.git
synced 2026-06-08 11:29:05 +03:00
[GH-ISSUE #171] Filter by file type/extension #74
Originally created by @polarathene on GitHub (Aug 10, 2021).
Original GitHub issue: https://github.com/bootandy/dust/issues/171
I often find support for exclude/ignore, but these tools seem to lack the opposite: only caring about certain extensions (e.g. images `png|gif|jpg|jpeg|webp` or archives `zip|tar|rar|7zip|zstd`). Such functionality would be quite handy for me to get a better perspective of the content breakdown for a server with user-uploaded content.
This would let me identify how much disk space a particular image type is using, or what the largest N images are. CLI tools like this one all seem capable of doing that, but it becomes a problem without the ability to filter when the scanned content is not well organized, or is spread out among dirs where sibling/nested dirs contain various other content or mixed types (e.g. if I only want to know about png disk usage).
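
To make the request concrete, here is a minimal Python sketch of per-extension totals and a "largest N" query. It is not how dust works internally. The function names (`scan`, `group_by_extension`, `largest`) are made up for illustration, and it sums apparent file sizes (`st_size`) rather than the disk blocks a real disk-usage tool may report.

```python
import heapq
import os
from collections import defaultdict


def scan(root):
    """Yield (path, apparent size in bytes) for each non-directory entry below root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                yield path, os.lstat(path).st_size
            except OSError:
                continue  # unreadable or vanished entry: skip it


def extension_of(path):
    """Return the lowercase extension without the dot ('' if there is none)."""
    return os.path.splitext(path)[1].lower().lstrip(".")


def group_by_extension(entries, extensions=None):
    """Sum sizes per extension; keep only `extensions` when it is given."""
    wanted = None if extensions is None else {e.lower().lstrip(".") for e in extensions}
    totals = defaultdict(int)
    for path, size in entries:
        ext = extension_of(path)
        if wanted is None or ext in wanted:
            totals[ext] += size
    return dict(totals)


def largest(entries, extensions, n=10):
    """Return the n largest (size, path) pairs among the given extensions."""
    wanted = {e.lower().lstrip(".") for e in extensions}
    matching = ((size, path) for path, size in entries if extension_of(path) in wanted)
    return heapq.nlargest(n, matching)
```

For example, `group_by_extension(scan("/srv/uploads"), {"png", "jpg", "webp"})` would give the image totals for a hypothetical upload directory, wherever the images sit in the tree.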
@bootandy commented on GitHub (Aug 12, 2021):
That's an interesting idea. I think it might be worth exploring. I'll look into it.
We could group by extension type as well as a specific extension.
https://www.computerhope.com/issues/ch001789.htm
https://www.online-convert.com/file-type
@bootandy commented on GitHub (Aug 12, 2021):
Here is an experimental branch where I've added -t for listing file_types and -y for a specific file type:
https://github.com/bootandy/dust/tree/by_type2
Is this what you were hoping for?
@polarathene commented on GitHub (Aug 13, 2021):
Example output from server
Using `-t -r` is a decent breakdown by extension.
This is on one of the websites the server manages, with `-y png -r`, `-y webp -r` and `-y jpg -r`.
It also works nicely for getting a file count by extension, e.g. `-y png -fr`.
Likewise, with `-t -fr` I get a nice summary by extension.
This is already very handy, thanks! 😀
However, it would be nice to have something in between that supports multiple extensions, such as only the image data shown above: `-y "png jpg webp"`, `-y "png,jpg,webp"` or `-y "\.(png|jpg|webp)$"` (regex pattern on file extension). That way I can get an overview of the image data without individually querying each one and comparing/summing. As you can see, there are some sizable archives in the mix, which is why I requested the feature, so I can focus on the image stats (as a whole and individually).
I've used `dutree` in the past. It had a handy `--aggr=50M` feature to aggregate files in directories that were 50M total or more, otherwise they were ignored, such as those smaller lines that aren't as useful when interested in the bigger numbers. `dutree` could also show me the top N largest files; I'm not sure if `dust` offers that.
This is great though, especially `-y`! If you do consider the multi-extension case, being able to still break down by extension like `-t` does, but mixed with the aggregated dirs view `-y` offers, would be cool too 👍
Something like below?:
or if the size column would be better without the extension size interleaved..:
or as I believe `dust` is doing, collapse/avoid displaying size info that's not changing/useful:
This seems to be easier to make sense of, although I haven't merged all the other dirs (and the percentage is of course wrong). Being able to aggregate or exclude directories with a size lower than N may help filter out some noise from such a view if that became an issue.
@bootandy commented on GitHub (Aug 13, 2021):
Instead of `-y "png,jpg,webp"` this should work: `-y png -y jpg -y webp`
But I think I need to support comma and space separated strings as well - like you suggest.
`--aggr` is an interesting idea. I'm not sure if that fits with dust, as dust displays 'n' lines whereas dutree displays all the lines. So you can think of it as dust having an implicit --aggr based on how big your terminal is; if you don't want aggr, use `dust -n 9999`
I think interleaving the size column into the tree might be difficult.
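
One way to accept both the repeated flag and comma- or space-separated values is to normalise everything into a single set before filtering. The Python sketch below illustrates only that parsing step. `parse_extensions` is a hypothetical helper, not dust's actual argument handling, and lowercasing and stripping a leading dot are assumptions.

```python
import re


def parse_extensions(values):
    """Flatten repeated -y style values into a set of lowercase extensions.

    Each value may hold one extension, or several separated by commas and/or spaces.
    """
    result = set()
    for value in values:
        for token in re.split(r"[,\s]+", value):
            token = token.lstrip(".").lower()
            if token:  # skip empty pieces from leading/trailing separators
                result.add(token)
    return result
```

With this, `["png", "jpg", "webp"]` (the flag given three times), `["png,jpg,webp"]` and `["png jpg webp"]` all produce the same set.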
@polarathene commented on GitHub (Aug 13, 2021):
That would be a better UX for sure! :)
Thanks for the tip, very useful.
That's fine, for the most part I mentioned it as a way to filter out information that may not be of interest. In `dutree` that meant collapsing/grouping nested data below the limit.
For `dust` perhaps it'd make more sense to hide anything below a given size. I don't know if that is a feature that is as useful to others; it's probably most useful for keeping the focus on larger file content, especially with the filtered extensions if they were presented with their respective sizes like in my last response.
I'm not too concerned about location, I just went with an example that came to mind :) Up to you if it adds value to the feature, of course.
You could present the extensions with size `.png (15G), .jpg (11G), .webp (238M)` or a similar format along the percentage bar column rows. That keeps the height as-is, assuming you can invert the text based on a cell being filled or not by the percentage bar overlaying the text.
Adding a new column could work too, either on the same row or multi-row.
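
The "hide anything below a given size" idea can be sketched as a threshold applied to per-extension (or per-directory) totals. The Python below is only an illustration. `parse_size` and `hide_small` are hypothetical names, the 1024-based units are an assumption, and neither dust nor dutree is claimed to work this way.

```python
# Assumed 1024-based multipliers for size suffixes.
UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text):
    """Convert a string such as '50M' or '1.5G' to a number of bytes."""
    text = text.strip().upper().rstrip("B")  # accept '50M' and '50MB' alike
    number, unit = text, ""
    if text and text[-1] in UNITS:
        number, unit = text[:-1], text[-1]
    return int(float(number) * UNITS[unit])


def hide_small(sizes, minimum):
    """Split {name: bytes} into the entries at or above minimum and the hidden total."""
    shown = {name: size for name, size in sizes.items() if size >= minimum}
    hidden = sum(size for size in sizes.values() if size < minimum)
    return shown, hidden
```

Using the sizes quoted above (15G png, 11G jpg, 238M webp) with a threshold of `parse_size("1G")`, the png and jpg entries stay visible and the webp total is reported as hidden.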
@MatthieuBizien commented on GitHub (Sep 1, 2021):
I was looking for that exact issue 😀
This feature may be even more useful if it accepts an arbitrary pattern as a filter, e.g. `dust --filter "*/images/*.png"` would filter the png files in the images subfolders.
@bootandy commented on GitHub (Sep 2, 2021):
So @MatthieuBizien, in that example dust would only build a tree for files that match the filter? So only png files in a folder called images?
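
For reference, this is how a shell-style pattern of that shape behaves when matched against whole paths, sketched in Python with the standard `fnmatch` module. It only illustrates the semantics being asked about. `filter_paths` is a hypothetical helper, and the thread does not say how dust itself interprets such a filter.

```python
from fnmatch import fnmatchcase


def filter_paths(paths, pattern):
    """Keep the paths (using '/' separators) that match a shell-style pattern."""
    return [p for p in paths if fnmatchcase(p, pattern)]


paths = [
    "site/images/a.png",
    "site/images/b.jpg",
    "site/docs/c.png",
    "site/images/old/d.png",
    "images/e.png",
]
matches = filter_paths(paths, "*/images/*.png")
```

Here `matches` holds `site/images/a.png` and `site/images/old/d.png`. In `fnmatch`, `*` also matches `/`, so png files in folders nested below `images` are included, while `images/e.png` is not because the pattern requires a parent directory before `images`.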
@bootandy commented on GitHub (Sep 19, 2021):
shipped with version 0.7