[GH-ISSUE #276] Avoid writing half-uploaded files #146

Closed
opened 2026-04-08 16:50:40 +03:00 by zhus · 4 comments

Originally created by @Lukinoh on GitHub (Oct 31, 2023).
Original GitHub issue: https://github.com/sigoden/dufs/issues/276

Specific Demand

Hello,

Currently, an uploaded file is written sequentially to its final path.
So if an error occurs, only part of the file may end up on disk.
A simple way to reproduce this is to start uploading a large file and press F5.
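To make the failure mode concrete, here is a minimal std-only Rust sketch (not dufs code) that simulates a client dropping the connection mid-upload. FlakyBody and naive_upload are hypothetical names: the reader stands in for the HTTP request body and fails after a fixed number of bytes. Because the body is streamed straight into the final path, whatever arrived before the error stays on disk under the real file name.

```rust
use std::fs::File;
use std::io::{self, Read, Write};
use std::path::Path;

/// Hypothetical stand-in for a request body: yields `data` until
/// `fail_after` bytes have been read, then errors like a dropped connection.
struct FlakyBody {
    data: Vec<u8>,
    pos: usize,
    fail_after: usize,
}

impl Read for FlakyBody {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        if self.pos >= self.fail_after {
            return Err(io::Error::new(io::ErrorKind::ConnectionAborted, "client went away"));
        }
        // Never hand out bytes past the failure point or past the end of the data.
        let end = self.fail_after.min(self.data.len()).min(self.pos + buf.len());
        let n = end - self.pos;
        buf[..n].copy_from_slice(&self.data[self.pos..end]);
        self.pos = end;
        Ok(n)
    }
}

/// Naive upload: stream the body directly into the final path.
/// On error, the bytes already written are left behind.
fn naive_upload(body: &mut impl Read, dest: &Path) -> io::Result<u64> {
    let mut file = File::create(dest)?;
    let n = io::copy(body, &mut file)?;
    file.flush()?;
    Ok(n)
}
```

With a 100,000-byte body that fails after 10,000 bytes, naive_upload returns an error but leaves a 10,000-byte file at the destination.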

I don't know whether it makes sense for you to handle this case in another way.
Anyway, here is a simple suggestion.

Implement Suggestion

Disclaimer: I have literally zero experience with Rust; this is the first time I have written anything in it.

So the idea is to write the file to a temporary location until the upload is finished, then move it to its final place.
I made a PoC, which you can find here: https://github.com/Lukinoh/dufs/commit/c8e2f4835051e8d7773c96c1dbf59b95a0431bd0

Technical comments:

  • We cannot create a file to check whether we can write to the directory; otherwise, the file would appear before it is completely uploaded.
    Instead, I used the metadata of the parent folder to determine whether the folder is read-only.

  • To choose where to store the temporary file, I simply took the first reference I found, so the solution is based on the tempfile crate (https://docs.rs/tempfile/latest/tempfile/), which lets you create temporary files or folders.
    Unfortunately, its temporary file is not compatible with tokio::io::copy, so I created a temporary directory and created the temporary file inside it with tokio::fs::File::create.
    And while writing this feature request, I found that an async_tempfile crate (https://docs.rs/async-tempfile/latest/async_tempfile/) exists, which seems to be compatible with tokio.
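The PoC's approach can be sketched roughly as follows. This is a simplified, blocking std::fs version rather than the tokio-based code in the PoC, and it builds its own staging directory instead of using the tempfile crate; parent_looks_writable, upload_via_staging_dir, copy_then_rename and the upload-PID-N directory naming are all hypothetical. The body is streamed into a file inside a fresh staging directory and moved to the destination only after the copy succeeds, so the destination name never points at partial data.

```rust
use std::fs::{self, File};
use std::io::{self, Read};
use std::path::Path;
use std::sync::atomic::{AtomicU64, Ordering};

// Counter used to give each staging directory a unique (hypothetical) name.
static STAGING_COUNTER: AtomicU64 = AtomicU64::new(0);

/// Coarse check on the destination's parent directory via its metadata,
/// instead of creating a probe file that would show up in listings.
fn parent_looks_writable(dest: &Path) -> io::Result<bool> {
    let parent = match dest.parent() {
        Some(p) if !p.as_os_str().is_empty() => p,
        _ => Path::new("."),
    };
    Ok(!fs::metadata(parent)?.permissions().readonly())
}

/// Stream `body` into a staging directory under `staging_root`, then move
/// the finished file to `dest`. The staging directory is always removed.
fn upload_via_staging_dir(body: &mut impl Read, staging_root: &Path, dest: &Path) -> io::Result<u64> {
    if !parent_looks_writable(dest)? {
        return Err(io::Error::new(io::ErrorKind::PermissionDenied, "destination directory is read-only"));
    }
    let id = STAGING_COUNTER.fetch_add(1, Ordering::Relaxed);
    let staging_dir = staging_root.join(format!("upload-{}-{}", std::process::id(), id));
    fs::create_dir(&staging_dir)?;
    let result = copy_then_rename(body, &staging_dir.join("data"), dest);
    // Success or failure, drop the staging directory and any partial data in it.
    let _ = fs::remove_dir_all(&staging_dir);
    result
}

fn copy_then_rename(body: &mut impl Read, staged: &Path, dest: &Path) -> io::Result<u64> {
    let mut file = File::create(staged)?;
    let n = io::copy(body, &mut file)?;
    file.sync_all()?;
    drop(file); // close the handle before moving the file
    // Fails if staging_root and dest are on different filesystems.
    fs::rename(staged, dest)?;
    Ok(n)
}
```

Two caveats: Permissions::readonly() only reports whether all write bits are cleared, so it does not prove the server process may actually write there; and fs::rename fails when the staging root and the destination live on different filesystems.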

zhus closed this issue 2026-04-08 16:50:40 +03:00

@sigoden commented on GitHub (Nov 3, 2023):

Isn't it easier to just delete the interrupted file directly?
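Deleting the interrupted file can be made hard to forget with an RAII guard. A minimal sketch, assuming a blocking std::fs setting; PartialUpload and upload_delete_on_error are hypothetical names, not dufs APIs. The guard owns the open file and, if it is dropped before finish() succeeds (an I/O error propagated with ?, an early return, or an unwinding panic), it closes the handle and removes the partial file.

```rust
use std::fs::{self, File};
use std::io::{self, Read};
use std::path::{Path, PathBuf};

/// Hypothetical guard: removes `path` on drop unless `finish()` succeeded.
struct PartialUpload {
    file: Option<File>,
    path: PathBuf,
}

impl PartialUpload {
    fn create(path: &Path) -> io::Result<Self> {
        // If creation fails there is nothing of ours to clean up.
        let file = File::create(path)?;
        Ok(Self { file: Some(file), path: path.to_path_buf() })
    }

    fn file(&mut self) -> &mut File {
        self.file.as_mut().expect("file is open until finish()")
    }

    /// Flush to disk and disarm the cleanup. If syncing fails, the guard
    /// is still armed and Drop removes the file.
    fn finish(mut self) -> io::Result<()> {
        if let Some(file) = &self.file {
            file.sync_all()?;
        }
        self.file = None; // closes the handle; Drop will now keep the file
        Ok(())
    }
}

impl Drop for PartialUpload {
    fn drop(&mut self) {
        if let Some(file) = self.file.take() {
            drop(file); // close first (removing an open file fails on Windows)
            let _ = fs::remove_file(&self.path);
        }
    }
}

/// Write directly to `dest` (visible while uploading) but delete it on failure.
fn upload_delete_on_error(body: &mut impl Read, dest: &Path) -> io::Result<u64> {
    let mut upload = PartialUpload::create(dest)?;
    let n = io::copy(body, upload.file())?;
    upload.finish()?;
    Ok(n)
}
```

One trade-off of writing directly to the final name: File::create truncates any existing file at that path, so a failed re-upload also destroys the previous version.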

The tmpfile approach has two shortcomings:

  1. tmpdir may not exist; for example, the official Docker image of dufs is based on scratch and does not have /tmp.
  2. If workdir and tmpdir are on different disks, moving the file incurs additional cost.
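The cost in the second point comes from how a move works. A minimal std-only sketch with the hypothetical name move_file: a rename within one filesystem only updates directory entries, while across filesystems rename fails (on Unix, rename(2) reports EXDEV) and the data has to be copied byte by byte. For simplicity the sketch falls back to copying on any rename error instead of matching the cross-device error specifically.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Move `from` to `to`: a cheap rename when possible, otherwise a full
/// copy followed by deleting the source.
fn move_file(from: &Path, to: &Path) -> io::Result<()> {
    match fs::rename(from, to) {
        Ok(()) => Ok(()),
        Err(_) => {
            // Simplification: any rename error triggers the slow path.
            fs::copy(from, to)?;
            fs::remove_file(from)
        }
    }
}
```

The fallback also reintroduces the original problem: while fs::copy runs, the destination already exists and is only partially written.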

Seeing files being downloaded is not a disadvantage; having one uploading and one downloading at the same time is a feature.


@Lukinoh commented on GitHub (Nov 3, 2023):

Actually, that was my first implementation, but it does not work very well.

If you upload a file and then press F5, the file is still displayed even though the upload failed, because the endpoint responds before the file is deleted.
Moreover, if other users access the page, they will see the file before it is fully uploaded.

I am not sure what happens if the /tmp folder does not exist beforehand: whether it gets created or the call fails. I will check later.


@Lukinoh commented on GitHub (Nov 4, 2023):

> I am not sure of the behaviour if the /tmp folder does not exist beforehand. I am curious if he creates it, or if it fails, I will check later.

It fails.

> tmpdir may not exist, for example, the official docker image of dufs is based on scratch and does not have /tmp.
> If the disk of workdir and tmpdir is not the same, moving it requires additional costs.

I took a look at the tempdir function: it is based on std::env::temp_dir (https://doc.rust-lang.org/std/env/fn.temp_dir.html).
On Unix, temp_dir returns the value of the TMPDIR environment variable if it is set; otherwise, on non-Android platforms, it returns /tmp.

Hence, you could work around the "different disk" issue by changing TMPDIR.
But I am not sure that's the best solution, and it won't fully solve the issue with the dufs Docker image.
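Building on that, a server could pick the staging location defensively instead of trusting /tmp. A minimal sketch whose device check is Unix-specific; staging_dir_for, same_device and the .dufs-uploads fallback directory are hypothetical. It uses std::env::temp_dir() (so TMPDIR is honoured) only when that directory exists and shares a device ID with the destination directory, and otherwise stages next to the destination.

```rust
use std::env;
use std::fs;
use std::path::{Path, PathBuf};

/// Choose where to stage an upload destined for `dest_dir`.
fn staging_dir_for(dest_dir: &Path) -> PathBuf {
    let tmp = env::temp_dir(); // honours TMPDIR on Unix
    if tmp.is_dir() && same_device(&tmp, dest_dir) {
        tmp
    } else {
        // Hypothetical fallback inside the served directory: a rename from
        // here to the destination never crosses filesystems.
        dest_dir.join(".dufs-uploads")
    }
}

#[cfg(unix)]
fn same_device(a: &Path, b: &Path) -> bool {
    use std::os::unix::fs::MetadataExt;
    match (fs::metadata(a), fs::metadata(b)) {
        (Ok(ma), Ok(mb)) => ma.dev() == mb.dev(),
        _ => false, // missing or unreadable: treat as "different"
    }
}

#[cfg(not(unix))]
fn same_device(_a: &Path, _b: &Path) -> bool {
    // No portable device ID in std here; assume different to stay safe.
    false
}
```

The caller would still need to create the fallback directory (for example with fs::create_dir_all), and since it lives inside the served directory it would appear in listings unless hidden.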

> Seeing files being downloaded is not a disadvantage; having one uploading and one downloading at the same time is a feature.

Sorry, I am not sure I understand this sentence.


Otherwise, we could slightly improve the simple solution you implemented by doing what web browsers do.

Instead of writing the file under its final name directly, it is written under a temporary name.
This has the advantage of showing that a file is being uploaded to the server, while preventing users from thinking the file is ready to be downloaded.
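That browser-style approach can be sketched in a few lines. A minimal blocking std::fs sketch; part_path, upload_with_part_file and write_then_rename are hypothetical, and the .part suffix is borrowed from Firefox's convention rather than anything dufs defines. The body is written to NAME.part in the destination's own directory and renamed to NAME only after the copy succeeds; on failure the .part file is removed.

```rust
use std::ffi::OsString;
use std::fs::{self, File};
use std::io::{self, Read};
use std::path::{Path, PathBuf};

/// In-progress name, e.g. "movie.mkv" -> "movie.mkv.part".
fn part_path(dest: &Path) -> PathBuf {
    let mut name: OsString = dest.file_name().map(|n| n.to_os_string()).unwrap_or_default();
    name.push(".part");
    dest.with_file_name(name)
}

fn upload_with_part_file(body: &mut impl Read, dest: &Path) -> io::Result<u64> {
    let part = part_path(dest);
    let result = write_then_rename(body, &part, dest);
    if result.is_err() {
        // Interrupted or failed: never leave the .part file behind.
        let _ = fs::remove_file(&part);
    }
    result
}

fn write_then_rename(body: &mut impl Read, part: &Path, dest: &Path) -> io::Result<u64> {
    let mut file = File::create(part)?;
    let n = io::copy(body, &mut file)?;
    file.sync_all()?;
    drop(file); // close before renaming (matters on Windows)
    // Same directory, so same filesystem: on POSIX this is an atomic replace.
    fs::rename(part, dest)?;
    Ok(n)
}
```

Because the temporary file sits next to the destination, the final rename never crosses filesystems, which sidesteps both /tmp concerns. Two concurrent uploads of the same name would still share one .part path, so a real implementation would need a unique suffix or a lock.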

In Chrome, when you download a file, it creates a temporary file called not confirmed NUMBER.crdownload:
[Screenshot: https://github.com/sigoden/dufs/assets/2392459/6b062d58-a726-459b-a74a-d9a8f4a2f0e8]

Once the download is finished, it is renamed to the correct name.

In Firefox, downloading a file creates two files:
[Screenshot: https://github.com/sigoden/dufs/assets/2392459/c90ddc89-f433-4532-917b-efc139b741ff]

Once the download is finished, the .part file is renamed to the correct name.


@sigoden commented on GitHub (Nov 4, 2023):

The problem has already been solved. Just delete the files if they have not been completely uploaded.
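A minimal sketch of "delete the file if it was not completely uploaded", assuming the expected size is known up front (for example from a Content-Length header, when the client sends one); upload_checked and copy_exact are hypothetical names, not dufs code. The copy reads at most one byte more than announced, so both a truncated and an oversized body count as incomplete, and the partial file is removed after its handle is closed.

```rust
use std::fs::{self, File};
use std::io::{self, Read};
use std::path::Path;

fn upload_checked(body: &mut impl Read, dest: &Path, expected_len: u64) -> io::Result<u64> {
    // If creation fails we never touched `dest`, so there is nothing to delete.
    let mut file = File::create(dest)?;
    let result = copy_exact(body, &mut file, expected_len);
    drop(file); // close the handle before any removal
    if result.is_err() {
        let _ = fs::remove_file(dest);
    }
    result
}

fn copy_exact(body: &mut impl Read, file: &mut File, expected_len: u64) -> io::Result<u64> {
    // One extra byte of allowance lets an oversized body be detected too.
    let written = io::copy(&mut body.take(expected_len.saturating_add(1)), file)?;
    if written != expected_len {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            format!("expected {expected_len} bytes, got {written}"),
        ));
    }
    file.sync_all()?;
    Ok(written)
}
```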

Reference: sigoden/dufs#146