Keep rsync from removing unfinished source files
I have two machines, speed and mass. speed has a fast Internet connection and is running a crawler which downloads a lot of files to disk. mass has a lot of disk space. I want to move the files from speed to mass after they’re done downloading. Ideally, I’d just run:
$ rsync -a --remove-source-files speed:/var/crawldir .
but I worry that rsync will unlink a source file that hasn’t finished downloading yet. (I looked at the source code and I didn’t see anything protecting against this.) Any suggestions?
It seems to me the problem is transferring a file before it’s complete, not that you’re deleting it.
If this is Linux, process B can unlink a file while process A still has it open. Neither process gets an error, and A can keep writing, but whatever it writes is discarded once it closes the file, so A is wasting its time. The unlink by itself therefore doesn’t break anything.
The real problem is that rsync deletes the source file as soon as it has copied it. If the crawler is still writing the file at that point, mass ends up with a partial file and the original is gone.
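One way to avoid handing rsync a half-written file is to pass it an explicit list of files that look finished. The sketch below assumes that a file untouched for a few minutes is complete, which is only a heuristic and depends on how the crawler writes. The function names are made up for illustration, and `-mmin` and `-print0` are find extensions available in GNU and BSD find rather than POSIX.

```shell
#!/bin/sh
# Print (NUL-separated) the regular files under directory $1 whose last
# modification is more than $2 minutes old. Paths are relative to $1.
# Assumption: a file left untouched that long has finished downloading.
list_stable_files() {
    (cd "$1" && find . -type f -mmin +"$2" -print0)
}

# Move only the stable files from source directory $1 to rsync
# destination $2, using an age threshold of $3 minutes.
# --from0 matches find's -print0; --files-from=- reads the list from stdin.
move_stable_files() {
    list_stable_files "$1" "$3" |
        rsync -a --from0 --files-from=- --remove-source-files "$1/" "$2/"
}
```

Run on speed, something like `move_stable_files /var/crawldir mass:/srv/crawl 5` would push only files older than five minutes (the destination path here is hypothetical). Because rsync only sees the files named in the list, anything the crawler is still writing stays on speed until a later run.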
How about this: mount mass as a remote file system (NFS would work) on speed. Then have the crawler write its files directly to that mount.