Hacker News

I wonder why it's implemented as a per-file copy+delete instead of a "copy all files" then "delete all files".

I also have a gut feeling that doing similar operations to a connected android phone (e.g., moving photos from your phone to your PC over USB) is also slow, probably for similar reasons.



It is easier to abstract (at least naively). First, you abstract moving a single file, then you create an abstraction for "all files" basically by repeating the same operation for every file. You could do this for a subset of files and so on.

As to slow operations... that is more likely because of a synchronous implementation.

The popular, naive implementation is, as above, to repeat the same simple operation over and over again: read from source, write to destination, read from source, write to destination.

A better implementation (what I would do) would be to pipeline the operations. A basic pipeline would have three components, each streaming data to and/or from a buffer:

1: Read from the source into the pipeline buffer

2: Read from the pipeline buffer and write to the destination; write the names of files to delete into another pipeline buffer

3: Read the files to delete from that pipeline buffer and execute the deletions.

Using a *nix shell you could do something like that in a single line:

1: tar -c the files to output

2: pipe the output to tar -xv, writing the files to the destination and producing a list of written files; pipe that list onward

3: read the piped list of written files and remove them from the input dir

Now, this is not perfect because we waste performance creating a tar stream we immediately discard, but you get the picture.
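The same three-stage pipeline can be sketched in Python (purely for illustration; the name pipelined_move and the use of threads with bounded queues as the "pipeline buffers" are my own choices, not anything from the original comment):

```python
import os
import queue
import threading

def pipelined_move(src_dir, dst_dir, buf_size=4):
    chunks = queue.Queue(maxsize=buf_size)  # stage 1 -> stage 2: file contents
    done = queue.Queue(maxsize=buf_size)    # stage 2 -> stage 3: files safe to delete
    SENTINEL = None

    def reader():  # stage 1: read sources into the pipeline buffer
        for name in os.listdir(src_dir):
            path = os.path.join(src_dir, name)
            with open(path, "rb") as f:
                chunks.put((path, name, f.read()))
        chunks.put(SENTINEL)

    def writer():  # stage 2: write destinations, queue sources for deletion
        while (item := chunks.get()) is not SENTINEL:
            src_path, name, data = item
            with open(os.path.join(dst_dir, name), "wb") as f:
                f.write(data)
            done.put(src_path)  # only after the write completed
        done.put(SENTINEL)

    def deleter():  # stage 3: delete sources that were fully written
        while (path := done.get()) is not SENTINEL:
            os.remove(path)

    threads = [threading.Thread(target=t) for t in (reader, writer, deleter)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Because the queues are bounded, a slow destination naturally throttles the reader, and a source file is only ever deleted after its copy has been written.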


> It is easier to abstract (at least nively). First, you abstract moving a single file, then you create abstraction for "all files" basically by repeating same operation for all files. You could do this for a subset of files and so on.

Little bit of a rant, but I see this SO MUCH in the database layers of applications. Implement an operation for one row, slap a for loop around it, it works kinda quickly on the small test data set... and then prod has an intimate conversation with a brick wall. It has been 0 days at work since that happened.
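For illustration, here is that anti-pattern next to its batched fix, sketched against an in-memory SQLite table (the schema is made up; against a real client/server database the per-row round-trips hurt far more than they do here):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, val TEXT)")
rows = [(i, f"val{i}") for i in range(10_000)]

# The "for loop around a single-row operation" anti-pattern:
# one statement per row (and, on a real server, one network round-trip per row).
for row in rows:
    conn.execute("INSERT INTO items VALUES (?, ?)", row)

conn.execute("DELETE FROM items")  # reset so the batched version starts empty

# Batched: one prepared statement executed over the whole set, one transaction.
conn.executemany("INSERT INTO items VALUES (?, ?)", rows)
conn.commit()
print(conn.execute("SELECT COUNT(*) FROM items").fetchone()[0])  # 10000
```

Even better than `executemany` in many databases is a single multi-row INSERT or a bulk-load path, but the shape of the fix is the same: move the loop into the database layer.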

> As to slow operations... that is more likely because of synchronous implementation.

Interestingly, I think the answer is a solid maybe and depends on the storage and how you issue your I/O operations. Flash storage gains performance as you increase parallel operations, up to a point. However - and this code apparently was written 20 years ago - on spinning drives, parallel I/O operations slow you down if the OS does not merge them. So it's entirely non-obvious.


On a single drive you could run a variation where you read X MB of data into a buffer, then write X MB of data out, then execute the deletes, and so on. This lets you avoid some of the problems with small files. Not all, because small files are unlikely to be consecutive, so the head will have to jump a lot, and then you still need to do a lot of small writes to the filesystem (to remove the files).

There are obvious improvements you could make. On some filesystems you can just remove an entire folder rather than removing the files individually just to remove the parent folder.
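A rough sketch of that batched variation (the helper name and Python are mine, just to make the idea concrete): accumulate roughly X MB of file data in memory, then do the writes as one run and the deletes as another, so a spinning disk sees long runs of the same kind of operation:

```python
import os

def move_in_batches(src_paths, dst_dir, batch_bytes=64 * 2**20):
    """Read up to batch_bytes of files into memory, then flush:
    one run of writes, then one run of deletes."""
    batch = []
    size = 0

    def flush():
        for path, data in batch:  # one run of writes
            dst = os.path.join(dst_dir, os.path.basename(path))
            with open(dst, "wb") as f:
                f.write(data)
        for path, _ in batch:     # then one run of deletes
            os.remove(path)
        batch.clear()

    for path in src_paths:
        with open(path, "rb") as f:
            data = f.read()
        batch.append((path, data))
        size += len(data)
        if size >= batch_bytes:
            flush()
            size = 0
    flush()  # whatever is left over
```

As with the pipeline version, nothing is deleted until its copy has been written, so an interrupted move loses no data.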


> The popular, naive implementation is, as above, to repeat same simple operation over and over again: read from source, write to destination, read from source, write to destination.

Reminds me of the kind of patterns functional programming languages introduce, where you process data by describing operations on individual items and assembling them into "a stream". I'm always wary of those - without a good implementation and some heavy magic at the language level, they tend to become the kind of context-switching performance disaster you describe.


Yes. Functional world is not impervious to leaky abstractions.

I am personally of the opinion that, to be a good developer, you have to have a mental model of what happens beneath. If you are programming in a high-level language it is easy to forget that your program runs on real hardware.

I know, because I work mostly on Java projects and trying to talk to Java developers about real hardware is useless.

I have an interview question where I ask "what prevents one process from dereferencing a pointer written out by another process on the same machine?" and I get all sorts of funny answers; only 5-10% of candidates even have the beginnings of an understanding of what is going on. Most don't know what virtual memory is, or are surprised that two processes can resolve different values under the same pointer.
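A toy demonstration of that last point (this sketch is CPython- and POSIX-specific: it relies on id() being the object's virtual address in CPython, and on fork() plus copy-on-write):

```python
import os

data = bytearray(b"parent")
addr = id(data)  # in CPython, id() is the object's virtual address

r, w = os.pipe()
pid = os.fork()
if pid == 0:             # child process
    data[:] = b"child!"  # the write triggers copy-on-write: same virtual
                         # address now backed by a different physical page
    os.write(w, f"{id(data)}:{bytes(data).decode()}".encode())
    os._exit(0)

os.waitpid(pid, 0)
child_addr, child_val = os.read(r, 100).decode().split(":")

print(int(child_addr) == addr)  # True: same virtual address in both processes
print(bytes(data), child_val)   # b'parent' child! : different values behind it
```

Each process has its own page tables, so the "same pointer" resolves through different mappings; that is exactly what prevents one process from dereferencing another's pointer.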


> Most don't know what virtual memory is or are surprised that two processes can resolve different values under same pointer.

Yikes. How many of them have a degree in computer science?


A lot of them have, although I must admit that CS graduates fare noticeably better.

What happens is they know what virtual memory is but can't connect the concepts. It is knowledge without understanding.


Probably not many of them, because it doesn't usually take a graduate to write Java. Especially if your experience with Java is also limited to tools like Spring. I think the question is kind of pointless unless you're specifically hiring people to work on an application that needs that sort of throughput. Most apps don't need it.


The funny thing is that in the old days, you were required to know how the machine worked - you coded assembly against physical memory.

But that was incredibly hard, and error prone (and comes with tonnes of limitations). The fact that today, it's very easily possible to write working programs without knowing any of the underlying details, is a marvel.

If I were to hire a truck driver, I wouldn't expect to have to ask him to understand how the truck itself works (e.g., fuel injection when he presses the accelerator). He only needs to know how to operate the truck from the interface (steering wheel!). Why isn't this the same for a Java programmer?


> But that was incredibly hard, and error prone (and comes with tonnes of limitations).

No, it wasn't. If it was incredibly anything, it was tedious. But nobody really expected anything super complex from you. Just look at the kinds of programs that were produced in the '80s or '90s.

It would be incredibly hard today. Machines got very complex, operating systems got very complex.

But it wasn't so complex in the '90s. I think I stopped writing assembly when I started using WINAPI, because that was the point when assembly stopped being practical.

Sometimes I wonder how I would fare if I were 18 again today and had to start in development and learn everything from scratch. I feel that being able to learn everything as the technologies were evolving is a huge advantage I enjoy.


There’s a much vaster gulf between a truck driver and a truck engineer, vs a Java programmer and a programmer one layer below. Per your analogy, the truck driver is the program user here, not the program maintainer.


Rust for example does indeed apply lots of black compiler magic to really cut the cost of those abstractions (I've seen the output of complex iterator chains compiled down to exactly the same machine code you'd write without the abstractions). However man is it slow to compile. Pick your poison.


From what I understood, the problems with Rust compilation time are more about LLVM and the monomorphization of generics. I know that OCaml has a generics system that's kinda like Rust's but doesn't monomorphize functions, and it's really, really fast at compiling. OCaml also has its own backend.


> Pick your poison.

I pick Rust every time.

I have this concept of easy problems and hard problems. Every decision to choose a technology is a compromise and comes with its own problems. It is your job to know whether these are easy or hard problems.

Compilation time is an easy problem. Just throw more hardware at it, or modularize your application, or schedule your coffee breaks correctly.

Building reliable abstractions to prevent hard-to-debug problems is a hard problem. Building a large, complex, reliable application in ANSI C is a hard problem.

Try to imagine what you would rather spend your time on: coffee breaks or debugging complex bugs?


> just put more hardware to it or modularize your application

This is not as easy as it seems.

Rust compilation is not embarrassingly parallel, as a lot of what's going on seems to be deferred to the link stage.

Modules inside a crate can't be compiled in parallel for some reason I don't truly understand (please educate me).

Crates can be truly compiled in parallel, but creating projects with hundreds of crates is quite a pain for other reasons.

The result is that changing a single line of code in the project I'm working on can take more than 2 minutes to rebuild on my last-gen MacBook Pro.

I have an M1 Mac mini where compilation is twice as fast, but I don't have enough RAM there, and I'll have a hard time convincing my manager to make an exemption to my company's laptop refresh policy "because of Rust".

If I had to take a coffee break every time I waited for an incremental compilation, my heart would explode.

My current solution is:

1. Multitask. Do something else while I'm waiting.

2. Split the code I'm iterating on into a new project with minimal dependencies, and copy the code back into the main project when I'm done (or just import it as a crate if that makes sense).

Both options suck and I wish they weren't necessary. Please let me know what else I could do, what hardware I should buy, etc.

(I already use zld on Mac and use minimal debug symbols, iirc debug=1)


> doing similar operations to a connected android phone

MTP is terrible[1].

[1]: https://en.wikipedia.org/wiki/Media_Transfer_Protocol#Perfor...


MTP is an absolute dog's breakfast.

I've had multiple experiences of issues on the device side causing the entire explorer.exe process to crash!

Explorer's handling of the MTP protocol is not resilient to badly behaved devices; I would not at all be surprised if there are security implications where a badly behaved MTP device can get an RCE in Explorer.


Oh boy. I was thinking USB 2.0 was the main reason copying photos between Android and Windows sucks so much, but the rabbit hole is much deeper.

It's sad that with the cloud being the solution for everything these days, this will probably never be improved within the next decade.


It's ridiculously wasteful and slow to have to upload to a cloud server who-knows-where, then download it again to the computer several feet away. Yet with Android having removed USB mass storage, that's often the easiest way. It's against the interests of those who profit off selling cloud storage to make local transfers easy.


If I need to move a lot of data I just use adb and USB debugging to access the files. That is actually fast, but it's ridiculous that I have to do it.


Ha, I like that! I should try that as well.


I'm using Resilio Sync (formerly known as BitTorrent Sync) for this. I guess that technically it's a kind of cloud, except I'm running it on my intranet?


Copy + delete one at a time makes a lot of sense if you're working on a filesystem without a way to move without copying (I don't think you can actually move a file in FAT32), because copying everything first could require more space than is available.

The same could be true here, where you're moving from a zip file to probably the same filesystem the zip file is in, if removing a file from the zip is actually an in-place move of data followed by a truncate. The problem, of course, is that removing a file from the zip file is tremendously expensive. Reading the file with one syscall per byte doesn't help (especially with post-Spectre workarounds that make syscalls more expensive).


This is the real answer IMO. For example, if you have a 2TB drive with 50GB of space left and are trying to move 1TB of data, and the move requires copying, but every individual file is smaller than 50GB, then I'd be pretty upset if my computer were unable to move the files just because it refused to delete anything until the end.

But ideally I'd want the system to delete at the end if possible, and to otherwise delete as needed, instead of only deleting at the end or only after every single file.


> copy + delete [...] I don't think you can actually move a file in fat32

Wait, does that mean that a file which is larger than half of the partition in fat32 cannot be moved? Or not even be renamed?


FAT32 supports rename within the same directory as a simple operation. Moving a file, though, I don't think so (and nobody corrected me, so I might be right).


OK, well, I just tried this on a full FAT32 partition and I'm able to move a large file with no problems reported, so I was wrong.


IIRC explorer/shell namespaces are built on top of IStorage / IStream and those don't know bulk operations.


Probably because someone worked at a high enough abstraction not to spot it, then tested it on files small enough to not see enough of a performance issue to dig into what actually happened.


Perhaps the common approach to performance testing is wrong (forget for a moment that most shops don't think about it at all; let's just consider quality developers/companies). Instead of just monitoring whether the product is fast enough for minimal/typical/peak expected usage, maybe it would be good to focus on determining a boundary. E.g., how much data does it take for the program to run for 60 seconds? Or, in general, 10x longer than the maximum you'd consider acceptable? How much data does it take for it to run out of memory?

These determinations can be made with a few tests each. Start with some reasonable amount of data and keep doubling it until the test fails. Then binary-search between the last success and the first failure.

The results may come out surprising. Performance does not scale linearly - just because your program takes 1 second for 1 unit of data, and 2 seconds for 2 units of data, doesn't mean it'll take 20 seconds for 20 units of data. It may well take an hour. Picking a threshold far above what's acceptable, and continuously monitoring how much load is required to reach it, will quickly identify when something is working much slower than it should.
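That doubling-then-bisecting search fits in a few lines; in this sketch, run is a hypothetical test harness that returns the runtime (in seconds) for n units of data:

```python
def find_capacity(run, budget, start=1):
    """Return the largest workload n for which run(n) stays within budget.
    Doubles n until the budget is exceeded, then binary-searches between
    the last success and the first failure."""
    lo, n = 0, start
    while run(n) <= budget:   # doubling phase
        lo, n = n, n * 2
    hi = n
    while hi - lo > 1:        # bisection phase
        mid = (lo + hi) // 2
        if run(mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo

# Toy example: pretend runtime grows quadratically with data size.
print(find_capacity(lambda n: n * n, budget=100))  # -> 10
```

The toy example also illustrates the non-linearity point: a workload of 1 "costs" 1 second, a workload of 20 already costs 400.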


I find it's kinda hell moving files from both my Android AND my iPhone... it feels like a 20-year-old process in terms of working in fits and starts with random failures.


It's so if the move fails midway you won't need to start from scratch.


In theory, all software could be coded using a "many by default" approach, so that every time batching matters, we'd take those batching opportunities automatically, just because of the way the software is coded.

In practice we only batch when it starts hurting. It doesn't hurt to delete files one by one on a normal file system - it's made for that. So the API wasn't "many by default", and that's how it works for zip files as well.



