That's basically a scaled-up version of the story 'I store my files on my own computer and it's 10x cheaper than using Dropbox'.
While disk failure rates are already discussed in other threads here, there is one related thing that caught my interest. A disk failure in such a setup is not just the cost of a new disk plus the replacement labor (someone has to go there and swap it!). It is also the inconvenience of dealing with failing requests. OK, you are willing to lose 5% of your dataset. But are your '200 lines of code' robust enough to handle such cases? What if a disk doesn't fail outright, but becomes veeeeery slow? Can your training process efficiently skip such bad objects? Do you have enough visibility to know how much data you have already lost? Is it still below 5%? And so on and so forth.
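To make this concrete, here's a rough sketch (Python, every name here is hypothetical and not from the article) of the bare minimum the loader would need: a timeout on each read so a dying disk can't stall the run, skip-on-error, and a counter so you actually know whether you're still under the 5%. `read_object` stands in for whatever fetches one object from your storage.

```python
import concurrent.futures as cf
from dataclasses import dataclass, field
from typing import Callable, Iterable, Iterator, Tuple


@dataclass
class LossTracker:
    """Counts requested vs. lost objects (lost = read error or too slow)."""
    budget: float = 0.05  # tolerated fraction of lost objects (the "5%")
    requested: int = 0
    lost: int = 0
    lost_keys: list = field(default_factory=list)

    @property
    def loss_fraction(self) -> float:
        return self.lost / self.requested if self.requested else 0.0

    def record(self, key: str, ok: bool) -> None:
        self.requested += 1
        if not ok:
            self.lost += 1
            self.lost_keys.append(key)

    def over_budget(self) -> bool:
        return self.loss_fraction > self.budget


def iter_samples(
    keys: Iterable[str],
    read_object: Callable[[str], bytes],  # hypothetical storage reader
    tracker: LossTracker,
    timeout_s: float = 2.0,
    max_workers: int = 8,
) -> Iterator[Tuple[str, bytes]]:
    """Yield (key, data) for readable objects, skipping failed or slow reads.

    Note: the timeout counts from when we start waiting on a given result,
    not from submission, so it is a rough per-object limit.
    """
    with cf.ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {key: pool.submit(read_object, key) for key in keys}
        for key, fut in futures.items():
            try:
                data = fut.result(timeout=timeout_s)
            except cf.TimeoutError:
                # Slow disk: give up on this object (a running read can't be
                # cancelled; cancel() only helps if it hasn't started yet).
                fut.cancel()
                tracker.record(key, ok=False)
                continue
            except OSError:
                # Hard I/O error from the reader: skip the object.
                tracker.record(key, ok=False)
                continue
            tracker.record(key, ok=True)
            yield key, data
```

Even this toy version has to make policy choices (how long is "too slow", whether to abort the run once `over_budget()` is true, where to log `lost_keys`), and that's exactly the kind of work that doesn't show up in the cost spreadsheet.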
I feel like this article was written right after they built this setup and before, say, 6 months of actual usage. I'm pretty sure their costs will end up much higher than they calculated here, especially once they start including hidden costs, like the work that needs to be done on the training side.
Yes, self-hosting will most likely still cost less than AWS (AWS is not cheap). But it might become comparable to the storage offerings of small ('neo') cloud providers if you buy your GPU compute from them.