Show HN: Amazon S3 API Support for Backblaze B2 Cloud Storage Service

krullie · on Oct 4, 2017

I've never really been able to figure out what a good strategy is for object storage organization. Do you create a bucket per application instance? user? organization?

Right now I'm playing with a new service and came up with this which is probably over engineerd:

Here is an actual object key including the bucket: 7dcdb229600e4467a2714866e0d406f6/85/26c/c0271374067b5db832adb7909a7/bbda55db15266f7ce2284d8f5f66fc85e495e2b12265ef87537237ad5e2658b24c081970332417f60e5fc352ae9b8c1031398c02ecde03eb29af2d3c8eda8a4b/y18.gif

Given the file's uuid is aabbbcccccccccccccccccccccccccccccc

for original images: {{organizations_uuid as bucket}}/aa/bbb/cccccccccccccccccccccccccccccc/{{sha512sum}}/{{originalfilename}}

And for all derivatives of it: {{organizations_uuid as bucket}}/aa/bbb/cccccccccccccccccccccccccccccc/derived/{{this files uuid}}_{{filename}}

My thinking was that:

- using the organizations' uuid (which can have multiple users) as a bucket makes backing up per organization and having on prem deployments easier.

- Encoding the file's uuid in the object name can identify it easily and by splitting that uuid up in 2/3/rest would help with spreading of objects.

- Encoding the file's sha512sum in the key name would enable checking that file's integrity even without a database.

- putting all derived files under derived but with the original file's uuid prefix makes the link between them clear.

I know this will result in long object names as shown above in the actual example but it does include quite some information. What parts of this is considered bad practice? Do you have any real world examples for other strategies? They seem hard to come by.

aaronds · on Oct 4, 2017

Perhaps I'm missing something about your use case, but I only create buckets per application, or sometimes file category (videos, profile images, whatever).

Trivial example following your org/user pattern:

my_app/profile_images/org_id/user_id/aabbccccccc.jpeg

I then obviously have a reference to that file in my db.

krullie · on Oct 4, 2017

I don't have any other real use case for bucket per org other than easy bucket mirroring, backup and maybe migration from shared hosting to on premises.

I didn't think of using different prefixes for different media usages. We for example would then use thumbnail/originating_file_uuid.png and poster/originating_file_uuid.png.

How would you handle uploaded media then?

aaronds · on Oct 4, 2017

Something to the effect of my_app/profile_images/org_id/user_id/uuid_thumb.png uuid_large.png uuid_original.png can work.

Then in your app, you can have a way of specifying which version of the file you'd like to reference, for example:

`user.avatar.large -> '<path>/uuid_large.png'`

Not sure if that helps?

krullie · on Oct 4, 2017

The problem with that is that the originally uploaded filename is lost. At least without storing it in a separate database.

I could shorten the key by moving the sha512sum from the url to a CHECKSUM file.

org/file_uuid/original/original_filename.png CHECKSUM org/file_uuid/thumb/160x90.png 48x48.png org/file_uuid/poster/1k.png 2k.png org/file_uuid/other/

aaronds · on Oct 4, 2017

Correct, I have no need for the original filename in most cases. If I did want this info, for example if I was building a file browser type thing (ala dropbox), then sure I'd keep that in the db.

Personally I'm uploading directly from the browser to S3 using presigned URL's. All files get uploaded to a /tmp directory in my bucket. This bucket is configured so that all files in /tmp are deleted after 1 day (to remove any unsaved uploads). When a form is submitted, I pass the key to the temporary file in the form (via e.g <input type="hidden" name="s3_key">) and create the associated database record. I then move the file from its temporary location to its permanent one upon saving said record.

Feel free to email me to continue this discussion - email address is on my profile.

tedmiston · on Oct 5, 2017

> The problem with that is that the originally uploaded filename is lost. At least without storing it in a separate database.

Sure, but that's a tradeoff nearly every website accepts because they just need the image itself. If you do want to preserve the original filename, is there a reason for not just keeping it in a database?

krullie · on Oct 5, 2017

I'd like to have these systems as decoupled as possible or at least have some meaningful information without a dependency on an external datastore. This might be just me being paranoid and overthinking it but after having to deal with a nasty monolith of an application for the last couple of years and finally convincing the rest that we need to change if we want to be able to expand I want to do it right.

therealmarv · on Oct 4, 2017

This is something Backblaze should provide on their side. But kudos and thanks minio for the work. Really impressive!

homero · on Oct 4, 2017

Gorgeous, when I first beta tested it, I told them to make it s3 compatible but they convinced me it wasn't the best idea. I still want it though.

anoother · on Oct 4, 2017

First time I've heard of B2. Makes you wonder why it's not S3-compatible in the first place...

jjeaff · on Oct 4, 2017

According to them, it is to keep their cost low. The b2 API requires a call to get an endpoint before you make the upload.

S3 just uses one endpoint and proxies the upload where it wants to go. B2 saves money by not proxying everything coming in.

ams6110 · on Oct 4, 2017

Maybe they thought they could do better. Maybe they didn't want to be in a race to the bottom with Amazon

microcolonel · on Oct 4, 2017

Let's hope Amazon doesn't start doing something anticompetitive with the network routing to B2 from EC2, and instead competes on price.

ComputerGuru · on Oct 4, 2017

What’s the point of an answer like that? Does Amazon have a history of sabotaging network links between, say, Google’s or MSFT’s networks and their own? Or is this just an attempt at being funny?

jjeaff · on Oct 4, 2017

They have a history of sabotaging anyone who competes with them, though not with AWS.

For example, they banned all Chromecast sales from amazon.com after they launched the firetv.

sitepodmatt · on Oct 4, 2017

I don't think that would happen. S3 is already at a competitive advantage in same region due to the absence of data transfer costs you typically pay.

I'd be more interested if Blackblaze purchased several 10gig links per region to AWS under an arrangement like DirectConnect, or had alternative direct peering with AWS removing transit and peering exchange risks. I dont think current DirectConnect is compatible with this though as it seems Blackblaze would have to swallow all the traffic costs for every customer. I could be wrong though...

kondro · on Oct 4, 2017

With multi-region redundancy only (as reduced redundancy is actually more expensive) and amazing integration and workflow options, B2 and S3 are not comparable products.