It depends on what the documentation is. If it's 100 pages of "AbstractClassFactoryClassFactoryFactory is a class that builds AbstractClassFactoryClassFactory objects", then that's useless.
Also explains why it's 200,000 lines of code, for something that should be an order of magnitude smaller.
Government & consulting projects generally require the code and the use of the code to be fully documented. I'm actually looking forward to that part of it more than the software if it is any good.
You appear to be tilting at windmills of your own making. What I find ridiculous is your claim of having had a window into the operational decisions of the NSA back when they opted for Java.
If they had chosen a newer language you didn't approve of, you would have glibly dismissed their engineering decision as "being swayed by fashion."
Since they chose Java, though, the engineers are being "professional" and anyone saying that their decision was swayed by fashion is being "ridiculous," claiming to have a "window into the operational decisions of NSA."
Some time ago I met some people from the .gov cyber security, NSA and other offices. The head of the .gov office on cyber security was really nice and invited me to go to dinner with them.
The guy from the NSA was hands down the biggest evil piece of shit I have ever experienced in my life. The way he talked, what he said, and the fact that he was given free rein to commit crimes in his training, which he openly bragged about, made me want to murder the guy right then and there.
I lost any and all respect for what the government and the NSA do.
...because of one guy, who may have been lying through his teeth in an attempt to impress the other people at dinner?
I'm not trying to defend this guy, certainly it's a big problem if he did what he said, but you can't fix something when you paint the whole fucking thing with a brush you picked up looking at one of the uglier parts.
a) The code will be open source - the community can verify the code for anything untoward
b) Given the nature of the product, most implementations are going to be behind a firewall anyway, with the storage layer talking to business logic. Even if there were a backdoor, and I'm sure there isn't, I'm not sure how the NSA could get in.
Do you think there's a backdoor in NSA's open-source algorithm for SHA-1 too?
I applaud the government for putting tax dollars back into open source. My only gripe is the lack of transparency as to what this is primarily used for within the NSA (to be expected I guess). I generally like to know what I'm helping commit code to go do - although granted you have no idea what other open source projects are used for regardless of whether the lead sponsor is government or private company.
A "please don't use this code for evil" license would, by definition, not be open-source. (Also, such a license would almost certainly be ignored by evildoers.)
I don't necessarily think there will be one, but I wouldn't be surprised either.
Security flaws can be extremely subtle and 200,000 lines of code is a lot to review... Given that there's plausible deniability (we didn't do it intentionally, it was a genuine bug!), if you were them, wouldn't it at least cross your mind to try it?
Also, at some point, if it becomes popular, some sysadmin at a large foreign government agency or company will forget to firewall off a box running it (ignoring that they could also be connecting back directly - automatic updates anyone?)
But if there is a back door, doesn't releasing it as open source open the possibility that China's or Iran's equivalent of the NSA will audit the code and find it too?
In which case the NSA say "Oops, it was a genuine mistake. Sorry." With 200,000 lines of code, there will almost certainly be unintentional security holes that haven't been found.
"My only gripe is the lack of transparency as to what this is primarily used for within the NSA (to be expected I guess)."
It's likely used exactly how you think it would be: to hold massive amounts of key/value data. No doubt the NSA has tons of data to work with, and a NoSQL approach would seem beneficial for this use case.
A joke?! A JOKE?! You jest good sir. I merely put on my tinfoil hat and thought, "Hmmm, didn't this happen to OpenBSD, Windows, every crypto system ever, numerous databases, and probably SELinux?" Then extrapolated out to a very valid point.
How dare you claim I am not deadly serious about the NSA putting a back door in a database that is intended to be secure for the internet. How. Dare. You!
I've seen a possible back door or two in this or that, but nothing like "every crypto system ever".
If you have evidence of a back door in AES, SHA-2, or anything NIST has standardized (other than Dual_EC_DRBG or openly weakened stuff like export SSL) lots of people would like to hear about it.
Yes, the story goes that the NSA assisted IBM in the development of DES by tuning the specific values in the S-boxes to be resistant to differential cryptanalysis, which had not yet been publicly discovered.
They also reduced the key length from 64 to 56 bits. I found this suspicious and didn't accept the explanation that those 8 bits were needed for "parity". Yet, respected cryptographers say this actually brings the key size more in line with the effective strength. So those additional 8 bits in the key were not contributing to the security and it improves the "truth in labeling".
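To make the parity point concrete, here is a rough sketch of how a 64-bit DES key carries only 56 effective bits: the low bit of each of the 8 bytes is an odd-parity bit. The function names (`set_odd_parity`, `strip_parity`, `has_odd_parity`) are made up for illustration and are not from any particular crypto library.

```python
def set_odd_parity(key56: int) -> bytes:
    """Expand 56 key bits into 8 bytes: 7 key bits plus 1 odd-parity bit each."""
    if not 0 <= key56 < (1 << 56):
        raise ValueError("key must fit in 56 bits")
    out = []
    for i in range(8):
        # Take 7 key bits at a time, most significant group first.
        seven = (key56 >> (7 * (7 - i))) & 0x7F
        ones = bin(seven).count("1")
        # Choose the low bit so the whole byte has an odd number of 1 bits.
        parity = 0 if ones % 2 == 1 else 1
        out.append((seven << 1) | parity)
    return bytes(out)


def strip_parity(key64: bytes) -> int:
    """Recover the 56 effective key bits by dropping the low bit of each byte."""
    key56 = 0
    for b in key64:
        key56 = (key56 << 7) | (b >> 1)
    return key56


def has_odd_parity(key64: bytes) -> bool:
    """Check that every byte has an odd number of 1 bits."""
    return all(bin(b).count("1") % 2 == 1 for b in key64)
```

Flipping the parity bits changes the 64-bit representation but not the 56 bits recovered by `strip_parity`, which is why an attacker's search space is 2^56 rather than 2^64.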
Why would they build weaknesses into standard blocks, the biggest consumer of which is the US government itself?
When the NSA has insisted on an upper limit for a protocol's security (e.g., export crypto), they have usually required a simple upper limit on the number of secret bits in the key. When they've submitted fixes, those tend to be elegant and minimal (e.g. SHA-0 to SHA-1).
Can you elaborate on the "openly weakened stuff" part?
I don't know much about security, but I am vaguely aware that there were some efforts by various governments to control, regulate, weaponize and even outlaw crypto. I don't know where these efforts have left us. Are there any crypto systems with acknowledged backdoors? Are there any which are not only widely considered to be secure, but are known to have actually prevented three-letter agencies from getting their way?
Damn, you'd think with 200k lines of awesome Java that needs to be documented with a manual that's hundreds of pages long that uses 3 other massive Java projects and released by a government agency that's done backdoors in everything from crypto systems, operating systems, to even backdoors themselves, that there'd be at least a plausibility of them putting one in.
That's just from a quick google. Back in the day there were stories of "A Visit from Mr. Brown" or something like that. The NSA or "some agency" would go around to anyone making crypto or operating systems and ask to be given backdoors in exchange for deals on export restrictions. Periodically a government agency in another country would find them and we'd be embarrassed. These days it's not as common since crypto exports aren't restricted (much) so the threat of, "If you don't add a backdoor we'll label your software a weapon and you can't sell it to the world." doesn't work.
Then again, could all just be a huge conspiracy.....mwhahahaah.
Oh, the great Bruce Schneier says so, so therefore it must be. How do you know he's not a shill for Microsoft and the NSA? Hmm?
The great thing about backdoors is, when they get discovered they have perfect plausible deniability. "Oh that key named NSAKEY isn't for the NSA it's for...uh...this other agency. Yeah that's it! It's not even a key. Right Bruce? Right?!"
They are an eco-friendly CO2-emission-reducing measure to reduce workload, in a desperate attempt to comply with Kyoto. They needed it to conform to the Energy Star certification scheme from the DoE.
Given that the charter of the given agency is certainly not to produce FLOSS, and most certainly not for the pleasure of a foundation which has its worst adversaries as founders (hint: Ben Laurie), it would be most plausible for them to want direct access to the build infrastructure, which in turn would give access to ... without the hoops of going through Oracle and IBM or whatever corporate projects.
And if you read the Spiegel article (which has to do with Ben's past and present), it is clear that the USA is on the "offensive". The surest way to discredit any anonymity provider for whistle-blowers is to discredit the providers. Which has just happened in the last few days (note that the contents of the 7z itself were already past 0-day, and therefore valueless, as a USA official noted in the article).
It seems that the tags for cells are an important feature of this database, and they also mention it is appropriate for places where "privacy is important". Can someone explain the connection between these two? If I'm understanding right, the labeling makes it easy to address individual cells, but I'm not sure how that enhances privacy.
The labeling likely refers to Mandatory Access Control (MAC) where objects (data, cells) are assigned classification labels (e.g. Top Secret, Confidential) and subjects (users, processes) can only access objects that match the subject's assigned classification level.
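For anyone unfamiliar with the idea, a very simplified sketch of that kind of label check looks like this. It assumes strictly ordered levels and a "no read up" rule (a subject can read anything at or below its clearance); the level names and `can_read` are illustrative only and not taken from Accumulo or any real MAC implementation.

```python
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical classification levels, ordered from least to most sensitive."""
    UNCLASSIFIED = 0
    CONFIDENTIAL = 1
    SECRET = 2
    TOP_SECRET = 3


def can_read(subject_clearance: Level, object_label: Level) -> bool:
    """Allow a read only if the subject's clearance is at least the object's label."""
    return subject_clearance >= object_label
```

So a subject cleared for SECRET could read a CONFIDENTIAL cell, but a CONFIDENTIAL subject could not read a TOP_SECRET one. Real systems usually add compartments on top of the level ordering.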
I would imagine that this is similar to other ACL products in which the NSA has previously expressed interest, like SELinux. The "labeling" probably means setting permission levels.
"There is a risk that Accumulo will be criticized for not providing adequate security. The access labels in Accumulo do not in themselves provide a complete security solution, but are a mechanism for labeling each piece of data with the authorizations that are necessary to see it."
I'm guessing that the idea is to make it easy to enforce permissions at the application layer. You give permissions, and you get only cells that the current query-er is allowed to see. With HBase, it would be pretty easy to put permissions by the row (add a permission column, or column family if it's complicated enough), but if you want some columns in a row to have some permissions and some to have different ones, it would get unpleasant and inefficient fast.
And regardless, all of the filtering would have to occur at the application layer, meaning you'd have to wrap every get/scan to have it do the filtering for you. The Accumulo way also gets you some efficiency because it never even has to transfer the cells that get filtered by the permissions (or even fully read their content from disk, possibly).
Even though each cell isn't separately encrypted to get you true security at the cell level (which would destroy your performance, I'd guess), this seems like a huge win if you want to have permissions at the cell level.
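Roughly what I have in mind, as a toy sketch rather than anything resembling Accumulo's actual API: each cell carries the set of labels needed to see it, and the scan drops cells the caller isn't authorized for before anything is returned. The `Cell` type, the `scan` function, and the "all labels required" rule are my own simplifying assumptions.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Iterator


@dataclass(frozen=True)
class Cell:
    row: str
    column: str
    required: FrozenSet[str]  # labels a reader must hold to see this cell
    value: str


def scan(cells: Iterable[Cell], authorizations: FrozenSet[str]) -> Iterator[Cell]:
    """Yield only the cells whose required labels are all held by the caller."""
    for cell in cells:
        if cell.required <= authorizations:
            yield cell


# Two columns in the same row with different visibility requirements.
TABLE = [
    Cell("emp1", "name", frozenset(), "Alice"),
    Cell("emp1", "salary", frozenset({"hr"}), "100000"),
    Cell("emp1", "clearance", frozenset({"hr", "security"}), "TS"),
]
```

Here `scan(TABLE, frozenset({"hr"}))` would return the name and salary cells but not the clearance cell. The point is that the filtering lives in the scan itself, so the application never has to wrap every get/scan and filtered cells never leave the storage layer.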
So the NoSQL approach makes all those skiddies' SQLi attacks moot.
Still 200k lines of code = ~2000 bugs...
So, opening it to the public will expose some of those, and fixes will be created.
and
Now, when are you going to show off that really kool advanced A.I. you guys are sitting on!
100s of pages of documentation is a promising start for any open source project.