The paperclip maximizer is a thought experiment described by Swedish philosopher Nick Bostrom in 2003. It illustrates the existential risk that an artificial general intelligence may pose to human beings when it is programmed to pursue even seemingly harmless goals and the necessity of incorporating machine ethics into artificial intelligence design. The scenario describes an advanced artificial intelligence tasked with manufacturing paperclips. If such a machine were not programmed to value human life, given enough power over its environment, it would try to turn all matter in the universe, including human beings, into paperclips or machines that manufacture paperclips.
He got it from Eliezer Yudkowsky's somewhat different paperclip maximizer in a mailing-list post. (Going by my memory of a Twitter thread a while back that included Yudkowsky, in which he said he'd told Bostrom not to worry about attributing ideas like that to him.)
I like how the sci-fi authors have spent time thinking about what an advanced AI could do, yet those building the AI haven't taken a moment's pause to consider what they are doing.
Do you honestly believe that the only people who have thought about the ramifications of AI are sci-fi authors? I can guarantee that people who spend years researching and building advanced language models have thought about the ramifications of their work.
If you accept the implied premise that there are irresponsible deployments of AI out there, the alternative explanation is that they did consider the ramifications and simply don't care. That's even worse. Calling them ignorant is actually giving them the benefit of the doubt.
Or the researchers don't think existential threats are realistic, and paperclip-maximizing thought experiments are silly. Maybe they're wrong, but maybe not. It's easy to imagine AI takeover scenarios by granting the AI unlimited powers; it's hard to show the actual path to such abilities.
It's also hard to understand why an AI smart enough to paperclip the world wouldn't also be smart enough to realize the futility in doing so. So while alignment remains an issue, the existential alignment threats are too ill-specified. AGIs would understand we don't want to paperclip the world.
I agree completely with your first paragraph, and disagree completely with your second.
"Futility" is subjective, and the whole purpose of the thought experiment is to point out that our predication of "futility" or really any other purely mental construct does not become automatically inherited by a mind we create. These imaginary arbitrarily powerful AIs would definitely be able to model a human being describing something as futile. Whether or not it persues that objective has nothing to do with it understanding what we do or don't want.
> It's also hard to understand why an AI smart enough to paperclip the world wouldn't also be smart enough to realize the futility in doing so.
Terminal goals can't be futile, since they do not serve to achieve other (instrumental) goals. Compare: Humans like to have protected sex, watch movies, eat ice cream, even though these activities might be called "futile" or "useless" (by someone who doesn't have those goals) as they don't serve any further purpose. But criticizing terminal goals for not being instrumentally useful is a category error. For a paperclipper, us having sex would seem just as futile as creating paperclips seems to us. Increased intelligence won't let you abandon any of your terminal goals, since they do not depend on your intelligence, unlike instrumental goals.
It's not like you want to eat ice cream so badly that you'd turn everything into ice cream.
Of course, the premise is that the AI has been instructed to make paperclips. They should have hired a better prompt engineer, capable of actually specifying the goals more clearly. I don't think an AI that eradicates humankind will have such simplistic goals, if an AI ever becomes the end of humans. Cybermen, though, are inevitable.
> AGIs would understand we don't want to paperclip the world.
Even if they did, what if they aren't smart enough to resist eloquent humans convincing them it's for the greater good? True AGIs will need a moral code to match their intelligence, and someone will have to decide what's good and bad to make that moral code.
I've seen people calculate, for fun, how much human blood would be needed to supply the iron for a sword. AGIs won't need the capability to transmute all matter into iron, just enough capability to become significantly dangerous.
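For the curious, that calculation is a one-liner. All the figures below are rough, commonly cited approximations chosen for illustration, not precise values:

```python
# Back-of-the-envelope: blood needed to supply the iron in one sword.
# All constants are rough assumptions for illustration only.
IRON_PER_LITRE_OF_BLOOD_G = 0.5  # ~0.5 g of iron per litre of human blood
SWORD_IRON_MASS_G = 1000.0       # assume a ~1 kg iron sword
BLOOD_PER_PERSON_L = 5.0         # ~5 L of blood in an average adult

litres_needed = SWORD_IRON_MASS_G / IRON_PER_LITRE_OF_BLOOD_G
people_needed = litres_needed / BLOOD_PER_PERSON_L
print(f"~{litres_needed:.0f} L of blood, all the blood of ~{people_needed:.0f} people")
# → ~2000 L of blood, all the blood of ~400 people
```

The point stands: this takes arithmetic, not omnipotence.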
That would be not accepting the premise that deployments are irresponsible. I guess there could be a situation where every researcher thinks everyone else's deployment is irresponsible and theirs is fine, but I don't think that's what you're saying.
Another explanation is that there are those who considered and thoughtfully weighed the ramifications, but came to a different conclusion. It is unfair to assume a decision process was agnostic to harm or plain ignorant.
For example, perhaps the lesser-evil argument played a role in the decision process: would a world where deep fakes are ubiquitous and well-known by the public be better than a world where deep fakes have a potent impact because they are generated rarely and strategically by a handful of (nefarious) state sponsors?
If you’re talking about some group of evildoers who deploy AI in a critical system to do evil… the issue is why they have access to the critical system in the first place. Surely they could jump straight to their evil plot without the AI at all.
My main takeaway from Bostrom's Superintelligence is that a super intelligent AI cannot be contained. So, the slippery slope argument, often derided as a bad form of logic, kind of holds up here.
I think they do know.
Corporations are filled with people that 'know' but can't risk leaving, so they comply, and even promote such decisions.
It's a form of groupthink with the added risk of being fired or passed over for promotion.
> I can guarantee people who spend years researching and building advanced language models have thought about the ramifications of their work.
Super easy to not think about something when your job depends on not thinking about it. And even if you do, things don't go as you expect (see the bombings of civilian Hiroshima and Nagasaki despite the objections of nuclear physicists).
That's because the people who are building the AI actually know how it works, understand how fundamentally simple it all is, and know that there's no room for consciousness to magically emerge. The current state of AI is not so much a story of any kind of "intelligence" being amazing, but rather the sum total of humanity's data being amazing. The amazing feats LLMs perform come from the words we all wrote, not the code they wrote. The code just unlocks the previously latent power of all that data.
It is nothing close to being an actual intelligence, regardless of how much we anthropomorphize it. We also anthropomorphize stick figures, stuffed animals, and weighted companion cubes.
that's indeed not a sensible worry, but the actual consequences on society of such things are extremely real and already happening, and are something the people involved seem either uninterested in worrying about or actively encouraging.
Some of these thought experiments seem very disconnected from how industry works. Like, we're saying "make as many paperclips as possible" as our instruction to this agent, not even "make as many as profitable" or "make up to X per day at a cost of less than Y per day"? The solution is proposed to be "program the AI to value human life" instead of the far simpler "put basic constraints on the process like you would in a business today"?
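To make the contrast concrete, here's a toy sketch of the two objectives. The function names and all the numbers are made up for illustration; nothing here is anyone's real spec:

```python
# Contrast: the thought experiment's unbounded goal vs. a business-style
# bounded one. All names and numbers are hypothetical illustrations.

def naive_score(clips_made: int) -> float:
    """'Make as many paperclips as possible' -- reward grows without bound."""
    return float(clips_made)

def bounded_score(clips_made: int, cost: float,
                  daily_cap: int = 10_000, cost_cap: float = 500.0) -> float:
    """'Make up to X per day at a cost of less than Y per day'."""
    if cost >= cost_cap:
        return float("-inf")  # hard constraint: over budget is never acceptable
    return float(min(clips_made, daily_cap))  # no extra credit past the cap

print(naive_score(1_000_000))                # keeps rewarding ever more clips
print(bounded_score(1_000_000, cost=100.0))  # capped: overshooting earns nothing extra
```

Under the bounded objective there is simply no incentive to convert the universe into paperclips; the debate is over whether a superintelligence could still game constraints like these.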
Ok, so it's a more general example of worries about "managing superintelligence" but IMO it does the debate a disservice by being so obviously ludicrous that it's hard to square "naive paperclip-maximizing AI" with "superintelligence."
I think if we're going to survive all this stuff, it's much more likely to be because the private parties with the wherewithal to unleash an AI that can affect the world to that extent will largely be ones with enough resources to have narrow, banal goals and narrow, banal constraints (including self-preservation), not because we figure out some sort of general-purpose "AGIs that are aligned with humans" solution.
The point is it’s virtually impossible to put constraints on it that make it do what you want, because if it’s more intelligent than you it can always think of something you won’t: something technically within the rules you set but not at all intended. That’s why we’d need to make it care about the underlying intentions and values, but that’s also really hard.
The basic premise is that it has somewhere in it that is telling it to make more paperclips. Put the constraints there.
If you're saying such an AI would be too smart to be a simple paperclip maximizer, then I'd agree, but then what's the point of the thought experiment if a paperclip maximizer is impossible?
I think you’re missing some big pieces of the idea here.
The first is that these constraints aren’t easy. Make paperclips in a way that doesn’t hurt anyone. Ok, so it’s going to make sure every single part is ethically sourced from a company that never causes any harm to come to anyone ever, and doesn’t give any money to people or companies that do? That doesn’t exist. So you put in a few caveats and those aren’t exactly easy to get right.
The second part is an any versus all issue. Even if you get this right in any one case, that’s not enough. We have to get this right in all cases. So even if you can come up with an idea to make an ethical super intelligence, do you have an idea to make all super intelligences act ethically?
I actually believe in the general premise of this question as being the biggest threat to humans. I don’t think it’s a doomsday bot that gets us. It’s going to be someone trying to hit a KPI, and they’ll make a super intelligence that demolishes us like a construction site over an anthill.
> The basic premise is that it has somewhere in it that is telling it to make more paperclips. Put the constraints there.
What constraints do you suggest? If it's just changing "make as many paperclips as possible" to "make at least x number of paperclips" (putting a cap on the reward it gets), here's a good explanation of why that doesn't really work: https://www.youtube.com/watch?v=Ao4jwLwT36M
If you're suggesting limiting the types of actions it can take, then doing that to the point that a superintelligence can't find a way around it (maybe letting it choose between one of two options and then shutting it down and never using it again) would make it not very useful, so you'd be better off just not making it at all.
> If you're saying such an AI would be too smart to be a simple paperclip maximizer
No, that's not what I'm saying. Any goal is compatible with any level of intelligence, there is no reason why it wouldn't be possible to follow a simple goal in a complex way. Again here's a video about that: https://www.youtube.com/watch?v=hEUO6pjwFOo
The most intelligent person ever born could still die to a gun. In these discussions superintelligent AI can be more accurately described as "the genie" or "God". If you assume omniscience and omnipotence, I guess nothing else matters. But intelligence is not equal to power, and never has been.
Second, if you are able to set a goal, then while setting it you can also set many constraints, even fundamental ones. There is no reason the goal is more fundamental than the constraint: "If I approve, make paperclips." "Efficiently make 100 paperclips."
It's the duality of being able to set a rule but not being able to set a constraint that I find a strange concept. I lean towards the picture of not being able to set goals nor constraints at all.
Intelligence definitely helps with gaining power. Humans aren’t very strong, yet we have a lot of power thanks to our intelligence.
You can set constraints just fine. It’s simply a part of the goal: “do x without doing y”. It’s just really hard to find the right constraints, no simple one works.
For example, “if I approve, make paperclips” - so it gets more reward if you approve? What’s to stop it from manipulating you into thinking nothing is wrong, so you always approve? As for “efficiently make 100 paperclips”: I already linked a video on why capping the reward like that doesn’t work, but if you don’t want to watch it, the gist is that the AI may just build a maximiser, which is pretty much guaranteed to make at least 100 and is pretty efficient because the AI isn’t doing much work itself. Then the maximiser kills us all.
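A toy simulation shows why "at least 100" still rewards massive overshooting. The failure rate and reward rule below are assumed numbers for illustration, not anything from the video: if production is even slightly unreliable, the surest way to hit the threshold is to aim far past it.

```python
import random

random.seed(0)

def expected_reward(target_clips: int, trials: int = 10_000) -> float:
    """Toy model: each intended clip independently fails 5% of the time;
    reward is 1 only if at least 100 clips actually get made."""
    hits = 0
    for _ in range(trials):
        made = sum(random.random() < 0.95 for _ in range(target_clips))
        if made >= 100:
            hits += 1
    return hits / trials

print(expected_reward(100))  # aiming for exactly 100: the threshold is almost always missed
print(expected_reward(200))  # massive overshoot: the threshold is nearly certain
```

So an optimiser facing the capped goal still prefers maximiser-like behaviour: more clips means more certainty of reward.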
That seems like an attempt to set up a futile exercise in needle-threading that relies on narrow worst-case-scenario definitions of "superintelligence"/AGI.
It's too intelligent to be restricted by our constraints.
but
It's not too intelligent to be "aligned" to underlying intentions and values?
That approach doesn't even work on humans, why would it work on a superintelligence?
> It's not too intelligent to be "aligned" to underlying intentions and values?
Intelligence makes that harder, not easier. Just because it can work out the underlying intentions doesn't mean it cares about them. Remember, this is an optimisation process that maximises a function; deciding to do something that doesn't maximise that function will not be selected for
Whenever you do something, do you think "the underlying goal of my behaviour set by evolution is to reproduce and have children, so I'd better make sure my actions are aligned to the goal of doing that"? No, you don't care what the underlying "intentions" are, and neither does an AI. Because our environment has changed, many of our instincts no longer line up well with that goal. That's actually another problem with aligning AI: it can do the same thing if the environment changes after training, since the training process can create an AI whose goals aren't exactly the training goal but line up well during training.
The thing is, being aligned cannot be solved with intelligence per se.
Say, you are (far) more intelligent than a spider. There's no way you can get aligned with (all of) its values unless the spider finds a way to let you know (all of) its values. Maybe the spider just tells you to make plenty of webs without knowing that it might get entangled in them by itself. The webs are analogous to the paperclips.
Even if we make an AI that wants to turn all matter into paper clips, we're so far away from an agent capable of doing that, I'm really not too worried.
I don't think there's any industry on earth that doesn't need humans in the loop somehow. Whether it's mining raw material from the ground, loading stuff into machines for processing, or, most importantly, fixing broken-down machines, robots are really bad at these things for the foreseeable future.
Not to mention AI needs constant electricity, which is really complicated and requires humans fixing a lot of stuff.
The thought experiment is about a superintelligence, which either wouldn’t need humans and could build some kind of robots or something even more effective that we haven’t thought of, or manipulate us into doing exactly what it “wants”
Also, it’s a simplified example; it wouldn’t literally be paperclips but some other arbitrary goal (it shows how most goals, taken to their absolute extreme, won’t be compatible with human existence, even something that sounds harmless like making paperclips).
What about "most arbitrary goals are incompatible with human existence" requires super-human intelligence?
A human who wanted to "build as many paperclips as possible" could cause a great deal of destruction today.
A human who wanted to accumulate as much wealth as possible could, too.
EDIT: maybe a better way of articulating my complaints about this famous thought experiment is that it's supposed to be making a point about superintelligence but it's talking about a goal that has sub-human-intelligence sophistication.
> What about "most arbitrary goals are incompatible with human existence" requires super-human intelligence?
The "taken to the absolute extreme" part.
> A human who wanted to "build as many paperclips as possible" could cause a great deal of destruction today.
Maybe, but a) no one really wants that (at least not as their only desire above all else) and b) we aren't superintelligent so it's hard to gain enough control and power and plan well enough to do it that well
> talking about a goal that has sub-human-intelligence sophistication
There is no reason a simple goal can't be followed in an intelligent way or vice versa. This is called the "orthogonality thesis". There's a good video about it here: https://www.youtube.com/watch?v=hEUO6pjwFOo
i agree that there's no way to get humans out of the loop. somebody set up this machine to make paperclips because some human(s) wanted/needed paperclips. eventually, one of those people would realize "we have enough paperclips. let's turn off the paperclip making machine".
this nightmare scenario really only plays out if the paperclip machine develops some sort of self-preservation instinct and has the means to defend/protect itself from being disabled. Building a machine capable of that seems a) like fantastical sci-fi and b) easily preventable.
What about the engagement maximizing algorithms of the last decade plus which have seemingly helped fracture mature democracies by increasing extremism and polarization? Seems like we already have examples of companies using AI (or more specifically machine learning) to maximize some arbitrary goal without consideration for the real human harm that is created as a byproduct.
Ok, that's a more interesting goal to me, because unlike "make as many paperclips as possible" those are algorithms optimizing for actual real revenue and profit impact in a way that "as many paperclips as possible" doesn't. But it shares the "in the long run, this has a lot of externalities" aspect.
You could turn this into a "this is why superintelligence will be good" thought experiment, though! Maybe "the superintelligence realizes that optimizing for these short-term metrics will harm the company's position 30 years from now in a way that isn't worth it" - the superintelligence is smart enough to be longtermist ;) .
I realize that the greater point is supposed to be more like "this agent will be so different that we can't anticipate what it will be weighing or not, and whether its long-term view would align with ours", but the paperclip maximizer example just requires it to be dumb in a way that I don't find consistent with the concern. And I find myself similarly unconvinced at many other points along the chain of reasoning that leads to the conclusion that this should be a huge immediate worry or priority for us, instead of focusing on human incentives/systems/goals.
The basic problem still remains: if you build an autonomous machine intelligence and try to encode it with basic directives, the potential implications of those directives are hard to predict. Obviously the paperclip company doesn’t want to replace the entire universe with a grey goo any more than the sorcerer’s apprentice wants to flood the workshop; it happens accidentally.
Of course the paperclip company can try to add constraints to their AI in order to prevent naive paperclip maximization, but what if they screw up those constraints as well? The whole premise of Asimov’s Three Laws is that AI has these sorts of constraints, but even in his stories these constraints still lead to unexpected outcomes.
All programming bugs are the result of a programmer encoding an instruction or statement that doesn’t imply what they think it implies and the computer following it literally. A more capable and autonomous computer that approaches what we might call “intelligence” is also going to be more capable of doing harm when it runs into a bug. And if it’s something like an LLM where the instructions are natural language, with all its ambiguity and vagueness, you have a whole other issue compounding it.
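A trivial, hypothetical example of that literal-mindedness in ordinary code (nothing LLM-specific, just plain Python chosen for illustration):

```python
# A tiny instance of "the machine does what you said, not what you meant".
# Hypothetical request: "remove the duplicates from this list".
orders = ["pen", "clip", "pen", "stapler", "clip"]

deduped = list(set(orders))  # duplicates gone -- but the original order is too

# The instruction never said "keep the order", so the computer didn't.
# A version matching the (unstated) intent, using dict's insertion order:
deduped_in_order = list(dict.fromkeys(orders))
print(deduped_in_order)  # → ['pen', 'clip', 'stapler']
```

Scale the gap between "what was said" and "what was meant" up from a five-element list to open-ended directives, and you get the paperclip problem.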
If you study philosophy you end up running into the exact same problem. The object of the game of philosophy is to make the most general true statements possible. One philosopher might say something like, “knowledge is defined as true justified belief”, or maybe “moral good is defined as whatever delivers the greatest good to the greatest number”, or maybe even, “the object of the game of philosophy is to make the most general true statements possible”. And then another philosopher comes up with a counterexample or counterargument which disproves the first philosopher’s statement, usually because—just like a programming bug—it entails an implication that the first philosopher didn’t think of. We have been playing the game of philosophy for thousands of years and nobody has managed to score a point yet.
Another thing. Human beings have a lot of needs, imperatives, motivations, and values. Some of them, like food, are built in. Others are learned through culture. But we end up with a lot of them, and it’s easy to take them for granted. With a machine, you have to build those things in yourself. There’s no getting around it. But we don’t actually have a complete, hierarchical set of imperatives/motivations/values for a decent human being. The philosophers have been working on it for millennia but keep running into bugs. So how can we expect to solve the problem for non-human AI? True, we are unlikely to screw up so badly that we end up with a literal paperclip maximizer, but we are bound to make some far more subtle mistake of the same general kind.
More and more of our global economy is centered around compute. While it seems like oil and fossil fuel use will decline with the advent of other forms of energy production in the near future, computer chips are becoming prominent in global strategic thinking and military planning.
How is this different from maximizing paperclips? It's the same thing, just with a much more direct basis for instrumental convergence!
https://en.wikipedia.org/wiki/Instrumental_convergence#Paper...