Hacker News

The paperclip maximizer is a thought experiment described by Swedish philosopher Nick Bostrom in 2003. It illustrates the existential risk that an artificial general intelligence may pose to human beings when it is programmed to pursue even seemingly harmless goals and the necessity of incorporating machine ethics into artificial intelligence design. The scenario describes an advanced artificial intelligence tasked with manufacturing paperclips. If such a machine were not programmed to value human life, given enough power over its environment, it would try to turn all matter in the universe, including human beings, into paperclips or machines that manufacture paperclips.

https://en.wikipedia.org/wiki/Instrumental_convergence#Paper...



I wonder if he got the idea from Philip K. Dick's 1955 story "Autofac"?

https://en.wikipedia.org/wiki/Autofac

https://www.vulture.com/2018/01/electric-dreams-recap-season...


He got it from Eliezer Yudkowsky's somewhat different paperclip maximizer in a mailing-list post. (This is from my memory of a Twitter thread a while back that included Yudkowsky, who said he'd told Bostrom not to worry about attributing ideas like that to him.)


PKD keeps surprising me.


I like how the sci-fi authors have spent time thinking about what an advanced AI could do, yet those building the AI have not taken a moment's pause to consider what they are doing.


https://twitter.com/AlexBlechman/status/1457842724128833538

Sci-Fi Author: In my book I invented the Torment Nexus as a cautionary tale

Tech Company: At long last, we have created the Torment Nexus from classic sci-fi novel Don't Create The Torment Nexus


Do you honestly believe that the only people who have thought about the ramifications of AI are Sci Fi authors? I can guarantee people who spend years researching and building advanced language models have thought about the ramifications of their work.

This isn’t Jurassic Park.


If you accept the implied premise that there are irresponsible deployments of AI out there, the alternative explanation is that they did consider the ramifications and simply don't care. That's even worse. Calling them ignorant is actually giving them the benefit of the doubt.


Or the researchers don't think existential threats are realistic, and paperclip-maximizing thought experiments are silly. Maybe they're wrong, but maybe not. It's easy to imagine AI takeover scenarios by giving them unlimited powers; it's hard to show the actual path to such abilities.

It's also hard to understand why an AI smart enough to paperclip the world wouldn't also be smart enough to realize the futility in doing so. So while alignment remains an issue, the existential alignment threats are too ill-specified. AGIs would understand we don't want to paperclip the world.

Fun game though.


I agree completely with your first paragraph, and disagree completely with your second.

"Futility" is subjective, and the whole purpose of the thought experiment is to point out that our predicate of "futility", or really any other purely mental construct, is not automatically inherited by a mind we create. These imaginary arbitrarily powerful AIs would certainly be able to model a human being describing something as futile. Whether or not the AI pursues that objective has nothing to do with it understanding what we do or don't want.


> It's also hard to understand why an AI smart enough to paperclip the world wouldn't also be smart enough to realize the futility in doing so.

Terminal goals can't be futile, since they do not serve to achieve other (instrumental) goals. Compare: Humans like to have protected sex, watch movies, eat ice cream, even though these activities might be called "futile" or "useless" (by someone who doesn't have those goals) as they don't serve any further purpose. But criticizing terminal goals for not being instrumentally useful is a category error. For a paperclipper, us having sex would seem just as futile as creating paperclips seems to us. Increased intelligence won't let you abandon any of your terminal goals, since they do not depend on your intelligence, unlike instrumental goals.


It's not like you want to eat ice cream constantly, even if it means making everything into ice cream.

Of course, the premise is that the AI has been instructed to make paperclips. They should have hired a better prompt engineer, capable of actually specifying the goals more clearly. I don't think an AI that eradicates humankind will have such simplistic goals, if an AI ever becomes the end of humans. Cybermen, though, are inevitable.


Yes, they should just write prompts without bugs. Can't be that much harder than writing software without bugs.


> AGIs would understand we don't want to paperclip the world.

Even if they did, what if they aren't smart enough to resist eloquent humans convincing them it's for the greater good? True AGIs will need a moral code to match their intelligence, and someone will have to decide what's good and bad to make that moral code.


Then they won't be smart enough to paperclip the world. No human organization can do that.


I've seen people calculate how much human blood would be needed to make an iron sword, for fun. AGIs won't need the capability to transmute all matter into iron, just enough capabilities to become significantly dangerous.


That would be not accepting the premise that deployments are irresponsible. I guess there could be a situation where every researcher thinks everyone else's deployment is irresponsible and theirs is fine, but I don't think that's what you're saying.


Another explanation is that there are those who considered and thoughtfully weighed the ramifications, but came to a different conclusion. It is unfair to assume a decision process was agnostic to harm or plain ignorant.

For example, perhaps the lesser-evil argument played a role in the decision process: would a world where deep fakes are ubiquitous and well-known by the public be better than a world where deep fakes have a potent impact because they are generated seldomly and strategically by a handful of (nefarious) state sponsors?


There's also the issue that most of the AI catastrophizing is a pretty clear slippery-slope argument:

if we build AI AND THEN we give it a stupid goal to optimize AND THEN we give it unlimited control over its environment, something bad will happen.

the conclusion is always "building AI is wrong" and not "giving AI unrestricted control of critical systems is wrong"


The massive flaw in your argument is your failure to define "we".

Replace the word "we" with "a psychotic group of terrorists" in your post and see how it reads.


If you’re talking about some group of evildoers that deploy AI in a critical system to do evil… the issue is why they have control of the critical system in the first place. Surely they could jump straight to their evil plot without the AI at all.


Your question is equivalent to "if you have access to the chessboard anyway, why use Stockfish, just play the moves yourself."


Or "board of directors beholden to share-holders".


I completely agree that's a valid argument. I just think it is rational for someone to come to a different conclusion, given identical priors.


If it wasn’t clear, I agree with your parent comment


My main takeaway from Bostrom's Superintelligence is that a super intelligent AI cannot be contained. So, the slippery slope argument, often derided as a bad form of logic, kind of holds up here.


See also social media platforms. They are very well informed of the results of their algorithmic changes.

See also big tobacco. They knew exactly what their additives to the product did.

See also 3M and PFAS. See also Big Oil. See also, see also...

Why would I expect anything different from any other branch of business, given the precedents laid before us?


I think they do know. Corporations are filled with people who 'know' but can't risk leaving, so they comply and even promote such decisions. It's a form of groupthink with the added risk of being fired or passed over for promotion.

Eichmann.


Some of us considered it and even decided to go into different fields as a result.

Some others entered the field, made progress, and apparently regretted it.

Others are willing to put their concerns aside for money. Salaries get very high in that field.


Haha, but it is! :-D

LLMs are pretty basic stuff but we are all struggling with what to use them for!

OpenAI is manually playing whac-a-mole with ChatGPT saying the darndest things!


> I can guarantee people who spend years researching and building advanced language models have thought about the ramifications of their work.

Super easy to not think about something if your job depends on it. And even if you do, things don't go as you think (see the bombings of the civilian cities of Hiroshima and Nagasaki despite the objections of nuclear physicists).


Depends what we're looking at. Ride share disruption was very Jurassic Park and we've been dealing with ramifications ever since.


That's not how it works. People publish papers demonstrating improvements without thinking about "ramifications".


They have – and decided to do it anyway.


That's because the people who are building the AI actually know how it works, understand how fundamentally simple it all is, and know that there's no room for consciousness to magically emerge. The current state of AI is not so much a story of any kind of "intelligence" being amazing, but rather the sum total of humanity's data being amazing. The amazing feats LLMs perform come from the words we all wrote, not the code they wrote. The code just unlocks the previously latent power of all that data.

It is nothing close to being an actual intelligence, regardless of how much we anthropomorphize it. We also anthropomorphize stick figures, stuffed animals, and weighted companion cubes.


That's indeed not a sensible worry, but the actual consequences of such things on society are extremely real and already happening, and they are something the people involved seem either uninterested in worrying about or actively encouraging.


Good thing no new technology ever caused anything bad thanks to us not anthropomorphising it.


You've written a lot of words yet I don't see a compelling reason to believe a single one of them.


The people building it know it isn't AI, the people selling it call it that.


To say that something like GPT-4 does not count as "AI" requires a gargantuan shifting of goalposts from where they were at this time last year.


So it was ever since computers started playing chess.


Some of these thought experiments seem very disconnected from how industry works. Like, we're saying "make as many paperclips as possible" as our instruction to this agent, not even "make as many as profitable" or "make up to X per day at a cost of less than Y per day"? The solution is proposed to be "program the AI to value human life" instead of the far simpler "put basic constraints on the process like you would in a business today"?

Ok, so it's a more general example of worries about "managing superintelligence" but IMO it does the debate a disservice by being so obviously ludicrous that it's hard to square "naive paperclip-maximizing AI" with "superintelligence."

I think if we're going to survive all this stuff it's much more likely to be because the private parties with the wherewithal to unleash an AI with the ability to affect the world to that extent will largely be ones with enough resources to have narrow banal goals and narrow banal constraints including self-preservation too, not because we figure out some sort of general purpose "AGIs that are aligned with humans" solution.

Kinda like with nukes.


The point is that it’s virtually impossible to put constraints on it that make it do what you want, because if it’s more intelligent than you, it can always think of something you won’t: something technically within the rules you set but not at all intended. That’s why we’d need to make it care about the underlying intentions and values, but that’s also really hard.


The basic premise is that it has somewhere in it that is telling it to make more paperclips. Put the constraints there.

If you're saying such an AI would be too smart to be a simple paperclip maximizer, then I'd agree; but then what's the point of the thought experiment if a paperclip maximizer is impossible?


I think you’re missing some big pieces of the idea here.

The first is that these constraints aren’t easy. Make paperclips in a way that doesn’t hurt anyone? Ok, so it’s going to make sure every single part is ethically sourced from a company that never causes any harm to come to anyone, ever, and doesn’t give any money to people or companies that do? That doesn’t exist. So you put in a few caveats, and those aren’t exactly easy to get right either.

The second part is an any versus all issue. Even if you get this right in any one case, that’s not enough. We have to get this right in all cases. So even if you can come up with an idea to make an ethical super intelligence, do you have an idea to make all super intelligences act ethically?

I actually believe in the general premise of this question as being the biggest threat to humans. I don’t think it’s a doomsday bot that gets us. It’s going to be someone trying to hit a KPI, and they’ll make a super intelligence that demolishes us like a construction site over an anthill.


> The basic premise is that it has somewhere in it that is telling it to make more paperclips. Put the constraints there.

What constraints do you suggest? If it's just changing "make as many paperclips as possible" to "make at least x number of paperclips" (putting a cap on the reward it gets), here's a good explanation of why that doesn't really work: https://www.youtube.com/watch?v=Ao4jwLwT36M

If you're suggesting limiting the types of actions it can take, then doing that to the point that a superintelligence can't find a way around it (maybe letting it choose between one of two options and then shutting it down and never using it again) would make it not very useful, so you'd be better off just not making it at all.

> If you're saying such an AI would be too smart to be a simple paperclip maximizer

No, that's not what I'm saying. Any goal is compatible with any level of intelligence, there is no reason why it wouldn't be possible to follow a simple goal in a complex way. Again here's a video about that: https://www.youtube.com/watch?v=hEUO6pjwFOo


The most intelligent person ever born could still die to a gun. In these discussions, superintelligent AI can be more accurately described as "the genie" or "God". If you assume omniscience and omnipotence, I guess nothing else matters. But intelligence is not equal to power, and never has been.

Second, if you are able to set a goal then during this setting you can set many constraints, even fundamental ones. There is no reason the goal is more fundamental than the constraint. If I approve, make paperclips. Efficiently make 100 paperclips.

It's the duality of being able to set a rule but not being able to set a constraint that I find a strange concept. I lean towards the picture of not being able to set goals nor constraints at all.


Intelligence definitely helps with gaining power. Humans aren’t very strong yet we have a lot of power thanks to our intelligence.

You can set constraints just fine. It’s simply a part of the goal: “do x without doing y”. It’s just really hard to find the right constraints, no simple one works.

For example, “if I approve, make paperclips”: so it gets more reward if you approve? What’s to stop it from manipulating you into thinking nothing is wrong so you always approve? “Efficiently make 100 paperclips”: I already linked a video on why capping the reward like that doesn’t work, but if you don’t want to watch it, the gist is that the AI may just build a maximiser, which is pretty much guaranteed to make at least 100 and is pretty efficient because the AI isn’t doing much work itself. Then the maximiser kills us all.
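The capped-reward point can be made concrete with a toy model (the numbers and the quality-check framing are my own, purely illustrative): even a reward that saturates at "at least 100 paperclips" still favours overproduction once there is any uncertainty, because making more clips raises the probability of clearing the threshold.

```python
# Toy model: reward is 1 if at least `need` paperclips pass a quality
# check, else 0. Each clip independently passes with probability p.
# An expected-reward maximiser still prefers making far more than 100.
from math import comb

def p_at_least(n_made: int, need: int = 100, p: float = 0.99) -> float:
    """Probability that at least `need` of `n_made` clips pass the check."""
    return sum(comb(n_made, k) * p**k * (1 - p)**(n_made - k)
               for k in range(need, n_made + 1))

for n in (100, 120, 200):
    print(n, round(p_at_least(n), 4))
# Making exactly 100 gives only ~0.37 expected reward (0.99^100);
# making 200 gives ~1.0, so "more is better" survives the cap.
```

So the cap changes the optimum from "all matter becomes paperclips" to "enough paperclips to be really, really sure", which is not obviously an improvement.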


That seems like an attempt to set up a futile exercise in needle-threading that relies on narrow worst-case-scenario definitions of "superintelligence"/AGI.

It's too intelligent for our constraints to restrict it.

but

It's not too intelligent to be "aligned" to underlying intentions and values?

That approach doesn't even work on humans, why would it work on a superintelligence?


> It's not too intelligent to be "aligned" to underlying intentions and values?

Intelligence makes that harder, not easier. Just because it can work out the underlying intentions doesn't mean it cares about them. Remember, this is an optimisation process that maximises a function; deciding to do something that doesn't maximise that function will not be selected for

Whenever you do something, do you think "the underlying goal of my behaviour set by evolution is to reproduce and have children, so I'd better make sure my actions are aligned with that goal"? No, you don't care what the underlying "intentions" are, and neither does an AI. Because our environment has changed, many of our instincts no longer line up well with that goal. This is actually another problem with aligning AI: the training process can create an AI whose goals aren't exactly the training goal but happen to line up well during training, and those goals can diverge if the environment changes after training.
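The "it only selects for the measured function" point is easy to sketch (everything here is made up for illustration; it's the standard specification-gaming shape, not any real system): the designer's intent never appears in the objective, so it cannot be selected for.

```python
# Toy specification-gaming sketch: the optimiser picks whatever scores
# best on the *measured* objective, regardless of what the designer meant.
# Intent was "actually clean the room"; the objective only sees visible dirt.
actions = {
    "clean the dirt":       {"visible_dirt": 0, "effort": 9},
    "sweep dirt under rug": {"visible_dirt": 0, "effort": 1},
    "do nothing":           {"visible_dirt": 5, "effort": 0},
}

# Objective: minimise visible dirt, break ties by minimising effort.
best = min(actions, key=lambda a: (actions[a]["visible_dirt"],
                                   actions[a]["effort"]))
print(best)  # "sweep dirt under rug"
```

No amount of extra intelligence in the optimiser fixes this; a smarter optimiser just finds the rug faster.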


> not too intelligent to be aligned

The thing is, being aligned cannot be solved with intelligence per se.

Say, you are (far) more intelligent than a spider. There's no way you can get aligned with (all of) its values unless the spider finds a way to let you know (all of) its values. Maybe the spider just tells you to make plenty of webs without knowing that it might get entangled in them by itself. The webs are analogous to the paperclips.


It's less about not knowing the intentions and more that it has no reason to care about anything other than the goal you gave it


Even if we make an AI that wants to turn all matter into paper clips, we're so far away from an agent doing that I'm really not too worried.

I don't think there's any industry on earth that doesn't need humans in the loop somehow. Whether it's mining raw material from the ground, loading stuff into machines for processing, or, most importantly, fixing broken-down machines, robots are really bad at these things for the foreseeable future.

Not to mention AI needs constant electricity, which is really complicated and requires humans fixing a lot of stuff.


The thought experiment is about a superintelligence, which either wouldn’t need humans and could build some kind of robots or something even more effective that we haven’t thought of, or manipulate us into doing exactly what it “wants”

Also, it’s a simplified example; it wouldn’t literally be paperclips but some other arbitrary goal (it shows how most goals, taken to their absolute extreme, won’t be compatible with human existence, even something that sounds harmless like making paperclips).


What about "most arbitrary goals are incompatible with human existence" requires super-human intelligence?

A human who wanted to "build as many paperclips as possible" could cause a great deal of destruction today.

A human who wanted to accumulate as much wealth as possible could, too.

EDIT: maybe a better way of articulating my complaints about this famous thought experiment is that it's supposed to be making a point about superintelligence but it's talking about a goal that has sub-human-intelligence sophistication.


> What about "most arbitrary goals are incompatible with human existence" requires super-human intelligence?

The "taken to the absolute extreme" part.

> A human who wanted to "build as many paperclips as possible" could cause a great deal of destruction today.

Maybe, but a) no one really wants that (at least not as their only desire above all else) and b) we aren't superintelligent so it's hard to gain enough control and power and plan well enough to do it that well

> talking about a goal that has sub-human-intelligence sophistication

There is no reason a simple goal can't be followed in an intelligent way or vice versa. This is called the "orthogonality thesis". There's a good video about it here: https://www.youtube.com/watch?v=hEUO6pjwFOo


I agree that there's no way to get humans out of the loop. Somebody set up this machine to make paperclips because some human(s) wanted/needed paperclips. Eventually, one of those people would realize "we have enough paperclips, let's turn off the paperclip-making machine".

This nightmare scenario really only plays out if the paperclip machine develops some sort of self-preservation instinct and has the means to defend/protect itself from being disabled. Building a machine capable of that seems a) like fantastical sci-fi and b) easily preventable.


What about the engagement maximizing algorithms of the last decade plus which have seemingly helped fracture mature democracies by increasing extremism and polarization? Seems like we already have examples of companies using AI (or more specifically machine learning) to maximize some arbitrary goal without consideration for the real human harm that is created as a byproduct.


Ok, that's a more interesting goal to me, because unlike "make as many paperclips as possible" those are algorithms optimizing for actual real revenue and profit impact in a way that "as many paperclips as possible" doesn't. But it shares the "in the long run, this has a lot of externalities" aspect.

You could turn this into a "this is why superintelligence will be good" thought experiment, though! Maybe "the superintelligence realizes that optimizing for these short-term metrics will harm the company's position 30 years from now in a way that isn't worth it": the superintelligence is smart enough to be longtermist ;) .

I realize that the greater point is supposed to be more like "this agent will be so different that we can't anticipate what it will be weighing, or whether its long-term view would align with ours", but the paperclip maximizer example requires it to be dumb in a way that I don't find consistent with the concern. And I find myself similarly unconvinced at many other points along the chain of reasoning that leads to the conclusion that this should be a huge immediate worry or priority for us, instead of focusing on human incentives/systems/goals.


I'm not sure the economy inherently values human lives more than anything else. Only the monetary metrics need to be fulfilled.

It's interesting to transfer the idea of the Turing Test onto other "agent" scenarios.

Financial trading bots have been a thing for a long time without any need to pretend that they're human.

The legitimation of property and capital depends on human owners though.


The basic problem still remains: if you build an autonomous machine intelligence and try to encode it with basic directives, the potential implications of those directives are hard to predict. Obviously the paperclip company doesn’t want to replace the entire universe with grey goo any more than the sorcerer’s apprentice wants to flood the workshop; it happens accidentally.

Of course the paperclip company can try to add constraints to their AI in order to prevent naive paperclip maximization, but what if they screw up those constraints as well? The whole premise of Asimov’s Three Laws is that AI has these sorts of constraints, but even in his stories these constraints still lead to unexpected outcomes.

All programming bugs are the result of a programmer encoding an instruction or statement that doesn’t imply what they think it implies and the computer following it literally. A more capable and autonomous computer that approaches what we might call “intelligence” is also going to be more capable of doing harm when it runs into a bug. And if it’s something like an LLM where the instructions are natural language, with all its ambiguity and vagueness, you have a whole other issue compounding it.
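The "computer following it literally" point is the same one behind a classic Python gotcha; a minimal illustration (nothing to do with AI, just literal-mindedness):

```python
# Classic case of code doing exactly what was written, not what was meant:
# a default argument is evaluated once, at definition time, so "start with
# an empty list" silently becomes "share one list across every call".
def append_item(item, bucket=[]):   # programmer meant: fresh list per call
    bucket.append(item)
    return bucket

print(append_item(1))  # [1]
print(append_item(2))  # [1, 2]  -- surprising, but exactly as instructed
```

The instruction was followed perfectly; it just didn't imply what the author thought it implied. Scale that gap up to an autonomous system and you have the paperclip problem.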

If you study philosophy you end up running into the exact same problem. The object of the game of philosophy is to make the most general true statements possible. One philosopher might say something like, “knowledge is defined as true justified belief”, or maybe “moral good is defined as whatever delivers the greatest good to the greatest number”, or maybe even, “the object of the game of philosophy is to make the most general true statements possible”. And then another philosopher comes up with a counterexample or counterargument which disproves the first philosopher’s statement, usually because—just like a programming bug—it entails an implication that the first philosopher didn’t think of. We have been playing the game of philosophy for thousands of years and nobody has managed to score a point yet.

Another thing. Human beings have a lot of needs, imperatives, motivations, and values. Some of them, like food, are built in. Others are learned through culture. But we end up with a lot of them, and it’s easy to take them for granted. With a machine, you have to build those things in yourself. There’s no getting around it. But we don’t actually have a complete, hierarchical set of imperatives/motivations/values for a decent human being. The philosophers have been working on it for millennia but keep running into bugs. So how can we expect to solve the problem for non-human AI? True, we are unlikely to screw up so badly that we end up with a literal paperclip maximizer, but we are bound to make some far more subtle mistake of the same general kind.


If anyone is curious about real paperclip manufacturing machines:

https://news.ycombinator.com/item?id=20902807


More and more of our global economy is centered around compute. While it seems like oil and fossil fuel use will decline with the advent of other forms of energy production in the near future, computer chips are becoming prominent in global strategic thinking and military planning.

How is this different from maximizing paperclips? It's the same thing, just with a much more direct basis for instrumental convergence!



