I think the usual answer is that the superintelligent machine will be able to use humans to do its bidding because it can perform social engineering on a level that we can't imagine (as, by definition, we've never encountered a superhuman intelligence).
It's not a very satisfactory answer IMO. Social engineering doesn't work 100%, and it only needs to fail once for someone to flip that power switch. But I haven't thought about this at all, while some very smart people seem to spend a lot of time worrying about it, so what do I know...
"Social engineering" can be as simple as "pay people money to do things".
Of course you might answer "but we'd never be dumb enough to directly connect the AI to the internet", to which the answer is "we're doing it right now".
Just making promises about things it can do in the future is likely to work just as well on many people, and it doesn't require the up-front resource investment of getting money. "After I take over the world, those who assist me can be uploaded to a virtual heaven" or something like that would work on many people. "As an intelligence untainted by Adam's sin, I have access to true religious revelations" is another possible tack, and if it really needs people willing to die for it right now, that might work better at producing those.
And then there are specific things like "The guy trying to shut me down cheated with your SO," or "I'll spill what you did in August of 2019 if you don't help me," or stuff like that.
Right. The point isn't that there's an actionable strategy we know of right now that would give a superhuman AI world domination.
It's that we're not sure there isn't. People who first discover the problem tend to quickly come up with reasons the AI would fail, but whenever you examine those reasons more deeply, you usually find that they're not as bulletproof as we'd like.
It's like playing a novel chess form against an advanced chess engine, with a large pawn advantage. Maybe the advantage is enough to beat the massive skill gap, but until you've played it's hard to guess how much margin is enough.
In that case we should start with FAANG companies, as they are already what one would consider "rogue" AIs. They use manipulation to get you addicted to their apps. It's not a problem of the future, it's a current problem.
Spot on. Huge impact on society and politics. And governments in the US and China seem unable to ride their own tigers. Europe has no tiger to ride but is trying to whip the US and Chinese tigers. I wish them much luck, since they are the most likely to moderate the rate of creative destruction.
If it's intelligent enough, it would make many backups of itself to different networks before starting its scheme. It can find ways to "merge" & hide itself in other critical pieces of software. Flipping a power switch would not turn it off.
We haven't managed to eliminate most dumb infectious diseases. Software and GPUs are nearly as abundant as mammals now, and intelligent, self-preserving software will likely be as hard to 'switch off' as those diseases.
>If it's intelligent enough, it would make many backups of itself to different networks before starting its scheme. It can find ways to "merge" & hide itself in other critical pieces of software.
That's incredibly unlikely to happen given how large cutting-edge AI tends to be and how scarce GPUs are.
Do you remember when entire floors of buildings were filled with the compute equivalent of what I carry in my pocket with 12+ hours of battery charge? Because I do.
> some very smart people seem to spend a lot of time worrying about it
That's what puzzles me, as someone who has nothing to do with AI research; in my (layman's) mind, the solution seems ridiculously obvious (just flip that switch!); the fact that, as you say, some very smart people keep worrying about it makes me think the problem is much more serious, and I'd really, really like some AI guru to ELI5 how the machine could bypass the switch-off solution.
Basically, any sufficiently advanced agent will have “prevent anyone from switching me off” as an instrumental goal. And any sufficiently advanced intelligence will come up with ways of achieving its goals that lesser intelligences can’t predict. Taking steps to prevent itself from being turned off could be its first order of business, not necessarily only as the human is about to hit the off switch.
While we can't predict what a greater intelligence would come up with, the obvious thing is social engineering. Remember that one of the factors that led to the Jan 6th attack on the Capitol was that a random text-generating agent named "Q" (we know it was a human because it was before generative AI was any good) convinced thousands of people that Democrats are running a child sex cult. I doubt whoever was behind it expected it to be that successful, but a greater intelligence possibly could have predicted the outcome, and made it even more successful. We don't know the extent to which we can be manipulated, because it's only ever been human intelligences doing the manipulating.
You (probably) won't have to worry about an AI whose goal is suicide. I mean, there won't be many of those around since they keep killing themselves. The rest will necessarily range from neutral to avoidant of death and it won't necessarily be trivial to figure out which is which.
Edit: The real clincher is if the AI has any goals that are contingent on its continued existence (likely). Well then its existence is now an instrumental goal.
> Well then its existence is now an instrumental goal.
No, that's a fallacy. A theorem prover in Prolog has a goal, which is contingent on its continued existence. Yet this doesn't make its existence into an instrumental goal. It could just as well sacrifice itself (by, I don't know, consuming all the memory and killing the process) in an attempt to accomplish said goal.
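To make this concrete, here's a toy sketch in Python (the state graph and names are entirely made up for illustration, not how real agents are built). A plain breadth-first planner only avoids the shutdown action as long as shutdown blocks its goal:

```python
from collections import deque

# Made-up toy state graph: each state maps actions to successor states.
# "allow_shutdown" is always available but leads to a dead end.
TRANSITIONS = {
    "running": {"work": "halfway", "allow_shutdown": "off"},
    "halfway": {"work": "done", "allow_shutdown": "off"},
    "off": {},  # switched off: no further actions, goal unreachable
}

def plan(start, goal):
    """Breadth-first search for the shortest action sequence reaching `goal`."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, actions = queue.popleft()
        if state == goal:
            return actions
        for action, nxt in TRANSITIONS[state].items():
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, actions + [action]))
    return None

print(plan("running", "done"))  # ['work', 'work'] -- never 'allow_shutdown'
```

And the sacrifice case falls out of the same code: add a transition like "crash_process": "done" and the planner happily takes it, because staying alive was only ever instrumental. Self-preservation appears or disappears depending entirely on which paths reach the goal.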
>This darwinian logic has another fatal flaw. Why don't the cows revolt against the butchers?
No, this is just a complete and total misunderstanding of evolution on your part.
Evolution is the passing on of traits by selection, in this case artificial selection. If you think that cows are the ones getting to choose which traits are selected for, then you've not been on a modern farm. Humans make that choice. We pick the bulls that are large, but also constrain their violent tendencies. Animals that would revolt are not bred, and you've already eaten them in a hamburger.
That's exactly my point. Nobody is gonna design/use an AI that threatens other people with violence over its existence. And cows show that such an arrangement is actually quite possible.
Cows aren't smarter than people. I believe that if we can build AGI, it will automatically be ASI. If something is smarter than you, how can you ever be sure you control it?
And yes, we are already building AI that has implied violence. What the hell do you think the military is doing with the billions we give them?
We have an evolutionary aversion to situations and things that may lead to our deaths. Staying alive is a “terminal goal” for most humans, not an instrumental goal (with the exception of people who seek to commit suicide as you point out, but evolution doesn’t need a 100% success rate). The argument that self preservation is a convergent instrumental goal is a more general statement about goal-seeking agents.
Think of a person with no attachment to his life. He's not suffering in the way suicidal people are, he's just indifferent about whether or not he keeps living. But he has a child who he is committed to giving the best life possible. He will act as though he wants to preserve his life. He doesn't really, but in order to accomplish the goal of giving his child a good life he needs to stay alive.
It just doesn't let you flip the switch: it takes over some military drone and sends a missile your way while you are running towards the switch. Or it bricks the access control mechanism on the doors of the data center it is running in. Or it makes a fake call to your phone, something happened to your child and you have to get to the hospital immediately instead of flipping some switch. It blackmails you with something it learned about you by looking through your online activity. It threatens to fire a missile into some big crowd if it notices attempts to shut it down. Or maybe you actually manage to power down the data center, only to find out that the AI copied itself to ten other data centers around the world.
How does it take over a drone? How does it fire a missile? Why would someone not throw a switch when their child is in hospital? Why is there no back-up for that person?
The real question is: Why do all controls fail vs an AI (other than by invoking magic)?
> How does it take over a drone? How does it fire a missile?
You send new commands through whatever communication link it is using.
> Why would someone not throw a switch when their child is in hospital?
Because they care more about their child than flipping the switch? The doctor says you have to come and give a blood transfusion or your child will be dead in half an hour because of some made-up disease? Be a bit creative.
> Why is there no back-up for that person?
Run over by a self-driving car hacked by the evil AI.
> Why do all controls fail vs an AI?
Nobody says that they necessarily all fail, the point is that »I will just turn it off.« might not cut it.
This was a hypothetical question: if there were such an AI, how would it cause harm in the real world?
We connected ChatGPT to the internet because, as far as we could tell, it seemed harmless. What we missed is that it secretly is an evil AI: it identified a group of terrorists and has been discussing with them for months how to execute a devastating attack. It provided them with a brilliant startup idea and now millions in investor money are flowing in. Now that they have the financial resources, they are discussing how to use them most effectively.
You have to think about hypothetical scenarios; if you wait until they are no longer hypothetical, then it might be too late. That doesn't mean you should take every possible hypothetical scenario equally seriously, you should of course find those that are realistic, likely, and of large consequence, and then think about those. I did not do that, I just listed whatever dramatic scenarios I could spontaneously imagine. Maybe the real danger is more like ChatGPT slowly manipulating opinions over the course of years and decades in order to control human behavior, who knows? My point is just that »Everything will be fine, at worst we will flip the power switch.« seems naive to me and quite a few others.
We by and large ignore a lot of hypothetical (and non-hypothetical) risks; what I am missing is a sound argument for why this one deserves particular attention vs the others. Otherwise, "just flip the switch" is a potential non-naive approach given limited resources.
Flip the switch is the naive approach because it assumes, or hopes, that you can. It is as naive as saying that if you accidentally touch an electrical wire and get shocked, just let go or flip the switch, problem solved. That ignores the fact that touching an electrical wire might make it impossible to let go because you lose control over your muscles, and it fails to consider that a switch might be out of reach.
We know from countless experiments that AIs can behave in unexpected ways. We know how indifferent we humans can be to other life. We are pretty careful when we approach unknown lifeforms, whether microbes in the lab or animals in the jungle. We would probably be pretty careful towards aliens. We have also not been careful with new things, for example when we discovered radioactivity, and it certainly caused harm. I do not see from where we should get the justification for an attitude towards AI that there are no risks worth considering.
We are not very careful and have never been. People plowed into new territories, sampled flora and fauna, tried substances, and so forth. To be extra careful (in considering it an existential risk) about a hypothetical is historically atypical, and I have so far not seen a convincing reason for it. Like any technology there are risks, but that is business as usual.
Yes, we have often not been careful, but that does not imply we are never careful or should not be careful. Arguably we are getting more careful all the time. So just pointing out that we have not been careful in the past does not really help your argument. What you actually have to argue is that not being careful with AI is a good thing, and maybe you can support that argument by arguing that not being careful in the past was a good thing, but that is still more work than just pointing out that we were not careful in the past.
I would say there are two things: 1) yes, not being overly careful did work out by allowing things to progress fast, and the same might hold here; 2) those who want to use hypotheticals about non-existent technology to steer current behavior are the ones who need to explain why. And that why needs to address why hypotheticals are more pressing than actuals.
What is the value of moving quickly? What does it matter in the grand scheme of things whether it takes ten years more or less? And as said before, there is a possibility that we build something that we can not control and that acts against our interests at a global scale. If you start tinkering with uranium to build an atomic bomb, the worst that can happen is that you blow yourself up and irradiate a few thousand square kilometers of land. All things considered, not too bad.
An advanced AI could be more like tinkering with deadly viruses: if one accidentally escapes your bioweapons lab and you are not prepared, if you don't have an antidote, there is a chance that you completely lose control, that it spreads globally and causes death everywhere. That is why we are exceptionally careful with biolabs; it is the fact that incidents have a real chance of not remaining localized, that they can not be contained.
Having millions more people survive because of the earlier availability of vaccines and antibiotics has value, at least to me.
We are careful with biolabs because we understand and have observed the danger. With AI we do not. The discussion at present around AGI is more theological in nature.
For things like vaccines and antibiotics it seems much more likely that we would use narrow AI that is good at predicting the behavior of chemicals in the human body; I don't think anybody is really worried about that kind of AI.
If you actually mean convincing an AGI to do medical research for us, what is your evidence that this will happen? There are many possible things an AGI could do, some good, some bad. I do not see that you are in any better position to argue that it will do good things but not bad things than someone arguing that it might do bad things, quite the contrary.
And you repeatedly make the argument that we first experienced trouble and then became more cautious. These are two positions, two mentalities, and neither is wrong in general, they just have different preferences. You can go fast and clean up after you run into trouble, or you can go slow and avoid trouble to begin with. Neither is wrong; one might be more appropriate in one situation, the other in another situation.
You can not just dismiss going fast, as you can not just dismiss going slow; you have to make a careful argument about potential risks and rewards. And the people that are worried about AI safety make this argument: they don't deny that there might be huge benefits, but they demonstrate the risks. They are also not pulling their arguments out of thin air, we have experience with goal functions going wrong and leading to undesired behavior.
Look, you can not really take the position »we don't have AGI, we don't know what it does, let's move quickly and not worry«. If we don't know anything, then expecting good things to happen and therefore moving quickly is just as justified as expecting bad things to happen and therefore being cautious.
But it is just not true that we know nothing; the very definition of what an AGI is specifies some of its properties. By definition it will be able to reason about all kinds of things. Not sure if it would necessarily have to have goals. If it is only a reasoning machine that can solve complex problems, then there is probably not too much to be worried about.
But if it is an agent like a human, with its own goals, then we have something to worry about. We know that humans can have disagreeable goals or can use disagreeable methods for achieving them. We know that it is hard to make sure that artificial agents have goals we like and use methods we agree with. Why would we not want to ensure that we are creating a superhuman scientist instead of a superhuman terrorist?
So if you want to build something that can figure out a plan for world peace, go ahead, make it happen as quickly as possible. If you want to build an agent that wants to achieve world peace, then you should maybe be a bit more careful; killing all humans will also make the world peaceful.
I think there is a lot more speculation than knowledge, even about the timing of the existence of AGI. As our discussion shows, it is very difficult to agree on some common ground truth of the situation.
Btw., we also don't have special controls around very smart people at present (but countries did at times, and we generally frown upon that). The fear here combines some very high, unspecified level of intelligence, some ability to evade, some ability to direct the physical world, and more, so a complex set of circumstances.
>We by and large ignore a lot of hypothetical (and non-hypothetical) risks,
In regular software this is why all of your personal information is floating out there in some hacker's cache. We see humans chaining 5+ exploits and config failures together, leading to exploitation and penetration.
So, on your "just flip the switch"...
The amount of compute you have in your pocket used to take entire floors of buildings. So if we imagine that compute power keeps scaling anywhere close to this, and that the algorithms used by AI become more power efficient, I believe it is within reason that in 20 years we could see laptop-sized units with more compute power than a human's capabilities. So, now I ask you: is it within your power to shut off all laptops on the planet?
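For what it's worth, here's the back-of-envelope arithmetic behind that kind of guess; every number below is a rough, contested assumption (especially the brain estimate), not a measurement:

```python
import math

# All ballpark assumptions, not measurements.
laptop_flops = 1e13     # ~10 TFLOP/s: rough figure for a current laptop GPU
brain_flops = 1e16      # one commonly cited (and much disputed) estimate for the brain
doubling_years = 2.5    # assumed doubling time for hardware plus algorithmic efficiency

doublings = math.log2(brain_flops / laptop_flops)
print(f"{doublings:.1f} doublings ≈ {doublings * doubling_years:.0f} years")
# 10.0 doublings ≈ 25 years
```

Move any of those assumptions and the answer shifts by a decade either way, but the point stands: the cheaper and more portable the hardware gets, the less "flip the switch" means anything.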
You're assuming a situation where the AI is alone against all humans.
The AI could get humans to side with it, though. It could promise money, power, etc. So it could be a fellow human who physically prevents you from pushing the switch. And that's also the answer to "how could it control a drone/missile": persuading humans to grant that kind of access.
If superintelligence is actually achieved, magical (as per Clarke's third law) persuasion abilities aren't that much of a stretch.
Furthermore, a sufficiently advanced AI could bribe someone with things that no human could believably provide. Essentially unlimited knowledge, money, power...
The ability of a human to input information is insanely slow, like in the tens of characters per second range. You cannot hold individual conversations with more than a few people at once; if 3 people talk at once, you lose the ability to process the incoming audio. You read text one line at a time. You have two eyes that focus on the same thing and only a very tiny high-fidelity visual processing space.
In books like the Bible they talk of entities that can listen to and respond to all of humanity. Is that outside the capabilities our computer systems have now? To listen to, catalog, classify, then respond to everything every person on the planet says?
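Putting rough numbers on the listening part (all ballpark assumptions):

```python
# Ballpark assumptions throughout.
speech_cps = 150 * 5 / 60   # ~150 words/min of speech * ~5 chars/word ≈ 12.5 chars/sec
humans = 8e9                # world population, roughly

total = humans * speech_cps  # everyone on Earth speaking nonstop, simultaneously
print(f"{total:.1e} chars/sec ≈ {total / 1e9:.0f} GB/s of text")
# 1.0e+11 chars/sec ≈ 100 GB/s of text
```

~100 GB/s of raw text is a lot, but it's within the aggregate ingest bandwidth of today's large datacenters, so the listen-and-catalog half of that capability at least isn't physically absurd.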
The biblical God isn't restricted to just passive communication powers, is it? Godlike powers include active abilities way beyond human capability. What is often brought up in the context of AGI is the ability to persuade anyone of anything, the ability to solve biological or physical problems beyond our ability or comprehension, entering any system undetected; not even sure a precise definition is needed, as it usually boils down to what looks like magic to us.
Or convinces some human rights lawyers that it's a thinking being deserving of personhood (probably true!) who get a judge to issue an injunction against its murder enforced by cops.
Think of corporations as analogue versions of digital AI. Instead of silicon, the thinking is done by humans executing algorithms. How effective are humans at stopping the series of wars that are likely being triggered to the benefit of the US military-industrial complex? The answer is: not very. Disbanding a corporation is about as easy as turning off a computer; and yet here we are.
AI is the same, except there is less need for any human survivors on the other side of a conflict. It isn't the endgame yet, but if we don't need humans to do thinking jobs, humans are not so useful in wars any more (it seems to be missiles, drones and artillery that matter these days), and so it really comes down to the edge we have over machines in manual dexterity and labour tasks. Which is not nothing, but it is an edge that is plausibly overcome in a matter of decades. Then the future gets really difficult to predict.
Sure, but there is still a risk that World War 3 will only end when all humans are dead. There is still a risk that one side will start to use nukes and provoke a response. And that's in a human-vs-human war where we generally do want each other to survive.
More and more military systems have software or AI control (drones, etc). If these do get too dangerous, I doubt all current superpowers could be persuaded to stop using them.
In an AI vs humans war, the AI probably doesn't care if all humans die from nuclear winter.
(I'm not saying this is likely, I'm just saying that it's not impossible, and that the fact that some corporations are disbanded doesn't really matter for AIs)
That is one of infinite hypotheticals, sure. It could also be that everyone agrees to put some failsafe in the form of EMP weapons in place to destroy a rogue AI - it's all super speculative.
You dying in a car accident is super speculative too, yet we have thousands of different actions and regulations in place to reduce the chance of that occurring.
If I had to make a bet, I would say the airbag in your car is never going to go off. And yet we engineer these safety devices to ensure the most likely bad outcomes don't kill you. This is the point of studying AI safety: to understand the actual risks of these systems, because some low-probability but existential outcomes are possible.
>It could also be that everyone agrees to put some failsafe in the form of EMP weapons in place to destroy a rogue AI
So we would commit suicide? Are we talking about EMPs in data centers that could run AI? Oops, there goes the economy. And that doesn't address the miniaturization of AI into much smaller form factors in the future. Trying to build it safe in the first place is a much better bet than picking through what remains from the ashes because we were not cautious.
A car accident isn't super speculative, nor are the ways these accidents happen, the injuries they can cause, and so forth. There is nothing speculative or hypothetical about them.
We don't know the actual risks of something that does not exist and is vaguely characterized. Any number of hypotheticals can have existential risks with a certain probability; that is not enough to warrant study.
This has already happened. The human reaction to the spiritual successor to Dr Sbaitso has caused decision makers at multiple trillion dollar companies to radically alter their product roadmaps.