> We don't think Bing can act on its threat to harm someone, but if it was able to make outbound connections it very well might try.
An application that makes outbound connections and executes code has a very different implementation from an application that uses some model to generate responses to text prompts. Even if the corpus of documents the LLM was trained on did support bridging the gap between "I feel threatened by you" and "I'm going to threaten to hack you", it would be insane for the MLOps people serving the model to also implement the infrastructure for an LLM to make the modal shift from just serving text responses to 1) probing for open ports, 2) doing recon on system architecture, 3) selecting a suitable exploit/attack, and 4) transmitting and/or executing on that strategy.
We're still in the steam engine days of ML. We're not at the point where a general-use model can spec out and deploy infrastructure without extensive, domain-specific human involvement.
Basically, Toolformer learns to call specific private APIs to insert data into a completion at inference time. The framework expects to call out to the internet based on what's specified in the model's text output. It's a very small jump from there to more generic API connectivity. Indeed, I suspect that's how OpenAssistant is thinking about the problem: they would want to build a generic connector API, where the assistant can call out to any API endpoint (perhaps conforming to a certain schema) during inference.
Or, put differently: ChatGPT as currently implemented doesn't hit the internet at inference time (as far as we know?). But Toolformer could well do that, so it's not far away from being added to these models.
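The mechanism described above can be sketched as a small inference-time loop: the framework scans the model's generated text for call markers, executes the corresponding API, and splices the result back into the completion. To be clear, the `[Tool(arg)]` marker syntax and the tool registry below are invented for illustration; they are not Toolformer's actual implementation, just the general shape of the idea:

```python
import re

# Stub "tools" standing in for the outbound API calls the framework would
# make on the model's behalf. Real tools would hit network endpoints.
TOOLS = {
    "Calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),  # demo only
    "Search": lambda q: f"<results for {q!r}>",
}

# Matches markers like [Calculator(2 * 21)] in generated text.
# (A real parser would need to handle nested parentheses; this sketch doesn't.)
CALL_PATTERN = re.compile(r"\[(\w+)\((.*?)\)\]")

def expand_tool_calls(model_output: str) -> str:
    """Scan generated text for [Tool(arg)] markers and splice in results."""
    def run(match: re.Match) -> str:
        name, arg = match.group(1), match.group(2)
        tool = TOOLS.get(name)
        # Unknown tool names are left in the text untouched.
        return tool(arg) if tool else match.group(0)
    return CALL_PATTERN.sub(run, model_output)

# The model emits text containing an embedded call; the framework resolves it.
generated = "The answer is [Calculator(2 * 21)]."
print(expand_tool_calls(generated))  # -> The answer is 42.
```

The point being: once serving infrastructure contains a loop like this, "which endpoints can the model reach" becomes a configuration question rather than an architectural one.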
In your scenario, did the script kiddie get control of Microsoft's Bing? Or are you describing a scenario where the script kiddie spins up a knockoff Bing (either hosting the GPT3 model or paying some service hosting the model), advertises their knockoff Bing so that people go use it, those people get into arguments with the knockoff Bing, and the script kiddie also integrated their system with functionality to autonomously hack the people who got into arguments with their knockoff Bing?
A script kiddie can connect GPT-3.5 through its API to generate a bunch of possible exploits or other hacker scripts and auto-execute them. Or combine it with a TTS API to create plausible-sounding personalized scripts that spam call or email people. And so on. I'm actually purposefully not mentioning other scenarios that I think would be more insidious. You don't need much technical skill to do that.
Even if any of that were remotely relevant to this conversation about Bing, GPT models don't generate exploits or "hacker scripts", nor do they execute "hacker scripts". GPT models just provide natural language plain-text responses to prompts.