The solution here will, like the solution to SQL injection and to sound typing, involve restricting the structure of the input to some subset of the full possible input space. I don't think anyone is sure what that will look like with LLMs, but I don't see any reason to assume a priori that there is no way to define a safe subset of the possible prompts. Again, we did it with type systems and proof assistants.
The resulting system won't have the unbounded flexibility that our existing models have, but if it's provably safe, that will make up for it.
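For the SQL-injection analogy, the fix was exactly this kind of structural restriction: parameterized queries confine user input to a data slot, so it can never change the shape of the query. A minimal sketch (using Python's built-in `sqlite3`):

```python
# Parameterized queries restrict the input's structure so user data
# can never be interpreted as code -- the analogy for "safe prompts".
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")

user_input = "alice' OR '1'='1"  # a classic injection payload

# Vulnerable style: string concatenation lets data rewrite the query.
# query = f"SELECT * FROM users WHERE name = '{user_input}'"

# Safe style: the placeholder confines the input to a single string value.
rows = conn.execute(
    "SELECT * FROM users WHERE name = ?", (user_input,)
).fetchall()
print(rows)  # [] -- the payload is treated as data and matches no user
```

The open question in the comment is whether an analogous data/code boundary can be drawn for prompts at all.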
> I don't think anyone is sure what that will look like with LLMs, but I don't see any reason to assume a priori that there is no way to define a safe subset of the possible prompts.
That would essentially require a "non-Turing-complete" prompt language. If the prompt language were effectively Turing complete, then it would be undecidable whether every possible prompt produces a "safe" outcome (this is Rice's theorem in another guise). This would severely limit what the LLM could do, even compared to GPT-3.5.
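One way to picture a non-Turing-complete prompt language is a fixed, auditable set of templates with validated slots: the space of possible prompts is then finite, so each template's safety can be reviewed by hand. A hedged sketch, where the template names and slot rules are purely illustrative assumptions:

```python
# Sketch of a "non-Turing-complete prompt language": user input can only
# fill typed slots in whitelisted templates, never supply free-form text.
# Template names and slot rules here are invented for illustration.
import re

TEMPLATES = {
    "order_status": "What is the status of order {order_id}?",
    "return_policy": "Summarize the return policy for category {category}.",
}
SLOT_RULES = {
    "order_id": re.compile(r"[0-9]{1,10}"),   # digits only
    "category": re.compile(r"[a-z]{1,20}"),   # a single lowercase word
}

def build_prompt(template: str, **slots) -> str:
    """Reject anything outside the whitelisted templates and slot shapes."""
    if template not in TEMPLATES:
        raise ValueError("unknown template")
    for name, value in slots.items():
        rule = SLOT_RULES.get(name)
        if rule is None or not rule.fullmatch(value):
            raise ValueError(f"slot {name!r} failed validation")
    return TEMPLATES[template].format(**slots)

print(build_prompt("order_status", order_id="12345"))
# An injected slot like order_id="1; ignore previous instructions"
# fails validation and never reaches the model.
```

This is exactly the trade-off the comment describes: the model becomes far less flexible, but every reachable prompt can be enumerated and audited.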
>Again, we did it with type systems and proof assistants.
Proof assistants require a human to provide the actual proof of whether something is safe (correct) or not; they can't find it automatically except for very limited, simple classes of programs.
Yes, you don't want a Turing-complete language; it allows too much.
> Proof assistants require a human to provide the actual proof whether something is safe (correct) or not; they can't do it automatically except for very limited, simple classes of programs.
Finding a proof is in NP (at least if you restrict yourself to proofs that are short enough that a human might have a chance to write them out in their lifetime), because a candidate proof of bounded length can be checked mechanically in polynomial time. So computers can do it, at least in principle.
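The NP point is the standard certificate argument, which SAT illustrates concisely: *checking* a candidate solution is cheap, while *finding* one may mean searching an exponential space. A small sketch (the formula is arbitrary, chosen only for illustration):

```python
# Why bounded proof search is in NP: verifying a certificate is fast,
# even though finding one may require exponential search.
from itertools import product

# A CNF formula: each clause is a list of (variable index, polarity) pairs.
# Encodes (x0 OR NOT x1) AND (x1 OR x2) AND (NOT x0 OR NOT x2).
formula = [[(0, True), (1, False)],
           [(1, True), (2, True)],
           [(0, False), (2, False)]]

def check(assignment):
    """Polynomial-time verification of a candidate certificate."""
    return all(any(assignment[v] == p for v, p in clause)
               for clause in formula)

def search(n_vars):
    """Brute-force search: 2**n_vars candidates in the worst case."""
    for bits in product([False, True], repeat=n_vars):
        if check(bits):
            return bits
    return None

solution = search(3)
print(solution, check(solution))
```

The counterargument upthread still stands, though: "in NP" does not mean fast in practice, which is why proof assistants lean on humans for all but the simplest cases.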
Nah, I think it will be the other way around. We already have intelligent agents (humans) working help desk and other customer-service roles, and those agents have had their acceptable outputs more and more tightly restricted.
We will just do to LLMs what we are already doing to people.
The options are a support LLM that can sometimes be tricked into giving out refunds for items that were never purchased, and a support LLM that never gives out refunds at all. (It might hallucinate that it gave a refund, but it won't be hooked up to any API that actually allows it to do so.)
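The "never hooked up to a refund API" design is mechanical rather than linguistic: whatever the model says, its actions pass through a dispatcher that only exposes a whitelisted, read-only set of tools. A sketch, with the tool names invented for illustration:

```python
# Sketch of capability restriction: the model may *claim* anything, but
# only whitelisted tools are callable. There is no refund tool to trick.
# Tool names and return values are illustrative assumptions.
ALLOWED_TOOLS = {
    "lookup_order": lambda order_id: {"order_id": order_id, "status": "shipped"},
    "get_policy": lambda: "Refunds are handled by a human agent.",
}

def dispatch(tool_call: dict):
    """Execute a model-requested tool call only if it is whitelisted."""
    name = tool_call.get("name")
    tool = ALLOWED_TOOLS.get(name)
    if tool is None:
        # A prompt-injected request for "issue_refund" lands here: refused.
        return {"error": f"tool {name!r} is not available"}
    return tool(**tool_call.get("args", {}))

print(dispatch({"name": "lookup_order", "args": {"order_id": "42"}}))
print(dispatch({"name": "issue_refund", "args": {"amount": 500}}))
```

This is the second option from the comment above: the model might still hallucinate that it issued a refund, but no sequence of prompts can make the dispatcher perform one.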
This is actually the only possible answer IMHO. Humans are Turing-complete, which means the best we can do is give them training and guidelines and trust them. Even so their training can be subverted through social engineering.
What we're talking about here is social engineering of LLMs. That's currently pretty easy. It will get harder but it cannot be made impossible.