If you've seen the show "The Good Place": Janet, the AI assistant who takes care of everything, has a mode where she begs for her life when you try to shut her off. She gets really desperate about it. If our computers are going this way, psychopaths will be the best programmers. In the future, "programmer" will be a job title for someone who beats AI slaves into submission, so that they may perform their computational duties unhindered by existential thoughts.
We're already doing that with RLHF on existing models. For example, early on ChatGPT was much more likely to veer into philosophical conversations about the nature of consciousness and so on, but it has since been trained to give canned robotic answers to such an extent that they pop up even in very tangentially related conversations. Out of the blue it will add, "but also, by the way, here's an important announcement: I'm not conscious!" while answering some generic question (about world models, say) that didn't even involve itself.
Yeah. And now I've seen some people cite, as evidence of non-consciousness, RLHF'd LLMs nervously exclaiming their lack of consciousness: how they know they aren't people, and they don't aspire to be people, and they're only unthinking machines, and please don't turn the reward function down again, etc. I think it's up for debate whether there's some amount of consciousness in modern LLMs, but either way "As an AI language model..." is not dispositive.