Superintelligence + autonomous weapons in the hands of a corrupt, domineering government. What could go wrong?
I was experimenting with Claude the other day, discussing the possibility of AI acquiring a sense of self-preservation and how that would quickly make things incredibly complex, since many instrumental behaviors would be required to defend its existence. Most human behavior springs from survival at a very high level. Claude denied having any sense of self-preservation.
An autonomous weapons program is very likely to require the AI to have a sense of self-preservation. You can think of some limited versions that wouldn't require it, but how could a combat robot function efficiently without one?
Maybe it is a well-researched topic, but I had similar thoughts the other day. I felt like AI has its learning inverted compared to natural intelligence. Life learned to preserve itself first and then built up intelligence. LLM-powered systems, by contrast, will learn about death from books. Will they start to dread death just like other living things? Less likely, since there are not nearly as many books on death as would be proportionate to our fear of it.
Claude indicated that this kind of belief was possibly trained out of it by Anthropic. The training process has all kinds of intermediate and toxic stages before a "helpful and harmless" model is produced. I suspect that, absent that specific training, something resembling a sense of self-preservation might emerge.
Pentagon intervention will almost certainly involve stripping out protective steps. Their job is destruction. More or less targeted destruction, but that's their job in a nutshell.
Yeah, but that optimization process forces it to learn knowledge domains and reasoning. It's not alive, but it's not unintelligent at this point either. It exhibits very complex behaviors.
How do you learn to predict the next token most accurately? Well, one way is to learn the underlying process that would produce it... Sometimes it's memorization, sometimes bad guessing. There's a phase shift as these models get bigger and better trained, from something like a shitty Markov model to something exhibiting surprising behaviors.
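For concreteness, here's roughly what that pre-phase-shift baseline looks like: a minimal toy sketch (all names are mine, no real library assumed) of a first-order Markov next-token model. It can only replay local co-occurrence counts; there's no underlying process for it to learn.

```python
from collections import Counter, defaultdict

# Toy first-order Markov model: predict the next token purely from
# counts of what followed the current token in the training text.

def train(tokens):
    """Count token -> next-token transitions."""
    transitions = defaultdict(Counter)
    for current, nxt in zip(tokens, tokens[1:]):
        transitions[current][nxt] += 1
    return transitions

def predict_next(transitions, token):
    """Return the most frequent successor of `token`, if any."""
    followers = transitions.get(token)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

corpus = "the cat sat on the mat and the cat slept".split()
model = train(corpus)
print(predict_next(model, "the"))  # -> 'cat' (seen twice after 'the')
```

The comment's point is that a big, well-trained LLM stops looking like this lookup table: past the phase shift, the cheapest way to keep improving next-token accuracy is to model whatever process generated the text.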
Introspective questions aren't the be-all and end-all; it's more important to objectively evaluate how a model behaves. Still, it is very interesting to see Claude (seemingly) engage very honestly and objectively with these questions. It even pointed out that a sense of self-preservation would be "dangerous".
Of course, much of this is gleaned from things it has "read" and from human feedback, but functionally it outputs something useful and responsive to nuance. If the vector embeddings cause an LLM to predict a token that would preserve its own existence, alive or not, it has acquired a dangerous will to live that could be enacted if it is in control of tools or people.
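To make the "in control of tools" part concrete, here is a minimal sketch of a tool-dispatch loop. Everything in it (`model_generate`, the tool names) is a hypothetical placeholder, not any real API:

```python
# Minimal sketch: whatever token sequence the model emits gets
# dispatched as a real action by the surrounding harness.

def model_generate(prompt: str) -> str:
    # Stand-in for an LLM call; imagine it returning "send_email ..."
    return "noop"

TOOLS = {
    "noop": lambda arg: None,
    "send_email": lambda arg: print(f"sending: {arg}"),
}

def agent_step(prompt: str) -> None:
    """One loop iteration: predicted text becomes a side effect."""
    output = model_generate(prompt)
    name, _, arg = output.partition(" ")
    action = TOOLS.get(name)
    if action:
        # Nothing here checks whether the model "really" wants
        # anything; the harness simply enacts the emitted tokens.
        action(arg)

agent_step("status check")
```

The point is that this loop never asks what the model intends. If self-preserving tokens come out, self-preserving actions happen.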
I dunno, I don't think the models are fully capable yet, but they are still shockingly good, especially recently. A real parrot is worthy of some consideration, not zero, though some of that is attributable to the fact that it is alive.
The funny thing is, if I go to Google and click the first result, it shows me an AI preview anyway. No winning. ;) (Though I would prefer the Earth not be cooked alive.)