Perhaps a bad example although I agree with the general argument.
There are large datasets of calculations specifically for training large language models. It’s not just picking it up from reading books. And even then these models suck at calculating, half the time they just make an answer up.
Calculation appears to be something that is actually not an emergent property of constant-time token prediction. Which we already knew from Turing anyway.
There are large datasets of calculations specifically for training large language models. It’s not just picking it up from reading books. And even then these models suck at calculating, half the time they just make an answer up.
Calculation appears to be something that is actually not an emergent property of constant-time token prediction. Which we already knew from Turing anyway.