'Try, Score, Change': Reinforcement Learning for Children

yorwba · 2026-05-10T08:49:02 1778402942

It's an interesting constrained-writing exercise, but I think it demonstrates that syllable count is too simplistic a heuristic for whether children are likely to understand a word (which is context-dependent in any case). For example, the model inserts spaces to break up words that are usually written without one ("near by"), or resorts to contorted expressions using somewhat obscure monosyllabic vocabulary ("a rule that makes moves in runs that beat a base more apt" is hardly a clear explanation of the REINFORCE algorithm).
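To make the loophole concrete, here is a minimal sketch of a per-word syllable check. It is a crude vowel-run counter written for illustration, not the actual Grow-Speech checker; the names `rough_syllables` and `passes_one_syllable_rule` are hypothetical, and the heuristic is known to miscount words with a silent "e" such as "base".

```python
import re

def rough_syllables(word: str) -> int:
    """Crude estimate: count runs of vowels, treating 'y' as a vowel.

    This is an assumption-laden heuristic, not a real syllabifier.
    """
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def passes_one_syllable_rule(text: str) -> bool:
    """True if every alphabetic token is estimated at one syllable."""
    tokens = re.findall(r"[A-Za-z]+", text)
    return all(rough_syllables(t) == 1 for t in tokens)

# The single word fails the check, but the same letters split by a space pass.
print(passes_one_syllable_rule("nearby"))
print(passes_one_syllable_rule("near by"))
```

Any rule that scores tokens one at a time can be satisfied this way, which says nothing about whether a child would find the result easier to read.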

The CHILDES corpus might provide a more reasonable proxy for how people normally talk to children.

gwern · 2026-05-10T20:29:00 1778444940

Yes, I definitely take this as a kind of reductio of Grow-Speech, after having defined it rigorously enough to be enforceable, or as chatbots love to say now, 'auditable' (https://gwern.net/grow-speech).

The LLMs follow the rules rigorously (barring a handful of inessential remaining errors like 'wires'), but they show that you can easily satisfy the letter of the exercise and not its spirit. Grow-Speech turns flimsy and arbitrary once you try to use it seriously for something much more ambitious than https://gwern.net/doc/cs/algorithm/1998-steele.pdf, because you just start using phrases or obscure Anglo-Saxon words (even if you can't go full Anglish).

When I look back at it now, I realize that Steele spends a lot of the apparently impressive length on fluff or descriptive language, and trades heavily on the fact that we already know what a programming language is or what an integer or an object is. I do not think anyone who doesn't have at least a hazy grasp of 'object' is really going to grasp a definition like:

> An object is a datum the meanings of whose parts are laid down by a set of language rules. In the Java programming language, these rules use types to make clear which parts of an object may cite other objects. Objects may be grouped to form classes. Knowing the class of an object tells you most of what you need to know of how that object acts. Objects may have fields; each field of an object can hold a datum. Which datum it holds may change from time to time. Each field may have a type, which tells you what data can be in that field at run time (and, what is more, it tells you what data can not be in that field at run time).

But when you try to tell people about something genuinely unfamiliar like REINFORCE, the obscurity becomes clear. (I'm going to revise REINFORCE to "a rule that makes moves in runs that beat a base more apt, and moves in runs that fall short less apt"... but it's not that much better, honestly.)
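For comparison, the usual textbook form of REINFORCE with a baseline is sketched below. The notation is the standard one and is an assumption here, not something taken from the page under discussion: $\theta$ are the policy parameters, $\alpha$ is a step size, $G$ is the total score of a run, $b$ is the baseline, and $\pi_\theta(a_t \mid s_t)$ is the probability the policy gives to the move $a_t$ taken in state $s_t$.

$$
\theta \leftarrow \theta + \alpha \,(G - b) \sum_{t} \nabla_\theta \log \pi_\theta(a_t \mid s_t)
$$

When $G > b$ the update raises the probability of every move made in that run ("more apt"); when $G < b$ it lowers them ("less apt").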