I like that he actually attempted to define emergent behaviour - hadn't seen that before.
That said, it doesn't seem like a good definition. Bigger models can do more things than smaller models. To me, that doesn't make the delta between them "emergent".
Wikipedia also has an article on large language models[0] that includes a section on emergent behaviour.
> While it is generally the case that performance of large models on various tasks can be extrapolated based on the performance of similar smaller models, sometimes large models undergo a "discontinuous phase shift" where the model suddenly acquires substantial abilities not seen in smaller models. These are known as "emergent abilities", and have been the subject of substantial study. Researchers note that such abilities "cannot be predicted simply by extrapolating the performance of smaller models".
I thought the thing that is scaring people is that there isn’t a defined relationship between model scale and capabilities. You might triple the model size and see no improvement, then quadruple it and get a 10,000% increase in capability. So no one knows where these massive models are headed, and people want to pause until they understand the relationship.
Emergence doesn’t need to depend on scale. For example, gliders in the Game of Life are emergent behavior, and they don’t depend on scale. The way proteins fold is emergent from the physical forces, but is not a function of scale, just of particular configurations. Emergence is about the level on which an explanation or description works. You only “see” the glider move at a higher level of description than the cellular rules.
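To make the glider point concrete, here is a minimal sketch in Python. The set-of-live-cells representation, the `step` helper, and the coordinates are my own choices for illustration. The rules only talk about a cell and its eight neighbours; nothing in them mentions a "glider" or "movement", yet after four generations the same five-cell shape reappears one cell down and one cell to the right.

```python
from collections import Counter


def step(live):
    """Advance one generation. `live` is a set of (row, col) live cells."""
    # Count, for every cell adjacent to a live cell, how many live neighbours it has.
    counts = Counter(
        (r + dr, c + dc)
        for (r, c) in live
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    )
    # Standard rules: birth on exactly 3 neighbours, survival on 2 or 3.
    return {
        cell
        for cell, n in counts.items()
        if n == 3 or (n == 2 and cell in live)
    }


# The classic glider, rows increasing downward:
#   . O .
#   . . O
#   O O O
glider = {(0, 1), (1, 2), (2, 0), (2, 1), (2, 2)}

state = glider
for _ in range(4):
    state = step(state)

# The same shape, translated by one row and one column.
shifted = {(r + 1, c + 1) for (r, c) in glider}
print(state == shifted)
```

The "it moves" description only exists at the level of the whole pattern; at the level of `step` there are just cells switching on and off.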
Emergent behavior is usually defined as behavior that is not expected from the training or the starting configuration. Their definition of emergence is absurd.
It is like saying that understanding calculus is emergent behavior in 12th grade because it was not present in 3rd grade.
The problem isn't that they should have defined "emergent" better, it's that they should have used a different word that actually means what they intend to communicate, such as "size-based" or "scale-dependent".
Not that I've got a better definition...