I'm sure there are plenty of cases where model designers are at least attempting to pursue new capabilities in a targeted way (though how often to this degree of complexity?), while at the same time recognizing that new model and dataset combinations will also have unanticipated capabilities.
Before LLMs (whether transformer-based or not), most NNs were built to perform a single task with a single objective, so having multiple higher-level capabilities was essentially out of the question. Of course, LLMs nominally have a single objective too - predict the next word - but in effect they are targeting language as a whole.
In the GOFAI era of rule-based symbolic AI there were also some systems and approaches with multiple skills (e.g. expert systems like CYC, or cognitive architectures like SOAR), so maybe there are forgotten lessons there about the decomposability of skills.