Depends a lot on the task demands. "Got 95% of the way to designing a successful drug" and "got 100% of the way" are hugely different in terms of value, and that small bump in intelligence would justify a few orders of magnitude more in cost.
But that objective measure is exactly what we're lacking in programming: there are often many ways to skin a cat, but the model only takes one. Without knowing about the ones it didn't take, how do you judge the quality of a new model?
I agree with you, but my gut tells me that a lot of people don’t know what a good outcome should/could look like and are accepting whatever it delivers.