>> and setting a new state-of-the-art accuracy on Science QA.
Given the size of these things, would there be any more actual information in it than, say, Wikipedia and maybe some other sources? In other words, is it just able to summarize and answer questions about its training data?
Stable Diffusion is about 2GB and can produce just about any image imaginable with the right prompt. LLMs are similar. Size is not as important as the number of useful paths through the weights. With billions of parameters, the number of paths in a well-trained model is almost intractably huge. The training methods matter more than the size of the model. Don't be misled by their size.
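For context on where a figure like "2GB" comes from, here's a back-of-envelope sketch: on-disk size is roughly parameter count times bytes per parameter. The parameter counts and fp16 precision below are illustrative assumptions, not exact figures for any particular checkpoint.

```python
# Rough checkpoint size: num_params x bytes_per_param.
# Numbers are approximate and for illustration only (fp16 = 2 bytes/param).

def checkpoint_size_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate on-disk size in GB for weights stored at the given precision."""
    return num_params * bytes_per_param / 1e9

print(checkpoint_size_gb(1.0e9))   # ~1B params at fp16  -> ~2 GB  (Stable Diffusion scale)
print(checkpoint_size_gb(7.0e9))   # ~7B params at fp16  -> ~14 GB (small LLM)
print(checkpoint_size_gb(70.0e9))  # ~70B params at fp16 -> ~140 GB (large LLM)
```

The point being that the raw file size only tells you how many weights there are, not how much usable behavior training has packed into them.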