The hypothesis you raise about the source of these models' implicit assumptions is, in my opinion, both interesting and plausible.
Biases in data will always exist, as this is the nature of our world. We need to think about them carefully and understand the challenges they introduce, especially when training large "foundational" models that encode a vast amount of data about the world. We should be particularly cautious when interpreting their outputs and when using them to draw any kind of scientific conclusions.
I think this is one of many reasons why we built the system with human oversight at its core, and why we strongly encourage people to provide input and feedback throughout the process.
This is a super cool idea! We have considered implementing a variation of what you suggested, with the additional feature of linking each factual statement directly to the relevant lines in the literature. Imagine that in each scientific paper, you could click on any factual or semi-factual statement to be led to the exact source—not just the paper, but the specific relevant lines. From there, you could continue clicking to trace the origins of each fact or idea.
> From there, you could continue clicking to trace the origins of each fact or idea.
Exactly! I think you would like the automated semantic knowledge graph building example in txtai.
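The click-to-trace idea above can be sketched in a few lines of plain Python. This is a minimal toy, not txtai's actual API: all class names, fields, and the example papers here are hypothetical, and a real system would build the provenance links from a semantic knowledge graph rather than by hand.

```python
from dataclasses import dataclass, field

@dataclass
class Claim:
    """A factual statement with a pointer to its exact source lines."""
    text: str
    source: str                                     # e.g. "paper:B, lines 40-43"
    supports: list = field(default_factory=list)    # upstream claims this one cites

def trace(claim, depth=0):
    """Walk the provenance chain from a claim back to its original sources."""
    chain = [(depth, claim.text, claim.source)]
    for parent in claim.supports:
        chain.extend(trace(parent, depth + 1))
    return chain

# Hypothetical three-paper citation chain
root = Claim("Original measurement", "paper:A, lines 12-14")
middle = Claim("Derived estimate", "paper:B, lines 40-43", supports=[root])
leaf = Claim("Cited figure", "paper:C, lines 7-9", supports=[middle])

for depth, text, source in trace(leaf):
    print("  " * depth + f"{text} <- {source}")
```

Each "click" in the imagined reading interface would correspond to following one `supports` edge, so tracing a figure back to the original measurement is just a walk to the roots of this graph.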
Imagine how much could be done when price/token drops by another few orders of magnitude! I can envision a world with millions of research agents doing automated research on many thousands of data sets simultaneously and then pooling their research together for human scientists to study, interpret and review.
I skimmed through much of it, and I don't see anything explicit about which philosophy of science is applied. It seems more like automated information processing, similar to what quantitative finance does.
Do you subscribe to some Popperian philosophy? It can't be Feyerabendian, since his thinking treated virtue as foundational to science. Or do you agree with the large journal publishers that the essence of science is to increase their profits?
Not sure why you think you've earned my respect, and it would be very hard for me to violate your rights since we communicate by text alone.