Training an LLM on a ton of multiple-choice questions doesn't "infect" it the way you're thinking. The tokens capture the fact that it's a multiple-choice question, and the LLM eventually learns that textual entailment is a common form of multiple-choice question.
In a more natural conversational setting, you'd get a different answer:
https://chat.openai.com/share/00fed9d6-e3de-4319-9c76-ae1800...
https://chat.openai.com/share/dc9a796c-870c-44ee-b421-31c24b...