Not sure how you can get much value from an A/B test with multiple changes, especially if someone is claiming that only one of those changes is responsible for all the improvement.
If nothing else, they have a hypothesis to test in the next experiment.
Even when it's possible to isolate and remove ancillary changes to improve split test purity, it's often not beneficial. If there's a large number of changes, achieving statistical significance across the full matrix probably isn't even possible.
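To put rough numbers on the "full matrix" problem, here's a back-of-the-envelope sketch. It assumes every change is a simple on/off toggle (so k changes give 2^k cells) and uses the standard normal-approximation sample size for comparing two conversion rates. The function names, the 5% baseline, and the 6% target are all made up for illustration, and it ignores multiple-comparison corrections, which would only make things worse.

```python
from math import ceil
from statistics import NormalDist


def visitors_per_cell(p_base, p_variant, alpha=0.05, power=0.8):
    """Approximate visitors needed in each cell to detect p_base vs p_variant.

    Uses the usual two-proportion normal approximation (two-sided test).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_base * (1 - p_base) + p_variant * (1 - p_variant)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p_base - p_variant) ** 2)


def full_factorial_visitors(num_changes, p_base, p_variant, alpha=0.05, power=0.8):
    """Total visitors if every on/off combination of the changes gets its own cell."""
    cells = 2 ** num_changes
    return cells * visitors_per_cell(p_base, p_variant, alpha, power)


if __name__ == "__main__":
    # Hypothetical rates: 5% baseline conversion, hoping to detect 6%.
    for k in (1, 2, 3, 5, 8):
        total = full_factorial_visitors(k, 0.05, 0.06)
        print(f"{k} changes -> {2 ** k} cells -> about {total} visitors")
```

The total doubles with every extra change, which is why testing the full matrix stops being realistic for most sites pretty quickly.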
But that's OK, because limiting changes to a single test queue restricts your ability to move fast and try lots of stuff, and moving fast is what's beneficial. So when you test, try cheap multivariate methods (there's a bunch!) to quickly understand how interactions between multiple changes affect results.
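As one example of what "cheap" can look like, here's a minimal sketch of reading main effects and an interaction out of a 2x2 factorial test (two on/off changes, four cells). This is just one of many multivariate approaches, not necessarily the one meant above. The function name and the conversion rates are hypothetical, and it only computes point estimates, so you'd still need confidence intervals before trusting any of it.

```python
def factorial_effects(rates):
    """Estimate effects from a 2x2 factorial test.

    `rates` maps (a_on, b_on) -> observed conversion rate, with 0/1 flags.
    Returns (main effect of A, main effect of B, interaction), where the
    interaction is the difference of differences: how much A's lift changes
    when B is switched on.
    """
    r00 = rates[(0, 0)]  # neither change
    r10 = rates[(1, 0)]  # only change A
    r01 = rates[(0, 1)]  # only change B
    r11 = rates[(1, 1)]  # both changes

    lift_a_without_b = r10 - r00
    lift_a_with_b = r11 - r01
    main_a = (lift_a_without_b + lift_a_with_b) / 2
    main_b = ((r01 - r00) + (r11 - r10)) / 2
    interaction = lift_a_with_b - lift_a_without_b
    return main_a, main_b, interaction


if __name__ == "__main__":
    # Made-up conversion rates, purely for illustration.
    observed = {(0, 0): 0.10, (1, 0): 0.12, (0, 1): 0.11, (1, 1): 0.16}
    main_a, main_b, interaction = factorial_effects(observed)
    print(f"A: {main_a:+.3f}  B: {main_b:+.3f}  A x B: {interaction:+.3f}")
```

An interaction near zero suggests the changes add up independently; a large one means a change's effect depends on what else shipped with it, which is exactly what a one-at-a-time queue can't show you.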
You can iterate on the other tests over time. Many A/B tests start with a larger change that may include multiple variables, but with that baseline increase in hand, they can go on to test Dog vs Cat vs Human as the image, or a variety of different text sizes and lengths. This seems like a fantastic start, with plenty of room for further iteration and improvement.