That sounds so broad that creating a meaningful benchmark is probably as difficult as creating an AI that actually "solves" those domains.