My own experience backs this. I don't sit here obsessing about performance. In a database context that would be appropriate, but I'm not implementing databases. Nor do I prematurely optimize. I just give the matter some thought occasionally, especially at the architecture level, and as a result I tend to produce systems that often surprise my fellow programmers with their performance. And I strongly assert that is not because I'm some sort of design genius... ask me about my design mistakes! I've got 'em. It's just that I try a little. My peers' intuitions are visibly tuned for systems where nobody even tried.
I'm not sure how much "low-hanging" fruit there is, though. A lot of modern slowdown is architectural. Nobody sat down and thought holistically about the flow through the system at the design phase, and the system rolled out with a design that intrinsically depends on a synchronous network transaction every time the user types a key, or that passes data back and forth between three internal layers, getting wrapped and unwrapped in intermediate objects a billion times per second (loops are terrible magnifiers of architecture failures; a minor "oops" becomes a performance disaster when done a billion times), when a better design could have just done the thing in one shot. I think a lot of the things we have fundamental performance issues with are so hard to fix that in many cases they all but require new programs to be written.
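To make the loop-magnification point concrete, here's a toy Python sketch (my own illustration, not from any real system; `Wrapper`, `layered_sum`, and `direct_sum` are all hypothetical names) that times a hot loop wrapping and unwrapping an intermediate object per element against one that just does the thing in one shot:

```python
import time

class Wrapper:
    """Hypothetical intermediate object that each layer wraps values in."""
    def __init__(self, value):
        self.value = value

def layered_sum(data):
    # Each "layer" wraps and unwraps the value: a tiny per-element cost,
    # paid once per iteration of the hot loop.
    total = 0
    for x in data:
        total += Wrapper(Wrapper(Wrapper(x).value).value).value
    return total

def direct_sum(data):
    # The "one shot" design: no intermediate objects in the hot loop.
    return sum(data)

data = list(range(10_000_000))
for fn in (layered_sum, direct_sum):
    start = time.perf_counter()
    fn(data)
    print(fn.__name__, time.perf_counter() - start)
```

The wrapping itself is cheap; multiplied by ten million iterations, it dominates the runtime. That's the sense in which a minor architectural "oops" becomes a performance disaster.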
Then again, there is also visibly a lot of code in the world that has simply never been run through a profiler, not even for fun (and it is so much fun to profile a code base that has never been profiled before, I highly recommend it, no sarcasm), and it's hard to get a statistically significant sense of how many of the performance issues we face are low-hanging fruit and how many are bad architecture.
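If you've never profiled a code base before, the barrier to entry is lower than people think. Here's a minimal sketch using Python's standard-library profiler; `main()` is just a hypothetical stand-in for your program's real entry point:

```python
import cProfile
import pstats

def main():
    # Hypothetical workload standing in for real application code.
    sum(i * i for i in range(1_000_000))

# Profile the entry point and dump raw stats to a file.
cProfile.run("main()", "profile.out")

# Print the 20 functions with the most cumulative time.
stats = pstats.Stats("profile.out")
stats.sort_stats("cumulative").print_stats(20)
```

On a never-profiled code base, the top of that list is usually a surprise, which is exactly why it's worth doing.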