Considering scalability from the start does not just mean optimizing for millions of concurrent users, but choosing your software stack or your platform with scalability in mind. I get that it's important to take the next step and that premature optimization can stand in your way but there are easy to use technologies (like NoSQL, caching) und (cloud-) platforms with low overhead that let you scale with your customers and work wether you're big or small. This can be far supirior to fixing throughput and performance iteration to iteration.
I chose my software stack with scalability in mind: team scalability and rapid iteration (we started with Rails, displacing a Node.js implementation that was poorly designed and messily implemented). Because of that previous proof-of-concept implementation we needed to replace, we were forced into a design (multiple services with SSO) that complicated our development and deployment, but has given us the room to manoeuvre while we work out the next growth phase (which will be a combination of several technologies, including Elixir, Go, and more Rails).
One thing we didn’t choose up front, because it’s generally a false optimization (it looks like it will save you time, but in reality it hurts you unless you really know you need it), is anything NoSQL as a primary data store. We use Redis heavily, but only for ephemeral or reconstructable data.
The reality is, though, you have to know and learn the scalability that your system needs and you can only do that properly by growing it and not making the wrong assumptions up front, and not trying to take on more than you are ready for. (For me, my benchmark was a fairly small 100 req/sec, which was an order of magnitude or two larger than the Node.js system we replaced, and we doubled the benchmark on our first performance test. We also reimplemented everything we needed to reimplement in about six months, which was the other important thing. My team was awesome, and most of them had never used Rails before but were pleased with how much more it gave them.)
You're right, NoSQL systems tend to be more complex and especially failure scenarios are hard to comprehend. In most cases, however, this is due to being a distributed datastore where tradeoffs, administration and failure scenarios are simply much more complex. I think some NoSQL systems do an outstanding job to hide nasty details from their users.
If you compare using a distributed database to building sharding yourself for say your MySQL backed architecture, NoSQL will most certainly be the better choice.