The math of Big-O isn't that hard, and the article, while well-intentioned, misses the point. Big-O is about asymptotic behaviour, and the graphs are misleading in that regard (well, they're simply wrong, not just misleading). There are algorithms where, if you looked only at the Big-O, you'd think one had a faster run time than the other, but because of the constants that Big-O drops, that wouldn't be the case for any practical problem size.
What O(N) means is that there is some large enough number such that the run time (edit: or any function, really) is bounded by a constant times the input size for any input size larger than that number (see, math, not hard). That constant may be so large that an O(N^2) algorithm may be a better solution for any practical purpose.
EDIT: As an example of this we can look at two multiplication algorithms, Karatsuba and Schönhage–Strassen, the latter having better asymptotic performance but that really kicks in once you have large enough numbers (100,000 digits or so). ( http://en.wikipedia.org/wiki/Sch%C3%B6nhage%E2%80%93Strassen... )
He's saying that big O only matters for big input sizes, because big O is specifically about the algorithm's asymptotic performance, which means its performance for a large value of n. If you have a small value of n then the other constant time operations in the algorithm may affect the running time more.
n^2 is less than n^3 right?
But is 2 * n^2 + 100 less than n^3?
Depends on how big n is, right?
Big O notation just assumes that n is big enough to make the answer "yes."
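You can see that crossover directly with a few lines of Python (the functions are just the two expressions from the comment above):

```python
# Where does n^3 finally overtake 2*n^2 + 100?
def f(n):
    return 2 * n**2 + 100

def g(n):
    return n**3

# For small n the "worse" cubic is actually smaller (g(1) = 1 vs f(1) = 102).
# Past the crossover point, f(n) < g(n) forever -- that's the "big enough n".
crossover = next(n for n in range(1, 1000) if f(n) < g(n))
print(crossover)  # prints 6
```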
More or less, yes. What he is saying is that there is a constant hidden in the big O. Say we have two algorithms A and B with run times that can be bounded by the functions a(n) = 1000n^2 and b(n) = 0.001n^3 respectively. Hence a ∈ O(n^2) and b ∈ O(n^3), so algorithm A is clearly faster asymptotically. However, suppose we have input sizes of around n=50000: it actually turns out that algorithm B is faster, because b(50000) < a(50000). This is because of the constant that is hidden in the big O. For sufficiently large n, however (n > 1,000,000 in this case, where 1000n^2 = 0.001n^3), algorithm A will always be faster.
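Plugging the example's numbers into Python makes the crossover concrete:

```python
# The two bounding functions from the example above.
def a(n):
    return 1000 * n**2      # O(n^2), but with a big constant

def b(n):
    return 0.001 * n**3     # O(n^3), but with a tiny constant

# At n = 50000 the asymptotically worse algorithm B is cheaper...
print(b(50_000) < a(50_000))        # prints True
# ...but past the crossover (1000*n^2 = 0.001*n^3, i.e. n = 1_000_000),
# algorithm A wins for good.
print(a(2_000_000) < b(2_000_000))  # prints True
```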
A good implementation could check problem sizes beforehand and choose the algorithm accordingly.
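A minimal sketch of that dispatch idea (the cutoff and the stub functions are made up for illustration; a real implementation would measure where the crossover actually lies):

```python
CUTOFF = 1_000_000  # hypothetical threshold; in practice you'd benchmark this

def small_input_algorithm(n):
    # stands in for the O(n^3) algorithm with tiny constants
    return "B"

def large_input_algorithm(n):
    # stands in for the O(n^2) algorithm with a large constant
    return "A"

def solve(n):
    # Pick the implementation based on problem size.
    if n < CUTOFF:
        return small_input_algorithm(n)
    return large_input_algorithm(n)

print(solve(50_000), solve(2_000_000))  # prints: B A
```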
What I don't agree with is that "big O only matters for big input sizes". "Big" is not really a well-defined term. The problem here is that "big" depends entirely on the algorithms. It might also be n > 10. There's nothing in the definition of the Landau symbols that prevents that.
The definition of a "big" input size is somewhat circular, yes.
"The big O performance of an algorithm only matters for a sufficiently large value of n."
"Define a sufficiently large value of n."
"Large enough that the big O performance of the algorithm starts to matter more than the coefficients of n and constant time operations."
So yes, that could be 10 or 10 million depending on the nature of the problem, the constants and coefficients of the algorithm, the language, the hardware, etc. You could have an algorithm that takes factorial time, but maybe you're only using it in a problem domain where n is always < 10, so you'll probably never notice or care that it's O(n!).
*That constant may be so large that an O(N^2) may be a better solution for any practical purpose.*
Another example of this is that a good system/library sorting function will sometimes use an asymptotically fast algorithm like quicksort or mergesort only for arrays over a certain size, but will switch over to a O(n^2) algorithm like insertion sort to sort smaller arrays, because it has smaller constants and coefficients. (But sometimes this is overkill, because if n is small, you probably aren't going to notice the speed difference anyway.)
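Here's a toy sketch of that hybrid idea: a mergesort that falls back to insertion sort below a cutoff (the cutoff value here is made up; real libraries tune it empirically):

```python
CUTOFF = 16  # hypothetical; libraries pick this by benchmarking

def insertion_sort(a):
    # O(n^2) worst case, but tiny constants -- fast on short lists.
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
    return a

def hybrid_sort(a):
    # Small inputs: hand off to insertion sort.
    if len(a) <= CUTOFF:
        return insertion_sort(a)
    # Large inputs: O(n log n) mergesort.
    mid = len(a) // 2
    left = hybrid_sort(a[:mid])
    right = hybrid_sort(a[mid:])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out + left[i:] + right[j:]

print(hybrid_sort([5, 3, 8, 1, 9, 2]))  # prints [1, 2, 3, 5, 8, 9]
```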
So yes, Big O is all about asymptotic behavior, which basically means the rate of growth as n approaches infinity. A large value of n is when the choice of algorithm starts to really matter for performance.
His tilde notation is also more informative than the big-O one. The canonical example is quicksort: its worst-case big-O time complexity is O(n^2), yet for practically all inputs it runs in ~n log n.
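You can check that claim empirically by counting comparisons on a shuffled input (a sketch with a randomized pivot; the seed just makes the run repeatable):

```python
import math
import random

random.seed(42)
comparisons = 0

def quicksort(a):
    """Randomized quicksort that counts element-vs-pivot comparisons."""
    global comparisons
    if len(a) <= 1:
        return list(a)
    pivot = random.choice(a)
    less, equal, greater = [], [], []
    for x in a:
        comparisons += 1
        if x < pivot:
            less.append(x)
        elif x > pivot:
            greater.append(x)
        else:
            equal.append(x)
    return quicksort(less) + equal + quicksort(greater)

n = 10_000
data = list(range(n))
random.shuffle(data)
assert quicksort(data) == sorted(data)
# The average case is ~2n ln(n) comparisons -- nowhere near n^2 = 100,000,000.
print(comparisons, "vs ~", round(2 * n * math.log(n)))
```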