My understanding is that the Flash bytecode interpreter's performance is reasonable. Unfortunately, the ActionScript compiler does no code optimization. This Quake demo, however, uses Alchemy, which is LLVM-based and offers significant performance gains.
The biggest bottleneck is rendering. When displaying Flash content, your CPU has to render every pixel of every frame. The Flash Player's vector capabilities are pretty extensive, and supporting image rotations, resizes, and alpha transparency can require multiple CPU passes.
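To give a feel for why per-pixel work adds up, here is a rough sketch of a single alpha-blend pass done in software. It is written in TypeScript purely for illustration and is not how the Flash Player is actually implemented; `blendOver` and the RGBA buffer layout are assumptions made for the example.

```typescript
// Hypothetical software compositor: blends a source RGBA layer over a
// destination RGBA layer, one pixel at a time, entirely on the CPU.
// Both buffers hold 4 bytes per pixel (R, G, B, A) and have the same size.
function blendOver(dst: Uint8ClampedArray, src: Uint8ClampedArray): void {
  for (let i = 0; i < dst.length; i += 4) {
    const a = src[i + 3] / 255; // source alpha in the range 0..1
    dst[i] = src[i] * a + dst[i] * (1 - a);             // red
    dst[i + 1] = src[i + 1] * a + dst[i + 1] * (1 - a); // green
    dst[i + 2] = src[i + 2] * a + dst[i + 2] * (1 - a); // blue
    dst[i + 3] = src[i + 3] + dst[i + 3] * (1 - a);     // alpha
  }
}
```

Every overlapping transparent layer needs another pass like this over the pixels it covers, so a 640x480 stage costs about 307,200 of these blends per full-screen layer, per frame.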
The obvious answer to this problem is GPU rendering. Adobe has made some efforts in this direction, but it doesn't look like it will happen any time soon. Several issues make this transition difficult, including backwards compatibility, cross-platform compatibility, and reliability.
haXe does some additional optimizations over the Adobe compilers; however, the real performance difference in the Alchemy-based code comes from its original nature as C code: it works with bytes, not the AVM2 type system. You can access these special "fast ByteArrays" from both AS3 and haXe - haXe makes it a little more convenient - but in both cases, you're essentially writing C from a language that doesn't have C's features. (Alchemy has some interoperability features, so it's not a total one-or-the-other equation, at least.)
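To illustrate what "working with bytes" looks like in practice, here is a small sketch in TypeScript rather than AS3 or haXe. It is not the Alchemy API: `memory`, `writeVec3`, and `lengthSquared` are hypothetical names, and a `DataView` stands in for the fast ByteArray. The point is only that "pointers" become integer offsets into one flat buffer.

```typescript
// One flat block of bytes plays the role of C's heap.
const memory = new DataView(new ArrayBuffer(1024));

// A 3-float vector occupies 12 bytes: x at +0, y at +4, z at +8.
const VEC3_SIZE = 12;

// Store a vector at byte offset `addr` (little-endian floats).
function writeVec3(addr: number, x: number, y: number, z: number): void {
  memory.setFloat32(addr, x, true);
  memory.setFloat32(addr + 4, y, true);
  memory.setFloat32(addr + 8, z, true);
}

// Read the vector back by offset and compute x*x + y*y + z*z.
function lengthSquared(addr: number): number {
  const x = memory.getFloat32(addr, true);
  const y = memory.getFloat32(addr + 4, true);
  const z = memory.getFloat32(addr + 8, true);
  return x * x + y * y + z * z;
}

// Usage: place a vector in the third 12-byte slot and read it back.
writeVec3(2 * VEC3_SIZE, 1, 2, 3);
const lenSq = lengthSquared(2 * VEC3_SIZE);
```

There are no objects or typed fields here, just offsets you have to keep straight yourself, which is the "writing C without C's features" trade-off.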
Rendering is the bigger performance culprit, anyway.
In addition to the other reasons mentioned, Flash is single-threaded, so a lot of developers use pseudo-threading to get around this, and that is where a lot of lag creeps in.
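For anyone unfamiliar with the term, pseudo-threading usually means cutting a long job into slices and running one slice per frame. Below is a minimal sketch of that idea in TypeScript, not AS3; `sumInChunks` and `runFrames` are made-up names, and the loop in `runFrames` stands in for what would be an ENTER_FRAME handler in a real Flash app.

```typescript
// A long-running job written as a generator: it yields after every
// `chunkSize` items so the caller can go draw a frame in between.
function* sumInChunks(data: number[], chunkSize: number): Generator<void, number, void> {
  let total = 0;
  for (let i = 0; i < data.length; i++) {
    total += data[i];
    if ((i + 1) % chunkSize === 0) {
      yield; // hand control back until the next "frame"
    }
  }
  return total;
}

// Stand-in for the frame loop: each iteration represents one frame,
// and each frame resumes the task for exactly one slice.
function runFrames<T>(task: Generator<void, T, void>): { result: T; frames: number } {
  let frames = 0;
  for (;;) {
    frames++;
    const step = task.next();
    if (step.done) {
      return { result: step.value, frames };
    }
  }
}
```

The lag comes from picking the slice size: if a slice takes longer than the frame budget, that frame is late and the animation stutters; if slices are tiny, the job drags on across many frames.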