Code doesn't run faster because it's in the kernel. The speed you're talking about comes from avoiding transitions between user space and kernel space. Whether you stay entirely in user space or entirely in the kernel, the results are pretty similar.
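To make the "transitions" point concrete, here's a rough sketch of my own (not from the thread). It writes the same payload to the null device either one byte per `os.write()` call or in a single call. Each call is one system call, so one round trip into the kernel and back. The 64 KiB size and the helper names `write_in_chunks` and `timed` are arbitrary choices for illustration.

```python
import os
import time

TOTAL_BYTES = 1 << 16  # 64 KiB; arbitrary size chosen for illustration


def write_in_chunks(fd: int, data: bytes, chunk_size: int) -> tuple[int, int]:
    """Write all of `data` to `fd` in pieces of at most `chunk_size` bytes.

    Each os.write() call is one system call, i.e. one transition from user
    space into the kernel and back. Returns (bytes_written, syscall_count).
    """
    view = memoryview(data)  # slice without copying the payload each time
    written = 0
    calls = 0
    while written < len(data):
        n = os.write(fd, view[written:written + chunk_size])
        written += n
        calls += 1
    return written, calls


def timed(fd: int, data: bytes, chunk_size: int) -> tuple[int, int, float]:
    """Run write_in_chunks and also return the wall-clock time it took."""
    start = time.perf_counter()
    written, calls = write_in_chunks(fd, data, chunk_size)
    return written, calls, time.perf_counter() - start


if __name__ == "__main__":
    payload = b"x" * TOTAL_BYTES
    fd = os.open(os.devnull, os.O_WRONLY)
    try:
        # Same bytes, same destination; only the number of transitions differs.
        for size in (1, TOTAL_BYTES):
            written, calls, secs = timed(fd, payload, size)
            print(f"chunk={size:>6}  bytes={written}  syscalls={calls}  {secs:.4f}s")
    finally:
        os.close(fd)
```

Caveat: in Python every call also pays interpreter overhead, so the timing gap here overstates the pure syscall cost. A C version would isolate it better. The syscall counts are the part that maps directly onto the argument: batching, or staying on one side of the boundary, is what removes the transitions.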
Except for your tooling. Cloudflare has an article from a couple of years ago on why they don't use a user-space network stack: https://blog.cloudflare.com/why-we-use-the-linux-kernels-tcp... and the tl;dr is a profound lack of feature parity. They use everything from iptables to tcpdump. If someone else did the feature-parity work (they say it's too expensive for too small a gain for them), I expect they'd change their tune.