> What's more interesting is that the Rust implementation is just a factor of 1.23 slower (for large arrays) than just using Numpy
I suppose what is meant is faster (also follows from the diagram?). But it is still not a dramatic gain for many use cases. This shows how non-trivial the Python performance calculus is: pure Python, versus NumPy, versus compiled C/C++ or Rust. People who want to speed up Python should really check whether NumPy helps before complicating their codebase further.
But there are more benefits to those bindings besides performance, so it's really nice to see the options expanding.
NumPy is often faster because it uses highly optimized SIMD code, or delegates to BLAS/LAPACK implementations (Fortran, MKL, cuBLAS).
A pure Rust implementation will likely always be slower, by virtue of not using the same tightly designed, optimized code.
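A minimal sketch of the gap being described: the same reduction done as a pure Python loop versus NumPy's vectorized routine (the function and array names here are illustrative, not from any benchmark in the thread).

```python
import timeit

import numpy as np

N = 1_000_000
xs = list(range(N))
arr = np.arange(N, dtype=np.int64)


def pure_python_sum(values):
    """Interpreted loop: one bytecode dispatch per element."""
    total = 0
    for v in values:
        total += v
    return total


# Both paths compute the same result...
assert pure_python_sum(xs) == int(arr.sum())

# ...but NumPy's sum runs as a single compiled, SIMD-friendly loop.
loop_t = timeit.timeit(lambda: pure_python_sum(xs), number=5)
numpy_t = timeit.timeit(lambda: arr.sum(), number=5)
print(f"pure Python: {loop_t:.3f}s  NumPy: {numpy_t:.3f}s")
```

On typical hardware the NumPy call is one to two orders of magnitude faster, which is why trying NumPy first is usually cheaper than writing a native extension.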
Side note: a fun implementation detail of NumPy is that after you install it from PyPI, it does a user-side compile of some of the modules on first import, which means you need to be somewhat careful if you ever relocate an install of it to a new machine.
And note that, for example, on Apple M1 it's essentially impossible to beat an implementation that uses Apple's Accelerate library for things like matrix multiplication, because that library uses undocumented instructions that are unavailable to the public.
> Deprecated since version 1.20: The native libraries on macOS, provided by Accelerate, are not fit for use in NumPy since they have bugs that cause wrong output under easily reproducible conditions. If the vendor fixes those bugs, the library could be reinstated, but until then users compiling for themselves should use another linear algebra library or use the built-in (but slower) default, see the next section.
> With the release of macOS 11.3, several different issues that numpy was encountering when using Accelerate Framework’s implementation of BLAS and LAPACK should be resolved.
If NumPy uses runtime detection of available SIMD instructions while the Rust module is only compiled for the x86-64 baseline (which includes only SSE2), then compiling the module with `RUSTFLAGS=-Ctarget-cpu=native` might provide some additional performance gains on number-crunchy code.
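For illustration, here is a sketch of the runtime-dispatch pattern NumPy uses internally, written in Rust: detect the CPU's capabilities once, then route to a kernel compiled with wider SIMD enabled. (The function names are hypothetical; building the whole crate with `-Ctarget-cpu=native` avoids the dispatch entirely but ties the binary to the build machine.)

```rust
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn sum_avx2(xs: &[f64]) -> f64 {
    // Same loop body; with AVX2 enabled the compiler is free to
    // autovectorize it with 256-bit instructions.
    xs.iter().sum()
}

fn sum(xs: &[f64]) -> f64 {
    #[cfg(target_arch = "x86_64")]
    {
        // Runtime check: safe to call the AVX2 kernel only if the
        // CPU we are actually running on supports it.
        if is_x86_feature_detected!("avx2") {
            return unsafe { sum_avx2(xs) };
        }
    }
    // Portable fallback, compiled for the baseline target (SSE2 on x86-64).
    xs.iter().sum()
}

fn main() {
    let v: Vec<f64> = (0..1_000).map(|i| i as f64).collect();
    println!("sum = {}", sum(&v));
}
```

With this pattern a single prebuilt wheel can still use AVX2 where it exists, which is how NumPy squares portability with per-machine performance.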
I had to do a double take. The Rust implementation is slower and harder to maintain. I recommend adding a Cythonized function and a Numba-jitted function to the benchmark for completeness.