
> What's more interesting is that the Rust implementation is just a factor of 1.23 slower (for large arrays) than just using Numpy

I suppose what is meant is faster (that also follows from the diagram?). But it is still not a dramatic gain for many use cases. This shows how non-trivial the Python performance calculus is: pure Python, versus NumPy, versus compiled C/C++ or Rust. People who want to speed up Python should really check whether NumPy helps before complicating their codebase further.
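A minimal sketch of that "performance calculus", assuming a hypothetical sum-of-squares workload (timings will vary by machine; this only illustrates the pure-Python-vs-NumPy gap being discussed):

```python
import time
import numpy as np

# Hypothetical micro-benchmark: sum of squares over 1M floats,
# pure Python loop vs. a single vectorized NumPy call.
n = 1_000_000
xs = [float(i) for i in range(n)]
arr = np.arange(n, dtype=np.float64)

t0 = time.perf_counter()
py_result = sum(x * x for x in xs)   # interpreted, one float op per iteration
py_time = time.perf_counter() - t0

t0 = time.perf_counter()
np_result = float(np.dot(arr, arr))  # one call into compiled/SIMD code
np_time = time.perf_counter() - t0

# Same answer (up to float summation order), very different cost.
assert abs(py_result - np_result) / np_result < 1e-6
print(f"pure Python: {py_time:.4f}s, NumPy: {np_time:.4f}s")
```

Whether the further step from NumPy to compiled Rust/C pays off depends on how much of the work already lands in calls like `np.dot`.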

But there are more benefits to those bindings besides performance, so it's really nice to see the options expanding.



Numpy is often faster because it’s often using highly optimized simd, or makes use of BLAS/Fortran/LAPACK/MKL/CuBLAS implementations.

A pure Rust implementation will likely always be slower, by virtue of not using the same tightly designed, optimized code.
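One way to see which of those backends a given NumPy build actually links against is `np.show_config()` (the output format varies between NumPy versions):

```python
import numpy as np

# Print the BLAS/LAPACK libraries this NumPy build was linked
# against (e.g. OpenBLAS, MKL, or Accelerate), plus compiler
# and SIMD information in recent versions.
np.show_config()
```

If this reports OpenBLAS or MKL, the comparison above is really "hand-written Rust vs. a tuned BLAS", not "Rust vs. Python".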

Side note: a fun implementation detail of NumPy is that after you install it from PyPI, it does a user-side compile of some of the modules on first import. Which means you need to be somewhat careful if you ever relocate an install of it to a new machine.


And note that for example on Apple M1 it's essentially impossible to beat an implementation that uses Apple's Accelerate library for things like matrix multiplication, because Apple uses undocumented instructions unavailable to the public in that library.


> Deprecated since version 1.20: The native libraries on macOS, provided by Accelerate, are not fit for use in NumPy since they have bugs that cause wrong output under easily reproducible conditions. If the vendor fixes those bugs, the library could be reinstated, but until then users compiling for themselves should use another linear algebra library or use the built-in (but slower) default, see the next section.

Source: https://numpy.org/doc/stable/user/building.html


Added back in the very next release:

https://numpy.org/doc/stable/release/1.21.0-notes.html

> With the release of macOS 11.3, several different issues that numpy was encountering when using Accelerate Framework’s implementation of BLAS and LAPACK should be resolved.


If NumPy uses runtime detection of available SIMD instructions while the Rust module is only compiled for the x86-64 baseline (which includes only SSE2), then compiling the module with `RUSTFLAGS=-Ctarget-cpu=native` might provide some additional performance gains on number-crunchy code.


Interesting! Using what compiler?


I believe it looks to see what’s available and otherwise falls back to less efficient implementations.
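That matches how NumPy documents it: CPU features are detected at import time and kernels are dispatched accordingly. A small sketch for inspecting this, hedged because the APIs differ by version (`np.show_runtime()` only exists in NumPy >= 1.24):

```python
import numpy as np

# Recent NumPy versions expose runtime information, including which
# SIMD extensions were detected and dispatched on this CPU.
if hasattr(np, "show_runtime"):
    np.show_runtime()   # NumPy >= 1.24: detected SIMD extensions, threading, ...
else:
    np.show_config()    # older versions: at least shows the build configuration
```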


The diagram shows Numpy (orange line) below "Rumpy" (blue line). Since the y-axis is time, lower is better, so Numpy is indeed faster.


I had to do a double take. The Rust implementation is both slower and harder to maintain. I'd recommend adding a Cythonized function and a Numba-jitted function to the benchmark for completeness.



