What about reasonably fast but smallest code, for running on a microcontroller? Is anything significantly better in terms of compiled size (including lookup tables)?
If you compress the table (see my earlier comment) and use plain Schubfach, you can get a really small binary size and decent perf. IIRC Dragonbox with the compressed table was ~30% slower, which is a reasonable price to pay, and it is still faster than most algorithms, including Ryu.
Note that ~3-6ns is on modern desktop CPUs, where a few extra kB matter less. On microcontrollers the time will be larger in absolute terms, but I would expect the relative difference to also be moderate.
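
To give a feel for what compressing the power-of-10 table can buy, here is a minimal Python sketch. It is one common approach and not necessarily the scheme from the earlier comment: keep only every 27th 128-bit entry, and rebuild the ones in between by multiplying with a small power of 5 that fits in 64 bits. The exponent range, the stride, and all names (`exact_sig`, `recovered_sig`, `BASES`, `POW5`) are assumptions made for illustration, and the code uses Python big integers rather than the fixed-width arithmetic a microcontroller build would need.

```python
# Sketch only: table compression for 128-bit approximations of 10**k.
# Range and stride are assumed values, not taken from any specific library.
K_MIN, K_MAX = -292, 324      # assumed exponent range for IEEE double
STRIDE = 27                   # 5**26 still fits in an unsigned 64-bit word

def floor_log2_pow10(k):
    """floor(log2(10**k)), computed exactly with big integers."""
    if k >= 0:
        return (10 ** k).bit_length() - 1
    return -(10 ** -k).bit_length()

def exact_sig(k):
    """floor(10**k * 2**(127 - e)) with e = floor(log2(10**k)): a 128-bit value."""
    s = 127 - floor_log2_pow10(k)
    num, den = (10 ** k, 1) if k >= 0 else (1, 10 ** -k)
    return (num << s) // den if s >= 0 else num >> -s

# What the compressed build would store: a few base entries plus powers of 5.
BASES = [exact_sig(k) for k in range(K_MIN, K_MAX + 1, STRIDE)]
POW5 = [5 ** j for j in range(STRIDE)]

def recovered_sig(k):
    """Rebuild the entry for 10**k from the nearest stored base below it."""
    i, off = divmod(k - K_MIN, STRIDE)
    kb = K_MIN + i * STRIDE
    # 10**off = 5**off * 2**off, so only the 5**off factor needs a multiply.
    shift = floor_log2_pow10(k) - floor_log2_pow10(kb) - off
    return (BASES[i] * POW5[off]) >> shift

full_bytes = (K_MAX - K_MIN + 1) * 16
compressed_bytes = len(BASES) * 16 + len(POW5) * 8
worst_error = max(exact_sig(k) - recovered_sig(k) for k in range(K_MIN, K_MAX + 1))
print(full_bytes, compressed_bytes, worst_error)
```

Under these assumptions the table shrinks from 617 entries to 23 entries plus 27 small words. The rebuilt value can fall a few units short of the exact one in the last place, so a real implementation has to either correct for that or show that it cannot change the output. The extra multiply and shift per conversion is where the slowdown comes from.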