I really want these ML libraries to get smarter about how they use VRAM. Not having enough VRAM shouldn't stop me from computing the answer: the library should just transfer the necessary data in from system RAM as needed. Sure, it'll be slower, but I'd rather wait a minute for my answer than get an error.
And if I don't have enough system RAM, bring it in from a file on disk.
The majority of the RAM is consumed by big weight matrices, so the framework knows exactly which pieces of data are needed, when, and in what order. It should therefore be able to schedule efficient streaming transfers that put all the data in the right place at the right time. That would be far more efficient than swap files, which don't know ahead of time what data will be needed and so hurt performance severely.
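A minimal sketch of that idea in PyTorch, using forward hooks to pull each layer's weights onto the GPU just before it runs and evict them afterwards. The model, sizes, and function names here are made up for illustration; real implementations (e.g. offloading in inference libraries) also overlap transfers with compute, which this sketch does not.

```python
import torch
import torch.nn as nn

# Weights live in system RAM between uses; compute happens on the GPU
# when one is available, otherwise everything stays on CPU.
compute = torch.device("cuda" if torch.cuda.is_available() else "cpu")
storage = torch.device("cpu")

def add_streaming_hooks(module: nn.Module) -> None:
    """Hypothetical helper: stream this module's weights in/out per forward pass."""
    def pre(mod, args):
        mod.to(compute)  # transfer weights in just in time
        return tuple(a.to(compute) for a in args)

    def post(mod, args, output):
        mod.to(storage)  # evict weights to free VRAM for the next layer
        return output

    module.register_forward_pre_hook(pre)
    module.register_forward_hook(post)

# Illustrative toy model -- imagine each Linear is one big transformer block.
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))
for layer in model:
    add_streaming_hooks(layer)

x = torch.randn(1, 512)
y = model(x)
print(y.shape)  # torch.Size([1, 10])
```

Because the framework knows the execution order, a real version could prefetch layer N+1 while layer N computes, which is exactly the advantage over a blind swap file.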
If the model doesn't fit in GPU memory, it will often be faster to just compute everything on the CPU, so the framework doesn't really need to be smart about streaming. And frameworks can already run everything on the CPU, so what you're asking for is already possible.
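For reference, falling back to CPU in PyTorch is just a device choice; nothing else about the code changes (toy model for illustration):

```python
import torch

# Pick the GPU if present, otherwise run the identical code on CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = torch.nn.Linear(4, 2).to(device)
out = model(torch.randn(3, 4, device=device))
print(out.shape)  # torch.Size([3, 2])
```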
Using system RAM to hold layers that aren't currently needed was one of the first optimizations made for Stable Diffusion. The AUTOMATIC1111 WebUI implements it; not sure about others.
The setup.py file seems to indicate that PyTorch is used, which I think can also run on AMD GPUs (via ROCm), provided you are on Linux.
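One convenient detail of PyTorch's ROCm builds is that they reuse the `torch.cuda` API, so the same code runs on both vendors; you can tell which backend a build was compiled against from `torch.version`:

```python
import torch

# CUDA builds set torch.version.cuda; ROCm builds set torch.version.hip.
# On a CPU-only build both are None.
print("cuda:", torch.version.cuda)
print("hip:", torch.version.hip)

# torch.cuda.is_available() reports True on ROCm systems too.
print("gpu available:", torch.cuda.is_available())
```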