I really want these ML libraries to get smarter about how they use VRAM. Not having enough VRAM shouldn't stop me from computing the answer: the library should just transfer the necessary data in from system RAM as needed. Sure, it'll be slower, but I'd rather wait a minute for my answer than get an error.
And if I don't have enough system RAM, bring it in from a file on disk.
The majority of the RAM is consumed by big weight matrices, so the framework knows exactly which pieces of data are needed, when, and in what order. It should therefore be able to schedule efficient streaming transfers that put all the data in the right place at the right time. That would be far more efficient than swap files, which don't know ahead of time what data will be needed and so hurt performance severely.
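A minimal sketch of that idea in PyTorch, using forward hooks to pull each layer's weights onto the GPU just before it runs and evict them afterwards. The model, sizes, and function names here are made up for illustration; real implementations (e.g. offloading in inference libraries) also overlap transfers with compute, which this sketch does not.

```python
import torch
import torch.nn as nn

# Weights live in system RAM between uses; compute happens on the GPU
# when one is available, otherwise everything stays on CPU.
compute = torch.device("cuda" if torch.cuda.is_available() else "cpu")
storage = torch.device("cpu")

def add_streaming_hooks(module: nn.Module) -> None:
    """Hypothetical helper: stream this module's weights in/out per forward pass."""
    def pre(mod, args):
        mod.to(compute)  # transfer weights in just in time
        return tuple(a.to(compute) for a in args)

    def post(mod, args, output):
        mod.to(storage)  # evict weights to free VRAM for the next layer
        return output

    module.register_forward_pre_hook(pre)
    module.register_forward_hook(post)

# Illustrative toy model -- imagine each Linear is one big transformer block.
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))
for layer in model:
    add_streaming_hooks(layer)

x = torch.randn(1, 512)
y = model(x)
print(y.shape)  # torch.Size([1, 10])
```

Because the framework knows the execution order, a real version could prefetch layer N+1 while layer N computes, which is exactly the advantage over a blind swap file.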
If the model doesn't fit in GPU memory, it will often be faster to just compute everything on the CPU, so the framework doesn't really need to be smart about streaming. And frameworks can already run everything on the CPU, so what you're asking for is already possible.
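For reference, falling back to CPU in PyTorch is just a device choice; nothing else about the code changes (toy model for illustration):

```python
import torch

# Pick the GPU if present, otherwise run the identical code on CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = torch.nn.Linear(4, 2).to(device)
out = model(torch.randn(3, 4, device=device))
print(out.shape)  # torch.Size([3, 2])
```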
Using system RAM to hold layers that aren't currently needed was one of the first optimizations made for Stable Diffusion. The AUTOMATIC1111 WebUI implements it; not sure about others.
The setup.py file seems to indicate that PyTorch is used, which I think can also run on AMD GPUs (via ROCm), provided you are on Linux.
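One convenient detail of PyTorch's ROCm builds is that they reuse the `torch.cuda` API, so the same code runs on both vendors; you can tell which backend a build was compiled against from `torch.version`:

```python
import torch

# CUDA builds set torch.version.cuda; ROCm builds set torch.version.hip.
# On a CPU-only build both are None.
print("cuda:", torch.version.cuda)
print("hip:", torch.version.hip)

# torch.cuda.is_available() reports True on ROCm systems too.
print("gpu available:", torch.cuda.is_available())
```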