What are the most affordable GPUs that will run this? (It says it needs CUDA and a minimum of 11GB VRAM, so I guess my relatively puny 4GB RX 570 isn't going to cut it!)
Can you imagine only being able to cook a hamburger on one brand of grill? But you can make something kinda similar in the toaster oven you can afford?
I want this comment to be productive… but the crypto/CUDA nexus of GPU work is simply not rational. Why are we still here?
You want to work in this field? Step 1. Buy an NVIDIA gpu. Step 2. CUDA. Step 3. Haha good luck, not available to purchase.
This situation is so crazy. My crappiest computer is way better at AI, just because it happens to be an Intel/NVIDIA build.
I don't hate NVIDIA for innovating. What makes me a bit miffed is the stagnation, and the risk that a monopoly sets us back by generations, unnecessarily.
So. To attempt to be productive here, what am I not seeing?
> Can you imagine only being able to cook a hamburger on one brand of grill? [...] the crypto/cuda nexus of GPU work is simply not rational. Why are we still here?
Because nvidia spent a long time chasing this market and pouring resources into it, like they actually wanted the business.
You wanted to learn about GPU compute in 2012? Here's a free udacity course sponsored by nvidia, complete with an online runtime backed by cloud GPUs so you can run the exercises straight from your browser.
You're building a deep learning framework? Here's a free library of accelerated primitives, and a developer who'll integrate it into your open source framework and update it as needed.
OpenCL, in contrast, behaves as if every member of the consortium is hoping some other member will bear these costs - as if they don't really want to be in the GPU compute business, except out of a begrudging need for feature parity.
And in terms of being rational - if you're skilled enough to be able to add support for a new GPU vendor into an ML library, you're probably paid enough that the price of a midrange nvidia GPU is trivial in comparison.
All is not lost, though - vendors like Intel are increasingly offering ML acceleration libraries [1] and most neural networks can be exported from one framework and imported into another.
Because innovating in the hardware space is just a lot more expensive and slow.
Also the vast majority of ML researchers and engineers are not system programmers. They don't care about vendor lock because they're not the ones writing the drivers.
I really want these ML libraries to get smarter about their use of VRAM. Just because I don't have enough VRAM shouldn't stop me from computing the answer. The library should just transfer the necessary data in from system RAM as needed. Sure, it'll be slower, but I'd prefer to wait a minute for my answer rather than get an error.
And if I don't have enough system RAM, bring it in from a file on disk.
The majority of the RAM is consumed by big weight matrices, and the framework knows exactly which pieces of data are needed, when, and in what order. So it should be able to do efficient streaming transfers that put all the data in the right place at the right time. That would be far more efficient than swap files, which don't know ahead of time what data will be needed and therefore hurt performance severely.
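The streaming idea above can be sketched in a few lines. This is a toy simulation, not any real framework's API: the weight list stands in for system RAM, and copying one matrix at a time into a working buffer stands in for the host-to-device transfer that would happen just before each layer runs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend these live in system RAM (or on disk): 4 layers of 256x256 weights.
# Only one layer's weights need to occupy "VRAM" at any moment.
weights_in_ram = [rng.standard_normal((256, 256)).astype(np.float32)
                  for _ in range(4)]

def run_streaming(x, layers):
    for w in layers:
        w_on_gpu = np.ascontiguousarray(w)  # stands in for the host->device copy
        x = np.maximum(x @ w_on_gpu, 0.0)   # the layer's compute (matmul + ReLU)
        del w_on_gpu                        # free the "VRAM" for the next layer
    return x

x = rng.standard_normal((1, 256)).astype(np.float32)
out = run_streaming(x, weights_in_ram)
print(out.shape)  # (1, 256)
```

Because the framework knows the layer order in advance, a real implementation could additionally prefetch layer N+1's weights while layer N is computing, hiding most of the transfer latency.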
If the model doesn't fit in GPU memory, it will often be faster to just compute everything on the CPU, so the framework doesn't really need to be smart. And frameworks can already run everything on the CPU, so they can do what you want already.
Using RAM to hold unneeded layers was one of the first optimizations made for Stable Diffusion. The AUTOMATIC1111 WebUI implements it, not sure about others.
If this was all packaged into a desktop app (e.g. Tauri or electron) how big would the app be? I'd imagine you could get it down to < 500MB (even if you packaged miniconda with it).
> I'd imagine you could get it down to < 500MB (even if you packaged miniconda with it).
I don't know where that imagined number came from. This tool appears to be using Stable Diffusion, and the base Stable Diffusion model is 4 or 5 gigabytes by itself. I think some other models are also necessary to use the base Stable Diffusion model, and while they are smaller, they still add to the total size.
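The "4 or 5 gigabytes" figure is easy to sanity-check from parameter counts. The count below is a rough assumption (on the order of a billion fp32 parameters across the UNet, VAE, and text encoder in an SD 1.x checkpoint), not an exact figure:

```python
# Back-of-envelope checkpoint size, assuming roughly 1.07 billion
# fp32 parameters in total (an approximation, not an exact count).
params = 1_070_000_000
bytes_per_param = 4  # fp32 = 4 bytes per weight
size_gib = params * bytes_per_param / 2**30
print(round(size_gib, 1))  # 4.0
```

So a few gigabytes for the weights alone, before adding the Python runtime, PyTorch, and CUDA libraries, which is why a sub-500MB desktop bundle is implausible.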
On the strength of this HN submission I just ordered an RTX 3060 12GB card for £381 on Amazon so I can run this and future AI models.
This stuff is fascinating, and @bryced's imaginAIry project made it accessible to people like me who never had any formal training in machine learning.
For what it's worth, it ran fine on my 2070 (8GB of VRAM), even with the GPU being used to render my desktop (Windows), which used another ~800MB of VRAM. I was running it under WSL, which also worked fine.
Note the level of investment that NVIDIA's software team has here: they have a separate WSL-Ubuntu installation method that takes care not to overwrite Windows drivers but installs the CUDA toolkit anyway. I expected this to be a niche, brittle process, but it was very well supported.