What are the most affordable GPUs that will run this? (It says it needs CUDA and a minimum of 11GB VRAM, so I guess my relatively puny 4GB RX 570 isn't going to cut it!)
Can you imagine only being able to cook a hamburger on one brand of grill? But you can make something kinda similar in the toaster oven you can afford?
I want this comment to be productive… but the crypto/CUDA nexus of GPU work is simply not rational. Why are we still here?
You want to work in this field? Step 1. Buy an NVIDIA gpu. Step 2. CUDA. Step 3. Haha good luck, not available to purchase.
This situation is so crazy. My crappiest computer is way better at AI, just because it happens to be an Intel/NVIDIA build.
I don't hate NVIDIA for innovating. What makes me a bit miffed is the stagnation, and the risk that a monopoly sets us back by generations, unnecessarily.
So. To attempt to be productive here, what am I not seeing?
> Can you imagine only being able to cook a hamburger on one brand of grill? [...] the crypto/cuda nexus of GPU work is simply not rational. Why are we still here?
Because nvidia spent a long time chasing this market and pouring resources into it, like they actually wanted the business.
You wanted to learn about GPU compute in 2012? Here's a free udacity course sponsored by nvidia, complete with an online runtime backed by cloud GPUs so you can run the exercises straight from your browser.
You're building a deep learning framework? Here's a free library of accelerated primitives, and a developer who'll integrate it into your open source framework and update it as needed.
OpenCL, in contrast, behaves as if every member of the consortium is hoping some other member will bear these costs - as if they don't really want to be in the GPU compute business, except out of a begrudging need for feature parity.
And in terms of being rational - if you're skilled enough to be able to add support for a new GPU vendor into an ML library, you're probably paid enough that the price of a midrange nvidia GPU is trivial in comparison.
All is not lost, though - vendors like Intel are increasingly offering ML acceleration libraries [1] and most neural networks can be exported from one framework and imported into another.
Because innovating in the hardware space is just a lot more expensive and slow.
Also the vast majority of ML researchers and engineers are not system programmers. They don't care about vendor lock because they're not the ones writing the drivers.
I really want these ML libraries to get smarter about their use of VRAM. Just because I don't have enough VRAM shouldn't stop me from computing the answer. The library should just transfer the necessary data in from system RAM as needed. Sure, it'll be slower, but I'd prefer to wait a minute for my answer rather than get an error.
And if I don't have enough system RAM, bring it in from a file on disk.
The majority of the RAM is consumed by big weight matrices, and the framework knows exactly which pieces of data are needed, when, and in what order. So it should be able to do efficient streaming transfers that put all the data in the right place at the right time. That would be far more efficient than swap files, which don't know ahead of time what data will be needed and therefore hurt performance severely.
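The streaming idea above can be sketched in a few lines. This is a toy simulation, not any real framework's API: the weight list stands in for system RAM, and copying one matrix at a time into a working buffer stands in for the host-to-device transfer that would happen just before each layer runs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend these live in system RAM (or on disk): 4 layers of 256x256 weights.
# Only one layer's weights need to occupy "VRAM" at any moment.
weights_in_ram = [rng.standard_normal((256, 256)).astype(np.float32)
                  for _ in range(4)]

def run_streaming(x, layers):
    for w in layers:
        w_on_gpu = np.ascontiguousarray(w)  # stands in for the host->device copy
        x = np.maximum(x @ w_on_gpu, 0.0)   # the layer's compute (matmul + ReLU)
        del w_on_gpu                        # free the "VRAM" for the next layer
    return x

x = rng.standard_normal((1, 256)).astype(np.float32)
out = run_streaming(x, weights_in_ram)
print(out.shape)  # (1, 256)
```

Because the framework knows the layer order in advance, a real implementation could additionally prefetch layer N+1's weights while layer N is computing, hiding most of the transfer latency.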
If the model doesn't fit in GPU memory, it will often be faster to just compute everything on the CPU, so the framework doesn't really need to be smart. And frameworks can already run everything on the CPU, so they can do what you want already.
Using RAM to hold unneeded layers was one of the first optimizations made for Stable Diffusion. The AUTOMATIC1111 WebUI implements it, not sure about others.
If this was all packaged into a desktop app (e.g. Tauri or electron) how big would the app be? I'd imagine you could get it down to < 500MB (even if you packaged miniconda with it).
> I'd imagine you could get it down to < 500MB (even if you packaged miniconda with it).
I don't know where that imagined number came from. This tool appears to be using Stable Diffusion, and the base Stable Diffusion model is 4 or 5 gigabytes by itself. I think some other models are also necessary to use the base Stable Diffusion model, and while they are smaller, they still add to the total size.
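The "4 or 5 gigabytes" figure is easy to sanity-check from parameter counts. The count below is a rough assumption (on the order of a billion fp32 parameters across the UNet, VAE, and text encoder in an SD 1.x checkpoint), not an exact figure:

```python
# Back-of-envelope checkpoint size, assuming roughly 1.07 billion
# fp32 parameters in total (an approximation, not an exact count).
params = 1_070_000_000
bytes_per_param = 4  # fp32 = 4 bytes per weight
size_gib = params * bytes_per_param / 2**30
print(round(size_gib, 1))  # 4.0
```

So a few gigabytes for the weights alone, before adding the Python runtime, PyTorch, and CUDA libraries, which is why a sub-500MB desktop bundle is implausible.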
On the strength of this HN submission I just ordered an RTX 3060 12GB card for £381 on Amazon so I can run this and future AI models.
This stuff is fascinating, and @bryced's imaginAIry project made it accessible to people like me who never had any formal training in machine learning.
For what it's worth, it ran fine on my 2070 (8GB of VRAM), even with the GPU being used to render my desktop (Windows), which used another ~800MB of VRAM. I was running it under WSL, which also worked fine.
Note the level of investment that NVIDIA's software team has here: they have a separate WSL-Ubuntu installation method that takes care not to overwrite Windows drivers but installs the CUDA toolkit anyway. I expected this to be a niche, brittle process, but it was very well supported.